ai-evaluation 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,193 @@
Metadata-Version: 2.3
Name: ai-evaluation
Version: 0.1.0
Summary: We help GenAI teams maintain high accuracy for their models in production.
Author: Future AGI
Author-email: no-reply@futureagi.com
Requires-Python: >3.9
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: futureagi (>=0.6.2)
Description-Content-Type: text/markdown

# Future AGI

![Company Logo](https://fi-content.s3.ap-south-1.amazonaws.com/Logo.png)

Welcome to Future AGI - Empowering GenAI Teams with Advanced Performance Management

# Overview

Future AGI provides a cutting-edge platform designed to help GenAI teams maintain peak model accuracy in production environments.
Our solution is purpose-built, scalable, and delivers results 10x faster than traditional methods.

**Key Features**

* **_Simplified GenAI Performance Management_**: Streamline your workflow and focus on developing cutting-edge AI models.
* **_Instant Evaluation_**: Score outputs without human-in-the-loop or ground truth, increasing QA team efficiency by up to 10x.
* **_Advanced Error Analytics_**: Gain ready-to-use insights with comprehensive error tagging and segmentation.
* **_Configurable Metrics_**: Define custom metrics tailored to your specific use case for precise model evaluation.

# Quickstart

This guide walks you through setting up an evaluation in **Future AGI**, allowing you to assess AI models and workflows efficiently. You can run evaluations via the **Future AGI platform** or using the **Python SDK**.

## Access API Key

To authenticate while running evals, you will need Future AGI's API keys, which you can obtain by following the steps below:

- Go to your Future AGI dashboard
- Click on **Keys** under the **Developer** option in the left column
- Copy both the **API Key** and the **Secret Key**

---

## Setup Evaluator

Install the Future AGI Python SDK using the command below:

```bash
pip install ai-evaluation
```

Then initialise the Evaluator:

```python
from fi.evals import Evaluator

evaluator = Evaluator(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key",
)
```

We recommend setting the `FI_API_KEY` and `FI_SECRET_KEY` environment variables before using the `Evaluator` class, instead of passing the keys as parameters.

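The environment-variable approach can be sketched as follows, assuming (per the recommendation above) that `Evaluator` falls back to `FI_API_KEY` and `FI_SECRET_KEY` when no keys are passed:

```python
import os

# Set the credentials before constructing the Evaluator.
# Replace the placeholder values with the keys copied from the dashboard.
os.environ["FI_API_KEY"] = "your_api_key"
os.environ["FI_SECRET_KEY"] = "your_secret_key"

# With the variables set, the Evaluator can be created without arguments
# (assuming it reads these variables, as the note above recommends):
# from fi.evals import Evaluator
# evaluator = Evaluator()
```

In practice you would export these variables in your shell or CI secrets rather than hard-coding them in source.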
---

## Running Your First Eval

This section walks you through running your first evaluation using the Future AGI evaluation framework. To get started, we'll use **Tone Evaluation** as an example.

### a. Using Python SDK

**Define the Test Case**

Create a test case containing the **text input** that will be evaluated for tone.

```python
from fi.testcases import TestCase

test_case = TestCase(
    input='''
    Dear Sir, I hope this email finds you well.
    I look forward to any insights or advice you might have
    whenever you have a free moment.
    '''
)
```

You can also pass the data directly as a dictionary with valid keys. However, it is recommended to use the `TestCase` class when working with Future AGI Evaluations.

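For example, the same test case could be expressed as a plain dictionary. This is only a sketch; the accepted keys depend on the evaluation template, and `input` is assumed here from the `TestCase` example above:

```python
# The same test case expressed as a plain dict instead of a TestCase.
# The "input" key mirrors the field used in the TestCase example above.
test_case_dict = {
    "input": (
        "Dear Sir, I hope this email finds you well. "
        "I look forward to any insights or advice you might have "
        "whenever you have a free moment."
    )
}

# It could then be passed where a TestCase is expected, e.g.
# evaluator.evaluate(eval_templates=[tone_eval], inputs=[test_case_dict])
```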
**Configure the Evaluation Template**

For **Tone Evaluation**, we use the **Tone Evaluation Template** to analyse the sentiment and emotional tone of the input.

```python
from fi.evals.templates import Tone

tone_eval = Tone()  # The evaluation template provided by Future AGI
```

[Click here to read more about all the Evals provided by Future AGI](/future-agi/products/evaluation/eval-definition/overview)

**Run the Evaluation**

Execute the evaluation and retrieve the results.

```python
result = evaluator.evaluate(eval_templates=[tone_eval], inputs=[test_case])
tone_result = result.eval_results[0].metrics[0].value
```

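The access pattern `result.eval_results[0].metrics[0].value` implies that the result object nests one entry per evaluated input, each carrying a list of metrics. A minimal stand-in illustrating that shape (these dataclasses are hypothetical, not the SDK's actual types):

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins mirroring the access pattern shown above;
# the real SDK classes may differ.
@dataclass
class Metric:
    name: str
    value: float

@dataclass
class EvalResult:
    metrics: List[Metric] = field(default_factory=list)

@dataclass
class BatchResult:
    eval_results: List[EvalResult] = field(default_factory=list)

# One eval template applied to one test case -> one entry in eval_results,
# whose first metric holds the score.
result = BatchResult(eval_results=[EvalResult(metrics=[Metric("tone", 0.92)])])
tone_result = result.eval_results[0].metrics[0].value
```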
To evaluate data against an evaluation template you have created yourself, pass the template's name to the `evaluate` function via the `eval_templates` parameter.

```python
from fi.evals import evaluate

result = evaluate(eval_templates="name-of-your-eval", inputs={
    "input": "your_input_text",
    "output": "your_output_text"
})

print(result.eval_results[0].metrics[0].value)
```

### b. Using Web Interface

**Select a Dataset**

Before running an evaluation, ensure you have selected a dataset. If no dataset is available, follow the steps to **Add Dataset** on the Future AGI platform.

[Read more about all the ways you can add a dataset](/future-agi/products/dataset/overview)

**Access the Evaluation Panel**

- Navigate to your dataset.
- Click on the **Evaluate** button in the top-right menu.
- This will open the evaluation configuration panel.

**Starting a New Evaluation**

- Click on the **Add Evaluation** button.
- You will be directed to the Evaluation List page.
  You can either create your own evaluation or select from the available templates built by Future AGI.
- Click on one of the available templates.
- Write the name of the evaluation and select the required dataset column.
  <Tip>
  Checkmark **Error Localization** if you want errors to be localized in the dataset when a datapoint fails the evaluation.
  </Tip>
- Click on the **Add & Run** button.

## Creating a New Evaluation

Future AGI provides a wide range of evaluation templates to choose from. You can also create your own evaluation tailored to your needs by following these simple steps:

- Click on the **Create your own eval** button after clicking on the **Add Evaluation** button.
- Write the name of the evaluation. This name will be used to identify the evaluation in the evaluation list; only lowercase letters, numbers, and underscores are allowed.
- Select either **Use Future AGI Agents** or **Use other LLMs**

**Future AGI Agents** are our own proprietary models trained on a wide variety of datasets to perform evaluations. These models vary in capabilities and are suited to different use cases:

- **TURING_LARGE** – Flagship evaluation model that delivers best-in-class accuracy across multimodal inputs (text, images, audio). Recommended when maximal precision outweighs latency constraints.
- **TURING_SMALL** – Compact variant that preserves high evaluation fidelity while lowering computational cost. Supports text and image evaluations.
- **TURING_FLASH** – Latency-optimised version of TURING, providing high-accuracy assessments for text and image inputs with fast response times.
- **PROTECT** – Real-time guardrailing model for safety, policy compliance, and content-risk detection. Offers very low latency on text and audio streams and permits user-defined rule sets.
- **PROTECT_FLASH** – Ultra-fast binary guardrail for text content. Designed for first-pass filtering where millisecond-level turnaround is critical.

- In the Rule Prompt, write the rules that the evaluation should follow. Use `{{}}` to create a key (variable); that variable is used later when you configure the evaluation.
- Choose the Output Type as either Pass/Fail, Percentage, or Deterministic Choices:
  - **Pass/Fail**: The evaluation returns either Pass or Fail.
  - **Percentage**: The evaluation returns a score between 0 and 100.
  - **Deterministic Choices**: The evaluation returns a categorical choice from the list of choices.
- Select the Tags for the evaluation that suit your use case.
- Write a description of the evaluation; it will be used to identify the evaluation in the evaluation list.
- Checkmark **Check Internet** to power your evaluation with the latest information.
- Click on the **Create Evaluation** button.

---
@@ -0,0 +1,20 @@
import inspect

from fi.evals.evaluator import Evaluator, evaluate, list_evaluations  # noqa: F401
from fi.evals.protect import Protect, protect  # noqa: F401
from fi.evals.templates import *  # noqa: F403, F401
from fi.evals.metrics import (  # noqa: F401
    AggregatedMetric,
    BLEUScore,
    EmbeddingSimilarity,
    LevenshteinDistance,
    NumericDiff,
    ROUGEScore,
    SemanticListContains,
)

# Dynamically generate __all__ from the imported templates
_globals = globals()
evaluation_template_names = [
    name
    for name, obj in _globals.items()
    if inspect.isclass(obj) and obj.__module__ == "fi.evals.templates"
]

# Add the clients separately
client_names = ["Evaluator", "Protect", "evaluate", "protect", "list_evaluations"]

# Combine and sort for consistency
__all__ = sorted(evaluation_template_names + client_names)