PyPI - ragbits-evaluate - Versions diffs - 0.0.8.dev23005__tar.gz - Mend

ragbits-evaluate 0.0.8.dev23005__tar.gz


Files changed (68)

ragbits_evaluate-0.0.8.dev23005/.gitignore ADDED

@@ -0,0 +1,116 @@
+# Directories
+.vscode/
+.idea/
+.neptune/
+.pytest_cache/
+.mypy_cache/
+venv/
+.venv/
+__pycache__/
+**.egg-info/
+.deepeval/
+# Local cursor rules
+.cursor/rules/local/
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+# Sphinx documentation
+docs/_build/
+public/
+# autogenerated package license table
+docs/licenses_table.rst
+# license dump file
+licenses.txt
+# File formats
+*.onnx
+*.pyc
+*.pt
+*.pth
+*.pkl
+*.mar
+*.torchscript
+**/.ipynb_checkpoints
+**/dist/
+**/checkpoints/
+**/outputs/
+**/multirun/
+# Other env files
+.python-version
+pyvenv.cfg
+pip-selfcheck.json
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*,cover
+.hypothesis/
+# dotenv
+.env
+# coverage and pytest reports
+coverage.xml
+report.xml
+# CMake
+cmake-build-*/
+# Terraform
+**/.terraform.lock.hcl
+**/.terraform
+# mkdocs generated files
+site/
+# build artifacts
+dist/
+# examples
+chroma/
+qdrant/
+.aider*
+.DS_Store
+node_modules/
+lazygit
+lazygit.tar.gz
+# chat conversation logs
+duet_conversation.log
+worktrees/

ragbits_evaluate-0.0.8.dev23005/CHANGELOG.md ADDED

@@ -0,0 +1,244 @@
+# CHANGELOG
+## Unreleased
+- Feat: introduce agent evaluation pipelines and metrics (HotpotQA, HumanEval, GAIA) (#829)
+- Feat: introduce agent simulation module with utilities for agent-to-agent conversation and evaluation scenarios (#857)
+- Feat: add structured results to agent simulation with `SimulationResult`, `TurnResult`, `TaskResult`, and `ConversationMetrics` models (#885)
+- Feat: add `DomainContext` for domain-specific goal checking in agent simulation (currency, locale, business rules) (#884)
+- Feat: add `DataSnapshot` for data-grounded simulated user requests (prevents unrealistic requests for non-existent items) (#883)
+- Feat: add metrics collection system for agent simulation (`MetricCollector` protocol, `LatencyMetricCollector`, `TokenUsageMetricCollector`, `ToolUsageMetricCollector`) (#882)
+- Feat: add support for response adapters from `ragbits.chat.adapters` in agent simulation, enabling production chat interfaces to be used directly without wrapper classes
+## 1.3.0 (2025-09-11)
+### Changed
+- ragbits-core updated to version v1.3.0
+- Optional parallel batches execution in ragbits.evaluate.Evaluator (#769)
+## 1.2.2 (2025-08-08)
+### Changed
+- ragbits-core updated to version v1.2.2
+## 1.2.1 (2025-08-04)
+### Changed
+- ragbits-core updated to version v1.2.1
+## 1.2.0 (2025-08-01)
+### Changed
+- ragbits-core updated to version v1.2.0
+## 1.1.0 (2025-07-09)
+### Changed
+- ragbits-core updated to version v1.1.0
+- Update qa data loader docstring (#565)
+- Fix deadlock on qa metrics compute (#609)
+- Upgrade distilabel version to 1.5.0 (#682)
+## 1.0.0 (2025-06-04)
+### Changed
+- ragbits-core updated to version v1.0.0
+## 0.20.1 (2025-06-04)
+### Changed
+- ragbits-core updated to version v0.20.1
+## 0.20.0 (2025-06-03)
+### Changed
+- ragbits-core updated to version v0.20.0
+## 0.19.1 (2025-05-27)
+### Changed
+- ragbits-core updated to version v0.19.1
+## 0.19.0 (2025-05-27)
+### Changed
+- ragbits-core updated to version v0.19.0
+- Add evals for question answering (#577)
+- Add support for slicing dataset (#576)
+- Separate load and map ops in data loaders (#576)
+## 0.18.0 (2025-05-22)
+### Changed
+- ragbits-core updated to version v0.18.0
+- Add support for custom column names in evaluation dataset (#566)
+- Add support for reference document ids and page numbers in evaluation dataset (#566)
+- BREAKING CHANGE: Adjust eval pipeline interface to batch processing (#555)
+- Rename DocumentMeta create_text_document_from_literal to from_literal (#561)
+- Adjust typing for DocumentSearch (#554)
+## 0.17.1 (2025-05-09)
+### Changed
+- ragbits-core updated to version v0.17.1
+## 0.17.0 (2025-05-06)
+### Changed
+- ragbits-core updated to version v0.17.0
+- Add tests for ragbits-evaluate package (#390)
+- Integrate sources with dataloaders (#529)
+## 0.16.0 (2025-04-29)
+### Changed
+- ragbits-core updated to version v0.16.0
+## 0.15.0 (2025-04-28)
+### Changed
+- ragbits-core updated to version v0.15.0
+## 0.14.0 (2025-04-22)
+### Changed
+- ragbits-core updated to version v0.14.0
+- move sources from ragbits-document-search to ragbits-core (#496)
+## 0.13.0 (2025-04-02)
+### Changed
+- ragbits-core updated to version v0.13.0
+## 0.12.0 (2025-03-25)
+### Changed
+- ragbits-core updated to version v0.12.0
+## 0.11.0 (2025-03-25)
+### Changed
+- ragbits-core updated to version v0.11.0
+## 0.10.2 (2025-03-21)
+### Changed
+- ragbits-core updated to version v0.10.2
+## 0.10.1 (2025-03-19)
+### Changed
+- ragbits-core updated to version v0.10.1
+## 0.10.0 (2025-03-17)
+### Changed
+- ragbits-core updated to version v0.10.0
+- Compatibility with the new Vector Store interface from ragbits-core (#288)
+- chore: fix typo in README.
+- fix typos in doc strings
+## 0.9.0 (2025-02-25)
+### Changed
+- ragbits-core updated to version v0.9.0
+- Add CLI for document search evaluation (#356)
+- Add local data loader (#334).
+## 0.8.0 (2025-01-29)
+### Changed
+- ragbits-core updated to version v0.8.0
+## 0.7.0 (2025-01-21)
+### Added
+- Simplified interface to document-search evaluation (#258).
+### Changed
+- ragbits-core updated to version v0.7.0
+## 0.6.0 (2024-12-27)
+### Changed
+- ragbits-core updated to version v0.6.0
+## 0.5.1 (2024-12-09)
+### Changed
+- ragbits-core updated to version v0.5.1
+- document search evaluation now returns all Element types, rather than only TextElements (#241).
+## 0.5.0 (2024-12-05)
+### Changed
+- ragbits-core updated to version v0.5.0
+## 0.4.0 (2024-11-27)
+### Added
+- Introduced optimization with optuna (#177).
+- Add synthetic data generation pipeline (#165).
+### Changed
+- ragbits-core updated to version v0.4.0
+## 0.3.0 (2024-11-06)
+### Changed
+- ragbits-core updated to version v0.3.0
+## 0.2.0 (2024-10-23)
+- Initial release of the package.
+- Evaluation pipeline framework with capability to define evaluators & metrics.
+- Evaluation pipeline for `ragbits-document-search`.
+### Changed
+- ragbits-core updated to version v0.2.0

ragbits_evaluate-0.0.8.dev23005/PKG-INFO ADDED

@@ -0,0 +1,58 @@
+Metadata-Version: 2.4
+Name: ragbits-evaluate
+Version: 0.0.8.dev23005
+Summary: Evaluation module for Ragbits components
+Project-URL: Homepage, https://github.com/deepsense-ai/ragbits
+Project-URL: Bug Reports, https://github.com/deepsense-ai/ragbits/issues
+Project-URL: Documentation, https://ragbits.deepsense.ai/
+Project-URL: Source, https://github.com/deepsense-ai/ragbits
+Author-email: "deepsense.ai" <ragbits@deepsense.ai>
+License-Expression: MIT
+Keywords: Evaluation,GenAI,Generative AI,LLMs,Large Language Models,RAG,Retrieval Augmented Generation
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Natural Language :: English
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Requires-Python: >=3.10
+Requires-Dist: datasets<4.0.0,>=3.0.1
+Requires-Dist: deepeval<3.0.0,>=2.0.0
+Requires-Dist: distilabel<2.0.0,>=1.5.0
+Requires-Dist: hydra-core<2.0.0,>=1.3.2
+Requires-Dist: neptune[optuna]<2.0.0,>=1.12.0
+Requires-Dist: optuna<5.0.0,>=4.0.0
+Requires-Dist: ragbits-core==0.0.8.dev23005
+Provides-Extra: relari
+Requires-Dist: continuous-eval<1.0.0,>=0.3.12; extra == 'relari'
+Description-Content-Type: text/markdown
+# Ragbits Evaluate
+Ragbits Evaluate is a package that contains tools for evaluating the performance of AI pipelines defined with Ragbits components. It also helps with automatically finding the best hyperparameter configurations for them.
+## Installation
+To install the Ragbits Evaluate package, run:
+```sh
+pip install ragbits-evaluate
+```
+<!--
+TODO: Add a minimalistic example inspired by the Quickstart chapter on Ragbits Evaluate once it is ready.
+-->
+## Documentation
+<!--
+TODO:
+* Add link to the Quickstart chapter on Ragbits Evaluate once it is ready.
+* Add link to API Reference once classes from the Evaluate package are added to the API Reference.
+-->
+* [How-To Guides - Evaluate](https://ragbits.deepsense.ai/how-to/evaluate/optimize/)

ragbits_evaluate-0.0.8.dev23005/README.md ADDED

@@ -0,0 +1,23 @@
+# Ragbits Evaluate
+Ragbits Evaluate is a package that contains tools for evaluating the performance of AI pipelines defined with Ragbits components. It also helps with automatically finding the best hyperparameter configurations for them.
+## Installation
+To install the Ragbits Evaluate package, run:
+```sh
+pip install ragbits-evaluate
+```
+<!--
+TODO: Add a minimalistic example inspired by the Quickstart chapter on Ragbits Evaluate once it is ready.
+-->
+## Documentation
+<!--
+TODO:
+* Add link to the Quickstart chapter on Ragbits Evaluate once it is ready.
+* Add link to API Reference once classes from the Evaluate package are added to the API Reference.
+-->
+* [How-To Guides - Evaluate](https://ragbits.deepsense.ai/how-to/evaluate/optimize/)

ragbits_evaluate-0.0.8.dev23005/pyproject.toml ADDED

@@ -0,0 +1,68 @@
+[project]
+name = "ragbits-evaluate"
+version = "0.0.8.dev23005"
+description = "Evaluation module for Ragbits components"
+readme = "README.md"
+requires-python = ">=3.10"
+license = "MIT"
+authors = [
+    { name = "deepsense.ai", email = "ragbits@deepsense.ai"}
+]
+keywords = [
+    "Retrieval Augmented Generation",
+    "RAG",
+    "Large Language Models",
+    "LLMs",
+    "Generative AI",
+    "GenAI",
+    "Evaluation"
+]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Environment :: Console",
+    "Intended Audience :: Science/Research",
+    "License :: OSI Approved :: MIT License",
+    "Natural Language :: English",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence",
+    "Topic :: Software Development :: Libraries :: Python Modules",
+]
+dependencies = ["hydra-core>=1.3.2,<2.0.0", "neptune[optuna]>=1.12.0,<2.0.0", "optuna>=4.0.0,<5.0.0", "distilabel>=1.5.0,<2.0.0", "datasets>=3.0.1,<4.0.0", "ragbits-core==0.0.8.dev23005", "deepeval>=2.0.0,<3.0.0"]
+[project.urls]
+"Homepage" = "https://github.com/deepsense-ai/ragbits"
+"Bug Reports" = "https://github.com/deepsense-ai/ragbits/issues"
+"Documentation" = "https://ragbits.deepsense.ai/"
+"Source" = "https://github.com/deepsense-ai/ragbits"
+[project.optional-dependencies]
+relari = [
+    "continuous-eval>=0.3.12,<1.0.0",
+]
+[tool.uv]
+dev-dependencies = [
+    "pre-commit~=3.8.0",
+    "pytest~=8.3.3",
+    "pytest-cov~=5.0.0",
+    "pytest-asyncio~=0.24.0",
+    "pip-licenses>=4.0.0,<5.0.0"
+]
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+[tool.hatch.metadata]
+allow-direct-references = true
+[tool.hatch.build.targets.wheel]
+packages = ["src/ragbits"]
+[tool.pytest.ini_options]
+asyncio_mode = "auto"

ragbits_evaluate-0.0.8.dev23005/src/ragbits/evaluate/__init__.py ADDED

File without changes

ragbits_evaluate-0.0.8.dev23005/src/ragbits/evaluate/agent_simulation/__init__.py ADDED

@@ -0,0 +1,122 @@
+"""Agent simulation utilities for evaluation scenarios.
+This module uses lazy imports for components that require optional dependencies
+(ragbits-agents, ragbits-chat) to allow importing result models independently.
+"""
+from typing import TYPE_CHECKING
+# Import context, metrics, and result models eagerly - they have no external dependencies
+# Adapters are re-exported from ragbits.chat.adapters for convenience; this import is eager, so ragbits-chat must be importable
+from ragbits.chat.adapters import (
+    AdapterContext,
+    AdapterPipeline,
+    BaseAdapter,
+    ChatResponseAdapter,
+    FilterAdapter,
+    ResponseAdapter,
+    TextAccumulatorAdapter,
+    ToolCallAccumulatorAdapter,
+    ToolResultTextAdapter,
+    UsageAggregatorAdapter,
+)
+from ragbits.evaluate.agent_simulation.context import DataSnapshot, DomainContext
+from ragbits.evaluate.agent_simulation.metrics import (
+    CompositeMetricCollector,
+    LatencyMetricCollector,
+    MetricCollector,
+    TokenUsageMetricCollector,
+    ToolUsageMetricCollector,
+)
+from ragbits.evaluate.agent_simulation.results import (
+    ConversationMetrics,
+    SimulationResult,
+    SimulationStatus,
+    TaskResult,
+    TurnResult,
+)
+if TYPE_CHECKING:
+    from ragbits.evaluate.agent_simulation.conversation import (
+        run_scenario_matrix,
+        run_simulation,
+        run_simulations_concurrent,
+    )
+    from ragbits.evaluate.agent_simulation.deepeval_evaluator import DeepEvalEvaluator
+    from ragbits.evaluate.agent_simulation.logger import ConversationLogger
+    from ragbits.evaluate.agent_simulation.models import Personality, Scenario, Task, Turn
+    from ragbits.evaluate.agent_simulation.scenarios import load_personalities, load_scenarios
+    from ragbits.evaluate.agent_simulation.simulation import GoalChecker, SimulatedUser
+__all__ = [
+    # Adapters
+    "AdapterContext",
+    "AdapterPipeline",
+    "BaseAdapter",
+    "ChatResponseAdapter",
+    "FilterAdapter",
+    "ResponseAdapter",
+    "TextAccumulatorAdapter",
+    "ToolCallAccumulatorAdapter",
+    "ToolResultTextAdapter",
+    "UsageAggregatorAdapter",
+    # Metrics
+    "CompositeMetricCollector",
+    "LatencyMetricCollector",
+    "MetricCollector",
+    "TokenUsageMetricCollector",
+    "ToolUsageMetricCollector",
+    # Context
+    "DataSnapshot",
+    "DomainContext",
+    # Results
+    "ConversationMetrics",
+    "SimulationResult",
+    "SimulationStatus",
+    "TaskResult",
+    "TurnResult",
+    # Components (lazy loaded)
+    "ConversationLogger",
+    "DeepEvalEvaluator",
+    "GoalChecker",
+    "Personality",
+    "Scenario",
+    "SimulatedUser",
+    "Task",
+    "Turn",
+    # Functions (lazy loaded)
+    "load_personalities",
+    "load_scenarios",
+    "run_scenario_matrix",
+    "run_simulation",
+    "run_simulations_concurrent",
+]
+def __getattr__(name: str) -> object:
+    """Lazy import for components with optional dependencies."""
+    if name in ("run_simulation", "run_simulations_concurrent", "run_scenario_matrix"):
+        from ragbits.evaluate.agent_simulation import conversation
+        return getattr(conversation, name)
+    if name == "DeepEvalEvaluator":
+        from ragbits.evaluate.agent_simulation.deepeval_evaluator import DeepEvalEvaluator
+        return DeepEvalEvaluator
+    if name == "ConversationLogger":
+        from ragbits.evaluate.agent_simulation.logger import ConversationLogger
+        return ConversationLogger
+    if name in ("Personality", "Scenario", "Task", "Turn"):
+        from ragbits.evaluate.agent_simulation import models
+        return getattr(models, name)
+    if name in ("load_personalities", "load_scenarios"):
+        from ragbits.evaluate.agent_simulation import scenarios
+        return getattr(scenarios, name)
+    if name in ("GoalChecker", "SimulatedUser"):
+        from ragbits.evaluate.agent_simulation import simulation
+        return getattr(simulation, name)
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
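
The `__getattr__` function that closes this `__init__.py` appears to follow the module-level `__getattr__` hook described in PEP 562: names listed in `__all__` but not bound at import time are resolved on first attribute access. The sketch below illustrates the general pattern only. The module name `demo_pkg`, the `_LAZY` mapping, and the use of stdlib classes as stand-ins for the optional ragbits components are all assumptions made for illustration, not part of the package.

```python
import importlib
import sys
import types

# Hypothetical mapping: public name -> module that actually defines it.
# Stdlib modules stand in for submodules with optional dependencies.
_LAZY = {"JSONDecoder": "json", "Fraction": "fractions"}


def _lazy_getattr(name: str) -> object:
    """Import the defining module only when `name` is first requested."""
    if name in _LAZY:
        module = importlib.import_module(_LAZY[name])
        return getattr(module, name)
    raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")


# Build an in-memory module and install the hook in its namespace.
# In a real package this function would simply be named __getattr__
# at the top level of __init__.py.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.__getattr__ = _lazy_getattr
sys.modules["demo_pkg"] = demo_pkg

# Both attribute access and `from ... import ...` go through the hook.
half = demo_pkg.Fraction(1, 2)
from demo_pkg import JSONDecoder  # noqa: E402
```

The hook runs only when normal attribute lookup fails, so eagerly imported names never reach it. In this sketch, as in the diff above, the resolved object is not stored back on the module, so each access calls the hook again; the underlying import is still cached in `sys.modules`.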