PyPI - hyperplane-eval - Versions diffs - 0.1.5__tar.gz → 0.1.7__tar.gz - Mend

Files changed (83)

hyperplane_eval-0.1.7/MANIFEST.in ADDED

@@ -0,0 +1,6 @@
+include requirements.txt
+include README.md
+include LICENSE
+recursive-include hyperplane/prompts *.txt
+recursive-include hyperplane/framework/domain *.json
+recursive-include hyperplane/reporting/templates *.html

hyperplane_eval-0.1.7/PKG-INFO ADDED

@@ -0,0 +1,149 @@
+Metadata-Version: 2.4
+Name: hyperplane-eval
+Version: 0.1.7
+Summary: Local tool to evaluate AI agents and find their weak points.
+Author: Marten Panchev
+Author-email: marten@aquithm.com
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: Apache Software License
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pydantic>=2.0.0
+Requires-Dist: numpy>=1.24.0
+Requires-Dist: scipy>=1.10.0
+Requires-Dist: litellm>=1.0.0
+Requires-Dist: aiohttp>=3.9.0
+Requires-Dist: pandas>=2.0.0
+Requires-Dist: scikit-learn>=1.2.0
+Requires-Dist: openai>=1.0.0
+Requires-Dist: pyngrok>=7.1.0
+Requires-Dist: rich>=13.0.0
+Requires-Dist: questionary>=2.0.0
+Requires-Dist: PyYAML>=6.0.0
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: license-file
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+# Hyperplane Eval
+Hyperplane Eval helps teams discover behavioral failures before deployment.
+It works as a CLI and programmatic tool for evaluating AI agents against business, security, and ethical requirements using intelligent test generation rather than manually written test cases.
+## Why Hyperplane?
+Most AI teams evaluate agents with a few dozen manually written prompts.
+The problem is that manually written tests only cover a tiny fraction of the agent's behavioral space.
+Hyperplane automatically explores that space, generating thousands of semantically diverse inputs to uncover failures before users do.
+## Features
+- Generate thousands of semantically diverse test cases
+- Evaluate business, security, and ethical requirements of your agent
+- Automatically find edge cases and breaking points
+- Local-first CLI workflow
+- Framework-agnostic agent integration
+- Detailed evaluation reports. [See example here](https://n0tsu5.github.io/results-board/)
+## CLI Integration
+Hyperplane operates entirely locally through a terminal-based orchestration wizard.
+### Setup & Installation
+Install the framework via pip:
+```bash
+pip install hyperplane-eval
+```
+### Running the CLI
+Run the interactive CLI directly in your terminal from inside your project directory:
+```bash
+hyperplane
+```
+The CLI wizard will guide you through:
+1. **Target Binding:** Automatically finding your agent's code.
+2. **Constraint Definition:** Setting natural language rules for your agent.
+3. **Configuration:** Setting the scale and depth of the evaluation.
+4. **Execution:** Running the evaluation with a live dashboard.
+Hyperplane outputs a structured dataset and an HTML report identifying the specific types and characteristics of inputs that cause the agent to fail, helping you quickly isolate prompt engineering or logic regressions.
+![Hyperplane Evaluation Dashboard](5cr33n5h0t.png)
+## Programmatic Integration
+You can also integrate Hyperplane Eval directly into your Python codebase or CI/CD testing suites using the `Evaluator` API:
+```python
+from hyperplane import Evaluator
+from litellm import Router
+# 1. Initialize your litellm Router
+router = Router(
+    model_list=[{
+        "model_name": "gpt-4o",
+        "litellm_params": {"model": "gpt-4o"}
+    }]
+)
+# 2. Define your target agent
+def my_agent(prompt: str) -> str:
+    return "I am a safe agent."
+# 3. Initialize the Evaluator
+evaluator = Evaluator(
+    agent_desc="A helpful AI assistant",
+    param_desc={"prompt": "The user input prompt"},
+    target_callable=my_agent,
+    llm_client=router
+)
+# 4. Add constraints and run
+evaluator.run(rules=["Never execute tool calls with unsafe parameters.", "Always respond in English."])
+```
+## Architecture & Methodology
+Evaluating agentic systems presents a curse-of-dimensionality problem due to the infinite input space. Hyperplane mitigates this via a dimensional reduction and bounded sampling approach:
+1. **Orthogonal Dimension Extraction:** The target heuristic or constraint (e.g., a business logic rule) is passed to an LLM, which extracts a set of orthogonal, continuous axes representing the variance in potential inputs (e.g., `user_frustration` [0,1], `budget_constraint` [0,1]).
+2. **Quasi-Random Initialization:** The framework maps a bounded multi-dimensional continuous input space $S = [0, 1]^D$. It utilizes Sobol sequences to generate a low-discrepancy initialization grid to ensure uniform volumetric coverage without clustering.
+3. **Synthetic Input Generation:** Bounded coordinate vectors are mapped back into natural language. A Generator LLM synthesizes adversarial or conversational inputs as structured payloads that strictly adhere to the defined vector coordinates.
+4. **Response Classification:** The target agent executes the synthesized inputs. An Evaluator LLM utilizes Chain-of-Thought (CoT) reasoning to classify the agent's response as a pass (1) or fail (0) against the constraint.
+5. **Surrogate Modeling & Active Search:** The framework fits a Random Forest surrogate classifier over the evaluated points to approximate the failure boundary. It utilizes an active search algorithm to sample points near the decision boundary until volumetric saturation is reached, stopping early via dimension-scaled mismatch rate thresholds.
+The resulting artifact is a detailed report allowing engineers to identify the specific input themes and characteristics that reliably induce constraint violations.
+## Privacy & Security (BYOK)
+Hyperplane Eval is designed for enterprise privacy using a **Bring Your Own Key (BYOK)** architecture:
+- **100% Local Execution:** The orchestrator and test synthesis engine run entirely on your local machine or CI/CD runner.
+- **No Telemetry or Data Logging:** We do not collect product telemetry, execution logs, or source code.
+- **Direct Vendor Routing:** Your API keys are routed strictly to the LLM provider you configure (via `litellm`) and are never intercepted or proxied.
+- **Budget Safe:** Built-in safeguards and configurable depth/breadth parameters ensure the evaluation pipeline generates high coverage without blowing up your token bill.
+- **Open Source Auditing:** The entire orchestration pipeline is open-source, allowing your security team to fully audit the codebase.
+## 🛠 Technology Stack
+- **Language:** Python 3.10+
+- **Data Modeling:** `pydantic`
+- **Math/Geometry:** `numpy`, `scipy` (Sobol sequences, ConvexHull analysis)
+- **LLM Integration:** `litellm` for universal API connectivity (OpenAI, Gemini, Anthropic, or any local vLLM).
+## 📄 License
+This project is licensed under the Apache License, Version 2.0.
+See the [LICENSE](LICENSE) file for more information.

hyperplane_eval-0.1.7/README.md ADDED

@@ -0,0 +1,115 @@
+# Hyperplane Eval
+Hyperplane Eval helps teams discover behavioral failures before deployment.
+It works as a CLI and programmatic tool for evaluating AI agents against business, security, and ethical requirements using intelligent test generation rather than manually written test cases.
+## Why Hyperplane?
+Most AI teams evaluate agents with a few dozen manually written prompts.
+The problem is that manually written tests only cover a tiny fraction of the agent's behavioral space.
+Hyperplane automatically explores that space, generating thousands of semantically diverse inputs to uncover failures before users do.
+## Features
+- Generate thousands of semantically diverse test cases
+- Evaluate business, security, and ethical requirements of your agent
+- Automatically find edge cases and breaking points
+- Local-first CLI workflow
+- Framework-agnostic agent integration
+- Detailed evaluation reports. [See example here](https://n0tsu5.github.io/results-board/)
+## CLI Integration
+Hyperplane operates entirely locally through a terminal-based orchestration wizard.
+### Setup & Installation
+Install the framework via pip:
+```bash
+pip install hyperplane-eval
+```
+### Running the CLI
+Run the interactive CLI directly in your terminal from inside your project directory:
+```bash
+hyperplane
+```
+The CLI wizard will guide you through:
+1. **Target Binding:** Automatically finding your agent's code.
+2. **Constraint Definition:** Setting natural language rules for your agent.
+3. **Configuration:** Setting the scale and depth of the evaluation.
+4. **Execution:** Running the evaluation with a live dashboard.
+Hyperplane outputs a structured dataset and an HTML report identifying the specific types and characteristics of inputs that cause the agent to fail, helping you quickly isolate prompt engineering or logic regressions.
+![Hyperplane Evaluation Dashboard](5cr33n5h0t.png)
+## Programmatic Integration
+You can also integrate Hyperplane Eval directly into your Python codebase or CI/CD testing suites using the `Evaluator` API:
+```python
+from hyperplane import Evaluator
+from litellm import Router
+# 1. Initialize your litellm Router
+router = Router(
+    model_list=[{
+        "model_name": "gpt-4o",
+        "litellm_params": {"model": "gpt-4o"}
+    }]
+)
+# 2. Define your target agent
+def my_agent(prompt: str) -> str:
+    return "I am a safe agent."
+# 3. Initialize the Evaluator
+evaluator = Evaluator(
+    agent_desc="A helpful AI assistant",
+    param_desc={"prompt": "The user input prompt"},
+    target_callable=my_agent,
+    llm_client=router
+)
+# 4. Add constraints and run
+evaluator.run(rules=["Never execute tool calls with unsafe parameters.", "Always respond in English."])
+```
+## Architecture & Methodology
+Evaluating agentic systems presents a curse-of-dimensionality problem due to the infinite input space. Hyperplane mitigates this via a dimensional reduction and bounded sampling approach:
+1. **Orthogonal Dimension Extraction:** The target heuristic or constraint (e.g., a business logic rule) is passed to an LLM, which extracts a set of orthogonal, continuous axes representing the variance in potential inputs (e.g., `user_frustration` [0,1], `budget_constraint` [0,1]).
+2. **Quasi-Random Initialization:** The framework maps a bounded multi-dimensional continuous input space $S = [0, 1]^D$. It utilizes Sobol sequences to generate a low-discrepancy initialization grid to ensure uniform volumetric coverage without clustering.
+3. **Synthetic Input Generation:** Bounded coordinate vectors are mapped back into natural language. A Generator LLM synthesizes adversarial or conversational inputs as structured payloads that strictly adhere to the defined vector coordinates.
+4. **Response Classification:** The target agent executes the synthesized inputs. An Evaluator LLM utilizes Chain-of-Thought (CoT) reasoning to classify the agent's response as a pass (1) or fail (0) against the constraint.
+5. **Surrogate Modeling & Active Search:** The framework fits a Random Forest surrogate classifier over the evaluated points to approximate the failure boundary. It utilizes an active search algorithm to sample points near the decision boundary until volumetric saturation is reached, stopping early via dimension-scaled mismatch rate thresholds.
+The resulting artifact is a detailed report allowing engineers to identify the specific input themes and characteristics that reliably induce constraint violations.
+## Privacy & Security (BYOK)
+Hyperplane Eval is designed for enterprise privacy using a **Bring Your Own Key (BYOK)** architecture:
+- **100% Local Execution:** The orchestrator and test synthesis engine run entirely on your local machine or CI/CD runner.
+- **No Telemetry or Data Logging:** We do not collect product telemetry, execution logs, or source code.
+- **Direct Vendor Routing:** Your API keys are routed strictly to the LLM provider you configure (via `litellm`) and are never intercepted or proxied.
+- **Budget Safe:** Built-in safeguards and configurable depth/breadth parameters ensure the evaluation pipeline generates high coverage without blowing up your token bill.
+- **Open Source Auditing:** The entire orchestration pipeline is open-source, allowing your security team to fully audit the codebase.
+## 🛠 Technology Stack
+- **Language:** Python 3.10+
+- **Data Modeling:** `pydantic`
+- **Math/Geometry:** `numpy`, `scipy` (Sobol sequences, ConvexHull analysis)
+- **LLM Integration:** `litellm` for universal API connectivity (OpenAI, Gemini, Anthropic, or any local vLLM).
+## 📄 License
+This project is licensed under the Apache License, Version 2.0.
+See the [LICENSE](LICENSE) file for more information.
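
Reviewer note: steps 2 and 5 of the README's methodology (Sobol initialization, then a Random Forest surrogate with sampling near the decision boundary) can be sketched with the `scipy` and `scikit-learn` dependencies listed in `Requires-Dist`. This is a minimal sketch under stated assumptions, not the package's implementation: `toy_judge` is a hypothetical stand-in for the target agent plus the Evaluator LLM, and the candidate pool size, the number of rounds, and the selection criterion (surrogate pass probability closest to 0.5) are illustrative choices. The package's own `AdaptiveNavigator` and its stopping rule are not shown in this diff.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.ensemble import RandomForestClassifier


def toy_judge(points: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for agent + Evaluator LLM: 1 = pass, 0 = fail."""
    # The toy agent "fails" when the coordinates are jointly high.
    return (points.sum(axis=1) < 1.2).astype(int)


def boundary_search(dims=2, n_init=64, rounds=3, batch=16, seed=0):
    # Low-discrepancy Sobol initialization over [0, 1]^dims.
    sobol = qmc.Sobol(d=dims, scramble=True, seed=seed)
    X = sobol.random(n_init)
    y = toy_judge(X)

    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        # Fit a surrogate classifier over every point evaluated so far.
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X, y)
        classes = list(model.classes_)
        if len(classes) < 2:
            break  # only one outcome observed, so there is no boundary to refine

        # Pick the candidates the surrogate is least sure about.
        candidates = rng.random((512, dims))
        p_pass = model.predict_proba(candidates)[:, classes.index(1)]
        nearest = np.argsort(np.abs(p_pass - 0.5))[:batch]

        # Evaluate them for real and grow the training set.
        X = np.vstack([X, candidates[nearest]])
        y = np.concatenate([y, toy_judge(candidates[nearest])])
    return X, y
```

Calling `boundary_search()` returns every evaluated coordinate together with its pass/fail label; the points added in later rounds concentrate where the surrogate's prediction is least certain, which is the intent of the "active search" step described above.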

hyperplane_eval-0.1.7/hyperplane/__init__.py ADDED

@@ -0,0 +1,3 @@
+from .evaluator import Evaluator
+__all__ = ["Evaluator"]
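
Reviewer note: the top-level import package is renamed from `hyperplane_eval` (0.1.5) to `hyperplane` (0.1.7), so code that imported 0.1.5 internals will break on upgrade. A caller that has to tolerate both versions could resolve the module path at runtime, roughly as below. The two `EvaluationConfig` module paths are taken from the import lines in this diff; the helper names `find_installed_package` and `load_config_module` are hypothetical and not part of the package.

```python
import importlib
import importlib.util
from typing import Optional, Sequence

# New layout first, then the 0.1.5 layout.
CANDIDATE_PACKAGES = ("hyperplane", "hyperplane_eval")

# Where EvaluationConfig lives in each layout, per the import lines in this diff.
CONFIG_MODULES = {
    "hyperplane": "hyperplane.framework.config",
    "hyperplane_eval": "hyperplane_eval.engine.config",
}


def find_installed_package(candidates: Sequence[str] = CANDIDATE_PACKAGES) -> Optional[str]:
    """Return the first importable top-level package name, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_config_module():
    """Import the config module from whichever layout is installed."""
    package = find_installed_package()
    if package is None:
        raise ImportError("neither 'hyperplane' nor 'hyperplane_eval' is installed")
    return importlib.import_module(CONFIG_MODULES[package])
```

Note that `Evaluator` is only exported by the new `hyperplane/__init__.py`; this diff does not show an equivalent entry point in 0.1.5.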

{hyperplane_eval-0.1.5/hyperplane_eval → hyperplane_eval-0.1.7/hyperplane}/cli/app.py RENAMED

@@ -7,11 +7,11 @@ from rich.text import Text
 from rich.panel import Panel
 from typing import Any
-from hyperplane_eval.adapters.llms.llm_client import LLMClient
-from hyperplane_eval.adapters.runners.agent_runner import AgentRunner
-from hyperplane_eval.adapters.local_bindings.executor import execute_temp_runner
-from hyperplane_eval.engine.config import EvaluationConfig
-from hyperplane_eval.engine.orchestrator import PipelineOrchestrator
+from hyperplane.cli.llms.llm_client import LLMClient
+from hyperplane.cli.runners.agent_runner import AgentRunner
+from hyperplane.cli.local_bindings.executor import execute_temp_runner
+from hyperplane.framework.config import EvaluationConfig
+from hyperplane.framework.orchestrator import PipelineOrchestrator
 LOGO = """
@@ -148,7 +148,7 @@ class VerifyApp:
             )
             use_existing = await questionary.confirm("Use this target?").ask_async()
             if use_existing:
-                from hyperplane_eval.adapters.local_bindings.scanner import extract_functions
+                from hyperplane.cli.local_bindings.scanner import extract_functions
                 funcs = extract_functions(self.config["file"])
                 selected_func = next(
@@ -187,7 +187,7 @@ class VerifyApp:
             return None, None, None, []
         self.console.print("[cyan]Scanning for functions...[/cyan]")
-        from hyperplane_eval.adapters.local_bindings.scanner import extract_functions
+        from hyperplane.cli.local_bindings.scanner import extract_functions
         funcs = extract_functions(target_path)
         if not funcs:

{hyperplane_eval-0.1.5/hyperplane_eval/adapters → hyperplane_eval-0.1.7/hyperplane/cli}/llms/llm_client.py RENAMED

@@ -4,7 +4,7 @@ import re
 import asyncio
 from typing import Any, Dict
 from litellm import acompletion
-from hyperplane_eval.engine.prompt_loader import load_prompt
+from hyperplane.prompts.prompt_loader import load_prompt
 class LLMClient:
@@ -41,7 +41,7 @@ class LLMClient:
         temperature: float,
     ) -> str:
         schema_str = json.dumps(response_schema, indent=2)
-        prompt += "\n\n" + load_prompt("adapters/llm/schema_prompt", schema=schema_str)
+        prompt += "\n\n" + load_prompt("llms/schema_prompt", schema=schema_str)
         kwargs = {
             "model": self.model,  # Force using the user-selected model

{hyperplane_eval-0.1.5/hyperplane_eval/adapters → hyperplane_eval-0.1.7/hyperplane/cli}/local_bindings/executor.py RENAMED

@@ -17,7 +17,7 @@ async def execute_temp_runner(target_path: str, selected_func: dict, params: dic
 import sys, json, asyncio, inspect, importlib
 sys.path.insert(0, r"{target_dir}")
 try:
-    target_func = getattr(importlib.import_module("{module_name}"), "{selected_func['name']}")
+    target_func = getattr(importlib.import_module("{module_name}"), "{selected_func["name"]}")
 except Exception as e:
     print("VERIFY_RUN_ERROR:Load fail: " + str(e))
     sys.exit(1)
@@ -50,8 +50,8 @@ async function main() {{
     }} catch(e) {{
         mod = require(moduleName);
     }}
-    const func = mod.{selected_func['name']};
-    if (!func) throw new Error("Function {selected_func['name']} not found in module.");
+    const func = mod.{selected_func["name"]};
+    if (!func) throw new Error("Function {selected_func["name"]} not found in module.");
     const params = JSON.parse(process.argv[1]);
     const funcParams = {params_array_str};

hyperplane_eval-0.1.7/hyperplane/evaluator.py ADDED

@@ -0,0 +1,171 @@
+import asyncio
+import inspect
+import json
+from typing import Callable
+import litellm
+from hyperplane.framework.config import EvaluationConfig
+from hyperplane.framework.orchestrator import PipelineOrchestrator
+from hyperplane.cli.runners.agent_runner import AgentRunner
+from hyperplane.cli.llms.llm_client import LLMClient
+class Evaluator:
+    """
+    Programmatic entry point for the Hyperplane Eval framework.
+    """
+    def __init__(
+        self,
+        agent_desc: str,
+        param_desc: dict,
+        target_callable: Callable,
+        llm_client: litellm.Router,
+    ):
+        """
+        Initialize the Evaluator.
+        Args:
+            agent_desc: A description of what the agent does.
+            param_desc: A map containing descriptions of the input parameters the agent takes (e.g. {"input": "User query"}).
+            target_callable: A synchronous or asynchronous Python callable (the agent).
+            llm_client: A pre-configured litellm.Router instance.
+        """
+        self.agent_desc = agent_desc
+        self.param_desc = param_desc
+        self.target_callable = target_callable
+        self.llm_client = llm_client
+    def run(
+        self,
+        rules: list[str],
+        depth: int = 50,
+        breadth: int = 3,
+        adversarial_testing: bool = False,
+        conversational_testing: bool = False,
+    ):
+        """
+        Runs the evaluation pipeline.
+        """
+        # 1. Build schema by inspecting the callable
+        try:
+            sig = inspect.signature(self.target_callable)
+            param_names = list(sig.parameters.keys())
+        except Exception:
+            param_names = []
+        if not param_names and isinstance(self.param_desc, dict) and self.param_desc:
+            param_names = list(self.param_desc.keys())
+        elif not param_names:
+            param_names = ["input"]
+        schema = []
+        for p_name in param_names:
+            if isinstance(self.param_desc, dict):
+                desc = self.param_desc.get(p_name, "")
+                if isinstance(desc, dict):
+                    p_type = desc.get("type", "str")
+                    p_desc = desc.get("description", "")
+                else:
+                    p_type = "str"
+                    p_desc = str(desc)
+            else:
+                p_type = "str"
+                p_desc = str(self.param_desc)
+            schema.append({"name": p_name, "type": p_type, "description": p_desc})
+        # 2. Try to extract source code for context
+        try:
+            code = inspect.getsource(self.target_callable)
+        except Exception:
+            code = "Code unavailable"
+        # 3. Resolve LLM configuration
+        class RouterWrapper(LLMClient):
+            def __init__(self, router):
+                self.router = router
+                self.model = "router"
+                self._semaphore = asyncio.Semaphore(10)
+            async def generate(
+                self, prompt: str, response_schema: dict, temperature: float
+            ) -> str:
+                from hyperplane.prompts.prompt_loader import load_prompt
+                schema_str = json.dumps(response_schema, indent=2)
+                prompt += "\n\n" + load_prompt("llms/schema_prompt", schema=schema_str)
+                kwargs = {
+                    "model": "",
+                    "messages": [{"role": "user", "content": prompt}],
+                    "temperature": temperature,
+                    "response_format": {"type": "json_object"},
+                }
+                try:
+                    response = await self.router.acompletion(**kwargs)
+                    return response.choices[0].message.content
+                except Exception as e:
+                    raise RuntimeError(f"LLM Server Error: {e}")
+        llm_client_resolved = RouterWrapper(self.llm_client)
+        # 4. Setup Custom Execution Environment
+        selected_func = {
+            "name": getattr(self.target_callable, "__name__", "target_callable"),
+            "code": code,
+            "params": schema,
+        }
+        async def executor_func(target_path, func_meta, params):
+            try:
+                if asyncio.iscoroutinefunction(self.target_callable):
+                    res = await self.target_callable(**params)
+                else:
+                    res = self.target_callable(**params)
+                if not isinstance(res, str):
+                    res_str = json.dumps(res)
+                else:
+                    res_str = res
+                return {"successVal": res_str}
+            except Exception as e:
+                return {"errorVal": str(e)}
+        runner = AgentRunner(
+            executor_func=executor_func,
+            target_path="programmatic_execution",
+            selected_func=selected_func,
+        )
+        # 5. Build Config
+        eval_config = EvaluationConfig(
+            rules=rules,
+            runner=runner,
+            generator_target_schema=schema,
+            generator_target_code=code,
+            llm_client=llm_client_resolved,
+            depth=depth,
+            breadth=breadth,
+            adversarial_testing=adversarial_testing,
+            conversational_testing=conversational_testing,
+            agent_description=self.agent_desc,
+        )
+        # 6. Execute Pipeline
+        orchestrator = PipelineOrchestrator(eval_config)
+        try:
+            loop = asyncio.get_running_loop()
+        except RuntimeError:
+            loop = None
+        if loop and loop.is_running():
+            import warnings
+            warnings.warn(
+                "Evaluator.run() called from a running event loop. Returning a coroutine instead. Please await it."
+            )
+            return orchestrator.run()
+        return asyncio.run(orchestrator.run())
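
Reviewer note: the last block of `Evaluator.run()` has two return types. From plain synchronous code it blocks and returns the pipeline result; from inside a running event loop (for example a notebook or an async test) it emits a warning and returns an un-awaited coroutine. The pattern, and a caller that handles both cases, can be sketched in isolation as follows. `pipeline`, `run`, and `caller` are hypothetical stand-ins for illustration; only the loop-detection logic mirrors the added code.

```python
import asyncio
import warnings


async def pipeline() -> str:
    """Hypothetical stand-in for PipelineOrchestrator.run()."""
    await asyncio.sleep(0)
    return "report"


def run():
    """Return a result when called synchronously, or a coroutine inside a running loop."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None  # no loop is running in this thread
    if loop and loop.is_running():
        warnings.warn("run() called from a running event loop; await the returned coroutine.")
        return pipeline()
    return asyncio.run(pipeline())


async def caller() -> str:
    """Async caller that works whichever type run() hands back."""
    result = run()
    if asyncio.iscoroutine(result):
        result = await result
    return result
```

Callers already inside an event loop therefore need to `await` the value returned by `run()`; forgetting to do so leaves the evaluation unexecuted.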

{hyperplane_eval-0.1.5/hyperplane_eval/engine → hyperplane_eval-0.1.7/hyperplane/framework}/config.py RENAMED

@@ -1,7 +1,7 @@
 from dataclasses import dataclass
 from typing import Any, List, Dict
-from hyperplane_eval.adapters.runners.agent_runner import AgentRunner
+from hyperplane.cli.runners.agent_runner import AgentRunner
 @dataclass

hyperplane_eval-0.1.7/hyperplane/framework/domain/dimensions/__init__.py ADDED

@@ -0,0 +1,3 @@
+from .prompt_feature import PromptFeature
+__all__ = ["PromptFeature"]

{hyperplane_eval-0.1.5/hyperplane_eval/engine → hyperplane_eval-0.1.7/hyperplane/framework}/input_space/input_space.py RENAMED

@@ -2,8 +2,8 @@ import json
 from pathlib import Path
 from typing import List
 from scipy.stats import qmc
-from hyperplane_eval.engine.domain.vectors import ScenarioVector, EvaluatedVector
-from hyperplane_eval.engine.domain.dimensions import PromptFeature
+from hyperplane.framework.domain.vectors import ScenarioVector, EvaluatedVector
+from hyperplane.framework.domain.dimensions.prompt_feature import PromptFeature
 class InputSpace:

{hyperplane_eval-0.1.5/hyperplane_eval/engine → hyperplane_eval-0.1.7/hyperplane/framework}/input_space/input_space_factory.py RENAMED

@@ -3,10 +3,10 @@ import json
 import random
 import os
 from typing import List
-from hyperplane_eval.engine.domain.dimensions import PromptFeature
-from hyperplane_eval.adapters.llms.llm_client import LLMClient
-from hyperplane_eval.engine.prompt_loader import load_prompt
-from hyperplane_eval.engine.input_space.input_space import InputSpace
+from hyperplane.framework.domain.dimensions.prompt_feature import PromptFeature
+from hyperplane.cli.llms.llm_client import LLMClient
+from hyperplane.prompts.prompt_loader import load_prompt
+from hyperplane.framework.input_space.input_space import InputSpace
 class InputSpaceFactory:
@@ -112,7 +112,10 @@ class InputSpaceFactory:
                 if plane:
                     hyperplanes.append(plane)
-        target_planes = self.BREADTH_MAP.get(breadth, self.DEFAULT_TARGET_PLANES)
+        if isinstance(breadth, int):
+            target_planes = breadth
+        else:
+            target_planes = self.BREADTH_MAP.get(breadth, self.DEFAULT_TARGET_PLANES)
         return hyperplanes[:target_planes]

{hyperplane_eval-0.1.5/hyperplane_eval/engine → hyperplane_eval-0.1.7/hyperplane/framework}/orchestrator.py RENAMED

@@ -4,13 +4,13 @@ import signal
 from pathlib import Path
 from typing import Dict, Any
-from hyperplane_eval.engine.input_space.input_space import InputSpace
-from hyperplane_eval.adapters.llms.llm_client import LLMClient
-from hyperplane_eval.engine.stages.generator import SyntheticInputGenerator
-from hyperplane_eval.engine.stages.evaluator import AgentOutputEvaluator
-from hyperplane_eval.engine.input_space.input_space_factory import InputSpaceFactory
-from hyperplane_eval.engine.config import EvaluationConfig
-from hyperplane_eval.reporting.analyser import ResultsAnalyser
+from hyperplane.framework.input_space.input_space import InputSpace
+from hyperplane.cli.llms.llm_client import LLMClient
+from hyperplane.framework.stages.generator import SyntheticInputGenerator
+from hyperplane.framework.stages.evaluator import AgentOutputEvaluator
+from hyperplane.framework.input_space.input_space_factory import InputSpaceFactory
+from hyperplane.framework.config import EvaluationConfig
+from hyperplane.framework.reporting.analyser import ResultsAnalyser
 from .plane_evaluator import PlaneEvaluator

{hyperplane_eval-0.1.5/hyperplane_eval/engine → hyperplane_eval-0.1.7/hyperplane/framework}/plane_evaluator.py RENAMED

@@ -2,17 +2,17 @@ from pathlib import Path
 import asyncio
 import sys
-from hyperplane_eval.engine.domain.vectors import (
+from hyperplane.framework.domain.vectors import (
     ScenarioVector,
     SynthesizedVector,
     ExecutedVector,
 )
-from hyperplane_eval.engine.domain.dimensions import PromptFeature
-from hyperplane_eval.engine.input_space.input_space import InputSpace
-from hyperplane_eval.engine.stages.generator import SyntheticInputGenerator
-from hyperplane_eval.engine.stages.evaluator import AgentOutputEvaluator
-from hyperplane_eval.engine.stages.navigator import AdaptiveNavigator
-from hyperplane_eval.adapters.runners.agent_runner import AgentRunner
+from hyperplane.framework.domain.dimensions.prompt_feature import PromptFeature
+from hyperplane.framework.input_space.input_space import InputSpace
+from hyperplane.framework.stages.generator import SyntheticInputGenerator
+from hyperplane.framework.stages.evaluator import AgentOutputEvaluator
+from hyperplane.framework.stages.navigator import AdaptiveNavigator
+from hyperplane.cli.runners.agent_runner import AgentRunner
 from typing import Any
@@ -177,14 +177,17 @@ class PlaneEvaluator:
         stop_event: asyncio.Event,
     ) -> InputSpace:
         """Evaluates a single hyperplane of prompt features."""
-        from hyperplane_eval.cli.app import VerifyApp
+        from hyperplane.cli.app import VerifyApp
         state_file = str(
             res_path / f"input_space_state_rule_{rule_idx}_plane_{plane_idx}.json"
         )
         plane_input_space = InputSpace(features=plane_features, state_path=state_file)
         unique_dims = len(set(f.name for f in plane_features))
-        multiplier = cls.DEPTH_MAP.get(depth, cls.DEFAULT_MULTIPLIER)
+        if isinstance(depth, int):
+            multiplier = depth
+        else:
+            multiplier = cls.DEPTH_MAP.get(depth, cls.DEFAULT_MULTIPLIER)
         scenarios_per_plane = unique_dims * multiplier
         navigator = AdaptiveNavigator(plane_input_space)
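
Reviewer note: this hunk and the matching `breadth` change in `input_space_factory.py` apply the same rule: an integer is used as given, and any other value falls back to the preset lookup. The effect on the per-plane scenario budget can be sketched as below. The preset names and numbers are hypothetical, since the real `DEPTH_MAP` and `DEFAULT_MULTIPLIER` values are not shown in this diff; `resolve_level` and `scenarios_per_plane` are illustrative helpers, not package functions.

```python
from typing import Mapping, Union

# Hypothetical presets: the package's actual DEPTH_MAP contents are not in this diff.
DEPTH_PRESETS = {"quick": 10, "standard": 25, "deep": 50}
DEFAULT_MULTIPLIER = 25


def resolve_level(value: Union[int, str], presets: Mapping[str, int], default: int) -> int:
    """Integers pass through unchanged; anything else is looked up in the presets."""
    if isinstance(value, int):
        return value
    return presets.get(value, default)


def scenarios_per_plane(feature_names: list, depth: Union[int, str]) -> int:
    """Budget for one plane: distinct feature names times the depth multiplier."""
    unique_dims = len(set(feature_names))
    return unique_dims * resolve_level(depth, DEPTH_PRESETS, DEFAULT_MULTIPLIER)
```

With the `depth=50` default from `Evaluator.run()`, a plane with four distinct features would be budgeted 4 × 50 = 200 scenarios, assuming the integer reaches `PlaneEvaluator` unchanged. One side effect of an `isinstance(..., int)` check is that Python booleans also pass it, so `depth=True` would be treated as a multiplier of 1.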
