PyPI - fluxloop-cli - Versions diffs - 0.2.19__tar.gz → 0.2.36__tar.gz - Mend

fluxloop-cli 0.2.19__tar.gz → 0.2.36__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (83)

{fluxloop_cli-0.2.19 → fluxloop_cli-0.2.36}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: fluxloop-cli
-Version: 0.2.19
+Version: 0.2.36
 Summary: FluxLoop CLI for running agent simulations
 Author-email: FluxLoop Team <team@fluxloop.dev>
 License: Apache-2.0
@@ -26,6 +26,8 @@ Requires-Dist: httpx>=0.24.0
 Requires-Dist: rich>=13.0
 Requires-Dist: python-dotenv>=1.0.0
 Requires-Dist: fluxloop>=0.1.0
+Requires-Dist: ruamel.yaml>=0.17.0
+Requires-Dist: Jinja2>=3.0
 Provides-Extra: dev
 Requires-Dist: pytest>=7.0; extra == "dev"
 Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
@@ -65,14 +67,49 @@ The legacy `setting.yaml` is still supported, but new projects created with
 - `fluxloop init project` – scaffold a new project (configs, `.env`, examples)
 - `fluxloop generate inputs` – produce input variations for the active project
 - `fluxloop run experiment` – execute an experiment using `configs/simulation.yaml`
-- `fluxloop parse experiment` – convert experiment outputs into readable artifacts
-- `fluxloop evaluate experiment` – score experiment outputs using rule-based and LLM evaluators, generate reports with success criteria, analysis, and customizable templates
+- `fluxloop parse experiment` – convert experiment outputs into readable artifacts and emit structured per-trace JSON at `per_trace_analysis/per_trace.jsonl`
+- `fluxloop evaluate experiment` – run the LLM-driven evaluation pipeline (LLM-PT → rule aggregation → LLM-OV → HTML render). Requires the parsed per-trace file (or `--per-trace`) and writes an interactive report to `evaluation_report/report.html` by default.
 - `fluxloop config set-llm` – update LLM provider/model in `configs/input.yaml`
 - `fluxloop record enable|disable|status` – toggle recording mode across `.env` and simulation config
 - `fluxloop doctor` – summarize Python, FluxLoop CLI/MCP, and MCP index state for the active environment
+- `--yes/-y` (for `fluxloop run experiment`) – skip the interactive confirmation prompt, ideal for CI and the Pytest bridge
+### Multi-turn supervisor options
+`fluxloop run experiment` supports multi-turn orchestration out of the box:
+- Toggle with `--multi-turn/--no-multi-turn`
+- Limit depth via `--max-turns`
+- Control tool approvals with `--auto-approve-tools/--manual-approve-tools`
+- Override the supervisor persona target: `--persona-override`
+- Point at a specific LLM: `--supervisor-provider`, `--supervisor-model`, `--supervisor-temperature`, `--supervisor-api-key`
+These flags override the values in `configs/simulation.yaml` (`multi_turn` block). When enabled, the runner consults the supervisor after every turn to decide whether to continue and to synthesize the next realistic user message.
+**Scripted Playback Mode**: For deterministic multi-turn scenarios, switch `supervisor.provider` to `mock` and populate `supervisor.metadata.scripted_questions` with a list of user messages. FluxLoop will replay them sequentially and terminate when the script ends—ideal for regression testing and demos.
 Run `fluxloop --help` or `fluxloop <command> --help` for more detail.
+## Pytest Bridge (0.2.29+)
+- `fluxloop init pytest-template [project_root]` creates `tests/test_fluxloop_smoke.py`, already wired to the new `fluxloop_runner` fixture.
+- Fixtures live in `fluxloop_cli.testing.pytest_plugin` and return a `FluxLoopTestResult`, so you can assert on `total_runs`, `success_rate`, or call `require_success()`.
+- Full guide + CI example: see `docs/guides/pytest_bridge.md` (includes GitHub Actions workflow at `examples/ci/fluxloop_pytest.yml`).
+- Typical workflow:
+  1. `pip install -e packages/cli[dev]`
+  2. `fluxloop init pytest-template .`
+  3. `pytest -k fluxloop_smoke --maxfail=1`
+## Evaluation Workflow
+Evaluation now follows a two-step process so that multi-turn context is preserved:
+1. `fluxloop run experiment` – produce `trace_summary.jsonl` (and optionally `observations.jsonl`).
+2. `fluxloop parse experiment <experiment_dir>` – generate markdown summaries and a structured artifact at `per_trace_analysis/per_trace.jsonl`.
+3. `fluxloop evaluate experiment <experiment_dir>` – consume that structured file, run LLM-based per-trace + overall analysis, and emit an interactive dashboard at `<experiment_dir>/evaluation_report/report.html` (override with `--output`).
+`fluxloop evaluate` exits early with guidance when the per-trace artifact is missing. If you relocate the file, supply an explicit path with `--per-trace /path/to/per_trace.jsonl`.
 ## Quick Setup Script
 To prepare a fresh checkout (create `.venv`, install dependencies, and run diagnostics):

fluxloop_cli-0.2.36/README.md ADDED

@@ -0,0 +1,116 @@
+# FluxLoop CLI
+Command-line interface for running agent simulations.
+## Installation
+```
+pip install fluxloop-cli
+```
+## Configuration Overview (v0.2.0)
+FluxLoop CLI now stores experiment settings in four files under `configs/`:
+- `configs/project.yaml` – project metadata, collector defaults
+- `configs/input.yaml` – personas, base inputs, input generation options
+- `configs/simulation.yaml` – runtime parameters (iterations, runner, replay args)
+- `configs/evaluation.yaml` – evaluator definitions (rule-based, LLM judge, etc.)
+The legacy `setting.yaml` is still supported, but new projects created with
+`fluxloop init project` will generate the structured layout above.
+## Key Commands
+- `fluxloop init project` – scaffold a new project (configs, `.env`, examples)
+- `fluxloop generate inputs` – produce input variations for the active project
+- `fluxloop run experiment` – execute an experiment using `configs/simulation.yaml`
+- `fluxloop parse experiment` – convert experiment outputs into readable artifacts and emit structured per-trace JSON at `per_trace_analysis/per_trace.jsonl`
+- `fluxloop evaluate experiment` – run the LLM-driven evaluation pipeline (LLM-PT → rule aggregation → LLM-OV → HTML render). Requires the parsed per-trace file (or `--per-trace`) and writes an interactive report to `evaluation_report/report.html` by default.
+- `fluxloop config set-llm` – update LLM provider/model in `configs/input.yaml`
+- `fluxloop record enable|disable|status` – toggle recording mode across `.env` and simulation config
+- `fluxloop doctor` – summarize Python, FluxLoop CLI/MCP, and MCP index state for the active environment
+- `--yes/-y` (for `fluxloop run experiment`) – skip the interactive confirmation prompt, ideal for CI and the Pytest bridge
+### Multi-turn supervisor options
+`fluxloop run experiment` supports multi-turn orchestration out of the box:
+- Toggle with `--multi-turn/--no-multi-turn`
+- Limit depth via `--max-turns`
+- Control tool approvals with `--auto-approve-tools/--manual-approve-tools`
+- Override the supervisor persona target: `--persona-override`
+- Point at a specific LLM: `--supervisor-provider`, `--supervisor-model`, `--supervisor-temperature`, `--supervisor-api-key`
+These flags override the values in `configs/simulation.yaml` (`multi_turn` block). When enabled, the runner consults the supervisor after every turn to decide whether to continue and to synthesize the next realistic user message.
+**Scripted Playback Mode**: For deterministic multi-turn scenarios, switch `supervisor.provider` to `mock` and populate `supervisor.metadata.scripted_questions` with a list of user messages. FluxLoop will replay them sequentially and terminate when the script ends—ideal for regression testing and demos.
+Run `fluxloop --help` or `fluxloop <command> --help` for more detail.
+## Pytest Bridge (0.2.29+)
+- `fluxloop init pytest-template [project_root]` creates `tests/test_fluxloop_smoke.py`, already wired to the new `fluxloop_runner` fixture.
+- Fixtures live in `fluxloop_cli.testing.pytest_plugin` and return a `FluxLoopTestResult`, so you can assert on `total_runs`, `success_rate`, or call `require_success()`.
+- Full guide + CI example: see `docs/guides/pytest_bridge.md` (includes GitHub Actions workflow at `examples/ci/fluxloop_pytest.yml`).
+- Typical workflow:
+  1. `pip install -e packages/cli[dev]`
+  2. `fluxloop init pytest-template .`
+  3. `pytest -k fluxloop_smoke --maxfail=1`
+## Evaluation Workflow
+Evaluation now follows a two-step process so that multi-turn context is preserved:
+1. `fluxloop run experiment` – produce `trace_summary.jsonl` (and optionally `observations.jsonl`).
+2. `fluxloop parse experiment <experiment_dir>` – generate markdown summaries and a structured artifact at `per_trace_analysis/per_trace.jsonl`.
+3. `fluxloop evaluate experiment <experiment_dir>` – consume that structured file, run LLM-based per-trace + overall analysis, and emit an interactive dashboard at `<experiment_dir>/evaluation_report/report.html` (override with `--output`).
+`fluxloop evaluate` exits early with guidance when the per-trace artifact is missing. If you relocate the file, supply an explicit path with `--per-trace /path/to/per_trace.jsonl`.
+## Quick Setup Script
+To prepare a fresh checkout (create `.venv`, install dependencies, and run diagnostics):
+```
+bash scripts/setup_fluxloop_env.sh --target-source-root path/to/your/source
+```
+Options:
+- `--python PATH` – choose a specific interpreter (default `python3`)
+- `--target-source-root PATH` – pre-populate VSCode `fluxloop.targetSourceRoot`
+- `--skip-doctor` – skip the final `fluxloop doctor` check
+After running the script, open the folder in VSCode and use `FluxLoop: Show Environment Info`
+or `FluxLoop: Run Doctor` to confirm the environment.
+## Runner Integration Patterns
+Configure how FluxLoop calls your code in `configs/simulation.yaml`:
+- Module + function: `module_path`/`function_name` or `target: "module:function"`
+- Class.method (zero-arg ctor): `target: "module:Class.method"`
+- Module-scoped instance method: `target: "module:instance.method"`
+- Class.method with factory: add `factory: "module:make_instance"` (+ `factory_kwargs`)
+- Async generators: set `runner.stream_output_path` if your streamed event shape differs (default `message.delta`).
+See full examples: `packages/website/docs-cli/configuration/runner-targets.md`.
+## Developing
+Install dependencies and run tests:
+```
+python -m venv .venv
+source .venv/bin/activate
+pip install -e .[dev]
+pytest
+```
+To package the CLI:
+```
+./build.sh
+```
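
Reviewer note on the README's "Evaluation Workflow" section: the run, parse, and evaluate steps can be chained from a script. The sketch below only assembles the command lines quoted in the README; the helper name `build_workflow_commands` and the example directory `experiments/latest` are hypothetical, and nothing here is part of the package.

```python
from pathlib import Path
from typing import List, Optional


def build_workflow_commands(
    experiment_dir: Path,
    per_trace: Optional[Path] = None,
) -> List[List[str]]:
    """Return the run -> parse -> evaluate commands in execution order."""
    evaluate = ["fluxloop", "evaluate", "experiment", str(experiment_dir)]
    if per_trace is not None:
        # Only needed when per_trace.jsonl was moved from its default location.
        evaluate += ["--per-trace", str(per_trace)]
    return [
        # --yes skips the interactive confirmation prompt (useful in CI).
        ["fluxloop", "run", "experiment", "--yes"],
        ["fluxloop", "parse", "experiment", str(experiment_dir)],
        evaluate,
    ]


if __name__ == "__main__":
    # "experiments/latest" is a placeholder path, not a FluxLoop convention.
    for command in build_workflow_commands(Path("experiments/latest")):
        print(" ".join(command))
```

Each inner list can be passed to `subprocess.run(..., check=True)` so that a failing step stops the chain before `evaluate` looks for the per-trace file.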

{fluxloop_cli-0.2.19 → fluxloop_cli-0.2.36}/fluxloop_cli/__init__.py RENAMED

@@ -2,7 +2,7 @@
 FluxLoop CLI - Command-line interface for running agent simulations.
 """
-__version__ = "0.2.19"
+__version__ = "0.2.36"
 from .main import app

{fluxloop_cli-0.2.19 → fluxloop_cli-0.2.36}/fluxloop_cli/arg_binder.py RENAMED

@@ -5,7 +5,9 @@ from __future__ import annotations
 import inspect
 import json
 from pathlib import Path
-from typing import Any, Callable, Dict, Optional
+from typing import Any, Callable, Dict, Optional, Sequence
+from fluxloop.schemas import ExperimentConfig, ReplayArgsConfig, PersonaConfig
 class _AttrDict(dict):
@@ -26,8 +28,6 @@ class _AttrDict(dict):
         except KeyError as exc:  # pragma: no cover
             raise AttributeError(item) from exc
-from fluxloop.schemas import ExperimentConfig, ReplayArgsConfig
 class _AwaitableNone:
     """Simple awaitable that resolves to ``None``."""
@@ -98,9 +98,17 @@ class ArgBinder:
         *,
         runtime_input: str,
         iteration: int = 0,
+        conversation_state: Optional[Dict[str, Any]] = None,
+        persona: Optional[PersonaConfig] = None,
+        auto_approve: Optional[bool] = None,
     ) -> Dict[str, Any]:
         """Construct kwargs for calling *func* based on replay or inspection."""
+        signature = inspect.signature(func)
+        parameters = list(signature.parameters.values())
+        if parameters and parameters[0].name == "self":
+            parameters = parameters[1:]
         if self._recording:
             kwargs = self._recording.get("kwargs", {}).copy()
@@ -111,22 +119,42 @@ class ArgBinder:
                 try:
                     self._set_by_path(kwargs, replay.override_param_path, runtime_input)
                 except (KeyError, TypeError):
-                    # If path missing, fall back to plain binding
-                    return self._bind_by_signature(func, runtime_input)
+                    kwargs = self._bind_runtime_input(parameters, runtime_input)
+            else:
+                fallback = self._bind_runtime_input(parameters, runtime_input)
+                for key, value in fallback.items():
+                    kwargs.setdefault(key, value)
             self._restore_callables(kwargs, replay)
             self._ensure_no_unmapped_callables(kwargs, replay)
-            return self._hydrate_structures(kwargs)
-        return self._bind_by_signature(func, runtime_input)
-    def _bind_by_signature(self, func: Callable, runtime_input: str) -> Dict[str, Any]:
-        signature = inspect.signature(func)
-        parameters = list(signature.parameters.values())
+            kwargs = self._hydrate_structures(kwargs)
+        else:
+            kwargs = self._bind_runtime_input(parameters, runtime_input)
+        return self._inject_optional_kwargs(
+            parameters=parameters,
+            kwargs=kwargs,
+            conversation_state=conversation_state,
+            persona=persona,
+            auto_approve=auto_approve,
+            iteration=iteration,
+        )
-        if parameters and parameters[0].name == "self":
-            parameters = parameters[1:]
+    def _bind_runtime_input(
+        self, parameters: Sequence[inspect.Parameter], runtime_input: str
+    ) -> Dict[str, Any]:
+        candidate = self._find_runtime_parameter(parameters)
+        if candidate:
+            return {candidate: runtime_input}
+        if parameters:
+            return {parameters[0].name: runtime_input}
+        raise ValueError(
+            "Cannot determine where to bind runtime input for the provided function."
+        )
+    @staticmethod
+    def _find_runtime_parameter(
+        parameters: Sequence[inspect.Parameter],
+    ) -> Optional[str]:
         candidate_names = [
             "input",
             "input_text",
@@ -134,18 +162,70 @@ class ArgBinder:
             "query",
             "text",
             "content",
+            "user_message",
         ]
+        for name in candidate_names:
+            for param in parameters:
+                if param.name == name:
+                    return name
+        return None
-        for param in parameters:
-            if param.name in candidate_names:
-                return {param.name: runtime_input}
+    def _inject_optional_kwargs(
+        self,
+        *,
+        parameters: Sequence[inspect.Parameter],
+        kwargs: Dict[str, Any],
+        conversation_state: Optional[Dict[str, Any]],
+        persona: Optional[PersonaConfig],
+        auto_approve: Optional[bool],
+        iteration: Optional[int],
+    ) -> Dict[str, Any]:
+        param_names = {param.name for param in parameters}
+        def assign(value: Any, candidates: Sequence[str]) -> bool:
+            if value is None:
+                return False
+            for name in candidates:
+                if name in param_names and name not in kwargs:
+                    kwargs[name] = value
+                    return True
+            return False
+        if conversation_state is not None:
+            assign(conversation_state, ["conversation_state", "state", "dialog_state"])
+            if isinstance(conversation_state, dict):
+                metadata = conversation_state.get("metadata")
+                if metadata:
+                    assign(
+                        metadata,
+                        ["conversation_metadata", "state_metadata", "conversation_meta"],
+                    )
+                turns = conversation_state.get("turns")
+                if turns:
+                    assign(turns, ["messages", "history", "turns"])
+        if persona is not None:
+            assign(persona, ["persona", "user_persona", "persona_config"])
+            try:
+                persona_prompt = persona.to_prompt()
+            except Exception:
+                persona_prompt = None
+            if persona_prompt:
+                assign(
+                    persona_prompt,
+                    ["persona_prompt", "persona_description", "persona_text"],
+                )
+        if auto_approve is not None:
+            assign(
+                auto_approve,
+                ["auto_approve", "auto_approve_tools", "approve_tools", "autoapprove"],
+            )
-        if parameters:
-            return {parameters[0].name: runtime_input}
+        if iteration is not None:
+            assign(iteration, ["iteration", "run_iteration", "loop_index"])
-        raise ValueError(
-            f"Cannot determine where to bind runtime input for function '{func.__name__}'."
-        )
+        return kwargs
     def _restore_callables(self, kwargs: Dict[str, Any], replay: ReplayArgsConfig) -> None:
         for param_name, provider in replay.callable_providers.items():
@@ -198,7 +278,6 @@ class ArgBinder:
             def _record(args: Any, kwargs: Any) -> None:
                 messages.append((args, kwargs))
-                pretty = args[0] if len(args) == 1 and not kwargs else {"args": args, "kwargs": kwargs}
             def send(*args: Any, **kwargs: Any) -> _AwaitableNone:
                 _record(args, kwargs)

{fluxloop_cli-0.2.19 → fluxloop_cli-0.2.36}/fluxloop_cli/commands/config.py RENAMED

@@ -14,7 +14,6 @@ from rich.syntax import Syntax
 from rich.table import Table
 from ..config_loader import load_experiment_config
-from ..templates import create_env_file, create_gitignore, create_sample_agent
 from ..constants import DEFAULT_CONFIG_PATH, DEFAULT_ROOT_DIR_NAME
 from ..config_schema import CONFIG_SECTION_FILENAMES
 from ..project_paths import (

fluxloop_cli-0.2.36/fluxloop_cli/commands/evaluate.py ADDED

@@ -0,0 +1,264 @@
+"""Evaluate command for generating interactive reports."""
+from __future__ import annotations
+import asyncio
+import logging
+import os
+import shutil
+from dataclasses import asdict
+from pathlib import Path
+from typing import Dict, List, Optional
+import typer
+import yaml
+from rich.console import Console
+from ..environment import load_env_chain
+from ..evaluation import load_evaluation_config
+from ..evaluation.artifacts import load_per_trace_records, load_trace_summary_records
+from ..evaluation.report.pipeline import ReportPipeline
+console = Console()
+app = typer.Typer(help="Evaluate experiment outputs and generate interactive reports.")
+logger = logging.getLogger(__name__)
+def _load_yaml_file(path: Optional[Path]) -> dict:
+    if not path or not path.exists():
+        return {}
+    with path.open("r", encoding="utf-8") as handle:
+        data = yaml.safe_load(handle) or {}
+    if isinstance(data, dict):
+        return data
+    return {}
+def _resolve_project_root(config_path: Path) -> Path:
+    config_dir = config_path.parent
+    return config_dir.parent if config_dir.name == "configs" else config_dir
+def _find_config_file(config_path: Path, filename: str) -> Optional[Path]:
+    config_dir = config_path.parent
+    project_root = _resolve_project_root(config_path)
+    candidates = [
+        config_dir / filename,
+        project_root / "configs" / filename,
+        project_root / filename,
+    ]
+    for candidate in candidates:
+        if candidate.exists():
+            return candidate
+    return None
+def _prepare_output_directory(path: Path, overwrite: bool) -> None:
+    if path.exists():
+        if not overwrite:
+            raise typer.BadParameter(
+                f"Output directory already exists: {path}. Use --overwrite to replace it."
+            )
+        shutil.rmtree(path)
+    path.mkdir(parents=True, exist_ok=True)
+def _load_generated_inputs_data(input_config: Dict[str, any], project_root: Path) -> Dict[str, any]:
+    """
+    Load generated inputs (variations) from the configured inputs file, if available.
+    """
+    inputs_file_value = input_config.get("inputs_file") or "inputs/generated.yaml"
+    inputs_path = Path(inputs_file_value)
+    if not inputs_path.is_absolute():
+        inputs_path = (project_root / inputs_path).resolve()
+    if not inputs_path.exists():
+        logger.debug("Generated inputs file not found at %s", inputs_path)
+        return {}
+    try:
+        with inputs_path.open("r", encoding="utf-8") as handle:
+            payload = yaml.safe_load(handle) or {}
+    except Exception as exc:  # noqa: BLE001
+        logger.warning("Failed to load generated inputs file %s: %s", inputs_path, exc)
+        return {}
+    inputs_list: List[Dict[str, any]] = []
+    generation_cfg: Dict[str, any] = {}
+    if isinstance(payload, dict):
+        inputs_list = payload.get("inputs") or payload.get("variations") or []
+        generation_cfg = payload.get("generation_config") or {}
+    elif isinstance(payload, list):
+        inputs_list = payload
+    else:
+        logger.debug("Generated inputs file %s did not contain a supported structure", inputs_path)
+        return {}
+    variations: List[Dict[str, str]] = []
+    for entry in inputs_list:
+        if not isinstance(entry, dict):
+            continue
+        text = entry.get("input")
+        if not text:
+            continue
+        metadata = entry.get("metadata") or {}
+        persona = entry.get("persona") or metadata.get("persona")
+        strategy = entry.get("strategy") or metadata.get("variation_strategy") or metadata.get("strategy")
+        variations.append(
+            {
+                "persona": (persona or "unknown").strip(),
+                "strategy": (strategy or "base").strip(),
+                "input": text.strip(),
+            }
+        )
+    generator_model = generation_cfg.get("model") or generation_cfg.get("generator_model")
+    provider = generation_cfg.get("provider")
+    if generator_model and provider and "/" not in str(generator_model):
+        generator_model = f"{provider}/{generator_model}"
+    return {
+        "path": str(inputs_path),
+        "variations": variations,
+        "generator_model": generator_model or input_config.get("input_generation", {})
+        .get("llm", {})
+        .get("model"),
+        "strategies": generation_cfg.get("strategies") or input_config.get("variation_strategies", []),
+    }
+@app.command()
+def experiment(
+    experiment_dir: Path = typer.Argument(
+        ...,
+        help="Path to the experiment output directory",
+        exists=True,
+        dir_okay=True,
+        file_okay=False,
+        resolve_path=True,
+    ),
+    config: Path = typer.Option(
+        Path("configs/evaluation.yaml"),
+        "--config",
+        "-c",
+        help="Path to evaluation configuration file",
+    ),
+    output: Path = typer.Option(
+        Path("evaluation_report"),
+        "--output",
+        "-o",
+        help="Output directory name (relative to the experiment directory)",
+    ),
+    overwrite: bool = typer.Option(
+        False,
+        "--overwrite",
+        help="Overwrite output directory if it already exists",
+    ),
+    llm_api_key: Optional[str] = typer.Option(
+        None,
+        "--llm-api-key",
+        help="LLM API key for report generation (optional)",
+        envvar="FLUXLOOP_LLM_API_KEY",
+    ),
+    per_trace: Optional[Path] = typer.Option(
+        None,
+        "--per-trace",
+        help="Path to structured per-trace JSONL generated by `fluxloop parse`",
+    ),
+    verbose: bool = typer.Option(
+        False,
+        "--verbose",
+        help="Enable verbose logging",
+    ),
+):
+    """
+    Evaluate experiment outputs and generate an interactive HTML report.
+    """
+    logging.basicConfig(
+        level=logging.DEBUG if verbose else logging.INFO,
+        format="%(message)s",
+    )
+    resolved_experiment_dir = experiment_dir.resolve()
+    if not resolved_experiment_dir.is_dir():
+        raise typer.BadParameter(f"Experiment directory not found: {resolved_experiment_dir}")
+    config_path = config.resolve() if config.is_absolute() else (Path.cwd() / config).resolve()
+    project_root = _resolve_project_root(config_path)
+    if per_trace is not None:
+        per_trace_path = per_trace.resolve() if per_trace.is_absolute() else (Path.cwd() / per_trace).resolve()
+    else:
+        per_trace_path = resolved_experiment_dir / "per_trace_analysis" / "per_trace.jsonl"
+    per_trace_records = load_per_trace_records(resolved_experiment_dir, per_trace_path)
+    trace_records = [record.trace for record in per_trace_records]
+    if not trace_records:
+        raise typer.BadParameter("No traces found in per-trace artifacts.")
+    trace_summary_path = resolved_experiment_dir / "trace_summary.jsonl"
+    trace_summaries = load_trace_summary_records(resolved_experiment_dir, trace_summary_path)
+    if not trace_summaries:
+        raise typer.BadParameter("No traces found in trace summary artifacts.")
+    try:
+        evaluation_config = load_evaluation_config(config_path)
+    except FileNotFoundError as exc:
+        raise typer.BadParameter(str(exc)) from exc
+    except Exception as exc:  # noqa: BLE001
+        raise typer.BadParameter(f"Failed to load evaluation config: {exc}") from exc
+    def _log_env_error(path: Path, exc: Exception) -> None:
+        console.log(
+            f"[yellow]Warning:[/yellow] Failed to load environment from {path}: {exc}"
+        )
+    load_env_chain(
+        evaluation_config.get_source_dir(),
+        refresh_config=True,
+        on_error=_log_env_error,
+    )
+    if llm_api_key is None:
+        llm_api_key = os.getenv("FLUXLOOP_LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
+    output_dir = output if output.is_absolute() else (resolved_experiment_dir / output)
+    _prepare_output_directory(output_dir, overwrite)
+    input_config_path = _find_config_file(config_path, "input.yaml")
+    project_config_path = _find_config_file(config_path, "project.yaml")
+    input_config = _load_yaml_file(input_config_path)
+    project_config = _load_yaml_file(project_config_path)
+    generated_inputs = _load_generated_inputs_data(input_config, project_root)
+    config_bundle = {
+        "name": project_config.get("name") or resolved_experiment_dir.name,
+        "evaluation": asdict(evaluation_config),
+        "input": input_config,
+        "generated_inputs": generated_inputs,
+    }
+    pipeline = ReportPipeline(
+        config=config_bundle,
+        output_dir=output_dir,
+        api_key=llm_api_key,
+    )
+    message_lines = [
+        f"📊 Evaluating experiment at [cyan]{resolved_experiment_dir}[/cyan]",
+        f"⚙️  Config: [magenta]{config_path}[/magenta]",
+        f"🧵 Per-trace data: [blue]{per_trace_path}[/blue]",
+        f"📄 Trace summary: [blue]{trace_summary_path}[/blue]",
+        f"📁 Output: [green]{output_dir}[/green]",
+    ]
+    console.print("\n".join(message_lines))
+    artifacts = asyncio.run(pipeline.run(trace_records, trace_summaries))
+    console.print(f"\n✅ Report ready: [bold cyan]{artifacts.html_path}[/bold cyan]")
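
Reviewer note on path handling in `evaluate.py`: a relative `--output` is anchored at the experiment directory, while a relative `--per-trace` is anchored at the current working directory, which is easy to mix up. The sketch below restates those two rules as a pure function. The name `resolve_paths` and the explicit `cwd` argument are hypothetical, and the `.resolve()` calls from the command are omitted so the example does not touch the filesystem.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_paths(
    experiment_dir: Path,
    cwd: Path,
    output: Path = Path("evaluation_report"),
    per_trace: Optional[Path] = None,
) -> Tuple[Path, Path]:
    """Return (output_dir, per_trace_path) following the rules in the diff."""
    # --output: relative values are anchored at the experiment directory.
    output_dir = output if output.is_absolute() else experiment_dir / output

    if per_trace is None:
        # Default location written by `fluxloop parse experiment`.
        per_trace_path = experiment_dir / "per_trace_analysis" / "per_trace.jsonl"
    else:
        # --per-trace: relative values are anchored at the working directory.
        per_trace_path = per_trace if per_trace.is_absolute() else cwd / per_trace
    return output_dir, per_trace_path


if __name__ == "__main__":
    # Both directories are placeholder paths for illustration.
    out_dir, trace_path = resolve_paths(Path("/work/exp"), Path("/work"))
    print(out_dir)
    print(trace_path)
```

Because `_prepare_output_directory` calls `shutil.rmtree` on an existing output directory when `--overwrite` is set, it is worth confirming where an `--output` value resolves before combining it with `--overwrite`.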
