osmosis-ai 0.2.3__tar.gz → 0.2.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of osmosis-ai might be problematic.

Files changed (38)
  1. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/PKG-INFO +46 -35
  2. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/README.md +45 -34
  3. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_services/__init__.py +1 -8
  4. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_services/config.py +21 -18
  5. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_services/dataset.py +28 -82
  6. osmosis_ai-0.2.4/osmosis_ai/cli_services/engine.py +421 -0
  7. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/consts.py +1 -1
  8. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/rubric_eval.py +34 -176
  9. osmosis_ai-0.2.4/osmosis_ai/utils.py +315 -0
  10. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/pyproject.toml +1 -1
  11. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/tests/test_cli.py +43 -25
  12. osmosis_ai-0.2.4/tests/test_cli_services.py +400 -0
  13. osmosis_ai-0.2.4/tests/test_rubric_eval.py +40 -0
  14. osmosis_ai-0.2.3/osmosis_ai/cli_services/engine.py +0 -251
  15. osmosis_ai-0.2.3/osmosis_ai/utils.py +0 -450
  16. osmosis_ai-0.2.3/tests/test_cli_services.py +0 -193
  17. osmosis_ai-0.2.3/tests/test_rubric_eval.py +0 -127
  18. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/LICENSE +0 -0
  19. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/MANIFEST.in +0 -0
  20. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/__init__.py +0 -0
  21. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli.py +0 -0
  22. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_commands.py +0 -0
  23. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_services/errors.py +0 -0
  24. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_services/reporting.py +0 -0
  25. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_services/session.py +0 -0
  26. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_services/shared.py +0 -0
  27. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/providers/__init__.py +0 -0
  28. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/providers/anthropic_provider.py +0 -0
  29. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/providers/base.py +0 -0
  30. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/providers/gemini_provider.py +0 -0
  31. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/providers/openai_family.py +0 -0
  32. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/providers/shared.py +0 -0
  33. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/rubric_types.py +0 -0
  34. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai.egg-info/SOURCES.txt +0 -0
  35. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/pytest.ini +0 -0
  36. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/requirements.txt +0 -0
  37. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/setup.cfg +0 -0
  38. {osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/setup_env.bat +0 -0
{osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/PKG-INFO

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: osmosis-ai
- Version: 0.2.3
+ Version: 0.2.4
  Summary: A Python library for reward function validation with strict type enforcement.
  Author-email: Osmosis AI <jake@osmosis.ai>
  License: MIT License
@@ -81,23 +81,12 @@ score = simple_reward("hello world", "hello world") # Returns 1.0
  ```python
  from osmosis_ai import evaluate_rubric

- messages = [
- {
- "type": "message",
- "role": "user",
- "content": [{"type": "input_text", "text": "What is the capital of France?"}],
- },
- {
- "type": "message",
- "role": "assistant",
- "content": [{"type": "output_text", "text": "The capital of France is Paris."}],
- },
- ]
+ solution = "The capital of France is Paris."

  # Export OPENAI_API_KEY in your shell before running this snippet.
  rubric_score = evaluate_rubric(
  rubric="Assistant must mention the verified capital city.",
- messages=messages,
+ solution_str=solution,
  model_info={
  "provider": "openai",
  "model": "gpt-5",
@@ -128,13 +117,15 @@ Credentials are resolved from environment variables by default:

  Override the environment variable name with `model_info={"api_key_env": "CUSTOM_ENV_NAME"}` when needed, or supply an inline secret with `model_info={"api_key": "sk-..."}` for ephemeral credentials. Missing API keys raise a `MissingAPIKeyError` that explains how to export the secret before trying again.

+ `api_key` and `api_key_env` are mutually exclusive ways to provide the same credential. When `api_key` is present and non-empty it is used directly, skipping any environment lookup. Otherwise the resolver falls back to `api_key_env` (or the provider default) and pulls the value from your local environment with `os.getenv`.
+
  `model_info` accepts additional rubric-specific knobs:

  - `score_min` / `score_max` – change the default `[0.0, 1.0]` scoring bounds.
- - `system_prompt` / `original_input` – override the helper’s transcript inference when those entries are absent.
+ - `system_prompt` / `original_input` – provide optional context strings that will be quoted in the judging prompt.
  - `timeout` – customise the provider timeout in seconds.

- Pass `extra_info={...}` to `evaluate_rubric` when you need structured context quoted in the judge prompt, and set `return_details=True` to receive the full `RewardRubricRunResult` payload (including the provider’s raw response).
+ Pass `metadata={...}` to `evaluate_rubric` when you need structured context quoted in the judge prompt, and set `return_details=True` to receive the full `RewardRubricRunResult` payload (including the provider’s raw response).

  Remote failures surface as `ProviderRequestError` instances, with `ModelNotFoundError` reserved for missing model identifiers so you can retry with a new snapshot.

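Putting the pieces above together, a 0.2.4 call that exercises these knobs looks roughly like the sketch below. This is illustrative only: the concrete values, the chosen environment variable, and the 30-second timeout are placeholders, not defaults taken from the package.

```python
from osmosis_ai import evaluate_rubric

result = evaluate_rubric(
    rubric="Assistant must mention the verified capital city.",
    solution_str="The capital of France is Paris.",
    model_info={
        "provider": "openai",
        "model": "gpt-5",
        "api_key_env": "OPENAI_API_KEY",   # or "api_key": "sk-..." for an inline secret
        "score_min": 0.0,                  # optional scoring bounds
        "score_max": 1.0,
        "system_prompt": "You are a strict grader.",           # optional judge context
        "original_input": "What is the capital of France?",
        "timeout": 30,                     # provider timeout in seconds
    },
    metadata={"conversation_id": "ticket-001"},  # structured context quoted in the judge prompt
    return_details=True,                   # full RewardRubricRunResult instead of a bare score
)
```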
@@ -172,24 +163,35 @@ The decorator will raise a `TypeError` if the function doesn't match this exact

  ## Rubric Function Signature

- Rubric functions decorated with `@osmosis_rubric` must accept the parameters:
+ Rubric functions decorated with `@osmosis_rubric` must match this signature:
+
+ ```python
+ @osmosis_rubric
+ def your_rubric(solution_str: str, ground_truth: str | None, extra_info: dict) -> float:
+ # Your rubric logic here
+ return float_score
+ ```
+
+ > The runtime forwards `None` for `ground_truth` when no reference answer exists. Annotate the parameter as `Optional[str]` (or handle `None` explicitly) if your rubric logic expects to run in that scenario.
+
+ ### Required `extra_info` fields

- - `model_info: dict`
- - `rubric: str`
- - `messages: list`
- - `ground_truth: Optional[str] = None`
- - `system_message: Optional[str] = None`
- - `extra_info: dict = None`
- - `score_min: float = 0.0` *(optional lower bound; must default to 0.0 and stay below `score_max`)*
- - `score_max: float = 1.0` *(optional upper bound; must default to 1.0 and stay above `score_min`)*
+ - **`provider`** – Non-empty string identifying the judge provider.
+ - **`model`** – Non-empty string naming the provider model to call.
+ - **`rubric`** – Natural-language rubric instructions for the judge model.
+ - **`api_key` / `api_key_env`** – Supply either the raw key or the environment variable name that exposes it.

- and must return a `float`. The decorator validates the signature and runtime payload (including message role validation and return type) before delegating to your custom logic.
+ ### Optional `extra_info` fields

- > Required fields: `model_info` must contain non-empty `provider` and `model` string entries.
+ - **`system_prompt`** Optional string prepended to the provider’s base system prompt when invoking the judge; include it inside `extra_info` rather than as a separate argument.
+ - **`score_min` / `score_max`** – Optional numeric overrides for the expected score range.
+ - **`model_info_overrides`** – Optional dict merged into the provider configuration passed to the judge.

- > Annotation quirk: `extra_info` must be annotated as a plain `dict` with a default of `None` to satisfy the validator.
+ Additional keys are passthrough and can be used for custom configuration. If you need to extend the provider payload (for example adding `api_key_env`), add a dict under `model_info_overrides` and it will be merged with the required `provider`/`model` pair before invoking `evaluate_rubric`. The decorator enforces the parameter names/annotations, validates the embedded configuration at call time, and ensures the wrapped function returns a `float`.

- > Tip: You can call `evaluate_rubric` from inside a rubric function (or any other orchestrator) to outsource judging to a hosted model while still benefiting from the decorator’s validation.
+ > Annotation quirk: `extra_info` must be annotated as `dict` **without** a default value, unlike `@osmosis_reward`.
+
+ > Tip: When delegating to `evaluate_rubric`, pass the raw `solution_str` directly and include any extra context inside the `metadata` payload.

  ## Examples

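Read together with the tip above, a delegating rubric might look like the sketch below. It is a sketch, not code from the package: it assumes `osmosis_rubric` is importable from the package root the same way `evaluate_rubric` is, and that passing a small `metadata` dict (possibly empty) is acceptable.

```python
from osmosis_ai import evaluate_rubric, osmosis_rubric


@osmosis_rubric
def hosted_judge_rubric(solution_str: str, ground_truth: str | None, extra_info: dict) -> float:
    # Build the judge configuration from the required extra_info entries, merging
    # model_info_overrides (e.g. {"api_key_env": "OPENAI_API_KEY"}) as described above.
    model_info = {
        "provider": extra_info["provider"],
        "model": extra_info["model"],
        **extra_info.get("model_info_overrides", {}),
    }
    for key in ("score_min", "score_max", "system_prompt"):
        if key in extra_info:
            model_info.setdefault(key, extra_info[key])

    # Extra context rides along in metadata; the raw solution_str is passed straight through.
    metadata = {}
    if ground_truth is not None:
        metadata["ground_truth"] = ground_truth
    if isinstance(extra_info.get("original_input"), str):
        metadata["original_input"] = extra_info["original_input"]

    return evaluate_rubric(
        rubric=extra_info["rubric"],
        solution_str=solution_str,
        model_info=model_info,
        metadata=metadata,
    )
```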
@@ -224,8 +226,8 @@ def numeric_tolerance(solution_str: str, ground_truth: str, extra_info: dict = N

  - `examples/rubric_functions.py` demonstrates `evaluate_rubric` with OpenAI, Anthropic, Gemini, and xAI using the schema-enforced SDK integrations.
  - `examples/reward_functions.py` keeps local reward helpers that showcase the decorator contract without external calls.
- - `examples/rubric_configs.yaml` bundles two rubric definitions, each with its own provider configuration and extra prompt context.
- - `examples/sample_data.jsonl` contains two conversation payloads mapped to those rubrics so you can trial dataset validation.
+ - `examples/rubric_configs.yaml` bundles two rubric definitions with provider configuration and scoring bounds.
+ - `examples/sample_data.jsonl` contains two rubric-aligned solution strings so you can trial dataset validation.

  ```yaml
  # examples/rubric_configs.yaml (excerpt)
@@ -239,8 +241,8 @@ rubrics:
  ```

  ```jsonl
- {"conversation_id": "ticket-001", "rubric_id": "support_followup", "...": "..."}
- {"conversation_id": "ticket-047", "rubric_id": "policy_grounding", "...": "..."}
+ {"conversation_id": "ticket-001", "rubric_id": "support_followup", "original_input": "...", "solution_str": "..."}
+ {"conversation_id": "ticket-047", "rubric_id": "policy_grounding", "original_input": "...", "solution_str": "..."}
  ```

  ## CLI Tools
@@ -253,7 +255,7 @@ Preview a rubric file and print every configuration discovered, including nested
  osmosis preview --path path/to/rubric.yaml
  ```

- Preview a dataset of chat transcripts stored as JSONL:
+ Preview a dataset of rubric-scored solutions stored as JSONL:

  ```bash
  osmosis preview --path path/to/data.jsonl
@@ -271,6 +273,9 @@ osmosis eval --rubric support_followup --data examples/sample_data.jsonl
  - Provide `--output path/to/dir` to create the directory (if needed) and emit `rubric_eval_result_<unix_timestamp>.json`, or supply a full file path (any extension) to control the filename; each file captures every run, provider payloads, timestamps, and aggregate statistics for downstream analysis.
  - Skip `--output` to collect results under `~/.cache/osmosis/eval_result/<rubric_id>/rubric_eval_result_<identifier>.json`; the CLI writes this JSON whether the evaluation finishes cleanly or hits provider/runtime errors so you can inspect failures later (only a manual Ctrl+C interrupt leaves no file behind).
  - Dataset rows whose `rubric_id` does not match the requested rubric are skipped automatically.
+ - Each dataset record must provide a non-empty `solution_str`; optional fields such as `original_input`, `ground_truth`, and `extra_info` travel with the record and are forwarded to the evaluator when present.
+ - When delegating to a custom `@osmosis_rubric` function, the CLI enriches `extra_info` with the active `provider`, `model`, `rubric`, score bounds, any configured `system_prompt`, the resolved `original_input`, and the record’s metadata/extra fields so the decorator’s required entries are always present.
+ - Rubric configuration files intentionally reject `extra_info`; provide per-example context through the dataset instead.

  Both commands validate the file, echo a short summary (`Loaded <n> ...`), and pretty-print the parsed records so you can confirm that new rubrics or test fixtures look correct before committing them. Invalid files raise a descriptive error and exit with a non-zero status code.

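In practice that enrichment means a custom `@osmosis_rubric` function run through `osmosis eval` sees an `extra_info` payload shaped roughly like the sketch below; the exact key names for the score bounds and the passthrough record fields are assumptions drawn from the bullets above, and the values are placeholders.

```python
# Illustrative extra_info as assembled by `osmosis eval` for a custom rubric function.
extra_info = {
    "provider": "openai",                                          # from the active rubric config
    "model": "gpt-5",
    "rubric": "Assistant must mention the verified capital city.",
    "score_min": 0.0,                                              # configured score bounds
    "score_max": 1.0,
    "system_prompt": "You are a strict grader.",                   # only when configured
    "original_input": "What is the capital of France?",            # resolved from the record
    "conversation_id": "ticket-001",                               # record metadata / extra fields travel along
}
```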
@@ -283,7 +288,13 @@ PYTHONPATH=. python examples/rubric_functions.py # Uncomment the provider you n

  ## Testing

- Run `python -m pytest tests/test_rubric_eval.py` to exercise the guards that ensure rubric prompts ignore message metadata (for example `tests/test_rubric_eval.py::test_collect_text_skips_metadata_fields`) while still preserving nested tool output. Add additional tests under `tests/` as you extend the library.
+ Run `python -m pytest` (or any subset under `tests/`) to exercise the updated helpers:
+
+ - `tests/test_rubric_eval.py` covers prompt construction for `solution_str` evaluations.
+ - `tests/test_cli_services.py` validates dataset parsing, extra-info enrichment, and engine interactions.
+ - `tests/test_cli.py` ensures the CLI pathways surface the new fields end to end.
+
+ Add additional tests under `tests/` as you extend the library.

  ## License

{osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/README.md

@@ -36,23 +36,12 @@ score = simple_reward("hello world", "hello world") # Returns 1.0
  ```python
  from osmosis_ai import evaluate_rubric

- messages = [
- {
- "type": "message",
- "role": "user",
- "content": [{"type": "input_text", "text": "What is the capital of France?"}],
- },
- {
- "type": "message",
- "role": "assistant",
- "content": [{"type": "output_text", "text": "The capital of France is Paris."}],
- },
- ]
+ solution = "The capital of France is Paris."

  # Export OPENAI_API_KEY in your shell before running this snippet.
  rubric_score = evaluate_rubric(
  rubric="Assistant must mention the verified capital city.",
- messages=messages,
+ solution_str=solution,
  model_info={
  "provider": "openai",
  "model": "gpt-5",
@@ -83,13 +72,15 @@ Credentials are resolved from environment variables by default:

  Override the environment variable name with `model_info={"api_key_env": "CUSTOM_ENV_NAME"}` when needed, or supply an inline secret with `model_info={"api_key": "sk-..."}` for ephemeral credentials. Missing API keys raise a `MissingAPIKeyError` that explains how to export the secret before trying again.

+ `api_key` and `api_key_env` are mutually exclusive ways to provide the same credential. When `api_key` is present and non-empty it is used directly, skipping any environment lookup. Otherwise the resolver falls back to `api_key_env` (or the provider default) and pulls the value from your local environment with `os.getenv`.
+
  `model_info` accepts additional rubric-specific knobs:

  - `score_min` / `score_max` – change the default `[0.0, 1.0]` scoring bounds.
- - `system_prompt` / `original_input` – override the helper’s transcript inference when those entries are absent.
+ - `system_prompt` / `original_input` – provide optional context strings that will be quoted in the judging prompt.
  - `timeout` – customise the provider timeout in seconds.

- Pass `extra_info={...}` to `evaluate_rubric` when you need structured context quoted in the judge prompt, and set `return_details=True` to receive the full `RewardRubricRunResult` payload (including the provider’s raw response).
+ Pass `metadata={...}` to `evaluate_rubric` when you need structured context quoted in the judge prompt, and set `return_details=True` to receive the full `RewardRubricRunResult` payload (including the provider’s raw response).

  Remote failures surface as `ProviderRequestError` instances, with `ModelNotFoundError` reserved for missing model identifiers so you can retry with a new snapshot.

@@ -127,24 +118,35 @@ The decorator will raise a `TypeError` if the function doesn't match this exact

  ## Rubric Function Signature

- Rubric functions decorated with `@osmosis_rubric` must accept the parameters:
+ Rubric functions decorated with `@osmosis_rubric` must match this signature:
+
+ ```python
+ @osmosis_rubric
+ def your_rubric(solution_str: str, ground_truth: str | None, extra_info: dict) -> float:
+ # Your rubric logic here
+ return float_score
+ ```
+
+ > The runtime forwards `None` for `ground_truth` when no reference answer exists. Annotate the parameter as `Optional[str]` (or handle `None` explicitly) if your rubric logic expects to run in that scenario.
+
+ ### Required `extra_info` fields

- - `model_info: dict`
- - `rubric: str`
- - `messages: list`
- - `ground_truth: Optional[str] = None`
- - `system_message: Optional[str] = None`
- - `extra_info: dict = None`
- - `score_min: float = 0.0` *(optional lower bound; must default to 0.0 and stay below `score_max`)*
- - `score_max: float = 1.0` *(optional upper bound; must default to 1.0 and stay above `score_min`)*
+ - **`provider`** – Non-empty string identifying the judge provider.
+ - **`model`** – Non-empty string naming the provider model to call.
+ - **`rubric`** – Natural-language rubric instructions for the judge model.
+ - **`api_key` / `api_key_env`** – Supply either the raw key or the environment variable name that exposes it.

- and must return a `float`. The decorator validates the signature and runtime payload (including message role validation and return type) before delegating to your custom logic.
+ ### Optional `extra_info` fields

- > Required fields: `model_info` must contain non-empty `provider` and `model` string entries.
+ - **`system_prompt`** Optional string prepended to the provider’s base system prompt when invoking the judge; include it inside `extra_info` rather than as a separate argument.
+ - **`score_min` / `score_max`** – Optional numeric overrides for the expected score range.
+ - **`model_info_overrides`** – Optional dict merged into the provider configuration passed to the judge.

- > Annotation quirk: `extra_info` must be annotated as a plain `dict` with a default of `None` to satisfy the validator.
+ Additional keys are passthrough and can be used for custom configuration. If you need to extend the provider payload (for example adding `api_key_env`), add a dict under `model_info_overrides` and it will be merged with the required `provider`/`model` pair before invoking `evaluate_rubric`. The decorator enforces the parameter names/annotations, validates the embedded configuration at call time, and ensures the wrapped function returns a `float`.

- > Tip: You can call `evaluate_rubric` from inside a rubric function (or any other orchestrator) to outsource judging to a hosted model while still benefiting from the decorator’s validation.
+ > Annotation quirk: `extra_info` must be annotated as `dict` **without** a default value, unlike `@osmosis_reward`.
+
+ > Tip: When delegating to `evaluate_rubric`, pass the raw `solution_str` directly and include any extra context inside the `metadata` payload.

  ## Examples

@@ -179,8 +181,8 @@ def numeric_tolerance(solution_str: str, ground_truth: str, extra_info: dict = N

  - `examples/rubric_functions.py` demonstrates `evaluate_rubric` with OpenAI, Anthropic, Gemini, and xAI using the schema-enforced SDK integrations.
  - `examples/reward_functions.py` keeps local reward helpers that showcase the decorator contract without external calls.
- - `examples/rubric_configs.yaml` bundles two rubric definitions, each with its own provider configuration and extra prompt context.
- - `examples/sample_data.jsonl` contains two conversation payloads mapped to those rubrics so you can trial dataset validation.
+ - `examples/rubric_configs.yaml` bundles two rubric definitions with provider configuration and scoring bounds.
+ - `examples/sample_data.jsonl` contains two rubric-aligned solution strings so you can trial dataset validation.

  ```yaml
  # examples/rubric_configs.yaml (excerpt)
@@ -194,8 +196,8 @@ rubrics:
  ```

  ```jsonl
- {"conversation_id": "ticket-001", "rubric_id": "support_followup", "...": "..."}
- {"conversation_id": "ticket-047", "rubric_id": "policy_grounding", "...": "..."}
+ {"conversation_id": "ticket-001", "rubric_id": "support_followup", "original_input": "...", "solution_str": "..."}
+ {"conversation_id": "ticket-047", "rubric_id": "policy_grounding", "original_input": "...", "solution_str": "..."}
  ```

  ## CLI Tools
@@ -208,7 +210,7 @@ Preview a rubric file and print every configuration discovered, including nested
  osmosis preview --path path/to/rubric.yaml
  ```

- Preview a dataset of chat transcripts stored as JSONL:
+ Preview a dataset of rubric-scored solutions stored as JSONL:

  ```bash
  osmosis preview --path path/to/data.jsonl
@@ -226,6 +228,9 @@ osmosis eval --rubric support_followup --data examples/sample_data.jsonl
  - Provide `--output path/to/dir` to create the directory (if needed) and emit `rubric_eval_result_<unix_timestamp>.json`, or supply a full file path (any extension) to control the filename; each file captures every run, provider payloads, timestamps, and aggregate statistics for downstream analysis.
  - Skip `--output` to collect results under `~/.cache/osmosis/eval_result/<rubric_id>/rubric_eval_result_<identifier>.json`; the CLI writes this JSON whether the evaluation finishes cleanly or hits provider/runtime errors so you can inspect failures later (only a manual Ctrl+C interrupt leaves no file behind).
  - Dataset rows whose `rubric_id` does not match the requested rubric are skipped automatically.
+ - Each dataset record must provide a non-empty `solution_str`; optional fields such as `original_input`, `ground_truth`, and `extra_info` travel with the record and are forwarded to the evaluator when present.
+ - When delegating to a custom `@osmosis_rubric` function, the CLI enriches `extra_info` with the active `provider`, `model`, `rubric`, score bounds, any configured `system_prompt`, the resolved `original_input`, and the record’s metadata/extra fields so the decorator’s required entries are always present.
+ - Rubric configuration files intentionally reject `extra_info`; provide per-example context through the dataset instead.

  Both commands validate the file, echo a short summary (`Loaded <n> ...`), and pretty-print the parsed records so you can confirm that new rubrics or test fixtures look correct before committing them. Invalid files raise a descriptive error and exit with a non-zero status code.

@@ -238,7 +243,13 @@ PYTHONPATH=. python examples/rubric_functions.py # Uncomment the provider you n

  ## Testing

- Run `python -m pytest tests/test_rubric_eval.py` to exercise the guards that ensure rubric prompts ignore message metadata (for example `tests/test_rubric_eval.py::test_collect_text_skips_metadata_fields`) while still preserving nested tool output. Add additional tests under `tests/` as you extend the library.
+ Run `python -m pytest` (or any subset under `tests/`) to exercise the updated helpers:
+
+ - `tests/test_rubric_eval.py` covers prompt construction for `solution_str` evaluations.
+ - `tests/test_cli_services.py` validates dataset parsing, extra-info enrichment, and engine interactions.
+ - `tests/test_cli.py` ensures the CLI pathways surface the new fields end to end.
+
+ Add additional tests under `tests/` as you extend the library.

  ## License

{osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_services/__init__.py

@@ -10,13 +10,7 @@ from .config import (
  load_rubric_suite,
  render_yaml_items,
  )
- from .dataset import (
- ConversationMessage,
- DatasetLoader,
- DatasetRecord,
- load_jsonl_records,
- render_json_records,
- )
+ from .dataset import DatasetLoader, DatasetRecord, load_jsonl_records, render_json_records
  from .engine import (
  EvaluationRecordResult,
  EvaluationReport,
@@ -40,7 +34,6 @@ __all__ = [
  "BaselineStatistics",
  "CLIError",
  "ConsoleReportRenderer",
- "ConversationMessage",
  "DatasetLoader",
  "DatasetRecord",
  "EvaluationSession",
{osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_services/config.py

@@ -25,8 +25,7 @@ class RubricConfig:
  model_info: dict[str, Any]
  score_min: Optional[float]
  score_max: Optional[float]
- system_message: Optional[str]
- extra_info: Optional[dict[str, Any]]
+ system_prompt: Optional[str]
  original_input: Optional[str]
  ground_truth: Optional[str]
  source_label: str
@@ -195,6 +194,13 @@ def _build_document_configs(
  parsed_items.append(ParsedItem(label=item.label, payload=payload))
  if not isinstance(payload, dict):
  continue
+ if "extra_info" in payload:
+ message = (
+ f"Rubric entry in '{path}' (document {doc_index + 1}) must not include 'extra_info'."
+ )
+ if strict:
+ raise CLIError(message)
+ continue

  rubric_key_raw = payload.get("id")
  if not isinstance(rubric_key_raw, str) or not rubric_key_raw.strip():
@@ -223,14 +229,6 @@
  )
  continue

- extra_info_value = payload.get("extra_info", defaults.get("extra_info"))
- if extra_info_value is not None and not isinstance(extra_info_value, dict):
- if strict:
- raise CLIError(
- f"'extra_info' for rubric '{rubric_key}' in '{path}' must be a mapping."
- )
- continue
-
  try:
  score_min = coerce_optional_float(
  payload.get("score_min", defaults.get("score_min")),
@@ -247,8 +245,12 @@
  raise
  continue

- system_message = payload.get("system_message", defaults.get("system_message"))
+ system_prompt = payload.get("system_prompt", defaults.get("system_prompt"))
+
  original_input = payload.get("original_input", defaults.get("original_input"))
+ if not isinstance(original_input, str):
+ original_input = None
+
  ground_truth = payload.get("ground_truth", defaults.get("ground_truth"))

  label = item.label or f"document[{doc_index}]"
@@ -260,9 +262,8 @@
  model_info=copy.deepcopy(model_info),
  score_min=score_min,
  score_max=score_max,
- system_message=system_message if isinstance(system_message, str) else None,
- extra_info=copy.deepcopy(extra_info_value) if isinstance(extra_info_value, dict) else None,
- original_input=original_input if isinstance(original_input, str) else None,
+ system_prompt=system_prompt if isinstance(system_prompt, str) else None,
+ original_input=original_input,
  ground_truth=ground_truth if isinstance(ground_truth, str) else None,
  source_label=source_label,
  )
@@ -347,10 +348,9 @@ def _extract_config_defaults(document: Any, path: Path, doc_index: int) -> dict[
  if not isinstance(document, dict):
  return {
  "model_info": None,
- "extra_info": None,
  "score_min": None,
  "score_max": None,
- "system_message": None,
+ "system_prompt": None,
  "original_input": None,
  "ground_truth": None,
  }
@@ -358,15 +358,18 @@
  source = f"document[{doc_index}] in {path}"

  defaults: dict[str, Any] = {}
+ if "default_extra_info" in document:
+ raise CLIError(
+ f"Rubric config document {doc_index + 1} in {path} must not include 'default_extra_info'; extra_info is no longer supported."
+ )
  defaults["model_info"] = document.get("default_model_info")
- defaults["extra_info"] = document.get("default_extra_info")
  defaults["score_min"] = coerce_optional_float(
  document.get("default_score_min"), "default_score_min", source
  )
  defaults["score_max"] = coerce_optional_float(
  document.get("default_score_max"), "default_score_max", source
  )
- defaults["system_message"] = document.get("default_system_message")
+ defaults["system_prompt"] = document.get("default_system_prompt")
  defaults["original_input"] = document.get("default_original_input")
  defaults["ground_truth"] = document.get("default_ground_truth")
  return defaults
{osmosis_ai-0.2.3 → osmosis_ai-0.2.4}/osmosis_ai/cli_services/dataset.py

@@ -7,48 +7,7 @@ from pathlib import Path
  from typing import Any, Optional, Sequence

  from .errors import CLIError
- from .shared import coerce_optional_float, gather_text_fragments
-
-
- @dataclass(frozen=True)
- class ConversationMessage:
- """Normalized conversation message with preserved raw payload fields."""
-
- role: str
- content: Any
- metadata: dict[str, Any]
-
- def to_payload(self) -> dict[str, Any]:
- payload: dict[str, Any] = copy.deepcopy(self.metadata)
- payload["role"] = self.role
- if self.content is None:
- payload.pop("content", None)
- else:
- payload["content"] = copy.deepcopy(self.content)
- return payload
-
- def text_fragments(self) -> list[str]:
- fragments: list[str] = []
- seen: set[int] = set()
- gather_text_fragments(self.content, fragments, allow_free_strings=True, seen=seen)
- for value in self.metadata.values():
- gather_text_fragments(value, fragments, seen=seen)
- return fragments
-
- @classmethod
- def from_raw(cls, raw: dict[str, Any], *, source_label: str, index: int) -> "ConversationMessage":
- role_value = raw.get("role")
- if not isinstance(role_value, str) or not role_value.strip():
- raise CLIError(
- f"Message {index} in {source_label} must include a non-empty string 'role'."
- )
- content_value = copy.deepcopy(raw.get("content"))
- metadata: dict[str, Any] = {}
- for key, value in raw.items():
- if key in {"role", "content"}:
- continue
- metadata[str(key)] = copy.deepcopy(value)
- return cls(role=role_value.strip().lower(), content=content_value, metadata=metadata)
+ from .shared import coerce_optional_float


  @dataclass(frozen=True)
@@ -57,23 +16,16 @@ class DatasetRecord:
  rubric_id: str
  conversation_id: Optional[str]
  record_id: Optional[str]
- messages: tuple[ConversationMessage, ...]
+ solution_str: str
  ground_truth: Optional[str]
- system_message: Optional[str]
  original_input: Optional[str]
  metadata: Optional[dict[str, Any]]
  extra_info: Optional[dict[str, Any]]
  score_min: Optional[float]
  score_max: Optional[float]

- def message_payloads(self) -> list[dict[str, Any]]:
- """Return messages as provider-ready payloads."""
- return [message.to_payload() for message in self.messages]
-
- def merged_extra_info(self, config_extra: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
+ def merged_extra_info(self) -> Optional[dict[str, Any]]:
  merged: dict[str, Any] = {}
- if isinstance(config_extra, dict):
- merged.update(copy.deepcopy(config_extra))
  if isinstance(self.extra_info, dict):
  merged.update(copy.deepcopy(self.extra_info))
  if isinstance(self.metadata, dict) and self.metadata:
@@ -81,19 +33,15 @@
  return merged or None

  def assistant_preview(self, *, max_length: int = 140) -> Optional[str]:
- for message in reversed(self.messages):
- if message.role != "assistant":
- continue
- fragments = message.text_fragments()
- if not fragments:
- continue
- preview = " ".join(" ".join(fragments).split())
- if not preview:
- continue
- if len(preview) > max_length:
- preview = preview[: max_length - 3].rstrip() + "..."
- return preview
- return None
+ text = self.solution_str.strip()
+ if not text:
+ return None
+ preview = " ".join(text.split())
+ if not preview:
+ return None
+ if len(preview) > max_length:
+ preview = preview[: max_length - 3].rstrip() + "..."
+ return preview

  def conversation_label(self, fallback_index: int) -> str:
  if isinstance(self.conversation_id, str) and self.conversation_id.strip():
@@ -162,17 +110,29 @@ class DatasetLoader:
  metadata = payload.get("metadata") if isinstance(payload.get("metadata"), dict) else None
  extra_info = payload.get("extra_info") if isinstance(payload.get("extra_info"), dict) else None
  record_label = conversation_id or record_id or rubric_id_str or "<record>"
- messages = _parse_messages(payload.get("messages"), source_label=record_label)
+ solution_raw = payload.get("solution_str")
+ if not isinstance(solution_raw, str) or not solution_raw.strip():
+ raise CLIError(f"Record '{record_label}' must include a non-empty 'solution_str' string.")
+
+ original_input_raw = payload.get("original_input")
+ if isinstance(original_input_raw, str):
+ original_input = original_input_raw
+ else:
+ original_input = None
+
+ if original_input is None and isinstance(extra_info, dict):
+ extra_original_input = extra_info.get("original_input")
+ if isinstance(extra_original_input, str):
+ original_input = extra_original_input

  return DatasetRecord(
  payload=payload,
  rubric_id=rubric_id_str,
  conversation_id=conversation_id,
  record_id=record_id,
- messages=messages,
+ solution_str=solution_raw,
  ground_truth=payload.get("ground_truth") if isinstance(payload.get("ground_truth"), str) else None,
- system_message=payload.get("system_message") if isinstance(payload.get("system_message"), str) else None,
- original_input=payload.get("original_input") if isinstance(payload.get("original_input"), str) else None,
+ original_input=original_input,
  metadata=metadata,
  extra_info=extra_info,
  score_min=score_min,
@@ -213,17 +173,3 @@ def render_json_records(records: Sequence[dict[str, Any]]) -> str:
  segments.append("\n".join(snippet))

  return "\n".join(segments)
-
-
- def _parse_messages(messages: Any, *, source_label: str) -> tuple[ConversationMessage, ...]:
- if not isinstance(messages, list) or not messages:
- raise CLIError(f"Record '{source_label}' must include a non-empty 'messages' list.")
-
- normalized: list[ConversationMessage] = []
- for index, entry in enumerate(messages):
- if not isinstance(entry, dict):
- raise CLIError(
- f"Message {index} in {source_label} must be an object, got {type(entry).__name__}."
- )
- normalized.append(ConversationMessage.from_raw(entry, source_label=source_label, index=index))
- return tuple(normalized)
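
With `_parse_messages` and `ConversationMessage` gone, a dataset row for the loader above reduces to a flat JSON object per line. The sketch below shows one plausible record: a non-empty `solution_str` is mandatory, `rubric_id` is what `osmosis eval` matches against, and the remaining keys are the optional fields the loader inspects (values here are invented for illustration).

```python
import json

record = {
    "conversation_id": "ticket-001",
    "rubric_id": "support_followup",
    "solution_str": "Thanks for waiting. Your refund was issued this morning.",
    "original_input": "Where is my refund?",   # optional; also honoured when nested in extra_info
    "ground_truth": "Refund issued",           # optional reference answer
    "extra_info": {"channel": "email"},        # optional mapping, merged first
    "metadata": {"locale": "en-US"},           # optional mapping, merged after extra_info
}
print(json.dumps(record))  # one JSON object per line of the .jsonl dataset
```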