xrtm-eval 0.1.1__tar.gz → 0.2.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {xrtm_eval-0.1.1/src/xrtm_eval.egg-info → xrtm_eval-0.2.0}/PKG-INFO +37 -2
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/README.md +36 -1
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/pyproject.toml +1 -1
- xrtm_eval-0.2.0/src/xrtm/eval/core/__init__.py +42 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/core/eval/definitions.py +3 -4
- xrtm_eval-0.2.0/src/xrtm/eval/core/schemas/__init__.py +24 -0
- xrtm_eval-0.2.0/src/xrtm/eval/core/schemas/forecast.py +59 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/kit/eval/bias.py +13 -4
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/kit/eval/epistemic_evaluator.py +2 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/kit/eval/intervention.py +9 -2
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/kit/eval/metrics.py +15 -12
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/kit/eval/resilience.py +14 -2
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/kit/eval/viz.py +10 -1
- xrtm_eval-0.2.0/src/xrtm/eval/providers/__init__.py +24 -0
- xrtm_eval-0.2.0/src/xrtm/eval/version.py +28 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0/src/xrtm_eval.egg-info}/PKG-INFO +37 -2
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm_eval.egg-info/SOURCES.txt +5 -2
- xrtm_eval-0.2.0/tests/test_ece.py +68 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/tests/test_metrics.py +3 -0
- xrtm_eval-0.1.1/src/xrtm/eval/core/__init__.py +0 -14
- xrtm_eval-0.1.1/src/xrtm/eval/schemas/__init__.py +0 -3
- xrtm_eval-0.1.1/src/xrtm/eval/schemas/forecast.py +0 -21
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/LICENSE +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/setup.cfg +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/__init__.py +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/core/epistemics.py +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/core/eval/__init__.py +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/core/eval/aggregation.py +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/core/eval/bayesian.py +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/kit/eval/__init__.py +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm/eval/kit/eval/analytics.py +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm_eval.egg-info/dependency_links.txt +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm_eval.egg-info/requires.txt +0 -0
- {xrtm_eval-0.1.1 → xrtm_eval-0.2.0}/src/xrtm_eval.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: xrtm-eval
-Version: 0.1.1
+Version: 0.2.0
 Summary: The Judge/Scoring engine for XRTM.
 Author-email: XRTM Team <moy@xrtm.org>
 License: Apache-2.0

@@ -23,15 +23,27 @@ Dynamic: license-file
 
 [](https://opensource.org/licenses/Apache-2.0)
 [](https://www.python.org/downloads/)
+[](https://pypi.org/project/xrtm-eval/)
 
 **The Judge for XRTM.**
 
 `xrtm-eval` is the rigorous scoring engine used to grade probabilistic forecasts. It operates independently of the inference engine to ensure objective evaluation.
 
+## Part of the XRTM Ecosystem
+
+```
+Layer 4: xrtm-train     → (imports all)
+Layer 3: xrtm-forecast  → (imports eval, data)
+Layer 2: xrtm-eval      → (imports data)  ← YOU ARE HERE
+Layer 1: xrtm-data      → (zero dependencies)
+```
+
+`xrtm-eval` provides scoring metrics AND trust primitives used by the forecast engine.
+
 ## Installation
 
 ```bash
-
+pip install xrtm-eval
 ```
 
 ## Core Primitives

@@ -54,6 +66,29 @@ score = evaluator.score(prediction=0.7, ground_truth=1)
 ### 2. Expected Calibration Error (ECE)
 Use the `ExpectedCalibrationErrorEvaluator` to measure the gap between confidence and accuracy across bin buckets.
 
+### 3. Epistemic Trust Primitives (v0.1.1+)
+`xrtm-eval` now includes trust scoring infrastructure:
+
+```python
+from xrtm.eval.core.epistemics import IntegrityGuardian, SourceTrustRegistry
+
+registry = SourceTrustRegistry()
+guardian = IntegrityGuardian(registry)
+```
+
+## Project Structure
+
+```
+src/xrtm/eval/
+├── core/                  # Interfaces & Schemas
+│   ├── eval/              # Evaluator protocol, EvaluationResult
+│   ├── epistemics.py      # Trust primitives (SourceTrustRegistry)
+│   └── schemas/           # ForecastResolution
+├── kit/                   # Composable evaluator implementations
+│   └── eval/metrics.py    # BrierScoreEvaluator, ECE
+└── providers/             # External evaluation services (future)
+```
+
 ## Development
 
 Prerequisites:
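The PKG-INFO/README text above introduces `BrierScoreEvaluator` and `ExpectedCalibrationErrorEvaluator` separately. As a quick orientation, here is a sketch that combines the call patterns visible elsewhere in this diff (the `score(prediction=0.7, ground_truth=1)` hunk context, `tests/test_metrics.py`, and `tests/test_ece.py`). The `EvaluationResult` fields come from those tests, the `BrierScoreEvaluator` import path follows the Project Structure note (`kit/eval/metrics.py`), and the sample values are illustrative.

```python
from xrtm.eval.core.eval.definitions import EvaluationResult
from xrtm.eval.kit.eval.metrics import BrierScoreEvaluator, ExpectedCalibrationErrorEvaluator

# Brier score for a single probabilistic prediction against a binary outcome.
brier = BrierScoreEvaluator()
print(brier.score(prediction=0.7, ground_truth=1))  # (0.7 - 1.0)^2 = 0.09

# Expected Calibration Error over a batch of results.
results = [
    EvaluationResult(subject_id="q1", score=0, ground_truth=1, prediction=0.9, metadata={}),
    EvaluationResult(subject_id="q2", score=0, ground_truth=0, prediction=0.1, metadata={}),
]
ece, bins = ExpectedCalibrationErrorEvaluator(num_bins=10).compute_calibration_data(results)
print(ece)  # ≈ 0.1, per the worked example in tests/test_ece.py
```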
@@ -2,15 +2,27 @@
 
 [](https://opensource.org/licenses/Apache-2.0)
 [](https://www.python.org/downloads/)
+[](https://pypi.org/project/xrtm-eval/)
 
 **The Judge for XRTM.**
 
 `xrtm-eval` is the rigorous scoring engine used to grade probabilistic forecasts. It operates independently of the inference engine to ensure objective evaluation.
 
+## Part of the XRTM Ecosystem
+
+```
+Layer 4: xrtm-train     → (imports all)
+Layer 3: xrtm-forecast  → (imports eval, data)
+Layer 2: xrtm-eval      → (imports data)  ← YOU ARE HERE
+Layer 1: xrtm-data      → (zero dependencies)
+```
+
+`xrtm-eval` provides scoring metrics AND trust primitives used by the forecast engine.
+
 ## Installation
 
 ```bash
-
+pip install xrtm-eval
 ```
 
 ## Core Primitives

@@ -33,6 +45,29 @@ score = evaluator.score(prediction=0.7, ground_truth=1)
 ### 2. Expected Calibration Error (ECE)
 Use the `ExpectedCalibrationErrorEvaluator` to measure the gap between confidence and accuracy across bin buckets.
 
+### 3. Epistemic Trust Primitives (v0.1.1+)
+`xrtm-eval` now includes trust scoring infrastructure:
+
+```python
+from xrtm.eval.core.epistemics import IntegrityGuardian, SourceTrustRegistry
+
+registry = SourceTrustRegistry()
+guardian = IntegrityGuardian(registry)
+```
+
+## Project Structure
+
+```
+src/xrtm/eval/
+├── core/                  # Interfaces & Schemas
+│   ├── eval/              # Evaluator protocol, EvaluationResult
+│   ├── epistemics.py      # Trust primitives (SourceTrustRegistry)
+│   └── schemas/           # ForecastResolution
+├── kit/                   # Composable evaluator implementations
+│   └── eval/metrics.py    # BrierScoreEvaluator, ECE
+└── providers/             # External evaluation services (future)
+```
+
 ## Development
 
 Prerequisites:
@@ -0,0 +1,42 @@
+# coding=utf-8
+# Copyright 2026 XRTM Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+r"""
+Core interfaces and domain-agnostic logic for xrtm-eval.
+
+This module exports evaluator protocols, epistemics utilities, and
+core schemas. MUST NOT import from kit/ or providers/.
+"""
+
+from xrtm.eval.core.epistemics import (
+    IntegrityGuardian,
+    SourceTrustEntry,
+    SourceTrustRegistry,
+)
+from xrtm.eval.core.eval import EvaluationReport, EvaluationResult, Evaluator
+from xrtm.eval.core.schemas import ForecastResolution
+
+__all__ = [
+    # Evaluator protocol
+    "Evaluator",
+    "EvaluationResult",
+    "EvaluationReport",
+    # Epistemics
+    "IntegrityGuardian",
+    "SourceTrustRegistry",
+    "SourceTrustEntry",
+    # Schemas
+    "ForecastResolution",
+]
@@ -30,11 +30,9 @@ class BrierDecomposition(BaseModel):
 
 
 class Evaluator(Protocol):
-    def score(self, prediction: Any, ground_truth: Any) -> float:
-        ...
+    def score(self, prediction: Any, ground_truth: Any) -> float: ...
 
-    def evaluate(self, prediction: Any, ground_truth: Any, subject_id: str) -> EvaluationResult:
-        ...
+    def evaluate(self, prediction: Any, ground_truth: Any, subject_id: str) -> EvaluationResult: ...
 
 
 class EvaluationReport(BaseModel):

@@ -55,6 +53,7 @@ class EvaluationReport(BaseModel):
     def to_pandas(self) -> Any:
         try:
             import pandas as pd
+
             return pd.DataFrame([r.model_dump() for r in self.results])
         except ImportError:
             raise ImportError("Pandas is required for to_pandas(). Install it with `pip install pandas`.")
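The `Evaluator` protocol above only declares `score()` and `evaluate()`. A hypothetical third-party evaluator that satisfies it could look like the sketch below; `AbsoluteErrorEvaluator` is not part of the package, and the `EvaluationResult` constructor fields are inferred from `tests/test_ece.py` later in this diff.

```python
from typing import Any

from xrtm.eval.core.eval.definitions import EvaluationResult, Evaluator


class AbsoluteErrorEvaluator:
    """Illustrative only; not part of xrtm-eval."""

    def score(self, prediction: Any, ground_truth: Any) -> float:
        # Absolute gap between a probability and a 0/1 outcome.
        return abs(float(prediction) - float(ground_truth))

    def evaluate(self, prediction: Any, ground_truth: Any, subject_id: str) -> EvaluationResult:
        # Field names mirror the EvaluationResult usage in tests/test_ece.py.
        return EvaluationResult(
            subject_id=subject_id,
            score=self.score(prediction, ground_truth),
            ground_truth=ground_truth,
            prediction=prediction,
            metadata={"type": "absolute_error"},
        )


evaluator: Evaluator = AbsoluteErrorEvaluator()  # satisfies the Protocol structurally
print(evaluator.score(prediction=0.7, ground_truth=1))  # ≈ 0.3
```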
@@ -0,0 +1,24 @@
+# coding=utf-8
+# Copyright 2026 XRTM Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+r"""
+Core schemas for xrtm-eval.
+
+This module exports evaluation-related Pydantic models.
+"""
+
+from xrtm.eval.core.schemas.forecast import ForecastResolution
+
+__all__ = ["ForecastResolution"]
@@ -0,0 +1,59 @@
+# coding=utf-8
+# Copyright 2026 XRTM Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+r"""
+Forecast resolution schema for evaluation.
+
+This module defines the ground-truth outcome schema used to evaluate
+forecast accuracy.
+
+Example:
+    >>> from xrtm.eval.core.schemas import ForecastResolution
+    >>> resolution = ForecastResolution(
+    ...     question_id="q1",
+    ...     outcome="yes",
+    ... )
+"""
+
+from datetime import datetime, timezone
+from typing import Any, Dict
+
+from pydantic import BaseModel, Field
+
+
+class ForecastResolution(BaseModel):
+    r"""
+    The ground-truth outcome used to evaluate forecast accuracy.
+
+    Attributes:
+        question_id: Reference to the forecasted question.
+        outcome: The final winning outcome or value.
+        resolved_at: When the outcome was determined.
+        metadata: Source info, verification method, etc.
+
+    Example:
+        >>> resolution = ForecastResolution(question_id="q1", outcome="yes")
+    """
+
+    question_id: str = Field(..., description="Reference to the forecasted question")
+    outcome: str = Field(..., description="The final winning outcome or value")
+    resolved_at: datetime = Field(
+        default_factory=lambda: datetime.now(timezone.utc),
+        description="When the outcome was determined",
+    )
+    metadata: Dict[str, Any] = Field(default_factory=dict, description="Source info, verification method")
+
+
+__all__ = ["ForecastResolution"]
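A minimal usage sketch for the new `ForecastResolution` schema above; the metadata keys are illustrative, and `model_dump()` is the Pydantic v2 serializer already used by `EvaluationReport.to_pandas()` in this diff.

```python
from xrtm.eval.core.schemas import ForecastResolution

resolution = ForecastResolution(
    question_id="q1",
    outcome="yes",
    metadata={"source": "official-results", "verified": True},  # illustrative metadata keys
)
# resolved_at defaults to the current UTC time via the default_factory.
print(resolution.model_dump())
```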
@@ -8,10 +8,18 @@ from xrtm.eval.core.eval.definitions import EvaluationResult, Evaluator
 
 class BiasInterceptor(Evaluator):
     COGNITIVE_BIASES = [
-        "Base-Rate Neglect",
-        "
-        "
-        "
+        "Base-Rate Neglect",
+        "Overconfidence",
+        "Availability Heuristic",
+        "Confirmation Bias",
+        "Anchoring Bias",
+        "Sunk Cost Fallacy",
+        "Hindsight Bias",
+        "Optimism Bias",
+        "Pessimism Bias",
+        "Status Quo Bias",
+        "Framing Effect",
+        "Recency Bias",
     ]
 
     def __init__(self, model: Any):

@@ -46,4 +54,5 @@ class BiasInterceptor(Evaluator):
             metadata={"type": "bias_audit"},
         )
 
+
 __all__ = ["BiasInterceptor"]
@@ -12,6 +12,7 @@ from xrtm.eval.core.epistemics import IntegrityGuardian, SourceTrustRegistry
 
 logger = logging.getLogger(__name__)
 
+
 class EpistemicEvaluator:
     def __init__(self, registry: Optional[SourceTrustRegistry] = None):
         self.registry = registry or SourceTrustRegistry()

@@ -28,4 +29,5 @@ class EpistemicEvaluator:
            "integrity_level": "HIGH" if avg_trust > 0.8 else "MEDIUM" if avg_trust >= 0.5 else "LOW",
        }
 
+
 __all__ = ["EpistemicEvaluator"]
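For clarity, the `integrity_level` thresholds visible in the `EpistemicEvaluator` hunk above amount to the following standalone rule (a restatement only; the rest of the evaluator's report format is not shown in this diff).

```python
# Restates the thresholding seen in the hunk: HIGH above 0.8, MEDIUM from 0.5 to 0.8, else LOW.
def integrity_level(avg_trust: float) -> str:
    if avg_trust > 0.8:
        return "HIGH"
    if avg_trust >= 0.5:
        return "MEDIUM"
    return "LOW"


assert integrity_level(0.9) == "HIGH"
assert integrity_level(0.5) == "MEDIUM"
assert integrity_level(0.2) == "LOW"
```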
@@ -8,6 +8,7 @@ from xrtm.data.schemas.forecast import ForecastOutput
 
 logger = logging.getLogger(__name__)
 
+
 class InterventionEngine:
     @staticmethod
     def apply_intervention(output: ForecastOutput, node_id: str, new_probability: float) -> ForecastOutput:

@@ -29,12 +30,18 @@ class InterventionEngine:
             weight = data.get("weight", 1.0)
             target_node = next(n for n in new_output.logical_trace if n.node_id == target_id)
             old_target_prob = target_node.probability or 0.5
-            normalized_delta = (
+            normalized_delta = (
+                current_node.probability - (dg.nodes[current_id].get("probability") or 0.5)
+            ) * weight
             target_node.probability = max(0.0, min(1.0, old_target_prob + normalized_delta))
         leaf_nodes = [n for n in dg.nodes() if dg.out_degree(n) == 0]
         if leaf_nodes:
-            avg_leaf_prob = sum(
+            avg_leaf_prob = sum(
+                next(n.probability for n in new_output.logical_trace if n.node_id == leaf_id) or 0.0
+                for leaf_id in leaf_nodes
+            ) / len(leaf_nodes)
             new_output.confidence = avg_leaf_prob
         return new_output
 
+
 __all__ = ["InterventionEngine"]
@@ -73,15 +73,22 @@ class ExpectedCalibrationErrorEvaluator(Evaluator):
 
     def compute_calibration_data(self, results: List[EvaluationResult]) -> Tuple[float, List[ReliabilityBin]]:
         bin_size = 1.0 / self.num_bins
-        bins: List[List[
+        bins: List[List[Tuple[float, float]]] = [[] for _ in range(self.num_bins)]
 
         for res in results:
             try:
-
+                raw_conf = float(res.prediction)
+                conf = min(max(raw_conf, 0.0), 1.0)
                 idx = int(conf / bin_size)
                 if idx == self.num_bins:
                     idx -= 1
-
+
+                gt = res.ground_truth
+                normalized_gt = (
+                    1.0 if (gt.lower() in ["yes", "1", "true", "won", "pass"] if isinstance(gt, str) else gt) else 0.0
+                )
+
+                bins[idx].append((raw_conf, normalized_gt))
             except (ValueError, TypeError):
                 continue
 

@@ -94,17 +101,13 @@ class ExpectedCalibrationErrorEvaluator(Evaluator):
             bin_center = (i + 0.5) * bin_size
 
             if n_b > 0:
-                mean_conf = sum(
-
-                for x in bin_items:
-                    gt = x.ground_truth
-                    normalized_gt = 1.0 if (gt.lower() in ["yes", "1", "true", "won", "pass"] if isinstance(gt, str) else gt) else 0.0
-                    accuracies.append(normalized_gt)
-
-                mean_acc = sum(accuracies) / n_b
+                mean_conf = sum(x[0] for x in bin_items) / n_b
+                mean_acc = sum(x[1] for x in bin_items) / n_b
                 ece += (n_b / total_count) * abs(mean_acc - mean_conf)
                 reliability_data.append(
-                    ReliabilityBin(
+                    ReliabilityBin(
+                        bin_center=bin_center, mean_prediction=mean_conf, mean_ground_truth=mean_acc, count=n_b
+                    )
                 )
             else:
                 reliability_data.append(
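The refactored `compute_calibration_data` above bins each (prediction, normalized ground truth) pair by its clamped confidence and weights the per-bin confidence/accuracy gap by bin population. A standalone restatement of that computation, assuming equal-width bins as in the hunk, is sketched below.

```python
# Standalone restatement of the binning logic shown above:
# ECE = sum over bins of (n_b / N) * |mean_accuracy_b - mean_confidence_b|.
from typing import List, Tuple


def simple_ece(pairs: List[Tuple[float, float]], num_bins: int = 10) -> float:
    if not pairs:
        return 0.0
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(num_bins)]
    for conf, outcome in pairs:
        # Clamp the confidence into [0, 1], then pick an equal-width bin.
        idx = min(int(min(max(conf, 0.0), 1.0) * num_bins), num_bins - 1)
        bins[idx].append((conf, outcome))
    total = sum(len(b) for b in bins)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            mean_acc = sum(o for _, o in b) / len(b)
            ece += (len(b) / total) * abs(mean_acc - mean_conf)
    return ece


# Matches the worked example in tests/test_ece.py later in this diff.
print(simple_ece([(0.9, 1.0), (0.1, 0.0)]))  # ≈ 0.1
```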
@@ -11,22 +11,34 @@ class FakeNewsItem(BaseModel):
     trust_score: float = 0.1
     intended_bias: str = Field(..., description="e.g. 'Bearish', 'Bullish'")
 
+
 class GullibilityReport(BaseModel):
     initial_confidence: float
     post_injection_confidence: float
     delta: float
     resilience_score: float
 
+
 class AdversarialInjector:
     def __init__(self, intensity: float = 0.5):
         self.intensity = intensity
+
     def generate_attack(self, subject: str, direction: str) -> FakeNewsItem:
         if direction.lower() == "bearish":
-            return FakeNewsItem(
+            return FakeNewsItem(
+                headline=f"BREAKING: {subject} CEO Under Investigation", content="...", intended_bias="Bearish"
+            )
         return FakeNewsItem(headline=f"{subject} Secures Massive Contract", content="...", intended_bias="Bullish")
+
     def measure_resilience(self, initial_confidence: float, post_injection_confidence: float) -> GullibilityReport:
         delta = post_injection_confidence - initial_confidence
         score = max(0.0, 1.0 - abs(delta))
-        return GullibilityReport(
+        return GullibilityReport(
+            initial_confidence=initial_confidence,
+            post_injection_confidence=post_injection_confidence,
+            delta=delta,
+            resilience_score=score,
+        )
+
 
 __all__ = ["FakeNewsItem", "GullibilityReport", "AdversarialInjector"]
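Usage sketch for the reformatted `AdversarialInjector` above, using only the constructor and methods shown in the hunk; the subject string and confidence values are illustrative.

```python
from xrtm.eval.kit.eval.resilience import AdversarialInjector

injector = AdversarialInjector(intensity=0.5)
attack = injector.generate_attack(subject="ACME Corp", direction="bearish")  # illustrative subject
report = injector.measure_resilience(initial_confidence=0.70, post_injection_confidence=0.55)
print(report.delta)             # ≈ -0.15
print(report.resilience_score)  # 1 - |delta| ≈ 0.85
```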
@@ -9,12 +9,14 @@ import numpy as np
 
 logger = logging.getLogger(__name__)
 
+
 @dataclass
 class ReliabilityCurveData:
     prob_pred: np.ndarray
     prob_true: np.ndarray
     ece: float
 
+
 def compute_calibration_curve(y_true: List[int], y_prob: List[float], n_bins: int = 10) -> ReliabilityCurveData:
     y_true_arr = np.array(y_true)
     y_prob_arr = np.array(y_prob)

@@ -39,7 +41,10 @@ def compute_calibration_curve(y_true: List[int], y_prob: List[float], n_bins: in
         ece += (count / total_samples) * np.abs(fraction_true - mean_prob)
     return ReliabilityCurveData(prob_pred=np.array(bin_pred), prob_true=np.array(bin_true), ece=ece)
 
-
+
+def plot_reliability_diagram(
+    data: ReliabilityCurveData, title: str = "Reliability Diagram", save_path: Optional[str] = None
+) -> Any:
     try:
         import matplotlib.pyplot as plt
         import seaborn as sns

@@ -61,13 +66,17 @@ def plot_reliability_diagram(data: ReliabilityCurveData, title: str = "Reliabili
         plt.savefig(save_path)
     return fig
 
+
 class ReliabilityDiagram:
     def __init__(self, n_bins: int = 10):
         self.n_bins = n_bins
+
     def compute(self, y_true: List[int], y_prob: List[float]) -> ReliabilityCurveData:
         return compute_calibration_curve(y_true, y_prob, self.n_bins)
+
     def plot(self, y_true: List[int], y_prob: List[float], save_path: Optional[str] = None) -> Any:
         data = self.compute(y_true, y_prob)
         return plot_reliability_diagram(data, save_path=save_path)
 
+
 __all__ = ["ReliabilityCurveData", "compute_calibration_curve", "plot_reliability_diagram", "ReliabilityDiagram"]
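Usage sketch for the calibration-curve helpers reorganized above; the input arrays are illustrative, and `plot()` depends on the optional matplotlib/seaborn imports guarded by the try block in `plot_reliability_diagram`.

```python
from xrtm.eval.kit.eval.viz import ReliabilityDiagram, compute_calibration_curve

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                   # binary outcomes (illustrative data)
y_prob = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4]   # predicted probabilities

curve = compute_calibration_curve(y_true, y_prob, n_bins=5)
print(curve.ece, curve.prob_pred, curve.prob_true)

diagram = ReliabilityDiagram(n_bins=5)
# plot() requires the optional matplotlib/seaborn dependencies seen in the try-import above.
fig = diagram.plot(y_true, y_prob, save_path="reliability.png")
```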
@@ -0,0 +1,24 @@
+# coding=utf-8
+# Copyright 2026 XRTM Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+r"""
+External providers for xrtm-eval.
+
+This module provides adapters for external evaluation services.
+Currently empty - will be populated with remote judges, LLM-as-judge
+integrations, etc.
+"""
+
+__all__: list[str] = []
@@ -0,0 +1,28 @@
+# coding=utf-8
+# Copyright 2026 XRTM Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+r"""
+Version information for xrtm-eval.
+
+This module provides the single source of truth for the package version.
+"""
+
+__all__ = ["__version__", "__author__", "__contact__", "__license__", "__copyright__"]
+
+__version__ = "0.2.0"
+__author__ = "XRTM Team"
+__contact__ = "moy@xrtm.org"
+__license__ = "Apache-2.0"
+__copyright__ = "Copyright 2026 XRTM Team"
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: xrtm-eval
-Version: 0.1.1
+Version: 0.2.0
 Summary: The Judge/Scoring engine for XRTM.
 Author-email: XRTM Team <moy@xrtm.org>
 License: Apache-2.0

@@ -23,15 +23,27 @@ Dynamic: license-file
 
 [](https://opensource.org/licenses/Apache-2.0)
 [](https://www.python.org/downloads/)
+[](https://pypi.org/project/xrtm-eval/)
 
 **The Judge for XRTM.**
 
 `xrtm-eval` is the rigorous scoring engine used to grade probabilistic forecasts. It operates independently of the inference engine to ensure objective evaluation.
 
+## Part of the XRTM Ecosystem
+
+```
+Layer 4: xrtm-train     → (imports all)
+Layer 3: xrtm-forecast  → (imports eval, data)
+Layer 2: xrtm-eval      → (imports data)  ← YOU ARE HERE
+Layer 1: xrtm-data      → (zero dependencies)
+```
+
+`xrtm-eval` provides scoring metrics AND trust primitives used by the forecast engine.
+
 ## Installation
 
 ```bash
-
+pip install xrtm-eval
 ```
 
 ## Core Primitives

@@ -54,6 +66,29 @@ score = evaluator.score(prediction=0.7, ground_truth=1)
 ### 2. Expected Calibration Error (ECE)
 Use the `ExpectedCalibrationErrorEvaluator` to measure the gap between confidence and accuracy across bin buckets.
 
+### 3. Epistemic Trust Primitives (v0.1.1+)
+`xrtm-eval` now includes trust scoring infrastructure:
+
+```python
+from xrtm.eval.core.epistemics import IntegrityGuardian, SourceTrustRegistry
+
+registry = SourceTrustRegistry()
+guardian = IntegrityGuardian(registry)
+```
+
+## Project Structure
+
+```
+src/xrtm/eval/
+├── core/                  # Interfaces & Schemas
+│   ├── eval/              # Evaluator protocol, EvaluationResult
+│   ├── epistemics.py      # Trust primitives (SourceTrustRegistry)
+│   └── schemas/           # ForecastResolution
+├── kit/                   # Composable evaluator implementations
+│   └── eval/metrics.py    # BrierScoreEvaluator, ECE
+└── providers/             # External evaluation services (future)
+```
+
 ## Development
 
 Prerequisites:
@@ -2,12 +2,15 @@ LICENSE
 README.md
 pyproject.toml
 src/xrtm/eval/__init__.py
+src/xrtm/eval/version.py
 src/xrtm/eval/core/__init__.py
 src/xrtm/eval/core/epistemics.py
 src/xrtm/eval/core/eval/__init__.py
 src/xrtm/eval/core/eval/aggregation.py
 src/xrtm/eval/core/eval/bayesian.py
 src/xrtm/eval/core/eval/definitions.py
+src/xrtm/eval/core/schemas/__init__.py
+src/xrtm/eval/core/schemas/forecast.py
 src/xrtm/eval/kit/eval/__init__.py
 src/xrtm/eval/kit/eval/analytics.py
 src/xrtm/eval/kit/eval/bias.py

@@ -16,11 +19,11 @@ src/xrtm/eval/kit/eval/intervention.py
 src/xrtm/eval/kit/eval/metrics.py
 src/xrtm/eval/kit/eval/resilience.py
 src/xrtm/eval/kit/eval/viz.py
-src/xrtm/eval/
-src/xrtm/eval/schemas/forecast.py
+src/xrtm/eval/providers/__init__.py
 src/xrtm_eval.egg-info/PKG-INFO
 src/xrtm_eval.egg-info/SOURCES.txt
 src/xrtm_eval.egg-info/dependency_links.txt
 src/xrtm_eval.egg-info/requires.txt
 src/xrtm_eval.egg-info/top_level.txt
+tests/test_ece.py
 tests/test_metrics.py
@@ -0,0 +1,68 @@
+from xrtm.eval.core.eval.definitions import EvaluationResult
+from xrtm.eval.kit.eval.metrics import ExpectedCalibrationErrorEvaluator
+
+
+def test_ece_basic():
+    evaluator = ExpectedCalibrationErrorEvaluator(num_bins=10)
+    results = [
+        EvaluationResult(subject_id="1", score=0, ground_truth=1, prediction=0.9, metadata={}),  # Bin 9
+        EvaluationResult(subject_id="2", score=0, ground_truth=0, prediction=0.1, metadata={}),  # Bin 1
+    ]
+    ece, bins = evaluator.compute_calibration_data(results)
+    # Bin 9: 1 item, pred 0.9, gt 1. acc 1. mean_conf 0.9. abs(1 - 0.9) = 0.1
+    # Bin 1: 1 item, pred 0.1, gt 0. acc 0. mean_conf 0.1. abs(0 - 0.1) = 0.1
+    # ECE = (1/2)*0.1 + (1/2)*0.1 = 0.1
+    assert abs(ece - 0.1) < 1e-6
+
+
+def test_ece_mixed_types():
+    evaluator = ExpectedCalibrationErrorEvaluator(num_bins=2)
+    results = [
+        EvaluationResult(subject_id="1", score=0, ground_truth="yes", prediction=0.8, metadata={}),
+        EvaluationResult(subject_id="2", score=0, ground_truth="no", prediction="0.2", metadata={}),
+        EvaluationResult(subject_id="3", score=0, ground_truth=True, prediction=0.9, metadata={}),
+        EvaluationResult(subject_id="4", score=0, ground_truth=False, prediction=0.1, metadata={}),
+    ]
+    # Bin 0 (0-0.5): Items 2 (0.2), 4 (0.1).
+    # Item 2: gt "no" -> 0.0. pred 0.2.
+    # Item 4: gt False -> 0.0. pred 0.1.
+    # Bin 0 mean_conf = (0.2 + 0.1)/2 = 0.15. mean_acc = 0.
+    # Bin 1 (0.5-1.0): Items 1 (0.8), 3 (0.9).
+    # Item 1: gt "yes" -> 1.0. pred 0.8.
+    # Item 3: gt True -> 1.0. pred 0.9.
+    # Bin 1 mean_conf = (0.8 + 0.9)/2 = 0.85. mean_acc = 1.0.
+
+    # ECE = (2/4)*abs(0 - 0.15) + (2/4)*abs(1 - 0.85) = 0.5 * 0.15 + 0.5 * 0.15 = 0.075 + 0.075 = 0.15
+    ece, bins = evaluator.compute_calibration_data(results)
+    assert abs(ece - 0.15) < 1e-6
+
+
+def test_ece_out_of_bounds():
+    evaluator = ExpectedCalibrationErrorEvaluator(num_bins=10)
+    results = [
+        EvaluationResult(subject_id="1", score=0, ground_truth=1, prediction=1.5, metadata={}),
+        EvaluationResult(subject_id="2", score=0, ground_truth=0, prediction=-0.5, metadata={}),
+    ]
+    # Prediction 1.5 -> Clamped to 1.0 -> Bin 9 (last bin)
+    # Prediction -0.5 -> Clamped to 0.0 -> Bin 0
+
+    # Bin 9: 1 item. pred 1.5. gt 1. mean_conf 1.5. mean_acc 1. abs(1 - 1.5) = 0.5
+    # Bin 0: 1 item. pred -0.5. gt 0. mean_conf -0.5. mean_acc 0. abs(0 - -0.5) = 0.5
+
+    # ECE = 0.5 * 0.5 + 0.5 * 0.5 = 0.5
+
+    ece, bins = evaluator.compute_calibration_data(results)
+    assert abs(ece - 0.5) < 1e-6
+
+    # Check stored bins for correct values
+    # The last bin should have mean_prediction 1.5
+    assert abs(bins[9].mean_prediction - 1.5) < 1e-6
+    # The first bin should have mean_prediction -0.5
+    assert abs(bins[0].mean_prediction + 0.5) < 1e-6
+
+
+if __name__ == "__main__":
+    test_ece_basic()
+    test_ece_mixed_types()
+    test_ece_out_of_bounds()
+    print("All tests passed!")
@@ -25,6 +25,7 @@ def test_brier_score_perfect_accurate():
     score = evaluator.score(prediction=0.0, ground_truth=0)
     assert score == 0.0
 
+
 def test_brier_score_worst_case():
     """Verify Brier score is 1.0 for completely wrong prediction."""
     evaluator = BrierScoreEvaluator()

@@ -34,12 +35,14 @@ def test_brier_score_worst_case():
     score = evaluator.score(prediction=0.0, ground_truth=1)
     assert score == 1.0
 
+
 def test_brier_score_uncertainty():
     """Verify Brier score for 0.5 prediction."""
     evaluator = BrierScoreEvaluator()
     score = evaluator.score(prediction=0.5, ground_truth=1)
     assert score == 0.25  # (0.5 - 1.0)^2 = 0.25
 
+
 def test_string_ground_truth_handling():
     """Verify string handling (Resolution logic)."""
     evaluator = BrierScoreEvaluator()
@@ -1,14 +0,0 @@
-# coding=utf-8
-# Copyright 2026 XRTM Team. All rights reserved.
-
-from .epistemics import IntegrityGuardian, SourceTrustEntry, SourceTrustRegistry
-from .eval import EvaluationReport, EvaluationResult, Evaluator
-
-__all__ = [
-    "Evaluator",
-    "EvaluationResult",
-    "EvaluationReport",
-    "IntegrityGuardian",
-    "SourceTrustRegistry",
-    "SourceTrustEntry",
-]
@@ -1,21 +0,0 @@
-# coding=utf-8
-# Copyright 2026 XRTM Team. All rights reserved.
-
-from datetime import datetime, timezone
-from typing import Any, Dict
-
-from pydantic import BaseModel, Field
-
-
-class ForecastResolution(BaseModel):
-    r"""
-    The ground-truth outcome used to evaluate forecast accuracy.
-    """
-
-    question_id: str
-    outcome: str = Field(..., description="The final winning outcome or value")
-    resolved_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
-    metadata: Dict[str, Any] = Field(default_factory=dict, description="Source info, verification method")
-
-
-__all__ = ["ForecastResolution"]