PyPI - python-flexeval - Versions diffs - 0.2.0__tar.gz → 0.4.0__tar.gz - Mend

python-flexeval 0.2.0__tar.gz → 0.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (142)

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/.github/dependabot.yml RENAMED

@@ -5,4 +5,4 @@ updates:
   directory: "/"
   schedule:
     interval: "weekly"
-  target-branch: "dev"
+  target-branch: "main"

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/CLAUDE.md RENAMED

@@ -59,7 +59,7 @@ FlexEval is a tool for evaluating LLM-powered systems using custom metrics, comp
 ### Core Abstractions
 **EvalRun** (`src/flexeval/schema/evalrun_schema.py`): The top-level execution unit that combines:
-- Data sources (conversations in JSONL format as inputs, an SQLite filepath as output)
+- Data sources (polymorphic via `type` discriminator: `FileDataSource`, `NamedDataSource`, `IterableDataSource`)
 - An Eval specification (metrics to compute)
 - Configuration (workers, database path, etc.)
 - Rubric and function sources
@@ -71,15 +71,18 @@ FlexEval is a tool for evaluating LLM-powered systems using custom metrics, comp
 - Grader LLM (for rubric evaluation)
 - Dependencies between metrics
-**Config** (`src/flexeval/schema/config_schema.py`): Defines how to evaluate (e.g. single- vs multi-process, etc.)
+**Config** (`src/flexeval/schema/config_schema.py`): Defines how to evaluate (e.g. single- vs multi-process, dataset reuse/naming constraints, etc.)
 ### Data Hierarchy
 The evaluation operates at multiple levels of granularity:
+- **Dataset** (`src/flexeval/classes/dataset.py`): Container for loaded data, linked to EvalSetRuns via many-to-many join table (`EvalSetRunDatasets`). Datasets can be reused across multiple eval runs.
 - **Thread**: Full conversation
-- **Turn**: User-assistant exchange pair
+- **Turn**: User-assistant exchange pair
 - **Message**: Individual message from user or assistant
 - **ToolCall**: Function/tool invocation within a message
+Thread, Turn, Message, and ToolCall belong to a Dataset. Metrics belong to both an EvalSetRun and a Dataset.
 ### Key Components
 **Configuration System**:
@@ -89,7 +92,7 @@ The evaluation operates at multiple levels of granularity:
 **Execution Pipeline** (`src/flexeval/runner.py`):
 1. Load configuration and eval specification
-2. Create Dataset from data sources
+2. Create Datasets from data sources and link to EvalSetRun via `EvalSetRunDatasets`
 3. Run EvalRunner to compute metrics
 4. Store results in SQLite database

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/DEVELOPMENT.md RENAMED

@@ -30,7 +30,9 @@ uv sync --upgrade --all-groups
 uv build
 ```
-### Running tests
+### Unit tests
+Unit tests live in `tests/unit/` and are run in CI.
 Run the unit tests:
@@ -46,7 +48,35 @@ To run a specific file's tests:
 uv run python -m unittest tests.unit.{module_name}
 ```
-There are integration tests in tests/integration that can be executed.
+### Integration tests
+Integration tests live in `tests/integration/` and are **not** run in CI.
+Run the integration tests:
+```bash
+uv run python -m unittest tests.integration.functional_tests
+```
+**Prerequisites:**
+- An `.env` file at the repo root with `OPENAI_API_KEY` set
+- Suites with rubric metrics (`TestSuite04`) make **real API calls** to OpenAI (gpt-5.4-nano)
+- Function-only suites (`TestSuite01`, `TestSuite02`, `TestSuite03`) do not require API keys
+- LangGraph-based test suites use pre-generated test data from `tests/resources/langgraph-test-data.db`
+To run only the function-metric suites (no API key required):
+```bash
+uv run python -m unittest tests.integration.functional_tests.TestSuite01 tests.integration.functional_tests.TestSuite02 tests.integration.functional_tests.TestSuite03
+```
+**Regenerating LangGraph test data:**
+The file `tests/resources/langgraph-test-data.db` is pre-generated. To regenerate it (requires `OPENAI_API_KEY`):
+```bash
+uv run python tests/integration/langgraph_data.py
+```
 ### Adding or updating dependencies
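DEVELOPMENT.md says only the rubric suite (`TestSuite04`) needs `OPENAI_API_KEY`. If you add a suite like that, one stdlib way to keep it from erroring on machines without a key is `unittest.skipUnless`. This is a sketch under assumptions: the class names are hypothetical stand-ins, the key is read from the process environment (the real tests expect it in `.env`), and nothing in the diff says the existing suites use this guard.

```python
import os
import unittest

# True only when a non-empty key is present in the environment.
HAS_OPENAI_KEY = bool(os.environ.get("OPENAI_API_KEY"))


class FunctionOnlySuite(unittest.TestCase):
    """Hypothetical stand-in for a function-metric suite: no network access."""

    def test_runs_without_key(self):
        self.assertEqual(len("hello"), 5)


@unittest.skipUnless(HAS_OPENAI_KEY, "OPENAI_API_KEY not set; skipping rubric suite")
class RubricSuite(unittest.TestCase):
    """Hypothetical stand-in for a rubric suite that would call the OpenAI API."""

    def test_needs_key(self):
        self.assertTrue(os.environ["OPENAI_API_KEY"])


def run_suites() -> unittest.TestResult:
    # Load both suites; the rubric one is reported as skipped when no key is set.
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(FunctionOnlySuite))
    suite.addTests(loader.loadTestsFromTestCase(RubricSuite))
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suites()
```

A skipped suite still counts toward `testsRun` but not toward failures, so a key-less CI run stays green while making the gap visible in the skip report.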

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: python-flexeval
-Version: 0.2.0
+Version: 0.4.0
 Summary: FlexEval is a tool for designing custom metrics, completion functions, and LLM-graded rubrics for evaluating the behavior of LLM-powered systems.
 Project-URL: Homepage, https://digitalharborfoundation.github.io/FlexEval/
 Project-URL: GitHub, https://github.com/DigitalHarborFoundation/FlexEval
@@ -21,8 +21,8 @@ Requires-Dist: flatten-json>=0.1.14
 Requires-Dist: jsonschema>=4.23.0
 Requires-Dist: langchain-openai>=0.3.8
 Requires-Dist: langchain>=0.3.20
-Requires-Dist: langgraph-checkpoint-sqlite>=2.0.6
-Requires-Dist: langgraph>=0.3.6
+Requires-Dist: langgraph-checkpoint-sqlite>=3.0.0
+Requires-Dist: langgraph>=1.0.0
 Requires-Dist: litellm>=1.74.3
 Requires-Dist: msgpack>=1.1.0
 Requires-Dist: networkx>=3.4.2

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/docs/user_guide/abstractions.rst RENAMED

@@ -16,7 +16,7 @@ An evaluation is represented by :class:`flexeval.schema.eval_schema.Eval`, and c
 - **Functions**: :class:`~flexeval.schema.eval_schema.FunctionItem`\s apply a Python function to the test data, returning a numeric value.
 - **Rubrics**: :class:`~flexeval.schema.eval_schema.RubricItem`\s use a configured :class:`~flexeval.schema.eval_schema.GraderLlm` function and the provided rubric template to generate a numeric score from an LLM's output.
-You execute an :class:`~flexeval.schema.eval_schema.Eval` by creating an :class:`flexeval.schema.evalrun_schema.EvalRun`.
+You execute an :class:`~flexeval.schema.eval_schema.Eval` by creating an :class:`flexeval.schema.evalrun_schema.EvalRun`.
 EvalRun contains:
 - Data sources (conversations as inputs, an SQLite filepath as output)
@@ -26,11 +26,31 @@ EvalRun contains:
 The :class:`~flexeval.schema.config_schema.Config` includes details about multi-threaded metric computation, about logging, etc.
+Data Sources
+------------
+Data sources can be any of these types:
+- :class:`~flexeval.schema.evalrun_schema.FileDataSource` (``type: file``): Load from a JSONL or LangGraph SQLite file. This is the most common data source.
+- :class:`~flexeval.schema.evalrun_schema.NamedDataSource` (``type: named``): Reference a previously loaded dataset by name, enabling dataset reuse across eval runs.
+- :class:`~flexeval.schema.evalrun_schema.IterableDataSource` (``type: iterable``): Load from an in-memory Python iterable (programmatic use only).
+In YAML configurations, specify the ``type`` field::
+    data_sources:
+      - type: file
+        path: conversations.jsonl
+In Python, the type is set automatically when you construct the appropriate class::
+    data_sources = [FileDataSource(path="conversations.jsonl")]
 Data Hierarchy
 --------------
-Metrics can operate at any of four levels of granularity:
+Data is organized at several levels of granularity:
+- :class:`~flexeval.classes.dataset.Dataset`: A loaded collection of conversations. Datasets can be shared across multiple eval runs.
 - :class:`~flexeval.classes.thread.Thread`: Full conversation
 - :class:`~flexeval.classes.turn.Turn`: Adjacent set of messages from the same user or assistant
 - :class:`~flexeval.classes.message.Message`: Individual message from user or assistant

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/docs/vignettes.rst RENAMED

@@ -13,6 +13,7 @@ These vignettes demonstrate how to use FlexEval.
    generated/vignettes/basic_rubric
    generated/vignettes/custom_rubric
    generated/vignettes/basic_cli
+   generated/vignettes/multiple_configs
    generated/vignettes/metric_analysis

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/pyproject.toml RENAMED

@@ -28,8 +28,8 @@ dependencies = [
     "jsonschema>=4.23.0",
     "langchain>=0.3.20",
     "langchain-openai>=0.3.8",
-    "langgraph>=0.3.6",
-    "langgraph-checkpoint-sqlite>=2.0.6",
+    "langgraph>=1.0.0",
+    "langgraph-checkpoint-sqlite>=3.0.0",
     "litellm>=1.74.3",
     "msgpack>=1.1.0",
     "networkx>=3.4.2",

python_flexeval-0.4.0/src/flexeval/__about__.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.4.0"

python_flexeval-0.4.0/src/flexeval/classes/dataset.py ADDED

@@ -0,0 +1,22 @@
+import logging
+from datetime import datetime
+import peewee as pw
+from flexeval.classes.base import BaseModel
+from flexeval.classes.jsonview import JsonView
+logger = logging.getLogger(__name__)
+class Dataset(BaseModel):
+    """Holds a dataset, e.g. a jsonl file"""
+    id = pw.IntegerField(primary_key=True)
+    timestamp = pw.DateTimeField(default=datetime.now)
+    datasource_type = pw.TextField(null=False)
+    name = pw.TextField(default=None, null=True)
+    notes = pw.TextField(default=None, null=True)
+    is_loaded = pw.BooleanField(default=False)
+    metadata = pw.TextField(default="{}", null=False)
+    metadata_dict = JsonView("metadata")

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/src/flexeval/classes/eval_set_run.py RENAMED

@@ -1,9 +1,9 @@
-import json
 from datetime import datetime
 import peewee as pw
 from flexeval.classes.base import BaseModel
+from flexeval.classes.dataset import Dataset
 class EvalSetRun(BaseModel):
@@ -12,7 +12,6 @@ class EvalSetRun(BaseModel):
     id = pw.IntegerField(primary_key=True)
     name = pw.CharField(null=True)
     notes = pw.TextField(null=True)
-    dataset_files = pw.TextField()  # JSON string
     metrics = pw.TextField()
     metrics_graph_ordered_list = pw.TextField()
     do_completion = pw.BooleanField()
@@ -25,8 +24,20 @@ class EvalSetRun(BaseModel):
         default=datetime.now
     )  # Automatically set to current date and time
-    def get_datasets(self) -> list[str]:
-        # TODO Turn these into DataSource instances instead, returning list[DataSource]
-        temp = json.loads(self.dataset_files)
-        assert isinstance(temp, list), "The `data` entry in evals.yaml must be a list."
-        return temp
+    @property
+    def dataset_list(self) -> list[Dataset]:
+        """Returns the actual Dataset objects linked to this EvalSetRun via the join table."""
+        return list(
+            Dataset.select()
+            .join(EvalSetRunDatasets)
+            .where(EvalSetRunDatasets.evalsetrun == self)
+        )
+class EvalSetRunDatasets(BaseModel):
+    """Datasets used by an EvalSetRun."""
+    id = pw.IntegerField(primary_key=True)
+    timestamp = pw.DateTimeField(default=datetime.now)
+    evalsetrun = pw.ForeignKeyField(EvalSetRun, backref="dataset_links")
+    dataset = pw.ForeignKeyField(Dataset, backref="evalsetrun_links")
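`EvalSetRun.dataset_list` selects Datasets through the `EvalSetRunDatasets` link table. The same many-to-many query in plain SQL, using the stdlib `sqlite3` module, looks roughly like this. Table and column names are assumptions chosen to resemble what peewee would generate for these models (peewee names a ForeignKeyField column `<field>_id` by default); they are not read from a real FlexEval database.

```python
import sqlite3

# Hypothetical schema mirroring Dataset, EvalSetRun, and EvalSetRunDatasets.
SCHEMA = """
CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE evalsetrun (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE evalsetrundatasets (
    id INTEGER PRIMARY KEY,
    evalsetrun_id INTEGER NOT NULL REFERENCES evalsetrun(id),
    dataset_id INTEGER NOT NULL REFERENCES dataset(id)
);
"""


def datasets_for_run(conn: sqlite3.Connection, run_id: int) -> list[str]:
    """SQL analogue of EvalSetRun.dataset_list: join through the link table."""
    rows = conn.execute(
        """
        SELECT dataset.name
        FROM dataset
        JOIN evalsetrundatasets ON evalsetrundatasets.dataset_id = dataset.id
        WHERE evalsetrundatasets.evalsetrun_id = ?
        ORDER BY dataset.id
        """,
        (run_id,),
    )
    return [name for (name,) in rows]


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dataset (id, name) VALUES (?, ?)",
                     [(1, "baseline"), (2, "followup")])
    conn.executemany("INSERT INTO evalsetrun (id, name) VALUES (?, ?)",
                     [(10, "run-a"), (11, "run-b")])
    # Dataset 1 is reused by both runs; dataset 2 is linked only to run-b.
    conn.executemany(
        "INSERT INTO evalsetrundatasets (evalsetrun_id, dataset_id) VALUES (?, ?)",
        [(10, 1), (11, 1), (11, 2)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print(datasets_for_run(conn, 10))  # ['baseline']
    print(datasets_for_run(conn, 11))  # ['baseline', 'followup']
```

Because the link lives in its own table, reusing a dataset in a new run adds one row there instead of copying its threads, turns, and messages.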

python_flexeval-0.4.0/src/flexeval/classes/jsonview.py ADDED

@@ -0,0 +1,112 @@
+import json
+from collections import UserDict
+class JsonViewDict(UserDict):
+    """Dictionary that syncs changes back to the model field."""
+    def __init__(
+        self,
+        model_instance,
+        text_field_attr_name,
+        json_dumps_fn=json.dumps,
+        json_loads_fn=json.loads,
+    ):
+        self.model_instance = model_instance
+        self.text_field_attr_name = text_field_attr_name
+        self.json_dumps_fn = json_dumps_fn
+        self.json_loads_fn = json_loads_fn
+        text_value = getattr(model_instance, text_field_attr_name)
+        initial_data = self.json_loads_fn(text_value)
+        super().__init__(initial_data)
+    def _sync_to_model(self):
+        """Sync the current data back to the model field."""
+        json_str = self.json_dumps_fn(self.data)
+        setattr(self.model_instance, self.text_field_attr_name, json_str)
+    # Override mutating methods to trigger sync
+    def __setitem__(self, key, value):
+        super().__setitem__(key, value)
+        self._sync_to_model()
+    def __delitem__(self, key):
+        super().__delitem__(key)
+        self._sync_to_model()
+    def clear(self):
+        super().clear()
+        self._sync_to_model()
+    def pop(self, key, *args):
+        result = super().pop(key, *args)
+        self._sync_to_model()
+        return result
+    def popitem(self):
+        result = super().popitem()
+        self._sync_to_model()
+        return result
+    def setdefault(self, key, default=None):
+        result = super().setdefault(key, default)
+        self._sync_to_model()
+        return result
+    def update(self, *args, **kwargs):
+        super().update(*args, **kwargs)
+        self._sync_to_model()
+    def refresh_from_model(self):
+        """If the text attribute has been mutated in the model, this method brings the view back in sync.
+        If you're going to use the JsonView, avoid mutating the text attribute directly.
+        """
+        text_value = getattr(self.model_instance, self.text_field_attr_name)
+        self.update(self.json_loads_fn(text_value))
+class JsonView:
+    """Descriptor that provides dict-like access to a JSON text field.
+    Example:
+    class SomeModel(pw.Model):
+        some_field = pw.TextField(default="{}")
+        some_field_dict = JsonView(text_field_attr_name="some_field")
+    """
+    def __init__(self, text_field_attr_name):
+        self.text_field_attr_name = text_field_attr_name
+        self.attr_name = None
+    def __set_name__(self, owner, name):
+        """Called when the descriptor is assigned to a class attribute."""
+        self.attr_name = f"_{name}_dict"
+    def __get__(self, instance, owner) -> JsonViewDict:
+        if instance is None:
+            return self
+        # Check if we already have a cached JsonViewDict
+        if not hasattr(instance, self.attr_name):
+            if not hasattr(instance, self.text_field_attr_name):
+                raise ValueError(
+                    f"Failed to link this JsonView to field '{self.text_field_attr_name}' because it doesn't exist on this model instance."
+                )
+            # Cache a new JsonViewDict
+            json_dict = JsonViewDict(instance, self.text_field_attr_name)
+            setattr(instance, self.attr_name, json_dict)
+        return getattr(instance, self.attr_name)
+    def __set__(self, instance, value):
+        """Allow setting the entire dict."""
+        if isinstance(value, dict):
+            json_dict = JsonViewDict(instance, self.text_field_attr_name)
+            json_dict.update(value)
+            setattr(instance, self.attr_name, json_dict)
+        else:
+            raise ValueError(
+                f"This JsonView must be a dictionary to set linked field '{self.text_field_attr_name}' correctly."
+            )
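The descriptor and `UserDict` pairing above is easier to see outside peewee. The sketch below is a simplified re-implementation, not FlexEval API: `SyncedDict`, `JsonField`, and `Record` are hypothetical names, and a plain object stands in for a model. It keeps the two ideas the published code relies on: one cached view per instance, and a JSON write-back after every mutation.

```python
import json
from collections import UserDict


class SyncedDict(UserDict):
    """Simplified JsonViewDict: writes JSON back to the owner after each change."""

    def __init__(self, owner, field_name):
        self._owner = owner
        self._field = field_name
        super().__init__(json.loads(getattr(owner, field_name)))

    def _sync(self):
        setattr(self._owner, self._field, json.dumps(self.data))

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._sync()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._sync()


class JsonField:
    """Simplified JsonView descriptor: caches one SyncedDict per instance."""

    def __init__(self, field_name):
        self.field_name = field_name

    def __set_name__(self, owner, name):
        self.cache_name = f"_{name}_cache"

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if not hasattr(instance, self.cache_name):
            setattr(instance, self.cache_name, SyncedDict(instance, self.field_name))
        return getattr(instance, self.cache_name)


class Record:
    """Plain-object stand-in for a peewee model with a JSON text column."""

    metadata_dict = JsonField("metadata")

    def __init__(self, metadata: str = "{}"):
        self.metadata = metadata


if __name__ == "__main__":
    rec = Record()
    rec.metadata_dict["source"] = "pilot"
    print(rec.metadata)  # {"source": "pilot"}
```

The published `JsonViewDict` also overrides `pop`, `popitem`, `clear`, `setdefault`, and `update` explicitly. In this sketch those methods are inherited from `collections.abc.MutableMapping` through `UserDict`, and they already route through `__setitem__`/`__delitem__`, so two overrides suffice. As the `refresh_from_model` docstring warns, assigning to the text attribute directly leaves the cached view stale.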

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/src/flexeval/classes/message.py RENAMED

@@ -7,9 +7,9 @@ from playhouse.shortcuts import model_to_dict
 from flexeval.classes.base import BaseModel
 from flexeval.classes.dataset import Dataset
-from flexeval.classes.eval_set_run import EvalSetRun
 from flexeval.classes.thread import Thread
 from flexeval.classes.turn import Turn
+from flexeval.classes.jsonview import JsonView
 from flexeval.configuration import completion_functions
 logger = logging.getLogger(__name__)
@@ -23,7 +23,6 @@ class Message(BaseModel):
     id = pw.IntegerField(primary_key=True)
-    evalsetrun = pw.ForeignKeyField(EvalSetRun, backref="messages")
     dataset = pw.ForeignKeyField(Dataset, backref="messages")
     thread = pw.ForeignKeyField(Thread, backref="messages")
     index_in_thread = pw.IntegerField()
@@ -34,6 +33,10 @@ class Message(BaseModel):
     content = pw.TextField()
     context = pw.TextField(null=True)  # Previous messages
+    # metadata
+    metadata = pw.TextField(default="{}", null=False)
+    metadata_dict = JsonView("metadata")
     # helpers
     system_prompt = pw.TextField(null=True)
     is_flexeval_completion = pw.BooleanField(null=True)
@@ -66,10 +69,18 @@ class Message(BaseModel):
         super().__init__(**kwargs)
         self.metrics_to_evaluate = []
-    def get_completion(self, include_system_prompt=False):
+    def get_completion(
+        self,
+        include_system_prompt=False,
+        completion_config: dict | None = None,
+        evalsetrun=None,
+    ):
         # only get a completion if this is the final turn - we probably don't want to branch from mid-conversation
         if self.is_final_turn_in_input:
-            completion_config = json.loads(self.evalsetrun.completion_llm)
+            if completion_config is None:
+                raise ValueError(
+                    "completion_config must be provided to get_completion()"
+                )
             completion_fn_name = completion_config.get("function_name", None)
             completion_function_kwargs = completion_config.get("kwargs", None)
@@ -99,7 +110,7 @@ class Message(BaseModel):
             # which generally means it'll have a structure like this
             # {"choices": [{"message": {"content": "hi", "role": "assistant"}}]}
             result = model_to_dict(self, exclude=[self.id])
-            result["evalsetrun"] = self.evalsetrun
+            result["evalsetrun"] = evalsetrun
             result["dataset"] = self.dataset
             result["datasetrow"] = self.datasetrow
             result["turn_number"] = self.turn_number + 1
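Because `Message.get_completion` and `Turn.get_completion` no longer read `self.evalsetrun.completion_llm`, callers must now decode the completion config and pass it in, along with the `evalsetrun` that gets copied into the result. A minimal sketch of that contract follows, assuming the run still stores its config as a JSON string. `CompletionTarget` and `complete_all` are hypothetical, and the return value is a placeholder rather than a real completion.

```python
import json
from typing import Any, Optional


class CompletionTarget:
    """Hypothetical stand-in for Message/Turn after the 0.4.0 signature change."""

    def __init__(self, is_final_turn_in_input: bool):
        self.is_final_turn_in_input = is_final_turn_in_input

    def get_completion(
        self, completion_config: Optional[dict] = None, evalsetrun: Any = None
    ) -> Optional[tuple]:
        # Mid-conversation turns are never completed.
        if not self.is_final_turn_in_input:
            return None
        # The config is no longer looked up on the row itself, so fail fast.
        if completion_config is None:
            raise ValueError("completion_config must be provided to get_completion()")
        # Placeholder: the real method invokes the configured completion function.
        return completion_config.get("function_name"), completion_config.get("kwargs")


def complete_all(targets, completion_llm_json: str, evalsetrun: Any = None) -> list:
    # Decode the stored JSON once per run instead of once per message.
    config = json.loads(completion_llm_json)
    return [
        t.get_completion(completion_config=config, evalsetrun=evalsetrun)
        for t in targets
    ]
```

Passing the config explicitly is what lets these rows drop their `evalsetrun` foreign key: a Message or Turn no longer needs to know which run it is being evaluated under.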

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/src/flexeval/classes/metric.py RENAMED

@@ -37,14 +37,6 @@ class Metric(BaseModel):
         null=True
     )  # necessary if rubric result is INVALID or e.g. latency doesn't apply to the very first message
     kwargs = pw.TextField()
-    # context_only allows us to create another kind of dependency
-    # where we can quantify something about the previous conversation
-    # and then use that quantity in a downstream analysis
-    # e.g. 'would a plot be pedagogically appropriate here' is really a question about the PAST of the conversation
-    #      NOTE: but we have gotten rid of context_only for rubrics, where only {context} is used so technically here 'context_only' is False
-    # or 'was the conversation ever flagged by the moderation api' would be a question about the previous turns that might
-    #    allow to have better context for the properties of this turn
-    # context_only = pw.BooleanField(default=False)
     source = pw.TextField()  # TODO - make another table for this? But maybe not, because this also contains filled-in rubrics
     depends_on = pw.TextField()
     rubric_prompt = pw.TextField(null=True)

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/src/flexeval/classes/thread.py RENAMED

@@ -2,7 +2,7 @@ import peewee as pw
 from flexeval.classes.base import BaseModel
 from flexeval.classes.dataset import Dataset
-from flexeval.classes.eval_set_run import EvalSetRun
+from flexeval.classes.jsonview import JsonView
 class Thread(BaseModel):
@@ -12,7 +12,6 @@ class Thread(BaseModel):
     id = pw.IntegerField(primary_key=True)
     dataset = pw.ForeignKeyField(Dataset, backref="threads")
-    evalsetrun = pw.ForeignKeyField(EvalSetRun, backref="threads")
     langgraph_thread_id = pw.TextField(null=True)
     eval_run_thread_id = pw.TextField(null=True)
@@ -20,6 +19,9 @@ class Thread(BaseModel):
     system_prompt = pw.TextField(null=True)
+    metadata = pw.TextField(default="{}", null=False)
+    metadata_dict = JsonView("metadata")
     def __init__(self, **kwargs):
         super().__init__(**kwargs)
         self.metrics_to_evaluate = []

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/src/flexeval/classes/tool_call.py RENAMED

@@ -2,7 +2,6 @@ import peewee as pw
 from flexeval.classes.base import BaseModel
 from flexeval.classes.dataset import Dataset
-from flexeval.classes.eval_set_run import EvalSetRun
 from flexeval.classes.message import Message
 from flexeval.classes.thread import Thread
 from flexeval.classes.turn import Turn
@@ -16,7 +15,6 @@ class ToolCall(BaseModel):
     id = pw.IntegerField(primary_key=True)
-    evalsetrun = pw.ForeignKeyField(EvalSetRun, backref="toolcalls")
     dataset = pw.ForeignKeyField(Dataset, backref="toolcalls")
     thread = pw.ForeignKeyField(Thread, backref="toolcalls")
     message = pw.ForeignKeyField(Message, backref="toolcalls")

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/src/flexeval/classes/turn.py RENAMED

@@ -7,7 +7,6 @@ from playhouse.shortcuts import model_to_dict
 from flexeval.classes.base import BaseModel
 from flexeval.classes.dataset import Dataset
-from flexeval.classes.eval_set_run import EvalSetRun
 from flexeval.classes.thread import Thread
 from flexeval.configuration import completion_functions
@@ -22,7 +21,6 @@ class Turn(BaseModel):
     id = pw.IntegerField(primary_key=True)
-    evalsetrun = pw.ForeignKeyField(EvalSetRun, backref="turns")
     dataset = pw.ForeignKeyField(Dataset, backref="turns")
     thread = pw.ForeignKeyField(Thread, backref="turns")
     index_in_thread = pw.IntegerField()
@@ -32,10 +30,13 @@ class Turn(BaseModel):
         super().__init__(**kwargs)
         self.metrics_to_evaluate = []
-    def get_completion(self):
+    def get_completion(self, completion_config: dict | None = None, evalsetrun=None):
         # only get a completion if this is the final turn - we probably don't want to branch from mid-conversation
         if self.is_final_turn_in_input:
-            completion_config = json.loads(self.evalsetrun.completion_llm)
+            if completion_config is None:
+                raise ValueError(
+                    "completion_config must be provided to get_completion()"
+                )
             completion_fn_name = completion_config.get("function_name", None)
             completion_function_kwargs = completion_config.get("kwargs", None)
@@ -69,7 +70,7 @@ class Turn(BaseModel):
             #       - make the completion function just return content?
             # {"choices": [{"message": {"content": "hi", "role": "assistant"}}]}
             result = model_to_dict(self, exclude=[self.id])
-            result["evalsetrun"] = self.evalsetrun
+            result["evalsetrun"] = evalsetrun
             result["dataset"] = self.dataset
             result["datasetrow"] = self.datasetrow
             result["turn_number"] = self.turn_number + 1
@@ -108,6 +109,7 @@ class Turn(BaseModel):
         """
         context = ""
         for message in self.messages:
+            # TODO why not just use message.get_context(include_system_prompt=include_system_prompt) here?
             context = message.context
             break
         context = json.loads(context)

{python_flexeval-0.2.0 → python_flexeval-0.4.0}/src/flexeval/completions.py RENAMED

@@ -55,10 +55,15 @@ def get_completion(turn: classes.turn.Turn, completion_llm: CompletionLlm):
     return completion
-def get_completions(eval_run: EvalRun, evalsetrun: classes.eval_set_run.EvalSetRun):
+def get_completions(
+    eval_run: EvalRun,
+    evalsetrun: classes.eval_set_run.EvalSetRun,
+    datasets: list[classes.dataset.Dataset],
+):
     n_workers = eval_run.config.max_workers
+    threads = [thread for dataset in datasets for thread in dataset.threads]
     if n_workers == 1:
-        for thread in evalsetrun.threads:
+        for thread in threads:
             # select last turn in thread
             if len(thread.turns) == 0:
                 continue
@@ -75,7 +80,7 @@ def get_completions(eval_run: EvalRun, evalsetrun: classes.eval_set_run.EvalSetR
     else:
         with ThreadPoolExecutor(max_workers=n_workers) as executor:
             futures: dict[Future, classes.turn.Turn] = {}
-            for thread in evalsetrun.threads:
+            for thread in threads:
                 if len(thread.turns) == 0:
                     continue
                 turn = (
@@ -113,7 +118,6 @@ def save_completion(
         new_turn = turn
     else:
         new_turn = classes.turn.Turn.create(
-            evalsetrun=evalsetrun,
             dataset=turn.dataset,
             thread=turn.thread,
             index_in_thread=turn.index_in_thread + 1,
@@ -129,7 +133,6 @@ def save_completion(
         {"role": prev_message.role, "content": prev_message.content}
     )
     classes.message.Message.create(
-        evalsetrun=evalsetrun,
         dataset=turn.dataset,
         thread=turn.thread,
         turn=new_turn,
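In 0.4.0, `get_completions` gathers threads from every dataset passed in rather than from `evalsetrun.threads`, then completes the last turn of each non-empty thread, serially when `max_workers` is 1 and on a `ThreadPoolExecutor` otherwise. The sketch below reproduces that control flow with hypothetical in-memory stand-ins and a placeholder `complete` function. The expression that selects the last turn is truncated in the diff, so choosing the highest `index_in_thread` is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


# Hypothetical stand-ins for Dataset/Thread/Turn with only the fields used here.
@dataclass
class Turn:
    index_in_thread: int
    text: str


@dataclass
class Thread:
    turns: list[Turn] = field(default_factory=list)


@dataclass
class Dataset:
    threads: list[Thread] = field(default_factory=list)


def complete(turn: Turn) -> str:
    """Placeholder for a real LLM completion call."""
    return f"reply to {turn.text!r}"


def get_completions(datasets: list[Dataset], n_workers: int = 1) -> list[str]:
    # Flatten threads across every dataset linked to the run.
    threads = [thread for dataset in datasets for thread in dataset.threads]
    # Skip empty threads; complete only the final turn of each remaining one.
    last_turns = [
        max(thread.turns, key=lambda t: t.index_in_thread)
        for thread in threads
        if thread.turns
    ]
    if n_workers == 1:
        return [complete(turn) for turn in last_turns]
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # Keep futures in submission order so results line up with the threads.
        futures = [executor.submit(complete, turn) for turn in last_turns]
        return [future.result() for future in futures]
```

Collecting futures in a list keeps the output aligned with the input threads. The published code instead keys a dict by `Future`, presumably so each finished completion can be saved against the turn it came from.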
