lmnr 0.4.12b4__tar.gz → 0.4.14__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {lmnr-0.4.12b4 → lmnr-0.4.14}/PKG-INFO +73 -105
- {lmnr-0.4.12b4 → lmnr-0.4.14}/README.md +72 -105
- {lmnr-0.4.12b4 → lmnr-0.4.14}/pyproject.toml +2 -2
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/sdk/evaluations.py +56 -49
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/sdk/laminar.py +20 -96
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/sdk/types.py +2 -9
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/decorators/base.py +14 -4
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tracing/attributes.py +1 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tracing/tracing.py +15 -1
- {lmnr-0.4.12b4 → lmnr-0.4.14}/LICENSE +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/__init__.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/cli.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/sdk/__init__.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/sdk/decorators.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/sdk/log.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/sdk/utils.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/.flake8 +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/.python-version +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/__init__.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/config/__init__.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/decorators/__init__.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/instruments.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/__init__.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_association_properties/test_langchain_and_external_association_properties.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_association_properties/test_langchain_association_properties.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_manual/test_manual_report.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_manual/test_resource_attributes.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_privacy_no_prompts/test_simple_workflow.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_prompt_management/test_prompt_management.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_sdk_initialization/test_resource_attributes.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_tasks/test_task_io_serialization_with_langchain.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_workflows/test_simple_aworkflow.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_workflows/test_simple_workflow.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/cassettes/test_workflows/test_streaming_workflow.yaml +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/conftest.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/test_association_properties.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/test_manual.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/test_nested_tasks.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/test_privacy_no_prompts.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/test_sdk_initialization.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/test_tasks.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tests/test_workflows.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tracing/__init__.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tracing/content_allow_list.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/tracing/context_manager.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/utils/__init__.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/utils/in_memory_span_exporter.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/utils/json_encoder.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/utils/package_check.py +0 -0
- {lmnr-0.4.12b4 → lmnr-0.4.14}/src/lmnr/traceloop_sdk/version.py +0 -0
--- lmnr-0.4.12b4/PKG-INFO
+++ lmnr-0.4.14/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: lmnr
-Version: 0.4.12b4
+Version: 0.4.14
 Summary: Python SDK for Laminar AI
 License: Apache-2.0
 Author: lmnr.ai
@@ -59,63 +59,37 @@ Description-Content-Type: text/markdown
 
 # Laminar Python
 
-
+Python SDK for [Laminar](https://www.lmnr.ai).
+
+[Laminar](https://www.lmnr.ai) is an open-source platform for engineering LLM products. Trace, evaluate, annotate, and analyze LLM data. Bring LLM applications to production with confidence.
+
+Check our [open-source repo](https://github.com/lmnr-ai/lmnr) and don't forget to star it ⭐
 
 <a href="https://pypi.org/project/lmnr/">  </a>
 
 
 
 
-
 ## Quickstart
 
 First, install the package:
 
 ```sh
-python3 -m venv .myenv
-source .myenv/bin/activate # or use your favorite env management tool
-
 pip install lmnr
 ```
 
-
+And then in the code
 
 ```python
-import os
-from openai import OpenAI
 from lmnr import Laminar as L
 
-L.initialize(
-    project_api_key=os.environ["LMNR_PROJECT_API_KEY"],
-)
-
-client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
-
-def poem_writer(topic: str):
-    prompt = f"write a poem about {topic}"
-
-    # OpenAI calls are automatically instrumented
-    response = client.chat.completions.create(
-        model="gpt-4o",
-        messages=[
-            {"role": "system", "content": "You are a helpful assistant."},
-            {"role": "user", "content": prompt},
-        ],
-    )
-    poem = response.choices[0].message.content
-    return poem
-
-if __name__ == "__main__":
-    print(poem_writer("laminar flow"))
-
+L.initialize(project_api_key="<PROJECT_API_KEY>")
 ```
 
-
-
-### Project API key
+This will automatically instrument most of the LLM, Vector DB, and related
+calls with OpenTelemetry-compatible instrumentation.
 
-
-You can either pass it to `.initialize()` or set it to `.env` at the root of your package with the key `LMNR_PROJECT_API_KEY`.
+Note that you need to only initialize Laminar once in your application.
 
 ## Instrumentation
 
@@ -224,6 +198,68 @@ L.event("topic alignment", topic in poem)
 L.evaluate_event("excessive_wordiness", "check_wordy", {"text_input": poem})
 ```
 
+## Evaluations
+
+### Quickstart
+
+Install the package:
+
+```sh
+pip install lmnr
+```
+
+Create a file named `my_first_eval.py` with the following code:
+
+```python
+from lmnr import evaluate
+
+def write_poem(data):
+    return f"This is a good poem about {data['topic']}"
+
+def contains_poem(output, target):
+    return 1 if output in target['poem'] else 0
+
+# Evaluation data
+data = [
+    {"data": {"topic": "flowers"}, "target": {"poem": "This is a good poem about flowers"}},
+    {"data": {"topic": "cars"}, "target": {"poem": "I like cars"}},
+]
+
+evaluate(
+    data=data,
+    executor=write_poem,
+    evaluators={
+        "containsPoem": contains_poem
+    },
+    group_id="my_first_feature"
+)
+```
+
+Run the following commands:
+
+```sh
+export LMNR_PROJECT_API_KEY=<YOUR_PROJECT_API_KEY> # get from Laminar project settings
+lmnr eval my_first_eval.py # run in the virtual environment where lmnr is installed
+```
+
+Visit the URL printed in the console to see the results.
+
+### Overview
+
+Bring rigor to the development of your LLM applications with evaluations.
+
+You can run evaluations locally by providing executor (part of the logic used in your application) and evaluators (numeric scoring functions) to `evaluate` function.
+
+`evaluate` takes in the following parameters:
+- `data` – an array of `EvaluationDatapoint` objects, where each `EvaluationDatapoint` has two keys: `target` and `data`, each containing a key-value object. Alternatively, you can pass in dictionaries, and we will instantiate `EvaluationDatapoint`s with pydantic if possible
+- `executor` – the logic you want to evaluate. This function must take `data` as the first argument, and produce any output. It can be both a function or an `async` function.
+- `evaluators` – Dictionary which maps evaluator names to evaluators. Functions that take output of executor as the first argument, `target` as the second argument and produce a numeric scores. Each function can produce either a single number or `dict[str, int|float]` of scores. Each evaluator can be both a function or an `async` function.
+- `name` – optional name for the evaluation. Automatically generated if not provided.
+
+\* If you already have the outputs of executors you want to evaluate, you can specify the executor as an identity function, that takes in `data` and returns only needed value(s) from it.
+
+[Read docs](https://docs.lmnr.ai/evaluations/introduction) to learn more about evaluations.
+
 ## Laminar pipelines as prompt chain managers
 
 You can create Laminar pipelines in the UI and manage chains of LLM calls there.
@@ -258,71 +294,3 @@ PipelineRunResponse(
 )
 ```
 
-
-## Running offline evaluations on your data
-
-You can evaluate your code with your own data and send it to Laminar using the `Evaluation` class.
-
-Evaluation takes in the following parameters:
-- `name` – the name of your evaluation. If no such evaluation exists in the project, it will be created. Otherwise, data will be pushed to the existing evaluation
-- `data` – an array of `EvaluationDatapoint` objects, where each `EvaluationDatapoint` has two keys: `target` and `data`, each containing a key-value object. Alternatively, you can pass in dictionaries, and we will instantiate `EvaluationDatapoint`s with pydantic if possible
-- `executor` – the logic you want to evaluate. This function must take `data` as the first argument, and produce any output. *
-- `evaluators` – evaluaton logic. Functions that take output of executor as the first argument, `target` as the second argument and produce a numeric scores. Pass a dict from evaluator name to a function. Each function can produce either a single number or `dict[str, int|float]` of scores.
-
-\* If you already have the outputs of executors you want to evaluate, you can specify the executor as an identity function, that takes in `data` and returns only needed value(s) from it.
-
-### Example code
-
-```python
-from lmnr import evaluate
-from openai import AsyncOpenAI
-import asyncio
-import os
-
-openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
-
-async def get_capital(data):
-    country = data["country"]
-    response = await openai_client.chat.completions.create(
-        model="gpt-4o-mini",
-        messages=[
-            {"role": "system", "content": "You are a helpful assistant."},
-            {
-                "role": "user",
-                "content": f"What is the capital of {country}? Just name the "
-                "city and nothing else",
-            },
-        ],
-    )
-    return response.choices[0].message.content.strip()
-
-
-# Evaluation data
-data = [
-    {"data": {"country": "Canada"}, "target": {"capital": "Ottawa"}},
-    {"data": {"country": "Germany"}, "target": {"capital": "Berlin"}},
-    {"data": {"country": "Tanzania"}, "target": {"capital": "Dodoma"}},
-]
-
-
-def correctness(output, target):
-    return 1 if output == target["capital"] else 0
-
-# Create an Evaluation instance
-e = evaluate(
-    name="my-evaluation",
-    data=data,
-    executor=get_capital,
-    evaluators={"correctness": correctness},
-    project_api_key=os.environ["LMNR_PROJECT_API_KEY"],
-)
-```
-
-### Running from CLI.
-
-1. Make sure `lmnr` is installed in a venv. CLI does not work with a global env
-1. Run `lmnr path/to/my/eval.py`
-
-### Running from code
-
-Simply execute the function, e.g. `python3 path/to/my/eval.py`
--- lmnr-0.4.12b4/README.md
+++ lmnr-0.4.14/README.md
@@ -1,62 +1,36 @@
 # Laminar Python
 
-
+Python SDK for [Laminar](https://www.lmnr.ai).
+
+[Laminar](https://www.lmnr.ai) is an open-source platform for engineering LLM products. Trace, evaluate, annotate, and analyze LLM data. Bring LLM applications to production with confidence.
+
+Check our [open-source repo](https://github.com/lmnr-ai/lmnr) and don't forget to star it ⭐
 
 <a href="https://pypi.org/project/lmnr/">  </a>
 
 
 
 
-
 ## Quickstart
 
 First, install the package:
 
 ```sh
-python3 -m venv .myenv
-source .myenv/bin/activate # or use your favorite env management tool
-
 pip install lmnr
 ```
 
-
+And then in the code
 
 ```python
-import os
-from openai import OpenAI
 from lmnr import Laminar as L
 
-L.initialize(
-    project_api_key=os.environ["LMNR_PROJECT_API_KEY"],
-)
-
-client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
-
-def poem_writer(topic: str):
-    prompt = f"write a poem about {topic}"
-
-    # OpenAI calls are automatically instrumented
-    response = client.chat.completions.create(
-        model="gpt-4o",
-        messages=[
-            {"role": "system", "content": "You are a helpful assistant."},
-            {"role": "user", "content": prompt},
-        ],
-    )
-    poem = response.choices[0].message.content
-    return poem
-
-if __name__ == "__main__":
-    print(poem_writer("laminar flow"))
-
+L.initialize(project_api_key="<PROJECT_API_KEY>")
 ```
 
-
-
-### Project API key
+This will automatically instrument most of the LLM, Vector DB, and related
+calls with OpenTelemetry-compatible instrumentation.
 
-
-You can either pass it to `.initialize()` or set it to `.env` at the root of your package with the key `LMNR_PROJECT_API_KEY`.
+Note that you need to only initialize Laminar once in your application.
 
 ## Instrumentation
 
@@ -165,6 +139,68 @@ L.event("topic alignment", topic in poem)
 L.evaluate_event("excessive_wordiness", "check_wordy", {"text_input": poem})
 ```
 
+## Evaluations
+
+### Quickstart
+
+Install the package:
+
+```sh
+pip install lmnr
+```
+
+Create a file named `my_first_eval.py` with the following code:
+
+```python
+from lmnr import evaluate
+
+def write_poem(data):
+    return f"This is a good poem about {data['topic']}"
+
+def contains_poem(output, target):
+    return 1 if output in target['poem'] else 0
+
+# Evaluation data
+data = [
+    {"data": {"topic": "flowers"}, "target": {"poem": "This is a good poem about flowers"}},
+    {"data": {"topic": "cars"}, "target": {"poem": "I like cars"}},
+]
+
+evaluate(
+    data=data,
+    executor=write_poem,
+    evaluators={
+        "containsPoem": contains_poem
+    },
+    group_id="my_first_feature"
+)
+```
+
+Run the following commands:
+
+```sh
+export LMNR_PROJECT_API_KEY=<YOUR_PROJECT_API_KEY> # get from Laminar project settings
+lmnr eval my_first_eval.py # run in the virtual environment where lmnr is installed
+```
+
+Visit the URL printed in the console to see the results.
+
+### Overview
+
+Bring rigor to the development of your LLM applications with evaluations.
+
+You can run evaluations locally by providing executor (part of the logic used in your application) and evaluators (numeric scoring functions) to `evaluate` function.
+
+`evaluate` takes in the following parameters:
+- `data` – an array of `EvaluationDatapoint` objects, where each `EvaluationDatapoint` has two keys: `target` and `data`, each containing a key-value object. Alternatively, you can pass in dictionaries, and we will instantiate `EvaluationDatapoint`s with pydantic if possible
+- `executor` – the logic you want to evaluate. This function must take `data` as the first argument, and produce any output. It can be both a function or an `async` function.
+- `evaluators` – Dictionary which maps evaluator names to evaluators. Functions that take output of executor as the first argument, `target` as the second argument and produce a numeric scores. Each function can produce either a single number or `dict[str, int|float]` of scores. Each evaluator can be both a function or an `async` function.
+- `name` – optional name for the evaluation. Automatically generated if not provided.
+
+\* If you already have the outputs of executors you want to evaluate, you can specify the executor as an identity function, that takes in `data` and returns only needed value(s) from it.
+
+[Read docs](https://docs.lmnr.ai/evaluations/introduction) to learn more about evaluations.
+
 ## Laminar pipelines as prompt chain managers
 
 You can create Laminar pipelines in the UI and manage chains of LLM calls there.
@@ -198,72 +234,3 @@ PipelineRunResponse(
     run_id='53b012d5-5759-48a6-a9c5-0011610e3669'
 )
 ```
-
-
-## Running offline evaluations on your data
-
-You can evaluate your code with your own data and send it to Laminar using the `Evaluation` class.
-
-Evaluation takes in the following parameters:
-- `name` – the name of your evaluation. If no such evaluation exists in the project, it will be created. Otherwise, data will be pushed to the existing evaluation
-- `data` – an array of `EvaluationDatapoint` objects, where each `EvaluationDatapoint` has two keys: `target` and `data`, each containing a key-value object. Alternatively, you can pass in dictionaries, and we will instantiate `EvaluationDatapoint`s with pydantic if possible
-- `executor` – the logic you want to evaluate. This function must take `data` as the first argument, and produce any output. *
-- `evaluators` – evaluaton logic. Functions that take output of executor as the first argument, `target` as the second argument and produce a numeric scores. Pass a dict from evaluator name to a function. Each function can produce either a single number or `dict[str, int|float]` of scores.
-
-\* If you already have the outputs of executors you want to evaluate, you can specify the executor as an identity function, that takes in `data` and returns only needed value(s) from it.
-
-### Example code
-
-```python
-from lmnr import evaluate
-from openai import AsyncOpenAI
-import asyncio
-import os
-
-openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
-
-async def get_capital(data):
-    country = data["country"]
-    response = await openai_client.chat.completions.create(
-        model="gpt-4o-mini",
-        messages=[
-            {"role": "system", "content": "You are a helpful assistant."},
-            {
-                "role": "user",
-                "content": f"What is the capital of {country}? Just name the "
-                "city and nothing else",
-            },
-        ],
-    )
-    return response.choices[0].message.content.strip()
-
-
-# Evaluation data
-data = [
-    {"data": {"country": "Canada"}, "target": {"capital": "Ottawa"}},
-    {"data": {"country": "Germany"}, "target": {"capital": "Berlin"}},
-    {"data": {"country": "Tanzania"}, "target": {"capital": "Dodoma"}},
-]
-
-
-def correctness(output, target):
-    return 1 if output == target["capital"] else 0
-
-# Create an Evaluation instance
-e = evaluate(
-    name="my-evaluation",
-    data=data,
-    executor=get_capital,
-    evaluators={"correctness": correctness},
-    project_api_key=os.environ["LMNR_PROJECT_API_KEY"],
-)
-```
-
-### Running from CLI.
-
-1. Make sure `lmnr` is installed in a venv. CLI does not work with a global env
-1. Run `lmnr path/to/my/eval.py`
-
-### Running from code
-
-Simply execute the function, e.g. `python3 path/to/my/eval.py`
-
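The README above documents the `evaluate` parameters only as a bullet list, so here is a hedged sketch of how they combine: an `async` executor together with an evaluator that returns a `dict[str, int|float]` of scores, both of which the README explicitly allows. The executor body and the data are illustrative placeholders, not SDK code, and `LMNR_PROJECT_API_KEY` is assumed to be set in the environment as in the quickstart.

```python
from lmnr import evaluate

async def summarize(data):
    # Placeholder executor: in a real project this would call your LLM pipeline.
    return f"Summary of: {data['text']}"

def quality(output, target):
    # Evaluators may return a single number or a dict of named scores.
    return {
        "contains_keyword": 1 if target["keyword"] in output else 0,
        "length_ok": 1 if len(output) < 200 else 0,
    }

data = [
    {"data": {"text": "Laminar flow stays smooth."}, "target": {"keyword": "smooth"}},
    {"data": {"text": "Turbulent flow is chaotic."}, "target": {"keyword": "chaotic"}},
]

evaluate(
    data=data,
    executor=summarize,
    evaluators={"quality": quality},
    group_id="summarization",  # groups runs of the same feature, per the README above
    name="readme-example",     # optional; auto-generated when omitted
)
```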
--- lmnr-0.4.12b4/pyproject.toml
+++ lmnr-0.4.14/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "lmnr"
-version = "0.4.12b4"
+version = "0.4.14"
 description = "Python SDK for Laminar AI"
 authors = [
   { name = "lmnr.ai", email = "founders@lmnr.ai" }
@@ -11,7 +11,7 @@ license = "Apache-2.0"
 
 [tool.poetry]
 name = "lmnr"
-version = "0.4.12b4"
+version = "0.4.14"
 description = "Python SDK for Laminar AI"
 authors = ["lmnr.ai"]
 readme = "README.md"
--- lmnr-0.4.12b4/src/lmnr/sdk/evaluations.py
+++ lmnr-0.4.14/src/lmnr/sdk/evaluations.py
@@ -1,4 +1,5 @@
 import asyncio
+import re
 import sys
 from abc import ABC, abstractmethod
 from contextlib import contextmanager
@@ -12,7 +13,6 @@ from ..traceloop_sdk.tracing.attributes import SPAN_TYPE
 
 from .laminar import Laminar as L
 from .types import (
-    CreateEvaluationResponse,
     Datapoint,
     EvaluationResultDatapoint,
     EvaluatorFunction,
@@ -46,13 +46,26 @@ def get_evaluation_url(project_id: str, evaluation_id: str):
     return f"https://www.lmnr.ai/project/{project_id}/evaluations/{evaluation_id}"
 
 
+def get_average_scores(results: list[EvaluationResultDatapoint]) -> dict[str, Numeric]:
+    per_score_values = {}
+    for result in results:
+        for key, value in result.scores.items():
+            if key not in per_score_values:
+                per_score_values[key] = []
+            per_score_values[key].append(value)
+
+    average_scores = {}
+    for key, values in per_score_values.items():
+        average_scores[key] = sum(values) / len(values)
+
+    return average_scores
+
+
 class EvaluationReporter:
     def __init__(self):
         pass
 
-    def start(self,
-        print(f"Running evaluation {name}...\n")
-        print(f"Check progress and results at {get_evaluation_url(project_id, id)}\n")
+    def start(self, length: int):
         self.cli_progress = tqdm(
             total=length,
             bar_format="{bar} {percentage:3.0f}% | ETA: {remaining}s | {n_fmt}/{total_fmt}",
@@ -66,9 +79,10 @@ class EvaluationReporter:
         self.cli_progress.close()
         sys.stderr.write(f"\nError: {error}\n")
 
-    def stop(self, average_scores: dict[str, Numeric]):
+    def stop(self, average_scores: dict[str, Numeric], project_id: str, evaluation_id: str):
         self.cli_progress.close()
-        print("\
+        print(f"\nCheck progress and results at {get_evaluation_url(project_id, evaluation_id)}\n")
+        print("Average scores:")
         for name, score in average_scores.items():
             print(f"{name}: {score}")
         print("\n")
@@ -97,6 +111,7 @@ class Evaluation:
         data: Union[EvaluationDataset, list[Union[Datapoint, dict]]],
         executor: Any,
         evaluators: dict[str, EvaluatorFunction],
+        group_id: Optional[str] = None,
         name: Optional[str] = None,
         batch_size: int = DEFAULT_BATCH_SIZE,
         project_api_key: Optional[str] = None,
@@ -123,6 +138,8 @@ class Evaluation:
                 evaluator function. If the function is anonymous, it will be
                 named `evaluator_${index}`, where index is the index of the
                 evaluator function in the list starting from 1.
+            group_id (Optional[str], optional): Group id of the evaluation.
+                Defaults to "default".
             name (Optional[str], optional): The name of the evaluation.
                 It will be auto-generated if not provided.
             batch_size (int, optional): The batch size for evaluation.
@@ -138,11 +155,16 @@ class Evaluation:
                 Defaults to None. If None, all available instruments will be used.
         """
 
+        if not evaluators:
+            raise ValueError("No evaluators provided")
+
+        # TODO: Compile regex once and then reuse it
+        for evaluator_name in evaluators:
+            if not re.match(r'^[\w\s-]+$', evaluator_name):
+                raise ValueError(f'Invalid evaluator key: "{evaluator_name}". Keys must only contain letters, digits, hyphens, underscores, or spaces.')
+
         self.is_finished = False
-        self.name = name
         self.reporter = EvaluationReporter()
-        self.executor = executor
-        self.evaluators = evaluators
         if isinstance(data, list):
             self.data = [
                 (Datapoint.model_validate(point) if isinstance(point, dict) else point)
@@ -150,6 +172,10 @@ class Evaluation:
             ]
         else:
             self.data = data
+        self.executor = executor
+        self.evaluators = evaluators
+        self.group_id = group_id
+        self.name = name
         self.batch_size = batch_size
         L.initialize(
             project_api_key=project_api_key,
@@ -160,23 +186,6 @@ class Evaluation:
         )
 
     def run(self) -> Union[None, Awaitable[None]]:
-        """Runs the evaluation.
-
-        Creates a new evaluation if no evaluation with such name exists, or
-        adds data to an existing one otherwise. Evaluates data points in
-        batches of `self.batch_size`. The executor
-        function is called on each data point to get the output,
-        and then evaluate it by each evaluator function.
-
-        Usage:
-        ```python
-        # in a synchronous context:
-        e.run()
-        # in an asynchronous context:
-        await e.run()
-        ```
-
-        """
         if self.is_finished:
             raise Exception("Evaluation is already finished")
 
@@ -187,41 +196,34 @@ class Evaluation:
         return loop.run_until_complete(self._run())
 
     async def _run(self) -> None:
-        evaluation = L.create_evaluation(self.name)
        self.reporter.start(
-            evaluation.name,
-            evaluation.projectId,
-            evaluation.id,
            len(self.data),
        )
 
        try:
-            await self.evaluate_in_batches(
+            result_datapoints = await self.evaluate_in_batches()
        except Exception as e:
-            L.update_evaluation_status(evaluation.id, "Error")
            self.reporter.stopWithError(e)
            self.is_finished = True
            return
+        else:
+            evaluation = L.create_evaluation(data=result_datapoints, group_id=self.group_id, name=self.name)
+            average_scores = get_average_scores(result_datapoints)
+            self.reporter.stop(average_scores, evaluation.projectId, evaluation.id)
+            self.is_finished = True
 
-
-
-        self.reporter.stop(updated_evaluation.averageScores)
-        self.is_finished = True
-
-    async def evaluate_in_batches(self, evaluation: CreateEvaluationResponse):
+    async def evaluate_in_batches(self) -> list[EvaluationResultDatapoint]:
+        result_datapoints = []
        for i in range(0, len(self.data), self.batch_size):
            batch = (
-                self.data[i
+                self.data[i: i + self.batch_size]
                if isinstance(self.data, list)
                else self.data.slice(i, i + self.batch_size)
            )
-
-
-
-
-                print(f"Error evaluating batch: {e}")
-            finally:
-                self.reporter.update(len(batch))
+            batch_datapoints = await self._evaluate_batch(batch)
+            result_datapoints.extend(batch_datapoints)
+            self.reporter.update(len(batch))
+        return result_datapoints
 
     async def _evaluate_batch(
         self, batch: list[Datapoint]
@@ -252,7 +254,7 @@ class Evaluation:
         scores: dict[str, Numeric] = {}
         for evaluator_name, evaluator in self.evaluators.items():
             with L.start_as_current_span(
-
+                evaluator_name, input={"output": output, "target": target}
             ) as evaluator_span:
                 evaluator_span.set_attribute(SPAN_TYPE, SpanType.EVALUATOR.value)
                 value = (
@@ -282,6 +284,7 @@ def evaluate(
     data: Union[EvaluationDataset, list[Union[Datapoint, dict]]],
     executor: ExecutorFunction,
     evaluators: dict[str, EvaluatorFunction],
+    group_id: Optional[str] = None,
     name: Optional[str] = None,
     batch_size: int = DEFAULT_BATCH_SIZE,
     project_api_key: Optional[str] = None,
@@ -310,8 +313,11 @@ def evaluate(
             evaluator function. If the function is anonymous, it will be
             named `evaluator_${index}`, where index is the index of the
             evaluator function in the list starting from 1.
-
-
+        group_id (Optional[str], optional): Group name which is same
+            as the feature you are evaluating in your project or application.
+            Defaults to "default".
+        name (Optional[str], optional): Optional name of the evaluation. Used to easily
+            identify the evaluation in the group.
         batch_size (int, optional): The batch size for evaluation.
             Defaults to DEFAULT_BATCH_SIZE.
         project_api_key (Optional[str], optional): The project API key.
@@ -331,6 +337,7 @@ def evaluate(
         data=data,
         executor=executor,
         evaluators=evaluators,
+        group_id=group_id,
         name=name,
         batch_size=batch_size,
         project_api_key=project_api_key,
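Two behavioral changes in `evaluations.py` are easy to miss in the hunks above: evaluator keys are now validated against `^[\w\s-]+$`, and average scores are computed client-side from the collected result datapoints. The following self-contained sketch mirrors that logic (it reimplements the two helpers with plain dicts rather than importing the SDK's internals):

```python
import re
from statistics import mean

EVALUATOR_KEY_RE = re.compile(r"^[\w\s-]+$")  # same pattern the SDK now applies

def check_evaluator_keys(evaluators: dict) -> None:
    # Mirrors the new __init__ validation: empty dicts and odd characters are rejected.
    if not evaluators:
        raise ValueError("No evaluators provided")
    for key in evaluators:
        if not EVALUATOR_KEY_RE.match(key):
            raise ValueError(f'Invalid evaluator key: "{key}"')

def average_scores(results: list[dict]) -> dict[str, float]:
    # Mirrors get_average_scores: mean of each score key across all datapoints.
    per_key: dict[str, list[float]] = {}
    for scores in results:
        for key, value in scores.items():
            per_key.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in per_key.items()}

check_evaluator_keys({"containsPoem": lambda output, target: 1})  # passes
# check_evaluator_keys({"contains/poem": ...}) would raise: "/" is not allowed
print(average_scores([{"accuracy": 1, "length": 0}, {"accuracy": 0, "length": 1}]))
# -> {'accuracy': 0.5, 'length': 0.5}
```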
--- lmnr-0.4.12b4/src/lmnr/sdk/laminar.py
+++ lmnr-0.4.14/src/lmnr/sdk/laminar.py
@@ -3,11 +3,9 @@ from opentelemetry import context
 from opentelemetry.trace import (
     INVALID_SPAN,
     get_current_span,
-    SpanKind,
 )
 from opentelemetry.util.types import AttributeValue
-from opentelemetry.context
-from opentelemetry.util import types
+from opentelemetry.context import set_value, attach, detach
 from lmnr.traceloop_sdk import Traceloop
 from lmnr.traceloop_sdk.tracing import get_tracer
 from contextlib import contextmanager
@@ -29,10 +27,12 @@ from lmnr.traceloop_sdk.tracing.attributes import (
     SESSION_ID,
     SPAN_INPUT,
     SPAN_OUTPUT,
+    SPAN_PATH,
     TRACE_TYPE,
     USER_ID,
 )
 from lmnr.traceloop_sdk.tracing.tracing import (
+    get_span_path,
     set_association_properties,
     update_association_properties,
 )
@@ -47,7 +47,6 @@ from .types import (
     NodeInput,
     PipelineRunRequest,
     TraceType,
-    UpdateEvaluationResponse,
 )
 
 
@@ -315,14 +314,6 @@ class Laminar:
         cls,
         name: str,
         input: Any = None,
-        context: Optional[Context] = None,
-        kind: SpanKind = SpanKind.INTERNAL,
-        attributes: types.Attributes = None,
-        links=None,
-        start_time: Optional[int] = None,
-        record_exception: bool = True,
-        set_status_on_exception: bool = True,
-        end_on_exit: bool = True,
     ):
         """Start a new span as the current span. Useful for manual instrumentation.
         This is the preferred and more stable way to use manual instrumentation.
@@ -337,32 +328,15 @@ class Laminar:
             name (str): name of the span
             input (Any, optional): input to the span. Will be sent as an
                 attribute, so must be json serializable. Defaults to None.
-            context (Optional[Context], optional): context to start the span in.
-                Defaults to None.
-            kind (SpanKind, optional): kind of the span. Defaults to SpanKind.INTERNAL.
-            attributes (types.Attributes, optional): attributes to set on the span.
-                Defaults to None.
-            links ([type], optional): links to set on the span. Defaults to None.
-            start_time (Optional[int], optional): start time of the span.
-                Defaults to None.
-            record_exception (bool, optional): whether to record exceptions.
-                Defaults to True.
-            set_status_on_exception (bool, optional): whether to set status on exception.
-                Defaults to True.
-            end_on_exit (bool, optional): whether to end the span on exit.
-                Defaults to True.
         """
         with get_tracer() as tracer:
+            span_path = get_span_path(name)
+            ctx = set_value("span_path", span_path)
+            ctx_token = attach(set_value("span_path", span_path))
             with tracer.start_as_current_span(
                 name,
-                context=
-
-                attributes=attributes,
-                links=links,
-                start_time=start_time,
-                record_exception=record_exception,
-                set_status_on_exception=set_status_on_exception,
-                end_on_exit=end_on_exit,
+                context=ctx,
+                attributes={SPAN_PATH: span_path},
             ) as span:
                 if input is not None:
                     span.set_attribute(
@@ -371,6 +345,12 @@ class Laminar:
                     )
                 yield span
 
+            # TODO: Figure out if this is necessary
+            try:
+                detach(ctx_token)
+            except Exception:
+                pass
+
     @classmethod
     def set_span_output(cls, output: Any = None):
         """Set the output of the current span. Useful for manual instrumentation.
@@ -432,10 +412,14 @@ class Laminar:
         set_association_properties(props)
 
     @classmethod
-    def create_evaluation(cls, name: Optional[str]) -> CreateEvaluationResponse:
+    def create_evaluation(cls, data: list[EvaluationResultDatapoint], group_id: Optional[str] = None, name: Optional[str] = None) -> CreateEvaluationResponse:
         response = requests.post(
             cls.__base_http_url + "/v1/evaluations",
-            data=json.dumps({
+            data=json.dumps({
+                "groupId": group_id,
+                "name": name,
+                "points": [datapoint.to_dict() for datapoint in data]
+            }),
             headers=cls._headers(),
         )
         if response.status_code != 200:
@@ -446,66 +430,6 @@ class Laminar:
             raise ValueError(f"Error creating evaluation {response.text}")
         return CreateEvaluationResponse.model_validate(response.json())
 
-    @classmethod
-    def post_evaluation_results(
-        cls, evaluation_id: uuid.UUID, data: list[EvaluationResultDatapoint]
-    ) -> requests.Response:
-        body = {
-            "evaluationId": str(evaluation_id),
-            "points": [datapoint.to_dict() for datapoint in data],
-        }
-        response = requests.post(
-            cls.__base_http_url + "/v1/evaluation-datapoints",
-            data=json.dumps(body),
-            headers=cls._headers(),
-        )
-        if response.status_code != 200:
-            try:
-                resp_json = response.json()
-                raise ValueError(
-                    f"Failed to send evaluation results. Response: {json.dumps(resp_json)}"
-                )
-            except Exception:
-                raise ValueError(
-                    f"Failed to send evaluation results. Error: {response.text}"
-                )
-        return response
-
-    @classmethod
-    def update_evaluation_status(
-        cls, evaluation_id: str, status: str
-    ) -> UpdateEvaluationResponse:
-        """
-        Updates the status of an evaluation. Returns the updated evaluation object.
-
-        Args:
-            evaluation_id (str): The ID of the evaluation to update.
-            status (str): The status to set for the evaluation.
-
-        Returns:
-            UpdateEvaluationResponse: The updated evaluation response.
-
-        Raises:
-            ValueError: If the request fails.
-        """
-        body = {
-            "status": status,
-        }
-        url = f"{cls.__base_http_url}/v1/evaluations/{evaluation_id}"
-
-        response = requests.post(
-            url,
-            data=json.dumps(body),
-            headers=cls._headers(),
-        )
-        if response.status_code != 200:
-            raise ValueError(
-                f"Failed to update evaluation status {evaluation_id}. "
-                f"Response: {response.text}"
-            )
-
-        return UpdateEvaluationResponse.model_validate(response.json())
-
     @classmethod
     def _headers(cls):
         assert cls.__project_api_key is not None, "Project API key is not set"
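With the extra OpenTelemetry keyword arguments removed, `Laminar.start_as_current_span` is now driven only by a span name and an optional JSON-serializable input, and it records a dotted span path automatically. A hedged usage sketch follows; the `retrieve_docs` body is a stand-in for application logic, not SDK code, and `<PROJECT_API_KEY>` is a placeholder.

```python
from lmnr import Laminar as L

L.initialize(project_api_key="<PROJECT_API_KEY>")

def retrieve_docs(query: str) -> list[str]:
    # Everything inside the `with` block is recorded under this span; the new
    # SPAN_PATH attribute prefixes the name with any enclosing span names.
    with L.start_as_current_span("retrieve_docs", input={"query": query}):
        docs = [f"doc about {query}"]  # stand-in for real retrieval logic
        L.set_span_output(docs)        # set_span_output is part of the same class
        return docs

retrieve_docs("laminar flow")
```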
--- lmnr-0.4.12b4/src/lmnr/sdk/types.py
+++ lmnr-0.4.14/src/lmnr/sdk/types.py
@@ -2,7 +2,7 @@ import datetime
 from enum import Enum
 import pydantic
 import requests
-from typing import Any, Awaitable, Callable,
+from typing import Any, Awaitable, Callable, Optional, Union
 import uuid
 
 from .utils import serialize
@@ -107,20 +107,13 @@ EvaluatorFunction = Callable[
     Union[EvaluatorFunctionReturnType, Awaitable[EvaluatorFunctionReturnType]],
 ]
 
-EvaluationStatus = Literal["Started", "Finished", "Error"]
-
 
 class CreateEvaluationResponse(pydantic.BaseModel):
     id: uuid.UUID
     createdAt: datetime.datetime
+    groupId: str
     name: str
-    status: EvaluationStatus
     projectId: uuid.UUID
-    metadata: Optional[dict[str, Any]] = None
-    averageScores: Optional[dict[str, Numeric]] = None
-
-
-UpdateEvaluationResponse = CreateEvaluationResponse
 
 
 class EvaluationResultDatapoint(pydantic.BaseModel):
--- lmnr-0.4.12b4/src/lmnr/traceloop_sdk/decorators/base.py
+++ lmnr-0.4.14/src/lmnr/traceloop_sdk/decorators/base.py
@@ -10,8 +10,8 @@ from opentelemetry import context as context_api
 
 from lmnr.sdk.utils import get_input_from_func_args, is_method
 from lmnr.traceloop_sdk.tracing import get_tracer
-from lmnr.traceloop_sdk.tracing.attributes import SPAN_INPUT, SPAN_OUTPUT
-from lmnr.traceloop_sdk.tracing.tracing import TracerWrapper
+from lmnr.traceloop_sdk.tracing.attributes import SPAN_INPUT, SPAN_OUTPUT, SPAN_PATH
+from lmnr.traceloop_sdk.tracing.tracing import TracerWrapper, get_span_path
 from lmnr.traceloop_sdk.utils.json_encoder import JSONEncoder
 
 
@@ -47,7 +47,12 @@ def entity_method(
 
         with get_tracer() as tracer:
             span = tracer.start_span(span_name)
-
+
+            span_path = get_span_path(span_name)
+            span.set_attribute(SPAN_PATH, span_path)
+            ctx = context_api.set_value("span_path", span_path)
+
+            ctx = trace.set_span_in_context(span, ctx)
             ctx_token = context_api.attach(ctx)
 
             try:
@@ -104,7 +109,12 @@ def aentity_method(
 
         with get_tracer() as tracer:
             span = tracer.start_span(span_name)
-
+
+            span_path = get_span_path(span_name)
+            span.set_attribute(SPAN_PATH, span_path)
+            ctx = context_api.set_value("span_path", span_path)
+
+            ctx = trace.set_span_in_context(span, ctx)
             ctx_token = context_api.attach(ctx)
 
             try:
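The decorator wrappers above now stamp every decorated span with a `SPAN_PATH` attribute and push the path into the OpenTelemetry context, so nested decorated calls should accumulate dotted paths. A hedged sketch, assuming the public `observe` decorator exported from `lmnr` (defined in the unchanged `sdk/decorators.py`) routes through these `entity_method`/`aentity_method` wrappers:

```python
from lmnr import Laminar as L, observe

L.initialize(project_api_key="<PROJECT_API_KEY>")  # placeholder key

@observe()  # wraps the function via the decorator helpers shown above
def fetch_context(question: str) -> str:
    return f"context for {question}"

@observe()
def answer(question: str) -> str:
    # Called inside another observed function, so its span is expected to
    # carry the dotted path "answer.fetch_context" via SPAN_PATH.
    context = fetch_context(question)
    return f"answer built from {context}"

answer("what is laminar flow?")
```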
--- lmnr-0.4.12b4/src/lmnr/traceloop_sdk/tracing/tracing.py
+++ lmnr-0.4.14/src/lmnr/traceloop_sdk/tracing/tracing.py
@@ -25,7 +25,7 @@ from opentelemetry.instrumentation.threading import ThreadingInstrumentor
 
 # from lmnr.traceloop_sdk import Telemetry
 from lmnr.traceloop_sdk.instruments import Instruments
-from lmnr.traceloop_sdk.tracing.attributes import ASSOCIATION_PROPERTIES
+from lmnr.traceloop_sdk.tracing.attributes import ASSOCIATION_PROPERTIES, SPAN_PATH
 from lmnr.traceloop_sdk.tracing.content_allow_list import ContentAllowList
 from lmnr.traceloop_sdk.utils import is_notebook
 from lmnr.traceloop_sdk.utils.package_check import is_package_installed
@@ -245,6 +245,14 @@ class TracerWrapper(object):
         self.flush()
 
     def _span_processor_on_start(self, span, parent_context):
+        span_path = get_value("span_path")
+        if span_path is not None:
+            # This is done redundantly here for most decorated functions
+            # However, need to do this for auto-instrumented libraries.
+            # Then, for auto-instrumented ones, they'll attach
+            # the final part of the name to the span on the backend.
+            span.set_attribute(SPAN_PATH, span_path)
+
         association_properties = get_value("association_properties")
         if association_properties is not None:
             _set_association_properties_attributes(span, association_properties)
@@ -318,6 +326,12 @@ def _set_association_properties_attributes(span, properties: dict) -> None:
         )
 
 
+def get_span_path(span_name: str) -> str:
+    current_span_path = get_value("span_path")
+    span_path = f"{current_span_path}.{span_name}" if current_span_path else span_name
+    return span_path
+
+
 def set_managed_prompt_tracing_context(
     key: str,
     version: int,
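The new `get_span_path` helper simply appends the current span's name to whatever path is already stored under the `"span_path"` context key. A tiny standalone mirror of that composition rule, using a plain dict in place of the OpenTelemetry context, shows how the dotted paths build up:

```python
def get_span_path(context: dict, span_name: str) -> str:
    # Same composition rule as the SDK helper; the dict stands in for the
    # OpenTelemetry context that normally stores "span_path".
    current = context.get("span_path")
    return f"{current}.{span_name}" if current else span_name

ctx: dict = {}
outer = get_span_path(ctx, "outer_task")   # -> "outer_task"
ctx["span_path"] = outer
inner = get_span_path(ctx, "llm_call")     # -> "outer_task.llm_call"
print(outer, inner)
```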