PyPI - openaivec - Versions diffs - 1.0.1__tar.gz → 1.0.3__tar.gz - Mend



Files changed (90)

{openaivec-1.0.1 → openaivec-1.0.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: openaivec
-Version: 1.0.1
+Version: 1.0.3
 Summary: Generative mutation for tabular calculation
 Project-URL: Homepage, https://microsoft.github.io/openaivec/
 Project-URL: Repository, https://github.com/microsoft/openaivec
@@ -57,13 +57,27 @@ reviews = pd.Series([
 sentiment = reviews.ai.responses(
     "Summarize sentiment in one short sentence.",
-    reasoning={"effort": "medium"},  # Mirrors OpenAI SDK for reasoning models
+    reasoning={"effort": "none"},  # Mirrors OpenAI SDK for reasoning models
 )
 print(sentiment.tolist())
 ```
 **Try it live:** https://microsoft.github.io/openaivec/examples/pandas/
+## Benchmarks
+Simple task benchmark from [benchmark.ipynb](https://github.com/microsoft/openaivec/blob/main/docs/examples/benchmark.ipynb) (100 numeric strings → integer literals, `Series.aio.responses`, model `gpt-5.1`):
+| Mode                | Settings                                        | Time (s) |
+| ------------------- | ----------------------------------------------- | -------- |
+| Serial              | `batch_size=1`, `max_concurrency=1`             | ~141     |
+| Batching            | default `batch_size`, `max_concurrency=1`       | ~15      |
+| Concurrent batching | default `batch_size`, default `max_concurrency` | ~6       |
+Batching alone removes most HTTP overhead, and letting batching overlap with concurrency cuts total runtime to a few seconds while still yielding one output per input.
+![Benchmark comparison for simple task](https://private-user-images.githubusercontent.com/6128022/519474214-d1931e34-6f9e-4695-8042-88b771e002c3.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NjQyMDc5ODAsIm5iZiI6MTc2NDIwNzY4MCwicGF0aCI6Ii82MTI4MDIyLzUxOTQ3NDIxNC1kMTkzMWUzNC02ZjllLTQ2OTUtODA0Mi04OGI3NzFlMDAyYzMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MTEyNyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTExMjdUMDE0MTIwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Y2JhYmU2YjZhNDUxNDkxZDg5NGMxZGI1OTUzODgyYjQ4OTVhYzEzZjU3NmRkMjE1M2Y1ZDI3ZTdiNWI0M2VlMCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.yuxT4AbDIBNsRGCIxPMjpGiHFqLcQUCLg_DjpqH02Lw)
 ## Contents
 - [Why openaivec?](#why-openaivec)
@@ -109,7 +123,7 @@ client = BatchResponses.of(
 result = client.parse(
     ["panda", "rabbit", "koala"],
-    reasoning={"effort": "medium"},  # Required for gpt-5.1
+    reasoning={"effort": "none"},
 )
 print(result)  # Expected output: ['bear family', 'rabbit family', 'koala family']
 ```
@@ -147,15 +161,15 @@ df = pd.DataFrame({"name": ["panda", "rabbit", "koala"]})
 result = df.assign(
     family=lambda df: df.name.ai.responses(
         "What animal family? Answer with 'X family'",
-        reasoning={"effort": "medium"},
+        reasoning={"effort": "none"},
     ),
     habitat=lambda df: df.name.ai.responses(
         "Primary habitat in one word",
-        reasoning={"effort": "medium"},
+        reasoning={"effort": "none"},
     ),
     fun_fact=lambda df: df.name.ai.responses(
         "One interesting fact in 10 words or less",
-        reasoning={"effort": "medium"},
+        reasoning={"effort": "none"},
     ),
 )
 ```
@@ -178,7 +192,7 @@ pandas_ext.set_responses_model("o1-mini")  # Set your reasoning model
 result = df.assign(
     analysis=lambda df: df.text.ai.responses(
         "Analyze this text step by step",
-        reasoning={"effort": "medium"}  # Optional: mirrors the OpenAI SDK argument
+        reasoning={"effort": "none"}  # Optional: mirrors the OpenAI SDK argument
     )
 )
 ```
@@ -232,7 +246,7 @@ df = pd.DataFrame({"text": [
 async def process_data():
     return await df["text"].aio.responses(
         "Analyze sentiment and classify as positive/negative/neutral",
-        reasoning={"effort": "medium"},  # Required for gpt-5.1
+        reasoning={"effort": "none"},  # Required for gpt-5.1
         max_concurrency=12    # Allow up to 12 concurrent requests
     )
@@ -284,7 +298,7 @@ spark.udf.register(
     "extract_brand",
     responses_udf(
         instructions="Extract the brand name from the product. Return only the brand name.",
-        reasoning={"effort": "medium"},  # Recommended with gpt-5.1
+        reasoning={"effort": "none"},  # Recommended with gpt-5.1
     )
 )
@@ -298,7 +312,7 @@ spark.udf.register(
     responses_udf(
         instructions="Translate the text to English, French, and Japanese.",
         response_format=Translation,
-        reasoning={"effort": "medium"},  # Recommended with gpt-5.1
+        reasoning={"effort": "none"},  # Recommended with gpt-5.1
     )
 )
@@ -336,7 +350,7 @@ prompt = (
 ## Using with Microsoft Fabric
-[Microsoft Fabric](https://www.microsoft.com/en-us/microsoft-fabric/) is a unified, cloud-based analytics platform. Add `openaivec` from PyPI in your Fabric environment, select it in your notebook, and use `openaivec.spark` like standard Spark. Detailed walkthrough: 📓 **[Fabric guide →](https://microsoft.github.io/openaivec/examples/fabric/)**.
+[Microsoft Fabric](https://www.microsoft.com/en-us/microsoft-fabric/) is a unified, cloud-based analytics platform. Add `openaivec` from PyPI in your Fabric environment, select it in your notebook, and use `openaivec.spark` like standard Spark.
 ## Contributing
@@ -374,4 +388,4 @@ uv run pytest -m "not slow and not requires_api"
 ## Community
-Join our Discord community for support and announcements: https://discord.gg/vbb83Pgn
+Join our Discord community for support and announcements: https://discord.gg/hXCS9J6Qek

{openaivec-1.0.1 → openaivec-1.0.3}/README.md RENAMED

@@ -31,13 +31,27 @@ reviews = pd.Series([
 sentiment = reviews.ai.responses(
     "Summarize sentiment in one short sentence.",
-    reasoning={"effort": "medium"},  # Mirrors OpenAI SDK for reasoning models
+    reasoning={"effort": "none"},  # Mirrors OpenAI SDK for reasoning models
 )
 print(sentiment.tolist())
 ```
 **Try it live:** https://microsoft.github.io/openaivec/examples/pandas/
+## Benchmarks
+Simple task benchmark from [benchmark.ipynb](https://github.com/microsoft/openaivec/blob/main/docs/examples/benchmark.ipynb) (100 numeric strings → integer literals, `Series.aio.responses`, model `gpt-5.1`):
+| Mode                | Settings                                        | Time (s) |
+| ------------------- | ----------------------------------------------- | -------- |
+| Serial              | `batch_size=1`, `max_concurrency=1`             | ~141     |
+| Batching            | default `batch_size`, `max_concurrency=1`       | ~15      |
+| Concurrent batching | default `batch_size`, default `max_concurrency` | ~6       |
+Batching alone removes most HTTP overhead, and letting batching overlap with concurrency cuts total runtime to a few seconds while still yielding one output per input.
+![Benchmark comparison for simple task](https://private-user-images.githubusercontent.com/6128022/519474214-d1931e34-6f9e-4695-8042-88b771e002c3.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NjQyMDc5ODAsIm5iZiI6MTc2NDIwNzY4MCwicGF0aCI6Ii82MTI4MDIyLzUxOTQ3NDIxNC1kMTkzMWUzNC02ZjllLTQ2OTUtODA0Mi04OGI3NzFlMDAyYzMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MTEyNyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTExMjdUMDE0MTIwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Y2JhYmU2YjZhNDUxNDkxZDg5NGMxZGI1OTUzODgyYjQ4OTVhYzEzZjU3NmRkMjE1M2Y1ZDI3ZTdiNWI0M2VlMCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.yuxT4AbDIBNsRGCIxPMjpGiHFqLcQUCLg_DjpqH02Lw)
 ## Contents
 - [Why openaivec?](#why-openaivec)
@@ -83,7 +97,7 @@ client = BatchResponses.of(
 result = client.parse(
     ["panda", "rabbit", "koala"],
-    reasoning={"effort": "medium"},  # Required for gpt-5.1
+    reasoning={"effort": "none"},
 )
 print(result)  # Expected output: ['bear family', 'rabbit family', 'koala family']
 ```
@@ -121,15 +135,15 @@ df = pd.DataFrame({"name": ["panda", "rabbit", "koala"]})
 result = df.assign(
     family=lambda df: df.name.ai.responses(
         "What animal family? Answer with 'X family'",
-        reasoning={"effort": "medium"},
+        reasoning={"effort": "none"},
     ),
     habitat=lambda df: df.name.ai.responses(
         "Primary habitat in one word",
-        reasoning={"effort": "medium"},
+        reasoning={"effort": "none"},
     ),
     fun_fact=lambda df: df.name.ai.responses(
         "One interesting fact in 10 words or less",
-        reasoning={"effort": "medium"},
+        reasoning={"effort": "none"},
     ),
 )
 ```
@@ -152,7 +166,7 @@ pandas_ext.set_responses_model("o1-mini")  # Set your reasoning model
 result = df.assign(
     analysis=lambda df: df.text.ai.responses(
         "Analyze this text step by step",
-        reasoning={"effort": "medium"}  # Optional: mirrors the OpenAI SDK argument
+        reasoning={"effort": "none"}  # Optional: mirrors the OpenAI SDK argument
     )
 )
 ```
@@ -206,7 +220,7 @@ df = pd.DataFrame({"text": [
 async def process_data():
     return await df["text"].aio.responses(
         "Analyze sentiment and classify as positive/negative/neutral",
-        reasoning={"effort": "medium"},  # Required for gpt-5.1
+        reasoning={"effort": "none"},  # Required for gpt-5.1
         max_concurrency=12    # Allow up to 12 concurrent requests
     )
@@ -258,7 +272,7 @@ spark.udf.register(
     "extract_brand",
     responses_udf(
         instructions="Extract the brand name from the product. Return only the brand name.",
-        reasoning={"effort": "medium"},  # Recommended with gpt-5.1
+        reasoning={"effort": "none"},  # Recommended with gpt-5.1
     )
 )
@@ -272,7 +286,7 @@ spark.udf.register(
     responses_udf(
         instructions="Translate the text to English, French, and Japanese.",
         response_format=Translation,
-        reasoning={"effort": "medium"},  # Recommended with gpt-5.1
+        reasoning={"effort": "none"},  # Recommended with gpt-5.1
     )
 )
@@ -310,7 +324,7 @@ prompt = (
 ## Using with Microsoft Fabric
-[Microsoft Fabric](https://www.microsoft.com/en-us/microsoft-fabric/) is a unified, cloud-based analytics platform. Add `openaivec` from PyPI in your Fabric environment, select it in your notebook, and use `openaivec.spark` like standard Spark. Detailed walkthrough: 📓 **[Fabric guide →](https://microsoft.github.io/openaivec/examples/fabric/)**.
+[Microsoft Fabric](https://www.microsoft.com/en-us/microsoft-fabric/) is a unified, cloud-based analytics platform. Add `openaivec` from PyPI in your Fabric environment, select it in your notebook, and use `openaivec.spark` like standard Spark.
 ## Contributing
@@ -348,4 +362,4 @@ uv run pytest -m "not slow and not requires_api"
 ## Community
-Join our Discord community for support and announcements: https://discord.gg/vbb83Pgn
+Join our Discord community for support and announcements: https://discord.gg/hXCS9J6Qek
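
Reviewer note on the new Benchmarks section: the table contrasts serial requests, batching alone, and batching combined with concurrency. The sketch below imitates those three modes with a simulated request so the effect can be reproduced without an API key. It is not openaivec code: `fake_request`, `run`, `timed`, the 10 ms latency, and the explicit `batch_size=10` / `max_concurrency=8` values are all assumptions chosen for illustration, and the library's real defaults may differ.

```python
import asyncio
import time


async def fake_request(batch: list[str], latency: float = 0.01) -> list[int]:
    """Hypothetical stand-in for one API call: fixed latency per request."""
    await asyncio.sleep(latency)
    return [int(s) for s in batch]


async def run(items: list[str], batch_size: int, max_concurrency: int) -> tuple[list[int], int]:
    """Map items through fake_request; return outputs and the number of requests."""
    batches = [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
    semaphore = asyncio.Semaphore(max_concurrency)  # caps in-flight requests

    async def guarded(batch: list[str]) -> list[int]:
        async with semaphore:
            return await fake_request(batch)

    # gather preserves the order of the batches, so outputs line up with inputs
    results = await asyncio.gather(*(guarded(b) for b in batches))
    return [x for chunk in results for x in chunk], len(batches)


def timed(items: list[str], batch_size: int, max_concurrency: int) -> tuple[list[int], int, float]:
    start = time.perf_counter()
    outputs, n_requests = asyncio.run(run(items, batch_size, max_concurrency))
    return outputs, n_requests, time.perf_counter() - start


items = [str(n) for n in range(100)]
serial = timed(items, batch_size=1, max_concurrency=1)       # 100 requests, one at a time
batched = timed(items, batch_size=10, max_concurrency=1)     # 10 requests, one at a time
concurrent = timed(items, batch_size=10, max_concurrency=8)  # 10 requests, up to 8 in flight

for name, (_, n_requests, seconds) in [("serial", serial), ("batched", batched), ("concurrent", concurrent)]:
    print(f"{name}: {n_requests} requests, {seconds:.2f}s")
```

In this toy model every mode still returns one output per input in input order; only the number of round trips and how many of them overlap changes, which is the same trade-off the README table reports.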

{openaivec-1.0.1 → openaivec-1.0.3}/mkdocs.yml RENAMED

@@ -63,6 +63,7 @@ nav:
       - Prompt Engineering: examples/prompt.ipynb
       - FAQ Generation: examples/generate_faq.ipynb
       - Token Count and Processing Time: examples/batch_size.ipynb
+      - Request Batching Benchmark: examples/benchmark.ipynb
   - API Reference:
       - Main Package: api/main.md
       - pandas_ext: api/pandas_ext.md
@@ -121,7 +122,7 @@ extra:
     - icon: fontawesome/brands/python
       link: https://pypi.org/project/openaivec/
     - icon: fontawesome/brands/discord
-      link: https://discord.gg/vbb83Pgn
+      link: https://discord.gg/hXCS9J6Qek
 plugins:
   - search:

{openaivec-1.0.1 → openaivec-1.0.3}/src/openaivec/_cache/proxy.py RENAMED

@@ -186,11 +186,15 @@ class BatchingMapProxy(ProxyBase[S, T], Generic[S, T]):
     performance (targeting 30-60 seconds per batch).
     Example:
-        >>> p = BatchingMapProxy[int, str](batch_size=3)
-        >>> def f(xs: list[int]) -> list[str]:
-        ...     return [f"v:{x}" for x in xs]
-        >>> p.map([1, 2, 2, 3, 4], f)
-        ['v:1', 'v:2', 'v:2', 'v:3', 'v:4']
+        ```python
+        p = BatchingMapProxy[int, str](batch_size=3)
+        def f(xs: list[int]) -> list[str]:
+            return [f"v:{x}" for x in xs]
+        p.map([1, 2, 2, 3, 4], f)
+        # ['v:1', 'v:2', 'v:2', 'v:3', 'v:4']
+        ```
     """
     # Number of items to process per call to map_func.
@@ -449,6 +453,21 @@ class BatchingMapProxy(ProxyBase[S, T], Generic[S, T]):
         Raises:
             Exception: Propagates any exception raised by ``map_func``.
+        Example:
+            ```python
+            proxy: BatchingMapProxy[int, str] = BatchingMapProxy(batch_size=2)
+            calls: list[list[int]] = []
+            def mapper(chunk: list[int]) -> list[str]:
+                calls.append(chunk)
+                return [f"v:{x}" for x in chunk]
+            proxy.map([1, 2, 2, 3], mapper)
+            # ['v:1', 'v:2', 'v:2', 'v:3']
+            calls  # duplicate ``2`` is only computed once
+            # [[1, 2], [3]]
+            ```
         """
         if self.__all_cached(items):
             return self.__values(items)
@@ -490,16 +509,21 @@ class AsyncBatchingMapProxy(ProxyBase[S, T], Generic[S, T]):
     performance (targeting 30-60 seconds per batch).
     Example:
-        >>> import asyncio
-        >>> from typing import List
-        >>> p = AsyncBatchingMapProxy[int, str](batch_size=2)
-        >>> async def af(xs: list[int]) -> list[str]:
-        ...     await asyncio.sleep(0)
-        ...     return [f"v:{x}" for x in xs]
-        >>> async def run():
-        ...     return await p.map([1, 2, 3], af)
-        >>> asyncio.run(run())
-        ['v:1', 'v:2', 'v:3']
+        ```python
+        import asyncio
+        p = AsyncBatchingMapProxy[int, str](batch_size=2)
+        async def af(xs: list[int]) -> list[str]:
+            await asyncio.sleep(0)
+            return [f"v:{x}" for x in xs]
+        async def run():
+            return await p.map([1, 2, 3], af)
+        asyncio.run(run())
+        # ['v:1', 'v:2', 'v:3']
+        ```
     """
     # Number of items to process per call to map_func.
@@ -747,6 +771,19 @@ class AsyncBatchingMapProxy(ProxyBase[S, T], Generic[S, T]):
         Returns:
             list[T]: Mapped values corresponding to ``items`` in the same order.
+        Example:
+            ```python
+            import asyncio
+            async def mapper(chunk: list[int]) -> list[str]:
+                await asyncio.sleep(0)
+                return [f"v:{x}" for x in chunk]
+            proxy: AsyncBatchingMapProxy[int, str] = AsyncBatchingMapProxy(batch_size=2)
+            asyncio.run(proxy.map([1, 1, 2], mapper))
+            # ['v:1', 'v:1', 'v:2']
+            ```
         """
         if await self.__all_cached(items):
             return await self.__values(items)
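
Reviewer note on the new `map` docstring examples: they show that duplicate inputs are computed once while the result keeps one entry per input. A minimal standalone sketch of that behavior follows, for readers who want to check the documented outputs without installing the package. `dedup_batched_map` is a hypothetical helper written for this note, not the `BatchingMapProxy` implementation, which also handles caching across calls and other concerns not shown here.

```python
from typing import Callable, Hashable, TypeVar

S = TypeVar("S", bound=Hashable)
T = TypeVar("T")


def dedup_batched_map(
    items: list[S],
    map_func: Callable[[list[S]], list[T]],
    batch_size: int,
) -> list[T]:
    """Apply map_func to unique items in chunks, then expand back to input order."""
    unique = list(dict.fromkeys(items))  # drop duplicates, keep first-seen order
    cache: dict[S, T] = {}
    for i in range(0, len(unique), batch_size):
        chunk = unique[i : i + batch_size]
        outputs = map_func(chunk)
        if len(outputs) != len(chunk):
            raise ValueError("map_func must return one output per input")
        cache.update(zip(chunk, outputs))
    # Every original position, duplicates included, is served from the cache
    return [cache[x] for x in items]


calls: list[list[int]] = []


def mapper(chunk: list[int]) -> list[str]:
    calls.append(chunk)  # record what each batch actually contained
    return [f"v:{x}" for x in chunk]


result = dedup_batched_map([1, 2, 2, 3], mapper, batch_size=2)
print(result)  # ['v:1', 'v:2', 'v:2', 'v:3']
print(calls)   # [[1, 2], [3]]
```

The printed values match the outputs given in the docstring example added in this hunk, under the assumption that batches are formed from unique items in first-seen order.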

{openaivec-1.0.1 → openaivec-1.0.3}/src/openaivec/spark.py RENAMED

@@ -181,6 +181,20 @@ def setup(
             If provided, registers `ResponsesModelName` in the DI container.
         embeddings_model_name (str | None): Default model name for embeddings.
             If provided, registers `EmbeddingsModelName` in the DI container.
+    Example:
+        ```python
+        from pyspark.sql import SparkSession
+        from openaivec.spark import setup
+        spark = SparkSession.builder.getOrCreate()
+        setup(
+            spark,
+            api_key="sk-***",
+            responses_model_name="gpt-4.1-mini",
+            embeddings_model_name="text-embedding-3-small",
+        )
+        ```
     """
     CONTAINER.register(SparkSession, lambda: spark)
@@ -221,6 +235,22 @@ def setup_azure(
             If provided, registers `ResponsesModelName` in the DI container.
         embeddings_model_name (str | None): Default model name for embeddings.
             If provided, registers `EmbeddingsModelName` in the DI container.
+    Example:
+        ```python
+        from pyspark.sql import SparkSession
+        from openaivec.spark import setup_azure
+        spark = SparkSession.builder.getOrCreate()
+        setup_azure(
+            spark,
+            api_key="azure-key",
+            base_url="https://YOUR-RESOURCE-NAME.services.ai.azure.com/openai/v1/",
+            api_version="preview",
+            responses_model_name="gpt4-deployment",
+            embeddings_model_name="embedding-deployment",
+        )
+        ```
     """
     CONTAINER.register(SparkSession, lambda: spark)
@@ -375,6 +405,19 @@ def responses_udf(
     Raises:
         ValueError: If `response_format` is not `str` or a Pydantic `BaseModel`.
+    Example:
+        ```python
+        from pyspark.sql import SparkSession
+        from openaivec.spark import responses_udf, setup
+        spark = SparkSession.builder.getOrCreate()
+        setup(spark, api_key="sk-***", responses_model_name="gpt-4.1-mini")
+        udf = responses_udf("Reply with one word.")
+        spark.udf.register("short_answer", udf)
+        df = spark.createDataFrame([("hello",), ("bye",)], ["text"])
+        df.selectExpr("short_answer(text) as reply").show()
+        ```
     Note:
         For optimal performance in distributed environments:
         - **Automatic Caching**: Duplicate inputs within each partition are cached,
@@ -533,6 +576,20 @@ def infer_schema(
     Returns:
         InferredSchema: An object containing the inferred schema and response format.
+    Example:
+        ```python
+        from pyspark.sql import SparkSession
+        spark = SparkSession.builder.getOrCreate()
+        spark.createDataFrame([("great product",), ("bad service",)], ["text"]).createOrReplaceTempView("examples")
+        infer_schema(
+            instructions="Classify sentiment as positive or negative.",
+            example_table_name="examples",
+            example_field_name="text",
+            max_examples=2,
+        )
+        ```
     """
     spark = CONTAINER.resolve(SparkSession)
@@ -595,6 +652,23 @@ def parse_udf(
             forwarded verbatim to the underlying API calls. These parameters are applied to
             all API requests made by the UDF and override any parameters set in the
             response_format or example data.
+    Example:
+        ```python
+        from pyspark.sql import SparkSession
+        spark = SparkSession.builder.getOrCreate()
+        spark.createDataFrame(
+            [("Order #123 delivered",), ("Order #456 delayed",)],
+            ["body"],
+        ).createOrReplaceTempView("messages")
+        udf = parse_udf(
+            instructions="Extract order id as `order_id` and status as `status`.",
+            example_table_name="messages",
+            example_field_name="body",
+        )
+        spark.udf.register("parse_ticket", udf)
+        spark.sql("SELECT parse_ticket(body) AS parsed FROM messages").show()
+        ```
     Returns:
         UserDefinedFunction: A Spark pandas UDF configured to parse responses asynchronously.
             Output schema is `StringType` for str response format or a struct derived from
