llm-gemini 0.13.1__tar.gz → 0.14.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: llm-gemini
-Version: 0.13.1
+Version: 0.14.1
 Summary: LLM plugin to access Google's Gemini family of models
 Author: Simon Willison
 License: Apache-2.0
@@ -17,6 +17,7 @@ Requires-Dist: ijson
 Provides-Extra: test
 Requires-Dist: pytest; extra == "test"
 Requires-Dist: pytest-recording; extra == "test"
+Requires-Dist: pytest-asyncio; extra == "test"
 Requires-Dist: nest-asyncio; extra == "test"
 
 # llm-gemini
@@ -145,7 +146,7 @@ llm chat -m gemini-1.5-pro-latest
 
 ## Embeddings
 
-The plugin also adds support for the `text-embedding-004` embedding model.
+The plugin also adds support for the `gemini-embedding-exp-03-07` and `text-embedding-004` embedding models.
 
 Run that against a single string like this:
 ```bash
@@ -153,10 +154,20 @@ llm embed -m text-embedding-004 -c 'hello world'
 ```
 This returns a JSON array of 768 numbers.
 
+The `gemini-embedding-exp-03-07` model is larger, returning 3072 numbers. You can also use variants of it that are truncated down to smaller sizes:
+
+- `gemini-embedding-exp-03-07` - 3072 numbers
+- `gemini-embedding-exp-03-07-2048` - 2048 numbers
+- `gemini-embedding-exp-03-07-1024` - 1024 numbers
+- `gemini-embedding-exp-03-07-512` - 512 numbers
+- `gemini-embedding-exp-03-07-256` - 256 numbers
+- `gemini-embedding-exp-03-07-128` - 128 numbers
+
 This command will embed every `README.md` file in child directories of the current directory and store the results in a SQLite database called `embed.db` in a collection called `readmes`:
 
 ```bash
-llm embed-multi readmes --files . '*/README.md' -d embed.db -m text-embedding-004
+llm embed-multi readmes -d embed.db -m gemini-embedding-exp-03-07-128 \
+  --files . '*/README.md'
 ```
 You can then run similarity searches against that collection like this:
 ```bash
@@ -124,7 +124,7 @@ llm chat -m gemini-1.5-pro-latest
 
 ## Embeddings
 
-The plugin also adds support for the `text-embedding-004` embedding model.
+The plugin also adds support for the `gemini-embedding-exp-03-07` and `text-embedding-004` embedding models.
 
 Run that against a single string like this:
 ```bash
@@ -132,10 +132,20 @@ llm embed -m text-embedding-004 -c 'hello world'
 ```
 This returns a JSON array of 768 numbers.
 
+The `gemini-embedding-exp-03-07` model is larger, returning 3072 numbers. You can also use variants of it that are truncated down to smaller sizes:
+
+- `gemini-embedding-exp-03-07` - 3072 numbers
+- `gemini-embedding-exp-03-07-2048` - 2048 numbers
+- `gemini-embedding-exp-03-07-1024` - 1024 numbers
+- `gemini-embedding-exp-03-07-512` - 512 numbers
+- `gemini-embedding-exp-03-07-256` - 256 numbers
+- `gemini-embedding-exp-03-07-128` - 128 numbers
+
 This command will embed every `README.md` file in child directories of the current directory and store the results in a SQLite database called `embed.db` in a collection called `readmes`:
 
 ```bash
-llm embed-multi readmes --files . '*/README.md' -d embed.db -m text-embedding-004
+llm embed-multi readmes -d embed.db -m gemini-embedding-exp-03-07-128 \
+  --files . '*/README.md'
 ```
 You can then run similarity searches against that collection like this:
 ```bash
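The truncated variants listed above are plain prefixes of the full vector. A minimal sketch of that relationship (the 3072-dimension vector here is a placeholder, not real model output):

```python
# Each gemini-embedding-exp-03-07-N variant keeps the first N values of the
# full 3072-dimension embedding, so smaller variants are prefixes of larger ones.
full_embedding = [float(i) for i in range(3072)]  # placeholder, not real output


def truncate_embedding(values, size):
    """Return the first `size` dimensions of an embedding vector."""
    return values[:size]


for size in (128, 256, 512, 1024, 2048):
    truncated = truncate_embedding(full_embedding, size)
    assert len(truncated) == size
    assert truncated == full_embedding[:size]  # prefix property
```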
@@ -5,4 +5,5 @@ ijson
 [test]
 pytest
 pytest-recording
+pytest-asyncio
 nest-asyncio
@@ -88,18 +88,24 @@ def resolve_type(attachment):
     return mime_type
 
 
-def cleanup_schema(schema):
+def cleanup_schema(schema, in_properties=False):
     "Gemini supports only a subset of JSON schema"
     keys_to_remove = ("$schema", "additionalProperties", "title")
-    # Recursively remove them
+
     if isinstance(schema, dict):
-        for key in keys_to_remove:
-            schema.pop(key, None)
-        for value in schema.values():
-            cleanup_schema(value)
+        # Only remove keys if we're not inside a 'properties' block.
+        if not in_properties:
+            for key in keys_to_remove:
+                schema.pop(key, None)
+        for key, value in list(schema.items()):
+            # If the key is 'properties', set the flag for its value.
+            if key == "properties" and isinstance(value, dict):
+                cleanup_schema(value, in_properties=True)
+            else:
+                cleanup_schema(value, in_properties=in_properties)
     elif isinstance(schema, list):
-        for value in schema:
-            cleanup_schema(value)
+        for item in schema:
+            cleanup_schema(item, in_properties=in_properties)
     return schema
 
 
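The revised `cleanup_schema` behavior can be checked standalone. This sketch inlines the function as it appears in the diff and applies it to a small illustrative schema: `title` is stripped at the top level but preserved when it is itself a property name.

```python
# Copy of cleanup_schema from the diff above, for standalone demonstration.
def cleanup_schema(schema, in_properties=False):
    "Gemini supports only a subset of JSON schema"
    keys_to_remove = ("$schema", "additionalProperties", "title")
    if isinstance(schema, dict):
        # Only remove keys if we're not inside a 'properties' block.
        if not in_properties:
            for key in keys_to_remove:
                schema.pop(key, None)
        for key, value in list(schema.items()):
            if key == "properties" and isinstance(value, dict):
                cleanup_schema(value, in_properties=True)
            else:
                cleanup_schema(value, in_properties=in_properties)
    elif isinstance(schema, list):
        for item in schema:
            cleanup_schema(item, in_properties=in_properties)
    return schema


# Illustrative sample schema (not from the package's test suite):
schema = {
    "title": "Removed at the top level",
    "type": "object",
    "properties": {"title": {"type": "string"}},  # kept: "title" is a property name
}
cleaned = cleanup_schema(schema)
assert cleaned == {"type": "object", "properties": {"title": {"type": "string"}}}
```

The `in_properties` flag propagates downward, so schema keywords that double as property names survive anywhere under a `properties` block.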
@@ -378,9 +384,19 @@ class AsyncGeminiPro(_SharedGemini, llm.AsyncKeyModel):
 
 @llm.hookimpl
 def register_embedding_models(register):
+    register(GeminiEmbeddingModel("text-embedding-004", "text-embedding-004"))
+    # gemini-embedding-exp-03-07 in different truncation sizes
     register(
-        GeminiEmbeddingModel("text-embedding-004", "text-embedding-004"),
+        GeminiEmbeddingModel(
+            "gemini-embedding-exp-03-07", "gemini-embedding-exp-03-07"
+        ),
     )
+    for i in (128, 256, 512, 1024, 2048):
+        register(
+            GeminiEmbeddingModel(
+                f"gemini-embedding-exp-03-07-{i}", f"gemini-embedding-exp-03-07", i
+            ),
+        )
 
 
 class GeminiEmbeddingModel(llm.EmbeddingModel):
@@ -388,9 +404,10 @@ class GeminiEmbeddingModel(llm.EmbeddingModel):
     key_env_var = "LLM_GEMINI_KEY"
     batch_size = 20
 
-    def __init__(self, model_id, gemini_model_id):
+    def __init__(self, model_id, gemini_model_id, truncate=None):
         self.model_id = model_id
         self.gemini_model_id = gemini_model_id
+        self.truncate = truncate
 
     def embed_batch(self, items):
         headers = {
@@ -416,4 +433,7 @@ class GeminiEmbeddingModel(llm.EmbeddingModel):
         )
 
         response.raise_for_status()
-        return [item["values"] for item in response.json()["embeddings"]]
+        values = [item["values"] for item in response.json()["embeddings"]]
+        if self.truncate:
+            values = [value[: self.truncate] for value in values]
+        return values
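The truncation step added to `embed_batch` amounts to slicing each returned vector. A sketch against a hypothetical response payload (the payload shape mirrors what the diff parses; `truncate` stands in for the instance attribute):

```python
# Hypothetical API payload shaped like the embeddings response the diff parses.
response_json = {"embeddings": [{"values": [0.1] * 3072}, {"values": [0.2] * 3072}]}
truncate = 128  # what GeminiEmbeddingModel stores for the ...-128 variant

# Same logic as the new embed_batch return path:
values = [item["values"] for item in response_json["embeddings"]]
if truncate:
    values = [value[:truncate] for value in values]

assert [len(v) for v in values] == [128, 128]
```

When `truncate` is `None` (the full-size model), the slice is skipped and all 3072 values pass through unchanged.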
@@ -1,6 +1,6 @@
 [project]
 name = "llm-gemini"
-version = "0.13.1"
+version = "0.14.1"
 description = "LLM plugin to access Google's Gemini family of models"
 readme = "README.md"
 authors = [{name = "Simon Willison"}]
@@ -24,4 +24,8 @@ CI = "https://github.com/simonw/llm-gemini/actions"
 gemini = "llm_gemini"
 
 [project.optional-dependencies]
-test = ["pytest", "pytest-recording", "nest-asyncio"]
+test = ["pytest", "pytest-recording", "pytest-asyncio", "nest-asyncio"]
+
+[tool.pytest.ini_options]
+asyncio_mode = "strict"
+asyncio_default_fixture_loop_scope = "function"
@@ -0,0 +1,212 @@
+import llm
+import nest_asyncio
+import json
+import os
+import pytest
+import pydantic
+from llm_gemini import cleanup_schema
+
+nest_asyncio.apply()
+
+GEMINI_API_KEY = os.environ.get("PYTEST_GEMINI_API_KEY", None) or "gm-..."
+
+
+@pytest.mark.vcr
+@pytest.mark.asyncio
+async def test_prompt():
+    model = llm.get_model("gemini-1.5-flash-latest")
+    response = model.prompt("Name for a pet pelican, just the name", key=GEMINI_API_KEY)
+    assert str(response) == "Percy\n"
+    assert response.response_json == {
+        "candidates": [
+            {
+                "finishReason": "STOP",
+                "safetyRatings": [
+                    {
+                        "category": "HARM_CATEGORY_HATE_SPEECH",
+                        "probability": "NEGLIGIBLE",
+                    },
+                    {
+                        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
+                        "probability": "NEGLIGIBLE",
+                    },
+                    {
+                        "category": "HARM_CATEGORY_HARASSMENT",
+                        "probability": "NEGLIGIBLE",
+                    },
+                    {
+                        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
+                        "probability": "NEGLIGIBLE",
+                    },
+                ],
+            }
+        ],
+        "modelVersion": "gemini-1.5-flash-latest",
+    }
+    assert response.token_details == {
+        "promptTokensDetails": [{"modality": "TEXT", "tokenCount": 9}],
+        "candidatesTokensDetails": [{"modality": "TEXT", "tokenCount": 2}],
+    }
+    assert response.input_tokens == 9
+    assert response.output_tokens == 2
+
+    # And try it async too
+    async_model = llm.get_async_model("gemini-1.5-flash-latest")
+    response = await async_model.prompt(
+        "Name for a pet pelican, just the name", key=GEMINI_API_KEY
+    )
+    text = await response.text()
+    assert text == "Percy\n"
+
+
+@pytest.mark.vcr
+@pytest.mark.asyncio
+async def test_prompt_with_pydantic_schema():
+    class Dog(pydantic.BaseModel):
+        name: str
+        age: int
+        bio: str
+
+    model = llm.get_model("gemini-1.5-flash-latest")
+    response = model.prompt(
+        "Invent a cool dog", key=GEMINI_API_KEY, schema=Dog, stream=False
+    )
+    assert json.loads(response.text()) == {
+        "age": 3,
+        "bio": "A fluffy Samoyed with exceptional intelligence and a love for belly rubs. He's mastered several tricks, including fetching the newspaper and opening doors.",
+        "name": "Cloud",
+    }
+    assert response.response_json == {
+        "candidates": [
+            {
+                "finishReason": "STOP",
+                "safetyRatings": [
+                    {
+                        "category": "HARM_CATEGORY_HATE_SPEECH",
+                        "probability": "NEGLIGIBLE",
+                    },
+                    {
+                        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
+                        "probability": "NEGLIGIBLE",
+                    },
+                    {
+                        "category": "HARM_CATEGORY_HARASSMENT",
+                        "probability": "NEGLIGIBLE",
+                    },
+                    {
+                        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
+                        "probability": "NEGLIGIBLE",
+                    },
+                ],
+            }
+        ],
+        "modelVersion": "gemini-1.5-flash-latest",
+    }
+    assert response.input_tokens == 10
+
+
+@pytest.mark.vcr
+@pytest.mark.parametrize(
+    "model_id",
+    (
+        "gemini-embedding-exp-03-07",
+        "gemini-embedding-exp-03-07-128",
+        "gemini-embedding-exp-03-07-512",
+    ),
+)
+def test_embedding(model_id, monkeypatch):
+    monkeypatch.setenv("LLM_GEMINI_KEY", GEMINI_API_KEY)
+    model = llm.get_embedding_model(model_id)
+    response = model.embed("Some text goes here")
+    expected_length = 3072
+    if model_id.endswith("-128"):
+        expected_length = 128
+    elif model_id.endswith("-512"):
+        expected_length = 512
+    assert len(response) == expected_length
+
+
+@pytest.mark.parametrize(
+    "schema,expected",
+    [
+        # Test 1: Top-level keys removal
+        (
+            {
+                "$schema": "http://json-schema.org/draft-07/schema#",
+                "title": "Example Schema",
+                "additionalProperties": False,
+                "type": "object",
+            },
+            {"type": "object"},
+        ),
+        # Test 2: Preserve keys within a "properties" block
+        (
+            {
+                "type": "object",
+                "properties": {
+                    "authors": {"type": "string"},
+                    "title": {"type": "string"},
+                    "reference": {"type": "string"},
+                    "year": {"type": "string"},
+                },
+                "title": "This should be removed from the top-level",
+            },
+            {
+                "type": "object",
+                "properties": {
+                    "authors": {"type": "string"},
+                    "title": {"type": "string"},
+                    "reference": {"type": "string"},
+                    "year": {"type": "string"},
+                },
+            },
+        ),
+        # Test 3: Nested keys outside and inside properties block
+        (
+            {
+                "definitions": {
+                    "info": {
+                        "title": "Info title",  # should be removed because it's not inside a "properties" block
+                        "description": "A description",
+                        "properties": {
+                            "name": {
+                                "title": "Name Title",
+                                "type": "string",
+                            },  # title here should be preserved
+                            "$schema": {
+                                "type": "string"
+                            },  # should be preserved as it's within properties
+                        },
+                    }
+                },
+                "$schema": "http://example.com/schema",
+            },
+            {
+                "definitions": {
+                    "info": {
+                        "description": "A description",
+                        "properties": {
+                            "name": {"title": "Name Title", "type": "string"},
+                            "$schema": {"type": "string"},
+                        },
+                    }
+                }
+            },
+        ),
+        # Test 4: List of schemas
+        (
+            [
+                {
+                    "$schema": "http://json-schema.org/draft-07/schema#",
+                    "type": "object",
+                },
+                {"title": "Should be removed", "type": "array"},
+            ],
+            [{"type": "object"}, {"type": "array"}],
+        ),
+    ],
+)
+def test_cleanup_schema(schema, expected):
+    # Use a deep copy so the original test data remains unchanged.
+    result = cleanup_schema(schema)
+    assert result == expected
@@ -1,104 +0,0 @@
-import llm
-import nest_asyncio
-import json
-import os
-import pytest
-import pydantic
-
-nest_asyncio.apply()
-
-GEMINI_API_KEY = os.environ.get("PYTEST_GEMINI_API_KEY", None) or "gm-..."
-
-
-@pytest.mark.vcr
-@pytest.mark.asyncio
-async def test_prompt():
-    model = llm.get_model("gemini-1.5-flash-latest")
-    response = model.prompt("Name for a pet pelican, just the name", key=GEMINI_API_KEY)
-    assert str(response) == "Percy\n"
-    assert response.response_json == {
-        "candidates": [
-            {
-                "finishReason": "STOP",
-                "safetyRatings": [
-                    {
-                        "category": "HARM_CATEGORY_HATE_SPEECH",
-                        "probability": "NEGLIGIBLE",
-                    },
-                    {
-                        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
-                        "probability": "NEGLIGIBLE",
-                    },
-                    {
-                        "category": "HARM_CATEGORY_HARASSMENT",
-                        "probability": "NEGLIGIBLE",
-                    },
-                    {
-                        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
-                        "probability": "NEGLIGIBLE",
-                    },
-                ],
-            }
-        ],
-        "modelVersion": "gemini-1.5-flash-latest",
-    }
-    assert response.token_details == {
-        "promptTokensDetails": [{"modality": "TEXT", "tokenCount": 9}],
-        "candidatesTokensDetails": [{"modality": "TEXT", "tokenCount": 2}],
-    }
-    assert response.input_tokens == 9
-    assert response.output_tokens == 2
-
-    # And try it async too
-    async_model = llm.get_async_model("gemini-1.5-flash-latest")
-    response = await async_model.prompt(
-        "Name for a pet pelican, just the name", key=GEMINI_API_KEY
-    )
-    text = await response.text()
-    assert text == "Percy\n"
-
-
-@pytest.mark.vcr
-@pytest.mark.asyncio
-async def test_prompt_with_pydantic_schema():
-    class Dog(pydantic.BaseModel):
-        name: str
-        age: int
-        bio: str
-
-    model = llm.get_model("gemini-1.5-flash-latest")
-    response = model.prompt(
-        "Invent a cool dog", key=GEMINI_API_KEY, schema=Dog, stream=False
-    )
-    assert json.loads(response.text()) == {
-        "age": 3,
-        "bio": "A fluffy Samoyed with exceptional intelligence and a love for belly rubs. He's mastered several tricks, including fetching the newspaper and opening doors.",
-        "name": "Cloud",
-    }
-    assert response.response_json == {
-        "candidates": [
-            {
-                "finishReason": "STOP",
-                "safetyRatings": [
-                    {
-                        "category": "HARM_CATEGORY_HATE_SPEECH",
-                        "probability": "NEGLIGIBLE",
-                    },
-                    {
-                        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
-                        "probability": "NEGLIGIBLE",
-                    },
-                    {
-                        "category": "HARM_CATEGORY_HARASSMENT",
-                        "probability": "NEGLIGIBLE",
-                    },
-                    {
-                        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
-                        "probability": "NEGLIGIBLE",
-                    },
-                ],
-            }
-        ],
-        "modelVersion": "gemini-1.5-flash-latest",
-    }
-    assert response.input_tokens == 10