
projectdavid 1.33.23tar.gz → 1.33.24tar.gz


Potentially problematic release.

This version of projectdavid might be problematic.

Files changed (73)

{projectdavid-1.33.23 → projectdavid-1.33.24}/CHANGELOG.md RENAMED

@@ -1,3 +1,10 @@
+## [1.33.24](https://github.com/frankie336/projectdavid/compare/v1.33.23...v1.33.24) (2025-06-22)
+### Bug Fixes
+* Remove Kargs from FileProcessor() ([17a19b3](https://github.com/frankie336/projectdavid/commit/17a19b36f2275bc408b60333f4798b1a462fb96c))
 ## [1.33.23](https://github.com/frankie336/projectdavid/compare/v1.33.22...v1.33.23) (2025-06-17)

{projectdavid-1.33.23/src/projectdavid.egg-info → projectdavid-1.33.24}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: projectdavid
-Version: 1.33.23
+Version: 1.33.24
 Summary: Python SDK for interacting with the Entities Assistant API.
 Author-email: Francis Neequaye Armah <francis.neequaye@projectdavid.co.uk>
 License: PolyForm Noncommercial License 1.0.0

projectdavid-1.33.24/docs/UPDATE-V133.23.md ADDED

@@ -0,0 +1,153 @@
+> **PLEASE NOTE:**
+>
+> As of v1.33.23, thread creation no longer requires `participant_ids` to be passed in:
+>
+```python
+thread = client.threads.create_thread(participant_ids=user.id)
+# Can be shortened to:
+thread = client.threads.create_thread()
+```
+## Function call error surfacing
+Stack traces from failed function calls are now surfaced in the message dialogue. If the assistant
+does not mention the error on its own, you can prompt it with a follow-up such as:
+```python
+"What happened?"
+```
+The assistant sometimes receives the error without proactively revealing it. Please be aware that your consumers
+will have access to stack trace messages. OpenAI does the same, and the risk is minimal.
+## Function call JSON output
+As of **projectdavid v1.33.23**, JSON output from function calls is suppressed by default.
+The `suppress_fc` argument to `stream_chunks` controls this behaviour:
+```python
+for chunk in sync_stream.stream_chunks(
+    provider=PROVIDER,
+    model=MODEL,
+    timeout_per_chunk=60.0,
+    suppress_fc=True,
+):
+    ...
+```
+Please see [here](https://github.com/frankie336/projectdavid/blob/master/docs/inference.md)
+for a detailed usage example.
+# Data Ingestion and Search Methods
+## Vector Store Standard Data Ingestion Pipeline
+Our standard public ingestion method, `VectorStoreClient.add_file_to_vector_store`, will:
+- Pre-process files.
+- Chunk files.
+- Generate embeddings.
+- Upload processed chunks to the specified vector store.
+- Prepare file contents for semantic search.
+This method is designed primarily for individual files containing mostly unstructured text. It’s powerful and will cover most Retrieval-Augmented Generation (RAG) use cases.
+For detailed instructions, please refer to our [usage documentation](https://github.com/frankie336/projectdavid/blob/master/docs/vector_store.md).
+You can also find a real-world usage example in our cookbook here:
+[Basic Vector Embeddings Search Example](https://github.com/frankie336/entities_cook_book/blob/master/recipes/vector_store/basic_vector_embeddings_search.py).
+## Your Vector Store Custom Data Ingestion Pipeline
+We directly leverage our embedding model to craft a customized ingestion pipeline using `FileProcessor.embedding_model`. This pipeline:
+- Pre-processes structured datasets (e.g., MovieLens).
+- Converts each movie record into its own chunk.
+- Manually constructs rich text embeddings from multiple metadata fields (title, genres, release year, etc.).
+In summary, this custom pipeline is optimized for granular semantic results from structured datasets such as MovieLens. The approach can easily be adapted to similar datasets and is especially useful for recommendation algorithms.
+The custom pipeline example is available [here](#).
+## Search Methods
+As of **projectdavid v1.33.23** ([PyPI Link](https://pypi.org/project/projectdavid/)), the following search methods are available:
+### `VectorStoreClient.vector_file_search_raw`
+> **PLEASE NOTE:**
+>
+> This method was previously named `VectorStoreClient.vector_file_search`. Please update your code accordingly when migrating to v1.33.23.
+- Returns raw dictionaries with results ranked by corresponding `K` values in descending order.
+- Currently used in our semantic search examples on vectorized instances of the MovieLens dataset.
+#### Batch Search Example:
+[Batch Search on MovieLens](https://github.com/frankie336/entities_cook_book/blob/master/recipes/reccomender/batch_search_movielens.py)
+#### Fuzzy Search App Example:
+[Fuzzy Search App](https://github.com/frankie336/entities_cook_book/blob/master/recipes/reccomender/search_movielens-v2.py)
+### `VectorStoreClient.simple_vector_file_search`
+Returns a data structure optimized for interpretation and synthesis by LLMs, suitable for function call returns with potential citations.
+**Example Response:**
+```json
+{
+  "object": "vector_store.file_search_result",
+  "data": [
+    {
+      "object": "vector_store.file_hit",
+      "index": 0,
+      "text": "Title: Toy Story. Genres: Animation, Children's, Comedy. Released in 1995.",
+      "score": 0.92,
+      "meta_data": {
+        "item_id": 1,
+        "title": "Toy Story",
+        "genres": ["Animation", "Children's", "Comedy"],
+        "release_year": 1995,
+        "IMDb_URL": "http://www.imdb.com/title/tt0114709/"
+      },
+      "vector_id": "vec_abc123",
+      "store_id": "vect_mqfWyNlZbacer73PQu4Upy"
+    },
+    {
+      "object": "vector_store.file_hit",
+      "index": 1,
+      "text": "Title: The Lion King. Genres: Animation, Children's, Musical. Released in 1994.",
+      "score": 0.89,
+      "meta_data": {
+        "item_id": 2,
+        "title": "The Lion King",
+        "genres": ["Animation", "Children's", "Musical"],
+        "release_year": 1994,
+        "IMDb_URL": "http://www.imdb.com/title/tt0110357/"
+      },
+      "vector_id": "vec_def456",
+      "store_id": "vect_mqfWyNlZbacer73PQu4Upy"
+    }
+  ],
+  "answer": "Here are 2 fun kids' movies from the 1990s: **Toy Story** (1995, Animation/Comedy) and **The Lion King** (1994, Animation/Musical). Both are highly rated family films.",
+  "query": "fun kids movies from the 1990s"
+}
+```
+### `VectorStoreClient.attended_file_search`
+Uses an integrated AI agent to synthesize analysis and a specialized post-processing ranking model to ensure highly precise results. Outputs use an envelope similar to that of `simple_vector_file_search`. Ideal for quick demonstrations or standalone push-button integrations.
+### `VectorStoreClient.unattended_file_search`
+Employs the same advanced post-processing ranking model as `attended_file_search`, but without integrated synthesis. Suitable for standard function call implementations.

projectdavid-1.33.24/docs/inference.md ADDED

@@ -0,0 +1,116 @@
+# Inference
+## Overview
+Inference is the final stage of the Entities API workflow, where the assistant processes a prompt and generates a response. This stage highlights the assistant's intelligence and capabilities. Inference can be executed on edge devices or in the cloud, according to your specific needs. Our API supports both options, allowing flexibility tailored to your use case.
+---
+## Basic Inference Streaming Example
+The following example demonstrates how to:
+1. Create an assistant and a thread.
+2. Send a user message.
+3. Initiate a run.
+4. Stream the assistant's response via the Hyperbolic provider.
+### Requirements
+Ensure the following environment variables are set:
+- `ENTITIES_API_KEY`: API key for Entities API access.
+- `BASE_URL`: Base URL of the Entities API instance (default: `http://localhost:9000`).
+- `HYPERBOLIC_API_KEY`: API key for the Hyperbolic provider.
+- `ENTITIES_USER_ID`: User ID associated with the Entities API.
+### Example Implementation
+```python
+import os
+from dotenv import load_dotenv
+from projectdavid import Entity
+# Load environment variables from .env file
+load_dotenv()
+# Initialize Entities API client
+client = Entity(
+    base_url=os.getenv("BASE_URL", "http://localhost:9000"),
+    api_key=os.getenv("ENTITIES_API_KEY")
+)
+# Constants for Hyperbolic provider
+API_KEY = os.getenv("HYPERBOLIC_API_KEY")
+MODEL = "hyperbolic/deepseek-ai/DeepSeek-V3-0324"
+PROVIDER = "Hyperbolic"
+def main():
+    # Create assistant (can be reused across runs)
+    assistant = client.assistants.create_assistant(
+        name="test_assistant",
+        instructions="You are a helpful AI assistant",
+    )
+    # Create thread (can also be reused)
+    thread = client.threads.create_thread()
+    # Create user message
+    message = client.messages.create_message(
+        thread_id=thread.id,
+        role="user",
+        content="Explain a black hole to me in pure mathematical terms",
+        assistant_id=assistant.id
+    )
+    # Create a run
+    run = client.runs.create_run(
+        assistant_id=assistant.id,
+        thread_id=thread.id
+    )
+```
+### Stream the Assistant's Response
+```python
+    # --------------------------------------
+    #
+    # Setup synchronous streaming
+    #
+    #-----------------------------------------
+    sync_stream = client.synchronous_inference_stream
+    sync_stream.setup(
+        user_id=os.getenv("ENTITIES_USER_ID"),
+        thread_id=thread.id,
+        assistant_id=assistant.id,
+        message_id=message.id,
+        run_id=run.id,
+        api_key=API_KEY
+    )
+    # Stream the assistant's response
+    try:
+        for chunk in sync_stream.stream_chunks(
+            provider=PROVIDER,
+            model=MODEL,
+            timeout_per_chunk=60.0,
+            suppress_fc=True,
+        ):
+            content = chunk.get("content", "")
+            if content:
+                print(content, end="", flush=True)
+        print("\n--- End of Stream ---")
+    except Exception as e:
+        print(f"Stream Error: {e}")
+if __name__ == "__main__":
+    main()
+```
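
The streaming loop in `docs/inference.md` prints each chunk as it arrives. When the full reply is needed as a single string, the same loop can accumulate instead of print. This is a minimal sketch, assuming only what the example shows: each chunk is a dict and the text lives under the `"content"` key. `fake_stream` is a hypothetical stand-in for `sync_stream.stream_chunks(...)` so the block runs without the SDK or a server.

```python
from typing import Dict, Iterable, Iterator


def fake_stream() -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for sync_stream.stream_chunks(...)."""
    yield {"content": "A black hole "}
    yield {}                      # chunk with no "content" key
    yield {"content": ""}         # chunk with empty content
    yield {"content": "is a region of spacetime."}


def collect_text(chunks: Iterable[Dict[str, str]]) -> str:
    """Join the non-empty "content" values of a chunk stream."""
    parts = []
    for chunk in chunks:
        # Same guard as the documented loop: missing or empty content is skipped.
        content = chunk.get("content", "")
        if content:
            parts.append(content)
    return "".join(parts)


reply = collect_text(fake_stream())
print(reply)
```

In real use the argument would be the generator returned by `stream_chunks`, with the same `try`/`except` wrapper as in the documented example.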

{projectdavid-1.33.23 → projectdavid-1.33.24}/docs/threads.md RENAMED

@@ -17,7 +17,7 @@ client = Entity()
 user = client.users.create_user(name='My test user')
-thread = client.threads.create_thread(participant_ids=user.id)
+thread = client.threads.create_thread()
 print(thread.id)

{projectdavid-1.33.23 → projectdavid-1.33.24}/docs/vector_store.md RENAMED

@@ -14,13 +14,15 @@ Associated methods can be used to extend the memory and contextual recall of AI
 ## Basic Vector Store Operations
 ```python
+import os
 from projectdavid import Entity
-client = Entity()
-# create a user
-test_user = client.users.create_user(name='test_user')
-print(test_user)
+client = Entity(
+    base_url=os.getenv("BASE_URL", "http://localhost:9000"),
+    api_key=os.getenv("ENTITIES_API_KEY"), #This is the entities user API Key
+)
 # create a vector store
 store = client.vectors.create_vector_store(
@@ -71,25 +73,89 @@ client = Entity()
 save_file_to_store = client.vectors.add_file_to_vector_store(
     vector_store_id='vect_WsdjjLHoQqyMLmCdrvShc6',
-    file_path='test_file.txt'
+    file_path='Donoghue_v_Stevenson__1932__UKHL_100__26_May_1932_.pdf'
 )
 ```
+Text is split, embedded into a vector space, enriched with metadata, and pushed into a vector database.
+This allows for semantic search over the file's contents.
+---
-At this point, your file has been vectorized to your store.
+## Search Methods
+The Entities Vector Store supports four distinct search methods, each tailored to a specific use case:
----
-### Searches
+```VectorStoreClient.vector_file_search_raw```
+  Returns raw similarity-ranked vectors with full metadata. Best for low-level access or post-processing.
+````python
+client = Entity(
+    base_url=os.getenv("BASE_URL", "http://localhost:9000"),
+    api_key=os.getenv("ENTITIES_API_KEY"), #This is the entities user API Key
+)
----
+client.vectors.vector_file_search_raw(
+    vector_store_id=store.id,
+    query_text='Explain the neighbour principle',
+)
+````
+```VectorStoreClient.simple_vector_file_search``` Returns a structured response optimized for LLM consumption — useful in function calls with citation-ready output.
+````python
+client = Entity(
+    base_url=os.getenv("BASE_URL", "http://localhost:9000"),
+    api_key=os.getenv("ENTITIES_API_KEY"), #This is the entities user API Key
+)
+client.vectors.simple_vector_file_search(
+    vector_store_id=store.id,
+    query_text='Explain the neighbour principle',
+)
+````
+```VectorStoreClient.attended_file_search``` Performs search, ranking, and synthesis using an internal agent. Ideal for push-button demos or standalone assistants.
+````python
+client = Entity(
+    base_url=os.getenv("BASE_URL", "http://localhost:9000"),
+    api_key=os.getenv("ENTITIES_API_KEY"), #This is the entities user API Key
+)
+client.vectors.attended_file_search(
+    vector_store_id=store.id,
+    query_text='Explain the neighbour principle',
+)
+````
+```VectorStoreClient.unattended_file_search``` Performs high-precision search with post-ranking, but without synthesis. Use this in toolchains or function-calling workflows.
+````python
+client = Entity(
+    base_url=os.getenv("BASE_URL", "http://localhost:9000"),
+    api_key=os.getenv("ENTITIES_API_KEY"), #This is the entities user API Key
+)
+client.vectors.unattended_file_search(
+    vector_store_id=store.id,
+    query_text='Explain the neighbour principle',
+)
+````
+---
 - The assistant will self-select appropriate vector store
 searches using its latent logic when responding to a prompt.
@@ -134,11 +200,6 @@ list_store_files(vector_store_id) → List[VectorStoreFileRead]
 update_vector_store_file_status(vector_store_id, file_id, status, error_message=None) → VectorStoreFileRead
 ```
-### Search
-```python
-search_vector_store(vector_store_id, query_text, top_k=5, filters=None) → List[dict]
-```
 ### Assistant Integration
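
The new `vector_store.md` text says that file text is "split, embedded into a vector space, enriched with metadata". The splitting and metadata steps can be illustrated with a generic overlapping word-window chunker. This is a minimal sketch of a common technique, not the SDK's actual `FileProcessor` logic, which the diff does not show; `chunk_text`, `max_words`, `overlap`, and the metadata field names other than `meta_data` are assumptions made for illustration, and the embedding step is left out entirely.

```python
def chunk_text(text: str, source: str, max_words: int = 50, overlap: int = 10) -> list:
    """Split text into overlapping word windows, each tagged with metadata.

    Hypothetical helper: a generic sketch, not projectdavid's chunker.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append(
            {
                "text": " ".join(window),
                "meta_data": {
                    "source": source,
                    "chunk_index": len(chunks),
                    "start_word": start,
                },
            }
        )
        # Stop once a window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_text(sample, source="example.txt", max_words=5, overlap=2)
for chunk in chunks:
    print(chunk["meta_data"]["chunk_index"], chunk["text"])
```

Overlap between neighbouring windows is a common way to avoid cutting a sentence's context exactly at a chunk boundary; each chunk would then be embedded and stored alongside its `meta_data`.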

{projectdavid-1.33.23 → projectdavid-1.33.24}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "projectdavid"
-version = "1.33.23"
+version = "1.33.24"
 description = "Python SDK for interacting with the Entities Assistant API."
 readme = "README.md"
 authors = [

{projectdavid-1.33.23 → projectdavid-1.33.24}/src/projectdavid/clients/vectors.py RENAMED

@@ -92,7 +92,9 @@ class VectorStoreClient:
         self.identifier_service = UtilsInterface.IdentifierService()
         # 🔶 forward kwargs into the upgraded FileProcessor
-        self.file_processor = FileProcessor(**(file_processor_kwargs or {}))
+        # self.file_processor = FileProcessor(**(file_processor_kwargs or {}))
+        # Using Stripped down version for now until we move forward with multi-modal stores
+        self.file_processor = FileProcessor()
         log.info("VectorStoreClient → %s", self.base_url)
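
The `vectors.py` hunk above stops forwarding `file_processor_kwargs` into `FileProcessor`. The practical effect for callers can be sketched with stand-in classes. Everything below is hypothetical: the toy `FileProcessor` and its `chunk_size` option are invented for illustration, and it is an assumption that the client constructor still accepts `file_processor_kwargs`, since the hunk does not show the signature.

```python
class FileProcessor:
    """Hypothetical stand-in; not the SDK's real FileProcessor."""

    def __init__(self, chunk_size: int = 512):
        self.chunk_size = chunk_size


class ClientBefore:
    """Mirrors the 1.33.23 line: kwargs are forwarded to the processor."""

    def __init__(self, file_processor_kwargs=None):
        self.file_processor = FileProcessor(**(file_processor_kwargs or {}))


class ClientAfter:
    """Mirrors the 1.33.24 line: the processor is built with defaults only."""

    def __init__(self, file_processor_kwargs=None):
        # The argument is accepted (by assumption) but no longer used.
        self.file_processor = FileProcessor()


options = {"chunk_size": 128}
before = ClientBefore(file_processor_kwargs=options)
after = ClientAfter(file_processor_kwargs=options)
print(before.file_processor.chunk_size, after.file_processor.chunk_size)
```

Under these assumptions, code that relied on custom processor options would silently fall back to defaults after upgrading, which is worth checking when moving from 1.33.23 to 1.33.24.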

{projectdavid-1.33.23 → projectdavid-1.33.24/src/projectdavid.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: projectdavid
-Version: 1.33.23
+Version: 1.33.24
 Summary: Python SDK for interacting with the Entities Assistant API.
 Author-email: Francis Neequaye Armah <francis.neequaye@projectdavid.co.uk>
 License: PolyForm Noncommercial License 1.0.0

{projectdavid-1.33.23 → projectdavid-1.33.24}/src/projectdavid.egg-info/SOURCES.txt RENAMED

@@ -3,6 +3,7 @@ LICENSE
 MANIFEST.in
 README.md
 pyproject.toml
+docs/UPDATE-V133.23.md
 docs/assistants.md
 docs/code_interpretation.md
 docs/database.md

projectdavid-1.33.23/docs/inference.md DELETED

@@ -1,7 +0,0 @@
-# Inference
-## Overview
-Inference is the final stage of the Entities API workflow, where the assistant processes a prompt and generates a reply. This stage is where the magic happens, and the assistant’s intelligence is put to the test. Inference can be performed on edge devices or in the cloud, depending on your specific requirements and constraints. Our API supports both options, allowing you to choose the best approach for your use case.
-...In progess, please check here soon.