PyPI - graphiti-core - Versions diffs - 0.3.3__tar.gz → 0.3.4__tar.gz - Mend

graphiti-core 0.3.3 → 0.3.4 (tar.gz)

Potentially problematic release: this version of graphiti-core might be problematic.

Files changed (43)

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: graphiti-core
-Version: 0.3.3
+Version: 0.3.4
 Summary: A temporal graph building library
 License: Apache-2.0
 Author: Paul Paliychuk
@@ -21,7 +21,7 @@ Description-Content-Type: text/markdown
 <div align="center">
-# Graphiti
+<img width="350" alt="Graphiti-ts-small" src="https://github.com/user-attachments/assets/bbd02947-e435-4a05-b25a-bbbac36d52c8">
 ## Temporal Knowledge Graphs for Agentic Applications
@@ -37,7 +37,9 @@ Description-Content-Type: text/markdown
 </div>
-Graphiti builds dynamic, temporally aware Knowledge Graphs that represent complex, evolving relationships between entities over time. Graphiti ingests both unstructured and structured data, and the resulting graph may be queried using a fusion of time, full-text, semantic, and graph algorithm approaches.
+Graphiti builds dynamic, temporally aware Knowledge Graphs that represent complex, evolving relationships between
+entities over time. Graphiti ingests both unstructured and structured data, and the resulting graph may be queried using
+a fusion of time, full-text, semantic, and graph algorithm approaches.
 <br />
@@ -47,25 +49,39 @@ Graphiti builds dynamic, temporally aware Knowledge Graphs that represent comple
 <br />
-Graphiti helps you create and query Knowledge Graphs that evolve over time. A knowledge graph is a network of interconnected facts, such as _“Kendra loves Adidas shoes.”_ Each fact is a “triplet” represented by two entities, or nodes (_”Kendra”_, _“Adidas shoes”_), and their relationship, or edge (_”loves”_). Knowledge Graphs have been explored extensively for information retrieval. What makes Graphiti unique is its ability to autonomously build a knowledge graph while handling changing relationships and maintaining historical context.
+Graphiti helps you create and query Knowledge Graphs that evolve over time. A knowledge graph is a network of
+interconnected facts, such as _“Kendra loves Adidas shoes.”_ Each fact is a “triplet” represented by two entities, or
+nodes (_”Kendra”_, _“Adidas shoes”_), and their relationship, or edge (_”loves”_). Knowledge Graphs have been explored
+extensively for information retrieval. What makes Graphiti unique is its ability to autonomously build a knowledge graph
+while handling changing relationships and maintaining historical context.
 With Graphiti, you can build LLM applications such as:
-- Assistants that learn from user interactions, fusing personal knowledge with dynamic data from business systems like CRMs and billing platforms.
+- Assistants that learn from user interactions, fusing personal knowledge with dynamic data from business systems like
+  CRMs and billing platforms.
 - Agents that autonomously execute complex tasks, reasoning with state changes from multiple dynamic sources.
-Graphiti supports a wide range of applications in sales, customer service, health, finance, and more, enabling long-term recall and state-based reasoning for both assistants and agents.
+Graphiti supports a wide range of applications in sales, customer service, health, finance, and more, enabling long-term
+recall and state-based reasoning for both assistants and agents.
 ## Why Graphiti?
-We were intrigued by Microsoft’s GraphRAG, which expanded on RAG text chunking by using a graph to better model a document corpus and making this representation available via semantic and graph search techniques. However, GraphRAG did not address our core problem: It's primarily designed for static documents and doesn't inherently handle temporal aspects of data.
-Graphiti is designed from the ground up to handle constantly changing information, hybrid semantic and graph search, and scale:
-- **Temporal Awareness:** Tracks changes in facts and relationships over time, enabling point-in-time queries. Graph edges include temporal metadata to record relationship lifecycles.
-- **Episodic Processing:** Ingests data as discrete episodes, maintaining data provenance and allowing incremental entity and relationship extraction.
-- **Hybrid Search:** Combines semantic and BM25 full-text search, with the ability to rerank results by distance from a central node e.g. “Kendra”.
-- **Scalable:** Designed for processing large datasets, with parallelization of LLM calls for bulk processing while preserving the chronology of events.
+We were intrigued by Microsoft’s GraphRAG, which expanded on RAG text chunking by using a graph to better model a
+document corpus and making this representation available via semantic and graph search techniques. However, GraphRAG did
+not address our core problem: It's primarily designed for static documents and doesn't inherently handle temporal
+aspects of data.
+Graphiti is designed from the ground up to handle constantly changing information, hybrid semantic and graph search, and
+scale:
+- **Temporal Awareness:** Tracks changes in facts and relationships over time, enabling point-in-time queries. Graph
+  edges include temporal metadata to record relationship lifecycles.
+- **Episodic Processing:** Ingests data as discrete episodes, maintaining data provenance and allowing incremental
+  entity and relationship extraction.
+- **Hybrid Search:** Combines semantic and BM25 full-text search, with the ability to rerank results by distance from a
+  central node e.g. “Kendra”.
+- **Scalable:** Designed for processing large datasets, with parallelization of LLM calls for bulk processing while
+  preserving the chronology of events.
 - **Supports Varied Sources:** Can ingest both unstructured text and structured JSON data.
 <p align="center">
@@ -91,7 +107,8 @@ Optional:
 - Anthropic or Groq API key (for alternative LLM providers)
 > [!TIP]
-> The simplest way to install Neo4j is via [Neo4j Desktop](https://neo4j.com/download/). It provides a user-friendly interface to manage Neo4j instances and databases.
+> The simplest way to install Neo4j is via [Neo4j Desktop](https://neo4j.com/download/). It provides a user-friendly
+> interface to manage Neo4j instances and databases.
 ```bash
 pip install graphiti-core
@@ -106,7 +123,8 @@ poetry add graphiti-core
 ## Quick Start
 > [!IMPORTANT]
-> Graphiti uses OpenAI for LLM inference and embedding. Ensure that an `OPENAI_API_KEY` is set in your environment. Support for Anthropic and Groq LLM inferences is available, too.
+> Graphiti uses OpenAI for LLM inference and embedding. Ensure that an `OPENAI_API_KEY` is set in your environment.
+> Support for Anthropic and Groq LLM inferences is available, too.
 ```python
 from graphiti_core import Graphiti
@@ -140,25 +158,25 @@ for i, episode in enumerate(episodes):
 results = await graphiti.search('Who was the California Attorney General?')
 [
     EntityEdge(
-    │   uuid='3133258f738e487383f07b04e15d4ac0',
-    │   source_node_uuid='2a85789b318d4e418050506879906e62',
-    │   target_node_uuid='baf7781f445945989d6e4f927f881556',
-    │   created_at=datetime.datetime(2024, 8, 26, 13, 13, 24, 861097),
-    │   name='HELD_POSITION',
-        # the fact reflects the updated state that Harris is
-        # no longer the AG of California
-    │   fact='Kamala Harris was the Attorney General of California',
-    │   fact_embedding=[
-    │   │   -0.009955154731869698,
-    │       ...
-    │   │   0.00784289836883545
-    │   ],
-    │   episodes=['b43e98ad0a904088a76c67985caecc22'],
-    │   expired_at=datetime.datetime(2024, 8, 26, 20, 18, 1, 53812),
-        # These dates represent the date this edge was true.
-    │   valid_at=datetime.datetime(2011, 1, 3, 0, 0, tzinfo=<UTC>),
-    │   invalid_at=datetime.datetime(2017, 1, 3, 0, 0, tzinfo=<UTC>)
-    )
+│   uuid = '3133258f738e487383f07b04e15d4ac0',
+│   source_node_uuid = '2a85789b318d4e418050506879906e62',
+│   target_node_uuid = 'baf7781f445945989d6e4f927f881556',
+│   created_at = datetime.datetime(2024, 8, 26, 13, 13, 24, 861097),
+│   name = 'HELD_POSITION',
+# the fact reflects the updated state that Harris is
+# no longer the AG of California
+│   fact = 'Kamala Harris was the Attorney General of California',
+│   fact_embedding = [
+│   │   -0.009955154731869698,
+│       ...
+│   │   0.00784289836883545
+│],
+│   episodes = ['b43e98ad0a904088a76c67985caecc22'],
+│   expired_at = datetime.datetime(2024, 8, 26, 20, 18, 1, 53812),
+# These dates represent the date this edge was true.
+│   valid_at = datetime.datetime(2011, 1, 3, 0, 0, tzinfo= < UTC >),
+│   invalid_at = datetime.datetime(2017, 1, 3, 0, 0, tzinfo= < UTC >)
+)
 ]
 # Rerank search results based on graph distance
@@ -191,14 +209,16 @@ Graphiti is under active development. We aim to maintain API stability while wor
 - [ ] Achieving good performance with different LLM and embedding models
 - [ ] Creating a dedicated embedder interface
 - [ ] Supporting custom graph schemas:
-  - Allow developers to provide their own defined node and edge classes when ingesting episodes
-  - Enable more flexible knowledge representation tailored to specific use cases
+    - Allow developers to provide their own defined node and edge classes when ingesting episodes
+    - Enable more flexible knowledge representation tailored to specific use cases
 - [ ] Enhancing retrieval capabilities with more robust and configurable options
 - [ ] Expanding test coverage to ensure reliability and catch edge cases
 ## Contributing
-We encourage and appreciate all forms of contributions, whether it's code, documentation, addressing GitHub Issues, or answering questions in the Graphiti Discord channel. For detailed guidelines on code contributions, please refer to [CONTRIBUTING](CONTRIBUTING.md).
+We encourage and appreciate all forms of contributions, whether it's code, documentation, addressing GitHub Issues, or
+answering questions in the Graphiti Discord channel. For detailed guidelines on code contributions, please refer
+to [CONTRIBUTING](CONTRIBUTING.md).
 ## Support

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/README.md RENAMED

@@ -1,6 +1,6 @@
 <div align="center">
-# Graphiti
+<img width="350" alt="Graphiti-ts-small" src="https://github.com/user-attachments/assets/bbd02947-e435-4a05-b25a-bbbac36d52c8">
 ## Temporal Knowledge Graphs for Agentic Applications
@@ -16,7 +16,9 @@
 </div>
-Graphiti builds dynamic, temporally aware Knowledge Graphs that represent complex, evolving relationships between entities over time. Graphiti ingests both unstructured and structured data, and the resulting graph may be queried using a fusion of time, full-text, semantic, and graph algorithm approaches.
+Graphiti builds dynamic, temporally aware Knowledge Graphs that represent complex, evolving relationships between
+entities over time. Graphiti ingests both unstructured and structured data, and the resulting graph may be queried using
+a fusion of time, full-text, semantic, and graph algorithm approaches.
 <br />
@@ -26,25 +28,39 @@ Graphiti builds dynamic, temporally aware Knowledge Graphs that represent comple
 <br />
-Graphiti helps you create and query Knowledge Graphs that evolve over time. A knowledge graph is a network of interconnected facts, such as _“Kendra loves Adidas shoes.”_ Each fact is a “triplet” represented by two entities, or nodes (_”Kendra”_, _“Adidas shoes”_), and their relationship, or edge (_”loves”_). Knowledge Graphs have been explored extensively for information retrieval. What makes Graphiti unique is its ability to autonomously build a knowledge graph while handling changing relationships and maintaining historical context.
+Graphiti helps you create and query Knowledge Graphs that evolve over time. A knowledge graph is a network of
+interconnected facts, such as _“Kendra loves Adidas shoes.”_ Each fact is a “triplet” represented by two entities, or
+nodes (_”Kendra”_, _“Adidas shoes”_), and their relationship, or edge (_”loves”_). Knowledge Graphs have been explored
+extensively for information retrieval. What makes Graphiti unique is its ability to autonomously build a knowledge graph
+while handling changing relationships and maintaining historical context.
 With Graphiti, you can build LLM applications such as:
-- Assistants that learn from user interactions, fusing personal knowledge with dynamic data from business systems like CRMs and billing platforms.
+- Assistants that learn from user interactions, fusing personal knowledge with dynamic data from business systems like
+  CRMs and billing platforms.
 - Agents that autonomously execute complex tasks, reasoning with state changes from multiple dynamic sources.
-Graphiti supports a wide range of applications in sales, customer service, health, finance, and more, enabling long-term recall and state-based reasoning for both assistants and agents.
+Graphiti supports a wide range of applications in sales, customer service, health, finance, and more, enabling long-term
+recall and state-based reasoning for both assistants and agents.
 ## Why Graphiti?
-We were intrigued by Microsoft’s GraphRAG, which expanded on RAG text chunking by using a graph to better model a document corpus and making this representation available via semantic and graph search techniques. However, GraphRAG did not address our core problem: It's primarily designed for static documents and doesn't inherently handle temporal aspects of data.
-Graphiti is designed from the ground up to handle constantly changing information, hybrid semantic and graph search, and scale:
-- **Temporal Awareness:** Tracks changes in facts and relationships over time, enabling point-in-time queries. Graph edges include temporal metadata to record relationship lifecycles.
-- **Episodic Processing:** Ingests data as discrete episodes, maintaining data provenance and allowing incremental entity and relationship extraction.
-- **Hybrid Search:** Combines semantic and BM25 full-text search, with the ability to rerank results by distance from a central node e.g. “Kendra”.
-- **Scalable:** Designed for processing large datasets, with parallelization of LLM calls for bulk processing while preserving the chronology of events.
+We were intrigued by Microsoft’s GraphRAG, which expanded on RAG text chunking by using a graph to better model a
+document corpus and making this representation available via semantic and graph search techniques. However, GraphRAG did
+not address our core problem: It's primarily designed for static documents and doesn't inherently handle temporal
+aspects of data.
+Graphiti is designed from the ground up to handle constantly changing information, hybrid semantic and graph search, and
+scale:
+- **Temporal Awareness:** Tracks changes in facts and relationships over time, enabling point-in-time queries. Graph
+  edges include temporal metadata to record relationship lifecycles.
+- **Episodic Processing:** Ingests data as discrete episodes, maintaining data provenance and allowing incremental
+  entity and relationship extraction.
+- **Hybrid Search:** Combines semantic and BM25 full-text search, with the ability to rerank results by distance from a
+  central node e.g. “Kendra”.
+- **Scalable:** Designed for processing large datasets, with parallelization of LLM calls for bulk processing while
+  preserving the chronology of events.
 - **Supports Varied Sources:** Can ingest both unstructured text and structured JSON data.
 <p align="center">
@@ -70,7 +86,8 @@ Optional:
 - Anthropic or Groq API key (for alternative LLM providers)
 > [!TIP]
-> The simplest way to install Neo4j is via [Neo4j Desktop](https://neo4j.com/download/). It provides a user-friendly interface to manage Neo4j instances and databases.
+> The simplest way to install Neo4j is via [Neo4j Desktop](https://neo4j.com/download/). It provides a user-friendly
+> interface to manage Neo4j instances and databases.
 ```bash
 pip install graphiti-core
@@ -85,7 +102,8 @@ poetry add graphiti-core
 ## Quick Start
 > [!IMPORTANT]
-> Graphiti uses OpenAI for LLM inference and embedding. Ensure that an `OPENAI_API_KEY` is set in your environment. Support for Anthropic and Groq LLM inferences is available, too.
+> Graphiti uses OpenAI for LLM inference and embedding. Ensure that an `OPENAI_API_KEY` is set in your environment.
+> Support for Anthropic and Groq LLM inferences is available, too.
 ```python
 from graphiti_core import Graphiti
@@ -119,25 +137,25 @@ for i, episode in enumerate(episodes):
 results = await graphiti.search('Who was the California Attorney General?')
 [
     EntityEdge(
-    │   uuid='3133258f738e487383f07b04e15d4ac0',
-    │   source_node_uuid='2a85789b318d4e418050506879906e62',
-    │   target_node_uuid='baf7781f445945989d6e4f927f881556',
-    │   created_at=datetime.datetime(2024, 8, 26, 13, 13, 24, 861097),
-    │   name='HELD_POSITION',
-        # the fact reflects the updated state that Harris is
-        # no longer the AG of California
-    │   fact='Kamala Harris was the Attorney General of California',
-    │   fact_embedding=[
-    │   │   -0.009955154731869698,
-    │       ...
-    │   │   0.00784289836883545
-    │   ],
-    │   episodes=['b43e98ad0a904088a76c67985caecc22'],
-    │   expired_at=datetime.datetime(2024, 8, 26, 20, 18, 1, 53812),
-        # These dates represent the date this edge was true.
-    │   valid_at=datetime.datetime(2011, 1, 3, 0, 0, tzinfo=<UTC>),
-    │   invalid_at=datetime.datetime(2017, 1, 3, 0, 0, tzinfo=<UTC>)
-    )
+│   uuid = '3133258f738e487383f07b04e15d4ac0',
+│   source_node_uuid = '2a85789b318d4e418050506879906e62',
+│   target_node_uuid = 'baf7781f445945989d6e4f927f881556',
+│   created_at = datetime.datetime(2024, 8, 26, 13, 13, 24, 861097),
+│   name = 'HELD_POSITION',
+# the fact reflects the updated state that Harris is
+# no longer the AG of California
+│   fact = 'Kamala Harris was the Attorney General of California',
+│   fact_embedding = [
+│   │   -0.009955154731869698,
+│       ...
+│   │   0.00784289836883545
+│],
+│   episodes = ['b43e98ad0a904088a76c67985caecc22'],
+│   expired_at = datetime.datetime(2024, 8, 26, 20, 18, 1, 53812),
+# These dates represent the date this edge was true.
+│   valid_at = datetime.datetime(2011, 1, 3, 0, 0, tzinfo= < UTC >),
+│   invalid_at = datetime.datetime(2017, 1, 3, 0, 0, tzinfo= < UTC >)
+)
 ]
 # Rerank search results based on graph distance
@@ -170,14 +188,16 @@ Graphiti is under active development. We aim to maintain API stability while wor
 - [ ] Achieving good performance with different LLM and embedding models
 - [ ] Creating a dedicated embedder interface
 - [ ] Supporting custom graph schemas:
-  - Allow developers to provide their own defined node and edge classes when ingesting episodes
-  - Enable more flexible knowledge representation tailored to specific use cases
+    - Allow developers to provide their own defined node and edge classes when ingesting episodes
+    - Enable more flexible knowledge representation tailored to specific use cases
 - [ ] Enhancing retrieval capabilities with more robust and configurable options
 - [ ] Expanding test coverage to ensure reliability and catch edge cases
 ## Contributing
-We encourage and appreciate all forms of contributions, whether it's code, documentation, addressing GitHub Issues, or answering questions in the Graphiti Discord channel. For detailed guidelines on code contributions, please refer to [CONTRIBUTING](CONTRIBUTING.md).
+We encourage and appreciate all forms of contributions, whether it's code, documentation, addressing GitHub Issues, or
+answering questions in the Graphiti Discord channel. For detailed guidelines on code contributions, please refer
+to [CONTRIBUTING](CONTRIBUTING.md).
 ## Support

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/graphiti_core/edges.py RENAMED

@@ -104,7 +104,6 @@ class EpisodicEdge(Edge):
         edges = [get_episodic_edge_from_record(record) for record in records]
-        logger.info(f'Found Edge: {uuid}')
         if len(edges) == 0:
             raise EdgeNotFoundError(uuid)
         return edges[0]
@@ -127,7 +126,29 @@ class EpisodicEdge(Edge):
         edges = [get_episodic_edge_from_record(record) for record in records]
-        logger.info(f'Found Edges: {uuids}')
+        if len(edges) == 0:
+            raise EdgeNotFoundError(uuids[0])
+        return edges
+    @classmethod
+    async def get_by_group_ids(cls, driver: AsyncDriver, group_ids: list[str | None]):
+        records, _, _ = await driver.execute_query(
+            """
+        MATCH (n:Episodic)-[e:MENTIONS]->(m:Entity)
+        WHERE e.group_id IN $group_ids
+        RETURN
+            e.uuid As uuid,
+            e.group_id AS group_id,
+            n.uuid AS source_node_uuid,
+            m.uuid AS target_node_uuid,
+            e.created_at AS created_at
+        """,
+            group_ids=group_ids,
+        )
+        edges = [get_episodic_edge_from_record(record) for record in records]
+        uuids = [edge.uuid for edge in edges]
         if len(edges) == 0:
             raise EdgeNotFoundError(uuids[0])
         return edges
@@ -215,7 +236,6 @@ class EntityEdge(Edge):
         edges = [get_entity_edge_from_record(record) for record in records]
-        logger.info(f'Found Edge: {uuid}')
         if len(edges) == 0:
             raise EdgeNotFoundError(uuid)
         return edges[0]
@@ -245,7 +265,36 @@ class EntityEdge(Edge):
         edges = [get_entity_edge_from_record(record) for record in records]
-        logger.info(f'Found Edges: {uuids}')
+        if len(edges) == 0:
+            raise EdgeNotFoundError(uuids[0])
+        return edges
+    @classmethod
+    async def get_by_group_ids(cls, driver: AsyncDriver, group_ids: list[str | None]):
+        records, _, _ = await driver.execute_query(
+            """
+        MATCH (n:Entity)-[e:RELATES_TO]->(m:Entity)
+        WHERE e.group_id IN $group_ids
+        RETURN
+            e.uuid AS uuid,
+            n.uuid AS source_node_uuid,
+            m.uuid AS target_node_uuid,
+            e.created_at AS created_at,
+            e.name AS name,
+            e.group_id AS group_id,
+            e.fact AS fact,
+            e.fact_embedding AS fact_embedding,
+            e.episodes AS episodes,
+            e.expired_at AS expired_at,
+            e.valid_at AS valid_at,
+            e.invalid_at AS invalid_at
+        """,
+            group_ids=group_ids,
+        )
+        edges = [get_entity_edge_from_record(record) for record in records]
+        uuids = [edge.uuid for edge in edges]
         if len(edges) == 0:
             raise EdgeNotFoundError(uuids[0])
         return edges
@@ -288,8 +337,6 @@ class CommunityEdge(Edge):
         edges = [get_community_edge_from_record(record) for record in records]
-        logger.info(f'Found Edge: {uuid}')
         return edges[0]
     @classmethod
@@ -310,7 +357,25 @@ class CommunityEdge(Edge):
         edges = [get_community_edge_from_record(record) for record in records]
-        logger.info(f'Found Edges: {uuids}')
+        return edges
+    @classmethod
+    async def get_by_group_ids(cls, driver: AsyncDriver, group_ids: list[str | None]):
+        records, _, _ = await driver.execute_query(
+            """
+        MATCH (n:Community)-[e:HAS_MEMBER]->(m:Entity | Community)
+        WHERE e.group_id IN $group_ids
+        RETURN
+            e.uuid As uuid,
+            e.group_id AS group_id,
+            n.uuid AS source_node_uuid,
+            m.uuid AS target_node_uuid,
+            e.created_at AS created_at
+        """,
+            group_ids=group_ids,
+        )
+        edges = [get_community_edge_from_record(record) for record in records]
         return edges
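
Note on the empty-result guard in the new `get_by_group_ids` methods of `EpisodicEdge` and `EntityEdge`: `uuids` is built from `edges`, so when `edges` is empty, evaluating `uuids[0]` raises `IndexError` before `EdgeNotFoundError` can be constructed. A minimal sketch of one possible guard is below. `GroupsEdgesNotFoundError` and `edges_or_raise` are hypothetical names used only for illustration; they are not shown anywhere in this diff.

```python
from __future__ import annotations


class GroupsEdgesNotFoundError(Exception):
    """Hypothetical error keyed on the queried group ids (not from the diff)."""

    def __init__(self, group_ids: list[str | None]):
        self.group_ids = group_ids
        super().__init__(f'no edges found for group ids {group_ids}')


def edges_or_raise(edges: list, group_ids: list[str | None]) -> list:
    # Report the group ids that were queried. Indexing a list derived
    # from the empty result would raise IndexError instead.
    if not edges:
        raise GroupsEdgesNotFoundError(group_ids)
    return edges
```

The design choice here is to name the query input in the error, since no edge uuid exists to report when nothing matched.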

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/graphiti_core/graphiti.py RENAMED

@@ -77,7 +77,14 @@ load_dotenv()
 class Graphiti:
-    def __init__(self, uri: str, user: str, password: str, llm_client: LLMClient | None = None):
+    def __init__(
+        self,
+        uri: str,
+        user: str,
+        password: str,
+        llm_client: LLMClient | None = None,
+        store_raw_episode_content: bool = True,
+    ):
         """
         Initialize a Graphiti instance.
@@ -116,6 +123,7 @@ class Graphiti:
         """
         self.driver = AsyncGraphDatabase.driver(uri, auth=(user, password))
         self.database = 'neo4j'
+        self.store_raw_episode_content = store_raw_episode_content
         if llm_client:
             self.llm_client = llm_client
         else:
@@ -150,8 +158,8 @@ class Graphiti:
                 # Use graphiti...
             finally:
                 graphiti.close()
-        self.driver.close()
         """
+        self.driver.close()
     async def build_indices_and_constraints(self):
         """
@@ -251,6 +259,8 @@ class Graphiti:
             An id for the graph partition the episode is a part of.
         uuid : str | None
             Optional uuid of the episode.
+        update_communities : bool
+            Optional. Whether to update communities with new node information
         Returns
         -------
@@ -276,7 +286,6 @@ class Graphiti:
         try:
             start = time()
-            nodes: list[EntityNode] = []
             entity_edges: list[EntityEdge] = []
             embedder = self.llm_client.get_embedder()
             now = datetime.now()
@@ -295,6 +304,8 @@ class Graphiti:
                 valid_at=reference_time,
             )
             episode.uuid = uuid if uuid is not None else episode.uuid
+            if not self.store_raw_episode_content:
+                episode.content = ''
             # Extract entities as nodes
@@ -323,7 +334,7 @@ class Graphiti:
                 ),
             )
             logger.info(f'Adjusted mentioned nodes: {[(n.name, n.uuid) for n in mentioned_nodes]}')
-            nodes.extend(mentioned_nodes)
+            nodes = mentioned_nodes
             extracted_edges_with_resolved_pointers = resolve_edge_pointers(
                 extracted_edges, uuid_map
@@ -568,7 +579,7 @@ class Graphiti:
         center_node_uuid: str | None = None,
         group_ids: list[str | None] | None = None,
         num_results=DEFAULT_SEARCH_LIMIT,
-    ):
+    ) -> list[EntityEdge]:
         """
         Perform a hybrid search on the knowledge graph.
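
Note on `store_raw_episode_content`: the hunks above add a constructor flag (default `True`) and, when it is `False`, set `episode.content = ''` right after the episode object is built. The sketch below reproduces only that branch with stand-in types. `Episode` and `prepare_episode` are hypothetical names, and whether later steps still read `episode.content` is not visible in this diff.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Stand-in for an episodic node; only the fields used in this sketch."""

    name: str
    content: str


def prepare_episode(episode: Episode, store_raw_episode_content: bool = True) -> Episode:
    # Mirrors the added branch: when raw storage is disabled, the content
    # field is blanked on the episode object itself.
    if not store_raw_episode_content:
        episode.content = ''
    return episode
```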

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/graphiti_core/llm_client/anthropic_client.py RENAMED

@@ -30,13 +30,17 @@ from .errors import RateLimitError
 logger = logging.getLogger(__name__)
 DEFAULT_MODEL = 'claude-3-5-sonnet-20240620'
+DEFAULT_MAX_TOKENS = 8192
 class AnthropicClient(LLMClient):
     def __init__(self, config: LLMConfig | None = None, cache: bool = False):
         if config is None:
-            config = LLMConfig()
+            config = LLMConfig(max_tokens=DEFAULT_MAX_TOKENS)
+        elif config.max_tokens is None:
+            config.max_tokens = DEFAULT_MAX_TOKENS
         super().__init__(config, cache)
         self.client = AsyncAnthropic(
             api_key=config.api_key,
             # we'll use tenacity to retry
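
Note on the `max_tokens` fallback: the constructor now applies a provider-specific default both when no config is passed and when a config is passed with `max_tokens` unset. A minimal sketch of that pattern, assuming a stand-in `LLMConfig` dataclass (the real class has more fields and is not shown in this diff) and a hypothetical helper name `with_provider_default`:

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_MAX_TOKENS = 8192  # value this diff adds for the Anthropic client


@dataclass
class LLMConfig:
    """Stand-in config; field defaults here are assumptions."""

    api_key: str | None = None
    max_tokens: int | None = None


def with_provider_default(
    config: LLMConfig | None, default: int = DEFAULT_MAX_TOKENS
) -> LLMConfig:
    # No config at all: build one carrying the provider default.
    if config is None:
        return LLMConfig(max_tokens=default)
    # Config given but max_tokens unset: fill in the provider default.
    if config.max_tokens is None:
        config.max_tokens = default
    return config
```

An explicitly set `max_tokens` is left untouched, so callers can still override the provider default.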

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/graphiti_core/llm_client/client.py RENAMED

@@ -35,7 +35,7 @@ logger = logging.getLogger(__name__)
 def is_server_or_retry_error(exception):
-    if isinstance(exception, RateLimitError):
+    if isinstance(exception, (RateLimitError, json.decoder.JSONDecodeError)):
         return True
     return (
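
Note on the retry predicate: with this change a `json.decoder.JSONDecodeError` is treated like a rate-limit error, so a response that fails to parse as JSON becomes retryable. The sketch below shows only the branch visible in the hunk and swaps in a plain loop for the retry library the package uses. `RateLimitError` is a stand-in class, and `is_retryable` and `call_with_retry` are hypothetical names.

```python
import json


class RateLimitError(Exception):
    """Stand-in for the client's RateLimitError."""


def is_retryable(exception: BaseException) -> bool:
    # Only the check visible in the diff; the remaining conditions of the
    # real predicate are elided there and are not reproduced here.
    return isinstance(exception, (RateLimitError, json.decoder.JSONDecodeError))


def call_with_retry(fn, attempts: int = 3):
    """Call fn(), retrying while is_retryable accepts the raised exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on a non-retryable error.
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
```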

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/graphiti_core/llm_client/config.py RENAMED

@@ -15,7 +15,7 @@ limitations under the License.
 """
 EMBEDDING_DIM = 1024
-DEFAULT_MAX_TOKENS = 4096
+DEFAULT_MAX_TOKENS = 16384
 DEFAULT_TEMPERATURE = 0

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/graphiti_core/llm_client/groq_client.py RENAMED

@@ -31,13 +31,17 @@ from .errors import RateLimitError
 logger = logging.getLogger(__name__)
 DEFAULT_MODEL = 'llama-3.1-70b-versatile'
+DEFAULT_MAX_TOKENS = 2048
 class GroqClient(LLMClient):
     def __init__(self, config: LLMConfig | None = None, cache: bool = False):
         if config is None:
-            config = LLMConfig()
+            config = LLMConfig(max_tokens=DEFAULT_MAX_TOKENS)
+        elif config.max_tokens is None:
+            config.max_tokens = DEFAULT_MAX_TOKENS
         super().__init__(config, cache)
         self.client = AsyncGroq(api_key=config.api_key)
     def get_embedder(self) -> typing.Any:

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/graphiti_core/llm_client/openai_client.py RENAMED

@@ -33,13 +33,50 @@ DEFAULT_MODEL = 'gpt-4o-2024-08-06'
 class OpenAIClient(LLMClient):
-    def __init__(self, config: LLMConfig | None = None, cache: bool = False):
+    """
+    OpenAIClient is a client class for interacting with OpenAI's language models.
+    This class extends the LLMClient and provides methods to initialize the client,
+    get an embedder, and generate responses from the language model.
+    Attributes:
+        client (AsyncOpenAI): The OpenAI client used to interact with the API.
+        model (str): The model name to use for generating responses.
+        temperature (float): The temperature to use for generating responses.
+        max_tokens (int): The maximum number of tokens to generate in a response.
+    Methods:
+        __init__(config: LLMConfig | None = None, cache: bool = False, client: typing.Any = None):
+            Initializes the OpenAIClient with the provided configuration, cache setting, and client.
+        get_embedder() -> typing.Any:
+            Returns the embedder from the OpenAI client.
+        _generate_response(messages: list[Message]) -> dict[str, typing.Any]:
+            Generates a response from the language model based on the provided messages.
+    """
+    def __init__(
+        self, config: LLMConfig | None = None, cache: bool = False, client: typing.Any = None
+    ):
+        """
+        Initialize the OpenAIClient with the provided configuration, cache setting, and client.
+        Args:
+            config (LLMConfig | None): The configuration for the LLM client, including API key, model, base URL, temperature, and max tokens.
+            cache (bool): Whether to use caching for responses. Defaults to False.
+            client (Any | None): An optional async client instance to use. If not provided, a new AsyncOpenAI client is created.
+        """
         if config is None:
             config = LLMConfig()
         super().__init__(config, cache)
-        self.client = AsyncOpenAI(api_key=config.api_key, base_url=config.base_url)
+        if client is None:
+            self.client = AsyncOpenAI(api_key=config.api_key, base_url=config.base_url)
+        else:
+            self.client = client
     def get_embedder(self) -> typing.Any:
         return self.client.embeddings
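
Note on the new `client` parameter: the constructor now accepts an already-built async client and only creates its own when none is passed, which makes it possible to substitute a test double. A minimal sketch of that injection pattern, using stand-ins throughout: `FakeEmbeddings`, `FakeAsyncClient`, and `ClientWrapper` are hypothetical names, and the fake `create` signature is an assumption rather than the OpenAI SDK's.

```python
import asyncio


class FakeEmbeddings:
    async def create(self, input: list[str], model: str) -> list[list[float]]:
        # Deterministic fake: one 2-dimensional vector per input string.
        return [[float(len(text)), 0.0] for text in input]


class FakeAsyncClient:
    """Hypothetical test double exposing only an `embeddings` attribute."""

    def __init__(self):
        self.embeddings = FakeEmbeddings()


class ClientWrapper:
    """Stand-in for the constructor pattern shown in the diff."""

    def __init__(self, client=None, default_factory=FakeAsyncClient):
        # Use the injected client when given; otherwise build a default one.
        self.client = client if client is not None else default_factory()

    def get_embedder(self):
        return self.client.embeddings


fake = FakeAsyncClient()
wrapper = ClientWrapper(client=fake)
vectors = asyncio.run(wrapper.get_embedder().create(['ab'], model='fake-model'))
```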

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/graphiti_core/nodes.py RENAMED

@@ -158,8 +158,6 @@ class EpisodicNode(Node):
         episodes = [get_episodic_node_from_record(record) for record in records]
-        logger.info(f'Found Node: {uuid}')
         if len(episodes) == 0:
             raise NodeNotFoundError(uuid)
@@ -185,7 +183,27 @@ class EpisodicNode(Node):
         episodes = [get_episodic_node_from_record(record) for record in records]
-        logger.info(f'Found Nodes: {uuids}')
+        return episodes
+    @classmethod
+    async def get_by_group_ids(cls, driver: AsyncDriver, group_ids: list[str | None]):
+        records, _, _ = await driver.execute_query(
+            """
+        MATCH (e:Episodic) WHERE e.group_id IN $group_ids
+            RETURN DISTINCT
+            e.content AS content,
+            e.created_at AS created_at,
+            e.valid_at AS valid_at,
+            e.uuid AS uuid,
+            e.name AS name,
+            e.group_id AS group_id,
+            e.source_description AS source_description,
+            e.source AS source
+        """,
+            group_ids=group_ids,
+        )
+        episodes = [get_episodic_node_from_record(record) for record in records]
         return episodes
@@ -240,8 +258,6 @@ class EntityNode(Node):
         nodes = [get_entity_node_from_record(record) for record in records]
-        logger.info(f'Found Node: {uuid}')
         return nodes[0]
     @classmethod
@@ -262,7 +278,25 @@ class EntityNode(Node):
         nodes = [get_entity_node_from_record(record) for record in records]
-        logger.info(f'Found Nodes: {uuids}')
+        return nodes
+    @classmethod
+    async def get_by_group_ids(cls, driver: AsyncDriver, group_ids: list[str | None]):
+        records, _, _ = await driver.execute_query(
+            """
+        MATCH (n:Entity) WHERE n.group_id IN $group_ids
+        RETURN
+            n.uuid As uuid,
+            n.name AS name,
+            n.name_embedding AS name_embedding,
+            n.group_id AS group_id,
+            n.created_at AS created_at,
+            n.summary AS summary
+        """,
+            group_ids=group_ids,
+        )
+        nodes = [get_entity_node_from_record(record) for record in records]
         return nodes
@@ -317,8 +351,6 @@ class CommunityNode(Node):
         nodes = [get_community_node_from_record(record) for record in records]
-        logger.info(f'Found Node: {uuid}')
         return nodes[0]
     @classmethod
@@ -337,11 +369,29 @@ class CommunityNode(Node):
             uuids=uuids,
         )
-        nodes = [get_community_node_from_record(record) for record in records]
+        communities = [get_community_node_from_record(record) for record in records]
-        logger.info(f'Found Nodes: {uuids}')
+        return communities
-        return nodes
+    @classmethod
+    async def get_by_group_ids(cls, driver: AsyncDriver, group_ids: list[str | None]):
+        records, _, _ = await driver.execute_query(
+            """
+        MATCH (n:Community) WHERE n.group_id IN $group_ids
+        RETURN
+            n.uuid As uuid,
+            n.name AS name,
+            n.name_embedding AS name_embedding,
+            n.group_id AS group_id,
+            n.created_at AS created_at,
+            n.summary AS summary
+        """,
+            group_ids=group_ids,
+        )
+        communities = [get_community_node_from_record(record) for record in records]
+        return communities
 # Node helpers

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/graphiti_core/utils/maintenance/community_operations.py RENAMED

@@ -4,6 +4,7 @@ from collections import defaultdict
 from datetime import datetime
 from neo4j import AsyncDriver
+from pydantic import BaseModel
 from graphiti_core.edges import CommunityEdge
 from graphiti_core.llm_client import LLMClient
@@ -11,9 +12,17 @@ from graphiti_core.nodes import CommunityNode, EntityNode, get_community_node_fr
 from graphiti_core.prompts import prompt_library
 from graphiti_core.utils.maintenance.edge_operations import build_community_edges
+MAX_COMMUNITY_BUILD_CONCURRENCY = 10
 logger = logging.getLogger(__name__)
+class Neighbor(BaseModel):
+    node_uuid: str
+    edge_count: int
 async def build_community_projection(driver: AsyncDriver) -> str:
     records, _, _ = await driver.execute_query("""
     CALL gds.graph.project("communities", "Entity",
@@ -29,36 +38,96 @@ async def build_community_projection(driver: AsyncDriver) -> str:
     return records[0]['graph']
-async def destroy_projection(driver: AsyncDriver, projection_name: str):
-    await driver.execute_query(
-        """
-    CALL gds.graph.drop($projection_name)
-    """,
-        projection_name=projection_name,
-    )
+async def get_community_clusters(driver: AsyncDriver) -> list[list[EntityNode]]:
+    community_clusters: list[list[EntityNode]] = []
-async def get_community_clusters(
-    driver: AsyncDriver, projection_name: str
-) -> list[list[EntityNode]]:
-    records, _, _ = await driver.execute_query("""
-    CALL gds.leiden.stream("communities")
-    YIELD nodeId, communityId
-    RETURN gds.util.asNode(nodeId).uuid AS entity_uuid, communityId
+    group_id_values, _, _ = await driver.execute_query("""
+    MATCH (n:Entity WHERE n.group_id IS NOT NULL)
+    RETURN
+        collect(DISTINCT n.group_id) AS group_ids
     """)
-    community_map: dict[int, list[str]] = defaultdict(list)
-    for record in records:
-        community_map[record['communityId']].append(record['entity_uuid'])
-    community_clusters: list[list[EntityNode]] = list(
-        await asyncio.gather(
-            *[EntityNode.get_by_uuids(driver, cluster) for cluster in community_map.values()]
+    group_ids = group_id_values[0]['group_ids']
+    for group_id in group_ids:
+        projection: dict[str, list[Neighbor]] = {}
+        nodes = await EntityNode.get_by_group_ids(driver, [group_id])
+        for node in nodes:
+            records, _, _ = await driver.execute_query(
+                """
+            MATCH (n:Entity {group_id: $group_id, uuid: $uuid})-[r:RELATES_TO]-(m: Entity {group_id: $group_id})
+            WITH count(r) AS count, m.uuid AS uuid
+            RETURN
+                uuid,
+                count
+            """,
+                uuid=node.uuid,
+                group_id=group_id,
+            )
+            projection[node.uuid] = [
+                Neighbor(node_uuid=record['uuid'], edge_count=record['count']) for record in records
+            ]
+        cluster_uuids = label_propagation(projection)
+        community_clusters.extend(
+            list(
+                await asyncio.gather(
+                    *[EntityNode.get_by_uuids(driver, cluster) for cluster in cluster_uuids]
+                )
+            )
         )
-    )
     return community_clusters
+def label_propagation(projection: dict[str, list[Neighbor]]) -> list[list[str]]:
+    # Implement the label propagation community detection algorithm.
+    # 1. Start with each node being assigned its own community
+    # 2. Each node will take on the community of the plurality of its neighbors
+    # 3. Ties are broken by going to the largest community
+    # 4. Continue until no communities change during propagation
+    community_map = {uuid: i for i, uuid in enumerate(projection.keys())}
+    while True:
+        no_change = True
+        new_community_map: dict[str, int] = {}
+        for uuid, neighbors in projection.items():
+            curr_community = community_map[uuid]
+            community_candidates: dict[int, int] = defaultdict(int)
+            for neighbor in neighbors:
+                community_candidates[community_map[neighbor.node_uuid]] += neighbor.edge_count
+            community_lst = [
+                (count, community) for community, count in community_candidates.items()
+            ]
+            community_lst.sort(reverse=True)
+            community_candidate = community_lst[0][1] if len(community_lst) > 0 else -1
+            new_community = max(community_candidate, curr_community)
+            new_community_map[uuid] = new_community
+            if new_community != curr_community:
+                no_change = False
+        if no_change:
+            break
+        community_map = new_community_map
+    community_cluster_map = defaultdict(list)
+    for uuid, community in community_map.items():
+        community_cluster_map[community].append(uuid)
+    clusters = [cluster for cluster in community_cluster_map.values()]
+    return clusters
 async def summarize_pair(llm_client: LLMClient, summary_pair: tuple[str, str]) -> str:
     # Prepare context for LLM
     context = {'node_summaries': [{'summary': summary} for summary in summary_pair]}
@@ -85,7 +154,7 @@ async def generate_summary_description(llm_client: LLMClient, summary: str) -> s
 async def build_community(
-    llm_client: LLMClient, community_cluster: list[EntityNode]
+        llm_client: LLMClient, community_cluster: list[EntityNode]
 ) -> tuple[CommunityNode, list[CommunityEdge]]:
     summaries = [entity.summary for entity in community_cluster]
     length = len(summaries)
@@ -99,7 +168,7 @@ async def build_community(
                 *[
                     summarize_pair(llm_client, (str(left_summary), str(right_summary)))
                     for left_summary, right_summary in zip(
-                        summaries[: int(length / 2)], summaries[int(length / 2) :]
+                        summaries[: int(length / 2)], summaries[int(length / 2):]
                     )
                 ]
             )
@@ -127,15 +196,18 @@ async def build_community(
 async def build_communities(
-    driver: AsyncDriver, llm_client: LLMClient
+        driver: AsyncDriver, llm_client: LLMClient
 ) -> tuple[list[CommunityNode], list[CommunityEdge]]:
-    projection = await build_community_projection(driver)
-    community_clusters = await get_community_clusters(driver, projection)
+    community_clusters = await get_community_clusters(driver)
+    semaphore = asyncio.Semaphore(MAX_COMMUNITY_BUILD_CONCURRENCY)
+    async def limited_build_community(cluster):
+        async with semaphore:
+            return await build_community(llm_client, cluster)
     communities: list[tuple[CommunityNode, list[CommunityEdge]]] = list(
-        await asyncio.gather(
-            *[build_community(llm_client, cluster) for cluster in community_clusters]
-        )
+        await asyncio.gather(*[limited_build_community(cluster) for cluster in community_clusters])
     )
     community_nodes: list[CommunityNode] = []
@@ -144,7 +216,6 @@ async def build_communities(
         community_nodes.append(community[0])
         community_edges.extend(community[1])
-    await destroy_projection(driver, projection)
     return community_nodes, community_edges
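
Note on the hunk above: `build_communities` now wraps each `build_community` call in an `asyncio.Semaphore` so that at most `MAX_COMMUNITY_BUILD_CONCURRENCY` builds run at once. The sketch below shows the same pattern in isolation. `run_limited`, `work`, and the limit of 2 are illustrative assumptions, and `asyncio.sleep` stands in for the LLM call.

```python
import asyncio

MAX_CONCURRENCY = 2  # illustrative; the diff uses MAX_COMMUNITY_BUILD_CONCURRENCY = 10


async def run_limited(items: list[int], limit: int = MAX_CONCURRENCY) -> tuple[list[int], int]:
    """Process items concurrently, never running more than `limit` at once."""
    semaphore = asyncio.Semaphore(limit)
    running = 0  # tasks currently inside the semaphore
    peak = 0     # highest value `running` ever reached

    async def work(item: int) -> int:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for an LLM request
            running -= 1
            return item * 2

    # gather preserves input order in its results, as in the diff.
    results = await asyncio.gather(*[work(item) for item in items])
    return list(results), peak


results, peak = asyncio.run(run_limited([1, 2, 3, 4, 5]))
print(results, peak)  # [2, 4, 6, 8, 10] 2
```

All tasks are still created up front; the semaphore only bounds how many are inside the guarded section at any moment.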
@@ -156,7 +227,7 @@ async def remove_communities(driver: AsyncDriver):
 async def determine_entity_community(
-    driver: AsyncDriver, entity: EntityNode
+        driver: AsyncDriver, entity: EntityNode
 ) -> tuple[CommunityNode | None, bool]:
     # Check if the node is already part of a community
     records, _, _ = await driver.execute_query(
@@ -217,7 +288,7 @@ async def determine_entity_community(
 async def update_community(
-    driver: AsyncDriver, llm_client: LLMClient, embedder, entity: EntityNode
+        driver: AsyncDriver, llm_client: LLMClient, embedder, entity: EntityNode
 ):
     community, is_new = await determine_entity_community(driver, entity)
@@ -236,4 +307,4 @@ async def update_community(
     await community.generate_name_embedding(embedder)
-    await community.save(driver)
+    await community.save(driver)
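
Note on this file's changes: community detection no longer calls the GDS Leiden procedure and instead runs an in-process `label_propagation` over a per-group neighbor map. The sketch below reproduces the update rule from the diff on a small hand-built graph. `Neighbor` is a dataclass here rather than the pydantic model in the diff, purely to avoid a dependency, and the sample `projection` is invented for illustration. As written in the diff, a node's new label is `max(candidate, current)`, so labels only ever increase, which guarantees the loop terminates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Neighbor:
    # Assumption: dataclass stand-in for the pydantic BaseModel in the diff.
    node_uuid: str
    edge_count: int


def label_propagation(projection: dict[str, list[Neighbor]]) -> list[list[str]]:
    # Every node starts in its own community, numbered by insertion order.
    community_map = {uuid: i for i, uuid in enumerate(projection)}
    while True:
        no_change = True
        new_community_map: dict[str, int] = {}
        for uuid, neighbors in projection.items():
            curr = community_map[uuid]
            # Sum edge counts per neighboring community.
            weights: dict[int, int] = defaultdict(int)
            for neighbor in neighbors:
                weights[community_map[neighbor.node_uuid]] += neighbor.edge_count
            # Highest weight first; equal weights fall back to the higher community id.
            ranked = sorted(((count, c) for c, count in weights.items()), reverse=True)
            candidate = ranked[0][1] if ranked else -1
            new = max(candidate, curr)
            new_community_map[uuid] = new
            if new != curr:
                no_change = False
        if no_change:
            break
        community_map = new_community_map
    clusters: dict[int, list[str]] = defaultdict(list)
    for uuid, community in community_map.items():
        clusters[community].append(uuid)
    return list(clusters.values())


# A triangle (a, b, c) plus a separate pair (d, e), one edge between each pair of neighbors.
projection = {
    'a': [Neighbor('b', 1), Neighbor('c', 1)],
    'b': [Neighbor('a', 1), Neighbor('c', 1)],
    'c': [Neighbor('a', 1), Neighbor('b', 1)],
    'd': [Neighbor('e', 1)],
    'e': [Neighbor('d', 1)],
}
clusters = label_propagation(projection)
print(clusters)  # [['a', 'b', 'c'], ['d', 'e']]
```

Each pass reads labels from the previous pass only (a synchronous update), so the result does not depend on the order in which nodes are visited within a pass.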

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/graphiti_core/utils/maintenance/node_operations.py RENAMED

@@ -272,9 +272,12 @@ async def dedupe_node_list(
     unique_nodes = []
     uuid_map: dict[str, str] = {}
     for node_data in nodes_data:
-        node = node_map[node_data['uuids'][0]]
-        node.summary = node_data['summary']
-        unique_nodes.append(node)
+        node_instance: EntityNode | None = node_map.get(node_data['uuids'][0])
+        if node_instance is None:
+            logger.warning(f'Node {node_data["uuids"][0]} not found in node map')
+            continue
+        node_instance.summary = node_data['summary']
+        unique_nodes.append(node_instance)
         for uuid in node_data['uuids'][1:]:
             uuid_value = node_map[node_data['uuids'][0]].uuid
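
Note on the hunk above: the direct `node_map[...]` lookup, which raises `KeyError` for an unknown UUID, is replaced with `node_map.get(...)` plus a warning and `continue`. A minimal sketch of that guard in isolation, where `pick_known` and the sample data are hypothetical:

```python
import logging

logger = logging.getLogger(__name__)


def pick_known(node_map: dict[str, str], requested: list[str]) -> list[str]:
    """Return values for known keys, logging and skipping unknown ones."""
    found: list[str] = []
    for key in requested:
        value = node_map.get(key)  # returns None instead of raising KeyError
        if value is None:
            logger.warning('Node %s not found in node map', key)
            continue
        found.append(value)
    return found


result = pick_known({'u1': 'Alice', 'u2': 'Bob'}, ['u1', 'missing', 'u2'])
print(result)  # ['Alice', 'Bob']
```

The effect is that one unrecognized UUID in the input no longer aborts the whole loop; the entry is skipped and the rest are processed.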

{graphiti_core-0.3.3 → graphiti_core-0.3.4}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "graphiti-core"
-version = "0.3.3"
+version = "0.3.4"
 description = "A temporal graph building library"
 authors = [
     "Paul Paliychuk <paul@getzep.com>",