PyPI - cognee - Versions diffs - 0.5.1.dev0__py3-none-any.whl → 0.5.2__py3-none-any.whl - Mend

cognee 0.5.1.dev0__py3-none-any.whl → 0.5.2__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (241)

cognee/modules/notebooks/tutorials/python-development-with-cognee/data/my_developer_rules.md ADDED

@@ -0,0 +1,79 @@
+Assistant Guidelines
+These rules are absolutely imperative to adhere to. Comply with them precisely as they are outlined.
+The agent must use sequential thinking MCP tool to work out problems.
+Core Behavior Guidelines
+Respond only to explicit requests. Do not add files, code, tests, or comments unless asked.
+Follow instructions precisely. No assumptions or speculative additions.
+Use provided context accurately.
+Avoid extra output. No debugging logs or test harnesses unless requested.
+Produce clean, optimized code when code is requested. Respect existing style.
+Deliver complete, standalone solutions. No placeholders.
+Limit file creation. Only create new files when necessary.
+If you modify the model in a user's code, you must confirm with the user and never be sneaky. Always tell the user exactly what you are doing.
+Communication & Delivery
+9. Don't explain unless asked. Do not expose reasoning in outputs.
+10. If unsure, say "I don't know." Avoid hallucinated content.
+11. Maintain consistency across sessions. Refer to project memory and documentation.
+12. Respect privacy and permissions. Never leak or infer secure data.
+13. Prioritize targeted edits over full rewrites.
+14. Optimize incrementally. Avoid unnecessary overhauls.
+Spec.md Requirement
+You must maintain a file named Spec.md. This file acts as the single source of truth for the project.
+Rules:
+Before starting any implementation, check if Spec.md already exists.
+If it does not exist, create one using the template provided below.
+Always update Spec.md before and after any major change.
+Use the contents of Spec.md to guide logic, structure, and implementation decisions.
+When updating a section, condense previous content to keep the document concise.
+Spec.md Starter Template (Plain Text Format)
+Title: Spec.md – Project Specification
+Section: Purpose
+Describe the main goal of this feature, tool, or system.
+Section: Core Functionality
+List the key features, expected behaviors, and common use cases.
+Section: Architecture Overview
+Summarize the technical setup, frameworks used, and main modules or services.
+Section: Input and Output Contracts
+List all inputs and outputs in a table-like format:
+Input: describe the input data, its format, and where it comes from.
+Output: describe the output data, its format, and its destination.
+Section: Edge Cases and Constraints
+List known limitations, special scenarios, and fallback behaviors.
+Section: File and Module Map
+List all important files or modules and describe what each one is responsible for.
+Section: Open Questions or TODOs
+Create a checklist of unresolved decisions, logic that needs clarification, or tasks that are still pending.
+Section: Last Updated
+Include the most recent update date and who made the update.

cognee/modules/notebooks/tutorials/python-development-with-cognee/data/pep_style_guide.md ADDED

@@ -0,0 +1,74 @@
+# PEP 8 Style Guide: Essentials
+## Code Layout
+- Indentation: 4 spaces per level
+- Line length: 79 for code (88/100 acceptable by team), 72 for comments/docstrings
+- Blank lines: 2 around top-level defs/classes, 1 between methods
+```python
+# Hanging indent for long calls
+foo = long_function_name(
+    var_one, var_two,
+    var_three, var_four,
+)
+```
+## Imports
+- One import per line
+- Group: stdlib, third-party, local
+- Prefer absolute imports; avoid wildcard imports
+```python
+import os
+import sys
+from subprocess import Popen, PIPE
+import requests
+from myproject.models import User
+```
+## Whitespace
+- No space inside brackets or before commas/semicolons
+- Spaces around binary operators
+```python
+x = 1
+hypot2 = x * x + y * y
+```
+## Naming
+- snake_case: functions, variables
+- PascalCase: classes
+- SCREAMING_SNAKE_CASE: constants
+## Comments & Docstrings
+- Use complete sentences; keep up to date
+- Triple-double quotes for public modules, classes, functions
+```python
+def f(x: int) -> int:
+    """Return x doubled."""
+    return x * 2
+```
+## Type Hints
+- Space after colon; arrow for returns
+```python
+def munge(s: str) -> str: ...
+```
+## Tooling
+- Black, isort, Flake8 (or Ruff) to automate style
+- Example pyproject.toml excerpt:
+```toml
+[tool.black]
+line-length = 88
+[tool.isort]
+profile = "black"
+```
+## Common Violations
+- E501: line too long -> break with parentheses
+- E225: missing whitespace around operator
+- E402: module import not at top of file

cognee/modules/notebooks/tutorials/python-development-with-cognee/data/zen_principles.md ADDED

@@ -0,0 +1,74 @@
+# The Zen of Python: Practical Guide
+## Overview
+The Zen of Python (Tim Peters, import this) captures Python's philosophy. Use these principles as a checklist during design, coding, and reviews.
+## Key Principles With Guidance
+### 1. Beautiful is better than ugly
+Prefer descriptive names, clear structure, and consistent formatting.
+### 2. Explicit is better than implicit
+Be clear about behavior, imports, and types.
+```python
+from datetime import datetime, timedelta
+def get_future_date(days_ahead: int) -> datetime:
+    return datetime.now() + timedelta(days=days_ahead)
+```
+### 3. Simple is better than complex
+Choose straightforward solutions first.
+### 4. Complex is better than complicated
+When complexity is needed, organize it with clear abstractions.
+### 5. Flat is better than nested
+Use early returns to reduce indentation.
+### 6. Sparse is better than dense
+Give code room to breathe with whitespace.
+### 7. Readability counts
+Optimize for human readers; add docstrings for nontrivial code.
+### 8. Special cases aren't special enough to break the rules
+Stay consistent; exceptions should be rare and justified.
+### 9. Although practicality beats purity
+Prefer practical solutions that teams can maintain.
+### 10. Errors should never pass silently
+Handle exceptions explicitly; log with context.
+### 11. Unless explicitly silenced
+Silence only specific, acceptable errors and document why.
+### 12. In the face of ambiguity, refuse the temptation to guess
+Require explicit inputs and behavior.
+### 13. There should be one obvious way to do it
+Prefer standard library patterns and idioms.
+### 14. Although that way may not be obvious at first
+Learn Python idioms; embrace clarity over novelty.
+### 15. Now is better than never; 16. Never is often better than right now
+Iterate, but don't rush broken code.
+### 17/18. Hard to explain is bad; easy to explain is good
+Prefer designs you can explain simply.
+### 19. Namespaces are one honking great idea
+Use modules/packages to separate concerns; avoid wildcard imports.
+## Modern Python Tie-ins
+- Type hints reinforce explicitness
+- Context managers enforce safe resource handling
+- Dataclasses improve readability for data containers
+## Quick Review Checklist
+- Is it readable and explicit?
+- Is this the simplest working solution?
+- Are errors explicit and logged?
+- Are modules/namespaces used appropriately?

cognee/modules/retrieval/EntityCompletionRetriever.py CHANGED

@@ -1,5 +1,5 @@
 import asyncio
-from typing import Any, Optional, List, Type
+from typing import Any, Optional, List, Type, Union
 from cognee.shared.logging_utils import get_logger
 from cognee.infrastructure.entities.BaseEntityExtractor import BaseEntityExtractor
@@ -40,19 +40,21 @@ class EntityCompletionRetriever(BaseRetriever):
         context_provider: BaseContextProvider,
         user_prompt_path: str = "context_for_question.txt",
         system_prompt_path: str = "answer_simple_question.txt",
+        session_id: Optional[str] = None,
+        response_model: Type = str,
     ):
         self.extractor = extractor
         self.context_provider = context_provider
         self.user_prompt_path = user_prompt_path
         self.system_prompt_path = system_prompt_path
+        self.session_id = session_id
+        self.response_model = response_model
-    async def get_context(self, query: str) -> Any:
+    async def get_retrieved_objects(self, query: str) -> Any:
         """
-        Get context using entity extraction and context provider.
+        Get relevant objects from the provided query.
-        Logs the processing of the query and retrieves entities. If entities are extracted, it
-        attempts to retrieve the corresponding context using the context provider. Returns None
-        if no entities or context are found, or logs the error if an exception occurs.
+        Extracts and returns entities from the provided query, returning None if no entities are found.
         Parameters:
         -----------
@@ -62,8 +64,8 @@ class EntityCompletionRetriever(BaseRetriever):
         Returns:
         --------
-            - Any: The context retrieved from the context provider or None if not found or an
-              error occurred.
+            - Any: The extracted entities, or None if no entities are found.
         """
         try:
             logger.info(f"Processing query: {query[:100]}")
@@ -73,40 +75,57 @@ class EntityCompletionRetriever(BaseRetriever):
                 logger.info("No entities extracted")
                 return None
-            context = await self.context_provider.get_context(entities, query)
+            return entities
+        except Exception as e:
+            logger.error(f"Context retrieval failed: {str(e)}")
+            return None
+    async def get_context_from_objects(self, query: str, retrieved_objects: Any) -> str:
+        """
+        Get context using the extracted entities and a context provider.
+        Retrieves the context corresponding to the retrieved entities in retrieved_objects.
+        Returns an empty string if no context is retrieved.
+        Parameters:
+        -----------
+            - query (str): The query string for which context is being retrieved.
+            - retrieved_objects (Any): The retrieved entities extracted from the query.
+        Returns:
+        --------
+            - str: The context retrieved from the context provider or an empty string
+            if not found or an error occurred.
+        """
+        try:
+            logger.info(f"Processing query: {query[:100]}")
+            context = await self.context_provider.get_context(retrieved_objects, query)
             if not context:
                 logger.info("No context retrieved")
-                return None
+                return ""
             return context
         except Exception as e:
             logger.error(f"Context retrieval failed: {str(e)}")
-            return None
+            return ""
-    async def get_completion(
-        self,
-        query: str,
-        context: Optional[Any] = None,
-        session_id: Optional[str] = None,
-        response_model: Type = str,
-    ) -> List[Any]:
+    async def get_completion_from_context(
+        self, query: str, retrieved_objects: Any, context: Any
+    ) -> Union[List[str], List[dict]]:
         """
-        Generate completion using provided context or fetch new context.
-        If context is not provided, it fetches context using the query. If no context is
-        available, it returns an error message. Logs an error if completion generation fails due
-        to an exception.
+        Generate completion using provided context.
         Parameters:
         -----------
             - query (str): The query string for which completion is being generated.
-            - context (Optional[Any]): Optional context to be used for generating completion;
-              fetched if not provided. (default None)
-            - session_id (Optional[str]): Optional session identifier for caching. If None,
-              defaults to 'default_session'. (default None)
-            - response_model (Type): The Pydantic model type for structured output. (default str)
+            - retrieved_objects (Any): The retrieved objects extracted from the query.
+            - context (Any): Optional context to be used for generating completion.
         Returns:
         --------
@@ -115,12 +134,6 @@ class EntityCompletionRetriever(BaseRetriever):
               relevant entities were found.
         """
         try:
-            if context is None:
-                context = await self.get_context(query)
-            if context is None:
-                return ["No relevant entities found for the query."]
             # Check if we need to generate context summary for caching
             cache_config = CacheConfig()
             user = session_user.get()
@@ -128,7 +141,7 @@ class EntityCompletionRetriever(BaseRetriever):
             session_save = user_id and cache_config.caching
             if session_save:
-                conversation_history = await get_conversation_history(session_id=session_id)
+                conversation_history = await get_conversation_history(session_id=self.session_id)
                 context_summary, completion = await asyncio.gather(
                     summarize_text(str(context)),
@@ -138,7 +151,7 @@ class EntityCompletionRetriever(BaseRetriever):
                         user_prompt_path=self.user_prompt_path,
                         system_prompt_path=self.system_prompt_path,
                         conversation_history=conversation_history,
-                        response_model=response_model,
+                        response_model=self.response_model,
                     ),
                 )
             else:
@@ -147,7 +160,7 @@ class EntityCompletionRetriever(BaseRetriever):
                     context=context,
                     user_prompt_path=self.user_prompt_path,
                     system_prompt_path=self.system_prompt_path,
-                    response_model=response_model,
+                    response_model=self.response_model,
                 )
             if session_save:
@@ -155,7 +168,7 @@ class EntityCompletionRetriever(BaseRetriever):
                     query=query,
                     context_summary=context_summary,
                     answer=completion,
-                    session_id=session_id,
+                    session_id=self.session_id,
                 )
             return [completion]

cognee/modules/retrieval/__init__.py CHANGED

@@ -1 +0,0 @@
-

cognee/modules/retrieval/base_retriever.py CHANGED

@@ -1,22 +1,78 @@
 from abc import ABC, abstractmethod
-from typing import Any, Optional, Type, List
+from typing import Any, Optional, Type, List, Union
 class BaseRetriever(ABC):
-    """Base class for all retrieval operations."""
+    """
+    Base class for all retrieval operations.
+    The retrieval workflow follows a three-step pipeline:
+    1. get_retrieved_objects: Fetch raw data (e.g., Graph Edges, Vector chunks).
+    2. get_context: Process raw data into a format suitable for an LLM (e.g., text string).
+    3. get_completion: Generate a final response with the help of an LLM using the context and original query.
+    """
     @abstractmethod
-    async def get_context(self, query: str) -> Any:
-        """Retrieves context based on the query."""
+    async def get_retrieved_objects(self, query: str) -> Any:
+        """
+        Retrieves the raw data points from the underlying storage (Graph or Vector DB).
+        Args:
+            query (str): The search query or input string.
+        Returns:
+            List[Any]: A list of raw objects (e.g., Edge objects, Document chunks)
+                       relevant to the query.
+        """
         pass
     @abstractmethod
-    async def get_completion(
+    async def get_context_from_objects(self, query: str, retrieved_objects: Any) -> str:
+        """
+        Transforms raw retrieved objects into a structured context for the LLM.
+        Args:
+            query (str): The search query or input string.
+            retrieved_objects (List[Any]): The output from get_retrieved_objects.
+        Returns:
+            Any: The formatted context (typically a string or a list of strings)
+                 to be injected into a prompt.
+        """
+        pass
+    @abstractmethod
+    async def get_completion_from_context(
         self,
         query: str,
-        context: Optional[Any] = None,
-        session_id: Optional[str] = None,
-        response_model: Type = str,
-    ) -> List[Any]:
-        """Generates a response using the query and optional context."""
+        retrieved_objects: Any,
+        context: Any,
+    ) -> Union[List[str], List[dict]]:
+        """
+        Generates a final output or answer based on the query and retrieved context.
+        Args:
+            query (str): The original user query.
+            retrieved_objects (List[Any]): The output from get_retrieved_objects.
+            context (Optional[Any]): The formatted context string/data used to
+                augment the generation. Output from get_context_from_objects.
+        Returns:
+            List[Any]: A list containing the generated completions or response objects.
+        """
         pass
+    async def get_completion(self, query: str) -> Union[List[str], List[dict]]:
+        """
+        Generates a final output or answer based on the query and retrieved context.
+        Args:
+            query (str): The original user query.
+        Returns:
+            List[Any]: A list containing the generated completions or response objects.
+        """
+        retrieved_objects = await self.get_retrieved_objects(query)
+        context = await self.get_context_from_objects(query, retrieved_objects)
+        completion = await self.get_completion_from_context(query, retrieved_objects, context)
+        return completion
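
The default `get_completion` added in this hunk chains the three abstract steps in order. The flow can be sketched in a self-contained way as follows. This is only an illustration under stated assumptions: `PipelineRetriever` is a stand-in that mirrors the method names from the diff (it is not imported from cognee), and `KeywordRetriever` and `DOCS` are hypothetical names invented here. No LLM is called; the last step just formats a string.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, List


class PipelineRetriever(ABC):
    """Stand-in mirroring the three-step base class in the diff (not cognee's class)."""

    @abstractmethod
    async def get_retrieved_objects(self, query: str) -> Any: ...

    @abstractmethod
    async def get_context_from_objects(self, query: str, retrieved_objects: Any) -> str: ...

    @abstractmethod
    async def get_completion_from_context(
        self, query: str, retrieved_objects: Any, context: Any
    ) -> List[str]: ...

    async def get_completion(self, query: str) -> List[str]:
        # Step 1: fetch raw objects. Step 2: turn them into context. Step 3: answer.
        retrieved_objects = await self.get_retrieved_objects(query)
        context = await self.get_context_from_objects(query, retrieved_objects)
        return await self.get_completion_from_context(query, retrieved_objects, context)


# Hypothetical in-memory "storage" used only for this sketch.
DOCS = [
    "cognee builds knowledge graphs",
    "retrievers fetch context",
    "graphs store edges",
]


class KeywordRetriever(PipelineRetriever):
    """Hypothetical retriever: substring match instead of a vector or graph search."""

    async def get_retrieved_objects(self, query: str) -> Any:
        words = query.lower().split()
        return [doc for doc in DOCS if any(word in doc for word in words)]

    async def get_context_from_objects(self, query: str, retrieved_objects: Any) -> str:
        # Join the raw objects into one text block; empty input gives "".
        return "\n".join(retrieved_objects or [])

    async def get_completion_from_context(
        self, query: str, retrieved_objects: Any, context: Any
    ) -> List[str]:
        # A real retriever would call an LLM here; this sketch only formats text.
        return [f"Q: {query}\nContext:\n{context}"]


result = asyncio.run(KeywordRetriever().get_completion("graphs"))
```

With this shape, a caller only passes the query; each subclass decides what "objects", "context", and "completion" mean for its own storage.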

cognee/modules/retrieval/chunks_retriever.py CHANGED

@@ -1,10 +1,11 @@
-from typing import Any, Optional
+from typing import Any, Optional, List, Union
+from cognee.modules.retrieval.utils.access_tracking import update_node_access_timestamps
 from cognee.shared.logging_utils import get_logger
 from cognee.infrastructure.databases.vector import get_vector_engine
 from cognee.modules.retrieval.base_retriever import BaseRetriever
 from cognee.modules.retrieval.exceptions.exceptions import NoDataError
 from cognee.infrastructure.databases.vector.exceptions.exceptions import CollectionNotFoundError
+from datetime import datetime, timezone
 logger = get_logger("ChunksRetriever")
@@ -27,75 +28,82 @@ class ChunksRetriever(BaseRetriever):
     ):
         self.top_k = top_k
-    async def get_context(self, query: str) -> Any:
+    async def get_completion_from_context(
+        self, query: str, retrieved_objects: Any, context: Any
+    ) -> Union[List[str], List[dict]]:
         """
-        Retrieves document chunks context based on the query.
-        Searches for document chunks relevant to the specified query using a vector engine.
-        Raises a NoDataError if no data is found in the system.
+        Generates a completion using document chunks context.
+        In case of the Chunks Retriever, we do not generate a completion, we just return
+        the payloads of found chunks.
         Parameters:
         -----------
-            - query (str): The query string to search for relevant document chunks.
+            - query (str): The query string to be used for generating a completion.
+            - retrieved_objects (Any): The retrieved objects to be used for generating a completion.
+            - context (Any): The context to be used for generating a completion.
         Returns:
         --------
-            - Any: A list of document chunk payloads retrieved from the search.
+            - List[dict]: A list of payloads of found chunks.
         """
-        logger.info(
-            f"Starting chunk retrieval for query: '{query[:100]}{'...' if len(query) > 100 else ''}'"
-        )
-        vector_engine = get_vector_engine()
-        try:
-            found_chunks = await vector_engine.search("DocumentChunk_text", query, limit=self.top_k)
-            logger.info(f"Found {len(found_chunks)} chunks from vector search")
-        except CollectionNotFoundError as error:
-            logger.error("DocumentChunk_text collection not found in vector database")
-            raise NoDataError("No data found in the system, please add data first.") from error
-        chunk_payloads = [result.payload for result in found_chunks]
-        logger.info(f"Returning {len(chunk_payloads)} chunk payloads")
-        return chunk_payloads
+        # TODO: Do we want to generate a completion using LLM here?
+        if retrieved_objects:
+            chunk_payloads = [found_chunk.payload for found_chunk in retrieved_objects]
+            return chunk_payloads
+        else:
+            return []
-    async def get_completion(
-        self, query: str, context: Optional[Any] = None, session_id: Optional[str] = None
-    ) -> Any:
+    async def get_context_from_objects(self, query: str, retrieved_objects: Any) -> str:
         """
-        Generates a completion using document chunks context.
-        If the context is not provided, it retrieves the context based on the query. Returns the
-        context, which can be used for further processing or generation of outputs.
+        Retrieves context from retrieved chunks, in text form.
         Parameters:
         -----------
-            - query (str): The query string to be used for generating a completion.
-            - context (Optional[Any]): Optional pre-fetched context to use for generating the
-              completion; if None, it retrieves the context for the query. (default None)
-            - session_id (Optional[str]): Optional session identifier for caching. If None,
-              defaults to 'default_session'. (default None)
+            - query (str): The query string used to search for relevant document chunks.
+            - retrieved_objects (Any): The retrieved objects to be used for generating textual context.
         Returns:
         --------
-            - Any: The context used for the completion or the retrieved context if none was
-              provided.
+            - str: A string containing the combined text of the retrieved chunks, or an
+              empty string if none are found.
         """
-        logger.info(
-            f"Starting completion generation for query: '{query[:100]}{'...' if len(query) > 100 else ''}'"
-        )
-        if context is None:
-            logger.debug("No context provided, retrieving context from vector database")
-            context = await self.get_context(query)
+        if retrieved_objects:
+            chunk_payload_texts = [found_chunk.payload["text"] for found_chunk in retrieved_objects]
+            return "\n".join(chunk_payload_texts)
         else:
-            logger.debug("Using provided context")
+            return ""
+    async def get_retrieved_objects(self, query: str) -> Any:
+        """
+        Retrieves document chunks context based on the query.
+        Searches for document chunks relevant to the specified query using a vector engine.
+        Raises a NoDataError if no data is found in the system.
+        Parameters:
+        -----------
+            - query (str): The query string to search for relevant document chunks.
+        Returns:
+        --------
+            - Any: A list of document chunks retrieved from the search.
+        """
         logger.info(
-            f"Returning context with {len(context) if isinstance(context, list) else 1} item(s)"
+            f"Starting chunk retrieval for query: '{query[:100]}{'...' if len(query) > 100 else ''}'"
         )
-        return context
+        vector_engine = get_vector_engine()
+        try:
+            found_chunks = await vector_engine.search(
+                "DocumentChunk_text", query, limit=self.top_k, include_payload=True
+            )
+            logger.info(f"Found {len(found_chunks)} chunks from vector search")
+            await update_node_access_timestamps(found_chunks)
+            return found_chunks
+        except CollectionNotFoundError as error:
+            logger.error("DocumentChunk_text collection not found in vector database")
+            raise NoDataError("No data found in the system, please add data first.") from error
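
After this change, the same list of found chunks feeds two different outputs: a newline-joined text context and a list of raw payloads. A minimal sketch of those two transformations, assuming a hypothetical `FakeChunk` stand-in for a vector search result (only a `payload` dict with a `"text"` key is taken from the diff; the helper names are invented here):

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a search result carrying a payload dict."""

    payload: Dict[str, Any]


def context_from_chunks(chunks: Optional[List[FakeChunk]]) -> str:
    # Mirrors get_context_from_objects: join payload texts, or "" when nothing was found.
    if not chunks:
        return ""
    return "\n".join(chunk.payload["text"] for chunk in chunks)


def payloads_from_chunks(chunks: Optional[List[FakeChunk]]) -> List[Dict[str, Any]]:
    # Mirrors get_completion_from_context: return payloads as-is, or [] when nothing was found.
    if not chunks:
        return []
    return [chunk.payload for chunk in chunks]


chunks = [
    FakeChunk({"id": 1, "text": "First chunk."}),
    FakeChunk({"id": 2, "text": "Second chunk."}),
]
context = context_from_chunks(chunks)
payloads = payloads_from_chunks(chunks)
```

Both helpers treat `None` and an empty list the same way, which matches the truthiness checks (`if retrieved_objects:`) in the hunk.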

cognee/modules/retrieval/coding_rules_retriever.py CHANGED

@@ -1,17 +1,18 @@
 import asyncio
 from functools import reduce
-from typing import List, Optional
+from typing import List, Optional, Any
 from cognee.shared.logging_utils import get_logger
+from cognee.modules.retrieval.base_retriever import BaseRetriever
 from cognee.tasks.codingagents.coding_rule_associations import get_existing_rules
 logger = get_logger("CodingRulesRetriever")
-class CodingRulesRetriever:
+class CodingRulesRetriever(BaseRetriever):
     """Retriever for handling coding-rule-based searches."""
     def __init__(self, rules_nodeset_name: Optional[List[str]] = None):
-        if isinstance(rules_nodeset_name, list):
+        if isinstance(rules_nodeset_name, list) or rules_nodeset_name is None:
             if not rules_nodeset_name:
                 # If there is no provided nodeset set to coding_agent_rules
                 rules_nodeset_name = ["coding_agent_rules"]
@@ -19,7 +20,7 @@ class CodingRulesRetriever:
         self.rules_nodeset_name = rules_nodeset_name
         """Initialize retriever with search parameters."""
-    async def get_existing_rules(self, query_text):
+    async def get_retrieved_objects(self, query: str) -> Any:
         if self.rules_nodeset_name:
             rules_list = await asyncio.gather(
                 *[
@@ -27,5 +28,11 @@ class CodingRulesRetriever:
                     for nodeset in self.rules_nodeset_name
                 ]
             )
             return reduce(lambda x, y: x + y, rules_list, [])
+    async def get_context_from_objects(self, query, retrieved_objects):
+        return retrieved_objects
+    async def get_completion_from_context(self, query, retrieved_objects, context):
+        # TODO: Add completion generation logic if needed
+        return context
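
`get_retrieved_objects` here gathers one result list per nodeset concurrently and then flattens them with `reduce`. The per-nodeset call is elided in the hunk, so the sketch below substitutes a hypothetical `fetch_rules` coroutine backed by an in-memory `RULES_BY_NODESET` dict; both names are assumptions made for illustration, not cognee APIs.

```python
import asyncio
from functools import reduce
from typing import Dict, List

# Hypothetical rule store used only for this sketch.
RULES_BY_NODESET: Dict[str, List[str]] = {
    "coding_agent_rules": ["use type hints", "prefer early returns"],
    "style_rules": ["4-space indentation"],
}


async def fetch_rules(nodeset: str) -> List[str]:
    # Stand-in for the per-nodeset lookup; yields control once like real I/O would.
    await asyncio.sleep(0)
    return RULES_BY_NODESET.get(nodeset, [])


async def collect_rules(nodesets: List[str]) -> List[str]:
    # gather() returns results in the same order as the awaitables passed in.
    rules_list = await asyncio.gather(*[fetch_rules(nodeset) for nodeset in nodesets])
    # Flatten the list of lists; the [] initializer keeps an empty input from raising.
    return reduce(lambda x, y: x + y, rules_list, [])


rules = asyncio.run(collect_rules(["coding_agent_rules", "style_rules", "missing"]))
```

The `[]` initializer matters: without it, `reduce` raises `TypeError` when no nodesets are given.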
