PyPI - ragtime-cli - Versions diffs - 0.2.9__tar.gz → 0.2.11__tar.gz - Mend

ragtime-cli 0.2.9 (tar.gz) → 0.2.11 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

{ragtime_cli-0.2.9/ragtime_cli.egg-info → ragtime_cli-0.2.11}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ragtime-cli
-Version: 0.2.9
+Version: 0.2.11
 Summary: Local-first memory and RAG system for Claude Code - semantic search over code, docs, and team knowledge
 Author-email: Bret Martineau <bretwardjames@gmail.com>
 License-Expression: MIT
@@ -100,13 +100,16 @@ ragtime forget <memory-id>
 # Index everything (docs + code)
 ragtime index
+# Incremental index (only changed files - fast!)
+ragtime index  # ~8 seconds vs ~5 minutes for unchanged codebases
 # Index only docs
 ragtime index --type docs
 # Index only code (functions, classes, composables)
 ragtime index --type code
-# Re-index with clear (removes old entries)
+# Full re-index (removes old entries, recomputes all embeddings)
 ragtime index --clear
 # Semantic search across all content
@@ -118,6 +121,10 @@ ragtime search "useAsyncState" --type code
 # Search only docs
 ragtime search "authentication" --type docs --namespace app
+# Hybrid search: semantic + keyword filtering
+# Use -r/--require to ensure terms appear in results
+ragtime search "error handling" -r mobile -r dart
 # Reindex memory files
 ragtime reindex
@@ -230,9 +237,9 @@ ragtime setup-ghp
 ```yaml
 docs:
-  paths: ["docs", ".ragtime"]
+  paths: ["docs"]
   patterns: ["**/*.md"]
-  exclude: ["**/node_modules/**"]
+  exclude: ["**/node_modules/**", "**/.ragtime/**"]
 code:
   paths: ["."]
@@ -244,6 +251,32 @@ conventions:
   also_search_memories: true
 ```
+## How Search Works
+Search returns **summaries with locations**, not full code:
+1. **What you get**: Function signatures, docstrings, class definitions
+2. **What you don't get**: Full implementations
+3. **What to do**: Use the file path + line number to read the full code
+This is intentional - embeddings work better on focused summaries than large code blocks. The search tells you *what exists and where*, then you read the file for details.
+For Claude/MCP usage: The search tool description instructs Claude to read returned file paths for full implementations before making code changes.
+### Hybrid Search
+Semantic search can lose qualifiers - "error handling in mobile app" might return web app results because "error handling" dominates the embedding. Use `require_terms` to ensure specific words appear:
+```bash
+# CLI
+ragtime search "error handling" -r mobile -r dart
+# MCP
+search(query="error handling", require_terms=["mobile", "dart"])
+```
+This combines semantic similarity (finds conceptually related content) with keyword filtering (ensures qualifiers aren't ignored).
 ## Code Indexing
 The code indexer extracts meaningful symbols from your codebase:
@@ -251,7 +284,7 @@ The code indexer extracts meaningful symbols from your codebase:
 | Language | What Gets Indexed |
 |----------|-------------------|
 | Python | Classes, methods, functions (with docstrings) |
-| TypeScript/JS | Exported functions, classes, interfaces, types, constants |
+| TypeScript/JS | Functions, classes, interfaces, types (exported and non-exported) |
 | Vue | Components, composable usage (useXxx calls) |
 | Dart | Classes, functions, mixins, extensions |

{ragtime_cli-0.2.9 → ragtime_cli-0.2.11}/README.md RENAMED

@@ -70,13 +70,16 @@ ragtime forget <memory-id>
 # Index everything (docs + code)
 ragtime index
+# Incremental index (only changed files - fast!)
+ragtime index  # ~8 seconds vs ~5 minutes for unchanged codebases
 # Index only docs
 ragtime index --type docs
 # Index only code (functions, classes, composables)
 ragtime index --type code
-# Re-index with clear (removes old entries)
+# Full re-index (removes old entries, recomputes all embeddings)
 ragtime index --clear
 # Semantic search across all content
@@ -88,6 +91,10 @@ ragtime search "useAsyncState" --type code
 # Search only docs
 ragtime search "authentication" --type docs --namespace app
+# Hybrid search: semantic + keyword filtering
+# Use -r/--require to ensure terms appear in results
+ragtime search "error handling" -r mobile -r dart
 # Reindex memory files
 ragtime reindex
@@ -200,9 +207,9 @@ ragtime setup-ghp
 ```yaml
 docs:
-  paths: ["docs", ".ragtime"]
+  paths: ["docs"]
   patterns: ["**/*.md"]
-  exclude: ["**/node_modules/**"]
+  exclude: ["**/node_modules/**", "**/.ragtime/**"]
 code:
   paths: ["."]
@@ -214,6 +221,32 @@ conventions:
   also_search_memories: true
 ```
+## How Search Works
+Search returns **summaries with locations**, not full code:
+1. **What you get**: Function signatures, docstrings, class definitions
+2. **What you don't get**: Full implementations
+3. **What to do**: Use the file path + line number to read the full code
+This is intentional - embeddings work better on focused summaries than large code blocks. The search tells you *what exists and where*, then you read the file for details.
+For Claude/MCP usage: The search tool description instructs Claude to read returned file paths for full implementations before making code changes.
+### Hybrid Search
+Semantic search can lose qualifiers - "error handling in mobile app" might return web app results because "error handling" dominates the embedding. Use `require_terms` to ensure specific words appear:
+```bash
+# CLI
+ragtime search "error handling" -r mobile -r dart
+# MCP
+search(query="error handling", require_terms=["mobile", "dart"])
+```
+This combines semantic similarity (finds conceptually related content) with keyword filtering (ensures qualifiers aren't ignored).
 ## Code Indexing
 The code indexer extracts meaningful symbols from your codebase:
@@ -221,7 +254,7 @@ The code indexer extracts meaningful symbols from your codebase:
 | Language | What Gets Indexed |
 |----------|-------------------|
 | Python | Classes, methods, functions (with docstrings) |
-| TypeScript/JS | Exported functions, classes, interfaces, types, constants |
+| TypeScript/JS | Functions, classes, interfaces, types (exported and non-exported) |
 | Vue | Components, composable usage (useXxx calls) |
 | Dart | Classes, functions, mixins, extensions |

{ragtime_cli-0.2.9 → ragtime_cli-0.2.11}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "ragtime-cli"
-version = "0.2.9"
+version = "0.2.11"
 description = "Local-first memory and RAG system for Claude Code - semantic search over code, docs, and team knowledge"
 readme = "README.md"
 license = "MIT"

{ragtime_cli-0.2.9 → ragtime_cli-0.2.11/ragtime_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ragtime-cli
-Version: 0.2.9
+Version: 0.2.11
 Summary: Local-first memory and RAG system for Claude Code - semantic search over code, docs, and team knowledge
 Author-email: Bret Martineau <bretwardjames@gmail.com>
 License-Expression: MIT
@@ -100,13 +100,16 @@ ragtime forget <memory-id>
 # Index everything (docs + code)
 ragtime index
+# Incremental index (only changed files - fast!)
+ragtime index  # ~8 seconds vs ~5 minutes for unchanged codebases
 # Index only docs
 ragtime index --type docs
 # Index only code (functions, classes, composables)
 ragtime index --type code
-# Re-index with clear (removes old entries)
+# Full re-index (removes old entries, recomputes all embeddings)
 ragtime index --clear
 # Semantic search across all content
@@ -118,6 +121,10 @@ ragtime search "useAsyncState" --type code
 # Search only docs
 ragtime search "authentication" --type docs --namespace app
+# Hybrid search: semantic + keyword filtering
+# Use -r/--require to ensure terms appear in results
+ragtime search "error handling" -r mobile -r dart
 # Reindex memory files
 ragtime reindex
@@ -230,9 +237,9 @@ ragtime setup-ghp
 ```yaml
 docs:
-  paths: ["docs", ".ragtime"]
+  paths: ["docs"]
   patterns: ["**/*.md"]
-  exclude: ["**/node_modules/**"]
+  exclude: ["**/node_modules/**", "**/.ragtime/**"]
 code:
   paths: ["."]
@@ -244,6 +251,32 @@ conventions:
   also_search_memories: true
 ```
+## How Search Works
+Search returns **summaries with locations**, not full code:
+1. **What you get**: Function signatures, docstrings, class definitions
+2. **What you don't get**: Full implementations
+3. **What to do**: Use the file path + line number to read the full code
+This is intentional - embeddings work better on focused summaries than large code blocks. The search tells you *what exists and where*, then you read the file for details.
+For Claude/MCP usage: The search tool description instructs Claude to read returned file paths for full implementations before making code changes.
+### Hybrid Search
+Semantic search can lose qualifiers - "error handling in mobile app" might return web app results because "error handling" dominates the embedding. Use `require_terms` to ensure specific words appear:
+```bash
+# CLI
+ragtime search "error handling" -r mobile -r dart
+# MCP
+search(query="error handling", require_terms=["mobile", "dart"])
+```
+This combines semantic similarity (finds conceptually related content) with keyword filtering (ensures qualifiers aren't ignored).
 ## Code Indexing
 The code indexer extracts meaningful symbols from your codebase:
@@ -251,7 +284,7 @@ The code indexer extracts meaningful symbols from your codebase:
 | Language | What Gets Indexed |
 |----------|-------------------|
 | Python | Classes, methods, functions (with docstrings) |
-| TypeScript/JS | Exported functions, classes, interfaces, types, constants |
+| TypeScript/JS | Functions, classes, interfaces, types (exported and non-exported) |
 | Vue | Components, composable usage (useXxx calls) |
 | Dart | Classes, functions, mixins, extensions |

{ragtime_cli-0.2.9 → ragtime_cli-0.2.11}/src/cli.py RENAMED

@@ -469,12 +469,19 @@ def index(path: Path, index_type: str, clear: bool):
 @click.option("--path", type=click.Path(exists=True, path_type=Path), default=".")
 @click.option("--type", "type_filter", type=click.Choice(["all", "docs", "code"]), default="all")
 @click.option("--namespace", "-n", help="Filter by namespace")
+@click.option("--require", "-r", "require_terms", multiple=True,
+              help="Terms that MUST appear in results (repeatable)")
 @click.option("--include-archive", is_flag=True, help="Also search archived branches")
 @click.option("--limit", "-l", default=5, help="Max results")
 @click.option("--verbose", "-v", is_flag=True, help="Show full content")
 def search(query: str, path: Path, type_filter: str, namespace: str,
-           include_archive: bool, limit: int, verbose: bool):
-    """Search indexed content."""
+           require_terms: tuple, include_archive: bool, limit: int, verbose: bool):
+    """
+    Hybrid search: semantic similarity + keyword filtering.
+    Use --require/-r to ensure specific terms appear in results.
+    Example: ragtime search "error handling" -r mobile -r dart
+    """
     path = Path(path).resolve()
     db = get_db(path)
@@ -485,6 +492,7 @@ def search(query: str, path: Path, type_filter: str, namespace: str,
         limit=limit,
         type_filter=type_arg,
         namespace=namespace,
+        require_terms=list(require_terms) if require_terms else None,
     )
     if not results:
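Note on the `--require` option: with `multiple=True`, each `-r` occurrence is collected into a tuple, and the command passes it on as a list, or as `None` when no terms were given. A rough stdlib analogue is sketched below. It uses `argparse` with `action="append"` instead of click so that it runs without third-party packages, and `parse_search_args` is a hypothetical helper, not part of ragtime.

```python
import argparse


def parse_search_args(argv: list[str]) -> dict:
    """Hypothetical helper: parse a search command line into keyword arguments."""
    parser = argparse.ArgumentParser(prog="ragtime search")
    parser.add_argument("query")
    # action="append" plays the role of click's multiple=True:
    # every -r/--require occurrence adds one term to the list.
    parser.add_argument("-r", "--require", dest="require_terms",
                        action="append", default=None)
    parser.add_argument("-l", "--limit", type=int, default=5)
    ns = parser.parse_args(argv)
    # Mirror the diff: a list when terms were given, otherwise None.
    return {
        "query": ns.query,
        "limit": ns.limit,
        "require_terms": list(ns.require_terms) if ns.require_terms else None,
    }
```

Calling `parse_search_args(["error handling", "-r", "mobile", "-r", "dart"])` yields `require_terms=["mobile", "dart"]`, which matches the shape handed to `db.search` in the hunk above.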

{ragtime_cli-0.2.9 → ragtime_cli-0.2.11}/src/config.py RENAMED

@@ -12,13 +12,14 @@ import yaml
 @dataclass
 class DocsConfig:
     """Configuration for docs indexing."""
-    paths: list[str] = field(default_factory=lambda: ["docs", ".ragtime"])
+    # Note: .ragtime/ is NOT included here - memories are indexed separately via 'reindex'
+    # to avoid duplicate entries (same file indexed as both doc and memory)
+    paths: list[str] = field(default_factory=lambda: ["docs"])
     patterns: list[str] = field(default_factory=lambda: ["**/*.md"])
     exclude: list[str] = field(default_factory=lambda: [
         "**/node_modules/**",
         "**/.git/**",
-        "**/.ragtime/index/**",
-        "**/.ragtime/branches/.*",  # Exclude synced (dot-prefixed) branches
+        "**/.ragtime/**",  # Memories indexed separately
     ])

{ragtime_cli-0.2.9 → ragtime_cli-0.2.11}/src/db.py RENAMED

@@ -84,48 +84,80 @@ class RagtimeDB:
         limit: int = 10,
         type_filter: str | None = None,
         namespace: str | None = None,
+        require_terms: list[str] | None = None,
         **filters,
     ) -> list[dict]:
         """
-        Semantic search over indexed content.
+        Hybrid search: semantic similarity + keyword filtering.
         Args:
             query: Natural language search query
             limit: Max results to return
             type_filter: "code" or "docs" (None = both)
             namespace: Filter by namespace (for docs)
-            **filters: Additional metadata filters
+            require_terms: List of terms that MUST appear in results (case-insensitive).
+                          Use for scoped queries like "error handling in mobile" with
+                          require_terms=["mobile"] to ensure "mobile" isn't ignored.
+            **filters: Additional metadata filters (None values are ignored)
         Returns:
             List of dicts with 'content', 'metadata', 'distance'
         """
-        where = {}
+        # Build list of filter conditions, excluding None values
+        conditions = []
         if type_filter:
-            where["type"] = type_filter
+            conditions.append({"type": type_filter})
         if namespace:
-            where["namespace"] = namespace
+            conditions.append({"namespace": namespace})
+        # Add any additional filters, but skip None values
         for key, value in filters.items():
-            where[key] = value
+            if value is not None:
+                conditions.append({key: value})
+        # ChromaDB requires $and for multiple conditions
+        if len(conditions) == 0:
+            where = None
+        elif len(conditions) == 1:
+            where = conditions[0]
+        else:
+            where = {"$and": conditions}
+        # When using require_terms, fetch more results since we'll filter some out
+        fetch_limit = limit * 5 if require_terms else limit
         results = self.collection.query(
             query_texts=[query],
-            n_results=limit,
-            where=where if where else None,
+            n_results=fetch_limit,
+            where=where,
         )
         # Flatten results into list of dicts
         output = []
         if results["documents"] and results["documents"][0]:
             for i, doc in enumerate(results["documents"][0]):
+                # Hybrid filtering: ensure required terms appear
+                if require_terms:
+                    doc_lower = doc.lower()
+                    # Also check file path in metadata for code/file matches
+                    file_path = (results["metadatas"][0][i].get("file", "") or "").lower()
+                    combined_text = f"{doc_lower} {file_path}"
+                    if not all(term.lower() in combined_text for term in require_terms):
+                        continue
                 output.append({
                     "content": doc,
                     "metadata": results["metadatas"][0][i] if results["metadatas"] else {},
                     "distance": results["distances"][0][i] if results["distances"] else None,
                 })
+                # Stop once we have enough
+                if len(output) >= limit:
+                    break
         return output
     def delete(self, ids: list[str]) -> None:
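Note on the `search` changes: the hunk above does two separate things. It assembles the metadata filter, wrapping it in `$and` only when there is more than one condition, and it post-filters semantic hits by required terms. The sketch below isolates both steps as plain functions so they can be exercised without a ChromaDB collection. `build_where` and `filter_required` are hypothetical names, and the hits are plain dicts shaped like the method's return value.

```python
def build_where(type_filter=None, namespace=None, **filters):
    """Combine the non-None filters into a ChromaDB-style where clause."""
    conditions = []
    if type_filter:
        conditions.append({"type": type_filter})
    if namespace:
        conditions.append({"namespace": namespace})
    # Skip None values so optional arguments do not become filters.
    conditions.extend({k: v} for k, v in filters.items() if v is not None)
    if not conditions:
        return None
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}


def filter_required(hits, require_terms, limit):
    """Keep hits whose content or file path contains every required term."""
    terms = [t.lower() for t in (require_terms or [])]
    kept = []
    for hit in hits:
        # Match against content plus file path, case-insensitively.
        haystack = f"{hit['content']} {hit['metadata'].get('file') or ''}".lower()
        if not all(term in haystack for term in terms):
            continue
        kept.append(hit)
        if len(kept) >= limit:
            break  # Stop once enough results survive the filter.
    return kept
```

Two properties follow directly from the diff. The match is a substring test, so a required term such as `art` would also match `dart`. And because only `limit * 5` candidates are fetched before filtering, a search with `require_terms` can return fewer than `limit` results even when more matching entries exist in the index.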

{ragtime_cli-0.2.9 → ragtime_cli-0.2.11}/src/mcp_server.py RENAMED

@@ -132,7 +132,7 @@ class RagtimeMCPServer:
             },
             {
                 "name": "search",
-                "description": "Semantic search over indexed code and docs. Returns function signatures, class definitions, and doc summaries with file paths and line numbers. IMPORTANT: Results are summaries only - use the Read tool on returned file paths to see full implementations before making code changes or decisions.",
+                "description": "Hybrid search over indexed code and docs (semantic + keyword). Returns function signatures, class definitions, and doc summaries with file paths and line numbers. IMPORTANT: Results are summaries only - use the Read tool on returned file paths to see full implementations before making code changes or decisions.",
                 "inputSchema": {
                     "type": "object",
                     "properties": {
@@ -152,6 +152,11 @@ class RagtimeMCPServer:
                             "type": "string",
                             "description": "Filter by component"
                         },
+                        "require_terms": {
+                            "type": "array",
+                            "items": {"type": "string"},
+                            "description": "Terms that MUST appear in results (case-insensitive). Use for scoped queries like 'error handling in mobile' with require_terms=['mobile'] to ensure the qualifier isn't lost in semantic search."
+                        },
                         "limit": {
                             "type": "integer",
                             "default": 10,
@@ -333,13 +338,14 @@ class RagtimeMCPServer:
         }
     def _search(self, args: dict) -> dict:
-        """Search indexed content."""
+        """Search indexed content with hybrid semantic + keyword matching."""
         results = self.db.search(
             query=args["query"],
             limit=args.get("limit", 10),
             namespace=args.get("namespace"),
             type_filter=args.get("type"),
             component=args.get("component"),
+            require_terms=args.get("require_terms"),
         )
         return {
@@ -487,7 +493,7 @@ class RagtimeMCPServer:
                         "protocolVersion": "2024-11-05",
                         "serverInfo": {
                             "name": "ragtime",
-                            "version": "0.2.9",
+                            "version": "0.2.11",
                         },
                         "capabilities": {
                             "tools": {},

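Note on the new `require_terms` schema property: from a client's side, the array travels inside the tool arguments. The sketch below builds such a request as a plain dict. It assumes the server is reached through the standard MCP `tools/call` method, which this diff does not show, and `build_search_call` is a hypothetical helper.

```python
import json


def build_search_call(query, require_terms=None, limit=10, request_id=1):
    """Hypothetical helper: build an MCP tools/call request for the search tool."""
    arguments = {"query": query, "limit": limit}
    if require_terms:
        # Optional in the schema, so omit the key when no terms are given.
        arguments["require_terms"] = list(require_terms)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # Assumption: standard MCP tool invocation.
        "params": {"name": "search", "arguments": arguments},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_call("error handling", ["mobile", "dart"]), indent=2))
```

On the server side, `_search` reads the value with `args.get("require_terms")`, so a request without the key behaves like a search without keyword filtering.
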
{ragtime_cli-0.2.9 → ragtime_cli-0.2.11}/src/memory.py RENAMED

@@ -32,6 +32,8 @@ class Memory:
     epic: Optional[str] = None
     branch: Optional[str] = None
     supersedes: Optional[str] = None
+    # Internal: actual file path when loaded from disk (not serialized)
+    _file_path: Optional[str] = field(default=None, repr=False)
     def to_frontmatter(self) -> dict:
         """Convert to YAML frontmatter dict."""
@@ -71,7 +73,8 @@ class Memory:
     def to_metadata(self) -> dict:
         """Convert to metadata dict for ChromaDB."""
         meta = self.to_frontmatter()
-        meta["file"] = self.get_relative_path()
+        # Use actual file path if loaded from disk, otherwise generate it
+        meta["file"] = self._file_path if self._file_path else self.get_relative_path()
         return meta
     def get_relative_path(self) -> str:
@@ -107,8 +110,14 @@ class Memory:
         return slug[:40]  # Limit length
     @classmethod
-    def from_file(cls, path: Path) -> "Memory":
-        """Parse a memory from a markdown file with YAML frontmatter."""
+    def from_file(cls, path: Path, relative_to: Optional[Path] = None) -> "Memory":
+        """
+        Parse a memory from a markdown file with YAML frontmatter.
+        Args:
+            path: Full path to the markdown file
+            relative_to: Base directory to compute relative path from (for indexing)
+        """
         text = path.read_text()
         if not text.startswith("---"):
@@ -122,6 +131,14 @@ class Memory:
         frontmatter = yaml.safe_load(parts[1])
         content = parts[2].strip()
+        # Compute relative file path for indexing
+        file_path = None
+        if relative_to:
+            try:
+                file_path = str(path.relative_to(relative_to))
+            except ValueError:
+                pass  # path not relative to base, will regenerate
         return cls(
             id=frontmatter.get("id", str(uuid.uuid4())[:8]),
             content=content,
@@ -138,6 +155,7 @@ class Memory:
             epic=frontmatter.get("epic"),
             branch=frontmatter.get("branch"),
             supersedes=frontmatter.get("supersedes"),
+            _file_path=file_path,
         )
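Note on `relative_to`: the path computation added to `from_file` relies on `Path.relative_to` raising `ValueError` when the file is not under the base directory. A minimal sketch of that behaviour in isolation follows. `stored_file_path` is a hypothetical name, and the paths are illustrative only.

```python
from pathlib import Path
from typing import Optional


def stored_file_path(path: Path, relative_to: Optional[Path] = None) -> Optional[str]:
    """Return the path relative to the base directory, or None if unavailable."""
    if relative_to is None:
        return None
    try:
        return str(path.relative_to(relative_to))
    except ValueError:
        # Path is outside the base directory: signal the caller to fall back.
        return None


if __name__ == "__main__":
    base = Path("/repo/.ragtime")
    print(stored_file_path(Path("/repo/.ragtime/app/auth.md"), base))
    print(stored_file_path(Path("/elsewhere/note.md"), base))
```

When the result is `None`, `to_metadata` in the diff falls back to `get_relative_path()`, so memories created in code rather than loaded from disk keep the previous behaviour.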
@@ -189,24 +207,41 @@ class MemoryStore:
     def get(self, memory_id: str) -> Optional[Memory]:
         """Get a memory by ID."""
-        # Search in ChromaDB to find the file
-        results = self.db.collection.get(ids=[memory_id])
+        # Search in ChromaDB to find the memory
+        results = self.db.collection.get(ids=[memory_id], include=["documents", "metadatas"])
         if not results["ids"]:
             return None
         metadata = results["metadatas"][0]
+        content = results["documents"][0] if results["documents"] else ""
         file_rel_path = metadata.get("file", "")
-        if not file_rel_path:
-            return None
-        file_path = self.memory_dir / file_rel_path
+        # Try to read from file first (has full frontmatter data)
+        if file_rel_path:
+            file_path = self.memory_dir / file_rel_path
+            if file_path.exists():
+                return Memory.from_file(file_path, relative_to=self.memory_dir)
-        if file_path.exists():
-            return Memory.from_file(file_path)
-        return None
+        # Fall back to constructing from ChromaDB data
+        # This handles cases where file path is wrong or file was deleted
+        return Memory(
+            id=memory_id,
+            content=content,
+            namespace=metadata.get("namespace", "unknown"),
+            type=metadata.get("type", "unknown"),
+            component=metadata.get("component"),
+            confidence=metadata.get("confidence", "medium"),
+            confidence_reason=metadata.get("confidence_reason"),
+            source=metadata.get("source", "unknown"),
+            status=metadata.get("status", "active"),
+            added=metadata.get("added", ""),
+            author=metadata.get("author"),
+            issue=metadata.get("issue"),
+            epic=metadata.get("epic"),
+            branch=metadata.get("branch"),
+            _file_path=file_rel_path,
+        )
     def delete(self, memory_id: str) -> bool:
         """Delete a memory by ID."""
@@ -283,29 +318,44 @@ class MemoryStore:
         limit: int = 100,
     ) -> list[Memory]:
         """List memories with optional filters."""
-        where = {}
+        # Build filter conditions
+        conditions = []
+        namespace_prefix = None
         if namespace:
             if namespace.endswith("*"):
-                # Prefix match - ChromaDB doesn't support this directly
-                # We'll filter in Python
-                pass
+                # Prefix match - filter in Python after fetching
+                namespace_prefix = namespace[:-1]
             else:
-                where["namespace"] = namespace
+                conditions.append({"namespace": namespace})
         if type_filter:
-            where["type"] = type_filter
+            conditions.append({"type": type_filter})
         if status:
-            where["status"] = status
+            conditions.append({"status": status})
         if component:
-            where["component"] = component
+            conditions.append({"component": component})
+        # Exclude docs/code entries - they use type="docs" or type="code"
+        # while memories use types like "architecture", "feature", etc.
+        # This is especially important for wildcard queries
+        conditions.append({"type": {"$nin": ["docs", "code"]}})
+        # Build where clause with $and if multiple conditions
+        if len(conditions) == 1:
+            where = conditions[0]
+        else:
+            where = {"$and": conditions}
+        # When using prefix match, fetch more results since we'll filter some out
+        fetch_limit = limit * 5 if namespace_prefix else limit
         # Get from ChromaDB
         results = self.db.collection.get(
-            where=where if where else None,
-            limit=limit,
+            where=where,
+            limit=fetch_limit,
         )
         memories = []
@@ -314,9 +364,8 @@ class MemoryStore:
             content = results["documents"][i] if results["documents"] else ""
             # Handle namespace prefix filtering
-            if namespace and namespace.endswith("*"):
-                prefix = namespace[:-1]
-                if not metadata.get("namespace", "").startswith(prefix):
+            if namespace_prefix:
+                if not metadata.get("namespace", "").startswith(namespace_prefix):
                     continue
             memories.append(Memory(
@@ -332,6 +381,10 @@ class MemoryStore:
                 author=metadata.get("author"),
             ))
+            # Stop once we have enough
+            if len(memories) >= limit:
+                break
         return memories
     def store_document(self, file_path: Path, namespace: str, doc_type: str = "handoff") -> Memory:
@@ -367,7 +420,8 @@ class MemoryStore:
         count = 0
         for md_file in self.memory_dir.rglob("*.md"):
             try:
-                memory = Memory.from_file(md_file)
+                # Pass memory_dir so the actual file path is stored, not regenerated
+                memory = Memory.from_file(md_file, relative_to=self.memory_dir)
                 self.db.upsert(
                     ids=[memory.id],
                     documents=[memory.content],
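
Note on the namespace wildcard handling in the `MemoryStore` listing hunk earlier in this file: a trailing `*` turns the namespace into a prefix that is applied in Python after fetching, with early exit once `limit` entries are collected. The sketch below reproduces that filtering over plain metadata dicts. `filter_by_namespace` is a hypothetical name, and the exact-match branch is done in Python here only to keep the sketch self-contained, whereas the diff delegates it to the ChromaDB `where` clause.

```python
def filter_by_namespace(rows, namespace, limit):
    """Filter metadata dicts by exact namespace or by 'prefix*' wildcard."""
    prefix = namespace[:-1] if namespace.endswith("*") else None
    kept = []
    for meta in rows:
        ns = meta.get("namespace", "")
        if prefix is not None:
            if not ns.startswith(prefix):
                continue
        elif ns != namespace:
            continue
        kept.append(meta)
        if len(kept) >= limit:
            break  # Stop once enough entries are collected.
    return kept
```

As with `require_terms`, the over-fetch factor of 5 is a heuristic: if fewer than `limit` of the fetched entries share the prefix, the listing comes back short even though more matching memories may exist.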