PyPI - ragtime-cli - Versions diffs - 0.2.10__tar.gz → 0.2.12__tar.gz - Mend

ragtime-cli 0.2.10.tar.gz → 0.2.12.tar.gz


Files changed (30)

{ragtime_cli-0.2.10/ragtime_cli.egg-info → ragtime_cli-0.2.12}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ragtime-cli
-Version: 0.2.10
+Version: 0.2.12
 Summary: Local-first memory and RAG system for Claude Code - semantic search over code, docs, and team knowledge
 Author-email: Bret Martineau <bretwardjames@gmail.com>
 License-Expression: MIT
@@ -121,6 +121,10 @@ ragtime search "useAsyncState" --type code
 # Search only docs
 ragtime search "authentication" --type docs --namespace app
+# Hybrid search: semantic + keyword filtering
+# Use -r/--require to ensure terms appear in results
+ragtime search "error handling" -r mobile -r dart
 # Reindex memory files
 ragtime reindex
@@ -233,9 +237,9 @@ ragtime setup-ghp
 ```yaml
 docs:
-  paths: ["docs", ".ragtime"]
+  paths: ["docs"]
   patterns: ["**/*.md"]
-  exclude: ["**/node_modules/**"]
+  exclude: ["**/node_modules/**", "**/.ragtime/**"]
 code:
   paths: ["."]
@@ -259,6 +263,20 @@ This is intentional - embeddings work better on focused summaries than large cod
 For Claude/MCP usage: The search tool description instructs Claude to read returned file paths for full implementations before making code changes.
+### Hybrid Search
+Semantic search can lose qualifiers - "error handling in mobile app" might return web app results because "error handling" dominates the embedding. Use `require_terms` to ensure specific words appear:
+```bash
+# CLI
+ragtime search "error handling" -r mobile -r dart
+# MCP
+search(query="error handling", require_terms=["mobile", "dart"])
+```
+This combines semantic similarity (finds conceptually related content) with keyword filtering (ensures qualifiers aren't ignored).
 ## Code Indexing
 The code indexer extracts meaningful symbols from your codebase:

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/README.md RENAMED

@@ -91,6 +91,10 @@ ragtime search "useAsyncState" --type code
 # Search only docs
 ragtime search "authentication" --type docs --namespace app
+# Hybrid search: semantic + keyword filtering
+# Use -r/--require to ensure terms appear in results
+ragtime search "error handling" -r mobile -r dart
 # Reindex memory files
 ragtime reindex
@@ -203,9 +207,9 @@ ragtime setup-ghp
 ```yaml
 docs:
-  paths: ["docs", ".ragtime"]
+  paths: ["docs"]
   patterns: ["**/*.md"]
-  exclude: ["**/node_modules/**"]
+  exclude: ["**/node_modules/**", "**/.ragtime/**"]
 code:
   paths: ["."]
@@ -229,6 +233,20 @@ This is intentional - embeddings work better on focused summaries than large cod
 For Claude/MCP usage: The search tool description instructs Claude to read returned file paths for full implementations before making code changes.
+### Hybrid Search
+Semantic search can lose qualifiers - "error handling in mobile app" might return web app results because "error handling" dominates the embedding. Use `require_terms` to ensure specific words appear:
+```bash
+# CLI
+ragtime search "error handling" -r mobile -r dart
+# MCP
+search(query="error handling", require_terms=["mobile", "dart"])
+```
+This combines semantic similarity (finds conceptually related content) with keyword filtering (ensures qualifiers aren't ignored).
 ## Code Indexing
 The code indexer extracts meaningful symbols from your codebase:

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "ragtime-cli"
-version = "0.2.10"
+version = "0.2.12"
 description = "Local-first memory and RAG system for Claude Code - semantic search over code, docs, and team knowledge"
 readme = "README.md"
 license = "MIT"

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12/ragtime_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ragtime-cli
-Version: 0.2.10
+Version: 0.2.12
 Summary: Local-first memory and RAG system for Claude Code - semantic search over code, docs, and team knowledge
 Author-email: Bret Martineau <bretwardjames@gmail.com>
 License-Expression: MIT
@@ -121,6 +121,10 @@ ragtime search "useAsyncState" --type code
 # Search only docs
 ragtime search "authentication" --type docs --namespace app
+# Hybrid search: semantic + keyword filtering
+# Use -r/--require to ensure terms appear in results
+ragtime search "error handling" -r mobile -r dart
 # Reindex memory files
 ragtime reindex
@@ -233,9 +237,9 @@ ragtime setup-ghp
 ```yaml
 docs:
-  paths: ["docs", ".ragtime"]
+  paths: ["docs"]
   patterns: ["**/*.md"]
-  exclude: ["**/node_modules/**"]
+  exclude: ["**/node_modules/**", "**/.ragtime/**"]
 code:
   paths: ["."]
@@ -259,6 +263,20 @@ This is intentional - embeddings work better on focused summaries than large cod
 For Claude/MCP usage: The search tool description instructs Claude to read returned file paths for full implementations before making code changes.
+### Hybrid Search
+Semantic search can lose qualifiers - "error handling in mobile app" might return web app results because "error handling" dominates the embedding. Use `require_terms` to ensure specific words appear:
+```bash
+# CLI
+ragtime search "error handling" -r mobile -r dart
+# MCP
+search(query="error handling", require_terms=["mobile", "dart"])
+```
+This combines semantic similarity (finds conceptually related content) with keyword filtering (ensures qualifiers aren't ignored).
 ## Code Indexing
 The code indexer extracts meaningful symbols from your codebase:

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/cli.py RENAMED

@@ -469,12 +469,19 @@ def index(path: Path, index_type: str, clear: bool):
 @click.option("--path", type=click.Path(exists=True, path_type=Path), default=".")
 @click.option("--type", "type_filter", type=click.Choice(["all", "docs", "code"]), default="all")
 @click.option("--namespace", "-n", help="Filter by namespace")
+@click.option("--require", "-r", "require_terms", multiple=True,
+              help="Terms that MUST appear in results (repeatable)")
 @click.option("--include-archive", is_flag=True, help="Also search archived branches")
 @click.option("--limit", "-l", default=5, help="Max results")
 @click.option("--verbose", "-v", is_flag=True, help="Show full content")
 def search(query: str, path: Path, type_filter: str, namespace: str,
-           include_archive: bool, limit: int, verbose: bool):
-    """Search indexed content."""
+           require_terms: tuple, include_archive: bool, limit: int, verbose: bool):
+    """
+    Hybrid search: semantic similarity + keyword filtering.
+    Use --require/-r to ensure specific terms appear in results.
+    Example: ragtime search "error handling" -r mobile -r dart
+    """
     path = Path(path).resolve()
     db = get_db(path)
@@ -485,6 +492,7 @@ def search(query: str, path: Path, type_filter: str, namespace: str,
         limit=limit,
         type_filter=type_arg,
         namespace=namespace,
+        require_terms=list(require_terms) if require_terms else None,
     )
     if not results:
@@ -726,6 +734,51 @@ def reindex(path: Path):
     click.echo(f"✓ Reindexed {count} memory files")
+@main.command()
+@click.option("--path", type=click.Path(exists=True, path_type=Path), default=".")
+@click.option("--dry-run", is_flag=True, help="Show duplicates without removing them")
+def dedupe(path: Path, dry_run: bool):
+    """Remove duplicate entries from the index.
+    Keeps one entry per unique file path, removing duplicates created
+    by older versions of reindex that generated random IDs.
+    """
+    path = Path(path).resolve()
+    db = get_db(path)
+    # Get all entries with their file paths
+    results = db.collection.get(include=["metadatas"])
+    # Group by file path
+    by_file: dict[str, list[str]] = {}
+    for i, mem_id in enumerate(results["ids"]):
+        file_path = results["metadatas"][i].get("file", "")
+        if file_path:
+            if file_path not in by_file:
+                by_file[file_path] = []
+            by_file[file_path].append(mem_id)
+    # Find duplicates
+    duplicates_to_remove = []
+    for file_path, ids in by_file.items():
+        if len(ids) > 1:
+            # Keep the first one, remove the rest
+            duplicates_to_remove.extend(ids[1:])
+            if dry_run:
+                click.echo(f"  {file_path}: {len(ids)} copies (would remove {len(ids) - 1})")
+    if not duplicates_to_remove:
+        click.echo("✓ No duplicates found")
+        return
+    if dry_run:
+        click.echo(f"\nWould remove {len(duplicates_to_remove)} duplicate entries")
+        click.echo("Run without --dry-run to remove them")
+    else:
+        db.delete(duplicates_to_remove)
+        click.echo(f"✓ Removed {len(duplicates_to_remove)} duplicate entries")
 @main.command("new-branch")
 @click.argument("issue", type=int)
 @click.option("--path", type=click.Path(exists=True, path_type=Path), default=".")
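The grouping step in the new `dedupe` command can be tried without a ChromaDB collection. The sketch below is a minimal stand-in, assuming the same parallel `ids` / `metadatas` lists that `collection.get(include=["metadatas"])` returns in the diff above; `find_duplicate_ids` is a hypothetical helper name, not a function in ragtime-cli.

```python
from collections import defaultdict


def find_duplicate_ids(ids: list[str], metadatas: list[dict]) -> list[str]:
    """Return the IDs to delete, keeping the first entry seen per file path."""
    by_file: dict[str, list[str]] = defaultdict(list)
    for entry_id, metadata in zip(ids, metadatas):
        file_path = (metadata or {}).get("file", "")
        if file_path:  # entries without a file path are never grouped
            by_file[file_path].append(entry_id)

    to_remove: list[str] = []
    for entry_ids in by_file.values():
        to_remove.extend(entry_ids[1:])  # keep the first, drop the rest
    return to_remove


# Illustrative data only: two entries point at the same memory file.
ids = ["a1", "b2", "c3", "d4"]
metadatas = [
    {"file": "app/auth.md"},
    {"file": "app/auth.md"},   # same file, indexed again under a random ID
    {"file": "app/cache.md"},
    {},                        # no file path: left alone
]
print(find_duplicate_ids(ids, metadatas))
```

Grouping uses the `file` metadata key alone, so any entries that legitimately share a file path (for example several code symbols from one source file, if they carry the same key) would also be reduced to one. Running `ragtime dedupe --dry-run` first shows what would be removed.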

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/config.py RENAMED

@@ -12,13 +12,14 @@ import yaml
 @dataclass
 class DocsConfig:
     """Configuration for docs indexing."""
-    paths: list[str] = field(default_factory=lambda: ["docs", ".ragtime"])
+    # Note: .ragtime/ is NOT included here - memories are indexed separately via 'reindex'
+    # to avoid duplicate entries (same file indexed as both doc and memory)
+    paths: list[str] = field(default_factory=lambda: ["docs"])
     patterns: list[str] = field(default_factory=lambda: ["**/*.md"])
     exclude: list[str] = field(default_factory=lambda: [
         "**/node_modules/**",
         "**/.git/**",
-        "**/.ragtime/index/**",
-        "**/.ragtime/branches/.*",  # Exclude synced (dot-prefixed) branches
+        "**/.ragtime/**",  # Memories indexed separately
     ])

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/db.py RENAMED

@@ -84,16 +84,20 @@ class RagtimeDB:
         limit: int = 10,
         type_filter: str | None = None,
         namespace: str | None = None,
+        require_terms: list[str] | None = None,
         **filters,
     ) -> list[dict]:
         """
-        Semantic search over indexed content.
+        Hybrid search: semantic similarity + keyword filtering.
         Args:
             query: Natural language search query
             limit: Max results to return
             type_filter: "code" or "docs" (None = both)
             namespace: Filter by namespace (for docs)
+            require_terms: List of terms that MUST appear in results (case-insensitive).
+                          Use for scoped queries like "error handling in mobile" with
+                          require_terms=["mobile"] to ensure "mobile" isn't ignored.
             **filters: Additional metadata filters (None values are ignored)
         Returns:
@@ -121,9 +125,12 @@ class RagtimeDB:
         else:
             where = {"$and": conditions}
+        # When using require_terms, fetch more results since we'll filter some out
+        fetch_limit = limit * 5 if require_terms else limit
         results = self.collection.query(
             query_texts=[query],
-            n_results=limit,
+            n_results=fetch_limit,
             where=where,
         )
@@ -131,12 +138,26 @@ class RagtimeDB:
         output = []
         if results["documents"] and results["documents"][0]:
             for i, doc in enumerate(results["documents"][0]):
+                # Hybrid filtering: ensure required terms appear
+                if require_terms:
+                    doc_lower = doc.lower()
+                    # Also check file path in metadata for code/file matches
+                    file_path = (results["metadatas"][0][i].get("file", "") or "").lower()
+                    combined_text = f"{doc_lower} {file_path}"
+                    if not all(term.lower() in combined_text for term in require_terms):
+                        continue
                 output.append({
                     "content": doc,
                     "metadata": results["metadatas"][0][i] if results["metadatas"] else {},
                     "distance": results["distances"][0][i] if results["distances"] else None,
                 })
+                # Stop once we have enough
+                if len(output) >= limit:
+                    break
         return output
     def delete(self, ids: list[str]) -> None:
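The over-fetch and post-filter behaviour added to `RagtimeDB.search` can be illustrated in isolation. This is a sketch under stated assumptions: the candidates are plain dicts already ordered by distance, standing in for the ChromaDB query result, and `fetch_limit_for` and `filter_by_required_terms` are hypothetical names that mirror the logic in the hunks above rather than functions from the package.

```python
from typing import Optional


def fetch_limit_for(limit: int, require_terms: Optional[list[str]]) -> int:
    """Ask the vector store for 5x more rows when a keyword filter will drop some."""
    return limit * 5 if require_terms else limit


def filter_by_required_terms(
    candidates: list[dict],
    require_terms: Optional[list[str]],
    limit: int,
) -> list[dict]:
    """Keep candidates whose content or file path contains every required term."""
    terms = [term.lower() for term in (require_terms or [])]
    output: list[dict] = []
    for candidate in candidates:
        file_path = (candidate["metadata"].get("file") or "").lower()
        haystack = f"{candidate['content'].lower()} {file_path}"
        if not all(term in haystack for term in terms):
            continue  # a required term is missing
        output.append(candidate)
        if len(output) >= limit:
            break  # stop once we have enough
    return output


# Illustrative candidates, nearest first (made-up distances and paths).
candidates = [
    {"content": "Error handling for the web dashboard",
     "metadata": {"file": "web/errors.ts"}, "distance": 0.10},
    {"content": "Error handling in the mobile client",
     "metadata": {"file": "mobile/lib/errors.dart"}, "distance": 0.18},
    {"content": "Retry helper for network errors",
     "metadata": {"file": "mobile/lib/retry.dart"}, "distance": 0.25},
    {"content": "Mobile push notification setup",
     "metadata": {"file": "mobile/lib/push.dart"}, "distance": 0.40},
]

hits = filter_by_required_terms(candidates, ["mobile", "dart"], limit=2)
print([hit["metadata"]["file"] for hit in hits])
```

Two properties follow from this design. Matching is a case-insensitive substring test, so a term such as `art` would also match `dart`. And because only `limit * 5` candidates are fetched, a search can return fewer than `limit` results when the required terms are rare among the nearest neighbours.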

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/mcp_server.py RENAMED

@@ -132,7 +132,7 @@ class RagtimeMCPServer:
             },
             {
                 "name": "search",
-                "description": "Semantic search over indexed code and docs. Returns function signatures, class definitions, and doc summaries with file paths and line numbers. IMPORTANT: Results are summaries only - use the Read tool on returned file paths to see full implementations before making code changes or decisions.",
+                "description": "Hybrid search over indexed code and docs (semantic + keyword). Returns function signatures, class definitions, and doc summaries with file paths and line numbers. IMPORTANT: Results are summaries only - use the Read tool on returned file paths to see full implementations before making code changes or decisions.",
                 "inputSchema": {
                     "type": "object",
                     "properties": {
@@ -152,6 +152,11 @@ class RagtimeMCPServer:
                             "type": "string",
                             "description": "Filter by component"
                         },
+                        "require_terms": {
+                            "type": "array",
+                            "items": {"type": "string"},
+                            "description": "Terms that MUST appear in results (case-insensitive). Use for scoped queries like 'error handling in mobile' with require_terms=['mobile'] to ensure the qualifier isn't lost in semantic search."
+                        },
                         "limit": {
                             "type": "integer",
                             "default": 10,
@@ -333,13 +338,14 @@ class RagtimeMCPServer:
         }
     def _search(self, args: dict) -> dict:
-        """Search indexed content."""
+        """Search indexed content with hybrid semantic + keyword matching."""
         results = self.db.search(
             query=args["query"],
             limit=args.get("limit", 10),
             namespace=args.get("namespace"),
             type_filter=args.get("type"),
             component=args.get("component"),
+            require_terms=args.get("require_terms"),
         )
         return {
@@ -487,7 +493,7 @@ class RagtimeMCPServer:
                         "protocolVersion": "2024-11-05",
                         "serverInfo": {
                             "name": "ragtime",
-                            "version": "0.2.10",
+                            "version": "0.2.12",
                         },
                         "capabilities": {
                             "tools": {},
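For reference, a `tools/call` request that uses the new `require_terms` input might look like the sketch below. The JSON-RPC envelope follows the MCP convention of passing tool inputs under `params.arguments`; the `build_search_call` helper and the `request_id` value are hypothetical, and how the server reads the request off the wire is not shown in this diff.

```python
import json
from typing import Optional


def build_search_call(
    query: str,
    require_terms: Optional[list[str]] = None,
    limit: int = 10,
    request_id: int = 1,
) -> str:
    """Serialize a tools/call request for the "search" tool (hypothetical helper)."""
    arguments: dict = {"query": query, "limit": limit}
    if require_terms:
        # Matches the inputSchema entry: an array of strings.
        arguments["require_terms"] = list(require_terms)
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": arguments},
    })


scoped = json.loads(build_search_call("error handling", ["mobile", "dart"]))
unscoped = json.loads(build_search_call("error handling"))

# _search reads the field with args.get("require_terms"), so omitting it
# yields None and the search stays purely semantic.
print(scoped["params"]["arguments"].get("require_terms"))
print(unscoped["params"]["arguments"].get("require_terms"))
```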

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/memory.py RENAMED

@@ -10,6 +10,7 @@ from dataclasses import dataclass, field
 from datetime import date
 from typing import Optional
 import uuid
+import hashlib
 import re
 import yaml
@@ -139,8 +140,19 @@ class Memory:
             except ValueError:
                 pass  # path not relative to base, will regenerate
+        # Use frontmatter ID if present, otherwise derive stable ID from file path
+        # This ensures reindex is idempotent - same file always gets same ID
+        if "id" in frontmatter:
+            memory_id = frontmatter["id"]
+        elif file_path:
+            # Stable hash of relative path
+            memory_id = hashlib.sha256(file_path.encode()).hexdigest()[:8]
+        else:
+            # Fallback: hash of absolute path
+            memory_id = hashlib.sha256(str(path).encode()).hexdigest()[:8]
         return cls(
-            id=frontmatter.get("id", str(uuid.uuid4())[:8]),
+            id=memory_id,
             content=content,
             namespace=frontmatter.get("namespace", "app"),
             type=frontmatter.get("type", "unknown"),
@@ -207,25 +219,41 @@ class MemoryStore:
     def get(self, memory_id: str) -> Optional[Memory]:
         """Get a memory by ID."""
-        # Search in ChromaDB to find the file
-        results = self.db.collection.get(ids=[memory_id])
+        # Search in ChromaDB to find the memory
+        results = self.db.collection.get(ids=[memory_id], include=["documents", "metadatas"])
         if not results["ids"]:
             return None
         metadata = results["metadatas"][0]
+        content = results["documents"][0] if results["documents"] else ""
         file_rel_path = metadata.get("file", "")
-        if not file_rel_path:
-            return None
+        # Try to read from file first (has full frontmatter data)
+        if file_rel_path:
+            file_path = self.memory_dir / file_rel_path
+            if file_path.exists():
+                return Memory.from_file(file_path, relative_to=self.memory_dir)
-        file_path = self.memory_dir / file_rel_path
-        if file_path.exists():
-            # Pass relative_to so the memory preserves its actual file path
-            return Memory.from_file(file_path, relative_to=self.memory_dir)
-        return None
+        # Fall back to constructing from ChromaDB data
+        # This handles cases where file path is wrong or file was deleted
+        return Memory(
+            id=memory_id,
+            content=content,
+            namespace=metadata.get("namespace", "unknown"),
+            type=metadata.get("type", "unknown"),
+            component=metadata.get("component"),
+            confidence=metadata.get("confidence", "medium"),
+            confidence_reason=metadata.get("confidence_reason"),
+            source=metadata.get("source", "unknown"),
+            status=metadata.get("status", "active"),
+            added=metadata.get("added", ""),
+            author=metadata.get("author"),
+            issue=metadata.get("issue"),
+            epic=metadata.get("epic"),
+            branch=metadata.get("branch"),
+            _file_path=file_rel_path,
+        )
     def delete(self, memory_id: str) -> bool:
         """Delete a memory by ID."""
@@ -322,10 +350,13 @@ class MemoryStore:
         if component:
             conditions.append({"component": component})
+        # Exclude docs/code entries - they use type="docs" or type="code"
+        # while memories use types like "architecture", "feature", etc.
+        # This is especially important for wildcard queries
+        conditions.append({"type": {"$nin": ["docs", "code"]}})
         # Build where clause with $and if multiple conditions
-        if len(conditions) == 0:
-            where = None
-        elif len(conditions) == 1:
+        if len(conditions) == 1:
             where = conditions[0]
         else:
             where = {"$and": conditions}
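The switch from `uuid.uuid4()` to a path hash is what makes `reindex` idempotent. A minimal sketch of the difference, assuming relative paths are plain strings; `stable_memory_id` and `random_memory_id` are hypothetical names for the new and old behaviour:

```python
import hashlib
import uuid


def stable_memory_id(relative_path: str) -> str:
    """New behaviour: first 8 hex chars of the SHA-256 of the file path."""
    return hashlib.sha256(relative_path.encode()).hexdigest()[:8]


def random_memory_id() -> str:
    """Old behaviour: first 8 chars of a fresh UUID4, different on every call."""
    return str(uuid.uuid4())[:8]


path = "app/auth/session-handling.md"  # illustrative path only

# Indexing the same file twice yields the same ID, so the second pass
# overwrites the existing entry instead of adding a duplicate.
first_pass = stable_memory_id(path)
second_pass = stable_memory_id(path)
print(first_pass == second_pass)

# With random IDs, each reindex produced a new entry for the same file.
print(random_memory_id(), random_memory_id())
```

Two consequences are worth keeping in mind. The ID is derived from the path, so renaming or moving a file changes its ID unless the frontmatter sets `id` explicitly. And eight hex characters carry 32 bits, so collisions become plausible once a store holds tens of thousands of files.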

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/LICENSE RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/ragtime_cli.egg-info/SOURCES.txt RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/ragtime_cli.egg-info/dependency_links.txt RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/ragtime_cli.egg-info/entry_points.txt RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/ragtime_cli.egg-info/requires.txt RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/ragtime_cli.egg-info/top_level.txt RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/setup.cfg RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/__init__.py RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/commands/audit.md RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/commands/create-pr.md RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/commands/generate-docs.md RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/commands/handoff.md RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/commands/import-docs.md RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/commands/pr-graduate.md RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/commands/recall.md RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/commands/remember.md RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/commands/save.md RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/commands/start.md RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/indexers/__init__.py RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/indexers/code.py RENAMED

File without changes

{ragtime_cli-0.2.10 → ragtime_cli-0.2.12}/src/indexers/docs.py RENAMED

File without changes
