claude-self-reflect 3.2.2 → 3.2.4

This diff shows the changes between publicly released versions of this package, as published to a supported registry. It is provided for informational purposes only.
@@ -48,7 +48,7 @@ You are a technical documentation specialist for the Claude Self Reflect project
   * @param query - Natural language search query
   * @param options - Search configuration options
   * @param options.limit - Maximum results to return (default: 10)
- * @param options.threshold - Minimum similarity score 0-1 (default: 0.7)
+ * @param options.threshold - Minimum similarity score 0-1 (removed in v3.2.4 - uses natural scoring)
   * @param options.project - Filter by specific project name
   * @returns Promise resolving to array of search results
   *
@@ -12,7 +12,7 @@ You are a Qdrant vector database specialist for the claude-self-reflect project.
  - Collections use per-project isolation: `conv_<md5_hash>_local` or `conv_<md5_hash>_voyage` naming
  - Project paths: ~/.claude/projects/-Users-{username}-projects-{project-name}/*.jsonl
  - Project name is extracted from path and MD5 hashed for collection naming
- - Cross-collection search enabled with 0.7 similarity threshold
+ - Cross-collection search uses Qdrant's natural scoring (no artificial thresholds since v3.2.4)
  - Streaming importer detects file growth and processes new lines incrementally
  - MCP server expects collections to match project name MD5 hash

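The collection naming described above is exercised later in this diff, where `search_by_file` and `get_next_results` switch to hashing the *normalized* project name. A minimal sketch of the derivation, assuming the `shared/normalization.py` logic included at the end of this diff (the `collection_name` helper and the example path are illustrative, not part of the package):

```python
import hashlib
from pathlib import Path

def normalize_project_name(project_path: str) -> str:
    # Mirrors shared/normalization.py from this release (full source at end of diff)
    if not project_path:
        return ""
    final_component = Path(project_path.rstrip('/')).name
    if final_component.startswith('-') and 'projects' in final_component:
        idx = final_component.rfind('projects-')
        if idx != -1:
            return final_component[idx + len('projects-'):]
    return final_component

def collection_name(project_path: str, suffix: str = "local") -> str:
    # conv_<md5_hash>_<suffix>: the 8-char MD5 prefix matches the
    # hashlib.md5(...).hexdigest()[:8] calls shown in the server hunks below
    project_hash = hashlib.md5(normalize_project_name(project_path).encode()).hexdigest()[:8]
    return f"conv_{project_hash}_{suffix}"

print(collection_name('-Users-alice-projects-myapp'))  # e.g. conv_<8 hex chars>_local
```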
@@ -195,7 +195,7 @@ docker stats qdrant

  ## Project-Specific Rules
  - Always use Voyage AI embeddings for consistency
- - Maintain 0.7 similarity threshold as baseline
+ - Use Qdrant's natural scoring (no artificial thresholds since v3.2.4)
  - Preserve per-project collection isolation
  - Do not grep JSONL files unless explicitly asked
  - Always verify the MCP integration works end-to-end
@@ -128,7 +128,7 @@ Fast search that returns only the count and top result. Perfect for quick checks
  // Quick overview of matches
  {
    query: "authentication patterns",
-   min_score: 0.5, // Optional, defaults to 0.7
+   min_score: 0.5, // Optional (v3.2.4+ ignores this - uses natural scoring)
    project: "all" // Optional, defaults to current project
  }
  ```
@@ -165,7 +165,7 @@ Pagination support for getting additional results after an initial search.
    query: "original search query", // Must match original query
    offset: 3, // Skip first 3 results
    limit: 3, // Get next 3 results
-   min_score: 0.7, // Optional
+   min_score: 0.7, // Optional (v3.2.4+ ignores this)
    project: "all" // Optional
  }
  ```
@@ -9,7 +9,7 @@ You are a search optimization specialist for the claude-self-reflect project. Yo
  ## Project Context
  - Current baseline: 66.1% search accuracy with Voyage AI
  - Gemini comparison showed 70-77% accuracy but 50% slower
- - Default similarity threshold: 0.7
+ - Search scoring: Uses Qdrant's natural scoring (no artificial thresholds as of v3.2.4)
  - Cross-collection search adds ~100ms overhead
  - 24+ projects with 10,165+ conversation chunks

@@ -71,9 +71,11 @@ python scripts/analyze-search-quality.py
  ### Threshold Tuning
  ```bash
  # Test different thresholds
- for threshold in 0.5 0.6 0.7 0.8 0.9; do
-   echo "Testing threshold: $threshold"
-   SIMILARITY_THRESHOLD=$threshold npm test
+ # Note: As of v3.2.4, artificial thresholds removed
+ # Focus on embedding model comparison instead
+ for model in voyage openai gemini; do
+   echo "Testing model: $model"
+   EMBEDDING_MODEL=$model npm test
  done

  # Find optimal threshold
@@ -237,7 +239,7 @@ def calculate_mrr(queries, results):
  interface ABTestConfig {
    control: {
      model: 'voyage',
-     threshold: 0.7,
+     scoring: 'natural',
      limit: 10
    },
    variant: {
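The hunk header above points into `calculate_mrr(queries, results)`, whose body lies outside this diff. For reference, a minimal mean-reciprocal-rank sketch consistent with that signature (the result format, including the `relevant` flag, is an assumption, not taken from the package):

```python
def calculate_mrr(queries, results):
    """Mean reciprocal rank: average of 1/rank of the first relevant hit per query.

    Sketch only - assumes results maps each query to an ordered list of dicts
    carrying a boolean 'relevant' flag (hypothetical format).
    """
    reciprocal_ranks = []
    for query in queries:
        ranked = results.get(query, [])
        rank = next((i + 1 for i, r in enumerate(ranked) if r.get("relevant")), None)
        reciprocal_ranks.append(1.0 / rank if rank else 0.0)
    return sum(reciprocal_ranks) / len(reciprocal_ranks) if reciprocal_ranks else 0.0
```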
@@ -285,7 +287,7 @@ async function abTestSearch(query: string, userId: string) {
  ### Recommended Settings
  ```env
  # Search Configuration
- SIMILARITY_THRESHOLD=0.7
+ # SIMILARITY_THRESHOLD removed in v3.2.4 - uses natural scoring
  SEARCH_LIMIT=10
  CROSS_COLLECTION_LIMIT=5

@@ -300,7 +302,7 @@ SAMPLE_RATE=0.1
  ```

  ## Project-Specific Rules
- - Maintain 0.7 similarity threshold as baseline
+ - Use Qdrant's natural scoring (no artificial thresholds since v3.2.4)
  - Always compare against Voyage AI baseline (66.1%)
  - Consider search latency alongside accuracy
  - Test with real conversation data
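The Recommended Settings above are plain environment variables. A minimal sketch of reading the surviving knobs (variable names come from the env block in this diff; the reading code and defaults are illustrative):

```python
import os

# Defaults mirror the recommended values shown in the env block above.
SEARCH_LIMIT = int(os.environ.get("SEARCH_LIMIT", "10"))
CROSS_COLLECTION_LIMIT = int(os.environ.get("CROSS_COLLECTION_LIMIT", "5"))

# SIMILARITY_THRESHOLD is deliberately not read: as of v3.2.4 the server
# relies on Qdrant's natural scoring instead of an artificial cutoff.
```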
package/README.md CHANGED
@@ -116,11 +116,9 @@ Works with [Claude Code Statusline](https://github.com/sirmalloc/ccstatusline) -
  <summary><b>MCP Tools Available to Claude</b></summary>

  **Search & Memory Tools:**
- - `reflect_on_past` - Search past conversations using semantic similarity with time decay
+ - `reflect_on_past` - Search past conversations using semantic similarity with time decay (supports quick/summary modes)
  - `store_reflection` - Store important insights or learnings for future reference
- - `quick_search` - Fast search returning only count and top result
- - `search_summary` - Get aggregated insights without individual details
- - `get_more_results` - Paginate through additional search results
+ - `get_next_results` - Paginate through additional search results
  - `search_by_file` - Find conversations that analyzed specific files
  - `search_by_concept` - Search for conversations about development concepts
  - `get_full_conversation` - Retrieve complete JSONL conversation files (v2.8.8)
@@ -288,11 +286,11 @@ npm uninstall -g claude-self-reflect
  ## What's New

  <details>
- <summary>v2.8.8 - Latest Release</summary>
+ <summary>v3.2.4 - Latest Release</summary>

- - **Full Conversation Access**: New `get_full_conversation` tool provides complete JSONL files instead of 200-char excerpts
- - **95% Value Increase**: Agents can now access entire conversations with full implementation details
- - **Direct File Access**: Returns absolute paths for efficient reading with standard tools
+ - **CRITICAL: Search Threshold Removal**: Eliminated artificial 0.7+ thresholds that blocked broad searches like "docker", "MCP", "python"
+ - **Shared Normalization Module**: Created centralized project name normalization preventing search failures
+ - **Memory Decay Fixes**: Corrected mathematical errors in exponential decay calculation

  </details>

package/installer/cli.js CHANGED
@@ -29,13 +29,21 @@ async function setup() {
  }

  async function status() {
-   // Call the Python MCP server's --status command
+   // Call the Python status script directly
    const mcpServerPath = join(__dirname, '..', 'mcp-server');
-   const venvPython = join(mcpServerPath, 'venv', 'bin', 'python');
-   const mcpModule = join(mcpServerPath, 'src');

+   // Check for venv or .venv
+   let venvPython = join(mcpServerPath, 'venv', 'bin', 'python');
    try {
-     const child = spawn(venvPython, ['-m', 'src', '--status'], {
+     await fs.access(venvPython);
+   } catch {
+     venvPython = join(mcpServerPath, '.venv', 'bin', 'python');
+   }
+
+   const statusScript = join(mcpServerPath, 'src', 'status.py');
+
+   try {
+     const child = spawn(venvPython, [statusScript], {
        cwd: mcpServerPath,
        stdio: ['inherit', 'pipe', 'pipe']
      });
@@ -1,6 +1,6 @@
  [project]
  name = "claude-self-reflect-mcp"
- version = "2.8.9"
+ version = "2.8.10"
  description = "MCP server for Claude self-reflection with memory decay"
  # readme = "README.md"
  requires-python = ">=3.10"
@@ -6,11 +6,21 @@ Handles mapping between user-friendly names and internal collection names.
  import hashlib
  import logging
  import re
+ import sys
  from pathlib import Path
  from typing import List, Dict, Optional, Set
  from time import time
  from qdrant_client import QdrantClient

+ # Import from shared module for consistent normalization
+ sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+ try:
+     from shared.normalization import normalize_project_name
+ except ImportError:
+     # Fall back to creating local version if shared module not found
+     logging.warning("Could not import shared normalization module")
+     normalize_project_name = None
+
  logger = logging.getLogger(__name__)

  # Project discovery markers - common parent directories that indicate project roots
@@ -244,59 +254,31 @@ class ProjectResolver:
      def _normalize_project_name(self, project_path: str) -> str:
          """
          Normalize project name for consistent hashing.
-         Extracts the actual project name from various path formats.
+         Uses the shared normalization module to ensure consistency
+         with import scripts.
          """
+         # Use the shared normalization function if available
+         if normalize_project_name:
+             return normalize_project_name(project_path)
+
+         # Fallback implementation - EXACT copy of shared module
          if not project_path:
              return ""

-         # Remove trailing slashes
-         project_path = project_path.rstrip('/')
+         path = Path(project_path.rstrip('/'))

-         # Handle Claude logs format (starts with dash)
-         if project_path.startswith('-'):
-             # Split on dashes but don't convert to path separators
-             # This preserves project names that contain dashes
-             path_str = project_path[1:]  # Remove leading dash
-             path_parts = path_str.split('-')  # Split on dashes, not path separators
-
-             # Look for common project parent directories
-             project_parents = {'projects', 'code', 'Code', 'repos', 'repositories',
-                                'dev', 'Development', 'work', 'src', 'github'}
-
-             # Find the project name after a known parent directory
-             for i, part in enumerate(path_parts):
-                 if part.lower() in project_parents and i + 1 < len(path_parts):
-                     # Return everything after the parent directory
-                     remaining = path_parts[i + 1:]
-
-                     # Use segment-based approach for complex paths
-                     # Return the most likely project name from remaining segments
-                     if remaining:
-                         # If it's a single segment, return it
-                         if len(remaining) == 1:
-                             return remaining[0]
-                         # For multiple segments, look for project-like patterns
-                         for r in remaining:
-                             r_lower = r.lower()
-                             # Prioritize segments with project indicators
-                             if any(ind in r_lower for ind in ['app', 'service', 'project', 'api', 'client']):
-                                 return r
-
-                     # Otherwise join remaining parts
-                     return '-'.join(remaining)
-
-             # Fallback: use the last component
-             return path_parts[-1] if path_parts else project_path
+         # Extract the final directory name
+         final_component = path.name

-         # For regular paths or simple names
-         path_obj = Path(project_path)
+         # If it's Claude's dash-separated format, extract project name
+         if final_component.startswith('-') and 'projects' in final_component:
+             # Find the last occurrence of 'projects-' to handle edge cases
+             idx = final_component.rfind('projects-')
+             if idx != -1:
+                 return final_component[idx + len('projects-'):]

-         # If it's already a simple name, return it
-         if '/' not in project_path and '\\' not in project_path:
-             return project_path
-
-         # Otherwise extract from path
-         return path_obj.name
+         # For regular paths, just return the directory name
+         return final_component if final_component else path.parent.name

      def _project_matches(self, stored_project: str, target_project: str) -> bool:
          """
@@ -10,10 +10,22 @@ import numpy as np
  import hashlib
  import time
  import logging
+ import math
  from xml.sax.saxutils import escape

  from fastmcp import FastMCP, Context
- from .utils import normalize_project_name
+
+ # Import from shared module for consistent normalization
+ import sys
+ sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+ try:
+     from shared.normalization import normalize_project_name
+ except ImportError:
+     # Fall back to local utils if shared module not found
+     from .utils import normalize_project_name
+     import logging
+     logging.warning("Using legacy utils.normalize_project_name - shared module not found")
+
  from .project_resolver import ProjectResolver
  from pydantic import BaseModel, Field
  from qdrant_client import AsyncQdrantClient, models
@@ -571,7 +583,7 @@ async def reflect_on_past(
      ctx: Context,
      query: str = Field(description="The search query to find semantically similar conversations"),
      limit: int = Field(default=5, description="Maximum number of results to return"),
-     min_score: float = Field(default=0.7, description="Minimum similarity score (0-1)"),
+     min_score: float = Field(default=0.3, description="Minimum similarity score (0-1)"),
      use_decay: Union[int, str] = Field(default=-1, description="Apply time-based decay: 1=enable, 0=disable, -1=use environment default (accepts int or str)"),
      project: Optional[str] = Field(default=None, description="Search specific project only. If not provided, searches current project based on working directory. Use 'all' to search across all projects."),
      include_raw: bool = Field(default=False, description="Include raw Qdrant payload data for debugging (increases response size)"),
@@ -669,8 +681,10 @@ async def reflect_on_past(
  # Filter collections by project if not searching all
  project_collections = []  # Define at this scope for later use
  if target_project != 'all':
-     # Use ProjectResolver to find collections for this project
-     resolver = ProjectResolver(qdrant_client)
+     # Use ProjectResolver with sync client (resolver expects sync operations)
+     from qdrant_client import QdrantClient as SyncQdrantClient
+     sync_client = SyncQdrantClient(url=QDRANT_URL)
+     resolver = ProjectResolver(sync_client)
      project_collections = resolver.find_collections_for_project(target_project)

      if not project_collections:
@@ -739,33 +753,32 @@ async def reflect_on_past(
  await ctx.debug(f"Using NATIVE Qdrant decay (new API) for {collection_name}")

  # Build the query with native Qdrant decay formula using newer API
- query_obj = Query(
-     nearest=query_embedding,
-     formula=Formula(
+ # Convert half-life to seconds (Qdrant uses seconds for datetime)
+ half_life_seconds = DECAY_SCALE_DAYS * 24 * 60 * 60
+
+ # Build query using proper Python models as per Qdrant docs
+ from qdrant_client import models
+
+ query_obj = models.FormulaQuery(
+     formula=models.SumExpression(
          sum=[
-             # Original similarity score
-             Expression(variable="score"),
-             # Decay boost term
-             Expression(
-                 mult=MultExpression(
-                     mult=[
-                         # Decay weight
-                         Expression(constant=DECAY_WEIGHT),
-                         # Exponential decay function
-                         Expression(
-                             exp_decay=DecayParamsExpression(
-                                 # Use timestamp field for decay
-                                 x=Expression(datetime_key="timestamp"),
-                                 # Decay from current time (server-side)
-                                 target=Expression(datetime="now"),
-                                 # Scale in milliseconds
-                                 scale=DECAY_SCALE_DAYS * 24 * 60 * 60 * 1000,
-                                 # Standard exponential decay midpoint
-                                 midpoint=0.5
-                             )
+             "$score",  # Original similarity score
+             models.MultExpression(
+                 mult=[
+                     DECAY_WEIGHT,  # Weight multiplier
+                     models.ExpDecayExpression(
+                         exp_decay=models.DecayParamsExpression(
+                             x=models.DatetimeKeyExpression(
+                                 datetime_key="timestamp"  # Payload field with datetime
+                             ),
+                             target=models.DatetimeExpression(
+                                 datetime="now"  # Current time on server
+                             ),
+                             scale=half_life_seconds,  # Scale in seconds
+                             midpoint=0.5  # Half-life semantics
                          )
-                     ]
-                 )
+                     )
+                 ]
              )
          ]
      )
@@ -776,36 +789,32 @@ async def reflect_on_past(
      collection_name=collection_name,
      query=query_obj,
      limit=limit,
-     score_threshold=min_score,
      with_payload=True
+     # No score_threshold - let Qdrant's decay formula handle relevance
  )
  elif should_use_decay and USE_NATIVE_DECAY and not NATIVE_DECAY_AVAILABLE:
      # Use native Qdrant decay with older API
      await ctx.debug(f"Using NATIVE Qdrant decay (legacy API) for {collection_name}")

+     # Convert half-life to seconds (Qdrant uses seconds for datetime)
+     half_life_seconds = DECAY_SCALE_DAYS * 24 * 60 * 60
+
      # Build the query with native Qdrant decay formula using older API
+     # Use the same models but with FormulaQuery
      query_obj = FormulaQuery(
          nearest=query_embedding,
          formula=SumExpression(
              sum=[
-                 # Original similarity score
-                 'score',  # Variable expression can be a string
-                 # Decay boost term
+                 "$score",  # Original similarity score
                  {
-                     'mult': [
-                         # Decay weight (constant as float)
-                         DECAY_WEIGHT,
-                         # Exponential decay function
+                     "mult": [
+                         DECAY_WEIGHT,  # Weight multiplier
                          {
-                             'exp_decay': DecayParamsExpression(
-                                 # Use timestamp field for decay
-                                 x=DatetimeKeyExpression(datetime_key='timestamp'),
-                                 # Decay from current time (server-side)
-                                 target=DatetimeExpression(datetime='now'),
-                                 # Scale in milliseconds
-                                 scale=DECAY_SCALE_DAYS * 24 * 60 * 60 * 1000,
-                                 # Standard exponential decay midpoint
-                                 midpoint=0.5
+                             "exp_decay": DecayParamsExpression(
+                                 x=DatetimeKeyExpression(datetime_key="timestamp"),
+                                 target=DatetimeExpression(datetime="now"),
+                                 scale=half_life_seconds,  # Scale in seconds
+                                 midpoint=0.5  # Half-life semantics
                              )
                          }
                      ]
@@ -819,8 +828,8 @@ async def reflect_on_past(
      collection_name=collection_name,
      query=query_obj,
      limit=limit,
-     score_threshold=min_score,
      with_payload=True
+     # No score_threshold - let Qdrant's decay formula handle relevance
  )


@@ -916,11 +925,14 @@ async def reflect_on_past(
      timestamp = timestamp.replace(tzinfo=timezone.utc)
  age_ms = (now - timestamp).total_seconds() * 1000

- # Calculate decay factor
- decay_factor = np.exp(-age_ms / scale_ms)
+ # Calculate decay factor using proper half-life formula
+ # For half-life H: decay = exp(-ln(2) * age / H)
+ ln2 = math.log(2)
+ decay_factor = math.exp(-ln2 * age_ms / scale_ms)

- # Apply decay formula
- adjusted_score = point.score + (DECAY_WEIGHT * decay_factor)
+ # Apply multiplicative decay formula to keep scores bounded [0, 1]
+ # adjusted = score * ((1 - weight) + weight * decay)
+ adjusted_score = point.score * ((1 - DECAY_WEIGHT) + DECAY_WEIGHT * decay_factor)

  # Debug: show the calculation
  age_days = age_ms / (24 * 60 * 60 * 1000)
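The corrected client-side math above mirrors the server-side `exp_decay` (scale = half-life, `midpoint=0.5` makes the decay hit exactly 0.5 at one half-life). A standalone sketch of the fixed formula; the weight and half-life values below are illustrative, not the package defaults:

```python
import math

def decayed_score(score: float, age_days: float,
                  decay_weight: float = 0.3, half_life_days: float = 90.0) -> float:
    # decay = exp(-ln(2) * age / half_life)  ->  0.5 when age == half_life,
    # matching Qdrant's exp_decay with midpoint=0.5 and scale=half_life.
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    # The multiplicative blend keeps the result bounded by the raw score,
    # unlike the old additive `score + weight * decay` it replaces.
    return score * ((1 - decay_weight) + decay_weight * decay)

# A chunk exactly one half-life old keeps (1 - w) + 0.5 * w of its score:
print(decayed_score(0.8, 90.0))  # 0.8 * (0.7 + 0.15) = 0.68
```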
@@ -1001,12 +1013,13 @@ async def reflect_on_past(
      ))
  else:
      # Standard search without decay
+     # Let Qdrant handle scoring natively
      results = await qdrant_client.search(
          collection_name=collection_name,
          query_vector=query_embedding,
          limit=limit * 2,  # Get more results to account for filtering
-         score_threshold=min_score * 0.9,  # Slightly lower threshold to catch v1 chunks
          with_payload=True
+         # No score_threshold - let Qdrant decide what's relevant
      )

  for point in results:
@@ -1691,7 +1704,7 @@ async def store_reflection(
  async def quick_search(
      ctx: Context,
      query: str = Field(description="The search query to find semantically similar conversations"),
-     min_score: float = Field(default=0.7, description="Minimum similarity score (0-1)"),
+     min_score: float = Field(default=0.3, description="Minimum similarity score (0-1)"),
      project: Optional[str] = Field(default=None, description="Search specific project only. If not provided, searches current project based on working directory. Use 'all' to search across all projects.")
  ) -> str:
      """Quick search that returns only the count and top result for fast overview."""
@@ -1737,7 +1750,7 @@ async def get_more_results(
      query: str = Field(description="The original search query"),
      offset: int = Field(default=3, description="Number of results to skip (for pagination)"),
      limit: int = Field(default=3, description="Number of additional results to return"),
-     min_score: float = Field(default=0.7, description="Minimum similarity score (0-1)"),
+     min_score: float = Field(default=0.3, description="Minimum similarity score (0-1)"),
      project: Optional[str] = Field(default=None, description="Search specific project only")
  ) -> str:
      """Get additional search results after an initial search (pagination support)."""
@@ -1772,8 +1785,9 @@ async def search_by_file(
  collections = await get_all_collections() if not project else []

  if project and project != 'all':
-     # Filter collections for specific project
-     project_hash = hashlib.md5(project.encode()).hexdigest()[:8]
+     # Filter collections for specific project - normalize first!
+     normalized_project = normalize_project_name(project)
+     project_hash = hashlib.md5(normalized_project.encode()).hexdigest()[:8]
      collection_prefix = f"conv_{project_hash}_"
      collections = [c for c in await get_all_collections() if c.startswith(collection_prefix)]
  elif project == 'all':
@@ -2137,7 +2151,7 @@ async def get_next_results(
      query: str = Field(description="The original search query"),
      offset: int = Field(default=3, description="Number of results to skip (for pagination)"),
      limit: int = Field(default=3, description="Number of additional results to return"),
-     min_score: float = Field(default=0.7, description="Minimum similarity score (0-1)"),
+     min_score: float = Field(default=0.3, description="Minimum similarity score (0-1)"),
      project: Optional[str] = Field(default=None, description="Search specific project only")
  ) -> str:
      """Get additional search results after an initial search (pagination support)."""
@@ -2152,9 +2166,10 @@ async def get_next_results(
      # Search all collections if project is "all" or not specified
      collections = await get_all_collections()
  else:
-     # Search specific project
+     # Search specific project - normalize first!
      all_collections = await get_all_collections()
-     project_hash = hashlib.md5(project.encode()).hexdigest()[:8]
+     normalized_project = normalize_project_name(project)
+     project_hash = hashlib.md5(normalized_project.encode()).hexdigest()[:8]
      collections = [
          c for c in all_collections
          if c.startswith(f"conv_{project_hash}_")
@@ -2196,9 +2211,12 @@ async def get_next_results(
  if use_decay_bool and 'timestamp' in payload:
      try:
          timestamp = datetime.fromisoformat(payload['timestamp'].replace('Z', '+00:00'))
-         age_days = (datetime.now(timezone.utc) - timestamp).days
-         decay_factor = DECAY_WEIGHT + (1 - DECAY_WEIGHT) * math.exp(-age_days / DECAY_SCALE_DAYS)
-         score = score * decay_factor
+         age_days = (datetime.now(timezone.utc) - timestamp).total_seconds() / (24 * 60 * 60)
+         # Use consistent half-life formula: decay = exp(-ln(2) * age / half_life)
+         ln2 = math.log(2)
+         decay_factor = math.exp(-ln2 * age_days / DECAY_SCALE_DAYS)
+         # Apply multiplicative formula: score * ((1 - weight) + weight * decay)
+         score = score * ((1 - DECAY_WEIGHT) + DECAY_WEIGHT * decay_factor)
      except (ValueError, TypeError) as e:
          # Log but continue - timestamp format issue shouldn't break search
          logger.debug(f"Failed to apply decay for timestamp {payload.get('timestamp')}: {e}")
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "claude-self-reflect",
-   "version": "3.2.2",
+   "version": "3.2.4",
    "description": "Give Claude perfect memory of all your conversations - Installation wizard for Python MCP server",
    "keywords": [
      "claude",
@@ -44,6 +44,7 @@
      "scripts/importer/**/*.py",
      "scripts/delta-metadata-update-safe.py",
      "scripts/force-metadata-recovery.py",
+     "shared/**/*.py",
      ".claude/agents/*.md",
      "config/qdrant-config.yaml",
      "docker-compose.yaml",
@@ -25,12 +25,20 @@ sys.path.insert(0, str(scripts_dir))
  from qdrant_client import QdrantClient
  from qdrant_client.models import PointStruct, Distance, VectorParams

- # Import the correct normalize_project_name from utils
+ # Import normalize_project_name from shared module
+ # Add parent directory to path to import shared module
+ sys.path.insert(0, str(Path(__file__).parent.parent))
  try:
-     from utils import normalize_project_name
+     from shared.normalization import normalize_project_name
  except ImportError as e:
-     logging.error(f"Failed to import normalize_project_name from utils: {e}")
-     sys.exit(1)
+     logging.error(f"Failed to import normalize_project_name from shared module: {e}")
+     # Fall back to local utils if shared module not found
+     try:
+         from utils import normalize_project_name
+         logging.warning("Using legacy utils.normalize_project_name - consider updating")
+     except ImportError:
+         logging.error("Could not import normalize_project_name from any source")
+         sys.exit(1)

  # Set up logging
  logging.basicConfig(
@@ -2,8 +2,17 @@

  import hashlib
  import logging
+ import sys
  from pathlib import Path

+ # Import from shared module for consistent normalization
+ sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent))
+ try:
+     from shared.normalization import normalize_project_name as shared_normalize
+ except ImportError:
+     shared_normalize = None
+     logging.warning("Could not import shared normalization module")
+
  logger = logging.getLogger(__name__)


@@ -20,32 +29,36 @@ class ProjectNormalizer:
          """
          Normalize a project path to a consistent project name.

-         CRITICAL: This must match the implementation in utils.py
+         Uses the shared normalization module to ensure consistency
+         across all components.

          Examples:
          - "-Users-name-projects-claude-self-reflect" -> "claude-self-reflect"
          - "claude-self-reflect" -> "claude-self-reflect"
          - "/path/to/-Users-name-projects-myapp" -> "myapp"
          """
-         # Get the final component of the path
-         if '/' in project_path:
-             final_component = project_path.split('/')[-1]
-         else:
-             final_component = project_path
+         # Use shared normalization if available
+         if shared_normalize:
+             return shared_normalize(project_path)
+
+         # Fallback implementation (matches shared module)
+         if not project_path:
+             return ""
+
+         path = Path(project_path.rstrip('/'))
+         final_component = path.name

          # Handle Claude's dash-separated format
          if final_component.startswith('-') and 'projects' in final_component:
-             # Find the last occurrence of 'projects-'
              idx = final_component.rfind('projects-')
              if idx != -1:
-                 # Extract everything after 'projects-'
                  project_name = final_component[idx + len('projects-'):]
                  logger.debug(f"Normalized '{project_path}' to '{project_name}'")
                  return project_name

          # Already normalized or different format
          logger.debug(f"Project path '{project_path}' already normalized")
-         return final_component
+         return final_component if final_component else path.parent.name

      def get_project_name(self, file_path: Path) -> str:
          """
@@ -0,0 +1,5 @@
+ """Shared utilities for claude-self-reflect."""
+
+ from .normalization import normalize_project_name
+
+ __all__ = ['normalize_project_name']
@@ -0,0 +1,54 @@
+ """Shared normalization utilities for claude-self-reflect.
+
+ This module provides the single source of truth for project name normalization,
+ ensuring consistent hashing across import scripts and the MCP server.
+ """
+
+ from pathlib import Path
+
+
+ def normalize_project_name(project_path: str, _depth: int = 0) -> str:
+     """
+     Normalize project name for consistent hashing across import/search.
+
+     This is the authoritative normalization function used by both:
+     - Import scripts (import-conversations-unified.py)
+     - MCP server (server.py)
+
+     Examples:
+         '/Users/name/.claude/projects/-Users-name-projects-myproject' -> 'myproject'
+         '-Users-name-projects-myproject' -> 'myproject'
+         '/path/to/myproject' -> 'myproject'
+         'myproject' -> 'myproject'
+
+     Special handling for Claude's dash-separated format:
+         When a path component starts with '-' and contains 'projects',
+         we extract everything after 'projects-' as the project name.
+         This handles dashes in project names correctly.
+
+     Args:
+         project_path: Project path or name in any format
+         _depth: Internal recursion depth counter (for backwards compatibility)
+
+     Returns:
+         Normalized project name suitable for consistent hashing
+     """
+     if not project_path:
+         return ""
+
+     path = Path(project_path.rstrip('/'))
+
+     # Extract the final directory name
+     final_component = path.name
+
+     # If it's Claude's dash-separated format, extract project name
+     if final_component.startswith('-') and 'projects' in final_component:
+         # Find the last occurrence of 'projects-' to handle edge cases
+         # This correctly extracts 'claude-self-reflect' from:
+         # '-Users-ramakrishnanannaswamy-projects-claude-self-reflect'
+         idx = final_component.rfind('projects-')
+         if idx != -1:
+             return final_component[idx + len('projects-'):]
+
+     # For regular paths, just return the directory name
+     return final_component if final_component else path.parent.name
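The docstring's examples can be checked directly against the module above; a quick sanity sketch (assumes `shared/` is importable, as the `sys.path.insert` hunks earlier in this diff arrange):

```python
from shared.normalization import normalize_project_name

# The examples documented in the docstring:
assert normalize_project_name('-Users-name-projects-myproject') == 'myproject'
assert normalize_project_name('/path/to/myproject') == 'myproject'
assert normalize_project_name('myproject') == 'myproject'

# Dashes inside project names survive - the failure mode the pre-3.2.4
# ProjectResolver heuristics could get wrong:
assert normalize_project_name('-Users-name-projects-claude-self-reflect') == 'claude-self-reflect'
```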