claude-self-reflect 2.4.14 → 2.5.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/agents/open-source-maintainer.md +94 -8
- package/Dockerfile.watcher +7 -0
- package/README.md +4 -0
- package/docker-compose.yaml +5 -4
- package/mcp-server/pyproject.toml +1 -1
- package/mcp-server/src/server.py +217 -0
- package/package.json +1 -1
- package/scripts/import-conversations-enhanced.py +672 -0
- package/scripts/import-conversations-unified.py +3 -1
- package/scripts/import-watcher.py +75 -20
package/.claude/agents/open-source-maintainer.md
CHANGED
@@ -175,20 +175,76 @@ safety check -r mcp-server/requirements.txt
 # For Node: npm test
 ```
 
-#### 5. Release
+#### 4.5. Create Professional Release Notes
 ```bash
+# Create release notes file
+VERSION=$(node -p "require('./package.json').version")
+cat > docs/RELEASE_NOTES_v${VERSION}.md << 'EOF'
+# Release Notes - v${VERSION}
+
+## Summary
+Brief description of what this release addresses and why it matters.
+
+## Changes
+
+### Bug Fixes
+- Fixed global npm installation failing due to Docker build context issues (#13)
+- Modified Dockerfile.importer to embed Python dependencies directly
+- Removed dependency on external requirements.txt file during build
+- Ensures compatibility with both local development and global npm installations
+
+### Technical Details
+- Files modified:
+  - `Dockerfile.importer`: Embedded Python dependencies inline
+  - Removed COPY instruction for scripts that are volume-mounted at runtime
+
+### Verification
+- Docker builds tested successfully in isolation
+- Import process verified to skip already imported files
+- Both local and global npm installation paths validated
+
+## Installation
+```bash
+npm install -g claude-self-reflect@${VERSION}
+```
+
+## Contributors
+Thank you to everyone who reported issues and helped test this release:
+- @mattias012 - Reported npm global installation issue
+- @vbp1 - Confirmed Docker setup problems
+
+## Related Issues
+- Resolves #13: Global npm installation Docker build failures
+EOF
+```
+
+#### 5. Version Bump & Release Creation
+```bash
+# Update package.json version BEFORE creating tag
+# Determine version bump type based on changes:
+# - patch: bug fixes, minor updates (2.4.10 -> 2.4.11)
+# - minor: new features, non-breaking changes (2.4.10 -> 2.5.0)
+# - major: breaking changes (2.4.10 -> 3.0.0)
+npm version patch --no-git-tag-version  # Updates package.json and package-lock.json
+
+# Commit version bump
+VERSION=$(node -p "require('./package.json').version")
+git add package.json package-lock.json
+git commit -m "chore: bump version to ${VERSION} for release"
+git push origin main
+
 # Create and push tag
-git tag -a
-git push origin
+git tag -a v${VERSION} -m "Release v${VERSION} - Brief description"
+git push origin v${VERSION}
 
 # Create GitHub release
-gh release create
---title "
---notes-file docs/
+gh release create v${VERSION} \
+  --title "v${VERSION} - Release Title" \
+  --notes-file docs/RELEASE_NOTES_v${VERSION}.md \
   --target main
 
 # Monitor the release workflow
-echo "
+echo "Release created! Monitoring automated publishing..."
 gh run list --workflow "CI/CD Pipeline" --limit 1
 gh run watch
 ```
@@ -207,7 +263,7 @@ echo "⏳ Waiting for automated npm publish..."
 # Monitor the release workflow until npm publish completes
 ```
 
-#### 7. Post-Release Verification
+#### 7. Post-Release Verification & Issue Management
 ```bash
 # Verify GitHub release
 gh release view vX.Y.Z
@@ -217,6 +273,36 @@ npm view claude-self-reflect version
 
 # Check that related PRs are closed
 gh pr list --state closed --limit 10
+
+# Handle related issues professionally
+# For each issue addressed in this release:
+ISSUE_NUMBER=13  # Example
+VERSION=$(node -p "require('./package.json').version")
+
+# Determine if issue should be closed or kept open
+# Close if: bug fixed, feature implemented, question answered
+# Keep open if: partial fix, needs more work, ongoing discussion
+
+# Professional comment template (no emojis, clear references)
+gh issue comment $ISSUE_NUMBER --body "Thank you for reporting this issue. The global npm installation problem has been addressed in release v${VERSION}.
+
+The fix involved modifying the Docker build process to embed dependencies directly:
+- Modified: Dockerfile.importer - Embedded Python dependencies to avoid file path issues
+- Verified: Docker builds work correctly without requiring scripts directory in build context
+- Tested: Import process correctly skips already imported files
+
+You can update to the latest version with:
+\`\`\`bash
+npm install -g claude-self-reflect@${VERSION}
+\`\`\`
+
+Please let us know if you encounter any issues with the new version."
+
+# Close the issue if fully resolved
+gh issue close $ISSUE_NUMBER --comment "Closing as resolved in v${VERSION}. Feel free to reopen if you encounter any related issues."
+
+# Or keep open with status update if partially resolved
+# gh issue comment $ISSUE_NUMBER --body "Partial fix implemented in v${VERSION}. Keeping this issue open to track remaining work on [specific aspect]."
 ```
 
 #### 8. Rollback Procedures
package/Dockerfile.watcher
CHANGED
@@ -20,12 +20,19 @@ RUN pip install --no-cache-dir \
 # Create non-root user
 RUN useradd -m -u 1000 watcher
 
+# Pre-download FastEmbed model to avoid runtime downloads
+RUN mkdir -p /home/watcher/.cache && \
+    FASTEMBED_CACHE_PATH=/home/watcher/.cache/fastembed python -c "from fastembed import TextEmbedding; import os; os.environ['FASTEMBED_CACHE_PATH']='/home/watcher/.cache/fastembed'; TextEmbedding('sentence-transformers/all-MiniLM-L6-v2')" && \
+    chown -R watcher:watcher /home/watcher/.cache
+
 # Create scripts directory and copy required files
 RUN mkdir -p /scripts
 
 # Copy all necessary scripts
 COPY scripts/import-conversations-unified.py /scripts/
 COPY scripts/import-watcher.py /scripts/
+COPY scripts/utils.py /scripts/
+COPY scripts/trigger-import.py /scripts/
 
 RUN chmod +x /scripts/*.py
 
package/README.md
CHANGED
@@ -203,6 +203,10 @@ Recent conversations matter more. Old ones fade. Like your brain, but reliable.
 
 Works perfectly out of the box. [Configure if you're particular](docs/memory-decay.md).
 
+## Theoretical Foundation
+
+Claude Self-Reflect addresses the "reality gap" in AI memory systems - the distance between perfect recall expectations and practical utility. Our approach aligns with the SPAR Framework (Sense, Plan, Act, Reflect) for agentic AI systems. [Learn more about our design philosophy](docs/architecture/SPAR-alignment.md).
+
 ## For the Skeptics
 
 **"Just use grep"** - Sure, enjoy your 10,000 matches for "database"
package/docker-compose.yaml
CHANGED
@@ -22,8 +22,8 @@ services:
       - QDRANT__LOG_LEVEL=INFO
       - QDRANT__SERVICE__HTTP_PORT=6333
     restart: unless-stopped
-    mem_limit: ${QDRANT_MEMORY:-
-    memswap_limit: ${QDRANT_MEMORY:-
+    mem_limit: ${QDRANT_MEMORY:-2g}
+    memswap_limit: ${QDRANT_MEMORY:-2g}
 
   # One-time import service (runs once then exits)
   importer:
@@ -66,6 +66,7 @@ services:
       - ${CLAUDE_LOGS_PATH:-~/.claude/projects}:/logs:ro
      - ${CONFIG_PATH:-~/.claude-self-reflect/config}:/config
       - ./scripts:/scripts:ro
+      - /tmp:/tmp
     environment:
       - QDRANT_URL=http://qdrant:6333
       - STATE_FILE=/config/imported-files.json
@@ -78,8 +79,8 @@ services:
       - PYTHONUNBUFFERED=1
     restart: unless-stopped
     profiles: ["watch"]
-    mem_limit:
-    memswap_limit:
+    mem_limit: 1g
+    memswap_limit: 1g
 
   # MCP server for Claude integration
   mcp-server:
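The Qdrant container's memory cap is now configurable through the QDRANT_MEMORY variable (defaulting to 2g for both mem_limit and memswap_limit). A minimal usage sketch, assuming the qdrant service defined in this compose file and a Docker Compose CLI that expands the variable:

```bash
# Raise the Qdrant memory/swap cap above the 2g default, then (re)create the service
QDRANT_MEMORY=4g docker compose up -d qdrant
```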
package/mcp-server/src/server.py
CHANGED
@@ -887,6 +887,223 @@ async def get_more_results(
     return response
 
 
+@mcp.tool()
+async def search_by_file(
+    ctx: Context,
+    file_path: str = Field(description="The file path to search for in conversations"),
+    limit: int = Field(default=10, description="Maximum number of results to return"),
+    project: Optional[str] = Field(default=None, description="Search specific project only. Use 'all' to search across all projects.")
+) -> str:
+    """Search for conversations that analyzed a specific file."""
+    global qdrant_client
+
+    # Normalize file path
+    normalized_path = file_path.replace("\\", "/").replace("/Users/", "~/")
+
+    # Determine which collections to search
+    # If no project specified, search all collections
+    collections = await get_all_collections() if not project else []
+
+    if project and project != 'all':
+        # Filter collections for specific project
+        project_hash = hashlib.md5(project.encode()).hexdigest()[:8]
+        collection_prefix = f"conv_{project_hash}_"
+        collections = [c for c in await get_all_collections() if c.startswith(collection_prefix)]
+    elif project == 'all':
+        collections = await get_all_collections()
+
+    if not collections:
+        return "<search_by_file>\n<error>No collections found to search</error>\n</search_by_file>"
+
+    # Prepare results
+    all_results = []
+
+    for collection_name in collections:
+        try:
+            # Use scroll to get all points and filter manually
+            # Qdrant's array filtering can be tricky, so we'll filter in code
+            scroll_result = await qdrant_client.scroll(
+                collection_name=collection_name,
+                limit=1000,  # Get a batch
+                with_payload=True
+            )
+
+            # Filter results that contain the file
+            for point in scroll_result[0]:
+                payload = point.payload
+                files_analyzed = payload.get('files_analyzed', [])
+                files_edited = payload.get('files_edited', [])
+
+                if normalized_path in files_analyzed or normalized_path in files_edited:
+                    all_results.append({
+                        'score': 1.0,  # File match is always 1.0
+                        'payload': payload,
+                        'collection': collection_name
+                    })
+
+        except Exception as e:
+            continue
+
+    # Sort by timestamp (newest first)
+    all_results.sort(key=lambda x: x['payload'].get('timestamp', ''), reverse=True)
+
+    # Format results
+    if not all_results:
+        return f"""<search_by_file>
+<query>{file_path}</query>
+<normalized_path>{normalized_path}</normalized_path>
+<message>No conversations found that analyzed this file</message>
+</search_by_file>"""
+
+    results_text = []
+    for i, result in enumerate(all_results[:limit]):
+        payload = result['payload']
+        timestamp = payload.get('timestamp', 'Unknown')
+        conversation_id = payload.get('conversation_id', 'Unknown')
+        project = payload.get('project', 'Unknown')
+        text_preview = payload.get('text', '')[:200] + '...' if len(payload.get('text', '')) > 200 else payload.get('text', '')
+
+        # Check if file was edited or just read
+        action = "edited" if normalized_path in payload.get('files_edited', []) else "analyzed"
+
+        # Get related tools used
+        tool_summary = payload.get('tool_summary', {})
+        tools_used = ', '.join(f"{tool}({count})" for tool, count in tool_summary.items())
+
+        results_text.append(f"""<result rank="{i+1}">
+<conversation_id>{conversation_id}</conversation_id>
+<project>{project}</project>
+<timestamp>{timestamp}</timestamp>
+<action>{action}</action>
+<tools_used>{tools_used}</tools_used>
+<preview>{text_preview}</preview>
+</result>""")
+
+    return f"""<search_by_file>
+<query>{file_path}</query>
+<normalized_path>{normalized_path}</normalized_path>
+<count>{len(all_results)}</count>
+<results>
+{''.join(results_text)}
+</results>
+</search_by_file>"""
+
+
+@mcp.tool()
+async def search_by_concept(
+    ctx: Context,
+    concept: str = Field(description="The concept to search for (e.g., 'security', 'docker', 'testing')"),
+    include_files: bool = Field(default=True, description="Include file information in results"),
+    limit: int = Field(default=10, description="Maximum number of results to return"),
+    project: Optional[str] = Field(default=None, description="Search specific project only. Use 'all' to search across all projects.")
+) -> str:
+    """Search for conversations about a specific development concept."""
+    global qdrant_client
+
+    # Generate embedding for the concept
+    embedding = await generate_embedding(concept)
+
+    # Determine which collections to search
+    # If no project specified, search all collections
+    collections = await get_all_collections() if not project else []
+
+    if project and project != 'all':
+        # Filter collections for specific project
+        project_hash = hashlib.md5(project.encode()).hexdigest()[:8]
+        collection_prefix = f"conv_{project_hash}_"
+        collections = [c for c in await get_all_collections() if c.startswith(collection_prefix)]
+    elif project == 'all':
+        collections = await get_all_collections()
+
+    if not collections:
+        return "<search_by_concept>\n<error>No collections found to search</error>\n</search_by_concept>"
+
+    # Search all collections
+    all_results = []
+
+    for collection_name in collections:
+        try:
+            # Hybrid search: semantic + concept filter
+            results = await qdrant_client.search(
+                collection_name=collection_name,
+                query_vector=embedding,
+                query_filter=models.Filter(
+                    should=[
+                        models.FieldCondition(
+                            key="concepts",
+                            match=models.MatchAny(any=[concept.lower()])
+                        )
+                    ]
+                ),
+                limit=limit * 2,  # Get more results for better filtering
+                with_payload=True
+            )
+
+            for point in results:
+                payload = point.payload
+                # Boost score if concept is in the concepts list
+                score_boost = 0.2 if concept.lower() in payload.get('concepts', []) else 0.0
+                all_results.append({
+                    'score': float(point.score) + score_boost,
+                    'payload': payload,
+                    'collection': collection_name
+                })
+
+        except Exception as e:
+            continue
+
+    # Sort by score and limit
+    all_results.sort(key=lambda x: x['score'], reverse=True)
+    all_results = all_results[:limit]
+
+    # Format results
+    if not all_results:
+        return f"""<search_by_concept>
+<concept>{concept}</concept>
+<message>No conversations found about this concept</message>
+</search_by_concept>"""
+
+    results_text = []
+    for i, result in enumerate(all_results):
+        payload = result['payload']
+        score = result['score']
+        timestamp = payload.get('timestamp', 'Unknown')
+        conversation_id = payload.get('conversation_id', 'Unknown')
+        project = payload.get('project', 'Unknown')
+        concepts = payload.get('concepts', [])
+
+        # Get text preview
+        text_preview = payload.get('text', '')[:200] + '...' if len(payload.get('text', '')) > 200 else payload.get('text', '')
+
+        # File information
+        files_info = ""
+        if include_files:
+            files_analyzed = payload.get('files_analyzed', [])[:5]
+            if files_analyzed:
+                files_info = f"\n<files_analyzed>{', '.join(files_analyzed)}</files_analyzed>"
+
+        # Related concepts
+        related_concepts = [c for c in concepts if c != concept.lower()][:5]
+
+        results_text.append(f"""<result rank="{i+1}">
+<score>{score:.3f}</score>
+<conversation_id>{conversation_id}</conversation_id>
+<project>{project}</project>
+<timestamp>{timestamp}</timestamp>
+<concepts>{', '.join(concepts)}</concepts>
+<related_concepts>{', '.join(related_concepts)}</related_concepts>{files_info}
+<preview>{text_preview}</preview>
+</result>""")
+
+    return f"""<search_by_concept>
+<concept>{concept}</concept>
+<count>{len(all_results)}</count>
+<results>
+{''.join(results_text)}
+</results>
+</search_by_concept>"""
+
+
 # Debug output
 print(f"[DEBUG] FastMCP server created with name: {mcp.name}")
 
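Both new tools resolve a project name to Qdrant collections by prefix, conv_ followed by the first 8 hex characters of the MD5 of the project name. A small sketch for inspecting which collections a project maps to, assuming md5sum and curl are available, Qdrant is on its default port, and "my-project" stands in for a real project name:

```bash
# Hypothetical helper: compute the collection prefix the new search tools derive
# from a project name (mirrors hashlib.md5(project.encode()).hexdigest()[:8])
PROJECT="my-project"
PREFIX="conv_$(printf '%s' "$PROJECT" | md5sum | cut -c1-8)_"
echo "$PREFIX"

# List matching collections on a local Qdrant instance
curl -s http://localhost:6333/collections | grep -o "${PREFIX}[a-z]*"
```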
package/scripts/import-conversations-enhanced.py
CHANGED
@@ -0,0 +1,672 @@
+#!/usr/bin/env python3
+"""
+Enhanced import script that extracts tool usage metadata from conversations.
+Supports both local and Voyage AI embeddings with tool tracking.
+"""
+
+import os
+import sys
+import json
+import glob
+import hashlib
+import gc
+import re
+import time
+from datetime import datetime, timedelta
+from typing import List, Dict, Any, Set, Tuple
+import logging
+from pathlib import Path
+
+from qdrant_client import QdrantClient
+from qdrant_client.models import (
+    VectorParams, Distance, PointStruct,
+    Filter, FieldCondition, MatchValue
+)
+
+from tenacity import (
+    retry,
+    stop_after_attempt,
+    wait_random_exponential,
+)
+
+# Configuration
+QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
+LOGS_DIR = os.getenv("LOGS_DIR", "/logs")
+STATE_FILE = os.getenv("STATE_FILE", "./config/imported-files-enhanced.json")
+BATCH_SIZE = int(os.getenv("BATCH_SIZE", "10"))
+PREFER_LOCAL_EMBEDDINGS = os.getenv("PREFER_LOCAL_EMBEDDINGS", "false").lower() == "true"
+VOYAGE_API_KEY = os.getenv("VOYAGE_KEY")
+DRY_RUN = os.getenv("DRY_RUN", "false").lower() == "true"
+
+# Set up logging
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+logger = logging.getLogger(__name__)
+
+# Import timing stats
+timing_stats = {
+    "extract": [],
+    "chunk": [],
+    "embed": [],
+    "store": [],
+    "total": []
+}
+
+def normalize_path(path: str) -> str:
+    """Normalize file paths for consistency across platforms."""
+    if not path:
+        return ""
+
+    # Remove common prefixes
+    path = path.replace("/Users/", "~/")
+    path = path.replace("\\Users\\", "~\\")
+
+    # Convert to forward slashes
+    path = path.replace("\\", "/")
+
+    # Remove duplicate slashes
+    path = re.sub(r'/+', '/', path)
+
+    return path
+
+def extract_concepts(text: str, tool_usage: Dict[str, Any]) -> Set[str]:
+    """Extract high-level concepts from conversation and tool usage."""
+    concepts = set()
+
+    # Common development concepts with patterns
+    concept_patterns = {
+        'security': r'(security|vulnerability|CVE|injection|sanitize|escape|auth|token|JWT)',
+        'performance': r'(performance|optimization|speed|memory|efficient|benchmark|latency)',
+        'testing': r'(test|pytest|unittest|coverage|TDD|spec|assert)',
+        'docker': r'(docker|container|compose|dockerfile|kubernetes|k8s)',
+        'api': r'(API|REST|GraphQL|endpoint|webhook|http|request)',
+        'database': r'(database|SQL|query|migration|schema|postgres|mysql|mongodb)',
+        'authentication': r'(auth|login|token|JWT|session|oauth|permission)',
+        'debugging': r'(debug|error|exception|traceback|log|stack|trace)',
+        'refactoring': r'(refactor|cleanup|improve|restructure|optimize|technical debt)',
+        'deployment': r'(deploy|CI/CD|release|production|staging|rollout)',
+        'git': r'(git|commit|branch|merge|pull request|PR|rebase)',
+        'architecture': r'(architecture|design|pattern|structure|component|module)',
+        'mcp': r'(MCP|claude-self-reflect|tool|agent|claude code)',
+        'embeddings': r'(embedding|vector|semantic|similarity|fastembed|voyage)',
+        'search': r'(search|query|find|filter|match|relevance)'
+    }
+
+    # Check text content
+    combined_text = text.lower()
+    for concept, pattern in concept_patterns.items():
+        if re.search(pattern, combined_text, re.IGNORECASE):
+            concepts.add(concept)
+
+    # Check tool usage patterns
+    tool_text = json.dumps(tool_usage).lower()
+    for concept, pattern in concept_patterns.items():
+        if re.search(pattern, tool_text, re.IGNORECASE):
+            concepts.add(concept)
+
+    # Add concepts based on specific tool usage
+    if tool_usage.get('grep_searches'):
+        concepts.add('search')
+    if tool_usage.get('files_edited') or tool_usage.get('files_created'):
+        concepts.add('development')
+    if any('test' in str(f).lower() for f in tool_usage.get('files_read', [])):
+        concepts.add('testing')
+    if any('docker' in str(cmd).lower() for cmd in tool_usage.get('bash_commands', [])):
+        concepts.add('docker')
+
+    return concepts
+
+def extract_tool_usage_from_jsonl(jsonl_path: str) -> Dict[str, Any]:
+    """Extract all tool usage from a conversation."""
+    tool_usage = {
+        "files_read": [],
+        "files_edited": [],
+        "files_created": [],
+        "grep_searches": [],
+        "bash_commands": [],
+        "glob_patterns": [],
+        "task_calls": [],
+        "mcp_calls": [],
+        "tools_summary": {},
+        "concepts": set(),
+        "timing": {},
+        "errors": [],
+        "tool_results": {}
+    }
+
+    start_time = time.time()
+
+    with open(jsonl_path, 'r', encoding='utf-8') as f:
+        for line_num, line in enumerate(f, 1):
+            line = line.strip()
+            if not line:
+                continue
+
+            try:
+                data = json.loads(line)
+
+                # Skip API error messages
+                if data.get('isApiErrorMessage'):
+                    continue
+
+                # Process message content
+                if 'message' in data and 'content' in data['message']:
+                    content = data['message']['content']
+
+                    # Handle content array (where tool_use lives)
+                    if isinstance(content, list):
+                        for item in content:
+                            if isinstance(item, dict) and item.get('type') == 'tool_use':
+                                extract_single_tool_use(item, tool_usage)
+
+            except json.JSONDecodeError as e:
+                logger.debug(f"Skipping invalid JSON at line {line_num}: {e}")
+            except Exception as e:
+                logger.error(f"Error processing line {line_num}: {e}")
+                tool_usage["errors"].append({"line": line_num, "error": str(e)})
+
+    # Calculate timing
+    tool_usage["timing"]["extract_ms"] = int((time.time() - start_time) * 1000)
+
+    # Convert sets to lists for JSON serialization
+    tool_usage["concepts"] = list(tool_usage["concepts"])
+
+    return tool_usage
+
+def extract_single_tool_use(tool_data: Dict[str, Any], usage_dict: Dict[str, Any]) -> None:
+    """Parse individual tool usage with enhanced metadata extraction."""
+    tool_name = tool_data.get('name')
+    inputs = tool_data.get('input', {})
+    tool_id = tool_data.get('id')
+
+    # Track tool frequency
+    usage_dict['tools_summary'][tool_name] = usage_dict['tools_summary'].get(tool_name, 0) + 1
+
+    # Extract based on tool type
+    if tool_name == 'Read':
+        path = inputs.get('file_path')
+        if path:
+            usage_dict['files_read'].append({
+                'path': normalize_path(path),
+                'offset': inputs.get('offset', 0),
+                'limit': inputs.get('limit', -1),
+                'tool_id': tool_id
+            })
+
+    elif tool_name == 'Grep':
+        pattern = inputs.get('pattern')
+        if pattern:
+            usage_dict['grep_searches'].append({
+                'pattern': pattern[:100],  # Limit pattern length
+                'path': normalize_path(inputs.get('path', '.')),
+                'glob': inputs.get('glob'),
+                'output_mode': inputs.get('output_mode', 'files_with_matches'),
+                'case_insensitive': inputs.get('-i', False)
+            })
+            # Add search concept
+            usage_dict['concepts'].add('search')
+
+    elif tool_name == 'Edit' or tool_name == 'MultiEdit':
+        path = inputs.get('file_path')
+        if path:
+            usage_dict['files_edited'].append({
+                'path': normalize_path(path),
+                'operation': tool_name.lower()
+            })
+
+    elif tool_name == 'Write':
+        path = inputs.get('file_path')
+        if path:
+            usage_dict['files_created'].append(normalize_path(path))
+
+    elif tool_name == 'Bash':
+        cmd = inputs.get('command', '')
+        if cmd:
+            # Extract command name
+            cmd_parts = cmd.split()
+            cmd_name = cmd_parts[0] if cmd_parts else 'unknown'
+
+            usage_dict['bash_commands'].append({
+                'command': cmd_name,
+                'description': inputs.get('description', '')[:100]
+            })
+
+            # Add concepts based on commands
+            if 'docker' in cmd.lower():
+                usage_dict['concepts'].add('docker')
+            if 'git' in cmd.lower():
+                usage_dict['concepts'].add('git')
+            if 'test' in cmd.lower() or 'pytest' in cmd.lower():
+                usage_dict['concepts'].add('testing')
+
+    elif tool_name == 'Glob':
+        pattern = inputs.get('pattern')
+        if pattern:
+            usage_dict['glob_patterns'].append({
+                'pattern': pattern,
+                'path': normalize_path(inputs.get('path', '.'))
+            })
+
+    elif tool_name == 'Task':
+        usage_dict['task_calls'].append({
+            'description': inputs.get('description', '')[:100],
+            'subagent_type': inputs.get('subagent_type')
+        })
+
+    # Handle MCP tools
+    elif tool_name and tool_name.startswith('mcp__'):
+        usage_dict['mcp_calls'].append({
+            'tool': tool_name,
+            'params': list(inputs.keys()) if inputs else []
+        })
+        usage_dict['concepts'].add('mcp')
+
+def create_enhanced_chunk(messages: List[Dict], chunk_index: int, tool_usage: Dict[str, Any],
+                          conversation_metadata: Dict[str, Any]) -> Dict[str, Any]:
+    """Create chunk with tool usage metadata."""
+    # Extract text from messages
+    chunk_text = "\n\n".join([
+        f"{msg['role'].upper()}: {msg['content']}"
+        for msg in messages
+    ])
+
+    # Extract concepts from chunk text and tool usage
+    concepts = extract_concepts(chunk_text, tool_usage)
+
+    # Deduplicate and clean file paths
+    all_file_items = tool_usage.get('files_read', []) + tool_usage.get('files_edited', [])
+    files_analyzed = list(set([
+        item['path'] if isinstance(item, dict) else item
+        for item in all_file_items
+        if (isinstance(item, dict) and item.get('path')) or isinstance(item, str)
+    ]))[:20]  # Limit to 20 files
+
+    files_edited = list(set([
+        item['path'] if isinstance(item, dict) else item
+        for item in tool_usage.get('files_edited', [])
+        if (isinstance(item, dict) and item.get('path')) or isinstance(item, str)
+    ]))[:10]  # Limit to 10 files
+
+    # Build enhanced chunk
+    chunk = {
+        "text": chunk_text,
+        "conversation_id": conversation_metadata['id'],
+        "chunk_index": chunk_index,
+        "timestamp": conversation_metadata['timestamp'],
+        "project": conversation_metadata['project'],
+        "start_role": messages[0]['role'] if messages else 'unknown',
+
+        # Tool usage metadata
+        "files_analyzed": files_analyzed,
+        "files_edited": files_edited,
+        "search_patterns": [s['pattern'] for s in tool_usage.get('grep_searches', [])][:10],
+        "concepts": list(concepts)[:15],
+        "tool_summary": dict(list(tool_usage.get('tools_summary', {}).items())[:10]),
+        "analysis_only": len(tool_usage.get('files_edited', [])) == 0 and len(tool_usage.get('files_created', [])) == 0,
+
+        # Additional context
+        "commands_used": list(set([c['command'] for c in tool_usage.get('bash_commands', [])]))[:10],
+        "has_security_check": 'security' in concepts,
+        "has_performance_check": 'performance' in concepts,
+        "mcp_tools_used": list(set([m['tool'].split('__')[1] if '__' in m['tool'] else m['tool']
+                                    for m in tool_usage.get('mcp_calls', [])]))[:5]
+    }
+
+    return chunk
+
+# Import state management functions (same as original)
+def load_state():
+    """Load the import state from file."""
+    if os.path.exists(STATE_FILE):
+        try:
+            with open(STATE_FILE, 'r') as f:
+                state = json.load(f)
+                if "imported_files" not in state:
+                    state["imported_files"] = {}
+                return state
+        except Exception as e:
+            logger.warning(f"Failed to load state file: {e}")
+    return {"imported_files": {}}
+
+def save_state(state):
+    """Save the import state to file."""
+    try:
+        os.makedirs(os.path.dirname(STATE_FILE), exist_ok=True)
+        temp_file = STATE_FILE + ".tmp"
+        with open(temp_file, 'w') as f:
+            json.dump(state, f, indent=2)
+        os.replace(temp_file, STATE_FILE)
+        logger.debug(f"Saved state with {len(state['imported_files'])} files")
+    except Exception as e:
+        logger.error(f"Failed to save state file: {e}")
+
+def should_import_file(file_path, state):
+    """Check if a file should be imported based on modification time."""
+    str_path = str(file_path)
+    file_mtime = os.path.getmtime(file_path)
+
+    if str_path in state["imported_files"]:
+        last_imported = state["imported_files"][str_path].get("last_imported", 0)
+        last_modified = state["imported_files"][str_path].get("last_modified", 0)
+
+        if file_mtime <= last_modified and last_imported > 0:
+            logger.info(f"Skipping unchanged file: {file_path.name}")
+            return False
+
+    return True
+
+def update_file_state(file_path, state, chunks_imported, tool_stats=None):
+    """Update the state for an imported file with tool usage stats."""
+    str_path = str(file_path)
+    state["imported_files"][str_path] = {
+        "last_modified": os.path.getmtime(file_path),
+        "last_imported": datetime.now().timestamp(),
+        "chunks_imported": chunks_imported,
+        "tool_stats": tool_stats or {}
+    }
+
+# Initialize embedding provider
+embedding_provider = None
+embedding_dimension = None
+collection_suffix = None
+
+if PREFER_LOCAL_EMBEDDINGS or not VOYAGE_API_KEY:
+    logger.info("Using local FastEmbed embeddings")
+    from fastembed import TextEmbedding
+    embedding_provider = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
+    embedding_dimension = 384
+    collection_suffix = "_local"
+else:
+    logger.info("Using Voyage AI embeddings")
+    import voyageai
+    vo = voyageai.Client(api_key=VOYAGE_API_KEY)
+    embedding_provider = vo
+    embedding_dimension = 1024
+    collection_suffix = "_voyage"
+
+# Initialize Qdrant client
+client = QdrantClient(url=QDRANT_URL)
+
+def chunk_conversation(messages: List[Dict], chunk_size: int = 10) -> List[Dict]:
+    """Split conversation into chunks of messages."""
+    chunks = []
+    for i in range(0, len(messages), chunk_size):
+        chunk_messages = messages[i:i + chunk_size]
+        chunks.append({
+            "messages": chunk_messages,
+            "chunk_index": i // chunk_size
+        })
+    return chunks
+
+@retry(stop=stop_after_attempt(3), wait=wait_random_exponential(min=1, max=20))
+def generate_embeddings(texts: List[str]) -> List[List[float]]:
+    """Generate embeddings for texts with retry logic."""
+    if PREFER_LOCAL_EMBEDDINGS or not VOYAGE_API_KEY:
+        embeddings = list(embedding_provider.embed(texts))
+        return [emb.tolist() if hasattr(emb, 'tolist') else emb for emb in embeddings]
+    else:
+        result = embedding_provider.embed(texts, model="voyage-3", input_type="document")
+        return result.embeddings
+
+def import_project(project_path: Path, state: Dict) -> int:
+    """Import conversations from a single project with tool usage extraction."""
+    total_chunks = 0
+    jsonl_files = list(project_path.glob("*.jsonl"))
+
+    if not jsonl_files:
+        return 0
+
+    # Create or verify collection
+    collection_name = f"conv_{hashlib.md5(project_path.name.encode()).hexdigest()[:8]}{collection_suffix}"
+
+    try:
+        collections = [c.name for c in client.get_collections().collections]
+        if collection_name not in collections:
+            client.create_collection(
+                collection_name=collection_name,
+                vectors_config=VectorParams(size=embedding_dimension, distance=Distance.COSINE)
+            )
+            logger.info(f"Created collection: {collection_name}")
+    except Exception as e:
+        logger.error(f"Failed to create/verify collection {collection_name}: {e}")
+        return 0
+
+    for jsonl_file in jsonl_files:
+        if not should_import_file(jsonl_file, state):
+            continue
+
+        logger.info(f"Processing file: {jsonl_file.name}")
+
+        try:
+            file_start_time = time.time()
+
+            # Extract tool usage
+            extract_start = time.time()
+            tool_usage = extract_tool_usage_from_jsonl(str(jsonl_file))
+            extract_time = time.time() - extract_start
+            timing_stats["extract"].append(extract_time)
+
+            # Read and process messages (original logic)
+            messages = []
+            created_at = None
+
+            with open(jsonl_file, 'r', encoding='utf-8') as f:
+                for line_num, line in enumerate(f, 1):
+                    line = line.strip()
+                    if not line:
+                        continue
+
+                    try:
+                        data = json.loads(line)
+
+                        if created_at is None and 'timestamp' in data:
+                            created_at = data.get('timestamp')
+
+                        if data.get('type') == 'summary':
+                            continue
+
+                        if 'message' in data and data['message']:
+                            msg = data['message']
+                            if msg.get('role') and msg.get('content'):
+                                content = msg['content']
+                                if isinstance(content, list):
+                                    text_parts = []
+                                    for item in content:
+                                        if isinstance(item, dict) and item.get('type') == 'text':
+                                            text_parts.append(item.get('text', ''))
+                                        elif isinstance(item, str):
+                                            text_parts.append(item)
+                                    content = '\n'.join(text_parts)
+
+                                if content:
+                                    messages.append({
+                                        'role': msg['role'],
+                                        'content': content
+                                    })
+                    except Exception as e:
+                        logger.error(f"Error processing line {line_num}: {e}")
+
+            if not messages:
+                continue
+
+            # Prepare metadata
+            if created_at is None:
+                created_at = datetime.now().isoformat()
+            conversation_id = jsonl_file.stem
+
+            conversation_metadata = {
+                'id': conversation_id,
+                'timestamp': created_at,
+                'project': project_path.name
+            }
+
+            # Chunk the conversation
+            chunk_start = time.time()
+            chunks_data = chunk_conversation(messages)
+            enhanced_chunks = []
+
+            for chunk_data in chunks_data:
+                enhanced_chunk = create_enhanced_chunk(
+                    chunk_data["messages"],
+                    chunk_data["chunk_index"],
+                    tool_usage,
+                    conversation_metadata
+                )
+                enhanced_chunks.append(enhanced_chunk)
+
+            chunk_time = time.time() - chunk_start
+            timing_stats["chunk"].append(chunk_time)
+
+            if not enhanced_chunks:
+                continue
+
+            # Process in batches
+            for batch_start in range(0, len(enhanced_chunks), BATCH_SIZE):
+                batch = enhanced_chunks[batch_start:batch_start + BATCH_SIZE]
+                texts = [chunk["text"] for chunk in batch]
+
+                # Generate embeddings
+                embed_start = time.time()
+                embeddings = generate_embeddings(texts)
+                embed_time = time.time() - embed_start
+                timing_stats["embed"].append(embed_time)
+
+                # Create points
+                points = []
+                for chunk, embedding in zip(batch, embeddings):
+                    point_id = hashlib.md5(
+                        f"{conversation_id}_{chunk['chunk_index']}".encode()
+                    ).hexdigest()[:16]
+
+                    points.append(PointStruct(
+                        id=int(point_id, 16) % (2**63),
+                        vector=embedding,
+                        payload=chunk
+                    ))
+
+                # Upload to Qdrant (unless dry run)
+                if not DRY_RUN:
+                    store_start = time.time()
+                    client.upsert(
+                        collection_name=collection_name,
+                        points=points
+                    )
+                    store_time = time.time() - store_start
+                    timing_stats["store"].append(store_time)
+                else:
+                    logger.info(f"[DRY RUN] Would upload {len(points)} points to {collection_name}")
+
+                total_chunks += len(points)
+
+            file_chunks = len(enhanced_chunks)
+            total_time = time.time() - file_start_time
+            timing_stats["total"].append(total_time)
+
+            logger.info(f"Imported {file_chunks} chunks from {jsonl_file.name} "
+                        f"(extract: {extract_time:.2f}s, chunk: {chunk_time:.2f}s, total: {total_time:.2f}s)")
+
+            # Update state with tool stats
+            tool_stats = {
+                "tools_used": list(tool_usage['tools_summary'].keys()),
+                "files_analyzed": len(enhanced_chunks[0].get('files_analyzed', [])) if enhanced_chunks else 0,
+                "concepts": list(tool_usage.get('concepts', []))[:10]
+            }
+            update_file_state(jsonl_file, state, file_chunks, tool_stats)
+
+            # Save state after each file
+            if not DRY_RUN:
+                save_state(state)
+
+            gc.collect()
+
+        except Exception as e:
+            logger.error(f"Failed to import {jsonl_file}: {e}")
+            import traceback
+            logger.error(traceback.format_exc())
+
+    return total_chunks
+
+def main():
+    """Main import function with enhanced features."""
+    import argparse
+
+    parser = argparse.ArgumentParser(description='Import conversations with tool usage extraction')
+    parser.add_argument('--days', type=int, help='Import only files from last N days')
+    parser.add_argument('--limit', type=int, help='Limit number of files to import')
+    parser.add_argument('--dry-run', action='store_true', help='Run without actually importing')
+    parser.add_argument('--project', type=str, help='Import only specific project')
+
+    args = parser.parse_args()
+
+    if args.dry_run:
+        global DRY_RUN
+        DRY_RUN = True
+        logger.info("Running in DRY RUN mode - no data will be imported")
+
+    logs_path = Path(LOGS_DIR)
+
+    # Handle local development vs Docker paths
+    if not logs_path.exists():
+        # Try local development path
+        home_logs = Path.home() / '.claude' / 'projects'
+        if home_logs.exists():
+            logs_path = home_logs
+            logger.info(f"Using local logs directory: {logs_path}")
+        else:
+            logger.error(f"Logs directory not found: {LOGS_DIR}")
+            return
+
+    # Load existing state
+    state = load_state()
+    logger.info(f"Loaded state with {len(state['imported_files'])} previously imported files")
+
+    # Find project directories
+    if args.project:
+        project_dirs = [d for d in logs_path.iterdir() if d.is_dir() and args.project in d.name]
+    else:
+        project_dirs = [d for d in logs_path.iterdir() if d.is_dir()]
+
+    if not project_dirs:
+        logger.warning("No project directories found")
+        return
+
+    # Filter by date if specified
+    if args.days:
+        cutoff_date = datetime.now() - timedelta(days=args.days)
+        filtered_dirs = []
+        for project_dir in project_dirs:
+            jsonl_files = list(project_dir.glob("*.jsonl"))
+            recent_files = [f for f in jsonl_files if datetime.fromtimestamp(f.stat().st_mtime) > cutoff_date]
+            if recent_files:
+                filtered_dirs.append(project_dir)
+        project_dirs = filtered_dirs
+        logger.info(f"Filtered to {len(project_dirs)} projects with files from last {args.days} days")
+
+    # Apply limit if specified
+    if args.limit:
+        project_dirs = project_dirs[:args.limit]
+
+    logger.info(f"Found {len(project_dirs)} projects to import")
+
+    # Import each project
+    total_imported = 0
+    for project_dir in project_dirs:
+        logger.info(f"Importing project: {project_dir.name}")
+        chunks = import_project(project_dir, state)
+        total_imported += chunks
+
+    # Print timing statistics
+    logger.info("\n=== Import Performance Summary ===")
+    logger.info(f"Total chunks imported: {total_imported}")
+
+    if timing_stats["total"]:
+        logger.info(f"\nTiming averages:")
+        logger.info(f"  Extract: {sum(timing_stats['extract'])/len(timing_stats['extract']):.2f}s")
+        logger.info(f"  Chunk: {sum(timing_stats['chunk'])/len(timing_stats['chunk']):.2f}s")
+        if timing_stats['embed']:
+            logger.info(f"  Embed: {sum(timing_stats['embed'])/len(timing_stats['embed']):.2f}s")
+        if timing_stats['store']:
+            logger.info(f"  Store: {sum(timing_stats['store'])/len(timing_stats['store']):.2f}s")
+        logger.info(f"  Total: {sum(timing_stats['total'])/len(timing_stats['total']):.2f}s per file")
+
+if __name__ == "__main__":
+    main()
package/scripts/import-conversations-unified.py
CHANGED
@@ -33,7 +33,9 @@ from tenacity import (
 # Configuration
 QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
 LOGS_DIR = os.getenv("LOGS_DIR", "/logs")
-
+# Default to project config directory for state file
+default_state_file = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "config", "imported-files.json")
+STATE_FILE = os.getenv("STATE_FILE", default_state_file)
 BATCH_SIZE = int(os.getenv("BATCH_SIZE", "10"))  # Reduced from 100 to prevent OOM
 PREFER_LOCAL_EMBEDDINGS = os.getenv("PREFER_LOCAL_EMBEDDINGS", "false").lower() == "true"
 VOYAGE_API_KEY = os.getenv("VOYAGE_KEY")
package/scripts/import-watcher.py
CHANGED
@@ -1,33 +1,88 @@
 #!/usr/bin/env python3
-"""
+"""Enhanced watcher that runs import periodically and supports manual triggers."""
 
 import time
 import subprocess
 import os
 import sys
 from datetime import datetime
+from pathlib import Path
 
 WATCH_INTERVAL = int(os.getenv('WATCH_INTERVAL', '60'))
+SIGNAL_FILE = Path("/tmp/claude-self-reflect-import-current")
+CHECK_INTERVAL = 1  # Check for signal file every second
 
-print(f"[Watcher] Starting import watcher with {WATCH_INTERVAL}s interval", flush=True)
+print(f"[Watcher] Starting enhanced import watcher with {WATCH_INTERVAL}s interval", flush=True)
+print(f"[Watcher] Monitoring signal file: {SIGNAL_FILE}", flush=True)
+
+last_import = 0
 
 while True:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    current_time = time.time()
+
+    # Check for manual trigger signal
+    if SIGNAL_FILE.exists():
+        print(f"[Watcher] Signal detected! Running immediate import...", flush=True)
+        try:
+            # Read conversation ID if provided
+            conversation_id = None
+            try:
+                conversation_id = SIGNAL_FILE.read_text().strip()
+            except:
+                pass
+
+            # Remove signal file to prevent re-triggering
+            SIGNAL_FILE.unlink()
+
+            # Run import with special flag for current conversation only
+            cmd = [sys.executable, "/scripts/import-conversations-unified.py"]
+            if conversation_id:
+                cmd.extend(["--conversation-id", conversation_id])
+            else:
+                # Import only today's conversations for manual trigger
+                cmd.extend(["--days", "1"])
+
+            # Write progress indicator
+            progress_file = Path("/tmp/claude-self-reflect-import-progress")
+            progress_file.write_text("🔄 Starting import...")
+
+            print(f"[Watcher] Running command: {' '.join(cmd)}", flush=True)
+            result = subprocess.run(cmd, capture_output=True, text=True)
+
+            if result.returncode == 0:
+                print(f"[Watcher] Manual import completed successfully", flush=True)
+                # Create completion signal
+                Path("/tmp/claude-self-reflect-import-complete").touch()
+            else:
+                print(f"[Watcher] Manual import failed with code {result.returncode}", flush=True)
+                if result.stderr:
+                    print(f"[Watcher] Error: {result.stderr}", flush=True)
+
+            last_import = current_time
+
+        except Exception as e:
+            print(f"[Watcher] Error during manual import: {e}", flush=True)
+
+    # Regular scheduled import
+    elif current_time - last_import >= WATCH_INTERVAL:
+        try:
+            print(f"[Watcher] Running scheduled import at {datetime.now().isoformat()}", flush=True)
+            result = subprocess.run([
+                sys.executable,
+                "/scripts/import-conversations-unified.py"
+            ], capture_output=True, text=True)
+
+            if result.returncode == 0:
+                print(f"[Watcher] Scheduled import completed successfully", flush=True)
+            else:
+                print(f"[Watcher] Scheduled import failed with code {result.returncode}", flush=True)
+                if result.stderr:
+                    print(f"[Watcher] Error: {result.stderr}", flush=True)
+
+            last_import = current_time
+
+        except Exception as e:
+            print(f"[Watcher] Error during scheduled import: {e}", flush=True)
 
-
-time.sleep(
+    # Short sleep to check for signals frequently
+    time.sleep(CHECK_INTERVAL)