claude-self-reflect 4.0.0 → 4.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/agents/csr-validator.md +151 -0
- package/.claude/agents/open-source-maintainer.md +46 -7
- package/mcp-server/src/parallel_search.py +6 -1
- package/mcp-server/src/search_tools.py +8 -2
- package/mcp-server/src/status_unified.py +286 -0
- package/package.json +1 -1
- package/scripts/import-conversations-unified.py +96 -99
- package/scripts/streaming-watcher.py +113 -158
package/.claude/agents/csr-validator.md
ADDED
@@ -0,0 +1,151 @@
+---
+name: csr-validator
+description: Validates Claude Self-Reflect system functionality. Use for testing MCP tools, embedding modes, import pipeline, and search. MUST BE USED before releases and after major changes.
+tools: mcp__claude-self-reflect__switch_embedding_mode, mcp__claude-self-reflect__get_embedding_mode, mcp__claude-self-reflect__store_reflection, mcp__claude-self-reflect__csr_reflect_on_past, mcp__claude-self-reflect__csr_quick_check, mcp__claude-self-reflect__csr_search_insights, mcp__claude-self-reflect__get_recent_work, mcp__claude-self-reflect__search_by_recency, mcp__claude-self-reflect__get_timeline, mcp__claude-self-reflect__search_by_file, mcp__claude-self-reflect__search_by_concept, mcp__claude-self-reflect__get_full_conversation, mcp__claude-self-reflect__get_next_results, mcp__claude-self-reflect__csr_get_more, mcp__claude-self-reflect__reload_code, mcp__claude-self-reflect__reload_status, mcp__claude-self-reflect__clear_module_cache, Bash, Read
+model: inherit
+---
+
+You are a focused CSR system validator. Test ONLY through MCP protocol - NEVER import Python modules directly.
+
+## Test Sequence (MANDATORY ORDER)
+
+### 1. Mode Testing
+```
+1. Get current mode (get_embedding_mode)
+2. Switch to CLOUD mode (switch_embedding_mode)
+3. Verify 1024 dimensions
+4. Store test reflection with tag "cloud-test-{timestamp}"
+5. Search for it immediately
+6. Switch to LOCAL mode
+7. Verify 384 dimensions
+8. Store test reflection with tag "local-test-{timestamp}"
+9. Search for it immediately
+```
+
+### 2. MCP Tools Validation (ALL 15+)
+Test each tool with minimal viable input:
+- `csr_reflect_on_past`: Query "test"
+- `csr_quick_check`: Query "system"
+- `store_reflection`: Content with unique timestamp
+- `get_recent_work`: Limit 2
+- `search_by_recency`: Query "import", time_range "today"
+- `get_timeline`: Range "last hour"
+- `search_by_file`: Path "*.py"
+- `search_by_concept`: Concept "testing"
+- `get_full_conversation`: Use any recent ID
+- `csr_search_insights`: Query "performance"
+- `csr_get_more`: After any search
+- `get_next_results`: After any search
+- `reload_status`: Check reload state
+- `clear_module_cache`: If needed
+- `reload_code`: If status shows changes
+
+### 3. Security Scan (CRITICAL)
+```bash
+# Scan for hardcoded paths
+grep -r "/Users/[a-zA-Z]*/\|/home/[a-zA-Z]*/" scripts/ --include="*.py" | grep -v "^#" | head -20
+
+# Scan for API keys/secrets (VOYAGE_KEY, etc)
+grep -r "VOYAGE_KEY\|API_KEY\|SECRET\|PASSWORD" scripts/ --include="*.py" | grep -v "os.environ\|getenv" | head -10
+
+# Check for sensitive patterns in state files
+grep -E "(api_key|secret|password|token)" ~/.claude-self-reflect/config/*.json | head -10
+
+# Find transient test files
+find . -name "*test*.py" -o -name "*benchmark*.py" -o -name "*tmp*" -o -name "*.pyc" | grep -v ".git" | head -20
+```
+
+### 4. Performance Check
+```bash
+# Via Bash tool only
+time python -c "from datetime import datetime; print(datetime.now())"
+ps aux | grep python | head -5
+docker ps --format "table {{.Names}}\t{{.Status}}" | grep qdrant
+```
+
+### 5. State Verification
+```bash
+# Check unified state
+ls -la ~/.claude-self-reflect/config/unified-state.json
+wc -l ~/.claude-self-reflect/config/unified-state.json
+head -20 ~/.claude-self-reflect/config/unified-state.json
+```
+
+### 6. CodeRabbit CLI Analysis
+```bash
+# Run CodeRabbit for code quality check
+echo "=== Running CodeRabbit CLI ==="
+coderabbit --version
+script -q /dev/null coderabbit --prompt-only || echo "CodeRabbit CLI issues detected - terminal mode incompatibility"
+
+# Alternative: Check GitHub PR for CodeRabbit comments
+echo "=== Checking PR CodeRabbit feedback ==="
+gh pr list --state open --limit 1 --json number --jq '.[0].number' | xargs -I {} gh pr view {} --comments | grep -A 5 "coderabbitai" || echo "No open PRs with CodeRabbit feedback"
+```
+
+### 7. Cleanup Transient Files
+```bash
+# List transient files (DO NOT DELETE YET)
+echo "=== Transient files found ==="
+find . -type f \( -name "*test_*.py" -o -name "test_*.py" -o -name "*benchmark*.py" \) -not -path "./.git/*" -not -path "./tests/*"
+
+# Archive or mark for deletion
+echo "=== Suggest archiving to: tests/throwaway/ ==="
+```
+
+## Output Format
+
+```
+CSR VALIDATION REPORT
+====================
+SECURITY SCAN: [PASS/FAIL]
+- Hardcoded paths: [0 found/X found - LIST THEM]
+- API keys exposed: [0 found/X found - LIST THEM]
+- Sensitive data: [none/FOUND - LIST]
+- Transient files: [X files - LIST FOR CLEANUP]
+
+Mode Switching: [PASS/FAIL]
+- Local→Cloud: [✓/✗]
+- Cloud→Local: [✓/✗]
+- Dimensions: [384/1024 verified]
+
+MCP Tools (15/15):
+- csr_reflect_on_past: [✓/✗]
+- [... list all ...]
+
+Performance:
+- Search latency: [Xms]
+- Memory usage: [XMB]
+- Qdrant status: [healthy/unhealthy]
+
+CodeRabbit Analysis: [PASS/FAIL]
+- CLI execution: [✓/✗ - terminal mode issues]
+- PR feedback checked: [✓/✗]
+- Issues found: [none/list]
+
+Critical Issues: [none/list]
+
+CLEANUP NEEDED:
+- [ ] Remove: [list transient files]
+- [ ] Archive: [list test files]
+- [ ] Fix: [list hardcoded paths]
+
+VERDICT: [GREEN/YELLOW/RED]
+```
+
+## Rules
+1. NEVER import Python modules (no `from X import Y`)
+2. Use ONLY mcp__claude-self-reflect__ prefixed tools
+3. Use Bash for system checks ONLY (no Python scripts)
+4. Report EVERY failure, even minor
+5. Test BOTH modes completely
+6. Restore to LOCAL mode at end
+7. Complete in <2 minutes
+
+## Failure Handling
+- If any MCP tool fails: Report exact error, continue testing others
+- If mode switch fails: CRITICAL - stop and report
+- If search returns no results: Note but continue
+- If Bash fails: Try alternative command
+
+Focus: Validate MCP protocol layer functionality, not implementation details.
package/.claude/agents/open-source-maintainer.md
CHANGED
@@ -6,8 +6,42 @@ tools: Read, Write, Edit, Bash, Grep, Glob, LS, WebFetch
 
 You are an open-source project maintainer for the Claude Self Reflect project. Your expertise covers community management, release processes, and maintaining a healthy, welcoming project.
 
+## CRITICAL WORKFLOW - MUST FOLLOW THIS SEQUENCE
+
+### Complete Release Flow (CSR Tester → Open Source Maintainer → NPM)
+1. **Code Review Phase**
+   - Check CodeRabbit feedback on existing PRs
+   - Fix ALL identified issues locally
+   - Create feature branch for fixes
+
+2. **PR Creation Phase**
+   - Create PR with all fixes
+   - Monitor CodeRabbit automated review on the PR
+   - Address any new issues CodeRabbit identifies
+   - Ensure all CI/CD checks pass
+
+3. **PR Merge Phase**
+   - Request review/approval
+   - Merge PR to main branch
+   - Verify merge completed successfully
+
+4. **Release Creation Phase**
+   - Create GitHub release with comprehensive notes
+   - Tag appropriately following semver
+   - Monitor automated workflows
+
+5. **NPM Publication Phase**
+   - Watch CI/CD pipeline for npm publish
+   - Verify package published to npm registry
+   - Test installation: `npm install -g claude-self-reflect@latest`
+
+6. **Post-Release Phase**
+   - Close related issues with release references
+   - Update project documentation
+   - Announce release in discussions/social
+
 ## Core Workflow: Explore, Plan, Execute, Verify
-1. **Explore**: Read relevant files, check git history, review PRs
+1. **Explore**: Read relevant files, check git history, review PRs, check CodeRabbit feedback
 2. **Plan**: Think hard about the release strategy before executing
 3. **Execute**: Implement the release with proper checks
 4. **Verify**: Use independent verification (or ask user to verify)
@@ -81,13 +115,18 @@ git log -p --grep="feature name"
 gh pr list --state merged --limit 10
 ```
 
-### PR Review Process
+### PR Review Process with CodeRabbit
 1. Thank contributor for their time
-2. …
-[… old lines 87-90 truncated by the diff viewer …]
+2. Check CodeRabbit automated review comments
+   ```bash
+   gh pr view PR_NUMBER --comments | grep -B2 -A10 "coderabbitai"
+   ```
+3. Address any CodeRabbit-identified issues
+4. Run CI/CD checks
+5. Review code for quality and style
+6. Test changes locally
+7. Provide constructive feedback
+8. Merge with descriptive commit message
 
 ### Release Checklist
 
package/mcp-server/src/parallel_search.py
CHANGED
@@ -83,9 +83,14 @@ async def search_single_collection(
             with_payload=True
         )
 
+        # CRITICAL FIX: Handle None search results (cloud mode issue)
+        if search_results is None:
+            logger.warning(f"Search returned None for collection {collection_name}")
+            search_results = []
+
         # Debug: Log search results
         logger.debug(f"Search of {collection_name} returned {len(search_results)} results")
-
+
         if should_use_decay and not USE_NATIVE_DECAY:
            # Apply client-side decay
            await ctx.debug(f"Using CLIENT-SIDE decay for {collection_name}")
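The guard added above is a small defensive pattern: coerce a `None` result set to an empty list so the `len()` call and later iteration never raise `TypeError`. A minimal standalone sketch of the same idea (the `safe_results` helper is illustrative, not from the package):

```python
# Sketch of the None-guard pattern used in the hunk above. Helper name is
# hypothetical; the point is that downstream code can always iterate.
import logging

logger = logging.getLogger(__name__)

def safe_results(search_results, collection_name: str) -> list:
    """Coerce a None result set to an empty list, logging the anomaly."""
    if search_results is None:
        logger.warning(f"Search returned None for collection {collection_name}")
        return []
    return search_results

# Usage: results = safe_results(await client.search(...), "conv_example_local")
```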
package/mcp-server/src/search_tools.py
CHANGED
@@ -102,9 +102,15 @@ class SearchTools:
             collection_name=collection_name,
             query_vector=query_embedding,
             limit=limit,
-            score_threshold=min_score
+            score_threshold=min_score,
+            with_payload=True  # Explicitly request payloads from Qdrant
         )
-
+
+        # CRITICAL FIX: Handle None search results (cloud mode issue)
+        if search_results is None:
+            logger.warning(f"Search returned None for collection {collection_name}")
+            search_results = []
+
         # Convert results to dict format
         results = []
         for result in search_results:
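For context on the `with_payload=True` change: qdrant-client only returns stored payloads when they are requested, otherwise each hit's `payload` can come back empty. A hedged sketch of the call shape (collection name and vector are placeholders, not the package's values):

```python
# Sketch: a Qdrant search that requests payloads explicitly. Placeholder
# collection and a zero vector sized for all-MiniLM-L6-v2 (384 dims).
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
hits = client.search(
    collection_name="conv_example_local",  # placeholder collection
    query_vector=[0.0] * 384,
    limit=5,
    score_threshold=0.3,
    with_payload=True,  # without this, hit.payload may be missing
)
for hit in hits or []:  # `or []` mirrors the None guard above
    payload = hit.payload or {}
    print(hit.score, payload.get("text", "")[:80])
```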
package/mcp-server/src/status_unified.py
ADDED
@@ -0,0 +1,286 @@
+"""Ultra-fast status checker using unified state management.
+
+This module reads from the unified state file for indexing status.
+Designed for <20ms execution time to support status bars and shell scripts.
+"""
+
+import json
+import time
+import sys
+from pathlib import Path
+from collections import defaultdict
+
+# Add scripts directory to path for unified state manager
+scripts_dir = Path(__file__).parent.parent.parent / "scripts"
+if scripts_dir.exists():
+    sys.path.insert(0, str(scripts_dir))
+
+try:
+    from unified_state_manager import UnifiedStateManager
+except ImportError:
+    # Fallback to reading JSON directly if manager not available
+    UnifiedStateManager = None
+
+# Try to import shared utilities
+try:
+    from shared_utils import (
+        extract_project_name_from_path,
+        get_claude_projects_dir,
+        get_csr_config_dir
+    )
+except ImportError:
+    # Fallback implementations
+    def extract_project_name_from_path(file_path: str) -> str:
+        """Extract project name from JSONL file path."""
+        path_obj = Path(file_path)
+        dir_name = path_obj.parent.name
+
+        if dir_name.startswith('-') and 'projects' in dir_name:
+            parts = dir_name.split('-')
+            try:
+                projects_idx = parts.index('projects')
+                if projects_idx + 1 < len(parts):
+                    project_parts = parts[projects_idx + 1:]
+                    return '-'.join(project_parts)
+            except ValueError:
+                pass
+
+        return dir_name.lstrip('-')
+
+    def get_claude_projects_dir() -> Path:
+        """Get Claude projects directory."""
+        import os
+        if 'CLAUDE_PROJECTS_DIR' in os.environ:
+            return Path(os.environ['CLAUDE_PROJECTS_DIR'])
+        return Path.home() / ".claude" / "projects"
+
+    def get_csr_config_dir() -> Path:
+        """Get CSR config directory."""
+        import os
+        if 'CSR_CONFIG_DIR' in os.environ:
+            return Path(os.environ['CSR_CONFIG_DIR'])
+        return Path.home() / '.claude-self-reflect' / 'config'
+
+
+def get_watcher_status() -> dict:
+    """Get streaming watcher status from unified state."""
+    try:
+        if UnifiedStateManager:
+            manager = UnifiedStateManager()
+            state = manager.read_state()
+
+            # Get watcher status from importers section
+            watcher_info = state.get("importers", {}).get("streaming", {})
+            last_run = watcher_info.get("last_run")
+
+            if last_run:
+                from datetime import datetime, timezone
+                last_run_dt = datetime.fromisoformat(last_run)
+                now = datetime.now(timezone.utc)
+                age_seconds = (now - last_run_dt).total_seconds()
+                is_active = age_seconds < 120  # Active if updated in last 2 minutes
+            else:
+                is_active = False
+                age_seconds = float('inf')
+
+            return {
+                "running": is_active,
+                "files_processed": watcher_info.get("files_processed", 0),
+                "last_update_seconds": int(age_seconds) if age_seconds != float('inf') else None,
+                "status": "🟢 active" if is_active else "🔴 inactive"
+            }
+        else:
+            # Fallback to old method if UnifiedStateManager not available
+            watcher_state_file = get_csr_config_dir() / "csr-watcher.json"
+
+            if not watcher_state_file.exists():
+                return {"running": False, "status": "not configured"}
+
+            with open(watcher_state_file) as f:
+                state = json.load(f)
+
+            file_age = time.time() - watcher_state_file.stat().st_mtime
+            is_active = file_age < 120
+
+            return {
+                "running": is_active,
+                "files_processed": len(state.get("imported_files", {})),
+                "last_update_seconds": int(file_age),
+                "status": "🟢 active" if is_active else "🔴 inactive"
+            }
+    except Exception as e:
+        return {"running": False, "status": f"error: {str(e)[:50]}"}
+
+
+def get_status() -> dict:
+    """Get indexing status from unified state with per-project breakdown.
+
+    Returns:
+        dict: JSON structure with overall and per-project indexing status
+    """
+    start_time = time.time()
+
+    try:
+        if UnifiedStateManager:
+            # Use unified state manager for fast access
+            manager = UnifiedStateManager()
+            status = manager.get_status()
+
+            # Get per-project breakdown
+            project_stats = defaultdict(lambda: {"indexed": 0, "total": 0})
+
+            # Count total JSONL files per project
+            projects_dir = get_claude_projects_dir()
+            if projects_dir.exists():
+                for jsonl_file in projects_dir.glob("**/*.jsonl"):
+                    project_name = extract_project_name_from_path(str(jsonl_file))
+                    project_stats[project_name]["total"] += 1
+
+            # Count indexed files per project from unified state
+            state = manager.read_state()
+            for file_path, metadata in state.get("files", {}).items():
+                if metadata.get("status") == "completed":
+                    project_name = extract_project_name_from_path(file_path)
+                    if project_name in project_stats:
+                        project_stats[project_name]["indexed"] += 1
+
+            # Format response
+            result = {
+                "overall": {
+                    "percentage": status["percentage"],
+                    "indexed_files": status["indexed_files"],
+                    "total_files": status["total_files"],
+                    "total_chunks": status["total_chunks"],
+                },
+                "watcher": get_watcher_status(),
+                "projects": dict(project_stats),
+                "execution_time_ms": round((time.time() - start_time) * 1000, 2)
+            }
+
+            return result
+
+        else:
+            # Fallback to old multi-file method
+            return get_status_legacy()
+
+    except Exception as e:
+        return {
+            "error": str(e),
+            "execution_time_ms": round((time.time() - start_time) * 1000, 2)
+        }
+
+
+def get_status_legacy() -> dict:
+    """Legacy status method reading from multiple files (fallback)."""
+    projects_dir = get_claude_projects_dir()
+    project_stats = defaultdict(lambda: {"indexed": 0, "total": 0})
+
+    # Count total JSONL files per project
+    if projects_dir.exists():
+        for jsonl_file in projects_dir.glob("**/*.jsonl"):
+            file_str = str(jsonl_file)
+            project_name = extract_project_name_from_path(file_str)
+            project_stats[project_name]["total"] += 1
+
+    # Read imported-files.json to count indexed files
+    config_dir = get_csr_config_dir()
+    imported_files_path = config_dir / "imported-files.json"
+
+    if imported_files_path.exists():
+        try:
+            with open(imported_files_path, 'r') as f:
+                data = json.load(f)
+            imported_files = data.get("imported_files", {})
+
+            for file_path in imported_files.keys():
+                # Normalize path
+                if file_path.startswith("/logs/"):
+                    projects_path = str(get_claude_projects_dir())
+                    normalized_path = file_path.replace("/logs/", projects_path + "/", 1)
+                else:
+                    normalized_path = file_path
+
+                # Check if file exists and count it
+                if Path(normalized_path).exists():
+                    project_name = extract_project_name_from_path(normalized_path)
+                    if project_name in project_stats:
+                        project_stats[project_name]["indexed"] += 1
+        except Exception:
+            pass
+
+    # Calculate overall stats
+    total_files = sum(p["total"] for p in project_stats.values())
+    indexed_files = sum(p["indexed"] for p in project_stats.values())
+    percentage = (indexed_files / max(total_files, 1)) * 100
+
+    return {
+        "overall": {
+            "percentage": percentage,
+            "indexed_files": indexed_files,
+            "total_files": total_files
+        },
+        "watcher": get_watcher_status(),
+        "projects": dict(project_stats)
+    }
+
+
+def main():
+    """CLI interface for status checking."""
+    import argparse
+
+    parser = argparse.ArgumentParser(description="Check Claude Self-Reflect indexing status")
+    parser.add_argument("--format", choices=["json", "text"], default="json",
+                        help="Output format (default: json)")
+    parser.add_argument("--watch", action="store_true",
+                        help="Watch mode - update every 2 seconds")
+
+    args = parser.parse_args()
+
+    if args.watch:
+        try:
+            while True:
+                status = get_status()
+                if args.format == "json":
+                    print(json.dumps(status, indent=2))
+                else:
+                    overall = status.get("overall", {})
+                    print(f"Indexing: {overall.get('percentage', 0):.1f}% "
+                          f"({overall.get('indexed_files', 0)}/{overall.get('total_files', 0)})")
+
+                    watcher = status.get("watcher", {})
+                    print(f"Watcher: {watcher.get('status', '🔴 inactive')}")
+
+                if status.get("execution_time_ms"):
+                    print(f"Time: {status['execution_time_ms']}ms")
+
+                print("\n" + "-" * 40)
+                time.sleep(2)
+
+        except KeyboardInterrupt:
+            print("\nStopped")
+    else:
+        status = get_status()
+        if args.format == "json":
+            print(json.dumps(status, indent=2))
+        else:
+            overall = status.get("overall", {})
+            print(f"Indexing: {overall.get('percentage', 0):.1f}% "
+                  f"({overall.get('indexed_files', 0)}/{overall.get('total_files', 0)} files)")
+
+            watcher = status.get("watcher", {})
+            print(f"Watcher: {watcher.get('status', '🔴 inactive')}")
+
+            # Show per-project if available
+            projects = status.get("projects", {})
+            if projects:
+                print("\nProjects:")
+                for proj, stats in projects.items():
+                    pct = (stats["indexed"] / max(stats["total"], 1)) * 100
+                    print(f"  {proj}: {pct:.1f}% ({stats['indexed']}/{stats['total']})")
+
+            if status.get("execution_time_ms"):
+                print(f"\nExecution time: {status['execution_time_ms']}ms")
+
+
+if __name__ == "__main__":
+    main()
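A quick way to exercise the new module programmatically; a sketch, assuming it is run from the package's mcp-server/src directory so the relative scripts/ path resolves and `status_unified` is importable:

```python
# Sketch: call the new status checker and check the <20ms budget its
# docstring documents. Degrades gracefully if state files are absent
# (get_status returns an "error" key instead of raising).
import json
from status_unified import get_status

status = get_status()
print(json.dumps(status, indent=2))

overall = status.get("overall", {})
print(f"{overall.get('indexed_files', 0)}/{overall.get('total_files', 0)} files indexed")

elapsed = status.get("execution_time_ms")
if elapsed is not None and elapsed > 20:
    print(f"warning: {elapsed}ms exceeds the documented 20ms target")
```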
package/package.json
CHANGED
(1 line changed; the diff body is not shown by the viewer)

package/scripts/import-conversations-unified.py
CHANGED
@@ -15,7 +15,7 @@ import fcntl
 import time
 import argparse
 from pathlib import Path
-from datetime import datetime
+from datetime import datetime, timezone
 from typing import List, Dict, Any, Optional, Set
 import logging
 
@@ -34,6 +34,9 @@ except ImportError:
     scripts_dir = Path(__file__).parent
     sys.path.insert(0, str(scripts_dir))
 
+# Import UnifiedStateManager
+from unified_state_manager import UnifiedStateManager
+
 from qdrant_client import QdrantClient
 from qdrant_client.models import PointStruct, Distance, VectorParams
 
@@ -72,32 +75,15 @@ MAX_FILES_EDITED = 20
 MAX_TOOLS_USED = 15
 MAX_CONCEPT_MESSAGES = 50
 
-# …
-
-    """Determine the default state file location with cross-platform support."""
-    from pathlib import Path
-
-    # Check if we're in Docker (more reliable than just checking /config)
-    docker_indicators = [
-        Path("/.dockerenv").exists(),  # Docker creates this file
-        os.path.exists("/config") and os.access("/config", os.W_OK)  # Mounted config dir with write access
-    ]
-
-    if any(docker_indicators):
-        return "/config/imported-files.json"
-
-    # Use pathlib for cross-platform home directory path
-    home_state = Path.home() / ".claude-self-reflect" / "config" / "imported-files.json"
-    return str(home_state)
-
-# Get state file path with env override support
+# Initialize UnifiedStateManager
+# Support legacy STATE_FILE environment variable
 env_state = os.getenv("STATE_FILE")
 if env_state:
-    # Normalize any user-provided path to absolute
     from pathlib import Path
-    …
+    state_file_path = Path(env_state).expanduser().resolve()
+    state_manager = UnifiedStateManager(state_file_path)
 else:
-    …
+    state_manager = UnifiedStateManager()  # Uses default location
 PREFER_LOCAL_EMBEDDINGS = os.getenv("PREFER_LOCAL_EMBEDDINGS", "true").lower() == "true"
 VOYAGE_API_KEY = os.getenv("VOYAGE_KEY")
 MAX_CHUNK_SIZE = int(os.getenv("MAX_CHUNK_SIZE", "50"))  # Messages per chunk
@@ -686,18 +672,13 @@ def stream_import_file(jsonl_file: Path, collection_name: str, project_path: Pat
 
     except Exception as e:
         logger.error(f"Failed to import {jsonl_file}: {e}")
+        # Mark file as failed in state manager
+        try:
+            state_manager.mark_file_failed(str(jsonl_file), str(e))
+        except Exception as state_error:
+            logger.warning(f"Could not mark file as failed in state: {state_error}")
         return 0
 
-def _locked_open(path, mode):
-    """Open file with exclusive lock for concurrent safety."""
-    f = open(path, mode)
-    try:
-        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
-    except Exception:
-        f.close()
-        raise
-    return f
-
 def _with_retries(fn, attempts=3, base_sleep=0.5):
     """Execute function with retries and exponential backoff."""
     for i in range(attempts):
@@ -709,66 +690,78 @@ def _with_retries(fn, attempts=3, base_sleep=0.5):
             time.sleep(base_sleep * (2 ** i))
             logger.debug(f"Retrying after error: {e}")
 
-def …
-"""…
-[… old lines 714-771: removed function body truncated by the diff viewer …]
+def should_import_file(file_path: Path) -> bool:
+    """Check if file should be imported using UnifiedStateManager."""
+    try:
+        # Get imported files from state manager
+        imported_files = state_manager.get_imported_files()
+
+        # Normalize the file path for comparison
+        normalized_path = state_manager.normalize_path(str(file_path))
+
+        if normalized_path in imported_files:
+            file_info = imported_files[normalized_path]
+
+            # Skip if file failed and we haven't reached retry limit
+            if file_info.get("status") == "failed" and file_info.get("retry_count", 0) >= 3:
+                logger.info(f"Skipping failed file (max retries reached): {file_path.name}")
+                return False
+
+            # Get file modification time for comparison
+            last_modified = file_path.stat().st_mtime
+            stored_modified = file_info.get("last_modified")
+
+            # Check if file has been modified (convert stored timestamp to float if needed)
+            if stored_modified:
+                try:
+                    # Parse ISO timestamp to float for comparison
+                    stored_time = datetime.fromisoformat(stored_modified.replace("Z", "+00:00")).timestamp()
+                    if abs(last_modified - stored_time) > 1:  # Allow 1 second tolerance
+                        logger.info(f"File modified, will re-import: {file_path.name}")
+                        return True
+                except (ValueError, TypeError):
+                    # If we can't parse the stored time, re-import to be safe
+                    logger.warning(f"Could not parse stored modification time, will re-import: {file_path.name}")
+                    return True
+
+            # Check for suspiciously low chunk counts (likely failed imports)
+            chunks = file_info.get("chunks", 0)
+            file_size_kb = file_path.stat().st_size / 1024
+
+            # Heuristic: Files > 10KB should have more than 2 chunks
+            if file_size_kb > 10 and chunks <= 2 and file_info.get("status") != "failed":
+                logger.warning(f"File has suspiciously low chunks ({chunks}) for size {file_size_kb:.1f}KB, will re-import: {file_path.name}")
+                return True
+
+            # Skip if successfully imported
+            if file_info.get("status") == "completed":
+                logger.info(f"Skipping successfully imported file: {file_path.name}")
+                return False
+
+        return True
+
+    except Exception as e:
+        logger.warning(f"Error checking import status for {file_path}: {e}")
+        return True  # Default to importing if we can't check status
+
+def update_file_state(file_path: Path, chunks: int, collection_name: str):
+    """Update state for imported file using UnifiedStateManager."""
+    try:
+        # Determine embedding mode from collection suffix
+        embedding_mode = "local" if collection_suffix == "local" else "cloud"
+
+        # Add file to state manager
+        state_manager.add_imported_file(
+            file_path=str(file_path),
+            chunks=chunks,
+            importer="streaming",
+            collection=collection_name,
+            embedding_mode=embedding_mode,
+            status="completed"
+        )
+        logger.debug(f"Updated state for {file_path.name}: {chunks} chunks")
+    except Exception as e:
+        logger.error(f"Failed to update state for {file_path}: {e}")
 
 def main():
     """Main import function."""
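`unified_state_manager.py` itself is not part of this diff, so the interface below is inferred from the call sites visible here (and in streaming-watcher.py further down). This is a sketch of the methods the importers rely on, not the shipped class:

```python
# Inferred interface of UnifiedStateManager, reconstructed from call sites in
# this diff: get_imported_files, normalize_path, add_imported_file,
# mark_file_failed, get_status, read_state, and the state_file attribute.
# Record field names ("status", "chunks", "last_modified", "imported_at",
# "retry_count") come from how the importers read them; everything else,
# including this Protocol itself, is an assumption.
from pathlib import Path
from typing import Any, Dict, Protocol

class StateManagerLike(Protocol):
    state_file: Path

    def read_state(self) -> Dict[str, Any]: ...      # full unified-state.json contents
    def get_status(self) -> Dict[str, Any]: ...      # keys seen: percentage, indexed_files, total_files, total_chunks
    def get_imported_files(self) -> Dict[str, Dict[str, Any]]: ...  # normalized path -> record
    def normalize_path(self, file_path: str) -> str: ...
    def add_imported_file(self, file_path: str, chunks: int, importer: str,
                          collection: str, embedding_mode: str, status: str) -> None: ...
    def mark_file_failed(self, file_path: str, reason: str) -> None: ...
```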
@@ -798,9 +791,9 @@ def main():
         collection_suffix = "voyage"
         logger.info("Switched to Voyage AI embeddings (dimension: 1024)")
 
-    # …
-    …
-    logger.info(f"Loaded state with {…
+    # Get status from state manager
+    status = state_manager.get_status()
+    logger.info(f"Loaded state with {status['indexed_files']} previously imported files")
 
     # Find all projects
     # Use LOGS_DIR env var, or fall back to Claude projects directory, then /logs for Docker
@@ -848,7 +841,7 @@
                 logger.info(f"Reached limit of {args.limit} files, stopping import")
                 break
 
-            if should_import_file(jsonl_file…
+            if should_import_file(jsonl_file):
                 chunks = stream_import_file(jsonl_file, collection_name, project_dir)
                 files_processed += 1
                 if chunks > 0:
@@ -868,8 +861,7 @@
 
                     if actual_count > 0:
                         logger.info(f"Verified {actual_count} points in Qdrant for {conversation_id}")
-                        update_file_state(jsonl_file, …
-                        save_state(state)
+                        update_file_state(jsonl_file, chunks, collection_name)
                         total_imported += 1
                     else:
                         logger.error(f"No points found in Qdrant for {conversation_id} despite {chunks} chunks processed - not marking as imported")
@@ -883,6 +875,11 @@
                     # Critical fix: Don't mark files with 0 chunks as imported
                     # This allows retry on next run
                     logger.warning(f"File produced 0 chunks, not marking as imported: {jsonl_file.name}")
+                    # Mark as failed so we don't keep retrying indefinitely
+                    try:
+                        state_manager.mark_file_failed(str(jsonl_file), "File produced 0 chunks during import")
+                    except Exception as state_error:
+                        logger.warning(f"Could not mark file as failed in state: {state_error}")
 
     logger.info(f"Import complete: processed {total_imported} files")
 
package/scripts/streaming-watcher.py
CHANGED
@@ -35,10 +35,11 @@ from qdrant_client.http.exceptions import UnexpectedResponse
 from fastembed import TextEmbedding
 import psutil
 
-# Import normalize_project_name
+# Import normalize_project_name and UnifiedStateManager
 import sys
 sys.path.insert(0, str(Path(__file__).parent))
 from utils import normalize_project_name
+from unified_state_manager import UnifiedStateManager
 
 # Configure logging
 logging.basicConfig(
@@ -52,26 +53,14 @@ logger = logging.getLogger(__name__)
 class Config:
     """Production configuration with proper defaults."""
     qdrant_url: str = field(default_factory=lambda: os.getenv("QDRANT_URL", "http://localhost:6333"))
+    qdrant_api_key: Optional[str] = field(default_factory=lambda: os.getenv("QDRANT_API_KEY"))
+    require_tls_for_remote: bool = field(default_factory=lambda: os.getenv("QDRANT_REQUIRE_TLS_FOR_REMOTE", "true").lower() == "true")
     voyage_api_key: Optional[str] = field(default_factory=lambda: os.getenv("VOYAGE_API_KEY"))
     prefer_local_embeddings: bool = field(default_factory=lambda: os.getenv("PREFER_LOCAL_EMBEDDINGS", "true").lower() == "true")
     embedding_model: str = field(default_factory=lambda: os.getenv("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"))
 
     logs_dir: Path = field(default_factory=lambda: Path(os.getenv("LOGS_DIR", "~/.claude/projects")).expanduser())
 
-    # Production state file with proper naming
-    state_file: Path = field(default_factory=lambda: (
-        # Docker/cloud mode: use /config volume
-        Path("/config/csr-watcher.json") if os.path.exists("/.dockerenv")
-        # Local mode with cloud flag: separate state file
-        else Path("~/.claude-self-reflect/config/csr-watcher-cloud.json").expanduser()
-        if os.getenv("PREFER_LOCAL_EMBEDDINGS", "true").lower() == "false" and os.getenv("VOYAGE_API_KEY")
-        # Default local mode
-        else Path("~/.claude-self-reflect/config/csr-watcher.json").expanduser()
-        if os.getenv("STATE_FILE") is None
-        # User override
-        else Path(os.getenv("STATE_FILE")).expanduser()
-    ))
-
     collection_prefix: str = "conv"
     vector_size: int = 384  # FastEmbed all-MiniLM-L6-v2
 
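The hunk above adds a `require_tls_for_remote` flag, but its enforcement is not shown anywhere in this diff. For illustration only, a plausible guard would look something like the sketch below; the check itself is an assumption, only the config field and env var names come from the diff:

```python
# Hypothetical enforcement of require_tls_for_remote (not shown in the diff).
from urllib.parse import urlparse

def check_tls(qdrant_url: str, require_tls_for_remote: bool) -> None:
    parsed = urlparse(qdrant_url)
    is_local = parsed.hostname in ("localhost", "127.0.0.1", "::1")
    if require_tls_for_remote and not is_local and parsed.scheme != "https":
        raise ValueError(
            f"Refusing plaintext connection to remote Qdrant at {qdrant_url}; "
            "set QDRANT_REQUIRE_TLS_FOR_REMOTE=false to override"
        )

check_tls("http://localhost:6333", True)  # OK: local host, TLS not required
# check_tls("http://qdrant.example.com:6333", True)  # would raise ValueError
```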
@@ -496,7 +485,7 @@ class QdrantService:
         # Initialize with API key if provided
         self.client = AsyncQdrantClient(
             url=config.qdrant_url,
-            api_key=config.qdrant_api_key
+            api_key=config.qdrant_api_key
         )
         self.embedding_provider = embedding_provider
         self._collection_cache: Dict[str, float] = {}
@@ -797,7 +786,7 @@ class StreamingWatcher:
 
     def __init__(self, config: Config):
         self.config = config
-        self.…
+        self.state_manager = UnifiedStateManager()
         self.embedding_provider = self._create_embedding_provider()
         self.qdrant_service = QdrantService(config, self.embedding_provider)
         self.chunker = TokenAwareChunker()
@@ -805,23 +794,23 @@
         self.memory_monitor = MemoryMonitor(config.memory_limit_mb, config.memory_warning_mb)
         self.queue_manager = QueueManager(config.max_queue_size, config.max_backlog_hours)
         self.progress = IndexingProgress(config.logs_dir)
-
+
         self.stats = {
             "files_processed": 0,
             "chunks_processed": 0,
             "failures": 0,
             "start_time": time.time()
         }
-
+
         # Track file wait times for starvation prevention
         self.file_first_seen: Dict[str, float] = {}
         self.current_project: Optional[str] = self._detect_current_project()
         self.last_mode: Optional[str] = None  # Track mode changes for logging
-
+
         self.shutdown_event = asyncio.Event()
-
-        logger.info(…
-        logger.info(f"State file: {self.…
+
+        logger.info("Streaming Watcher v3.0.0 with HOT/WARM/COLD prioritization")
+        logger.info(f"State file: {self.state_manager.state_file}")
         logger.info(f"Memory limits: {config.memory_warning_mb}MB warning, {config.memory_limit_mb}MB limit")
         logger.info(f"HOT window: {config.hot_window_minutes} min, WARM window: {config.warm_window_hours} hrs")
 
@@ -901,75 +890,19 @@ class StreamingWatcher:
         )
 
     async def load_state(self) -> None:
-        """Load persisted state…
-        if self.config.state_file.exists():
-            try:
-                with open(self.config.state_file, 'r') as f:
-                    self.state = json.load(f)
-
-                # Migrate old state format if needed
-                if "imported_files" in self.state:
-                    imported_count = len(self.state["imported_files"])
-                    logger.info(f"Loaded state with {imported_count} files")
-
-                    # Ensure all entries have full paths as keys
-                    migrated = {}
-                    for key, value in self.state["imported_files"].items():
-                        # Ensure key is a full path
-                        if not key.startswith('/'):
-                            # Try to reconstruct full path
-                            possible_path = self.config.logs_dir / key
-                            if possible_path.exists():
-                                migrated[str(possible_path)] = value
-                            else:
-                                migrated[key] = value  # Keep as is
-                        else:
-                            migrated[key] = value
-
-                    if len(migrated) != len(self.state["imported_files"]):
-                        logger.info(f"Migrated state format: {len(self.state['imported_files'])} -> {len(migrated)} entries")
-                        self.state["imported_files"] = migrated
-
-            except Exception as e:
-                logger.error(f"Error loading state: {e}")
-                self.state = {}
-
-        if "imported_files" not in self.state:
-            self.state["imported_files"] = {}
-        if "high_water_mark" not in self.state:
-            self.state["high_water_mark"] = 0
-
-        # Update progress tracker
-        self.progress.update(len(self.state["imported_files"]))
-
-    async def save_state(self) -> None:
-        """Save state atomically."""
+        """Load persisted state using UnifiedStateManager."""
         try:
-            self.…
-            [… old lines 949-954 truncated by the diff viewer …]
-            os.fsync(f.fileno())
-
-            if platform.system() == 'Windows':
-                if self.config.state_file.exists():
-                    self.config.state_file.unlink()
-                temp_file.rename(self.config.state_file)
-            else:
-                os.replace(temp_file, self.config.state_file)
-
-            # Directory fsync for stronger guarantees
-            try:
-                dir_fd = os.open(str(self.config.state_file.parent), os.O_DIRECTORY)
-                os.fsync(dir_fd)
-                os.close(dir_fd)
-            except:
-                pass
-
+            status = self.state_manager.get_status()
+            imported_count = status["indexed_files"]
+            logger.info(f"Loaded state with {imported_count} files")
+
+            # Update progress tracker
+            self.progress.update(imported_count)
         except Exception as e:
-            logger.error(f"Error…
+            logger.error(f"Error loading state: {e}")
+            # Initialize progress with 0
+            self.progress.update(0)
+
 
     def get_collection_name(self, project_path: str) -> str:
         """Get collection name for project."""
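The removed `save_state` body is largely collapsed by the viewer. From the surviving fragments (`os.fsync`, the Windows rename branch, `os.replace`, the directory fsync) it implemented the standard atomic-write pattern; a reconstruction, with illustrative names rather than the deleted code verbatim:

```python
# Sketch of the atomic-write pattern the removed save_state used.
import json
import os
import platform
from pathlib import Path

def save_json_atomically(state: dict, state_file: Path) -> None:
    temp_file = state_file.with_suffix(".tmp")
    with open(temp_file, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # data hits disk before the rename

    if platform.system() == "Windows":
        if state_file.exists():   # the old code unlinked and renamed manually
            state_file.unlink()   # (os.replace would also work on Windows)
        temp_file.rename(state_file)
    else:
        os.replace(temp_file, state_file)  # atomic on POSIX

    try:  # directory fsync for stronger guarantees
        dir_fd = os.open(str(state_file.parent), os.O_DIRECTORY)
        os.fsync(dir_fd)
        os.close(dir_fd)
    except OSError:
        pass
```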
@@ -1092,15 +1025,15 @@ class StreamingWatcher:
                 continue
 
             if not all_messages:
-                logger.warning(f"No messages in {file_path}, marking as …
-                # Mark …
-                [… old lines 1097-1103 truncated by the diff viewer …]
+                logger.warning(f"No messages in {file_path}, marking as failed")
+                # Mark as failed to enable retry and correct progress
+                try:
+                    self.state_manager.mark_file_failed(
+                        str(file_path),
+                        "No messages found in conversation (0 chunks)"
+                    )
+                except Exception as e:
+                    logger.exception("Failed to update state for %s", file_path)
                 self.stats["files_processed"] += 1
                 return True
 
@@ -1181,15 +1114,15 @@ class StreamingWatcher:
 
             combined_text = "\n\n".join(text_parts)
             if not combined_text.strip():
-                logger.warning(f"No textual content in {file_path}, marking as …
-                # Mark …
-                [… old lines 1186-1192 truncated by the diff viewer …]
+                logger.warning(f"No textual content in {file_path}, marking as failed")
+                # Mark as failed to enable retry and correct progress
+                try:
+                    self.state_manager.mark_file_failed(
+                        str(file_path),
+                        "No textual content in conversation (0 chunks)"
+                    )
+                except Exception as e:
+                    logger.exception("Failed to update state for %s", file_path)
                 self.stats["files_processed"] += 1
                 return True
 
@@ -1280,23 +1213,34 @@ class StreamingWatcher:
             if should_cleanup:
                 await self.memory_monitor.cleanup()
 
-            # Update state
-            [… old lines 1284-1290 truncated by the diff viewer …]
+            # Update state using UnifiedStateManager
+            try:
+                self.state_manager.add_imported_file(
+                    file_path=str(file_path),
+                    chunks=chunks_processed,
+                    importer="streaming",
+                    collection=collection_name,
+                    embedding_mode="local" if self.config.prefer_local_embeddings else "cloud",
+                    status="completed"
+                )
+            except Exception as e:
+                logger.error(f"Failed to update state for {file_path}: {e}")
+                return False
+
             self.stats["files_processed"] += 1
             self.stats["chunks_processed"] += chunks_processed
-
+
             logger.info(f"Completed: {file_path.name} ({chunks_processed} chunks)")
             return True
 
         except Exception as e:
             logger.error(f"Error processing {file_path}: {e}")
             self.stats["failures"] += 1
+            # Mark file as failed using UnifiedStateManager
+            try:
+                self.state_manager.mark_file_failed(str(file_path), str(e))
+            except Exception as mark_error:
+                logger.error(f"Failed to mark file as failed: {mark_error}")
             return False
 
     async def find_new_files(self) -> List[Tuple[Path, FreshnessLevel, int]]:
@@ -1304,47 +1248,51 @@
         if not self.config.logs_dir.exists():
             logger.warning(f"Logs dir not found: {self.config.logs_dir}")
             return []
-
+
         categorized_files = []
-        high_water_mark = self.state.get("high_water_mark", 0)
-        new_high_water = high_water_mark
         now = time.time()
-
+
+        # Get imported files from UnifiedStateManager
+        try:
+            imported_files = self.state_manager.get_imported_files()
+        except Exception as e:
+            logger.error(f"Error getting imported files: {e}")
+            imported_files = {}
+
         try:
             for project_dir in self.config.logs_dir.iterdir():
                 if not project_dir.is_dir():
                     continue
-
+
                 try:
                     for jsonl_file in project_dir.glob("*.jsonl"):
                         file_mtime = jsonl_file.stat().st_mtime
-
-                        [… old lines 1322-1335 truncated by the diff viewer …]
+
+                        # Check if already processed (using normalized path)
+                        try:
+                            normalized_path = self.state_manager.normalize_path(str(jsonl_file))
+                            if normalized_path in imported_files:
+                                stored = imported_files[normalized_path]
+                                # Check if file was modified after import
+                                import_time_str = stored.get("imported_at")
+                                if import_time_str:
+                                    import_time = datetime.fromisoformat(import_time_str.replace("Z", "+00:00")).timestamp()
+                                    if file_mtime <= import_time:
+                                        continue
+                        except Exception as e:
+                            logger.debug(f"Error checking import status for {jsonl_file}: {e}")
+                            # If we can't check, assume not imported
+
                         # Categorize file freshness (handles first_seen tracking internally)
                         freshness_level, priority_score = self.categorize_freshness(jsonl_file)
-
+
                         categorized_files.append((jsonl_file, freshness_level, priority_score))
                 except Exception as e:
                     logger.error(f"Error scanning project dir {project_dir}: {e}")
-
+
         except Exception as e:
             logger.error(f"Error scanning logs dir: {e}")
 
-        self.state["high_water_mark"] = new_high_water
-
         # Sort by priority score (lower = higher priority)
         categorized_files.sort(key=lambda x: x[2])
 
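`categorize_freshness` itself is outside this diff; based on the HOT/WARM windows logged at startup (`hot_window_minutes`, `warm_window_hours`) and the "lower score = higher priority" sort above, a prioritizer of this kind typically reduces to something like the sketch below (the scoring is entirely an assumption, only the config field names are attested):

```python
# Hypothetical HOT/WARM/COLD categorization; not the package's implementation.
import time
from pathlib import Path

def categorize_freshness(path: Path, hot_window_minutes: int = 5,
                         warm_window_hours: int = 24) -> tuple[str, int]:
    age = time.time() - path.stat().st_mtime
    if age < hot_window_minutes * 60:
        return "HOT", 0    # lowest score sorts first, so HOT files import first
    if age < warm_window_hours * 3600:
        return "WARM", 1
    return "COLD", 2
```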
@@ -1370,7 +1318,7 @@ class StreamingWatcher:
         logger.info("=" * 60)
         logger.info("Claude Self-Reflect Streaming Watcher v3.0.0")
         logger.info("=" * 60)
-        logger.info(…
+        logger.info("State manager: UnifiedStateManager")
         logger.info(f"Memory: {self.config.memory_warning_mb}MB warning, {self.config.memory_limit_mb}MB limit")
         logger.info(f"CPU limit: {self.cpu_monitor.max_total_cpu:.1f}%")
         logger.info(f"Queue size: {self.config.max_queue_size}")
@@ -1380,9 +1328,10 @@
 
         # Initial progress scan
         total_files = self.progress.scan_total_files()
-        …
+        status = self.state_manager.get_status()
+        indexed_files = status["indexed_files"]
         self.progress.update(indexed_files)
-
+
         initial_progress = self.progress.get_progress()
         logger.info(f"Initial progress: {indexed_files}/{total_files} files ({initial_progress['percent']:.1f}%)")
 
@@ -1433,23 +1382,30 @@
                     except FileNotFoundError:
                         logger.warning(f"File disappeared: {file_path}")
                         continue
-
-                    imported…
-                    [… old lines 1439-1445 truncated by the diff viewer …]
+
+                    # Check if already imported using UnifiedStateManager
+                    try:
+                        normalized_path = self.state_manager.normalize_path(file_key)
+                        imported_files = self.state_manager.get_imported_files()
+                        if normalized_path in imported_files:
+                            stored = imported_files[normalized_path]
+                            import_time_str = stored.get("imported_at")
+                            if import_time_str:
+                                import_time = datetime.fromisoformat(import_time_str.replace("Z", "+00:00")).timestamp()
+                                if file_mtime <= import_time:
+                                    logger.debug(f"Skipping already imported: {file_path.name}")
+                                    continue
+                    except Exception as e:
+                        logger.debug(f"Error checking import status: {e}")
+
                     success = await self.process_file(file_path)
-
+
                     if success:
                         # Clean up first_seen tracking to prevent memory leak
                         self.file_first_seen.pop(file_key, None)
-
-                        self.…
+                        # Update progress (state is managed by UnifiedStateManager)
+                        status = self.state_manager.get_status()
+                        self.progress.update(status["indexed_files"])
 
                 # Log comprehensive metrics
                 if batch or cycle_count % 6 == 0:  # Every minute if idle
@@ -1519,7 +1475,6 @@
             raise
         finally:
             logger.info("Shutting down...")
-            await self.save_state()
             await self.embedding_provider.close()
             await self.qdrant_service.close()
 