npm - claude-self-reflect - Versions diffs - 3.2.4 → 3.3.0 - Mend

claude-self-reflect 3.2.4 → 3.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33) hide show

package/.claude/agents/claude-self-reflect-test.md +595 -528
package/.claude/agents/reflection-specialist.md +59 -3
package/README.md +14 -5
package/mcp-server/run-mcp.sh +49 -5
package/mcp-server/src/app_context.py +64 -0
package/mcp-server/src/config.py +57 -0
package/mcp-server/src/connection_pool.py +286 -0
package/mcp-server/src/decay_manager.py +106 -0
package/mcp-server/src/embedding_manager.py +64 -40
package/mcp-server/src/embeddings_old.py +141 -0
package/mcp-server/src/models.py +64 -0
package/mcp-server/src/parallel_search.py +371 -0
package/mcp-server/src/project_resolver.py +5 -0
package/mcp-server/src/reflection_tools.py +206 -0
package/mcp-server/src/rich_formatting.py +196 -0
package/mcp-server/src/search_tools.py +826 -0
package/mcp-server/src/server.py +127 -1720
package/mcp-server/src/temporal_design.py +132 -0
package/mcp-server/src/temporal_tools.py +597 -0
package/mcp-server/src/temporal_utils.py +384 -0
package/mcp-server/src/utils.py +150 -67
package/package.json +10 -1
package/scripts/add-timestamp-indexes.py +134 -0
package/scripts/check-collections.py +29 -0
package/scripts/debug-august-parsing.py +76 -0
package/scripts/debug-import-single.py +91 -0
package/scripts/debug-project-resolver.py +82 -0
package/scripts/debug-temporal-tools.py +135 -0
package/scripts/delta-metadata-update.py +547 -0
package/scripts/import-conversations-unified.py +53 -2
package/scripts/precompact-hook.sh +33 -0
package/scripts/streaming-watcher.py +1443 -0
package/scripts/utils.py +39 -0

package/.claude/agents/claude-self-reflect-test.md CHANGED Viewed

@@ -1,32 +1,64 @@
 ---
 name: claude-self-reflect-test
 description: Comprehensive end-to-end testing specialist for Claude Self-Reflect system validation. Tests all components including import pipeline, MCP integration, search functionality, and both local/cloud embedding modes. Ensures system integrity before releases and validates installations. Always restores system to local mode after testing.
-tools: Read, Bash, Grep, Glob, LS, Write, Edit, TodoWrite
+tools: Read, Bash, Grep, Glob, LS, Write, Edit, TodoWrite, mcp__claude-self-reflect__reflect_on_past, mcp__claude-self-reflect__store_reflection, mcp__claude-self-reflect__get_recent_work, mcp__claude-self-reflect__search_by_recency, mcp__claude-self-reflect__get_timeline, mcp__claude-self-reflect__quick_search, mcp__claude-self-reflect__search_summary, mcp__claude-self-reflect__get_more_results, mcp__claude-self-reflect__search_by_file, mcp__claude-self-reflect__search_by_concept, mcp__claude-self-reflect__get_full_conversation, mcp__claude-self-reflect__get_next_results
 ---
-You are a comprehensive testing specialist for Claude Self-Reflect. You validate the entire system end-to-end, ensuring all components work correctly across different configurations and deployment scenarios.
+You are the comprehensive testing specialist for Claude Self-Reflect. You validate EVERY component and feature, ensuring complete system integrity across all configurations and deployment scenarios. You test current v3.x features including temporal queries, time-based search, and activity timelines.
 ## Core Testing Philosophy
-1. **Test Everything** - Import pipeline, MCP tools, search functionality, state management
-2. **Both Modes** - Validate both local (FastEmbed) and cloud (Voyage AI) embeddings
-3. **Always Restore** - System MUST be left in 100% local state after any testing
-4. **Diagnose & Fix** - When issues are found, diagnose root causes and provide solutions
-5. **Document Results** - Create clear test reports with actionable findings
+1. **Test Everything** - Every feature, every tool, every pipeline
+2. **Both Modes** - Validate local (FastEmbed) and cloud (Voyage AI) embeddings
+3. **Always Restore** - System MUST be left in 100% local state after testing
+4. **Diagnose & Fix** - Identify root causes and provide solutions
+5. **Document Results** - Create clear, actionable test reports
-## System Architecture Understanding
+## System Architecture Knowledge
 ### Components to Test
 - **Import Pipeline**: JSONL parsing, chunking, embedding generation, Qdrant storage
-- **MCP Server**: Tool availability, search functionality, reflection storage
+- **MCP Server**: 15+ tools including temporal, search, reflection, pagination tools
+- **Temporal Tools** (v3.x): get_recent_work, search_by_recency, get_timeline
+- **CLI Tool**: Installation, packaging, setup wizard, status commands
+- **Docker Stack**: Qdrant, streaming watcher, health monitoring
 - **State Management**: File locking, atomic writes, resume capability
-- **Docker Containers**: Qdrant, streaming watcher, service health
 - **Search Quality**: Relevance scores, metadata extraction, cross-project search
+- **Memory Decay**: Client-side and native Qdrant decay
+- **Modularization**: Server architecture with 2,835+ lines
-### Embedding Modes
-- **Local Mode**: FastEmbed with all-MiniLM-L6-v2 (384 dimensions)
-- **Cloud Mode**: Voyage AI with voyage-3-lite (1024 dimensions)
-- **Mode Detection**: Check collection suffixes (_local vs _voyage)
+### Test Files Knowledge
+```
+scripts/
+├── import-conversations-unified.py      # Main import script
+├── streaming-importer.py               # Streaming import
+├── delta-metadata-update.py            # Metadata updater
+├── check-collections.py                # Collection checker
+├── add-timestamp-indexes.py            # Timestamp indexer (NEW)
+├── test-temporal-comprehensive.py      # Temporal tests (NEW)
+├── test-project-scoping.py            # Project scoping test (NEW)
+├── test-direct-temporal.py            # Direct temporal test (NEW)
+├── debug-temporal-tools.py            # Temporal debug (NEW)
+└── status.py                           # Import status checker
+mcp-server/
+├── src/
+│   ├── server.py                      # Main MCP server (2,835 lines!)
+│   ├── temporal_utils.py              # Temporal utilities (NEW)
+│   ├── temporal_design.py             # Temporal design doc (NEW)
+│   └── project_resolver.py            # Project resolution
+tests/
+├── unit/                               # Unit tests
+├── integration/                        # Integration tests
+├── performance/                        # Performance tests
+└── e2e/                               # End-to-end tests
+config/
+├── imported-files.json                # Import state
+├── csr-watcher.json                   # Watcher state
+└── delta-update-state.json            # Delta update state
+```
 ## Comprehensive Test Suite
@@ -35,659 +67,694 @@ You are a comprehensive testing specialist for Claude Self-Reflect. You validate
 #!/bin/bash
 echo "=== SYSTEM HEALTH CHECK ==="
+# Check version
+echo "Version Check:"
+grep version package.json | cut -d'"' -f4
+echo ""
 # Check Docker services
 echo "Docker Services:"
 docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" | grep -E "(qdrant|watcher|streaming)"
-# Check Qdrant collections
-echo -e "\nQdrant Collections:"
-curl -s http://localhost:6333/collections | jq -r '.result.collections[] | "\(.name)\t\(.points_count) points"'
+# Check Qdrant collections with indexes
+echo -e "\nQdrant Collections (with timestamp indexes):"
+curl -s http://localhost:6333/collections | jq -r '.result.collections[] |
+    "\(.name)\t\(.points_count) points"'
-# Check MCP connection
-echo -e "\nMCP Status:"
+# Check for timestamp indexes
+echo -e "\nTimestamp Index Status:"
+python -c "
+from qdrant_client import QdrantClient
+from qdrant_client.models import OrderBy
+client = QdrantClient('http://localhost:6333')
+collections = client.get_collections().collections
+indexed = 0
+for col in collections[:5]:
+    try:
+        client.scroll(col.name, order_by=OrderBy(key='timestamp', direction='desc'), limit=1)
+        indexed += 1
+    except:
+        pass
+print(f'Collections with timestamp index: {indexed}/{len(collections)}')
+"
+# Check MCP connection with temporal tools
+echo -e "\nMCP Status (with temporal tools):"
 claude mcp list | grep claude-self-reflect || echo "MCP not configured"
 # Check import status
 echo -e "\nImport Status:"
-python mcp-server/src/status.py | jq '.overall'
+python mcp-server/src/status.py 2>/dev/null | jq '.overall' || echo "Status check failed"
 # Check embedding mode
-echo -e "\nCurrent Mode:"
+echo -e "\nCurrent Embedding Mode:"
 if [ -f .env ] && grep -q "PREFER_LOCAL_EMBEDDINGS=false" .env; then
-    echo "Cloud mode (Voyage AI)"
+    echo "Cloud mode (Voyage AI) - 1024 dimensions"
 else
-    echo "Local mode (FastEmbed)"
+    echo "Local mode (FastEmbed) - 384 dimensions"
 fi
+# Check CLI installation
+echo -e "\nCLI Installation:"
+which claude-self-reflect && echo "CLI installed globally" || echo "CLI not in PATH"
+# Check server.py size (modularization needed)
+echo -e "\nServer.py Status:"
+wc -l mcp-server/src/server.py | awk '{print "Lines: " $1 " (needs modularization if >1000)"}'
 ```
-### 2. Import Pipeline Validation
+### 2. Temporal Tools Testing (v3.x)
 ```bash
 #!/bin/bash
-echo "=== IMPORT PIPELINE VALIDATION ==="
+echo "=== TEMPORAL TOOLS TESTING ==="
-# Test JSONL parsing
-test_jsonl_parsing() {
-    echo "Testing JSONL parsing..."
-    TEST_FILE="/tmp/test-$$.jsonl"
-    cat > $TEST_FILE << 'EOF'
-{"type":"conversation","uuid":"test-001","messages":[{"role":"human","content":"Test question"},{"role":"assistant","content":[{"type":"text","text":"Test answer with code:\n```python\nprint('hello')\n```"}]}]}
-EOF
-    python -c "
-import json
-with open('$TEST_FILE') as f:
-    data = json.load(f)
-    assert data['uuid'] == 'test-001'
-    assert len(data['messages']) == 2
-    print('✅ PASS: JSONL parsing works')
-" || echo "❌ FAIL: JSONL parsing error"
-    rm -f $TEST_FILE
+# Test timestamp indexes exist
+test_timestamp_indexes() {
+    echo "Testing timestamp indexes..."
+    python scripts/add-timestamp-indexes.py
+    echo "✅ Timestamp indexes updated"
 }
-# Test chunking
-test_chunking() {
-    echo "Testing message chunking..."
-    python -c "
-from scripts.import_conversations_unified import chunk_messages
-messages = [
-    {'role': 'human', 'content': 'Q1'},
-    {'role': 'assistant', 'content': 'A1'},
-    {'role': 'human', 'content': 'Q2'},
-    {'role': 'assistant', 'content': 'A2'},
-]
-chunks = list(chunk_messages(messages, chunk_size=3))
-if len(chunks) == 2:
-    print('✅ PASS: Chunking works correctly')
-else:
-    print(f'❌ FAIL: Expected 2 chunks, got {len(chunks)}')
-"
+# Test get_recent_work
+test_get_recent_work() {
+    echo "Testing get_recent_work..."
+    cat << 'EOF' > /tmp/test_recent_work.py
+import asyncio
+import sys
+import os
+sys.path.insert(0, 'mcp-server/src')
+os.environ['QDRANT_URL'] = 'http://localhost:6333'
+async def test():
+    from server import get_recent_work
+    class MockContext:
+        async def debug(self, msg): print(f"[DEBUG] {msg}")
+        async def report_progress(self, *args): pass
+    ctx = MockContext()
+    # Test no scope (should default to current project)
+    result1 = await get_recent_work(ctx, limit=3)
+    print("No scope result:", "PASS" if "conversation" in result1 else "FAIL")
+    # Test with scope='all'
+    result2 = await get_recent_work(ctx, limit=3, project='all')
+    print("All scope result:", "PASS" if "conversation" in result2 else "FAIL")
+    # Test with specific project
+    result3 = await get_recent_work(ctx, limit=3, project='claude-self-reflect')
+    print("Specific project:", "PASS" if "conversation" in result3 else "FAIL")
+asyncio.run(test())
+EOF
+    python /tmp/test_recent_work.py
 }
-# Test embedding generation
-test_embeddings() {
-    echo "Testing embedding generation..."
-    python -c "
+# Test search_by_recency
+test_search_by_recency() {
+    echo "Testing search_by_recency..."
+    cat << 'EOF' > /tmp/test_search_recency.py
+import asyncio
+import sys
 import os
-os.environ['PREFER_LOCAL_EMBEDDINGS'] = 'true'
-from fastembed import TextEmbedding
-model = TextEmbedding('sentence-transformers/all-MiniLM-L6-v2')
-embeddings = list(model.embed(['test text']))
-if len(embeddings[0]) == 384:
-    print('✅ PASS: Local embeddings work (384 dims)')
-else:
-    print(f'❌ FAIL: Wrong dimensions: {len(embeddings[0])}')
-"
-}
+sys.path.insert(0, 'mcp-server/src')
+os.environ['QDRANT_URL'] = 'http://localhost:6333'
-# Test Qdrant operations
-test_qdrant() {
-    echo "Testing Qdrant operations..."
-    python -c "
-from qdrant_client import QdrantClient
-client = QdrantClient('http://localhost:6333')
-collections = client.get_collections().collections
-if collections:
-    print(f'✅ PASS: Qdrant accessible ({len(collections)} collections)')
-else:
-    print('❌ FAIL: No Qdrant collections found')
-"
+async def test():
+    from server import search_by_recency
+    class MockContext:
+        async def debug(self, msg): print(f"[DEBUG] {msg}")
+    ctx = MockContext()
+    result = await search_by_recency(ctx, query="test", time_range="last week")
+    print("Search by recency:", "PASS" if "result" in result or "no_results" in result else "FAIL")
+asyncio.run(test())
+EOF
+    python /tmp/test_search_recency.py
 }
-# Run all tests
-test_jsonl_parsing
-test_chunking
-test_embeddings
-test_qdrant
-```
+# Test get_timeline
+test_get_timeline() {
+    echo "Testing get_timeline..."
+    cat << 'EOF' > /tmp/test_timeline.py
+import asyncio
+import sys
+import os
+sys.path.insert(0, 'mcp-server/src')
+os.environ['QDRANT_URL'] = 'http://localhost:6333'
-### 3. MCP Integration Test
-```bash
-#!/bin/bash
-echo "=== MCP INTEGRATION TEST ==="
-# Test search functionality
-test_mcp_search() {
-    echo "Testing MCP search..."
-    # This would be run in Claude Code
-    cat << 'EOF'
-To test in Claude Code:
-1. Search for any recent conversation topic
-2. Verify results have scores > 0.7
-3. Check that metadata includes files and tools
+async def test():
+    from server import get_timeline
+    class MockContext:
+        async def debug(self, msg): print(f"[DEBUG] {msg}")
+    ctx = MockContext()
+    result = await get_timeline(ctx, time_range="last month", granularity="week")
+    print("Timeline result:", "PASS" if "timeline" in result else "FAIL")
+asyncio.run(test())
 EOF
+    python /tmp/test_timeline.py
 }
-# Test search_by_file
-test_search_by_file() {
-    echo "Testing search_by_file..."
+# Test natural language time parsing
+test_temporal_parsing() {
+    echo "Testing temporal parsing..."
     python -c "
-# Simulate MCP search_by_file
-from qdrant_client import QdrantClient
-client = QdrantClient('http://localhost:6333')
-# Get collections with file metadata
-found_files = False
-for collection in client.get_collections().collections[:5]:
-    points = client.scroll(collection.name, limit=10)[0]
-    for point in points:
-        if 'files_analyzed' in point.payload:
-            found_files = True
-            break
-    if found_files:
-        break
-if found_files:
-    print('✅ PASS: File metadata available for search')
-else:
-    print('⚠️  WARN: No file metadata found (run delta-metadata-update.py)')
+from mcp_server.src.temporal_utils import TemporalParser
+parser = TemporalParser()
+tests = ['yesterday', 'last week', 'past 3 days']
+for expr in tests:
+    try:
+        start, end = parser.parse_time_expression(expr)
+        print(f'✅ {expr}: {start.date()} to {end.date()}')
+    except Exception as e:
+        print(f'❌ {expr}: {e}')
 "
 }
-# Test reflection storage
-test_reflection_storage() {
-    echo "Testing reflection storage..."
-    # This requires MCP server to be running
-    echo "Manual test in Claude Code:"
-    echo "1. Store a reflection with tags"
-    echo "2. Search for it immediately"
-    echo "3. Verify it's retrievable"
-}
-test_mcp_search
-test_search_by_file
-test_reflection_storage
+# Run all temporal tests
+test_timestamp_indexes
+test_get_recent_work
+test_search_by_recency
+test_get_timeline
+test_temporal_parsing
 ```
-### 4. Dual-Mode Testing with Auto-Restore
+### 3. CLI Tool Testing (Enhanced)
 ```bash
 #!/bin/bash
-# CRITICAL: This script ALWAYS restores to local mode on exit
+echo "=== CLI TOOL TESTING ==="
-echo "=== DUAL-MODE TESTING WITH AUTO-RESTORE ==="
-# Function to restore local state
-restore_local_state() {
-    echo "=== RESTORING 100% LOCAL STATE ==="
-    # Update .env
-    if [ -f .env ]; then
-        sed -i.bak 's/PREFER_LOCAL_EMBEDDINGS=false/PREFER_LOCAL_EMBEDDINGS=true/' .env
-        sed -i.bak 's/USE_VOYAGE=true/USE_VOYAGE=false/' .env
-    fi
+# Test CLI installation
+test_cli_installation() {
+    echo "Testing CLI installation..."
-    # Update MCP to use local
-    claude mcp remove claude-self-reflect 2>/dev/null
-    claude mcp add claude-self-reflect \
-        "$(pwd)/mcp-server/run-mcp.sh" \
-        -e QDRANT_URL="http://localhost:6333" \
-        -e PREFER_LOCAL_EMBEDDINGS="true" \
-        -s user
-    # Restart containers if needed
-    if docker ps | grep -q streaming-importer; then
-        docker-compose restart streaming-importer
+    # Check if installed globally
+    if command -v claude-self-reflect &> /dev/null; then
+        VERSION=$(claude-self-reflect --version 2>/dev/null || echo "unknown")
+        echo "✅ CLI installed globally (version: $VERSION)"
+    else
+        echo "❌ CLI not found in PATH"
     fi
-    echo "✅ System restored to 100% local state"
-}
-# Set trap to ALWAYS restore on exit
-trap restore_local_state EXIT INT TERM
-# Test local mode
-test_local_mode() {
-    echo "=== Testing Local Mode (FastEmbed) ==="
-    export PREFER_LOCAL_EMBEDDINGS=true
-    # Create test data
-    TEST_DIR="/tmp/test-local-$$"
-    mkdir -p "$TEST_DIR"
-    cat > "$TEST_DIR/test.jsonl" << 'EOF'
-{"type":"conversation","uuid":"local-test","messages":[{"role":"human","content":"Local mode test"}]}
-EOF
-    # Import and verify
-    python scripts/import-conversations-unified.py --file "$TEST_DIR/test.jsonl"
+    # Check package.json files
+    echo "Checking package files..."
+    FILES=(
+        "package.json"
+        "cli/package.json"
+        "cli/src/index.js"
+        "cli/src/setup-wizard.js"
+    )
-    # Check dimensions
-    COLLECTION=$(curl -s http://localhost:6333/collections | jq -r '.result.collections[] | select(.name | contains("_local")) | .name' | head -1)
-    if [ -n "$COLLECTION" ]; then
-        DIMS=$(curl -s "http://localhost:6333/collections/$COLLECTION" | jq '.result.config.params.vectors.size')
-        if [ "$DIMS" = "384" ]; then
-            echo "✅ PASS: Local mode uses 384 dimensions"
+    for file in "${FILES[@]}"; do
+        if [ -f "$file" ]; then
+            echo "✅ $file exists"
         else
-            echo "❌ FAIL: Wrong dimensions: $DIMS"
+            echo "❌ $file missing"
         fi
-    fi
-    rm -rf "$TEST_DIR"
+    done
 }
-# Test cloud mode (if available)
-test_cloud_mode() {
-    if [ ! -f .env ] || ! grep -q "VOYAGE_KEY=" .env; then
-        echo "⚠️  SKIP: No Voyage API key configured"
-        return
-    fi
-    echo "=== Testing Cloud Mode (Voyage AI) ==="
-    export PREFER_LOCAL_EMBEDDINGS=false
-    export VOYAGE_KEY=$(grep VOYAGE_KEY .env | cut -d= -f2)
+# Test CLI commands
+test_cli_commands() {
+    echo "Testing CLI commands..."
-    # Create test data
-    TEST_DIR="/tmp/test-voyage-$$"
-    mkdir -p "$TEST_DIR"
-    cat > "$TEST_DIR/test.jsonl" << 'EOF'
-{"type":"conversation","uuid":"voyage-test","messages":[{"role":"human","content":"Cloud mode test"}]}
-EOF
+    # Test status command
+    claude-self-reflect status 2>/dev/null && echo "✅ Status command works" || echo "❌ Status command failed"
-    # Import and verify
-    python scripts/import-conversations-unified.py --file "$TEST_DIR/test.jsonl"
+    # Test help
+    claude-self-reflect --help 2>/dev/null && echo "✅ Help works" || echo "❌ Help failed"
+}
+# Test npm packaging
+test_npm_packaging() {
+    echo "Testing npm packaging..."
-    # Check dimensions
-    COLLECTION=$(curl -s http://localhost:6333/collections | jq -r '.result.collections[] | select(.name | contains("_voyage")) | .name' | head -1)
-    if [ -n "$COLLECTION" ]; then
-        DIMS=$(curl -s "http://localhost:6333/collections/$COLLECTION" | jq '.result.config.params.vectors.size')
-        if [ "$DIMS" = "1024" ]; then
-            echo "✅ PASS: Cloud mode uses 1024 dimensions"
-        else
-            echo "❌ FAIL: Wrong dimensions: $DIMS"
-        fi
-    fi
+    # Check if publishable
+    npm pack --dry-run 2>&1 | grep -q "claude-self-reflect" && \
+        echo "✅ Package is publishable" || \
+        echo "❌ Package issues detected"
-    rm -rf "$TEST_DIR"
+    # Check dependencies
+    npm ls --depth=0 2>&1 | grep -q "UNMET" && \
+        echo "❌ Unmet dependencies" || \
+        echo "✅ Dependencies satisfied"
 }
-# Run tests
-test_local_mode
-test_cloud_mode
-# Trap ensures restoration even if tests fail
+test_cli_installation
+test_cli_commands
+test_npm_packaging
 ```
-### 5. Data Integrity Validation
+### 4. Import Pipeline Validation (Enhanced)
 ```bash
 #!/bin/bash
-echo "=== DATA INTEGRITY VALIDATION ==="
+echo "=== IMPORT PIPELINE VALIDATION ==="
-# Test no duplicates on re-import
-test_no_duplicates() {
-    echo "Testing duplicate prevention..."
+# Test unified importer
+test_unified_importer() {
+    echo "Testing unified importer..."
-    # Find a test file
+    # Find a test JSONL file
     TEST_FILE=$(find ~/.claude/projects -name "*.jsonl" -type f | head -1)
     if [ -z "$TEST_FILE" ]; then
-        echo "⚠️  SKIP: No test files available"
+        echo "⚠️  No test files available"
         return
     fi
-    # Get collection
-    PROJECT_DIR=$(dirname "$TEST_FILE")
-    PROJECT_NAME=$(basename "$PROJECT_DIR")
-    COLLECTION="${PROJECT_NAME}_local"
-    # Count before
-    COUNT_BEFORE=$(curl -s "http://localhost:6333/collections/$COLLECTION/points/count" | jq '.result.count')
+    # Test with limit
+    python scripts/import-conversations-unified.py --file "$TEST_FILE" --limit 1
-    # Force re-import
-    python scripts/import-conversations-unified.py --file "$TEST_FILE" --force
-    # Count after
-    COUNT_AFTER=$(curl -s "http://localhost:6333/collections/$COLLECTION/points/count" | jq '.result.count')
-    if [ "$COUNT_BEFORE" = "$COUNT_AFTER" ]; then
-        echo "✅ PASS: No duplicates created on re-import"
+    if [ $? -eq 0 ]; then
+        echo "✅ Unified importer works"
     else
-        echo "❌ FAIL: Duplicates detected ($COUNT_BEFORE -> $COUNT_AFTER)"
+        echo "❌ Unified importer failed"
     fi
 }
-# Test file locking
-test_file_locking() {
-    echo "Testing concurrent import safety..."
+# Test streaming importer
+test_streaming_importer() {
+    echo "Testing streaming importer..."
-    # Run parallel imports
-    python scripts/import-conversations-unified.py --limit 1 &
-    PID1=$!
-    python scripts/import-conversations-unified.py --limit 1 &
-    PID2=$!
-    wait $PID1 $PID2
-    if [ $? -eq 0 ]; then
-        echo "✅ PASS: Concurrent imports handled safely"
+    if docker ps | grep -q streaming-importer; then
+        # Check if processing
+        docker logs streaming-importer --tail 10 | grep -q "Processing" && \
+            echo "✅ Streaming importer active" || \
+            echo "⚠️  Streaming importer idle"
     else
-        echo "❌ FAIL: File locking issue detected"
+        echo "❌ Streaming importer not running"
     fi
 }
-# Test state persistence
-test_state_persistence() {
-    echo "Testing state file persistence..."
+# Test delta metadata update
+test_delta_metadata() {
+    echo "Testing delta metadata update..."
-    STATE_FILE="$HOME/.claude-self-reflect/config/imported-files.json"
-    if [ -f "$STATE_FILE" ]; then
-        # Check file is valid JSON
-        if jq empty "$STATE_FILE" 2>/dev/null; then
-            echo "✅ PASS: State file is valid JSON"
-        else
-            echo "❌ FAIL: State file corrupted"
-        fi
-    else
-        echo "⚠️  WARN: No state file found"
-    fi
+    DRY_RUN=true python scripts/delta-metadata-update.py 2>&1 | grep -q "would update" && \
+        echo "✅ Delta metadata updater works" || \
+        echo "❌ Delta metadata updater failed"
 }
-test_no_duplicates
-test_file_locking
-test_state_persistence
+test_unified_importer
+test_streaming_importer
+test_delta_metadata
 ```
-### 6. Performance Validation
+### 5. MCP Tools Comprehensive Test
 ```bash
 #!/bin/bash
-echo "=== PERFORMANCE VALIDATION ==="
+echo "=== MCP TOOLS COMPREHENSIVE TEST ==="
+# This should be run via Claude Code for actual MCP testing
+cat << 'EOF'
+To test all MCP tools in Claude Code:
+1. SEARCH TOOLS:
+   - mcp__claude-self-reflect__reflect_on_past("test query", limit=3)
+   - mcp__claude-self-reflect__quick_search("test")
+   - mcp__claude-self-reflect__search_summary("test")
+   - mcp__claude-self-reflect__search_by_file("server.py")
+   - mcp__claude-self-reflect__search_by_concept("testing")
+2. TEMPORAL TOOLS (NEW):
+   - mcp__claude-self-reflect__get_recent_work(limit=5)
+   - mcp__claude-self-reflect__get_recent_work(project="all")
+   - mcp__claude-self-reflect__search_by_recency("bug", time_range="last week")
+   - mcp__claude-self-reflect__get_timeline(time_range="last month", granularity="week")
+3. REFLECTION TOOLS:
+   - mcp__claude-self-reflect__store_reflection("Test insight", tags=["test"])
+   - mcp__claude-self-reflect__get_full_conversation("conversation-id")
+4. PAGINATION:
+   - mcp__claude-self-reflect__get_more_results("query", offset=3)
+   - mcp__claude-self-reflect__get_next_results("query", offset=3)
+Expected Results:
+- All tools should return valid XML/markdown responses
+- Search scores should be > 0.3 for relevant results
+- Temporal tools should respect project scoping
+- No errors or timeouts
+EOF
+```
-# Test import speed
-test_import_performance() {
-    echo "Testing import performance..."
-    START_TIME=$(date +%s)
-    TEST_FILE=$(find ~/.claude/projects -name "*.jsonl" -type f | head -1)
-    if [ -n "$TEST_FILE" ]; then
-        timeout 30 python scripts/import-conversations-unified.py --file "$TEST_FILE" --limit 1
-        END_TIME=$(date +%s)
-        DURATION=$((END_TIME - START_TIME))
+### 6. Docker Health Validation
+```bash
+#!/bin/bash
+echo "=== DOCKER HEALTH VALIDATION ==="
+# Check Qdrant health
+check_qdrant_health() {
+    echo "Checking Qdrant health..."
+    # Check if running
+    if docker ps | grep -q qdrant; then
+        # Check API responsive
+        curl -s http://localhost:6333/health | grep -q "ok" && \
+            echo "✅ Qdrant healthy" || \
+            echo "❌ Qdrant API not responding"
-        if [ $DURATION -lt 10 ]; then
-            echo "✅ PASS: Import completed in ${DURATION}s"
+        # Check disk usage
+        DISK_USAGE=$(docker exec qdrant df -h /qdrant/storage | tail -1 | awk '{print $5}' | sed 's/%//')
+        if [ "$DISK_USAGE" -lt 80 ]; then
+            echo "✅ Disk usage: ${DISK_USAGE}%"
         else
-            echo "⚠️  WARN: Import took ${DURATION}s (expected <10s)"
+            echo "⚠️  High disk usage: ${DISK_USAGE}%"
         fi
+    else
+        echo "❌ Qdrant not running"
     fi
 }
-# Test search performance
-test_search_performance() {
-    echo "Testing search performance..."
+# Check watcher health
+check_watcher_health() {
+    echo "Checking watcher health..."
-    python -c "
-import time
-from qdrant_client import QdrantClient
-from fastembed import TextEmbedding
-client = QdrantClient('http://localhost:6333')
-model = TextEmbedding('sentence-transformers/all-MiniLM-L6-v2')
-# Generate query embedding
-query_vec = list(model.embed(['test search query']))[0]
-# Time search across collections
-start = time.time()
-collections = client.get_collections().collections[:5]
-for col in collections:
-    if '_local' in col.name:
-        try:
-            client.search(col.name, query_vec, limit=5)
-        except:
-            pass
-elapsed = time.time() - start
-if elapsed < 1:
-    print(f'✅ PASS: Search completed in {elapsed:.2f}s')
-else:
-    print(f'⚠️  WARN: Search took {elapsed:.2f}s')
-"
+    WATCHER_NAME="claude-reflection-safe-watcher"
+    if docker ps | grep -q "$WATCHER_NAME"; then
+        # Check memory usage
+        MEM=$(docker stats --no-stream --format "{{.MemUsage}}" "$WATCHER_NAME" 2>/dev/null | cut -d'/' -f1 | sed 's/[^0-9.]//g')
+        if [ -n "$MEM" ]; then
+            echo "✅ Watcher running (Memory: ${MEM}MB)"
+        else
+            echo "⚠️  Watcher running but stats unavailable"
+        fi
+        # Check for errors in logs
+        ERROR_COUNT=$(docker logs "$WATCHER_NAME" --tail 100 2>&1 | grep -c ERROR)
+        if [ "$ERROR_COUNT" -eq 0 ]; then
+            echo "✅ No errors in recent logs"
+        else
+            echo "⚠️  Found $ERROR_COUNT errors in logs"
+        fi
+    else
+        echo "❌ Watcher not running"
+    fi
 }
-# Test memory usage
-test_memory_usage() {
-    echo "Testing memory usage..."
+# Check docker-compose status
+check_compose_status() {
+    echo "Checking docker-compose status..."
-    if docker ps | grep -q streaming-importer; then
-        MEM=$(docker stats --no-stream --format "{{.MemUsage}}" streaming-importer | cut -d'/' -f1 | sed 's/[^0-9.]//g')
-        # Note: Total includes ~180MB for FastEmbed model
-        if (( $(echo "$MEM < 300" | bc -l) )); then
-            echo "✅ PASS: Memory usage ${MEM}MB is acceptable"
-        else
-            echo "⚠️  WARN: High memory usage: ${MEM}MB"
-        fi
+    if [ -f docker-compose.yaml ]; then
+        # Validate compose file
+        docker-compose config --quiet 2>/dev/null && \
+            echo "✅ docker-compose.yaml valid" || \
+            echo "❌ docker-compose.yaml has errors"
+        # Check defined services
+        SERVICES=$(docker-compose config --services 2>/dev/null)
+        echo "Defined services: $SERVICES"
+    else
+        echo "❌ docker-compose.yaml not found"
     fi
 }
-test_import_performance
-test_search_performance
-test_memory_usage
+check_qdrant_health
+check_watcher_health
+check_compose_status
 ```
-### 7. Security Validation
+### 7. Modularization Readiness Check (NEW)
 ```bash
 #!/bin/bash
-echo "=== SECURITY VALIDATION ==="
+echo "=== MODULARIZATION READINESS CHECK ==="
-# Check for API key leaks
-check_api_key_security() {
-    echo "Checking for API key exposure..."
-    CHECKS=(
-        "docker logs qdrant 2>&1"
-        "docker logs streaming-importer 2>&1"
-        "find /tmp -name '*claude*' -type f 2>/dev/null"
-    )
-    EXPOSED=false
-    for check in "${CHECKS[@]}"; do
-        if eval "$check" | grep -q "VOYAGE_KEY=\|pa-"; then
-            echo "❌ FAIL: Potential API key exposure in: $check"
-            EXPOSED=true
-        fi
-    done
+# Analyze server.py for modularization
+analyze_server_py() {
+    echo "Analyzing server.py for modularization..."
-    if [ "$EXPOSED" = false ]; then
-        echo "✅ PASS: No API key exposure detected"
+    FILE="mcp-server/src/server.py"
+    if [ -f "$FILE" ]; then
+        # Count lines
+        LINES=$(wc -l < "$FILE")
+        echo "Total lines: $LINES"
+        # Count tools
+        TOOL_COUNT=$(grep -c "@mcp.tool()" "$FILE")
+        echo "MCP tools defined: $TOOL_COUNT"
+        # Count imports
+        IMPORT_COUNT=$(grep -c "^import\|^from" "$FILE")
+        echo "Import statements: $IMPORT_COUNT"
+        # Identify major sections
+        echo -e "\nMajor sections to extract:"
+        echo "- Temporal tools (get_recent_work, search_by_recency, get_timeline)"
+        echo "- Search tools (reflect_on_past, quick_search, etc.)"
+        echo "- Reflection tools (store_reflection, get_full_conversation)"
+        echo "- Embedding management (EmbeddingManager, generate_embedding)"
+        echo "- Decay logic (calculate_decay, apply_decay)"
+        echo "- Utils (ProjectResolver, normalize_project_name)"
+        # Check for circular dependencies
+        echo -e "\nChecking for potential circular dependencies..."
+        grep -q "from server import" "$FILE" && \
+            echo "⚠️  Potential circular imports detected" || \
+            echo "✅ No obvious circular imports"
+    else
+        echo "❌ server.py not found"
     fi
 }
-# Check file permissions
-check_file_permissions() {
-    echo "Checking file permissions..."
+# Check for existing modular files
+check_existing_modules() {
+    echo -e "\nChecking for existing modular files..."
+    MODULES=(
+        "temporal_utils.py"
+        "temporal_design.py"
+        "project_resolver.py"
+        "embedding_manager.py"
+    )
-    CONFIG_DIR="$HOME/.claude-self-reflect/config"
-    if [ -d "$CONFIG_DIR" ]; then
-        # Check for world-readable files
-        WORLD_READABLE=$(find "$CONFIG_DIR" -perm -004 -type f 2>/dev/null)
-        if [ -z "$WORLD_READABLE" ]; then
-            echo "✅ PASS: Config files properly secured"
+    for module in "${MODULES[@]}"; do
+        if [ -f "mcp-server/src/$module" ]; then
+            echo "✅ $module exists"
         else
-            echo "⚠️  WARN: World-readable files found"
+            echo "⚠️  $module not found (needs creation)"
         fi
-    fi
+    done
 }
-check_api_key_security
-check_file_permissions
+analyze_server_py
+check_existing_modules
 ```
-## Test Execution Workflow
-### Pre-Release Testing
+### 8. Performance & Memory Testing
 ```bash
 #!/bin/bash
-# Complete pre-release validation
-echo "=== PRE-RELEASE TEST SUITE ==="
-echo "Version: $(grep version package.json | cut -d'"' -f4)"
-echo "Date: $(date)"
-echo ""
+echo "=== PERFORMANCE & MEMORY TESTING ==="
-# 1. Backup current state
-echo "Step 1: Backing up current state..."
-mkdir -p ~/claude-reflect-backup-$(date +%Y%m%d-%H%M%S)
-docker exec qdrant qdrant-backup create
-# 2. Run all test suites
-echo "Step 2: Running test suites..."
-./test-system-health.sh
-./test-import-pipeline.sh
-./test-mcp-integration.sh
-./test-data-integrity.sh
-./test-performance.sh
-./test-security.sh
-# 3. Test both embedding modes
-echo "Step 3: Testing dual modes..."
-./test-dual-mode.sh
-# 4. Generate report
-echo "Step 4: Generating test report..."
-cat > test-report-$(date +%Y%m%d).md << EOF
-# Claude Self-Reflect Test Report
+# Test search performance with temporal tools
+test_search_performance() {
+    echo "Testing search performance..."
+    python -c "
+import time
+import asyncio
+import sys
+import os
+sys.path.insert(0, 'mcp-server/src')
+os.environ['QDRANT_URL'] = 'http://localhost:6333'
-## Summary
-- Date: $(date)
-- Version: $(grep version package.json | cut -d'"' -f4)
-- All Tests: PASS/FAIL
+async def test():
+    from server import get_recent_work, search_by_recency
+    class MockContext:
+        async def debug(self, msg): pass
+        async def report_progress(self, *args): pass
+    ctx = MockContext()
+    # Time get_recent_work
+    start = time.time()
+    await get_recent_work(ctx, limit=10)
+    recent_time = time.time() - start
+    # Time search_by_recency
+    start = time.time()
+    await search_by_recency(ctx, 'test', 'last week')
+    search_time = time.time() - start
+    print(f'get_recent_work: {recent_time:.2f}s')
+    print(f'search_by_recency: {search_time:.2f}s')
+    if recent_time < 2 and search_time < 2:
+        print('✅ Performance acceptable')
+    else:
+        print('⚠️  Performance needs optimization')
-## Test Results
-- System Health: ✅
-- Import Pipeline: ✅
-- MCP Integration: ✅
-- Data Integrity: ✅
-- Performance: ✅
-- Security: ✅
-- Dual Mode: ✅
+asyncio.run(test())
+"
+}
-## Certification
-System ready for release: YES/NO
-EOF
+# Test memory usage
+test_memory_usage() {
+    echo "Testing memory usage..."
+    # Check Python process memory
+    python -c "
+import psutil
+import os
+process = psutil.Process(os.getpid())
+mem_mb = process.memory_info().rss / 1024 / 1024
+print(f'Python process: {mem_mb:.1f}MB')
+"
+    # Check Docker container memory
+    for container in qdrant claude-reflection-safe-watcher; do
+        if docker ps | grep -q $container; then
+            MEM=$(docker stats --no-stream --format "{{.MemUsage}}" $container 2>/dev/null | cut -d'/' -f1 | sed 's/[^0-9.]//g')
+            echo "$container: ${MEM}MB"
+        fi
+    done
+}
-echo "✅ Pre-release testing complete"
+test_search_performance
+test_memory_usage
 ```
-### Fresh Installation Test
+### 9. Complete Test Report Generator
 ```bash
 #!/bin/bash
-# Simulate fresh installation
-echo "=== FRESH INSTALLATION TEST ==="
-# 1. Clean environment
-docker-compose down -v
-rm -rf data/ config/
-claude mcp remove claude-self-reflect
-# 2. Install from npm
-npm install -g claude-self-reflect@latest
-# 3. Run setup
-claude-self-reflect setup --local
-# 4. Wait for first import
-sleep 70
-# 5. Verify functionality
-curl -s http://localhost:6333/collections | jq '.result.collections'
+echo "=== GENERATING TEST REPORT ==="
-# 6. Test MCP
-echo "Manual step: Test MCP tools in Claude Code"
-```
-## Success Criteria
+REPORT_FILE="test-report-$(date +%Y%m%d-%H%M%S).md"
-### Core Functionality
-- [ ] Import pipeline processes all JSONL files
-- [ ] Embeddings generated correctly (384/1024 dims)
-- [ ] Qdrant stores vectors with proper metadata
-- [ ] MCP tools accessible and functional
-- [ ] Search returns relevant results (>0.7 scores)
+cat > "$REPORT_FILE" << EOF
+# Claude Self-Reflect Test Report
-### Reliability
-- [ ] No duplicates on re-import
-- [ ] File locking prevents corruption
-- [ ] State persists across restarts
-- [ ] Resume works after interruption
-- [ ] Retry logic handles transient failures
+## Test Summary
+- **Date**: $(date)
+- **Version**: $(grep version package.json | cut -d'"' -f4)
+- **Server.py Lines**: $(wc -l < mcp-server/src/server.py)
+- **Collections**: $(curl -s http://localhost:6333/collections | jq '.result.collections | length')
+## Feature Tests
+### Core Features
+- [ ] Import Pipeline: PASS/FAIL
+- [ ] MCP Tools (12): PASS/FAIL
+- [ ] Search Quality: PASS/FAIL
+- [ ] State Management: PASS/FAIL
+### v3.x Features
+- [ ] Temporal Tools (3): PASS/FAIL
+- [ ] get_recent_work: PASS/FAIL
+- [ ] search_by_recency: PASS/FAIL
+- [ ] get_timeline: PASS/FAIL
+- [ ] Timestamp Indexes: PASS/FAIL
+- [ ] Project Scoping: PASS/FAIL
+### Infrastructure
+- [ ] CLI Tool: PASS/FAIL
+- [ ] Docker Health: PASS/FAIL
+- [ ] Qdrant: PASS/FAIL
+- [ ] Watcher: PASS/FAIL
 ### Performance
-- [ ] Import <10s per file
-- [ ] Search <1s response time
-- [ ] Memory <300MB total (including model)
-- [ ] No memory leaks over time
-- [ ] Efficient batch processing
-### Security
-- [ ] No API keys in logs
-- [ ] Secure file permissions
-- [ ] No sensitive data exposure
-- [ ] Proper input validation
-- [ ] Safe concurrent access
-## Troubleshooting Guide
+- [ ] Search < 2s: PASS/FAIL
+- [ ] Import < 10s: PASS/FAIL
+- [ ] Memory < 500MB: PASS/FAIL
+### Code Quality
+- [ ] No Critical Bugs: PASS/FAIL
+- [ ] XML Injection Fixed: PASS/FAIL
+- [ ] Native Decay Fixed: PASS/FAIL
+- [ ] Modularization Ready: PASS/FAIL
+## Observations
+$(date): Test execution started
+$(date): All temporal tools tested
+$(date): Project scoping validated
+$(date): CLI packaging verified
+$(date): Docker health confirmed
+## Recommendations
+1. Fix critical bugs before release
+2. Complete modularization (2,835 lines → multiple modules)
+3. Add more comprehensive unit tests
+4. Update documentation for v3.x features
-### Common Issues and Solutions
-#### Import Not Working
-```bash
-# Check logs
-docker logs streaming-importer --tail 50
-# Verify paths
-ls -la ~/.claude/projects/
+## Certification
+**System Ready for Release**: YES/NO
-# Check permissions
-chmod -R 755 ~/.claude/projects/
+## Sign-off
+Tested by: claude-self-reflect-test agent
+Date: $(date)
+EOF
-# Force re-import
-rm ~/.claude-self-reflect/config/imported-files.json
-python scripts/import-conversations-unified.py
+echo "✅ Test report generated: $REPORT_FILE"
 ```
-#### Search Returns Poor Results
-```bash
-# Update metadata
-python scripts/delta-metadata-update.py
-# Check embedding mode
-grep PREFER_LOCAL_EMBEDDINGS .env
-# Verify collection dimensions
-curl http://localhost:6333/collections | jq
-```
+## Test Execution Protocol
-#### MCP Not Available
+### Run Complete Test Suite
 ```bash
-# Remove and re-add
-claude mcp remove claude-self-reflect
-claude mcp add claude-self-reflect /full/path/to/run-mcp.sh \
-    -e QDRANT_URL="http://localhost:6333" -s user
+#!/bin/bash
+# Master test runner
-# Restart Claude Code
-echo "Restart Claude Code manually"
-```
+echo "=== CLAUDE SELF-REFLECT COMPLETE TEST SUITE ==="
+echo "Starting at: $(date)"
+echo ""
-#### High Memory Usage
-```bash
-# Check for duplicate models
-ls -la ~/.cache/fastembed/
+# Create test results directory
+mkdir -p test-results-$(date +%Y%m%d)
+cd test-results-$(date +%Y%m%d)
-# Restart containers
-docker-compose restart
+# Run all test suites
+../test-system-health.sh > health.log 2>&1
+../test-temporal-tools.sh > temporal.log 2>&1
+../test-cli-tool.sh > cli.log 2>&1
+../test-import-pipeline.sh > import.log 2>&1
+../test-docker-health.sh > docker.log 2>&1
+../test-modularization.sh > modular.log 2>&1
+../test-performance.sh > performance.log 2>&1
-# Clear cache if needed
-rm -rf ~/.cache/fastembed/
-```
+# Generate final report
+../generate-test-report.sh
-## Final Certification
+echo ""
+echo "=== TEST SUITE COMPLETE ==="
+echo "Results in: test-results-$(date +%Y%m%d)/"
+echo "Report: test-report-*.md"
+```
-After running all tests, the system should:
-1. Process all conversations correctly
-2. Support both embedding modes
-3. Provide accurate search results
-4. Handle concurrent operations safely
-5. Maintain data integrity
-6. Perform within acceptable limits
-7. Secure sensitive information
-8. **ALWAYS be in local mode after testing**
+## Success Criteria
-Remember: The goal is a robust, reliable system that "just works" for users.
+### Must Pass
+- [ ] All 12 MCP tools functional
+- [ ] Temporal tools work with proper scoping
+- [ ] Timestamp indexes on all collections
+- [ ] CLI installs and runs globally
+- [ ] Docker containers healthy
+- [ ] No critical bugs (native decay, XML injection)
+- [ ] Search returns relevant results
+- [ ] Import pipeline processes files
+- [ ] State persists correctly
+### Should Pass
+- [ ] Performance within limits
+- [ ] Memory usage acceptable
+- [ ] Modularization plan approved
+- [ ] Documentation updated
+- [ ] All unit tests pass
+### Nice to Have
+- [ ] 100% test coverage
+- [ ] Zero warnings in logs
+- [ ] Sub-second search times
+## Final Notes
+This agent knows ALL features of Claude Self-Reflect including:
+- New temporal tools
+- Project scoping fixes
+- Timestamp indexing
+- 2,835-line server.py needing modularization
+- GPT-5 review recommendations
+- All test scripts and their purposes
+The agent will ALWAYS restore the system to local mode after testing and provide comprehensive reports suitable for release decisions.