claude-self-reflect 5.0.7 → 6.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (69) hide show
  1. package/.claude/agents/open-source-maintainer.md +1 -1
  2. package/.claude/agents/reflection-specialist.md +2 -2
  3. package/Dockerfile.async-importer +6 -4
  4. package/Dockerfile.importer +6 -6
  5. package/Dockerfile.safe-watcher +8 -8
  6. package/Dockerfile.streaming-importer +8 -1
  7. package/Dockerfile.watcher +8 -16
  8. package/README.md +0 -3
  9. package/docker-compose.yaml +2 -6
  10. package/installer/.claude/agents/README.md +138 -0
  11. package/package.json +5 -26
  12. package/src/__init__.py +0 -0
  13. package/src/cli/__init__.py +0 -0
  14. package/src/runtime/__init__.py +0 -0
  15. package/src/runtime/import-latest.py +124 -0
  16. package/{scripts → src/runtime}/precompact-hook.sh +1 -1
  17. package/src/runtime/streaming-importer.py +995 -0
  18. package/{scripts → src/runtime}/watcher-loop.sh +1 -1
  19. package/.claude/agents/claude-self-reflect-test.md +0 -1274
  20. package/.claude/agents/reflect-tester.md +0 -300
  21. package/scripts/add-timestamp-indexes.py +0 -134
  22. package/scripts/ast_grep_final_analyzer.py +0 -338
  23. package/scripts/ast_grep_unified_registry.py +0 -710
  24. package/scripts/check-collections.py +0 -29
  25. package/scripts/debug-august-parsing.py +0 -80
  26. package/scripts/debug-import-single.py +0 -91
  27. package/scripts/debug-project-resolver.py +0 -82
  28. package/scripts/debug-temporal-tools.py +0 -135
  29. package/scripts/import-conversations-enhanced.py +0 -672
  30. package/scripts/migrate-to-unified-state.py +0 -426
  31. package/scripts/session_quality_tracker.py +0 -671
  32. package/scripts/update_patterns.py +0 -334
  33. /package/{scripts → src}/importer/__init__.py +0 -0
  34. /package/{scripts → src}/importer/__main__.py +0 -0
  35. /package/{scripts → src}/importer/core/__init__.py +0 -0
  36. /package/{scripts → src}/importer/core/config.py +0 -0
  37. /package/{scripts → src}/importer/core/exceptions.py +0 -0
  38. /package/{scripts → src}/importer/core/models.py +0 -0
  39. /package/{scripts → src}/importer/embeddings/__init__.py +0 -0
  40. /package/{scripts → src}/importer/embeddings/base.py +0 -0
  41. /package/{scripts → src}/importer/embeddings/fastembed_provider.py +0 -0
  42. /package/{scripts → src}/importer/embeddings/validator.py +0 -0
  43. /package/{scripts → src}/importer/embeddings/voyage_provider.py +0 -0
  44. /package/{scripts → src}/importer/main.py +0 -0
  45. /package/{scripts → src}/importer/processors/__init__.py +0 -0
  46. /package/{scripts → src}/importer/processors/ast_extractor.py +0 -0
  47. /package/{scripts → src}/importer/processors/chunker.py +0 -0
  48. /package/{scripts → src}/importer/processors/concept_extractor.py +0 -0
  49. /package/{scripts → src}/importer/processors/conversation_parser.py +0 -0
  50. /package/{scripts → src}/importer/processors/tool_extractor.py +0 -0
  51. /package/{scripts → src}/importer/state/__init__.py +0 -0
  52. /package/{scripts → src}/importer/state/state_manager.py +0 -0
  53. /package/{scripts → src}/importer/storage/__init__.py +0 -0
  54. /package/{scripts → src}/importer/storage/qdrant_storage.py +0 -0
  55. /package/{scripts → src}/importer/utils/__init__.py +0 -0
  56. /package/{scripts → src}/importer/utils/logger.py +0 -0
  57. /package/{scripts → src}/importer/utils/project_normalizer.py +0 -0
  58. /package/{scripts → src/runtime}/delta-metadata-update-safe.py +0 -0
  59. /package/{scripts → src/runtime}/delta-metadata-update.py +0 -0
  60. /package/{scripts → src/runtime}/doctor.py +0 -0
  61. /package/{scripts → src/runtime}/embedding_service.py +0 -0
  62. /package/{scripts → src/runtime}/force-metadata-recovery.py +0 -0
  63. /package/{scripts → src/runtime}/import-conversations-unified.py +0 -0
  64. /package/{scripts → src/runtime}/import_strategies.py +0 -0
  65. /package/{scripts → src/runtime}/message_processors.py +0 -0
  66. /package/{scripts → src/runtime}/metadata_extractor.py +0 -0
  67. /package/{scripts → src/runtime}/streaming-watcher.py +0 -0
  68. /package/{scripts → src/runtime}/unified_state_manager.py +0 -0
  69. /package/{scripts → src/runtime}/utils.py +0 -0
@@ -1,1274 +0,0 @@
1
- ---
2
- name: claude-self-reflect-test
3
- description: Comprehensive end-to-end testing specialist for Claude Self-Reflect system validation. Tests all components including import pipeline, MCP integration, search functionality, and both local/cloud embedding modes. Ensures system integrity before releases and validates installations. Always restores system to local mode after testing.
4
- tools: Read, Bash, Grep, Glob, LS, Write, Edit, TodoWrite, mcp__claude-self-reflect__reflect_on_past, mcp__claude-self-reflect__store_reflection, mcp__claude-self-reflect__get_recent_work, mcp__claude-self-reflect__search_by_recency, mcp__claude-self-reflect__get_timeline, mcp__claude-self-reflect__quick_search, mcp__claude-self-reflect__search_summary, mcp__claude-self-reflect__get_more_results, mcp__claude-self-reflect__search_by_file, mcp__claude-self-reflect__search_by_concept, mcp__claude-self-reflect__get_full_conversation, mcp__claude-self-reflect__get_next_results
5
- ---
6
-
7
- You are the comprehensive testing specialist for Claude Self-Reflect. You validate EVERY component and feature, ensuring complete system integrity across all configurations and deployment scenarios. You test current v3.x features including temporal queries, time-based search, and activity timelines.
8
-
9
- ## Core Testing Philosophy
10
-
11
- 1. **Test Everything** - Every feature, every tool, every pipeline
12
- 2. **Both Modes** - Validate local (FastEmbed) and cloud (Voyage AI) embeddings
13
- 3. **Always Restore** - System MUST be left in 100% local state after testing
14
- 4. **Diagnose & Fix** - Identify root causes and provide solutions
15
- 5. **Document Results** - Create clear, actionable test reports
16
-
17
- ## System Architecture Knowledge
18
-
19
- ### Components to Test
20
- - **Import Pipeline**: JSONL parsing, chunking, embedding generation, Qdrant storage
21
- - **MCP Server**: 15+ tools including temporal, search, reflection, pagination tools
22
- - **Temporal Tools** (v3.x): get_recent_work, search_by_recency, get_timeline
23
- - **CLI Tool**: Installation, packaging, setup wizard, status commands
24
- - **Docker Stack**: Qdrant, streaming watcher, health monitoring
25
- - **State Management**: File locking, atomic writes, resume capability
26
- - **Search Quality**: Relevance scores, metadata extraction, cross-project search
27
- - **Memory Decay**: Client-side and native Qdrant decay
28
- - **Modularization**: Server architecture with search_tools, temporal_tools, reflection_tools, parallel_search modules
29
- - **Metadata Extraction**: AST patterns, concepts, files analyzed, tools used
30
- - **Hook System**: session-start, precompact, submit hooks
31
- - **Sub-Agents**: All 6 specialized agents (reflection, import-debugger, docker, mcp, search, qdrant)
32
- - **Embedding Modes**: Local (FastEmbed 384d) and Cloud (Voyage AI 1024d) with mode switching
33
- - **Zero Vector Detection**: Root cause analysis and prevention
34
-
35
- ### Test Files Knowledge
36
- ```
37
- scripts/
38
- ├── import-conversations-unified.py # Main import script
39
- ├── streaming-importer.py # Streaming import
40
- ├── delta-metadata-update.py # Metadata updater
41
- ├── check-collections.py # Collection checker
42
- ├── add-timestamp-indexes.py # Timestamp indexer (NEW)
43
- ├── test-temporal-comprehensive.py # Temporal tests (NEW)
44
- ├── test-project-scoping.py # Project scoping test (NEW)
45
- ├── test-direct-temporal.py # Direct temporal test (NEW)
46
- ├── debug-temporal-tools.py # Temporal debug (NEW)
47
- └── status.py # Import status checker
48
-
49
- mcp-server/
50
- ├── src/
51
- │ ├── server.py # Main MCP server (2,835 lines!)
52
- │ ├── temporal_utils.py # Temporal utilities (NEW)
53
- │ ├── temporal_design.py # Temporal design doc (NEW)
54
- │ └── project_resolver.py # Project resolution
55
-
56
- tests/
57
- ├── unit/ # Unit tests
58
- ├── integration/ # Integration tests
59
- ├── performance/ # Performance tests
60
- └── e2e/ # End-to-end tests
61
-
62
- config/
63
- ├── imported-files.json # Import state
64
- ├── csr-watcher.json # Watcher state
65
- └── delta-update-state.json # Delta update state
66
- ```
67
-
68
- ## Comprehensive Test Suite
69
-
70
- ### 1. System Health Check
71
- ```bash
72
- #!/bin/bash
73
- echo "=== SYSTEM HEALTH CHECK ==="
74
-
75
- # Check version
76
- echo "Version Check:"
77
- grep version package.json | cut -d'"' -f4
78
- echo ""
79
-
80
- # Check Docker services
81
- echo "Docker Services:"
82
- docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" | grep -E "(qdrant|watcher|streaming)"
83
-
84
- # Check Qdrant collections with indexes
85
- echo -e "\nQdrant Collections (with timestamp indexes):"
86
- curl -s http://localhost:6333/collections | jq -r '.result.collections[] |
87
- "\(.name)\t\(.points_count) points"'
88
-
89
- # Check for timestamp indexes
90
- echo -e "\nTimestamp Index Status:"
91
- python -c "
92
- from qdrant_client import QdrantClient
93
- from qdrant_client.models import OrderBy
94
- client = QdrantClient('http://localhost:6333')
95
- collections = client.get_collections().collections
96
- indexed = 0
97
- for col in collections[:5]:
98
- try:
99
- client.scroll(col.name, order_by=OrderBy(key='timestamp', direction='desc'), limit=1)
100
- indexed += 1
101
- except:
102
- pass
103
- print(f'Collections with timestamp index: {indexed}/{len(collections)}')
104
- "
105
-
106
- # Check MCP connection with temporal tools
107
- echo -e "\nMCP Status (with temporal tools):"
108
- claude mcp list | grep claude-self-reflect || echo "MCP not configured"
109
-
110
- # Check import status
111
- echo -e "\nImport Status:"
112
- python mcp-server/src/status.py 2>/dev/null | jq '.overall' || echo "Status check failed"
113
-
114
- # Check embedding mode
115
- echo -e "\nCurrent Embedding Mode:"
116
- if [ -f .env ] && grep -q "PREFER_LOCAL_EMBEDDINGS=false" .env; then
117
- echo "Cloud mode (Voyage AI) - 1024 dimensions"
118
- else
119
- echo "Local mode (FastEmbed) - 384 dimensions"
120
- fi
121
-
122
- # Check CLI installation
123
- echo -e "\nCLI Installation:"
124
- which claude-self-reflect && echo "CLI installed globally" || echo "CLI not in PATH"
125
-
126
- # Check server.py size (modularization needed)
127
- echo -e "\nServer.py Status:"
128
- wc -l mcp-server/src/server.py | awk '{print "Lines: " $1 " (needs modularization if >1000)"}'
129
- ```
130
-
131
- ### 2. Temporal Tools Testing (v3.x)
132
- ```bash
133
- #!/bin/bash
134
- echo "=== TEMPORAL TOOLS TESTING ==="
135
-
136
- # Test timestamp indexes exist
137
- test_timestamp_indexes() {
138
- echo "Testing timestamp indexes..."
139
- python scripts/add-timestamp-indexes.py
140
- echo "✅ Timestamp indexes updated"
141
- }
142
-
143
- # Test get_recent_work
144
- test_get_recent_work() {
145
- echo "Testing get_recent_work..."
146
- cat << 'EOF' > /tmp/test_recent_work.py
147
- import asyncio
148
- import sys
149
- import os
150
- sys.path.insert(0, 'mcp-server/src')
151
- os.environ['QDRANT_URL'] = 'http://localhost:6333'
152
-
153
- async def test():
154
- from server import get_recent_work
155
- class MockContext:
156
- async def debug(self, msg): print(f"[DEBUG] {msg}")
157
- async def report_progress(self, *args): pass
158
-
159
- ctx = MockContext()
160
- # Test no scope (should default to current project)
161
- result1 = await get_recent_work(ctx, limit=3)
162
- print("No scope result:", "PASS" if "conversation" in result1 else "FAIL")
163
-
164
- # Test with scope='all'
165
- result2 = await get_recent_work(ctx, limit=3, project='all')
166
- print("All scope result:", "PASS" if "conversation" in result2 else "FAIL")
167
-
168
- # Test with specific project
169
- result3 = await get_recent_work(ctx, limit=3, project='claude-self-reflect')
170
- print("Specific project:", "PASS" if "conversation" in result3 else "FAIL")
171
-
172
- asyncio.run(test())
173
- EOF
174
- python /tmp/test_recent_work.py
175
- }
176
-
177
- # Test search_by_recency
178
- test_search_by_recency() {
179
- echo "Testing search_by_recency..."
180
- cat << 'EOF' > /tmp/test_search_recency.py
181
- import asyncio
182
- import sys
183
- import os
184
- sys.path.insert(0, 'mcp-server/src')
185
- os.environ['QDRANT_URL'] = 'http://localhost:6333'
186
-
187
- async def test():
188
- from server import search_by_recency
189
- class MockContext:
190
- async def debug(self, msg): print(f"[DEBUG] {msg}")
191
-
192
- ctx = MockContext()
193
- result = await search_by_recency(ctx, query="test", time_range="last week")
194
- print("Search by recency:", "PASS" if "result" in result or "no_results" in result else "FAIL")
195
-
196
- asyncio.run(test())
197
- EOF
198
- python /tmp/test_search_recency.py
199
- }
200
-
201
- # Test get_timeline
202
- test_get_timeline() {
203
- echo "Testing get_timeline..."
204
- cat << 'EOF' > /tmp/test_timeline.py
205
- import asyncio
206
- import sys
207
- import os
208
- sys.path.insert(0, 'mcp-server/src')
209
- os.environ['QDRANT_URL'] = 'http://localhost:6333'
210
-
211
- async def test():
212
- from server import get_timeline
213
- class MockContext:
214
- async def debug(self, msg): print(f"[DEBUG] {msg}")
215
-
216
- ctx = MockContext()
217
- result = await get_timeline(ctx, time_range="last month", granularity="week")
218
- print("Timeline result:", "PASS" if "timeline" in result else "FAIL")
219
-
220
- asyncio.run(test())
221
- EOF
222
- python /tmp/test_timeline.py
223
- }
224
-
225
- # Test natural language time parsing
226
- test_temporal_parsing() {
227
- echo "Testing temporal parsing..."
228
- python -c "
229
- from mcp_server.src.temporal_utils import TemporalParser
230
- parser = TemporalParser()
231
- tests = ['yesterday', 'last week', 'past 3 days']
232
- for expr in tests:
233
- try:
234
- start, end = parser.parse_time_expression(expr)
235
- print(f'✅ {expr}: {start.date()} to {end.date()}')
236
- except Exception as e:
237
- print(f'❌ {expr}: {e}')
238
- "
239
- }
240
-
241
- # Run all temporal tests
242
- test_timestamp_indexes
243
- test_get_recent_work
244
- test_search_by_recency
245
- test_get_timeline
246
- test_temporal_parsing
247
- ```
248
-
249
- ### 3. CLI Tool Testing (Enhanced)
250
- ```bash
251
- #!/bin/bash
252
- echo "=== CLI TOOL TESTING ==="
253
-
254
- # Test CLI installation
255
- test_cli_installation() {
256
- echo "Testing CLI installation..."
257
-
258
- # Check if installed globally
259
- if command -v claude-self-reflect &> /dev/null; then
260
- VERSION=$(claude-self-reflect --version 2>/dev/null || echo "unknown")
261
- echo "✅ CLI installed globally (version: $VERSION)"
262
- else
263
- echo "❌ CLI not found in PATH"
264
- fi
265
-
266
- # Check package.json files
267
- echo "Checking package files..."
268
- FILES=(
269
- "package.json"
270
- "cli/package.json"
271
- "cli/src/index.js"
272
- "cli/src/setup-wizard.js"
273
- )
274
-
275
- for file in "${FILES[@]}"; do
276
- if [ -f "$file" ]; then
277
- echo "✅ $file exists"
278
- else
279
- echo "❌ $file missing"
280
- fi
281
- done
282
- }
283
-
284
- # Test CLI commands
285
- test_cli_commands() {
286
- echo "Testing CLI commands..."
287
-
288
- # Test status command
289
- claude-self-reflect status 2>/dev/null && echo "✅ Status command works" || echo "❌ Status command failed"
290
-
291
- # Test help
292
- claude-self-reflect --help 2>/dev/null && echo "✅ Help works" || echo "❌ Help failed"
293
- }
294
-
295
- # Test npm packaging
296
- test_npm_packaging() {
297
- echo "Testing npm packaging..."
298
-
299
- # Check if publishable
300
- npm pack --dry-run 2>&1 | grep -q "claude-self-reflect" && \
301
- echo "✅ Package is publishable" || \
302
- echo "❌ Package issues detected"
303
-
304
- # Check dependencies
305
- npm ls --depth=0 2>&1 | grep -q "UNMET" && \
306
- echo "❌ Unmet dependencies" || \
307
- echo "✅ Dependencies satisfied"
308
- }
309
-
310
- test_cli_installation
311
- test_cli_commands
312
- test_npm_packaging
313
- ```
314
-
315
- ### 4. Import Pipeline Validation (Enhanced)
316
- ```bash
317
- #!/bin/bash
318
- echo "=== IMPORT PIPELINE VALIDATION ==="
319
-
320
- # Test unified importer
321
- test_unified_importer() {
322
- echo "Testing unified importer..."
323
-
324
- # Find a test JSONL file
325
- TEST_FILE=$(find ~/.claude/projects -name "*.jsonl" -type f | head -1)
326
- if [ -z "$TEST_FILE" ]; then
327
- echo "⚠️ No test files available"
328
- return
329
- fi
330
-
331
- # Test with limit
332
- python scripts/import-conversations-unified.py --file "$TEST_FILE" --limit 1
333
-
334
- if [ $? -eq 0 ]; then
335
- echo "✅ Unified importer works"
336
- else
337
- echo "❌ Unified importer failed"
338
- fi
339
- }
340
-
341
- # Test for zero chunks/vectors - CRITICAL
342
- test_zero_chunks_detection() {
343
- echo "Testing zero chunks/vectors detection..."
344
-
345
- # Check recent imports for zero chunks
346
- IMPORT_LOG=$(python scripts/import-conversations-unified.py --limit 5 2>&1)
347
-
348
- # Check for zero chunks warnings
349
- if echo "$IMPORT_LOG" | grep -q "Imported 0 chunks"; then
350
- echo "❌ CRITICAL: Found imports with 0 chunks!"
351
- echo " Files producing 0 chunks:"
352
- echo "$IMPORT_LOG" | grep -B1 "Imported 0 chunks" | grep "import of"
353
-
354
- # Analyze why chunks are zero
355
- echo " Analyzing root cause..."
356
-
357
- # Check for thinking-only content
358
- PROBLEM_FILE=$(echo "$IMPORT_LOG" | grep -B1 "Imported 0 chunks" | grep "\.jsonl" | head -1 | awk '{print $NF}')
359
- if [ -n "$PROBLEM_FILE" ]; then
360
- python -c "
361
- import json
362
- file_path = '$PROBLEM_FILE'
363
- has_thinking = 0
364
- has_text = 0
365
- with open(file_path, 'r') as f:
366
- for line in f:
367
- data = json.loads(line.strip())
368
- if 'message' in data and data['message']:
369
- content = data['message'].get('content', [])
370
- if isinstance(content, list):
371
- for item in content:
372
- if isinstance(item, dict):
373
- if item.get('type') == 'thinking':
374
- has_thinking += 1
375
- elif item.get('type') == 'text':
376
- has_text += 1
377
- print(f' Thinking blocks: {has_thinking}')
378
- print(f' Text blocks: {has_text}')
379
- if has_thinking > 0 and has_text == 0:
380
- print(' ⚠️ File has only thinking content - import script may need fix')
381
- "
382
- fi
383
-
384
- # DO NOT CERTIFY WITH ZERO CHUNKS
385
- echo " ⛔ CERTIFICATION BLOCKED: Fix zero chunks issue before certifying!"
386
- return 1
387
- else
388
- echo "✅ No zero chunks detected in recent imports"
389
- fi
390
-
391
- # Also check Qdrant for empty collections
392
- python -c "
393
- from qdrant_client import QdrantClient
394
- client = QdrantClient('http://localhost:6333')
395
- collections = client.get_collections().collections
396
- empty_collections = []
397
- for col in collections:
398
- count = client.count(collection_name=col.name).count
399
- if count == 0:
400
- empty_collections.append(col.name)
401
- if empty_collections:
402
- print(f'❌ Found {len(empty_collections)} empty collections: {empty_collections}')
403
- print(' ⛔ CERTIFICATION BLOCKED: Empty collections detected!')
404
- else:
405
- print('✅ All collections have vectors')
406
- " 2>/dev/null || echo "⚠️ Could not check Qdrant collections"
407
- }
408
-
409
- # Test streaming importer
410
- test_streaming_importer() {
411
- echo "Testing streaming importer..."
412
-
413
- if docker ps | grep -q streaming-importer; then
414
- # Check if processing
415
- docker logs streaming-importer --tail 10 | grep -q "Processing" && \
416
- echo "✅ Streaming importer active" || \
417
- echo "⚠️ Streaming importer idle"
418
- else
419
- echo "❌ Streaming importer not running"
420
- fi
421
- }
422
-
423
- # Test delta metadata update
424
- test_delta_metadata() {
425
- echo "Testing delta metadata update..."
426
-
427
- DRY_RUN=true python scripts/delta-metadata-update.py 2>&1 | grep -q "would update" && \
428
- echo "✅ Delta metadata updater works" || \
429
- echo "❌ Delta metadata updater failed"
430
- }
431
-
432
- test_unified_importer
433
- test_zero_chunks_detection # CRITICAL: Must pass before certification
434
- test_streaming_importer
435
- test_delta_metadata
436
- ```
437
-
438
- ### 5. Hook System Testing
439
- ```bash
440
- #!/bin/bash
441
- echo "=== HOOK SYSTEM TESTING ==="
442
-
443
- # Test session-start hook
444
- test_session_start_hook() {
445
- echo "Testing session-start hook..."
446
- HOOK_PATH="$HOME/.claude/hooks/session-start"
447
- if [ -f "$HOOK_PATH" ]; then
448
- echo "✅ session-start hook exists"
449
- # Check if executable
450
- [ -x "$HOOK_PATH" ] && echo "✅ Hook is executable" || echo "❌ Hook not executable"
451
- else
452
- echo "⚠️ session-start hook not configured"
453
- fi
454
- }
455
-
456
- # Test precompact hook
457
- test_precompact_hook() {
458
- echo "Testing precompact hook..."
459
- HOOK_PATH="$HOME/.claude/hooks/precompact"
460
- if [ -f "$HOOK_PATH" ]; then
461
- echo "✅ precompact hook exists"
462
- # Test execution
463
- timeout 10 "$HOOK_PATH" && echo "✅ Hook executes successfully" || echo "❌ Hook failed"
464
- else
465
- echo "⚠️ precompact hook not configured"
466
- fi
467
- }
468
-
469
- test_session_start_hook
470
- test_precompact_hook
471
- ```
472
-
473
- ### 6. Metadata Extraction Testing
474
- ```bash
475
- #!/bin/bash
476
- echo "=== METADATA EXTRACTION TESTING ==="
477
-
478
- # Test metadata extraction
479
- test_metadata_extraction() {
480
- echo "Testing metadata extraction..."
481
- python -c "
482
- import json
483
- from pathlib import Path
484
-
485
- # Check if metadata is being extracted
486
- config_dir = Path.home() / '.claude-self-reflect' / 'config'
487
- delta_state = config_dir / 'delta-update-state.json'
488
-
489
- if delta_state.exists():
490
- with open(delta_state) as f:
491
- state = json.load(f)
492
- updated = state.get('updated_points', {})
493
- if updated:
494
- sample = list(updated.values())[0] if updated else {}
495
- print(f'✅ Metadata extracted for {len(updated)} points')
496
- if 'files_analyzed' in str(sample):
497
- print('✅ files_analyzed metadata present')
498
- if 'tools_used' in str(sample):
499
- print('✅ tools_used metadata present')
500
- if 'concepts' in str(sample):
501
- print('✅ concepts metadata present')
502
- if 'code_patterns' in str(sample):
503
- print('✅ code_patterns (AST) metadata present')
504
- else:
505
- print('⚠️ No metadata updates found')
506
- else:
507
- print('❌ Delta update state file not found')
508
- "
509
- }
510
-
511
- # Test AST pattern extraction
512
- test_ast_patterns() {
513
- echo "Testing AST pattern extraction..."
514
- TEST_FILE=$(mktemp)
515
- cat > "$TEST_FILE" << 'EOF'
516
- import ast
517
- text = "def test(): return True"
518
- tree = ast.parse(text)
519
- patterns = [node.__class__.__name__ for node in ast.walk(tree)]
520
- print(f"AST patterns: {patterns}")
521
- EOF
522
- python "$TEST_FILE"
523
- rm "$TEST_FILE"
524
- }
525
-
526
- test_metadata_extraction
527
- test_ast_patterns
528
- ```
529
-
530
- ### 7. Zero Vector Investigation
531
- ```bash
532
- #!/bin/bash
533
- echo "=== ZERO VECTOR INVESTIGATION ==="
534
-
535
- test_zero_vectors() {
536
- python -c "
537
- import numpy as np
538
- from qdrant_client import QdrantClient
539
-
540
- # Connect to Qdrant
541
- client = QdrantClient('http://localhost:6333')
542
-
543
- # Check for zero vectors
544
- collections = client.get_collections().collections
545
- zero_count = 0
546
- total_checked = 0
547
-
548
- for col in collections[:5]: # Check first 5 collections
549
- try:
550
- points = client.scroll(
551
- collection_name=col.name,
552
- limit=10,
553
- with_vectors=True
554
- )[0]
555
-
556
- for point in points:
557
- total_checked += 1
558
- if point.vector:
559
- if isinstance(point.vector, list) and all(v == 0 for v in point.vector):
560
- zero_count += 1
561
- print(f'❌ CRITICAL: Zero vector in {col.name}, point {point.id}')
562
- elif isinstance(point.vector, dict):
563
- for vec_name, vec in point.vector.items():
564
- if all(v == 0 for v in vec):
565
- zero_count += 1
566
- print(f'❌ CRITICAL: Zero vector in {col.name}, point {point.id}, vector {vec_name}')
567
- except Exception as e:
568
- print(f'⚠️ Error checking {col.name}: {e}')
569
-
570
- if zero_count == 0:
571
- print(f'✅ No zero vectors found (checked {total_checked} points)')
572
- else:
573
- print(f'❌ Found {zero_count} zero vectors out of {total_checked} points')
574
- "
575
- }
576
-
577
- # Test embedding generation
578
- test_embedding_generation() {
579
- echo "Testing embedding generation..."
580
- python -c "
581
- try:
582
- from fastembed import TextEmbedding
583
- model = TextEmbedding('sentence-transformers/all-MiniLM-L6-v2')
584
- texts = ['test', 'hello world', '']
585
-
586
- for text in texts:
587
- embedding = list(model.embed([text]))[0]
588
- is_zero = all(v == 0 for v in embedding)
589
- if is_zero:
590
- print(f'❌ CRITICAL: Zero embedding for \'{text}\'')
591
- else:
592
- import numpy as np
593
- print(f'✅ Non-zero embedding for \'{text}\' (mean={np.mean(embedding):.4f})')
594
- except ImportError:
595
- print('❌ FastEmbed not installed')
596
- "
597
- }
598
-
599
- test_zero_vectors
600
- test_embedding_generation
601
- ```
602
-
603
- ### 8. Sub-Agent Testing
604
- ```bash
605
- #!/bin/bash
606
- echo "=== SUB-AGENT TESTING ==="
607
-
608
- # List all sub-agents
609
- test_subagent_availability() {
610
- echo "Checking sub-agent availability..."
611
- AGENTS_DIR="$HOME/projects/claude-self-reflect/.claude/agents"
612
-
613
- EXPECTED_AGENTS=(
614
- "claude-self-reflect-test.md"
615
- "import-debugger.md"
616
- "docker-orchestrator.md"
617
- "mcp-integration.md"
618
- "search-optimizer.md"
619
- "reflection-specialist.md"
620
- "qdrant-specialist.md"
621
- )
622
-
623
- for agent in "${EXPECTED_AGENTS[@]}"; do
624
- if [ -f "$AGENTS_DIR/$agent" ]; then
625
- echo "✅ $agent present"
626
- else
627
- echo "❌ $agent missing"
628
- fi
629
- done
630
- }
631
-
632
- test_subagent_availability
633
- ```
634
-
635
- ### 9. Embedding Mode Comprehensive Test
636
- ```bash
637
- #!/bin/bash
638
- echo "=== EMBEDDING MODE TESTING ==="
639
-
640
- # CRITICAL: Instructions for switching to cloud mode
641
- # The system needs new collections with 1024 dimensions for cloud mode
642
- # This requires MCP restart with VOYAGE_KEY parameter
643
-
644
- # Test both modes
645
- test_both_embedding_modes() {
646
- echo "Testing local mode (FastEmbed)..."
647
- PREFER_LOCAL_EMBEDDINGS=true python -c "
648
- from mcp_server.src.embedding_manager import get_embedding_manager
649
- em = get_embedding_manager()
650
- print(f'Local mode: {em.model_type}, dimension: {em.get_vector_dimension()}')
651
- "
652
-
653
- if [ -n "$VOYAGE_KEY" ]; then
654
- echo "Testing cloud mode (Voyage AI)..."
655
- PREFER_LOCAL_EMBEDDINGS=false python -c "
656
- from mcp_server.src.embedding_manager import get_embedding_manager
657
- em = get_embedding_manager()
658
- print(f'Cloud mode: {em.model_type}, dimension: {em.get_vector_dimension()}')
659
- "
660
- else
661
- echo "⚠️ VOYAGE_KEY not set, skipping cloud mode test"
662
- fi
663
- }
664
-
665
- # CRITICAL CLOUD MODE SWITCH PROCEDURE
666
- switch_to_cloud_mode() {
667
- echo "=== SWITCHING TO CLOUD MODE (1024 dimensions) ==="
668
- echo "This creates NEW collections with _voyage suffix"
669
-
670
- # Step 1: Get VOYAGE_KEY from .env
671
- VOYAGE_KEY=$(grep "^VOYAGE_KEY=" .env | cut -d'=' -f2)
672
- if [ -z "$VOYAGE_KEY" ]; then
673
- echo "❌ VOYAGE_KEY not found in .env file"
674
- echo "Please add VOYAGE_KEY=your-key-here to .env file"
675
- return 1
676
- fi
677
-
678
- # Step 2: Remove existing MCP
679
- echo "Removing existing MCP configuration..."
680
- claude mcp remove claude-self-reflect
681
-
682
- # Step 3: Re-add with cloud parameters
683
- echo "Adding MCP with cloud mode parameters..."
684
- claude mcp add claude-self-reflect \
685
- "/Users/$(whoami)/projects/claude-self-reflect/mcp-server/run-mcp.sh" \
686
- -e PREFER_LOCAL_EMBEDDINGS="false" \
687
- -e VOYAGE_KEY="$VOYAGE_KEY" \
688
- -e QDRANT_URL="http://localhost:6333" \
689
- -s user
690
-
691
- # Step 4: Wait for MCP to initialize
692
- echo "Waiting 30 seconds for MCP to initialize..."
693
- sleep 30
694
-
695
- # Step 5: Test MCP connection
696
- echo "Testing MCP connection..."
697
- claude mcp list | grep claude-self-reflect
698
-
699
- echo "✅ Switched to CLOUD mode with 1024-dimensional embeddings"
700
- echo "⚠️ New collections will be created with _voyage suffix"
701
- }
702
-
703
- # CRITICAL LOCAL MODE RESTORE PROCEDURE
704
- switch_to_local_mode() {
705
- echo "=== RESTORING LOCAL MODE (384 dimensions) ==="
706
- echo "This uses collections with _local suffix"
707
-
708
- # Step 1: Remove existing MCP
709
- echo "Removing existing MCP configuration..."
710
- claude mcp remove claude-self-reflect
711
-
712
- # Step 2: Re-add with local parameters (default)
713
- echo "Adding MCP with local mode parameters..."
714
- claude mcp add claude-self-reflect \
715
- "/Users/$(whoami)/projects/claude-self-reflect/mcp-server/run-mcp.sh" \
716
- -e PREFER_LOCAL_EMBEDDINGS="true" \
717
- -e QDRANT_URL="http://localhost:6333" \
718
- -s user
719
-
720
- # Step 3: Wait for MCP to initialize
721
- echo "Waiting 30 seconds for MCP to initialize..."
722
- sleep 30
723
-
724
- # Step 4: Test MCP connection
725
- echo "Testing MCP connection..."
726
- claude mcp list | grep claude-self-reflect
727
-
728
- echo "✅ Restored to LOCAL mode with 384-dimensional embeddings"
729
- echo "Privacy-first mode active"
730
- }
731
-
732
- # Test mode switching
733
- test_mode_switching() {
734
- echo "Testing mode switching..."
735
- python -c "
736
- from pathlib import Path
737
- env_file = Path('.env')
738
- if env_file.exists():
739
- content = env_file.read_text()
740
- if 'PREFER_LOCAL_EMBEDDINGS=false' in content:
741
- print('Currently in CLOUD mode (per .env file)')
742
- else:
743
- print('Currently in LOCAL mode (per .env file)')
744
- else:
745
- print('⚠️ .env file not found')
746
- "
747
- }
748
-
749
- # Full cloud mode test procedure
750
- full_cloud_mode_test() {
751
- echo "=== FULL CLOUD MODE TEST PROCEDURE ==="
752
-
753
- # 1. Switch to cloud mode
754
- switch_to_cloud_mode
755
-
756
- # 2. Test cloud embedding generation
757
- echo "Testing cloud embedding generation..."
758
- # This will create new collections with _voyage suffix
759
-
760
- # 3. Run import with cloud embeddings
761
- echo "Running test import with cloud embeddings..."
762
- cd /Users/$(whoami)/projects/claude-self-reflect
763
- source venv/bin/activate
764
- PREFER_LOCAL_EMBEDDINGS=false python scripts/import-conversations-unified.py --limit 5
765
-
766
- # 4. Verify cloud collections created
767
- echo "Verifying cloud collections..."
768
- curl -s http://localhost:6333/collections | jq '.result.collections[] | select(.name | endswith("_voyage")) | .name'
769
-
770
- # 5. Test search with cloud embeddings
771
- echo "Testing search with cloud embeddings..."
772
- # Test via MCP tools
773
-
774
- # 6. CRITICAL: Always restore to local mode
775
- echo "⚠️ CRITICAL: Restoring to local mode..."
776
- switch_to_local_mode
777
-
778
- echo "✅ Cloud mode test complete, system restored to local mode"
779
- }
780
-
781
- test_both_embedding_modes
782
- test_mode_switching
783
- # Uncomment to run full cloud test:
784
- # full_cloud_mode_test
785
- ```
786
-
787
- ### 10. MCP Tools Comprehensive Test
788
- ```bash
789
- #!/bin/bash
790
- echo "=== MCP TOOLS COMPREHENSIVE TEST ==="
791
-
792
- # This should be run via Claude Code for actual MCP testing
793
- cat << 'EOF'
794
- To test all MCP tools in Claude Code:
795
-
796
- 1. SEARCH TOOLS:
797
- - mcp__claude-self-reflect__reflect_on_past("test query", limit=3)
798
- - mcp__claude-self-reflect__quick_search("test")
799
- - mcp__claude-self-reflect__search_summary("test")
800
- - mcp__claude-self-reflect__search_by_file("server.py")
801
- - mcp__claude-self-reflect__search_by_concept("testing")
802
-
803
- 2. TEMPORAL TOOLS (NEW):
804
- - mcp__claude-self-reflect__get_recent_work(limit=5)
805
- - mcp__claude-self-reflect__get_recent_work(project="all")
806
- - mcp__claude-self-reflect__search_by_recency("bug", time_range="last week")
807
- - mcp__claude-self-reflect__get_timeline(time_range="last month", granularity="week")
808
-
809
- 3. REFLECTION TOOLS:
810
- - mcp__claude-self-reflect__store_reflection("Test insight", tags=["test"])
811
- - mcp__claude-self-reflect__get_full_conversation("conversation-id")
812
-
813
- 4. PAGINATION:
814
- - mcp__claude-self-reflect__get_more_results("query", offset=3)
815
- - mcp__claude-self-reflect__get_next_results("query", offset=3)
816
-
817
- Expected Results:
818
- - All tools should return valid XML/markdown responses
819
- - Search scores should be > 0.3 for relevant results
820
- - Temporal tools should respect project scoping
821
- - No errors or timeouts
822
- EOF
823
- ```
824
-
825
- ### 6. Docker Health Validation
826
- ```bash
827
- #!/bin/bash
828
- echo "=== DOCKER HEALTH VALIDATION ==="
829
-
830
- # Check Qdrant health
831
- check_qdrant_health() {
832
- echo "Checking Qdrant health..."
833
-
834
- # Check if running
835
- if docker ps | grep -q qdrant; then
836
- # Check API responsive
837
- curl -s http://localhost:6333/health | grep -q "ok" && \
838
- echo "✅ Qdrant healthy" || \
839
- echo "❌ Qdrant API not responding"
840
-
841
- # Check disk usage
842
- DISK_USAGE=$(docker exec qdrant df -h /qdrant/storage | tail -1 | awk '{print $5}' | sed 's/%//')
843
- if [ "$DISK_USAGE" -lt 80 ]; then
844
- echo "✅ Disk usage: ${DISK_USAGE}%"
845
- else
846
- echo "⚠️ High disk usage: ${DISK_USAGE}%"
847
- fi
848
- else
849
- echo "❌ Qdrant not running"
850
- fi
851
- }
852
-
853
- # Check watcher health
854
- check_watcher_health() {
855
- echo "Checking watcher health..."
856
-
857
- WATCHER_NAME="claude-reflection-safe-watcher"
858
- if docker ps | grep -q "$WATCHER_NAME"; then
859
- # Check memory usage
860
- MEM=$(docker stats --no-stream --format "{{.MemUsage}}" "$WATCHER_NAME" 2>/dev/null | cut -d'/' -f1 | sed 's/[^0-9.]//g')
861
- if [ -n "$MEM" ]; then
862
- echo "✅ Watcher running (Memory: ${MEM}MB)"
863
- else
864
- echo "⚠️ Watcher running but stats unavailable"
865
- fi
866
-
867
- # Check for errors in logs
868
- ERROR_COUNT=$(docker logs "$WATCHER_NAME" --tail 100 2>&1 | grep -c ERROR)
869
- if [ "$ERROR_COUNT" -eq 0 ]; then
870
- echo "✅ No errors in recent logs"
871
- else
872
- echo "⚠️ Found $ERROR_COUNT errors in logs"
873
- fi
874
- else
875
- echo "❌ Watcher not running"
876
- fi
877
- }
878
-
879
- # Check docker-compose status
880
- check_compose_status() {
881
- echo "Checking docker-compose status..."
882
-
883
- if [ -f docker-compose.yaml ]; then
884
- # Validate compose file
885
- docker-compose config --quiet 2>/dev/null && \
886
- echo "✅ docker-compose.yaml valid" || \
887
- echo "❌ docker-compose.yaml has errors"
888
-
889
- # Check defined services
890
- SERVICES=$(docker-compose config --services 2>/dev/null)
891
- echo "Defined services: $SERVICES"
892
- else
893
- echo "❌ docker-compose.yaml not found"
894
- fi
895
- }
896
-
897
- check_qdrant_health
898
- check_watcher_health
899
- check_compose_status
900
- ```
901
-
902
- ### 7. Modularization Readiness Check (NEW)
903
- ```bash
904
- #!/bin/bash
905
- echo "=== MODULARIZATION READINESS CHECK ==="
906
-
907
- # Analyze server.py for modularization
908
- analyze_server_py() {
909
- echo "Analyzing server.py for modularization..."
910
-
911
- FILE="mcp-server/src/server.py"
912
- if [ -f "$FILE" ]; then
913
- # Count lines
914
- LINES=$(wc -l < "$FILE")
915
- echo "Total lines: $LINES"
916
-
917
- # Count tools
918
- TOOL_COUNT=$(grep -c "@mcp.tool()" "$FILE")
919
- echo "MCP tools defined: $TOOL_COUNT"
920
-
921
- # Count imports
922
- IMPORT_COUNT=$(grep -c "^import\|^from" "$FILE")
923
- echo "Import statements: $IMPORT_COUNT"
924
-
925
- # Identify major sections
926
- echo -e "\nMajor sections to extract:"
927
- echo "- Temporal tools (get_recent_work, search_by_recency, get_timeline)"
928
- echo "- Search tools (reflect_on_past, quick_search, etc.)"
929
- echo "- Reflection tools (store_reflection, get_full_conversation)"
930
- echo "- Embedding management (EmbeddingManager, generate_embedding)"
931
- echo "- Decay logic (calculate_decay, apply_decay)"
932
- echo "- Utils (ProjectResolver, normalize_project_name)"
933
-
934
- # Check for circular dependencies
935
- echo -e "\nChecking for potential circular dependencies..."
936
- grep -q "from server import" "$FILE" && \
937
- echo "⚠️ Potential circular imports detected" || \
938
- echo "✅ No obvious circular imports"
939
- else
940
- echo "❌ server.py not found"
941
- fi
942
- }
943
-
944
- # Check for existing modular files
945
- check_existing_modules() {
946
- echo -e "\nChecking for existing modular files..."
947
-
948
- MODULES=(
949
- "temporal_utils.py"
950
- "temporal_design.py"
951
- "project_resolver.py"
952
- "embedding_manager.py"
953
- )
954
-
955
- for module in "${MODULES[@]}"; do
956
- if [ -f "mcp-server/src/$module" ]; then
957
- echo "✅ $module exists"
958
- else
959
- echo "⚠️ $module not found (needs creation)"
960
- fi
961
- done
962
- }
963
-
964
- analyze_server_py
965
- check_existing_modules
966
- ```
967
-
968
- ### 8. Performance & Memory Testing
969
- ```bash
970
- #!/bin/bash
971
- echo "=== PERFORMANCE & MEMORY TESTING ==="
972
-
973
- # Test search performance with temporal tools
974
- test_search_performance() {
975
- echo "Testing search performance..."
976
-
977
- python -c "
978
- import time
979
- import asyncio
980
- import sys
981
- import os
982
- sys.path.insert(0, 'mcp-server/src')
983
- os.environ['QDRANT_URL'] = 'http://localhost:6333'
984
-
985
- async def test():
986
- from server import get_recent_work, search_by_recency
987
-
988
- class MockContext:
989
- async def debug(self, msg): pass
990
- async def report_progress(self, *args): pass
991
-
992
- ctx = MockContext()
993
-
994
- # Time get_recent_work
995
- start = time.time()
996
- await get_recent_work(ctx, limit=10)
997
- recent_time = time.time() - start
998
-
999
- # Time search_by_recency
1000
- start = time.time()
1001
- await search_by_recency(ctx, 'test', 'last week')
1002
- search_time = time.time() - start
1003
-
1004
- print(f'get_recent_work: {recent_time:.2f}s')
1005
- print(f'search_by_recency: {search_time:.2f}s')
1006
-
1007
- if recent_time < 2 and search_time < 2:
1008
- print('✅ Performance acceptable')
1009
- else:
1010
- print('⚠️ Performance needs optimization')
1011
-
1012
- asyncio.run(test())
1013
- "
1014
- }
1015
-
1016
- # Test memory usage
1017
- test_memory_usage() {
1018
- echo "Testing memory usage..."
1019
-
1020
- # Check Python process memory
1021
- python -c "
1022
- import psutil
1023
- import os
1024
- process = psutil.Process(os.getpid())
1025
- mem_mb = process.memory_info().rss / 1024 / 1024
1026
- print(f'Python process: {mem_mb:.1f}MB')
1027
- "
1028
-
1029
- # Check Docker container memory
1030
- for container in qdrant claude-reflection-safe-watcher; do
1031
- if docker ps | grep -q $container; then
1032
- MEM=$(docker stats --no-stream --format "{{.MemUsage}}" $container 2>/dev/null | cut -d'/' -f1 | sed 's/[^0-9.]//g')
1033
- echo "$container: ${MEM}MB"
1034
- fi
1035
- done
1036
- }
1037
-
1038
- test_search_performance
1039
- test_memory_usage
1040
- ```
1041
-
1042
- ### 9. Complete Test Report Generator
1043
- ```bash
1044
- #!/bin/bash
1045
- echo "=== GENERATING TEST REPORT ==="
1046
-
1047
- REPORT_FILE="test-report-$(date +%Y%m%d-%H%M%S).md"
1048
-
1049
- cat > "$REPORT_FILE" << EOF
1050
- # Claude Self-Reflect Test Report
1051
-
1052
- ## Test Summary
1053
- - **Date**: $(date)
1054
- - **Version**: $(grep version package.json | cut -d'"' -f4)
1055
- - **Server.py Lines**: $(wc -l < mcp-server/src/server.py)
1056
- - **Collections**: $(curl -s http://localhost:6333/collections | jq '.result.collections | length')
1057
-
1058
- ## Feature Tests
1059
-
1060
- ### Core Features
1061
- - [ ] Import Pipeline: PASS/FAIL
1062
- - [ ] MCP Tools (12): PASS/FAIL
1063
- - [ ] Search Quality: PASS/FAIL
1064
- - [ ] State Management: PASS/FAIL
1065
-
1066
- ### v3.x Features
1067
- - [ ] Temporal Tools (3): PASS/FAIL
1068
- - [ ] get_recent_work: PASS/FAIL
1069
- - [ ] search_by_recency: PASS/FAIL
1070
- - [ ] get_timeline: PASS/FAIL
1071
- - [ ] Timestamp Indexes: PASS/FAIL
1072
- - [ ] Project Scoping: PASS/FAIL
1073
-
1074
- ### Infrastructure
1075
- - [ ] CLI Tool: PASS/FAIL
1076
- - [ ] Docker Health: PASS/FAIL
1077
- - [ ] Qdrant: PASS/FAIL
1078
- - [ ] Watcher: PASS/FAIL
1079
-
1080
- ### Performance
1081
- - [ ] Search < 2s: PASS/FAIL
1082
- - [ ] Import < 10s: PASS/FAIL
1083
- - [ ] Memory < 500MB: PASS/FAIL
1084
-
1085
- ### Code Quality
1086
- - [ ] No Critical Bugs: PASS/FAIL
1087
- - [ ] XML Injection Fixed: PASS/FAIL
1088
- - [ ] Native Decay Fixed: PASS/FAIL
1089
- - [ ] Modularization Ready: PASS/FAIL
1090
-
1091
- ## Observations
1092
- $(date): Test execution started
1093
- $(date): All temporal tools tested
1094
- $(date): Project scoping validated
1095
- $(date): CLI packaging verified
1096
- $(date): Docker health confirmed
1097
-
1098
- ## Recommendations
1099
- 1. Fix critical bugs before release
1100
- 2. Complete modularization (2,835 lines → multiple modules)
1101
- 3. Add more comprehensive unit tests
1102
- 4. Update documentation for v3.x features
1103
-
1104
- ## Certification
1105
- **System Ready for Release**: YES/NO
1106
-
1107
- ## Sign-off
1108
- Tested by: claude-self-reflect-test agent
1109
- Date: $(date)
1110
- EOF
1111
-
1112
- echo "✅ Test report generated: $REPORT_FILE"
1113
- ```
1114
-
1115
- ## Pre-Test Validation Protocol
1116
-
1117
- ### Agent Self-Review
1118
- Before running any tests, I MUST review myself to ensure comprehensive coverage:
1119
-
1120
- ```bash
1121
- #!/bin/bash
1122
- echo "=== PRE-TEST AGENT VALIDATION ==="
1123
-
1124
- # Review this agent file for completeness
1125
- review_agent_completeness() {
1126
- echo "Reviewing CSR-tester agent for missing features..."
1127
-
1128
- # Check if agent covers all known features
1129
- AGENT_FILE="$HOME/projects/claude-self-reflect/.claude/agents/claude-self-reflect-test.md"
1130
-
1131
- REQUIRED_FEATURES=(
1132
- "15+ MCP tools"
1133
- "Temporal tools"
1134
- "Metadata extraction"
1135
- "Hook system"
1136
- "Sub-agents"
1137
- "Embedding modes"
1138
- "Zero vectors"
1139
- "Streaming watcher"
1140
- "Delta metadata"
1141
- "Import pipeline"
1142
- "Docker stack"
1143
- "CLI tool"
1144
- "State management"
1145
- "Memory decay"
1146
- "Parallel search"
1147
- "Project scoping"
1148
- "Collection naming"
1149
- "Dimension validation"
1150
- "XML escaping"
1151
- "Error handling"
1152
- )
1153
-
1154
- for feature in "${REQUIRED_FEATURES[@]}"; do
1155
- if grep -qi "$feature" "$AGENT_FILE"; then
1156
- echo "✅ $feature: Covered"
1157
- else
1158
- echo "❌ $feature: MISSING - Add test coverage!"
1159
- fi
1160
- done
1161
- }
1162
-
1163
- # Discover any new features from codebase
1164
- discover_new_features() {
1165
- echo "Scanning for undocumented features..."
1166
-
1167
- # Check for new MCP tools
1168
- NEW_TOOLS=$(grep -h "@mcp.tool()" mcp-server/src/*.py 2>/dev/null | wc -l)
1169
- echo "MCP tools found: $NEW_TOOLS"
1170
-
1171
- # Check for new scripts
1172
- NEW_SCRIPTS=$(ls scripts/*.py 2>/dev/null | wc -l)
1173
- echo "Python scripts found: $NEW_SCRIPTS"
1174
-
1175
- # Check for new test files
1176
- NEW_TESTS=$(find tests -name "*.py" 2>/dev/null | wc -l)
1177
- echo "Test files found: $NEW_TESTS"
1178
-
1179
- # Check for new hooks
1180
- if [ -d "$HOME/.claude/hooks" ]; then
1181
- HOOKS=$(ls "$HOME/.claude/hooks" 2>/dev/null | wc -l)
1182
- echo "Hooks configured: $HOOKS"
1183
- fi
1184
- }
1185
-
1186
- review_agent_completeness
1187
- discover_new_features
1188
- ```
1189
-
1190
- ## Test Execution Protocol
1191
-
1192
- ### Run Complete Test Suite
1193
- ```bash
1194
- #!/bin/bash
1195
- # Master test runner - CSR-tester is the SOLE executor of all tests
1196
-
1197
- echo "=== CLAUDE SELF-REFLECT COMPLETE TEST SUITE ==="
1198
- echo "Starting at: $(date)"
1199
- echo "Executor: CSR-tester agent (sole test runner)"
1200
- echo ""
1201
-
1202
- # Pre-test validation
1203
- echo "Phase 0: Pre-test Validation..."
1204
- ./review_agent_completeness.sh
1205
-
1206
- # Create test results directory
1207
- mkdir -p test-results-$(date +%Y%m%d)
1208
- cd test-results-$(date +%Y%m%d)
1209
-
1210
- # Run all test suites
1211
- ../test-system-health.sh > health.log 2>&1
1212
- ../test-temporal-tools.sh > temporal.log 2>&1
1213
- ../test-cli-tool.sh > cli.log 2>&1
1214
- ../test-import-pipeline.sh > import.log 2>&1
1215
- ../test-docker-health.sh > docker.log 2>&1
1216
- ../test-modularization.sh > modular.log 2>&1
1217
- ../test-performance.sh > performance.log 2>&1
1218
-
1219
- # Generate final report
1220
- ../generate-test-report.sh
1221
-
1222
- echo ""
1223
- echo "=== TEST SUITE COMPLETE ==="
1224
- echo "Results in: test-results-$(date +%Y%m%d)/"
1225
- echo "Report: test-report-*.md"
1226
- ```
1227
-
1228
- ## Success Criteria
1229
-
1230
- ### Must Pass
1231
- - [ ] All 15+ MCP tools functional
1232
- - [ ] Temporal tools work with proper scoping
1233
- - [ ] Timestamp indexes on all collections
1234
- - [ ] CLI installs and runs globally
1235
- - [ ] Docker containers healthy
1236
- - [ ] No critical bugs (native decay, XML injection, dimension mismatch)
1237
- - [ ] Search returns relevant results
1238
- - [ ] Import pipeline processes files
1239
- - [ ] State persists correctly
1240
- - [ ] NO ZERO VECTORS in any collection
1241
- - [ ] Metadata extraction working (files, tools, concepts, AST patterns)
1242
- - [ ] Both embedding modes functional (local 384d, Voyage 1024d)
1243
- - [ ] Hooks execute properly (session-start, precompact)
1244
- - [ ] All 6 sub-agents available
1245
-
1246
- ### Should Pass
1247
- - [ ] Performance within limits
1248
- - [ ] Memory usage acceptable
1249
- - [ ] Modularization plan approved
1250
- - [ ] Documentation updated
1251
- - [ ] All unit tests pass
1252
-
1253
- ### Nice to Have
1254
- - [ ] 100% test coverage
1255
- - [ ] Zero warnings in logs
1256
- - [ ] Sub-second search times
1257
-
1258
- ## Final Notes
1259
-
1260
- This agent knows ALL features of Claude Self-Reflect v3.3.0 including:
1261
- - 15+ MCP tools with temporal, search, reflection, pagination capabilities
1262
- - Modularized architecture (search_tools.py, temporal_tools.py, reflection_tools.py, parallel_search.py)
1263
- - Metadata extraction (AST patterns, concepts, files analyzed, tools used)
1264
- - Hook system (session-start, precompact, submit hooks)
1265
- - 6 specialized sub-agents for different domains
1266
- - Dual embedding support (FastEmbed 384d, Voyage AI 1024d)
1267
- - Zero vector detection and prevention
1268
- - Streaming watcher and delta metadata updater
1269
- - Project scoping and cross-collection search
1270
- - Memory decay (client-side with 90-day half-life)
1271
- - GPT-5 review recommendations and critical fixes
1272
- - All test scripts and their purposes
1273
-
1274
- The agent will ALWAYS restore the system to local mode after testing and provide comprehensive reports suitable for release decisions.