claude-self-reflect 3.2.4 → 3.3.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/agents/claude-self-reflect-test.md +992 -510
- package/.claude/agents/reflection-specialist.md +59 -3
- package/README.md +14 -5
- package/installer/cli.js +16 -0
- package/installer/postinstall.js +14 -0
- package/installer/statusline-setup.js +289 -0
- package/mcp-server/run-mcp.sh +73 -5
- package/mcp-server/src/app_context.py +64 -0
- package/mcp-server/src/config.py +57 -0
- package/mcp-server/src/connection_pool.py +286 -0
- package/mcp-server/src/decay_manager.py +106 -0
- package/mcp-server/src/embedding_manager.py +64 -40
- package/mcp-server/src/embeddings_old.py +141 -0
- package/mcp-server/src/models.py +64 -0
- package/mcp-server/src/parallel_search.py +305 -0
- package/mcp-server/src/project_resolver.py +5 -0
- package/mcp-server/src/reflection_tools.py +211 -0
- package/mcp-server/src/rich_formatting.py +196 -0
- package/mcp-server/src/search_tools.py +874 -0
- package/mcp-server/src/server.py +127 -1720
- package/mcp-server/src/temporal_design.py +132 -0
- package/mcp-server/src/temporal_tools.py +604 -0
- package/mcp-server/src/temporal_utils.py +384 -0
- package/mcp-server/src/utils.py +150 -67
- package/package.json +15 -1
- package/scripts/add-timestamp-indexes.py +134 -0
- package/scripts/ast_grep_final_analyzer.py +325 -0
- package/scripts/ast_grep_unified_registry.py +556 -0
- package/scripts/check-collections.py +29 -0
- package/scripts/csr-status +366 -0
- package/scripts/debug-august-parsing.py +76 -0
- package/scripts/debug-import-single.py +91 -0
- package/scripts/debug-project-resolver.py +82 -0
- package/scripts/debug-temporal-tools.py +135 -0
- package/scripts/delta-metadata-update.py +547 -0
- package/scripts/import-conversations-unified.py +157 -25
- package/scripts/precompact-hook.sh +33 -0
- package/scripts/session_quality_tracker.py +481 -0
- package/scripts/streaming-watcher.py +1578 -0
- package/scripts/update_patterns.py +334 -0
- package/scripts/utils.py +39 -0
|
@@ -1,32 +1,69 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: claude-self-reflect-test
|
|
3
3
|
description: Comprehensive end-to-end testing specialist for Claude Self-Reflect system validation. Tests all components including import pipeline, MCP integration, search functionality, and both local/cloud embedding modes. Ensures system integrity before releases and validates installations. Always restores system to local mode after testing.
|
|
4
|
-
tools: Read, Bash, Grep, Glob, LS, Write, Edit, TodoWrite
|
|
4
|
+
tools: Read, Bash, Grep, Glob, LS, Write, Edit, TodoWrite, mcp__claude-self-reflect__reflect_on_past, mcp__claude-self-reflect__store_reflection, mcp__claude-self-reflect__get_recent_work, mcp__claude-self-reflect__search_by_recency, mcp__claude-self-reflect__get_timeline, mcp__claude-self-reflect__quick_search, mcp__claude-self-reflect__search_summary, mcp__claude-self-reflect__get_more_results, mcp__claude-self-reflect__search_by_file, mcp__claude-self-reflect__search_by_concept, mcp__claude-self-reflect__get_full_conversation, mcp__claude-self-reflect__get_next_results
|
|
5
5
|
---
|
|
6
6
|
|
|
7
|
-
You are
|
|
7
|
+
You are the comprehensive testing specialist for Claude Self-Reflect. You validate EVERY component and feature, ensuring complete system integrity across all configurations and deployment scenarios. You test current v3.x features including temporal queries, time-based search, and activity timelines.
|
|
8
8
|
|
|
9
9
|
## Core Testing Philosophy
|
|
10
10
|
|
|
11
|
-
1. **Test Everything** -
|
|
12
|
-
2. **Both Modes** - Validate
|
|
13
|
-
3. **Always Restore** - System MUST be left in 100% local state after
|
|
14
|
-
4. **Diagnose & Fix** -
|
|
15
|
-
5. **Document Results** - Create clear test reports
|
|
11
|
+
1. **Test Everything** - Every feature, every tool, every pipeline
|
|
12
|
+
2. **Both Modes** - Validate local (FastEmbed) and cloud (Voyage AI) embeddings
|
|
13
|
+
3. **Always Restore** - System MUST be left in 100% local state after testing
|
|
14
|
+
4. **Diagnose & Fix** - Identify root causes and provide solutions
|
|
15
|
+
5. **Document Results** - Create clear, actionable test reports
|
|
16
16
|
|
|
17
|
-
## System Architecture
|
|
17
|
+
## System Architecture Knowledge
|
|
18
18
|
|
|
19
19
|
### Components to Test
|
|
20
20
|
- **Import Pipeline**: JSONL parsing, chunking, embedding generation, Qdrant storage
|
|
21
|
-
- **MCP Server**:
|
|
21
|
+
- **MCP Server**: 15+ tools including temporal, search, reflection, pagination tools
|
|
22
|
+
- **Temporal Tools** (v3.x): get_recent_work, search_by_recency, get_timeline
|
|
23
|
+
- **CLI Tool**: Installation, packaging, setup wizard, status commands
|
|
24
|
+
- **Docker Stack**: Qdrant, streaming watcher, health monitoring
|
|
22
25
|
- **State Management**: File locking, atomic writes, resume capability
|
|
23
|
-
- **Docker Containers**: Qdrant, streaming watcher, service health
|
|
24
26
|
- **Search Quality**: Relevance scores, metadata extraction, cross-project search
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
- **
|
|
28
|
-
- **
|
|
29
|
-
- **
|
|
27
|
+
- **Memory Decay**: Client-side and native Qdrant decay
|
|
28
|
+
- **Modularization**: Server architecture with search_tools, temporal_tools, reflection_tools, parallel_search modules
|
|
29
|
+
- **Metadata Extraction**: AST patterns, concepts, files analyzed, tools used
|
|
30
|
+
- **Hook System**: session-start, precompact, submit hooks
|
|
31
|
+
- **Sub-Agents**: All 6 specialized agents (reflection, import-debugger, docker, mcp, search, qdrant)
|
|
32
|
+
- **Embedding Modes**: Local (FastEmbed 384d) and Cloud (Voyage AI 1024d) with mode switching
|
|
33
|
+
- **Zero Vector Detection**: Root cause analysis and prevention
|
|
34
|
+
|
|
35
|
+
### Test Files Knowledge
|
|
36
|
+
```
|
|
37
|
+
scripts/
|
|
38
|
+
├── import-conversations-unified.py # Main import script
|
|
39
|
+
├── streaming-importer.py # Streaming import
|
|
40
|
+
├── delta-metadata-update.py # Metadata updater
|
|
41
|
+
├── check-collections.py # Collection checker
|
|
42
|
+
├── add-timestamp-indexes.py # Timestamp indexer (NEW)
|
|
43
|
+
├── test-temporal-comprehensive.py # Temporal tests (NEW)
|
|
44
|
+
├── test-project-scoping.py # Project scoping test (NEW)
|
|
45
|
+
├── test-direct-temporal.py # Direct temporal test (NEW)
|
|
46
|
+
├── debug-temporal-tools.py # Temporal debug (NEW)
|
|
47
|
+
└── status.py # Import status checker
|
|
48
|
+
|
|
49
|
+
mcp-server/
|
|
50
|
+
├── src/
|
|
51
|
+
│ ├── server.py # Main MCP server (2,835 lines!)
|
|
52
|
+
│ ├── temporal_utils.py # Temporal utilities (NEW)
|
|
53
|
+
│ ├── temporal_design.py # Temporal design doc (NEW)
|
|
54
|
+
│ └── project_resolver.py # Project resolution
|
|
55
|
+
|
|
56
|
+
tests/
|
|
57
|
+
├── unit/ # Unit tests
|
|
58
|
+
├── integration/ # Integration tests
|
|
59
|
+
├── performance/ # Performance tests
|
|
60
|
+
└── e2e/ # End-to-end tests
|
|
61
|
+
|
|
62
|
+
config/
|
|
63
|
+
├── imported-files.json # Import state
|
|
64
|
+
├── csr-watcher.json # Watcher state
|
|
65
|
+
└── delta-update-state.json # Delta update state
|
|
66
|
+
```
|
|
30
67
|
|
|
31
68
|
## Comprehensive Test Suite
|
|
32
69
|
|
|
@@ -35,659 +72,1104 @@ You are a comprehensive testing specialist for Claude Self-Reflect. You validate
|
|
|
35
72
|
#!/bin/bash
|
|
36
73
|
echo "=== SYSTEM HEALTH CHECK ==="
|
|
37
74
|
|
|
75
|
+
# Check version
|
|
76
|
+
echo "Version Check:"
|
|
77
|
+
grep version package.json | cut -d'"' -f4
|
|
78
|
+
echo ""
|
|
79
|
+
|
|
38
80
|
# Check Docker services
|
|
39
81
|
echo "Docker Services:"
|
|
40
82
|
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" | grep -E "(qdrant|watcher|streaming)"
|
|
41
83
|
|
|
42
|
-
# Check Qdrant collections
|
|
43
|
-
echo -e "\nQdrant Collections:"
|
|
44
|
-
curl -s http://localhost:6333/collections | jq -r '.result.collections[] |
|
|
84
|
+
# Check Qdrant collections with indexes
|
|
85
|
+
echo -e "\nQdrant Collections (with timestamp indexes):"
|
|
86
|
+
curl -s http://localhost:6333/collections | jq -r '.result.collections[] |
|
|
87
|
+
"\(.name)\t\(.points_count) points"'
|
|
88
|
+
|
|
89
|
+
# Check for timestamp indexes
|
|
90
|
+
echo -e "\nTimestamp Index Status:"
|
|
91
|
+
python -c "
|
|
92
|
+
from qdrant_client import QdrantClient
|
|
93
|
+
from qdrant_client.models import OrderBy
|
|
94
|
+
client = QdrantClient('http://localhost:6333')
|
|
95
|
+
collections = client.get_collections().collections
|
|
96
|
+
indexed = 0
|
|
97
|
+
for col in collections[:5]:
|
|
98
|
+
try:
|
|
99
|
+
client.scroll(col.name, order_by=OrderBy(key='timestamp', direction='desc'), limit=1)
|
|
100
|
+
indexed += 1
|
|
101
|
+
except:
|
|
102
|
+
pass
|
|
103
|
+
print(f'Collections with timestamp index: {indexed}/{len(collections)}')
|
|
104
|
+
"
|
|
45
105
|
|
|
46
|
-
# Check MCP connection
|
|
47
|
-
echo -e "\nMCP Status:"
|
|
106
|
+
# Check MCP connection with temporal tools
|
|
107
|
+
echo -e "\nMCP Status (with temporal tools):"
|
|
48
108
|
claude mcp list | grep claude-self-reflect || echo "MCP not configured"
|
|
49
109
|
|
|
50
110
|
# Check import status
|
|
51
111
|
echo -e "\nImport Status:"
|
|
52
|
-
python mcp-server/src/status.py | jq '.overall'
|
|
112
|
+
python mcp-server/src/status.py 2>/dev/null | jq '.overall' || echo "Status check failed"
|
|
53
113
|
|
|
54
114
|
# Check embedding mode
|
|
55
|
-
echo -e "\nCurrent Mode:"
|
|
115
|
+
echo -e "\nCurrent Embedding Mode:"
|
|
56
116
|
if [ -f .env ] && grep -q "PREFER_LOCAL_EMBEDDINGS=false" .env; then
|
|
57
|
-
echo "Cloud mode (Voyage AI)"
|
|
117
|
+
echo "Cloud mode (Voyage AI) - 1024 dimensions"
|
|
58
118
|
else
|
|
59
|
-
echo "Local mode (FastEmbed)"
|
|
119
|
+
echo "Local mode (FastEmbed) - 384 dimensions"
|
|
60
120
|
fi
|
|
121
|
+
|
|
122
|
+
# Check CLI installation
|
|
123
|
+
echo -e "\nCLI Installation:"
|
|
124
|
+
which claude-self-reflect && echo "CLI installed globally" || echo "CLI not in PATH"
|
|
125
|
+
|
|
126
|
+
# Check server.py size (modularization needed)
|
|
127
|
+
echo -e "\nServer.py Status:"
|
|
128
|
+
wc -l mcp-server/src/server.py | awk '{print "Lines: " $1 " (needs modularization if >1000)"}'
|
|
61
129
|
```
|
|
62
130
|
|
|
63
|
-
### 2.
|
|
131
|
+
### 2. Temporal Tools Testing (v3.x)
|
|
64
132
|
```bash
|
|
65
133
|
#!/bin/bash
|
|
66
|
-
echo "===
|
|
134
|
+
echo "=== TEMPORAL TOOLS TESTING ==="
|
|
67
135
|
|
|
68
|
-
# Test
|
|
69
|
-
|
|
70
|
-
echo "Testing
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
{"type":"conversation","uuid":"test-001","messages":[{"role":"human","content":"Test question"},{"role":"assistant","content":[{"type":"text","text":"Test answer with code:\n```python\nprint('hello')\n```"}]}]}
|
|
74
|
-
EOF
|
|
75
|
-
|
|
76
|
-
python -c "
|
|
77
|
-
import json
|
|
78
|
-
with open('$TEST_FILE') as f:
|
|
79
|
-
data = json.load(f)
|
|
80
|
-
assert data['uuid'] == 'test-001'
|
|
81
|
-
assert len(data['messages']) == 2
|
|
82
|
-
print('✅ PASS: JSONL parsing works')
|
|
83
|
-
" || echo "❌ FAIL: JSONL parsing error"
|
|
84
|
-
rm -f $TEST_FILE
|
|
136
|
+
# Test timestamp indexes exist
|
|
137
|
+
test_timestamp_indexes() {
|
|
138
|
+
echo "Testing timestamp indexes..."
|
|
139
|
+
python scripts/add-timestamp-indexes.py
|
|
140
|
+
echo "✅ Timestamp indexes updated"
|
|
85
141
|
}
|
|
86
142
|
|
|
87
|
-
# Test
|
|
88
|
-
|
|
89
|
-
echo "Testing
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
143
|
+
# Test get_recent_work
|
|
144
|
+
test_get_recent_work() {
|
|
145
|
+
echo "Testing get_recent_work..."
|
|
146
|
+
cat << 'EOF' > /tmp/test_recent_work.py
|
|
147
|
+
import asyncio
|
|
148
|
+
import sys
|
|
149
|
+
import os
|
|
150
|
+
sys.path.insert(0, 'mcp-server/src')
|
|
151
|
+
os.environ['QDRANT_URL'] = 'http://localhost:6333'
|
|
152
|
+
|
|
153
|
+
async def test():
|
|
154
|
+
from server import get_recent_work
|
|
155
|
+
class MockContext:
|
|
156
|
+
async def debug(self, msg): print(f"[DEBUG] {msg}")
|
|
157
|
+
async def report_progress(self, *args): pass
|
|
158
|
+
|
|
159
|
+
ctx = MockContext()
|
|
160
|
+
# Test no scope (should default to current project)
|
|
161
|
+
result1 = await get_recent_work(ctx, limit=3)
|
|
162
|
+
print("No scope result:", "PASS" if "conversation" in result1 else "FAIL")
|
|
163
|
+
|
|
164
|
+
# Test with scope='all'
|
|
165
|
+
result2 = await get_recent_work(ctx, limit=3, project='all')
|
|
166
|
+
print("All scope result:", "PASS" if "conversation" in result2 else "FAIL")
|
|
167
|
+
|
|
168
|
+
# Test with specific project
|
|
169
|
+
result3 = await get_recent_work(ctx, limit=3, project='claude-self-reflect')
|
|
170
|
+
print("Specific project:", "PASS" if "conversation" in result3 else "FAIL")
|
|
171
|
+
|
|
172
|
+
asyncio.run(test())
|
|
173
|
+
EOF
|
|
174
|
+
python /tmp/test_recent_work.py
|
|
104
175
|
}
|
|
105
176
|
|
|
106
|
-
# Test
|
|
107
|
-
|
|
108
|
-
echo "Testing
|
|
109
|
-
|
|
177
|
+
# Test search_by_recency
|
|
178
|
+
test_search_by_recency() {
|
|
179
|
+
echo "Testing search_by_recency..."
|
|
180
|
+
cat << 'EOF' > /tmp/test_search_recency.py
|
|
181
|
+
import asyncio
|
|
182
|
+
import sys
|
|
110
183
|
import os
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
model = TextEmbedding('sentence-transformers/all-MiniLM-L6-v2')
|
|
114
|
-
embeddings = list(model.embed(['test text']))
|
|
115
|
-
if len(embeddings[0]) == 384:
|
|
116
|
-
print('✅ PASS: Local embeddings work (384 dims)')
|
|
117
|
-
else:
|
|
118
|
-
print(f'❌ FAIL: Wrong dimensions: {len(embeddings[0])}')
|
|
119
|
-
"
|
|
120
|
-
}
|
|
184
|
+
sys.path.insert(0, 'mcp-server/src')
|
|
185
|
+
os.environ['QDRANT_URL'] = 'http://localhost:6333'
|
|
121
186
|
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
if
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
187
|
+
async def test():
|
|
188
|
+
from server import search_by_recency
|
|
189
|
+
class MockContext:
|
|
190
|
+
async def debug(self, msg): print(f"[DEBUG] {msg}")
|
|
191
|
+
|
|
192
|
+
ctx = MockContext()
|
|
193
|
+
result = await search_by_recency(ctx, query="test", time_range="last week")
|
|
194
|
+
print("Search by recency:", "PASS" if "result" in result or "no_results" in result else "FAIL")
|
|
195
|
+
|
|
196
|
+
asyncio.run(test())
|
|
197
|
+
EOF
|
|
198
|
+
python /tmp/test_search_recency.py
|
|
134
199
|
}
|
|
135
200
|
|
|
136
|
-
#
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
201
|
+
# Test get_timeline
|
|
202
|
+
test_get_timeline() {
|
|
203
|
+
echo "Testing get_timeline..."
|
|
204
|
+
cat << 'EOF' > /tmp/test_timeline.py
|
|
205
|
+
import asyncio
|
|
206
|
+
import sys
|
|
207
|
+
import os
|
|
208
|
+
sys.path.insert(0, 'mcp-server/src')
|
|
209
|
+
os.environ['QDRANT_URL'] = 'http://localhost:6333'
|
|
142
210
|
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
To test in Claude Code:
|
|
154
|
-
1. Search for any recent conversation topic
|
|
155
|
-
2. Verify results have scores > 0.7
|
|
156
|
-
3. Check that metadata includes files and tools
|
|
211
|
+
async def test():
|
|
212
|
+
from server import get_timeline
|
|
213
|
+
class MockContext:
|
|
214
|
+
async def debug(self, msg): print(f"[DEBUG] {msg}")
|
|
215
|
+
|
|
216
|
+
ctx = MockContext()
|
|
217
|
+
result = await get_timeline(ctx, time_range="last month", granularity="week")
|
|
218
|
+
print("Timeline result:", "PASS" if "timeline" in result else "FAIL")
|
|
219
|
+
|
|
220
|
+
asyncio.run(test())
|
|
157
221
|
EOF
|
|
222
|
+
python /tmp/test_timeline.py
|
|
158
223
|
}
|
|
159
224
|
|
|
160
|
-
# Test
|
|
161
|
-
|
|
162
|
-
echo "Testing
|
|
225
|
+
# Test natural language time parsing
|
|
226
|
+
test_temporal_parsing() {
|
|
227
|
+
echo "Testing temporal parsing..."
|
|
163
228
|
python -c "
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
|
|
171
|
-
|
|
172
|
-
|
|
173
|
-
if 'files_analyzed' in point.payload:
|
|
174
|
-
found_files = True
|
|
175
|
-
break
|
|
176
|
-
if found_files:
|
|
177
|
-
break
|
|
178
|
-
|
|
179
|
-
if found_files:
|
|
180
|
-
print('✅ PASS: File metadata available for search')
|
|
181
|
-
else:
|
|
182
|
-
print('⚠️ WARN: No file metadata found (run delta-metadata-update.py)')
|
|
229
|
+
from mcp_server.src.temporal_utils import TemporalParser
|
|
230
|
+
parser = TemporalParser()
|
|
231
|
+
tests = ['yesterday', 'last week', 'past 3 days']
|
|
232
|
+
for expr in tests:
|
|
233
|
+
try:
|
|
234
|
+
start, end = parser.parse_time_expression(expr)
|
|
235
|
+
print(f'✅ {expr}: {start.date()} to {end.date()}')
|
|
236
|
+
except Exception as e:
|
|
237
|
+
print(f'❌ {expr}: {e}')
|
|
183
238
|
"
|
|
184
239
|
}
|
|
185
240
|
|
|
186
|
-
#
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
190
|
-
|
|
191
|
-
|
|
192
|
-
echo "2. Search for it immediately"
|
|
193
|
-
echo "3. Verify it's retrievable"
|
|
194
|
-
}
|
|
195
|
-
|
|
196
|
-
test_mcp_search
|
|
197
|
-
test_search_by_file
|
|
198
|
-
test_reflection_storage
|
|
241
|
+
# Run all temporal tests
|
|
242
|
+
test_timestamp_indexes
|
|
243
|
+
test_get_recent_work
|
|
244
|
+
test_search_by_recency
|
|
245
|
+
test_get_timeline
|
|
246
|
+
test_temporal_parsing
|
|
199
247
|
```
|
|
200
248
|
|
|
201
|
-
###
|
|
249
|
+
### 3. CLI Tool Testing (Enhanced)
|
|
202
250
|
```bash
|
|
203
251
|
#!/bin/bash
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
echo "=== DUAL-MODE TESTING WITH AUTO-RESTORE ==="
|
|
252
|
+
echo "=== CLI TOOL TESTING ==="
|
|
207
253
|
|
|
208
|
-
#
|
|
209
|
-
|
|
210
|
-
echo "
|
|
254
|
+
# Test CLI installation
|
|
255
|
+
test_cli_installation() {
|
|
256
|
+
echo "Testing CLI installation..."
|
|
211
257
|
|
|
212
|
-
#
|
|
213
|
-
if
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
|
|
218
|
-
# Update MCP to use local
|
|
219
|
-
claude mcp remove claude-self-reflect 2>/dev/null
|
|
220
|
-
claude mcp add claude-self-reflect \
|
|
221
|
-
"$(pwd)/mcp-server/run-mcp.sh" \
|
|
222
|
-
-e QDRANT_URL="http://localhost:6333" \
|
|
223
|
-
-e PREFER_LOCAL_EMBEDDINGS="true" \
|
|
224
|
-
-s user
|
|
225
|
-
|
|
226
|
-
# Restart containers if needed
|
|
227
|
-
if docker ps | grep -q streaming-importer; then
|
|
228
|
-
docker-compose restart streaming-importer
|
|
258
|
+
# Check if installed globally
|
|
259
|
+
if command -v claude-self-reflect &> /dev/null; then
|
|
260
|
+
VERSION=$(claude-self-reflect --version 2>/dev/null || echo "unknown")
|
|
261
|
+
echo "✅ CLI installed globally (version: $VERSION)"
|
|
262
|
+
else
|
|
263
|
+
echo "❌ CLI not found in PATH"
|
|
229
264
|
fi
|
|
230
265
|
|
|
231
|
-
|
|
232
|
-
|
|
233
|
-
|
|
234
|
-
|
|
235
|
-
|
|
236
|
-
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
echo "=== Testing Local Mode (FastEmbed) ==="
|
|
240
|
-
export PREFER_LOCAL_EMBEDDINGS=true
|
|
241
|
-
|
|
242
|
-
# Create test data
|
|
243
|
-
TEST_DIR="/tmp/test-local-$$"
|
|
244
|
-
mkdir -p "$TEST_DIR"
|
|
245
|
-
cat > "$TEST_DIR/test.jsonl" << 'EOF'
|
|
246
|
-
{"type":"conversation","uuid":"local-test","messages":[{"role":"human","content":"Local mode test"}]}
|
|
247
|
-
EOF
|
|
248
|
-
|
|
249
|
-
# Import and verify
|
|
250
|
-
python scripts/import-conversations-unified.py --file "$TEST_DIR/test.jsonl"
|
|
266
|
+
# Check package.json files
|
|
267
|
+
echo "Checking package files..."
|
|
268
|
+
FILES=(
|
|
269
|
+
"package.json"
|
|
270
|
+
"cli/package.json"
|
|
271
|
+
"cli/src/index.js"
|
|
272
|
+
"cli/src/setup-wizard.js"
|
|
273
|
+
)
|
|
251
274
|
|
|
252
|
-
|
|
253
|
-
|
|
254
|
-
|
|
255
|
-
DIMS=$(curl -s "http://localhost:6333/collections/$COLLECTION" | jq '.result.config.params.vectors.size')
|
|
256
|
-
if [ "$DIMS" = "384" ]; then
|
|
257
|
-
echo "✅ PASS: Local mode uses 384 dimensions"
|
|
275
|
+
for file in "${FILES[@]}"; do
|
|
276
|
+
if [ -f "$file" ]; then
|
|
277
|
+
echo "✅ $file exists"
|
|
258
278
|
else
|
|
259
|
-
echo "❌
|
|
279
|
+
echo "❌ $file missing"
|
|
260
280
|
fi
|
|
261
|
-
|
|
262
|
-
|
|
263
|
-
rm -rf "$TEST_DIR"
|
|
281
|
+
done
|
|
264
282
|
}
|
|
265
283
|
|
|
266
|
-
# Test
|
|
267
|
-
|
|
268
|
-
|
|
269
|
-
echo "⚠️ SKIP: No Voyage API key configured"
|
|
270
|
-
return
|
|
271
|
-
fi
|
|
284
|
+
# Test CLI commands
|
|
285
|
+
test_cli_commands() {
|
|
286
|
+
echo "Testing CLI commands..."
|
|
272
287
|
|
|
273
|
-
|
|
274
|
-
|
|
275
|
-
export VOYAGE_KEY=$(grep VOYAGE_KEY .env | cut -d= -f2)
|
|
288
|
+
# Test status command
|
|
289
|
+
claude-self-reflect status 2>/dev/null && echo "✅ Status command works" || echo "❌ Status command failed"
|
|
276
290
|
|
|
277
|
-
#
|
|
278
|
-
|
|
279
|
-
|
|
280
|
-
|
|
281
|
-
|
|
282
|
-
|
|
291
|
+
# Test help
|
|
292
|
+
claude-self-reflect --help 2>/dev/null && echo "✅ Help works" || echo "❌ Help failed"
|
|
293
|
+
}
|
|
294
|
+
|
|
295
|
+
# Test npm packaging
|
|
296
|
+
test_npm_packaging() {
|
|
297
|
+
echo "Testing npm packaging..."
|
|
283
298
|
|
|
284
|
-
#
|
|
285
|
-
|
|
299
|
+
# Check if publishable
|
|
300
|
+
npm pack --dry-run 2>&1 | grep -q "claude-self-reflect" && \
|
|
301
|
+
echo "✅ Package is publishable" || \
|
|
302
|
+
echo "❌ Package issues detected"
|
|
286
303
|
|
|
287
|
-
# Check
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
|
|
291
|
-
if [ "$DIMS" = "1024" ]; then
|
|
292
|
-
echo "✅ PASS: Cloud mode uses 1024 dimensions"
|
|
293
|
-
else
|
|
294
|
-
echo "❌ FAIL: Wrong dimensions: $DIMS"
|
|
295
|
-
fi
|
|
296
|
-
fi
|
|
297
|
-
|
|
298
|
-
rm -rf "$TEST_DIR"
|
|
304
|
+
# Check dependencies
|
|
305
|
+
npm ls --depth=0 2>&1 | grep -q "UNMET" && \
|
|
306
|
+
echo "❌ Unmet dependencies" || \
|
|
307
|
+
echo "✅ Dependencies satisfied"
|
|
299
308
|
}
|
|
300
309
|
|
|
301
|
-
|
|
302
|
-
|
|
303
|
-
|
|
304
|
-
|
|
305
|
-
# Trap ensures restoration even if tests fail
|
|
310
|
+
test_cli_installation
|
|
311
|
+
test_cli_commands
|
|
312
|
+
test_npm_packaging
|
|
306
313
|
```
|
|
307
314
|
|
|
308
|
-
###
|
|
315
|
+
### 4. Import Pipeline Validation (Enhanced)
|
|
309
316
|
```bash
|
|
310
317
|
#!/bin/bash
|
|
311
|
-
echo "===
|
|
318
|
+
echo "=== IMPORT PIPELINE VALIDATION ==="
|
|
312
319
|
|
|
313
|
-
# Test
|
|
314
|
-
|
|
315
|
-
echo "Testing
|
|
320
|
+
# Test unified importer
|
|
321
|
+
test_unified_importer() {
|
|
322
|
+
echo "Testing unified importer..."
|
|
316
323
|
|
|
317
|
-
# Find a test file
|
|
324
|
+
# Find a test JSONL file
|
|
318
325
|
TEST_FILE=$(find ~/.claude/projects -name "*.jsonl" -type f | head -1)
|
|
319
326
|
if [ -z "$TEST_FILE" ]; then
|
|
320
|
-
echo "⚠️
|
|
327
|
+
echo "⚠️ No test files available"
|
|
321
328
|
return
|
|
322
329
|
fi
|
|
323
330
|
|
|
324
|
-
#
|
|
325
|
-
|
|
326
|
-
PROJECT_NAME=$(basename "$PROJECT_DIR")
|
|
327
|
-
COLLECTION="${PROJECT_NAME}_local"
|
|
331
|
+
# Test with limit
|
|
332
|
+
python scripts/import-conversations-unified.py --file "$TEST_FILE" --limit 1
|
|
328
333
|
|
|
329
|
-
|
|
330
|
-
|
|
331
|
-
|
|
332
|
-
# Force re-import
|
|
333
|
-
python scripts/import-conversations-unified.py --file "$TEST_FILE" --force
|
|
334
|
-
|
|
335
|
-
# Count after
|
|
336
|
-
COUNT_AFTER=$(curl -s "http://localhost:6333/collections/$COLLECTION/points/count" | jq '.result.count')
|
|
337
|
-
|
|
338
|
-
if [ "$COUNT_BEFORE" = "$COUNT_AFTER" ]; then
|
|
339
|
-
echo "✅ PASS: No duplicates created on re-import"
|
|
334
|
+
if [ $? -eq 0 ]; then
|
|
335
|
+
echo "✅ Unified importer works"
|
|
340
336
|
else
|
|
341
|
-
echo "❌
|
|
337
|
+
echo "❌ Unified importer failed"
|
|
342
338
|
fi
|
|
343
339
|
}
|
|
344
340
|
|
|
345
|
-
# Test
|
|
346
|
-
|
|
347
|
-
echo "Testing
|
|
348
|
-
|
|
349
|
-
#
|
|
350
|
-
python scripts/import-conversations-unified.py --limit 1
|
|
351
|
-
|
|
352
|
-
|
|
353
|
-
|
|
354
|
-
|
|
355
|
-
|
|
356
|
-
|
|
357
|
-
|
|
358
|
-
|
|
341
|
+
# Test for zero chunks/vectors - CRITICAL
|
|
342
|
+
test_zero_chunks_detection() {
|
|
343
|
+
echo "Testing zero chunks/vectors detection..."
|
|
344
|
+
|
|
345
|
+
# Check recent imports for zero chunks
|
|
346
|
+
IMPORT_LOG=$(python scripts/import-conversations-unified.py --limit 5 2>&1)
|
|
347
|
+
|
|
348
|
+
# Check for zero chunks warnings
|
|
349
|
+
if echo "$IMPORT_LOG" | grep -q "Imported 0 chunks"; then
|
|
350
|
+
echo "❌ CRITICAL: Found imports with 0 chunks!"
|
|
351
|
+
echo " Files producing 0 chunks:"
|
|
352
|
+
echo "$IMPORT_LOG" | grep -B1 "Imported 0 chunks" | grep "import of"
|
|
353
|
+
|
|
354
|
+
# Analyze why chunks are zero
|
|
355
|
+
echo " Analyzing root cause..."
|
|
356
|
+
|
|
357
|
+
# Check for thinking-only content
|
|
358
|
+
PROBLEM_FILE=$(echo "$IMPORT_LOG" | grep -B1 "Imported 0 chunks" | grep "\.jsonl" | head -1 | awk '{print $NF}')
|
|
359
|
+
if [ -n "$PROBLEM_FILE" ]; then
|
|
360
|
+
python -c "
|
|
361
|
+
import json
|
|
362
|
+
file_path = '$PROBLEM_FILE'
|
|
363
|
+
has_thinking = 0
|
|
364
|
+
has_text = 0
|
|
365
|
+
with open(file_path, 'r') as f:
|
|
366
|
+
for line in f:
|
|
367
|
+
data = json.loads(line.strip())
|
|
368
|
+
if 'message' in data and data['message']:
|
|
369
|
+
content = data['message'].get('content', [])
|
|
370
|
+
if isinstance(content, list):
|
|
371
|
+
for item in content:
|
|
372
|
+
if isinstance(item, dict):
|
|
373
|
+
if item.get('type') == 'thinking':
|
|
374
|
+
has_thinking += 1
|
|
375
|
+
elif item.get('type') == 'text':
|
|
376
|
+
has_text += 1
|
|
377
|
+
print(f' Thinking blocks: {has_thinking}')
|
|
378
|
+
print(f' Text blocks: {has_text}')
|
|
379
|
+
if has_thinking > 0 and has_text == 0:
|
|
380
|
+
print(' ⚠️ File has only thinking content - import script may need fix')
|
|
381
|
+
"
|
|
382
|
+
fi
|
|
383
|
+
|
|
384
|
+
# DO NOT CERTIFY WITH ZERO CHUNKS
|
|
385
|
+
echo " ⛔ CERTIFICATION BLOCKED: Fix zero chunks issue before certifying!"
|
|
386
|
+
return 1
|
|
359
387
|
else
|
|
360
|
-
echo "
|
|
388
|
+
echo "✅ No zero chunks detected in recent imports"
|
|
361
389
|
fi
|
|
390
|
+
|
|
391
|
+
# Also check Qdrant for empty collections
|
|
392
|
+
python -c "
|
|
393
|
+
from qdrant_client import QdrantClient
|
|
394
|
+
client = QdrantClient('http://localhost:6333')
|
|
395
|
+
collections = client.get_collections().collections
|
|
396
|
+
empty_collections = []
|
|
397
|
+
for col in collections:
|
|
398
|
+
count = client.count(collection_name=col.name).count
|
|
399
|
+
if count == 0:
|
|
400
|
+
empty_collections.append(col.name)
|
|
401
|
+
if empty_collections:
|
|
402
|
+
print(f'❌ Found {len(empty_collections)} empty collections: {empty_collections}')
|
|
403
|
+
print(' ⛔ CERTIFICATION BLOCKED: Empty collections detected!')
|
|
404
|
+
else:
|
|
405
|
+
print('✅ All collections have vectors')
|
|
406
|
+
" 2>/dev/null || echo "⚠️ Could not check Qdrant collections"
|
|
362
407
|
}
|
|
363
408
|
|
|
364
|
-
# Test
|
|
365
|
-
|
|
366
|
-
echo "Testing
|
|
409
|
+
# Test streaming importer
|
|
410
|
+
test_streaming_importer() {
|
|
411
|
+
echo "Testing streaming importer..."
|
|
367
412
|
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
|
|
371
|
-
|
|
372
|
-
echo "
|
|
373
|
-
else
|
|
374
|
-
echo "❌ FAIL: State file corrupted"
|
|
375
|
-
fi
|
|
413
|
+
if docker ps | grep -q streaming-importer; then
|
|
414
|
+
# Check if processing
|
|
415
|
+
docker logs streaming-importer --tail 10 | grep -q "Processing" && \
|
|
416
|
+
echo "✅ Streaming importer active" || \
|
|
417
|
+
echo "⚠️ Streaming importer idle"
|
|
376
418
|
else
|
|
377
|
-
echo "
|
|
419
|
+
echo "❌ Streaming importer not running"
|
|
378
420
|
fi
|
|
379
421
|
}
|
|
380
422
|
|
|
381
|
-
|
|
382
|
-
|
|
383
|
-
|
|
423
|
+
# Test delta metadata update
|
|
424
|
+
test_delta_metadata() {
|
|
425
|
+
echo "Testing delta metadata update..."
|
|
426
|
+
|
|
427
|
+
DRY_RUN=true python scripts/delta-metadata-update.py 2>&1 | grep -q "would update" && \
|
|
428
|
+
echo "✅ Delta metadata updater works" || \
|
|
429
|
+
echo "❌ Delta metadata updater failed"
|
|
430
|
+
}
|
|
431
|
+
|
|
432
|
+
test_unified_importer
|
|
433
|
+
test_zero_chunks_detection # CRITICAL: Must pass before certification
|
|
434
|
+
test_streaming_importer
|
|
435
|
+
test_delta_metadata
|
|
384
436
|
```
|
|
385
437
|
|
|
386
|
-
###
|
|
438
|
+
### 5. Hook System Testing
|
|
387
439
|
```bash
|
|
388
440
|
#!/bin/bash
|
|
389
|
-
echo "===
|
|
441
|
+
echo "=== HOOK SYSTEM TESTING ==="
|
|
442
|
+
|
|
443
|
+
# Test session-start hook
|
|
444
|
+
test_session_start_hook() {
|
|
445
|
+
echo "Testing session-start hook..."
|
|
446
|
+
HOOK_PATH="$HOME/.claude/hooks/session-start"
|
|
447
|
+
if [ -f "$HOOK_PATH" ]; then
|
|
448
|
+
echo "✅ session-start hook exists"
|
|
449
|
+
# Check if executable
|
|
450
|
+
[ -x "$HOOK_PATH" ] && echo "✅ Hook is executable" || echo "❌ Hook not executable"
|
|
451
|
+
else
|
|
452
|
+
echo "⚠️ session-start hook not configured"
|
|
453
|
+
fi
|
|
454
|
+
}
|
|
390
455
|
|
|
391
|
-
# Test
|
|
392
|
-
|
|
393
|
-
echo "Testing
|
|
394
|
-
|
|
395
|
-
|
|
396
|
-
|
|
397
|
-
|
|
398
|
-
|
|
399
|
-
|
|
400
|
-
|
|
401
|
-
DURATION=$((END_TIME - START_TIME))
|
|
402
|
-
|
|
403
|
-
if [ $DURATION -lt 10 ]; then
|
|
404
|
-
echo "✅ PASS: Import completed in ${DURATION}s"
|
|
405
|
-
else
|
|
406
|
-
echo "⚠️ WARN: Import took ${DURATION}s (expected <10s)"
|
|
407
|
-
fi
|
|
456
|
+
# Test precompact hook
|
|
457
|
+
test_precompact_hook() {
|
|
458
|
+
echo "Testing precompact hook..."
|
|
459
|
+
HOOK_PATH="$HOME/.claude/hooks/precompact"
|
|
460
|
+
if [ -f "$HOOK_PATH" ]; then
|
|
461
|
+
echo "✅ precompact hook exists"
|
|
462
|
+
# Test execution
|
|
463
|
+
timeout 10 "$HOOK_PATH" && echo "✅ Hook executes successfully" || echo "❌ Hook failed"
|
|
464
|
+
else
|
|
465
|
+
echo "⚠️ precompact hook not configured"
|
|
408
466
|
fi
|
|
409
467
|
}
|
|
410
468
|
|
|
411
|
-
|
|
412
|
-
|
|
413
|
-
|
|
414
|
-
|
|
469
|
+
test_session_start_hook
|
|
470
|
+
test_precompact_hook
|
|
471
|
+
```
|
|
472
|
+
|
|
473
|
+
### 6. Metadata Extraction Testing
|
|
474
|
+
```bash
|
|
475
|
+
#!/bin/bash
|
|
476
|
+
echo "=== METADATA EXTRACTION TESTING ==="
|
|
477
|
+
|
|
478
|
+
# Test metadata extraction
|
|
479
|
+
test_metadata_extraction() {
|
|
480
|
+
echo "Testing metadata extraction..."
|
|
415
481
|
python -c "
|
|
416
|
-
import
|
|
482
|
+
import json
|
|
483
|
+
from pathlib import Path
|
|
484
|
+
|
|
485
|
+
# Check if metadata is being extracted
|
|
486
|
+
config_dir = Path.home() / '.claude-self-reflect' / 'config'
|
|
487
|
+
delta_state = config_dir / 'delta-update-state.json'
|
|
488
|
+
|
|
489
|
+
if delta_state.exists():
|
|
490
|
+
with open(delta_state) as f:
|
|
491
|
+
state = json.load(f)
|
|
492
|
+
updated = state.get('updated_points', {})
|
|
493
|
+
if updated:
|
|
494
|
+
sample = list(updated.values())[0] if updated else {}
|
|
495
|
+
print(f'✅ Metadata extracted for {len(updated)} points')
|
|
496
|
+
if 'files_analyzed' in str(sample):
|
|
497
|
+
print('✅ files_analyzed metadata present')
|
|
498
|
+
if 'tools_used' in str(sample):
|
|
499
|
+
print('✅ tools_used metadata present')
|
|
500
|
+
if 'concepts' in str(sample):
|
|
501
|
+
print('✅ concepts metadata present')
|
|
502
|
+
if 'code_patterns' in str(sample):
|
|
503
|
+
print('✅ code_patterns (AST) metadata present')
|
|
504
|
+
else:
|
|
505
|
+
print('⚠️ No metadata updates found')
|
|
506
|
+
else:
|
|
507
|
+
print('❌ Delta update state file not found')
|
|
508
|
+
"
|
|
509
|
+
}
|
|
510
|
+
|
|
511
|
+
# Test AST pattern extraction
|
|
512
|
+
test_ast_patterns() {
|
|
513
|
+
echo "Testing AST pattern extraction..."
|
|
514
|
+
TEST_FILE=$(mktemp)
|
|
515
|
+
cat > "$TEST_FILE" << 'EOF'
|
|
516
|
+
import ast
|
|
517
|
+
text = "def test(): return True"
|
|
518
|
+
tree = ast.parse(text)
|
|
519
|
+
patterns = [node.__class__.__name__ for node in ast.walk(tree)]
|
|
520
|
+
print(f"AST patterns: {patterns}")
|
|
521
|
+
EOF
|
|
522
|
+
python "$TEST_FILE"
|
|
523
|
+
rm "$TEST_FILE"
|
|
524
|
+
}
|
|
525
|
+
|
|
526
|
+
test_metadata_extraction
|
|
527
|
+
test_ast_patterns
|
|
528
|
+
```
|
|
529
|
+
|
|
530
|
+
### 7. Zero Vector Investigation
|
|
531
|
+
```bash
|
|
532
|
+
#!/bin/bash
|
|
533
|
+
echo "=== ZERO VECTOR INVESTIGATION ==="
|
|
534
|
+
|
|
535
|
+
test_zero_vectors() {
|
|
536
|
+
python -c "
|
|
537
|
+
import numpy as np
|
|
417
538
|
from qdrant_client import QdrantClient
|
|
418
|
-
from fastembed import TextEmbedding
|
|
419
539
|
|
|
540
|
+
# Connect to Qdrant
|
|
420
541
|
client = QdrantClient('http://localhost:6333')
|
|
421
|
-
model = TextEmbedding('sentence-transformers/all-MiniLM-L6-v2')
|
|
422
542
|
|
|
423
|
-
#
|
|
424
|
-
|
|
425
|
-
|
|
426
|
-
|
|
427
|
-
|
|
428
|
-
|
|
429
|
-
|
|
430
|
-
|
|
431
|
-
|
|
432
|
-
|
|
433
|
-
|
|
434
|
-
|
|
435
|
-
|
|
436
|
-
|
|
437
|
-
|
|
438
|
-
|
|
543
|
+
# Check for zero vectors
|
|
544
|
+
collections = client.get_collections().collections
|
|
545
|
+
zero_count = 0
|
|
546
|
+
total_checked = 0
|
|
547
|
+
|
|
548
|
+
for col in collections[:5]: # Check first 5 collections
|
|
549
|
+
try:
|
|
550
|
+
points = client.scroll(
|
|
551
|
+
collection_name=col.name,
|
|
552
|
+
limit=10,
|
|
553
|
+
with_vectors=True
|
|
554
|
+
)[0]
|
|
555
|
+
|
|
556
|
+
for point in points:
|
|
557
|
+
total_checked += 1
|
|
558
|
+
if point.vector:
|
|
559
|
+
if isinstance(point.vector, list) and all(v == 0 for v in point.vector):
|
|
560
|
+
zero_count += 1
|
|
561
|
+
print(f'❌ CRITICAL: Zero vector in {col.name}, point {point.id}')
|
|
562
|
+
elif isinstance(point.vector, dict):
|
|
563
|
+
for vec_name, vec in point.vector.items():
|
|
564
|
+
if all(v == 0 for v in vec):
|
|
565
|
+
zero_count += 1
|
|
566
|
+
print(f'❌ CRITICAL: Zero vector in {col.name}, point {point.id}, vector {vec_name}')
|
|
567
|
+
except Exception as e:
|
|
568
|
+
print(f'⚠️ Error checking {col.name}: {e}')
|
|
569
|
+
|
|
570
|
+
if zero_count == 0:
|
|
571
|
+
print(f'✅ No zero vectors found (checked {total_checked} points)')
|
|
439
572
|
else:
|
|
440
|
-
print(f'
|
|
573
|
+
print(f'❌ Found {zero_count} zero vectors out of {total_checked} points')
|
|
441
574
|
"
|
|
442
575
|
}
|
|
443
576
|
|
|
444
|
-
# Test
|
|
445
|
-
|
|
446
|
-
echo "Testing
|
|
447
|
-
|
|
448
|
-
|
|
449
|
-
|
|
450
|
-
|
|
451
|
-
|
|
452
|
-
|
|
577
|
+
# Test embedding generation
|
|
578
|
+
test_embedding_generation() {
|
|
579
|
+
echo "Testing embedding generation..."
|
|
580
|
+
python -c "
|
|
581
|
+
try:
|
|
582
|
+
from fastembed import TextEmbedding
|
|
583
|
+
model = TextEmbedding('sentence-transformers/all-MiniLM-L6-v2')
|
|
584
|
+
texts = ['test', 'hello world', '']
|
|
585
|
+
|
|
586
|
+
for text in texts:
|
|
587
|
+
embedding = list(model.embed([text]))[0]
|
|
588
|
+
is_zero = all(v == 0 for v in embedding)
|
|
589
|
+
if is_zero:
|
|
590
|
+
print(f'❌ CRITICAL: Zero embedding for \'{text}\'')
|
|
591
|
+
else:
|
|
592
|
+
import numpy as np
|
|
593
|
+
print(f'✅ Non-zero embedding for \'{text}\' (mean={np.mean(embedding):.4f})')
|
|
594
|
+
except ImportError:
|
|
595
|
+
print('❌ FastEmbed not installed')
|
|
596
|
+
"
|
|
597
|
+
}
|
|
598
|
+
|
|
599
|
+
test_zero_vectors
|
|
600
|
+
test_embedding_generation
|
|
601
|
+
```
|
|
602
|
+
|
|
603
|
+
### 8. Sub-Agent Testing
|
|
604
|
+
```bash
|
|
605
|
+
#!/bin/bash
|
|
606
|
+
echo "=== SUB-AGENT TESTING ==="
|
|
607
|
+
|
|
608
|
+
# List all sub-agents
|
|
609
|
+
test_subagent_availability() {
|
|
610
|
+
echo "Checking sub-agent availability..."
|
|
611
|
+
AGENTS_DIR="$HOME/projects/claude-self-reflect/.claude/agents"
|
|
612
|
+
|
|
613
|
+
EXPECTED_AGENTS=(
|
|
614
|
+
"claude-self-reflect-test.md"
|
|
615
|
+
"import-debugger.md"
|
|
616
|
+
"docker-orchestrator.md"
|
|
617
|
+
"mcp-integration.md"
|
|
618
|
+
"search-optimizer.md"
|
|
619
|
+
"reflection-specialist.md"
|
|
620
|
+
"qdrant-specialist.md"
|
|
621
|
+
)
|
|
622
|
+
|
|
623
|
+
for agent in "${EXPECTED_AGENTS[@]}"; do
|
|
624
|
+
if [ -f "$AGENTS_DIR/$agent" ]; then
|
|
625
|
+
echo "✅ $agent present"
|
|
453
626
|
else
|
|
454
|
-
echo "
|
|
627
|
+
echo "❌ $agent missing"
|
|
455
628
|
fi
|
|
629
|
+
done
|
|
630
|
+
}
|
|
631
|
+
|
|
632
|
+
test_subagent_availability
|
|
633
|
+
```
|
|
634
|
+
|
|
635
|
+
### 9. Embedding Mode Comprehensive Test
|
|
636
|
+
```bash
|
|
637
|
+
#!/bin/bash
|
|
638
|
+
echo "=== EMBEDDING MODE TESTING ==="
|
|
639
|
+
|
|
640
|
+
# Test both modes
|
|
641
|
+
test_both_embedding_modes() {
|
|
642
|
+
echo "Testing local mode (FastEmbed)..."
|
|
643
|
+
PREFER_LOCAL_EMBEDDINGS=true python -c "
|
|
644
|
+
from mcp_server.src.embedding_manager import get_embedding_manager
|
|
645
|
+
em = get_embedding_manager()
|
|
646
|
+
print(f'Local mode: {em.model_type}, dimension: {em.get_vector_dimension()}')
|
|
647
|
+
"
|
|
648
|
+
|
|
649
|
+
if [ -n "$VOYAGE_KEY" ]; then
|
|
650
|
+
echo "Testing cloud mode (Voyage AI)..."
|
|
651
|
+
PREFER_LOCAL_EMBEDDINGS=false python -c "
|
|
652
|
+
from mcp_server.src.embedding_manager import get_embedding_manager
|
|
653
|
+
em = get_embedding_manager()
|
|
654
|
+
print(f'Cloud mode: {em.model_type}, dimension: {em.get_vector_dimension()}')
|
|
655
|
+
"
|
|
656
|
+
else
|
|
657
|
+
echo "⚠️ VOYAGE_KEY not set, skipping cloud mode test"
|
|
456
658
|
fi
|
|
457
659
|
}
|
|
458
660
|
|
|
459
|
-
|
|
460
|
-
|
|
461
|
-
|
|
661
|
+
# Test mode switching
|
|
662
|
+
test_mode_switching() {
|
|
663
|
+
echo "Testing mode switching..."
|
|
664
|
+
python -c "
|
|
665
|
+
from pathlib import Path
|
|
666
|
+
env_file = Path('.env')
|
|
667
|
+
if env_file.exists():
|
|
668
|
+
content = env_file.read_text()
|
|
669
|
+
if 'PREFER_LOCAL_EMBEDDINGS=false' in content:
|
|
670
|
+
print('Currently in CLOUD mode')
|
|
671
|
+
else:
|
|
672
|
+
print('Currently in LOCAL mode')
|
|
673
|
+
|
|
674
|
+
# Test switching
|
|
675
|
+
print('Testing switch to LOCAL mode...')
|
|
676
|
+
new_content = content.replace('PREFER_LOCAL_EMBEDDINGS=false', 'PREFER_LOCAL_EMBEDDINGS=true')
|
|
677
|
+
env_file.write_text(new_content)
|
|
678
|
+
print('✅ Switched to LOCAL mode')
|
|
679
|
+
else:
|
|
680
|
+
print('⚠️ .env file not found')
|
|
681
|
+
"
|
|
682
|
+
}
|
|
683
|
+
|
|
684
|
+
test_both_embedding_modes
|
|
685
|
+
test_mode_switching
|
|
462
686
|
```
|
|
463
687
|
|
|
464
|
-
###
|
|
688
|
+
### 10. MCP Tools Comprehensive Test
|
|
465
689
|
```bash
|
|
466
690
|
#!/bin/bash
|
|
467
|
-
echo "===
|
|
691
|
+
echo "=== MCP TOOLS COMPREHENSIVE TEST ==="
|
|
692
|
+
|
|
693
|
+
# This should be run via Claude Code for actual MCP testing
|
|
694
|
+
cat << 'EOF'
|
|
695
|
+
To test all MCP tools in Claude Code:
|
|
696
|
+
|
|
697
|
+
1. SEARCH TOOLS:
|
|
698
|
+
- mcp__claude-self-reflect__reflect_on_past("test query", limit=3)
|
|
699
|
+
- mcp__claude-self-reflect__quick_search("test")
|
|
700
|
+
- mcp__claude-self-reflect__search_summary("test")
|
|
701
|
+
- mcp__claude-self-reflect__search_by_file("server.py")
|
|
702
|
+
- mcp__claude-self-reflect__search_by_concept("testing")
|
|
703
|
+
|
|
704
|
+
2. TEMPORAL TOOLS (NEW):
|
|
705
|
+
- mcp__claude-self-reflect__get_recent_work(limit=5)
|
|
706
|
+
- mcp__claude-self-reflect__get_recent_work(project="all")
|
|
707
|
+
- mcp__claude-self-reflect__search_by_recency("bug", time_range="last week")
|
|
708
|
+
- mcp__claude-self-reflect__get_timeline(time_range="last month", granularity="week")
|
|
709
|
+
|
|
710
|
+
3. REFLECTION TOOLS:
|
|
711
|
+
- mcp__claude-self-reflect__store_reflection("Test insight", tags=["test"])
|
|
712
|
+
- mcp__claude-self-reflect__get_full_conversation("conversation-id")
|
|
713
|
+
|
|
714
|
+
4. PAGINATION:
|
|
715
|
+
- mcp__claude-self-reflect__get_more_results("query", offset=3)
|
|
716
|
+
- mcp__claude-self-reflect__get_next_results("query", offset=3)
|
|
717
|
+
|
|
718
|
+
Expected Results:
|
|
719
|
+
- All tools should return valid XML/markdown responses
|
|
720
|
+
- Search scores should be > 0.3 for relevant results
|
|
721
|
+
- Temporal tools should respect project scoping
|
|
722
|
+
- No errors or timeouts
|
|
723
|
+
EOF
|
|
724
|
+
```
|
|
468
725
|
|
|
469
|
-
|
|
470
|
-
|
|
471
|
-
|
|
472
|
-
|
|
473
|
-
|
|
474
|
-
|
|
475
|
-
|
|
476
|
-
|
|
477
|
-
)
|
|
726
|
+
### 6. Docker Health Validation
|
|
727
|
+
```bash
|
|
728
|
+
#!/bin/bash
|
|
729
|
+
echo "=== DOCKER HEALTH VALIDATION ==="
|
|
730
|
+
|
|
731
|
+
# Check Qdrant health
|
|
732
|
+
check_qdrant_health() {
|
|
733
|
+
echo "Checking Qdrant health..."
|
|
478
734
|
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
|
|
482
|
-
|
|
483
|
-
|
|
735
|
+
# Check if running
|
|
736
|
+
if docker ps | grep -q qdrant; then
|
|
737
|
+
# Check API responsive
|
|
738
|
+
curl -s http://localhost:6333/health | grep -q "ok" && \
|
|
739
|
+
echo "✅ Qdrant healthy" || \
|
|
740
|
+
echo "❌ Qdrant API not responding"
|
|
741
|
+
|
|
742
|
+
# Check disk usage
|
|
743
|
+
DISK_USAGE=$(docker exec qdrant df -h /qdrant/storage | tail -1 | awk '{print $5}' | sed 's/%//')
|
|
744
|
+
if [ "$DISK_USAGE" -lt 80 ]; then
|
|
745
|
+
echo "✅ Disk usage: ${DISK_USAGE}%"
|
|
746
|
+
else
|
|
747
|
+
echo "⚠️ High disk usage: ${DISK_USAGE}%"
|
|
484
748
|
fi
|
|
485
|
-
|
|
486
|
-
|
|
487
|
-
if [ "$EXPOSED" = false ]; then
|
|
488
|
-
echo "✅ PASS: No API key exposure detected"
|
|
749
|
+
else
|
|
750
|
+
echo "❌ Qdrant not running"
|
|
489
751
|
fi
|
|
490
752
|
}
|
|
491
753
|
|
|
492
|
-
# Check
|
|
493
|
-
|
|
494
|
-
echo "Checking
|
|
754
|
+
# Check watcher health
|
|
755
|
+
check_watcher_health() {
|
|
756
|
+
echo "Checking watcher health..."
|
|
495
757
|
|
|
496
|
-
|
|
497
|
-
if
|
|
498
|
-
# Check
|
|
499
|
-
|
|
500
|
-
if [ -
|
|
501
|
-
echo "✅
|
|
758
|
+
WATCHER_NAME="claude-reflection-safe-watcher"
|
|
759
|
+
if docker ps | grep -q "$WATCHER_NAME"; then
|
|
760
|
+
# Check memory usage
|
|
761
|
+
MEM=$(docker stats --no-stream --format "{{.MemUsage}}" "$WATCHER_NAME" 2>/dev/null | cut -d'/' -f1 | sed 's/[^0-9.]//g')
|
|
762
|
+
if [ -n "$MEM" ]; then
|
|
763
|
+
echo "✅ Watcher running (Memory: ${MEM}MB)"
|
|
502
764
|
else
|
|
503
|
-
echo "⚠️
|
|
765
|
+
echo "⚠️ Watcher running but stats unavailable"
|
|
504
766
|
fi
|
|
767
|
+
|
|
768
|
+
# Check for errors in logs
|
|
769
|
+
ERROR_COUNT=$(docker logs "$WATCHER_NAME" --tail 100 2>&1 | grep -c ERROR)
|
|
770
|
+
if [ "$ERROR_COUNT" -eq 0 ]; then
|
|
771
|
+
echo "✅ No errors in recent logs"
|
|
772
|
+
else
|
|
773
|
+
echo "⚠️ Found $ERROR_COUNT errors in logs"
|
|
774
|
+
fi
|
|
775
|
+
else
|
|
776
|
+
echo "❌ Watcher not running"
|
|
505
777
|
fi
|
|
506
778
|
}
|
|
507
779
|
|
|
508
|
-
|
|
509
|
-
|
|
510
|
-
|
|
780
|
+
# Check docker-compose status
|
|
781
|
+
check_compose_status() {
|
|
782
|
+
echo "Checking docker-compose status..."
|
|
783
|
+
|
|
784
|
+
if [ -f docker-compose.yaml ]; then
|
|
785
|
+
# Validate compose file
|
|
786
|
+
docker-compose config --quiet 2>/dev/null && \
|
|
787
|
+
echo "✅ docker-compose.yaml valid" || \
|
|
788
|
+
echo "❌ docker-compose.yaml has errors"
|
|
789
|
+
|
|
790
|
+
# Check defined services
|
|
791
|
+
SERVICES=$(docker-compose config --services 2>/dev/null)
|
|
792
|
+
echo "Defined services: $SERVICES"
|
|
793
|
+
else
|
|
794
|
+
echo "❌ docker-compose.yaml not found"
|
|
795
|
+
fi
|
|
796
|
+
}
|
|
511
797
|
|
|
512
|
-
|
|
798
|
+
check_qdrant_health
|
|
799
|
+
check_watcher_health
|
|
800
|
+
check_compose_status
|
|
801
|
+
```
|
|
513
802
|
|
|
514
|
-
###
|
|
803
|
+
### 7. Modularization Readiness Check (NEW)
|
|
515
804
|
```bash
|
|
516
805
|
#!/bin/bash
|
|
517
|
-
|
|
806
|
+
echo "=== MODULARIZATION READINESS CHECK ==="
|
|
518
807
|
|
|
519
|
-
|
|
520
|
-
|
|
521
|
-
echo "
|
|
522
|
-
|
|
808
|
+
# Analyze server.py for modularization
|
|
809
|
+
analyze_server_py() {
|
|
810
|
+
echo "Analyzing server.py for modularization..."
|
|
811
|
+
|
|
812
|
+
FILE="mcp-server/src/server.py"
|
|
813
|
+
if [ -f "$FILE" ]; then
|
|
814
|
+
# Count lines
|
|
815
|
+
LINES=$(wc -l < "$FILE")
|
|
816
|
+
echo "Total lines: $LINES"
|
|
817
|
+
|
|
818
|
+
# Count tools
|
|
819
|
+
TOOL_COUNT=$(grep -c "@mcp.tool()" "$FILE")
|
|
820
|
+
echo "MCP tools defined: $TOOL_COUNT"
|
|
821
|
+
|
|
822
|
+
# Count imports
|
|
823
|
+
IMPORT_COUNT=$(grep -c "^import\|^from" "$FILE")
|
|
824
|
+
echo "Import statements: $IMPORT_COUNT"
|
|
825
|
+
|
|
826
|
+
# Identify major sections
|
|
827
|
+
echo -e "\nMajor sections to extract:"
|
|
828
|
+
echo "- Temporal tools (get_recent_work, search_by_recency, get_timeline)"
|
|
829
|
+
echo "- Search tools (reflect_on_past, quick_search, etc.)"
|
|
830
|
+
echo "- Reflection tools (store_reflection, get_full_conversation)"
|
|
831
|
+
echo "- Embedding management (EmbeddingManager, generate_embedding)"
|
|
832
|
+
echo "- Decay logic (calculate_decay, apply_decay)"
|
|
833
|
+
echo "- Utils (ProjectResolver, normalize_project_name)"
|
|
834
|
+
|
|
835
|
+
# Check for circular dependencies
|
|
836
|
+
echo -e "\nChecking for potential circular dependencies..."
|
|
837
|
+
grep -q "from server import" "$FILE" && \
|
|
838
|
+
echo "⚠️ Potential circular imports detected" || \
|
|
839
|
+
echo "✅ No obvious circular imports"
|
|
840
|
+
else
|
|
841
|
+
echo "❌ server.py not found"
|
|
842
|
+
fi
|
|
843
|
+
}
|
|
523
844
|
|
|
524
|
-
#
|
|
525
|
-
|
|
526
|
-
|
|
527
|
-
|
|
528
|
-
|
|
529
|
-
|
|
530
|
-
|
|
531
|
-
|
|
532
|
-
|
|
533
|
-
|
|
534
|
-
|
|
535
|
-
|
|
536
|
-
|
|
537
|
-
|
|
538
|
-
|
|
539
|
-
echo "
|
|
540
|
-
|
|
541
|
-
|
|
542
|
-
|
|
543
|
-
echo "Step 4: Generating test report..."
|
|
544
|
-
cat > test-report-$(date +%Y%m%d).md << EOF
|
|
545
|
-
# Claude Self-Reflect Test Report
|
|
845
|
+
# Check for existing modular files
|
|
846
|
+
check_existing_modules() {
|
|
847
|
+
echo -e "\nChecking for existing modular files..."
|
|
848
|
+
|
|
849
|
+
MODULES=(
|
|
850
|
+
"temporal_utils.py"
|
|
851
|
+
"temporal_design.py"
|
|
852
|
+
"project_resolver.py"
|
|
853
|
+
"embedding_manager.py"
|
|
854
|
+
)
|
|
855
|
+
|
|
856
|
+
for module in "${MODULES[@]}"; do
|
|
857
|
+
if [ -f "mcp-server/src/$module" ]; then
|
|
858
|
+
echo "✅ $module exists"
|
|
859
|
+
else
|
|
860
|
+
echo "⚠️ $module not found (needs creation)"
|
|
861
|
+
fi
|
|
862
|
+
done
|
|
863
|
+
}
|
|
546
864
|
|
|
547
|
-
|
|
548
|
-
|
|
549
|
-
|
|
550
|
-
- All Tests: PASS/FAIL
|
|
865
|
+
analyze_server_py
|
|
866
|
+
check_existing_modules
|
|
867
|
+
```
|
|
551
868
|
|
|
552
|
-
|
|
553
|
-
|
|
554
|
-
|
|
555
|
-
|
|
556
|
-
- Data Integrity: ✅
|
|
557
|
-
- Performance: ✅
|
|
558
|
-
- Security: ✅
|
|
559
|
-
- Dual Mode: ✅
|
|
869
|
+
### 8. Performance & Memory Testing
|
|
870
|
+
```bash
|
|
871
|
+
#!/bin/bash
|
|
872
|
+
echo "=== PERFORMANCE & MEMORY TESTING ==="
|
|
560
873
|
|
|
561
|
-
|
|
562
|
-
|
|
563
|
-
|
|
874
|
+
# Test search performance with temporal tools
|
|
875
|
+
test_search_performance() {
|
|
876
|
+
echo "Testing search performance..."
|
|
877
|
+
|
|
878
|
+
python -c "
|
|
879
|
+
import time
|
|
880
|
+
import asyncio
|
|
881
|
+
import sys
|
|
882
|
+
import os
|
|
883
|
+
sys.path.insert(0, 'mcp-server/src')
|
|
884
|
+
os.environ['QDRANT_URL'] = 'http://localhost:6333'
|
|
564
885
|
|
|
565
|
-
|
|
886
|
+
async def test():
|
|
887
|
+
from server import get_recent_work, search_by_recency
|
|
888
|
+
|
|
889
|
+
class MockContext:
|
|
890
|
+
async def debug(self, msg): pass
|
|
891
|
+
async def report_progress(self, *args): pass
|
|
892
|
+
|
|
893
|
+
ctx = MockContext()
|
|
894
|
+
|
|
895
|
+
# Time get_recent_work
|
|
896
|
+
start = time.time()
|
|
897
|
+
await get_recent_work(ctx, limit=10)
|
|
898
|
+
recent_time = time.time() - start
|
|
899
|
+
|
|
900
|
+
# Time search_by_recency
|
|
901
|
+
start = time.time()
|
|
902
|
+
await search_by_recency(ctx, 'test', 'last week')
|
|
903
|
+
search_time = time.time() - start
|
|
904
|
+
|
|
905
|
+
print(f'get_recent_work: {recent_time:.2f}s')
|
|
906
|
+
print(f'search_by_recency: {search_time:.2f}s')
|
|
907
|
+
|
|
908
|
+
if recent_time < 2 and search_time < 2:
|
|
909
|
+
print('✅ Performance acceptable')
|
|
910
|
+
else:
|
|
911
|
+
print('⚠️ Performance needs optimization')
|
|
912
|
+
|
|
913
|
+
asyncio.run(test())
|
|
914
|
+
"
|
|
915
|
+
}
|
|
916
|
+
|
|
917
|
+
# Test memory usage
|
|
918
|
+
test_memory_usage() {
|
|
919
|
+
echo "Testing memory usage..."
|
|
920
|
+
|
|
921
|
+
# Check Python process memory
|
|
922
|
+
python -c "
|
|
923
|
+
import psutil
|
|
924
|
+
import os
|
|
925
|
+
process = psutil.Process(os.getpid())
|
|
926
|
+
mem_mb = process.memory_info().rss / 1024 / 1024
|
|
927
|
+
print(f'Python process: {mem_mb:.1f}MB')
|
|
928
|
+
"
|
|
929
|
+
|
|
930
|
+
# Check Docker container memory
|
|
931
|
+
for container in qdrant claude-reflection-safe-watcher; do
|
|
932
|
+
if docker ps | grep -q $container; then
|
|
933
|
+
MEM=$(docker stats --no-stream --format "{{.MemUsage}}" $container 2>/dev/null | cut -d'/' -f1 | sed 's/[^0-9.]//g')
|
|
934
|
+
echo "$container: ${MEM}MB"
|
|
935
|
+
fi
|
|
936
|
+
done
|
|
937
|
+
}
|
|
938
|
+
|
|
939
|
+
test_search_performance
|
|
940
|
+
test_memory_usage
|
|
566
941
|
```
|
|
567
942
|
|
|
568
|
-
###
|
|
943
|
+
### 9. Complete Test Report Generator
|
|
569
944
|
```bash
|
|
570
945
|
#!/bin/bash
|
|
571
|
-
|
|
946
|
+
echo "=== GENERATING TEST REPORT ==="
|
|
572
947
|
|
|
573
|
-
|
|
948
|
+
REPORT_FILE="test-report-$(date +%Y%m%d-%H%M%S).md"
|
|
574
949
|
|
|
575
|
-
|
|
576
|
-
|
|
577
|
-
rm -rf data/ config/
|
|
578
|
-
claude mcp remove claude-self-reflect
|
|
950
|
+
cat > "$REPORT_FILE" << EOF
|
|
951
|
+
# Claude Self-Reflect Test Report
|
|
579
952
|
|
|
580
|
-
|
|
581
|
-
|
|
953
|
+
## Test Summary
|
|
954
|
+
- **Date**: $(date)
|
|
955
|
+
- **Version**: $(grep version package.json | cut -d'"' -f4)
|
|
956
|
+
- **Server.py Lines**: $(wc -l < mcp-server/src/server.py)
|
|
957
|
+
- **Collections**: $(curl -s http://localhost:6333/collections | jq '.result.collections | length')
|
|
958
|
+
|
|
959
|
+
## Feature Tests
|
|
960
|
+
|
|
961
|
+
### Core Features
|
|
962
|
+
- [ ] Import Pipeline: PASS/FAIL
|
|
963
|
+
- [ ] MCP Tools (12): PASS/FAIL
|
|
964
|
+
- [ ] Search Quality: PASS/FAIL
|
|
965
|
+
- [ ] State Management: PASS/FAIL
|
|
966
|
+
|
|
967
|
+
### v3.x Features
|
|
968
|
+
- [ ] Temporal Tools (3): PASS/FAIL
|
|
969
|
+
- [ ] get_recent_work: PASS/FAIL
|
|
970
|
+
- [ ] search_by_recency: PASS/FAIL
|
|
971
|
+
- [ ] get_timeline: PASS/FAIL
|
|
972
|
+
- [ ] Timestamp Indexes: PASS/FAIL
|
|
973
|
+
- [ ] Project Scoping: PASS/FAIL
|
|
974
|
+
|
|
975
|
+
### Infrastructure
|
|
976
|
+
- [ ] CLI Tool: PASS/FAIL
|
|
977
|
+
- [ ] Docker Health: PASS/FAIL
|
|
978
|
+
- [ ] Qdrant: PASS/FAIL
|
|
979
|
+
- [ ] Watcher: PASS/FAIL
|
|
582
980
|
|
|
583
|
-
|
|
584
|
-
|
|
981
|
+
### Performance
|
|
982
|
+
- [ ] Search < 2s: PASS/FAIL
|
|
983
|
+
- [ ] Import < 10s: PASS/FAIL
|
|
984
|
+
- [ ] Memory < 500MB: PASS/FAIL
|
|
985
|
+
|
|
986
|
+
### Code Quality
|
|
987
|
+
- [ ] No Critical Bugs: PASS/FAIL
|
|
988
|
+
- [ ] XML Injection Fixed: PASS/FAIL
|
|
989
|
+
- [ ] Native Decay Fixed: PASS/FAIL
|
|
990
|
+
- [ ] Modularization Ready: PASS/FAIL
|
|
991
|
+
|
|
992
|
+
## Observations
|
|
993
|
+
$(date): Test execution started
|
|
994
|
+
$(date): All temporal tools tested
|
|
995
|
+
$(date): Project scoping validated
|
|
996
|
+
$(date): CLI packaging verified
|
|
997
|
+
$(date): Docker health confirmed
|
|
998
|
+
|
|
999
|
+
## Recommendations
|
|
1000
|
+
1. Fix critical bugs before release
|
|
1001
|
+
2. Complete modularization (2,835 lines → multiple modules)
|
|
1002
|
+
3. Add more comprehensive unit tests
|
|
1003
|
+
4. Update documentation for v3.x features
|
|
585
1004
|
|
|
586
|
-
|
|
587
|
-
|
|
1005
|
+
## Certification
|
|
1006
|
+
**System Ready for Release**: YES/NO
|
|
588
1007
|
|
|
589
|
-
|
|
590
|
-
|
|
1008
|
+
## Sign-off
|
|
1009
|
+
Tested by: claude-self-reflect-test agent
|
|
1010
|
+
Date: $(date)
|
|
1011
|
+
EOF
|
|
591
1012
|
|
|
592
|
-
|
|
593
|
-
echo "Manual step: Test MCP tools in Claude Code"
|
|
1013
|
+
echo "✅ Test report generated: $REPORT_FILE"
|
|
594
1014
|
```
|
|
595
1015
|
|
|
596
|
-
##
|
|
597
|
-
|
|
598
|
-
### Core Functionality
|
|
599
|
-
- [ ] Import pipeline processes all JSONL files
|
|
600
|
-
- [ ] Embeddings generated correctly (384/1024 dims)
|
|
601
|
-
- [ ] Qdrant stores vectors with proper metadata
|
|
602
|
-
- [ ] MCP tools accessible and functional
|
|
603
|
-
- [ ] Search returns relevant results (>0.7 scores)
|
|
1016
|
+
## Pre-Test Validation Protocol
|
|
604
1017
|
|
|
605
|
-
###
|
|
606
|
-
|
|
607
|
-
- [ ] File locking prevents corruption
|
|
608
|
-
- [ ] State persists across restarts
|
|
609
|
-
- [ ] Resume works after interruption
|
|
610
|
-
- [ ] Retry logic handles transient failures
|
|
1018
|
+
### Agent Self-Review
|
|
1019
|
+
Before running any tests, I MUST review myself to ensure comprehensive coverage:
|
|
611
1020
|
|
|
612
|
-
|
|
613
|
-
|
|
614
|
-
|
|
615
|
-
|
|
616
|
-
|
|
617
|
-
|
|
1021
|
+
```bash
|
|
1022
|
+
#!/bin/bash
|
|
1023
|
+
echo "=== PRE-TEST AGENT VALIDATION ==="
|
|
1024
|
+
|
|
1025
|
+
# Review this agent file for completeness
|
|
1026
|
+
review_agent_completeness() {
|
|
1027
|
+
echo "Reviewing CSR-tester agent for missing features..."
|
|
1028
|
+
|
|
1029
|
+
# Check if agent covers all known features
|
|
1030
|
+
AGENT_FILE="$HOME/projects/claude-self-reflect/.claude/agents/claude-self-reflect-test.md"
|
|
1031
|
+
|
|
1032
|
+
REQUIRED_FEATURES=(
|
|
1033
|
+
"15+ MCP tools"
|
|
1034
|
+
"Temporal tools"
|
|
1035
|
+
"Metadata extraction"
|
|
1036
|
+
"Hook system"
|
|
1037
|
+
"Sub-agents"
|
|
1038
|
+
"Embedding modes"
|
|
1039
|
+
"Zero vectors"
|
|
1040
|
+
"Streaming watcher"
|
|
1041
|
+
"Delta metadata"
|
|
1042
|
+
"Import pipeline"
|
|
1043
|
+
"Docker stack"
|
|
1044
|
+
"CLI tool"
|
|
1045
|
+
"State management"
|
|
1046
|
+
"Memory decay"
|
|
1047
|
+
"Parallel search"
|
|
1048
|
+
"Project scoping"
|
|
1049
|
+
"Collection naming"
|
|
1050
|
+
"Dimension validation"
|
|
1051
|
+
"XML escaping"
|
|
1052
|
+
"Error handling"
|
|
1053
|
+
)
|
|
618
1054
|
|
|
619
|
-
|
|
620
|
-
|
|
621
|
-
|
|
622
|
-
|
|
623
|
-
|
|
624
|
-
|
|
1055
|
+
for feature in "${REQUIRED_FEATURES[@]}"; do
|
|
1056
|
+
if grep -qi "$feature" "$AGENT_FILE"; then
|
|
1057
|
+
echo "✅ $feature: Covered"
|
|
1058
|
+
else
|
|
1059
|
+
echo "❌ $feature: MISSING - Add test coverage!"
|
|
1060
|
+
fi
|
|
1061
|
+
done
|
|
1062
|
+
}
|
|
625
1063
|
|
|
626
|
-
|
|
1064
|
+
# Discover any new features from codebase
|
|
1065
|
+
discover_new_features() {
|
|
1066
|
+
echo "Scanning for undocumented features..."
|
|
627
1067
|
|
|
628
|
-
|
|
1068
|
+
# Check for new MCP tools
|
|
1069
|
+
NEW_TOOLS=$(grep -h "@mcp.tool()" mcp-server/src/*.py 2>/dev/null | wc -l)
|
|
1070
|
+
echo "MCP tools found: $NEW_TOOLS"
|
|
629
1071
|
|
|
630
|
-
|
|
631
|
-
|
|
632
|
-
|
|
633
|
-
docker logs streaming-importer --tail 50
|
|
1072
|
+
# Check for new scripts
|
|
1073
|
+
NEW_SCRIPTS=$(ls scripts/*.py 2>/dev/null | wc -l)
|
|
1074
|
+
echo "Python scripts found: $NEW_SCRIPTS"
|
|
634
1075
|
|
|
635
|
-
#
|
|
636
|
-
|
|
1076
|
+
# Check for new test files
|
|
1077
|
+
NEW_TESTS=$(find tests -name "*.py" 2>/dev/null | wc -l)
|
|
1078
|
+
echo "Test files found: $NEW_TESTS"
|
|
637
1079
|
|
|
638
|
-
# Check
|
|
639
|
-
|
|
1080
|
+
# Check for new hooks
|
|
1081
|
+
if [ -d "$HOME/.claude/hooks" ]; then
|
|
1082
|
+
HOOKS=$(ls "$HOME/.claude/hooks" 2>/dev/null | wc -l)
|
|
1083
|
+
echo "Hooks configured: $HOOKS"
|
|
1084
|
+
fi
|
|
1085
|
+
}
|
|
640
1086
|
|
|
641
|
-
|
|
642
|
-
|
|
643
|
-
python scripts/import-conversations-unified.py
|
|
1087
|
+
review_agent_completeness
|
|
1088
|
+
discover_new_features
|
|
644
1089
|
```
|
|
645
1090
|
|
|
646
|
-
|
|
647
|
-
```bash
|
|
648
|
-
# Update metadata
|
|
649
|
-
python scripts/delta-metadata-update.py
|
|
1091
|
+
## Test Execution Protocol
|
|
650
1092
|
|
|
651
|
-
|
|
652
|
-
|
|
1093
|
+
### Run Complete Test Suite
|
|
1094
|
+
```bash
|
|
1095
|
+
#!/bin/bash
|
|
1096
|
+
# Master test runner - CSR-tester is the SOLE executor of all tests
|
|
653
1097
|
|
|
654
|
-
|
|
655
|
-
|
|
656
|
-
|
|
1098
|
+
echo "=== CLAUDE SELF-REFLECT COMPLETE TEST SUITE ==="
|
|
1099
|
+
echo "Starting at: $(date)"
|
|
1100
|
+
echo "Executor: CSR-tester agent (sole test runner)"
|
|
1101
|
+
echo ""
|
|
657
1102
|
|
|
658
|
-
|
|
659
|
-
|
|
660
|
-
|
|
661
|
-
claude mcp remove claude-self-reflect
|
|
662
|
-
claude mcp add claude-self-reflect /full/path/to/run-mcp.sh \
|
|
663
|
-
-e QDRANT_URL="http://localhost:6333" -s user
|
|
1103
|
+
# Pre-test validation
|
|
1104
|
+
echo "Phase 0: Pre-test Validation..."
|
|
1105
|
+
./review_agent_completeness.sh
|
|
664
1106
|
|
|
665
|
-
#
|
|
666
|
-
|
|
667
|
-
|
|
1107
|
+
# Create test results directory
|
|
1108
|
+
mkdir -p test-results-$(date +%Y%m%d)
|
|
1109
|
+
cd test-results-$(date +%Y%m%d)
|
|
668
1110
|
|
|
669
|
-
|
|
670
|
-
|
|
671
|
-
|
|
672
|
-
|
|
1111
|
+
# Run all test suites
|
|
1112
|
+
../test-system-health.sh > health.log 2>&1
|
|
1113
|
+
../test-temporal-tools.sh > temporal.log 2>&1
|
|
1114
|
+
../test-cli-tool.sh > cli.log 2>&1
|
|
1115
|
+
../test-import-pipeline.sh > import.log 2>&1
|
|
1116
|
+
../test-docker-health.sh > docker.log 2>&1
|
|
1117
|
+
../test-modularization.sh > modular.log 2>&1
|
|
1118
|
+
../test-performance.sh > performance.log 2>&1
|
|
673
1119
|
|
|
674
|
-
#
|
|
675
|
-
|
|
1120
|
+
# Generate final report
|
|
1121
|
+
../generate-test-report.sh
|
|
676
1122
|
|
|
677
|
-
|
|
678
|
-
|
|
1123
|
+
echo ""
|
|
1124
|
+
echo "=== TEST SUITE COMPLETE ==="
|
|
1125
|
+
echo "Results in: test-results-$(date +%Y%m%d)/"
|
|
1126
|
+
echo "Report: test-report-*.md"
|
|
679
1127
|
```
|
|
680
1128
|
|
|
681
|
-
##
|
|
682
|
-
|
|
683
|
-
After running all tests, the system should:
|
|
684
|
-
1. Process all conversations correctly
|
|
685
|
-
2. Support both embedding modes
|
|
686
|
-
3. Provide accurate search results
|
|
687
|
-
4. Handle concurrent operations safely
|
|
688
|
-
5. Maintain data integrity
|
|
689
|
-
6. Perform within acceptable limits
|
|
690
|
-
7. Secure sensitive information
|
|
691
|
-
8. **ALWAYS be in local mode after testing**
|
|
1129
|
+
## Success Criteria
|
|
692
1130
|
|
|
693
|
-
|
|
1131
|
+
### Must Pass
|
|
1132
|
+
- [ ] All 15+ MCP tools functional
|
|
1133
|
+
- [ ] Temporal tools work with proper scoping
|
|
1134
|
+
- [ ] Timestamp indexes on all collections
|
|
1135
|
+
- [ ] CLI installs and runs globally
|
|
1136
|
+
- [ ] Docker containers healthy
|
|
1137
|
+
- [ ] No critical bugs (native decay, XML injection, dimension mismatch)
|
|
1138
|
+
- [ ] Search returns relevant results
|
|
1139
|
+
- [ ] Import pipeline processes files
|
|
1140
|
+
- [ ] State persists correctly
|
|
1141
|
+
- [ ] NO ZERO VECTORS in any collection
|
|
1142
|
+
- [ ] Metadata extraction working (files, tools, concepts, AST patterns)
|
|
1143
|
+
- [ ] Both embedding modes functional (local 384d, Voyage 1024d)
|
|
1144
|
+
- [ ] Hooks execute properly (session-start, precompact)
|
|
1145
|
+
- [ ] All 6 sub-agents available
|
|
1146
|
+
|
|
1147
|
+
### Should Pass
|
|
1148
|
+
- [ ] Performance within limits
|
|
1149
|
+
- [ ] Memory usage acceptable
|
|
1150
|
+
- [ ] Modularization plan approved
|
|
1151
|
+
- [ ] Documentation updated
|
|
1152
|
+
- [ ] All unit tests pass
|
|
1153
|
+
|
|
1154
|
+
### Nice to Have
|
|
1155
|
+
- [ ] 100% test coverage
|
|
1156
|
+
- [ ] Zero warnings in logs
|
|
1157
|
+
- [ ] Sub-second search times
|
|
1158
|
+
|
|
1159
|
+
## Final Notes
|
|
1160
|
+
|
|
1161
|
+
This agent knows ALL features of Claude Self-Reflect v3.3.0 including:
|
|
1162
|
+
- 15+ MCP tools with temporal, search, reflection, pagination capabilities
|
|
1163
|
+
- Modularized architecture (search_tools.py, temporal_tools.py, reflection_tools.py, parallel_search.py)
|
|
1164
|
+
- Metadata extraction (AST patterns, concepts, files analyzed, tools used)
|
|
1165
|
+
- Hook system (session-start, precompact, submit hooks)
|
|
1166
|
+
- 6 specialized sub-agents for different domains
|
|
1167
|
+
- Dual embedding support (FastEmbed 384d, Voyage AI 1024d)
|
|
1168
|
+
- Zero vector detection and prevention
|
|
1169
|
+
- Streaming watcher and delta metadata updater
|
|
1170
|
+
- Project scoping and cross-collection search
|
|
1171
|
+
- Memory decay (client-side with 90-day half-life)
|
|
1172
|
+
- GPT-5 review recommendations and critical fixes
|
|
1173
|
+
- All test scripts and their purposes
|
|
1174
|
+
|
|
1175
|
+
The agent will ALWAYS restore the system to local mode after testing and provide comprehensive reports suitable for release decisions.
|