superlocalmemory 2.4.0 → 2.4.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +25 -0
- package/README.md +29 -7
- package/package.json +2 -1
- package/src/graph_engine.py +336 -8
- package/src/setup_validator.py +8 -1
- package/ui_server.py +17 -13
- package/src/__pycache__/auto_backup.cpython-312.pyc +0 -0
- package/src/__pycache__/cache_manager.cpython-312.pyc +0 -0
- package/src/__pycache__/embedding_engine.cpython-312.pyc +0 -0
- package/src/__pycache__/graph_engine.cpython-312.pyc +0 -0
- package/src/__pycache__/hnsw_index.cpython-312.pyc +0 -0
- package/src/__pycache__/hybrid_search.cpython-312.pyc +0 -0
- package/src/__pycache__/memory-profiles.cpython-312.pyc +0 -0
- package/src/__pycache__/memory-reset.cpython-312.pyc +0 -0
- package/src/__pycache__/memory_compression.cpython-312.pyc +0 -0
- package/src/__pycache__/memory_store_v2.cpython-312.pyc +0 -0
- package/src/__pycache__/migrate_v1_to_v2.cpython-312.pyc +0 -0
- package/src/__pycache__/pattern_learner.cpython-312.pyc +0 -0
- package/src/__pycache__/query_optimizer.cpython-312.pyc +0 -0
- package/src/__pycache__/search_engine_v2.cpython-312.pyc +0 -0
- package/src/__pycache__/setup_validator.cpython-312.pyc +0 -0
- package/src/__pycache__/tree_manager.cpython-312.pyc +0 -0
package/CHANGELOG.md
CHANGED
@@ -16,6 +16,31 @@ SuperLocalMemory V2 - Intelligent local memory system for AI coding assistants.
 
 ---
 
+## [2.4.1] - 2026-02-11
+
+**Release Type:** Hierarchical Clustering & Documentation Release
+**Backward Compatible:** Yes (additive schema changes only)
+
+### Added
+- **Hierarchical Leiden clustering** (`graph_engine.py`): Recursive community detection — large clusters (≥10 members) are automatically subdivided up to 3 levels deep. E.g., "Python" → "FastAPI" → "Authentication patterns". New `parent_cluster_id` and `depth` columns in the `graph_clusters` table
+- **Community summaries** (`graph_engine.py`): TF-IDF structured reports for every cluster — key topics, projects, categories, hierarchy context. Stored in the `graph_clusters.summary` column, surfaced in the `/api/clusters` endpoint and web dashboard
+- **CLI commands**: `python3 graph_engine.py hierarchical` and `python3 graph_engine.py summaries` for manual runs
+- **Schema migration**: Safe `ALTER TABLE` additions for the `summary`, `parent_cluster_id`, and `depth` columns — backward compatible with existing databases
+
+### Changed
+- `build_graph()` now automatically runs hierarchical sub-clustering and summary generation after flat Leiden
+- `/api/clusters` endpoint returns `summary`, `parent_cluster_id`, and `depth` fields
+- `get_stats()` includes `max_depth` and per-cluster summary/hierarchy data
+- `setup_validator.py` schema updated to include the new columns
+
+### Documentation
+- **README.md**: v2.4.0 → v2.4.1; added Hierarchical Leiden, Community Summaries, MACLA, and Auto-Backup sections
+- **Wiki**: Updated Roadmap, Pattern-Learning-Explained, Knowledge-Graph-Guide, Configuration, Visualization-Dashboard, Footer
+- **Website**: Updated features.astro, comparison.astro, index.astro for v2.4.1 features
+- **`.npmignore`**: Recursive `__pycache__` exclusion patterns
+
+---
+
 
 ## [2.4.0] - 2026-02-11
 
 **Release Type:** Profile System & Intelligence Release
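The "safe `ALTER TABLE`" migration noted in the Added section can be sketched as follows. SQLite has no `ADD COLUMN IF NOT EXISTS`, so the usual idiom is to attempt the `ALTER` and swallow the duplicate-column error; this is an illustrative stand-alone version of the pattern, not the package's exact code.

```python
import sqlite3

def migrate_graph_clusters(conn: sqlite3.Connection) -> list:
    """Add the v2.4.1 columns to graph_clusters if missing; return columns added."""
    added = []
    cursor = conn.cursor()
    for col, col_type in [('summary', 'TEXT'),
                          ('parent_cluster_id', 'INTEGER'),
                          ('depth', 'INTEGER DEFAULT 0')]:
        try:
            cursor.execute(f'ALTER TABLE graph_clusters ADD COLUMN {col} {col_type}')
            added.append(col)
        except sqlite3.OperationalError:
            pass  # column already exists: the migration is a no-op
    conn.commit()
    return added

# Simulate an existing v2.4.0 database that lacks the new columns
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE graph_clusters (id INTEGER PRIMARY KEY, name TEXT)')
first = migrate_graph_clusters(conn)   # adds all three columns
second = migrate_graph_clusters(conn)  # idempotent: nothing left to add
```

Running the migration twice is safe, which is what makes the schema change backward compatible for databases created by any earlier version.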
package/README.md
CHANGED
@@ -130,7 +130,7 @@ npm update -g superlocalmemory
 npm install -g superlocalmemory@latest
 
 # Install specific version
-npm install -g superlocalmemory@
+npm install -g superlocalmemory@latest
 ```
 
 **Manual install users:**
@@ -189,6 +189,19 @@ python ~/.claude-memory/ui_server.py
 
 ---
 
+### New in v2.4.1: Hierarchical Clustering, Community Summaries & Auto-Backup
+
+| Feature | Description |
+|---------|-------------|
+| **Hierarchical Leiden** | Recursive community detection — clusters within clusters up to 3 levels. "Python" → "FastAPI" → "Auth patterns" |
+| **Community Summaries** | TF-IDF structured reports per cluster: key topics, projects, categories at a glance |
+| **MACLA Confidence** | Bayesian Beta-Binomial scoring (arXiv:2512.18950) — calibrated confidence, not raw frequency |
+| **Auto-Backup** | Configurable SQLite backups with retention policies, one-click restore from dashboard |
+| **Profile UI** | Create, switch, delete profiles from the web dashboard — full isolation per context |
+| **Profile Isolation** | All API endpoints (graph, clusters, patterns, timeline) scoped to the active profile |
+
+---
+
 ## 🔍 Advanced Search
 
 SuperLocalMemory V2.2.0 implements **hybrid search** combining multiple strategies for maximum accuracy.
@@ -433,13 +446,13 @@ Not another simple key-value store. SuperLocalMemory implements **cutting-edge m
 │ 6 universal slash-commands for AI assistants                │
 │ Compatible with Claude Code, Continue, Cody                 │
 ├─────────────────────────────────────────────────────────────┤
-│ Layer 4: PATTERN LEARNING                                   │
-│                                                             │
+│ Layer 4: PATTERN LEARNING + MACLA (v2.4.0)                  │
+│ Bayesian Beta-Binomial confidence (arXiv:2512.18950)        │
 │ "You prefer React over Vue" (73% confidence)                │
 ├─────────────────────────────────────────────────────────────┤
-│ Layer 3: KNOWLEDGE GRAPH                                    │
-│                                                             │
-│                                                             │
+│ Layer 3: KNOWLEDGE GRAPH + HIERARCHICAL LEIDEN (v2.4.1)     │
+│ Recursive clustering: "Python" → "FastAPI" → "Auth"         │
+│ Community summaries + TF-IDF structured reports             │
 ├─────────────────────────────────────────────────────────────┤
 │ Layer 2: HIERARCHICAL INDEX                                 │
 │ Tree structure for fast navigation                          │
@@ -488,6 +501,8 @@ python ~/.claude-memory/pattern_learner.py context 0.5
 
 **Your AI assistant can now match your preferences automatically.**
 
+**MACLA Confidence Scoring (v2.4.0):** Confidence uses a Bayesian Beta-Binomial posterior (Forouzandeh et al., [arXiv:2512.18950](https://arxiv.org/abs/2512.18950)). Pattern-specific priors, log-scaled competition, recency bonus. Range: 0.0–0.95 (hard cap prevents overconfidence).
+
 ### Multi-Profile Support
 
 ```bash
@@ -537,14 +552,21 @@ superlocalmemoryv2:profile create <name>   # New profile
 superlocalmemoryv2:profile switch <name>   # Switch context
 
 # Knowledge Graph
-python ~/.claude-memory/graph_engine.py build    # Build graph
+python ~/.claude-memory/graph_engine.py build    # Build graph (+ hierarchical + summaries)
 python ~/.claude-memory/graph_engine.py stats    # View clusters
 python ~/.claude-memory/graph_engine.py related --id 5   # Find related
+python ~/.claude-memory/graph_engine.py hierarchical     # Sub-cluster large communities
+python ~/.claude-memory/graph_engine.py summaries        # Generate cluster summaries
 
 # Pattern Learning
 python ~/.claude-memory/pattern_learner.py update      # Learn patterns
 python ~/.claude-memory/pattern_learner.py context 0.5 # Get identity
 
+# Auto-Backup (v2.4.0)
+python ~/.claude-memory/auto_backup.py backup    # Manual backup
+python ~/.claude-memory/auto_backup.py list      # List backups
+python ~/.claude-memory/auto_backup.py status    # Backup status
+
 # Reset (Use with caution!)
 superlocalmemoryv2:reset soft             # Clear memories
 superlocalmemoryv2:reset hard --confirm   # Nuclear option
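The MACLA row in the README table can be illustrated with a minimal Beta-Binomial sketch. The README specifies only the posterior family and the 0.95 hard cap; the prior values and rounding below are assumptions for illustration, and the real scorer's log-scaled competition and recency bonus are omitted.

```python
def macla_confidence(successes: int, trials: int,
                     alpha: float = 2.0, beta: float = 2.0,
                     cap: float = 0.95) -> float:
    """Posterior mean of a Beta(alpha, beta) prior after observing
    `successes` out of `trials`, hard-capped to prevent overconfidence.
    NOTE: alpha/beta here are illustrative, not the package's priors."""
    posterior_mean = (alpha + successes) / (alpha + beta + trials)
    return min(round(posterior_mean, 2), cap)

# A pattern matched 8 times out of 10 gets a calibrated score,
# pulled toward the prior rather than the raw frequency 0.8
conf = macla_confidence(8, 10)
```

The point of the calibration: with no evidence the score sits at the prior mean (0.5 here), and no amount of evidence can push it past the cap.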
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "superlocalmemory",
-  "version": "2.4.0",
+  "version": "2.4.1",
   "description": "Your AI Finally Remembers You - Local-first intelligent memory system for AI assistants. Works with Claude, Cursor, Windsurf, VS Code/Copilot, Codex, and 16+ AI tools. 100% local, zero cloud dependencies.",
   "keywords": [
     "ai-memory",
@@ -43,6 +43,7 @@
     "superlocalmemory": "./bin/slm-npm"
   },
   "scripts": {
+    "prepack": "find . -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null; find . -name '*.pyc' -delete 2>/dev/null; true",
     "postinstall": "node scripts/postinstall.js",
     "preuninstall": "node scripts/preuninstall.js",
     "test": "echo \"Run: npm install -g . && slm status\" && exit 0"
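The new `prepack` hook shells out to `find` to strip bytecode before publishing. For reference, an equivalent of that cleanup in Python — a hypothetical helper for illustration, not part of the package:

```python
import shutil
import tempfile
from pathlib import Path

def clean_bytecode(root: Path) -> int:
    """Recursively delete __pycache__ directories and stray .pyc files,
    mirroring what the npm `prepack` hook does with `find`.
    Returns the number of filesystem entries removed."""
    removed = 0
    # materialize the generators first: we mutate the tree while walking it
    for pycache in list(root.rglob('__pycache__')):
        shutil.rmtree(pycache, ignore_errors=True)
        removed += 1
    for pyc in list(root.rglob('*.pyc')):
        pyc.unlink(missing_ok=True)
        removed += 1
    return removed

# Demo on a throwaway tree that mimics the package layout
root = Path(tempfile.mkdtemp())
(root / 'src' / '__pycache__').mkdir(parents=True)
(root / 'src' / '__pycache__' / 'graph_engine.cpython-312.pyc').write_bytes(b'\x00')
removed = clean_bytecode(root)
```

Deleting the `__pycache__` directory also removes the `.pyc` files inside it, so the second pass only catches compiled files sitting outside a cache directory.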
package/src/graph_engine.py
CHANGED
@@ -404,6 +404,293 @@ class ClusterBuilder:
         return name[:100]  # Limit length
 
 
+    def hierarchical_cluster(self, min_subcluster_size: int = 5, max_depth: int = 3) -> Dict[str, any]:
+        """
+        Run recursive Leiden clustering — cluster the clusters.
+
+        Large communities (>= min_subcluster_size * 2) are recursively sub-clustered
+        to reveal finer-grained thematic structure. E.g., "Python" → "FastAPI" → "Auth".
+
+        Args:
+            min_subcluster_size: Minimum members to attempt sub-clustering (default 5)
+            max_depth: Maximum recursion depth (default 3)
+
+        Returns:
+            Dictionary with hierarchical clustering statistics
+        """
+        try:
+            import igraph as ig
+            import leidenalg
+        except ImportError:
+            raise ImportError("python-igraph and leidenalg required. Install: pip install python-igraph leidenalg")
+
+        conn = sqlite3.connect(self.db_path)
+        cursor = conn.cursor()
+        active_profile = self._get_active_profile()
+
+        try:
+            # Get top-level clusters for this profile that are large enough to sub-cluster
+            cursor.execute('''
+                SELECT cluster_id, COUNT(*) as cnt
+                FROM memories
+                WHERE cluster_id IS NOT NULL AND profile = ?
+                GROUP BY cluster_id
+                HAVING cnt >= ?
+            ''', (active_profile, min_subcluster_size * 2))
+            large_clusters = cursor.fetchall()
+
+            if not large_clusters:
+                logger.info("No clusters large enough for hierarchical decomposition")
+                return {'subclusters_created': 0, 'depth_reached': 0}
+
+            total_subclusters = 0
+            max_depth_reached = 0
+
+            for parent_cid, member_count in large_clusters:
+                subs, depth = self._recursive_subcluster(
+                    conn, cursor, parent_cid, active_profile,
+                    min_subcluster_size, max_depth, current_depth=1
+                )
+                total_subclusters += subs
+                max_depth_reached = max(max_depth_reached, depth)
+
+            conn.commit()
+            logger.info(f"Hierarchical clustering: {total_subclusters} sub-clusters, depth {max_depth_reached}")
+            return {
+                'subclusters_created': total_subclusters,
+                'depth_reached': max_depth_reached,
+                'parent_clusters_processed': len(large_clusters)
+            }
+
+        except Exception as e:
+            logger.error(f"Hierarchical clustering failed: {e}")
+            conn.rollback()
+            return {'subclusters_created': 0, 'error': str(e)}
+        finally:
+            conn.close()
+
+    def _recursive_subcluster(self, conn, cursor, parent_cluster_id: int,
+                              profile: str, min_size: int, max_depth: int,
+                              current_depth: int) -> Tuple[int, int]:
+        """Recursively sub-cluster a community using Leiden."""
+        import igraph as ig
+        import leidenalg
+
+        if current_depth > max_depth:
+            return 0, current_depth - 1
+
+        # Get memory IDs in this cluster
+        cursor.execute('''
+            SELECT id FROM memories
+            WHERE cluster_id = ? AND profile = ?
+        ''', (parent_cluster_id, profile))
+        member_ids = [row[0] for row in cursor.fetchall()]
+
+        if len(member_ids) < min_size * 2:
+            return 0, current_depth - 1
+
+        # Get edges between members of this cluster
+        placeholders = ','.join('?' * len(member_ids))
+        edges = cursor.execute(f'''
+            SELECT source_memory_id, target_memory_id, weight
+            FROM graph_edges
+            WHERE source_memory_id IN ({placeholders})
+              AND target_memory_id IN ({placeholders})
+        ''', member_ids + member_ids).fetchall()
+
+        if len(edges) < 2:
+            return 0, current_depth - 1
+
+        # Build sub-graph
+        id_to_vertex = {mid: idx for idx, mid in enumerate(member_ids)}
+        vertex_to_id = {idx: mid for mid, idx in id_to_vertex.items()}
+
+        g = ig.Graph()
+        g.add_vertices(len(member_ids))
+        edge_list, edge_weights = [], []
+        for src, tgt, w in edges:
+            if src in id_to_vertex and tgt in id_to_vertex:
+                edge_list.append((id_to_vertex[src], id_to_vertex[tgt]))
+                edge_weights.append(w)
+
+        if not edge_list:
+            return 0, current_depth - 1
+
+        g.add_edges(edge_list)
+
+        # Run Leiden with higher resolution for finer communities
+        partition = leidenalg.find_partition(
+            g, leidenalg.ModularityVertexPartition,
+            weights=edge_weights, n_iterations=100, seed=42
+        )
+
+        # Only proceed if Leiden found > 1 community (actual split)
+        non_singleton = [c for c in partition if len(c) >= 2]
+        if len(non_singleton) <= 1:
+            return 0, current_depth - 1
+
+        subclusters_created = 0
+        deepest = current_depth
+
+        # Get parent depth
+        cursor.execute('SELECT depth FROM graph_clusters WHERE id = ?', (parent_cluster_id,))
+        parent_row = cursor.fetchone()
+        parent_depth = parent_row[0] if parent_row else 0
+
+        for community in non_singleton:
+            sub_member_ids = [vertex_to_id[v] for v in community]
+
+            if len(sub_member_ids) < 2:
+                continue
+
+            avg_imp = self._get_avg_importance(cursor, sub_member_ids)
+            cluster_name = self._generate_cluster_name(cursor, sub_member_ids)
+
+            result = cursor.execute('''
+                INSERT INTO graph_clusters (name, member_count, avg_importance, parent_cluster_id, depth)
+                VALUES (?, ?, ?, ?, ?)
+            ''', (cluster_name, len(sub_member_ids), avg_imp, parent_cluster_id, parent_depth + 1))
+
+            sub_cluster_id = result.lastrowid
+
+            # Update memories to point to sub-cluster
+            cursor.executemany('''
+                UPDATE memories SET cluster_id = ? WHERE id = ?
+            ''', [(sub_cluster_id, mid) for mid in sub_member_ids])
+
+            subclusters_created += 1
+            logger.info(f"Sub-cluster {sub_cluster_id} under {parent_cluster_id}: "
+                        f"'{cluster_name}' ({len(sub_member_ids)} members, depth {parent_depth + 1})")
+
+            # Recurse into this sub-cluster if large enough
+            child_subs, child_depth = self._recursive_subcluster(
+                conn, cursor, sub_cluster_id, profile,
+                min_size, max_depth, current_depth + 1
+            )
+            subclusters_created += child_subs
+            deepest = max(deepest, child_depth)
+
+        return subclusters_created, deepest
+
+    def generate_cluster_summaries(self) -> int:
+        """
+        Generate TF-IDF structured summaries for all clusters.
+
+        For each cluster, analyzes member content to produce a human-readable
+        summary describing the cluster's theme, key topics, and scope.
+
+        Returns:
+            Number of clusters with summaries generated
+        """
+        conn = sqlite3.connect(self.db_path)
+        cursor = conn.cursor()
+        active_profile = self._get_active_profile()
+
+        try:
+            # Get all clusters for this profile
+            cursor.execute('''
+                SELECT DISTINCT gc.id, gc.name, gc.member_count
+                FROM graph_clusters gc
+                JOIN memories m ON m.cluster_id = gc.id
+                WHERE m.profile = ?
+            ''', (active_profile,))
+            clusters = cursor.fetchall()
+
+            if not clusters:
+                return 0
+
+            summaries_generated = 0
+
+            for cluster_id, cluster_name, member_count in clusters:
+                summary = self._build_cluster_summary(cursor, cluster_id, active_profile)
+                if summary:
+                    cursor.execute('''
+                        UPDATE graph_clusters SET summary = ?, updated_at = CURRENT_TIMESTAMP
+                        WHERE id = ?
+                    ''', (summary, cluster_id))
+                    summaries_generated += 1
+                    logger.info(f"Summary for cluster {cluster_id} ({cluster_name}): {summary[:80]}...")
+
+            conn.commit()
+            logger.info(f"Generated {summaries_generated} cluster summaries")
+            return summaries_generated
+
+        except Exception as e:
+            logger.error(f"Summary generation failed: {e}")
+            conn.rollback()
+            return 0
+        finally:
+            conn.close()
+
+    def _build_cluster_summary(self, cursor, cluster_id: int, profile: str) -> str:
+        """Build a TF-IDF structured summary for a single cluster."""
+        # Get member content
+        cursor.execute('''
+            SELECT m.content, m.summary, m.tags, m.category, m.project_name
+            FROM memories m
+            WHERE m.cluster_id = ? AND m.profile = ?
+        ''', (cluster_id, profile))
+        members = cursor.fetchall()
+
+        if not members:
+            return ""
+
+        # Collect entities from graph nodes
+        cursor.execute('''
+            SELECT gn.entities
+            FROM graph_nodes gn
+            JOIN memories m ON gn.memory_id = m.id
+            WHERE m.cluster_id = ? AND m.profile = ?
+        ''', (cluster_id, profile))
+        all_entities = []
+        for row in cursor.fetchall():
+            if row[0]:
+                try:
+                    all_entities.extend(json.loads(row[0]))
+                except (json.JSONDecodeError, TypeError):
+                    pass
+
+        # Top entities by frequency (TF-IDF already extracted these)
+        entity_counts = Counter(all_entities)
+        top_entities = [e for e, _ in entity_counts.most_common(5)]
+
+        # Collect unique projects and categories
+        projects = set()
+        categories = set()
+        for m in members:
+            if m[3]:  # category
+                categories.add(m[3])
+            if m[4]:  # project_name
+                projects.add(m[4])
+
+        # Build structured summary
+        parts = []
+
+        # Theme from top entities
+        if top_entities:
+            parts.append(f"Key topics: {', '.join(top_entities[:5])}")
+
+        # Scope
+        if projects:
+            parts.append(f"Projects: {', '.join(sorted(projects)[:3])}")
+        if categories:
+            parts.append(f"Categories: {', '.join(sorted(categories)[:3])}")
+
+        # Size context
+        parts.append(f"{len(members)} memories")
+
+        # Check for hierarchical context
+        cursor.execute('SELECT parent_cluster_id FROM graph_clusters WHERE id = ?', (cluster_id,))
+        parent_row = cursor.fetchone()
+        if parent_row and parent_row[0]:
+            cursor.execute('SELECT name FROM graph_clusters WHERE id = ?', (parent_row[0],))
+            parent_name_row = cursor.fetchone()
+            if parent_name_row:
+                parts.append(f"Sub-cluster of: {parent_name_row[0]}")
+
+        return " | ".join(parts)
+
+
 class ClusterNamer:
     """Enhanced cluster naming with optional LLM support (future)."""
 
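The `" | "`-joined report format that `_build_cluster_summary` produces can be exercised without the database plumbing. A stand-alone sketch of the same composition logic (entity frequency via `Counter`, then projects, categories, and size); the sample inputs are hypothetical:

```python
from collections import Counter

def build_summary(entities, projects, categories, member_count):
    """Compose a 'Key topics: ... | Projects: ... | N memories' summary string,
    mirroring the structure _build_cluster_summary writes to graph_clusters.summary."""
    parts = []
    top = [e for e, _ in Counter(entities).most_common(5)]
    if top:
        parts.append(f"Key topics: {', '.join(top)}")
    if projects:
        parts.append(f"Projects: {', '.join(sorted(projects)[:3])}")
    if categories:
        parts.append(f"Categories: {', '.join(sorted(categories)[:3])}")
    parts.append(f"{member_count} memories")
    return " | ".join(parts)

summary = build_summary(
    entities=['fastapi', 'auth', 'fastapi', 'jwt'],  # duplicates raise rank
    projects={'api-server'},
    categories={'backend'},
    member_count=12,
)
```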
@@ -498,13 +785,24 @@ class GraphEngine:
                 id INTEGER PRIMARY KEY AUTOINCREMENT,
                 name TEXT NOT NULL,
                 description TEXT,
+                summary TEXT,
                 member_count INTEGER DEFAULT 0,
                 avg_importance REAL,
+                parent_cluster_id INTEGER,
+                depth INTEGER DEFAULT 0,
                 created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                FOREIGN KEY (parent_cluster_id) REFERENCES graph_clusters(id) ON DELETE SET NULL
             )
         ''')
 
+        # Safe column additions for existing databases
+        for col, col_type in [('summary', 'TEXT'), ('parent_cluster_id', 'INTEGER'), ('depth', 'INTEGER DEFAULT 0')]:
+            try:
+                cursor.execute(f'ALTER TABLE graph_clusters ADD COLUMN {col} {col_type}')
+            except sqlite3.OperationalError:
+                pass
+
         # Add cluster_id to memories if not exists
         try:
             cursor.execute('ALTER TABLE memories ADD COLUMN cluster_id INTEGER')
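The `FOREIGN KEY ... ON DELETE SET NULL` constraint added to `graph_clusters` means deleting a parent cluster promotes its sub-clusters to orphans rather than cascading the delete. A minimal stand-alone demonstration; note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON`:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')  # off by default in SQLite
conn.execute('''
    CREATE TABLE graph_clusters (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        parent_cluster_id INTEGER,
        depth INTEGER DEFAULT 0,
        FOREIGN KEY (parent_cluster_id) REFERENCES graph_clusters(id) ON DELETE SET NULL
    )
''')
conn.execute("INSERT INTO graph_clusters (name, depth) VALUES ('Python', 0)")
conn.execute("INSERT INTO graph_clusters (name, parent_cluster_id, depth) "
             "VALUES ('FastAPI', 1, 1)")

# Deleting the parent nulls the child's pointer instead of deleting the child
conn.execute("DELETE FROM graph_clusters WHERE name = 'Python'")
orphan = conn.execute(
    "SELECT parent_cluster_id FROM graph_clusters WHERE name = 'FastAPI'"
).fetchone()
```

This is why the migration is additive-safe: existing rows simply get a NULL parent and depth, which the rest of the code treats as top-level.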
@@ -648,9 +946,16 @@ class GraphEngine:
             memory_ids, vectors, entities_list
         )
 
-        # Detect communities
+        # Detect communities (flat Leiden)
         clusters_count = self.cluster_builder.detect_communities()
 
+        # Hierarchical sub-clustering on large communities
+        hierarchical_stats = self.cluster_builder.hierarchical_cluster()
+        subclusters = hierarchical_stats.get('subclusters_created', 0)
+
+        # Generate TF-IDF structured summaries for all clusters
+        summaries = self.cluster_builder.generate_cluster_summaries()
+
         elapsed = time.time() - start_time
 
         stats = {
@@ -659,6 +964,9 @@ class GraphEngine:
             'nodes': len(memory_ids),
             'edges': edges_count,
             'clusters': clusters_count,
+            'subclusters': subclusters,
+            'max_depth': hierarchical_stats.get('depth_reached', 0),
+            'summaries_generated': summaries,
             'time_seconds': round(elapsed, 2)
         }
 
@@ -962,28 +1270,36 @@ class GraphEngine:
                 WHERE cluster_id IS NOT NULL AND profile = ?
             ''', (active_profile,)).fetchone()[0]
 
-            # Cluster breakdown for active profile
+            # Cluster breakdown for active profile (including hierarchy)
             cluster_info = cursor.execute('''
-                SELECT gc.name, gc.member_count, gc.avg_importance
+                SELECT gc.name, gc.member_count, gc.avg_importance,
+                       gc.summary, gc.parent_cluster_id, gc.depth
                 FROM graph_clusters gc
                 WHERE gc.id IN (
                     SELECT DISTINCT cluster_id FROM memories
                     WHERE cluster_id IS NOT NULL AND profile = ?
                 )
-                ORDER BY gc.member_count DESC
-                LIMIT
+                ORDER BY gc.depth ASC, gc.member_count DESC
+                LIMIT 20
             ''', (active_profile,)).fetchall()
 
+            # Count hierarchical depth
+            max_depth = max((c[5] or 0 for c in cluster_info), default=0) if cluster_info else 0
+
             return {
                 'profile': active_profile,
                 'nodes': nodes,
                 'edges': edges,
                 'clusters': clusters,
+                'max_depth': max_depth,
                 'top_clusters': [
                     {
                         'name': c[0],
                         'members': c[1],
-                        'avg_importance': round(c[2], 1)
+                        'avg_importance': round(c[2], 1) if c[2] else 5.0,
+                        'summary': c[3],
+                        'parent_cluster_id': c[4],
+                        'depth': c[5] or 0
                     }
                     for c in cluster_info
                 ]
@@ -998,7 +1314,7 @@ def main():
     import argparse
 
     parser = argparse.ArgumentParser(description='GraphEngine - Knowledge Graph Management')
-    parser.add_argument('command', choices=['build', 'stats', 'related', 'cluster'],
+    parser.add_argument('command', choices=['build', 'stats', 'related', 'cluster', 'hierarchical', 'summaries'],
                         help='Command to execute')
     parser.add_argument('--memory-id', type=int, help='Memory ID for related/add commands')
     parser.add_argument('--cluster-id', type=int, help='Cluster ID for cluster command')
@@ -1052,6 +1368,18 @@ def main():
         summary = mem['summary'] or '[No summary]'
         print(f"  {summary[:100]}...")
 
+    elif args.command == 'hierarchical':
+        print("Running hierarchical sub-clustering...")
+        cluster_builder = ClusterBuilder(engine.db_path)
+        stats = cluster_builder.hierarchical_cluster()
+        print(json.dumps(stats, indent=2))
+
+    elif args.command == 'summaries':
+        print("Generating cluster summaries...")
+        cluster_builder = ClusterBuilder(engine.db_path)
+        count = cluster_builder.generate_cluster_summaries()
+        print(f"Generated summaries for {count} clusters")
+
 
 if __name__ == '__main__':
     main()
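With `parent_cluster_id` and `depth` in place, the whole cluster tree can be walked with a recursive CTE. An illustrative query against a toy hierarchy (the data and query are for illustration, not part of the package):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE graph_clusters '
             '(id INTEGER PRIMARY KEY, name TEXT, parent_cluster_id INTEGER, depth INTEGER DEFAULT 0)')
conn.executemany(
    'INSERT INTO graph_clusters VALUES (?, ?, ?, ?)',
    [(1, 'Python', None, 0),
     (2, 'FastAPI', 1, 1),
     (3, 'Auth patterns', 2, 2)],  # the "Python" → "FastAPI" → "Auth" example
)

# Walk from each root down through parent_cluster_id links
rows = conn.execute('''
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, depth FROM graph_clusters WHERE parent_cluster_id IS NULL
        UNION ALL
        SELECT gc.id, gc.name, gc.depth
        FROM graph_clusters gc JOIN tree t ON gc.parent_cluster_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY depth
''').fetchall()
outline = [('  ' * d) + name for name, d in rows]  # indent by depth
```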
package/src/setup_validator.py
CHANGED
@@ -257,11 +257,18 @@ def initialize_database() -> Tuple[bool, str]:
         CREATE TABLE IF NOT EXISTS graph_clusters (
             id INTEGER PRIMARY KEY AUTOINCREMENT,
             cluster_name TEXT,
+            name TEXT,
             description TEXT,
+            summary TEXT,
             memory_count INTEGER DEFAULT 0,
+            member_count INTEGER DEFAULT 0,
             avg_importance REAL DEFAULT 5.0,
             top_entities TEXT DEFAULT '[]',
-            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+            parent_cluster_id INTEGER,
+            depth INTEGER DEFAULT 0,
+            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+            updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+            FOREIGN KEY (parent_cluster_id) REFERENCES graph_clusters(id) ON DELETE SET NULL
         )
     ''')
 
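A check in the spirit of `setup_validator.py` can confirm the v2.4.1 columns exist via `PRAGMA table_info`. This is a hypothetical minimal validator sketched for illustration, not the package's code:

```python
import sqlite3

REQUIRED_CLUSTER_COLUMNS = {'summary', 'parent_cluster_id', 'depth'}

def missing_columns(conn: sqlite3.Connection, table: str, required: set) -> set:
    """Return the required columns absent from `table` (empty set means OK)."""
    present = {row[1] for row in conn.execute(f'PRAGMA table_info({table})')}
    return required - present

# A database created by the v2.4.1 schema passes the check
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE graph_clusters '
             '(id INTEGER PRIMARY KEY, name TEXT, summary TEXT, '
             'parent_cluster_id INTEGER, depth INTEGER DEFAULT 0)')
gaps = missing_columns(conn, 'graph_clusters', REQUIRED_CLUSTER_COLUMNS)
```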
package/ui_server.py
CHANGED
@@ -707,22 +707,26 @@ async def get_clusters():
 
     active_profile = get_active_profile()
 
-    # Get cluster statistics
+    # Get cluster statistics with hierarchy and summaries
     cursor.execute("""
         SELECT
-            cluster_id,
+            m.cluster_id,
             COUNT(*) as member_count,
-            AVG(importance) as avg_importance,
-            MIN(importance) as min_importance,
-            MAX(importance) as max_importance,
-            GROUP_CONCAT(DISTINCT category) as categories,
-            GROUP_CONCAT(DISTINCT project_name) as projects,
-            MIN(created_at) as first_memory,
-            MAX(created_at) as latest_memory
-
-
-
-
+            AVG(m.importance) as avg_importance,
+            MIN(m.importance) as min_importance,
+            MAX(m.importance) as max_importance,
+            GROUP_CONCAT(DISTINCT m.category) as categories,
+            GROUP_CONCAT(DISTINCT m.project_name) as projects,
+            MIN(m.created_at) as first_memory,
+            MAX(m.created_at) as latest_memory,
+            gc.summary,
+            gc.parent_cluster_id,
+            gc.depth
+        FROM memories m
+        LEFT JOIN graph_clusters gc ON m.cluster_id = gc.id
+        WHERE m.cluster_id IS NOT NULL AND m.profile = ?
+        GROUP BY m.cluster_id
+        ORDER BY COALESCE(gc.depth, 0) ASC, member_count DESC
     """, (active_profile,))
     clusters = cursor.fetchall()
 
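The reworked `/api/clusters` query aggregates per cluster and left-joins the new hierarchy fields. A toy reproduction of that query's shape against an in-memory database (table contents are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE graph_clusters (id INTEGER PRIMARY KEY, name TEXT, summary TEXT,
                                 parent_cluster_id INTEGER, depth INTEGER DEFAULT 0);
    CREATE TABLE memories (id INTEGER PRIMARY KEY, cluster_id INTEGER,
                           importance REAL, category TEXT, profile TEXT);
    INSERT INTO graph_clusters VALUES (1, 'Python', 'Key topics: fastapi', NULL, 0);
    INSERT INTO memories VALUES (1, 1, 7.0, 'backend', 'default');
    INSERT INTO memories VALUES (2, 1, 5.0, 'backend', 'default');
''')

# Same aggregation + LEFT JOIN shape as the updated endpoint query
row = conn.execute('''
    SELECT m.cluster_id,
           COUNT(*) AS member_count,
           AVG(m.importance) AS avg_importance,
           GROUP_CONCAT(DISTINCT m.category) AS categories,
           gc.summary, gc.parent_cluster_id, gc.depth
    FROM memories m
    LEFT JOIN graph_clusters gc ON m.cluster_id = gc.id
    WHERE m.cluster_id IS NOT NULL AND m.profile = ?
    GROUP BY m.cluster_id
    ORDER BY COALESCE(gc.depth, 0) ASC, member_count DESC
''', ('default',)).fetchone()
```

The `LEFT JOIN` plus `COALESCE(gc.depth, 0)` keeps the endpoint working even for memories whose `cluster_id` points at a cluster row that has not been (re)built yet.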
Binary file (16 entries: one per package/src/__pycache__/*.pyc file listed above; contents not shown)