loki-mode 5.1.3 → 5.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -352,6 +352,262 @@ def before_task_execution(task):
352
352
  return task
353
353
  ```
354
354
 
355
+ ### Task-Aware Memory Strategy Selection
356
+
357
+ **Research Source:** arXiv 2512.18746 - MemEvolve demonstrates that task-aware memory adaptation improves agent performance by 17% compared to static retrieval weights.
358
+
359
+ **Important Clarification:** This is NOT full meta-evolution where the system learns to modify its own memory strategies over time. This is a simpler, pragmatic approach: task-type detection followed by applying pre-defined weight configurations. True meta-evolution (as in the full MemEvolve paper) would require online learning of strategy parameters - that remains future work.
360
+
361
+ #### Strategy Definitions
362
+
363
+ Different task types benefit from different memory compositions:
364
+
365
+ ```yaml
+ task_memory_strategies:
+   exploration:
+     description: "Understanding codebase, researching options, investigating architecture"
+     weights:
+       episodic: 0.6   # Breadth of past experiences - what was tried before
+       semantic: 0.3   # General patterns - how things usually work
+       skills: 0.1     # Less relevant - not yet executing
+     rationale: "Exploration benefits from breadth over depth"
+
+   implementation:
+     description: "Writing code, building features, creating new functionality"
+     weights:
+       semantic: 0.5   # Proven patterns - what works
+       skills: 0.35    # Action sequences - how to do it
+       episodic: 0.15  # Similar implementations - specific examples
+     rationale: "Implementation needs patterns and procedures"
+
+   debugging:
+     description: "Fixing bugs, investigating issues, resolving errors"
+     weights:
+       anti_patterns: 0.4  # What NOT to do - critical for debugging
+       episodic: 0.4       # Similar error cases - what worked before
+       semantic: 0.2       # General error patterns
+     rationale: "Debugging requires knowing what fails and past solutions"
+
+   review:
+     description: "Code review, quality checks, validation"
+     weights:
+       semantic: 0.5       # Quality patterns - standards to enforce
+       episodic: 0.3       # Past review outcomes - what was caught
+       anti_patterns: 0.2  # Common mistakes to flag
+     rationale: "Review needs quality criteria and historical issues"
+
+   refactoring:
+     description: "Improving code structure without changing behavior"
+     weights:
+       semantic: 0.45  # Design patterns - target structure
+       skills: 0.3     # Refactoring procedures - safe transformations
+       episodic: 0.25  # Past refactoring success/failure
+     rationale: "Refactoring needs clear patterns and safe procedures"
+ ```
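The weights within each strategy are intended to sum to 1.0 so merged scores stay comparable across task types. A small validation sketch, stdlib only; the `STRATEGIES` dict mirrors the YAML table above and the helper name is illustrative:

```python
import math

# Mirrors the YAML strategy table above (weights per memory type)
STRATEGIES = {
    "exploration":    {"episodic": 0.6,  "semantic": 0.3, "skills": 0.1},
    "implementation": {"episodic": 0.15, "semantic": 0.5, "skills": 0.35},
    "debugging":      {"anti_patterns": 0.4, "episodic": 0.4, "semantic": 0.2},
    "review":         {"semantic": 0.5,  "episodic": 0.3, "anti_patterns": 0.2},
    "refactoring":    {"semantic": 0.45, "skills": 0.3,  "episodic": 0.25},
}

def validate_strategies(strategies):
    """Return the names of strategies whose weights do not sum to 1.0."""
    return [name for name, w in strategies.items()
            if not math.isclose(sum(w.values()), 1.0)]
```

Running `validate_strategies(STRATEGIES)` against the table above returns an empty list.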
407
+
408
+ #### Task Type Detection
409
+
410
+ Detect task type from context using keyword signals and structural patterns:
411
+
412
+ ```python
+ def detect_task_type(task_context):
+     """
+     Detect task type from context to select appropriate memory strategy.
+     Uses keyword matching and structural analysis.
+
+     Returns: One of 'exploration', 'implementation', 'debugging', 'review', 'refactoring'
+     """
+     goal = task_context.get("goal", "").lower()
+     action = task_context.get("action_type", "").lower()
+     phase = task_context.get("phase", "").lower()
+
+     # Keyword signals for each task type
+     signals = {
+         "exploration": {
+             "keywords": ["explore", "understand", "research", "investigate",
+                          "analyze", "discover", "find", "what is", "how does",
+                          "architecture", "structure", "overview"],
+             "actions": ["read_file", "search", "list_files"],
+             "phases": ["planning", "discovery", "research"]
+         },
+         "implementation": {
+             "keywords": ["implement", "create", "build", "add", "write",
+                          "develop", "make", "construct", "new feature"],
+             "actions": ["write_file", "create_file", "edit_file"],
+             "phases": ["development", "implementation", "coding"]
+         },
+         "debugging": {
+             "keywords": ["fix", "debug", "error", "bug", "issue", "broken",
+                          "failing", "crash", "exception", "investigate error"],
+             "actions": ["run_test", "check_logs", "trace"],
+             "phases": ["debugging", "troubleshooting", "fixing"]
+         },
+         "review": {
+             "keywords": ["review", "check", "validate", "verify", "audit",
+                          "inspect", "quality", "standards", "lint"],
+             "actions": ["diff", "review_pr", "check_style"],
+             "phases": ["review", "qa", "validation"]
+         },
+         "refactoring": {
+             "keywords": ["refactor", "restructure", "reorganize", "clean up",
+                          "improve structure", "extract", "rename", "move"],
+             "actions": ["rename", "move_file", "extract_function"],
+             "phases": ["refactoring", "cleanup", "optimization"]
+         }
+     }
+
+     # Score each task type based on signal matches
+     scores = {}
+     for task_type, type_signals in signals.items():
+         score = 0
+
+         # Keyword matches (weight: 2)
+         for keyword in type_signals["keywords"]:
+             if keyword in goal:
+                 score += 2
+
+         # Action matches (weight: 3)
+         for action_signal in type_signals["actions"]:
+             if action_signal in action:
+                 score += 3
+
+         # Phase matches (weight: 4 - strongest signal)
+         for phase_signal in type_signals["phases"]:
+             if phase_signal in phase:
+                 score += 4
+
+         scores[task_type] = score
+
+     # Return highest scoring type, default to 'implementation'
+     best_type = max(scores, key=scores.get)
+     if scores[best_type] == 0:
+         return "implementation"  # Default when no signals match
+
+     return best_type
+ ```
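A worked example of the scoring, using a trimmed-down signal table (a subset of the full dict, for illustration only):

```python
# Trimmed signal table - two of the five task types, shortened lists
SIGNALS = {
    "implementation": {"keywords": ["implement", "build"],
                       "actions": ["write_file"], "phases": ["development"]},
    "debugging": {"keywords": ["fix", "bug", "error"],
                  "actions": ["run_test"], "phases": ["debugging"]},
}

def score_task(goal, action, phase):
    """Apply the 2/3/4 signal weights used by detect_task_type."""
    scores = {}
    for task_type, sig in SIGNALS.items():
        s = 2 * sum(k in goal for k in sig["keywords"])    # keywords: weight 2
        s += 3 * sum(a in action for a in sig["actions"])  # actions: weight 3
        s += 4 * sum(p in phase for p in sig["phases"])    # phases: weight 4
        scores[task_type] = s
    return scores

# "fix" and "bug" match (2+2), "run_test" matches (+3), "debugging" phase (+4) -> 11
scores = score_task("fix the login bug", "run_test", "debugging")
```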
488
+
489
+ #### Applying Task-Aware Retrieval
490
+
491
+ Modified retrieval function that applies task-specific weights:
492
+
493
+ ```python
+ # Strategy weight configurations
+ TASK_STRATEGIES = {
+     "exploration": {"episodic": 0.6, "semantic": 0.3, "skills": 0.1, "anti_patterns": 0.0},
+     "implementation": {"episodic": 0.15, "semantic": 0.5, "skills": 0.35, "anti_patterns": 0.0},
+     "debugging": {"episodic": 0.4, "semantic": 0.2, "skills": 0.0, "anti_patterns": 0.4},
+     "review": {"episodic": 0.3, "semantic": 0.5, "skills": 0.0, "anti_patterns": 0.2},
+     "refactoring": {"episodic": 0.25, "semantic": 0.45, "skills": 0.3, "anti_patterns": 0.0}
+ }
+
+ def retrieve_task_aware_memory(current_context):
+     """
+     Retrieve memories with task-type-aware weighting.
+
+     Based on MemEvolve (arXiv 2512.18746) finding that task-aware
+     adaptation improves performance by 17% over static weights.
+
+     Note: This is simple strategy selection, NOT meta-evolution.
+     """
+     # 1. Detect task type
+     task_type = detect_task_type(current_context)
+     weights = TASK_STRATEGIES[task_type]
+
+     query_embedding = embed(current_context.goal)
+     results = {}
+
+     # 2. Search each memory type
+     if weights["semantic"] > 0:
+         results["semantic"] = vector_search(
+             collection="semantic",
+             query=query_embedding,
+             top_k=int(5 * weights["semantic"] / 0.5)  # Scale top_k by weight
+         )
+
+     if weights["episodic"] > 0:
+         # For debugging, don't filter to only successful episodes
+         filters = {} if task_type == "debugging" else {"outcome": "success"}
+         results["episodic"] = vector_search(
+             collection="episodic",
+             query=query_embedding,
+             top_k=int(5 * weights["episodic"] / 0.5),
+             filters=filters
+         )
+
+     if weights["skills"] > 0:
+         results["skills"] = keyword_search(
+             collection="skills",
+             keywords=extract_keywords(current_context),
+             top_k=int(3 * weights["skills"] / 0.3)
+         )
+
+     if weights["anti_patterns"] > 0:
+         results["anti_patterns"] = search_anti_patterns(
+             query=query_embedding,
+             top_k=int(5 * weights["anti_patterns"] / 0.4)
+         )
+
+     # 3. Merge with task-specific weights
+     combined = merge_and_rank_weighted(results, weights)
+
+     # 4. Log strategy selection for analysis
+     log_strategy_selection(
+         task_type=task_type,
+         weights=weights,
+         results_count={k: len(v) for k, v in results.items() if v}
+     )
+
+     return combined[:5]
+ ```
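`merge_and_rank_weighted` is referenced above but not defined in this document. One plausible shape, under the assumption that raw scores are comparable across collections and can simply be scaled by the strategy weight (real scores may need normalization first); the `Hit` type is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    summary: str
    score: float   # raw similarity score from the hit's own search
    source: str = ""

def merge_and_rank_weighted(results, weights):
    """Flatten per-collection hits into one list ranked by weighted score."""
    merged = []
    for mem_type, hits in results.items():
        for hit in hits:
            hit.source = mem_type  # remember which memory type produced the hit
            merged.append((hit.score * weights.get(mem_type, 0.0), hit))
    merged.sort(key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in merged]
```

With a debugging-style weighting, a lower-scoring episodic hit can outrank a higher-scoring semantic one once weights are applied.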
562
+
563
+ #### Integration with Existing Retrieval
564
+
565
+ Update `before_task_execution` to use task-aware retrieval:
566
+
567
+ ```python
+ def before_task_execution_with_strategy(task):
+     """
+     Inject relevant memories into task context using task-aware strategy.
+     Enhanced version of before_task_execution().
+     """
+     # 1. Detect task type for logging/debugging
+     task_type = detect_task_type(task)
+     task.context["detected_task_type"] = task_type
+
+     # 2. Retrieve memories with task-aware weights
+     memories = retrieve_task_aware_memory(task)
+
+     # 3. For debugging tasks, explicitly surface anti-patterns
+     if task_type == "debugging":
+         anti_patterns = [m for m in memories if m.source == "anti_patterns"]
+         task.context["critical_avoid"] = [a.summary for a in anti_patterns]
+
+     # 4. Inject into prompt
+     task.context["relevant_patterns"] = [m.summary for m in memories]
+     task.context["applicable_skills"] = find_skills(task.type)
+
+     return task
+ ```
591
+
592
+ #### Limitations and Future Work
593
+
594
+ **What this IS:**
595
+ - Pre-defined strategy selection based on task type detection
596
+ - Static weight configurations applied dynamically
597
+ - Simple keyword/phase-based task classification
598
+
599
+ **What this is NOT (future work):**
600
+ - Meta-evolution: System does NOT learn to modify its own strategies
601
+ - Online learning: Weights are NOT updated based on outcomes
602
+ - Adaptive threshold adjustment: Detection thresholds are fixed
603
+ - Cross-task transfer: Learned patterns don't automatically propagate
604
+
605
+ **Future enhancements (not yet implemented):**
606
+ 1. Track retrieval effectiveness per strategy -> adjust weights
607
+ 2. Learn task type detection from outcomes
608
+ 3. Implement full MemEvolve meta-learning loop
609
+ 4. A/B test different weight configurations
610
+
355
611
  ---
356
612
 
357
613
  ## Ledger System (Agent Checkpoints)
@@ -451,9 +707,847 @@ def merge_duplicate_semantics():
451
707
 
452
708
  ---
453
709
 
710
+ ## Progressive Disclosure Architecture
711
+
712
+ **Research Source:** claude-mem (thedotmack) - Progressive Disclosure memory system
713
+
714
+ The Progressive Disclosure pattern reduces token usage by 60-80% by structuring memory into three layers that load progressively based on need. Instead of loading all context upfront, the system discovers what exists cheaply, then loads only what is relevant.
715
+
716
+ ### The Problem
717
+
718
+ Traditional memory systems load full context every time:
719
+ - 10,000 tokens of episodic memory loaded
720
+ - Agent only needed 500 tokens of relevant context
721
+ - 9,500 tokens wasted on discovery
722
+
723
+ ### Three-Layer Solution
724
+
725
+ ```
+ +------------------------------------------------------------------+
+ | LAYER 1: INDEX (~100 tokens)                                     |
+ | .loki/memory/index.json                                          |
+ | What exists? Quick topic scan. Always loaded at session start.   |
+ +------------------------------------------------------------------+
+                                  |
+                                  |  (load on context need)
+                                  v
+ +------------------------------------------------------------------+
+ | LAYER 2: TIMELINE (~500 tokens)                                  |
+ | .loki/memory/timeline.json                                       |
+ | Recent compressed history. Key decisions. Active context.        |
+ +------------------------------------------------------------------+
+                                  |
+                                  |  (load on specific topic need)
+                                  v
+ +------------------------------------------------------------------+
+ | LAYER 3: FULL DETAILS (unlimited)                                |
+ | .loki/memory/episodic/*.json, .loki/memory/semantic/*.json       |
+ | Complete context. Loaded only when specific topic needed.        |
+ +------------------------------------------------------------------+
+ ```
748
+
749
+ ### Layer 1: Index Layer (~100 tokens)
750
+
751
+ Always loaded at session start. Provides a quick scan of what exists in memory.
752
+
753
+ **File:** `.loki/memory/index.json`
754
+
755
+ ```json
+ {
+   "version": "1.0",
+   "last_updated": "2026-01-25T10:00:00Z",
+   "topics": [
+     {
+       "id": "auth-system",
+       "summary": "JWT authentication implementation",
+       "relevance_score": 0.92,
+       "last_accessed": "2026-01-25T09:30:00Z",
+       "token_count": 2400
+     },
+     {
+       "id": "api-routes",
+       "summary": "REST API endpoint patterns",
+       "relevance_score": 0.85,
+       "last_accessed": "2026-01-24T14:00:00Z",
+       "token_count": 1800
+     },
+     {
+       "id": "deployment-config",
+       "summary": "Docker and CI/CD setup",
+       "relevance_score": 0.71,
+       "last_accessed": "2026-01-23T11:00:00Z",
+       "token_count": 950
+     }
+   ],
+   "total_memories": 47,
+   "total_tokens_available": 28500
+ }
+ ```
786
+
787
+ **Usage:** Scan topics to determine if relevant context exists before loading anything.
788
+
789
+ ### Layer 2: Timeline Layer (~500 tokens)
790
+
791
+ Compressed recent history. Loaded when context is needed but full details are not yet required.
792
+
793
+ **File:** `.loki/memory/timeline.json`
794
+
795
+ ```json
+ {
+   "version": "1.0",
+   "last_updated": "2026-01-25T10:00:00Z",
+   "recent_actions": [
+     {
+       "timestamp": "2026-01-25T09:45:00Z",
+       "action": "Implemented refresh token rotation",
+       "outcome": "success",
+       "topic_id": "auth-system"
+     },
+     {
+       "timestamp": "2026-01-25T09:30:00Z",
+       "action": "Fixed JWT expiration bug",
+       "outcome": "success",
+       "topic_id": "auth-system"
+     },
+     {
+       "timestamp": "2026-01-25T08:00:00Z",
+       "action": "Added rate limiting to /api/login",
+       "outcome": "success",
+       "topic_id": "api-routes"
+     }
+   ],
+   "key_decisions": [
+     {
+       "decision": "Use RS256 for JWT signing",
+       "rationale": "Better security for distributed verification",
+       "date": "2026-01-24",
+       "topic_id": "auth-system"
+     },
+     {
+       "decision": "15-minute access token expiry",
+       "rationale": "Balance security with UX",
+       "date": "2026-01-24",
+       "topic_id": "auth-system"
+     }
+   ],
+   "active_context": {
+     "current_focus": "auth-system",
+     "blocked_by": [],
+     "next_up": ["api-routes", "testing"]
+   }
+ }
+ ```
840
+
841
+ **Usage:** Understand recent activity and key decisions without loading full episodic traces.
842
+
843
+ ### Layer 3: Full Details (Unlimited)
844
+
845
+ Complete context loaded only when a specific topic is needed.
846
+
847
+ **Files:**
848
+ - `.loki/memory/episodic/*.json` - Full interaction traces
849
+ - `.loki/memory/semantic/*.json` - Complete pattern definitions
850
+
851
+ **Usage:** Load only when working directly on a topic identified from Layer 1/2.
852
+
853
+ ### Token Economics Tracking
854
+
855
+ Track the efficiency of memory access patterns:
856
+
857
+ **File:** `.loki/memory/token_economics.json`
858
+
859
+ ```json
+ {
+   "session_id": "session-2026-01-25-001",
+   "metrics": {
+     "discovery_tokens": 150,
+     "read_tokens": 2400,
+     "ratio": 0.0625,
+     "savings_vs_full_load": "78%"
+   },
+   "breakdown": {
+     "layer1_loads": 1,
+     "layer1_tokens": 100,
+     "layer2_loads": 1,
+     "layer2_tokens_scanned": 50,
+     "layer2_tokens_available": 450,
+     "layer3_loads": 1,
+     "layer3_tokens": 2400
+   },
+   "baseline_comparison": {
+     "traditional_approach_tokens": 11500,
+     "progressive_approach_tokens": 2550,
+     "tokens_saved": 8950
+   }
+ }
+ ```
884
+
885
+ **Key Metrics:**
886
+
887
+ | Metric | Definition | Target |
888
+ |--------|------------|--------|
889
+ | `discovery_tokens` | Tokens spent finding what exists (L1 + L2 scanning) | Minimize |
890
+ | `read_tokens` | Tokens spent reading full context (L3) | Necessary cost |
891
+ | `ratio` | discovery_tokens / read_tokens | < 0.1 (10%) |
892
+ | `savings_vs_full_load` | % tokens saved vs loading everything | > 60% |
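These metrics can be derived directly from the per-layer breakdown. A sketch using the field names from the `token_economics.json` example above (the helper name is illustrative):

```python
def compute_token_metrics(breakdown, baseline_tokens):
    """Derive the tracked metrics from a session's per-layer breakdown."""
    discovery = breakdown["layer1_tokens"] + breakdown["layer2_tokens_scanned"]
    read = breakdown["layer3_tokens"]
    progressive_total = discovery + read
    return {
        "discovery_tokens": discovery,
        "read_tokens": read,
        "ratio": discovery / read if read else 0.0,
        "savings_vs_full_load": f"{round(100 * (1 - progressive_total / baseline_tokens))}%",
    }
```

With the example breakdown above (100 + 50 discovery tokens, 2400 read tokens, 11500-token baseline) this reproduces the 0.0625 ratio and 78% savings.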
893
+
894
+ ### Action Thresholds
895
+
896
+ When metrics exceed thresholds, take corrective action:
897
+
898
+ | Metric | Threshold | Action | Rationale |
899
+ |--------|-----------|--------|-----------|
900
+ | `ratio` | > 0.15 (15%) | Compress Layer 3 entries | Discovery overhead too high; reduce L3 volume by archiving old entries or merging related topics |
901
+ | `savings_vs_full_load` | < 50% | Review topic relevance scoring | Progressive loading not providing sufficient benefit; topic boundaries may be too broad |
902
+ | `layer3_loads` | > 3 in single task | Create specialized index | Frequent cross-topic access indicates missing abstraction; create composite topic entry |
903
+ | `discovery_tokens` | > 200 | Reorganize topic index | Layer 1 index bloated or poorly structured; prune stale topics, merge overlapping entries |
904
+ | `layer2_tokens_scanned` | > 100 per query | Split large timelines | Timeline entries too verbose or topic too broad; consider sub-topic decomposition |
905
+ | `tokens_saved` | < 1000 per session | Evaluate memory ROI | Memory system overhead may exceed benefit for this session type; consider bypass mode |
906
+
907
+ #### Evaluation Frequency
908
+
909
+ Thresholds should be evaluated at specific checkpoints rather than continuously:
910
+
911
+ | Checkpoint | Evaluation Type | Rationale |
912
+ |------------|-----------------|-----------|
913
+ | After each task completion | Lightweight check | Per-task evaluation is cheap (~10 token overhead). Catches issues early before they compound. Only checks `ratio` and `layer3_loads`. |
914
+ | At session boundaries | Full evaluation | Comprehensive check of all thresholds. Session end is natural pause point for maintenance. Evaluates all 6 metrics. |
915
+ | When Layer 3 load count exceeds 2 | Triggered check | Prevents runaway costs mid-session. If hitting L3 frequently, stop and evaluate before continuing. Checks `layer3_loads` and `ratio`. |
916
+
917
+ **Why this pattern:**
918
+ - Per-task checks are lightweight (~10 tokens overhead) and catch issues early before they compound into expensive sessions
919
+ - Session boundary checks are comprehensive but infrequent, amortizing evaluation cost across all session work
920
+ - Triggered checks act as circuit breakers, preventing runaway token costs when access patterns degrade mid-session
921
+
922
+ ```python
+ def should_evaluate_thresholds(checkpoint_type, breakdown):
+     """
+     Determine evaluation scope based on checkpoint type.
+     """
+     if checkpoint_type == "task_complete":
+         # Lightweight: only check high-impact metrics
+         return ["ratio", "layer3_loads"]
+
+     elif checkpoint_type == "session_end":
+         # Full evaluation at session boundary
+         return ["ratio", "savings_vs_full_load", "layer3_loads",
+                 "discovery_tokens", "layer2_tokens_scanned", "tokens_saved"]
+
+     elif checkpoint_type == "triggered":
+         # Mid-session triggered by high L3 access
+         if breakdown.get("layer3_loads", 0) > 2:
+             return ["layer3_loads", "ratio"]
+
+     return []  # No evaluation needed
+ ```
943
+
944
+ #### Priority Order (Multiple Thresholds)
945
+
946
+ When multiple thresholds are exceeded simultaneously, process corrective actions in this order:
947
+
948
+ | Priority | Metric | Action | Rationale |
949
+ |----------|--------|--------|-----------|
950
+ | 1 (HIGHEST) | `ratio > 0.15` | Compress Layer 3 entries | Cost control - discovery overhead directly impacts every query |
951
+ | 2 | `savings_vs_full_load < 50%` | Review topic relevance scoring | ROI validation - progressive loading must justify its complexity |
952
+ | 3 | `layer3_loads > 3` | Create specialized index | Structure issue - frequent cross-topic access indicates architectural gap |
953
+ | 4 | `discovery_tokens > 200` | Reorganize topic index | Index bloat - Layer 1 should remain lean for fast scanning |
954
+ | 5 | `layer2_tokens_scanned > 100` | Split large timelines | Timeline bloat - Layer 2 scanning becoming expensive |
955
+ | 6 (LOWEST) | `tokens_saved < 1000` | Evaluate memory ROI | Informational - may indicate memory system not needed for this session type |
956
+
957
+ **Prioritization rationale:** Cost-impacting issues (ratio, savings) take precedence over structural issues (loads, bloat). Token spend is the primary optimization target; structural improvements support that goal.
958
+
959
+ ```python
+ # Priority-ordered threshold definitions
+ THRESHOLD_PRIORITY = [
+     {"metric": "ratio", "threshold": 0.15, "priority": 1},
+     {"metric": "savings_vs_full_load", "threshold": 50, "priority": 2},
+     {"metric": "layer3_loads", "threshold": 3, "priority": 3},
+     {"metric": "discovery_tokens", "threshold": 200, "priority": 4},
+     {"metric": "layer2_tokens_scanned", "threshold": 100, "priority": 5},
+     {"metric": "tokens_saved", "threshold": 1000, "priority": 6},
+ ]
+
+ def prioritize_actions(actions):
+     """
+     Sort corrective actions by priority when multiple thresholds exceeded.
+     """
+     priority_map = {t["metric"]: t["priority"] for t in THRESHOLD_PRIORITY}
+
+     def get_priority(action):
+         # Prefer an explicit "metric" field; fall back to scanning the reason text
+         metric = action.get("metric")
+         if metric in priority_map:
+             return priority_map[metric]
+         for m in priority_map:
+             if m in action.get("reason", ""):
+                 return priority_map[m]
+         return 99  # Unknown actions last
+
+     return sorted(actions, key=get_priority)
+ ```
985
+
986
+ **Threshold Implementation:**
987
+
988
+ ```python
+ def check_thresholds(metrics, breakdown):
+     """
+     Evaluate metrics against action thresholds.
+     Returns list of recommended actions, each tagged with its metric
+     so prioritize_actions() can order them.
+     """
+     actions = []
+
+     if metrics["ratio"] > 0.15:
+         actions.append({
+             "metric": "ratio",
+             "action": "compress_layer3",
+             "reason": f"Discovery ratio {metrics['ratio']:.2%} exceeds 15%",
+             "priority": "high"
+         })
+
+     if float(metrics["savings_vs_full_load"].rstrip('%')) < 50:
+         actions.append({
+             "metric": "savings_vs_full_load",
+             "action": "review_topic_relevance",
+             "reason": f"Savings {metrics['savings_vs_full_load']} below 50% target",
+             "priority": "medium"
+         })
+
+     if breakdown["layer3_loads"] > 3:
+         actions.append({
+             "metric": "layer3_loads",
+             "action": "create_specialized_index",
+             "reason": f"{breakdown['layer3_loads']} L3 loads in single task",
+             "priority": "medium"
+         })
+
+     if metrics["discovery_tokens"] > 200:
+         actions.append({
+             "metric": "discovery_tokens",
+             "action": "reorganize_topic_index",
+             "reason": f"Discovery tokens ({metrics['discovery_tokens']}) exceed 200",
+             "priority": "low"
+         })
+
+     return actions
+ ```
1026
+
1027
+ ### Compression Algorithm
1028
+
1029
+ Moving context from Layer 3 to Layer 2 to Layer 1:
1030
+
1031
+ ```python
+ def compress_to_timeline(episodic_entry):
+     """
+     Layer 3 -> Layer 2: Full episodic trace to timeline entry.
+     Compression ratio: ~10:1
+     """
+     return {
+         "timestamp": episodic_entry["timestamp"],
+         "action": summarize_in_10_words(episodic_entry["context"]["goal"]),
+         "outcome": episodic_entry["outcome"],
+         "topic_id": extract_topic(episodic_entry)
+     }
+
+ def compress_to_index(topic_memories):
+     """
+     Layer 2 -> Layer 1: Timeline entries to index topic.
+     Compression ratio: ~20:1
+     """
+     return {
+         "id": topic_memories[0]["topic_id"],
+         "summary": extract_one_line_summary(topic_memories),
+         "relevance_score": calculate_relevance(topic_memories),
+         "last_accessed": max(m["timestamp"] for m in topic_memories),
+         "token_count": sum(m.get("token_count", 0) for m in topic_memories)
+     }
+
+ def summarize_in_10_words(text):
+     """
+     Compress any text to ~10 words capturing core meaning.
+     Uses extractive summarization, not generative.
+     """
+     # Extract subject-verb-object from first sentence
+     # Remove adjectives and adverbs
+     # Keep only action and target
+     return " ".join(extract_svo(text)[:10])  # extract_svo returns a word list
+
+ def extract_one_line_summary(topic_memories):
+     """
+     Create topic summary from multiple memories.
+     """
+     # Find most common action type
+     # Combine with most specific target
+     # Result: "JWT authentication implementation"
+     actions = [m["action"] for m in topic_memories]
+     return find_common_theme(actions)
+ ```
1077
+
1078
+ ### Progressive Loading Algorithm
1079
+
1080
+ ```python
+ def load_relevant_context(query):
+     """
+     Load context progressively, minimizing token usage.
+     """
+     tokens_used = {"discovery": 0, "read": 0}
+
+     # Step 1: Always load index (~100 tokens)
+     index = load_file(".loki/memory/index.json")
+     tokens_used["discovery"] += 100
+
+     # Step 2: Find relevant topics from index
+     relevant_topics = [
+         t for t in index["topics"]
+         if similarity(t["summary"], query) > 0.5
+     ]
+
+     if not relevant_topics:
+         return None, tokens_used  # Nothing relevant found
+
+     # Step 3: Load timeline for context (~500 tokens)
+     timeline = load_file(".loki/memory/timeline.json")
+     tokens_used["discovery"] += 50  # Only scan relevant entries
+
+     # Check if timeline has enough context
+     recent_for_topic = [
+         a for a in timeline["recent_actions"]
+         if a["topic_id"] in [t["id"] for t in relevant_topics]
+     ]
+
+     if sufficient_context(recent_for_topic, query):
+         return recent_for_topic, tokens_used  # Timeline was enough
+
+     # Step 4: Load full details only if needed (accumulate across topics,
+     # not just the last one)
+     full_context = []
+     for topic in relevant_topics[:2]:  # Limit to top 2 topics
+         full_context.extend(load_full_topic(topic["id"]))
+         tokens_used["read"] += topic["token_count"]
+
+     return full_context, tokens_used
+ ```
1120
+
1121
+ ### Directory Structure Update
1122
+
1123
+ The progressive disclosure layers integrate with the existing memory structure:
1124
+
1125
+ ```
+ .loki/memory/
+ +-- index.json              # Layer 1: Topic index (~100 tokens)
+ +-- timeline.json           # Layer 2: Compressed history (~500 tokens)
+ +-- token_economics.json    # Tracking metrics
+ +-- episodic/               # Layer 3: Full episodic traces
+ |   +-- 2026-01-06/
+ |   |   +-- task-001.json
+ |   +-- index.json          # Temporal index for retrieval
+ +-- semantic/               # Layer 3: Full semantic patterns
+ |   +-- patterns.json
+ |   +-- anti-patterns.json
+ |   +-- facts.json
+ |   +-- links.json          # Zettelkasten connections
+ +-- skills/                 # Procedural memory (separate system)
+ +-- ledgers/                # Agent checkpoints
+ +-- handoffs/               # Agent-to-agent transfers
+ +-- learnings/              # Extracted from errors
+ ```
1144
+
1145
+ ### Benefits
1146
+
1147
+ 1. **60-80% Token Reduction:** Only load what you need
1148
+ 2. **Faster Discovery:** Scanning the ~100-token index costs the same regardless of how much memory exists, instead of scaling with total memory size
1149
+ 3. **Predictable Costs:** Know token budget before loading
1150
+ 4. **Graceful Degradation:** Works with partial loads
1151
+
1152
+ ---
1153
+
1154
+ ## Modular Memory Design Space (MemEvolve)
1155
+
1156
+ **Research Source:** arXiv 2512.18746 - MemEvolve: Meta-Evolution of Agent Memory Systems
1157
+
1158
+ MemEvolve proposes a 4-component framework for decomposing and analyzing agent memory systems. This section maps the framework to Loki Mode's existing architecture.
1159
+
1160
+ ### The Four Components
1161
+
1162
+ | Component | Function | Loki Mode Implementation |
1163
+ |-----------|----------|--------------------------|
1164
+ | **Encode** | How raw experience is structured | Episodic traces with action_log, context, artifacts |
1165
+ | **Store** | How it is integrated into memory | JSON files in `.loki/memory/`, consolidation pipeline |
1166
+ | **Retrieve** | How it is recalled | Similarity search, temporal index, keyword search |
1167
+ | **Manage** | Offline processes | Pruning, merging duplicates, archiving |
1168
+
1169
+ ### Encode: Structuring Raw Experience
1170
+
1171
+ MemEvolve identifies encoding as the first critical decision: how raw observations, actions, and outcomes are transformed into structured memory entries.
1172
+
1173
+ **Loki Mode Implementation:**
1174
+
1175
+ Episodic memory entries encode experience with:
1176
+ - `action_log`: Timestamped sequence of actions taken
1177
+ - `context`: Phase, goal, constraints, files involved
1178
+ - `errors_encountered`: Structured error types, messages, resolutions
1179
+ - `artifacts_produced`: Files created or modified
1180
+ - `outcome`: Success/failure classification
1181
+
1182
+ ```json
+ {
+   "context": {
+     "phase": "development",
+     "goal": "Implement POST /api/todos endpoint",
+     "constraints": ["No third-party deps", "< 200ms response"],
+     "files_involved": ["src/routes/todos.ts", "src/db/todos.ts"]
+   },
+   "action_log": [
+     {"t": 0, "action": "read_file", "target": "openapi.yaml"},
+     {"t": 5, "action": "write_file", "target": "src/routes/todos.ts"}
+   ],
+   "artifacts_produced": ["src/routes/todos.ts", "tests/todos.test.ts"]
+ }
+ ```
1197
+
1198
+ **MemEvolve Alignment:** Loki Mode uses structured encoding with explicit action traces. The encoding preserves temporal ordering and causal relationships between actions.
1199
+
1200
+ ### Store: Integrating Into Memory
1201
+
1202
+ MemEvolve examines how encoded information is integrated and organized within the memory system.
1203
+
1204
+ **Loki Mode Implementation:**
1205
+
1206
+ 1. **Primary Storage:** JSON files in `.loki/memory/` hierarchy
1207
+ - Episodic: `.loki/memory/episodic/{date}/task-{id}.json`
1208
+ - Semantic: `.loki/memory/semantic/patterns.json`, `anti-patterns.json`
1209
+ - Procedural: `.loki/memory/skills/*.md`
1210
+
1211
+ 2. **Consolidation Pipeline:** Episodic-to-semantic transformation
1212
+ - Cluster similar episodes
1213
+ - Extract common patterns (confidence >= 0.8)
1214
+ - Update existing patterns or create new ones
1215
+ - Extract anti-patterns from error episodes
1216
+
1217
+ 3. **Zettelkasten Linking:** Cross-references between memory entries
1218
+ - Relations: `derived_from`, `related_to`, `contradicts`, `supersedes`
1219
+ - Stored in `links.json` and embedded in pattern entries
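An illustrative `links.json` entry; field names are assumptions, chosen to be consistent with the relations listed above:

```json
{
  "links": [
    {
      "from": "pattern-042",
      "to": "episode-2026-01-06-task-001",
      "relation": "derived_from"
    },
    {
      "from": "pattern-042",
      "to": "pattern-017",
      "relation": "supersedes"
    }
  ]
}
```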
1220
+
1221
+ **MemEvolve Alignment:** Loki Mode implements hierarchical storage (episodic -> semantic -> procedural) with explicit consolidation. The Zettelkasten linking provides associative structure beyond simple hierarchies.
1222
+
1223
+ ### Retrieve: Recalling Memories
1224
+
1225
+ MemEvolve analyzes how memories are located and retrieved at inference time.
1226
+
1227
+ **Loki Mode Implementation:**
1228
+
1229
+ 1. **Similarity Search:** Vector-based retrieval
1230
+ ```python
+ semantic_matches = vector_search(
+     collection="semantic",
+     query=query_embedding,
+     top_k=5
+ )
+ ```
1237
+
1238
+ 2. **Temporal Index:** Date-based episodic retrieval
1239
+ - `.loki/memory/episodic/index.json` provides temporal navigation
1240
+ - Enables queries like "what happened last week"
1241
+
1242
+ 3. **Keyword Search:** Skill matching by keywords
1243
+ ```python
+ skill_matches = keyword_search(
+     collection="skills",
+     keywords=extract_keywords(current_context)
+ )
+ ```
1249
+
1250
+ 4. **Progressive Disclosure:** 3-layer loading (index -> timeline -> full)
1251
+ - Minimizes token usage by loading incrementally
1252
+ - Index layer (~100 tokens) always loaded first
1253
+
1254
+ **MemEvolve Alignment:** Loki Mode uses multi-modal retrieval (vector, temporal, keyword). The progressive disclosure pattern optimizes retrieval cost.
1255
+
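A minimal sketch of the 3-layer progressive load; the data shapes and the `load_full` callback are assumptions for illustration:

```python
def progressive_load(query_topics, index, timeline, load_full):
    """Always load Layer 1; descend to Layers 2-3 only for topics the query touches."""
    context = {"index": index}  # Layer 1 (~100 tokens) is always present
    hits = set(query_topics) & set(index["topics"])
    if hits:
        # Layer 2: compressed timeline entries for matching topics only
        context["timeline"] = [e for e in timeline if e["topic"] in hits]
        # Layer 3: full episodic records, loaded lazily per matching entry
        context["full"] = [load_full(e["id"]) for e in context["timeline"]]
    return context
```

Queries that touch no indexed topic pay only the Layer 1 cost.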
1256
+ ### Manage: Offline Processes
1257
+
1258
+ MemEvolve identifies management operations that maintain memory quality over time.
1259
+
1260
+ **Loki Mode Implementation:**
1261
+
1262
+ 1. **Pruning:** Time-based retention policy
1263
+ - Last 7 days: full detail
1264
+ - Last 30 days: summarized
1265
+ - Older: archived unless referenced by semantic memory
1266
+
1267
+ 2. **Merging Duplicates:** Semantic deduplication
1268
+ ```python
1269
+ clusters = cluster_by_embedding_similarity(all_patterns, threshold=0.9)
1270
+ # Keep highest confidence, merge source episodes
1271
+ # Create superseded_by links to deprecated entries
1272
+ ```
1273
+
1274
+ 3. **Archiving:** Move stale entries to cold storage
1275
+ - Unreferenced episodes archived after 30 days
1276
+ - Semantic patterns with low usage_count flagged for review
1277
+
1278
+ 4. **Compression:** Layer transitions
1279
+ - Full episodic -> Timeline entry (~10:1 compression)
1280
+ - Timeline entries -> Index topic (~20:1 compression)
1281
+
1282
+ **MemEvolve Alignment:** Loki Mode implements standard memory hygiene (pruning, merging, archiving). Compression enables efficient long-term storage.
1283
+
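The retention policy reduces to a small decision function (a sketch; the `referenced_by_semantic` flag would come from semantic-memory backlinks):

```python
def retention_tier(age_days, referenced_by_semantic):
    """Map an episode's age to the retention tier described above."""
    if age_days <= 7:
        return "full"          # last 7 days: full detail
    if age_days <= 30:
        return "summarized"    # last 30 days: summarized
    # older episodes survive only if semantic memory still points at them
    return "retained" if referenced_by_semantic else "archived"
```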
1284
+ ### Gap Analysis: Meta-Evolution
1285
+
1286
+ **Critical Limitation:** Loki Mode does NOT implement meta-evolution.
1287
+
1288
+ MemEvolve's key contribution is the meta-evolution mechanism: using search algorithms (evolutionary strategies, random search) to automatically discover optimal memory architectures for specific tasks.
1289
+
1290
+ | MemEvolve Feature | Loki Mode Status |
1291
+ |-------------------|------------------|
1292
+ | Encode component | Implemented (fixed design) |
1293
+ | Store component | Implemented (fixed design) |
1294
+ | Retrieve component | Implemented (fixed design) |
1295
+ | Manage component | Implemented (fixed design) |
1296
+ | **Meta-evolution** | **NOT IMPLEMENTED** |
1297
+
1298
+ **What Loki Mode lacks:**
1299
+
1300
+ 1. **Architecture Search:** No mechanism to automatically vary memory component implementations
1301
+ 2. **Fitness Evaluation:** No systematic evaluation of memory architecture performance on tasks
1302
+ 3. **Evolutionary Optimization:** No search over the design space of possible memory configurations
1303
+
1304
+ **Current State:** Loki Mode's memory architecture is **static** - defined at design time and unchanged during operation. The 4 components (encode, store, retrieve, manage) are implemented but fixed.
1305
+
1306
+ **Potential Future Work:**
1307
+
1308
+ If meta-evolution were added, it could optimize:
1309
+ - Encoding granularity (action-level vs. session-level traces)
1310
+ - Storage hierarchy depth (2 layers vs. 3 layers)
1311
+ - Retrieval weighting (semantic vs. episodic emphasis)
1312
+ - Pruning thresholds (retention periods, confidence cutoffs)
1313
+
1314
+ However, this would require infrastructure for:
1315
+ - Generating architecture variants
1316
+ - Evaluating variant performance on benchmark tasks
1317
+ - Selecting and propagating successful variants
1318
+
1319
+ This represents a significant architectural change beyond current scope.
1320
+
1321
+ ### Meta-Evolution Roadmap (Speculative)
1322
+
1323
+ **Important Disclaimer:** This roadmap is speculative. Loki Mode may **NEVER** implement full meta-evolution. The pragmatic approach is to optimize the fixed architecture based on observed usage patterns rather than invest in automated architecture search. The phases below represent what *could* be built, not what *will* be built.
1324
+
1325
+ #### Phase 1: Strategy Logging (v6.x) - Effort: Low (~2 weeks)
1326
+
1327
+ **Status:** NOT PLANNED
1328
+
1329
+ Track which memory strategies perform best in practice without any automated adjustment.
1330
+
1331
+ **What it would add:**
1332
+ - Log retrieval strategy used per query (vector, temporal, keyword)
1333
+ - Record outcome signal (did the retrieved memory help?)
1334
+ - Store success/failure rates per strategy in `.loki/metrics/memory-strategy/`
1335
+
1336
+ **Example logging format:**
1337
+ ```json
1338
+ {
1339
+ "query_id": "q-20260125-001",
1340
+ "timestamp": "2026-01-25T10:30:00Z",
1341
+ "retrieval_strategies": [
1342
+ {"type": "vector", "results": 3, "latency_ms": 45},
1343
+ {"type": "temporal", "results": 2, "latency_ms": 12}
1344
+ ],
1345
+ "selected_memories": ["sem-042", "ep-2026-01-20-007"],
1346
+ "outcome": "success",
1347
+ "task_completed": true
1348
+ }
1349
+ ```
1350
+
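Manual analysis of these logs would then be simple aggregation, sketched here over the record shape above:

```python
from collections import defaultdict

def success_rate_by_strategy(records):
    """Aggregate logged queries into per-strategy success rates."""
    stats = defaultdict(lambda: {"success": 0, "total": 0})
    for rec in records:
        for strat in rec["retrieval_strategies"]:
            s = stats[strat["type"]]
            s["total"] += 1
            s["success"] += rec["outcome"] == "success"
    return {k: v["success"] / v["total"] for k, v in stats.items()}
```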
1351
+ **Value:** Data collection only. Enables manual analysis of which strategies work best for which task types. No automatic adjustment.
1352
+
1353
+ **Risk:** Low. Pure observability layer with no behavioral changes.
1354
+
1355
+ ---
1356
+
1357
+ #### Phase 2: A/B Testing Infrastructure (v7.x) - Effort: Medium (~1-2 months)
1358
+
1359
+ **Status:** NOT PLANNED
1360
+
1361
+ Enable comparison of different weight configurations without automated selection.
1362
+
1363
+ **What it would add:**
1364
+ - Define named weight profiles (e.g., "semantic-heavy", "episodic-heavy", "balanced")
1365
+ - Randomly assign sessions to profiles for controlled comparison
1366
+ - Dashboard/report comparing profile performance metrics
1367
+
1368
+ **Example profile definitions:**
1369
+ ```yaml
1370
+ profiles:
1371
+ semantic-heavy:
1372
+ semantic_weight: 0.7
1373
+ episodic_weight: 0.2
1374
+ skill_weight: 0.1
1375
+
1376
+ episodic-heavy:
1377
+ semantic_weight: 0.2
1378
+ episodic_weight: 0.7
1379
+ skill_weight: 0.1
1380
+
1381
+ balanced:
1382
+ semantic_weight: 0.34
1383
+ episodic_weight: 0.33
1384
+ skill_weight: 0.33
1385
+ ```
1386
+
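Stable assignment of sessions to profiles could be implemented by hashing the session ID (a sketch; profile names follow the YAML above):

```python
import hashlib

PROFILES = ("semantic-heavy", "episodic-heavy", "balanced")

def assign_profile(session_id: str) -> str:
    """Deterministic pseudo-random assignment: a session always gets the same profile."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    return PROFILES[int(digest, 16) % len(PROFILES)]
```

Hashing rather than true randomness keeps assignment stable across restarts, so a session is never silently re-assigned mid-experiment.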
1387
+ **Metrics to compare:**
1388
+ - Task completion rate per profile
1389
+ - Average retrieval latency
1390
+ - Context token usage
1391
+ - User satisfaction signals (if available)
1392
+
1393
+ **Value:** Evidence-based profile selection. Humans choose best profile based on data.
1394
+
1395
+ **Risk:** Medium. Requires careful experiment design to avoid confounding variables.
1396
+
1397
+ ---
1398
+
1399
+ #### Phase 3: Online Learning (v8.x) - Effort: High (~3-6 months)
1400
+
1401
+ **Status:** NOT PLANNED - SIGNIFICANT INVESTMENT
1402
+
1403
+ Automatically adjust retrieval weights based on observed outcomes.
1404
+
1405
+ **What it would add:**
1406
+ - Bandit algorithm (Thompson Sampling or UCB) over weight configurations
1407
+ - Gradual weight adjustment based on success signals
1408
+ - Guardrails to prevent catastrophic configuration drift
1409
+
1410
+ **Algorithm sketch:**
1411
+ ```python
1412
+ from collections import defaultdict
+ import numpy as np
+
+ # Thompson Sampling over discrete weight buckets
1413
+ class MemoryWeightOptimizer:
1414
+ def __init__(self):
1415
+ # Prior: Beta(1,1) = uniform for each weight bucket
1416
+ self.alpha = defaultdict(lambda: 1.0) # success counts
1417
+ self.beta = defaultdict(lambda: 1.0) # failure counts
1418
+
1419
+ def select_weights(self):
1420
+ # Sample from posterior for each strategy
1421
+ samples = {}
1422
+ for strategy in ['semantic', 'episodic', 'skill']:
1423
+ samples[strategy] = np.random.beta(
1424
+ self.alpha[strategy],
1425
+ self.beta[strategy]
1426
+ )
1427
+ # Normalize to sum to 1
1428
+ total = sum(samples.values())
1429
+ return {k: v/total for k, v in samples.items()}
1430
+
1431
+ def update(self, strategy, success: bool):
1432
+ if success:
1433
+ self.alpha[strategy] += 1
1434
+ else:
1435
+ self.beta[strategy] += 1
1436
+ ```
1437
+
1438
+ **Guardrails required:**
1439
+ - Minimum exploration rate (never drop below 10% for any strategy)
1440
+ - Maximum update rate (no more than 5% weight shift per day)
1441
+ - Automatic rollback if task completion rate drops >20%
1442
+
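The exploration-floor guardrail, for instance, would be a small post-processing step on whatever weights the bandit proposes (sketch):

```python
def enforce_exploration_floor(weights, floor=0.10):
    """Clamp every strategy weight to at least `floor`, then renormalize to sum to 1."""
    clamped = {k: max(v, floor) for k, v in weights.items()}
    total = sum(clamped.values())
    # After renormalization, floored weights shrink proportionally but stay near `floor`.
    return {k: v / total for k, v in clamped.items()}
```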
1443
+ **Value:** Self-improving retrieval that adapts to usage patterns.
1444
+
1445
+ **Risk:** High. Automated adjustment could degrade performance if outcome signals are noisy or delayed. Requires robust monitoring and rollback capabilities.
1446
+
1447
+ ---
1448
+
1449
+ #### Phase 4: Full Meta-Evolution (v9.x) - Effort: Very High (~6-12 months)
1450
+
1451
+ **Status:** NOT PLANNED - MAY NEVER BE IMPLEMENTED
1452
+
1453
+ True architecture search over the design space of memory configurations.
1454
+
1455
+ **What it would add:**
1456
+ - Define architecture search space (component variants, hyperparameters)
1457
+ - Evolutionary algorithm over architecture configurations
1458
+ - Fitness function based on composite performance metrics
1459
+ - Population-based training with crossover and mutation
1460
+
1461
+ **Search space dimensions:**
1462
+ | Component | Variants |
1463
+ |-----------|----------|
1464
+ | Encoding | Action-level, Session-level, Hierarchical |
1465
+ | Storage layers | 2, 3, or 4 layers |
1466
+ | Retrieval modes | Vector-only, Hybrid, Graph-based |
1467
+ | Pruning policy | Time-based, Usage-based, Importance-based |
1468
+ | Compression | 5:1, 10:1, 20:1 ratios |
1469
+
1470
+ **Search space size:** 3^5 = 243 combinations from the table alone, and well over 300 once per-component hyperparameters also vary
1471
+
1472
+ **Fitness function (example):**
1473
+ ```python
1474
+ def fitness(architecture, benchmark_tasks):
1475
+ task_score = evaluate_task_completion(architecture, benchmark_tasks)
1476
+ efficiency = 1.0 / measure_token_usage(architecture)
1477
+ avg_latency_ms = measure_avg_latency(architecture)  # hypothetical helper; returns milliseconds
+ latency_penalty = max(0, avg_latency_ms - 100) / 1000
1478
+
1479
+ return (
1480
+ 0.6 * task_score +
1481
+ 0.3 * efficiency -
1482
+ 0.1 * latency_penalty
1483
+ )
1484
+ ```
1485
+
1486
+ **Why this may never be built:**
1487
+
1488
+ 1. **Diminishing returns:** A well-tuned fixed architecture likely captures 90%+ of potential value. Meta-evolution chases the last 10% at 10x the engineering cost.
1489
+
1490
+ 2. **Benchmark limitations:** MemEvolve evaluated on synthetic benchmarks (reading comprehension, trip planning). Real-world Loki Mode tasks have no clean benchmark equivalent.
1491
+
1492
+ 3. **Evaluation cost:** Each architecture variant requires extensive testing. At 300+ configurations with multiple evaluation runs each, this becomes expensive.
1493
+
1494
+ 4. **Complexity vs. maintainability:** Self-modifying architectures are hard to debug, reason about, and maintain.
1495
+
1496
+ 5. **Sufficient without it:** The current fixed design handles Loki Mode's use cases well. Investment is better spent on other features.
1497
+
1498
+ **Honest assessment:** Full meta-evolution is academically interesting but likely not worth the investment for Loki Mode. The pragmatic path is:
1499
+ - Implement Phase 1 (logging) to understand patterns
1500
+ - Maybe implement Phase 2 (A/B testing) if data suggests significant improvement potential
1501
+ - Stop there unless clear evidence justifies further investment
1502
+
1503
+ ---
1504
+
1505
+ ### Summary: Gap Mitigation Strategy
1506
+
1507
+ | Gap | Mitigation | Status |
1508
+ |-----|------------|--------|
1509
+ | No architecture search | Use well-researched fixed design | Current |
1510
+ | No fitness evaluation | Observability + manual tuning | Current |
1511
+ | No weight optimization | Sensible defaults from literature | Current |
1512
+ | Strategy logging | Phase 1 candidate | NOT PLANNED |
1513
+ | A/B testing | Phase 2 candidate | NOT PLANNED |
1514
+ | Online learning | Phase 3 candidate | NOT PLANNED |
1515
+ | Full meta-evolution | Phase 4 candidate | PROBABLY NEVER |
1516
+
1517
+ The current fixed architecture is a reasonable tradeoff. Meta-evolution would be valuable for systems with diverse, well-benchmarked task distributions. Loki Mode's primary value is in autonomous code generation, where other factors (model capability, prompt engineering, tool integration) likely dominate over memory architecture optimization.
1518
+
1519
+ ---
1520
+
454
1521
  ## Integration with CONTINUITY.md
455
1522
 
456
- CONTINUITY.md is working memory - it references but doesn't duplicate long-term memory:
1523
+ CONTINUITY.md is **working memory** that sits above the 3-layer progressive disclosure system:
1524
+
1525
+ ```
1526
+ +------------------------------------------------------------------+
1527
+ | CONTINUITY.md (Working Memory) |
1528
+ | - Read/write every turn |
1529
+ | - Current task state, decisions, blockers |
1530
+ | - NOT part of long-term memory system |
1531
+ +------------------------------------------------------------------+
1532
+ |
1533
+ | (references, doesn't duplicate)
1534
+ v
1535
+ +------------------------------------------------------------------+
1536
+ | Progressive Disclosure Layers (Long-Term Memory) |
1537
+ | Layer 1: index.json -> Layer 2: timeline.json -> Layer 3: full |
1538
+ +------------------------------------------------------------------+
1539
+ ```
1540
+
1541
+ ### How They Work Together
1542
+
1543
+ 1. **Session Start:** Load CONTINUITY.md (always), then load Layer 1 index.json
1544
+ 2. **Task Execution:** CONTINUITY.md references memory IDs (e.g., `[sem-001]`) without duplicating content
1545
+ 3. **Context Retrieval:** When CONTINUITY.md references a topic, progressive load from Layer 1 -> 2 -> 3
1546
+ 4. **Session End:** Update CONTINUITY.md with outcomes; update timeline.json with compressed entries
1547
+
1548
+ ### CONTINUITY.md References Pattern
1549
+
1550
+ CONTINUITY.md references but doesn't duplicate long-term memory:
457
1551
 
458
1552
  ```markdown
459
1553
  ## Relevant Memories (Auto-Retrieved)
@@ -465,3 +1559,5 @@ CONTINUITY.md is working memory - it references but doesn't duplicate long-term
465
1559
  - Don't forget return type annotations
466
1560
  - Run contract tests before marking complete
467
1561
  ```
1562
+
1563
+ These IDs (`sem-001`, `ep-2026-01-05-012`) can be resolved via Layer 1 -> 3 lookup when full context is needed.
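One way to sketch that resolution is mapping an ID to its Layer 3 location using the paths from the Store section; the `sem-`/`ep-` ID grammar and the per-task filename here are assumptions for illustration:

```python
def layer3_path(memory_id: str) -> str:
    """Map a memory ID to the file that holds its full (Layer 3) record."""
    if memory_id.startswith("sem-"):
        # Semantic patterns share one file; the ID selects an entry within it.
        return ".loki/memory/semantic/patterns.json"
    if memory_id.startswith("ep-"):
        # Assumed episodic ID grammar: ep-YYYY-MM-DD-NNN embeds the date directory.
        _, year, month, day, seq = memory_id.split("-")
        return f".loki/memory/episodic/{year}-{month}-{day}/task-{seq}.json"
    raise ValueError(f"unknown memory ID scheme: {memory_id}")
```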