amalfa 1.0.2 → 1.0.4

Files changed (55)
  1. package/package.json +1 -1
  2. package/src/cli.ts +1 -1
  3. package/docs/AGENT-METADATA-PATTERNS.md +0 -1021
  4. package/docs/AGENT_PROTOCOLS.md +0 -28
  5. package/docs/ARCHITECTURAL_OVERVIEW.md +0 -123
  6. package/docs/BENTO_BOXING_DEPRECATION.md +0 -281
  7. package/docs/Bun-SQLite.html +0 -464
  8. package/docs/COMMIT_GUIDELINES.md +0 -367
  9. package/docs/CONFIG_E2E_VALIDATION.md +0 -147
  10. package/docs/CONFIG_UNIFICATION.md +0 -187
  11. package/docs/CONFIG_VALIDATION.md +0 -103
  12. package/docs/DEVELOPER_ONBOARDING.md +0 -36
  13. package/docs/Graph and Vector Database Best Practices.md +0 -214
  14. package/docs/LEGACY_DEPRECATION.md +0 -174
  15. package/docs/MCP_SETUP.md +0 -317
  16. package/docs/PERFORMANCE_BASELINES.md +0 -88
  17. package/docs/QUICK_START_MCP.md +0 -168
  18. package/docs/REPOSITORY_CLEANUP_SUMMARY.md +0 -261
  19. package/docs/SESSION-2026-01-06-METADATA-PATTERNS.md +0 -346
  20. package/docs/SETUP.md +0 -464
  21. package/docs/SETUP_COMPLETE.md +0 -464
  22. package/docs/VISION-AGENT-LEARNING.md +0 -1242
  23. package/docs/_current-config-status.md +0 -93
  24. package/docs/edge-generation-methods.md +0 -57
  25. package/docs/elevator-pitch.md +0 -118
  26. package/docs/graph-and-vector-database-playbook.html +0 -480
  27. package/docs/hardened-sqlite.md +0 -85
  28. package/docs/headless-knowledge-management.md +0 -79
  29. package/docs/john-kaye-flux-prompt.md +0 -46
  30. package/docs/keyboard-shortcuts.md +0 -80
  31. package/docs/opinion-proceed-pattern.md +0 -29
  32. package/docs/polyvis-nodes-edges-schema.md +0 -77
  33. package/docs/protocols/lab-protocol.md +0 -30
  34. package/docs/reaction-iquest-loop-coder.md +0 -46
  35. package/docs/services.md +0 -60
  36. package/docs/sqlite-wal-readonly-trap.md +0 -228
  37. package/docs/strategy/css-architecture.md +0 -40
  38. package/docs/test-document-cycle.md +0 -83
  39. package/docs/test_lifecycle_E2E.md +0 -4
  40. package/docs/the-bicameral-graph.md +0 -83
  41. package/docs/user-guide.md +0 -70
  42. package/docs/vision-helper.md +0 -53
  43. package/polyvis.settings.json.bak +0 -38
  44. package/src/EnlightenedTriad.ts +0 -146
  45. package/src/JIT_Triad.ts +0 -137
  46. package/src/data/experience/test_doc_1.md +0 -2
  47. package/src/data/experience/test_doc_2.md +0 -2
  48. package/src/demo-triad.ts +0 -45
  49. package/src/gardeners/BaseGardener.ts +0 -55
  50. package/src/llm/EnlightenedProvider.ts +0 -95
  51. package/src/services/README.md +0 -56
  52. package/src/services/llama.ts +0 -59
  53. package/src/services/llamauv.ts +0 -56
  54. package/src/services/olmo3.ts +0 -58
  55. package/src/services/phi.ts +0 -52
@@ -1,1021 +0,0 @@
1
- # Agent-First Metadata: Auto-Augmentation Patterns
2
-
3
- **Date:** 2026-01-06
4
- **Status:** Design Document
5
- **Context:** Agent autonomy with human audit model
6
-
7
- ---
8
-
9
- ## Executive Summary
10
-
11
- Traditional knowledge management systems require humans to manually tag, link, and organize documents. This creates bottlenecks and doesn't scale. This document describes an **agent-first metadata system** where agents automatically augment documents with tags, links, and semantic relationships, while humans audit changes through git review rather than approving every decision.
12
-
13
- **Key Innovation:** Latent space tagging - tags and clusters emerge from vector embeddings rather than predefined taxonomies.
14
-
15
- ---
16
-
17
- ## Core Principles
18
-
19
- ### 1. Agent Autonomy by Default
20
-
21
- **The inversion:**
22
-
23
- **❌ Old model:**
24
- ```
25
- Agent: "Should I add this tag?"
26
- Human: *reviews, approves*
27
- Agent: "Should I link this document?"
28
- Human: *reviews, approves*
29
- ```
30
- → Human bottleneck on every decision
31
-
32
- **✅ New model:**
33
- ```
34
- Agent: *adds tags, links, metadata automatically*
35
- Human: *reviews git diff occasionally*
36
- Human: *removes/modifies anything wrong*
37
- Daemon: *picks up changes, re-indexes*
38
- ```
39
- → Human audits, doesn't approve
40
-
41
- ### 2. Git as Source of Truth
42
-
43
- **All agent augmentations are git commits:**
44
- - Atomic (each augmentation is one commit)
45
- - Auditable (see exactly what agent changed)
46
- - Reversible (revert any change)
47
- - Non-destructive (originals preserved in history)
48
-
49
- **Pattern:**
50
- ```bash
51
- # Agent auto-augments
52
- [Amalfa: auto-tagged debrief-auth-refactor]
53
-
54
- # Human reviews
55
- $ git diff
56
-
57
- # If wrong, just edit
58
- $ vim debrief-auth-refactor.md
59
- $ git commit -m "Remove incorrect tag"
60
-
61
- # Daemon syncs automatically
62
- [Amalfa: detected manual edit, re-indexed]
63
- ```
64
-
65
- ### 3. Optimistic Metadata
66
-
67
- **Metadata is optimistic by default, corrected on audit:**
68
-
69
- - Agent adds metadata immediately (no approval needed)
70
- - Human reviews periodically (weekly, not per-decision)
71
- - Human corrects errors (removes/modifies as needed)
72
- - System adapts to corrections (learns from edits)
73
-
74
- **Scaling characteristic:** Human effort is intended to grow as O(log N) with corpus size, not O(N).
75
-
76
- ### 4. Latent Space Organization
77
-
78
- **Tags and clusters emerge from vector space, not predefined taxonomy:**
79
-
80
- - No predefined tag list to maintain
81
- - Clusters form naturally from content similarity
82
- - Labels generated from cluster analysis
83
- - Adapts as knowledge base grows
84
-
85
- **Example:**
86
- ```yaml
87
- tags:
88
- latent:
89
- - auth-state-patterns (0.91) # cluster assignment
90
- - ui-reactivity (0.78) # secondary cluster
91
- ```
92
-
93
- ---
94
-
95
- ## The Auto-Augmentation Workflow
96
-
97
- ### On Document Save
98
-
99
- **Triggered automatically (pre-commit hook or daemon watch):**
100
-
101
- ```bash
102
- [file saved: debrief-auth-refactor.md]
103
-
104
- $ amalfa auto-augment debrief-auth-refactor.md
105
-
106
- Processing...
107
- [1/6] Entity extraction (0.3s)
108
- [2/6] Auto-linking (0.5s)
109
- [3/6] Clustering (0.8s)
110
- [4/6] Similarity search (0.2s)
111
- [5/6] Tag extraction (0.4s)
112
- [6/6] Metadata generation (0.1s)
113
-
114
- ✓ Done. Modified front matter (15 lines changed).
115
- ```
116
-
117
- **Agent augments document, commits, done.**
118
-
119
- ### What Gets Added
120
-
121
- **Front matter is augmented with:**
122
-
123
- ```yaml
124
- ---
125
- type: debrief
126
- brief_id: brief-auth-refactor
127
- date: 2026-01-05
128
- author: claude-3.5
129
-
130
- # Auto-generated by Amalfa (edit freely, changes will sync)
131
- tags:
132
- explicit: [alpine.js, state-management, localStorage]
133
- latent:
134
- - auth-state-patterns (0.91)
135
- - ui-reactivity (0.78)
136
- topics:
137
- - authentication (0.45)
138
- - state-patterns (0.38)
139
-
140
- links:
141
- - playbook-alpine-patterns (uses-pattern, 0.89)
142
- - playbook-state-persistence (extends, 0.81)
143
-
144
- suggested_reading:
145
- - debrief-session-management (0.87)
146
- - playbook-reactive-patterns (0.82)
147
-
148
- semantic_neighbors:
149
- - debrief-session-management (0.87)
150
- - debrief-login-flow (0.83)
151
-
152
- vector_id: vec_a7f3d2e
153
- embedding_model: all-MiniLM-L6-v2
154
- last_indexed: 2026-01-05T14:45:00Z
155
- ---
156
- ```
157
-
158
- **Body is augmented with wiki links:**
159
-
160
- ```markdown
161
- ## What Worked
162
- - [[playbook-alpine-patterns|Alpine's x-data pattern]] eliminated state tracking
163
- - Token refresh using [[debrief-token-refresh|$watch]] is reactive
164
-
165
- ## Lessons Learned
166
- - Alpine for UI state, [[playbook-state-persistence|localStorage]] for persistence
167
- ```
168
-
169
- ---
170
-
171
- ## The Pattern Library
172
-
173
- ### Pattern 1: Latent Space Tagging
174
-
175
- **Purpose:** Organize documents without predefined taxonomy
176
-
177
- **How it works:**
178
-
179
- ```python
180
- # Cluster all documents in embedding space
181
- docs = load_all_documents()
182
- embeddings = [doc.vector for doc in docs]
183
-
184
- # HDBSCAN or K-means clustering
185
- clusters = cluster_embeddings(embeddings, min_cluster_size=3)
186
-
187
- # Generate label for each cluster
188
- for cluster in clusters:
189
- # Analyze cluster content to extract theme
190
- label = generate_cluster_label(cluster.documents)
191
- # Example: "auth-state-patterns"
192
-
193
- # Tag all docs in cluster
194
- for doc in cluster.documents:
195
- distance = doc.distance_to_centroid(cluster)
196
- confidence = 1 - (distance / max_distance)
197
- doc.add_tag(f"latent:{label}", confidence)
198
- ```
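The confidence formula in the pseudocode above can be made concrete with a minimal sketch. It assumes that `max_distance` means the largest centroid distance within the same cluster (the document does not say), and the function names and toy two-dimensional vectors are hypothetical, not part of Amalfa.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def latent_confidences(cluster):
    """Map doc id -> confidence using 1 - distance / max_distance.

    Assumption: max_distance is the largest centroid distance inside
    this cluster, so the farthest member always scores 0.0.
    """
    center = centroid(list(cluster.values()))
    distances = {doc_id: math.dist(vec, center) for doc_id, vec in cluster.items()}
    max_distance = max(distances.values())
    if max_distance == 0:
        # All members sit exactly on the centroid.
        return {doc_id: 1.0 for doc_id in distances}
    return {doc_id: 1 - d / max_distance for doc_id, d in distances.items()}


# Toy embeddings (hypothetical values, for illustration only).
cluster = {
    "debrief-auth-refactor": [1.0, 0.0],
    "debrief-login-flow": [0.0, 0.0],
    "debrief-session-management": [0.5, 0.3],
}
scores = latent_confidences(cluster)
```

One consequence of normalizing by the per-cluster maximum is that the outermost document in every cluster receives a confidence of zero; normalizing by a global or fixed distance instead would avoid that, at the cost of scores that are less comparable across clusters.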
199
-
200
- **Result in front matter:**
201
-
202
- ```yaml
203
- tags:
204
- latent:
205
- - auth-state-patterns (0.91) # strong cluster membership
206
- - ui-reactivity (0.78) # secondary cluster
207
- - browser-persistence (0.65) # weak membership
208
- ```
209
-
210
- **Enables queries:**
211
-
212
- ```bash
213
- # Find all docs in cluster
214
- $ amalfa search --cluster auth-state-patterns
215
-
216
- # Find docs near cluster boundary (potentially mis-clustered)
217
- $ amalfa search --cluster auth-state-patterns --confidence "<0.7"
218
- ```
219
-
220
- **Advantages:**
221
-
222
- - No taxonomy to maintain
223
- - Clusters adapt as corpus grows
224
- - Multi-cluster membership (docs can be in multiple clusters)
225
- - Confidence scores expose uncertainty
226
-
227
- **Re-clustering:**
228
-
229
- ```bash
230
- # Periodically re-cluster (weekly?)
231
- $ amalfa recluster --min-docs 15
232
-
233
- Analyzing 143 documents...
234
- ✓ Found 12 clusters (was 10)
235
- ✓ Created new cluster: api-integration-patterns (8 docs)
236
- ✓ Merged clusters: css-layout + browser-layout → browser-layout (15 docs)
237
- ✓ Relabeled cluster: state-mgmt → state-patterns (better fit)
238
- ✓ Updated 143 document front matters
239
-
240
- Commit? (Y/n)
241
- ```
242
-
243
- **Human reviews cluster changes via git diff.**
244
-
245
- ### Pattern 2: Entity Extraction & Auto-Linking
246
-
247
- **Purpose:** Link documents when concepts are mentioned
248
-
249
- **How it works:**
250
-
251
- ```python
252
- # Agent writes: "Alpine's x-data pattern works well"
253
-
254
- # Entity extraction
255
- entities = extract_entities("Alpine's x-data pattern works well")
256
- # → ["Alpine", "x-data pattern"]
257
-
258
- # Search graph for matches
259
- for entity in entities:
260
- matches = search_graph(entity, threshold=0.85)
261
- # → playbook-alpine-patterns (0.89)
262
-
263
- # Rewrite content with wiki link
264
- text = text.replace(
265
- "Alpine's x-data pattern",
266
- "[[playbook-alpine-patterns|Alpine's x-data pattern]]"
267
- )
268
- ```
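A runnable sketch of the rewrite step, under the assumption that a matching target document has already been found for the phrase. It links only the first mention in each `## ` section, which is one possible reading of the over-linking rule this document describes; the function name is hypothetical and the `amalfa-nolink` opt-out is not handled here.

```python
def link_first_mention_per_section(markdown, phrase, target):
    """Wrap the first occurrence of `phrase` in each section in a wiki link."""
    linked_in_section = False
    out = []
    for line in markdown.splitlines():
        if line.startswith("## "):
            # A new section starts: allow one link again.
            linked_in_section = False
        elif not linked_in_section and phrase in line and "[[" not in line:
            # Coarse guard: lines that already contain a wiki link are skipped.
            line = line.replace(phrase, f"[[{target}|{phrase}]]", 1)
            linked_in_section = True
        out.append(line)
    return "\n".join(out)


text = "\n".join([
    "## What Worked",
    "- Alpine's x-data pattern works well",
    "- Alpine's x-data pattern again",
    "## Another Section",
    "- We applied Alpine's x-data pattern here",
])
result = link_first_mention_per_section(
    text, "Alpine's x-data pattern", "playbook-alpine-patterns"
)
```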
269
-
270
- **Result:**
271
-
272
- ```markdown
273
- Before:
274
- - Alpine's x-data pattern works well
275
-
276
- After:
277
- - [[playbook-alpine-patterns|Alpine's x-data pattern]] works well
278
- ```
279
-
280
- **Advanced: Prevent over-linking**
281
-
282
- ```markdown
283
- # First mention: linked
284
- [[playbook-alpine-patterns|Alpine's x-data pattern]] works well.
285
-
286
- # Subsequent mentions: not linked (avoid clutter)
287
- Later we used the x-data pattern again.
288
-
289
- # Unless in different section
290
- ## Another Section
291
- We also applied [[playbook-alpine-patterns|Alpine's x-data]] here.
292
- ```
293
-
294
- **Human can prevent linking:**
295
-
296
- ```markdown
297
- <!-- amalfa-nolink: alpine -->
298
- We use Alpine here but don't link to playbook.
299
- ```
300
-
301
- ### Pattern 3: Topic Modeling
302
-
303
- **Purpose:** Extract high-level themes from content
304
-
305
- **How it works:**
306
-
307
- ```python
308
- # Run LDA or BERTopic on corpus
309
- topics = extract_topics(all_documents, n_topics=10)
310
-
311
- # Topics emerge from content:
312
- # Topic 1: [authentication, token, session, login] (coherence: 0.82)
313
- # Topic 2: [layout, css, flexbox, grid, safari] (coherence: 0.79)
314
- # Topic 3: [performance, debounce, throttle] (coherence: 0.75)
315
-
316
- # Compute topic distribution for each doc
317
- doc.topic_distribution = compute_distribution(doc, topics)
318
- # → {topic_1: 0.45, topic_2: 0.38, topic_3: 0.17}
319
- ```
320
-
321
- **Result in front matter:**
322
-
323
- ```yaml
324
- topics:
325
- - authentication (0.45) # strong topic
326
- - css-layout (0.38) # secondary topic
327
- - performance (0.17) # weak topic
328
- ```
329
-
330
- **Enables queries:**
331
-
332
- ```bash
333
- # Find all docs about authentication
334
- $ amalfa search --topic authentication --min-score 0.4
335
-
336
- # Find docs bridging two topics
337
- $ amalfa search --topics authentication,performance --min-both 0.3
338
- ```
339
-
340
- **Topic evolution tracking:**
341
-
342
- ```bash
343
- # Show how topics shift over time
344
- $ amalfa topic-timeline authentication
345
-
346
- 2025-11: 5 docs (focus: OAuth patterns)
347
- 2025-12: 8 docs (focus: Session management)
348
- 2026-01: 12 docs (focus: Token refresh, Alpine integration)
349
- ```
350
-
351
- ### Pattern 4: Similarity-Based Suggested Reading
352
-
353
- **Purpose:** Help agents get context quickly
354
-
355
- **How it works:**
356
-
357
- ```python
358
- # For new document, find k-nearest neighbors
359
- doc = load_document("debrief-auth-refactor.md")
360
- neighbors = find_knn(doc.vector, k=5, exclude=doc.id)
361
-
362
- # Rank by similarity
363
- results = [
364
- ("debrief-session-management", 0.87),
365
- ("playbook-state-patterns", 0.82),
366
- ("brief-auth-system", 0.79),
367
- ("debrief-token-refresh", 0.76),
368
- ("playbook-alpine-patterns", 0.74)
369
- ]
370
- ```
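The neighbor lookup can be sketched with a brute-force cosine-similarity scan. This is an illustration only: the document does not specify how `find_knn` is implemented, the function names below are hypothetical, and the toy vectors are made up. A real index (the References mention FAISS) would replace the linear scan for larger corpora.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbors(query_id, vectors, k=5):
    """Return the k most similar documents to `query_id`, excluding itself."""
    query = vectors[query_id]
    scored = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in vectors.items()
        if doc_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, for illustration only).
vectors = {
    "debrief-auth-refactor": [1.0, 0.0, 0.0],
    "debrief-session-management": [0.9, 0.1, 0.0],
    "playbook-state-patterns": [0.5, 0.5, 0.0],
    "playbook-css-layout": [0.0, 0.0, 1.0],
}
neighbors = nearest_neighbors("debrief-auth-refactor", vectors, k=2)
```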
371
-
372
- **Result in front matter:**
373
-
374
- ```yaml
375
- suggested_reading:
376
- - debrief-session-management (similar-patterns, 0.87)
377
- - playbook-state-patterns (related-approach, 0.82)
378
- - brief-auth-system (architectural-context, 0.79)
379
- ```
380
-
381
- **Agent reads these first when resuming work:**
382
-
383
- ```bash
384
- # New session starts
385
- Agent: "I'm working on brief-payment-refactor"
386
-
387
- Amalfa: "Here's the context:"
388
- 1. debrief-auth-refactor (0.87) - similar state patterns
389
- 2. playbook-session-management (0.82) - persistence approach
390
- 3. brief-auth-system (0.79) - architectural context
391
-
392
- Agent: *reads top 3 docs*
393
- Agent: *starts work with full context*
394
- ```
395
-
396
- ### Pattern 5: Temporal Sequences
397
-
398
- **Purpose:** Track work evolution over time
399
-
400
- **How it works:**
401
-
402
- ```python
403
- # Detect brief → debrief → playbook → follow-up chains
404
- sequence = [
405
- ("brief-auth-system", "2025-11-01"),
406
- ("debrief-auth-system", "2025-11-05"),
407
- ("playbook-alpine-patterns", "2025-11-05"), # updated
408
- ("brief-auth-refactor", "2025-12-01"), # references playbook
409
- ("debrief-auth-refactor", "2025-12-05"),
410
- ("playbook-alpine-patterns", "2025-12-05"), # updated again
411
- ]
412
-
413
- # Tag docs with sequence metadata
414
- ```
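The tagging step left as a comment above might look like the following sketch. It assumes the chain is ordered by date with ties kept in input order, and that `position` is 1-based; the function name and the returned dictionary shape are hypothetical.

```python
def sequence_metadata(chain, events, position):
    """Front-matter fields for the entry at 1-based `position` in a chain.

    `events` is a list of (doc_id, iso_date) pairs. ISO dates sort
    correctly as strings, and sorted() is stable, so same-day entries
    keep their input order.
    """
    ordered = sorted(events, key=lambda event: event[1])
    i = position - 1
    return {
        "chain": chain,
        "predecessor": ordered[i - 1][0] if i > 0 else None,
        "successor": ordered[i + 1][0] if i + 1 < len(ordered) else None,
        "position": f"{position}/{len(ordered)}",
    }


events = [
    ("brief-auth-system", "2025-11-01"),
    ("debrief-auth-system", "2025-11-05"),
    ("playbook-alpine-patterns", "2025-11-05"),
    ("brief-auth-refactor", "2025-12-01"),
    ("debrief-auth-refactor", "2025-12-05"),
    ("playbook-alpine-patterns", "2025-12-05"),
    ("brief-auth-tests", "2026-01-03"),
]
meta = sequence_metadata("auth-system-evolution", events, 3)
```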
415
-
416
- **Result in front matter:**
417
-
418
- ```yaml
419
- sequence:
420
- chain: auth-system-evolution
421
- predecessor: debrief-auth-system
422
- successor: brief-auth-refactor
423
- position: 3/7
424
- ```
425
-
426
- **Enables queries:**
427
-
428
- ```bash
429
- # Show full evolution of auth work
430
- $ amalfa sequence auth-system-evolution
431
-
432
- Auth System Evolution (7 docs):
433
- 1. brief-auth-system (2025-11-01)
434
- 2. debrief-auth-system (2025-11-05)
435
- 3. playbook-alpine-patterns (updated 2025-11-05)
436
- 4. brief-auth-refactor (2025-12-01) ← references playbook
437
- 5. debrief-auth-refactor (2025-12-05)
438
- 6. playbook-alpine-patterns (updated 2025-12-05)
439
- 7. brief-auth-tests (2026-01-03)
440
- ```
441
-
442
- ### Pattern 6: Semantic Backlinks
443
-
444
- **Purpose:** Maintain bidirectional links automatically
445
-
446
- **How it works:**
447
-
448
- ```python
449
- # When doc A links to doc B
450
- if "[[playbook-alpine-patterns]]" in doc_a.content:
451
- # Update doc B's front matter automatically
452
- doc_b.add_backlink(doc_a.id, similarity=0.89)
453
- ```
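A small sketch of how the backlink index could be derived by scanning document bodies for wiki links. The regular expression and function name are assumptions for illustration; similarity scores and dates, which the front matter example also records, are left out.

```python
import re
from collections import defaultdict

# Matches [[target]] and [[target|label]], capturing only the target.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def build_backlinks(documents):
    """Map each link target to the sorted list of documents that link to it."""
    backlinks = defaultdict(set)
    for source_id, content in documents.items():
        for target_id in WIKI_LINK.findall(content):
            target_id = target_id.strip()
            if target_id != source_id:  # ignore self-links
                backlinks[target_id].add(source_id)
    return {target: sorted(sources) for target, sources in backlinks.items()}


documents = {
    "debrief-auth-refactor": "- [[playbook-alpine-patterns|Alpine's x-data pattern]] worked",
    "debrief-login-flow": "See [[playbook-alpine-patterns]] and [[playbook-state-persistence|localStorage]]",
    "playbook-alpine-patterns": "No outgoing links here.",
}
backlinks = build_backlinks(documents)
```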
454
-
455
- **Result in `playbook-alpine-patterns.md`:**
456
-
457
- ```yaml
458
- backlinks:
459
- - debrief-auth-refactor (2026-01-05, 0.89)
460
- - debrief-session-management (2025-12-03, 0.82)
461
- - brief-payment-refactor (2025-11-15, 0.78)
462
- - debrief-login-flow (2025-11-08, 0.76)
463
- ```
464
-
465
- **Human never maintains backlinks manually.**
466
-
467
- **Broken link detection:**
468
-
469
- ```bash
470
- # If playbook-alpine-patterns is deleted/renamed
471
- $ amalfa check-links
472
-
473
- ⚠️ Found 4 broken links:
474
- - debrief-auth-refactor.md → [[playbook-alpine-patterns]] (deleted)
475
- - debrief-session-management.md → [[playbook-alpine-patterns]] (deleted)
476
-
477
- Suggested replacements:
478
- - [[playbook-state-patterns]] (0.91 similar)
479
- - [[playbook-reactive-patterns]] (0.85 similar)
480
-
481
- Apply suggestions? (Y/n/individual)
482
- ```
483
-
484
- ### Pattern 7: Confidence-Based Tag Weighting
485
-
486
- **Purpose:** Express uncertainty in metadata
487
-
488
- **All tags have confidence scores:**
489
-
490
- ```yaml
491
- tags:
492
- explicit:
493
- - alpine.js (1.0) # human-added, certain
494
- - architecture-decision (1.0)
495
-
496
- extracted:
497
- - state-management (0.87) # mentioned 5 times, high confidence
498
- - localStorage (0.78) # mentioned 3 times
499
- - token-refresh (0.45) # mentioned once, low confidence
500
-
501
- latent:
502
- - auth-state-patterns (0.91) # strong cluster membership
503
- - ui-reactivity (0.78) # secondary cluster
504
- - browser-persistence (0.65) # weak membership
505
- ```
506
-
507
- **Query by confidence:**
508
-
509
- ```bash
510
- # Only high-confidence tags
511
- $ amalfa search --tags state-management --min-confidence 0.8
512
-
513
- # Find potentially mis-tagged docs
514
- $ amalfa search --tags --max-confidence 0.6
515
- ```
516
-
517
- **Learning from human edits:**
518
-
519
- ```python
520
- # Human removes tag "token-refresh" (was confidence 0.45)
521
- # System learns: tags below 0.5 confidence are often incorrect
522
- # Adjust threshold for future auto-tagging
523
- new_threshold = learn_from_removal(removed_tag, confidence=0.45)
524
- # → new_threshold = 0.55
525
- ```
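The document does not define `learn_from_removal`, so the following is only one plausible rule that reproduces the 0.45 to 0.55 example: when a human removes a tag that had passed the current threshold, raise the threshold to that tag's confidence plus a fixed margin. The signature, the `margin` parameter, and its default of 0.10 are all assumptions.

```python
def learn_from_removal(current_threshold, removed_confidence, margin=0.10):
    """Return an updated tag-confidence threshold after a human removes a tag.

    Assumed rule: a removed tag that had cleared the threshold shows the
    threshold was too permissive, so move it just above that tag's score.
    """
    if removed_confidence < current_threshold:
        # The tag would not be emitted under the current threshold anyway.
        return current_threshold
    return min(1.0, round(removed_confidence + margin, 2))


new_threshold = learn_from_removal(current_threshold=0.45, removed_confidence=0.45)
```

A single removal moving the threshold is a blunt rule; averaging over several recent removals would be less sensitive to one-off corrections.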
526
-
527
- ---
528
-
529
- ## The Daemon's Role
530
-
531
- ### Continuous File Watching
532
-
533
- ```bash
534
- $ amalfa daemon start
535
-
536
- Daemon started. Watching:
537
- - /path/to/repo/docs/**/*.md
538
- - /path/to/repo/briefs/**/*.md
539
- - /path/to/repo/debriefs/**/*.md
540
- - /path/to/repo/playbooks/**/*.md
541
-
542
- [2026-01-06 14:45:00] File changed: debriefs/2026-01-05-auth-refactor.md
543
- ✓ Re-generated embedding
544
- ✓ Updated cluster assignment (moved to cluster 1)
545
- ✓ Re-computed similarity neighbors
546
- ✓ Updated backlinks (3 docs reference this)
547
- ✓ Committed changes
548
-
549
- [2026-01-06 14:46:30] File deleted: playbooks/old-pattern.md
550
- ✓ Removed from graph database
551
- ✓ Found 5 broken links
552
- ✓ Suggested replacements
553
- ✓ Updated front matter in referring docs
554
- ✓ Committed changes
555
-
556
- [2026-01-06 14:48:15] Manual edit detected: debriefs/2026-01-05-auth-refactor.md
557
- (User removed tag: token-refresh)
558
- ✓ Re-indexed without removed tag
559
- ✓ Updated confidence threshold (0.45 → 0.55)
560
- ✓ No commit needed (human already committed)
561
- ```
562
-
563
- ### Git Integration
564
-
565
- **Daemon creates atomic commits:**
566
-
567
- ```bash
568
- $ git log --oneline --grep="Amalfa:"
569
-
570
- a7f3d2e Amalfa: auto-tagged debrief-auth-refactor (added 4 tags, 3 links)
571
- 8b2e4f1 Amalfa: re-clustered corpus (15 new docs, 12 clusters)
572
- c3d5a9f Amalfa: updated backlinks for playbook-alpine-patterns (2 new references)
573
- d4e6b2c Amalfa: detected broken links, suggested replacements (5 links fixed)
574
- ```
575
-
576
- **Each commit is a unit of work that can be reviewed/reverted.**
577
-
578
- ---
579
-
580
- ## Human Audit Workflow
581
-
582
- ### Weekly Review
583
-
584
- ```bash
585
- # See what agent did this week
586
- $ git log --since="1 week ago" --grep="Amalfa:" --oneline
587
-
588
- a7f3d2e Amalfa: auto-tagged debrief-auth-refactor
589
- 8b2e4f1 Amalfa: re-clustered corpus
590
- c3d5a9f Amalfa: updated backlinks
591
- d4e6b2c Amalfa: fixed broken links
592
-
593
- # Review specific commit
594
- $ git show a7f3d2e
595
-
596
- diff --git a/debriefs/2026-01-05-auth-refactor.md
597
- +tags:
598
- + explicit: [alpine.js, state-management, localStorage, token-refresh]
599
- + latent:
600
- + - auth-state-patterns (0.91)
601
-
602
- # Looks good, move on
603
- ```
604
-
605
- ### Correction Workflow
606
-
607
- **If something is wrong:**
608
-
609
- ```bash
610
- # Edit the document directly
611
- $ vim debriefs/2026-01-05-auth-refactor.md
612
-
613
- # Remove incorrect tag
614
- tags:
615
- explicit: [alpine.js, state-management, localStorage] # removed token-refresh
616
-
617
- # Commit change
618
- $ git commit -m "Remove incorrect tag from auth-refactor debrief"
619
-
620
- # Daemon picks up change automatically
621
- [Amalfa daemon: detected manual edit]
622
- ✓ Re-indexed without token-refresh tag
623
- ✓ Updated graph query results
624
- ✓ Learned: tags mentioning "refresh" once = low confidence
625
- ```
626
-
627
- **No special commands needed.** Just edit markdown, commit, done.
628
-
629
- ### Batch Corrections
630
-
631
- **If agent made systematic error:**
632
-
633
- ```bash
634
- # Find all docs with questionable tag
635
- $ amalfa search --tags token-refresh --confidence "<0.6"
636
-
637
- Found 7 documents with low-confidence "token-refresh" tag:
638
- - debrief-auth-refactor.md (0.45)
639
- - debrief-session-management.md (0.52)
640
- - debrief-login-flow.md (0.48)
641
- ...
642
-
643
- # Remove tag from all
644
- $ amalfa untag --tag token-refresh --max-confidence 0.6
645
-
646
- Removing "token-refresh" from 7 documents...
647
- ✓ Updated 7 files
648
- ✓ Re-indexed 7 documents
649
- ✓ Committed changes
650
-
651
- $ git show HEAD
652
- Amalfa: batch removed low-confidence tag "token-refresh" (7 docs)
653
- ```
654
-
655
- ---
656
-
657
- ## Example: Full Lifecycle
658
-
659
- ### 1. Agent Writes Document
660
-
661
- ```markdown
662
- # Debrief: Auth Refactor
663
-
664
- ## What Worked
665
- - Alpine's x-data pattern eliminated manual state tracking
666
- - Token refresh using $watch is reactive
667
-
668
- ## What Failed
669
- - Storing token in Alpine state broke on reload
670
-
671
- ## Lessons Learned
672
- - Alpine for UI state, localStorage for persistence
673
- ```
674
-
675
- **No metadata yet. Just content.**
676
-
677
- ### 2. Agent Saves → Auto-Augmentation
678
-
679
- ```bash
680
- [pre-commit hook or daemon watch triggers]
681
-
682
- $ amalfa auto-augment debrief-auth-refactor.md
683
-
684
- Processing...
685
- ✓ Entity extraction (found: Alpine, x-data, localStorage)
686
- ✓ Auto-linking (3 links inserted)
687
- ✓ Clustering (assigned to: auth-state-patterns, 0.91)
688
- ✓ Similarity search (found 5 neighbors)
689
- ✓ Tag extraction (6 tags)
690
- ✓ Metadata generation (embedding, vector_id)
691
-
692
- Commit? (Y/n) y
693
-
694
- [Amalfa: auto-tagged debrief-auth-refactor]
695
- ```
696
-
697
- ### 3. Result Document
698
-
699
- ```markdown
700
- ---
701
- type: debrief
702
- brief_id: brief-auth-refactor
703
- date: 2026-01-05
704
- author: claude-3.5
705
-
706
- # Auto-generated by Amalfa (edit freely)
707
- tags:
708
- explicit: [alpine.js, state-management, localStorage, token-refresh]
709
- latent:
710
- - auth-state-patterns (0.91)
711
- - ui-reactivity (0.78)
712
- topics:
713
- - authentication (0.45)
714
- - state-patterns (0.38)
715
-
716
- links:
717
- - playbook-alpine-patterns (uses-pattern, 0.89)
718
- - playbook-state-persistence (extends, 0.81)
719
- - debrief-token-refresh (similar-problem, 0.76)
720
-
721
- suggested_reading:
722
- - debrief-session-management (0.87)
723
- - playbook-reactive-patterns (0.82)
724
-
725
- semantic_neighbors:
726
- - debrief-session-management (0.87)
727
- - debrief-login-flow (0.83)
728
-
729
- vector_id: vec_a7f3d2e
730
- last_indexed: 2026-01-05T14:45:00Z
731
- ---
732
-
733
- # Debrief: Auth Refactor
734
-
735
- ## What Worked
736
- - [[playbook-alpine-patterns|Alpine's x-data pattern]] eliminated manual state tracking
737
- - Token refresh using [[debrief-token-refresh|$watch]] is reactive
738
-
739
- ## What Failed
740
- - Storing token in Alpine state broke on reload
741
-
742
- ## Lessons Learned
743
- - Alpine for UI state, [[playbook-state-persistence|localStorage]] for persistence
744
- ```
745
-
746
- ### 4. Human Reviews (Days Later)
747
-
748
- ```bash
749
- $ git log --since="1 week ago" --oneline --grep="Amalfa:"
750
-
751
- a7f3d2e Amalfa: auto-tagged debrief-auth-refactor
752
-
753
- $ git show a7f3d2e
754
-
755
- # Human notices: "token-refresh" tag is wrong (not the focus of this doc)
756
- ```
757
-
758
- ### 5. Human Corrects
759
-
760
- ```bash
761
- $ vim debriefs/2026-01-05-auth-refactor.md
762
-
763
- # Remove incorrect tag
764
- tags:
765
- explicit: [alpine.js, state-management, localStorage] # removed token-refresh
766
-
767
- $ git commit -m "Remove irrelevant token-refresh tag"
768
- ```
769
-
770
- ### 6. Daemon Syncs
771
-
772
- ```bash
773
- [Amalfa daemon watches git commits]
774
-
775
- Detected manual edit: debrief-auth-refactor.md
776
- ✓ Re-indexed without token-refresh tag
777
- ✓ Updated search results
778
- ✓ Learned: increase confidence threshold (0.45 → 0.55)
779
-
780
- No commit needed (human already committed).
781
- ```
782
-
783
- **System adapted to correction.**
784
-
785
- ---
786
-
787
- ## Configuration
788
-
789
- ### `.amalfa.yaml`
790
-
791
- ```yaml
792
- # Auto-augmentation settings
793
- auto_augment:
794
- enabled: true
795
- on_save: true # Run on every save (vs manual trigger)
796
- commit_changes: true # Auto-commit augmentations
797
-
798
- # What to augment
799
- features:
800
- entity_linking: true
801
- clustering: true
802
- topic_modeling: true
803
- similarity_search: true
804
- tag_extraction: true
805
- backlinks: true
806
-
807
- # Thresholds
808
- thresholds:
809
- tag_confidence: 0.55 # Learned from human corrections
810
- link_similarity: 0.85 # Minimum similarity for auto-linking
811
- cluster_confidence: 0.70 # Minimum for cluster assignment
812
-
813
- # Re-clustering
814
- reclustering:
815
- auto: true
816
- trigger: 15 # Re-cluster after N new docs
817
- min_cluster_size: 3
818
-
819
- # Daemon settings
820
- daemon:
821
- watch_paths:
822
- - docs/**/*.md
823
- - briefs/**/*.md
824
- - debriefs/**/*.md
825
- - playbooks/**/*.md
826
-
827
- git_integration:
828
- auto_commit: true
829
- commit_prefix: "Amalfa:"
830
-
831
- # Human audit
832
- audit:
833
- weekly_digest: true # Email summary of agent changes
834
- confidence_alerts: true # Alert on low-confidence tags
835
- broken_link_fix: auto # Auto-fix broken links
836
- ```
837
-
838
- ---
839
-
840
- ## Implementation Phases
841
-
842
- ### Phase 1: Basic Auto-Augmentation
843
-
844
- **Scope:** Tag extraction, basic linking
845
-
846
- **Deliverables:**
847
- - Entity extraction from content
848
- - Auto-insert wiki links (high similarity)
849
- - Extract explicit tags from content
850
- - Generate embeddings
851
- - Commit changes to git
852
-
853
- **Result:** Agent writes document → tags + links added automatically
854
-
855
- ### Phase 2: Latent Space Tagging
856
-
857
- **Scope:** Clustering, topic modeling
858
-
859
- **Deliverables:**
860
- - Cluster documents in embedding space
861
- - Generate cluster labels automatically
862
- - Assign latent tags with confidence scores
863
- - Topic modeling (LDA or BERTopic)
864
- - Re-clustering when corpus grows
865
-
866
- **Result:** Documents auto-organize without predefined taxonomy
867
-
868
- ### Phase 3: Semantic Relationships
869
-
870
- **Scope:** Similarity search, suggested reading
871
-
872
- **Deliverables:**
873
- - K-nearest neighbor search
874
- - Suggested reading lists
875
- - Semantic neighbor detection
876
- - Temporal sequence tracking
877
- - Backlink maintenance
878
-
879
- **Result:** Agents get context quickly when starting new sessions
880
-
881
- ### Phase 4: Learning from Corrections
882
-
883
- **Scope:** Adapt to human edits
884
-
885
- **Deliverables:**
886
- - Track human removals (tags, links)
887
- - Adjust confidence thresholds
888
- - Improve entity extraction
889
- - Learn project-specific patterns
890
- - Weekly digest of changes
891
-
892
- **Result:** System gets better over time based on human feedback
893
-
894
- ---
895
-
896
- ## Success Metrics
897
-
898
- ### Agent Productivity
899
-
900
- **Before auto-augmentation:**
901
- - Agent writes document: 15 minutes
902
- - Agent manually tags: 5 minutes
903
- - Agent manually links: 5 minutes
904
- - **Total: 25 minutes**
905
-
906
- **After auto-augmentation:**
907
- - Agent writes document: 15 minutes
908
- - Auto-augmentation runs: 2 seconds
909
- - **Total: 15 minutes (40% faster)**
910
-
911
- ### Human Audit Overhead
912
-
913
- **Target: O(log N) effort**
914
-
915
- - 10 documents: 5 minutes weekly review
916
- - 100 documents: 15 minutes weekly review
917
- - 1000 documents: 30 minutes weekly review
918
-
919
- **Actual corrections: <5% of auto-augmentations need human fixes**
920
-
921
- ### Knowledge Discovery
922
-
923
- **Measure: Time to find relevant context**
924
-
925
- **Before (manual search):**
926
- - Query: "authentication patterns"
927
- - Scan titles/filenames: 10 minutes
928
- - Read 5-10 docs to find relevant ones: 30 minutes
929
- - **Total: 40 minutes**
930
-
931
- **After (semantic search):**
932
- - Query: "authentication patterns"
933
- - Amalfa returns 5 most relevant: 5 seconds
934
- - Agent reads top 3: 10 minutes
935
- - **Total: 10 minutes (75% faster)**
936
-
937
- ---
938
-
939
- ## Advantages Over Manual Metadata
940
-
941
- ### 1. Scales Automatically
942
-
943
- **Manual:**
944
- - N documents = N × human_tagging_time
945
- - Bottleneck
946
-
947
- **Auto-augmentation:**
948
- - N documents = N × 2 seconds + O(log N) human audit
949
- - No bottleneck
950
-
951
- ### 2. Consistent
952
-
953
- **Manual:**
954
- - Different agents tag differently
955
- - Tags drift over time
956
- - Inconsistent naming
957
-
958
- **Auto-augmentation:**
959
- - Same algorithm tags all docs
960
- - Embeddings are comparable
961
- - Re-clustering normalizes tags
962
-
963
- ### 3. Adaptive
964
-
965
- **Manual:**
966
- - Static taxonomy
967
- - Hard to reorganize
968
- - Tags become obsolete
969
-
970
- **Auto-augmentation:**
971
- - Latent space clusters adapt
972
- - Re-clustering as corpus grows
973
- - Old docs get new tags automatically
974
-
975
- ### 4. Low Friction
976
-
977
- **Manual:**
978
- - Agent must remember to tag
979
- - Separate step after writing
980
- - Cognitive overhead
981
-
982
- **Auto-augmentation:**
983
- - Happens automatically on save
984
- - No extra effort
985
- - Agent just writes
986
-
987
- ---
988
-
989
- ## Conclusion
990
-
991
- **Agent-first metadata with git-based auditing enables:**
992
-
993
- 1. **Agent autonomy** - No approval bottleneck
994
- 2. **Human oversight** - Audit via git diff, O(log N) effort
995
- 3. **Automatic organization** - Latent space clusters emerge
996
- 4. **Fast context retrieval** - Agents get up to speed quickly
997
- 5. **System learning** - Adapts to human corrections
998
-
999
- **The paradigm shift:** Metadata is **optimistically generated, occasionally corrected** rather than **pessimistically approved upfront**.
1000
-
1001
- **Result:** Knowledge base that scales with minimal human intervention while maintaining quality through periodic audits.
1002
-
1003
- ---
1004
-
1005
- ## References
1006
-
1007
- - **Latent Space Clustering:** HDBSCAN, K-means, Gaussian Mixture Models
1008
- - **Topic Modeling:** LDA (Latent Dirichlet Allocation), BERTopic
1009
- - **Entity Extraction:** spaCy, BERT NER, GPT-4 prompting
1010
- - **Semantic Search:** Vector embeddings, FAISS, cosine similarity
1011
- - **Git Integration:** Pre-commit hooks, file watchers (watchdog)
1012
-
1013
- ---
1014
-
1015
- **Status:** Design document
1016
- **Next Steps:** Implement Phase 1 (basic auto-augmentation)
1017
- **Feedback:** Iterate based on PolyVis migration experience
1018
-
1019
- ---
1020
-
1021
- _This document describes the agent-first metadata system for Amalfa. The goal: let agents do what they're good at (pattern recognition, semantic analysis) while humans do what they're good at (judgment, correction, strategic direction)._