amalfa 1.0.1 → 1.0.2
- package/README.md +226 -263
- package/docs/AGENT-METADATA-PATTERNS.md +1021 -0
- package/docs/CONFIG_E2E_VALIDATION.md +147 -0
- package/docs/CONFIG_UNIFICATION.md +187 -0
- package/docs/CONFIG_VALIDATION.md +103 -0
- package/docs/LEGACY_DEPRECATION.md +174 -0
- package/docs/MCP_SETUP.md +317 -0
- package/docs/QUICK_START_MCP.md +168 -0
- package/docs/SESSION-2026-01-06-METADATA-PATTERNS.md +346 -0
- package/docs/SETUP.md +464 -0
- package/docs/SETUP_COMPLETE.md +464 -0
- package/docs/VISION-AGENT-LEARNING.md +1242 -0
- package/docs/_current-config-status.md +93 -0
- package/package.json +6 -3
- package/polyvis.settings.json.bak +38 -0
- package/src/cli.ts +103 -21
- package/src/config/defaults.ts +52 -12
- package/src/core/VectorEngine.ts +18 -9
- package/src/mcp/index.ts +62 -7
- package/src/resonance/DatabaseFactory.ts +3 -4
- package/src/resonance/db.ts +4 -4
- package/src/resonance/services/vector-daemon.ts +151 -0
- package/src/utils/DaemonManager.ts +147 -0
- package/src/utils/ZombieDefense.ts +5 -1
- package/:memory: +0 -0
- package/:memory:-shm +0 -0
- package/:memory:-wal +0 -0
- package/CHANGELOG.md.old +0 -43
- package/README.old.md +0 -112
- package/ROADMAP.md +0 -316
- package/TEST_PLAN.md +0 -561
- package/agents.config.json +0 -11
- package/drizzle/0000_minor_iron_fist.sql +0 -19
- package/drizzle/meta/0000_snapshot.json +0 -139
- package/drizzle/meta/_journal.json +0 -13
- package/example_usage.ts +0 -39
- package/experiment.sh +0 -35
- package/hello +0 -2
- package/index.html +0 -52
- package/knowledge/excalibur.md +0 -12
- package/plans/experience-graph-integration.md +0 -60
- package/prompts/gemini-king-mode-prompt.md +0 -46
- package/public/docs/MCP_TOOLS.md +0 -372
- package/schemas/README.md +0 -20
- package/schemas/cda.schema.json +0 -84
- package/schemas/conceptual-lexicon.schema.json +0 -75
- package/scratchpads/dummy-debrief-boxed.md +0 -39
- package/scratchpads/dummy-debrief.md +0 -27
- package/scratchpads/scratchpad-design.md +0 -50
- package/scratchpads/scratchpad-scrolling.md +0 -20
- package/scratchpads/scratchpad-toc-disappearance.md +0 -23
- package/scratchpads/scratchpad-toc.md +0 -28
- package/scratchpads/test_gardener.md +0 -7
- package/src/core/LLMClient.ts +0 -93
- package/src/core/TagEngine.ts +0 -56
- package/src/db/schema.ts +0 -46
- package/src/gardeners/AutoTagger.ts +0 -116
- package/src/pipeline/HarvesterPipeline.ts +0 -101
- package/src/pipeline/Ingestor.ts +0 -555
- package/src/resonance/cli/ingest.ts +0 -41
- package/src/resonance/cli/migrate.ts +0 -54
- package/src/resonance/config.ts +0 -40
- package/src/resonance/daemon.ts +0 -236
- package/src/resonance/pipeline/extract.ts +0 -89
- package/src/resonance/pipeline/transform_docs.ts +0 -60
- package/src/resonance/services/tokenizer.ts +0 -159
- package/src/resonance/transform/cda.ts +0 -393
- package/src/utils/EnvironmentVerifier.ts +0 -67
- package/substack/substack-playbook-1.md +0 -95
- package/substack/substack-playbook-2.md +0 -78
- package/tasks/ui-investigation.md +0 -26
- package/test-db +0 -0
- package/test-db-shm +0 -0
- package/test-db-wal +0 -0
- package/tests/canary/verify_pinch_check.ts +0 -44
- package/tests/fixtures/ingest_test.md +0 -12
- package/tests/fixtures/ingest_test_boxed.md +0 -13
- package/tests/fixtures/safety_test.md +0 -45
- package/tests/fixtures/safety_test_boxed.md +0 -49
- package/tests/fixtures/tagged_output.md +0 -49
- package/tests/fixtures/tagged_test.md +0 -49
- package/tests/mcp-server-settings.json +0 -8
- package/verify-embedder.ts +0 -54
@@ -0,0 +1,1021 @@
# Agent-First Metadata: Auto-Augmentation Patterns

**Date:** 2026-01-06
**Status:** Design Document
**Context:** Agent autonomy with human audit model

---

## Executive Summary

Traditional knowledge management systems require humans to manually tag, link, and organize documents. This creates bottlenecks and doesn't scale. This document describes an **agent-first metadata system** where agents automatically augment documents with tags, links, and semantic relationships, while humans audit changes through git review rather than approving every decision.

**Key Innovation:** Latent space tagging - tags and clusters emerge from vector embeddings rather than predefined taxonomies.

---

## Core Principles

### 1. Agent Autonomy by Default

**The inversion:**

**❌ Old model:**
```
Agent: "Should I add this tag?"
Human: *reviews, approves*
Agent: "Should I link this document?"
Human: *reviews, approves*
```
→ Human bottleneck on every decision

**✅ New model:**
```
Agent: *adds tags, links, metadata automatically*
Human: *reviews git diff occasionally*
Human: *removes/modifies anything wrong*
Daemon: *picks up changes, re-indexes*
```
→ Human audits, doesn't approve

### 2. Git as Source of Truth

**All agent augmentations are git commits:**
- Atomic (each augmentation is one commit)
- Auditable (see exactly what the agent changed)
- Reversible (revert any change)
- Non-destructive (originals preserved in history)

**Pattern:**
```bash
# Agent auto-augments
[Amalfa: auto-tagged debrief-auth-refactor]

# Human reviews the agent's latest commit
$ git diff HEAD~1

# If wrong, just edit
$ vim debrief-auth-refactor.md
$ git commit -am "Remove incorrect tag"

# Daemon syncs automatically
[Amalfa: detected manual edit, re-indexed]
```

### 3. Optimistic Metadata

**Metadata is optimistic by default, corrected on audit:**

- Agent adds metadata immediately (no approval needed)
- Human reviews periodically (weekly, not per-decision)
- Human corrects errors (removes/modifies as needed)
- System adapts to corrections (learns from edits)

**Scaling characteristic:** The design goal is for human effort to grow sublinearly with corpus size N (roughly O(log N)) rather than O(N), because review is periodic rather than per-decision.

### 4. Latent Space Organization

**Tags and clusters emerge from vector space, not predefined taxonomy:**

- No predefined tag list to maintain
- Clusters form naturally from content similarity
- Labels generated from cluster analysis
- Adapts as knowledge base grows

**Example:**
```yaml
tags:
  latent:
    - auth-state-patterns (0.91)  # cluster assignment
    - ui-reactivity (0.78)        # secondary cluster
```

---

## The Auto-Augmentation Workflow

### On Document Save

**Triggered automatically (pre-commit hook or daemon watch):**

```bash
[file saved: debrief-auth-refactor.md]

$ amalfa auto-augment debrief-auth-refactor.md

Processing...
[1/6] Entity extraction (0.3s)
[2/6] Auto-linking (0.5s)
[3/6] Clustering (0.8s)
[4/6] Similarity search (0.2s)
[5/6] Tag extraction (0.4s)
[6/6] Metadata generation (0.1s)

✓ Done. Modified front matter (15 lines changed).
```

**Agent augments document, commits, done.**

### What Gets Added

**Front matter is augmented with:**

```yaml
---
type: debrief
brief_id: brief-auth-refactor
date: 2026-01-05
author: claude-3.5

# Auto-generated by Amalfa (edit freely, changes will sync)
tags:
  explicit: [alpine.js, state-management, localStorage]
  latent:
    - auth-state-patterns (0.91)
    - ui-reactivity (0.78)
  topics:
    - authentication (0.45)
    - state-patterns (0.38)

links:
  - playbook-alpine-patterns (uses-pattern, 0.89)
  - playbook-state-persistence (extends, 0.81)

suggested_reading:
  - debrief-session-management (0.87)
  - playbook-reactive-patterns (0.82)

semantic_neighbors:
  - debrief-session-management (0.87)
  - debrief-login-flow (0.83)

vector_id: vec_a7f3d2e
embedding_model: all-MiniLM-L6-v2
last_indexed: 2026-01-05T14:45:00Z
---
```
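
The scored entries in the front matter above are plain strings of the form `name (score)` or `name (relation, score)`. The following is a minimal sketch, assuming that string convention, of how a consumer might parse one entry. `ScoredEntry` and `parse_scored_entry` are hypothetical names used for illustration, not part of Amalfa.

```python
import re
from typing import NamedTuple, Optional


class ScoredEntry(NamedTuple):
    name: str
    relation: Optional[str]  # e.g. "uses-pattern"; None when absent
    score: float


# Matches "name (0.91)" and "name (relation, 0.89)".
_ENTRY = re.compile(
    r"^\s*(?P<name>[^\s(]+)\s*"
    r"\(\s*(?:(?P<relation>[\w-]+)\s*,\s*)?(?P<score>\d*\.?\d+)\s*\)\s*$"
)


def parse_scored_entry(text: str) -> ScoredEntry:
    """Parse one front-matter list item into its name, relation, and score."""
    match = _ENTRY.match(text)
    if match is None:
        raise ValueError(f"not a scored entry: {text!r}")
    return ScoredEntry(match["name"], match["relation"], float(match["score"]))
```

Keeping the score inside the string keeps the front matter compact and easy to hand-edit, at the cost of needing a small parser like this wherever the scores are consumed.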

**Body is augmented with wiki links:**

```markdown
## What Worked
- [[playbook-alpine-patterns|Alpine's x-data pattern]] eliminated state tracking
- Token refresh using [[debrief-token-refresh|$watch]] is reactive

## Lessons Learned
- Alpine for UI state, [[playbook-state-persistence|localStorage]] for persistence
```

---

## The Pattern Library

### Pattern 1: Latent Space Tagging

**Purpose:** Organize documents without predefined taxonomy

**How it works:**

```python
# Cluster all documents in embedding space
docs = load_all_documents()
embeddings = [doc.vector for doc in docs]

# HDBSCAN or K-means clustering
clusters = cluster_embeddings(embeddings, min_cluster_size=3)

# Generate label for each cluster
for cluster in clusters:
    # Analyze cluster content to extract theme
    label = generate_cluster_label(cluster.documents)
    # Example: "auth-state-patterns"

    # Assumption: normalize by the largest centroid distance in this cluster
    max_distance = max(d.distance_to_centroid(cluster) for d in cluster.documents) or 1.0

    # Tag all docs in cluster
    for doc in cluster.documents:
        distance = doc.distance_to_centroid(cluster)
        confidence = 1 - (distance / max_distance)
        doc.add_tag(f"latent:{label}", confidence)
```
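
The confidence formula in the sketch above, `1 - distance / max_distance`, can be made concrete with the standard library alone. The following is a minimal sketch, assuming Euclidean distance to the cluster mean and normalization by the largest distance inside the cluster. Both choices are assumptions, and `centroid` and `membership_confidence` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def membership_confidence(cluster: Dict[str, Vector]) -> Dict[str, float]:
    """Map each doc id to 1 - distance / max_distance within one cluster."""
    center = centroid(list(cluster.values()))
    distances = {doc_id: math.dist(vec, center) for doc_id, vec in cluster.items()}
    max_distance = max(distances.values())
    if max_distance == 0:
        # All members coincide with the centroid: treat membership as certain.
        return {doc_id: 1.0 for doc_id in distances}
    return {doc_id: 1 - d / max_distance for doc_id, d in distances.items()}
```

With per-cluster normalization the farthest member always scores 0, so a fixed corpus-wide `max_distance` may be preferable if scores should be comparable across clusters.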

**Result in front matter:**

```yaml
tags:
  latent:
    - auth-state-patterns (0.91)  # strong cluster membership
    - ui-reactivity (0.78)        # secondary cluster
    - browser-persistence (0.65)  # weak membership
```

**Enables queries:**

```bash
# Find all docs in cluster
$ amalfa search --cluster auth-state-patterns

# Find docs near cluster boundary (potentially mis-clustered)
$ amalfa search --cluster auth-state-patterns --confidence "<0.7"
```

**Advantages:**

- No taxonomy to maintain
- Clusters adapt as corpus grows
- Multi-cluster membership (docs can be in multiple clusters)
- Confidence scores expose uncertainty

**Re-clustering:**

```bash
# Periodically re-cluster (weekly?)
$ amalfa recluster --min-docs 15

Analyzing 143 documents...
✓ Found 12 clusters (was 10)
✓ Created new cluster: api-integration-patterns (8 docs)
✓ Merged clusters: css-layout + browser-layout → browser-layout (15 docs)
✓ Relabeled cluster: state-mgmt → state-patterns (better fit)
✓ Updated 143 document front matters

Commit? (Y/n)
```

**Human reviews cluster changes via git diff.**

### Pattern 2: Entity Extraction & Auto-Linking

**Purpose:** Link documents when concepts are mentioned

**How it works:**

```python
# Agent writes: "Alpine's x-data pattern works well"
text = "Alpine's x-data pattern works well"

# Entity extraction
entities = extract_entities(text)
# → ["Alpine", "x-data pattern"]

# Search graph for matches
for entity in entities:
    matches = search_graph(entity, threshold=0.85)
    # → playbook-alpine-patterns (0.89)

# Rewrite content with wiki link (once, outside the loop, to avoid nesting links)
text = text.replace(
    "Alpine's x-data pattern",
    "[[playbook-alpine-patterns|Alpine's x-data pattern]]"
)
```

**Result:**

```markdown
Before:
- Alpine's x-data pattern works well

After:
- [[playbook-alpine-patterns|Alpine's x-data pattern]] works well
```

**Advanced: Prevent over-linking**

```markdown
# First mention: linked
[[playbook-alpine-patterns|Alpine's x-data pattern]] works well.

# Subsequent mentions: not linked (avoid clutter)
Later we used the x-data pattern again.

# Unless in different section
## Another Section
We also applied [[playbook-alpine-patterns|Alpine's x-data]] here.
```

**Human can prevent linking:**

```markdown
<!-- amalfa-nolink: alpine -->
We use Alpine here but don't link to playbook.
```
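
The over-linking rules above (link the first mention in each section, and honor an `amalfa-nolink` comment) can be sketched as follows. This is a minimal illustration under stated assumptions: a section starts at any Markdown heading line, the opt-out comment applies to the section it appears in, and `split_sections` and `link_first_mentions` are hypothetical names rather than Amalfa functions.

```python
import re
from typing import List

HEADING = re.compile(r"^#{1,6}\s")


def split_sections(markdown: str) -> List[List[str]]:
    """Group lines into sections; a heading line starts a new section."""
    sections: List[List[str]] = [[]]
    for line in markdown.split("\n"):
        if HEADING.match(line) and sections[-1]:
            sections.append([])
        sections[-1].append(line)
    return sections


def link_first_mentions(markdown: str, phrase: str, target: str, nolink_key: str) -> str:
    """Wiki-link the first plain mention of `phrase` in each section."""
    directive = f"<!-- amalfa-nolink: {nolink_key} -->"
    link = f"[[{target}|{phrase}]]"
    out: List[str] = []
    for section in split_sections(markdown):
        body = "\n".join(section)
        # Skip sections the human opted out of, or that already link the target.
        if directive not in body and f"[[{target}" not in body:
            body = body.replace(phrase, link, 1)
        out.append(body)
    return "\n".join(out)
```

The sketch does not skip fenced code blocks, so a `#` comment inside a code fence would be treated as a heading; a fuller version would track fence state while splitting.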

### Pattern 3: Topic Modeling

**Purpose:** Extract high-level themes from content

**How it works:**

```python
# Run LDA or BERTopic on corpus
topics = extract_topics(all_documents, n_topics=10)

# Topics emerge from content:
# Topic 1: [authentication, token, session, login] (coherence: 0.82)
# Topic 2: [layout, css, flexbox, grid, safari] (coherence: 0.79)
# Topic 3: [performance, debounce, throttle] (coherence: 0.75)

# Compute topic distribution for each doc
doc.topic_distribution = compute_distribution(doc, topics)
# → {topic_1: 0.45, topic_2: 0.38, topic_3: 0.17}
```

**Result in front matter:**

```yaml
topics:
  - authentication (0.45)  # strong topic
  - css-layout (0.38)      # secondary topic
  - performance (0.17)     # weak topic
```

**Enables queries:**

```bash
# Find all docs about authentication
$ amalfa search --topic authentication --min-score 0.4

# Find docs bridging two topics
$ amalfa search --topics authentication,performance --min-both 0.3
```
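
The bridging query above can be read as "every requested topic scores at least `--min-both`". The following is a minimal sketch of that filter, assuming topic scores are already available as a per-document mapping; `docs_bridging_topics` and the sample scores in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List

TopicScores = Dict[str, float]


def docs_bridging_topics(
    corpus: Dict[str, TopicScores], topics: Iterable[str], min_both: float
) -> List[str]:
    """Return doc ids whose score is >= min_both for every requested topic."""
    wanted = list(topics)
    return sorted(
        doc_id
        for doc_id, scores in corpus.items()
        if all(scores.get(topic, 0.0) >= min_both for topic in wanted)
    )
```

For a document with `authentication: 0.45` and `performance: 0.17`, a `min_both` of 0.3 excludes it because the second topic falls below the bar. A missing topic is treated as a score of 0.0.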

**Topic evolution tracking:**

```bash
# Show how topics shift over time
$ amalfa topic-timeline authentication

2025-11: 5 docs (focus: OAuth patterns)
2025-12: 8 docs (focus: Session management)
2026-01: 12 docs (focus: Token refresh, Alpine integration)
```

### Pattern 4: Similarity-Based Suggested Reading

**Purpose:** Help agents get context quickly

**How it works:**

```python
# For new document, find k-nearest neighbors
doc = load_document("debrief-auth-refactor.md")
neighbors = find_knn(doc.vector, k=5, exclude=doc.id)

# Rank by similarity
results = [
    ("debrief-session-management", 0.87),
    ("playbook-state-patterns", 0.82),
    ("brief-auth-system", 0.79),
    ("debrief-token-refresh", 0.76),
    ("playbook-alpine-patterns", 0.74)
]
```
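
The `find_knn` call above is left abstract. The following is a minimal brute-force sketch, assuming cosine similarity over an in-memory mapping of doc id to embedding. The function bodies are illustrative only; the document does not say how Amalfa's vector engine searches.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_knn(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 5,
    exclude: str = "",
) -> List[Tuple[str, float]]:
    """Return the k most similar doc ids with their scores, best first."""
    scored = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in index.items()
        if doc_id != exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is fine for a few hundred documents; larger corpora would typically use an approximate nearest-neighbor index instead.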

**Result in front matter:**

```yaml
suggested_reading:
  - debrief-session-management (similar-patterns, 0.87)
  - playbook-state-patterns (related-approach, 0.82)
  - brief-auth-system (architectural-context, 0.79)
```

**Agent reads these first when resuming work:**

```bash
# New session starts
Agent: "I'm working on brief-payment-refactor"

Amalfa: "Here's the context:"
  1. debrief-auth-refactor (0.87) - similar state patterns
  2. playbook-session-management (0.82) - persistence approach
  3. brief-auth-system (0.79) - architectural context

Agent: *reads top 3 docs*
Agent: *starts work with full context*
```

### Pattern 5: Temporal Sequences

**Purpose:** Track work evolution over time

**How it works:**

```python
# Detect brief → debrief → playbook → follow-up chains
sequence = [
    ("brief-auth-system", "2025-11-01"),
    ("debrief-auth-system", "2025-11-05"),
    ("playbook-alpine-patterns", "2025-11-05"),  # updated
    ("brief-auth-refactor", "2025-12-01"),       # references playbook
    ("debrief-auth-refactor", "2025-12-05"),
    ("playbook-alpine-patterns", "2025-12-05"),  # updated again
]

# Tag docs with sequence metadata
```

**Result in front matter:**

```yaml
sequence:
  chain: auth-system-evolution
  predecessor: debrief-auth-system
  successor: brief-auth-refactor
  position: 3/7
```
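
The "tag docs with sequence metadata" step can be sketched as a pass over the date-ordered chain. This is a minimal illustration, assuming each chain entry is a `(doc_id, ISO date)` pair; `sequence_metadata` is a hypothetical helper, not an Amalfa function.

```python
from typing import Dict, List, Optional, Tuple


def sequence_metadata(
    chain: str, entries: List[Tuple[str, str]]
) -> List[Dict[str, Optional[str]]]:
    """Build one `sequence` mapping per chain position.

    ISO dates sort correctly as strings, and Python's stable sort keeps
    same-day entries in the order they were given.
    """
    ordered = sorted(entries, key=lambda entry: entry[1])
    total = len(ordered)
    result = []
    for i, (doc_id, _date) in enumerate(ordered):
        result.append({
            "doc": doc_id,
            "chain": chain,
            "predecessor": ordered[i - 1][0] if i > 0 else None,
            "successor": ordered[i + 1][0] if i < total - 1 else None,
            "position": f"{i + 1}/{total}",
        })
    return result
```

Because a playbook can be updated more than once, the same document may occupy several positions in a chain, so a real front matter field would need to decide which position to record.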

**Enables queries:**

```bash
# Show full evolution of auth work
$ amalfa sequence auth-system-evolution

Auth System Evolution (7 docs):
1. brief-auth-system (2025-11-01)
2. debrief-auth-system (2025-11-05)
3. playbook-alpine-patterns (updated 2025-11-05)
4. brief-auth-refactor (2025-12-01) ← references playbook
5. debrief-auth-refactor (2025-12-05)
6. playbook-alpine-patterns (updated 2025-12-05)
7. brief-auth-tests (2026-01-03)
```

### Pattern 6: Semantic Backlinks

**Purpose:** Maintain bidirectional links automatically

**How it works:**

```python
# When doc A links to doc B (prefix match also catches [[target|label]] links)
if "[[playbook-alpine-patterns" in doc_a.content:
    # Update doc B's front matter automatically
    doc_b.add_backlink(doc_a.id, similarity=0.89)
```
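
Generalizing the check above from one hard-coded target to the whole corpus amounts to inverting the wiki-link graph. The following is a minimal sketch, assuming documents are available as a mapping of doc id to Markdown content and that links use the `[[target]]` or `[[target|label]]` syntax shown in this document. `build_backlinks` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches [[target]] and [[target|label]]; captures only the target.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def build_backlinks(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each link target to the sorted ids of documents that link to it."""
    backlinks = defaultdict(set)
    for doc_id, content in docs.items():
        for target in WIKI_LINK.findall(content):
            target = target.strip()
            if target != doc_id:  # ignore self-links
                backlinks[target].add(doc_id)
    return {target: sorted(sources) for target, sources in backlinks.items()}
```

Recomputing the index from content on each run keeps backlinks a derived value, so a human who deletes a link never has to clean up the matching backlink by hand.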

**Result in `playbook-alpine-patterns.md`:**

```yaml
backlinks:
  - debrief-auth-refactor (2026-01-05, 0.89)
  - debrief-session-management (2025-12-03, 0.82)
  - brief-payment-refactor (2025-11-15, 0.78)
  - debrief-login-flow (2025-11-08, 0.76)
```

**Human never maintains backlinks manually.**

**Broken link detection:**

```bash
# If playbook-alpine-patterns is deleted/renamed
$ amalfa check-links

⚠️ Found 4 broken links:
  - debrief-auth-refactor.md → [[playbook-alpine-patterns]] (deleted)
  - debrief-session-management.md → [[playbook-alpine-patterns]] (deleted)

Suggested replacements:
  - [[playbook-state-patterns]] (0.91 similar)
  - [[playbook-reactive-patterns]] (0.85 similar)

Apply suggestions? (Y/n/individual)
```
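
The detection half of `check-links` reduces to comparing every link target against the set of known document ids. The following is a minimal sketch under that assumption; `find_broken_links` is a hypothetical name, and ranking replacement candidates is left out because it depends on the similarity index.

```python
import re
from typing import Dict, List, Set, Tuple

# Matches [[target]] and [[target|label]]; captures only the target.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def find_broken_links(docs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return sorted (source_id, missing_target) pairs, one per distinct pair."""
    known: Set[str] = set(docs)
    broken = {
        (doc_id, target.strip())
        for doc_id, content in docs.items()
        for target in WIKI_LINK.findall(content)
        if target.strip() not in known
    }
    return sorted(broken)
```

Since each pair is reported once however often the link repeats in a document, the count reflects affected documents rather than link occurrences.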

### Pattern 7: Confidence-Based Tag Weighting

**Purpose:** Express uncertainty in metadata

**All tags have confidence scores:**

```yaml
tags:
  explicit:
    - alpine.js (1.0)              # human-added, certain
    - architecture-decision (1.0)

  extracted:
    - state-management (0.87)      # mentioned 5 times, high confidence
    - localStorage (0.78)          # mentioned 3 times
    - token-refresh (0.45)         # mentioned once, low confidence

  latent:
    - auth-state-patterns (0.91)   # strong cluster membership
    - ui-reactivity (0.78)         # secondary cluster
    - browser-persistence (0.65)   # weak membership
```

**Query by confidence:**

```bash
# Only high-confidence tags
$ amalfa search --tags state-management --min-confidence 0.8

# Find potentially mis-tagged docs
$ amalfa search --tags --max-confidence 0.6
```

**Learning from human edits:**

```python
# Human removes tag "token-refresh" (was confidence 0.45)
# System learns: tags below 0.5 confidence are often incorrect
# Adjust threshold for future auto-tagging
new_threshold = learn_from_removal(removed_tag, confidence=0.45)
# → new_threshold = 0.55
```
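
The document does not specify the update rule behind `learn_from_removal`. One rule that reproduces the 0.45 → 0.55 step is "raise the threshold to the rejected confidence plus a fixed margin". The following is a minimal sketch of that rule; the signature, the 0.10 margin, and the 0.95 ceiling are all assumptions chosen for illustration.

```python
def learn_from_removal(
    current_threshold: float,
    removed_confidence: float,
    margin: float = 0.10,
    ceiling: float = 0.95,
) -> float:
    """Raise the auto-tagging threshold just above a rejected tag's confidence.

    The threshold never decreases here, and it is capped at `ceiling` so that
    removing one high-confidence tag cannot switch auto-tagging off entirely.
    """
    proposed = removed_confidence + margin
    return round(min(ceiling, max(current_threshold, proposed)), 2)
```

A single removal is weak evidence, so a production rule would more likely aggregate many corrections before moving the threshold; this sketch only shows the direction of the adjustment.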

---

## The Daemon's Role

### Continuous File Watching

```bash
$ amalfa daemon start

Daemon started. Watching:
  - /path/to/repo/docs/**/*.md
  - /path/to/repo/briefs/**/*.md
  - /path/to/repo/debriefs/**/*.md
  - /path/to/repo/playbooks/**/*.md

[2026-01-06 14:45:00] File changed: debriefs/2026-01-05-auth-refactor.md
  ✓ Re-generated embedding
  ✓ Updated cluster assignment (moved to cluster 1)
  ✓ Re-computed similarity neighbors
  ✓ Updated backlinks (3 docs reference this)
  ✓ Committed changes

[2026-01-06 14:46:30] File deleted: playbooks/old-pattern.md
  ✓ Removed from graph database
  ✓ Found 5 broken links
  ✓ Suggested replacements
  ✓ Updated front matter in referring docs
  ✓ Committed changes

[2026-01-06 14:48:15] Manual edit detected: debriefs/2026-01-05-auth-refactor.md
  (User removed tag: token-refresh)
  ✓ Re-indexed without removed tag
  ✓ Updated confidence threshold (0.45 → 0.55)
  ✓ No commit needed (human already committed)
```

### Git Integration

**Daemon creates atomic commits:**

```bash
$ git log --oneline --grep="Amalfa:"

a7f3d2e Amalfa: auto-tagged debrief-auth-refactor (added 4 tags, 3 links)
8b2e4f1 Amalfa: re-clustered corpus (15 new docs, 12 clusters)
c3d5a9f Amalfa: updated backlinks for playbook-alpine-patterns (2 new references)
d4e6b2c Amalfa: detected broken links, suggested replacements (5 links fixed)
```
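
Output like the log above is easy to post-process, for example to build a review digest. The following is a minimal sketch that parses `git log --oneline` text and keeps only subjects that start with the commit prefix. `AgentCommit` and `parse_agent_commits` are hypothetical names.

```python
from typing import List, NamedTuple


class AgentCommit(NamedTuple):
    sha: str
    summary: str


def parse_agent_commits(oneline_log: str, prefix: str = "Amalfa:") -> List[AgentCommit]:
    """Extract commits whose subject starts with `prefix` from --oneline text."""
    commits = []
    for line in oneline_log.splitlines():
        sha, _, subject = line.strip().partition(" ")
        if subject.startswith(prefix):
            commits.append(AgentCommit(sha, subject[len(prefix):].strip()))
    return commits
```

Note that `git log --grep` matches the pattern anywhere in the commit message, so a human commit that merely mentions "Amalfa:" in its body would also be listed; checking the subject prefix as above is stricter.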

**Each commit is a unit of work that can be reviewed/reverted.**

---

## Human Audit Workflow

### Weekly Review

```bash
# See what agent did this week
$ git log --since="1 week ago" --grep="Amalfa:" --oneline

a7f3d2e Amalfa: auto-tagged debrief-auth-refactor
8b2e4f1 Amalfa: re-clustered corpus
c3d5a9f Amalfa: updated backlinks
d4e6b2c Amalfa: fixed broken links

# Review specific commit
$ git show a7f3d2e

diff --git a/debriefs/2026-01-05-auth-refactor.md b/debriefs/2026-01-05-auth-refactor.md
+tags:
+  explicit: [alpine.js, state-management, localStorage, token-refresh]
+  latent:
+    - auth-state-patterns (0.91)

# Looks good, move on
```

### Correction Workflow

**If something is wrong:**

```bash
# Edit the document directly
$ vim debriefs/2026-01-05-auth-refactor.md

# Remove incorrect tag
tags:
  explicit: [alpine.js, state-management, localStorage]  # removed token-refresh

# Commit change
$ git commit -am "Remove incorrect tag from auth-refactor debrief"

# Daemon picks up change automatically
[Amalfa daemon: detected manual edit]
✓ Re-indexed without token-refresh tag
✓ Updated graph query results
✓ Learned: tags mentioning "refresh" once = low confidence
```

**No special commands needed.** Just edit markdown, commit, done.

### Batch Corrections

**If agent made systematic error:**

```bash
# Find all docs with questionable tag
$ amalfa search --tags token-refresh --confidence "<0.6"

Found 7 documents with low-confidence "token-refresh" tag:
  - debrief-auth-refactor.md (0.45)
  - debrief-session-management.md (0.52)
  - debrief-login-flow.md (0.48)
  ...

# Remove tag from all
$ amalfa untag --tag token-refresh --max-confidence 0.6

Removing "token-refresh" from 7 documents...
✓ Updated 7 files
✓ Re-indexed 7 documents
✓ Committed changes

$ git show HEAD
Amalfa: batch removed low-confidence tag "token-refresh" (7 docs)
```
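
The batch removal above can be sketched as a filter over per-document tag scores. This is a minimal illustration, assuming tags are held in memory as a mapping of tag name to confidence and that the cutoff is strict (matching the `"<0.6"` search above). `untag_low_confidence` is a hypothetical name.

```python
from typing import Dict, List

TagScores = Dict[str, float]


def untag_low_confidence(
    corpus: Dict[str, TagScores], tag: str, max_confidence: float
) -> List[str]:
    """Remove `tag` in place wherever its confidence is below `max_confidence`.

    Returns the sorted ids of the documents that were modified.
    """
    changed = []
    for doc_id, tags in corpus.items():
        confidence = tags.get(tag)
        if confidence is not None and confidence < max_confidence:
            del tags[tag]
            changed.append(doc_id)
    return sorted(changed)
```

Returning the list of modified documents makes it straightforward to re-index only those files and to name the count in a single batch commit message.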

---

## Example: Full Lifecycle

### 1. Agent Writes Document

```markdown
# Debrief: Auth Refactor

## What Worked
- Alpine's x-data pattern eliminated manual state tracking
- Token refresh using $watch is reactive

## What Failed
- Storing token in Alpine state broke on reload

## Lessons Learned
- Alpine for UI state, localStorage for persistence
```

**No metadata yet. Just content.**

### 2. Agent Saves → Auto-Augmentation

```bash
[pre-commit hook or daemon watch triggers]

$ amalfa auto-augment debrief-auth-refactor.md

Processing...
✓ Entity extraction (found: Alpine, x-data, localStorage)
✓ Auto-linking (3 links inserted)
✓ Clustering (assigned to: auth-state-patterns, 0.91)
✓ Similarity search (found 5 neighbors)
✓ Tag extraction (6 tags)
✓ Metadata generation (embedding, vector_id)

Commit? (Y/n) y

[Amalfa: auto-tagged debrief-auth-refactor]
```

### 3. Result Document

```markdown
---
type: debrief
brief_id: brief-auth-refactor
date: 2026-01-05
author: claude-3.5

# Auto-generated by Amalfa (edit freely)
tags:
  explicit: [alpine.js, state-management, localStorage, token-refresh]
  latent:
    - auth-state-patterns (0.91)
    - ui-reactivity (0.78)
  topics:
    - authentication (0.45)
    - state-patterns (0.38)

links:
  - playbook-alpine-patterns (uses-pattern, 0.89)
  - playbook-state-persistence (extends, 0.81)
  - debrief-token-refresh (similar-problem, 0.76)

suggested_reading:
  - debrief-session-management (0.87)
  - playbook-reactive-patterns (0.82)

semantic_neighbors:
  - debrief-session-management (0.87)
  - debrief-login-flow (0.83)

vector_id: vec_a7f3d2e
last_indexed: 2026-01-05T14:45:00Z
---

# Debrief: Auth Refactor

## What Worked
- [[playbook-alpine-patterns|Alpine's x-data pattern]] eliminated manual state tracking
- Token refresh using [[debrief-token-refresh|$watch]] is reactive

## What Failed
- Storing token in Alpine state broke on reload

## Lessons Learned
- Alpine for UI state, [[playbook-state-persistence|localStorage]] for persistence
```
|
|
745
|
+
|
|
746
|
+
### 4. Human Reviews (Days Later)
|
|
747
|
+
|
|
748
|
+
```bash
|
|
749
|
+
$ git log --since="1 week ago" --oneline --grep="Amalfa:"
|
|
750
|
+
|
|
751
|
+
a7f3d2e Amalfa: auto-tagged debrief-auth-refactor
|
|
752
|
+
|
|
753
|
+
$ git show a7f3d2e
|
|
754
|
+
|
|
755
|
+
# Human notices: "token-refresh" tag is wrong (not the focus of this doc)
|
|
756
|
+
```
|
|
757
|
+
|
|
758
|
+
### 5. Human Corrects
|
|
759
|
+
|
|
760
|
+
```bash
|
|
761
|
+
$ vim debriefs/2026-01-05-auth-refactor.md
|
|
762
|
+
|
|
763
|
+
# Remove incorrect tag
|
|
764
|
+
tags:
|
|
765
|
+
explicit: [alpine.js, state-management, localStorage] # removed token-refresh
|
|
766
|
+
|
|
767
|
+
$ git commit -m "Remove irrelevant token-refresh tag"
|
|
768
|
+
```
|
|
769
|
+
|
|
770
|
+
### 6. Daemon Syncs
|
|
771
|
+
|
|
772
|
+
```bash
|
|
773
|
+
[Amalfa daemon watches git commits]
|
|
774
|
+
|
|
775
|
+
Detected manual edit: debrief-auth-refactor.md
|
|
776
|
+
✓ Re-indexed without token-refresh tag
|
|
777
|
+
✓ Updated search results
|
|
778
|
+
✓ Learned: increase confidence threshold (0.45 → 0.55)
|
|
779
|
+
|
|
780
|
+
No commit needed (human already committed).
|
|
781
|
+
```

**System adapted to correction.**

---

## Configuration

### `.amalfa.yaml`

```yaml
# Auto-augmentation settings
auto_augment:
  enabled: true
  on_save: true           # Run on every save (vs manual trigger)
  commit_changes: true    # Auto-commit augmentations

# What to augment
features:
  entity_linking: true
  clustering: true
  topic_modeling: true
  similarity_search: true
  tag_extraction: true
  backlinks: true

# Thresholds
thresholds:
  tag_confidence: 0.55      # Learned from human corrections
  link_similarity: 0.85     # Minimum similarity for auto-linking
  cluster_confidence: 0.70  # Minimum for cluster assignment

# Re-clustering
reclustering:
  auto: true
  trigger: 15             # Re-cluster after N new docs
  min_cluster_size: 3

# Daemon settings
daemon:
  watch_paths:
    - docs/**/*.md
    - briefs/**/*.md
    - debriefs/**/*.md
    - playbooks/**/*.md

  git_integration:
    auto_commit: true
    commit_prefix: "Amalfa:"

# Human audit
audit:
  weekly_digest: true       # Email summary of agent changes
  confidence_alerts: true   # Alert on low-confidence tags
  broken_link_fix: auto     # Auto-fix broken links
```
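
One way the `thresholds` values might gate what gets written back into a document is sketched below. Only the three threshold names and values come from the config above; the `Candidate` type, the `accept` function, the kind-to-threshold mapping, and the scores are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass

# Threshold values copied from the `thresholds` section of `.amalfa.yaml`.
THRESHOLDS = {
    "tag_confidence": 0.55,
    "link_similarity": 0.85,
    "cluster_confidence": 0.70,
}

# Hypothetical mapping from a candidate's kind to the threshold that gates it.
THRESHOLD_FOR_KIND = {
    "tag": "tag_confidence",
    "link": "link_similarity",
    "cluster": "cluster_confidence",
}


@dataclass(frozen=True)
class Candidate:
    kind: str     # "tag", "link", or "cluster"
    value: str    # tag name, link target, or cluster label
    score: float  # model confidence or similarity in [0, 1]


def accept(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates whose score meets the threshold for their kind."""
    return [
        c for c in candidates
        if c.score >= THRESHOLDS[THRESHOLD_FOR_KIND[c.kind]]
    ]


# Illustrative scores, not real output.
candidates = [
    Candidate("tag", "state-management", 0.81),
    Candidate("tag", "token-refresh", 0.45),
    Candidate("link", "playbook-state-persistence", 0.91),
    Candidate("cluster", "frontend-state", 0.62),
]
accepted = accept(candidates)
for c in accepted:
    print(c.kind, c.value, c.score)
```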

---

## Implementation Phases

### Phase 1: Basic Auto-Augmentation

**Scope:** Tag extraction, basic linking

**Deliverables:**
- Entity extraction from content
- Auto-insert wiki links (high similarity)
- Extract explicit tags from content
- Generate embeddings
- Commit changes to git

**Result:** Agent writes document → tags + links added automatically
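
The wiki-link deliverable could be approached as follows. This is a sketch under assumptions: `insert_wiki_link` is a hypothetical helper, the `[[target|text]]` syntax is taken from the links used elsewhere in this document, and the decision that a term is similar enough to link is assumed to have been made upstream. The regular expression skips text that already sits directly inside a link, so running it twice leaves the document unchanged; it does not handle terms that begin or end with punctuation.

```python
import re


def insert_wiki_link(text: str, term: str, target: str) -> str:
    """Wrap the first unlinked occurrence of `term` as [[target|term]]."""
    # Lookarounds skip a term that is already the target or label of a link.
    pattern = re.compile(
        r"(?<!\|)(?<!\[\[)\b" + re.escape(term) + r"\b(?!\]\])"
    )
    return pattern.sub(lambda m: f"[[{target}|{m.group(0)}]]", text, count=1)


line = "- Alpine for UI state, localStorage for persistence"
linked = insert_wiki_link(line, "localStorage", "playbook-state-persistence")
print(linked)
```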

### Phase 2: Latent Space Tagging

**Scope:** Clustering, topic modeling

**Deliverables:**
- Cluster documents in embedding space
- Generate cluster labels automatically
- Assign latent tags with confidence scores
- Topic modeling (LDA or BERTopic)
- Re-clustering when corpus grows

**Result:** Documents auto-organize without predefined taxonomy
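
Assigning a latent tag with a confidence score might look like the sketch below. It assumes cluster centroids have already been produced by some clustering step, and it uses a softmax over negative Euclidean distances as the confidence, which is one common soft-assignment choice rather than anything this design prescribes. The function name, the 2-D toy vectors, and the cluster labels are hypothetical; real embeddings would have hundreds of dimensions.

```python
import math


def assign_latent_tag(embedding, centroids, min_confidence=0.70):
    """Return (label, confidence); label is None below `min_confidence`.

    Confidence is a softmax over negative Euclidean distances to each
    centroid, so it is near 1.0 when one cluster is clearly closest.
    """
    distances = {
        label: math.dist(embedding, centre)
        for label, centre in centroids.items()
    }
    weights = {label: math.exp(-d) for label, d in distances.items()}
    total = sum(weights.values())
    nearest = min(distances, key=distances.get)
    confidence = weights[nearest] / total
    return (nearest if confidence >= min_confidence else None), confidence


# Toy centroids for illustration only.
centroids = {"frontend-state": (0.0, 0.0), "auth": (4.0, 0.0)}

clear_label, clear_conf = assign_latent_tag((0.5, 0.0), centroids)
vague_label, vague_conf = assign_latent_tag((2.0, 0.0), centroids)
print(clear_label, round(clear_conf, 2))
print(vague_label, round(vague_conf, 2))
```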

### Phase 3: Semantic Relationships

**Scope:** Similarity search, suggested reading

**Deliverables:**
- K-nearest neighbor search
- Suggested reading lists
- Semantic neighbor detection
- Temporal sequence tracking
- Backlink maintenance

**Result:** Agents get context quickly when starting new sessions
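
K-nearest neighbor search over embeddings can be sketched with a brute-force cosine-similarity scan. The helper names, the 3-D toy vectors, and the `brief-css-cleanup` document id are assumptions for illustration; a larger corpus would more likely use an approximate index such as the FAISS option listed in the references.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbors(query, corpus, k=5):
    """Return the k (doc_id, similarity) pairs most similar to `query`."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings for illustration only.
corpus = {
    "debrief-auth-refactor": (0.9, 0.1, 0.0),
    "playbook-state-persistence": (0.7, 0.7, 0.0),
    "brief-css-cleanup": (0.0, 0.1, 0.9),
}
top = nearest_neighbors((1.0, 0.2, 0.0), corpus, k=2)
for doc_id, similarity in top:
    print(doc_id, round(similarity, 3))
```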

### Phase 4: Learning from Corrections

**Scope:** Adapt to human edits

**Deliverables:**
- Track human removals (tags, links)
- Adjust confidence thresholds
- Improve entity extraction
- Learn project-specific patterns
- Weekly digest of changes

**Result:** System gets better over time based on human feedback
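
Tracking human removals reduces to comparing the tag list the agent committed with the one the human left behind. The sketch below assumes both lists have already been read from the two git revisions; `diff_tags` and the `rejections` counter are hypothetical names, and the tags are the ones from the auth-refactor walkthrough. A per-tag rejection count is one possible input for both the threshold adjustment and the weekly digest.

```python
from collections import Counter


def diff_tags(before, after):
    """Return (removed, added) tags between two versions of a document."""
    before_set, after_set = set(before), set(after)
    return sorted(before_set - after_set), sorted(after_set - before_set)


# How often humans removed each auto-assigned tag across all documents.
rejections = Counter()

agent_tags = ["alpine.js", "state-management", "localStorage", "token-refresh"]
human_tags = ["alpine.js", "state-management", "localStorage"]

removed, added = diff_tags(agent_tags, human_tags)
rejections.update(removed)
print("removed:", removed, "added:", added)
print("rejections:", dict(rejections))
```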

---

## Success Metrics

### Agent Productivity

**Before auto-augmentation:**
- Agent writes document: 15 minutes
- Agent manually tags: 5 minutes
- Agent manually links: 5 minutes
- **Total: 25 minutes**

**After auto-augmentation:**
- Agent writes document: 15 minutes
- Auto-augmentation runs: 2 seconds
- **Total: 15 minutes (40% faster)**

### Human Audit Overhead

**Target: O(log N) effort**

- 10 documents: 5 minutes weekly review
- 100 documents: 15 minutes weekly review
- 1000 documents: 30 minutes weekly review

**Expected corrections: <5% of auto-augmentations need human fixes**

### Knowledge Discovery

**Measure: Time to find relevant context**

**Before (manual search):**
- Query: "authentication patterns"
- Scan titles/filenames: 10 minutes
- Read 5-10 docs to find relevant ones: 30 minutes
- **Total: 40 minutes**

**After (semantic search):**
- Query: "authentication patterns"
- Amalfa returns 5 most relevant: 5 seconds
- Agent reads top 3: 10 minutes
- **Total: 10 minutes (75% faster)**
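
The figures in this section can be checked with a few lines of arithmetic. The 40% and 75% values are reductions in elapsed time, and both still hold after rounding once the seconds-level steps are included. The last calculation restates the weekly audit targets as minutes per document, which shows the per-document review cost falling as the corpus grows. All inputs come from the lists above; only the helper name `time_saved_pct` is introduced here.

```python
def time_saved_pct(before_min: float, after_min: float) -> float:
    """Percentage reduction in elapsed time."""
    return (before_min - after_min) / before_min * 100


# Agent productivity: 15 + 5 + 5 minutes before; 15 minutes + 2 seconds after.
writing = time_saved_pct(25, 15 + 2 / 60)

# Knowledge discovery: 10 + 30 minutes before; 5 seconds + 10 minutes after.
search = time_saved_pct(40, 10 + 5 / 60)

# Weekly audit targets, restated as review minutes per document.
audit_minutes = {10: 5, 100: 15, 1000: 30}
per_doc = {n: minutes / n for n, minutes in audit_minutes.items()}

print(round(writing), round(search))
print(per_doc)
```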

---

## Advantages Over Manual Metadata

### 1. Scales Automatically

**Manual:**
- N documents = N × human_tagging_time
- Bottleneck

**Auto-augmentation:**
- N documents = N × 2 seconds + O(log N) human audit
- No bottleneck

### 2. Consistent

**Manual:**
- Different agents tag differently
- Tags drift over time
- Inconsistent naming

**Auto-augmentation:**
- Same algorithm tags all docs
- Embeddings are comparable
- Re-clustering normalizes tags

### 3. Adaptive

**Manual:**
- Static taxonomy
- Hard to reorganize
- Tags become obsolete

**Auto-augmentation:**
- Latent space clusters adapt
- Re-clustering as corpus grows
- Old docs get new tags automatically

### 4. Low Friction

**Manual:**
- Agent must remember to tag
- Separate step after writing
- Cognitive overhead

**Auto-augmentation:**
- Happens automatically on save
- No extra effort
- Agent just writes

---

## Conclusion

**Agent-first metadata with git-based auditing enables:**

1. **Agent autonomy** - No approval bottleneck
2. **Human oversight** - Audit via git diff, O(log N) effort
3. **Automatic organization** - Latent space clusters emerge
4. **Fast context retrieval** - Agents get up to speed quickly
5. **System learning** - Adapts to human corrections

**The paradigm shift:** Metadata is **optimistically generated, occasionally corrected** rather than **pessimistically approved upfront**.

**Result:** Knowledge base that scales with minimal human intervention while maintaining quality through periodic audits.

---

## References

- **Latent Space Clustering:** HDBSCAN, K-means, Gaussian Mixture Models
- **Topic Modeling:** LDA (Latent Dirichlet Allocation), BERTopic
- **Entity Extraction:** spaCy, BERT NER, GPT-4 prompting
- **Semantic Search:** Vector embeddings, FAISS, cosine similarity
- **Git Integration:** Pre-commit hooks, file watchers (watchdog)

---

**Status:** Design document
**Next Steps:** Implement Phase 1 (basic auto-augmentation)
**Feedback:** Iterate based on PolyVis migration experience

---

_This document describes the agent-first metadata system for Amalfa. The goal: let agents do what they're good at (pattern recognition, semantic analysis) while humans do what they're good at (judgment, correction, strategic direction)._