npm - amalfa - Versions diffs - 1.1.0 → 1.3.0 - Mend

amalfa 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

package/CHANGELOG.md +39 -0
package/README.md +59 -2
package/package.json +1 -1
package/src/cli/commands/server.ts +59 -3
package/src/config/defaults.ts +23 -0
package/src/core/VectorEngine.ts +11 -14
package/src/daemon/sonar-inference.ts +7 -4
package/src/daemon/sonar-logic.ts +15 -18
package/src/mcp/index.ts +126 -32
package/src/pipeline/AmalfaIngestor.ts +0 -1
package/src/resonance/DATABASE-PROCEDURES.md +347 -0
package/src/resonance/README.md +15 -8
package/src/resonance/db.ts +15 -57
package/src/utils/ContentHydrator.ts +38 -0
package/src/utils/Scratchpad.ts +427 -0
package/src/utils/sonar-client.ts +1 -1
package/src/resonance/schema.ts +0 -190

package/CHANGELOG.md CHANGED

@@ -5,6 +5,45 @@ All notable changes to AMALFA will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [1.3.0] - 2026-01-13
+### Changed
+- **Database Schema**: Migrated to Drizzle ORM for schema management (internal implementation detail)
+- **Content Storage**: Database now stores only metadata and embeddings (hollow nodes). Content read from filesystem via `GraphGardener.getContent()`
+- **Vector Search**: Fixed embedding model consistency - now uses `BGESmallENV15` throughout for improved recall accuracy
+- **Sonar Integration**: Added proper content hydration before reranking to resolve empty placeholder issue
+### Added
+- **Content Hydrator**: `src/utils/ContentHydrator.ts` for explicit filesystem content loading
+- **Database Procedures**: `src/resonance/DATABASE-PROCEDURES.md` documenting canonical database operations
+- **Sonar Diagnostics**: Test suite and assessment tools for reranking service quality
+### Fixed
+- **Vector Recall**: Resolved embedding model mismatch causing poor search results
+- **Sonar Content**: Fixed hollow node issue where Sonar received empty content
+### Removed
+- **Custom Migration System**: Replaced with Drizzle ORM (232 lines deleted from `src/resonance/schema.ts`)
+### Migration
+**The database is a disposable runtime artifact.** If experiencing issues after upgrade:
+```bash
+rm -rf .amalfa/
+bun run scripts/cli/ingest.ts
+```
+Your documents are the single source of truth. Database can be regenerated anytime.
+## [1.2.0] - 2026-01-13
+### Added
+- **Scratchpad Protocol (Phase 7)**: Intercepts large MCP tool outputs (>4KB) and caches them to `.amalfa/cache/scratchpad/`, returning a reference with preview instead of full content. Reduces context window usage for verbose responses.
+  - New `scratchpad_read` and `scratchpad_list` MCP tools for retrieving cached content.
+  - Content-addressable storage with SHA256 deduplication.
+  - Configurable threshold, max age (24h), and cache size limit (50MB).
 ## [1.1.0] - 2026-01-13
 ### Added
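
The Scratchpad entry above describes three behaviours: a size threshold, content-addressable storage keyed by SHA256, and a reference with a preview returned in place of the full output. A minimal sketch of how those could fit together is given here. It is not the code in `src/utils/Scratchpad.ts`; the names `maybeCache` and `ScratchpadRef`, the `.txt` file extension, and the 200-character preview length are assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical shape of the reference returned in place of a large output.
interface ScratchpadRef {
	id: string; // SHA256 hex digest of the content
	sizeBytes: number;
	preview: string;
}

// The changelog states a 4KB threshold; the constant name is an assumption.
const THRESHOLD_BYTES = 4 * 1024;

// Returns the text unchanged when it is at or below the threshold. Otherwise
// writes it to disk under its SHA256 digest and returns a reference.
function maybeCache(
	output: string,
	cacheDir: string,
	thresholdBytes = THRESHOLD_BYTES,
): string | ScratchpadRef {
	const sizeBytes = Buffer.byteLength(output, "utf8");
	if (sizeBytes <= thresholdBytes) return output;

	const id = createHash("sha256").update(output, "utf8").digest("hex");
	const filePath = join(cacheDir, `${id}.txt`);

	// Content-addressable: identical content maps to the same file name, so
	// a repeated payload is not written a second time (deduplication).
	if (!existsSync(filePath)) {
		mkdirSync(cacheDir, { recursive: true });
		writeFileSync(filePath, output, "utf8");
	}
	return { id, sizeBytes, preview: output.slice(0, 200) };
}
```

Keying the file name on the digest means deduplication needs no separate index: two identical tool outputs resolve to the same path.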

package/README.md CHANGED

@@ -2,7 +2,9 @@
 **A Memory Layer For Agents**
-MCP server that gives AI agents semantic access to project knowledge graphs.
+Local-first knowledge graph with semantic search for AI agents.
+**Core Design**: Your documents are the source of truth. The database is a disposable runtime artifact.
 ---
@@ -24,6 +26,8 @@ Amalfa is a **Model Context Protocol (MCP) server** that provides AI agents with
 Built with **Bun + SQLite + FastEmbed**.
+**Core distinguisher**: Database is a **disposable runtime artifact**. Documents are the source of truth.
 ---
 ## The Problem
@@ -36,7 +40,60 @@ Built with **Bun + SQLite + FastEmbed**.
 ---
-## Core Concepts
+## Core Architecture: Disposable Database
+**The Foundation**: AMALFA treats your filesystem as the single source of truth and the database as an ephemeral cache.
+### The Philosophy
+**Documents = Truth, Database = Cache**
+```
+Markdown Files (filesystem)
+    ↓
+  [Ingestion Pipeline]
+    ↓
+SQLite Database (.amalfa/)
+    ↓
+  [Vector Search]
+    ↓
+MCP Server (AI agents)
+```
+**Key Insight**: The database can be deleted and regenerated at any time without data loss.
+- **Source of Truth**: Your markdown documents (immutable filesystem)
+- **Runtime Artifact**: SQLite database with embeddings and metadata
+- **Regeneration**: `rm -rf .amalfa/ && bun run scripts/cli/ingest.ts`
+### Why This Matters
+**Benefits**:
+- ✅ **No Migration Hell**: Upgrading? Just re-ingest. No migration scripts.
+- ✅ **Deterministic Rebuilds**: Same documents → same database state
+- ✅ **Version Freedom**: Switch between AMALFA versions without fear
+- ✅ **Corruption Immunity**: Database corrupt? Delete and rebuild in seconds
+- ✅ **Model Flexibility**: Change embedding models by re-ingesting
+**Distinguisher**: Unlike traditional systems where the database *is* the truth, AMALFA inverts this. Your prose is permanent, the index is disposable.
+### When to Re-Ingest
+Just delete `.amalfa/` and re-run ingestion:
+```bash
+rm -rf .amalfa/
+bun run scripts/cli/ingest.ts
+```
+**Common scenarios**:
+- After upgrading AMALFA versions
+- When experiencing search issues
+- When changing embedding models
+- After adding/modifying many documents
+- Anytime you want a clean slate
+**Speed**: 308 nodes in <1 second. Re-ingestion is fast enough to be casual.
 ### Brief-Debrief-Playbook Pattern

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
 	"name": "amalfa",
-	"version": "1.1.0",
+	"version": "1.3.0",
 	"description": "Local-first knowledge graph engine for AI agents. Transforms markdown into searchable memory with MCP protocol.",
 	"license": "MIT",
 	"homepage": "https://github.com/pjsvis/amalfa#readme",

package/src/cli/commands/server.ts CHANGED

@@ -15,8 +15,6 @@ export async function cmdServe(_args: string[]) {
 	console.error(`📊 Database: ${dbPath}`);
 	console.error("");
-	// Run MCP server (it handles stdio transport)
-	// Note: We need to resolve from project root, not relative to this new file location
 	const serverPath = join(process.cwd(), "src/mcp/index.ts");
 	const proc = spawn("bun", ["run", serverPath, "serve"], {
 		stdio: "inherit",
@@ -29,6 +27,21 @@ export async function cmdServe(_args: string[]) {
 }
 export async function cmdServers(args: string[]) {
+	const action = args[1];
+	// If action is provided and isn't a flag (like --dot), treat it as a lifecycle command
+	if (
+		action &&
+		!action.startsWith("-") &&
+		["start", "stop", "restart", "status"].includes(action)
+	) {
+		if (action === "status") {
+			// Just fall through to normal status display
+		} else {
+			await manageAllServers(action as "start" | "stop" | "restart");
+			return;
+		}
+	}
 	const showDot = args.includes("--dot");
 	const SERVICES = [
@@ -178,10 +191,53 @@ export async function cmdServers(args: string[]) {
 	console.log("─".repeat(95));
 	console.log(
-		"\n💡 Commands: amalfa serve | amalfa vector start | amalfa daemon start\n",
+		"\n💡 Commands: amalfa servers [start|stop|restart] | amalfa vector start | amalfa daemon start\n",
 	);
 }
+// Background services to manage via 'amalfa servers start/restart'
+const BACKGROUND_SERVICES = [
+	{
+		name: "Vector Daemon",
+		cmd: "amalfa",
+		args: ["vector", "start"],
+	},
+	{
+		name: "File Watcher",
+		cmd: "amalfa",
+		args: ["daemon", "start"],
+	},
+	{
+		name: "Sonar Agent",
+		cmd: "amalfa",
+		args: ["sonar", "start"],
+	},
+];
+async function manageAllServers(action: "start" | "stop" | "restart") {
+	if (action === "stop" || action === "restart") {
+		await cmdStopAll([]);
+	}
+	if (action === "start" || action === "restart") {
+		console.log("🚀 Starting background services...\n");
+		for (const svc of BACKGROUND_SERVICES) {
+			console.log(`▶️  Starting ${svc.name}...`);
+			const child = spawn(svc.cmd, svc.args, {
+				detached: true,
+				stdio: "ignore", // Daemons manage their own logs
+				cwd: process.cwd(),
+			});
+			child.unref();
+			// Brief pause to allow pid file creation / logging
+			await new Promise((resolve) => setTimeout(resolve, 500));
+		}
+		console.log("\n✅ All background services triggered.");
+		console.log("Run 'amalfa servers' to check status.");
+	}
+}
 export async function cmdStopAll(_args: string[]) {
 	console.log("🛑 Stopping ALL Amalfa Services...\n");
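
The new branch at the top of `cmdServers` decides whether `args[1]` is a lifecycle command, a flag such as `--dot`, or `status` (which falls through to the table display). That decision can be read as a small pure function, sketched here. The helper name `parseLifecycleAction` is hypothetical and does not appear in the package; the sketch assumes, as the diff does, that the action sits at index 1.

```typescript
type LifecycleAction = "start" | "stop" | "restart";

const LIFECYCLE_ACTIONS: readonly string[] = ["start", "stop", "restart"];

// Returns the lifecycle action to run, or null when the caller should fall
// through to the status display (no action, a flag, "status", or unknown).
function parseLifecycleAction(args: string[]): LifecycleAction | null {
	const action = args[1];
	if (!action || action.startsWith("-")) return null;
	return LIFECYCLE_ACTIONS.includes(action)
		? (action as LifecycleAction)
		: null;
}
```

Under this reading, `amalfa servers restart` stops everything and then starts the three background services, while `amalfa servers --dot` and `amalfa servers status` both reach the existing status code path.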

package/src/config/defaults.ts CHANGED

@@ -18,6 +18,12 @@ export const AMALFA_DIRS = {
 	get agent() {
 		return join(this.base, "agent");
 	},
+	get cache() {
+		return join(this.base, "cache");
+	},
+	get scratchpad() {
+		return join(this.base, "cache", "scratchpad");
+	},
 	get tasks() {
 		return {
 			pending: join(this.base, "agent", "tasks", "pending"),
@@ -33,6 +39,8 @@ export function initAmalfaDirs(): void {
 		AMALFA_DIRS.base,
 		AMALFA_DIRS.logs,
 		AMALFA_DIRS.runtime,
+		AMALFA_DIRS.cache,
+		AMALFA_DIRS.scratchpad,
 		AMALFA_DIRS.tasks.pending,
 		AMALFA_DIRS.tasks.processing,
 		AMALFA_DIRS.tasks.completed,
@@ -82,6 +90,15 @@ export interface AmalfaConfig {
 	phi3?: SonarConfig;
 	/** Ember automated enrichment configuration */
 	ember: EmberConfig;
+	/** Scratchpad cache configuration */
+	scratchpad?: ScratchpadConfig;
+}
+export interface ScratchpadConfig {
+	enabled: boolean;
+	thresholdBytes: number;
+	maxAgeMs: number;
+	maxCacheSizeBytes: number;
 }
 export interface SonarConfig {
@@ -153,6 +170,12 @@ export const DEFAULT_CONFIG: AmalfaConfig = {
 		autoSquash: false,
 		backupDir: ".amalfa/backups/ember",
 	},
+	scratchpad: {
+		enabled: true,
+		thresholdBytes: 4 * 1024,
+		maxAgeMs: 24 * 60 * 60 * 1000,
+		maxCacheSizeBytes: 50 * 1024 * 1024,
+	},
 	watch: {
 		enabled: true,
 		debounce: 1000,
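
The new `scratchpad` defaults work out to 4,096 bytes for the threshold, 86,400,000 ms (24 hours) for the maximum age, and 52,428,800 bytes for the cache size limit. The eviction policy that consumes these values is not visible in this part of the diff, so the sketch below assumes one plausible policy: drop expired entries first, then drop the oldest remaining entries until the cache fits. `CacheEntry` and `selectEvictions` are hypothetical names.

```typescript
interface ScratchpadConfig {
	enabled: boolean;
	thresholdBytes: number;
	maxAgeMs: number;
	maxCacheSizeBytes: number;
}

// Hypothetical record describing one cached file.
interface CacheEntry {
	id: string;
	sizeBytes: number;
	createdAtMs: number;
}

const DEFAULTS: ScratchpadConfig = {
	enabled: true,
	thresholdBytes: 4 * 1024, // 4,096 bytes
	maxAgeMs: 24 * 60 * 60 * 1000, // 86,400,000 ms
	maxCacheSizeBytes: 50 * 1024 * 1024, // 52,428,800 bytes
};

// Returns the ids to delete: expired entries first, then the oldest
// remaining entries until the total size is within the limit.
function selectEvictions(
	entries: CacheEntry[],
	nowMs: number,
	config: ScratchpadConfig = DEFAULTS,
): string[] {
	const evicted: string[] = [];
	const fresh: CacheEntry[] = [];
	for (const entry of entries) {
		if (nowMs - entry.createdAtMs > config.maxAgeMs) evicted.push(entry.id);
		else fresh.push(entry);
	}

	fresh.sort((a, b) => a.createdAtMs - b.createdAtMs); // oldest first
	let total = fresh.reduce((sum, entry) => sum + entry.sizeBytes, 0);
	for (const entry of fresh) {
		if (total <= config.maxCacheSizeBytes) break;
		evicted.push(entry.id);
		total -= entry.sizeBytes;
	}
	return evicted;
}
```

Keeping the selection pure (no filesystem access) makes the policy easy to test separately from the code that deletes files.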

package/src/core/VectorEngine.ts CHANGED

@@ -102,8 +102,9 @@ export class VectorEngine {
 		}
 		// Lazy load the model
+		// MUST match Embedder model (BGESmallENV15) for compatibility
 		this.modelPromise = FlagEmbedding.init({
-			model: EmbeddingModel.AllMiniLML6V2,
+			model: EmbeddingModel.BGESmallENV15,
 		});
 	}
@@ -188,37 +189,33 @@ export class VectorEngine {
 		// 4. Sort & Limit
 		const topK = scored.sort((a, b) => b.score - a.score).slice(0, limit);
-		// 5. Hydrate Content (for top K only)
-		// Note: Hollow Nodes have content=NULL, use meta.source to read from filesystem if needed
+		// 5. Hydrate Metadata (for top K only)
 		const results: SearchResult[] = [];
-		const contentStmt = this.db.prepare(
-			"SELECT title, content, meta, date FROM nodes WHERE id = ?",
+		const metaStmt = this.db.prepare(
+			"SELECT title, meta, date FROM nodes WHERE id = ?",
 		);
 		for (const item of topK) {
-			const row = contentStmt.get(item.id) as {
+			const row = metaStmt.get(item.id) as {
 				title: string;
-				content: string | null;
 				meta: string | null;
 				date: string | null;
 			};
 			if (row) {
-				// For hollow nodes, extract a preview from title or meta
-				let content = row.content || "";
-				if (!content && row.meta) {
+				let contentPlaceholder = "";
+				if (row.meta) {
 					try {
 						const meta = JSON.parse(row.meta);
-						// Provide source path as content placeholder for hollow nodes
-						content = `[Hollow Node: ${meta.source || "no source"}]`;
+						contentPlaceholder = `[Hollow Node: ${meta.source || "no source"}]`;
 					} catch {
-						content = "[Hollow Node: parse error]";
+						contentPlaceholder = "[Hollow Node: parse error]";
 					}
 				}
 				results.push({
 					id: item.id,
 					score: item.score,
 					title: row.title,
-					content: content,
+					content: contentPlaceholder,
 					date: row.date || undefined,
 				});
 			}
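
After this change, search results carry only a `[Hollow Node: <source>]` placeholder, and callers that need the text must load it from the filesystem. The package does this through `GraphGardener.getContent()` and `src/utils/ContentHydrator.ts`, neither of which is shown in this part of the diff. The sketch below only illustrates the idea of resolving content from the `meta.source` path; `hydrateFromMeta`, `FileReader`, and `readFromDisk` are hypothetical names.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Injectable reader so the lookup can be exercised without touching disk.
type FileReader = (path: string) => string | null;

const readFromDisk: FileReader = (path) =>
	existsSync(path) ? readFileSync(path, "utf8") : null;

// Resolves a hollow node's content from the `source` path stored in its
// JSON `meta` column. Returns null when meta is missing, unparsable, has
// no string `source`, or the file cannot be read.
function hydrateFromMeta(
	metaJson: string | null,
	read: FileReader = readFromDisk,
): string | null {
	if (!metaJson) return null;
	let source: unknown;
	try {
		const meta: unknown = JSON.parse(metaJson);
		source =
			typeof meta === "object" && meta !== null
				? (meta as Record<string, unknown>).source
				: undefined;
	} catch {
		return null;
	}
	return typeof source === "string" ? read(source) : null;
}
```

Returning `null` rather than a placeholder string lets the caller decide how to present a node whose file has moved or been deleted.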

package/src/daemon/sonar-inference.ts CHANGED

@@ -99,14 +99,17 @@ export async function callOllama(
 		const result = await response.json();
 		if (provider === "openrouter") {
-			// OpenAI format
+			const openAIResult = result as { choices: Array<{ message: Message }> };
+			if (!openAIResult.choices?.[0]?.message) {
+				throw new Error("Invalid OpenRouter response format");
+			}
 			return {
-				message: (result as any).choices[0].message,
+				message: openAIResult.choices[0].message,
 			};
 		}
-		// Ollama format
+		const ollamaResult = result as { message: Message };
 		return {
-			message: (result as any).message,
+			message: ollamaResult.message,
 		};
 	} catch (error) {
 		const errorMsg = error instanceof Error ? error.message : String(error);
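
This hunk replaces `as any` with typed casts and adds a runtime check on the OpenRouter branch only; the Ollama branch still trusts the cast. A sketch of a stricter variant, which validates both shapes with a type guard, is shown here. The `Message` fields (`role`, `content`) are an assumption, since the real `Message` type is not visible in this diff, and `isMessage` and `extractMessage` are hypothetical helpers.

```typescript
// Assumed message shape; the package's own Message type is not shown here.
interface Message {
	role: string;
	content: string;
}

type Provider = "openrouter" | "ollama";

// Type guard: narrows an unknown value to Message after checking its fields.
function isMessage(value: unknown): value is Message {
	if (typeof value !== "object" || value === null) return false;
	const v = value as Record<string, unknown>;
	return typeof v.role === "string" && typeof v.content === "string";
}

// Picks the message out of either response format and validates it.
function extractMessage(result: unknown, provider: Provider): Message {
	if (typeof result !== "object" || result === null) {
		throw new Error(`Invalid ${provider} response format`);
	}
	const body = result as Record<string, unknown>;

	let candidate: unknown;
	if (provider === "openrouter") {
		// OpenAI-style format: { choices: [{ message }] }
		const choices = body.choices;
		const first: unknown = Array.isArray(choices) ? choices[0] : undefined;
		candidate =
			typeof first === "object" && first !== null
				? (first as Record<string, unknown>).message
				: undefined;
	} else {
		// Ollama format: { message }
		candidate = body.message;
	}

	if (!isMessage(candidate)) {
		throw new Error(`Invalid ${provider} response format`);
	}
	return candidate;
}
```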

package/src/daemon/sonar-logic.ts CHANGED

@@ -145,20 +145,18 @@ Current Date: ${new Date().toISOString().split("T")[0]}`,
 		let augmentContext = "\n\nRELEVANT CONTEXT FROM KNOWLEDGE BASE:\n";
 		if (results.length > 0) {
 			augmentContext += `\n--- [DIRECT SEARCH RESULTS] ---\n`;
-			results.forEach((r) => {
-				const node = context.db.getNode(r.id);
-				const content = node?.content ?? "";
+			for (const r of results) {
+				const content = (await context.gardener.getContent(r.id)) || "";
 				augmentContext += `[Document: ${r.id}] (Similarity: ${r.score.toFixed(2)})\n${content.slice(0, 800)}\n\n`;
-			});
+			}
 			if (relatedNodeIds.size > 0) {
 				augmentContext += `\n--- [RELATED NEIGHBORS (GRAPH DISCOVERY)] ---\n`;
-				Array.from(relatedNodeIds)
-					.slice(0, 5)
-					.forEach((nrId) => {
-						const node = context.db.getNode(nrId);
-						augmentContext += `[Related: ${nrId}] (Via: ${node?.label || nrId})\n${(node?.content ?? "").slice(0, 400)}\n\n`;
-					});
+				for (const nrId of Array.from(relatedNodeIds).slice(0, 5)) {
+					const node = context.db.getNode(nrId);
+					const content = (await context.gardener.getContent(nrId)) || "";
+					augmentContext += `[Related: ${nrId}] (Via: ${node?.label || nrId})\n${content.slice(0, 400)}\n\n`;
+				}
 			}
 		}
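
The switch from `forEach` to `for...of` in this hunk goes together with the move to an awaited `getContent()` call: `forEach` does not wait for async callbacks, so an awaited lookup inside it would finish only after the surrounding function had already used the string. The sketch below shows the difference with a stand-in lookup. `getContent` here is a hypothetical stub with an artificial delay, not the `GraphGardener` method.

```typescript
// Stand-in for an async content lookup such as a filesystem read.
async function getContent(id: string): Promise<string> {
	await new Promise((resolve) => setTimeout(resolve, 5));
	return `content of ${id}`;
}

// forEach ignores the promises returned by its callback, so this function
// returns before any lookup has completed.
async function buildWithForEach(ids: string[]): Promise<string> {
	let out = "";
	ids.forEach(async (id) => {
		out += await getContent(id);
	});
	return out;
}

// for...of awaits each lookup in order before moving on.
async function buildWithForOf(ids: string[]): Promise<string> {
	let out = "";
	for (const id of ids) {
		out += await getContent(id);
	}
	return out;
}
```

With `["a", "b"]` as input, the `forEach` version resolves to an empty string, while the `for...of` version resolves to the two contents concatenated in order.
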
@@ -531,18 +529,17 @@ Return JSON: { "action": "SEARCH"|"READ"|"EXPLORE"|"FINISH", "query": "...", "no
 			);
 			const content = actionResponse.message.content;
-			let decision: {
+			const parsed = safeJsonParse(content);
+			if (!parsed || typeof parsed !== "object") {
+				throw new Error("Could not parse JSON from response");
+			}
+			const decision = parsed as {
 				action: "SEARCH" | "READ" | "EXPLORE" | "FINISH";
 				query?: string;
 				nodeId?: string;
 				reasoning: string;
 				answer?: string;
-			} | null = null;
-			decision = safeJsonParse(content);
-			if (!decision) {
-				throw new Error("Could not parse JSON from response");
-			}
+			};
 			output += `> **Reasoning:** ${decision.reasoning}\n\n`;
 			if (decision.action === "FINISH") {
@@ -669,7 +666,7 @@ Return JSON: { "answered": true|false, "missing_info": "...", "final_answer": ".
 /**
  * Helper to safely parse JSON from LLM responses, handling markdown blocks
  */
-function safeJsonParse(content: string): any {
+function safeJsonParse(content: string): unknown {
 	try {
 		return JSON.parse(content);
 	} catch {