npm - @shadowforge0/aquifer-memory - Versions diffs - 0.2.0 - Mend

@shadowforge0/aquifer-memory 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/README.md +354 -0
package/consumers/cli.js +314 -0
package/consumers/mcp.js +135 -0
package/consumers/openclaw-plugin.js +235 -0
package/consumers/shared/config.js +143 -0
package/consumers/shared/factory.js +77 -0
package/consumers/shared/llm.js +119 -0
package/core/aquifer.js +634 -0
package/core/entity.js +360 -0
package/core/hybrid-rank.js +166 -0
package/core/storage.js +550 -0
package/index.js +6 -0
package/package.json +57 -0
package/pipeline/embed.js +230 -0
package/pipeline/extract-entities.js +73 -0
package/pipeline/summarize.js +245 -0
package/schema/001-base.sql +180 -0
package/schema/002-entities.sql +120 -0

package/README.md ADDED

@@ -0,0 +1,354 @@
+<div align="center">
+# 🌊 Aquifer
+**PG-native long-term memory for AI agents**
+*Turn-level embedding, hybrid RRF ranking, optional knowledge graph — all on PostgreSQL + pgvector.*
+[![npm version](https://img.shields.io/npm/v/aquifer-memory)](https://www.npmjs.com/package/aquifer-memory)
+[![PostgreSQL 15+](https://img.shields.io/badge/PostgreSQL-15%2B-336791)](https://www.postgresql.org/)
+[![pgvector](https://img.shields.io/badge/pgvector-0.7%2B-blue)](https://github.com/pgvector/pgvector)
+[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
+[English](README.md) | [繁體中文](README_TW.md) | [简体中文](README_CN.md)
+</div>
+---
+## Why Aquifer?
+Most AI memory systems bolt a vector DB on the side. Aquifer takes a different approach: **PostgreSQL is the memory**.
+Sessions, summaries, turn-level embeddings, entity graph — all live in one database, queried with one connection. No sync layer, no eventual consistency, no extra infrastructure.
+### What makes it different
+| | Aquifer | Typical vector-DB approach |
+|---|---|---|
+| **Storage** | PostgreSQL + pgvector | Separate vector DB + app DB |
+| **Granularity** | Turn-level embeddings (not just session summaries) | Session or document chunks |
+| **Ranking** | 3-way RRF: FTS + session embedding + turn embedding | Single vector similarity |
+| **Knowledge graph** | Built-in entity extraction & co-occurrence | Usually separate system |
+| **Multi-tenant** | `tenant_id` on every table, day-1 | Often an afterthought |
+| **Dependencies** | Just `pg` | Multiple SDKs |
+### Before and after
+**Without turn-level memory — search misses precise moments:**
+> Query: "What did we decide about the auth middleware?"
+> → Returns a 2000-word session summary that mentions auth somewhere
+**With Aquifer — search finds the exact turn:**
+> Query: "What did we decide about the auth middleware?"
+> → Returns the specific user turn: "Let's rip out the old auth middleware — legal flagged it for session token compliance"
+---
+## Quick Start
+### Prerequisites
+- Node.js >= 18
+- PostgreSQL 15+ with [pgvector](https://github.com/pgvector/pgvector) extension
+- An embedding API (OpenAI, Ollama, or any OpenAI-compatible endpoint)
+### Install
+```bash
+npm install @shadowforge0/aquifer-memory
+```
+### Initialize
+```javascript
+const { createAquifer } = require('@shadowforge0/aquifer-memory');
+const aquifer = createAquifer({
+  schema: 'memory',                    // PG schema name (default: 'aquifer')
+  pg: {
+    connectionString: 'postgresql://user:pass@localhost:5432/mydb',
+  },
+  embedder: {
+    baseURL: 'http://localhost:11434/v1',   // Ollama
+    model: 'bge-m3',
+    apiKey: 'ollama',
+  },
+  llm: {
+    baseURL: 'https://api.openai.com/v1',
+    model: 'gpt-4o-mini',
+    apiKey: process.env.OPENAI_API_KEY,
+  },
+});
+// Run migrations (safe to call multiple times)
+await aquifer.migrate();
+```
+### Ingest a session
+```javascript
+await aquifer.ingest({
+  sessionId: 'conv-001',
+  agentId: 'main',
+  messages: [
+    { role: 'user', content: 'Let me tell you about our new auth approach...' },
+    { role: 'assistant', content: 'Got it. So the plan is...' },
+  ],
+});
+// Stores session → generates summary → creates turn embeddings → extracts entities
+```
+### Recall
+```javascript
+const results = await aquifer.recall('auth middleware decision', {
+  agentId: 'main',
+  limit: 5,
+});
+// Returns ranked sessions with scores, using 3-way RRF fusion
+```
+---
+## Architecture
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    createAquifer (entry)                     │
+│         Config · Migration · Ingest · Recall · Enrich       │
+└────────┬──────────┬──────────┬──────────┬───────────────────┘
+         │          │          │          │
+    ┌────▼───┐ ┌────▼────┐ ┌──▼───┐ ┌───▼──────────┐
+    │storage │ │hybrid-  │ │entity│ │   pipeline/   │
+    │  .js   │ │rank.js  │ │ .js  │ │summarize.js   │
+    └────────┘ └─────────┘ └──────┘ │embed.js       │
+         │                     │    │extract-ent.js │
+    ┌────▼───────────┐    ┌───▼──┐  └───────────────┘
+    │  PostgreSQL     │    │ LLM  │
+    │  + pgvector     │    │ API  │
+    └────────────────┘    └──────┘
+    ┌─────────────────────────────┐
+    │         schema/             │
+    │  001-base.sql (sessions,    │
+    │    summaries, turns, FTS)   │
+    │  002-entities.sql (KG)      │
+    └─────────────────────────────┘
+```
+### File Reference
+| File | Purpose |
+|------|---------|
+| `index.js` | Entry point — exports `createAquifer`, `createEmbedder` |
+| `core/aquifer.js` | Main facade: `migrate()`, `ingest()`, `recall()`, `enrich()` |
+| `core/storage.js` | Session/summary/turn CRUD, FTS search, embedding search |
+| `core/entity.js` | Entity upsert, mention tracking, relation graph, normalization |
+| `core/hybrid-rank.js` | 3-way RRF fusion, time decay, entity boost scoring |
+| `pipeline/summarize.js` | LLM-powered session summarization with structured output |
+| `pipeline/embed.js` | Embedding client (any OpenAI-compatible API) |
+| `pipeline/extract-entities.js` | LLM-powered entity extraction (12 types) |
+| `schema/001-base.sql` | DDL: sessions, summaries, turn_embeddings, FTS indexes |
+| `schema/002-entities.sql` | DDL: entities, mentions, relations, entity_sessions |
+---
+## Core Features
+### 3-Way Hybrid Retrieval (RRF)
+```
+Query ──┬── FTS (tsvector)          ──┐
+        ├── Session embedding search ──├── RRF Fusion → Time Decay → Entity Boost → Results
+        └── Turn embedding search   ──┘
+```
+- **Full-text search** — PostgreSQL `tsvector` with language-aware ranking
+- **Session embedding** — cosine similarity on session summaries
+- **Turn embedding** — cosine similarity on individual user turns
+- **Reciprocal Rank Fusion** — merges all three ranked lists (K=60)
+- **Time decay** — sigmoid decay with configurable midpoint and steepness
+- **Entity boost** — sessions mentioning query-relevant entities get a score boost
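As a rough illustration of the pipeline described above, the sketch below fuses three ranked lists with Reciprocal Rank Fusion (K=60, as stated) and then applies a sigmoid time decay. It is not taken from `core/hybrid-rank.js`: the function names, the `steepness` default, the multiplicative way decay is applied, and the sample data are all assumptions, and entity boost is left out. The 45-day midpoint mirrors the `midpointDays` value shown in the README's `recall` example.

```javascript
'use strict';

// Reciprocal Rank Fusion: every list contributes 1 / (k + rank) per item,
// with rank starting at 1. k = 60 is the value the README states.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((sessionId, index) => {
      scores.set(sessionId, (scores.get(sessionId) || 0) + 1 / (k + index + 1));
    });
  }
  return scores;
}

// Assumed sigmoid decay: exactly 0.5 at midpointDays, close to 1 for fresh
// sessions and close to 0 for old ones. `steepness` is a hypothetical name.
function timeDecay(ageDays, midpointDays = 45, steepness = 0.1) {
  return 1 / (1 + Math.exp(steepness * (ageDays - midpointDays)));
}

// Assumption: the decay multiplies the fused score.
function rank(rankedLists, ageDaysById) {
  const fused = rrfFuse(rankedLists);
  return [...fused.entries()]
    .map(([sessionId, score]) => ({
      sessionId,
      score: score * timeDecay(ageDaysById[sessionId] ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}

// Made-up candidate lists, best match first.
const fts = ['s1', 's2', 's3'];
const sessionEmb = ['s2', 's1'];
const turnEmb = ['s2', 's4'];
console.log(rank([fts, sessionEmb, turnEmb], { s1: 3, s2: 10, s3: 200, s4: 1 }));
```

In this toy data `s2` ranks first because it appears in all three lists, while the 200-day-old `s3` drops to the bottom once decay is applied.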
+### Turn-Level Embeddings
+Not just session summaries — Aquifer embeds each meaningful user turn individually.
+- Filters noise: short messages, slash commands, confirmations ("ok", "got it")
+- Truncates at 2000 chars, skips turns under 5 chars
+- Stores turn text + embedding + position for precise retrieval
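The filtering rules above could look something like the following sketch. It is an illustration under assumptions rather than the logic in `core/aquifer.js` or `pipeline/embed.js`: `selectTurns` is a hypothetical helper, the confirmation list is invented beyond the two examples the README gives, and `position` is assumed to be the message index.

```javascript
'use strict';

const MIN_CHARS = 5;     // README: skips turns under 5 chars
const MAX_CHARS = 2000;  // README: truncates at 2000 chars
// Assumed noise list; the README only names "ok" and "got it" as examples.
const CONFIRMATIONS = new Set(['ok', 'okay', 'got it', 'thanks', 'yes', 'no']);

// Hypothetical helper: pick the user turns worth embedding.
function selectTurns(messages) {
  const turns = [];
  messages.forEach((msg, position) => {
    if (msg.role !== 'user' || typeof msg.content !== 'string') return;
    const text = msg.content.trim();
    if (text.length < MIN_CHARS) return;   // too short to be meaningful
    if (text.startsWith('/')) return;      // slash command
    // Strip trailing punctuation before checking the confirmation list.
    if (CONFIRMATIONS.has(text.toLowerCase().replace(/[.!]+$/, ''))) return;
    turns.push({ position, text: text.slice(0, MAX_CHARS) });
  });
  return turns;
}

const turns = selectTurns([
  { role: 'user', content: 'ok' },
  { role: 'user', content: '/reset' },
  { role: 'assistant', content: 'Sure, resetting.' },
  { role: 'user', content: 'Got it.' },
  { role: 'user', content: "Let's rip out the old auth middleware." },
]);
console.log(turns);
```

Only the last message survives here; the others are dropped as too short, a slash command, an assistant turn, or a bare confirmation.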
+### Knowledge Graph
+Built-in entity extraction and relationship tracking:
+- **12 entity types**: person, project, concept, tool, metric, org, place, event, doc, task, topic, other
+- **Entity normalization**: NFKC + homoglyph mapping + case folding
+- **Co-occurrence relations**: undirected edges with frequency tracking
+- **Entity-session mapping**: which entities appear in which sessions
+- **Entity boost in ranking**: sessions with relevant entities score higher
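The normalization and co-occurrence bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in `core/entity.js`: `normalizeEntityName` and `cooccurrenceEdges` are hypothetical names, lowercasing stands in for case folding, homoglyph mapping and frequency counting are omitted, and the pair ordering mirrors the `CHECK src < dst` constraint the README lists for `entity_relations`.

```javascript
'use strict';

// Hypothetical helper: NFKC normalization plus lowercasing as a stand-in
// for case folding. Homoglyph mapping is intentionally left out.
function normalizeEntityName(name) {
  return name.normalize('NFKC').trim().toLowerCase();
}

// Hypothetical helper: build undirected co-occurrence edges for one session.
// Each pair is emitted once with src < dst, so (a, b) and (b, a) collapse.
function cooccurrenceEdges(names) {
  const unique = [...new Set(names.map(normalizeEntityName))].sort();
  const edges = [];
  for (let i = 0; i < unique.length; i++) {
    for (let j = i + 1; j < unique.length; j++) {
      edges.push({ src: unique[i], dst: unique[j] });
    }
  }
  return edges;
}

// Fullwidth letters fold to their ASCII forms under NFKC.
console.log(normalizeEntityName('  ＰｏｓｔｇｒｅＳＱＬ '));
// 'Aquifer' and 'AQUIFER' normalize to the same entity, leaving one edge.
console.log(cooccurrenceEdges(['Aquifer', 'pgvector', 'AQUIFER']));
```

Sorting the normalized names before pairing is one simple way to guarantee a canonical direction for each undirected edge.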
+---
+## Benchmark: LongMemEval
+We tested Aquifer's retrieval pipeline on [LongMemEval_S](https://github.com/xiaowu0162/LongMemEval) — 470 questions across 19,195 sessions (98,845 turn embeddings).
+**Setup:** Per-question haystack scoping (matching official methodology), bge-m3 embeddings via OpenRouter, turn-level user-only embedding.
+| Metric | Aquifer (bge-m3) |
+|--------|-----------------|
+| R@1 | 89.6% |
+| R@3 | 96.6% |
+| R@5 | 98.1% |
+| R@10 | 98.9% |
+**Key finding:** Turn-level embedding is the main driver: moving from session-level (R@1=26.8%) to turn-level (R@1=89.6%) retrieval is roughly a 3.3x improvement in R@1.
+### Multi-Tenant
+Every table includes `tenant_id` (default: `'default'`). Isolation is enforced at the query level — no cross-tenant data leakage by design.
+### Schema-per-deployment
+Pass `schema: 'my_app'` to `createAquifer()` and all tables live under that PostgreSQL schema. Run multiple Aquifer instances in the same database without conflicts.
+---
+## API Reference
+### `createAquifer(config)`
+Returns an Aquifer instance with the following methods:
+#### `aquifer.migrate()`
+Runs SQL migrations (idempotent). Creates tables, indexes, and extensions.
+#### `aquifer.ingest(options)`
+Ingests a session: stores messages, generates summary, creates turn embeddings, extracts entities.
+```javascript
+await aquifer.ingest({
+  sessionId: 'unique-id',
+  agentId: 'main',
+  source: 'api',                // optional, default 'api'
+  messages: [{ role, content }],
+  tenantId: 'default',          // optional
+  model: 'gpt-4o',             // optional metadata
+  tokensIn: 1500,              // optional
+  tokensOut: 800,              // optional
+});
+```
+#### `aquifer.recall(query, options)`
+Hybrid search across sessions.
+```javascript
+const results = await aquifer.recall('search query', {
+  agentId: 'main',
+  tenantId: 'default',
+  limit: 10,                    // max results
+  ftsLimit: 20,                 // FTS candidate pool
+  embLimit: 20,                 // embedding candidate pool
+  turnLimit: 20,                // turn embedding candidate pool
+  midpointDays: 45,             // time decay midpoint
+  entityBoostWeight: 0.18,      // entity boost factor
+});
+// Returns: [{ session_id, score, title, overview, started_at, ... }]
+```
+#### `aquifer.enrich(sessionId, options)`
+Re-processes an existing session: regenerate summary, embeddings, and entities.
+#### `aquifer.close()`
+Closes the PostgreSQL connection pool.
+---
+## Configuration
+```javascript
+createAquifer({
+  // PostgreSQL schema name (all tables created under this schema)
+  schema: 'aquifer',
+  // PostgreSQL connection
+  pg: {
+    connectionString: 'postgresql://...',
+    // or individual: host, port, database, user, password
+    max: 10,  // pool size
+  },
+  // Embedding provider (any OpenAI-compatible API)
+  embedder: {
+    baseURL: 'http://localhost:11434/v1',
+    model: 'bge-m3',
+    apiKey: 'ollama',
+    dimensions: 1024,           // optional
+    timeout: 30000,             // ms, default 30s
+  },
+  // LLM for summarization & entity extraction
+  llm: {
+    baseURL: 'https://api.openai.com/v1',
+    model: 'gpt-4o-mini',
+    apiKey: process.env.OPENAI_API_KEY,
+    timeout: 60000,             // ms, default 60s
+  },
+  // Tenant isolation
+  tenantId: 'default',
+});
+```
+---
+## Database Schema
+### 001-base.sql
+| Table | Purpose |
+|-------|---------|
+| `sessions` | Raw conversation data with messages (JSONB), token counts, timestamps |
+| `session_summaries` | LLM-generated structured summaries with embeddings |
+| `turn_embeddings` | Per-turn user message embeddings for precise retrieval |
+Key indexes: GIN on messages, GiST on `tsvector`, ivfflat on embeddings, B-tree on tenant/agent/timestamps.
+### 002-entities.sql
+| Table | Purpose |
+|-------|---------|
+| `entities` | Normalized named entities with type, aliases, frequency, optional embedding |
+| `entity_mentions` | Entity × session join with mention count and context |
+| `entity_relations` | Co-occurrence edges (undirected, `CHECK src < dst`) |
+| `entity_sessions` | Entity-session association for boost scoring |
+Key indexes: trigram on entity names, GiST on embeddings, composite on tenant/agent.
+---
+## Dependencies
+| Package | Purpose |
+|---------|---------|
+| `pg` ≥ 8.13 | PostgreSQL client |
+That's it. Aquifer has **one runtime dependency**.
+LLM and embedding calls use raw HTTP — no SDK required.
+---
+## License
+MIT

package/consumers/cli.js ADDED

@@ -0,0 +1,314 @@
+#!/usr/bin/env node
+'use strict';
+/**
+ * Aquifer CLI
+ *
+ * Usage:
+ *   aquifer migrate                     Run database migrations
+ *   aquifer recall <query> [options]    Search sessions
+ *   aquifer backfill [options]          Enrich pending sessions
+ *   aquifer stats [options]             Show database statistics
+ *   aquifer export [options]            Export sessions
+ *   aquifer mcp                         Start MCP server
+ */
+const { createAquiferFromConfig } = require('./shared/factory');
+const { loadConfig } = require('./shared/config');
+// ---------------------------------------------------------------------------
+// Argument parser (minimal, no deps)
+// ---------------------------------------------------------------------------
+function parseArgs(argv) {
+  const args = { _: [], flags: {} };
+  // Flags that take a value (not boolean)
+  const VALUE_FLAGS = new Set(['limit', 'agent-id', 'source', 'date-from', 'date-to', 'output', 'format', 'config', 'status', 'concurrency']);
+  for (let i = 0; i < argv.length; i++) {
+    if (argv[i] === '--') { args._.push(...argv.slice(i + 1)); break; }
+    if (argv[i].startsWith('--')) {
+      const key = argv[i].slice(2);
+      if (VALUE_FLAGS.has(key) && i + 1 < argv.length && !argv[i + 1].startsWith('--')) {
+        args.flags[key] = argv[++i];
+      } else {
+        args.flags[key] = true;
+      }
+    } else {
+      args._.push(argv[i]);
+    }
+  }
+  return args;
+}
+// ---------------------------------------------------------------------------
+// Commands
+// ---------------------------------------------------------------------------
+async function cmdMigrate(aquifer) {
+  await aquifer.migrate();
+  console.log('Migrations applied successfully.');
+}
+async function cmdRecall(aquifer, args) {
+  const query = args._.slice(1).join(' ');
+  if (!query) {
+    console.error('Usage: aquifer recall <query> [--limit N] [--agent-id ID] [--json]');
+    process.exit(1);
+  }
+  const results = await aquifer.recall(query, {
+    limit: Number.parseInt(args.flags.limit, 10) || 5, // falls back to 5 when --limit is missing or not numeric
+    agentId: args.flags['agent-id'] || undefined,
+    source: args.flags.source || undefined,
+    dateFrom: args.flags['date-from'] || undefined,
+    dateTo: args.flags['date-to'] || undefined,
+  });
+  if (args.flags.json) {
+    console.log(JSON.stringify(results, null, 2));
+    return;
+  }
+  if (results.length === 0) {
+    console.log('No results found.');
+    return;
+  }
+  for (let i = 0; i < results.length; i++) {
+    const r = results[i];
+    const ss = r.structuredSummary || {};
+    const title = ss.title || r.summaryText?.slice(0, 60) || '(untitled)';
+    const date = r.startedAt ? new Date(r.startedAt).toISOString().slice(0, 10) : '?';
+    console.log(`${i + 1}. [${r.score?.toFixed(3)}] ${title} (${date}, ${r.agentId})`);
+    if (ss.overview) console.log(`   ${ss.overview.slice(0, 200)}`);
+    if (r.matchedTurnText) console.log(`   > ${r.matchedTurnText.slice(0, 150)}`);
+    console.log();
+  }
+}
+async function cmdBackfill(aquifer, args) {
+  const limit = Number.parseInt(args.flags.limit, 10) || 100; // falls back to 100 when --limit is missing or not numeric
+  const dryRun = !!args.flags['dry-run'];
+  const skipSummary = !!args.flags['skip-summary'];
+  const skipTurnEmbed = !!args.flags['skip-turn-embed'];
+  const skipEntities = !!args.flags['skip-entities'];
+  const config = aquifer._config || {};
+  const schema = config.schema || 'aquifer';
+  const tenantId = config.tenantId || 'default';
+  const pool = aquifer._pool;
+  if (!pool) {
+    console.error('Backfill requires direct pool access.');
+    process.exit(1);
+  }
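+  // Quote the identifier and double any embedded quotes so a schema name cannot break out of the SQL.
+  const qi = (id) => `"${String(id).replace(/"/g, '""')}"`;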
+  const { rows } = await pool.query(`
+    SELECT session_id, agent_id, processing_status
+    FROM ${qi(schema)}.sessions
+    WHERE tenant_id = $1
+      AND processing_status IN ('pending', 'failed')
+    ORDER BY started_at DESC
+    LIMIT $2
+  `, [tenantId, limit]);
+  console.log(`Found ${rows.length} sessions to backfill${dryRun ? ' (dry-run)' : ''}`);
+  let enriched = 0, failed = 0;
+  for (const row of rows) {
+    if (dryRun) {
+      console.log(`  [dry-run] ${row.session_id} (${row.agent_id}) status=${row.processing_status}`);
+      continue;
+    }
+    try {
+      const result = await aquifer.enrich(row.session_id, {
+        agentId: row.agent_id,
+        skipSummary,
+        skipTurnEmbed,
+        skipEntities,
+      });
+      enriched++;
+      console.log(`  [${enriched}] ${row.session_id}: ${result.turnsEmbedded} turns, ${result.entitiesFound} entities`);
+    } catch (err) {
+      failed++;
+      console.error(`  [error] ${row.session_id}: ${err.message}`);
+    }
+  }
+  console.log(`\nDone. enriched=${enriched} failed=${failed} total=${rows.length}`);
+  if (failed > 0) process.exitCode = 2;
+}
+async function cmdStats(aquifer, args) {
+  const config = aquifer._config || {};
+  const schema = config.schema || 'aquifer';
+  const tenantId = config.tenantId || 'default';
+  const pool = aquifer._pool;
+  if (!pool) {
+    console.error('Stats requires direct pool access.');
+    process.exit(1);
+  }
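+  // Quote the identifier and double any embedded quotes so a schema name cannot break out of the SQL.
+  const qi = (id) => `"${String(id).replace(/"/g, '""')}"`;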
+  const [sessions, summaries, turns, entities] = await Promise.all([
+    pool.query(`SELECT processing_status, COUNT(*)::int as count FROM ${qi(schema)}.sessions WHERE tenant_id = $1 GROUP BY processing_status`, [tenantId]),
+    pool.query(`SELECT COUNT(*)::int as count FROM ${qi(schema)}.session_summaries WHERE tenant_id = $1`, [tenantId]),
+    pool.query(`SELECT COUNT(*)::int as count FROM ${qi(schema)}.turn_embeddings WHERE tenant_id = $1`, [tenantId]),
+    pool.query(`SELECT COUNT(*)::int as count FROM ${qi(schema)}.entities WHERE tenant_id = $1`, [tenantId]).catch(() => ({ rows: [{ count: 0 }] })),
+  ]);
+  const timeRange = await pool.query(`SELECT MIN(started_at) as earliest, MAX(started_at) as latest FROM ${qi(schema)}.sessions WHERE tenant_id = $1`, [tenantId]);
+  const stats = {
+    sessions: Object.fromEntries(sessions.rows.map(r => [r.processing_status, r.count])),
+    sessionTotal: sessions.rows.reduce((s, r) => s + r.count, 0),
+    summaries: summaries.rows[0]?.count || 0,
+    turnEmbeddings: turns.rows[0]?.count || 0,
+    entities: entities.rows[0]?.count || 0,
+    earliest: timeRange.rows[0]?.earliest || null,
+    latest: timeRange.rows[0]?.latest || null,
+  };
+  if (args.flags.json) {
+    console.log(JSON.stringify(stats, null, 2));
+  } else {
+    console.log(`Sessions: ${stats.sessionTotal} (${Object.entries(stats.sessions).map(([k, v]) => `${k}: ${v}`).join(', ')})`);
+    console.log(`Summaries: ${stats.summaries}`);
+    console.log(`Turn embeddings: ${stats.turnEmbeddings}`);
+    console.log(`Entities: ${stats.entities}`);
+    if (stats.earliest) console.log(`Range: ${new Date(stats.earliest).toISOString().slice(0, 10)} — ${new Date(stats.latest).toISOString().slice(0, 10)}`);
+  }
+}
+async function cmdExport(aquifer, args) {
+  const config = aquifer._config || {};
+  const schema = config.schema || 'aquifer';
+  const tenantId = config.tenantId || 'default';
+  const pool = aquifer._pool;
+  const output = args.flags.output || null;
+  const limit = Number.parseInt(args.flags.limit, 10) || 1000; // falls back to 1000 when --limit is missing or not numeric
+  if (!pool) {
+    console.error('Export requires direct pool access.');
+    process.exit(1);
+  }
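+  // Quote the identifier and double any embedded quotes so a schema name cannot break out of the SQL.
+  const qi = (id) => `"${String(id).replace(/"/g, '""')}"`;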
+  const where = [`s.tenant_id = $1`];
+  const params = [tenantId];
+  if (args.flags['agent-id']) { params.push(args.flags['agent-id']); where.push(`s.agent_id = $${params.length}`); }
+  if (args.flags.source) { params.push(args.flags.source); where.push(`s.source = $${params.length}`); }
+  params.push(limit);
+  const { rows } = await pool.query(`
+    SELECT s.*, ss.summary_text, ss.structured_summary
+    FROM ${qi(schema)}.sessions s
+    LEFT JOIN ${qi(schema)}.session_summaries ss ON ss.session_row_id = s.id
+    WHERE ${where.join(' AND ')}
+    ORDER BY s.started_at DESC
+    LIMIT $${params.length}
+  `, params);
+  const stream = output ? require('fs').createWriteStream(output) : process.stdout;
+  for (const row of rows) {
+    stream.write(JSON.stringify({
+      session_id: row.session_id,
+      agent_id: row.agent_id,
+      source: row.source,
+      started_at: row.started_at,
+      msg_count: row.msg_count,
+      processing_status: row.processing_status,
+      summary: row.structured_summary || row.summary_text || null,
+    }) + '\n');
+  }
+  if (output) {
+    stream.end();
+    console.error(`Exported ${rows.length} sessions to ${output}`);
+  }
+}
+// ---------------------------------------------------------------------------
+// Main
+// ---------------------------------------------------------------------------
+async function main() {
+  const argv = process.argv.slice(2);
+  if (argv.length === 0 || argv[0] === '--help' || argv[0] === '-h') {
+    console.log(`Usage: aquifer <command> [options]
+Commands:
+  migrate                     Run database migrations
+  recall <query>              Search sessions (requires embed config)
+  backfill                    Enrich pending sessions
+  stats                       Show database statistics
+  export                      Export sessions as JSONL
+  mcp                         Start MCP server
+Options:
+  --limit N                   Limit results
+  --agent-id ID               Filter by agent
+  --source NAME               Filter by source
+  --date-from YYYY-MM-DD      Start date
+  --date-to YYYY-MM-DD        End date
+  --json                      JSON output
+  --dry-run                   Preview only (backfill)
+  --output PATH               Output file (export)
+  --config PATH               Config file path`);
+    process.exit(0);
+  }
+  const command = argv[0];
+  const args = parseArgs(argv);
+  // MCP: delegate to mcp.js
+  if (command === 'mcp') {
+    require('./mcp').main().catch(err => {
+      console.error(`aquifer mcp: ${err.message}`);
+      process.exit(1);
+    });
+    return;
+  }
+  // All other commands need an Aquifer instance
+  const configOverrides = {};
+  if (args.flags.config) {
+    // Will be picked up by loadConfig
+    process.env.AQUIFER_CONFIG = args.flags.config;
+  }
+  const aquifer = createAquiferFromConfig(configOverrides);
+  try {
+    switch (command) {
+      case 'migrate':
+        await cmdMigrate(aquifer);
+        break;
+      case 'recall':
+        await cmdRecall(aquifer, args);
+        break;
+      case 'backfill':
+        await cmdBackfill(aquifer, args);
+        break;
+      case 'stats':
+        await cmdStats(aquifer, args);
+        break;
+      case 'export':
+        await cmdExport(aquifer, args);
+        break;
+      default:
+        console.error(`Unknown command: ${command}. Run 'aquifer --help' for usage.`);
+        process.exit(1);
+    }
+  } finally {
+    if (aquifer._pool) await aquifer._pool.end();
+  }
+}
+main().catch(err => {
+  console.error(`aquifer: ${err.message}`);
+  process.exit(1);
+});
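To make the behavior of the minimal argument parser in `consumers/cli.js` easier to see, here is a standalone copy of `parseArgs` from the diff above with a sample invocation. The command lines are made up for illustration; only the function body comes from the package.

```javascript
'use strict';

// Standalone copy of parseArgs from consumers/cli.js, for experimentation.
function parseArgs(argv) {
  const args = { _: [], flags: {} };
  // Flags that consume the next token as their value.
  const VALUE_FLAGS = new Set(['limit', 'agent-id', 'source', 'date-from', 'date-to', 'output', 'format', 'config', 'status', 'concurrency']);
  for (let i = 0; i < argv.length; i++) {
    // A bare "--" ends flag parsing; everything after it is positional.
    if (argv[i] === '--') { args._.push(...argv.slice(i + 1)); break; }
    if (argv[i].startsWith('--')) {
      const key = argv[i].slice(2);
      if (VALUE_FLAGS.has(key) && i + 1 < argv.length && !argv[i + 1].startsWith('--')) {
        args.flags[key] = argv[++i];
      } else {
        args.flags[key] = true;
      }
    } else {
      args._.push(argv[i]);
    }
  }
  return args;
}

// Made-up command line: aquifer recall auth middleware --limit 3 --json --agent-id main
const parsed = parseArgs(['recall', 'auth', 'middleware', '--limit', '3', '--json', '--agent-id', 'main']);
console.log(parsed._);     // positional words, command name first
console.log(parsed.flags); // value flags keep their string, other flags become true

// A value flag given without a value falls back to boolean true.
console.log(parseArgs(['recall', 'auth', '--limit']).flags);
```

Because a value flag without a value becomes `true`, any caller that feeds such a flag to `parseInt` should be prepared for `NaN`.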