agenr 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,221 @@
+ # agenr
+
+ **/eɪ.dʒɛn.ɚ/** (AY-GEN-ER) - local-first memory for AI agents.
+
+ Your AI forgets everything between sessions. agenr fixes that.
+
+ It extracts structured knowledge from your OpenClaw conversation transcripts - facts, decisions, preferences, todos, relationships, events, lessons - and stores them in a local database with semantic search. Entries strengthen when reinforced and decay when stale; contradictions get resolved. It's not a search engine. It's memory that gets healthier with use.
+
+ One local database. Your memory stays on your machine.
+
+ ## What you need
+
+ agenr uses embeddings to make your memory searchable. The best setup we've found: an **OpenAI API key** with `text-embedding-3-small`. Embeddings cost fractions of a penny per operation - at roughly $0.02 per million tokens, a full ingestion of 100+ session transcripts (on the order of 5M tokens) runs about $0.10 total.
+
+ agenr also supports **OpenAI Pro subscriptions** and **Anthropic Claude subscriptions** (no API key needed) for the LLM extraction step. But for the best balance of speed, accuracy, and cost, we recommend `gpt-4o-mini` with an API key. `agenr setup` walks you through all of this.
+
+ ```bash
+ export OPENAI_API_KEY=sk-... # for embeddings + extraction
+ ```
+
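+ Curious what the key is actually used for? Each entry is embedded via the OpenAI API - a minimal sketch of the equivalent call (illustrative only; agenr handles this internally):
+
+ ```typescript
+ import OpenAI from "openai";
+
+ const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
+ const res = await openai.embeddings.create({
+   model: "text-embedding-3-small",
+   input: "We switched this project to pnpm.",
+   dimensions: 1024, // agenr stores 1024-dimension vectors
+ });
+ console.log(res.data[0].embedding.length); // 1024
+ ```
+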
+ ## Setup
+
+ ### 1. Install and ingest your history
+
+ ```bash
+ # Install
+ npm install -g agenr
+
+ # Configure (picks your LLM provider, walks you through auth)
+ agenr setup
+
+ # Ingest all your OpenClaw sessions
+ agenr ingest ~/.openclaw/agents/main/sessions/ --glob '**/*.jsonl'
+
+ # Query your memory
+ agenr recall "what did we decide about the database schema?"
+ ```
+
+ ### 2. Keep it fresh
+
+ Start the watcher so new conversations get captured automatically:
+
+ ```bash
+ # Watch your current session file
+ agenr watch ~/.openclaw/agents/main/sessions/current.jsonl --interval 120
+
+ # Or watch a whole directory
+ agenr watch --dir ~/.openclaw/agents/main/sessions/
+
+ # Or install as a background daemon so it runs on its own
+ agenr daemon install
+ ```
+
+ ### 3. Give your agent memory
+
+ **Option A: CLI in AGENTS.md (no MCP needed, works everywhere)**
+
+ Add this to your OpenClaw `AGENTS.md`:
+
+ ```markdown
+ ## Memory (agenr)
+ On every session start, run this BEFORE responding to the first message:
+ agenr recall --context session-start --budget 2000
+ IMPORTANT: use --budget 2000, not just --limit. Budget triggers balanced output:
+ - 20% active todos
+ - 30% preferences and decisions
+ - 50% recent facts and events
+ Without --budget, score ranking skews toward old high-importance todos.
+ ```
+
+ Your agent runs the command on startup and gets its memory back - with `--budget 2000`, that's roughly 400 tokens of todos, 600 of preferences and decisions, and 1000 of recent facts and events. No MCP, no extra config.
+
+ **Option B: MCP server (richer integration)**
+
+ If your tool supports MCP (OpenClaw via mcporter, Claude Code, Codex, Cursor):
+
+ ```bash
+ # Add to OpenClaw (via mcporter)
+ mcporter config add agenr --stdio agenr --arg mcp --env OPENAI_API_KEY=your-key-here
+ ```
+
+ This gives your agent `agenr_recall`, `agenr_store`, and `agenr_extract` as tools it can call anytime - not just on startup.
+
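+ If you want to poke at the tools outside an agent, any MCP client can drive them. A sketch using the official MCP TypeScript SDK - note that the `query` argument shape here is our assumption, not documented API:
+
+ ```typescript
+ import { Client } from "@modelcontextprotocol/sdk/client/index.js";
+ import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
+
+ // Spawn `agenr mcp` as a stdio MCP server and call one of its tools.
+ const transport = new StdioClientTransport({ command: "agenr", args: ["mcp"] });
+ const client = new Client({ name: "example-client", version: "1.0.0" });
+ await client.connect(transport);
+
+ const result = await client.callTool({
+   name: "agenr_recall",
+   arguments: { query: "what did we decide about the database schema?" }, // assumed schema
+ });
+ console.log(result);
+ ```
+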
+ Done. Your agent now has persistent memory that survives compaction, session restarts, and everything in between.
+
+ ## What happens when you ingest
+
+ agenr reads your OpenClaw session transcripts, filters out noise (tool calls, file dumps, boilerplate - about 80% of a typical session), and extracts structured knowledge entries:
+
+ ```
+ agenr ingest ~/.openclaw/agents/main/sessions/ --glob '**/*.jsonl'
+
+ [1/108] session-abc123.jsonl (1.2MB) - 12 extracted, 10 stored, 1 skipped (duplicate), 1 reinforced
+ [2/108] session-def456.jsonl (800KB) - 8 extracted, 7 stored, 0 skipped, 1 reinforced
+ ...
+ ```
+
+ Each entry has a type, subject, content, importance, and expiry. Near-duplicates are caught automatically - if you discussed the same decision in three sessions, you get one entry with higher confirmations, not three copies.
+
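+ Concretely, an entry is shaped like this - a TypeScript sketch mirroring the package's bundled `KnowledgeEntry` type, with values borrowed from the recall example below:
+
+ ```typescript
+ // Field names match the bundled type declarations; the values are illustrative.
+ const entry = {
+   type: "decision",
+   subject: "project tooling",
+   content: "We switched this project to pnpm.",
+   importance: 7,       // numeric weight (we assume a 1-10 scale)
+   expiry: "permanent", // one of: core | permanent | temporary
+   tags: ["tooling", "package-manager"],
+   source: { file: "session-abc123.jsonl", context: "tooling discussion" },
+ };
+ ```
+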
+ ```bash
+ agenr recall "package manager"
+ ```
+
+ ```text
+ 1 results (46ms)
+ 1. [decision] project tooling: We switched this project to pnpm.
+    importance=7 | today | recalled 3 times
+    tags: tooling, package-manager
+ ```
+
+ ## Live watching
+
+ The watcher keeps your memory current as you work. It tails your session files, extracts new knowledge every few minutes, and stores it. If you ingested history first, watch resumes right where ingest left off - no re-processing.
+
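+ The no-re-processing part is just offset bookkeeping. A toy sketch of the idea (file name and polling logic invented; the real watcher is more robust):
+
+ ```typescript
+ import { createReadStream, statSync } from "node:fs";
+
+ // Remember how far we've read; only process bytes appended since then.
+ let offset = 0; // agenr persists this per file, which is why watch resumes after ingest
+
+ function poll(file: string) {
+   const size = statSync(file).size;
+   if (size <= offset) return; // nothing new
+   createReadStream(file, { start: offset, encoding: "utf8" })
+     .on("data", (chunk) => process.stdout.write(`new transcript data: ${chunk.length} chars\n`));
+   offset = size;
+ }
+
+ setInterval(() => poll("current.jsonl"), 120_000); // cf. --interval 120
+ ```
+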
+ ```bash
+ # Watch your OpenClaw sessions directory
+ agenr watch --dir ~/.openclaw/agents/main/sessions/
+
+ # Auto-detect your session directory
+ agenr watch --auto
+
+ # Install as a background daemon (macOS launchd)
+ agenr daemon install
+ agenr daemon status
+ agenr daemon logs
+ ```
+
+ You can also auto-refresh a context file that AI tools read on startup:
+
+ ```bash
+ agenr watch --auto --context ~/.agenr/CONTEXT.md
+ ```
+
+ ## How it works
+
+ **Extract** - An LLM reads your transcripts and pulls out structured entries: facts, decisions, preferences, todos, relationships, events, lessons. Smart filtering removes noise (tool calls, file contents, boilerplate) before the LLM ever sees it.
+
+ **Store** - Entries get embedded and compared against what's already in the database. Near-duplicates reinforce existing knowledge. New information gets inserted. Online dedup catches copies in real time.
+
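+ A minimal sketch of that store gate, assuming cosine similarity over the stored embeddings and an invented threshold (agenr's real cutoff may differ):
+
+ ```typescript
+ // Online dedup in miniature: reinforce near-duplicates, insert the rest.
+ function cosine(a: number[], b: number[]): number {
+   let dot = 0, na = 0, nb = 0;
+   for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
+   return dot / (Math.sqrt(na) * Math.sqrt(nb));
+ }
+
+ const DUP_THRESHOLD = 0.92; // invented for illustration
+
+ function storeOrReinforce(
+   candidate: { embedding: number[] },
+   nearest: { embedding: number[]; confirmations: number } | null,
+ ): "reinforced" | "inserted" {
+   if (nearest && cosine(candidate.embedding, nearest.embedding) >= DUP_THRESHOLD) {
+     nearest.confirmations += 1; // same knowledge again: strengthen, don't copy
+     return "reinforced";
+   }
+   return "inserted"; // genuinely new information
+ }
+ ```
+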
+ **Recall** - Semantic search plus memory-aware ranking. Entries you recall often score higher. Stale entries decay. Contradicted entries get penalized.
+
+ **Consolidate** - Periodic cleanup: rule-based expiry first, then optional LLM-assisted merging for entries that say the same thing differently.
+
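+ The rule-based pass is easy to picture. A sketch with invented TTLs - the expiry levels come from agenr's types, the cutoffs do not:
+
+ ```typescript
+ // Illustrative first-stage expiry; agenr's actual rules and TTLs may differ.
+ type Expiry = "core" | "permanent" | "temporary";
+
+ function isExpired(expiry: Expiry, ageDays: number): boolean {
+   if (expiry === "temporary") return ageDays > 30; // assumed TTL
+   return false; // core and permanent entries are left for the LLM-assisted merge pass
+ }
+ ```
+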
+ ```
+ Transcript -> Filter  -> Extract -> Store    -> Recall
+               80%        LLM        dedup       semantic
+               noise      typed      + embed     + memory-
+               removed    entries    + dedup     aware
+ ```
+
+ ## MCP integration
+
+ agenr exposes three MCP tools: `agenr_recall`, `agenr_store`, `agenr_extract`. Any tool that speaks MCP can use your memory.
+
+ **OpenClaw** (via [mcporter](https://mcporter.dev)):
+ ```bash
+ mcporter config add agenr --stdio agenr --arg mcp --env OPENAI_API_KEY=your-key-here
+ ```
+
+ Verify with `mcporter list agenr`. See [docs/OPENCLAW.md](./docs/OPENCLAW.md) for the full guide.
+
+ ## Commands
+
+ | Command | What it does |
+ | --- | --- |
+ | `agenr setup` | Interactive configuration |
+ | `agenr ingest <paths...>` | Bulk-ingest files and directories |
+ | `agenr extract <files...>` | Extract knowledge from text |
+ | `agenr store [files...]` | Store entries with semantic dedup |
+ | `agenr recall [query]` | Semantic + memory-aware recall |
+ | `agenr watch [file]` | Live-watch files, directories, or auto-detect |
+ | `agenr daemon install` | Install background watch daemon |
+ | `agenr consolidate` | Clean up and merge near-duplicates |
+ | `agenr context` | Generate context file for AI tool integration |
+ | `agenr mcp` | Start MCP server (stdio) |
+ | `agenr db stats` | Database statistics |
+ | `agenr db version` | Database schema version |
+
+ Full reference: [docs/CLI.md](./docs/CLI.md) | [docs/CONFIGURATION.md](./docs/CONFIGURATION.md)
+
+ ## Architecture
+
+ - **Runtime:** Node.js 20+, TypeScript, ESM
+ - **Storage:** libsql/SQLite (`~/.agenr/knowledge.db`)
+ - **Embeddings:** OpenAI `text-embedding-3-small`, 1024 dimensions
+ - **Recall scoring:** Vector similarity x recency x memory strength, where memory strength is max(importance, recall strength), with contradiction penalties - see the sketch below
+
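+ In code form, the ranking could look like this. A hedged sketch only: the factors match the formula above, but the decay curve and penalty value are our assumptions:
+
+ ```typescript
+ // Illustrative recall scoring: similarity x recency x memory strength, penalized when contradicted.
+ interface ScoredEntry {
+   similarity: number;     // cosine similarity to the query, 0..1
+   ageDays: number;        // time since the entry was last touched
+   importance: number;     // set at extraction
+   recallStrength: number; // grows each time the entry is recalled
+   contradicted: boolean;
+ }
+
+ function score(e: ScoredEntry, halfLifeDays = 90 /* assumed */): number {
+   const recency = Math.pow(0.5, e.ageDays / halfLifeDays);
+   const strength = Math.max(e.importance, e.recallStrength);
+   const penalty = e.contradicted ? 0.5 : 1; // assumed penalty factor
+   return e.similarity * recency * strength * penalty;
+ }
+ ```
+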
+ Deep dive: [docs/ARCHITECTURE.md](./docs/ARCHITECTURE.md)
+
+ ## Status
+
+ Alpha. The core pipeline is stable and tested (369 tests). We use it daily to manage thousands of knowledge entries across OpenClaw sessions.
+
+ What works: extraction, storage, recall, MCP integration, online dedup, consolidation, smart filtering, live watching, daemon mode.
+
+ What's next: local embeddings support, entity resolution, auto-scheduled consolidation.
+
+ ## Philosophy
+
+ The big labs are building bigger brains. We're building better memory. Those are complementary.
+
+ Current AI's bottleneck isn't intelligence - it's continuity. A slightly less brilliant model with accumulated context might be more useful than a brilliant amnesiac. What makes a senior engineer senior isn't raw IQ - it's patterns seen, mistakes remembered, approaches that worked. That's memory.
+
+ agenr is local-first because your memory is yours. It's structured (not just vectors) because "what did we decide about X?" needs a real answer, not a similarity score. It's open source because memory infrastructure should be shared.
+
+ ## Troubleshooting
+
+ | Problem | Fix |
+ | --- | --- |
+ | Embeddings fail | Set `OPENAI_API_KEY` env var or `agenr config set-key openai <key>` |
+ | Database locked | Wait for consolidation to finish, or check `~/.agenr/consolidation.lock` |
+ | Recall returns nothing after force-kill | `agenr db rebuild-index` (vector index corruption) |
+ | Extraction fails mid-file | Retry - dedup skips already-stored entries |
+
+ ## License
+
+ AGPL-3.0 - [LICENSE](./LICENSE)
+
+ ## Contributing
+
+ See [CONTRIBUTING.md](./CONTRIBUTING.md)
@@ -0,0 +1,213 @@
+ import { Command } from 'commander';
+ import { Model, Api, Context, SimpleStreamOptions, AssistantMessageEvent, AssistantMessage } from '@mariozechner/pi-ai';
+
+ declare const KNOWLEDGE_TYPES: readonly ["fact", "decision", "preference", "todo", "relationship", "event", "lesson"];
+ declare const EXPIRY_LEVELS: readonly ["core", "permanent", "temporary"];
+ declare const SCOPE_LEVELS: readonly ["private", "personal", "public"];
+ type KnowledgeType = (typeof KNOWLEDGE_TYPES)[number];
+ type Expiry = (typeof EXPIRY_LEVELS)[number];
+ type Scope = (typeof SCOPE_LEVELS)[number];
+ type AgenrProvider = "anthropic" | "openai" | "openai-codex";
+ type AgenrAuthMethod = "anthropic-oauth" | "anthropic-token" | "anthropic-api-key" | "openai-subscription" | "openai-api-key";
+ interface AgenrStoredCredentials {
+     anthropicApiKey?: string;
+     anthropicOauthToken?: string;
+     openaiApiKey?: string;
+ }
+ interface AgenrConfig {
+     auth?: AgenrAuthMethod;
+     provider?: AgenrProvider;
+     model?: string;
+     credentials?: AgenrStoredCredentials;
+     embedding?: {
+         provider?: "openai";
+         model?: string;
+         dimensions?: number;
+         apiKey?: string;
+     };
+     db?: {
+         path?: string;
+     };
+ }
+ interface KnowledgeEntry {
+     type: KnowledgeType;
+     content: string;
+     subject: string;
+     canonical_key?: string;
+     importance: number;
+     expiry: Expiry;
+     scope?: Scope;
+     tags: string[];
+     created_at?: string;
+     source: {
+         file: string;
+         context: string;
+     };
+ }
+ interface ExtractionStats {
+     chunks: number;
+     successful_chunks: number;
+     failed_chunks: number;
+     raw_entries: number;
+     deduped_entries: number;
+     warnings: string[];
+ }
+ interface ExtractionSummary {
+     files: number;
+     chunks: number;
+     successful_chunks: number;
+     failed_chunks: number;
+     raw_entries: number;
+     deduped_entries: number;
+     warnings: number;
+ }
+ interface ExtractionReport {
+     version: string;
+     extracted_at: string;
+     provider: AgenrProvider;
+     model: string;
+     files: Record<string, {
+         entries: KnowledgeEntry[];
+         stats: ExtractionStats;
+     }>;
+     summary: ExtractionSummary;
+ }
+ interface TranscriptMessage {
+     index: number;
+     role: "user" | "assistant";
+     text: string;
+     timestamp?: string;
+ }
+ interface TranscriptChunk {
+     chunk_index: number;
+     message_start: number;
+     message_end: number;
+     text: string;
+     context_hint: string;
+     index?: number;
+     totalChunks?: number;
+     timestamp_start?: string;
+     timestamp_end?: string;
+ }
+ interface ParsedTranscript {
+     file: string;
+     messages: TranscriptMessage[];
+     chunks: TranscriptChunk[];
+     warnings: string[];
+     metadata?: {
+         sessionId?: string;
+         platform?: string;
+         startedAt?: string;
+         model?: string;
+         cwd?: string;
+     };
+ }
+ interface ResolvedModel {
+     provider: AgenrProvider;
+     modelId: string;
+     model: Model<Api>;
+ }
+ interface ResolvedCredentials {
+     apiKey: string;
+     source: string;
+ }
+ interface LlmClient {
+     auth: AgenrAuthMethod;
+     resolvedModel: ResolvedModel;
+     credentials: ResolvedCredentials;
+ }
+
+ declare function deduplicateEntries(entries: KnowledgeEntry[]): KnowledgeEntry[];
+
+ type SimpleAssistantStream = AsyncIterable<AssistantMessageEvent> & {
+     result: () => Promise<AssistantMessage>;
+ };
+ type StreamSimpleFn = (model: Model<Api>, context: Context, options?: SimpleStreamOptions) => SimpleAssistantStream;
+
+ interface ExtractChunksResult {
+     entries: KnowledgeEntry[];
+     successfulChunks: number;
+     failedChunks: number;
+     warnings: string[];
+     aborted?: boolean;
+     skippedChunks?: number;
+ }
+ interface ExtractChunkCompleteResult {
+     chunkIndex: number;
+     totalChunks: number;
+     entries: KnowledgeEntry[];
+     warnings: string[];
+ }
+ declare function extractKnowledgeFromChunks(params: {
+     file: string;
+     chunks: TranscriptChunk[];
+     client: LlmClient;
+     verbose: boolean;
+     noDedup?: boolean;
+     interChunkDelayMs?: number;
+     llmConcurrency?: number;
+     onVerbose?: (line: string) => void;
+     onStreamDelta?: (delta: string, kind: "text" | "thinking") => void;
+     onChunkComplete?: (result: ExtractChunkCompleteResult) => Promise<void>;
+     streamSimpleImpl?: StreamSimpleFn;
+     sleepImpl?: (ms: number) => Promise<void>;
+     retryDelayMs?: (attempt: number) => number;
+ }): Promise<ExtractChunksResult>;
+
+ interface ResolveLlmClientInput {
+     provider?: string;
+     model?: string;
+     config?: AgenrConfig | null;
+     env?: NodeJS.ProcessEnv;
+ }
+ declare function createLlmClient(input: ResolveLlmClientInput): LlmClient;
+
+ declare function writeOutput(params: {
+     report: ExtractionReport;
+     format: "json" | "markdown";
+     output?: string;
+     split: boolean;
+ }): Promise<string[]>;
+
+ interface AdapterParseOptions {
+     /**
+      * When true, bypass adapter filtering and truncation and preserve tool results
+      * and other noisy blocks as much as possible.
+      */
+     raw?: boolean;
+     /**
+      * When true, adapters may add verbose diagnostics to parse warnings.
+      */
+     verbose?: boolean;
+ }
+
+ declare function expandInputFiles(inputs: string[]): Promise<string[]>;
+ declare function parseTranscriptFile(filePath: string, options?: AdapterParseOptions): Promise<ParsedTranscript>;
+
+ interface ExtractCommandOptions {
+     format: "json" | "markdown";
+     json?: boolean;
+     output?: string;
+     split?: boolean;
+     model?: string;
+     provider?: string;
+     verbose?: boolean;
+     noDedup?: boolean;
+ }
+ interface CliDeps {
+     expandInputFilesFn: typeof expandInputFiles;
+     assertReadableFileFn: (filePath: string) => Promise<void>;
+     parseTranscriptFileFn: typeof parseTranscriptFile;
+     createLlmClientFn: typeof createLlmClient;
+     extractKnowledgeFromChunksFn: typeof extractKnowledgeFromChunks;
+     deduplicateEntriesFn: typeof deduplicateEntries;
+     writeOutputFn: typeof writeOutput;
+ }
+ declare function runExtractCommand(files: string[], options: ExtractCommandOptions, deps?: Partial<CliDeps>): Promise<{
+     exitCode: number;
+     report?: ExtractionReport;
+     writtenPaths: string[];
+ }>;
+ declare function createProgram(): Command;
+
+ export { type CliDeps, type ExtractCommandOptions, createProgram, runExtractCommand };
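
The exported surface above can also be driven programmatically. A sketch based only on the declared signatures - the paths and option values are invented:

```typescript
import { createProgram, runExtractCommand } from "agenr";

// Either hand control to the bundled commander program...
// createProgram().parse(process.argv);

// ...or call the extract command directly. `format` is the only required option.
const { exitCode, report, writtenPaths } = await runExtractCommand(
  ["./sessions/session-abc123.jsonl"], // hypothetical transcript path
  { format: "json", output: "./knowledge.json", verbose: true },
);

if (report) {
  console.log(`deduped entries: ${report.summary.deduped_entries}`);
  console.log(`written: ${writtenPaths.join(", ")}`);
}
process.exit(exitCode);
```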