@susu-eng/gralkor 27.2.0 → 27.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,15 +1,13 @@
  # Gralkor
 
- **Persistent memory for OpenClaw agents, powered by knowledge graphs.**
+ **The best memory plugin for OpenClaw agents.**
 
- Gralkor is an OpenClaw plugin that gives your agents long-term, temporally-aware memory. It uses [Graphiti](https://github.com/getzep/graphiti) (by Zep) for knowledge graph construction and [FalkorDB](https://www.falkordb.com/) as the graph database backend. Both run automatically as a managed subprocess no Docker required.
+ Gralkor is an OpenClaw plugin that gives your agents long-term, temporally-aware memory. It uses [Graphiti](https://github.com/getzep/graphiti) (by Zep) for knowledge graph construction and [FalkorDB](https://www.falkordb.com/) as the graph database backend. Both run automatically as a managed subprocess: no separate server for you to manage, no SaaS company to connect to.
 
- Gralkor automatically remembers and recalls everything your agents says, _thinks_, and _does_ — no prompt engineering required by the operator, no conscious (haha) effort required by the agent.
+ Gralkor automatically remembers and recalls everything your agent says, _thinks_, and _does_ — no prompt engineering required by the operator, no conscious (haha) effort required by the agent.
 
  ## Why Gralkor
 
- After years of building with every AI memory system out there, reading the latest research daily, and doing my own cognitive architecture experiments, I am here to tell you a thing or two about AI memory, and why you should use Gralkor for your OpenClaw agents and forget everything else. I should say up front: I love this space and have enormous respect for everyone shipping in it — what follows is honest craft critique, not shade.
-
  Here's the honest field report on every OpenClaw memory plugin:
 
  | Plugin | Storage | Captures thinking | Episode scope | Temporal facts | Local |
@@ -23,32 +21,19 @@ Here's the honest field report on every OpenClaw memory plugin:
  | **Awareness** | Cloud + MD mirror | no | first message + last reply | none | ✗ |
  | **Gralkor** | Graphiti knowledge graph | **yes** | full session | `valid_at`/`invalid_at`/`expired_at` | ✓ |
 
- **Graphs, not Markdown or pure vector.** The AI ecosystem's fixation on Markdown-based memory is baffling. Graphs have been the right data structure for representing knowledge since long before LLMs existed. Your code is a graph (syntax trees). Your filesystem is a graph. The web is a graph. Relationships between entities are naturally graph-shaped, and trying to flatten them into Markdown files or pure vector embeddings is fighting reality. And yet: the most popular memory plugin memory-core, the one that ships inside OpenClaw — writes your agent's memory to `MEMORY.md` and `memory/YYYY-MM-DD.md`. The second most popular, lancedb-pro, stores extracted facts as flat rows in LanceDB. Both make recall a lookup problem when it should be a traversal problem. [Graphiti](https://github.com/getzep/graphiti) combines a knowledge graph with vector search — you get structured relationships *and* semantic retrieval. Facts carry temporal validity: when they became true, when they stopped being true, when they were superseded. This is not yet another chunking strategy or embedding experiment. Graphiti has solved this layer of the problem and we build on top of it (and not much). [HippoRAG](https://arxiv.org/abs/2405.14831) (NeurIPS 2024) found graph-based retrieval reaches 89.1% recall@5 on 2WikiMultiHopQA versus 68.2% for flat vector retrieval — a 20.9-point gap. [AriGraph](https://arxiv.org/abs/2407.04363) (IJCAI 2025) independently found KG-augmented agents markedly outperform RAG, summarization, and full-conversation-history baselines across interactive environments.
-
- **Remembering behaviour, not just dialog.** When your agent reasons through a problem — weighing options, rejecting approaches, arriving at a conclusion — that thinking process is as valuable as the final answer. Gralkor distills the agent's thinking blocks into first-person behavioural summaries and weaves them into the episode transcript before ingestion. The graph doesn't just know what was said; it knows how the agent arrived there. *Fighting words*: Every other OpenClaw memory plugin only remembers what was spoken, totally ignoring what your agent thinks and does — lancedb-pro filters for `type === "text"` only, MemOS strips `<think>` tags, Supermemory never looks at them. Even if you have a sophisticated memory system, your agent is inherently dishonest with you, frequently claiming to remember what it has done when it only really remembers what it claimed to have done, or to have thought what it is only now imagining. Gralkor actually remembers what your agent thought and did — it is the only OpenClaw memory plugin with this capability. [Reflexion](https://arxiv.org/abs/2303.11366) (NeurIPS 2023) showed agents storing self-reflective reasoning traces outperform GPT-4 output-only baselines by 11 points on HumanEval. [ExpeL](https://arxiv.org/abs/2308.10144) (AAAI 2024) directly ablated reasoning-trace storage versus output-only: +11–19 points across benchmarks from storing the reasoning process alone.
-
- **On cost.** Yes, Gralkor costs more to run than a Markdown file. Behaviour distillation adds roughly 20% to ingestion token cost. Auto-recall adds an LLM call before each turn when results need interpretation.
+ Let's look in detail at the decisions behind Gralkor and why they make it the best memory plugin for OpenClaw.
 
- Think of it as better context management, not overhead. However you handle long sessionsgrowing context windows, compaction, summarization — you're paying on every read. Gralkor pays once on write to extract and structure what matters, then pulls only the right stuff at read time.
+ **Graphs, not Markdown or pure vector.** The AI ecosystem's fixation on Markdown-based memory is baffling. Graphs are the right data structure for representing knowledge. Your code is a graph (syntax trees), your filesystem is a graph, the web is a graph. The world is a deeply interrelated graph, and trying to flatten it into Markdown files or pure vector embeddings is fighting reality. And yet: the most popular memory plugin, memory-core, the one that ships inside OpenClaw, writes your agent's memory to `MEMORY.md` and `memory/YYYY-MM-DD.md`. The second most popular, lancedb-pro, stores extracted facts as flat rows in LanceDB. [Graphiti](https://github.com/getzep/graphiti) combines a knowledge graph with vector embeddings — you get structured relationships *and* semantic retrieval. Facts carry temporal validity: when they became true, when they stopped being true, when they were superseded. This is not another chunking strategy or embedding experiment. Graphiti has solved this layer of the problem and Gralkor deploys and leverages it optimally for this use case. [HippoRAG](https://arxiv.org/abs/2405.14831) (NeurIPS 2024) found graph-based retrieval reaches 89.1% recall@5 on 2WikiMultiHopQA versus 68.2% for flat vector retrieval — a 20.9-point gap. [AriGraph](https://arxiv.org/abs/2407.04363) (IJCAI 2025) independently found KG-augmented agents markedly outperform RAG, summarization, and full-conversation-history baselines across interactive environments.
 
- And the leverage is real. A single recalled fact "we chose postgres over mysql because of the jsonb column support we need for X" prevents re-litigating that decision in a new session. An agent that remembers your architectural decisions, your preferences, your debugging history, and your reasoning across sessions doesn't just save time; it changes the character of the work. You stop spending turns re-establishing context and start doing the actual work you opened the terminal for.
+ **Remembering behaviour, not just dialog.** Agents make mistakes, weigh options, reject approaches: they _learn_ as they complete tasks. Gralkor distills the agent's thinking blocks, its learning, into first-person behavioural summaries and weaves them into the episode transcript before ingestion. The graph doesn't just know what was said; it knows how the agent arrived there. And yet: every other OpenClaw memory plugin only remembers what was spoken, totally ignoring what your agent thinks and does — lancedb-pro filters for `type === "text"` only, MemOS strips `<think>` tags, Supermemory never looks at them. Even if you have a sophisticated memory system, your agent is inherently dishonest with you, frequently claiming to remember what it has done when it only really remembers what it claimed to have done, or to have thought what it is only now imagining. Gralkor actually remembers what your agent thought and did; it is the only OpenClaw memory plugin with this capability. [Reflexion](https://arxiv.org/abs/2303.11366) (NeurIPS 2023) showed agents storing self-reflective reasoning traces outperform GPT-4 output-only baselines by 11 points on HumanEval. [ExpeL](https://arxiv.org/abs/2308.10144) (AAAI 2024) directly ablated reasoning-trace storage versus output-only: +11–19 points across benchmarks from storing the reasoning process alone.
 
- Paying $15–20/month in API costs to make your agent meaningfully smarter across sessions is not a place to save money. The agents that cost you real money are the ones that forget everything and make you start over.
+ **On cost.** Gralkor costs more to run than a Markdown file. It's better context management, not overhead. Instead of paying to pollute your context window with junk on every read, you pay more at ingestion in exchange for cheap, high-relevance reads. Extract and structure what matters, then pull only the right stuff at read time. It's also worth it: a single recalled fact — "we chose postgres over mysql because of the jsonb column support we need for X" — prevents re-litigating that decision in a new session. An agent that remembers your architectural decisions, your preferences, your debugging history, and your reasoning across sessions doesn't just save time; it changes the character of the work. You stop spending turns re-establishing context and start doing the actual work you opened the terminal for. Paying $15–20/month in API costs to make your agent meaningfully smarter across sessions is a good investment for anybody doing real work. The agents that cost you real money are the ones that forget everything and make you start over.
 
- **Maximum context at ingestion.** Most memory plugins save isolated question-answer pairs or summarized snippets: Awareness stores the first user message and the last assistant reply — a 30-turn debugging session becomes two sentences. Supermemory and MemOS Cloud default to the last turn only. Gralkor captures all messages in each session of work, distills behaviour, and feeds results to Graphiti *as whole episodes*. Extraction works _way_ better when Graphiti has full context rather fragmented QA pairs. *Fighting words*: Other plugins capture single turns of dialog; we capture _the whole episode_ — the entire series of questions, thoughts, actions, and responses that _solved the problem_. Richer semantics, better understanding, better recall. [SeCom](https://arxiv.org/abs/2502.05589) (ICLR 2025) found coherent multi-turn episode storage scores 5.99 GPT4Score points higher than isolated turn-level storage on LOCOMO. [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) confirms: fact-level QA-pair extraction drops accuracy from 0.692 to 0.615 versus full-round episode storage.
+ **Maximum context at ingestion.** Gralkor captures all messages in each session of work, distills behaviour, and feeds results to Graphiti *as whole episodes*. Extraction works _way_ better when Graphiti has full context. And yet: most memory plugins save isolated question-answer pairs or summarized snippets: Awareness stores the first user message and the last assistant reply — a 30-turn debugging session becomes two sentences. Supermemory and MemOS Cloud default to the last turn only. Other plugins capture single turns of dialog; we capture _the whole episode_ — the entire series of questions, thoughts, actions, and responses that _solved the problem_. Richer semantics, better understanding, better recall. [SeCom](https://arxiv.org/abs/2502.05589) (ICLR 2025) found coherent multi-turn episode storage scores 5.99 GPT4Score points higher than isolated turn-level storage on LOCOMO. [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) confirms: fact-level QA-pair extraction drops accuracy from 0.692 to 0.615 versus full-round episode storage.
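To make the whole-episode claim concrete, here is a minimal, hypothetical sketch (the names are invented for illustration; this is not Gralkor's actual API) of assembling a full session transcript, thinking steps included, before handing it to an extractor:

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "user", "assistant", or "thinking"
    content: str


def build_episode(messages: list[Message]) -> str:
    """Flatten a full session into one transcript so the extractor sees
    every turn in context, rather than isolated question-answer pairs."""
    return "\n".join(f"{m.role}: {m.content}" for m in messages)


session = [
    Message("user", "Why is the migration failing?"),
    Message("thinking", "I checked the schema; the jsonb column is missing."),
    Message("assistant", "The jsonb column was never created; adding it fixes the migration."),
]

episode = build_episode(session)
# Every turn, including the thinking step, survives into the transcript.
assert "jsonb column is missing" in episode
```

A turn-level store would keep only the first and last of those three lines; the whole-episode transcript preserves the reasoning that connects them.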
 
- **Built for the long term.** Graphiti — on which Gralkor is based — is _temporally aware_. On every ingestion, it doesn't just append; it resolves new information against the existing graph, amending, expiring, and invalidating so that your agent knows _what happened over time_. lancedb-pro has something in this direction — an `invalidated_at` timestamp on vector rows, genuinely good but graph edges are not vector rows: Graphiti tracks four timestamps per fact (`created_at`, `valid_at`, `invalid_at`, `expired_at`) and supports point-in-time queries across a traversable structure. This is expensive, bad for throughput, and useless for short-lived agents, so serving a single, long-lived user agent is _the perfect use case_. Graphiti was destined for Gralkor and OpenClaw. [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) established that temporal reasoning is the hardest memory sub-task for commercial LLMs; time-aware indexing recovers 7–11% of that loss. [MemoTime](https://arxiv.org/abs/2510.13614) (WWW 2026) found temporal knowledge graphs enable a 4B model to match GPT-4-Turbo on temporal reasoning, with up to 24% improvement over static memory baselines.
+ **Built for the long term.** Graphiti — on which Gralkor is based — is _temporally aware_. On every ingestion, it doesn't just append; it resolves new information against the existing graph, amending, expiring, and invalidating so that your agent knows _what happened over time_. lancedb-pro has something in this direction — an `invalidated_at` timestamp on vector rows, but there's no graph. Graphiti tracks four timestamps per fact (`created_at`, `valid_at`, `invalid_at`, `expired_at`) and supports point-in-time queries across a traversable structure. This is expensive, bad for throughput, and useless for short-lived agents, so serving a single, long-lived user agent is _the perfect use case_. Graphiti was destined for Gralkor and OpenClaw. [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) established that temporal reasoning is the hardest memory sub-task for commercial LLMs; time-aware indexing recovers 7–11% of that loss. [MemoTime](https://arxiv.org/abs/2510.13614) (WWW 2026) found temporal knowledge graphs enable a 4B model to match GPT-4-Turbo on temporal reasoning, with up to 24% improvement over static memory baselines.
 
- **Recursion through reflection.** A knowledge graph is a living structure. The most powerful thing you can do with it is point the agent back at its own memory — let it reflect on what it knows, identify contradictions, synthesize higher-order insights, and do with them whatever you believe to be _good cognitive architecture_ :shrug:. Gralkor doesn't prescribe how you do this. Instead, it provides the platform for cognitive architecture experimentation: a structured, temporally-aware graph that the agent can both read from and write to using OpenClaw crons.
-
- ```bash
- # Example: schedule the agent to reflect on its memory every 6 hours
- openclaw cron add --every 6h --prompt "Search your memory for recent facts. \
- Look for contradictions, outdated information, or patterns worth consolidating. \
- Use memory_add to store any new insights."
- ```
-
- This is where it gets interesting. The graph gives you a substrate for experimentation — reflection strategies, knowledge consolidation, cross-session reasoning — that flat retrieval systems simply cannot support. [Reflexion](https://arxiv.org/abs/2303.11366) (NeurIPS 2023) demonstrated that agents storing verbal reflections in an episodic buffer gain 11 points with no weight updates. [Generative Agents](https://arxiv.org/abs/2304.03442) (UIST 2023) showed empirically that a reflection layer synthesizing raw memories into higher-order insights is essential for coherent long-term behavior.
+ **Recursion through reflection.** A knowledge graph is a living structure. The most powerful thing you can do with it is point the agent back at its own memory — let it reflect on what it knows, identify contradictions, synthesize higher-order insights, and do with them whatever you believe to be _good cognitive architecture_ :shrug:. Gralkor doesn't prescribe how you do this. Instead, it provides the platform for cognitive architecture experimentation: a structured, temporally-aware graph that the agent can both read from and write to using OpenClaw crons. Share yours, and ask to see mine. This is where it gets interesting. The graph gives you a substrate for experimentation — reflection strategies, knowledge consolidation, cross-session reasoning — that flat retrieval systems simply cannot support. [Reflexion](https://arxiv.org/abs/2303.11366) (NeurIPS 2023) demonstrated that agents storing verbal reflections in an episodic buffer gain 11 points with no weight updates. [Generative Agents](https://arxiv.org/abs/2304.03442) (UIST 2023) showed empirically that a reflection layer synthesizing raw memories into higher-order insights is essential for coherent long-term behavior.
 
  **Custom ontology: model your agent's world _your way_.** Define your own entity types, attributes, and relationships so that information is parsed into the language of your domain — or your life. [Apple's ODKE+](https://arxiv.org/abs/2509.04696) (2025) showed ontology-guided extraction hits 98.8% precision vs 91% raw LLM; [GoLLIE](https://arxiv.org/abs/2310.03668) (ICLR 2024) directly ablated schema-constrained versus unconstrained generation on the same model, finding +13 F1 points average across NER, relation, and event extraction in zero-shot settings. No other OpenClaw memory plugin offers this. If you want extraction to speak your domain's language: lancedb-pro has six hardcoded categories you can filter but not extend; Supermemory lets you write a free-text hint to guide extraction; the rest offer nothing. Custom ontologies give your agent a model of the world: you could use a domain model codified by experts, be the expert, or try to encode _your_ model of the world. Agent memory doesn't have to be so fuzzy that you lose track of what matters.
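A hypothetical sketch of what a custom ontology buys you — all type names here are invented for a software-project domain, and the sketch uses plain dataclasses rather than whatever schema mechanism Gralkor/Graphiti actually accepts:

```python
from dataclasses import dataclass


# Entity and edge types the extractor is constrained to, instead of
# free-form labels invented per extraction.
@dataclass
class Service:
    name: str
    language: str


@dataclass
class Decision:
    summary: str
    rationale: str


@dataclass
class DependsOn:
    source: str  # a Service name
    target: str  # a Service name


ENTITY_TYPES = {"Service": Service, "Decision": Decision}
EDGE_TYPES = {"DEPENDS_ON": DependsOn}

# A schema-constrained extraction yields typed records, not fuzzy text,
# so recall can filter by kind: "all Decisions that mention the api service".
extracted = [
    Service(name="api", language="python"),
    Decision(summary="use postgres", rationale="jsonb support"),
]
assert all(type(e).__name__ in ENTITY_TYPES for e in extracted)
```

The point of the constraint is the one ODKE+ and GoLLIE measure: forcing output into a schema your domain defines raises extraction precision over letting the model label things however it likes.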
 
@@ -56,7 +41,7 @@ This is where it gets interesting. The graph gives you a substrate for experimen
 
  Gralkor replaces the native memory plugin entirely, taking the memory slot.
 
- - **`memory_search`** — searches the knowledge graph and returns relevant facts
+ - **`memory_search`** — searches the knowledge graph and returns relevant facts and entity summaries
  - **`memory_add`** — stores information in the knowledge graph; Graphiti extracts entities and relationships
  - **`memory_build_indices`** — rebuilds search indices and constraints (maintenance)
  - **`memory_build_communities`** — detects and builds entity communities/clusters to improve search quality (maintenance)
@@ -93,13 +78,7 @@ openclaw config set plugins.entries.gralkor.config.test true
  ### 3. Install the plugin
 
  ```bash
- openclaw plugins install @susu-eng/gralkor
- ```
-
- OpenClaw checks ClawHub before npm for bare package specs, so this installs from ClawHub automatically. To be explicit:
-
- ```bash
- openclaw plugins install clawhub:@susu-eng/gralkor
+ openclaw plugins install npm:@susu-eng/gralkor
  ```
 
  From a tarball (e.g. for air-gapped deploys):
@@ -137,26 +116,24 @@ Start chatting with your agent. Gralkor works in the background:
  - **Auto-capture**: Full multi-turn conversations are stored in the knowledge graph after each agent run
  - **Auto-recall**: Before the agent responds, relevant facts and entities are retrieved and injected as context
 
- ### Reinstalling / upgrading
-
- The plugin dir (`~/.openclaw/extensions/gralkor`) is ephemeral — it can be deleted and reinstalled freely. The `dataDir` is persistent — the venv and FalkorDB database survive across reinstalls.
-
- To reinstall:
+ ### Upgrading
 
  ```bash
- # Clear the memory slot first (otherwise install fails config validation)
- openclaw config set plugins.slots.memory ""
+ openclaw plugins update gralkor
+ ```
 
- # Remove old plugin code
- rm -rf ~/.openclaw/extensions/gralkor
+ ### Reinstalling
 
- # Reinstall
- openclaw plugins install @susu-eng/gralkor
+ The plugin dir (`~/.openclaw/extensions/gralkor`) is ephemeral — it can be deleted and reinstalled freely. The `dataDir` is persistent — the venv and FalkorDB database survive across reinstalls.
 
- # Re-assign slot
+ ```bash
+ openclaw plugins uninstall gralkor
+ openclaw plugins install npm:@susu-eng/gralkor
  openclaw config set plugins.slots.memory gralkor
  ```
 
+ `uninstall` removes the plugin files and resets the memory slot automatically.
+
  The second boot is fast (~4s) because the venv in `dataDir` is reused.
 
  ### LLM providers
@@ -153,5 +153,5 @@
  "label": "Groq API key"
  }
  },
- "version": "27.2.0"
+ "version": "27.2.3"
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@susu-eng/gralkor",
- "version": "27.2.0",
+ "version": "27.2.3",
  "description": "OpenClaw memory plugin powered by Graphiti knowledge graphs and FalkorDB",
  "type": "module",
  "main": "./dist/index.js",
@@ -24,11 +24,11 @@
  ],
  "repository": {
  "type": "git",
- "url": "https://github.com/susu-eng/gralkor.git"
+ "url": "git+ssh://git@github.com/elimydlarz/gralkor.git"
  },
- "homepage": "https://github.com/susu-eng/gralkor#readme",
+ "homepage": "https://github.com/elimydlarz/gralkor#readme",
  "bugs": {
- "url": "https://github.com/susu-eng/gralkor/issues"
+ "url": "https://github.com/elimydlarz/gralkor/issues"
  },
  "author": "susu-eng",
  "keywords": [
@@ -75,7 +75,8 @@
  "docker:up": "pnpm run docker:build && docker compose up -d",
  "docker:down": "docker compose down",
  "docker:logs": "docker compose logs graphiti",
- "publish:npm": "bash scripts/publish.sh",
- "publish:clawhub": "bash scripts/publish-clawhub.sh"
+ "publish:npm": "bash scripts/publish-npm.sh",
+ "publish:clawhub": "bash scripts/publish-clawhub.sh",
+ "publish:all": "bash scripts/publish-all.sh"
  }
  }
package/server/main.py CHANGED
@@ -287,6 +287,42 @@ def _find_rate_limit_error(exc: Exception) -> Exception | None:
      return None
 
 
+ _CREDENTIAL_HINTS = ("api key", "apikey", "credential", "authentication", "expired", "unauthorized")
+
+
+ def _downstream_llm_response(exc: Exception) -> JSONResponse:
+     """Map a downstream LLM provider error to an appropriate HTTP response."""
+     http_code = int(getattr(exc, "status_code", None) or getattr(exc, "code", None))
+     msg = str(exc).split("\n")[0][:200]
+
+     if 400 <= http_code < 500:
+         if http_code == 400:
+             status = 503 if any(h in msg.lower() for h in _CREDENTIAL_HINTS) else 500
+         elif http_code in (401, 403):
+             status = 503
+         elif http_code in (404, 422):
+             status = 500
+         else:
+             status = 502
+     else:
+         status = 502
+
+     return JSONResponse(status_code=status, content={"error": "provider error", "detail": msg})
+
+
+ def _find_downstream_llm_error(exc: Exception) -> Exception | None:
+     """Walk the exception chain to find a downstream LLM provider error with an HTTP status code."""
+     current: Exception | None = exc
+     seen: set[int] = set()
+     while current is not None and id(current) not in seen:
+         seen.add(id(current))
+         http_code = getattr(current, "status_code", None) or getattr(current, "code", None)
+         if http_code is not None and int(http_code) != 429:
+             return current
+         current = current.__cause__ or current.__context__
+     return None
+
+
  _DEFAULT_RETRY_AFTER = 5 # seconds
 
 
@@ -306,6 +342,9 @@ async def rate_limit_middleware(request, call_next):
                  content={"detail": msg},
                  headers={"retry-after": str(int(retry_after))},
              )
+         llm_err = _find_downstream_llm_error(exc)
+         if llm_err is not None:
+             return _downstream_llm_response(llm_err)
          raise