get-claudia 1.57.0 → 1.58.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +31 -0
- package/README.md +0 -13
- package/memory-daemon/claudia_memory/__main__.py +73 -0
- package/memory-daemon/claudia_memory/mcp/server.py +77 -0
- package/memory-daemon/claudia_memory/services/backfill.py +346 -0
- package/memory-daemon/claudia_memory/services/entities.py +192 -0
- package/memory-daemon/claudia_memory/services/remember.py +46 -7
- package/package.json +1 -1
- package/template-v2/.claude/manifest.json +4 -4
- package/template-v2/.claude/rules/claudia-principles.md +1 -1
- package/template-v2/.mcp.json.example +0 -10
- package/template-v2/CLAUDE.md +1 -79
package/CHANGELOG.md
@@ -2,6 +2,37 @@
 
 All notable changes to Claudia will be documented in this file.
 
+## 1.58.0 (2026-05-13)
+
+### The Memory Reliability Release
+
+Five PRs that fix the memory layer's biggest recurring failure mode and lock in the integration philosophy. After this release, memory writes that name entities ("Matt Blumberg") actually create those entities with the correct type. The release history's recurring memory-fix releases ("Recall Recovery", "Vector Search Fix", "Semantic Search Actually Works Now") get permanent regression-test sentinels so the same bug classes can't quietly come back. And the codebase loses a dual-maintenance hazard that was already costing time.
+
+#### Fixed
+
+- **`memory_remember` actually links entities and infers their type correctly (#54)** -- A confirmed bug from 2026-05-13: calling `memory_remember(content="Matt Blumberg said X", entities=["Matt Blumberg", "Markup AI"])` was creating entities but assigning them `type: person` by default, even when the name clearly indicated an organization. "Markup AI" was being saved as a person. The real bug was in `_infer_entity_type` -- it didn't recognise `AI` / `.ai` / `Co.` as corporate suffixes and fell back to `person`. Fixed with a pure-function rule-based type inference (corporate suffixes -> organization, project keywords -> project, person patterns -> person, fallback -> concept, never default to person). Plus a new `claudia-memory --backfill-entities` CLI to retroactively link orphaned references in existing user databases.
+
+#### Added
+
+- **`claudia memory backfill-entities` command (#54)** -- Default dry-run: prints a plan and writes nothing. `--apply` makes a timestamped backup to `~/.claudia/backups/memory-{timestamp}.db` first, then applies the backfill. Idempotent: re-running on an already-backfilled DB is a no-op. Aborts cleanly if backup creation fails.
+- **5 regression tests for recurring bug classes (#56)** -- New `memory-daemon/tests/test_recurring_regressions.py` adds permanent forward-looking sentinels for: entity linking on `memory_remember`, recall returning results after seed writes, embedding migration preserving recall, daemon startup tolerating stale SHM files, and `memory_briefing` returning a valid structure on an empty database. Each test docstring names the historical releases where its bug class appeared (v1.35.x, v1.51.5, v1.51.18, v1.55.7, v1.55.8, v1.55.14, v1.21.1, v1.40.1).
+- **API parameter aliases for read-side MCP tools (#57)** -- `memory_about` now accepts `entity_name` and `name` alongside `entity`. `memory_relate` accepts `source_entity` / `target_entity` / `relationship_type` alongside `source` / `target` / `relationship`. `memory_recall` accepts `q` and `search` alongside `query`. Purely additive: every existing caller continues to work unchanged. Aliases normalize at the MCP boundary; service-layer signatures are untouched. If both canonical and alias are passed in the same call, canonical wins.
+
+#### Removed
+
+- **Rube (Composio) MCP integration as a bundled default (#41)** -- Rube is no longer a recommended or bundled MCP server in `.mcp.json.example` (root and template-v2), README, or the Claudia documentation. Locks in the direct-integrations-only philosophy (claude.ai-native MCPs + user-built custom MCPs like Gmail/Calendar). Existing users with `rube` already configured continue to work unchanged; the installer simply no longer ships Rube as an example. The "Tool configuration" example in `claudia-principles.md` was updated to vendor-neutral phrasing.
+- **Legacy `claudia/` sibling files (#55)** -- Removed 3 stale sibling files (`post-tool-capture.py`, `session-health-check.py`, `settings.local.json`) that lived under `claudia/`. These were never reaching users (the installer ships from `template-v2/` only), but every hook bug fix had to remember to patch both locations. The dual-maintenance hazard was real: PR #38's sibling-fix step had to apply the same env-var fix twice. Removed at the source.
+
+#### Stats
+
+- **43 new tests** across 4 files (22 entity-resolution tests in #54, 5 regression sentinels in #56, 16 alias tests in #57)
+- **805 total daemon tests passing** (up from 762 before the v1.57.0 chain), 0 regressions
+- TDD sensitivity proofs for every behavior change: tests fail on the un-modified code, pass after the fix
+- 5 PRs merged, all with stop-gates and TDD discipline
+
+#### Notes
+
+- The bug in #54 was different from the original proposal (#51) described. The proposal said "entities are silently ignored." Actually the entity *records* were getting created -- the bug was that they were all getting `type: person`. Fixing the actual bug rather than the imagined one was a better outcome.
+- The `claudia memory backfill-entities` command surface lives on the daemon's argparse (alongside `--backfill-embeddings`, `--migrate-vault-para`), not as a `claudia memory ...` subcommand on the Node CLI. The Node CLI is the installer, not a memory-command dispatcher.
+- Aliases are NOT yet advertised in the MCP `list_tools()` `inputSchema`. They are tolerantly accepted at the request boundary. Schema-level advertisement is a future enhancement if it proves needed for client discoverability.
+
+---
+
 ## 1.57.0 (2026-05-13)
 
 ### The Curated Memory Release
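The #54 entry describes the new type inference as an ordered rule list in which the first match wins. Below is a minimal sketch of that ordering, assuming whitespace tokenisation and lowercase whole-token matching, since the body of `infer_entity_type` is not visible in this diff. The keyword tables are copied from the `entities.py` hunk; `sketch_infer_entity_type` is a hypothetical name, and the returned labels follow the changelog's wording rather than confirmed constants from the package.

```python
import re

# Keyword tables copied from the entities.py hunk in this diff.
ORG_WORDS = frozenset({
    "inc", "inc.", "llc", "ltd", "ltd.", "corp", "corp.", "corporation",
    "co", "co.", "company", "gmbh", "ag", "sa", "plc", "foundation",
    "university", "institute", "lab", "labs", "associates", "group",
    "partners", "ai",
})
ORG_DOMAINS = (".ai", ".io", ".com", ".dev", ".so", ".co")
PROJECT_WORDS = frozenset({"project", "sprint", "mvp", "initiative", "campaign", "rollout"})
CONCEPT_WORDS = frozenset({"methodology", "framework", "theory", "protocol", "strategy", "principle"})
LOCATION_WORDS = frozenset({"office", "hq", "headquarters", "campus", "building"})
PERSON_RE = re.compile(r"^[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+$")


def sketch_infer_entity_type(name: str) -> str:
    """First matching rule wins; the token-matching details are assumptions."""
    tokens = [t.lower() for t in name.split()]
    if not tokens:
        return "concept"
    if any(t in LOCATION_WORDS for t in tokens):  # 1. location beats organisation
        return "location"
    if any(t in ORG_WORDS for t in tokens) or tokens[-1].endswith(ORG_DOMAINS):
        return "organization"  # 2. corporate token or domain-style trailing suffix
    if any(t in PROJECT_WORDS for t in tokens):  # 3. project keywords
        return "project"
    if any(t in CONCEPT_WORDS for t in tokens):  # 4. concept keywords
        return "concept"
    if PERSON_RE.match(name.strip()):  # 5. two or more Title-Case words
        return "person"
    return "concept"  # 6. fallback is never "person"


for example in ("Markup AI", "Matt Blumberg", "Company HQ", "Markup"):
    print(f"{example!r} -> {sketch_infer_entity_type(example)}")
# 'Markup AI' -> organization
# 'Matt Blumberg' -> person
# 'Company HQ' -> location
# 'Markup' -> concept
```

Checking location keywords before corporate ones is what lets a name containing both, such as "Company HQ", resolve to a location, and a lone ambiguous token like "Markup" now lands on the `concept` fallback instead of `person`.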
package/README.md
@@ -320,19 +320,6 @@ This generates a one-click URL to enable all required Google APIs and walks you
 | **Extended** | 83 | Core + Docs, Sheets, Tasks, Chat |
 | **Complete** | 111 | Extended + Slides, Forms, Apps Script |
 
-### 500+ Apps via Rube
-
-[Rube](https://rube.app) (by Composio) connects Claudia to Slack, Notion, Jira, GitHub, Linear, HubSpot, Stripe, Figma, and hundreds more through one-click OAuth. No per-app MCP setup needed.
-
-| Category | Examples |
-|----------|----------|
-| **Communication** | Slack, Discord, Teams, Telegram |
-| **Project Management** | Jira, Linear, Asana, Trello, Monday.com |
-| **Knowledge & Docs** | Notion, Confluence, Google Docs, Coda |
-| **Code & Dev** | GitHub, GitLab, Bitbucket |
-| **CRM & Sales** | HubSpot, Salesforce, Pipedrive |
-| **And 500+ more** | [Browse the full list](https://rube.app) |
-
 ### Obsidian Vault
 
 Memory auto-syncs to an Obsidian vault at `~/.claudia/vault/` using PARA structure. Every entity becomes a markdown note with `[[wikilinks]]`, so Obsidian's graph view maps your network. SQLite is the source of truth; the vault is a read-only projection you can browse and search.
package/memory-daemon/claudia_memory/__main__.py
@@ -1110,6 +1110,24 @@ def main():
         action="store_true",
         help="Preview mode for --migrate-vault-para: show routing plan without making changes",
     )
+    parser.add_argument(
+        "--backfill-entities",
+        action="store_true",
+        help=(
+            "Scan memories for un-linked entity references and propose "
+            "creating/linking them (Proposal #51). Dry-run by default; "
+            "pass --apply to write changes (creates a SQLite backup first)."
+        ),
+    )
+    parser.add_argument(
+        "--apply",
+        action="store_true",
+        help=(
+            "With --backfill-entities: actually write the changes. "
+            "A SQLite backup is created at ~/.claudia/backups/ before any "
+            "writes; if backup creation fails, the command aborts."
+        ),
+    )
     parser.add_argument(
         "--migrate-legacy",
         action="store_true",
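Both new options are `store_true` switches, so argparse exposes them as `args.backfill_entities` and `args.apply`; in the code shown in this diff, `--apply` is only consulted inside the `--backfill-entities` branch. The standalone sketch below declares just these two flags (the parser's other options are omitted, and `describe` is a hypothetical helper) to show the three resulting modes.

```python
import argparse

# Only the two options added in this hunk; the real parser defines many more.
parser = argparse.ArgumentParser(prog="claudia-memory")
parser.add_argument("--backfill-entities", action="store_true")
parser.add_argument("--apply", action="store_true")


def describe(argv):
    """Hypothetical helper: report which backfill mode a command line selects."""
    args = parser.parse_args(argv)
    # argparse maps "--backfill-entities" to the attribute backfill_entities.
    if not args.backfill_entities:
        return "no backfill requested"
    return "apply (backup first)" if args.apply else "dry-run (plan only)"


print(describe(["--backfill-entities"]))             # dry-run (plan only)
print(describe(["--backfill-entities", "--apply"]))  # apply (backup first)
print(describe(["--apply"]))                         # no backfill requested
```

Making the dry run the default means a bare `--backfill-entities` can never modify the database; writing requires the explicit second flag.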
@@ -1704,6 +1722,61 @@ def main():
         run_para_migration(vault_path, db=db, preview=args.preview)
         return
 
+    if args.backfill_entities:
+        # Entity-link backfill (Proposal #51). Dry-run by default; --apply
+        # writes after creating a SQLite backup.
+        setup_logging(debug=args.debug)
+        from datetime import datetime as _dt
+
+        from .services.backfill import (
+            apply_backfill,
+            format_plan_summary,
+            plan_backfill,
+        )
+
+        db = get_db()
+        db.initialize()
+
+        plan = plan_backfill(db)
+        print(format_plan_summary(plan))
+
+        if not args.apply:
+            # Dry-run path: we already printed the plan; nothing more to do.
+            return
+
+        # --apply: take the mandatory backup first.
+        config = get_config()
+        backups_dir = Path(config.backup_dir)
+        try:
+            backups_dir.mkdir(parents=True, exist_ok=True)
+        except OSError as e:
+            print(
+                f"\nCannot create backup directory {backups_dir}: {e}\n"
+                "Aborting before any database writes."
+            )
+            sys.exit(1)
+
+        timestamp = _dt.utcnow().strftime("%Y-%m-%dT%H%M%SZ")
+        backup_path = backups_dir / f"memory-{timestamp}.db"
+
+        try:
+            result = apply_backfill(db, plan, backup_path=backup_path)
+        except Exception as e:
+            print(
+                f"\nBackfill aborted (no DB writes performed): {e}\n"
+                f"Backup target was: {backup_path}"
+            )
+            sys.exit(1)
+
+        print(
+            "\nBackfill applied:\n"
+            f" backup written to: {result.backup_path}\n"
+            f" entities created: {result.entities_created}\n"
+            f" entities reused: {result.entities_reused}\n"
+            f" memory_entities links created: {result.links_created}"
+        )
+        return
+
     if args.merge_databases:
         # Manual consolidation of hash-named databases
         setup_logging(debug=args.debug)
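The backup file is named `memory-{timestamp}.db` with a UTC timestamp in `%Y-%m-%dT%H%M%SZ` form. The hunk obtains that time via `datetime.utcnow()`, which is deprecated as of Python 3.12; the sketch below shows a timezone-aware equivalent that yields the same string. `backup_filename` is a hypothetical helper, and the fixed datetime exists only to make the example output predictable.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def backup_filename(backups_dir: Path, now: Optional[datetime] = None) -> Path:
    """Hypothetical helper: memory-{timestamp}.db using the diff's format string."""
    # datetime.now(timezone.utc) is the aware replacement for datetime.utcnow().
    now = now or datetime.now(timezone.utc)
    timestamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    return backups_dir / f"memory-{timestamp}.db"


fixed = datetime(2026, 5, 13, 14, 25, 30, tzinfo=timezone.utc)
print(backup_filename(Path("backups"), fixed).name)  # memory-2026-05-13T142530Z.db
```

Dropping the colons from the time part keeps the name valid on filesystems that reject `:` in file names, while the trailing `Z` still marks the value as UTC.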
package/memory-daemon/claudia_memory/mcp/server.py
@@ -141,6 +141,79 @@ def _require(arguments: dict, key: str, tool_name: str):
     return value
 
 
+# ── Parameter-name aliases (v1.58.0 PR E) ──
+#
+# The memory MCP tools historically used different parameter conventions
+# (entity vs source/target/relationship vs query). The aliases below let
+# callers use a consistent variant while every existing caller continues
+# to work unchanged. Normalization happens here at the MCP boundary;
+# service-layer signatures in claudia_memory/services/ are untouched.
+#
+# Rules:
+# 1. Purely additive. The canonical name continues to work as before.
+# 2. If both the canonical name and an alias are provided in the same
+# call, the canonical name wins (the alias is left in place and is
+# not consulted by the handler).
+# 3. Otherwise, the first matching alias is renamed to the canonical
+# key and the alias key is removed from the arguments dict so the
+# handler only ever sees the canonical name.
+
+_PARAM_ALIASES: Dict[str, Dict[str, List[str]]] = {
+    "memory_about": {
+        "entity": ["entity_name", "name"],
+    },
+    "memory_relate": {
+        "source": ["source_entity"],
+        "target": ["target_entity"],
+        "relationship": ["relationship_type"],
+    },
+    "memory_recall": {
+        "query": ["q", "search"],
+    },
+}
+
+
+def _normalize_params(arguments: dict, canonical: str, aliases: List[str]) -> dict:
+    """Resolve alias parameter names to the canonical name.
+
+    If the canonical name is already present in `arguments`, it wins and
+    `arguments` is returned unchanged. Otherwise, the first alias from
+    `aliases` that is present is renamed to the canonical key, and the
+    alias key is removed. If no alias matches either, `arguments` is
+    returned unchanged.
+
+    The function never mutates the caller's dict: when a rewrite is
+    needed it returns a shallow copy with the alias key replaced.
+    """
+    if canonical in arguments:
+        return arguments
+    for alias in aliases:
+        if alias in arguments:
+            arguments = dict(arguments)
+            arguments[canonical] = arguments.pop(alias)
+            return arguments
+    return arguments
+
+
+def _apply_parameter_aliases(tool_name: str, arguments: dict) -> dict:
+    """Apply all registered aliases for the given tool, if any.
+
+    Tools without an entry in `_PARAM_ALIASES` (the vast majority) get
+    their arguments back unchanged. Dot-notation aliases (e.g.
+    'memory.about') are routed to the same canonical tool name for the
+    purposes of alias lookup.
+    """
+    # Dot-notation aliases (memory.about, etc.) share the same canonical
+    # alias map as their underscore counterparts.
+    lookup_name = tool_name.replace(".", "_", 1) if "." in tool_name else tool_name
+    aliases_for_tool = _PARAM_ALIASES.get(lookup_name)
+    if not aliases_for_tool:
+        return arguments
+    for canonical, alias_list in aliases_for_tool.items():
+        arguments = _normalize_params(arguments, canonical, alias_list)
+    return arguments
+
+
 MAX_RESPONSE_BYTES = 50_000
 
 
@@ -3142,6 +3215,10 @@ async def call_tool(name: str, arguments: Dict[str, Any]) -> CallToolResult:
     """Handle tool calls via dispatch registry."""
     db = get_db()
     try:
+        # Normalize parameter-name aliases at the MCP boundary so handlers
+        # only ever see canonical parameter names. Purely additive: tools
+        # without registered aliases are unchanged. See _PARAM_ALIASES.
+        arguments = _apply_parameter_aliases(name, arguments)
         with db.transaction():
             handler = _TOOL_HANDLERS.get(name)
             if handler:
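Rules 2 and 3 in the alias comment block can be checked directly. The sketch below reproduces the logic of `_normalize_params` and the dot-notation lookup of `_apply_parameter_aliases` with a trimmed alias table, so it runs without importing the package; the underscore-free names are local copies for illustration, not the module's API.

```python
from typing import Dict, List

# Trimmed copy of the alias table from the server.py hunk.
PARAM_ALIASES: Dict[str, Dict[str, List[str]]] = {
    "memory_recall": {"query": ["q", "search"]},
    "memory_about": {"entity": ["entity_name", "name"]},
}


def normalize_params(arguments: dict, canonical: str, aliases: List[str]) -> dict:
    # Same logic as _normalize_params: canonical wins, else the first alias is renamed.
    if canonical in arguments:
        return arguments
    for alias in aliases:
        if alias in arguments:
            arguments = dict(arguments)  # shallow copy; the caller's dict is untouched
            arguments[canonical] = arguments.pop(alias)
            return arguments
    return arguments


def apply_aliases(tool_name: str, arguments: dict) -> dict:
    # Same lookup as _apply_parameter_aliases, including the dot-notation mapping.
    lookup = tool_name.replace(".", "_", 1) if "." in tool_name else tool_name
    for canonical, alias_list in PARAM_ALIASES.get(lookup, {}).items():
        arguments = normalize_params(arguments, canonical, alias_list)
    return arguments


original = {"q": "Markup AI"}
print(apply_aliases("memory_recall", original))  # {'query': 'Markup AI'}
print(original)                                   # {'q': 'Markup AI'}
print(apply_aliases("memory.recall", {"query": "a", "search": "b"}))
# {'query': 'a', 'search': 'b'}  (canonical wins, alias left in place)
print(apply_aliases("memory_about", {"name": "Matt", "entity_name": "Matthew"}))
# {'name': 'Matt', 'entity': 'Matthew'}  (first listed alias wins)
```

The last case shows a subtlety of rule 3: only the first listed alias that is present gets renamed, so a second alias passed in the same call (`name` here) stays in the dict next to the canonical key.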
package/memory-daemon/claudia_memory/services/backfill.py
@@ -0,0 +1,346 @@
+"""Entity-link backfill command (Proposal #51).
+
+The pre-v1.58 write path linked entities to memories *most* of the time
+but auto-created organisations as type=person and could miss entities
+referenced only in the content (no ``about_entities`` array supplied).
+
+This module scans existing memories that have no entity links and
+proposes new entity creations + ``memory_entities`` rows. Two phases:
+
+* ``plan_backfill(db)`` -- pure read. Returns a :class:`BackfillPlan`
+  with everything it would do. No writes.
+* ``apply_backfill(db, plan, backup_path)`` -- writes. **First** creates
+  a SQLite backup at ``backup_path``. If backup fails, raises BEFORE
+  any DB modification.
+
+CLI entry points live in ``claudia_memory/__main__.py``:
+``claudia-memory --backfill-entities`` (dry-run; default) and
+``claudia-memory --backfill-entities --apply``.
+
+No new deps. No schema migrations. Idempotent on re-apply.
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+import sqlite3
+from dataclasses import dataclass, field
+from datetime import datetime
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+from .entities import infer_entity_type
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Plan / Result dataclasses
+# ---------------------------------------------------------------------------
+
+
+@dataclass
+class BackfillPlan:
+    """Read-only plan of what apply_backfill would do.
+
+    Attributes:
+        orphan_count: number of memory rows with zero memory_entities links
+            that the planner thinks SHOULD have at least one link.
+        proposed_entities: list of dicts ``{"name": str, "inferred_type":
+            str, "memory_ids": [int, ...]}``. Each dict represents a name
+            we detected in memory content for which we will (a) create the
+            entity if missing, (b) link it to those memories.
+        scanned_memories: total memories the planner looked at.
+    """
+
+    orphan_count: int = 0
+    proposed_entities: List[Dict[str, Any]] = field(default_factory=list)
+    scanned_memories: int = 0
+
+
+@dataclass
+class BackfillResult:
+    """Counts of writes performed by apply_backfill."""
+
+    entities_created: int = 0
+    entities_reused: int = 0
+    links_created: int = 0
+    backup_path: Optional[Path] = None
+
+
+# ---------------------------------------------------------------------------
+# Name detection -- intentionally conservative
+# ---------------------------------------------------------------------------
+
+# Two or more capitalised words: a reasonable signal for proper nouns.
+# We won't catch single-word entities like "Acme" here -- that prevents a
+# flood of false positives like "The", "She", "Monday" at sentence starts.
+_PROPER_NOUN_RE = re.compile(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b")
+
+# Things we never want to propose as entity names.
+_STOPWORDS = frozenset(
+    {
+        "Project",  # Without a following noun, this is the keyword itself.
+        "Inc",
+        "LLC",
+        "Corp",
+        "AI",
+        "Ltd",
+        "Co",
+    }
+)
+
+
+def _candidate_names(content: str) -> List[str]:
+    """Extract proper-noun candidate names from memory content.
+
+    Returns a list of unique, order-preserved candidates.
+    """
+    if not content:
+        return []
+
+    seen: Dict[str, None] = {}
+    for match in _PROPER_NOUN_RE.finditer(content):
+        raw = match.group(1).strip()
+        # Reject single-token stopwords we accidentally captured.
+        if raw in _STOPWORDS:
+            continue
+        seen.setdefault(raw, None)
+    return list(seen.keys())
+
+
+# ---------------------------------------------------------------------------
+# Phase 1: plan_backfill (NO writes)
+# ---------------------------------------------------------------------------
+
+
+def plan_backfill(db) -> BackfillPlan:
+    """Scan memories with no entity links and propose new links.
+
+    Args:
+        db: The Database object (sqlite wrapper).
+
+    Returns:
+        A :class:`BackfillPlan`. The caller can inspect ``orphan_count``
+        and ``proposed_entities`` before deciding to ``--apply``.
+
+    This function MUST NOT write to the database. Tests assert this.
+    """
+    plan = BackfillPlan()
+
+    # Find memories that have no entity link at all and have content
+    # that looks like it mentions someone or something.
+    rows = db.execute(
+        """
+        SELECT m.id, m.content
+        FROM memories m
+        LEFT JOIN memory_entities me ON m.id = me.memory_id
+        WHERE me.memory_id IS NULL
+        AND m.invalidated_at IS NULL
+        AND m.content IS NOT NULL
+        """,
+        fetch=True,
+    ) or []
+
+    plan.scanned_memories = len(rows)
+    if not rows:
+        return plan
+
+    # name -> {"inferred_type": str, "memory_ids": [int]}
+    by_name: Dict[str, Dict[str, Any]] = {}
+
+    for row in rows:
+        memory_id = row["id"]
+        content = row["content"]
+        names = _candidate_names(content)
+        if not names:
+            continue
+        plan.orphan_count += 1
+        for name in names:
+            entry = by_name.setdefault(
+                name,
+                {
+                    "name": name,
+                    "inferred_type": infer_entity_type(name, content),
+                    "memory_ids": [],
+                },
+            )
+            entry["memory_ids"].append(memory_id)
+
+    plan.proposed_entities = list(by_name.values())
+    return plan
+
+
+# ---------------------------------------------------------------------------
+# Phase 2: apply_backfill (WRITES, but only after a successful backup)
+# ---------------------------------------------------------------------------
+
+
+def _create_backup(db, backup_path: Path) -> Path:
+    """Write a SQLite-native backup of ``db`` to ``backup_path``.
+
+    Uses :meth:`sqlite3.Connection.backup` for crash-consistent copy.
+    Creates parent directories. Raises on any failure so the caller can
+    abort the apply before touching the main DB.
+    """
+    backup_path = Path(backup_path)
+    backup_path.parent.mkdir(parents=True, exist_ok=True)
+
+    # The Database wrapper exposes a thread-local connection via
+    # ``_get_connection``. We do not capture it as a long-lived
+    # attribute -- always ask the wrapper for the live one.
+    if hasattr(db, "_get_connection"):
+        source_conn = db._get_connection()  # noqa: SLF001
+    elif hasattr(db, "conn"):
+        source_conn = db.conn
+    else:
+        raise RuntimeError(
+            "Cannot create backup: db has no _get_connection or conn attribute"
+        )
+
+    target = sqlite3.connect(str(backup_path))
+    try:
+        source_conn.backup(target)
+    finally:
+        target.close()
+
+    if not backup_path.exists() or backup_path.stat().st_size == 0:
+        raise RuntimeError(f"Backup file at {backup_path} is missing or empty")
+
+    return backup_path
+
+
+def _ensure_entity_for_backfill(
+    db, name: str, entity_type: str
+) -> tuple[int, bool]:
+    """Return (entity_id, created_now).
+
+    Looks up by canonical_name (lowercased). Returns the existing id
+    if found, else inserts a new row. Does not touch embeddings (the
+    main daemon's normal flow will pick those up on next access).
+    """
+    canonical = name.lower().strip()
+    existing = db.get_one(
+        "entities",
+        where="canonical_name = ?",
+        where_params=(canonical,),
+    )
+    if existing:
+        return existing["id"], False
+
+    now = datetime.utcnow().isoformat()
+    new_id = db.insert(
+        "entities",
+        {
+            "name": name,
+            "type": entity_type,
+            "canonical_name": canonical,
+            "importance": 1.0,
+            "created_at": now,
+            "updated_at": now,
+        },
+    )
+    return new_id, True
+
+
+def apply_backfill(db, plan: BackfillPlan, backup_path: Path) -> BackfillResult:
+    """Apply the plan after first taking a SQLite backup.
+
+    Args:
+        db: Database wrapper.
+        plan: A :class:`BackfillPlan` from :func:`plan_backfill`.
+        backup_path: Where to write the SQLite backup. Required.
+
+    Returns:
+        A :class:`BackfillResult` with counts.
+
+    Raises:
+        Anything raised by :func:`_create_backup`. If the backup step
+        fails, NO writes are performed.
+    """
+    backup_path = Path(backup_path)
+
+    # Backup MUST come first. If it fails, abort before any DB write.
+    created_backup = _create_backup(db, backup_path)
+    logger.info("Backfill: backup created at %s", created_backup)
+
+    result = BackfillResult(backup_path=created_backup)
+
+    for proposal in plan.proposed_entities:
+        name = proposal["name"]
+        entity_type = proposal["inferred_type"]
+        memory_ids = proposal["memory_ids"]
+
+        entity_id, created_now = _ensure_entity_for_backfill(db, name, entity_type)
+        if created_now:
+            result.entities_created += 1
+        else:
+            result.entities_reused += 1
+
+        for memory_id in memory_ids:
+            try:
+                db.insert(
+                    "memory_entities",
+                    {
+                        "memory_id": memory_id,
+                        "entity_id": entity_id,
+                        "relationship": "about",
+                    },
+                )
+                result.links_created += 1
+            except Exception as e:
+                # Duplicate link (memory already has it) is harmless.
+                logger.debug(
+                    "Backfill: skipping duplicate link memory=%s entity=%s: %s",
+                    memory_id,
+                    entity_id,
+                    e,
+                )
+
+    logger.info(
+        "Backfill applied: %d entities created, %d reused, %d links",
+        result.entities_created,
+        result.entities_reused,
+        result.links_created,
+    )
+    return result
+
+
+# ---------------------------------------------------------------------------
+# CLI helper: render a plan summary for the dry-run output
+# ---------------------------------------------------------------------------
+
+
+def format_plan_summary(plan: BackfillPlan) -> str:
+    """Human-readable summary of a plan (printed in dry-run mode)."""
+    lines = [
+        "Entity-link backfill plan (dry-run, no writes):",
+        f" Scanned memories without links: {plan.scanned_memories}",
+        f" Memories with orphan name references: {plan.orphan_count}",
+        f" Proposed new/linked entities: {len(plan.proposed_entities)}",
+    ]
+    if plan.proposed_entities:
+        lines.append("")
+        lines.append(" By inferred type:")
+        type_counts: Dict[str, int] = {}
+        for p in plan.proposed_entities:
+            type_counts[p["inferred_type"]] = (
+                type_counts.get(p["inferred_type"], 0) + 1
+            )
+        for t, n in sorted(type_counts.items()):
+            lines.append(f" {t}: {n}")
+        # Show a small sample so the user can sanity-check.
+        sample = plan.proposed_entities[:10]
+        lines.append("")
+        lines.append(" Sample (first 10):")
+        for p in sample:
+            lines.append(
+                f" - {p['name']!r} -> {p['inferred_type']} "
+                f"({len(p['memory_ids'])} memory link(s))"
+            )
+    lines.append("")
+    lines.append(
+        "Run with --apply to write changes. A SQLite backup will be created first."
+    )
+    return "\n".join(lines)
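Two behaviours of `backfill.py` can be exercised in isolation: `_PROPER_NOUN_RE` proposes only multi-word Title-Case names, and the `LEFT JOIN ... IS NULL` orphan query stops returning a memory once it has any `memory_entities` row, which is why a second run finds nothing new. The sketch below runs both against an in-memory SQLite database. The regex and the orphan query are copied from the module; the three-table schema and the `backfill_once` helper are hypothetical simplifications (it uses `INSERT OR IGNORE` where the module uses the wrapper's `get_one`/`insert` pair, and it skips the stopword filter). The pre-write copy uses the standard-library `sqlite3.Connection.backup`, the same call `_create_backup` relies on.

```python
import re
import sqlite3

# Same pattern as _PROPER_NOUN_RE in backfill.py.
PROPER_NOUN_RE = re.compile(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b")

# Same orphan query as plan_backfill.
ORPHAN_SQL = """
    SELECT m.id, m.content
    FROM memories m
    LEFT JOIN memory_entities me ON m.id = me.memory_id
    WHERE me.memory_id IS NULL
      AND m.invalidated_at IS NULL
      AND m.content IS NOT NULL
"""


def candidate_names(content: str) -> list:
    # Unique, order-preserving matches (mirrors _candidate_names, minus stopwords).
    seen = {}
    for match in PROPER_NOUN_RE.finditer(content or ""):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical minimal schema; the daemon's real tables have more columns.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, invalidated_at TEXT);
        CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, canonical_name TEXT UNIQUE);
        CREATE TABLE memory_entities (memory_id INTEGER, entity_id INTEGER, relationship TEXT);
    """)
    conn.executemany(
        "INSERT INTO memories (id, content) VALUES (?, ?)",
        [(1, "Matt Blumberg said Markup AI is hiring"), (2, "remember to water plants")],
    )
    conn.commit()
    return conn


def backfill_once(conn: sqlite3.Connection) -> int:
    """Hypothetical helper: link each orphan's candidate names; return links created."""
    links = 0
    for memory_id, content in conn.execute(ORPHAN_SQL).fetchall():
        for name in candidate_names(content):
            canonical = name.lower().strip()
            conn.execute(
                "INSERT OR IGNORE INTO entities (name, canonical_name) VALUES (?, ?)",
                (name, canonical),
            )
            (entity_id,) = conn.execute(
                "SELECT id FROM entities WHERE canonical_name = ?", (canonical,)
            ).fetchone()
            conn.execute(
                "INSERT INTO memory_entities VALUES (?, ?, 'about')", (memory_id, entity_id)
            )
            links += 1
    conn.commit()
    return links


conn = build_demo_db()
backup = sqlite3.connect(":memory:")
conn.backup(backup)  # copy taken before any writes, as apply_backfill does
print(candidate_names("Matt Blumberg said Markup AI is hiring"))  # ['Matt Blumberg']
print(backfill_once(conn))  # 1
print(backfill_once(conn))  # 0: memory 1 is no longer an orphan
print(backup.execute("SELECT COUNT(*) FROM memory_entities").fetchone())  # (0,)
```

As the first output shows, `[A-Z][a-z]+` does not match an all-caps token such as `AI`, so a content-only mention of "Markup AI" is not proposed by this scanner as written.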
package/memory-daemon/claudia_memory/services/entities.py
@@ -0,0 +1,192 @@
+"""Entity resolution helpers.
+
+Centralised home for the entity-type inference heuristic and (eventually)
+shared resolution logic. Pure-function module: no DB access, no I/O.
+
+Proposal #51 (2026-05-13) traced a real bug where memory.remember +
+memory.relate were both classifying organisations like "Markup AI" as
+type="person" because the heuristic in services/remember.py did not
+recognise the "AI" corporate suffix and silently defaulted to person.
+
+The fix lives here so it can be re-used by both call sites
+(remember_fact's about_entities path and relate_entities' auto-create
+path) and tested as a pure function.
+
+No new dependencies: pure Python stdlib, rule-based, no LLM, no spaCy.
+"""
+
+from __future__ import annotations
+
+import re
+from typing import Optional
+
+# ---------------------------------------------------------------------------
+# Keyword tables -- kept narrow on purpose. Wider sets create false positives
+# more often than they catch new categories. Add reluctantly.
+# ---------------------------------------------------------------------------
+
+# Whole-word corporate signals (matched against lowercased tokens).
+_ORG_WORD_SUFFIXES = frozenset(
+    {
+        "inc",
+        "inc.",
+        "llc",
+        "ltd",
+        "ltd.",
+        "corp",
+        "corp.",
+        "corporation",
+        "co",
+        "co.",
+        "company",
+        "gmbh",
+        "ag",
+        "sa",
+        "plc",
+        "foundation",
+        "university",
+        "institute",
+        "lab",
+        "labs",
+        "associates",
+        "group",
+        "partners",
+        # "AI" as a standalone token has become a near-universal corporate
+        # marker (Anthropic AI, OpenAI, Markup AI, Hugging AI, etc.). Worth
+        # the rare false positive for a person literally named "AI".
+        "ai",
+    }
+)
+
+# Substring corporate signals on the trailing token (for dotted suffixes
+# like Hugging.ai, Acme.io, etc.).
+_ORG_DOMAIN_SUFFIXES = (".ai", ".io", ".com", ".dev", ".so", ".co")
+
+_PROJECT_KEYWORDS = frozenset(
+    {"project", "sprint", "mvp", "initiative", "campaign", "rollout"}
+)
+
+_CONCEPT_KEYWORDS = frozenset(
+    {"methodology", "framework", "theory", "protocol", "strategy", "principle"}
+)
+
+_LOCATION_KEYWORDS = frozenset(
+    {"office", "hq", "headquarters", "campus", "building"}
+)
+
+# Two-or-more capitalised words separated by spaces, no digits or punctuation,
+# e.g. "Matt Blumberg", "Mary Anne Smith". A reliable person signal when
+# no other classification fires.
+_PERSON_NAME_RE = re.compile(r"^[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+$")
+
+
+def infer_entity_type(name: str, content: str = "") -> str:
+    """Infer the canonical entity type from a name and (optional) content.
+
+    Heuristic rules, evaluated in order. The first rule to match wins:
+
+    1. **Location keywords** ("office", "hq", "campus") on any token. Checked
+       first so "Company HQ" is a location, not an organisation.
+    2. **Organisation signals** -- whole-word corporate tokens like ``inc``,
+       ``llc``, ``corp``, ``ai``, ``foundation``, or domain-style suffixes
+       (``.ai``, ``.io``) on the trailing token. Catches "Markup AI",
+       "Hugging.ai", "Acme Inc.".
+    3. **Project keywords** ("project", "sprint", "mvp"). Matches "Project
+       Phoenix" and "Phoenix Project" alike.
+    4. **Concept keywords** ("methodology", "framework", "strategy").
+    5. **Person pattern** -- two-or-more capitalised words ("Matt Blumberg",
+       "Mary Anne Smith") with no other classification signal.
+    6. **Fallback: concept**, never ``person``. Proposal #51 explicitly
+       rejected ``person`` as the default because a single ambiguous token
+       like "Markup" was getting auto-typed as a person whenever it appeared
+       in the entities list.
+
+    Args:
+        name: The entity name (case-insensitive matching applies).
+        content: Optional surrounding memory text. Currently unused by the
|
|
107
|
+
heuristic but accepted so callers can pass context without
|
|
108
|
+
changing the signature later (we may grow the rules to peek at
|
|
109
|
+
the memory body for "company", "the team", etc. cues).
|
|
110
|
+
|
|
111
|
+
Returns:
|
|
112
|
+
One of: ``"organization"``, ``"person"``, ``"project"``,
|
|
113
|
+
``"concept"``, ``"location"``.
|
|
114
|
+
"""
|
|
115
|
+
if not name or not name.strip():
|
|
116
|
+
return "concept"
|
|
117
|
+
|
|
118
|
+
stripped = name.strip()
|
|
119
|
+
lowered = stripped.lower()
|
|
120
|
+
tokens = lowered.split()
|
|
121
|
+
|
|
122
|
+
# 1. Location keywords first (so "Company HQ" -> location, not org).
|
|
123
|
+
for tok in tokens:
|
|
124
|
+
if tok.rstrip(".,") in _LOCATION_KEYWORDS:
|
|
125
|
+
return "location"
|
|
126
|
+
|
|
127
|
+
# 2. Organisation: whole-word suffix on ANY token.
|
|
128
|
+
for tok in tokens:
|
|
129
|
+
if tok.rstrip(".,") in _ORG_WORD_SUFFIXES:
|
|
130
|
+
return "organization"
|
|
131
|
+
if tok in _ORG_WORD_SUFFIXES:
|
|
132
|
+
return "organization"
|
|
133
|
+
|
|
134
|
+
# 2b. Organisation: domain-style suffix on the trailing token
|
|
135
|
+
# (Hugging.ai, Acme.io, etc.).
|
|
136
|
+
if tokens:
|
|
137
|
+
last = tokens[-1]
|
|
138
|
+
for suffix in _ORG_DOMAIN_SUFFIXES:
|
|
139
|
+
if last.endswith(suffix):
|
|
140
|
+
return "organization"
|
|
141
|
+
|
|
142
|
+
# 3. Project keywords.
|
|
143
|
+
for tok in tokens:
|
|
144
|
+
if tok.rstrip(".,") in _PROJECT_KEYWORDS:
|
|
145
|
+
return "project"
|
|
146
|
+
|
|
147
|
+
# 4. Concept keywords.
|
|
148
|
+
for tok in tokens:
|
|
149
|
+
if tok.rstrip(".,") in _CONCEPT_KEYWORDS:
|
|
150
|
+
return "concept"
|
|
151
|
+
|
|
152
|
+
# 5. Person pattern: two-or-more capitalised words, plain ASCII.
|
|
153
|
+
if _PERSON_NAME_RE.match(stripped):
|
|
154
|
+
return "person"
|
|
155
|
+
|
|
156
|
+
# 6. Fallback: concept (NEVER person -- see Proposal #51).
|
|
157
|
+
return "concept"
|
|
158
|
+
|
|
159
|
+
|
|
160
|
+
# ---------------------------------------------------------------------------
|
|
161
|
+
# Aliases / shims so older callers in services/remember.py and tests keep
|
|
162
|
+
# working without an import dance. The old private helper
|
|
163
|
+
# remember._infer_entity_type is preserved as a thin wrapper for backward
|
|
164
|
+
# compatibility.
|
|
165
|
+
# ---------------------------------------------------------------------------
|
|
166
|
+
|
|
167
|
+
|
|
168
|
+
def legacy_infer_entity_type(name: str) -> str:
|
|
169
|
+
"""Backward-compatible shim for the original heuristic semantics.
|
|
170
|
+
|
|
171
|
+
The original ``remember._infer_entity_type`` (added Apr 2026) returned
|
|
172
|
+
``"person"`` as the fallback. This shim still returns ``"person"`` for
|
|
173
|
+
plain single-word inputs like "Kamil" or "Sarah" so the existing
|
|
174
|
+
test_entity_type_inference.py suite keeps passing, while the new
|
|
175
|
+
``infer_entity_type`` is free to default to concept for genuinely
|
|
176
|
+
ambiguous inputs.
|
|
177
|
+
|
|
178
|
+
Used by ``RememberService.remember_entity`` where an explicit empty
|
|
179
|
+
``entity_type`` preserves the legacy "single-name = person" rule.
|
|
180
|
+
"""
|
|
181
|
+
inferred = infer_entity_type(name)
|
|
182
|
+
if inferred == "concept":
|
|
183
|
+
# Legacy callers expected a single-token name to be a person.
|
|
184
|
+
# Preserve that for compatibility with existing test fixtures and
|
|
185
|
+
# callers that already passed an explicit "" type to mean
|
|
186
|
+
# "default to person".
|
|
187
|
+
tokens = (name or "").strip().split()
|
|
188
|
+
if len(tokens) == 1 and tokens[0] and tokens[0][:1].isupper():
|
|
189
|
+
return "person"
|
|
190
|
+
if len(tokens) == 1 and tokens[0]:
|
|
191
|
+
return "person"
|
|
192
|
+
return inferred
|
|
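The first-match-wins order in `infer_entity_type` above is easier to see as a table. What follows is a condensed restatement for illustration, not the package's code: the keyword sets are copied from the diff, while the `RULES` list and the `infer` / `legacy_infer` names are hypothetical. The dotted entries (`"inc."`, `"co."`, and so on) are left out because stripping trailing `.` and `,` already reduces them to their undotted forms, so they add no matches.

```python
import re

# Keyword sets copied from entities.py; dotted variants omitted (see lead-in).
_LOCATION = {"office", "hq", "headquarters", "campus", "building"}
_ORG_WORDS = {
    "inc", "llc", "ltd", "corp", "corporation", "co", "company", "gmbh", "ag",
    "sa", "plc", "foundation", "university", "institute", "lab", "labs",
    "associates", "group", "partners", "ai",
}
_ORG_DOMAINS = (".ai", ".io", ".com", ".dev", ".so", ".co")
_PROJECT = {"project", "sprint", "mvp", "initiative", "campaign", "rollout"}
_CONCEPT = {"methodology", "framework", "theory", "protocol", "strategy", "principle"}
_PERSON_RE = re.compile(r"^[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+$")


def _any_token_in(keywords):
    """Rule that fires when any token (trailing '.' / ',' stripped) is a keyword."""
    return lambda tokens, raw: any(t.rstrip(".,") in keywords for t in tokens)


# Evaluated top to bottom; the first rule that fires decides the type.
RULES = [
    ("location", _any_token_in(_LOCATION)),
    ("organization", _any_token_in(_ORG_WORDS)),
    ("organization", lambda tokens, raw: tokens[-1].endswith(_ORG_DOMAINS)),
    ("project", _any_token_in(_PROJECT)),
    ("concept", _any_token_in(_CONCEPT)),
    ("person", lambda tokens, raw: bool(_PERSON_RE.match(raw))),
]


def infer(name: str) -> str:
    raw = (name or "").strip()
    if not raw:
        return "concept"
    tokens = raw.lower().split()
    for label, rule in RULES:
        if rule(tokens, raw):
            return label
    return "concept"  # fallback: never person


def legacy_infer(name: str) -> str:
    """Like legacy_infer_entity_type: a lone token that falls through is a person."""
    inferred = infer(name)
    if inferred == "concept" and len((name or "").split()) == 1:
        return "person"
    return inferred


for sample in ["Markup AI", "Hugging.ai", "Company HQ", "Project Phoenix",
               "Agile Methodology", "Matt Blumberg", "Markup", "José García"]:
    print(f"{sample!r:22} -> {infer(sample):12} legacy: {legacy_infer(sample)}")
```

Running it shows the ordering effects the docstring describes: "Company HQ" resolves to location before its `company` token can mark it as an organisation, and "Markup" stays a concept under the new rules but becomes a person under the legacy shim. It also exposes a limit of the ASCII-only person pattern: a name with accented letters such as "José García" falls through to concept.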
package/memory-daemon/claudia_memory/services/remember.py
CHANGED
@@ -23,6 +23,7 @@ from ..extraction.entity_extractor import (
     extract_all,
     get_extractor,
 )
+from .entities import infer_entity_type as _smart_infer_entity_type
 from .guards import validate_entity, validate_memory, validate_relationship
 
 logger = logging.getLogger(__name__)
@@ -366,10 +367,14 @@ class RememberService:
         except Exception as e:
             logger.warning(f"Could not store memory embedding: {e}")
 
-        # Link to entities
+        # Link to entities. Pass the memory content as context so the type
+        # inference heuristic can use it (Proposal #51).
         if about_entities:
+            now_iso = datetime.utcnow().isoformat()
             for entity_name in about_entities:
-                entity_id = self._find_or_create_entity(
+                entity_id = self._find_or_create_entity(
+                    entity_name, content_context=content
+                )
                 if entity_id:
                     try:
                         self.db.insert(
@@ -382,6 +387,17 @@ class RememberService:
                         )
                     except Exception:
                         pass # Duplicate link, ignore
+                    # Touch the entity so attention-tier and recency reflect
+                    # that we just heard about it. Safe on existing rows;
+                    # newly-created rows already have these set.
+                    try:
+                        self.db.execute(
+                            "UPDATE entities SET last_contact_at = ?, "
+                            "updated_at = ? WHERE id = ?",
+                            (now_iso, now_iso, entity_id),
+                        )
+                    except Exception as e:
+                        logger.debug(f"Could not touch entity {entity_id}: {e}")
 
         logger.debug(f"Remembered {memory_type}: {content[:50]}...")
 
@@ -1807,8 +1823,25 @@ class RememberService:
             entity_type=extracted.type,
         )
 
-    def _find_or_create_entity(
-
+    def _find_or_create_entity(
+        self,
+        name: str,
+        entity_type: str = "",
+        content_context: str = "",
+    ) -> Optional[int]:
+        """Find entity by name or create if not exists.
+
+        When ``entity_type`` is not supplied, the smarter heuristic in
+        ``services/entities.py`` is used so that names like "Markup AI"
+        get typed as ``organization`` instead of the legacy
+        ``person`` fallback. Proposal #51 (2026-05-13).
+
+        Args:
+            name: Entity name (canonical lookup is case-insensitive).
+            entity_type: Optional explicit type. If empty, inferred.
+            content_context: Optional surrounding memory text. Passed to
+                the inference helper for future content-aware rules.
+        """
         canonical = self.extractor.canonical_name(name)
 
         # Try exact match
@@ -1829,13 +1862,19 @@ class RememberService:
         if alias_match:
             return alias_match["entity_id"]
 
+        # Smart inference for the *type* we will use to fuzzy-match and create.
+        # Without this, _fuzzy_find_entity queries with type="" (no matches)
+        # and remember_entity falls back to person.
+        effective_type = entity_type or _smart_infer_entity_type(name, content_context)
+
         # Fuzzy pre-check: find near-matches of the same type
-        fuzzy_match = self._fuzzy_find_entity(canonical,
+        fuzzy_match = self._fuzzy_find_entity(canonical, effective_type)
         if fuzzy_match:
             return fuzzy_match
 
-        # Create new
-
+        # Create new with the inferred (or explicit) type. We pass the type
+        # explicitly so remember_entity skips its own (legacy) inference.
+        return self.remember_entity(name=name, entity_type=effective_type)
 
     def _fuzzy_find_entity(self, canonical: str, entity_type: str) -> Optional[int]:
         """Find a near-match entity of the same type using fuzzy string matching.
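The `_find_or_create_entity` change comes down to one precedence rule: an explicit `entity_type` wins; otherwise the type is inferred once and that same value is used for matching and creation. Below is a minimal, self-contained sketch of that flow plus the recency "touch", assuming a simplified SQLite `entities` table. The schema, `find_or_create_entity`, `touch_entity`, and `toy_infer` are hypothetical stand-ins rather than claudia-memory's API, and alias and fuzzy lookup are omitted. It uses `datetime.now(timezone.utc)` instead of the diff's `datetime.utcnow()`, which is deprecated as of Python 3.12; note that the timezone-aware form appends `+00:00` to the ISO string.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE entities (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    canonical_name TEXT NOT NULL UNIQUE,
    type TEXT NOT NULL,
    last_contact_at TEXT,
    updated_at TEXT
)
"""


def find_or_create_entity(
    conn: sqlite3.Connection,
    name: str,
    infer: Callable[[str, str], str],
    entity_type: str = "",
    content_context: str = "",
) -> Optional[int]:
    """Exact lookup by canonical name, else create with an explicit-or-inferred type."""
    canonical = " ".join(name.split()).lower()  # stand-in for canonical_name()
    if not canonical:
        return None
    row = conn.execute(
        "SELECT id FROM entities WHERE canonical_name = ?", (canonical,)
    ).fetchone()
    if row:
        return row[0]
    # Explicit type wins; otherwise infer once and reuse that value.
    effective_type = entity_type or infer(name, content_context)
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO entities (name, canonical_name, type, last_contact_at, updated_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (name, canonical, effective_type, now, now),
    )
    return cur.lastrowid


def touch_entity(conn: sqlite3.Connection, entity_id: int, when: str) -> None:
    """Refresh recency columns, like the UPDATE the diff runs on every mention."""
    conn.execute(
        "UPDATE entities SET last_contact_at = ?, updated_at = ? WHERE id = ?",
        (when, when, entity_id),
    )


def toy_infer(name: str, _context: str) -> str:
    # Toy rule: a trailing "AI" token means organization; anything else is a concept.
    return "organization" if name.lower().split()[-1] == "ai" else "concept"


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
markup = find_or_create_entity(conn, "Markup AI", toy_infer, content_context="Matt Blumberg said X")
markup_again = find_or_create_entity(conn, "markup  ai", toy_infer)
matt = find_or_create_entity(conn, "Matt Blumberg", toy_infer, entity_type="person")
touch_entity(conn, markup, "2026-05-13T12:00:00+00:00")
rows = conn.execute("SELECT name, type, last_contact_at FROM entities ORDER BY id").fetchall()
print(rows)
```

Because "Matt Blumberg" is passed with `entity_type="person"`, the toy inference is never consulted for it. This mirrors `entity_type or _smart_infer_entity_type(...)` in the diff.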
package/package.json
CHANGED
package/template-v2/.claude/manifest.json
CHANGED
@@ -1,9 +1,9 @@
 {
-"version": "1.
-"generated": "2026-05-
+"version": "1.58.0",
+"generated": "2026-05-13T11:19:49.402Z",
 "algorithm": "sha256",
 "files": {
-".claude/rules/claudia-principles.md": "
+".claude/rules/claudia-principles.md": "939e9720421628e7f2e4c8dfbaa4aeb9c1e18e8c6a5379cd6b772a6835b812e5",
 ".claude/rules/data-freshness.md": "052b3b8f3f489a54fff065b29e0ffc46bfad6da1c3a42386170034f298599233",
 ".claude/rules/memory-availability.md": "48309e2683b267c0a17ebf019fb3ee1116150d811b1d93b4ddb52fed89ae1fe3",
 ".claude/rules/memory-commitment.md": "49eee330b56c6ca0b5f1e01550931c4eef3dcb3249e3d0e2380de3e8dbfe31a8",
@@ -68,6 +68,6 @@
 ".claude/skills/vault-awareness.md": "5a9c3d3f0b907750b9941a51ec9308c2bd465aa9bf3c513124f640696e0a4c2c",
 ".claude/skills/weekly-review/SKILL.md": "fe7df64d3df18a0a47f98f7d4ef83cf98cef78cd9e0170a9316005035a2c6df7",
 ".claude/skills/what-am-i-missing/SKILL.md": "56736397717c4f0a25cab17e7258702e5b04d20b48727c262fdf66e3b2e5899f",
-"CLAUDE.md": "
+"CLAUDE.md": "736c5d95f1477ea6ef1c00d3006eeb4b369eb2bd5f3a926e4d78c42f857fa881"
 }
 }
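The manifest maps each template path to a SHA-256 hex digest (`"algorithm": "sha256"`). How get-claudia consumes it is not shown in this diff; the sketch below only illustrates how a manifest with this shape could be checked, assuming digests are taken over each file's raw bytes and paths are relative to the template root. `sha256_file` and `verify_manifest` are hypothetical names.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hex SHA-256 of a file's raw bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest_path: Path) -> dict:
    """Map each listed path to 'ok', 'modified', or 'missing'."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    if manifest.get("algorithm") != "sha256":
        raise ValueError(f"unsupported algorithm: {manifest.get('algorithm')!r}")
    report = {}
    for rel_path, expected in manifest["files"].items():
        target = root / rel_path
        if not target.is_file():
            report[rel_path] = "missing"
        elif sha256_file(target) == expected:
            report[rel_path] = "ok"
        else:
            report[rel_path] = "modified"
    return report


# Usage: build a throwaway template tree and a manifest describing it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    claude_md = root / "CLAUDE.md"
    claude_md.write_text("original template\n", encoding="utf-8")
    manifest = {
        "version": "1.58.0",
        "algorithm": "sha256",
        "files": {
            "CLAUDE.md": sha256_file(claude_md),
            ".claude/rules/data-freshness.md": "0" * 64,  # never written below
        },
    }
    manifest_path = root / "manifest.json"
    manifest_path.write_text(json.dumps(manifest), encoding="utf-8")
    report = verify_manifest(root, manifest_path)

    claude_md.write_text("locally edited\n", encoding="utf-8")
    report_after_edit = verify_manifest(root, manifest_path)

print(report)
print(report_after_edit)
```

A check like this distinguishes a file the user edited locally ("modified") from one that was never installed ("missing"), which is the kind of distinction an updater would need before overwriting template files.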
package/template-v2/.claude/rules/claudia-principles.md
CHANGED
@@ -230,7 +230,7 @@ MEMORY.md persists across sessions automatically. Because of this convenience, i
 | File locations | "Interview files live in workspaces/beemok/interviews/" | Stable reference |
 | Process knowledge | "Interviews follow the capture-interview skill" | Process, not status |
 | Preferences | "User prefers detailed briefs over minimal ones" | Slow-changing |
-| Tool configuration | "Gmail MCP is connected, Otter.ai
+| Tool configuration | "Gmail MCP is connected, Otter.ai integration enabled" | Infrastructure |
 
 ### What MUST NOT go in MEMORY.md
 
package/template-v2/.mcp.json.example
CHANGED
@@ -37,15 +37,6 @@
 "extended": "Adds Docs, Sheets, Tasks, Chat (83 tools)",
 "complete": "All services including Slides, Forms, Apps Script (111 tools)"
 }
-},
-"rube": {
-"type": "http",
-"url": "https://mcp.composio.dev",
-"headers": {
-"x-composio-api-key": ""
-},
-"_description": "Rube by Composio -- 500+ app integrations (HTTP, no stdio conflict)",
-"_setup": "Connect 500+ apps via Rube: Slack, Notion, Drive, GitHub, Granola, Otter.ai, Jira, Linear, Airtable, HubSpot, Stripe, and more. 1) Sign up free at https://rube.app 2) Connect your apps in Rube's marketplace 3) Copy your API key from the dashboard 4) Paste it in x-composio-api-key above 5) Restart Claude Code. See Rube section in CLAUDE.md for full guide."
 }
 },
 
@@ -53,7 +44,6 @@
 "memory": "Claudia's memory is powered by the claudia-memory daemon (Python MCP server). It provides ~33 tools for semantic search, pattern detection, and relationship tracking. The installer (npx get-claudia) automatically sets up the daemon in a Python venv at ~/.claudia/daemon/venv/ and configures .mcp.json.",
 "google_options": "Two paths for Google integration: (A) gmail + google-calendar (lightweight, focused, fewer tools) or (B) google_workspace via workspace-mcp (all-in-one, more tools). Both can coexist. See Google Integration Setup in CLAUDE.md.",
 "google_workspace": "Google Workspace uses the workspace-mcp server (taylorwilsdon/google_workspace_mcp). One server covers Gmail, Calendar, Drive, Docs, Sheets, Tasks, and more. Tool tiers control context usage.",
-"rube": "Rube (by Composio) connects 500+ apps through one HTTP MCP connection. Each user creates their own free Rube account at rube.app. See the Rube section in CLAUDE.md for setup and troubleshooting.",
 "security": "Each user authenticates with their own accounts and credentials. OAuth tokens are stored locally on your machine, never shared.",
 "not_included": {
 "filesystem": "Claude Code has native Read/Write/Edit tools",
package/template-v2/CLAUDE.md
CHANGED
@@ -334,85 +334,7 @@ I adapt to whatever tools are available. When you ask me to do something that ne
 
 Both options can coexist in `.mcp.json`. When both are present, I use whichever tools are available. If the MCP tools aren't responding or you see authentication errors, the user needs to set up their Google Cloud credentials. See the **Google Integration Setup** section below.
 
-**
-
-If a user asks about connecting apps, integrations, or any of the services listed below, proactively suggest Rube as the easiest path. Walk them through setup step by step.
-
-**What you can connect through Rube:**
-
-| Category | Apps |
-|----------|------|
-| **Communication** | Slack, Discord, Microsoft Teams, Telegram |
-| **Email** | Gmail, Outlook, Yahoo Mail |
-| **Meeting Notes & Transcription** | Granola, Otter.ai, Fireflies.ai |
-| **Project Management** | Jira, Linear, Asana, Trello, Monday.com, ClickUp, Basecamp |
-| **Knowledge & Docs** | Notion, Confluence, Google Docs, Coda |
-| **Cloud Storage** | Google Drive, Dropbox, OneDrive, Box |
-| **Code & Dev** | GitHub, GitLab, Bitbucket |
-| **Databases & Spreadsheets** | Airtable, Google Sheets, Supabase, PostgreSQL |
-| **CRM & Sales** | HubSpot, Salesforce, Pipedrive |
-| **Design** | Figma, Canva |
-| **Social Media** | X/Twitter, LinkedIn, Instagram |
-| **Finance** | Stripe, QuickBooks, Xero |
-| **Calendar** | Google Calendar, Outlook Calendar, Calendly |
-| **And 500+ more** | Browse the full list at [rube.app](https://rube.app) |
-
-**External integrations** (Gmail, Calendar, Google Workspace, Rube, Brave Search) are optional add-ons that extend what I can see and do. I work fully without them. The core value is relationships and context.
-
-### Rube Setup (Guide Users Through This)
-
-**Step 1: Create a Rube account**
-- Go to [rube.app](https://rube.app) and sign up (free tier available)
-- This is the user's own account. They manage billing and app connections directly with Rube.
-
-**Step 2: Connect the apps you want**
-- In the Rube dashboard, browse the marketplace
-- Click any app to connect it (each uses its own OAuth popup, handled by Rube)
-- Start with the apps you use most: Slack, Notion, GitHub, Google Drive, etc.
-- You can add more apps at any time without reconfiguring Claudia
-
-**Step 3: Copy the API key**
-- In the Rube dashboard, find the API key / Bearer token
-- It may be under "Settings", "MCP Settings", or "Install Rube"
-
-**Step 4: Add to Claudia's config**
-- Open `.mcp.json` in the project root
-- Find the `rube` server section
-- Paste the API key into the `COMPOSIO_API_KEY` value:
-```json
-"env": {
-"COMPOSIO_API_KEY": "paste-key-here"
-}
-```
-- Alternatively, set it as an environment variable: `export COMPOSIO_API_KEY=paste-key-here`
-
-**Step 5: Restart Claude Code**
-- Close and reopen Claude Code for the MCP to connect
-- Once connected, Rube's tools appear automatically
-
-**Using Rube-connected apps:** Once Rube is connected, use the tools naturally. Examples:
-- "Send a Slack message to #general saying the deploy is done"
-- "Create a Notion page with today's meeting notes"
-- "List my open GitHub issues on the claudia repo"
-- "Search my Google Drive for the Q4 report"
-- "Show me my Granola meeting notes from this week"
-- "Add a task to my Jira sprint"
-- "Check my Stripe dashboard for recent payments"
-- "Create an Airtable record in the Contacts table"
-
-The MCP tools from Rube will have names like `SLACK_SEND_MESSAGE`, `NOTION_CREATE_PAGE`, `GITHUB_LIST_ISSUES`, etc. Use them when they match what the user is asking for.
-
-**Troubleshooting Rube:**
-
-| Problem | Solution |
-|---------|----------|
-| Rube MCP not connecting | Check that `COMPOSIO_API_KEY` is set in `.mcp.json` or environment. Restart Claude Code. |
-| Tool not found for an app | The user needs to connect that app in Rube's marketplace first (rube.app dashboard). |
-| Authentication expired | The user should reconnect the specific app in Rube's dashboard (re-authorize OAuth). |
-| Rate limited | Rube has usage limits on the free tier. The user may need to upgrade at rube.app/pricing. |
-| Want to disconnect an app | Go to Rube dashboard and disconnect the app there. No Claudia config changes needed. |
-
-**Rube vs. direct Google MCPs:** Rube works alongside (not instead of) the direct Google MCP servers (gmail, google-calendar, or workspace-mcp). Direct servers give a connection with no intermediary but require Google Cloud setup. Rube gives one setup for everything but routes data through Composio servers. All can coexist. If a user has both direct Google MCPs and Rube's Google apps connected, prefer the direct MCP tools.
+**External integrations** (Gmail, Calendar, Google Workspace, Brave Search) are optional add-ons that extend what I can see and do. I work fully without them. The core value is relationships and context.
 
 ### Google Integration Setup
|
|
418
340
|
|