omnius 1.0.42 → 1.0.44
- package/README.md +94 -30
- package/dist/index.js +1190 -53
- package/npm-shrinkwrap.json +2 -2
- package/package.json +2 -2
package/README.md
CHANGED
@@ -103,6 +103,7 @@ An autonomous multi-turn tool-calling agent that reads your code, makes changes,
 - [Zettelkasten Linking (A-MEM)](#zettelkasten-linking-a-mem)
 - [PPR Retrieval (HippoRAG)](#ppr-retrieval-hipporag)
 - [Cross-Modal Binding](#cross-modal-binding)
+- [Scoped Visual Identity Recall](#scoped-visual-identity-recall)
 - [Gist Compression](#gist-compression)
 - [Near-Critical Cognitive Architecture](#near-critical-cognitive-architecture)
 - [Cross‑Modality Identity & Association (CLIP + Voice)](#crossmodality-identity--association-clip--voice)
@@ -240,7 +241,7 @@ An LLM is a high-bandwidth associative generative core — closer to a cortex-li
 |---|---|---|
 | Associative core | Cortex | LLM weights (any size) |
 | Current workspace | Global workspace / attention | `assembleContext()` — structured context assembly |
-| Episodic memory | Hippocampus | `.omnius/
+| Episodic memory | Hippocampus | `.omnius/episodes.db` + `.omnius/knowledge.db` — write, search, retrieve, and link across sessions |
 | Cognitive map | Hippocampal spatial maps | `semantic-map.ts` + `repo-map.ts` (PageRank) |
 | Action gating | Basal ganglia | Tool selection policy (task-aware filtering) |
 | Temporal hierarchy | Prefrontal executive | Task decomposition, sub-agent delegation |
@@ -303,7 +304,8 @@ Omnius includes background workers that compute and associate embeddings across
 - Visual embeddings: CLIP ViT-B/32 (OpenCLIP) image embeddings for episodes with `modality: "visual"`.
 - Audio embeddings: speaker embeddings (ECAPA) when available; automatic fallback to normalized log‑mel in constrained environments.
 - Transcription: Whisper runs automatically for audio ingests; transcripts are stored as text episodes and embedded for retrieval.
-- Associations: `appears_in` for visual presence, `said_by` for transcripts, and `alias_of` for alternate labels (e.g., username + display name). Workers also link visual episodes to nearby transcripts via a time-window co‑occurrence pass.
+- Associations: `appears_in` for visual presence, `said_by` for transcripts, `depicts` / `named_as` / `same_person_candidate` for identity evidence, and `alias_of` for alternate labels (e.g., username + display name). Workers also link visual episodes to nearby transcripts via a time-window co‑occurrence pass.
+- Scoped visual identity recall: image ingress in TUI, GUI, Telegram private chats, and Telegram groups runs structured face identification against prior explicit enrollments. If a known face matches, Omnius injects a same-scope recall block and commits graph evidence; if a face is unknown, it nudges the agent to ask who it is instead of guessing.
 
 Config (env vars):
 
@@ -350,7 +352,7 @@ The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile,
 - **Mid-task steering** — type while the agent works to add context without interrupting
 - **Smart compaction** — 6 context compaction strategies (default, aggressive, decisions, errors, summary, structured) with ARC-inspired active context revision ([arXiv:2601.12030](https://arxiv.org/abs/2601.12030)) that preserves structural file content through compaction, preventing small-model repetitive loops at the root cause. Success signals and content previews survive compaction so models never lose evidence that tools succeeded
 - **Memex experience archive** — large tool outputs archived during compaction with hash-based retrieval
-- **Persistent memory** — learned patterns stored
+- **Persistent memory** — learned patterns, episodes, and temporal graph evidence are stored under `.omnius/` across sessions (`episodes.db`, `knowledge.db`, and specialized `.omnius/memory/` stores for procedural and subsystem memory)
 - **Structured procedural memory (SQLite)** — replaces flat JSON with a full relational database: CRUD with soft-delete, revision tracking, embedding storage (float32 BLOB), bidirectional memory linking with confidence scores. Inspired by [ExpeL](https://arxiv.org/abs/2308.10144) (contrastive extraction) and [TIMG](https://arxiv.org/abs/2603.10600) (structured procedural format). 79 unit tests
 - **Semantic memory search** — vector embeddings via [Ollama /api/embed](https://ollama.com) (nomic-embed-text, 768-dim) with cosine similarity search over stored memories. Auto-generates embeddings on memory creation. Auto-links related memories when similarity > 0.6. Graceful fallback to text search when Ollama unavailable
 - **LLM-based memory extraction** — post-task, the LLM itself extracts structured procedural memories (CATEGORY/TRIGGER/LESSON/STEPS) instead of copying raw error text verbatim. Based on [ExpeL](https://arxiv.org/abs/2308.10144) and [AWM](https://arxiv.org/abs/2409.07429) patterns
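Reviewer note on the "Semantic memory search" context line in the hunk above: the auto-linking rule (cosine similarity above 0.6) can be illustrated with a minimal sketch. The `Memory` type and the `cosineSimilarity` and `findLinks` helpers are hypothetical names chosen for illustration; they are not taken from the package, and only the 0.6 threshold comes from the README text.

```typescript
// Hypothetical record shape: an id plus its embedding vector.
type Memory = { id: string; embedding: number[] };

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Threshold quoted in the README line above.
const LINK_THRESHOLD = 0.6;

// Ids of existing memories that a newly created memory would be linked to.
function findLinks(created: Memory, existing: Memory[]): string[] {
  return existing
    .filter(
      (m) =>
        m.id !== created.id &&
        cosineSimilarity(created.embedding, m.embedding) > LINK_THRESHOLD,
    )
    .map((m) => m.id);
}
```

In this sketch a memory is never linked to itself, and the comparison is strict, so a similarity of exactly 0.6 does not create a link. Whether the package treats the boundary the same way is not stated in the diff.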
@@ -979,8 +981,11 @@ Also cleans up the Docker container if the job was spawned with `"sandbox":"cont
 | GET | `/v1/memory` | read | Memory backends summary |
 | POST | `/v1/memory/search` | read | Vector + keyword search |
 | POST | `/v1/memory/write` | run | Write a memory entry |
+| POST | `/v1/memory/ingest` | run | Structured multimodal ingest for visual/audio/text media. Writes episodes + temporal graph atoms and returns scoped visual identity recall metadata when a known face matches. |
+| GET | `/v1/memory/entities` | read | List temporal graph entities, including stored `person:` identity nodes |
 | GET | `/v1/memory/episodes` | read | Paginated episode list |
 | GET | `/v1/memory/failures` | read | Paginated failure list |
+| POST | `/v1/chat/attachments` | run | Browser chat attachment upload. Saves media under `.omnius/gui-attachments/`, ingests it with GUI scope, and returns a context block for the next chat turn. |
 | GET | `/v1/skills` | read | List AIWG + custom skills (paginated) |
 | GET | `/v1/skills/:name` | read | Skill content |
 | GET | `/v1/mcps` | read | List MCP servers |
@@ -1325,8 +1330,30 @@ curl -s 'http://127.0.0.1:11435/v1/memory/episodes?limit=10'
 
 # Paginated failure store (anti-patterns)
 curl -s 'http://127.0.0.1:11435/v1/memory/failures?limit=10'
+
+# Structured multimodal ingest (visual/audio/text)
+curl -s -X POST http://127.0.0.1:11435/v1/memory/ingest \
+-d '{"sourceSurface":"api","scope":{"kind":"gui","id":"demo"},"modality":"visual","media_path":"/abs/path/person.jpg","media_type":"photo"}'
+
+# Stored graph identity/entity nodes
+curl -s 'http://127.0.0.1:11435/v1/memory/entities?type=person&limit=25'
+```
+
+`/v1/memory/ingest` writes through the same `MultimodalIdentityService` used by Telegram, TUI, and GUI attachments. Visual media is stored as an episode, linked into the temporal graph with explicit `scope`, `sender`, `message`, `replyTo`, and `media` atoms, and, when `visual_memory identify` returns a structured prior-enrolled face match, the response includes:
+
+```json
+{
+"visualIdentity": {
+"matches": [{"name": "Cole", "confidence": 0.91}],
+"recalledEpisodes": [{"content": "Alice named this person as Cole."}],
+"committedEpisodeIds": ["..."],
+"contextBlock": "## Scoped Visual Identity Recall\n..."
+}
+}
 ```
 
+No identity is guessed from captions. New person names are stored only when the agent explicitly calls `identity_memory` from user intent, or when a previously staged next-image identity assertion is consumed in the same scope.
+
 **Example search response** — search returns real episode records with timestamps, content, importance scores, and retrieval counts:
 
 ```json
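Reviewer note on the `/v1/memory/ingest` example added in the hunk above: a client could build the request body and read the optional `visualIdentity` field roughly as follows. This is a sketch under stated assumptions. The field names mirror the curl payload and JSON response shown in the diff, but `buildIngestRequest`, `extractContextBlock`, the absolute-path check, and the `"visual" | "audio" | "text"` union are illustrative guesses rather than documented behaviour.

```typescript
type Scope = { kind: string; id: string };
// Assumed from the "(visual/audio/text)" wording in the README comment.
type Modality = "visual" | "audio" | "text";

// Mirrors the fields of the curl payload in the diff.
interface IngestRequest {
  sourceSurface: string;
  scope: Scope;
  modality: Modality;
  media_path: string;
  media_type: string;
}

// Mirrors the response fragment shown in the diff.
interface VisualIdentity {
  matches: { name: string; confidence: number }[];
  recalledEpisodes: { content: string }[];
  committedEpisodeIds: string[];
  contextBlock: string;
}

// Hypothetical helper: validates the inputs and returns the request body.
function buildIngestRequest(
  scope: Scope,
  mediaPath: string,
  modality: Modality = "visual",
  mediaType: string = "photo",
): IngestRequest {
  if (!scope.kind || !scope.id) {
    throw new Error("scope.kind and scope.id are required");
  }
  // Assumption: the README example uses an absolute path, so require one here.
  if (!mediaPath.startsWith("/")) {
    throw new Error("media_path should be an absolute path");
  }
  return {
    sourceSurface: "api",
    scope,
    modality,
    media_path: mediaPath,
    media_type: mediaType,
  };
}

// Hypothetical helper: returns the recall block, or null when no face matched.
function extractContextBlock(responseText: string): string | null {
  const parsed = JSON.parse(responseText) as { visualIdentity?: VisualIdentity };
  return parsed.visualIdentity?.contextBlock ?? null;
}
```

`JSON.stringify(buildIngestRequest({ kind: "gui", id: "demo" }, "/abs/path/person.jpg"))` reproduces the payload from the curl example. Treating a missing `visualIdentity` as "no match" follows the README's wording that the field is included when a prior-enrolled face matches.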
@@ -1713,6 +1740,7 @@ Open `http://localhost:11435/` in a browser when `omnius serve` is running. Zero
 - Model picker populated from `/v1/models`
 - API key support (stored in localStorage)
 - System prompt (collapsible textarea)
+- Chat attachment upload through `/v1/chat/attachments`; images are saved under `.omnius/gui-attachments/`, ingested with GUI session scope, and can return scoped visual identity recall context before the next agent turn
 - Markdown rendering with code block copy buttons
 - Docker sandbox toggle (native vs container execution)
 - Workspace sidebar (toggleable file tree)
@@ -2113,6 +2141,7 @@ On startup and `/model` switch, Omnius detects your RAM/VRAM and creates an opti
 | `memory_read` | Read from persistent memory store by topic and key |
 | `memory_write` | Store facts/patterns in persistent memory with provenance tracking |
 | `memory_search` | Semantic search across all memory entries by query |
+| `identity_memory` | Scoped multimodal identity memory. Explicitly assert current-media identity, stage a name for the next same-scope image, identify enrolled faces, and recall graph evidence without regex name guessing |
 | `memex_retrieve` | Recover full tool output archived during context compaction by hash ID |
 | **Git & Diagnostics** | |
 | `diagnostic` | Lint/typecheck/test/build validation pipeline in one call |
@@ -2161,7 +2190,7 @@ On startup and `/model` switch, Omnius detects your RAM/VRAM and creates an opti
 | `audio_analyze` | Audio scene analysis — YAMNet 521-class classification (AudioSet taxonomy), Silero VAD voice activity detection, FFT spectrum analysis with peak frequency detection |
 | `asr_listen` | Record from microphone and transcribe speech to text — combines audio capture + Whisper ASR in one call. Uses PipeWire (bluetooth/USB) → faster-whisper → openai-whisper backends |
 | **Visual Intelligence** | |
-| `visual_memory` | Face recognition + object memory — InsightFace ArcFace 512d face enrollment/identification, CLIP ViT-B/32 object teaching/recognition. Persistent face+object databases in
+| `visual_memory` | Face recognition + object memory — InsightFace ArcFace 512d face enrollment/identification, CLIP ViT-B/32 object teaching/recognition. `detect`, `identify`, and `recognize` support `format=json` for machine-readable memory plumbing. Persistent face+object databases in `~/.omnius/visual-memory/` |
 | `multimodal_memory` | Cross-modal episode binding — captures face + voice + text + location into unified episodes. Actions: capture (photo+audio), meet (register person with name+face+voice), recall (associative retrieval), timeline (chronological query) |
 | **Associative Memory** | |
 | `episode_store` | SQLite episode store with triple-factor scoring (recency x importance x relevance), 4-class temporal decay (session/daily/procedural/permanent), Ebbinghaus strengthening on retrieval |
@@ -2228,6 +2257,9 @@ The agent can access physical hardware — cameras, microphones, and speakers
 | Transcribe audio file | `asr_listen` action=transcribe file="rec.wav" | Whisper transcription |
 | Enroll a face | `visual_memory` action=enroll name="Alice" image="photo.jpg" | Face database entry |
 | Identify faces | `visual_memory` action=identify image="photo.jpg" | Known face matches |
+| Remember current image identity | `identity_memory` action=assert_identity name="Alice" media="latest" | Scoped graph evidence + face enrollment attempt |
+| Name the next image | `identity_memory` action=stage_identity name="Alice" | Pending same-scope assertion consumed by later image ingress |
+| Ask who is in an image | `identity_memory` action=identify media="reply" | Prior enrolled face match + scoped recall context |
 | Teach an object | `visual_memory` action=teach label="coffee_mug" image="obj.jpg" | CLIP object memory |
 | Meet a person | `multimodal_memory` action=meet name="Bob" | Photo+voice+text episode |
 | Recall a person | `multimodal_memory` action=recall query="Bob" | Associative memory search |
@@ -2245,7 +2277,7 @@ The agent can access physical hardware — cameras, microphones, and speakers
 
 **Mesh/GPS/SDR**: Auto-installs dependencies when hardware is detected. Meshtastic creates a Python venv with the CLI. GPS auto-probes NMEA at multiple baud rates. RTL-SDR auto-blacklists kernel modules and installs udev rules via pkexec.
 
-**Visual Intelligence**: `visual_memory` provides persistent face recognition (InsightFace ArcFace 512d) and object memory (CLIP ViT-B/32). `multimodal_memory` binds all modalities into cross-session episodes with associative recall.
+**Visual Intelligence**: `visual_memory` provides persistent face recognition (InsightFace ArcFace 512d) and object memory (CLIP ViT-B/32). `identity_memory` is the agent-facing scoped layer that records explicit user-provided names, stages "next image is X" chronology, asks who unknown people are when identity matters, and recalls same-scope graph evidence. `multimodal_memory` binds all modalities into cross-session episodes with associative recall.
 
 
 ## Model Context Protocol (MCP)
@@ -3478,6 +3510,11 @@ Connect the agent to a Telegram bot. Telegram can run in auto, chat, or action m
 /telegram personal <user_id> <limit> # Fetch profile personal chat messages
 /telegram access get <managed_bot_user_id> # Show managed-bot access restrictions
 /telegram access set <managed_bot_user_id> <restricted|open> [user_ids] # Configure access
+/telegram tools # Show unified Telegram tool policy
+/telegram tools group <chat> # Show a chat-scoped policy override
+/telegram tools panel <chat> [policy_chat] # Send inline admin toggle buttons
+/telegram delete-message <chat> <msg> [reason] # Delete one message
+/telegram delete-messages <chat> <msg,msg> [reason] # Delete multiple messages
 /telegram delete-reaction <chat> <msg> --user <id> # Delete a message reaction
 /telegram delete-reactions <chat> --user <id> # Delete recent reactions
 ```
@@ -3516,6 +3553,8 @@ The Telegram bridge handles modern Bot API traffic directly:
 - **Direct-message channels** — direct-message channel/topic metadata is preserved on normalized Telegram messages for routing and future adapter logic.
 - **Drafts and profile chat reads** — `/telegram draft` can set or clear message drafts, and `/telegram personal <user_id> <limit>` fetches recent messages from a user's profile personal chat.
 - **Reactions/admin helpers** — `/telegram delete-reaction` removes a message reaction, `/telegram delete-reactions` removes recent reactions by a user/chat, and `/telegram admins <chat> [--bots]` can include bot administrators.
+- **Unified Telegram tool** — Telegram-sourced agent runs receive one scoped `telegram` tool for Bot API operations. Read/media actions are available by default; janitorial deletion, reaction cleanup, moderation, bot-admin changes, and message send/edit actions require admin context plus explicit `telegramToolPolicy` enablement. Before group deletes or moderation, Omnius checks the bot's Telegram rights with `getMe` + `getChatMember` and returns the Bot API error if Telegram refuses the operation.
+- **Inline tool controls** — `/telegram tools panel <chat> [policy_chat]` sends an inline keyboard that toggles global or chat-scoped Telegram tool groups. Telegram callback queries are answered immediately so client spinners stop, and policy is persisted under `.omnius/settings.json`.
 - **TUI Telegram visibility** — Telegram quick-chat and action conversations register as blue plane-labeled views in the systems panel; clicking a view swaps the main scrollable window to that Telegram-only buffer, and the footer shows an active plane indicator while the bridge or Telegram work is active.
 
 ### Admin Slash Command Passthrough
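Reviewer note on the "Unified Telegram tool" bullet added in the hunk above: the gating rule it describes (read/media by default, everything else only with admin context plus explicit policy enablement) can be sketched as a pure function. The group names, the `Policy` shape, and the rule that a chat-scoped override takes precedence over the global policy are assumptions made for illustration; the diff does not show the actual `telegramToolPolicy` schema, and the Bot API rights check via `getMe` + `getChatMember` is not modelled here.

```typescript
// Hypothetical group names derived from the wording of the README bullet.
type ToolGroup =
  | "read"
  | "media"
  | "janitorial"
  | "reactions"
  | "moderation"
  | "bot_admin"
  | "send_edit";

// Hypothetical policy shape: one optional on/off flag per tool group.
type Policy = Partial<Record<ToolGroup, boolean>>;

// Groups the README describes as available by default.
const DEFAULT_ALLOWED: ReadonlySet<ToolGroup> = new Set<ToolGroup>(["read", "media"]);

function isAllowed(
  group: ToolGroup,
  isAdmin: boolean,
  globalPolicy: Policy,
  chatPolicy?: Policy,
): boolean {
  if (DEFAULT_ALLOWED.has(group)) return true;
  // High-risk groups need admin context in every case.
  if (!isAdmin) return false;
  // Assumption: a chat-scoped override, when set, wins over the global flag.
  return chatPolicy?.[group] ?? globalPolicy[group] ?? false;
}
```

With this shape, `isAllowed("moderation", true, { moderation: true }, { moderation: false })` is `false`, which models a group chat that opts out of a globally enabled capability.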
@@ -3567,7 +3606,7 @@ While the sub-agent is working, users see:
 
 ### Public User Isolation
 
-Public users get **per-chat isolated memory** — each chat
+Public users get **per-chat isolated memory** — each chat is stored with explicit multimodal scope (`scope.kind = "group"|"private"`, `scope.id = chatId`) so public users can store and retrieve facts about their conversation without accessing or polluting unrelated chat memory. Public tools include: `memory_read`, `memory_write` (scoped), `memory_search`, `identity_memory` (scoped explicit identity evidence), `web_search`, `web_fetch`, and scoped minimal reminders via `reminder`/`remind`.
 
 The bridge also maintains a per-chat conversation state file with recent history, participants, relationship signals, and lightweight Zettelkasten memory cards. Each Telegram group or private chat gets its own scoped personality document under `.omnius/scoped-personality/telegram-chat/`; that profile is updated as people talk and injected into future Telegram context so tone, pacing, names, and relationships stay available turn to turn.
 
@@ -3587,9 +3626,9 @@ Tools are gated per execution context. The system enforces strict separation bet
 | Context | Default Tools | Notes |
 |---------|--------------|-------|
 | `terminal` | All tools | Wide open — shell, file read/write, everything |
-| `telegram-admin-dm` | All except shell | Admin DM — full tools, shell blocked by default (overridable) |
-| `telegram-admin-group` | Read-only + web + vision/OCR + scoped reminders | Admin in public group —
-| `telegram-public` | Memory r/w, web fetch/search, scoped creative tools, scoped minimal reminders | Public users — no arbitrary local file access or
+| `telegram-admin-dm` | All except shell + scoped `telegram` tool | Admin DM — full tools, shell blocked by default (overridable); Telegram janitorial/moderation actions still require explicit policy and Bot API rights |
+| `telegram-admin-group` | Read-only + web + vision/OCR + scoped reminders + scoped `telegram` tool | Admin in public group — current-chat only; high-risk Telegram actions require policy enablement |
+| `telegram-public` | Memory r/w, web fetch/search, scoped creative tools, scoped minimal reminders + read/media `telegram` actions | Public users — no arbitrary local file access, shell, moderation, bot-admin, or janitorial actions |
 | `api` | All tools | API endpoint — configurable |
 
 **System tools** (`shell`, `file_write`, `file_edit`, `file_read`, `file_patch`, `batch_edit`, `grep_search`, `glob_find`, `list_directory`, `code_sandbox`, `codebase_map`, `git_info`, etc.) are **never exposed** in public-facing contexts.
@@ -3626,14 +3665,16 @@ The bridge distinguishes between **private DMs** and **group/supergroup chats**,
 
 Photos, audio, voice messages, video, video notes, and documents sent via Telegram are automatically downloaded and processed:
 
-1. **Download** — files are fetched via the Telegram `getFile` API and cached to `.omnius/media-cache/`
+1. **Download** — files are fetched via the Telegram `getFile` API and cached to `.omnius/telegram-media-cache/`
 2. **Processing** — routed to the appropriate pipeline:
-- Images →
+- Images → vision ingress (`vision` / OCR context), multimodal memory ingest, and scoped visual identity association
 - Audio/voice → `transcribe_file` tool
 - Video/video notes → `transcribe_file` (audio track extraction)
 - Documents → `pdf_to_text` / `ocr_pdf` for PDFs, `file_read` for text
-3. **
-4. **
+3. **Structured memory ingest** — media is posted to `/v1/memory/ingest` with `sourceSurface`, `scope`, `sender`, `message`, `replyTo`, `media`, transcript or extracted visual context, and Telegram chat/message IDs. If the daemon is unavailable, the bridge falls back to local scoped identity association.
+4. **Identity recall** — images run `visual_memory identify` with `format=json`. Prior enrolled face matches inject a `Scoped Visual Identity Recall` block and commit `same_person_candidate` / `depicts` graph evidence. Pending same-scope `identity_memory action="stage_identity"` assertions are consumed by the next image and enrolled. Unknown faces inject a prompt for the agent to ask who the person is when relevant.
+5. **Context injection** — processing results, reply relationship data, and identity recall blocks are prepended to the user's message as additional context for the sub-agent
+6. **Cache cleanup** — media files are cached for 30 minutes, then automatically deleted. Only scoped metadata (filename, type, chat ID, message ID, sender, processing summary, identity graph evidence) is persisted long-term per chat
 
 ### Rate Limit Handling
 
@@ -3922,15 +3963,17 @@ Omnius implements a full associative memory system inspired by hippocampal episo
 ┌─────────────────────────────────────────────────────────────────┐
 │ Associative Memory Pipeline │
 │ │
-│ Tool Call → Episode Store → Temporal KG
-│
-│
-│
-│
-│
-│
-│
-│
+│ Tool Call / Media Ingest → Episode Store → Temporal KG │
+│ │ │ │ │
+│ Triple-Factor Entity/Scope Zettelkasten Links │
+│ Scoring Edges (Graphiti) (A-MEM cosine) │
+│ │ │ │ │
+│ ├──── Multimodal Identity Service ────┐ │
+│ │ (sender/message/media/person) │ │
+│ └───── PPR Retrieval ─────────────────┘ │
+│ (HippoRAG) │
+│ │ │
+│ Scoped Context Injection + Recall │
 └─────────────────────────────────────────────────────────────────┘
 ```
 
@@ -3944,6 +3987,7 @@ Every tool call generates an episode stored in SQLite with WAL journal mode:
 | `importance` | 0-10 scale (errors=8, file edits=6, reads=3) |
 | `decay_class` | session (1h), daily (1d), procedural (30d), permanent (∞) |
 | `embedding` | 384d vector for semantic similarity |
+| `clip_embedding` | OpenCLIP-compatible image/text vector for cross-modal retrieval when available |
 | `strength` | Ebbinghaus curve — increases on each retrieval |
 
 **Scoring**: `score = recency_weight × importance × relevance` — the triple-factor model from [Generative Agents (Park et al., 2023)](https://arxiv.org/abs/2304.03442).
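Reviewer note on the **Scoring** context line in the hunk above: the product `recency_weight × importance × relevance` can be worked through with a small sketch. The diff does not say how `recency_weight` is computed, so the exponential decay below, which uses each decay class duration as a half-life, is an assumption made purely for illustration; `strength` is left out. All identifiers are hypothetical.

```typescript
type DecayClass = "session" | "daily" | "procedural" | "permanent";

const HOUR_MS = 60 * 60 * 1000;

// Durations quoted in the decay_class row; treating them as half-lives is an assumption.
const HALF_LIFE_MS: Record<DecayClass, number> = {
  session: HOUR_MS,
  daily: 24 * HOUR_MS,
  procedural: 30 * 24 * HOUR_MS,
  permanent: Infinity,
};

// Assumed recency model: the weight halves once per half-life; permanent never decays.
function recencyWeight(ageMs: number, decayClass: DecayClass): number {
  const halfLife = HALF_LIFE_MS[decayClass];
  return Math.pow(0.5, Math.max(0, ageMs) / halfLife);
}

interface ScoredEpisode {
  importance: number; // 0-10 scale, as in the table above
  decayClass: DecayClass;
  ageMs: number;
}

// score = recency_weight × importance × relevance
function score(episode: ScoredEpisode, relevance: number): number {
  return recencyWeight(episode.ageMs, episode.decayClass) * episode.importance * relevance;
}
```

Under these assumptions a one-hour-old `session` episode with importance 8 and relevance 0.5 scores 0.5 × 8 × 0.5 = 2, while a `permanent` episode keeps a recency weight of 1 at any age.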
@@ -3952,8 +3996,8 @@ Every tool call generates an episode stored in SQLite with WAL journal mode:
 
 Entities extracted from tool results form a temporal KG with [Graphiti](https://arxiv.org/abs/2501.13956)-style edges:
 
-- **Nodes**: files, functions, errors, people, concepts — with `mention_count` and `last_seen`
-- **Edges**: causal relationships (`
+- **Nodes**: files, functions, errors, people, scopes, messages, media assets, concepts — with `mention_count` and `last_seen`
+- **Edges**: causal and identity relationships (`contains`, `authored_by`, `uploaded_by`, `replied_to`, `depicts`, `named_as`, `same_person_candidate`, `voice_sample_of`) with `valid_from`/`valid_until` temporal bounds
 - **Temporal queries**: "What was the state at time T?" via validity filtering
 
 ### Zettelkasten Linking (A-MEM)
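Reviewer note on the "Temporal queries" line in the hunk above: validity filtering over `valid_from`/`valid_until` can be sketched as an interval check. The `Edge` shape, the use of `null` for an open-ended edge, and the half-open interval convention are assumptions for illustration; only the two field names and the relation names come from the README text.

```typescript
// Hypothetical edge record; timestamps are epoch milliseconds.
interface Edge {
  source: string;
  relation: string;
  target: string;
  valid_from: number;
  valid_until: number | null; // assumption: null means "still valid"
}

// Edges that were valid at time t, using the half-open interval [valid_from, valid_until).
function edgesAt(edges: Edge[], t: number): Edge[] {
  return edges.filter(
    (e) => e.valid_from <= t && (e.valid_until === null || t < e.valid_until),
  );
}

// Example graph: a label that was superseded at t = 200.
const exampleEdges: Edge[] = [
  { source: "person:1", relation: "named_as", target: "Cole", valid_from: 100, valid_until: 200 },
  { source: "person:1", relation: "named_as", target: "Nicole", valid_from: 200, valid_until: null },
];
```

With the half-open convention, `edgesAt(exampleEdges, 150)` returns only the "Cole" edge and `edgesAt(exampleEdges, 200)` returns only the "Nicole" edge, so a superseded fact and its replacement never overlap at the boundary.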
@@ -3972,6 +4016,25 @@ Retrieval uses [Personalized PageRank over the temporal KG](https://arxiv.org/ab
 
 This enables multi-hop retrieval: asking about "the auth bug" can surface episodes about the specific file, the test that caught it, and the person who reported it — even if those episodes don't share keywords.
 
+### Scoped Visual Identity Recall
+
+Visual identity memory is deliberately split into two layers:
+
+| Layer | Role | Storage |
+|-------|------|---------|
+| `visual_memory` | Local face/object recognizer. Enrolls and identifies faces with InsightFace ArcFace, teaches and recognizes objects with CLIP. Structured callers use `format=json` instead of parsing display text. | `~/.omnius/visual-memory/` |
+| `identity_memory` | Agent-facing scoped evidence layer. Records explicit user assertions, stages names for future images, identifies enrolled faces, and recalls graph evidence. | `.omnius/episodes.db` + `.omnius/knowledge.db` |
+| `MultimodalIdentityService` | Central graph writer for source surface, scope, sender, message, reply, media, identity assertions, embeddings, and transcript links. | `.omnius/episodes.db` + `.omnius/knowledge.db` |
+
+Supported natural chronologies:
+
+1. **Image then name** — user sends an image, then says "this is Cole" or replies to the image with the name. The agent calls `identity_memory action="assert_identity" name="Cole" media="latest|reply"`, storing `named_as` / `depicts` graph evidence and attempting face enrollment.
+2. **Name then image** — user says "the next image is Cole" before sending media. The agent calls `identity_memory action="stage_identity" name="Cole"`. The next same-scope image consumes that pending assertion, enrolls the face, and commits `depicts` evidence only after enrollment succeeds.
+3. **Later image** — TUI clipboard/drop, GUI attachment upload, Telegram private chats, Telegram groups, and `/v1/memory/ingest` all run structured `visual_memory identify`. If an enrolled face matches, Omnius injects a `Scoped Visual Identity Recall` block with same-scope memories and commits `same_person_candidate` / `depicts` evidence for the new image.
+4. **Unknown face** — if face detection sees a face but no enrolled identity matches, image ingress injects an `Unknown Visual Identity Candidate` block. The model is steered to ask who the person is only when identity matters to the user's task, and never to guess a real identity.
+
+Scope is part of every write and recall. A Telegram group, Telegram DM, TUI terminal session, GUI chat session, and API caller each get their own `scope.kind` / `scope.id` boundary. The recognizer may know that a face matches "Cole", but related memory recall is filtered to the current scope/session unless a tool or policy explicitly broadens access.
+
 ### Cross-Modal Binding
 
 The `multimodal_memory` tool binds face, voice, text, and location into unified episodes:
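Reviewer note on the "Scoped Visual Identity Recall" section added in the hunk above: the "name then image" chronology and the scope boundary can be illustrated with a minimal in-memory sketch. `StagedIdentities`, `scopeKey`, and `recallInScope` are hypothetical names; the package's `MultimodalIdentityService` is not shown in this diff and may work differently. Face enrollment and the rule that `depicts` evidence is committed only after enrollment succeeds are not modelled here.

```typescript
type Scope = { kind: string; id: string };

// Stable key for a scope; JSON encoding avoids collisions between kind and id.
const scopeKey = (s: Scope): string => JSON.stringify([s.kind, s.id]);

// Pending "the next image is X" assertions, at most one per scope.
class StagedIdentities {
  private pending = new Map<string, string>();

  stage(scope: Scope, name: string): void {
    this.pending.set(scopeKey(scope), name);
  }

  // Called when an image arrives: returns and clears the pending name for that scope only.
  consume(scope: Scope): string | undefined {
    const key = scopeKey(scope);
    const name = this.pending.get(key);
    this.pending.delete(key);
    return name;
  }
}

// Recall filtered to the current scope, as the README paragraph describes.
function recallInScope<T extends { scope: Scope }>(episodes: T[], scope: Scope): T[] {
  const key = scopeKey(scope);
  return episodes.filter((e) => scopeKey(e.scope) === key);
}
```

A name staged in one Telegram group is therefore never consumed by an image arriving in a private chat or a GUI session, and a second image in the same scope finds nothing pending.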
@@ -3997,13 +4060,14 @@ Post-task, the [ReadAgent](https://arxiv.org/abs/2402.09727) gist compressor cre
 
 ### Cross‑Modality Identity & Association (CLIP + Voice)
 
-Omnius binds entities across image, audio, and text using
+Omnius binds entities across image, audio, and text using explicit evidence plus local embedding models:
 
--
--
--
--
--
+- Face identity: InsightFace ArcFace embeddings in `visual_memory` perform enrolled-face matching. Matches become graph evidence only through structured JSON results, never by parsing pretty tool output.
+- Object and scene association: CLIP/OpenCLIP vectors are stored as `clip_embedding` for visual/text retrieval and for taught object recognition through `visual_memory teach/recognize`.
+- Voice linkage: speaker embeddings and transcripts attach audio episodes to sender/speaker candidates when available; transcripts are stored as text episodes for retrieval.
+- Text labels: person names are stored from explicit agent-decided `identity_memory` calls (`assert_identity` for current media, `stage_identity` for next media), not regex shortcuts over captions.
+- Association graph: cross-modal edges (`depicts`, `named_as`, `same_person_candidate`, `voice_sample_of`, `said_by`, `replied_to`) consolidate into scoped entity neighborhoods with provenance, confidence, timestamp, and source episode IDs.
+- Privacy & safety: raw media and embeddings remain local. Episode and graph evidence live under `.omnius/`; the persistent visual face/object database lives under `~/.omnius/visual-memory/`.
 
 This enables queries like: “Find where Alex spoke about deployment,” “Show files edited after the person in the red sweater approved the PR,” or “Summarize conversations where Speaker‑B and Alice appear together.”
 