omnius 1.0.8 → 1.0.9

package/README.md CHANGED
@@ -1,19 +1,17 @@
1
1
  <a name="top"></a>
2
2
  ```text
3
- ░▒▓██████▓▒░░▒▓██████████████▓▒░░▒▓███████▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓███████▓▒░
4
- ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░
5
- ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░
6
- ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓██████▓▒░
7
- ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░ ░▒▓█▓▒░
8
- ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░ ░▒▓█▓▒░
9
- ░▒▓██████▓▒░░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓██████▓▒░░▒▓███████▓▒░
10
-
11
-
3
+
4
+ ░░ ░░░ ░░░░ ░░ ░░░ ░░ ░░ ░░░░ ░░░ ░░
5
+ ▒ ▒▒▒▒ ▒▒ ▒▒ ▒▒ ▒▒ ▒▒▒▒▒ ▒▒▒▒▒ ▒▒▒▒ ▒▒ ▒▒▒▒▒▒▒
6
+ ▓▓▓▓ ▓▓ ▓▓ ▓ ▓ ▓▓▓▓▓ ▓▓▓▓▓ ▓▓▓▓ ▓▓▓ ▓▓
7
+ █ ████ ██ █ █ ██ ██ █████ █████ ████ ████████ █
8
+ ██ ███ ████ ██ ███ ██ ███ ████ ██
9
+
12
10
  ```
13
11
 
14
12
  <p align="center">
15
13
  <strong>AI coding agent powered entirely by open-weight models.</strong><br>
16
- No API keys. No cloud. Your code never leaves your machine.
14
+ No API keys. No cloud. Your code never leaves your machine <i>(unless you want it to!)</i>
17
15
  </p>
18
16
 
19
17
  <p align="center">
@@ -280,8 +278,10 @@ The agent uses tools autonomously in a loop — reading errors, fixing code, and
280
278
 
281
279
  - **61 autonomous tools** — file I/O, shell, grep, web search/fetch/crawl, memory (read/write/search), sub-agents, background tasks, image/OCR/PDF, git, diagnostics, vision, desktop automation, browser automation, temporal agency (scheduler/reminders/agenda), structured files, code sandbox, transcription, skills, opencode delegation, cron agents, nexus P2P networking + x402 micropayments, **COHERE cognitive stack** (persistent REPL, recursive LLM calls, memory metabolism, identity kernel, reflection, exploration)
282
280
  - **Moondream vision** — see and interact with the desktop via Moondream VLM (caption, query, detect, point-and-click)
281
+ - **Image generation with TUI previews** — `/image <prompt>` and the `generate_image` tool create PNGs under `.omnius/images/`, support explicit `--model` selection, and render generated, pasted, screenshot, and camera-capture images as auto-sized ASCII previews via `image-to-ascii`
283
282
  - **Desktop automation** — vision-guided clicking: describe a UI element in natural language, the agent finds and clicks it
284
283
  - **Auto-install desktop deps** — screenshot, mouse, OCR, and image tools auto-install missing system packages (scrot, xdotool, tesseract, imagemagick) on first use
284
+ - **Hardware-rated model lists** — first-run setup, `/models`, `/score`, and `/image list` score model fit against detected RAM/VRAM/GPU so text and image model choices are visible before you switch or generate
285
285
  - **Parallel tool execution** — read-only tools run concurrently via `Promise.allSettled`
286
286
  - **Sub-agent delegation** — spawn independent agents for parallel workstreams
287
287
  - **OpenCode delegation** — offload coding tasks to opencode (sst/opencode) as an autonomous sub-agent with auto-install, progress monitoring, and result evaluation
@@ -339,7 +339,7 @@ The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile,
339
339
  - **Temporal agency** — schedule future tasks via OS cron, set cross-session reminders, flag attention items — startup injection surfaces due items automatically
340
340
  - **Web crawling** — multi-page web scraping with Crawlee/Playwright for deep documentation extraction
341
341
  - **Task templates** — specialized system prompts and tool recommendations for code, document, analysis, plan tasks
342
- - **Inference capability scoring** — canirun.ai-style hardware assessment at first launch: memory/compute/speed scores, per-model compatibility matrix, recommended model selection
342
+ - **Inference capability scoring** — canirun.ai-style hardware assessment at first launch and on demand: memory/compute/speed scores, per-model compatibility matrix, `/models` runtime fit ratings, `/image list` image-model fit ratings, and recommended model selection
343
343
  - **Auto-install everything** — first-run wizard auto-installs Ollama, curl, Python3, python3-venv with platform-aware package managers (apt, dnf, yum, pacman, apk, zypper, brew)
344
344
  - **Sponsored inference** — `/sponsor` walks through a 5-step wizard to share your GPU with the world: select endpoints, choose banner animation (8 presets + AI-generated custom), set header message/links, configure transport (cloudflared/libp2p) + rate limits, and go live. Consumers discover sponsors via `/endpoint sponsor`. Secure proxy relay with per-IP rate limiting, daily token budgets, model allowlist, and concurrent request caps. Sponsor's raw API URL is never exposed. See [Sponsored Inference](#sponsored-inference--share-your-gpu-with-the-world) below
345
345
  - **P2P inference network** — `/expose` local models or forward any `/endpoint` (Chutes, Groq, OpenRouter, etc.) through the libp2p P2P mesh. Passthrough mode (`/expose passthrough`) relays upstream API requests; `--loadbalance` distributes rate-limited token budgets across peers. `/expose config` provides an arrow-key menu for all settings. Gateway stats show budget remaining from `x-ratelimit-*` headers. Background daemon persists across Omnius restarts
@@ -357,7 +357,7 @@ The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile,
357
357
  - **IPFS sharing surface** — `/ipfs` status page with peer info + identity kernel metrics + memory sentiment. `/ipfs pin <CID>` to pin remote agent content. `/ipfs publish` to share identity kernel. `/ipfs share tool/skill` to publish agent-created tools with secret stripping. `/ipfs import <CID>` to retrieve shared content
358
358
  - **Fortemi-React bridge** — `/fortemi start/status/stop` connects to [fortemi-react](https://github.com/robit-man/fortemi-react) (browser-first PGlite+pgvector knowledge system) via JWT auth. Proxy tools: `fortemi_capture`, `fortemi_search`, `fortemi_list`, `fortemi_get` auto-register when bridge is connected
359
359
  - **Content ingestion** — `/ingest <file>` imports audio (transcribe via Whisper), PDF (pdftotext), or text files into structured memory with 800-char/100-overlap chunking (matches fortemi pattern)
360
- - **Image generation** — `generate_image` tool using Ollama experimental models ([x/z-image-turbo](https://ollama.com/x/z-image-turbo), [x/flux2-klein](https://ollama.com/x/flux2-klein)). Auto-detect or auto-pull models. Saves PNG to `.omnius/images/`
360
+ - **Image generation** — `generate_image` supports Ollama image models, Diffusers models, and stable-diffusion.cpp checkpoints/GGUF. SDXL Turbo is the practical default auto-install path under `.omnius/image-gen/.venv`; FLUX.1 dev and Stable Diffusion 3.5 Large are the primary high-realism baselines when hardware allows. `/image list` groups models by type, size, quality expectations, and hardware fit
361
361
  - **Node visualization** — [omnius.nexus](https://github.com/robit-man/omnius.nexus) Three.js dashboard: 5-color emotional state mapping (neutral/focused/stressed/dreaming/excited), dynamic node size by memory depth + IPFS storage, activity-modulated connections, identity synchrony golden threads between mutually-pinned agents
362
362
  - **TTS sanitizer** — strips markdown syntax (`##`, `**`, `` ` ``), emoji (prevents "white heavy checkmark"), box-drawing chars, and ANSI codes before feeding to ALL TTS engines
363
363
  - **LuxTTS gapless playback** — look-ahead pre-synthesis pipeline: next chunk synthesizes while current plays, eliminating inter-sentence gaps. Jetson ARM support with NVIDIA's prebuilt PyTorch wheel
@@ -368,7 +368,7 @@ The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile,
368
368
  - **Self-learning** — auto-fetches docs from the web when encountering unfamiliar APIs
369
369
  - **Seamless `/update`** — in-place update and reload with automatic context save/restore
370
370
  - **Blessed mode** — `/full-send-bless` infinite warm loop keeps model weights in VRAM, auto-cycles tasks, never exits until you say stop
371
- - **Telegram bridge** — `/telegram --key <token> --admin <userid>` public ingress/egress with admin filter and mandatory safety filter; bare `/telegram` toggles the service watchdog
371
+ - **Telegram bridge** — `/telegram --key <token> --admin <userid>` public ingress/egress with admin filter, scoped memory, per-chat personality profiles, sandboxed public creative file/image/audio tools, generated-artifact send-back, and mandatory safety filter; bare `/telegram` toggles the service watchdog
372
372
  - **Task control** — `/pause` (gentle halt at turn boundary), `/stop` (immediate kill), `/resume` to continue
373
373
  - **Model-tier awareness** — dynamic tool sets, prompt complexity, and context limits scale with model size (small/medium/large)
374
374
 
@@ -3294,13 +3294,16 @@ omnius
3294
3294
 
3295
3295
  The TUI features an animated multilingual phrase carousel, live metrics bar with pastel-colored labels (token in/out, context window usage, human expert speed ratio, cost), rotating tips, syntax-highlighted tool output, and dynamic terminal-width cropping.
3296
3296
 
3297
+ Image surfaces are first-class in the terminal. `/image` generations, generated-image tool results, pasted image context, screenshots, and camera captures are converted through `image-to-ascii` and sized to the current terminal before being printed into the main scrollback. Each generated image also includes the saved file path below the preview.
3298
+
3297
3299
  ### Slash Commands
3298
3300
 
3299
3301
  | Command | Description |
3300
3302
  |---------|-------------|
3301
3303
  | **Model & Endpoint** | |
3302
3304
  | `/model <name>` | Switch to a different model |
3303
- | `/models` | List all available models |
3305
+ | `/models` | List available text models with detected hardware-fit ratings |
3306
+ | `/score` | Show the inference capability scorecard: memory, compute, speed, and model compatibility |
3304
3307
  | `/endpoint <url>` | Connect to a remote vLLM or OpenAI-compatible API |
3305
3308
  | `/endpoint <url> --auth <key>` | Set endpoint with Bearer auth |
3306
3309
  | `/endpoint <peerId> --auth <key>` | Connect to a libp2p peer via nexus P2P network |
@@ -3318,6 +3321,11 @@ The TUI features an animated multilingual phrase carousel, live metrics bar with
3318
3321
  | **Audio & Vision** | |
3319
3322
  | `/voice [model]` | Toggle TTS voice (GLaDOS, Overwatch, Kokoro, LuxTTS, Supertonic) |
3320
3323
  | `/listen [mode]` | Toggle live microphone transcription |
3324
+ | `/image` | Open the image-generation model/setup menu |
3325
+ | `/image <prompt>` | Generate an image and show an auto-sized ASCII preview in the TUI |
3326
+ | `/image --model <model> <prompt>` | Generate with an explicit image model |
3327
+ | `/image list` | List image models by category, size, quality expectation, and hardware fit |
3328
+ | `/image setup <ollama\|diffusers\|sdcpp>` | Show setup commands for an image-generation backend |
3321
3329
  | `/dream [mode]` | Start dream mode (default, deep, lucid) |
3322
3330
  | **Display & Behavior** | |
3323
3331
  | `/stream` | Toggle streaming token display with pastel syntax highlighting |
@@ -3440,7 +3448,7 @@ The steering sub-agent uses the same model and backend as the main agent with `m
3440
3448
 
3441
3449
  <div align="right"><a href="#top">back to top</a></div>
3442
3450
 
3443
- Connect the agent to a Telegram bot. Telegram can run in auto, chat, or action mode: conversational messages get rapid streamed replies in chat mode, while codebase/file/run requests use dedicated action sub-agents that are visible in the terminal waterfall alongside other agent activity.
3451
+ Connect the agent to a Telegram bot. Telegram can run in auto, chat, or action mode: conversational messages get rapid streamed replies in chat mode, while codebase/file/run requests use dedicated action sub-agents that are visible in the terminal waterfall alongside other agent activity. Public group chats get scoped memory, live reply discretion, and sandboxed creative tools for generating files, audio, and images without exposing the local workspace.
3444
3452
 
3445
3453
  ```bash
3446
3454
  /telegram --key <token> # Save bot token (persisted to .omnius/settings.json)
@@ -3469,7 +3477,7 @@ The bot token, admin ID, and interaction mode are persisted to settings, so you
3469
3477
 
3470
3478
  Use `/telegram mode auto|chat|action` to control how inbound Telegram messages are routed:
3471
3479
 
3472
- - **auto** — short greetings, quick questions, playful messages, and conversational turns use fast streamed chat replies; explicit codebase/file/command/run/test requests use action sub-agents.
3480
+ - **auto** — the live router decides whether the message is conversational, actionable, or not directed at the bot. Reply-worthy conversational turns use fast streamed chat replies; explicit codebase/file/command/run/test requests use action sub-agents.
3473
3481
  - **chat** — every non-command message gets a direct quick-chat completion with no tool loop. This is best for rapid back-and-forth conversation.
3474
3482
  - **action** — every non-command message runs through the Telegram sub-agent path with the configured tool policy.
3475
3483
 
@@ -3528,13 +3536,13 @@ If a user sends another message while their sub-agent is still running, it's inj
3528
3536
  |-------|----------|-------|--------|
3529
3537
  | **Admin DM** (`--admin`, private chat) | 30 | All tools except shell (overridable) | Full read + write |
3530
3538
  | **Admin Group** (admin in group chat) | 15 | Read-only + web + vision/OCR/transcription | Full read + write |
3531
- | **Public** (everyone else) | 8 | memory r/w (scoped), web fetch/search | Scoped per-chat |
3539
+ | **Public** (everyone else) | 8 | scoped memory, web fetch/search, media analysis, sandboxed creative file/image/audio tools | Scoped per-chat |
3532
3540
 
3533
3541
  **Admin DM** — full agent experience in private chat. File read, grep, glob, memory, web research, all tools except shell (which can be unblocked via config).
3534
3542
 
3535
3543
  **Admin Group** — when the admin speaks in a group chat, the agent responds with read-only capabilities. No system-mutating tools (no shell, no file write, no code execution). Vision, OCR, transcription, and web tools are available for analyzing shared media and answering questions.
3536
3544
 
3537
- **Public** — lightweight assistant with safety guardrails. No file access, no shell, no code. Web search, scoped memory, and general knowledge only. Reply discretion active in groups.
3545
+ **Public** — lightweight assistant with safety guardrails. No shell and no access to arbitrary local files. Web search, scoped memory, media analysis, and creative artifact generation are available inside a per-chat sandbox. Reply discretion is active in groups.
3538
3546
 
3539
3547
  ### Streaming Responses
3540
3548
 
@@ -3542,12 +3550,25 @@ While the sub-agent is working, users see:
3542
3550
  1. **Typing indicator** — "typing..." appears immediately and refreshes every 4 seconds until the response is ready
3543
3551
  2. **Admin live streaming** — a placeholder message is sent immediately, then progressively edited via `editMessageText` with accumulated content + intermediate states (tool calls, results, status updates). Admin sees `🔧 tool_name(...)` and `✔ tool_name: result` inline as the agent works
3544
3552
  3. **Markdown → HTML conversion** — all responses are automatically converted from GitHub-flavored Markdown to Telegram-compatible HTML (`<b>`, `<i>`, `<code>`, `<pre>`, `<s>`, `<a>`) with plaintext fallback
3545
- 4. **Final message** — committed via `editMessageText` (admin) or `sendMessage` (public) when the agent completes
3553
+ 4. **Final message selection** — the bridge prefers the assistant's refined visible content over task-complete summaries, router decisions, memory-stage notes, or `no_reply` markers
3554
+ 5. **Artifact send-back** — generated images, documents, and audio files created inside the scoped creative workspace are uploaded back to Telegram via the appropriate Bot API method
3555
+ 6. **Final message** — committed via `editMessageText` (admin) or `sendMessage` (public) when the agent completes
3546
3556
 
3547
3557
  ### Public User Isolation
3548
3558
 
3549
3559
  Public users get **per-chat isolated memory** — each chat has its own scoped memory namespace (`telegram-{chatId}-{topic}`) so public users can store and retrieve facts about their conversation without accessing or polluting global agent memory. Public tools include: `memory_read`, `memory_write` (scoped), `memory_search`, `web_search`, `web_fetch`.
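
As an illustration of the scoping described above, here is a minimal sketch of how a namespace of the form `telegram-{chatId}-{topic}` could be derived. Only the namespace shape comes from the README; the helper name `scopedNamespace`, the topic-sanitizing rule, and the `general` fallback are assumptions made for this example, not Omnius's actual implementation.

```typescript
// Hypothetical helper: builds a memory namespace scoped to one Telegram chat.
// Only the `telegram-{chatId}-{topic}` shape is taken from the README; the
// sanitizing rule and the "general" fallback are assumptions.
function scopedNamespace(chatId: number | string, topic: string): string {
  const safeTopic = topic
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into "-"
    .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
  return `telegram-${chatId}-${safeTopic || "general"}`;
}

// Telegram group chat IDs are typically negative, so the ID keeps its sign.
const groupNs = scopedNamespace(-1001234567890, "Release Notes");
// An empty topic falls back to the assumed default.
const dmNs = scopedNamespace(42, "");
```

Because the chat ID is part of every key, two chats that use the same topic name still read and write separate namespaces.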
3550
3560
 
3561
+ The bridge also maintains a per-chat conversation state file with recent history, participants, relationship signals, and lightweight Zettelkasten memory cards. Each Telegram group or private chat gets its own scoped personality document under `.omnius/scoped-personality/telegram-chat/`; that profile is updated as people talk and injected into future Telegram context so tone, pacing, names, and relationships stay available turn to turn.
3562
+
3563
+ ### Public Creative Artifacts
3564
+
3565
+ Public chats can ask Omnius to create files, images, and audio without giving the model arbitrary write access. The bridge injects a per-chat creative workspace under `.omnius/telegram-creative/<chat>/` and exposes scoped tools that can only create or edit files inside that folder. Generated artifacts are tracked by manifest and by tool result text, then uploaded back into the chat.
3566
+
3567
+ - **Images** — the model can call `generate_image` directly when the conversation asks for an image; generated PNGs are sent with `sendPhoto` when Telegram accepts them
3568
+ - **Documents** — Markdown, text, JSON, CSV, and other generated files are sent with `sendDocument`
3569
+ - **Audio** — generated WAV/voice artifacts are sent as audio or voice media based on file type
3570
+ - **Sandbox rule** — public creative tools cannot delete or mutate anything outside the scoped chat folder
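
The sandbox rule above amounts to a path-containment check. The sketch below shows one common way to enforce it with Node's `path` module; the function names `resolveInSandbox` and `isInsideSandbox` and the sample chat folder `123` are hypothetical, and the check does not follow symlinks, which a real implementation would also need to handle.

```typescript
import * as path from "node:path";

// Hypothetical guard: resolves a requested path against the chat's creative
// folder and throws if the result would land outside that folder.
function resolveInSandbox(sandboxRoot: string, requested: string): string {
  const root = path.resolve(sandboxRoot);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  // A path that escapes the root is either ".." itself, starts with a ".."
  // segment, or (on Windows, across drives) comes back as an absolute path.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes sandbox: ${requested}`);
  }
  return target;
}

// Boolean convenience wrapper around the guard.
function isInsideSandbox(sandboxRoot: string, requested: string): boolean {
  try {
    resolveInSandbox(sandboxRoot, requested);
    return true;
  } catch {
    return false;
  }
}

// Illustrative root; "123" stands in for a real chat identifier.
const chatRoot = ".omnius/telegram-creative/123";
```

With this guard, `isInsideSandbox(chatRoot, "poem.md")` is accepted while `isInsideSandbox(chatRoot, "../456/secret.txt")` and `isInsideSandbox(chatRoot, "/etc/passwd")` are rejected.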
3571
+
3551
3572
  ### Context-Aware Tool Policy
3552
3573
 
3553
3574
  Tools are gated per execution context. The system enforces strict separation between what's available in a terminal session versus a public Telegram group:
@@ -3557,7 +3578,7 @@ Tools are gated per execution context. The system enforces strict separation bet
3557
3578
  | `terminal` | All tools | Wide open — shell, file read/write, everything |
3558
3579
  | `telegram-admin-dm` | All except shell | Admin DM — full tools, shell blocked by default (overridable) |
3559
3580
  | `telegram-admin-group` | Read-only + web + vision/OCR | Admin in public group — no system mutation tools |
3560
- | `telegram-public` | Memory r/w, web fetch/search | Public users — minimal safe tools only |
3581
+ | `telegram-public` | Memory r/w, web fetch/search, scoped creative tools | Public users — no arbitrary local file access or shell |
3561
3582
  | `api` | All tools | API endpoint — configurable |
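
To make the gating in the table concrete, here is a minimal sketch covering three of the contexts. The context names and the public tool names are taken from this README; the function `isToolAllowed`, the allowlist constant, and the overall structure are assumptions for illustration only. The scoped creative tools are left out because the README does not list their names.

```typescript
// Three of the execution contexts from the table above.
type ExecContext = "terminal" | "telegram-admin-dm" | "telegram-public";

// Public tool names as listed under "Public User Isolation".
const PUBLIC_TOOLS: ReadonlySet<string> = new Set([
  "memory_read",
  "memory_write",
  "memory_search",
  "web_search",
  "web_fetch",
]);

// Hypothetical gate: decides whether a tool may run in a given context.
function isToolAllowed(ctx: ExecContext, tool: string): boolean {
  switch (ctx) {
    case "terminal":
      return true; // wide open
    case "telegram-admin-dm":
      return tool !== "shell"; // shell blocked by default (overridable in config)
    case "telegram-public":
      return PUBLIC_TOOLS.has(tool); // deny by default, allow only listed tools
  }
}
```

Using an allowlist for the public context means a newly added tool stays unavailable to public users until it is explicitly listed.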
3562
3583
 
3563
3584
  **System tools** (`shell`, `file_write`, `file_edit`, `file_read`, `file_patch`, `batch_edit`, `grep_search`, `glob_find`, `list_directory`, `code_sandbox`, `codebase_map`, `git_info`, etc.) are **never exposed** in public-facing contexts.
@@ -3586,9 +3607,9 @@ The bridge distinguishes between **private DMs** and **group/supergroup chats**,
3586
3607
 
3587
3608
  - **Admin DM** → full tool access, live streaming via `editMessageText`, project context injected
3588
3609
  - **Admin in group** → read-only tools + web + vision/OCR, no live streaming, concise responses
3589
- - **Public in group** → minimal safe tools, reply discretion active
3610
+ - **Public in group** → scoped memory + web + media + creative sandbox tools, reply discretion active
3590
3611
 
3591
- **Reply discretion** — in group chats, the agent evaluates whether a message warrants a response. Casual greetings, messages directed at other users, and chatter that doesn't involve the bot are silently skipped (the agent returns `no_reply` as its summary). This prevents the bot from flooding group conversations with unnecessary responses.
3612
+ **Reply discretion** — in group chats, the live router evaluates whether a message warrants a response using the current conversation stream, participants, mentions, replies, and recent tone. Chatter that doesn't involve the bot is silently skipped and retained as context. Skip decisions are not sent back into the chat.
3592
3613
 
3593
3614
  ### Media Handling
3594
3615