npm - omnius - Versions diffs - 1.0.8 → 1.0.9 - Mend

omnius 1.0.8 → 1.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -1,19 +1,17 @@
 <a name="top"></a>
 ```text
- ░▒▓██████▓▒░░▒▓██████████████▓▒░░▒▓███████▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓███████▓▒░
-░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░
-░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░
-░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓██████▓▒░
-░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░      ░▒▓█▓▒░
-░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░      ░▒▓█▓▒░
- ░▒▓██████▓▒░░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓██████▓▒░░▒▓███████▓▒░
+░░      ░░░  ░░░░  ░░   ░░░  ░░        ░░  ░░░░  ░░░      ░░
+▒  ▒▒▒▒  ▒▒   ▒▒   ▒▒    ▒▒  ▒▒▒▒▒  ▒▒▒▒▒  ▒▒▒▒  ▒▒  ▒▒▒▒▒▒▒
+▓  ▓▓▓▓  ▓▓        ▓▓  ▓  ▓  ▓▓▓▓▓  ▓▓▓▓▓  ▓▓▓▓  ▓▓▓      ▓▓
+█  ████  ██  █  █  ██  ██    █████  █████  ████  ████████  █
+██      ███  ████  ██  ███   ██        ███      ████      ██
 ```
 <p align="center">
   <strong>AI coding agent powered entirely by open-weight models.</strong><br>
-  No API keys. No cloud. Your code never leaves your machine.
+  No API keys. No cloud. Your code never leaves your machine <i>(unless you want it to!)</i>
 </p>
 <p align="center">
@@ -280,8 +278,10 @@ The agent uses tools autonomously in a loop — reading errors, fixing code, and
 - **61 autonomous tools** — file I/O, shell, grep, web search/fetch/crawl, memory (read/write/search), sub-agents, background tasks, image/OCR/PDF, git, diagnostics, vision, desktop automation, browser automation, temporal agency (scheduler/reminders/agenda), structured files, code sandbox, transcription, skills, opencode delegation, cron agents, nexus P2P networking + x402 micropayments, **COHERE cognitive stack** (persistent REPL, recursive LLM calls, memory metabolism, identity kernel, reflection, exploration)
 - **Moondream vision** — see and interact with the desktop via Moondream VLM (caption, query, detect, point-and-click)
+- **Image generation with TUI previews** — `/image <prompt>` and the `generate_image` tool create PNGs under `.omnius/images/`, support explicit `--model` selection, and render generated, pasted, screenshot, and camera-capture images as auto-sized ASCII previews via `image-to-ascii`
 - **Desktop automation** — vision-guided clicking: describe a UI element in natural language, the agent finds and clicks it
 - **Auto-install desktop deps** — screenshot, mouse, OCR, and image tools auto-install missing system packages (scrot, xdotool, tesseract, imagemagick) on first use
+- **Hardware-rated model lists** — first-run setup, `/models`, `/score`, and `/image list` score model fit against detected RAM/VRAM/GPU so text and image model choices are visible before you switch or generate
 - **Parallel tool execution** — read-only tools run concurrently via `Promise.allSettled`
 - **Sub-agent delegation** — spawn independent agents for parallel workstreams
 - **OpenCode delegation** — offload coding tasks to opencode (sst/opencode) as an autonomous sub-agent with auto-install, progress monitoring, and result evaluation
@@ -339,7 +339,7 @@ The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile,
 - **Temporal agency** — schedule future tasks via OS cron, set cross-session reminders, flag attention items — startup injection surfaces due items automatically
 - **Web crawling** — multi-page web scraping with Crawlee/Playwright for deep documentation extraction
 - **Task templates** — specialized system prompts and tool recommendations for code, document, analysis, plan tasks
-- **Inference capability scoring** — canirun.ai-style hardware assessment at first launch: memory/compute/speed scores, per-model compatibility matrix, recommended model selection
+- **Inference capability scoring** — canirun.ai-style hardware assessment at first launch and on demand: memory/compute/speed scores, per-model compatibility matrix, `/models` runtime fit ratings, `/image list` image-model fit ratings, and recommended model selection
 - **Auto-install everything** — first-run wizard auto-installs Ollama, curl, Python3, python3-venv with platform-aware package managers (apt, dnf, yum, pacman, apk, zypper, brew)
 - **Sponsored inference** — `/sponsor` walks through a 5-step wizard to share your GPU with the world: select endpoints, choose banner animation (8 presets + AI-generated custom), set header message/links, configure transport (cloudflared/libp2p) + rate limits, and go live. Consumers discover sponsors via `/endpoint sponsor`. Secure proxy relay with per-IP rate limiting, daily token budgets, model allowlist, and concurrent request caps. Sponsor's raw API URL is never exposed. See [Sponsored Inference](#sponsored-inference--share-your-gpu-with-the-world) below
 - **P2P inference network** — `/expose` local models or forward any `/endpoint` (Chutes, Groq, OpenRouter, etc.) through the libp2p P2P mesh. Passthrough mode (`/expose passthrough`) relays upstream API requests; `--loadbalance` distributes rate-limited token budgets across peers. `/expose config` provides an arrow-key menu for all settings. Gateway stats show budget remaining from `x-ratelimit-*` headers. Background daemon persists across Omnius restarts
@@ -357,7 +357,7 @@ The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile,
 - **IPFS sharing surface** — `/ipfs` status page with peer info + identity kernel metrics + memory sentiment. `/ipfs pin <CID>` to pin remote agent content. `/ipfs publish` to share identity kernel. `/ipfs share tool/skill` to publish agent-created tools with secret stripping. `/ipfs import <CID>` to retrieve shared content
 - **Fortemi-React bridge** — `/fortemi start/status/stop` connects to [fortemi-react](https://github.com/robit-man/fortemi-react) (browser-first PGlite+pgvector knowledge system) via JWT auth. Proxy tools: `fortemi_capture`, `fortemi_search`, `fortemi_list`, `fortemi_get` auto-register when bridge is connected
 - **Content ingestion** — `/ingest <file>` imports audio (transcribe via Whisper), PDF (pdftotext), or text files into structured memory with 800-char/100-overlap chunking (matches fortemi pattern)
-- **Image generation** — `generate_image` tool using Ollama experimental models ([x/z-image-turbo](https://ollama.com/x/z-image-turbo), [x/flux2-klein](https://ollama.com/x/flux2-klein)). Auto-detect or auto-pull models. Saves PNG to `.omnius/images/`
+- **Image generation** — `generate_image` supports Ollama image models, Diffusers models, and stable-diffusion.cpp checkpoints/GGUF. SDXL Turbo is the practical default auto-install path under `.omnius/image-gen/.venv`; FLUX.1 dev and Stable Diffusion 3.5 Large are the primary high-realism baselines when hardware allows. `/image list` groups models by type, size, quality expectations, and hardware fit
 - **Node visualization** — [omnius.nexus](https://github.com/robit-man/omnius.nexus) Three.js dashboard: 5-color emotional state mapping (neutral/focused/stressed/dreaming/excited), dynamic node size by memory depth + IPFS storage, activity-modulated connections, identity synchrony golden threads between mutually-pinned agents
 - **TTS sanitizer** — strips markdown syntax (`##`, `**`, `` ` ``), emoji (prevents "white heavy checkmark"), box-drawing chars, and ANSI codes before feeding to ALL TTS engines
 - **LuxTTS gapless playback** — look-ahead pre-synthesis pipeline: next chunk synthesizes while current plays, eliminating inter-sentence gaps. Jetson ARM support with NVIDIA's prebuilt PyTorch wheel
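
The 800-char/100-overlap chunking mentioned for `/ingest` in this hunk can be sketched as a sliding window. This is a minimal illustration, not the package's code: `chunkText` is a hypothetical name, and the sketch assumes the sizes are counted in characters.

```typescript
// Hypothetical helper (not an Omnius API): fixed-size chunks with overlap.
// Assumption: "800-char/100-overlap" means each chunk holds up to 800
// characters and repeats the last 100 characters of the previous chunk.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

With the defaults, a 1,500-character input yields two 800-character chunks whose boundary shares 100 characters, so a sentence cut at a chunk edge still appears whole in one of them.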
@@ -368,7 +368,7 @@ The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile,
 - **Self-learning** — auto-fetches docs from the web when encountering unfamiliar APIs
 - **Seamless `/update`** — in-place update and reload with automatic context save/restore
 - **Blessed mode** — `/full-send-bless` infinite warm loop keeps model weights in VRAM, auto-cycles tasks, never exits until you say stop
-- **Telegram bridge** — `/telegram --key <token> --admin <userid>` public ingress/egress with admin filter and mandatory safety filter; bare `/telegram` toggles the service watchdog
+- **Telegram bridge** — `/telegram --key <token> --admin <userid>` public ingress/egress with admin filter, scoped memory, per-chat personality profiles, sandboxed public creative file/image/audio tools, generated-artifact send-back, and mandatory safety filter; bare `/telegram` toggles the service watchdog
 - **Task control** — `/pause` (gentle halt at turn boundary), `/stop` (immediate kill), `/resume` to continue
 - **Model-tier awareness** — dynamic tool sets, prompt complexity, and context limits scale with model size (small/medium/large)
@@ -3294,13 +3294,16 @@ omnius
 The TUI features an animated multilingual phrase carousel, live metrics bar with pastel-colored labels (token in/out, context window usage, human expert speed ratio, cost), rotating tips, syntax-highlighted tool output, and dynamic terminal-width cropping.
+Image surfaces are first-class in the terminal. `/image` generations, generated-image tool results, pasted image context, screenshots, and camera captures are converted through `image-to-ascii` and sized to the current terminal before being printed into the main scrollback. Each generated image also includes the saved file path below the preview.
 ### Slash Commands
 | Command | Description |
 |---------|-------------|
 | **Model & Endpoint** | |
 | `/model <name>` | Switch to a different model |
-| `/models` | List all available models |
+| `/models` | List available text models with detected hardware-fit ratings |
+| `/score` | Show the inference capability scorecard: memory, compute, speed, and model compatibility |
 | `/endpoint <url>` | Connect to a remote vLLM or OpenAI-compatible API |
 | `/endpoint <url> --auth <key>` | Set endpoint with Bearer auth |
 | `/endpoint <peerId> --auth <key>` | Connect to a libp2p peer via nexus P2P network |
@@ -3318,6 +3321,11 @@ The TUI features an animated multilingual phrase carousel, live metrics bar with
 | **Audio & Vision** | |
 | `/voice [model]` | Toggle TTS voice (GLaDOS, Overwatch, Kokoro, LuxTTS, Supertonic) |
 | `/listen [mode]` | Toggle live microphone transcription |
+| `/image` | Open the image-generation model/setup menu |
+| `/image <prompt>` | Generate an image and show an auto-sized ASCII preview in the TUI |
+| `/image --model <model> <prompt>` | Generate with an explicit image model |
+| `/image list` | List image models by category, size, quality expectation, and hardware fit |
+| `/image setup <ollama\|diffusers\|sdcpp>` | Show setup commands for an image-generation backend |
 | `/dream [mode]` | Start dream mode (default, deep, lucid) |
 | **Display & Behavior** | |
 | `/stream` | Toggle streaming token display with pastel syntax highlighting |
@@ -3440,7 +3448,7 @@ The steering sub-agent uses the same model and backend as the main agent with `m
 <div align="right"><a href="#top">back to top</a></div>
-Connect the agent to a Telegram bot. Telegram can run in auto, chat, or action mode: conversational messages get rapid streamed replies in chat mode, while codebase/file/run requests use dedicated action sub-agents that are visible in the terminal waterfall alongside other agent activity.
+Connect the agent to a Telegram bot. Telegram can run in auto, chat, or action mode: conversational messages get rapid streamed replies in chat mode, while codebase/file/run requests use dedicated action sub-agents that are visible in the terminal waterfall alongside other agent activity. Public group chats get scoped memory, live reply discretion, and sandboxed creative tools for generating files, audio, and images without exposing the local workspace.
 ```bash
 /telegram --key <token>     # Save bot token (persisted to .omnius/settings.json)
@@ -3469,7 +3477,7 @@ The bot token, admin ID, and interaction mode are persisted to settings, so you
 Use `/telegram mode auto|chat|action` to control how inbound Telegram messages are routed:
-- **auto** — short greetings, quick questions, playful messages, and conversational turns use fast streamed chat replies; explicit codebase/file/command/run/test requests use action sub-agents.
+- **auto** — the live router decides whether the message is conversational, actionable, or not directed at the bot. Reply-worthy conversational turns use fast streamed chat replies; explicit codebase/file/command/run/test requests use action sub-agents.
 - **chat** — every non-command message gets a direct quick-chat completion with no tool loop. This is best for rapid back-and-forth conversation.
 - **action** — every non-command message runs through the Telegram sub-agent path with the configured tool policy.
@@ -3528,13 +3536,13 @@ If a user sends another message while their sub-agent is still running, it's inj
 |-------|----------|-------|--------|
 | **Admin DM** (`--admin`, private chat) | 30 | All tools except shell (overridable) | Full read + write |
 | **Admin Group** (admin in group chat) | 15 | Read-only + web + vision/OCR/transcription | Full read + write |
-| **Public** (everyone else) | 8 | memory r/w (scoped), web fetch/search | Scoped per-chat |
+| **Public** (everyone else) | 8 | scoped memory, web fetch/search, media analysis, sandboxed creative file/image/audio tools | Scoped per-chat |
 **Admin DM** — full agent experience in private chat. File read, grep, glob, memory, web research, all tools except shell (which can be unblocked via config).
 **Admin Group** — when the admin speaks in a group chat, the agent responds with read-only capabilities. No system-mutating tools (no shell, no file write, no code execution). Vision, OCR, transcription, and web tools are available for analyzing shared media and answering questions.
-**Public** — lightweight assistant with safety guardrails. No file access, no shell, no code. Web search, scoped memory, and general knowledge only. Reply discretion active in groups.
+**Public** — lightweight assistant with safety guardrails. No shell and no access to arbitrary local files. Web search, scoped memory, media analysis, and creative artifact generation are available inside a per-chat sandbox. Reply discretion is active in groups.
 ### Streaming Responses
@@ -3542,12 +3550,25 @@ While the sub-agent is working, users see:
 1. **Typing indicator** — "typing..." appears immediately and refreshes every 4 seconds until the response is ready
 2. **Admin live streaming** — a placeholder message is sent immediately, then progressively edited via `editMessageText` with accumulated content + intermediate states (tool calls, results, status updates). Admin sees `🔧 tool_name(...)` and `✔ tool_name: result` inline as the agent works
 3. **Markdown → HTML conversion** — all responses are automatically converted from GitHub-flavored Markdown to Telegram-compatible HTML (`<b>`, `<i>`, `<code>`, `<pre>`, `<s>`, `<a>`) with plaintext fallback
-4. **Final message** — committed via `editMessageText` (admin) or `sendMessage` (public) when the agent completes
+4. **Final message selection** — the bridge prefers the assistant's refined visible content over task-complete summaries, router decisions, memory-stage notes, or `no_reply` markers
+5. **Artifact send-back** — generated images, documents, and audio files created inside the scoped creative workspace are uploaded back to Telegram via the appropriate Bot API method
+6. **Final message** — committed via `editMessageText` (admin) or `sendMessage` (public) when the agent completes
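
The "appropriate Bot API method" in the artifact send-back step could be chosen by file extension. The sketch below is an assumption about how such a dispatch might look: `pickSendMethod` and the extension table are hypothetical, and only the method names (`sendPhoto`, `sendAudio`, `sendVoice`, `sendDocument`) come from the Telegram Bot API.

```typescript
type SendMethod = "sendPhoto" | "sendAudio" | "sendVoice" | "sendDocument";

// Assumed mapping for illustration; the bridge's real table is not shown
// in the diff.
const METHOD_BY_EXT = new Map<string, SendMethod>([
  ["png", "sendPhoto"],
  ["jpg", "sendPhoto"],
  ["jpeg", "sendPhoto"],
  ["ogg", "sendVoice"],
  ["oga", "sendVoice"],
  ["mp3", "sendAudio"],
  ["m4a", "sendAudio"],
  ["wav", "sendAudio"],
]);

// Hypothetical helper: anything unrecognized falls back to sendDocument.
function pickSendMethod(fileName: string): SendMethod {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  return METHOD_BY_EXT.get(ext) ?? "sendDocument";
}
```

Falling back to `sendDocument` keeps unknown file types deliverable, which fits the README's note that photos are sent with `sendPhoto` only "when Telegram accepts them".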
 ### Public User Isolation
 Public users get **per-chat isolated memory** — each chat has its own scoped memory namespace (`telegram-{chatId}-{topic}`) so public users can store and retrieve facts about their conversation without accessing or polluting global agent memory. Public tools include: `memory_read`, `memory_write` (scoped), `memory_search`, `web_search`, `web_fetch`.
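
As a rough illustration of the `telegram-{chatId}-{topic}` namespace pattern, the sketch below keys an in-memory store by namespace so that one chat cannot read another chat's facts. `scopedNamespace` and `ScopedMemory` are hypothetical names, and the topic slugging is an assumption; the real storage layer is not shown in this diff.

```typescript
// Hypothetical sketch: names below are illustrative, not Omnius APIs.
function scopedNamespace(chatId: number | string, topic: string): string {
  // Assumption: topics are slugged so the namespace stays a single token.
  const slug = topic.trim().toLowerCase().replace(/[^a-z0-9_-]+/g, "-");
  return `telegram-${chatId}-${slug}`;
}

class ScopedMemory {
  private readonly store = new Map<string, Map<string, string>>();

  write(namespace: string, key: string, value: string): void {
    let bucket = this.store.get(namespace);
    if (!bucket) {
      bucket = new Map<string, string>();
      this.store.set(namespace, bucket);
    }
    bucket.set(key, value);
  }

  // Reads never look outside the given namespace.
  read(namespace: string, key: string): string | undefined {
    return this.store.get(namespace)?.get(key);
  }
}
```

Because every read and write takes the namespace explicitly, a fact stored for chat `111` is invisible to chat `222` even when both use the same topic and key.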
+The bridge also maintains a per-chat conversation state file with recent history, participants, relationship signals, and lightweight Zettelkasten memory cards. Each Telegram group or private chat gets its own scoped personality document under `.omnius/scoped-personality/telegram-chat/`; that profile is updated as people talk and injected into future Telegram context so tone, pacing, names, and relationships stay available turn to turn.
+### Public Creative Artifacts
+Public chats can ask Omnius to create files, images, and audio without giving the model arbitrary write access. The bridge injects a per-chat creative workspace under `.omnius/telegram-creative/<chat>/` and exposes scoped tools that can only create or edit files inside that folder. Generated artifacts are tracked by manifest and by tool result text, then uploaded back into the chat.
+- **Images** — the model can call `generate_image` directly when the conversation asks for an image; generated PNGs are sent with `sendPhoto` when Telegram accepts them
+- **Documents** — Markdown, text, JSON, CSV, and other generated files are sent with `sendDocument`
+- **Audio** — generated WAV/voice artifacts are sent as audio or voice media based on file type
+- **Sandbox rule** — public creative tools cannot delete or mutate anything outside the scoped chat folder
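
The sandbox rule amounts to a path-containment check before any write. The following is a dependency-free sketch under stated assumptions: `resolveInSandbox` is a hypothetical helper, it works on path strings only, and a production check would likely use `node:path` and also resolve symlinks, which this sketch does not attempt.

```typescript
// Hypothetical helper: returns the normalized path inside `root`, or null
// when `requested` would land outside it.
function resolveInSandbox(root: string, requested: string): string | null {
  // Reject absolute paths and Windows drive prefixes outright.
  if (/^([\/\\]|[A-Za-z]:)/.test(requested)) return null;

  const parts: string[] = [];
  for (const segment of requested.split(/[\/\\]+/)) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (parts.length === 0) return null; // would climb above the root
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  if (parts.length === 0) return null; // names the root itself, not a file

  return `${root.replace(/[\/\\]+$/, "")}/${parts.join("/")}`;
}
```

For example, with a root of `.omnius/telegram-creative/123`, a request for `poem.md` resolves inside the folder, while `../456/secret.txt` and `/etc/passwd` are both rejected.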
 ### Context-Aware Tool Policy
 Tools are gated per execution context. The system enforces strict separation between what's available in a terminal session versus a public Telegram group:
@@ -3557,7 +3578,7 @@ Tools are gated per execution context. The system enforces strict separation bet
 | `terminal` | All tools | Wide open — shell, file read/write, everything |
 | `telegram-admin-dm` | All except shell | Admin DM — full tools, shell blocked by default (overridable) |
 | `telegram-admin-group` | Read-only + web + vision/OCR | Admin in public group — no system mutation tools |
-| `telegram-public` | Memory r/w, web fetch/search | Public users — minimal safe tools only |
+| `telegram-public` | Memory r/w, web fetch/search, scoped creative tools | Public users — no arbitrary local file access or shell |
 | `api` | All tools | API endpoint — configurable |
 **System tools** (`shell`, `file_write`, `file_edit`, `file_read`, `file_patch`, `batch_edit`, `grep_search`, `glob_find`, `list_directory`, `code_sandbox`, `codebase_map`, `git_info`, etc.) are **never exposed** in public-facing contexts.
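
The per-context gating in the table above can be pictured as an allowlist lookup. This sketch is not the package's implementation: `isToolAllowed` and the set contents are assumptions assembled from tool names that appear in the README, and the vision/OCR tool names for the admin-group context are omitted because the diff does not list them.

```typescript
type ExecContext =
  | "terminal"
  | "telegram-admin-dm"
  | "telegram-admin-group"
  | "telegram-public"
  | "api";

// Tool names are taken from the README text; the grouping is an assumption.
const PUBLIC_TOOLS = new Set([
  "memory_read", "memory_write", "memory_search",
  "web_search", "web_fetch", "generate_image",
]);
const ADMIN_GROUP_TOOLS = new Set([
  "memory_read", "memory_search", "web_search", "web_fetch",
]);

// Hypothetical gate: deny by default in public-facing contexts.
function isToolAllowed(context: ExecContext, tool: string): boolean {
  switch (context) {
    case "terminal":
    case "api":
      return true; // "All tools" per the table
    case "telegram-admin-dm":
      return tool !== "shell"; // shell blocked by default
    case "telegram-admin-group":
      return ADMIN_GROUP_TOOLS.has(tool);
    case "telegram-public":
      return PUBLIC_TOOLS.has(tool);
    default:
      return false; // unknown contexts get nothing
  }
}
```

An allowlist for the public contexts means a newly added system tool stays hidden from Telegram users until someone explicitly opts it in.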
@@ -3586,9 +3607,9 @@ The bridge distinguishes between **private DMs** and **group/supergroup chats**,
 - **Admin DM** → full tool access, live streaming via `editMessageText`, project context injected
 - **Admin in group** → read-only tools + web + vision/OCR, no live streaming, concise responses
-- **Public in group** → minimal safe tools, reply discretion active
+- **Public in group** → scoped memory + web + media + creative sandbox tools, reply discretion active
-**Reply discretion** — in group chats, the agent evaluates whether a message warrants a response. Casual greetings, messages directed at other users, and chatter that doesn't involve the bot are silently skipped (the agent returns `no_reply` as its summary). This prevents the bot from flooding group conversations with unnecessary responses.
+**Reply discretion** — in group chats, the live router evaluates whether a message warrants a response using the current conversation stream, participants, mentions, replies, and recent tone. Chatter that doesn't involve the bot is silently skipped and retained as context. Skip decisions are not sent back into the chat.
 ### Media Handling