omnius 1.0.50 → 1.0.52
- package/README.md +10 -2
- package/dist/index.js +5714 -2424
- package/npm-shrinkwrap.json +2 -2
- package/package.json +2 -2
package/README.md
CHANGED
@@ -279,8 +279,9 @@ The agent uses tools autonomously in a loop — reading errors, fixing code, and
 
 - **60+ autonomous tools** — file I/O, shell, grep, web search/fetch/crawl, memory (read/write/search), sub-agents, background tasks, image/OCR/PDF, git, diagnostics, vision, desktop automation, browser automation, temporal agency (scheduler/reminders/agenda), structured files, code sandbox, transcription, skills, opencode delegation, cron agents, nexus P2P networking + x402 micropayments, **COHERE cognitive stack** (persistent REPL, recursive LLM calls, memory metabolism, identity kernel, reflection, exploration)
 - **Moondream vision** — see and interact with the desktop via Moondream VLM (caption, query, detect, point-and-click)
-- **Image generation with TUI previews** — `/image <prompt>` and the `generate_image` tool create PNGs under `.omnius/images/`, support explicit `--model` selection, try a ranked quality fallback ladder
+- **Image generation with TUI previews** — `/image <prompt>` and the `generate_image` tool create PNGs under `.omnius/images/`, support explicit `--model` selection, default to NVIDIA Sana 1.5 1.6B and try a ranked quality fallback ladder (Sana 1.5 4.8B → Sana 1.5 1.6B → FLUX.1 dev → SD3.5 Large → smaller smoke-test models) when setup or generation fails, and render generated, pasted, screenshot, and camera-capture images as auto-sized ASCII previews via the bundled `image-to-ascii` renderer
 - **Sound and music generation** — `/sound` and `/music` generate WAV files under `.omnius/audio/`, auto-create backend venvs under `.omnius/audio-gen/`, and fall back from high-quality Stable Audio / AudioLDM / MusicGen tiers to smaller practical models when a larger setup or model download fails. Stable Audio uses Diffusers `StableAudioPipeline` instead of the build-prone `stable-audio-tools` package
+- **Video generation with thumbnail previews** — `/video <prompt>` (text-to-video) or `/video --image <path> <prompt>` (image-to-video) and the `generate_video` tool emit MP4s under `.omnius/videos/`, default to Sana-Video 480p (NVlabs, 2B Linear DiT, ICLR 2026 Oral), fall back through Wan2.2 TI2V 5B → LTX-Video → CogVideoX 5B → CogVideoX 2B on OOM/gating/download failure, write a first-frame PNG thumbnail next to the MP4 for TUI ASCII preview, and persist a `<video>.json` sidecar (original/expanded prompt, mode, frames, fps) so Telegram replies to a generated video carry the source prompt forward
 - **Desktop automation** — vision-guided clicking: describe a UI element in natural language, the agent finds and clicks it
 - **Auto-install desktop deps** — screenshot, mouse, OCR, and image tools auto-install missing system packages (scrot, xdotool, tesseract, imagemagick) on first use
 - **Hardware-rated model lists** — first-run setup, `/models`, `/score`, and `/image list` score model fit against detected RAM/VRAM/GPU so text and image model choices are visible before you switch or generate
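The ranked fallback ladder described in the changed README lines above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in `dist/index.js`: `generate_with_fallback`, `GenerationError`, and the `run_model` callable are hypothetical names, and the ladder entries are the display names from the README rather than real model IDs. The README does not say how the 1.6B default interacts with the ladder order, so the sketch simply walks the ladder from top to bottom. The `strict_model` and `fallback` switches mirror the flags named in the second README hunk.

```python
from typing import Callable, Optional, Sequence, Tuple

# Ladder order copied from the README text; display names, not real model IDs.
IMAGE_LADDER = ("Sana 1.5 4.8B", "Sana 1.5 1.6B", "FLUX.1 dev", "SD3.5 Large")


class GenerationError(RuntimeError):
    """Hypothetical error standing in for setup, OOM, gating, or download failures."""


def generate_with_fallback(
    prompt: str,
    run_model: Callable[[str, str], str],
    ladder: Sequence[str] = IMAGE_LADDER,
    model: Optional[str] = None,
    strict_model: bool = False,
    fallback: bool = True,
) -> Tuple[str, str]:
    """Try models in ranked order and return (model_used, output_path)."""
    candidates = list(ladder)
    if model is not None:
        # An explicitly requested model is tried first, then the rest of the ladder.
        candidates = [model] + [m for m in ladder if m != model]
    if strict_model or not fallback:
        # Either flag limits the attempt to the first candidate only.
        candidates = candidates[:1]

    errors = []
    for name in candidates:
        try:
            return name, run_model(name, prompt)
        except GenerationError as exc:
            errors.append(f"{name}: {exc}")
    raise GenerationError("all candidates failed: " + "; ".join(errors))
```

A caller would pass in whatever function actually runs a backend. Keeping the ladder as plain data means the same loop could serve the video ladder (Wan2.2 TI2V 5B, LTX-Video, CogVideoX) just by swapping the sequence.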
@@ -360,7 +361,8 @@ The daemon auto-installs Python dependencies (OpenCLIP, torchaudio + soundfile,
 - **IPFS sharing surface** — `/ipfs` status page with peer info + identity kernel metrics + memory sentiment. `/ipfs pin <CID>` to pin remote agent content. `/ipfs publish` to share identity kernel. `/ipfs share tool/skill` to publish agent-created tools with secret stripping. `/ipfs import <CID>` to retrieve shared content
 - **Fortemi-React bridge** — `/fortemi start/status/stop` connects to [fortemi-react](https://github.com/robit-man/fortemi-react) (browser-first PGlite+pgvector knowledge system) via JWT auth. Proxy tools: `fortemi_capture`, `fortemi_search`, `fortemi_list`, `fortemi_get` auto-register when bridge is connected
 - **Content ingestion** — `/ingest <file>` imports audio (transcribe via Whisper), PDF (pdftotext), or text files into structured memory with 800-char/100-overlap chunking (matches fortemi pattern)
-- **Image generation** — `generate_image` supports Ollama image models, Diffusers models, and stable-diffusion.cpp checkpoints/GGUF.
+- **Image generation** — `generate_image` supports Ollama image models, Diffusers models, and stable-diffusion.cpp checkpoints/GGUF. NVIDIA Sana 1.5 1.6B is the practical default auto-install path under `.omnius/image-gen/.venv` (Apache 2.0 / NSCL, no HF gating); Sana 1.5 4.8B sits above it for 24 GB-class GPUs, with FLUX.1 dev and Stable Diffusion 3.5 Large remaining available as gated high-realism baselines. `/image list` groups models by type, size, quality expectations, and hardware fit. Generation falls through the ranked model ladder unless `strict_model=true` or `fallback=false` is set
+- **Video generation** — `generate_video` runs Diffusers-based video pipelines (`WanPipeline`, `CogVideoXPipeline`/`CogVideoXImageToVideoPipeline`, `MochiPipeline`, `LTXPipeline`/`LTXConditionPipeline`, `HunyuanVideoPipeline`) under `.omnius/video-gen/.venv`. Sana-Video 480p (NVlabs) is the default; the tool autodetects `mode=t2v|i2v`, snaps width/height/frames to each model's required quantum, emits a `<video>.mp4` plus first-frame PNG thumbnail and `<video>.json` sidecar, and falls back through Wan2.2 TI2V 5B → smaller-VRAM models on OOM or gating failure. ComfyUI backend available via `--backend comfyui` for all models with registered workflows. Set `with_audio=true` to auto-generate and mux a matched soundtrack. The Telegram public/group quota for video generation is 2 clips per hour per user (vs. 10 for image/audio)
 - **Node visualization** — [omnius.nexus](https://github.com/robit-man/omnius.nexus) Three.js dashboard: 5-color emotional state mapping (neutral/focused/stressed/dreaming/excited), dynamic node size by memory depth + IPFS storage, activity-modulated connections, identity synchrony golden threads between mutually-pinned agents
 - **TTS sanitizer** — strips markdown syntax (`##`, `**`, `` ` ``), emoji (prevents "white heavy checkmark"), box-drawing chars, and ANSI codes before feeding to ALL TTS engines
 - **LuxTTS gapless playback** — look-ahead pre-synthesis pipeline: next chunk synthesizes while current plays, eliminating inter-sentence gaps. Jetson ARM support with NVIDIA's prebuilt PyTorch wheel
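The added video-generation entry above mentions three small mechanics: autodetecting `mode=t2v|i2v`, snapping width/height/frames to a model's required quantum, and writing a `<video>.json` sidecar. A rough sketch of how these could look follows. All function names, the JSON key names, and the quantum values used in the usage note are assumptions for illustration; the README only lists the sidecar contents as original/expanded prompt, mode, frames, and fps, and does not give per-model quanta.

```python
import json
from pathlib import Path
from typing import Optional


def snap_to_multiple(value: int, quantum: int) -> int:
    """Round to the nearest multiple of quantum (ties round up), at least one quantum."""
    return max(quantum, (value + quantum // 2) // quantum * quantum)


def snap_frames(frames: int, step: int) -> int:
    """Round a frame count to the nearest value of the form step * k + 1, with k >= 1."""
    return snap_to_multiple(max(frames - 1, 0), step) + 1


def detect_mode(image: Optional[str]) -> str:
    """Image-to-video when a source image is supplied, text-to-video otherwise."""
    return "i2v" if image else "t2v"


def write_sidecar(video_path, *, original_prompt, expanded_prompt, mode, frames, fps) -> Path:
    """Write <video>.json next to <video>.mp4; the key names are hypothetical."""
    sidecar = Path(video_path).with_suffix(".json")
    payload = {
        "original_prompt": original_prompt,
        "expanded_prompt": expanded_prompt,
        "mode": mode,
        "frames": frames,
        "fps": fps,
    }
    sidecar.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return sidecar
```

For example, with an assumed spatial quantum of 16 and an assumed temporal step of 4, a request for 500 pixels and 80 frames would be adjusted to 496 pixels and 81 frames before the pipeline is invoked.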
@@ -3361,6 +3363,12 @@ Image surfaces are first-class in the terminal. `/image` generations, generated-
 | `/image --model <model> <prompt>` | Generate with an explicit image model |
 | `/image list` | List image models by category, size, quality expectation, and hardware fit |
 | `/image setup <ollama\|diffusers\|sdcpp>` | Show setup commands for an image-generation backend |
+| `/video` | Open the video-generation model/setup menu |
+| `/video <prompt>` | Generate a short video (text-to-video) and show an ASCII thumbnail |
+| `/video --image <path\|url> <prompt>` | Image-to-video animation from a still |
+| `/video --model <model> <prompt>` | Generate with an explicit video model |
+| `/video list` | List video models by category, size, kinds, and hardware fit |
+| `/video setup <diffusers\|comfyui>` | Show setup commands for a video-generation backend |
 | `/dream [mode]` | Start dream mode (default, deep, lucid) |
 | **Display & Behavior** | |
 | `/stream` | Toggle streaming token display with pastel syntax highlighting |
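The six `/video` forms added to the command table above could be told apart with a small parser along these lines. This is only a sketch under stated assumptions: `VideoCommand` and `parse_video_command` are hypothetical names, the action labels are invented for illustration, and the input is split on whitespace, so quoted paths or prompts are not handled. How omnius itself parses slash commands is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoCommand:
    action: str  # "menu", "list", "setup", or "generate" (labels are illustrative)
    prompt: str = ""
    image: Optional[str] = None
    model: Optional[str] = None
    backend: Optional[str] = None


def parse_video_command(line: str) -> VideoCommand:
    """Map one /video command line onto the forms listed in the README table."""
    tokens = line.split()
    if not tokens or tokens[0] != "/video":
        raise ValueError("not a /video command")
    args = tokens[1:]
    if not args:
        return VideoCommand("menu")
    if args == ["list"]:
        return VideoCommand("list")
    if args[0] == "setup":
        if len(args) != 2 or args[1] not in ("diffusers", "comfyui"):
            raise ValueError("usage: /video setup <diffusers|comfyui>")
        return VideoCommand("setup", backend=args[1])

    cmd = VideoCommand("generate")
    words = []
    i = 0
    while i < len(args):
        if args[i] in ("--image", "--model"):
            if i + 1 >= len(args):
                raise ValueError(args[i] + " needs a value")
            # "--image" fills cmd.image, "--model" fills cmd.model.
            setattr(cmd, args[i][2:], args[i + 1])
            i += 2
        else:
            words.append(args[i])
            i += 1
    cmd.prompt = " ".join(words)
    if not cmd.prompt:
        raise ValueError("a prompt is required")
    return cmd
```

With this sketch, `/video --image cat.png a cat stretching` yields a generate action with `image="cat.png"`, and a bare `/video` yields the menu action.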