@nomad-e/bluma-cli 0.1.14 → 0.1.16
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +160 -0
- package/dist/config/models_config.json +78 -0
- package/dist/config/native_tools.json +33 -0
- package/dist/main.js +291 -69
- package/dist/skills/git-conventional/LICENSE.txt +3 -0
- package/dist/skills/git-conventional/SKILL.md +83 -0
- package/dist/skills/skill-creator/SKILL.md +495 -0
- package/dist/skills/testing/LICENSE.txt +3 -0
- package/dist/skills/testing/SKILL.md +114 -0
- package/package.json +2 -2
package/README.md
CHANGED

@@ -22,6 +22,7 @@ BluMa is a CLI-based model agent responsible for language-level code generation,
 - [Screenshots](#screenshots)
 - [Usage](#usage)
 - [Examples](#-usage-examples)
+- [Sandbox / Agent Mode](#sandbox-agent-mode)
 - [Configuration and Environment Variables](#configuration-and-environment-variables)
 - [Development and Build](#development-and-build)
 - [Extensibility: Tools and Plugins](#extensibility-tools-and-plugins)
@@ -179,6 +180,165 @@ For full installation details, see [Installation](#installation).
 
 ---
 
+## <a name="sandbox-agent-mode"></a>Sandbox / Agent Mode
+
+BluMa was designed primarily as an **interactive CLI agent**, but it also exposes a **non-interactive “agent mode”** for integration with orchestrators such as AGIWeb Sandbox or other backends.
+
+### Why Agent Mode Exists
+
+- Allow external systems (e.g. a Sandbox API, another agent like Severino, or CI pipelines) to:
+  - Send a **single JSON payload** describing a task (`action` + `context`).
+  - Receive **only structured JSON Lines (JSONL)** as output (no TUI).
+  - Orchestrate BluMa as a **sub-agent** inside a larger architecture.
+- Guarantee:
+  - Deterministic, parseable logs.
+  - A single, well-defined `result` event per execution.
+  - No interactive prompts or confirmation flows when running in the sandbox.
+
+### How to Call BluMa in Agent Mode
+
+Agent mode is activated by passing the `agent` subcommand and piping a JSON envelope to stdin:
+
+```bash
+BLUMA_SANDBOX=true \
+BLUMA_SANDBOX_NAME="sandbox-api" \
+node dist/main.js agent --input - << 'EOF'
+{
+  "message_id": "job-123",
+  "from_agent": "sandbox-api",
+  "to_agent": "bluma",
+  "action": "echo_test",
+  "context": {
+    "user_request": "Tell me in one sentence what bluma-cli is."
+  },
+  "metadata": {
+    "sandbox": true
+  }
+}
+EOF
+```
+
+You can also use `--input-file` instead of stdin:
+
+```bash
+BLUMA_SANDBOX=true BLUMA_SANDBOX_NAME="sandbox-api" \
+node dist/main.js agent --input-file ./payload.json
+```
+
+### Input Envelope Contract
+
+The JSON payload must follow this envelope:
+
+```json
+{
+  "message_id": "job-123",       // Optional but recommended
+  "from_agent": "sandbox-api",   // Who is calling BluMa
+  "to_agent": "bluma",           // Target agent (for routing)
+  "action": "generate_app",      // High-level action label
+  "context": {                   // Arbitrary JSON with task details
+    "user_request": "Create a sales dashboard",
+    "erp_models": ["sale.order"],
+    "permissions": ["sales"]
+  },
+  "metadata": {                  // Free-form metadata for the orchestrator
+    "sandbox": true,
+    "caller": "agiweb"
+  }
+}
+```
+
+Internally, BluMa will:
+
+- Initialize the agent with a dedicated `eventBus`.
+- Build a single user message containing this JSON.
+- Run the normal reasoning + tool flow, but:
+  - **Without** rendering the Ink UI.
+  - **Without** asking for user confirmations when `BLUMA_SANDBOX=true`.
+
+### Output: JSON Lines (JSONL)
+
+In agent mode, BluMa writes **one JSON object per line** to stdout.
+Typical events:
+
+```json
+{"event_type":"log","level":"info","message":"Starting agent mode execution","timestamp":"...","data":{"message_id":"job-123","action":"echo_test","from_agent":"sandbox-api","to_agent":"bluma"}}
+{"event_type":"action_status","timestamp":"...","payload":{"action":"Thinking"}}
+{"event_type":"backend_message","backend_type":"tool_call","timestamp":"...","payload":{"type":"tool_call","tool_name":"read_file_lines","arguments":{...}}}
+{"event_type":"backend_message","backend_type":"tool_result","timestamp":"...","payload":{"type":"tool_result","tool_name":"read_file_lines","result":"{ ... }"}}
+...
+{"event_type":"result","status":"success","timestamp":"...","data":{"message_id":"job-123","action":"echo_test","last_assistant_message":"...","reasoning":null}}
+```
+
+Key points:
+
+- **`event_type: "backend_message"`** mirrors what the CLI UI would receive (`tool_call`, `tool_result`, `reasoning`, `done`, etc.).
+- **`event_type: "action_status"`** surfaces high-level states (Thinking, Reading, Executing, Waiting, Responding).
+- **`event_type: "result"`** appears **exactly once** per execution and contains:
+  - `message_id`: propagated from the input.
+  - `action`: propagated from the input.
+  - `last_assistant_message`: the final message BluMa would send to a human (the content of the `message` tool).
+  - `reasoning`: concatenated reasoning text when available (can be `null`).
+
+### Sandbox Behaviour and Permissions
+
+When `BLUMA_SANDBOX=true`:
+
+- The **system prompt** is augmented with sandbox-specific context, instructing the model that:
+  - It is running **inside a non-interactive sandbox**.
+  - All inputs come from JSON payloads, not from a human on a terminal.
+  - Outputs must be deterministic, concise, and suitable for machine parsing.
+- Tool execution:
+  - All tools are considered **auto-approved** in sandbox mode (no confirmation prompts from the user).
+  - This allows the orchestrator to let BluMa freely call `shell_command`, `command_status`, `coding_memory`, etc., while still observing every step through JSONL logs.
+
+### Example: Asking for the Python Version
+
+```bash
+BLUMA_SANDBOX=true BLUMA_SANDBOX_NAME="sandbox-api" \
+node dist/main.js agent --input - << 'EOF'
+{
+  "message_id": "job-python-version",
+  "from_agent": "sandbox-api",
+  "to_agent": "bluma",
+  "action": "python_version",
+  "context": {
+    "user_request": "Tell me which Python version is installed in this environment."
+  },
+  "metadata": {
+    "sandbox": true
+  }
+}
+EOF
+```
+
+BluMa will typically:
+
+- Call `shell_command` with `python3 --version`.
+- Use `command_status` to wait for completion.
+- Optionally probe `python --version`.
+- Return a final `result` event like:
+
+```json
+{
+  "event_type": "result",
+  "status": "success",
+  "data": {
+    "message_id": "job-python-version",
+    "action": "python_version",
+    "last_assistant_message": "**Python 3.12.3** is installed in this environment.\n\nThe `python` command is not available; only `python3` is.",
+    "reasoning": null
+  }
+}
+```
+
+This makes it straightforward for an API layer (AGIWeb Sandbox, Severino, etc.) to:
+
+- Orchestrate BluMa as a sub-agent.
+- Log all intermediate steps.
+- Present only the final `last_assistant_message` (and optionally `reasoning`) to the end user.
+
+---
+
 ## Screenshots
 
 Here's BluMa in action:
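The input envelope described above is plain JSON, so an orchestrator can assemble it programmatically before piping it to the CLI. A minimal sketch in Python; the `make_envelope` helper is hypothetical (not part of bluma-cli), and the field semantics follow the README excerpt above:

```python
import json

def make_envelope(action, user_request, message_id=None, **context):
    """Hypothetical builder for the agent-mode input envelope
    described in the README excerpt (message_id is optional)."""
    envelope = {
        "from_agent": "sandbox-api",
        "to_agent": "bluma",
        "action": action,
        "context": {"user_request": user_request, **context},
        "metadata": {"sandbox": True},
    }
    if message_id is not None:
        envelope["message_id"] = message_id
    return json.dumps(envelope)

payload = make_envelope("python_version",
                        "Tell me which Python version is installed.",
                        message_id="job-python-version")
print(json.loads(payload)["action"])  # python_version
```

The resulting string can be written to the process's stdin or saved for `--input-file`.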
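Consuming the JSONL stream is equally simple: read stdout line by line, parse each line as JSON, and keep the single `result` event. A sketch in Python, assuming event shapes as in the excerpt above; the `final_result` helper is illustrative, not part of the package:

```python
import json

def final_result(jsonl_lines):
    """Return the single 'result' event from agent-mode JSONL output."""
    result = None
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("event_type") == "result":
            result = event  # the contract promises exactly one of these
    if result is None:
        raise RuntimeError("agent exited without emitting a result event")
    return result

# Events shaped like the README excerpt above:
lines = [
    '{"event_type":"action_status","payload":{"action":"Thinking"}}',
    '{"event_type":"result","status":"success","data":{"message_id":"job-123","last_assistant_message":"..."}}',
]
print(final_result(lines)["data"]["message_id"])  # job-123
```

Intermediate `backend_message` and `action_status` events can be logged as they arrive, while only the `result` payload is surfaced to the end user.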
package/dist/config/models_config.json
ADDED

@@ -0,0 +1,78 @@
+{
+  "models": [
+    {
+      "id": "google/gemini-3.1-pro-preview-customtools",
+      "tier": "tools",
+      "task_type": "multi_tool",
+      "context_window": 1048576,
+      "pricing": {
+        "input_per_1m_tokens": "$2.00",
+        "output_per_1m_tokens": "$12.00"
+      },
+      "description": "Enhanced version of Gemini 3.1 Pro tuned specifically for reliable tool selection and function calling. Designed to avoid overusing generic shell tools when more targeted tools are available, making it ideal for complex coding agents and multi-tool workflows in the CLI.",
+      "best_for": [
+        "Workflows that involve many different tools (file operations, search, tests, shell) and require the model to pick the right one.",
+        "Coordinating multi-step coding tasks where safe and accurate tool use is more important than raw speed or cost.",
+        "Complex debugging or refactors that mix code edits, search, tests, and shell commands.",
+        "Agent-style tasks where tool misuse is risky and you want maximum reliability in tool choice."
+      ],
+      "not_for": "Very simple edits, quick questions, or cheap one-off operations that qwen3.5-flash can handle more economically."
+    },
+    {
+      "id": "qwen/qwen3-coder-next",
+      "tier": "coder",
+      "task_type": "multi_file",
+      "context_window": 262144,
+      "pricing": {
+        "input_per_1m_tokens": "$0.12",
+        "output_per_1m_tokens": "$0.75"
+      },
+      "description": "Coder-focused MoE model optimized for long-horizon coding agents and local development workflows. Handles large codebases, multi-file refactors, and recovery from execution failures while remaining cost-effective for always-on agents.",
+      "best_for": [
+        "Large refactors across many files where the agent must keep track of a global design.",
+        "Implementing new subsystems (features, services, modules) with tests and documentation.",
+        "Agentic coding loops that involve planning, executing edits, running commands/tests, and iterating.",
+        "Complex bug-hunting sessions where the model must reason over many files and past attempts."
+      ],
+      "not_for": "Tiny edits, trivial Q&A, or ultra-simple tasks where qwen3.5-flash is cheaper and fast enough."
+    },
+    {
+      "id": "deepseek/deepseek-chat-v3.1",
+      "tier": "reasoning",
+      "task_type": "reasoning",
+      "context_window": 32768,
+      "pricing": {
+        "input_per_1m_tokens": "$0.15",
+        "output_per_1m_tokens": "$0.75"
+      },
+      "description": "Hybrid reasoning model supporting both thinking and non-thinking modes with strong performance on coding, tool use, and long-context reasoning. Good choice when deep structured reasoning or complex analysis is needed in addition to code generation.",
+      "best_for": [
+        "Deep reasoning on tricky bugs, architecture decisions, or algorithm design.",
+        "Analyzing logs, traces, or large textual outputs to identify root causes.",
+        "Complex code reviews or design reviews where clear, structured justification is needed.",
+        "Cases where you explicitly want more deliberate reasoning rather than only fast responses."
+      ],
+      "not_for": "Purely mechanical edits or very simple tasks where a cheaper, faster model is sufficient."
+    },
+    {
+      "id": "anthropic/claude-haiku-4.5",
+      "tier": "heavy",
+      "task_type": "complex",
+      "context_window": 200000,
+      "pricing": {
+        "input_per_1m_tokens": "$1.00",
+        "output_per_1m_tokens": "$5.00"
+      },
+      "description": "High-end, high-speed frontier model with extended thinking and strong performance on coding, reasoning, and computer-use tasks. Best reserved for the most complex, high-stakes workflows where you need frontier-level capability and robustness.",
+      "best_for": [
+        "Very large, high-risk changes to critical production code where maximum reliability matters.",
+        "Long, multi-phase agentic tasks that must not fail (e.g. migrations, deep refactors, safety-sensitive changes).",
+        "Scenarios where you want frontier-level coding performance and can afford higher cost.",
+        "Coordinating many sub-agents or parallel tool workflows for large projects."
+      ],
+      "not_for": "Routine day-to-day coding tasks, quick fixes, or low-impact work where cheaper models are perfectly adequate."
+    }
+  ],
+  "default_model": "qwen/qwen3-coder-next"
+}
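Since every entry carries a `tier` and the file names a `default_model`, a caller could route tasks by tier in a few lines. A sketch in Python; the config dict is abridged from the JSON above, and the `pick_model` helper is hypothetical, not an API of bluma-cli:

```python
# Abridged from models_config.json in this diff; only the fields used below.
MODELS_CONFIG = {
    "models": [
        {"id": "google/gemini-3.1-pro-preview-customtools", "tier": "tools"},
        {"id": "qwen/qwen3-coder-next", "tier": "coder"},
        {"id": "deepseek/deepseek-chat-v3.1", "tier": "reasoning"},
        {"id": "anthropic/claude-haiku-4.5", "tier": "heavy"},
    ],
    "default_model": "qwen/qwen3-coder-next",
}

def pick_model(tier=None):
    """Hypothetical helper: return the first model id matching a tier,
    falling back to default_model when no tier matches."""
    for model in MODELS_CONFIG["models"]:
        if model["tier"] == tier:
            return model["id"]
    return MODELS_CONFIG["default_model"]

print(pick_model("reasoning"))  # deepseek/deepseek-chat-v3.1
print(pick_model())             # qwen/qwen3-coder-next
```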
package/dist/config/native_tools.json
CHANGED

@@ -561,6 +561,39 @@
           ]
         }
       }
+    },
+    {
+      "type": "function",
+      "function": {
+        "name": "coding_memory",
+        "description": "Persists and retrieves short notes about the codebase, decisions, and context that should be remembered across turns. Use this to store important insights (APIs, invariants, design decisions) and to search them later.",
+        "parameters": {
+          "type": "object",
+          "properties": {
+            "action": {
+              "type": "string",
+              "description": "Operation to perform: add a new note, list all notes, search by text/tags, or clear all entries.",
+              "enum": ["add", "list", "search", "clear"]
+            },
+            "note": {
+              "type": "string",
+              "description": "The note to store when action is 'add'. Should summarize something worth remembering about the code, architecture, or requirements."
+            },
+            "tags": {
+              "type": "array",
+              "items": {
+                "type": "string"
+              },
+              "description": "Optional tags to categorize the note (e.g. ['api', 'auth', 'performance'])."
+            },
+            "query": {
+              "type": "string",
+              "description": "Search text used when action is 'search'. Matches against note text and tags."
+            }
+          },
+          "required": ["action"]
+        }
+      }
     }
   ]
 }
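Per the `coding_memory` schema above, only `action` is required and it must be one of four enum values; an orchestrator may want to sanity-check arguments before dispatching the tool call. An illustrative sketch in Python (this validator is not part of the package, and the extra check for `add` is stricter than the schema itself):

```python
VALID_ACTIONS = {"add", "list", "search", "clear"}  # enum from the schema above

def validate_coding_memory_args(args):
    """Mirror the schema's constraints: 'action' is required and must be
    one of the enum values. The 'add needs a note' rule is an extra,
    stricter check than the schema, which only requires 'action'."""
    action = args.get("action")
    if action not in VALID_ACTIONS:
        raise ValueError(f"invalid action: {action!r}")
    if action == "add" and not args.get("note"):
        raise ValueError("'add' should carry a 'note' to store")
    return True

print(validate_coding_memory_args(
    {"action": "add", "note": "auth uses JWT", "tags": ["auth"]}))  # True
```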