@octavus/docs 2.5.0 → 2.6.0

This diff shows the changes between two publicly released versions of the package as they appear in the public registry. It is provided for informational purposes only.

Files changed (24)

package/content/03-client-sdk/10-client-tools.md +4 -0
package/content/04-protocol/06-handlers.md +23 -1
package/content/04-protocol/07-agent-config.md +13 -3
package/dist/{chunk-SX6AIMRO.js → chunk-BY3IBD5N.js} +13 -13
package/dist/chunk-BY3IBD5N.js.map +1 -0
package/dist/{chunk-WQ7BTD5T.js → chunk-HMXCAQPP.js} +17 -17
package/dist/chunk-HMXCAQPP.js.map +1 -0
package/dist/{chunk-G5OOF4JJ.js → chunk-JBFWLUBN.js} +11 -11
package/dist/chunk-JBFWLUBN.js.map +1 -0
package/dist/chunk-TE6Q4675.js +1471 -0
package/dist/chunk-TE6Q4675.js.map +1 -0
package/dist/chunk-V35N3I3V.js +1471 -0
package/dist/chunk-V35N3I3V.js.map +1 -0
package/dist/content.js +1 -1
package/dist/docs.json +6 -6
package/dist/index.js +1 -1
package/dist/search-index.json +1 -1
package/dist/search.js +1 -1
package/dist/search.js.map +1 -1
package/dist/sections.json +6 -6
package/package.json +1 -1
package/dist/chunk-G5OOF4JJ.js.map +0 -1
package/dist/chunk-SX6AIMRO.js.map +0 -1
package/dist/chunk-WQ7BTD5T.js.map +0 -1

package/content/03-client-sdk/10-client-tools.md CHANGED

@@ -135,6 +135,7 @@ interface ClientToolContext {
   toolCallId: string; // Unique ID for this call
   toolName: string; // Name of the tool
   signal: AbortSignal; // Aborted if user stops generation
+  addFile: (file: FileReference) => void; // Attach a file to the result
 }
 ```
@@ -149,6 +150,8 @@ Use the signal to cancel long-running operations:
 }
 ```
+Tools that produce files (e.g., screenshots) can call `ctx.addFile()` to attach them to the result. Attached files are sent to the platform alongside the tool result so the LLM can see them as visual content on the next turn.
 ## Interactive Client Tools
 Interactive tools require user input before completing. Use these for confirmations, forms, or any UI that needs user action.
@@ -541,6 +544,7 @@ interface ClientToolContext {
   toolCallId: string;
   toolName: string;
   signal: AbortSignal;
+  addFile: (file: FileReference) => void;
 }
 // Interactive tool (with bound methods)
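
Note on the hunks above: the new `addFile` member lets a tool handler attach files to its result. The following is a minimal sketch of how a handler might use it, not the SDK's own code. The diff does not show the fields of `FileReference`, so the `url` and `mediaType` fields here are assumptions, and `takeScreenshot`, `createRecordingContext`, and the placeholder URL are hypothetical names used only for illustration.

```typescript
// Hypothetical shape: the diff does not show FileReference's fields.
interface FileReference {
  url: string;
  mediaType: string;
}

// Mirrors the ClientToolContext interface shown in the diff.
interface ClientToolContext {
  toolCallId: string;
  toolName: string;
  signal: AbortSignal;
  addFile: (file: FileReference) => void;
}

// Hypothetical tool handler: returns a small result and attaches one file.
function takeScreenshot(
  args: { pageUrl: string },
  ctx: ClientToolContext,
): { captured: boolean; source: string } {
  // Skip the work if the user already stopped generation.
  if (ctx.signal.aborted) {
    return { captured: false, source: args.pageUrl };
  }
  // A real tool would capture and upload an image here; this URL is a placeholder.
  ctx.addFile({
    url: "https://files.example.test/" + ctx.toolCallId + ".png",
    mediaType: "image/png",
  });
  return { captured: true, source: args.pageUrl };
}

// Stand-in context that records attached files, for exercising a handler locally.
function createRecordingContext(toolName: string) {
  const files: FileReference[] = [];
  const controller = new AbortController();
  const ctx: ClientToolContext = {
    toolCallId: "call_1",
    toolName: toolName,
    signal: controller.signal,
    addFile: (file) => {
      files.push(file);
    },
  };
  return { ctx: ctx, files: files, abort: () => controller.abort() };
}
```

Calling `takeScreenshot` with a recording context leaves one entry in `files`. Calling `abort()` first leaves it empty, which matches the documented role of `signal`.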

package/content/04-protocol/06-handlers.md CHANGED

@@ -186,7 +186,29 @@ Generate image:
   description: Generating your image... # Shown in UI
 ```
-This block is for deterministic image generation pipelines where the prompt is constructed programmatically (e.g., via prompt engineering in a separate thread).
+Edit an existing image using reference images:
+```yaml
+Edit image:
+  block: generate-image
+  prompt: EDIT_INSTRUCTIONS # e.g., "Remove the background"
+  referenceImages: [SOURCE_IMAGE_URL] # Variable(s) containing image URLs
+  imageModel: google/gemini-2.5-flash-image
+  output: EDITED_IMAGE
+  description: Editing image...
+```
+| Field             | Required | Description                                                     |
+| ----------------- | -------- | --------------------------------------------------------------- |
+| `prompt`          | Yes      | Variable name containing the image prompt or edit instructions  |
+| `imageModel`      | Yes      | Image model identifier (e.g., `google/gemini-2.5-flash-image`)  |
+| `size`            | No       | Image dimensions: `1024x1024`, `1792x1024`, or `1024x1792`      |
+| `referenceImages` | No       | Variable names containing image URLs for editing/transformation |
+| `output`          | No       | Variable name to store the generated image URL                  |
+| `thread`          | No       | Thread to associate the output file with                        |
+| `description`     | No       | Description shown in the UI during generation                   |
+This block is for deterministic image generation pipelines where the prompt is constructed programmatically (e.g., via prompt engineering in a separate thread). When `referenceImages` are provided, the prompt describes how to modify those images.
 For agentic image generation where the LLM decides when to generate, configure `imageModel` in the [agent config](/docs/protocol/agent-config#image-generation).

package/content/04-protocol/07-agent-config.md CHANGED

@@ -200,7 +200,7 @@ agent:
   agentic: true
 ```
-When `imageModel` is configured, the `octavus_generate_image` tool becomes available. The LLM can decide when to generate images based on user requests.
+When `imageModel` is configured, the `octavus_generate_image` tool becomes available. The LLM can decide when to generate images based on user requests. The tool supports both text-to-image generation and image editing/transformation using reference images.
 ### Supported Image Providers
@@ -220,16 +220,26 @@ The tool supports three image sizes:
 - `1792x1024` — Landscape (16:9)
 - `1024x1792` — Portrait (9:16)
+### Image Editing with Reference Images
+Both the agentic tool and the `generate-image` block support reference images for editing and transformation. When reference images are provided, the prompt describes how to modify or use those images.
+| Provider | Models                           | Reference Image Support |
+| -------- | -------------------------------- | ----------------------- |
+| OpenAI   | `gpt-image-1`                    | Yes                     |
+| Google   | Gemini native (`gemini-*-image`) | Yes                     |
+| Google   | Imagen (`imagen-*`)              | No                      |
 ### Agentic vs Deterministic
 Use `imageModel` in agent config when:
-- The LLM should decide when to generate images
+- The LLM should decide when to generate or edit images
 - Users ask for images in natural language
 Use `generate-image` block (see [Handlers](/docs/protocol/handlers#generate-image)) when:
-- You want explicit control over image generation
+- You want explicit control over image generation or editing
 - Building prompt engineering pipelines
 - Images are generated at specific handler steps
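
Note on the hunks above: the field table added to `06-handlers.md` and the provider table added to `07-agent-config.md` together say which `generate-image` steps are well formed. The sketch below encodes both tables as a small checker. It illustrates the documented rules under stated assumptions and is not Octavus's validation logic. The names `GenerateImageStep`, `supportsReferenceImages`, and `validateGenerateImageStep` are hypothetical, and treating the text before `/` as a provider prefix is an assumption based on the `google/gemini-2.5-flash-image` identifier in the diff.

```typescript
// Allowed sizes and field names are transcribed from the diff; the type and
// function names below are hypothetical, not part of the Octavus API.
const IMAGE_SIZES = ["1024x1024", "1792x1024", "1024x1792"];

interface GenerateImageStep {
  block: "generate-image";
  prompt?: string; // variable name holding the prompt or edit instructions
  imageModel?: string;
  size?: string;
  referenceImages?: string[]; // variable names holding image URLs
  output?: string;
  thread?: string;
  description?: string;
}

// Returns true or false for models covered by the support table, and
// undefined for models the table does not list.
function supportsReferenceImages(imageModel: string): boolean | undefined {
  const slash = imageModel.lastIndexOf("/");
  // Assumption: anything before the last "/" is a provider prefix.
  const name = slash === -1 ? imageModel : imageModel.slice(slash + 1);
  if (name === "gpt-image-1") return true;
  if (/^gemini-.+-image$/.test(name)) return true;
  if (/^imagen-/.test(name)) return false;
  return undefined;
}

// Collects human-readable problems; an empty array means the step passes.
function validateGenerateImageStep(step: GenerateImageStep): string[] {
  const problems: string[] = [];
  if (!step.prompt) problems.push("prompt is required");
  if (!step.imageModel) problems.push("imageModel is required");
  if (step.size !== undefined && IMAGE_SIZES.indexOf(step.size) === -1) {
    problems.push("size must be one of " + IMAGE_SIZES.join(", "));
  }
  const hasReferences =
    step.referenceImages !== undefined && step.referenceImages.length > 0;
  if (hasReferences && step.imageModel && supportsReferenceImages(step.imageModel) === false) {
    problems.push("referenceImages is not supported by " + step.imageModel);
  }
  return problems;
}

// The "Edit image" step from the handlers diff, written as an object.
const editStep: GenerateImageStep = {
  block: "generate-image",
  prompt: "EDIT_INSTRUCTIONS",
  referenceImages: ["SOURCE_IMAGE_URL"],
  imageModel: "google/gemini-2.5-flash-image",
  output: "EDITED_IMAGE",
  description: "Editing image...",
};
```

Under these assumptions `validateGenerateImageStep(editStep)` returns an empty array. Swapping `imageModel` for an identifier matching `imagen-*` while keeping `referenceImages` yields one problem, reflecting the "No" row in the provider table.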