RubyGems - ox-ai-workers - Versions diffs - 0.9.5 → 0.9.6.1 - Mend

ox-ai-workers 0.9.5 → 0.9.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

checksums.yaml +4 -4
data/.cursor/rules/010-project-structure.mdc +89 -0
data/.cursor/rules/998-clean-code.mdc +52 -0
data/.cursor/rules/999-mdc-format.mdc +132 -0
data/CHANGELOG.md +4 -1
data/README.md +229 -1
data/lib/oxaiworkers/assistant/module_base.rb +8 -0
data/lib/oxaiworkers/assistant/painter.rb +0 -6
data/lib/oxaiworkers/iterator.rb +33 -4
data/lib/oxaiworkers/module_request.rb +3 -2
data/lib/oxaiworkers/tool/pixels.rb +3 -23
data/lib/oxaiworkers/version.rb +1 -1
metadata +4 -2
data/.cursorrules +0 -155

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c008f2f4f413251ffa88e00935e118c9844b2e67ab5088ca1091f2b7ac64ac77
-  data.tar.gz: a9b4cf53a19322a10db55ffa480d1e9bfe3a587411398ca5797058f5941c6ee8
+  metadata.gz: 768550f6fde8e654917a2b01404856abcbe643429ba725d7c86c853b472f4bed
+  data.tar.gz: 9444f8c6869caa2e7039d3632605885579364942de95ba2fa6b201d5ca00c3e5
 SHA512:
-  metadata.gz: ac9053e4092115ef23a2f39d834c511c91d9594d8fc8ba37af95652917879ff5cd390581d2bb6d37d184af61d2809a9292c33375c8bd3d06eccb4b84a29ad2ff
-  data.tar.gz: 907e52e93352af84f3bc7b38ef02180dc1a4850f645bb40c7e36d371e85c1a8cdb00e60aa9805d91834b01437c3ae2ff5eb60746e379ec0bfae65a147b627c5c
+  metadata.gz: 6a0c8f3622ab30d8a199b367ad3b0bab3c985a292cc786d3c38380515b5adbe02b945eaff4606073b199a59bf4237bb36c86a0a89b07f6c62c0bc0b738ed7593
+  data.tar.gz: 491720defa83f4b26cccc0d389975f1125c8636821c37314814fdbe9bf714d9233f22456a09b2f8ec890e2b52d3fcb96065295563b1272336316f34461fe6667
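
The digests above are what RubyGems records for the two archives packed inside the `.gem` file. As a minimal sketch, assuming you have already extracted `data.tar.gz` from the gem yourself, a SHA256 check can be done with Ruby's standard `Digest` library. The helper name `checksum_matches?` and the file path in the usage comment are hypothetical and not part of RubyGems or this gem.

```ruby
require 'digest'

# Hypothetical helper: compares the SHA256 hex digest of raw bytes
# with an expected value such as one listed in checksums.yaml.
def checksum_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Usage sketch (the path is an assumption; adjust to where the archive was extracted):
# bytes = File.binread('data.tar.gz')
# checksum_matches?(bytes, '9444f8c6869caa2e7039d3632605885579364942de95ba2fa6b201d5ca00c3e5')
```

Reading the file with `File.binread` matters here, since a text-mode read could alter the bytes on some platforms and change the digest.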

data/.cursor/rules/010-project-structure.mdc ADDED

@@ -0,0 +1,89 @@
+---
+description:
+globs:
+alwaysApply: true
+---
+# Overview
+OxAiWorkers is a Ruby gem that implements a finite state machine (FSM, using the `state_machine` gem) to solve tasks using generative intelligence. This approach enhances the final result by utilizing internal monologue and external tools.
+## Core Components
+- `Request` and `DelayedRequest` - classes for executing API requests (immediate and delayed)
+- `ModuleRequest` - base class for all API requests with parsing and response handling ([module_request.rb](mdc:lib/oxaiworkers/module_request.rb))
+- `Iterator` - main class for iterative task execution with tools ([iterator.rb](mdc:lib/oxaiworkers/iterator.rb))
+- `Assistant::ModuleBase` - high-level wrappers over Iterator (Sysop, Coder, Localizer, etc.) ([module_base.rb](mdc:lib/oxaiworkers/assistant/module_base.rb))
+- `Tool` - tools that can be used during task execution (Eval, FileSystem, Database, Pixels, Pipeline)
+- `ToolDefinition` - module for declaring functions and methods for tools ([tool_definition.rb](mdc:lib/oxaiworkers/tool_definition.rb))
+- `StateTools` - base class for managing states and transitions ([state_tools.rb](mdc:lib/oxaiworkers/state_tools.rb))
+- `ContextualLogger` - logging system with contextual information support
+## Code Conventions
+- Use `snake_case` for method and variable names
+- All code comments, CHANGELOG, README, and other documentation must be written in English
+## Interaction Patterns
+- The system uses internal monologue (inner_monologue) for planning actions
+- External voice (outer_voice) is used for communication with the user
+- Execution flow management through finite state machine
+- Implementation of callback mechanisms for flexible event handling
+## Tools Architecture
+- Each tool should be a self-contained module
+- Tools are registered through the `define_function` interface
+- All tools should handle their own errors and return readable messages
+- Handle errors at the tool level, preventing them from interrupting the main execution flow
+## Finite State Machine Implementation
+- Core FSM based on `state_machine` gem with states: idle → prepared → requested → analyzed → finished → idle
+- State transitions managed by events: prepare, request, analyze, complete, iterate, end ([state_tools.rb](mdc:lib/oxaiworkers/state_tools.rb), [iterator.rb](mdc:lib/oxaiworkers/iterator.rb))
+- `StateTools` - base class for FSM implementation with event hooks and transition callbacks ([state_tools.rb](mdc:lib/oxaiworkers/state_tools.rb))
+- `StateBatch` - FSM extension for batch request processing with additional states
+- Automatic error recovery and retry mechanisms for failed API requests
+## Iterator Lifecycle
+- 3 core functions: inner_monologue, outer_voice, finish_it ([iterator.rb](mdc:lib/oxaiworkers/iterator.rb))
+- Configurable message queue for stateful conversation history
+- Callback system for processing each state transition
+- Context and milestone management for optimizing token usage
+- Support for custom steps and instruction templating
+## Assistants Details
+- `ModuleBase` - shared functionality for all assistant types ([module_base.rb](mdc:lib/oxaiworkers/assistant/module_base.rb))
+- `Sysop` - system administration and shell command execution ([sysop.rb](mdc:lib/oxaiworkers/assistant/sysop.rb), [file_system.rb](mdc:lib/oxaiworkers/tool/file_system.rb))
+- `Coder` - specialized for code generation and analysis ([coder.rb](mdc:lib/oxaiworkers/assistant/coder.rb), [eval.rb](mdc:lib/oxaiworkers/tool/eval.rb))
+- `Localizer` - translation and localization support ([localizer.rb](mdc:lib/oxaiworkers/assistant/localizer.rb))
+- `Orchestrator` - Coordinates multiple assistants to work together on complex tasks ([orchestrator.rb](mdc:lib/oxaiworkers/assistant/orchestrator.rb), [pipeline.rb](mdc:lib/oxaiworkers/tool/pipeline.rb))
+- `Painter` - Image generation and manipulation ([painter.rb](mdc:lib/oxaiworkers/assistant/painter.rb), [pixels.rb](mdc:lib/oxaiworkers/tool/pixels.rb))
+## Internationalization and Localization
+- All user-facing strings MUST be properly localized using I18n (config/locales/*.yml)
+- Use I18n.t for all text that will be shown to users or appears in assistant prompts
+- Store translations in YAML files within the config/locales directory
+- Follow the naming convention of language.namespace.key (e.g., en.oxaiworkers.assistant.role)
+- Use named parameters (%{variable}) instead of positional parameters (%s) in translation strings
+- Use the with_locale method to ensure proper locale context when processing localized text
+- Implement locale-aware classes by including the OxAiWorkers::LoadI18n module
+- Store the current locale on initialization and preserve it across method calls
+- Support multiple languages simultaneously through careful locale management
+- Default to English for developer-facing messages and logs
+- Ensure that all assistant classes properly handle localization in their format_role methods
+## LoadI18n Module Usage
+- The `OxAiWorkers::LoadI18n` module provides two key methods for localization:
+  - `store_locale` - saves the current locale at initialization time
+  - `with_locale` - executes a block of code in the context of the saved locale
+- Always include the `OxAiWorkers::LoadI18n` module in classes that need localization capabilities
+- Call `store_locale` in the initialization methods of locale-aware classes
+- Wrap all locale-dependent code in `with_locale` blocks
+- NEVER redefine the `with_locale` method in classes that include LoadI18n
+- All methods that produce user-visible text must use the locale context via `with_locale` blocks
+- Regular method calls from classes including LoadI18n do not require additional locale handling

data/.cursor/rules/998-clean-code.mdc ADDED

@@ -0,0 +1,52 @@
+---
+description: Guidelines for writing clean, maintainable, and human-readable code. Apply these rules when writing or reviewing code to ensure consistency and quality.
+globs:
+alwaysApply: false
+---
+# Clean Code Guidelines
+## Constants Over Magic Numbers
+- Replace hard-coded values with named constants
+- Use descriptive constant names that explain the value's purpose
+- Keep constants at the top of the file or in a dedicated constants file
+## Meaningful Names
+- Variables, functions, and classes should reveal their purpose
+- Names should explain why something exists and how it's used
+- Avoid abbreviations unless they're universally understood
+## Smart Comments
+- Don't comment on what the code does - make the code self-documenting
+- Use comments to explain why something is done a certain way
+- Document APIs, complex algorithms, and non-obvious side effects
+## Single Responsibility
+- Each function should do exactly one thing
+- Functions should be small and focused
+- If a function needs a comment to explain what it does, it should be split
+## DRY (Don't Repeat Yourself)
+- Extract repeated code into reusable functions
+- Share common logic through proper abstraction
+- Maintain single sources of truth
+## Clean Structure
+- Keep related code together
+- Organize code in a logical hierarchy
+- Use consistent file and folder naming conventions
+## Encapsulation
+- Hide implementation details
+- Expose clear interfaces
+- Move nested conditionals into well-named functions
+## Code Quality Maintenance
+- Refactor continuously
+- Fix technical debt early
+- Leave code cleaner than you found it
+- Follow the "Fail fast" principle for early error detection
+## Version Control
+- Write clear commit messages
+- Make small, focused commits
+- Use meaningful branch names

data/.cursor/rules/999-mdc-format.mdc ADDED

@@ -0,0 +1,132 @@
+---
+description:
+globs: *.mdc,**/*.mdc
+alwaysApply: false
+---
+# MDC File Format Guide
+MDC (Markdown Configuration) files are used by Cursor to provide context-specific instructions to AI assistants. This guide explains how to create and maintain these files properly.
+## File Structure
+Each MDC file consists of two main parts:
+1. **Frontmatter** - Configuration metadata at the top of the file
+2. **Markdown Content** - The actual instructions in Markdown format
+### Frontmatter
+The frontmatter must be the first thing in the file and must be enclosed between triple-dash lines (`---`). Configuration should be based on the intended behavior:
+```
+---
+# Configure your rule based on desired behavior:
+description: Brief description of what the rule does
+globs: **/*.js, **/*.ts  # Optional: Comma-separated list, not an array
+alwaysApply: false       # Set to true for global rules
+---
+```
+> **Important**: Despite the appearance, the frontmatter is not strictly YAML formatted. The `globs` field is a comma-separated list and should NOT include brackets `[]` or quotes `"`.
+#### Guidelines for Setting Fields
+- **description**: Should be agent-friendly and clearly describe when the rule is relevant. Format as `<topic>: <details>` for best results.
+- **globs**:
+  - If a rule is only relevant in very specific situations, leave globs empty so it's loaded only when applicable to the user request.
+  - If the only glob would match all files (like `**/*`), leave it empty and set `alwaysApply: true` instead.
+  - Otherwise, be as specific as possible with glob patterns to ensure rules are only applied with relevant files.
+- **alwaysApply**: Use sparingly for truly global guidelines.
+#### Glob Pattern Examples
+- **/*.js - All JavaScript files
+- src/**/*.jsx - All JSX files in the src directory
+- **/components/**/*.vue - All Vue files in any components directory
+### Markdown Content
+After the frontmatter, the rest of the file should be valid Markdown:
+```markdown
+# Title of Your Rule
+## Section 1
+- Guidelines and information
+- Code examples
+## Section 2
+More detailed information...
+```
+## Special Features
+### File References
+You can reference other files from within an MDC file using the markdown link syntax:
+```
+[rule-name.mdc](mdc:location/of/the/rule.mdc)
+```
+When this rule is activated, the referenced file will also be included in the context.
+### Code Blocks
+Use fenced code blocks for examples:
+````markdown
+```javascript
+// Example code
+function example() {
+  return "This is an example";
+}
+```
+````
+## Best Practices
+1. **Clear Organization**
+   - Use numbered prefixes (e.g., `01-workflow.mdc`) for sorting rules logically
+   - Place task-specific rules in the `tasks/` subdirectory
+   - Use descriptive filenames that indicate the rule's purpose
+2. **Frontmatter Specificity**
+   - Be specific with glob patterns to ensure rules are only applied in relevant contexts
+   - Use `alwaysApply: true` for truly global guidelines
+   - Make descriptions clear and concise so AI knows when to apply the rule
+3. **Content Structure**
+   - Start with a clear title (H1)
+   - Use hierarchical headings (H2, H3, etc.) to organize content
+   - Include examples where appropriate
+   - Keep instructions clear and actionable
+4. **File Size Considerations**
+   - Keep files focused on a single topic or closely related topics
+   - Split very large rule sets into multiple files and link them with references
+   - Aim for under 300 lines per file when possible
+## Usage in Cursor
+When working with files in Cursor, rules are automatically applied when:
+1. The file you're working on matches a rule's glob pattern
+2. A rule has `alwaysApply: true` set in its frontmatter
+3. The agent thinks the rule's description matches the user request
+4. You explicitly reference a rule in a conversation with Cursor's AI
+## Creating/Renaming/Removing Rules
+   - When a rule file is added/renamed/removed, update also the list under 010-workflow.mdc.
+   - When changes are made to multiple `mdc` files from a single request, also review [999-mdc-format](mdc:.cursor/rules/999-mdc-format.mdc) to consider whether to update it too.
+## Updating Rules
+When updating existing rules:
+1. Maintain the frontmatter format
+2. Keep the same glob patterns unless intentionally changing the rule's scope
+3. Update the description if the purpose of the rule changes
+4. Consider whether changes should propagate to related rules (e.g., CE versions)

data/CHANGELOG.md CHANGED

@@ -1,6 +1,9 @@
-## [0.9.5] - 2025-05-10
+## [0.9.6] - 2025-05-10
+- Added `add_file` for `Iterator` (only pdf for now)
+- Added `add_image` for `Iterator`
+- Added `add_file` and `add_image` for Assistants
 - Added `call_stack` for `Iterator` and `ModuleRequest`
 - Added `stop_double_calls` for `Iterator` and `ModuleRequest`

data/README.md CHANGED

@@ -88,6 +88,7 @@ For a more robust setup, you can configure the gem with your API keys, for examp
 OxAiWorkers.configure do |config|
     config.access_token_openai = ENV.fetch("OPENAI")
     config.access_token_deepseek = ENV.fetch("DEEPSEEK")
+    config.access_token_stability = ENV.fetch("STABILITY")
     config.max_tokens = 4096   # Default
     config.temperature = 0.7   # Default
     config.wait_for_complete = true # Default
@@ -396,6 +397,74 @@ class MyTool
 end
 ```
+### Working with Files and Images
+You can easily add files and images to your assistants:
+```ruby
+# Add a PDF file
+iterator.add_file(
+  pdf: File.read('document.pdf'),
+  filename: 'document.pdf',
+  text: 'Here is the document you requested'
+)
+# Add image from URL
+iterator.add_image(
+  text: 'Here is the image',
+  url: 'https://example.com/image.jpg',
+  detail: 'auto' # 'auto', 'low', or 'high'
+)
+# Add image from binary data
+image_data = File.read('local_image.jpg')
+iterator.add_image(
+  text: 'Image from binary data',
+  binary: image_data,
+  mime_type: 'image/jpeg' # Defaults to 'image/png'
+)
+```
+#### Image Input Requirements
+When using images with the API, your input images must meet the following requirements:
+**Supported file types:**
+- PNG (.png)
+- JPEG (.jpeg and .jpg)
+- WEBP (.webp)
+- Non-animated GIF (.gif)
+**Size limits:**
+- Up to 20MB per image
+- Low-resolution: 512px x 512px
+- High-resolution: 768px (short side) x 2000px (long side)
+**Other requirements:**
+- No watermarks or logos
+- No text
+- No NSFW content
+- Clear enough for a human to understand
+**Image detail level:**
+The `detail` parameter controls what level of detail the model uses when processing the image:
+```ruby
+iterator.add_image(
+  text: 'Nature boardwalk image',
+  url: 'https://example.com/nature.jpg',
+  detail: 'high' # Options: 'auto', 'low', or 'high'
+)
+```
+- `detail: 'low'`: Uses less tokens (85) and processes a low-resolution 512px x 512px version of the image. Best for simple use cases like identifying dominant colors or shapes.
+- `detail: 'high'`: Provides better image understanding for complex tasks requiring higher resolution detail.
+- `detail: 'auto'`: Lets the model decide the appropriate detail level (default if not specified).
 ### Handling State Transitions with Callbacks
 You can track and respond to state transitions with callbacks:
@@ -470,6 +539,16 @@ OxAiWorkers provides several specialized assistant types:
   orchestrator.task = "Create a hello world application in C, save it to hello_world.c, compile, run, and verify it works."
   ```
+All assistants support working with files and images:
+```ruby
+# Add files and images to any assistant
+sysop.add_file(pdf: File.read('error_log.pdf'), filename: 'error_log.pdf', text: 'Error log file')
+sysop.add_image(text: 'Screenshot of the error', url: 'https://example.com/screenshot.png')
+```
+See the [Working with Files and Images](#working-with-files-and-images) section for full details.
 ### Available Tools
 OxAiWorkers provides several specialized tools to extend functionality:
@@ -481,7 +560,7 @@ OxAiWorkers provides several specialized tools to extend functionality:
   pixels = OxAiWorkers::Tool::Pixels.new(
     worker: worker,                 # Required: Request or DelayedRequest instance
     current_dir: Dir.pwd,           # Optional: Directory to save generated images
-    image_model: 'dall-e-3',        # Optional: 'dall-e-3' or 'gpt-image-1'
+    image_model: OxAiWorkers::Models::StabilityImages.new,       # Optional, default is OpenaiDalle3
     only: [:generate_image]         # Optional: Limit available functions
   )
   ```
@@ -529,6 +608,69 @@ OxAiWorkers provides several specialized tools to extend functionality:
 Additional tools like Database and Converter are available for specialized tasks and can be integrated using the same pattern.
+### Function Control Mechanisms
+OxAiWorkers provides two powerful mechanisms to control function execution behavior in iterators:
+#### Call Stack
+The `call_stack` parameter allows you to force the model to call specific functions in a predetermined order:
+```ruby
+iterator = OxAiWorkers::Iterator.new(
+  worker: worker,
+  tools: [my_tool],
+  call_stack: [
+    my_tool.full_function_name(:process_data),
+    OxAiWorkers::Iterator.full_function_name(:outer_voice),
+  ]
+)
+```
+This feature is particularly useful when:
+- You need to ensure a specific sequence of operations
+- Certain functions must be called before others
+- You want to guide the model through a predefined workflow
+- Complex operations require strict ordering of function calls
+The `call_stack` is processed sequentially, with each function being removed from the stack after it's called.
+#### Stop Double Calls
+The `stop_double_calls` parameter prevents the model from calling the same function twice in consecutive operations:
+```ruby
+iterator = OxAiWorkers::Iterator.new(
+  worker: worker,
+  tools: [my_tool],
+  stop_double_calls: [
+    my_tool.full_function_name(:expensive_operation)
+  ]
+)
+```
+This feature is valuable for:
+- Preventing redundant operations that could waste resources
+- Avoiding duplicate processing of the same data
+- Ensuring that certain operations are executed only once in sequence
+- Protecting against potential infinite loops in function calls
+When a function is called, its name is stored as the `last_call`. If the next function call matches both the `last_call` and is included in the `stop_double_calls` list, it will be excluded from the available tools for that request.
+By default, `stop_double_calls` is applied to the `inner_monologue` and `outer_voice` functions to prevent reasoning loops and repetitive responses. This default behavior helps models avoid getting stuck in circular thinking patterns.
+If you need to override this default behavior (for example, when consecutive monologue or voice calls are required for your specific use case), you can reset the stop_double_calls list **after** the iterator is created:
+```ruby
+# Clear the default stop_double_calls constraints
+@iterator.stop_double_calls = []
+# Or set your own custom constraints
+@iterator.stop_double_calls = [my_tool.full_function_name(:specific_function)]
+```
 ### Implementing Your Own Assistant
 Create custom assistants by inheriting from existing ones or composing with the Iterator:
@@ -553,6 +695,92 @@ module OxAiWorkers
 end
 ```
+## Image Generation
+OxAiWorkers supports image generation through the Painter assistant and Pixels tool, with multiple AI image generation models.
+### Supported Image Models
+- **OpenaiDalle3** - OpenAI's DALL-E 3 model
+- **OpenaiGptImage** - OpenAI's GPT-Image-1 model
+- **StabilityImages** - Stability AI's image generation models
+### Using the Painter Assistant
+```ruby
+# Using DALL-E 3 (default)
+painter = OxAiWorkers::Assistant::Painter.new(current_dir: Dir.pwd)
+painter.task = "Create an image of a sunset over mountains"
+# Using GPT-Image-1
+painter = OxAiWorkers::Assistant::Painter.new(
+  image_model: OxAiWorkers::Models::OpenaiGptImage.new,
+  current_dir: Dir.pwd
+)
+painter.task = "Generate a photorealistic red apple"
+# Using Stability AI
+painter = OxAiWorkers::Assistant::Painter.new(
+  image_model: OxAiWorkers::Models::StabilityImages.new,
+  current_dir: Dir.pwd
+)
+painter.task = "Create a fantasy landscape with dragons"
+```
+### Using the Pixels Tool Directly
+For more direct control over image generation:
+```ruby
+# Initialize with DALL-E 3
+pixels = OxAiWorkers::Tool::Pixels.new(
+  worker: OxAiWorkers::Models::OpenaiDalle3.new,
+  current_dir: Dir.pwd
+)
+pixels.generate_image(
+  prompt: "A photorealistic red apple on a wooden table",
+  file_name: "apple.png",
+  size: "1024x1024",
+  quality: "hd"
+)
+# Initialize with GPT-Image-1
+pixels = OxAiWorkers::Tool::Pixels.new(
+  worker: OxAiWorkers::Models::OpenaiGptImage.new,
+  current_dir: Dir.pwd
+)
+pixels.generate_image(
+  prompt: "Futuristic cityscape at night",
+  file_name: "city.png",
+  size: "1536x1024",
+  quality: "high"
+)
+# Initialize with Stability AI
+pixels = OxAiWorkers::Tool::Pixels.new(
+  worker: OxAiWorkers::Models::StabilityImages.new,
+  current_dir: Dir.pwd
+)
+pixels.generate_image(
+  prompt: "Photorealistic mountain landscape",
+  file_name: "mountains.png"
+)
+```
+### Model-Specific Features
+- **OpenaiDalle3**
+  - Sizes: '1024x1024', '1024x1792', '1792x1024'
+  - Qualities: 'standard', 'hd'
+- **OpenaiGptImage**
+  - Sizes: 'auto', '1024x1024', '1536x1024', '1024x1536'
+  - Qualities: 'auto', 'low', 'medium', 'high'
+- **StabilityImages**
+  - Uses Stability AI's API with different engine options
+  - Configuration via options parameter
 ## Contributing
 Bug reports and pull requests are welcome on GitHub at <https://github.com/neonix20b/ox-ai-workers>. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/neonix20b/ox-ai-workers/blob/main/CODE_OF_CONDUCT.md).

data/lib/oxaiworkers/assistant/module_base.rb CHANGED

@@ -34,6 +34,14 @@ module OxAiWorkers
         @iterator.clear_context
         @iterator.add_context context
       end
+      def add_file(pdf:, filename:, text:, role: :user)
+        @iterator.add_file(pdf:, filename:, text:, role:)
+      end
+      def add_image(text:, url: nil, binary: nil, role: :user, detail: 'auto', mime_type: 'image/jpeg')
+        @iterator.add_image(text:, url:, binary:, role:, detail:, mime_type:)
+      end
     end
   end
 end
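
The two wrappers forward every keyword explicitly, using Ruby 3.1's shorthand hash syntax (`text:` means `text: text`). One consequence is that `mime_type:` is always passed on, so the wrapper's default (`'image/jpeg'`) takes precedence over the callee's default (`'image/png'` in `iterator.rb`). The sketch below illustrates this with hypothetical stand-in classes `FakeIterator` and `FakeAssistant`; it is not the gem's code and assumes Ruby 3.1 or newer.

```ruby
# Stand-in for the callee: records the MIME type it receives.
class FakeIterator
  attr_reader :last_mime_type

  def add_image(text:, mime_type: 'image/png')
    @last_mime_type = mime_type
  end
end

# Stand-in for the wrapper: has its own default and always forwards it.
class FakeAssistant
  attr_reader :iterator

  def initialize
    @iterator = FakeIterator.new
  end

  def add_image(text:, mime_type: 'image/jpeg')
    # Shorthand for add_image(text: text, mime_type: mime_type)
    @iterator.add_image(text:, mime_type:)
  end
end

assistant = FakeAssistant.new
assistant.add_image(text: 'screenshot')
assistant.iterator.last_mime_type # the wrapper's default, not the callee's
```

If the two defaults are meant to agree, callers of the assistant-level method may want to pass `mime_type:` explicitly instead of relying on either default.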

data/lib/oxaiworkers/assistant/painter.rb CHANGED

@@ -29,12 +29,6 @@ module OxAiWorkers
           on_outer_voice: ->(text:) { puts "voice: #{text}".colorize(:green) }
         )
       end
-      def cleanup
-        Dir.glob(File.join(@current_dir, '*.png')).each do |file|
-          File.delete(file) if File.exist?(file)
-        end
-      end
     end
   end
 end

data/lib/oxaiworkers/iterator.rb CHANGED

@@ -117,16 +117,16 @@ module OxAiWorkers
       @worker.call_stack = @call_stack.dup
       @worker.stop_double_calls = @stop_double_calls
       @worker.messages = []
-      @worker.append(role: :system, content: @role) if @role.present?
+      @worker.append(role: :system, content: "<role>\n#{@role}\n</role>") if @role.present?
-      @tasks.each { |task| @worker.append(role: :user, content: task) }
-      @worker.append(role: :system, content: valid_monologue.join("\n"))
+      @worker.append(role: :system, content: "<instructions>\n#{valid_monologue.join("\n")}\n</instructions>")
+      @tasks.each { |task| @worker.append(role: :user, content: "<task>\n#{task}\n</task>") }
       @worker.append(messages: @context) if @context.present?
       @tools.each do |tool|
         @worker.append(role: :user, content: tool.context) if tool.respond_to?(:context) && tool.context.present?
       end
       @worker.append(messages: @messages)
-      @tasks.each { |task| @worker.append(role: :user, content: task) }
+      @tasks.each { |task| @worker.append(role: :user, content: "<task>\n#{task}\n</task>") }
       @worker.tools = function_schemas.to_openai_format(only: available_defs)
       return unless @tools.present?
@@ -252,6 +252,35 @@ module OxAiWorkers
       add_raw_context({ role:, content: text })
     end
+    def add_file(pdf:, filename:, text:, role: :user)
+      content = []
+      content << { type: 'text', text: } if text.present?
+      content << {
+        type: 'file',
+        file: {
+          filename:,
+          file_data: Base64.strict_encode64(pdf)
+        }
+      }
+      add_raw_context({ role:, content: })
+    end
+    def add_image(text:, url: nil, binary: nil, role: :user, detail: 'auto', mime_type: 'image/png')
+      content = []
+      content << { type: 'text', text: } if text.present?
+      image_url = if binary.present?
+                    "data:#{mime_type};base64,#{Base64.strict_encode64(binary)}"
+                  else
+                    url
+                  end
+      content << { type: 'image_url', image_url: { url: image_url, detail: } }
+      add_raw_context({ role:, content: })
+    end
     def add_raw_context(c)
       @context << c
     end
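
To make the shape of the resulting message concrete, here is a standalone sketch of the content array that `add_image` assembles. It mirrors the hunk above but uses plain Ruby checks instead of ActiveSupport's `present?`, and it returns the array instead of appending to the iterator context. The function name `build_image_content` is hypothetical.

```ruby
require 'base64'

# Hypothetical standalone version of the content-building step in add_image.
def build_image_content(text:, url: nil, binary: nil, detail: 'auto', mime_type: 'image/png')
  content = []
  content << { type: 'text', text: text } unless text.nil? || text.empty?
  # Binary input is inlined as a data URL; otherwise the given URL is used as-is.
  image_url = if binary && !binary.empty?
                "data:#{mime_type};base64,#{Base64.strict_encode64(binary)}"
              else
                url
              end
  content << { type: 'image_url', image_url: { url: image_url, detail: detail } }
  content
end

content = build_image_content(text: 'Tiny payload', binary: 'abc')
content.last[:image_url][:url] # => "data:image/png;base64,YWJj"
```

As in the original method, binary input takes priority when both `binary` and `url` are given, and passing neither yields an `image_url` entry whose `url` is `nil`.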

data/lib/oxaiworkers/module_request.rb CHANGED

@@ -53,10 +53,11 @@ module OxAiWorkers
         frequency_penalty: @model.frequency_penalty
       }
       if @tools.present?
-        parameters[:tools] = @tools.reject do |f|
+        parameters[:tools] = @tools.select do |f|
           tool_name = f[:function][:name]
           tool_name == @last_call && @stop_double_calls.include?(tool_name)
         end
+        OxAiWorkers.logger.debug("tools: #{parameters[:tools]} last_call=#{@last_call} stop_double_calls=#{@stop_double_calls}", for: self.class)
         if @call_stack&.any?
           func1 = @call_stack.first
           @call_stack = @call_stack.drop(1)
@@ -146,7 +147,7 @@ module OxAiWorkers
           name: function['name'].split('__').last,
           args: args
         }
-        @last_call = function['name']
+        @last_call = function['name'].to_s
       end
     end
   end
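
The first hunk swaps `reject` for `select` while keeping the block unchanged, which inverts which tools survive the filter. The sketch below shows the difference on hypothetical tool names, using the same hash shape as the diff. The README added in this release says a matching function "will be excluded from the available tools", which corresponds to the `reject` form, so the change may be worth checking upstream.

```ruby
# Hypothetical tool schemas in the shape the diff reads (f[:function][:name]).
tools = [
  { function: { name: 'tool__expensive_operation' } },
  { function: { name: 'tool__cheap_operation' } }
]
last_call = 'tool__expensive_operation'
stop_double_calls = ['tool__expensive_operation']

# Same predicate as the block in the diff: true for a guarded tool that was just called.
repeated = lambda do |tool|
  name = tool[:function][:name]
  name == last_call && stop_double_calls.include?(name)
end

# reject drops the repeated tool and keeps everything else.
with_reject = tools.reject(&repeated).map { |t| t[:function][:name] }
# select keeps only the repeated tool and drops everything else.
with_select = tools.select(&repeated).map { |t| t[:function][:name] }
```

With `select`, the list is also empty whenever the previous call is not in `stop_double_calls`, because the predicate is false for every tool.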

data/lib/oxaiworkers/tool/pixels.rb CHANGED

@@ -62,29 +62,9 @@ module OxAiWorkers
         end
       end
-      def edit_image(input_image:, prompt:, output_file_name: nil, size: nil, mask: nil)
-        size ||= @image_model['size'].first
-        mask ||= @mask
-        response = @worker.client.images.edit(
-          parameters: {
-            image: input_image,
-            model: @image_model['model'],
-            prompt:,
-            size:,
-            mask:
-          }
-        )
-        @url = response.dig('data', 0, 'url')
-        revised_prompt = response.dig('data', 0, 'revised_prompt')
-        if output_file_name.present?
-          path = save_generated_image(file_name: output_file_name)
-          "url: #{@url}\nfile_name: #{path}\n\nrevised_prompt: #{revised_prompt}"
-        else
-          "url: #{@url}\n\nrevised_prompt: #{revised_prompt}"
-        end
-      end
+      # def edit_image(input_image:, prompt:, output_file_name: nil, size: nil, mask: nil)
+      #   # TODO: Implement edit_image
+      # end
       def save_generated_image(file_name:, binary:)
         unless @current_dir.present?

data/lib/oxaiworkers/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module OxAiWorkers
-  VERSION = '0.9.5'
+  VERSION = '0.9.6.1'
 end
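
The new version string has four segments, while the CHANGELOG heading in this release reads `0.9.6`. As a small sketch of how RubyGems orders and matches such versions, the snippet below uses the standard `Gem::Version` and `Gem::Requirement` classes; the `~> 0.9.5` constraint is only an example of what a dependent project might declare, not something taken from this gem.

```ruby
require 'rubygems'

old_version = Gem::Version.new('0.9.5')
new_version = Gem::Version.new('0.9.6.1')

# Versions compare segment by segment, so 0.9.6.1 sorts after 0.9.5.
newer = new_version > old_version

# The four numeric segments of the new version.
segments = new_version.segments

# A pessimistic constraint on 0.9.5 allows any 0.9.x release at or above 0.9.5.
allowed = Gem::Requirement.new('~> 0.9.5').satisfied_by?(new_version)
```

Under these assumptions, a project pinned with `~> 0.9.5` would pick up this release on its next update.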

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ox-ai-workers
 version: !ruby/object:Gem::Version
-  version: 0.9.5
+  version: 0.9.6.1
 platform: ruby
 authors:
 - Denis Smolev
@@ -123,7 +123,9 @@ executables:
 extensions: []
 extra_rdoc_files: []
 files:
-- ".cursorrules"
+- ".cursor/rules/010-project-structure.mdc"
+- ".cursor/rules/998-clean-code.mdc"
+- ".cursor/rules/999-mdc-format.mdc"
 - ".ruby-version"
 - CHANGELOG.md
 - CODE_OF_CONDUCT.md

data/.cursorrules DELETED

@@ -1,155 +0,0 @@
-# Overview
-OxAiWorkers is a Ruby gem that implements a finite state machine (using the `state_machine` gem) to solve tasks using generative intelligence (with the `ruby-openai` gem). This approach enhances the final result by utilizing internal monologue and external tools.
-## Architecture Principles
-- The library is built on the finite state machine (FSM) pattern using the 'state_machine' gem
-- Integration with generative models is implemented using the 'ruby-openai' gem
-- DRY (Don't Repeat Yourself) principle is applied throughout all components
-- Modular structure with clear separation of responsibilities between classes
-- Encapsulation of states and transitions in separate classes
-- Implementation of the "Composition" pattern for flexible tool integration
-## Core Components
-- `Request` and `DelayedRequest` - classes for executing API requests (immediate and delayed)
-- `Iterator` - main class for iterative task execution with tools
-- `Assistant` - high-level wrappers over Iterator (Sysop, Coder, Localizer, etc.)
-- `Tool` - tools that can be used during task execution (Eval, FileSystem, Database)
-- `ToolDefinition` - module for declaring functions and methods for tools
-- `StateTools` - base class for managing states and transitions
-- `ContextualLogger` - logging system with contextual information support
-## Code Conventions
-- Use `snake_case` for method and variable names
-- Functions for generative models should also be in `snake_case` (inner_monologue, outer_voice, etc.)
-- All public methods must have documentation with usage examples
-- Tests are mandatory for all new functions
-- All code comments, CHANGELOG, README, and other documentation must be written in English
-- Use YARD-style documentation for all public methods
-- Maintain a unified code formatting style (Rubocop is recommended)
-- Follow the "Fail fast" principle for early error detection
-## Interaction Patterns
-- The system uses internal monologue (inner_monologue) for planning actions
-- External voice (outer_voice) is used for communication with the user
-- Execution flow management through finite state machine
-- Implementation of callback mechanisms for flexible event handling
-- Isolation of error handling functions at the tool level
-## Integration
-- CLI interface through `oxaiworkers init` and `oxaiworkers run` commands
-- Rails support via ActiveRecord for storing delayed requests
-- Configuration through the `OxAiWorkers.configure` block
-- Multilingual support via standard I18n
-- Integration with external APIs through request client templates
-- Delayed execution mechanism via DelayedRequest
-- Support for various language models (OpenAI, Anthropic, Gemini)
-## Best Practices
-- Use callbacks to handle various states (on_inner_monologue, on_outer_voice)
-- Handle errors at the tool level, preventing them from interrupting the main execution flow
-- When creating new assistants, inherit from the base Assistant class
-- Use the white_list mechanism to restrict available functions
-- Separate language model requests from result processing logic
-- Practice dependency injection to improve code testability
-- Use localization mechanisms for multilingual support
-## Tools Architecture
-- Each tool should be a self-contained module
-- Tools are registered through the `define_function` interface
-- All tools should handle their own errors and return readable messages
-- Use parameter validation at the function definition level
-- Maintain a unified format for return values
-## Performance and Scaling
-- Cache API request results when possible
-- Use asynchronous processing for long operations
-- Apply backoff strategies for repeated requests
-- Break large tasks into atomic operations
-- Provide monitoring and profiling mechanisms
-## Finite State Machine Implementation
-- Core FSM based on `state_machine` gem with states: idle → prepared → requested → analyzed → finished → idle
-- State transitions managed by events: prepare, request, analyze, complete, iterate, end
-- `StateTools` - base class for FSM implementation with event hooks and transition callbacks
-- `StateBatch` - FSM extension for batch request processing with additional states
-- Automatic error recovery and retry mechanisms for failed API requests
-## Request Processing
-- `ModuleRequest` - base class for all API requests with parsing and response handling
-- Support for streaming responses with callback processing
-- Built-in token usage tracking and truncation detection
-- Error handling with automatic retries for server errors
-## Iterator Lifecycle
-- 3 core functions: inner_monologue, outer_voice, finish_it
-- Configurable message queue for stateful conversation history
-- Callback system for processing each state transition
-- Context and milestone management for optimizing token usage
-- Support for custom steps and instruction templating
-## Additional Tools
-- `Converter` - tools for data format conversion and transformation
-- Support for custom tool development through inheritance and composition
-- Automatic function name resolution and parameter validation
-## Assistants Details
-- `ModuleBase` - shared functionality for all assistant types
-- `Sysop` - system administration and shell command execution
-- `Coder` - specialized for code generation and analysis
-- `Localizer` - translation and localization support
-## Development Guidelines
-- Use dependency injection for testability
-- Follow the FSM pattern for all stateful operations
-- Implement proper error boundaries at the tool level
-- Use monologue for complex reasoning and planning
-- Apply callbacks for event-driven architecture
-- Utilize templates in the CLI for rapid prototyping
-- Extend the base classes rather than modifying them
-## Internationalization and Localization
-- All code comments, variable names, and documentation MUST be written in English
-- All user-facing strings MUST be properly localized using I18n
-- Use I18n.t for all text that will be shown to users or appears in assistant prompts
-- Store translations in YAML files within the config/locales directory
-- Follow the naming convention of language.namespace.key (e.g., en.oxaiworkers.assistant.role)
-- Use named parameters (%{variable}) instead of positional parameters (%s) in translation strings
-- Use the with_locale method to ensure proper locale context when processing localized text
-- Implement locale-aware classes by including the OxAiWorkers::LoadI18n module
-- Store the current locale on initialization and preserve it across method calls
-- Support multiple languages simultaneously through careful locale management
-- Default to English for developer-facing messages and logs
-- Ensure that all assistant classes properly handle localization in their format_role methods
-## LoadI18n Module Usage
-- The `OxAiWorkers::LoadI18n` module provides two key methods for localization:
-  - `store_locale` - saves the current locale at initialization time
-  - `with_locale` - executes a block of code in the context of the saved locale
-- Always include the `OxAiWorkers::LoadI18n` module in classes that need localization capabilities
-- Call `store_locale` in the initialization methods of locale-aware classes
-- Wrap all locale-dependent code in `with_locale` blocks
-- NEVER redefine the `with_locale` method in classes that include LoadI18n
-- All methods that produce user-visible text must use the locale context via `with_locale` blocks
-- Regular method calls from classes including LoadI18n do not require additional locale handling
-## Multi-Language Support
-- Use the store_locale and with_locale methods for consistent localization context
-- All error messages should be localized and retrieved via I18n.t