ox-ai-workers 0.9.5 → 0.9.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c008f2f4f413251ffa88e00935e118c9844b2e67ab5088ca1091f2b7ac64ac77
4
- data.tar.gz: a9b4cf53a19322a10db55ffa480d1e9bfe3a587411398ca5797058f5941c6ee8
3
+ metadata.gz: 768550f6fde8e654917a2b01404856abcbe643429ba725d7c86c853b472f4bed
4
+ data.tar.gz: 9444f8c6869caa2e7039d3632605885579364942de95ba2fa6b201d5ca00c3e5
5
5
  SHA512:
6
- metadata.gz: ac9053e4092115ef23a2f39d834c511c91d9594d8fc8ba37af95652917879ff5cd390581d2bb6d37d184af61d2809a9292c33375c8bd3d06eccb4b84a29ad2ff
7
- data.tar.gz: 907e52e93352af84f3bc7b38ef02180dc1a4850f645bb40c7e36d371e85c1a8cdb00e60aa9805d91834b01437c3ae2ff5eb60746e379ec0bfae65a147b627c5c
6
+ metadata.gz: 6a0c8f3622ab30d8a199b367ad3b0bab3c985a292cc786d3c38380515b5adbe02b945eaff4606073b199a59bf4237bb36c86a0a89b07f6c62c0bc0b738ed7593
7
+ data.tar.gz: 491720defa83f4b26cccc0d389975f1125c8636821c37314814fdbe9bf714d9233f22456a09b2f8ec890e2b52d3fcb96065295563b1272336316f34461fe6667
@@ -0,0 +1,89 @@
1
+ ---
2
+ description:
3
+ globs:
4
+ alwaysApply: true
5
+ ---
6
+ # Overview
7
+
8
+ OxAiWorkers is a Ruby gem that implements a finite state machine (FSM, using the `state_machine` gem) to solve tasks using generative intelligence. This approach enhances the final result by utilizing internal monologue and external tools.
9
+
10
+ ## Core Components
11
+
12
+ - `Request` and `DelayedRequest` - classes for executing API requests (immediate and delayed)
13
+ - `ModuleRequest` - base class for all API requests with parsing and response handling ([module_request.rb](mdc:lib/oxaiworkers/module_request.rb))
14
+ - `Iterator` - main class for iterative task execution with tools ([iterator.rb](mdc:lib/oxaiworkers/iterator.rb))
15
+ - `Assistant::ModuleBase` - high-level wrappers over Iterator (Sysop, Coder, Localizer, etc.) ([module_base.rb](mdc:lib/oxaiworkers/assistant/module_base.rb))
16
+ - `Tool` - tools that can be used during task execution (Eval, FileSystem, Database, Pixels, Pipeline)
17
+ - `ToolDefinition` - module for declaring functions and methods for tools ([tool_definition.rb](mdc:lib/oxaiworkers/tool_definition.rb))
18
+ - `StateTools` - base class for managing states and transitions ([state_tools.rb](mdc:lib/oxaiworkers/state_tools.rb))
19
+ - `ContextualLogger` - logging system with contextual information support
20
+
21
+ ## Code Conventions
22
+
23
+ - Use `snake_case` for method and variable names
24
+ - All code comments, CHANGELOG, README, and other documentation must be written in English
25
+
26
+ ## Interaction Patterns
27
+
28
+ - The system uses internal monologue (inner_monologue) for planning actions
29
+ - External voice (outer_voice) is used for communication with the user
30
+ - Execution flow management through finite state machine
31
+ - Implementation of callback mechanisms for flexible event handling
32
+
33
+ ## Tools Architecture
34
+
35
+ - Each tool should be a self-contained module
36
+ - Tools are registered through the `define_function` interface
37
+ - All tools should handle their own errors and return readable messages
38
+ - Handle errors at the tool level, preventing them from interrupting the main execution flow
39
+
40
+ ## Finite State Machine Implementation
41
+
42
+ - Core FSM based on `state_machine` gem with states: idle → prepared → requested → analyzed → finished → idle
43
+ - State transitions managed by events: prepare, request, analyze, complete, iterate, end ([state_tools.rb](mdc:lib/oxaiworkers/state_tools.rb), [iterator.rb](mdc:lib/oxaiworkers/iterator.rb))
44
+ - `StateTools` - base class for FSM implementation with event hooks and transition callbacks ([state_tools.rb](mdc:lib/oxaiworkers/state_tools.rb))
45
+ - `StateBatch` - FSM extension for batch request processing with additional states
46
+ - Automatic error recovery and retry mechanisms for failed API requests
47
+
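As a rough illustration of the state cycle listed above, the following plain-Ruby sketch models the transitions without the `state_machine` gem. The class name `TinyIteratorFsm` and the exact event-to-transition mapping (in particular where `iterate` and `end` lead) are assumptions made for illustration; they are not taken from `state_tools.rb`.

```ruby
# Hypothetical, dependency-free sketch of the documented state cycle.
class TinyIteratorFsm
  # event => { from_state => to_state }; this mapping is an assumption.
  TRANSITIONS = {
    prepare: { idle: :prepared },
    request: { prepared: :requested },
    analyze: { requested: :analyzed },
    complete: { analyzed: :finished },
    iterate: { analyzed: :prepared },
    end: { finished: :idle }
  }.freeze

  attr_reader :state

  def initialize
    @state = :idle
  end

  # Fire an event; raise if it is not allowed from the current state.
  def fire(event)
    target = TRANSITIONS.fetch(event).fetch(@state) do
      raise ArgumentError, "cannot #{event} from #{@state}"
    end
    @state = target
  end
end

fsm = TinyIteratorFsm.new
%i[prepare request analyze complete end].each { |event| fsm.fire(event) }
puts fsm.state
```

Keeping the transition table in one frozen hash makes illegal transitions fail early, which matches the "fail fast" guideline mentioned elsewhere in these rules.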
48
+ ## Iterator Lifecycle
49
+
50
+ - 3 core functions: inner_monologue, outer_voice, finish_it ([iterator.rb](mdc:lib/oxaiworkers/iterator.rb))
51
+ - Configurable message queue for stateful conversation history
52
+ - Callback system for processing each state transition
53
+ - Context and milestone management for optimizing token usage
54
+ - Support for custom steps and instruction templating
55
+
56
+ ## Assistants Details
57
+
58
+ - `ModuleBase` - shared functionality for all assistant types ([module_base.rb](mdc:lib/oxaiworkers/assistant/module_base.rb))
59
+ - `Sysop` - system administration and shell command execution ([sysop.rb](mdc:lib/oxaiworkers/assistant/sysop.rb), [file_system.rb](mdc:lib/oxaiworkers/tool/file_system.rb))
60
+ - `Coder` - specialized for code generation and analysis ([coder.rb](mdc:lib/oxaiworkers/assistant/coder.rb), [eval.rb](mdc:lib/oxaiworkers/tool/eval.rb))
61
+ - `Localizer` - translation and localization support ([localizer.rb](mdc:lib/oxaiworkers/assistant/localizer.rb))
62
+ - `Orchestrator` - coordinates multiple assistants to work together on complex tasks ([orchestrator.rb](mdc:lib/oxaiworkers/assistant/orchestrator.rb), [pipeline.rb](mdc:lib/oxaiworkers/tool/pipeline.rb))
63
+ - `Painter` - image generation and manipulation ([painter.rb](mdc:lib/oxaiworkers/assistant/painter.rb), [pixels.rb](mdc:lib/oxaiworkers/tool/pixels.rb))
64
+
65
+ ## Internationalization and Localization
66
+
67
+ - All user-facing strings MUST be properly localized using I18n (config/locales/*.yml)
68
+ - Use I18n.t for all text that will be shown to users or appears in assistant prompts
69
+ - Store translations in YAML files within the config/locales directory
70
+ - Follow the naming convention of language.namespace.key (e.g., en.oxaiworkers.assistant.role)
71
+ - Use named parameters (%{variable}) instead of positional parameters (%s) in translation strings
72
+ - Use the with_locale method to ensure proper locale context when processing localized text
73
+ - Implement locale-aware classes by including the OxAiWorkers::LoadI18n module
74
+ - Store the current locale on initialization and preserve it across method calls
75
+ - Support multiple languages simultaneously through careful locale management
76
+ - Default to English for developer-facing messages and logs
77
+ - Ensure that all assistant classes properly handle localization in their format_role methods
78
+
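The preference for named over positional parameters can be shown with Ruby's built-in `format`, which accepts the same `%{name}` placeholder syntax. The template strings below are invented for illustration and are not entries from the gem's locale files.

```ruby
# Plain-Ruby illustration of named vs. positional placeholders.
# These templates are hypothetical examples, not the gem's translations.
named = 'Hello %{name}, you have %{count} new tasks'
positional = 'Hello %s, you have %s new tasks'

puts format(named, name: 'Ann', count: 3)
puts format(positional, 'Ann', 3)

# A translator can reorder named placeholders without touching the call site;
# with positional placeholders the argument order would have to change too.
reordered = '%{count} new tasks are waiting for %{name}'
puts format(reordered, name: 'Ann', count: 3)
```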
79
+ ## LoadI18n Module Usage
80
+
81
+ - The `OxAiWorkers::LoadI18n` module provides two key methods for localization:
82
+ - `store_locale` - saves the current locale at initialization time
83
+ - `with_locale` - executes a block of code in the context of the saved locale
84
+ - Always include the `OxAiWorkers::LoadI18n` module in classes that need localization capabilities
85
+ - Call `store_locale` in the initialization methods of locale-aware classes
86
+ - Wrap all locale-dependent code in `with_locale` blocks
87
+ - NEVER redefine the `with_locale` method in classes that include LoadI18n
88
+ - All methods that produce user-visible text must use the locale context via `with_locale` blocks
89
+ - Regular method calls from classes including LoadI18n do not require additional locale handling
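The `store_locale` / `with_locale` pattern described above could look roughly like the sketch below. To stay dependency-free it uses a tiny stand-in called `FakeI18n` instead of the real I18n gem, and the module name `LocaleSketch` is likewise hypothetical; the actual `OxAiWorkers::LoadI18n` implementation may differ.

```ruby
# Stand-in for the I18n gem so the sketch runs without dependencies (hypothetical).
module FakeI18n
  class << self
    attr_writer :locale

    def locale
      @locale ||= :en
    end

    # Run the block with a temporary locale, then restore the previous one.
    def with_locale(tmp)
      previous = locale
      self.locale = tmp
      yield
    ensure
      self.locale = previous
    end
  end
end

# Hypothetical sketch of the store_locale / with_locale pattern.
module LocaleSketch
  def store_locale
    @locale = FakeI18n.locale
  end

  def with_locale(&block)
    FakeI18n.with_locale(@locale, &block)
  end
end

class Greeter
  include LocaleSketch

  def initialize
    store_locale # remember the locale that was active at construction time
  end

  def greet
    with_locale { "locale=#{FakeI18n.locale}" }
  end
end

FakeI18n.locale = :ru
greeter = Greeter.new
FakeI18n.locale = :en
puts greeter.greet
```

In this sketch the object keeps answering in the locale it was created with, even after the global locale changes, which is the behaviour the rules above ask locale-aware classes to preserve.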
@@ -0,0 +1,52 @@
1
+ ---
2
+ description: Guidelines for writing clean, maintainable, and human-readable code. Apply these rules when writing or reviewing code to ensure consistency and quality.
3
+ globs:
4
+ alwaysApply: false
5
+ ---
6
+ # Clean Code Guidelines
7
+
8
+ ## Constants Over Magic Numbers
9
+ - Replace hard-coded values with named constants
10
+ - Use descriptive constant names that explain the value's purpose
11
+ - Keep constants at the top of the file or in a dedicated constants file
12
+
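A small before/after sketch of the guideline above. The method names and the limit value are illustrative assumptions only.

```ruby
# Before: the reader has to guess what 4096 means.
def too_long_magic?(token_count)
  token_count > 4096
end

# After: the constant names the purpose of the value in one place.
MAX_RESPONSE_TOKENS = 4096

def too_long?(token_count)
  token_count > MAX_RESPONSE_TOKENS
end

puts too_long?(5000)
```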
13
+ ## Meaningful Names
14
+ - Variables, functions, and classes should reveal their purpose
15
+ - Names should explain why something exists and how it's used
16
+ - Avoid abbreviations unless they're universally understood
17
+
18
+ ## Smart Comments
19
+ - Don't comment on what the code does - make the code self-documenting
20
+ - Use comments to explain why something is done a certain way
21
+ - Document APIs, complex algorithms, and non-obvious side effects
22
+
23
+ ## Single Responsibility
24
+ - Each function should do exactly one thing
25
+ - Functions should be small and focused
26
+ - If a function needs a comment to explain what it does, it should be split
27
+
28
+ ## DRY (Don't Repeat Yourself)
29
+ - Extract repeated code into reusable functions
30
+ - Share common logic through proper abstraction
31
+ - Maintain single sources of truth
32
+
33
+ ## Clean Structure
34
+ - Keep related code together
35
+ - Organize code in a logical hierarchy
36
+ - Use consistent file and folder naming conventions
37
+
38
+ ## Encapsulation
39
+ - Hide implementation details
40
+ - Expose clear interfaces
41
+ - Move nested conditionals into well-named functions
42
+
43
+ ## Code Quality Maintenance
44
+ - Refactor continuously
45
+ - Fix technical debt early
46
+ - Leave code cleaner than you found it
47
+ - Follow the "Fail fast" principle for early error detection
48
+
49
+ ## Version Control
50
+ - Write clear commit messages
51
+ - Make small, focused commits
52
+ - Use meaningful branch names
@@ -0,0 +1,132 @@
1
+ ---
2
+ description:
3
+ globs: *.mdc,**/*.mdc
4
+ alwaysApply: false
5
+ ---
6
+ # MDC File Format Guide
7
+
8
+ MDC (Markdown Configuration) files are used by Cursor to provide context-specific instructions to AI assistants. This guide explains how to create and maintain these files properly.
9
+
10
+ ## File Structure
11
+
12
+ Each MDC file consists of two main parts:
13
+
14
+ 1. **Frontmatter** - Configuration metadata at the top of the file
15
+ 2. **Markdown Content** - The actual instructions in Markdown format
16
+
17
+ ### Frontmatter
18
+
19
+ The frontmatter must be the first thing in the file and must be enclosed between triple-dash lines (`---`). Configuration should be based on the intended behavior:
20
+
21
+ ```
22
+ ---
23
+ # Configure your rule based on desired behavior:
24
+
25
+ description: Brief description of what the rule does
26
+ globs: **/*.js, **/*.ts # Optional: Comma-separated list, not an array
27
+ alwaysApply: false # Set to true for global rules
28
+ ---
29
+ ```
30
+
31
+ > **Important**: Despite the appearance, the frontmatter is not strictly YAML formatted. The `globs` field is a comma-separated list and should NOT include brackets `[]` or quotes `"`.
32
+
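Because the frontmatter is not strict YAML, a reader of these files cannot simply hand it to a YAML parser. The sketch below shows one way the fields could be read, treating `globs` as a comma-separated list. The function `parse_mdc_frontmatter` is hypothetical and is not Cursor's actual parser; it also ignores trailing comments on a value.

```ruby
# Hypothetical parser for MDC-style frontmatter; not Cursor's implementation.
def parse_mdc_frontmatter(text)
  lines = text.lines.map(&:chomp)
  return {} unless lines.first == '---'

  closing = lines[1..].index('---')
  return {} if closing.nil?

  lines[1, closing].each_with_object({}) do |line, meta|
    key, value = line.split(':', 2)
    next if value.nil? || key.strip.empty? || key.strip.start_with?('#')

    value = value.strip
    meta[key.strip] =
      case key.strip
      when 'globs' then value.split(',').map(&:strip).reject(&:empty?)
      when 'alwaysApply' then value == 'true'
      else value
      end
  end
end

sample = <<~MDC
  ---
  description: Ruby style rules
  globs: *.rb, lib/**/*.rb
  alwaysApply: false
  ---
  # Title
MDC

p parse_mdc_frontmatter(sample)
```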
33
+ #### Guidelines for Setting Fields
34
+
35
+ - **description**: Should be agent-friendly and clearly describe when the rule is relevant. Format as `<topic>: <details>` for best results.
36
+ - **globs**:
37
+ - If a rule is only relevant in very specific situations, leave globs empty so it's loaded only when applicable to the user request.
38
+ - If the only glob would match all files (like `**/*`), leave it empty and set `alwaysApply: true` instead.
39
+ - Otherwise, be as specific as possible with glob patterns to ensure rules are only applied with relevant files.
40
+ - **alwaysApply**: Use sparingly for truly global guidelines.
41
+
42
+ #### Glob Pattern Examples
43
+
44
+ - **/*.js - All JavaScript files
45
+ - src/**/*.jsx - All JSX files in the src directory
46
+ - **/components/**/*.vue - All Vue files in any components directory
47
+
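The patterns above can be tried out with Ruby's `File.fnmatch?`. This is only an approximation: Ruby's matching rules are not guaranteed to be identical to the glob engine Cursor uses, and the sample paths are invented.

```ruby
# Approximation only: File.fnmatch? may differ from Cursor's glob matching.
GLOB_FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

def glob_match?(pattern, path)
  File.fnmatch?(pattern, path, GLOB_FLAGS)
end

puts glob_match?('**/*.js', 'src/app.js')
puts glob_match?('src/**/*.jsx', 'src/ui/Button.jsx')
puts glob_match?('**/components/**/*.vue', 'web/components/nav/Bar.vue')
puts glob_match?('src/**/*.jsx', 'lib/ui/Button.jsx')
```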
48
+ ### Markdown Content
49
+
50
+ After the frontmatter, the rest of the file should be valid Markdown:
51
+
52
+ ```markdown
53
+ # Title of Your Rule
54
+
55
+ ## Section 1
56
+ - Guidelines and information
57
+ - Code examples
58
+
59
+ ## Section 2
60
+ More detailed information...
61
+ ```
62
+
63
+ ## Special Features
64
+
65
+ ### File References
66
+
67
+ You can reference other files from within an MDC file using the markdown link syntax:
68
+
69
+ ```
70
+ [rule-name.mdc](mdc:location/of/the/rule.mdc)
71
+ ```
72
+
73
+ When this rule is activated, the referenced file will also be included in the context.
74
+
75
+ ### Code Blocks
76
+
77
+ Use fenced code blocks for examples:
78
+
79
+ ````markdown
80
+ ```javascript
81
+ // Example code
82
+ function example() {
83
+ return "This is an example";
84
+ }
85
+ ```
86
+ ````
87
+
88
+ ## Best Practices
89
+
90
+ 1. **Clear Organization**
91
+ - Use numbered prefixes (e.g., `01-workflow.mdc`) for sorting rules logically
92
+ - Place task-specific rules in the `tasks/` subdirectory
93
+ - Use descriptive filenames that indicate the rule's purpose
94
+
95
+ 2. **Frontmatter Specificity**
96
+ - Be specific with glob patterns to ensure rules are only applied in relevant contexts
97
+ - Use `alwaysApply: true` for truly global guidelines
98
+ - Make descriptions clear and concise so AI knows when to apply the rule
99
+
100
+ 3. **Content Structure**
101
+ - Start with a clear title (H1)
102
+ - Use hierarchical headings (H2, H3, etc.) to organize content
103
+ - Include examples where appropriate
104
+ - Keep instructions clear and actionable
105
+
106
+ 4. **File Size Considerations**
107
+ - Keep files focused on a single topic or closely related topics
108
+ - Split very large rule sets into multiple files and link them with references
109
+ - Aim for under 300 lines per file when possible
110
+
111
+ ## Usage in Cursor
112
+
113
+ When working with files in Cursor, rules are automatically applied when:
114
+
115
+ 1. The file you're working on matches a rule's glob pattern
116
+ 2. A rule has `alwaysApply: true` set in its frontmatter
117
+ 3. The agent thinks the rule's description matches the user request
118
+ 4. You explicitly reference a rule in a conversation with Cursor's AI
119
+
120
+ ## Creating/Renaming/Removing Rules
121
+
122
+ - When a rule file is added, renamed, or removed, also update the list in 010-workflow.mdc.
123
+ - When changes are made to multiple `mdc` files from a single request, also review [999-mdc-format](mdc:.cursor/rules/999-mdc-format.mdc) to consider whether it needs updating too.
124
+
125
+ ## Updating Rules
126
+
127
+ When updating existing rules:
128
+
129
+ 1. Maintain the frontmatter format
130
+ 2. Keep the same glob patterns unless intentionally changing the rule's scope
131
+ 3. Update the description if the purpose of the rule changes
132
+ 4. Consider whether changes should propagate to related rules (e.g., CE versions)
data/CHANGELOG.md CHANGED
@@ -1,6 +1,9 @@
1
1
 
2
- ## [0.9.5] - 2025-05-10
2
+ ## [0.9.6] - 2025-05-10
3
3
 
4
+ - Added `add_file` for `Iterator` (PDF only for now)
5
+ - Added `add_image` for `Iterator`
6
+ - Added `add_file` and `add_image` for Assistants
4
7
  - Added `call_stack` for `Iterator` and `ModuleRequest`
5
8
  - Added `stop_double_calls` for `Iterator` and `ModuleRequest`
6
9
 
data/README.md CHANGED
@@ -88,6 +88,7 @@ For a more robust setup, you can configure the gem with your API keys, for examp
88
88
  OxAiWorkers.configure do |config|
89
89
  config.access_token_openai = ENV.fetch("OPENAI")
90
90
  config.access_token_deepseek = ENV.fetch("DEEPSEEK")
91
+ config.access_token_stability = ENV.fetch("STABILITY")
91
92
  config.max_tokens = 4096 # Default
92
93
  config.temperature = 0.7 # Default
93
94
  config.wait_for_complete = true # Default
@@ -396,6 +397,74 @@ class MyTool
396
397
  end
397
398
  ```
398
399
 
400
+ ### Working with Files and Images
401
+
402
+ You can easily add files and images to your assistants:
403
+
404
+ ```ruby
405
+ # Add a PDF file
406
+ iterator.add_file(
407
+ pdf: File.binread('document.pdf'),
408
+ filename: 'document.pdf',
409
+ text: 'Here is the document you requested'
410
+ )
411
+
412
+ # Add image from URL
413
+ iterator.add_image(
414
+ text: 'Here is the image',
415
+ url: 'https://example.com/image.jpg',
416
+ detail: 'auto' # 'auto', 'low', or 'high'
417
+ )
418
+
419
+ # Add image from binary data
420
+ image_data = File.binread('local_image.jpg')
421
+ iterator.add_image(
422
+ text: 'Image from binary data',
423
+ binary: image_data,
424
+ mime_type: 'image/jpeg' # Defaults to 'image/png'
425
+ )
426
+ ```
427
+
428
+ #### Image Input Requirements
429
+
430
+ When using images with the API, your input images must meet the following requirements:
431
+
432
+ **Supported file types:**
433
+
434
+ - PNG (.png)
435
+ - JPEG (.jpeg and .jpg)
436
+ - WEBP (.webp)
437
+ - Non-animated GIF (.gif)
438
+
439
+ **Size limits:**
440
+
441
+ - Up to 20MB per image
442
+ - Low-resolution: 512px x 512px
443
+ - High-resolution: 768px (short side) x 2000px (long side)
444
+
445
+ **Other requirements:**
446
+
447
+ - No watermarks or logos
448
+ - No text
449
+ - No NSFW content
450
+ - Clear enough for a human to understand
451
+
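The file-type and size limits above lend themselves to a simple pre-flight check before calling `add_image`. The helper below is a hypothetical sketch, not part of the gem; it assumes "20MB" means 20 MiB and it cannot detect animated GIFs or any of the content-related requirements.

```ruby
# Hypothetical pre-flight check based on the limits listed above.
SUPPORTED_IMAGE_EXTENSIONS = %w[.png .jpeg .jpg .webp .gif].freeze
MAX_IMAGE_BYTES = 20 * 1024 * 1024 # assumes "20MB" means 20 MiB

# Returns a list of human-readable problems; an empty list means the checks passed.
def image_input_problems(filename, bytesize)
  problems = []
  ext = File.extname(filename).downcase
  problems << "unsupported file type: #{ext}" unless SUPPORTED_IMAGE_EXTENSIONS.include?(ext)
  problems << 'file is larger than 20MB' if bytesize > MAX_IMAGE_BYTES
  problems
end

p image_input_problems('photo.JPG', 1_000_000)
p image_input_problems('scan.tiff', 30 * 1024 * 1024)
```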
452
+ **Image detail level:**
453
+
454
+ The `detail` parameter controls what level of detail the model uses when processing the image:
455
+
456
+ ```ruby
457
+ iterator.add_image(
458
+ text: 'Nature boardwalk image',
459
+ url: 'https://example.com/nature.jpg',
460
+ detail: 'high' # Options: 'auto', 'low', or 'high'
461
+ )
462
+ ```
463
+
464
+ - `detail: 'low'`: Uses fewer tokens (85) and processes a low-resolution 512px x 512px version of the image. Best for simple use cases like identifying dominant colors or shapes.
465
+ - `detail: 'high'`: Provides better image understanding for complex tasks requiring higher resolution detail.
466
+ - `detail: 'auto'`: Lets the model decide the appropriate detail level (default if not specified).
467
+
399
468
  ### Handling State Transitions with Callbacks
400
469
 
401
470
  You can track and respond to state transitions with callbacks:
@@ -470,6 +539,16 @@ OxAiWorkers provides several specialized assistant types:
470
539
  orchestrator.task = "Create a hello world application in C, save it to hello_world.c, compile, run, and verify it works."
471
540
  ```
472
541
 
542
+ All assistants support working with files and images:
543
+
544
+ ```ruby
545
+ # Add files and images to any assistant
546
+ sysop.add_file(pdf: File.binread('error_log.pdf'), filename: 'error_log.pdf', text: 'Error log file')
547
+ sysop.add_image(text: 'Screenshot of the error', url: 'https://example.com/screenshot.png')
548
+ ```
549
+
550
+ See the [Working with Files and Images](#working-with-files-and-images) section for full details.
551
+
473
552
  ### Available Tools
474
553
 
475
554
  OxAiWorkers provides several specialized tools to extend functionality:
@@ -481,7 +560,7 @@ OxAiWorkers provides several specialized tools to extend functionality:
481
560
  pixels = OxAiWorkers::Tool::Pixels.new(
482
561
  worker: worker, # Required: Request or DelayedRequest instance
483
562
  current_dir: Dir.pwd, # Optional: Directory to save generated images
484
- image_model: 'dall-e-3', # Optional: 'dall-e-3' or 'gpt-image-1'
563
+ image_model: OxAiWorkers::Models::StabilityImages.new, # Optional, default is OpenaiDalle3
485
564
  only: [:generate_image] # Optional: Limit available functions
486
565
  )
487
566
  ```
@@ -529,6 +608,69 @@ OxAiWorkers provides several specialized tools to extend functionality:
529
608
 
530
609
  Additional tools like Database and Converter are available for specialized tasks and can be integrated using the same pattern.
531
610
 
611
+ ### Function Control Mechanisms
612
+
613
+ OxAiWorkers provides two powerful mechanisms to control function execution behavior in iterators:
614
+
615
+ #### Call Stack
616
+
617
+ The `call_stack` parameter allows you to force the model to call specific functions in a predetermined order:
618
+
619
+ ```ruby
620
+ iterator = OxAiWorkers::Iterator.new(
621
+ worker: worker,
622
+ tools: [my_tool],
623
+ call_stack: [
624
+ my_tool.full_function_name(:process_data),
625
+ OxAiWorkers::Iterator.full_function_name(:outer_voice),
626
+ ]
627
+ )
628
+ ```
629
+
630
+ This feature is particularly useful when:
631
+
632
+ - You need to ensure a specific sequence of operations
633
+ - Certain functions must be called before others
634
+ - You want to guide the model through a predefined workflow
635
+ - Complex operations require strict ordering of function calls
636
+
637
+ The `call_stack` is processed sequentially, with each function being removed from the stack after it's called.
638
+
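The sequential consumption described above can be pictured as a simple queue. `CallStackSketch` and the function names passed to it are hypothetical stand-ins; what happens once the stack is empty (here, `nil` meaning "no forced function") is an assumption.

```ruby
# Hypothetical stand-in for the queue behaviour described above.
class CallStackSketch
  def initialize(call_stack)
    @call_stack = call_stack.dup # do not mutate the caller's array
  end

  # Returns the function name to force on the next request,
  # or nil once the stack has been used up.
  def next_forced_function
    forced = @call_stack.first
    @call_stack = @call_stack.drop(1)
    forced
  end
end

stack = CallStackSketch.new(%w[my_tool__process_data iterator__outer_voice])
3.times { p stack.next_forced_function }
```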
639
+ #### Stop Double Calls
640
+
641
+ The `stop_double_calls` parameter prevents the model from calling the same function twice in consecutive operations:
642
+
643
+ ```ruby
644
+ iterator = OxAiWorkers::Iterator.new(
645
+ worker: worker,
646
+ tools: [my_tool],
647
+ stop_double_calls: [
648
+ my_tool.full_function_name(:expensive_operation)
649
+ ]
650
+ )
651
+ ```
652
+
653
+ This feature is valuable for:
654
+
655
+ - Preventing redundant operations that could waste resources
656
+ - Avoiding duplicate processing of the same data
657
+ - Ensuring that certain operations are executed only once in sequence
658
+ - Protecting against potential infinite loops in function calls
659
+
660
+ When a function is called, its name is stored as the `last_call`. On the next request, a function that both matches the `last_call` and is included in the `stop_double_calls` list is excluded from the available tools for that request.
661
+
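The rule in the paragraph above can be sketched as a filter over the tool list. This is an illustration of the described behaviour only; the helper name `available_tools` and the sample tool names are assumptions, and the tool hashes merely imitate the `{ function: { name: ... } }` shape.

```ruby
# Illustrative filter: drop a tool only when it was the previous call
# AND it is listed in stop_double_calls.
def available_tools(tools, last_call:, stop_double_calls:)
  tools.reject do |tool|
    name = tool[:function][:name]
    name == last_call && stop_double_calls.include?(name)
  end
end

tools = [
  { function: { name: 'inner_monologue' } },
  { function: { name: 'outer_voice' } },
  { function: { name: 'expensive_operation' } }
]

remaining = available_tools(
  tools,
  last_call: 'expensive_operation',
  stop_double_calls: %w[expensive_operation]
)
p remaining.map { |tool| tool[:function][:name] }
```

A function that is listed in `stop_double_calls` but was not the previous call stays available, so the restriction only blocks immediate repeats.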
662
+ By default, `stop_double_calls` is applied to the `inner_monologue` and `outer_voice` functions to prevent reasoning loops and repetitive responses. This default behavior helps models avoid getting stuck in circular thinking patterns.
663
+
664
+ If you need to override this default behavior (for example, when consecutive monologue or voice calls are required for your specific use case), you can reset the stop_double_calls list **after** the iterator is created:
665
+
666
+ ```ruby
667
+ # Clear the default stop_double_calls constraints
668
+ @iterator.stop_double_calls = []
669
+
670
+ # Or set your own custom constraints
671
+ @iterator.stop_double_calls = [my_tool.full_function_name(:specific_function)]
672
+ ```
673
+
532
674
  ### Implementing Your Own Assistant
533
675
 
534
676
  Create custom assistants by inheriting from existing ones or composing with the Iterator:
@@ -553,6 +695,92 @@ module OxAiWorkers
553
695
  end
554
696
  ```
555
697
 
698
+ ## Image Generation
699
+
700
+ OxAiWorkers supports image generation through the Painter assistant and Pixels tool, with multiple AI image generation models.
701
+
702
+ ### Supported Image Models
703
+
704
+ - **OpenaiDalle3** - OpenAI's DALL-E 3 model
705
+ - **OpenaiGptImage** - OpenAI's GPT-Image-1 model
706
+ - **StabilityImages** - Stability AI's image generation models
707
+
708
+ ### Using the Painter Assistant
709
+
710
+ ```ruby
711
+ # Using DALL-E 3 (default)
712
+ painter = OxAiWorkers::Assistant::Painter.new(current_dir: Dir.pwd)
713
+ painter.task = "Create an image of a sunset over mountains"
714
+
715
+ # Using GPT-Image-1
716
+ painter = OxAiWorkers::Assistant::Painter.new(
717
+ image_model: OxAiWorkers::Models::OpenaiGptImage.new,
718
+ current_dir: Dir.pwd
719
+ )
720
+ painter.task = "Generate a photorealistic red apple"
721
+
722
+ # Using Stability AI
723
+ painter = OxAiWorkers::Assistant::Painter.new(
724
+ image_model: OxAiWorkers::Models::StabilityImages.new,
725
+ current_dir: Dir.pwd
726
+ )
727
+ painter.task = "Create a fantasy landscape with dragons"
728
+ ```
729
+
730
+ ### Using the Pixels Tool Directly
731
+
732
+ For more direct control over image generation:
733
+
734
+ ```ruby
735
+ # Initialize with DALL-E 3
736
+ pixels = OxAiWorkers::Tool::Pixels.new(
737
+ worker: OxAiWorkers::Models::OpenaiDalle3.new,
738
+ current_dir: Dir.pwd
739
+ )
740
+ pixels.generate_image(
741
+ prompt: "A photorealistic red apple on a wooden table",
742
+ file_name: "apple.png",
743
+ size: "1024x1024",
744
+ quality: "hd"
745
+ )
746
+
747
+ # Initialize with GPT-Image-1
748
+ pixels = OxAiWorkers::Tool::Pixels.new(
749
+ worker: OxAiWorkers::Models::OpenaiGptImage.new,
750
+ current_dir: Dir.pwd
751
+ )
752
+ pixels.generate_image(
753
+ prompt: "Futuristic cityscape at night",
754
+ file_name: "city.png",
755
+ size: "1536x1024",
756
+ quality: "high"
757
+ )
758
+
759
+ # Initialize with Stability AI
760
+ pixels = OxAiWorkers::Tool::Pixels.new(
761
+ worker: OxAiWorkers::Models::StabilityImages.new,
762
+ current_dir: Dir.pwd
763
+ )
764
+ pixels.generate_image(
765
+ prompt: "Photorealistic mountain landscape",
766
+ file_name: "mountains.png"
767
+ )
768
+ ```
769
+
770
+ ### Model-Specific Features
771
+
772
+ - **OpenaiDalle3**
773
+ - Sizes: '1024x1024', '1024x1792', '1792x1024'
774
+ - Qualities: 'standard', 'hd'
775
+
776
+ - **OpenaiGptImage**
777
+ - Sizes: 'auto', '1024x1024', '1536x1024', '1024x1536'
778
+ - Qualities: 'auto', 'low', 'medium', 'high'
779
+
780
+ - **StabilityImages**
781
+ - Uses Stability AI's API with different engine options
782
+ - Configuration via options parameter
783
+
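The size and quality values listed above could be checked before calling `generate_image`. The lookup table and the helper `valid_image_options?` are a hypothetical sketch built only from the values in this list; they are not part of the gem, and `StabilityImages` is left out because no concrete options are listed for it.

```ruby
# Values copied from the list above; the helper itself is hypothetical.
IMAGE_MODEL_OPTIONS = {
  'OpenaiDalle3' => {
    sizes: %w[1024x1024 1024x1792 1792x1024],
    qualities: %w[standard hd]
  },
  'OpenaiGptImage' => {
    sizes: %w[auto 1024x1024 1536x1024 1024x1536],
    qualities: %w[auto low medium high]
  }
}.freeze

def valid_image_options?(model, size:, quality:)
  options = IMAGE_MODEL_OPTIONS.fetch(model)
  options[:sizes].include?(size) && options[:qualities].include?(quality)
end

puts valid_image_options?('OpenaiDalle3', size: '1024x1024', quality: 'hd')
puts valid_image_options?('OpenaiGptImage', size: '1792x1024', quality: 'high')
```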
556
784
  ## Contributing
557
785
 
558
786
  Bug reports and pull requests are welcome on GitHub at <https://github.com/neonix20b/ox-ai-workers>. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/neonix20b/ox-ai-workers/blob/main/CODE_OF_CONDUCT.md).
@@ -34,6 +34,14 @@ module OxAiWorkers
34
34
  @iterator.clear_context
35
35
  @iterator.add_context context
36
36
  end
37
+
38
+ def add_file(pdf:, filename:, text:, role: :user)
39
+ @iterator.add_file(pdf:, filename:, text:, role:)
40
+ end
41
+
42
+ def add_image(text:, url: nil, binary: nil, role: :user, detail: 'auto', mime_type: 'image/jpeg')
43
+ @iterator.add_image(text:, url:, binary:, role:, detail:, mime_type:)
44
+ end
37
45
  end
38
46
  end
39
47
  end
@@ -29,12 +29,6 @@ module OxAiWorkers
29
29
  on_outer_voice: ->(text:) { puts "voice: #{text}".colorize(:green) }
30
30
  )
31
31
  end
32
-
33
- def cleanup
34
- Dir.glob(File.join(@current_dir, '*.png')).each do |file|
35
- File.delete(file) if File.exist?(file)
36
- end
37
- end
38
32
  end
39
33
  end
40
34
  end
@@ -117,16 +117,16 @@ module OxAiWorkers
117
117
  @worker.call_stack = @call_stack.dup
118
118
  @worker.stop_double_calls = @stop_double_calls
119
119
  @worker.messages = []
120
- @worker.append(role: :system, content: @role) if @role.present?
120
+ @worker.append(role: :system, content: "<role>\n#{@role}\n</role>") if @role.present?
121
121
 
122
- @tasks.each { |task| @worker.append(role: :user, content: task) }
123
- @worker.append(role: :system, content: valid_monologue.join("\n"))
122
+ @worker.append(role: :system, content: "<instructions>\n#{valid_monologue.join("\n")}\n</instructions>")
123
+ @tasks.each { |task| @worker.append(role: :user, content: "<task>\n#{task}\n</task>") }
124
124
  @worker.append(messages: @context) if @context.present?
125
125
  @tools.each do |tool|
126
126
  @worker.append(role: :user, content: tool.context) if tool.respond_to?(:context) && tool.context.present?
127
127
  end
128
128
  @worker.append(messages: @messages)
129
- @tasks.each { |task| @worker.append(role: :user, content: task) }
129
+ @tasks.each { |task| @worker.append(role: :user, content: "<task>\n#{task}\n</task>") }
130
130
  @worker.tools = function_schemas.to_openai_format(only: available_defs)
131
131
  return unless @tools.present?
132
132
 
@@ -252,6 +252,35 @@ module OxAiWorkers
252
252
  add_raw_context({ role:, content: text })
253
253
  end
254
254
 
255
+ def add_file(pdf:, filename:, text:, role: :user)
256
+ content = []
257
+ content << { type: 'text', text: } if text.present?
258
+ content << {
259
+ type: 'file',
260
+ file: {
261
+ filename:,
262
+ file_data: Base64.strict_encode64(pdf)
263
+ }
264
+ }
265
+
266
+ add_raw_context({ role:, content: })
267
+ end
268
+
269
+ def add_image(text:, url: nil, binary: nil, role: :user, detail: 'auto', mime_type: 'image/png')
270
+ content = []
271
+ content << { type: 'text', text: } if text.present?
272
+
273
+ image_url = if binary.present?
274
+ "data:#{mime_type};base64,#{Base64.strict_encode64(binary)}"
275
+ else
276
+ url
277
+ end
278
+
279
+ content << { type: 'image_url', image_url: { url: image_url, detail: } }
280
+
281
+ add_raw_context({ role:, content: })
282
+ end
283
+
255
284
  def add_raw_context(c)
256
285
  @context << c
257
286
  end
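To make the message shape built by `add_image` easier to inspect, here is a standalone sketch that mirrors the structure in the hunk above without ActiveSupport. The function name `image_message` is hypothetical, `nil?`/`empty?` checks stand in for `present?`, and `pack('m0')` is used as a dependency-free way to produce strict Base64.

```ruby
# Standalone sketch of the message structure; not the gem's own method.
def image_message(text:, url: nil, binary: nil, role: :user, detail: 'auto', mime_type: 'image/png')
  content = []
  content << { type: 'text', text: text } unless text.nil? || text.empty?

  image_url =
    if binary.nil? || binary.empty?
      url
    else
      # pack('m0') yields strict Base64 without line breaks.
      "data:#{mime_type};base64,#{[binary].pack('m0')}"
    end

  content << { type: 'image_url', image_url: { url: image_url, detail: detail } }
  { role: role, content: content }
end

message = image_message(text: 'Screenshot', url: 'https://example.com/image.jpg', detail: 'low')
p message[:content].map { |part| part[:type] }
```

Binary input takes precedence over `url` here, as in the hunk above, so passing both results in a `data:` URL.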
@@ -53,10 +53,11 @@ module OxAiWorkers
53
53
  frequency_penalty: @model.frequency_penalty
54
54
  }
55
55
  if @tools.present?
56
- parameters[:tools] = @tools.reject do |f|
56
+ parameters[:tools] = @tools.select do |f|
57
57
  tool_name = f[:function][:name]
58
58
  tool_name == @last_call && @stop_double_calls.include?(tool_name)
59
59
  end
60
+ OxAiWorkers.logger.debug("tools: #{parameters[:tools]} last_call=#{@last_call} stop_double_calls=#{@stop_double_calls}", for: self.class)
60
61
  if @call_stack&.any?
61
62
  func1 = @call_stack.first
62
63
  @call_stack = @call_stack.drop(1)
@@ -146,7 +147,7 @@ module OxAiWorkers
146
147
  name: function['name'].split('__').last,
147
148
  args: args
148
149
  }
149
- @last_call = function['name']
150
+ @last_call = function['name'].to_s
150
151
  end
151
152
  end
152
153
  end
@@ -62,29 +62,9 @@ module OxAiWorkers
62
62
  end
63
63
  end
64
64
 
65
- def edit_image(input_image:, prompt:, output_file_name: nil, size: nil, mask: nil)
66
- size ||= @image_model['size'].first
67
- mask ||= @mask
68
-
69
- response = @worker.client.images.edit(
70
- parameters: {
71
- image: input_image,
72
- model: @image_model['model'],
73
- prompt:,
74
- size:,
75
- mask:
76
- }
77
- )
78
-
79
- @url = response.dig('data', 0, 'url')
80
- revised_prompt = response.dig('data', 0, 'revised_prompt')
81
- if output_file_name.present?
82
- path = save_generated_image(file_name: output_file_name)
83
- "url: #{@url}\nfile_name: #{path}\n\nrevised_prompt: #{revised_prompt}"
84
- else
85
- "url: #{@url}\n\nrevised_prompt: #{revised_prompt}"
86
- end
87
- end
65
+ # def edit_image(input_image:, prompt:, output_file_name: nil, size: nil, mask: nil)
66
+ # # TODO: Implement edit_image
67
+ # end
88
68
 
89
69
  def save_generated_image(file_name:, binary:)
90
70
  unless @current_dir.present?
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module OxAiWorkers
4
- VERSION = '0.9.5'
4
+ VERSION = '0.9.6.1'
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: ox-ai-workers
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.9.5
4
+ version: 0.9.6.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Denis Smolev
@@ -123,7 +123,9 @@ executables:
123
123
  extensions: []
124
124
  extra_rdoc_files: []
125
125
  files:
126
- - ".cursorrules"
126
+ - ".cursor/rules/010-project-structure.mdc"
127
+ - ".cursor/rules/998-clean-code.mdc"
128
+ - ".cursor/rules/999-mdc-format.mdc"
127
129
  - ".ruby-version"
128
130
  - CHANGELOG.md
129
131
  - CODE_OF_CONDUCT.md
data/.cursorrules DELETED
@@ -1,155 +0,0 @@
- # Overview
-
- OxAiWorkers is a Ruby gem that implements a finite state machine (using the `state_machine` gem) to solve tasks using generative intelligence (with the `ruby-openai` gem). This approach enhances the final result by utilizing internal monologue and external tools.
-
- ## Architecture Principles
-
- - The library is built on the finite state machine (FSM) pattern using the 'state_machine' gem
- - Integration with generative models is implemented using the 'ruby-openai' gem
- - DRY (Don't Repeat Yourself) principle is applied throughout all components
- - Modular structure with clear separation of responsibilities between classes
- - Encapsulation of states and transitions in separate classes
- - Implementation of the "Composition" pattern for flexible tool integration
-
- ## Core Components
-
- - `Request` and `DelayedRequest` - classes for executing API requests (immediate and delayed)
- - `Iterator` - main class for iterative task execution with tools
- - `Assistant` - high-level wrappers over Iterator (Sysop, Coder, Localizer, etc.)
- - `Tool` - tools that can be used during task execution (Eval, FileSystem, Database)
- - `ToolDefinition` - module for declaring functions and methods for tools
- - `StateTools` - base class for managing states and transitions
- - `ContextualLogger` - logging system with contextual information support
-
- ## Code Conventions
-
- - Use `snake_case` for method and variable names
- - Functions for generative models should also be in `snake_case` (inner_monologue, outer_voice, etc.)
- - All public methods must have documentation with usage examples
- - Tests are mandatory for all new functions
- - All code comments, CHANGELOG, README, and other documentation must be written in English
- - Use YARD-style documentation for all public methods
- - Maintain a unified code formatting style (Rubocop is recommended)
- - Follow the "Fail fast" principle for early error detection
-
- ## Interaction Patterns
-
- - The system uses internal monologue (inner_monologue) for planning actions
- - External voice (outer_voice) is used for communication with the user
- - Execution flow management through finite state machine
- - Implementation of callback mechanisms for flexible event handling
- - Isolation of error handling functions at the tool level
-
- ## Integration
-
- - CLI interface through `oxaiworkers init` and `oxaiworkers run` commands
- - Rails support via ActiveRecord for storing delayed requests
- - Configuration through the `OxAiWorkers.configure` block
- - Multilingual support via standard I18n
- - Integration with external APIs through request client templates
- - Delayed execution mechanism via DelayedRequest
- - Support for various language models (OpenAI, Anthropic, Gemini)
-
- ## Best Practices
-
- - Use callbacks to handle various states (on_inner_monologue, on_outer_voice)
- - Handle errors at the tool level, preventing them from interrupting the main execution flow
- - When creating new assistants, inherit from the base Assistant class
- - Use the white_list mechanism to restrict available functions
- - Separate language model requests from result processing logic
- - Practice dependency injection to improve code testability
- - Use localization mechanisms for multilingual support
-
- ## Tools Architecture
-
- - Each tool should be a self-contained module
- - Tools are registered through the `define_function` interface
- - All tools should handle their own errors and return readable messages
- - Use parameter validation at the function definition level
- - Maintain a unified format for return values
-
- ## Performance and Scaling
-
- - Cache API request results when possible
- - Use asynchronous processing for long operations
- - Apply backoff strategies for repeated requests
- - Break large tasks into atomic operations
- - Provide monitoring and profiling mechanisms
-
- ## Finite State Machine Implementation
-
- - Core FSM based on `state_machine` gem with states: idle → prepared → requested → analyzed → finished → idle
- - State transitions managed by events: prepare, request, analyze, complete, iterate, end
- - `StateTools` - base class for FSM implementation with event hooks and transition callbacks
- - `StateBatch` - FSM extension for batch request processing with additional states
- - Automatic error recovery and retry mechanisms for failed API requests
-
- ## Request Processing
-
- - `ModuleRequest` - base class for all API requests with parsing and response handling
- - Support for streaming responses with callback processing
- - Built-in token usage tracking and truncation detection
- - Error handling with automatic retries for server errors
-
- ## Iterator Lifecycle
-
- - 3 core functions: inner_monologue, outer_voice, finish_it
- - Configurable message queue for stateful conversation history
- - Callback system for processing each state transition
- - Context and milestone management for optimizing token usage
- - Support for custom steps and instruction templating
-
- ## Additional Tools
-
- - `Converter` - tools for data format conversion and transformation
- - Support for custom tool development through inheritance and composition
- - Automatic function name resolution and parameter validation
-
- ## Assistants Details
-
- - `ModuleBase` - shared functionality for all assistant types
- - `Sysop` - system administration and shell command execution
- - `Coder` - specialized for code generation and analysis
- - `Localizer` - translation and localization support
-
- ## Development Guidelines
-
- - Use dependency injection for testability
- - Follow the FSM pattern for all stateful operations
- - Implement proper error boundaries at the tool level
- - Use monologue for complex reasoning and planning
- - Apply callbacks for event-driven architecture
- - Utilize templates in the CLI for rapid prototyping
- - Extend the base classes rather than modifying them
-
- ## Internationalization and Localization
-
- - All code comments, variable names, and documentation MUST be written in English
- - All user-facing strings MUST be properly localized using I18n
- - Use I18n.t for all text that will be shown to users or appears in assistant prompts
- - Store translations in YAML files within the config/locales directory
- - Follow the naming convention of language.namespace.key (e.g., en.oxaiworkers.assistant.role)
- - Use named parameters (%{variable}) instead of positional parameters (%s) in translation strings
- - Use the with_locale method to ensure proper locale context when processing localized text
- - Implement locale-aware classes by including the OxAiWorkers::LoadI18n module
- - Store the current locale on initialization and preserve it across method calls
- - Support multiple languages simultaneously through careful locale management
- - Default to English for developer-facing messages and logs
- - Ensure that all assistant classes properly handle localization in their format_role methods
-
- ## LoadI18n Module Usage
-
- - The `OxAiWorkers::LoadI18n` module provides two key methods for localization:
- - `store_locale` - saves the current locale at initialization time
- - `with_locale` - executes a block of code in the context of the saved locale
- - Always include the `OxAiWorkers::LoadI18n` module in classes that need localization capabilities
- - Call `store_locale` in the initialization methods of locale-aware classes
- - Wrap all locale-dependent code in `with_locale` blocks
- - NEVER redefine the `with_locale` method in classes that include LoadI18n
- - All methods that produce user-visible text must use the locale context via `with_locale` blocks
- - Regular method calls from classes including LoadI18n do not require additional locale handling
-
- ## Multi-Language Support
-
- - Use the store_locale and with_locale methods for consistent localization context
- - All error messages should be localized and retrieved via I18n.t
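
Note on the removed `LoadI18n` rules: they describe a pattern in which an object records the locale that is active when it is created (`store_locale`) and later runs locale-dependent code under that recorded locale (`with_locale`). The following is a minimal plain-Ruby sketch of that pattern only. `LocaleStore`, `LoadLocaleSketch`, and `Greeter` are hypothetical names invented for illustration and are not the gem's implementation; the gem's real module presumably works with `I18n`, which this sketch deliberately avoids so that it runs without dependencies.

```ruby
# Hypothetical stand-in for a global "current locale" (assumption, not I18n).
module LocaleStore
  class << self
    attr_writer :current

    def current
      @current ||= :en
    end
  end
end

# Hypothetical mixin illustrating the store/with pattern from the rules.
module LoadLocaleSketch
  # Remember the locale that is active when the object is created.
  def store_locale
    @locale = LocaleStore.current
  end

  # Run the block under the stored locale, then restore the previous one,
  # even if the block raises.
  def with_locale
    previous = LocaleStore.current
    LocaleStore.current = @locale
    yield
  ensure
    LocaleStore.current = previous
  end
end

# Example locale-aware class: calls store_locale in initialize and wraps
# locale-dependent code in with_locale, as the rules recommend.
class Greeter
  include LoadLocaleSketch

  GREETINGS = { en: 'Hello', fr: 'Bonjour' }.freeze

  def initialize
    store_locale
  end

  def greet
    with_locale { GREETINGS.fetch(LocaleStore.current) }
  end
end
```

In this sketch, a `Greeter` created while the locale is `:fr` keeps answering in French after the global locale switches back to `:en`, and the global locale is left unchanged after each call.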