RubyGems - touring_test - Versions diffs - 0.0.1 - Mend

touring_test 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

checksums.yaml +7 -0
data/.claude/settings.local.json +12 -0
data/README.md +752 -0
data/Rakefile +8 -0
data/lib/touring_test/agent.rb +155 -0
data/lib/touring_test/driver.rb +223 -0
data/lib/touring_test/railtie.rb +26 -0
data/lib/touring_test/version.rb +5 -0
data/lib/touring_test/world_extension.rb +14 -0
data/lib/touring_test.rb +11 -0
data/sig/cucumber/gemini/computer/use.rbs +10 -0
data/touring_test_example.png +0 -0
metadata +110 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 02c8f40628071610d3959b937580803df4ecae32fc652b40fa7283e8331d7779
+  data.tar.gz: c6b41686b43ff5d347f971c370f0c425ac854d74c92f5ea9e16c79b64dbecfb4
+SHA512:
+  metadata.gz: c46e849c9096371879df2d2b4c8f3ab99ee6675d8b829321e89d3bd121b840e871985c5280d22a84a6d16ab2c2701a3e211641396a30400c43f631b1388dd452
+  data.tar.gz: 2e7b28a9a2a2071cabb1059e891b09010afdc489517dfe63dbf305b465961bea201fc60edcb94374297a0ad0ebbe38480be309821b3ac8c2f766de89568194a4

data/.claude/settings.local.json ADDED

@@ -0,0 +1,12 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(tree:*)",
+      "Bash(cat:*)",
+      "Bash(TOURING_TEST_DEBUG=true bundle exec cucumber:*)",
+      "Bash(bundle exec rake:*)"
+    ],
+    "deny": [],
+    "ask": []
+  }
+}

data/README.md ADDED

@@ -0,0 +1,752 @@
+# TouringTest
+> AI-Powered Natural Language Testing for Cucumber
+[![Ruby](https://img.shields.io/badge/ruby-%3E%3D%203.2.0-ruby.svg)](https://www.ruby-lang.org/en/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+**TouringTest** is a Ruby gem that integrates Google's Gemini "computer use" AI model with the Cucumber testing framework. Write high-level, natural language test instructions and watch as an AI agent executes them by analyzing screenshots and performing browser actions via Capybara.
+**Status:** ⚠️ Experimental - relies on Google's preview API (`gemini-2.5-computer-use-preview-10-2025`)
+---
+## What is TouringTest?
+Traditional Cucumber tests require writing step definitions that use brittle CSS selectors and detailed browser automation logic. TouringTest flips this model:
+```ruby
+# Traditional approach
+When('I sign up with email and password') do
+  visit sign_up_path
+  fill_in 'user[email]', with: 'test@example.com'
+  fill_in 'user[password]', with: 'password123'
+  click_button 'Sign Up'
+end
+# TouringTest approach
+When('the agent {string}') do |instruction|
+  computer_use(instruction)
+end
+# In your feature file:
+#   When the agent "signs up with email 'test@example.com' and password 'password123'"
+```
+The AI agent:
+1. Takes a screenshot of the current page
+2. Analyzes it to understand the UI layout
+3. Determines what actions to take (click fields, type text, submit forms)
+4. Executes those actions via Capybara
+5. Repeats until the goal is achieved
+**Benefits:**
+- **More resilient tests** - No brittle CSS selectors that break when markup changes
+- **Usability testing** - Tests reflect real user interactions
+- **Faster test writing** - Describe what you want, not how to do it
+- **Better readability** - Tests read like user stories
+- **Self-healing** - AI adapts to UI changes automatically
+---
+## Quick Start
+### Prerequisites
+- Ruby >= 3.2.0
+- A Google Gemini API key ([Get one here](https://aistudio.google.com/apikey))
+### Installation
+```bash
+# Add to your Gemfile
+gem 'touring_test'
+# Install
+bundle install
+# Set your API key
+export GEMINI_API_KEY='your_api_key_here'
+```
+### Minimal Example
+```ruby
+# features/support/env.rb
+require 'touring_test'
+require 'capybara'
+Capybara.default_driver = :selenium_chrome_headless
+World(TouringTest::WorldExtension)
+# features/step_definitions/agent_steps.rb
+When('the agent {string}') do |instruction|
+  computer_use(instruction)
+end
+```
+```gherkin
+# features/login.feature
+Feature: User Login
+  Scenario: Successful login
+    Given I am on the login page
+    When the agent "logs in with username 'admin' and password 'secret'"
+    Then I should see the dashboard
+```
+### Example Output
+Here's what TouringTest looks like in action:
+![TouringTest Example Output](touring_test_example.png)
+The AI agent narrates its actions in real-time, showing:
+- Each step it evaluates
+- The actions it takes (click_at, type_text_at, etc.)
+- Its reasoning about what to do next
+- Final success confirmation
+---
+## Installation (Detailed)
+### 1. Add the Gem
+Add to your `Gemfile`:
+```ruby
+gem 'touring_test'
+```
+Or install directly:
+```bash
+gem install touring_test
+```
+### 2. Set Up API Key
+Get a Gemini API key from [Google AI Studio](https://aistudio.google.com/apikey) and set it as an environment variable:
+```bash
+# In your shell or .env file
+export GEMINI_API_KEY='your_api_key_here'
+```
+### 3. Configure Cucumber
+Add the following to your `features/support/env.rb`:
+```ruby
+require 'touring_test'
+require 'capybara'
+# Configure your Capybara driver (Selenium, Playwright, etc.)
+Capybara.default_driver = :selenium_chrome_headless
+# Add TouringTest's WorldExtension to Cucumber
+World(TouringTest::WorldExtension)
+```
+In a Rails app, create this file if it doesn't exist yet; `rails generate cucumber:install` usually generates it.
+---
+## Usage
+### Basic Usage
+The core of TouringTest is the `computer_use` method, which accepts a natural language instruction:
+```ruby
+# In your step definitions
+When('the agent {string}') do |instruction|
+  computer_use(instruction)
+end
+```
+Now you can write Cucumber scenarios like:
+```gherkin
+Scenario: User creates an account
+  Given I am on the homepage
+  When the agent "clicks on Sign Up and creates an account with email 'user@example.com'"
+  Then I should see "Welcome!"
+```
+### Writing Effective Natural Language Instructions
+**Good instructions:**
+- Be specific about the goal: "sign up with email 'test@example.com' and password 'password123'"
+- Include exact text when important: "click the blue 'Submit' button"
+- Break complex tasks into steps if needed
+**Less effective:**
+- Too vague: "do the signup thing"
+- Missing critical data: "sign up with some credentials"
+- Overly complex: "navigate through multiple pages and fill out everything"
+### Available UI Actions
+The AI agent can perform these 11 browser actions:
+| Action | Description | Example Use |
+|--------|-------------|-------------|
+| `click_at(x:, y:)` | Click element at coordinates | Clicking buttons, links |
+| `type_text_at(x:, y:, text:)` | Type in an input field | Filling forms |
+| `hover_at(x:, y:)` | Hover over element | Revealing dropdowns |
+| `scroll_document(direction:)` | Scroll entire page | UP, DOWN, LEFT, RIGHT |
+| `scroll_at(x:, y:, direction:)` | Scroll specific element | Scrollable divs |
+| `drag_and_drop(start_x:, start_y:, end_x:, end_y:)` | Drag element | Reordering lists |
+| `navigate(url:)` | Go to URL | Changing pages |
+| `go_back()` | Browser back button | Navigation |
+| `go_forward()` | Browser forward button | Navigation |
+| `wait_5_seconds()` | Explicit wait | Slow loading |
+| `key_combination(keys:)` | Keyboard shortcuts | "enter", "ctrl+a" |
+The agent automatically chooses which actions to use based on its analysis of your instruction and the page screenshot.
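Reviewer note: in `driver.rb`, `execute_action` turns each function call returned by the model into a keyword-argument `send` on a private method. The following is a minimal standalone sketch of that dispatch. `MiniDriver` and its two stub actions are hypothetical stand-ins, not the gem's class, and the explicit allowlist is a swapped-in check rather than what the gem does.

```ruby
# Hypothetical stand-in for TouringTest::Driver: records calls instead of driving a browser.
class MiniDriver
  # Explicit allowlist (an assumption of this sketch; the gem checks respond_to? instead).
  ALLOWED_ACTIONS = %i[click_at go_back].freeze

  attr_reader :calls

  def initialize
    @calls = []
  end

  # Expects the shape of a model function call: {"name" => ..., "args" => {...}}.
  def execute_action(function_call)
    action = function_call.fetch("name").to_sym
    args = (function_call["args"] || {}).transform_keys(&:to_sym)
    raise ArgumentError, "Unknown action: #{action}" unless ALLOWED_ACTIONS.include?(action)

    send(action, **args)
  end

  private

  def click_at(x:, y:)
    @calls << [:click_at, x, y]
  end

  def go_back
    @calls << [:go_back]
  end
end

driver = MiniDriver.new
driver.execute_action("name" => "click_at", "args" => { "x" => 450, "y" => 320 })
driver.execute_action("name" => "go_back", "args" => {})
p driver.calls
```

The allowlist is a deliberate design choice here: `respond_to?(name, true)` also returns true for private `Kernel` methods such as `system`, so a name check against a fixed list is the narrower gate when the method name comes from a model response.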
+### Configuration Options
+```ruby
+# Default usage (screenshots saved to current directory)
+computer_use("sign up with email 'test@example.com'")
+# Custom root path for screenshots
+computer_use(
+  "sign up with email 'test@example.com'",
+  root_path: Rails.root
+)
+```
+### Screenshots & Debugging
+TouringTest automatically captures screenshots at each step:
+- **Location:** `{root_path}/tmp/screenshots/`
+- **Naming:** `step_1.png`, `step_2.png`, etc.
+- **Cleared:** At the start of each test run
+**API Logs:**
+- Full request/response JSON logged to `tmp/gemini_api_log.jsonl`
+- Useful for debugging API issues or understanding agent decisions
+**Console Output:**
+- Shows each instruction sent to the agent
+- Displays actions taken (e.g., "click_at(x: 450, y: 320)")
+- Reports success or failure
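Reviewer note: `log_api_call` in `agent.rb` appends one JSON object per line with `timestamp`, `request`, and `response` keys. A small sketch of how such a log could be summarized follows. The helper name `logged_actions` is hypothetical, and the sample entry is synthetic, built only to match that shape.

```ruby
require "json"

# Returns the function-call names found in each logged response, one array per log line.
def logged_actions(jsonl)
  jsonl.each_line.map(&:strip).reject(&:empty?).map do |line|
    entry = JSON.parse(line)
    parts = entry.dig("response", "candidates", 0, "content", "parts") || []
    parts.filter_map { |part| part.dig("functionCall", "name") }
  end
end

# Synthetic entry shaped like the lines log_api_call writes.
sample = {
  timestamp: "2025-01-01T00:00:00Z",
  request: { contents: [] },
  response: {
    candidates: [
      { content: { parts: [{ functionCall: { name: "click_at", args: { x: 450, y: 320 } } }] } }
    ]
  }
}.to_json

p logged_actions(sample)
# Against a real run: logged_actions(File.read("tmp/gemini_api_log.jsonl"))
```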
+---
+## How It Works
+### Architecture
+TouringTest uses a clean three-layer architecture:
+```
+┌─────────────────────────────────────────┐
+│  Cucumber Step Definition               │
+│  computer_use("sign up with email...")  │
+└──────────────┬──────────────────────────┘
+               │
+               ▼
+┌─────────────────────────────────────────┐
+│  Agent (AI Orchestrator)                │
+│  - Manages conversation with Gemini API │
+│  - Captures screenshots                 │
+│  - Enforces step limits                 │
+└──────────────┬──────────────────────────┘
+               │
+               ▼
+┌─────────────────────────────────────────┐
+│  Driver (Browser Automation Facade)     │
+│  - Executes UI actions                  │
+│  - Denormalizes coordinates             │
+│  - Wraps Capybara session               │
+└──────────────┬──────────────────────────┘
+               │
+               ▼
+┌─────────────────────────────────────────┐
+│  Capybara / Browser                     │
+└─────────────────────────────────────────┘
+```
+### Conversation Flow
+```
+1. Initial Turn:
+   User: "sign up with email 'test@example.com' and password 'password123'"
+   + Base64 screenshot of current page
+   + Current URL
+   ↓
+2. Gemini API Request:
+   POST with full conversation history + computer_use tool specification
+   ↓
+3. Gemini Response:
+   Function calls: [
+     {name: "click_at", args: {x: 450, y: 320}},
+     {name: "type_text_at", args: {x: 450, y: 320, text: "test@example.com"}}
+   ]
+   ↓
+4. Driver Execution:
+   - Executes each action
+   - Captures new screenshot after each action
+   ↓
+5. Next Turn:
+   User: Function responses + new screenshot + new URL
+   ↓
+6. Loop continues until:
+   - Gemini returns no function calls (goal achieved), OR
+   - Maximum steps reached (default: 15)
+```
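Reviewer note: the two exit conditions in step 6 can be exercised without the API. The sketch below replaces Gemini with a scripted lambda and the driver with a stub. `run_agent_loop`, `model`, and `act` are hypothetical names for illustration, and the loop is a simplification of `Agent#run` (no screenshots, no logging).

```ruby
# Runs the request/act loop against any callable "model".
# `model` takes the history and returns a turn hash with "parts";
# `act` performs one function call and returns a response part.
def run_agent_loop(model:, act:, history:, max_steps: 15)
  steps = 0
  loop do
    raise "Agent exceeded maximum steps (#{max_steps})" if steps >= max_steps

    steps += 1
    turn = model.call(history)
    history << turn
    calls = turn.fetch("parts").filter_map { |part| part["functionCall"] }
    break if calls.empty? # no function calls: the model treats the goal as reached

    history << { "role" => "user", "parts" => calls.map { |call| act.call(call) } }
  end
  steps
end

# Scripted stand-in for the API: one click, then a plain-text answer.
script = [
  { "role" => "model",
    "parts" => [{ "functionCall" => { "name" => "click_at", "args" => { "x" => 450, "y" => 320 } } }] },
  { "role" => "model", "parts" => [{ "text" => "Done." }] }
]
model = ->(_history) { script.shift }
act = lambda do |call|
  { "functionResponse" => { "name" => call["name"], "response" => { "url" => "http://localhost:3000/" } } }
end

history = [{ "role" => "user", "parts" => [{ "text" => "click the button" }] }]
puts run_agent_loop(model: model, act: act, history: history)
```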
+### Coordinate System
+Gemini returns **normalized coordinates** in a 0-1000 range. TouringTest converts these to pixel coordinates:
+- **API sends:** `{x: 500, y: 250}` (middle of screen on 1000-unit scale)
+- **Driver converts:** `(500 / 1000.0) * screenshot_width` → pixel position
+**Critical Detail:** Coordinates are denormalized using **screenshot dimensions**, not window size, to handle HiDPI/Retina displays correctly. On a 2x display:
+- Window size: 1512×834
+- Screenshot size: 756×417
+- Agent analyzes the 756×417 screenshot, so coordinates must match those dimensions
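Reviewer note: the conversion described above is a single scale-and-round per axis. This standalone sketch uses the same arithmetic as `denormalize_coordinates` in `driver.rb`, with the dimensions passed in explicitly. The `denormalize` helper and its keyword arguments are illustrative names, not part of the gem.

```ruby
# Maps a 0-1000 normalized coordinate pair onto a screenshot of the given pixel size.
def denormalize(x, y, width:, height:)
  [(x / 1000.0 * width).round, (y / 1000.0 * height).round]
end

p denormalize(500, 250, width: 756, height: 417) # => [378, 104]
```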
+### Step Limit
+To prevent infinite loops from AI hallucination or impossible tasks:
+- **Default:** 15 steps maximum
+- **Configurable:** Pass `max_steps` to Agent (for advanced usage)
+- **Exception raised:** If limit exceeded
+---
+## Example: Real-World Test
+Here's a complete example from the test app included in this gem:
+**Feature file** (`features/sign_up.feature`):
+```gherkin
+Feature: Sign up
+  Scenario: User signs up with email and password
+    Given I am on the sign up page
+    When the agent "signs up with the email address 'test@example.com' and password 'password123'"
+    Then I should be signed in
+```
+**Step definitions** (`features/step_definitions/sign_up_steps.rb`):
+```ruby
+Given('I am on the sign up page') do
+  visit sign_up_path
+end
+When('the agent {string}') do |instruction|
+  computer_use(instruction, root_path: Rails.root)
+end
+Then('I should be signed in') do
+  expect(page).to have_content('Welcome')
+end
+```
+**What the AI does:**
+1. Analyzes screenshot of sign-up form
+2. Identifies email input field coordinates
+3. Clicks email field: `click_at(x: 450, y: 280)`
+4. Types email: `type_text_at(x: 450, y: 280, text: "test@example.com")`
+5. Identifies password field
+6. Clicks password field: `click_at(x: 450, y: 350)`
+7. Types password: `type_text_at(x: 450, y: 350, text: "password123")`
+8. Finds Submit button
+9. Clicks Submit: `click_at(x: 500, y: 420)`
+10. Mission accomplished!
+---
+## File Structure & Architecture
+TouringTest follows standard Ruby gem conventions:
+```
+touring_test/
+├── lib/
+│   ├── touring_test.rb              # Main entry point
+│   └── touring_test/
+│       ├── version.rb               # VERSION = "0.0.1"
+│       ├── agent.rb                 # AI orchestration (155 lines)
+│       ├── driver.rb                # Browser automation (223 lines)
+│       ├── world_extension.rb       # Cucumber integration (14 lines)
+│       └── railtie.rb              # Rails auto-setup (26 lines)
+├── spec/
+│   ├── spec_helper.rb
+│   ├── touring_test_spec.rb         # Basic tests
+│   ├── touring_test/
+│   │   ├── agent_spec.rb            # Unit tests for Agent
+│   │   └── driver_spec.rb           # Unit tests for Driver
+│   └── test_app/                    # Full Rails integration test app
+│       ├── app/                     # Rails app with sign-up flow
+│       ├── features/                # Cucumber features
+│       └── Gemfile
+├── bin/
+│   ├── console                      # IRB with gem loaded
+│   └── setup                        # Automated setup script
+├── Gemfile
+├── Rakefile
+├── touring_test.gemspec
+└── README.md
+```
+### Core Components
+1. **Agent** (`lib/touring_test/agent.rb`)
+   - Orchestrates conversation with Gemini API
+   - Maintains conversation history during execution
+   - Enforces max step limit (default: 15)
+   - Logs full API interactions to `tmp/gemini_api_log.jsonl`
+2. **Driver** (`lib/touring_test/driver.rb`)
+   - Wraps Capybara session with AI-friendly interface
+   - Handles coordinate denormalization (0-1000 → pixels)
+   - Executes 11 different UI actions
+   - Manages screenshot capture
+3. **WorldExtension** (`lib/touring_test/world_extension.rb`)
+   - Provides `computer_use()` method to Cucumber World
+   - Bridges step definitions to Agent/Driver
+4. **Railtie** (`lib/touring_test/railtie.rb`)
+   - Automatic Rails integration
+   - Generates support files
+   - Zero-config experience
+### Test App (Non-Standard)
+The `spec/test_app/` directory contains a **complete Rails 7.1.2 application** for integration testing. This is unusual for a gem (most use minimal fixtures), but valuable for demonstrating end-to-end functionality.
+---
+## API & Configuration
+### Gemini API Requirements
+- **Model:** `gemini-2.5-computer-use-preview-10-2025`
+- **Endpoint:** `https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent`
+- **Authentication:** API key via query parameter: `?key={GEMINI_API_KEY}`
+- **Required Tool Specification:**
+  ```json
+  {
+    "computer_use": {
+      "environment": "ENVIRONMENT_BROWSER"
+    }
+  }
+  ```
+### Environment Variables
+- **`GEMINI_API_KEY`** (required): Your Google API key for Gemini access
+Get your API key: [https://aistudio.google.com/apikey](https://aistudio.google.com/apikey)
+### API Request Format
+The Agent sends multi-turn conversations to Gemini:
+```json
+{
+  "contents": [
+    {
+      "role": "user",
+      "parts": [
+        {"text": "sign up with email 'test@example.com'"},
+        {"inline_data": {"mime_type": "image/png", "data": "base64..."}},
+        {"text": "URL: http://localhost:3000/sign_up"}
+      ]
+    },
+    {
+      "role": "model",
+      "parts": [
+        {"functionCall": {"name": "click_at", "args": {"x": 450, "y": 280}}}
+      ]
+    },
+    {
+      "role": "user",
+      "parts": [
+        {"functionResponse": {"name": "click_at", "response": {"url": "http://localhost:3000/sign_up"}}},
+        {"inline_data": {"mime_type": "image/png", "data": "base64..."}}
+      ]
+    }
+  ],
+  "tools": [{"computer_use": {"environment": "ENVIRONMENT_BROWSER"}}]
+}
+```
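Reviewer note: a sketch of how one follow-up turn and the request body could be assembled as plain Ruby hashes before serialization. The helper names `function_response_turn` and `request_body` are hypothetical. The `response` payload carries the URL, as `agent.rb` does, and `pack("m0")` is used in place of `Base64.strict_encode64` (it produces the same output) so the snippet needs only `json`.

```ruby
require "json"

# Builds the user turn that follows one executed function call.
def function_response_turn(name:, url:, png_bytes:)
  {
    "role" => "user",
    "parts" => [
      { "functionResponse" => { "name" => name, "response" => { "url" => url } } },
      # pack("m0") is strict Base64 with no line breaks.
      { "inline_data" => { "mime_type" => "image/png", "data" => [png_bytes].pack("m0") } }
    ]
  }
end

# Wraps the conversation history with the computer_use tool specification.
def request_body(contents)
  {
    "contents" => contents,
    "tools" => [{ "computer_use" => { "environment" => "ENVIRONMENT_BROWSER" } }]
  }
end

# Four placeholder bytes stand in for a real screenshot.
turn = function_response_turn(
  name: "click_at",
  url: "http://localhost:3000/sign_up",
  png_bytes: "\x89PNG".b
)
puts JSON.pretty_generate(request_body([turn]))
```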
+---
+## Development
+### Running Tests
+```bash
+# Unit tests (RSpec) - default rake task
+bundle exec rake
+# or
+bundle exec rake spec
+# Integration tests (Cucumber features in test app)
+cd spec/test_app
+bundle install
+bundle exec cucumber
+# Run specific feature
+bundle exec cucumber features/sign_up.feature
+```
+### Interactive Console
+```bash
+# Opens IRB with the gem loaded
+bin/console
+# Experiment with the gem
+> require 'touring_test'
+> driver = TouringTest::Driver.new(session, root_path: Dir.pwd)
+> agent = TouringTest::Agent.new(driver, "click the button")
+```
+### Building and Installing Locally
+```bash
+# Build the gem
+bundle exec rake build
+# Install locally
+bundle exec rake install
+# Release (requires RubyGems permissions)
+bundle exec rake release
+```
+---
+## Testing the Test App
+The `spec/test_app/` directory contains a complete Rails application for testing TouringTest end-to-end.
+### Test App Structure
+```
+spec/test_app/
+├── app/
+│   ├── controllers/
+│   │   ├── users_controller.rb      # Sign-up with dummy create action
+│   │   └── welcome_controller.rb    # Landing page
+│   └── views/
+│       ├── users/new.html.erb       # Sign-up form (email + password)
+│       └── welcome/index.html.erb   # Welcome message with flash
+├── features/
+│   ├── sign_up.feature              # Cucumber scenario
+│   ├── step_definitions/
+│   │   └── sign_up_steps.rb         # Uses computer_use()
+│   └── support/
+│       └── env.rb                   # Cucumber/Rails setup
+└── Gemfile
+```
+### Running the Test App
+```bash
+cd spec/test_app
+# Install dependencies
+bundle install
+# Run Cucumber features
+bundle exec cucumber
+# Start Rails server (for manual testing)
+bundle exec rails server
+# Rails console
+bundle exec rails console
+```
+### Test App Configuration
+- **Ruby:** 3.4.5
+- **Rails:** 7.1.2
+- **Database:** SQLite3
+- **Capybara Driver:** `selenium_chrome_headless`
+- **Database Cleaner:** `:truncation` strategy (for JavaScript tests)
+---
+## Troubleshooting
+### Missing API Key
+**Error:** `KeyError: key not found: "GEMINI_API_KEY"` (raised by `ENV.fetch` when the Agent is created)
+**Solution:**
+```bash
+export GEMINI_API_KEY='your_api_key_here'
+```
+Or add to `.env` file if using `dotenv`:
+```
+GEMINI_API_KEY=your_api_key_here
+```
+### Coordinate Misalignment (Clicks Wrong Location)
+**Symptom:** Agent clicks in wrong places on the page
+**Cause:** HiDPI/Retina display coordinate mismatch
+**Solution:** TouringTest automatically handles this by extracting screenshot dimensions. If issues persist:
+1. Check `tmp/screenshots/` to see what the AI sees
+2. Verify Capybara driver supports screenshot capture
+3. Check console output for coordinate denormalization debug info
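Reviewer note: when checking what the agent sees, it can help to read the screenshot's pixel size directly. `driver.rb` shells out to the `file` command for this. The sketch below is a different, pure-Ruby technique that reads the width and height from the PNG IHDR chunk (bytes 16 to 23, big-endian). `png_dimensions` is a hypothetical helper, and the header built here is synthetic.

```ruby
# Returns [width, height] from the IHDR chunk of PNG data given as a binary string.
def png_dimensions(bytes)
  signature = "\x89PNG\r\n\x1A\n".b
  valid = bytes.byteslice(0, 8) == signature && bytes.byteslice(12, 4) == "IHDR".b
  raise ArgumentError, "not a PNG" unless valid

  bytes.byteslice(16, 8).unpack("NN") # two big-endian 32-bit integers
end

# Synthetic header: signature + IHDR length + chunk type + width + height.
header = "\x89PNG\r\n\x1A\n".b + [13].pack("N") + "IHDR".b + [1512, 834].pack("NN")
p png_dimensions(header)
# Against a real file: png_dimensions(File.binread("tmp/screenshots/step_1.png"))
```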
+### Max Steps Exceeded
+**Error:** `"Agent exceeded maximum steps (15)"`
+**Cause:** Task too complex, AI stuck in loop, or impossible task
+**Solutions:**
+1. Break instruction into smaller steps
+2. Make instruction more specific
+3. Check screenshots to see where agent got stuck
+4. For advanced usage, increase `max_steps` when creating Agent
+### Screenshot Directory Permission Issues
+**Error:** Can't write to `tmp/screenshots/`
+**Solution:**
+```bash
+mkdir -p tmp/screenshots
+chmod 755 tmp/screenshots
+```
+Or specify a different `root_path`:
+```ruby
+computer_use(instruction, root_path: '/path/with/permissions')
+```
+### Agent Can't Find Elements
+**Symptom:** "Warning: No element found at (x, y)"
+**Possible causes:**
+1. Element not visible (hidden, off-screen)
+2. JavaScript not finished loading
+3. Element inside iframe (not currently supported)
+**Solutions:**
+- Add explicit wait steps: "wait for the page to load, then click submit"
+- Ensure elements are visible: `page.execute_script("window.scrollTo(0, 0)")`
+- Check screenshots to verify element visibility
+---
+## Limitations & Known Issues
+### Experimental API
+TouringTest relies on Google's **preview API** (`gemini-2.5-computer-use-preview-10-2025`):
+- May change without notice
+- No SLA or production guarantees
+- Rate limits apply
+### Step Limit Constraints
+- Default 15 steps may be insufficient for complex workflows
+- No dynamic adjustment based on task complexity
+- Manual tuning required for edge cases
+### Performance Considerations
+- Each step requires API call + screenshot capture (1-3 seconds)
+- Long tests can be slow (15 steps at 1-3 seconds each ≈ 15-45 seconds)
+- Not suitable for load testing or CI pipelines with strict time limits
+### HiDPI/Retina Display Requirements
+- Coordinate system assumes screenshot capture works correctly
+- Issues may occur on exotic display configurations
+- Tested primarily on macOS Retina displays
+### Iframes Not Supported
+- Agent cannot interact with elements inside iframes
+- Workaround: Use traditional Capybara `within_frame` blocks
+### No Multi-Tab/Window Support
+- Agent operates on single Capybara session
+- Cannot switch between tabs/windows automatically
+---
+## Roadmap / Future Plans
+- [ ] Support for additional Gemini models
+- [ ] Configurable step limits per instruction
+- [ ] Iframe interaction support
+- [ ] Multi-tab/window handling
+- [ ] Performance optimizations (screenshot caching, parallel API calls)
+- [ ] Alternative AI providers (OpenAI, Anthropic)
+- [ ] Visual regression testing mode
+- [ ] Accessibility testing integration
+- [ ] Record/replay functionality
+---
+## Contributing
+Bug reports and pull requests are welcome on GitHub at [https://github.com/stwerner92/touring_test](https://github.com/stwerner92/touring_test).
+### Development Setup
+1. Clone the repository
+2. Run `bin/setup` to install dependencies
+3. Run `rake spec` to run unit tests
+4. Run `cd spec/test_app && bundle exec cucumber` for integration tests
+### Pull Request Guidelines
+- Add tests for new functionality
+- Update README for user-facing changes
+- Follow existing code style
+- Keep commits focused and atomic
+---
+## License
+The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+---
+## Credits & Acknowledgments
+**Author:** Scott Werner (stwerner@vt.edu)
+**Powered by:**
+- [Google Gemini API](https://ai.google.dev/) - AI computer use capabilities
+- [Cucumber](https://cucumber.io/) - BDD testing framework
+- [Capybara](https://github.com/teamcapybara/capybara) - Browser automation
+**Inspired by:** Anthropic's computer use demo and the vision of more maintainable, human-readable tests.
+---
+## Support
+- **Documentation:** [CLAUDE.md](./CLAUDE.md) contains detailed architectural information
+- **Issues:** [GitHub Issues](https://github.com/stwerner92/touring_test/issues)
+- **Email:** stwerner@vt.edu
+---
+Made with ❤️ for better testing experiences

data/Rakefile ADDED

@@ -0,0 +1,8 @@
+# frozen_string_literal: true
+require "bundler/gem_tasks"
+require "rspec/core/rake_task"
+RSpec::Core::RakeTask.new(:spec)
+task default: :spec

data/lib/touring_test/agent.rb ADDED

@@ -0,0 +1,155 @@
+# frozen_string_literal: true
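+require "httparty"
+require "base64"
+require "json"
+require "time"      # Time#iso8601, used in log_api_call
+require "fileutils" # FileUtils.mkdir_p, used in setup_log_file
+require_relative "driver"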
+module TouringTest
+  class Agent
+    attr_reader :driver, :instruction
+    def initialize(driver, instruction, max_steps: 15)
+      @driver = driver
+      @instruction = instruction
+      @api_key = ENV.fetch("GEMINI_API_KEY")
+      @max_steps = max_steps
+      @conversation_history = []
+      @log_file = setup_log_file
+    end
+    def run
+      # Initial turn
+      puts "[User] #{instruction}"
+      screenshot_path, url = driver.capture_screenshot_and_url
+      user_turn = build_user_turn(instruction, screenshot_path, url)
+      @conversation_history << user_turn
+      step_count = 0
+      loop do
+        if step_count >= @max_steps
+          raise "Agent exceeded maximum steps (#{@max_steps}). Halting execution to prevent infinite loop."
+        end
+        step_count += 1
+        puts "[Debug] === API Request Loop Iteration #{step_count}/#{@max_steps} ===" if debug?
+        response = make_api_request(@conversation_history)
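+        model_turn = response.dig("candidates", 0, "content")
+        # Fail with a clear message instead of a NoMethodError on nil below
+        raise "Gemini API returned no candidate content: #{response.inspect}" if model_turn.nil?
+        @conversation_history << model_turn
+        parts = model_turn["parts"] || []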
+        function_calls = parts.select { |part| part["functionCall"] }
+        puts "[Debug] Found #{function_calls.size} function calls in response" if debug?
+        break if function_calls.empty?
+        function_responses = []
+        function_calls.each_with_index do |part, idx|
+          function_call = part["functionCall"]
+          next if function_call.nil?
+          puts "[Debug] Processing function call #{idx + 1}/#{function_calls.size}: #{function_call['name']}" if debug?
+          driver.execute_action(function_call)
+          screenshot_path, url = driver.capture_screenshot_and_url
+          puts "[Debug] Screenshot saved to: #{screenshot_path}" if debug?
+          # Each function response includes its own screenshot
+          function_responses << {
+            "functionResponse" => {
+              "name" => function_call["name"],
+              "response" => { "url" => url }
+            }
+          }
+          function_responses << {
+            "inline_data" => {
+              "mime_type" => "image/png",
+              "data" => encode_image(screenshot_path)
+            }
+          }
+        end
+        user_turn = { "role" => "user", "parts" => function_responses }
+        @conversation_history << user_turn
+      end
+    end
+    private
+    def debug?
+      ENV['TOURING_TEST_DEBUG'] == 'true' || ENV['TOURING_TEST_DEBUG'] == '1'
+    end
+    def build_user_turn(text, screenshot_path, url, function_responses = [])
+      parts = function_responses
+      parts << { "text" => text } if text
+      parts << { "inline_data" => { "mime_type" => "image/png", "data" => encode_image(screenshot_path) } }
+      parts << { "text" => "URL: #{url}" }
+      { "role" => "user", "parts" => parts }
+    end
+    def make_api_request(contents)
+      model = "gemini-2.5-computer-use-preview-10-2025"
+      api_url = "https://generativelanguage.googleapis.com/v1beta/models/#{model}:generateContent?key=#{@api_key}"
+      body = {
+        "contents" => contents,
+        "tools" => [{
+          "computer_use" => {
+            "environment" => "ENVIRONMENT_BROWSER"
+          }
+        }]
+      }
+      response = HTTParty.post(
+        api_url,
+        headers: { "Content-Type" => "application/json" },
+        body: body.to_json
+      )
+      if response.success?
+        parsed_response = response.parsed_response
+        # Log full JSON to file
+        log_api_call(body, parsed_response)
+        # Print clean conversation to console
+        print_conversation_update(parsed_response)
+        parsed_response
+      else
+        error_message = "Gemini API error: #{response.code} - #{response.body}"
+        raise StandardError, error_message
+      end
+    end
+    def encode_image(path)
+      Base64.strict_encode64(File.binread(path))
+    end
+    def setup_log_file
+      log_dir = File.join(Dir.pwd, "tmp")
+      FileUtils.mkdir_p(log_dir)
+      File.join(log_dir, "gemini_api_log.jsonl")
+    end
+    def log_api_call(request, response)
+      File.open(@log_file, "a") do |f|
+        f.puts({
+          timestamp: Time.now.iso8601,
+          request: request,
+          response: response
+        }.to_json)
+      end
+    end
+    def print_conversation_update(response)
+      parts = response.dig("candidates", 0, "content", "parts") || []
+      parts.each do |part|
+        if part["text"]
+          puts "[Model] #{part['text']}"
+        end
+      end
+    end
+  end
+end

data/lib/touring_test/driver.rb ADDED

@@ -0,0 +1,223 @@
+# frozen_string_literal: true
+require "capybara"
+require "fileutils"
+module TouringTest
+  class Driver
+    attr_reader :session
+    def initialize(session, root_path:)
+      @session = session
+      @screenshot_dir = File.join(root_path, "tmp", "screenshots")
+      @screenshot_count = 0
+      @last_screenshot_path = nil
+      FileUtils.mkdir_p(@screenshot_dir)
+      FileUtils.rm_f(Dir.glob("#{@screenshot_dir}/*")) # Clear old screenshots
+    end
+    def capture_screenshot_and_url
+      [capture_screenshot, session.current_url]
+    end
+    def execute_action(function_call)
+      action_name = function_call["name"].to_sym
+      args = function_call["args"].transform_keys(&:to_sym)
+      puts "[Action] #{action_name}(#{args.map { |k, v| "#{k}: #{v.inspect}" }.join(', ')})"
+      if respond_to?(action_name, true)
+        send(action_name, **args)
+      else
+        raise "Unknown action: #{action_name}"
+      end
+    end
+    private
+    def debug?
+      ENV['TOURING_TEST_DEBUG'] == 'true' || ENV['TOURING_TEST_DEBUG'] == '1'
+    end
+    def find_element_at(x, y)
+      # Search in reverse to find the most specific (innermost) element
+      session.all('*').to_a.reverse.find do |el|
+        rect = session.evaluate_script("arguments[0].getBoundingClientRect()", el)
+        rect['left'] <= x && x <= rect['right'] && rect['top'] <= y && y <= rect['bottom']
+      end
+    end
+    def capture_screenshot
+      @screenshot_count += 1
+      path = File.join(@screenshot_dir, "step_#{@screenshot_count}.png")
+      begin
+        session.save_screenshot(path)
+        @last_screenshot_path = path
+        if debug?
+          # Verify the file was actually created
+          if File.exist?(path)
+            file_size = File.size(path)
+            puts "[Debug] Screenshot captured: #{path} (#{file_size} bytes, count: #{@screenshot_count})"
+          else
+            puts "[Debug] WARNING: File not created at #{path}"
+          end
+        end
+      rescue => e
+        puts "[Error] Failed to save screenshot: #{e.message}"
+        puts "[Error] #{e.backtrace.first(3).join("\n")}" if debug?
+        raise
+      end
+      path
+    end
+    def denormalize_coordinates(x, y)
+      # Use screenshot dimensions instead of window size to account for device pixel ratio
+      require 'open3'
+      if @last_screenshot_path && File.exist?(@last_screenshot_path)
+        stdout, _stderr, _status = Open3.capture3("file", @last_screenshot_path)
+        if stdout =~ /(\d+) x (\d+)/
+          width = $1.to_i
+          height = $2.to_i
+          return [(x / 1000.0 * width).round, (y / 1000.0 * height).round]
+        end
+      end
+      # Fallback to window size
+      width = session.current_window.size[0]
+      height = session.current_window.size[1]
+      [(x / 1000.0 * width).round, (y / 1000.0 * height).round]
+    end
+    # Supported UI Actions
+    def click_at(x:, y:)
+      x, y = denormalize_coordinates(x, y)
+      element = find_element_at(x, y)
+      if element
+        element.click
+      else
+        puts "[Warning] No element found at (#{x}, #{y}) to click."
+      end
+    end
+    def type_text_at(x:, y:, text:, press_enter: false, clear_before_typing: false)
+      x, y = denormalize_coordinates(x, y)
+      element = find_element_at(x, y)
+      if element
+        element.click  # Focus the element first
+        if clear_before_typing
+          element.send_keys([:control, 'a'], :backspace)
+        end
+        element.send_keys(text)
+        element.send_keys(:enter) if press_enter
+      else
+        puts "[Warning] No element found at (#{x}, #{y}) to type in."
+      end
+    end
+    def hover_at(x:, y:)
+      x, y = denormalize_coordinates(x, y)
+      element = find_element_at(x, y)
+      if element
+        element.hover
+      else
+        puts "[Warning] No element found at (#{x}, #{y}) to hover over."
+      end
+    end
+    def scroll_document(direction:)
+      case direction.to_s.upcase
+      when "UP"
+        session.execute_script("window.scrollBy(0, -window.innerHeight)")
+      when "DOWN"
+        session.execute_script("window.scrollBy(0, window.innerHeight)")
+      when "LEFT"
+        session.execute_script("window.scrollBy(-window.innerWidth, 0)")
+      when "RIGHT"
+        session.execute_script("window.scrollBy(window.innerWidth, 0)")
+      else
+        raise "Unknown scroll direction: #{direction}"
+      end
+    end
+    def drag_and_drop(start_x:, start_y:, end_x:, end_y:)
+      start_x, start_y = denormalize_coordinates(start_x, start_y)
+      end_x, end_y = denormalize_coordinates(end_x, end_y)
+      element_to_drag = find_element_at(start_x, start_y)
+      target_element = find_element_at(end_x, end_y)
+      if element_to_drag
+        if target_element
+          element_to_drag.drag_to(target_element)
+        else
+          # Fallback if no specific target, drag by offset
+          session.driver.browser.action.drag_and_drop_by(element_to_drag.native, end_x - start_x, end_y - start_y).perform
+        end
+      else
+        puts "[Warning] No element found at (#{start_x}, #{start_y}) to drag."
+      end
+    end
+    def scroll_at(x:, y:, direction:)
+      x, y = denormalize_coordinates(x, y)
+      element = find_element_at(x, y)
+      if element
+        case direction.to_s.upcase
+        when "UP"
+          session.execute_script("arguments[0].scrollTop -= arguments[0].clientHeight", element)
+        when "DOWN"
+          session.execute_script("arguments[0].scrollTop += arguments[0].clientHeight", element)
+        when "LEFT"
+          session.execute_script("arguments[0].scrollLeft -= arguments[0].clientWidth", element)
+        when "RIGHT"
+          session.execute_script("arguments[0].scrollLeft += arguments[0].clientWidth", element)
+        else
+          raise "Unknown scroll direction: #{direction}"
+        end
+      else
+        puts "[Warning] No element found at (#{x}, #{y}) to scroll."
+      end
+    end
+    def navigate(url:)
+      session.visit(url)
+    end
+    def go_back
+      session.go_back
+    end
+    def go_forward
+      session.go_forward
+    end
+    def wait_5_seconds
+      sleep 5
+    end
+    def key_combination(keys:)
+      # keys can be a string like "enter" or "ctrl+a"
+      # Convert to Capybara format
+      key_array = keys.split('+').map do |k|
+        case k.strip.downcase
+        when 'ctrl', 'control' then :control
+        when 'shift' then :shift
+        when 'alt', 'option' then :alt
+        when 'cmd', 'command', 'meta' then :command
+        when 'enter', 'return' then :enter
+        when 'tab' then :tab
+        when 'escape', 'esc' then :escape
+        when 'backspace' then :backspace
+        when 'delete' then :delete
+        else k.strip
+        end
+      end
+      session.send_keys(key_array)
+    end
+  end
+end
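
Two steps in the driver.rb hunk above are pure computation and can be exercised without a browser: scaling a coordinate by `/ 1000.0` onto the pixel size, and turning a string such as "ctrl+a" into a key list. The following is a minimal sketch, not the gem's code. The names `denormalize`, `parse_key_combination`, and `KEY_ALIASES` are hypothetical, and the pixel size is passed in explicitly rather than read from a screenshot or window.

```ruby
# Hypothetical standalone helpers; these names are illustrative and not part of the gem.

# Scale a coordinate on a 0..1000 grid (the divisor the driver uses) to pixels.
def denormalize(x, y, width:, height:)
  [(x / 1000.0 * width).round, (y / 1000.0 * height).round]
end

# Same aliases as the case statement in key_combination, expressed as a lookup table.
KEY_ALIASES = {
  "ctrl" => :control, "control" => :control,
  "shift" => :shift,
  "alt" => :alt, "option" => :alt,
  "cmd" => :command, "command" => :command, "meta" => :command,
  "enter" => :enter, "return" => :enter,
  "tab" => :tab,
  "escape" => :escape, "esc" => :escape,
  "backspace" => :backspace,
  "delete" => :delete
}.freeze

# Split on "+", trim whitespace, and map known names to symbols.
# Unknown keys (for example "a") pass through as trimmed strings.
def parse_key_combination(keys)
  keys.split("+").map do |k|
    name = k.strip
    KEY_ALIASES.fetch(name.downcase, name)
  end
end

p denormalize(500, 250, width: 1280, height: 720)
p parse_key_combination("ctrl + a")
```

Keeping these steps free of Capybara calls makes them easy to unit test. The table lookup is only one possible layout, and the case statement in the hunk expresses the same mapping.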

data/lib/touring_test/railtie.rb ADDED

@@ -0,0 +1,26 @@
+# frozen_string_literal: true
+require "rails"
+module TouringTest
+  class Railtie < Rails::Railtie
+    generators do |app|
+      Rails::Generators.invoke "cucumber:install"
+    end
+    initializer "cucumber_gemini_computer_use.load" do
+      ActiveSupport.on_load(:after_initialize) do
+        support_file = Rails.root.join("features/support/touring_test.rb")
+        unless File.exist?(support_file)
+          File.open(support_file, "w") do |f|
+            f.puts "# frozen_string_literal: true"
+            f.puts ""
+            f.puts "require 'touring_test'"
+            f.puts ""
+            f.puts "World(TouringTest::WorldExtension)"
+          end
+        end
+      end
+    end
+  end
+end
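
The initializer in the railtie.rb hunk above writes `features/support/touring_test.rb` only when the file does not already exist. The same write-once behaviour can be tried in a temporary directory with the sketch below. `ensure_support_file` and `SUPPORT_FILE_BODY` are hypothetical names, and the `FileUtils.mkdir_p` call is an addition made here so the sketch runs in an empty directory; the hunk itself does not create the directory.

```ruby
require "tmpdir"
require "fileutils"

# Contents mirror the lines the initializer writes with f.puts.
SUPPORT_FILE_BODY = <<~RUBY
  # frozen_string_literal: true

  require 'touring_test'

  World(TouringTest::WorldExtension)
RUBY

# Hypothetical helper: write the support file only if it is absent.
# Returns true when the file was written, false when it already existed.
def ensure_support_file(root)
  path = File.join(root, "features", "support", "touring_test.rb")
  return false if File.exist?(path)

  FileUtils.mkdir_p(File.dirname(path)) # assumption: create the directory if missing
  File.write(path, SUPPORT_FILE_BODY)
  true
end

Dir.mktmpdir do |root|
  puts ensure_support_file(root) # first call writes the file
  puts ensure_support_file(root) # second call leaves it untouched
end
```

Returning a boolean is a choice made for this sketch so the two calls can be told apart. The initializer in the hunk does not report whether it wrote anything.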

data/lib/touring_test/version.rb ADDED

@@ -0,0 +1,5 @@
+# frozen_string_literal: true
+module TouringTest
+  VERSION = "0.0.1"
+end

data/lib/touring_test/world_extension.rb ADDED

@@ -0,0 +1,14 @@
+# frozen_string_literal: true
+require_relative "agent"
+require_relative "driver"
+module TouringTest
+  module WorldExtension
+    def computer_use(instruction, root_path: Dir.pwd)
+      driver = Driver.new(page, root_path: root_path)
+      agent = Agent.new(driver, instruction)
+      agent.run
+    end
+  end
+end

data/lib/touring_test.rb ADDED

@@ -0,0 +1,11 @@
+# frozen_string_literal: true
+require_relative "touring_test/version"
+require_relative "touring_test/driver"
+require_relative "touring_test/agent"
+require_relative "touring_test/world_extension"
+module TouringTest
+  class Error < StandardError; end
+  # Your code goes here...
+end

data/sig/cucumber/gemini/computer/use.rbs ADDED

@@ -0,0 +1,10 @@
+module Cucumber
+  module Gemini
+    module Computer
+      module Use
+        VERSION: String
+        # See the writing guide of rbs: https://github.com/ruby/rbs#guides
+      end
+    end
+  end
+end

data/touring_test_example.png ADDED

Binary file

metadata ADDED

@@ -0,0 +1,110 @@
+--- !ruby/object:Gem::Specification
+name: touring_test
+version: !ruby/object:Gem::Version
+  version: 0.0.1
+platform: ruby
+authors:
+- Scott Werner
+bindir: exe
+cert_chain: []
+date: 1980-01-02 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: cucumber
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: capybara
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: httparty
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: playwright-ruby-client
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: This gem provides a simple way to use Google's 'computer use' Gemini
+  model within your Cucumber step definitions, allowing you to write high-level instructions
+  for an AI agent to execute.
+email:
+- scott@sublayer.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".claude/settings.local.json"
+- README.md
+- Rakefile
+- lib/touring_test.rb
+- lib/touring_test/agent.rb
+- lib/touring_test/driver.rb
+- lib/touring_test/railtie.rb
+- lib/touring_test/version.rb
+- lib/touring_test/world_extension.rb
+- sig/cucumber/gemini/computer/use.rbs
+- touring_test_example.png
+homepage: https://github.com/stwerner92/touring_test
+licenses: []
+metadata:
+  homepage_uri: https://github.com/stwerner92/touring_test
+  source_code_uri: https://github.com/works-on-your-machine/touring_test
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 3.2.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.7.2
+specification_version: 4
+summary: A Cucumber support gem for using Google's 'computer use' Gemini model.
+test_files: []