touring_test 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 02c8f40628071610d3959b937580803df4ecae32fc652b40fa7283e8331d7779
4
+ data.tar.gz: c6b41686b43ff5d347f971c370f0c425ac854d74c92f5ea9e16c79b64dbecfb4
5
+ SHA512:
6
+ metadata.gz: c46e849c9096371879df2d2b4c8f3ab99ee6675d8b829321e89d3bd121b840e871985c5280d22a84a6d16ab2c2701a3e211641396a30400c43f631b1388dd452
7
+ data.tar.gz: 2e7b28a9a2a2071cabb1059e891b09010afdc489517dfe63dbf305b465961bea201fc60edcb94374297a0ad0ebbe38480be309821b3ac8c2f766de89568194a4
@@ -0,0 +1,12 @@
1
+ {
2
+ "permissions": {
3
+ "allow": [
4
+ "Bash(tree:*)",
5
+ "Bash(cat:*)",
6
+ "Bash(TOURING_TEST_DEBUG=true bundle exec cucumber:*)",
7
+ "Bash(bundle exec rake:*)"
8
+ ],
9
+ "deny": [],
10
+ "ask": []
11
+ }
12
+ }
data/README.md ADDED
@@ -0,0 +1,752 @@
1
+ # TouringTest
2
+
3
+ > AI-Powered Natural Language Testing for Cucumber
4
+
5
+ [![Ruby](https://img.shields.io/badge/ruby-%3E%3D%203.2.0-ruby.svg)](https://www.ruby-lang.org/en/)
6
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
+
8
+ **TouringTest** is a Ruby gem that integrates Google's Gemini "computer use" AI model with the Cucumber testing framework. Write high-level, natural language test instructions and watch as an AI agent executes them by analyzing screenshots and performing browser actions via Capybara.
9
+
10
+ **Status:** ⚠️ Experimental - relies on Google's preview API (`gemini-2.5-computer-use-preview-10-2025`)
11
+
12
+ ---
13
+
14
+ ## What is TouringTest?
15
+
16
+ Traditional Cucumber tests require writing step definitions that use brittle CSS selectors and detailed browser automation logic. TouringTest flips this model:
17
+
18
+ ```ruby
19
+ # Traditional approach
20
+ When('I sign up with email and password') do
21
+ visit sign_up_path
22
+ fill_in 'user[email]', with: 'test@example.com'
23
+ fill_in 'user[password]', with: 'password123'
24
+ click_button 'Sign Up'
25
+ end
26
+
27
+ # TouringTest approach
28
+ When('the agent {string}') do |instruction|
29
+ computer_use(instruction)
30
+ end
31
+
32
+ # In your feature file:
33
+ When the agent "signs up with email 'test@example.com' and password 'password123'"
34
+ ```
35
+
36
+ The AI agent:
37
+ 1. Takes a screenshot of the current page
38
+ 2. Analyzes it to understand the UI layout
39
+ 3. Determines what actions to take (click fields, type text, submit forms)
40
+ 4. Executes those actions via Capybara
41
+ 5. Repeats until the goal is achieved
42
+
43
+ **Benefits:**
44
+ - **More resilient tests** - No brittle CSS selectors that break when markup changes
45
+ - **Usability testing** - Tests reflect real user interactions
46
+ - **Faster test writing** - Describe what you want, not how to do it
47
+ - **Better readability** - Tests read like user stories
48
+ - **Self-healing** - AI adapts to UI changes automatically
49
+
50
+ ---
51
+
52
+ ## Quick Start
53
+
54
+ ### Prerequisites
55
+
56
+ - Ruby >= 3.2.0
57
+ - A Google Gemini API key ([Get one here](https://aistudio.google.com/apikey))
58
+
59
+ ### Installation
60
+
61
+ ```bash
62
+ # Add this line to your Gemfile (it is Ruby, not a shell command):
63
+ #   gem 'touring_test'
64
+
65
+ # Install
66
+ bundle install
67
+
68
+ # Set your API key
69
+ export GEMINI_API_KEY='your_api_key_here'
70
+ ```
71
+
72
+ ### Minimal Example
73
+
74
+ ```ruby
75
+ # features/support/env.rb
76
+ require 'touring_test'
77
+ require 'capybara'
78
+
79
+ Capybara.default_driver = :selenium_chrome_headless
80
+ World(TouringTest::WorldExtension)
81
+
82
+ # features/step_definitions/agent_steps.rb
83
+ When('the agent {string}') do |instruction|
84
+ computer_use(instruction)
85
+ end
86
+
87
+ # features/login.feature
88
+ Feature: User Login
89
+ Scenario: Successful login
90
+ Given I am on the login page
91
+ When the agent "logs in with username 'admin' and password 'secret'"
92
+ Then I should see the dashboard
93
+ ```
94
+
95
+ ### Example Output
96
+
97
+ Here's what TouringTest looks like in action:
98
+
99
+ ![TouringTest Example Output](touring_test_example.png)
100
+
101
+ The AI agent narrates its actions in real-time, showing:
102
+ - Each step it evaluates
103
+ - The actions it takes (click_at, type_text_at, etc.)
104
+ - Its reasoning about what to do next
105
+ - Final success confirmation
106
+
107
+ ---
108
+
109
+ ## Installation (Detailed)
110
+
111
+ ### 1. Add the Gem
112
+
113
+ Add to your `Gemfile`:
114
+
115
+ ```ruby
116
+ gem 'touring_test'
117
+ ```
118
+
119
+ Or install directly:
120
+
121
+ ```bash
122
+ gem install touring_test
123
+ ```
124
+
125
+ ### 2. Set Up API Key
126
+
127
+ Get a Gemini API key from [Google AI Studio](https://aistudio.google.com/apikey) and set it as an environment variable:
128
+
129
+ ```bash
130
+ # In your shell or .env file
131
+ export GEMINI_API_KEY='your_api_key_here'
132
+ ```
133
+
134
+ ### 3. Configure Cucumber
135
+
136
+ Add the following to your `features/support/env.rb`:
137
+
138
+ ```ruby
139
+ require 'touring_test'
140
+ require 'capybara'
141
+
142
+ # Configure your Capybara driver (Selenium, Playwright, etc.)
143
+ Capybara.default_driver = :selenium_chrome_headless
144
+
145
+ # Add TouringTest's WorldExtension to Cucumber
146
+ World(TouringTest::WorldExtension)
147
+ ```
148
+
149
+ In a Rails app, create this file if it doesn't exist yet; it is usually generated by `rails generate cucumber:install`.
150
+
151
+ ---
152
+
153
+ ## Usage
154
+
155
+ ### Basic Usage
156
+
157
+ The core of TouringTest is the `computer_use` method, which accepts a natural language instruction:
158
+
159
+ ```ruby
160
+ # In your step definitions
161
+ When('the agent {string}') do |instruction|
162
+ computer_use(instruction)
163
+ end
164
+ ```
165
+
166
+ Now you can write Cucumber scenarios like:
167
+
168
+ ```gherkin
169
+ Scenario: User creates an account
170
+ Given I am on the homepage
171
+ When the agent "clicks on Sign Up and creates an account with email 'user@example.com'"
172
+ Then I should see "Welcome!"
173
+ ```
174
+
175
+ ### Writing Effective Natural Language Instructions
176
+
177
+ **Good instructions:**
178
+ - Be specific about the goal: "sign up with email 'test@example.com' and password 'password123'"
179
+ - Include exact text when important: "click the blue 'Submit' button"
180
+ - Break complex tasks into steps if needed
181
+
182
+ **Less effective:**
183
+ - Too vague: "do the signup thing"
184
+ - Missing critical data: "sign up with some credentials"
185
+ - Overly complex: "navigate through multiple pages and fill out everything"
186
+
187
+ ### Available UI Actions
188
+
189
+ The AI agent can perform these 11 browser actions:
190
+
191
+ | Action | Description | Example Use |
192
+ |--------|-------------|-------------|
193
+ | `click_at(x:, y:)` | Click element at coordinates | Clicking buttons, links |
194
+ | `type_text_at(x:, y:, text:)` | Type in an input field | Filling forms |
195
+ | `hover_at(x:, y:)` | Hover over element | Revealing dropdowns |
196
+ | `scroll_document(direction:)` | Scroll entire page | UP, DOWN, LEFT, RIGHT |
197
+ | `scroll_at(x:, y:, direction:)` | Scroll specific element | Scrollable divs |
198
+ | `drag_and_drop(start_x:, start_y:, end_x:, end_y:)` | Drag element | Reordering lists |
199
+ | `navigate(url:)` | Go to URL | Changing pages |
200
+ | `go_back()` | Browser back button | Navigation |
201
+ | `go_forward()` | Browser forward button | Navigation |
202
+ | `wait_5_seconds()` | Explicit wait | Slow loading |
203
+ | `key_combination(keys:)` | Keyboard shortcuts | "enter", "ctrl+a" |
204
+
205
+ The agent automatically chooses which actions to use based on its analysis of your instruction and the page screenshot.
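
The mapping from a model function call to a driver method can be sketched as follows. This is a minimal illustration modeled on the `execute_action` dispatch in the gem's driver, not the gem's actual code: `SketchDriver`, `ACTIONS`, and the recorded `log` are hypothetical names, only two of the eleven actions are stubbed, and no browser is involved. The sketch checks an explicit allowlist instead of `respond_to?`, so a model reply cannot reach unrelated private methods.

```ruby
# Hypothetical stand-in for the real driver: it records actions instead of
# touching a browser, so the dispatch logic can be run in isolation.
class SketchDriver
  # Explicit allowlist of action names this sketch accepts (an assumption of
  # the sketch; the gem itself dispatches with respond_to?).
  ACTIONS = %i[click_at type_text_at].freeze

  attr_reader :log

  def initialize
    @log = []
  end

  # Accepts a function call shaped like the ones Gemini returns:
  # { "name" => "click_at", "args" => { "x" => 450, "y" => 320 } }
  def execute_action(function_call)
    action_name = function_call["name"].to_sym
    args = (function_call["args"] || {}).transform_keys(&:to_sym)

    raise "Unknown action: #{action_name}" unless ACTIONS.include?(action_name)

    # String keys from the JSON payload become keyword arguments.
    send(action_name, **args)
  end

  private

  def click_at(x:, y:)
    @log << [:click_at, x, y]
  end

  def type_text_at(x:, y:, text:)
    @log << [:type_text_at, x, y, text]
  end
end

driver = SketchDriver.new
driver.execute_action("name" => "click_at", "args" => { "x" => 450, "y" => 320 })
driver.execute_action("name" => "type_text_at",
                      "args" => { "x" => 450, "y" => 320, "text" => "test@example.com" })
puts driver.log.inspect
```

Because the argument names arrive as JSON keys, a call with a misspelled or missing argument raises `ArgumentError` at the `send` line, which makes malformed model output fail loudly.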
206
+
207
+ ### Configuration Options
208
+
209
+ ```ruby
210
+ # Default usage (screenshots saved to current directory)
211
+ computer_use("sign up with email 'test@example.com'")
212
+
213
+ # Custom root path for screenshots
214
+ computer_use(
215
+ "sign up with email 'test@example.com'",
216
+ root_path: Rails.root
217
+ )
218
+ ```
219
+
220
+ ### Screenshots & Debugging
221
+
222
+ TouringTest automatically captures screenshots at each step:
223
+
224
+ - **Location:** `{root_path}/tmp/screenshots/`
225
+ - **Naming:** `step_1.png`, `step_2.png`, etc.
226
+ - **Cleared:** At the start of each test run
227
+
228
+ **API Logs:**
229
+ - Full request/response JSON logged to `tmp/gemini_api_log.jsonl`
230
+ - Useful for debugging API issues or understanding agent decisions
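
To see which actions the agent requested on each turn, the log can be read back line by line. The sketch below assumes each line is a JSON object with a `response` key holding the raw Gemini reply, which is how the agent code in this package writes it; `actions_per_call` is a hypothetical helper name, not part of the gem.

```ruby
require "json"

# Returns one array of action names per logged API call.
# Lines without function calls (for example the final text-only reply)
# produce an empty array.
def actions_per_call(log_path)
  File.foreach(log_path).filter_map do |line|
    next if line.strip.empty? # tolerate blank lines

    record = JSON.parse(line)
    parts = record.dig("response", "candidates", 0, "content", "parts") || []
    parts.filter_map { |part| part.dig("functionCall", "name") }
  end
end

log_path = File.join(Dir.pwd, "tmp", "gemini_api_log.jsonl")
if File.exist?(log_path)
  actions_per_call(log_path).each_with_index do |names, index|
    summary = names.empty? ? "(no actions)" : names.join(", ")
    puts "call #{index + 1}: #{summary}"
  end
end
```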
231
+
232
+ **Console Output:**
233
+ - Shows each instruction sent to the agent
234
+ - Displays actions taken (e.g., "click_at(x: 450, y: 320)")
235
+ - Reports success or failure
236
+
237
+ ---
238
+
239
+ ## How It Works
240
+
241
+ ### Architecture
242
+
243
+ TouringTest uses a clean three-layer architecture:
244
+
245
+ ```
246
+ ┌─────────────────────────────────────────┐
247
+ │ Cucumber Step Definition │
248
+ │ computer_use("sign up with email...") │
249
+ └──────────────┬──────────────────────────┘
250
+
251
+
252
+ ┌─────────────────────────────────────────┐
253
+ │ Agent (AI Orchestrator) │
254
+ │ - Manages conversation with Gemini API │
255
+ │ - Captures screenshots │
256
+ │ - Enforces step limits │
257
+ └──────────────┬──────────────────────────┘
258
+
259
+
260
+ ┌─────────────────────────────────────────┐
261
+ │ Driver (Browser Automation Facade) │
262
+ │ - Executes UI actions │
263
+ │ - Denormalizes coordinates │
264
+ │ - Wraps Capybara session │
265
+ └──────────────┬──────────────────────────┘
266
+
267
+
268
+ ┌─────────────────────────────────────────┐
269
+ │ Capybara / Browser │
270
+ └─────────────────────────────────────────┘
271
+ ```
272
+
273
+ ### Conversation Flow
274
+
275
+ ```
276
+ 1. Initial Turn:
277
+ User: "sign up with email 'test@example.com' and password 'password123'"
278
+ + Base64 screenshot of current page
279
+ + Current URL
280
+
281
+ 2. Gemini API Request:
282
+ POST with full conversation history + computer_use tool specification
283
+
284
+ 3. Gemini Response:
285
+ Function calls: [
286
+ {name: "click_at", args: {x: 450, y: 320}},
287
+ {name: "type_text_at", args: {x: 450, y: 320, text: "test@example.com"}}
288
+ ]
289
+
290
+ 4. Driver Execution:
291
+ - Executes each action
292
+ - Captures new screenshot after each action
293
+
294
+ 5. Next Turn:
295
+ User: Function responses + new screenshot + new URL
296
+
297
+ 6. Loop continues until:
298
+ - Gemini returns no function calls (goal achieved), OR
299
+ - Maximum steps reached (default: 15)
300
+ ```
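
The control flow above can be sketched without any network access. In this illustration `run_agent_loop`, `ask_model`, and `perform` are hypothetical names: the two callables stand in for the Gemini request and the browser driver, and screenshots are left out to keep the loop itself visible.

```ruby
# A stripped-down version of the conversation loop. Returns the number of
# model turns that were needed.
def run_agent_loop(ask_model:, perform:, max_steps: 15)
  history = []
  steps = 0

  loop do
    raise "Agent exceeded maximum steps (#{max_steps})" if steps >= max_steps
    steps += 1

    model_turn = ask_model.call(history)
    history << model_turn

    calls = model_turn["parts"].filter_map { |part| part["functionCall"] }
    break if calls.empty? # no function calls: the model considers the goal met

    # Execute every requested action and report the results as a user turn.
    responses = calls.map do |call|
      { "functionResponse" => { "name" => call["name"], "response" => perform.call(call) } }
    end
    history << { "role" => "user", "parts" => responses }
  end

  steps
end

# Scripted model: one click, then a plain-text reply that ends the loop.
replies = [
  { "role" => "model",
    "parts" => [{ "functionCall" => { "name" => "click_at", "args" => { "x" => 450, "y" => 320 } } }] },
  { "role" => "model", "parts" => [{ "text" => "Signed up." }] }
]
performed = []
steps = run_agent_loop(
  ask_model: ->(_history) { replies.shift },
  perform: lambda do |call|
    performed << call["name"]
    { "url" => "http://localhost:3000/" }
  end
)
puts "finished after #{steps} model turns, actions: #{performed.inspect}"
```

The step check runs before each request, so a model that never stops issuing actions is cut off after exactly `max_steps` requests.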
301
+
302
+ ### Coordinate System
303
+
304
+ Gemini returns **normalized coordinates** in a 0-1000 range. TouringTest converts these to pixel coordinates:
305
+
306
+ - **API sends:** `{x: 500, y: 250}` (middle of screen on 1000-unit scale)
307
+ - **Driver converts:** `(500 / 1000.0) * screenshot_width` → pixel position
308
+
309
+ **Critical Detail:** Coordinates are denormalized using **screenshot dimensions**, not window size, to handle HiDPI/Retina displays correctly. On a 2x display:
310
+ - Window size: 1512×834
311
+ - Screenshot size: 756×417
312
+ - Agent analyzes the 756×417 screenshot, so coordinates must match those dimensions
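
As a worked example of the conversion, here is a small sketch. The `denormalize` arithmetic matches the formula above; `png_dimensions` is a hypothetical helper that reads the width and height from the PNG header directly, which is a swapped-in technique for illustration (the driver in this package shells out to the `file` command instead).

```ruby
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Reads width and height from a PNG file's IHDR chunk. In a PNG, the 8-byte
# signature is followed by a 4-byte chunk length and the 4-byte "IHDR" tag,
# so the two big-endian 32-bit dimensions start at byte offset 16.
def png_dimensions(path)
  header = File.binread(path, 24)
  unless header && header.bytesize == 24 && header.start_with?(PNG_SIGNATURE)
    raise ArgumentError, "not a PNG file: #{path}"
  end

  header.byteslice(16, 8).unpack("NN")
end

# Converts a point on the 0-1000 grid into pixel coordinates for an image
# of the given size.
def denormalize(x, y, width:, height:)
  [(x / 1000.0 * width).round, (y / 1000.0 * height).round]
end

p denormalize(500, 250, width: 756, height: 417) # => [378, 104]
```

With a saved screenshot, the two helpers combine as `width, height = png_dimensions(path)` followed by `denormalize(x, y, width: width, height: height)`.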
313
+
314
+ ### Step Limit
315
+
316
+ To prevent infinite loops from AI hallucination or impossible tasks:
317
+ - **Default:** 15 steps maximum
318
+ - **Configurable:** Pass `max_steps` to Agent (for advanced usage)
319
+ - **Exception raised:** If limit exceeded
320
+
321
+ ---
322
+
323
+ ## Example: Real-World Test
324
+
325
+ Here's a complete example from the test app included in this gem:
326
+
327
+ **Feature file** (`features/sign_up.feature`):
328
+ ```gherkin
329
+ Feature: Sign up
330
+ Scenario: User signs up with email and password
331
+ Given I am on the sign up page
332
+ When the agent "signs up with the email address 'test@example.com' and password 'password123'"
333
+ Then I should be signed in
334
+ ```
335
+
336
+ **Step definitions** (`features/step_definitions/sign_up_steps.rb`):
337
+ ```ruby
338
+ Given('I am on the sign up page') do
339
+ visit sign_up_path
340
+ end
341
+
342
+ When('the agent {string}') do |instruction|
343
+ computer_use(instruction, root_path: Rails.root)
344
+ end
345
+
346
+ Then('I should be signed in') do
347
+ expect(page).to have_content('Welcome')
348
+ end
349
+ ```
350
+
351
+ **What the AI does:**
352
+ 1. Analyzes screenshot of sign-up form
353
+ 2. Identifies email input field coordinates
354
+ 3. Clicks email field: `click_at(x: 450, y: 280)`
355
+ 4. Types email: `type_text_at(x: 450, y: 280, text: "test@example.com")`
356
+ 5. Identifies password field
357
+ 6. Clicks password field: `click_at(x: 450, y: 350)`
358
+ 7. Types password: `type_text_at(x: 450, y: 350, text: "password123")`
359
+ 8. Finds Submit button
360
+ 9. Clicks Submit: `click_at(x: 500, y: 420)`
361
+ 10. Mission accomplished!
362
+
363
+ ---
364
+
365
+ ## File Structure & Architecture
366
+
367
+ TouringTest follows standard Ruby gem conventions:
368
+
369
+ ```
370
+ touring_test/
371
+ ├── lib/
372
+ │ ├── touring_test.rb # Main entry point
373
+ │ └── touring_test/
374
+ │ ├── version.rb # VERSION = "0.0.1"
375
+ │ ├── agent.rb # AI orchestration (155 lines)
376
+ │ ├── driver.rb # Browser automation (223 lines)
377
+ │ ├── world_extension.rb # Cucumber integration (14 lines)
378
+ │ └── railtie.rb # Rails auto-setup (26 lines)
379
+ ├── spec/
380
+ │ ├── spec_helper.rb
381
+ │ ├── touring_test_spec.rb # Basic tests
382
+ │ ├── touring_test/
383
+ │ │ ├── agent_spec.rb # Unit tests for Agent
384
+ │ │ └── driver_spec.rb # Unit tests for Driver
385
+ │ └── test_app/ # Full Rails integration test app
386
+ │ ├── app/ # Rails app with sign-up flow
387
+ │ ├── features/ # Cucumber features
388
+ │ └── Gemfile
389
+ ├── bin/
390
+ │ ├── console # IRB with gem loaded
391
+ │ └── setup # Automated setup script
392
+ ├── Gemfile
393
+ ├── Rakefile
394
+ ├── touring_test.gemspec
395
+ └── README.md
396
+ ```
397
+
398
+ ### Core Components
399
+
400
+ 1. **Agent** (`lib/touring_test/agent.rb`)
401
+ - Orchestrates conversation with Gemini API
402
+ - Maintains conversation history during execution
403
+ - Enforces max step limit (default: 15)
404
+ - Logs full API interactions to `tmp/gemini_api_log.jsonl`
405
+
406
+ 2. **Driver** (`lib/touring_test/driver.rb`)
407
+ - Wraps Capybara session with AI-friendly interface
408
+ - Handles coordinate denormalization (0-1000 → pixels)
409
+ - Executes 11 different UI actions
410
+ - Manages screenshot capture
411
+
412
+ 3. **WorldExtension** (`lib/touring_test/world_extension.rb`)
413
+ - Provides `computer_use()` method to Cucumber World
414
+ - Bridges step definitions to Agent/Driver
415
+
416
+ 4. **Railtie** (`lib/touring_test/railtie.rb`)
417
+ - Automatic Rails integration
418
+ - Generates support files
419
+ - Zero-config experience
420
+
421
+ ### Test App (Non-Standard)
422
+
423
+ The `spec/test_app/` directory contains a **complete Rails 7.1.2 application** for integration testing. This is unusual for a gem (most use minimal fixtures), but valuable for demonstrating end-to-end functionality.
424
+
425
+ ---
426
+
427
+ ## API & Configuration
428
+
429
+ ### Gemini API Requirements
430
+
431
+ - **Model:** `gemini-2.5-computer-use-preview-10-2025`
432
+ - **Endpoint:** `https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent`
433
+ - **Authentication:** API key via query parameter: `?key={GEMINI_API_KEY}`
434
+ - **Required Tool Specification:**
435
+ ```json
436
+ {
437
+ "computer_use": {
438
+ "environment": "ENVIRONMENT_BROWSER"
439
+ }
440
+ }
441
+ ```
442
+
443
+ ### Environment Variables
444
+
445
+ - **`GEMINI_API_KEY`** (required): Your Google API key for Gemini access
446
+
447
+ Get your API key: [https://aistudio.google.com/apikey](https://aistudio.google.com/apikey)
448
+
449
+ ### API Request Format
450
+
451
+ The Agent sends multi-turn conversations to Gemini:
452
+
453
+ ```json
454
+ {
455
+ "contents": [
456
+ {
457
+ "role": "user",
458
+ "parts": [
459
+ {"text": "sign up with email 'test@example.com'"},
460
+ {"inline_data": {"mime_type": "image/png", "data": "base64..."}},
461
+ {"text": "URL: http://localhost:3000/sign_up"}
462
+ ]
463
+ },
464
+ {
465
+ "role": "model",
466
+ "parts": [
467
+ {"functionCall": {"name": "click_at", "args": {"x": 450, "y": 280}}}
468
+ ]
469
+ },
470
+ {
471
+ "role": "user",
472
+ "parts": [
473
+ {"functionResponse": {"name": "click_at", "response": {"url": "http://localhost:3000/sign_up"}}},
474
+ {"inline_data": {"mime_type": "image/png", "data": "base64..."}}
475
+ ]
476
+ }
477
+ ],
478
+ "tools": [{"computer_use": {"environment": "ENVIRONMENT_BROWSER"}}]
479
+ }
480
+ ```
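
For readers who want to build the first turn by hand, the following sketch assembles the same structure in Ruby. `build_initial_request` is a hypothetical helper; the field names and the `URL:` text prefix follow the agent code shipped in this package, and the byte string below is a placeholder rather than a real screenshot.

```ruby
require "base64"
require "json"

# Builds the first request body: instruction, screenshot, and URL in a single
# user turn, plus the computer_use tool declaration.
def build_initial_request(instruction, png_bytes, url)
  {
    "contents" => [
      {
        "role" => "user",
        "parts" => [
          { "text" => instruction },
          { "inline_data" => { "mime_type" => "image/png",
                               "data" => Base64.strict_encode64(png_bytes) } },
          { "text" => "URL: #{url}" }
        ]
      }
    ],
    "tools" => [{ "computer_use" => { "environment" => "ENVIRONMENT_BROWSER" } }]
  }
end

body = build_initial_request(
  "sign up with email 'test@example.com'",
  "placeholder bytes, not a real PNG".b,
  "http://localhost:3000/sign_up"
)
puts JSON.pretty_generate(body)
```

`Base64.strict_encode64` is used because it emits no line breaks, which keeps the encoded image on a single JSON string value.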
481
+
482
+ ---
483
+
484
+ ## Development
485
+
486
+ ### Running Tests
487
+
488
+ ```bash
489
+ # Unit tests (RSpec) - default rake task
490
+ bundle exec rake
491
+ # or
492
+ bundle exec rake spec
493
+
494
+ # Integration tests (Cucumber features in test app)
495
+ cd spec/test_app
496
+ bundle install
497
+ bundle exec cucumber
498
+
499
+ # Run specific feature
500
+ bundle exec cucumber features/sign_up.feature
501
+ ```
502
+
503
+ ### Interactive Console
504
+
505
+ ```bash
506
+ # Opens IRB with the gem loaded
507
+ bin/console
508
+
509
+ # Experiment with the gem
510
+ > require 'touring_test'
511
+ > driver = TouringTest::Driver.new(Capybara.current_session, root_path: Dir.pwd)
512
+ > agent = TouringTest::Agent.new(driver, "click the button")
513
+ ```
514
+
515
+ ### Building and Installing Locally
516
+
517
+ ```bash
518
+ # Build the gem
519
+ bundle exec rake build
520
+
521
+ # Install locally
522
+ bundle exec rake install
523
+
524
+ # Release (requires RubyGems permissions)
525
+ bundle exec rake release
526
+ ```
527
+
528
+ ---
529
+
530
+ ## Testing the Test App
531
+
532
+ The `spec/test_app/` directory contains a complete Rails application for testing TouringTest end-to-end.
533
+
534
+ ### Test App Structure
535
+
536
+ ```
537
+ spec/test_app/
538
+ ├── app/
539
+ │ ├── controllers/
540
+ │ │ ├── users_controller.rb # Sign-up with dummy create action
541
+ │ │ └── welcome_controller.rb # Landing page
542
+ │ └── views/
543
+ │ ├── users/new.html.erb # Sign-up form (email + password)
544
+ │ └── welcome/index.html.erb # Welcome message with flash
545
+ ├── features/
546
+ │ ├── sign_up.feature # Cucumber scenario
547
+ │ ├── step_definitions/
548
+ │ │ └── sign_up_steps.rb # Uses computer_use()
549
+ │ └── support/
550
+ │ └── env.rb # Cucumber/Rails setup
551
+ └── Gemfile
552
+ ```
553
+
554
+ ### Running the Test App
555
+
556
+ ```bash
557
+ cd spec/test_app
558
+
559
+ # Install dependencies
560
+ bundle install
561
+
562
+ # Run Cucumber features
563
+ bundle exec cucumber
564
+
565
+ # Start Rails server (for manual testing)
566
+ bundle exec rails server
567
+
568
+ # Rails console
569
+ bundle exec rails console
570
+ ```
571
+
572
+ ### Test App Configuration
573
+
574
+ - **Ruby:** 3.4.5
575
+ - **Rails:** 7.1.2
576
+ - **Database:** SQLite3
577
+ - **Capybara Driver:** `selenium_chrome_headless`
578
+ - **Database Cleaner:** `:truncation` strategy (for JavaScript tests)
579
+
580
+ ---
581
+
582
+ ## Troubleshooting
583
+
584
+ ### Missing API Key
585
+
586
+ **Error:** `"GEMINI_API_KEY environment variable not set"`
587
+
588
+ **Solution:**
589
+ ```bash
590
+ export GEMINI_API_KEY='your_api_key_here'
591
+ ```
592
+
593
+ Or add to `.env` file if using `dotenv`:
594
+ ```
595
+ GEMINI_API_KEY=your_api_key_here
596
+ ```
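
A test suite can also check for the key up front, so a missing key fails before any browser starts. The sketch below is one way to do that; `require_env!` is a hypothetical helper, not part of the gem. The agent code in this package reads the key with `ENV.fetch("GEMINI_API_KEY")`, which raises `KeyError` when the variable is absent, so the sketch raises the same error class with a more descriptive message.

```ruby
# Returns the value of an environment variable, or raises if it is missing
# or blank. The `env` parameter defaults to ENV and exists so the helper can
# be exercised with a plain Hash.
def require_env!(name, env = ENV)
  value = env[name].to_s.strip
  raise KeyError, "#{name} environment variable not set" if value.empty?

  value
end

# Demonstration with a Hash; in features/support/env.rb the call would be
# require_env!("GEMINI_API_KEY").
puts require_env!("GEMINI_API_KEY", { "GEMINI_API_KEY" => "your_api_key_here" })
```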
597
+
598
+ ### Coordinate Misalignment (Clicks Wrong Location)
599
+
600
+ **Symptom:** Agent clicks in wrong places on the page
601
+
602
+ **Cause:** HiDPI/Retina display coordinate mismatch
603
+
604
+ **Solution:** TouringTest automatically handles this by extracting screenshot dimensions. If issues persist:
605
+ 1. Check `tmp/screenshots/` to see what the AI sees
606
+ 2. Verify Capybara driver supports screenshot capture
607
+ 3. Check console output for coordinate denormalization debug info
608
+
609
+ ### Max Steps Exceeded
610
+
611
+ **Error:** `"Agent exceeded maximum steps (15)"`
612
+
613
+ **Cause:** Task too complex, AI stuck in loop, or impossible task
614
+
615
+ **Solutions:**
616
+ 1. Break instruction into smaller steps
617
+ 2. Make instruction more specific
618
+ 3. Check screenshots to see where agent got stuck
619
+ 4. For advanced usage, increase `max_steps` when creating Agent
620
+
621
+ ### Screenshot Directory Permission Issues
622
+
623
+ **Error:** Can't write to `tmp/screenshots/`
624
+
625
+ **Solution:**
626
+ ```bash
627
+ mkdir -p tmp/screenshots
628
+ chmod 755 tmp/screenshots
629
+ ```
630
+
631
+ Or specify a different `root_path`:
632
+ ```ruby
633
+ computer_use(instruction, root_path: '/path/with/permissions')
634
+ ```
635
+
636
+ ### Agent Can't Find Elements
637
+
638
+ **Symptom:** "Warning: No element found at (x, y)"
639
+
640
+ **Possible causes:**
641
+ 1. Element not visible (hidden, off-screen)
642
+ 2. JavaScript not finished loading
643
+ 3. Element inside iframe (not currently supported)
644
+
645
+ **Solutions:**
646
+ - Add explicit wait steps: "wait for the page to load, then click submit"
647
+ - Ensure elements are visible: `page.execute_script("window.scrollTo(0, 0)")`
648
+ - Check screenshots to verify element visibility
649
+
650
+ ---
651
+
652
+ ## Limitations & Known Issues
653
+
654
+ ### Experimental API
655
+
656
+ TouringTest relies on Google's **preview API** (`gemini-2.5-computer-use-preview-10-2025`):
657
+ - May change without notice
658
+ - No SLA or production guarantees
659
+ - Rate limits apply
660
+
661
+ ### Step Limit Constraints
662
+
663
+ - Default 15 steps may be insufficient for complex workflows
664
+ - No dynamic adjustment based on task complexity
665
+ - Manual tuning required for edge cases
666
+
667
+ ### Performance Considerations
668
+
669
+ - Each step requires API call + screenshot capture (1-3 seconds)
670
+ - Long tests can be slow (at 1-3 seconds per step, 15 steps ≈ 15-45 seconds)
671
+ - Not suitable for load testing or CI pipelines with strict time limits
672
+
673
+ ### HiDPI/Retina Display Requirements
674
+
675
+ - Coordinate system assumes screenshot capture works correctly
676
+ - Issues may occur on exotic display configurations
677
+ - Tested primarily on macOS Retina displays
678
+
679
+ ### Iframes Not Supported
680
+
681
+ - Agent cannot interact with elements inside iframes
682
+ - Workaround: Use traditional Capybara `within_frame` blocks
683
+
684
+ ### No Multi-Tab/Window Support
685
+
686
+ - Agent operates on single Capybara session
687
+ - Cannot switch between tabs/windows automatically
688
+
689
+ ---
690
+
691
+ ## Roadmap / Future Plans
692
+
693
+ - [ ] Support for additional Gemini models
694
+ - [ ] Configurable step limits per instruction
695
+ - [ ] Iframe interaction support
696
+ - [ ] Multi-tab/window handling
697
+ - [ ] Performance optimizations (screenshot caching, parallel API calls)
698
+ - [ ] Alternative AI providers (OpenAI, Anthropic)
699
+ - [ ] Visual regression testing mode
700
+ - [ ] Accessibility testing integration
701
+ - [ ] Record/replay functionality
702
+
703
+ ---
704
+
705
+ ## Contributing
706
+
707
+ Bug reports and pull requests are welcome on GitHub at [https://github.com/stwerner92/touring_test](https://github.com/stwerner92/touring_test).
708
+
709
+ ### Development Setup
710
+
711
+ 1. Clone the repository
712
+ 2. Run `bin/setup` to install dependencies
713
+ 3. Run `rake spec` to run unit tests
714
+ 4. Run `cd spec/test_app && bundle exec cucumber` for integration tests
715
+
716
+ ### Pull Request Guidelines
717
+
718
+ - Add tests for new functionality
719
+ - Update README for user-facing changes
720
+ - Follow existing code style
721
+ - Keep commits focused and atomic
722
+
723
+ ---
724
+
725
+ ## License
726
+
727
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
728
+
729
+ ---
730
+
731
+ ## Credits & Acknowledgments
732
+
733
+ **Author:** Scott Werner (stwerner@vt.edu)
734
+
735
+ **Powered by:**
736
+ - [Google Gemini API](https://ai.google.dev/) - AI computer use capabilities
737
+ - [Cucumber](https://cucumber.io/) - BDD testing framework
738
+ - [Capybara](https://github.com/teamcapybara/capybara) - Browser automation
739
+
740
+ **Inspired by:** Anthropic's computer use demo and the vision of more maintainable, human-readable tests.
741
+
742
+ ---
743
+
744
+ ## Support
745
+
746
+ - **Documentation:** [CLAUDE.md](./CLAUDE.md) contains detailed architectural information
747
+ - **Issues:** [GitHub Issues](https://github.com/stwerner92/touring_test/issues)
748
+ - **Email:** stwerner@vt.edu
749
+
750
+ ---
751
+
752
+ Made with ❤️ for better testing experiences
data/Rakefile ADDED
@@ -0,0 +1,8 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ task default: :spec
@@ -0,0 +1,155 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "httparty"
4
+ require "base64"
5
+ require "json"
+ require "time" # Time#iso8601 (used in log_api_call) lives in the time stdlib before Ruby 3.4
6
+ require_relative "driver"
7
+
8
+ module TouringTest
9
+ class Agent
10
+ attr_reader :driver, :instruction
11
+
12
+ def initialize(driver, instruction, max_steps: 15)
13
+ @driver = driver
14
+ @instruction = instruction
15
+ @api_key = ENV.fetch("GEMINI_API_KEY")
16
+ @max_steps = max_steps
17
+ @conversation_history = []
18
+ @log_file = setup_log_file
19
+ end
20
+
21
+ def run
22
+ # Initial turn
23
+ puts "[User] #{instruction}"
24
+ screenshot_path, url = driver.capture_screenshot_and_url
25
+ user_turn = build_user_turn(instruction, screenshot_path, url)
26
+ @conversation_history << user_turn
27
+
28
+ step_count = 0
29
+ loop do
30
+ if step_count >= @max_steps
31
+ raise "Agent exceeded maximum steps (#{@max_steps}). Halting execution to prevent infinite loop."
32
+ end
33
+ step_count += 1
34
+ puts "[Debug] === API Request Loop Iteration #{step_count}/#{@max_steps} ===" if debug?
35
+
36
+ response = make_api_request(@conversation_history)
37
+ model_turn = response.dig("candidates", 0, "content")
38
+ @conversation_history << model_turn
39
+
40
+ parts = model_turn["parts"]
41
+ function_calls = parts.select { |part| part["functionCall"] }
42
+
43
+ puts "[Debug] Found #{function_calls.size} function calls in response" if debug?
44
+ break if function_calls.empty?
45
+
46
+ function_responses = []
47
+ function_calls.each_with_index do |part, idx|
48
+ function_call = part["functionCall"]
49
+ next if function_call.nil?
50
+
51
+ puts "[Debug] Processing function call #{idx + 1}/#{function_calls.size}: #{function_call['name']}" if debug?
52
+ driver.execute_action(function_call)
53
+ screenshot_path, url = driver.capture_screenshot_and_url
54
+ puts "[Debug] Screenshot saved to: #{screenshot_path}" if debug?
55
+
56
+ # Each function response includes its own screenshot
57
+ function_responses << {
58
+ "functionResponse" => {
59
+ "name" => function_call["name"],
60
+ "response" => { "url" => url }
61
+ }
62
+ }
63
+ function_responses << {
64
+ "inline_data" => {
65
+ "mime_type" => "image/png",
66
+ "data" => encode_image(screenshot_path)
67
+ }
68
+ }
69
+ end
70
+
71
+ user_turn = { "role" => "user", "parts" => function_responses }
72
+ @conversation_history << user_turn
73
+ end
74
+ end
75
+
76
+ private
77
+
78
+ def debug?
79
+ ENV['TOURING_TEST_DEBUG'] == 'true' || ENV['TOURING_TEST_DEBUG'] == '1'
80
+ end
81
+
82
+ def build_user_turn(text, screenshot_path, url, function_responses = [])
83
+ parts = function_responses
84
+ parts << { "text" => text } if text
85
+ parts << { "inline_data" => { "mime_type" => "image/png", "data" => encode_image(screenshot_path) } }
86
+ parts << { "text" => "URL: #{url}" }
87
+ { "role" => "user", "parts" => parts }
88
+ end
89
+
90
+ def make_api_request(contents)
91
+ model = "gemini-2.5-computer-use-preview-10-2025"
92
+ api_url = "https://generativelanguage.googleapis.com/v1beta/models/#{model}:generateContent?key=#{@api_key}"
93
+
94
+ body = {
95
+ "contents" => contents,
96
+ "tools" => [{
97
+ "computer_use" => {
98
+ "environment" => "ENVIRONMENT_BROWSER"
99
+ }
100
+ }]
101
+ }
102
+
103
+ response = HTTParty.post(
104
+ api_url,
105
+ headers: { "Content-Type" => "application/json" },
106
+ body: body.to_json
107
+ )
108
+
109
+ if response.success?
110
+ parsed_response = response.parsed_response
111
+
112
+ # Log full JSON to file
113
+ log_api_call(body, parsed_response)
114
+
115
+ # Print clean conversation to console
116
+ print_conversation_update(parsed_response)
117
+
118
+ parsed_response
119
+ else
120
+ error_message = "Gemini API error: #{response.code} - #{response.body}"
121
+ raise StandardError, error_message
122
+ end
123
+ end
124
+
125
+ def encode_image(path)
126
+ Base64.strict_encode64(File.binread(path))
127
+ end
128
+
129
+ def setup_log_file
130
+ log_dir = File.join(Dir.pwd, "tmp")
131
+ FileUtils.mkdir_p(log_dir)
132
+ File.join(log_dir, "gemini_api_log.jsonl")
133
+ end
134
+
135
+ def log_api_call(request, response)
136
+ File.open(@log_file, "a") do |f|
137
+ f.puts({
138
+ timestamp: Time.now.iso8601,
139
+ request: request,
140
+ response: response
141
+ }.to_json)
142
+ end
143
+ end
144
+
145
+ def print_conversation_update(response)
146
+ parts = response.dig("candidates", 0, "content", "parts") || []
147
+
148
+ parts.each do |part|
149
+ if part["text"]
150
+ puts "[Model] #{part['text']}"
151
+ end
152
+ end
153
+ end
154
+ end
155
+ end
@@ -0,0 +1,223 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "capybara"
4
+ require "fileutils"
5
+
6
+ module TouringTest
7
+ class Driver
8
+ attr_reader :session
9
+
10
+ def initialize(session, root_path:)
11
+ @session = session
12
+ @screenshot_dir = File.join(root_path, "tmp", "screenshots")
13
+ @screenshot_count = 0
14
+ @last_screenshot_path = nil
15
+ FileUtils.mkdir_p(@screenshot_dir)
16
+ FileUtils.rm_f(Dir.glob("#{@screenshot_dir}/*")) # Clear old screenshots
17
+ end
18
+
19
+ def capture_screenshot_and_url
20
+ [capture_screenshot, session.current_url]
21
+ end
22
+
23
+ def execute_action(function_call)
24
+ action_name = function_call["name"].to_sym
25
+ args = function_call["args"].transform_keys(&:to_sym)
26
+
27
+ puts "[Action] #{action_name}(#{args.map { |k, v| "#{k}: #{v.inspect}" }.join(', ')})"
28
+
29
+ if respond_to?(action_name, true)
30
+ send(action_name, **args)
31
+ else
32
+ raise "Unknown action: #{action_name}"
33
+ end
34
+ end
35
+
+     private
+
+     def debug?
+       ENV['TOURING_TEST_DEBUG'] == 'true' || ENV['TOURING_TEST_DEBUG'] == '1'
+     end
+
+     def find_element_at(x, y)
+       # Search in reverse to find the most specific (innermost) element
+       session.all('*').to_a.reverse.find do |el|
+         rect = session.evaluate_script("arguments[0].getBoundingClientRect()", el)
+         rect['left'] <= x && x <= rect['right'] && rect['top'] <= y && y <= rect['bottom']
+       end
+     end
+
+     def capture_screenshot
+       @screenshot_count += 1
+       path = File.join(@screenshot_dir, "step_#{@screenshot_count}.png")
+
+       begin
+         session.save_screenshot(path)
+         @last_screenshot_path = path
+
+         if debug?
+           # Verify the file was actually created
+           if File.exist?(path)
+             file_size = File.size(path)
+             puts "[Debug] Screenshot captured: #{path} (#{file_size} bytes, count: #{@screenshot_count})"
+           else
+             puts "[Debug] WARNING: File not created at #{path}"
+           end
+         end
+       rescue => e
+         puts "[Error] Failed to save screenshot: #{e.message}"
+         puts "[Error] #{e.backtrace.first(3).join("\n")}" if debug?
+         raise
+       end
+
+       path
+     end
+
+     def denormalize_coordinates(x, y)
+       # Use screenshot dimensions instead of window size to account for device pixel ratio
+       require 'open3'
+       if @last_screenshot_path && File.exist?(@last_screenshot_path)
+         stdout, _stderr, _status = Open3.capture3("file", @last_screenshot_path)
+         if stdout =~ /(\d+) x (\d+)/
+           width = $1.to_i
+           height = $2.to_i
+           return [(x / 1000.0 * width).round, (y / 1000.0 * height).round]
+         end
+       end
+
+       # Fallback to window size
+       width = session.current_window.size[0]
+       height = session.current_window.size[1]
+       [(x / 1000.0 * width).round, (y / 1000.0 * height).round]
+     end
+
+     # Supported UI Actions
+
+     def click_at(x:, y:)
+       x, y = denormalize_coordinates(x, y)
+       element = find_element_at(x, y)
+       if element
+         element.click
+       else
+         puts "[Warning] No element found at (#{x}, #{y}) to click."
+       end
+     end
+
+     def type_text_at(x:, y:, text:, press_enter: false, clear_before_typing: false)
+       x, y = denormalize_coordinates(x, y)
+       element = find_element_at(x, y)
+       if element
+         element.click # Focus the element first
+         if clear_before_typing
+           element.send_keys([:control, 'a'], :backspace)
+         end
+         element.send_keys(text)
+         element.send_keys(:enter) if press_enter
+       else
+         puts "[Warning] No element found at (#{x}, #{y}) to type in."
+       end
+     end
+
+     def hover_at(x:, y:)
+       x, y = denormalize_coordinates(x, y)
+       element = find_element_at(x, y)
+       if element
+         element.hover
+       else
+         puts "[Warning] No element found at (#{x}, #{y}) to hover over."
+       end
+     end
+
+     def scroll_document(direction:)
+       case direction.to_s.upcase
+       when "UP"
+         session.execute_script("window.scrollBy(0, -window.innerHeight)")
+       when "DOWN"
+         session.execute_script("window.scrollBy(0, window.innerHeight)")
+       when "LEFT"
+         session.execute_script("window.scrollBy(-window.innerWidth, 0)")
+       when "RIGHT"
+         session.execute_script("window.scrollBy(window.innerWidth, 0)")
+       else
+         raise "Unknown scroll direction: #{direction}"
+       end
+     end
+
+     def drag_and_drop(start_x:, start_y:, end_x:, end_y:)
+       start_x, start_y = denormalize_coordinates(start_x, start_y)
+       end_x, end_y = denormalize_coordinates(end_x, end_y)
+
+       element_to_drag = find_element_at(start_x, start_y)
+       target_element = find_element_at(end_x, end_y)
+
+       if element_to_drag
+         if target_element
+           element_to_drag.drag_to(target_element)
+         else
+           # Fallback if no specific target, drag by offset
+           session.driver.browser.action.drag_and_drop_by(element_to_drag.native, end_x - start_x, end_y - start_y).perform
+         end
+       else
+         puts "[Warning] No element found at (#{start_x}, #{start_y}) to drag."
+       end
+     end
+
+     def scroll_at(x:, y:, direction:)
+       x, y = denormalize_coordinates(x, y)
+       element = find_element_at(x, y)
+       if element
+         case direction.to_s.upcase
+         when "UP"
+           session.execute_script("arguments[0].scrollTop -= arguments[0].clientHeight", element)
+         when "DOWN"
+           session.execute_script("arguments[0].scrollTop += arguments[0].clientHeight", element)
+         when "LEFT"
+           session.execute_script("arguments[0].scrollLeft -= arguments[0].clientWidth", element)
+         when "RIGHT"
+           session.execute_script("arguments[0].scrollLeft += arguments[0].clientWidth", element)
+         else
+           raise "Unknown scroll direction: #{direction}"
+         end
+       else
+         puts "[Warning] No element found at (#{x}, #{y}) to scroll."
+       end
+     end
+
+     def navigate(url:)
+       session.visit(url)
+     end
+
+     def go_back
+       session.go_back
+     end
+
+     def go_forward
+       session.go_forward
+     end
+
+     def wait_5_seconds
+       sleep 5
+     end
+
+     def key_combination(keys:)
+       # keys can be a string like "enter" or "ctrl+a"
+       # Convert to Capybara format
+       key_array = keys.split('+').map do |k|
+         case k.strip.downcase
+         when 'ctrl', 'control' then :control
+         when 'shift' then :shift
+         when 'alt', 'option' then :alt
+         when 'cmd', 'command', 'meta' then :command
+         when 'enter', 'return' then :enter
+         when 'tab' then :tab
+         when 'escape', 'esc' then :escape
+         when 'backspace' then :backspace
+         when 'delete' then :delete
+         else k.strip # Pass literal keys through without surrounding whitespace
+         end
+       end
+
+       session.send_keys(key_array)
+     end
+   end
+ end
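The driver above maps model coordinates from a 0..1000 grid onto the last screenshot's pixel size, and it obtains that size by shelling out to the `file` command. A minimal sketch of the same arithmetic without the external command follows, assuming the screenshot is a standard PNG whose first chunk is IHDR. The helper names `png_dimensions` and `denormalize` are hypothetical and are not part of the gem.

```ruby
# Sketch only: `png_dimensions` and `denormalize` are hypothetical helper
# names used for illustration, not part of the touring_test gem.

PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b

# Read width and height from a PNG's IHDR chunk. The 8-byte signature is
# followed by a 4-byte chunk length and the 4-byte type "IHDR", so the two
# big-endian 32-bit dimensions sit at byte offsets 16..23.
def png_dimensions(path)
  header = File.binread(path, 24)
  return nil unless header && header.bytesize == 24
  return nil unless header.start_with?(PNG_SIGNATURE)
  return nil unless header.byteslice(12, 4) == "IHDR"

  header.byteslice(16, 8).unpack("N2")
end

# Map a point on the 0..1000 grid onto a width x height pixel area,
# using the same rounding as the driver.
def denormalize(x, y, width, height)
  [(x / 1000.0 * width).round, (y / 1000.0 * height).round]
end
```

For a 1600 x 1200 screenshot, `denormalize(500, 250, 1600, 1200)` gives `[800, 300]`. One caveat to verify in a real browser: `getBoundingClientRect` reports CSS pixels, so if the screenshot is captured at a device pixel ratio above 1, coordinates derived from image pixels may need dividing by that ratio before the hit test in `find_element_at`.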
data/lib/touring_test/railtie.rb ADDED
@@ -0,0 +1,26 @@
+ # frozen_string_literal: true
+
+ require "rails"
+
+ module TouringTest
+   class Railtie < Rails::Railtie
+     generators do |app|
+       Rails::Generators.invoke "cucumber:install"
+     end
+
+     initializer "cucumber_gemini_computer_use.load" do
+       ActiveSupport.on_load(:after_initialize) do
+         support_file = Rails.root.join("features/support/touring_test.rb")
+         unless File.exist?(support_file)
+           File.open(support_file, "w") do |f|
+             f.puts "# frozen_string_literal: true"
+             f.puts ""
+             f.puts "require 'touring_test'"
+             f.puts ""
+             f.puts "World(TouringTest::WorldExtension)"
+           end
+         end
+       end
+     end
+   end
+ end
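The initializer above writes `features/support/touring_test.rb` when it is missing, but `File.open(path, "w")` raises `Errno::ENOENT` if `features/support` does not exist yet. The sketch below shows one way to make that write safe on a fresh application, assuming it is acceptable to create the directory first. `ensure_support_file` and `SUPPORT_FILE_BODY` are hypothetical names, not part of the gem.

```ruby
require "fileutils"

# Hypothetical helper, not part of the gem: write the Cucumber support file
# only when it is missing, creating features/support first.
SUPPORT_FILE_BODY = <<~RUBY
  # frozen_string_literal: true

  require 'touring_test'

  World(TouringTest::WorldExtension)
RUBY

# Returns true when the file was written and false when it already existed.
def ensure_support_file(root)
  path = File.join(root, "features", "support", "touring_test.rb")
  return false if File.exist?(path)

  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, SUPPORT_FILE_BODY)
  true
end
```

Calling it twice with the same root writes the file once and leaves an existing file untouched, which matches the `unless File.exist?` guard in the Railtie.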
data/lib/touring_test/version.rb ADDED
@@ -0,0 +1,5 @@
+ # frozen_string_literal: true
+
+ module TouringTest
+   VERSION = "0.0.1"
+ end
data/lib/touring_test/world_extension.rb ADDED
@@ -0,0 +1,14 @@
+ # frozen_string_literal: true
+
+ require_relative "agent"
+ require_relative "driver"
+
+ module TouringTest
+   module WorldExtension
+     def computer_use(instruction, root_path: Dir.pwd)
+       driver = Driver.new(page, root_path: root_path)
+       agent = Agent.new(driver, instruction)
+       agent.run
+     end
+   end
+ end
data/lib/touring_test.rb ADDED
@@ -0,0 +1,11 @@
+ # frozen_string_literal: true
+
+ require_relative "touring_test/version"
+ require_relative "touring_test/driver"
+ require_relative "touring_test/agent"
+ require_relative "touring_test/world_extension"
+
+ module TouringTest
+   class Error < StandardError; end
+   # Your code goes here...
+ end
data/sig/cucumber/gemini/computer/use.rbs ADDED
@@ -0,0 +1,10 @@
+ module Cucumber
+   module Gemini
+     module Computer
+       module Use
+         VERSION: String
+         # See the writing guide of rbs: https://github.com/ruby/rbs#guides
+       end
+     end
+   end
+ end
Binary file
metadata ADDED
@@ -0,0 +1,110 @@
+ --- !ruby/object:Gem::Specification
+ name: touring_test
+ version: !ruby/object:Gem::Version
+   version: 0.0.1
+ platform: ruby
+ authors:
+ - Scott Werner
+ bindir: exe
+ cert_chain: []
+ date: 1980-01-02 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: cucumber
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: capybara
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: httparty
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: playwright-ruby-client
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ description: This gem provides a simple way to use Google's 'computer use' Gemini
+   model within your Cucumber step definitions, allowing you to write high-level instructions
+   for an AI agent to execute.
+ email:
+ - scott@sublayer.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - ".claude/settings.local.json"
+ - README.md
+ - Rakefile
+ - lib/touring_test.rb
+ - lib/touring_test/agent.rb
+ - lib/touring_test/driver.rb
+ - lib/touring_test/railtie.rb
+ - lib/touring_test/version.rb
+ - lib/touring_test/world_extension.rb
+ - sig/cucumber/gemini/computer/use.rbs
+ - touring_test_example.png
+ homepage: https://github.com/stwerner92/touring_test
+ licenses: []
+ metadata:
+   homepage_uri: https://github.com/stwerner92/touring_test
+   source_code_uri: https://github.com/works-on-your-machine/touring_test
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: 3.2.0
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubygems_version: 3.7.2
+ specification_version: 4
+ summary: A Cucumber support gem for using Google's 'computer use' Gemini model.
+ test_files: []
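The metadata above declares all four runtime dependencies with the constraint `>= 0` and sets `required_ruby_version` to `>= 3.2.0`. The short sketch below shows how RubyGems evaluates such constraints, using the standard `Gem::Requirement` and `Gem::Version` classes. The `satisfies?` helper is a hypothetical name introduced only for illustration.

```ruby
require "rubygems"

# Hypothetical helper for illustration: does `version` satisfy `constraint`?
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

satisfies?(">= 0", "0.0.1")      # true: the open-ended constraint accepts every version number
satisfies?(">= 3.2.0", "3.1.4")  # false: below the required_ruby_version floor
satisfies?(">= 3.2.0", "3.3.0")  # true
```

Because `>= 0` places no upper or lower bound on cucumber, capybara, httparty, or playwright-ruby-client, which versions are installed is decided entirely by the consuming application's own Gemfile and lockfile.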