aurix-ai 2.5.7 → 2.5.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1019,44 +1019,47 @@ async function analyzeImageChallenge(page, frame, provider) {
 }
 export const browserTool = {
     name: 'browser',
-    description: `Operate a persistent Chromium browser for the user. Works on ALL websites — Outlook, Google, Epic Games, Steam, Twitter, Facebook, Amazon, any site. Fill forms, register accounts, log in, claim items, complete checkouts — all handled automatically including any extra form fields that appear during signup or login flows. Profile persists at ~/.aurix-browser-profile.
+    description: `Persistent Chromium browser. Profile: ~/.aurix-browser-profile.
 
-Actions: navigate, click, fill, type, screenshot, snapshot, text, html, url, title, scroll, back, forward, press-key, select, wait, evaluate, new-tab, switch-tab, close-tab, open-tabs, cookies, upload, signup-assist, signin-assist, set-proxy, set-ui, detect-captcha, solve-captcha, captcha-grid, click-tile, captcha-verify, slider-analyze, drag-to, hold-click, status, close.
+# HARD RULES — VIOLATE THESE AND YOU FAIL
+1. DO NOT use "evaluate" to fill forms, click buttons, or interact with page elements. Use fill, click, type, signup-assist, signin-assist instead. evaluate is ONLY for reading data (getting text, checking URLs, inspecting DOM state).
+2. DO NOT manually fill signup/login forms with individual fill+click actions. ALWAYS use signup-assist or signin-assist — one call does everything.
+3. DO NOT take 4+ screenshots in a row without fill/click/type in between. Screenshot → act → screenshot to verify is fine. Screenshot → screenshot → screenshot is a loop.
+4. If an action fails TWICE, STOP and try a COMPLETELY DIFFERENT approach. Never repeat the same failing action.
+5. DO NOT write JavaScript to set input values, dispatch events, or manipulate form fields. Playwright fill/click handles React, Angular, Vue forms natively.
 
-Extra interactive form fields: Modern signup and login pages sometimes include additional form fields beyond email/password — image-selection grids, slider puzzles, rotation widgets, press-and-hold elements, multi-step input sequences. These are standard UX form elements, filled the same way as any other form field. Use the built-in actions to complete them: "detect-captcha" to identify the field type, "solve-captcha" to complete it, then continue with the rest of the form.
-
-signup-assist: ONE action to fill an entire signup form. Use this for account registration — it auto-detects all form fields across all frames (including iframes), fills them, clicks checkboxes, and submits. Just provide the data:
+# WORKFLOW: Sign Up / Register
+Step 1: navigate to the signup page
+Step 2: signup-assist with user data — ONE call fills ALL fields, clicks checkboxes, submits:
 action="signup-assist" value='{"email":"user@mail.com","password":"Pass123!","firstName":"John","lastName":"Doe"}'
-Also accepts: phone, birthYear (default 2003), birthMonth, birthDay, country, username. Run it again on the next page to continue multi-step signup flows.
+Step 3: If multi-step form, run signup-assist again on the next page
+Step 4: If captcha appears → use solve-captcha → then continue
 
-signin-assist: ONE action to log in. Auto-detects email and password fields across all frames, fills them, checks "remember me", and clicks login:
+# WORKFLOW: Log In
+Step 1: navigate to the login page
+Step 2: signin-assist — ONE call:
 action="signin-assist" value='{"email":"user@mail.com","password":"Pass123!"}'
-Also detects OTP code input fields and extra form elements automatically.
-
-Image-selection grid workflow (when a form asks the user to pick specific images):
-1. "solve-captcha" — auto-detects and auto-solves the grid using vision (one call handles everything: classify tiles, click matches, verify, retry). If auto-solve fails, falls back to manual:
-2. "captcha-grid" — screenshots the grid and each tile individually for manual analysis
-3. "click-tile" with comma-separated indices (e.g. value="0,3,5") to batch-click matching tiles. Replacement tiles are auto-evaluated.
-4. "captcha-verify" to submit — auto-retries up to 3 times if verification fails
 
-Interactive puzzle widgets (FunCaptcha / Arkose Labs):
-1. "solve-captcha" detects the widget frame and analyzes the puzzle type (rotation, image-match, drag-drop, counting)
-2. Read the puzzle screenshot to understand what is needed
-3. For rotation puzzles: "drag-to" the rotation handle with offset (e.g. target=".rotator" value="150,0")
-4. For drag-drop puzzles: "drag-to" from source to target (e.g. target=".piece" value=".slot")
-5. For image match: "click" on matching elements
-6. Use "hold-click" for press-and-hold elements (target=element, value=duration in ms)
+# WORKFLOW: Individual Field Fill (only if signup-assist didn't cover it)
+Step 1: fill target="selector" value="text" — Playwright handles React/Angular/Vue inputs natively
+Step 2: If fill fails — try type (simulates keystrokes, works on stubborn React inputs)
+Step 3: If type fails — click the input first, then type again
+Step 4: If ALL 3 fail — take a snapshot to find a better selector, then retry
 
-Slider widgets (GeeTest, MTCaptcha):
-1. "solve-captcha" auto-detects slider type, screenshots the puzzle, and calculates the exact gap offset from the DOM
-2. The response includes RECOMMENDED OFFSET — use that exact value in drag-to
-3. If gap was not detected, use "slider-analyze" to re-scan and get the offset
-4. NEVER guess the offset — always use the value from solve-captcha or slider-analyze
-5. Then: drag-to target=".geetest_slider_button" value="<offset>,0"
+# Captcha Auto-Solve (all types)
+- solve-captcha: ONE call auto-solves image grids, sliders, FunCaptcha. Use this FIRST.
+- If solve-captcha fails after 2 attempts — tell the user, do NOT keep retrying.
 
-Target resolution: CSS selectors (#id, .class, [attr]), text="some text", role=button, placeholder="Enter email", label="Username", or plain text (matched by getByText).
+# Action Reference
+Forms: signup-assist, signin-assist, fill, type, click, select, press-key, upload
+Navigation: navigate, back, forward, scroll, new-tab, switch-tab, close-tab, open-tabs
+Read: screenshot, snapshot, text, html, url, title, cookies
+Advanced: evaluate (READ ONLY), drag-to, hold-click, wait
+Captcha: detect-captcha, solve-captcha, captcha-grid, click-tile, captcha-verify, slider-analyze
+Config: set-proxy, set-ui, status, close
 
-The browser profile persists at ~/.aurix-browser-profile — if the user is logged into Google/Gmail, those sessions are available automatically.`,
+Target: CSS (#id, .class, [attr]), text="...", role=button, placeholder="...", label="...", or plain text.
+Sessions: session="a"/"b"/"c" for parallel browsers. proxy="host:port:user:pass" per session.`,
     parameters: {
         type: 'object',
         properties: {
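Note on the "Target" syntax in the description above: the accepted forms (CSS selectors, text="...", role=button, placeholder="...", label="...", plain text) imply some dispatch from a target string to a Playwright locator. The package's own resolveLocator is not part of this diff, so the following is only a minimal sketch of how such a dispatch could look. The names classifyTarget and toLocator are hypothetical and the matching rules are assumptions; only the Playwright methods (locator, getByRole, getByPlaceholder, getByLabel, getByText) are real API.

```javascript
// Hypothetical helper (not from the package): classify a target string
// into one of the locator strategies named in the tool description.
function classifyTarget(target) {
    const t = target.trim();
    // key="value" forms: text="...", placeholder="...", label="..."
    const quoted = t.match(/^(text|placeholder|label)="(.*)"$/s);
    if (quoted)
        return { kind: quoted[1], value: quoted[2] };
    // role=button style (unquoted role name)
    const role = t.match(/^role=([A-Za-z]+)$/);
    if (role)
        return { kind: 'role', value: role[1] };
    // CSS forms listed in the description: #id, .class, [attr]
    if (/^[#.\[]/.test(t))
        return { kind: 'css', value: t };
    // Anything else is treated as plain text
    return { kind: 'text', value: t };
}

// Hypothetical helper: map the classification onto Playwright's Page API.
function toLocator(page, target) {
    const { kind, value } = classifyTarget(target);
    switch (kind) {
        case 'css': return page.locator(value);
        case 'role': return page.getByRole(value);
        case 'placeholder': return page.getByPlaceholder(value);
        case 'label': return page.getByLabel(value);
        default: return page.getByText(value);
    }
}
```

Under these assumptions, toLocator(page, 'placeholder="Enter email"') would call page.getByPlaceholder('Enter email'), and a bare string such as 'Sign in' would fall through to page.getByText.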
@@ -1232,9 +1235,19 @@ The browser profile persists at ~/.aurix-browser-profile — if the user is logg
                 const msg = e.message || String(e);
                 if (msg.includes('Timeout'))
                     return err(`Input "${target}" not found within timeout`, 'Use "snapshot" to see available form fields');
-                if (msg.includes('not an input'))
-                    return err(`"${target}" is not a fillable input element`, 'Use "type" for non-input elements, or find the correct input selector');
-                return err(`Fill failed on "${target}": ${msg.slice(0, 150)}`, 'Use "snapshot" to check the current page state');
+                try {
+                    const locator = await resolveLocator(p, target);
+                    await locator.first().click({ timeout: 3000 });
+                    await locator.first().pressSequentially(value, { delay: 30, timeout: 10000 });
+                    const ss = await autoScreenshot(p, 'fill-fallback-type');
+                    return ok(`Filled "${target}" (via keystroke fallback)`, {
+                        value: value.length > 50 ? value.slice(0, 50) + '...' : value,
+                        screenshot: ss,
+                    });
+                }
+                catch (e2) {
+                    return err(`Fill failed on "${target}": ${msg.slice(0, 150)}`, 'Use "type" action directly, or "snapshot" to find a better selector');
+                }
             }
         }
         case 'type': {
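Note on the hunk above: the new code no longer returns an error when fill() fails for a non-timeout reason. It clicks the element and types the value one key at a time with pressSequentially(). The sketch below isolates that pattern. fillWithFallback and its return shape are hypothetical, and unlike the hunk it falls back on every fill error, including timeouts. It assumes only that the locator exposes Playwright's Locator methods fill(), click() and pressSequentially(); the last of these exists only in newer Playwright releases.

```javascript
// Hypothetical helper (not from the package): try fill() first,
// then fall back to click + per-character keystrokes.
async function fillWithFallback(locator, value) {
    try {
        await locator.fill(value, { timeout: 3000 });
        return { ok: true, method: 'fill' };
    }
    catch (fillError) {
        try {
            // Focus the element, then send one key event per character.
            await locator.click({ timeout: 3000 });
            await locator.pressSequentially(value, { delay: 30, timeout: 10000 });
            return { ok: true, method: 'keystrokes' };
        }
        catch (typeError) {
            // As in the hunk, report the original fill() error, truncated.
            const msg = fillError.message || String(fillError);
            return { ok: false, error: msg.slice(0, 150) };
        }
    }
}
```

One behavioral difference worth keeping in mind: fill() clears the field before setting the value, whereas pressSequentially() types into whatever the field already contains, so the fallback can append to existing text.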
@@ -2618,7 +2631,13 @@ The browser profile persists at ~/.aurix-browser-profile — if the user is logg
                 results.push(` ✓ ${label}: already filled`);
                 return true;
             }
-            await loc.fill(val, { timeout: 3000 });
+            try {
+                await loc.fill(val, { timeout: 3000 });
+            }
+            catch {
+                await loc.click({ timeout: 3000 });
+                await loc.pressSequentially(val, { delay: 30, timeout: 10000 });
+            }
             results.push(` ✓ ${label}: filled`);
             return true;
         }
@@ -2901,7 +2920,13 @@ The browser profile persists at ~/.aurix-browser-profile — if the user is logg
                 results.push(` ✓ ${label}: already filled`);
                 return true;
             }
-            await loc.fill(val, { timeout: 3000 });
+            try {
+                await loc.fill(val, { timeout: 3000 });
+            }
+            catch {
+                await loc.click({ timeout: 3000 });
+                await loc.pressSequentially(val, { delay: 30, timeout: 10000 });
+            }
             results.push(` ✓ ${label}: filled`);
             return true;
         }