rubycrawl 0.1.2 → 0.1.4

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
Files changed (577)
  1. checksums.yaml +4 -4
  2. data/README.md +427 -210
  3. data/lib/rubycrawl/helpers.rb +15 -11
  4. data/lib/rubycrawl/markdown_converter.rb +3 -3
  5. data/lib/rubycrawl/result.rb +10 -11
  6. data/lib/rubycrawl/service_client.rb +25 -3
  7. data/lib/rubycrawl/site_crawler.rb +14 -6
  8. data/lib/rubycrawl/version.rb +1 -1
  9. data/lib/rubycrawl.rb +33 -7
  10. data/node/src/index.js +193 -19
  11. data/rubycrawl.gemspec +3 -2
  12. metadata +2 -567
  13. data/Gemfile +0 -11
  14. data/bin/console +0 -9
  15. data/bin/setup +0 -4
  16. data/node/node_modules/.bin/playwright +0 -1
  17. data/node/node_modules/.bin/playwright-core +0 -1
  18. data/node/node_modules/.package-lock.json +0 -65
  19. data/node/node_modules/dotenv/CHANGELOG.md +0 -520
  20. data/node/node_modules/dotenv/LICENSE +0 -23
  21. data/node/node_modules/dotenv/README-es.md +0 -411
  22. data/node/node_modules/dotenv/README.md +0 -645
  23. data/node/node_modules/dotenv/SECURITY.md +0 -1
  24. data/node/node_modules/dotenv/config.d.ts +0 -1
  25. data/node/node_modules/dotenv/config.js +0 -9
  26. data/node/node_modules/dotenv/lib/cli-options.js +0 -17
  27. data/node/node_modules/dotenv/lib/env-options.js +0 -28
  28. data/node/node_modules/dotenv/lib/main.d.ts +0 -162
  29. data/node/node_modules/dotenv/lib/main.js +0 -386
  30. data/node/node_modules/dotenv/package.json +0 -62
  31. data/node/node_modules/playwright/LICENSE +0 -202
  32. data/node/node_modules/playwright/NOTICE +0 -5
  33. data/node/node_modules/playwright/README.md +0 -168
  34. data/node/node_modules/playwright/ThirdPartyNotices.txt +0 -5042
  35. data/node/node_modules/playwright/cli.js +0 -19
  36. data/node/node_modules/playwright/index.d.ts +0 -17
  37. data/node/node_modules/playwright/index.js +0 -17
  38. data/node/node_modules/playwright/index.mjs +0 -18
  39. data/node/node_modules/playwright/jsx-runtime.js +0 -42
  40. data/node/node_modules/playwright/jsx-runtime.mjs +0 -21
  41. data/node/node_modules/playwright/lib/agents/agentParser.js +0 -89
  42. data/node/node_modules/playwright/lib/agents/copilot-setup-steps.yml +0 -34
  43. data/node/node_modules/playwright/lib/agents/generateAgents.js +0 -348
  44. data/node/node_modules/playwright/lib/agents/playwright-test-coverage.prompt.md +0 -31
  45. data/node/node_modules/playwright/lib/agents/playwright-test-generate.prompt.md +0 -8
  46. data/node/node_modules/playwright/lib/agents/playwright-test-generator.agent.md +0 -88
  47. data/node/node_modules/playwright/lib/agents/playwright-test-heal.prompt.md +0 -6
  48. data/node/node_modules/playwright/lib/agents/playwright-test-healer.agent.md +0 -55
  49. data/node/node_modules/playwright/lib/agents/playwright-test-plan.prompt.md +0 -9
  50. data/node/node_modules/playwright/lib/agents/playwright-test-planner.agent.md +0 -73
  51. data/node/node_modules/playwright/lib/common/config.js +0 -282
  52. data/node/node_modules/playwright/lib/common/configLoader.js +0 -344
  53. data/node/node_modules/playwright/lib/common/esmLoaderHost.js +0 -104
  54. data/node/node_modules/playwright/lib/common/expectBundle.js +0 -28
  55. data/node/node_modules/playwright/lib/common/expectBundleImpl.js +0 -407
  56. data/node/node_modules/playwright/lib/common/fixtures.js +0 -302
  57. data/node/node_modules/playwright/lib/common/globals.js +0 -58
  58. data/node/node_modules/playwright/lib/common/ipc.js +0 -60
  59. data/node/node_modules/playwright/lib/common/poolBuilder.js +0 -85
  60. data/node/node_modules/playwright/lib/common/process.js +0 -132
  61. data/node/node_modules/playwright/lib/common/suiteUtils.js +0 -140
  62. data/node/node_modules/playwright/lib/common/test.js +0 -321
  63. data/node/node_modules/playwright/lib/common/testLoader.js +0 -101
  64. data/node/node_modules/playwright/lib/common/testType.js +0 -298
  65. data/node/node_modules/playwright/lib/common/validators.js +0 -68
  66. data/node/node_modules/playwright/lib/fsWatcher.js +0 -67
  67. data/node/node_modules/playwright/lib/index.js +0 -726
  68. data/node/node_modules/playwright/lib/internalsForTest.js +0 -42
  69. data/node/node_modules/playwright/lib/isomorphic/events.js +0 -77
  70. data/node/node_modules/playwright/lib/isomorphic/folders.js +0 -30
  71. data/node/node_modules/playwright/lib/isomorphic/stringInternPool.js +0 -69
  72. data/node/node_modules/playwright/lib/isomorphic/teleReceiver.js +0 -521
  73. data/node/node_modules/playwright/lib/isomorphic/teleSuiteUpdater.js +0 -157
  74. data/node/node_modules/playwright/lib/isomorphic/testServerConnection.js +0 -225
  75. data/node/node_modules/playwright/lib/isomorphic/testServerInterface.js +0 -16
  76. data/node/node_modules/playwright/lib/isomorphic/testTree.js +0 -329
  77. data/node/node_modules/playwright/lib/isomorphic/types.d.js +0 -16
  78. data/node/node_modules/playwright/lib/loader/loaderMain.js +0 -59
  79. data/node/node_modules/playwright/lib/matchers/expect.js +0 -311
  80. data/node/node_modules/playwright/lib/matchers/matcherHint.js +0 -44
  81. data/node/node_modules/playwright/lib/matchers/matchers.js +0 -383
  82. data/node/node_modules/playwright/lib/matchers/toBeTruthy.js +0 -75
  83. data/node/node_modules/playwright/lib/matchers/toEqual.js +0 -100
  84. data/node/node_modules/playwright/lib/matchers/toHaveURL.js +0 -101
  85. data/node/node_modules/playwright/lib/matchers/toMatchAriaSnapshot.js +0 -159
  86. data/node/node_modules/playwright/lib/matchers/toMatchSnapshot.js +0 -342
  87. data/node/node_modules/playwright/lib/matchers/toMatchText.js +0 -99
  88. data/node/node_modules/playwright/lib/mcp/browser/browserContextFactory.js +0 -329
  89. data/node/node_modules/playwright/lib/mcp/browser/browserServerBackend.js +0 -84
  90. data/node/node_modules/playwright/lib/mcp/browser/config.js +0 -421
  91. data/node/node_modules/playwright/lib/mcp/browser/context.js +0 -244
  92. data/node/node_modules/playwright/lib/mcp/browser/response.js +0 -278
  93. data/node/node_modules/playwright/lib/mcp/browser/sessionLog.js +0 -75
  94. data/node/node_modules/playwright/lib/mcp/browser/tab.js +0 -343
  95. data/node/node_modules/playwright/lib/mcp/browser/tools/common.js +0 -65
  96. data/node/node_modules/playwright/lib/mcp/browser/tools/console.js +0 -46
  97. data/node/node_modules/playwright/lib/mcp/browser/tools/dialogs.js +0 -60
  98. data/node/node_modules/playwright/lib/mcp/browser/tools/evaluate.js +0 -61
  99. data/node/node_modules/playwright/lib/mcp/browser/tools/files.js +0 -58
  100. data/node/node_modules/playwright/lib/mcp/browser/tools/form.js +0 -63
  101. data/node/node_modules/playwright/lib/mcp/browser/tools/install.js +0 -72
  102. data/node/node_modules/playwright/lib/mcp/browser/tools/keyboard.js +0 -107
  103. data/node/node_modules/playwright/lib/mcp/browser/tools/mouse.js +0 -107
  104. data/node/node_modules/playwright/lib/mcp/browser/tools/navigate.js +0 -71
  105. data/node/node_modules/playwright/lib/mcp/browser/tools/network.js +0 -63
  106. data/node/node_modules/playwright/lib/mcp/browser/tools/open.js +0 -57
  107. data/node/node_modules/playwright/lib/mcp/browser/tools/pdf.js +0 -49
  108. data/node/node_modules/playwright/lib/mcp/browser/tools/runCode.js +0 -78
  109. data/node/node_modules/playwright/lib/mcp/browser/tools/screenshot.js +0 -93
  110. data/node/node_modules/playwright/lib/mcp/browser/tools/snapshot.js +0 -173
  111. data/node/node_modules/playwright/lib/mcp/browser/tools/tabs.js +0 -67
  112. data/node/node_modules/playwright/lib/mcp/browser/tools/tool.js +0 -47
  113. data/node/node_modules/playwright/lib/mcp/browser/tools/tracing.js +0 -74
  114. data/node/node_modules/playwright/lib/mcp/browser/tools/utils.js +0 -94
  115. data/node/node_modules/playwright/lib/mcp/browser/tools/verify.js +0 -143
  116. data/node/node_modules/playwright/lib/mcp/browser/tools/wait.js +0 -63
  117. data/node/node_modules/playwright/lib/mcp/browser/tools.js +0 -84
  118. data/node/node_modules/playwright/lib/mcp/browser/watchdog.js +0 -44
  119. data/node/node_modules/playwright/lib/mcp/config.d.js +0 -16
  120. data/node/node_modules/playwright/lib/mcp/extension/cdpRelay.js +0 -351
  121. data/node/node_modules/playwright/lib/mcp/extension/extensionContextFactory.js +0 -76
  122. data/node/node_modules/playwright/lib/mcp/extension/protocol.js +0 -28
  123. data/node/node_modules/playwright/lib/mcp/index.js +0 -61
  124. data/node/node_modules/playwright/lib/mcp/log.js +0 -35
  125. data/node/node_modules/playwright/lib/mcp/program.js +0 -111
  126. data/node/node_modules/playwright/lib/mcp/sdk/exports.js +0 -28
  127. data/node/node_modules/playwright/lib/mcp/sdk/http.js +0 -152
  128. data/node/node_modules/playwright/lib/mcp/sdk/inProcessTransport.js +0 -71
  129. data/node/node_modules/playwright/lib/mcp/sdk/server.js +0 -223
  130. data/node/node_modules/playwright/lib/mcp/sdk/tool.js +0 -47
  131. data/node/node_modules/playwright/lib/mcp/terminal/cli.js +0 -296
  132. data/node/node_modules/playwright/lib/mcp/terminal/command.js +0 -56
  133. data/node/node_modules/playwright/lib/mcp/terminal/commands.js +0 -333
  134. data/node/node_modules/playwright/lib/mcp/terminal/daemon.js +0 -129
  135. data/node/node_modules/playwright/lib/mcp/terminal/help.json +0 -32
  136. data/node/node_modules/playwright/lib/mcp/terminal/helpGenerator.js +0 -88
  137. data/node/node_modules/playwright/lib/mcp/terminal/socketConnection.js +0 -80
  138. data/node/node_modules/playwright/lib/mcp/test/browserBackend.js +0 -98
  139. data/node/node_modules/playwright/lib/mcp/test/generatorTools.js +0 -122
  140. data/node/node_modules/playwright/lib/mcp/test/plannerTools.js +0 -145
  141. data/node/node_modules/playwright/lib/mcp/test/seed.js +0 -82
  142. data/node/node_modules/playwright/lib/mcp/test/streams.js +0 -44
  143. data/node/node_modules/playwright/lib/mcp/test/testBackend.js +0 -99
  144. data/node/node_modules/playwright/lib/mcp/test/testContext.js +0 -285
  145. data/node/node_modules/playwright/lib/mcp/test/testTool.js +0 -30
  146. data/node/node_modules/playwright/lib/mcp/test/testTools.js +0 -108
  147. data/node/node_modules/playwright/lib/plugins/gitCommitInfoPlugin.js +0 -198
  148. data/node/node_modules/playwright/lib/plugins/index.js +0 -28
  149. data/node/node_modules/playwright/lib/plugins/webServerPlugin.js +0 -237
  150. data/node/node_modules/playwright/lib/program.js +0 -417
  151. data/node/node_modules/playwright/lib/reporters/base.js +0 -634
  152. data/node/node_modules/playwright/lib/reporters/blob.js +0 -138
  153. data/node/node_modules/playwright/lib/reporters/dot.js +0 -99
  154. data/node/node_modules/playwright/lib/reporters/empty.js +0 -32
  155. data/node/node_modules/playwright/lib/reporters/github.js +0 -128
  156. data/node/node_modules/playwright/lib/reporters/html.js +0 -633
  157. data/node/node_modules/playwright/lib/reporters/internalReporter.js +0 -138
  158. data/node/node_modules/playwright/lib/reporters/json.js +0 -254
  159. data/node/node_modules/playwright/lib/reporters/junit.js +0 -232
  160. data/node/node_modules/playwright/lib/reporters/line.js +0 -131
  161. data/node/node_modules/playwright/lib/reporters/list.js +0 -253
  162. data/node/node_modules/playwright/lib/reporters/listModeReporter.js +0 -69
  163. data/node/node_modules/playwright/lib/reporters/markdown.js +0 -144
  164. data/node/node_modules/playwright/lib/reporters/merge.js +0 -558
  165. data/node/node_modules/playwright/lib/reporters/multiplexer.js +0 -112
  166. data/node/node_modules/playwright/lib/reporters/reporterV2.js +0 -102
  167. data/node/node_modules/playwright/lib/reporters/teleEmitter.js +0 -317
  168. data/node/node_modules/playwright/lib/reporters/versions/blobV1.js +0 -16
  169. data/node/node_modules/playwright/lib/runner/dispatcher.js +0 -530
  170. data/node/node_modules/playwright/lib/runner/failureTracker.js +0 -72
  171. data/node/node_modules/playwright/lib/runner/lastRun.js +0 -77
  172. data/node/node_modules/playwright/lib/runner/loadUtils.js +0 -334
  173. data/node/node_modules/playwright/lib/runner/loaderHost.js +0 -89
  174. data/node/node_modules/playwright/lib/runner/processHost.js +0 -180
  175. data/node/node_modules/playwright/lib/runner/projectUtils.js +0 -241
  176. data/node/node_modules/playwright/lib/runner/rebase.js +0 -189
  177. data/node/node_modules/playwright/lib/runner/reporters.js +0 -138
  178. data/node/node_modules/playwright/lib/runner/sigIntWatcher.js +0 -96
  179. data/node/node_modules/playwright/lib/runner/storage.js +0 -91
  180. data/node/node_modules/playwright/lib/runner/taskRunner.js +0 -127
  181. data/node/node_modules/playwright/lib/runner/tasks.js +0 -410
  182. data/node/node_modules/playwright/lib/runner/testGroups.js +0 -125
  183. data/node/node_modules/playwright/lib/runner/testRunner.js +0 -398
  184. data/node/node_modules/playwright/lib/runner/testServer.js +0 -269
  185. data/node/node_modules/playwright/lib/runner/uiModeReporter.js +0 -30
  186. data/node/node_modules/playwright/lib/runner/vcs.js +0 -72
  187. data/node/node_modules/playwright/lib/runner/watchMode.js +0 -396
  188. data/node/node_modules/playwright/lib/runner/workerHost.js +0 -104
  189. data/node/node_modules/playwright/lib/third_party/pirates.js +0 -62
  190. data/node/node_modules/playwright/lib/third_party/tsconfig-loader.js +0 -103
  191. data/node/node_modules/playwright/lib/transform/babelBundle.js +0 -46
  192. data/node/node_modules/playwright/lib/transform/babelBundleImpl.js +0 -461
  193. data/node/node_modules/playwright/lib/transform/compilationCache.js +0 -274
  194. data/node/node_modules/playwright/lib/transform/esmLoader.js +0 -103
  195. data/node/node_modules/playwright/lib/transform/md.js +0 -221
  196. data/node/node_modules/playwright/lib/transform/portTransport.js +0 -67
  197. data/node/node_modules/playwright/lib/transform/transform.js +0 -303
  198. data/node/node_modules/playwright/lib/util.js +0 -400
  199. data/node/node_modules/playwright/lib/utilsBundle.js +0 -50
  200. data/node/node_modules/playwright/lib/utilsBundleImpl.js +0 -103
  201. data/node/node_modules/playwright/lib/worker/fixtureRunner.js +0 -262
  202. data/node/node_modules/playwright/lib/worker/testInfo.js +0 -536
  203. data/node/node_modules/playwright/lib/worker/testTracing.js +0 -345
  204. data/node/node_modules/playwright/lib/worker/timeoutManager.js +0 -174
  205. data/node/node_modules/playwright/lib/worker/util.js +0 -31
  206. data/node/node_modules/playwright/lib/worker/workerMain.js +0 -530
  207. data/node/node_modules/playwright/package.json +0 -72
  208. data/node/node_modules/playwright/test.d.ts +0 -18
  209. data/node/node_modules/playwright/test.js +0 -24
  210. data/node/node_modules/playwright/test.mjs +0 -34
  211. data/node/node_modules/playwright/types/test.d.ts +0 -10251
  212. data/node/node_modules/playwright/types/testReporter.d.ts +0 -822
  213. data/node/node_modules/playwright-core/LICENSE +0 -202
  214. data/node/node_modules/playwright-core/NOTICE +0 -5
  215. data/node/node_modules/playwright-core/README.md +0 -3
  216. data/node/node_modules/playwright-core/ThirdPartyNotices.txt +0 -4076
  217. data/node/node_modules/playwright-core/bin/install_media_pack.ps1 +0 -5
  218. data/node/node_modules/playwright-core/bin/install_webkit_wsl.ps1 +0 -33
  219. data/node/node_modules/playwright-core/bin/reinstall_chrome_beta_linux.sh +0 -42
  220. data/node/node_modules/playwright-core/bin/reinstall_chrome_beta_mac.sh +0 -13
  221. data/node/node_modules/playwright-core/bin/reinstall_chrome_beta_win.ps1 +0 -24
  222. data/node/node_modules/playwright-core/bin/reinstall_chrome_stable_linux.sh +0 -42
  223. data/node/node_modules/playwright-core/bin/reinstall_chrome_stable_mac.sh +0 -12
  224. data/node/node_modules/playwright-core/bin/reinstall_chrome_stable_win.ps1 +0 -24
  225. data/node/node_modules/playwright-core/bin/reinstall_msedge_beta_linux.sh +0 -48
  226. data/node/node_modules/playwright-core/bin/reinstall_msedge_beta_mac.sh +0 -11
  227. data/node/node_modules/playwright-core/bin/reinstall_msedge_beta_win.ps1 +0 -23
  228. data/node/node_modules/playwright-core/bin/reinstall_msedge_dev_linux.sh +0 -48
  229. data/node/node_modules/playwright-core/bin/reinstall_msedge_dev_mac.sh +0 -11
  230. data/node/node_modules/playwright-core/bin/reinstall_msedge_dev_win.ps1 +0 -23
  231. data/node/node_modules/playwright-core/bin/reinstall_msedge_stable_linux.sh +0 -48
  232. data/node/node_modules/playwright-core/bin/reinstall_msedge_stable_mac.sh +0 -11
  233. data/node/node_modules/playwright-core/bin/reinstall_msedge_stable_win.ps1 +0 -24
  234. data/node/node_modules/playwright-core/browsers.json +0 -79
  235. data/node/node_modules/playwright-core/cli.js +0 -18
  236. data/node/node_modules/playwright-core/index.d.ts +0 -17
  237. data/node/node_modules/playwright-core/index.js +0 -32
  238. data/node/node_modules/playwright-core/index.mjs +0 -28
  239. data/node/node_modules/playwright-core/lib/androidServerImpl.js +0 -65
  240. data/node/node_modules/playwright-core/lib/browserServerImpl.js +0 -120
  241. data/node/node_modules/playwright-core/lib/cli/driver.js +0 -97
  242. data/node/node_modules/playwright-core/lib/cli/program.js +0 -589
  243. data/node/node_modules/playwright-core/lib/cli/programWithTestStub.js +0 -74
  244. data/node/node_modules/playwright-core/lib/client/android.js +0 -361
  245. data/node/node_modules/playwright-core/lib/client/api.js +0 -137
  246. data/node/node_modules/playwright-core/lib/client/artifact.js +0 -79
  247. data/node/node_modules/playwright-core/lib/client/browser.js +0 -161
  248. data/node/node_modules/playwright-core/lib/client/browserContext.js +0 -582
  249. data/node/node_modules/playwright-core/lib/client/browserType.js +0 -185
  250. data/node/node_modules/playwright-core/lib/client/cdpSession.js +0 -51
  251. data/node/node_modules/playwright-core/lib/client/channelOwner.js +0 -194
  252. data/node/node_modules/playwright-core/lib/client/clientHelper.js +0 -64
  253. data/node/node_modules/playwright-core/lib/client/clientInstrumentation.js +0 -55
  254. data/node/node_modules/playwright-core/lib/client/clientStackTrace.js +0 -69
  255. data/node/node_modules/playwright-core/lib/client/clock.js +0 -68
  256. data/node/node_modules/playwright-core/lib/client/connection.js +0 -318
  257. data/node/node_modules/playwright-core/lib/client/consoleMessage.js +0 -58
  258. data/node/node_modules/playwright-core/lib/client/coverage.js +0 -44
  259. data/node/node_modules/playwright-core/lib/client/dialog.js +0 -56
  260. data/node/node_modules/playwright-core/lib/client/download.js +0 -62
  261. data/node/node_modules/playwright-core/lib/client/electron.js +0 -138
  262. data/node/node_modules/playwright-core/lib/client/elementHandle.js +0 -284
  263. data/node/node_modules/playwright-core/lib/client/errors.js +0 -77
  264. data/node/node_modules/playwright-core/lib/client/eventEmitter.js +0 -314
  265. data/node/node_modules/playwright-core/lib/client/events.js +0 -103
  266. data/node/node_modules/playwright-core/lib/client/fetch.js +0 -368
  267. data/node/node_modules/playwright-core/lib/client/fileChooser.js +0 -46
  268. data/node/node_modules/playwright-core/lib/client/fileUtils.js +0 -34
  269. data/node/node_modules/playwright-core/lib/client/frame.js +0 -409
  270. data/node/node_modules/playwright-core/lib/client/harRouter.js +0 -87
  271. data/node/node_modules/playwright-core/lib/client/input.js +0 -84
  272. data/node/node_modules/playwright-core/lib/client/jsHandle.js +0 -109
  273. data/node/node_modules/playwright-core/lib/client/jsonPipe.js +0 -39
  274. data/node/node_modules/playwright-core/lib/client/localUtils.js +0 -60
  275. data/node/node_modules/playwright-core/lib/client/locator.js +0 -369
  276. data/node/node_modules/playwright-core/lib/client/network.js +0 -747
  277. data/node/node_modules/playwright-core/lib/client/page.js +0 -745
  278. data/node/node_modules/playwright-core/lib/client/pageAgent.js +0 -64
  279. data/node/node_modules/playwright-core/lib/client/platform.js +0 -77
  280. data/node/node_modules/playwright-core/lib/client/playwright.js +0 -71
  281. data/node/node_modules/playwright-core/lib/client/selectors.js +0 -55
  282. data/node/node_modules/playwright-core/lib/client/stream.js +0 -39
  283. data/node/node_modules/playwright-core/lib/client/timeoutSettings.js +0 -79
  284. data/node/node_modules/playwright-core/lib/client/tracing.js +0 -119
  285. data/node/node_modules/playwright-core/lib/client/types.js +0 -28
  286. data/node/node_modules/playwright-core/lib/client/video.js +0 -59
  287. data/node/node_modules/playwright-core/lib/client/waiter.js +0 -142
  288. data/node/node_modules/playwright-core/lib/client/webError.js +0 -39
  289. data/node/node_modules/playwright-core/lib/client/webSocket.js +0 -93
  290. data/node/node_modules/playwright-core/lib/client/worker.js +0 -85
  291. data/node/node_modules/playwright-core/lib/client/writableStream.js +0 -39
  292. data/node/node_modules/playwright-core/lib/generated/bindingsControllerSource.js +0 -28
  293. data/node/node_modules/playwright-core/lib/generated/clockSource.js +0 -28
  294. data/node/node_modules/playwright-core/lib/generated/injectedScriptSource.js +0 -28
  295. data/node/node_modules/playwright-core/lib/generated/pollingRecorderSource.js +0 -28
  296. data/node/node_modules/playwright-core/lib/generated/storageScriptSource.js +0 -28
  297. data/node/node_modules/playwright-core/lib/generated/utilityScriptSource.js +0 -28
  298. data/node/node_modules/playwright-core/lib/generated/webSocketMockSource.js +0 -336
  299. data/node/node_modules/playwright-core/lib/inProcessFactory.js +0 -60
  300. data/node/node_modules/playwright-core/lib/inprocess.js +0 -3
  301. data/node/node_modules/playwright-core/lib/mcpBundle.js +0 -84
  302. data/node/node_modules/playwright-core/lib/mcpBundleImpl/index.js +0 -147
  303. data/node/node_modules/playwright-core/lib/outofprocess.js +0 -76
  304. data/node/node_modules/playwright-core/lib/protocol/serializers.js +0 -197
  305. data/node/node_modules/playwright-core/lib/protocol/validator.js +0 -2969
  306. data/node/node_modules/playwright-core/lib/protocol/validatorPrimitives.js +0 -193
  307. data/node/node_modules/playwright-core/lib/remote/playwrightConnection.js +0 -129
  308. data/node/node_modules/playwright-core/lib/remote/playwrightServer.js +0 -334
  309. data/node/node_modules/playwright-core/lib/server/agent/actionRunner.js +0 -335
  310. data/node/node_modules/playwright-core/lib/server/agent/actions.js +0 -128
  311. data/node/node_modules/playwright-core/lib/server/agent/codegen.js +0 -111
  312. data/node/node_modules/playwright-core/lib/server/agent/context.js +0 -150
  313. data/node/node_modules/playwright-core/lib/server/agent/expectTools.js +0 -156
  314. data/node/node_modules/playwright-core/lib/server/agent/pageAgent.js +0 -204
  315. data/node/node_modules/playwright-core/lib/server/agent/performTools.js +0 -262
  316. data/node/node_modules/playwright-core/lib/server/agent/tool.js +0 -109
  317. data/node/node_modules/playwright-core/lib/server/android/android.js +0 -465
  318. data/node/node_modules/playwright-core/lib/server/android/backendAdb.js +0 -177
  319. data/node/node_modules/playwright-core/lib/server/artifact.js +0 -127
  320. data/node/node_modules/playwright-core/lib/server/bidi/bidiBrowser.js +0 -549
  321. data/node/node_modules/playwright-core/lib/server/bidi/bidiChromium.js +0 -148
  322. data/node/node_modules/playwright-core/lib/server/bidi/bidiConnection.js +0 -213
  323. data/node/node_modules/playwright-core/lib/server/bidi/bidiDeserializer.js +0 -116
  324. data/node/node_modules/playwright-core/lib/server/bidi/bidiExecutionContext.js +0 -267
  325. data/node/node_modules/playwright-core/lib/server/bidi/bidiFirefox.js +0 -128
  326. data/node/node_modules/playwright-core/lib/server/bidi/bidiInput.js +0 -146
  327. data/node/node_modules/playwright-core/lib/server/bidi/bidiNetworkManager.js +0 -383
  328. data/node/node_modules/playwright-core/lib/server/bidi/bidiOverCdp.js +0 -102
  329. data/node/node_modules/playwright-core/lib/server/bidi/bidiPage.js +0 -583
  330. data/node/node_modules/playwright-core/lib/server/bidi/bidiPdf.js +0 -106
  331. data/node/node_modules/playwright-core/lib/server/bidi/third_party/bidiCommands.d.js +0 -22
  332. data/node/node_modules/playwright-core/lib/server/bidi/third_party/bidiKeyboard.js +0 -256
  333. data/node/node_modules/playwright-core/lib/server/bidi/third_party/bidiProtocol.js +0 -24
  334. data/node/node_modules/playwright-core/lib/server/bidi/third_party/bidiProtocolCore.js +0 -180
  335. data/node/node_modules/playwright-core/lib/server/bidi/third_party/bidiProtocolPermissions.js +0 -42
  336. data/node/node_modules/playwright-core/lib/server/bidi/third_party/bidiSerializer.js +0 -148
  337. data/node/node_modules/playwright-core/lib/server/bidi/third_party/firefoxPrefs.js +0 -259
  338. data/node/node_modules/playwright-core/lib/server/browser.js +0 -149
  339. data/node/node_modules/playwright-core/lib/server/browserContext.js +0 -702
  340. data/node/node_modules/playwright-core/lib/server/browserType.js +0 -336
  341. data/node/node_modules/playwright-core/lib/server/callLog.js +0 -82
  342. data/node/node_modules/playwright-core/lib/server/chromium/appIcon.png +0 -0
  343. data/node/node_modules/playwright-core/lib/server/chromium/chromium.js +0 -395
  344. data/node/node_modules/playwright-core/lib/server/chromium/chromiumSwitches.js +0 -104
  345. data/node/node_modules/playwright-core/lib/server/chromium/crBrowser.js +0 -511
  346. data/node/node_modules/playwright-core/lib/server/chromium/crConnection.js +0 -197
  347. data/node/node_modules/playwright-core/lib/server/chromium/crCoverage.js +0 -235
  348. data/node/node_modules/playwright-core/lib/server/chromium/crDevTools.js +0 -111
  349. data/node/node_modules/playwright-core/lib/server/chromium/crDragDrop.js +0 -131
  350. data/node/node_modules/playwright-core/lib/server/chromium/crExecutionContext.js +0 -146
  351. data/node/node_modules/playwright-core/lib/server/chromium/crInput.js +0 -187
  352. data/node/node_modules/playwright-core/lib/server/chromium/crNetworkManager.js +0 -707
  353. data/node/node_modules/playwright-core/lib/server/chromium/crPage.js +0 -1001
  354. data/node/node_modules/playwright-core/lib/server/chromium/crPdf.js +0 -121
  355. data/node/node_modules/playwright-core/lib/server/chromium/crProtocolHelper.js +0 -145
  356. data/node/node_modules/playwright-core/lib/server/chromium/crServiceWorker.js +0 -136
  357. data/node/node_modules/playwright-core/lib/server/chromium/defaultFontFamilies.js +0 -162
  358. data/node/node_modules/playwright-core/lib/server/chromium/protocol.d.js +0 -16
  359. data/node/node_modules/playwright-core/lib/server/clock.js +0 -149
  360. data/node/node_modules/playwright-core/lib/server/codegen/csharp.js +0 -327
  361. data/node/node_modules/playwright-core/lib/server/codegen/java.js +0 -274
  362. data/node/node_modules/playwright-core/lib/server/codegen/javascript.js +0 -247
  363. data/node/node_modules/playwright-core/lib/server/codegen/jsonl.js +0 -52
  364. data/node/node_modules/playwright-core/lib/server/codegen/language.js +0 -132
  365. data/node/node_modules/playwright-core/lib/server/codegen/languages.js +0 -68
  366. data/node/node_modules/playwright-core/lib/server/codegen/python.js +0 -279
  367. data/node/node_modules/playwright-core/lib/server/codegen/types.js +0 -16
  368. data/node/node_modules/playwright-core/lib/server/console.js +0 -57
  369. data/node/node_modules/playwright-core/lib/server/cookieStore.js +0 -206
  370. data/node/node_modules/playwright-core/lib/server/debugController.js +0 -191
  371. data/node/node_modules/playwright-core/lib/server/debugger.js +0 -119
  372. data/node/node_modules/playwright-core/lib/server/deviceDescriptors.js +0 -39
  373. data/node/node_modules/playwright-core/lib/server/deviceDescriptorsSource.json +0 -1779
  374. data/node/node_modules/playwright-core/lib/server/dialog.js +0 -116
  375. data/node/node_modules/playwright-core/lib/server/dispatchers/androidDispatcher.js +0 -325
  376. data/node/node_modules/playwright-core/lib/server/dispatchers/artifactDispatcher.js +0 -118
  377. data/node/node_modules/playwright-core/lib/server/dispatchers/browserContextDispatcher.js +0 -384
  378. data/node/node_modules/playwright-core/lib/server/dispatchers/browserDispatcher.js +0 -118
  379. data/node/node_modules/playwright-core/lib/server/dispatchers/browserTypeDispatcher.js +0 -64
  380. data/node/node_modules/playwright-core/lib/server/dispatchers/cdpSessionDispatcher.js +0 -44
  381. data/node/node_modules/playwright-core/lib/server/dispatchers/debugControllerDispatcher.js +0 -78
  382. data/node/node_modules/playwright-core/lib/server/dispatchers/dialogDispatcher.js +0 -47
  383. data/node/node_modules/playwright-core/lib/server/dispatchers/dispatcher.js +0 -364
  384. data/node/node_modules/playwright-core/lib/server/dispatchers/electronDispatcher.js +0 -89
  385. data/node/node_modules/playwright-core/lib/server/dispatchers/elementHandlerDispatcher.js +0 -181
  386. data/node/node_modules/playwright-core/lib/server/dispatchers/frameDispatcher.js +0 -227
  387. data/node/node_modules/playwright-core/lib/server/dispatchers/jsHandleDispatcher.js +0 -85
  388. data/node/node_modules/playwright-core/lib/server/dispatchers/jsonPipeDispatcher.js +0 -58
  389. data/node/node_modules/playwright-core/lib/server/dispatchers/localUtilsDispatcher.js +0 -149
  390. data/node/node_modules/playwright-core/lib/server/dispatchers/networkDispatchers.js +0 -213
  391. data/node/node_modules/playwright-core/lib/server/dispatchers/pageAgentDispatcher.js +0 -96
  392. data/node/node_modules/playwright-core/lib/server/dispatchers/pageDispatcher.js +0 -393
  393. data/node/node_modules/playwright-core/lib/server/dispatchers/playwrightDispatcher.js +0 -108
  394. data/node/node_modules/playwright-core/lib/server/dispatchers/streamDispatcher.js +0 -67
  395. data/node/node_modules/playwright-core/lib/server/dispatchers/tracingDispatcher.js +0 -68
  396. data/node/node_modules/playwright-core/lib/server/dispatchers/webSocketRouteDispatcher.js +0 -165
  397. data/node/node_modules/playwright-core/lib/server/dispatchers/writableStreamDispatcher.js +0 -79
  398. data/node/node_modules/playwright-core/lib/server/dom.js +0 -815
  399. data/node/node_modules/playwright-core/lib/server/download.js +0 -70
  400. data/node/node_modules/playwright-core/lib/server/electron/electron.js +0 -273
  401. data/node/node_modules/playwright-core/lib/server/electron/loader.js +0 -29
  402. data/node/node_modules/playwright-core/lib/server/errors.js +0 -69
  403. data/node/node_modules/playwright-core/lib/server/fetch.js +0 -621
  404. data/node/node_modules/playwright-core/lib/server/fileChooser.js +0 -43
  405. data/node/node_modules/playwright-core/lib/server/fileUploadUtils.js +0 -84
  406. data/node/node_modules/playwright-core/lib/server/firefox/ffBrowser.js +0 -418
  407. data/node/node_modules/playwright-core/lib/server/firefox/ffConnection.js +0 -142
  408. data/node/node_modules/playwright-core/lib/server/firefox/ffExecutionContext.js +0 -150
  409. data/node/node_modules/playwright-core/lib/server/firefox/ffInput.js +0 -159
  410. data/node/node_modules/playwright-core/lib/server/firefox/ffNetworkManager.js +0 -256
  411. data/node/node_modules/playwright-core/lib/server/firefox/ffPage.js +0 -497
  412. data/node/node_modules/playwright-core/lib/server/firefox/firefox.js +0 -114
  413. data/node/node_modules/playwright-core/lib/server/firefox/protocol.d.js +0 -16
  414. data/node/node_modules/playwright-core/lib/server/formData.js +0 -147
  415. data/node/node_modules/playwright-core/lib/server/frameSelectors.js +0 -160
  416. data/node/node_modules/playwright-core/lib/server/frames.js +0 -1471
  417. data/node/node_modules/playwright-core/lib/server/har/harRecorder.js +0 -147
  418. data/node/node_modules/playwright-core/lib/server/har/harTracer.js +0 -607
  419. data/node/node_modules/playwright-core/lib/server/harBackend.js +0 -157
  420. data/node/node_modules/playwright-core/lib/server/helper.js +0 -96
  421. data/node/node_modules/playwright-core/lib/server/index.js +0 -58
  422. data/node/node_modules/playwright-core/lib/server/input.js +0 -277
  423. data/node/node_modules/playwright-core/lib/server/instrumentation.js +0 -72
  424. data/node/node_modules/playwright-core/lib/server/javascript.js +0 -291
  425. data/node/node_modules/playwright-core/lib/server/launchApp.js +0 -128
  426. data/node/node_modules/playwright-core/lib/server/localUtils.js +0 -214
  427. data/node/node_modules/playwright-core/lib/server/macEditingCommands.js +0 -143
  428. data/node/node_modules/playwright-core/lib/server/network.js +0 -667
  429. data/node/node_modules/playwright-core/lib/server/page.js +0 -830
  430. data/node/node_modules/playwright-core/lib/server/pipeTransport.js +0 -89
  431. data/node/node_modules/playwright-core/lib/server/playwright.js +0 -69
  432. data/node/node_modules/playwright-core/lib/server/progress.js +0 -132
  433. data/node/node_modules/playwright-core/lib/server/protocolError.js +0 -52
  434. data/node/node_modules/playwright-core/lib/server/recorder/chat.js +0 -161
  435. data/node/node_modules/playwright-core/lib/server/recorder/recorderApp.js +0 -366
  436. data/node/node_modules/playwright-core/lib/server/recorder/recorderRunner.js +0 -138
  437. data/node/node_modules/playwright-core/lib/server/recorder/recorderSignalProcessor.js +0 -83
  438. data/node/node_modules/playwright-core/lib/server/recorder/recorderUtils.js +0 -157
  439. data/node/node_modules/playwright-core/lib/server/recorder/throttledFile.js +0 -57
  440. data/node/node_modules/playwright-core/lib/server/recorder.js +0 -499
  441. data/node/node_modules/playwright-core/lib/server/registry/browserFetcher.js +0 -177
  442. data/node/node_modules/playwright-core/lib/server/registry/dependencies.js +0 -371
  443. data/node/node_modules/playwright-core/lib/server/registry/index.js +0 -1422
  444. data/node/node_modules/playwright-core/lib/server/registry/nativeDeps.js +0 -1280
  445. data/node/node_modules/playwright-core/lib/server/registry/oopDownloadBrowserMain.js +0 -127
  446. data/node/node_modules/playwright-core/lib/server/screencast.js +0 -190
  447. data/node/node_modules/playwright-core/lib/server/screenshotter.js +0 -333
  448. data/node/node_modules/playwright-core/lib/server/selectors.js +0 -112
  449. data/node/node_modules/playwright-core/lib/server/socksClientCertificatesInterceptor.js +0 -383
  450. data/node/node_modules/playwright-core/lib/server/socksInterceptor.js +0 -95
  451. data/node/node_modules/playwright-core/lib/server/trace/recorder/snapshotter.js +0 -147
  452. data/node/node_modules/playwright-core/lib/server/trace/recorder/snapshotterInjected.js +0 -561
  453. data/node/node_modules/playwright-core/lib/server/trace/recorder/tracing.js +0 -604
  454. data/node/node_modules/playwright-core/lib/server/trace/viewer/traceParser.js +0 -72
  455. data/node/node_modules/playwright-core/lib/server/trace/viewer/traceViewer.js +0 -245
  456. data/node/node_modules/playwright-core/lib/server/transport.js +0 -181
  457. data/node/node_modules/playwright-core/lib/server/types.js +0 -28
  458. data/node/node_modules/playwright-core/lib/server/usKeyboardLayout.js +0 -145
  459. data/node/node_modules/playwright-core/lib/server/utils/ascii.js +0 -44
  460. data/node/node_modules/playwright-core/lib/server/utils/comparators.js +0 -139
  461. data/node/node_modules/playwright-core/lib/server/utils/crypto.js +0 -216
  462. data/node/node_modules/playwright-core/lib/server/utils/debug.js +0 -42
  463. data/node/node_modules/playwright-core/lib/server/utils/debugLogger.js +0 -122
  464. data/node/node_modules/playwright-core/lib/server/utils/env.js +0 -73
  465. data/node/node_modules/playwright-core/lib/server/utils/eventsHelper.js +0 -39
  466. data/node/node_modules/playwright-core/lib/server/utils/expectUtils.js +0 -123
  467. data/node/node_modules/playwright-core/lib/server/utils/fileUtils.js +0 -191
  468. data/node/node_modules/playwright-core/lib/server/utils/happyEyeballs.js +0 -207
  469. data/node/node_modules/playwright-core/lib/server/utils/hostPlatform.js +0 -123
  470. data/node/node_modules/playwright-core/lib/server/utils/httpServer.js +0 -203
  471. data/node/node_modules/playwright-core/lib/server/utils/imageUtils.js +0 -141
  472. data/node/node_modules/playwright-core/lib/server/utils/image_tools/colorUtils.js +0 -89
  473. data/node/node_modules/playwright-core/lib/server/utils/image_tools/compare.js +0 -109
  474. data/node/node_modules/playwright-core/lib/server/utils/image_tools/imageChannel.js +0 -78
  475. data/node/node_modules/playwright-core/lib/server/utils/image_tools/stats.js +0 -102
  476. data/node/node_modules/playwright-core/lib/server/utils/linuxUtils.js +0 -71
  477. data/node/node_modules/playwright-core/lib/server/utils/network.js +0 -242
  478. data/node/node_modules/playwright-core/lib/server/utils/nodePlatform.js +0 -154
  479. data/node/node_modules/playwright-core/lib/server/utils/pipeTransport.js +0 -84
  480. data/node/node_modules/playwright-core/lib/server/utils/processLauncher.js +0 -241
  481. data/node/node_modules/playwright-core/lib/server/utils/profiler.js +0 -65
  482. data/node/node_modules/playwright-core/lib/server/utils/socksProxy.js +0 -511
  483. data/node/node_modules/playwright-core/lib/server/utils/spawnAsync.js +0 -41
  484. data/node/node_modules/playwright-core/lib/server/utils/task.js +0 -51
  485. data/node/node_modules/playwright-core/lib/server/utils/userAgent.js +0 -98
  486. data/node/node_modules/playwright-core/lib/server/utils/wsServer.js +0 -121
  487. data/node/node_modules/playwright-core/lib/server/utils/zipFile.js +0 -74
  488. data/node/node_modules/playwright-core/lib/server/utils/zones.js +0 -57
  489. data/node/node_modules/playwright-core/lib/server/videoRecorder.js +0 -124
  490. data/node/node_modules/playwright-core/lib/server/webkit/protocol.d.js +0 -16
  491. data/node/node_modules/playwright-core/lib/server/webkit/webkit.js +0 -108
  492. data/node/node_modules/playwright-core/lib/server/webkit/wkBrowser.js +0 -335
  493. data/node/node_modules/playwright-core/lib/server/webkit/wkConnection.js +0 -144
  494. data/node/node_modules/playwright-core/lib/server/webkit/wkExecutionContext.js +0 -154
  495. data/node/node_modules/playwright-core/lib/server/webkit/wkInput.js +0 -181
  496. data/node/node_modules/playwright-core/lib/server/webkit/wkInterceptableRequest.js +0 -197
  497. data/node/node_modules/playwright-core/lib/server/webkit/wkPage.js +0 -1158
  498. data/node/node_modules/playwright-core/lib/server/webkit/wkProvisionalPage.js +0 -83
  499. data/node/node_modules/playwright-core/lib/server/webkit/wkWorkers.js +0 -105
  500. data/node/node_modules/playwright-core/lib/third_party/pixelmatch.js +0 -255
  501. data/node/node_modules/playwright-core/lib/utils/isomorphic/ariaSnapshot.js +0 -455
  502. data/node/node_modules/playwright-core/lib/utils/isomorphic/assert.js +0 -31
  503. data/node/node_modules/playwright-core/lib/utils/isomorphic/colors.js +0 -72
  504. data/node/node_modules/playwright-core/lib/utils/isomorphic/cssParser.js +0 -245
  505. data/node/node_modules/playwright-core/lib/utils/isomorphic/cssTokenizer.js +0 -1051
  506. data/node/node_modules/playwright-core/lib/utils/isomorphic/headers.js +0 -53
  507. data/node/node_modules/playwright-core/lib/utils/isomorphic/locatorGenerators.js +0 -689
  508. data/node/node_modules/playwright-core/lib/utils/isomorphic/locatorParser.js +0 -176
  509. data/node/node_modules/playwright-core/lib/utils/isomorphic/locatorUtils.js +0 -81
  510. data/node/node_modules/playwright-core/lib/utils/isomorphic/lruCache.js +0 -51
  511. data/node/node_modules/playwright-core/lib/utils/isomorphic/manualPromise.js +0 -114
  512. data/node/node_modules/playwright-core/lib/utils/isomorphic/mimeType.js +0 -459
  513. data/node/node_modules/playwright-core/lib/utils/isomorphic/multimap.js +0 -80
  514. data/node/node_modules/playwright-core/lib/utils/isomorphic/protocolFormatter.js +0 -81
  515. data/node/node_modules/playwright-core/lib/utils/isomorphic/protocolMetainfo.js +0 -330
  516. data/node/node_modules/playwright-core/lib/utils/isomorphic/rtti.js +0 -43
  517. data/node/node_modules/playwright-core/lib/utils/isomorphic/selectorParser.js +0 -386
  518. data/node/node_modules/playwright-core/lib/utils/isomorphic/semaphore.js +0 -54
  519. data/node/node_modules/playwright-core/lib/utils/isomorphic/stackTrace.js +0 -158
  520. data/node/node_modules/playwright-core/lib/utils/isomorphic/stringUtils.js +0 -204
  521. data/node/node_modules/playwright-core/lib/utils/isomorphic/time.js +0 -49
  522. data/node/node_modules/playwright-core/lib/utils/isomorphic/timeoutRunner.js +0 -66
  523. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/entries.js +0 -16
  524. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/snapshotRenderer.js +0 -499
  525. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/snapshotServer.js +0 -120
  526. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/snapshotStorage.js +0 -89
  527. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/traceLoader.js +0 -131
  528. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/traceModel.js +0 -365
  529. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/traceModernizer.js +0 -400
  530. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/versions/traceV3.js +0 -16
  531. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/versions/traceV4.js +0 -16
  532. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/versions/traceV5.js +0 -16
  533. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/versions/traceV6.js +0 -16
  534. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/versions/traceV7.js +0 -16
  535. data/node/node_modules/playwright-core/lib/utils/isomorphic/trace/versions/traceV8.js +0 -16
  536. data/node/node_modules/playwright-core/lib/utils/isomorphic/traceUtils.js +0 -58
  537. data/node/node_modules/playwright-core/lib/utils/isomorphic/types.js +0 -16
  538. data/node/node_modules/playwright-core/lib/utils/isomorphic/urlMatch.js +0 -190
  539. data/node/node_modules/playwright-core/lib/utils/isomorphic/utilityScriptSerializers.js +0 -251
  540. data/node/node_modules/playwright-core/lib/utils/isomorphic/yaml.js +0 -84
  541. data/node/node_modules/playwright-core/lib/utils.js +0 -111
  542. data/node/node_modules/playwright-core/lib/utilsBundle.js +0 -109
  543. data/node/node_modules/playwright-core/lib/utilsBundleImpl/index.js +0 -218
  544. data/node/node_modules/playwright-core/lib/utilsBundleImpl/xdg-open +0 -1066
  545. data/node/node_modules/playwright-core/lib/vite/htmlReport/index.html +0 -84
  546. data/node/node_modules/playwright-core/lib/vite/recorder/assets/codeMirrorModule-DYBRYzYX.css +0 -1
  547. data/node/node_modules/playwright-core/lib/vite/recorder/assets/codeMirrorModule-DadYNm1I.js +0 -32
  548. data/node/node_modules/playwright-core/lib/vite/recorder/assets/codicon-DCmgc-ay.ttf +0 -0
  549. data/node/node_modules/playwright-core/lib/vite/recorder/assets/index-BSjZa4pk.css +0 -1
  550. data/node/node_modules/playwright-core/lib/vite/recorder/assets/index-BhTWtUlo.js +0 -193
  551. data/node/node_modules/playwright-core/lib/vite/recorder/index.html +0 -29
  552. data/node/node_modules/playwright-core/lib/vite/recorder/playwright-logo.svg +0 -9
  553. data/node/node_modules/playwright-core/lib/vite/traceViewer/assets/codeMirrorModule-a5XoALAZ.js +0 -32
  554. data/node/node_modules/playwright-core/lib/vite/traceViewer/assets/defaultSettingsView-CJSZINFr.js +0 -266
  555. data/node/node_modules/playwright-core/lib/vite/traceViewer/assets/xtermModule-CsJ4vdCR.js +0 -9
  556. data/node/node_modules/playwright-core/lib/vite/traceViewer/codeMirrorModule.DYBRYzYX.css +0 -1
  557. data/node/node_modules/playwright-core/lib/vite/traceViewer/codicon.DCmgc-ay.ttf +0 -0
  558. data/node/node_modules/playwright-core/lib/vite/traceViewer/defaultSettingsView.7ch9cixO.css +0 -1
  559. data/node/node_modules/playwright-core/lib/vite/traceViewer/index.BVu7tZDe.css +0 -1
  560. data/node/node_modules/playwright-core/lib/vite/traceViewer/index.Bk2uYQRV.js +0 -2
  561. data/node/node_modules/playwright-core/lib/vite/traceViewer/index.html +0 -43
  562. data/node/node_modules/playwright-core/lib/vite/traceViewer/manifest.webmanifest +0 -16
  563. data/node/node_modules/playwright-core/lib/vite/traceViewer/playwright-logo.svg +0 -9
  564. data/node/node_modules/playwright-core/lib/vite/traceViewer/snapshot.html +0 -21
  565. data/node/node_modules/playwright-core/lib/vite/traceViewer/sw.bundle.js +0 -5
  566. data/node/node_modules/playwright-core/lib/vite/traceViewer/uiMode.Btcz36p_.css +0 -1
  567. data/node/node_modules/playwright-core/lib/vite/traceViewer/uiMode.CQJ9SCIQ.js +0 -5
  568. data/node/node_modules/playwright-core/lib/vite/traceViewer/uiMode.html +0 -17
  569. data/node/node_modules/playwright-core/lib/vite/traceViewer/xtermModule.DYP7pi_n.css +0 -32
  570. data/node/node_modules/playwright-core/lib/zipBundle.js +0 -34
  571. data/node/node_modules/playwright-core/lib/zipBundleImpl.js +0 -5
  572. data/node/node_modules/playwright-core/package.json +0 -43
  573. data/node/node_modules/playwright-core/types/protocol.d.ts +0 -23824
  574. data/node/node_modules/playwright-core/types/structs.d.ts +0 -45
  575. data/node/node_modules/playwright-core/types/types.d.ts +0 -22843
  576. data/spec/rubycrawl_spec.rb +0 -51
  577. data/spec/spec_helper.rb +0 -11
data/README.md CHANGED
@@ -1,39 +1,67 @@
1
- # rubycrawl
1
+ # RubyCrawl 🎭
2
2
 
3
- [![Gem Version](https://badge.fury.io/rb/rubycrawl.svg)](https://badge.fury.io/rb/rubycrawl)
3
+ [![Gem Version](https://badge.fury.io/rb/rubycrawl.svg)](https://rubygems.org/gems/rubycrawl)
4
4
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
5
+ [![Ruby](https://img.shields.io/badge/ruby-%3E%3D%203.0-red.svg)](https://www.ruby-lang.org/)
6
+ [![Node.js](https://img.shields.io/badge/node.js-18%2B-green.svg)](https://nodejs.org/)
5
7
 
6
- **Playwright-based web crawler for Ruby** — Inspired by [crawl4ai](https://github.com/unclecode/crawl4ai) (Python), designed idiomatically for Ruby with production-ready features.
8
+ **Production-ready web crawler for Ruby powered by Playwright** — Bringing the power of modern browser automation to the Ruby ecosystem with first-class Rails support.
7
9
 
8
- RubyCrawl provides accurate, JavaScript-enabled web scraping using Playwright's battle-tested browser automation, wrapped in a clean Ruby API. Perfect for extracting content from modern SPAs and dynamic websites.
10
+ RubyCrawl provides **accurate, JavaScript-enabled web scraping** using Playwright's battle-tested browser automation, wrapped in a clean Ruby API. Perfect for extracting content from modern SPAs, dynamic websites, and building RAG knowledge bases.
11
+
12
+ **Why RubyCrawl?**
13
+
14
+ - ✅ **Real browser** — Handles JavaScript, AJAX, and SPAs correctly
15
+ - ✅ **Zero config** — Works out of the box, no Playwright knowledge needed
16
+ - ✅ **Production-ready** — Auto-retry, error handling, resource optimization
17
+ - ✅ **Multi-page crawling** — BFS algorithm with smart URL deduplication
18
+ - ✅ **Rails-friendly** — Generators, initializers, and ActiveJob integration
19
+ - ✅ **Modular architecture** — Clean, testable, maintainable codebase
20
+
21
+ ```ruby
22
+ # One line to crawl any JavaScript-heavy site
23
+ result = RubyCrawl.crawl("https://docs.example.com")
24
+
25
+ result.html # Full HTML with JS rendered
26
+ result.links # All links with metadata
27
+ result.metadata # Title, description, OG tags, etc.
28
+ ```
9
29
 
10
30
  ## Features
11
31
 
12
- - **Playwright-powered**: Real browser automation for JavaScript-heavy sites
13
- - **Production-ready**: Designed for Rails apps and production environments
14
- - **Simple API**: Clean, minimal Ruby interface — zero Playwright knowledge required
15
- - **Resource optimization**: Built-in resource blocking for faster crawls
16
- - **Auto-managed browsers**: Browser process reuse and automatic lifecycle management
17
- - **Content extraction**: HTML, links, and Markdown conversion
18
- - **Multi-page crawling**: BFS crawler with depth limits and deduplication
19
- - **Rails integration**: First-class Rails support with generators and initializers
32
+ - **🎭 Playwright-powered**: Real browser automation for JavaScript-heavy sites and SPAs
33
+ - **🚀 Production-ready**: Designed for Rails apps and production environments with auto-retry and error handling
34
+ - **🎯 Simple API**: Clean, minimal Ruby interface — zero Playwright or Node.js knowledge required
35
+ - **⚡ Resource optimization**: Built-in resource blocking for 2-3x faster crawls
36
+ - **🔄 Auto-managed browsers**: Browser process reuse and automatic lifecycle management
37
+ - **📄 Content extraction**: HTML, plain text, links (with metadata), and **clean markdown** via HTML conversion
38
+ - **🌐 Multi-page crawling**: BFS (breadth-first search) crawler with configurable depth limits and URL deduplication
39
+ - **🛡️ Smart URL handling**: Automatic normalization, tracking parameter removal, and same-host filtering
40
+ - **🔧 Rails integration**: First-class Rails support with generators and initializers
41
+ - **💎 Modular design**: Clean separation of concerns with focused, testable modules
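The "Smart URL handling" bullet above names three steps: normalization, tracking parameter removal, and same-host filtering. The following is a minimal sketch of what those steps could look like using only Ruby's standard `uri` library. The helper names (`normalize_url`, `same_host?`) and the `TRACKING_PARAMS` list are assumptions for illustration; the gem's actual rules are not shown in this diff.

```ruby
require "uri"

# Hypothetical list of tracking parameters (in addition to any utm_* key).
TRACKING_PARAMS = %w[fbclid gclid].freeze

# Returns a normalized http(s) URL string, or nil if the input is unusable.
def normalize_url(raw)
  uri = URI.parse(raw.strip)
  return nil unless uri.is_a?(URI::HTTP) && uri.host

  uri.host = uri.host.downcase
  uri.fragment = nil # fragments never reach the server

  if uri.query
    kept = URI.decode_www_form(uri.query).reject do |key, _value|
      key.start_with?("utm_") || TRACKING_PARAMS.include?(key)
    end
    uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  end

  uri.path = "/" if uri.path.empty?
  uri.to_s
rescue URI::InvalidURIError, ArgumentError
  nil
end

# True when both URLs point at the same host.
def same_host?(url, base_url)
  URI.parse(url).host == URI.parse(base_url).host
end

normalize_url("https://Example.com/docs?utm_source=x&page=2#top")
# => "https://example.com/docs?page=2"
```

Normalizing before deduplication means two links that differ only in fragment or tracking parameters count as one page.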
20
42
 
21
43
  ## Table of Contents
22
44
 
45
+ - [Features](#features)
23
46
  - [Installation](#installation)
24
47
  - [Quick Start](#quick-start)
48
+ - [Use Cases](#use-cases)
25
49
  - [Usage](#usage)
26
50
  - [Basic Crawling](#basic-crawling)
27
51
  - [Multi-Page Crawling](#multi-page-crawling)
28
52
  - [Configuration](#configuration)
29
53
  - [Result Object](#result-object)
54
+ - [Error Handling](#error-handling)
30
55
  - [Rails Integration](#rails-integration)
31
56
  - [Production Deployment](#production-deployment)
32
57
  - [Architecture](#architecture)
33
58
  - [Performance](#performance)
34
59
  - [Development](#development)
60
+ - [Project Structure](#project-structure)
35
61
  - [Contributing](#contributing)
62
+ - [Why Choose RubyCrawl?](#why-choose-rubycrawl)
36
63
  - [License](#license)
64
+ - [Support](#support)
37
65
 
38
66
  ## Installation
39
67
 
@@ -64,9 +92,24 @@ bundle exec rake rubycrawl:install
64
92
 
65
93
  This command:
66
94
 
67
- - Installs Node.js dependencies in the bundled `node/` directory
68
- - Downloads Playwright browsers (Chromium, Firefox, WebKit)
69
- - Creates a Rails initializer (if using Rails)
95
+ - Installs Node.js dependencies in the bundled `node/` directory
96
+ - Downloads Playwright browsers (Chromium, Firefox, WebKit) — ~300MB download
97
+ - Creates a Rails initializer (if using Rails)
98
+
99
+ **Note:** You only need to run this once. The installation task is idempotent and safe to run multiple times.
100
+
101
+ **Troubleshooting installation:**
102
+
103
+ ```bash
104
+ # If installation fails, check Node.js version
105
+ node --version # Should be v18+ LTS
106
+
107
+ # Enable verbose logging
108
+ RUBYCRAWL_NODE_LOG=/tmp/rubycrawl.log bundle exec rake rubycrawl:install
109
+
110
+ # Check installation status
111
+ cd node && npm list
112
+ ```
70
113
 
71
114
  ## Quick Start
72
115
 
@@ -77,12 +120,24 @@ require "rubycrawl"
77
120
  result = RubyCrawl.crawl("https://example.com")
78
121
 
79
122
  # Access extracted content
80
- puts result.html # Raw HTML content
81
- puts result.markdown # Converted to Markdown
82
- puts result.links # Extracted links from the page
83
- puts result.metadata # Status code, final URL, etc.
123
+ result.final_url # Final URL after redirects
124
+ result.text # Plain text content (via innerText)
125
+ result.html # Raw HTML content
126
+ result.links # Extracted links with metadata
127
+ result.metadata # Title, description, OG tags, etc.
84
128
  ```
85
129
 
130
+ ## Use Cases
131
+
132
+ RubyCrawl is perfect for:
133
+
134
+ - **📊 Data aggregation**: Crawl product catalogs, job listings, or news articles
135
+ - **🤖 RAG applications**: Build knowledge bases for LLM/AI applications by crawling documentation sites
136
+ - **🔍 SEO analysis**: Extract metadata, links, and content structure
137
+ - **📱 Content migration**: Convert existing sites to Markdown for static site generators
138
+ - **🧪 Testing**: Verify deployed site structure and content
139
+ - **📚 Documentation scraping**: Create local copies of documentation with preserved links
140
+
86
141
  ## Usage
87
142
 
88
143
  ### Basic Crawling
@@ -93,11 +148,9 @@ The simplest way to crawl a URL:
93
148
  result = RubyCrawl.crawl("https://example.com")
94
149
 
95
150
  # Access the results
96
- result.html # => "<html>...</html>"
97
- result.markdown # => "# Example Domain\n\nThis domain is..." (lazy-loaded)
98
- result.links # => [{ "url" => "https://...", "text" => "More info" }, ...]
99
- result.metadata # => { "status" => 200, "final_url" => "https://example.com" }
100
- result.text # => "" (coming soon)
151
+ result.html # => "<html>...</html>"
152
+ result.text # => "Example Domain\nThis domain is..." (plain text via innerText)
153
+ result.metadata # => { "status" => 200, "final_url" => "https://example.com" }
101
154
  ```
102
155
 
103
156
  ### Multi-Page Crawling
@@ -109,38 +162,72 @@ Crawl an entire site following links with BFS (breadth-first search):
109
162
  RubyCrawl.crawl_site("https://example.com", max_pages: 100, max_depth: 3) do |page|
110
163
  # Each page is yielded as it's crawled (streaming)
111
164
  puts "Crawled: #{page.url} (depth: #{page.depth})"
112
-
165
+
113
166
  # Save to database
114
167
  Page.create!(
115
168
  url: page.url,
116
169
  html: page.html,
117
- markdown: page.markdown,
170
+ markdown: page.clean_markdown,
118
171
  depth: page.depth
119
172
  )
120
173
  end
121
174
  ```
122
175
 
176
+ **Real-world example: Building a RAG knowledge base**
177
+
178
+ ```ruby
179
+ # Crawl documentation site for AI/RAG application
180
+ require "rubycrawl"
181
+
182
+ RubyCrawl.configure(
183
+ wait_until: "networkidle", # Ensure JS content loads
184
+ block_resources: true # Skip images/fonts for speed
185
+ )
186
+
187
+ pages_crawled = RubyCrawl.crawl_site(
188
+ "https://docs.example.com",
189
+ max_pages: 500,
190
+ max_depth: 5,
191
+ same_host_only: true
192
+ ) do |page|
193
+ # Store in vector database for RAG
194
+ VectorDB.upsert(
195
+ id: Digest::SHA256.hexdigest(page.url),
196
+ content: page.clean_markdown, # Clean markdown for better embeddings
197
+ metadata: {
198
+ url: page.url,
199
+ title: page.metadata["title"],
200
+ depth: page.depth
201
+ }
202
+ )
203
+
204
+ puts "✓ Indexed: #{page.metadata['title']} (#{page.depth} levels deep)"
205
+ end
206
+
207
+ puts "Crawled #{pages_crawled} pages into knowledge base"
208
+ ```
209
+
123
210
  #### Multi-Page Options
124
211
 
125
- | Option | Default | Description |
126
- |--------|---------|-------------|
127
- | `max_pages` | 50 | Maximum number of pages to crawl |
128
- | `max_depth` | 3 | Maximum link depth from start URL |
129
- | `same_host_only` | true | Only follow links on the same domain |
130
- | `wait_until` | inherited | Page load strategy |
131
- | `block_resources` | inherited | Block images/fonts/CSS |
212
+ | Option | Default | Description |
213
+ | ----------------- | --------- | ------------------------------------ |
214
+ | `max_pages` | 50 | Maximum number of pages to crawl |
215
+ | `max_depth` | 3 | Maximum link depth from start URL |
216
+ | `same_host_only` | true | Only follow links on the same domain |
217
+ | `wait_until` | inherited | Page load strategy |
218
+ | `block_resources` | inherited | Block images/fonts/CSS |
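How `max_pages`, `max_depth`, and URL deduplication interact in a breadth-first crawl can be sketched over an in-memory link graph. This is an illustration under stated assumptions, not the gem's `SiteCrawler`: `bfs_crawl` and the `fetch_links` callable are hypothetical names, and a real crawl would fetch pages instead of reading a hash.

```ruby
require "set"

# Breadth-first traversal with a page cap, a depth cap, and deduplication.
# `fetch_links` is a hypothetical callable returning the URLs linked from a page.
def bfs_crawl(start_url, fetch_links:, max_pages: 50, max_depth: 3)
  visited = Set[start_url]   # every URL ever queued, so nothing is queued twice
  queue = [[start_url, 0]]   # FIFO queue of [url, depth] pairs
  crawled = []

  until queue.empty? || crawled.size >= max_pages
    url, depth = queue.shift
    crawled << [url, depth]
    yield url, depth if block_given?
    next if depth >= max_depth # do not follow links past the depth limit

    fetch_links.call(url).each do |link|
      queue << [link, depth + 1] if visited.add?(link)
    end
  end

  crawled
end

graph = {
  "/" => ["/a", "/b", "/"],
  "/a" => ["/b", "/c"],
  "/b" => ["/a"],
  "/c" => ["/d"],
  "/d" => []
}
pages = bfs_crawl("/", max_depth: 2, fetch_links: ->(url) { graph.fetch(url, []) })
pages.map(&:first) # => ["/", "/a", "/b", "/c"]
```

Because the queue is first-in, first-out, every page at depth 1 is visited before any page at depth 2, so a `max_pages` cap keeps the pages closest to the start URL.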
132
219
 
133
220
  #### Page Result Object
134
221
 
135
222
  The block receives a `PageResult` with:
136
223
 
137
224
  ```ruby
138
- page.url # String: Final URL after redirects
139
- page.html # String: Full HTML content
140
- page.markdown # String: Lazy-converted Markdown
141
- page.links # Array: URLs extracted from page
142
- page.metadata # Hash: HTTP status, final URL, etc.
143
- page.depth # Integer: Link depth from start URL
225
+ page.url # String: Final URL after redirects
226
+ page.html # String: Full HTML content
227
+ page.clean_markdown # String: Lazy-converted Markdown
228
+ page.links # Array: URLs extracted from page
229
+ page.metadata # Hash: HTTP status, final URL, etc.
230
+ page.depth # Integer: Link depth from start URL
144
231
  ```
145
232
 
146
233
  ### Configuration
@@ -177,16 +264,55 @@ result = RubyCrawl.crawl(
177
264
 
178
265
  #### Configuration Options
179
266
 
180
- | Option | Values | Default | Description |
181
- | ----------------- | ----------------------------------------------- | -------- | ------------------------------------------------- |
182
- | `wait_until` | `"load"`, `"domcontentloaded"`, `"networkidle"` | `"load"` | When to consider page loaded |
183
- | `block_resources` | `true`, `false` | `true` | Block images, fonts, CSS, media for faster crawls |
267
+ | Option | Values | Default | Description |
268
+ | ----------------- | ---------------------------------------------------------------------- | -------- | ------------------------------------------------- |
269
+ | `wait_until` | `"load"`, `"domcontentloaded"`, `"networkidle"`, `"commit"` | `"load"` | When to consider page loaded |
270
+ | `block_resources` | `true`, `false` | `true` | Block images, fonts, CSS, media for faster crawls |
271
+ | `max_attempts` | Integer | `3` | Total number of attempts (including the first) |
184
272
 
185
273
  **Wait strategies explained:**
186
274
 
187
275
  - `load` — Wait for the load event (fastest, good for static sites)
188
276
  - `domcontentloaded` — Wait for DOM ready (medium speed)
189
277
  - `networkidle` — Wait until no network requests for 500ms (slowest, best for SPAs)
278
+ - `commit` — Wait until the first response bytes are received (fastest possible)
279
+
280
+ ### Advanced Usage
281
+
282
+ #### Session-Based Crawling
283
+
284
+ Sessions allow reusing browser contexts for better performance when crawling multiple pages. They're automatically used by `crawl_site`, but you can manage them manually for advanced use cases:
285
+
286
+ ```ruby
287
+ # Create a session (reusable browser context)
288
+ session_id = RubyCrawl.create_session
289
+
290
+ begin
291
+ # All crawls with this session_id share the same browser context
292
+ result1 = RubyCrawl.crawl("https://example.com/page1", session_id: session_id)
293
+ result2 = RubyCrawl.crawl("https://example.com/page2", session_id: session_id)
294
+ # Browser state (cookies, localStorage) persists between crawls
295
+ ensure
296
+ # Always destroy session when done
297
+ RubyCrawl.destroy_session(session_id)
298
+ end
299
+ ```
300
+
301
+ **When to use sessions:**
302
+
303
+ - Multiple sequential crawls to the same domain (better performance)
304
+ - Preserving cookies/state set by the site between page visits
305
+ - Avoiding browser context creation overhead
306
+
307
+ **Important:** Sessions are for **performance optimization only**. RubyCrawl is designed for crawling **public websites**. It does not provide authentication or login functionality for protected content.
308
+
309
+ **Note:** `crawl_site` automatically creates and manages a session internally, so you don't need manual session management for multi-page crawling.
310
+
311
+ **Session lifecycle:**
312
+
313
+ - Sessions automatically expire after 30 minutes of inactivity
314
+ - Sessions are cleaned up every 5 minutes
315
+ - Always call `destroy_session` when done to free resources immediately
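The lifecycle rules above (expiry after 30 minutes of inactivity, a periodic cleanup pass, and explicit destruction) can be sketched as a small in-memory store. This is only an illustration of the policy in Ruby. The class name `SessionStore`, its methods, and the injectable `clock` are assumptions; how the gem's Node service actually tracks sessions is not shown here.

```ruby
require "securerandom"

# Tracks the last-use time of each session and expires idle ones.
class SessionStore
  def initialize(ttl: 30 * 60, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl          # seconds of allowed inactivity
    @clock = clock      # injectable so expiry can be exercised without waiting
    @last_used = {}
  end

  def create
    id = SecureRandom.hex(8)
    @last_used[id] = @clock.call
    id
  end

  # Record activity on a session; returns false for unknown or expired ids.
  def touch(id)
    return false unless @last_used.key?(id)

    @last_used[id] = @clock.call
    true
  end

  # Explicit teardown; returns true if the session existed.
  def destroy(id)
    !@last_used.delete(id).nil?
  end

  # Cleanup pass, intended to be run on a timer (for example every 5 minutes).
  # Returns the ids that were expired.
  def sweep
    now = @clock.call
    expired = @last_used.select { |_id, used_at| now - used_at > @ttl }.keys
    expired.each { |id| @last_used.delete(id) }
    expired
  end

  def active?(id)
    @last_used.key?(id)
  end
end
```

Under this policy an idle session can outlive its 30-minute limit by up to one sweep interval, which is one reason to call `destroy_session` explicitly.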
190
316
 
191
317
  ### Result Object
192
318
 
@@ -195,33 +321,47 @@ The crawl result is a `RubyCrawl::Result` object with these attributes:
195
321
  ```ruby
196
322
  result = RubyCrawl.crawl("https://example.com")
197
323
 
198
- result.html # String: Raw HTML content from page
199
- result.markdown # String: Markdown conversion (lazy-loaded on first access)
200
- result.links # Array: Extracted links with url and text
201
- result.text # String: Plain text (coming soon)
202
- result.metadata # Hash: Comprehensive metadata (see below)
324
+ result.html # String: Raw HTML content from page
325
+ result.text # String: Plain text via document.body.innerText
326
+ result.clean_markdown # String: Markdown conversion (lazy-loaded on first access)
327
+ result.links # Array: Extracted links with url and text
328
+ result.metadata # Hash: Comprehensive metadata (see below)
203
329
  ```
204
330
 
205
331
  #### Links Format
206
332
 
333
+ Links are extracted with full metadata:
334
+
207
335
  ```ruby
208
336
  result.links
209
337
  # => [
210
- # { "url" => "https://example.com/about", "text" => "About Us" },
211
- # { "url" => "https://example.com/contact", "text" => "Contact" },
338
+ # {
339
+ # "url" => "https://example.com/about",
340
+ # "text" => "About Us",
341
+ # "title" => "Learn more about us", # <a title="...">
342
+ # "rel" => nil # <a rel="nofollow">
343
+ # },
344
+ # {
345
+ # "url" => "https://example.com/contact",
346
+ # "text" => "Contact",
347
+ # "title" => null,
348
+ # "rel" => "nofollow"
349
+ # },
212
350
  # ...
213
351
  # ]
214
352
  ```
215
353
 
354
+ **Note:** URLs are automatically converted to absolute URLs by the browser, so relative links like `/about` become `https://example.com/about`.
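One possible use of the `rel` field documented above is to skip links marked `nofollow` before queueing them. The sketch below assumes the link hashes have exactly the string keys shown in the example. `followable_links` is a hypothetical helper, and nothing here implies the gem itself filters on `rel`.

```ruby
# Returns the URLs of links whose rel attribute does not contain "nofollow".
# A rel attribute may hold several space-separated tokens, so it is split first.
def followable_links(links)
  links
    .reject { |link| link["rel"].to_s.downcase.split.include?("nofollow") }
    .map { |link| link["url"] }
end

links = [
  { "url" => "https://example.com/about", "text" => "About Us",
    "title" => "Learn more about us", "rel" => nil },
  { "url" => "https://example.com/contact", "text" => "Contact",
    "title" => nil, "rel" => "nofollow" }
]
followable_links(links) # => ["https://example.com/about"]
```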
355
+
216
356
  #### Markdown Conversion
217
357
 
218
- Markdown is **lazy-loaded** — conversion only happens when you access `.markdown`:
358
+ Markdown is **lazy-loaded** — conversion only happens when you access `.clean_markdown`:
219
359
 
220
360
  ```ruby
221
361
  result = RubyCrawl.crawl(url)
222
- result.html # ✅ No overhead
223
- result.markdown # ⬅️ Conversion happens here (first call only)
224
- result.markdown # ✅ Cached, instant
362
+ result.html # ✅ No overhead
363
+ result.clean_markdown # ⬅️ Conversion happens here (first call only)
364
+ result.clean_markdown # ✅ Cached, instant
225
365
  ```
226
366
 
227
367
  Uses [reverse_markdown](https://github.com/xijo/reverse_markdown) with GitHub-flavored output.
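The lazy-loading behavior described above is ordinary memoization: the conversion runs on first access and the result is cached on the object. A minimal sketch, assuming a hypothetical `LazyResult` class and an injected `converter` callable (per the README, the gem uses reverse_markdown for the real conversion):

```ruby
# Holds raw HTML and converts it to Markdown only when first asked.
class LazyResult
  attr_reader :html, :conversions

  def initialize(html, converter:)
    @html = html
    @converter = converter # hypothetical callable: HTML string in, Markdown string out
    @conversions = 0       # counts how often the converter actually ran
  end

  def clean_markdown
    # ||= runs the block only while the cached value is nil.
    @clean_markdown ||= begin
      @conversions += 1
      @converter.call(@html)
    end
  end
end

# Toy converter for the sketch; it only understands <h1>.
toy = ->(html) { html.gsub(%r{<h1>(.*?)</h1>}, '# \1') }

result = LazyResult.new("<h1>Hi</h1>", converter: toy)
result.conversions     # => 0, nothing converted yet
result.clean_markdown  # => "# Hi"
result.clean_markdown  # cached, converter not called again
result.conversions     # => 1
```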
@@ -282,18 +422,19 @@ end
282
422
  ```
283
423
 
284
424
  **Exception Hierarchy:**
425
+
285
426
  - `RubyCrawl::Error` (base class)
286
427
  - `RubyCrawl::ConfigurationError` - Invalid URL or configuration
287
428
  - `RubyCrawl::TimeoutError` - Timeout during crawl
288
429
  - `RubyCrawl::NavigationError` - Page navigation failed
289
430
  - `RubyCrawl::ServiceError` - Node service issues
290
431
 
291
- **Automatic Retry:** RubyCrawl automatically retries transient failures (service errors, timeouts) up to 3 times with exponential backoff (2s, 4s, 8s). Configure with:
432
+ **Automatic Retry:** RubyCrawl automatically retries transient failures (service errors, timeouts) with exponential backoff. The default `max_attempts: 3` means 3 total attempts (2 retries). Configure with:
292
433
 
293
434
  ```ruby
294
- RubyCrawl.configure(max_retries: 5)
435
+ RubyCrawl.configure(max_attempts: 5)
295
436
  # or per-request
296
- RubyCrawl.crawl(url, retries: 1) # Disable retry
437
+ RubyCrawl.crawl(url, max_attempts: 1) # No retries
297
438
  ```
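To make the `max_attempts` semantics concrete, the following sketch shows one way a retry loop with exponential backoff can be written. It is an illustration under stated assumptions, not the gem's implementation: `with_attempts`, `TransientError`, the 2-second base delay, and the injectable `sleeper` are all hypothetical.

```ruby
# Hypothetical error class standing in for retryable failures.
class TransientError < StandardError; end

# Runs the block up to max_attempts times in total. After each failed attempt
# except the last, waits base_delay * 2**(attempt - 1) seconds.
def with_attempts(max_attempts: 3, base_delay: 2, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientError
    raise if attempt >= max_attempts # out of attempts: surface the error

    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Usage: record the delays instead of sleeping so the example runs instantly.
delays = []
begin
  with_attempts(max_attempts: 3, sleeper: ->(s) { delays << s }) { raise TransientError }
rescue TransientError
  puts "gave up after delays: #{delays.inspect}"
end
```

With `max_attempts: 3` the block runs three times and waits twice, which matches the "3 total attempts (2 retries)" wording; `max_attempts: 1` runs once and never waits.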
298
439
 
299
440
  ## Rails Integration
@@ -320,22 +461,177 @@ RubyCrawl.configure(
320
461
 
321
462
  ### Usage in Rails
322
463
 
464
+ #### Basic Usage in Controllers
465
+
466
+ ```ruby
467
+ class PagesController < ApplicationController
468
+ def show
469
+ result = RubyCrawl.crawl(params[:url])
470
+
471
+ @page = Page.create!(
472
+ url: result.final_url,
473
+ title: result.metadata['title'],
474
+ html: result.html,
475
+ text: result.text,
476
+ markdown: result.clean_markdown
477
+ )
478
+
479
+ redirect_to @page
480
+ end
481
+ end
482
+ ```
483
+
484
+ #### Background Jobs with ActiveJob
485
+
486
+ **Simple Crawl Job:**
487
+
323
488
  ```ruby
324
- # In a controller, service, or background job
325
- class ContentScraperJob < ApplicationJob
326
- def perform(url)
489
+ class CrawlPageJob < ApplicationJob
490
+ queue_as :crawlers
491
+
492
+ # Automatic retry with backoff for transient failures (Rails 7.1+ renames this option to :polynomially_longer)
493
+ retry_on RubyCrawl::ServiceError, wait: :exponentially_longer, attempts: 5
494
+ retry_on RubyCrawl::TimeoutError, wait: :exponentially_longer, attempts: 3
495
+
496
+ # Don't retry on configuration errors (bad URLs)
497
+ discard_on RubyCrawl::ConfigurationError
498
+
499
+ def perform(url, user_id: nil)
327
500
  result = RubyCrawl.crawl(url)
328
501
 
329
- # Save to database
330
- ScrapedContent.create!(
331
- url: url,
502
+ Page.create!(
503
+ url: result.final_url,
504
+ title: result.metadata['title'],
505
+ text: result.text,
332
506
  html: result.html,
333
- status: result.metadata[:status]
507
+ user_id: user_id,
508
+ crawled_at: Time.current
334
509
  )
510
+ rescue RubyCrawl::NavigationError => e
511
+ # Page not found or failed to load
512
+ Rails.logger.warn "Failed to crawl #{url}: #{e.message}"
513
+ FailedCrawl.create!(url: url, error: e.message, user_id: user_id)
335
514
  end
336
515
  end
516
+
517
+ # Enqueue from anywhere
518
+ CrawlPageJob.perform_later("https://example.com", user_id: current_user.id)
337
519
  ```
338
520
 
521
+ **Multi-Page Site Crawler Job:**
522
+
523
+ ```ruby
524
+ class CrawlSiteJob < ApplicationJob
525
+ queue_as :crawlers
526
+
527
+ def perform(start_url, max_pages: 50)
528
+ pages_crawled = RubyCrawl.crawl_site(
529
+ start_url,
530
+ max_pages: max_pages,
531
+ max_depth: 3,
532
+ same_host_only: true
533
+ ) do |page|
534
+ Page.create!(
535
+ url: page.url,
536
+ title: page.metadata['title'],
537
+ text: page.clean_markdown, # Store markdown for RAG applications
538
+ depth: page.depth,
539
+ crawled_at: Time.current
540
+ )
541
+ end
542
+
543
+ Rails.logger.info "Crawled #{pages_crawled} pages from #{start_url}"
544
+ end
545
+ end
546
+ ```
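For intuition about how `max_pages`, `max_depth`, and `same_host_only` interact, here is a minimal breadth-first traversal over an in-memory link graph. It is a sketch of the general technique, not RubyCrawl's crawler: `bfs_crawl` is a hypothetical function, and the block that returns a page's links stands in for a real page fetch.

```ruby
require "set"
require "uri"

# Breadth-first crawl. The block receives a URL and returns the links found
# on that page. Returns [url, depth] pairs in visit order.
def bfs_crawl(start_url, max_pages:, max_depth:, same_host_only: true, &fetch_links)
  host = URI(start_url).host
  seen = Set[start_url]     # deduplicates URLs before they are queued
  queue = [[start_url, 0]]
  visited = []

  until queue.empty? || visited.size >= max_pages
    url, depth = queue.shift
    visited << [url, depth]
    next if depth >= max_depth # do not follow links beyond the depth limit

    fetch_links.call(url).each do |link|
      next if same_host_only && URI(link).host != host

      queue << [link, depth + 1] if seen.add?(link)
    end
  end

  visited
end

graph = {
  "https://example.com/" => ["https://example.com/a", "https://other.example/", "https://example.com/b"],
  "https://example.com/a" => ["https://example.com/c", "https://example.com/"]
}

bfs_crawl("https://example.com/", max_pages: 10, max_depth: 1) { |url| graph.fetch(url, []) }
  .each { |url, depth| puts "#{depth} #{url}" }
```

Because the queue is first-in, first-out, every page at depth 1 is visited before any page at depth 2, so a small `max_pages` budget is spent on the pages closest to the start URL.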
547
+
548
+ **Batch Crawling Pattern:**
549
+
550
+ ```ruby
551
+ class BatchCrawlJob < ApplicationJob
552
+ queue_as :crawlers
553
+
554
+ def perform(urls)
555
+ # Create session for better performance
556
+ session_id = RubyCrawl.create_session
557
+
558
+ begin
559
+ urls.each do |url|
560
+ result = RubyCrawl.crawl(url, session_id: session_id)
561
+
562
+ Page.create!(
563
+ url: result.final_url,
564
+ html: result.html,
565
+ text: result.text
566
+ )
567
+ end
568
+ ensure
569
+ # Always destroy session when done
570
+ RubyCrawl.destroy_session(session_id)
571
+ end
572
+ end
573
+ end
574
+
575
+ # Enqueue batch
576
+ BatchCrawlJob.perform_later(["https://example.com", "https://example.com/about"])
577
+ ```
578
+
579
+ **Periodic Crawling with Sidekiq-Cron:**
580
+
581
+ ```ruby
582
+ # config/schedule.yml (for sidekiq-cron)
583
+ crawl_news_sites:
584
+ cron: "0 */6 * * *" # Every 6 hours
585
+ class: "CrawlNewsSitesJob"
586
+
587
+ # app/jobs/crawl_news_sites_job.rb
588
+ class CrawlNewsSitesJob < ApplicationJob
589
+ queue_as :scheduled_crawlers
590
+
591
+ def perform
592
+ Site.where(active: true).find_each do |site|
593
+ CrawlSiteJob.perform_later(site.url, max_pages: site.max_pages)
594
+ end
595
+ end
596
+ end
597
+ ```
598
+
599
+ **RAG/AI Knowledge Base Pattern:**
600
+
601
+ ```ruby
602
+ class BuildKnowledgeBaseJob < ApplicationJob
603
+ queue_as :crawlers
604
+
605
+ def perform(documentation_url)
606
+ RubyCrawl.crawl_site(
607
+ documentation_url,
608
+ max_pages: 500,
609
+ max_depth: 5
610
+ ) do |page|
611
+ # Store in vector database for RAG (OpenAI.embed is a placeholder for your embedding client)
612
+ embedding = OpenAI.embed(page.clean_markdown)
613
+
614
+ Document.create!(
615
+ url: page.url,
616
+ title: page.metadata['title'],
617
+ content: page.clean_markdown,
618
+ embedding: embedding,
619
+ depth: page.depth
620
+ )
621
+ end
622
+ end
623
+ end
624
+ ```
625
+
626
+ #### Best Practices
627
+
628
+ 1. **Use background jobs** for crawling to avoid blocking web requests
629
+ 2. **Configure retry logic** based on error types (retry ServiceError, discard ConfigurationError)
630
+ 3. **Use sessions** for batch crawling to improve performance
631
+ 4. **Monitor job failures** and set up alerts for repeated errors
632
+ 5. **Rate limit** external crawling to be respectful (use job throttling)
633
+ 6. **Store both HTML and text** for flexibility in data processing
634
+
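As one possible reading of the rate-limiting advice above, the sketch below enforces a minimum interval between requests to the same host. `HostThrottle` is a hypothetical class, not part of RubyCrawl or ActiveJob. It only coordinates calls within a single process, so a multi-worker deployment would still need queue-level throttling.

```ruby
require "uri"

# Hypothetical in-process throttle: waits until at least min_interval seconds
# have passed since the previous request to the same host.
class HostThrottle
  def initialize(min_interval:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @last_request_at = {}
  end

  # Call immediately before crawling url.
  def wait_for(url)
    host = URI(url).host
    last = @last_request_at[host]
    if last
      remaining = @min_interval - (@clock.call - last)
      @sleeper.call(remaining) if remaining.positive?
    end
    @last_request_at[host] = @clock.call
  end
end

throttle = HostThrottle.new(min_interval: 0.2)
%w[https://example.com/a https://example.com/b].each do |url|
  throttle.wait_for(url)
  puts "crawling #{url}" # a real job would crawl the URL here
end
```

The clock and sleeper are injectable so the behaviour can be tested without real delays.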
339
635
  ## Production Deployment
340
636
 
341
637
  ### Pre-deployment Checklist
@@ -393,154 +689,41 @@ Add to `package.json` in your Rails root:
393
689
  }
394
690
  ```
395
691
 
396
- ### Performance Tips
397
-
398
- - **Reuse instances**: Use the class-level `RubyCrawl.crawl` method (recommended) rather than creating new instances
399
- - **Resource blocking**: Keep `block_resources: true` for 2-3x faster crawls when you don't need images/CSS
400
- - **Concurrency**: Use background jobs (Sidekiq, etc.) for parallel crawling
401
- - **Browser reuse**: The first crawl is slower due to browser launch; subsequent crawls reuse the process
402
-
403
- ## Architecture
404
-
405
- RubyCrawl uses a **dual-process architecture**:
406
-
407
- ```
408
- ┌─────────────────────────────────────────────┐
409
- │ Ruby Process (Your Application) │
410
- │ ┌─────────────────────────────────────┐ │
411
- │ │ RubyCrawl Gem │ │
412
- │ │ • Public API │ │
413
- │ │ • Result normalization │ │
414
- │ │ • Error handling │ │
415
- │ └────────────┬────────────────────────┘ │
416
- └───────────────┼─────────────────────────────┘
417
- │ HTTP/JSON (localhost:3344)
418
- ┌───────────────┼─────────────────────────────┐
419
- │ Node.js Process (Auto-started) │
420
- │ ┌────────────┴────────────────────────┐ │
421
- │ │ Playwright Service │ │
422
- │ │ • Browser management │ │
423
- │ │ • Page navigation │ │
424
- │ │ • HTML extraction │ │
425
- │ │ • Resource blocking │ │
426
- │ └─────────────────────────────────────┘ │
427
- └─────────────────────────────────────────────┘
428
- ```
429
-
430
- **Why this architecture?**
431
-
432
- - **Separation of concerns**: Ruby handles orchestration, Node handles browsers
433
- - **Stability**: Playwright's official Node.js bindings are most reliable
434
- - **Performance**: Long-running browser process, reused across requests
435
- - **Simplicity**: No C extensions, pure Ruby + bundled Node service
436
-
437
- See [.github/copilot-instructions.md](.github/copilot-instructions.md) for detailed architecture documentation.
438
-
439
- ## Performance
440
-
441
- ### Benchmarks
692
+ ## How It Works
442
693
 
443
- Typical crawl times (M1 Mac, fast network):
694
+ RubyCrawl uses a simple architecture:
444
695
 
445
- | Page Type | First Crawl | Subsequent | Config |
446
- | ----------- | ----------- | ---------- | --------------------------- |
447
- | Static HTML | ~2s | ~500ms | `block_resources: true` |
448
- | SPA (React) | ~3s | ~1.2s | `wait_until: "networkidle"` |
449
- | Heavy site | ~4s | ~2s | `block_resources: false` |
696
+ - **Ruby Gem** provides the public API and handles orchestration
697
+ - **Node.js Service** (bundled, auto-started) manages Playwright browsers
698
+ - Communication via HTTP/JSON on localhost
450
699
 
451
- **Note**: First crawl includes browser launch time (~1.5s). Subsequent crawls reuse the browser.
700
+ This design keeps things stable and easy to debug. The browser runs in a separate process, so crashes won't affect your Ruby application.
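To give a feel for the HTTP/JSON hop between the two processes, here is a sketch that builds (but does not send) such a request with Ruby's standard library. Everything specific in it is an assumption made for illustration: the `/crawl` path, the payload fields, and the default port are hypothetical, and the real service's protocol may differ.

```ruby
require "json"
require "net/http"
require "uri"

# Builds a JSON POST request for a local crawl service.
# Assumptions: the "/crawl" endpoint, the port, and the payload keys are
# illustrative only and are not taken from the gem's source.
def build_crawl_request(url, base: "http://127.0.0.1:3344", wait_until: "load")
  endpoint = URI.join(base, "/crawl")
  request = Net::HTTP::Post.new(endpoint)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(url: url, wait_until: wait_until)
  request
end

request = build_crawl_request("https://example.com")
puts request.path
puts request.body
```

Sending it would be a matter of `Net::HTTP.start(host, port) { |http| http.request(request) }` and parsing the JSON response, which is where the Ruby side would map service failures onto its exception hierarchy.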
452
701
 
453
- ### Optimization Tips
702
+ ## Performance Tips
454
703
 
455
- 1. **Enable resource blocking** for content-only extraction:
456
-
457
- ```ruby
458
- RubyCrawl.configure(block_resources: true)
459
- ```
460
-
461
- 2. **Use appropriate wait strategy**:
462
- - Static sites: `wait_until: "load"`
463
- - SPAs: `wait_until: "networkidle"`
464
-
465
- 3. **Batch processing**: Use background jobs for concurrent crawling:
466
- ```ruby
467
- urls.each { |url| CrawlJob.perform_later(url) }
468
- ```
704
+ - **Resource blocking**: Keep `block_resources: true` (default) for 2-3x faster crawls when you don't need images/CSS
705
+ - **Wait strategy**: Use `wait_until: "load"` for static sites, `"networkidle"` for SPAs
706
+ - **Concurrency**: Use background jobs (Sidekiq, etc.) for parallel crawling
707
+ - **Browser reuse**: The first crawl is slower (~2s) due to browser launch; subsequent crawls are much faster (~500ms)
469
708
 
470
709
  ## Development
471
710
 
472
- ### Setup
711
+ Want to contribute? Check out the [contributor guidelines](.github/copilot-instructions.md).
473
712
 
474
713
  ```bash
714
+ # Setup
475
715
  git clone git@github.com:craft-wise/rubycrawl.git
476
716
  cd rubycrawl
477
- bin/setup # Installs dependencies and sets up Node service
478
- ```
717
+ bin/setup
479
718
 
480
- ### Running Tests
481
-
482
- ```bash
719
+ # Run tests
483
720
  bundle exec rspec
484
- ```
485
-
486
- ### Manual Testing
487
-
488
- ```bash
489
- # Terminal 1: Start Node service manually (optional)
490
- cd node
491
- npm start
492
721
 
493
- # Terminal 2: Ruby console
722
+ # Manual testing
494
723
  bin/console
495
- > result = RubyCrawl.crawl("https://example.com")
496
- > puts result.html
497
- ```
498
-
499
- ### Project Structure
500
-
501
- ```
502
- rubycrawl/
503
- ├── lib/
504
- │ ├── rubycrawl.rb # Main gem entry point
505
- │ ├── rubycrawl/
506
- │ │ ├── version.rb # Gem version
507
- │ │ ├── railtie.rb # Rails integration
508
- │ │ └── tasks/
509
- │ │ └── install.rake # Installation task
510
- ├── node/
511
- │ ├── src/
512
- │ │ └── index.js # Playwright HTTP service
513
- │ ├── package.json
514
- │ └── README.md
515
- ├── spec/ # RSpec tests
516
- ├── .github/
517
- │ └── copilot-instructions.md # GitHub Copilot guidelines
518
- ├── CLAUDE.md # Claude AI guidelines
519
- └── README.md
724
+ > RubyCrawl.crawl("https://example.com")
520
725
  ```
521
726
 
522
- ## Roadmap
523
-
524
- ### Current (v0.1.0)
525
-
526
- - [x] HTML extraction
527
- - [x] Link extraction
528
- - [x] Markdown conversion (lazy-loaded)
529
- - [x] Multi-page crawling with BFS
530
- - [x] URL normalization and deduplication
531
- - [x] Basic metadata (status, final URL)
532
- - [x] Resource blocking
533
- - [x] Rails integration
534
-
535
- ### Coming Soon
536
-
537
- - [ ] Plain text extraction
538
- - [ ] Screenshot capture
539
- - [ ] Custom JavaScript execution
540
- - [ ] Session/cookie support
541
- - [ ] Proxy support
542
- - [ ] Robots.txt support
543
-
544
727
  ## Contributing
545
728
 
546
729
  Contributions are welcome! Please read our [contribution guidelines](.github/copilot-instructions.md) first.
@@ -552,21 +735,46 @@ Contributions are welcome! Please read our [contribution guidelines](.github/cop
552
735
  - **Ruby-first**: Hide Node.js/Playwright complexity from users
553
736
  - **No vendor lock-in**: Pure open source, no SaaS dependencies
554
737
 
555
- ## Comparison with crawl4ai
738
+ ## Why Choose RubyCrawl?
739
+
740
+ RubyCrawl stands out in the Ruby ecosystem with its unique combination of features:
741
+
742
+ ### 🎯 **Built for Ruby Developers**
743
+
744
+ - **Idiomatic Ruby API** — Feels natural to Rubyists, no need to learn Playwright
745
+ - **Rails-first design** — Generators, initializers, and ActiveJob integration out of the box
746
+ - **Modular architecture** — Clean, testable code following Ruby best practices
747
+
748
+ ### 🚀 **Production-Grade Reliability**
556
749
 
557
- | Feature | crawl4ai (Python) | rubycrawl (Ruby) |
558
- | ------------------- | ----------------- | ---------------- |
559
- | Browser automation | Playwright | Playwright |
560
- | Language | Python | Ruby |
561
- | LLM extraction | ✅ | Planned |
562
- | Markdown extraction | ✅ | ✅ |
563
- | Link extraction | ✅ | ✅ |
564
- | Multi-page crawling | ✅ | ✅ |
565
- | Rails integration | N/A | ✅ |
566
- | Resource blocking | ✅ | ✅ |
567
- | Session management | ✅ | Planned |
750
+ - **Automatic retry** with exponential backoff for transient failures
751
+ - **Smart error handling** with custom exception hierarchy
752
+ - **Process isolation** — Browser crashes don't affect your Ruby application
753
+ - **Battle-tested** foundation, built on Playwright's proven browser automation
568
754
 
569
- RubyCrawl aims to bring the same level of accuracy and reliability to the Ruby ecosystem.
755
+ ### 💎 **Developer Experience**
756
+
757
+ - **Zero configuration** — Works immediately after installation
758
+ - **Lazy loading** — Markdown conversion only when you need it
759
+ - **Smart URL handling** — Automatic normalization and deduplication
760
+ - **Comprehensive docs** — Clear examples for common use cases
761
+
762
+ ### 🌐 **Rich Feature Set**
763
+
764
+ - ✅ JavaScript-enabled crawling (SPAs, AJAX, dynamic content)
765
+ - ✅ Multi-page crawling with BFS algorithm
766
+ - ✅ Link extraction with metadata (url, text, title, rel)
767
+ - ✅ Markdown conversion (GitHub-flavored)
768
+ - ✅ Metadata extraction (OG tags, Twitter cards, etc.)
769
+ - ✅ Resource blocking for 2-3x performance boost
770
+
771
+ ### 📊 **Perfect for Modern Use Cases**
772
+
773
+ - **RAG applications** — Build AI knowledge bases from documentation
774
+ - **Data aggregation** — Extract structured data from multiple pages
775
+ - **Content migration** — Convert sites to Markdown for static generators
776
+ - **SEO analysis** — Extract metadata and link structures
777
+ - **Testing** — Verify deployed site content and structure
570
778
 
571
779
  ## License
572
780
 
@@ -574,12 +782,21 @@ The gem is available as open source under the terms of the [MIT License](LICENSE
574
782
 
575
783
  ## Credits
576
784
 
577
- Inspired by [crawl4ai](https://github.com/unclecode/crawl4ai) by @unclecode.
785
+ Built with [Playwright](https://playwright.dev/) by Microsoft — the industry-standard browser automation framework.
578
786
 
579
- Built with [Playwright](https://playwright.dev/) by Microsoft.
787
+ Powered by [reverse_markdown](https://github.com/xijo/reverse_markdown) for GitHub-flavored Markdown conversion.
580
788
 
581
789
  ## Support
582
790
 
583
791
  - **Issues**: [GitHub Issues](https://github.com/craft-wise/rubycrawl/issues)
584
- - **Discussions**: [GitHub Discussions](https://github.com/your-org/rubycrawl/discussions)
792
+ - **Discussions**: [GitHub Discussions](https://github.com/craft-wise/rubycrawl/discussions)
585
793
  - **Email**: ganesh.navale@zohomail.in
794
+
795
+ ## Acknowledgments
796
+
797
+ Special thanks to:
798
+
799
+ - [Microsoft Playwright](https://playwright.dev/) team for the robust, production-grade browser automation framework
800
+ - The Ruby community for building an ecosystem that values developer happiness and code clarity
801
+ - The Node.js community for excellent tooling and libraries that make cross-language integration seamless
802
+ - Open source contributors worldwide who make projects like this possible