wayfarer 0.4.6 → 0.4.7
- checksums.yaml +4 -4
- data/.github/workflows/lint.yaml +25 -0
- data/.github/workflows/release.yaml +29 -0
- data/.github/workflows/tests.yaml +30 -0
- data/.gitignore +4 -0
- data/.rubocop.yml +5 -0
- data/.vale.ini +5 -0
- data/.yardopts +1 -3
- data/Dockerfile +5 -4
- data/Gemfile +3 -0
- data/Gemfile.lock +107 -102
- data/Rakefile +5 -56
- data/bin/wayfarer +1 -1
- data/docker-compose.yml +20 -9
- data/docs/cookbook/consent_screen.md +2 -2
- data/docs/cookbook/executing_javascript.md +3 -3
- data/docs/cookbook/navigation.md +12 -12
- data/docs/cookbook/querying_html.md +3 -3
- data/docs/cookbook/screenshots.md +2 -2
- data/docs/cookbook/user_agent.md +1 -1
- data/docs/design.md +36 -0
- data/docs/guides/callbacks.md +24 -126
- data/docs/guides/configuration.md +8 -8
- data/docs/guides/handlers.md +60 -0
- data/docs/guides/index.md +1 -0
- data/docs/guides/jobs/error_handling.md +40 -0
- data/docs/guides/jobs.md +99 -31
- data/docs/guides/navigation.md +1 -1
- data/docs/guides/networking/capybara.md +13 -22
- data/docs/guides/networking/custom_adapters.md +82 -41
- data/docs/guides/networking/ferrum.md +4 -4
- data/docs/guides/networking/http.md +9 -13
- data/docs/guides/networking/selenium.md +10 -11
- data/docs/guides/pages.md +76 -10
- data/docs/guides/redis.md +10 -0
- data/docs/guides/routing.md +74 -0
- data/docs/guides/tasks.md +33 -9
- data/docs/guides/tutorial.md +60 -0
- data/docs/guides/user_agents.md +113 -0
- data/docs/index.md +17 -40
- data/docs/reference/cli.md +35 -25
- data/docs/reference/configuration.md +36 -0
- data/lib/wayfarer/base.rb +124 -46
- data/lib/wayfarer/batch_completion.rb +56 -0
- data/lib/wayfarer/callbacks.rb +22 -48
- data/lib/wayfarer/cli/route_printer.rb +71 -57
- data/lib/wayfarer/cli.rb +121 -0
- data/lib/wayfarer/gc.rb +13 -6
- data/lib/wayfarer/handler.rb +15 -7
- data/lib/wayfarer/logging.rb +38 -0
- data/lib/wayfarer/middleware/base.rb +2 -0
- data/lib/wayfarer/middleware/batch_completion.rb +19 -0
- data/lib/wayfarer/middleware/content_type.rb +54 -0
- data/lib/wayfarer/middleware/controller.rb +19 -15
- data/lib/wayfarer/middleware/dedup.rb +16 -13
- data/lib/wayfarer/middleware/dispatch.rb +12 -4
- data/lib/wayfarer/middleware/normalize.rb +12 -11
- data/lib/wayfarer/middleware/redis.rb +15 -0
- data/lib/wayfarer/middleware/router.rb +33 -35
- data/lib/wayfarer/middleware/stage.rb +5 -5
- data/lib/wayfarer/middleware/uri_parser.rb +30 -0
- data/lib/wayfarer/middleware/user_agent.rb +49 -0
- data/lib/wayfarer/networking/capybara.rb +1 -1
- data/lib/wayfarer/networking/context.rb +2 -2
- data/lib/wayfarer/networking/ferrum.rb +2 -2
- data/lib/wayfarer/networking/follow.rb +12 -6
- data/lib/wayfarer/networking/http.rb +1 -1
- data/lib/wayfarer/networking/pool.rb +17 -12
- data/lib/wayfarer/networking/selenium.rb +3 -3
- data/lib/wayfarer/networking/strategy.rb +2 -2
- data/lib/wayfarer/page.rb +36 -14
- data/lib/wayfarer/parsing/xml.rb +6 -6
- data/lib/wayfarer/parsing.rb +24 -0
- data/lib/wayfarer/redis/barrier.rb +13 -21
- data/lib/wayfarer/redis/counter.rb +19 -9
- data/lib/wayfarer/redis/pool.rb +1 -1
- data/lib/wayfarer/redis/resettable.rb +19 -0
- data/lib/wayfarer/routing/dsl.rb +1 -0
- data/lib/wayfarer/routing/matchers/path.rb +4 -2
- data/lib/wayfarer/routing/root_route.rb +5 -1
- data/lib/wayfarer/routing/route.rb +4 -14
- data/lib/wayfarer/stringify.rb +22 -30
- data/lib/wayfarer/task.rb +12 -18
- data/lib/wayfarer.rb +28 -1
- data/mkdocs.yml +52 -7
- data/rake/docs.rake +26 -0
- data/rake/lint.rake +105 -0
- data/rake/release.rake +29 -0
- data/rake/tests.rake +28 -0
- data/requirements.txt +1 -1
- data/spec/base_spec.rb +140 -160
- data/spec/batch_completion_spec.rb +104 -0
- data/spec/cli/job_spec.rb +19 -23
- data/spec/cli/routing_spec.rb +101 -0
- data/spec/cli/version_spec.rb +1 -1
- data/spec/factories/task.rb +7 -1
- data/spec/fixtures/dummy_job.rb +5 -3
- data/spec/gc_spec.rb +8 -50
- data/spec/handler_spec.rb +1 -1
- data/spec/integration/callbacks_spec.rb +157 -45
- data/spec/integration/content_type_spec.rb +145 -0
- data/spec/integration/gc_spec.rb +44 -0
- data/spec/integration/handler_spec.rb +66 -0
- data/spec/integration/page_spec.rb +44 -29
- data/spec/integration/params_spec.rb +33 -25
- data/spec/integration/parsing_spec.rb +125 -0
- data/spec/integration/routing_spec.rb +18 -0
- data/spec/integration/stage_spec.rb +27 -20
- data/spec/middleware/batch_completion_spec.rb +34 -0
- data/spec/middleware/chain_spec.rb +8 -8
- data/spec/middleware/content_type_spec.rb +86 -0
- data/spec/middleware/controller_spec.rb +5 -5
- data/spec/middleware/dedup_spec.rb +38 -55
- data/spec/middleware/dispatch_spec.rb +23 -7
- data/spec/middleware/normalize_spec.rb +44 -13
- data/spec/middleware/router_spec.rb +29 -30
- data/spec/middleware/stage_spec.rb +8 -8
- data/spec/middleware/uri_parser_spec.rb +53 -0
- data/spec/middleware/{fetch_spec.rb → user_agent_spec.rb} +28 -27
- data/spec/networking/context_spec.rb +1 -1
- data/spec/networking/follow_spec.rb +2 -2
- data/spec/networking/pool_spec.rb +5 -5
- data/spec/networking/strategy.rb +2 -2
- data/spec/page_spec.rb +42 -20
- data/spec/parsing/xml_spec.rb +11 -12
- data/spec/redis/barrier_spec.rb +8 -48
- data/spec/redis/counter_spec.rb +13 -1
- data/spec/redis/pool_spec.rb +1 -1
- data/spec/spec_helpers.rb +27 -16
- data/spec/support/test_app.rb +8 -0
- data/spec/task_spec.rb +3 -24
- data/spec/wayfarer_spec.rb +1 -1
- data/wayfarer.gemspec +4 -3
- metadata +61 -51
- data/.github/workflows/ci.yaml +0 -32
- data/docs/guides/error_handling.md +0 -53
- data/docs/guides/networking.md +0 -94
- data/docs/guides/performance.md +0 -130
- data/docs/guides/reliability.md +0 -41
- data/docs/guides/routing/steering.md +0 -30
- data/docs/reference/api/base.md +0 -48
- data/docs/reference/configuration_keys.md +0 -43
- data/docs/reference/environment_variables.md +0 -83
- data/lib/wayfarer/cli/base.rb +0 -45
- data/lib/wayfarer/cli/generate.rb +0 -17
- data/lib/wayfarer/cli/job.rb +0 -56
- data/lib/wayfarer/cli/route.rb +0 -29
- data/lib/wayfarer/cli/runner.rb +0 -34
- data/lib/wayfarer/cli/templates/Gemfile.tt +0 -5
- data/lib/wayfarer/cli/templates/job.rb.tt +0 -10
- data/lib/wayfarer/config/capybara.rb +0 -10
- data/lib/wayfarer/config/ferrum.rb +0 -11
- data/lib/wayfarer/config/networking.rb +0 -29
- data/lib/wayfarer/config/redis.rb +0 -14
- data/lib/wayfarer/config/root.rb +0 -11
- data/lib/wayfarer/config/selenium.rb +0 -21
- data/lib/wayfarer/config/strconv.rb +0 -45
- data/lib/wayfarer/config/struct.rb +0 -72
- data/lib/wayfarer/middleware/fetch.rb +0 -56
- data/lib/wayfarer/redis/connection.rb +0 -13
- data/lib/wayfarer/redis/version.rb +0 -19
- data/lib/wayfarer/routing/router.rb +0 -28
- data/spec/callbacks_spec.rb +0 -102
- data/spec/cli/generate_spec.rb +0 -39
- data/spec/config/capybara_spec.rb +0 -18
- data/spec/config/ferrum_spec.rb +0 -24
- data/spec/config/networking_spec.rb +0 -73
- data/spec/config/redis_spec.rb +0 -32
- data/spec/config/root_spec.rb +0 -31
- data/spec/config/selenium_spec.rb +0 -56
- data/spec/config/strconv_spec.rb +0 -58
- data/spec/config/struct_spec.rb +0 -66
- data/spec/integration/steering_spec.rb +0 -57
- data/spec/redis/version_spec.rb +0 -13
- data/spec/routing/router_spec.rb +0 -24
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: wayfarer
 version: !ruby/object:Gem::Version
-  version: 0.4.
+  version: 0.4.7
 platform: ruby
 authors:
 - Dominic Bauer
@@ -16,14 +16,14 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: '
+        version: '7.1'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: '
+        version: '7.1'
 - !ruby/object:Gem::Dependency
   name: addressable
   requirement: !ruby/object:Gem::Requirement
@@ -123,33 +123,33 @@ dependencies:
       - !ruby/object:Gem::Version
         version: '3.0'
 - !ruby/object:Gem::Dependency
-  name:
+  name: mock_redis
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '0.29'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '0.29'
 - !ruby/object:Gem::Dependency
-  name:
+  name: mustermann
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '1.1'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '1.1'
 - !ruby/object:Gem::Dependency
   name: net-http-persistent
   requirement: !ruby/object:Gem::Requirement
@@ -240,6 +240,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '1.0'
+- !ruby/object:Gem::Dependency
+  name: zeitwerk
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.4'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.4'
 - !ruby/object:Gem::Dependency
   name: cuprite
   requirement: !ruby/object:Gem::Requirement
@@ -373,12 +387,15 @@ executables:
 extensions: []
 extra_rdoc_files: []
 files:
-- ".github/workflows/
+- ".github/workflows/lint.yaml"
+- ".github/workflows/release.yaml"
+- ".github/workflows/tests.yaml"
 - ".gitignore"
 - ".rbenv-gemsets"
 - ".rspec"
 - ".rubocop.yml"
 - ".ruby-version"
+- ".vale.ini"
 - ".yardopts"
 - Dockerfile
 - Gemfile
@@ -395,60 +412,53 @@ files:
 - docs/cookbook/querying_html.md
 - docs/cookbook/screenshots.md
 - docs/cookbook/user_agent.md
+- docs/design.md
 - docs/guides/callbacks.md
 - docs/guides/configuration.md
 - docs/guides/debugging.md
-- docs/guides/
+- docs/guides/handlers.md
+- docs/guides/index.md
 - docs/guides/jobs.md
+- docs/guides/jobs/error_handling.md
 - docs/guides/navigation.md
-- docs/guides/networking.md
 - docs/guides/networking/capybara.md
 - docs/guides/networking/custom_adapters.md
 - docs/guides/networking/ferrum.md
 - docs/guides/networking/http.md
 - docs/guides/networking/selenium.md
 - docs/guides/pages.md
-- docs/guides/
-- docs/guides/
-- docs/guides/routing/steering.md
+- docs/guides/redis.md
+- docs/guides/routing.md
 - docs/guides/tasks.md
+- docs/guides/tutorial.md
+- docs/guides/user_agents.md
 - docs/index.md
-- docs/reference/api/base.md
 - docs/reference/api/route.md
 - docs/reference/cli.md
-- docs/reference/
-- docs/reference/environment_variables.md
+- docs/reference/configuration.md
 - lib/wayfarer.rb
 - lib/wayfarer/base.rb
+- lib/wayfarer/batch_completion.rb
 - lib/wayfarer/callbacks.rb
-- lib/wayfarer/cli
-- lib/wayfarer/cli/generate.rb
-- lib/wayfarer/cli/job.rb
-- lib/wayfarer/cli/route.rb
+- lib/wayfarer/cli.rb
 - lib/wayfarer/cli/route_printer.rb
-- lib/wayfarer/cli/runner.rb
-- lib/wayfarer/cli/templates/Gemfile.tt
-- lib/wayfarer/cli/templates/job.rb.tt
-- lib/wayfarer/config/capybara.rb
-- lib/wayfarer/config/ferrum.rb
-- lib/wayfarer/config/networking.rb
-- lib/wayfarer/config/redis.rb
-- lib/wayfarer/config/root.rb
-- lib/wayfarer/config/selenium.rb
-- lib/wayfarer/config/strconv.rb
-- lib/wayfarer/config/struct.rb
 - lib/wayfarer/gc.rb
 - lib/wayfarer/handler.rb
+- lib/wayfarer/logging.rb
 - lib/wayfarer/middleware/base.rb
+- lib/wayfarer/middleware/batch_completion.rb
 - lib/wayfarer/middleware/chain.rb
+- lib/wayfarer/middleware/content_type.rb
 - lib/wayfarer/middleware/controller.rb
 - lib/wayfarer/middleware/dedup.rb
 - lib/wayfarer/middleware/dispatch.rb
-- lib/wayfarer/middleware/fetch.rb
 - lib/wayfarer/middleware/lazy.rb
 - lib/wayfarer/middleware/normalize.rb
+- lib/wayfarer/middleware/redis.rb
 - lib/wayfarer/middleware/router.rb
 - lib/wayfarer/middleware/stage.rb
+- lib/wayfarer/middleware/uri_parser.rb
+- lib/wayfarer/middleware/user_agent.rb
 - lib/wayfarer/networking/capybara.rb
 - lib/wayfarer/networking/context.rb
 - lib/wayfarer/networking/ferrum.rb
@@ -459,13 +469,13 @@ files:
 - lib/wayfarer/networking/selenium.rb
 - lib/wayfarer/networking/strategy.rb
 - lib/wayfarer/page.rb
+- lib/wayfarer/parsing.rb
 - lib/wayfarer/parsing/json.rb
 - lib/wayfarer/parsing/xml.rb
 - lib/wayfarer/redis/barrier.rb
-- lib/wayfarer/redis/connection.rb
 - lib/wayfarer/redis/counter.rb
 - lib/wayfarer/redis/pool.rb
-- lib/wayfarer/redis/
+- lib/wayfarer/redis/resettable.rb
 - lib/wayfarer/routing/dsl.rb
 - lib/wayfarer/routing/matchers/custom.rb
 - lib/wayfarer/routing/matchers/host.rb
@@ -478,26 +488,21 @@ files:
 - lib/wayfarer/routing/result.rb
 - lib/wayfarer/routing/root_route.rb
 - lib/wayfarer/routing/route.rb
-- lib/wayfarer/routing/router.rb
 - lib/wayfarer/routing/target_route.rb
 - lib/wayfarer/serializer.rb
 - lib/wayfarer/stringify.rb
 - lib/wayfarer/task.rb
 - mkdocs.yml
+- rake/docs.rake
+- rake/lint.rake
+- rake/release.rake
+- rake/tests.rake
 - requirements.txt
 - spec/base_spec.rb
-- spec/
-- spec/cli/generate_spec.rb
+- spec/batch_completion_spec.rb
 - spec/cli/job_spec.rb
+- spec/cli/routing_spec.rb
 - spec/cli/version_spec.rb
-- spec/config/capybara_spec.rb
-- spec/config/ferrum_spec.rb
-- spec/config/networking_spec.rb
-- spec/config/redis_spec.rb
-- spec/config/root_spec.rb
-- spec/config/selenium_spec.rb
-- spec/config/strconv_spec.rb
-- spec/config/struct_spec.rb
 - spec/factories/middleware.rb
 - spec/factories/page.rb
 - spec/factories/task.rb
@@ -505,18 +510,25 @@ files:
 - spec/gc_spec.rb
 - spec/handler_spec.rb
 - spec/integration/callbacks_spec.rb
+- spec/integration/content_type_spec.rb
+- spec/integration/gc_spec.rb
+- spec/integration/handler_spec.rb
 - spec/integration/page_spec.rb
 - spec/integration/params_spec.rb
+- spec/integration/parsing_spec.rb
+- spec/integration/routing_spec.rb
 - spec/integration/stage_spec.rb
-- spec/
+- spec/middleware/batch_completion_spec.rb
 - spec/middleware/chain_spec.rb
+- spec/middleware/content_type_spec.rb
 - spec/middleware/controller_spec.rb
 - spec/middleware/dedup_spec.rb
 - spec/middleware/dispatch_spec.rb
-- spec/middleware/fetch_spec.rb
 - spec/middleware/normalize_spec.rb
 - spec/middleware/router_spec.rb
 - spec/middleware/stage_spec.rb
+- spec/middleware/uri_parser_spec.rb
+- spec/middleware/user_agent_spec.rb
 - spec/networking/capybara_spec.rb
 - spec/networking/context_spec.rb
 - spec/networking/ferrum_spec.rb
@@ -531,7 +543,6 @@ files:
 - spec/redis/barrier_spec.rb
 - spec/redis/counter_spec.rb
 - spec/redis/pool_spec.rb
-- spec/redis/version_spec.rb
 - spec/routing/dsl_spec.rb
 - spec/routing/integration_spec.rb
 - spec/routing/matchers/custom_spec.rb
@@ -544,7 +555,6 @@ files:
 - spec/routing/path_finder_spec.rb
 - spec/routing/root_route_spec.rb
 - spec/routing/route_spec.rb
-- spec/routing/router_spec.rb
 - spec/spec_helpers.rb
 - spec/stringify_spec.rb
 - spec/support/static/finders.html
data/.github/workflows/ci.yaml
DELETED
@@ -1,32 +0,0 @@
-name: ci
-
-on:
-  push:
-    branches:
-      - '*'
-env:
-  CI: true
-
-jobs:
-  ci:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v2
-
-      - name: Start services
-        run: docker-compose up -d
-
-      - name: Run isolated tests
-        run: docker-compose run --rm --name test --service-ports wayfarer bundle exec rake test:isolated
-
-      - name: Run Ferrum tests
-        run: docker-compose run --rm --name test --service-ports wayfarer bundle exec rake test:ferrum
-
-      - name: Run Selenium tests
-        run: docker-compose run --rm --name test --service-ports wayfarer bundle exec rake test:selenium
-
-      - name: Run CLI tests
-        run: docker-compose run --rm --name test --service-ports wayfarer bundle exec rake test:cli
-
-      - name: Run RuboCop
-        run: docker-compose run --rm --name test --service-ports wayfarer bundle exec rake rubocop
data/docs/guides/error_handling.md
DELETED
@@ -1,53 +0,0 @@
-# Error handling
-
-## Wayfarer never swallows exceptions
-
-* Wayfarer never swallows exceptions.
-* Jobs with unhandled exceptions are not retried.
-
-## Retrying or discarding failing jobs
-
-Wayfarer relies on [Active Job's two error handling facilities](https://guides.rubyonrails.org/active_job_basics.html#exceptions).
-
-* `retry_on` to retry jobs a number of times on certain errors:
-
-```ruby
-class DummyJob < Wayfarer::Base
-  retry_on MyError, attempts: 3 do |job, error|
-    # This block runs once all 3 attempts have failed
-    # (1 initial attempt + 2 retries)
-
-    raise error
-  end
-end
-```
-
-* `discard_on` to throw away jobs on certain errors:
-
-```ruby
-class DummyJob < Wayfarer::Base
-  discard_on MyError do |job, error|
-    # This block runs once and buries the job
-
-    raise error
-  end
-end
-```
-
-!!! attention "Always re-raise errors"
-
-    You should always re-raise errors from `retry_on` and `discard_on` blocks,
-    otherwise jobs will not get retried!
-
-## Renewing agents on certain errors
-
-```ruby
-Wayfarer.config.network.renew_on = [MyError]
-```
-
-For example, if you use the Capybara
-[Cuprite](https://github.com/rubycdp/cuprite) driver:
-
-```ruby
-Wayfarer.config.network.renew_on = [Ferrum::DeadBrowserError]
-```
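The attempt counting in the deleted error-handling guide (`attempts: 3` means one initial attempt plus two retries, and the block runs only after the last failure) can be illustrated without Active Job. This is a hypothetical, synchronous sketch: `run_with_retries` and the `MyError` class are illustrative names, not Wayfarer or Active Job API, and real `retry_on` re-enqueues the job instead of looping in-process.

```ruby
# Hypothetical error class standing in for MyError from the guide.
class MyError < StandardError; end

# Synchronous sketch of retry_on-style counting: `attempts` is the total
# number of executions (1 initial + attempts - 1 retries). The `exhausted`
# handler runs once, after the final failure, and receives the error.
def run_with_retries(attempts:, on:, exhausted:)
  executions = 0
  begin
    executions += 1
    yield executions
  rescue on => error
    retry if executions < attempts
    exhausted.call(error, executions)
  end
end

calls = []
outcome = run_with_retries(
  attempts: 3,
  on: MyError,
  exhausted: ->(error, executions) { [:gave_up, executions, error.message] }
) do |execution|
  calls << execution
  raise MyError, "boom"
end

p calls   # => [1, 2, 3]
p outcome # => [:gave_up, 3, "boom"]
```

In the guide's version the block re-raises `error` so the failure stays visible to the queue; the sketch returns a value instead so the outcome is easy to inspect.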
data/docs/guides/networking.md
DELETED
@@ -1,94 +0,0 @@
-# Networking
-
-Wayfarer navigates the web in two ways:
-
-1. Via plain HTTP requests
-2. By automating browsers
-
-Both options are mutually exclusive per Ruby process.
-
-## User agents
-
-A user agent is an entity that knows how to retrieve the contents behind a URL.
-
-The user agent can be configured via the global configuration:
-
-```ruby
-Wayfarer.config.network.agent = :http # or :ferrum, :selenium
-```
-
-## Connection pooling
-
-Wayfarer keeps user agents within a connection pool. When a job executes
-and needs to retrieve the contents behind a URL, an agent is checked out from
-the pool.
-
-The pool has a constant size and it should equal the number of threads the
-underlying message queue operates with. The size can be configured via the
-global configuration:
-
-```ruby
-Wayfarer.config.network.pool_size = 8
-```
-
-### Timeouts
-
-user agents may stay checked out from the pool by jobs for a limited time
-only. Once this time limit is exceeded, a `ConnectionPool::TimeoutError`
-exception is raised. This places a hard time limit on every job.
-
-The timeout can be configured via the global configuration:
-
-```ruby
-Wayfarer.config.network.pool_timeout = 20 # seconds
-```
-
-Because jobs with unhandled exceptions fail, explicit error handling is required
-if retries are desired:
-
-```ruby
-class DummyJob < Wayfarer::Base
-  retry_on ConnectionPool::TimeoutError, attempts: 3
-end
-```
-
-## Agent-specific client timeouts
-
-The time in seconds it may take to communicate with remote browser processes can
-be configured globally per agent:
-
-```ruby
-Wayfarer.config.ferrum.options = { timeout: 5 }
-Wayfarer.config.selenium.client_timeout = 60
-```
-
-### Shared state
-
-As user agents get checked in and out continously between jobs, their state
-carries over from job to job, too.
-
-For browser automation, this means:
-
-* A job finds the browser at the last URL the previous job has left off.
-* The browser's cookies might have been set, or other client-side state might
-  exist that significantly affects a page's behaviour.
-
-## HTTP redirect handling
-
-Browsers follow redirects transparently when they are navigated to a URL.
-
-When using plain HTTP, redirect URLs are enqueued transparently within the same
-batch. URLs that result in 3xx responses will not be retrieved again within
-their batch.
-
-## HTTP request headers
-
-Request headers can be configured via the global configuration:
-
-```ruby
-Wayfarer.config.network.http_headers = { "Field" => "Value" }
-```
-
-!!! attention "Partial support"
-
-    Selenium does not support configuring HTTP request headers.
DELETED
@@ -1,130 +0,0 @@
|
|
1
|
-
# Performance
|
2
|
-
|
3
|
-
How to write performant crawlers with Wayfarer.
|
4
|
-
|
5
|
-
## Use a sufficiently sized user agent pool
|
6
|
-
|
7
|
-
Automated browser processes or HTTP clients are kept in a [connection pool]() of
|
8
|
-
static size. This avoids having to re-establish browser processes and enables
|
9
|
-
their reuse.
|
10
|
-
|
11
|
-
If the size of the pool is too small, the pool is a
|
12
|
-
bottleneck. For example, if your message queue adapter uses 8 threads, but the
|
13
|
-
pool only contains 1 user agent, the remaining 7 threads block until the agent
|
14
|
-
is checked back in to the pool for use by one of the blocked threads.
|
15
|
-
|
16
|
-
There is no reliable way to detect the number of threads of the underlying
|
17
|
-
message queue adapter. The pool size should equal the number of threads;
|
18
|
-
|
19
|
-
```ruby
|
20
|
-
Wayfarer.config.network.pool_size = 8 # defaults to 1
|
21
|
-
```
|
22
|
-
|
23
|
-
### Job shedding
|
24
|
-
|
25
|
-
There is a maximum number of seconds that jobs wait when checking out a user
|
26
|
-
agent from the pool. Once this time is exceeded,
|
27
|
-
a `Wayfarer::UserAgentTimeoutError` is raised. By default, the timeout is 10
|
28
|
-
seconds.
|
29
|
-
|
30
|
-
This hints there are more threads in use than user agents in the pool.
|
31
|
-
|
32
|
-
## Stage less URLs
|
33
|
-
|
34
|
-
Staging less URLs saves space and time:
|
35
|
-
|
36
|
-
* Less tasks written to the message queue
|
37
|
-
* Less time spent consuming tasks
|
38
|
-
* Less time spent filtering URLs with Redis
|
39
|
-
|
40
|
-
Wayfarer maintains a set of processed URLs for a batch in Redis. Every staged
|
41
|
-
URL is checked for inclusion in this set before it gets appended as a task to
|
42
|
-
the message queue.
|
43
|
-
|
44
|
-
A common pattern is to stage all links of a page, and rely on routing to fetch
|
45
|
-
only the relevant ones:
|
46
|
-
|
47
|
-
```ruby
|
48
|
-
class DummyJob < Wayfarer::Base
|
49
|
-
route { to: index, host: "example.com" }
|
50
|
-
|
51
|
-
def index
|
52
|
-
stage page.meta.links.all
|
53
|
-
end
|
54
|
-
end
|
55
|
-
```
|
56
|
-
|
57
|
-
Pages commonly contain a large number of URLs.
|
58
|
-
|
59
|
-
Every staged URL is:
|
60
|
-
|
61
|
-
1. Normalized to a canonical form, for example by sorting query parameters
|
62
|
-
alphabetically.
|
63
|
-
2. Checked for inclusion in the batch Redis set or discarded.
|
64
|
-
3. Written to the message queue.
|
65
|
-
4. Consumed from the queue and matched against the router.
|
66
|
-
5. Fetched, if a route matches.
|
67
|
-
|
68
|
-
Narrowing down the links in the document to follow speeds up the process.
|
69
|
-
For example using Nokogiri, interesting links can be identified with a CSS
|
70
|
-
selector:
|
71
|
-
|
72
|
-
```ruby
|
73
|
-
class DummyJob < Wayfarer::Base
|
74
|
-
route { to: index, host: "example.com" }
|
75
|
-
|
76
|
-
def index
|
77
|
-
stage interesting_links
|
78
|
-
end
|
79
|
-
|
80
|
-
private
|
81
|
-
|
82
|
-
def interesting_links
|
83
|
-
page.doc.css("a.interesting").map { |elem| elem["href"] }
|
84
|
-
end
|
85
|
-
end
|
86
|
-
```
|
87
|
-
|
88
|
-
Because the router only accepts the single hostname `example.com`, the job can
|
89
|
-
also ensure it stages only internal URLs by intersecting them with the
|
90
|
-
interesting ones:
|
91
|
-
|
92
|
-
```ruby
|
93
|
-
class DummyJob < Wayfarer::Base
|
94
|
-
route { to: index, host: "example.com" }
|
95
|
-
|
96
|
-
def index
|
97
|
-
stage interesting_internal_links
|
98
|
-
end
|
99
|
-
|
100
|
-
private
|
101
|
-
|
102
|
-
def interesting_internal_links
|
103
|
-
page.meta.links.internal & interesting_links
|
104
|
-
end
|
105
|
-
|
106
|
-
def interesting_links
|
107
|
-
page.doc.css("a.interesting").map { |elem| elem["href"] }
|
108
|
-
end
|
109
|
-
end
|
110
|
-
```
|
111
|
-
|
112
|
-
|
113
|
-
## Use Redis >= 6.2.0
|
114
|
-
|
115
|
-
Redis 6.2.0 introduced the
|
116
|
-
[`SMISMEMBER`](https://redis.io/commands/smismember) command which enables
|
117
|
-
Wayfarer to check whether multiple URLs have been processed in a batch with a
|
118
|
-
single command. With earlier versions, one command per URL is required.
|
119
|
-
|
120
|
-
Wayfarer detects the Redis server version and uses `SMISMEMBER` without user
|
121
|
-
configuration when supported.
|
122
|
-
|
123
|
-
## Use Oj for JSON parsing
|
124
|
-
|
125
|
-
Wayfarer uses [Oj](https://github.com/ohler55/oj) for JSON parsing if the gem
|
126
|
-
has been required at runtime:
|
127
|
-
|
128
|
-
```ruby
|
129
|
-
require "oj"
|
130
|
-
```
|
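Steps 1 and 2 of the deleted performance guide's staging list (normalize every staged URL, then drop URLs already processed in the batch), together with the single-call membership check that `SMISMEMBER` enables, can be sketched in plain Ruby. Assumptions: an in-memory `Set` stands in for the batch's Redis set, and `normalize_url`, `ProcessedSet` and `stage_urls` are hypothetical helpers; Wayfarer's actual normalization rules may differ.

```ruby
require "uri"
require "set"

# Hypothetical normalizer: lowercases the host, drops the fragment and
# sorts query parameters alphabetically.
def normalize_url(url)
  uri = URI.parse(url)
  uri.host = uri.host.downcase if uri.host
  uri.fragment = nil
  if uri.query
    pairs = URI.decode_www_form(uri.query).sort
    uri.query = URI.encode_www_form(pairs)
  end
  uri.to_s
end

# In-memory stand-in for the batch's Redis set of processed URLs.
# `membership` answers for many URLs in one call, the way SMISMEMBER does,
# instead of one round trip per URL.
class ProcessedSet
  def initialize
    @urls = Set.new
  end

  def membership(urls)
    urls.map { |url| @urls.include?(url) }
  end

  def add(urls)
    @urls.merge(urls)
  end
end

# Normalize, deduplicate, keep only URLs not yet processed in this batch,
# and record the survivors as processed.
def stage_urls(urls, processed)
  candidates = urls.map { |url| normalize_url(url) }.uniq
  seen = processed.membership(candidates)
  fresh = candidates.zip(seen).reject { |_url, already| already }.map(&:first)
  processed.add(fresh)
  fresh
end

processed = ProcessedSet.new
first = stage_urls(["https://example.com/?b=2&a=1", "https://example.com/?a=1&b=2#top"], processed)
second = stage_urls(["https://example.com/?a=1&b=2", "https://example.com/other"], processed)
p first  # => ["https://example.com/?a=1&b=2"]
p second # => ["https://example.com/other"]
```

Both spellings in the first call collapse to one canonical URL, so only one task would be enqueued; the second call skips it because it is already in the processed set.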
data/docs/guides/reliability.md
DELETED
@@ -1,41 +0,0 @@
-# Reliablity
-
-## Durability
-
-Wayfarer executes atop reliable messages queues such as Sidekiq, Resque,
-RabbitMQ, etc. Its configuration is independent of the underlying queue
-infrastructure it reads from and writes to.
-
-## Self-healing user agents
-
-Wayfarer handles the scenario where a remote browser process has crashed and
-must be replaced by a fresh browser process.
-
-This can be tested locally by automating a browser with headless mode turned
-off, and then closing the opened browser window: The current job fails, but the
-next job has access to a newly established browser session again.
-
-For example Ferrum might raise `Ferrum::DeadBrowserError`. Wayfarer's
-user agents are self-healing and react to these kinds of errors internally. When
-a browser window is closed, the Ferrum user agent attempts to establish a new
-browser process as a replacement, for the next job to use.
-
-[Wayfarer never swallows exceptions](/guides/error_handling). This means
-that even though the user agent might heal itself, jobs still need to explicitly
-retry browser errors:
-
-```ruby
-class Foobar < Wayfarer::Base
-  route { to: :index }
-
-  retry_on Ferrum::DeadBrowserError, attempts: 3, wait: :exponentially_longer
-
-  # ...
-end
-```
-
-This leads to log entries like:
-
-```
-Retrying DummyJob in 3 seconds, due to a Ferrum::DeadBrowserError.
-```
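The log line in the deleted reliability guide reports a 3-second delay before the first retry. That figure is consistent with Active Job's `:exponentially_longer` wait strategy as computed in older Rails releases, `executions**4 + 2` seconds; newer releases add random jitter on top, and Rails 7.1 deprecates the name in favour of `:polynomially_longer`. The sketch below reproduces only the jitter-free arithmetic; `retry_delay` is a hypothetical helper, not Active Job API.

```ruby
# Jitter-free delay for Active Job's :exponentially_longer strategy,
# assuming the executions**4 + 2 formula. `executions` counts how many
# times the job has run so far (1 after the first failure).
def retry_delay(executions)
  executions**4 + 2
end

# Delays before the 2nd and 3rd attempt of `attempts: 3`.
delays = (1..2).map { |executions| retry_delay(executions) }
p delays # => [3, 18]
```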
data/docs/guides/routing/steering.md
DELETED
@@ -1,30 +0,0 @@
-# Steering
-
-A job's router can receive arguments computed dynamically by `::steer`.
-Steering enables [batch routing](/cookbook/batch_routing).
-
-For example, the following router has hostname and path hard-coded:
-
-```ruby
-class DummyJob < Wayfarer::Base
-  route do
-    host "example.com", path: "/contact", to: :index
-  end
-end
-```
-
-Instead, hostname and path could be provided by `::steer`, too:
-
-```ruby
-class DummyJob < Wayfarer::Base
-  route do |hostname, path|
-    host hostname, path: path, to: :index
-  end
-
-  steer do |_task|
-    ["example.com", "/contact"]
-  end
-end
-```
-
-Note that `steer` yields the current [task](/guides/tasks).
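The mechanism in the deleted steering guide (the route block is evaluated with arguments that `steer` computes from the current task) can be mimicked with a tiny class-level DSL. This is a hypothetical illustration, not Wayfarer's router: `SteeredJob`, `ContactJob`, `FakeTask` and `rule_for` are invented names, and rules are plain Hashes instead of real routes. Unlike the guide's example, the `steer` block here derives the host from the task's URL.

```ruby
require "uri"

# Hypothetical task stand-in; Wayfarer's real Task class has its own API.
FakeTask = Struct.new(:url, :batch)

# `route` stores its block unevaluated; the block only runs once `steer`
# has computed arguments from the current task.
class SteeredJob
  class << self
    attr_reader :route_block, :steer_block

    def route(&block)
      @route_block = block
    end

    def steer(&block)
      @steer_block = block
    end

    # Calls the route block with the steered arguments (none without steer).
    def rule_for(task)
      args = steer_block ? steer_block.call(task) : []
      route_block.call(*args)
    end
  end
end

class ContactJob < SteeredJob
  route do |hostname, path|
    { host: hostname, path: path, to: :index }
  end

  steer do |task|
    [URI(task.url).host, "/contact"]
  end
end

rule = ContactJob.rule_for(FakeTask.new("https://example.com/contact", "batch-1"))
p rule[:host] # => "example.com"
```

Because the route block is kept as a closure, different tasks can yield different routing rules at runtime, which is what makes per-batch routing possible.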
data/docs/reference/api/base.md
DELETED
@@ -1,48 +0,0 @@
----
-title: Wayfarer::Base
----
-
-# `Wayfarer::Base`
-
-Wayfarer's complete job API.
-
----
-
-### `::route`
-: Draw routes to instance methods.
-
----
-
-### `::steer { (Wayfarer::Task) -> [any] }`
-: Provide router arguments.
-
----
-
-### `#task -> Wayfarer::Task`
-: The currently processing task.
-
----
-
-### `#params -> Hash`
-: URL parameters collected from the matching route.
-
----
-
-### `#stage(String | [String]) -> void`
-: Add URLs to a processing set. URLs already processed within the
-    current batch get discarded are not enqueued. Every staged URL gets
-    normalized.
-
----
-
-### `#browser -> Object`
-: The user agent that retrieved the current page.
-
----
-
-### `#page(live: true | false) -> Page`
-: The page representing the response retrieved from the currently
-    processing URL.
-
-    With `live: true` called, a fresh `Page` is returned that reflects the
-    current browser DOM. Calls to `#page` return the most recent page.