wayfarer 0.0.3
- checksums.yaml +7 -0
- data/.gitignore +8 -0
- data/.rbenv-gemsets +1 -0
- data/.rspec +3 -0
- data/.rubocop.yml +21 -0
- data/.ruby-version +1 -0
- data/.travis.yml +5 -0
- data/.yardopts +3 -0
- data/Changelog.md +10 -0
- data/Gemfile +11 -0
- data/LICENSE +19 -0
- data/README.md +21 -0
- data/Rakefile +114 -0
- data/benchmark/frontiers.rb +143 -0
- data/bin/wayfarer +116 -0
- data/docs/.gitignore +2 -0
- data/docs/_config.yml +15 -0
- data/docs/_includes/base.html +7 -0
- data/docs/_includes/head.html +10 -0
- data/docs/_includes/navigation.html +187 -0
- data/docs/_layouts/default.html +42 -0
- data/docs/_sass/base.scss +439 -0
- data/docs/_sass/variables.scss +24 -0
- data/docs/_sass/vendor/bourbon/_bourbon-deprecate.scss +19 -0
- data/docs/_sass/vendor/bourbon/_bourbon-deprecated-upcoming.scss +425 -0
- data/docs/_sass/vendor/bourbon/_bourbon.scss +90 -0
- data/docs/_sass/vendor/bourbon/addons/_border-color.scss +29 -0
- data/docs/_sass/vendor/bourbon/addons/_border-radius.scss +48 -0
- data/docs/_sass/vendor/bourbon/addons/_border-style.scss +28 -0
- data/docs/_sass/vendor/bourbon/addons/_border-width.scss +28 -0
- data/docs/_sass/vendor/bourbon/addons/_buttons.scss +69 -0
- data/docs/_sass/vendor/bourbon/addons/_clearfix.scss +25 -0
- data/docs/_sass/vendor/bourbon/addons/_ellipsis.scss +30 -0
- data/docs/_sass/vendor/bourbon/addons/_font-stacks.scss +31 -0
- data/docs/_sass/vendor/bourbon/addons/_hide-text.scss +27 -0
- data/docs/_sass/vendor/bourbon/addons/_margin.scss +29 -0
- data/docs/_sass/vendor/bourbon/addons/_padding.scss +29 -0
- data/docs/_sass/vendor/bourbon/addons/_position.scss +51 -0
- data/docs/_sass/vendor/bourbon/addons/_prefixer.scss +66 -0
- data/docs/_sass/vendor/bourbon/addons/_retina-image.scss +27 -0
- data/docs/_sass/vendor/bourbon/addons/_size.scss +56 -0
- data/docs/_sass/vendor/bourbon/addons/_text-inputs.scss +118 -0
- data/docs/_sass/vendor/bourbon/addons/_timing-functions.scss +34 -0
- data/docs/_sass/vendor/bourbon/addons/_triangle.scss +63 -0
- data/docs/_sass/vendor/bourbon/addons/_word-wrap.scss +29 -0
- data/docs/_sass/vendor/bourbon/css3/_animation.scss +61 -0
- data/docs/_sass/vendor/bourbon/css3/_appearance.scss +5 -0
- data/docs/_sass/vendor/bourbon/css3/_backface-visibility.scss +5 -0
- data/docs/_sass/vendor/bourbon/css3/_background-image.scss +44 -0
- data/docs/_sass/vendor/bourbon/css3/_background.scss +57 -0
- data/docs/_sass/vendor/bourbon/css3/_border-image.scss +61 -0
- data/docs/_sass/vendor/bourbon/css3/_calc.scss +6 -0
- data/docs/_sass/vendor/bourbon/css3/_columns.scss +67 -0
- data/docs/_sass/vendor/bourbon/css3/_filter.scss +6 -0
- data/docs/_sass/vendor/bourbon/css3/_flex-box.scss +327 -0
- data/docs/_sass/vendor/bourbon/css3/_font-face.scss +29 -0
- data/docs/_sass/vendor/bourbon/css3/_font-feature-settings.scss +6 -0
- data/docs/_sass/vendor/bourbon/css3/_hidpi-media-query.scss +12 -0
- data/docs/_sass/vendor/bourbon/css3/_hyphens.scss +6 -0
- data/docs/_sass/vendor/bourbon/css3/_image-rendering.scss +15 -0
- data/docs/_sass/vendor/bourbon/css3/_keyframes.scss +38 -0
- data/docs/_sass/vendor/bourbon/css3/_linear-gradient.scss +40 -0
- data/docs/_sass/vendor/bourbon/css3/_perspective.scss +12 -0
- data/docs/_sass/vendor/bourbon/css3/_placeholder.scss +10 -0
- data/docs/_sass/vendor/bourbon/css3/_radial-gradient.scss +40 -0
- data/docs/_sass/vendor/bourbon/css3/_selection.scss +44 -0
- data/docs/_sass/vendor/bourbon/css3/_text-decoration.scss +27 -0
- data/docs/_sass/vendor/bourbon/css3/_transform.scss +21 -0
- data/docs/_sass/vendor/bourbon/css3/_transition.scss +81 -0
- data/docs/_sass/vendor/bourbon/css3/_user-select.scss +5 -0
- data/docs/_sass/vendor/bourbon/functions/_assign-inputs.scss +16 -0
- data/docs/_sass/vendor/bourbon/functions/_contains-falsy.scss +25 -0
- data/docs/_sass/vendor/bourbon/functions/_contains.scss +31 -0
- data/docs/_sass/vendor/bourbon/functions/_is-length.scss +16 -0
- data/docs/_sass/vendor/bourbon/functions/_is-light.scss +26 -0
- data/docs/_sass/vendor/bourbon/functions/_is-number.scss +16 -0
- data/docs/_sass/vendor/bourbon/functions/_is-size.scss +23 -0
- data/docs/_sass/vendor/bourbon/functions/_modular-scale.scss +74 -0
- data/docs/_sass/vendor/bourbon/functions/_px-to-em.scss +24 -0
- data/docs/_sass/vendor/bourbon/functions/_px-to-rem.scss +26 -0
- data/docs/_sass/vendor/bourbon/functions/_shade.scss +24 -0
- data/docs/_sass/vendor/bourbon/functions/_strip-units.scss +22 -0
- data/docs/_sass/vendor/bourbon/functions/_tint.scss +24 -0
- data/docs/_sass/vendor/bourbon/functions/_transition-property-name.scss +37 -0
- data/docs/_sass/vendor/bourbon/functions/_unpack.scss +32 -0
- data/docs/_sass/vendor/bourbon/helpers/_convert-units.scss +26 -0
- data/docs/_sass/vendor/bourbon/helpers/_directional-values.scss +108 -0
- data/docs/_sass/vendor/bourbon/helpers/_font-source-declaration.scss +53 -0
- data/docs/_sass/vendor/bourbon/helpers/_gradient-positions-parser.scss +24 -0
- data/docs/_sass/vendor/bourbon/helpers/_linear-angle-parser.scss +35 -0
- data/docs/_sass/vendor/bourbon/helpers/_linear-gradient-parser.scss +51 -0
- data/docs/_sass/vendor/bourbon/helpers/_linear-positions-parser.scss +77 -0
- data/docs/_sass/vendor/bourbon/helpers/_linear-side-corner-parser.scss +41 -0
- data/docs/_sass/vendor/bourbon/helpers/_radial-arg-parser.scss +74 -0
- data/docs/_sass/vendor/bourbon/helpers/_radial-gradient-parser.scss +55 -0
- data/docs/_sass/vendor/bourbon/helpers/_radial-positions-parser.scss +28 -0
- data/docs/_sass/vendor/bourbon/helpers/_render-gradients.scss +31 -0
- data/docs/_sass/vendor/bourbon/helpers/_shape-size-stripper.scss +15 -0
- data/docs/_sass/vendor/bourbon/helpers/_str-to-num.scss +55 -0
- data/docs/_sass/vendor/bourbon/settings/_asset-pipeline.scss +7 -0
- data/docs/_sass/vendor/bourbon/settings/_deprecation-warnings.scss +8 -0
- data/docs/_sass/vendor/bourbon/settings/_prefixer.scss +9 -0
- data/docs/_sass/vendor/bourbon/settings/_px-to-em.scss +1 -0
- data/docs/_sass/vendor/neat/_neat-helpers.scss +11 -0
- data/docs/_sass/vendor/neat/_neat.scss +23 -0
- data/docs/_sass/vendor/neat/functions/_new-breakpoint.scss +49 -0
- data/docs/_sass/vendor/neat/functions/_private.scss +114 -0
- data/docs/_sass/vendor/neat/grid/_box-sizing.scss +15 -0
- data/docs/_sass/vendor/neat/grid/_direction-context.scss +33 -0
- data/docs/_sass/vendor/neat/grid/_display-context.scss +28 -0
- data/docs/_sass/vendor/neat/grid/_fill-parent.scss +22 -0
- data/docs/_sass/vendor/neat/grid/_media.scss +92 -0
- data/docs/_sass/vendor/neat/grid/_omega.scss +87 -0
- data/docs/_sass/vendor/neat/grid/_outer-container.scss +34 -0
- data/docs/_sass/vendor/neat/grid/_pad.scss +25 -0
- data/docs/_sass/vendor/neat/grid/_private.scss +35 -0
- data/docs/_sass/vendor/neat/grid/_row.scss +52 -0
- data/docs/_sass/vendor/neat/grid/_shift.scss +50 -0
- data/docs/_sass/vendor/neat/grid/_span-columns.scss +94 -0
- data/docs/_sass/vendor/neat/grid/_to-deprecate.scss +97 -0
- data/docs/_sass/vendor/neat/grid/_visual-grid.scss +42 -0
- data/docs/_sass/vendor/neat/mixins/_clearfix.scss +25 -0
- data/docs/_sass/vendor/neat/settings/_disable-warnings.scss +13 -0
- data/docs/_sass/vendor/neat/settings/_grid.scss +51 -0
- data/docs/_sass/vendor/neat/settings/_visual-grid.scss +27 -0
- data/docs/_sass/vendor/normalize-3.0.2.scss +427 -0
- data/docs/_sass/vendor/pygments.scss +356 -0
- data/docs/automating_browsers/capybara.md +70 -0
- data/docs/css/screen.scss +7 -0
- data/docs/guides/callbacks.md +45 -0
- data/docs/guides/cli.md +52 -0
- data/docs/guides/configuration.md +184 -0
- data/docs/guides/error_handling.md +46 -0
- data/docs/guides/frontiers.md +93 -0
- data/docs/guides/halting.md +23 -0
- data/docs/guides/job_queues.md +26 -0
- data/docs/guides/locals.md +36 -0
- data/docs/guides/logging.md +22 -0
- data/docs/guides/page_objects.md +67 -0
- data/docs/guides/peeking.md +46 -0
- data/docs/guides/selenium_capybara.md +100 -0
- data/docs/guides/tutorial.md +452 -0
- data/docs/index.md +82 -0
- data/docs/js/navigation.js +11 -0
- data/docs/misc/contributing.md +20 -0
- data/docs/misc/testing.md +11 -0
- data/docs/recipes/authentication.md +23 -0
- data/docs/recipes/csv.md +29 -0
- data/docs/recipes/javascript.md +20 -0
- data/docs/recipes/multiple_uris.md +18 -0
- data/docs/recipes/screenshots.md +20 -0
- data/docs/routing/custom_rules.md +16 -0
- data/docs/routing/filetypes_rules.md +21 -0
- data/docs/routing/host_rules.md +24 -0
- data/docs/routing/path_rules.md +33 -0
- data/docs/routing/protocol_rules.md +17 -0
- data/docs/routing/query_rules.md +69 -0
- data/docs/routing/routes.md +96 -0
- data/docs/routing/uri_rules.md +18 -0
- data/examples/collect_github_issues.rb +65 -0
- data/examples/find_foobar_on_wikipedia.rb +23 -0
- data/lib/wayfarer/configuration.rb +86 -0
- data/lib/wayfarer/crawl.rb +79 -0
- data/lib/wayfarer/crawl_observer.rb +103 -0
- data/lib/wayfarer/dispatcher.rb +104 -0
- data/lib/wayfarer/finders.rb +61 -0
- data/lib/wayfarer/frontiers/frontier.rb +79 -0
- data/lib/wayfarer/frontiers/memory_bloomfilter.rb +32 -0
- data/lib/wayfarer/frontiers/memory_frontier.rb +76 -0
- data/lib/wayfarer/frontiers/memory_trie_frontier.rb +39 -0
- data/lib/wayfarer/frontiers/normalize_uris.rb +48 -0
- data/lib/wayfarer/frontiers/redis_bloomfilter.rb +34 -0
- data/lib/wayfarer/frontiers/redis_frontier.rb +83 -0
- data/lib/wayfarer/http_adapters/adapter_pool.rb +62 -0
- data/lib/wayfarer/http_adapters/net_http_adapter.rb +77 -0
- data/lib/wayfarer/http_adapters/selenium_adapter.rb +80 -0
- data/lib/wayfarer/job.rb +211 -0
- data/lib/wayfarer/locals.rb +40 -0
- data/lib/wayfarer/page.rb +94 -0
- data/lib/wayfarer/parsers/json_parser.rb +20 -0
- data/lib/wayfarer/parsers/xml_parser.rb +27 -0
- data/lib/wayfarer/processor.rb +103 -0
- data/lib/wayfarer/routing/custom_rule.rb +21 -0
- data/lib/wayfarer/routing/filetypes_rule.rb +20 -0
- data/lib/wayfarer/routing/host_rule.rb +19 -0
- data/lib/wayfarer/routing/path_rule.rb +54 -0
- data/lib/wayfarer/routing/protocol_rule.rb +21 -0
- data/lib/wayfarer/routing/query_rule.rb +59 -0
- data/lib/wayfarer/routing/router.rb +71 -0
- data/lib/wayfarer/routing/rule.rb +114 -0
- data/lib/wayfarer/routing/uri_rule.rb +21 -0
- data/lib/wayfarer.rb +68 -0
- data/spec/configuration_spec.rb +26 -0
- data/spec/crawl_spec.rb +48 -0
- data/spec/finders_spec.rb +49 -0
- data/spec/frontiers/memory_bloomfilter_spec.rb +6 -0
- data/spec/frontiers/memory_frontier_spec.rb +6 -0
- data/spec/frontiers/memory_trie_frontier_spec.rb +6 -0
- data/spec/frontiers/normalize_uris_spec.rb +59 -0
- data/spec/frontiers/redis_bloomfilter_spec.rb +6 -0
- data/spec/frontiers/redis_frontier_spec.rb +6 -0
- data/spec/http_adapters/adapter_pool_spec.rb +33 -0
- data/spec/http_adapters/net_http_adapter_spec.rb +83 -0
- data/spec/http_adapters/selenium_adapter_spec.rb +53 -0
- data/spec/integration/callbacks_spec.rb +42 -0
- data/spec/integration/locals_spec.rb +106 -0
- data/spec/integration/peeking_spec.rb +61 -0
- data/spec/job_spec.rb +122 -0
- data/spec/page_spec.rb +38 -0
- data/spec/parsers/json_parser_spec.rb +30 -0
- data/spec/parsers/xml_parser_spec.rb +24 -0
- data/spec/processor_spec.rb +31 -0
- data/spec/routing/custom_rule_spec.rb +26 -0
- data/spec/routing/filetypes_rule_spec.rb +40 -0
- data/spec/routing/host_rule_spec.rb +48 -0
- data/spec/routing/path_rule_spec.rb +66 -0
- data/spec/routing/protocol_rule_spec.rb +26 -0
- data/spec/routing/query_rule_spec.rb +124 -0
- data/spec/routing/router_spec.rb +67 -0
- data/spec/routing/rule_spec.rb +251 -0
- data/spec/routing/uri_rule_spec.rb +24 -0
- data/spec/shared/frontier.rb +96 -0
- data/spec/spec_helpers.rb +62 -0
- data/spec/wayfarer_spec.rb +24 -0
- data/support/static/finders.html +38 -0
- data/support/static/graph/details/a.html +10 -0
- data/support/static/graph/details/b.html +10 -0
- data/support/static/graph/index.html +20 -0
- data/support/static/json/dummy.json +13 -0
- data/support/static/links/links.html +28 -0
- data/support/static/xml/dummy.xml +120 -0
- data/support/test_app.rb +45 -0
- data/wayfarer-jruby.gemspec +49 -0
- data/wayfarer.gemspec +53 -0
- metadata +697 -0
|
@@ -0,0 +1,356 @@
|
|
|
1
|
+
/* Generated by Pygments CSS Theme Builder - https://jwarby.github.io/jekyll-pygments-themes/builder.html */
|
|
2
|
+
/* Base Style */
|
|
3
|
+
.highlight pre {
|
|
4
|
+
color: #333333;
|
|
5
|
+
background-color: transparent;
|
|
6
|
+
}
|
|
7
|
+
/* Punctuation */
|
|
8
|
+
.highlight .p {
|
|
9
|
+
color: #333333;
|
|
10
|
+
background-color: transparent;
|
|
11
|
+
}
|
|
12
|
+
/* Error */
|
|
13
|
+
.highlight .err {
|
|
14
|
+
color: #333333;
|
|
15
|
+
background-color: transparent;
|
|
16
|
+
}
|
|
17
|
+
/* Base Style */
|
|
18
|
+
.highlight .n {
|
|
19
|
+
color: #333333;
|
|
20
|
+
background-color: transparent;
|
|
21
|
+
}
|
|
22
|
+
/* Name Attribute */
|
|
23
|
+
.highlight .na {
|
|
24
|
+
color: #333333;
|
|
25
|
+
background-color: transparent;
|
|
26
|
+
}
|
|
27
|
+
/* Name Builtin */
|
|
28
|
+
.highlight .nb {
|
|
29
|
+
color: #333333;
|
|
30
|
+
background-color: transparent;
|
|
31
|
+
}
|
|
32
|
+
/* Name Class */
|
|
33
|
+
.highlight .nc {
|
|
34
|
+
color: #333333;
|
|
35
|
+
background-color: transparent;
|
|
36
|
+
}
|
|
37
|
+
/* Name Constant */
|
|
38
|
+
.highlight .no {
|
|
39
|
+
color: #333333;
|
|
40
|
+
background-color: transparent;
|
|
41
|
+
}
|
|
42
|
+
/* Name Decorator */
|
|
43
|
+
.highlight .nd {
|
|
44
|
+
color: #333333;
|
|
45
|
+
background-color: transparent;
|
|
46
|
+
}
|
|
47
|
+
/* Name Entity */
|
|
48
|
+
.highlight .ni {
|
|
49
|
+
color: #a20e30;
|
|
50
|
+
background-color: transparent;
|
|
51
|
+
}
|
|
52
|
+
/* Name Exception */
|
|
53
|
+
.highlight .ne {
|
|
54
|
+
color: #333333;
|
|
55
|
+
background-color: transparent;
|
|
56
|
+
}
|
|
57
|
+
/* Name Function */
|
|
58
|
+
.highlight .nf {
|
|
59
|
+
color: #333333;
|
|
60
|
+
background-color: transparent;
|
|
61
|
+
}
|
|
62
|
+
/* Name Label */
|
|
63
|
+
.highlight .nl {
|
|
64
|
+
color: #333333;
|
|
65
|
+
background-color: transparent;
|
|
66
|
+
}
|
|
67
|
+
/* Name Namespace */
|
|
68
|
+
.highlight .nn {
|
|
69
|
+
color: #333333;
|
|
70
|
+
background-color: transparent;
|
|
71
|
+
}
|
|
72
|
+
/* Name Other */
|
|
73
|
+
.highlight .nx {
|
|
74
|
+
color: #333333;
|
|
75
|
+
background-color: transparent;
|
|
76
|
+
}
|
|
77
|
+
/* Name Property */
|
|
78
|
+
.highlight .py {
|
|
79
|
+
color: #333333;
|
|
80
|
+
background-color: transparent;
|
|
81
|
+
}
|
|
82
|
+
/* Name Tag */
|
|
83
|
+
.highlight .nt {
|
|
84
|
+
color: #333333;
|
|
85
|
+
background-color: transparent;
|
|
86
|
+
}
|
|
87
|
+
/* Name Variable */
|
|
88
|
+
.highlight .nv {
|
|
89
|
+
color: #333333;
|
|
90
|
+
background-color: transparent;
|
|
91
|
+
}
|
|
92
|
+
/* Name Variable Class */
|
|
93
|
+
.highlight .vc {
|
|
94
|
+
color: #333333;
|
|
95
|
+
background-color: transparent;
|
|
96
|
+
}
|
|
97
|
+
/* Name Variable Global */
|
|
98
|
+
.highlight .vg {
|
|
99
|
+
color: #333333;
|
|
100
|
+
background-color: transparent;
|
|
101
|
+
}
|
|
102
|
+
/* Name Variable Instance */
|
|
103
|
+
.highlight .vi {
|
|
104
|
+
color: #333333;
|
|
105
|
+
background-color: transparent;
|
|
106
|
+
}
|
|
107
|
+
/* Name Builtin Pseudo */
|
|
108
|
+
.highlight .bp {
|
|
109
|
+
color: #333333;
|
|
110
|
+
background-color: transparent;
|
|
111
|
+
}
|
|
112
|
+
/* Base Style */
|
|
113
|
+
.highlight .g {
|
|
114
|
+
color: #333333;
|
|
115
|
+
background-color: transparent;
|
|
116
|
+
}
|
|
117
|
+
/* */
|
|
118
|
+
.highlight .gd {
|
|
119
|
+
color: #333333;
|
|
120
|
+
background-color: transparent;
|
|
121
|
+
}
|
|
122
|
+
/* Base Style */
|
|
123
|
+
.highlight .o {
|
|
124
|
+
color: #333333;
|
|
125
|
+
background-color: transparent;
|
|
126
|
+
}
|
|
127
|
+
/* Operator Word */
|
|
128
|
+
.highlight .ow {
|
|
129
|
+
color: #333333;
|
|
130
|
+
background-color: transparent;
|
|
131
|
+
}
|
|
132
|
+
/* Base Style */
|
|
133
|
+
.highlight .c {
|
|
134
|
+
color: #727273;
|
|
135
|
+
background-color: transparent;
|
|
136
|
+
}
|
|
137
|
+
/* Comment Multiline */
|
|
138
|
+
.highlight .cm {
|
|
139
|
+
color: #727273;
|
|
140
|
+
background-color: transparent;
|
|
141
|
+
}
|
|
142
|
+
/* Comment Preproc */
|
|
143
|
+
.highlight .cp {
|
|
144
|
+
color: #727273;
|
|
145
|
+
background-color: transparent;
|
|
146
|
+
}
|
|
147
|
+
/* Comment Single */
|
|
148
|
+
.highlight .c1 {
|
|
149
|
+
color: #727273;
|
|
150
|
+
background-color: transparent;
|
|
151
|
+
}
|
|
152
|
+
/* Comment Special */
|
|
153
|
+
.highlight .cs {
|
|
154
|
+
color: #727273;
|
|
155
|
+
background-color: transparent;
|
|
156
|
+
}
|
|
157
|
+
/* Base Style */
|
|
158
|
+
.highlight .k {
|
|
159
|
+
color: #333333;
|
|
160
|
+
background-color: transparent;
|
|
161
|
+
}
|
|
162
|
+
/* Keyword Constant */
|
|
163
|
+
.highlight .kc {
|
|
164
|
+
color: #333333;
|
|
165
|
+
background-color: transparent;
|
|
166
|
+
}
|
|
167
|
+
/* Keyword Declaration */
|
|
168
|
+
.highlight .kd {
|
|
169
|
+
color: #2f3661;
|
|
170
|
+
background-color: transparent;
|
|
171
|
+
}
|
|
172
|
+
/* Keyword Namespace */
|
|
173
|
+
.highlight .kn {
|
|
174
|
+
color: #333333;
|
|
175
|
+
background-color: transparent;
|
|
176
|
+
}
|
|
177
|
+
/* Keyword Pseudo */
|
|
178
|
+
.highlight .kp {
|
|
179
|
+
color: #333333;
|
|
180
|
+
background-color: transparent;
|
|
181
|
+
}
|
|
182
|
+
/* Keyword Reserved */
|
|
183
|
+
.highlight .kr {
|
|
184
|
+
color: #333333;
|
|
185
|
+
background-color: transparent;
|
|
186
|
+
}
|
|
187
|
+
/* Keyword Type */
|
|
188
|
+
.highlight .kt {
|
|
189
|
+
color: #333333;
|
|
190
|
+
background-color: transparent;
|
|
191
|
+
}
|
|
192
|
+
/* Base Style */
|
|
193
|
+
.highlight .l {
|
|
194
|
+
color: #a20e30;
|
|
195
|
+
background-color: transparent;
|
|
196
|
+
}
|
|
197
|
+
/* Literal Date */
|
|
198
|
+
.highlight .ld {
|
|
199
|
+
color: #a20e30;
|
|
200
|
+
background-color: transparent;
|
|
201
|
+
}
|
|
202
|
+
/* Literal Number */
|
|
203
|
+
.highlight .m {
|
|
204
|
+
color: #a20e30;
|
|
205
|
+
background-color: transparent;
|
|
206
|
+
}
|
|
207
|
+
/* Literal Number Float */
|
|
208
|
+
.highlight .mf {
|
|
209
|
+
color: #a20e30;
|
|
210
|
+
background-color: transparent;
|
|
211
|
+
}
|
|
212
|
+
/* Literal Number Hex */
|
|
213
|
+
.highlight .mh {
|
|
214
|
+
color: #333333;
|
|
215
|
+
background-color: transparent;
|
|
216
|
+
}
|
|
217
|
+
/* Literal Number Integer */
|
|
218
|
+
.highlight .mi {
|
|
219
|
+
color: #a20e30;
|
|
220
|
+
background-color: transparent;
|
|
221
|
+
}
|
|
222
|
+
/* Literal Number Oct */
|
|
223
|
+
.highlight .mo {
|
|
224
|
+
color: #a20e30;
|
|
225
|
+
background-color: transparent;
|
|
226
|
+
}
|
|
227
|
+
/* Literal Number Integer Long */
|
|
228
|
+
.highlight .il {
|
|
229
|
+
color: #a20e30;
|
|
230
|
+
background-color: transparent;
|
|
231
|
+
}
|
|
232
|
+
/* Literal String */
|
|
233
|
+
.highlight .s {
|
|
234
|
+
color: #a20e30;
|
|
235
|
+
background-color: transparent;
|
|
236
|
+
}
|
|
237
|
+
/* Literal String Backtick */
|
|
238
|
+
.highlight .sb {
|
|
239
|
+
color: #a20e30;
|
|
240
|
+
background-color: transparent;
|
|
241
|
+
}
|
|
242
|
+
/* Literal String Char */
|
|
243
|
+
.highlight .sc {
|
|
244
|
+
color: #a20e30;
|
|
245
|
+
background-color: transparent;
|
|
246
|
+
}
|
|
247
|
+
/* Literal String Doc */
|
|
248
|
+
.highlight .sd {
|
|
249
|
+
color: #a20e30;
|
|
250
|
+
background-color: transparent;
|
|
251
|
+
}
|
|
252
|
+
/* Literal String Double */
|
|
253
|
+
.highlight .s2 {
|
|
254
|
+
color: #a20e30;
|
|
255
|
+
background-color: transparent;
|
|
256
|
+
}
|
|
257
|
+
/* Literal String Escape */
|
|
258
|
+
.highlight .se {
|
|
259
|
+
color: #a20e30;
|
|
260
|
+
background-color: transparent;
|
|
261
|
+
}
|
|
262
|
+
/* Literal String Heredoc */
|
|
263
|
+
.highlight .sh {
|
|
264
|
+
color: #a20e30;
|
|
265
|
+
background-color: transparent;
|
|
266
|
+
}
|
|
267
|
+
/* Literal String Interpol */
|
|
268
|
+
.highlight .si {
|
|
269
|
+
color: #a20e30;
|
|
270
|
+
background-color: transparent;
|
|
271
|
+
}
|
|
272
|
+
/* Literal String Other */
|
|
273
|
+
.highlight .sx {
|
|
274
|
+
color: #a20e30;
|
|
275
|
+
background-color: transparent;
|
|
276
|
+
}
|
|
277
|
+
/* Literal String Regex */
|
|
278
|
+
.highlight .sr {
|
|
279
|
+
color: #a20e30;
|
|
280
|
+
background-color: transparent;
|
|
281
|
+
}
|
|
282
|
+
/* Literal String Single */
|
|
283
|
+
.highlight .s1 {
|
|
284
|
+
color: #a20e30;
|
|
285
|
+
background-color: transparent;
|
|
286
|
+
}
|
|
287
|
+
/* Literal String Symbol */
|
|
288
|
+
.highlight .ss {
|
|
289
|
+
color: #a20e30;
|
|
290
|
+
background-color: transparent;
|
|
291
|
+
}
|
|
292
|
+
/* Base Style */
|
|
293
|
+
.highlight .g {
|
|
294
|
+
color: #333333;
|
|
295
|
+
background-color: transparent;
|
|
296
|
+
}
|
|
297
|
+
/* Generic Deleted */
|
|
298
|
+
.highlight .gd {
|
|
299
|
+
color: #333333;
|
|
300
|
+
background-color: transparent;
|
|
301
|
+
}
|
|
302
|
+
/* Generic Emph */
|
|
303
|
+
.highlight .ge {
|
|
304
|
+
color: #333333;
|
|
305
|
+
background-color: transparent;
|
|
306
|
+
}
|
|
307
|
+
/* Generic Error */
|
|
308
|
+
.highlight .gr {
|
|
309
|
+
color: #333333;
|
|
310
|
+
background-color: transparent;
|
|
311
|
+
}
|
|
312
|
+
/* Generic Heading */
|
|
313
|
+
.highlight .gh {
|
|
314
|
+
color: #333333;
|
|
315
|
+
background-color: transparent;
|
|
316
|
+
}
|
|
317
|
+
/* Generic Inserted */
|
|
318
|
+
.highlight .gi {
|
|
319
|
+
color: #333333;
|
|
320
|
+
background-color: transparent;
|
|
321
|
+
}
|
|
322
|
+
/* Generic Output */
|
|
323
|
+
.highlight .go {
|
|
324
|
+
color: #333333;
|
|
325
|
+
background-color: transparent;
|
|
326
|
+
}
|
|
327
|
+
/* Generic Prompt */
|
|
328
|
+
.highlight .gp {
|
|
329
|
+
color: #333333;
|
|
330
|
+
background-color: transparent;
|
|
331
|
+
}
|
|
332
|
+
/* Generic Strong */
|
|
333
|
+
.highlight .gs {
|
|
334
|
+
color: #333333;
|
|
335
|
+
background-color: transparent;
|
|
336
|
+
}
|
|
337
|
+
/* Generic Subheading */
|
|
338
|
+
.highlight .gu {
|
|
339
|
+
color: #333333;
|
|
340
|
+
background-color: transparent;
|
|
341
|
+
}
|
|
342
|
+
/* Generic Traceback */
|
|
343
|
+
.highlight .gt {
|
|
344
|
+
color: #333333;
|
|
345
|
+
background-color: transparent;
|
|
346
|
+
}
|
|
347
|
+
/* Other */
|
|
348
|
+
.highlight .x {
|
|
349
|
+
color: #333333;
|
|
350
|
+
background-color: transparent;
|
|
351
|
+
}
|
|
352
|
+
/* Text Whitespace */
|
|
353
|
+
.highlight .w {
|
|
354
|
+
color: #333333;
|
|
355
|
+
background-color: transparent;
|
|
356
|
+
}
|
|
@@ -0,0 +1,70 @@
|
|
|
1
|
+
---
|
|
2
|
+
layout: default
|
|
3
|
+
title: Using Capybara
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Using Capybara
|
|
7
|
+
When the `:selenium` HTTP adapter is used, Wayfarer gives you access to the underlying Selenium driver. You can execute JavaScript, take screenshots, interact with the page, and so on. For an exhaustive list, see [the official API documentation](http://www.rubydoc.info/gems/selenium-webdriver/0.0.28/Selenium/WebDriver/Driver).
|
|
8
|
+
|
|
9
|
+
See [examples/selenium.rb](../examples/selenium.rb).
|
|
10
|
+
|
|
11
|
+
## Setup
|
|
12
|
+
Inside your instance methods, you have access to `#driver`, which returns a Selenium driver:
|
|
13
|
+
|
|
14
|
+
{% highlight ruby %}
|
|
15
|
+
class DummyJob < Wayfarer::Job
|
|
16
|
+
config do |c|
|
|
17
|
+
c.http_adapter = :selenium
|
|
18
|
+
c.selenium_argv = [:firefox]
|
|
19
|
+
c.connection_count = 4 # Number of instantiated WebDrivers
|
|
20
|
+
end
|
|
21
|
+
|
|
22
|
+
draw uri: "https://example.com"
|
|
23
|
+
def foo
|
|
24
|
+
driver # => #<Selenium::WebDriver::Driver:...>
|
|
25
|
+
end
|
|
26
|
+
end
|
|
27
|
+
{% endhighlight %}
|
|
28
|
+
|
|
29
|
+
### Selenium Grid
|
|
30
|
+
{% highlight ruby %}
|
|
31
|
+
class DummyJob < Wayfarer::Job
|
|
32
|
+
config do |c|
|
|
33
|
+
c.http_adapter = :selenium
|
|
34
|
+
c.selenium_argv = [
|
|
35
|
+
:remote, url: "http://localhost:4444/wd/hub", desired_capabilities: :firefox
|
|
36
|
+
]
|
|
37
|
+
end
|
|
38
|
+
end
|
|
39
|
+
{% endhighlight %}
|
|
40
|
+
|
|
41
|
+
## Executing JavaScript
|
|
42
|
+
```ruby
|
|
43
|
+
class DummyJob < Wayfarer::Job
|
|
44
|
+
config do |c|
|
|
45
|
+
c.http_adapter = :selenium
|
|
46
|
+
c.selenium_argv = [:firefox]
|
|
47
|
+
end
|
|
48
|
+
|
|
49
|
+
draw uri: "https://example.com"
|
|
50
|
+
def example
|
|
51
|
+
driver.execute_script("console.log('Hello from wayfarer!')")
|
|
52
|
+
end
|
|
53
|
+
end
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
## Taking screenshots
|
|
57
|
+
{% highlight ruby %}
|
|
58
|
+
class DummyJob < Wayfarer::Job
|
|
59
|
+
config do |c|
|
|
60
|
+
c.http_adapter = :selenium
|
|
61
|
+
c.selenium_argv = [:firefox]
|
|
62
|
+
c.window_size = [1024, 768]
|
|
63
|
+
end
|
|
64
|
+
|
|
65
|
+
draw uri: "https://example.com"
|
|
66
|
+
def example
|
|
67
|
+
driver.save_screenshot("/tmp/screenshot.png")
|
|
68
|
+
end
|
|
69
|
+
end
|
|
70
|
+
{% endhighlight %}
|
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
---
|
|
2
|
+
layout: default
|
|
3
|
+
title: Callbacks
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Callbacks
|
|
7
|
+
|
|
8
|
+
Besides all [ActiveJob callbacks](http://api.rubyonrails.org/classes/ActiveJob/Callbacks/ClassMethods.html), three other callbacks are available. You have access to [locals](/guides/locals) in all callbacks.
|
|
9
|
+
|
|
10
|
+
## `before_crawl`
|
|
11
|
+
Fires __once__ before any pages have been retrieved.
|
|
12
|
+
|
|
13
|
+
{% highlight ruby %}
|
|
14
|
+
class DummyJob < Wayfarer::Job
|
|
15
|
+
before_crawl { puts "Work is about to happen" }
|
|
16
|
+
end
|
|
17
|
+
{% endhighlight %}
|
|
18
|
+
|
|
19
|
+
## `after_crawl`
|
|
20
|
+
Fires __once__ after all pages have been retrieved and processing is done.
|
|
21
|
+
|
|
22
|
+
{% highlight ruby %}
|
|
23
|
+
class DummyJob < Wayfarer::Job
|
|
24
|
+
after_crawl { puts "Work did happen" }
|
|
25
|
+
end
|
|
26
|
+
{% endhighlight %}
|
|
27
|
+
|
|
28
|
+
## `setup_adapter`
|
|
29
|
+
Fires for every adapter immediately after its creation.
|
|
30
|
+
|
|
31
|
+
The block gets yielded an adapter. When using the Selenium HTTP adapter, both a WebDriver and a wrapping Capybara driver get yielded.
|
|
32
|
+
|
|
33
|
+
{% highlight ruby %}
|
|
34
|
+
class DummyJob < Wayfarer::Job
|
|
35
|
+
config.http_adapter = :selenium
|
|
36
|
+
config.connection_count = 4
|
|
37
|
+
|
|
38
|
+
setup_adapter do |adapter, driver, browser|
|
|
39
|
+
# This block gets called 4 times with different adapters
|
|
40
|
+
adapter # => The HTTP adapter
|
|
41
|
+
driver # => #<Selenium::WebDriver::Driver:...> or nil
|
|
42
|
+
browser # => #<Capybara::Selenium::Driver:...> or nil
|
|
43
|
+
end
|
|
44
|
+
end
|
|
45
|
+
{% endhighlight %}
|
data/docs/guides/cli.md
ADDED
|
@@ -0,0 +1,52 @@
|
|
|
1
|
+
---
|
|
2
|
+
layout: default
|
|
3
|
+
title: CLI
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Command-line interface
|
|
7
|
+
Wayfarer ships with a small executable, `wayfarer`.
|
|
8
|
+
|
|
9
|
+
Job classes are loaded by naming convention, e.g. if you pass `./directory/foo_bar.rb` as the `FILE` parameter, that file is expected to define the class `FooBar`. You can leave off the `.rb` extension.
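
The naming convention can be illustrated with a minimal sketch. `job_class_name` is a hypothetical helper written for this example, not part of Wayfarer, and it assumes plain snake_case file names:

```ruby
# Hypothetical helper illustrating the file-name-to-class-name convention.
# It is not taken from Wayfarer's source.
def job_class_name(path)
  File.basename(path, ".rb") # strips the directory and an optional ".rb"
      .split("_")            # splits the snake_case name into words
      .map(&:capitalize)     # capitalizes each word
      .join                  # joins the words into a CamelCase name
end

puts job_class_name("./directory/foo_bar.rb")
puts job_class_name("./directory/foo_bar")
```

Both calls print `FooBar`, which matches the class the file is expected to define.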
|
|
10
|
+
|
|
11
|
+
## `% wayfarer route FILE URI`
|
|
12
|
+
Loads the job defined in `FILE`, and prints the first matching route for `URI`.
|
|
13
|
+
|
|
14
|
+
## `% wayfarer enqueue FILE URI`
|
|
15
|
+
Loads and enqueues the job in `FILE`, starting from `URI`.
|
|
16
|
+
|
|
17
|
+
* `--log_level LEVEL`
|
|
18
|
+
Option. Which log messages to print.
|
|
19
|
+
|
|
20
|
+
* Default: `info`
|
|
21
|
+
* Recognized values: `unknown`, `debug`, `error`, `fatal`, `info`, `warn`
|
|
22
|
+
|
|
23
|
+
* `--queue_adapter ADAPTER`
|
|
24
|
+
Option. Which ActiveJob queue adapter to use (e.g. `sidekiq`, `resque`).
|
|
25
|
+
* Recognized values: strings, see [documentation](http://api.rubyonrails.org/)
|
|
26
|
+
|
|
27
|
+
* `--wait VALUE`
|
|
28
|
+
Option. Point in time at which the enqueued job should run.
|
|
29
|
+
|
|
30
|
+
1. If the value can be converted to an integer, it is interpreted as the number of seconds from now.
|
|
31
|
+
2. If the value can be parsed by `Time::parse`, the job gets scheduled at that point in time.
|
|
32
|
+
3. If the value is a human-readable time string that [Chronic](https://github.com/mojombo/chronic) can make sense of, the job is scheduled at that point in time.
|
|
33
|
+
|
|
34
|
+
__Examples:__
|
|
35
|
+
|
|
36
|
+
60 seconds from now:
|
|
37
|
+
|
|
38
|
+
```
|
|
39
|
+
% wayfarer enqueue ./foo_bar http://google.com --wait 60
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
6pm, today:
|
|
43
|
+
|
|
44
|
+
```
|
|
45
|
+
% wayfarer enqueue ./foo_bar http://google.com --wait 18:00
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
Tomorrow:
|
|
49
|
+
|
|
50
|
+
```
|
|
51
|
+
% wayfarer enqueue ./foo_bar http://google.com --wait tomorrow
|
|
52
|
+
```
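
The three resolution rules for `--wait` can be sketched in plain Ruby. This is an illustration under assumptions, not Wayfarer's implementation: `resolve_wait` is a hypothetical helper, and the `natural_parser` argument stands in for Chronic so that the sketch needs only the standard library.

```ruby
require "time"

# Hypothetical helper: turns a --wait value into a Time.
# `natural_parser` is an assumption standing in for Chronic.
def resolve_wait(value, now: Time.now, natural_parser: ->(_string) { nil })
  # Rule 1: integer-like values are seconds from now.
  return now + Integer(value, 10) if value.match?(/\A\d+\z/)

  # Rule 2: values Time.parse understands are absolute points in time.
  begin
    return Time.parse(value, now)
  rescue ArgumentError
    # Not parseable as a time; fall through to rule 3.
  end

  # Rule 3: hand human-readable strings to the natural-language parser.
  parsed = natural_parser.call(value)
  raise ArgumentError, "cannot interpret --wait value #{value.inspect}" unless parsed

  parsed
end

puts resolve_wait("60")
puts resolve_wait("18:00")
```

With the Chronic gem installed, passing `natural_parser: Chronic.method(:parse)` would be one way to cover the third rule.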
|
|
@@ -0,0 +1,184 @@
|
|
|
1
|
+
---
|
|
2
|
+
layout: default
|
|
3
|
+
title: Configuration
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Configuration
|
|
7
|
+
|
|
8
|
+
All job classes base their configuration off the global one.
|
|
9
|
+
|
|
10
|
+
{% highlight ruby %}
|
|
11
|
+
# Setting a key globally applies to all jobs ...
|
|
12
|
+
Wayfarer.config.key = :value
|
|
13
|
+
|
|
14
|
+
class DummyJob < Wayfarer::Job
|
|
15
|
+
# ... unless a job overrides it
|
|
16
|
+
config.key = :other_value
|
|
17
|
+
end
|
|
18
|
+
|
|
19
|
+
class DummyJob < Wayfarer::Job
|
|
20
|
+
# Have it yielded
|
|
21
|
+
config { |c| c.key = :other_value }
|
|
22
|
+
end
|
|
23
|
+
{% endhighlight %}
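
One way to picture this layering is a per-job configuration that falls back to a global one. The sketch below is only a mental model under that assumption. `LayeredConfig` is a hypothetical class and says nothing about how Wayfarer stores its configuration internally:

```ruby
# Hypothetical stand-in for a layered configuration; not Wayfarer's code.
class LayeredConfig
  def initialize(parent = nil)
    @parent = parent
    @values = {}
  end

  # Sets a key on this layer only.
  def []=(key, value)
    @values[key] = value
  end

  # Looks the key up on this layer first, then falls back to the parent.
  def [](key)
    return @values[key] if @values.key?(key)

    @parent && @parent[key]
  end
end

global = LayeredConfig.new
global[:http_adapter] = :net_http
global[:connection_count] = 4

job = LayeredConfig.new(global)
job[:http_adapter] = :selenium # the job-level value wins for this job only

puts job[:http_adapter]
puts job[:connection_count]
puts global[:http_adapter]
```

The job layer sees its own `:http_adapter`, inherits `:connection_count`, and leaves the global value untouched.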
|
|
24
|
+
|
|
25
|
+
## Recognized keys and values
|
|
26
|
+
|
|
27
|
+
### `print_stacktraces`
|
|
28
|
+
* Default: `true`
|
|
29
|
+
* Recognized values: Booleans
|
|
30
|
+
|
|
31
|
+
Whether to print stacktraces when encountering unhandled exceptions in job actions. See [Error handling]({{base}}/guides/error_handling.html).
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
### `reraise_exceptions`
|
|
36
|
+
|
|
37
|
+
* Default: `false`
|
|
38
|
+
* Recognized values: Booleans
|
|
39
|
+
|
|
40
|
+
Whether to crash when encountering unhandled exceptions in job actions. See [Error handling]({{base}}/guides/error_handling.html).
|
|
41
|
+
|
|
42
|
+
---
|
|
43
|
+
|
|
44
|
+
### `allow_circulation`
|
|
45
|
+
|
|
46
|
+
* Default: `false`
|
|
47
|
+
* Recognized values: Booleans
|
|
48
|
+
|
|
49
|
+
Whether URIs may be visited twice.
|
|
50
|
+
|
|
51
|
+
<aside class="note">
|
|
52
|
+
Allowing circulation might cause your jobs to not terminate.
|
|
53
|
+
</aside>
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
### `normalize_uris`
|
|
58
|
+
|
|
59
|
+
* Default: `true`
|
|
60
|
+
* Recognized values: Booleans
|
|
61
|
+
|
|
62
|
+
Whether to strip fragments, reorder query keys, etc. when staging and caching URIs. Customizable with the `:normalize_uri_options` key. See [normalize_url](https://github.com/rwz/normalize_url).
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
### `normalize_uri_options`
|
|
67
|
+
|
|
68
|
+
* Default: `{}`
|
|
69
|
+
* Recognized values: See [normalize_url](https://github.com/rwz/normalize_url).
|
|
70
|
+
|
|
71
|
+
---
|
|
72
|
+
|
|
73
|
+
### `frontier`
|
|
74
|
+
* Default: `:memory`
|
|
75
|
+
* Recognized values: See [(Redis) frontiers](frontiers.html).
|
|
76
|
+
|
|
77
|
+
Which frontier to use.
|
|
78
|
+
|
|
79
|
+
<aside class="note">
|
|
80
|
+
Bloom filters may yield false positives. See the <a href="https://en.wikipedia.org/wiki/Bloom_filter">Wikipedia article</a>.
|
|
81
|
+
</aside>
|
|
82
|
+
|
|
83
|
+
---
|
|
84
|
+
|
|
85
|
+
### `connection_count`
|
|
86
|
+
|
|
87
|
+
* Default: `4`
|
|
88
|
+
* Recognized values: Integers
|
|
89
|
+
|
|
90
|
+
How many threads and HTTP adapters to use (1:1 correspondence).
|
|
91
|
+
|
|
92
|
+
---
|
|
93
|
+
|
|
94
|
+
### `http_adapter`
|
|
95
|
+
|
|
96
|
+
* Default: `:net_http`
|
|
97
|
+
* Recognized values: `:net_http`, `:selenium`
|
|
98
|
+
|
|
99
|
+
Which HTTP adapter to use. See [Selenium & Capybara](selenium_capybara.html).
|
|
100
|
+
|
|
101
|
+
---
|
|
102
|
+
|
|
103
|
+
### `connection_timeout`
|
|
104
|
+
|
|
105
|
+
* Default: `Float::INFINITY`
|
|
106
|
+
* Recognized values: Floats
|
|
107
|
+
|
|
108
|
+
Time in seconds that a job instance may hold an HTTP adapter. Instances that exceed this time limit raise an exception.
|
|
109
|
+
|
|
110
|
+
---
|
|
111
|
+
|
|
112
|
+
### `max_http_redirects`
|
|
113
|
+
|
|
114
|
+
* Default: `3`
|
|
115
|
+
* Recognized values: Integers
|
|
116
|
+
|
|
117
|
+
How many 3xx redirects to follow.
|
|
118
|
+
|
|
119
|
+
<aside class="note">
|
|
120
|
+
Has no effect when using the <code>:selenium</code> HTTP adapter.
|
|
121
|
+
</aside>
|
|
122
|
+
|
|
123
|
+
---
|
|
124
|
+
|
|
125
|
+
### `selenium_argv`
|
|
126
|
+
|
|
127
|
+
* Default: `[:firefox]`
|
|
128
|
+
* Recognized values: See [Selenium & Capybara](selenium_capybara.html)
|
|
129
|
+
|
|
130
|
+
Argument vector passed to [`Selenium::WebDriver::Driver::for`](http://www.rubydoc.info/gems/selenium-webdriver/Selenium/WebDriver/Driver#for-class_method).
|
|
131
|
+
|
|
132
|
+
---
|
|
133
|
+
|
|
134
|
+
### `redis_opts`
|
|
135
|
+
|
|
136
|
+
* Default: `{ host: "localhost", port: 6379 }`
|
|
137
|
+
* Recognized values: [See documentation](http://www.rubydoc.info/github/redis/redis-rb/Redis%3Ainitialize)
|
|
138
|
+
|
|
139
|
+
Options passed to [`Redis#initialize`](http://www.rubydoc.info/github/redis/redis-rb/Redis%3Ainitialize).
|
|
140
|
+
|
|
141
|
+
---
|
|
142
|
+
|
|
143
|
+
### `bloomfilter_opts`
|
|
144
|
+
|
|
145
|
+
* Default:
|
|
146
|
+
```
|
|
147
|
+
{
|
|
148
|
+
size: 100,
|
|
149
|
+
hashes: 2,
|
|
150
|
+
seed: 1,
|
|
151
|
+
bucket: 3,
|
|
152
|
+
raise: false
|
|
153
|
+
}
|
|
154
|
+
```
|
|
155
|
+
* Recognized values:
|
|
156
|
+
* `size`: Integers; number of buckets in a bloom filter
|
|
157
|
+
* `hashes`: Integers; number of hash functions
|
|
158
|
+
* `seed`: Integers; seed of hash functions
|
|
159
|
+
* `bucket`: Integers; number of bits in a bloom filter bucket
|
|
160
|
+
* `raise`: Booleans; whether to raise on bucket overflow
|
|
161
|
+
|
|
162
|
+
Options for [bloomfilter-rb](https://github.com/igrigorik/bloomfilter-rb).
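
To make the `size`, `hashes`, and `seed` options more concrete, here is a minimal, non-counting bloom filter in plain Ruby. `TinyBloomFilter` is a hypothetical class for illustration only. It is not bloomfilter-rb's implementation, it ignores `bucket` and `raise`, and its CRC32-based hashing is an assumption made to keep the sketch self-contained:

```ruby
require "zlib"

# Hypothetical, simplified bloom filter; not bloomfilter-rb's implementation.
class TinyBloomFilter
  def initialize(size: 100, hashes: 2, seed: 1)
    @size = size                       # number of buckets
    @hashes = hashes                   # number of hash functions
    @seed = seed                       # varies the hash functions
    @buckets = Array.new(size, false)  # one flag per bucket
  end

  # Marks every bucket the key hashes to.
  def insert(key)
    indexes(key).each { |index| @buckets[index] = true }
  end

  # True if all of the key's buckets are marked (may be a false positive).
  def include?(key)
    indexes(key).all? { |index| @buckets[index] }
  end

  private

  # Derives one bucket index per hash function from a seeded CRC32.
  def indexes(key)
    (0...@hashes).map { |n| Zlib.crc32("#{@seed + n}:#{key}") % @size }
  end
end

filter = TinyBloomFilter.new(size: 100, hashes: 2, seed: 1)
filter.insert("http://example.com/a")
puts filter.include?("http://example.com/a")
```

An inserted key is always reported as present. A key that was never inserted can still be reported as present when its buckets happen to be marked by other keys, and a small `size` makes that more likely.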
|
|
163
|
+
|
|
164
|
+
---
|
|
165
|
+
|
|
166
|
+
### `window_size`
|
|
167
|
+
|
|
168
|
+
* Default: `[1024, 768]`
|
|
169
|
+
* Recognized values: `[Integer, Integer]`
|
|
170
|
+
|
|
171
|
+
Dimensions of browser windows.
|
|
172
|
+
|
|
173
|
+
<aside class="note">
|
|
174
|
+
Only has an effect when using the <code>:selenium</code> HTTP adapter.
|
|
175
|
+
</aside>
|
|
176
|
+
|
|
177
|
+
---
|
|
178
|
+
|
|
179
|
+
### `mustermann_type`
|
|
180
|
+
|
|
181
|
+
* Default: `:sinatra`
|
|
182
|
+
* Recognized values: [See documentation](https://github.com/sinatra/mustermann)
|
|
183
|
+
|
|
184
|
+
Which [Mustermann](https://github.com/sinatra/mustermann) pattern type to use.
|