parklife 0.8.1 → 0.9.0
- checksums.yaml +4 -4
- data/.github/dependabot.yml +11 -0
- data/.github/workflows/examples.yml +4 -4
- data/.github/workflows/tests.yml +2 -2
- data/.rubocop.yml +0 -3
- data/CHANGELOG.md +21 -1
- data/README.md +6 -3
- data/lib/parklife/application.rb +40 -7
- data/lib/parklife/browser.rb +11 -12
- data/lib/parklife/build.rb +109 -0
- data/lib/parklife/cli.rb +13 -5
- data/lib/parklife/config.rb +36 -4
- data/lib/parklife/crawler.rb +73 -74
- data/lib/parklife/errors.rb +0 -1
- data/lib/parklife/logger.rb +29 -0
- data/lib/parklife/reporter/base.rb +30 -0
- data/lib/parklife/reporter/log.rb +13 -0
- data/lib/parklife/reporter/null.rb +9 -0
- data/lib/parklife/reporter/progress.rb +16 -0
- data/lib/parklife/responder/base.rb +15 -0
- data/lib/parklife/responder/not_found.rb +21 -0
- data/lib/parklife/responder/not_modified.rb +17 -0
- data/lib/parklife/responder/ok.rb +13 -0
- data/lib/parklife/responder/redirect.rb +16 -0
- data/lib/parklife/responder/unknown.rb +12 -0
- data/lib/parklife/route.rb +4 -0
- data/lib/parklife/utils.rb +0 -25
- data/lib/parklife/version.rb +1 -1
- metadata +14 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 98577bf55e3f709d741dfe6df388c5d61f6c93d12d53b88983ad10dbf0552fbd
+  data.tar.gz: 6dbe3f4d37e80846b461a6b29981b56cd15050bdce4f9ee517385ac7be08d5a3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d2b8b5540f43681e1b0ea92fff99fc14e5d25bde63b1e06909e648522be0baba8577be1f8981de0c92238b030791933cdad8a42b2d4cc5827a89937633393f07
+  data.tar.gz: c66a33ad08986f3e94fc0477eefdbab1c98ca8ea7ff87e87d9a26ea01f1d0f09fd152230632b462dbea42d54ddaf18bf5624f6e15710f136be22149b2fa9cd0f
|
data/.github/workflows/examples.yml
CHANGED
@@ -7,13 +7,13 @@ jobs:
     runs-on: ubuntu-latest
     name: Rack example
     steps:
-      - uses: actions/checkout@
+      - uses: actions/checkout@v6
       - uses: ruby/setup-ruby@v1
         with:
           bundler-cache: true
           ruby-version: '2.7'
           working-directory: examples/rack
-      - run: bundle exec parklife build
+      - run: bundle exec parklife build --reporter log
         working-directory: examples/rack
       - run: test -f build/index.html
         working-directory: examples/rack
@@ -22,13 +22,13 @@ jobs:
     runs-on: ubuntu-latest
     name: Roda example
     steps:
-      - uses: actions/checkout@
+      - uses: actions/checkout@v6
       - uses: ruby/setup-ruby@v1
         with:
           bundler-cache: true
           ruby-version: '3.2'
           working-directory: examples/roda
-      - run: bundle exec parklife build
+      - run: bundle exec parklife build --reporter log
         working-directory: examples/roda
       - run: test -f build/index.html
         working-directory: examples/roda
|
data/.github/workflows/tests.yml
CHANGED
@@ -17,7 +17,7 @@ jobs:
           - '3.4'
     name: RSpec (Ruby ${{ matrix.ruby }})
     steps:
-      - uses: actions/checkout@
+      - uses: actions/checkout@v6
       - uses: ruby/setup-ruby@v1
         with:
           bundler-cache: true
@@ -28,7 +28,7 @@ jobs:
     runs-on: ubuntu-latest
     name: RuboCop
     steps:
-      - uses: actions/checkout@
+      - uses: actions/checkout@v6
       - uses: ruby/setup-ruby@v1
         with:
           bundler-cache: true
|
data/.rubocop.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,4 +1,24 @@
-## Unreleased
+<!-- ## Unreleased -->
+
+## Version 0.9.0 - 2026-05-16
+
+- Introduce the log reporter and add colour to the output by default (or pass `--no-colour` to disable). Select a reporter by passing `--reporter log` to the build command. Supported reporters are `log` (one line per visited route), `null` (only errors), and `progress` (dots). <https://github.com/benpickles/parklife/pull/138>
+
+- Add inter-build caching via support for HTTP ETags. <https://github.com/benpickles/parklife/pull/138>
+
+  Parklife now saves build metadata (in the file `BUILD_DIR/.parklife/build.yml`) which is used to support HTTP caching via ETags (it can be skipped by setting the config `skip_build_meta = true` or using `--skip-build-meta` via the CLI). Generating ETags and deciding on a cache hit or miss is the responsibility of the app itself, but hopefully your framework can help with this (Rails, for example, has many HTTP caching helpers). Tell Parklife to use a previous build as a cache source at build time with `parklife build --cache-dir build`.
+
+- Introduce response handlers and make it possible for them to be customised without having to resort to monkeypatching. <https://github.com/benpickles/parklife/pull/136>
+
+- Only store a text/html response as an .html file. <https://github.com/benpickles/parklife/pull/136>
+
+  Prior to this change all responses were expected to be HTML unless the request path included a file extension, which led to the following issue:
+
+  The path `/dry-types/v1.8` would be detected as having a file extension (`.8`) and create the _file_ `dry-types/v1.8`, then the path `/dry-types/v1.8/something-else` would attempt to create the file `dry-types/v1.8/something-else.html` but encounter an `Errno::EEXIST` when it attempted to create the _directory_ `dry-types/v1.8`.
+
+  With this change a response is only saved as HTML when its content type is `text/html`. This fixes the above case and should mean that the files in the final build, and how a static server treats them, are better aligned with how they're used in the dynamic development environment.
+
+- Fix including a 404 response in the build when `on_404` was `:skip` or `:warn`. <https://github.com/benpickles/parklife/pull/135>
 
 ## Version 0.8.1 - 2025-12-21
 
|
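The changelog leaves ETag generation and the hit/miss decision to the app. As a rough illustration of what that can look like, here is a minimal sketch of a Rack-compatible app written as a plain lambda. The `PAGES` hash and the `ETAG_APP` name are hypothetical, and deriving the ETag from a SHA-256 digest of the body is an assumption made for the example, not something Parklife prescribes.

```ruby
require 'digest'

# Hypothetical page content; a real app would render this.
PAGES = { '/' => '<h1>Home</h1>' }.freeze

# A Rack-compatible app: responds 304 with an empty body when the request's
# If-None-Match header matches the page's ETag, otherwise 200 with the body.
ETAG_APP = lambda do |env|
  body = PAGES[env['PATH_INFO']]
  return [404, { 'content-type' => 'text/plain' }, ['Not found']] unless body

  # Assumption for this sketch: the ETag is a digest of the rendered body.
  etag = %("#{Digest::SHA256.hexdigest(body)}")
  headers = { 'content-type' => 'text/html', 'etag' => etag }

  if env['HTTP_IF_NONE_MATCH'] == etag
    [304, headers, []]
  else
    [200, headers, [body]]
  end
end
```

With an app along these lines, a second `parklife build --cache-dir build` would be expected to receive 304 responses for unchanged pages and reuse the files from the previous build.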
data/README.md
CHANGED
@@ -116,10 +116,13 @@ Parklife.application.config.build_dir = 'my/build/dir'
 
 ### Handling a 404
 
-By default if Parklife encounters a 404 response when fetching a route it will raise an exception (the `:error` setting)
+By default if Parklife encounters a 404 response when fetching a route it will raise an exception which stops the build (the `:error` setting).
 
-
-
+Possible values are:
+
+- `:error` (default) - raise an exception which stops the build.
+- `:skip` - do not save the response, continue processing.
+- `:warn` - output a message to `stderr`, do not save the response, continue processing.
 
 ```ruby
 Parklife.application.config.on_404 = :warn
|
data/lib/parklife/application.rb
CHANGED
@@ -1,6 +1,7 @@
 # frozen_string_literal: true
 
 require 'fileutils'
+require 'parklife/build'
 require 'parklife/config'
 require 'parklife/crawler'
 require 'parklife/errors'
@@ -26,14 +27,10 @@ module Parklife
     end
 
     def build
-      raise BuildDirNotDefinedError if config.build_dir.nil?
       raise RackAppNotDefinedError if config.app.nil?
 
-      if
-
-      else
-        Dir.mkdir(config.build_dir)
-      end
+      prepare_cache_dir if config.cache_dir
+      prepare_build_dir
 
       @before_build_callbacks.each do |callback|
         callback.call(self)
@@ -51,7 +48,11 @@ module Parklife
     end
 
     def crawler
-      @crawler ||= Crawler.new(
+      @crawler ||= Crawler.new(
+        config,
+        @route_set,
+        config.cache_dir ? Build.from_dir(config.cache_dir) : nil
+      )
     end
 
     def load_Parkfile(path)
@@ -66,5 +67,37 @@ module Parklife
         @route_set
       end
     end
+
+    private
+    def prepare_build_dir
+      if config.build_dir.directory?
+        FileUtils.rm_rf(config.build_dir.children)
+      else
+        config.build_dir.mkdir
+      end
+    end
+
+    def prepare_cache_dir
+      # Nothing to do unless the previous build is being used as a cache.
+      return unless config.cache_dir.expand_path == config.build_dir.expand_path
+
+      if config.build_dir.exist?
+        config.cache_dir = Config::CACHE_TMPDIR
+
+        if config.cache_dir.exist?
+          config.cache_dir.rmtree
+        else
+          config.cache_dir.dirname.mkpath
+        end
+
+        # Move the existing previous build to the tmp location to clear the
+        # way for a fresh new build.
+        config.build_dir.rename(config.cache_dir)
+      else
+        # The build/cache directories are set to the same thing but don't
+        # exist so there is no cache.
+        config.cache_dir = nil
+      end
+    end
   end
 end
|
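The branches of `prepare_cache_dir` are easier to follow when pulled out into a free-standing function. The sketch below is an illustration only: `resolve_cache_dir` and its three `Pathname` arguments are hypothetical names, and it returns the effective cache directory instead of mutating a config object as the method in the diff does.

```ruby
require 'pathname'
require 'tmpdir'

# Returns the directory to read the cache from, or nil when there is none.
# Only acts when the build and cache directories are the same location.
def resolve_cache_dir(build_dir, cache_dir, tmp_cache_dir)
  return cache_dir unless cache_dir.expand_path == build_dir.expand_path
  return nil unless build_dir.exist?

  if tmp_cache_dir.exist?
    tmp_cache_dir.rmtree
  else
    tmp_cache_dir.dirname.mkpath
  end

  # Move the previous build aside so a fresh build can be written.
  build_dir.rename(tmp_cache_dir)
  tmp_cache_dir
end
```

Moving the previous build aside first is what allows `prepare_build_dir` to start from an empty directory while the old output stays readable as the cache.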
data/lib/parklife/browser.rb
CHANGED
@@ -11,24 +11,23 @@ module Parklife
       @app = app
       @base = base
       @session = Rack::Test::Session.new(app)
-
+      @env = {
+        'HTTP_HOST' => Utils.host_with_port(base),
+        'HTTPS' => base.scheme == 'https' ? 'on' : 'off',
+        script_name: base.path.chomp('/'),
+      }
     end
 
-    def get(path)
-      session.get(
+    def get(path, headers: nil)
+      session.get(
+        path,
+        nil,
+        headers ? headers.merge(env) : env
+      )
     end
 
     def uri_for(path)
       base.dup.tap { |uri| uri.path = path }
     end
-
-    private
-      def set_env
-        @env = {
-          'HTTP_HOST' => Utils.host_with_port(base),
-          'HTTPS' => base.scheme == 'https' ? 'on' : 'off',
-          script_name: base.path.chomp('/'),
-        }
-      end
   end
 end
|
data/lib/parklife/build.rb
@@ -0,0 +1,109 @@
+# frozen_string_literal: true
+require 'fileutils'
+require 'yaml'
+
+module Parklife
+  class Build
+    META_PATH = File.join('.parklife', 'build.yml')
+
+    def self.from_dir(dir)
+      return unless dir.exist?
+      path = dir.join(META_PATH)
+      return unless path.exist?
+      data = YAML.safe_load(path.read)
+
+      build = new(dir, nested_index: data.dig('config', 'nested_index'))
+      build.paths.merge!(data['paths'])
+      build
+    end
+
+    def self.path_for(path, media_type: nil, nested_index:)
+      # Remove leading/trailing slashes.
+      path = path.gsub(/^\/|\/$/, '')
+
+      # The root is always expected to be an HTML page regardless of the
+      # response's content type.
+      return 'index.html' if path.empty?
+
+      text_html = media_type.nil? || media_type == 'text/html'
+
+      # Store a text/html response in an .html file.
+      if text_html && !path.end_with?('.html')
+        if nested_index
+          path << '/index.html'
+        else
+          path << '.html'
+        end
+      end
+
+      path
+    end
+
+    attr_reader :dir, :nested_index, :paths
+
+    def initialize(dir, nested_index:)
+      @dir = dir
+      @nested_index = nested_index
+      @paths = {}
+    end
+
+    def add(route, response)
+      build_path = self.build_path(route, response)
+      write(build_path, response.body)
+      add_path_meta(route, response, build_path)
+    end
+
+    def build_path(route, response)
+      self.class.path_for(
+        route.path,
+        media_type: response.media_type,
+        nested_index: nested_index,
+      )
+    end
+
+    def copy(src, route, response)
+      build_path = self.build_path(route, response)
+
+      dest = dir.join(build_path)
+      dest.dirname.mkpath
+      FileUtils.cp(src, dest)
+
+      add_path_meta(route, response, build_path)
+    end
+
+    def etag(path)
+      paths.dig(path, 'etag')
+    end
+
+    def get(route, response)
+      dir.join(build_path(route, response))
+    end
+
+    def to_yaml
+      YAML.dump({
+        'config' => {
+          'nested_index' => nested_index,
+        },
+        'paths' => paths,
+      })
+    end
+
+    def write(path, content)
+      file = dir.join(path)
+      file.dirname.mkpath
+      file.write(content, mode: 'wb')
+    end
+
+    def write_meta
+      write(META_PATH, to_yaml)
+    end
+
+    private
+      def add_path_meta(route, response, build_path)
+        paths[route.path] = {
+          'build_path' => build_path,
+          'etag' => response['Etag'],
+        }.compact
+      end
+  end
+end
|
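To see how the path mapping in `Build.path_for` resolves the `/dry-types/v1.8` case described in the changelog, the logic can be exercised on its own. This is a standalone sketch: `build_path_for` is a hypothetical free function that follows the same rules as the method in the diff, restructured slightly so it does not mutate its argument.

```ruby
# Standalone sketch of the path-mapping rules shown in Build.path_for.
def build_path_for(path, nested_index:, media_type: nil)
  # Remove leading/trailing slashes.
  path = path.gsub(/^\/|\/$/, '')

  # The root is always treated as an HTML page.
  return 'index.html' if path.empty?

  text_html = media_type.nil? || media_type == 'text/html'

  # Anything that is not text/html, or already ends in .html, is kept as-is.
  return path unless text_html && !path.end_with?('.html')

  nested_index ? "#{path}/index.html" : "#{path}.html"
end

build_path_for('/dry-types/v1.8', media_type: 'text/html', nested_index: true)
# => "dry-types/v1.8/index.html"
build_path_for('/feed.xml', media_type: 'application/xml', nested_index: true)
# => "feed.xml"
```

Because the decision now depends on the response's media type and not on `File.extname`, `dry-types/v1.8` becomes a directory and `/dry-types/v1.8/something-else` can be written beneath it.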
data/lib/parklife/cli.rb
CHANGED
@@ -11,23 +11,31 @@ module Parklife
     class_option :base, desc: 'Override config.base configured in the Parkfile'
 
     desc 'build', 'Create a production build'
+    option :cache_dir, desc: 'Path to an existing build directory (which must include build meta)', type: :string
+    option :no_colour, aliases: '--no-color', desc: "Don't include colours in terminal output", type: :boolean
+    option :reporter, desc: 'Output formatter: log (one line per visited route), null (only errors), progress (dots)'
+    option :skip_build_meta, desc: 'Do not include Parklife build metadata', type: :boolean
     def build
+      application.config.cache_dir = options[:cache_dir] if options.key?(:cache_dir)
+      application.config.no_colour = options[:no_colour] if options.key?(:no_colour)
+      application.config.reporter = options[:reporter] if options.key?(:reporter)
+      application.config.skip_build_meta = options[:skip_build_meta] if options.key?(:skip_build_meta)
       application.build
     end
 
     desc 'config', 'Output the full Parklife config'
     def config
-      reporter = application.config.reporter
-
       shell.print_table([
         ['app', application.config.app.inspect],
         ['base', application.config.base.to_s],
         ['build_dir', application.config.build_dir],
+        ['cache_dir', application.config.cache_dir],
         ['nested_index', application.config.nested_index],
+        ['no_colour', application.config.no_colour],
         ['on_404', application.config.on_404.inspect],
         ['parklife-rails', defined?(::Parklife::Rails) ? 'enabled' : '-'],
         ['parklife-sinatra', defined?(::Parklife::Sinatra) ? 'enabled' : '-'],
-        ['reporter', reporter
+        ['reporter', application.config.reporter],
       ])
     end
 
@@ -65,8 +73,8 @@ module Parklife
     private
       def application
         @application ||= Parklife.application.tap { |app|
-          # Default
-          app.config.reporter =
+          # Default to dots for progress (can be overridden in the Parkfile).
+          app.config.reporter = 'progress'
 
           # Reach inside the consuming app's directory to apply its Parklife
           # config.
|
data/lib/parklife/config.rb
CHANGED
@@ -1,22 +1,31 @@
 # frozen_string_literal: true
-
+require 'pathname'
 require 'stringio'
 require 'uri'
+require_relative 'logger'
+require_relative 'reporter/log'
+require_relative 'reporter/null'
+require_relative 'reporter/progress'
 
 module Parklife
   class Config
+    CACHE_TMPDIR = 'tmp/parklife/cache'
     DEFAULT_HOST = 'example.com'
     DEFAULT_SCHEME = 'http'
 
-    attr_accessor :app, :
-    attr_reader :base
+    attr_accessor :app, :logger, :nested_index, :on_404, :skip_build_meta
+    attr_reader :base, :build_dir, :cache_dir, :no_colour, :reporter
 
     def initialize
       self.base = nil
       self.build_dir = 'build'
+      self.cache_dir = nil
+      self.logger = Logger.new
       self.nested_index = true
+      self.no_colour = false
       self.on_404 = :error
-      self.reporter =
+      self.reporter = 'null'
+      self.skip_build_meta = false
     end
 
     def base=(value)
@@ -25,5 +34,28 @@ module Parklife
       uri.scheme ||= DEFAULT_SCHEME
       @base = uri
     end
+
+    def build_dir=(value)
+      @build_dir = Pathname.new(value)
+    end
+
+    def cache_dir=(value)
+      @cache_dir = value ? Pathname.new(value) : nil
+    end
+
+    def no_colour=(value)
+      @no_colour = logger.no_colour = value
+    end
+
+    def reporter=(value)
+      @reporter = case value
+      when 'log'
+        Reporter::Log.new(logger)
+      when 'progress'
+        Reporter::Progress.new(logger)
+      else
+        Reporter::Null.new(logger)
+      end
+    end
   end
 end
|
data/lib/parklife/crawler.rb
CHANGED
@@ -1,98 +1,97 @@
 # frozen_string_literal: true
-
-require 'parklife/browser'
-require 'parklife/route'
-require 'parklife/utils'
 require 'set'
+require_relative 'browser'
+require_relative 'build'
+require_relative 'responder/not_found'
+require_relative 'responder/not_modified'
+require_relative 'responder/ok'
+require_relative 'responder/redirect'
+require_relative 'responder/unknown'
+require_relative 'route'
+require_relative 'utils'
 
 module Parklife
   class Crawler
-
+    RESPONDERS = {
+      200 => Responder::Ok,
+      301 => Responder::Redirect,
+      302 => Responder::Redirect,
+      304 => Responder::NotModified,
+      404 => Responder::NotFound,
+    }
+
+    attr_reader :browser, :build, :cache, :config, :routes, :visited
 
-    def initialize(config,
+    def initialize(config, routes, cache)
       @config = config
-      @
+      @routes = routes.to_a
+      @cache = cache
       @browser = Browser.new(config.app, config.base)
+      @build = Build.new(config.build_dir, nested_index: config.nested_index)
+      @visited = Set.new
+      @responder_for_status = {}
     end
 
-    def
-
+    def crawl(html)
+      Utils.scan_for_links(html) do |path|
+        # If the app is mounted at a subdirectory then it responds to paths that
+        # *exclude* the subdirectory and generates links that *include* the
+        # subdirectory (so if the app is mounted at "/foo" and serving "/bar"
+        # then the full path would be "/foo/bar" and a generated link would
+        # include the mount path like "/foo/link").
+        #
+        # Anyway, this mount path prefix must be trimmed from link paths so that
+        # correct app routes are created.
+        baseless_path = path.delete_prefix(config.base.path)
+        new_route = Route.new(baseless_path, crawl: true)
+
+        next if visited?(new_route)
+
+        routes << new_route
+      end
     end
 
-    def
-
-
-
-
-        processed = process_route(route)
-        config.reporter.print('.') if processed
+    def get(path)
+      headers = if (etag = cache&.etag(path))
+        { 'HTTP_IF_NONE_MATCH' => etag }
+      else
+        nil
       end
 
-
+      browser.get(path, headers: headers)
     end
 
-
-
-
-
-
-          @visited.include?(route)
-        else
-          # This route isn't being crawled so there's no need to re-process
-          # it if it has already been visited or crawled.
-          crawled_route = Route.new(route.path, crawl: true)
-          @visited.include?(route) || @visited.include?(crawled_route)
-        end
-
-        return false if already_processed
+    def responder_for_status(status)
+      @responder_for_status[status] ||= RESPONDERS
+        .fetch(status, Responder::Unknown)
+        .new(self)
+    end
 
+    def start
+      while (route = routes.shift)
+        next if visited?(route)
         response = get(route.path)
-
-        case response.status
-        when 200
-          # Continue processing the route.
-        when 301, 302
-          raise HTTPRedirectError.new(
-            response.status,
-            browser.uri_for(route.path),
-            response.headers['location']
-          )
-        when 404
-          case config.on_404
-          when :warn
-            $stderr.puts HTTPError.new(404, route.path).message
-          when :skip
-            return false
-          else
-            raise HTTPError.new(404, route.path)
-          end
-        else
-          raise HTTPError.new(response.status, route.path)
-        end
-
-        Utils.save_page(route.path, response.body, config)
-
         @visited << route
+        config.reporter.visit(route, response)
+        responder_for_status(response.status).call(route, response)
+      end
 
-
-
-
-
-            # it is correctly configured). This prefix must therefore be
-            # stripped from links discovered via crawling.
-            baseless_path = path.delete_prefix(config.base.path)
-
-            route = Route.new(baseless_path, crawl: true)
-
-            # Don't revisit the route if it has already been visited with
-            # crawl=true but do revisit if it wasn't crawled.
-            next if @visited.include?(route)
-
-            @routes << route
-          end
-        end
+      config.reporter.finish
+    ensure
+      build.write_meta unless config.skip_build_meta
+    end
 
-
+    def visited?(route)
+      if route.crawl
+        # A crawl=true route is only counted as visited when it has already been
+        # crawled, if it's been visited by a non-crawl route then it must be
+        # visited again so it can be crawled.
+        @visited.include?(route)
+      else
+        # A crawl=false route is counted as visited whether it was previously
+        # visited with either a crawl or non-crawl route.
+        @visited.include?(route) || @visited.include?(route.with_crawl)
       end
+    end
   end
 end
|
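The asymmetry in `visited?` (a crawl route must be revisited after a non-crawl visit, but not the other way round) can be checked with a small stand-in. In this sketch the `Route` struct is hypothetical: it assumes, as the diff implies but does not show, that two routes are equal only when both path and crawl flag match and that `with_crawl` returns the crawl=true twin.

```ruby
require 'set'

# Hypothetical stand-in for Parklife::Route: equality covers path and crawl.
Route = Struct.new(:path, :crawl) do
  def with_crawl
    self.class.new(path, true)
  end
end

# Same decision as Crawler#visited?, taking the visited set as an argument.
def visited?(visited, route)
  if route.crawl
    # Only an earlier crawl=true visit counts.
    visited.include?(route)
  else
    # Either kind of earlier visit counts.
    visited.include?(route) || visited.include?(route.with_crawl)
  end
end

visited = Set.new([Route.new('/a', false)])
visited?(visited, Route.new('/a', true))  # => false, it still needs crawling
visited?(visited, Route.new('/a', false)) # => true
```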
data/lib/parklife/errors.rb
CHANGED
data/lib/parklife/logger.rb
@@ -0,0 +1,29 @@
+# frozen_string_literal: true
+require 'forwardable'
+require 'thor/shell/color'
+
+module Parklife
+  class Logger
+    extend Forwardable
+
+    attr_accessor :no_colour
+    attr_reader :stderr, :stdout
+
+    def initialize(stdout = $stdout, stderr = $stderr, no_colour: false)
+      @stdout = stdout
+      @stderr = stderr
+      @no_colour = no_colour
+      @thor_color = Thor::Shell::Color.new
+    end
+
+    def_delegators :@stdout, :print, :puts
+
+    def colour(string, *colours)
+      no_colour ? string : @thor_color.set_color(string, *colours)
+    end
+
+    def warn(*message)
+      stderr.puts(colour(*message, :on_red))
+    end
+  end
+end
|
data/lib/parklife/reporter/base.rb
@@ -0,0 +1,30 @@
+# frozen_string_literal: true
+require 'forwardable'
+
+module Parklife
+  module Reporter
+    class Base
+      STATUS_COLOUR = {
+        200 => :green,
+        304 => :blue,
+        404 => :yellow,
+      }
+
+      extend Forwardable
+
+      attr_reader :logger
+
+      def initialize(logger)
+        @logger = logger
+      end
+
+      def_delegators :@logger, :colour, :print, :puts
+
+      def finish
+      end
+
+      def visit(_route, _response)
+      end
+    end
+  end
+end
|
data/lib/parklife/reporter/log.rb
@@ -0,0 +1,13 @@
+# frozen_string_literal: true
+require_relative 'base'
+
+module Parklife
+  module Reporter
+    class Log < Base
+      def visit(route, response)
+        status = response.status
+        puts "#{colour(status, *STATUS_COLOUR[status])} #{route.path}"
+      end
+    end
+  end
+end
|
data/lib/parklife/reporter/progress.rb
@@ -0,0 +1,16 @@
+# frozen_string_literal: true
+require_relative 'base'
+
+module Parklife
+  module Reporter
+    class Progress < Base
+      def finish
+        puts
+      end
+
+      def visit(_route, response)
+        print colour('.', *STATUS_COLOUR[response.status])
+      end
+    end
+  end
+end
|
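The reporters above share a two-hook shape: `visit(route, response)` is called per route and `finish` once at the end. As an illustration of that shape only, here is a standalone sketch of a reporter that tallies statuses. `TallyReporter` is hypothetical and writes to a plain IO instead of a Parklife logger; this diff does not show whether or how a custom reporter can be registered, since `Config#reporter=` only maps the three built-in names.

```ruby
require 'stringio'

# Hypothetical reporter: counts responses per status and prints a summary.
class TallyReporter
  def initialize(io = $stdout)
    @io = io
    @counts = Hash.new(0)
  end

  # Called once per visited route; only the response status is used here.
  def visit(_route, response)
    @counts[response.status] += 1
  end

  # Called once after the crawl; prints e.g. "200: 2, 304: 1".
  def finish
    summary = @counts.sort.map { |status, count| "#{status}: #{count}" }
    @io.puts summary.join(', ')
  end
end
```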
data/lib/parklife/responder/not_found.rb
@@ -0,0 +1,21 @@
+# frozen_string_literal: true
+require_relative 'base'
+
+module Parklife
+  module Responder
+    class NotFound < Base
+      def call(route, response)
+        case crawler.config.on_404
+        when :skip
+          # No-op.
+        when :warn
+          crawler.config.logger.warn(
+            HTTPError.new(response.status, route.path).message
+          )
+        else
+          raise HTTPError.new(response.status, route.path)
+        end
+      end
+    end
+  end
+end
|
data/lib/parklife/responder/not_modified.rb
@@ -0,0 +1,17 @@
+# frozen_string_literal: true
+require_relative 'base'
+
+module Parklife
+  module Responder
+    class NotModified < Base
+      def call(route, response)
+        pathname = crawler.cache&.get(route, response)
+
+        return unless pathname&.exist?
+
+        crawler.build.copy(pathname, route, response)
+        crawler.crawl(pathname.read) if route.crawl
+      end
+    end
+  end
+end
|
data/lib/parklife/responder/redirect.rb
@@ -0,0 +1,16 @@
+# frozen_string_literal: true
+require_relative 'base'
+
+module Parklife
+  module Responder
+    class Redirect < Base
+      def call(route, response)
+        raise HTTPRedirectError.new(
+          response.status,
+          crawler.browser.uri_for(route.path),
+          response.headers['location']
+        )
+      end
+    end
+  end
+end
|
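The crawler now picks a responder by status code from a lookup table, falls back to a catch-all for unknown statuses, and keeps one instance per status. That dispatch pattern can be sketched in isolation; every name below (`SaveHandler`, `FailHandler`, `HANDLERS`, `Dispatcher`) is hypothetical and stands in for the `Responder::*` classes and `Crawler#responder_for_status` in the diff.

```ruby
# Hypothetical handlers; each only reports what it would do.
class SaveHandler
  def call(path)
    "save #{path}"
  end
end

class FailHandler
  def call(path)
    raise "unexpected response for #{path}"
  end
end

# Status code to handler class; anything missing falls back to FailHandler.
HANDLERS = { 200 => SaveHandler }.freeze

class Dispatcher
  def initialize
    @handler_for_status = {}
  end

  # Look up the handler class for a status and reuse one instance per status.
  def handler_for(status)
    @handler_for_status[status] ||= HANDLERS.fetch(status, FailHandler).new
  end
end

Dispatcher.new.handler_for(200).call('/about') # => "save /about"
```

Keeping the mapping in a table means supporting a new status is a matter of adding an entry and a class, with no change to the crawl loop itself.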
data/lib/parklife/route.rb
CHANGED
data/lib/parklife/utils.rb
CHANGED
@@ -7,36 +7,11 @@ module Parklife
   module Utils
     extend self
 
-    def build_path_for(path, index: true)
-      path = path.gsub(/^\/|\/$/, '')
-
-      if File.extname(path).empty?
-        if path.empty?
-          'index.html'
-        elsif index
-          File.join(path, 'index.html')
-        else
-          "#{path}.html"
-        end
-      else
-        path
-      end
-    end
-
     def host_with_port(uri)
       default_port = uri.scheme == 'https' ? 443 : 80
       uri.port == default_port ? uri.host : "#{uri.host}:#{uri.port}"
     end
 
-    def save_page(path, content, config)
-      build_path = File.join(
-        config.build_dir,
-        build_path_for(path, index: config.nested_index)
-      )
-      FileUtils.mkdir_p(File.dirname(build_path))
-      File.write(build_path, content, mode: 'wb')
-    end
-
     def scan_for_links(html)
       doc = Nokogiri::HTML.parse(html)
       doc.css('a[href]').each do |a|
|
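`host_with_port` survives unchanged and supplies the `HTTP_HOST` value the browser sends. A quick standalone check of its behaviour, copying the method body from the diff into a free function:

```ruby
require 'uri'

# Omit the port when it is the default for the scheme.
def host_with_port(uri)
  default_port = uri.scheme == 'https' ? 443 : 80
  uri.port == default_port ? uri.host : "#{uri.host}:#{uri.port}"
end

host_with_port(URI('https://example.com'))   # => "example.com"
host_with_port(URI('http://localhost:3000')) # => "localhost:3000"
```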
data/lib/parklife/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: parklife
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.9.0
 platform: ruby
 authors:
 - Ben Pickles
@@ -59,6 +59,7 @@ extensions: []
 extra_rdoc_files: []
 files:
 - ".github/FUNDING.yml"
+- ".github/dependabot.yml"
 - ".github/workflows/examples.yml"
 - ".github/workflows/tests.yml"
 - ".gitignore"
@@ -76,10 +77,22 @@ files:
 - lib/parklife.rb
 - lib/parklife/application.rb
 - lib/parklife/browser.rb
+- lib/parklife/build.rb
 - lib/parklife/cli.rb
 - lib/parklife/config.rb
 - lib/parklife/crawler.rb
 - lib/parklife/errors.rb
+- lib/parklife/logger.rb
+- lib/parklife/reporter/base.rb
+- lib/parklife/reporter/log.rb
+- lib/parklife/reporter/null.rb
+- lib/parklife/reporter/progress.rb
+- lib/parklife/responder/base.rb
+- lib/parklife/responder/not_found.rb
+- lib/parklife/responder/not_modified.rb
+- lib/parklife/responder/ok.rb
+- lib/parklife/responder/redirect.rb
+- lib/parklife/responder/unknown.rb
 - lib/parklife/route.rb
 - lib/parklife/route_set.rb
 - lib/parklife/templates/Parkfile.erb
|