site_maps 0.0.1.beta3 → 0.1.1
- checksums.yaml +4 -4
- data/.github/workflows/main.yml +2 -4
- data/.rubocop.yml +4 -2
- data/.tool-versions +1 -1
- data/AGENTS.md +73 -0
- data/CHANGELOG.md +5 -0
- data/CLAUDE.md +77 -0
- data/Gemfile +1 -0
- data/Gemfile.lock +72 -56
- data/README.md +531 -393
- data/docs/README.md +67 -0
- data/docs/adapters.md +143 -0
- data/docs/api.md +154 -0
- data/docs/cli.md +93 -0
- data/docs/events.md +79 -0
- data/docs/extensions.md +141 -0
- data/docs/getting-started.md +138 -0
- data/docs/middleware.md +85 -0
- data/docs/processes.md +156 -0
- data/docs/rails.md +128 -0
- data/lib/site_maps/adapters/adapter.rb +35 -5
- data/lib/site_maps/adapters/aws_sdk/storage.rb +5 -2
- data/lib/site_maps/builder/sitemap_index/item.rb +1 -1
- data/lib/site_maps/builder/sitemap_index.rb +29 -5
- data/lib/site_maps/builder/url.rb +13 -10
- data/lib/site_maps/builder/url_set.rb +17 -7
- data/lib/site_maps/builder/xsl_stylesheet.rb +192 -0
- data/lib/site_maps/cli.rb +6 -2
- data/lib/site_maps/configuration.rb +8 -1
- data/lib/site_maps/incremental_location.rb +1 -1
- data/lib/site_maps/middleware.rb +197 -0
- data/lib/site_maps/notification/event.rb +1 -1
- data/lib/site_maps/notification/publisher.rb +1 -0
- data/lib/site_maps/notification.rb +1 -0
- data/lib/site_maps/ping.rb +35 -0
- data/lib/site_maps/{primitives → primitive}/array.rb +1 -1
- data/lib/site_maps/{primitives → primitive}/output.rb +1 -1
- data/lib/site_maps/primitive/string.rb +106 -0
- data/lib/site_maps/robots_txt.rb +21 -0
- data/lib/site_maps/runner/event_listener.rb +2 -2
- data/lib/site_maps/runner.rb +17 -3
- data/lib/site_maps/sitemap_builder.rb +16 -4
- data/lib/site_maps/sitemap_reader.rb +3 -0
- data/lib/site_maps/version.rb +1 -1
- data/lib/site_maps.rb +81 -10
- data/site_maps.gemspec +1 -1
- metadata +23 -10
- data/lib/site_maps/primitives/string.rb +0 -43
data/README.md
CHANGED
|
@@ -1,46 +1,65 @@
|
|
|
1
1
|
# SiteMaps
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
A concurrent, incremental sitemap generator for Ruby. Framework-agnostic with built-in Rails support.
|
|
4
|
+
|
|
5
|
+
Generates SEO-optimized XML sitemaps with support for sitemap indexes, XSL stylesheets, gzip compression, image/video/news extensions, search engine pinging, and Rack middleware for serving sitemaps with proper HTTP headers.
|
|
6
|
+
|
|
7
|
+
## Documentation
|
|
8
|
+
|
|
9
|
+
Full guides, adapter reference, CLI docs, and recipes are published at **[gems.marcosz.com.br/site_maps](https://gems.marcosz.com.br/site_maps/)** — part of the [marcosgz Ruby gem catalogue](https://gems.marcosz.com.br).
|
|
10
|
+
|
|
11
|
+
## Table of Contents
|
|
12
|
+
|
|
13
|
+
- [Documentation](#documentation)
|
|
14
|
+
- [Installation](#installation)
|
|
15
|
+
- [Quick Start](#quick-start)
|
|
16
|
+
- [Configuration](#configuration)
|
|
17
|
+
- [Processes](#processes)
|
|
18
|
+
- [Multi-Tenant Configuration](#multi-tenant-configuration)
|
|
19
|
+
- [URL Filtering](#url-filtering)
|
|
20
|
+
- [External Sitemaps](#external-sitemaps)
|
|
21
|
+
- [Sitemap Extensions](#sitemap-extensions)
|
|
22
|
+
- [XSL Stylesheets](#xsl-stylesheets)
|
|
23
|
+
- [Rack Middleware](#rack-middleware)
|
|
24
|
+
- [robots.txt](#robotstxt)
|
|
25
|
+
- [Search Engine Ping](#search-engine-ping)
|
|
26
|
+
- [Adapters](#adapters)
|
|
27
|
+
- [CLI](#cli)
|
|
28
|
+
- [Notifications](#notifications)
|
|
29
|
+
- [Mixins](#mixins)
|
|
30
|
+
- [Development](#development)
|
|
31
|
+
- [License](#license)
|
|
4
32
|
|
|
5
33
|
## Installation
|
|
6
34
|
|
|
7
|
-
Add
|
|
35
|
+
Add to your Gemfile:
|
|
8
36
|
|
|
9
37
|
```ruby
|
|
10
|
-
gem
|
|
38
|
+
gem "site_maps"
|
|
11
39
|
```
|
|
12
40
|
|
|
13
|
-
|
|
41
|
+
Then run `bundle install`.
|
|
14
42
|
|
|
15
|
-
|
|
16
|
-
bundle install
|
|
17
|
-
```
|
|
18
|
-
|
|
19
|
-
Or install it yourself as:
|
|
20
|
-
|
|
21
|
-
```bash
|
|
22
|
-
gem install site_maps
|
|
23
|
-
```
|
|
43
|
+
## Quick Start
|
|
24
44
|
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
Create a configuration file where you will define the sitemap logic. You can use the following DSL to define the sitemap generation. Below is the minimum configuration required to generate a sitemap:
|
|
45
|
+
Create a configuration file:
|
|
28
46
|
|
|
29
47
|
```ruby
|
|
30
48
|
# config/sitemap.rb
|
|
31
49
|
SiteMaps.use(:file_system) do
|
|
32
50
|
configure do |config|
|
|
33
|
-
config.url = "https://example.com/
|
|
34
|
-
config.directory =
|
|
51
|
+
config.url = "https://example.com/sitemap.xml"
|
|
52
|
+
config.directory = Rails.public_path.to_s
|
|
35
53
|
end
|
|
54
|
+
|
|
36
55
|
process do |s|
|
|
37
|
-
s.add(
|
|
38
|
-
s.add(
|
|
56
|
+
s.add("/", lastmod: Time.now)
|
|
57
|
+
s.add("/about", lastmod: Time.now)
|
|
39
58
|
end
|
|
40
59
|
end
|
|
41
60
|
```
|
|
42
61
|
|
|
43
|
-
|
|
62
|
+
Generate sitemaps:
|
|
44
63
|
|
|
45
64
|
```ruby
|
|
46
65
|
SiteMaps.generate(config_file: "config/sitemap.rb")
|
|
@@ -48,544 +67,668 @@ SiteMaps.generate(config_file: "config/sitemap.rb")
|
|
|
48
67
|
.run
|
|
49
68
|
```
|
|
50
69
|
|
|
51
|
-
|
|
70
|
+
Or via CLI:
|
|
52
71
|
|
|
53
72
|
```bash
|
|
54
73
|
bundle exec site_maps generate --config-file config/sitemap.rb
|
|
55
74
|
```
|
|
56
75
|
|
|
57
|
-
|
|
76
|
+
## Configuration
|
|
58
77
|
|
|
59
|
-
Configuration can be
|
|
60
|
-
|
|
61
|
-
* `url` - URL of the main sitemap index file. This URL must end with `.xml` or `.xml.gz`.
|
|
62
|
-
* `directory` - Directory where the sitemap files will be stored.
|
|
63
|
-
|
|
64
|
-
Configuration using the `#configure` block
|
|
78
|
+
Configuration can be set inside the `SiteMaps.use` block using `configure`, `config`, or by passing options directly:
|
|
65
79
|
|
|
66
80
|
```ruby
|
|
81
|
+
# Block style
|
|
67
82
|
SiteMaps.use(:file_system) do
|
|
68
83
|
configure do |config|
|
|
69
|
-
config.url = "https://example.com/
|
|
70
|
-
config.directory = "/
|
|
84
|
+
config.url = "https://example.com/sitemap.xml.gz"
|
|
85
|
+
config.directory = "/var/www/public"
|
|
71
86
|
end
|
|
72
|
-
# define sitemap processes..
|
|
73
87
|
end
|
|
74
|
-
```
|
|
75
|
-
|
|
76
|
-
Configuration using `#config` method
|
|
77
88
|
|
|
78
|
-
|
|
89
|
+
# Inline style
|
|
79
90
|
SiteMaps.use(:file_system) do
|
|
80
|
-
config.url = "https://example.com/
|
|
81
|
-
config.directory = "/
|
|
82
|
-
# define sitemap processes..
|
|
91
|
+
config.url = "https://example.com/sitemap.xml.gz"
|
|
92
|
+
config.directory = "/var/www/public"
|
|
83
93
|
end
|
|
94
|
+
|
|
95
|
+
# Options style
|
|
96
|
+
SiteMaps.use(:file_system, url: "https://example.com/sitemap.xml.gz", directory: "/var/www/public")
|
|
84
97
|
```
|
|
85
98
|
|
|
86
|
-
|
|
99
|
+
### Common Options
|
|
100
|
+
|
|
101
|
+
| Option | Default | Description |
|
|
102
|
+
|--------|---------|-------------|
|
|
103
|
+
| `url` | *required* | URL of the main sitemap index file. Must end with `.xml` or `.xml.gz`. |
|
|
104
|
+
| `directory` | `"/tmp/sitemaps"` | Local directory for generated sitemap files. |
|
|
105
|
+
| `max_links` | `50_000` | Maximum URLs per sitemap file before splitting. Set to `1_000` for Yoast-style performance. |
|
|
106
|
+
| `emit_priority` | `true` | Include `<priority>` in XML output. Google ignores this — set to `false` to omit. |
|
|
107
|
+
| `emit_changefreq` | `true` | Include `<changefreq>` in XML output. Google ignores this — set to `false` to omit. |
|
|
108
|
+
| `xsl_stylesheet_url` | `nil` | URL of the XSL stylesheet for URL set sitemaps. Enables human-readable browser display. |
|
|
109
|
+
| `xsl_index_stylesheet_url` | `nil` | URL of the XSL stylesheet for the sitemap index. |
|
|
110
|
+
| `ping_search_engines` | `false` | Ping search engines after sitemap generation. |
|
|
111
|
+
| `ping_engines` | `nil` | Custom engines hash. Defaults to Bing when `nil`. |
|
|
112
|
+
|
|
113
|
+
### Gzip Compression
|
|
114
|
+
|
|
115
|
+
Append `.gz` to the sitemap URL to enable automatic gzip compression:
|
|
87
116
|
|
|
88
117
|
```ruby
|
|
89
|
-
|
|
90
|
-
# define sitemap processes..
|
|
91
|
-
end
|
|
118
|
+
config.url = "https://example.com/sitemap.xml.gz"
|
|
92
119
|
```
|
|
93
120
|
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
### Gzip Compression
|
|
121
|
+
### Priority and Change Frequency
|
|
97
122
|
|
|
98
|
-
|
|
123
|
+
Google and most search engines ignore `<priority>` and `<changefreq>` — only `<lastmod>` is meaningful. You can disable them:
|
|
99
124
|
|
|
100
125
|
```ruby
|
|
101
|
-
# config/sitemap.rb
|
|
102
126
|
SiteMaps.use(:file_system) do
|
|
103
127
|
configure do |config|
|
|
104
|
-
config.url = "https://example.com/
|
|
105
|
-
config.
|
|
106
|
-
|
|
107
|
-
process do |s|
|
|
108
|
-
# Add sitemap links
|
|
128
|
+
config.url = "https://example.com/sitemap.xml"
|
|
129
|
+
config.emit_priority = false
|
|
130
|
+
config.emit_changefreq = false
|
|
109
131
|
end
|
|
110
132
|
end
|
|
111
133
|
```
|
|
112
134
|
|
|
113
|
-
|
|
135
|
+
When disabled, default values (`priority: 0.5`, `changefreq: "weekly"`) are not included in the XML output. If you explicitly pass `priority:` or `changefreq:` to `s.add`, they are still emitted regardless of the flag.
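
The rule above can be sketched in plain Ruby. This is a minimal illustration, assuming the behaviour is exactly as described; the `url_attributes` helper and the `DEFAULTS` constant are hypothetical names and not part of the site_maps API.

```ruby
# Hypothetical helper, not part of site_maps: decides which optional
# attributes end up in a <url> entry under the rule described above.
DEFAULTS = {priority: 0.5, changefreq: "weekly"}.freeze

def url_attributes(explicit, emit_priority: true, emit_changefreq: true)
  attrs = {}

  # Explicit values are always emitted, whatever the flag says.
  # Defaults are only filled in while the matching flag is enabled.
  if explicit.key?(:priority)
    attrs[:priority] = explicit[:priority]
  elsif emit_priority
    attrs[:priority] = DEFAULTS[:priority]
  end

  if explicit.key?(:changefreq)
    attrs[:changefreq] = explicit[:changefreq]
  elsif emit_changefreq
    attrs[:changefreq] = DEFAULTS[:changefreq]
  end

  attrs
end

# Both flags off, nothing explicit: no optional attributes at all.
p url_attributes({}, emit_priority: false, emit_changefreq: false)

# Both flags off, explicit priority: only the explicit value survives.
p url_attributes({priority: 0.9}, emit_priority: false, emit_changefreq: false)
```

In other words, the flags only control whether defaults are emitted; they do not suppress values passed to `s.add`.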
|
|
114
136
|
|
|
115
|
-
|
|
137
|
+
## Processes
|
|
116
138
|
|
|
139
|
+
Processes define units of work for sitemap generation. Each process runs in a separate thread for concurrent generation.
|
|
117
140
|
|
|
118
|
-
|
|
119
|
-
* Multiple processes defined in the configuration file.
|
|
120
|
-
* The amount of links exceeds the maximum limit of links (50,000 links).
|
|
121
|
-
* The amount of news links exceeds the maximum limit of news links (1,000 links).
|
|
122
|
-
* The uncompressed file size exceeds the maximum limit of file size (50MB).
|
|
141
|
+
### Static Processes
|
|
123
142
|
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
Sitemap links are defined in the `process` block because the gem is designed to generate sitemaps for large websites in parallel. It means that each process will be executed in a separate thread, which will improve the performance of the sitemap generation.
|
|
127
|
-
|
|
128
|
-
Each process can have a unique name and a unique sitemap file location. By omitting the name and the file location, the process will use the `:default` value.
|
|
129
|
-
|
|
130
|
-
Below is an example of a configuration file with multiple processes:
|
|
143
|
+
Execute once with a fixed location:
|
|
131
144
|
|
|
132
145
|
```ruby
|
|
133
|
-
# config/sitemap.rb
|
|
134
146
|
SiteMaps.use(:file_system) do
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
config.directory = "/home/www/public"
|
|
138
|
-
end
|
|
139
|
-
# Static Processes
|
|
147
|
+
config.url = "https://example.com/sitemap.xml"
|
|
148
|
+
|
|
140
149
|
process do |s|
|
|
141
|
-
s.add(
|
|
142
|
-
s.add(
|
|
150
|
+
s.add("/", lastmod: Time.now)
|
|
151
|
+
s.add("/about", lastmod: Time.now)
|
|
143
152
|
end
|
|
153
|
+
|
|
144
154
|
process :categories, "categories/sitemap.xml" do |s|
|
|
145
155
|
Category.find_each do |category|
|
|
146
|
-
s.add(category_path(category),
|
|
156
|
+
s.add(category_path(category), lastmod: category.updated_at)
|
|
147
157
|
end
|
|
148
158
|
end
|
|
149
|
-
|
|
150
|
-
|
|
159
|
+
end
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
### Dynamic Processes
|
|
163
|
+
|
|
164
|
+
Execute multiple times with different parameters. The location supports `%{placeholder}` interpolation:
|
|
165
|
+
|
|
166
|
+
```ruby
|
|
167
|
+
SiteMaps.use(:file_system) do
|
|
168
|
+
config.url = "https://example.com/sitemap.xml"
|
|
169
|
+
|
|
170
|
+
process :posts, "posts/%{year}-%{month}/sitemap.xml", year: Date.today.year, month: Date.today.month do |s, year:, month:, **|
|
|
151
171
|
Post.where(year: year.to_i, month: month.to_i).find_each do |post|
|
|
152
|
-
s.add(post_path(post),
|
|
172
|
+
s.add(post_path(post), lastmod: post.updated_at)
|
|
153
173
|
end
|
|
154
174
|
end
|
|
155
175
|
end
|
|
156
176
|
```
|
|
157
177
|
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
Location can contain placeholders that will be replaced by the values passed to the process block (the `%{year}` and `%{month}` in the example below). Both relative and absolute paths are supported. Note that when using relative paths, the base directory of the main sitemap index file will be used as the root directory.
|
|
161
|
-
|
|
162
|
-
It will let you enqueue the same process multiple times with different values.
|
|
178
|
+
Enqueue dynamic processes with specific values:
|
|
163
179
|
|
|
164
180
|
```ruby
|
|
165
181
|
SiteMaps.generate(config_file: "config/sitemap.rb")
|
|
166
|
-
.enqueue(:posts, year: "
|
|
167
|
-
.enqueue(:posts, year: "
|
|
168
|
-
.enqueue_remaining
|
|
182
|
+
.enqueue(:posts, year: "2024", month: "01")
|
|
183
|
+
.enqueue(:posts, year: "2024", month: "02")
|
|
184
|
+
.enqueue_remaining # enqueue all other non-enqueued processes
|
|
169
185
|
.run
|
|
170
186
|
```
|
|
171
187
|
|
|
172
|
-
**
|
|
188
|
+
**Note:** Dynamic process arguments may be strings when coming from CLI or external sources. Add `.to_i` or other conversions in the process block as needed.
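
The `%{placeholder}` syntax is the same named-reference form that Ruby's own `Kernel#format` understands, so the substitution and the string-to-integer conversion can be tried outside the gem. Whether site_maps calls `format` internally is an assumption; the snippet only illustrates the syntax and the conversion mentioned in the note.

```ruby
# Illustration only: Kernel#format fills %{name} references from a hash.
# Whether site_maps uses format internally is an assumption.
template = "posts/%{year}-%{month}/sitemap.xml"

# Values arriving from the CLI are strings.
from_cli = {year: "2024", month: "01"}
location = format(template, from_cli)

# Convert them before using them in numeric queries.
year = from_cli[:year].to_i
month = from_cli[:month].to_i

puts location
puts [year, month].inspect
```

Keeping the string form for the location preserves the leading zero in the path, while the integer form is what a numeric column comparison needs.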
|
|
189
|
+
|
|
190
|
+
### Automatic Splitting
|
|
173
191
|
|
|
174
|
-
|
|
175
|
-
* By omitting the extra arguments, the process will be enqueued with the default values defined in the configuration file. So make sure you define default values or properly add nil checks in the process block to avoid errors.
|
|
192
|
+
Sitemaps are automatically split into multiple files and a sitemap index is generated when:
|
|
176
193
|
|
|
177
|
-
|
|
194
|
+
- Multiple processes are defined.
|
|
195
|
+
- URL count exceeds `max_links` (default 50,000).
|
|
196
|
+
- News URL count exceeds 1,000.
|
|
197
|
+
- Uncompressed file size exceeds 50MB.
|
|
178
198
|
|
|
179
|
-
|
|
199
|
+
Split files are named sequentially: `sitemap1.xml`, `sitemap2.xml`, etc.
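
As a rough sketch of the link-count rule only, the splitting can be pictured with `each_slice`. The `split_into_files` helper is hypothetical, the news and 50MB limits are left out, and the name used for the unsplit single-file case is an assumption.

```ruby
# Hypothetical sketch of the link-count rule; not the site_maps implementation.
def split_into_files(urls, max_links: 50_000)
  chunks = urls.each_slice(max_links).to_a

  # Assumption: a single chunk is not split and keeps the plain file name.
  return {"sitemap.xml" => chunks.first || []} if chunks.size <= 1

  # Several chunks are numbered sequentially, starting at 1.
  chunks.each_with_index.to_h do |chunk, index|
    ["sitemap#{index + 1}.xml", chunk]
  end
end

urls = (1..5).map { |n| "/page-#{n}" }
files = split_into_files(urls, max_links: 2)
puts files.keys.inspect
```

A lower `max_links` produces more, smaller files, which is the trade-off the option table describes.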
|
|
180
200
|
|
|
181
|
-
|
|
182
|
-
* [Image](https://support.google.com/webmasters/answer/178636?hl=en)
|
|
183
|
-
* [Mobile](http://support.google.com/webmasters/bin/answer.py?hl=en&answer=34648)
|
|
184
|
-
* [News](https://support.google.com/news/publisher-center/answer/9606710?hl=en)
|
|
185
|
-
* [PageMap](https://developers.google.com/custom-search/docs/structured_data?csw=1#pagemaps)
|
|
186
|
-
* [Video](https://support.google.com/webmasters/answer/80471?hl=en)
|
|
201
|
+
## Multi-Tenant Configuration
|
|
187
202
|
|
|
188
|
-
|
|
203
|
+
For multi-tenant applications where each site shares a config file but needs runtime context (like a `Site` model loaded from the database), use `SiteMaps.define` with the `context:` kwarg.
|
|
189
204
|
|
|
190
|
-
|
|
205
|
+
The `context:` value must be a `Hash`. Its keys are passed as keyword arguments to the `define` block:
|
|
191
206
|
|
|
192
|
-
|
|
207
|
+
```ruby
|
|
208
|
+
# config/sitemap.rb
|
|
209
|
+
SiteMaps.define do |site:, **|
|
|
210
|
+
use(:file_system) do
|
|
211
|
+
configure do |config|
|
|
212
|
+
config.url = "https://#{site.domain}/sitemap.xml"
|
|
213
|
+
config.directory = site.public_path
|
|
214
|
+
end
|
|
193
215
|
|
|
194
|
-
|
|
216
|
+
process do |s|
|
|
217
|
+
site.pages.find_each { |p| s.add(p.path, lastmod: p.updated_at) }
|
|
218
|
+
end
|
|
219
|
+
|
|
220
|
+
process :posts, "posts/sitemap.xml" do |s|
|
|
221
|
+
site.posts.published.find_each { |p| s.add(p.path, lastmod: p.updated_at) }
|
|
222
|
+
end
|
|
223
|
+
end
|
|
224
|
+
end
|
|
225
|
+
```
|
|
195
226
|
|
|
196
227
|
```ruby
|
|
197
|
-
|
|
198
|
-
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
|
|
203
|
-
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
|
|
228
|
+
# Usage — iterate sites, each gets its own isolated adapter
|
|
229
|
+
Site.find_each do |site|
|
|
230
|
+
SiteMaps.generate(config_file: "config/sitemap.rb", context: {site: site}).enqueue_all.run
|
|
231
|
+
end
|
|
232
|
+
```
|
|
233
|
+
|
|
234
|
+
Multiple context values are passed as additional Hash keys:
|
|
235
|
+
|
|
236
|
+
```ruby
|
|
237
|
+
SiteMaps.define do |site:, locale:|
|
|
238
|
+
use(:file_system) do
|
|
239
|
+
config.url = "https://#{site.domain}/#{locale}/sitemap.xml"
|
|
240
|
+
# ...
|
|
208
241
|
end
|
|
209
242
|
end
|
|
243
|
+
|
|
244
|
+
SiteMaps.generate(config_file: "config/sitemap.rb", context: {site: site, locale: "en"}).run
|
|
210
245
|
```
|
|
211
246
|
|
|
212
|
-
|
|
213
|
-
* `loc` - URL of the image.
|
|
214
|
-
* `caption` - Image caption.
|
|
215
|
-
* `geo_location` - Image geo location.
|
|
216
|
-
* `title` - Image title.
|
|
217
|
-
* `license` - Image license.
|
|
247
|
+
### Serving with Rack Middleware
|
|
218
248
|
|
|
219
|
-
|
|
249
|
+
`SiteMaps::Middleware` supports multi-tenant setups via a callable `adapter:`. Because the adapter is resolved per-request, you can derive it from thread-local state set by an upstream middleware (e.g. `Current.site`):
|
|
220
250
|
|
|
221
|
-
|
|
251
|
+
```ruby
|
|
252
|
+
# Insert after your multitenancy middleware so Current.site is already set
|
|
253
|
+
Rails.application.middleware.insert_after MultitenancyMiddleware, SiteMaps::Middleware,
|
|
254
|
+
adapter: -> {
|
|
255
|
+
site = Current.site
|
|
256
|
+
next unless site
|
|
222
257
|
|
|
223
|
-
|
|
258
|
+
SiteMaps::Adapters::FileSystem.new(url: site.sitemap_url, directory: "tmp/")
|
|
259
|
+
}
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
Both `adapter:` and the prefix options accept a 0-arg lambda (reads thread-local state) or a 1-arg lambda (receives the Rack `env`).
|
|
263
|
+
|
|
264
|
+
#### Path mapping options
|
|
265
|
+
|
|
266
|
+
Use these when the public URL path and the storage path differ:
|
|
267
|
+
|
|
268
|
+
| Option | Direction | Example |
|
|
269
|
+
|---|---|---|
|
|
270
|
+
| `public_prefix:` | Public URL has an extra prefix → strip it to find the file | Stored at `/sitemap.xml`, served at `/sitemaps/tenant/sitemap.xml` |
|
|
271
|
+
| `storage_prefix:` | Storage has an extra prefix → prepend it to the public path | Stored at `/sitemaps/tenant/sitemap.xml`, served at `/sitemap.xml` |
|
|
224
272
|
|
|
225
273
|
```ruby
|
|
226
|
-
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
|
|
230
|
-
|
|
231
|
-
|
|
232
|
-
|
|
233
|
-
|
|
234
|
-
|
|
235
|
-
|
|
236
|
-
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
|
|
240
|
-
|
|
241
|
-
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
|
|
274
|
+
# Sitemaps stored at /sitemaps/{slug}/sitemap.xml, served at /sitemap.xml
|
|
275
|
+
# (subdomain identifies the tenant, no prefix needed in the public URL)
|
|
276
|
+
Rails.application.middleware.insert_after MultitenancyMiddleware, SiteMaps::Middleware,
|
|
277
|
+
storage_prefix: -> { site = Current.site; "/sitemaps/#{site.slug}" if site },
|
|
278
|
+
adapter: -> { ... }
|
|
279
|
+
|
|
280
|
+
# Sitemaps stored at root, served at /sitemaps/{slug}/sitemap.xml
|
|
281
|
+
Rails.application.middleware.insert_after MultitenancyMiddleware, SiteMaps::Middleware,
|
|
282
|
+
public_prefix: -> { site = Current.site; "/sitemaps/#{site.slug}" if site },
|
|
283
|
+
adapter: -> { ... }
|
|
284
|
+
```
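
The two directions in the table above can be sketched as a small path-mapping function. This is an assumption-laden illustration: `storage_path` is a hypothetical helper, `acme` is a made-up tenant slug, and the real middleware may handle edge cases differently.

```ruby
# Hypothetical helper mirroring the path mapping table; not the middleware's code.
def storage_path(request_path, public_prefix: nil, storage_prefix: nil)
  path = request_path

  if public_prefix
    # The request must live under the public prefix, otherwise it is not ours.
    return nil unless path.start_with?("#{public_prefix}/")

    path = path.delete_prefix(public_prefix)
  end

  # Prepend the storage prefix when one is configured.
  storage_prefix ? "#{storage_prefix}#{path}" : path
end

# public_prefix: served under a prefix, stored at the root.
puts storage_path("/sitemaps/acme/sitemap.xml", public_prefix: "/sitemaps/acme")

# storage_prefix: served at the root, stored under a prefix.
puts storage_path("/sitemap.xml", storage_prefix: "/sitemaps/acme")
```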
|
|
285
|
+
|
|
286
|
+
XSL stylesheet requests (`/_sitemap-stylesheet.xsl`, `/_sitemap-index-stylesheet.xsl`) are served directly without resolving the adapter or prefix.
|
|
287
|
+
|
|
288
|
+
### Thread safety
|
|
289
|
+
|
|
290
|
+
`SiteMaps.generate(config_file:, context:)` is thread-safe. Each call uses a thread-local scope to isolate adapter construction during `load(config_file)`, so concurrent calls from different threads don't race on module-level state:
|
|
291
|
+
|
|
292
|
+
```ruby
|
|
293
|
+
Site.find_each.map do |site|
|
|
294
|
+
Thread.new do
|
|
295
|
+
SiteMaps.generate(config_file: "config/sitemap.rb", context: {site: site}).enqueue_all.run
|
|
246
296
|
end
|
|
297
|
+
end.each(&:join)
|
|
298
|
+
```
|
|
299
|
+
|
|
300
|
+
Each thread's `Runner` gets its own isolated adapter. Note that `SiteMaps.current_adapter` (the module singleton) exhibits last-writer-wins semantics under concurrency — use the `Runner`'s `#adapter` attribute if you need a specific generation's adapter.
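
The isolation idea can be illustrated in plain Ruby with `Thread.current`. This is only a sketch of the general technique: the `:sketch_adapter` key and the `with_scoped_adapter` helper are made up, and the gem's actual scoping mechanism may differ.

```ruby
# Illustration of thread-local scoping; names here are not site_maps API.
# Note that Thread.current[] storage is technically fiber-local in Ruby.
def with_scoped_adapter(adapter)
  previous = Thread.current[:sketch_adapter]
  Thread.current[:sketch_adapter] = adapter
  yield
ensure
  # Restore whatever was there before, even if the block raised.
  Thread.current[:sketch_adapter] = previous
end

results = %w[a.example b.example].map do |domain|
  Thread.new do
    with_scoped_adapter("adapter for #{domain}") do
      sleep 0.01 # give the other thread a chance to interleave
      Thread.current[:sketch_adapter]
    end
  end
end.map(&:value)

puts results.inspect
```

Each thread reads back only the value it set, which is the property that keeps concurrent generations from seeing each other's adapter.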
|
|
301
|
+
|
|
302
|
+
For cases where you want to skip the config file entirely (e.g., everything dynamic from the database), instantiate adapters directly:
|
|
303
|
+
|
|
304
|
+
```ruby
|
|
305
|
+
adapter = SiteMaps::Adapters::FileSystem.new do
|
|
306
|
+
config.url = "https://#{site.domain}/sitemap.xml"
|
|
307
|
+
# ...
|
|
247
308
|
end
|
|
309
|
+
SiteMaps::Runner.new(adapter).enqueue_all.run
|
|
248
310
|
```
|
|
249
|
-
|
|
250
|
-
|
|
251
|
-
|
|
252
|
-
|
|
253
|
-
|
|
254
|
-
|
|
255
|
-
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
|
|
259
|
-
|
|
260
|
-
|
|
261
|
-
|
|
262
|
-
|
|
263
|
-
|
|
264
|
-
|
|
265
|
-
|
|
266
|
-
|
|
267
|
-
|
|
268
|
-
|
|
269
|
-
|
|
270
|
-
* `price` - Price of the video.
|
|
271
|
-
* `price_currency` - Currency of the video price.
|
|
272
|
-
* `price_type` - Type of the video price.
|
|
273
|
-
* `price_resolution` - Resolution of the video price.
|
|
274
|
-
* `live` - Live attribute of the video.
|
|
275
|
-
* `requires_subscription` - Requires subscription attribute of the video.
|
|
276
|
-
|
|
277
|
-
#### PageMap
|
|
278
|
-
|
|
279
|
-
PageMap sitemaps can be added to the sitemap links by passing `pagemap` attributes to the `add` method. The `pagemap` attribute should be a hash with the pagemap attributes.
|
|
280
|
-
|
|
281
|
-
Check out the Google specification [here](https://developers.google.com/custom-search/docs/structured_data?csw=1#pagemaps).
|
|
282
|
-
|
|
283
|
-
```ruby
|
|
284
|
-
config = { ... }
|
|
285
|
-
SiteMaps.use(:file_system, **config) do
|
|
311
|
+
|
|
312
|
+
## URL Filtering
|
|
313
|
+
|
|
314
|
+
Use `url_filter` to exclude or modify URLs before they enter the sitemap. Filters receive the full URL string and the options hash. Return `false` to exclude, or a modified hash to change options:
|
|
315
|
+
|
|
316
|
+
```ruby
|
|
317
|
+
SiteMaps.use(:file_system) do
|
|
318
|
+
config.url = "https://example.com/sitemap.xml"
|
|
319
|
+
|
|
320
|
+
# Exclude admin URLs
|
|
321
|
+
url_filter { |url, _options| false if url.include?("/admin") }
|
|
322
|
+
|
|
323
|
+
# Override priority for blog posts
|
|
324
|
+
url_filter do |url, options|
|
|
325
|
+
if url.include?("/blog/")
|
|
326
|
+
options.merge(priority: 0.9)
|
|
327
|
+
else
|
|
328
|
+
options
|
|
329
|
+
end
|
|
330
|
+
end
|
|
331
|
+
|
|
286
332
|
process do |s|
|
|
287
|
-
s.add(
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
changefreq: "daily",
|
|
291
|
-
pagemap: {
|
|
292
|
-
dataobjects: [
|
|
293
|
-
{
|
|
294
|
-
type: "document",
|
|
295
|
-
id: "1",
|
|
296
|
-
attributes: [
|
|
297
|
-
{ name: "title", value: "Page Title" },
|
|
298
|
-
{ name: "description", value: "Page Description" },
|
|
299
|
-
{ name: "url", value: "https://example.com" },
|
|
300
|
-
]
|
|
301
|
-
}
|
|
302
|
-
]
|
|
303
|
-
}
|
|
304
|
-
)
|
|
333
|
+
s.add("/", lastmod: Time.now)
|
|
334
|
+
s.add("/admin/dashboard") # excluded by filter
|
|
335
|
+
s.add("/blog/hello-world", lastmod: Time.now) # priority overridden to 0.9
|
|
305
336
|
end
|
|
306
337
|
end
|
|
307
338
|
```
|
|
308
339
|
|
|
309
|
-
|
|
310
|
-
* `dataobjects` - Array of hashes with the data objects.
|
|
311
|
-
* `type` - Type of the object.
|
|
312
|
-
* `id` - ID of the object.
|
|
313
|
-
* `attributes` - Array of hashes with the attributes.
|
|
314
|
-
* `name` - Name of the attribute.
|
|
315
|
-
* `value` - Value of the attribute.
|
|
316
|
-
|
|
317
|
-
#### News
|
|
340
|
+
Multiple filters are chained in order. If any filter returns `false`, the URL is excluded and subsequent filters are not called.
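
The chaining rule can be sketched with plain lambdas. The `apply_filters` helper is hypothetical and not the gem's implementation; in particular, treating a `nil` return as "keep the current options" is an assumption, made here because the admin filter in the example above returns `nil` for non-admin URLs.

```ruby
# Hypothetical sketch of the chaining rule; not the site_maps implementation.
def apply_filters(filters, url, options)
  filters.each do |filter|
    result = filter.call(url, options)

    # false excludes the URL and short-circuits the remaining filters.
    return nil if result == false

    # A Hash replaces the options; anything else (assumed) leaves them as-is.
    options = result if result.is_a?(Hash)
  end

  options
end

filters = [
  ->(url, _options) { false if url.include?("/admin") },
  ->(url, options) { url.include?("/blog/") ? options.merge(priority: 0.9) : options }
]

excluded = apply_filters(filters, "https://example.com/admin/dashboard", {})
boosted = apply_filters(filters, "https://example.com/blog/hello-world", {lastmod: "2024-01-01"})

p excluded
p boosted
```

Because evaluation stops at the first `false`, putting cheap exclusion filters first avoids running the later ones for URLs that are dropped anyway.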
|
|
318
341
|
|
|
319
|
-
|
|
342
|
+
## External Sitemaps
|
|
320
343
|
|
|
321
|
-
|
|
344
|
+
Add third-party or externally-hosted sitemaps to your sitemap index using `external_sitemap`:
|
|
322
345
|
|
|
323
346
|
```ruby
|
|
324
|
-
|
|
325
|
-
|
|
347
|
+
SiteMaps.use(:file_system) do
|
|
348
|
+
config.url = "https://example.com/sitemap.xml"
|
|
349
|
+
|
|
350
|
+
external_sitemap "https://cdn.example.com/products-sitemap.xml", lastmod: Time.now
|
|
351
|
+
external_sitemap "https://blog.example.com/sitemap.xml"
|
|
352
|
+
|
|
326
353
|
process do |s|
|
|
327
|
-
s.add(
|
|
328
|
-
'/',
|
|
329
|
-
priority: 1.0,
|
|
330
|
-
changefreq: "daily",
|
|
331
|
-
news: {
|
|
332
|
-
publication_name: "Publication Name",
|
|
333
|
-
publication_language: "en",
|
|
334
|
-
publication_date: Time.now,
|
|
335
|
-
genres: "PressRelease",
|
|
336
|
-
access: "Subscription",
|
|
337
|
-
title: "News Title",
|
|
338
|
-
keywords: "News Keywords",
|
|
339
|
-
stock_tickers: "NASDAQ:GOOG",
|
|
340
|
-
}
|
|
341
|
-
)
|
|
354
|
+
s.add("/", lastmod: Time.now)
|
|
342
355
|
end
|
|
343
356
|
end
|
|
344
357
|
```
|
|
345
358
|
|
|
346
|
-
|
|
347
|
-
* `publication_name` - Name of the publication.
|
|
348
|
-
* `publication_language` - Language of the publication.
|
|
349
|
-
* `publication_date` - Publication date of the news.
|
|
350
|
-
* `genres` - Genres of the news.
|
|
351
|
-
* `access` - Access of the news.
|
|
352
|
-
* `title` - Title of the news.
|
|
353
|
-
* `keywords` - Keywords of the news.
|
|
354
|
-
* `stock_tickers` - Stock tickers of the news.
|
|
359
|
+
External sitemaps appear in the sitemap index alongside your generated sitemaps. When external sitemaps are present, the index is always generated (even with a single process).
|
|
355
360
|
|
|
356
|
-
|
|
361
|
+
## Sitemap Extensions
|
|
357
362
|
|
|
358
|
-
|
|
363
|
+
### Image
|
|
359
364
|
|
|
360
|
-
|
|
365
|
+
Up to 1,000 images per URL. See [Google specification](https://support.google.com/webmasters/answer/178636?hl=en).
|
|
361
366
|
|
|
362
367
|
```ruby
|
|
363
|
-
|
|
364
|
-
|
|
365
|
-
|
|
366
|
-
|
|
367
|
-
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
alternates: [
|
|
371
|
-
{ href: "https://example.com/en", lang: "en" },
|
|
372
|
-
{ href: "https://example.com/es", lang: "es" },
|
|
373
|
-
],
|
|
374
|
-
)
|
|
375
|
-
end
|
|
376
|
-
end
|
|
368
|
+
s.add("/gallery",
|
|
369
|
+
lastmod: Time.now,
|
|
370
|
+
images: [
|
|
371
|
+
{ loc: "https://example.com/photo1.jpg", title: "Photo 1", caption: "A photo" },
|
|
372
|
+
{ loc: "https://example.com/photo2.jpg", title: "Photo 2" }
|
|
373
|
+
]
|
|
374
|
+
)
|
|
377
375
|
```
|
|
378
376
|
|
|
379
|
-
|
|
380
|
-
* `href` - URL of the alternate link. (Required)
|
|
381
|
-
* `lang` - Language of the alternate link. (Optional)
|
|
382
|
-
* `nofollow` - Nofollow attribute of the alternate link. (Optional)
|
|
383
|
-
* `media` - Media targets for responsive design pages. (Optional)
|
|
377
|
+
Attributes: `loc`, `caption`, `geo_location`, `title`, `license`.
|
|
384
378
|
|
|
385
|
-
|
|
379
|
+
### Video
|
|
386
380
|
|
|
387
|
-
|
|
381
|
+
See [Google specification](https://support.google.com/webmasters/answer/80471?hl=en).
|
|
388
382
|
|
|
389
|
-
|
|
383
|
+
```ruby
|
|
384
|
+
s.add("/videos/example",
|
|
385
|
+
lastmod: Time.now,
|
|
386
|
+
videos: [
|
|
387
|
+
{
|
|
388
|
+
thumbnail_loc: "https://example.com/thumb.jpg",
|
|
389
|
+
title: "Example Video",
|
|
390
|
+
description: "An example video",
|
|
391
|
+
content_loc: "https://example.com/video.mp4",
|
|
392
|
+
duration: 600,
|
|
393
|
+
publication_date: Time.now
|
|
394
|
+
}
|
|
395
|
+
]
|
|
396
|
+
)
|
|
397
|
+
```
|
|
398
|
+
|
|
399
|
+
Attributes: `thumbnail_loc`, `title`, `description`, `content_loc`, `player_loc`, `allow_embed`, `autoplay`, `duration`, `expiration_date`, `rating`, `view_count`, `publication_date`, `tags`, `tag`, `category`, `family_friendly`, `gallery_loc`, `gallery_title`, `uploader`, `uploader_info`, `price`, `live`, `requires_subscription`.
|
|
400
|
+
|
|
401
|
+
### News
|
|
402
|
+
|
|
403
|
+
Up to 1,000 news URLs per sitemap. See [Google specification](https://support.google.com/news/publisher-center/answer/9606710?hl=en).
|
|
390
404
|
|
|
391
405
|
```ruby
|
|
392
|
-
|
|
393
|
-
|
|
394
|
-
|
|
395
|
-
|
|
396
|
-
|
|
397
|
-
|
|
406
|
+
s.add("/article/breaking-news",
|
|
407
|
+
lastmod: Time.now,
|
|
408
|
+
news: {
|
|
409
|
+
publication_name: "Example Times",
|
|
410
|
+
publication_language: "en",
|
|
411
|
+
publication_date: Time.now,
|
|
412
|
+
title: "Breaking News Story",
|
|
413
|
+
keywords: "breaking, news",
|
|
414
|
+
genres: "PressRelease",
|
|
415
|
+
access: "Subscription",
|
|
416
|
+
stock_tickers: "NASDAQ:GOOG"
|
|
417
|
+
}
|
|
418
|
+
)
|
|
398
419
|
```
|
|
399
420
|
|
|
400
|
-
|
|
421
|
+
Attributes: `publication_name`, `publication_language`, `publication_date`, `genres`, `access`, `title`, `keywords`, `stock_tickers`.
|
|
401
422
|
|
|
402
|
-
|
|
423
|
+
### Alternates (hreflang)
|
|
403
424
|
|
|
404
|
-
|
|
425
|
+
For multi-language sites. See [Google specification](https://support.google.com/webmasters/answer/189077).
|
|
405
426
|
|
|
406
|
-
|
|
427
|
+
```ruby
|
|
428
|
+
s.add("/",
|
|
429
|
+
lastmod: Time.now,
|
|
430
|
+
alternates: [
|
|
431
|
+
{ href: "https://example.com/en", lang: "en" },
|
|
432
|
+
{ href: "https://example.com/es", lang: "es" },
|
|
433
|
+
{ href: "https://example.com/fr", lang: "fr" }
|
|
434
|
+
]
|
|
435
|
+
)
|
|
436
|
+
```
|
|
407
437
|
|
|
408
|
-
|
|
409
|
-
* AWS S3
|
|
438
|
+
Attributes: `href` (required), `lang`, `nofollow`, `media`.
|
|
410
439
|
|
|
411
|
-
###
|
|
440
|
+
### Mobile
|
|
412
441
|
|
|
413
|
-
|
|
442
|
+
See [Google specification](https://support.google.com/webmasters/answer/34648).
|
|
414
443
|
|
|
415
444
|
```ruby
|
|
445
|
+
s.add("/mobile-page", mobile: true)
|
|
446
|
+
```
|
|
447
|
+
|
|
448
|
+
### PageMap
|
|
449
|
+
|
|
450
|
+
For Google Custom Search. See [Google specification](https://developers.google.com/custom-search/docs/structured_data#pagemaps).
|
|
451
|
+
|
|
452
|
+
```ruby
|
|
453
|
+
s.add("/product",
|
|
454
|
+
lastmod: Time.now,
|
|
455
|
+
pagemap: {
|
|
456
|
+
dataobjects: [
|
|
457
|
+
{
|
|
458
|
+
type: "product",
|
|
459
|
+
id: "sku-123",
|
|
460
|
+
attributes: [
|
|
461
|
+
{ name: "name", value: "Widget" },
|
|
462
|
+
{ name: "price", value: "19.99" }
|
|
463
|
+
]
|
|
464
|
+
}
|
|
465
|
+
]
|
|
466
|
+
}
|
|
467
|
+
)
|
|
468
|
+
```
|
|
469
|
+
|
|
470
|
+
## XSL Stylesheets
|
|
471
|
+
|
|
472
|
+
XSL stylesheets transform raw XML into styled HTML tables when sitemaps are opened in a browser — making them human-readable for debugging and review.
|
|
473
|
+
|
|
474
|
+
The gem ships with built-in stylesheets for both URL set sitemaps and sitemap indexes.
|
|
416
475
|
|
|
476
|
+
### Using with Rack Middleware
|
|
477
|
+
|
|
478
|
+
The simplest setup — the middleware serves both sitemaps and stylesheets:
|
|
479
|
+
|
|
480
|
+
```ruby
|
|
417
481
|
SiteMaps.use(:file_system) do
|
|
418
482
|
configure do |config|
|
|
419
|
-
config.url = "https://example.com/
|
|
420
|
-
config.
|
|
483
|
+
config.url = "https://example.com/sitemap.xml"
|
|
484
|
+
config.xsl_stylesheet_url = "/_sitemap-stylesheet.xsl"
|
|
485
|
+
config.xsl_index_stylesheet_url = "/_sitemap-index-stylesheet.xsl"
|
|
421
486
|
end
|
|
422
|
-
|
|
423
|
-
|
|
487
|
+
end
|
|
488
|
+
```
|
|
489
|
+
|
|
490
|
+
### Using with Static Files
|
|
491
|
+
|
|
492
|
+
Generate the XSL files and serve them as static assets:
|
|
493
|
+
|
|
494
|
+
```ruby
|
|
495
|
+
# Write stylesheets to disk
|
|
496
|
+
File.write("public/sitemap-style.xsl", SiteMaps::Builder::XSLStylesheet.urlset_xsl)
|
|
497
|
+
File.write("public/sitemap-index-style.xsl", SiteMaps::Builder::XSLStylesheet.index_xsl)
|
|
498
|
+
```
|
|
499
|
+
|
|
500
|
+
Then point the config to the static URLs:
|
|
501
|
+
|
|
502
|
+
```ruby
|
|
503
|
+
config.xsl_stylesheet_url = "https://example.com/sitemap-style.xsl"
|
|
504
|
+
config.xsl_index_stylesheet_url = "https://example.com/sitemap-index-style.xsl"
|
|
505
|
+
```
|
|
506
|
+
|
|
507
|
+
## Rack Middleware
|
|
508
|
+
|
|
509
|
+
`SiteMaps::Middleware` serves sitemaps over HTTP with SEO-appropriate headers:
|
|
510
|
+
|
|
511
|
+
- `Content-Type: text/xml; charset=UTF-8`
|
|
512
|
+
- `X-Robots-Tag: noindex, follow`: keeps search engines from indexing the sitemap file itself while still following the URLs it lists
|
|
513
|
+
- `Cache-Control: public, max-age=3600`
|
|
514
|
+
|
|
515
|
+
It also serves the built-in XSL stylesheets at `/_sitemap-stylesheet.xsl` and `/_sitemap-index-stylesheet.xsl`.
|
|
516
|
+
|
|
517
|
+
### Rails
|
|
518
|
+
|
|
519
|
+
```ruby
|
|
520
|
+
# config/application.rb
|
|
521
|
+
config.middleware.use SiteMaps::Middleware
|
|
522
|
+
```
|
|
523
|
+
|
|
524
|
+
### Rack
|
|
525
|
+
|
|
526
|
+
```ruby
|
|
527
|
+
# config.ru
|
|
528
|
+
use SiteMaps::Middleware
|
|
529
|
+
run MyApp
|
|
530
|
+
```
|
|
531
|
+
|
|
532
|
+
### Options
|
|
533
|
+
|
|
534
|
+
```ruby
|
|
535
|
+
use SiteMaps::Middleware,
|
|
536
|
+
adapter: SiteMaps.current_adapter, # defaults to SiteMaps.current_adapter
|
|
537
|
+
public_prefix: nil, # strip this prefix from the public URL before lookup
|
|
538
|
+
storage_prefix: nil, # prepend this prefix to the public URL for storage lookup
|
|
539
|
+
x_robots_tag: "noindex, follow", # default
|
|
540
|
+
cache_control: "public, max-age=3600" # default
|
|
541
|
+
```
|
|
542
|
+
|
|
543
|
+
Non-matching requests pass through to the next middleware.
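
The pass-through behavior can be illustrated with a minimal Rack-style sketch. This is not the code of `SiteMaps::Middleware`; the `SitemapPassthrough` class and its in-memory `sitemaps:` hash are hypothetical stand-ins for the adapter lookup, and only the header values are taken from the list above.

```ruby
# Hypothetical Rack-style middleware (an illustration, not SiteMaps::Middleware).
class SitemapPassthrough
  def initialize(app, sitemaps: {})
    @app = app
    @sitemaps = sitemaps # path => XML body, standing in for adapter storage
  end

  def call(env)
    body = @sitemaps[env["PATH_INFO"]]
    # Non-matching request: hand it to the next app in the stack.
    return @app.call(env) unless body

    headers = {
      "content-type" => "text/xml; charset=UTF-8",
      "x-robots-tag" => "noindex, follow",
      "cache-control" => "public, max-age=3600"
    }
    [200, headers, [body]]
  end
end

# A downstream app that answers everything else with 404.
fallback = ->(_env) { [404, { "content-type" => "text/plain" }, ["Not Found"]] }
stack = SitemapPassthrough.new(fallback, sitemaps: { "/sitemap.xml" => "<urlset/>" })

hit = stack.call({ "PATH_INFO" => "/sitemap.xml" })
miss = stack.call({ "PATH_INFO" => "/about" })
puts hit[0]
puts miss[0]
```

The sketch needs no `rack` dependency because a Rack app is any object that responds to `call(env)` and returns a status, headers, and body triple.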
|
|
544
|
+
|
|
545
|
+
## robots.txt
|
|
546
|
+
|
|
547
|
+
`SiteMaps::RobotsTxt` generates the `Sitemap:` directive for your `robots.txt`:
|
|
548
|
+
|
|
549
|
+
```ruby
|
|
550
|
+
# Get just the directive line
|
|
551
|
+
SiteMaps::RobotsTxt.sitemap_directive("https://example.com/sitemap.xml")
|
|
552
|
+
# => "Sitemap: https://example.com/sitemap.xml"
|
|
553
|
+
|
|
554
|
+
# Auto-detect from current adapter
|
|
555
|
+
SiteMaps::RobotsTxt.sitemap_directive
|
|
556
|
+
# => "Sitemap: https://example.com/sitemap.xml"
|
|
557
|
+
|
|
558
|
+
# Generate a complete robots.txt
|
|
559
|
+
SiteMaps::RobotsTxt.render(
|
|
560
|
+
sitemap_url: "https://example.com/sitemap.xml",
|
|
561
|
+
extra_directives: ["Disallow: /admin/"]
|
|
562
|
+
)
|
|
563
|
+
# => "User-agent: *\nAllow: /\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml\n"
|
|
564
|
+
```
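
To make the layout of the rendered file explicit, here is a minimal sketch that assembles the same string as the `render` output shown above. The `render_robots_txt` helper is hypothetical and written only for illustration; it is not the gem's implementation, and the ordering of directives is inferred from that example output.

```ruby
# Hypothetical helper that mirrors the documented output layout.
def render_robots_txt(sitemap_url:, extra_directives: [])
  lines = [
    "User-agent: *",
    "Allow: /",
    *extra_directives,           # e.g. "Disallow: /admin/"
    "Sitemap: #{sitemap_url}"    # the Sitemap: directive goes last
  ]
  lines.join("\n") + "\n"
end

robots = render_robots_txt(
  sitemap_url: "https://example.com/sitemap.xml",
  extra_directives: ["Disallow: /admin/"]
)
puts robots
```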
|
|
565
|
+
|
|
566
|
+
In a Rails controller:
|
|
567
|
+
|
|
568
|
+
```ruby
|
|
569
|
+
class RobotsController < ApplicationController
|
|
570
|
+
def show
|
|
571
|
+
render plain: SiteMaps::RobotsTxt.render
|
|
424
572
|
end
|
|
425
573
|
end
|
|
426
574
|
```
|
|
427
575
|
|
|
428
|
-
|
|
576
|
+
## Search Engine Ping
|
|
429
577
|
|
|
430
|
-
|
|
578
|
+
After sitemap generation, ping search engines to notify them of updates:
|
|
431
579
|
|
|
432
580
|
```ruby
|
|
433
|
-
SiteMaps.use(:
|
|
581
|
+
SiteMaps.use(:file_system) do
|
|
434
582
|
configure do |config|
|
|
435
|
-
config.url = "https://
|
|
436
|
-
config.
|
|
437
|
-
# AWS S3 specific options
|
|
438
|
-
config.bucket = "my-bucket"
|
|
439
|
-
config.region = "us-east-1"
|
|
440
|
-
config.aws_access_key = ENV["AWS_ACCESS_KEY_ID"]
|
|
441
|
-
config.aws_secret_key = ENV["AWS_SECRET_ACCESS_KEY"]
|
|
442
|
-
# Optional parameters (default values)
|
|
443
|
-
config.acl = "public-read"
|
|
444
|
-
config.cache_control = "private, max-age=0, no-cache"
|
|
445
|
-
end
|
|
446
|
-
process do |s|
|
|
447
|
-
# Add sitemap links
|
|
583
|
+
config.url = "https://example.com/sitemap.xml"
|
|
584
|
+
config.ping_search_engines = true
|
|
448
585
|
end
|
|
449
586
|
end
|
|
450
587
|
```
|
|
451
588
|
|
|
452
|
-
|
|
589
|
+
By default, only Bing is pinged (`https://www.bing.com/ping?sitemap=...`). Google deprecated its ping endpoint in 2023 and now discovers sitemaps through `robots.txt` and Search Console.
|
|
590
|
+
|
|
591
|
+
### Custom Engines
|
|
453
592
|
|
|
454
593
|
```ruby
|
|
455
|
-
|
|
456
|
-
|
|
594
|
+
config.ping_engines = {
|
|
595
|
+
bing: "https://www.bing.com/ping?sitemap=%{url}",
|
|
596
|
+
google: "https://www.google.com/ping?sitemap=%{url}",
|
|
597
|
+
custom: "https://search.example.com/ping?url=%{url}"
|
|
598
|
+
}
|
|
457
599
|
```
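
The `%{url}` placeholder matches Ruby's named format references. A minimal sketch of how such a template could be expanded, assuming the sitemap URL is percent-encoded before substitution; the `build_ping_url` helper is hypothetical and not part of the gem:

```ruby
require "uri"

# Hypothetical helper: expands a ping template for a given sitemap URL.
def build_ping_url(template, sitemap_url)
  # Kernel#format substitutes the named reference %{url} with the hash value.
  format(template, url: URI.encode_www_form_component(sitemap_url))
end

engines = {
  bing: "https://www.bing.com/ping?sitemap=%{url}",
  custom: "https://search.example.com/ping?url=%{url}"
}

ping_urls = engines.transform_values do |template|
  build_ping_url(template, "https://example.com/sitemap.xml")
end

puts ping_urls[:bing]
# => https://www.bing.com/ping?sitemap=https%3A%2F%2Fexample.com%2Fsitemap.xml
```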
|
|
458
600
|
|
|
601
|
+
### Ping via generate / CLI
|
|
602
|
+
|
|
603
|
+
Use the `ping:` option to trigger a ping for a specific run without changing the config file:
|
|
604
|
+
|
|
459
605
|
```ruby
|
|
460
|
-
|
|
461
|
-
|
|
462
|
-
def show
|
|
463
|
-
location = params.permit("relative_path", "format").to_h.values.join(".")
|
|
606
|
+
SiteMaps.generate(config_file: "config/sitemap.rb", ping: true).enqueue_all.run
|
|
607
|
+
```
|
|
464
608
|
|
|
465
|
-
|
|
466
|
-
|
|
467
|
-
|
|
609
|
+
```bash
|
|
610
|
+
bundle exec site_maps generate --config-file config/sitemap.rb --ping
|
|
611
|
+
```
|
|
468
612
|
|
|
469
|
-
|
|
470
|
-
|
|
471
|
-
|
|
472
|
-
|
|
473
|
-
|
|
474
|
-
|
|
475
|
-
|
|
476
|
-
|
|
613
|
+
`ping: true` overrides `config.ping_search_engines`. `ping: false` suppresses pinging even if the config enables it. Omitting `ping:` (the default) defers to the config value.
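
These three cases amount to a tri-state option in which `nil` means "not specified". A minimal sketch of that precedence, using a hypothetical `ping_enabled?` helper rather than the gem's internal code:

```ruby
# Hypothetical helper illustrating the documented precedence.
#   ping_option:  true, false, or nil (option omitted)
#   config_value: the value of config.ping_search_engines
def ping_enabled?(ping_option, config_value)
  # An explicit true/false wins; nil defers to the configuration.
  ping_option.nil? ? !!config_value : ping_option
end

puts ping_enabled?(true, false)  # explicit ping: true overrides the config
puts ping_enabled?(false, true)  # explicit ping: false suppresses pinging
puts ping_enabled?(nil, true)    # omitted option defers to the config
```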
|
|
614
|
+
|
|
615
|
+
### Manual Ping
|
|
616
|
+
|
|
617
|
+
```ruby
|
|
618
|
+
SiteMaps::Ping.ping("https://example.com/sitemap.xml")
|
|
619
|
+
# => { bing: { status: 200, url: "https://www.bing.com/ping?sitemap=..." } }
|
|
620
|
+
```
|
|
621
|
+
|
|
622
|
+
## Adapters
|
|
623
|
+
|
|
624
|
+
### File System
|
|
625
|
+
|
|
626
|
+
Writes sitemaps to the local filesystem:
|
|
627
|
+
|
|
628
|
+
```ruby
|
|
629
|
+
SiteMaps.use(:file_system) do
|
|
630
|
+
configure do |config|
|
|
631
|
+
config.url = "https://example.com/sitemap.xml.gz"
|
|
632
|
+
config.directory = "/var/www/public"
|
|
477
633
|
end
|
|
478
634
|
end
|
|
479
635
|
```
|
|
480
636
|
|
|
481
|
-
|
|
637
|
+
### AWS S3
|
|
482
638
|
|
|
639
|
+
Writes sitemaps to an S3 bucket:
|
|
483
640
|
|
|
484
|
-
|
|
641
|
+
```ruby
|
|
642
|
+
SiteMaps.use(:aws_sdk) do
|
|
643
|
+
configure do |config|
|
|
644
|
+
config.url = "https://my-bucket.s3.amazonaws.com/sitemaps/sitemap.xml"
|
|
645
|
+
config.directory = "/tmp"
|
|
646
|
+
config.bucket = "my-bucket"
|
|
647
|
+
config.region = "us-east-1"
|
|
648
|
+
config.access_key_id = ENV["AWS_ACCESS_KEY_ID"]
|
|
649
|
+
config.secret_access_key = ENV["AWS_SECRET_ACCESS_KEY"]
|
|
650
|
+
config.acl = "public-read" # default
|
|
651
|
+
config.cache_control = "private, max-age=0, no-cache" # default
|
|
652
|
+
end
|
|
653
|
+
end
|
|
654
|
+
```
|
|
485
655
|
|
|
486
|
-
|
|
656
|
+
### Custom Adapters
|
|
487
657
|
|
|
488
|
-
|
|
489
|
-
* `read(url)` - Read the sitemap data from the storage.
|
|
490
|
-
* `delete(url)` - Delete the sitemap data from the storage.
|
|
658
|
+
Implement the `SiteMaps::Adapters::Adapter` interface:
|
|
491
659
|
|
|
492
660
|
```ruby
|
|
493
661
|
class MyAdapter < SiteMaps::Adapters::Adapter
|
|
494
|
-
def write(url, raw_data, **
|
|
495
|
-
# Write
|
|
662
|
+
def write(url, raw_data, **kwargs)
|
|
663
|
+
# Write sitemap data to storage
|
|
496
664
|
end
|
|
497
665
|
|
|
498
666
|
def read(url)
|
|
499
|
-
#
|
|
667
|
+
# Return [raw_data, { content_type: "application/xml" }]
|
|
500
668
|
end
|
|
501
669
|
|
|
502
670
|
def delete(url)
|
|
503
|
-
# Delete
|
|
671
|
+
# Delete sitemap from storage
|
|
504
672
|
end
|
|
505
673
|
end
|
|
506
674
|
|
|
507
|
-
SiteMaps.use(MyAdapter
|
|
508
|
-
|
|
509
|
-
# Add sitemap links
|
|
510
|
-
end
|
|
511
|
-
end
|
|
512
|
-
```
|
|
513
|
-
|
|
514
|
-
#### Adapter Configuration
|
|
515
|
-
|
|
516
|
-
If your adapter requires additional configuration, you can define a `<adapter class>::Config` class that inherits from `SiteMaps::Configuration` and declares the required options.
|
|
517
|
-
|
|
518
|
-
```ruby
|
|
519
|
-
class MyAdapter::Config < SiteMaps::Configuration
|
|
520
|
-
attribute :api_key, String
|
|
675
|
+
SiteMaps.use(MyAdapter) do
|
|
676
|
+
config.url = "https://example.com/sitemap.xml"
|
|
521
677
|
end
|
|
522
678
|
```
|
|
523
679
|
|
|
524
|
-
|
|
680
|
+
For adapter-specific configuration, define a nested `Config` class:
|
|
525
681
|
|
|
526
682
|
```ruby
|
|
527
|
-
|
|
528
|
-
|
|
529
|
-
|
|
530
|
-
config.api_key = "my-api-key"
|
|
531
|
-
end
|
|
532
|
-
process do |s|
|
|
533
|
-
# Add sitemap links
|
|
683
|
+
class MyAdapter < SiteMaps::Adapters::Adapter
|
|
684
|
+
class Config < SiteMaps::Configuration
|
|
685
|
+
attribute :api_key, default: -> { ENV["MY_API_KEY"] }
|
|
534
686
|
end
|
|
535
687
|
end
|
|
536
688
|
```
|
|
537
689
|
|
|
538
690
|
## CLI
|
|
539
691
|
|
|
540
|
-
You can use the CLI to generate the sitemap. The CLI will load the configuration file and run the sitemap generation.
|
|
541
|
-
|
|
542
692
|
```bash
|
|
693
|
+
# Generate all sitemaps
|
|
543
694
|
bundle exec site_maps generate --config-file config/sitemap.rb
|
|
544
|
-
```
|
|
545
|
-
|
|
546
|
-
To enqueue dynamic processes, you can pass the process name with the context values.
|
|
547
695
|
|
|
548
|
-
|
|
696
|
+
# Enqueue a dynamic process with context
|
|
549
697
|
bundle exec site_maps generate monthly_posts \
|
|
550
698
|
--config-file config/sitemap.rb \
|
|
551
|
-
--context=year:
|
|
552
|
-
```
|
|
553
|
-
|
|
554
|
-
Enqueue dynamic + remaining processes
|
|
699
|
+
--context=year:2024 month:1
|
|
555
700
|
|
|
556
|
-
|
|
701
|
+
# Enqueue dynamic + remaining processes
|
|
557
702
|
bundle exec site_maps generate monthly_posts \
|
|
558
703
|
--config-file config/sitemap.rb \
|
|
559
|
-
--context=year:
|
|
704
|
+
--context=year:2024 month:1 \
|
|
560
705
|
--enqueue-remaining
|
|
561
|
-
```
|
|
562
706
|
|
|
563
|
-
|
|
564
|
-
|
|
565
|
-
```bash
|
|
707
|
+
# Control concurrency
|
|
566
708
|
bundle exec site_maps generate \
|
|
567
709
|
--config-file config/sitemap.rb \
|
|
568
710
|
--max-threads 10
|
|
569
711
|
```
|
|
570
712
|
|
|
571
|
-
##
|
|
713
|
+
## Notifications
|
|
572
714
|
|
|
573
|
-
|
|
715
|
+
Subscribe to internal events for monitoring sitemap generation:
|
|
574
716
|
|
|
575
|
-
|
|
576
|
-
|
|
577
|
-
|
|
578
|
-
|
|
579
|
-
|
|
580
|
-
|
|
717
|
+
| Event | Description |
|
|
718
|
+
|-------|-------------|
|
|
719
|
+
| `sitemaps.enqueue_process` | A process was enqueued |
|
|
720
|
+
| `sitemaps.before_process_execution` | A process is about to start |
|
|
721
|
+
| `sitemaps.process_execution` | A process finished execution |
|
|
722
|
+
| `sitemaps.finalize_urlset` | A URL set was finalized and written |
|
|
723
|
+
| `sitemaps.ping` | Search engines were pinged |
|
|
581
724
|
|
|
582
725
|
```ruby
|
|
583
|
-
SiteMaps::Notification.subscribe("sitemaps.
|
|
584
|
-
puts "
|
|
726
|
+
SiteMaps::Notification.subscribe("sitemaps.finalize_urlset") do |event|
|
|
727
|
+
puts "Wrote #{event.payload[:links_count]} links to #{event.payload[:url]}"
|
|
585
728
|
end
|
|
586
729
|
```
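
For readers unfamiliar with the subscribe/publish pattern, the following self-contained sketch shows the general idea. `TinyBus` is a hypothetical stand-in, not `SiteMaps::Notification`; only the event names and the `links_count` and `url` payload keys are taken from the example above.

```ruby
# Hypothetical minimal event bus (an illustration of the pattern only).
class TinyBus
  Event = Struct.new(:name, :payload)

  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a block to be called for every event with this name.
  def subscribe(name, &block)
    @subscribers[name] << block
  end

  # Build an event and deliver it to each subscriber of that name.
  def publish(name, **payload)
    event = Event.new(name, payload)
    @subscribers[name].each { |callback| callback.call(event) }
  end
end

bus = TinyBus.new
messages = []
bus.subscribe("sitemaps.finalize_urlset") do |event|
  messages << "Wrote #{event.payload[:links_count]} links to #{event.payload[:url]}"
end

bus.publish("sitemaps.finalize_urlset", links_count: 3, url: "https://example.com/sitemap.xml")
bus.publish("sitemaps.ping", engines: [:bing]) # no subscriber, so nothing is recorded
puts messages
```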
|
|
587
730
|
|
|
588
|
-
|
|
731
|
+
Use the built-in event listener for console output:
|
|
589
732
|
|
|
590
733
|
```ruby
|
|
591
734
|
SiteMaps::Notification.subscribe(SiteMaps::Runner::EventListener)
|
|
@@ -596,52 +739,47 @@ SiteMaps.generate(config_file: "config/sitemap.rb")
|
|
|
596
739
|
|
|
597
740
|
## Mixins
|
|
598
741
|
|
|
599
|
-
|
|
742
|
+
Extend the sitemap builder with custom methods shared across processes:
|
|
600
743
|
|
|
601
744
|
```ruby
|
|
602
|
-
module
|
|
745
|
+
module SitemapHelpers
|
|
603
746
|
def repository
|
|
604
747
|
Repository.new
|
|
605
748
|
end
|
|
606
|
-
|
|
607
|
-
def post_path(post)
|
|
608
|
-
"/posts/#{post.slug}"
|
|
609
|
-
end
|
|
610
749
|
end
|
|
611
750
|
|
|
612
751
|
SiteMaps.use(:file_system) do
|
|
613
|
-
|
|
752
|
+
extend_processes_with(SitemapHelpers)
|
|
753
|
+
|
|
614
754
|
process do |s|
|
|
615
|
-
repository.posts.each do |post|
|
|
616
|
-
s.add(
|
|
755
|
+
s.repository.posts.each do |post|
|
|
756
|
+
s.add("/posts/#{post.slug}", lastmod: post.updated_at)
|
|
617
757
|
end
|
|
618
758
|
end
|
|
619
759
|
end
|
|
620
760
|
```
|
|
621
761
|
|
|
622
|
-
|
|
762
|
+
Rails applications get a built-in mixin with URL helpers via the `route` method:
|
|
623
763
|
|
|
624
764
|
```ruby
|
|
625
|
-
|
|
626
|
-
|
|
627
|
-
|
|
628
|
-
s.add(route.root_path, priority: 1.0)
|
|
629
|
-
s.add(route.about_path, priority: 0.9)
|
|
630
|
-
end
|
|
765
|
+
process do |s|
|
|
766
|
+
s.add(s.route.root_path, lastmod: Time.now)
|
|
767
|
+
s.add(s.route.about_path, lastmod: Time.now)
|
|
631
768
|
end
|
|
632
769
|
```
|
|
633
770
|
|
|
634
771
|
## Development
|
|
635
772
|
|
|
636
|
-
After checking out the repo, run `bin/setup` to install dependencies.
|
|
637
|
-
|
|
638
|
-
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
|
|
773
|
+
After checking out the repo, run `bin/setup` to install dependencies. Run `bin/console` for an interactive prompt.
|
|
639
774
|
|
|
640
|
-
|
|
641
|
-
|
|
642
|
-
|
|
775
|
+
```bash
|
|
776
|
+
bundle exec rspec # run tests
|
|
777
|
+
bundle exec rubocop # run linter
|
|
778
|
+
bundle exec rake install # install locally
|
|
779
|
+
```
|
|
643
780
|
|
|
781
|
+
Bug reports and pull requests are welcome on [GitHub](https://github.com/marcosgz/site_maps).
|
|
644
782
|
|
|
645
783
|
## License
|
|
646
784
|
|
|
647
|
-
|
|
785
|
+
Available as open source under the [MIT License](https://opensource.org/licenses/MIT).
|