site_maps 0.1.0 → 0.1.1
- checksums.yaml +4 -4
- data/.tool-versions +1 -1
- data/CHANGELOG.md +5 -0
- data/Gemfile.lock +1 -1
- data/README.md +5 -0
- data/docs/README.md +67 -0
- data/docs/adapters.md +143 -0
- data/docs/api.md +154 -0
- data/docs/cli.md +93 -0
- data/docs/events.md +79 -0
- data/docs/extensions.md +141 -0
- data/docs/getting-started.md +138 -0
- data/docs/middleware.md +85 -0
- data/docs/processes.md +156 -0
- data/docs/rails.md +128 -0
- data/lib/site_maps/adapters/aws_sdk/storage.rb +5 -2
- data/lib/site_maps/version.rb +1 -1
- metadata +11 -1
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9246737e11d28f750005fedc117c0c2afb8c5f5ef606218fc505e1042c4da002
+  data.tar.gz: 80f005cca72244cfe1044c30cf91d1656b22bb7ff3047a2fd159eee2ddd2f3a5
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6ebb8a69803018a65c995e61cc714385ae965b04fc8df330cb7a5e99fd37539568cc1661e86da70f1e7e90eef108ba9a9c5ba1bc6e4b50d5c17f41856e4df934
+  data.tar.gz: 9dbf41444903708643f4639ae773470fc2043eeb0deb9061dedccef5ceda30210435db6faec1d1f949baa8200841d7245c40ac97df88f8993b57e5c1caeaa41b
```
data/.tool-versions
CHANGED
```diff
@@ -1 +1 @@
-ruby 3.
+ruby 3.4.9
```
data/CHANGELOG.md
CHANGED
```diff
@@ -4,5 +4,10 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## 0.1.1 - 2026-05-12
+
+### Fixed
+
+- AwsSdk adapter: switched from the deprecated `Aws::S3::Object#upload_file` to `Aws::S3::TransferManager#upload_file` to silence the deprecation warning and keep working past the next aws-sdk-s3 major.
+
 ## 0.0.1.beta1 - 2024-11-07
 The first release of the gem
```
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
```diff
@@ -4,8 +4,13 @@ A concurrent, incremental sitemap generator for Ruby. Framework-agnostic with bu
 
 Generates SEO-optimized XML sitemaps with support for sitemap indexes, XSL stylesheets, gzip compression, image/video/news extensions, search engine pinging, and Rack middleware for serving sitemaps with proper HTTP headers.
 
+## Documentation
+
+Full guides, adapter reference, CLI docs, and recipes are published at **[gems.marcosz.com.br/site_maps](https://gems.marcosz.com.br/site_maps/)** — part of the [marcosgz Ruby gem catalogue](https://gems.marcosz.com.br).
+
 ## Table of Contents
 
+- [Documentation](#documentation)
 - [Installation](#installation)
 - [Quick Start](#quick-start)
 - [Configuration](#configuration)
```
data/docs/README.md
ADDED
@@ -0,0 +1,67 @@

# site_maps

Concurrent, adapter-based sitemap.xml generation for Ruby applications.

`site_maps` is a framework-agnostic sitemap builder with built-in Rails support. It produces valid sitemap XML (with full SEO extensions — image, video, news, hreflang, mobile, PageMap), splits large sitemaps into indexed chunks automatically, generates them concurrently across a thread pool, and ships them to the filesystem, S3, or a custom backend through a pluggable adapter layer.

## Contents

- [Getting started](getting-started.md) — install, first sitemap, Rails
- [Processes](processes.md) — static and dynamic process DSL
- [Adapters](adapters.md) — filesystem, S3, no-op, custom
- [CLI](cli.md) — `site_maps generate`
- [Rack middleware](middleware.md) — serve generated sitemaps from the app
- [SEO extensions](extensions.md) — image, video, news, hreflang, mobile, PageMap
- [Events](events.md) — instrumentation hooks
- [Rails integration](rails.md) — URL helpers, Railtie, precompile
- [API reference](api.md) — full public API

## Install

```ruby
# Gemfile
gem 'site_maps'
```

## One-minute tour

```ruby
# config/sitemap.rb
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = 'https://example.com/sitemap.xml'
    config.directory = Rails.public_path.to_s
  end

  process do |s|
    s.add('/', priority: 1.0, changefreq: 'daily')
    s.add('/about', lastmod: Time.now)

    Post.find_each do |post|
      s.add("/posts/#{post.slug}", lastmod: post.updated_at)
    end
  end
end
```

```bash
bundle exec site_maps generate --config-file config/sitemap.rb
```

Output: `public/sitemap.xml`. If the URL set exceeds 50,000 links, it is split into numbered files and `public/sitemap.xml` becomes the sitemap index that lists them.

## Why site_maps

- **Concurrency.** Processes run in a `Concurrent::FixedThreadPool`; threads share a thread-safe repo that handles file splitting.
- **Pluggable storage.** Write the same sitemap to disk in development and S3 in production by swapping one line.
- **SEO extensions.** Full support for URL extensions: images, videos, news, hreflang alternates, mobile, and PageMap.
- **Dynamic processes.** Parameterized templates like `posts/%{year}-%{month}/sitemap.xml` let you rebuild a single shard without regenerating the whole site.
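Location templates such as `posts/%{year}-%{month}/sitemap.xml` use the same `%{name}` placeholder syntax as Ruby's `Kernel#format`. The sketch below uses plain `format` to show how one set of keyword arguments could resolve to a single shard's path. It assumes the gem substitutes templates in a `format`-like way, and `shard_path` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper: resolve a location template for one shard.
# Assumes templates use Kernel#format-style %{name} references.
def shard_path(template, **params)
  format(template, **params) # raises KeyError if a placeholder has no value
end

template = 'posts/%{year}-%{month}/sitemap.xml'

puts shard_path(template, year: 2024, month: 3)
# => posts/2024-3/sitemap.xml

# %{name} just calls #to_s, so zero-padding is up to the caller.
puts shard_path(template, year: 2024, month: format('%02d', 3))
# => posts/2024-03/sitemap.xml
```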

## Version

- Ruby: `>= 3.2.0`
- Depends on: `builder ~> 3.0`, `concurrent-ruby >= 1.1`, `rack >= 2.0`, `zeitwerk`, `thor`

## License

MIT.
|
data/docs/adapters.md
ADDED
@@ -0,0 +1,143 @@

# Adapters

An **adapter** is the storage backend for generated sitemap files. Three adapters ship with the gem; a clean interface makes it easy to write your own.

## Built-in adapters

| Adapter | When to use |
|---------|-------------|
| `:file_system` | Write to disk. Ideal for local dev, or for serving via the bundled Rack middleware. |
| `:aws_sdk` | Upload to S3. Production deployments behind CloudFront or similar. |
| `:noop` | Discard writes. Ideal for tests that care about "what URLs got added" but not "what ended up on disk". |

Select with `SiteMaps.use(<symbol>)`.

## `:file_system`

```ruby
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = 'https://example.com/sitemap.xml'
    config.directory = Rails.public_path.to_s # default: "public/sitemaps"
  end
  process { |s| ... }
end
```

**Config attributes:**

| Key | Purpose |
|-----|---------|
| `url` | Public URL — drives filename layout and is written into sitemap `<loc>` entries. |
| `directory` | Filesystem root under which files land. |

If `config.url` ends in `.gz`, the adapter writes gzipped files. The middleware transparently decompresses on serve.
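As a rough sketch of the `.gz` convention, the snippet below chooses the encoding from the URL's extension and round-trips the data with Ruby's standard `Zlib` library. The helpers `encode_for` and `decode_for` are hypothetical; the adapter and middleware may implement this differently.

```ruby
require 'zlib'

# Hypothetical helper: gzip the payload only when the target URL ends in .gz.
def encode_for(url, xml)
  url.end_with?('.gz') ? Zlib.gzip(xml) : xml
end

# Hypothetical reverse step, e.g. before serving plain XML to a client.
def decode_for(url, data)
  url.end_with?('.gz') ? Zlib.gunzip(data) : data
end

xml = %(<?xml version="1.0" encoding="UTF-8"?><urlset/>)
url = 'https://example.com/sitemap.xml.gz'

stored = encode_for(url, xml)
puts stored.start_with?("\x1F\x8B".b) # => true (gzip magic bytes)
puts decode_for(url, stored) == xml   # => true
```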

## `:aws_sdk`

```ruby
SiteMaps.use(:aws_sdk) do
  configure do |config|
    config.url = 'https://my-bucket.s3.amazonaws.com/sitemap.xml'
    config.directory = '/tmp/sitemaps' # local scratch space
    config.bucket = 'my-bucket'
    config.region = ENV.fetch('AWS_REGION', 'us-east-1')
    config.access_key_id = ENV['AWS_ACCESS_KEY_ID']
    config.secret_access_key = ENV['AWS_SECRET_ACCESS_KEY']
    config.acl = 'public-read' # default
    config.cache_control = 'private, max-age=0, no-cache'
  end
  process { |s| ... }
end
```

**Config attributes:**

| Key | Default or purpose |
|-----|---------|
| `bucket` | `ENV['AWS_BUCKET']` |
| `region` | `ENV.fetch('AWS_REGION', 'us-east-1')` |
| `access_key_id` | `ENV['AWS_ACCESS_KEY_ID']` |
| `secret_access_key` | `ENV['AWS_SECRET_ACCESS_KEY']` |
| `acl` | `"public-read"` |
| `cache_control` | `"private, max-age=0, no-cache"` |
| `directory` | Local scratch dir for staging before upload |

The adapter writes locally first (to `directory`), then uploads to S3 with the configured ACL and Cache-Control headers. You'll need `aws-sdk-s3` in your Gemfile:

```ruby
gem 'aws-sdk-s3'
```

## `:noop`

```ruby
SiteMaps.use(:noop) do
  configure { |c| c.url = 'https://example.com/sitemap.xml' }
  process { |s| ... }
end
```

Writes are discarded. Use it in tests when you want to assert on the URLs being added (via events, for example) without hitting disk.

## Writing a custom adapter

Subclass `SiteMaps::Adapters::Adapter` and implement `write`, `read`, and `delete`:

```ruby
require 'google/cloud/storage'
require 'stringio'
require 'uri'

class GoogleCloudStorageAdapter < SiteMaps::Adapters::Adapter
  # Adapter-specific settings, set inside `configure`.
  class Config < SiteMaps::Configuration
    attribute :bucket
    attribute :project_id
  end

  # Upload the generated sitemap under the path taken from its public URL.
  def write(url, raw_data, **_kwargs)
    bucket.create_file(StringIO.new(raw_data), path_from(url))
  end

  # Return the stored data plus its content type, per the adapter interface.
  def read(url)
    file = bucket.file(path_from(url))
    raise SiteMaps::FileNotFoundError, "File not found: #{url}" if file.nil?

    [file.download.string, { content_type: 'application/xml' }]
  end

  def delete(url)
    bucket.file(path_from(url))&.delete
  end

  private

  # "https://cdn.example.com/sitemap.xml" -> "sitemap.xml"
  def path_from(url)
    URI(url).path[1..]
  end

  # Reuse one client and bucket handle instead of building them on every call.
  def storage
    @storage ||= Google::Cloud::Storage.new(project_id: config.project_id)
  end

  def bucket
    @bucket ||= storage.bucket(config.bucket)
  end
end
```

Register and use it:

```ruby
SiteMaps.use(GoogleCloudStorageAdapter) do
  configure do |config|
    config.url = 'https://cdn.example.com/sitemap.xml'
    config.bucket = 'my-bucket'
    config.project_id = 'my-project'
  end
  process { |s| ... }
end
```

## Adapter interface

| Method | Purpose |
|--------|---------|
| `#write(url, raw_data, **kwargs)` | Persist `raw_data` at the location implied by `url`. |
| `#read(url)` | Return `[raw_data, { content_type: '…' }]` for the given URL. |
| `#delete(url)` | Remove the file at the URL. |
| `.config_class` | (optional) Return a `Configuration` subclass to expose adapter-specific settings. |

The adapter base class handles everything else: URL filters, the process registry, and thread-safe URL tracking.
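To make the three-method contract concrete without any storage service, here is a hypothetical in-memory store with the same method shapes (`write`, `read`, `delete`). It is deliberately a plain Ruby class so it runs on its own; in an application you would subclass `SiteMaps::Adapters::Adapter` as described above. Raising `KeyError` for a missing file is a stand-in: a real adapter would presumably raise `SiteMaps::FileNotFoundError`.

```ruby
require 'uri'

# Hypothetical stand-in that mirrors the adapter method shapes.
class InMemoryStore
  def initialize
    @files = {}
    @lock = Mutex.new # processes may write from several threads
  end

  def write(url, raw_data, **_kwargs)
    @lock.synchronize { @files[key(url)] = raw_data.dup }
  end

  # Returns [raw_data, { content_type: ... }], like the adapter contract.
  def read(url)
    data = @lock.synchronize { @files.fetch(key(url)) } # KeyError if missing
    [data, { content_type: 'application/xml' }]
  end

  def delete(url)
    @lock.synchronize { @files.delete(key(url)) }
  end

  private

  # Key files by URL path, e.g. "/posts/sitemap.xml".
  def key(url)
    URI(url).path
  end
end

store = InMemoryStore.new
store.write('https://example.com/posts/sitemap.xml', '<urlset/>')
data, meta = store.read('https://example.com/posts/sitemap.xml')
puts data                # => <urlset/>
puts meta[:content_type] # => application/xml
```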
|
data/docs/api.md
ADDED
@@ -0,0 +1,154 @@

# API Reference

## `SiteMaps` (top-level module)

| Method | Description |
|--------|-------------|
| `SiteMaps.use(adapter, **opts, &block)` | Register an adapter (`:file_system`, `:aws_sdk`, `:noop`, or a class) and yield its configuration block. |
| `SiteMaps.define(&block)` | Register a context-aware definition. Called by `.generate` with the `context:` hash splatted as kwargs. |
| `SiteMaps.configure { \|config\| ... }` | Mutate global defaults. |
| `SiteMaps.config` | Return global `Configuration`. |
| `SiteMaps.generate(config_file:, context: {}, **runner_opts) → Runner` | Load `config_file` and return a `Runner` ready to `.enqueue` and `.run`. |
| `SiteMaps.current_adapter` | Last-registered adapter (thread-local during `.generate`). |
| `SiteMaps.logger` | Configurable logger (default `Logger.new($stdout)`). |

### Constants

```ruby
SiteMaps::MAX_LENGTH   # { links: 50_000, images: 1_000, news: 1_000 }
SiteMaps::MAX_FILESIZE # 50_000_000 bytes
```

### Errors

- `SiteMaps::Error` — base error
- `SiteMaps::AdapterNotFound` — unknown adapter symbol
- `SiteMaps::AdapterNotSetError` — generate called without an adapter
- `SiteMaps::FileNotFoundError` — missing file at adapter read
- `SiteMaps::FullSitemapError` — internal signal that a URL set is full (triggers split)
- `SiteMaps::ConfigurationError` — invalid config

---

## `SiteMaps::Configuration`

Base configuration. Adapter configs subclass this.

| Attribute | Default | Purpose |
|-----------|---------|---------|
| `url` | — (required) | Public URL of the main sitemap index. |
| `directory` | `"/tmp/sitemaps"` | Local storage directory. |
| `max_links` | `50_000` | URLs per file before split. |
| `emit_priority` | `true` | Emit `<priority>`. |
| `emit_changefreq` | `true` | Emit `<changefreq>`. |
| `xsl_stylesheet_url` | `nil` | Stylesheet for URL sets. |
| `xsl_index_stylesheet_url` | `nil` | Stylesheet for the sitemap index. |
| `ping_search_engines` | `false` | Auto-ping after generation. |
| `ping_engines` | `{ bing: '...' }` | URL templates per engine; `%{url}` is URL-encoded at ping time. |

---

## `SiteMaps::Adapters::Adapter` (base class)

Abstract base. Subclass to build custom adapters.

| Method | Description |
|--------|-------------|
| `.config_class` | Override to return a `Configuration` subclass with adapter-specific attributes. |
| `#write(url, raw_data, **kwargs)` | Abstract. Persist `raw_data` at the storage location implied by `url`. |
| `#read(url) → [raw_data, { content_type: '…' }]` | Abstract. |
| `#delete(url)` | Abstract. |
| `#configure { \|c\| ... }` | Yield the adapter's configuration. |
| `#process(name = :default, location = nil, **kwargs, &block)` | Register a process. |
| `#external_sitemap(url, lastmod:)` | Add an external sitemap to the index. |
| `#extend_processes_with(mod)` | Mix `mod` into all process blocks. |
| `#url_filter { \|url, options\| ... }` | Register a URL filter. |
| `#apply_url_filters(url, options)` | Run all filters; returns modified options or `nil` if excluded. |
| `#reset!` | Clear index and repo. Called before `Runner#run`. |
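`#apply_url_filters` is described as running every filter and returning the (possibly modified) options, or `nil` when the URL is excluded. The sketch below reproduces those described semantics with plain blocks. The `FilterChain` class is hypothetical, and the choice that the first `nil` stops the chain is an assumption made for illustration.

```ruby
# Hypothetical filter chain mirroring the documented contract:
# each filter receives (url, options) and returns new options or nil.
class FilterChain
  def initialize
    @filters = []
  end

  def url_filter(&block)
    @filters << block
  end

  # Returns the final options hash, or nil as soon as a filter rejects the URL.
  def apply(url, options)
    @filters.reduce(options) do |opts, filter|
      result = filter.call(url, opts)
      break nil if result.nil?

      result
    end
  end
end

chain = FilterChain.new
# Exclude admin pages entirely.
chain.url_filter { |url, opts| url.start_with?('/admin') ? nil : opts }
# Strip priority from every remaining URL.
chain.url_filter { |_url, opts| opts.reject { |key, _| key == :priority } }

p chain.apply('/posts/hello', { priority: 0.7, changefreq: 'weekly' })
p chain.apply('/admin/users', { priority: 0.1 })
```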

---

## `SiteMaps::Runner`

Executes enqueued processes concurrently.

```ruby
Runner.new(adapter = SiteMaps.current_adapter, max_threads: 4, ping: nil)
```

| Method | Description |
|--------|-------------|
| `#enqueue(process_name, **kwargs)` | Queue one process with kwargs. |
| `#enqueue_remaining` / `#enqueue_all` | Queue every process not yet enqueued. |
| `#run` | Execute queued processes, finalize index, optionally ping. |
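The runner's concurrency model can be pictured with the standard library alone. The sketch below is a stand-in for `Concurrent::FixedThreadPool` and assumes nothing about the gem's internals. A fixed number of worker threads drain a `Queue` of jobs, every error is collected, and the first one is re-raised only after all workers finish. That last step matches the exit-code behavior the CLI docs describe. `run_pool` is a hypothetical name.

```ruby
# Hypothetical fixed-size pool built on Thread and Queue (stdlib only).
def run_pool(jobs, max_threads: 4)
  queue = Queue.new
  jobs.each { |name, job| queue << [name, job] }
  queue.close # pop returns nil once the closed queue is empty

  results = {}
  errors = []
  lock = Mutex.new

  workers = Array.new(max_threads) do
    Thread.new do
      while (item = queue.pop)
        name, job = item
        begin
          value = job.call
          lock.synchronize { results[name] = value }
        rescue StandardError => e
          lock.synchronize { errors << e } # keep draining; report at the end
        end
      end
    end
  end
  workers.each(&:join)

  raise errors.first unless errors.empty? # original exception and backtrace

  results
end

jobs = {
  static: -> { %w[/ /about].size },
  posts: -> { 3 }
}
p run_pool(jobs, max_threads: 2)
```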

---

## `SiteMaps::SitemapBuilder`

Yielded as `s` inside every `process` block.

| Method | Description |
|--------|-------------|
| `#add(path, **options)` | Add one URL to the current URL set. Automatically splits when full. |
| `#finalize!` | Finalize the current URL set. Called automatically when the process block returns. |

`options` supports every extension documented in [extensions.md](extensions.md): `lastmod`, `priority`, `changefreq`, `images`, `videos`, `news`, `alternates`, `mobile`, `pagemap`.

In Rails apps, `s.route` is an object exposing all URL helpers.

---

## `SiteMaps::Middleware`

Rack middleware for serving generated sitemaps. See [middleware.md](middleware.md).

```ruby
use SiteMaps::Middleware,
    adapter: ...,
    public_prefix: nil,
    storage_prefix: nil,
    x_robots_tag: 'noindex, follow',
    cache_control: 'public, max-age=3600'
```

---

## `SiteMaps::Notification`

| Method | Description |
|--------|-------------|
| `.subscribe(event_or_class, &block)` | Subscribe to one event (string) or every event named on a class. |
| `.unsubscribe(subscriber)` | Remove a subscription. |
| `.instrument(event, payload) { ... }` | Emit an event, wrapping the block in a timer. |
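To build intuition for `.instrument`, the sketch below implements a tiny in-process notifier. It times the block with a monotonic clock, merges `runtime` into the payload, and passes the payload to every subscriber of that event name. `MiniNotifier` is hypothetical, and the real `SiteMaps::Notification` payloads and return values may differ.

```ruby
# Hypothetical pub/sub with timing, in the spirit of .subscribe/.instrument.
class MiniNotifier
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a block for one event name; the block is returned as a handle.
  def subscribe(event, &block)
    @subscribers[event] << block
    block
  end

  # Time the block with a monotonic clock, then notify subscribers.
  def instrument(event, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield if block_given?
    runtime = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    event_payload = payload.merge(runtime: runtime)
    @subscribers[event].each { |subscriber| subscriber.call(event_payload) }
    result
  end
end

notifier = MiniNotifier.new
seen = []
notifier.subscribe('sitemaps.finalize_urlset') { |event| seen << event }

value = notifier.instrument('sitemaps.finalize_urlset', { url: 'https://example.com/sitemap.xml' }) { 42 }
puts value            # => 42
puts seen.first[:url] # => https://example.com/sitemap.xml
```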

See [events.md](events.md) for the event catalog.

---

## `SiteMaps::RobotsTxt`

| Method | Description |
|--------|-------------|
| `.sitemap_directive(url) → String` | Return `"Sitemap: <url>"`. |
| `.render(sitemap_url:, extra_directives: []) → String` | Build a full robots.txt body. |
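A robots.txt body that advertises a sitemap is plain text; the `Sitemap:` line takes an absolute URL, following the sitemaps.org protocol. The sketch below builds such a body by hand to show the general shape. `build_robots_txt` is hypothetical, and the gem's `.render` may order or format its directives differently.

```ruby
# Hypothetical builder for a robots.txt body that advertises a sitemap.
def build_robots_txt(sitemap_url:, extra_directives: [])
  lines = ['User-agent: *', 'Allow: /']
  lines.concat(extra_directives)
  lines << "Sitemap: #{sitemap_url}" # absolute URL of the sitemap (or index)
  lines.join("\n") + "\n"
end

puts build_robots_txt(
  sitemap_url: 'https://example.com/sitemap.xml',
  extra_directives: ['Disallow: /admin']
)
```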

---

## `SiteMaps::Ping`

| Method | Description |
|--------|-------------|
| `.ping(url, engines: { bing: '...' }) → Hash` | Fire a GET to each engine's template (substituting `%{url}`). Returns a hash of `{engine => { status:, url: }}`. |

---

## CLI entry point

`exec/site_maps` — the executable shipped with the gem.

```bash
bundle exec site_maps generate [processes] [options]
```

See [cli.md](cli.md).
|
data/docs/cli.md
ADDED
@@ -0,0 +1,93 @@

# CLI

The gem installs a `site_maps` executable backed by Thor.

```bash
bundle exec site_maps generate [PROCESS_NAMES...] [options]
```

If no process names are given, every process in the config file is enqueued.

## Options

| Flag | Default | Purpose |
|------|---------|---------|
| `--config-file`, `-r` | — | Path to the config file defining processes. **Required.** |
| `--max-threads`, `-c` | `4` | Thread pool size for concurrent process execution. |
| `--context` | `{}` | Hash-style kwargs passed to `SiteMaps.define` blocks: `--context=tenant:acme locale:en`. |
| `--enqueue-remaining` | `false` | In addition to specified processes, enqueue any others. |
| `--ping` | `false` | Override config to ping search engines after generation. |
| `--debug` | `false` | Set logger to DEBUG level. |
| `--logfile` | — | Write logs to a file instead of stdout. |
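Thor hash options take space-separated `key:value` pairs, so `--context=tenant:acme locale:en` arrives as a hash whose values are strings. The sketch below mimics that parsing in plain Ruby and then splats the result as keyword arguments into a block, the way a `SiteMaps.define` block receives `context:`. Symbolizing the keys is an assumption about what the CLI does. `parse_context` and the `definition` lambda are hypothetical. Values such as `year:2024` stay strings unless your block converts them.

```ruby
# Hypothetical re-creation of how "key:value" pairs become kwargs.
def parse_context(pairs)
  pairs.to_h do |pair|
    key, value = pair.split(':', 2) # split once so values may contain ':'
    [key.to_sym, value]
  end
end

context = parse_context(%w[tenant:acme locale:en])
p context

# A definition block that declares the keys it expects as keyword arguments.
definition = ->(tenant:, locale: 'en') { "#{tenant}/#{locale}" }
puts definition.call(**context) # => acme/en
```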

## Examples

Generate everything:

```bash
bundle exec site_maps generate --config-file config/sitemap.rb
```

Regenerate a single shard of a dynamic process:

```bash
bundle exec site_maps generate monthly_posts \
  --config-file config/sitemap.rb \
  --context=year:2024 month:3
```

Generate `posts` and `products`, plus every other process defined in the config:

```bash
bundle exec site_maps generate posts products \
  --config-file config/sitemap.rb \
  --enqueue-remaining
```

Tune concurrency:

```bash
bundle exec site_maps generate --config-file config/sitemap.rb --max-threads 10
```

Ping Bing and any custom engines (config-driven — see below):

```bash
bundle exec site_maps generate --config-file config/sitemap.rb --ping
```

## Search-engine pinging

Pinging is off by default. Enable it globally in config, or turn it on for a single run with `--ping`.

```ruby
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = 'https://example.com/sitemap.xml'
    config.ping_search_engines = true
    config.ping_engines = {
      bing: 'https://www.bing.com/ping?sitemap=%{url}',
      custom: 'https://search.example.com/ping?url=%{url}'
    }
  end
end
```

`%{url}` in the template is replaced with a URL-encoded `config.url` at ping time.
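In concrete terms, the substitution behaves like a named-reference `format` call with the sitemap URL escaped for use inside a query string. The sketch below uses Ruby's `URI.encode_www_form_component` for the escaping. Which escaping routine the gem actually uses is an assumption here, and `ping_url` is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical helper: fill the %{url} placeholder of an engine template.
def ping_url(template, sitemap_url)
  format(template, url: URI.encode_www_form_component(sitemap_url))
end

engines = {
  bing: 'https://www.bing.com/ping?sitemap=%{url}',
  custom: 'https://search.example.com/ping?url=%{url}'
}

engines.each do |name, template|
  puts "#{name}: #{ping_url(template, 'https://example.com/sitemap.xml')}"
end
# bing: https://www.bing.com/ping?sitemap=https%3A%2F%2Fexample.com%2Fsitemap.xml
# custom: https://search.example.com/ping?url=https%3A%2F%2Fexample.com%2Fsitemap.xml
```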

## Rails / bundler

The CLI auto-requires `config/environment` if it detects a `config/application.rb`, so Rails URL helpers (via the Railtie) are available inside your config file.

If you don't want that — say, a Ruby-only script in a Rails repo — pass a config file outside the Rails root or invoke the library directly via `SiteMaps.generate(...)`.

## Logging

- `--debug` sets the logger to `Logger::DEBUG`.
- `--logfile PATH` writes to a file; otherwise stdout.
- A built-in event listener prints one line per finalized URL set with link counts and runtime.

## Exit codes

- `0` — success.
- Non-zero — any process raised. Errors are captured per-future and re-raised after all futures complete, so you see the real backtrace rather than a generic runner failure.
|
data/docs/events.md
ADDED
@@ -0,0 +1,79 @@

# Events

`site_maps` ships a lightweight pub/sub system under `SiteMaps::Notification`. Use it for logging, metrics, or reacting to particular generation phases.

## Subscribing

### Block subscribers

```ruby
SiteMaps::Notification.subscribe('sitemaps.finalize_urlset') do |event|
  Rails.logger.info(
    "[sitemap] wrote #{event[:links_count]} urls to #{event[:url]} in #{event[:runtime]}s"
  )
end
```

### Class subscribers

A class with one method per event name (dots become underscores):

```ruby
class SitemapMetrics
  def self.sitemaps_process_execution(event)
    StatsD.timing('sitemaps.process', event[:runtime], tags: ["process:#{event[:process].name}"])
  end

  def self.sitemaps_finalize_urlset(event)
    StatsD.increment('sitemaps.urlset.written', tags: ["url:#{event[:url]}"])
  end

  def self.sitemaps_ping(event)
    event[:results].each do |engine, result|
      StatsD.increment('sitemaps.ping', tags: ["engine:#{engine}", "status:#{result[:status]}"])
    end
  end
end

SiteMaps::Notification.subscribe(SitemapMetrics)
```
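The naming rule for class subscribers (dots become underscores) can be sketched as a lookup with `String#tr` and `respond_to?`. `dispatch` below is a hypothetical helper that shows how an event could be routed to a class method. It is not the gem's code, and in this sketch, events without a matching method are skipped.

```ruby
# Hypothetical router: 'sitemaps.finalize_urlset' -> :sitemaps_finalize_urlset
def dispatch(subscriber, event_name, payload)
  method_name = event_name.tr('.', '_').to_sym
  return false unless subscriber.respond_to?(method_name)

  subscriber.public_send(method_name, payload)
  true
end

# A small subscriber that only handles one event.
class UrlsetCounter
  @count = 0

  class << self
    attr_reader :count

    def sitemaps_finalize_urlset(_event)
      @count += 1
    end
  end
end

dispatch(UrlsetCounter, 'sitemaps.finalize_urlset', { url: 'https://example.com/sitemap.xml' })
dispatch(UrlsetCounter, 'sitemaps.ping', { results: {} }) # no handler, skipped
puts UrlsetCounter.count # => 1
```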

### The built-in listener

For colored terminal output during CLI runs:

```ruby
SiteMaps::Notification.subscribe(SiteMaps::Runner::EventListener)
```

This is subscribed automatically by the CLI.

## Events

| Event | Payload keys |
|-------|-------------|
| `sitemaps.enqueue_process` | `process`, `kwargs` |
| `sitemaps.before_process_execution` | `process`, `kwargs` |
| `sitemaps.process_execution` | `process`, `kwargs`, `runtime` |
| `sitemaps.finalize_urlset` | `url`, `links_count`, `news_count`, `last_modified`, `runtime`, `process` |
| `sitemaps.ping` | `results` |

`process` is a `SiteMaps::Process` struct (`name`, `location_template`, `kwargs_template`, `block`).

## Event ordering

For each process the sequence is:

1. `sitemaps.enqueue_process`
2. `sitemaps.before_process_execution`
3. One or more `sitemaps.finalize_urlset` (one per split file)
4. `sitemaps.process_execution`

After all processes complete, one final `sitemaps.finalize_urlset` fires for the sitemap index itself. If pinging is enabled, `sitemaps.ping` fires last.

## Use cases

- **Logging.** Tail-friendly output of what just ran, how many URLs, runtime.
- **Metrics.** StatsD / OpenTelemetry counters for throughput and ping outcomes.
- **Alerting.** Subscribe to `sitemaps.ping`, alert on non-200 results.
- **Cache busting.** After `sitemaps.finalize_urlset`, purge the CDN entry for the written URL.
|
data/docs/extensions.md
ADDED
@@ -0,0 +1,141 @@

# SEO Extensions

`s.add` accepts options for every sitemap extension recognized by Google and Bing. Pass any of the following alongside `lastmod`, `priority`, and `changefreq`.

## Image

Up to 1,000 images per URL.

```ruby
s.add('/gallery/summer', images: [
  {
    loc: 'https://cdn.example.com/summer/beach.jpg',
    title: 'Beach sunset',
    caption: 'A photo from the summer trip',
    geo_location: 'Cape Cod, MA',
    license: 'https://creativecommons.org/licenses/by/4.0/'
  }
])
```

## Video

Up to 1,000 video entries per sitemap file.

```ruby
s.add('/videos/how-to', videos: [
  {
    thumbnail_loc: 'https://cdn.example.com/thumbs/how-to.jpg',
    title: 'How to use site_maps',
    description: 'A quick walkthrough',
    content_loc: 'https://cdn.example.com/videos/how-to.mp4',
    player_loc: 'https://example.com/embed/how-to',
    duration: 600,
    publication_date: Time.now,
    rating: 4.8,
    view_count: 12_345,
    family_friendly: true,
    requires_subscription: false,
    live: false,
    tags: %w[tutorial guide],
    category: 'Technology',
    uploader: 'example-team',
    uploader_info: 'https://example.com/about',
    gallery_loc: 'https://example.com/videos',
    gallery_title: 'Example video gallery',
    price: nil,
    allow_embed: true,
    autoplay: 'ap=1'
  }
])
```

## News

Up to 1,000 news entries per sitemap file (use a dedicated process for news URLs).

```ruby
s.add('/news/breaking', news: {
  publication_name: 'Example Times',
  publication_language: 'en',
  publication_date: Time.now,
  title: 'Breaking news headline',
  keywords: 'breaking, politics',
  genres: 'PressRelease',
  access: 'Subscription',
  stock_tickers: 'NASDAQ:EXMP'
})
```

## Alternate language / hreflang

```ruby
s.add('/', alternates: [
  { href: 'https://example.com/en', lang: 'en' },
  { href: 'https://example.com/es', lang: 'es' },
  { href: 'https://example.com/fr', lang: 'fr', nofollow: true }
])
```

The `nofollow: true` variant emits `rel="nofollow alternate"` on the link. Use it to declare locale variants without signalling Google to crawl them as equivalents.

## Mobile

Declare a URL as mobile-friendly:

```ruby
s.add('/mobile-page', mobile: true)
```

## PageMap

Structured data for Google Custom Search.

```ruby
s.add('/products/widget', pagemap: {
  dataobjects: [
    {
      type: 'product',
      id: 'sku-123',
      attributes: [
        { name: 'name', value: 'Widget' },
        { name: 'price', value: '19.99' },
        { name: 'color', value: 'blue' }
      ]
    }
  ]
})
```

## Combined example

Everything can coexist on a single URL:

```ruby
s.add('/products/widget',
  lastmod: Time.now,
  priority: 0.9,
  changefreq: 'weekly',
  images: [{ loc: 'https://cdn.example.com/widget.jpg', title: 'Widget' }],
  alternates: [{ href: 'https://example.com/es/products/widget', lang: 'es' }],
  mobile: true,
  pagemap: { dataobjects: [{ type: 'product', id: 'sku-123', attributes: [] }] }
)
```

## Disabling `priority` / `changefreq`

Both fields are optional per the sitemap spec, and many search engines ignore them. Disable globally if you want smaller files:

```ruby
configure do |config|
  config.emit_priority = false
  config.emit_changefreq = false
end
```

## Output size

- Per URL set: 50,000 links **or** 1,000 news items **or** 50 MB uncompressed — whichever comes first. When one of these is hit, the current file is finalized and a new one starts.
- File naming is automatic (`posts/sitemap.xml` → `posts/sitemap1.xml`, `posts/sitemap2.xml`, …).
- Use the `.gz` extension in `config.url` to emit gzipped files — most search engines fetch either form.
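The "whichever comes first" rule amounts to checking every limit before each entry is added. The sketch below shows one way to express that check. The limit values match the list above and `SiteMaps::MAX_LENGTH` / `SiteMaps::MAX_FILESIZE` in the API reference. `UrlSetBudget` and its methods are hypothetical names, not the gem's internals.

```ruby
# Hypothetical tracker for the per-file limits of one URL set.
class UrlSetBudget
  MAX_LINKS = 50_000
  MAX_NEWS = 1_000
  MAX_BYTES = 50_000_000 # uncompressed

  attr_reader :links, :news, :bytes

  def initialize
    @links = 0
    @news = 0
    @bytes = 0
  end

  # True when adding this entry would break any limit, i.e. time to start a new file.
  def would_overflow?(entry_xml, news: false)
    links + 1 > MAX_LINKS ||
      (news && self.news + 1 > MAX_NEWS) ||
      bytes + entry_xml.bytesize > MAX_BYTES
  end

  def add(entry_xml, news: false)
    @links += 1
    @news += 1 if news
    @bytes += entry_xml.bytesize
  end
end

budget = UrlSetBudget.new
entry = '<url><loc>https://example.com/news/1</loc></url>'
1_000.times { budget.add(entry, news: true) }

puts budget.would_overflow?(entry)             # => false (a plain link still fits)
puts budget.would_overflow?(entry, news: true) # => true (it would be news item 1,001)
```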
@@ -0,0 +1,138 @@
# Getting Started

## Install

```ruby
# Gemfile
gem 'site_maps'
```

```bash
bundle install
```

## Your first sitemap

Create `config/sitemap.rb`:

```ruby
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = 'https://example.com/sitemap.xml'
    # __dir__ is config/, so step up one level to reach the project's public/
    config.directory = File.expand_path('../public', __dir__)
  end

  process do |s|
    s.add('/', priority: 1.0, changefreq: 'daily')
    s.add('/about', priority: 0.8, lastmod: Time.now)
    s.add('/contact', priority: 0.5)
  end
end
```

Generate:

```bash
bundle exec site_maps generate --config-file config/sitemap.rb
```

Output: `public/sitemap.xml`.
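For orientation, the generated file follows the sitemaps.org `urlset` format. The stdlib-only sketch below builds a comparable document for the three URLs above. It joins each path onto the host from `config.url` and uses a fixed `lastmod` so the output is deterministic. `render_urlset` and `xml_escape` are hypothetical helpers. The gem's real output may differ in formatting, `lastmod` precision, and whether it references an XSL stylesheet.

```ruby
require 'time'

# The sitemap protocol requires these five characters to be entity-escaped.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;' }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: render a minimal sitemaps.org <urlset>.
# entries: [[path, { lastmod:, changefreq:, priority: }], ...]
def render_urlset(host, entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |path, opts|
    lines << '  <url>'
    lines << "    <loc>#{xml_escape(host + path)}</loc>"
    lines << "    <lastmod>#{opts[:lastmod].getutc.iso8601}</lastmod>" if opts[:lastmod]
    lines << "    <changefreq>#{opts[:changefreq]}</changefreq>" if opts[:changefreq]
    lines << "    <priority>#{format('%.1f', opts[:priority])}</priority>" if opts[:priority]
    lines << '  </url>'
  end
  lines << '</urlset>'
  lines.join("\n") + "\n"
end

xml = render_urlset('https://example.com', [
  ['/', { priority: 1.0, changefreq: 'daily' }],
  ['/about', { priority: 0.8, lastmod: Time.utc(2024, 1, 15) }],
  ['/contact', { priority: 0.5 }]
])
puts xml
```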

## Dynamic URLs

Call `s.add` once for every URL you want indexed. Database records work naturally:

```ruby
process :posts do |s|
  Post.published.find_each do |post|
    s.add("/posts/#{post.slug}", lastmod: post.updated_at, priority: 0.7)
  end
end
```

When the URL count of a single process exceeds `max_links` (default 50,000), the file is split into `sitemap1.xml`, `sitemap2.xml`, … and a sitemap index is written at `config.url`.
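The stdlib-only sketch below makes the splitting concrete. It slices a list of paths into chunks of `max_links` and derives numbered file names as described (`sitemap.xml` → `sitemap1.xml`, `sitemap2.xml`, …). `incremental_name` and `split_into_files` are hypothetical helpers. The real gem also rolls over on the news-item and byte-size limits, which this sketch ignores.

```ruby
# Hypothetical helper: insert the file number before the .xml / .xml.gz suffix.
def incremental_name(location, number)
  location.sub(/(?=\.xml(?:\.gz)?\z)/, number.to_s)
end

# Hypothetical helper: keep one file when everything fits, numbered files otherwise.
def split_into_files(location, paths, max_links: 50_000)
  chunks = paths.each_slice(max_links).to_a
  return { location => paths } if chunks.size <= 1

  chunks.each_with_index.to_h { |chunk, i| [incremental_name(location, i + 1), chunk] }
end

paths = (1..5).map { |n| "/posts/#{n}" }
files = split_into_files('posts/sitemap.xml', paths, max_links: 2)
# files.keys => ["posts/sitemap1.xml", "posts/sitemap2.xml", "posts/sitemap3.xml"]
```

A small `max_links` like this is handy in tests because it exercises the split path without generating 50,000 records.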

## Named processes

Named processes get their own file and run in parallel:

```ruby
SiteMaps.use(:file_system) do
  configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }

  process :static do |s|
    s.add('/')
    s.add('/about')
  end

  process :posts, 'posts/sitemap.xml' do |s|
    Post.find_each { |p| s.add("/posts/#{p.slug}") }
  end

  process :products, 'products/sitemap.xml' do |s|
    Product.find_each { |p| s.add("/products/#{p.id}") }
  end
end
```

Run all processes:

```bash
bundle exec site_maps generate --config-file config/sitemap.rb --max-threads 4
```

Run a single process:

```bash
bundle exec site_maps generate posts --config-file config/sitemap.rb
```

See [processes.md](processes.md) for the full process DSL, including parameterized templates.

## Using it in Rails

Add `site_maps` to your Gemfile and generate from a Rake task, a scheduled job, or your deploy pipeline. The Railtie injects URL helpers:

```ruby
# config/sitemap.rb
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = 'https://example.com/sitemap.xml'
    config.directory = Rails.public_path.to_s
  end

  process do |s|
    s.add(s.route.root_path, priority: 1.0)
    s.add(s.route.about_path)
    Post.find_each { |post| s.add(s.route.post_path(post), lastmod: post.updated_at) }
  end
end
```

See [rails.md](rails.md) for the full Rails integration, including asset precompile hooks and the Rack middleware for serving generated sitemaps.

## Uploading to S3

Swap the adapter line:

```ruby
SiteMaps.use(:aws_sdk) do
  configure do |config|
    config.url = 'https://my-bucket.s3.amazonaws.com/sitemap.xml'
    config.bucket = 'my-bucket'
    config.region = ENV['AWS_REGION']
    # access_key_id / secret_access_key default to ENV vars
  end

  process { |s| s.add('/') } # add your URLs here
end
```

See [adapters.md](adapters.md) for adapter specifics and how to build your own.

## Next steps

- [Processes](processes.md): split your sitemap into static and dynamic shards
- [SEO extensions](extensions.md): image, video, news, hreflang
- [CLI](cli.md): automation-friendly generate command
- [Rack middleware](middleware.md): serve the generated files with correct headers

data/docs/middleware.md
ADDED
@@ -0,0 +1,85 @@
# Rack Middleware

`SiteMaps::Middleware` serves generated sitemap files directly from the app. This is useful when you've generated to `public/sitemaps/` (filesystem adapter) and want the proper `Content-Type`, gzip handling, and XSL stylesheet routing without editing your web server config.

## Basic usage

```ruby
# config/application.rb (Rails)
config.middleware.use SiteMaps::Middleware, adapter: -> { SiteMaps.current_adapter }
```

Or inline in `config.ru`:

```ruby
require 'site_maps'

use SiteMaps::Middleware, adapter: SiteMaps.current_adapter
run MyApp
```

## Options

```ruby
use SiteMaps::Middleware,
  adapter: SiteMaps.current_adapter,
  public_prefix: nil,
  storage_prefix: nil,
  x_robots_tag: 'noindex, follow',
  cache_control: 'public, max-age=3600'
```

| Option | Purpose |
|--------|---------|
| `adapter` | Adapter instance, or a callable returning one (useful if the adapter is reconfigured at boot). |
| `public_prefix` | Prefix stripped from the request path before lookup, e.g. `/sitemap` if your app mounts sitemaps under a sub-path. |
| `storage_prefix` | Prefix prepended to the lookup key, e.g. `tenants/acme` for multi-tenant layouts. It can also be a callable that receives the request (see Multi-tenant routing below). |
| `x_robots_tag` | Value of the `X-Robots-Tag` header added to served files. |
| `cache_control` | Value of the `Cache-Control` header added to served files. |
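The sketch below makes the two prefix options concrete by showing how a request path could be turned into a storage key. It first strips `public_prefix`, then prepends `storage_prefix`, which may be a string or a callable receiving the request (as in the multi-tenant example below). `lookup_key` and the `Req` struct are hypothetical. The middleware's actual lookup logic may differ.

```ruby
Req = Struct.new(:host, :path) # stand-in for a Rack::Request

# Hypothetical: map a request to the key the adapter would be asked for.
def lookup_key(request, public_prefix: nil, storage_prefix: nil)
  path = request.path
  # Only strip a whole path segment, so '/sitemap' does not eat '/sitemaps.xml'.
  path = path.delete_prefix(public_prefix) if public_prefix && path.start_with?("#{public_prefix}/")

  prefix = storage_prefix.respond_to?(:call) ? storage_prefix.call(request) : storage_prefix
  key = path.delete_prefix('/')
  prefix ? File.join(prefix, key) : key
end

tenant_prefix = ->(request) { "tenants/#{request.host.split('.').first}" }
lookup_key(Req.new('acme.example.com', '/sitemap.xml'), storage_prefix: tenant_prefix)
# => "tenants/acme/sitemap.xml"
```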

## Behavior

The middleware intercepts requests for `*.xml` and `*.xml.gz` files:

- Matches → serve from the adapter with `Content-Type: application/xml`, plus `X-Robots-Tag` and `Cache-Control`.
- Gzipped sources → auto-decompress on serve so XSL stylesheets render in the browser. Clients asking for `.xml.gz` still get the compressed bytes.
- Doesn't match → `env` passes through to `@app.call`.
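The behavior above can be sketched as a plain Rack-compatible class; the sketch does not need the `rack` gem. A `files:` hash stands in for the adapter's storage. The class name, the `files:` hash, the fixed header values, and labeling raw `.gz` responses `application/gzip` are all choices made for this illustration. They are not `SiteMaps::Middleware`'s implementation. Headers are lowercase, as Rack 3 requires.

```ruby
require 'zlib'

# Illustrative Rack-compatible middleware (not SiteMaps::Middleware).
# files: { 'sitemap.xml.gz' => gzipped bytes, 'posts/sitemap1.xml' => xml, ... }
class SitemapServerSketch
  HEADERS = { 'x-robots-tag' => 'noindex, follow', 'cache-control' => 'public, max-age=3600' }.freeze

  def initialize(app, files:)
    @app = app
    @files = files
  end

  def call(env)
    path = env['PATH_INFO'].to_s
    return @app.call(env) unless path.end_with?('.xml', '.xml.gz') # no match: pass through

    key = path.delete_prefix('/')
    if @files.key?(key)
      # Served as stored; a .xml.gz request gets the compressed bytes.
      respond(@files[key], key.end_with?('.gz') ? 'application/gzip' : 'application/xml')
    elsif @files.key?("#{key}.gz")
      # Plain .xml requested but only a gzipped source exists: decompress for browsers/XSL.
      respond(Zlib.gunzip(@files["#{key}.gz"]), 'application/xml')
    else
      @app.call(env) # nothing generated under that name: let the app answer
    end
  end

  private

  def respond(body, content_type)
    headers = HEADERS.merge('content-type' => content_type, 'content-length' => body.bytesize.to_s)
    [200, headers, [body]]
  end
end

app = ->(_env) { [404, { 'content-type' => 'text/plain' }, ['Not Found']] }
middleware = SitemapServerSketch.new(app, files: { 'sitemap.xml.gz' => Zlib.gzip('<urlset/>') })
status, headers, body = middleware.call('PATH_INFO' => '/sitemap.xml')
```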

## XSL stylesheets

The middleware also serves the built-in XSL stylesheets (pretty sitemap rendering for human visitors) at their referenced paths. Configure their URLs via:

```ruby
configure do |config|
  config.xsl_stylesheet_url = '/_sitemap-stylesheet.xsl'
  config.xsl_index_stylesheet_url = '/_sitemap-index-stylesheet.xsl'
end
```

## Multi-tenant routing

For per-tenant sitemaps stored under subpaths:

```ruby
use SiteMaps::Middleware,
  adapter: per_request_adapter, # placeholder: your own adapter selection
  storage_prefix: ->(request) { "tenants/#{request.host.split('.').first}" }
```

If the adapter itself already scopes paths by tenant, no prefix is needed; just point it at the right one for each request.

## robots.txt integration

Emit a `Sitemap:` directive for the generated file:

```ruby
# config.ru or a controller
SiteMaps::RobotsTxt.sitemap_directive('https://example.com/sitemap.xml')
# => "Sitemap: https://example.com/sitemap.xml"

SiteMaps::RobotsTxt.render(
  sitemap_url: 'https://example.com/sitemap.xml',
  extra_directives: ['Disallow: /admin']
)
# => "Sitemap: https://example.com/sitemap.xml\nDisallow: /admin"
```

data/docs/processes.md
ADDED
@@ -0,0 +1,156 @@
# Processes

A **process** is a unit of work that produces part of a sitemap. Processes run concurrently on a thread pool; each one writes its own URL set and becomes an entry in the sitemap index.

## Static processes

A static process has no parameters. It runs once and writes one (possibly split) sitemap file.

```ruby
SiteMaps.use(:file_system) do
  configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }

  process do |s|
    s.add('/', priority: 1.0)
    s.add('/about')
  end

  process :posts, 'posts/sitemap.xml' do |s|
    Post.find_each { |post| s.add("/posts/#{post.slug}", lastmod: post.updated_at) }
  end
end
```

- Without an explicit name, the process is named `:default`.
- Without an explicit location, a default filename is assigned.
- The block receives a `SitemapBuilder` (`s`), on which `add` is called once per URL.

## Dynamic processes

A dynamic process has placeholders in its location template and matching kwargs. Each unique combination of kwargs produces a separate sitemap file.

```ruby
process :monthly_posts, 'posts/%{year}-%{month}/sitemap.xml', year: 2024, month: 1 do |s, year:, month:, **|
  Post.where('extract(year from published_at) = ? AND extract(month from published_at) = ?', year, month)
      .find_each { |p| s.add("/posts/#{p.slug}", lastmod: p.updated_at) }
end
```

The kwargs passed to `process` are **defaults**; the real values come from `Runner#enqueue`:

```ruby
runner = SiteMaps.generate(config_file: 'config/sitemap.rb')
runner.enqueue(:monthly_posts, year: 2024, month: 1)
runner.enqueue(:monthly_posts, year: 2024, month: 2)
runner.enqueue(:monthly_posts, year: 2024, month: 3)
runner.run
```

Or from the CLI:

```bash
bundle exec site_maps generate monthly_posts \
  --config-file config/sitemap.rb \
  --context=year:2024 month:1
```
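The `%{year}`-style placeholders look like Ruby's named references for `format`. Assuming the location template is filled that way (the syntax suggests it, but this is not confirmed here), each kwargs combination maps to a location as the sketch below shows. Under that assumption, integers are inserted as-is, so `month: 1` yields `2024-1` rather than `2024-01`.

```ruby
TEMPLATE = 'posts/%{year}-%{month}/sitemap.xml'

# Each distinct kwargs combination resolves to its own location.
combos = [{ year: 2024, month: 1 }, { year: 2024, month: 2 }, { year: 2024, month: 12 }]
locations = combos.map { |kwargs| format(TEMPLATE, kwargs) }
# => ["posts/2024-1/sitemap.xml", "posts/2024-2/sitemap.xml", "posts/2024-12/sitemap.xml"]

# %{...} inserts values unchanged; pad them first if you want "2024-01".
padded = format(TEMPLATE, year: 2024, month: format('%02d', 1))
# => "posts/2024-01/sitemap.xml"
```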

## Execution model

When you call `runner.run`:

1. Each enqueued process is wrapped in a `Concurrent::Future`.
2. The pool (default 4 threads, configurable via `--max-threads`) runs them in parallel.
3. Each process builds a `URLSet`. When the set fills up (50,000 links, 1,000 news items, or 50 MB uncompressed), it is finalized and written, and a new `URLSet` starts automatically.
4. After every process finishes, the sitemap index is aggregated and written to `config.url`.
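The gem uses concurrent-ruby's `Concurrent::Future`. To keep the example dependency-free, the sketch below swaps in a stdlib `Thread` + `Queue` pool that illustrates the same shape: a bounded number of workers drain the enqueued processes, and aggregation happens only after every worker has finished. `run_processes` is a hypothetical name, and each "process" here is just a lambda returning its paths.

```ruby
# Illustration only: a stdlib Thread + Queue pool in place of Concurrent::Future.
def run_processes(processes, max_threads: 4)
  queue = Queue.new
  processes.each { |name, work| queue << [name, work] }
  queue.close # pop returns nil once the closed queue is drained

  results = {}
  lock = Mutex.new
  workers = Array.new([max_threads, processes.size].min) do
    Thread.new do
      while (item = queue.pop)
        name, work = item
        value = work.call # one process builds its URLs
        lock.synchronize { results[name] = value }
      end
    end
  end
  workers.each(&:join) # aggregation (step 4) waits for every worker; join re-raises errors
  results
end

results = run_processes({
  static: -> { ['/', '/about'] },
  posts: -> { (1..3).map { |n| "/posts/#{n}" } }
}, max_threads: 2)
index_entries = results.keys # completion order varies between runs
```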

## Splitting rules

A URL set is finalized and rolled over when **any** of these apply:

- Links reach `config.max_links` (default 50,000, the sitemap spec limit).
- News entries reach 1,000.
- Uncompressed XML reaches 50 MB.

Split files are named by `IncrementalLocation`: `posts/sitemap.xml` becomes `posts/sitemap1.xml`, `posts/sitemap2.xml`, etc.
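The three limits are independent, so a rollover check is a plain OR of them. In the minimal sketch below, the `UrlSetStats` struct and the `rollover?` method are hypothetical names and are not the gem's `URLSet` API. The byte check looks ahead at the next entry so that a file never exceeds 50 MB.

```ruby
MAX_LINKS = 50_000
MAX_NEWS = 1_000
MAX_BYTES = 50 * 1024 * 1024 # uncompressed

# Running totals for the file currently being written (hypothetical).
UrlSetStats = Struct.new(:links, :news, :bytes)

# True when the next entry must start a new file.
def rollover?(stats, next_entry_bytes:, max_links: MAX_LINKS)
  stats.links >= max_links ||
    stats.news >= MAX_NEWS ||
    stats.bytes + next_entry_bytes > MAX_BYTES
end

stats = UrlSetStats.new(49_999, 0, 10_000_000)
rollover?(stats, next_entry_bytes: 250) # => false, one more link still fits
```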

## Index generation

A sitemap index is produced when:

- More than one process exists,
- A single process was split across multiple files, or
- External sitemaps were added.

Otherwise a single `urlset` is written directly at `config.url` (the "inline" optimization).
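The decision above reduces to a single predicate. The index itself is a sitemaps.org `sitemapindex` that lists each file's `loc` and an optional `lastmod`. The sketch below uses only the stdlib. `needs_index?`, `render_index`, and `xml_escape` are hypothetical names, and the gem's actual rendering may differ.

```ruby
require 'time'

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;' }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# process_files: { process_name => [file URLs] }; externals: [{ loc:, lastmod: }]
def needs_index?(process_files, externals)
  process_files.size > 1 ||
    process_files.values.any? { |files| files.size > 1 } ||
    !externals.empty?
end

# Render a sitemaps.org <sitemapindex> from [{ loc:, lastmod: (optional) }].
def render_index(entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |entry|
    lines << '  <sitemap>'
    lines << "    <loc>#{xml_escape(entry[:loc])}</loc>"
    lines << "    <lastmod>#{entry[:lastmod].getutc.iso8601}</lastmod>" if entry[:lastmod]
    lines << '  </sitemap>'
  end
  lines << '</sitemapindex>'
  lines.join("\n") + "\n"
end

single = { default: ['https://example.com/sitemap.xml'] }
split = { posts: ['https://example.com/posts/sitemap1.xml', 'https://example.com/posts/sitemap2.xml'] }
legacy = [{ loc: 'https://cdn.example.com/legacy-sitemap.xml', lastmod: Time.utc(2024, 1, 15) }]

needs_index?(single, []) # => false: the urlset is written inline at config.url
xml = render_index(split.values.flatten.map { |loc| { loc: loc } } + legacy)
```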

## Adding external sitemaps

Reference third-party or pre-existing sitemaps in the index:

```ruby
require 'time' # for Time.parse

SiteMaps.use(:file_system) do
  configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }

  external_sitemap('https://cdn.example.com/legacy-sitemap.xml', lastmod: Time.parse('2024-01-15'))

  process { |s| s.add('/') }
end
```

## Shared helpers across processes

Use `extend_processes_with` to add methods that every process block can call:

```ruby
module Helpers
  def post_path(post) = "/posts/#{post.slug}"
  def published_posts = Post.where.not(published_at: nil)
end

SiteMaps.use(:file_system) do
  configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
  extend_processes_with(Helpers)

  process :posts do |s|
    published_posts.find_each { |p| s.add(post_path(p), lastmod: p.updated_at) }
  end
end
```
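Conceptually, this is ordinary Ruby module extension: the object a process block is evaluated against gains the module's methods. The sketch below shows that mechanism with `extend` and `instance_exec`. `ProcessContext` and `FakePost` are hypothetical stand-ins, and whether `site_maps` evaluates blocks exactly this way is an assumption.

```ruby
module Helpers
  def post_path(post)
    "/posts/#{post.slug}"
  end
end

FakePost = Struct.new(:slug) # stand-in for an ActiveRecord model

# Hypothetical evaluation context for a process block.
class ProcessContext
  def initialize(*modules)
    modules.each { |mod| extend(mod) } # per-object, akin to extend_processes_with
  end

  def run(builder, &block)
    instance_exec(builder, &block) # self is the context, so helper methods resolve
  end
end

added = [] # stands in for the sitemap builder `s`
ProcessContext.new(Helpers).run(added) do |s|
  [FakePost.new('hello'), FakePost.new('world')].each { |p| s << post_path(p) }
end
# added => ["/posts/hello", "/posts/world"]
```

Extending the context object, rather than reopening a global class, keeps helpers scoped to one sitemap definition.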

## URL filters

Filters run per URL inside every process. Use them for global exclusions or default attributes:

```ruby
SiteMaps.use(:file_system) do
  configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }

  # Exclude any /admin path. Return the options unchanged for every other URL,
  # because a nil result would exclude that URL too.
  url_filter { |url, options| url.include?('/admin') ? false : options }

  # Boost blog priority
  url_filter do |url, options|
    if url.include?('/blog/')
      options.merge(priority: 0.9, changefreq: 'daily')
    else
      options
    end
  end

  process { |s| s.add('/') } # add your URLs here
end
```

A filter returning `false` (or `nil`) excludes the URL entirely. Returning a hash replaces the options.
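These rules compose as a left-to-right pipeline: a falsy result drops the URL, and a hash replaces the options. The stdlib-only sketch below applies filters in declaration order and stops at the first rejection. `apply_filters` is a hypothetical name. The ordering, the short-circuiting, and keeping the options when a filter returns a non-hash truthy value are all assumptions about the gem.

```ruby
# Hypothetical: returns the final options, or nil when a filter excluded the URL.
def apply_filters(filters, url, options)
  filters.each do |filter|
    result = filter.call(url, options)
    return nil unless result               # false or nil drops the URL
    options = result if result.is_a?(Hash) # a hash replaces the options
  end
  options
end

exclude_admin = ->(url, options) { url.include?('/admin') ? false : options }
boost_blog = lambda do |url, options|
  url.include?('/blog/') ? options.merge(priority: 0.9, changefreq: 'daily') : options
end
filters = [exclude_admin, boost_blog]

# Pitfall: this returns nil for every non-admin URL, so it drops all of them.
naive_exclude = ->(url, _options) { false if url.include?('/admin') }
```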

## Re-running a single shard

Only regenerate what changed; the rest is preserved from the existing sitemap index:

```ruby
runner = SiteMaps.generate(config_file: 'config/sitemap.rb')
runner.enqueue(:monthly_posts, year: 2024, month: 3) # only March
runner.run # Jan and Feb kept as-is
```

This is the main advantage of parameterized dynamic processes: you can rebuild one month's shard on a cron and leave the rest untouched.

data/docs/rails.md
ADDED
@@ -0,0 +1,128 @@
# Rails Integration

The Railtie loads automatically when Rails is present. It wires up **URL helpers** (`s.route.<helper>` inside process blocks) and nothing else: no initializer, no autoloaded directories, no patched generators.

## URL helpers in processes

```ruby
# config/sitemap.rb
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = 'https://example.com/sitemap.xml'
    config.directory = Rails.public_path.to_s
  end

  process do |s|
    s.add(s.route.root_path, priority: 1.0)
    s.add(s.route.about_path)
    Post.find_each { |p| s.add(s.route.post_path(p), lastmod: p.updated_at) }
  end
end
```

`s.route` is a singleton wrapping `Rails.application.routes.url_helpers`.

## Generating from Rails

### One-off

```bash
bundle exec site_maps generate --config-file config/sitemap.rb
```

The CLI auto-requires `config/environment.rb` if it finds a `config/application.rb`, so ActiveRecord, URL helpers, and everything else load as normal.

### From a Rake task

```ruby
# lib/tasks/sitemap.rake
namespace :sitemap do
  desc 'Generate sitemaps'
  task generate: :environment do
    runner = SiteMaps.generate(config_file: Rails.root.join('config/sitemap.rb').to_s)
    runner.enqueue_all.run
  end
end
```

Run on deploy or via cron:

```bash
bundle exec rake sitemap:generate
```

### From a scheduled job

```ruby
class SitemapJob < ApplicationJob
  def perform
    runner = SiteMaps.generate(config_file: Rails.root.join('config/sitemap.rb').to_s)
    runner.enqueue_all.run
  end
end

# ActiveJob's `set` has no `cron:` option. Schedule SitemapJob daily (e.g. at 03:00)
# with your queue backend's recurring-job feature, such as Solid Queue's
# config/recurring.yml, sidekiq-cron, or GoodJob's cron configuration.
SitemapJob.perform_later # one-off enqueue
```

## Serving generated sitemaps

Add the Rack middleware to serve files generated by the `:file_system` adapter:

```ruby
# config/application.rb
config.middleware.use SiteMaps::Middleware, adapter: -> { SiteMaps.current_adapter }
```

See [middleware.md](middleware.md) for options.

## Asset precompile integration

If you want sitemaps regenerated on every deploy, hook into `assets:precompile`:

```ruby
# lib/tasks/sitemap.rake
Rake::Task['assets:precompile'].enhance(['sitemap:generate'])
```

Prerequisites added with `enhance` run before precompilation, so the build environment needs whatever your processes touch (for example, database access for `Post.find_each`).

## robots.txt

```erb
<%# app/views/robots.text.erb (render it from a controller; ERB files in public/ are served as-is, unrendered) %>
User-agent: *
Disallow: /admin

<%= SiteMaps::RobotsTxt.sitemap_directive('https://example.com/sitemap.xml') %>
```

## Multi-tenant

`SiteMaps.define` gives you a generation function parameterized by runtime context:

```ruby
# config/sitemap.rb
SiteMaps.define do |tenant:|
  use(:file_system) do
    configure do |config|
      config.url = "https://#{tenant.domain}/sitemap.xml"
      config.directory = tenant.public_path
    end

    process { |s| tenant.pages.each { |page| s.add(page.path, lastmod: page.updated_at) } }
  end
end
```

```ruby
Tenant.find_each do |tenant|
  SiteMaps.generate(config_file: 'config/sitemap.rb', context: { tenant: tenant }).enqueue_all.run
end
```

The context hash is splatted into the `define` block as keyword args.
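In plain Ruby terms, splatting a hash into a block as keyword arguments looks like the sketch below. A required keyword such as `tenant:` also makes a missing key fail loudly with an `ArgumentError`. The `definition` proc and the `Tenant` struct are hypothetical stand-ins, and how `site_maps` invokes the block internally is an assumption.

```ruby
Tenant = Struct.new(:domain)

# Stand-in for the block given to SiteMaps.define.
definition = proc do |tenant:|
  "https://#{tenant.domain}/sitemap.xml"
end

context = { tenant: Tenant.new('acme.example.com') }
url = definition.call(**context) # each key becomes a keyword argument

missing_key_error =
  begin
    definition.call(**{}) # no tenant: the required keyword is missing
    nil
  rescue ArgumentError => e
    e.message
  end
```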

## Dependencies

- Rails is **not** listed in the gemspec. The Railtie is loaded only if Rails is already present. If you're using `site_maps` in a non-Rails Ruby project, the Rails-specific pieces are inert.
@@ -14,8 +14,7 @@ class SiteMaps::Adapters::AwsSdk::Storage
     lastmod = options.delete(:last_modified) || Time.now
     options[:metadata] ||= {}
     options[:metadata]["given-last-modified"] = lastmod.utc.strftime("%Y-%m-%dT%H:%M:%S%:z")
-
-    obj.upload_file(location.path, **options)
+    transfer_manager.upload_file(location.path, bucket: config.bucket, key: location.remote_path, **options)
   end
 
   def read(location)
@@ -49,4 +48,8 @@ class SiteMaps::Adapters::AwsSdk::Storage
   def object(remote_path)
     config.s3_bucket.object(remote_path)
   end
+
+  def transfer_manager
+    @transfer_manager ||= ::Aws::S3::TransferManager.new(client: config.s3_resource.client)
+  end
 end
data/lib/site_maps/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: site_maps
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.1
 platform: ruby
 authors:
 - Marcos G. Zimmermann
@@ -105,6 +105,16 @@ files:
 - Rakefile
 - bin/console
 - bin/setup
+- docs/README.md
+- docs/adapters.md
+- docs/api.md
+- docs/cli.md
+- docs/events.md
+- docs/extensions.md
+- docs/getting-started.md
+- docs/middleware.md
+- docs/processes.md
+- docs/rails.md
 - exec/site_maps
 - lib/site-maps.rb
 - lib/site_maps.rb