site_maps 0.0.1.beta3 → 0.1.1
- checksums.yaml +4 -4
- data/.github/workflows/main.yml +2 -4
- data/.rubocop.yml +4 -2
- data/.tool-versions +1 -1
- data/AGENTS.md +73 -0
- data/CHANGELOG.md +5 -0
- data/CLAUDE.md +77 -0
- data/Gemfile +1 -0
- data/Gemfile.lock +72 -56
- data/README.md +531 -393
- data/docs/README.md +67 -0
- data/docs/adapters.md +143 -0
- data/docs/api.md +154 -0
- data/docs/cli.md +93 -0
- data/docs/events.md +79 -0
- data/docs/extensions.md +141 -0
- data/docs/getting-started.md +138 -0
- data/docs/middleware.md +85 -0
- data/docs/processes.md +156 -0
- data/docs/rails.md +128 -0
- data/lib/site_maps/adapters/adapter.rb +35 -5
- data/lib/site_maps/adapters/aws_sdk/storage.rb +5 -2
- data/lib/site_maps/builder/sitemap_index/item.rb +1 -1
- data/lib/site_maps/builder/sitemap_index.rb +29 -5
- data/lib/site_maps/builder/url.rb +13 -10
- data/lib/site_maps/builder/url_set.rb +17 -7
- data/lib/site_maps/builder/xsl_stylesheet.rb +192 -0
- data/lib/site_maps/cli.rb +6 -2
- data/lib/site_maps/configuration.rb +8 -1
- data/lib/site_maps/incremental_location.rb +1 -1
- data/lib/site_maps/middleware.rb +197 -0
- data/lib/site_maps/notification/event.rb +1 -1
- data/lib/site_maps/notification/publisher.rb +1 -0
- data/lib/site_maps/notification.rb +1 -0
- data/lib/site_maps/ping.rb +35 -0
- data/lib/site_maps/{primitives → primitive}/array.rb +1 -1
- data/lib/site_maps/{primitives → primitive}/output.rb +1 -1
- data/lib/site_maps/primitive/string.rb +106 -0
- data/lib/site_maps/robots_txt.rb +21 -0
- data/lib/site_maps/runner/event_listener.rb +2 -2
- data/lib/site_maps/runner.rb +17 -3
- data/lib/site_maps/sitemap_builder.rb +16 -4
- data/lib/site_maps/sitemap_reader.rb +3 -0
- data/lib/site_maps/version.rb +1 -1
- data/lib/site_maps.rb +81 -10
- data/site_maps.gemspec +1 -1
- metadata +23 -10
- data/lib/site_maps/primitives/string.rb +0 -43
data/docs/README.md
ADDED
|
@@ -0,0 +1,67 @@
|
|
|
1
|
+
# site_maps
|
|
2
|
+
|
|
3
|
+
Concurrent, adapter-based sitemap.xml generation for Ruby applications.
|
|
4
|
+
|
|
5
|
+
`site_maps` is a framework-agnostic sitemap builder with built-in Rails support. It produces valid sitemap XML (with full SEO extensions — image, video, news, hreflang, mobile, PageMap), splits large sitemaps into indexed chunks automatically, generates them concurrently across a thread pool, and ships them to the filesystem, S3, or a custom backend through a pluggable adapter layer.
|
|
6
|
+
|
|
7
|
+
## Contents
|
|
8
|
+
|
|
9
|
+
- [Getting started](getting-started.md) — install, first sitemap, Rails
|
|
10
|
+
- [Processes](processes.md) — static and dynamic process DSL
|
|
11
|
+
- [Adapters](adapters.md) — filesystem, S3, no-op, custom
|
|
12
|
+
- [CLI](cli.md) — `site_maps generate`
|
|
13
|
+
- [Rack middleware](middleware.md) — serve generated sitemaps from the app
|
|
14
|
+
- [SEO extensions](extensions.md) — image, video, news, hreflang, mobile, PageMap
|
|
15
|
+
- [Events](events.md) — instrumentation hooks
|
|
16
|
+
- [Rails integration](rails.md) — URL helpers, Railtie, precompile
|
|
17
|
+
- [API reference](api.md) — full public API
|
|
18
|
+
|
|
19
|
+
## Install
|
|
20
|
+
|
|
21
|
+
```ruby
|
|
22
|
+
# Gemfile
|
|
23
|
+
gem 'site_maps'
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
## One-minute tour
|
|
27
|
+
|
|
28
|
+
```ruby
|
|
29
|
+
# config/sitemap.rb
|
|
30
|
+
SiteMaps.use(:file_system) do
|
|
31
|
+
configure do |config|
|
|
32
|
+
config.url = 'https://example.com/sitemap.xml'
|
|
33
|
+
config.directory = Rails.public_path.to_s
|
|
34
|
+
end
|
|
35
|
+
|
|
36
|
+
process do |s|
|
|
37
|
+
s.add('/', priority: 1.0, changefreq: 'daily')
|
|
38
|
+
s.add('/about', lastmod: Time.now)
|
|
39
|
+
|
|
40
|
+
Post.find_each do |post|
|
|
41
|
+
s.add("/posts/#{post.slug}", lastmod: post.updated_at)
|
|
42
|
+
end
|
|
43
|
+
end
|
|
44
|
+
end
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
```bash
|
|
48
|
+
bundle exec site_maps generate --config-file config/sitemap.rb
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
Generated: `public/sitemap.xml` (plus an indexed chain if the URL set exceeds 50k links).
|
|
52
|
+
|
|
53
|
+
## Why site_maps
|
|
54
|
+
|
|
55
|
+
- **Concurrency.** Processes run in a `Concurrent::FixedThreadPool`; threads share a thread-safe repo that handles file splitting.
|
|
56
|
+
- **Pluggable storage.** Write the same sitemap to disk in development and S3 in production by swapping one line.
|
|
57
|
+
- **SEO extensions.** Full support for URL extensions: images, videos, news, hreflang alternates, mobile, PageMap.
|
|
58
|
+
- **Dynamic processes.** Parameterized templates like `posts/%{year}-%{month}/sitemap.xml` let you rebuild a single shard without regenerating the whole site.
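
A location template of this shape reads like a Ruby format string with named references. The sketch below uses only `Kernel#format` to show how one shard's path could be derived from its parameters. The `shard_location` helper is hypothetical and is not part of the gem.

```ruby
# Hypothetical helper: expands a dynamic-process location template.
# Kernel#format substitutes each %{name} reference from the given hash.
def shard_location(template, **params)
  format(template, **params)
end

template = 'posts/%{year}-%{month}/sitemap.xml'
march = shard_location(template, year: 2024, month: 3)
april = shard_location(template, year: 2024, month: 4)

puts march
puts april
```

Each distinct set of parameters maps to its own file, which is what makes it possible to rebuild one shard in isolation.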
|
|
59
|
+
|
|
60
|
+
## Version
|
|
61
|
+
|
|
62
|
+
- Ruby: `>= 3.2.0`
|
|
63
|
+
- Depends on: `builder ~> 3.0`, `concurrent-ruby >= 1.1`, `rack >= 2.0`, `zeitwerk`, `thor`
|
|
64
|
+
|
|
65
|
+
## License
|
|
66
|
+
|
|
67
|
+
MIT.
|
data/docs/adapters.md
ADDED
|
@@ -0,0 +1,143 @@
|
|
|
1
|
+
# Adapters
|
|
2
|
+
|
|
3
|
+
An **adapter** is the storage backend for generated sitemap files. Three adapters ship with the gem; a clean interface makes it easy to write your own.
|
|
4
|
+
|
|
5
|
+
## Built-in adapters
|
|
6
|
+
|
|
7
|
+
| Adapter | When to use |
|
|
8
|
+
|---------|-------------|
|
|
9
|
+
| `:file_system` | Write to disk. Ideal for local dev, or for serving via the bundled Rack middleware. |
|
|
10
|
+
| `:aws_sdk` | Upload to S3. Production deployments behind CloudFront or similar. |
|
|
11
|
+
| `:noop` | Discard writes. Ideal for tests that care about "what URLs got added" but not "what ended up on disk". |
|
|
12
|
+
|
|
13
|
+
Select with `SiteMaps.use(<symbol>)`.
|
|
14
|
+
|
|
15
|
+
## `:file_system`
|
|
16
|
+
|
|
17
|
+
```ruby
|
|
18
|
+
SiteMaps.use(:file_system) do
|
|
19
|
+
configure do |config|
|
|
20
|
+
config.url = 'https://example.com/sitemap.xml'
|
|
21
|
+
config.directory = Rails.public_path.to_s # default: "public/sitemaps"
|
|
22
|
+
end
|
|
23
|
+
process { |s| ... }
|
|
24
|
+
end
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
**Config attributes:**
|
|
28
|
+
|
|
29
|
+
| Key | Purpose |
|
|
30
|
+
|-----|---------|
|
|
31
|
+
| `url` | Public URL — drives filename layout and is written into sitemap `<loc>` entries. |
|
|
32
|
+
| `directory` | Filesystem root under which files land. |
|
|
33
|
+
|
|
34
|
+
If `config.url` ends in `.gz`, the adapter writes gzipped files. The middleware transparently decompresses on serve.
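
As a rough illustration of the `.gz` rule above, the following sketch compresses on write and decompresses on read based on the URL suffix alone. It uses Ruby's standard `zlib` library. The helper names are hypothetical, and the gem's adapter and middleware may be implemented differently.

```ruby
require 'zlib'

# Hypothetical helpers; the gem's own adapter and middleware code may differ.
def gzip_url?(url)
  url.end_with?('.gz')
end

# Compress the XML only when the configured URL asks for a .gz file.
def encode_for_storage(url, xml)
  gzip_url?(url) ? Zlib.gzip(xml) : xml
end

# Reverse step, as a serving layer might do before responding.
def decode_for_serving(url, raw)
  gzip_url?(url) ? Zlib.gunzip(raw) : raw
end

xml = '<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>'
stored = encode_for_storage('https://example.com/sitemap.xml.gz', xml)
served = decode_for_serving('https://example.com/sitemap.xml.gz', stored)
plain = encode_for_storage('https://example.com/sitemap.xml', xml)
```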
|
|
35
|
+
|
|
36
|
+
## `:aws_sdk`
|
|
37
|
+
|
|
38
|
+
```ruby
|
|
39
|
+
SiteMaps.use(:aws_sdk) do
|
|
40
|
+
configure do |config|
|
|
41
|
+
config.url = 'https://my-bucket.s3.amazonaws.com/sitemap.xml'
|
|
42
|
+
config.directory = '/tmp/sitemaps' # local scratch space
|
|
43
|
+
config.bucket = 'my-bucket'
|
|
44
|
+
config.region = ENV.fetch('AWS_REGION', 'us-east-1')
|
|
45
|
+
config.access_key_id = ENV['AWS_ACCESS_KEY_ID']
|
|
46
|
+
config.secret_access_key = ENV['AWS_SECRET_ACCESS_KEY']
|
|
47
|
+
config.acl = 'public-read' # default
|
|
48
|
+
config.cache_control = 'private, max-age=0, no-cache'
|
|
49
|
+
end
|
|
50
|
+
process { |s| ... }
|
|
51
|
+
end
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
**Config attributes:**
|
|
55
|
+
|
|
56
|
+
| Key | Default |
|
|
57
|
+
|-----|---------|
|
|
58
|
+
| `bucket` | `ENV['AWS_BUCKET']` |
|
|
59
|
+
| `region` | `ENV.fetch('AWS_REGION', 'us-east-1')` |
|
|
60
|
+
| `access_key_id` | `ENV['AWS_ACCESS_KEY_ID']` |
|
|
61
|
+
| `secret_access_key` | `ENV['AWS_SECRET_ACCESS_KEY']` |
|
|
62
|
+
| `acl` | `"public-read"` |
|
|
63
|
+
| `cache_control` | `"private, max-age=0, no-cache"` |
|
|
64
|
+
| `directory` | Local scratch dir for staging before upload |
|
|
65
|
+
|
|
66
|
+
The adapter writes locally first (to `directory`), then uploads to S3 with the configured ACL and Cache-Control headers. You'll need `aws-sdk-s3` in your Gemfile:
|
|
67
|
+
|
|
68
|
+
```ruby
|
|
69
|
+
gem 'aws-sdk-s3'
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
## `:noop`
|
|
73
|
+
|
|
74
|
+
```ruby
|
|
75
|
+
SiteMaps.use(:noop) do
|
|
76
|
+
configure { |c| c.url = 'https://example.com/sitemap.xml' }
|
|
77
|
+
process { |s| ... }
|
|
78
|
+
end
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
Writes are discarded. Use it in tests when you want to assert on the URLs being added (via events, for example) without hitting disk.
|
|
82
|
+
|
|
83
|
+
## Writing a custom adapter
|
|
84
|
+
|
|
85
|
+
Subclass `SiteMaps::Adapters::Adapter` and implement `write`, `read`, `delete`:
|
|
86
|
+
|
|
87
|
+
```ruby
|
|
88
|
+
class GoogleCloudStorageAdapter < SiteMaps::Adapters::Adapter
|
|
89
|
+
class Config < SiteMaps::Configuration
|
|
90
|
+
attribute :bucket
|
|
91
|
+
attribute :project_id
|
|
92
|
+
end
|
|
93
|
+
|
|
94
|
+
def write(url, raw_data, **_kwargs)
|
|
95
|
+
storage = Google::Cloud::Storage.new(project_id: config.project_id)
|
|
96
|
+
bucket = storage.bucket(config.bucket)
|
|
97
|
+
bucket.create_file(StringIO.new(raw_data), path_from(url))
|
|
98
|
+
end
|
|
99
|
+
|
|
100
|
+
def read(url)
|
|
101
|
+
file = storage.bucket(config.bucket).file(path_from(url))
|
|
102
|
+
[file.download.string, { content_type: 'application/xml' }]
|
|
103
|
+
end
|
|
104
|
+
|
|
105
|
+
def delete(url)
|
|
106
|
+
storage.bucket(config.bucket).file(path_from(url))&.delete
|
|
107
|
+
end
|
|
108
|
+
|
|
109
|
+
private
|
|
110
|
+
|
|
111
|
+
def path_from(url)
|
|
112
|
+
URI(url).path[1..]
|
|
113
|
+
end
|
|
114
|
+
|
|
115
|
+
def storage
|
|
116
|
+
@storage ||= Google::Cloud::Storage.new(project_id: config.project_id)
|
|
117
|
+
end
|
|
118
|
+
end
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
Register and use it:
|
|
122
|
+
|
|
123
|
+
```ruby
|
|
124
|
+
SiteMaps.use(GoogleCloudStorageAdapter) do
|
|
125
|
+
configure do |config|
|
|
126
|
+
config.url = 'https://cdn.example.com/sitemap.xml'
|
|
127
|
+
config.bucket = 'my-bucket'
|
|
128
|
+
config.project_id = 'my-project'
|
|
129
|
+
end
|
|
130
|
+
process { |s| ... }
|
|
131
|
+
end
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
## Adapter interface
|
|
135
|
+
|
|
136
|
+
| Method | Purpose |
|
|
137
|
+
|--------|---------|
|
|
138
|
+
| `#write(url, raw_data, **kwargs)` | Persist `raw_data` at the location implied by `url`. |
|
|
139
|
+
| `#read(url)` | Return `[raw_data, { content_type: '…' }]` for the given URL. |
|
|
140
|
+
| `#delete(url)` | Remove the file at the URL. |
|
|
141
|
+
| `.config_class` | (optional) Return a `Configuration` subclass to expose adapter-specific settings. |
|
|
142
|
+
|
|
143
|
+
The adapter base class handles everything else: URL filters, the process registry, and thread-safe URL tracking.
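
To make the three-method contract concrete, here is a minimal in-memory store that follows the same `write` / `read` / `delete` shapes listed in the table. It is a standalone sketch: it deliberately does not subclass `SiteMaps::Adapters::Adapter`, so it runs without the gem, and the `InMemoryStore` name is an assumption made for illustration.

```ruby
# Standalone sketch: mirrors the write/read/delete contract but does not
# subclass SiteMaps::Adapters::Adapter, so it runs without the gem.
class InMemoryStore
  def initialize
    @files = {}
    @lock = Mutex.new
  end

  # Persist raw_data under the URL; extra keyword arguments are ignored.
  def write(url, raw_data, **_kwargs)
    @lock.synchronize { @files[url] = raw_data.dup }
  end

  # Return [raw_data, metadata]; raises KeyError when nothing is stored.
  def read(url)
    data = @lock.synchronize { @files.fetch(url) }
    [data, { content_type: 'application/xml' }]
  end

  # Remove the entry; returns true when something was deleted.
  def delete(url)
    @lock.synchronize { !@files.delete(url).nil? }
  end
end

store = InMemoryStore.new
store.write('https://example.com/sitemap.xml', '<urlset/>')
body, meta = store.read('https://example.com/sitemap.xml')
removed = store.delete('https://example.com/sitemap.xml')
```

A real adapter would likely raise `SiteMaps::FileNotFoundError` on a missing file rather than `KeyError`; the sketch keeps to the standard library.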
|
data/docs/api.md
ADDED
|
@@ -0,0 +1,154 @@
|
|
|
1
|
+
# API Reference
|
|
2
|
+
|
|
3
|
+
## `SiteMaps` (top-level module)
|
|
4
|
+
|
|
5
|
+
| Method | Description |
|
|
6
|
+
|--------|-------------|
|
|
7
|
+
| `SiteMaps.use(adapter, **opts, &block)` | Register an adapter (`:file_system`, `:aws_sdk`, `:noop`, or a class) and yield its configuration block. |
|
|
8
|
+
| `SiteMaps.define(&block)` | Register a context-aware definition. Called by `.generate` with the `context:` hash splatted as kwargs. |
|
|
9
|
+
| `SiteMaps.configure { |config| ... }` | Mutate global defaults. |
|
|
10
|
+
| `SiteMaps.config` | Return global `Configuration`. |
|
|
11
|
+
| `SiteMaps.generate(config_file:, context: {}, **runner_opts) → Runner` | Load `config_file` and return a `Runner` ready to `.enqueue` and `.run`. |
|
|
12
|
+
| `SiteMaps.current_adapter` | Last-registered adapter (thread-local during `.generate`). |
|
|
13
|
+
| `SiteMaps.logger` | Configurable logger (default `Logger.new($stdout)`). |
|
|
14
|
+
|
|
15
|
+
### Constants
|
|
16
|
+
|
|
17
|
+
```ruby
|
|
18
|
+
SiteMaps::MAX_LENGTH # { links: 50_000, images: 1_000, news: 1_000 }
|
|
19
|
+
SiteMaps::MAX_FILESIZE # 50_000_000 bytes
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
### Errors
|
|
23
|
+
|
|
24
|
+
- `SiteMaps::Error` — base error
|
|
25
|
+
- `SiteMaps::AdapterNotFound` — unknown adapter symbol
|
|
26
|
+
- `SiteMaps::AdapterNotSetError` — generate called without an adapter
|
|
27
|
+
- `SiteMaps::FileNotFoundError` — missing file at adapter read
|
|
28
|
+
- `SiteMaps::FullSitemapError` — internal signal that a URL set is full (triggers split)
|
|
29
|
+
- `SiteMaps::ConfigurationError` — invalid config
|
|
30
|
+
|
|
31
|
+
---
|
|
32
|
+
|
|
33
|
+
## `SiteMaps::Configuration`
|
|
34
|
+
|
|
35
|
+
Base configuration. Adapter configs subclass this.
|
|
36
|
+
|
|
37
|
+
| Attribute | Default | Purpose |
|
|
38
|
+
|-----------|---------|---------|
|
|
39
|
+
| `url` | — (required) | Public URL of the main sitemap index. |
|
|
40
|
+
| `directory` | `"/tmp/sitemaps"` | Local storage directory. |
|
|
41
|
+
| `max_links` | `50_000` | URLs per file before split. |
|
|
42
|
+
| `emit_priority` | `true` | Emit `<priority>`. |
|
|
43
|
+
| `emit_changefreq` | `true` | Emit `<changefreq>`. |
|
|
44
|
+
| `xsl_stylesheet_url` | `nil` | Stylesheet for URL sets. |
|
|
45
|
+
| `xsl_index_stylesheet_url` | `nil` | Stylesheet for the sitemap index. |
|
|
46
|
+
| `ping_search_engines` | `false` | Auto-ping after generation. |
|
|
47
|
+
| `ping_engines` | `{ bing: '...' }` | URL templates per engine; `%{url}` is URL-encoded at ping time. |
|
|
48
|
+
|
|
49
|
+
---
|
|
50
|
+
|
|
51
|
+
## `SiteMaps::Adapters::Adapter` (base class)
|
|
52
|
+
|
|
53
|
+
Abstract base. Subclass to build custom adapters.
|
|
54
|
+
|
|
55
|
+
| Method | Description |
|
|
56
|
+
|--------|-------------|
|
|
57
|
+
| `.config_class` | Override to return a `Configuration` subclass with adapter-specific attributes. |
|
|
58
|
+
| `#write(url, raw_data, **kwargs)` | Abstract. Persist `raw_data` at the storage location implied by `url`. |
|
|
59
|
+
| `#read(url) → [raw_data, { content_type: '…' }]` | Abstract. |
|
|
60
|
+
| `#delete(url)` | Abstract. |
|
|
61
|
+
| `#configure { |c| ... }` | Yield the adapter's configuration. |
|
|
62
|
+
| `#process(name = :default, location = nil, **kwargs, &block)` | Register a process. |
|
|
63
|
+
| `#external_sitemap(url, lastmod:)` | Add an external sitemap to the index. |
|
|
64
|
+
| `#extend_processes_with(mod)` | Mix `mod` into all process blocks. |
|
|
65
|
+
| `#url_filter { |url, options| ... }` | Register a URL filter. |
|
|
66
|
+
| `#apply_url_filters(url, options)` | Run all filters; returns modified options or `nil` if excluded. |
|
|
67
|
+
| `#reset!` | Clear index and repo. Called before `Runner#run`. |
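
The filter behaviour described for `#apply_url_filters` can be pictured as a chain in which each filter receives the URL and the current options. The sketch below is a standalone approximation, not the gem's code. It assumes that a filter returning `nil` or `false` excludes the URL and that any other return value becomes the new options hash; `apply_filters` is a hypothetical name.

```ruby
# Hypothetical filter chain. Assumption: a filter that returns nil or false
# excludes the URL, and any other return value becomes the new options hash.
def apply_filters(filters, url, options)
  filters.reduce(options) do |current, filter|
    result = filter.call(url, current)
    return nil unless result

    result
  end
end

filters = [
  # Drop anything under /admin.
  ->(url, options) { url.include?('/admin') ? nil : options },
  # Give every remaining URL a default changefreq.
  ->(_url, options) { { changefreq: 'weekly' }.merge(options) }
]

kept = apply_filters(filters, 'https://example.com/posts/1', { priority: 0.5 })
dropped = apply_filters(filters, 'https://example.com/admin/users', {})
```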
|
|
68
|
+
|
|
69
|
+
---
|
|
70
|
+
|
|
71
|
+
## `SiteMaps::Runner`
|
|
72
|
+
|
|
73
|
+
Executes enqueued processes concurrently.
|
|
74
|
+
|
|
75
|
+
```ruby
|
|
76
|
+
Runner.new(adapter = SiteMaps.current_adapter, max_threads: 4, ping: nil)
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
| Method | Description |
|
|
80
|
+
|--------|-------------|
|
|
81
|
+
| `#enqueue(process_name, **kwargs)` | Queue one process with kwargs. |
|
|
82
|
+
| `#enqueue_remaining` / `#enqueue_all` | Queue every process not yet enqueued. |
|
|
83
|
+
| `#run` | Execute queued processes, finalize index, optionally ping. |
|
|
84
|
+
|
|
85
|
+
---
|
|
86
|
+
|
|
87
|
+
## `SiteMaps::SitemapBuilder`
|
|
88
|
+
|
|
89
|
+
Yielded as `s` inside every `process` block.
|
|
90
|
+
|
|
91
|
+
| Method | Description |
|
|
92
|
+
|--------|-------------|
|
|
93
|
+
| `#add(path, **options)` | Add one URL to the current URL set. Automatically splits when full. |
|
|
94
|
+
| `#finalize!` | Finalize the current URL set. Called automatically when the process block returns. |
|
|
95
|
+
|
|
96
|
+
`options` supports every extension documented in [extensions.md](extensions.md): `lastmod`, `priority`, `changefreq`, `images`, `videos`, `news`, `alternates`, `mobile`, `pagemap`.
|
|
97
|
+
|
|
98
|
+
In Rails apps, `s.route` is an object exposing all URL helpers.
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
## `SiteMaps::Middleware`
|
|
103
|
+
|
|
104
|
+
Rack middleware for serving generated sitemaps. See [middleware.md](middleware.md).
|
|
105
|
+
|
|
106
|
+
```ruby
|
|
107
|
+
use SiteMaps::Middleware,
|
|
108
|
+
adapter: ...,
|
|
109
|
+
public_prefix: nil,
|
|
110
|
+
storage_prefix: nil,
|
|
111
|
+
x_robots_tag: 'noindex, follow',
|
|
112
|
+
cache_control: 'public, max-age=3600'
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## `SiteMaps::Notification`
|
|
118
|
+
|
|
119
|
+
| Method | Description |
|
|
120
|
+
|--------|-------------|
|
|
121
|
+
| `.subscribe(event_or_class, &block)` | Subscribe to one event (string) or every event named on a class. |
|
|
122
|
+
| `.unsubscribe(subscriber)` | Remove a subscription. |
|
|
123
|
+
| `.instrument(event, payload) { ... }` | Emit an event, wrapping the block in a timer. |
|
|
124
|
+
|
|
125
|
+
See [events.md](events.md) for the event catalog.
|
|
126
|
+
|
|
127
|
+
---
|
|
128
|
+
|
|
129
|
+
## `SiteMaps::RobotsTxt`
|
|
130
|
+
|
|
131
|
+
| Method | Description |
|
|
132
|
+
|--------|-------------|
|
|
133
|
+
| `.sitemap_directive(url) → String` | Return `"Sitemap: <url>"`. |
|
|
134
|
+
| `.render(sitemap_url:, extra_directives: []) → String` | Build a full robots.txt body. |
|
|
135
|
+
|
|
136
|
+
---
|
|
137
|
+
|
|
138
|
+
## `SiteMaps::Ping`
|
|
139
|
+
|
|
140
|
+
| Method | Description |
|
|
141
|
+
|--------|-------------|
|
|
142
|
+
| `.ping(url, engines: { bing: '...' }) → Hash` | Fire a GET to each engine's template (substituting `%{url}`). Returns a hash of `{engine => { status:, url: }}`. |
|
|
143
|
+
|
|
144
|
+
---
|
|
145
|
+
|
|
146
|
+
## CLI entry point
|
|
147
|
+
|
|
148
|
+
`exec/site_maps` — the executable shipped with the gem.
|
|
149
|
+
|
|
150
|
+
```bash
|
|
151
|
+
bundle exec site_maps generate [processes] [options]
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
See [cli.md](cli.md).
|
data/docs/cli.md
ADDED
|
@@ -0,0 +1,93 @@
|
|
|
1
|
+
# CLI
|
|
2
|
+
|
|
3
|
+
The gem installs a `site_maps` executable backed by Thor.
|
|
4
|
+
|
|
5
|
+
```bash
|
|
6
|
+
bundle exec site_maps generate [PROCESS_NAMES...] [options]
|
|
7
|
+
```
|
|
8
|
+
|
|
9
|
+
If no process names are given, every process in the config file is enqueued.
|
|
10
|
+
|
|
11
|
+
## Options
|
|
12
|
+
|
|
13
|
+
| Flag | Default | Purpose |
|
|
14
|
+
|------|---------|---------|
|
|
15
|
+
| `--config-file`, `-r` | — | Path to the config file defining processes. **Required.** |
|
|
16
|
+
| `--max-threads`, `-c` | `4` | Thread pool size for concurrent process execution. |
|
|
17
|
+
| `--context` | `{}` | Hash-style kwargs passed to `SiteMaps.define` blocks: `--context=tenant:acme locale:en`. |
|
|
18
|
+
| `--enqueue-remaining` | `false` | In addition to specified processes, enqueue any others. |
|
|
19
|
+
| `--ping` | `false` | Override config to ping search engines after generation. |
|
|
20
|
+
| `--debug` | `false` | Set logger to DEBUG level. |
|
|
21
|
+
| `--logfile` | — | Write logs to a file instead of stdout. |
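
The `--context` flag takes `key:value` pairs that end up as keyword arguments. A minimal sketch of that flow is shown below, using a hypothetical `parse_context` helper and a lambda standing in for a definition block. Whether the gem symbolizes keys or casts values is an assumption here; in this sketch values stay strings.

```ruby
# Hypothetical parser: turns "key:value" tokens into a symbol-keyed hash.
def parse_context(tokens)
  tokens.to_h do |token|
    key, value = token.split(':', 2)
    [key.to_sym, value]
  end
end

# Stand-in for a definition block that receives the context as kwargs.
definition = ->(tenant:, locale: 'en') { "#{tenant}/#{locale}" }

context = parse_context(%w[tenant:acme locale:pt])
result = definition.call(**context)
```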
|
|
22
|
+
|
|
23
|
+
## Examples
|
|
24
|
+
|
|
25
|
+
Generate everything:
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
bundle exec site_maps generate --config-file config/sitemap.rb
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
Regenerate a single shard of a dynamic process:
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
bundle exec site_maps generate monthly_posts \
|
|
35
|
+
--config-file config/sitemap.rb \
|
|
36
|
+
--context=year:2024 month:3
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
Generate `posts` and `products`, then enqueue every other process defined in the config:
|
|
40
|
+
|
|
41
|
+
```bash
|
|
42
|
+
bundle exec site_maps generate posts products \
|
|
43
|
+
--config-file config/sitemap.rb \
|
|
44
|
+
--enqueue-remaining
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
Tune concurrency:
|
|
48
|
+
|
|
49
|
+
```bash
|
|
50
|
+
bundle exec site_maps generate --config-file config/sitemap.rb --max-threads 10
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
Ping Bing and any custom engines (config-driven — see below):
|
|
54
|
+
|
|
55
|
+
```bash
|
|
56
|
+
bundle exec site_maps generate --config-file config/sitemap.rb --ping
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
## Search-engine pinging
|
|
60
|
+
|
|
61
|
+
Pinging is off by default. Enable globally in config or flip it on per run via `--ping`.
|
|
62
|
+
|
|
63
|
+
```ruby
|
|
64
|
+
SiteMaps.use(:file_system) do
|
|
65
|
+
configure do |config|
|
|
66
|
+
config.url = 'https://example.com/sitemap.xml'
|
|
67
|
+
config.ping_search_engines = true
|
|
68
|
+
config.ping_engines = {
|
|
69
|
+
bing: 'https://www.bing.com/ping?sitemap=%{url}',
|
|
70
|
+
custom: 'https://search.example.com/ping?url=%{url}'
|
|
71
|
+
}
|
|
72
|
+
end
|
|
73
|
+
end
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
`%{url}` in the template is replaced with a URL-encoded `config.url` at ping time.
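
The substitution can be sketched with `Kernel#format` and a standard-library encoder. `ping_url` is a hypothetical helper, and `URI.encode_www_form_component` stands in for whichever encoder the gem actually uses.

```ruby
require 'uri'

# Hypothetical helper: fills a ping template with an encoded sitemap URL.
# URI.encode_www_form_component stands in for whatever encoder the gem uses.
def ping_url(template, sitemap_url)
  format(template, url: URI.encode_www_form_component(sitemap_url))
end

engines = {
  bing: 'https://www.bing.com/ping?sitemap=%{url}',
  custom: 'https://search.example.com/ping?url=%{url}'
}

targets = engines.transform_values do |template|
  ping_url(template, 'https://example.com/sitemap.xml')
end
```

Encoding matters because the sitemap URL is itself embedded in a query string; an unescaped `:` or `/` could be misread by the receiving endpoint.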
|
|
77
|
+
|
|
78
|
+
## Rails / bundler
|
|
79
|
+
|
|
80
|
+
The CLI auto-requires `config/environment` if it detects a `config/application.rb`, so Rails URL helpers (via the Railtie) are available inside your config file.
|
|
81
|
+
|
|
82
|
+
If you don't want that — say, a Ruby-only script in a Rails repo — pass a config file outside the Rails root or invoke the library directly via `SiteMaps.generate(...)`.
|
|
83
|
+
|
|
84
|
+
## Logging
|
|
85
|
+
|
|
86
|
+
- `--debug` sets the logger to `Logger::DEBUG`.
|
|
87
|
+
- `--logfile PATH` writes to a file; otherwise stdout.
|
|
88
|
+
- A built-in event listener prints one line per finalized URL set with link counts and runtime.
|
|
89
|
+
|
|
90
|
+
## Exit codes
|
|
91
|
+
|
|
92
|
+
- `0` — success.
|
|
93
|
+
- Non-zero — any process raised. Errors are captured per-future and re-raised after all futures complete, so you see the real backtrace rather than a generic runner failure.
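
The capture-then-re-raise behaviour can be illustrated with plain threads. This is a sketch of the pattern only: it swaps concurrent-ruby futures for `Thread` and `Queue` from core Ruby, and `run_all` is a hypothetical name.

```ruby
# Sketch using plain threads instead of concurrent-ruby futures.
# Each job's exception is captured so the other jobs still finish.
def run_all(jobs)
  errors = Queue.new
  threads = jobs.map do |name, job|
    Thread.new do
      job.call
    rescue StandardError => e
      errors << [name, e]
    end
  end
  threads.each(&:join)

  # Re-raise the first captured error only after every job has completed.
  unless errors.empty?
    _name, error = errors.pop
    raise error
  end
  :ok
end

finished = Queue.new
jobs = {
  posts: -> { finished << :posts },
  broken: -> { raise ArgumentError, 'bad shard' },
  products: -> { sleep 0.01; finished << :products }
}

outcome = begin
  run_all(jobs)
rescue ArgumentError => e
  e.message
end
```

Because the original exception object is re-raised, its class and backtrace survive; a wrapper script could rescue it at the top level and call `exit 1`.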
|
data/docs/events.md
ADDED
|
@@ -0,0 +1,79 @@
|
|
|
1
|
+
# Events
|
|
2
|
+
|
|
3
|
+
`site_maps` ships a lightweight pub/sub system under `SiteMaps::Notification`. Use it for logging, metrics, or reacting to particular generation phases.
|
|
4
|
+
|
|
5
|
+
## Subscribing
|
|
6
|
+
|
|
7
|
+
### Block subscribers
|
|
8
|
+
|
|
9
|
+
```ruby
|
|
10
|
+
SiteMaps::Notification.subscribe('sitemaps.finalize_urlset') do |event|
|
|
11
|
+
Rails.logger.info(
|
|
12
|
+
"[sitemap] wrote #{event[:links_count]} urls to #{event[:url]} in #{event[:runtime]}s"
|
|
13
|
+
)
|
|
14
|
+
end
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
### Class subscribers
|
|
18
|
+
|
|
19
|
+
A class with one method per event name (dots become underscores):
|
|
20
|
+
|
|
21
|
+
```ruby
|
|
22
|
+
class SitemapMetrics
|
|
23
|
+
def self.sitemaps_process_execution(event)
|
|
24
|
+
StatsD.timing('sitemaps.process', event[:runtime], tags: ["process:#{event[:process].name}"])
|
|
25
|
+
end
|
|
26
|
+
|
|
27
|
+
def self.sitemaps_finalize_urlset(event)
|
|
28
|
+
StatsD.increment('sitemaps.urlset.written', tags: ["url:#{event[:url]}"])
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
def self.sitemaps_ping(event)
|
|
32
|
+
event[:results].each do |engine, result|
|
|
33
|
+
StatsD.increment('sitemaps.ping', tags: ["engine:#{engine}", "status:#{result[:status]}"])
|
|
34
|
+
end
|
|
35
|
+
end
|
|
36
|
+
end
|
|
37
|
+
|
|
38
|
+
SiteMaps::Notification.subscribe(SitemapMetrics)
|
|
39
|
+
```
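
The naming rule (dots become underscores) can be shown with a tiny dispatcher. `TinyNotifier` and `RecordingListener` are hypothetical stand-ins written for this sketch; they are not the gem's `SiteMaps::Notification` implementation.

```ruby
# Minimal stand-in for class-based dispatch; not the gem's implementation.
module TinyNotifier
  @subscribers = []

  class << self
    def subscribe(klass)
      @subscribers << klass
    end

    # 'sitemaps.ping' is delivered to klass.sitemaps_ping(payload), if defined.
    def publish(event_name, payload)
      method_name = event_name.tr('.', '_')
      @subscribers.each do |klass|
        klass.public_send(method_name, payload) if klass.respond_to?(method_name)
      end
    end
  end
end

class RecordingListener
  @seen = []

  class << self
    attr_reader :seen

    def sitemaps_finalize_urlset(event)
      @seen << event[:url]
    end
  end
end

TinyNotifier.subscribe(RecordingListener)
TinyNotifier.publish('sitemaps.finalize_urlset', { url: 'https://example.com/sitemap.xml' })
TinyNotifier.publish('sitemaps.ping', { results: {} }) # ignored: no matching method
```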
|
|
40
|
+
|
|
41
|
+
### The built-in listener
|
|
42
|
+
|
|
43
|
+
For colored terminal output during CLI runs:
|
|
44
|
+
|
|
45
|
+
```ruby
|
|
46
|
+
SiteMaps::Notification.subscribe(SiteMaps::Runner::EventListener)
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
This is subscribed automatically by the CLI.
|
|
50
|
+
|
|
51
|
+
## Events
|
|
52
|
+
|
|
53
|
+
| Event | Payload keys |
|
|
54
|
+
|-------|-------------|
|
|
55
|
+
| `sitemaps.enqueue_process` | `process`, `kwargs` |
|
|
56
|
+
| `sitemaps.before_process_execution` | `process`, `kwargs` |
|
|
57
|
+
| `sitemaps.process_execution` | `process`, `kwargs`, `runtime` |
|
|
58
|
+
| `sitemaps.finalize_urlset` | `url`, `links_count`, `news_count`, `last_modified`, `runtime`, `process` |
|
|
59
|
+
| `sitemaps.ping` | `results` |
|
|
60
|
+
|
|
61
|
+
`process` is a `SiteMaps::Process` struct (`name`, `location_template`, `kwargs_template`, `block`).
|
|
62
|
+
|
|
63
|
+
## Event ordering
|
|
64
|
+
|
|
65
|
+
For each process the sequence is:
|
|
66
|
+
|
|
67
|
+
1. `sitemaps.enqueue_process`
|
|
68
|
+
2. `sitemaps.before_process_execution`
|
|
69
|
+
3. One or more `sitemaps.finalize_urlset` (one per split file)
|
|
70
|
+
4. `sitemaps.process_execution`
|
|
71
|
+
|
|
72
|
+
After all processes complete, one final `sitemaps.finalize_urlset` fires for the sitemap index itself. If pinging is enabled, `sitemaps.ping` fires last.
|
|
73
|
+
|
|
74
|
+
## Use cases
|
|
75
|
+
|
|
76
|
+
- **Logging.** Tail-friendly output of what just ran, how many URLs, runtime.
|
|
77
|
+
- **Metrics.** StatsD / OpenTelemetry counters for throughput and ping outcomes.
|
|
78
|
+
- **Alerting.** Subscribe to `sitemaps.ping`, alert on non-200 results.
|
|
79
|
+
- **Cache busting.** After `sitemaps.finalize_urlset`, purge the CDN entry for the written URL.
|
data/docs/extensions.md
ADDED
|
@@ -0,0 +1,141 @@
|
|
|
1
|
+
# SEO Extensions
|
|
2
|
+
|
|
3
|
+
`s.add` accepts options for every sitemap extension recognized by Google and Bing. Pass any of the following alongside `lastmod`, `priority`, and `changefreq`.
|
|
4
|
+
|
|
5
|
+
## Image
|
|
6
|
+
|
|
7
|
+
Up to 1,000 images per URL.
|
|
8
|
+
|
|
9
|
+
```ruby
|
|
10
|
+
s.add('/gallery/summer', images: [
|
|
11
|
+
{
|
|
12
|
+
loc: 'https://cdn.example.com/summer/beach.jpg',
|
|
13
|
+
title: 'Beach sunset',
|
|
14
|
+
caption: 'A photo from the summer trip',
|
|
15
|
+
geo_location: 'Cape Cod, MA',
|
|
16
|
+
license: 'https://creativecommons.org/licenses/by/4.0/'
|
|
17
|
+
}
|
|
18
|
+
])
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
## Video
|
|
22
|
+
|
|
23
|
+
Up to 1,000 video entries per sitemap file.
|
|
24
|
+
|
|
25
|
+
```ruby
|
|
26
|
+
s.add('/videos/how-to', videos: [
|
|
27
|
+
{
|
|
28
|
+
thumbnail_loc: 'https://cdn.example.com/thumbs/how-to.jpg',
|
|
29
|
+
title: 'How to use site_maps',
|
|
30
|
+
description: 'A quick walkthrough',
|
|
31
|
+
content_loc: 'https://cdn.example.com/videos/how-to.mp4',
|
|
32
|
+
player_loc: 'https://example.com/embed/how-to',
|
|
33
|
+
duration: 600,
|
|
34
|
+
publication_date: Time.now,
|
|
35
|
+
rating: 4.8,
|
|
36
|
+
view_count: 12_345,
|
|
37
|
+
family_friendly: true,
|
|
38
|
+
requires_subscription: false,
|
|
39
|
+
live: false,
|
|
40
|
+
tags: %w[tutorial guide],
|
|
41
|
+
category: 'Technology',
|
|
42
|
+
uploader: 'example-team',
|
|
43
|
+
uploader_info: 'https://example.com/about',
|
|
44
|
+
gallery_loc: 'https://example.com/videos',
|
|
45
|
+
gallery_title: 'Example video gallery',
|
|
46
|
+
price: nil,
|
|
47
|
+
allow_embed: true,
|
|
48
|
+
autoplay: 'ap=1'
|
|
49
|
+
}
|
|
50
|
+
])
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
## News
|
|
54
|
+
|
|
55
|
+
Up to 1,000 news entries per sitemap file (use a dedicated process for news URLs).
|
|
56
|
+
|
|
57
|
+
```ruby
|
|
58
|
+
s.add('/news/breaking', news: {
|
|
59
|
+
publication_name: 'Example Times',
|
|
60
|
+
publication_language: 'en',
|
|
61
|
+
publication_date: Time.now,
|
|
62
|
+
title: 'Breaking news headline',
|
|
63
|
+
keywords: 'breaking, politics',
|
|
64
|
+
genres: 'PressRelease',
|
|
65
|
+
access: 'Subscription',
|
|
66
|
+
stock_tickers: 'NASDAQ:EXMP'
|
|
67
|
+
})
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
## Alternate language / hreflang
|
|
71
|
+
|
|
72
|
+
```ruby
|
|
73
|
+
s.add('/', alternates: [
|
|
74
|
+
{ href: 'https://example.com/en', lang: 'en' },
|
|
75
|
+
{ href: 'https://example.com/es', lang: 'es' },
|
|
76
|
+
{ href: 'https://example.com/fr', lang: 'fr', nofollow: true }
|
|
77
|
+
])
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
The `nofollow: true` variant emits `rel="nofollow alternate"` on the link. Use it to declare locale variants without signalling Google to crawl them as equivalents.
|
|
81
|
+
|
|
82
|
+
## Mobile
|
|
83
|
+
|
|
84
|
+
Declare a URL as mobile-friendly:
|
|
85
|
+
|
|
86
|
+
```ruby
|
|
87
|
+
s.add('/mobile-page', mobile: true)
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
## PageMap
|
|
91
|
+
|
|
92
|
+
Structured data for Google Custom Search.
|
|
93
|
+
|
|
94
|
+
```ruby
|
|
95
|
+
s.add('/products/widget', pagemap: {
|
|
96
|
+
dataobjects: [
|
|
97
|
+
{
|
|
98
|
+
type: 'product',
|
|
99
|
+
id: 'sku-123',
|
|
100
|
+
attributes: [
|
|
101
|
+
{ name: 'name', value: 'Widget' },
|
|
102
|
+
{ name: 'price', value: '19.99' },
|
|
103
|
+
{ name: 'color', value: 'blue' }
|
|
104
|
+
]
|
|
105
|
+
}
|
|
106
|
+
]
|
|
107
|
+
})
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
## Combined example
|
|
111
|
+
|
|
112
|
+
Everything can coexist on a single URL:
|
|
113
|
+
|
|
114
|
+
```ruby
|
|
115
|
+
s.add('/products/widget',
|
|
116
|
+
lastmod: Time.now,
|
|
117
|
+
priority: 0.9,
|
|
118
|
+
changefreq: 'weekly',
|
|
119
|
+
images: [{ loc: 'https://cdn.example.com/widget.jpg', title: 'Widget' }],
|
|
120
|
+
alternates: [{ href: 'https://example.com/es/products/widget', lang: 'es' }],
|
|
121
|
+
mobile: true,
|
|
122
|
+
pagemap: { dataobjects: [{ type: 'product', id: 'sku-123', attributes: [] }] }
|
|
123
|
+
)
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
## Disabling `priority` / `changefreq`
|
|
127
|
+
|
|
128
|
+
Both fields are optional per the sitemap spec, and many search engines ignore them. Disable globally if you want smaller files:
|
|
129
|
+
|
|
130
|
+
```ruby
|
|
131
|
+
configure do |config|
|
|
132
|
+
config.emit_priority = false
|
|
133
|
+
config.emit_changefreq = false
|
|
134
|
+
end
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
## Output size
|
|
138
|
+
|
|
139
|
+
- Per URL set: 50,000 links **or** 1,000 news items **or** 50 MB uncompressed — whichever comes first. When one of these is hit, the current file is finalized and a new one starts.
|
|
140
|
+
- File naming is automatic (`posts/sitemap.xml` → `posts/sitemap1.xml`, `posts/sitemap2.xml`, …).
|
|
141
|
+
- Use the `.gz` extension in `config.url` to emit gzipped files — most search engines fetch either form.
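
The split-and-rename behaviour in the list above can be approximated as follows. The sketch covers the link-count limit only (not the news or byte-size limits) and rests on one assumption drawn from the naming example: a set that fits in one file keeps the base name, while a split set is numbered from 1. `split_files` is a hypothetical helper.

```ruby
# Hypothetical helper. Assumption: a set that fits in one file keeps the base
# name, and a split set is numbered from 1, as the naming example suggests.
def split_files(base_path, urls, max_links: 50_000)
  chunks = urls.each_slice(max_links).to_a
  return { base_path => urls } if chunks.size <= 1

  ext = File.extname(base_path)        # ".xml"
  stem = base_path.delete_suffix(ext)  # "posts/sitemap"
  chunks.each_with_index.map do |chunk, index|
    ["#{stem}#{index + 1}#{ext}", chunk]
  end.to_h
end

urls = (1..5).map { |n| "/posts/#{n}" }
files = split_files('posts/sitemap.xml', urls, max_links: 2)
single = split_files('posts/sitemap.xml', urls.first(2), max_links: 2)
```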
|