site_maps 0.0.1.beta3 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (48)
  1. checksums.yaml +4 -4
  2. data/.github/workflows/main.yml +2 -4
  3. data/.rubocop.yml +4 -2
  4. data/.tool-versions +1 -1
  5. data/AGENTS.md +73 -0
  6. data/CHANGELOG.md +5 -0
  7. data/CLAUDE.md +77 -0
  8. data/Gemfile +1 -0
  9. data/Gemfile.lock +72 -56
  10. data/README.md +531 -393
  11. data/docs/README.md +67 -0
  12. data/docs/adapters.md +143 -0
  13. data/docs/api.md +154 -0
  14. data/docs/cli.md +93 -0
  15. data/docs/events.md +79 -0
  16. data/docs/extensions.md +141 -0
  17. data/docs/getting-started.md +138 -0
  18. data/docs/middleware.md +85 -0
  19. data/docs/processes.md +156 -0
  20. data/docs/rails.md +128 -0
  21. data/lib/site_maps/adapters/adapter.rb +35 -5
  22. data/lib/site_maps/adapters/aws_sdk/storage.rb +5 -2
  23. data/lib/site_maps/builder/sitemap_index/item.rb +1 -1
  24. data/lib/site_maps/builder/sitemap_index.rb +29 -5
  25. data/lib/site_maps/builder/url.rb +13 -10
  26. data/lib/site_maps/builder/url_set.rb +17 -7
  27. data/lib/site_maps/builder/xsl_stylesheet.rb +192 -0
  28. data/lib/site_maps/cli.rb +6 -2
  29. data/lib/site_maps/configuration.rb +8 -1
  30. data/lib/site_maps/incremental_location.rb +1 -1
  31. data/lib/site_maps/middleware.rb +197 -0
  32. data/lib/site_maps/notification/event.rb +1 -1
  33. data/lib/site_maps/notification/publisher.rb +1 -0
  34. data/lib/site_maps/notification.rb +1 -0
  35. data/lib/site_maps/ping.rb +35 -0
  36. data/lib/site_maps/{primitives → primitive}/array.rb +1 -1
  37. data/lib/site_maps/{primitives → primitive}/output.rb +1 -1
  38. data/lib/site_maps/primitive/string.rb +106 -0
  39. data/lib/site_maps/robots_txt.rb +21 -0
  40. data/lib/site_maps/runner/event_listener.rb +2 -2
  41. data/lib/site_maps/runner.rb +17 -3
  42. data/lib/site_maps/sitemap_builder.rb +16 -4
  43. data/lib/site_maps/sitemap_reader.rb +3 -0
  44. data/lib/site_maps/version.rb +1 -1
  45. data/lib/site_maps.rb +81 -10
  46. data/site_maps.gemspec +1 -1
  47. metadata +23 -10
  48. data/lib/site_maps/primitives/string.rb +0 -43
data/docs/README.md ADDED
@@ -0,0 +1,67 @@
1
+ # site_maps
2
+
3
+ Concurrent, adapter-based sitemap.xml generation for Ruby applications.
4
+
5
+ `site_maps` is a framework-agnostic sitemap builder with built-in Rails support. It produces valid sitemap XML (with full SEO extensions — image, video, news, hreflang, mobile, PageMap), splits large sitemaps into indexed chunks automatically, generates them concurrently across a thread pool, and ships them to the filesystem, S3, or a custom backend through a pluggable adapter layer.
6
+
7
+ ## Contents
8
+
9
+ - [Getting started](getting-started.md) — install, first sitemap, Rails
10
+ - [Processes](processes.md) — static and dynamic process DSL
11
+ - [Adapters](adapters.md) — filesystem, S3, no-op, custom
12
+ - [CLI](cli.md) — `site_maps generate`
13
+ - [Rack middleware](middleware.md) — serve generated sitemaps from the app
14
+ - [SEO extensions](extensions.md) — image, video, news, hreflang, mobile, PageMap
15
+ - [Events](events.md) — instrumentation hooks
16
+ - [Rails integration](rails.md) — URL helpers, Railtie, precompile
17
+ - [API reference](api.md) — full public API
18
+
19
+ ## Install
20
+
21
+ ```ruby
22
+ # Gemfile
23
+ gem 'site_maps'
24
+ ```
25
+
26
+ ## One-minute tour
27
+
28
+ ```ruby
29
+ # config/sitemap.rb
30
+ SiteMaps.use(:file_system) do
31
+ configure do |config|
32
+ config.url = 'https://example.com/sitemap.xml'
33
+ config.directory = Rails.public_path.to_s
34
+ end
35
+
36
+ process do |s|
37
+ s.add('/', priority: 1.0, changefreq: 'daily')
38
+ s.add('/about', lastmod: Time.now)
39
+
40
+ Post.find_each do |post|
41
+ s.add("/posts/#{post.slug}", lastmod: post.updated_at)
42
+ end
43
+ end
44
+ end
45
+ ```
46
+
47
+ ```bash
48
+ bundle exec site_maps generate --config-file config/sitemap.rb
49
+ ```
50
+
51
+ Generated: `public/sitemap.xml` (plus an indexed chain if the URL set exceeds 50k links).
52
+
53
+ ## Why site_maps
54
+
55
+ - **Concurrency.** Processes run in a `Concurrent::FixedThreadPool`; threads share a thread-safe repo that handles file splitting.
56
+ - **Pluggable storage.** Write the same sitemap to disk in development and S3 in production by swapping one line.
57
+ - **SEO extensions.** Full URL extension support: images, videos, news, hreflang alternates, mobile, PageMap.
58
+ - **Dynamic processes.** Parameterized templates like `posts/%{year}-%{month}/sitemap.xml` let you rebuild a single shard without regenerating the whole site.
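As a rough illustration of the parameterized templates mentioned in the last bullet, the sketch below expands a location template into a concrete shard path with Ruby's built-in `format`. The `shard_location` helper is hypothetical and is not part of the gem's API; the gem's own expansion logic may differ.

```ruby
# Hypothetical helper (not the gem's implementation): fills the named
# placeholders of a location template using Kernel#format.
def shard_location(template, **kwargs)
  format(template, **kwargs)
end

TEMPLATE = 'posts/%{year}-%{month}/sitemap.xml'

# Each distinct set of kwargs maps to its own file, so a single shard
# can be rebuilt without touching the others.
march_2024 = shard_location(TEMPLATE, year: 2024, month: 3)
```

Because every kwargs combination resolves to a different path, regenerating one month only rewrites that month's file.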
59
+
60
+ ## Version
61
+
62
+ - Ruby: `>= 3.2.0`
63
+ - Depends on: `builder ~> 3.0`, `concurrent-ruby >= 1.1`, `rack >= 2.0`, `zeitwerk`, `thor`
64
+
65
+ ## License
66
+
67
+ MIT.
data/docs/adapters.md ADDED
@@ -0,0 +1,143 @@
1
+ # Adapters
2
+
3
+ An **adapter** is the storage backend for generated sitemap files. Three adapters ship with the gem; a clean interface makes it easy to write your own.
4
+
5
+ ## Built-in adapters
6
+
7
+ | Adapter | When to use |
8
+ |---------|-------------|
9
+ | `:file_system` | Write to disk. Ideal for local dev, or for serving via the bundled Rack middleware. |
10
+ | `:aws_sdk` | Upload to S3. Production deployments behind CloudFront or similar. |
11
+ | `:noop` | Discard writes. Ideal for tests that care about "what URLs got added" but not "what ended up on disk". |
12
+
13
+ Select with `SiteMaps.use(<symbol>)`.
14
+
15
+ ## `:file_system`
16
+
17
+ ```ruby
18
+ SiteMaps.use(:file_system) do
19
+ configure do |config|
20
+ config.url = 'https://example.com/sitemap.xml'
21
+ config.directory = Rails.public_path.to_s # default: "/tmp/sitemaps"
22
+ end
23
+ process { |s| ... }
24
+ end
25
+ ```
26
+
27
+ **Config attributes:**
28
+
29
+ | Key | Purpose |
30
+ |-----|---------|
31
+ | `url` | Public URL — drives filename layout and is written into sitemap `<loc>` entries. |
32
+ | `directory` | Filesystem root under which files land. |
33
+
34
+ If `config.url` ends in `.gz`, the adapter writes gzipped files. The middleware transparently decompresses on serve.
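The `.gz` behavior can be pictured with a small round trip through Ruby's standard `zlib` library. This is only a sketch under the assumption that the decision is made from the URL suffix; `encode_for` and `decode_for` are hypothetical names, not methods of the adapter or middleware.

```ruby
require 'zlib'

# Hypothetical helpers, for illustration only.
# Compress the payload when the target URL ends in ".gz".
def encode_for(url, xml)
  url.end_with?('.gz') ? Zlib.gzip(xml) : xml
end

# Reverse step, similar in spirit to what a serving layer would do.
def decode_for(url, raw)
  url.end_with?('.gz') ? Zlib.gunzip(raw) : raw
end

xml = '<urlset></urlset>'
stored = encode_for('https://example.com/sitemap.xml.gz', xml)
served = decode_for('https://example.com/sitemap.xml.gz', stored)
```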
35
+
36
+ ## `:aws_sdk`
37
+
38
+ ```ruby
39
+ SiteMaps.use(:aws_sdk) do
40
+ configure do |config|
41
+ config.url = 'https://my-bucket.s3.amazonaws.com/sitemap.xml'
42
+ config.directory = '/tmp/sitemaps' # local scratch space
43
+ config.bucket = 'my-bucket'
44
+ config.region = ENV.fetch('AWS_REGION', 'us-east-1')
45
+ config.access_key_id = ENV['AWS_ACCESS_KEY_ID']
46
+ config.secret_access_key = ENV['AWS_SECRET_ACCESS_KEY']
47
+ config.acl = 'public-read' # default
48
+ config.cache_control = 'private, max-age=0, no-cache'
49
+ end
50
+ process { |s| ... }
51
+ end
52
+ ```
53
+
54
+ **Config attributes:**
55
+
56
+ | Key | Default |
57
+ |-----|---------|
58
+ | `bucket` | `ENV['AWS_BUCKET']` |
59
+ | `region` | `ENV.fetch('AWS_REGION', 'us-east-1')` |
60
+ | `access_key_id` | `ENV['AWS_ACCESS_KEY_ID']` |
61
+ | `secret_access_key` | `ENV['AWS_SECRET_ACCESS_KEY']` |
62
+ | `acl` | `"public-read"` |
63
+ | `cache_control` | `"private, max-age=0, no-cache"` |
64
+ | `directory` | Local scratch dir for staging before upload |
65
+
66
+ The adapter writes locally first (to `directory`), then uploads to S3 with the configured ACL and Cache-Control headers. You'll need `aws-sdk-s3` in your Gemfile:
67
+
68
+ ```ruby
69
+ gem 'aws-sdk-s3'
70
+ ```
71
+
72
+ ## `:noop`
73
+
74
+ ```ruby
75
+ SiteMaps.use(:noop) do
76
+ configure { |c| c.url = 'https://example.com/sitemap.xml' }
77
+ process { |s| ... }
78
+ end
79
+ ```
80
+
81
+ Writes are discarded. Use it in tests when you want to assert on the URLs being added (via events, for example) without hitting disk.
82
+
83
+ ## Writing a custom adapter
84
+
85
+ Subclass `SiteMaps::Adapters::Adapter` and implement `write`, `read`, `delete`:
86
+
87
+ ```ruby
88
+ class GoogleCloudStorageAdapter < SiteMaps::Adapters::Adapter
89
+ class Config < SiteMaps::Configuration
90
+ attribute :bucket
91
+ attribute :project_id
92
+ end
93
+
94
+ def write(url, raw_data, **_kwargs)
95
+ storage = Google::Cloud::Storage.new(project_id: config.project_id)
96
+ bucket = storage.bucket(config.bucket)
97
+ bucket.create_file(StringIO.new(raw_data), path_from(url))
98
+ end
99
+
100
+ def read(url)
101
+ file = storage.bucket(config.bucket).file(path_from(url))
102
+ [file.download.string, { content_type: 'application/xml' }]
103
+ end
104
+
105
+ def delete(url)
106
+ storage.bucket(config.bucket).file(path_from(url))&.delete
107
+ end
108
+
109
+ private
110
+
111
+ def path_from(url)
112
+ URI(url).path[1..]
113
+ end
114
+
115
+ def storage
116
+ @storage ||= Google::Cloud::Storage.new(project_id: config.project_id)
117
+ end
118
+ end
119
+ ```
120
+
121
+ Register and use it:
122
+
123
+ ```ruby
124
+ SiteMaps.use(GoogleCloudStorageAdapter) do
125
+ configure do |config|
126
+ config.url = 'https://cdn.example.com/sitemap.xml'
127
+ config.bucket = 'my-bucket'
128
+ config.project_id = 'my-project'
129
+ end
130
+ process { |s| ... }
131
+ end
132
+ ```
133
+
134
+ ## Adapter interface
135
+
136
+ | Method | Purpose |
137
+ |--------|---------|
138
+ | `#write(url, raw_data, **kwargs)` | Persist `raw_data` at the location implied by `url`. |
139
+ | `#read(url)` | Return `[raw_data, { content_type: '…' }]` for the given URL. |
140
+ | `#delete(url)` | Remove the file at the URL. |
141
+ | `.config_class` | (optional) Return a `Configuration` subclass to expose adapter-specific settings. |
142
+
143
+ The adapter base class handles everything else: URL filters, the process registry, and thread-safe URL tracking.
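To make the three-method contract concrete, here is a minimal in-memory sketch that follows the same `write` / `read` / `delete` shapes listed in the table. It is a plain Ruby class so that it runs without the gem; a real adapter would subclass `SiteMaps::Adapters::Adapter` instead. The class name `InMemoryStore` and the use of `KeyError` for missing files are assumptions made for this example.

```ruby
require 'uri'

# Hypothetical stand-in that mirrors the adapter contract.
# It does NOT inherit from the gem's base class.
class InMemoryStore
  def initialize
    @files = {}
    @lock = Mutex.new # writes may arrive from several threads
  end

  # Persist raw_data under the path implied by the URL.
  def write(url, raw_data, **_kwargs)
    @lock.synchronize { @files[key_for(url)] = raw_data }
  end

  # Return [raw_data, metadata]; raises KeyError when nothing is stored.
  def read(url)
    data = @lock.synchronize { @files.fetch(key_for(url)) }
    [data, { content_type: 'application/xml' }]
  end

  # Remove the stored entry, if any.
  def delete(url)
    @lock.synchronize { @files.delete(key_for(url)) }
  end

  private

  # "https://example.com/a/b.xml" -> "a/b.xml"
  def key_for(url)
    URI(url).path.delete_prefix('/')
  end
end
```

A store like this can be handy in tests that need to read back what was written, which the `:noop` adapter cannot offer because it discards writes.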
data/docs/api.md ADDED
@@ -0,0 +1,154 @@
1
+ # API Reference
2
+
3
+ ## `SiteMaps` (top-level module)
4
+
5
+ | Method | Description |
6
+ |--------|-------------|
7
+ | `SiteMaps.use(adapter, **opts, &block)` | Register an adapter (`:file_system`, `:aws_sdk`, `:noop`, or a class) and yield its configuration block. |
8
+ | `SiteMaps.define(&block)` | Register a context-aware definition. Called by `.generate` with the `context:` hash splatted as kwargs. |
9
+ | `SiteMaps.configure { |config| ... }` | Mutate global defaults. |
10
+ | `SiteMaps.config` | Return global `Configuration`. |
11
+ | `SiteMaps.generate(config_file:, context: {}, **runner_opts) → Runner` | Load `config_file` and return a `Runner` ready to `.enqueue` and `.run`. |
12
+ | `SiteMaps.current_adapter` | Last-registered adapter (thread-local during `.generate`). |
13
+ | `SiteMaps.logger` | Configurable logger (default `Logger.new($stdout)`). |
14
+
15
+ ### Constants
16
+
17
+ ```ruby
18
+ SiteMaps::MAX_LENGTH # { links: 50_000, images: 1_000, news: 1_000 }
19
+ SiteMaps::MAX_FILESIZE # 50_000_000 bytes
20
+ ```
21
+
22
+ ### Errors
23
+
24
+ - `SiteMaps::Error` — base error
25
+ - `SiteMaps::AdapterNotFound` — unknown adapter symbol
26
+ - `SiteMaps::AdapterNotSetError` — generate called without an adapter
27
+ - `SiteMaps::FileNotFoundError` — missing file at adapter read
28
+ - `SiteMaps::FullSitemapError` — internal signal that a URL set is full (triggers split)
29
+ - `SiteMaps::ConfigurationError` — invalid config
30
+
31
+ ---
32
+
33
+ ## `SiteMaps::Configuration`
34
+
35
+ Base configuration. Adapter configs subclass this.
36
+
37
+ | Attribute | Default | Purpose |
38
+ |-----------|---------|---------|
39
+ | `url` | — (required) | Public URL of the main sitemap index. |
40
+ | `directory` | `"/tmp/sitemaps"` | Local storage directory. |
41
+ | `max_links` | `50_000` | URLs per file before split. |
42
+ | `emit_priority` | `true` | Emit `<priority>`. |
43
+ | `emit_changefreq` | `true` | Emit `<changefreq>`. |
44
+ | `xsl_stylesheet_url` | `nil` | Stylesheet for URL sets. |
45
+ | `xsl_index_stylesheet_url` | `nil` | Stylesheet for the sitemap index. |
46
+ | `ping_search_engines` | `false` | Auto-ping after generation. |
47
+ | `ping_engines` | `{ bing: '...' }` | URL templates per engine; `%{url}` is URL-encoded at ping time. |
48
+
49
+ ---
50
+
51
+ ## `SiteMaps::Adapters::Adapter` (base class)
52
+
53
+ Abstract base. Subclass to build custom adapters.
54
+
55
+ | Method | Description |
56
+ |--------|-------------|
57
+ | `.config_class` | Override to return a `Configuration` subclass with adapter-specific attributes. |
58
+ | `#write(url, raw_data, **kwargs)` | Abstract. Persist `raw_data` at the storage location implied by `url`. |
59
+ | `#read(url) → [raw_data, { content_type: '…' }]` | Abstract. |
60
+ | `#delete(url)` | Abstract. |
61
+ | `#configure { |c| ... }` | Yield the adapter's configuration. |
62
+ | `#process(name = :default, location = nil, **kwargs, &block)` | Register a process. |
63
+ | `#external_sitemap(url, lastmod:)` | Add an external sitemap to the index. |
64
+ | `#extend_processes_with(mod)` | Mix `mod` into all process blocks. |
65
+ | `#url_filter { |url, options| ... }` | Register a URL filter. |
66
+ | `#apply_url_filters(url, options)` | Run all filters; returns modified options or `nil` if excluded. |
67
+ | `#reset!` | Clear index and repo. Called before `Runner#run`. |
68
+
69
+ ---
70
+
71
+ ## `SiteMaps::Runner`
72
+
73
+ Executes enqueued processes concurrently.
74
+
75
+ ```ruby
76
+ Runner.new(adapter = SiteMaps.current_adapter, max_threads: 4, ping: nil)
77
+ ```
78
+
79
+ | Method | Description |
80
+ |--------|-------------|
81
+ | `#enqueue(process_name, **kwargs)` | Queue one process with kwargs. |
82
+ | `#enqueue_remaining` / `#enqueue_all` | Queue every process not yet enqueued. |
83
+ | `#run` | Execute queued processes, finalize index, optionally ping. |
84
+
85
+ ---
86
+
87
+ ## `SiteMaps::SitemapBuilder`
88
+
89
+ Yielded as `s` inside every `process` block.
90
+
91
+ | Method | Description |
92
+ |--------|-------------|
93
+ | `#add(path, **options)` | Add one URL to the current URL set. Automatically splits when full. |
94
+ | `#finalize!` | Finalize the current URL set. Called automatically when the process block returns. |
95
+
96
+ `options` supports every extension documented in [extensions.md](extensions.md): `lastmod`, `priority`, `changefreq`, `images`, `videos`, `news`, `alternates`, `mobile`, `pagemap`.
97
+
98
+ In Rails apps, `s.route` is an object exposing all URL helpers.
99
+
100
+ ---
101
+
102
+ ## `SiteMaps::Middleware`
103
+
104
+ Rack middleware for serving generated sitemaps. See [middleware.md](middleware.md).
105
+
106
+ ```ruby
107
+ use SiteMaps::Middleware,
108
+ adapter: ...,
109
+ public_prefix: nil,
110
+ storage_prefix: nil,
111
+ x_robots_tag: 'noindex, follow',
112
+ cache_control: 'public, max-age=3600'
113
+ ```
114
+
115
+ ---
116
+
117
+ ## `SiteMaps::Notification`
118
+
119
+ | Method | Description |
120
+ |--------|-------------|
121
+ | `.subscribe(event_or_class, &block)` | Subscribe to one event (string) or every event named on a class. |
122
+ | `.unsubscribe(subscriber)` | Remove a subscription. |
123
+ | `.instrument(event, payload) { ... }` | Emit an event, wrapping the block in a timer. |
124
+
125
+ See [events.md](events.md) for the event catalog.
126
+
127
+ ---
128
+
129
+ ## `SiteMaps::RobotsTxt`
130
+
131
+ | Method | Description |
132
+ |--------|-------------|
133
+ | `.sitemap_directive(url) → String` | Return `"Sitemap: <url>"`. |
134
+ | `.render(sitemap_url:, extra_directives: []) → String` | Build a full robots.txt body. |
135
+
136
+ ---
137
+
138
+ ## `SiteMaps::Ping`
139
+
140
+ | Method | Description |
141
+ |--------|-------------|
142
+ | `.ping(url, engines: { bing: '...' }) → Hash` | Fire a GET to each engine's template (substituting `%{url}`). Returns a hash of `{engine => { status:, url: }}`. |
143
+
144
+ ---
145
+
146
+ ## CLI entry point
147
+
148
+ `exec/site_maps` — the executable shipped with the gem.
149
+
150
+ ```bash
151
+ bundle exec site_maps generate [processes] [options]
152
+ ```
153
+
154
+ See [cli.md](cli.md).
data/docs/cli.md ADDED
@@ -0,0 +1,93 @@
1
+ # CLI
2
+
3
+ The gem installs a `site_maps` executable backed by Thor.
4
+
5
+ ```bash
6
+ bundle exec site_maps generate [PROCESS_NAMES...] [options]
7
+ ```
8
+
9
+ If no process names are given, every process in the config file is enqueued.
10
+
11
+ ## Options
12
+
13
+ | Flag | Default | Purpose |
14
+ |------|---------|---------|
15
+ | `--config-file`, `-r` | — | Path to the config file defining processes. **Required.** |
16
+ | `--max-threads`, `-c` | `4` | Thread pool size for concurrent process execution. |
17
+ | `--context` | `{}` | Hash-style kwargs passed to `SiteMaps.define` blocks: `--context=tenant:acme locale:en`. |
18
+ | `--enqueue-remaining` | `false` | In addition to specified processes, enqueue any others. |
19
+ | `--ping` | `false` | Override config to ping search engines after generation. |
20
+ | `--debug` | `false` | Set logger to DEBUG level. |
21
+ | `--logfile` | — | Write logs to a file instead of stdout. |
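The `--context` value is a list of `key:value` pairs. The sketch below shows one way such pairs could be turned into keyword arguments; `parse_context` is a hypothetical helper written for illustration and is not how Thor or the gem parses the flag internally.

```ruby
# Hypothetical parser for "key:value" pairs like those in the --context example.
# Values stay strings in this sketch; any numeric conversion is left to the caller.
def parse_context(pairs)
  pairs.to_h do |pair|
    key, value = pair.split(':', 2) # split on the first colon only
    [key.to_sym, value]
  end
end

context = parse_context(%w[tenant:acme locale:en])
```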
22
+
23
+ ## Examples
24
+
25
+ Generate everything:
26
+
27
+ ```bash
28
+ bundle exec site_maps generate --config-file config/sitemap.rb
29
+ ```
30
+
31
+ Regenerate a single shard of a dynamic process:
32
+
33
+ ```bash
34
+ bundle exec site_maps generate monthly_posts \
35
+ --config-file config/sitemap.rb \
36
+ --context=year:2024 month:3
37
+ ```
38
+
39
+ Generate `posts` and `products`, then enqueue every other process defined in the config:
40
+
41
+ ```bash
42
+ bundle exec site_maps generate posts products \
43
+ --config-file config/sitemap.rb \
44
+ --enqueue-remaining
45
+ ```
46
+
47
+ Tune concurrency:
48
+
49
+ ```bash
50
+ bundle exec site_maps generate --config-file config/sitemap.rb --max-threads 10
51
+ ```
52
+
53
+ Ping Bing and any custom engines (config-driven — see below):
54
+
55
+ ```bash
56
+ bundle exec site_maps generate --config-file config/sitemap.rb --ping
57
+ ```
58
+
59
+ ## Search-engine pinging
60
+
61
+ Pinging is off by default. Enable globally in config or flip it on per run via `--ping`.
62
+
63
+ ```ruby
64
+ SiteMaps.use(:file_system) do
65
+ configure do |config|
66
+ config.url = 'https://example.com/sitemap.xml'
67
+ config.ping_search_engines = true
68
+ config.ping_engines = {
69
+ bing: 'https://www.bing.com/ping?sitemap=%{url}',
70
+ custom: 'https://search.example.com/ping?url=%{url}'
71
+ }
72
+ end
73
+ end
74
+ ```
75
+
76
+ `%{url}` in the template is replaced with a URL-encoded `config.url` at ping time.
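A minimal sketch of that substitution, assuming the template is filled with `Kernel#format` and the sitemap URL is percent-encoded first. The `ping_url` helper is hypothetical; the gem may build the request differently.

```ruby
require 'uri'

# Hypothetical helper: substitute a URL-encoded sitemap URL into the template.
def ping_url(template, sitemap_url)
  format(template, url: URI.encode_www_form_component(sitemap_url))
end

bing = ping_url(
  'https://www.bing.com/ping?sitemap=%{url}',
  'https://example.com/sitemap.xml'
)
```

Encoding matters because the sitemap URL itself contains `:` and `/`, which would otherwise be ambiguous inside a query string.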
77
+
78
+ ## Rails / bundler
79
+
80
+ The CLI auto-requires `config/environment` if it detects a `config/application.rb`, so Rails URL helpers (via the Railtie) are available inside your config file.
81
+
82
+ If you don't want that — say, a Ruby-only script in a Rails repo — pass a config file outside the Rails root or invoke the library directly via `SiteMaps.generate(...)`.
83
+
84
+ ## Logging
85
+
86
+ - `--debug` sets the logger to `Logger::DEBUG`.
87
+ - `--logfile PATH` writes to a file; otherwise stdout.
88
+ - A built-in event listener prints one line per finalized URL set with link counts and runtime.
89
+
90
+ ## Exit codes
91
+
92
+ - `0` — success.
93
+ - Non-zero — any process raised. Errors are captured per-future and re-raised after all futures complete, so you see the real backtrace rather than a generic runner failure.
data/docs/events.md ADDED
@@ -0,0 +1,79 @@
1
+ # Events
2
+
3
+ `site_maps` ships a lightweight pub/sub system under `SiteMaps::Notification`. Use it for logging, metrics, or reacting to particular generation phases.
4
+
5
+ ## Subscribing
6
+
7
+ ### Block subscribers
8
+
9
+ ```ruby
10
+ SiteMaps::Notification.subscribe('sitemaps.finalize_urlset') do |event|
11
+ Rails.logger.info(
12
+ "[sitemap] wrote #{event[:links_count]} urls to #{event[:url]} in #{event[:runtime]}s"
13
+ )
14
+ end
15
+ ```
16
+
17
+ ### Class subscribers
18
+
19
+ A class with one method per event name (dots become underscores):
20
+
21
+ ```ruby
22
+ class SitemapMetrics
23
+ def self.sitemaps_process_execution(event)
24
+ StatsD.timing('sitemaps.process', event[:runtime], tags: ["process:#{event[:process].name}"])
25
+ end
26
+
27
+ def self.sitemaps_finalize_urlset(event)
28
+ StatsD.increment('sitemaps.urlset.written', tags: ["url:#{event[:url]}"])
29
+ end
30
+
31
+ def self.sitemaps_ping(event)
32
+ event[:results].each do |engine, result|
33
+ StatsD.increment('sitemaps.ping', tags: ["engine:#{engine}", "status:#{result[:status]}"])
34
+ end
35
+ end
36
+ end
37
+
38
+ SiteMaps::Notification.subscribe(SitemapMetrics)
39
+ ```
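The dots-to-underscores convention for class subscribers can be sketched as a tiny dispatcher. This is an illustration of the naming rule only; `dispatch` and `DemoListener` are hypothetical and do not reflect how `SiteMaps::Notification` is implemented.

```ruby
# Hypothetical dispatcher: maps an event name to a class method name
# by replacing dots with underscores, and skips events the class ignores.
def dispatch(subscriber, event_name, payload)
  method_name = event_name.tr('.', '_')
  return unless subscriber.respond_to?(method_name)

  subscriber.public_send(method_name, payload)
end

class DemoListener
  def self.sitemaps_finalize_urlset(event)
    "wrote #{event[:links_count]} urls"
  end
end

handled = dispatch(DemoListener, 'sitemaps.finalize_urlset', { links_count: 3 })
ignored = dispatch(DemoListener, 'sitemaps.ping', { results: {} })
```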
40
+
41
+ ### The built-in listener
42
+
43
+ For colored terminal output during CLI runs:
44
+
45
+ ```ruby
46
+ SiteMaps::Notification.subscribe(SiteMaps::Runner::EventListener)
47
+ ```
48
+
49
+ This is subscribed automatically by the CLI.
50
+
51
+ ## Events
52
+
53
+ | Event | Payload keys |
54
+ |-------|-------------|
55
+ | `sitemaps.enqueue_process` | `process`, `kwargs` |
56
+ | `sitemaps.before_process_execution` | `process`, `kwargs` |
57
+ | `sitemaps.process_execution` | `process`, `kwargs`, `runtime` |
58
+ | `sitemaps.finalize_urlset` | `url`, `links_count`, `news_count`, `last_modified`, `runtime`, `process` |
59
+ | `sitemaps.ping` | `results` |
60
+
61
+ `process` is a `SiteMaps::Process` struct (`name`, `location_template`, `kwargs_template`, `block`).
62
+
63
+ ## Event ordering
64
+
65
+ For each process the sequence is:
66
+
67
+ 1. `sitemaps.enqueue_process`
68
+ 2. `sitemaps.before_process_execution`
69
+ 3. One or more `sitemaps.finalize_urlset` (one per split file)
70
+ 4. `sitemaps.process_execution`
71
+
72
+ After all processes complete, one final `sitemaps.finalize_urlset` fires for the sitemap index itself. If pinging is enabled, `sitemaps.ping` fires last.
73
+
74
+ ## Use cases
75
+
76
+ - **Logging.** Tail-friendly output of what just ran, how many URLs, runtime.
77
+ - **Metrics.** StatsD / OpenTelemetry counters for throughput and ping outcomes.
78
+ - **Alerting.** Subscribe to `sitemaps.ping`, alert on non-200 results.
79
+ - **Cache busting.** After `sitemaps.finalize_urlset`, purge the CDN entry for the written URL.
data/docs/extensions.md ADDED
@@ -0,0 +1,141 @@
1
+ # SEO Extensions
2
+
3
+ `s.add` accepts options for every sitemap extension recognized by Google and Bing. Pass any of the following alongside `lastmod`, `priority`, and `changefreq`.
4
+
5
+ ## Image
6
+
7
+ Up to 1,000 images per URL.
8
+
9
+ ```ruby
10
+ s.add('/gallery/summer', images: [
11
+ {
12
+ loc: 'https://cdn.example.com/summer/beach.jpg',
13
+ title: 'Beach sunset',
14
+ caption: 'A photo from the summer trip',
15
+ geo_location: 'Cape Cod, MA',
16
+ license: 'https://creativecommons.org/licenses/by/4.0/'
17
+ }
18
+ ])
19
+ ```
20
+
21
+ ## Video
22
+
23
+ Up to 1,000 video entries per sitemap file.
24
+
25
+ ```ruby
26
+ s.add('/videos/how-to', videos: [
27
+ {
28
+ thumbnail_loc: 'https://cdn.example.com/thumbs/how-to.jpg',
29
+ title: 'How to use site_maps',
30
+ description: 'A quick walkthrough',
31
+ content_loc: 'https://cdn.example.com/videos/how-to.mp4',
32
+ player_loc: 'https://example.com/embed/how-to',
33
+ duration: 600,
34
+ publication_date: Time.now,
35
+ rating: 4.8,
36
+ view_count: 12_345,
37
+ family_friendly: true,
38
+ requires_subscription: false,
39
+ live: false,
40
+ tags: %w[tutorial guide],
41
+ category: 'Technology',
42
+ uploader: 'example-team',
43
+ uploader_info: 'https://example.com/about',
44
+ gallery_loc: 'https://example.com/videos',
45
+ gallery_title: 'Example video gallery',
46
+ price: nil,
47
+ allow_embed: true,
48
+ autoplay: 'ap=1'
49
+ }
50
+ ])
51
+ ```
52
+
53
+ ## News
54
+
55
+ Up to 1,000 news entries per sitemap file (use a dedicated process for news URLs).
56
+
57
+ ```ruby
58
+ s.add('/news/breaking', news: {
59
+ publication_name: 'Example Times',
60
+ publication_language: 'en',
61
+ publication_date: Time.now,
62
+ title: 'Breaking news headline',
63
+ keywords: 'breaking, politics',
64
+ genres: 'PressRelease',
65
+ access: 'Subscription',
66
+ stock_tickers: 'NASDAQ:EXMP'
67
+ })
68
+ ```
69
+
70
+ ## Alternate language / hreflang
71
+
72
+ ```ruby
73
+ s.add('/', alternates: [
74
+ { href: 'https://example.com/en', lang: 'en' },
75
+ { href: 'https://example.com/es', lang: 'es' },
76
+ { href: 'https://example.com/fr', lang: 'fr', nofollow: true }
77
+ ])
78
+ ```
79
+
80
+ The `nofollow: true` variant emits `rel="nofollow alternate"` on the link. Use it to declare locale variants without signalling Google to crawl them as equivalents.
81
+
82
+ ## Mobile
83
+
84
+ Declare a URL as mobile-friendly:
85
+
86
+ ```ruby
87
+ s.add('/mobile-page', mobile: true)
88
+ ```
89
+
90
+ ## PageMap
91
+
92
+ Structured data for Google Custom Search.
93
+
94
+ ```ruby
95
+ s.add('/products/widget', pagemap: {
96
+ dataobjects: [
97
+ {
98
+ type: 'product',
99
+ id: 'sku-123',
100
+ attributes: [
101
+ { name: 'name', value: 'Widget' },
102
+ { name: 'price', value: '19.99' },
103
+ { name: 'color', value: 'blue' }
104
+ ]
105
+ }
106
+ ]
107
+ })
108
+ ```
109
+
110
+ ## Combined example
111
+
112
+ Everything can coexist on a single URL:
113
+
114
+ ```ruby
115
+ s.add('/products/widget',
116
+ lastmod: Time.now,
117
+ priority: 0.9,
118
+ changefreq: 'weekly',
119
+ images: [{ loc: 'https://cdn.example.com/widget.jpg', title: 'Widget' }],
120
+ alternates: [{ href: 'https://example.com/es/products/widget', lang: 'es' }],
121
+ mobile: true,
122
+ pagemap: { dataobjects: [{ type: 'product', id: 'sku-123', attributes: [] }] }
123
+ )
124
+ ```
125
+
126
+ ## Disabling `priority` / `changefreq`
127
+
128
+ Both fields are optional per the sitemap spec, and many search engines ignore them. Disable globally if you want smaller files:
129
+
130
+ ```ruby
131
+ configure do |config|
132
+ config.emit_priority = false
133
+ config.emit_changefreq = false
134
+ end
135
+ ```
136
+
137
+ ## Output size
138
+
139
+ - Per URL set: 50,000 links **or** 1,000 news items **or** 50 MB uncompressed — whichever comes first. When one of these is hit, the current file is finalized and a new one starts.
140
+ - File naming is automatic (`posts/sitemap.xml` → `posts/sitemap1.xml`, `posts/sitemap2.xml`, …).
141
+ - Use the `.gz` extension in `config.url` to emit gzipped files — most search engines fetch either form.
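The splitting rule can be approximated with a short sketch. It models only the link-count limit (not the news-item or byte-size limits), and the `split_into_files` helper and its numbering scheme are assumptions based on the file names shown above rather than the gem's actual code.

```ruby
MAX_LINKS = 50_000

# Hypothetical helper: chunk a URL list into numbered sitemap files.
# Only the 50,000-link limit is modelled here.
def split_into_files(urls, base: 'posts/sitemap', max_links: MAX_LINKS)
  urls.each_slice(max_links).with_index(1).map do |chunk, index|
    ["#{base}#{index}.xml", chunk]
  end
end

urls = (1..120_001).map { |i| "/posts/#{i}" }
files = split_into_files(urls)
```

With 120,001 URLs the sketch yields three files: two full ones and a third holding the remainder.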