site_maps 0.1.0 → 0.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: d749ec5643546f3f0570f39f327a18574c0183185bdcbf783aab6033a1d6165e
4
- data.tar.gz: d7445d56a854a6d483e1017b9fdbbfc6f1d57caf58f3780ed10b14cc49e36b9b
3
+ metadata.gz: 9246737e11d28f750005fedc117c0c2afb8c5f5ef606218fc505e1042c4da002
4
+ data.tar.gz: 80f005cca72244cfe1044c30cf91d1656b22bb7ff3047a2fd159eee2ddd2f3a5
5
5
  SHA512:
6
- metadata.gz: 225276c96b94cdc9b02fba44d089b26ed794751771aff2ae4c14cc5e7bb153a2ddb4b1d4058d9e716a212851fded48f288fa178a4b4058d7525d24afaf0c120d
7
- data.tar.gz: 1a9a604cab8bbd6cfaad8a45619edf3ee71257f0ab91112c3a62a4edfd1cd657403f993bd36673dcf964c1544a357cfc37712fb94b09fa955c8374fdc3440e2c
6
+ metadata.gz: 6ebb8a69803018a65c995e61cc714385ae965b04fc8df330cb7a5e99fd37539568cc1661e86da70f1e7e90eef108ba9a9c5ba1bc6e4b50d5c17f41856e4df934
7
+ data.tar.gz: 9dbf41444903708643f4639ae773470fc2043eeb0deb9061dedccef5ceda30210435db6faec1d1f949baa8200841d7245c40ac97df88f8993b57e5c1caeaa41b
data/.tool-versions CHANGED
@@ -1 +1 @@
1
- ruby 3.0.7
1
+ ruby 3.4.9
data/CHANGELOG.md CHANGED
@@ -4,5 +4,10 @@ All notable changes to this project will be documented in this file.
4
4
 
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
6
6
 
7
+ ## 0.1.1 - 2026-05-12
8
+
9
+ ### Fixed
10
+ - AwsSdk adapter: switched from the deprecated `Aws::S3::Object#upload_file` to `Aws::S3::TransferManager#upload_file`, which silences the deprecation warning and keeps uploads working past the next aws-sdk-s3 major release.
11
+
7
12
  ## 0.0.1.beta1 - 2024-11-07
8
13
  The first release of the gem
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- site_maps (0.1.0)
4
+ site_maps (0.1.1)
5
5
  builder (~> 3.0)
6
6
  concurrent-ruby (>= 1.1)
7
7
  rack (>= 2.0)
data/README.md CHANGED
@@ -4,8 +4,13 @@ A concurrent, incremental sitemap generator for Ruby. Framework-agnostic with bu
4
4
 
5
5
  Generates SEO-optimized XML sitemaps with support for sitemap indexes, XSL stylesheets, gzip compression, image/video/news extensions, search engine pinging, and Rack middleware for serving sitemaps with proper HTTP headers.
6
6
 
7
+ ## Documentation
8
+
9
+ Full guides, adapter reference, CLI docs, and recipes are published at **[gems.marcosz.com.br/site_maps](https://gems.marcosz.com.br/site_maps/)** — part of the [marcosgz Ruby gem catalogue](https://gems.marcosz.com.br).
10
+
7
11
  ## Table of Contents
8
12
 
13
+ - [Documentation](#documentation)
9
14
  - [Installation](#installation)
10
15
  - [Quick Start](#quick-start)
11
16
  - [Configuration](#configuration)
data/docs/README.md ADDED
@@ -0,0 +1,67 @@
1
+ # site_maps
2
+
3
+ Concurrent, adapter-based sitemap.xml generation for Ruby applications.
4
+
5
+ `site_maps` is a framework-agnostic sitemap builder with built-in Rails support. It produces valid sitemap XML (with full SEO extensions — image, video, news, hreflang, mobile, PageMap), splits large sitemaps into indexed chunks automatically, generates them concurrently across a thread pool, and ships them to the filesystem, S3, or a custom backend through a pluggable adapter layer.
6
+
7
+ ## Contents
8
+
9
+ - [Getting started](getting-started.md) — install, first sitemap, Rails
10
+ - [Processes](processes.md) — static and dynamic process DSL
11
+ - [Adapters](adapters.md) — filesystem, S3, no-op, custom
12
+ - [CLI](cli.md) — `site_maps generate`
13
+ - [Rack middleware](middleware.md) — serve generated sitemaps from the app
14
+ - [SEO extensions](extensions.md) — image, video, news, hreflang, mobile, PageMap
15
+ - [Events](events.md) — instrumentation hooks
16
+ - [Rails integration](rails.md) — URL helpers, Railtie, precompile
17
+ - [API reference](api.md) — full public API
18
+
19
+ ## Install
20
+
21
+ ```ruby
22
+ # Gemfile
23
+ gem 'site_maps'
24
+ ```
25
+
26
+ ## One-minute tour
27
+
28
+ ```ruby
29
+ # config/sitemap.rb
30
+ SiteMaps.use(:file_system) do
31
+ configure do |config|
32
+ config.url = 'https://example.com/sitemap.xml'
33
+ config.directory = Rails.public_path.to_s
34
+ end
35
+
36
+ process do |s|
37
+ s.add('/', priority: 1.0, changefreq: 'daily')
38
+ s.add('/about', lastmod: Time.now)
39
+
40
+ Post.find_each do |post|
41
+ s.add("/posts/#{post.slug}", lastmod: post.updated_at)
42
+ end
43
+ end
44
+ end
45
+ ```
46
+
47
+ ```bash
48
+ bundle exec site_maps generate --config-file config/sitemap.rb
49
+ ```
50
+
51
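+ Generated: `public/sitemap.xml`. If the URL set exceeds 50,000 links, it is split into numbered files (`sitemap1.xml`, `sitemap2.xml`, …) and `public/sitemap.xml` becomes the sitemap index.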
52
+
53
+ ## Why site_maps
54
+
55
+ - **Concurrency.** Processes run in a `Concurrent::FixedThreadPool`; threads share a thread-safe repo that handles file splitting.
56
+ - **Pluggable storage.** Write the same sitemap to disk in development and S3 in production by swapping one line.
57
+ - **SEO extensions.** Full support for URL extensions: images, videos, news, hreflang alternates, mobile, PageMap.
58
+ - **Dynamic processes.** Parameterized templates like `posts/%{year}-%{month}/sitemap.xml` let you rebuild a single shard without regenerating the whole site.
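
As a rough illustration of the parameterized templates mentioned above, Ruby's built-in `format` can expand a `%{...}` template from a hash of values. This is only a sketch: `expand_location` is a hypothetical helper, not part of the gem's API, and the gem may resolve its templates differently.

```ruby
# Hypothetical helper (not part of site_maps): expands a location template
# such as "posts/%{year}-%{month}/sitemap.xml" using named values.
def expand_location(template, params)
  # Kernel#format substitutes %{name} with params[:name].to_s and raises
  # KeyError when a referenced name is missing.
  format(template, params)
end

puts expand_location('posts/%{year}-%{month}/sitemap.xml', year: 2024, month: 3)
# prints "posts/2024-3/sitemap.xml"
```

Each distinct set of values maps to its own file, which is what makes rebuilding a single shard possible.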
59
+
60
+ ## Version
61
+
62
+ - Ruby: `>= 3.2.0`
63
+ - Depends on: `builder ~> 3.0`, `concurrent-ruby >= 1.1`, `rack >= 2.0`, `zeitwerk`, `thor`
64
+
65
+ ## License
66
+
67
+ MIT.
data/docs/adapters.md ADDED
@@ -0,0 +1,143 @@
1
+ # Adapters
2
+
3
+ An **adapter** is the storage backend for generated sitemap files. Three adapters ship with the gem; a clean interface makes it easy to write your own.
4
+
5
+ ## Built-in adapters
6
+
7
+ | Adapter | When to use |
8
+ |---------|-------------|
9
+ | `:file_system` | Write to disk. Ideal for local dev, or for serving via the bundled Rack middleware. |
10
+ | `:aws_sdk` | Upload to S3. Production deployments behind CloudFront or similar. |
11
+ | `:noop` | Discard writes. Ideal for tests that care about "what URLs got added" but not "what ended up on disk". |
12
+
13
+ Select with `SiteMaps.use(<symbol>)`.
14
+
15
+ ## `:file_system`
16
+
17
+ ```ruby
18
+ SiteMaps.use(:file_system) do
19
+ configure do |config|
20
+ config.url = 'https://example.com/sitemap.xml'
21
+ config.directory = Rails.public_path.to_s # default: "public/sitemaps"
22
+ end
23
+ process { |s| ... }
24
+ end
25
+ ```
26
+
27
+ **Config attributes:**
28
+
29
+ | Key | Purpose |
30
+ |-----|---------|
31
+ | `url` | Public URL — drives filename layout and is written into sitemap `<loc>` entries. |
32
+ | `directory` | Filesystem root under which files land. |
33
+
34
+ If `config.url` ends in `.gz`, the adapter writes gzipped files. The middleware transparently decompresses on serve.
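
The `.gz` convention described above could be sketched with Ruby's standard `zlib` library. The helper names `encode_for` and `decode_for` are assumptions made for this illustration; they are not the gem's internals.

```ruby
require 'zlib'

# Hypothetical helper: compress the payload only when the URL ends in ".gz".
def encode_for(url, xml)
  url.end_with?('.gz') ? Zlib.gzip(xml) : xml
end

# Hypothetical helper: reverse the compression, as a server might do before
# handing plain XML to a browser.
def decode_for(url, bytes)
  return bytes unless url.end_with?('.gz')

  Zlib.gunzip(bytes).force_encoding(Encoding::UTF_8)
end

stored = encode_for('https://example.com/sitemap.xml.gz', '<urlset/>')
puts decode_for('https://example.com/sitemap.xml.gz', stored)
```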
35
+
36
+ ## `:aws_sdk`
37
+
38
+ ```ruby
39
+ SiteMaps.use(:aws_sdk) do
40
+ configure do |config|
41
+ config.url = 'https://my-bucket.s3.amazonaws.com/sitemap.xml'
42
+ config.directory = '/tmp/sitemaps' # local scratch space
43
+ config.bucket = 'my-bucket'
44
+ config.region = ENV.fetch('AWS_REGION', 'us-east-1')
45
+ config.access_key_id = ENV['AWS_ACCESS_KEY_ID']
46
+ config.secret_access_key = ENV['AWS_SECRET_ACCESS_KEY']
47
+ config.acl = 'public-read' # default
48
+ config.cache_control = 'private, max-age=0, no-cache'
49
+ end
50
+ process { |s| ... }
51
+ end
52
+ ```
53
+
54
+ **Config attributes:**
55
+
56
+ | Key | Default |
57
+ |-----|---------|
58
+ | `bucket` | `ENV['AWS_BUCKET']` |
59
+ | `region` | `ENV.fetch('AWS_REGION', 'us-east-1')` |
60
+ | `access_key_id` | `ENV['AWS_ACCESS_KEY_ID']` |
61
+ | `secret_access_key` | `ENV['AWS_SECRET_ACCESS_KEY']` |
62
+ | `acl` | `"public-read"` |
63
+ | `cache_control` | `"private, max-age=0, no-cache"` |
64
+ | `directory` | Local scratch dir for staging before upload |
65
+
66
+ The adapter writes locally first (to `directory`), then uploads to S3 with the configured ACL and Cache-Control headers. You'll need `aws-sdk-s3` in your Gemfile:
67
+
68
+ ```ruby
69
+ gem 'aws-sdk-s3'
70
+ ```
71
+
72
+ ## `:noop`
73
+
74
+ ```ruby
75
+ SiteMaps.use(:noop) do
76
+ configure { |c| c.url = 'https://example.com/sitemap.xml' }
77
+ process { |s| ... }
78
+ end
79
+ ```
80
+
81
+ Writes are discarded. Use it in tests when you want to assert on the URLs being added (via events, for example) without hitting disk.
82
+
83
+ ## Writing a custom adapter
84
+
85
+ Subclass `SiteMaps::Adapters::Adapter` and implement `write`, `read`, and `delete`:
86
+
87
+ ```ruby
88
+ class GoogleCloudStorageAdapter < SiteMaps::Adapters::Adapter
89
+   class Config < SiteMaps::Configuration
90
+     attribute :bucket
91
+     attribute :project_id
92
+   end
93
+
94
+   def write(url, raw_data, **_kwargs)
95
+     # Reuse the memoized client from the private #storage helper below.
96
+     bucket = storage.bucket(config.bucket)
97
+     bucket.create_file(StringIO.new(raw_data), path_from(url))
98
+   end
99
+
100
+   def read(url)
101
+     file = storage.bucket(config.bucket).file(path_from(url))
102
+     [file.download.string, { content_type: 'application/xml' }]
103
+   end
104
+
105
+   def delete(url)
106
+     storage.bucket(config.bucket).file(path_from(url))&.delete
107
+   end
108
+
109
+   private
110
+
111
+   def path_from(url)
112
+     URI(url).path[1..]
113
+   end
114
+
115
+   def storage
116
+     @storage ||= Google::Cloud::Storage.new(project_id: config.project_id)
117
+   end
118
+ end
119
+ ```
120
+
121
+ Register and use it:
122
+
123
+ ```ruby
124
+ SiteMaps.use(GoogleCloudStorageAdapter) do
125
+ configure do |config|
126
+ config.url = 'https://cdn.example.com/sitemap.xml'
127
+ config.bucket = 'my-bucket'
128
+ config.project_id = 'my-project'
129
+ end
130
+ process { |s| ... }
131
+ end
132
+ ```
133
+
134
+ ## Adapter interface
135
+
136
+ | Method | Purpose |
137
+ |--------|---------|
138
+ | `#write(url, raw_data, **kwargs)` | Persist `raw_data` at the location implied by `url`. |
139
+ | `#read(url)` | Return `[raw_data, { content_type: '…' }]` for the given URL. |
140
+ | `#delete(url)` | Remove the file at the URL. |
141
+ | `.config_class` | (optional) Return a `Configuration` subclass to expose adapter-specific settings. |
142
+
143
+ The adapter base class handles everything else: URL filters, the process registry, and thread-safe URL tracking.
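
To make the three-method contract in the table concrete, here is a standalone in-memory sketch. It deliberately does not inherit from `SiteMaps::Adapters::Adapter` so that it runs without the gem; `InMemoryAdapter` is a hypothetical name, and a real adapter would subclass the base class as shown earlier.

```ruby
# Standalone sketch of the write/read/delete contract. It is NOT a real
# site_maps adapter: it skips the base class so it can run on its own.
class InMemoryAdapter
  def initialize
    @files = {}
    @lock = Mutex.new # processes may write from several threads
  end

  # Persist raw_data under the given URL.
  def write(url, raw_data, **_kwargs)
    @lock.synchronize { @files[url] = raw_data.dup }
  end

  # Return [raw_data, metadata]; raises KeyError when nothing is stored.
  # (The API reference lists SiteMaps::FileNotFoundError for missing files.)
  def read(url)
    data = @lock.synchronize { @files.fetch(url) }
    [data, { content_type: 'application/xml' }]
  end

  # Remove the stored entry, returning it (or nil if absent).
  def delete(url)
    @lock.synchronize { @files.delete(url) }
  end
end
```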
data/docs/api.md ADDED
@@ -0,0 +1,154 @@
1
+ # API Reference
2
+
3
+ ## `SiteMaps` (top-level module)
4
+
5
+ | Method | Description |
6
+ |--------|-------------|
7
+ | `SiteMaps.use(adapter, **opts, &block)` | Register an adapter (`:file_system`, `:aws_sdk`, `:noop`, or a class) and yield its configuration block. |
8
+ | `SiteMaps.define(&block)` | Register a context-aware definition. Called by `.generate` with the `context:` hash splatted as kwargs. |
9
+ | `SiteMaps.configure { |config| ... }` | Mutate global defaults. |
10
+ | `SiteMaps.config` | Return global `Configuration`. |
11
+ | `SiteMaps.generate(config_file:, context: {}, **runner_opts) → Runner` | Load `config_file` and return a `Runner` ready to `.enqueue` and `.run`. |
12
+ | `SiteMaps.current_adapter` | Last-registered adapter (thread-local during `.generate`). |
13
+ | `SiteMaps.logger` | Configurable logger (default `Logger.new($stdout)`). |
14
+
15
+ ### Constants
16
+
17
+ ```ruby
18
+ SiteMaps::MAX_LENGTH # { links: 50_000, images: 1_000, news: 1_000 }
19
+ SiteMaps::MAX_FILESIZE # 50_000_000 bytes
20
+ ```
21
+
22
+ ### Errors
23
+
24
+ - `SiteMaps::Error` — base error
25
+ - `SiteMaps::AdapterNotFound` — unknown adapter symbol
26
+ - `SiteMaps::AdapterNotSetError` — generate called without an adapter
27
+ - `SiteMaps::FileNotFoundError` — missing file at adapter read
28
+ - `SiteMaps::FullSitemapError` — internal signal that a URL set is full (triggers split)
29
+ - `SiteMaps::ConfigurationError` — invalid config
30
+
31
+ ---
32
+
33
+ ## `SiteMaps::Configuration`
34
+
35
+ Base configuration. Adapter configs subclass this.
36
+
37
+ | Attribute | Default | Purpose |
38
+ |-----------|---------|---------|
39
+ | `url` | — (required) | Public URL of the main sitemap index. |
40
+ | `directory` | `"/tmp/sitemaps"` | Local storage directory. |
41
+ | `max_links` | `50_000` | URLs per file before split. |
42
+ | `emit_priority` | `true` | Emit `<priority>`. |
43
+ | `emit_changefreq` | `true` | Emit `<changefreq>`. |
44
+ | `xsl_stylesheet_url` | `nil` | Stylesheet for URL sets. |
45
+ | `xsl_index_stylesheet_url` | `nil` | Stylesheet for the sitemap index. |
46
+ | `ping_search_engines` | `false` | Auto-ping after generation. |
47
+ | `ping_engines` | `{ bing: '...' }` | URL templates per engine; `%{url}` is URL-encoded at ping time. |
48
+
49
+ ---
50
+
51
+ ## `SiteMaps::Adapters::Adapter` (base class)
52
+
53
+ Abstract base. Subclass to build custom adapters.
54
+
55
+ | Method | Description |
56
+ |--------|-------------|
57
+ | `.config_class` | Override to return a `Configuration` subclass with adapter-specific attributes. |
58
+ | `#write(url, raw_data, **kwargs)` | Abstract. Persist `raw_data` at the storage location implied by `url`. |
59
+ | `#read(url) → [raw_data, { content_type: '…' }]` | Abstract. |
60
+ | `#delete(url)` | Abstract. |
61
+ | `#configure { |c| ... }` | Yield the adapter's configuration. |
62
+ | `#process(name = :default, location = nil, **kwargs, &block)` | Register a process. |
63
+ | `#external_sitemap(url, lastmod:)` | Add an external sitemap to the index. |
64
+ | `#extend_processes_with(mod)` | Mix `mod` into all process blocks. |
65
+ | `#url_filter { |url, options| ... }` | Register a URL filter. |
66
+ | `#apply_url_filters(url, options)` | Run all filters; returns modified options or `nil` if excluded. |
67
+ | `#reset!` | Clear index and repo. Called before `Runner#run`. |
68
+
69
+ ---
70
+
71
+ ## `SiteMaps::Runner`
72
+
73
+ Executes enqueued processes concurrently.
74
+
75
+ ```ruby
76
+ Runner.new(adapter = SiteMaps.current_adapter, max_threads: 4, ping: nil)
77
+ ```
78
+
79
+ | Method | Description |
80
+ |--------|-------------|
81
+ | `#enqueue(process_name, **kwargs)` | Queue one process with kwargs. |
82
+ | `#enqueue_remaining` / `#enqueue_all` | Queue every process not yet enqueued. |
83
+ | `#run` | Execute queued processes, finalize index, optionally ping. |
84
+
85
+ ---
86
+
87
+ ## `SiteMaps::SitemapBuilder`
88
+
89
+ Yielded as `s` inside every `process` block.
90
+
91
+ | Method | Description |
92
+ |--------|-------------|
93
+ | `#add(path, **options)` | Add one URL to the current URL set. Automatically splits when full. |
94
+ | `#finalize!` | Finalize the current URL set. Called automatically when the process block returns. |
95
+
96
+ `options` supports every extension documented in [extensions.md](extensions.md): `lastmod`, `priority`, `changefreq`, `images`, `videos`, `news`, `alternates`, `mobile`, `pagemap`.
97
+
98
+ In Rails apps, `s.route` is an object exposing all URL helpers.
99
+
100
+ ---
101
+
102
+ ## `SiteMaps::Middleware`
103
+
104
+ Rack middleware for serving generated sitemaps. See [middleware.md](middleware.md).
105
+
106
+ ```ruby
107
+ use SiteMaps::Middleware,
108
+ adapter: ...,
109
+ public_prefix: nil,
110
+ storage_prefix: nil,
111
+ x_robots_tag: 'noindex, follow',
112
+ cache_control: 'public, max-age=3600'
113
+ ```
114
+
115
+ ---
116
+
117
+ ## `SiteMaps::Notification`
118
+
119
+ | Method | Description |
120
+ |--------|-------------|
121
+ | `.subscribe(event_or_class, &block)` | Subscribe to one event (string) or every event named on a class. |
122
+ | `.unsubscribe(subscriber)` | Remove a subscription. |
123
+ | `.instrument(event, payload) { ... }` | Emit an event, wrapping the block in a timer. |
124
+
125
+ See [events.md](events.md) for the event catalog.
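
The `subscribe` / `instrument` pairing in the table can be pictured with a tiny stand-in. `TinyNotifier` is a hypothetical class written for this sketch; it only mirrors the idea of timing a block and passing the runtime to subscribers, and makes no claim about how `SiteMaps::Notification` is implemented.

```ruby
# Hypothetical stand-in for a notification hub (not the gem's code).
class TinyNotifier
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a block for one event name and return it as the handle.
  def subscribe(event, &block)
    @subscribers[event] << block
    block
  end

  # Run the block, measure it with a monotonic clock, then notify
  # subscribers with the payload plus a :runtime key (in seconds).
  def instrument(event, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield if block_given?
    runtime = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @subscribers[event].each { |subscriber| subscriber.call(payload.merge(runtime: runtime)) }
    result
  end
end
```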
126
+
127
+ ---
128
+
129
+ ## `SiteMaps::RobotsTxt`
130
+
131
+ | Method | Description |
132
+ |--------|-------------|
133
+ | `.sitemap_directive(url) → String` | Return `"Sitemap: <url>"`. |
134
+ | `.render(sitemap_url:, extra_directives: []) → String` | Build a full robots.txt body. |
135
+
136
+ ---
137
+
138
+ ## `SiteMaps::Ping`
139
+
140
+ | Method | Description |
141
+ |--------|-------------|
142
+ | `.ping(url, engines: { bing: '...' }) → Hash` | Fire a GET to each engine's template (substituting `%{url}`). Returns a hash of `{engine => { status:, url: }}`. |
143
+
144
+ ---
145
+
146
+ ## CLI entry point
147
+
148
+ `exec/site_maps` — the executable shipped with the gem.
149
+
150
+ ```bash
151
+ bundle exec site_maps generate [processes] [options]
152
+ ```
153
+
154
+ See [cli.md](cli.md).
data/docs/cli.md ADDED
@@ -0,0 +1,93 @@
1
+ # CLI
2
+
3
+ The gem installs a `site_maps` executable backed by Thor.
4
+
5
+ ```bash
6
+ bundle exec site_maps generate [PROCESS_NAMES...] [options]
7
+ ```
8
+
9
+ If no process names are given, every process in the config file is enqueued.
10
+
11
+ ## Options
12
+
13
+ | Flag | Default | Purpose |
14
+ |------|---------|---------|
15
+ | `--config-file`, `-r` | — | Path to the config file defining processes. **Required.** |
16
+ | `--max-threads`, `-c` | `4` | Thread pool size for concurrent process execution. |
17
+ | `--context` | `{}` | Hash-style kwargs passed to `SiteMaps.define` blocks: `--context=tenant:acme locale:en`. |
18
+ | `--enqueue-remaining` | `false` | In addition to specified processes, enqueue any others. |
19
+ | `--ping` | `false` | Override config to ping search engines after generation. |
20
+ | `--debug` | `false` | Set logger to DEBUG level. |
21
+ | `--logfile` | — | Write logs to a file instead of stdout. |
22
+
23
+ ## Examples
24
+
25
+ Generate everything:
26
+
27
+ ```bash
28
+ bundle exec site_maps generate --config-file config/sitemap.rb
29
+ ```
30
+
31
+ Regenerate a single shard of a dynamic process:
32
+
33
+ ```bash
34
+ bundle exec site_maps generate monthly_posts \
35
+ --config-file config/sitemap.rb \
36
+ --context=year:2024 month:3
37
+ ```
38
+
39
+ Generate `posts` and `products`, then enqueue every other process defined in the config file:
40
+
41
+ ```bash
42
+ bundle exec site_maps generate posts products \
43
+ --config-file config/sitemap.rb \
44
+ --enqueue-remaining
45
+ ```
46
+
47
+ Tune concurrency:
48
+
49
+ ```bash
50
+ bundle exec site_maps generate --config-file config/sitemap.rb --max-threads 10
51
+ ```
52
+
53
+ Ping Bing and any custom engines (config-driven — see below):
54
+
55
+ ```bash
56
+ bundle exec site_maps generate --config-file config/sitemap.rb --ping
57
+ ```
58
+
59
+ ## Search-engine pinging
60
+
61
+ Pinging is off by default. Enable globally in config or flip it on per run via `--ping`.
62
+
63
+ ```ruby
64
+ SiteMaps.use(:file_system) do
65
+ configure do |config|
66
+ config.url = 'https://example.com/sitemap.xml'
67
+ config.ping_search_engines = true
68
+ config.ping_engines = {
69
+ bing: 'https://www.bing.com/ping?sitemap=%{url}',
70
+ custom: 'https://search.example.com/ping?url=%{url}'
71
+ }
72
+ end
73
+ end
74
+ ```
75
+
76
+ `%{url}` in the template is replaced with a URL-encoded `config.url` at ping time.
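
A minimal sketch of that substitution, assuming the encoding behaves like `URI.encode_www_form_component` from Ruby's standard library (the gem may use a different encoder). `ping_url` is a hypothetical helper name.

```ruby
require 'uri'

# Hypothetical helper: fill the %{url} placeholder with an encoded sitemap URL.
def ping_url(template, sitemap_url)
  format(template, url: URI.encode_www_form_component(sitemap_url))
end

puts ping_url('https://www.bing.com/ping?sitemap=%{url}', 'https://example.com/sitemap.xml')
# prints "https://www.bing.com/ping?sitemap=https%3A%2F%2Fexample.com%2Fsitemap.xml"
```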
77
+
78
+ ## Rails / bundler
79
+
80
+ The CLI auto-requires `config/environment` if it detects a `config/application.rb`, so Rails URL helpers (via the Railtie) are available inside your config file.
81
+
82
+ If you don't want that — say, a Ruby-only script in a Rails repo — pass a config file outside the Rails root or invoke the library directly via `SiteMaps.generate(...)`.
83
+
84
+ ## Logging
85
+
86
+ - `--debug` sets the logger to `Logger::DEBUG`.
87
+ - `--logfile PATH` writes to a file; otherwise stdout.
88
+ - A built-in event listener prints one line per finalized URL set with link counts and runtime.
89
+
90
+ ## Exit codes
91
+
92
+ - `0` — success.
93
+ - Non-zero — any process raised. Errors are captured per-future and re-raised after all futures complete, so you see the real backtrace rather than a generic runner failure.
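
The "capture per task, re-raise after everything finishes" behavior can be sketched with plain threads. This is an assumption-laden illustration: the gem is described as using futures from concurrent-ruby, whereas `run_all` below is a hypothetical helper built only on Ruby's core `Thread` and `Queue`.

```ruby
# Hypothetical helper: run every task, let all of them finish, then re-raise
# one captured error (whichever was recorded first) so its original
# backtrace is preserved.
def run_all(tasks)
  errors = Queue.new
  threads = tasks.map do |task|
    Thread.new do
      task.call
    rescue StandardError => e
      errors << e # capture instead of aborting the other tasks
    end
  end
  threads.each(&:join)
  raise errors.pop unless errors.empty?
end
```

A CLI wrapper would then let the uncaught exception terminate the process with a non-zero status.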
data/docs/events.md ADDED
@@ -0,0 +1,79 @@
1
+ # Events
2
+
3
+ `site_maps` ships a lightweight pub/sub system under `SiteMaps::Notification`. Use it for logging, metrics, or reacting to particular generation phases.
4
+
5
+ ## Subscribing
6
+
7
+ ### Block subscribers
8
+
9
+ ```ruby
10
+ SiteMaps::Notification.subscribe('sitemaps.finalize_urlset') do |event|
11
+ Rails.logger.info(
12
+ "[sitemap] wrote #{event[:links_count]} urls to #{event[:url]} in #{event[:runtime]}s"
13
+ )
14
+ end
15
+ ```
16
+
17
+ ### Class subscribers
18
+
19
+ A class with one method per event name (dots become underscores):
20
+
21
+ ```ruby
22
+ class SitemapMetrics
23
+   def self.sitemaps_process_execution(event)
24
+     StatsD.timing('sitemaps.process', event[:runtime], tags: ["process:#{event[:process].name}"])
25
+   end
26
+
27
+   def self.sitemaps_finalize_urlset(event)
28
+     StatsD.increment('sitemaps.urlset.written', tags: ["url:#{event[:url]}"])
29
+   end
30
+
31
+   def self.sitemaps_ping(event)
32
+     event[:results].each do |engine, result|
33
+       StatsD.increment('sitemaps.ping', tags: ["engine:#{engine}", "status:#{result[:status]}"])
34
+     end
35
+   end
36
+ end
37
+
38
+ SiteMaps::Notification.subscribe(SitemapMetrics)
39
+ ```
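
The "dots become underscores" rule for class subscribers can be sketched as a simple name translation followed by a `respond_to?` check. `dispatch` and `DemoListener` are hypothetical names for this illustration; the gem's own dispatch logic may differ.

```ruby
# Hypothetical dispatcher: maps "sitemaps.ping" to the method :sitemaps_ping
# and calls it only if the subscriber defines it.
def dispatch(subscriber, event_name, payload)
  method_name = event_name.tr('.', '_')
  return false unless subscriber.respond_to?(method_name)

  subscriber.public_send(method_name, payload)
  true
end

class DemoListener
  def self.sitemaps_ping(event)
    puts "ping results: #{event[:results].keys.join(', ')}"
  end
end

dispatch(DemoListener, 'sitemaps.ping', { results: { bing: { status: 200 } } })
dispatch(DemoListener, 'sitemaps.unknown', {}) # ignored: no matching method
```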
40
+
41
+ ### The built-in listener
42
+
43
+ For colored terminal output during CLI runs:
44
+
45
+ ```ruby
46
+ SiteMaps::Notification.subscribe(SiteMaps::Runner::EventListener)
47
+ ```
48
+
49
+ This is subscribed automatically by the CLI.
50
+
51
+ ## Events
52
+
53
+ | Event | Payload keys |
54
+ |-------|-------------|
55
+ | `sitemaps.enqueue_process` | `process`, `kwargs` |
56
+ | `sitemaps.before_process_execution` | `process`, `kwargs` |
57
+ | `sitemaps.process_execution` | `process`, `kwargs`, `runtime` |
58
+ | `sitemaps.finalize_urlset` | `url`, `links_count`, `news_count`, `last_modified`, `runtime`, `process` |
59
+ | `sitemaps.ping` | `results` |
60
+
61
+ `process` is a `SiteMaps::Process` struct (`name`, `location_template`, `kwargs_template`, `block`).
62
+
63
+ ## Event ordering
64
+
65
+ For each process the sequence is:
66
+
67
+ 1. `sitemaps.enqueue_process`
68
+ 2. `sitemaps.before_process_execution`
69
+ 3. One or more `sitemaps.finalize_urlset` (one per split file)
70
+ 4. `sitemaps.process_execution`
71
+
72
+ After all processes complete, one final `sitemaps.finalize_urlset` fires for the sitemap index itself. If pinging is enabled, `sitemaps.ping` fires last.
73
+
74
+ ## Use cases
75
+
76
+ - **Logging.** Tail-friendly output of what just ran, how many URLs, runtime.
77
+ - **Metrics.** StatsD / OpenTelemetry counters for throughput and ping outcomes.
78
+ - **Alerting.** Subscribe to `sitemaps.ping`, alert on non-200 results.
79
+ - **Cache busting.** After `sitemaps.finalize_urlset`, purge the CDN entry for the written URL.
data/docs/extensions.md ADDED
@@ -0,0 +1,141 @@
1
+ # SEO Extensions
2
+
3
+ `s.add` accepts options for every sitemap extension recognized by Google and Bing. Pass any of the following alongside `lastmod`, `priority`, and `changefreq`.
4
+
5
+ ## Image
6
+
7
+ Up to 1,000 images per URL.
8
+
9
+ ```ruby
10
+ s.add('/gallery/summer', images: [
11
+ {
12
+ loc: 'https://cdn.example.com/summer/beach.jpg',
13
+ title: 'Beach sunset',
14
+ caption: 'A photo from the summer trip',
15
+ geo_location: 'Cape Cod, MA',
16
+ license: 'https://creativecommons.org/licenses/by/4.0/'
17
+ }
18
+ ])
19
+ ```
20
+
21
+ ## Video
22
+
23
+ Up to 1,000 video entries per sitemap file.
24
+
25
+ ```ruby
26
+ s.add('/videos/how-to', videos: [
27
+ {
28
+ thumbnail_loc: 'https://cdn.example.com/thumbs/how-to.jpg',
29
+ title: 'How to use site_maps',
30
+ description: 'A quick walkthrough',
31
+ content_loc: 'https://cdn.example.com/videos/how-to.mp4',
32
+ player_loc: 'https://example.com/embed/how-to',
33
+ duration: 600,
34
+ publication_date: Time.now,
35
+ rating: 4.8,
36
+ view_count: 12_345,
37
+ family_friendly: true,
38
+ requires_subscription: false,
39
+ live: false,
40
+ tags: %w[tutorial guide],
41
+ category: 'Technology',
42
+ uploader: 'example-team',
43
+ uploader_info: 'https://example.com/about',
44
+ gallery_loc: 'https://example.com/videos',
45
+ gallery_title: 'Example video gallery',
46
+ price: nil,
47
+ allow_embed: true,
48
+ autoplay: 'ap=1'
49
+ }
50
+ ])
51
+ ```
52
+
53
+ ## News
54
+
55
+ Up to 1,000 news entries per sitemap file (use a dedicated process for news URLs).
56
+
57
+ ```ruby
58
+ s.add('/news/breaking', news: {
59
+ publication_name: 'Example Times',
60
+ publication_language: 'en',
61
+ publication_date: Time.now,
62
+ title: 'Breaking news headline',
63
+ keywords: 'breaking, politics',
64
+ genres: 'PressRelease',
65
+ access: 'Subscription',
66
+ stock_tickers: 'NASDAQ:EXMP'
67
+ })
68
+ ```
69
+
70
+ ## Alternate language / hreflang
71
+
72
+ ```ruby
73
+ s.add('/', alternates: [
74
+ { href: 'https://example.com/en', lang: 'en' },
75
+ { href: 'https://example.com/es', lang: 'es' },
76
+ { href: 'https://example.com/fr', lang: 'fr', nofollow: true }
77
+ ])
78
+ ```
79
+
80
+ The `nofollow: true` variant emits `rel="nofollow alternate"` on the link. Use it to declare locale variants without signalling Google to crawl them as equivalents.
81
+
82
+ ## Mobile
83
+
84
+ Declare a URL as mobile-friendly:
85
+
86
+ ```ruby
87
+ s.add('/mobile-page', mobile: true)
88
+ ```
89
+
90
+ ## PageMap
91
+
92
+ Structured data for Google Custom Search.
93
+
94
+ ```ruby
95
+ s.add('/products/widget', pagemap: {
96
+ dataobjects: [
97
+ {
98
+ type: 'product',
99
+ id: 'sku-123',
100
+ attributes: [
101
+ { name: 'name', value: 'Widget' },
102
+ { name: 'price', value: '19.99' },
103
+ { name: 'color', value: 'blue' }
104
+ ]
105
+ }
106
+ ]
107
+ })
108
+ ```
109
+
110
+ ## Combined example
111
+
112
+ Everything can coexist on a single URL:
113
+
114
+ ```ruby
115
+ s.add('/products/widget',
116
+ lastmod: Time.now,
117
+ priority: 0.9,
118
+ changefreq: 'weekly',
119
+ images: [{ loc: 'https://cdn.example.com/widget.jpg', title: 'Widget' }],
120
+ alternates: [{ href: 'https://example.com/es/products/widget', lang: 'es' }],
121
+ mobile: true,
122
+ pagemap: { dataobjects: [{ type: 'product', id: 'sku-123', attributes: [] }] }
123
+ )
124
+ ```
125
+
126
+ ## Disabling `priority` / `changefreq`
127
+
128
+ Both fields are optional per the sitemap spec, and many search engines ignore them. Disable globally if you want smaller files:
129
+
130
+ ```ruby
131
+ configure do |config|
132
+ config.emit_priority = false
133
+ config.emit_changefreq = false
134
+ end
135
+ ```
136
+
137
+ ## Output size
138
+
139
+ - Per URL set: 50,000 links **or** 1,000 news items **or** 50 MB uncompressed — whichever comes first. When one of these is hit, the current file is finalized and a new one starts.
140
+ - File naming is automatic (`posts/sitemap.xml` → `posts/sitemap1.xml`, `posts/sitemap2.xml`, …).
141
+ - Use the `.gz` extension in `config.url` to emit gzipped files — most search engines fetch either form.
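
The "whichever comes first" rule above amounts to checking three counters against their limits. The sketch below uses the numbers quoted in this document; `URLSET_LIMITS` and `urlset_full?` are hypothetical names, and treating "reached" as "full" is an assumption about the exact boundary.

```ruby
# Limits as quoted in the documentation; the names are illustrative only.
URLSET_LIMITS = { links: 50_000, news: 1_000, bytes: 50_000_000 }.freeze

# Hypothetical check: a URL set is full as soon as any one limit is reached.
def urlset_full?(links:, news:, bytes:)
  links >= URLSET_LIMITS[:links] ||
    news >= URLSET_LIMITS[:news] ||
    bytes >= URLSET_LIMITS[:bytes]
end

puts urlset_full?(links: 10, news: 1_000, bytes: 2_048) # news limit reached
```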
data/docs/getting-started.md ADDED
@@ -0,0 +1,138 @@
1
+ # Getting Started
2
+
3
+ ## Install
4
+
5
+ ```ruby
6
+ # Gemfile
7
+ gem 'site_maps'
8
+ ```
9
+
10
+ ```bash
11
+ bundle install
12
+ ```
13
+
14
+ ## Your first sitemap
15
+
16
+ Create `config/sitemap.rb`:
17
+
18
+ ```ruby
19
+ SiteMaps.use(:file_system) do
20
+ configure do |config|
21
+ config.url = 'https://example.com/sitemap.xml'
22
+ config.directory = File.expand_path('public', __dir__)
23
+ end
24
+
25
+ process do |s|
26
+ s.add('/', priority: 1.0, changefreq: 'daily')
27
+ s.add('/about', priority: 0.8, lastmod: Time.now)
28
+ s.add('/contact', priority: 0.5)
29
+ end
30
+ end
31
+ ```
32
+
33
+ Generate:
34
+
35
+ ```bash
36
+ bundle exec site_maps generate --config-file config/sitemap.rb
37
+ ```
38
+
39
+ Output: `public/sitemap.xml`.
40
+
41
+ ## Dynamic URLs
42
+
43
+ Call `s.add` for every URL you want indexed. Database records work naturally:
44
+
45
+ ```ruby
46
+ process :posts do |s|
47
+ Post.published.find_each do |post|
48
+ s.add("/posts/#{post.slug}", lastmod: post.updated_at, priority: 0.7)
49
+ end
50
+ end
51
+ ```
52
+
53
+ When the URL count of a single process exceeds `max_links` (default 50,000), the file is split into `sitemap1.xml`, `sitemap2.xml`, … and a sitemap index is written at `config.url`.
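
The splitting rule can be pictured with `each_slice`. This sketch only groups paths into numbered file names; it leaves out the sitemap index and all XML generation, and `split_into_files` is a hypothetical helper rather than anything the gem exposes.

```ruby
# Hypothetical helper: group paths into files of at most max_links entries.
# A single chunk keeps the plain name; several chunks get numbered names.
def split_into_files(paths, max_links: 50_000)
  chunks = paths.each_slice(max_links).to_a
  return { 'sitemap.xml' => paths } if chunks.size <= 1

  chunks.each_with_index.to_h { |chunk, index| ["sitemap#{index + 1}.xml", chunk] }
end

paths = (1..5).map { |i| "/posts/#{i}" }
puts split_into_files(paths, max_links: 2).keys.inspect
# prints ["sitemap1.xml", "sitemap2.xml", "sitemap3.xml"]
```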
54
+
55
+ ## Named processes
56
+
57
+ Named processes get their own file and run in parallel:
58
+
59
+ ```ruby
60
+ SiteMaps.use(:file_system) do
61
+ configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
62
+
63
+ process :static do |s|
64
+ s.add('/')
65
+ s.add('/about')
66
+ end
67
+
68
+ process :posts, 'posts/sitemap.xml' do |s|
69
+ Post.find_each { |p| s.add("/posts/#{p.slug}") }
70
+ end
71
+
72
+ process :products, 'products/sitemap.xml' do |s|
73
+ Product.find_each { |p| s.add("/products/#{p.id}") }
74
+ end
75
+ end
76
+ ```
77
+
78
+ Run all:
79
+
80
+ ```bash
81
+ bundle exec site_maps generate --config-file config/sitemap.rb --max-threads 4
82
+ ```
83
+
84
+ Run one:
85
+
86
+ ```bash
87
+ bundle exec site_maps generate posts --config-file config/sitemap.rb
88
+ ```
89
+
90
+ See [processes.md](processes.md) for the full process DSL including parameterized templates.
91
+
92
+ ## Using it in Rails
93
+
94
+ Add `site_maps` to your Gemfile and generate from a Rake task, a scheduled job, or your deploy pipeline. The Railtie injects URL helpers:
95
+
96
+ ```ruby
97
+ # config/sitemap.rb
98
+ SiteMaps.use(:file_system) do
99
+ configure do |config|
100
+ config.url = 'https://example.com/sitemap.xml'
101
+ config.directory = Rails.public_path.to_s
102
+ end
103
+
104
+ process do |s|
105
+ s.add(s.route.root_path, priority: 1.0)
106
+ s.add(s.route.about_path)
107
+ Post.find_each { |post| s.add(s.route.post_path(post), lastmod: post.updated_at) }
108
+ end
109
+ end
110
+ ```
111
+
112
+ See [rails.md](rails.md) for the full Rails integration, including asset precompile hooks and the Rack middleware for serving generated sitemaps.
113
+
114
+ ## Uploading to S3
115
+
116
+ Swap the adapter line:
117
+
118
+ ```ruby
119
+ SiteMaps.use(:aws_sdk) do
120
+ configure do |config|
121
+ config.url = 'https://my-bucket.s3.amazonaws.com/sitemap.xml'
122
+ config.bucket = 'my-bucket'
123
+ config.region = ENV['AWS_REGION']
124
+ # access_key_id / secret_access_key default to ENV vars
125
+ end
126
+
127
+ process { |s| ... }
128
+ end
129
+ ```
130
+
131
+ See [adapters.md](adapters.md) for adapter specifics and how to build your own.
132
+
133
+ ## Next steps
134
+
135
+ - [Processes](processes.md) — split your sitemap into static and dynamic shards
136
+ - [SEO extensions](extensions.md) — image, video, news, hreflang
137
+ - [CLI](cli.md) — automation-friendly generate command
138
+ - [Rack middleware](middleware.md) — serve the generated files with correct headers
data/docs/middleware.md ADDED
@@ -0,0 +1,85 @@
1
+ # Rack Middleware
2
+
3
+ `SiteMaps::Middleware` serves generated sitemap files directly from the app. Useful when you've generated to `public/sitemaps/` (filesystem adapter) and want proper `Content-Type`, gzip handling, and XSL stylesheet routing without editing your web-server config.
4
+
5
+ ## Basic usage
6
+
7
+ ```ruby
8
+ # config/application.rb (Rails)
9
+ config.middleware.use SiteMaps::Middleware, adapter: -> { SiteMaps.current_adapter }
10
+ ```
11
+
12
+ Or inline in `config.ru`:
13
+
14
+ ```ruby
15
+ require 'site_maps'
16
+
17
+ use SiteMaps::Middleware, adapter: SiteMaps.current_adapter
18
+ run MyApp
19
+ ```
20
+
21
+ ## Options
22
+
23
+ ```ruby
24
+ use SiteMaps::Middleware,
25
+ adapter: SiteMaps.current_adapter,
26
+ public_prefix: nil,
27
+ storage_prefix: nil,
28
+ x_robots_tag: 'noindex, follow',
29
+ cache_control: 'public, max-age=3600'
30
+ ```
31
+
32
+ | Option | Purpose |
33
+ |--------|---------|
34
+ | `adapter` | Adapter instance (or a callable returning one — useful if the adapter is reconfigured at boot). |
35
+ | `public_prefix` | Strip from request path before lookup — e.g. `/sitemap` if your app mounts them under a sub-path. |
36
+ | `storage_prefix` | Prepend to the lookup key — e.g. `tenants/acme` for multi-tenant layouts. |
37
+ | `x_robots_tag` | `X-Robots-Tag` header added to served files. |
38
+ | `cache_control` | `Cache-Control` header. |
39
+
40
+ ## Behavior
41
+
42
+ The middleware intercepts requests for `*.xml` and `*.xml.gz` files:
43
+
44
+ - Matches → serve from the adapter with `Content-Type: application/xml`, plus `X-Robots-Tag` and `Cache-Control`.
45
+ - Gzipped sources → auto-decompress on serve so XSL stylesheets render in the browser. Clients asking for `.xml.gz` still get the compressed bytes.
46
+ - Doesn't match → `env` passes through to `@app.call`.
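The three rules above can be sketched as a plain Rack-style object. This is an illustration only, not the gem's implementation: `SketchSitemapMiddleware` and its `store:` Hash (path to bytes) are hypothetical stand-ins for `SiteMaps::Middleware` and a real adapter. The `application/gzip` content type for `.xml.gz` requests and the pass-through on a missing file are assumptions.

```ruby
require 'zlib'

# Hypothetical stand-in for SiteMaps::Middleware. `store` is a Hash of
# request path => file bytes, used here instead of a real adapter.
class SketchSitemapMiddleware
  SITEMAP_PATH = /\.xml(\.gz)?\z/

  def initialize(app, store:, x_robots_tag: 'noindex, follow', cache_control: 'public, max-age=3600')
    @app = app
    @store = store
    @x_robots_tag = x_robots_tag
    @cache_control = cache_control
  end

  def call(env)
    path = env['PATH_INFO'].to_s
    # Anything that is not *.xml or *.xml.gz goes straight to the app.
    return @app.call(env) unless path.match?(SITEMAP_PATH)

    body, type = lookup(path)
    # Assumption: an unknown sitemap path also falls through to the app.
    return @app.call(env) if body.nil?

    headers = {
      'content-type' => type,
      'x-robots-tag' => @x_robots_tag,
      'cache-control' => @cache_control
    }
    [200, headers, [body]]
  end

  private

  def lookup(path)
    if path.end_with?('.gz')
      # Clients asking for .xml.gz get the compressed bytes untouched.
      [@store[path], 'application/gzip']
    elsif @store.key?(path)
      [@store[path], 'application/xml']
    elsif @store.key?("#{path}.gz")
      # Only a gzipped source exists: decompress it on the fly.
      [Zlib.gunzip(@store["#{path}.gz"]), 'application/xml']
    end
  end
end
```

With a store that holds only `/sitemap.xml.gz`, a request for `/sitemap.xml` returns the decompressed XML, while a request for `/sitemap.xml.gz` returns the stored bytes as they are.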
47
+
48
+ ## XSL stylesheets
49
+
50
+ The middleware also serves the built-in XSL stylesheets — pretty sitemap rendering for human visitors — at their referenced paths. Configure their URLs via:
51
+
52
+ ```ruby
53
+ configure do |config|
54
+ config.xsl_stylesheet_url = '/_sitemap-stylesheet.xsl'
55
+ config.xsl_index_stylesheet_url = '/_sitemap-index-stylesheet.xsl'
56
+ end
57
+ ```
58
+
59
+ ## Multi-tenant routing
60
+
61
+ For per-tenant sitemaps stored under subpaths:
62
+
63
+ ```ruby
64
+ use SiteMaps::Middleware,
65
+ adapter: per_request_adapter,
66
+ storage_prefix: ->(request) { "tenants/#{request.host.split('.').first}" }
67
+ ```
68
+
69
+ If the adapter itself already scopes paths by tenant, no `storage_prefix` is needed; just supply the right adapter for each request.
70
+
71
+ ## robots.txt integration
72
+
73
+ Emit a `Sitemap:` directive for the generated file:
74
+
75
+ ```ruby
76
+ # config.ru or a controller
77
+ SiteMaps::RobotsTxt.sitemap_directive('https://example.com/sitemap.xml')
78
+ # => "Sitemap: https://example.com/sitemap.xml"
79
+
80
+ SiteMaps::RobotsTxt.render(
81
+ sitemap_url: 'https://example.com/sitemap.xml',
82
+ extra_directives: ['Disallow: /admin']
83
+ )
84
+ # => "Sitemap: https://example.com/sitemap.xml\nDisallow: /admin"
85
+ ```
data/docs/processes.md ADDED
@@ -0,0 +1,156 @@
1
+ # Processes
2
+
3
+ A **process** is a unit of work that produces part of a sitemap. Each process runs on its own thread, writes its own URL set, and becomes an entry in the sitemap index.
4
+
5
+ ## Static processes
6
+
7
+ A static process has no parameters. It runs once and writes one (possibly split) sitemap file.
8
+
9
+ ```ruby
10
+ SiteMaps.use(:file_system) do
11
+ configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
12
+
13
+ process do |s|
14
+ s.add('/', priority: 1.0)
15
+ s.add('/about')
16
+ end
17
+
18
+ process :posts, 'posts/sitemap.xml' do |s|
19
+ Post.find_each { |post| s.add("/posts/#{post.slug}", lastmod: post.updated_at) }
20
+ end
21
+ end
22
+ ```
23
+
24
+ - Without an explicit name, the process is named `:default`.
25
+ - Without an explicit location, a default filename is assigned.
26
+ - The block receives a `SitemapBuilder` (`s`), on which `add` is called per URL.
27
+
28
+ ## Dynamic processes
29
+
30
+ A dynamic process has placeholders in its location template and corresponding kwargs. Each unique combination of kwargs produces a separate sitemap file.
31
+
32
+ ```ruby
33
+ process :monthly_posts, 'posts/%{year}-%{month}/sitemap.xml', year: 2024, month: 1 do |s, year:, month:, **|
34
+ Post.where('extract(year from published_at) = ? AND extract(month from published_at) = ?', year, month)
35
+ .find_each { |p| s.add("/posts/#{p.slug}", lastmod: p.updated_at) }
36
+ end
37
+ ```
38
+
39
+ The kwargs passed to `process` are **defaults**; the real values come from `Runner#enqueue`:
40
+
41
+ ```ruby
42
+ runner = SiteMaps.generate(config_file: 'config/sitemap.rb')
43
+ runner.enqueue(:monthly_posts, year: 2024, month: 1)
44
+ runner.enqueue(:monthly_posts, year: 2024, month: 2)
45
+ runner.enqueue(:monthly_posts, year: 2024, month: 3)
46
+ runner.run
47
+ ```
48
+
49
+ Or from the CLI:
50
+
51
+ ```bash
52
+ bundle exec site_maps generate monthly_posts \
53
+ --config-file config/sitemap.rb \
54
+ --context=year:2024 month:1
55
+ ```
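The `%{year}` and `%{month}` placeholders in the location template are the same syntax as Ruby's named format references, so the way defaults and enqueued values combine can be sketched with `Kernel#format`. Whether the gem resolves templates this way internally is an assumption, and `resolve_location` is a hypothetical helper, not part of the gem's API.

```ruby
# Hypothetical helper: merge the defaults declared on `process` with the
# values given at enqueue time, then fill the location template.
def resolve_location(template, defaults, overrides = {})
  format(template, defaults.merge(overrides))
end

template = 'posts/%{year}-%{month}/sitemap.xml'
defaults = { year: 2024, month: 1 }

puts resolve_location(template, defaults)               # posts/2024-1/sitemap.xml
puts resolve_location(template, defaults, { month: 3 }) # posts/2024-3/sitemap.xml
```

Each distinct set of values yields a distinct path, which is why each enqueued combination ends up in its own file.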
56
+
57
+ ## Execution model
58
+
59
+ When you call `runner.run`:
60
+
61
+ 1. Each enqueued process is wrapped in a `Concurrent::Future`.
62
+ 2. The pool (default 4 threads, configurable via `--max-threads`) runs them in parallel.
63
+ 3. Each process builds a `URLSet`. When the set fills up (50,000 links, 1,000 news items, or 50 MB uncompressed), it's finalized and written, and a new URLSet starts — automatically.
64
+ 4. After every process finishes, the sitemap index is aggregated and written to `config.url`.
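A rough sketch of this model, using only the standard library. The gem is described as using `Concurrent::Future`; this example swaps in plain `Thread` and `Queue` so it runs without extra dependencies. `run_processes` and the shape of `jobs` (name to callable) are hypothetical.

```ruby
# Run each job on a small worker pool and collect results by process name.
# `jobs` is a Hash of process name => callable returning that process's links.
def run_processes(jobs, max_threads: 4)
  queue = Queue.new
  jobs.each { |job| queue << job } # each job is a [name, callable] pair
  queue.close                      # pop returns nil once the queue is drained

  results = Queue.new
  workers = Array.new([max_threads, jobs.size].min) do
    Thread.new do
      while (job = queue.pop)
        name, work = job
        results << [name, work.call]
      end
    end
  end
  workers.each(&:join)

  # Only after every worker has finished is the aggregate built,
  # mirroring step 4 (the index is written last).
  Array.new(results.size) { results.pop }.to_h
end

jobs = {
  default: -> { ['/', '/about'] },
  posts: -> { ['/posts/hello-world'] }
}
index = run_processes(jobs)
puts index.keys.sort.inspect
```

The important property is the ordering: workers run in parallel, but aggregation happens strictly after all of them have joined.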
65
+
66
+ ## Splitting rules
67
+
68
+ A URL set is finalized and rolled over when **any** of these apply:
69
+
70
+ - Links reach `config.max_links` (default 50,000 — the sitemap spec limit).
71
+ - News entries reach 1,000.
72
+ - Uncompressed XML reaches 50 MB.
73
+
74
+ Split files are named by `IncrementalLocation`: `posts/sitemap.xml` becomes `posts/sitemap1.xml`, `posts/sitemap2.xml`, etc.
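The link-count rule and the numbered naming can be sketched as follows. This covers only the `max_links` limit (not the news or byte-size limits) and only plain `.xml` locations. `incremental_name` and `split_links` are hypothetical helpers, and keeping the original filename when no split occurs is an assumption.

```ruby
# Insert a 1-based counter before the extension:
# "posts/sitemap.xml", 2 => "posts/sitemap2.xml"
def incremental_name(location, index)
  ext = File.extname(location)
  "#{location.delete_suffix(ext)}#{index}#{ext}"
end

# Split links into files of at most max_links entries each.
def split_links(links, location, max_links: 50_000)
  chunks = links.each_slice(max_links).to_a
  # Assumption: a set that fits in one file keeps the original name.
  return { location => links } if chunks.size <= 1

  chunks.each_with_index.to_h do |chunk, i|
    [incremental_name(location, i + 1), chunk]
  end
end

files = split_links(%w[/a /b /c /d /e], 'posts/sitemap.xml', max_links: 2)
puts files.keys.inspect
```

With five links and a limit of two, the sketch produces `posts/sitemap1.xml`, `posts/sitemap2.xml`, and `posts/sitemap3.xml`.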
75
+
76
+ ## Index generation
77
+
78
+ A sitemap index is produced when:
79
+
80
+ - More than one process exists,
81
+ - A single process was split across multiple files, or
82
+ - External sitemaps were added.
83
+
84
+ Otherwise a single `urlset` is written directly at `config.url` (the "inline" optimization).
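The decision above reduces to a small predicate. The sketch below is hypothetical (`index_needed?` is not part of the gem); it takes the number of files each process wrote plus any external sitemap URLs.

```ruby
# files_per_process: Hash of process name => number of files it wrote.
# external_sitemaps: Array of external sitemap URLs added to the index.
def index_needed?(files_per_process:, external_sitemaps: [])
  files_per_process.size > 1 ||                            # more than one process
    files_per_process.values.any? { |count| count > 1 } || # a process was split
    !external_sitemaps.empty?                              # externals were added
end

puts index_needed?(files_per_process: { default: 1 })           # false
puts index_needed?(files_per_process: { default: 1, posts: 1 }) # true
```

When the predicate is false, the single URL set is what ends up at `config.url`.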
85
+
86
+ ## Adding external sitemaps
87
+
88
+ Reference third-party or pre-existing sitemaps in the index:
89
+
90
+ ```ruby
91
+ SiteMaps.use(:file_system) do
92
+ configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
93
+
94
+ external_sitemap('https://cdn.example.com/legacy-sitemap.xml', lastmod: Time.parse('2024-01-15'))
95
+
96
+ process { |s| s.add('/') }
97
+ end
98
+ ```
99
+
100
+ ## Shared helpers across processes
101
+
102
+ Use `extend_processes_with` to add methods that every process block can call:
103
+
104
+ ```ruby
105
+ module Helpers
106
+ def post_path(post) = "/posts/#{post.slug}"
107
+ def published_posts = Post.where.not(published_at: nil)
108
+ end
109
+
110
+ SiteMaps.use(:file_system) do
111
+ configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
112
+ extend_processes_with(Helpers)
113
+
114
+ process :posts do |s|
115
+ published_posts.find_each { |p| s.add(post_path(p), lastmod: p.updated_at) }
116
+ end
117
+ end
118
+ ```
119
+
120
+ ## URL filters
121
+
122
+ Filters run per URL inside every process — use them for global exclusions or default attributes:
123
+
124
+ ```ruby
125
+ SiteMaps.use(:file_system) do
126
+ configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
127
+
128
+ # Exclude any /admin path
129
+ url_filter { |url, options| url.include?('/admin') ? false : options }
130
+
131
+ # Boost blog priority
132
+ url_filter do |url, options|
133
+ if url.include?('/blog/')
134
+ options.merge(priority: 0.9, changefreq: 'daily')
135
+ else
136
+ options
137
+ end
138
+ end
139
+
140
+ process { |s| ... }
141
+ end
142
+ ```
143
+
144
+ A filter returning `false` (or `nil`) excludes the URL entirely. Returning a hash replaces the options.
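A minimal sketch of those semantics. It assumes filters run in declaration order and that each one receives the options returned by the previous one. `apply_filters` is a hypothetical helper, not the gem's API.

```ruby
# Returns the final options Hash, or nil when any filter excludes the URL.
def apply_filters(filters, url, options)
  filters.reduce(options) do |opts, filter|
    result = filter.call(url, opts)
    return nil unless result # false or nil: drop the URL, skip later filters

    result
  end
end

exclude_admin = ->(url, options) { url.include?('/admin') ? false : options }
boost_blog = lambda do |url, options|
  url.include?('/blog/') ? options.merge(priority: 0.9) : options
end

p apply_filters([exclude_admin, boost_blog], '/admin/users', {})
p apply_filters([exclude_admin, boost_blog], '/blog/hello', { changefreq: 'daily' })
```

Under these semantics a filter must return the options for every URL it wants to keep. A block that only returns `false` for the unwanted case implicitly returns `nil` for everything else and would exclude those URLs too.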
145
+
146
+ ## Re-running a single shard
147
+
148
+ Only regenerate what changed — the rest is preserved from the existing sitemap index:
149
+
150
+ ```ruby
151
+ runner = SiteMaps.generate(config_file: 'config/sitemap.rb')
152
+ runner.enqueue(:monthly_posts, year: 2024, month: 3) # only March
153
+ runner.run # Jan and Feb kept as-is
154
+ ```
155
+
156
+ This is the main advantage of parameterized dynamic processes: you can rebuild one month's shard on a cron and leave the rest untouched.
data/docs/rails.md ADDED
@@ -0,0 +1,128 @@
1
+ # Rails Integration
2
+
3
+ The Railtie loads automatically when Rails is present. It wires up exactly one thing and deliberately nothing else:
4
+
5
+ 1. **URL helpers** — `s.route.<helper>` inside process blocks.
6
+ 2. **No other magic** — no initializer, no autoloaded directories, no patched generators.
7
+
8
+ ## URL helpers in processes
9
+
10
+ ```ruby
11
+ # config/sitemap.rb
12
+ SiteMaps.use(:file_system) do
13
+ configure do |config|
14
+ config.url = 'https://example.com/sitemap.xml'
15
+ config.directory = Rails.public_path.to_s
16
+ end
17
+
18
+ process do |s|
19
+ s.add(s.route.root_path, priority: 1.0)
20
+ s.add(s.route.about_path)
21
+ Post.find_each { |p| s.add(s.route.post_path(p), lastmod: p.updated_at) }
22
+ end
23
+ end
24
+ ```
25
+
26
+ `s.route` is a singleton wrapping `Rails.application.routes.url_helpers`.
27
+
28
+ ## Generating from Rails
29
+
30
+ ### One-off
31
+
32
+ ```bash
33
+ bundle exec site_maps generate --config-file config/sitemap.rb
34
+ ```
35
+
36
+ The CLI auto-requires `config/environment.rb` if it finds a `config/application.rb`, so ActiveRecord, URL helpers, and everything else loads as normal.
37
+
38
+ ### From a Rake task
39
+
40
+ ```ruby
41
+ # lib/tasks/sitemap.rake
42
+ namespace :sitemap do
43
+ desc 'Generate sitemaps'
44
+ task generate: :environment do
45
+ runner = SiteMaps.generate(config_file: Rails.root.join('config/sitemap.rb').to_s)
46
+ runner.enqueue_all.run
47
+ end
48
+ end
49
+ ```
50
+
51
+ Run on deploy or via cron:
52
+
53
+ ```bash
54
+ bundle exec rake sitemap:generate
55
+ ```
56
+
57
+ ### From a scheduled job
58
+
59
+ ```ruby
60
+ class SitemapJob < ApplicationJob
61
+ def perform
62
+ runner = SiteMaps.generate(config_file: Rails.root.join('config/sitemap.rb').to_s)
63
+ runner.enqueue_all.run
64
+ end
65
+ end
66
+
67
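+ SitemapJob.set(cron: '0 3 * * *').perform_later # `cron:` is not a stock Active Job option; it needs a queue backend or scheduler extension that supports it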
68
+ ```
69
+
70
+ ## Serving generated sitemaps
71
+
72
+ Add the Rack middleware to serve files generated by the `:file_system` adapter:
73
+
74
+ ```ruby
75
+ # config/application.rb
76
+ config.middleware.use SiteMaps::Middleware, adapter: -> { SiteMaps.current_adapter }
77
+ ```
78
+
79
+ See [middleware.md](middleware.md) for options.
80
+
81
+ ## Asset precompile integration
82
+
83
+ If you want sitemaps regenerated on every deploy, hook into `assets:precompile`:
84
+
85
+ ```ruby
86
+ # lib/tasks/sitemap.rake
87
+ Rake::Task['assets:precompile'].enhance(['sitemap:generate'])
88
+ ```
89
+
90
+ ## robots.txt
91
+
92
+ ```erb
93
+ <%# public/robots.txt.erb or app/views/robots.text.erb %>
94
+ User-agent: *
95
+ Disallow: /admin
96
+
97
+ <%= SiteMaps::RobotsTxt.sitemap_directive('https://example.com/sitemap.xml') %>
98
+ ```
99
+
100
+ ## Multi-tenant
101
+
102
+ `SiteMaps.define` gives you a generation function parameterized by runtime context:
103
+
104
+ ```ruby
105
+ # config/sitemap.rb
106
+ SiteMaps.define do |tenant:|
107
+ use(:file_system) do
108
+ configure do |config|
109
+ config.url = "https://#{tenant.domain}/sitemap.xml"
110
+ config.directory = tenant.public_path
111
+ end
112
+
113
+ process { |s| tenant.pages.each { |page| s.add(page.path, lastmod: page.updated_at) } }
114
+ end
115
+ end
116
+ ```
117
+
118
+ ```ruby
119
+ Tenant.find_each do |tenant|
120
+ SiteMaps.generate(config_file: 'config/sitemap.rb', context: { tenant: tenant }).enqueue_all.run
121
+ end
122
+ ```
123
+
124
+ The context hash is splatted into the `define` block as keyword args.
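The splatting itself is ordinary Ruby keyword-argument passing. The sketch below uses a hypothetical `SketchDefinitions` module in place of `SiteMaps.define` and `SiteMaps.generate`, and a plain Hash in place of a tenant record.

```ruby
# Hypothetical registry: stores one definition block and later calls it
# with the context Hash expanded into keyword arguments.
module SketchDefinitions
  @definition = nil

  def self.define(&block)
    @definition = block
  end

  def self.generate(context: {})
    @definition.call(**context) # { tenant: ... } becomes tenant: ...
  end
end

SketchDefinitions.define do |tenant:|
  "https://#{tenant[:domain]}/sitemap.xml"
end

puts SketchDefinitions.generate(context: { tenant: { domain: 'acme.example.com' } })
```

Each key in the context Hash must match a keyword parameter of the block, which is why the `define` block above declares `tenant:`.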
125
+
126
+ ## Dependencies
127
+
128
+ - Rails is **not** listed in the gemspec. The Railtie is loaded only if Rails is already present. If you're using `site_maps` in a non-Rails Ruby project, the Rails-specific pieces are inert.
@@ -14,8 +14,7 @@ class SiteMaps::Adapters::AwsSdk::Storage
14
14
  lastmod = options.delete(:last_modified) || Time.now
15
15
  options[:metadata] ||= {}
16
16
  options[:metadata]["given-last-modified"] = lastmod.utc.strftime("%Y-%m-%dT%H:%M:%S%:z")
17
- obj = object(location.remote_path)
18
- obj.upload_file(location.path, **options)
17
+ transfer_manager.upload_file(location.path, bucket: config.bucket, key: location.remote_path, **options)
19
18
  end
20
19
 
21
20
  def read(location)
@@ -49,4 +48,8 @@ class SiteMaps::Adapters::AwsSdk::Storage
49
48
  def object(remote_path)
50
49
  config.s3_bucket.object(remote_path)
51
50
  end
51
+
52
+ def transfer_manager
53
+ @transfer_manager ||= ::Aws::S3::TransferManager.new(client: config.s3_resource.client)
54
+ end
52
55
  end
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SiteMaps
4
- VERSION = "0.1.0"
4
+ VERSION = "0.1.1"
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: site_maps
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Marcos G. Zimmermann
@@ -105,6 +105,16 @@ files:
105
105
  - Rakefile
106
106
  - bin/console
107
107
  - bin/setup
108
+ - docs/README.md
109
+ - docs/adapters.md
110
+ - docs/api.md
111
+ - docs/cli.md
112
+ - docs/events.md
113
+ - docs/extensions.md
114
+ - docs/getting-started.md
115
+ - docs/middleware.md
116
+ - docs/processes.md
117
+ - docs/rails.md
108
118
  - exec/site_maps
109
119
  - lib/site-maps.rb
110
120
  - lib/site_maps.rb