indexmap 0.2.1 → 0.3.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c3d316abdab0a6a55f613b44a0481bdf98d7d041d3d5b60b8523524450494333
4
- data.tar.gz: a18dd5ad2e6db70ecef78719d8bae1a6d249fd393fe7dc363a8862fedd2faede
3
+ metadata.gz: ecbb344925e56757c840a508365932942f0344ce8fa38af6c3677e6eb6ec9bf3
4
+ data.tar.gz: 3facac318f7fb6553f672afc516910b176294b951d85ac61149cd665e00ded8f
5
5
  SHA512:
6
- metadata.gz: 8dfd199b1b978991b703870f4eb08091b8ce4689156840b7f30a985f951d55d163dce89ba71d7d8d4645931b2cb6980a4134f9b593bd8bde3b97cf272fdfbe4c
7
- data.tar.gz: 875b0e45a6d3ad629f03456f62b8311f6379f3d8cde8a308ec66e6bf5233cb122a34d39531d7a245b5b7e3a3e2bc051ee7fd16d5f2981fb5e5eb6b600ef01fa8
6
+ metadata.gz: 57f1ef28f1339a5cd7afa18f01518a520463eea63087363e765467ffe57f7835b3464345e8a792450c7245c89dedb8338a14771261d99aa06da562ed50ca5e1c
7
+ data.tar.gz: 513b73b694d00775765fd8a7e86d52a4a77971cc67fee2b4bd6ca6c675a5ab1c2e5ecca4ef9e8bc77ca7ff2f55784640d0b58734f180e757b650d2ecf1e04b63
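The `checksums.yaml` hunk above replaces the SHA256 and SHA512 digests of the gem's two inner archives, `metadata.gz` and `data.tar.gz`. Once the `.gem` file (a plain tar archive) is extracted, you can compare an archive against the published digests with Ruby's standard `Digest` library. The sketch below is minimal. The helper name `checksums_match?` is hypothetical, and a throwaway temp file stands in for a real `data.tar.gz`.

```ruby
require "digest"
require "digest/sha2"
require "tempfile"

# Hypothetical helper: true when both hex digests of the file match.
def checksums_match?(path, sha256:, sha512:)
  Digest::SHA256.file(path).hexdigest == sha256 &&
    Digest::SHA512.file(path).hexdigest == sha512
end

# Demonstration on a temporary file standing in for data.tar.gz.
Tempfile.create("data.tar.gz") do |file|
  file.write("example archive bytes")
  file.flush
  expected256 = Digest::SHA256.hexdigest("example archive bytes")
  expected512 = Digest::SHA512.hexdigest("example archive bytes")
  puts checksums_match?(file.path, sha256: expected256, sha512: expected512) # prints true
end
```

In practice, you would paste the expected values from the `+` lines of `checksums.yaml` rather than computing them locally.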
data/CHANGELOG.md CHANGED
@@ -5,9 +5,14 @@ All notable changes to this project will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
- ## [0.2.1] - 2026-04-21
8
+ ## [0.3.1] - 2026-04-22
9
+
10
+
11
+ ### Fixed
12
+
13
+ - fix changelog generation (#2)
14
+
15
+ - harden indexmap runtime defaults and test coverage (#3)
9
16
 
10
- ### <!-- 1 -->🐛 Bug Fixes
11
- - publish built gem in release workflow
12
17
 
13
18
 
data/README.md CHANGED
@@ -104,6 +104,47 @@ end
104
104
 
105
105
  In `:single_file` mode, `indexmap` writes a `urlset` directly to `sitemap.xml`. In the default `:index` mode, it writes a sitemap index plus child sitemap files from `sections`.
106
106
 
107
+ ## Validation and Parsing
108
+
109
+ `indexmap` also includes small utilities for working with generated sitemap files:
110
+
111
+ ```ruby
112
+ parser = Indexmap::Parser.new(path: Rails.public_path.join("sitemap.xml"))
113
+ parser.paths
114
+ # => ["/", "/about", "/articles/example"]
115
+
116
+ Indexmap::Validator.new.validate!
117
+ ```
118
+
119
+ The built-in validator checks for:
120
+
121
+ - missing sitemap files
122
+ - duplicate sitemap URLs
123
+ - parameterized URLs in sitemap entries
124
+
125
+ ## Search Engine Ping
126
+
127
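+ The gem can ping Google Search Console and IndexNow once your app configuration provides the required credentials: the contents of a Google service account JSON key (parsed as JSON, not read from a file path) and an IndexNow key.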
128
+
129
+ ```ruby
130
+ Indexmap.configure do |config|
131
+ config.google.credentials = -> { ENV["GOOGLE_SITEMAP"] }
132
+ config.index_now.key = -> { ENV["INDEXNOW_KEY"] }
133
+ end
134
+ ```
135
+
136
+ When `config.index_now.key` is set, `sitemap:create` also writes the matching `public/<key>.txt` verification file automatically.
137
+
138
+ Available rake tasks:
139
+
140
+ ```bash
141
+ bin/rails sitemap:validate
142
+ bin/rails sitemap:google:ping
143
+ bin/rails sitemap:index_now:ping
144
+ bin/rails sitemap:ping
145
+ bin/rails sitemap:index_now:write_key
146
+ ```
147
+
107
148
  ## Development
108
149
 
109
150
  Run tests:
@@ -124,6 +165,12 @@ Run the full default task:
124
165
  bundle exec rake
125
166
  ```
126
167
 
168
+ Tests generate a coverage report automatically, whether you run the full default task above or only the test task:
169
+
170
+ ```bash
171
+ bundle exec rake test
172
+ ```
173
+
127
174
  Note: `Gemfile.lock` is intentionally not tracked for this gem, following normal Ruby library conventions.
128
175
 
129
176
  ### Git hooks
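The README hunk lists the checks performed by `Indexmap::Validator`, but the validator's implementation is not part of this diff. The sketch below illustrates the two URL-level checks over a list of `<loc>` values. It assumes that "parameterized" means "has a query string". The helper `sitemap_url_problems` is hypothetical and does not reflect the gem's code.

```ruby
require "uri"

# Hypothetical helper, not Indexmap::Validator: flag duplicate URLs and URLs
# that carry a query string among a sitemap's <loc> values.
def sitemap_url_problems(locs)
  duplicates = locs.tally.select { |_loc, count| count > 1 }.keys
  parameterized = locs.uniq.select do |loc|
    !URI.parse(loc).query.to_s.empty?
  rescue URI::InvalidURIError
    false # unparseable URLs are out of scope for this sketch
  end
  { duplicates: duplicates, parameterized: parameterized }
end

locs = [
  "https://example.com/",
  "https://example.com/about",
  "https://example.com/about",
  "https://example.com/search?q=ruby"
]
problems = sitemap_url_problems(locs)
puts problems[:duplicates]    # prints https://example.com/about
puts problems[:parameterized] # prints https://example.com/search?q=ruby
```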
@@ -24,10 +24,18 @@ module Indexmap
24
24
  value.nil? ? :index : value.to_sym
25
25
  end
26
26
 
27
+ def google
28
+ @google ||= GoogleConfiguration.new
29
+ end
30
+
27
31
  def index_filename
28
32
  resolve(@index_filename)
29
33
  end
30
34
 
35
+ def index_now
36
+ @index_now ||= IndexNowConfiguration.new
37
+ end
38
+
31
39
  def public_path
32
40
  value = resolve(@public_path)
33
41
  return Pathname("public") if value.nil?
@@ -0,0 +1,21 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Indexmap
4
+ class GoogleConfiguration
5
+ attr_writer :credentials, :property
6
+
7
+ def credentials
8
+ resolve(@credentials)
9
+ end
10
+
11
+ def property
12
+ resolve(@property)
13
+ end
14
+
15
+ private
16
+
17
+ def resolve(value)
18
+ value.respond_to?(:call) ? value.call : value
19
+ end
20
+ end
21
+ end
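`GoogleConfiguration` stores either a plain value or a callable and resolves it on every read. As a result, a lambda such as `-> { ENV["GOOGLE_SITEMAP"] }` runs when a pinger asks for credentials, not when the app boots. The sketch below reproduces that pattern with a hypothetical `LazySetting` class and uses a plain hash in place of `ENV`. It mirrors the `resolve` helper above, but it is not part of the gem.

```ruby
# Hypothetical class mirroring GoogleConfiguration's value-or-callable pattern:
# writers store whatever they are given, readers resolve it on every call.
class LazySetting
  attr_writer :credentials, :property

  def credentials
    resolve(@credentials)
  end

  def property
    resolve(@property)
  end

  private

  # Anything that responds to #call (lambda, proc, method object) is invoked.
  def resolve(value)
    value.respond_to?(:call) ? value.call : value
  end
end

secrets = {} # stands in for ENV in this sketch
setting = LazySetting.new
setting.credentials = -> { secrets["GOOGLE_SITEMAP"] }
setting.property = "sc-domain:example.com"

p setting.credentials # prints nil: nothing stored yet
secrets["GOOGLE_SITEMAP"] = '{"type":"service_account"}'
p setting.credentials # the same lambda now returns the stored JSON string
p setting.property    # plain values come back unchanged
```

Because resolution is not cached, a lambda runs once per read. For example, `Pinger::Google` reads `credentials` in `ping` and again in `authorizer`.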
@@ -0,0 +1,45 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Indexmap
4
+ class IndexNowConfiguration
5
+ DEFAULT_ENDPOINT = "https://api.indexnow.org"
6
+ DEFAULT_MAX_URLS_PER_REQUEST = 500
7
+
8
+ attr_writer :dry_run, :endpoint, :key, :key_path, :max_urls_per_request
9
+
10
+ def dry_run?
11
+ value = resolve(@dry_run)
12
+ value == true || value.to_s == "1"
13
+ end
14
+
15
+ def endpoint
16
+ value = resolve(@endpoint)
17
+ value.to_s.strip.empty? ? DEFAULT_ENDPOINT : value
18
+ end
19
+
20
+ def key
21
+ resolve(@key)
22
+ end
23
+
24
+ def key_path(public_path:)
25
+ configured_path = resolve(@key_path)
26
+ return Pathname(configured_path) unless configured_path.to_s.strip.empty?
27
+ return if key.to_s.strip.empty?
28
+
29
+ Pathname(public_path).join("#{key}.txt")
30
+ end
31
+
32
+ def max_urls_per_request
33
+ value = resolve(@max_urls_per_request)
34
+ return DEFAULT_MAX_URLS_PER_REQUEST if value.nil?
35
+
36
+ value.to_i
37
+ end
38
+
39
+ private
40
+
41
+ def resolve(value)
42
+ value.respond_to?(:call) ? value.call : value
43
+ end
44
+ end
45
+ end
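Several defaults in `IndexNowConfiguration` are easy to miss:

- `dry_run?` is true only for `true` or a value whose string form is `"1"`, so a string such as `"true"` does not enable dry-run mode.
- A blank `endpoint` falls back to `https://api.indexnow.org`.
- `key_path` defaults to `<public_path>/<key>.txt`.
- `max_urls_per_request` defaults to 500 and is coerced with `to_i`.

The IndexNow pinger itself is not in this diff. The sketch below is a hypothetical helper, `index_now_payloads`, showing how a key and a per-request limit could be turned into request bodies. The field names `host`, `key`, `keyLocation`, and `urlList` follow the public IndexNow protocol. The sketch assumes that all URLs share one host and that the key file is served at `/<key>.txt`.

```ruby
require "json"
require "uri"

# Hypothetical helper (not part of indexmap): split URLs into IndexNow-style
# JSON bodies of at most max_urls_per_request URLs each. Assumes every URL
# shares one host and the key file is served at /<key>.txt, matching the
# default key_path of public/<key>.txt.
def index_now_payloads(urls, key:, max_urls_per_request: 500)
  # Coercion with to_i can yield 0, which each_slice would reject anyway.
  raise ArgumentError, "max_urls_per_request must be positive" unless max_urls_per_request.positive?
  return [] if urls.empty?

  host = URI.parse(urls.first).host
  urls.each_slice(max_urls_per_request).map do |batch|
    JSON.generate({
      host: host,
      key: key,
      keyLocation: "https://#{host}/#{key}.txt",
      urlList: batch
    })
  end
end

urls = (1..1200).map { |i| "https://example.com/articles/#{i}" }
payloads = index_now_payloads(urls, key: "abc123")
puts payloads.size # prints 3 (500 + 500 + 200 URLs)
```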
@@ -0,0 +1,202 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "net/http"
4
+ require "nokogiri"
5
+ require "pathname"
6
+ require "uri"
7
+
8
+ module Indexmap
9
+ class Parser
10
+ Entry = Struct.new(:loc, :lastmod, :source_sitemap, keyword_init: true)
11
+
12
+ def initialize(path: default_path, rebase_remote_children: false, index_filename: Indexmap.configuration.index_filename, public_path: Indexmap.configuration.public_path)
13
+ @source = path.to_s
14
+ @rebase_remote_children = rebase_remote_children
15
+ @index_filename = index_filename
16
+ @public_path = public_path
17
+ end
18
+
19
+ def entries(reset: false)
20
+ reset! if reset
21
+ return @entries if defined?(@entries)
22
+
23
+ visited = Set.new
24
+ @entries = parse_source(@source, visited: visited)
25
+ end
26
+
27
+ def paths(reset: false)
28
+ reset! if reset
29
+ return @paths if defined?(@paths)
30
+
31
+ seen = Set.new
32
+ @paths = entries.map do |entry|
33
+ path = extract_path(entry.loc)
34
+ next if path.nil?
35
+
36
+ normalized = normalize_path(path)
37
+ next if seen.include?(normalized)
38
+
39
+ seen.add(normalized)
40
+ normalized
41
+ end.compact
42
+ end
43
+
44
+ def urls(base_url:, reset: false)
45
+ reset! if reset
46
+
47
+ target = URI.parse(base_url)
48
+ port_suffix = (target.port && ![80, 443].include?(target.port)) ? ":#{target.port}" : ""
49
+
50
+ paths.map do |path|
51
+ "#{target.scheme}://#{target.host}#{port_suffix}#{path}"
52
+ end
53
+ end
54
+
55
+ def reset!
56
+ remove_instance_variable(:@entries) if defined?(@entries)
57
+ remove_instance_variable(:@paths) if defined?(@paths)
58
+ end
59
+
60
+ private
61
+
62
+ attr_reader :index_filename, :public_path
63
+
64
+ def default_path
65
+ Indexmap::Path.existing_public_path(public_path: public_path, index_filename: index_filename)
66
+ end
67
+
68
+ def parse_source(source, visited:)
69
+ normalized_source = normalize_source(source)
70
+ return [] if normalized_source.nil? || visited.include?(normalized_source)
71
+
72
+ visited.add(normalized_source)
73
+ xml = read_source(normalized_source)
74
+ return [] if xml.to_s.strip.empty?
75
+
76
+ document = Nokogiri::XML(xml)
77
+ document.remove_namespaces!
78
+
79
+ if document.at_xpath("/sitemapindex")
80
+ document.xpath("//sitemap/loc").flat_map do |node|
81
+ child_source = resolve_child_sitemap(normalized_source, node.text.to_s.strip)
82
+ next [] if child_source.nil?
83
+
84
+ parse_source(child_source, visited: visited)
85
+ end
86
+ else
87
+ document.xpath("//url").map do |url_node|
88
+ loc = url_node.at_xpath("loc")&.text.to_s.strip
89
+ next if loc.empty?
90
+
91
+ lastmod = url_node.at_xpath("lastmod")&.text.to_s.strip
92
+ Entry.new(loc: loc, lastmod: lastmod.empty? ? nil : lastmod, source_sitemap: normalized_source)
93
+ end.compact
94
+ end
95
+ end
96
+
97
+ def resolve_child_sitemap(parent_source, loc)
98
+ return if loc.empty?
99
+
100
+ if remote_source?(parent_source)
101
+ parent_uri = URI.parse(parent_source)
102
+ if remote_source?(loc)
103
+ remote_child_source(parent_uri, loc)
104
+ else
105
+ URI.join(parent_uri.to_s, loc).to_s
106
+ end
107
+ elsif remote_source?(loc)
108
+ uri = URI.parse(loc)
109
+ File.join(File.dirname(parent_source), File.basename(uri.path))
110
+ else
111
+ File.expand_path(loc, File.dirname(parent_source))
112
+ end
113
+ rescue URI::InvalidURIError
114
+ File.expand_path(loc, File.dirname(parent_source))
115
+ end
116
+
117
+ def remote_child_source(parent_uri, loc)
118
+ child_uri = URI.parse(loc)
119
+ return child_uri.to_s unless @rebase_remote_children
120
+ return child_uri.to_s if child_uri.host == parent_uri.host && child_uri.port == parent_uri.port && child_uri.scheme == parent_uri.scheme
121
+
122
+ child_uri.scheme = parent_uri.scheme
123
+ child_uri.host = parent_uri.host
124
+ child_uri.port = parent_uri.port
125
+ child_uri.to_s
126
+ end
127
+
128
+ def normalize_source(source)
129
+ return if source.to_s.strip.empty?
130
+
131
+ if remote_source?(source)
132
+ URI.parse(source).to_s
133
+ else
134
+ Pathname(source).expand_path.to_s
135
+ end
136
+ rescue URI::InvalidURIError
137
+ nil
138
+ end
139
+
140
+ def remote_source?(value)
141
+ uri = URI.parse(value.to_s)
142
+ uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
143
+ rescue URI::InvalidURIError
144
+ false
145
+ end
146
+
147
+ def read_source(source)
148
+ if remote_source?(source)
149
+ fetch_remote_source(source)
150
+ elsif File.exist?(source)
151
+ File.read(source, encoding: "UTF-8")
152
+ end
153
+ end
154
+
155
+ def fetch_remote_source(source, redirects_remaining: 3)
156
+ uri = URI.parse(source)
157
+ request = Net::HTTP::Get.new(uri)
158
+ request["User-Agent"] = "Indexmap::Parser/1.0"
159
+
160
+ response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https", open_timeout: 10, read_timeout: 20) do |http|
161
+ http.request(request)
162
+ end
163
+
164
+ case response
165
+ when Net::HTTPSuccess
166
+ response.body
167
+ when Net::HTTPRedirection
168
+ return if redirects_remaining <= 0
169
+
170
+ location = response["location"].to_s
171
+ return if location.empty?
172
+
173
+ redirected = URI.join(source, location).to_s
174
+ fetch_remote_source(redirected, redirects_remaining: redirects_remaining - 1)
175
+ end
176
+ rescue URI::InvalidURIError
177
+ nil
178
+ end
179
+
180
+ def extract_path(loc)
181
+ return if loc.to_s.strip.empty?
182
+
183
+ if loc.start_with?("http://", "https://")
184
+ path = URI.parse(loc).path
185
+ (path.nil? || path.empty?) ? "/" : path
186
+ elsif loc.start_with?("/")
187
+ loc
188
+ else
189
+ "/#{loc}"
190
+ end
191
+ rescue URI::InvalidURIError
192
+ nil
193
+ end
194
+
195
+ def normalize_path(path)
196
+ return "/" if path == "/"
197
+
198
+ normalized = path.start_with?("/") ? path : "/#{path}"
199
+ normalized.delete_suffix("/")
200
+ end
201
+ end
202
+ end
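`Parser#paths` processes each `<loc>` in three steps:

1. It turns the value into a path. Absolute URLs keep only their path, and relative values gain a leading slash.
2. It strips trailing slashes, except on `/`.
3. It drops duplicates while preserving order.

`#urls` then rebases those paths onto `base_url`, omitting ports 80 and 443. The standalone re-creation below applies the same logic without Nokogiri or the gem. The names `unique_paths` and `rebase` are illustrative, not Parser methods.

```ruby
require "set"
require "uri"

# Standalone re-creation of Parser#extract_path, #normalize_path, #paths and
# #urls, for illustration only (no Nokogiri, no gem).
def extract_path(loc)
  return if loc.to_s.strip.empty?

  if loc.start_with?("http://", "https://")
    path = URI.parse(loc).path
    (path.nil? || path.empty?) ? "/" : path
  elsif loc.start_with?("/")
    loc
  else
    "/#{loc}"
  end
rescue URI::InvalidURIError
  nil
end

def normalize_path(path)
  return "/" if path == "/"

  normalized = path.start_with?("/") ? path : "/#{path}"
  normalized.delete_suffix("/")
end

# Hypothetical name: the order-preserving, de-duplicated list that #paths builds.
def unique_paths(locs)
  seen = Set.new
  locs.filter_map do |loc|
    path = extract_path(loc)
    next if path.nil?

    normalized = normalize_path(path)
    next unless seen.add?(normalized) # add? returns nil for a repeat

    normalized
  end
end

# Hypothetical name: the rebasing step from #urls (ports 80 and 443 are omitted).
def rebase(paths, base_url)
  target = URI.parse(base_url)
  port_suffix = (target.port && ![80, 443].include?(target.port)) ? ":#{target.port}" : ""
  paths.map { |path| "#{target.scheme}://#{target.host}#{port_suffix}#{path}" }
end

locs = ["https://example.com/", "https://example.com/about/", "/about", "articles/example"]
paths = unique_paths(locs)
p paths                                  # prints ["/", "/about", "/articles/example"]
p rebase(paths, "http://localhost:3000") # prints ["http://localhost:3000/", "http://localhost:3000/about", "http://localhost:3000/articles/example"]
```

This reproduces the `parser.paths` output shown in the README. Rebasing is useful, for example, when checking a production sitemap against a local server.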
@@ -0,0 +1,43 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "pathname"
4
+ require "uri"
5
+
6
+ module Indexmap
7
+ module Path
8
+ INDEX_FILENAME = "sitemap.xml"
9
+ LEGACY_FILENAME = "sitemap_index.xml"
10
+
11
+ module_function
12
+
13
+ def canonical_public_path(public_path: default_public_path_root, index_filename: default_index_filename)
14
+ Pathname(public_path).join(index_filename)
15
+ end
16
+
17
+ def existing_public_path(public_path: default_public_path_root, index_filename: default_index_filename, legacy_filename: LEGACY_FILENAME)
18
+ index_path = canonical_public_path(public_path: public_path, index_filename: index_filename)
19
+ return index_path if index_path.exist?
20
+
21
+ Pathname(public_path).join(legacy_filename)
22
+ end
23
+
24
+ def canonical_url(base_url, index_filename: default_index_filename)
25
+ URI.join(base_url, "/#{index_filename}").to_s
26
+ end
27
+
28
+ def default_index_filename
29
+ configured = Indexmap.configuration.index_filename
30
+ configured.to_s.strip.empty? ? INDEX_FILENAME : configured
31
+ rescue
32
+ INDEX_FILENAME
33
+ end
34
+
35
+ def default_public_path_root
36
+ if defined?(Rails)
37
+ Rails.public_path
38
+ else
39
+ Pathname("public")
40
+ end
41
+ end
42
+ end
43
+ end
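`Indexmap::Path.existing_public_path` prefers the configured index file (default `sitemap.xml`). Otherwise, it returns the legacy `sitemap_index.xml` path without checking whether that file exists. For a missing file, `Parser#read_source` returns `nil`, so parsing yields no entries. The sketch below re-creates the lookup outside the gem using a temporary directory. It also shows that `canonical_url`, which uses `URI.join(base_url, "/#{index_filename}")`, drops any path on the base URL.

```ruby
require "pathname"
require "tmpdir"
require "uri"

# Standalone re-creation of Indexmap::Path.existing_public_path: use the
# canonical index file if present, otherwise fall back to the legacy name
# (whether or not that file exists).
def existing_public_path(public_path, index_filename: "sitemap.xml", legacy_filename: "sitemap_index.xml")
  index_path = Pathname(public_path).join(index_filename)
  return index_path if index_path.exist?

  Pathname(public_path).join(legacy_filename)
end

Dir.mktmpdir do |dir|
  puts existing_public_path(dir).basename # prints sitemap_index.xml (no files yet)
  File.write(File.join(dir, "sitemap.xml"), "<sitemapindex/>")
  puts existing_public_path(dir).basename # prints sitemap.xml
end

# canonical_url joins an absolute path, so a path on the base URL is discarded.
puts URI.join("https://example.com/blog/", "/sitemap.xml") # prints https://example.com/sitemap.xml
```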
@@ -0,0 +1,58 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "logger"
4
+ require "uri"
5
+
6
+ module Indexmap
7
+ module Pinger
8
+ class Base
9
+ def self.ping(...)
10
+ new(...).ping
11
+ end
12
+
13
+ def initialize(configuration: Indexmap.configuration)
14
+ @configuration = configuration
15
+ end
16
+
17
+ def ping
18
+ sitemap_files.each do |sitemap_file|
19
+ ping_sitemap(sitemap_file)
20
+ end
21
+ end
22
+
23
+ def logger
24
+ @logger ||= if defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger
25
+ Rails.logger
26
+ else
27
+ Logger.new($stderr).tap do |logger|
28
+ logger.level = Logger::WARN
29
+ end
30
+ end
31
+ end
32
+
33
+ private
34
+
35
+ attr_reader :configuration
36
+
37
+ def host
38
+ configuration.base_url
39
+ end
40
+
41
+ def hostname
42
+ URI.parse(host).host
43
+ end
44
+
45
+ def root_domain
46
+ hostname.sub(/\Awww\./, "")
47
+ end
48
+
49
+ def sitemap_files
50
+ Dir.glob(configuration.public_path.join("sitemap*.xml")).sort
51
+ end
52
+
53
+ def ping_sitemap(_sitemap_file)
54
+ raise NotImplementedError
55
+ end
56
+ end
57
+ end
58
+ end
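`Pinger::Base` uses a template method:

1. `ping` globs `sitemap*.xml` under `public_path` and sorts the matches.
2. It hands each file to `ping_sitemap`.
3. Subclasses such as `Pinger::Google` override `ping_sitemap`.

The sketch below re-creates that shape without the gem. `SketchPinger` and `RecordingPinger` are hypothetical names, and the subclass records what it would submit instead of calling a search engine.

```ruby
require "pathname"
require "tmpdir"
require "uri"

# Minimal stand-in for Indexmap::Pinger::Base: #ping iterates sitemap files and
# delegates each one to #ping_sitemap, which subclasses must implement.
class SketchPinger
  def self.ping(...)
    new(...).ping
  end

  def initialize(public_path:, base_url:)
    @public_path = Pathname(public_path)
    @base_url = base_url
  end

  def ping
    sitemap_files.each { |file| ping_sitemap(file) }
  end

  private

  def root_domain
    URI.parse(@base_url).host.sub(/\Awww\./, "")
  end

  def sitemap_files
    Dir.glob(@public_path.join("sitemap*.xml")).sort
  end

  def ping_sitemap(_sitemap_file)
    raise NotImplementedError
  end
end

# Hypothetical subclass that records what it would submit instead of calling an API.
class RecordingPinger < SketchPinger
  attr_reader :submitted

  def ping
    @submitted = []
    super
    @submitted
  end

  private

  def ping_sitemap(sitemap_file)
    @submitted << [root_domain, File.basename(sitemap_file)]
  end
end

Dir.mktmpdir do |dir|
  %w[sitemap.xml sitemap_articles.xml robots.txt].each { |name| File.write(File.join(dir, name), "") }
  p RecordingPinger.ping(public_path: dir, base_url: "https://www.example.com")
  # prints [["example.com", "sitemap.xml"], ["example.com", "sitemap_articles.xml"]]
end
```

Because the pattern is `sitemap*.xml`, a legacy `sitemap_index.xml` next to `sitemap.xml` would also be pinged. Child sitemaps whose names do not start with `sitemap` would not be.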
@@ -0,0 +1,78 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "google/apis/searchconsole_v1"
4
+ require "googleauth"
5
+ require "json"
6
+ require "stringio"
7
+
8
+ module Indexmap
9
+ module Pinger
10
+ class Google < Base
11
+ def initialize(configuration: Indexmap.configuration, service: nil, credentials_builder: nil)
12
+ super(configuration: configuration)
13
+ @service = service
14
+ @credentials_builder = credentials_builder
15
+ end
16
+
17
+ def ping
18
+ if google_configuration.credentials.to_s.strip.empty?
19
+ logger.debug("Google sitemap credentials not configured.")
20
+ return
21
+ end
22
+
23
+ super
24
+ end
25
+
26
+ private
27
+
28
+ attr_reader :credentials_builder
29
+
30
+ def google_configuration
31
+ configuration.google
32
+ end
33
+
34
+ def ping_sitemap(sitemap_file)
35
+ sitemap_url = URI.join(host, File.basename(sitemap_file)).to_s
36
+
37
+ unless authorized?
38
+ logger.error("Google Search Console does not have access to the site: #{root_domain}")
39
+ return
40
+ end
41
+
42
+ webmasters_service.submit_sitemap(property_identifier, sitemap_url)
43
+ logger.debug { "Successfully pinged Google with sitemap: #{sitemap_url}" }
44
+ rescue ::Google::Apis::ClientError => e
45
+ logger.debug { "Failed to ping Google for #{sitemap_url}. Status: #{e.status_code}, Body: #{e.body}" }
46
+ end
47
+
48
+ def authorized?
49
+ webmasters_service.list_sites.site_entry.any? { |site| site.site_url.include?(root_domain) }
50
+ end
51
+
52
+ def property_identifier
53
+ property = google_configuration.property
54
+ property.to_s.strip.empty? ? "sc-domain:#{root_domain}" : property
55
+ end
56
+
57
+ def webmasters_service
58
+ @webmasters_service ||= begin
59
+ service = @service || ::Google::Apis::SearchconsoleV1::SearchConsoleService.new
60
+ service.authorization = authorizer
61
+ service
62
+ end
63
+ end
64
+
65
+ def authorizer
66
+ json_key = JSON.parse(google_configuration.credentials).to_json
67
+ scope = "https://www.googleapis.com/auth/webmasters"
68
+
69
+ return credentials_builder.call(credentials: json_key, scope: scope) if credentials_builder
70
+
71
+ ::Google::Auth::ServiceAccountCredentials.make_creds(
72
+ json_key_io: StringIO.new(json_key),
73
+ scope: scope
74
+ )
75
+ end
76
+ end
77
+ end
78
+ end
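`Pinger::Google#ping_sitemap` follows three steps:

1. It builds the sitemap URL with `URI.join(host, File.basename(sitemap_file))`.
2. It skips submission unless some site listed by `list_sites` contains the root domain.
3. It submits to the configured `property`, falling back to `sc-domain:<root domain>`.

The sketch below re-creates that decision logic outside the gem, using hypothetical fakes shaped like the Search Console objects the pinger touches. `FakeSearchConsole` and `submit_to_search_console` are illustrative names, not gem APIs.

```ruby
require "uri"

# Hypothetical fakes shaped like the Search Console objects the pinger touches:
# list_sites.site_entry[].site_url, submit_sitemap(site_url, feedpath), and an
# authorization writer (the real pinger assigns service.authorization).
SiteEntry = Struct.new(:site_url)
SitesList = Struct.new(:site_entry)

class FakeSearchConsole
  attr_accessor :authorization
  attr_reader :submissions

  def initialize(site_urls)
    @site_urls = site_urls
    @submissions = []
  end

  def list_sites
    SitesList.new(@site_urls.map { |url| SiteEntry.new(url) })
  end

  def submit_sitemap(site_url, feedpath)
    @submissions << [site_url, feedpath]
  end
end

# Standalone re-creation of Pinger::Google#ping_sitemap's decision logic.
def submit_to_search_console(service, host:, sitemap_file:, property: nil)
  root_domain = URI.parse(host).host.sub(/\Awww\./, "")
  sitemap_url = URI.join(host, File.basename(sitemap_file)).to_s
  authorized = service.list_sites.site_entry.any? { |site| site.site_url.include?(root_domain) }
  return :unauthorized unless authorized

  # Fall back to a Domain property when no explicit property is configured.
  identifier = property.to_s.strip.empty? ? "sc-domain:#{root_domain}" : property
  service.submit_sitemap(identifier, sitemap_url)
  :submitted
end

service = FakeSearchConsole.new(["sc-domain:example.com"])
submit_to_search_console(service, host: "https://www.example.com", sitemap_file: "public/sitemap.xml")
p service.submissions # prints [["sc-domain:example.com", "https://www.example.com/sitemap.xml"]]
```

If you inject a fake like this into the real class through `service:`, two constraints apply:

- The `credentials` setting must still be valid JSON, because `authorizer` calls `JSON.parse` before it consults `credentials_builder`.
- The real pinger runs `authorized?` once per sitemap file, so `list_sites` is requested once per file.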