indexmap 0.4.2 → 0.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 9b0f3c8b9175d0c18c39e52b61b3934129dfa8b8b637d06ad3d0d9adc30eb995
-   data.tar.gz: 3baf06c566ce17d63e189aabfdd7f3e7b4047d5908fc6a58f3e017da5ecef1de
+   metadata.gz: 38fa952c31358e79a900d348041a2475d5f597d19652837b50d27713f2004250
+   data.tar.gz: 6889768f6be1e01b6de1938687a757b050a52f7f9dbf2af58d1818cca2fdc977
  SHA512:
-   metadata.gz: 700e0cd5485cd433ceaa520a2d634eb4e5ccf93d4d8993224fae61acdf7dacd73c91029e24ff750cc58ce8704a5b00bb4ed66eff113d0693c46df75fc5aea4df
-   data.tar.gz: aa51396dad778e50b18a53606d316d5f9008ed358cecb7b75c0f7dc72f24d47a3959750102fdddafe50fbb31688256e16b07ec9c9c0ae53cb9431148593bb250
+   metadata.gz: 612858ebdac07d01107af653411182685e161b1e501670e569d84265e1dbcd455dbb6a7e55615f802d112037875d9eb15d94c11747a2c15e6cbd1b9f63b804de
+   data.tar.gz: 512df6e55dbad711516e9558cda07b0b90b65f7f950441e36699fe696965296690c52a478247b707ff9e15801b49b73a5d3428cea92dd1e0b49e9861c14580d2
data/CHANGELOG.md CHANGED
@@ -5,12 +5,12 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
- ## [0.4.2] - 2026-04-23
+ ## [0.5.0] - 2026-04-24
 
 
  ### Fixed
 
- - harden sitemap pinging and indexnow key handling (#7)
+ - namespace rake tasks and harden sitemap validation (#8)
 
 
 
data/README.md CHANGED
@@ -80,12 +80,12 @@ end
  Then run:
 
  ```bash
- bin/rails sitemap:create
- bin/rails sitemap:format
- bin/rails sitemap:validate
+ bin/rails indexmap:sitemap:create
+ bin/rails indexmap:sitemap:format
+ bin/rails indexmap:sitemap:validate
  ```
 
- `sitemap:create` is the main task. It writes sitemap files, formats them, and validates the result.
+ `indexmap:sitemap:create` is the main task. It writes sitemap files, formats them, and validates the result.
 
  ### Default Index Mode
 
@@ -129,8 +129,15 @@ Indexmap::Validator.new.validate!
  The built-in validator checks for:
 
  - missing sitemap files
+ - malformed sitemap XML
+ - empty sitemap files
+ - missing or duplicate child sitemap references
  - duplicate sitemap URLs
  - parameterized URLs in sitemap entries
+ - fragment URLs in sitemap entries
+ - non-HTTP or relative URLs
+ - URLs outside the configured `base_url`
+ - invalid `lastmod` values
 
  ## Search Engine Ping
 
@@ -139,11 +146,11 @@ The built-in validator checks for:
  Available rake tasks:
 
  ```bash
- bin/rails sitemap:validate
- bin/rails sitemap:google:ping
- bin/rails sitemap:index_now:ping
- bin/rails sitemap:index_now:write_key
- bin/rails sitemap:ping
+ bin/rails indexmap:sitemap:validate
+ bin/rails indexmap:google:ping
+ bin/rails indexmap:index_now:ping
+ bin/rails indexmap:index_now:write_key
+ bin/rails indexmap:ping
  ```
 
  ### Google Search Console
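
Every task in the README hunks above moved under the `indexmap` namespace, so scripts, cron entries, or CI steps that still call the old names will need updating. The mapping below is copied from the renamed lines in this diff; the `TASK_RENAMES` constant and the two helper methods are hypothetical names used only for illustration and are not part of the gem.

```ruby
# Old task name => new task name, as they appear in the README diff.
TASK_RENAMES = {
  "sitemap:create" => "indexmap:sitemap:create",
  "sitemap:format" => "indexmap:sitemap:format",
  "sitemap:validate" => "indexmap:sitemap:validate",
  "sitemap:google:ping" => "indexmap:google:ping",
  "sitemap:index_now:ping" => "indexmap:index_now:ping",
  "sitemap:index_now:write_key" => "indexmap:index_now:write_key",
  "sitemap:ping" => "indexmap:ping"
}.freeze

# Hypothetical helper: returns the 0.5.0 name for a 0.4.x task name,
# or the input unchanged when it is not a known old name.
def migrate_task_name(name)
  TASK_RENAMES.fetch(name, name)
end

# Hypothetical helper: rewrites old task names inside a command line.
# The lookbehind skips names that are already prefixed (for example
# "indexmap:sitemap:create"), so running it twice changes nothing.
def migrate_command(command)
  command.gsub(/(?<![\w:])sitemap:[a-z_:]+/) { |task| migrate_task_name(task) }
end

puts migrate_command("bin/rails sitemap:google:ping")
```

Note that the mapping is not a plain prefix swap: only `create`, `format`, and `validate` keep the `sitemap` segment, while the ping and IndexNow tasks drop it.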
@@ -156,7 +163,7 @@ Indexmap.configure do |config|
  end
  ```
 
- If `config.google.credentials` is blank, `sitemap:google:ping` skips Google submission.
+ If `config.google.credentials` is blank, `indexmap:google:ping` skips Google submission.
 
  You can optionally override the Search Console property identifier:
 
@@ -184,21 +191,21 @@ Indexmap.configure do |config|
  end
  ```
 
- If `config.index_now.key` is set, `sitemap:create` also writes the matching `public/<key>.txt` verification file automatically.
+ If `config.index_now.key` is set, `indexmap:sitemap:create` also writes the matching `public/<key>.txt` verification file automatically.
 
  If you prefer the file-based flow, run:
 
  ```bash
- bin/rails sitemap:index_now:write_key
+ bin/rails indexmap:index_now:write_key
  ```
 
  That task:
 
  - reuses an existing valid key file when present
  - otherwise generates a new key in `public/<key>.txt`
- - makes that key available to `sitemap:index_now:ping` without adding `config.index_now.key`
+ - makes that key available to `indexmap:index_now:ping` without adding `config.index_now.key`
 
- If neither a configured key nor a valid key file is present, `sitemap:index_now:ping` skips IndexNow submission.
+ If neither a configured key nor a valid key file is present, `indexmap:index_now:ping` skips IndexNow submission.
 
  ## Development
 
@@ -1,5 +1,10 @@
  # frozen_string_literal: true
 
+ require "nokogiri"
+ require "date"
+ require "time"
+ require "uri"
+
  module Indexmap
    class Validator
      def initialize(configuration: Indexmap.configuration, path: nil)
@@ -14,9 +19,15 @@ module Indexmap
        )
        raise ValidationError, "Missing sitemap file: #{sitemap_path}" unless File.exist?(sitemap_path)
 
+       validate_sitemap_file!(sitemap_path)
        entries = Parser.new(path: sitemap_path).entries
+       validate_presence!(entries)
        validate_duplicates!(entries)
        validate_parameterized_urls!(entries)
+       validate_fragment_urls!(entries)
+       validate_absolute_http_urls!(entries)
+       validate_same_host_urls!(entries)
+       validate_lastmods!(entries)
        true
      end
 
@@ -24,6 +35,65 @@ module Indexmap
 
      attr_reader :configuration, :path
 
+     def validate_sitemap_file!(sitemap_path)
+       document = read_xml_document(sitemap_path)
+       root_name = document.root&.name
+
+       case root_name
+       when "urlset"
+         validate_urlset_document!(document, sitemap_path)
+       when "sitemapindex"
+         validate_sitemap_index_document!(document, sitemap_path)
+       else
+         raise ValidationError, "Invalid sitemap root element in #{sitemap_path}: #{root_name || "none"}"
+       end
+     end
+
+     def read_xml_document(file_path)
+       document = Nokogiri::XML(File.read(file_path, encoding: "UTF-8")) { |config| config.strict }
+       document.remove_namespaces!
+       document
+     rescue Nokogiri::XML::SyntaxError => error
+       raise ValidationError, "Invalid sitemap XML in #{file_path}: #{error.message.lines.first.strip}"
+     end
+
+     def validate_urlset_document!(document, sitemap_path)
+       return if document.xpath("/urlset/url/loc").any?
+
+       raise ValidationError, "Sitemap has no URLs: #{sitemap_path}"
+     end
+
+     def validate_sitemap_index_document!(document, sitemap_path)
+       child_locations = document.xpath("/sitemapindex/sitemap/loc").map { |node| node.text.to_s.strip }.reject(&:empty?)
+       raise ValidationError, "Sitemap index has no child sitemap URLs: #{sitemap_path}" if child_locations.empty?
+
+       duplicate_children = child_locations.group_by(&:itself).select { |_loc, values| values.size > 1 }.keys
+       unless duplicate_children.empty?
+         raise ValidationError, "Duplicate child sitemap URLs detected: #{duplicate_children.first(5).join(", ")}"
+       end
+
+       child_locations.each do |location|
+         child_path = local_child_path(sitemap_path, location)
+         raise ValidationError, "Missing child sitemap file: #{child_path}" unless File.exist?(child_path)
+
+         validate_sitemap_file!(child_path)
+       end
+     end
+
+     def local_child_path(sitemap_path, location)
+       uri = URI.parse(location)
+       filename = (uri.absolute? || location.start_with?("/")) ? File.basename(uri.path) : location
+       File.expand_path(filename, File.dirname(sitemap_path))
+     rescue URI::InvalidURIError
+       File.expand_path(location, File.dirname(sitemap_path))
+     end
+
+     def validate_presence!(entries)
+       return unless entries.empty?
+
+       raise ValidationError, "Sitemap has no URLs"
+     end
+
      def validate_duplicates!(entries)
        duplicates = entries.map(&:loc).group_by(&:itself).select { |_url, values| values.size > 1 }.keys
        return if duplicates.empty?
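
The `local_child_path` method added in the hunk above decides which local file a child `<loc>` in a sitemap index refers to. The sketch below copies that method's body into a standalone top-level method so its behavior can be tried in isolation; the `/srv/app/public` directory and the example URLs are made-up inputs, and the results assume a Unix-like filesystem.

```ruby
require "uri"

# Standalone copy of the private method from the diff, for experimentation only.
def local_child_path(sitemap_path, location)
  uri = URI.parse(location)
  # Absolute URLs and root-relative paths are reduced to their file name.
  filename = (uri.absolute? || location.start_with?("/")) ? File.basename(uri.path) : location
  File.expand_path(filename, File.dirname(sitemap_path))
rescue URI::InvalidURIError
  File.expand_path(location, File.dirname(sitemap_path))
end

index = "/srv/app/public/sitemap.xml" # hypothetical index location

puts local_child_path(index, "https://example.com/sitemap-pages.xml")
# => /srv/app/public/sitemap-pages.xml
puts local_child_path(index, "https://example.com/nested/sitemap-posts.xml")
# => /srv/app/public/sitemap-posts.xml
puts local_child_path(index, "sitemap-news.xml")
# => /srv/app/public/sitemap-news.xml
```

One consequence worth noting: for absolute URLs only the file name is kept, so a child sitemap served from a subdirectory is still looked up next to the index file.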
@@ -37,5 +107,55 @@ module Indexmap
 
        raise ValidationError, "Parameterized sitemap URLs detected: #{param_urls.first(5).join(", ")}"
      end
+
+     def validate_fragment_urls!(entries)
+       fragment_urls = entries.map(&:loc).select { |url| parse_uri(url)&.fragment }
+       return if fragment_urls.empty?
+
+       raise ValidationError, "Fragment sitemap URLs detected: #{fragment_urls.first(5).join(", ")}"
+     end
+
+     def validate_absolute_http_urls!(entries)
+       invalid_urls = entries.map(&:loc).reject do |url|
+         uri = parse_uri(url)
+         uri&.absolute? && %w[http https].include?(uri.scheme)
+       end
+       return if invalid_urls.empty?
+
+       raise ValidationError, "Invalid sitemap URLs detected: #{invalid_urls.first(5).join(", ")}"
+     end
+
+     def validate_same_host_urls!(entries)
+       base_uri = parse_uri(configuration.base_url)
+       return unless base_uri&.host
+
+       invalid_urls = entries.map(&:loc).reject do |url|
+         uri = parse_uri(url)
+         uri&.host == base_uri.host && uri&.scheme == base_uri.scheme && uri&.port == base_uri.port
+       end
+       return if invalid_urls.empty?
+
+       raise ValidationError, "Sitemap URLs outside configured base URL detected: #{invalid_urls.first(5).join(", ")}"
+     end
+
+     def validate_lastmods!(entries)
+       invalid_entries = entries.select do |entry|
+         next false if entry.lastmod.nil?
+
+         Date.iso8601(entry.lastmod)
+         false
+       rescue ArgumentError
+         true
+       end
+       return if invalid_entries.empty?
+
+       raise ValidationError, "Invalid sitemap lastmod values detected: #{invalid_entries.first(5).map(&:loc).join(", ")}"
+     end
+
+     def parse_uri(url)
+       URI.parse(url.to_s)
+     rescue URI::InvalidURIError
+       nil
+     end
    end
  end
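
The new URL and `lastmod` checks rely only on Ruby's standard `uri` and `date` libraries, so their rules can be explored without the gem. The following is a minimal sketch under that assumption: `url_problems` and `valid_lastmod?` are hypothetical helpers, not part of Indexmap, and unlike the validator (which raises on the first failing check) the sketch collects every problem it finds for a single URL.

```ruby
require "date"
require "uri"

# Same shape as the private parse_uri in the diff: nil for unparseable input.
def parse_uri(url)
  URI.parse(url.to_s)
rescue URI::InvalidURIError
  nil
end

# Hypothetical helper: true when the value parses as an ISO 8601 date.
def valid_lastmod?(value)
  Date.iso8601(value)
  true
rescue ArgumentError
  false
end

# Hypothetical helper: returns symbols naming each rule the URL breaks.
def url_problems(loc, base_url:, lastmod: nil)
  uri = parse_uri(loc)
  base = parse_uri(base_url)
  problems = []

  problems << :fragment if uri&.fragment
  problems << :not_absolute_http unless uri&.absolute? && %w[http https].include?(uri.scheme)

  # Host, scheme, and port must all match the configured base URL.
  same_origin = uri&.host == base&.host && uri&.scheme == base&.scheme && uri&.port == base&.port
  problems << :outside_base_url unless same_origin

  problems << :invalid_lastmod if lastmod && !valid_lastmod?(lastmod)
  problems
end

p url_problems("https://example.com/about#team", base_url: "https://example.com")
p url_problems("/about", base_url: "https://example.com")
p url_problems("https://example.com/about", base_url: "https://example.com", lastmod: "not-a-date")
```

Because the comparison includes scheme and port, an `http://` entry fails the base URL check when `base_url` is configured with `https://`, even if the host is identical.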
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Indexmap
-   VERSION = "0.4.2"
+   VERSION = "0.5.0"
  end
@@ -1,35 +1,37 @@
- namespace :sitemap do
-   desc "Create sitemap files"
-   task create: :environment do
-     runner = Indexmap::TaskRunner.new
-     create_result = runner.create
-     runner.format
-     validated_files = runner.validate
-
-     puts "Created, formatted, and validated #{file_count(validated_files)} in #{public_directory(runner)}."
-     puts "IndexNow key file: #{create_result[:index_now_key_path]}" if create_result[:index_now_key_path]
-   end
+ namespace :indexmap do
+   namespace :sitemap do
+     desc "Create sitemap files"
+     task create: :environment do
+       runner = Indexmap::TaskRunner.new
+       create_result = runner.create
+       runner.format
+       validated_files = runner.validate
+
+       puts "Created, formatted, and validated #{file_count(validated_files)} in #{public_directory(runner)}."
+       puts "IndexNow key file: #{create_result[:index_now_key_path]}" if create_result[:index_now_key_path]
+     end
 
-   desc "Format sitemap files for better readability"
-   task format: :environment do
-     runner = Indexmap::TaskRunner.new
-     formatted_files = runner.format
+     desc "Format sitemap files for better readability"
+     task format: :environment do
+       runner = Indexmap::TaskRunner.new
+       formatted_files = runner.format
 
-     puts "Formatted #{file_count(formatted_files)} in #{public_directory(runner)}."
-   end
+       puts "Formatted #{file_count(formatted_files)} in #{public_directory(runner)}."
+     end
 
-   desc "Validate sitemap shape and URL hygiene"
-   task validate: :environment do
-     runner = Indexmap::TaskRunner.new
-     validated_files = runner.validate
+     desc "Validate sitemap shape and URL hygiene"
+     task validate: :environment do
+       runner = Indexmap::TaskRunner.new
+       validated_files = runner.validate
 
-     puts "Validated #{file_count(validated_files)} for sitemap shape and URL hygiene."
+       puts "Validated #{file_count(validated_files)} for sitemap shape and URL hygiene."
+     end
    end
 
    desc "Ping all configured search engines"
    task ping: :environment do
-     Rake::Task["sitemap:index_now:ping"].invoke
-     Rake::Task["sitemap:google:ping"].invoke
+     Rake::Task["indexmap:index_now:ping"].invoke
+     Rake::Task["indexmap:google:ping"].invoke
    end
 
    namespace :google do
@@ -8,30 +8,41 @@ class IndexmapConfigurationTest < Minitest::Test
    end
 
    def test_writer_builds_from_configured_callables
-     Indexmap.configure do |config|
-       config.base_url = -> { "https://example.com" }
-       config.public_path = -> { Pathname("tmp/public") }
-       config.sections = -> do
-         [Indexmap::Section.new(filename: "sitemap-pages.xml", entries: [Indexmap::Entry.new(loc: "https://example.com/")])]
+     Dir.mktmpdir do |dir|
+       public_path = Pathname(dir)
+
+       Indexmap.configure do |config|
+         config.base_url = -> { "https://example.com" }
+         config.public_path = -> { public_path }
+         config.sections = -> do
+           [Indexmap::Section.new(filename: "sitemap-pages.xml", entries: [Indexmap::Entry.new(loc: "https://example.com/")])]
+         end
        end
-     end
 
-     writer = Indexmap.configuration.writer
+       Indexmap.configuration.writer.write
 
-     assert_equal Pathname("tmp/public"), writer.instance_variable_get(:@public_path)
+       assert_includes public_path.join("sitemap.xml").read, "<loc>https://example.com/sitemap-pages.xml</loc>"
+       assert_includes public_path.join("sitemap-pages.xml").read, "<loc>https://example.com/</loc>"
+     end
    end
 
    def test_writer_builds_single_file_writer_from_configured_entries
-     Indexmap.configure do |config|
-       config.base_url = "https://example.com"
-       config.format = :single_file
-       config.entries = -> { [Indexmap::Entry.new(loc: "https://example.com/")] }
-     end
+     Dir.mktmpdir do |dir|
+       public_path = Pathname(dir)
+
+       Indexmap.configure do |config|
+         config.base_url = "https://example.com"
+         config.public_path = public_path
+         config.format = :single_file
+         config.entries = -> { [Indexmap::Entry.new(loc: "https://example.com/")] }
+       end
 
-     writer = Indexmap.configuration.writer
+       Indexmap.configuration.writer.write
 
-     assert_equal :single_file, writer.instance_variable_get(:@format)
-     assert_equal [Indexmap::Entry.new(loc: "https://example.com/")], writer.instance_variable_get(:@entries)
+       assert_includes public_path.join("sitemap.xml").read, "<urlset"
+       assert_includes public_path.join("sitemap.xml").read, "<loc>https://example.com/</loc>"
+       refute public_path.join("sitemap-pages.xml").exist?
+     end
    end
 
    def test_writer_raises_without_base_url
@@ -52,6 +52,120 @@ class IndexmapValidatorTest < Minitest::Test
      end
    end
 
+   def test_validate_raises_for_fragment_urls
+     Dir.mktmpdir do |directory|
+       path = Pathname(directory).join("sitemap.xml")
+       path.write(<<~XML)
+         <?xml version="1.0" encoding="UTF-8"?>
+         <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+           <url><loc>https://example.com/about#team</loc></url>
+         </urlset>
+       XML
+
+       error = assert_raises(Indexmap::ValidationError) do
+         Indexmap::Validator.new(path: path).validate!
+       end
+
+       assert_equal "Fragment sitemap URLs detected: https://example.com/about#team", error.message
+     end
+   end
+
+   def test_validate_raises_for_relative_urls
+     Dir.mktmpdir do |directory|
+       path = Pathname(directory).join("sitemap.xml")
+       path.write(<<~XML)
+         <?xml version="1.0" encoding="UTF-8"?>
+         <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+           <url><loc>/about</loc></url>
+         </urlset>
+       XML
+
+       error = assert_raises(Indexmap::ValidationError) do
+         Indexmap::Validator.new(path: path).validate!
+       end
+
+       assert_equal "Invalid sitemap URLs detected: /about", error.message
+     end
+   end
+
+   def test_validate_raises_for_urls_outside_configured_base_url
+     Dir.mktmpdir do |directory|
+       path = Pathname(directory).join("sitemap.xml")
+       path.write(<<~XML)
+         <?xml version="1.0" encoding="UTF-8"?>
+         <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+           <url><loc>https://other.example.com/about</loc></url>
+         </urlset>
+       XML
+
+       configuration = Indexmap::Configuration.new
+       configuration.base_url = "https://example.com"
+
+       error = assert_raises(Indexmap::ValidationError) do
+         Indexmap::Validator.new(configuration: configuration, path: path).validate!
+       end
+
+       assert_equal "Sitemap URLs outside configured base URL detected: https://other.example.com/about", error.message
+     end
+   end
+
+   def test_validate_raises_for_invalid_lastmod_values
+     Dir.mktmpdir do |directory|
+       path = Pathname(directory).join("sitemap.xml")
+       path.write(<<~XML)
+         <?xml version="1.0" encoding="UTF-8"?>
+         <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+           <url>
+             <loc>https://example.com/about</loc>
+             <lastmod>not-a-date</lastmod>
+           </url>
+         </urlset>
+       XML
+
+       error = assert_raises(Indexmap::ValidationError) do
+         Indexmap::Validator.new(path: path).validate!
+       end
+
+       assert_equal "Invalid sitemap lastmod values detected: https://example.com/about", error.message
+     end
+   end
+
+   def test_validate_raises_for_empty_sitemaps
+     Dir.mktmpdir do |directory|
+       path = Pathname(directory).join("sitemap.xml")
+       path.write(<<~XML)
+         <?xml version="1.0" encoding="UTF-8"?>
+         <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+         </urlset>
+       XML
+
+       error = assert_raises(Indexmap::ValidationError) do
+         Indexmap::Validator.new(path: path).validate!
+       end
+
+       assert_equal "Sitemap has no URLs: #{path}", error.message
+     end
+   end
+
+   def test_validate_raises_for_missing_child_sitemap_files
+     Dir.mktmpdir do |directory|
+       path = Pathname(directory).join("sitemap.xml")
+       child_path = Pathname(directory).join("sitemap-pages.xml")
+       path.write(<<~XML)
+         <?xml version="1.0" encoding="UTF-8"?>
+         <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+           <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
+         </sitemapindex>
+       XML
+
+       error = assert_raises(Indexmap::ValidationError) do
+         Indexmap::Validator.new(path: path).validate!
+       end
+
+       assert_equal "Missing child sitemap file: #{child_path}", error.message
+     end
+   end
+
    def test_validate_passes_for_valid_sitemap
      Dir.mktmpdir do |directory|
        path = Pathname(directory).join("sitemap.xml")
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: indexmap
  version: !ruby/object:Gem::Version
-   version: 0.4.2
+   version: 0.5.0
  platform: ruby
  authors:
  - Paulo Fidalgo