sitemap_generator 3.1.1 → 3.2
- data/Gemfile.lock +1 -1
- data/README.md +455 -376
- data/VERSION +1 -1
- data/lib/sitemap_generator.rb +1 -0
- data/lib/sitemap_generator/adapters/s3_adapter.rb +25 -0
- data/lib/sitemap_generator/builder/sitemap_file.rb +2 -1
- data/lib/sitemap_generator/builder/sitemap_url.rb +12 -4
- data/lib/sitemap_generator/link_set.rb +2 -3
- data/spec/sitemap_generator/link_set_spec.rb +13 -10
- data/spec/sitemap_generator/mobile_sitemap_spec.rb +27 -0
- data/spec/sitemap_generator/sitemap_generator_spec.rb +3 -3
- data/spec/sitemap_generator/sitemap_groups_spec.rb +5 -5
- data/spec/sitemap_generator/video_sitemap_spec.rb +6 -0
- data/spec/support/schemas/sitemap-mobile.xsd +32 -0
- metadata +17 -12
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -1,98 +1,104 @@
|
|
1
|
-
SitemapGenerator
|
2
|
-
================
|
1
|
+
# SitemapGenerator
|
3
2
|
|
4
3
|
SitemapGenerator is the easiest way to generate Sitemaps in Ruby. Rails integration provides access to the Rails route helpers within your sitemap config file and automatically makes the rake tasks available to you. Or if you prefer to use another framework, you can! You can use the rake tasks provided or run your sitemap configs as plain ruby scripts.
|
5
4
|
|
6
5
|
Sitemaps adhere to the [Sitemap 0.9 protocol][sitemap_protocol] specification.
|
7
6
|
|
8
|
-
Features
|
9
|
-
-------
|
7
|
+
## Features
|
10
8
|
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
9
|
+
* Framework agnostic
|
10
|
+
* Supports [News sitemaps][sitemap_news], [Video sitemaps][sitemap_video], [Image sitemaps][sitemap_images], [Geo sitemaps][sitemap_geo] and [Mobile sitemaps][sitemap_mobile]
|
11
|
+
* Supports read-only filesystems like Heroku via uploading to a remote host like Amazon S3
|
12
|
+
* Compatible with Rails 2 & 3
|
13
|
+
* Adheres to the [Sitemap 0.9 protocol][sitemap_protocol]
|
14
|
+
* Handles millions of links
|
15
|
+
* Automatically compresses your sitemaps
|
16
|
+
* Notifies search engines (Google, Bing, SitemapWriter) of new sitemaps
|
17
|
+
* Ensures your old sitemaps stay in place if the new sitemap fails to generate
|
18
|
+
* Gives you complete control over your sitemaps and their content
|
21
19
|
|
22
|
-
|
23
|
-
|
20
|
+
|
21
|
+
### Show Me
|
24
22
|
|
25
23
|
Install:
|
26
24
|
|
27
|
-
|
25
|
+
```
|
26
|
+
gem install sitemap_generator
|
27
|
+
```
|
28
28
|
|
29
29
|
Create `sitemap.rb`:
|
30
30
|
|
31
|
-
|
32
|
-
|
31
|
+
```ruby
|
32
|
+
require 'rubygems'
|
33
|
+
require 'sitemap_generator'
|
33
34
|
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
39
|
-
|
35
|
+
SitemapGenerator::Sitemap.default_host = 'http://example.com'
|
36
|
+
SitemapGenerator::Sitemap.create do
|
37
|
+
add '/home', :changefreq => 'daily', :priority => 0.9
|
38
|
+
add '/contact_us', :changefreq => 'weekly'
|
39
|
+
end
|
40
|
+
SitemapGenerator::Sitemap.ping_search_engines # called for you when you use the rake task
|
41
|
+
```
|
40
42
|
|
41
43
|
Run it:
|
42
44
|
|
43
|
-
|
45
|
+
```
|
46
|
+
ruby sitemap.rb
|
47
|
+
```
|
44
48
|
|
45
49
|
Output:
|
46
50
|
|
47
|
-
|
48
|
-
|
49
|
-
|
50
|
-
|
51
|
+
```
|
52
|
+
In /Users/karl/projects/sitemap_generator-test/public/
|
53
|
+
+ sitemap1.xml.gz 3 links / 357 Bytes
|
54
|
+
+ sitemap_index.xml.gz 1 sitemaps / 228 Bytes
|
55
|
+
Sitemap stats: 3 links / 1 sitemaps / 0m00s
|
56
|
+
|
57
|
+
Successful ping of Google
|
58
|
+
Successful ping of Bing
|
59
|
+
Successful ping of Sitemap Writer
|
60
|
+
```
|
51
61
|
|
52
|
-
Successful ping of Google
|
53
|
-
Successful ping of Ask
|
54
|
-
Successful ping of Bing
|
55
|
-
Successful ping of Sitemap Writer
|
56
62
|
|
57
|
-
Contribute
|
58
|
-
-------
|
63
|
+
## Contribute
|
59
64
|
|
60
65
|
Does your website use SitemapGenerator to generate Sitemaps? Where would you be without Sitemaps? Probably still knocking rocks together. Consider donating to the project to keep it up-to-date and open source.
|
61
66
|
|
62
67
|
<a href='http://www.pledgie.com/campaigns/15267'><img alt='Click here to lend your support to: SitemapGenerator and make a donation at www.pledgie.com !' src='http://pledgie.com/campaigns/15267.png?skin_name=chrome' border='0' /></a>
|
63
68
|
|
64
|
-
Changelog
|
65
|
-
-------
|
66
69
|
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
|
70
|
+
## Changelog
|
71
|
+
|
72
|
+
* v3.2: **Support mobile tags**, **SitemapGenerator::S3Adapter** a simple S3 adapter which uses Fog and doesn't require CarrierWave; Remove Ask from the sitemap ping because the service has been shut down; [Turn off `include_index`][include_index_change] by default; Fix the news XML namespace; Only include autoplay attribute if present
|
73
|
+
* v3.1.1: Bugfix: Groups inherit current adapter
|
74
|
+
* v3.1.0: Add `add_to_index` method to add links to the sitemap index. Add `sitemap` method for accessing the LinkSet instance from within `create()`. Don't modify options hashes passed to methods. Fix and improve `yield_sitemap` option handling.
|
75
|
+
* **v3.0.0: Framework agnostic**; fix alignment in output, show directory sitemaps are being generated into, only show sitemap compressed file size; toggle output using VERBOSE environment variable; remove tasks/ directory because it's deprecated in Rails 2; Simplify dependencies.
|
76
|
+
* v2.2.1: Support adding new search engines to ping and modifying the default search engines.
|
71
77
|
Allow the URL of the sitemap index to be passed as an argument to `ping_search_engines`. See **Pinging Search Engines**.
|
72
|
-
|
73
|
-
|
78
|
+
* v2.1.8: Extend and improve Video Sitemap support. Include sitemap docs in the README, support all element attributes, properly format values.
|
79
|
+
* v2.1.7: Improve format of float priorities; Remove Yahoo from ping - the Yahoo
|
74
80
|
service has been shut down.
|
75
|
-
|
76
|
-
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
81
|
-
|
82
|
-
|
83
|
-
|
84
|
-
|
85
|
-
|
86
|
-
|
87
|
-
|
88
|
-
|
89
|
-
|
90
|
-
|
91
|
-
|
92
|
-
|
93
|
-
|
94
|
-
|
95
|
-
|
81
|
+
* v2.1.6: Fix the lastmod value on sitemap file links
|
82
|
+
* v2.1.5: Fix verbose setting in the rake tasks; should default to true
|
83
|
+
* v2.1.4: Allow special characters in URLs (don't use URI.join to construct URLs)
|
84
|
+
* v2.1.3: Fix calling create with both `filename` and `sitemaps_namer` options
|
85
|
+
* v2.1.2: Support multiple videos per url using the new `videos` option to `add()`.
|
86
|
+
* v2.1.1: Support calling `create()` multiple times in a sitemap config. Support host names with path segments so you can use a `default_host` like `'http://mysite.com/subdirectory/'`. Turn off `include_index` when the `sitemaps_host` differs from `default_host`. Add docs about how to upload to remote hosts.
|
87
|
+
* v2.1.0: [News sitemap][sitemap_news] support
|
88
|
+
* v2.0.1.pre2: Fix uploading to the (bucket) root on a remote server
|
89
|
+
* v2.0.1.pre1: Support read-only filesystems like Heroku by supporting uploading to remote host
|
90
|
+
* v2.0.1: Minor improvements to verbose handling; prevent missing Timeout issue
|
91
|
+
* **v2.0.0: Introducing a new simpler API, Sitemap Groups, Sitemap Namers and more!**
|
92
|
+
* v1.5.0: New options `include_root`, `include_index`; Major testing & refactoring
|
93
|
+
* v1.4.0: [Geo sitemap][geo_tags] support, multiple sitemap support via CONFIG_FILE rake option
|
94
|
+
* v1.3.0: Support setting the sitemaps path
|
95
|
+
* v1.2.0: Verified working with Rails 3 stable release
|
96
|
+
* v1.1.0: [Video sitemap][sitemap_video] support
|
97
|
+
* v0.2.6: [Image Sitemap][sitemap_images] support
|
98
|
+
* v0.2.5: Rails 3 prerelease support (beta)
|
99
|
+
|
100
|
+
|
101
|
+
## Foreword
|
96
102
|
|
97
103
|
Adam Salter first created SitemapGenerator while we were working together in Sydney, Australia. Unfortunately, he passed away in 2009. Since then I have taken over development of SitemapGenerator.
|
98
104
|
|
@@ -100,46 +106,53 @@ Those who knew him know what an amazing guy he was, and what an excellent Rails
|
|
100
106
|
|
101
107
|
The canonical repository is now: [http://github.com/kjvarga/sitemap_generator][canonical_repo]
|
102
108
|
|
103
|
-
Install
|
104
|
-
=======
|
105
109
|
|
106
|
-
|
107
|
-
|
110
|
+
## Install
|
111
|
+
|
112
|
+
### Ruby
|
108
113
|
|
109
|
-
|
114
|
+
```
|
115
|
+
gem install 'sitemap_generator'
|
116
|
+
```
|
110
117
|
|
111
118
|
To use the rake tasks add the following to your `Rakefile`:
|
112
119
|
|
113
|
-
|
120
|
+
```ruby
|
121
|
+
require 'sitemap_generator/tasks'
|
122
|
+
```
|
114
123
|
|
115
124
|
The Rake tasks expect your sitemap to be at `config/sitemap.rb` but if you need to change that call like so: `rake sitemap:refresh CONFIG_FILE="path/to/sitemap.rb"`
|
116
125
|
|
117
|
-
Rails
|
118
|
-
-----
|
126
|
+
### Rails
|
119
127
|
|
120
128
|
Add the gem to your `Gemfile`:
|
121
129
|
|
122
|
-
|
130
|
+
```ruby
|
131
|
+
gem 'sitemap_generator'
|
132
|
+
```
|
123
133
|
|
124
134
|
Alternatively, if you are not using a `Gemfile` add the gem to your `config/environment.rb` file config block:
|
125
135
|
|
126
|
-
|
136
|
+
```ruby
|
137
|
+
config.gem 'sitemap_generator'
|
138
|
+
```
|
139
|
+
|
127
140
|
|
128
141
|
**Rails 1 or 2 only**, add the following code to your `Rakefile` to include the gem's Rake tasks in your project (Rails 3 does this for you automatically, so this step is not necessary):
|
129
142
|
|
130
|
-
|
131
|
-
|
132
|
-
|
133
|
-
|
134
|
-
|
143
|
+
```ruby
|
144
|
+
begin
|
145
|
+
require 'sitemap_generator/tasks'
|
146
|
+
rescue Exception => e
|
147
|
+
puts "Warning, couldn't load gem tasks: #{e.message}! Skipping..."
|
148
|
+
end
|
149
|
+
```
|
135
150
|
|
136
|
-
|
151
|
+
_If you would prefer to install as a plugin (deprecated) don't do any of the above. Simply run `script/plugin install git://github.com/kjvarga/sitemap_generator.git` from your application root directory._
|
137
152
|
|
138
|
-
Getting Started
|
139
|
-
======
|
153
|
+
## Getting Started
|
140
154
|
|
141
|
-
Preventing Output
|
142
|
-
-----
|
155
|
+
### Preventing Output
|
143
156
|
|
144
157
|
To disable all non-essential output set the environment variable `VERBOSE=false` when calling Rake or running your Ruby script.
|
145
158
|
|
@@ -147,65 +160,73 @@ Alternatively you can pass the `-s` option to Rake, for example `rake -s sitemap
|
|
147
160
|
|
148
161
|
To disable output in-code use the following:
|
149
162
|
|
150
|
-
|
151
|
-
|
152
|
-
|
153
|
-
-----
|
154
|
-
|
155
|
-
Run `rake sitemap:install` to create a `config/sitemap.rb` file which is your sitemap configuration and contains everything needed to build your sitemap. See **Sitemap Configuration** below for more information about how to define your sitemap.
|
163
|
+
```ruby
|
164
|
+
SitemapGenerator.verbose = false
|
165
|
+
```
|
156
166
|
|
157
|
-
|
167
|
+
### Rake Tasks
|
158
168
|
|
159
|
-
`rake sitemap:
|
169
|
+
* `rake sitemap:install` will create a `config/sitemap.rb` file which is your sitemap configuration and contains everything needed to build your sitemap. See **Sitemap Configuration** below for more information about how to define your sitemap.
|
170
|
+
* `rake sitemap:refresh` will create or rebuild your sitemap files as needed. Sitemaps are generated into the `public/` folder and by default are named `sitemap_index.xml.gz`, `sitemap1.xml.gz`, `sitemap2.xml.gz`, etc. As you can see they are automatically gzip compressed for you.
|
171
|
+
* `rake sitemap:refresh` will output information about each sitemap that is written including its location, how many links it contains and the size of the file.
|
160
172
|
|
161
173
|
|
162
|
-
Pinging Search Engines
|
163
|
-
-----
|
174
|
+
### Pinging Search Engines
|
164
175
|
|
165
|
-
Using `rake sitemap:refresh` will notify major search engines to let them know that a new sitemap is available (Google, Bing,
|
176
|
+
Using `rake sitemap:refresh` will notify major search engines to let them know that a new sitemap is available (Google, Bing, SitemapWriter). To generate new sitemaps without notifying search engines (for example when running in a local environment) use `rake sitemap:refresh:no_ping`.
|
166
177
|
|
167
178
|
If you want to customize the hash of search engines you can access it at:
|
168
179
|
|
169
|
-
|
180
|
+
```ruby
|
181
|
+
SitemapGenerator::Sitemap.search_engines
|
182
|
+
```
|
170
183
|
|
171
184
|
Usually you would be adding a new search engine to ping. In this case you can modify the `search_engines` hash directly. This ensures that when `SitemapGenerator::Sitemap.ping_search_engines` is called your new search engine will be included.
|
172
185
|
|
173
186
|
If you are calling `ping_search_engines` manually (for example if you have to wait some time or perform a custom action after your sitemaps have been regenerated) then you can pass your new search engine directly in the call as in the following example:
|
174
187
|
|
175
|
-
|
188
|
+
```ruby
|
189
|
+
SitemapGenerator::Sitemap.ping_search_engines(:newengine => 'http://newengine.com/ping?url=%s')
|
190
|
+
```
|
176
191
|
|
177
192
|
The key gives the name of the search engine as a string or symbol and the value is the full URL to ping with a string interpolation that will be replaced by the CGI escaped sitemap index URL. If you have any literal percent characters in your URL you need to escape them with `%%`.
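As a rough illustration of the template format described above, the substitution can be reproduced with Ruby's own `String#%` and `CGI.escape`. The helper name `build_ping_url` and the `ratio` query parameter are hypothetical and not part of SitemapGenerator; this is a sketch of the escaping rules only, not the gem's implementation.

```ruby
require 'cgi'

# Hypothetical helper (not part of SitemapGenerator): fills the `%s`
# placeholder of a ping URL template with the CGI-escaped sitemap index URL.
def build_ping_url(template, sitemap_index_url)
  template % CGI.escape(sitemap_index_url)
end

index_url = 'http://example.com/sitemap_index.xml.gz'

# A plain template with a single `%s` placeholder.
simple = build_ping_url('http://newengine.com/ping?url=%s', index_url)

# A template containing a literal percent sign, which must be written as `%%`.
with_percent = build_ping_url('http://newengine.com/ping?ratio=50%%&url=%s', index_url)

puts simple
puts with_percent
```

A single unescaped `%` in the template would be read as the start of a format directive, which is why literal percent characters have to be doubled.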
|
178
193
|
|
179
194
|
If you are calling `SitemapGenerator::Sitemap.ping_search_engines` from outside of your sitemap config file then you will need to set `SitemapGenerator::Sitemap.default_host` and any other options that you set in your sitemap config which affect the location of the sitemap index file. For example:
|
180
195
|
|
181
|
-
|
182
|
-
|
196
|
+
```ruby
|
197
|
+
SitemapGenerator::Sitemap.default_host = 'http://example.com'
|
198
|
+
SitemapGenerator::Sitemap.ping_search_engines
|
199
|
+
```
|
183
200
|
|
184
201
|
Alternatively you can pass in the full URL to your sitemap index in which case we would have just the following:
|
185
202
|
|
186
|
-
|
203
|
+
```ruby
|
204
|
+
SitemapGenerator::Sitemap.ping_search_engines('http://example.com/sitemap_index.xml.gz')
|
205
|
+
```
|
187
206
|
|
188
|
-
Crontab
|
189
|
-
-----
|
207
|
+
### Crontab
|
190
208
|
|
191
209
|
To keep your sitemaps up-to-date, set up a cron job. Make sure to pass the `-s` option to silence rake. That way you will only get email if the sitemap build fails.
|
192
210
|
|
193
211
|
If you're using Whenever, your schedule would look something like this:
|
194
212
|
|
195
|
-
|
196
|
-
|
197
|
-
|
198
|
-
|
213
|
+
```ruby
|
214
|
+
# config/schedule.rb
|
215
|
+
every 1.day, :at => '5:00 am' do
|
216
|
+
rake "-s sitemap:refresh"
|
217
|
+
end
|
218
|
+
```
|
199
219
|
|
200
|
-
|
201
|
-
|
220
|
+
|
221
|
+
### Robots.txt
|
202
222
|
|
203
223
|
You should add the URL of the sitemap index file to `public/robots.txt` to help search engines find your sitemaps. The URL should be the complete URL to the sitemap index. For example:
|
204
224
|
|
205
|
-
|
225
|
+
```
|
226
|
+
Sitemap: http://www.example.com/sitemap_index.xml.gz
|
227
|
+
```
|
206
228
|
|
207
|
-
Deployments & Capistrano
|
208
|
-
----------
|
229
|
+
## Deployments & Capistrano
|
209
230
|
|
210
231
|
To ensure that your application's sitemaps are available after a deployment you can do one of the following:
|
211
232
|
|
@@ -213,29 +234,37 @@ To ensure that your application's sitemaps are available after a deployment you
|
|
213
234
|
|
214
235
|
You can set your sitemaps path to your shared directory using the `sitemaps_path` option. For example if we have a directory `public/shared/` that is shared by all deployments we can have our sitemaps generated into that directory by setting:
|
215
236
|
|
216
|
-
|
237
|
+
```ruby
|
238
|
+
SitemapGenerator::Sitemap.sitemaps_path = 'shared/'
|
239
|
+
```
|
217
240
|
|
218
241
|
2. **Copy the sitemaps from the previous deploy over to the new deploy:**
|
219
242
|
|
220
243
|
(You will need to customize the task if you are using custom sitemap filenames or locations.)
|
221
244
|
|
222
|
-
|
223
|
-
|
224
|
-
|
225
|
-
|
226
|
-
|
227
|
-
|
228
|
-
|
245
|
+
```ruby
|
246
|
+
after "deploy:update_code", "deploy:copy_old_sitemap"
|
247
|
+
namespace :deploy do
|
248
|
+
task :copy_old_sitemap do
|
249
|
+
run "if [ -e #{previous_release}/public/sitemap_index.xml.gz ]; then cp #{previous_release}/public/sitemap* #{current_release}/public/; fi"
|
250
|
+
end
|
251
|
+
end
|
252
|
+
```
|
229
253
|
|
230
254
|
3. **Regenerate your sitemaps after each deployment:**
|
231
255
|
|
232
|
-
|
233
|
-
|
234
|
-
|
235
|
-
|
256
|
+
```ruby
|
257
|
+
after "deploy", "refresh_sitemaps"
|
258
|
+
task :refresh_sitemaps do
|
259
|
+
run "cd #{latest_release} && RAILS_ENV=#{rails_env} rake sitemap:refresh"
|
260
|
+
end
|
261
|
+
```
|
262
|
+
|
263
|
+
### Upload Sitemaps to a Remote Host
|
236
264
|
|
237
|
-
|
238
|
-
|
265
|
+
> SitemapGenerator::S3Adapter is a simple S3 adapter, added in v3.2, which
|
266
|
+
> uses Fog and doesn't require CarrierWave. You can find a bit more information
|
267
|
+
> about it [on the wiki page][remote_hosts].
|
239
268
|
|
240
269
|
Sometimes it is desirable to host your sitemap files on a remote server and point robots
|
241
270
|
and search engines to the remote files. For example if you are using a host like Heroku
|
@@ -258,25 +287,29 @@ Sitemap Generator uses CarrierWave to support uploading to Amazon S3 store, Rack
|
|
258
287
|
|
259
288
|
For Example:
|
260
289
|
|
261
|
-
|
262
|
-
|
263
|
-
|
264
|
-
|
265
|
-
|
290
|
+
```ruby
|
291
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
292
|
+
SitemapGenerator::Sitemap.sitemaps_host = "http://s3.amazonaws.com/sitemap-generator/"
|
293
|
+
SitemapGenerator::Sitemap.public_path = 'tmp/'
|
294
|
+
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
|
295
|
+
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
|
296
|
+
```
|
266
297
|
|
267
298
|
3. Update your `robots.txt` file to point robots to the remote sitemap index file, e.g.:
|
268
299
|
|
269
|
-
|
300
|
+
```
|
301
|
+
Sitemap: http://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap_index.xml.gz
|
302
|
+
```
|
270
303
|
|
271
304
|
You generate your sitemaps as usual using `rake sitemap:refresh`.
|
272
305
|
|
273
306
|
Note that SitemapGenerator will automatically turn off `include_index` in this case because
|
274
307
|
the `sitemaps_host` does not match the `default_host`. The link to the sitemap index file
|
275
308
|
that would otherwise be included would point to a different host than the rest of the links
|
276
|
-
in the sitemap, something that the sitemap rules forbid.
|
309
|
+
in the sitemap, something that the sitemap rules forbid. (Since version 3.2 this is no
|
310
|
+
longer an issue because [`include_index` is off by default][include_index_change].)
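
The host mismatch described above can be checked with Ruby's standard `uri` library. The helper `same_host?` is a hypothetical name used for illustration; how SitemapGenerator itself compares the two settings is not shown here, so treat this as a sketch of the rule rather than the gem's code.

```ruby
require 'uri'

# Hypothetical helper (not part of SitemapGenerator): true when both URLs
# point at the same host name, ignoring scheme and path.
def same_host?(default_host, sitemaps_host)
  URI.parse(default_host).host == URI.parse(sitemaps_host).host
end

default_host  = 'http://www.example.com'
sitemaps_host = 'http://s3.amazonaws.com/sitemap-generator/'

# The sitemaps live on S3 while the links point at example.com, so a link to
# the sitemap index would cross hosts.
cross_host = !same_host?(default_host, sitemaps_host)
puts "sitemap index link would cross hosts: #{cross_host}"
```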
|
277
311
|
|
278
|
-
Generating Multiple Sitemaps
|
279
|
-
----------
|
312
|
+
### Generating Multiple Sitemaps
|
280
313
|
|
281
314
|
Each call to `create` creates a new sitemap index and associated sitemaps. You can call `create` as many times as you want within your sitemap configuration.
|
282
315
|
|
@@ -285,73 +318,85 @@ overwrite each other. You can use the `filename`, `sitemaps_namer` and `sitemap
|
|
285
318
|
|
286
319
|
In the following example we generate three sitemaps each in its own subdirectory:
|
287
320
|
|
288
|
-
|
289
|
-
|
290
|
-
|
291
|
-
|
292
|
-
|
293
|
-
|
294
|
-
|
321
|
+
```ruby
|
322
|
+
%w(google bing apple).each do |subdomain|
|
323
|
+
SitemapGenerator::Sitemap.default_host = "https://#{subdomain}.mysite.com"
|
324
|
+
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/#{subdomain}"
|
325
|
+
SitemapGenerator::Sitemap.create do
|
326
|
+
add '/home'
|
327
|
+
end
|
328
|
+
end
|
329
|
+
```
|
295
330
|
|
296
331
|
Outputs:
|
297
332
|
|
298
|
-
|
299
|
-
|
300
|
-
|
301
|
-
|
302
|
-
|
303
|
-
|
304
|
-
|
305
|
-
|
306
|
-
|
333
|
+
```
|
334
|
+
+ sitemaps/google/sitemap1.xml.gz 2 links / 822 Bytes / 328 Bytes gzipped
|
335
|
+
+ sitemaps/google/sitemap_index.xml.gz 1 sitemaps / 389 Bytes / 217 Bytes gzipped
|
336
|
+
Sitemap stats: 2 links / 1 sitemaps / 0m00s
|
337
|
+
+ sitemaps/bing/sitemap1.xml.gz 2 links / 820 Bytes / 330 Bytes gzipped
|
338
|
+
+ sitemaps/bing/sitemap_index.xml.gz 1 sitemaps / 388 Bytes / 217 Bytes gzipped
|
339
|
+
Sitemap stats: 2 links / 1 sitemaps / 0m00s
|
340
|
+
+ sitemaps/apple/sitemap1.xml.gz 2 links / 820 Bytes / 330 Bytes gzipped
|
341
|
+
+ sitemaps/apple/sitemap_index.xml.gz 1 sitemaps / 388 Bytes / 214 Bytes gzipped
|
342
|
+
Sitemap stats: 2 links / 1 sitemaps / 0m00s
|
343
|
+
```
|
307
344
|
|
308
345
|
If you don't want to have to generate all the sitemaps at once, or you want to refresh some more often than others, you can split them up into their own configuration files. Using the above example we would have:
|
309
346
|
|
310
|
-
|
311
|
-
|
312
|
-
|
313
|
-
|
314
|
-
|
315
|
-
|
316
|
-
|
317
|
-
|
318
|
-
|
319
|
-
|
320
|
-
|
321
|
-
|
322
|
-
|
323
|
-
|
324
|
-
|
325
|
-
|
326
|
-
|
327
|
-
|
328
|
-
|
329
|
-
|
347
|
+
```ruby
|
348
|
+
# config/google_sitemap.rb
|
349
|
+
SitemapGenerator::Sitemap.default_host = "https://google.mysite.com"
|
350
|
+
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/google"
|
351
|
+
SitemapGenerator::Sitemap.create do
|
352
|
+
add '/home'
|
353
|
+
end
|
354
|
+
|
355
|
+
# config/apple_sitemap.rb
|
356
|
+
SitemapGenerator::Sitemap.default_host = "https://apple.mysite.com"
|
357
|
+
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/apple"
|
358
|
+
SitemapGenerator::Sitemap.create do
|
359
|
+
add '/home'
|
360
|
+
end
|
361
|
+
|
362
|
+
# config/bing_sitemap.rb
|
363
|
+
SitemapGenerator::Sitemap.default_host = "https://bing.mysite.com"
|
364
|
+
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/bing"
|
365
|
+
SitemapGenerator::Sitemap.create do
|
366
|
+
add '/home'
|
367
|
+
end
|
368
|
+
```
|
369
|
+
|
330
370
|
|
331
371
|
To generate each one specify the configuration file to run by passing the `CONFIG_FILE` option to `rake sitemap:refresh`, e.g.:
|
332
372
|
|
333
|
-
|
334
|
-
|
335
|
-
|
373
|
+
```
|
374
|
+
rake sitemap:refresh CONFIG_FILE="config/google_sitemap.rb"
|
375
|
+
rake sitemap:refresh CONFIG_FILE="config/apple_sitemap.rb"
|
376
|
+
rake sitemap:refresh CONFIG_FILE="config/bing_sitemap.rb"
|
377
|
+
```
|
336
378
|
|
337
|
-
Sitemap Configuration
|
338
|
-
======
|
379
|
+
## Sitemap Configuration
|
339
380
|
|
340
381
|
A sitemap configuration file contains all the information needed to generate your sitemaps. By default SitemapGenerator looks for a configuration file in `config/sitemap.rb` - relative to your application root or the current working directory. (Run `rake sitemap:install` to have this file generated for you if you have not done so already.)
|
341
382
|
|
342
383
|
If you want to use a non-standard configuration file, or have multiple configuration files, you can specify which one to run by passing the `CONFIG_FILE` option like so:
|
343
384
|
|
344
|
-
|
385
|
+
```
|
386
|
+
rake sitemap:refresh CONFIG_FILE="config/geo_sitemap.rb"
|
387
|
+
```
|
388
|
+
|
345
389
|
|
346
|
-
A Simple Example
|
347
|
-
-------
|
390
|
+
### A Simple Example
|
348
391
|
|
349
392
|
So what does a sitemap configuration look like? Let's take a look at a simple example:
|
350
393
|
|
351
|
-
|
352
|
-
|
353
|
-
|
354
|
-
|
394
|
+
```ruby
|
395
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
396
|
+
SitemapGenerator::Sitemap.create do
|
397
|
+
add '/welcome'
|
398
|
+
end
|
399
|
+
```
|
355
400
|
|
356
401
|
A few things to note:
|
357
402
|
|
@@ -362,63 +407,64 @@ A few things to note:
|
|
362
407
|
|
363
408
|
Now let's see what is output when we run this configuration with `rake sitemap:refresh:no_ping`:
|
364
409
|
|
365
|
-
|
366
|
-
|
367
|
-
|
410
|
+
```
|
411
|
+
+ sitemap1.xml.gz 2 links / 923 Bytes / 329 Bytes gzipped
|
412
|
+
+ sitemap_index.xml.gz 1 sitemaps / 364 Bytes / 199 Bytes gzipped
|
413
|
+
Sitemap stats: 2 links / 1 sitemaps / 0m00s
|
414
|
+
```
|
368
415
|
|
369
|
-
Weird! The sitemap has
|
416
|
+
Weird! The sitemap has two links, even though we only added one! This is because SitemapGenerator adds the root URL `/` by default. (Note that prior to version 3.2 the URL of the sitemap index file was also added to the sitemap by default, but [this behaviour has been changed][include_index_change] because Google complained about nested indexing.) You can change the default behaviour by setting the `include_root` or `include_index` option.
|
370
417
|
|
371
418
|
Now let's take a look at the files that were created. After uncompressing and XML-tidying the contents we have:
|
372
419
|
|
373
420
|
* `public/sitemap_index.xml.gz`
|
374
421
|
|
375
|
-
|
376
|
-
|
377
|
-
|
378
|
-
|
379
|
-
|
380
|
-
|
422
|
+
```xml
|
423
|
+
<?xml version="1.0" encoding="UTF-8"?>
|
424
|
+
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
|
425
|
+
<sitemap>
|
426
|
+
<loc>http://www.example.com/sitemap1.xml.gz</loc>
|
427
|
+
</sitemap>
|
428
|
+
</sitemapindex>
|
429
|
+
```
|
381
430
|
|
382
431
|
* `public/sitemap1.xml.gz`
|
383
432
|
|
384
|
-
|
385
|
-
|
386
|
-
|
387
|
-
|
388
|
-
|
389
|
-
|
390
|
-
|
391
|
-
|
392
|
-
|
393
|
-
|
394
|
-
|
395
|
-
|
396
|
-
|
397
|
-
|
398
|
-
|
399
|
-
|
400
|
-
|
401
|
-
|
402
|
-
|
403
|
-
|
404
|
-
|
405
|
-
|
406
|
-
The sitemaps conform to the [Sitemap 0.9 protocol][sitemap_protocol]. Notice the values for `priority` and `changefreq` on the root and sitemap index links, the ones that were added for us? The values tell us that these links are the highest priority and should be checked regularly because they are constantly changing. You can specify your own values for these options in your call to `add`.
|
407
|
-
|
408
|
-
Adding Links
|
409
|
-
----------
|
433
|
+
```xml
|
434
|
+
<?xml version="1.0" encoding="UTF-8"?>
|
435
|
+
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1" xmlns:geo="http://www.google.com/geo/schemas/sitemap/1.0" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
|
436
|
+
<url>
|
437
|
+
<loc>http://www.example.com/</loc>
|
438
|
+
<lastmod>2011-05-21T00:03:38+00:00</lastmod>
|
439
|
+
<changefreq>always</changefreq>
|
440
|
+
<priority>1.0</priority>
|
441
|
+
</url>
|
442
|
+
<url>
|
443
|
+
<loc>http://www.example.com/welcome</loc>
|
444
|
+
<lastmod>2011-05-21T00:03:38+00:00</lastmod>
|
445
|
+
<changefreq>weekly</changefreq>
|
446
|
+
<priority>0.5</priority>
|
447
|
+
</url>
|
448
|
+
</urlset>
|
449
|
+
```
|
450
|
+
|
451
|
+
The sitemaps conform to the [Sitemap 0.9 protocol][sitemap_protocol]. Notice the values for `priority` and `changefreq` on the root link, the one that was added for us? The values tell us that this link is the highest priority and should be checked regularly because it is constantly changing. You can specify your own values for these options in your call to `add`.
|
452
|
+
|
453
|
+
### Adding Links
|
410
454
|
|
411
455
|
You call `add` in the block passed to `create` to add a **path** to your sitemap. `add` takes a string path and optional hash of options, generates the URL and adds it to the sitemap. You only need to pass a **path** because the URL will be built for us using the `default_host` we specified. However, if we want to use a different host for a particular link, we can pass the `:host` option to `add`.
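
To make the path-plus-host idea concrete, here is a minimal sketch of how a path could be combined with either the default host or a per-link `:host` option. The function `build_url` is hypothetical and is not the gem's implementation; it uses plain string joining only because the changelog notes that `URI.join` is avoided.

```ruby
# Hypothetical helper (not part of SitemapGenerator): joins a host and a path
# with exactly one slash between them. A `:host` option overrides the default.
def build_url(path, default_host, options = {})
  host = options.fetch(:host, default_host)
  host.sub(%r{/+\z}, '') + '/' + path.sub(%r{\A/+}, '')
end

default_host = 'http://www.example.com'

contact_url = build_url('/contact_us', default_host)
login_url   = build_url('/login', default_host, :host => 'https://securehost.com')

puts contact_url
puts login_url
```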
|
412
456
|
|
413
457
|
Let's see another example:
|
414
458
|
|
415
|
-
|
416
|
-
|
417
|
-
|
418
|
-
|
419
|
-
|
420
|
-
|
421
|
-
|
459
|
+
```ruby
|
460
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
461
|
+
SitemapGenerator::Sitemap.create do
|
462
|
+
add '/contact_us'
|
463
|
+
Content.find_each do |content|
|
464
|
+
add content_path(content), :lastmod => content.updated_at
|
465
|
+
end
|
466
|
+
end
|
467
|
+
```
|
422
468
|
|
423
469
|
In this example first we add the `/contact_us` page to the sitemap and then we iterate through the Content model's records adding each one to the sitemap using the `content_path` helper method to generate the path for each record.
|
424
470
|
|
@@ -428,9 +474,11 @@ In the example about we pass a `lastmod` (last modified) option with the value o
|
|
428
474
|
|
429
475
|
Looking at the output from running this sitemap, we see that we have a few more links than before:
|
430
476
|
|
431
|
-
|
432
|
-
|
433
|
-
|
477
|
+
```
|
478
|
+
+ sitemap1.xml.gz 12 links / 2.3 KB / 365 Bytes gzipped
|
479
|
+
+ sitemap_index.xml.gz 1 sitemaps / 364 Bytes / 199 Bytes gzipped
|
480
|
+
Sitemap stats: 12 links / 1 sitemaps / 0m00s
|
481
|
+
```
|
434
482
|
|
435
483
|
From this example we can see that:
|
436
484
|
|
@@ -446,28 +494,36 @@ You can read more about `add` in the [XML Specification](http://sitemaps.org/pro
|
|
446
494
|
|
447
495
|
Indicates how often the content of the page changes. One of `'always'`, `'hourly'`, `'daily'`, `'weekly'`, `'monthly'`, `'yearly'` or `'never'`. Example:
|
448
496
|
|
449
|
-
|
497
|
+
```ruby
|
498
|
+
add '/contact_us', :changefreq => 'monthly'
|
499
|
+
```
|
450
500
|
|
451
501
|
* `lastmod` - Default: `Time.now` (Time).
|
452
502
|
|
453
503
|
The date and time of last modification. Example:
|
454
504
|
|
455
|
-
|
505
|
+
```ruby
|
506
|
+
add content_path(content), :lastmod => content.updated_at
|
507
|
+
```
|
456
508
|
|
457
509
|
* `host` - Default: `default_host` (String).
|
458
510
|
|
459
511
|
Host to use when building the URL. Example:
|
460
512
|
|
461
|
-
|
513
|
+
```ruby
|
514
|
+
add '/login', :host => 'https://securehost.com'
|
515
|
+
```
|
462
516
|
|
463
517
|
* `priority` - Default: `0.5` (Float).
|
464
518
|
|
465
519
|
The priority of the URL relative to other URLs on a scale from 0 to 1. Example:
|
466
520
|
|
467
|
-
|
521
|
+
```ruby
|
522
|
+
add '/about', :priority => 0.75
|
523
|
+
```
|
524
|
+
|
468
525
|
|
469
|
-
Adding Links to the Sitemap Index
|
470
|
-
----------
|
526
|
+
### Adding Links to the Sitemap Index
|
471
527
|
|
472
528
|
Sometimes you may need to manually add some links to the sitemap index file. For example if you are generating your sitemaps incrementally you may want to create a sitemap index which includes the files which have already been generated. To achieve this you can use the `add_to_index` method which works exactly the same as the `add` method described above.
|
473
529
|
|
@@ -479,50 +535,56 @@ It supports the same options as `add`, namely:
|
|
479
535
|
|
480
536
|
The value for `host` defaults to whatever you have set as your `sitemaps_host`. Remember that the `sitemaps_host` is the host where your sitemaps reside. If your sitemaps are on the same host as your `default_host`, then the value for `default_host` is used. Example:
|
481
537
|
|
482
|
-
|
538
|
+
```ruby
|
539
|
+
add_to_index '/mysitemap1.xml.gz', :host => 'http://sitemaphostingserver.com'
|
540
|
+
```
|
483
541
|
|
484
542
|
* `priority`
|
485
543
|
|
486
544
|
An example:
|
487
545
|
|
488
|
-
|
489
|
-
|
490
|
-
|
491
|
-
|
492
|
-
|
493
|
-
|
546
|
+
```ruby
|
547
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
548
|
+
SitemapGenerator::Sitemap.create do
|
549
|
+
add_to_index '/mysitemap1.xml.gz'
|
550
|
+
add_to_index '/mysitemap2.xml.gz'
|
551
|
+
# ...
|
552
|
+
end
|
553
|
+
```
|
494
554
|
|
495
|
-
Accessing the LinkSet instance
|
496
|
-
----------
|
555
|
+
### Accessing the LinkSet instance
|
497
556
|
|
498
557
|
Sometimes you need to mess with the internals to do custom stuff. If you need access to the LinkSet instance from within `create()` you can use the `sitemap` method to do so.
|
499
558
|
|
500
559
|
In this example, say we have already pre-generated three sitemap files: `sitemap1.xml.gz`, `sitemap2.xml.gz`, `sitemap3.xml.gz`. Now we want to start the sitemap generation at `sitemap4.xml.gz` and create a bunch of new sitemaps. There are a few ways we can do this, but this is an easy way:
|
501
560
|
|
502
|
-
|
503
|
-
|
504
|
-
|
505
|
-
|
506
|
-
|
507
|
-
|
508
|
-
|
509
|
-
|
510
|
-
|
561
|
+
```ruby
|
562
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
563
|
+
SitemapGenerator::Sitemap.create do
|
564
|
+
3.times do |i|
|
565
|
+
add_to_index sitemap.sitemaps_namer.to_s
|
566
|
+
sitemap.sitemaps_namer.next
|
567
|
+
end
|
568
|
+
add '/home'
|
569
|
+
add '/another'
|
570
|
+
end
|
571
|
+
```
|
511
572
|
|
512
573
|
The output looks something like this:
|
513
574
|
|
514
|
-
|
515
|
-
|
516
|
-
|
517
|
-
|
575
|
+
```
|
576
|
+
In /Users/karl/projects/sitemap_generator-test/public/
|
577
|
+
+ sitemap4.xml.gz 4 links / 347 Bytes
|
578
|
+
+ sitemap_index.xml.gz 4 sitemaps / 242 Bytes
|
579
|
+
Sitemap stats: 4 links / 4 sitemaps / 0m00s
|
580
|
+
```
|
518
581
|
|
519
|
-
Speeding Things Up
|
520
|
-
----------
|
582
|
+
### Speeding Things Up
|
521
583
|
|
522
584
|
For large ActiveRecord collections with thousands of records, it is advisable to iterate through them in batches to avoid loading all records into memory at once. For this reason, the example above uses `Content.find_each`, a batched iterator available since Rails 2.3.2, rather than `Content.all`.
|
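As a rough illustration of the batching idea, the plain-Ruby sketch below walks a range of ids in slices. `batch_sizes` is a hypothetical helper written for this example, and `each_slice` is only a stand-in for `find_each`; the default of 1,000 mirrors the default batch size of `find_each`.

```ruby
# Returns the size of each batch when `ids` is walked in slices of `batch_size`.
# Hypothetical helper for illustration; not part of SitemapGenerator.
def batch_sizes(ids, batch_size = 1_000)
  sizes = []
  ids.each_slice(batch_size) do |batch|
    # Only this batch is held in memory here. In a sitemap config, each
    # record in the batch would be passed to `add`.
    sizes << batch.size
  end
  sizes
end

p batch_sizes(1..2_500) # prints [1000, 1000, 500]
```

With `Content.all`, all 2,500 records would be loaded before the first link is added; with batching, at most one slice is loaded at a time.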
523
585
|
|
524
|
-
|
525
|
-
|
586
|
+
|
587
|
+
## Customizing your Sitemaps
|
526
588
|
|
527
589
|
SitemapGenerator supports a number of options which allow you to control every aspect of your sitemap generation. How they are named, where they are stored, the contents of the links and the location that the sitemaps will be hosted from can all be set.
|
528
590
|
|
@@ -530,34 +592,39 @@ The options can be set in the following ways.
|
|
530
592
|
|
531
593
|
On `SitemapGenerator::Sitemap`:
|
532
594
|
|
533
|
-
|
534
|
-
|
595
|
+
```ruby
|
596
|
+
SitemapGenerator::Sitemap.default_host = 'http://example.com'
|
597
|
+
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
|
598
|
+
```
|
535
599
|
|
536
600
|
These options will apply to all sitemaps. This is how you set most options.
|
537
601
|
|
538
602
|
Passed as options in the call to `create`:
|
539
603
|
|
540
|
-
|
541
|
-
|
542
|
-
|
543
|
-
|
544
|
-
|
604
|
+
```ruby
|
605
|
+
SitemapGenerator::Sitemap.create(
|
606
|
+
:default_host => 'http://example.com',
|
607
|
+
:sitemaps_path => 'sitemaps/') do
|
608
|
+
add '/home'
|
609
|
+
end
|
610
|
+
```
|
545
611
|
|
546
612
|
This is useful if you are setting a lot of options.
|
547
613
|
|
548
614
|
Finally, passed as options in a call to `group`:
|
549
615
|
|
550
|
-
|
551
|
-
|
552
|
-
|
553
|
-
|
554
|
-
|
555
|
-
|
616
|
+
```ruby
|
617
|
+
SitemapGenerator::Sitemap.create do
|
618
|
+
group(:default_host => 'http://example.com',
|
619
|
+
:sitemaps_path => 'sitemaps/') do
|
620
|
+
add '/home'
|
621
|
+
end
|
622
|
+
end
|
623
|
+
```
|
556
624
|
|
557
625
|
The options passed to `group` only apply to the links and sitemaps generated in the group. Sitemap Groups are useful to group links into specific sitemaps, or to set options that you only want to apply to the links in that group.
|
558
626
|
|
559
|
-
Sitemap Options
|
560
|
-
-------
|
627
|
+
### Sitemap Options
|
561
628
|
|
562
629
|
The following options are supported:
|
563
630
|
|
@@ -565,7 +632,7 @@ The following options are supported:
|
|
565
632
|
|
566
633
|
* `filename` - Symbol. The **base name for the files** that will be generated. The default value is `:sitemap`. This yields sitemaps with names like `sitemap1.xml.gz`, `sitemap2.xml.gz`, `sitemap3.xml.gz` etc, and a sitemap index named `sitemap_index.xml.gz`. If we now set the value to `:geo` the sitemaps would be named `geo1.xml.gz`, `geo2.xml.gz`, `geo3.xml.gz` etc, and the sitemap index would be named `geo_index.xml.gz`.
|
567
634
|
|
568
|
-
* `include_index` - Boolean. Whether to **add a link to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. Default is `
|
635
|
+
* `include_index` - Boolean. Whether to **add a link to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. 2012-07: This is now turned off by default because Google may complain about there being 'Nested Sitemap indexes'. Default is `false`. Turned off when `sitemaps_host` is set or within a `group()` block.
|
569
636
|
|
570
637
|
* `include_root` - Boolean. Whether to **add the root** url i.e. '/' to the current sitemap. Default is `true`. Turned off within a `group()` block.
|
571
638
|
|
@@ -588,8 +655,8 @@ different host than the rest of the links in the sitemap. Something that the si
|
|
588
655
|
you can provide an instance of your own class to provide custom behavior. Your class must
|
589
656
|
define a `write` method which takes a `SitemapGenerator::Location` and raw XML data.
|
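A minimal sketch of such a class might look like the following. `InMemoryAdapter` and `FakeLocation` are hypothetical names invented for this example; `FakeLocation` is a stand-in for `SitemapGenerator::Location` so the sketch runs without the gem, and it assumes the location object responds to `path`.

```ruby
# Hypothetical adapter that keeps generated sitemaps in memory
# instead of writing them to disk.
class InMemoryAdapter
  attr_reader :files

  def initialize
    @files = {}
  end

  # location: object describing where the sitemap would be stored
  # raw_data: the sitemap XML as a String
  def write(location, raw_data)
    @files[location.path.to_s] = raw_data
  end
end

# Stand-in for SitemapGenerator::Location (assumption: it responds to `path`).
FakeLocation = Struct.new(:path)

adapter = InMemoryAdapter.new
adapter.write(FakeLocation.new('public/sitemap1.xml.gz'), '<urlset/>')
p adapter.files.keys # prints ["public/sitemap1.xml.gz"]
```

The only contract taken from the text above is the two-argument `write` method; what the adapter does with the data (upload it, store it, log it) is up to you.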
590
657
|
|
591
|
-
|
592
|
-
|
658
|
+
|
659
|
+
## Sitemap Groups
|
593
660
|
|
594
661
|
Sitemap Groups is a powerful feature that is also very simple to use.
|
595
662
|
|
@@ -600,33 +667,36 @@ Sitemap Groups is a powerful feature that is also very simple to use.
|
|
600
667
|
* Groups can handle any number of links.
|
601
668
|
* Group sitemaps are finalized (written out) as they get full and at the end of each group.
|
602
669
|
|
603
|
-
A Groups Example
|
604
|
-
----------------
|
670
|
+
### A Groups Example
|
605
671
|
|
606
672
|
When you create a new group you pass options which will apply only to that group. You pass a block to `group`. Inside your block you call `add` to add links to the group.
|
607
673
|
|
608
674
|
Let's see an example that demonstrates a few interesting things about groups:
|
609
675
|
|
610
|
-
|
611
|
-
|
612
|
-
|
676
|
+
```ruby
|
677
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
678
|
+
SitemapGenerator::Sitemap.create do
|
679
|
+
add '/rss'
|
613
680
|
|
614
|
-
|
615
|
-
|
616
|
-
|
681
|
+
group(:sitemaps_path => 'en/', :filename => :english) do
|
682
|
+
add '/home'
|
683
|
+
end
|
617
684
|
|
618
|
-
|
619
|
-
|
620
|
-
|
621
|
-
|
685
|
+
group(:sitemaps_path => 'fr/', :filename => :french) do
|
686
|
+
add '/maison'
|
687
|
+
end
|
688
|
+
end
|
689
|
+
```
|
622
690
|
|
623
691
|
And the output from running the above:
|
624
692
|
|
625
|
-
|
626
|
-
|
627
|
-
|
628
|
-
|
629
|
-
|
693
|
+
```
|
694
|
+
+ en/english1.xml.gz 1 links / 612 Bytes / 296 Bytes gzipped
|
695
|
+
+ fr/french1.xml.gz 1 links / 614 Bytes / 298 Bytes gzipped
|
696
|
+
+ sitemap1.xml.gz 3 links / 919 Bytes / 328 Bytes gzipped
|
697
|
+
+ sitemap_index.xml.gz 3 sitemaps / 505 Bytes / 221 Bytes gzipped
|
698
|
+
Sitemap stats: 5 links / 3 sitemaps / 0m00s
|
699
|
+
```
|
630
700
|
|
631
701
|
So we have two sitemaps with one link each and one sitemap with three links. The sitemaps from the groups are easy to spot by their filenames. They are `english1.xml.gz` and `french1.xml.gz`. They contain only one link each because **`include_index` and `include_root` are set to `false` by default** in a group.
|
632
702
|
|
@@ -638,30 +708,31 @@ The options you use when creating your groups will determine which and how many
|
|
638
708
|
|
639
709
|
If you have changed your sitemaps' physical location in a group, then the default sitemap will not be used and it will be unaffected by the group. **Group sitemaps are finalized as they get full and at the end of each group.**
|
640
710
|
|
641
|
-
Sitemap Extensions
|
642
|
-
===========
|
643
711
|
|
644
|
-
|
645
|
-
-----------
|
712
|
+
## Sitemap Extensions
|
646
713
|
|
647
|
-
|
648
|
-
|
649
|
-
### Example
|
714
|
+
### News Sitemaps
|
650
715
|
|
651
|
-
|
652
|
-
add('/index.html', :news => {
|
653
|
-
:publication_name => "Example",
|
654
|
-
:publication_language => "en",
|
655
|
-
:title => "My Article",
|
656
|
-
:keywords => "my article, articles about myself",
|
657
|
-
:stock_tickers => "SAO:PETR3",
|
658
|
-
:publication_date => "2011-08-22",
|
659
|
-
:access => "Subscription",
|
660
|
-
:genres => "PressRelease"
|
661
|
-
})
|
662
|
-
end
|
716
|
+
A news item can be added to a sitemap URL by passing a `:news` hash to `add`. The hash must contain tags defined by the [News Sitemap][news_tags] specification.
|
663
717
|
|
664
|
-
|
718
|
+
#### Example
|
719
|
+
|
720
|
+
```ruby
|
721
|
+
SitemapGenerator::Sitemap.create do
|
722
|
+
add('/index.html', :news => {
|
723
|
+
:publication_name => "Example",
|
724
|
+
:publication_language => "en",
|
725
|
+
:title => "My Article",
|
726
|
+
:keywords => "my article, articles about myself",
|
727
|
+
:stock_tickers => "SAO:PETR3",
|
728
|
+
:publication_date => "2011-08-22",
|
729
|
+
:access => "Subscription",
|
730
|
+
:genres => "PressRelease"
|
731
|
+
})
|
732
|
+
end
|
733
|
+
```
|
734
|
+
|
735
|
+
#### Supported options
|
665
736
|
|
666
737
|
* `publication_name`
|
667
738
|
* `publication_language`
|
@@ -672,21 +743,22 @@ A news item can be added to a sitemap URL by passing a `:news` hash to `add`. T
|
|
672
743
|
* `keywords`
|
673
744
|
* `stock_tickers`
|
674
745
|
|
675
|
-
|
676
|
-
Image Sitemaps
|
677
|
-
-----------
|
746
|
+
|
747
|
+
### Image Sitemaps
|
678
748
|
|
679
749
|
Images can be added to a sitemap URL by passing an `:images` array to `add`. Each item in the array must be a Hash containing tags defined by the [Image Sitemap][image_tags] specification.
|
680
750
|
|
681
|
-
|
751
|
+
#### Example
|
682
752
|
|
683
|
-
|
684
|
-
|
685
|
-
|
686
|
-
|
687
|
-
|
753
|
+
```ruby
|
754
|
+
SitemapGenerator::Sitemap.create do
|
755
|
+
add('/index.html', :images => [{
|
756
|
+
:loc => 'http://www.example.com/image.png',
|
757
|
+
:title => 'Image' }])
|
758
|
+
end
|
759
|
+
```
|
688
760
|
|
689
|
-
|
761
|
+
#### Supported options
|
690
762
|
|
691
763
|
* `loc` Required, location of the image
|
692
764
|
* `caption`
|
@@ -694,48 +766,50 @@ Images can be added to a sitemap URL by passing an `:images` array to `add`. Ea
|
|
694
766
|
* `title`
|
695
767
|
* `license`
|
696
768
|
|
697
|
-
|
698
|
-
Video Sitemaps
|
699
|
-
-----------
|
769
|
+
|
770
|
+
### Video Sitemaps
|
700
771
|
|
701
772
|
A video can be added to a sitemap URL by passing a `:video` Hash to `add()`. The Hash can contain tags defined by the [Video Sitemap specification][video_tags].
|
702
773
|
|
703
774
|
To add more than one video to a url, pass an array of video hashes using the `:videos` option.
|
704
775
|
|
705
|
-
|
776
|
+
#### Example
|
706
777
|
|
707
|
-
|
708
|
-
|
709
|
-
|
710
|
-
|
711
|
-
|
712
|
-
|
713
|
-
|
714
|
-
|
778
|
+
```ruby
|
779
|
+
add('/index.html', :video => {
|
780
|
+
:thumbnail_loc => 'http://www.example.com/video1_thumbnail.png',
|
781
|
+
:title => 'Title',
|
782
|
+
:description => 'Description',
|
783
|
+
:content_loc => 'http://www.example.com/cool_video.mpg',
|
784
|
+
:tags => %w[one two three],
|
785
|
+
:category => 'Category'
|
786
|
+
})
|
787
|
+
```
|
715
788
|
|
716
|
-
|
789
|
+
#### Supported options
|
717
790
|
|
718
791
|
* `:thumbnail_loc` - Required, string.
|
719
792
|
|
720
793
|
|
721
794
|
|
722
|
-
Geo Sitemaps
|
723
|
-
-----------
|
795
|
+
### Geo Sitemaps
|
724
796
|
|
725
797
|
Pages with geo data can be added by passing a `:geo` Hash to `add`. The Hash supports only one tag, `:format`. Google provides an [example of a geo sitemap link here][geo_tags]. Note that the sitemap does not actually contain your KML or GeoRSS. It merely links to a page that has this content.
|
726
798
|
|
727
|
-
|
799
|
+
#### Example
|
728
800
|
|
729
|
-
|
730
|
-
|
731
|
-
|
801
|
+
```ruby
|
802
|
+
SitemapGenerator::Sitemap.create do
|
803
|
+
add('/stores/1234.xml', :geo => { :format => 'kml' })
|
804
|
+
end
|
805
|
+
```
|
732
806
|
|
733
|
-
|
807
|
+
#### Supported options
|
734
808
|
|
735
809
|
* `format` Required, either 'kml' or 'georss'
|
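Since `format` accepts only two values, it can be worth checking your data before handing it to `add`. The helper below is a hypothetical sketch for that purpose and is not part of SitemapGenerator:

```ruby
# The two formats named in the supported options above.
GEO_FORMATS = %w[kml georss].freeze

# Hypothetical helper: true when the :geo hash carries a supported :format.
def valid_geo_format?(geo)
  GEO_FORMATS.include?(geo[:format].to_s)
end

p valid_geo_format?(:format => 'kml')     # prints true
p valid_geo_format?(:format => 'geojson') # prints false
```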
736
810
|
|
737
|
-
|
738
|
-
|
811
|
+
|
812
|
+
## Raison d'être
|
739
813
|
|
740
814
|
Most of the Sitemap plugins out there seem to try to recreate the Sitemap links by iterating the Rails routes. In some cases this is possible, but for a great deal of cases it isn't.
|
741
815
|
|
@@ -745,7 +819,9 @@ and
|
|
745
819
|
|
746
820
|
b) How would you infer the correct series of links for the following route?
|
747
821
|
|
748
|
-
|
822
|
+
```ruby
|
823
|
+
map.zipcode 'location/:state/:city/:zipcode', :controller => 'zipcode', :action => 'index'
|
824
|
+
```
|
749
825
|
|
750
826
|
Don't tell me it's trivial, because it isn't. It just looks trivial.
|
751
827
|
|
@@ -753,45 +829,46 @@ So my idea is to have another file similar to 'routes.rb' called 'sitemap.rb', w
|
|
753
829
|
|
754
830
|
Here's my solution:
|
755
831
|
|
756
|
-
|
757
|
-
|
758
|
-
|
832
|
+
```ruby
|
833
|
+
Zipcode.find(:all, :include => :city).each do |z|
|
834
|
+
add zipcode_path(:state => z.city.state, :city => z.city, :zipcode => z)
|
835
|
+
end
|
836
|
+
```
|
759
837
|
|
760
838
|
Easy hey?
|
761
839
|
|
762
|
-
Compatibility
|
763
|
-
=======
|
840
|
+
## Compatibility
|
764
841
|
|
765
842
|
Tested and working on:
|
766
843
|
|
767
|
-
|
768
|
-
|
769
|
-
|
844
|
+
* **Rails** 3.0.0, 3.0.7
|
845
|
+
* **Rails** 1.x - 2.3.8
|
846
|
+
* **Ruby** 1.8.6, 1.8.7, 1.8.7 Enterprise Edition, 1.9.1, 1.9.2
|
847
|
+
|
848
|
+
|
849
|
+
## Known Bugs
|
850
|
+
|
851
|
+
* There's no check on the size of a URL which [isn't supposed to exceed 2,048 bytes][sitemaps_xml].
|
852
|
+
* Currently only supports one Sitemap Index file, which can contain 50,000 Sitemap files which can each contain 50,000 urls, so it _only_ supports up to 2,500,000,000 (2.5 billion) urls.
|
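Until the gem checks URL size itself, you could guard against oversized URLs in your own config. The sketch below is one way to do it under the limits quoted above; `url_within_limit?` and `max_links` are hypothetical helpers, not SitemapGenerator methods.

```ruby
# Limits as stated in the two bullets above.
MAX_URL_BYTES          = 2_048
MAX_SITEMAPS_PER_INDEX = 50_000
MAX_LINKS_PER_SITEMAP  = 50_000

# Hypothetical helper: true when the URL does not exceed the byte limit.
def url_within_limit?(url)
  url.bytesize <= MAX_URL_BYTES
end

# Upper bound on links reachable from a single sitemap index.
def max_links
  MAX_SITEMAPS_PER_INDEX * MAX_LINKS_PER_SITEMAP
end

p url_within_limit?('http://www.example.com/about') # prints true
p max_links                                          # prints 2500000000
```

`bytesize` is used rather than `length` because the limit is given in bytes, and multibyte characters count more than once.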
770
853
|
|
771
|
-
Known Bugs
|
772
|
-
========
|
773
854
|
|
774
|
-
|
775
|
-
- Currently only supports one Sitemap Index file, which can contain 50,000 Sitemap files which can each contain 50,000 urls, so it _only_ supports up to 2,500,000,000 (2.5 billion) urls.
|
855
|
+
## Wishlist & Coming Soon
|
776
856
|
|
777
|
-
|
778
|
-
========
|
857
|
+
* Rails framework agnosticism; support for other frameworks like Merb
|
779
858
|
|
780
|
-
- Rails framework agnosticism; support for other frameworks like Merb
|
781
859
|
|
782
|
-
Thanks (in no particular order)
|
783
|
-
========
|
860
|
+
## Thanks (in no particular order)
|
784
861
|
|
785
|
-
|
786
|
-
|
787
|
-
|
788
|
-
|
789
|
-
|
790
|
-
|
791
|
-
|
792
|
-
|
793
|
-
|
794
|
-
|
862
|
+
* [Rodrigo Flores](https://github.com/rodrigoflores) for News sitemaps
|
863
|
+
* [Alex Soto](http://github.com/apsoto) for Video sitemaps
|
864
|
+
* [Alexandre Bini](http://github.com/alexandrebini) for Image sitemaps
|
865
|
+
* [Dan Pickett](http://github.com/dpickett)
|
866
|
+
* [Rob Biedenharn](http://github.com/rab)
|
867
|
+
* [Richie Vos](http://github.com/jerryvos)
|
868
|
+
* [Adrian Mugnolo](http://github.com/xymbol)
|
869
|
+
* [Jason Weathered](http://github.com/jasoncodes)
|
870
|
+
* [Andy Stewart](http://github.com/airblade)
|
871
|
+
* [Brian Armstrong](https://github.com/barmstrong) for Geo sitemaps
|
795
872
|
|
796
873
|
Copyright (c) 2009 Karl Varga released under the MIT license
|
797
874
|
|
@@ -804,9 +881,11 @@ Copyright (c) 2009 Karl Varga released under the MIT license
|
|
804
881
|
[sitemap_video]:http://www.google.com/support/webmasters/bin/topic.py?topic=10079
|
805
882
|
[sitemap_news]:http://www.google.com/support/webmasters/bin/topic.py?hl=en&topic=10078
|
806
883
|
[sitemap_geo]:http://www.google.com/support/webmasters/bin/topic.py?hl=en&topic=14688
|
884
|
+
[sitemap_mobile]:http://support.google.com/webmasters/bin/answer.py?hl=en&answer=34648
|
807
885
|
[sitemap_protocol]:http://sitemaps.org/protocol.php
|
808
886
|
[video_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80472#4
|
809
887
|
[image_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=178636
|
810
888
|
[geo_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=94555
|
811
889
|
[news_tags]:http://www.google.com/support/news_pub/bin/answer.py?answer=74288
|
812
890
|
[remote_hosts]:https://github.com/kjvarga/sitemap_generator/wiki/Generate-Sitemaps-on-read-only-filesystems-like-Heroku
|
891
|
+
[include_index_change]:https://github.com/kjvarga/sitemap_generator/issues/70
|