sitemap_generator 4.0.alpha → 4.0
- data/Gemfile +2 -2
- data/Gemfile.lock +1 -3
- data/README.md +197 -111
- data/VERSION +1 -1
- data/lib/sitemap_generator/builder/sitemap_file.rb +12 -12
- data/lib/sitemap_generator/builder/sitemap_index_file.rb +22 -8
- data/lib/sitemap_generator/builder/sitemap_url.rb +4 -2
- data/lib/sitemap_generator/link_set.rb +139 -67
- data/lib/sitemap_generator/sitemap_location.rb +5 -5
- data/lib/sitemap_generator/sitemap_namer.rb +14 -5
- data/spec/files/sitemap.deprecated.rb +2 -0
- data/spec/files/sitemap.groups.rb +14 -2
- data/spec/sitemap_generator/alternate_sitemap_spec.rb +25 -0
- data/spec/sitemap_generator/builder/sitemap_file_spec.rb +78 -25
- data/spec/sitemap_generator/builder/sitemap_index_file_spec.rb +75 -12
- data/spec/sitemap_generator/builder/sitemap_index_url_spec.rb +17 -5
- data/spec/sitemap_generator/link_set_spec.rb +56 -13
- data/spec/sitemap_generator/sitemap_generator_spec.rb +222 -75
- data/spec/sitemap_generator/sitemap_groups_spec.rb +52 -41
- data/spec/sitemap_generator/sitemap_location_spec.rb +46 -44
- data/spec/sitemap_generator/sitemap_namer_spec.rb +14 -0
- data/spec/spec_helper.rb +3 -0
- metadata +17 -14
data/Gemfile
CHANGED
data/Gemfile.lock
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
PATH
|
2
2
|
remote: ./
|
3
3
|
specs:
|
4
|
-
sitemap_generator (4.0
|
4
|
+
sitemap_generator (4.0)
|
5
5
|
builder
|
6
6
|
|
7
7
|
GEM
|
@@ -14,7 +14,6 @@ GEM
|
|
14
14
|
metaclass (~> 0.0.1)
|
15
15
|
nokogiri (1.5.0)
|
16
16
|
rake (0.9.2.2)
|
17
|
-
rcov (0.9.11)
|
18
17
|
rspec (2.8.0)
|
19
18
|
rspec-core (~> 2.8.0)
|
20
19
|
rspec-expectations (~> 2.8.0)
|
@@ -32,6 +31,5 @@ DEPENDENCIES
|
|
32
31
|
mocha
|
33
32
|
nokogiri
|
34
33
|
rake
|
35
|
-
rcov
|
36
34
|
rspec
|
37
35
|
sitemap_generator!
|
data/README.md
CHANGED
@@ -9,13 +9,14 @@ Sitemaps adhere to the [Sitemap 0.9 protocol][sitemap_protocol] specification.
|
|
9
9
|
* Framework agnostic
|
10
10
|
* Supports [News sitemaps][sitemap_news], [Video sitemaps][sitemap_video], [Image sitemaps][sitemap_images], [Geo sitemaps][sitemap_geo], [Mobile sitemaps][sitemap_mobile] and [Alternate Links][alternate_links]
|
11
11
|
* Supports read-only filesystems like Heroku via uploading to a remote host like Amazon S3
|
12
|
-
* Compatible with Rails 2 & 3
|
12
|
+
* Compatible with Rails 2 & 3 and tested with Ruby REE, 1.9.2 & 1.9.3
|
13
13
|
* Adheres to the [Sitemap 0.9 protocol][sitemap_protocol]
|
14
14
|
* Handles millions of links
|
15
15
|
* Automatically compresses your sitemaps
|
16
16
|
* Notifies search engines (Google, Bing, SitemapWriter) of new sitemaps
|
17
17
|
* Ensures your old sitemaps stay in place if the new sitemap fails to generate
|
18
|
-
* Gives you complete control over your
|
18
|
+
* Gives you complete control over your sitemap contents and naming scheme
|
19
|
+
* Intelligent sitemap indexing
|
19
20
|
|
20
21
|
### Show Me
|
21
22
|
|
@@ -49,8 +50,7 @@ Output:
|
|
49
50
|
|
50
51
|
```
|
51
52
|
In /Users/karl/projects/sitemap_generator-test/public/
|
52
|
-
+
|
53
|
-
+ sitemap_index.xml.gz 1 sitemaps / 228 Bytes
|
53
|
+
+ sitemap.xml.gz 3 links / 364 Bytes
|
54
54
|
Sitemap stats: 3 links / 1 sitemaps / 0m00s
|
55
55
|
|
56
56
|
Successful ping of Google
|
@@ -65,9 +65,45 @@ Does your website use SitemapGenerator to generate Sitemaps? Where would you be
|
|
65
65
|
|
66
66
|
<a href='http://www.pledgie.com/campaigns/15267'><img alt='Click here to lend your support to: SitemapGenerator and make a donation at www.pledgie.com !' src='http://pledgie.com/campaigns/15267.png?skin_name=chrome' border='0' /></a>
|
67
67
|
|
68
|
+
## Important changes in version 4!
|
69
|
+
|
70
|
+
Version 4.0 introduces a new **non-backwards compatible** naming scheme. **If you are running version 3 or earlier and you upgrade to version 4, you need to make a couple small changes to ensure that search engines can still find your sitemaps!** Your sitemaps will still work fine, but the name of the index file has changed.
|
71
|
+
|
72
|
+
### So what has changed?
|
73
|
+
|
74
|
+
* **The index is generated intelligently**. SitemapGenerator now detects whether you need an index or not, and only generates one if you need it or have requested it. So small sites (fewer than 50,000 links) won't have one, while large sites will. You don't have to worry about anything. And with the `create_index` option, it's easier than ever to control index creation to suit your needs.
|
75
|
+
|
76
|
+
* **The default index file name has changed** from `sitemap_index.xml.gz` to just `sitemap.xml.gz`. So the `_index` part has been removed. This is a more standard naming scheme for the sitemaps. Any further sitemaps are named `sitemap1.xml.gz`, `sitemap2.xml.gz`, `sitemap3.xml.gz` etc, just as before.
|
77
|
+
|
78
|
+
* **Everyone now points search engines to the `sitemap.xml.gz` file**. It doesn't matter whether your site has 10 links or a million links, just point to `sitemap.xml.gz`. If your site needs an index, that is the index. If it doesn't, then that's your sitemap. Simple.
|
79
|
+
|
80
|
+
* **It's easier to write custom namers** because the index and the sitemaps share the same namer instance (which is now a `SitemapGenerator::SimpleNamer` instance).
|
81
|
+
|
82
|
+
* **Groups share the new naming convention**. So the files in your `geo` group will be named `geo.xml.gz`, `geo1.xml.gz`, `geo2.xml.gz` etc. Pre-version 4 these files would have been named `geo1.xml.gz`, `geo2.xml.gz`, `geo3.xml.gz` etc.
|
83
|
+
|
84
|
+
### I don't want it! How can I keep everything as it was?
|
85
|
+
|
86
|
+
You don't care; you just want to get on with your day. To revert to pre-version 4 behaviour, add the following to your sitemap config:
|
87
|
+
|
88
|
+
```ruby
|
89
|
+
SitemapGenerator::Sitemap.create_index = true
|
90
|
+
SitemapGenerator::Sitemap.namer = SitemapGenerator::SimpleNamer.new(:sitemap, :zero => '_index')
|
91
|
+
```
|
92
|
+
|
93
|
+
This tells SitemapGenerator to always create an index file and to name it `sitemap_index.xml.gz`. If you are already using custom namers, you don't need to set `namer`; your old namers should still work as before. If you are using named groups, setting the sitemap namer in this way won't affect your groups, which will still be using the new naming scheme. If this is an issue for you, you may have to create namers for your groups.
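
To make the effect of the `:zero` option concrete, here is a minimal sketch of the naming sequence. `SketchNamer` is a hypothetical stand-in written for this illustration; it is not the gem's `SitemapGenerator::SimpleNamer` and only mimics the names described above, assuming the first file takes the `zero` suffix and later files are numbered from 1.

```ruby
# Illustration only: a hypothetical stand-in, NOT the gem's SimpleNamer.
class SketchNamer
  def initialize(base, zero: nil)
    @base  = base
    @zero  = zero # suffix used for the first file in the sequence (nil means none)
    @count = 0
  end

  # Return the next file name in the sequence.
  def next_name
    suffix = @count.zero? ? @zero : @count
    @count += 1
    "#{@base}#{suffix}.xml.gz"
  end
end

default_namer = SketchNamer.new(:sitemap)
legacy_namer  = SketchNamer.new(:sitemap, zero: '_index')

default_names = Array.new(3) { default_namer.next_name }
legacy_names  = Array.new(3) { legacy_namer.next_name }

puts default_names.inspect
puts legacy_names.inspect
```

Under these assumptions the default sequence starts with `sitemap.xml.gz`, while the legacy sequence starts with `sitemap_index.xml.gz`; both continue with `sitemap1.xml.gz`, `sitemap2.xml.gz` and so on.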
|
94
|
+
|
95
|
+
### I want it! What do I need to do?
|
96
|
+
|
97
|
+
1. Update your `robots.txt` file and make sure it points to `sitemap.xml.gz`.
|
98
|
+
2. Generate your sitemaps to create the new `sitemap.xml.gz` file.
|
99
|
+
3. Optionally remove the old `sitemap_index.xml.gz` file (or link it to the new file if you want to make sure that search engines can find it while you update them.)
|
100
|
+
4. Go to your Google Webmaster tools and other places where you've pointed search engines to your sitemaps and point them to your new `sitemap.xml.gz` file.
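
Step 1 above can be scripted if you manage many sites. The following is a rough sketch, not something shipped with the gem: `update_robots_txt` is a hypothetical helper that takes the contents of a `robots.txt` file as a string and rewrites any `Sitemap:` line that still ends in the old `sitemap_index.xml.gz` name.

```ruby
# Hypothetical helper (not part of SitemapGenerator): point Sitemap
# directives at the new default file name.
def update_robots_txt(contents)
  # Capture everything up to the last slash of the Sitemap URL, then swap
  # the old index file name for the new one.
  contents.gsub(%r{^(Sitemap:\s*\S*/)sitemap_index\.xml\.gz\b}i, '\1sitemap.xml.gz')
end

old_robots = "User-agent: *\nSitemap: http://www.example.com/sitemap_index.xml.gz\n"
new_robots = update_robots_txt(old_robots)
puts new_robots
```

Lines that do not reference the old file name are left untouched, so running it twice is harmless.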
|
101
|
+
|
102
|
+
That's it! Welcome to the future!
|
68
103
|
|
69
104
|
## Changelog
|
70
105
|
|
106
|
+
* **v4.0: NEW, NON-BACKWARDS COMPATIBLE CHANGES.** See above for more info. `create_index` defaults to `:auto`. Define `SitemapGenerator::SimpleNamer` class for simpler custom namers compatible with the new naming conventions. Deprecate `sitemaps_namer`, `sitemap_index_namer` and their respective namer classes (they still work, but their usage is discouraged). Support `nofollow` option on alternate links. Fix formatting of `publication_date` in News sitemaps.
|
71
107
|
* v3.4: Support [alternate links][alternate_links] for urls; Support configurable options in the `SitemapGenerator::S3Adapter`
|
72
108
|
* v3.3: **Support creating sitemaps with no index file**. A big thank-you to [Eric Hochberger][ehoch] for generously paying for this feature.
|
73
109
|
* v3.2.1: Fix syntax error in SitemapGenerator::S3Adapter
|
@@ -203,7 +239,7 @@ SitemapGenerator::Sitemap.ping_search_engines
|
|
203
239
|
Alternatively you can pass in the full URL to your sitemap index in which case we would have just the following:
|
204
240
|
|
205
241
|
```ruby
|
206
|
-
SitemapGenerator::Sitemap.ping_search_engines('http://example.com/
|
242
|
+
SitemapGenerator::Sitemap.ping_search_engines('http://example.com/sitemap.xml.gz')
|
207
243
|
```
|
208
244
|
|
209
245
|
### Crontab
|
@@ -225,7 +261,7 @@ end
|
|
225
261
|
You should add the URL of the sitemap index file to `public/robots.txt` to help search engines find your sitemaps. The URL should be the complete URL to the sitemap index. For example:
|
226
262
|
|
227
263
|
```
|
228
|
-
Sitemap: http://www.example.com/
|
264
|
+
Sitemap: http://www.example.com/sitemap.xml.gz
|
229
265
|
```
|
230
266
|
|
231
267
|
## Deployments & Capistrano
|
@@ -233,40 +269,52 @@ Sitemap: http://www.example.com/sitemap_index.xml.gz
|
|
233
269
|
To ensure that your application's sitemaps are available after a deployment you can do one of the following:
|
234
270
|
|
235
271
|
1. **Generate sitemaps into a directory which is shared by all deployments.**
|
236
|
-
|
237
272
|
You can set your sitemaps path to your shared directory using the `sitemaps_path` option. For example if we have a directory `public/shared/` that is shared by all deployments we can have our sitemaps generated into that directory by setting:
|
238
273
|
|
239
|
-
```ruby
|
240
|
-
SitemapGenerator::Sitemap.sitemaps_path = 'shared/'
|
241
|
-
```
|
242
|
-
|
274
|
+
```ruby
|
275
|
+
SitemapGenerator::Sitemap.sitemaps_path = 'shared/'
|
276
|
+
```
|
243
277
|
2. **Copy the sitemaps from the previous deploy over to the new deploy:**
|
244
|
-
|
245
278
|
(You will need to customize the task if you are using custom sitemap filenames or locations.)
|
246
279
|
|
247
|
-
```ruby
|
248
|
-
after "deploy:update_code", "deploy:copy_old_sitemap"
|
249
|
-
namespace :deploy do
|
250
|
-
|
251
|
-
|
252
|
-
|
253
|
-
end
|
254
|
-
```
|
255
|
-
|
280
|
+
```ruby
|
281
|
+
after "deploy:update_code", "deploy:copy_old_sitemap"
|
282
|
+
namespace :deploy do
|
283
|
+
task :copy_old_sitemap do
|
284
|
+
run "if [ -e #{previous_release}/public/sitemap.xml.gz ]; then cp #{previous_release}/public/sitemap* #{current_release}/public/; fi"
|
285
|
+
end
|
286
|
+
end
|
287
|
+
```
|
256
288
|
3. **Regenerate your sitemaps after each deployment:**
|
257
289
|
|
258
|
-
```ruby
|
259
|
-
after "deploy", "refresh_sitemaps"
|
260
|
-
task :refresh_sitemaps do
|
261
|
-
|
262
|
-
end
|
263
|
-
```
|
290
|
+
```ruby
|
291
|
+
after "deploy", "refresh_sitemaps"
|
292
|
+
task :refresh_sitemaps do
|
293
|
+
run "cd #{latest_release} && RAILS_ENV=#{rails_env} rake sitemap:refresh"
|
294
|
+
end
|
295
|
+
```
|
264
296
|
|
265
297
|
### Sitemaps with no Index File
|
266
298
|
|
267
|
-
|
299
|
+
The sitemap index file is created for you on-demand, meaning that if you have a large site with more than one sitemap file, you will have a sitemap index file to reference those sitemap files. If however you have a small site with only one sitemap file, you don't require an index and so no index will be created. In both cases the index and sitemap file's name, respectively, is `sitemap.xml.gz`.
|
300
|
+
|
301
|
+
You may want to always create an index, even if you only have a small site. Or you may never want to create an index. For these cases, you can use the `create_index` option to control index creation. You can read about this option in the Sitemap Options section below.
|
302
|
+
|
303
|
+
To always create an index:
|
304
|
+
```ruby
|
305
|
+
SitemapGenerator::Sitemap.create_index = true
|
306
|
+
```
|
307
|
+
|
308
|
+
To never create an index:
|
309
|
+
```ruby
|
310
|
+
SitemapGenerator::Sitemap.create_index = false
|
311
|
+
```
|
312
|
+
Your sitemaps will still be called `sitemap.xml.gz`, `sitemap1.xml.gz`, `sitemap2.xml.gz`, etc.
|
268
313
|
|
269
|
-
|
314
|
+
And the default "intelligent" behaviour:
|
315
|
+
```ruby
|
316
|
+
SitemapGenerator::Sitemap.create_index = :auto
|
317
|
+
```
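
The three `create_index` settings can be summarised as a small decision rule. This is only a sketch of the behaviour described in this section, assuming that `:auto` means "create an index when there is more than one sitemap file"; `create_index?` is a hypothetical helper, not a method of the gem.

```ruby
# Hypothetical helper (not part of SitemapGenerator): decide whether an
# index file should be written for a given setting and sitemap count.
def create_index?(create_index, sitemap_count)
  case create_index
  when true  then true               # always create an index
  when false then false              # never create an index
  when :auto then sitemap_count > 1  # only when several sitemap files exist
  else raise ArgumentError, "unsupported create_index value: #{create_index.inspect}"
  end
end

puts create_index?(:auto, 1)
puts create_index?(:auto, 3)
puts create_index?(true, 1)
```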
|
270
318
|
|
271
319
|
### Upload Sitemaps to a Remote Host
|
272
320
|
|
@@ -287,42 +335,46 @@ Sitemap Generator uses CarrierWave to support uploading to Amazon S3 store, Rack
|
|
287
335
|
|
288
336
|
2. Once you have CarrierWave setup and configured all you need to do is set some options in your sitemap config, such as:
|
289
337
|
|
290
|
-
|
291
|
-
|
292
|
-
|
293
|
-
|
294
|
-
|
338
|
+
* `default_host` - your website host name
|
339
|
+
* `sitemaps_host` - the remote host where your sitemaps will be hosted
|
340
|
+
* `public_path` - the directory to write sitemaps to locally e.g. `tmp/`
|
341
|
+
* `sitemaps_path` - set to a directory/path if you don't want to upload to the root of your `sitemaps_host`
|
342
|
+
* `adapter` - instance of `SitemapGenerator::WaveAdapter`
|
295
343
|
|
296
|
-
|
344
|
+
For example:
|
297
345
|
|
298
|
-
```ruby
|
299
|
-
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
300
|
-
SitemapGenerator::Sitemap.sitemaps_host = "http://s3.amazonaws.com/sitemap-generator/"
|
301
|
-
SitemapGenerator::Sitemap.public_path = 'tmp/'
|
302
|
-
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
|
303
|
-
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
|
304
|
-
```
|
346
|
+
```ruby
|
347
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
348
|
+
SitemapGenerator::Sitemap.sitemaps_host = "http://s3.amazonaws.com/sitemap-generator/"
|
349
|
+
SitemapGenerator::Sitemap.public_path = 'tmp/'
|
350
|
+
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
|
351
|
+
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
|
352
|
+
```
|
305
353
|
|
306
354
|
3. Update your `robots.txt` file to point robots to the remote sitemap index file, e.g:
|
307
355
|
|
308
|
-
```
|
309
|
-
Sitemap: http://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap_index.xml.gz
|
310
|
-
```
|
356
|
+
```
|
357
|
+
Sitemap: http://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap_index.xml.gz
|
358
|
+
```
|
359
|
+
|
360
|
+
You generate your sitemaps as usual using `rake sitemap:refresh`.
|
311
361
|
|
312
|
-
|
362
|
+
Note that SitemapGenerator will automatically turn off `include_index` in this case because
|
363
|
+
the `sitemaps_host` does not match the `default_host`. The link to the sitemap index file
|
364
|
+
that would otherwise be included would point to a different host than the rest of the links
|
365
|
+
in the sitemap, something that the sitemap rules forbid. (Since version 3.2 this is no
|
366
|
+
longer an issue because [`include_index` is off by default][include_index_change].)
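
The host rule described in the note can be sketched as follows. This is an illustration under assumptions, not the gem's code: `effective_include_index` is a hypothetical helper, and it assumes two hosts "match" when their hostnames, as parsed by Ruby's standard `URI` library, are equal.

```ruby
require 'uri'

# Hypothetical helper (not part of SitemapGenerator): the index link is
# only kept when it was requested and both hosts share a hostname.
def effective_include_index(include_index, default_host, sitemaps_host)
  return include_index if sitemaps_host.nil?

  same_host = URI.parse(default_host).host == URI.parse(sitemaps_host).host
  include_index && same_host
end

puts effective_include_index(true, "http://www.example.com", "http://s3.amazonaws.com/sitemap-generator/")
puts effective_include_index(true, "http://www.example.com", nil)
```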
|
313
367
|
|
314
|
-
|
315
|
-
|
316
|
-
|
317
|
-
in the sitemap, something that the sitemap rules forbid. (Since version 3.2 this is no
|
318
|
-
longer an issue because [`include_index` is off by default][include_index_change].)
|
368
|
+
4. Verify to Google that you own the S3 URL
|
369
|
+
|
370
|
+
In order for Google to use your sitemap, you need to prove you own the S3 bucket through [Google Webmaster Tools](https://www.google.com/webmasters/tools/home?hl=en). In the example above, you would add the site `http://s3.amazonaws.com/sitemap-generator/sitemaps`. Once you have verified you own the directory, add your `sitemap.xml.gz` to the list of sitemaps for the site.
|
319
371
|
|
320
372
|
### Generating Multiple Sitemaps
|
321
373
|
|
322
374
|
Each call to `create` creates a new sitemap index and associated sitemaps. You can call `create` as many times as you want within your sitemap configuration.
|
323
375
|
|
324
376
|
You must remember to use a different filename or location for each set of sitemaps, otherwise they will
|
325
|
-
overwrite each other. You can use the `filename`, `
|
377
|
+
overwrite each other. You can use the `filename`, `namer` and `sitemaps_path` options for this.
|
326
378
|
|
327
379
|
In the following example we generate three sitemaps each in its own subdirectory:
|
328
380
|
|
@@ -340,13 +392,13 @@ Outputs:
|
|
340
392
|
|
341
393
|
```
|
342
394
|
+ sitemaps/google/sitemap1.xml.gz 2 links / 822 Bytes / 328 Bytes gzipped
|
343
|
-
+ sitemaps/google/
|
395
|
+
+ sitemaps/google/sitemap.xml.gz 1 sitemaps / 389 Bytes / 217 Bytes gzipped
|
344
396
|
Sitemap stats: 2 links / 1 sitemaps / 0m00s
|
345
|
-
+ sitemaps/bing/sitemap1.xml.gz
|
346
|
-
+ sitemaps/bing/
|
397
|
+
+ sitemaps/bing/sitemap1.xml.gz 2 links / 820 Bytes / 330 Bytes gzipped
|
398
|
+
+ sitemaps/bing/sitemap.xml.gz 1 sitemaps / 388 Bytes / 217 Bytes gzipped
|
347
399
|
Sitemap stats: 2 links / 1 sitemaps / 0m00s
|
348
|
-
+ sitemaps/apple/sitemap1.xml.gz
|
349
|
-
+ sitemaps/apple/
|
400
|
+
+ sitemaps/apple/sitemap1.xml.gz 2 links / 820 Bytes / 330 Bytes gzipped
|
401
|
+
+ sitemaps/apple/sitemap.xml.gz 1 sitemaps / 388 Bytes / 214 Bytes gzipped
|
350
402
|
Sitemap stats: 2 links / 1 sitemaps / 0m00s
|
351
403
|
```
|
352
404
|
|
@@ -409,34 +461,24 @@ end
|
|
409
461
|
A few things to note:
|
410
462
|
|
411
463
|
* `SitemapGenerator::Sitemap` is a lazy-initialized sitemap object provided for your convenience.
|
412
|
-
* Every sitemap must set `default_host`. This is the hostname that is used when building links to add to the sitemap.
|
464
|
+
* Every sitemap must set `default_host`. This is the hostname that is used when building links to add to the sitemap (and all links in a sitemap must belong to the same host).
|
413
465
|
* The `create` method takes a block with calls to `add` to add links to the sitemap.
|
414
|
-
* The sitemaps are written to the `public/` directory
|
466
|
+
* The sitemaps are written to the `public/` directory in the directory from which the script is run. You can specify a custom location using the `public_path` or `sitemaps_path` option.
|
415
467
|
|
416
468
|
Now let's see what is output when we run this configuration with `rake sitemap:refresh:no_ping`:
|
417
469
|
|
418
470
|
```
|
419
|
-
|
420
|
-
+
|
471
|
+
In /Users/karl/projects/sitemap_generator-test/public/
|
472
|
+
+ sitemap.xml.gz 2 links / 347 Bytes
|
421
473
|
Sitemap stats: 2 links / 1 sitemaps / 0m00s
|
422
474
|
```
|
423
475
|
|
424
|
-
Weird! The sitemap has two links, even though only added one! This is because SitemapGenerator adds the root URL `/` by default. (Note that prior to version 3.2 the URL of the sitemap index file was also added to the sitemap by default but [this behaviour has been changed][include_index_change] because of Google complaining about nested indexing.) You can change the default behaviour by setting the `include_root` or `include_index` option.
|
425
|
-
|
426
|
-
Now let's take a look at the files that were created. After uncompressing and XML-tidying the contents we have:
|
476
|
+
Weird! The sitemap has two links, even though we only added one! This is because SitemapGenerator adds the root URL `/` for you by default. (Note that prior to version 3.2 the URL of the sitemap index file was also added to the sitemap by default but [this behaviour has been changed][include_index_change] because of Google complaining about nested indexing. This also doesn't make sense anymore because indexes are not always needed.) You can change the default behaviour by setting the `include_root` or `include_index` option.
|
427
477
|
|
428
|
-
|
478
|
+
Now let's take a look at the file that was created. After uncompressing and XML-tidying the contents we have:
|
429
479
|
|
430
|
-
```xml
|
431
|
-
<?xml version="1.0" encoding="UTF-8"?>
|
432
|
-
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
|
433
|
-
<sitemap>
|
434
|
-
<loc>http://www.example.com/sitemap1.xml.gz</loc>
|
435
|
-
</sitemap>
|
436
|
-
</sitemapindex>
|
437
|
-
```
|
438
480
|
|
439
|
-
* `public/
|
481
|
+
* `public/sitemap.xml.gz`
|
440
482
|
|
441
483
|
```xml
|
442
484
|
<?xml version="1.0" encoding="UTF-8"?>
|
@@ -458,6 +500,39 @@ Now let's take a look at the files that were created. After uncompressing and X
|
|
458
500
|
|
459
501
|
The sitemaps conform to the [Sitemap 0.9 protocol][sitemap_protocol]. Notice the value for `priority` and `changefreq` on the root link, the one that was added for us? The values tell us that this link is the highest priority and should be checked regularly because it is constantly changing. You can specify your own values for these options in your call to `add`.
|
460
502
|
|
503
|
+
In this example no sitemap index was created because we have so few links, so none was needed. If we run the same example above and set `create_index = true` we can take a look at what an index file looks like:
|
504
|
+
|
505
|
+
```ruby
|
506
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
507
|
+
SitemapGenerator::Sitemap.create_index = true
|
508
|
+
SitemapGenerator::Sitemap.create do
|
509
|
+
add '/welcome'
|
510
|
+
end
|
511
|
+
```
|
512
|
+
|
513
|
+
And the output:
|
514
|
+
|
515
|
+
```
|
516
|
+
In /Users/karl/projects/sitemap_generator-test/public/
|
517
|
+
+ sitemap1.xml.gz 2 links / 347 Bytes
|
518
|
+
+ sitemap.xml.gz 1 sitemaps / 228 Bytes
|
519
|
+
Sitemap stats: 2 links / 1 sitemaps / 0m00s
|
520
|
+
```
|
521
|
+
|
522
|
+
Now if we look at the uncompressed and formatted contents of `sitemap.xml.gz` we can see that it is a sitemap index and `sitemap1.xml.gz` is a sitemap:
|
523
|
+
|
524
|
+
* `public/sitemap.xml.gz`
|
525
|
+
|
526
|
+
```xml
|
527
|
+
<?xml version="1.0" encoding="UTF-8"?>
|
528
|
+
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
|
529
|
+
<sitemap>
|
530
|
+
<loc>http://www.example.com/sitemap1.xml.gz</loc>
|
531
|
+
<lastmod>2013-05-01T18:10:26-07:00</lastmod>
|
532
|
+
</sitemap>
|
533
|
+
</sitemapindex>
|
534
|
+
```
|
535
|
+
|
461
536
|
### Adding Links
|
462
537
|
|
463
538
|
You call `add` in the block passed to `create` to add a **path** to your sitemap. `add` takes a string path and optional hash of options, generates the URL and adds it to the sitemap. You only need to pass a **path** because the URL will be built for us using the `default_host` we specified. However, if we want to use a different host for a particular link, we can pass the `:host` option to `add`.
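
As a rough illustration of how a path becomes a full URL, the sketch below joins a path onto a host using Ruby's standard `URI.join`. The `build_url` helper is hypothetical and is not how the gem is necessarily implemented; it only shows the `default_host` fallback and the per-link `:host` override described above.

```ruby
require 'uri'

# Hypothetical helper (not part of SitemapGenerator): build the full URL
# for a path, preferring a per-link host over the default host.
def build_url(path, default_host, host: nil)
  URI.join(host || default_host, path).to_s
end

puts build_url('/welcome', 'http://www.example.com')
puts build_url('/login', 'http://www.example.com', host: 'https://securehost.com')
```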
|
@@ -483,8 +558,7 @@ In the example about we pass a `lastmod` (last modified) option with the value o
|
|
483
558
|
Looking at the output from running this sitemap, we see that we have a few more links than before:
|
484
559
|
|
485
560
|
```
|
486
|
-
+
|
487
|
-
+ sitemap_index.xml.gz 1 sitemaps / 364 Bytes / 199 Bytes gzipped
|
561
|
+
+ sitemap.xml.gz 12 links / 2.3 KB / 365 Bytes gzipped
|
488
562
|
Sitemap stats: 12 links / 1 sitemaps / 0m00s
|
489
563
|
```
|
490
564
|
|
@@ -518,7 +592,7 @@ add content_path(content), :lastmod => content.updated_at
|
|
518
592
|
|
519
593
|
* `host` - Default: `default_host` (String).
|
520
594
|
|
521
|
-
Host to use when building the URL. Example:
|
595
|
+
Host to use when building the URL. It's not technically valid to specify a different host for a link in a sitemap according to the spec, but this facility exists in case you have a need. Example:
|
522
596
|
|
523
597
|
```ruby
|
524
598
|
add '/login', :host => 'https://securehost.com'
|
@@ -562,6 +636,8 @@ SitemapGenerator::Sitemap.create do
|
|
562
636
|
end
|
563
637
|
```
|
564
638
|
|
639
|
+
When you add links in this way, an index is always created, unless you've explicitly set `create_index` to `false`.
|
640
|
+
|
565
641
|
### Accessing the LinkSet instance
|
566
642
|
|
567
643
|
Sometimes you need to mess with the internals to do custom stuff. If you need access to the LinkSet instance from within `create()` you can use the `sitemap` method to do so.
|
@@ -570,10 +646,10 @@ In this example, say we have already pre-generated three sitemap files: `sitemap
|
|
570
646
|
|
571
647
|
```ruby
|
572
648
|
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
649
|
+
SitemapGenerator::Sitemap.namer = SitemapGenerator::SimpleNamer.new(:sitemap, :start => 4)
|
573
650
|
SitemapGenerator::Sitemap.create do
|
574
|
-
3.
|
575
|
-
add_to_index sitemap.
|
576
|
-
sitemap.sitemaps_namer.next
|
651
|
+
(1..3).each do |i|
|
652
|
+
add_to_index "sitemap#{i}.xml.gz"
|
577
653
|
end
|
578
654
|
add '/home'
|
579
655
|
add '/another'
|
@@ -584,9 +660,9 @@ The output looks something like this:
|
|
584
660
|
|
585
661
|
```
|
586
662
|
In /Users/karl/projects/sitemap_generator-test/public/
|
587
|
-
+ sitemap4.xml.gz
|
588
|
-
+
|
589
|
-
Sitemap stats:
|
663
|
+
+ sitemap4.xml.gz 3 links / 355 Bytes
|
664
|
+
+ sitemap.xml.gz 4 sitemaps / 242 Bytes
|
665
|
+
Sitemap stats: 3 links / 4 sitemaps / 0m00s
|
590
666
|
```
|
591
667
|
|
592
668
|
### Speeding Things Up
|
@@ -624,9 +700,8 @@ This is useful if you are setting a lot of options.
|
|
624
700
|
Finally, passed as options in a call to `group`:
|
625
701
|
|
626
702
|
```ruby
|
627
|
-
SitemapGenerator::Sitemap.create do
|
628
|
-
group(:
|
629
|
-
:sitemaps_path => 'sitemaps/') do
|
703
|
+
SitemapGenerator::Sitemap.create(:default_host => 'http://example.com') do
|
704
|
+
group(:filename => :somegroup, :sitemaps_path => 'sitemaps/') do
|
630
705
|
add '/home'
|
631
706
|
end
|
632
707
|
end
|
@@ -642,9 +717,9 @@ The following options are supported:
|
|
642
717
|
|
643
718
|
* `default_host` - String. Required. **Host including protocol** to use when building a link to add to your sitemap. For example `http://example.com`. Calling `add '/home'` would then generate the URL `http://example.com/home` and add that to the sitemap. You can pass a `:host` option in your call to `add` to override this value on a per-link basis. For example calling `add '/home', :host => 'https://example.com'` would generate the URL `https://example.com/home`, for that link only.
|
644
719
|
|
645
|
-
* `filename` - Symbol. The **base name for the files** that will be generated. The default value is `:sitemap`. This yields
|
720
|
+
* `filename` - Symbol. The **base name for the files** that will be generated. The default value is `:sitemap`. This yields files with names like `sitemap.xml.gz`, `sitemap1.xml.gz`, `sitemap2.xml.gz`, `sitemap3.xml.gz` etc. If we now set the value to `:geo` the files would be named `geo.xml.gz`, `geo1.xml.gz`, `geo2.xml.gz`, `geo3.xml.gz` etc.
|
646
721
|
|
647
|
-
* `include_index` - Boolean. Whether to **add a link to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. 2012-07: This is now turned off by default because Google may complain about there being 'Nested Sitemap indexes'. Default is `false`. Turned off when `sitemaps_host` is set or within a `group()` block.
|
722
|
+
* `include_index` - Boolean. Whether to **add a link pointing to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. 2012-07: This is now turned off by default because Google may complain about there being 'Nested Sitemap indexes'. Default is `false`. Turned off when `sitemaps_host` is set or within a `group()` block.
|
648
723
|
|
649
724
|
* `include_root` - Boolean. Whether to **add the root** url i.e. '/' to the current sitemap. Default is `true`. Turned off within a `group()` block.
|
650
725
|
|
@@ -652,12 +727,11 @@ The following options are supported:
|
|
652
727
|
|
653
728
|
* `sitemaps_host` - String. **Host including protocol** to use when generating a link to a sitemap file i.e. the hostname of the server where the sitemaps are hosted. The value will differ from the hostname in your sitemap links. For example: `'http://amazon.aws.com/'`. Note that `include_index` is
|
654
729
|
automatically turned off when the `sitemaps_host` does not match `default_host`.
|
655
|
-
Because the link to the sitemap index file that would otherwise be added would point to a
|
656
|
-
different host than the rest of the links in the sitemap. Something that the sitemap rules forbid.
|
730
|
+
This is because the link to the sitemap index file that would otherwise be added would point to a different host than the rest of the links in the sitemap, something that the sitemap rules forbid.
|
657
731
|
|
658
|
-
* `
|
732
|
+
* `namer` - A `SitemapGenerator::SimpleNamer` instance **for generating sitemap names**. You can read about Sitemap Namers by reading the API docs. Allows you to set the name, extension and number sequence for sitemap files, as well as modify the name of the first file in the sequence, which is often the index file. A simple example if we want to generate files like 'newname.xml.gz', 'newname1.xml.gz', etc is `SitemapGenerator::SimpleNamer.new(:newname)`. I've deprecated the old namer options `sitemaps_namer` and `sitemap_index_namer` in favour of this integrated approach, however those should still work.
|
659
733
|
|
660
|
-
* `sitemaps_path` - String. A **relative path** giving a directory under your `public_path` at which to write sitemaps. The difference between the two options is that the `sitemaps_path` is used when generating a link to a sitemap file. For example, if we set `SitemapGenerator::Sitemap.sitemaps_path = 'en/'` and use the default `public_path` sitemaps will be written to `public/en/`.
|
734
|
+
* `sitemaps_path` - String. A **relative path** giving a directory under your `public_path` at which to write sitemaps. The difference between the two options is that the `sitemaps_path` is used when generating a link to a sitemap file. For example, if we set `SitemapGenerator::Sitemap.sitemaps_path = 'en/'` and use the default `public_path` sitemaps will be written to `public/en/`. The URL to the sitemap index would then be `http://example.com/en/sitemap.xml.gz`.
|
661
735
|
|
662
736
|
* `verbose` - Boolean. Whether to **output a sitemap summary** describing the sitemap files and giving statistics about your sitemap. Default is `false`. When using the Rake tasks `verbose` will be `true` unless you pass the `-s` option.
|
663
737
|
|
@@ -678,6 +752,7 @@ Sitemap Groups is a powerful feature that is also very simple to use.
|
|
678
752
|
* The sitemap index file is shared by all groups.
|
679
753
|
* Groups can handle any number of links.
|
680
754
|
* Group sitemaps are finalized (written out) as they get full and at the end of each group.
|
755
|
+
* It's a good idea to name your groups
|
681
756
|
|
682
757
|
### A Groups Example
|
683
758
|
|
@@ -703,16 +778,17 @@ end
|
|
703
778
|
And the output from running the above:
|
704
779
|
|
705
780
|
```
|
706
|
-
|
707
|
-
+
|
708
|
-
+
|
709
|
-
+
|
710
|
-
|
781
|
+
In /Users/karl/projects/sitemap_generator-test/public/
|
782
|
+
+ en/english.xml.gz 1 links / 328 Bytes
|
783
|
+
+ fr/french.xml.gz 1 links / 329 Bytes
|
784
|
+
+ sitemap1.xml.gz 2 links / 346 Bytes
|
785
|
+
+ sitemap.xml.gz 3 sitemaps / 252 Bytes
|
786
|
+
Sitemap stats: 4 links / 3 sitemaps / 0m00s
|
711
787
|
```
|
712
788
|
|
713
|
-
So we have two sitemaps with one link each and one sitemap with
|
789
|
+
So we have two sitemaps with one link each and one sitemap with two links. The sitemaps from the groups are easy to spot by their filenames. They are `english.xml.gz` and `french.xml.gz`. They contain only one link each because **`include_index` and `include_root` are set to `false` by default** in a group.
|
714
790
|
|
715
|
-
On the other hand, the default sitemap which we added `/rss` to has
|
791
|
+
On the other hand, the default sitemap which we added `/rss` to has two links. The root url was added to it when we added `/rss`. If we hadn't added that link `sitemap1.xml.gz` would not have been created. So **when we are using groups, the default sitemap will only be created if we add links to it**.
|
716
792
|
|
717
793
|
**The sitemap index file is shared by all groups**. You can change its filename by setting `SitemapGenerator::Sitemap.filename` or by passing the `:filename` option to `create`.
|
718
794
|
|
@@ -730,6 +806,7 @@ A news item can be added to a sitemap URL by passing a `:news` hash to `add`. T
|
|
730
806
|
#### Example
|
731
807
|
|
732
808
|
```ruby
|
809
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
733
810
|
SitemapGenerator::Sitemap.create do
|
734
811
|
add('/index.html', :news => {
|
735
812
|
:publication_name => "Example",
|
@@ -763,6 +840,7 @@ Images can be added to a sitemap URL by passing an `:images` array to `add`. Ea
|
|
763
840
|
#### Example
|
764
841
|
|
765
842
|
```ruby
|
843
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
766
844
|
SitemapGenerator::Sitemap.create do
|
767
845
|
add('/index.html', :images => [{
|
768
846
|
:loc => 'http://www.example.com/image.png',
|
@@ -788,14 +866,17 @@ To add more than one video to a url, pass an array of video hashes using the `:v
|
|
788
866
|
#### Example
|
789
867
|
|
790
868
|
```ruby
|
791
|
-
|
792
|
-
|
793
|
-
:
|
794
|
-
|
795
|
-
|
796
|
-
|
797
|
-
|
798
|
-
|
869
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
870
|
+
SitemapGenerator::Sitemap.create do
|
871
|
+
add('/index.html', :video => {
|
872
|
+
:thumbnail_loc => 'http://www.example.com/video1_thumbnail.png',
|
873
|
+
:title => 'Title',
|
874
|
+
:description => 'Description',
|
875
|
+
:content_loc => 'http://www.example.com/cool_video.mpg',
|
876
|
+
:tags => %w[one two three],
|
877
|
+
:category => 'Category'
|
878
|
+
})
|
879
|
+
end
|
799
880
|
```
|
800
881
|
|
801
882
|
#### Supported options
|
@@ -811,6 +892,7 @@ Pages with geo data can be added by passing a `:geo` Hash to `add`. The Hash on
|
|
811
892
|
#### Example:
|
812
893
|
|
813
894
|
```ruby
|
895
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
814
896
|
SitemapGenerator::Sitemap.create do
|
815
897
|
add('/stores/1234.xml', :geo => { :format => 'kml' })
|
816
898
|
end
|
@@ -832,10 +914,12 @@ Check out the Google specification [here][alternate_links].
|
|
832
914
|
#### Example
|
833
915
|
|
834
916
|
```ruby
|
917
|
+
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
|
835
918
|
SitemapGenerator::Sitemap.create do
|
836
919
|
add('/index.html', :alternate => {
|
837
920
|
:href => 'http://www.example.de/index.html',
|
838
|
-
:lang => 'de'
|
921
|
+
:lang => 'de',
|
922
|
+
:nofollow => true
|
839
923
|
})
|
840
924
|
end
|
841
925
|
```
|
@@ -844,7 +928,7 @@ end
|
|
844
928
|
|
845
929
|
* `:href` - Required, string.
|
846
930
|
* `:lang` - Required, string.
|
847
|
-
|
931
|
+
* `:nofollow` - Optional, boolean. Used to mark link as "nofollow".
|
848
932
|
|
849
933
|
## Raison d'être
|
850
934
|
|
@@ -891,11 +975,13 @@ Tested and working on:
|
|
891
975
|
|
892
976
|
## Wishlist & Coming Soon
|
893
977
|
|
894
|
-
* Rails framework agnosticism; support for other frameworks like Merb
|
895
|
-
|
896
978
|
|
897
979
|
## Thanks (in no particular order)
|
898
980
|
|
981
|
+
I've kind of stopped maintaining the list of contributors. To all those who have contributed code or a donation, many thanks!
|
982
|
+
|
983
|
+
Some past contributors:
|
984
|
+
|
899
985
|
* [Eric Hochberger][ehoch]
|
900
986
|
* [Rodrigo Flores](https://github.com/rodrigoflores) for News sitemaps
|
901
987
|
* [Alex Soto](http://github.com/apsoto) for Video sitemaps
|