sitemap_generator 4.0.alpha → 4.0

data/Gemfile CHANGED
@@ -7,7 +7,7 @@ group :development, :test do
7
7
  gem 'mocha'
8
8
  gem 'nokogiri'
9
9
  gem 'rake'
10
- gem 'rcov'
11
10
  gem 'rspec'
12
- # gem 'ruby-debug19', :require => 'ruby-debug'
11
+ #gem 'ruby-debug19', :require => 'ruby-debug'
12
+ #gem 'simplecov', :require => false
13
13
  end
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: ./
3
3
  specs:
4
- sitemap_generator (4.0.alpha)
4
+ sitemap_generator (4.0)
5
5
  builder
6
6
 
7
7
  GEM
@@ -14,7 +14,6 @@ GEM
14
14
  metaclass (~> 0.0.1)
15
15
  nokogiri (1.5.0)
16
16
  rake (0.9.2.2)
17
- rcov (0.9.11)
18
17
  rspec (2.8.0)
19
18
  rspec-core (~> 2.8.0)
20
19
  rspec-expectations (~> 2.8.0)
@@ -32,6 +31,5 @@ DEPENDENCIES
32
31
  mocha
33
32
  nokogiri
34
33
  rake
35
- rcov
36
34
  rspec
37
35
  sitemap_generator!
data/README.md CHANGED
@@ -9,13 +9,14 @@ Sitemaps adhere to the [Sitemap 0.9 protocol][sitemap_protocol] specification.
9
9
  * Framework agnostic
10
10
  * Supports [News sitemaps][sitemap_news], [Video sitemaps][sitemap_video], [Image sitemaps][sitemap_images], [Geo sitemaps][sitemap_geo], [Mobile sitemaps][sitemap_mobile] and [Alternate Links][alternate_links]
11
11
  * Supports read-only filesystems like Heroku via uploading to a remote host like Amazon S3
12
- * Compatible with Rails 2 & 3
12
+ * Compatible with Rails 2 & 3 and tested with Ruby REE, 1.9.2 & 1.9.3
13
13
  * Adheres to the [Sitemap 0.9 protocol][sitemap_protocol]
14
14
  * Handles millions of links
15
15
  * Automatically compresses your sitemaps
16
16
  * Notifies search engines (Google, Bing, SitemapWriter) of new sitemaps
17
17
  * Ensures your old sitemaps stay in place if the new sitemap fails to generate
18
- * Gives you complete control over your sitemaps and their content
18
+ * Gives you complete control over your sitemap contents and naming scheme
19
+ * Intelligent sitemap indexing
19
20
 
20
21
  ### Show Me
21
22
 
@@ -49,8 +50,7 @@ Output:
49
50
 
50
51
  ```
51
52
  In /Users/karl/projects/sitemap_generator-test/public/
52
- + sitemap1.xml.gz 3 links / 357 Bytes
53
- + sitemap_index.xml.gz 1 sitemaps / 228 Bytes
53
+ + sitemap.xml.gz 3 links / 364 Bytes
54
54
  Sitemap stats: 3 links / 1 sitemaps / 0m00s
55
55
 
56
56
  Successful ping of Google
@@ -65,9 +65,45 @@ Does your website use SitemapGenerator to generate Sitemaps? Where would you be
65
65
 
66
66
  <a href='http://www.pledgie.com/campaigns/15267'><img alt='Click here to lend your support to: SitemapGenerator and make a donation at www.pledgie.com !' src='http://pledgie.com/campaigns/15267.png?skin_name=chrome' border='0' /></a>
67
67
 
68
+ ## Important changes in version 4!
69
+
70
+ Version 4.0 introduces a new **non-backwards compatible** naming scheme. **If you are running version 3 or earlier and you upgrade to version 4, you need to make a couple of small changes to ensure that search engines can still find your sitemaps!** Your sitemaps will still work fine, but the name of the index file has changed.
71
+
72
+ ### So what has changed?
73
+
74
+ * **The index is generated intelligently**. SitemapGenerator now detects whether you need an index or not, and only generates one if you need it or have requested it. So small sites (fewer than 50,000 links) won't have one, while large sites will. You don't have to worry about anything. And with the `create_index` option, it's easier than ever to control index creation to suit your needs.
75
+
76
+ * **The default index file name has changed** from `sitemap_index.xml.gz` to just `sitemap.xml.gz`. So the `_index` part has been removed. This is a more standard naming scheme for the sitemaps. Any further sitemaps are named `sitemap1.xml.gz`, `sitemap2.xml.gz`, `sitemap3.xml.gz` etc, just as before.
77
+
78
+ * **Everyone now points search engines to the `sitemap.xml.gz` file**. It doesn't matter whether your site has 10 links or a million links, just point to `sitemap.xml.gz`. If your site needs an index, that is the index. If it doesn't, then that's your sitemap. Simple.
79
+
80
+ * **It's easier to write custom namers** because the index and the sitemaps share the same namer instance (which is now a `SitemapGenerator::SimpleNamer` instance).
81
+
82
+ * **Groups share the new naming convention**. So the files in your `geo` group will be named `geo.xml.gz`, `geo1.xml.gz`, `geo2.xml.gz` etc. Pre-version 4 these files would have been named `geo1.xml.gz`, `geo2.xml.gz`, `geo3.xml.gz` etc.
83
+
84
+ ### I don't want it! How can I keep everything as it was?
85
+
86
+ You don't care, you just want to get on with your day. To revert to pre-version 4 behaviour, add the following to your sitemap config:
87
+
88
+ ```ruby
89
+ SitemapGenerator::Sitemap.create_index = true
90
+ SitemapGenerator::Sitemap.namer = SitemapGenerator::SimpleNamer.new(:sitemap, :zero => '_index')
91
+ ```
92
+
93
+ This tells SitemapGenerator to always create an index file and to name it `sitemap_index.xml.gz`. If you are already using custom namers, you don't need to set `namer`; your old namers should still work as before. If you are using named groups, setting the sitemap namer in this way won't affect your groups, which will still be using the new naming scheme. If this is an issue for you, you may have to create namers for your groups.
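
The effect of the `:zero` option on file names can be sketched in plain Ruby. The `SketchNamer` class below is a hypothetical stand-in written for illustration; it is not the gem's `SitemapGenerator::SimpleNamer`, and its `names` method and `:extension` option are assumptions. It only mimics the naming sequence described in this README: the first file gets the `:zero` suffix (empty by default) and later files are numbered from `:start`.

```ruby
# Illustrative stand-in only; NOT the gem's SitemapGenerator::SimpleNamer.
class SketchNamer
  def initialize(base, options = {})
    @base      = base.to_s
    @zero      = options.fetch(:zero, nil)           # suffix of the first file
    @start     = options.fetch(:start, 1)            # first numeric suffix after that
    @extension = options.fetch(:extension, '.xml.gz')
  end

  # Return the first +count+ file names in the sequence.
  def names(count)
    suffixes = [@zero] + (@start...(@start + count - 1)).to_a
    suffixes.first(count).map { |suffix| "#{@base}#{suffix}#{@extension}" }
  end
end

# Version 4 default naming
puts SketchNamer.new(:sitemap).names(3).inspect
# Pre-version 4 style index name
puts SketchNamer.new(:sitemap, :zero => '_index').names(3).inspect
# A group named :geo
puts SketchNamer.new(:geo).names(3).inspect
```

Under these assumptions the first call lists `sitemap.xml.gz` followed by numbered files, while the second starts with `sitemap_index.xml.gz`, which is the name older setups already point search engines to.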
94
+
95
+ ### I want it! What do I need to do?
96
+
97
+ 1. Update your `robots.txt` file and make sure it points to `sitemap.xml.gz`.
98
+ 2. Generate your sitemaps to create the new `sitemap.xml.gz` file.
99
+ 3. Optionally remove the old `sitemap_index.xml.gz` file (or link it to the new file if you want to make sure that search engines can find it while you update them.)
100
+ 4. Go to your Google Webmaster tools and other places where you've pointed search engines to your sitemaps and point them to your new `sitemap.xml.gz` file.
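
Step 1 above can be scripted if you maintain many sites. The following is a minimal sketch, assuming your `robots.txt` uses the default file names; the helper name `point_robots_at_new_sitemap` is made up for this example and is not part of SitemapGenerator.

```ruby
# Hypothetical helper: rewrite "Sitemap:" lines that still reference the
# old default index name so that they reference the new default name.
def point_robots_at_new_sitemap(robots_txt)
  robots_txt.gsub(%r{^(Sitemap:[ \t]*\S+/)sitemap_index\.xml\.gz\b}i,
                  '\1sitemap.xml.gz')
end

old_robots = "User-agent: *\nSitemap: http://www.example.com/sitemap_index.xml.gz\n"
puts point_robots_at_new_sitemap(old_robots)
```

Lines that do not mention `sitemap_index.xml.gz` are left untouched, so the helper is safe to run more than once. If you use custom file names you will need to adjust the pattern.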
101
+
102
+ That's it! Welcome to the future!
68
103
 
69
104
  ## Changelog
70
105
 
106
+ * **v4.0: NEW, NON-BACKWARDS COMPATIBLE CHANGES.** See above for more info. `create_index` defaults to `:auto`. Define `SitemapGenerator::SimpleNamer` class for simpler custom namers compatible with the new naming conventions. Deprecate `sitemaps_namer`, `sitemap_index_namer` and their respective namer classes; they should still work, but their usage is discouraged. Support `nofollow` option on alternate links. Fix formatting of `publication_date` in News sitemaps.
71
107
  * v3.4: Support [alternate links][alternate_links] for urls; Support configurable options in the `SitemapGenerator::S3Adapter`
72
108
  * v3.3: **Support creating sitemaps with no index file**. A big thank-you to [Eric Hochberger][ehoch] for generously paying for this feature.
73
109
  * v3.2.1: Fix syntax error in SitemapGenerator::S3Adapter
@@ -203,7 +239,7 @@ SitemapGenerator::Sitemap.ping_search_engines
203
239
  Alternatively you can pass in the full URL to your sitemap index in which case we would have just the following:
204
240
 
205
241
  ```ruby
206
- SitemapGenerator::Sitemap.ping_search_engines('http://example.com/sitemap_index.xml.gz')
242
+ SitemapGenerator::Sitemap.ping_search_engines('http://example.com/sitemap.xml.gz')
207
243
  ```
208
244
 
209
245
  ### Crontab
@@ -225,7 +261,7 @@ end
225
261
  You should add the URL of the sitemap index file to `public/robots.txt` to help search engines find your sitemaps. The URL should be the complete URL to the sitemap index. For example:
226
262
 
227
263
  ```
228
- Sitemap: http://www.example.com/sitemap_index.xml.gz
264
+ Sitemap: http://www.example.com/sitemap.xml.gz
229
265
  ```
230
266
 
231
267
  ## Deployments & Capistrano
@@ -233,40 +269,52 @@ Sitemap: http://www.example.com/sitemap_index.xml.gz
233
269
  To ensure that your application's sitemaps are available after a deployment you can do one of the following:
234
270
 
235
271
  1. **Generate sitemaps into a directory which is shared by all deployments.**
236
-
237
272
  You can set your sitemaps path to your shared directory using the `sitemaps_path` option. For example if we have a directory `public/shared/` that is shared by all deployments we can have our sitemaps generated into that directory by setting:
238
273
 
239
- ```ruby
240
- SitemapGenerator::Sitemap.sitemaps_path = 'shared/'
241
- ```
242
-
274
+ ```ruby
275
+ SitemapGenerator::Sitemap.sitemaps_path = 'shared/'
276
+ ```
243
277
  2. **Copy the sitemaps from the previous deploy over to the new deploy:**
244
-
245
278
  (You will need to customize the task if you are using custom sitemap filenames or locations.)
246
279
 
247
- ```ruby
248
- after "deploy:update_code", "deploy:copy_old_sitemap"
249
- namespace :deploy do
250
- task :copy_old_sitemap do
251
- run "if [ -e #{previous_release}/public/sitemap_index.xml.gz ]; then cp #{previous_release}/public/sitemap* #{current_release}/public/; fi"
252
- end
253
- end
254
- ```
255
-
280
+ ```ruby
281
+ after "deploy:update_code", "deploy:copy_old_sitemap"
282
+ namespace :deploy do
283
+ task :copy_old_sitemap do
284
+ run "if [ -e #{previous_release}/public/sitemap.xml.gz ]; then cp #{previous_release}/public/sitemap* #{current_release}/public/; fi"
285
+ end
286
+ end
287
+ ```
256
288
  3. **Regenerate your sitemaps after each deployment:**
257
289
 
258
- ```ruby
259
- after "deploy", "refresh_sitemaps"
260
- task :refresh_sitemaps do
261
- run "cd #{latest_release} && RAILS_ENV=#{rails_env} rake sitemap:refresh"
262
- end
263
- ```
290
+ ```ruby
291
+ after "deploy", "refresh_sitemaps"
292
+ task :refresh_sitemaps do
293
+ run "cd #{latest_release} && RAILS_ENV=#{rails_env} rake sitemap:refresh"
294
+ end
295
+ ```
264
296
 
265
297
  ### Sitemaps with no Index File
266
298
 
267
- Sometimes you may not want the sitemap index file to be automatically created, for example when you have a small site with only one sitemap file. Or you may only want an index file created if you have more than one sitemap file. Or you may never want the index file to be created.
299
+ The sitemap index file is created for you on-demand, meaning that if you have a large site with more than one sitemap file, you will have a sitemap index file to reference those sitemap files. If however you have a small site with only one sitemap file, you don't require an index and so no index will be created. In the first case the index is named `sitemap.xml.gz`; in the second case the single sitemap file takes that name.
300
+
301
+ You may want to always create an index, even if you only have a small site. Or you may never want to create an index. For these cases, you can use the `create_index` option to control index creation. You can read about this option in the Sitemap Options section below.
302
+
303
+ To always create an index:
304
+ ```ruby
305
+ SitemapGenerator::Sitemap.create_index = true
306
+ ```
307
+
308
+ To never create an index:
309
+ ```ruby
310
+ SitemapGenerator::Sitemap.create_index = false
311
+ ```
312
+ Your sitemaps will still be called `sitemap.xml.gz`, `sitemap1.xml.gz`, `sitemap2.xml.gz`, etc.
268
313
 
269
- To handle these cases, take a look at the `create_index` option in the Sitemap Options section below.
314
+ And the default "intelligent" behaviour:
315
+ ```ruby
316
+ SitemapGenerator::Sitemap.create_index = :auto
317
+ ```
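
The three settings can be summarised as a small decision function. This is a sketch of the behaviour described in this section, not the gem's internal code; the method name `index_needed?` and its arguments are hypothetical. It does not model `add_to_index`, which the README says always produces an index unless `create_index` is `false`.

```ruby
# Hypothetical summary of the documented behaviour of `create_index`.
# create_index  - true, false or :auto
# sitemap_count - number of sitemap files that were generated
def index_needed?(create_index, sitemap_count)
  case create_index
  when true, false then create_index      # explicit choice always wins
  when :auto       then sitemap_count > 1 # index only when there are several files
  else raise ArgumentError, "unexpected create_index value: #{create_index.inspect}"
  end
end

puts index_needed?(:auto, 1)  # small site, single sitemap file
puts index_needed?(:auto, 3)  # large site, several sitemap files
puts index_needed?(true, 1)   # index forced on
```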
270
318
 
271
319
  ### Upload Sitemaps to a Remote Host
272
320
 
@@ -287,42 +335,46 @@ Sitemap Generator uses CarrierWave to support uploading to Amazon S3 store, Rack
287
335
 
288
336
  2. Once you have CarrierWave setup and configured all you need to do is set some options in your sitemap config, such as:
289
337
 
290
- * `default_host` - your website host name
291
- * `sitemaps_host` - the remote host where your sitemaps will be hosted
292
- * `public_path` - the directory to write sitemaps to locally e.g. `tmp/`
293
- * `sitemaps_path` - set to a directory/path if you don't want to upload to the root of your `sitemaps_host`
294
- * `adapter` - instance of `SitemapGenerator::WaveAdapter`
338
+ * `default_host` - your website host name
339
+ * `sitemaps_host` - the remote host where your sitemaps will be hosted
340
+ * `public_path` - the directory to write sitemaps to locally e.g. `tmp/`
341
+ * `sitemaps_path` - set to a directory/path if you don't want to upload to the root of your `sitemaps_host`
342
+ * `adapter` - instance of `SitemapGenerator::WaveAdapter`
295
343
 
296
- For Example:
344
+ For Example:
297
345
 
298
- ```ruby
299
- SitemapGenerator::Sitemap.default_host = "http://www.example.com"
300
- SitemapGenerator::Sitemap.sitemaps_host = "http://s3.amazonaws.com/sitemap-generator/"
301
- SitemapGenerator::Sitemap.public_path = 'tmp/'
302
- SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
303
- SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
304
- ```
346
+ ```ruby
347
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
348
+ SitemapGenerator::Sitemap.sitemaps_host = "http://s3.amazonaws.com/sitemap-generator/"
349
+ SitemapGenerator::Sitemap.public_path = 'tmp/'
350
+ SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
351
+ SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
352
+ ```
305
353
 
306
354
  3. Update your `robots.txt` file to point robots to the remote sitemap index file, e.g:
307
355
 
308
- ```
309
- Sitemap: http://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap_index.xml.gz
310
- ```
356
+ ```
357
+ Sitemap: http://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap.xml.gz
358
+ ```
359
+
360
+ You generate your sitemaps as usual using `rake sitemap:refresh`.
311
361
 
312
- You generate your sitemaps as usual using `rake sitemap:refresh`.
362
+ Note that SitemapGenerator will automatically turn off `include_index` in this case because
363
+ the `sitemaps_host` does not match the `default_host`. The link to the sitemap index file
364
+ that would otherwise be included would point to a different host than the rest of the links
365
+ in the sitemap, something that the sitemap rules forbid. (Since version 3.2 this is no
366
+ longer an issue because [`include_index` is off by default][include_index_change].)
313
367
 
314
- Note that SitemapGenerator will automatically turn off `include_index` in this case because
315
- the `sitemaps_host` does not match the `default_host`. The link to the sitemap index file
316
- that would otherwise be included would point to a different host than the rest of the links
317
- in the sitemap, something that the sitemap rules forbid. (Since version 3.2 this is no
318
- longer an issue because [`include_index` is off by default][include_index_change].)
368
+ 4. Verify with Google that you own the S3 URL
369
+
370
+ In order for Google to use your sitemap, you need to prove that you own the S3 bucket through [Google Webmaster Tools](https://www.google.com/webmasters/tools/home?hl=en). In the example above, you would add the site `http://s3.amazonaws.com/sitemap-generator/sitemaps`. Once you have verified that you own the directory, add your `sitemap.xml.gz` to the list of sitemaps for the site.
319
371
 
320
372
  ### Generating Multiple Sitemaps
321
373
 
322
374
  Each call to `create` creates a new sitemap index and associated sitemaps. You can call `create` as many times as you want within your sitemap configuration.
323
375
 
324
376
  You must remember to use a different filename or location for each set of sitemaps, otherwise they will
325
- overwrite each other. You can use the `filename`, `sitemaps_namer` and `sitemaps_path` options for this.
377
+ overwrite each other. You can use the `filename`, `namer` and `sitemaps_path` options for this.
326
378
 
327
379
  In the following example we generate three sitemaps each in its own subdirectory:
328
380
 
@@ -340,13 +392,13 @@ Outputs:
340
392
 
341
393
  ```
342
394
  + sitemaps/google/sitemap1.xml.gz 2 links / 822 Bytes / 328 Bytes gzipped
343
- + sitemaps/google/sitemap_index.xml.gz 1 sitemaps / 389 Bytes / 217 Bytes gzipped
395
+ + sitemaps/google/sitemap.xml.gz 1 sitemaps / 389 Bytes / 217 Bytes gzipped
344
396
  Sitemap stats: 2 links / 1 sitemaps / 0m00s
345
- + sitemaps/bing/sitemap1.xml.gz 2 links / 820 Bytes / 330 Bytes gzipped
346
- + sitemaps/bing/sitemap_index.xml.gz 1 sitemaps / 388 Bytes / 217 Bytes gzipped
397
+ + sitemaps/bing/sitemap1.xml.gz 2 links / 820 Bytes / 330 Bytes gzipped
398
+ + sitemaps/bing/sitemap.xml.gz 1 sitemaps / 388 Bytes / 217 Bytes gzipped
347
399
  Sitemap stats: 2 links / 1 sitemaps / 0m00s
348
- + sitemaps/apple/sitemap1.xml.gz 2 links / 820 Bytes / 330 Bytes gzipped
349
- + sitemaps/apple/sitemap_index.xml.gz 1 sitemaps / 388 Bytes / 214 Bytes gzipped
400
+ + sitemaps/apple/sitemap1.xml.gz 2 links / 820 Bytes / 330 Bytes gzipped
401
+ + sitemaps/apple/sitemap.xml.gz 1 sitemaps / 388 Bytes / 214 Bytes gzipped
350
402
  Sitemap stats: 2 links / 1 sitemaps / 0m00s
351
403
  ```
352
404
 
@@ -409,34 +461,24 @@ end
409
461
  A few things to note:
410
462
 
411
463
  * `SitemapGenerator::Sitemap` is a lazy-initialized sitemap object provided for your convenience.
412
- * Every sitemap must set `default_host`. This is the hostname that is used when building links to add to the sitemap.
464
+ * Every sitemap must set `default_host`. This is the hostname that is used when building links to add to the sitemap (and all links in a sitemap must belong to the same host).
413
465
  * The `create` method takes a block with calls to `add` to add links to the sitemap.
414
- * The sitemaps are written to the `public/` directory, which is the default location. You can specify a custom location using the `public_path` or `sitemaps_path` option.
466
+ * The sitemaps are written to the `public/` directory in the directory from which the script is run. You can specify a custom location using the `public_path` or `sitemaps_path` option.
415
467
 
416
468
  Now let's see what is output when we run this configuration with `rake sitemap:refresh:no_ping`:
417
469
 
418
470
  ```
419
- + sitemap1.xml.gz 2 links / 923 Bytes / 329 Bytes gzipped
420
- + sitemap_index.xml.gz 1 sitemaps / 364 Bytes / 199 Bytes gzipped
471
+ In /Users/karl/projects/sitemap_generator-test/public/
472
+ + sitemap.xml.gz 2 links / 347 Bytes
421
473
  Sitemap stats: 2 links / 1 sitemaps / 0m00s
422
474
  ```
423
475
 
424
- Weird! The sitemap has two links, even though only added one! This is because SitemapGenerator adds the root URL `/` by default. (Note that prior to version 3.2 the URL of the sitemap index file was also added to the sitemap by default but [this behaviour has been changed][include_index_change] because of Google complaining about nested indexing.) You can change the default behaviour by setting the `include_root` or `include_index` option.
425
-
426
- Now let's take a look at the files that were created. After uncompressing and XML-tidying the contents we have:
476
+ Weird! The sitemap has two links, even though we only added one! This is because SitemapGenerator adds the root URL `/` for you by default. (Note that prior to version 3.2 the URL of the sitemap index file was also added to the sitemap by default but [this behaviour has been changed][include_index_change] because of Google complaining about nested indexing. This also doesn't make sense anymore because indexes are not always needed.) You can change the default behaviour by setting the `include_root` or `include_index` option.
427
477
 
428
- * `public/sitemap_index.xml.gz`
478
+ Now let's take a look at the file that was created. After uncompressing and XML-tidying the contents we have:
429
479
 
430
- ```xml
431
- <?xml version="1.0" encoding="UTF-8"?>
432
- <sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
433
- <sitemap>
434
- <loc>http://www.example.com/sitemap1.xml.gz</loc>
435
- </sitemap>
436
- </sitemapindex>
437
- ```
438
480
 
439
- * `public/sitemap1.xml.gz`
481
+ * `public/sitemap.xml.gz`
440
482
 
441
483
  ```xml
442
484
  <?xml version="1.0" encoding="UTF-8"?>
@@ -458,6 +500,39 @@ Now let's take a look at the files that were created. After uncompressing and X
458
500
 
459
501
  The sitemaps conform to the [Sitemap 0.9 protocol][sitemap_protocol]. Notice the value for `priority` and `changefreq` on the root link, the one that was added for us? The values tell us that this link is the highest priority and should be checked regularly because it is constantly changing. You can specify your own values for these options in your call to `add`.
460
502
 
503
+ In this example no sitemap index was created because we have so few links, so none was needed. If we run the same example above and set `create_index = true` we can take a look at what an index file looks like:
504
+
505
+ ```ruby
506
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
507
+ SitemapGenerator::Sitemap.create_index = true
508
+ SitemapGenerator::Sitemap.create do
509
+ add '/welcome'
510
+ end
511
+ ```
512
+
513
+ And the output:
514
+
515
+ ```
516
+ In /Users/karl/projects/sitemap_generator-test/public/
517
+ + sitemap1.xml.gz 2 links / 347 Bytes
518
+ + sitemap.xml.gz 1 sitemaps / 228 Bytes
519
+ Sitemap stats: 2 links / 1 sitemaps / 0m00s
520
+ ```
521
+
522
+ Now if we look at the uncompressed and formatted contents of `sitemap.xml.gz` we can see that it is a sitemap index and `sitemap1.xml.gz` is a sitemap:
523
+
524
+ * `public/sitemap.xml.gz`
525
+
526
+ ```xml
527
+ <?xml version="1.0" encoding="UTF-8"?>
528
+ <sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
529
+ <sitemap>
530
+ <loc>http://www.example.com/sitemap1.xml.gz</loc>
531
+ <lastmod>2013-05-01T18:10:26-07:00</lastmod>
532
+ </sitemap>
533
+ </sitemapindex>
534
+ ```
535
+
461
536
  ### Adding Links
462
537
 
463
538
  You call `add` in the block passed to `create` to add a **path** to your sitemap. `add` takes a string path and optional hash of options, generates the URL and adds it to the sitemap. You only need to pass a **path** because the URL will be built for us using the `default_host` we specified. However, if we want to use a different host for a particular link, we can pass the `:host` option to `add`.
@@ -483,8 +558,7 @@ In the example about we pass a `lastmod` (last modified) option with the value o
483
558
  Looking at the output from running this sitemap, we see that we have a few more links than before:
484
559
 
485
560
  ```
486
- + sitemap1.xml.gz 12 links / 2.3 KB / 365 Bytes gzipped
487
- + sitemap_index.xml.gz 1 sitemaps / 364 Bytes / 199 Bytes gzipped
561
+ + sitemap.xml.gz 12 links / 2.3 KB / 365 Bytes gzipped
488
562
  Sitemap stats: 12 links / 1 sitemaps / 0m00s
489
563
  ```
490
564
 
@@ -518,7 +592,7 @@ add content_path(content), :lastmod => content.updated_at
518
592
 
519
593
  * `host` - Default: `default_host` (String).
520
594
 
521
- Host to use when building the URL. Example:
595
+ Host to use when building the URL. It's not technically valid to specify a different host for a link in a sitemap according to the spec, but this facility exists in case you have a need. Example:
522
596
 
523
597
  ```ruby
524
598
  add '/login', :host => 'https://securehost.com'
@@ -562,6 +636,8 @@ SitemapGenerator::Sitemap.create do
562
636
  end
563
637
  ```
564
638
 
639
+ When you add links in this way, an index is always created, unless you've explicitly set `create_index` to `false`.
640
+
565
641
  ### Accessing the LinkSet instance
566
642
 
567
643
  Sometimes you need to mess with the internals to do custom stuff. If you need access to the LinkSet instance from within `create()` you can use the `sitemap` method to do so.
@@ -570,10 +646,10 @@ In this example, say we have already pre-generated three sitemap files: `sitemap
570
646
 
571
647
  ```ruby
572
648
  SitemapGenerator::Sitemap.default_host = "http://www.example.com"
649
+ SitemapGenerator::Sitemap.namer = SitemapGenerator::SimpleNamer.new(:sitemap, :start => 4)
573
650
  SitemapGenerator::Sitemap.create do
574
- 3.times do |i|
575
- add_to_index sitemap.sitemaps_namer.to_s
576
- sitemap.sitemaps_namer.next
651
+ (1..3).each do |i|
652
+ add_to_index "sitemap#{i}.xml.gz"
577
653
  end
578
654
  add '/home'
579
655
  add '/another'
@@ -584,9 +660,9 @@ The output looks something like this:
584
660
 
585
661
  ```
586
662
  In /Users/karl/projects/sitemap_generator-test/public/
587
- + sitemap4.xml.gz 4 links / 347 Bytes
588
- + sitemap_index.xml.gz 4 sitemaps / 242 Bytes
589
- Sitemap stats: 4 links / 4 sitemaps / 0m00s
663
+ + sitemap4.xml.gz 3 links / 355 Bytes
664
+ + sitemap.xml.gz 4 sitemaps / 242 Bytes
665
+ Sitemap stats: 3 links / 4 sitemaps / 0m00s
590
666
  ```
591
667
 
592
668
  ### Speeding Things Up
@@ -624,9 +700,8 @@ This is useful if you are setting a lot of options.
624
700
  Finally, passed as options in a call to `group`:
625
701
 
626
702
  ```ruby
627
- SitemapGenerator::Sitemap.create do
628
- group(:default_host => 'http://example.com',
629
- :sitemaps_path => 'sitemaps/') do
703
+ SitemapGenerator::Sitemap.create(:default_host => 'http://example.com') do
704
+ group(:filename => :somegroup, :sitemaps_path => 'sitemaps/') do
630
705
  add '/home'
631
706
  end
632
707
  end
@@ -642,9 +717,9 @@ The following options are supported:
642
717
 
643
718
  * `default_host` - String. Required. **Host including protocol** to use when building a link to add to your sitemap. For example `http://example.com`. Calling `add '/home'` would then generate the URL `http://example.com/home` and add that to the sitemap. You can pass a `:host` option in your call to `add` to override this value on a per-link basis. For example calling `add '/home', :host => 'https://example.com'` would generate the URL `https://example.com/home`, for that link only.
644
719
 
645
- * `filename` - Symbol. The **base name for the files** that will be generated. The default value is `:sitemap`. This yields sitemaps with names like `sitemap1.xml.gz`, `sitemap2.xml.gz`, `sitemap3.xml.gz` etc, and a sitemap index named `sitemap_index.xml.gz`. If we now set the value to `:geo` the sitemaps would be named `geo1.xml.gz`, `geo2.xml.gz`, `geo3.xml.gz` etc, and the sitemap index would be named `geo_index.xml.gz`.
720
+ * `filename` - Symbol. The **base name for the files** that will be generated. The default value is `:sitemap`. This yields files with names like `sitemap.xml.gz`, `sitemap1.xml.gz`, `sitemap2.xml.gz`, `sitemap3.xml.gz` etc. If we now set the value to `:geo` the files would be named `geo.xml.gz`, `geo1.xml.gz`, `geo2.xml.gz`, `geo3.xml.gz` etc.
646
721
 
647
- * `include_index` - Boolean. Whether to **add a link to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. 2012-07: This is now turned off by default because Google may complain about there being 'Nested Sitemap indexes'. Default is `false`. Turned off when `sitemaps_host` is set or within a `group()` block.
722
+ * `include_index` - Boolean. Whether to **add a link pointing to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. 2012-07: This is now turned off by default because Google may complain about there being 'Nested Sitemap indexes'. Default is `false`. Turned off when `sitemaps_host` is set or within a `group()` block.
648
723
 
649
724
  * `include_root` - Boolean. Whether to **add the root** url i.e. '/' to the current sitemap. Default is `true`. Turned off within a `group()` block.
650
725
 
@@ -652,12 +727,11 @@ The following options are supported:
652
727
 
653
728
  * `sitemaps_host` - String. **Host including protocol** to use when generating a link to a sitemap file i.e. the hostname of the server where the sitemaps are hosted. The value will differ from the hostname in your sitemap links. For example: `'http://amazon.aws.com/'`. Note that `include_index` is
654
729
  automatically turned off when the `sitemaps_host` does not match `default_host`.
655
- Because the link to the sitemap index file that would otherwise be added would point to a
656
- different host than the rest of the links in the sitemap. Something that the sitemap rules forbid.
730
+ This is because the link to the sitemap index file that would otherwise be added would point to a different host than the rest of the links in the sitemap, something that the sitemap rules forbid.
657
731
 
658
- * `sitemaps_namer` - A `SitemapGenerator::SitemapNamer` instance **for generating sitemap names**. You can read about Sitemap Namers by reading the API docs. Sitemap Namers don't apply to the sitemap index. You can only modify the name of the index file using the `filename` option. Sitemap Namers allow you to set the name, extension and number sequence for sitemap files.
732
+ * `namer` - A `SitemapGenerator::SimpleNamer` instance **for generating sitemap names**. You can read about Sitemap Namers by reading the API docs. Allows you to set the name, extension and number sequence for sitemap files, as well as modify the name of the first file in the sequence, which is often the index file. For example, to generate files like 'newname.xml.gz', 'newname1.xml.gz', etc, use `SitemapGenerator::SimpleNamer.new(:newname)`. I've deprecated the old namer options `sitemaps_namer` and `sitemap_index_namer` in favour of this integrated approach, however those should still work.
659
733
 
660
- * `sitemaps_path` - String. A **relative path** giving a directory under your `public_path` at which to write sitemaps. The difference between the two options is that the `sitemaps_path` is used when generating a link to a sitemap file. For example, if we set `SitemapGenerator::Sitemap.sitemaps_path = 'en/'` and use the default `public_path` sitemaps will be written to `public/en/`. And when the sitemap index is added to our sitemap it would have a URL like `http://example.com/en/sitemap_index.xml.gz`.
734
+ * `sitemaps_path` - String. A **relative path** giving a directory under your `public_path` at which to write sitemaps. The difference between the two options is that the `sitemaps_path` is used when generating a link to a sitemap file. For example, if we set `SitemapGenerator::Sitemap.sitemaps_path = 'en/'` and use the default `public_path` sitemaps will be written to `public/en/`. The URL to the sitemap index would then be `http://example.com/en/sitemap.xml.gz`.
661
735
 
662
736
  * `verbose` - Boolean. Whether to **output a sitemap summary** describing the sitemap files and giving statistics about your sitemap. Default is `false`. When using the Rake tasks `verbose` will be `true` unless you pass the `-s` option.
663
737
 
@@ -678,6 +752,7 @@ Sitemap Groups is a powerful feature that is also very simple to use.
678
752
  * The sitemap index file is shared by all groups.
679
753
  * Groups can handle any number of links.
680
754
  * Group sitemaps are finalized (written out) as they get full and at the end of each group.
755
+ * It's a good idea to name your groups.
681
756
 
682
757
  ### A Groups Example
683
758
 
@@ -703,16 +778,17 @@ end
703
778
  And the output from running the above:
704
779
 
705
780
  ```
706
- + en/english1.xml.gz 1 links / 612 Bytes / 296 Bytes gzipped
707
- + fr/french1.xml.gz 1 links / 614 Bytes / 298 Bytes gzipped
708
- + sitemap1.xml.gz 3 links / 919 Bytes / 328 Bytes gzipped
709
- + sitemap_index.xml.gz 3 sitemaps / 505 Bytes / 221 Bytes gzipped
710
- Sitemap stats: 5 links / 3 sitemaps / 0m00s
781
+ In /Users/karl/projects/sitemap_generator-test/public/
782
+ + en/english.xml.gz 1 links / 328 Bytes
783
+ + fr/french.xml.gz 1 links / 329 Bytes
784
+ + sitemap1.xml.gz 2 links / 346 Bytes
785
+ + sitemap.xml.gz 3 sitemaps / 252 Bytes
786
+ Sitemap stats: 4 links / 3 sitemaps / 0m00s
711
787
  ```
712
788
 
713
- So we have two sitemaps with one link each and one sitemap with three links. The sitemaps from the groups are easy to spot by their filenames. They are `english1.xml.gz` and `french1.xml.gz`. They contain only one link each because **`include_index` and `include_root` are set to `false` by default** in a group.
789
+ So we have two sitemaps with one link each and one sitemap with two links. The sitemaps from the groups are easy to spot by their filenames. They are `english.xml.gz` and `french.xml.gz`. They contain only one link each because **`include_index` and `include_root` are set to `false` by default** in a group.
714
790
 
715
- On the other hand, the default sitemap which we added `/rss` to has three links. The sitemap index and root url were added to it when we added `/rss`. If we hadn't added that link `sitemap1.xml.gz` would not have been created. So **when we are using groups, the default sitemap will only be created if we add links to it**.
791
+ On the other hand, the default sitemap, to which we added `/rss`, has two links. The root URL was added to it when we added `/rss`. If we hadn't added that link, `sitemap1.xml.gz` would not have been created. So **when we are using groups, the default sitemap will only be created if we add links to it**.
716
792
 
717
793
  **The sitemap index file is shared by all groups**. You can change its filename by setting `SitemapGenerator::Sitemap.filename` or by passing the `:filename` option to `create`.
718
794
 
@@ -730,6 +806,7 @@ A news item can be added to a sitemap URL by passing a `:news` hash to `add`. T
730
806
  #### Example
731
807
 
732
808
  ```ruby
809
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
733
810
  SitemapGenerator::Sitemap.create do
734
811
  add('/index.html', :news => {
735
812
  :publication_name => "Example",
@@ -763,6 +840,7 @@ Images can be added to a sitemap URL by passing an `:images` array to `add`. Ea
763
840
  #### Example
764
841
 
765
842
  ```ruby
843
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
766
844
  SitemapGenerator::Sitemap.create do
767
845
  add('/index.html', :images => [{
768
846
  :loc => 'http://www.example.com/image.png',
@@ -788,14 +866,17 @@ To add more than one video to a url, pass an array of video hashes using the `:v
788
866
  #### Example
789
867
 
790
868
  ```ruby
791
- add('/index.html', :video => {
792
- :thumbnail_loc => 'http://www.example.com/video1_thumbnail.png',
793
- :title => 'Title',
794
- :description => 'Description',
795
- :content_loc => 'http://www.example.com/cool_video.mpg',
796
- :tags => %w[one two three],
797
- :category => 'Category'
798
- })
869
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
870
+ SitemapGenerator::Sitemap.create do
871
+ add('/index.html', :video => {
872
+ :thumbnail_loc => 'http://www.example.com/video1_thumbnail.png',
873
+ :title => 'Title',
874
+ :description => 'Description',
875
+ :content_loc => 'http://www.example.com/cool_video.mpg',
876
+ :tags => %w[one two three],
877
+ :category => 'Category'
878
+ })
879
+ end
799
880
  ```
800
881
 
801
882
  #### Supported options
@@ -811,6 +892,7 @@ Pages with geo data can be added by passing a `:geo` Hash to `add`. The Hash on
811
892
  #### Example:
812
893
 
813
894
  ```ruby
895
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
814
896
  SitemapGenerator::Sitemap.create do
815
897
  add('/stores/1234.xml', :geo => { :format => 'kml' })
816
898
  end
@@ -832,10 +914,12 @@ Check out the Google specification [here][alternate_links].
832
914
  #### Example
833
915
 
834
916
  ```ruby
917
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
835
918
  SitemapGenerator::Sitemap.create do
836
919
  add('/index.html', :alternate => {
837
920
  :href => 'http://www.example.de/index.html',
838
- :lang => 'de'
921
+ :lang => 'de',
922
+ :nofollow => true
839
923
  })
840
924
  end
841
925
  ```
@@ -844,7 +928,7 @@ end
844
928
 
845
929
  * `:href` - Required, string.
846
930
  * `:lang` - Required, string.
847
-
931
+ * `:nofollow` - Optional, boolean. Used to mark the link as "nofollow".
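
As a rough picture of what these options might produce, the sketch below builds one alternate-link element by hand. It rests on assumptions: the README excerpt does not show the generated XML, so the `xhtml:link` element shape follows Google's alternate-links format, and representing `:nofollow` by appending `nofollow` to the `rel` attribute is a guess about the output, not a documented guarantee. `alternate_link_xml` is a hypothetical helper, not a gem method.

```ruby
# Hypothetical helper, NOT part of sitemap_generator. Assumes :nofollow is
# expressed by extending the rel attribute; XML escaping is omitted for brevity.
def alternate_link_xml(alternate)
  rel = alternate[:nofollow] ? 'alternate nofollow' : 'alternate'
  %(<xhtml:link rel="#{rel}" hreflang="#{alternate[:lang]}" href="#{alternate[:href]}"/>)
end

xml = alternate_link_xml(
  :href     => 'http://www.example.de/index.html',
  :lang     => 'de',
  :nofollow => true
)
puts xml
```

To see what the gem actually emits, generate a sitemap with the example above and inspect the uncompressed file.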
848
932
 
849
933
  ## Raison d'être
850
934
 
@@ -891,11 +975,13 @@ Tested and working on:
891
975
 
892
976
  ## Wishlist & Coming Soon
893
977
 
894
- * Rails framework agnosticism; support for other frameworks like Merb
895
-
896
978
 
897
979
  ## Thanks (in no particular order)
898
980
 
981
+ I've kind of stopped maintaining the list of contributors. To all those who have contributed code or a donation, many thanks!
982
+
983
+ Some past contributors:
984
+
899
985
  * [Eric Hochberger][ehoch]
900
986
  * [Rodrigo Flores](https://github.com/rodrigoflores) for News sitemaps
901
987
  * [Alex Soto](http://github.com/apsoto) for Video sitemaps