sitemap_generator 1.5.2 → 2.0.0

@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: ./
3
3
  specs:
4
- sitemap_generator (1.5.0)
4
+ sitemap_generator (2.0.0)
5
5
 
6
6
  GEM
7
7
  remote: http://rubygems.org/
data/README.md CHANGED
@@ -7,18 +7,27 @@ Features
7
7
  -------
8
8
 
9
9
  - Supports [Video sitemaps][sitemap_video], [Image sitemaps][sitemap_images], and [Geo sitemaps][geo_tags]
10
- - Rails 2.x and 3.x compatible
10
+ - Compatible with Rails 2 & 3
11
11
  - Adheres to the [Sitemap 0.9 protocol][sitemap_protocol]
12
12
  - Handles millions of links
13
- - Compresses Sitemaps using GZip
14
- - Notifies Search Engines (Google, Yahoo, Bing, Ask, SitemapWriter) of new sitemaps
15
- - Ensures your old Sitemaps stay in place if the new Sitemap fails to generate
16
- - You set the hostname (and protocol) of the links in your Sitemap
13
+ - Automatically compresses your sitemaps
14
+ - Notifies search engines (Google, Yahoo, Bing, Ask, SitemapWriter) of new sitemaps
15
+ - Ensures your old sitemaps stay in place if the new sitemap fails to generate
16
+ - Gives you complete control over your sitemaps and their content
17
+
18
+ Contribute
19
+ -------
20
+
21
+ Does your website use SitemapGenerator to generate Sitemaps? Where would you be without Sitemaps? Probably still knocking rocks together. Consider donating to the project to keep it up-to-date and open source.
22
+
23
+ <a href='http://www.pledgie.com/campaigns/15267'><img alt='Click here to lend your support to: SitemapGenerator and make a donation at www.pledgie.com !' src='http://pledgie.com/campaigns/15267.png?skin_name=chrome' border='0' /></a>
24
+
17
25
 
18
26
  Changelog
19
27
  -------
20
28
 
21
- - v1.5.0: Major refactoring & testing in preparation for new API & features
29
+ - **v2.0.0: Introducing a new simpler API, Sitemap Groups, Sitemap Namers and more!**
30
+ - v1.5.0: New options `include_root`, `include_index`; Major testing & refactoring
22
31
  - v1.4.0: [Geo sitemap][geo_tags] support, multiple sitemap support via CONFIG_FILE rake option
23
32
  - v1.3.0: Support setting the sitemaps path
24
33
  - v1.2.0: Verified working with Rails 3 stable release
@@ -35,60 +44,69 @@ Those who knew him know what an amazing guy he was, and what an excellent Rails
35
44
 
36
45
  The canonical repository is now: [http://github.com/kjvarga/sitemap_generator][canonical_repo]
37
46
 
38
- Install
47
+ Install for Rails
39
48
  =======
40
49
 
41
- **Rails 3:**
42
-
43
- 1. Add the gem to your `Gemfile`
50
+ Rails 3
51
+ -------
44
52
 
45
- gem 'sitemap_generator'
53
+ Add the gem to your `Gemfile`:
46
54
 
47
- 2. `$ rake sitemap:install`
55
+ gem 'sitemap_generator'
48
56
 
49
- You don't need to include the tasks in your `Rakefile` because the tasks are loaded for you.
57
+ Then run `bundle`.
50
58
 
51
- **Pre Rails 3: As a gem**
59
+ Rails 2 Gem
60
+ --------
52
61
 
53
- 1. Add the gem as a dependency in your <tt>config/environment.rb</tt>
62
+ 1. Follow the Rails 3 install if you are using a `Gemfile`.
54
63
 
55
- config.gem 'sitemap_generator', :lib => false
64
+ If you are not using a `Gemfile` add the gem to your `config/environment.rb` configuration block with:
56
65
 
57
- 2. `$ rake gems:install`
66
+ config.gem 'sitemap_generator'
58
67
 
59
- 3. Add the following to your `Rakefile`
68
+ Then run `rake gems:install`.
60
69
 
61
- begin
62
- require 'sitemap_generator/tasks'
63
- rescue Exception => e
64
- puts "Warning, couldn't load gem tasks: #{e.message}! Skipping..."
65
- end
70
+ 2. Include the gem's Rake tasks in your `Rakefile`:
66
71
 
67
- 4. `$ rake sitemap:install`
72
+ begin
73
+ require 'sitemap_generator/tasks'
74
+ rescue Exception => e
75
+ puts "Warning, couldn't load gem tasks: #{e.message}! Skipping..."
76
+ end
68
77
 
69
- **Pre Rails 3: As a plugin**
78
+ Rails 2 Plugin
79
+ ----------
70
80
 
71
- 1. `$ ./script/plugin install git://github.com/kjvarga/sitemap_generator.git`
81
+ Run `script/plugin install git://github.com/kjvarga/sitemap_generator.git` from your application's root directory.
72
82
 
73
- Usage
83
+ Getting Started
74
84
  ======
75
85
 
76
- <code>rake sitemap:install</code> creates a <tt>config/sitemap.rb</tt> file which contains your logic for generating the Sitemap files.
86
+ Rake Tasks
87
+ -----
88
+
89
+ Run `rake sitemap:install` to create a `config/sitemap.rb` file which is your sitemap configuration and contains everything needed to build your sitemap. See **Sitemap Configuration** below for more information about how to define your sitemap.
90
+
91
+ Run `rake sitemap:refresh` as needed to create or rebuild your sitemap files. Sitemaps are generated into the `public/` folder and by default are named `sitemap_index.xml.gz`, `sitemap1.xml.gz`, `sitemap2.xml.gz`, etc. As you can see they are automatically gzip compressed for you.
92
+
93
+ `rake sitemap:refresh` will output information about each sitemap that is written including its location, how many links it contains and the size of the file.
77
94
 
78
- Once you have configured your sitemap in <tt>config/sitemap.rb</tt> (see Configuration below) run <code>rake sitemap:refresh</code> as needed to create/rebuild your Sitemap files. Sitemaps are generated into the <tt>public/</tt> folder and are named <tt>sitemap_index.xml.gz</tt>, <tt>sitemap1.xml.gz</tt>, <tt>sitemap2.xml.gz</tt>, etc.
95
+ **To disable all non-essential output from `rake` run the tasks passing a `-s` option.** For example: `rake -s sitemap:refresh`.
79
96
 
80
- Using <code>rake sitemap:refresh</code> will notify major search engines to let them know that a new Sitemap is available (Google, Yahoo, Bing, Ask, SitemapWriter). To generate new Sitemaps without notifying search engines (for example when running in a local environment) use <code>rake sitemap:refresh:no_ping</code>.
97
+ Search Engine Notification
98
+ -----
81
99
 
82
- To ping Yahoo you will need to set your Yahoo AppID in <tt>config/sitemap.rb</tt>. For example: <code>SitemapGenerator::Sitemap.yahoo_app_id = "my_app_id"</code>
100
+ Using `rake sitemap:refresh` will notify major search engines to let them know that a new sitemap is available (Google, Yahoo, Bing, Ask, SitemapWriter). To generate new sitemaps without notifying search engines (for example when running in a local environment) use `rake sitemap:refresh:no_ping`.
83
101
 
84
- To disable all non-essential output (only errors will be displayed) run the rake tasks with the <code>-s</code> option. For example <code>rake -s sitemap:refresh</code>.
102
+ To ping Yahoo you will need to set your Yahoo AppID in `config/sitemap.rb`. For example: `SitemapGenerator::Sitemap.yahoo_app_id = "my_app_id"`
85
103
 
86
- Cron
104
+ Crontab
87
105
  -----
88
106
 
89
- To keep your Sitemaps up-to-date, setup a cron job. Make sure to pass the <code>-s</code> option to silence rake. That way you will only get email when the sitemap build fails.
107
+ To keep your sitemaps up-to-date, set up a cron job. Make sure to pass the `-s` option to silence rake. That way you will only get email if the sitemap build fails.
90
108
 
91
- If you're using Whenever, your schedule would look something like the following:
109
+ If you're using Whenever, your schedule would look something like this:
92
110
 
93
111
  # config/schedule.rb
94
112
  every 1.day, :at => '5:00 am' do
@@ -98,195 +116,361 @@ If you're using Whenever, your schedule would look something like the following:
98
116
  Robots.txt
99
117
  ----------
100
118
 
101
- You should add the Sitemap index file to <code>public/robots.txt</code> to help search engines find your Sitemaps. The URL should be the complete URL to the Sitemap index file. For example:
119
+ You should add the URL of the sitemap index file to `public/robots.txt` to help search engines find your sitemaps. The URL should be the complete URL to the sitemap index. For example:
102
120
 
103
- Sitemap: http://www.example.org/sitemap_index.xml.gz
121
+ Sitemap: http://www.example.com/sitemap_index.xml.gz
104
122
 
105
- Image Sitemaps
106
- -----------
123
+ Deployments & Capistrano
124
+ ----------
107
125
 
108
- Images can be added to a sitemap URL by passing an <tt>:images</tt> array to <tt>add()</tt>. Each item in the array must be a Hash containing tags defined by the [Image Sitemap][image_tags] specification. For example:
126
+ To ensure that your application's sitemaps are available after a deployment you can do one of the following:
109
127
 
110
- sitemap.add('/index.html', :images => [{ :loc => 'http://www.example.com/image.png', :title => 'Image' }])
128
+ 1. **Generate sitemaps into a directory which is shared by all deployments.**
111
129
 
112
- Supported image options include:
130
+ You can set your sitemaps path to your shared directory using the `sitemaps_path` option. For example if we have a directory `public/shared/` that is shared by all deployments we can have our sitemaps generated into that directory by setting:
113
131
 
114
- * `loc` Required, location of the image
115
- * `caption`
116
- * `geo_location`
117
- * `title`
118
- * `license`
132
+ SitemapGenerator::Sitemap.sitemaps_path = 'shared/'
119
133
 
120
- Video Sitemaps
121
- -----------
134
+ 2. **Copy the sitemaps from the previous deploy over to the new deploy:**
122
135
 
123
- A video can be added to a sitemap URL by passing a <tt>:video</tt> Hash to <tt>add()</tt>. The Hash can contain tags defined by the [Video Sitemap specification][video_tags]. To associate more than one <tt>tag</tt> with a video, pass the tags as an array with the key <tt>:tags</tt>.
136
+ (You will need to customize the task if you are using custom sitemap filenames or locations.)
124
137
 
125
- sitemap.add('/index.html', :video => { :thumbnail_loc => 'http://www.example.com/video1_thumbnail.png', :title => 'Title', :description => 'Description', :content_loc => 'http://www.example.com/cool_video.mpg', :tags => %w[one two three], :category => 'Category' })
138
+ after "deploy:update_code", "deploy:copy_old_sitemap"
139
+ namespace :deploy do
140
+ task :copy_old_sitemap do
141
+ run "if [ -e #{previous_release}/public/sitemap_index.xml.gz ]; then cp #{previous_release}/public/sitemap* #{current_release}/public/; fi"
142
+ end
143
+ end
126
144
 
127
- Supported video options include:
128
145
 
129
- * `thumbnail_loc` Required
130
- * `title` Required
131
- * `description` Required
132
- * `content_loc` Depends. At least one of `player_loc` or `content_loc` is required
133
- * `player_loc` Depends. At least one of `player_loc` or `content_loc` is required
134
- * `expiration_date` Recommended
135
- * `duration` Recommended
136
- * `rating`
137
- * `view_count`
138
- * `publication_date`
139
- * `family_friendly`
140
- * `tags` A list of tags if more than one tag.
141
- * `tag` A single tag. See `tags`
142
- * `category`
143
- * `gallery_loc`
144
- * `uploader` (use `uploader_info` to set the info attribute)
146
+ 3. **Regenerate your sitemaps after each deployment:**
145
147
 
146
- Geo Sitemaps
147
- -----------
148
+ after "deploy", "refresh_sitemaps"
149
+ task :refresh_sitemaps do
150
+ run "cd #{latest_release} && RAILS_ENV=#{rails_env} rake sitemap:refresh"
151
+ end
148
152
 
149
- Page with geo data can be added by passing a <tt>:geo</tt> Hash to <tt>add()</tt>. The Hash only supports one tag of <tt>:format</tt>. Google provides an [example of a geo sitemap link here][geo_tags]. Note that the sitemap does not actually contain your KML or GeoRSS. It merely links to a page that has this content.
153
+ Sitemap Configuration
154
+ ======
150
155
 
151
- sitemap.add('/stores/1234.xml', :geo => { :format => 'kml' })
156
+ A sitemap configuration file contains all the information needed to generate your sitemaps. By default SitemapGenerator looks for a configuration file in `config/sitemap.rb` - relative to your application root or the current working directory. (Run `rake sitemap:install` to have this file generated for you if you have not done so already.)
152
157
 
153
- Supported geo options include:
158
+ If you want to use a non-standard configuration file, or have multiple configuration files, you can specify which one to run by passing the `CONFIG_FILE` option like so:
154
159
 
155
- * `format` Required, either 'kml' or 'georss'
156
-
157
- Configuration
158
- ======
160
+ rake sitemap:refresh CONFIG_FILE="config/geo_sitemap.rb"
161
+
162
+ A Simple Example
163
+ -------
164
+
165
+ So what does a sitemap configuration look like? Let's take a look at a simple example:
159
166
 
160
- The sitemap configuration file can be found in <tt>config/sitemap.rb</tt>. When you run a rake task to refresh your sitemaps this file is evaluated. It contains all your configuration settings, as well as your sitemap definition.
167
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
168
+ SitemapGenerator::Sitemap.create do
169
+ add '/welcome'
170
+ end
161
171
 
162
- Sitemap Links
172
+ A few things to note:
173
+
174
+ * `SitemapGenerator::Sitemap` is a lazy-initialized sitemap object provided for your convenience.
175
+ * Every sitemap must set `default_host`. This is the hostname that is used when building links to add to the sitemap.
176
+ * The `create` method takes a block with calls to `add` to add links to the sitemap.
177
+ * The sitemaps are written to the `public/` directory, which is the default location. You can specify a custom location using the `public_path` or `sitemaps_path` option.
178
+
179
+ Now let's see what is output when we run this configuration with `rake sitemap:refresh:no_ping`:
180
+
181
+ + sitemap1.xml.gz 3 links / 923 Bytes / 329 Bytes gzipped
182
+ + sitemap_index.xml.gz 1 sitemaps / 364 Bytes / 199 Bytes gzipped
183
+ Sitemap stats: 3 links / 1 sitemaps / 0m00s
184
+
185
+ Weird! The sitemap has three links, even though we only added one! This is because SitemapGenerator adds the root URL `/` and the URL of the sitemap index file to your sitemap by default. (You can change the default behaviour by setting the `include_root` or `include_index` option.)
186
+
187
+ Now let's take a look at the files that were created. After uncompressing and XML-tidying the contents we have:
188
+
189
+ * `public/sitemap_index.xml.gz`
190
+
191
+ <?xml version="1.0" encoding="UTF-8"?>
192
+ <sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">
193
+ <sitemap>
194
+ <loc>http://www.example.com/sitemap1.xml.gz</loc>
195
+ </sitemap>
196
+ </sitemapindex>
197
+
198
+ * `public/sitemap1.xml.gz`
199
+
200
+ <?xml version="1.0" encoding="UTF-8"?>
201
+ <urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1" xmlns:geo="http://www.google.com/geo/schemas/sitemap/1.0" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
202
+ <url>
203
+ <loc>http://www.example.com/</loc>
204
+ <lastmod>2011-05-21T00:03:38+00:00</lastmod>
205
+ <changefreq>always</changefreq>
206
+ <priority>1.0</priority>
207
+ </url>
208
+ <url>
209
+ <loc>http://www.example.com/sitemap_index.xml.gz</loc>
210
+ <lastmod>2011-05-21T00:03:38+00:00</lastmod>
211
+ <changefreq>always</changefreq>
212
+ <priority>1.0</priority>
213
+ </url>
214
+ <url>
215
+ <loc>http://www.example.com/welcome</loc>
216
+ <lastmod>2011-05-21T00:03:38+00:00</lastmod>
217
+ <changefreq>weekly</changefreq>
218
+ <priority>0.5</priority>
219
+ </url>
220
+ </urlset>
221
+
222
+ The sitemaps conform to the [Sitemap 0.9 protocol][sitemap_protocol]. Notice the values for `priority` and `changefreq` on the root and sitemap index links, the ones that were added for us? The values tell us that these links are the highest priority and should be checked regularly because they are constantly changing. You can specify your own values for these options in your call to `add`.
223
+
224
+ Adding Links
163
225
  ----------
164
226
 
165
- The Root Path <tt>/</tt> and Sitemap Index file are automatically added to your sitemap. Links are added to the Sitemap output in the order they are specified. Add links to your sitemap by calling <tt>add_links</tt>, passing a block which receives the sitemap object. Then call <tt>add(path, options)</tt> on the sitemap to add a link.
227
+ You call `add` in the block passed to `create` to add a **path** to your sitemap. `add` takes a string path and optional hash of options, generates the URL and adds it to the sitemap. You only need to pass a **path** because the URL will be built for us using the `default_host` we specified. However, if we want to use a different host for a particular link, we can pass the `:host` option to `add`.
166
228
 
167
- For Example:
229
+ Let's see another example:
168
230
 
169
- SitemapGenerator::Sitemap.add_links do |sitemap|
170
- sitemap.add '/reports'
231
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
232
+ SitemapGenerator::Sitemap.create do
233
+ add '/contact_us'
234
+ Content.find_each do |content|
235
+ add content_path(content), :lastmod => content.updated_at
236
+ end
171
237
  end
172
238
 
173
- The Rails URL helpers are automatically included for you if Rails is detected. So in your call to <tt>add</tt> you can use them to generate paths for your active records, e.g.:
239
+ In this example first we add the `/contact_us` page to the sitemap and then we iterate through the Content model's records adding each one to the sitemap using the `content_path` helper method to generate the path for each record.
174
240
 
175
- Article.find_each do |article|
176
- sitemap.add article_path(article), :lastmod => article.updated_at
177
- end
241
+ The **Rails URL/path helper methods are automatically made available** to us in the `create` block. This keeps the logic for building our paths out of the sitemap config and in the Rails application where it should be. You use those methods just like you would in your application's view files.
178
242
 
179
- For large sitemaps it is advisable to iterate through your Active Records in batches to avoid loading all records into memory at once. As of Rails 2.3.2 you can use <tt>ActiveRecord::Base#find_each</tt> or <tt>ActiveRecord::Base#find_in_batches</tt> to do batched finds, which can significantly improve sitemap performance.
243
+ In the example above we pass a `lastmod` (last modified) option with the value of the record's `updated_at` attribute so that search engines know to only re-index the page when the record changes.
180
244
 
181
- Valid [options to <tt>add</tt>](http://sitemaps.org/protocol.php#xmlTagDefinitions) are:
245
+ Looking at the output from running this sitemap, we see that we have a few more links than before:
182
246
 
183
- * `priority` The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. Default _0.5_
184
- * `changefreq` One of: always, hourly, daily, weekly, monthly, yearly, never. Default _weekly_
185
- * `lastmod` Time instance. The date of last modification. Default `Time.now`
186
- * `host` Optional host for the link's URL. Defaults to `default_host`
187
-
188
- Sitemaps Path
189
- ----------
247
+ + sitemap1.xml.gz 12 links / 2.3 KB / 365 Bytes gzipped
248
+ + sitemap_index.xml.gz 1 sitemaps / 364 Bytes / 199 Bytes gzipped
249
+ Sitemap stats: 12 links / 1 sitemaps / 0m00s
190
250
 
191
- By default sitemaps are generated into <tt>public/</tt>. You can customize the location for your generated sitemaps by setting <tt>sitemaps_path</tt> to a path relative to your public directory. The directory will be created for you if it does not already exist.
251
+ From this example we can see that:
192
252
 
193
- For example:
253
+ * The `create` block can contain Ruby code
254
+ * The Rails URL/path helper methods are made available to us, and
255
+ * The basic syntax for adding paths to the sitemap is a call to `add`
194
256
 
195
- SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
257
+ You can read more about `add` in the [XML Specification](http://sitemaps.org/protocol.php#xmlTagDefinitions).
196
258
 
197
- Will generate sitemaps into the `public/sitemaps/` directory. If you want your sitemaps to be findable by robots, you need to specify the location of your sitemap index file in your <tt>public/robots.txt</tt>.
259
+ ### Supported Options to `add`
198
260
 
199
- Sitemaps Host
200
- ----------
261
+ * `changefreq` - Default: `'weekly'` (String).
201
262
 
202
- You must set the <tt>default_host</tt> that is to be used when adding links to your sitemap. The hostname should match the host that the sitemaps are going to be served from. For example:
263
+ Indicates how often the content of the page changes. One of `'always'`, `'hourly'`, `'daily'`, `'weekly'`, `'monthly'`, `'yearly'` or `'never'`. Example:
203
264
 
204
- SitemapGenerator::Sitemap.default_host = "http://www.example.com"
265
+ add '/contact_us', :changefreq => 'monthly'
205
266
 
206
- The hostname must include the full protocol.
267
+ * `lastmod` - Default: `Time.now` (Time).
207
268
 
208
- Sitemap Filenames
209
- ----------
269
+ The date and time of last modification. Example:
210
270
 
211
- By default sitemaps have the name <tt>sitemap1.xml.gz</tt>, <tt>sitemap2.xml.gz</tt>, etc with the sitemap index having name <tt>sitemap_index.xml.gz</tt>.
271
+ add content_path(content), :lastmod => content.updated_at
212
272
 
213
- If you want to change the <tt>sitemap</tt> portion of the name you can set it as shown below. The surrounding structure of numbers, extensions, and _index will stay the same. For example:
273
+ * `host` - Default: `default_host` (String).
214
274
 
215
- SitemapGenerator::Sitemap.filename = "geo_sitemap"
275
+ Host to use when building the URL. Example:
216
276
 
217
- Example Configuration File
218
- ---------
277
+ add '/login', :host => 'https://securehost.com'
219
278
 
220
- SitemapGenerator::Sitemap.default_host = "http://www.example.com"
221
- SitemapGenerator::Sitemap.yahoo_app_id = nil # Set to your Yahoo AppID to ping Yahoo
222
-
223
- SitemapGenerator::Sitemap.add_links do |sitemap|
224
- # Put links creation logic here.
225
- #
226
- # The Root Path ('/') and Sitemap Index file are added automatically.
227
- # Links are added to the Sitemap output in the order they are specified.
228
- #
229
- # Usage: sitemap.add path, options
230
- # (default options are used if you don't specify them)
231
- #
232
- # Defaults: :priority => 0.5, :changefreq => 'weekly',
233
- # :lastmod => Time.now, :host => default_host
234
-
235
- # add '/articles'
236
- sitemap.add articles_path, :priority => 0.7, :changefreq => 'daily'
237
-
238
- # add all articles
239
- Article.all.each do |a|
240
- sitemap.add article_path(a), :lastmod => a.updated_at
241
- end
279
+ * `priority` - Default: `0.5` (Float).
280
+
281
+ The priority of the URL relative to other URLs on a scale from 0 to 1. Example:
282
+
283
+ add '/about', :priority => 0.75
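The defaults listed above can be pictured as a hash that the options passed to `add` are merged over. The following is only a sketch in plain Ruby, not the gem's implementation: `build_link` and `DEFAULT_HOST` are hypothetical names introduced for illustration, and the way host and path are joined is an assumption.

```ruby
# Illustrative only: a plain-Ruby model of how the documented defaults
# combine with the options passed to `add`. Not SitemapGenerator's code.
DEFAULT_HOST = 'http://www.example.com' # stands in for `default_host`

# Hypothetical helper: returns a hash describing one sitemap link.
def build_link(path, options = {})
  defaults = {
    :changefreq => 'weekly',
    :lastmod    => Time.now,
    :host       => DEFAULT_HOST,
    :priority   => 0.5
  }
  link = defaults.merge(options)
  # Join host and path, avoiding a doubled slash (an assumption of this sketch).
  link[:loc] = link[:host].chomp('/') + '/' + path.sub(%r{\A/}, '')
  link
end

link = build_link('/about', :priority => 0.75)
puts link[:loc]        # http://www.example.com/about
puts link[:priority]   # 0.75
puts link[:changefreq] # weekly
```

Options that are not passed keep their default, which is why only `:priority` changes in the usage lines.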
242
284
 
243
- # add news page with images
244
- News.all.each do |news|
245
- images = news.images.collect do |image|
246
- { :loc => image.url, :title => image.name }
247
- end
248
- sitemap.add news_path(news), :images => images
249
- end
250
- end
251
285
 
252
- Generating Multiple Sets Of Sitemaps
286
+ Speeding Things Up
253
287
  ----------
254
288
 
255
- To generate multiple sets of sitemaps you can create multiple configuration files. Each should contain a different <tt>SitemapGenerator::Sitemap.filename</tt> to avoid overwriting the previous set. (Of course you can keep the default name of 'sitemap' in one of them.) You can then build each set with a separate rake task. For example:
289
+ For large ActiveRecord collections with thousands of records it is advisable to iterate through them in batches to avoid loading all records into memory at once. For this reason in the example above we use `Content.find_each` which is a batched iterator available since Rails 2.3.2, rather than `Content.all`.
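To make the idea of batching concrete without needing a database, here is a small plain-Ruby sketch. It uses `each_slice` on a range as a stand-in for batched loading; the batch size of 1000, the `/contents/ID` paths and the record count are assumptions made for this illustration, not values taken from the gem or from Rails.

```ruby
# Illustrative only: a range of IDs stands in for a large table of records,
# and `each_slice` plays the role of batched loading.
record_ids = (1..2500)
batch_size = 1000 # assumption made for this sketch

batches = 0
links_added = 0
largest_batch = 0

record_ids.each_slice(batch_size) do |batch|
  batches += 1
  largest_batch = [largest_batch, batch.size].max
  batch.each do |id|
    path = "/contents/#{id}" # in a real config this is where `add` would be called
    links_added += 1 unless path.empty?
  end
end

puts batches       # 3
puts links_added   # 2500
puts largest_batch # 1000
```

At no point are more than `batch_size` items held in a batch, which is the property that keeps memory use flat for large collections.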
290
+
291
+ Generating Multiple Sitemap Indexes
292
+ ----------
293
+
294
+ Each sitemap configuration corresponds to one sitemap index. To generate multiple sets of sitemaps you can create multiple configuration files. Each should specify a different location or filename to avoid overwriting each other. To generate your sitemaps, specify the configuration file to run in your call to `rake sitemap:refresh` using the `CONFIG_FILE` argument like in the following example:
256
295
 
257
- rake sitemap:refresh
258
296
  rake sitemap:refresh CONFIG_FILE="config/geo_sitemap.rb"
259
-
260
- The first one uses the default config file at <tt>config/sitemap.rb</tt>. Your first config file might look like this:
261
297
 
262
- # config/sitemap.rb
263
- SitemapGenerator::Sitemap.default_host = "http://www.example.com"
264
- SitemapGenerator::Sitemap.add_links do |sitemap|
265
- Store.each do |store|
266
- sitemap.add store_path(store)
298
+ Customizing your Sitemaps
299
+ =======
300
+
301
+ SitemapGenerator supports a number of options which allow you to control every aspect of your sitemap generation. How they are named, where they are stored, the contents of the links and the location that the sitemaps will be hosted from can all be set.
302
+
303
+ The options can be set in the following ways.
304
+
305
+ On `SitemapGenerator::Sitemap`:
306
+
307
+ SitemapGenerator::Sitemap.default_host = 'http://example.com'
308
+ SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
309
+
310
+ These options will apply to all sitemaps. This is how you set most options.
311
+
312
+ Passed as options in the call to `create`:
313
+
314
+ SitemapGenerator::Sitemap.create(
315
+ :default_host => 'http://example.com',
316
+ :sitemaps_path => 'sitemaps/') do
317
+ add '/home'
318
+ end
319
+
320
+ This is useful if you are setting a lot of options.
321
+
322
+ Finally, passed as options in a call to `group`:
323
+
324
+ SitemapGenerator::Sitemap.create do
325
+ group(:default_host => 'http://example.com',
326
+ :sitemaps_path => 'sitemaps/') do
327
+ add '/home'
267
328
  end
268
329
  end
269
330
 
270
- And the second:
331
+ The options passed to `group` only apply to the links and sitemaps generated in the group. Sitemap Groups are useful to group links into specific sitemaps, or to set options that you only want to apply to the links in that group.
332
+
333
+ Sitemap Options
334
+ -------
335
+
336
+ The following options are supported:
337
+
338
+ * `default_host` - String. Required. **Host including protocol** to use when building a link to add to your sitemap. For example `http://example.com`. Calling `add '/home'` would then generate the URL `http://example.com/home` and add that to the sitemap. You can pass a `:host` option in your call to `add` to override this value on a per-link basis. For example calling `add '/home', :host => 'https://example.com'` would generate the URL `https://example.com/home`, for that link only.
339
+
340
+ * `filename` - Symbol. The **base name for the files** that will be generated. The default value is `:sitemap`. This yields sitemaps with names like `sitemap1.xml.gz`, `sitemap2.xml.gz`, `sitemap3.xml.gz` etc, and a sitemap index named `sitemap_index.xml.gz`. If we now set the value to `:geo` the sitemaps would be named `geo1.xml.gz`, `geo2.xml.gz`, `geo3.xml.gz` etc, and the sitemap index would be named `geo_index.xml.gz`.
341
+
342
+ * `include_index` - Boolean. Whether to **add a link to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. Default is `true`.
343
+
344
+ * `include_root` - Boolean. Whether to **add the root** url i.e. '/' to the current sitemap. Default is `true`.
345
+
346
+ * `public_path` - String. A **full or relative path** to the `public` directory or the directory you want to write sitemaps into. Defaults to `public/` under your application root or relative to the current working directory.
347
+
348
+ * `sitemaps_host` - String. **Host including protocol** to use when generating a link to a sitemap file i.e. the hostname of the server where the sitemaps are hosted. The value will differ from the hostname in your sitemap links. For example: `'http://amazon.aws.com/'`
349
+
350
+ * `sitemaps_namer` - A `SitemapGenerator::SitemapNamer` instance **for generating sitemap names**. You can read about Sitemap Namers by reading the API docs. Sitemap Namers don't apply to the sitemap index. You can only modify the name of the index file using the `filename` option. Sitemap Namers allow you to set the name, extension and number sequence for sitemap files.
351
+
352
+ * `sitemaps_path` - String. A **relative path** giving a directory under your `public_path` at which to write sitemaps. The difference between the two options is that the `sitemaps_path` is used when generating a link to a sitemap file. For example, if we set `SitemapGenerator::Sitemap.sitemaps_path = 'en/'` and use the default `public_path` sitemaps will be written to `public/en/`. And when the sitemap index is added to our sitemap it would have a URL like `http://example.com/en/sitemap_index.xml.gz`.
353
+
354
+ * `verbose` - Boolean. Whether to **output a sitemap summary** describing the sitemap files and giving statistics about your sitemap. Default is `false`. When using the Rake tasks `verbose` will be `true` unless you pass the `-s` option.
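As a rough illustration of how `filename`, `public_path`, `sitemaps_path` and `default_host` fit together, the sketch below derives file names, the location a sitemap index would be written to, and the URL it would be linked at. The helpers `sitemap_name`, `index_name` and `index_location` are hypothetical and exist only in this example; the sketch also ignores `sitemaps_host` and `sitemaps_namer`.

```ruby
# Illustrative only: hypothetical helpers, not part of SitemapGenerator.

# Name of the n-th sitemap file for a given base name.
def sitemap_name(base, number)
  "#{base}#{number}.xml.gz"
end

# Name of the sitemap index file for a given base name.
def index_name(base)
  "#{base}_index.xml.gz"
end

# Where the index file is written on disk and the URL it is linked at.
def index_location(options)
  file = index_name(options[:filename])
  {
    :file => File.join(options[:public_path], options[:sitemaps_path], file),
    :url  => options[:default_host].chomp('/') + '/' + options[:sitemaps_path] + file
  }
end

options = {
  :default_host  => 'http://example.com',
  :filename      => :sitemap,
  :public_path   => 'public/',
  :sitemaps_path => 'en/'
}

puts sitemap_name(:geo, 1)          # geo1.xml.gz
puts index_name(:geo)               # geo_index.xml.gz
puts index_location(options)[:file] # public/en/sitemap_index.xml.gz
puts index_location(options)[:url]  # http://example.com/en/sitemap_index.xml.gz
```

Note how `sitemaps_path` appears in both the file location and the URL, while `public_path` only affects where the file is written.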
355
+
356
+ Sitemap Groups
357
+ =======
358
+
359
+ Sitemap Groups is a powerful feature that is also very simple to use.
360
+
361
+ * All options are supported except for `public_path`. You cannot change the public path.
362
+ * Groups inherit the options set on the default sitemap.
363
+ * `include_index` and `include_root` are `false` by default in a group.
364
+ * The sitemap index file is shared by all groups.
365
+ * Groups can handle any number of links.
366
+ * Group sitemaps are finalized (written out) as they get full and at the end of each group.
367
+
368
+ A Groups Example
369
+ ----------------
370
+
371
+ When you create a new group you pass options which will apply only to that group. You pass a block to `group`. Inside your block you call `add` to add links to the group.
372
+
373
+ Let's see an example that demonstrates a few interesting things about groups:
271
374
 
272
- # config/geo_sitemap.rb
273
- SitemapGenerator::Sitemap.filename = "geo_sitemap"
274
375
  SitemapGenerator::Sitemap.default_host = "http://www.example.com"
275
- SitemapGenerator::Sitemap.add_links do |sitemap|
276
- Store.each do |store|
277
- sitemap.add "stores/#{store.id}.xml", :geo => { :format => 'kml' }
376
+ SitemapGenerator::Sitemap.create do
377
+ add '/rss'
378
+
379
+ group(:sitemaps_path => 'en/', :filename => :english) do
380
+ add '/home'
278
381
  end
382
+
383
+ group(:sitemaps_path => 'fr/', :filename => :french) do
384
+ add '/maison'
385
+ end
386
+ end
387
+
388
+ And the output from running the above:
389
+
390
+ + en/english1.xml.gz 1 links / 612 Bytes / 296 Bytes gzipped
391
+ + fr/french1.xml.gz 1 links / 614 Bytes / 298 Bytes gzipped
392
+ + sitemap1.xml.gz 3 links / 919 Bytes / 328 Bytes gzipped
393
+ + sitemap_index.xml.gz 3 sitemaps / 505 Bytes / 221 Bytes gzipped
394
+ Sitemap stats: 5 links / 3 sitemaps / 0m00s
395
+
396
+ So we have two sitemaps with one link each and one sitemap with three links. The sitemaps from the groups are easy to spot by their filenames. They are `english1.xml.gz` and `french1.xml.gz`. They contain only one link each because **`include_index` and `include_root` are set to `false` by default** in a group.
397
+
398
+ On the other hand, the default sitemap which we added `/rss` to has three links. The sitemap index and root url were added to it when we added `/rss`. If we hadn't added that link `sitemap1.xml.gz` would not have been created. So **when we are using groups, the default sitemap will only be created if we add links to it**.
+
+ **The sitemap index file is shared by all groups**. You can change its filename by setting `SitemapGenerator::Sitemap.filename` or by passing the `:filename` option to `create`.
+
+ The options you use when creating your groups will determine which and how many sitemaps are created. Groups will inherit the default sitemap when possible, and will continue the normal series. However, a group will often specify an option which requires the links in that group to be in their own files. In this case, if the default sitemap were being used, it would be finalized before starting the next sitemap in the series.
+
+ If you have changed your sitemaps' physical location in a group, then the default sitemap will not be used and it will be unaffected by the group. **Group sitemaps are finalized as they get full and at the end of each group.**
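
As a rough sketch of the note above about the shared sitemap index, the following variation on the groups example passes `:filename` to `create`. It is a config fragment that assumes sitemap_generator 2.0.0 is installed and loaded (for example from `config/sitemap.rb`); the name `:my_sitemap` is only a placeholder, and the exact filenames written out are not shown here.

```ruby
# Config fragment: assumes the sitemap_generator gem (2.0.0) is already loaded.
SitemapGenerator::Sitemap.default_host = "http://www.example.com"

# :my_sitemap is a placeholder name. Per the text above, :filename passed
# to `create` changes the filename of the shared sitemap index.
SitemapGenerator::Sitemap.create(:filename => :my_sitemap) do
  add '/rss'

  # The group still names its own files with its own :filename option.
  group(:sitemaps_path => 'en/', :filename => :english) do
    add '/home'
  end
end
```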
+
+ Sitemap Extensions
+ ===========
+
+ Image Sitemaps
+ -----------
+
+ Images can be added to a sitemap URL by passing an `:images` array to `add`. Each item in the array must be a Hash containing tags defined by the [Image Sitemap][image_tags] specification. For example:
+
+     SitemapGenerator::Sitemap.create do
+       add('/index.html', :images => [{
+         :loc => 'http://www.example.com/image.png',
+         :title => 'Image' }])
      end
 
- After running both rake tasks you'll have the following files in your <tt>public</tt> directory (or wherever you set the sitemaps_path):
+ Supported image options include:
 
-     geo_sitemap_index.xml.gz
-     geo_sitemap1.xml.gz
-     sitemap_index.xml.gz
-     sitemap1.xml.gz
+ * `loc` Required, location of the image
+ * `caption`
+ * `geo_location`
+ * `title`
+ * `license`
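
If you want to sanity-check image Hashes before handing them to `add`, something like the following could work. This is a minimal sketch in plain Ruby: `image_option_problems` and `IMAGE_OPTIONS` are hypothetical names, not part of SitemapGenerator, and the check only reflects the option list above.

```ruby
# Hypothetical helper, NOT part of SitemapGenerator.
# The option names are taken from the list above.
IMAGE_OPTIONS = [:loc, :caption, :geo_location, :title, :license]

# Returns an array of human-readable problems; empty means the Hash looks fine.
def image_option_problems(image)
  problems = []
  problems << 'missing required :loc' if image[:loc].to_s.empty?

  # Flag keys that do not appear in the documented list.
  unknown = image.keys - IMAGE_OPTIONS
  problems << "not in the documented list: #{unknown.inspect}" unless unknown.empty?
  problems
end

images = [
  { :loc => 'http://www.example.com/image.png', :title => 'Image' },
  { :title => 'No location given' }
]
images.each { |image| puts image_option_problems(image).inspect }
```

The first Hash passes; the second is reported because `loc` is required.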
+
+ Video Sitemaps
+ -----------
+
+ A video can be added to a sitemap URL by passing a `:video` Hash to `add`. The Hash can contain tags defined by the [Video Sitemap specification][video_tags]. To associate more than one `tag` with a video, pass the tags as an array with the key `:tags`.
+
+     add('/index.html', :video => {
+       :thumbnail_loc => 'http://www.example.com/video1_thumbnail.png',
+       :title => 'Title',
+       :description => 'Description',
+       :content_loc => 'http://www.example.com/cool_video.mpg',
+       :tags => %w[one two three],
+       :category => 'Category'
+     })
+
+ Supported video options include:
+
+ * `thumbnail_loc` Required
+ * `title` Required
+ * `description` Required
+ * `content_loc` Depends. At least one of `player_loc` or `content_loc` is required
+ * `player_loc` Depends. At least one of `player_loc` or `content_loc` is required
+ * `expiration_date` Recommended
+ * `duration` Recommended
+ * `rating`
+ * `view_count`
+ * `publication_date`
+ * `family_friendly`
+ * `tags` A list of tags if more than one tag.
+ * `tag` A single tag. See `tags`
+ * `category`
+ * `gallery_loc`
+ * `uploader` (use `uploader_info` to set the info attribute)
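
The required and "depends" rules in the list above can be expressed as a quick pre-flight check. This is only a sketch in plain Ruby: `video_option_problems` and `REQUIRED_VIDEO_OPTIONS` are hypothetical names that do not exist in SitemapGenerator, and the gem itself may validate differently.

```ruby
# Hypothetical helper, NOT part of SitemapGenerator.
# Encodes only the "Required" and "Depends" rules from the list above.
REQUIRED_VIDEO_OPTIONS = [:thumbnail_loc, :title, :description]

# Returns an array of problems; empty means the required options are present.
def video_option_problems(video)
  problems = REQUIRED_VIDEO_OPTIONS.
    select { |key| video[key].to_s.empty? }.
    map    { |key| "missing required :#{key}" }

  # At least one of :content_loc or :player_loc must be given.
  unless video[:content_loc] || video[:player_loc]
    problems << 'need at least one of :content_loc or :player_loc'
  end
  problems
end

video = {
  :thumbnail_loc => 'http://www.example.com/video1_thumbnail.png',
  :title         => 'Title',
  :description   => 'Description',
  :content_loc   => 'http://www.example.com/cool_video.mpg',
  :tags          => %w[one two three]
}
puts video_option_problems(video).inspect
```

The Hash from the example above has every required option, so the helper returns an empty array for it.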
+
+ Geo Sitemaps
+ -----------
+
+ Pages with geo data can be added by passing a `:geo` Hash to `add`. The Hash supports only one tag, `:format`. Google provides an [example of a geo sitemap link here][geo_tags]. Note that the sitemap does not actually contain your KML or GeoRSS. It merely links to a page that has this content.
+
+     add('/stores/1234.xml', :geo => { :format => 'kml' })
+
+ Supported geo options include:
+
+ * `format` Required, either 'kml' or 'georss'
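
Since `:format` is the only supported tag and it accepts two values, a geo Hash is easy to check up front. The snippet below is a sketch in plain Ruby; `valid_geo_options?` and `GEO_FORMATS` are hypothetical names, not something the gem provides.

```ruby
# Hypothetical helper, NOT part of SitemapGenerator.
# The two formats come from the list above.
GEO_FORMATS = %w[kml georss]

# True when the Hash contains exactly one key, :format, with a known value.
def valid_geo_options?(geo)
  geo.keys == [:format] && GEO_FORMATS.include?(geo[:format].to_s)
end

puts valid_geo_options?(:format => 'kml')
puts valid_geo_options?(:format => 'gpx')
```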
 
  Raison d'être
- -------
+ =======
 
  Most of the Sitemap plugins out there seem to try to recreate the Sitemap links by iterating the Rails routes. In some cases this is possible, but for a great deal of cases it isn't.
 
@@ -305,46 +489,31 @@ So my idea is to have another file similar to 'routes.rb' called 'sitemap.rb', w
  Here's my solution:
 
      Zipcode.find(:all, :include => :city).each do |z|
-       sitemap.add zipcode_path(:state => z.city.state, :city => z.city, :zipcode => z)
+       add zipcode_path(:state => z.city.state, :city => z.city, :zipcode => z)
      end
 
  Easy hey?
 
- Other Sitemap settings for the link, like `lastmod`, `priority`, `changefreq` and `host` are entered automatically, although you can override them if you need to.
-
  Compatibility
  =======
 
  Tested and working on:
 
- - **Rails** 3.0.0
+ - **Rails** 3.0.0, 3.0.7
  - **Rails** 1.x - 2.3.8
- - **Ruby** 1.8.6, 1.8.7, 1.8.7 Enterprise Edition, 1.9.1
-
- Notes
- =======
-
- 1) New Capistrano deploys will remove your Sitemap files, unless you run `rake sitemap:refresh`. The way around this is to create a cap task to copy the sitemaps from the previous deploy:
-
-     after "deploy:update_code", "deploy:copy_old_sitemap"
-
-     namespace :deploy do
-       task :copy_old_sitemap do
-         run "if [ -e #{previous_release}/public/sitemap_index.xml.gz ]; then cp #{previous_release}/public/sitemap* #{current_release}/public/; fi"
-       end
-     end
+ - **Ruby** 1.8.6, 1.8.7, 1.8.7 Enterprise Edition, 1.9.1, 1.9.2
 
  Known Bugs
  ========
 
  - There's no check on the size of a URL which [isn't supposed to exceed 2,048 bytes][sitemaps_xml].
- - Currently only supports one Sitemap Index file, which can contain 50,000 Sitemap files which can each contain 50,000 urls, so it _only_ supports up to 2,500,000,000 (2.5 billion) urls. I personally have no need of support for more urls, but plugin could be improved to support this.
+ - Currently only supports one Sitemap Index file, which can contain 50,000 Sitemap files which can each contain 50,000 urls, so it _only_ supports up to 2,500,000,000 (2.5 billion) urls.
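
Until the gem checks URL sizes itself, you could guard against the first limit in your own code. The sketch below is plain Ruby and takes the wording above ("isn't supposed to exceed 2,048 bytes") at face value; `url_within_limit?` and the constants are hypothetical names, not part of SitemapGenerator. It also works through the capacity arithmetic from the second bullet.

```ruby
# Hypothetical constants and helper, NOT part of SitemapGenerator.
# The limits are the ones quoted in the Known Bugs list above.
MAX_URL_BYTES          = 2_048
MAX_LINKS_PER_SITEMAP  = 50_000
MAX_SITEMAPS_PER_INDEX = 50_000

# True when the URL's byte length does not exceed the limit.
def url_within_limit?(url)
  url.bytesize <= MAX_URL_BYTES
end

# One index file times the links each sitemap can hold.
max_urls = MAX_SITEMAPS_PER_INDEX * MAX_LINKS_PER_SITEMAP

puts max_urls
puts url_within_limit?('http://www.example.com/rss')
```

The product comes out to 2,500,000,000, matching the 2.5 billion figure above.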
 
  Wishlist & Coming Soon
  ========
 
- - Support for read-only filesystems
- - Support for plain Ruby and Merb sitemaps
+ - Support for read-only filesystems like Heroku
+ - Rails framework agnosticism; support for other frameworks like Merb
 
  Thanks (in no particular order)
  ========
@@ -371,4 +540,4 @@ Copyright (c) 2009 Karl Varga released under the MIT license
  [sitemap_protocol]:http://sitemaps.org/protocol.php
  [video_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80472#4
  [image_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=178636
- [geo_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=94555
+ [geo_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=94555