sitemap_generator 2.1.0 → 2.1.1

data/Gemfile CHANGED
@@ -19,6 +19,6 @@ gem 'nokogiri', '1.4.4'
  gem 'sqlite3-ruby', '1.3.1', :require => 'sqlite3'

  group :test do
-   gem 'ruby-debug', '0.10.3'
-   gem 'ruby-debug-base', '0.10.3'
+   gem 'ruby-debug', '~>0.10'
+   gem 'ruby-debug-base', '~>0.10'
  end
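The Gemfile change above relaxes the exact pins on `ruby-debug` and `ruby-debug-base` to the pessimistic constraint `~> 0.10`. A minimal sketch of what that constraint accepts, using RubyGems' own `Gem::Requirement`; the candidate version numbers are illustrative and not taken from the diff:

```ruby
require 'rubygems'

# '~> 0.10' is shorthand for '>= 0.10' together with '< 1.0'.
requirement = Gem::Requirement.new('~> 0.10')

# Illustrative candidate versions (hypothetical, chosen to show the boundaries).
%w[0.9.3 0.10.3 0.10.4 0.11.0 1.0.0].each do |candidate|
  allowed = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{allowed ? 'allowed' : 'rejected'}"
end
```

The old pin `'0.10.3'` only ever resolved to that one release, which is why the matching entries in `Gemfile.lock` change from `=` to `~>`.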
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: ./
    specs:
-     sitemap_generator (2.1.0)
+     sitemap_generator (2.1.1)

  GEM
    remote: http://rubygems.org/
@@ -77,7 +77,7 @@ DEPENDENCIES
    rake (>= 0.8.7)
    rspec (= 1.3.1)
    rspec-rails (~> 1.3.2)
-   ruby-debug (= 0.10.3)
-   ruby-debug-base (= 0.10.3)
+   ruby-debug (~> 0.10)
+   ruby-debug-base (~> 0.10)
    sitemap_generator!
    sqlite3-ruby (= 1.3.1)
data/README.md CHANGED
@@ -23,10 +23,10 @@ Does your website use SitemapGenerator to generate Sitemaps? Where would you be
  <a href='http://www.pledgie.com/campaigns/15267'><img alt='Click here to lend your support to: SitemapGenerator and make a donation at www.pledgie.com !' src='http://pledgie.com/campaigns/15267.png?skin_name=chrome' border='0' /></a>

-
  Changelog
  -------

+ - v2.1.1: Support calling `create()` multiple times in a sitemap config. Support host names with path segments so you can use a `default_host` like `'http://mysite.com/subdirectory/'`. Turn off `include_index` when the `sitemaps_host` differs from `default_host`. Add docs about how to upload to remote hosts.
  - v2.1.0: [News sitemap][sitemap_news] support
  - v2.0.1.pre2: Fix uploading to the (bucket) root on a remote server
  - v2.0.1.pre1: Support read-only filesystems like Heroku by supporting uploading to remote host
@@ -155,6 +155,47 @@ To ensure that your application's sitemaps are available after a deployment you
        run "cd #{latest_release} && RAILS_ENV=#{rails_env} rake sitemap:refresh"
      end

+ Upload Sitemaps to a Remote Host
+ ----------
+
+ Sometimes it is desirable to host your sitemap files on a remote server and point robots
+ and search engines to the remote files. For example, you may be using a host like Heroku,
+ which doesn't allow writing to the local filesystem. You still require *some* write access
+ because the sitemap files need to be written out before uploading, so generally a host will
+ give you write access to a temporary directory. On Heroku this is `tmp/` in your application
+ directory.
+
+ Sitemap Generator uses CarrierWave to support uploading to Amazon S3 store, Rackspace Cloud Files store, and MongoDB's GridFS - whatever CarrierWave supports.
+
+ 1. Please see [this wiki page][remote_hosts] for more information about setting up CarrierWave, SitemapGenerator and Rails.
+
+ 2. Once you have CarrierWave set up and configured, all you need to do is set some options in your sitemap config, such as:
+
+     * `default_host` - your website host name
+     * `sitemaps_host` - the remote host where your sitemaps will be hosted
+     * `public_path` - the directory to write sitemaps to locally e.g. `tmp/`
+     * `sitemaps_path` - set to a directory/path if you don't want to upload to the root of your `sitemaps_host`
+     * `adapter` - instance of `SitemapGenerator::WaveAdapter`
+
+     For example:
+
+         SitemapGenerator::Sitemap.default_host = "http://www.example.com"
+         SitemapGenerator::Sitemap.sitemaps_host = "http://s3.amazonaws.com/sitemap-generator/"
+         SitemapGenerator::Sitemap.public_path = 'tmp/'
+         SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
+         SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
+
+ 3. Update your `robots.txt` file to point robots to the remote sitemap index file, e.g.:
+
+         Sitemap: http://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap_index.xml.gz
+
+ You generate your sitemaps as usual using `rake sitemap:refresh`.
+
+ Note that SitemapGenerator will automatically turn off `include_index` in this case because
+ the `sitemaps_host` does not match the `default_host`. The link to the sitemap index file
+ that would otherwise be included would point to a different host than the rest of the links
+ in the sitemap, something that the sitemap rules forbid.
+
  Sitemap Configuration
  ======
@@ -344,13 +385,16 @@ The following options are supported:
  * `filename` - Symbol. The **base name for the files** that will be generated. The default value is `:sitemap`. This yields sitemaps with names like `sitemap1.xml.gz`, `sitemap2.xml.gz`, `sitemap3.xml.gz` etc, and a sitemap index named `sitemap_index.xml.gz`. If we now set the value to `:geo` the sitemaps would be named `geo1.xml.gz`, `geo2.xml.gz`, `geo3.xml.gz` etc, and the sitemap index would be named `geo_index.xml.gz`.

- * `include_index` - Boolean. Whether to **add a link to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. Default is `true`.
+ * `include_index` - Boolean. Whether to **add a link to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. Default is `true`. Turned off when `sitemaps_host` differs from `default_host` or within a `group()` block.

- * `include_root` - Boolean. Whether to **add the root** url i.e. '/' to the current sitemap. Default is `true`.
+ * `include_root` - Boolean. Whether to **add the root** url i.e. '/' to the current sitemap. Default is `true`. Turned off within a `group()` block.

  * `public_path` - String. A **full or relative path** to the `public` directory or the directory you want to write sitemaps into. Defaults to `public/` under your application root or relative to the current working directory.

- * `sitemaps_host` - String. **Host including protocol** to use when generating a link to a sitemap file i.e. the hostname of the server where the sitemaps are hosted. The value will differ from the hostname in your sitemap links. For example: `'http://amazon.aws.com/'`
+ * `sitemaps_host` - String. **Host including protocol** to use when generating a link to a sitemap file i.e. the hostname of the server where the sitemaps are hosted. The value will differ from the hostname in your sitemap links. For example: `'http://amazon.aws.com/'`. Note that `include_index` is
+ automatically turned off when the `sitemaps_host` does not match `default_host`,
+ because the link to the sitemap index file that would otherwise be added would point to a
+ different host than the rest of the links in the sitemap, something that the sitemap rules forbid.

  * `sitemaps_namer` - A `SitemapGenerator::SitemapNamer` instance **for generating sitemap names**. You can read about Sitemap Namers by reading the API docs. Sitemap Namers don't apply to the sitemap index. You can only modify the name of the index file using the `filename` option. Sitemap Namers allow you to set the name, extension and number sequence for sitemap files.
@@ -358,6 +402,12 @@ The following options are supported:
  * `verbose` - Boolean. Whether to **output a sitemap summary** describing the sitemap files and giving statistics about your sitemap. Default is `false`. When using the Rake tasks `verbose` will be `true` unless you pass the `-s` option.

+ * `adapter` - Instance. The default adapter is a `SitemapGenerator::FileAdapter`
+ which simply writes files to the filesystem. You can use a `SitemapGenerator::WaveAdapter`
+ for uploading sitemaps to remote servers - useful for read-only hosts such as Heroku. Or
+ you can provide an instance of your own class to provide custom behavior. Your class must
+ define a write method which takes a `SitemapGenerator::Location` and raw XML data.
+
  Sitemap Groups
  =======
@@ -578,3 +628,4 @@ Copyright (c) 2009 Karl Varga released under the MIT license
  [image_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=178636
  [geo_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=94555
  [news_tags]:http://www.google.com/support/news_pub/bin/answer.py?answer=74288
+ [remote_hosts]:https://github.com/kjvarga/sitemap_generator/wiki/Generate-Sitemaps-on-read-only-filesystems-like-Heroku
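The `adapter` option documented in the README changes above accepts any object that defines a `write` method taking a location and the raw XML data. The following is a minimal sketch of such a class, not the gem's own adapter: `InMemoryAdapter` and the `FakeLocation` stand-in are hypothetical names, and the sketch assumes the location object responds to `path`.

```ruby
# Hypothetical stand-in for the location object the gem passes to an adapter.
# Assumption: the real location responds to `path`.
FakeLocation = Struct.new(:path)

# Hypothetical adapter that keeps sitemap data in memory instead of writing
# it to disk, which can be handy for inspecting output in tests.
class InMemoryAdapter
  attr_reader :files

  def initialize
    @files = {}
  end

  # The interface described in the README: a location plus raw XML data.
  def write(location, raw_data)
    @files[location.path.to_s] = raw_data
  end
end

adapter = InMemoryAdapter.new
adapter.write(FakeLocation.new('tmp/sitemaps/sitemap1.xml.gz'), '<urlset></urlset>')
puts adapter.files.keys.inspect
```

An instance would presumably be assigned the same way as the `WaveAdapter` in the README example, e.g. `SitemapGenerator::Sitemap.adapter = InMemoryAdapter.new`.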
data/VERSION CHANGED
@@ -1 +1 @@
- 2.1.0
+ 2.1.1
@@ -3,11 +3,27 @@ require 'uri'
  module SitemapGenerator
    module Builder
+     # A Hash-like class for holding information about a sitemap URL and
+     # generating an XML <url> element suitable for sitemaps.
      class SitemapUrl < Hash

-       # Call with:
-       # sitemap - a Sitemap instance, or
-       # path, options - a path for the URL and options hash
+       # Return a new instance with options configured on it.
+       #
+       # == Arguments
+       # * sitemap - a Sitemap instance, or
+       # * path, options - a path string and options hash
+       #
+       # == Options
+       # Requires a host to be set. If passing a sitemap, the sitemap must have a +default_host+
+       # configured. If calling with a path and options, you must include the <tt>:host</tt> option.
+       #
+       # * +priority+
+       # * +changefreq+
+       # * +lastmod+
+       # * +images+
+       # * +video+
+       # * +geo+
+       # * +news+
        def initialize(path, options={})
          if sitemap = path.is_a?(SitemapGenerator::Builder::SitemapFile) && path
            options.reverse_merge!(:host => sitemap.location.host, :lastmod => sitemap.lastmod)
@@ -16,17 +32,18 @@ module SitemapGenerator
          SitemapGenerator::Utilities.assert_valid_keys(options, :priority, :changefreq, :lastmod, :host, :images, :video, :geo, :news)
          options.reverse_merge!(:priority => 0.5, :changefreq => 'weekly', :lastmod => Time.now, :images => [], :news => {})
+         raise "Cannot generate a url without a host" unless options[:host].present?
          self.merge!(
-           :path => path,
-           :priority => options[:priority],
+           :path => path,
+           :priority => options[:priority],
            :changefreq => options[:changefreq],
-           :lastmod => options[:lastmod],
-           :host => options[:host],
-           :loc => URI.join(options[:host], path).to_s,
-           :images => prepare_images(options[:images], options[:host]),
-           :news => prepare_news(options[:news]),
-           :video => options[:video],
-           :geo => options[:geo]
+           :lastmod => options[:lastmod],
+           :host => options[:host],
+           :loc => URI.join(options[:host], path.to_s.sub(/^\//, '')).to_s, # support host with subdirectory
+           :images => prepare_images(options[:images], options[:host]),
+           :news => prepare_news(options[:news]),
+           :video => options[:video],
+           :geo => options[:geo]
          )
        end
@@ -133,4 +150,4 @@ module SitemapGenerator
        end
      end
    end
- end
+ end
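The new `:loc` line in the diff above strips the leading slash from the path before calling `URI.join`. A short sketch of why that matters, using only Ruby's standard `uri` library; the helper name `build_loc` is hypothetical and exists only for illustration:

```ruby
require 'uri'

# Hypothetical helper mirroring the new :loc expression from the diff.
def build_loc(host, path)
  URI.join(host, path.to_s.sub(/^\//, '')).to_s
end

host = 'http://mysite.com/subdirectory/'

# An absolute path replaces the host's own path, so the subdirectory is lost.
puts URI.join(host, '/about').to_s
# With the leading slash removed, the path resolves relative to the subdirectory.
puts build_loc(host, '/about')
```

The host needs its trailing slash for this to work: relative resolution drops the last path segment of a base URL that does not end in `/`.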
@@ -10,8 +10,8 @@ module SitemapGenerator
      attr_reader :default_host, :sitemaps_path, :filename
      attr_accessor :verbose, :yahoo_app_id, :include_root, :include_index, :sitemaps_host, :adapter

-     # Add links to the link set by evaluating the block. The block should
-     # contains calls to sitemap methods like:
+     # Create a new sitemap index and sitemap files. Pass a block containing calls to the following
+     # methods:
      # * +add+ - Add a link to the current sitemap
      # * +group+ - Start a new group of sitemaps
      #
@@ -25,9 +25,12 @@ module SitemapGenerator
      # * <tt>:finalize</tt> - The sitemaps are written as they get full and at the end
      # of the block. Pass +false+ as the value to prevent the sitemap or sitemap index
      # from being finalized. Default is +true+.
+     #
+     # If you are calling +create+ more than once in your sitemap configuration file,
+     # make sure that you set a different +sitemaps_path+ or +filename+ for each call, otherwise
+     # the sitemaps may be overwritten.
      def create(opts={}, &block)
-       @sitemap_index = nil if @sitemap_index && @sitemap_index.finalized? && !@protect_index
-       @sitemap = nil if @sitemap && @sitemap.finalized?
+       reset!
        set_options(opts)
        start_time = Time.now if @verbose
        interpreter.eval(:yield_sitemap => @yield_sitemap || SitemapGenerator.yield_sitemap?, &block)
@@ -47,8 +50,11 @@ module SitemapGenerator
      # Constructor
      #
      # == Options:
-     # * <tt>:adapter</tt> - subclass of SitemapGenerator::Adapter used for persisting the
-     # sitemaps. Default adapter is a SitemapGenerator::FileAdapter
+     # * <tt>:adapter</tt> - instance of a class with a write method which takes a SitemapGenerator::Location
+     # and raw XML data and persists it. The default adapter is a SitemapGenerator::FileAdapter
+     # which simply writes files to the filesystem. You can use a SitemapGenerator::WaveAdapter
+     # for uploading sitemaps to remote servers - useful for read-only hosts such as Heroku. Or
+     # you can provide an instance of your own class to provide custom behavior.
      #
      # * <tt>:default_host</tt> - host including protocol to use in all sitemap links
      # e.g. http://en.google.ca
@@ -57,8 +63,15 @@ module SitemapGenerator
      # Defaults to the <tt>public/</tt> directory in your application root directory or
      # the current working directory.
      #
-     # * <tt>:sitemaps_host</tt> - host (including protocol) to use in links to the sitemaps. Useful if your sitemaps
-     # are hosted o different server e.g. 'http://amazon.aws.com/'
+     # * <tt>:sitemaps_host</tt> - String. <b>Host including protocol</b> to use when generating
+     # a link to a sitemap file i.e. the hostname of the server where the sitemaps are hosted.
+     # The value will differ from the hostname in your sitemap links.
+     # For example: `'http://amazon.aws.com/'`.
+     #
+     # Note that `include_index` is automatically turned off when the `sitemaps_host` does
+     # not match `default_host`, because the link to the sitemap index file that would
+     # otherwise be added would point to a different host than the rest of the links in
+     # the sitemap, something that the sitemap rules forbid.
      #
      # * <tt>:sitemaps_path</tt> - path fragment within public to write sitemaps
      # to e.g. 'en/'. Sitemaps are written to <tt>public_path</tt> + <tt>sitemaps_path</tt>
@@ -69,11 +82,13 @@ module SitemapGenerator
      #
      # * <tt>:sitemaps_namer</tt> - A +SitemapNamer+ instance for generating the sitemap names.
      #
-     # * <tt>:include_root</tt> - whether to include the root url i.e. '/' in each group of sitemaps.
-     # Default is true.
+     # * <tt>:include_index</tt> - Boolean. Whether to <b>add a link to the sitemap index</b>
+     # to the current sitemap. This points search engines to your Sitemap Index to
+     # include it in the indexing of your site. Default is `true`. Turned off when
+     # `sitemaps_host` differs from `default_host` or within a `group()` block.
      #
-     # * <tt>:include_index</tt> - whether to include the sitemap index URL in each group of sitemaps.
-     # Default is true.
+     # * <tt>:include_root</tt> - Boolean. Whether to <b>add the root</b> url i.e. '/' to the
+     # current sitemap. Default is `true`. Turned off within a `group()` block.
      #
      # * <tt>:verbose</tt> - If +true+, output a summary line for each sitemap and sitemap
      # index that is created. Default is +false+.
@@ -110,10 +125,10 @@ module SitemapGenerator
        retry
      end

-     # Create a new group of sitemaps. Returns a new LinkSet instance with options set on it.
+     # Create a new group of sitemap files.
      #
-     # All groups share this LinkSet's sitemap index, which is not modified by any of the options
-     # passed to +group+.
+     # Returns a new LinkSet instance with the options passed in set on it. All groups
+     # share the sitemap index, which is not affected by any of the options passed here.
      #
      # === Options
      # Any of the options to LinkSet.new. Except for <tt>:public_path</tt> which is shared
@@ -127,9 +142,13 @@ module SitemapGenerator
      #
      # If you are not changing any of the location settings like <tt>filename</tt>,
      # <tt>sitemaps_path</tt>, <tt>sitemaps_host</tt> or <tt>sitemaps_namer</tt>,
-     # the current sitemap will be used in the group. All of the options you have
-     # specified which affect the way the links are generated will still be applied
-     # for the duration of the group.
+     # links you add within the group will be added to the current sitemap file (e.g. sitemap1.xml).
+     # If one of these options is specified, the current sitemap file is finalized
+     # and a new sitemap file started.
+     #
+     # Options like <tt>:default_host</tt> can be used and will only affect the links
+     # within the group. Links added outside of the group will revert to the previous
+     # +default_host+.
      def group(opts={}, &block)
        @created_group = true
        original_opts = opts.dup
@@ -230,6 +249,24 @@ module SitemapGenerator
        finalize_sitemap_index!
      end

+     # Return a boolean indicating whether to add a link to the sitemap index file
+     # to the current sitemap. This points search engines to your Sitemap Index so
+     # they include it in the indexing of your site, but is not strictly necessary.
+     # Default is `true`. Turned off when `sitemaps_host` differs from `default_host` or within a `group()` block.
+     def include_index?
+       if default_host && sitemaps_host && sitemaps_host != default_host
+         false
+       else
+         @include_index
+       end
+     end
+
+     # Return a boolean indicating whether to automatically add the root url i.e. '/' to the
+     # current sitemap. Default is `true`. Turned off within a `group()` block.
+     def include_root?
+       !!@include_root
+     end
+
      protected

      # Set each option on this instance using accessor methods. This will affect
@@ -273,8 +310,12 @@ module SitemapGenerator
      # Add default links if those options are turned on. Record the fact that we have done so
      # in an instance variable.
      def add_default_links
-       sitemap.add('/', :lastmod => Time.now, :changefreq => 'always', :priority => 1.0, :host => @default_host) if include_root
-       sitemap.add(sitemap_index, :lastmod => Time.now, :changefreq => 'always', :priority => 1.0) if include_index
+       if include_root?
+         sitemap.add('/', :lastmod => Time.now, :changefreq => 'always', :priority => 1.0, :host => @default_host)
+       end
+       if include_index?
+         sitemap.add(sitemap_index, :lastmod => Time.now, :changefreq => 'always', :priority => 1.0)
+       end
        @added_default_links = true
      end
@@ -310,6 +351,15 @@ module SitemapGenerator
        @interpreter ||= SitemapGenerator::Interpreter.new(:link_set => self)
      end

+     # Reset this instance. Keep the same options, but return to the same state
+     # as before any sitemaps were created.
+     def reset!
+       @sitemap_index = nil if @sitemap_index && @sitemap_index.finalized? && !@protect_index
+       @sitemap = nil if @sitemap && @sitemap.finalized?
+       self.sitemaps_namer.reset # start from 1
+       @added_default_links = false
+     end
+
      module LocationHelpers
        public
@@ -351,6 +401,9 @@ module SitemapGenerator
        # Set the host name, including protocol, that will be used on all links to your sitemap
        # files. Useful when the server that hosts the sitemaps is not on the same host as
        # the links in the sitemap.
+       #
+       # Note that `include_index` will be turned off to avoid adding a link to a sitemap with
+       # a different host than the other links.
        def sitemaps_host=(value)
          @sitemaps_host = value
          update_location_info(:host, value)
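The `include_index?` method added in this file compares `sitemaps_host` and `default_host` directly. A standalone sketch of that decision, written as a hypothetical free function `include_index_for` so it can run without the gem:

```ruby
# Hypothetical standalone version of the include_index? logic from the diff.
def include_index_for(default_host, sitemaps_host, include_index)
  if default_host && sitemaps_host && sitemaps_host != default_host
    false
  else
    include_index
  end
end

# No separate sitemaps host: the configured value is kept.
puts include_index_for('http://www.example.com', nil, true)
# Sitemaps hosted elsewhere: the index link is suppressed.
puts include_index_for('http://www.example.com', 'http://s3.amazonaws.com/sitemap-generator/', true)
```

Because this is a plain string comparison, two hosts that differ only by a trailing slash also count as different in this sketch.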
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: sitemap_generator
  version: !ruby/object:Gem::Version
-   hash: 11
+   hash: 9
    prerelease:
    segments:
    - 2
    - 1
-   - 0
-   version: 2.1.0
+   - 1
+   version: 2.1.1
  platform: ruby
  authors:
  - Karl Varga
@@ -16,7 +16,7 @@ autorequire:
  bindir: bin
  cert_chain: []

- date: 2011-08-31 00:00:00 -07:00
+ date: 2011-09-19 00:00:00 -07:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency