sitemap_generator 2.1.0 → 2.1.1
- data/Gemfile +2 -2
- data/Gemfile.lock +3 -3
- data/README.md +55 -4
- data/VERSION +1 -1
- data/lib/sitemap_generator/builder/sitemap_url.rb +30 -13
- data/lib/sitemap_generator/link_set.rb +73 -20
- metadata +4 -4
data/Gemfile
CHANGED
data/Gemfile.lock
CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: ./
   specs:
-    sitemap_generator (2.1.
+    sitemap_generator (2.1.1)
 
 GEM
   remote: http://rubygems.org/
@@ -77,7 +77,7 @@ DEPENDENCIES
   rake (>= 0.8.7)
   rspec (= 1.3.1)
   rspec-rails (~> 1.3.2)
-  ruby-debug (
-  ruby-debug-base (
+  ruby-debug (~> 0.10)
+  ruby-debug-base (~> 0.10)
   sitemap_generator!
   sqlite3-ruby (= 1.3.1)
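The lock file now constrains `ruby-debug` and `ruby-debug-base` with the pessimistic operator `~> 0.10`. A small sketch using RubyGems' own `Gem::Requirement` shows which versions that constraint admits; the version strings checked are arbitrary samples, not versions known to exist for these gems:

```ruby
require 'rubygems'

# "~> 0.10" is RubyGems' pessimistic constraint: >= 0.10 and < 1.0.
requirement = Gem::Requirement.new('~> 0.10')

# Arbitrary sample versions, chosen only to show where the bounds fall.
%w[0.9.9 0.10.0 0.10.4 0.11.0 1.0.0].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed}"
end
```

With `~> 0.10` any 0.x release from 0.10 onward is accepted; a three-part constraint such as `~> 0.10.0` would instead stop before 0.11.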
data/README.md
CHANGED
@@ -23,10 +23,10 @@ Does your website use SitemapGenerator to generate Sitemaps? Where would you be
 
 <a href='http://www.pledgie.com/campaigns/15267'><img alt='Click here to lend your support to: SitemapGenerator and make a donation at www.pledgie.com !' src='http://pledgie.com/campaigns/15267.png?skin_name=chrome' border='0' /></a>
 
-
 Changelog
 -------
 
+- v2.1.1: Support calling `create()` multiple times in a sitemap config. Support host names with path segments so you can use a `default_host` like `'http://mysite.com/subdirectory/'`. Turn off `include_index` when the `sitemaps_host` differs from `default_host`. Add docs about how to upload to remote hosts.
 - v2.1.0: [News sitemap][sitemap_news] support
 - v2.0.1.pre2: Fix uploading to the (bucket) root on a remote server
 - v2.0.1.pre1: Support read-only filesystems like Heroku by supporting uploading to remote host
@@ -155,6 +155,47 @@ To ensure that your application's sitemaps are available after a deployment you
 run "cd #{latest_release} && RAILS_ENV=#{rails_env} rake sitemap:refresh"
 end
 
+Upload Sitemaps to a Remote Host
+----------
+
+Sometimes it is desirable to host your sitemap files on a remote server and point robots
+and search engines to the remote files. For example if you are using a host like Heroku
+which doesn't allow writing to the local filesystem. You still require *some* write access
+because the sitemap files need to be written out before uploading, so generally a host will
+give you write access to a temporary directory. On Heroku this is `tmp/` in your application
+directory.
+
+Sitemap Generator uses CarrierWave to support uploading to Amazon S3 store, Rackspace Cloud Files store, and MongoDB's GridF - whatever CarrierWave supports.
+
+1. Please see [this wiki page][remote_hosts] for more information about setting up CarrierWave, SitemapGenerator and Rails.
+
+2. Once you have CarrierWave setup and configured all you need to do is set some options in your sitemap config, such as:
+
+* `default_host` - your website host name
+* `sitemaps_host` - the remote host where your sitemaps will be hosted
+* `public_path` - the directory to write sitemaps to locally e.g. `tmp/`
+* `sitemaps_path` - set to a directory/path if you don't want to upload to the root of your `sitemaps_host`
+* `adapter` - instance of `SitemapGenerator::WaveAdapter`
+
+For Example:
+
+SitemapGenerator::Sitemap.default_host = "http://www.example.com"
+SitemapGenerator::Sitemap.sitemaps_host = "http://s3.amazonaws.com/sitemap-generator/"
+SitemapGenerator::Sitemap.public_path = 'tmp/'
+SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
+SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
+
+3. Update your `robots.txt` file to point robots to the remote sitemap index file, e.g:
+
+Sitemap: http://s3.amazonaws.com/sitemap-generator/sitemaps/sitemap_index.xml.gz
+
+You generate your sitemaps as usual using `rake sitemap:refresh`.
+
+Note that SitemapGenerator will automatically turn off `include_index` in this case because
+the `sitemaps_host` does not match the `default_host`. The link to the sitemap index file
+that would otherwise be included would point to a different host than the rest of the links
+in the sitemap, something that the sitemap rules forbid.
+
 Sitemap Configuration
 ======
 
@@ -344,13 +385,16 @@ The following options are supported:
 
 * `filename` - Symbol. The **base name for the files** that will be generated. The default value is `:sitemap`. This yields sitemaps with names like `sitemap1.xml.gz`, `sitemap2.xml.gz`, `sitemap3.xml.gz` etc, and a sitemap index named `sitemap_index.xml.gz`. If we now set the value to `:geo` the sitemaps would be named `geo1.xml.gz`, `geo2.xml.gz`, `geo3.xml.gz` etc, and the sitemap index would be named `geo_index.xml.gz`.
 
-* `include_index` - Boolean. Whether to **add a link to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. Default is `true`.
+* `include_index` - Boolean. Whether to **add a link to the sitemap index** to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. Default is `true`. Turned off when `sitemaps_host` is set or within a `group()` block.
 
-* `include_root` - Boolean. Whether to **add the root** url i.e. '/' to the current sitemap. Default is `true`.
+* `include_root` - Boolean. Whether to **add the root** url i.e. '/' to the current sitemap. Default is `true`. Turned off within a `group()` block.
 
 * `public_path` - String. A **full or relative path** to the `public` directory or the directory you want to write sitemaps into. Defaults to `public/` under your application root or relative to the current working directory.
 
-* `sitemaps_host` - String. **Host including protocol** to use when generating a link to a sitemap file i.e. the hostname of the server where the sitemaps are hosted. The value will differ from the hostname in your sitemap links. For example: `'http://amazon.aws.com/'`
+* `sitemaps_host` - String. **Host including protocol** to use when generating a link to a sitemap file i.e. the hostname of the server where the sitemaps are hosted. The value will differ from the hostname in your sitemap links. For example: `'http://amazon.aws.com/'`. Note that `include_index` is
+automatically turned off when the `sitemaps_host` does not match `default_host`.
+Because the link to the sitemap index file that would otherwise be added would point to a
+different host than the rest of the links in the sitemap. Something that the sitemap rules forbid.
 
 * `sitemaps_namer` - A `SitemapGenerator::SitemapNamer` instance **for generating sitemap names**. You can read about Sitemap Namers by reading the API docs. Sitemap Namers don't apply to the sitemap index. You can only modify the name of the index file using the `filename` option. Sitemap Namers allow you to set the name, extension and number sequence for sitemap files.
 
@@ -358,6 +402,12 @@ The following options are supported:
 
 * `verbose` - Boolean. Whether to **output a sitemap summary** describing the sitemap files and giving statistics about your sitemap. Default is `false`. When using the Rake tasks `verbose` will be `true` unless you pass the `-s` option.
 
+* `adapter` - Instance. The default adapter is a `SitemapGenerator::FileAdapter`
+which simply writes files to the filesystem. You can use a `SitemapGenerator::WaveAdapter`
+for uploading sitemaps to remote servers - useful for read-only hosts such as Heroku. Or
+you can provide an instance of your own class to provide custom behavior. Your class must
+define a write method which takes a `SitemapGenerator::Location` and raw XML data.
+
 Sitemap Groups
 =======
 
@@ -578,3 +628,4 @@ Copyright (c) 2009 Karl Varga released under the MIT license
 [image_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=178636
 [geo_tags]:http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=94555
 [news_tags]:http://www.google.com/support/news_pub/bin/answer.py?answer=74288
+[remote_hosts]:https://github.com/kjvarga/sitemap_generator/wiki/Generate-Sitemaps-on-read-only-filesystems-like-Heroku
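The README's `adapter` option only requires an object whose `write` method accepts a `SitemapGenerator::Location` and the raw XML data. A minimal sketch of a custom adapter under that contract, assuming nothing about `Location` beyond it being passed in: the class name `MemoryAdapter` is hypothetical, and it gzips the data only to mirror the `.xml.gz` file names the README describes, keeping each write in memory instead of on disk.

```ruby
require 'zlib'

# Hypothetical adapter: stores each sitemap write in memory instead of on disk.
# It follows only the contract stated in the README: a #write method that takes
# a location and raw XML data.
class MemoryAdapter
  attr_reader :writes

  def initialize
    @writes = []
  end

  # location: normally a SitemapGenerator::Location; it is stored untouched, so no
  # methods on it are assumed. raw_data: the XML string to persist.
  def write(location, raw_data)
    @writes << [location, Zlib.gzip(raw_data)]
  end
end

# Stand-alone usage: a plain string stands in for a real Location object.
adapter = MemoryAdapter.new
adapter.write('sitemap1.xml.gz', '<urlset/>')
location, data = adapter.writes.first
puts location            # sitemap1.xml.gz
puts Zlib.gunzip(data)   # <urlset/>
```

Wiring it into a sitemap config would presumably follow the WaveAdapter example: `SitemapGenerator::Sitemap.adapter = MemoryAdapter.new`.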
data/VERSION
CHANGED
@@ -1 +1 @@
-2.1.
+2.1.1
data/lib/sitemap_generator/builder/sitemap_url.rb
CHANGED
@@ -3,11 +3,27 @@ require 'uri'
 
 module SitemapGenerator
   module Builder
+    # A Hash-like class for holding information about a sitemap URL and
+    # generating an XML <url> element suitable for sitemaps.
     class SitemapUrl < Hash
 
-      #
-      #
-      #
+      # Return a new instance with options configured on it.
+      #
+      # == Arguments
+      # * sitemap - a Sitemap instance, or
+      # * path, options - a path string and options hash
+      #
+      # == Options
+      # Requires a host to be set. If passing a sitemap, the sitemap must have a +default_host+
+      # configured. If calling with a path and options, you must include the <tt>:host</tt> option.
+      #
+      # * +priority+
+      # * +changefreq+
+      # * +lastmod+
+      # * +images+
+      # * +video+
+      # * +geo+
+      # * +news+
       def initialize(path, options={})
         if sitemap = path.is_a?(SitemapGenerator::Builder::SitemapFile) && path
           options.reverse_merge!(:host => sitemap.location.host, :lastmod => sitemap.lastmod)
@@ -16,17 +32,18 @@ module SitemapGenerator
 
         SitemapGenerator::Utilities.assert_valid_keys(options, :priority, :changefreq, :lastmod, :host, :images, :video, :geo, :news)
         options.reverse_merge!(:priority => 0.5, :changefreq => 'weekly', :lastmod => Time.now, :images => [], :news => {})
+        raise "Cannot generate a url without a host" unless options[:host].present?
         self.merge!(
-          :path
-          :priority
+          :path => path,
+          :priority => options[:priority],
           :changefreq => options[:changefreq],
-          :lastmod
-          :host
-          :loc
-          :images
-          :news
-          :video
-          :geo
+          :lastmod => options[:lastmod],
+          :host => options[:host],
+          :loc => URI.join(options[:host], path.to_s.sub(/^\//, '')).to_s, # support host with subdirectory
+          :images => prepare_images(options[:images], options[:host]),
+          :news => prepare_news(options[:news]),
+          :video => options[:video],
+          :geo => options[:geo]
         )
       end
 
@@ -133,4 +150,4 @@ module SitemapGenerator
       end
     end
   end
-end
+end
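The new `:loc` line strips a single leading slash from the path and resolves it with `URI.join`, which is what lets a `default_host` carry a subdirectory. A stand-alone sketch of that computation; the helper name `sitemap_loc` is hypothetical, and it swaps ActiveSupport's `present?` for a plain nil/empty check so it runs without Rails:

```ruby
require 'uri'

# Hypothetical helper mirroring the :loc computation in SitemapUrl#initialize.
def sitemap_loc(host, path)
  # Same guard as the new raise, without depending on ActiveSupport.
  raise "Cannot generate a url without a host" if host.nil? || host.to_s.empty?
  # Drop one leading slash so the path resolves relative to the host's own path.
  URI.join(host, path.to_s.sub(/^\//, '')).to_s
end

puts sitemap_loc('http://mysite.com/subdirectory/', '/about')
# http://mysite.com/subdirectory/about
puts sitemap_loc('http://mysite.com/subdirectory', '/about')
# http://mysite.com/about  (no trailing slash: the last segment is replaced)
```

The trailing slash on the host matters: under standard URI resolution, `URI.join` replaces the last path segment of the base, so a host of `http://mysite.com/subdirectory` resolves `about` to `/about`. The changelog's example `default_host` ends in `/` for that reason.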
data/lib/sitemap_generator/link_set.rb
CHANGED
@@ -10,8 +10,8 @@ module SitemapGenerator
     attr_reader :default_host, :sitemaps_path, :filename
     attr_accessor :verbose, :yahoo_app_id, :include_root, :include_index, :sitemaps_host, :adapter
 
-    #
-    #
+    # Create a new sitemap index and sitemap files. Pass a block calls to the following
+    # methods:
     # * +add+ - Add a link to the current sitemap
     # * +group+ - Start a new group of sitemaps
     #
@@ -25,9 +25,12 @@ module SitemapGenerator
     # * <tt>:finalize</tt> - The sitemaps are written as they get full and at the end
     # of the block. Pass +false+ as the value to prevent the sitemap or sitemap index
     # from being finalized. Default is +true+.
+    #
+    # If you are calling +create+ more than once in your sitemap configuration file,
+    # make sure that you set a different +sitemaps_path+ or +filename+ for each call otherwise
+    # the sitemaps may be overwritten.
     def create(opts={}, &block)
-
-      @sitemap = nil if @sitemap && @sitemap.finalized?
+      reset!
       set_options(opts)
       start_time = Time.now if @verbose
       interpreter.eval(:yield_sitemap => @yield_sitemap || SitemapGenerator.yield_sitemap?, &block)
@@ -47,8 +50,11 @@ module SitemapGenerator
     # Constructor
     #
     # == Options:
-    # * <tt>:adapter</tt> -
-    #
+    # * <tt>:adapter</tt> - instance of a class with a write method which takes a SitemapGenerator::Location
+    # and raw XML data and persists it. The default adapter is a SitemapGenerator::FileAdapter
+    # which simply writes files to the filesystem. You can use a SitemapGenerator::WaveAdapter
+    # for uploading sitemaps to remote servers - useful for read-only hosts such as Heroku. Or
+    # you can provide an instance of your own class to provide custom behavior.
     #
     # * <tt>:default_host</tt> - host including protocol to use in all sitemap links
     # e.g. http://en.google.ca
@@ -57,8 +63,15 @@ module SitemapGenerator
     # Defaults to the <tt>public/</tt> directory in your application root directory or
     # the current working directory.
     #
-    # * <tt>:sitemaps_host</tt> -
-    #
+    # * <tt>:sitemaps_host</tt> - String. <b>Host including protocol</b> to use when generating
+    # a link to a sitemap file i.e. the hostname of the server where the sitemaps are hosted.
+    # The value will differ from the hostname in your sitemap links.
+    # For example: `'http://amazon.aws.com/'`.
+    #
+    # Note that `include_index` is automatically turned off when the `sitemaps_host` does
+    # not match `default_host`. Because the link to the sitemap index file that would
+    # otherwise be added would point to a different host than the rest of the links in
+    # the sitemap. Something that the sitemap rules forbid.
     #
     # * <tt>:sitemaps_path</tt> - path fragment within public to write sitemaps
     # to e.g. 'en/'. Sitemaps are written to <tt>public_path</tt> + <tt>sitemaps_path</tt>
@@ -69,11 +82,13 @@ module SitemapGenerator
     #
     # * <tt>:sitemaps_namer</tt> - A +SitemapNamer+ instance for generating the sitemap names.
     #
-    # * <tt
-    #
+    # * <tt>include_index</tt> - Boolean. Whether to <b>add a link to the sitemap index<b>
+    # to the current sitemap. This points search engines to your Sitemap Index to
+    # include it in the indexing of your site. Default is `true`. Turned off when
+    # `sitemaps_host` is set or within a `group()` block.
     #
-    # * <tt
-    # Default is true.
+    # * <tt>include_root</tt> - Boolean. Whether to **add the root** url i.e. '/' to the
+    # current sitemap. Default is `true`. Turned off within a `group()` block.
     #
     # * <tt>:verbose</tt> - If +true+, output a summary line for each sitemap and sitemap
     # index that is created. Default is +false+.
@@ -110,10 +125,10 @@ module SitemapGenerator
       retry
     end
 
-    # Create a new group of
+    # Create a new group of sitemap files.
     #
-    #
-    # passed
+    # Returns a new LinkSet instance with the options passed in set on it. All groups
+    # share the sitemap index, which is not affected by any of the options passed here.
     #
     # === Options
     # Any of the options to LinkSet.new. Except for <tt>:public_path</tt> which is shared
@@ -127,9 +142,13 @@ module SitemapGenerator
     #
     # If you are not changing any of the location settings like <tt>filename<tt>,
     # <tt>sitemaps_path</tt>, <tt>sitemaps_host</tt> or <tt>sitemaps_namer</tt>
-    # the
-    #
-    #
+    # links you add within the group will be added to the current sitemap file (e.g. sitemap1.xml).
+    # If one of these options is specified, the current sitemap file is finalized
+    # and a new sitemap file started.
+    #
+    # Options like <tt>:default_host</tt> can be used and it will only affect the links
+    # within the group. Links added outside of the group will revert to the previous
+    # +default_host+.
     def group(opts={}, &block)
       @created_group = true
       original_opts = opts.dup
@@ -230,6 +249,24 @@ module SitemapGenerator
       finalize_sitemap_index!
     end
 
+    # Return a boolean indicating hether to add a link to the sitemap index file
+    # to the current sitemap. This points search engines to your Sitemap Index so
+    # they include it in the indexing of your site, but is not strictly neccessary.
+    # Default is `true`. Turned off when `sitemaps_host` is set or within a `group()` block.
+    def include_index?
+      if default_host && sitemaps_host && sitemaps_host != default_host
+        false
+      else
+        @include_index
+      end
+    end
+
+    # Return a boolean indicating whether to automatically add the root url i.e. '/' to the
+    # current sitemap. Default is `true`. Turned off within a `group()` block.
+    def include_root?
+      !!@include_root
+    end
+
     protected
 
     # Set each option on this instance using accessor methods. This will affect
@@ -273,8 +310,12 @@ module SitemapGenerator
     # Add default links if those options are turned on. Record the fact that we have done so
     # in an instance variable.
     def add_default_links
-
-
+      if include_root?
+        sitemap.add('/', :lastmod => Time.now, :changefreq => 'always', :priority => 1.0, :host => @default_host)
+      end
+      if include_index?
+        sitemap.add(sitemap_index, :lastmod => Time.now, :changefreq => 'always', :priority => 1.0)
+      end
       @added_default_links = true
     end
 
@@ -310,6 +351,15 @@ module SitemapGenerator
       @interpreter ||= SitemapGenerator::Interpreter.new(:link_set => self)
     end
 
+    # Reset this instance. Keep the same options, but return to the same state
+    # as before an sitemaps were created.
+    def reset!
+      @sitemap_index = nil if @sitemap_index && @sitemap_index.finalized? && !@protect_index
+      @sitemap = nil if @sitemap && @sitemap.finalized?
+      self.sitemaps_namer.reset # start from 1
+      @added_default_links = false
+    end
+
     module LocationHelpers
       public
 
@@ -351,6 +401,9 @@ module SitemapGenerator
       # Set the host name, including protocol, that will be used on all links to your sitemap
       # files. Useful when the server that hosts the sitemaps is not on the same host as
       # the links in the sitemap.
+      #
+      # Note that `include_index` will be turned off to avoid adding a link to a sitemap with
+      # a different host than the other links.
       def sitemaps_host=(value)
         @sitemaps_host = value
         update_location_info(:host, value)
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: sitemap_generator
 version: !ruby/object:Gem::Version
-  hash:
+  hash: 9
   prerelease:
   segments:
   - 2
   - 1
-  -
-  version: 2.1.
+  - 1
+  version: 2.1.1
 platform: ruby
 authors:
 - Karl Varga
@@ -16,7 +16,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2011-
+date: 2011-09-19 00:00:00 -07:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency