websitary 0.4 → 0.5

@@ -1,3 +1,18 @@
+ = 0.5
+
+ * mailto: and javascript: hrefs are now handled via the exclude option
+ * rewrite absolute URLs sans host correctly
+ * strip href and image src tags in order to prevent parser errors
+ * some scaffolding for mechanize
+ * global proxy option (currently only used for mechanize)
+ * use -nolist for lynx
+ * catch errors in Websitary::App#execute_downdiff
+ * :rss_find_enclosure => LAMBDA: Extract the enclosure URL from the item
+   description
+ * :rss_format_local_copy => STRING|BLOCK/2: Format the display of the
+   local copy.
+
+
  = 0.4
 
  * Sources may have a :timeout option.
data/README.txt CHANGED
@@ -212,6 +212,12 @@ Known global options:
  <tt>:toggle_body => BOOLEAN</tt>::
  If true, make a news body collapsible on mouse-clicks (sort of).
 
+ <tt>:proxy => STRING</tt>, <tt>:proxy => ARRAY</tt>::
+ The proxy (currently only supported by mechanize).
+
+ <tt>:user_agent => STRING</tt>::
+ Set the user agent (only for certain queries).
+
 
  ==== output_format FORMAT, output_format [FORMAT1, FORMAT2, ...]
  Set the output format.
@@ -318,11 +324,29 @@ Options
  <tt>:rss_enclosure => true|"DIRECTORY"</tt>::
  If true, save rss feed enclosures in
  "~/.websitary/attachments/RSS_FEED_NAME/". If a string, use this as
- destination directory.
+ destination directory. Only enclosures of new items will be saved --
+ i.e. when downloading a feed for the first time, no enclosures will be
+ saved.
+
+ <tt>:rss_find_enclosure => BLOCK</tt>::
+ Certain RSS feeds embed enclosures in the description. Use this option
+ to scan the description (a Hpricot document) for a URL that is then saved
+ as an enclosure if the :rss_enclosure option is set.
+ Example:
+   source 'http://www.example.com/rss',
+     :title => 'Example',
+     :use => :rss, :rss_enclosure => true,
+     :rss_find_enclosure => lambda {|item, doc| (doc / 'img').map {|e| e['src']}[0]}
 
  <tt>:rss_format (default: "plain_text")</tt>::
  When output format is :rss, create rss item descriptions as plain text.
 
+ <tt>:rss_format_local_copy => FORMAT_STRING | BLOCK</tt>::
+ By default a hypertext reference to the local copy of an RSS
+ enclosure is added to the entry. Sometimes you may want to display
+ something inline (e.g. an image). You can then use this option to
+ define a format string (one field = the local copy's file URL).
+
  <tt>:show_initial => true</tt>::
  Include initial copies in the report (may not always work properly).
  This can also be set as a global option.
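Building on the `source` example shown earlier for :rss_find_enclosure, a hypothetical configuration combining :rss_enclosure with the new :rss_format_local_copy option (the URL and the format string are invented for illustration):

```ruby
# Hypothetical source definition: display the locally saved enclosure
# inline as an image instead of the default hypertext reference.
# The %s field is filled with the local copy's file URL.
source 'http://www.example.com/rss',
    :title => 'Example',
    :use => :rss,
    :rss_enclosure => true,
    :rss_format_local_copy => '<img src="%s" />'
```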
@@ -388,6 +412,13 @@ Example:
  Use open-uri for downloading the source. Use webdiff for generating
  diffs. This doesn't handle cookies and the like.
 
+ <tt>:mechanize</tt>::
+ Use mechanize (must be installed) for downloading the source. Use
+ webdiff for generating diffs. This calls the URL's :mechanize property
+ (a lambda that takes 3 arguments: URL, agent, page => HTML as string)
+ to post-process the page (or, if not available, uses the page body's
+ HTML).
+
  <tt>:text</tt>::
  This requires hpricot to be installed. Use open-uri for downloading
  and hpricot for converting HTML to plain text. This still requires
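A hypothetical source entry using the new :mechanize shortcut; the URL is invented, and the lambda simply returns the fetched page's HTML, matching the 3-argument signature described above:

```ruby
# Hypothetical: download via mechanize; the :mechanize lambda receives
# (url, agent, page) and must return the HTML to be diffed as a string.
source 'http://www.example.com/members',
    :title => 'Members area',
    :use => :mechanize,
    :mechanize => lambda {|url, agent, page| page.content}
```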
@@ -730,7 +761,7 @@ Now check out the configuration commands in the Synopsis section.
 
  == LICENSE:
  websitary Webpage Monitor
- Copyright (C) 2007 Thomas Link
+ Copyright (C) 2007-2008 Thomas Link
 
  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
data/Rakefile CHANGED
@@ -21,11 +21,11 @@ require 'rtagstask'
  RTagsTask.new
 
  task :ctags do
- `ctags --extra=+q --fields=+i+S -R bin lib`
+ puts `ctags --extra=+q --fields=+i+S -R bin lib`
  end
 
  task :files do
- `find bin lib -name "*.rb" > files.lst`
+ puts `find bin lib -name "*.rb" > files.lst`
  end
 
  # vim: syntax=Ruby
@@ -1,6 +1,6 @@
  #! /usr/bin/env ruby
  # websitary.rb -- The website news, rss feed, podcast catching monitor
- # @Last Change: 2007-12-26.
+ # @Last Change: 2008-02-12.
  # Author:: Thomas Link (micathom at gmail com)
  # License:: GPL (see http://www.gnu.org/licenses/gpl.txt)
  # Created:: 2007-06-09.
@@ -11,7 +11,7 @@ require 'websitary'
 
  if __FILE__ == $0
  w = Websitary::App.new(ARGV)
- t = w.configuration.get_optionvalue(:global, :timer)
+ t = w.configuration.optval_get(:global, :timer)
  if t
  exit_code = 0
  while exit_code <= 1
@@ -1,5 +1,5 @@
  # websitary.rb
- # @Last Change: 2008-01-13.
+ # @Last Change: 2008-03-11.
  # Author:: Thomas Link (micathom AT gmail com)
  # License:: GPL (see http://www.gnu.org/licenses/gpl.txt)
  # Created:: 2007-09-08.
@@ -7,7 +7,8 @@
 
  require 'cgi'
  require 'digest/md5'
- require 'ftools'
+ # require 'ftools'
+ require 'fileutils'
  require 'net/ftp'
  require 'optparse'
  require 'pathname'
@@ -33,8 +34,8 @@ end
 
  module Websitary
  APPNAME = 'websitary'
- VERSION = '0.4'
- REVISION = '2464'
+ VERSION = '0.5'
+ REVISION = '2476'
  end
 
  require 'websitary/applog'
@@ -72,7 +73,7 @@ class Websitary::App
  unless File.exists?(css)
  $logger.info "Copying default css file: #{css}"
  @configuration.write_file(css, 'w') do |io|
- io.puts @configuration.get_option(:page, :css)
+ io.puts @configuration.opt_get(:page, :css)
  end
  end
  end
@@ -99,7 +100,7 @@ class Websitary::App
  def execute_configuration
  keys = @configuration.options.keys
  urls = @configuration.todo
- # urls = @configuration.todo..sort {|a,b| @configuration.get(a, :title, a) <=> @configuration.get(b, :title, b)}
+ # urls = @configuration.todo..sort {|a,b| @configuration.url_get(a, :title, a) <=> @configuration.url_get(b, :title, b)}
  urls.each_with_index do |url, i|
  data = @configuration.urls[url]
  text = [
@@ -107,7 +108,7 @@ class Websitary::App
  "<b>current</b><br/>#{CGI.escapeHTML(@configuration.latestname(url, true))}<br/>",
  "<b>backup</b><br/>#{CGI.escapeHTML(@configuration.oldname(url, true))}<br/>",
  *((data.keys | keys).map do |k|
- v = @configuration.get(url, k).inspect
+ v = @configuration.url_get(url, k).inspect
  "<b>:#{k}</b><br/>#{CGI.escapeHTML(v)}<br/>"
  end)
  ]
@@ -205,7 +206,7 @@ class Websitary::App
  rv = 0
  @configuration.todo.each do |url|
  opts = @configuration.urls[url]
- name = @configuration.get(url, :title, url)
+ name = @configuration.url_get(url, :title, url)
  $logger.debug "Source: #{name}"
  aggrbase = @configuration.encoded_filename('aggregate', url, true, 'md5')
  aggrfiles = Dir["#{aggrbase}_*"]
@@ -223,7 +224,7 @@ class Websitary::App
  def execute_show
  @configuration.todo.each do |url|
  opts = @configuration.urls[url]
- $logger.debug "Source: #{@configuration.get(url, :title, url)}"
+ $logger.debug "Source: #{@configuration.url_get(url, :title, url)}"
  aggrbase = @configuration.encoded_filename('aggregate', url, true, 'md5')
  difftext = []
  aggrfiles = Dir["#{aggrbase}_*"]
@@ -233,7 +234,7 @@ class Websitary::App
  difftext.compact!
  difftext.delete('')
  unless difftext.empty?
- joindiffs = @configuration.get(url, :joindiffs, lambda {|t| t.join("\n")})
+ joindiffs = @configuration.url_get(url, :joindiffs, lambda {|t| t.join("\n")})
  difftext = @configuration.call_cmd(joindiffs, [difftext], :url => url) if joindiffs
  accumulate(url, difftext, opts)
  end
@@ -255,13 +256,13 @@ class Websitary::App
  end
  @configuration.todo.each do |url|
  opts = @configuration.urls[url]
- $logger.debug "Source: #{@configuration.get(url, :title, url)}"
+ $logger.debug "Source: #{@configuration.url_get(url, :title, url)}"
 
  diffed = @configuration.diffname(url, true)
  $logger.debug "diffname: #{diffed}"
 
  if File.exists?(diffed)
- $logger.warn "Reuse old diff: #{@configuration.get(url, :title, url)} => #{diffed}"
+ $logger.warn "Reuse old diff: #{@configuration.url_get(url, :title, url)} => #{diffed}"
  difftext = File.read(diffed)
  accumulate(url, difftext, opts)
  else
@@ -272,17 +273,22 @@ class Websitary::App
  older = @configuration.oldname(url, true)
  $logger.debug "older: #{older}"
 
- if rebuild or download(url, opts, latest, older)
- difftext = diff(url, opts, latest, older)
- if difftext
- @configuration.write_file(diffed, 'wb') {|io| io.puts difftext}
- # $logger.debug "difftext: #{difftext}" #DBG#
- if accumulator
- accumulator.call(url, difftext, opts)
- else
- accumulate(url, difftext, opts)
+ begin
+ if rebuild or download(url, opts, latest, older)
+ difftext = diff(url, opts, latest, older)
+ if difftext
+ @configuration.write_file(diffed, 'wb') {|io| io.puts difftext}
+ # $logger.debug "difftext: #{difftext}" #DBG#
+ if accumulator
+ accumulator.call(url, difftext, opts)
+ else
+ accumulate(url, difftext, opts)
+ end
  end
  end
+ rescue Exception => e
+ $logger.error e.to_s
+ $logger.info e.backtrace.join("\n")
  end
  end
  end
@@ -291,20 +297,22 @@ class Websitary::App
 
 
  def move(from, to)
- copy_move(:rename, from, to)
+ # copy_move(:rename, from, to) # ftools
+ copy_move(:mv, from, to) # FileUtils
  end
 
 
  def copy(from, to)
- copy_move(:copy, from, to)
+ # copy_move(:copy, from, to)
+ copy_move(:cp, from, to)
  end
 
 
  def copy_move(method, from, to)
  if File.exists?(from)
- $logger.debug "Overwriting: #{from} -> #{to}" if File.exists?(to)
+ $logger.debug "Overwrite: #{from} -> #{to}" if File.exists?(to)
  lst = File.lstat(from)
- File.send(method, from, to)
+ FileUtils.send(method, from, to)
  File.utime(lst.atime, lst.mtime, to)
  @configuration.mtimes.set(from, lst.mtime)
  @configuration.mtimes.set(to, lst.mtime)
@@ -347,16 +355,16 @@ class Websitary::App
 
  def download(url, opts, latest, older=nil)
  if @configuration.done.include?(url)
- $logger.info "Already downloaded: #{@configuration.get(url, :title, url).inspect}"
+ $logger.info "Already downloaded: #{@configuration.url_get(url, :title, url).inspect}"
  return false
  end
 
- $logger.warn "Download: #{@configuration.get(url, :title, url).inspect}"
+ $logger.warn "Download: #{@configuration.url_get(url, :title, url).inspect}"
  @configuration.done << url
- text = @configuration.call_cmd(@configuration.get(url, :download), [url], :url => url)
+ text = @configuration.call_cmd(@configuration.url_get(url, :download), [url], :url => url)
  # $logger.debug text #DBG#
  unless text
- $logger.warn "no contents: #{@configuration.get(url, :title, url)}"
+ $logger.warn "no contents: #{@configuration.url_get(url, :title, url)}"
  return false
  end
 
@@ -390,7 +398,7 @@ class Websitary::App
  text = text.join("\n")
  end
 
- pprc = @configuration.get(url, :downloadprocess)
+ pprc = @configuration.url_get(url, :downloadprocess)
  if pprc
  $logger.debug "download process: #{pprc}"
  text = @configuration.call_cmd(pprc, [text], :url => url)
@@ -416,25 +424,25 @@ class Websitary::App
  def diff(url, opts, new, old)
  if File.exists?(old)
  $logger.debug "diff: #{old} <-> #{new}"
- difftext = @configuration.call_cmd(@configuration.get(url, :diff), [old, new], :url => url)
+ difftext = @configuration.call_cmd(@configuration.url_get(url, :diff), [old, new], :url => url)
  # $logger.debug "diff: #{difftext}" #DBG#
 
  if difftext =~ /\S/
- if (pprc = @configuration.get(url, :diffprocess))
+ if (pprc = @configuration.url_get(url, :diffprocess))
  $logger.debug "diff process: #{pprc}"
  difftext = @configuration.call_cmd(pprc, [difftext], :url => url)
  end
  # $logger.debug "difftext: #{difftext}" #DBG#
  if difftext =~ /\S/
- $logger.warn "Changed: #{@configuration.get(url, :title, url).inspect}"
+ $logger.warn "Changed: #{@configuration.url_get(url, :title, url).inspect}"
  return difftext
  end
  end
 
- $logger.debug "Unchanged: #{@configuration.get(url, :title, url).inspect}"
+ $logger.debug "Unchanged: #{@configuration.url_get(url, :title, url).inspect}"
 
  elsif File.exist?(new) and
- (@configuration.get(url, :show_initial) or @configuration.get_optionvalue(:global, :show_initial))
+ (@configuration.url_get(url, :show_initial) or @configuration.optval_get(:global, :show_initial))
 
  return File.read(new)
 
@@ -451,16 +459,16 @@ class Websitary::App
  tdiff = tdiff_with(opts, tn, tl)
  case tdiff
  when nil, false
- $logger.debug "Age requirement fulfilled: #{@configuration.get(url, :title, url).inspect}: #{format_tdiff(td)} old"
+ $logger.debug "Age requirement fulfilled: #{@configuration.url_get(url, :title, url).inspect}: #{format_tdiff(td)} old"
  return false
  when :skip, true
- $logger.info "Skip #{@configuration.get(url, :title, url).inspect}: Only #{format_tdiff(td)} old"
+ $logger.info "Skip #{@configuration.url_get(url, :title, url).inspect}: Only #{format_tdiff(td)} old"
  return true
  when Numeric
  if td < tdiff
  tdd = tdiff - td
  @tdiff_min = tdd if @tdiff_min.nil? or tdd < @tdiff_min
- $logger.info "Skip #{@configuration.get(url, :title, url).inspect}: Only #{format_tdiff(td)} old (#{format_tdiff(tdiff)})"
+ $logger.info "Skip #{@configuration.url_get(url, :title, url).inspect}: Only #{format_tdiff(td)} old (#{format_tdiff(tdiff)})"
  return true
  end
  else
@@ -509,7 +517,7 @@ class Websitary::App
  when Integer
  return eligible != now
  else
- $logger.error "#{@configuration.get(url, :title, url)}: Wrong type for :days_of_week=#{dweek.inspect}"
+ $logger.error "#{@configuration.url_get(url, :title, url)}: Wrong type for :days_of_week=#{dweek.inspect}"
  return :skip
  end
  end
File without changes
@@ -1,5 +1,5 @@
  # configuration.rb
- # @Last Change: 2008-01-09.
+ # @Last Change: 2008-05-23.
  # Author:: Thomas Link (micathom AT gmail com)
  # License:: GPL (see http://www.gnu.org/licenses/gpl.txt)
  # Created:: 2007-09-08.
@@ -47,7 +47,6 @@ class Websitary::Configuration
  @cmd_edit = 'vi "%s"'
  @execute = 'downdiff'
  @quicklist_profile = 'quicklist'
- @user_agent = "websitary/#{Websitary::VERSION}"
  @view = 'w3m "%s"'
 
  @allow = {}
@@ -60,7 +59,7 @@ class Websitary::Configuration
  @profiles = []
  @robots = {}
  @todo = []
- @exclude = []
+ @exclude = [/^\s*(javascript|mailto):/]
  @urlencmap = {}
  @urls = {}
 
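The new @exclude default above routes javascript: and mailto: hrefs through the same exclusion check as user-defined patterns (previously they were special-cased in push_refs). A self-contained sketch of that check (the constant and method names are mine, not websitary's):

```ruby
# Default exclusion list as introduced above, plus a minimal predicate
# in the spirit of the configuration class's is_excluded?.
EXCLUDES = [/^\s*(javascript|mailto):/]

def excluded?(href)
  href.nil? || EXCLUDES.any? {|rx| href =~ rx}
end
```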
@@ -190,10 +189,16 @@ class Websitary::Configuration
  end
 
 
+ def url_set(url, items)
+ opts = @urls[url] ||= {}
+ opts.merge!(items)
+ end
+
+
  # Retrieve an option for an url
  # url:: String
  # opt:: Symbol
- def get(url, opt, default=nil)
+ def url_get(url, opt, default=nil)
  opts = @urls[url]
  unless opts
  $logger.debug "Non-registered URL: #{url}"
@@ -221,7 +226,7 @@ class Websitary::Configuration
  when nil
  when Symbol
  $logger.debug "get: val=#{val}"
- success, rv = get_option(opt, val)
+ success, rv = opt_get(opt, val)
  $logger.debug "get: #{success}, #{rv}"
  if success
  return rv
@@ -231,7 +236,7 @@ class Websitary::Configuration
  return val
  end
  unless default
- success, default1 = get_option(opt, :default)
+ success, default1 = opt_get(opt, :default)
  default = default1 if success
  end
 
@@ -240,10 +245,10 @@ class Websitary::Configuration
  end
 
 
- def get_optionvalue(opt, val, default=nil)
+ def optval_get(opt, val, default=nil)
  case val
  when Symbol
- ok, val = get_option(opt, val)
+ ok, val = opt_get(opt, val)
  if ok
  val
  else
@@ -255,22 +260,22 @@ class Websitary::Configuration
  end
  end
 
- def get_option(opt, val)
+ def opt_get(opt, val)
  vals = @options[opt]
  $logger.debug "val=#{val} vals=#{vals.inspect}"
  if vals and vals.has_key?(val)
  rv = vals[val]
- $logger.debug "get_option ok: #{opt} => #{rv.inspect}"
+ $logger.debug "opt_get ok: #{opt} => #{rv.inspect}"
  case rv
  when Symbol
- $logger.debug "get_option re: #{rv}"
- return get_option(opt, rv)
+ $logger.debug "opt_get re: #{rv}"
+ return opt_get(opt, rv)
  else
- $logger.debug "get_option true, #{rv}"
+ $logger.debug "opt_get true, #{rv}"
  return [true, rv]
  end
  else
- $logger.debug "get_option no: #{opt} => #{val.inspect}"
+ $logger.debug "opt_get no: #{opt} => #{val.inspect}"
  return [false, val]
  end
  end
@@ -409,7 +414,7 @@ class Websitary::Configuration
  # urls:: String
  def source(urls, opts={})
  urls.split("\n").flatten.compact.each do |url|
- @urls[url] = @default_options.dup.update(opts)
+ url_set(url, @default_options.dup.update(opts))
  to_do url
  end
  end
@@ -477,9 +482,9 @@ class Websitary::Configuration
 
 
  def format_text(url, text)
- enc = get(url, :iconv)
+ enc = url_get(url, :iconv)
  if enc
- denc = get_optionvalue(:global, :encoding)
+ denc = optval_get(:global, :encoding)
  begin
  require 'iconv'
  text = Iconv.conv(denc, enc, text)
@@ -493,7 +498,7 @@ class Websitary::Configuration
 
  # Format a diff according to URL's source options.
  def format(url, difftext)
- fmt = get(url, :format)
+ fmt = url_get(url, :format)
  text = format_text(url, difftext)
  eval_arg(fmt, [text], text)
  end
@@ -527,7 +532,7 @@ class Websitary::Configuration
  def call_cmd(cmd, cmdargs, args={})
  default = args[:default]
  url = args[:url]
- timeout = url ? get(url, :timeout) : nil
+ timeout = url ? url_get(url, :timeout) : nil
  if timeout
  begin
  Timeout::timeout(timeout) do |timeout_length|
@@ -583,7 +588,7 @@ class Websitary::Configuration
  if difftext
  difftext = html_to_text(difftext) if is_html?(difftext)
  !difftext.empty? && [
- eval_arg(get(url, :rewrite_link, '%s'), [url]),
+ eval_arg(url_get(url, :rewrite_link, '%s'), [url]),
  difftext_annotation(url),
  nil,
  difftext
@@ -594,32 +599,32 @@ class Websitary::Configuration
 
 
  def get_output_rss(difftext)
- success, rss_url = get_option(:rss, :url)
+ success, rss_url = opt_get(:rss, :url)
  if success
- success, rss_version = get_option(:rss, :version)
+ success, rss_version = opt_get(:rss, :version)
  # require "rss/#{rss_version}"
 
  rss = RSS::Rss.new(rss_version)
  chan = RSS::Rss::Channel.new
  chan.title = @output_title
  [:description, :copyright, :category, :language, :image, :webMaster, :pubDate].each do |field|
- ok, val = get_option(:rss, field)
+ ok, val = opt_get(:rss, field)
  item.send(format_symbol(field, '%s='), val) if ok
  end
  chan.link = rss_url
  rss.channel = chan
 
  cnt = difftext.map do |url, text|
- rss_format = get(url, :rss_format, 'plain_text')
+ rss_format = url_get(url, :rss_format, 'plain_text')
  text = strip_tags(text, :format => rss_format)
  next if text.empty?
 
  item = RSS::Rss::Channel::Item.new
  item.date = Time.now
- item.title = get(url, :title, File.basename(url))
- item.link = eval_arg(get(url, :rewrite_link, '%s'), [url])
+ item.title = url_get(url, :title, File.basename(url))
+ item.link = eval_arg(url_get(url, :rewrite_link, '%s'), [url])
  [:author, :date, :enclosure, :category, :pubDate].each do |field|
- val = get(url, format_symbol(field, 'rss_%s'))
+ val = url_get(url, format_symbol(field, 'rss_%s'))
  item.send(format_symbol(field, '%s='), val) if val
  end
 
@@ -647,7 +652,7 @@ class Websitary::Configuration
 
  def get_output_html(difftext)
  difftext = difftext.map do |url, text|
- tags = get(url, :strip_tags)
+ tags = url_get(url, :strip_tags)
  text = strip_tags(text, :tags => tags) if tags
  text.empty? ? nil : [url, text]
  end
@@ -655,7 +660,7 @@ class Websitary::Configuration
  sort_difftext!(difftext)
 
  toc = difftext.map do |url, text|
- ti = get(url, :title, File.basename(url))
+ ti = url_get(url, :title, File.basename(url))
  tid = html_toc_id(url)
  bid = html_body_id(url)
  %{<li id="#{tid}" class="toc"><a class="toc" href="\##{bid}">#{ti}</a></li>}
@@ -664,9 +669,9 @@ class Websitary::Configuration
  idx = 0
  cnt = difftext.map do |url, text|
  idx += 1
- ti = get(url, :title, File.basename(url))
+ ti = url_get(url, :title, File.basename(url))
  bid = html_body_id(url)
- if (rewrite = get(url, :rewrite_link))
+ if (rewrite = url_get(url, :rewrite_link))
  urlr = eval_arg(rewrite, [url])
  ext = ''
  else
@@ -676,7 +681,7 @@ class Websitary::Configuration
  urlr = url
  end
  note = difftext_annotation(url)
- onclick = get_optionvalue(:global, :toggle_body) ? 'onclick="ToggleBody(this)"' : ''
+ onclick = optval_get(:global, :toggle_body) ? 'onclick="ToggleBody(this)"' : ''
  <<HTML
  <div id="#{bid}" class="webpage" #{onclick}>
  <div class="count">
@@ -697,9 +702,9 @@ class Websitary::Configuration
 HTML
  end.join(('<hr class="separator"/>') + "\n")
 
- success, template = get_option(:page, :format)
+ success, template = opt_get(:page, :format)
  unless success
- success, template = get_option(:page, :simple)
+ success, template = opt_get(:page, :simple)
  end
  return eval_arg(template, [@output_title, toc, cnt])
  end
@@ -735,12 +740,12 @@ HTML
 
 
  def encoded_filename(dir, url, ensure_dir=false, type=nil)
- type ||= get(url, :cachetype, 'tree')
+ type ||= url_get(url, :cachetype, 'tree')
  $logger.debug "encoded_filename: type=#{type} url=#{url}"
  rv = File.join(@cfgdir, dir, encoded_basename(url, type))
  rd = File.dirname(rv)
  $logger.debug "encoded_filename: rv0=#{rv}"
- fm = get_optionvalue(:global, :filename_size, 255)
+ fm = optval_get(:global, :filename_size, 255)
  rdok = !ensure_dir || @app.ensure_dir(rd, false)
  if !rdok or rv.size > fm or File.directory?(rv)
  # $logger.debug "Filename too long (:global=>:filename_size = #{fm}), try md5 encoded filename instead: #{url}"
@@ -796,6 +801,24 @@ HTML
  end
 
 
+ def save_dir(url, dir, title=nil)
+ case dir
+ when true
+ title ||= url_get(url, :title)
+ dir = File.join(@cfgdir, 'attachments', encode(title))
+ when Proc
+ dir = dir.call(url)
+ end
+ @app.ensure_dir(dir) if dir
+ return dir
+ end
+
+
+ def clean_url(url)
+ url && url.strip
+ end
+
+
  # Strip the url's last part (after #).
  def canonic_url(url)
  url.sub(/#.*$/, '')
@@ -803,7 +826,7 @@ HTML
 
 
  def strip_tags_default
- success, tags = get_option(:strip_tags, :default)
+ success, tags = opt_get(:strip_tags, :default)
  tags.dup if success
  end
 
@@ -830,7 +853,7 @@ HTML
  # This checks either for a :match option for url or the extensions
  # of path0 and path.
  def eligible_path?(url, path0, path)
- rx = get(url, :match)
+ rx = url_get(url, :match)
  if rx
  return path =~ rx
  else
@@ -845,15 +868,15 @@ HTML
  begin
  $logger.debug "push_refs: #{url}"
  return if robots?(hpricot, 'nofollow') or is_excluded?(url)
- depth = get(url, :depth)
+ depth = url_get(url, :depth)
  return if depth and depth <= 0
  uri0 = URI.parse(url)
  # pn0 = Pathname.new(guess_dir(File.expand_path(uri0.path)))
  pn0 = Pathname.new(guess_dir(uri0.path))
  (hpricot / 'a').each do |a|
  next if a['rel'] == 'nofollow'
- href = a['href']
- next if href.nil? or href == url or href =~ /^\s*javascript:/ or href =~ /^\s*mailto:/ or is_excluded?(href)
+ href = clean_url(a['href'])
+ next if href.nil? or href == url or is_excluded?(href)
  uri = URI.parse(href)
  pn = guess_dir(uri.path)
  href = rewrite_href(href, url, uri0, pn0, true)
@@ -869,7 +892,7 @@ HTML
  opts[:title] = [opts[:title], File.basename(curl)].join(' - ')
  opts[:depth] = depth - 1 if depth and depth >= 0
  # opts[:sleep] = delay if delay
- @urls[curl] = opts
+ url_set(curl, opts)
  to_do curl
  end
  rescue Exception => e
@@ -887,7 +910,7 @@ HTML
  uri = URI.parse(url)
  urd = guess_dir(uri.path)
  (doc / 'a').each do |a|
- href = a['href']
+ href = clean_url(a['href'])
  if is_excluded?(href)
  comment_element(doc, a)
  else
@@ -896,7 +919,7 @@ HTML
  end
  end
  (doc / 'img').each do |a|
- href = a['src']
+ href = clean_url(a['src'])
  if is_excluded?(href)
  comment_element(doc, a)
  else
@@ -917,12 +940,14 @@ HTML
  # Try to make href an absolute url.
  def rewrite_href(href, url, uri=nil, urd=nil, local=false)
  begin
- return if !href or href =~ /^\s*javascript:/
- urh = URI.parse(href)
+ return nil if !href or is_excluded?(href)
  uri ||= URI.parse(url)
+ if href =~ /^\s*\//
+ return uri.merge(href).to_s
+ end
+ urh = URI.parse(href)
  urd ||= guess_dir(uri.path)
  rv = nil
- href = href.strip
 
  # $logger.debug "DBG", uri, urh, #DBG#
  if href =~ /\w+:/
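The new branch above fixes the "rewrite absolute URLs sans host" item from the changelog: a host-relative href ("/path" with no scheme or host) is now merged against the page's own URI before any further parsing. A standalone sketch of that step (the function name is mine):

```ruby
require 'uri'

# Resolve a host-relative href ("/img/a.png") against the page's URL,
# mirroring the uri.merge(href) branch added to rewrite_href above.
# Other hrefs are returned unchanged.
def absolutize(href, base)
  return href unless href =~ /^\s*\//
  URI.parse(base).merge(href.strip).to_s
end
```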
@@ -1026,7 +1051,7 @@ HTML
 
 
  def canonic_filename(filename)
- call_cmd(get_optionvalue(:global, :canonic_filename), [filename], :default => filename)
+ call_cmd(optval_get(:global, :canonic_filename), [filename], :default => filename)
  end
 
 
@@ -1037,6 +1062,7 @@ HTML
  :download_html => :openuri,
  :encoding => 'ISO-8859-1',
  :toggle_body => false,
+ :user_agent => "websitary/#{Websitary::VERSION}",
  },
  }
 
@@ -1052,11 +1078,11 @@ HTML
  },
 
  :binary => lambda {|old, new|
- call_cmd(get_optionvalue(:diff, :diff), [old, new, '--binary -d -w'])
+ call_cmd(optval_get(:diff, :diff), [old, new, '--binary -d -w'])
  },
 
  :new => lambda {|old, new|
- difftext = call_cmd(get_optionvalue(:diff, :binary), [old, new])
+ difftext = call_cmd(optval_get(:diff, :binary), [old, new])
  difftext.empty? ? '' : new
  },
 
@@ -1067,7 +1093,7 @@ HTML
  args = {
  :oldhtml => File.read(old),
  :newhtml => File.read(new),
- :ignore => get(url, :ignore),
+ :ignore => url_get(url, :ignore),
  }
  difftext = Websitary::Htmldiff.new(args).diff
  difftext
@@ -1130,7 +1156,7 @@ HTML
  # :download => 'w3m -no-cookie -S -F -dump "%s"'
 
  shortcut :lynx, :delegate => :diff,
- :download => 'lynx -dump "%s"'
+ :download => 'lynx -dump -nolist "%s"'
 
  shortcut :links, :delegate => :diff,
  :download => 'links -dump "%s"'
@@ -1142,24 +1168,14 @@ HTML
  :download => 'wget -q -O - "%s"'
 
  shortcut :text, :delegate => :diff,
- :download => lambda {|url| html_to_text(read_url(url, 'html'))}
+ :download => lambda {|url| doc_to_text(read_document(url))}
 
  shortcut :body_html, :delegate => :webdiff,
  :strip_tags => :default,
  :download => lambda {|url|
  begin
- doc = Hpricot(read_url(url, 'html'))
- doc = doc.at('body')
- if doc
- doc = rewrite_urls(url, doc)
- doc = doc.inner_html
- if (tags = get(url, :strip_tags))
- doc = strip_tags(doc, :format => :hpricot, :tags => tags)
- end
- else
- $logger.warn 'inner html: No body'
- end
- doc.to_s
+ doc = read_document(url)
+ body_html(url, doc).to_s
  rescue Exception => e
  # $logger.error e #DBG#
  $logger.error e.message
@@ -1180,10 +1196,37 @@ HTML
  end
  }
 
+ shortcut :mechanize, :delegate => :webdiff,
+ :download => lambda {|url|
+ require 'mechanize'
+ agent = WWW::Mechanize.new
+ proxy = get_proxy
+ if proxy
+ agent.set_proxy(*proxy)
+ end
+ page = agent.get(url)
+ process = url_get(url, :mechanize)
+ if process
+ uri = URI.parse(url)
+ urd = guess_dir(uri.path)
+ page.links.each {|link|
+ href = link.node['href']
+ if href
+ href = rewrite_href(href, url, uri, urd, true)
+ link.node['href'] = href if href
+ end
+ }
+ process.call(url, agent, page)
+ else
+ doc = url_document(url, page.content)
+ body_html(url, doc).to_s
+ end
+ }
+
  shortcut :rss,
  :delegate => :openuri,
  :diff => lambda {|old, new|
- success, rss_version = get_option(:rss, :version)
+ success, rss_version = opt_get(:rss, :version)
  ro = RSS::Parser.parse(File.read(old), false)
  if ro
  rh = {}
@@ -1202,24 +1245,35 @@ HTML
  rnew << format_rss_item(item, rss_diff)
  else
  enc = item.respond_to?(:enclosure) && item.enclosure
- if enc and (curl = enc.url)
- url = url_from_filename(new)
- dir = get(url, :rss_enclosure)
+ url = url_from_filename(new)
+ if !enc and item.description
+ scanner = url_get(url, :rss_find_enclosure)
+ if scanner
+ ddoc = Hpricot(item.description)
+ enc = scanner.call(item, ddoc)
+ if enc
+ def enc.url
+ self
+ end
+ else
+ $logger.warn "No embedded enclosure URL found: #{item.description}"
+ end
+ end
+ end
+ if enc and (curl = clean_url(enc.url))
+ dir = url_get(url, :rss_enclosure)
  curl = rewrite_href(curl, url, nil, nil, true)
  next unless curl
  if dir
- if dir == true
- dir = File.join(@cfgdir, 'attachments', encode(rn.channel.title))
- end
- @app.ensure_dir(dir)
- $logger.debug "Enclosure URL: #{curl}"
+ dir = save_dir(url, dir, encode(rn.channel.title))
+ $logger.info "Enclosure: #{curl}"
  fname = File.join(dir, encode(File.basename(curl) || item.title || item.pubDate.to_s || Time.now.to_s))
- $logger.debug "Enclosure save to: #{fname}"
+ $logger.debug "Save enclosure: #{fname}"
  enc = read_url(curl, 'rss_enclosure')
  write_file(fname, 'wb') {|io| io.puts enc}
  furl = file_url(fname)
- enclosure = %{<p class="enclosure"><a href="%s" class="enclosure" />Enclosure (local copy)</a></p>} % furl
- if get(url, :rss_rewrite_enclosed_urls)
+ enclosure = rss_enclosure_local_copy(url, furl)
+ if url_get(url, :rss_rewrite_enclosed_urls)
  item.description.gsub!(Regexp.new(Regexp.escape(curl))) {|t| furl}
  end
  else
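The hunk above relies on a small singleton-method trick: when a `:rss_find_enclosure` block returns a bare URL string, `def enc.url; self; end` makes that string answer `url` like a real `RSS` enclosure object, so the downstream enclosure-saving code needs no special case. A minimal standalone sketch, with a regex standing in for the Hpricot document scan (the `.mp3` pattern and the `find_enclosure` name are illustrative, not from websitary):

```ruby
# Sketch only: a regex stands in for scanning the Hpricot document.
def find_enclosure(description)
  enc = description[%r{https?://[^\s"'<>]+\.mp3}]
  return nil unless enc
  # Make the plain String quack like an enclosure object:
  def enc.url
    self
  end
  enc
end

enc = find_enclosure(%{<p>Listen: <a href="http://example.com/pod/42.mp3">mp3</a></p>})
# enc.url then yields the string itself.
```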
@@ -1249,7 +1303,7 @@ HTML
  opts = @urls[url].dup
  opts[:download] = :rss
  opts[:title] = elt['title'] || elt['text'] || elt['htmlurl'] || curl
- @urls[curl] = opts
+ url_set(curl, opts)
  to_do curl
  else
  $logger.warn "Unsupported type in OPML: #{elt.to_s}"
@@ -1266,10 +1320,10 @@ HTML
  :download => lambda {|url| get_website_below(:body_html, url)}
 
  shortcut :website_txt, :delegate => :default,
- :download => lambda {|url| html_to_text(get_website(get(url, :download_html, :openuri), url))}
+ :download => lambda {|url| html_to_text(get_website(url_get(url, :download_html, :openuri), url))}
 
  shortcut :website_txt_below, :delegate => :default,
- :download => lambda {|url| html_to_text(get_website_below(get(url, :download_html, :openuri), url))}
+ :download => lambda {|url| html_to_text(get_website_below(url_get(url, :download_html, :openuri), url))}
 
  shortcut :ftp, :delegate => :default,
  :download => lambda {|url| get_ftp(url).join("\n")}
@@ -1277,7 +1331,7 @@ HTML
  shortcut :ftp_recursive, :delegate => :default,
  :download => lambda {|url|
  list = get_ftp(url)
- depth = get(url, :depth)
+ depth = url_get(url, :depth)
  if !depth or depth >= 0
  dirs = list.find_all {|e| e =~ /^d/}
  dirs.each do |l|
@@ -1287,7 +1341,7 @@ HTML
  opts = @urls[url].dup
  opts[:title] = [opts[:title], File.basename(curl)].join(' - ')
  opts[:depth] = depth - 1 if depth and depth >= 0
- @urls[curl] = opts
+ url_set(curl, opts)
  to_do curl
  end
  end
@@ -1307,7 +1361,7 @@ HTML
  <html>
  <head>
  <title>%s</title>
- <meta http-equiv="Content-Type" content="text/html; charset=#{get_optionvalue(:global, :encoding)}">
+ <meta http-equiv="Content-Type" content="text/html; charset=#{optval_get(:global, :encoding)}">
  <link rel="stylesheet" href="websitary.css" type="text/css">
  <link rel="alternate" href="websitary.rss" type="application/rss+xml" title="%s">
  </head>
@@ -1463,6 +1517,9 @@ CSS
  begin
  self.instance_eval(contents)
  return true
+ rescue Exception => e
+ $logger.fatal "Error when reading profile: #{profile_file}\n#{e}"
+ exit 5
  ensure
  @current_profile = nil
  end
@@ -1470,9 +1527,9 @@ CSS
 
 
  def get_website(download, url)
- html = call_cmd(get_optionvalue(:download, download), [url], :url => url)
+ html = call_cmd(optval_get(:download, download), [url], :url => url)
  if html
- doc = Hpricot(html)
+ doc = url_document(url, html)
  if doc
  return if robots?(doc, 'noindex')
  push_hrefs(url, doc) do |uri0, pn0, uri, pn|
@@ -1486,10 +1543,10 @@ CSS
 
 
  def get_website_below(download, url)
- dwnl = get_optionvalue(:download, download)
+ dwnl = optval_get(:download, download)
  html = call_cmd(dwnl, [url], :url => url)
  if html
- doc = Hpricot(html)
+ doc = url_document(url, html)
  if doc
  return if robots?(doc, 'noindex')
  push_hrefs(url, doc) do |uri0, pn0, uri, pn|
@@ -1547,8 +1604,26 @@ CSS
  end
 
 
+ def url_document(url, html)
+ doc = html && Hpricot(html)
+ if doc
+ unless url_get(url, :title)
+ ti = (doc / 'head > title').inner_html
+ url_set(url, :title => ti) unless ti.empty?
+ end
+ end
+ doc
+ end
+
+
+ def read_document(url)
+ html = read_url(url, 'html')
+ html && url_document(url, html)
+ end
+
+
  def read_url(url, type='html')
- downloader = get(url, "download_#{type}".intern)
+ downloader = url_get(url, "download_#{type}".intern)
  if downloader
  call_cmd(downloader, [url], :url => url)
  else
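The new `url_document` above falls back to the page's `<title>` element when a source has no configured `:title`. A standalone sketch of that fallback, using a regex in place of Hpricot's `doc / 'head > title'` query (an assumption for brevity; the `extract_title` name is illustrative):

```ruby
# Return the contents of the first <title> element, or nil -- a regex
# stand-in for the Hpricot lookup used by url_document.
def extract_title(html)
  ti = html[%r{<title[^>]*>(.*?)</title>}mi, 1]
  ti && !ti.strip.empty? ? ti : nil
end
```

As in `url_document`, an empty title is treated as absent so the configured default wins.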
@@ -1568,9 +1643,13 @@ CSS
  if uri.instance_of?(URI::Generic) or uri.scheme == 'file'
  open(url).read
  else
- header = {"User-Agent" => @user_agent}
- header.merge!(get(url, :header, {}))
- open(url, header).read
+ args = {"User-Agent" => optval_get(:global, :user_agent)}
+ args.merge!(url_get(url, :header, {}))
+ # proxy = get_proxy
+ # if proxy
+ # args[:proxy] = proxy[0,2].join(':')
+ # end
+ open(url, args).read
  end
  end
 
@@ -1579,7 +1658,7 @@ CSS
  bak = oldname(url)
  lst = latestname(url)
  if File.exist?(bak) and File.exist?(lst)
- eval_arg(get(url, :format_annotation, '%s >>> %s'), [@mtimes.mtime(bak), @mtimes.mtime(lst)])
+ eval_arg(url_get(url, :format_annotation, '%s >>> %s'), [@mtimes.mtime(bak), @mtimes.mtime(lst)])
  end
  end
 
@@ -1597,6 +1676,21 @@ CSS
  end
 
 
+ def rss_enclosure_local_copy(url, furl)
+ t = url_get(url, :rss_format_local_copy) ||
+ %{<p class="enclosure"><a href="%s" class="enclosure" />Enclosure (local copy)</a></p>}
+ case t
+ when Proc
+ t.call(url, furl)
+ when String
+ t % furl
+ else
+ $logger.fatal 'Argument for :rss_format_local_copy must be String or Proc: %s' % t.inspect
+ exit 5
+ end
+ end
+
+
  def format_rss_item(item, body, enclosure='')
  ti = rss_field(item, :title)
  au = rss_field(item, :author)
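The `:rss_format_local_copy => STRING|BLOCK/2` option introduced above accepts either a format string (filled in with the local file URL) or a two-argument Proc receiving the feed URL and the local URL. A minimal standalone sketch of that dispatch, raising instead of the `$logger.fatal`/`exit 5` path (the `format_local_copy` name is illustrative):

```ruby
# Dispatch on the type of the :rss_format_local_copy value, as
# rss_enclosure_local_copy does above.
def format_local_copy(template, url, furl)
  case template
  when Proc
    template.call(url, furl)   # BLOCK/2: feed URL and local copy URL
  when String
    template % furl            # STRING: %s is the local copy URL
  else
    raise ArgumentError, ':rss_format_local_copy must be String or Proc'
  end
end
```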
@@ -1627,6 +1721,46 @@ EOT
  end
 
 
+ def get_proxy
+ proxy = optval_get(:global, :proxy)
+ if proxy
+ case proxy
+ when String
+ proxy = proxy.split(':', 2)
+ if proxy.size == 1
+ proxy << 8080
+ else
+ proxy[1] = proxy[1].to_i
+ end
+ when Array
+ else
+ raise ArgumentError, 'proxy must be String or Array'
+ end
+ end
+ proxy
+ end
+
+
+ def body_html(url, doc)
+ doc &&= doc.at('body')
+ if doc
+ doc = rewrite_urls(url, doc)
+ doc = doc.inner_html
+ if (tags = url_get(url, :strip_tags))
+ doc = strip_tags(doc, :format => :hpricot, :tags => tags)
+ end
+ else
+ $logger.warn 'inner html: No body'
+ end
+ doc
+ end
+
+
+ def doc_to_text(doc)
+ doc && doc.to_plain_text
+ end
+
+
  # Convert html to plain text using hpricot.
  def html_to_text(text)
  text && Hpricot(text).to_plain_text
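The new `get_proxy` above normalizes the global `:proxy` option so it can be splatted into `agent.set_proxy(*proxy)`: a `"host"` or `"host:port"` string becomes a `[host, port]` array (port 8080 is the default), an Array passes through unchanged. A standalone sketch of just that normalization (the `normalize_proxy` name is illustrative):

```ruby
# Normalize a :proxy value to [host, port], as get_proxy does above.
def normalize_proxy(proxy)
  case proxy
  when nil, Array
    proxy                       # already normalized (or unset)
  when String
    host, port = proxy.split(':', 2)
    [host, port ? port.to_i : 8080]   # 8080 is get_proxy's default port
  else
    raise ArgumentError, 'proxy must be String or Array'
  end
end
```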
@@ -1660,10 +1794,10 @@ EOT
  return true if rurl.nil? or rurl.empty?
  begin
  robots_txt = read_url(rurl, 'robots')
- rules = RobotRules.new(@user_agent)
+ rules = RobotRules.new(optval_get(:global, :user_agent))
  rules.parse(rurl, robots_txt)
  @robots[host] = rules
- $logger.info "Loaded #{rurl} for #{@user_agent}"
+ $logger.info "Loaded #{rurl} for #{optval_get(:global, :user_agent)}"
  $logger.debug robots_txt
  rescue Exception => e
  $logger.info "#{rurl}: #{e}"
@@ -1705,7 +1839,7 @@ EOT
  difftext.sort! do |a, b|
  aa = a[0]
  bb = b[0]
- get(aa, :title, aa).downcase <=> get(bb, :title, bb).downcase
+ url_get(aa, :title, aa).downcase <=> url_get(bb, :title, bb).downcase
  end
  end
 
@@ -1713,7 +1847,7 @@ EOT
  def file_url(filename)
  # filename = File.join(File.basename(File.dirname(filename)), File.basename(filename))
  # "file://#{encode(filename, ':/')}"
- filename = call_cmd(get_optionvalue(:global, :file_url), [filename], :default => filename)
+ filename = call_cmd(optval_get(:global, :file_url), [filename], :default => filename)
  encode(filename, ':/')
  end
 
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: websitary
  version: !ruby/object:Gem::Version
- version: "0.4"
+ version: "0.5"
  platform: ruby
  authors:
  - Thomas Link
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2008-01-13 00:00:00 +01:00
+ date: 2008-05-24 00:00:00 +02:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -28,7 +28,7 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 1.4.0
+ version: 1.5.1
  version:
  description: "== DESCRIPTION: websitary (formerly known as websitiary with an extra \"i\") monitors webpages, rss feeds, podcasts etc. It reuses other programs (w3m, diff etc.) to do most of the actual work. By default, it works on an ASCII basis, i.e. with the output of text-based webbrowsers like w3m (or lynx, links etc.) as the output can easily be post-processed. It can also work with HTML and highlight new items. This script was originally planned as a ruby-based websec replacement. By default, this script will use w3m to dump HTML pages and then run diff over the current page and the previous backup. Some pages are better viewed with lynx or links. Downloaded documents (HTML or ASCII) can be post-processed (e.g., filtered through some ruby block that extracts elements via hpricot and the like). Please see the configuration options below to find out how to change this globally or for a single source. This user manual is also available as PDF[http://websitiary.rubyforge.org/websitary.pdf]. == FEATURES/PROBLEMS: * Handle webpages, rss feeds (optionally save attachments in podcasts etc.) * Compare webpages with previous backups * Display differences between the current version and the backup * Provide hooks to post-process the downloaded documents and the diff * Display a one-page report summarizing all news * Automatically open the report in your favourite web-browser * Experimental: Download webpages on defined intervalls and generate incremental diffs."
  email: micathom at gmail com
@@ -75,7 +75,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  requirements: []
 
  rubyforge_project: websitiary
- rubygems_version: 1.0.1
+ rubygems_version: 1.1.1
  signing_key:
  specification_version: 2
  summary: A unified website news, rss feed, podcast monitor