harvester 0.8.0.pre.2 → 0.8.0.pre.3

Sign up to get free protection for your applications and to get access to all the features.
@@ -1,8 +1,8 @@
1
1
  = Harvester 0.8
2
2
  The Harvester is a web-based Feed-Aggregator
3
3
 
4
- A running instance can be seen at:
5
- http://astroblog.spaceboyz.net/harvester/
4
+ Running instances can be seen at:
5
+ http://blog-harvester.de or http://rubynetz.de
6
6
 
7
7
  The Harvester eats the feeds you want and produces a static html/feed page that aggregates all those.
8
8
 
@@ -14,7 +14,7 @@ Harvester 0.8 is alpha quality! There are still some unfixed bugs (e.g. database
14
14
 
15
15
  Install the harvester with
16
16
 
17
- gem install harvester
17
+ gem install harvester --pre
18
18
 
19
19
  You can now create a new harvester project with
20
20
 
@@ -31,16 +31,6 @@ Then you will need to configure <tt>config.yaml</tt>, <tt>collections.yaml</tt>
31
31
 
32
32
  The <tt>collections.yaml</tt> file contains the links to your desired feeds.
33
33
 
34
- == Todo
35
- * Still some things broken after update
36
- * Fix database issues
37
- * Improve/fix feed parsing
38
- * Tidy up templates
39
- * Security issues
40
- * Optimize performance
41
- * Fix jabberbot
42
- * Implement some kind of tag filters
43
-
44
34
  == Usage
45
35
 
46
36
  Then run
@@ -55,13 +45,13 @@ This will create the output files. That's it ;).
55
45
 
56
46
  Usually, you want to run this command automatically every x seconds/minutes. You can use a program like <tt>cron</tt> or simply run
57
47
 
58
- harvester clock
48
+ harvester clock <time>
59
49
 
60
50
  to start a simple scheduler.
61
51
 
62
52
  There some more harvester commands to explore. Run
63
53
 
64
- harvester --help
54
+ harvester
65
55
 
66
56
  to get a command list.
67
57
 
@@ -69,6 +59,21 @@ to get a command list.
69
59
 
70
60
  ...
71
61
 
62
+
63
+ == Todo
64
+ * Still some things broken after update
65
+ * Fix database issues
66
+ * Fix wrong tweet escaping (sometimes: ä-->&auml;)
67
+ * Improve/fix feed parsing (especially atom)
68
+ * Fix chart generation
69
+ * Tidy up templates / remove astro-specific links
70
+ * Extract mrss.rb into extra gem
71
+
72
+ * Security issues
73
+ * Optimize performance
74
+ * Fix jabber bot
75
+ * Implement some kind of tag filters
76
+
72
77
  == Credits
73
78
 
74
79
  * {Astro}[https://github.com/astro] (2005-2008)
@@ -0,0 +1,5 @@
1
+ #!/usr/bin/env ruby
2
+ # encoding: utf-8
3
+
4
+ require_relative '../lib/harvester/create'
5
+ Harvester.new_from_argv.create!
@@ -1,6 +1,6 @@
1
1
  #!/usr/bin/env ruby
2
2
  # encoding: utf-8
3
3
 
4
- # TODO: pass differnet username as ARGV
4
+ # TODO: pass different username as ARGV
5
5
  require_relative '../lib/harvester/jabber'
6
6
  Harvester.new_from_argv.jabber!
@@ -0,0 +1,5 @@
1
+ #!/usr/bin/env ruby
2
+ # encoding: utf-8
3
+
4
+ require_relative '../lib/harvester/maintenance'
5
+ Harvester.new_from_argv.maintenance!
@@ -2,13 +2,16 @@
2
2
  # encoding: utf-8
3
3
 
4
4
  # run a complete harvester update :)
5
+ require_relative '../lib/harvester'
5
6
  require_relative '../lib/harvester/fetch'
6
7
  require_relative '../lib/harvester/generate'
7
- require_relative '../lib/harvester/chart'
8
8
  require_relative '../lib/harvester/post'
9
9
 
10
10
  harve = Harvester.new_from_argv
11
+ ( require_relative '../lib/harvester/maintenance'
12
+ harve.maintenance! ) if harve.settings['maintenance']
11
13
  harve.fetch!
12
14
  harve.generate!
13
- harve.chart! if harve.settings['chart']
15
+ ( require_relative '../lib/harvester/stats'
16
+ harve.stats! ) if harve.settings['stats']
14
17
  harve.post!
@@ -0,0 +1,5 @@
1
+ #!/usr/bin/env ruby
2
+ # encoding: utf-8
3
+
4
+ require_relative '../lib/harvester/stats'
5
+ Harvester.new_from_argv.stats!
@@ -240,8 +240,6 @@
240
240
  </div>
241
241
  </xsl:for-each>
242
242
  </div>
243
-
244
- <p style="display:none"><a href="http://cthulhu.c3d2.de/~astro/badpit.html">E-Mail Address Index</a></p>
245
243
  </body>
246
244
  </html>
247
245
  </xsl:template>
@@ -4,7 +4,7 @@ require 'yaml'
4
4
  require 'logger/colors'
5
5
 
6
6
  class Harvester
7
- VERSION = '0.8.0.pre.2'
7
+ VERSION = '0.8.0.pre.3'
8
8
 
9
9
  attr_reader :config, :settings, :collections, :dbi, :logger
10
10
 
@@ -62,7 +62,7 @@ class Harvester
62
62
  database: config['db']['database'],
63
63
  user: config['db']['user'],
64
64
  password: config['db']['password'],
65
- host: "localhost",
65
+ host: "localhost", # FIXME?
66
66
  rescue Exception
67
67
  error 'Something is wrong with your database settings:'
68
68
  raise
@@ -81,7 +81,7 @@ COMMANDS:
81
81
  run run a complete harvester update
82
82
  fetch run only the fetch script
83
83
  generate run only the generate script
84
- chart run only the generate chart script
84
+ stats run only the generate chart script
85
85
  post run only the post processing script
86
86
  db start a database task (create or maintenance)
87
87
  clock start the scheduler (cron replacement)
@@ -104,11 +104,11 @@ OPTIONS:} # automatically added as --help
104
104
  op.on('-p', '--post_script FILE') do |post_script|
105
105
  options['post_script'] = post_script
106
106
  end
107
- op.on('-m', '--no-maintenance') do
108
- options['no-maintenance'] = true
107
+ op.on('-m', '--maintenance') do
108
+ options['maintenance'] = true
109
109
  end
110
- op.on('-c', '--chart') do
111
- options['chart'] = true
110
+ op.on('-s', '--stats') do
111
+ options['stats'] = true
112
112
  end
113
113
  end.parse!
114
114
 
@@ -124,10 +124,18 @@ OPTIONS:} # automatically added as --help
124
124
  def error(msg) @logger.error(msg) end
125
125
  def fatal(msg) @logger.fatal(msg) end
126
126
 
127
- # adds an info message before and after the block
128
- def task(msg) # MAYBE: nested spaces+behaviour
127
+ def task(msg) # adds an info message before and after the block MAYBE: nested spaces+behaviour
129
128
  info "[start] " + msg
130
129
  yield
131
130
  info "[done ] " + msg
132
131
  end
132
+
133
+ # database helpers
134
+ def sql_queries(task)
135
+ Dir[ File.dirname(__FILE__) + "/../data/sql/#{ @config['db']['driver'].downcase }/#{ task }*.sql" ].each
136
+ end
137
+
138
+ def sql_query(task)
139
+ File.dirname(__FILE__) + "/../data/sql/#{ @config['db']['driver'].downcase }/#{ task }.sql"
140
+ end
133
141
  end
@@ -0,0 +1,19 @@
1
+ # encoding: utf-8
2
+
3
+ require_relative '../harvester'
4
+
5
+ class Harvester
6
+ CREATE = true
7
+ # creates required database structure
8
+ def create!
9
+ info "CREATE"
10
+ task "create database tables" do
11
+ begin @dbi.transaction do
12
+ sql_queries(:create).each{ |sql|
13
+ info "* execute " + File.basename(sql)
14
+ @dbi.execute File.read(sql)
15
+ }
16
+ end; end
17
+ end
18
+ end
19
+ end
@@ -1,7 +1,6 @@
1
1
  # encoding: utf-8
2
2
 
3
3
  require_relative '../harvester'
4
- require_relative 'db'
5
4
  require_relative 'mrss'
6
5
 
7
6
  require 'eventmachine'
@@ -9,13 +8,74 @@ require 'em-http'
9
8
  require 'uri'
10
9
 
11
10
  class Harvester
12
- module FETCH; end
13
-
11
+ FETCH = true
14
12
  # fetches new feed updates and store them in the database
15
13
  def fetch!
16
14
  info "FETCH"
17
- maintenance! unless @settings['no-maintenance']
18
- Fetcher.run @dbi, @collections, @settings, @logger do |*args| update(*args) end # results will be passed to the update function
15
+ Fetcher.run @dbi, @collections, @settings, @logger do |*args| update_db(*args) end # results will be passed to the update function
16
+ end
17
+
18
+ private
19
+
20
+ # saves result of a request in db
21
+ def update_db(rss_url, new_source, collection, response, rss_url_nice = rss_url)
22
+ rss = MRSS.parse(response)
23
+
24
+ begin @dbi.transaction do
25
+ # update source
26
+ if new_source
27
+ @dbi.execute "INSERT INTO sources (collection, rss, last, title, link, description) VALUES (?, ?, ?, ?, ?, ?)",
28
+ collection, rss_url, response['Last-Modified'], rss.title, rss.link, rss.description
29
+ info rss_url_nice + "Added as source"
30
+ else
31
+ @dbi.execute "UPDATE sources SET last=?, title=?, link=?, description=? WHERE collection=? AND rss=?",
32
+ response['Last-Modified'], rss.title, rss.link, rss.description, collection, rss_url
33
+ debug rss_url_nice + "Source updated"
34
+ end
35
+
36
+ # update items
37
+ items_new, items_updated = 0, 0
38
+ rss.items.each { |item|
39
+ description = item.description
40
+
41
+ # Link mangling
42
+ begin
43
+ link = URI::join((rss.link.to_s == '') ? URI.parse(rss_url).to_s : rss.link.to_s, item.link || rss.link).to_s
44
+ rescue URI::Error
45
+ link = item.link
46
+ end
47
+
48
+ # Push into database
49
+ db_title, = *@dbi.execute("SELECT title FROM items WHERE rss=? AND link=?", rss_url, link).fetch
50
+
51
+ if db_title.nil? || db_title.empty? # item is new
52
+ begin
53
+ @dbi.execute "INSERT INTO items (rss, title, link, date, description) VALUES (?, ?, ?, ?, ?)",
54
+ rss_url, item.title, link, item.date.to_s, description
55
+ items_new += 1
56
+ #rescue DBI::ProgrammingError
57
+ # puts description
58
+ # puts "#{$!.class}: #{$!}\n#{$!.backtrace.join("\n")}"
59
+ end
60
+ else
61
+ @dbi.execute "UPDATE items SET title=?, description=?, date=? WHERE rss=? AND link=?",
62
+ item.title, description, item.date.to_s, rss_url, link
63
+ items_updated += 1
64
+ end
65
+
66
+ # Remove all enclosures
67
+ @dbi.execute "DELETE FROM enclosures WHERE rss=? AND link=?", rss_url, link
68
+
69
+ # Re-add all enclosures
70
+ item.enclosures.each do |enclosure|
71
+ href = URI::join((rss.link.to_s == '') ? link.to_s : rss.link.to_s, enclosure['href']).to_s
72
+ @dbi.execute "INSERT INTO enclosures (rss, link, href, mime, title, length) VALUES (?, ?, ?, ?, ?, ?)",
73
+ rss_url, link, href, enclosure['type'], enclosure['title'],
74
+ !enclosure['length'] || enclosure['length'].empty? ? 0 : enclosure['length']
75
+ end
76
+ }
77
+ info rss_url_nice + "#{ items_new } new items, #{ items_updated } updated"
78
+ end; end
19
79
  end
20
80
  end
21
81
 
@@ -47,7 +107,7 @@ module Harvester::Fetcher
47
107
  db_rss, last = dbi.execute("SELECT rss, last FROM sources WHERE collection=? AND rss=?",
48
108
  collection, rss_url).fetch
49
109
  new_source = db_rss.nil? || db_rss.empty?
50
- uri = URI.parse rss_url
110
+ uri = URI.parse(rss_url)
51
111
 
52
112
  # prepare request
53
113
  header = {}
@@ -77,7 +137,7 @@ module Harvester::Fetcher
77
137
  elsif http.response.size > settings['size limit'].to_i
78
138
  logger.warn rss_url_nice + "Got too big repsonse: #{ response.size } bytes"
79
139
  else
80
- yield rss_url, new_source, collection, http.response, rss_url_nice
140
+ yield rss_url, new_source, collection, http.response, rss_url_nice # TODO clean up
81
141
  end
82
142
 
83
143
  pending.delete rss_url # same url twice?
@@ -85,6 +145,7 @@ module Harvester::Fetcher
85
145
  end
86
146
  }
87
147
  }
148
+ EM.stop if pending.empty? # e.g. no collections configured
88
149
 
89
150
  EM.add_timer(settings['timeout'].to_i){
90
151
  pending.each { |rss_url| logger.warn rss_url_nice + 'Timed out' }
@@ -14,13 +14,12 @@ rescue LoadError
14
14
  end
15
15
 
16
16
  class Harvester
17
- module GENERATE; end
18
-
17
+ GENERATE = true
19
18
  # generates the static html/feed files
20
19
  def generate!
21
20
  info "GENERATE"
22
21
 
23
- f = Generator.new @dbi, @logger
22
+ f = Generator.new @dbi, @settings, @logger
24
23
  xslt = XML::XSLT.new
25
24
  xslt.xml = f.generate_root.to_s
26
25
 
@@ -54,9 +53,10 @@ end
54
53
  class Harvester::Generator
55
54
  FUNC_NAMESPACE = 'http://astroblog.spaceboyz.net/harvester/xslt-functions'
56
55
 
57
- def initialize(dbi, logger)
58
- @dbi = dbi
59
- @logger = logger
56
+ def initialize(dbi, settings, logger) # TODO default values for arg2 and arg3
57
+ @dbi = dbi
58
+ @settings = settings
59
+ @logger = logger
60
60
  %w(collection-items feed-items item-description item-images item-enclosures).each { |func|
61
61
  XML::XSLT.extFunction(func, FUNC_NAMESPACE, self)
62
62
  }
@@ -80,7 +80,7 @@ class Harvester::Generator
80
80
  EntityTranslator.run(root, true, @logger)
81
81
  end
82
82
 
83
- def collection_items(collection, max=23)
83
+ def collection_items(collection, max = 99)
84
84
  items = REXML::Element.new('items')
85
85
  @dbi.execute("SELECT items.title,items.date,items.link,items.rss FROM items,sources WHERE items.rss=sources.rss AND sources.collection LIKE ? ORDER BY items.date DESC LIMIT ?", collection, max.to_i).each{ |title,date,link,rss|
86
86
  if title # TODO: debug (sqlite)
@@ -95,7 +95,7 @@ class Harvester::Generator
95
95
  EntityTranslator.run(items, true, @logger)
96
96
  end
97
97
 
98
- def feed_items(rss, max=23)
98
+ def feed_items(rss, max = 23)
99
99
  items = REXML::Element.new('items')
100
100
  @dbi.execute("SELECT title,date,link FROM items WHERE rss=? ORDER BY date DESC LIMIT ?", rss, max.to_i).each{ |title,date,link| #p rss,title,date,link
101
101
  # p title
@@ -1,5 +1,7 @@
1
1
  # encoding: ascii
2
2
 
3
+ # FIXME: fix & integrate, maybe extra-gem?
4
+
3
5
  require_relative '../harvester'
4
6
 
5
7
  #require 'fastthread'
@@ -0,0 +1,42 @@
1
+ # encoding: utf-8
2
+
3
+ require_relative '../harvester'
4
+
5
+ class Harvester
6
+ MAINTENANCE = true
7
+ # check for feed source changes
8
+ def maintenance!
9
+ info "MAINTENANCE"
10
+ task "look for sources to purge" do
11
+ purge = []
12
+ @dbi.execute("SELECT collection, rss FROM sources").each{ |dbc,dbr|
13
+ purge << [dbc, dbr] unless (@collections[dbc] || []).include? dbr
14
+ }
15
+
16
+ purge_rss = []
17
+ purge.each { |c,r|
18
+ info "* remove #{c}:#{r}..."
19
+ @dbi.execute "DELETE FROM sources WHERE collection=? AND rss=?", c, r
20
+ purge_rss << r
21
+ }
22
+
23
+ purge_rss.delete_if { |r|
24
+ purge_this = true
25
+
26
+ @collections.each { |cfc,cfr|
27
+ if purge_this
28
+ warn "* must keep #{r} because it's still in #{cfc}" if cfr && cfr.include?(r)
29
+ purge_this = !(cfr && cfr.include?(r))
30
+ end
31
+ }
32
+
33
+ !purge_this
34
+ }
35
+ purge_rss.each { |r|
36
+ info "* purge items from feed #{r}"
37
+ @dbi.execute "DELETE FROM items WHERE rss=?", r
38
+ }
39
+ end
40
+ end
41
+ alias purge! maintenance!
42
+ end
@@ -1,6 +1,7 @@
1
1
  # encoding: utf-8
2
2
  # Magic RSS
3
3
  # Helps getting around with RSS while ignoring standards and tolerating much
4
+ # TODO: refactor out of harvester
4
5
 
5
6
  require 'rexml/document'
6
7
  require 'cgi'
@@ -187,6 +188,7 @@ class MRSS
187
188
  @e.s_text "link"
188
189
  end
189
190
  def description
191
+ @e.rss_content("itunes:summary") ||
190
192
  @e.rss_content("content:encoded") ||
191
193
  @e.rss_content("encoded") ||
192
194
  @e.rss_content("description")
@@ -291,9 +293,7 @@ class MRSS
291
293
  k = case xml.name
292
294
  when 'rss'
293
295
  RSS
294
- when 'rdf'
295
- RDF
296
- when 'RDF'
296
+ when *%w[rdf RDF]
297
297
  RDF
298
298
  when 'feed'
299
299
  ATOM
@@ -303,6 +303,7 @@ class MRSS
303
303
  k.new(xml)
304
304
  end
305
305
 
306
+ # TODO also refactor out of mrss?
306
307
  module Util
307
308
  def self.detect_time(s)
308
309
  tz_offset = 0
@@ -3,11 +3,12 @@
3
3
  require_relative '../harvester'
4
4
 
5
5
  class Harvester
6
- module POST; end
7
6
 
7
+ POST = true
8
8
  # runs the configured post processing scripts
9
9
  def post!(path = nil)
10
- task "post process" do
10
+ info 'POST'
11
+ task 'post process' do
11
12
  if post_script = path || @config['settings']['post_script']
12
13
  error "Cannot find an executable script at #{ post_script }" unless test('x', post_script)
13
14
  exec post_script, @config['settings']['output']
@@ -1,15 +1,13 @@
1
1
  # encoding: utf-8
2
2
 
3
3
  require_relative '../harvester'
4
- require_relative '../harvester/db'
5
4
  require 'gruff'
6
5
 
7
6
  class Harvester
8
- module CHART; end
9
-
7
+ STATS = true
10
8
  # generates a fetch statistic image
11
- def chart!
12
- info "CHART"
9
+ def stats!
10
+ info "STATS"
13
11
  task "generate chart" do
14
12
  c = Chart::StatsPerCollection.new
15
13
  @dbi.execute( File.read( sql_query(:chart) ) ).each{ |date,collection|
metadata CHANGED
@@ -2,7 +2,7 @@
2
2
  name: harvester
3
3
  version: !ruby/object:Gem::Version
4
4
  prerelease: 6
5
- version: 0.8.0.pre.2
5
+ version: 0.8.0.pre.3
6
6
  platform: ruby
7
7
  authors:
8
8
  - astro
@@ -14,7 +14,7 @@ autorequire:
14
14
  bindir: bin
15
15
  cert_chain: []
16
16
 
17
- date: 2011-06-06 00:00:00 Z
17
+ date: 2011-06-10 00:00:00 Z
18
18
  dependencies:
19
19
  - !ruby/object:Gem::Dependency
20
20
  name: rdbi
@@ -131,39 +131,42 @@ email: mail@janlelis.de
131
131
  executables:
132
132
  - harvester-new
133
133
  - harvester
134
+ - harvester-create
134
135
  - harvester-clock
135
- - harvester-db
136
- - harvester-chart
136
+ - harvester-stats
137
137
  - harvester-run
138
138
  - harvester-generate
139
139
  - harvester-fetch
140
140
  - harvester-jabber
141
141
  - harvester-post
142
+ - harvester-maintenance
142
143
  extensions: []
143
144
 
144
145
  extra_rdoc_files: []
145
146
 
146
147
  files:
148
+ - lib/harvester/maintenance.rb
147
149
  - lib/harvester/generate.rb
148
150
  - lib/harvester/generator/entity_translator.rb
149
151
  - lib/harvester/generator/link_absolutizer.rb
150
- - lib/harvester/db.rb
151
- - lib/harvester/chart.rb
152
+ - lib/harvester/create.rb
152
153
  - lib/harvester/post.rb
154
+ - lib/harvester/stats.rb
153
155
  - lib/harvester/jabber.rb
154
156
  - lib/harvester/fetch.rb
155
157
  - lib/harvester/mrss.rb
156
158
  - lib/harvester.rb
157
159
  - bin/harvester-new
158
160
  - bin/harvester
161
+ - bin/harvester-create
159
162
  - bin/harvester-clock
160
- - bin/harvester-db
161
- - bin/harvester-chart
163
+ - bin/harvester-stats
162
164
  - bin/harvester-run
163
165
  - bin/harvester-generate
164
166
  - bin/harvester-fetch
165
167
  - bin/harvester-jabber
166
168
  - bin/harvester-post
169
+ - bin/harvester-maintenance
167
170
  - README.rdoc
168
171
  - CHANGELOG.rdoc
169
172
  - data/ent/HTMLsymbol.ent
@@ -1,5 +0,0 @@
1
- #!/usr/bin/env ruby
2
- # encoding: utf-8
3
-
4
- require_relative '../lib/harvester/chart'
5
- Harvester.new_from_argv.chart!
@@ -1,15 +0,0 @@
1
- #!/usr/bin/env ruby
2
- # encoding: utf-8
3
-
4
- require_relative '../lib/harvester/db'
5
- harve = Harvester.new_from_argv
6
-
7
- case action = ARGV.shift
8
- when 'create'
9
- harve.create!
10
- when 'maintenance'
11
- harve.maintenance!
12
- else
13
- puts '[DB] Do nothing...'
14
- end
15
-
@@ -1,123 +0,0 @@
1
- # encoding: utf-8
2
-
3
- require_relative '../harvester'
4
-
5
- class Harvester
6
- module DB; end
7
-
8
- # creates required database structure
9
- def create!
10
- task "create database tables" do
11
- begin @dbi.transaction do
12
- sql_queries(:create).each{ |sql|
13
- info "* execute " + File.basename(sql)
14
- @dbi.execute File.read(sql)
15
- }
16
- end; end
17
- end
18
- end
19
-
20
- # check for feed source changes
21
- def maintenance!
22
- task "look for sources to purge" do
23
- purge = []
24
- @dbi.execute("SELECT collection, rss FROM sources").each{ |dbc,dbr|
25
- purge << [dbc, dbr] unless (@collections[dbc] || []).include? dbr
26
- }
27
-
28
- purge_rss = []
29
- purge.each { |c,r|
30
- info "* remove #{c}:#{r}..."
31
- @dbi.execute "DELETE FROM sources WHERE collection=? AND rss=?", c, r
32
- purge_rss << r
33
- }
34
-
35
- purge_rss.delete_if { |r|
36
- purge_this = true
37
-
38
- @collections.each { |cfc,cfr|
39
- if purge_this
40
- warn "* must keep #{r} because it's still in #{cfc}" if cfr && cfr.include?(r)
41
- purge_this = !(cfr && cfr.include?(r))
42
- end
43
- }
44
-
45
- !purge_this
46
- }
47
- purge_rss.each { |r|
48
- info "* purge items from feed #{r}"
49
- @dbi.execute "DELETE FROM items WHERE rss=?", r
50
- }
51
- end
52
- end
53
-
54
- private
55
-
56
- def update(rss_url, new_source, collection, response, rss_url_nice = rss_url)
57
- rss = MRSS.parse(response)
58
-
59
- begin @dbi.transaction do
60
- # update source
61
- if new_source
62
- @dbi.execute "INSERT INTO sources (collection, rss, last, title, link, description) VALUES (?, ?, ?, ?, ?, ?)",
63
- collection, rss_url, response['Last-Modified'], rss.title, rss.link, rss.description
64
- info rss_url_nice + "Added as source"
65
- else
66
- @dbi.execute "UPDATE sources SET last=?, title=?, link=?, description=? WHERE collection=? AND rss=?",
67
- response['Last-Modified'], rss.title, rss.link, rss.description, collection, rss_url
68
- debug rss_url_nice + "Source updated"
69
- end
70
-
71
- # update items
72
- items_new, items_updated = 0, 0
73
- rss.items.each { |item|
74
- description = item.description
75
-
76
- # Link mangling
77
- begin
78
- link = URI::join((rss.link.to_s == '') ? uri.to_s : rss.link.to_s, item.link || rss.link).to_s
79
- rescue URI::Error
80
- link = item.link
81
- end
82
-
83
- # Push into database
84
- db_title, = *@dbi.execute("SELECT title FROM items WHERE rss=? AND link=?", rss_url, link).fetch
85
-
86
- if db_title.nil? || db_title.empty? # item is new
87
- begin
88
- @dbi.execute "INSERT INTO items (rss, title, link, date, description) VALUES (?, ?, ?, ?, ?)",
89
- rss_url, item.title, link, item.date.to_s, description
90
- items_new += 1
91
- #rescue DBI::ProgrammingError
92
- # puts description
93
- # puts "#{$!.class}: #{$!}\n#{$!.backtrace.join("\n")}"
94
- end
95
- else
96
- @dbi.execute "UPDATE items SET title=?, description=?, date=? WHERE rss=? AND link=?",
97
- item.title, description, item.date.to_s, rss_url, link
98
- items_updated += 1
99
- end
100
-
101
- # Remove all enclosures
102
- @dbi.execute "DELETE FROM enclosures WHERE rss=? AND link=?", rss_url, link
103
-
104
- # Re-add all enclosures
105
- item.enclosures.each do |enclosure|
106
- href = URI::join((rss.link.to_s == '') ? link.to_s : rss.link.to_s, enclosure['href']).to_s
107
- @dbi.execute "INSERT INTO enclosures (rss, link, href, mime, title, length) VALUES (?, ?, ?, ?, ?, ?)",
108
- rss_url, link, href, enclosure['type'], enclosure['title'],
109
- !enclosure['length'] || enclosure['length'].empty? ? 0 : enclosure['length']
110
- end
111
- }
112
- info rss_url_nice + "#{ items_new } new items, #{ items_updated } updated"
113
- end; end
114
- end
115
-
116
- def sql_queries(task)
117
- Dir[ File.dirname(__FILE__) + "/../../data/sql/#{ @config['db']['driver'].downcase }/#{ task }*.sql" ].each
118
- end
119
-
120
- def sql_query(task)
121
- File.dirname(__FILE__) + "/../../data/sql/#{ @config['db']['driver'].downcase }/#{ task }.sql"
122
- end
123
- end