libcraigscrape 1.0 → 1.1.0

data/bin/craigwatch CHANGED
@@ -1,9 +1,10 @@
- #!/usr/bin/ruby
+ #!/usr/bin/env ruby
+ # encoding: UTF-8
  #
  # =craigwatch - A email-based "post monitoring" solution
  #
- # Created alongside the libcraigscrape library, libcraigwatch was designed to take the monotony out of regular
- # craiglist monitoring. craigwatch is designed to be run at periodic intervals (hourly/daily/etc) through crontab
+ # Created alongside the libcraigscrape library, libcraigwatch was designed to take the monotony out of regular
+ # craiglist monitoring. craigwatch is designed to be run at periodic intervals (hourly/daily/etc) through crontab
  # and report all new postings within a listing or search url, since its last run, by email.
  #
  # For more information, head to the {craiglist monitoring}[http://www.derosetechnologies.com/community/libcraigscrape] help section of our website.
@@ -25,29 +26,19 @@
  # - location_has_no - (array of string or regexp) Only include posts which don't match against the post location
  #
  # Multiple searches can be combined into a single report, and results can be sorted by newest-first or oldest-first (default)
- #
+ #
  # Reporting output is easily customized html, handled by ActionMailer, and emails can be delivered via smtp or sendmail.
- # Database tracking of already-delivered posts is handled by ActiveRecord, and its driver-agnostic SQL supports all the
+ # Database tracking of already-delivered posts is handled by ActiveRecord, and its driver-agnostic SQL supports all the
  # major backends (sqllite/mysql/postgres/probably-all-others). Database sizes are contained by automatically pruning old results
  # that are no longer required at the end of each run.
  #
  # Pretty useful, no?
- #
+ #
  # == Installation
- # craigwatch is coupled with libcraigscrape, and is installed via ruby gems. However, since we focused on keeping the
- # libcraigscrape download 'lightweight' some additional gems need to be installed in addition to the initial libcraigscrape
- # gem itself.
- #
- # This should take care of the craigwatch install on all systems:
- #   sudo gem install libcraigscrape kwalify activerecord actionmailer
- # Alternatively, if you've already installed libcraigscrape and want to start working with craigwatch:
- #   sudo gem install kwalify activerecord actionmailer
- #
- # This script was initially developed with activerecord 2.3, actionmailer 2.3 and kwalify 0.7, but will likely work with most
- # prior and future versions of these libraries.
- #
+ # craigwatch is coupled with libcraigscrape, and is installed via ruby gems.
+ #
  # == Usage
- # When craigwatch is invoked, it is designed to run a single report and then terminate. There is only one parameter to craigwatch, and
+ # When craigwatch is invoked, it is designed to run a single report and then terminate. There is only one parameter to craigwatch, and
  # this parameter is the path to a valid report-definition yml file. ie:
  #   craigwatch johns_daily_watch.yml
  #
@@ -55,6 +46,9 @@
  # Probably, the best way to understand the report definition files, is to look at the annotated sample file below, and use it as a
  # starting point for your own.
  #
+ # New in version 1.1.0 is ERB evaluation of the report-definiton file. This feature is automatic, just include the erb blocks you'd
+ # like, and the file will be evaluated at runtime.
+ #
  # By default there is no program output, however, setting any of the following paramters to 'yes' in your definition file will turn on
  # useful debugging/logging output:
  # - debug_database
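Note: the ERB pass described in the new comment above runs before Kwalify validation, so any Ruby embedded in the yml is expanded first (the actual parser change appears further down in this diff). A minimal sketch of the idea, with an illustrative report path and ERB block:

    require 'erb'
    require 'date'

    yml_path = 'johns_daily_watch.yml'   # illustrative path, from the usage example above
    # A report line such as:  starting: <%= (Date.today - 1).strftime('%m/%d/%Y') %>
    # is expanded here, before the yml is validated and parsed:
    expanded = ERB.new(File.read(yml_path)).result
    # 'expanded' is then handed to the Kwalify parser, as shown further down in this diff:
    #   craig_report = parser.parse(expanded, filename: yml_path)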
@@ -63,10 +57,10 @@
  #
  # == Definition File Sample
  #
- # Let's start with a minimal report, just enough needed to get something quick working:
+ # Let's start with a minimal report, just enough needed to get something quick working:
  #   # We need some kind of destination to send this to
  #   email_to: Chris DeRose <cderose@derosetechnologies.com>
- #
+ #
  #   # This is an array of specific 'searches' we'll be performing in this report:
  #   searches:
  #   # We're looking for 90's era cadillac, something cheap, confortable and in white...
@@ -85,7 +79,7 @@
  #     summary_post_has_no: [ /xlr/i ]
  #
  #     # We were convertable, and white/cream/etc:
- #     full_post_has:
+ #     full_post_has:
  #     - /convertible/i
  #     - /(white|yellow|banana|creme|cream)/i
  #
@@ -93,7 +87,7 @@
  #     full_post_has_no:
  #     - /simulated[^a-z]{0,2}convertible/i
  #
- #     # We want to search all of craigslist's in the us, and we'll want to find it using
+ #     # We want to search all of craigslist's in the us, and we'll want to find it using
  #     # the '/search/cta?hasPic=1&query=cadillac' url on the site
  #     sites: [ us ]
  #     listings:
@@ -104,6 +98,9 @@
  #   # The report_name is fed into Time.now.strftime, hence the formatting characters
  #   report_name: Craig Watch For Johnathan on %D at %I:%M %p
  #
+ #   # Overrides the default system time zone with an EST zone
+ #   tz: EST
+ #
  #   email_to: Johnathan Peabody <john@example.local>
  #
  #   # This is sent straight into ActiveRecord, so there's plenty of options available here. the following is an easy
@@ -129,21 +126,21 @@
  #
  #     # Oh, and we're on a budget:
  #     price_less_than: 120
- #
+ #
  #   # Search #2
  #   - name: Large apartment rentals in San Francisco
  #     sites: [ us/ca/sfbay ]
  #     starting: 9/10/2009
- #
- #     # We're going to rely on craigslist's built-in search for this one since there's a lot of listings, and we
+ #
+ #     # We're going to rely on craigslist's built-in search for this one since there's a lot of listings, and we
  #     # want to conserve some bandwidth
  #     listings: [ /search/apa?query=pool&minAsk=min&maxAsk=max&bedrooms=5 ]
  #
  #     # We'll require a price to be listed, 'cause it keeps out some of the unwanted fluff
  #     price_required: yes
- #
+ #
  #     # Hopefully this will keep us away from a bad part of town:
- #     price_greater_than: 1000
+ #     price_greater_than: 1000
  #
  #     # Since we dont have time to driv to each location, we'll require only listings with pictures
  #     has_image: yes
@@ -160,9 +157,9 @@ $: << File.dirname(__FILE__) + '/../lib'

  require 'rubygems'

- gem 'kwalify', '~> 0.7'
- gem 'activerecord', '~> 2.3'
- gem 'actionmailer', '~> 2.3'
+ gem 'kwalify'
+ gem 'activerecord'
+ gem 'actionmailer'

  require 'kwalify'
  require 'active_record'
@@ -170,19 +167,20 @@ require 'action_mailer'
  require 'kwalify/util/hashlike'
  require 'libcraigscrape'
  require "socket"
+ require 'active_support/all'

  class String #:nodoc:
    RE = /^\/(.*)\/([ixm]*)$/
-
+
    def is_re?
      (RE.match self) ? true : false
    end
-
+
    def to_re
      source, options = ( RE.match(self) )? [$1, $2] : [self,nil]
      mods = 0

-     options.each_char do |c|
+     options.each_char do |c|
        mods |= case c
          when 'i' then Regexp::IGNORECASE
          when 'x' then Regexp::EXTENDED
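Note: this String patch is what lets the definition file express filters either as plain strings or as "/.../" regexp literals (as in the annotated sample above). A rough usage sketch, with return values inferred from the code in this hunk:

    'cadillac'.is_re?                              #=> false; plain strings get matched case-insensitively
    '/simulated[^a-z]{0,2}convertible/i'.is_re?    #=> true
    '/convertible/i'.to_re                         #=> /convertible/i (trailing flags map onto Regexp::IGNORECASE etc.)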
@@ -199,12 +197,19 @@ class CraigReportDefinition #:nodoc:

    EMAIL_NAME_PARTS = /^[ ]*(.+)[ ]*\<.+\>[ ]*/

-   attr_reader :report_name, :email_to, :email_from, :tracking_database, :searches, :smtp_settings
+   attr_reader :report_name, :email_to, :email_from, :tracking_database, :searches,
+     :smtp_settings, :tz

    def debug_database?; @debug_database; end
    def debug_mailer?; @debug_mailer; end
    def debug_craigscrape?; @debug_craigscrape; end

+   # Returns the configuration report zone, if defined. Otherwise pulls the zone
+   # from the system's default local zone
+   def tz
+     @tz || Time.new.zone
+   end
+
    def email_from
      (@email_from) ? @email_from : ('%s@%s' % [ENV['USER'], Socket.gethostname])
    end
@@ -224,59 +229,66 @@ class CraigReportDefinition #:nodoc:
        :adapter => 'sqlite3',
        :database => File.basename(for_yaml_file, File.extname(for_yaml_file))+'.db'
      } if for_yaml_file
-
+
      # This is a little hack to make sqlite definitions a little more portable, by allowing them
      # to be specify dbfile's relative to the yml's directory:
      ret = @tracking_database
      ret['dbfile'] = '%s/%s' % [File.dirname(for_yaml_file), $1] if (
        for_yaml_file and ret.has_key? 'dbfile' and /^([^\/].*)$/.match ret['dbfile']
      )
-
+
      ret
    end

    class SearchDefinition #:nodoc:
-     include Kwalify::Util::HashLike
-
+     include Kwalify::Util::HashLike
+
      attr_reader :name, :sites, :listings
      attr_reader :location_has, :location_has_no
      attr_reader :full_post_has, :full_post_has_no
      attr_reader :summary_post_has, :summary_post_has_no
      attr_reader :summary_or_full_post_has, :summary_or_full_post_has_no
-
-     attr_reader :price_greater_than,:price_less_than

      def has_image?; @has_image; end
      def newest_first?; @newest_first; end
      def price_required?; @price_required; end
-
+
+     def price_greater_than
+       Money.new(@price_greater_than*100, 'USD') if @price_greater_than
+     end
+
+     def price_less_than
+       Money.new(@price_less_than*100, 'USD') if @price_less_than
+     end
+
      def starting_at
-       (@starting) ?
-         Time.parse(@starting) :
-         Time.now.yesterday.beginning_of_day
+       (@starting) ?
+         Date.strptime(@starting, ['%m','%d',
+           /\/(?:[\d]{4})$/.match(@starting) ? '%Y' : '%y'].join('/') ) :
+         Date.yesterday
      end
-
-     def passes_filter?(post)
+
+     def passes_filter?(post)
        if post.price.nil?
          return false if price_required?
        else
-         return false if @price_greater_than and post.price <= @price_greater_than
-         return false if @price_less_than and post.price >= @price_less_than
+         return false if price_greater_than and post.price <= price_greater_than
+         return false if price_less_than and post.price >= price_less_than
        end
-
+
        # Label Filters:
        return false unless matches_all? summary_post_has, post.label
        return false unless doesnt_match_any? summary_post_has_no, post.label
-
+
        # Location Filters:
        return false unless matches_all? location_has, post.location
        return false unless doesnt_match_any? location_has_no, post.location
-
+
        # Full post Filters:
        if full_post_has or full_post_has_no or summary_or_full_post_has or summary_or_full_post_has_no
          # We're going to download the page, so let's make sure we didnt hit a "This posting has been flagged for removal"
          return false if post.system_post?
-
+
          return false unless matches_all? full_post_has, post.contents_as_plain
          return false unless doesnt_match_any? full_post_has_no, post.contents_as_plain
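Note: two behavioral changes in the hunk above are easy to miss. Prices from the definition file are now wrapped in Money objects (cents-based, USD assumed by this code), and "starting" dates are parsed with an explicit month/day/year format instead of Time.parse, choosing %Y or %y by the width of the year. A rough sketch of what that implies:

    # price_greater_than: 1000 in the yml becomes:
    Money.new(1000 * 100, 'USD')              # compared against post.price in passes_filter?

    # starting: 9/10/2009 (or 9/10/09) is parsed as:
    Date.strptime('9/10/2009', '%m/%d/%Y')    # => 2009-09-10
    Date.strptime('9/10/09',   '%m/%d/%y')    # => 2009-09-10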
 
@@ -286,21 +298,27 @@ class CraigReportDefinition #:nodoc:

        true
      end
-
+
      private
-
+
      def matches_all?(conditions, against)
-       against = against.to_a
-       (conditions.nil? or conditions.all?{|c| against.any?{|a| match_against c, a } }) ? true : false
+       (conditions.nil? or conditions.all?{|c| sanitized_against(against).any?{|a| match_against c, a } }) ? true : false
      end
-
+
      def doesnt_match_any?(conditions, against)
-       against = against.to_a
-       (conditions.nil? or conditions.all?{|c| against.any?{|a| !match_against c, a } }) ? true : false
+       (conditions.nil? or conditions.all?{|c| sanitized_against(against).any?{|a| !match_against c, a } }) ? true : false
      end
-
+
      def match_against(condition, against)
-       (against.scan( condition.is_re? ? condition.to_re : /#{condition}/i).length > 0) ? true : false
+       (CraigScrape::Scraper.he_decode(against).scan( condition.is_re? ? condition.to_re : /#{condition}/i).length > 0) ? true : false
+     end
+
+     # This is kind of a hack to deal with ruby 1.9. Really the filtering mechanism
+     # needs to be factored out and tested....
+     def sanitized_against(against)
+       against = against.lines if against.respond_to? :lines
+       against = against.to_a if against.respond_to? :to_a
+       (against.nil?) ? [] : against.compact
      end
    end
  end
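Note: the filter helpers above boil down to: HTML-entity-decode the text, then scan it line by line with either the user's regexp or a case-insensitive literal. sanitized_against exists because String#to_a went away in ruby 1.9, so strings are split with String#lines instead. A condensed sketch of the matching path, reusing the methods from this diff:

    condition = '/(white|yellow|banana|creme|cream)/i'
    against   = "1992 Cadillac, cream exterior\nLow miles"

    pieces = against.respond_to?(:lines) ? against.lines.to_a : Array(against)  # ~ sanitized_against
    re     = condition.is_re? ? condition.to_re : /#{condition}/i               # ~ match_against
    pieces.any? { |line| line.scan(re).length > 0 }                             # => true
    # (the real match_against also runs CraigScrape::Scraper.he_decode over the text first)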
@@ -309,11 +327,11 @@ class TrackedSearch < ActiveRecord::Base #:nodoc:
    has_many :listings, :dependent => :destroy, :class_name => 'TrackedListing'
    validates_uniqueness_of :search_name
    validates_presence_of :search_name
-
+
    def self.find_by_name(name)
      self.find :first, :conditions => ['search_name = ?',name]
    end
-
+
    def find_listing_by_url(url)
      listings.find :first, :conditions => ['url = ?', url]
    end
@@ -330,9 +348,8 @@ class TrackedListing < ActiveRecord::Base #:nodoc:
    def last_tracked_at
      self.posts.maximum 'created_at'
    end
-
+
    def delete_posts_older_than(cutoff_date)
-     # TODO: can't I use posts.delete 'created_at < ?' and keep it cleaner?
      TrackedPost.delete_all [ 'tracked_listing_id = ? AND created_at < ?', self.id, cutoff_date ]
    end
  end
@@ -342,11 +359,11 @@ class TrackedPost < ActiveRecord::Base #:nodoc:

    def self.activate_all!
      TrackedPost.update_all(
-       { :active => true },
-       [ 'active = ?', false ]
+       { :active => true },
+       [ 'active = ?', false ]
      )
    end
-
+
    def self.destroy_inactive!
      TrackedPost.delete_all [ 'active = ?', false ]
    end
@@ -354,23 +371,9 @@ end

  class ReportMailer < ActionMailer::Base #:nodoc:
    def report(to, sender, subject_template, report_tmpl)
-
-     formatted_subject = Time.now.strftime(subject_template)
-
-     recipients to
-     from sender
-     subject formatted_subject
-
-     generate_view_parts 'craigslist_report', report_tmpl.merge({:subject =>formatted_subject})
-   end
+     @summaries = report_tmpl[:summaries]

-   def generate_view_parts(view_name, tmpl)
-     part( :content_type => "multipart/alternative" ) do |p|
-       [
-         { :content_type => "text/plain", :body => render_message("#{view_name.to_s}.plain.erb", tmpl) },
-         { :content_type => "text/html", :body => render_message("#{view_name.to_s}.html.erb", tmpl.merge({:part_container => p})) }
-       ].each { |parms| p.part parms.merge( { :charset => "UTF-8", :transfer_encoding => "7bit" } ) }
-     end
+     mail :to => to, :subject => Time.zone.now.strftime(subject_template), :from => sender
    end
  end
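Note: the mailer above moves from the ActionMailer 2.x API (recipients/from/subject plus hand-built multipart parts) to the ActionMailer 3 style: instance variables feed the implicitly rendered templates, the mail call builds the message, and delivery happens on the returned object. Roughly how it is driven further down in this diff, with argument names taken from the script:

    ReportMailer.report(
      craig_report.email_to,
      craig_report.email_from,
      craig_report.report_name,          # strftime'd into the subject via Time.zone.now
      :summaries => report_summaries, :definition => craig_report
    ).deliver                            # build the Mail object, then deliver it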
 
@@ -383,7 +386,7 @@ unless report_definition_file
    puts <<EOD
  Usage:
    #{File.basename($0)} [report_definition_file]
-
+
  Run 'gem server' and browse the libcraigscrape rdoc for 'bin/craigscrape' for specific usage details.
  EOD
    exit
@@ -397,20 +400,25 @@ parser = Kwalify::Yaml::Parser.new(
    :data_binding => true
  )

- craig_report = parser.parse_file report_definition_file
+ report_definition_file_content = ERB.new(File.read(report_definition_file)).result
+ craig_report = parser.parse(report_definition_file_content, filename: report_definition_file)

  parser.errors.each do |e|
    puts "Definition Validation Error (line #{e.linenum}, char #{e.column}): #{e.message}"
  end and exit if parser.errors.length > 0

+ # Set the time zone:
+ Time.zone = craig_report.tz
+
  # Initialize Action Mailer:
+ ActionMailer::Base.prepend_view_path(File.dirname(__FILE__))
  ActionMailer::Base.logger = Logger.new STDERR if craig_report.debug_mailer?
  if craig_report.smtp_settings
-   ReportMailer.smtp_settings = craig_report.smtp_settings
+   ActionMailer::Base.smtp_settings = craig_report.smtp_settings.symbolize_keys
+   ActionMailer::Base.delivery_method = :smtp
  else
-   ReportMailer.delivery_method = :sendmail
+   ActionMailer::Base.delivery_method = :sendmail
  end
- ReportMailer.template_root = File.dirname __FILE__

  # Initialize the database:
  ActiveRecord::Base.logger = Logger.new STDERR if craig_report.debug_database?
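Note: one consequence of the new tz handling is that the zone from the yml (or the system fallback from CraigReportDefinition#tz) is installed into ActiveSupport's Time.zone, and the report subject is then rendered through Time.zone.now rather than Time.now. A small sketch, assuming the zone string resolves under the loaded ActiveSupport:

    Time.zone = 'EST'    # what `tz: EST` in the sample definition ends up setting
    Time.zone.now.strftime('Craig Watch For Johnathan on %D at %I:%M %p')
    # => the email subject, timestamped in the report's zone instead of the server's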
@@ -421,16 +429,16 @@ CraigScrape::Scraper.logger = Logger.new STDERR if craig_report.debug_craigscrap

  # Perform migrations if needed?
  ActiveRecord::Schema.define do
-   suppress_messages do
+   suppress_messages do
      create_table :tracked_searches do |t|
        t.column :search_name, :string
      end unless table_exists? :tracked_searches
-
+
      create_table :tracked_listings do |t|
        t.column :url, :string
        t.column :tracked_search_id, :integer
-     end unless table_exists? :tracked_listings
-
+     end unless table_exists? :tracked_listings
+
      create_table :tracked_posts do |t|
        t.column :url, :string
        t.column :tracked_listing_id, :integer
@@ -440,7 +448,7 @@ ActiveRecord::Schema.define do
    end
  end

- # Remove all posts which are inactive. They would be in there if the prior run was a failure.
+ # Remove all posts which are inactive. They would be in there if the prior run was a failure.
  TrackedPost.destroy_inactive!

  # We'll need these outside this next loop:
@@ -450,80 +458,80 @@ newly_tracked_posts = []
  report_summaries = craig_report.searches.collect do |search|
    # Load our tracking info
    search_track = TrackedSearch.find_by_name search.name
-
+
    # No Tracking found - let's set one up:
    search_track = TrackedSearch.create! :search_name => search.name unless search_track
-
+
    # This hash tracks what makes it into the report on this search.
    # NOTE that keys are url's b/c sometimes the same posting will end up in multiple listings,
    # And doing this ensures that we don't end-up reporting the same post twice.
    new_summaries = {}
-
+
    # And now we actually scrape:
    CraigScrape.new(*search.sites).each_listing(*search.listings) do |listing|
-     # Keep in mind that listing.url does change in the while loop.
+     # Keep in mind that listing.url does change in the while loop.
      # But, this first one is a good base_url that will never change between runs.

      tracked_listing = search_track.find_listing_by_url listing.url
      tracked_listing ||= search_track.listings.create! :url => listing.url
-
-     # Gives us a sane stopping point (hopefully) :
-     last_tracked_at = tracked_listing.last_tracked_at
+
+     # Gives us a sane stopping point (hopefully) :
+     last_tracked_at = tracked_listing.last_tracked_at.try(:to_date)
      last_tracked_at ||= search.starting_at

      # Some more stopping points (probably):
      already_tracked_urls = tracked_listing.posts.collect{|tp| tp.url}

      # We'll use this in the loop to decide what posts to track:
-     newest_post_date = last_tracked_at
-
+     newest_post_date = last_tracked_at
+
      # We keep track of post.post_date here, b/c in some circumstances, you can be in the below loop
      # but have no post.post_date since the posting was removed and it parsed to nil
-     most_recent_posting_date = Time.now
-
+     most_recent_posting_date = Date.new
+
      # OK - Now let's go!
      catch :list_break do
        while listing
          listing.posts.each do |post|
            begin
              most_recent_posting_date = post.post_date if post.post_date
-
+
              # Are we at a point in the scrape, past which we don't need to proceed?
              throw :list_break if (
-               most_recent_posting_date < last_tracked_at or
+               most_recent_posting_date.to_time < last_tracked_at or
                already_tracked_urls.include? post.url
              )
-
+
              # If we want to report this post, add it to the collection:
              new_summaries[post.url] = post if (
-               !new_summaries.has_key? post.url and
+               !new_summaries.has_key? post.url and
                search.passes_filter? post
              )
-           rescue CraigScrape::Scraper::ResourceNotFoundError,CraigScrape::Scraper::MaxRedirectError => e
+           rescue CraigScrape::Scraper::ResourceNotFoundError => e
              # Sometimes we do end up with 404's that will never load, and we dont want to
              # abort a run simply b/c we found some anomaly due to the craigslist index.
-             # being out of date. This ResourceNotFoundError can occur due to
-             # loading the post url in full, only to see that it was yanked - or craigslist
+             # being out of date. This ResourceNotFoundError can occur due to
+             # loading the post url in full, only to see that it was yanked - or craigslist
              # is acting funny.
              next
            end
-
+
            # Now let's see if the url should be kept in our tracking database for the future...

            # This post-date sets a limit for the tracked_listing.posts.create below
            newest_post_date = most_recent_posting_date if most_recent_posting_date > newest_post_date
-
+
            # Now let's add these urls to the database so as to reduce memory overhead.
            # Keep in mind - they're not active until the email goes out.
-           # also - we shouldn't have to worry about putting 'irrelevant' posts in the db, since
-           # the nbewest are always the first ones parsed:
+           # also - we shouldn't have to worry about putting 'irrelevant' posts in the db, since
+           # the newest are always the first ones parsed:
            tracked_listing.posts.create(
-             :url => post.url,
-             :created_at => newest_post_date
+             :url => post.url,
+             :created_at => newest_post_date
            ) unless most_recent_posting_date < newest_post_date

          end
-
+
          listing = listing.next_page
        end
      end
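Note: the loop above leans on Ruby's catch/throw for its early exit: each_listing yields listing pages, and the scrape bails out of the nested while/each as soon as it reaches a post older than the last tracked date or one whose url is already recorded. A stripped-down sketch of that control flow, with the stop conditions simplified to date comparisons:

    catch :list_break do
      while listing
        listing.posts.each do |post|
          # stop paging once we are back into already-seen territory:
          throw :list_break if (post.post_date && post.post_date.to_date < last_tracked_at) ||
                               already_tracked_urls.include?(post.url)
          # ...filter the post, remember it in new_summaries, queue it for tracking...
        end
        listing = listing.next_page
      end
    end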
@@ -532,41 +540,35 @@ report_summaries = craig_report.searches.collect do |search|


    # Let's flatten the unique'd hash into a more useable array:
-   # NOTE: The reason we included a reject is a little complicated, but here's the gist:
-   # * We try not to load the whole post if we don't have to
-   # * Its possible that we met all the criterion of the passes_filter? with merely a header, and
-   #   if so we add a url to the summaries stack
-   # * Unfortunately, when we later load that post in full, we may find that the post was posting_has_expired?
-   #   or flagged_for_removal?, etc.
-   # * If this was the case, below we'll end up sorting against nil post_dates. This would fail.
-   # * So - before we sort, we run a quick reject on nil post_dates
-   new_summaries = new_summaries.values.reject{|v| v.post_date.nil? }.sort{|a,b| a.post_date <=> b.post_date} # oldest goes to bottom
-
+   new_summaries = new_summaries.values.sort{|a,b| a.post_date <=> b.post_date} # oldest goes to bottom
+
    # Now Let's manage the tracking database:
-   if new_summaries.length > 0
+   if new_summaries.length > 0

      # We'll use this in the cleanup at the bottom:
      latest_post_date = new_summaries.last.post_date
-
-     new_summaries.reverse! if search.newest_first?
+
+     new_summaries.reverse! if search.newest_first?
    end
-
+
    # We'll want to email these...
-   {
+   {
      :latest_post_date => latest_post_date,
-     :search_track => search_track,
-     :postings => new_summaries,
+     :search_track => search_track,
+     :postings => new_summaries,
      :search => search
    }
  end

- # Time to send the email:
- ReportMailer.deliver_report(
-   craig_report.email_to,
-   craig_report.email_from,
-   craig_report.report_name,
-   {:summaries => report_summaries, :definition => craig_report}
- ) if report_summaries.length > 0
+ # Time to send the email (maybe):
+ unless report_summaries.select { |s| !s[:postings].empty? }.empty?
+   ReportMailer.report(
+     craig_report.email_to,
+     craig_report.email_from,
+     craig_report.report_name,
+     {:summaries => report_summaries, :definition => craig_report}
+   ).deliver
+ end

  # Commit (make 'active') all newly created tracked post urls:
  TrackedPost.activate_all!
@@ -576,4 +578,4 @@ report_summaries.each do |summary|
    summary[:search_track].listings.each do |listing|
      listing.delete_posts_older_than listing.last_tracked_at
    end
- end
+ end
@@ -0,0 +1,20 @@
+ <h2><%=h @subject %></h2>
+ <%@summaries.each do |summary| %>
+   <h3><%=h summary[:search].name%></h3>
+   <% if summary[:postings].length > 0 %>
+     <%summary[:postings].each do |post|%>
+       <p>
+         <%=('%s <a href="%s">%s</a>' % [
+           h(post.post_date.strftime('%b %d')), post.url, h(post.title)
+         ]).html_safe %>
+         <%=([
+           (post.price) ? h(post.price.try(:format, :no_cents => true)) : nil,
+           (post.location) ? '<font size="-1"> (%s)</font>' % h(post.location) : nil,
+           (post.has_pic_or_img?) ? ' <span style="color: orange"> img</span>': nil
+         ].compact.join(' ')).html_safe -%>
+       </p>
+     <% end %>
+   <% else %>
+     <p><i>No new postings were found, which matched the search criteria.</i></p>
+   <% end %>
+ <% end %>
@@ -1,18 +1,19 @@
  CRAIGSLIST REPORTER

- <%@summaries.each do |summary| -%>
+ <% @summaries.each do |summary| -%>
  <%=summary[:search].name %>
  <% summary[:postings].collect do |post| -%>
  <% if summary[:postings].length > 0 %>
- <%='%s : %s %s %s %s' % [
+ <%='%s : %s %s %s %s %s' % [
  post.post_date.strftime('%b %d'),
- post.label,
- (post.location) ? " (#{post.location})" : '',
- (post.has_pic_or_img?) ? ' [img]': '',
+ post.title,
+ post.price.try(:format, :no_cents => true),
+ (post.location) ? " (#{post.location})" : nil,
+ (post.has_pic_or_img?) ? ' [img]': nil,
  post.url
  ] -%>
  <% else %>
  No new postings were found, which matched the search criteria.
  <% end %>
  <% end %>
- <% end -%>
+ <% end -%>
data/lib/geo_listings.rb CHANGED
@@ -141,4 +141,4 @@ class CraigScrape
    end

  end
- end
+ end