olek-libcraigscrape 1.0.3 → 1.1.0
- data/CHANGELOG +12 -6
- data/COPYING.LESSER +1 -1
- data/README +10 -10
- data/Rakefile +5 -54
- data/bin/craig_report_schema.yml +3 -3
- data/bin/craigwatch +32 -44
- data/bin/report_mailer/report.html.erb +17 -0
- data/bin/report_mailer/{craigslist_report.plain.erb → report.text.erb} +6 -6
- data/lib/geo_listings.rb +24 -24
- data/lib/libcraigscrape.rb +6 -11
- data/lib/listings.rb +62 -45
- data/lib/posting.rb +153 -106
- data/lib/scraper.rb +37 -94
- data/test/libcraigscrape_test_helpers.rb +10 -10
- data/test/test_craigslist_geolisting.rb +53 -53
- data/test/test_craigslist_listing.rb +26 -26
- data/test/test_craigslist_posting.rb +39 -38
- metadata +38 -114
- data/bin/report_mailer/craigslist_report.html.erb +0 -17
data/CHANGELOG
CHANGED
@@ -1,34 +1,40 @@
 == Change Log
 
+=== Release 1.1
+- ruby 1.9.3 support
+- migrated from rails 2 gems to rails 3
+- fixed some new parsing bugs introduced by craigslist template changes
+- Replaced Net:Http with typhoeus
+
 === Release 1.0
 - Replaced hpricot dependency with Nokogiri. Nokogiri should be faster and more reliable. Whoo-hoo!
 
 === Release 0.9.1
 - Added support for posting_has_expired? and expired post recognition
-- Fixed a weird bug in craigwatch that would cause a scrape to abort if a flagged_for_removal? was encountered when using certain (minimal) filtering
+- Fixed a weird bug in craigwatch that would cause a scrape to abort if a flagged_for_removal? was encountered when using certain (minimal) filtering
 
 === Release 0.9 (Oct 01, 2010)
 - Minor adjustments to craigwatch to fix deprecation warnings in new ActiveRecord and ActionMailer gems
 - Added gem version specifiers to the Gem spec and to the require statements
 - Moved repo to github
-- Fixed an esoteric bug in craigwatch, affecting the last scraped post in a listing when that post was 'flagged for removal'.
+- Fixed an esoteric bug in craigwatch, affecting the last scraped post in a listing when that post was 'flagged for removal'.
 - Took all those extra package-building tasts out of the Rakefile since this is 2010 and we only party with gemfiles
 - Ruby 1.9 compatibility adjustments
 
 === Release 0.8.4 (Sep 6, 2010)
 - Someone found a way to screw up hpricot's to_s method (posting1938291834-090610.html) and fixed by added html_source to the craigslist Scraper object, which returns the body of the post without passing it through hpricot. Its a better way to go anyways, and re-wrote a couple incidentals to use the html_source method...
-- Adjusted the test cases a bit, since the user bodies being returned have less cleanup in their output than they had prior
+- Adjusted the test cases a bit, since the user bodies being returned have less cleanup in their output than they had prior
 
 === Release 0.8.3 (August 2, 2010)
 - Someone was posting really bad html that was screwing up Hpricot. Such is to be expected when you're soliciting html from the general public I suppose. Added test_bugs_found061710 posting test, and fixed by stripping out the user body before parsing with Hpricot.
-- Added a MaxRedirectError and corresponding maximum_redirects_per_request cattr for the Craigscrape objects. This fixed a weird bug where craigslist was sending us in redirect circles around 06/10
+- Added a MaxRedirectError and corresponding maximum_redirects_per_request cattr for the Craigscrape objects. This fixed a weird bug where craigslist was sending us in redirect circles around 06/10
 
 === Release 0.8.2 (April 17, 2010)
 - Found another odd parsing bug. Scrape sample is in 'listing_samples/mia_search_kitten.3.15.10.html', Adjusted CraigScrape::Listings::HEADER_DATE to fix.
 - Craigslist started added <span> tags in its post summaries. Fixed. See sample in test_new_listing_span051710
 
 === Release 0.8.1 (Feb 10, 2010)
-- Found an odd parsing bug occured for the first time today. Scrape sample is in 'listing_samples/mia_sss_kittens2.10.10.html', Adjusted CraigScrape::Listings::LABEL to fix.
+- Found an odd parsing bug occured for the first time today. Scrape sample is in 'listing_samples/mia_sss_kittens2.10.10.html', Adjusted CraigScrape::Listings::LABEL to fix.
 - Switched to require "active_support" per the deprecation notices
 - Little adjustments to fix the rdoc readibility
 
@@ -83,7 +89,7 @@
 - Adjusted the examples in the readme, added a "require 'rubygems'" to the top of the listing so that they would actually work if you tried to run them verbatim (Thanks J T!)
 - Restructured some of the parsing to be less leinient when scraped values aren't matching their regexp's in the PostSummary
 - It seems like craigslist returns a 404 on pages that exist, for no good reason on occasion. Added a retry mechanism that wont take no for an answer, unless we get a defineable number of them in a row
-- Added CraigScrape cattr_accessors : retries_on_fetch_fail, sleep_between_fetch_retries .
+- Added CraigScrape cattr_accessors : retries_on_fetch_fail, sleep_between_fetch_retries .
 - Adjusted craigwatch to not commit any database changes until the notification email goes out. This way if there's an error, the user wont miss any results on a re-run
 - Added a FetchError for http requests that don't return 200 or redirect...
 - Adjusted craigwatch to use scrape_until instead of scrape_since, this new approach cuts down on the url fetching by assuming that if we come across something we've already tracked, we dont need to keep going any further. NOTE: We still can't use a 'last_scraped_url' on the TrackedSearch model b/c sometimes posts get deleted.
data/COPYING.LESSER
CHANGED
data/README
CHANGED
@@ -25,10 +25,10 @@ On the 'miami.craigslist.org' site, using the query "search/sss?query=apple"
   require 'libcraigscrape'
   require 'date'
   require 'pp'
-
+
   miami_cl = CraigScrape.new 'us/fl/miami'
   miami_cl.posts_since(Time.parse('Sep 10'), 'search/sss?query=apple').each do |post|
-    pp post
+    pp post
   end
 
 === Scrape Last 225 Craigslist Listings
@@ -38,26 +38,26 @@ On the 'miami.craigslist.org' under the 'apa' category
   require 'rubygems'
   require 'libcraigscrape'
   require 'pp'
-
+
   i=1
   CraigScrape.new('us/fl/miami').each_post('apa') do |post|
     break if i > 225
-    i+=1
-    pp post
+    i+=1
+    pp post
   end
 
 === Multiple site with multiple section/search enumeration of posts
 
-In Florida, with the exception of 'miami.craigslist.org' & 'keys.craigslist.org' sites, output each post in
+In Florida, with the exception of 'miami.craigslist.org' & 'keys.craigslist.org' sites, output each post in
 the 'crg' category and for the search 'artist needed'
 
   require 'rubygems'
   require 'libcraigscrape'
   require 'pp'
-
+
   non_sfl_sites = CraigScrape.new('us/fl', '- us/fl/miami', '- us/fl/keys')
   non_sfl_sites.each_post('crg', 'search/sss?query=artist+needed') do |post|
-    pp post
+    pp post
   end
 
 === Scrape Single Craigslist Posting
@@ -66,7 +66,7 @@ This grabs the full details under the specific post http://miami.craigslist.org/
 
   require 'rubygems'
   require 'libcraigscrape'
-
+
   post = CraigScrape::Posting.new 'http://miami.craigslist.org/mdc/sys/1140808860.html'
   puts "(%s) %s:\n %s" % [ post.post_time.strftime('%b %d'), post.title, post.contents_as_plain ]
 
@@ -76,7 +76,7 @@ This grabs the post summaries of the single listings at http://miami.craigslist.
 
   require 'rubygems'
   require 'libcraigscrape'
-
+
   listing = CraigScrape::Listings.new 'http://miami.craigslist.org/search/sss?query=laptop'
   puts 'Found %d posts for the search "laptop" on this page' % listing.posts.length
 
data/Rakefile
CHANGED
@@ -1,8 +1,8 @@
 require 'rake'
 require 'rake/clean'
-require '
-require 'rake/rdoctask'
+require 'rdoc/task'
 require 'rake/testtask'
+require 'rubygems/package_task'
 require 'fileutils'
 require 'tempfile'
 
@@ -11,7 +11,7 @@ include FileUtils
 RbConfig = Config unless defined? RbConfig
 
 NAME = "olek-libcraigscrape"
-VERS = ENV['VERSION'] || "1.0
+VERS = ENV['VERSION'] || "1.1.0"
 PKG = "#{NAME}-#{VERS}"
 
 RDOC_OPTS = ['--quiet', '--title', 'The libcraigscrape Reference', '--main', 'README', '--inline-source']
@@ -35,15 +35,8 @@ SPEC =
   s.homepage = 'http://www.derosetechnologies.com/community/libcraigscrape'
   s.rubyforge_project = 'libcraigwatch'
   s.files = PKG_FILES
-  s.require_paths = ["lib"]
+  s.require_paths = ["lib"]
   s.test_files = FileList['test/test_*.rb']
-  s.add_dependency 'nokogiri', '>= 1.4.4'
-  s.add_dependency 'htmlentities', '>= 4.0.0'
-  s.add_dependency 'activesupport','>= 2.3.0', '< 3'
-  s.add_dependency 'activerecord', '>= 2.3.0', '< 3'
-  s.add_dependency 'actionmailer', '>= 2.3.0', '< 3'
-  s.add_dependency 'kwalify', '>= 0.7.2'
-  s.add_dependency 'sqlite3'
 end
 
 desc "Run all the tests"
@@ -61,7 +54,7 @@ Rake::RDocTask.new do |rdoc|
   rdoc.rdoc_files.add RDOC_FILES+Dir.glob('lib/*.rb').sort_by{|a,b| (a == 'lib/libcraigscrape.rb') ? -1 : 0 }
 end
 
-
+Gem::PackageTask.new(SPEC) do |p|
   p.need_tar = false
   p.need_tar_gz = false
   p.need_tar_bz2 = false
@@ -81,45 +74,3 @@ end
 task :uninstall => [:clean] do
   sh %{sudo gem uninstall #{NAME}}
 end
-
-require 'roodi'
-require 'roodi_task'
-
-namespace :code_tests do
-  desc "Analyze for code complexity"
-  task :flog do
-    require 'flog'
-
-    flog = Flog.new
-    flog.flog_files ['lib']
-    threshold = 105
-
-    bad_methods = flog.totals.select do |name, score|
-      score > threshold
-    end
-
-    bad_methods.sort { |a,b| a[1] <=> b[1] }.each do |name, score|
-      puts "%8.1f: %s" % [score, name]
-    end
-
-    puts "WARNING : #{bad_methods.size} methods have a flog complexity > #{threshold}" unless bad_methods.empty?
-  end
-
-  desc "Analyze for code duplication"
-  require 'flay'
-  task :flay do
-    threshold = 25
-    flay = Flay.new({:fuzzy => false, :verbose => false, :mass => threshold})
-    flay.process(*Flay.expand_dirs_to_files(['lib']))
-
-    flay.report
-
-    raise "#{flay.masses.size} chunks of code have a duplicate mass > #{threshold}" unless flay.masses.empty?
-  end
-
-  RoodiTask.new 'roodi', ['lib/*.rb'], 'roodi.yml'
-end
-
-desc "Run all code tests"
-task :code_tests => %w(code_tests:flog code_tests:flay code_tests:roodi)
-
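The Rakefile migration above swaps the requires that were removed from rake (rake/rdoctask and the truncated rake/... package-task require) for their current homes, rdoc/task and rubygems/package_task. A minimal standalone sketch of the same setup, assuming the rdoc and rubygems default gems are available; the spec fields here are placeholders, not the gem's real spec:

```ruby
require 'rdoc/task'              # replaces rake/rdoctask
require 'rubygems/package_task'  # replaces the old rake gem-package task

# Placeholder spec, for illustration only:
spec = Gem::Specification.new do |s|
  s.name    = 'example'
  s.version = '0.0.1'
  s.summary = 'packaging sketch'
  s.authors = ['example']
end

# These define the same :rdoc and :package/:gem tasks the old
# Rake::RDocTask / Rake::GemPackageTask classes did:
RDoc::Task.new { |rdoc| rdoc.rdoc_dir = 'doc' }
Gem::PackageTask.new(spec) { |p| p.need_tar = false }
```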
data/bin/craig_report_schema.yml
CHANGED
@@ -25,7 +25,7 @@ mapping:
 mapping:
   "adapter": { type: str, required: yes }
   "dbfile": { type: str, required: no }
-  "host":
+  "host": { type: str, required: no }
   "username": { type: str, required: no }
   "password": { type: str, required: no }
   "socket": { type: str, required: no }
@@ -50,7 +50,7 @@ mapping:
 "summary_or_full_post_has_no": {type: seq, required: no, sequence: [ {type: str, unique: yes} ]}
 "location_has": {type: seq, required: no, sequence: [ {type: str, unique: yes} ]}
 "location_has_no": {type: seq, required: no, sequence: [ {type: str, unique: yes} ]}
-"sites":
+"sites":
   type: seq
   required: yes
   sequence:
@@ -62,7 +62,7 @@ mapping:
 sequence:
   - type: str
     unique: yes
-"starting":
+"starting":
   type: str
   required: no
   pattern: /^[\d]{1,2}\/[\d]{1,2}\/(?:[\d]{2}|[\d]{4})$/
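The `starting` field above is constrained by the schema to a month/day date with a 2- or 4-digit year, and craigwatch parses a matching value with Time.strptime (see the starting_at change in craigwatch). A minimal stdlib-only sketch of that validate-then-parse step, with a sample value chosen for illustration:

```ruby
require 'time'

# The same pattern the schema declares for "starting":
# 1-2 digit month and day, 2- or 4-digit year.
STARTING = /^[\d]{1,2}\/[\d]{1,2}\/(?:[\d]{2}|[\d]{4})$/

value = '9/10/2010'  # illustrative input
raise 'starting date fails schema pattern' unless value =~ STARTING

# craigwatch feeds the validated value to Time.strptime with "%m/%d/%Y":
t = Time.strptime(value, '%m/%d/%Y')
puts t.strftime('%Y-%m-%d')  # => 2010-09-10
```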
data/bin/craigwatch
CHANGED
@@ -1,4 +1,5 @@
-#!/usr/bin/ruby
+#!/usr/bin/env ruby
+# encoding: UTF-8
 #
 # =craigwatch - A email-based "post monitoring" solution
 #
@@ -160,9 +161,9 @@ $: << File.dirname(__FILE__) + '/../lib'
 
 require 'rubygems'
 
-gem 'kwalify'
-gem 'activerecord'
-gem 'actionmailer'
+gem 'kwalify'
+gem 'activerecord'
+gem 'actionmailer'
 
 require 'kwalify'
 require 'active_record'
@@ -252,7 +253,7 @@ class CraigReportDefinition #:nodoc:
 
   def starting_at
     (@starting) ?
-      Time.
+      Time.strptime(@starting, "%m/%d/%Y") :
       Time.now.yesterday.beginning_of_day
   end
 
@@ -290,17 +291,23 @@ class CraigReportDefinition #:nodoc:
   private
 
   def matches_all?(conditions, against)
-
-    (conditions.nil? or conditions.all?{|c| against.any?{|a| match_against c, a } }) ? true : false
+    (conditions.nil? or conditions.all?{|c| sanitized_against(against).any?{|a| match_against c, a } }) ? true : false
   end
 
   def doesnt_match_any?(conditions, against)
-
-    (conditions.nil? or conditions.all?{|c| against.any?{|a| !match_against c, a } }) ? true : false
+    (conditions.nil? or conditions.all?{|c| sanitized_against(against).any?{|a| !match_against c, a } }) ? true : false
   end
 
   def match_against(condition, against)
-    (against.scan( condition.is_re? ? condition.to_re : /#{condition}/i).length > 0) ? true : false
+    (CraigScrape::Scraper.he_decode(against).scan( condition.is_re? ? condition.to_re : /#{condition}/i).length > 0) ? true : false
+  end
+
+  # This is kind of a hack to deal with ruby 1.9. Really the filtering mechanism
+  # needs to be factored out and tested....
+  def sanitized_against(against)
+    against = against.lines if against.respond_to? :lines
+    against = against.to_a if against.respond_to? :to_a
+    (against.nil?) ? [] : against.compact
   end
 end
 end
@@ -353,24 +360,12 @@ class TrackedPost < ActiveRecord::Base #:nodoc:
 end
 
 class ReportMailer < ActionMailer::Base #:nodoc:
-
-
-    formatted_subject = Time.now.strftime(subject_template)
-
-    recipients to
-    from sender
-    subject formatted_subject
+  # default :template_path => File.dirname(__FILE__)
 
-
-
-
-
-    part( :content_type => "multipart/alternative" ) do |p|
-      [
-        { :content_type => "text/plain", :body => render_message("#{view_name.to_s}.plain.erb", tmpl) },
-        { :content_type => "text/html", :body => render_message("#{view_name.to_s}.html.erb", tmpl.merge({:part_container => p})) }
-      ].each { |parms| p.part parms.merge( { :charset => "UTF-8", :transfer_encoding => "7bit" } ) }
-    end
+  def report(to, sender, subject_template, report_tmpl)
+    subject = Time.now.strftime subject_template
+    @summaries = report_tmpl[:summaries]
+    mail :to => to, :subject => subject, :from => sender
   end
 end
 
@@ -405,13 +400,14 @@ parser.errors.each do |e|
 end and exit if parser.errors.length > 0
 
 # Initialize Action Mailer:
+ActionMailer::Base.prepend_view_path(File.dirname(__FILE__))
 ActionMailer::Base.logger = Logger.new STDERR if craig_report.debug_mailer?
 if craig_report.smtp_settings
-
+  ActionMailer::Base.smtp_settings = craig_report.smtp_settings
+  ActionMailer::Base.delivery_method = :smtp
 else
-
+  ActionMailer::Base.delivery_method = :sendmail
 end
-ReportMailer.template_root = File.dirname __FILE__
 
 # Initialize the database:
 ActiveRecord::Base.logger = Logger.new STDERR if craig_report.debug_database?
@@ -517,7 +513,7 @@ report_summaries = craig_report.searches.collect do |search|
   # Now let's add these urls to the database so as to reduce memory overhead.
   # Keep in mind - they're not active until the email goes out.
   # also - we shouldn't have to worry about putting 'irrelevant' posts in the db, since
-  # the
+  # the newest are always the first ones parsed:
   tracked_listing.posts.create(
     :url => post.url,
     :created_at => newest_post_date
@@ -530,18 +526,10 @@ report_summaries = craig_report.searches.collect do |search|
     end
   end
 
-
+
 
   # Let's flatten the unique'd hash into a more useable array:
-
-  # * We try not to load the whole post if we don't have to
-  # * Its possible that we met all the criterion of the passes_filter? with merely a header, and
-  #   if so we add a url to the summaries stack
-  # * Unfortunately, when we later load that post in full, we may find that the post was posting_has_expired?
-  #   or flagged_for_removal?, etc.
-  # * If this was the case, below we'll end up sorting against nil post_dates. This would fail.
-  # * So - before we sort, we run a quick reject on nil post_dates
-  new_summaries = new_summaries.values.reject{|v| v.post_date.nil? }.sort{|a,b| a.post_date <=> b.post_date} # oldest goes to bottom
+  new_summaries = new_summaries.values.sort{|a,b| a.post_date <=> b.post_date} # oldest goes to bottom
 
   # Now Let's manage the tracking database:
   if new_summaries.length > 0
@@ -562,13 +550,13 @@ report_summaries = craig_report.searches.collect do |search|
 end
 
 # Time to send the email (maybe):
-unless report_summaries.select { |s| !
-  ReportMailer.
+unless report_summaries.select { |s| !s[:postings].empty? }.empty?
+  ReportMailer.report(
     craig_report.email_to,
     craig_report.email_from,
     craig_report.report_name,
     {:summaries => report_summaries, :definition => craig_report}
-  )
+  ).deliver
 end
 
 # Commit (make 'active') all newly created tracked post urls:
data/bin/report_mailer/report.html.erb
ADDED
@@ -0,0 +1,17 @@
+<h2><%=h @subject %></h2>
+<%@summaries.each do |summary| %>
+<h3><%=h summary[:search].name%></h3>
+<% if summary[:postings].length > 0 %>
+<%summary[:postings].each do |post|%>
+<%=('<p>%s <a href="%s">%s -</a>%s%s</p>' % [
+  h(post.post_date.strftime('%b %d')),
+  post.url,
+  h(post.label),
+  (post.location) ? '<font size="-1"> (%s)</font>' % h(post.location) : '',
+  (post.has_pic_or_img?) ? ' <span style="color: orange"> img</span>': ''
+]).html_safe -%>
+<% end %>
+<% else %>
+<p><i>No new postings were found, which matched the search criteria.</i></p>
+<% end %>
+<% end %>
data/bin/report_mailer/{craigslist_report.plain.erb → report.text.erb}
RENAMED
@@ -1,15 +1,15 @@
 CRAIGSLIST REPORTER
 
-
+<% @summaries.each do |summary| -%>
 <%=summary[:search].name %>
 <% summary[:postings].collect do |post| -%>
 <% if summary[:postings].length > 0 %>
 <%='%s : %s %s %s %s' % [
-
-
-
-
-
+  post.post_date.strftime('%b %d'),
+  post.label,
+  (post.location) ? " (#{post.location})" : '',
+  (post.has_pic_or_img?) ? ' [img]': '',
+  post.url
 ] -%>
 <% else %>
 No new postings were found, which matched the search criteria.
data/lib/geo_listings.rb
CHANGED
@@ -1,19 +1,19 @@
 # = About geo_listings.rb
 #
 # This file contains the parsing code, and logic relating to geographic site pages and paths. You
-# should never need to include this file directly, as all of libcraigscrape's objects and methods
+# should never need to include this file directly, as all of libcraigscrape's objects and methods
 # are loaded when you use <tt>require 'libcraigscrape'</tt> in your code.
 #
 
 require 'scraper'
 
 class CraigScrape
-
-  # GeoListings represents a parsed Craigslist geo lisiting page. (i.e. {'http://geo.craigslist.org/iso/us'}[http://geo.craigslist.org/iso/us])
+
+  # GeoListings represents a parsed Craigslist geo lisiting page. (i.e. {'http://geo.craigslist.org/iso/us'}[http://geo.craigslist.org/iso/us])
   # These list all the craigslist sites in a given region.
   class GeoListings < Scraper
     GEOLISTING_BASE_URL = %{http://geo.craigslist.org/iso/}
-
+
     LOCATION_NAME = /[ ]*\>[ ](.+)[ ]*/
     PATH_SCANNER = /(?:\\\/|[^\/])+/
     URL_HOST_PART = /^[^\:]+\:\/\/([^\/]+)[\/]?$/
@@ -31,18 +31,18 @@ class CraigScrape
       # Validate that required fields are present, at least - if we've downloaded it from a url
      parse_error! unless location
     end
-
+
     # Returns the GeoLocation's full name
     def location
       unless @location
        cursor = html % 'h3 > b > a:first-of-type'
-       cursor = cursor.next if cursor
+       cursor = cursor.next if cursor
        @location = $1 if cursor and LOCATION_NAME.match he_decode(cursor.to_s)
       end
-
+
       @location
     end
-
+
     # Returns a hash of site name to urls in the current listing
     def sites
       unless @sites
@@ -52,27 +52,27 @@ class CraigScrape
        @sites[site_name] = $1 if URL_HOST_PART.match el_a[:href]
       end
      end
-
+
      @sites
     end
-
+
     # This method will return an array of all possible sites that match the specified location path.
     # Sample location paths:
     # - us/ca
     # - us/fl/miami
     # - jp/fukuoka
     # - mx
-    # Here's how location paths work.
+    # Here's how location paths work.
     # - The components of the path are to be separated by '/' 's.
     # - Up to (and optionally, not including) the last component, the path should correspond against a valid GeoLocation url with the prefix of 'http://geo.craigslist.org/iso/'
     # - the last component can either be a site's 'prefix' on a GeoLocation page, or, the last component can just be a geolocation page itself, in which case all the sites on that page are selected.
     # - the site prefix is the first dns record in a website listed on a GeoLocation page. (So, for the case of us/fl/miami , the last 'miami' corresponds to the 'south florida' link on {'http://geo.craigslist.org/iso/us/fl'}[http://geo.craigslist.org/iso/us/fl]
     def self.sites_in_path(full_path, base_url = GEOLISTING_BASE_URL)
       # the base_url parameter is mostly so we can test this method
-
-      # Unfortunately - the easiest way to understand much of this is to see how craigslist returns
+
+      # Unfortunately - the easiest way to understand much of this is to see how craigslist returns
       # these geolocations. Watch what happens when you request us/fl/non-existant/page/here.
-      # I also made this a little forgiving in a couple ways not specified with official support, per
+      # I also made this a little forgiving in a couple ways not specified with official support, per
       # the rules above.
       full_path_parts = full_path.scan PATH_SCANNER
 
@@ -82,15 +82,15 @@ class CraigScrape
       full_path_parts.each_with_index do |part, i|
 
         # Let's un-escape the path-part, if needed:
-        part.gsub! "\\/", "/"
+        part.gsub! "\\/", "/"
 
         # If they're specifying a single site, this will catch and return it immediately
-        site = geo_listing.sites.find{ |n,s|
+        site = geo_listing.sites.find{ |n,s|
          (SITE_PREFIX.match s and $1 == part) or n == part
         } if geo_listing
 
         # This returns the site component of the found array
-        return [site.last] if site
+        return [site.last] if site
 
         begin
           # The URI escape is mostly needed to translate the space characters
@@ -109,9 +109,9 @@ class CraigScrape
       geo_listing.sites.collect{|n,s| s }
     end
 
-    # find_sites takes a single array of strings as an argument. Each string is to be either a location path
+    # find_sites takes a single array of strings as an argument. Each string is to be either a location path
     # (see sites_in_path), or a full site (in canonical form - ie "memphis.craigslist.org"). Optionally,
-    # each of this may/should contain a '+' or '-' prefix to indicate whether the string is supposed to
+    # each of this may/should contain a '+' or '-' prefix to indicate whether the string is supposed to
     # include sites from the master list, or remove them from the list. If no '+' or'-' is
     # specified, the default assumption is '+'. Strings are processed from left to right, which gives
     # a high degree of control over the selection set. Examples:
@@ -122,23 +122,23 @@ class CraigScrape
     # There's a lot of flexibility here, you get the idea.
     def self.find_sites(specs, base_url = GEOLISTING_BASE_URL)
       ret = []
-
+
       specs.each do |spec|
        (op,spec = $1,$2) if FIND_SITES_PARTS.match spec
 
-        spec = (spec.include? '.') ? [spec] : sites_in_path(spec, base_url)
+        spec = (spec.include? '.') ? [spec] : sites_in_path(spec, base_url)
 
        (op == '-') ? ret -= spec : ret |= spec
       end
-
+
       ret
     end
 
     private
-
+
     def self.bad_geo_path!(path)
       raise BadGeoListingPath, "Unable to load path #{path.inspect}, either you're having problems connecting to Craiglist, or your path is invalid."
     end
-
+
   end
 end
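The find_sites documentation above describes a small left-to-right set algebra: each spec either adds sites to the running set (the default, or an explicit '+') or removes them ('-'), and a spec containing a '.' is taken as a literal hostname rather than a location path to expand. A hypothetical standalone sketch of just that selection logic, with the network lookup stubbed out; resolve_site_specs and the expand stub are illustrative, not the gem's API:

```ruby
# Process specs left to right: a '-' prefix removes sites from the running
# set, anything else unions them in; specs containing '.' are literal
# hostnames, others are location paths expanded via the supplied lookup.
def resolve_site_specs(specs, expand)
  specs.inject([]) do |ret, spec|
    op, body = spec =~ /\A([+\-])\s*(.+)\z/ ? [$1, $2] : [nil, spec]
    sites = body.include?('.') ? [body] : expand.call(body)
    op == '-' ? ret - sites : ret | sites
  end
end

# Stub standing in for GeoListings.sites_in_path, which really scrapes
# http://geo.craigslist.org/iso/... (hostnames here are examples):
expand = lambda do |path|
  { 'us/fl'       => %w[miami.craigslist.org keys.craigslist.org orlando.craigslist.org],
    'us/fl/miami' => %w[miami.craigslist.org] }[path] || []
end

p resolve_site_specs(['us/fl', '- us/fl/miami'], expand)
# => ["keys.craigslist.org", "orlando.craigslist.org"]
```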