olek-libcraigscrape 1.0.3 → 1.1.0
This diff shows the changes between two publicly released versions of this package, as they appear in the package's public registry. It is provided for informational purposes only.
- data/CHANGELOG +12 -6
- data/COPYING.LESSER +1 -1
- data/README +10 -10
- data/Rakefile +5 -54
- data/bin/craig_report_schema.yml +3 -3
- data/bin/craigwatch +32 -44
- data/bin/report_mailer/report.html.erb +17 -0
- data/bin/report_mailer/{craigslist_report.plain.erb → report.text.erb} +6 -6
- data/lib/geo_listings.rb +24 -24
- data/lib/libcraigscrape.rb +6 -11
- data/lib/listings.rb +62 -45
- data/lib/posting.rb +153 -106
- data/lib/scraper.rb +37 -94
- data/test/libcraigscrape_test_helpers.rb +10 -10
- data/test/test_craigslist_geolisting.rb +53 -53
- data/test/test_craigslist_listing.rb +26 -26
- data/test/test_craigslist_posting.rb +39 -38
- metadata +38 -114
- data/bin/report_mailer/craigslist_report.html.erb +0 -17
data/CHANGELOG
CHANGED
@@ -1,34 +1,40 @@
 == Change Log
 
+=== Release 1.1
+- ruby 1.9.3 support
+- migrated from rails 2 gems to rails 3
+- fixed some new parsing bugs introduced by craigslist template changes
+- Replaced Net:Http with typhoeus
+
 === Release 1.0
 - Replaced hpricot dependency with Nokogiri. Nokogiri should be faster and more reliable. Whoo-hoo!
 
 === Release 0.9.1
 - Added support for posting_has_expired? and expired post recognition
-- Fixed a weird bug in craigwatch that would cause a scrape to abort if a flagged_for_removal? was encountered when using certain (minimal) filtering
+- Fixed a weird bug in craigwatch that would cause a scrape to abort if a flagged_for_removal? was encountered when using certain (minimal) filtering
 
 === Release 0.9 (Oct 01, 2010)
 - Minor adjustments to craigwatch to fix deprecation warnings in new ActiveRecord and ActionMailer gems
 - Added gem version specifiers to the Gem spec and to the require statements
 - Moved repo to github
-- Fixed an esoteric bug in craigwatch, affecting the last scraped post in a listing when that post was 'flagged for removal'.
+- Fixed an esoteric bug in craigwatch, affecting the last scraped post in a listing when that post was 'flagged for removal'.
 - Took all those extra package-building tasts out of the Rakefile since this is 2010 and we only party with gemfiles
 - Ruby 1.9 compatibility adjustments
 
 === Release 0.8.4 (Sep 6, 2010)
 - Someone found a way to screw up hpricot's to_s method (posting1938291834-090610.html) and fixed by added html_source to the craigslist Scraper object, which returns the body of the post without passing it through hpricot. Its a better way to go anyways, and re-wrote a couple incidentals to use the html_source method...
-- Adjusted the test cases a bit, since the user bodies being returned have less cleanup in their output than they had prior
+- Adjusted the test cases a bit, since the user bodies being returned have less cleanup in their output than they had prior
 
 === Release 0.8.3 (August 2, 2010)
 - Someone was posting really bad html that was screwing up Hpricot. Such is to be expected when you're soliciting html from the general public I suppose. Added test_bugs_found061710 posting test, and fixed by stripping out the user body before parsing with Hpricot.
-- Added a MaxRedirectError and corresponding maximum_redirects_per_request cattr for the Craigscrape objects. This fixed a weird bug where craigslist was sending us in redirect circles around 06/10
+- Added a MaxRedirectError and corresponding maximum_redirects_per_request cattr for the Craigscrape objects. This fixed a weird bug where craigslist was sending us in redirect circles around 06/10
 
 === Release 0.8.2 (April 17, 2010)
 - Found another odd parsing bug. Scrape sample is in 'listing_samples/mia_search_kitten.3.15.10.html', Adjusted CraigScrape::Listings::HEADER_DATE to fix.
 - Craigslist started added <span> tags in its post summaries. Fixed. See sample in test_new_listing_span051710
 
 === Release 0.8.1 (Feb 10, 2010)
-- Found an odd parsing bug occured for the first time today. Scrape sample is in 'listing_samples/mia_sss_kittens2.10.10.html', Adjusted CraigScrape::Listings::LABEL to fix.
+- Found an odd parsing bug occured for the first time today. Scrape sample is in 'listing_samples/mia_sss_kittens2.10.10.html', Adjusted CraigScrape::Listings::LABEL to fix.
 - Switched to require "active_support" per the deprecation notices
 - Little adjustments to fix the rdoc readibility
 
@@ -83,7 +89,7 @@
 - Adjusted the examples in the readme, added a "require 'rubygems'" to the top of the listing so that they would actually work if you tried to run them verbatim (Thanks J T!)
 - Restructured some of the parsing to be less leinient when scraped values aren't matching their regexp's in the PostSummary
 - It seems like craigslist returns a 404 on pages that exist, for no good reason on occasion. Added a retry mechanism that wont take no for an answer, unless we get a defineable number of them in a row
-- Added CraigScrape cattr_accessors : retries_on_fetch_fail, sleep_between_fetch_retries .
+- Added CraigScrape cattr_accessors : retries_on_fetch_fail, sleep_between_fetch_retries .
 - Adjusted craigwatch to not commit any database changes until the notification email goes out. This way if there's an error, the user wont miss any results on a re-run
 - Added a FetchError for http requests that don't return 200 or redirect...
 - Adjusted craigwatch to use scrape_until instead of scrape_since, this new approach cuts down on the url fetching by assuming that if we come across something we've already tracked, we dont need to keep going any further. NOTE: We still can't use a 'last_scraped_url' on the TrackedSearch model b/c sometimes posts get deleted.
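Of the Release 1.1 notes above, the swap of Net::HTTP for Typhoeus is the one most visible to downstream code. As a rough illustration of what that swap looks like at a call site (a hypothetical sketch against the current Typhoeus API, not code from this package; the URL is just an example):

  require 'typhoeus'

  # Where Net::HTTP would do: Net::HTTP.get_response(URI.parse(url)).body
  # Typhoeus issues the request through libcurl; the response object
  # exposes #success? and #body:
  response = Typhoeus.get 'http://miami.craigslist.org/search/sss?query=apple'
  puts response.body if response.success?

Typhoeus can also queue many requests on a Typhoeus::Hydra and run them in parallel, which is a common reason for this kind of migration.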
data/COPYING.LESSER
CHANGED
data/README
CHANGED
@@ -25,10 +25,10 @@ On the 'miami.craigslist.org' site, using the query "search/sss?query=apple"
   require 'libcraigscrape'
   require 'date'
   require 'pp'
-
+
   miami_cl = CraigScrape.new 'us/fl/miami'
   miami_cl.posts_since(Time.parse('Sep 10'), 'search/sss?query=apple').each do |post|
-    pp post
+    pp post
   end
 
 === Scrape Last 225 Craigslist Listings
@@ -38,26 +38,26 @@ On the 'miami.craigslist.org' under the 'apa' category
   require 'rubygems'
   require 'libcraigscrape'
   require 'pp'
-
+
   i=1
   CraigScrape.new('us/fl/miami').each_post('apa') do |post|
     break if i > 225
-
-
+    i+=1
+    pp post
   end
 
 === Multiple site with multiple section/search enumeration of posts
 
-In Florida, with the exception of 'miami.craigslist.org' & 'keys.craigslist.org' sites, output each post in
+In Florida, with the exception of 'miami.craigslist.org' & 'keys.craigslist.org' sites, output each post in
 the 'crg' category and for the search 'artist needed'
 
   require 'rubygems'
   require 'libcraigscrape'
   require 'pp'
-
+
   non_sfl_sites = CraigScrape.new('us/fl', '- us/fl/miami', '- us/fl/keys')
   non_sfl_sites.each_post('crg', 'search/sss?query=artist+needed') do |post|
-
+    pp post
   end
 
 === Scrape Single Craigslist Posting
@@ -66,7 +66,7 @@ This grabs the full details under the specific post http://miami.craigslist.org/
 
   require 'rubygems'
   require 'libcraigscrape'
-
+
   post = CraigScrape::Posting.new 'http://miami.craigslist.org/mdc/sys/1140808860.html'
   puts "(%s) %s:\n %s" % [ post.post_time.strftime('%b %d'), post.title, post.contents_as_plain ]
 
@@ -76,7 +76,7 @@ This grabs the post summaries of the single listings at http://miami.craigslist.
 
   require 'rubygems'
   require 'libcraigscrape'
-
+
   listing = CraigScrape::Listings.new 'http://miami.craigslist.org/search/sss?query=laptop'
   puts 'Found %d posts for the search "laptop" on this page' % listing.posts.length
 
data/Rakefile
CHANGED
@@ -1,8 +1,8 @@
 require 'rake'
 require 'rake/clean'
-require '
-require 'rake/rdoctask'
+require 'rdoc/task'
 require 'rake/testtask'
+require 'rubygems/package_task'
 require 'fileutils'
 require 'tempfile'
 
@@ -11,7 +11,7 @@ include FileUtils
 RbConfig = Config unless defined? RbConfig
 
 NAME = "olek-libcraigscrape"
-VERS = ENV['VERSION'] || "1.0
+VERS = ENV['VERSION'] || "1.1.0"
 PKG = "#{NAME}-#{VERS}"
 
 RDOC_OPTS = ['--quiet', '--title', 'The libcraigscrape Reference', '--main', 'README', '--inline-source']
@@ -35,15 +35,8 @@ SPEC =
   s.homepage = 'http://www.derosetechnologies.com/community/libcraigscrape'
   s.rubyforge_project = 'libcraigwatch'
   s.files = PKG_FILES
-  s.require_paths = ["lib"]
+  s.require_paths = ["lib"]
   s.test_files = FileList['test/test_*.rb']
-  s.add_dependency 'nokogiri', '>= 1.4.4'
-  s.add_dependency 'htmlentities', '>= 4.0.0'
-  s.add_dependency 'activesupport','>= 2.3.0', '< 3'
-  s.add_dependency 'activerecord', '>= 2.3.0', '< 3'
-  s.add_dependency 'actionmailer', '>= 2.3.0', '< 3'
-  s.add_dependency 'kwalify', '>= 0.7.2'
-  s.add_dependency 'sqlite3'
 end
 
 desc "Run all the tests"
@@ -61,7 +54,7 @@ Rake::RDocTask.new do |rdoc|
   rdoc.rdoc_files.add RDOC_FILES+Dir.glob('lib/*.rb').sort_by{|a,b| (a == 'lib/libcraigscrape.rb') ? -1 : 0 }
 end
 
-
+Gem::PackageTask.new(SPEC) do |p|
   p.need_tar = false
   p.need_tar_gz = false
   p.need_tar_bz2 = false
@@ -81,45 +74,3 @@ end
 task :uninstall => [:clean] do
   sh %{sudo gem uninstall #{NAME}}
 end
-
-require 'roodi'
-require 'roodi_task'
-
-namespace :code_tests do
-  desc "Analyze for code complexity"
-  task :flog do
-    require 'flog'
-
-    flog = Flog.new
-    flog.flog_files ['lib']
-    threshold = 105
-
-    bad_methods = flog.totals.select do |name, score|
-      score > threshold
-    end
-
-    bad_methods.sort { |a,b| a[1] <=> b[1] }.each do |name, score|
-      puts "%8.1f: %s" % [score, name]
-    end
-
-    puts "WARNING : #{bad_methods.size} methods have a flog complexity > #{threshold}" unless bad_methods.empty?
-  end
-
-  desc "Analyze for code duplication"
-  require 'flay'
-  task :flay do
-    threshold = 25
-    flay = Flay.new({:fuzzy => false, :verbose => false, :mass => threshold})
-    flay.process(*Flay.expand_dirs_to_files(['lib']))
-
-    flay.report
-
-    raise "#{flay.masses.size} chunks of code have a duplicate mass > #{threshold}" unless flay.masses.empty?
-  end
-
-  RoodiTask.new 'roodi', ['lib/*.rb'], 'roodi.yml'
-end
-
-desc "Run all code tests"
-task :code_tests => %w(code_tests:flog code_tests:flay code_tests:roodi)
-
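The require changes above track two renames in the Ruby/Rake toolchain: Rake::RDocTask (from 'rake/rdoctask') became RDoc::Task in 'rdoc/task', and Rake::GemPackageTask (the truncated removed require was presumably 'rake/gempackagetask') became Gem::PackageTask in 'rubygems/package_task'. A minimal standalone sketch of the replacement task classes (the gemspec path is hypothetical; this Rakefile actually builds its spec inline):

  require 'rdoc/task'
  require 'rubygems/package_task'

  # RDoc::Task is the successor to the deprecated Rake::RDocTask:
  RDoc::Task.new do |rdoc|
    rdoc.rdoc_dir = 'doc'
    rdoc.main = 'README'
    rdoc.rdoc_files.include 'README', 'lib/**/*.rb'
  end

  # Gem::PackageTask succeeds Rake::GemPackageTask and defines the
  # :gem and :package tasks:
  spec = Gem::Specification.load 'example.gemspec' # hypothetical path
  Gem::PackageTask.new(spec) { |p| p.need_tar = false }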
data/bin/craig_report_schema.yml
CHANGED
@@ -25,7 +25,7 @@ mapping:
     mapping:
       "adapter":  { type: str, required: yes }
      "dbfile":   { type: str, required: no }
-      "host":
+      "host":     { type: str, required: no }
       "username": { type: str, required: no }
       "password": { type: str, required: no }
       "socket":   { type: str, required: no }
@@ -50,7 +50,7 @@ mapping:
       "summary_or_full_post_has_no": {type: seq, required: no, sequence: [ {type: str, unique: yes} ]}
       "location_has": {type: seq, required: no, sequence: [ {type: str, unique: yes} ]}
       "location_has_no": {type: seq, required: no, sequence: [ {type: str, unique: yes} ]}
-      "sites":
+      "sites":
         type: seq
         required: yes
         sequence:
@@ -62,7 +62,7 @@ mapping:
         sequence:
           - type: str
             unique: yes
-      "starting":
+      "starting":
         type: str
         required: no
         pattern: /^[\d]{1,2}\/[\d]{1,2}\/(?:[\d]{2}|[\d]{4})$/
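craigwatch checks each report definition YAML against this schema with Kwalify before running a scrape. A minimal sketch of that validation flow, assuming Kwalify's standard API (the report file name is a placeholder):

  require 'kwalify'

  schema   = Kwalify::Yaml.load_file 'craig_report_schema.yml'
  document = Kwalify::Yaml.load_file 'my_report.yml' # placeholder

  # Validator#validate returns an array of Kwalify::ValidationError's,
  # each carrying the YAML path and a message:
  errors = Kwalify::Validator.new(schema).validate(document)
  errors.each { |e| puts "[#{e.path}] #{e.message}" } if errors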
data/bin/craigwatch
CHANGED
@@ -1,4 +1,5 @@
-#!/usr/bin/ruby
+#!/usr/bin/env ruby
+# encoding: UTF-8
 #
 # =craigwatch - A email-based "post monitoring" solution
 #
@@ -160,9 +161,9 @@ $: << File.dirname(__FILE__) + '/../lib'
 
 require 'rubygems'
 
-gem 'kwalify'
-gem 'activerecord'
-gem 'actionmailer'
+gem 'kwalify'
+gem 'activerecord'
+gem 'actionmailer'
 
 require 'kwalify'
 require 'active_record'
@@ -252,7 +253,7 @@ class CraigReportDefinition #:nodoc:
 
   def starting_at
     (@starting) ?
-      Time.
+      Time.strptime(@starting, "%m/%d/%Y") :
       Time.now.yesterday.beginning_of_day
   end
 
@@ -290,17 +291,23 @@ class CraigReportDefinition #:nodoc:
   private
 
   def matches_all?(conditions, against)
-
-    (conditions.nil? or conditions.all?{|c| against.any?{|a| match_against c, a } }) ? true : false
+    (conditions.nil? or conditions.all?{|c| sanitized_against(against).any?{|a| match_against c, a } }) ? true : false
   end
 
   def doesnt_match_any?(conditions, against)
-
-    (conditions.nil? or conditions.all?{|c| against.any?{|a| !match_against c, a } }) ? true : false
+    (conditions.nil? or conditions.all?{|c| sanitized_against(against).any?{|a| !match_against c, a } }) ? true : false
   end
 
   def match_against(condition, against)
-    (against.scan( condition.is_re? ? condition.to_re : /#{condition}/i).length > 0) ? true : false
+    (CraigScrape::Scraper.he_decode(against).scan( condition.is_re? ? condition.to_re : /#{condition}/i).length > 0) ? true : false
+  end
+
+  # This is kind of a hack to deal with ruby 1.9. Really the filtering mechanism
+  # needs to be factored out and tested....
+  def sanitized_against(against)
+    against = against.lines if against.respond_to? :lines
+    against = against.to_a if against.respond_to? :to_a
+    (against.nil?) ? [] : against.compact
   end
 end
 end
@@ -353,24 +360,12 @@ class TrackedPost < ActiveRecord::Base #:nodoc:
 end
 
 class ReportMailer < ActionMailer::Base #:nodoc:
-
-
-    formatted_subject = Time.now.strftime(subject_template)
-
-    recipients to
-    from sender
-    subject formatted_subject
+  # default :template_path => File.dirname(__FILE__)
 
-
-
-
-
-    part( :content_type => "multipart/alternative" ) do |p|
-      [
-        { :content_type => "text/plain", :body => render_message("#{view_name.to_s}.plain.erb", tmpl) },
-        { :content_type => "text/html", :body => render_message("#{view_name.to_s}.html.erb", tmpl.merge({:part_container => p})) }
-      ].each { |parms| p.part parms.merge( { :charset => "UTF-8", :transfer_encoding => "7bit" } ) }
-    end
+  def report(to, sender, subject_template, report_tmpl)
+    subject = Time.now.strftime subject_template
+    @summaries = report_tmpl[:summaries]
+    mail :to => to, :subject => subject, :from => sender
   end
 end
 
@@ -405,13 +400,14 @@ parser.errors.each do |e|
 end and exit if parser.errors.length > 0
 
 # Initialize Action Mailer:
+ActionMailer::Base.prepend_view_path(File.dirname(__FILE__))
 ActionMailer::Base.logger = Logger.new STDERR if craig_report.debug_mailer?
 if craig_report.smtp_settings
-
+  ActionMailer::Base.smtp_settings = craig_report.smtp_settings
+  ActionMailer::Base.delivery_method = :smtp
 else
-
+  ActionMailer::Base.delivery_method = :sendmail
 end
-ReportMailer.template_root = File.dirname __FILE__
 
 # Initialize the database:
 ActiveRecord::Base.logger = Logger.new STDERR if craig_report.debug_database?
@@ -517,7 +513,7 @@ report_summaries = craig_report.searches.collect do |search|
   # Now let's add these urls to the database so as to reduce memory overhead.
   # Keep in mind - they're not active until the email goes out.
   # also - we shouldn't have to worry about putting 'irrelevant' posts in the db, since
-  # the
+  # the newest are always the first ones parsed:
   tracked_listing.posts.create(
     :url => post.url,
     :created_at => newest_post_date
@@ -530,18 +526,10 @@ report_summaries = craig_report.searches.collect do |search|
     end
   end
 
-
+
 
   # Let's flatten the unique'd hash into a more useable array:
-
-  # * We try not to load the whole post if we don't have to
-  # * Its possible that we met all the criterion of the passes_filter? with merely a header, and
-  #   if so we add a url to the summaries stack
-  # * Unfortunately, when we later load that post in full, we may find that the post was posting_has_expired?
-  #   or flagged_for_removal?, etc.
-  # * If this was the case, below we'll end up sorting against nil post_dates. This would fail.
-  # * So - before we sort, we run a quick reject on nil post_dates
-  new_summaries = new_summaries.values.reject{|v| v.post_date.nil? }.sort{|a,b| a.post_date <=> b.post_date} # oldest goes to bottom
+  new_summaries = new_summaries.values.sort{|a,b| a.post_date <=> b.post_date} # oldest goes to bottom
 
   # Now Let's manage the tracking database:
   if new_summaries.length > 0
@@ -562,13 +550,13 @@ report_summaries = craig_report.searches.collect do |search|
 end
 
 # Time to send the email (maybe):
-unless report_summaries.select { |s| !
-  ReportMailer.
+unless report_summaries.select { |s| !s[:postings].empty? }.empty?
+  ReportMailer.report(
     craig_report.email_to,
     craig_report.email_from,
     craig_report.report_name,
     {:summaries => report_summaries, :definition => craig_report}
-  )
+  ).deliver
 end
 
 # Commit (make 'active') all newly created tracked post urls:
data/bin/report_mailer/report.html.erb
ADDED
@@ -0,0 +1,17 @@
+<h2><%=h @subject %></h2>
+<%@summaries.each do |summary| %>
+  <h3><%=h summary[:search].name%></h3>
+  <% if summary[:postings].length > 0 %>
+    <%summary[:postings].each do |post|%>
+      <%=('<p>%s <a href="%s">%s -</a>%s%s</p>' % [
+        h(post.post_date.strftime('%b %d')),
+        post.url,
+        h(post.label),
+        (post.location) ? '<font size="-1"> (%s)</font>' % h(post.location) : '',
+        (post.has_pic_or_img?) ? ' <span style="color: orange"> img</span>': ''
+      ]).html_safe -%>
+    <% end %>
+  <% else %>
+    <p><i>No new postings were found, which matched the search criteria.</i></p>
+  <% end %>
+<% end %>
data/bin/report_mailer/{craigslist_report.plain.erb → report.text.erb}
RENAMED
@@ -1,15 +1,15 @@
 CRAIGSLIST REPORTER
 
-
+<% @summaries.each do |summary| -%>
 <%=summary[:search].name %>
 <% summary[:postings].collect do |post| -%>
 <% if summary[:postings].length > 0 %>
 <%='%s : %s %s %s %s' % [
-
-
-
-
-
+  post.post_date.strftime('%b %d'),
+  post.label,
+  (post.location) ? " (#{post.location})" : '',
+  (post.has_pic_or_img?) ? ' [img]': '',
+  post.url
 ] -%>
 <% else %>
 No new postings were found, which matched the search criteria.
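The ReportMailer rewrite above is the core of the rails 2 → rails 3 migration called out in the CHANGELOG: the hand-built part(:content_type => "multipart/alternative") block disappears, and the Rails 3 mailer just assigns instance variables and calls mail, letting ActionMailer find report.text.erb and report.html.erb on the view path and compose the multipart message by naming convention. A condensed sketch of that pattern (addresses are placeholders; it assumes the two templates exist on the view path):

  require 'action_mailer'

  class ReportMailer < ActionMailer::Base
    # Rails 3 style: set ivars for the templates, then call #mail.
    def report(to, sender, subject_template, report_tmpl)
      @summaries = report_tmpl[:summaries]
      mail :to => to, :from => sender,
           :subject => Time.now.strftime(subject_template)
    end
  end

  # Rails 3 mailer methods return a Mail::Message; nothing is sent
  # until #deliver is called -- hence the ).deliver change above:
  ReportMailer.report('to@example.com', 'from@example.com',
                      'Report %m/%d', :summaries => []).deliver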
data/lib/geo_listings.rb
CHANGED
@@ -1,19 +1,19 @@
 # = About geo_listings.rb
 #
 # This file contains the parsing code, and logic relating to geographic site pages and paths. You
-# should never need to include this file directly, as all of libcraigscrape's objects and methods
+# should never need to include this file directly, as all of libcraigscrape's objects and methods
 # are loaded when you use <tt>require 'libcraigscrape'</tt> in your code.
 #
 
 require 'scraper'
 
 class CraigScrape
-
-  # GeoListings represents a parsed Craigslist geo lisiting page. (i.e. {'http://geo.craigslist.org/iso/us'}[http://geo.craigslist.org/iso/us])
+
+  # GeoListings represents a parsed Craigslist geo lisiting page. (i.e. {'http://geo.craigslist.org/iso/us'}[http://geo.craigslist.org/iso/us])
   # These list all the craigslist sites in a given region.
   class GeoListings < Scraper
     GEOLISTING_BASE_URL = %{http://geo.craigslist.org/iso/}
-
+
     LOCATION_NAME = /[ ]*\>[ ](.+)[ ]*/
     PATH_SCANNER = /(?:\\\/|[^\/])+/
     URL_HOST_PART = /^[^\:]+\:\/\/([^\/]+)[\/]?$/
@@ -31,18 +31,18 @@ class CraigScrape
       # Validate that required fields are present, at least - if we've downloaded it from a url
       parse_error! unless location
     end
-
+
     # Returns the GeoLocation's full name
     def location
       unless @location
         cursor = html % 'h3 > b > a:first-of-type'
-        cursor = cursor.next if cursor
+        cursor = cursor.next if cursor
         @location = $1 if cursor and LOCATION_NAME.match he_decode(cursor.to_s)
       end
-
+
       @location
     end
-
+
     # Returns a hash of site name to urls in the current listing
     def sites
       unless @sites
@@ -52,27 +52,27 @@ class CraigScrape
         @sites[site_name] = $1 if URL_HOST_PART.match el_a[:href]
        end
      end
-
+
      @sites
    end
-
+
    # This method will return an array of all possible sites that match the specified location path.
    # Sample location paths:
    # - us/ca
    # - us/fl/miami
    # - jp/fukuoka
    # - mx
-    # Here's how location paths work.
+    # Here's how location paths work.
    # - The components of the path are to be separated by '/' 's.
    # - Up to (and optionally, not including) the last component, the path should correspond against a valid GeoLocation url with the prefix of 'http://geo.craigslist.org/iso/'
    # - the last component can either be a site's 'prefix' on a GeoLocation page, or, the last component can just be a geolocation page itself, in which case all the sites on that page are selected.
    # - the site prefix is the first dns record in a website listed on a GeoLocation page. (So, for the case of us/fl/miami , the last 'miami' corresponds to the 'south florida' link on {'http://geo.craigslist.org/iso/us/fl'}[http://geo.craigslist.org/iso/us/fl]
    def self.sites_in_path(full_path, base_url = GEOLISTING_BASE_URL)
      # the base_url parameter is mostly so we can test this method
-
-      # Unfortunately - the easiest way to understand much of this is to see how craigslist returns
+
+      # Unfortunately - the easiest way to understand much of this is to see how craigslist returns
      # these geolocations. Watch what happens when you request us/fl/non-existant/page/here.
-      # I also made this a little forgiving in a couple ways not specified with official support, per
+      # I also made this a little forgiving in a couple ways not specified with official support, per
      # the rules above.
      full_path_parts = full_path.scan PATH_SCANNER
 
@@ -82,15 +82,15 @@ class CraigScrape
      full_path_parts.each_with_index do |part, i|
 
        # Let's un-escape the path-part, if needed:
-        part.gsub! "\\/", "/"
+        part.gsub! "\\/", "/"
 
        # If they're specifying a single site, this will catch and return it immediately
-        site = geo_listing.sites.find{ |n,s|
+        site = geo_listing.sites.find{ |n,s|
          (SITE_PREFIX.match s and $1 == part) or n == part
        } if geo_listing
 
        # This returns the site component of the found array
-        return [site.last] if site
+        return [site.last] if site
 
        begin
          # The URI escape is mostly needed to translate the space characters
@@ -109,9 +109,9 @@ class CraigScrape
      geo_listing.sites.collect{|n,s| s }
    end
 
-    # find_sites takes a single array of strings as an argument. Each string is to be either a location path
+    # find_sites takes a single array of strings as an argument. Each string is to be either a location path
    # (see sites_in_path), or a full site (in canonical form - ie "memphis.craigslist.org"). Optionally,
-    # each of this may/should contain a '+' or '-' prefix to indicate whether the string is supposed to
+    # each of this may/should contain a '+' or '-' prefix to indicate whether the string is supposed to
    # include sites from the master list, or remove them from the list. If no '+' or'-' is
    # specified, the default assumption is '+'. Strings are processed from left to right, which gives
    # a high degree of control over the selection set. Examples:
@@ -122,23 +122,23 @@ class CraigScrape
    # There's a lot of flexibility here, you get the idea.
    def self.find_sites(specs, base_url = GEOLISTING_BASE_URL)
      ret = []
-
+
      specs.each do |spec|
        (op,spec = $1,$2) if FIND_SITES_PARTS.match spec
 
-        spec = (spec.include? '.') ? [spec] : sites_in_path(spec, base_url)
+        spec = (spec.include? '.') ? [spec] : sites_in_path(spec, base_url)
 
        (op == '-') ? ret -= spec : ret |= spec
      end
-
+
      ret
    end
 
    private
-
+
    def self.bad_geo_path!(path)
      raise BadGeoListingPath, "Unable to load path #{path.inspect}, either you're having problems connecting to Craiglist, or your path is invalid."
    end
-
+
  end
 end
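The geo_listings.rb changes in this release appear to be whitespace-only in this view, but the find_sites comments above document behavior worth restating: inclusion and exclusion specs are processed left to right against a running list of sites. A short usage sketch of that documented behavior, mirroring the README's Florida example:

  require 'libcraigscrape'

  # Every Florida site, minus South Florida and the Keys; '-' entries
  # subtract from the running result, unprefixed entries add to it:
  sites = CraigScrape::GeoListings.find_sites ['us/fl', '- us/fl/miami', '- us/fl/keys']
  sites.each { |s| puts s } # canonical site names, e.g. "jacksonville.craigslist.org"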