RubyGems - somezack-feedzirra - Versions diffs - 0.0.2 → 0.0.3 - Mend

somezack-feedzirra 0.0.2 → 0.0.3

Files changed (14)

data/README.textile +10 -6
data/lib/feedzirra/atom_entry.rb +1 -0
data/lib/feedzirra/atom_feed_burner_entry.rb +2 -0
data/lib/feedzirra/feed.rb +22 -6
data/lib/feedzirra/feed_entry_utilities.rb +23 -14
data/lib/feedzirra/rss_entry.rb +1 -0
data/lib/feedzirra.rb +2 -1
data/spec/feedzirra/atom_entry_spec.rb +4 -0
data/spec/feedzirra/atom_feed_burner_entry_spec.rb +9 -0
data/spec/feedzirra/feed_entry_utilities_spec.rb +8 -1
data/spec/feedzirra/feed_spec.rb +11 -1
data/spec/feedzirra/rss_entry_spec.rb +4 -0
data/spec/feedzirra/rss_spec.rb +5 -4
metadata +13 -3

data/README.textile CHANGED

@@ -42,6 +42,11 @@ NoMethodError: undefined method `on_success' for #<Curl::Easy:0x1182724>
 </pre>
 This means that you are requiring curl-multi or the Ruby Forge version of Curb somewhere. You can't use those and need to get the taf2 version up and running.
+If you're on Debian or Ubuntu and getting errors while trying to install the taf2-curb gem, it could be because you don't have the latest version of libcurl installed. Do this to fix:
+<pre>
+sudo apt-get install libcurl4-gnutls-dev
+</pre>
 Another problem could be if you are running Mac Ports and you have libcurl installed through there. You need to uninstall it for curb to work! The version in Mac Ports is old and doesn't play nice with curb. If you're running Leopard, you can just uninstall and you should be golden. If you're on an older version of OS X, you'll then need to "download curl":http://curl.haxx.se/download.html and build from source. Then you'll have to install the taf2-curb gem again. You might have to perform the step above.
 If you're still having issues, please let me know on the mailing list. Also, "Todd Fisher (taf2)":http://github.com/taf2 is working on fixing the gem install. Please send him a full error report.
@@ -69,11 +74,13 @@ entry.author     # => "Paul Dix"
 entry.summary    # => "..."
 entry.content    # => "..."
 entry.published  # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
+entry.categories # => ["...", "..."]
 # sanitizing an entry's content
-entry.sanitized.title   # => returns the title with harmful stuff escaped
-entry.sanitized.author  # => returns the author with harmful stuff escaped
-entry.sanitized.content # => returns the content with harmful stuff escaped
+entry.title.sanitize   # => returns the title with harmful stuff escaped
+entry.author.sanitize  # => returns the author with harmful stuff escaped
+entry.content.sanitize # => returns the content with harmful stuff escaped
+entry.content.sanitize! # => returns content with harmful stuff escaped and replaces original (also exists for author and title)
 entry.sanitize!         # => sanitizes the entry's title, author, and content in place (as in, it changes the value to clean versions)
 feed.sanitize_entries!  # => sanitizes all entries in place
@@ -133,13 +140,10 @@ This thing needs to hammer on many different feeds in the wild. I'm sure there w
 Here are some more specific TODOs.
 * Make a feedzirra-rails gem to integrate feedzirra seamlessly with Rails and ActiveRecord.
-* Add function to sanitize content.
-* Add support to automatically handle gzip and deflate encododing.
 * Add support for authenticated feeds.
 * Create a super sweet DSL for defining new parsers.
 * Test against Ruby 1.9.1 and fix any bugs.
 * I'm not keeping track of modified on entries. Should I add this?
-* Should I be parsing stuff like tags or categories for entries?
 * Clean up the fetching code inside feed.rb so it doesn't suck so hard.
 * Make the feed_spec actually mock stuff out so it doesn't hit the net.
 * Readdress how feeds determine if they can parse a document. Maybe I should use namespaces instead?

data/lib/feedzirra/atom_entry.rb CHANGED

@@ -9,5 +9,6 @@ module Feedzirra
     element :summary
     element :published
     element :created, :as => :published
+    elements :category, :as => :categories, :value => :term
   end
 end

data/lib/feedzirra/atom_feed_burner_entry.rb CHANGED

@@ -4,9 +4,11 @@ module Feedzirra
     include FeedEntryUtilities
     element :title
     element :name, :as => :author
+    element :link, :as => :url, :value => :href, :with => {:type => "text/html", :rel => "alternate"}
     element :"feedburner:origLink", :as => :url
     element :summary
     element :content
     element :published
+    elements :category, :as => :categories, :value => :term
   end
 end

data/lib/feedzirra/feed.rb CHANGED

@@ -29,17 +29,18 @@ module Feedzirra
     # when passed a single url it returns the body of the response
     # when passed an array of urls it returns a hash with the urls as keys and body of responses as values
     def self.fetch_raw(urls, options = {})
-      urls = [*urls]
+      url_queue = [*urls]
       multi = Curl::Multi.new
       responses = {}
-      urls.each do |url|
+      url_queue.each do |url|
         easy = Curl::Easy.new(url) do |curl|
           curl.headers["User-Agent"]        = (options[:user_agent] || USER_AGENT)
           curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
           curl.headers["If-None-Match"]     = options[:if_none_match] if options.has_key?(:if_none_match)
+          curl.headers["Accept-encoding"]   = 'gzip, deflate'
           curl.follow_location = true
           curl.on_success do |c|
-            responses[url] = c.body_str
+            responses[url] = decode_content(c)
           end
           curl.on_failure do |c|
             responses[url] = c.response_code
@@ -49,7 +50,7 @@ module Feedzirra
       end
       multi.perform
-      return responses.size == 1 ? responses.values.first : responses
+      return urls.is_a?(String) ? responses.values.first : responses
     end
     def self.fetch_and_parse(urls, options = {})
@@ -64,7 +65,21 @@ module Feedzirra
       end
       multi.perform
-      return responses.size == 1 ? responses.values.first : responses
+      return urls.is_a?(String) ? responses.values.first : responses
+    end
+    def self.decode_content(c)
+      if c.header_str.match(/Content-Encoding: gzip/)
+        gz =  Zlib::GzipReader.new(StringIO.new(c.body_str))
+        xml = gz.read
+        gz.close
+      elsif c.header_str.match(/Content-Encoding: deflate/)
+        xml = Zlib::Deflate.inflate(c.body_str)
+      else
+        xml = c.body_str
+      end
+      xml
     end
     def self.update(feeds, options = {})
@@ -84,10 +99,11 @@ module Feedzirra
         curl.headers["User-Agent"]        = (options[:user_agent] || USER_AGENT)
         curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
         curl.headers["If-None-Match"]     = options[:if_none_match] if options.has_key?(:if_none_match)
+        curl.headers["Accept-encoding"]   = 'gzip, deflate'
         curl.follow_location = true
         curl.on_success do |c|
           add_url_to_multi(multi, url_queue.shift, url_queue, responses, options) unless url_queue.empty?
-          xml = c.body_str
+          xml = decode_content(c)
           klass = determine_feed_parser_for_xml(xml)
           if klass
             feed = klass.parse(xml)
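
The decoding step added in this hunk can be sketched on its own, without curb. The helper below is a hypothetical stand-alone version (the name `decode_body` and its two string arguments are assumptions for illustration, not part of Feedzirra). One caveat worth hedging: in Ruby's standard zlib bindings the one-shot decompression class method is `Zlib::Inflate.inflate`, so the `Zlib::Deflate.inflate` call in the deflate branch above appears likely to raise `NoMethodError`; the sketch uses `Zlib::Inflate.inflate` instead. It also matches the header case-insensitively, which is a choice made here rather than something the diff does.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: takes the raw response header string and body
# instead of a Curl::Easy handle, so it runs without the curb gem.
def decode_body(header_str, body)
  case header_str
  when /Content-Encoding:\s*gzip/i
    gz = Zlib::GzipReader.new(StringIO.new(body))
    begin
      gz.read
    ensure
      gz.close # always release the reader, even if read raises
    end
  when /Content-Encoding:\s*deflate/i
    # Zlib::Inflate (not Zlib::Deflate) provides the inflate class method.
    Zlib::Inflate.inflate(body)
  else
    body # no recognised encoding: pass the body through unchanged
  end
end

# Round-trip a small document through each branch.
xml      = "<rss><channel><title>example</title></channel></rss>"
gzipped  = Zlib.gzip(xml)              # requires Ruby 2.4 or later
deflated = Zlib::Deflate.deflate(xml)  # zlib-wrapped deflate stream

from_gzip    = decode_body("Content-Encoding: gzip\r\n", gzipped)
from_deflate = decode_body("Content-Encoding: deflate\r\n", deflated)
from_plain   = decode_body("Content-Type: text/xml\r\n", xml)
```

All three calls should hand back the original XML, which is the property the fetch code relies on before passing the body to a parser.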

data/lib/feedzirra/feed_entry_utilities.rb CHANGED

@@ -1,5 +1,15 @@
 module Feedzirra
   module FeedEntryUtilities
+    module Sanitize
+      def sanitize!
+        self.replace(sanitize)
+      end
+      def sanitize
+        Dryopteris.sanitize(self)
+      end
+    end
     attr_reader :published
     def parse_datetime(string)
@@ -10,23 +20,22 @@ module Feedzirra
       @published = parse_datetime(val)
     end
-    def sanitized
-      dispatcher = Class.new do
-        def initialize(entry)
-          @entry = entry
-        end
-        def method_missing(method, *args)
-          Dryopteris.sanitize(@entry.send(method))
-        end
-      end
-      dispatcher.new(self)
+    def content
+      @content.extend(Sanitize)
+    end
+    def title
+      @title.extend(Sanitize)
+    end
+    def author
+      @author.extend(Sanitize)
     end
     def sanitize!
-      self.title   = sanitized.title
-      self.author  = sanitized.author
-      self.content = sanitized.content
+      self.title.sanitize!
+      self.author.sanitize!
+      self.content.sanitize!
     end
     alias_method :last_modified, :published
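
The change above swaps the `sanitized` dispatcher object for a module mixed into each string on read. A minimal sketch of that pattern follows, assuming a stand-in sanitizer: `FakeSanitizer` only escapes a few HTML characters so the example runs without the dryopteris gem, and it is not meant to reproduce what `Dryopteris.sanitize` actually does. The `Entry` class is likewise hypothetical.

```ruby
# Stand-in for Dryopteris.sanitize (an assumption for this sketch only).
module FakeSanitizer
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

  def self.sanitize(string)
    string.gsub(/[&<>]/, ESCAPES)
  end
end

module Sanitize
  # Returns a cleaned copy and leaves the receiver untouched.
  def sanitize
    FakeSanitizer.sanitize(self)
  end

  # Overwrites the receiver's contents via String#replace and returns it.
  def sanitize!
    replace(sanitize)
  end
end

# Hypothetical entry class with a single field.
class Entry
  attr_writer :title

  # Every read extends the stored string, so callers can chain .sanitize.
  def title
    @title.extend(Sanitize)
  end
end

entry = Entry.new
entry.title = "<script>alert(1)</script>Hello".dup

clean_copy   = entry.title.sanitize   # new string, entry unchanged
before_bang  = entry.title.dup        # still the raw value
entry.title.sanitize!                 # mutates the stored string in place
after_bang   = entry.title
```

Because `sanitize!` relies on `String#replace`, it mutates the very object held in the instance variable, which is why the in-place spec duplicates its input string before assigning it.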

data/lib/feedzirra/rss_entry.rb CHANGED

@@ -11,5 +11,6 @@ module Feedzirra
     element :pubDate, :as => :published
     element :"dc:date", :as => :published
+    elements :category, :as => :categories
   end
 end

data/lib/feedzirra.rb CHANGED

@@ -2,6 +2,7 @@ $LOAD_PATH.unshift(File.dirname(__FILE__)) unless $LOAD_PATH.include?(File.dirna
 gem 'activesupport'
+require 'zlib'
 require 'curb'
 require 'sax-machine'
 require 'dryopteris'
@@ -26,5 +27,5 @@ require 'feedzirra/atom'
 require 'feedzirra/atom_feed_burner'
 module Feedzirra
-  VERSION = "0.0.1"
+  VERSION = "0.0.3"
 end

data/spec/feedzirra/atom_entry_spec.rb CHANGED

@@ -30,4 +30,8 @@ describe Feedzirra::AtomEntry do
   it "should parse the published date" do
     @entry.published.to_s.should == "Fri Jan 16 18:21:00 UTC 2009"
   end
+  it "should parse the categories" do
+    @entry.categories.should == ['Turkey', 'Seattle']
+  end
 end

data/spec/feedzirra/atom_feed_burner_entry_spec.rb CHANGED

@@ -11,6 +11,11 @@ describe Feedzirra::AtomFeedBurnerEntry do
     @entry.title.should == "Making a Ruby C library even faster"
   end
+  it "should be able to fetch a url via the 'alternate' rel if no origLink exists" do
+    entry = Feedzirra::AtomFeedBurner.parse(File.read("#{File.dirname(__FILE__)}/../sample_feeds/PaulDixExplainsNothingAlternate.xml")).entries.first
+    entry.url.should == 'http://feeds.feedburner.com/~r/PaulDixExplainsNothing/~3/519925023/making-a-ruby-c-library-even-faster.html'
+  end
   it "should parse the url" do
     @entry.url.should == "http://www.pauldix.net/2009/01/making-a-ruby-c-library-even-faster.html"
   end
@@ -30,4 +35,8 @@ describe Feedzirra::AtomFeedBurnerEntry do
   it "should parse the published date" do
     @entry.published.to_s.should == "Thu Jan 22 15:50:22 UTC 2009"
   end
+  it "should parse the categories" do
+    @entry.categories.should == ['Ruby', 'Another Category']
+  end
 end

data/spec/feedzirra/feed_entry_utilities_spec.rb CHANGED

@@ -24,7 +24,14 @@ describe Feedzirra::FeedUtilities do
     it "should provide a sanitized title" do
       new_title = "<script>" + @entry.title
       @entry.title = new_title
-      @entry.sanitized.title.should == Dryopteris.sanitize(new_title)
+      @entry.title.sanitize.should == Dryopteris.sanitize(new_title)
+    end
+    it "should sanitize content in place" do
+      new_content = "<script>" + @entry.content
+      @entry.content = new_content.dup
+      @entry.content.sanitize!.should == Dryopteris.sanitize(new_content)
+      @entry.content.should == Dryopteris.sanitize(new_content)
     end
     it "should sanitize things in place" do

data/spec/feedzirra/feed_spec.rb CHANGED

@@ -113,7 +113,7 @@ describe Feedzirra::Feed do
   describe "fetching feeds" do
     before(:each) do
       @paul_feed_url = "http://feeds.feedburner.com/PaulDixExplainsNothing"
-      @trotter_feed_url = "http://feeds.feedburner.com/trottercashion"
+      @trotter_feed_url = "http://feeds2.feedburner.com/trottercashion"
     end
     describe "handling many feeds" do
@@ -139,6 +139,11 @@ describe Feedzirra::Feed do
         results[@paul_feed_url].should =~ /Paul Dix/
         results[@trotter_feed_url].should =~ /Trotter Cashion/
       end
+      it "should always return a hash when passed an array" do
+        results = Feedzirra::Feed.fetch_raw([@paul_feed_url])
+        results.class.should == Hash
+      end
     end
     describe "#fetch_and_parse" do
@@ -169,6 +174,11 @@ describe Feedzirra::Feed do
         feeds[@trotter_feed_url].feed_url.should == @trotter_feed_url
       end
+      it "should always return a hash when passed an array" do
+        feeds = Feedzirra::Feed.fetch_and_parse([@paul_feed_url])
+        feeds.class.should == Hash
+      end
       it "should yeild the url and feed object to a :on_success lambda" do
         successful_call_mock = mock("successful_call_mock")
         successful_call_mock.should_receive(:call)
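
The two "should always return a hash when passed an array" specs above pin down the return-shape change in feed.rb. A rough sketch of the difference, with the network layer replaced by a hypothetical `collect` helper that fabricates a body per URL (all three method names are assumptions for illustration):

```ruby
# Hypothetical stand-in for the fetch loop: maps each URL to a fake body.
def collect(urls)
  [*urls].each_with_object({}) { |url, acc| acc[url] = "body of #{url}" }
end

# 0.0.2 rule: the shape depends on how many responses came back.
def fetch_by_count(urls)
  responses = collect(urls)
  responses.size == 1 ? responses.values.first : responses
end

# 0.0.3 rule: the shape depends on the type of the argument.
def fetch_by_type(urls)
  responses = collect(urls)
  urls.is_a?(String) ? responses.values.first : responses
end

old_single_array = fetch_by_count(["http://example.com/feed"])
new_single_array = fetch_by_type(["http://example.com/feed"])
new_single_url   = fetch_by_type("http://example.com/feed")
```

Under the count-based rule a one-element array comes back as a bare body, so callers iterating over a list of URLs would need a special case; keying on the argument type keeps the array-in, hash-out contract stable.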

data/spec/feedzirra/rss_entry_spec.rb CHANGED

@@ -30,4 +30,8 @@ describe Feedzirra::RSSEntry do
   it "should parse the published date" do
     @entry.published.to_s.should == "Thu Dec 04 17:17:49 UTC 2008"
   end
+  it "should parse the categories" do
+    @entry.categories.should == ['computadora', 'nokogiri', 'rails']
+  end
 end

data/spec/feedzirra/rss_spec.rb CHANGED

@@ -5,10 +5,11 @@ describe Feedzirra::RSS do
     it "should return true for an RSS feed" do
       Feedzirra::RSS.should be_able_to_parse(sample_rss_feed)
     end
-    it "should return false for an rdf feed" do
-      Feedzirra::RSS.should_not be_able_to_parse(sample_rdf_feed)
-    end
+    # this is no longer true. combined rdf and rss into one
+    # it "should return false for an rdf feed" do
+    #   Feedzirra::RSS.should_not be_able_to_parse(sample_rdf_feed)
+    # end
     it "should return fase for an atom feed" do
       Feedzirra::RSS.should_not be_able_to_parse(sample_atom_feed)

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: somezack-feedzirra
 version: !ruby/object:Gem::Version
-  version: 0.0.2
+  version: 0.0.3
 platform: ruby
 authors:
 - Paul Dix
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2009-01-22 00:00:00 -08:00
+date: 2009-02-19 00:00:00 -08:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -30,7 +30,7 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 0.0.7
+        version: 0.0.9
     version:
 - !ruby/object:Gem::Dependency
   name: taf2-curb
@@ -42,6 +42,16 @@ dependencies:
       - !ruby/object:Gem::Version
         version: 0.2.3
     version:
+- !ruby/object:Gem::Dependency
+  name: builder
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 2.1.2
+    version:
 - !ruby/object:Gem::Dependency
   name: activesupport
   type: :runtime