RubyGems - nikkou - Versions diffs - 0.0.2 → 0.0.3 - Mend

nikkou 0.0.2 → 0.0.3

Files changed (12)

data/README.md +131 -22
data/Rakefile +1 -1
data/lib/nikkou/nokogiri/xml/node.rb +71 -7
data/lib/nikkou/nokogiri/xml/node_set.rb +18 -0
data/lib/nikkou/version.rb +1 -1
data/spec/drillable_spec.rb +1 -1
data/spec/files/test.html +5 -0
data/spec/findable_spec.rb +1 -1
data/spec/node_set_spec.rb +22 -8
data/spec/node_spec.rb +73 -6
data/spec/spec_helper.rb +0 -1
metadata +4 -4

data/README.md CHANGED

@@ -1,75 +1,184 @@
 Nikkou
 ======
-description
+Extract useful data from HTML and XML with ease!
 Description
 -----------
-Nikkou...
+Nikkou adds methods to Nokogiri that make extracting commonly used data from HTML and XML easier. It lets you quickly transform HTML into structured data, and it integrates nicely with [Mechanize](https://github.com/sparklemotion/mechanize).
-### time
+Method Overview
+---------------
+Here's a summary of the methods Nikkou provides (see "Methods" for details):
+### Formatting
+**parse_text** - Parses the node's text as XML and returns it as a Nokogiri::XML::NodeSet
+**time(options={})** - Intelligently parses the time (relative or absolute) of either the text or a specified attribute; accepts a `time_zone` option
+**url(attribute='href')** - Converts the href (or other specified attribute) into an absolute URL using the document's URI; `<a href="/p/1">Link</a>` yields `http://mysite.com/p/1`
+### Searching
+**attr_equals(attribute, string)** - Finds nodes where the attribute equals the string
+**attr_includes(attribute, string)** - Finds nodes where the attribute includes the string
+**attr_matches(attribute, pattern)** - Finds nodes where the attribute matches the pattern
+**drill(*methods)** - Nil-safe method chaining
+**find(path)** - Same as `search` but returns the first matched node
+**text_equals(string)** - Finds nodes where the text equals the string
+**text_includes(string)** - Finds nodes where the text includes the string
+**text_matches(pattern)** - Finds nodes where the text matches the pattern
+## Methods
+### Formatting
+#### time(options={})
 Returns a Time object (in UTC) by automatically parsing the text or specified attribute of the node.
 ```ruby
 # <a href="/p/1">3 hours ago</a>
-doc.search('a').first.time # 2013-04-16 02:42:34 UTC
+doc.search('a').first.time
 ```
 ###### Options
-`attribute` - The attribute to parse:
+`attribute`
+The attribute to parse:
 ```ruby
-# <a href="/p/1" data-published-at="2013-04-16 02:42:34">My link</a>
-doc.search('a').first.time(attribute: 'data-published-at') # 2013-04-16 02:42:34 UTC
+# <a href="/p/1" data-published-at="2013-05-22 02:42:34">My link</a>
+doc.search('a').first.time(attribute: 'data-published-at')
 ```
-`time_zone` - The document's time zone (the time will be converted from that to UTC):
+`time_zone`
+The document's time zone (the time will be converted from that to UTC):
 ```ruby
 # <a href="/p/1">3 hours ago</a>
-doc.search('a').first.time(time_zone: 'America/New_York') # 2013-04-16 06:42:34 UTC
+doc.search('a').first.time(time_zone: 'America/New_York')
 ```
-#### url
+#### url(attribute='href')
-Returns an absolute URL; useful for parsing relative hrefs. The document's `uri` needs to be set for Nikkou to know what domain to add to relative hrefs.
+Returns an absolute URL; useful for parsing relative hrefs. The document's `uri` needs to be set for Nikkou to know what domain to add to relative paths.
 ```ruby
 # <a href="/p/1">My link</a>
 doc.uri = 'http://mysite.com/mypage'
-doc.search('a').first.url # http://mysite.com/p/1
+doc.search('a').first.url # "http://mysite.com/p/1"
 ```
+If Mechanize is being used, the `uri` doesn't need to be manually set.
 ###### Options
-`attribute` - The attribute to parse:
+`attribute`
+The attribute to parse:
 ```ruby
 # <a href="/p/1" data-comments-url="/p/1#comments">My Link</a>
 doc.uri = 'http://mysite.com/mypage'
-doc.search('a').first.url('data-comments-url') # http://mysite.com/p/1#comments
+doc.search('a').first.url('data-comments-url') # "http://mysite.com/p/1#comments"
+```
+### Searching
+#### attr_equals(attribute, string)
+Selects nodes where the specified attribute equals the string.
+```ruby
+# <div data-type="news">My Text</div>
+doc.attr_equals('data-type', 'news').first.text # "My Text"
+```
+#### attr_includes(attribute, string)
+Selects nodes where the specified attribute includes the string.
+```ruby
+# <div data-type="major-news">My Text</div>
+doc.attr_includes('data-type', 'news').first.text # "My Text"
 ```
-### attr_matches(attribute, pattern)
+#### attr_matches(attribute, pattern)
-Selects nodes with an attribute matching a pattern. The pattern's matches are stored in `Node#matches`.
+Selects nodes with an attribute matching a pattern. The pattern's matches are available in `Node#matches`.
 ```ruby
 # <span data-tooltip="3 Comments">My Text</span>
-doc.search('span').attr_matches('data-tooltip', /(\d+) comments/i).first.text # My Text
-doc.search('span').attr_matches('data-tooltip', /(\d+) comments/i).first.matches # ["3 Comments", "3"]
+doc.attr_matches('data-tooltip', /(\d+) comments/i).first.text # "My Text"
+doc.attr_matches('data-tooltip', /(\d+) comments/i).first.matches # ["3 Comments", "3"]
+```
+#### drill(*methods)
+Nil-safe method chaining. Replaces this:
+```ruby
+node = doc.find('.count')
+if node
+  attribute = node.attr('data-count')
+  if attribute
+    return attribute.to_i
+  end
+end
+```
+With this:
+```ruby
+return doc.drill([:find, '.count'], [:attr, 'data-count'], :to_i)
+```
+#### find(path)
+Same as `search`, but returns the first matched node. Replaces this:
+```ruby
+nodes = node.search('h4')
+if nodes
+  return nodes.first
+end
+```
+With this:
+```ruby
+return node.find('h4')
+```
+#### text_includes(string)
+Selects nodes where the text includes the string.
+```ruby
+# <div data-type="news">My Text</div>
+doc.text_includes('Text').first.text # "My Text"
 ```
-### text_matches(attribute, pattern)
+#### text_matches(pattern)
-Selects nodes with text matching a pattern. The pattern's matches are stored in `Node#matches`.
+Selects nodes with text matching a pattern. The pattern's matches are available in `Node#matches`.
 ```ruby
 # <a href="/p/1">3 Comments</a>
-doc.search('span').text_matches(/^(\d+) comments$/i).first.attr('href') # "/p/1"
-doc.search('span').text_matches(/^(\d+) comments$/i).first.matches # ["3 Comments", "3"]
+doc.text_matches(/^(\d+) comments$/i).first.attr('href') # "/p/1"
+doc.text_matches(/^(\d+) comments$/i).first.matches # ["3 Comments", "3"]
 ```
 License
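
The nil-safe chaining that the README describes for `drill` can be illustrated in plain Ruby. This is only a sketch under assumptions: `drill_sketch` is a hypothetical helper written for this note, not Nikkou's implementation, and it operates on ordinary hashes instead of Nokogiri nodes. Each step is either a method name or an array of `[method, *args]`, matching the call shape shown in the README.

```ruby
# Hypothetical stand-in for nil-safe chaining; not Nikkou's actual code.
def drill_sketch(receiver, *steps)
  steps.reduce(receiver) do |current, step|
    # Stop as soon as any link in the chain yields nil.
    return nil if current.nil?

    # A step is either :method or [:method, arg1, arg2, ...].
    name, *args = Array(step)
    current.public_send(name, *args)
  end
end

data = { count: { value: '42' } }
found   = drill_sketch(data, [:fetch, :count], [:fetch, :value], :to_i)
missing = drill_sketch({ count: nil }, [:[], :count], [:[], :value], :to_i)
```

Here `found` is the integer 42, while `missing` is `nil` because the chain stops at the first nil link instead of raising `NoMethodError`.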

data/Rakefile CHANGED

@@ -16,7 +16,7 @@ RDoc::Task.new(:rdoc) do |rdoc|
   rdoc.rdoc_dir = 'rdoc'
   rdoc.title    = 'Nikkou'
   rdoc.options << '--line-numbers'
-  rdoc.rdoc_files.include('README.rdoc')
+  rdoc.rdoc_files.include('README.md')
   rdoc.rdoc_files.include('lib/**/*.rb')
 end

data/lib/nikkou/nokogiri/xml/node.rb CHANGED

@@ -6,14 +6,68 @@ module Nikkou
         include Nikkou::Findable
         attr_accessor :matches
+        def attr_equals(attribute, string)
+          list = []
+          traverse do |node|
+            list << node if node.attr(attribute) == string
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
+        end
+        def attr_includes(attribute, string)
+          list = []
+          traverse do |node|
+            next if node.attr(attribute).nil?
+            list << node if node.attr(attribute).include?(string)
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
+        end
+        def attr_matches(attribute, pattern)
+          list = []
+          traverse do |node|
+            next if node.attr(attribute).nil?
+            if node.attr(attribute).match(pattern)
+              node.matches = $~.to_a
+              list << node
+            end
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
+        end
-        def url(attribute='href')
-          return nil if attr(attribute).nil? || document.nil? || document.uri.nil?
-          href = attr(attribute)
-          return href if href =~ /^https?:\/\//
-          return "http:#{href}" if href.start_with?('//')
-          root_url = "#{document.uri.scheme}://#{document.uri.host}"
-          URI.join(root_url, href).to_s
+        def parse_text
+          parse(text)
+        end
+        def text_equals(string)
+          list = []
+          traverse do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
+            list << node if node.text == string
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
+        end
+        def text_includes(string)
+          list = []
+          traverse do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
+            list << node if node.text.include?(string)
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
+        end
+        def text_matches(pattern)
+          list = []
+          traverse do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
+            if node.text.match(pattern)
+              node.matches = $~.to_a
+              list << node
+            end
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
         end
         def time(options={})
@@ -33,6 +87,16 @@ module Nikkou
           end
           time_zone.local_to_utc(time)
         end
+        def url(attribute='href')
+          return nil if attr(attribute).nil?
+          href = attr(attribute)
+          return href if href =~ /^https?:\/\//
+          return "http:#{href}" if href.start_with?('//')
+          return nil if document.nil? || document.uri.nil?
+          root_url = "#{document.uri.scheme}://#{document.uri.host}"
+          URI.join(root_url, href).to_s
+        end
       end
     end
   end
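
The reordered checks in `url` can be tried outside Nokogiri with a small standalone sketch. It assumes a hypothetical helper, `absolute_url`, that takes the page URI as a plain string (in the gem the value comes from `document.uri`). It follows the same order as the 0.0.3 method: absolute and protocol-relative values are returned before the document URI is consulted, so they no longer depend on it being set.

```ruby
require 'uri'

# Hypothetical helper mirroring the order of checks in the 0.0.3 Node#url.
# page_uri is assumed to be a String such as 'http://mysite.com/mypage'.
def absolute_url(href, page_uri)
  return nil if href.nil?
  # Already absolute: return unchanged, no document URI needed.
  return href if href =~ /^https?:\/\//
  # Protocol-relative: prefix a scheme, as the gem does.
  return "http:#{href}" if href.start_with?('//')
  # Only relative paths require the page URI.
  return nil if page_uri.nil?

  base = URI.parse(page_uri)
  URI.join("#{base.scheme}://#{base.host}", href).to_s
end

relative = absolute_url('/p/1', 'http://mysite.com/mypage')
no_base  = absolute_url('http://www.absoluteurl.com/', nil)
```

With these inputs, `relative` is `"http://mysite.com/p/1"` and `no_base` is returned unchanged even though no page URI was given.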

data/lib/nikkou/nokogiri/xml/node_set.rb CHANGED

@@ -5,6 +5,14 @@ module Nikkou
         include Nikkou::Drillable
         include Nikkou::Findable
+        def attr_equals(attribute, string)
+          list = select do |node|
+            next false if node.attr(attribute).nil?
+            node.attr(attribute) == string
+          end
+          self.class.new(document, list)
+        end
         def attr_includes(attribute, string)
           list = select do |node|
             next false if node.attr(attribute).nil?
@@ -25,8 +33,17 @@ module Nikkou
           self.class.new(document, list)
         end
+        def text_equals(string)
+          list = select do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
+            node.text == string
+          end
+          self.class.new(document, list)
+        end
         def text_includes(string)
           list = select do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
             node.text.include?(string)
           end
           self.class.new(document, list)
@@ -35,6 +52,7 @@ module Nikkou
         def text_matches(pattern)
           list = []
           each do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
             if node.text.match(pattern)
               node.matches = $~.to_a
               list << node
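
Early exits inside a block passed to `select` or `each`, the pattern these NodeSet methods rely on, behave differently depending on the keyword. The sketch below uses plain arrays and two hypothetical helpers (neither is part of Nikkou) to show the difference: `next` ends only the current iteration, while `return` leaves the enclosing method entirely.

```ruby
# Hypothetical helpers for illustration; not part of Nikkou.
def evens_with_next(numbers)
  numbers.select do |n|
    next false if n.nil? # rejects this element, iteration continues
    n.even?
  end
end

def evens_with_return(numbers)
  numbers.select do |n|
    return false if n.nil? # exits evens_with_return itself with false
    n.even?
  end
end

kept    = evens_with_next([2, nil, 4])
aborted = evens_with_return([2, nil, 4])
```

`kept` is `[2, 4]`, whereas `aborted` is `false`: the first nil element ends the whole method before `select` can build its result.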

data/lib/nikkou/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Nikkou
-  VERSION = "0.0.2"
+  VERSION = '0.0.3'
 end

data/spec/drillable_spec.rb CHANGED

@@ -1,7 +1,7 @@
 require 'spec_helper'
 describe Nikkou::Drillable do
-  before do
+  before(:all) do
     assets_directory = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
     html_file = File.join(assets_directory, 'test.html')
     @html = Nokogiri::HTML.parse(File.read(html_file))

data/spec/files/test.html CHANGED

@@ -18,5 +18,10 @@
         </ul>
       </div>
     </div>
+    <div class="xml-node">
+      &lt;div class=&quot;xml-encoded-node&quot;&gt;
+        xml encoded node value
+      &lt;/div&gt;
+    </div>
   </body>
 </html>

data/spec/findable_spec.rb CHANGED

@@ -1,7 +1,7 @@
 require 'spec_helper'
 describe Nikkou::Findable do
-  before do
+  before(:all) do
     assets_directory = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
     html_file = File.join(assets_directory, 'test.html')
     @html = Nokogiri::HTML.parse(File.read(html_file))

data/spec/node_set_spec.rb CHANGED

@@ -1,12 +1,19 @@
 require 'spec_helper'
 describe Nokogiri::XML::NodeSet do
-  before do
+  before(:all) do
     assets_directory = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
     html_file = File.join(assets_directory, 'test.html')
     @html = Nokogiri::HTML.parse(File.read(html_file))
   end
+  describe '.attr_equals' do
+    it 'finds nodes' do
+      nodes = @html.search('a').attr_equals('href', 'http://www.ipsum.com/')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
   describe '.attr_includes' do
     it 'finds nodes' do
       nodes = @html.search('a').attr_includes('href', 'ipsum.com')
@@ -26,6 +33,20 @@ describe Nokogiri::XML::NodeSet do
     end
   end
+  describe '.text_equals' do
+    it 'finds nodes' do
+      nodes = @html.search('a').text_equals('ipsum')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
+  describe '.text_includes' do
+    it 'finds nodes' do
+      nodes = @html.search('a').text_includes('ipsum')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
   describe '.text_matches' do
     it 'finds nodes' do
       nodes = @html.search('a').text_matches(/(\d+) comments/)
@@ -37,11 +58,4 @@ describe Nokogiri::XML::NodeSet do
       nodes.first.matches.should == ['12 comments', '12']
     end
   end
-  describe '.text_includes' do
-    it 'finds nodes' do
-      nodes = @html.search('a').text_includes('ipsum')
-      nodes.first.text.should == 'ipsum'
-    end
-  end
 end

data/spec/node_spec.rb CHANGED

@@ -1,20 +1,77 @@
 require 'spec_helper'
 describe Nokogiri::XML::Node do
-  before do
+  before(:all) do
     assets_directory = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
     html_file = File.join(assets_directory, 'test.html')
     @html = Nokogiri::HTML.parse(File.read(html_file))
     @html.uri = 'http://www.loremipsum.com/page/2'
+    # Set the time zone for .time
+    Time.zone = 'Pacific Time (US & Canada)'
   end
-  describe '.url' do
-    it 'reads absolute URLs' do
-      @html.search('a.absolute-url').first.url.should == 'http://www.absoluteurl.com/'
+  describe '.attr_equals' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.attr_equals('href', 'http://www.ipsum.com/')
+      nodes.first.text.should == 'ipsum'
     end
+  end
-    it 'reads relative URLs' do
-      @html.search('a.relative-url').first.url.should == 'http://www.loremipsum.com/p/1'
+  describe '.attr_includes' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.attr_includes('href', 'ipsum.com')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
+  describe '.attr_matches' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.attr_matches('href', /(lorem|ipsum)\.com/)
+      nodes.first.text.should == 'ipsum'
+    end
+    it 'sets matches' do
+      nodes = @html.search('body').first.attr_matches('href', /(lorem|ipsum)\.com/)
+      nodes.first.matches.should == ['ipsum.com', 'ipsum']
+    end
+  end
+  describe '.parse_text' do
+    it 'converts the node\'s text to a node set' do
+      nodes = @html.search('.xml-node').first.parse_text
+      nodes.should be_an_instance_of(Nokogiri::XML::NodeSet)
+    end
+    it 'returns a node set that contains the correct content' do
+      nodes = @html.search('.xml-node').first.parse_text
+      nodes.search('.xml-encoded-node').length.should == 1
+    end
+  end
+  describe '.text_equals' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.text_equals('ipsum')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
+  describe '.text_includes' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.text_includes('ipsum')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
+  describe '.text_matches' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.text_matches(/(\d+) comments/)
+      nodes.first.text.should == '12 comments'
+    end
+    it 'sets matches' do
+      nodes = @html.search('body').first.text_matches(/(\d+) comments/)
+      nodes.first.matches.should == ['12 comments', '12']
     end
   end
@@ -31,4 +88,14 @@ describe Nokogiri::XML::Node do
       @html.search('.post-published-at').first.time(attribute: 'data-published-at', time_zone: 'America/New_York').to_s.should == '2013-04-01 04:00:00 UTC'
     end
   end
+  describe '.url' do
+    it 'reads absolute URLs' do
+      @html.search('a.absolute-url').first.url.should == 'http://www.absoluteurl.com/'
+    end
+    it 'reads relative URLs' do
+      @html.search('a.relative-url').first.url.should == 'http://www.loremipsum.com/p/1'
+    end
+  end
 end

data/spec/spec_helper.rb CHANGED

@@ -2,7 +2,6 @@ ENV["RAILS_ENV"] ||= 'test'
 require 'rspec'
 require 'nikkou'
-require 'pry'
 RSpec.configure do |config|
   config.color_enabled = true

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: nikkou
 version: !ruby/object:Gem::Version
-  version: 0.0.2
+  version: 0.0.3
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-04-23 00:00:00.000000000 Z
+date: 2013-06-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -91,7 +91,7 @@ dependencies:
     - - ! '>='
       - !ruby/object:Gem::Version
         version: '0'
-description: Utilities for Nokogiri
+description: Extract useful data from HTML and XML with ease!
 email:
 - tombenner@gmail.com
 executables: []
@@ -141,7 +141,7 @@ rubyforge_project:
 rubygems_version: 1.8.24
 signing_key:
 specification_version: 3
-summary: Utilities for Nokogiri
+summary: Extract useful data from HTML and XML with ease!
 test_files:
 - spec/drillable_spec.rb
 - spec/files/test.html