RubyGems - nikkou - Versions diffs - 0.0.2 → 0.0.3 - Mend

nikkou 0.0.2 → 0.0.3


Files changed (12)

data/README.md +131 -22
data/Rakefile +1 -1
data/lib/nikkou/nokogiri/xml/node.rb +71 -7
data/lib/nikkou/nokogiri/xml/node_set.rb +18 -0
data/lib/nikkou/version.rb +1 -1
data/spec/drillable_spec.rb +1 -1
data/spec/files/test.html +5 -0
data/spec/findable_spec.rb +1 -1
data/spec/node_set_spec.rb +22 -8
data/spec/node_spec.rb +73 -6
data/spec/spec_helper.rb +0 -1
metadata +4 -4

data/README.md CHANGED

@@ -1,75 +1,184 @@
 Nikkou
 ======
-description
+Extract useful data from HTML and XML with ease!
 Description
 -----------
-Nikkou...
+Nikkou adds additional methods to Nokogiri to make extracting commonly-used data from HTML and XML easier. It lets you transform HTML into structured data very quickly, and it integrates nicely with [Mechanize](https://github.com/sparklemotion/mechanize).
-### time
+Method Overview
+---------------
+Here's a summary of the methods Nikkou provides (see "Methods" for details):
+### Formatting
+**parse_text** - Parses the node's text as XML and returns it as a Nokogiri::XML::NodeSet
+**time(options={})** - Intelligently parses the time (relative or absolute) of either the text or a specified attribute; accepts a `time_zone` option
+**url(attribute='href')** - Converts the href (or other specified attribute) into an absolute URL using the document's URI; `<a href="/p/1">Link</a>` yields `http://mysite.com/p/1`
+### Searching
+**attr_equals(attribute, string)** - Finds nodes where the attribute equals the string
+**attr_includes(attribute, string)** - Finds nodes where the attribute includes the string
+**attr_matches(attribute, pattern)** - Finds nodes where the attribute matches the pattern
+**drill(*methods)** - Nil-safe method chaining
+**find(path)** - Same as `search` but returns the first matched node
+**text_equals(string)** - Finds nodes where the text equals the string
+**text_includes(string)** - Finds nodes where the text includes the string
+**text_matches(pattern)** - Finds nodes where the text matches the pattern
+## Methods
+### Formatting
+#### time(options={})
 Returns a Time object (in UTC) by automatically parsing the text or specified attribute of the node.
 ```ruby
 # <a href="/p/1">3 hours ago</a>
-doc.search('a').first.time # 2013-04-16 02:42:34 UTC
+doc.search('a').first.time
 ```
 ###### Options
-`attribute` - The attribute to parse:
+`attribute`
+The attribute to parse:
 ```ruby
-# <a href="/p/1" data-published-at="2013-04-16 02:42:34">My link</a>
-doc.search('a').first.time(attribute: 'data-published-at') # 2013-04-16 02:42:34 UTC
+# <a href="/p/1" data-published-at="2013-05-22 02:42:34">My link</a>
+doc.search('a').first.time(attribute: 'data-published-at')
 ```
-`time_zone` - The document's time zone (the time will be converted from that to UTC):
+`time_zone`
+The document's time zone (the time will be converted from that to UTC):
 ```ruby
 # <a href="/p/1">3 hours ago</a>
-doc.search('a').first.time(time_zone: 'America/New_York') # 2013-04-16 06:42:34 UTC
+doc.search('a').first.time(time_zone: 'America/New_York')
 ```
-#### url
+#### url(attribute='href')
-Returns an absolute URL; useful for parsing relative hrefs. The document's `uri` needs to be set for Nikkou to know what domain to add to relative hrefs.
+Returns an absolute URL; useful for parsing relative hrefs. The document's `uri` needs to be set for Nikkou to know what domain to add to relative paths.
 ```ruby
 # <a href="/p/1">My link</a>
 doc.uri = 'http://mysite.com/mypage'
-doc.search('a').first.url # http://mysite.com/p/1
+doc.search('a').first.url # "http://mysite.com/p/1"
 ```
+If Mechanize is being used, the `uri` doesn't need to be manually set.
 ###### Options
-`attribute` - The attribute to parse:
+`attribute`
+The attribute to parse:
 ```ruby
 # <a href="/p/1" data-comments-url="/p/1#comments">My Link</a>
 doc.uri = 'http://mysite.com/mypage'
-doc.search('a').first.url('data-comments-url') # http://mysite.com/p/1#comments
+doc.search('a').first.url('data-comments-url') # "http://mysite.com/p/1#comments"
+```
+### Searching
+#### attr_equals(attribute, string)
+Selects nodes where the specified attribute equals the string.
+```ruby
+# <div data-type="news">My Text</div>
+doc.attr_equals('data-type', 'news').first.text # "My Text"
+```
+#### attr_includes(attribute, string)
+Selects nodes where the specified attribute includes the string.
+```ruby
+# <div data-type="major-news">My Text</div>
+doc.attr_equals('data-type', 'news').first.text # "My Text"
 ```
-### attr_matches(attribute, pattern)
+#### attr_matches(attribute, pattern)
-Selects nodes with an attribute matching a pattern. The pattern's matches are stored in `Node#matches`.
+Selects nodes with an attribute matching a pattern. The pattern's matches are available in `Node#matches`.
 ```ruby
 # <span data-tooltip="3 Comments">My Text</span>
-doc.search('span').attr_matches('data-tooltip', /(\d+) comments/i).first.text # My Text
-doc.search('span').attr_matches('data-tooltip', /(\d+) comments/i).first.matches # ["3 Comments", "3"]
+doc.attr_matches('data-tooltip', /(\d+) comments/i).first.text # "My Text"
+doc.attr_matches('data-tooltip', /(\d+) comments/i).first.matches # ["3 Comments", "3"]
+```
+#### drill(*methods)
+Nil-safe method chaining. Replaces this:
+```ruby
+node = doc.find('.count')
+if node
+  attribute = node.attr('data-count')
+  if attribute
+    return attribute.to_i
+  end
+end
+```
+With this:
+```ruby
+return doc.drill([:find, '.count'], [:attr, 'data-count'], :to_i)
+```
+#### find(path)
+Same as `search`, but returns the first matched node. Replaces this:
+```ruby
+nodes = node.search('h4')
+if nodes
+  return nodes.first
+end
+```
+With this:
+```ruby
+return node.find('h4')
+```
+#### text_includes(string)
+Selects nodes where the text includes the string.
+```ruby
+# <div data-type="news">My Text</div>
+doc.text_includes('Text').first.text # "My Text"
 ```
-### text_matches(attribute, pattern)
+#### text_matches(pattern)
-Selects nodes with text matching a pattern. The pattern's matches are stored in `Node#matches`.
+Selects nodes with text matching a pattern. The pattern's matches are available in `Node#matches`.
 ```ruby
 # <a href="/p/1">3 Comments</a>
-doc.search('span').text_matches(/^(\d+) comments$/i).first.attr('href') # "/p/1"
-doc.search('span').text_matches(/^(\d+) comments$/i).first.matches # ["3 Comments", "3"]
+doc.text_matches(/^(\d+) comments$/i).first.attr('href') # "/p/1"
+doc.text_matches(/^(\d+) comments$/i).first.matches # ["3 Comments", "3"]
 ```
 License
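
The README describes `drill` as nil-safe method chaining, but this diff does not include the `Nikkou::Drillable` source. As a rough illustration of the idea only, here is a plain-Ruby sketch. The standalone `drill` helper below and its step format are assumptions for this example, not the gem's implementation.

```ruby
# Hypothetical stand-in for nil-safe chaining; NOT Nikkou's actual code.
# Each step is either a method name or [method_name, *args].
def drill(object, *steps)
  steps.each do |step|
    # Stop as soon as any intermediate result is nil.
    return nil if object.nil?
    method_name, *args = Array(step)
    object = object.public_send(method_name, *args)
  end
  object
end

data = { 'count' => { 'value' => '42' } }

drill(data, [:[], 'count'], [:[], 'value'], :to_i)   # => 42
drill(data, [:[], 'missing'], [:[], 'value'], :to_i) # => nil
```

The second call stops at the missing key instead of raising `NoMethodError` on `nil`, which is the behavior the nested `if` version in the README spells out by hand.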

data/Rakefile CHANGED

@@ -16,7 +16,7 @@ RDoc::Task.new(:rdoc) do |rdoc|
   rdoc.rdoc_dir = 'rdoc'
   rdoc.title    = 'Nikkou'
   rdoc.options << '--line-numbers'
-  rdoc.rdoc_files.include('README.rdoc')
+  rdoc.rdoc_files.include('README.md')
   rdoc.rdoc_files.include('lib/**/*.rb')
 end

data/lib/nikkou/nokogiri/xml/node.rb CHANGED

@@ -6,14 +6,68 @@ module Nikkou
         include Nikkou::Findable
         attr_accessor :matches
+        def attr_equals(attribute, string)
+          list = []
+          traverse do |node|
+            list << node if node.attr(attribute) == string
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
+        end
+        def attr_includes(attribute, string)
+          list = []
+          traverse do |node|
+            next if node.attr(attribute).nil?
+            list << node if node.attr(attribute).include?(string)
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
+        end
+        def attr_matches(attribute, pattern)
+          list = []
+          traverse do |node|
+            next if node.attr(attribute).nil?
+            if node.attr(attribute).match(pattern)
+              node.matches = $~.to_a
+              list << node
+            end
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
+        end
-        def url(attribute='href')
-          return nil if attr(attribute).nil? || document.nil? || document.uri.nil?
-          href = attr(attribute)
-          return href if href =~ /^https?:\/\//
-          return "http:#{href}" if href.start_with?('//')
-          root_url = "#{document.uri.scheme}://#{document.uri.host}"
-          URI.join(root_url, href).to_s
+        def parse_text
+          parse(text)
+        end
+        def text_equals(string)
+          list = []
+          traverse do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
+            list << node if node.text == string
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
+        end
+        def text_includes(string)
+          list = []
+          traverse do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
+            list << node if node.text.include?(string)
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
+        end
+        def text_matches(pattern)
+          list = []
+          traverse do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
+            if node.text.match(pattern)
+              node.matches = $~.to_a
+              list << node
+            end
+          end
+          ::Nokogiri::XML::NodeSet.new(document, list)
         end
         def time(options={})
@@ -33,6 +87,16 @@ module Nikkou
           end
           time_zone.local_to_utc(time)
         end
+        def url(attribute='href')
+          return nil if attr(attribute).nil?
+          href = attr(attribute)
+          return href if href =~ /^https?:\/\//
+          return "http:#{href}" if href.start_with?('//')
+          return nil if document.nil? || document.uri.nil?
+          root_url = "#{document.uri.scheme}://#{document.uri.host}"
+          URI.join(root_url, href).to_s
+        end
       end
     end
   end
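
The three `attr_*` finders added to `node.rb` differ only in the string test they apply (`==`, `include?`, `match`). The plain-Ruby sketch below runs those same tests on bare strings, without Nokogiri, using the values from the README examples. Note that the README example under `attr_includes` calls `attr_equals`. With the `==` comparison shown in this file, that call would not select a node whose attribute is `"major-news"`.

```ruby
value = 'major-news'

equals_result   = (value == 'news')       # => false (what attr_equals tests)
includes_result = value.include?('news')  # => true  (what attr_includes tests)

tooltip = '3 Comments'
matches = nil
if tooltip.match(/(\d+) comments/i)
  # $~ holds the last MatchData; to_a gives the full match plus captures,
  # which is what the diff assigns to node.matches.
  matches = $~.to_a                       # => ["3 Comments", "3"]
end
```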

data/lib/nikkou/nokogiri/xml/node_set.rb CHANGED

@@ -5,6 +5,14 @@ module Nikkou
         include Nikkou::Drillable
         include Nikkou::Findable
+        def attr_equals(attribute, string)
+          list = select do |node|
+            return false if node.attr(attribute).nil?
+            node.attr(attribute) == string
+          end
+          self.class.new(document, list)
+        end
         def attr_includes(attribute, string)
           list = select do |node|
             return false if node.attr(attribute).nil?
@@ -25,8 +33,17 @@ module Nikkou
           self.class.new(document, list)
         end
+        def text_equals(string)
+          list = select do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
+            node.text == string
+          end
+          self.class.new(document, list)
+        end
         def text_includes(string)
           list = select do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
             node.text.include?(string)
           end
           self.class.new(document, list)
@@ -35,6 +52,7 @@ module Nikkou
         def text_matches(pattern)
           list = []
           each do |node|
+            next if node.is_a?(::Nokogiri::XML::Text)
             if node.text.match(pattern)
               node.matches = $~.to_a
               list << node
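
One detail in the `node_set.rb` hunks may deserve a second look: `attr_equals` and `attr_includes` use `return false` inside a `select` block. In Ruby, `return` inside a block exits the enclosing method, so the first node without the attribute would make the method return `false` rather than a node set. The sketch below reproduces that with plain arrays and hashes. The method names are hypothetical, and `next false` is shown as one possible block-local alternative, not as the author's intended fix.

```ruby
# Hypothetical helpers that mimic the block structure in the diff.
def select_with_return(items, key, value)
  list = items.select do |item|
    return false if item[key].nil? # exits select_with_return entirely
    item[key] == value
  end
  list
end

def select_with_next(items, key, value)
  items.select do |item|
    next false if item[key].nil?   # skips only this item
    item[key] == value
  end
end

items = [{ 'href' => 'a' }, {}, { 'href' => 'b' }]

select_with_return(items, 'href', 'b') # => false
select_with_next(items, 'href', 'b')   # => array containing only the last hash
```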

data/lib/nikkou/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Nikkou
-  VERSION = "0.0.2"
+  VERSION = '0.0.3'
 end

data/spec/drillable_spec.rb CHANGED

@@ -1,7 +1,7 @@
 require 'spec_helper'
 describe Nikkou::Drillable do
-  before do
+  before(:all) do
     assets_directory = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
     html_file = File.join(assets_directory, 'test.html')
     @html = Nokogiri::HTML.parse(File.read(html_file))

data/spec/files/test.html CHANGED

@@ -18,5 +18,10 @@
         </ul>
       </div>
     </div>
+    <div class="xml-node">
+      &lt;div class=&quot;xml-encoded-node&quot;&gt;
+        xml encoded node value
+      &lt;/div&gt;
+    </div>
   </body>
 </html>

data/spec/findable_spec.rb CHANGED

@@ -1,7 +1,7 @@
 require 'spec_helper'
 describe Nikkou::Findable do
-  before do
+  before(:all) do
     assets_directory = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
     html_file = File.join(assets_directory, 'test.html')
     @html = Nokogiri::HTML.parse(File.read(html_file))

data/spec/node_set_spec.rb CHANGED

@@ -1,12 +1,19 @@
 require 'spec_helper'
 describe Nokogiri::XML::NodeSet do
-  before do
+  before(:all) do
     assets_directory = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
     html_file = File.join(assets_directory, 'test.html')
     @html = Nokogiri::HTML.parse(File.read(html_file))
   end
+  describe '.attr_equals' do
+    it 'finds nodes' do
+      nodes = @html.search('a').attr_equals('href', 'http://www.ipsum.com/')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
   describe '.attr_includes' do
     it 'finds nodes' do
       nodes = @html.search('a').attr_includes('href', 'ipsum.com')
@@ -26,6 +33,20 @@ describe Nokogiri::XML::NodeSet do
     end
   end
+  describe '.text_equals' do
+    it 'finds nodes' do
+      nodes = @html.search('a').text_equals('ipsum')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
+  describe '.text_includes' do
+    it 'finds nodes' do
+      nodes = @html.search('a').text_includes('ipsum')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
   describe '.text_matches' do
     it 'finds nodes' do
       nodes = @html.search('a').text_matches(/(\d+) comments/)
@@ -37,11 +58,4 @@ describe Nokogiri::XML::NodeSet do
       nodes.first.matches.should == ['12 comments', '12']
     end
   end
-  describe '.text_includes' do
-    it 'finds nodes' do
-      nodes = @html.search('a').text_includes('ipsum')
-      nodes.first.text.should == 'ipsum'
-    end
-  end
 end

data/spec/node_spec.rb CHANGED

@@ -1,20 +1,77 @@
 require 'spec_helper'
 describe Nokogiri::XML::Node do
-  before do
+  before(:all) do
     assets_directory = File.expand_path(File.join(File.dirname(__FILE__), 'files'))
     html_file = File.join(assets_directory, 'test.html')
     @html = Nokogiri::HTML.parse(File.read(html_file))
     @html.uri = 'http://www.loremipsum.com/page/2'
+    # Set the time zone for .time
+    Time.zone = 'Pacific Time (US & Canada)'
   end
-  describe '.url' do
-    it 'reads absolute URLs' do
-      @html.search('a.absolute-url').first.url.should == 'http://www.absoluteurl.com/'
+  describe '.attr_equals' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.attr_equals('href', 'http://www.ipsum.com/')
+      nodes.first.text.should == 'ipsum'
     end
+  end
-    it 'reads relative URLs' do
-      @html.search('a.relative-url').first.url.should == 'http://www.loremipsum.com/p/1'
+  describe '.attr_includes' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.attr_includes('href', 'ipsum.com')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
+  describe '.attr_matches' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.attr_matches('href', /(lorem|ipsum)\.com/)
+      nodes.first.text.should == 'ipsum'
+    end
+    it 'sets matches' do
+      nodes = @html.search('body').first.attr_matches('href', /(lorem|ipsum)\.com/)
+      nodes.first.matches.should == ['ipsum.com', 'ipsum']
+    end
+  end
+  describe '.parse_text' do
+    it 'converts the node\'s text to a node set' do
+      nodes = @html.search('.xml-node').first.parse_text
+      nodes.should be_an_instance_of(Nokogiri::XML::NodeSet)
+    end
+    it 'returns a node set that contains the correct content' do
+      nodes = @html.search('.xml-node').first.parse_text
+      nodes.search('.xml-encoded-node').length.should == 1
+    end
+  end
+  describe '.text_equals' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.text_equals('ipsum')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
+  describe '.text_includes' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.text_includes('ipsum')
+      nodes.first.text.should == 'ipsum'
+    end
+  end
+  describe '.text_matches' do
+    it 'finds nodes' do
+      nodes = @html.search('body').first.text_matches(/(\d+) comments/)
+      nodes.first.text.should == '12 comments'
+    end
+    it 'sets matches' do
+      nodes = @html.search('body').first.text_matches(/(\d+) comments/)
+      nodes.first.matches.should == ['12 comments', '12']
     end
   end
@@ -31,4 +88,14 @@ describe Nokogiri::XML::Node do
       @html.search('.post-published-at').first.time(attribute: 'data-published-at', time_zone: 'America/New_York').to_s.should == '2013-04-01 04:00:00 UTC'
     end
   end
+  describe '.url' do
+    it 'reads absolute URLs' do
+      @html.search('a.absolute-url').first.url.should == 'http://www.absoluteurl.com/'
+    end
+    it 'reads relative URLs' do
+      @html.search('a.relative-url').first.url.should == 'http://www.loremipsum.com/p/1'
+    end
+  end
 end
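
The `.url` specs above cover absolute and relative hrefs. In 0.0.3 the check for a missing document URI moves after the absolute-URL checks, so an absolute href is returned even when no `uri` is set. A minimal plain-Ruby sketch of that ordering follows. The standalone `absolute_url` helper and its string `document_uri` argument are assumptions for illustration, since the gem's method reads both from the Nokogiri node.

```ruby
require 'uri'

# Hypothetical standalone helper mirroring the order of checks in the 0.0.3 diff.
def absolute_url(href, document_uri)
  return nil if href.nil?
  return href if href =~ /^https?:\/\//           # already absolute
  return "http:#{href}" if href.start_with?('//') # protocol-relative
  return nil if document_uri.nil?                 # nothing to resolve against
  uri = URI.parse(document_uri)
  root_url = "#{uri.scheme}://#{uri.host}"
  URI.join(root_url, href).to_s
end

absolute_url('/p/1', 'http://mysite.com/mypage') # => "http://mysite.com/p/1"
absolute_url('http://www.absoluteurl.com/', nil) # => "http://www.absoluteurl.com/"
absolute_url('/p/1', nil)                        # => nil
```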

data/spec/spec_helper.rb CHANGED

@@ -2,7 +2,6 @@ ENV["RAILS_ENV"] ||= 'test'
 require 'rspec'
 require 'nikkou'
-require 'pry'
 RSpec.configure do |config|
   config.color_enabled = true

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: nikkou
 version: !ruby/object:Gem::Version
-  version: 0.0.2
+  version: 0.0.3
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-04-23 00:00:00.000000000 Z
+date: 2013-06-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -91,7 +91,7 @@ dependencies:
     - - ! '>='
       - !ruby/object:Gem::Version
         version: '0'
-description: Utilities for Nokogiri
+description: Extract useful data from HTML and XML with ease!
 email:
 - tombenner@gmail.com
 executables: []
@@ -141,7 +141,7 @@ rubyforge_project:
 rubygems_version: 1.8.24
 signing_key:
 specification_version: 3
-summary: Utilities for Nokogiri
+summary: Extract useful data from HTML and XML with ease!
 test_files:
 - spec/drillable_spec.rb
 - spec/files/test.html