RubyGems - buzzsaw - Versions diffs - 0.0.1 - Mend

buzzsaw 0.0.1

Files changed (15)

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: d88ab56bde5c005eaa147b077b71083da8287978
+  data.tar.gz: 710737bfcda47736888b26f45a97d5508b12df3a
+SHA512:
+  metadata.gz: f59b9ef5ddffa5d885cea106eb1fd6db06037114462e6df045d8ce4e9ee97c1e38303c0bc4df14776565c3f185d9d5f05560c5118c12d59ccbf4b85e64c927b0
+  data.tar.gz: a1ef7c9c9ce90f4195fa5bfcc367f5ffc18eedf18600256976f06e608e9d5c2c36d23d6af63a825fa16bed6d9747e8c8a66af2ed1795745cc7b883bbac99035f
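
The digests recorded above can be compared against the files unpacked from the `.gem` archive. The following is a minimal sketch using Ruby's standard `digest` library; the helper name `checksum_matches?` and the throwaway file are assumptions for illustration and are not part of the gem.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the file's SHA-512 hex digest equals `expected`.
def checksum_matches?(path, expected)
  Digest::SHA512.file(path).hexdigest == expected
end

# Demonstration on a temporary file that stands in for data.tar.gz.
Tempfile.create("sample") do |f|
  f.write("buzzsaw")
  f.flush
  expected = Digest::SHA512.hexdigest("buzzsaw")
  puts checksum_matches?(f.path, expected)   # true
  puts checksum_matches?(f.path, "0" * 128)  # false
end
```

For a real check, the path would point at the unpacked `data.tar.gz` and `expected` would be the SHA512 value listed in `checksums.yaml`.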

data/.gitignore ADDED

@@ -0,0 +1,14 @@
+/.bundle/
+/.yardoc
+/Gemfile.lock
+/_yardoc/
+/coverage/
+/doc/
+/pkg/
+/spec/reports/
+/tmp/
+*.bundle
+*.so
+*.o
+*.a
+mkmf.log

data/Gemfile ADDED

@@ -0,0 +1,4 @@
+source 'https://rubygems.org'
+# Specify your gem's dependencies in buzzsaw.gemspec
+gemspec

data/LICENSE.txt ADDED

@@ -0,0 +1,22 @@
+Copyright (c) 2015 Jon Stokes
+MIT License
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,147 @@
+# Buzzsaw
+A DSL that wraps around `Nokogiri` and is used by stretched.io for web scraping.
+## Installation
+Add this line to your application's Gemfile:
+```ruby
+gem 'buzzsaw'
+```
+And then execute:
+    $ bundle
+Or install it yourself as:
+    $ gem install buzzsaw
+## Usage
+This gem is what stretched.io uses for its DSL -- both the JSON-based one and the
+scripting one. You can use it independently, though.
+## Query DSL
+### find_by_xpath
+Most of the time when I'm scraping the web, I just want to find the first
+bit of matching text at a matching xpath. That's why `find_by_xpath` is the workhorse
+of this query DSL.
+This method takes the following arguments:
+ - `xpath`: The xpath query string of the nodes that you want to search for a given pattern. This argument is mandatory.
+ - `match`: A regex that the text of the xpath node should match.
+ - `capture`: A regex that pulls only the matching text out of the matched string and returns it.
+ - `pattern`: If the `pattern` argument is present, then `match = capture = pattern`.
+ - `label`: If this is present, then any positive match will return the string supplied by this argument.
+Here's a look at how `find_by_xpath` works in practice.
+Let's say that you want to extract the price of `product2` from the following bit of HTML in `products.html`:
+ ```html
+ <div id="product1-details">
+  <ul>
+    <li>Status: In-stock</li>
+    <li>UPC: 00110012232</li>
+    <li>Price: $12.99</li>
+  </ul>
+ </div>
+ <div id="product2-details">
+  <ul>
+    <li>Status: In-stock</li>
+    <li>UPC: 00110012232</li>
+    <li>SKU: ITEM-2</li>
+    <li>Price: $12.99</li>
+  </ul>
+ </div>
+ ```
+You might use `find_by_xpath` as follows:
+```ruby
+source = File.read("products.html")
+buzz = Buzzsaw::Document.new(source, format: :html)
+buzz.find_by_xpath(
+  xpath: '//div[@id="product2-details"]//li',
+  pattern: /\$[0-9]+\.[0-9]+/
+)
+#=> "$12.99"
+```
+If for whatever reason you wanted that entire price node, you could do:
+```ruby
+buzz.find_by_xpath(
+  xpath: '//div[@id="product2-details"]//li',
+  match: /\$[0-9]+\.[0-9]+/
+)
+#=> "Price: $12.99"
+```
+Now let's say that you only want "12.99", without the dollar sign. You could do
+that as follows:
+```ruby
+buzz.find_by_xpath(
+  xpath: '//div[@id="product2-details"]//li',
+  match: /\$[0-9]+\.[0-9]+/,
+  capture: /[0-9]+\.[0-9]+/
+)
+#=> "12.99"
+```
+Sometimes you might want to return a specific bit of text if you find a match on a page.
+This can be done with the `label` argument.
+For instance, suppose we want the `find_by_xpath` function to return the token
+`in_stock` when it finds that the item is in stock. We'd do that as follows:
+```ruby
+buzz.find_by_xpath(
+  xpath: '//div[@id="product2-details"]//li',
+  pattern: /Status: In-stock/,
+  label: 'in_stock'
+)
+#=> "in_stock"
+```
+These examples are contrived, but you get the idea.
+### collect_by_xpath
+Consider the list of product details above. Let's say that I want
+to capture and store those details as a human-readable string. If I have a `Nokogiri::HTML::Document` called
+`doc` with the above HTML in it, then look at the following:
+```ruby
+doc.xpath("//div[@id='product2-details']//li").text
+#=> Status: In-stockUPC: 00110012232SKU: ITEM-2Price: $12.99
+```
+All of the nodes are crammed together, but it would be nice if I could insert
+a space in between them. That's one place where `collect_by_xpath` helps.
+```ruby
+buzz.collect_by_xpath(
+  xpath: "//div[@id='product2-details']//li",
+  join: ' '
+)
+#=> Status: In-stock UPC: 00110012232 SKU: ITEM-2 Price: $12.99
+```
+The `collect_by_xpath` function finds all of the matching nodes and concatenates
+their text, using the character(s) supplied by optional `join` as a delimiter.
+This method also takes the same `match`, `capture`, and `pattern` arguments
+as `find_by_xpath`, and they do the same thing. You can use the `match` argument to
+collect only matching nodes, and the `capture` argument to filter the final string.
+Finally, this function also takes the `label` argument.
+### find_in_table
+This method is useful for pulling text out of tables, one of the most annoying
+jobs in web scraping. The `find_in_table` method takes the following arguments:
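+ - `xpath`: The xpath query string that locates the table. This argument is mandatory.
+ - `row`: Either a regex for matching a row, or an integer row index (1-based, as in XPath). This argument is mandatory.
+ - `column`: Either a regex for matching a column, or an integer column index (1-based). If omitted, the text of the whole row is returned.
+ - `capture`: A regex that pulls only the matching text out of the selected row or cell.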
+## Contributing
+1. Fork it ( https://github.com/jonstokes/Buzzsaw/fork )
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create a new Pull Request
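
The README's rules for `match`, `capture`, `pattern`, `label`, and `join` can be modeled without Nokogiri. The sketch below is a simplified plain-Ruby stand-in, not the gem's implementation: `find_text` and `collect_text` are hypothetical names, and the `items` array stands in for the text of the nodes an XPath query would return.

```ruby
# Simplified model of find_by_xpath: `texts` plays the role of node text.
def find_text(texts, match: nil, capture: nil, pattern: nil, label: nil)
  match = capture = pattern if pattern   # pattern sets both match and capture
  hits  = texts.select { |t| match ? t[match] : !t.strip.empty? }
  first = hits.first
  return nil unless first
  result = capture ? first.strip[capture] : first.strip.gsub(/\s+/, " ")
  return label if label && result && !result.empty?
  result
end

# Simplified model of collect_by_xpath: join every matching text, then capture.
def collect_text(texts, join: "", match: nil, capture: nil, label: nil)
  hits = texts.select { |t| match ? t[match] : !t.strip.empty? }
  return nil if hits.empty?
  joined = hits.join(join)
  result = capture ? joined[capture] : joined.gsub(/\s+/, " ")
  return label if label && result && !result.empty?
  result
end

items = ["Status: In-stock", "UPC: 00110012232", "SKU: ITEM-2", "Price: $12.99"]
price = /\$[0-9]+\.[0-9]+/

puts find_text(items, pattern: price)                            # $12.99
puts find_text(items, match: price)                              # Price: $12.99
puts find_text(items, match: price, capture: /[0-9]+\.[0-9]+/)   # 12.99
puts find_text(items, pattern: /Status: In-stock/, label: "in_stock")  # in_stock
puts collect_text(items, join: " ")
```

The model leaves out the ASCII conversion step that the gem applies to results.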

data/Rakefile ADDED

@@ -0,0 +1,2 @@
+require "bundler/gem_tasks"
+

data/buzzsaw.gemspec ADDED

@@ -0,0 +1,28 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'buzzsaw/version'
+Gem::Specification.new do |spec|
+  spec.name          = "buzzsaw"
+  spec.version       = Buzzsaw::VERSION
+  spec.authors       = ["Jon Stokes"]
+  spec.email         = ["jon@jonstokes.com"]
+  spec.summary       = %q{A web scraping DSL built on Nokogiri}
+  spec.homepage      = ""
+  spec.license       = "MIT"
+  spec.files         = `git ls-files -z`.split("\x0")
+  spec.executables   = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+  spec.add_dependency "htmlentities", "~> 4.3"
+  spec.add_dependency "nokogiri", "~> 1.6.6"
+  spec.add_dependency "activesupport", "~> 4.2"
+  spec.add_dependency "stringex", "~> 2.5"
+  spec.add_development_dependency "bundler", "~> 1.7"
+  spec.add_development_dependency "rake", "~> 10.0"
+  spec.add_development_dependency "rspec", "~> 3.3"
+end
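
The `~>` (pessimistic) constraints in this gemspec can be checked with the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems. In this short sketch the helper name `allowed?` and the candidate version numbers are chosen only for illustration.

```ruby
require "rubygems"

# True when `version` satisfies the RubyGems requirement string.
def allowed?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

puts allowed?("~> 1.6.6", "1.6.8")  # true:  patch releases of 1.6 are accepted
puts allowed?("~> 1.6.6", "1.7.0")  # false: the next minor release is excluded
puts allowed?("~> 4.2", "4.9.0")    # true:  any 4.x at or above 4.2 is accepted
puts allowed?("~> 4.2", "5.0.0")    # false: the next major release is excluded
```

The nokogiri constraint has three segments, so it is the tightest of the runtime dependencies: it pins the minor version, while the two-segment constraints pin only the major version.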

data/lib/buzzsaw/document.rb ADDED

@@ -0,0 +1,16 @@
+module Buzzsaw
+  class Document
+    include Buzzsaw::DSL
+    attr_reader :doc
+    def initialize(source, format: nil)
+      @doc = if format == :html
+        Nokogiri::HTML(source)
+      elsif format == :xml
+        Nokogiri::XML(source)
+      else
+        Nokogiri.parse(source)
+      end
+    end
+  end
+end

data/lib/buzzsaw/dsl.rb ADDED

@@ -0,0 +1,247 @@
+module Buzzsaw
+  module DSL
+    ENCODING_EXCEPTION = defined?(Java) ? Java::JavaNioCharset::UnsupportedCharsetException : Encoding::CompatibilityError
+    #
+    # Main DSL methods
+    #
+    def find_by_xpath(args)
+      args.symbolize_keys!
+      args[:match] = args[:capture] = args[:pattern] if args[:pattern]
+      nodes = get_nodes(args)
+      target = find_target_text(args, nodes)
+      return args[:label] if args[:label] && target.present?
+      asciify_target_text(target)
+    end
+    def collect_by_xpath(args)
+      args.symbolize_keys!
+      args[:match] = args[:capture] = args[:pattern] if args[:pattern]
+      nodes = get_nodes(args)
+      target = collect_target_text(args, nodes)
+      return args[:label] if args[:label] && target.present?
+      asciify_target_text(target)
+    end
+    def find_in_table(args)
+      args.symbolize_keys!
+      xpath   = args[:xpath]
+      capture = args[:capture]
+      if args[:row].is_a?(Integer)
+        match_row = nil
+        row_index = args[:row]
+      else
+        row_index = nil
+        match_row = args[:row]
+      end
+      if args[:column].is_a?(Integer)
+        match_column = nil
+        column_index = args[:column]
+      else
+        column_index = nil
+        match_column = args[:column]
+      end
+      return unless table = doc.at_xpath(xpath)
+      # Rows match first
+      return unless row = match_table_element(table, "tr", match_row, row_index)
+      unless match_column || column_index
+        if capture
+          return row.text[capture]
+        else
+          return row.text
+        end
+      end
+      # Now columns
+      return unless col = match_table_element(row, "td", match_column, column_index)
+      return col.text unless capture
+      col.text[capture]
+    end
+    def find_by_meta_tag(args)
+      args.symbolize_keys!
+      args[:pattern] ||= args[:match] # Backwards compatibility
+      nodes = get_nodes_for_meta_attribute(args)
+      return unless target = get_content_for_meta_nodes(nodes)
+      target = target[args[:pattern]] if args[:pattern]
+      return args[:label] if args[:label] && target.present?
+      target
+    end
+    alias_method :label_by_meta_tag, :find_by_meta_tag
+    def find_by_schema_tag(value)
+      string_methods = [:upcase, :downcase, :capitalize]
+      nodes = string_methods.map do |method|
+        doc.at_xpath("//*[@itemprop=\"#{value.send(method)}\"]")
+      end.compact
+      return if nodes.empty?
+      content = nodes.first.text.strip.gsub(/\s+/," ")
+      return unless content.present?
+      content
+    end
+    def label_by_url(args)
+      args.symbolize_keys!
+      return args[:label] if "#{url}"[args[:pattern]]
+    end
+    #
+    # Meta tag convenience methods
+    #
+    def meta_property(args)
+      args.symbolize_keys!
+      args.merge!(attribute: 'property')
+      find_by_meta_tag(args)
+    end
+    def meta_name(args)
+      args.symbolize_keys!
+      args.merge!(attribute: 'name')
+      find_by_meta_tag(args)
+    end
+    def meta_og(value);   meta_property(value: "og:#{value}"); end
+    def meta_title;       meta_name(value: 'title'); end
+    def meta_keywords;    meta_name(value: 'keywords'); end
+    def meta_description; meta_name(value: 'description'); end
+    def meta_image;       meta_name(value: 'image'); end
+    def meta_price;       meta_name(value: 'price'); end
+    def meta_og_title;       meta_og('title'); end
+    def meta_og_keywords;    meta_og('keywords'); end
+    def meta_og_description; meta_og('description'); end
+    def meta_og_image;       meta_og('image'); end
+    def label_by_meta_keywords(args)
+      args.symbolize_keys!
+      return args[:label] if meta_keywords && meta_keywords[args[:pattern]]
+    end
+    #
+    # Schema.org convenience methods
+    #
+    def schema_price;       find_by_schema_tag("price"); end
+    def schema_name;        find_by_schema_tag("name"); end
+    def schema_description; find_by_schema_tag("description"); end
+    def filter_target_text(target, filter_list)
+      filter_list.each do |filter|
+        next unless target.present?
+        filter.symbolize_keys! if filter.is_a?(Hash)
+        if filter.is_a?(String) && respond_to?(filter)
+          target = send(filter, target)
+        elsif filter[:accept]
+          target = target[filter[:accept]]
+        elsif filter[:reject]
+          target.slice!(filter[:reject])
+        elsif filter[:prefix]
+          target = "#{filter[:prefix]}#{target}"
+        elsif filter[:postfix]
+          target = "#{target}#{filter[:postfix]}"
+        end
+      end
+      target.try(:strip)
+    end
+    alias_method :filters, :filter_target_text
+    #
+    # Private
+    #
+    def match_table_element(table, element, match, index)
+      row = nil
+      row = table.xpath(".//#{element}").detect { |r| r.text && r.text[match] } if match
+      row ||= table.xpath(".//#{element}[#{index}]") if index
+      row
+    end
+    def find_target_text(args, nodes)
+      match_target_text!(nodes, args[:match])
+      # Select the first match
+      result = nodes.first.try(:strip)
+      # Filter match with the :capture regex
+      capture_target_text(result, args[:capture])
+    rescue ENCODING_EXCEPTION
+    end
+    def collect_target_text(args, nodes)
+      match_target_text!(nodes, args[:match])
+      # Reduce the matching nodes
+      result = join_target_text(nodes, args[:join])
+      # Filter the string with the :capture regex
+      capture_target_text(result, args[:capture])
+    rescue ENCODING_EXCEPTION
+    end
+    def match_target_text!(nodes, pattern)
+      return unless nodes.present?
+      nodes.select! do |node|
+        pattern ? node[pattern].present? : node.present?
+      end
+    end
+    def capture_target_text(text, pattern)
+      return unless text
+      pattern ? text[pattern] : text.gsub(/\s+/," ")
+    end
+    def join_target_text(nodes, delimiter)
+      return unless nodes.present?
+      delimiter = delimiter.to_s
+      nodes.inject { |a, b| a.to_s + delimiter + b.to_s }
+    end
+    def sanitize(text)
+      return unless str = Sanitize.clean(text, elements: [])
+      HTMLEntities.new.decode(str)
+    end
+    def get_nodes(args)
+      nodes = doc.xpath(args[:xpath])
+      nodes.map(&:text).compact
+    end
+    def get_nodes_for_meta_attribute(args)
+      attribute = args[:attribute]
+      value_variations = [:upcase, :downcase, :capitalize].map { |method| args[:value].send(method) }
+      nodes = value_variations.map do |value|
+        doc.at_xpath("//head/meta[@#{attribute}=\"#{value}\"]")
+      end.compact
+      return if nodes.empty?
+      nodes
+    end
+    def get_content_for_meta_nodes(nodes)
+      return unless nodes && nodes.any?
+      contents = nodes.map { |node| node.attribute("content") }.compact
+      return if contents.empty?
+      content = contents.first.value.strip.squeeze(" ")
+      return unless content.present?
+      content
+    end
+    def asciify_target_text(target)
+      return unless target
+      newstr = ""
+      target.each_char { |chr| newstr << (chr.dump["u{e2}"] ? '"' : chr) }
+      newstr.to_ascii
+    end
+  end
+end
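
The `filter_target_text` method above applies a list of filters in order, and the README does not cover it. The sketch below is a simplified plain-Ruby approximation for illustration only. It assumes hash filters with symbol keys, skips the string-named filter branch, avoids ActiveSupport, and uses a non-mutating `sub` where the gem uses `slice!`. The name `apply_filters` is hypothetical.

```ruby
# Apply :accept, :reject, :prefix and :postfix filters in order.
def apply_filters(target, filters)
  filters.each do |filter|
    break if target.nil? || target.empty?   # nothing left to filter
    if filter[:accept]
      target = target[filter[:accept]]          # keep only the matching part
    elsif filter[:reject]
      target = target.sub(filter[:reject], "")  # drop the first match
    elsif filter[:prefix]
      target = "#{filter[:prefix]}#{target}"
    elsif filter[:postfix]
      target = "#{target}#{filter[:postfix]}"
    end
  end
  target && target.strip
end

filters = [
  { accept: /\$[0-9]+\.[0-9]+/ },  # "Price: $12.99 USD" -> "$12.99"
  { reject: /\$/ },                # "$12.99" -> "12.99"
  { postfix: " dollars" }          # "12.99" -> "12.99 dollars"
]
puts apply_filters("Price: $12.99 USD", filters)  # 12.99 dollars
```

Each filter sees the output of the previous one, so the order of the list matters.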

data/lib/buzzsaw/version.rb ADDED

@@ -0,0 +1,3 @@
+module Buzzsaw
+  VERSION = "0.0.1"
+end

data/lib/buzzsaw.rb ADDED

@@ -0,0 +1,7 @@
+require "buzzsaw/version"
+require "buzzsaw/dsl"
+require "buzzsaw/document"
+module Buzzsaw
+end

data/spec/dsl_spec.rb ADDED

@@ -0,0 +1,123 @@
+require 'spec_helper'
+RSpec.describe Buzzsaw::DSL do
+  let(:file_name) { 'sample.html' }
+  let(:source) {
+    File.open(File.join('spec', 'fixtures', 'sample.html')) { |f| f.read }
+  }
+  let(:doc)       { Buzzsaw::Document.new(source, format: :html) }
+  describe "#find_by_xpath" do
+    it "finds the first matching node by xpath" do
+      result = doc.find_by_xpath(xpath: "//div[@class='container']//li")
+      expect(result).to eq("First Item")
+    end
+    it "takes a pattern argument" do
+      result = doc.find_by_xpath(
+        xpath:   "//div[@class='container']//li",
+        pattern: /second/i
+      )
+      expect(result).to eq("Second")
+    end
+    it "takes a match argument" do
+      result = doc.find_by_xpath(
+        xpath: "//div[@class='container']//li",
+        match: /second/i
+      )
+      expect(result).to eq("Second Item")
+    end
+    it "takes a capture argument" do
+      result = doc.find_by_xpath(
+        xpath:   "//div[@class='container']//li",
+        capture: /first/i
+      )
+      expect(result).to eq("First")
+    end
+    it "takes match and capture arguments together" do
+      result = doc.find_by_xpath(
+        xpath:   "//div[@class='container']//li",
+        match:   /Third/,
+        capture: /item/i
+      )
+      expect(result).to eq("Item")
+    end
+    it "takes a label argument" do
+      result = doc.find_by_xpath(
+        xpath: "//div[@class='container']//li",
+        match: /Third/,
+        label: "Foo"
+      )
+      expect(result).to eq("Foo")
+    end
+  end
+  describe "#collect_by_xpath" do
+    it "collects nodes by xpath" do
+      result = doc.collect_by_xpath(xpath: "//div[@class='container']//li")
+      expect(result).to eq("First ItemSecond ItemThird ItemFourth Item")
+    end
+    it "uses a join argument" do
+      result = doc.collect_by_xpath(
+        xpath: "//div[@class='container']//li",
+        join:  "|"
+      )
+      expect(result).to eq("First Item|Second Item|Third Item|Fourth Item")
+    end
+  end
+  describe "#find_in_table" do
+    it "takes a capture argument" do
+      result = doc.find_in_table(
+        xpath: "//table",
+        row:   2,
+        capture: /row/i
+      )
+      expect(result.strip.squeeze).to eq("Row")
+    end
+    context "row argument" do
+      it "matches a row by number" do
+        result = doc.find_in_table(
+          xpath: "//table",
+          row:   2
+        )
+        expect(result.strip.squeeze).to eq("Col 1, Row 2\n Col 2, Row 2")
+      end
+      it "matches a row by pattern" do
+        result = doc.find_in_table(
+          xpath: "//table",
+          row:   /Col 1\, Row 2/
+        )
+        expect(result.strip.squeeze).to eq("Col 1, Row 2\n Col 2, Row 2")
+      end
+    end
+    context "column argument" do
+      it "matches a column by number" do
+        result = doc.find_in_table(
+          xpath:  "//table",
+          row:    2,
+          column: 2
+        )
+        expect(result.strip.squeeze).to eq("Col 2, Row 2")
+      end
+      it "matches a column by pattern" do
+        result = doc.find_in_table(
+          xpath:  "//table",
+          row:    2,
+          column: /Col 1/
+        )
+        expect(result.strip.squeeze).to eq("Col 1, Row 2")
+      end
+    end
+  end
+end
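
The `#find_in_table` specs above select a row and a column either by position or by pattern. That selection logic can be sketched in plain Ruby over nested arrays, as an illustration rather than the gem's code. The name `find_in_rows` is hypothetical, `rows` stands in for the fixture table, and cell text within a row is joined with a single space where the real DOM text would carry the markup's own whitespace.

```ruby
# Integer selectors are 1-based (as XPath positions are); Regexp selectors
# pick the first row or cell whose text matches.
def find_in_rows(rows, row:, column: nil, capture: nil)
  cells = case row
          when Integer then row >= 1 ? rows[row - 1] : nil
          when Regexp  then rows.find { |r| r.join(" ")[row] }
          end
  return nil unless cells

  text = case column
         when nil     then cells.join(" ")   # no column: whole row text
         when Integer then column >= 1 ? cells[column - 1] : nil
         when Regexp  then cells.find { |c| c[column] }
         end
  return nil unless text

  capture ? text[capture] : text
end

rows = [
  ["Col 1, Row 1", "Col 2, Row 1"],
  ["Col 1, Row 2", "Col 2, Row 2"]
]

puts find_in_rows(rows, row: 2, column: 2)        # Col 2, Row 2
puts find_in_rows(rows, row: 2, column: /Col 1/)  # Col 1, Row 2
puts find_in_rows(rows, row: /Col 1, Row 2/)      # Col 1, Row 2 Col 2, Row 2
puts find_in_rows(rows, row: 2, capture: /row/i)  # Row
```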

data/spec/fixtures/sample.html ADDED

@@ -0,0 +1,34 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <title>Sample HTML document</title>
+  </head>
+  <body>
+    <div class="container">
+      <ul class="list">
+        <li>First Item</li>
+        <li>Second Item</li>
+      </ul>
+    </div>
+    <div class="container">
+      <ul class="list">
+        <li>Third Item</li>
+        <li>Fourth Item</li>
+      </ul>
+    </div>
+    <table>
+      <tr>
+        <td>Col 1, Row 1</td>
+        <td>Col 2, Row 1</td>
+      </tr>
+      <tr>
+        <td>Col 1, Row 2</td>
+        <td>Col 2, Row 2</td>
+      </tr>
+    </table>
+  </body>
+</html>

data/spec/spec_helper.rb ADDED

@@ -0,0 +1,12 @@
+require 'rubygems'
+require 'bundler/setup'
+require 'rspec'
+require 'active_support/all'
+require 'htmlentities'
+require 'stringex'
+require 'nokogiri'
+require 'buzzsaw'
+RSpec.configure do |config|
+  config.order = 'random'
+end

metadata ADDED

@@ -0,0 +1,158 @@
+--- !ruby/object:Gem::Specification
+name: buzzsaw
+version: !ruby/object:Gem::Version
+  version: 0.0.1
+platform: ruby
+authors:
+- Jon Stokes
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2015-08-01 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: htmlentities
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.3'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.3'
+- !ruby/object:Gem::Dependency
+  name: nokogiri
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.6.6
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 1.6.6
+- !ruby/object:Gem::Dependency
+  name: activesupport
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.2'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.2'
+- !ruby/object:Gem::Dependency
+  name: stringex
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.5'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.5'
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.7'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.7'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.3'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.3'
+description:
+email:
+- jon@jonstokes.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- buzzsaw.gemspec
+- lib/buzzsaw.rb
+- lib/buzzsaw/document.rb
+- lib/buzzsaw/dsl.rb
+- lib/buzzsaw/version.rb
+- spec/dsl_spec.rb
+- spec/fixtures/sample.html
+- spec/spec_helper.rb
+homepage: ''
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.6
+signing_key:
+specification_version: 4
+summary: A web scraping DSL built on Nokogiri
+test_files:
+- spec/dsl_spec.rb
+- spec/fixtures/sample.html
+- spec/spec_helper.rb