Dynamised 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: b6a50d089e06a0aed555c420f05550140421382c
+   data.tar.gz: 1e3230c210211f9a1e300d2adafffda0c03f2ee3
+ SHA512:
+   metadata.gz: 23a5a47d22dfc676790017a97bde656a842988115722a079788a41f53399cbb8114a9781024b4d60fc32e5b0653209785cb4cdcb00758a0076f5fec71bdbf7b4
+   data.tar.gz: b654c59470598500794eab450194ebf93e72ec5ba607b94dac910b7ea3dfd1afa5b784304214b03942aea968c9329c5dc6a8632586ad6784f667dd921310f953
data/.gitignore ADDED
@@ -0,0 +1,13 @@
+ /.bundle/
+ /.yardoc
+ /Gemfile.lock
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
+ tags
+ *.db
+ /*.rb
+ *.gem
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,74 @@
+ # Contributor Covenant Code of Conduct
+
+ ## Our Pledge
+
+ In the interest of fostering an open and welcoming environment, we as
+ contributors and maintainers pledge to making participation in our project and
+ our community a harassment-free experience for everyone, regardless of age, body
+ size, disability, ethnicity, gender identity and expression, level of experience,
+ nationality, personal appearance, race, religion, or sexual identity and
+ orientation.
+
+ ## Our Standards
+
+ Examples of behavior that contributes to creating a positive environment
+ include:
+
+ * Using welcoming and inclusive language
+ * Being respectful of differing viewpoints and experiences
+ * Gracefully accepting constructive criticism
+ * Focusing on what is best for the community
+ * Showing empathy towards other community members
+
+ Examples of unacceptable behavior by participants include:
+
+ * The use of sexualized language or imagery and unwelcome sexual attention or
+   advances
+ * Trolling, insulting/derogatory comments, and personal or political attacks
+ * Public or private harassment
+ * Publishing others' private information, such as a physical or electronic
+   address, without explicit permission
+ * Other conduct which could reasonably be considered inappropriate in a
+   professional setting
+
+ ## Our Responsibilities
+
+ Project maintainers are responsible for clarifying the standards of acceptable
+ behavior and are expected to take appropriate and fair corrective action in
+ response to any instances of unacceptable behavior.
+
+ Project maintainers have the right and responsibility to remove, edit, or
+ reject comments, commits, code, wiki edits, issues, and other contributions
+ that are not aligned to this Code of Conduct, or to ban temporarily or
+ permanently any contributor for other behaviors that they deem inappropriate,
+ threatening, offensive, or harmful.
+
+ ## Scope
+
+ This Code of Conduct applies both within project spaces and in public spaces
+ when an individual is representing the project or its community. Examples of
+ representing a project or community include using an official project e-mail
+ address, posting via an official social media account, or acting as an appointed
+ representative at an online or offline event. Representation of a project may be
+ further defined and clarified by project maintainers.
+
+ ## Enforcement
+
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
+ reported by contacting the project team at mbeckerwork@gmail.com. All
+ complaints will be reviewed and investigated and will result in a response that
+ is deemed necessary and appropriate to the circumstances. The project team is
+ obligated to maintain confidentiality with regard to the reporter of an incident.
+ Further details of specific enforcement policies may be posted separately.
+
+ Project maintainers who do not follow or enforce the Code of Conduct in good
+ faith may face temporary or permanent repercussions as determined by other
+ members of the project's leadership.
+
+ ## Attribution
+
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+ available at [http://contributor-covenant.org/version/1/4][version]
+
+ [homepage]: http://contributor-covenant.org
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source 'https://rubygems.org'
+
+ # Specify your gem's dependencies in dynamised.gemspec
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2017 Martin Becker
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,41 @@
+ # Dynamised
+
+ Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/dynamised`. To experiment with that code, run `bin/console` for an interactive prompt.
+
+ TODO: Delete this and the text above, and describe your gem
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'dynamised'
+ ```
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install dynamised
+
+ ## Usage
+
+ TODO: Write usage instructions here
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `lib/dynamised/meta.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/Thermatix/dynamised-rb. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
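The README's usage section is still a TODO. Based on the DSL defined in `lib/dynamised/scraper_dsl.rb` (`set_base_url`, `sub_page`, `set_field`, `writer`) and the `After_Scrape` hooks, a scraper script might look like the following. This is an illustrative sketch only: the XPaths, field names, and output file name are invented, not taken from the gem.

```ruby
# Hypothetical scraper script, run via `dynamised run products.rb`.
# The script body is eval'd into a Scraper subclass by bin/dynamised.
set_base_url "https://example.com/products"

# Write results to CSV (the writer hash maps type => target).
writer csv: "products.csv"

# Follow every product link on the listing page, then scrape fields
# from each product page. :after names an After_Scrape method.
sub_page "listing" => "//a[@class='product-link']" do
  set_field :title, "//h1[@class='product-title']"
  set_field :price, "//span[@class='price']", after: :scrub_tags
  set_field :url,   "//body", after: :page_url
end
```

This is a DSL fragment, not a standalone program; it only runs inside the CLI's `create_temp_class` wrapper.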
data/Rakefile ADDED
@@ -0,0 +1,2 @@
+ require "bundler/gem_tasks"
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
+ #!/usr/bin/env ruby
+
+ require "bundler/setup"
+ require "dynamised"
+
+ # You can add fixtures and/or initialization code here to make experimenting
+ # with your gem easier. You can also use a different console, if you like.
+
+ # (If you use this, don't forget to add pry to your Gemfile!)
+ # require "pry"
+ # Pry.start
+
+ require "irb"
+ IRB.start
data/bin/dynamised ADDED
@@ -0,0 +1,72 @@
+ #!/usr/bin/env ruby
+ require_relative '../lib/dynamised'
+ require 'commander'
+
+ module Dynamised
+   class CLI
+     include Commander::Methods
+
+     def run
+       program :name, "Dynamised"
+       program :version, META::Version
+       program :description, META::Description
+
+       command :run do |c|
+         c.syntax = 'dynamised run <script>'
+         c.description = 'scrapes with given scraper'
+         c.action do |args,options|
+           script_path = check_and_convert(args.first)
+           class_name = get_class_name(args.first)
+           create_temp_class(class_name,File.read(script_path))
+           class_ref = Scraper.fetch(class_name)
+           spinner = TTY::Spinner.new("[:spinner] scraping with %s" % class_name)
+           class_ref.new.pull_and_store do
+             spinner.spin
+           end
+           spinner.success("(Successful)")
+         end
+       end
+
+       command :test do |c|
+         c.syntax = 'dynamised test <script>'
+         c.description = "tests given scraper"
+         c.action do |args,options|
+           script_path = check_and_convert(args.first)
+           class_name = get_class_name(args.first)
+           create_temp_class(class_name,File.read(script_path))
+           class_ref = Scraper.fetch(class_name)
+           class_ref.new.pull_and_check
+         end
+       end
+
+       alias_command :r, :run
+       alias_command :t, :test
+       default_command :help
+       run!
+     end
+
+     def check_and_convert(path)
+       script_path = File.expand_path(path, Dir.pwd)
+       abort("File name %s doesn't exist" % script_path) unless File.exist?(script_path)
+       script_path
+     end
+
+     def get_class_name(string)
+       string.split('/').last.split('.').first.gsub(/ /,'_').capitalize
+     end
+
+     def create_temp_class(class_name,script)
+       Dynamised.module_eval <<-RUBY
+         class #{class_name} < Scraper
+           #{script}
+         end
+       RUBY
+     end
+
+   end
+ end
+
+ Dynamised::CLI.new.run if $0 == __FILE__
data/bin/setup ADDED
@@ -0,0 +1,8 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ IFS=$'\n\t'
+ set -vx
+
+ bundle install
+
+ # Do any other automated setup that you need to do here
data/dynamised.gemspec ADDED
@@ -0,0 +1,32 @@
+ # coding: utf-8
+ lib = File.expand_path('../lib', __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require 'dynamised/meta'
+
+ Gem::Specification.new do |spec|
+   spec.name          = "Dynamised"
+   spec.version       = Dynamised::META::Version
+   spec.authors       = ["Martin Becker"]
+   spec.email         = ["mbeckerwork@gmail.com"]
+
+   spec.summary       = %q{A tool to allow you to build site crawling page scrapers.}
+   spec.description   = Dynamised::META::Description
+   spec.homepage      = "https://github.com/Thermatix/dynamised-rb"
+   spec.license       = "MIT"
+
+   spec.files = `git ls-files -z`.split("\x0").reject do |f|
+     f.match(%r{^(test|spec|features)/})
+   end
+   spec.bindir        = "exe"
+   spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+   spec.require_paths = ["lib"]
+
+   spec.add_runtime_dependency "tty-spinner", "~> 0.4"
+   spec.add_runtime_dependency "nokogiri", "~> 1.7"
+   spec.add_runtime_dependency "awesome_print", "~> 1.7"
+   spec.add_runtime_dependency "commander", "~> 4.4"
+
+   spec.add_development_dependency "bundler", "~> 1.13"
+   spec.add_development_dependency "rake", "~> 10.0"
+ end
data/lib/dynamised/after_scrape_methods.rb ADDED
@@ -0,0 +1,13 @@
+ module Dynamised
+   class Scraper
+     module After_Scrape
+       def scrub_tags(string,field_data)
+         string.gsub(/<\/?[^>]*>/, "").strip.gsub(/ ?\\r\\n/,'')
+       end
+
+       def page_url(string,field_data)
+         @current_url
+       end
+     end
+   end
+ end
data/lib/dynamised/before_scrape_methods.rb ADDED
@@ -0,0 +1,7 @@
+ module Dynamised
+   class Scraper
+     module Before_Scrape
+
+     end
+   end
+ end
data/lib/dynamised/curb_dsl.rb ADDED
@@ -0,0 +1,100 @@
+ require "curb"
+
+ module Dynamised
+   module Curb_DSL
+
+     def self.included(base)
+       base.extend Singleton
+       base.instance_eval do
+         attr_reader :curl, :headers, :payload, :username, :password, :auth_type, :uri, :ssl, :redirects, :type_converter
+
+         [:get, :post, :put, :delete, :head, :options, :patch, :link, :unlink].each do |func_name|
+           define_method func_name do |&block|
+             make_request_of func_name.to_s.upcase, &block
+           end
+         end
+
+         [:password, :username, :payload, :auth_type, :uri, :ssl, :redirects, :type_converter].each do |func_name|
+           define_method "set_#{func_name}" do |value|
+             self.instance_variable_set :"@#{func_name}", value
+           end
+         end
+       end
+     end
+
+     module Singleton
+       def request(&block)
+         self.new(&block).body
+       end
+
+       def query_params(value)
+         Curl::postalize(value)
+       end
+     end
+
+     def initialize(&block)
+       @headers = {}
+       instance_eval(&block) if block
+     end
+
+     def header(name, content)
+       @headers[name] = content
+     end
+
+     def make_request_of(request_method, &block)
+       @curl = Curl::Easy.new(@uri) do |http|
+         setup_request request_method, http
+       end
+       @curl.ssl_verify_peer = @ssl || false
+       # @curl.ignore_content_length = true
+       @curl.http request_method
+       if @curl.response_code == 301
+         @uri = @curl.redirect_url
+         make_request_of request_method
+       end
+     end
+
+     def status_code
+       @curl.response_code
+     end
+
+     def body
+       @curl.body
+     end
+
+     def query_params(value)
+       Curl::postalize(value)
+     end
+
+     private
+
+     def setup_request(method, http)
+       http.headers['request-method'] = method.to_s
+       http.headers.update(headers || {})
+       http.max_redirects = @redirects || 3
+       http.post_body = get_payload || nil
+       http.http_auth_types = @auth_type || nil
+       http.username = @username || nil
+       http.password = @password || nil
+       http.useragent = "curb"
+       http
+     end
+
+     def get_payload
+       if @type_converter
+         @type_converter.call(@payload)
+       else
+         @payload
+       end
+     end
+
+   end
+ end
data/lib/dynamised/dbm_wrapper.rb ADDED
@@ -0,0 +1,44 @@
+ # wrapper taken from: https://gist.github.com/stephan-nordnes-eriksen/6c9c56f63f36d5d100b2
+ class DBM_Wrapper
+   include Enumerable
+
+   def initialize(file_name)
+     @file_name = file_name
+     # @store = DBM.open("testDBM", 666, DBM::WRCREAT)
+     @store = DBM.new(file_name)
+   end
+
+   def []=(key,val)
+     @store[key] = val
+   end
+
+   def [](key)
+     @store[key]
+   end
+
+   def each(&block)
+     @store.each(&block)
+   end
+
+   def values
+     @store.values
+   end
+
+   def keys
+     @store.keys
+   end
+
+   def delete(key)
+     @store.delete(key)
+   end
+
+   def stop
+     @store.close unless @store.closed?
+   end
+
+   def destroy
+     stop
+     FileUtils.rm("%s.db" % @file_name)
+   end
+
+   def sync_lock
+   end
+ end
data/lib/dynamised/helpers.rb ADDED
@@ -0,0 +1,26 @@
+ module Dynamised
+   class Scraper
+     module Helpers
+       def to_doc(html)
+         Nokogiri::HTML(html)
+       end
+
+       def sub_page(html_listing)
+         html_listing.xpath(".%s" % get_sub_page_tag[:path]).attr('href').to_s
+       end
+
+       def mpc(doc)
+         get_mpc(doc.xpath(get_mpc_tag[:path]))
+       end
+
+       def get_mpc(doc)
+         doc[-2].respond_to?(:inner_text) ? doc[-2].inner_text.to_i : 0
+       end
+
+       def field_keys
+         @current_page.data[:fields].keys
+       end
+
+     end
+   end
+ end
data/lib/dynamised/meta.rb ADDED
@@ -0,0 +1,9 @@
+ module Dynamised
+   module META
+     Version = "0.1.0"
+     Description = <<-DESC.gsub(/^\s*/, '')
+       A tool that allows a user to build a web scraper that works by recursively crawling pages until
+       it finds the requested information.
+     DESC
+   end
+ end
data/lib/dynamised/node.rb ADDED
@@ -0,0 +1,46 @@
+ module Dynamised
+   class Node
+     include Enumerable
+
+     attr_accessor :childs, :init, :data, :ident, :siblings
+
+     def initialize(init={},ident=nil)
+       @ident = ident
+       @childs = {}
+       @siblings = {}
+       @init = init.clone
+       @data = init.clone
+     end
+
+     def each(&block)
+       block.call(self)
+       @childs.map do |key,child|
+         child.each(&block)
+       end
+     end
+
+     def <=>(other_node)
+       @data <=> other_node.data
+     end
+
+     def [](*keys)
+       return self if @childs.empty?
+       [*keys.flatten].inject(self) do |node,ident|
+         node.find {|n| n.ident == ident}
+       end
+     end
+
+     def new_child(ident,&block)
+       child = self.class.new(@init,ident)
+       child.siblings = self.childs
+       child.tap(&block) if block_given?
+       @childs[ident] = child
+     end
+
+     def pretty_print(pp)
+       self.each {|node| pp.text(node.ident || ""); puts "\n"; pp.pp_hash node.data}
+     end
+
+   end
+ end
data/lib/dynamised/scraper.rb ADDED
@@ -0,0 +1,227 @@
+ module Dynamised
+   class Scraper
+     XPATH_Anchor = ".%s"
+     extend DSL
+
+     class << self
+       def inherited(base)
+         @scrapers ||= {}
+         @scrapers[base.to_s.split('::').last.downcase] = base
+         base.instance_exec do
+           set_up_tree
+         end
+       end
+
+       def list
+         @scrapers ||= {}
+         @scrapers.map {|i,s| i}
+       end
+
+       def each(&block)
+         @scrapers ||= {}
+         @scrapers.each(&block)
+       end
+
+       def fetch(*args,&block)
+         @scrapers ||= {}
+         @scrapers.fetch(args.first.downcase) {|name| raise "No scraper called %s was found" % name }
+       end
+     end
+
+     include Curb_DSL
+     include Helpers
+     include Before_Scrape
+     include After_Scrape
+     include Writers
+
+     def initialize(args=[],&block)
+       @args = args
+       @tree_pointer = []
+       @use_store = false
+       @scraped_data = DBM_Wrapper.new("%s_scraped_data" % self.class.to_s)
+       [:inc,:uri,:tree,:tree_pointer,:base_url,:writer].each do |attr|
+         varb_name = "@%s" % attr
+         self.instance_variable_set(varb_name,self.class.instance_variable_get(varb_name))
+       end
+       super(&block)
+     end
+
+     def pull_and_store(&spinner)
+       raise "No writer detected" unless @writer
+       @use_store = true
+       scrape_data(&spinner)
+       write_data(&spinner)
+     end
+
+     def pull_and_check
+       doc = pull_initial
+       separator = "}#{'-' * 40}{"
+       ap separator
+       pull(doc,@tree) do |hash|
+         ap hash
+         ap separator
+         sleep 0.5
+       end
+     end
+
+     private
+
+     def scrape_data(&spinner)
+       pull(pull_initial,@tree) do |hash|
+         spinner.call
+       end
+     end
+
+     def write_data(&spinner)
+       parsed_data = @scraped_data.map {|_url,json| JSON.parse(json) }
+       @writer.each do |type,data|
+         case type
+         when :csv
+           write_csv(parsed_data, data, &spinner)
+         when :custom
+           data.call(parsed_data, &spinner)
+         else
+           raise '%s is not a supported writer type' % type
+         end
+       end
+     end
+
+     def pull(doc,tree,&block)
+       if fields?(tree)
+         scrape(doc,tree,&block)
+       end
+       childs(tree) do |pos,node,sub_tr|
+         @current_child = node
+         spt = node.data[:meta][:sub_page_tag]
+         scrape_tag_set(doc,spt[:xpath],spt[:meta]) do |url,i|
+           pull(get_doc(segment?(url)),sub_tr||node,&block)
+         end
+       end
+     end
+
+     def segment?(url)
+       url =~ /http/ ? url : "%s/%s" % [@base_url.gsub(/\/$|\z/,''), url.gsub(/\A\//,'')]
+     end
+
+     def tree_down(key,tree=false)
+       @tree_pointer << key
+       yield
+       @tree_pointer.pop
+     end
+
+     def fields?(tree)
+       not tree.data[:fields].empty?
+     end
+
+     def scrape(doc,tree,&block)
+       c_url = @current_url
+       if (@use_store ? !@scraped_data[c_url] : true) && can_scrape(doc,tree)
+         fields =
+           tree.data[:fields].each_with_object({}) do |(field,data),res_hash|
+             target = execute_method(data[:meta][:before],remove_style_tags(doc),res_hash)
+             value = scrape_tag(target,data[:xpath],data[:meta])
+             res_hash[field] = value ? execute_method(data[:meta][:after],value,res_hash) : data[:meta].fetch(:default,nil)
+           end
+         @scraped_data[c_url] = fields.to_json if @use_store
+         block.call(fields)
+       end
+     end
+
+     def remove_style_tags(doc)
+       doc.css("style").remove
+       doc
+     end
+
+     def get_by_ident(tree,ident)
+       return false unless tree
+       tree.find {|ch_i,ch| ch_i == ident}.last
+     end
+
+     def can_scrape(doc,tree)
+       scrape_if = tree.data[:scrape_if]
+       case true
+       when scrape_if.respond_to?(:call)
+         scrape_if.call(doc)
+       when scrape_if.respond_to?(:keys)
+         case true
+         when scrape_if.keys.include?(:fields)
+           check_for_fields(doc,tree,scrape_if)
+         end
+       else
+         @tree[@tree_pointer].data[:fields].length > 0
+       end
+     end
+
+     def check_for_fields(doc,tree,scrape_if)
+       [*scrape_if[:fields]].find do |field|
+         f = (tree || @tree[@tree_pointer]).data[:fields][field]
+         search_for_tag(doc,f[:xpath])
+       end
+     end
+
+     def execute_method(meth_name=nil,*args)
+       if meth_name
+         self.send(meth_name,*args)
+       else
+         args.first
+       end
+     end
+
+     def childs(node,tree=nil,&block)
+       if node.is_a? Array
+         tree.each do |child_node|
+           childs(child_node,tree,&block)
+         end
+       else
+         unless node.childs.empty? && node.siblings.empty?
+           (node.childs.empty? ? node.siblings : node.childs).each do |ident,child_node|
+             block.call(ident,child_node,tree)
+           end
+         end
+       end
+     end
+
+     def scrape_tag_set(doc,xpath,meta={})
+       (doc.xpath(xpath)).each_with_index do |node,i|
+         yield(pull_from_node(node,meta),i)
+       end
+     end
+
+     def search_for_tag(doc,xpath)
+       doc.at_xpath(XPATH_Anchor % xpath)
+     end
+
+     def scrape_tag(doc,xpath,meta={})
+       pull_from_node(doc.xpath(XPATH_Anchor % xpath),meta)
+     end
+
+     def pull_from_node(node,meta)
+       return nil if node.respond_to?(:empty?) && node.empty?
+       (node.respond_to?(:empty?) ? node.first : node).send(*meta.fetch(:attr,:inner_text))
+         .send(meta.fetch(:r_type,:to_s))
+     end
+
+     def get_doc(url)
+       @current_url = url
+       set_uri(url)
+       get
+       to_doc(body)
+     end
+
+     def pull_initial
+       @initial_pull ||= get_doc(@base_url)
+     end
+
+   end
+ end
data/lib/dynamised/scraper_dsl.rb ADDED
@@ -0,0 +1,109 @@
+ module Dynamised
+   class Scraper
+     module DSL
+
+       def set_up_tree
+         unless @tree
+           @tree = Node.new({
+             fields: {},
+             meta: {},
+             recursive_select: false,
+             select: false,
+             scrape_if: nil
+           })
+           @tree_pointer = []
+           @xpath_prefix = []
+           @useables = {}
+           @writer = nil
+           @base_url = ""
+           @inc = 1
+         end
+       end
+
+       def tree_down(key,childs=false)
+         @tree_pointer << key
+         yield
+         @tree_pointer.pop
+       end
+
+       def re_useable(name,&block)
+         check_for_block(&block)
+         @useables[name] = block
+       end
+
+       def use(name)
+         instance_exec(&@useables[name])
+       end
+
+       def set_base_url(url)
+         @base_url = url
+       end
+
+       def set_pag_increment(value)
+         @inc = value
+       end
+
+       def xpath_prefix(prefix,&block)
+         check_for_block(&block)
+         @xpath_prefix << prefix
+         yield
+         @xpath_prefix.pop
+       end
+
+       def scrape_here_if(args=nil,&block)
+         @tree[@tree_pointer].data[:scrape_if] = args || block
+       end
+
+       def select_sub_page
+         @tree[@tree_pointer].data[:select] = true
+       end
+
+       # recursively drill into the page
+       def sub_page(items,&block)
+         items.each do |item,path|
+           @tree[@tree_pointer].new_child(item)
+           tree_down(item) do
+             set_meta_tag(:sub_page_tag,join_xpath(path),{attr: [:attr,:href]})
+             block.call
+           end
+         end
+       end
+
+       def set_field(name,xpath,meta={})
+         set_info(:fields,name,xpath,meta)
+       end
+
+       def set_meta_tag(name,xpath,meta={})
+         set_info(:meta,name,xpath,meta)
+       end
+
+       def writer(writers)
+         @writer = writers
+       end
+
+       private
+
+       def check_for_block(&block)
+         raise "No block given for #%s" % caller[0][/`.*'/][1..-2] unless block_given?
+       end
+
+       def set_info(type,name,xpath,meta)
+         @tree[@tree_pointer].data[type] = @tree[@tree_pointer].data[type].merge({name => {
+           xpath: join_xpath(xpath),
+           meta: meta
+         }})
+       end
+
+       def join_xpath(tag)
+         tag.empty? ? tag : @xpath_prefix.join + tag
+       end
+
+     end
+   end
+ end
data/lib/dynamised/writers.rb ADDED
@@ -0,0 +1,24 @@
+ require "csv"
+ require "json"
+ module Dynamised
+   class Scraper
+     module Writers
+       def write_csv(scraped_data,file_name,&spinner)
+         CSV.open(file_name, "wb") do |csv|
+           headers_written = false
+           title = ""
+           scraped_data.each do |hash|
+             # skipping repeated titles is a temporary hack to solve the double scrape issue
+             next if title == hash["title"]
+             title = hash["title"]
+             unless headers_written
+               csv << hash.keys
+               headers_written = true
+             end
+             csv << hash.values
+             spinner.call
+           end
+         end
+       end
+     end
+
+   end
+ end
data/lib/dynamised.rb ADDED
@@ -0,0 +1,8 @@
+ %w{tty-spinner nokogiri awesome_print dbm json}.each {|lib| require lib}
+ %w{meta after_scrape_methods before_scrape_methods curb_dsl helpers node scraper_dsl writers dbm_wrapper scraper}
+   .each do |f|
+     require_relative "dynamised/%s" % f
+   end
+
+ module Dynamised
+   # Your code goes here...
+ end
metadata ADDED
@@ -0,0 +1,152 @@
+ --- !ruby/object:Gem::Specification
+ name: Dynamised
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Martin Becker
+ autorequire:
+ bindir: exe
+ cert_chain: []
+ date: 2017-03-08 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: tty-spinner
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.4'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.4'
+ - !ruby/object:Gem::Dependency
+   name: nokogiri
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.7'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.7'
+ - !ruby/object:Gem::Dependency
+   name: awesome_print
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.7'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.7'
+ - !ruby/object:Gem::Dependency
+   name: commander
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '4.4'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '4.4'
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.13'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.13'
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+ description: |
+   A tool that allows a user to build a web scraper that works by recursively crawling pages until
+   it finds the requested information.
+ email:
+ - mbeckerwork@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - ".gitignore"
+ - CODE_OF_CONDUCT.md
+ - Gemfile
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - bin/console
+ - bin/dynamised
+ - bin/setup
+ - dynamised.gemspec
+ - lib/dynamised.rb
+ - lib/dynamised/after_scrape_methods.rb
+ - lib/dynamised/before_scrape_methods.rb
+ - lib/dynamised/curb_dsl.rb
+ - lib/dynamised/dbm_wrapper.rb
+ - lib/dynamised/helpers.rb
+ - lib/dynamised/meta.rb
+ - lib/dynamised/node.rb
+ - lib/dynamised/scraper.rb
+ - lib/dynamised/scraper_dsl.rb
+ - lib/dynamised/writers.rb
+ homepage: https://github.com/Thermatix/dynamised-rb
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.5.1
+ signing_key:
+ specification_version: 4
+ summary: A tool to allow you to build site crawling page scrapers.
+ test_files: []
+ has_rdoc: