scruber 0.1.3 → 0.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 937243b2503853e755c95800e0d34ae5ecb939a7
-   data.tar.gz: 1e2472ff1d1487da8df94292daa57a27d5b52c5a
+   metadata.gz: 8deee66960a3768ace0af72a5cb1eced62c90329
+   data.tar.gz: a0d3f330d8b838aee078f2d752226a1e5432b311
  SHA512:
-   metadata.gz: 2a30b80a4301ca18b87e1c91217ebdfd881e13c511e643ce198e7e9f28984d271b0133b11a065b6d971690fca49b948b289147884eed7c01ee5bfc985d35cc69
-   data.tar.gz: 77201fa5bbde27dfac6f7205711b05a3af9f85ab0ad705c6d08af5ea6b55f2c18c37a4e80878e3e7de7280ad945d20aca79aedc179087036f7bf176ecf05f597
+   metadata.gz: 30df32ccd86afde913d47483e9f327b94869c52a21f7c1a43f442ef8a1f138a1500d0746d4641a4542937aa2dbba7e53e4697c95b234cfdc3d07eeb8ab3d13ed
+   data.tar.gz: 4e57023647a62f7f312a8a77b89097920e7ab6750c1fd6562d1b6b6b4b3ff239b484f08aed6920a259365100ba4efe9d3f168b1a873183a377c877261d49ba15
data/README.md CHANGED
@@ -1,38 +1,42 @@
  # Scruber

- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/scruber`. To experiment with that code, run `bin/console` for an interactive prompt.
+ Scruber is an open source scraping framework for Ruby.

- TODO: Delete this and the text above, and describe your gem
+ ## Getting started

- ## Installation
+ 1. Install Scruber at the command prompt if you haven't yet:

- Add this line to your application's Gemfile:
+ $ gem install scruber

- ```ruby
- gem 'scruber'
- ```
+ 2. Create a new workspace

- And then execute:
+ $ scruber new myworkspace

- $ bundle
+ 3. Create a new scraper

- Or install it yourself as:
+ $ scruber new scraper example

- $ gem install scruber

- ## Usage
+ ```ruby
+ Scruber.run do
+   csv_file 'output.csv', col_sep: ','
+
+   get 'http://example.com'

- TODO: Write usage instructions here
+   parse :html do |page, html|
+     csv_out html.at('title').text
+   end
+ end
+ ```

- ## Development
+ 4. Run your scraper

- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+ $ scruber start example

- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

  ## Contributing

- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/scruber.
+ Bug reports and pull requests are welcome on GitHub at https://github.com/scruber/scruber.

  ## License

@@ -17,7 +17,7 @@ module Scruber
  Scruber.configuration.merge_options(options)
  @callbacks_options = {}
  @callbacks = {}
- @on_complete_callbacks = {}
+ @on_complete_callbacks = []
  @queue = Scruber::Queue.new(scraper_name: scraper_name)
  @fetcher = Scruber::Fetcher.new
  load_extenstions
@@ -39,7 +39,7 @@ module Scruber
  end
  end
  end
- @on_complete_callbacks.each do |_,callback|
+ @on_complete_callbacks.sort_by{|c| -c[0] }.each do |(_,callback)|
  instance_exec &(callback)
  end
  end
@@ -84,8 +84,8 @@ module Scruber
  @callbacks[page_type.to_sym] = block
  end

- def on_complete_callback(name, &block)
- @on_complete_callbacks[name] = block
+ def on_complete(priority=1, &block)
+ @on_complete_callbacks.push [priority,block]
  end

  def process_page(page, page_type)
@@ -7,7 +7,7 @@ module Scruber
  file_id = options.fetch(:file_id) { :default }.to_sym
  options.delete(:file_id)
  Scruber::Core::Extensions::CsvOutput.register_csv file_id, path, options
- on_complete_callback :close_csv_files do
+ on_complete -1 do
  Scruber::Core::Extensions::CsvOutput.close_all
  end
  end
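
Taken together, the three hunks above replace the name-keyed hash of completion callbacks with an array of `[priority, block]` pairs that is run in descending priority order, and the CSV extension now registers its cleanup with priority `-1`. A minimal standalone sketch of that ordering follows; `CallbackRunner`, `finish`, and `log` are hypothetical names used only for illustration and are not part of Scruber.

```ruby
# Hypothetical stand-in for the runner class touched by the diff; only the
# on_complete bookkeeping is modeled here.
class CallbackRunner
  attr_reader :log

  def initialize
    @on_complete_callbacks = []
    @log = []
  end

  # Same signature as the new method in the diff: default priority is 1.
  def on_complete(priority = 1, &block)
    @on_complete_callbacks.push [priority, block]
  end

  # Higher priority runs first; each block runs in the runner's own context.
  def finish
    @on_complete_callbacks.sort_by { |c| -c[0] }.each do |(_, callback)|
      instance_exec(&callback)
    end
  end
end

runner = CallbackRunner.new
runner.on_complete(-1) { log << :close_csv_files }
runner.on_complete     { log << :default_priority }
runner.on_complete(5)  { log << :high_priority }
runner.finish
p runner.log
```

Under this scheme a callback registered with `-1` runs after every callback that keeps the default priority of `1`, which would be consistent with closing CSV files last.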
@@ -7,7 +7,7 @@ module Scruber
  end

  def read(options={})
- col_sep = options.delete(:col_sep) || ';'
+ col_sep = options.delete(:col_sep) || ','

  CSV.foreach(@file_path, col_sep: col_sep, headers: true, encoding: 'utf-8') do |csv_row|
  if options.blank?
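
The hunk above changes the default column separator from `;` to `,`, so a semicolon-delimited file would now need an explicit `col_sep: ';'`. A small sketch of the effect using only Ruby's standard `csv` library; the `read_rows` helper and the sample data are assumptions made for illustration, not Scruber's API.

```ruby
require "csv"

# Hypothetical helper mirroring the option handling in the hunk above:
# fall back to ',' when no :col_sep is given.
def read_rows(text, options = {})
  col_sep = options.delete(:col_sep) || ','
  CSV.parse(text, col_sep: col_sep, headers: true).map(&:to_h)
end

comma_data     = "name,city\nAnna,Oslo\n"
semicolon_data = "name;city\nAnna;Oslo\n"

p read_rows(comma_data)                    # split into name and city columns
p read_rows(semicolon_data, col_sep: ';')  # explicit separator still works
p read_rows(semicolon_data)                # new default: one merged column
```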
@@ -1,3 +1,3 @@
  module Scruber
- VERSION = "0.1.3"
+ VERSION = "0.1.4"
  end
data/lib/scruber.rb CHANGED
@@ -26,6 +26,10 @@ require "scruber/core/extensions/csv_output"
  require "scruber/core/extensions/queue_aliases"
  require "scruber/core/extensions/parser_aliases"

+ require "scruber/helpers/dictionary_reader"
+ require "scruber/helpers/dictionary_reader/xml"
+ require "scruber/helpers/dictionary_reader/csv"
+
  # require "scruber/core/configuration"
  # require "scruber/core/configuration"

@@ -44,11 +48,6 @@ module Scruber
  autoload :AbstractAdapter, "scruber/helpers/fetcher_agent_adapters/abstract_adapter"
  autoload :Memory, "scruber/helpers/fetcher_agent_adapters/memory"
  end
- autoload :DictionaryReader, "scruber/helpers/dictionary_reader"
- module DictionaryReader
- autoload :Xml, "scruber/helpers/dictionary_reader/xml"
- autoload :Csv, "scruber/helpers/dictionary_reader/csv"
- end
  end

  class << self
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: scruber
  version: !ruby/object:Gem::Version
-   version: 0.1.3
+   version: 0.1.4
  platform: ruby
  authors:
  - Ivan Goncharov
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2018-03-17 00:00:00.000000000 Z
+ date: 2018-04-11 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: typhoeus