tanakai 1.5.0 → 1.5.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: '0363335680ba18ca855d2413e4efdce1957decab5f31c954e2b04f4f91660ac6'
- data.tar.gz: 3fee8a56e284ef3bae724d1ffe3cc7f1614ad374efd59d60a002a7f71353cf06
+ metadata.gz: aefc92adbda49240ac69ad9d3163ec3bb54930d8288705943d269ccbba41c64a
+ data.tar.gz: c1c361906354aba4edb1dcef9e2737b4e827c794c5f61b4104ec309a67e3ca80
  SHA512:
- metadata.gz: fabeeb2270349d0961294de34abe055906c38477cd4f744da9e033c626939e2672b86b30053a3d8c89bea1889f6392370e1b58296a0327f1a891d4915132478c
- data.tar.gz: 8e34927825ef45893de6e00c676621823b4d3ca7c28210c73720409000c04bb228b1ea0dd75293144afd1dbff6f4f370ca0312847d781895434896374bb77f7b
+ metadata.gz: c744ae590cbb25e9dde914174be9fb003399bdc25408c9901c3898c296d4ff5428b97d3e4942c952979586946c75b4c7027edd594d438dc387dffb6afedda3bb
+ data.tar.gz: ac51ac208e71cd928aa402d62f51e465b11620b742f5ab8af208323e1f636503345ae910a4a6c0fe21f3d415bcff7bba600b6022bf879877695ba711463e97c0
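The `checksums.yaml` entries are SHA256 and SHA512 digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`); because both archives changed in 1.5.1, every digest changed. A minimal Ruby sketch for checking one of them, assuming you have already unpacked the `.gem` (a plain tar archive) so the archive sits in the current directory. `digest_matches?` is a hypothetical helper, not part of RubyGems or Tanakai.

```ruby
require "digest"

# Hypothetical helper: compare a file's SHA256 (default) or SHA512 digest
# with an expected hex string such as one listed in checksums.yaml.
def digest_matches?(path, expected_hex, algorithm: :sha256)
  digest_class = algorithm == :sha512 ? Digest::SHA512 : Digest::SHA256
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Usage, assuming data.tar.gz was extracted from tanakai-1.5.1.gem:
# digest_matches?("data.tar.gz",
#                 "c1c361906354aba4edb1dcef9e2737b4e827c794c5f61b4104ec309a67e3ca80")
```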
data/CHANGELOG.md CHANGED
@@ -1,5 +1,9 @@
  # CHANGELOG
 
+ ## 1.5.1
+ ### New
+ * Add `response_type` to `in_parallel`
+
  ## 1.5.0
  ### New
  * First release as Tanakai
data/README.md CHANGED
@@ -626,7 +626,7 @@ Check out **Capybara cheat sheets** where you can see all available methods **to
 
  ### `request_to` method
 
- For making requests to a particular method there is `request_to`. It requires minimum two arguments: `:method_name` and `url:`. An optional argument is `data:` (see above what for is it). Example:
+ For making requests to a particular method there is `request_to`. It requires minimum two arguments: `:method_name` and `url:`. An optional argument is `data:` (see above what for is it) and `response_type` (defaults to `:html`). Example:
 
  ```ruby
  class Spider < Tanakai::Base
@@ -635,11 +635,12 @@ class Spider < Tanakai::Base
 
  def parse(response, url:, data: {})
  # Process request to `parse_product` method with `https://example.com/some_product` url:
- request_to :parse_product, url: "https://example.com/some_product"
+ request_to :parse_product, url: "https://example.com/some_product.json", response_type: :json
  end
 
  def parse_product(response, url:, data: {})
- puts "From page https://example.com/some_product !"
+ puts "JSON parsed from page https://example.com/some_product.json"
+ puts response
  end
  end
  ```
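With `response_type: :json`, the updated README example expects `parse_product` to receive already-parsed JSON (it prints `response` directly). The Ruby sketch below models only that conversion step. `convert_response` is a hypothetical helper, not Tanakai's API; for `:html` it just returns the raw body, whereas in Kimurai-style spiders `response` is typically a Nokogiri HTML document.

```ruby
require "json"

# Hypothetical stand-in for the conversion applied before a handler is called.
# Only the :json branch mirrors the README example; the :html branch is a placeholder.
def convert_response(body, response_type = :html)
  case response_type
  when :json then JSON.parse(body) # Hash/Array with string keys
  when :html then body             # the gem would hand back a parsed HTML document
  else raise ArgumentError, "unsupported response_type: #{response_type.inspect}"
  end
end

product = convert_response('{"name":"Widget","price":9.5}', :json)
product["name"] # => "Widget"
```

Restricting the switch to the two documented values (`:html`, `:json`) and raising otherwise keeps a typo such as `:jsn` from silently falling through.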
@@ -1194,6 +1195,7 @@ I, [2018-08-22 14:49:12 +0400#13033] [M: 46982297486840] INFO -- amazon_spider:
  * `delay:` set delay between requests: `in_parallel(:method, urls, threads: 3, delay: 2)`. Delay can be `Integer`, `Float` or `Range` (`2..5`). In case of a Range, delay number will be chosen randomly for each request: `rand (2..5) # => 3`
  * `engine:` set custom engine than a default one: `in_parallel(:method, urls, threads: 3, engine: :poltergeist_phantomjs)`
  * `config:` pass custom options to config (see [config section](#crawler-config))
+ * `response_type:` response should be returned as `:html` or `:json`, defaults to `:html`
 
  ### Active Support included
 
data/lib/tanakai/base.rb CHANGED
@@ -286,7 +286,7 @@ module Tanakai
  end
  end
 
- def in_parallel(handler, urls, threads:, data: {}, delay: nil, engine: @engine, config: {})
+ def in_parallel(handler, urls, threads:, data: {}, delay: nil, engine: @engine, config: {}, response_type: :html)
  parts = urls.in_sorted_groups(threads, false)
  urls_count = urls.size
 
@@ -304,12 +304,12 @@ module Tanakai
  part.each do |url_data|
  if url_data.class == Hash
  if url_data[:url].present? && url_data[:data].present?
- spider.request_to(handler, delay, url_data)
+ spider.request_to(handler, delay, url_data, response_type: response_type)
  else
  spider.public_send(handler, url_data)
  end
  else
- spider.request_to(handler, delay, url: url_data, data: data)
+ spider.request_to(handler, delay, url: url_data, data: data, response_type: response_type)
  end
  end
  ensure
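The `base.rb` change threads the new `response_type:` keyword from `in_parallel` into every `request_to` call, for both plain URL strings and `{ url:, data: }` hashes. A single-threaded Ruby sketch of that forwarding, under stated assumptions: `RecordingSpider` and `dispatch_urls` are hypothetical names, the stub's `request_to` signature is assumed (loosely modeled on the calls in the diff), thread pooling is dropped, and the branch that calls the handler directly for incomplete hashes is omitted.

```ruby
# Hypothetical stand-in for a Tanakai spider: it only records what it was asked to fetch.
class RecordingSpider
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Assumed signature: handler and optional delay positionally, the rest as keywords.
  def request_to(handler, delay = nil, url:, data: {}, response_type: :html)
    @calls << { handler: handler, delay: delay, url: url, data: data, response_type: response_type }
  end
end

# Simplified per-URL loop: response_type is passed through unchanged on every branch.
def dispatch_urls(spider, handler, urls, data: {}, delay: nil, response_type: :html)
  urls.each do |url_data|
    if url_data.is_a?(Hash)
      # Hash entries carry their own url/data; splat them into the stub's keywords.
      spider.request_to(handler, delay, **url_data, response_type: response_type)
    else
      # Plain URL strings share the batch-wide `data`.
      spider.request_to(handler, delay, url: url_data, data: data, response_type: response_type)
    end
  end
end

spider = RecordingSpider.new
dispatch_urls(spider, :parse_product,
              ["https://example.com/a.json", { url: "https://example.com/b.json", data: { id: 2 } }],
              response_type: :json)
spider.calls.map { |c| c[:response_type] } # => [:json, :json]
```

Because the stub's `request_to` takes keyword arguments, the Hash case is splatted with `**`; that detail belongs to the sketch, not to the gem's code shown above.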
@@ -4,7 +4,7 @@ git_source(:github) { |repo| "https://github.com/#{repo}.git" }
  ruby '>= 2.5'
 
  # Framework
- gem 'tanakai'
+ gem 'tanakai', '~> 1.5'
 
  # Require files in directory and child directories recursively
  gem 'require_all'
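The Gemfile hunk above (its file header is missing from this diff) now pins tanakai with the pessimistic operator `~> 1.5`, which RubyGems reads as `>= 1.5, < 2.0`: 1.5.1 and later 1.x releases are accepted, a future 2.0 is not. A small Ruby sketch using RubyGems' own `Gem::Requirement` and `Gem::Version`; `allowed_by_constraint?` is a hypothetical helper for illustration.

```ruby
require "rubygems"

# The constraint from the updated Gemfile line: gem 'tanakai', '~> 1.5'
TANAKAI_CONSTRAINT = Gem::Requirement.new("~> 1.5")

# True if the given tanakai version satisfies the constraint.
def allowed_by_constraint?(version)
  TANAKAI_CONSTRAINT.satisfied_by?(Gem::Version.new(version))
end

%w[1.4.9 1.5.0 1.5.1 1.9.3 2.0.0].each do |v|
  puts "#{v}: #{allowed_by_constraint?(v)}"
end
# 1.4.9: false
# 1.5.0: true
# 1.5.1: true
# 1.9.3: true
# 2.0.0: false
```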
@@ -1,3 +1,3 @@
  module Tanakai
- VERSION = "1.5.0"
+ VERSION = "1.5.1"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: tanakai
  version: !ruby/object:Gem::Version
- version: 1.5.0
+ version: 1.5.1
  platform: ruby
  authors:
  - Victor Afanasev