tanakai 1.5.0 → 1.5.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +5 -3
- data/lib/tanakai/base.rb +3 -3
- data/lib/tanakai/template/Gemfile +1 -1
- data/lib/tanakai/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: aefc92adbda49240ac69ad9d3163ec3bb54930d8288705943d269ccbba41c64a
+  data.tar.gz: c1c361906354aba4edb1dcef9e2737b4e827c794c5f61b4104ec309a67e3ca80
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c744ae590cbb25e9dde914174be9fb003399bdc25408c9901c3898c296d4ff5428b97d3e4942c952979586946c75b4c7027edd594d438dc387dffb6afedda3bb
+  data.tar.gz: ac51ac208e71cd928aa402d62f51e465b11620b742f5ab8af208323e1f636503345ae910a4a6c0fe21f3d415bcff7bba600b6022bf879877695ba711463e97c0
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -626,7 +626,7 @@ Check out **Capybara cheat sheets** where you can see all available methods **to
 
 ### `request_to` method
 
-For making requests to a particular method there is `request_to`. It requires minimum two arguments: `:method_name` and `url:`. An optional argument is `data:` (see above what for is it). Example:
+For making requests to a particular method there is `request_to`. It requires minimum two arguments: `:method_name` and `url:`. An optional argument is `data:` (see above what for is it) and `response_type` (defaults to `:html`). Example:
 
 ```ruby
 class Spider < Tanakai::Base
@@ -635,11 +635,12 @@ class Spider < Tanakai::Base
 
   def parse(response, url:, data: {})
     # Process request to `parse_product` method with `https://example.com/some_product` url:
-    request_to :parse_product, url: "https://example.com/some_product"
+    request_to :parse_product, url: "https://example.com/some_product.json", response_type: :json
   end
 
   def parse_product(response, url:, data: {})
-    puts "
+    puts "JSON parsed from page https://example.com/some_product.json"
+    puts response
   end
 end
 ```
@@ -1194,6 +1195,7 @@ I, [2018-08-22 14:49:12 +0400#13033] [M: 46982297486840] INFO -- amazon_spider:
 * `delay:` set delay between requests: `in_parallel(:method, urls, threads: 3, delay: 2)`. Delay can be `Integer`, `Float` or `Range` (`2..5`). In case of a Range, delay number will be chosen randomly for each request: `rand (2..5) # => 3`
 * `engine:` set custom engine than a default one: `in_parallel(:method, urls, threads: 3, engine: :poltergeist_phantomjs)`
 * `config:` pass custom options to config (see [config section](#crawler-config))
+* `response_type:` response should be returned as `:html` or `:json`, defaults to `:html`
 
 ### Active Support included
 
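The README changes above describe `response_type:` as a switch between `:html` and `:json`, with `:html` as the default. The sketch below illustrates that kind of switch in isolation. `parse_body` is a hypothetical helper, not part of Tanakai, and the gem's actual handling of each type is not shown in this diff; the HTML branch simply returns the raw string as a stand-in.

```ruby
require "json"

# Hypothetical helper, not Tanakai's implementation: chooses how to
# interpret a response body based on a response_type symbol.
def parse_body(body, response_type: :html)
  case response_type
  when :json
    JSON.parse(body)
  when :html
    body # stand-in: a real spider would build an HTML document here
  else
    raise ArgumentError, "unsupported response_type: #{response_type.inspect}"
  end
end

product = parse_body('{"name": "Widget", "price": 9.5}', response_type: :json)
puts product["name"]

page = parse_body("<h1>Widget</h1>") # falls back to the :html default
puts page
```

Rejecting unknown symbols early, as this sketch does, is one possible design choice; whether Tanakai validates `response_type:` is not visible here.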
data/lib/tanakai/base.rb
CHANGED
@@ -286,7 +286,7 @@ module Tanakai
   end
 end
 
-def in_parallel(handler, urls, threads:, data: {}, delay: nil, engine: @engine, config: {})
+def in_parallel(handler, urls, threads:, data: {}, delay: nil, engine: @engine, config: {}, response_type: :html)
   parts = urls.in_sorted_groups(threads, false)
   urls_count = urls.size
 
@@ -304,12 +304,12 @@ module Tanakai
   part.each do |url_data|
     if url_data.class == Hash
       if url_data[:url].present? && url_data[:data].present?
-        spider.request_to(handler, delay, url_data)
+        spider.request_to(handler, delay, url_data, response_type: response_type)
       else
         spider.public_send(handler, url_data)
       end
     else
-      spider.request_to(handler, delay, url: url_data, data: data)
+      spider.request_to(handler, delay, url: url_data, data: data, response_type: response_type)
     end
   end
 ensure
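The hash branch in this hunk passes `url_data` positionally next to an explicit `response_type:` keyword. Whether that call succeeds depends on the signature of `request_to`, which this diff does not show. The sketch below assumes a signature with `url:` and `data:` as keyword arguments (the README call sites suggest this, but it is an assumption) and uses a hypothetical stand-in, `request_to_stub`, to show how Ruby treats the two call styles.

```ruby
# Hypothetical stand-in for Spider#request_to; the real signature is not
# shown in this diff and is assumed here to use keyword arguments.
def request_to_stub(handler, delay = nil, url:, data: {}, response_type: :html)
  { handler: handler, url: url, data: data, response_type: response_type }
end

url_data = { url: "https://example.com/some_product.json", data: { id: 1 } }

# A hash passed next to an explicit keyword stays a positional argument,
# so it counts as a third positional and the call is rejected.
begin
  request_to_stub(:parse_product, nil, url_data, response_type: :json)
  positional_ok = true
rescue ArgumentError => e
  positional_ok = false
  puts "positional hash rejected: #{e.message}"
end

# A double splat turns the hash entries into keyword arguments.
result = request_to_stub(:parse_product, nil, **url_data, response_type: :json)
puts result[:url]
```

If the real method does take keywords, forwarding with `**url_data` would be one way to keep the hash branch working; if it accepts a positional hash instead, the call in the hunk is fine as written.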
data/lib/tanakai/version.rb
CHANGED