tanakai 1.5.0 → 1.5.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +5 -3
- data/lib/tanakai/base.rb +3 -3
- data/lib/tanakai/template/Gemfile +1 -1
- data/lib/tanakai/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: aefc92adbda49240ac69ad9d3163ec3bb54930d8288705943d269ccbba41c64a
+  data.tar.gz: c1c361906354aba4edb1dcef9e2737b4e827c794c5f61b4104ec309a67e3ca80
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c744ae590cbb25e9dde914174be9fb003399bdc25408c9901c3898c296d4ff5428b97d3e4942c952979586946c75b4c7027edd594d438dc387dffb6afedda3bb
+  data.tar.gz: ac51ac208e71cd928aa402d62f51e465b11620b742f5ab8af208323e1f636503345ae910a4a6c0fe21f3d415bcff7bba600b6022bf879877695ba711463e97c0
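The `checksums.yaml` entries are SHA256 and SHA512 digests of the two archives packed inside the `.gem` file. Below is a minimal sketch of checking them with Ruby's standard `digest` and `yaml` libraries. It assumes the raw bytes of `metadata.gz` and `data.tar.gz` have already been read out of the gem. The helper name `checksum_mismatches` is hypothetical and not part of RubyGems.

```ruby
require "digest"
require "yaml"

# Algorithm names used as top-level keys in checksums.yaml, mapped to the
# stdlib digest classes that compute them.
DIGESTS = {
  "SHA256" => Digest::SHA256,
  "SHA512" => Digest::SHA512
}.freeze

# Hypothetical helper: compares every digest listed in a checksums.yaml
# document against the supplied file contents. `files` maps an entry name
# ("metadata.gz", "data.tar.gz") to its raw bytes. Returns the
# [algorithm, name] pairs that do not match; an empty array means all match.
def checksum_mismatches(checksums_yaml, files)
  YAML.safe_load(checksums_yaml).flat_map do |algorithm, entries|
    digest_class = DIGESTS.fetch(algorithm)
    entries.reject { |name, hex| digest_class.hexdigest(files.fetch(name)) == hex }
           .map { |name, _hex| [algorithm, name] }
  end
end
```

A `.gem` file is a plain tar archive, so in practice the bytes could be read with `Gem::Package::TarReader` (from `require "rubygems/package"`) by picking out the `metadata.gz` and `data.tar.gz` entries.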
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -626,7 +626,7 @@ Check out **Capybara cheat sheets** where you can see all available methods **to
 
 ### `request_to` method
 
-For making requests to a particular method there is `request_to`. It requires minimum two arguments: `:method_name` and `url:`. An optional argument is `data:` (see above what for is it). Example:
+For making requests to a particular method there is `request_to`. It requires minimum two arguments: `:method_name` and `url:`. An optional argument is `data:` (see above what for is it) and `response_type` (defaults to `:html`). Example:
 
 ```ruby
 class Spider < Tanakai::Base
@@ -635,11 +635,12 @@ class Spider < Tanakai::Base
 
   def parse(response, url:, data: {})
     # Process request to `parse_product` method with `https://example.com/some_product` url:
-    request_to :parse_product, url: "https://example.com/some_product"
+    request_to :parse_product, url: "https://example.com/some_product.json", response_type: :json
   end
 
   def parse_product(response, url:, data: {})
-    puts "
+    puts "JSON parsed from page https://example.com/some_product.json"
+    puts response
   end
 end
 ```
@@ -1194,6 +1195,7 @@ I, [2018-08-22 14:49:12 +0400#13033] [M: 46982297486840] INFO -- amazon_spider:
 * `delay:` set delay between requests: `in_parallel(:method, urls, threads: 3, delay: 2)`. Delay can be `Integer`, `Float` or `Range` (`2..5`). In case of a Range, delay number will be chosen randomly for each request: `rand (2..5) # => 3`
 * `engine:` set custom engine than a default one: `in_parallel(:method, urls, threads: 3, engine: :poltergeist_phantomjs)`
 * `config:` pass custom options to config (see [config section](#crawler-config))
+* `response_type:` response should be returned as `:html` or `:json`, defaults to `:html`
 
 ### Active Support included
 
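The README change adds an optional `response_type` to `request_to`, defaulting to `:html`, and the updated example requests `:json`. The sketch below is a hypothetical offline stand-in (`OfflineSpider` is not part of tanakai) that serves page bodies from a Hash instead of a browser. It illustrates how such a keyword could select the parsing step. It assumes `:json` means the body is run through `JSON.parse`, and for `:html` it returns the raw body rather than the HTML document object a real spider would presumably build.

```ruby
require "json"

# Hypothetical offline stand-in for a tanakai spider (not part of the gem):
# page bodies come from a Hash instead of a browser session.
class OfflineSpider
  def initialize(pages)
    @pages = pages # url => response body
  end

  # Keywords mirror the README: url: is required, data: and response_type:
  # are optional, and response_type defaults to :html.
  def request_to(handler, url:, data: {}, response_type: :html)
    body = @pages.fetch(url)
    response =
      case response_type
      when :html then body # raw body; a real spider would build an HTML document
      when :json then JSON.parse(body)
      else raise ArgumentError, "unsupported response_type: #{response_type.inspect}"
      end
    public_send(handler, response, url: url, data: data)
  end

  def parse_product(response, url:, data: {})
    { url: url, response: response }
  end
end

spider = OfflineSpider.new({ "https://example.com/some_product.json" => '{"name":"Widget"}' })
result = spider.request_to(:parse_product, url: "https://example.com/some_product.json", response_type: :json)
puts result[:response]["name"] # prints Widget
```

Raising on an unknown `response_type` is a choice made for this sketch so that typos fail loudly; the diff does not show what tanakai itself does with other values.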
data/lib/tanakai/base.rb
CHANGED
@@ -286,7 +286,7 @@ module Tanakai
       end
     end
 
-    def in_parallel(handler, urls, threads:, data: {}, delay: nil, engine: @engine, config: {})
+    def in_parallel(handler, urls, threads:, data: {}, delay: nil, engine: @engine, config: {}, response_type: :html)
       parts = urls.in_sorted_groups(threads, false)
       urls_count = urls.size
 
@@ -304,12 +304,12 @@ module Tanakai
           part.each do |url_data|
             if url_data.class == Hash
               if url_data[:url].present? && url_data[:data].present?
-                spider.request_to(handler, delay, url_data)
+                spider.request_to(handler, delay, url_data, response_type: response_type)
               else
                 spider.public_send(handler, url_data)
               end
             else
-              spider.request_to(handler, delay, url: url_data, data: data)
+              spider.request_to(handler, delay, url: url_data, data: data, response_type: response_type)
             end
           end
         ensure
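In `base.rb`, `in_parallel` gains the same `response_type:` keyword (default `:html`) and forwards it to each `request_to` call it makes. The sequential sketch below reproduces that per-URL branching without threads or a browser, so the forwarding can be checked offline. `RecordingSpider`, `dispatch` and `present_value?` are hypothetical names. `present_value?` only approximates ActiveSupport's `present?`. The `request_to(handler, delay = nil, url:, data: {}, response_type: :html)` signature is an assumption inferred from the call sites, since its definition is not part of this diff. The sketch passes a Hash entry's keys as keywords with `**url_data`, which is the form Ruby 3 requires when a hash should fill keyword parameters.

```ruby
# Hypothetical spider stand-in: records the calls in_parallel would make.
# The request_to signature is an assumption inferred from the call sites.
class RecordingSpider
  attr_reader :calls

  def initialize
    @calls = []
  end

  def request_to(handler, delay = nil, url:, data: {}, response_type: :html)
    @calls << [:request_to, handler, url, data, response_type]
  end

  # Handler invoked directly for Hash entries that lack url/data.
  def parse_item(url_data)
    @calls << [:direct, url_data]
  end
end

# Rough approximation of ActiveSupport's present?: nil, false and empty
# strings/collections count as absent (whitespace-only strings are not handled).
def present_value?(value)
  value.respond_to?(:empty?) ? !value.empty? : !!value
end

# Sequential version of the per-URL loop in in_parallel (no threads, one spider).
def dispatch(spider, handler, urls, data: {}, delay: nil, response_type: :html)
  urls.each do |url_data|
    if url_data.is_a?(Hash)
      if present_value?(url_data[:url]) && present_value?(url_data[:data])
        spider.request_to(handler, delay, **url_data, response_type: response_type)
      else
        spider.public_send(handler, url_data)
      end
    else
      spider.request_to(handler, delay, url: url_data, data: data, response_type: response_type)
    end
  end
end

spider = RecordingSpider.new
dispatch(spider, :parse_item,
         ["https://example.com/a.json",
          { url: "https://example.com/b.json", data: { id: 1 } },
          { id: 2 }],
         response_type: :json)
spider.calls.each { |call| p call }
```

Both plain URL strings and `{ url:, data: }` hashes reach `request_to` with the caller's `response_type`, while a hash without both keys is handed to the handler directly, matching the three branches in the hunk.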
data/lib/tanakai/version.rb
CHANGED