kimurai 1.0.0 → 1.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: d108c41e5da08b22c21cc6c71cc3ac7056ddd1af32054c22a22f0c59658bfcb4
- data.tar.gz: 8a8d32b7b8646eb50bd9f71d8986edc2ac78efc0e2e6a437b3280cff4418c5dd
+ metadata.gz: 7f2185614ca5aa8486c17e0c43b3b035cf22cd18d51617430f556a12af3dc7c8
+ data.tar.gz: 9e5c296feb5d020aa13bcfaa7f6f4c77d839ff373e8aa9d3e0abcc953aaa89de
  SHA512:
- metadata.gz: 4c82647cbe276980ef0a246693c7e68c08651351a549f99fbc6618bc9836c4a4ba83b4d09e1e29d06abcfa0d4f70443fb88682f57c544c0218b22940834a48b1
- data.tar.gz: 845f04c77fbb5e53b24d048e60f23e2c0f9fdeb4d2fde7dcaaa04bebfebc4454777ade03cae895e444583aafb6c8e56038d0d722589fde10076091903646fdf7
+ metadata.gz: 07d92edd8719cbfc701ac7d82975d4c06f5ba9f6adb0bdbbc6731f81655d70d077d140efa54b473b462058042078abab9218f5f00dab244f7478f91c62c8e24b
+ data.tar.gz: 5dc6a70b6379a46c58c917455a7eace96c1093944125888cbc2f9b2af93cf065de0ca00e9d98e786d9a2fbc3a53a1ea3dbf1712e703984d063dfc937ad5e0c71
@@ -0,0 +1,6 @@
+ # CHANGELOG
+ ## HEAD
+
+ ## 1.0.1
+ * Add missing `logger` method to pipeline
+ * Fix `set_proxy` in Mechanize and Poltergeist builders
data/README.md CHANGED
@@ -6,6 +6,18 @@
  <h1>Kimura Framework</h1>
  </div>

+ > **Note about v1.0.0 version:**
+ > * The code was massively refactored for a [support](#using-kimurai-inside-existing-ruby-application) to run spiders multiple times from inside a single process. Now it's possible to run Kimurai spiders using background jobs like Sidekiq.
+ > * `require 'kimurai'` doesn't require any gems except Active Support. Only when a particular spider [starts](#crawl-method), Capybara will be required with a specific driver.
+ > * Although Kimurai [extends](lib/kimurai/capybara_ext) Capybara (all the magic happens inside [extended](lib/kimurai/capybara_ext/session.rb) `Capybara::Session#visit` method), session instances which were created manually will behave normally.
+ > * No spaghetti code with `case/when/end` blocks anymore. All drivers [were extended](lib/kimurai/capybara_ext) to support unified methods for cookies, proxies, headers, etc.
+ > * `selenium_url_to_set_cookies` @config option don't need anymore if you're use Selenium-like engine with custom cookies setting.
+ > * Small changes in design (check the readme again to see what was changed)
+ > * Stats database with a web dashboard were removed
+ > * Again, massive refactor. Code now looks much better than it was before.
+
+ <br>
+
  Kimurai is a modern web scraping framework written in Ruby which **works out of box with Headless Chromium/Firefox, PhantomJS**, or simple HTTP requests and **allows to scrape and interact with JavaScript rendered websites.**

  Kimurai based on well-known [Capybara](https://github.com/teamcapybara/capybara) and [Nokogiri](https://github.com/sparklemotion/nokogiri) gems, so you don't have to learn anything new. Lets see:
@@ -217,7 +229,6 @@ I, [2018-08-22 13:33:30 +0400#23356] [M: 47375890851320] INFO -- infinite_scrol
  * [Kimurai](#kimurai)
  * [Features](#features)
  * [Table of Contents](#table-of-contents)
- * [Note about v1.0.0 version](#note-about-v1-0-0-version)
  * [Installation](#installation)
  * [Getting to Know](#getting-to-know)
  * [Interactive console](#interactive-console)
@@ -255,12 +266,6 @@ I, [2018-08-22 13:33:30 +0400#23356] [M: 47375890851320] INFO -- infinite_scrol
  * [Chat Support and Feedback](#chat-support-and-feedback)
  * [License](#license)

- ## Note about v1.0.0 version
- * The code was massively refactored for a [support](#using-kimurai-inside-existing-ruby-application) to run spiders multiple times from inside a single process. Now it's possible to run Kimurai spiders using background jobs like Sidekiq.
- * `require 'kimurai'` doesn't require any gems except Active Support. Only when a particular spider [starts](#crawl-method), Capybara will be required with a specific driver.
- * Although Kimurai [extends](lib/kimurai/capybara_ext) Capybara (all the magic happens inside [extended](lib/kimurai/capybara_ext/session.rb) `Capybara::Session#visit` method), session instances which were created manually will behave normally.
- * Small changes in design (check the readme again to see what was changed)
- * Again, massive refactor. Code now looks much better than it was before.

  ## Installation
  Kimurai requires Ruby version `>= 2.5.0`. Supported platforms: `Linux` and `Mac OS X`.
@@ -1604,7 +1609,7 @@ To generate a new spider in the project, run:

  ```bash
  $ kimurai generate spider example_spider
- create crawlers/example_spider.rb
+ create spiders/example_spider.rb
  ```

  Command will generate a new spider class inherited from `ApplicationSpider`:
@@ -37,7 +37,7 @@ module Kimurai
  if type == "socks5"
  logger.error "BrowserBuilder (mechanize): can't set socks5 proxy (not supported), skipped"
  else
- @browser.set_proxy(*proxy_string.split(":"))
+ @browser.driver.set_proxy(*proxy_string.split(":"))
  logger.debug "BrowserBuilder (mechanize): enabled #{type} proxy, ip: #{ip}, port: #{port}"
  end
  end
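
Both proxy hunks (the Mechanize one above and the Poltergeist one further down) make the same change: the proxy string is still split on `:` and splatted into `set_proxy`, but the receiver moves from `@browser` to `@browser.driver`. The following is a minimal sketch of that call pattern, assuming a proxy string of the form `ip:port:type`. `DemoDriver`, `DemoSession`, and `apply_proxy` are hypothetical stand-ins written for illustration; they are not Kimurai or Capybara classes, and the real `set_proxy` signature may differ.

```ruby
# Hypothetical driver: the object that actually knows how to configure a proxy.
class DemoDriver
  attr_reader :proxy

  # Assumed signature for illustration only; extra fields are optional.
  def set_proxy(ip, port, type = nil, user = nil, password = nil)
    @proxy = { ip: ip, port: port.to_i, type: type, user: user, password: password }
  end
end

# Hypothetical session: wraps a driver but defines no set_proxy of its own.
class DemoSession
  attr_reader :driver

  def initialize
    @driver = DemoDriver.new
  end
end

# Resolve the proxy (a String or a Proc returning one), split it on ":",
# and splat the parts into the driver's set_proxy, as the 1.0.1 hunks do.
def apply_proxy(session, proxy)
  proxy_string = (proxy.is_a?(Proc) ? proxy.call : proxy).strip
  session.driver.set_proxy(*proxy_string.split(":"))
end

session = DemoSession.new
apply_proxy(session, "127.0.0.1:8080:http")
puts session.driver.proxy.inspect
```

In this sketch, calling `set_proxy` on the session itself would raise `NoMethodError`, which is one plausible reading of why the receiver was changed; the diff itself only shows the new receiver.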
@@ -84,7 +84,7 @@ module Kimurai

  # restart_if
  if @config.dig(:browser, :restart_if).present?
- logger.error "BrowserBuilder (mechanize): `browser restart_if` options not supported by Mechanize, skipped"
+ logger.warn "BrowserBuilder (mechanize): `browser restart_if` options not supported by Mechanize, skipped"
  end

  # before_request clear_cookies
@@ -59,7 +59,7 @@ module Kimurai
  proxy_string = (proxy.class == Proc ? proxy.call : proxy).strip
  ip, port, type = proxy_string.split(":")

- @browser.set_proxy(*proxy_string.split(":"))
+ @browser.driver.set_proxy(*proxy_string.split(":"))
  logger.debug "BrowserBuilder (poltergeist_phantomjs): enabled #{type} proxy, ip: #{ip}, port: #{port}"
  end

 
@@ -21,5 +21,9 @@ module Kimurai
21
21
  def save_to(path, item, format:, position: true)
22
22
  spider.save_to(path, item, format: format, position: position)
23
23
  end
24
+
25
+ def logger
26
+ spider.logger
27
+ end
24
28
  end
25
29
  end
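
The hunk above adds a `logger` method that forwards to `spider.logger`, next to the existing `save_to` delegation. A minimal sketch of how that delegation might be used from inside a pipeline follows. `DemoSpider`, `DemoPipeline`, and `process_item` as written here are hypothetical stand-ins for illustration, not Kimurai's actual classes; only the body of `logger` mirrors the diff.

```ruby
require "logger"
require "stringio"

# Hypothetical spider: owns the logger instance.
class DemoSpider
  attr_reader :logger

  def initialize(io)
    @logger = Logger.new(io)
  end
end

# Hypothetical pipeline: holds a reference to its spider.
class DemoPipeline
  attr_accessor :spider

  # Same shape as the method added in 1.0.1: delegate to the spider's logger.
  def logger
    spider.logger
  end

  # Illustrative processing step that logs through the delegated logger.
  def process_item(item)
    logger.info "processing item: #{item[:title]}"
    item
  end
end

io = StringIO.new
pipeline = DemoPipeline.new
pipeline.spider = DemoSpider.new(io)
pipeline.process_item({ title: "example" })
puts io.string
```

Delegating instead of creating a second logger keeps pipeline messages in the same log stream as the spider's own output.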
@@ -1,3 +1,3 @@
  module Kimurai
- VERSION = "1.0.0"
+ VERSION = "1.0.1"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: kimurai
  version: !ruby/object:Gem::Version
- version: 1.0.0
+ version: 1.0.1
  platform: ruby
  authors:
  - Victor Afanasev
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2018-08-23 00:00:00.000000000 Z
+ date: 2018-08-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: thor
@@ -264,6 +264,7 @@ extra_rdoc_files: []
  files:
  - ".gitignore"
  - ".travis.yml"
+ - CHANGELOG.md
  - CODE_OF_CONDUCT.md
  - Gemfile
  - LICENSE.txt