kimurai 1.0.0 → 1.0.1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7f2185614ca5aa8486c17e0c43b3b035cf22cd18d51617430f556a12af3dc7c8
+  data.tar.gz: 9e5c296feb5d020aa13bcfaa7f6f4c77d839ff373e8aa9d3e0abcc953aaa89de
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 07d92edd8719cbfc701ac7d82975d4c06f5ba9f6adb0bdbbc6731f81655d70d077d140efa54b473b462058042078abab9218f5f00dab244f7478f91c62c8e24b
+  data.tar.gz: 5dc6a70b6379a46c58c917455a7eace96c1093944125888cbc2f9b2af93cf065de0ca00e9d98e786d9a2fbc3a53a1ea3dbf1712e703984d063dfc937ad5e0c71
data/CHANGELOG.md
ADDED
data/README.md
CHANGED
@@ -6,6 +6,18 @@
 <h1>Kimura Framework</h1>
 </div>
 
+> **Note about v1.0.0 version:**
+> * The code was massively refactored for a [support](#using-kimurai-inside-existing-ruby-application) to run spiders multiple times from inside a single process. Now it's possible to run Kimurai spiders using background jobs like Sidekiq.
+> * `require 'kimurai'` doesn't require any gems except Active Support. Only when a particular spider [starts](#crawl-method), Capybara will be required with a specific driver.
+> * Although Kimurai [extends](lib/kimurai/capybara_ext) Capybara (all the magic happens inside [extended](lib/kimurai/capybara_ext/session.rb) `Capybara::Session#visit` method), session instances which were created manually will behave normally.
+> * No spaghetti code with `case/when/end` blocks anymore. All drivers [were extended](lib/kimurai/capybara_ext) to support unified methods for cookies, proxies, headers, etc.
+> * `selenium_url_to_set_cookies` @config option don't need anymore if you're use Selenium-like engine with custom cookies setting.
+> * Small changes in design (check the readme again to see what was changed)
+> * Stats database with a web dashboard were removed
+> * Again, massive refactor. Code now looks much better than it was before.
+
+<br>
+
 Kimurai is a modern web scraping framework written in Ruby which **works out of box with Headless Chromium/Firefox, PhantomJS**, or simple HTTP requests and **allows to scrape and interact with JavaScript rendered websites.**
 
 Kimurai based on well-known [Capybara](https://github.com/teamcapybara/capybara) and [Nokogiri](https://github.com/sparklemotion/nokogiri) gems, so you don't have to learn anything new. Lets see:
@@ -217,7 +229,6 @@ I, [2018-08-22 13:33:30 +0400#23356] [M: 47375890851320] INFO -- infinite_scrol
 * [Kimurai](#kimurai)
 * [Features](#features)
 * [Table of Contents](#table-of-contents)
-* [Note about v1.0.0 version](#note-about-v1-0-0-version)
 * [Installation](#installation)
 * [Getting to Know](#getting-to-know)
 * [Interactive console](#interactive-console)
@@ -255,12 +266,6 @@ I, [2018-08-22 13:33:30 +0400#23356] [M: 47375890851320] INFO -- infinite_scrol
 * [Chat Support and Feedback](#chat-support-and-feedback)
 * [License](#license)
 
-## Note about v1.0.0 version
-* The code was massively refactored for a [support](#using-kimurai-inside-existing-ruby-application) to run spiders multiple times from inside a single process. Now it's possible to run Kimurai spiders using background jobs like Sidekiq.
-* `require 'kimurai'` doesn't require any gems except Active Support. Only when a particular spider [starts](#crawl-method), Capybara will be required with a specific driver.
-* Although Kimurai [extends](lib/kimurai/capybara_ext) Capybara (all the magic happens inside [extended](lib/kimurai/capybara_ext/session.rb) `Capybara::Session#visit` method), session instances which were created manually will behave normally.
-* Small changes in design (check the readme again to see what was changed)
-* Again, massive refactor. Code now looks much better than it was before.
 
 ## Installation
 Kimurai requires Ruby version `>= 2.5.0`. Supported platforms: `Linux` and `Mac OS X`.
@@ -1604,7 +1609,7 @@ To generate a new spider in the project, run:
 
 ```bash
 $ kimurai generate spider example_spider
-create
+create spiders/example_spider.rb
 ```
 
 Command will generate a new spider class inherited from `ApplicationSpider`:
@@ -37,7 +37,7 @@ module Kimurai
   if type == "socks5"
     logger.error "BrowserBuilder (mechanize): can't set socks5 proxy (not supported), skipped"
   else
-    @browser.set_proxy(*proxy_string.split(":"))
+    @browser.driver.set_proxy(*proxy_string.split(":"))
     logger.debug "BrowserBuilder (mechanize): enabled #{type} proxy, ip: #{ip}, port: #{port}"
   end
 end
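The hunk above moves the `set_proxy` call from the session object to its driver and leaves the socks5 guard unchanged. The following is a minimal Ruby sketch of that control flow, not the gem's code: `FakeDriver`, `FakeSession` and `apply_proxy` are hypothetical stand-ins, and the three-argument `set_proxy` signature is an assumption made for illustration.

```ruby
require "logger"

# Hypothetical stand-ins; not the real Capybara session or Mechanize driver.
FakeDriver = Struct.new(:proxy) do
  # Assumed signature, for illustration only.
  def set_proxy(ip, port, type = nil)
    self.proxy = [ip, port, type]
  end
end
FakeSession = Struct.new(:driver)

# Skips socks5 (logged as unsupported in the hunk above); otherwise forwards
# the "ip:port:type" parts to the driver rather than to the session itself.
def apply_proxy(session, proxy_string, logger)
  ip, port, type = proxy_string.split(":")
  if type == "socks5"
    logger.error "can't set socks5 proxy (not supported), skipped"
    false
  else
    session.driver.set_proxy(*proxy_string.split(":"))
    logger.debug "enabled #{type} proxy, ip: #{ip}, port: #{port}"
    true
  end
end

session = FakeSession.new(FakeDriver.new)
log = Logger.new(nil) # a nil log device discards all output
apply_proxy(session, "127.0.0.1:8080:http", log)
```

In this sketch the session only exposes `driver`, so calling `set_proxy` on the session would raise `NoMethodError`, which is the kind of failure the one-line change appears to address.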
@@ -84,7 +84,7 @@ module Kimurai
 
 # restart_if
 if @config.dig(:browser, :restart_if).present?
-  logger.
+  logger.warn "BrowserBuilder (mechanize): `browser restart_if` options not supported by Mechanize, skipped"
 end
 
 # before_request clear_cookies
@@ -59,7 +59,7 @@ module Kimurai
   proxy_string = (proxy.class == Proc ? proxy.call : proxy).strip
   ip, port, type = proxy_string.split(":")
 
-  @browser.set_proxy(*proxy_string.split(":"))
+  @browser.driver.set_proxy(*proxy_string.split(":"))
   logger.debug "BrowserBuilder (poltergeist_phantomjs): enabled #{type} proxy, ip: #{ip}, port: #{port}"
 end
 
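The context lines of the hunk above show that the proxy setting may be either a string or a `Proc` that returns one, and that it is split on `:` into ip, port and type. A small Ruby sketch of that resolution step follows. `resolve_proxy` is a hypothetical helper name, the addresses are made-up examples, and `is_a?(Proc)` is used here in place of the `proxy.class == Proc` comparison seen in the diff.

```ruby
# Resolve a proxy setting that is either a String or a Proc returning a String,
# then split it into its "ip:port:type" parts.
# Hypothetical helper for illustration; not part of Kimurai.
def resolve_proxy(proxy)
  proxy_string = (proxy.is_a?(Proc) ? proxy.call : proxy).strip
  proxy_string.split(":")
end

# A fixed proxy given as a plain string (surrounding whitespace is stripped).
static = resolve_proxy(" 10.0.0.1:3128:http ")

# A rotating proxy: the Proc is called each time the setting is resolved.
pool = ["10.0.0.2:3128:http", "10.0.0.3:3128:http"]
rotating = resolve_proxy(-> { pool.sample })
```

Accepting a `Proc` lets the caller pick a different proxy each time the browser is built, while a plain string keeps the simple case simple.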
data/lib/kimurai/pipeline.rb
CHANGED
data/lib/kimurai/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: kimurai
 version: !ruby/object:Gem::Version
-  version: 1.0.
+  version: 1.0.1
 platform: ruby
 authors:
 - Victor Afanasev
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2018-08-
+date: 2018-08-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: thor
@@ -264,6 +264,7 @@ extra_rdoc_files: []
 files:
 - ".gitignore"
 - ".travis.yml"
+- CHANGELOG.md
 - CODE_OF_CONDUCT.md
 - Gemfile
 - LICENSE.txt