kimurai 1.0.0 → 1.0.1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7f2185614ca5aa8486c17e0c43b3b035cf22cd18d51617430f556a12af3dc7c8
+  data.tar.gz: 9e5c296feb5d020aa13bcfaa7f6f4c77d839ff373e8aa9d3e0abcc953aaa89de
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 07d92edd8719cbfc701ac7d82975d4c06f5ba9f6adb0bdbbc6731f81655d70d077d140efa54b473b462058042078abab9218f5f00dab244f7478f91c62c8e24b
+  data.tar.gz: 5dc6a70b6379a46c58c917455a7eace96c1093944125888cbc2f9b2af93cf065de0ca00e9d98e786d9a2fbc3a53a1ea3dbf1712e703984d063dfc937ad5e0c71
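The SHA256 entries in `checksums.yaml` can be compared against the archive members of a downloaded gem. The following is a minimal sketch using Ruby's standard `Digest` library; the helper name `sha256_matches?` and the throwaway temp file are illustrative assumptions, not part of RubyGems or Kimurai.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the file's SHA256 hex digest equals `expected`.
def sha256_matches?(path, expected)
  Digest::SHA256.file(path).hexdigest == expected.downcase
end

# Demonstration with a throwaway file instead of a real gem archive.
file = Tempfile.new("demo")
file.write("hello")
file.flush

expected = Digest::SHA256.hexdigest("hello")
ok = sha256_matches?(file.path, expected)
bad = sha256_matches?(file.path, "0" * 64)
file.close
```

For a real check, `path` would point at an extracted `metadata.gz` or `data.tar.gz`, and `expected` would be the matching value from `checksums.yaml`.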
data/CHANGELOG.md
ADDED
data/README.md
CHANGED
@@ -6,6 +6,18 @@
 <h1>Kimura Framework</h1>
 </div>
 
+> **Note about v1.0.0 version:**
+> * The code was massively refactored for a [support](#using-kimurai-inside-existing-ruby-application) to run spiders multiple times from inside a single process. Now it's possible to run Kimurai spiders using background jobs like Sidekiq.
+> * `require 'kimurai'` doesn't require any gems except Active Support. Only when a particular spider [starts](#crawl-method), Capybara will be required with a specific driver.
+> * Although Kimurai [extends](lib/kimurai/capybara_ext) Capybara (all the magic happens inside [extended](lib/kimurai/capybara_ext/session.rb) `Capybara::Session#visit` method), session instances which were created manually will behave normally.
+> * No spaghetti code with `case/when/end` blocks anymore. All drivers [were extended](lib/kimurai/capybara_ext) to support unified methods for cookies, proxies, headers, etc.
+> * `selenium_url_to_set_cookies` @config option don't need anymore if you're use Selenium-like engine with custom cookies setting.
+> * Small changes in design (check the readme again to see what was changed)
+> * Stats database with a web dashboard were removed
+> * Again, massive refactor. Code now looks much better than it was before.
+
+<br>
+
 Kimurai is a modern web scraping framework written in Ruby which **works out of box with Headless Chromium/Firefox, PhantomJS**, or simple HTTP requests and **allows to scrape and interact with JavaScript rendered websites.**
 
 Kimurai based on well-known [Capybara](https://github.com/teamcapybara/capybara) and [Nokogiri](https://github.com/sparklemotion/nokogiri) gems, so you don't have to learn anything new. Lets see:
@@ -217,7 +229,6 @@ I, [2018-08-22 13:33:30 +0400#23356] [M: 47375890851320] INFO -- infinite_scrol
 * [Kimurai](#kimurai)
 * [Features](#features)
 * [Table of Contents](#table-of-contents)
-* [Note about v1.0.0 version](#note-about-v1-0-0-version)
 * [Installation](#installation)
 * [Getting to Know](#getting-to-know)
 * [Interactive console](#interactive-console)
@@ -255,12 +266,6 @@ I, [2018-08-22 13:33:30 +0400#23356] [M: 47375890851320] INFO -- infinite_scrol
 * [Chat Support and Feedback](#chat-support-and-feedback)
 * [License](#license)
 
-## Note about v1.0.0 version
-* The code was massively refactored for a [support](#using-kimurai-inside-existing-ruby-application) to run spiders multiple times from inside a single process. Now it's possible to run Kimurai spiders using background jobs like Sidekiq.
-* `require 'kimurai'` doesn't require any gems except Active Support. Only when a particular spider [starts](#crawl-method), Capybara will be required with a specific driver.
-* Although Kimurai [extends](lib/kimurai/capybara_ext) Capybara (all the magic happens inside [extended](lib/kimurai/capybara_ext/session.rb) `Capybara::Session#visit` method), session instances which were created manually will behave normally.
-* Small changes in design (check the readme again to see what was changed)
-* Again, massive refactor. Code now looks much better than it was before.
 
 ## Installation
 Kimurai requires Ruby version `>= 2.5.0`. Supported platforms: `Linux` and `Mac OS X`.
@@ -1604,7 +1609,7 @@ To generate a new spider in the project, run:
 
 ```bash
 $ kimurai generate spider example_spider
-create
+create spiders/example_spider.rb
 ```
 
 Command will generate a new spider class inherited from `ApplicationSpider`:
@@ -37,7 +37,7 @@ module Kimurai
   if type == "socks5"
     logger.error "BrowserBuilder (mechanize): can't set socks5 proxy (not supported), skipped"
   else
-    @browser.set_proxy(*proxy_string.split(":"))
+    @browser.driver.set_proxy(*proxy_string.split(":"))
     logger.debug "BrowserBuilder (mechanize): enabled #{type} proxy, ip: #{ip}, port: #{port}"
   end
 end
@@ -84,7 +84,7 @@ module Kimurai
 
 # restart_if
 if @config.dig(:browser, :restart_if).present?
-  logger.
+  logger.warn "BrowserBuilder (mechanize): `browser restart_if` options not supported by Mechanize, skipped"
 end
 
 # before_request clear_cookies
@@ -59,7 +59,7 @@ module Kimurai
   proxy_string = (proxy.class == Proc ? proxy.call : proxy).strip
   ip, port, type = proxy_string.split(":")
 
-  @browser.set_proxy(*proxy_string.split(":"))
+  @browser.driver.set_proxy(*proxy_string.split(":"))
   logger.debug "BrowserBuilder (poltergeist_phantomjs): enabled #{type} proxy, ip: #{ip}, port: #{port}"
 end
 
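Both proxy hunks move the `set_proxy` call from the session object to its driver, while the `ip:port:type` string is still split on `:` and splatted into the call. A minimal sketch of that parsing step follows, assuming a hypothetical `FakeDriver` stand-in and an illustrative proxy value; the real driver's `set_proxy` signature is not shown in this diff.

```ruby
# Hypothetical stand-in for a Capybara driver; not part of Kimurai.
class FakeDriver
  attr_reader :proxy

  # Assumed signature: host, port, then an optional proxy type.
  def set_proxy(ip, port, type = nil)
    @proxy = { ip: ip, port: port, type: type }
  end
end

# Illustrative value; a Proc returning such a string is also accepted.
proxy = -> { " 127.0.0.1:8080:http " }

# Same resolution and split as in the hunk above.
proxy_string = (proxy.class == Proc ? proxy.call : proxy).strip
ip, port, type = proxy_string.split(":")

driver = FakeDriver.new
driver.set_proxy(*proxy_string.split(":"))
```

Note that `split` returns strings, so `port` stays a `String` here; any conversion would be up to the driver.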
data/lib/kimurai/pipeline.rb
CHANGED
data/lib/kimurai/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: kimurai
 version: !ruby/object:Gem::Version
-  version: 1.0.
+  version: 1.0.1
 platform: ruby
 authors:
 - Victor Afanasev
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2018-08-
+date: 2018-08-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: thor
@@ -264,6 +264,7 @@ extra_rdoc_files: []
 files:
 - ".gitignore"
 - ".travis.yml"
+- CHANGELOG.md
 - CODE_OF_CONDUCT.md
 - Gemfile
 - LICENSE.txt