kimurai_dynamic 1.4.1
- checksums.yaml +7 -0
- data/.gitignore +11 -0
- data/.travis.yml +5 -0
- data/CHANGELOG.md +111 -0
- data/Gemfile +6 -0
- data/LICENSE.txt +21 -0
- data/README.md +2038 -0
- data/Rakefile +10 -0
- data/bin/console +14 -0
- data/bin/setup +8 -0
- data/exe/kimurai +6 -0
- data/kimurai.gemspec +48 -0
- data/lib/kimurai/automation/deploy.yml +54 -0
- data/lib/kimurai/automation/setup/chromium_chromedriver.yml +26 -0
- data/lib/kimurai/automation/setup/firefox_geckodriver.yml +20 -0
- data/lib/kimurai/automation/setup/phantomjs.yml +33 -0
- data/lib/kimurai/automation/setup/ruby_environment.yml +124 -0
- data/lib/kimurai/automation/setup.yml +45 -0
- data/lib/kimurai/base/saver.rb +106 -0
- data/lib/kimurai/base/storage.rb +54 -0
- data/lib/kimurai/base.rb +330 -0
- data/lib/kimurai/base_helper.rb +22 -0
- data/lib/kimurai/browser_builder/mechanize_builder.rb +154 -0
- data/lib/kimurai/browser_builder/poltergeist_phantomjs_builder.rb +175 -0
- data/lib/kimurai/browser_builder/selenium_chrome_builder.rb +199 -0
- data/lib/kimurai/browser_builder/selenium_firefox_builder.rb +204 -0
- data/lib/kimurai/browser_builder.rb +20 -0
- data/lib/kimurai/capybara_configuration.rb +10 -0
- data/lib/kimurai/capybara_ext/driver/base.rb +62 -0
- data/lib/kimurai/capybara_ext/mechanize/driver.rb +71 -0
- data/lib/kimurai/capybara_ext/poltergeist/driver.rb +13 -0
- data/lib/kimurai/capybara_ext/selenium/driver.rb +34 -0
- data/lib/kimurai/capybara_ext/session/config.rb +22 -0
- data/lib/kimurai/capybara_ext/session.rb +249 -0
- data/lib/kimurai/cli/ansible_command_builder.rb +71 -0
- data/lib/kimurai/cli/generator.rb +57 -0
- data/lib/kimurai/cli.rb +183 -0
- data/lib/kimurai/core_ext/array.rb +14 -0
- data/lib/kimurai/core_ext/hash.rb +5 -0
- data/lib/kimurai/core_ext/numeric.rb +19 -0
- data/lib/kimurai/core_ext/string.rb +7 -0
- data/lib/kimurai/pipeline.rb +33 -0
- data/lib/kimurai/runner.rb +60 -0
- data/lib/kimurai/template/.gitignore +18 -0
- data/lib/kimurai/template/Gemfile +28 -0
- data/lib/kimurai/template/README.md +3 -0
- data/lib/kimurai/template/config/application.rb +37 -0
- data/lib/kimurai/template/config/automation.yml +13 -0
- data/lib/kimurai/template/config/boot.rb +22 -0
- data/lib/kimurai/template/config/initializers/.keep +0 -0
- data/lib/kimurai/template/config/schedule.rb +57 -0
- data/lib/kimurai/template/db/.keep +0 -0
- data/lib/kimurai/template/helpers/application_helper.rb +3 -0
- data/lib/kimurai/template/lib/.keep +0 -0
- data/lib/kimurai/template/log/.keep +0 -0
- data/lib/kimurai/template/pipelines/saver.rb +11 -0
- data/lib/kimurai/template/pipelines/validator.rb +24 -0
- data/lib/kimurai/template/spiders/application_spider.rb +143 -0
- data/lib/kimurai/template/tmp/.keep +0 -0
- data/lib/kimurai/version.rb +3 -0
- data/lib/kimurai.rb +54 -0
- metadata +349 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 1c8491f3e3723cfbaed46c66dbe215fd2708f185bb5f53eda5464975926a39a9
+  data.tar.gz: fabb7dbc5da7aa18963996f4b161f533198737f19007eb3c93c083745a885f2a
+SHA512:
+  metadata.gz: 44aa5bdc2349826677f3f7bc935db5a2b7a3044176ddc3189d4fdd0f8f5f0d1f6092072c58d62567ffcfacd8f9ff5510c4a23b82db159a3eb7e72b3dfc9c3a6b
+  data.tar.gz: 739df8d22a98a8e59c9ec077f5fa92212757746e9cb0aea380e9d43c6e7d8e0f85c3121e02cd83a45770b19861acc2f48ef1d04dd2c990b78f9cce0e8b9aa1f0
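The digests in checksums.yaml cover the two archives packed inside the gem, `metadata.gz` and `data.tar.gz`. A minimal sketch of checking an unpacked file against such values, using only Ruby's standard `Digest` library, might look like this. The helper name `digests_match?` is hypothetical and is not part of RubyGems or kimurai.

```ruby
require "digest"

# Hypothetical helper (an assumption for illustration, not a RubyGems API):
# returns true when both hex digests of the file at `path` equal the
# expected values, e.g. those listed in checksums.yaml.
def digests_match?(path, sha256:, sha512:)
  Digest::SHA256.file(path).hexdigest == sha256 &&
    Digest::SHA512.file(path).hexdigest == sha512
end
```

It could be called as `digests_match?("data.tar.gz", sha256: "fabb7dbc...", sha512: "739df8d2...")` with the full values from the hunk above, assuming the archive has been extracted from the `.gem` file first.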
data/.gitignore
ADDED
data/.travis.yml
ADDED
data/CHANGELOG.md
ADDED
@@ -0,0 +1,111 @@
+# CHANGELOG
+## 1.4.0
+### New
+* Add `encoding` config option (see [All available config options](https://github.com/vifreefly/kimuraframework#all-available-config-options))
+* Validate URL before processing a request (Base#request_to)
+
+### Fixes
+* Fix console command bug (see [issue 21](https://github.com/vifreefly/kimuraframework/issues/21))
+
+## 1.3.2
+### Fixes
+* In the project template, set Ruby version as >= 2.5 (previously hard-coded to 2.5.1)
+* Remove .ruby-version file (was hard-coded to 2.5.1) from the project template
+
+## 1.3.1
+### Fixes
+* Fixed bug in Base#save_to
+
+## 1.3.0
+### Breaking changes 1.3.0
+* Remove persistence database feature (because it's slow and makes things complicated)
+
+### New
+* Add `--include` and `--exclude` options to CLI#runner
+* Add Base `#create_browser` method to easily create additional browser instances
+* Add Capybara::Session `#scroll_to_bottom`
+* Add skip_on_failure feature to `retry_request_errors` config option
+* Add info about `add_event` method to the README
+
+### Fixes and improvements
+* Improve Runner
+* Fix time helper in schedule.rb
+* Add proxy validation to browser builders
+* Allow passing different arguments to the `Base.parse` method
+
+## 1.2.0
+### New
+* Allow adding an array of values to the storage (`Base::Storage#add`)
+* Add `exception_on_fail` option to `Base.crawl!`
+* Allow passing a request hash to `start_urls` (you can use an array of hashes as well, like: `@start_urls = [{ url: "https://example.com/cat?id=1", data: { category: "First Category" } }]`)
+* Implement `skip_request_errors` config feature. Added [Handle request errors](https://github.com/vifreefly/kimuraframework#handle-request-errors) chapter to the README.
+* Add option to choose response type for `Session#current_response` (`:html` default, or `:json`)
+* Add option to provide custom chrome and chromedriver paths
+
+### Improvements
+* Refactor `Runner`
+
+### Fixes
+* Fix `Base#Saver` (automatically create file if it doesn't exist in case of persistence database)
+* Do not deep merge config's `headers:` option
+
+## 1.1.0
+### Breaking changes 1.1.0
+`browser` config option deprecated. Now all sub-options inside `browser` should be placed right into the `@config` hash, without the `browser` parent key. Example:
+
+```ruby
+# Was:
+@config = {
+  browser: {
+    retry_request_errors: [Net::ReadTimeout],
+    restart_if: {
+      memory_limit: 350_000,
+      requests_limit: 100
+    },
+    before_request: {
+      change_proxy: true,
+      change_user_agent: true,
+      clear_cookies: true,
+      clear_and_set_cookies: true,
+      delay: 1..3
+    }
+  }
+}
+
+# Now:
+@config = {
+  retry_request_errors: [Net::ReadTimeout],
+  restart_if: {
+    memory_limit: 350_000,
+    requests_limit: 100
+  },
+  before_request: {
+    change_proxy: true,
+    change_user_agent: true,
+    clear_cookies: true,
+    clear_and_set_cookies: true,
+    delay: 1..3
+  }
+}
+```
+
+### New
+* Add `storage` object with additional methods and persistence database feature
+* Add events feature to `run_info`
+* Add `skip_duplicate_requests` config option to automatically skip already visited urls when using request_to
+* Add `extensions` config option to allow injecting JS code into browser (supported only by poltergeist_phantomjs engine)
+* Add Capybara::Session#within_new_window_by method
+
+### Improvements
+* Add the last backtrace line to pipeline output when item was dropped
+* Do not destroy driver if it doesn't exist (for Base.parse! method)
+* Handle possible Net::ReadTimeout error while trying to #quit driver
+
+### Fixes
+* Fix Mechanize::Driver#proxy (there was a bug while using proxy for mechanize engine without authorization)
+* Fix requests retries logic
+
+
+## 1.0.1
+* Add missing `logger` method to pipeline
+* Fix `set_proxy` in Mechanize and Poltergeist builders
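The 1.1.0 breaking change in the CHANGELOG above moves every sub-option of `browser` to the top level of `@config`. As a rough sketch of that migration in plain Ruby, the helper below lifts a legacy `browser:` hash into its parent. The name `flatten_browser_config` and the rule that top-level keys win on conflict are assumptions made for illustration; kimurai does not ship such a helper.

```ruby
# Hypothetical helper (not part of kimurai): returns a new config hash in
# which the sub-options of a legacy `browser:` key sit at the top level.
def flatten_browser_config(config)
  browser_options = config.fetch(:browser, {})
  rest = config.reject { |key, _value| key == :browser }
  # Assumption: when an option appears in both places, the top-level one wins.
  browser_options.merge(rest)
end

old_config = {
  browser: {
    restart_if: { memory_limit: 350_000, requests_limit: 100 },
    before_request: { delay: 1..3 }
  }
}

new_config = flatten_browser_config(old_config)
p new_config.keys # prints [:restart_if, :before_request]
```

The merge is shallow on purpose: nested hashes such as `restart_if` are moved as a whole rather than combined key by key.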
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2018 Victor Afanasev
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.