kimurai_dynamic 1.4.1

Files changed (62)
  1. checksums.yaml +7 -0
  2. data/.gitignore +11 -0
  3. data/.travis.yml +5 -0
  4. data/CHANGELOG.md +111 -0
  5. data/Gemfile +6 -0
  6. data/LICENSE.txt +21 -0
  7. data/README.md +2038 -0
  8. data/Rakefile +10 -0
  9. data/bin/console +14 -0
  10. data/bin/setup +8 -0
  11. data/exe/kimurai +6 -0
  12. data/kimurai.gemspec +48 -0
  13. data/lib/kimurai/automation/deploy.yml +54 -0
  14. data/lib/kimurai/automation/setup/chromium_chromedriver.yml +26 -0
  15. data/lib/kimurai/automation/setup/firefox_geckodriver.yml +20 -0
  16. data/lib/kimurai/automation/setup/phantomjs.yml +33 -0
  17. data/lib/kimurai/automation/setup/ruby_environment.yml +124 -0
  18. data/lib/kimurai/automation/setup.yml +45 -0
  19. data/lib/kimurai/base/saver.rb +106 -0
  20. data/lib/kimurai/base/storage.rb +54 -0
  21. data/lib/kimurai/base.rb +330 -0
  22. data/lib/kimurai/base_helper.rb +22 -0
  23. data/lib/kimurai/browser_builder/mechanize_builder.rb +154 -0
  24. data/lib/kimurai/browser_builder/poltergeist_phantomjs_builder.rb +175 -0
  25. data/lib/kimurai/browser_builder/selenium_chrome_builder.rb +199 -0
  26. data/lib/kimurai/browser_builder/selenium_firefox_builder.rb +204 -0
  27. data/lib/kimurai/browser_builder.rb +20 -0
  28. data/lib/kimurai/capybara_configuration.rb +10 -0
  29. data/lib/kimurai/capybara_ext/driver/base.rb +62 -0
  30. data/lib/kimurai/capybara_ext/mechanize/driver.rb +71 -0
  31. data/lib/kimurai/capybara_ext/poltergeist/driver.rb +13 -0
  32. data/lib/kimurai/capybara_ext/selenium/driver.rb +34 -0
  33. data/lib/kimurai/capybara_ext/session/config.rb +22 -0
  34. data/lib/kimurai/capybara_ext/session.rb +249 -0
  35. data/lib/kimurai/cli/ansible_command_builder.rb +71 -0
  36. data/lib/kimurai/cli/generator.rb +57 -0
  37. data/lib/kimurai/cli.rb +183 -0
  38. data/lib/kimurai/core_ext/array.rb +14 -0
  39. data/lib/kimurai/core_ext/hash.rb +5 -0
  40. data/lib/kimurai/core_ext/numeric.rb +19 -0
  41. data/lib/kimurai/core_ext/string.rb +7 -0
  42. data/lib/kimurai/pipeline.rb +33 -0
  43. data/lib/kimurai/runner.rb +60 -0
  44. data/lib/kimurai/template/.gitignore +18 -0
  45. data/lib/kimurai/template/Gemfile +28 -0
  46. data/lib/kimurai/template/README.md +3 -0
  47. data/lib/kimurai/template/config/application.rb +37 -0
  48. data/lib/kimurai/template/config/automation.yml +13 -0
  49. data/lib/kimurai/template/config/boot.rb +22 -0
  50. data/lib/kimurai/template/config/initializers/.keep +0 -0
  51. data/lib/kimurai/template/config/schedule.rb +57 -0
  52. data/lib/kimurai/template/db/.keep +0 -0
  53. data/lib/kimurai/template/helpers/application_helper.rb +3 -0
  54. data/lib/kimurai/template/lib/.keep +0 -0
  55. data/lib/kimurai/template/log/.keep +0 -0
  56. data/lib/kimurai/template/pipelines/saver.rb +11 -0
  57. data/lib/kimurai/template/pipelines/validator.rb +24 -0
  58. data/lib/kimurai/template/spiders/application_spider.rb +143 -0
  59. data/lib/kimurai/template/tmp/.keep +0 -0
  60. data/lib/kimurai/version.rb +3 -0
  61. data/lib/kimurai.rb +54 -0
  62. metadata +349 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 1c8491f3e3723cfbaed46c66dbe215fd2708f185bb5f53eda5464975926a39a9
+   data.tar.gz: fabb7dbc5da7aa18963996f4b161f533198737f19007eb3c93c083745a885f2a
+ SHA512:
+   metadata.gz: 44aa5bdc2349826677f3f7bc935db5a2b7a3044176ddc3189d4fdd0f8f5f0d1f6092072c58d62567ffcfacd8f9ff5510c4a23b82db159a3eb7e72b3dfc9c3a6b
+   data.tar.gz: 739df8d22a98a8e59c9ec077f5fa92212757746e9cb0aea380e9d43c6e7d8e0f85c3121e02cd83a45770b19861acc2f48ef1d04dd2c990b78f9cce0e8b9aa1f0
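The digests recorded in `checksums.yaml` can be compared against the archive members of an unpacked `.gem` file (`metadata.gz` and `data.tar.gz`). The following is a minimal sketch using only Ruby's standard `digest` and `yaml` libraries; the helper name `checksum_matches?` and the file paths in the usage line are assumptions made for illustration and are not part of the gem.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not part of Kimurai or RubyGems): returns true when the
# SHA256 digest of the file at `path` equals the value recorded for `entry`
# under the "SHA256" key of the given checksums YAML text.
def checksum_matches?(checksums_yaml, entry, path)
  expected = YAML.safe_load(checksums_yaml).dig("SHA256", entry)
  return false if expected.nil?

  Digest::SHA256.file(path).hexdigest == expected
end
```

Assuming the gem has been unpacked into the current directory, a call could look like `checksum_matches?(File.read("checksums.yaml"), "data.tar.gz", "data.tar.gz")`.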
data/.gitignore ADDED
@@ -0,0 +1,11 @@
+ /.bundle/
+ /.yardoc
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
+ Gemfile.lock
+
+ *.retry
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
+ sudo: false
+ language: ruby
+ rvm:
+   - 2.5.1
+ before_install: gem install bundler -v 1.16.2
data/CHANGELOG.md ADDED
@@ -0,0 +1,111 @@
+ # CHANGELOG
+ ## 1.4.0
+ ### New
+ * Add `encoding` config option (see [All available config options](https://github.com/vifreefly/kimuraframework#all-available-config-options))
+ * Validate the URL before processing a request (`Base#request_to`)
+
+ ### Fixes
+ * Fix console command bug (see [issue 21](https://github.com/vifreefly/kimuraframework/issues/21))
+
+ ## 1.3.2
+ ### Fixes
+ * In the project template, set the Ruby version to >= 2.5 (previously hard-coded to 2.5.1)
+ * Remove the .ruby-version file (was hard-coded to 2.5.1) from the project template
+
+ ## 1.3.1
+ ### Fixes
+ * Fix bug in `Base#save_to`
+
+ ## 1.3.0
+ ### Breaking changes 1.3.0
+ * Remove persistence database feature (because it's slow and makes things complicated)
+
+ ### New
+ * Add `--include` and `--exclude` options to CLI#runner
+ * Add `Base#create_browser` method to easily create additional browser instances
+ * Add Capybara::Session `#scroll_to_bottom`
+ * Add `skip_on_failure` feature to `retry_request_errors` config option
+ * Add info about `add_event` method to the README
+
+ ### Fixes and improvements
+ * Improve Runner
+ * Fix time helper in schedule.rb
+ * Add proxy validation to browser builders
+ * Allow passing different arguments to the `Base.parse` method
+
+ ## 1.2.0
+ ### New
+ * Allow adding an array of values to the storage (`Base::Storage#add`)
+ * Add `exception_on_fail` option to `Base.crawl!`
+ * Allow passing a request hash to `start_urls` (an array of hashes works as well, e.g. `@start_urls = [{ url: "https://example.com/cat?id=1", data: { category: "First Category" } }]`)
+ * Implement `skip_request_errors` config feature. Add [Handle request errors](https://github.com/vifreefly/kimuraframework#handle-request-errors) chapter to the README.
+ * Add option to choose response type for `Session#current_response` (`:html` default, or `:json`)
+ * Add option to provide custom chrome and chromedriver paths
+
+ ### Improvements
+ * Refactor `Runner`
+
+ ### Fixes
+ * Fix `Base#Saver` (automatically create the file if it doesn't exist when using the persistence database)
+ * Do not deep merge config's `headers:` option
+
+ ## 1.1.0
+ ### Breaking changes 1.1.0
+ The `browser` config option is deprecated. All sub-options inside `browser` should now be placed directly in the `@config` hash, without the `browser` parent key. Example:
+
+ ```ruby
+ # Was:
+ @config = {
+   browser: {
+     retry_request_errors: [Net::ReadTimeout],
+     restart_if: {
+       memory_limit: 350_000,
+       requests_limit: 100
+     },
+     before_request: {
+       change_proxy: true,
+       change_user_agent: true,
+       clear_cookies: true,
+       clear_and_set_cookies: true,
+       delay: 1..3
+     }
+   }
+ }
+
+ # Now:
+ @config = {
+   retry_request_errors: [Net::ReadTimeout],
+   restart_if: {
+     memory_limit: 350_000,
+     requests_limit: 100
+   },
+   before_request: {
+     change_proxy: true,
+     change_user_agent: true,
+     clear_cookies: true,
+     clear_and_set_cookies: true,
+     delay: 1..3
+   }
+ }
+ ```
+
+ ### New
+ * Add `storage` object with additional methods and persistence database feature
+ * Add events feature to `run_info`
+ * Add `skip_duplicate_requests` config option to automatically skip already visited urls when using `request_to`
+ * Add `extensions` config option to allow injecting JS code into the browser (supported only by the poltergeist_phantomjs engine)
+ * Add Capybara::Session#within_new_window_by method
+
+ ### Improvements
+ * Add the last backtrace line to pipeline output when an item was dropped
+ * Do not destroy the driver if it doesn't exist (for the Base.parse! method)
+ * Handle possible Net::ReadTimeout error while trying to #quit driver
+
+ ### Fixes
+ * Fix Mechanize::Driver#proxy (there was a bug when using a proxy without authorization for the mechanize engine)
+ * Fix request retry logic
+
+
+ ## 1.0.1
+ * Add missing `logger` method to pipeline
+ * Fix `set_proxy` in Mechanize and Poltergeist builders
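The 1.1.0 breaking change in the CHANGELOG above moves every sub-option of the `browser` key to the top level of `@config`. For an existing config hash, that migration could be sketched as follows. The helper `flatten_browser_config` is hypothetical and is not provided by Kimurai; the choice to let existing top-level keys win over `browser` sub-options is also an assumption.

```ruby
require "net/http" # defines Net::ReadTimeout, used in the sample config

# Hypothetical migration helper (not part of Kimurai): lifts every sub-option
# of the deprecated `browser` key to the top level of the config hash.
def flatten_browser_config(config)
  browser_options = config.fetch(:browser, {})
  rest = config.reject { |key, _value| key == :browser }
  # Assumption: options already defined at the top level take precedence.
  browser_options.merge(rest)
end

old_config = {
  browser: {
    retry_request_errors: [Net::ReadTimeout],
    restart_if: { memory_limit: 350_000, requests_limit: 100 },
    before_request: { change_proxy: true, delay: 1..3 }
  }
}

new_config = flatten_browser_config(old_config)
```

After the call, `new_config` has `retry_request_errors`, `restart_if`, and `before_request` as top-level keys and no `browser` key, matching the "Now:" form shown in the CHANGELOG.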
data/Gemfile ADDED
@@ -0,0 +1,6 @@
+ source "https://rubygems.org"
+
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
+
+ # Specify your gem's dependencies in kimurai.gemspec
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2018 Victor Afanasev
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.