shrimple 0.8.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: e11086eed094342c573b7e63b6150d44ceb089ba
4
+ data.tar.gz: 73c918822c7f614950bc832aa606e56a19f14392
5
+ SHA512:
6
+ metadata.gz: ec92531f5eb31c9546aaf6406956cfaee57dd55188f41ef7e800644e755d16728c2c8dbae9df18952a7ebc4281b3a92761e8d9e7eb98076064a963b99e06aa24
7
+ data.tar.gz: 9f0263d4910fbcc6a2f5873fafb18707f837da080d30390d6ad7909d3bdac6fe0f0efa86636decc004bac1a9cc8554a4a5cba5572bda158390a203045b72c63e
data/.gitignore ADDED
@@ -0,0 +1,2 @@
1
+ *.gem
2
+ Gemfile.lock
data/.travis.yml ADDED
@@ -0,0 +1,9 @@
1
+ language: ruby
2
+
3
+ rvm:
4
+ - 1.9.3
5
+ - 2.0.0
6
+ - 2.1.1
7
+ - ruby-head
8
+ - jruby-19mode
9
+ - jruby-head
data/Gemfile ADDED
@@ -0,0 +1,9 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gem 'hashie'
4
+
5
+ group :test do
6
+ gem 'rspec'
7
+ gem 'rake' # for travis-ci
8
+ gem 'dimensions'
9
+ end
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2012 adeven GmbH Manuel Kniep
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,125 @@
1
+ # Shrimple
2
+
3
+ Launches PhantomJS to render web sites or local files (or have
4
+ Phantom do pretty much everything).
5
+ Shrimple started as a set of patches for [Shrimp](https://github.com/adjust/shrimp).
6
+
7
+ [![Build Status](https://travis-ci.org/bronson/shrimple.svg?branch=master)](https://travis-ci.org/bronson/shrimple)
8
+ [![Gem Version](https://badge.fury.io/rb/shrimple.svg)](http://badge.fury.io/rb/shrimple)
9
+
10
+
11
+
12
+ ## Installation
13
+
14
+ Install [PhantomJS](http://phantomjs.org/download.html), then add this line to your application's Gemfile:
15
+
16
+ gem 'shrimple', git: 'https://github.com/bronson/shrimple'
17
+
18
+ and execute `bundle`.
19
+
20
+ ## Usage
21
+
22
+ Render to a file:
23
+
24
+ ```ruby
25
+ require 'shrimple'
26
+
27
+ s = Shrimple.new( page: { paperSize: { format: 'A4' }} )
28
+ s.render_pdf('http://bl.ocks.org/mbostock', to: '/tmp/output.pdf')
29
+ ```
30
+
31
+ Or render to a variable by omitting the destination:
32
+
33
+ ```ruby
34
+ result = Shrimple.new.render_text('http://thingsididlastnight.com')
35
+ result.stdout # <== TODO: naming stdout is arcane
36
+ => "Your Mom\n"
37
+ ```
38
+
39
+ Render in the background:
40
+
41
+ TODO: show background mode
42
+ TODO: show start_time and finish_time?
43
+
44
+
45
+ ## Configuration
46
+
47
+ Shrimple supports all configuration options provided by PhantomJS
48
+ (including unanticipated ones added in the future).
49
+
50
+ Options specified later override those specified earlier, and
51
+ options passed directly to render only affect that call -- they are not remembered.
52
+
53
+ ```ruby
54
+ s = Shrimple.new( page: { zoomFactor: 0.5 }, timeout: 10 )
55
+ s.page.paperSize = { border: '3cm', format: 'A4', orientation: 'landscape' }
56
+ s.render_pdf('http://joeyh.name/blog/', to: '/tmp/joey.pdf')
57
+ ```
58
+
59
+ * Set options passed to PhantomJS's [command line](http://phantomjs.org/api/command-line.html) with `config`:<br>
60
+ `s.config.loadImages = false`<br>
61
+ Phantom requires the camelCase names used in its JSON config: `proxyType` instead of `--proxy-type`.
62
+
63
+ * Set options in PhantomJS's [web page module](http://phantomjs.org/api/webpage/) with `page`:<br>
64
+ `s.page.paperSize.orientation = 'landscape'`
65
+
66
+ * Set options passed to PhantomJS's [render call](http://phantomjs.org/api/webpage/method/render.html) with `render`:<br>
67
+ `s.render = { format: 'jpeg', quality: 85 }`
68
+
69
+ See [default_config.rb](https://github.com/bronson/shrimple/blob/master/lib/shrimple/default_config.rb)
70
+ for the known options all listed in one place.
71
+
72
+ ### Shrimple Options
73
+
74
+ - **background** If true, the PhantomJS process will be spawned in the background
75
+ and Ruby execution will resume immediately.<br>
76
+ `background: false`
77
+
78
+ - **timeout** The time in seconds after which the PhantomJS executable is killed.
79
+ If killed, the render results in an error.<br>
80
+ `timeout: nil`
81
+
82
+ - **output / to** Specifies the destination file. If you don't specify a destination
83
+ then the output is buffered into memory and can be retrieved with `result.stdout`.
84
+ `to` is just a more readable synonym for `output`.
85
+
86
+ - **stderr** The path to save phantom's stderr. Normally it's buffered into memory
87
+ and can be retrieved with `result.stderr`.
88
+
89
+ - **onSuccess** A Ruby proc to be called when the render succeeds.<br>
90
+ `onSuccess = ->(result) { ftp.put(result.stdout) }`
91
+
92
+ - **onError** A Ruby proc called when the render fails or is killed.<br>
93
+ `onError = ->(result) { page_admin(result.stderr, result.options.to_hash) }`
94
+
95
+ #### Obscure Options
96
+
97
+ These options are more obscure and only needed if you're using Phantom in
98
+ an unusual way.
99
+
100
+ - **input** specifies the source file to render. Normally you'd pass this as the first
101
+ argument to render. Use this option if you want to specify the input file once and render it multiple times.
102
+ You must specify a valid URL. Use `file://test_file.html` to specify a file on the local filesystem.
103
+
104
+ - **executable** a path to the phantomjs executable to use. Shrimple searches
105
+ pretty hard for installed phantomjs executables so there's usually no need
106
+ to specify this.
107
+
108
+ - **renderer** the render.js script to pass to Phantom. Probably only useful for testing.
109
+
110
+
111
+ ## Changes to Shrimp
112
+
113
+ - Added background mode (even works in JRuby >1.7.4).
114
+ - Allows configuring pretty much anything: proxies, userName/password, scrollPosition, jpeg quality, etc.
115
+ - Prevents potential shell attacks by ensuring options aren't passed on the command line.
116
+ - Better error handling.
117
+ - Removed middleware. In my app, background mode made it unnecessary. Besides, I could never get it to work reliably.
118
+
119
+
120
+ ## Copyright
121
+
122
+ Shrimp, the original project, is Copyright © 2012 adeven (Manuel Kniep).
123
+ It is free software, and may be redistributed under the MIT License (see LICENSE.txt).
124
+
125
+ Shrimple is also Copyright © 2013 Scott Bronson and may be redistributed under the same terms.
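
Note on data/README.md: the Configuration section says that options specified later override earlier ones, and that options passed directly to a render call are not remembered. A minimal plain-Ruby sketch of that layering follows. It does not use Shrimple or Hashie; `deep_merge`, `OptionStore`, and `options_for_call` are hypothetical names chosen for illustration only.

```ruby
# Illustrative only: a plain-Ruby stand-in for the option layering the README
# describes. None of these names are part of Shrimple's API.

# Recursively merge `overrides` into `base`, returning a new hash.
# Values from `overrides` win; nested hashes are merged key by key.
def deep_merge(base, overrides)
  base.merge(overrides) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

class OptionStore
  attr_reader :options

  # Constructor options are layered over the defaults and remembered.
  def initialize(defaults, opts = {})
    @options = deep_merge(defaults, opts)
  end

  # Per-call options are layered over the stored ones, left to right,
  # without modifying them, so they affect only this call.
  def options_for_call(*call_opts)
    call_opts.reduce(@options) { |acc, opt| deep_merge(acc, opt) }
  end
end

store = OptionStore.new({ page: { zoomFactor: 1.0 }, timeout: nil },
                        { page: { zoomFactor: 0.5 }, timeout: 10 })
once = store.options_for_call({ timeout: 30 },
                              { page: { paperSize: { format: 'A4' } } })

p once           # timeout is 30 and paperSize is set, for this call only
p store.options  # still timeout 10, with no paperSize
```

The merge returns new hashes rather than mutating in place, which is one simple way to keep per-call options from leaking into later calls.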
data/Rakefile ADDED
@@ -0,0 +1,7 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rspec/core/rake_task'
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task default: :spec
7
+ task test: :spec
data/lib/render.js ADDED
@@ -0,0 +1,68 @@
1
+ // Invokes PhantomJS to render a webpage to stdout. Config is supplied as json on stdin.
2
+
3
+ /* jshint phantom: true */
4
+
5
+
6
+ var system = require('system')
7
+ var page = require('webpage').create()
8
+
9
+ function errorHandler(msg, trace) {
10
+ system.stderr.writeLine(msg)
11
+ trace.forEach(function(item) {
12
+ system.stderr.writeLine(' -> ' + (item.file || item.sourceURL) + ': ' + item.line + (item.function ? ' (in function ' + item.function + ')' : ''));
13
+ })
14
+ phantom.exit(1)
15
+ }
16
+
17
+ phantom.onError = function(msg, trace) { errorHandler("PHANTOM ERROR: " + msg, trace) }
18
+
19
+ page.onError = function(msg, trace) { errorHandler("PAGE ERROR: " + msg, trace) }
20
+
21
+ var config = JSON.parse(system.stdin.read())
22
+
23
+ for(var key in config.page) {
24
+ page[key] = config.page[key]
25
+ }
26
+
27
+ page.open(config.input, function (status) {
28
+ if (status !== 'success' /* || (statusCode !== 200 && statusCode !== null) */) {
29
+ system.stderr.writeLine('Unable to load ' + config.input);
30
+ phantom.exit(1);
31
+ }
32
+
33
+ if(config.render.format === 'text') {
34
+ system.stdout.writeLine(page.plainText)
35
+ } else if(config.render.format === 'html') {
36
+ system.stdout.writeLine(page.content)
37
+ } else {
38
+ page.render('/dev/stdout', config.render);
39
+ }
40
+
41
+ phantom.exit(0);
42
+ });
43
+
44
+
45
+
46
+ // if (typeof cookie_file !== 'undefined') {
47
+ // try {
48
+ // var f = fs.open(cookie_file, "r");
49
+ // cookies = JSON.parse(f.read());
50
+ // fs.remove(cookie_file)
51
+ // } catch (e) {
52
+ // // TODO: run this through regular error reporter. just don't catch right?
53
+ // console.log(e);
54
+ // }
55
+
56
+ // phantom.cookiesEnabled = true;
57
+ // phantom.cookies = cookies;
58
+ // }
59
+
60
+
61
+
62
+ // Determine the statusCode
63
+ // page.onResourceReceived = function (resource) {
64
+ // if (resource.url === address) {
65
+ // statusCode = resource.status;
66
+ // }
67
+ // };
68
+
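
Note on data/lib/render.js: the script reads its whole configuration as JSON on stdin, and the README credits this with preventing shell attacks because options never appear on the command line. The Ruby side of that hand-off could look roughly like the sketch below. It is an assumption-laden illustration, not Shrimple's launcher: the child here is a small Ruby one-liner standing in for PhantomJS, and `run_with_stdin_config` and `CHILD_SCRIPT` are hypothetical names.

```ruby
require 'json'
require 'open3'
require 'rbconfig'

# Stand-in for render.js: a tiny Ruby child that parses JSON from stdin and
# echoes one field. In Shrimple the child would be PhantomJS running render.js.
CHILD_SCRIPT = 'require "json"; cfg = JSON.parse($stdin.read); puts cfg["input"]'

# Send `config` to the child as JSON on stdin. The command is given as an
# argument list, so no shell is involved, and the option values never appear
# in argv; shell metacharacters in them are treated as plain data.
def run_with_stdin_config(config)
  stdout, stderr, status = Open3.capture3(
    RbConfig.ruby, '-e', CHILD_SCRIPT,
    stdin_data: JSON.generate(config)
  )
  { stdout: stdout, stderr: stderr, success: status.success? }
end

result = run_with_stdin_config({ input: 'http://example.com/; echo pwned',
                                 render: { format: 'text' } })
puts result[:stdout]  # the URL comes back verbatim; nothing was executed
```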
data/lib/shrimple.rb ADDED
@@ -0,0 +1,120 @@
1
+ # Keeps track of options and calls phantomjs to run the render script.
2
+
3
+ # TODO: make debug mode continually dump stderr to parent stderr, plus add debugging to render script
4
+ # TODO: add a header and footer to the page, printheaderfooter.coffee
5
+ # TODO: support for injectjs? http://phantomjs.org/tips-and-tricks.html
6
+ # and maybe page.evaluate(function() { document.body.bgColor = 'white'; });
7
+ # TODO: fix syntax of non-background mode, make retrieving data easier
8
+ # TODO: allow constructor to accept multiple hashes, add merge method to merge options hashes
9
+ # TODO: add ability to specify max number of processes in process_monitor
10
+ # TODO: add onResourceTimeout: https://github.com/onlyurei/phantomjs/commit/fa5a3504070f86a99f11469a3b7eb17a0b005ef7
11
+ # TODO: add cookiefile support?
12
+ # TODO: should block if user tries to launch too many phantom processes? Like SizedQueue.
13
+ # TODO: wow --config=file sucks. maybe add a way to specify cmdline args again?
14
+ # either that or fix phantomjs... https://github.com/ariya/phantomjs/issues/12265 https://github.com/ariya/phantomjs/issues/11775
15
+ # TODO: test that page.customHeaders appear in the network requests (how....?)
16
+
17
+ # maybe:
18
+ # TODO: support casperjs?
19
+ # TODO: fill in both "can read partial" tests
20
+ # TODO: include lots of info about page load in logfile
21
+ # TODO: documentation! probably using sdoc or yard?
22
+ # TODO: possible to test margins? printmargins.coffee
23
+ # TODO: bl.ocks.org/mbostock: page renders before it's done downloading. able to add arbitrary delay or js function to wait before rendering?
24
+
25
+
26
+ require 'hashie/mash'
27
+ require 'shrimple/phantom'
28
+ require 'shrimple/default_config'
29
+
30
+
31
+ class Shrimple
32
+ attr_accessor :options
33
+
34
+ # allows setting config options directly on this object: s.timeout = 10
35
+ def method_missing name, *args, &block
36
+ options.send(name, *args, &block)
37
+ end
38
+
39
+
40
+ def initialize opts={}
41
+ @options = Hashie::Mash.new(Shrimple::DefaultConfig)
42
+ @options.deep_merge!(opts)
43
+ self.executable ||= self.class.default_executable
44
+ self.renderer ||= self.class.default_renderer
45
+ end
46
+
47
+
48
+ # might be time to allow method_missing to handle these helpers...
49
+ def render_pdf src, *opts
50
+ render src, {render: {format: 'pdf'}}, *opts
51
+ end
52
+
53
+ def render_png src, *opts
54
+ render src, {render: {format: 'png'}}, *opts
55
+ end
56
+
57
+ def render_jpeg src, *opts
58
+ render src, {render: {format: 'jpeg'}}, *opts
59
+ end
60
+
61
+ def render_gif src, *opts
62
+ render src, {render: {format: 'gif'}}, *opts
63
+ end
64
+
65
+ def render_html src, *opts
66
+ render src, {render: {format: 'html'}}, *opts
67
+ end
68
+
69
+ def render_text src, *opts
70
+ render src, {render: {format: 'text'}}, *opts
71
+ end
72
+
73
+
74
+
75
+ def render src={}, *opts
76
+ full_opts = get_full_options(src, *opts)
77
+ phantom = Shrimple::Phantom.new(full_opts)
78
+ phantom.wait unless full_opts[:background]
79
+ phantom
80
+ end
81
+
82
+ def get_full_options src, *inopts
83
+ exopts = options.dup
84
+ onSuccess = exopts.delete(:onSuccess)
85
+ onError = exopts.delete(:onError)
86
+
87
+ full_opts = Shrimple.deep_dup(exopts)
88
+ full_opts.deep_merge!(src) if src && src.kind_of?(Hash)
89
+ inopts.each { |opt| full_opts.deep_merge!(opt) }
90
+ full_opts.merge!(input: src) if src && !src.kind_of?(Hash)
91
+ full_opts.merge!(output: full_opts.delete(:to)) if full_opts[:to]
92
+ full_opts.merge!(onSuccess: onSuccess, onError: onError)
93
+
94
+ self.class.compact!(full_opts)
95
+ full_opts
96
+ end
97
+
98
+
99
+ # how are these not a part of Hash?
100
+ def self.compact! hash
101
+ hash.delete_if { |k,v| v.nil? or (v.is_a?(Hash) && compact!(v).empty?) or (v.respond_to?('empty?') && v.empty?) }
102
+ end
103
+
104
+ def self.deep_dup hash
105
+ Marshal.load(Marshal.dump(hash))
106
+ end
107
+
108
+
109
+ def self.processes
110
+ @processes ||= Shrimple::ProcessMonitor.new
111
+ end
112
+
113
+ def self.default_renderer
114
+ File.expand_path('../render.js', __FILE__)
115
+ end
116
+
117
+ def self.default_executable
118
+ (defined?(Bundler::GemfileError) ? `bundle exec which phantomjs` : `which phantomjs`).chomp
119
+ end
120
+ end
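
Note on data/lib/shrimple.rb: `get_full_options` relies on two small helpers, `compact!` and `deep_dup`. The sketch below restates them on plain hashes to make their behavior easier to see. `prune!` and `deep_copy` are hypothetical names; the pruning rule is the one from `compact!` written out in steps. One detail worth noting: a Marshal round trip cannot copy procs, which is presumably why `onSuccess` and `onError` are removed before the copy and merged back afterwards.

```ruby
# Remove nil values, empty strings/arrays, and hashes that end up empty after
# pruning. `false` survives because it is neither nil nor empty.
def prune!(hash)
  hash.delete_if do |_key, value|
    next true if value.nil?
    prune!(value) if value.is_a?(Hash)
    value.respond_to?(:empty?) && value.empty?
  end
end

# Deep copy via a Marshal round trip. This works only for marshalable values;
# Marshal.dump raises TypeError for a Proc.
def deep_copy(hash)
  Marshal.load(Marshal.dump(hash))
end

defaults = {
  background: false,
  timeout: nil,
  render: { format: 'pdf', quality: nil },
  page: { clipRect: { left: nil, top: nil }, zoomFactor: 0.5 }
}

# The copy is pruned; the original keeps its nil placeholders.
pruned = prune!(deep_copy(defaults))
p pruned
p defaults[:render]
```

Pruning a deep copy rather than the original keeps the defaults intact, so the same option set can be reused for the next render.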
data/lib/shrimple/default_config.rb ADDED
@@ -0,0 +1,114 @@
1
+ # Here is a list of all the phantomjs settings that we know about.
2
+ # Note that you can set any option you want. It need not be listed here.
3
+ #
4
+ # nil means leave unset -- use Phantom's defaults
5
+
6
+
7
+ class Shrimple
8
+ DefaultConfig = {
9
+
10
+ #
11
+ # options for launching the PhantomJS executable
12
+ #
13
+
14
+ background: nil, # false blocks until page is rendered, true returns immediately
15
+ executable: nil, # specifies the PhantomJS executable to use. If unspecified then Shrimple will search for one.
16
+ renderer: nil, # the render script to use. Useful for testing, or if you want to do something other than rendering the page.
17
+ timeout: nil, # time in seconds after which the PhantomJS process should simply be killed
18
+ input: nil, # specifies the URL to request (use file:// for local assets). Can also be specified by the optional first argument to render calls.
19
+ output: nil, # path to the rendered output file, nil to buffer output in memory. "to" is a more readable synonym: 'render url, to: file'.
20
+ stderr: nil, # path to the file to receive PhantomJS's stderr, leave nil to store it in a string
21
+ onSuccess: nil, # a function to call when the pdf has been successfully rendered. called before process is removed from Shrimple.processes so, if it blocks, process table fills up. Useful for rate limiting.
22
+ onError: nil, # a function to call when the pdf has failed for whatever reason (timeout, killed, network error, etc). Called before process is removed from Shrimple.processes.
23
+
24
+
25
+ #
26
+ # arguments passed to the PhantomJS render method http://phantomjs.org/api/webpage/method/render.html
27
+ #
28
+
29
+ render: {
30
+ format: nil, # format for the output file. usually supplied by a helper (render_pdf, render_png, etc)
31
+ quality: nil # only relevant to format=jpeg I think, range is 1-100. not sure what Phantom's default is
32
+ },
33
+
34
+
35
+ #
36
+ # command-line options passed to PhantomJS in --config: http://phantomjs.org/api/command-line.html
37
+ #
38
+
39
+ config: {
40
+ cookiesFile: nil, # path to the persistent cookies file
41
+ diskCache: nil, # if true, caches requested assets. Defaults to false. See config.maxDiskCacheSize. The cache location is not currently configurable.
42
+ ignoreSslErrors: nil, # if true, SSL errors won't prevent page from being rendered. defaults to false
43
+ loadImages: nil, # load inlined images? defaults to true. see also page.settings.loadImages
44
+ localStoragePath: nil, # directory to save LocalStorage and WebSQL content
45
+ localStorageQuota: nil, # maximum size for local data
46
+ localToRemoteUrlAccess: nil, # local content can initiate requests for remote assets? Defaults to false. also see page.settings.localToRemoteUrlAccessEnabled
47
+ maxDiskCacheSize: nil, # maximum size for disk cache in KB. Also see config.diskCache.
48
+ outputEncoding: nil, # sets the encoding used in the logfile. nil means "utf8"
49
+ remoteDebuggerPort: nil, # starts the render script in a debug harness and listens on this port
50
+ remoteDebuggerAutorun: nil, # run the render script in a debugger? defaults to false, probably never needed
51
+ proxy: nil, # proxy to use in "address:port" format
52
+ proxyType: nil, # type of proxy to use
53
+ proxyAuth: nil, # authentication information for proxy
54
+ scriptEncoding: nil, # encoding of the render script, defaults to "utf8"
55
+ sslProtocol: nil, # the protocol to use for SSL connections, defaults to "SSLv3"
56
+ webSecurity: nil # enable web security and forbid cross-domain XHR? Defaults to true
57
+ },
58
+
59
+
60
+ #
61
+ # settings for rendering the page: http://phantomjs.org/api/webpage/
62
+ #
63
+
64
+ page: {
65
+ canGoBack: nil, # allow javascript navigation, defaults to false
66
+ canGoForward: nil, # allow javascript navigation, defaults to false
67
+ clipRect: { # area to rasterize when page.render is called
68
+ left: nil, # Defaults to (0,0,0,0) meaning render the entire page
69
+ top: nil,
70
+ width: nil,
71
+ height: nil
72
+ },
73
+ customHeaders: { # headers added to every HTTP request. if nil, Shrimple.DefaultHeaders is used.
74
+ "Accept-Encoding" => "identity" # Don't accept gzipped responses, work around https://github.com/ariya/phantomjs/issues/10930
75
+ },
76
+ # event? http://phantomjs.org/api/webpage/property/event.html
77
+ # libraryPath? # might be useful if we add support for calling injectJS
78
+ navigationLocked: nil, # if true, phantomjs prevents navigating away from the page. Defaults to false.
79
+ offlineStoragePath: nil, # file to contain offline storage data
80
+ offlineStorageQuota: nil, # maximum amount of data allowed in offline storage
81
+ ownsPages: nil, # should child pages (opened with window.open()) be closed when parent closes? Defaults to true.
82
+ paperSize: { # the size of the rendered output http://phantomjs.org/api/webpage/property/paper-size.html
83
+ format: nil, # size for pdf pages, defaults to 'A4'?
84
+ orientation: nil, # orientation for pdf pages, defaults to 'portrait'?
85
+ width: nil, # width of png/jpeg/gif
86
+ height: nil, # height of png/jpeg/gif
87
+ border: nil # blank border around the page, defaults to '1cm'?
88
+ # margin: nil # use border instead
89
+ },
90
+ scrollPosition: { # scroll page to here before rendering
91
+ left: nil, # defaults to (0,0) which renders the entire page
92
+ top: nil
93
+ },
94
+ settings: { # request settings: http://phantomjs.org/api/webpage/property/settings.html
95
+ javascriptCanCloseWindows: nil, # whether window.close() is allowed, defaults to true
96
+ javascriptCanOpenWindows: nil, # whether window.open() is allowed, defaults to true
97
+ javascriptEnabled: nil, # if false, Javascript in the requested page is not executed. Defaults to true.
98
+ loadImages: nil, # if false, inlined images in the requested page are not loaded (see also config.loadImages). Defaults to true.
99
+ localToRemoteUrlAccessEnabled: nil, # if true, local resources (like a page loaded using file:// url) are able to load remote assets. Defaults to false.
100
+ password: nil, # password for basic HTTP authentication, see also userName
101
+ resourceTimeout: nil, # time in ms after which request will stop and onResourceTimeout() is called
102
+ userAgent: nil, # user agent string for requests (nil means use PhantomJS's default WebKitty one)
103
+ userName: nil, # name for basic HTTP authentication, see also password
104
+ webSecurityEnabled: nil, # see config.webSecurity. Defaults to true.
105
+ XSSAuditingEnabled: nil # monitor requests for XSS attempts. Defaults to false.
106
+ },
107
+ viewportSize: { # sets the size of the virtual browser window
108
+ width: nil,
109
+ height: nil
110
+ },
111
+ zoomFactor: nil # 4.0 increases page by 4X before rendering (right?), 0.25 shrinks page by 4X. Defaults to 1.0.
112
+ }
113
+ }
114
+ end
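
Note on the `config` section of the defaults above: the keys are the camelCase forms of PhantomJS's command-line flags (the README gives `proxyType` for `--proxy-type`). A small sketch of that naming rule, and of serializing the result to JSON, is shown below. `flag_to_config_key` and `config_json` are hypothetical helpers for illustration; how Shrimple itself hands this JSON to PhantomJS is not shown here.

```ruby
require 'json'

# Convert a command-line flag name such as "--proxy-type" to the camelCase
# key used in a JSON config ("proxyType").
def flag_to_config_key(flag)
  head, *rest = flag.sub(/\A--/, '').split('-')
  ([head] + rest.map(&:capitalize)).join
end

# Build a JSON config document from a hash keyed by flag names.
def config_json(flags)
  converted = flags.each_with_object({}) do |(flag, value), out|
    out[flag_to_config_key(flag)] = value
  end
  JSON.generate(converted)
end

puts flag_to_config_key('--ignore-ssl-errors')
puts config_json({ '--proxy-type' => 'socks5', '--load-images' => false })
```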