shrimple 0.8.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: e11086eed094342c573b7e63b6150d44ceb089ba
4
+ data.tar.gz: 73c918822c7f614950bc832aa606e56a19f14392
5
+ SHA512:
6
+ metadata.gz: ec92531f5eb31c9546aaf6406956cfaee57dd55188f41ef7e800644e755d16728c2c8dbae9df18952a7ebc4281b3a92761e8d9e7eb98076064a963b99e06aa24
7
+ data.tar.gz: 9f0263d4910fbcc6a2f5873fafb18707f837da080d30390d6ad7909d3bdac6fe0f0efa86636decc004bac1a9cc8554a4a5cba5572bda158390a203045b72c63e
data/.gitignore ADDED
@@ -0,0 +1,2 @@
1
+ *.gem
2
+ Gemfile.lock
data/.travis.yml ADDED
@@ -0,0 +1,9 @@
1
+ language: ruby
2
+
3
+ rvm:
4
+ - 1.9.3
5
+ - 2.0.0
6
+ - 2.1.1
7
+ - ruby-head
8
+ - jruby-19mode
9
+ - jruby-head
data/Gemfile ADDED
@@ -0,0 +1,9 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gem 'hashie'
4
+
5
+ group :test do
6
+ gem 'rspec'
7
+ gem 'rake' # for travis-ci
8
+ gem 'dimensions'
9
+ end
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2012 adeven GmbH Manuel Kniep
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,125 @@
1
+ # Shrimple
2
+
3
+ Launches PhantomJS to render web sites or local files (or have
4
+ Phantom do pretty much everything).
5
+ Shrimple started as a set of patches for [Shrimp](https://github.com/adjust/shrimp).
6
+
7
+ [![Build Status](https://travis-ci.org/bronson/shrimple.svg?branch=master)](https://travis-ci.org/bronson/shrimple)
8
+ [![Gem Version](https://badge.fury.io/rb/shrimple.svg)](http://badge.fury.io/rb/shrimple)
9
+
10
+
11
+
12
+ ## Installation
13
+
14
+ Install [PhantomJS](http://phantomjs.org/download.html), then add this line to your application's Gemfile:
15
+
16
+ gem 'shrimple', git: 'https://github.com/bronson/shrimple'
17
+
18
+ and execute `bundle`.
19
+
20
+ ## Usage
21
+
22
+ Render to a file:
23
+
24
+ ```ruby
25
+ require 'shrimple'
26
+
27
+ s = Shrimple.new( page: { paperSize: { format: 'A4' }} )
28
+ s.render_pdf('http://bl.ocks.org/mbostock', to: '/tmp/output.pdf')
29
+ ```
30
+
31
+ Or render to a variable by omitting the destination:
32
+
33
+ ```ruby
34
+ result = Shrimple.new.render_text('http://thingsididlastnight.com')
35
+ result.stdout # <== TODO: naming stdout is arcane
36
+ => "Your Mom\n"
37
+ ```
38
+
39
+ Render in the background:
40
+
41
+ TODO: show background mode
42
+ TODO: show start_time and finish_time?
43
+
44
+
45
+ ## Configuration
46
+
47
+ Shrimple supports all configuration options provided by PhantomJS
48
+ (including unanticipated ones added in the future).
49
+
50
+ Options specified later override those specified earlier, and
51
+ options passed directly to render only affect that call -- they are not remembered.
52
+
53
+ ```ruby
54
+ s = Shrimple.new( page: { zoomFactor: 0.5 }, timeout: 10 )
55
+ s.page.paperSize = { border: '3cm', format: 'A4', orientation: 'landscape' }
56
+ s.render_pdf('http://joeyh.name/blog/', to: '/tmp/joey.pdf')
57
+ ```
58
+
59
+ * Set options passed to PhantomJS's [command line](http://phantomjs.org/api/command-line.html) with `config`:<br>
60
+ `s.config.loadImages = false`<br>
61
+ Phantom requires these to be in JSON notation: `proxyType` instead of `--proxy-type`.
62
+
63
+ * Set options in PhantomJS's [web page module](http://phantomjs.org/api/webpage/) with `page`:<br>
64
+ `s.page.paperSize.orientation = 'landscape'`
65
+
66
+ * Set options passed to PhantomJS's [render call](http://phantomjs.org/api/webpage/method/render.html) with `render`:<br>
67
+ `s.render = { format: 'jpeg', quality: 85 }`
68
+
69
+ See [default_config.rb](https://github.com/bronson/shrimple/blob/master/lib/shrimple/default_config.rb)
70
+ for the known options all listed in one place.
71
+
72
+ ### Shrimple Options
73
+
74
+ - **background** If true, the PhantomJS process will be spawned in the background
75
+ and Ruby execution will resume immediately.<br>
76
+ `background: false`
77
+
78
+ - **timeout** The time in seconds after which the PhantomJS executable is killed.
79
+ If killed, the render results in an error.<br>
80
+ `timeout: nil`
81
+
82
+ - **output / to** Specifies the destination file. If you don't specify a destination
83
+ then the output is buffered into memory and can be retrieved with `result.stdout`.
84
+ `to` is just a more readable synonym for `output`.
85
+
86
+ - **stderr** The path to save phantom's stderr. Normally it's buffered into memory
87
+ and can be retrieved with `result.stderr`
88
+
89
+ - **onSuccess** A Ruby proc to be called when the render succeeds.<br>
90
+ `onSuccess = ->(result) { ftp.put(result.stdout) }`
91
+
92
+ - **onError** A Ruby proc called when the render fails or is killed.<br>
93
+ `onError = ->(result) { page_admin(result.stderr, result.options.to_hash) }`
94
+
95
+ ####
96
+
97
+ These options are more obscure and are only necessary if you're using Phantom in
98
+ an unusual way.
99
+
100
+ - **input** specifies the source file to render. Normally you'd pass this as the first
101
+ argument to render. Use this option if you want to specify the input file once and render it multiple times.
102
+ You must specify a valid URL. Use `file://test_file.html` to specify a file on the local filesystem.
103
+
104
+ - **executable** a path to the phantomjs executable to use. Shrimple searches
105
+ pretty hard for installed phantomjs executables so there's usually no need
106
+ to specify this.
107
+
108
+ - **renderer** the render.js script to pass to Phantom. Probably only useful for testing.
109
+
110
+
111
+ ## Changes to Shrimp
112
+
113
+ - Added background mode (even works in JRuby >1.7.4).
114
+ - Allows configuring pretty much anything: proxies, userName/password, scrollPosition, jpeg quality, etc.
115
+ - Prevents potential shell attacks by ensuring options aren't passed on the command line.
116
+ - Better error handling.
117
+ - Removed middleware. In my app, background mode made it unnecessary. Besides, I could never get it to work reliably.
118
+
119
+
120
+ ## Copyright
121
+
122
+ Shrimp, the original project, is Copyright © 2012 adeven (Manuel Kniep).
123
+ It is free software, and may be redistributed under the MIT License (see LICENSE.txt).
124
+
125
+ Shrimple is also Copyright © 2013 Scott Bronson and may be redistributed under the same terms.
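The precedence rule described in the README's Configuration section (later options override earlier ones, and per-call options are not remembered) can be illustrated with a minimal sketch. It uses plain Ruby hashes and a hypothetical `deep_merge` helper rather than Shrimple or Hashie, so it only approximates how the gem combines its option layers:

```ruby
# Hypothetical helper, not part of Shrimple: returns a new hash in which
# values from `extra` win, and nested hashes are merged key by key.
def deep_merge(base, extra)
  base.merge(extra) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Three layers, from lowest to highest precedence (values are illustrative).
defaults = { page: { zoomFactor: 1.0, paperSize: { format: 'A4' } }, timeout: nil }
instance = deep_merge(defaults, { page: { zoomFactor: 0.5 }, timeout: 10 })
per_call = deep_merge(instance, { page: { paperSize: { orientation: 'landscape' } } })

# The per-call layer sees everything below it...
p per_call[:page][:paperSize]
# ...but the instance-level options are left unchanged for the next call.
p instance[:page][:paperSize]
```

Because `deep_merge` never mutates its arguments, the per-call overrides cannot leak back into the instance-level options.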
data/Rakefile ADDED
@@ -0,0 +1,7 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rspec/core/rake_task'
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task default: :spec
7
+ task test: :spec
data/lib/render.js ADDED
@@ -0,0 +1,68 @@
1
+ // Invokes PhantomJS to render a webpage to stdout. Config is supplied as json on stdin.
2
+
3
+ /* jshint phantom: true */
4
+
5
+
6
+ var system = require('system')
7
+ var page = require('webpage').create()
8
+
9
+ function errorHandler(msg, trace) {
10
+ system.stderr.writeLine(msg)
11
+ trace.forEach(function(item) {
12
+ system.stderr.writeLine(' -> ' + (item.file || item.sourceURL) + ': ' + item.line + (item.function ? ' (in function ' + item.function + ')' : ''));
13
+ })
14
+ phantom.exit(1)
15
+ }
16
+
17
+ phantom.onError = function(msg, trace) { errorHandler("PHANTOM ERROR: " + msg, trace) }
18
+
19
+ page.onError = function(msg, trace) { errorHandler("PAGE ERROR: " + msg, trace) }
20
+
21
+ var config = JSON.parse(system.stdin.read())
22
+
23
+ for(var key in config.page) {
24
+ page[key] = config.page[key]
25
+ }
26
+
27
+ page.open(config.input, function (status) {
28
+ if (status !== 'success' /* || (statusCode !== 200 && statusCode !== null) */) {
29
+ system.stderr.writeLine('Unable to load ' + config.input);
30
+ phantom.exit(1);
31
+ }
32
+
33
+ if(config.render.format === 'text') {
34
+ system.stdout.writeLine(page.plainText)
35
+ } else if(config.render.format === 'html') {
36
+ system.stdout.writeLine(page.content)
37
+ } else {
38
+ page.render('/dev/stdout', config.render);
39
+ }
40
+
41
+ phantom.exit(0);
42
+ });
43
+
44
+
45
+
46
+ // if (typeof cookie_file !== 'undefined') {
47
+ // try {
48
+ // var f = fs.open(cookie_file, "r");
49
+ // cookies = JSON.parse(f.read());
50
+ // fs.remove(cookie_file)
51
+ // } catch (e) {
52
+ // // TODO: run this through regular error reporter. just don't catch right?
53
+ // console.log(e);
54
+ // }
55
+
56
+ // phantom.cookiesEnabled = true;
57
+ // phantom.cookies = cookies;
58
+ // }
59
+
60
+
61
+
62
+ // Determine the statusCode
63
+ // page.onResourceReceived = function (resource) {
64
+ // if (resource.url === address) {
65
+ // statusCode = resource.status;
66
+ // }
67
+ // };
68
+
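The header comment of render.js says its config is supplied as JSON on stdin. A rough sketch of that hand-off from the Ruby side follows. To stay runnable without PhantomJS, a small Ruby child script (an assumption made purely for illustration) plays the part of the render script; the config values are also made up, and this is not how `Shrimple::Phantom` is necessarily implemented:

```ruby
require 'json'
require 'open3'
require 'rbconfig'

# Illustrative config, shaped like the keys render.js reads (input, render, page).
config = {
  input:  'http://example.com/',
  render: { format: 'pdf' },
  page:   { zoomFactor: 0.5 }
}

# Stand-in for the render script: reads all of stdin, parses it as JSON,
# and reports what it would do. A real run would launch phantomjs instead.
child_script = <<~'RUBY'
  require 'json'
  config = JSON.parse($stdin.read)
  puts "would render #{config['input']} as #{config['render']['format']}"
RUBY

# Serialize the config and feed it to the child process on stdin.
stdout, stderr, status = Open3.capture3(
  RbConfig.ruby, '-e', child_script,
  stdin_data: JSON.generate(config)
)

puts stdout
warn stderr unless status.success?
```

Passing options through stdin rather than as command-line arguments keeps them out of the shell, which is in line with the README's note about preventing shell attacks.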
data/lib/shrimple.rb ADDED
@@ -0,0 +1,120 @@
1
+ # Keeps track of options and calls phantomjs to run the render script.
2
+
3
+ # TODO: make debug mode continually dump stderr to parent stderr, plus add debugging to render script
4
+ # TODO: add a header and footer to the page, printheaderfooter.coffee
5
+ # TODO: support for injectjs? http://phantomjs.org/tips-and-tricks.html
6
+ # and maybe page.evaluate(function() { document.body.bgColor = 'white'; });
7
+ # TODO: fix syntax of non-background mode, make retrieving data easier
8
+ # TODO: allow constructor to accept multiple hashes, add merge method to merge options hashes
9
+ # TODO: add ability to specify max number of processes in process_monitor
10
+ # TODO: add onResourceTimeout: https://github.com/onlyurei/phantomjs/commit/fa5a3504070f86a99f11469a3b7eb17a0b005ef7
11
+ # TODO: add cookiefile support?
12
+ # TODO: should block if user tries to launch too many phantom processes? Like SizedQueue.
13
+ # TODO: wow --config=file sucks. maybe add a way to specify cmdline args again?
14
+ # either that or fix phantomjs... https://github.com/ariya/phantomjs/issues/12265 https://github.com/ariya/phantomjs/issues/11775
15
+ # TODO: test that page.customHeaders appear in the network requests (how....?)
16
+
17
+ # maybe:
18
+ # TODO: support casperjs?
19
+ # TODO: fill in both "can read partial" tests
20
+ # TODO: include lots of info about page load in logfile
21
+ # TODO: documentation! probably using sdoc or yard?
22
+ # TODO: possible to test margins? printmargins.coffee
23
+ # TODO: bl.ocks.org/mbostock: page renders before it's done downloading. able to add arbitrary delay or js function to wait before rendering?
24
+
25
+
26
+ require 'hashie/mash'
27
+ require 'shrimple/phantom'
28
+ require 'shrimple/default_config'
29
+
30
+
31
+ class Shrimple
32
+ attr_accessor :options
33
+
34
+ # allows setting config options directly on this object: s.timeout = 10
35
+ def method_missing name, *args, &block
36
+ options.send(name, *args, &block)
37
+ end
38
+
39
+
40
+ def initialize opts={}
41
+ @options = Hashie::Mash.new(Shrimple::DefaultConfig)
42
+ @options.deep_merge!(opts)
43
+ self.executable ||= self.class.default_executable
44
+ self.renderer ||= self.class.default_renderer
45
+ end
46
+
47
+
48
+ # might be time to allow method_missing to handle these helpers...
49
+ def render_pdf src, *opts
50
+ render src, {render: {format: 'pdf'}}, *opts
51
+ end
52
+
53
+ def render_png src, *opts
54
+ render src, {render: {format: 'png'}}, *opts
55
+ end
56
+
57
+ def render_jpeg src, *opts
58
+ render src, {render: {format: 'jpeg'}}, *opts
59
+ end
60
+
61
+ def render_gif src, *opts
62
+ render src, {render: {format: 'gif'}}, *opts
63
+ end
64
+
65
+ def render_html src, *opts
66
+ render src, {render: {format: 'html'}}, *opts
67
+ end
68
+
69
+ def render_text src, *opts
70
+ render src, {render: {format: 'text'}}, *opts
71
+ end
72
+
73
+
74
+
75
+ def render src={}, *opts
76
+ full_opts = get_full_options(src, *opts)
77
+ phantom = Shrimple::Phantom.new(full_opts)
78
+ phantom.wait unless full_opts[:background]
79
+ phantom
80
+ end
81
+
82
+ def get_full_options src, *inopts
83
+ exopts = options.dup
84
+ onSuccess = exopts.delete(:onSuccess)
85
+ onError = exopts.delete(:onError)
86
+
87
+ full_opts = Shrimple.deep_dup(exopts)
88
+ full_opts.deep_merge!(src) if src && src.kind_of?(Hash)
89
+ inopts.each { |opt| full_opts.deep_merge!(opt) }
90
+ full_opts.merge!(input: src) if src && !src.kind_of?(Hash)
91
+ full_opts.merge!(output: full_opts.delete(:to)) if full_opts[:to]
92
+ full_opts.merge!(onSuccess: onSuccess, onError: onError)
93
+
94
+ self.class.compact!(full_opts)
95
+ full_opts
96
+ end
97
+
98
+
99
+ # how are these not a part of Hash?
100
+ def self.compact! hash
101
+ hash.delete_if { |k,v| v.nil? or (v.is_a?(Hash) && compact!(v).empty?) or (v.respond_to?('empty?') && v.empty?) }
102
+ end
103
+
104
+ def self.deep_dup hash
105
+ Marshal.load(Marshal.dump(hash))
106
+ end
107
+
108
+
109
+ def self.processes
110
+ @processes ||= Shrimple::ProcessMonitor.new
111
+ end
112
+
113
+ def self.default_renderer
114
+ File.expand_path('../render.js', __FILE__)
115
+ end
116
+
117
+ def self.default_executable
118
+ (defined?(Bundler::GemfileError) ? `bundle exec which phantomjs` : `which phantomjs`).chomp
119
+ end
120
+ end
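The pruning step performed by `compact!` in the file above can be tried in isolation. The sketch below restates the same logic as a standalone method under the hypothetical name `compact_deep!`, applied to made-up options, so it can run without the gem:

```ruby
# Standalone restatement of the pruning logic (hypothetical name).
# Removes nil values, empty strings/collections, and hashes that become
# empty once their own contents have been pruned. Mutates and returns `hash`.
def compact_deep!(hash)
  hash.delete_if do |_key, value|
    value.nil? ||
      (value.is_a?(Hash) && compact_deep!(value).empty?) ||
      (value.respond_to?(:empty?) && value.empty?)
  end
end

# Illustrative options: mostly nil placeholders, as in a defaults table.
opts = {
  timeout:    nil,
  background: false,
  input:      '',
  render:     { format: 'pdf', quality: nil },
  page:       {
    clipRect:      { left: nil, top: nil },
    customHeaders: { 'Accept-Encoding' => 'identity' }
  }
}

compact_deep!(opts)
p opts
```

Note that `false` survives the pruning because it is neither nil nor empty, so an explicit `background: false` would still be passed along.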
@@ -0,0 +1,114 @@
1
+ # Here is a list of all the phantomjs settings that we know about.
2
+ # Note that you can set any option you want. It need not be listed here.
3
+ #
4
+ # nil means leave unset -- use Phantom's defaults
5
+
6
+
7
+ class Shrimple
8
+ DefaultConfig = {
9
+
10
+ #
11
+ # options for launching the PhantomJS executable
12
+ #
13
+
14
+ background: nil, # false blocks until page is rendered, true returns immediately
15
+ executable: nil, # specifies the PhantomJS executable to use. If unspecified then Shrimple will search for one.
16
+ renderer: nil, # the render script to use. Useful for testing, or if you want to do something other than rendering the page.
17
+ timeout: nil, # time in seconds after which the PhantomJS process should simply be killed
18
+ input: nil, # specifies the URL to request (use file:// for local assets). Can also be specified by the optional first argument to render calls.
19
+ output: nil, # path to the rendered output file, nil to buffer output in memory. "to" is a more readable synonym: 'render url, to: file'.
20
+ stderr: nil, # path to the file to receive PhantomJS's stderr, leave nil to store it in a string
21
+ onSuccess: nil, # a function to call when the pdf has been successfully rendered. called before process is removed from Shrimple.processes so, if it blocks, process table fills up. Useful for rate limiting.
22
+ onError: nil, # a function to call when the pdf has failed for whatever reason (timeout, killed, network error, etc). Called before process is removed from Shrimple.processes.
23
+
24
+
25
+ #
26
+ # arguments passed to the PhantomJS render method http://phantomjs.org/api/webpage/method/render.html
27
+ #
28
+
29
+ render: {
30
+ format: nil, # format for the output file. usually supplied by a helper (render_pdf, render_png, etc)
31
+ quality: nil # only relevant to format=jpeg I think, range is 1-100. not sure what Phantom's default is
32
+ },
33
+
34
+
35
+ #
36
+ # command-line options passed to PhantomJS in --config: http://phantomjs.org/api/command-line.html
37
+ #
38
+
39
+ config: {
40
+ cookiesFile: nil, # path to the persistent cookies file
41
+ diskCache: nil, # if true, caches requested assets. Defaults to false. See config.maxDiskCacheSize. The cache location is not currently configurable.
42
+ ignoreSslErrors: nil, # if true, SSL errors won't prevent page from being rendered. defaults to false
43
+ loadImages: nil, # load inlined images? defaults to true. see also page.settings.loadImages
44
+ localStoragePath: nil, # directory to save LocalStorage and WebSQL content
45
+ localStorageQuota: nil, # maximum size for local data
46
+ localToRemoteUrlAccess: nil, # local content can initiate requests for remote assets? Defaults to false. also see page.settings.localToRemoteUrlAccessEnabled
47
+ maxDiskCacheSize: nil, # maximum size for disk cache in KB. Also see config.diskCache.
48
+ outputEncoding: nil, # sets the encoding used in the logfile. nil means "utf8"
49
+ remoteDebuggerPort: nil, # starts the render script in a debug harness and listens on this port
50
+ remoteDebuggerAutorun: nil, # run the render script in a debugger? defaults to false, probably never needed
51
+ proxy: nil, # proxy to use in "address:port" format
52
+ proxyType: nil, # type of proxy to use
53
+ proxyAuth: nil, # authentication information for proxy
54
+ scriptEncoding: nil, # encoding of the render script, defaults to "utf8"
55
+ sslProtocol: nil, # the protocol to use for SSL connections, defaults to "SSLv3"
56
+ webSecurity: nil # enable web security and forbid cross-domain XHR? Defaults to true
57
+ },
58
+
59
+
60
+ #
61
+ # settings for rendering the page: http://phantomjs.org/api/webpage/
62
+ #
63
+
64
+ page: {
65
+ canGoBack: nil, # allow javascript navigation, defaults to false
66
+ canGoForward: nil, # allow javascript navigation, defaults to false
67
+ clipRect: { # area to rasterize when page.render is called
68
+ left: nil, # Defaults to (0,0,0,0) meaning render the entire page
69
+ top: nil,
70
+ width: nil,
71
+ height: nil
72
+ },
73
+ customHeaders: { # headers added to every HTTP request. if nil, Shrimple.DefaultHeaders is used.
74
+ "Accept-Encoding" => "identity" # Don't accept gzipped responses, work around https://github.com/ariya/phantomjs/issues/10930
75
+ },
76
+ # event? http://phantomjs.org/api/webpage/property/event.html
77
+ # libraryPath? # might be useful if we add support for calling injectJS
78
+ navigationLocked: nil, # if true, phantomjs prevents navigating away from the page. Defaults to false.
79
+ offlineStoragePath: nil, # file to contain offline storage data
80
+ offlineStorageQuota: nil, # maximum amount of data allowed in offline storage
81
+ ownsPages: nil, # should child pages (opened with window.open()) be closed when parent closes? Defaults to true.
82
+ paperSize: { # the size of the rendered output http://phantomjs.org/api/webpage/property/paper-size.html
83
+ format: nil, # size for pdf pages, defaults to 'A4'?
84
+ orientation: nil, # orientation for pdf pages, defaults to 'portrait'?
85
+ width: nil, # width of png/jpeg/gif
86
+ height: nil, # height of png/jpeg/gif
87
+ border: nil # blank border around the page, defaults to '1cm'?
88
+ # margin: nil # use border instead
89
+ },
90
+ scrollPosition: { # scroll page to here before rendering
91
+ left: nil, # defaults to (0,0) which renders the entire page
92
+ top: nil
93
+ },
94
+ settings: { # request settings: http://phantomjs.org/api/webpage/property/settings.html
95
+ javascriptCanCloseWindows: nil, # whether window.close() is allowed, defaults to true
96
+ javascriptCanOpenWindows: nil, # whether window.open() is allowed, defaults to true
97
+ javascriptEnabled: nil, # if false, Javascript in the requested page is not executed. Defaults to true.
98
+ loadImages: nil, # if false, inlined images in the requested page are not loaded (see also config.loadImages). Defaults to true.
99
+ localToRemoteUrlAccessEnabled: nil, # if true, local resources (like a page loaded using file:// url) are able to load remote assets. Defaults to false.
100
+ password: nil, # password for basic HTTP authentication, see also userName
101
+ resourceTimeout: nil, # time in ms after which request will stop and onResourceTimeout() is called
102
+ userAgent: nil, # user agent string for requests (nil means use PhantomJS's default WebKitty one)
103
+ userName: nil, # name for basic HTTP authentication, see also password
104
+ webSecurityEnabled: nil, # see config.webSecurity. Defaults to true.
105
+ XSSAuditingEnabled: nil # monitor requests for XSS attempts. Defaults to false.
106
+ },
107
+ viewportSize: { # sets the size of the virtual browser window
108
+ width: nil,
109
+ height: nil
110
+ },
111
+ zoomFactor: nil # 4.0 increases page by 4X before rendering (right?), 0.25 shrinks page by 4X. Defaults to 1.0.
112
+ }
113
+ }
114
+ end