curlyq 0.0.10 → 0.0.12
- checksums.yaml +4 -4
- data/CHANGELOG.md +12 -0
- data/Gemfile.lock +2 -2
- data/README.md +39 -3
- data/bin/curlyq +92 -3
- data/lib/curly/curl/html.rb +89 -21
- data/lib/curly/version.rb +1 -1
- data/src/_README.md +19 -0
- data/test/curlyq_scrape_test.rb +2 -2
- metadata +2 -2
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: af564949987eb7b96f7dfc730fea6ac79fb4a903c3eaff7a412af446c4a0e699
+  data.tar.gz: 58fb83c8132da551813adad468a6eb546a0d758df2b7c756b794a2a019963084
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 70e9485a7729acbd45a501278ceaae575bf358205c05da49de5c36f08ab8fd263824c8f8bd302c32b058986db3783b7abc6d9f0e58bb11f18879cdfde5210eac
+  data.tar.gz: 8d2e664b9e786c0a9d13fb8177c703be84c0a784835dba14354189978c34501334cc1dbdd68cc0112ad363c73554c170e9eb5fea64b60ee229bc4105b08d3fc6
```
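A `.gem` package is a tar archive that holds `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`; the `checksums.yaml` above records SHA256 and SHA512 digests of the two inner files. The sketch below assumes those files have already been unpacked into one directory and compares each recorded digest with the file on disk. `verify_checksums` and `DIGESTS` are hypothetical names, not part of RubyGems or curlyq.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Digest classes keyed by the algorithm names used in checksums.yaml.
DIGESTS = {
  'SHA1' => Digest::SHA1,
  'SHA256' => Digest::SHA256,
  'SHA512' => Digest::SHA512
}.freeze

# Returns { "SHA256 metadata.gz" => true/false, ... } for every digest listed
# in the YAML document, comparing against the files found in +dir+.
def verify_checksums(checksums_yaml, dir)
  expected = YAML.safe_load(checksums_yaml)
  results = {}
  expected.each do |algo, files|
    klass = DIGESTS.fetch(algo) # KeyError for an algorithm not in the table
    files.each do |name, hex|
      actual = klass.file(File.join(dir, name)).hexdigest
      results["#{algo} #{name}"] = (actual == hex.to_s)
    end
  end
  results
end
```

To try it on a real package, the `.gem` file can be unpacked with `tar -xf` and `checksums.yaml.gz` decompressed before calling the helper.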
data/CHANGELOG.md
CHANGED
data/Gemfile.lock
CHANGED
```diff
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    curlyq (0.0.
+    curlyq (0.0.12)
       gli (~> 2.21.0)
       nokogiri (~> 1.16.0)
       selenium-webdriver (~> 4.16.0)
@@ -11,7 +11,7 @@ GEM
   remote: https://rubygems.org/
   specs:
     gli (2.21.1)
-    nokogiri (1.16.
+    nokogiri (1.16.2-arm64-darwin)
       racc (~> 1.4)
     parallel (1.23.0)
     parallel_tests (3.13.0)
```
data/README.md
CHANGED
```diff
@@ -13,8 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
 [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
 [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"

-The current version of `curlyq` is 0.0.
-.
+The current version of `curlyq` is 0.0.12.

 CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.

```
```diff
@@ -47,7 +46,7 @@ SYNOPSIS
     curlyq [global options] command [command options] [arguments...]

 VERSION
-    0.0.
+    0.0.12

 GLOBAL OPTIONS
     --help - Show this message
```
```diff
@@ -56,6 +55,7 @@ GLOBAL OPTIONS
     -y, --[no-]yaml - Output YAML instead of json

 COMMANDS
+    execute - Execute JavaScript on a URL
     extract - Extract contents between two regular expressions
     headlinks - Return all <head> links on URL's page
     help - Shows a list of commands or help for one command
```
````diff
@@ -136,6 +136,34 @@ COMMAND OPTIONS
 ```


+##### execute
+
+You can execute JavaScript on a given web page using the `execute` subcommand.
+
+Example:
+
+curlyq execute -s "NiftyAPI.find('file/save').arrow().shoot('file-save')" file:///Users/ttscoff/Desktop/Code/niftymenu/dist/MultiMarkdown-Composer.html
+
+You can specify an element id to wait for using `--id`, and define a pause to wait after executing a script with `--wait` (defaults to 2 seconds). Scripts can be read from the command line arguments with `--script "SCRIPT"`, from STDIN with `--script -`, or from a file using `--script PATH`.
+
+If you expect a return value, be sure to include a `return` statement in your executed script. Results will be output to STDOUT.
+
+```
+NAME
+    execute - Execute JavaScript on a URL
+
+SYNOPSIS
+
+    curlyq [global options] execute [command options] URL...
+
+COMMAND OPTIONS
+    -b, --browser=arg - Browser to use (firefox, chrome) (default: chrome)
+    -h, --header=arg - Define a header to send as key=value (may be used more than once, default: none)
+    -i, --id=arg - Element ID to wait for before executing (default: none)
+    -s, --script=arg - Script to execute, use - to read from STDIN (may be used more than once, default: none)
+    -w, --wait=arg - Seconds to wait after executing JS (default: 2)
+```
+
 ##### headlinks

 Example:
````
```diff
@@ -237,6 +265,7 @@ COMMAND OPTIONS
     -h, --header=arg - Define a header to send as "key=value" (may be used more than once, default: none)
     --[no-]ignore_fragments - Ignore fragment hrefs when gathering content links
     --[no-]ignore_relative - Ignore relative hrefs when gathering content links
+    -l, --local_links_only - Only gather internal (same-site) links
     -q, --query, --filter=arg - Filter output using dot-syntax path (default: none)
     -r, --raw=arg - Output a raw value for a key (default: none)
     -s, --search=arg - Regurn an array of matches to a CSS or XPath query (default: none)
```
````diff
@@ -379,6 +408,7 @@ COMMAND OPTIONS
     -d, --[no-]dedup - Filter out duplicate links, preserving only first one
     --[no-]ignore_fragments - Ignore fragment hrefs when gathering content links
     --[no-]ignore_relative - Ignore relative hrefs when gathering content links
+    -l, --local_links_only - Only gather internal (same-site) links
     -q, --query, --filter=arg - Filter output using dot-syntax path (default: none)
     -x, --external_links_only - Only gather external links
 ```
````
````diff
@@ -436,6 +466,9 @@ Example:

 Screenshot saved to /Users/ttscoff/Desktop/test.png

+You can wait for an element ID to be visible using `--id`. This can be any `#ID` on the page. If the ID doesn't exist on the page, though, the screenshot will hang for a timeout of 10 seconds.
+
+You can execute a script before taking the screenshot with the `--script` flag. If this is set to `-`, it will read the script from STDIN. If it's set to an existing file path, that file will be read for script input. Specify an interval (in seconds) to wait after executing the script with `--wait`.

 ```
 NAME
````
````diff
@@ -448,8 +481,11 @@ SYNOPSIS
 COMMAND OPTIONS
     -b, --browser=arg - Browser to use (firefox, chrome) (default: chrome)
     -h, --header=arg - Define a header to send as key=value (may be used more than once, default: none)
+    -i, --id=arg - Element ID to wait for before taking screenshot (default: none)
     -o, --out, --file=arg - File destination (required, default: none)
+    -s, --script=arg - Script to execute before taking screenshot (may be used more than once, default: none)
     -t, --type=arg - Type of screenshot to save (full (requires firefox), print, visible) (default: visible)
+    -w, --wait=arg - Time to wait before taking screenshot (default: 0)
 ```

 ##### tags
````
data/bin/curlyq
CHANGED
```diff
@@ -103,6 +103,9 @@ command %i[html curl] do |c|
   c.desc 'Only gather external links'
   c.switch %i[x external_links_only], default_value: false, negatable: false

+  c.desc 'Only gather internal (same-site) links'
+  c.switch %i[l local_links_only], default_value: false, negatable: false
+
   c.action do |global_options, options, args|
     urls = args.join(' ').split(/[, ]+/)
     headers = break_headers(options[:header])
```
```diff
@@ -115,7 +118,8 @@ command %i[html curl] do |c|
         compressed: options[:compressed], clean: options[:clean],
         ignore_local_links: options[:ignore_relative],
         ignore_fragment_links: options[:ignore_fragments],
-        external_links_only: options[:external_links_only]
+        external_links_only: options[:external_links_only],
+        local_links_only: options[:local_links_only] }
       res = Curl::Html.new(url, curl_settings)
       res.curl

```
```diff
@@ -152,6 +156,61 @@ command %i[html curl] do |c|
   end
 end

+desc 'Execute JavaScript on a URL'
+arg_name 'URL', multiple: true
+command :execute do |c|
+  c.desc 'Browser to use (firefox, chrome)'
+  c.flag %i[b browser], type: BrowserType, must_match: /^[fc].*?$/, default_value: 'chrome'
+
+  c.desc 'Define a header to send as key=value'
+  c.flag %i[h header], multiple: true
+
+  c.desc 'Script to execute, use - to read from STDIN'
+  c.flag %i[s script], multiple: true
+
+  c.desc 'Element ID to wait for before executing'
+  c.flag %i[i id]
+
+  c.desc 'Seconds to wait after executing JS'
+  c.flag %i[w wait], default_value: 2
+
+  c.action do |_, options, args|
+    urls = args.join(' ').split(/[, ]+/)
+
+    raise 'Script input required' unless options[:file] || options[:script]
+
+    compiled_script = []
+
+    if options[:script].count.positive?
+      options[:script].each do |scr|
+        scr.strip!
+        if scr == '-'
+          compiled_script << $stdin.read
+        elsif File.exist?(File.expand_path(scr))
+          compiled_script << IO.read(File.expand_path(scr))
+        else
+          compiled_script << scr
+        end
+      end
+    end
+
+    script = compiled_script.count.positive? ? compiled_script.join(';') : nil
+
+    headers = break_headers(options[:header])
+
+    browser = options[:browser]
+
+    browser = browser.is_a?(Symbol) ? browser : browser.normalize_browser_type
+
+    urls.each do |url|
+      c = Curl::Html.new(url)
+      c.headers = headers
+      c.browser = browser
+      $stdout.puts c.execute(script, options[:wait], options[:id])
+    end
+  end
+end
+
 desc 'Save a screenshot of a URL'
 arg_name 'URL', multiple: true
 command :screenshot do |c|
```
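The `execute` command builds a single script from every `--script` value: `-` reads STDIN, an existing path is read from disk, and anything else is treated as literal JavaScript, with the pieces joined by `;`. The `screenshot` command repeats the same loop. A minimal sketch of that logic as one reusable helper follows; `compile_scripts`, `existing_file` and the `stdin:` keyword are hypothetical names. The sketch deliberately deviates in a few places: it uses non-mutating `strip`, `File.file?` (so a directory is not read as a script), and it treats a `~name` path for a nonexistent user as literal text instead of letting `File.expand_path` raise. It also returns `nil` when nothing was supplied, which gives the caller a direct emptiness check; a guard like `unless options[:file] || options[:script]` would not catch a missing `--script` if the flag value is an empty array, because empty arrays are truthy in Ruby.

```ruby
# Return the expanded path if +scr+ names an existing regular file, else nil.
def existing_file(scr)
  path = File.expand_path(scr)
  File.file?(path) ? path : nil
rescue ArgumentError # e.g. "~name/..." where no such user exists
  nil
end

# Turn the values of a repeatable --script flag into one JavaScript string.
# "-" reads from +stdin+, an existing file is read from disk, and anything
# else is used as literal JavaScript. Returns nil when no input was given.
def compile_scripts(inputs, stdin: $stdin)
  pieces = Array(inputs).map do |raw|
    scr = raw.strip # non-mutating, so the caller's strings are untouched
    path = scr == '-' ? nil : existing_file(scr)
    if scr == '-'
      stdin.read
    elsif path
      File.read(path)
    else
      scr
    end
  end
  pieces.empty? ? nil : pieces.join(';')
end
```

A caller could then use `script = compile_scripts(options[:script])` followed by `raise 'Script input required' if script.nil?`. Because the pieces become one script body, a `return` in an earlier piece would end execution before later pieces run; Selenium's `execute_script` evaluates the script as the body of an anonymous function, which is also why the README asks for an explicit `return` to get a value back.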
```diff
@@ -167,6 +226,15 @@ command :screenshot do |c|
   c.desc 'Define a header to send as key=value'
   c.flag %i[h header], multiple: true

+  c.desc 'Script to execute before taking screenshot'
+  c.flag %i[s script], multiple: true
+
+  c.desc 'Element ID to wait for before taking screenshot'
+  c.flag %i[i id]
+
+  c.desc 'Time to wait before taking screenshot'
+  c.flag %i[w wait], default_value: 0, type: Integer
+
   c.action do |_, options, args|
     urls = args.join(' ').split(/[, ]+/)
     headers = break_headers(options[:header])
```
```diff
@@ -177,13 +245,30 @@ command :screenshot do |c|
     type = type.is_a?(Symbol) ? type : type.normalize_screenshot_type
     browser = browser.is_a?(Symbol) ? browser : browser.normalize_browser_type

+    compiled_script = []
+
+    if options[:script].count.positive?
+      options[:script].each do |scr|
+        scr.strip!
+        if scr == '-'
+          compiled_script << $stdin.read
+        elsif File.exist?(File.expand_path(scr))
+          compiled_script << IO.read(File.expand_path(scr))
+        else
+          compiled_script << scr
+        end
+      end
+    end
+
+    script = compiled_script.count.positive? ? compiled_script.join(';') : nil
+
     raise 'Full page screen shots only available with Firefox' if type == :full_page && browser != :firefox

     urls.each do |url|
       c = Curl::Html.new(url)
       c.headers = headers
       c.browser = browser
-      c.screenshot(options[:out], type: type)
+      c.screenshot(options[:out], type: type, script: script, id: options[:id], wait: options[:wait])
     end
   end
 end
```
```diff
@@ -417,6 +502,9 @@ command :links do |c|
   c.desc 'Only gather external links'
   c.switch %i[x external_links_only], default_value: false, negatable: false

+  c.desc 'Only gather internal (same-site) links'
+  c.switch %i[l local_links_only], default_value: false, negatable: false
+
   c.desc 'Filter output using dot-syntax path'
   c.flag %i[q query filter]

```
```diff
@@ -433,7 +521,8 @@ command :links do |c|
         compressed: options[:compressed], clean: options[:clean],
         ignore_local_links: options[:ignore_relative],
         ignore_fragment_links: options[:ignore_fragments],
-        external_links_only: options[:external_links_only]
+        external_links_only: options[:external_links_only],
+        local_links_only: options[:local_links_only]
       })
       res.curl

```
data/lib/curly/curl/html.rb
CHANGED
```diff
@@ -11,7 +11,7 @@ module Curl
   # Class for CURLing an HTML page
   class Html
     attr_accessor :settings, :browser, :source, :headers, :headers_only, :compressed, :clean, :fallback,
-                  :ignore_local_links, :ignore_fragment_links, :external_links_only
+                  :ignore_local_links, :ignore_fragment_links, :external_links_only, :local_links_only

     attr_reader :url, :code, :meta, :links, :head, :body,
                 :title, :description, :body_links, :body_images
```
```diff
@@ -69,6 +69,7 @@ module Curl
       @ignore_local_links = options[:ignore_local_links]
       @ignore_fragment_links = options[:ignore_fragment_links]
       @external_links_only = options[:external_links_only]
+      @local_links_only = options[:local_links_only]

       @curl = TTY::Which.which('curl')
       @url = url.nil? ? options[:url] : url
```
```diff
@@ -127,10 +128,19 @@ module Curl
     ## save (:full_page,
     ## :print_page, :visible)
     ##
-    def screenshot(destination = nil, type: :full_page)
-      full_page = type.to_sym == :full_page
-      print_page = type.to_sym == :print_page
-      save_screenshot(destination, type: type)
+    def screenshot(destination = nil, type: :full_page, script: nil, id: nil, wait: 0)
+      # full_page = type.to_sym == :full_page
+      # print_page = type.to_sym == :print_page
+      save_screenshot(destination, type: type, script: script, id: id, wait_seconds: wait)
+    end
+
+    ##
+    ## @brief Execute JavaScript
+    ##
+    ## @param script The script to run
+    ##
+    def execute(script, wait, element_id)
+      run_js(script, wait, element_id)
     end

     ##
```
```diff
@@ -144,12 +154,11 @@ module Curl
     def extract(before, after, inclusive: false)
       before = /#{Regexp.escape(before)}/ unless before.is_a?(Regexp)
       after = /#{Regexp.escape(after)}/ unless after.is_a?(Regexp)
-
-
-
-
-
-      end
+      rx = if inclusive
+             /(#{before.source}.*?#{after.source})/m
+           else
+             /(?<=#{before.source})(.*?)(?=#{after.source})/m
+           end
       @body.scan(rx).map { |r| @clean ? r[0].clean : r[0] }
     end

```
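The rewritten `extract` builds one of two patterns: with `inclusive`, a single capture that spans both delimiters; without it, a lookbehind/lookahead pair so only the text between them is captured. The standalone sketch below reproduces that construction outside the class so it can be tried on a plain string; `extract_between` is a hypothetical name, and it drops the `@clean` post-processing. One assumption worth checking: Ruby's regex engine (Onigmo) generally requires look-behind patterns to be fixed-length, so a `before` Regexp with an unbounded quantifier such as `.*` is likely to be rejected with a `RegexpError` in the non-inclusive branch, while escaped literal strings are always fixed-length.

```ruby
# Standalone copy of the pattern construction in Html#extract.
# Strings are escaped; Regexps are used as-is (only their source is kept).
def extract_between(body, before, after, inclusive: false)
  before = /#{Regexp.escape(before)}/ unless before.is_a?(Regexp)
  after = /#{Regexp.escape(after)}/ unless after.is_a?(Regexp)

  rx = if inclusive
         # one capture spanning both delimiters
         /(#{before.source}.*?#{after.source})/m
       else
         # lookbehind/lookahead keep the delimiters out of the capture
         /(?<=#{before.source})(.*?)(?=#{after.source})/m
       end

  # with one capture group, scan yields a one-element array per match
  body.scan(rx).map(&:first)
end
```

Usage: `extract_between(html, '<p>', '</p>')` returns the paragraph bodies, and adding `inclusive: true` keeps the tags; the `/m` flag lets `.*?` cross newlines.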
```diff
@@ -490,11 +499,19 @@ module Curl

         link_href = link_href[2]

-
+        if @local_links_only
+          next if @ignore_fragment_links && link_href =~ /^#/
+
+          next unless same_origin?(link_href)

-
+        else
+          next if link_href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
+
+          next if link_href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)

-
+          next if same_origin?(link_href) && @external_links_only
+
+        end

         link_title = tag.match(/title=(['"])(.*?)\1/)
         link_title = link_title.nil? ? nil : link_title[2]
```
```diff
@@ -522,11 +539,19 @@ module Curl
       link_tags.each do |m|
         href = m['tag'].match(/href=(["'])(.*?)\1/)
         href = href[2] unless href.nil?
-
+        if @local_links_only
+          next if href =~ /^#/ && @ignore_fragment_links
+
+          next unless same_origin?(href)
+
+        else
+          next if href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
+
+          next if href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)

-
+          next if same_origin?(href) && @external_links_only

-
+        end

         title = m['tag'].match(/title=(["'])(.*?)\1/)
         title = title[2] unless title.nil?
```
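Both link-gathering loops now branch on `@local_links_only`: in that mode only same-origin links survive (fragments optionally dropped); otherwise the existing fragment, relative and external-only rules apply. `same_origin?` itself is not part of this diff, so the sketch below supplies a hypothetical version that resolves the href against the page URL with `URI.join` and compares scheme, host and port. `keep_link?` is likewise an illustrative name that returns `true` wherever the loop would not call `next`.

```ruby
require 'uri'

# Hypothetical stand-in for Html#same_origin? (not shown in this diff):
# resolve +href+ against the page URL and compare scheme, host and port.
def same_origin?(href, page_url)
  base = URI.parse(page_url)
  target = URI.join(page_url, href)
  [target.scheme, target.host, target.port] == [base.scheme, base.host, base.port]
rescue URI::Error
  false
end

# Mirrors the branches added to the link loops; true means "keep the link".
def keep_link?(href, page_url, local_links_only: false, external_links_only: false,
               ignore_fragment_links: false, ignore_local_links: false)
  fragment = href.start_with?('#')

  if local_links_only
    return false if fragment && ignore_fragment_links

    same_origin?(href, page_url)
  else
    return false if fragment && (ignore_fragment_links || external_links_only)
    # anything without "scheme://" or "//" is treated as relative
    return false if href !~ %r{^(\w+:)?//} && (ignore_local_links || external_links_only)
    return false if external_links_only && same_origin?(href, page_url)

    true
  end
end
```

Under this reading, a root-relative `/about` is kept by `--local_links_only` but skipped by `--external_links_only`, because it fails the `%r{^(\w+:)?//}` test before origin is ever checked.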
```diff
@@ -587,14 +612,46 @@ module Curl
       res
     end

+    ##
+    ## Run JavaScript on a URL
+    ##
+    ## @param script The JavaScript to execute
+    ## @param wait Seconds to wait after executing JS
+    ## @param element_id The element identifier
+    ##
+    def run_js(script, wait_seconds = 2, element_id = nil)
+      raise 'No script provided' if script.nil?
+
+      browser = @browser.is_a?(String) ? @browser.normalize_browser_type : @browser
+
+      driver = Selenium::WebDriver.for browser
+
+      driver.manage.timeouts.implicit_wait = 15
+      res = nil
+      begin
+        driver.get @url
+        if element_id
+          wait = Selenium::WebDriver::Wait.new(timeout: 10) # seconds
+          wait.until { driver.find_element(id: element_id) }
+        end
+        res = driver.execute_script(script)
+        sleep wait_seconds.to_i
+      ensure
+        driver.quit
+      end
+
+      $stderr.puts "Executed JS on #{@url}"
+
+      res
+    end
+
     ##
     ## Save a screenshot of a url
     ##
     ## @param destination [String] File path destination
-    ## @param browser [Symbol] The browser (:chrome or :firefox)
     ## @param type [Symbol] The type of screenshot (:full_page, :print_page, or :visible)
     ##
-    def save_screenshot(destination = nil, type: :full_page)
+    def save_screenshot(destination = nil, type: :full_page, script: nil, wait_seconds: 0, id: nil)
       raise 'No URL provided' if url.nil?

       raise 'No file destination provided' if destination.nil?
```
```diff
@@ -603,7 +660,7 @@ module Curl

       raise 'Path doesn\'t exist' unless File.directory?(File.dirname(destination))

-      browser = browser.
+      browser = @browser.is_a?(String) ? @browser.normalize_browser_type : @browser
       type = type.normalize_screenshot_type if type.is_a?(String)
       raise 'Can not save full screen with Chrome, use Firefox' if type == :full_page && browser == :chrome

```
```diff
@@ -614,10 +671,21 @@ module Curl
                         "#{destination.sub(/\.(pdf|jpe?g|png)$/, '')}.png"
                       end

-      driver = Selenium::WebDriver.for
+      driver = Selenium::WebDriver.for browser
       driver.manage.timeouts.implicit_wait = 4
       begin
         driver.get @url
+        if id
+          wait = Selenium::WebDriver::Wait.new(timeout: 10) # seconds
+          wait.until { driver.find_element(id: id) }
+        end
+
+        if script
+          res = driver.execute_script(script)
+        end
+
+        sleep wait_seconds.to_i
+
         case type
         when :print_page
           driver.save_print_page(destination)
```
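Both `run_js` and `save_screenshot` wait for `--id` with `Selenium::WebDriver::Wait.new(timeout: 10)`, whose `until` re-evaluates the block (ignoring `NoSuchElementError` by default) until it returns a truthy value or the timeout expires, at which point it raises `Selenium::WebDriver::Error::TimeoutError`. That is the behavior behind the README's note that a missing ID makes the screenshot hang for about 10 seconds. Since `implicit_wait` (4 seconds here, 15 in `run_js`) also makes each `find_element` call block before failing, the real delay may be somewhat longer. A minimal pure-Ruby sketch of the same polling pattern follows; `wait_until` and `WaitTimeout` are hypothetical names, and this is not Selenium's implementation.

```ruby
# Raised when the condition never became truthy within the timeout.
class WaitTimeout < StandardError; end

# Poll the block every +interval+ seconds until it returns a truthy value
# (which is returned) or +timeout+ seconds pass. Exceptions whose class is
# listed in +ignore+ count as "not ready yet"; any other error propagates.
def wait_until(timeout: 10, interval: 0.2, ignore: [])
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  deadline = clock.call + timeout
  last_error = nil

  loop do
    begin
      result = yield
      return result if result
    rescue StandardError => e
      raise unless ignore.any? { |klass| e.is_a?(klass) }

      last_error = e
    end

    if clock.call >= deadline
      detail = last_error ? " (last error: #{last_error.class})" : ''
      raise WaitTimeout, "condition not met within #{timeout}s#{detail}"
    end
    sleep interval
  end
end
```

The monotonic clock is used so that a system clock change during the wait cannot shorten or extend the deadline.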
data/lib/curly/version.rb
CHANGED
data/src/_README.md
CHANGED
````diff
@@ -95,6 +95,22 @@ This specifies a before and after string and includes them (`-i`) in the result.
 ```


+##### execute
+
+You can execute JavaScript on a given web page using the `execute` subcommand.
+
+Example:
+
+curlyq execute -s "NiftyAPI.find('file/save').arrow().shoot('file-save')" file:///Users/ttscoff/Desktop/Code/niftymenu/dist/MultiMarkdown-Composer.html
+
+You can specify an element id to wait for using `--id`, and define a pause to wait after executing a script with `--wait` (defaults to 2 seconds). Scripts can be read from the command line arguments with `--script "SCRIPT"`, from STDIN with `--script -`, or from a file using `--script PATH`.
+
+If you expect a return value, be sure to include a `return` statement in your executed script. Results will be output to STDOUT.
+
+```
+@cli(bundle exec bin/curlyq help execute)
+```
+
 ##### headlinks

 Example:
````
````diff
@@ -321,6 +337,9 @@ Example:

 Screenshot saved to /Users/ttscoff/Desktop/test.png

+You can wait for an element ID to be visible using `--id`. This can be any `#ID` on the page. If the ID doesn't exist on the page, though, the screenshot will hang for a timeout of 10 seconds.
+
+You can execute a script before taking the screenshot with the `--script` flag. If this is set to `-`, it will read the script from STDIN. If it's set to an existing file path, that file will be read for script input. Specify an interval (in seconds) to wait after executing the script with `--wait`.

 ```
 @cli(bundle exec bin/curlyq help screenshot)
````
data/test/curlyq_scrape_test.rb
CHANGED
```diff
@@ -13,13 +13,13 @@ class CurlyQScrapeTest < Test::Unit::TestCase
   def setup
     @screenshot = File.join(File.dirname(__FILE__), 'screenshot_test')
     FileUtils.rm_f("#{@screenshot}.pdf") if File.exist?("#{@screenshot}.pdf")
-    FileUtils.rm_f(
+    FileUtils.rm_f("#{@screenshot}.png") if File.exist?("#{@screenshot}.png")
     FileUtils.rm_f("#{@screenshot}_full.png") if File.exist?("#{@screenshot}_full.png")
   end

   def teardown
     FileUtils.rm_f("#{@screenshot}.pdf") if File.exist?("#{@screenshot}.pdf")
-    FileUtils.rm_f(
+    FileUtils.rm_f("#{@screenshot}.png") if File.exist?("#{@screenshot}.png")
     FileUtils.rm_f("#{@screenshot}_full.png") if File.exist?("#{@screenshot}_full.png")
   end

```
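The `setup` and `teardown` methods repeat the same three removals. `FileUtils.rm_f` already ignores files that do not exist, so the `File.exist?` guards are not strictly needed and the whole list can be removed in one call. A sketch with a hypothetical helper name and constant:

```ruby
require 'fileutils'

# Suffixes of the files a screenshot test may leave behind.
SCREENSHOT_SUFFIXES = ['.pdf', '.png', '_full.png'].freeze

# Delete every screenshot artifact for +base+ and return the paths tried.
# FileUtils.rm_f ignores missing files, so no File.exist? checks are needed.
def remove_screenshot_artifacts(base)
  paths = SCREENSHOT_SUFFIXES.map { |suffix| "#{base}#{suffix}" }
  FileUtils.rm_f(paths)
  paths
end
```

Both `setup` and `teardown` could then call `remove_screenshot_artifacts(@screenshot)`.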
metadata
CHANGED
```diff
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: curlyq
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.12
 platform: ruby
 authors:
 - Brett Terpstra
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-
+date: 2024-04-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
```
|