curlyq 0.0.11 → 0.0.12

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: a9b0847eb3dd79e15b96bed47858ad0eb0df2ba7db8cf2e3395cb9e08e71c194
4
- data.tar.gz: '06623683ff93c02087432750a150ac663c4558b7d18323bbbb367e004abd58ab'
3
+ metadata.gz: af564949987eb7b96f7dfc730fea6ac79fb4a903c3eaff7a412af446c4a0e699
4
+ data.tar.gz: 58fb83c8132da551813adad468a6eb546a0d758df2b7c756b794a2a019963084
5
5
  SHA512:
6
- metadata.gz: 8b7098dde55f9b76a53eff1f71a5d821a2db6d5828fb67428f2aa3ef5d6ab8e2bdbb79f5375fb5291b965ff3d0b9677cf0084782c078c2bb5575a8383bd26906
7
- data.tar.gz: c0b02267ea0de1c490b2c2dcd171f8a992fa659733aa9bd9e0dc590988af3d7c5f4b6e38e0371ce72c879a1f956ec7f8b87e8432e684d8f7dad4f019314fa834
6
+ metadata.gz: 70e9485a7729acbd45a501278ceaae575bf358205c05da49de5c36f08ab8fd263824c8f8bd302c32b058986db3783b7abc6d9f0e58bb11f18879cdfde5210eac
7
+ data.tar.gz: 8d2e664b9e786c0a9d13fb8177c703be84c0a784835dba14354189978c34501334cc1dbdd68cc0112ad363c73554c170e9eb5fea64b60ee229bc4105b08d3fc6
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
1
+ ### 0.0.12
2
+
3
+ 2024-04-04 13:06
4
+
1
5
  ### 0.0.11
2
6
 
3
7
  2024-01-21 15:29
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- curlyq (0.0.11)
4
+ curlyq (0.0.12)
5
5
  gli (~> 2.21.0)
6
6
  nokogiri (~> 1.16.0)
7
7
  selenium-webdriver (~> 4.16.0)
@@ -11,7 +11,7 @@ GEM
11
11
  remote: https://rubygems.org/
12
12
  specs:
13
13
  gli (2.21.1)
14
- nokogiri (1.16.0-arm64-darwin)
14
+ nokogiri (1.16.2-arm64-darwin)
15
15
  racc (~> 1.4)
16
16
  parallel (1.23.0)
17
17
  parallel_tests (3.13.0)
data/README.md CHANGED
@@ -13,8 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
13
13
  [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
14
14
  [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
15
15
 
16
- The current version of `curlyq` is 0.0.11
17
- .
16
+ The current version of `curlyq` is 0.0.12.
18
17
 
19
18
  CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
20
19
 
@@ -47,7 +46,7 @@ SYNOPSIS
47
46
  curlyq [global options] command [command options] [arguments...]
48
47
 
49
48
  VERSION
50
- 0.0.11
49
+ 0.0.12
51
50
 
52
51
  GLOBAL OPTIONS
53
52
  --help - Show this message
@@ -56,6 +55,7 @@ GLOBAL OPTIONS
56
55
  -y, --[no-]yaml - Output YAML instead of json
57
56
 
58
57
  COMMANDS
58
+ execute - Execute JavaScript on a URL
59
59
  extract - Extract contents between two regular expressions
60
60
  headlinks - Return all <head> links on URL's page
61
61
  help - Shows a list of commands or help for one command
@@ -94,13 +94,11 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
94
94
 
95
95
  You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
96
96
 
97
- If you end the query with a specific key, only that key will be output, but it will be in an array. If there's only one match, it will be output as a raw string as a single element in an array.
97
+ If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
98
98
 
99
99
  curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
100
100
 
101
- [
102
- "<h3 id=\"whats-next\">What’s Next</h3>"
103
- ]
101
+ <h3 id="whats-next">What’s Next</h3>
104
102
 
105
103
  #### Commands
106
104
 
@@ -138,6 +136,34 @@ COMMAND OPTIONS
138
136
  ```
139
137
 
140
138
 
139
+ ##### execute
140
+
141
+ You can execute JavaScript on a given web page using the `execute` subcommand.
142
+
143
+ Example:
144
+
145
+ curlyq execute -s "NiftyAPI.find('file/save').arrow().shoot('file-save')" file:///Users/ttscoff/Desktop/Code/niftymenu/dist/MultiMarkdown-Composer.html
146
+
147
+ You can specify an element id to wait for using `--id`, and define a pause to wait after executing a script with `--wait` (defaults to 2 seconds). Scripts can be read from the command line arguments with `--script "SCRIPT"`, from STDIN with `--script -`, or from a file using `--script PATH`.
148
+
149
+ If you expect a return value, be sure to include a `return` statement in your executed script. Results will be output to STDOUT.
150
+
151
+ ```
152
+ NAME
153
+ execute - Execute JavaScript on a URL
154
+
155
+ SYNOPSIS
156
+
157
+ curlyq [global options] execute [command options] URL...
158
+
159
+ COMMAND OPTIONS
160
+ -b, --browser=arg - Browser to use (firefox, chrome) (default: chrome)
161
+ -h, --header=arg - Define a header to send as key=value (may be used more than once, default: none)
162
+ -i, --id=arg - Element ID to wait for before executing (default: none)
163
+ -s, --script=arg - Script to execute, use - to read from STDIN (may be used more than once, default: none)
164
+ -w, --wait=arg - Seconds to wait after executing JS (default: 2)
165
+ ```
166
+
141
167
  ##### headlinks
142
168
 
143
169
  Example:
@@ -440,6 +466,9 @@ Example:
440
466
 
441
467
  Screenshot saved to /Users/ttscoff/Desktop/test.png
442
468
 
469
+ You can wait for an element ID to be visible using `--id`. This can be any `#ID` on the page. If the ID doesn't exist on the page, though, the screenshot will hang for a timeout of 10 seconds.
470
+
471
+ You can execute a script before taking the screenshot with the `--script` flag. If this is set to `-`, it will read the script from STDIN. If it's set to an existing file path, that file will be read for script input. Specify an interval (in seconds) to wait after executing the script with `--wait`.
443
472
 
444
473
  ```
445
474
  NAME
@@ -452,8 +481,11 @@ SYNOPSIS
452
481
  COMMAND OPTIONS
453
482
  -b, --browser=arg - Browser to use (firefox, chrome) (default: chrome)
454
483
  -h, --header=arg - Define a header to send as key=value (may be used more than once, default: none)
484
+ -i, --id=arg - Element ID to wait for before taking screenshot (default: none)
455
485
  -o, --out, --file=arg - File destination (required, default: none)
486
+ -s, --script=arg - Script to execute before taking screenshot (may be used more than once, default: none)
456
487
  -t, --type=arg - Type of screenshot to save (full (requires firefox), print, visible) (default: visible)
488
+ -w, --wait=arg - Time to wait before taking screenshot (default: 0)
457
489
  ```
458
490
 
459
491
  ##### tags
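Note on the README changes above: the three documented sources for `--script` (a literal script, `-` for STDIN, or an existing file path) can be sketched in isolation. This is a minimal illustration and not the gem's code; `resolve_scripts` and its `stdin:` parameter are hypothetical names introduced here so the logic can be exercised without a browser.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: turn each --script value into JavaScript source.
# '-' reads from STDIN, an existing path is read from disk, and anything
# else is treated as the script itself. Parts are joined with ';'.
def resolve_scripts(values, stdin: $stdin)
  parts = values.map do |value|
    value = value.strip
    if value == '-'
      stdin.read
    elsif File.exist?(File.expand_path(value))
      File.read(File.expand_path(value))
    else
      value
    end
  end
  parts.empty? ? nil : parts.join(';')
end

# Write a throwaway script file so the file-path branch can be shown.
file = Tempfile.new(['snippet', '.js'])
file.write('window.scrollTo(0, 0)')
file.close

combined = resolve_scripts(['-', file.path, 'return document.title'],
                           stdin: StringIO.new('console.log(1)'))
puts combined
file.unlink
```

Because the parts are joined into one script, a `return` statement only makes sense in the last part; anything after it would not run.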
data/bin/curlyq CHANGED
@@ -156,6 +156,61 @@ command %i[html curl] do |c|
156
156
  end
157
157
  end
158
158
 
159
+ desc 'Execute JavaScript on a URL'
160
+ arg_name 'URL', multiple: true
161
+ command :execute do |c|
162
+ c.desc 'Browser to use (firefox, chrome)'
163
+ c.flag %i[b browser], type: BrowserType, must_match: /^[fc].*?$/, default_value: 'chrome'
164
+
165
+ c.desc 'Define a header to send as key=value'
166
+ c.flag %i[h header], multiple: true
167
+
168
+ c.desc 'Script to execute, use - to read from STDIN'
169
+ c.flag %i[s script], multiple: true
170
+
171
+ c.desc 'Element ID to wait for before executing'
172
+ c.flag %i[i id]
173
+
174
+ c.desc 'Seconds to wait after executing JS'
175
+ c.flag %i[w wait], default_value: 2
176
+
177
+ c.action do |_, options, args|
178
+ urls = args.join(' ').split(/[, ]+/)
179
+
180
+ raise 'Script input required' unless options[:file] || options[:script]
181
+
182
+ compiled_script = []
183
+
184
+ if options[:script].count.positive?
185
+ options[:script].each do |scr|
186
+ scr.strip!
187
+ if scr == '-'
188
+ compiled_script << $stdin.read
189
+ elsif File.exist?(File.expand_path(scr))
190
+ compiled_script << IO.read(File.expand_path(scr))
191
+ else
192
+ compiled_script << scr
193
+ end
194
+ end
195
+ end
196
+
197
+ script = compiled_script.count.positive? ? compiled_script.join(';') : nil
198
+
199
+ headers = break_headers(options[:header])
200
+
201
+ browser = options[:browser]
202
+
203
+ browser = browser.is_a?(Symbol) ? browser : browser.normalize_browser_type
204
+
205
+ urls.each do |url|
206
+ c = Curl::Html.new(url)
207
+ c.headers = headers
208
+ c.browser = browser
209
+ $stdout.puts c.execute(script, options[:wait], options[:id])
210
+ end
211
+ end
212
+ end
213
+
159
214
  desc 'Save a screenshot of a URL'
160
215
  arg_name 'URL', multiple: true
161
216
  command :screenshot do |c|
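Note on the `Script input required` guard in the hunk above, offered as an observation rather than a fix. The `execute` command defines no `file` flag, and, assuming GLI hands a `multiple: true` flag to the action as an array (which the `.count` call on `options[:script]` suggests), an empty array is still truthy in Ruby. Under that assumption the guard would not trip when `--script` is omitted, and the later `No script provided` check in `run_js` would be the one to fire. The `options` hash below is a hypothetical stand-in for what GLI passes in.

```ruby
# Hypothetical options hash, assuming no --script flag was passed.
options = { script: [] }

# Mirrors the guard's condition: an empty array is truthy in Ruby.
guard_passes = (options[:file] || options[:script]) ? true : false

# One possible stricter check: look for non-blank script content instead.
scripts = Array(options[:script])
has_script = scripts.any? { |s| !s.strip.empty? }

puts "guard passes: #{guard_passes}"
puts "has script:   #{has_script}"
```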
@@ -171,6 +226,15 @@ command :screenshot do |c|
171
226
  c.desc 'Define a header to send as key=value'
172
227
  c.flag %i[h header], multiple: true
173
228
 
229
+ c.desc 'Script to execute before taking screenshot'
230
+ c.flag %i[s script], multiple: true
231
+
232
+ c.desc 'Element ID to wait for before taking screenshot'
233
+ c.flag %i[i id]
234
+
235
+ c.desc 'Time to wait before taking screenshot'
236
+ c.flag %i[w wait], default_value: 0, type: Integer
237
+
174
238
  c.action do |_, options, args|
175
239
  urls = args.join(' ').split(/[, ]+/)
176
240
  headers = break_headers(options[:header])
@@ -181,13 +245,30 @@ command :screenshot do |c|
181
245
  type = type.is_a?(Symbol) ? type : type.normalize_screenshot_type
182
246
  browser = browser.is_a?(Symbol) ? browser : browser.normalize_browser_type
183
247
 
248
+ compiled_script = []
249
+
250
+ if options[:script].count.positive?
251
+ options[:script].each do |scr|
252
+ scr.strip!
253
+ if scr == '-'
254
+ compiled_script << $stdin.read
255
+ elsif File.exist?(File.expand_path(scr))
256
+ compiled_script << IO.read(File.expand_path(scr))
257
+ else
258
+ compiled_script << scr
259
+ end
260
+ end
261
+ end
262
+
263
+ script = compiled_script.count.positive? ? compiled_script.join(';') : nil
264
+
184
265
  raise 'Full page screen shots only available with Firefox' if type == :full_page && browser != :firefox
185
266
 
186
267
  urls.each do |url|
187
268
  c = Curl::Html.new(url)
188
269
  c.headers = headers
189
270
  c.browser = browser
190
- c.screenshot(options[:out], type: type)
271
+ c.screenshot(options[:out], type: type, script: script, id: options[:id], wait: options[:wait])
191
272
  end
192
273
  end
193
274
  end
@@ -128,10 +128,19 @@ module Curl
128
128
  ## save (:full_page,
129
129
  ## :print_page, :visible)
130
130
  ##
131
- def screenshot(destination = nil, type: :full_page)
132
- full_page = type.to_sym == :full_page
133
- print_page = type.to_sym == :print_page
134
- save_screenshot(destination, type: type)
131
+ def screenshot(destination = nil, type: :full_page, script: nil, id: nil, wait: 0)
132
+ # full_page = type.to_sym == :full_page
133
+ # print_page = type.to_sym == :print_page
134
+ save_screenshot(destination, type: type, script: script, id: id, wait_seconds: wait)
135
+ end
136
+
137
+ ##
138
+ ## @brief Execute JavaScript
139
+ ##
140
+ ## @param script The script to run
141
+ ##
142
+ def execute(script, wait, element_id)
143
+ run_js(script, wait, element_id)
135
144
  end
136
145
 
137
146
  ##
@@ -145,12 +154,11 @@ module Curl
145
154
  def extract(before, after, inclusive: false)
146
155
  before = /#{Regexp.escape(before)}/ unless before.is_a?(Regexp)
147
156
  after = /#{Regexp.escape(after)}/ unless after.is_a?(Regexp)
148
-
149
- if inclusive
150
- rx = /(#{before.source}.*?#{after.source})/m
151
- else
152
- rx = /(?<=#{before.source})(.*?)(?=#{after.source})/m
153
- end
157
+ rx = if inclusive
158
+ /(#{before.source}.*?#{after.source})/m
159
+ else
160
+ /(?<=#{before.source})(.*?)(?=#{after.source})/m
161
+ end
154
162
  @body.scan(rx).map { |r| @clean ? r[0].clean : r[0] }
155
163
  end
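Note on the `extract` refactor above: behavior is unchanged, only the assignment style differs. The difference between the two patterns can be seen in a standalone sketch. `extract_between` is a hypothetical name, it handles plain-string delimiters only, and it omits the `@clean` step.

```ruby
# Standalone version of the two patterns, for plain-string delimiters.
# Inclusive keeps the delimiters in the match; the default uses
# lookbehind/lookahead so only the text between them is captured.
def extract_between(body, before, after, inclusive: false)
  before = Regexp.escape(before)
  after = Regexp.escape(after)
  rx = if inclusive
         /(#{before}.*?#{after})/m
       else
         /(?<=#{before})(.*?)(?=#{after})/m
       end
  body.scan(rx).map { |match| match[0] }
end

body = 'a [[one]] b [[two]] c'
exclusive = extract_between(body, '[[', ']]')
inclusive = extract_between(body, '[[', ']]', inclusive: true)
p exclusive
p inclusive
```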
156
164
 
@@ -604,14 +612,46 @@ module Curl
604
612
  res
605
613
  end
606
614
 
615
+ ##
616
+ ## Run JavaScript on a URL
617
+ ##
618
+ ## @param script The JavaScript to execute
619
+ ## @param wait Seconds to wait after executing JS
620
+ ## @param element_id The element identifier
621
+ ##
622
+ def run_js(script, wait_seconds = 2, element_id = nil)
623
+ raise 'No script provided' if script.nil?
624
+
625
+ browser = @browser.is_a?(String) ? @browser.normalize_browser_type : @browser
626
+
627
+ driver = Selenium::WebDriver.for browser
628
+
629
+ driver.manage.timeouts.implicit_wait = 15
630
+ res = nil
631
+ begin
632
+ driver.get @url
633
+ if element_id
634
+ wait = Selenium::WebDriver::Wait.new(timeout: 10) # seconds
635
+ wait.until { driver.find_element(id: element_id) }
636
+ end
637
+ res = driver.execute_script(script)
638
+ sleep wait_seconds.to_i
639
+ ensure
640
+ driver.quit
641
+ end
642
+
643
+ $stderr.puts "Executed JS on #{@url}"
644
+
645
+ res
646
+ end
647
+
607
648
  ##
608
649
  ## Save a screenshot of a url
609
650
  ##
610
651
  ## @param destination [String] File path destination
611
- ## @param browser [Symbol] The browser (:chrome or :firefox)
612
652
  ## @param type [Symbol] The type of screenshot (:full_page, :print_page, or :visible)
613
653
  ##
614
- def save_screenshot(destination = nil, type: :full_page)
654
+ def save_screenshot(destination = nil, type: :full_page, script: nil, wait_seconds: 0, id: nil)
615
655
  raise 'No URL provided' if url.nil?
616
656
 
617
657
  raise 'No file destination provided' if destination.nil?
@@ -620,7 +660,7 @@ module Curl
620
660
 
621
661
  raise 'Path doesn\'t exist' unless File.directory?(File.dirname(destination))
622
662
 
623
- browser = browser.normalize_browser_type if browser.is_a?(String)
663
+ browser = @browser.is_a?(String) ? @browser.normalize_browser_type : @browser
624
664
  type = type.normalize_screenshot_type if type.is_a?(String)
625
665
  raise 'Can not save full screen with Chrome, use Firefox' if type == :full_page && browser == :chrome
626
666
 
@@ -631,10 +671,21 @@ module Curl
631
671
  "#{destination.sub(/\.(pdf|jpe?g|png)$/, '')}.png"
632
672
  end
633
673
 
634
- driver = Selenium::WebDriver.for @browser
674
+ driver = Selenium::WebDriver.for browser
635
675
  driver.manage.timeouts.implicit_wait = 4
636
676
  begin
637
677
  driver.get @url
678
+ if id
679
+ wait = Selenium::WebDriver::Wait.new(timeout: 10) # seconds
680
+ wait.until { driver.find_element(id: id) }
681
+ end
682
+
683
+ if script
684
+ res = driver.execute_script(script)
685
+ end
686
+
687
+ sleep wait_seconds.to_i
688
+
638
689
  case type
639
690
  when :print_page
640
691
  driver.save_print_page(destination)
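Note on the `--id` handling added above: it relies on Selenium's explicit wait, which polls a block until it returns a truthy value or the timeout elapses. The sketch below shows that polling pattern using only the standard library. `wait_until` and `WaitTimeout` are hypothetical names, and this is not Selenium's implementation.

```ruby
# Hypothetical stand-in for an explicit wait: poll the block until it
# returns a truthy value, or raise once the timeout has elapsed.
class WaitTimeout < StandardError; end

def wait_until(timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "timed out after #{timeout} seconds"
    end

    sleep interval
  end
end

# Simulate an element that only "appears" on the third poll.
attempts = 0
found = wait_until(timeout: 1, interval: 0.01) do
  attempts += 1
  attempts >= 3 ? :element : nil
end
puts "found #{found} after #{attempts} attempts"
```

When the condition never becomes true, the caller is blocked for the full timeout before the error is raised, which matches the README's warning about a missing ID.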
data/lib/curly/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # Top level module for CurlyQ
2
2
  module Curly
3
3
  # Current version number
4
- VERSION = '0.0.11'
4
+ VERSION = '0.0.12'
5
5
  end
data/src/_README.md CHANGED
@@ -13,7 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
13
13
  [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
14
14
  [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
15
15
 
16
- The current version of `curlyq` is <!--VER-->0.0.10<!--END VER-->.
16
+ The current version of `curlyq` is <!--VER-->0.0.9<!--END VER-->.
17
17
 
18
18
  CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
19
19
 
@@ -68,13 +68,11 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
68
68
 
69
69
  You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
70
70
 
71
- If you end the query with a specific key, only that key will be output, but it will be in an array. If there's only one match, it will be output as a raw string as a single element in an array.
71
+ If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
72
72
 
73
73
  curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
74
74
 
75
- [
76
- "<h3 id=\"whats-next\">What’s Next</h3>"
77
- ]
75
+ <h3 id="whats-next">What’s Next</h3>
78
76
 
79
77
  #### Commands
80
78
 
@@ -97,6 +95,22 @@ This specifies a before and after string and includes them (`-i`) in the result.
97
95
  ```
98
96
 
99
97
 
98
+ ##### execute
99
+
100
+ You can execute JavaScript on a given web page using the `execute` subcommand.
101
+
102
+ Example:
103
+
104
+ curlyq execute -s "NiftyAPI.find('file/save').arrow().shoot('file-save')" file:///Users/ttscoff/Desktop/Code/niftymenu/dist/MultiMarkdown-Composer.html
105
+
106
+ You can specify an element id to wait for using `--id`, and define a pause to wait after executing a script with `--wait` (defaults to 2 seconds). Scripts can be read from the command line arguments with `--script "SCRIPT"`, from STDIN with `--script -`, or from a file using `--script PATH`.
107
+
108
+ If you expect a return value, be sure to include a `return` statement in your executed script. Results will be output to STDOUT.
109
+
110
+ ```
111
+ @cli(bundle exec bin/curlyq help execute)
112
+ ```
113
+
100
114
  ##### headlinks
101
115
 
102
116
  Example:
@@ -323,6 +337,9 @@ Example:
323
337
 
324
338
  Screenshot saved to /Users/ttscoff/Desktop/test.png
325
339
 
340
+ You can wait for an element ID to be visible using `--id`. This can be any `#ID` on the page. If the ID doesn't exist on the page, though, the screenshot will hang for a timeout of 10 seconds.
341
+
342
+ You can execute a script before taking the screenshot with the `--script` flag. If this is set to `-`, it will read the script from STDIN. If it's set to an existing file path, that file will be read for script input. Specify an interval (in seconds) to wait after executing the script with `--wait`.
326
343
 
327
344
  ```
328
345
  @cli(bundle exec bin/curlyq help screenshot)
@@ -13,13 +13,13 @@ class CurlyQScrapeTest < Test::Unit::TestCase
13
13
  def setup
14
14
  @screenshot = File.join(File.dirname(__FILE__), 'screenshot_test')
15
15
  FileUtils.rm_f("#{@screenshot}.pdf") if File.exist?("#{@screenshot}.pdf")
16
- FileUtils.rm_f('screenshot_test.png') if File.exist?("#{@screenshot}.png")
16
+ FileUtils.rm_f("#{@screenshot}.png") if File.exist?("#{@screenshot}.png")
17
17
  FileUtils.rm_f("#{@screenshot}_full.png") if File.exist?("#{@screenshot}_full.png")
18
18
  end
19
19
 
20
20
  def teardown
21
21
  FileUtils.rm_f("#{@screenshot}.pdf") if File.exist?("#{@screenshot}.pdf")
22
- FileUtils.rm_f('screenshot_test.png') if File.exist?("#{@screenshot}.png")
22
+ FileUtils.rm_f("#{@screenshot}.png") if File.exist?("#{@screenshot}.png")
23
23
  FileUtils.rm_f("#{@screenshot}_full.png") if File.exist?("#{@screenshot}_full.png")
24
24
  end
25
25
 
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: curlyq
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.11
4
+ version: 0.0.12
5
5
  platform: ruby
6
6
  authors:
7
7
  - Brett Terpstra
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2024-01-21 00:00:00.000000000 Z
11
+ date: 2024-04-04 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake