curlyq 0.0.11 → 0.0.13
- checksums.yaml +4 -4
- data/CHANGELOG.md +17 -0
- data/Gemfile.lock +2 -2
- data/README.md +39 -7
- data/bin/curlyq +93 -1
- data/lib/curly/array.rb +2 -0
- data/lib/curly/curl/html.rb +81 -25
- data/lib/curly/version.rb +1 -1
- data/src/_README.md +22 -5
- data/test/curlyq_html_test.rb +1 -1
- data/test/curlyq_scrape_test.rb +2 -2
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: db3d3f02ab489eefdf85122b219d01a14d43322040009566548ba76ebed07767
+  data.tar.gz: 1979f168d55f7f11fab7625a3916154b97ec9a1736e9a204a561399de5a3406c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b47fd47823f2b74493affc0180eb96cb4146c34d89d670edb4350703009dccd341ce278f1221e9c7eafd8f2d8a117caa2ba6991a931e5927396091347d4970b6
+  data.tar.gz: 3143684aabac9d380583de9bfcb51045e78efacefc75051a7281dc2760a2efc24e70101f6c688ef396b7ba7f336d128f6641372a52c2e95be33082c8d9724712
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,20 @@
+### 0.0.13
+
+2024-10-25 10:23
+
+#### FIXED
+
+- Fix tests, handle empty results better
+
+### 0.0.12
+
+2024-04-04 13:06
+
+#### NEW
+
+- Add --script option to screenshot command
+- Add `execute` command for executing JavaScript on a page
+
 ### 0.0.11
 
 2024-01-21 15:29
data/Gemfile.lock
CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    curlyq (0.0.
+    curlyq (0.0.13)
       gli (~> 2.21.0)
       nokogiri (~> 1.16.0)
       selenium-webdriver (~> 4.16.0)
@@ -11,7 +11,7 @@ GEM
   remote: https://rubygems.org/
   specs:
     gli (2.21.1)
-    nokogiri (1.16.
+    nokogiri (1.16.2-arm64-darwin)
       racc (~> 1.4)
     parallel (1.23.0)
     parallel_tests (3.13.0)
data/README.md
CHANGED
@@ -13,8 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
 [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
 [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
 
-The current version of `curlyq` is 0.0.
-.
+The current version of `curlyq` is 0.0.13.
 
 CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
 
@@ -47,7 +46,7 @@ SYNOPSIS
     curlyq [global options] command [command options] [arguments...]
 
 VERSION
-    0.0.
+    0.0.13
 
 GLOBAL OPTIONS
     --help - Show this message
@@ -56,6 +55,7 @@ GLOBAL OPTIONS
     -y, --[no-]yaml - Output YAML instead of json
 
 COMMANDS
+    execute - Execute JavaScript on a URL
     extract - Extract contents between two regular expressions
     headlinks - Return all <head> links on URL's page
     help - Shows a list of commands or help for one command
@@ -94,13 +94,11 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
 
 You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
 
-If you end the query with a specific key, only that key will be output
+If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
 
     curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
 
-
-    "<h3 id=\"whats-next\">What’s Next</h3>"
-    ]
+    <h3 id="whats-next">What’s Next</h3>
 
 #### Commands
 
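The output rule added in this hunk (one match prints as a raw string, several matches print as an array) can be pictured with a short Ruby sketch. `render_matches` is a hypothetical helper, not CurlyQ's actual code; it assumes only the behavior the README text describes.

```ruby
require 'json'

# Hypothetical helper (not part of CurlyQ): mimic the documented output rule.
# A single string match is returned raw; anything else is serialized as JSON.
def render_matches(matches)
  return matches.first if matches.length == 1 && matches.first.is_a?(String)

  JSON.pretty_generate(matches)
end
```

A pipeline that consumes this kind of output has to accept both shapes, since the result is only guaranteed to be JSON when more than one element matches.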
@@ -138,6 +136,34 @@ COMMAND OPTIONS
 ```
 
 
+##### execute
+
+You can execute JavaScript on a given web page using the `execute` subcommand.
+
+Example:
+
+    curlyq execute -s "NiftyAPI.find('file/save').arrow().shoot('file-save')" file:///Users/ttscoff/Desktop/Code/niftymenu/dist/MultiMarkdown-Composer.html
+
+You can specify an element id to wait for using `--id`, and define a pause to wait after executing a script with `--wait` (defaults to 2 seconds). Scripts can be read from the command line arguments with `--script "SCRIPT"`, from STDIN with `--script -`, or from a file using `--script PATH`.
+
+If you expect a return value, be sure to include a `return` statement in your executed script. Results will be output to STDOUT.
+
+```
+NAME
+    execute - Execute JavaScript on a URL
+
+SYNOPSIS
+
+    curlyq [global options] execute [command options] URL...
+
+COMMAND OPTIONS
+    -b, --browser=arg - Browser to use (firefox, chrome) (default: chrome)
+    -h, --header=arg - Define a header to send as key=value (may be used more than once, default: none)
+    -i, --id=arg - Element ID to wait for before executing (default: none)
+    -s, --script=arg - Script to execute, use - to read from STDIN (may be used more than once, default: none)
+    -w, --wait=arg - Seconds to wait after executing JS (default: 2)
+```
+
 ##### headlinks
 
 Example:
@@ -440,6 +466,9 @@ Example:
 
     Screenshot saved to /Users/ttscoff/Desktop/test.png
 
+You can wait for an element ID to be visible using `--id`. This can be any `#ID` on the page. If the ID doesn't exist on the page, though, the screenshot will hang for a timeout of 10 seconds.
+
+You can execute a script before taking the screenshot with the `--script` flag. If this is set to `-`, it will read the script from STDIN. If it's set to an existing file path, that file will be read for script input. Specify an interval (in seconds) to wait after executing the script with `--wait`.
 
 ```
 NAME
@@ -452,8 +481,11 @@ SYNOPSIS
 COMMAND OPTIONS
     -b, --browser=arg - Browser to use (firefox, chrome) (default: chrome)
     -h, --header=arg - Define a header to send as key=value (may be used more than once, default: none)
+    -i, --id=arg - Element ID to wait for before taking screenshot (default: none)
     -o, --out, --file=arg - File destination (required, default: none)
+    -s, --script=arg - Script to execute before taking screenshot (may be used more than once, default: none)
     -t, --type=arg - Type of screenshot to save (full (requires firefox), print, visible) (default: visible)
+    -w, --wait=arg - Time to wait before taking screenshot (default: 0)
 ```
 
 ##### tags
data/bin/curlyq
CHANGED
@@ -148,6 +148,12 @@ command %i[html curl] do |c|
     end
     output.delete_if(&:nil?)
     output.delete_if(&:empty?)
+
+    if output.nil? || output.empty?
+      print_out([], global_options[:yaml], raw: options[:raw], pretty: global_options[:pretty])
+      Process.exit 1
+    end
+
     output.map! { |o| o[options[:raw].to_sym] } if options[:raw]
 
     output = output.clean_output
@@ -156,6 +162,61 @@ command %i[html curl] do |c|
   end
 end
 
+desc 'Execute JavaScript on a URL'
+arg_name 'URL', multiple: true
+command :execute do |c|
+  c.desc 'Browser to use (firefox, chrome)'
+  c.flag %i[b browser], type: BrowserType, must_match: /^[fc].*?$/, default_value: 'chrome'
+
+  c.desc 'Define a header to send as key=value'
+  c.flag %i[h header], multiple: true
+
+  c.desc 'Script to execute, use - to read from STDIN'
+  c.flag %i[s script], multiple: true
+
+  c.desc 'Element ID to wait for before executing'
+  c.flag %i[i id]
+
+  c.desc 'Seconds to wait after executing JS'
+  c.flag %i[w wait], default_value: 2
+
+  c.action do |_, options, args|
+    urls = args.join(' ').split(/[, ]+/)
+
+    raise 'Script input required' unless options[:file] || options[:script]
+
+    compiled_script = []
+
+    if options[:script].count.positive?
+      options[:script].each do |scr|
+        scr.strip!
+        if scr == '-'
+          compiled_script << $stdin.read
+        elsif File.exist?(File.expand_path(scr))
+          compiled_script << IO.read(File.expand_path(scr))
+        else
+          compiled_script << scr
+        end
+      end
+    end
+
+    script = compiled_script.count.positive? ? compiled_script.join(';') : nil
+
+    headers = break_headers(options[:header])
+
+    browser = options[:browser]
+
+    browser = browser.is_a?(Symbol) ? browser : browser.normalize_browser_type
+
+    urls.each do |url|
+      c = Curl::Html.new(url)
+      c.headers = headers
+      c.browser = browser
+      $stdout.puts c.execute(script, options[:wait], options[:id])
+    end
+  end
+end
+
 desc 'Save a screenshot of a URL'
 arg_name 'URL', multiple: true
 command :screenshot do |c|
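The script-resolution rule in the `execute` action above (`-` reads STDIN, an existing path is read as a file, anything else is literal JavaScript) can be isolated as a small function. This is a minimal sketch, not the gem's code: `resolve_scripts` and its `input:` parameter are hypothetical names introduced so the STDIN branch can be exercised without a terminal.

```ruby
# Sketch of the script-resolution rule from the `execute` action.
# `resolve_scripts` and `input:` are hypothetical names for this example.
def resolve_scripts(args, input: $stdin)
  parts = args.map do |arg|
    scr = arg.strip
    if scr == '-'
      # A lone dash means "read the script from standard input"
      input.read
    elsif File.exist?(File.expand_path(scr))
      # An existing path is read as a script file
      File.read(File.expand_path(scr))
    else
      # Anything else is treated as literal JavaScript
      scr
    end
  end

  # Mirror the diff: join fragments with ';', or return nil when none were given
  parts.empty? ? nil : parts.join(';')
end
```

One thing to keep in mind when reading the guard `unless options[:file] || options[:script]`: an empty array is truthy in Ruby, so if an unused `--script` flag yields `[]` (as the `.count.positive?` check suggests), the guard would not fire and the `nil` script would instead be caught by the `'No script provided'` check in `run_js`.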
@@ -171,6 +232,15 @@ command :screenshot do |c|
   c.desc 'Define a header to send as key=value'
   c.flag %i[h header], multiple: true
 
+  c.desc 'Script to execute before taking screenshot'
+  c.flag %i[s script], multiple: true
+
+  c.desc 'Element ID to wait for before taking screenshot'
+  c.flag %i[i id]
+
+  c.desc 'Time to wait before taking screenshot'
+  c.flag %i[w wait], default_value: 0, type: Integer
+
   c.action do |_, options, args|
     urls = args.join(' ').split(/[, ]+/)
     headers = break_headers(options[:header])
@@ -181,13 +251,30 @@ command :screenshot do |c|
     type = type.is_a?(Symbol) ? type : type.normalize_screenshot_type
     browser = browser.is_a?(Symbol) ? browser : browser.normalize_browser_type
 
+    compiled_script = []
+
+    if options[:script].count.positive?
+      options[:script].each do |scr|
+        scr.strip!
+        if scr == '-'
+          compiled_script << $stdin.read
+        elsif File.exist?(File.expand_path(scr))
+          compiled_script << IO.read(File.expand_path(scr))
+        else
+          compiled_script << scr
+        end
+      end
+    end
+
+    script = compiled_script.count.positive? ? compiled_script.join(';') : nil
+
     raise 'Full page screen shots only available with Firefox' if type == :full_page && browser != :firefox
 
     urls.each do |url|
       c = Curl::Html.new(url)
       c.headers = headers
       c.browser = browser
-      c.screenshot(options[:out], type: type)
+      c.screenshot(options[:out], type: type, script: script, id: options[:id], wait: options[:wait])
     end
   end
 end
@@ -405,6 +492,11 @@ command :images do |c|
       end
     end
 
+    if output.nil? || output.empty?
+      print_out([], global_options[:yaml], raw: options[:raw], pretty: global_options[:pretty])
+      Process.exit 1
+    end
+
     print_out(output, global_options[:yaml], pretty: global_options[:pretty])
   end
 end
data/lib/curly/array.rb
CHANGED
data/lib/curly/curl/html.rb
CHANGED
@@ -128,10 +128,19 @@ module Curl
     ## save (:full_page,
     ## :print_page, :visible)
     ##
-    def screenshot(destination = nil, type: :full_page)
-      full_page = type.to_sym == :full_page
-      print_page = type.to_sym == :print_page
-      save_screenshot(destination, type: type)
+    def screenshot(destination = nil, type: :full_page, script: nil, id: nil, wait: 0)
+      # full_page = type.to_sym == :full_page
+      # print_page = type.to_sym == :print_page
+      save_screenshot(destination, type: type, script: script, id: id, wait_seconds: wait)
+    end
+
+    ##
+    ## @brief Execute JavaScript
+    ##
+    ## @param script The script to run
+    ##
+    def execute(script, wait, element_id)
+      run_js(script, wait, element_id)
     end
 
     ##
@@ -145,12 +154,11 @@ module Curl
     def extract(before, after, inclusive: false)
       before = /#{Regexp.escape(before)}/ unless before.is_a?(Regexp)
       after = /#{Regexp.escape(after)}/ unless after.is_a?(Regexp)
-
-
-
-
-
-           end
+      rx = if inclusive
+             /(#{before.source}.*?#{after.source})/m
+           else
+             /(?<=#{before.source})(.*?)(?=#{after.source})/m
+           end
       @body.scan(rx).map { |r| @clean ? r[0].clean : r[0] }
     end
 
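The two regular expressions built in `extract` differ only in whether the delimiters are captured. The following standalone sketch reproduces that construction so the difference can be tried in isolation; `extract_between` is a hypothetical free function, whereas the real method reads from `@body` and optionally cleans each result.

```ruby
# Standalone sketch of the regex construction in `extract`.
# `extract_between` is a hypothetical name introduced for this example.
def extract_between(body, before, after, inclusive: false)
  before = /#{Regexp.escape(before)}/ unless before.is_a?(Regexp)
  after = /#{Regexp.escape(after)}/ unless after.is_a?(Regexp)

  rx = if inclusive
         # Capture the delimiters together with the text between them
         /(#{before.source}.*?#{after.source})/m
       else
         # Lookbehind and lookahead match the delimiters without consuming them
         /(?<=#{before.source})(.*?)(?=#{after.source})/m
       end

  # scan returns one array of captures per match; keep the first capture
  body.scan(rx).map { |r| r[0] }
end
```

Because the non-inclusive form places `before` inside a lookbehind, a `before` regexp with variable-length quantifiers may be rejected by Ruby's regex engine; plain strings, which are escaped first, do not have that problem.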
@@ -281,6 +289,7 @@ module Curl
           end
         when /img/
           next unless %i[all img].include?(type)
+
           width = img[:attrs]['width']
           height = img[:attrs]['height']
           alt = img[:attrs]['alt']
@@ -293,7 +302,7 @@ module Curl
             height: height || 'unknown',
             alt: alt,
             title: title,
-            attrs: img[:attrs]
+            attrs: img[:attrs]
           }
         end
       end
@@ -325,7 +334,9 @@ module Curl
     ##
     def h(level = '\d')
       res = []
-      headlines = @body.to_enum(:scan, %r{<h(?<level>#{level})(?<tag> .*?)?>(?<text>.*?)</h#{level}>}i).map
+      headlines = @body.to_enum(:scan, %r{<h(?<level>#{level})(?<tag> .*?)?>(?<text>.*?)</h#{level}>}i).map do
+        Regexp.last_match
+      end
       headlines.each do |m|
         headline = { level: m['level'] }
         if m['tag'].nil?
@@ -428,7 +439,13 @@ module Curl
           attrs = tag['attrs'].strip.to_enum(:scan, /(?ix)
             (?<key>[@a-z0-9-]+)(?:=(?<quot>["'])
             (?<value>[^"']+)\k<quot>|[ >])?/i).map { Regexp.last_match }
-          attributes = attrs.each_with_object({})
+          attributes = attrs.each_with_object({}) do |a, hsh|
+            if a['value'].nil?
+              hsh[a['key']] = nil
+            else
+              hsh[a['key']] = a['key'] =~ /^(class|rel)$/ ? a['value'].split(/ /) : a['value']
+            end
+          end
         end
         {
           tag: tag['tag'],
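The attribute handling restored in this hunk turns an attribute string into a hash, splitting `class` and `rel` values into arrays and storing `nil` for valueless attributes. A compact sketch of the same idea follows. It is an approximation under stated assumptions: `parse_attributes` and `ATTR_RX` are hypothetical names, the pattern is the one from the diff written on a single line, and plain `scan` captures are used in place of `Regexp.last_match`.

```ruby
# Pattern from the diff, collapsed onto one line (extended mode not needed here)
ATTR_RX = /(?<key>[@a-z0-9-]+)(?:=(?<quot>["'])(?<value>[^"']+)\k<quot>|[ >])?/i

# Hypothetical helper: build an attribute hash from a raw attribute string.
def parse_attributes(attr_string)
  # With capture groups, scan yields [key, quot, value] for each match
  attr_string.strip.scan(ATTR_RX).each_with_object({}) do |(key, _quot, value), hsh|
    hsh[key] = if value.nil?
                 nil # boolean-style attribute such as `hidden`
               elsif key =~ /^(class|rel)$/
                 value.split(/ /) # space-separated token lists become arrays
               else
                 value
               end
  end
end
```

Storing `class` and `rel` as arrays is what lets a query such as `[links.rel*=me]` match when any one token matches, as the README describes.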
@@ -604,14 +621,46 @@ module Curl
       res
     end
 
+    ##
+    ## Run JavaScript on a URL
+    ##
+    ## @param script The JavaScript to execute
+    ## @param wait Seconds to wait after executing JS
+    ## @param element_id The element identifier
+    ##
+    def run_js(script, wait_seconds = 2, element_id = nil)
+      raise 'No script provided' if script.nil?
+
+      browser = @browser.is_a?(String) ? @browser.normalize_browser_type : @browser
+
+      driver = Selenium::WebDriver.for browser
+
+      driver.manage.timeouts.implicit_wait = 15
+      res = nil
+      begin
+        driver.get @url
+        if element_id
+          wait = Selenium::WebDriver::Wait.new(timeout: 10) # seconds
+          wait.until { driver.find_element(id: element_id) }
+        end
+        res = driver.execute_script(script)
+        sleep wait_seconds.to_i
+      ensure
+        driver.quit
+      end
+
+      warn "Executed JS on #{@url}"
+
+      res
+    end
+
     ##
     ## Save a screenshot of a url
     ##
     ## @param destination [String] File path destination
-    ## @param browser [Symbol] The browser (:chrome or :firefox)
     ## @param type [Symbol] The type of screenshot (:full_page, :print_page, or :visible)
     ##
-    def save_screenshot(destination = nil, type: :full_page)
+    def save_screenshot(destination = nil, type: :full_page, script: nil, wait_seconds: 0, id: nil)
       raise 'No URL provided' if url.nil?
 
       raise 'No file destination provided' if destination.nil?
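`run_js` blocks on `Selenium::WebDriver::Wait#until` for up to 10 seconds before the script runs. For readers unfamiliar with that pattern, the sketch below shows the general shape of a polling wait using only the standard library. It is an illustration, not Selenium's implementation; `wait_until` and its `interval:` parameter are hypothetical.

```ruby
# Illustration only: a polling loop in the spirit of an explicit wait.
# `wait_until` is a hypothetical helper, not part of Selenium or CurlyQ.
def wait_until(timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    # The block signals success by returning a truthy value
    result = yield
    return result if result

    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    raise "timed out after #{timeout} seconds" if now >= deadline

    sleep interval
  end
end
```

This is why a missing element ID makes the command appear to hang: the condition is retried until the deadline passes, and only then does the wait give up.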
@@ -620,7 +669,7 @@ module Curl
 
       raise 'Path doesn\'t exist' unless File.directory?(File.dirname(destination))
 
-      browser = browser.
+      browser = @browser.is_a?(String) ? @browser.normalize_browser_type : @browser
       type = type.normalize_screenshot_type if type.is_a?(String)
       raise 'Can not save full screen with Chrome, use Firefox' if type == :full_page && browser == :chrome
 
@@ -631,10 +680,19 @@ module Curl
                       "#{destination.sub(/\.(pdf|jpe?g|png)$/, '')}.png"
                     end
 
-      driver = Selenium::WebDriver.for
+      driver = Selenium::WebDriver.for browser
       driver.manage.timeouts.implicit_wait = 4
       begin
         driver.get @url
+        if id
+          wait = Selenium::WebDriver::Wait.new(timeout: 10) # seconds
+          wait.until { driver.find_element(id: id) }
+        end
+
+        res = driver.execute_script(script) if script
+
+        sleep wait_seconds.to_i
+
         case type
         when :print_page
           driver.save_print_page(destination)
@@ -647,7 +705,7 @@ module Curl
         driver.quit
       end
 
-
+      warn "Screenshot saved to #{destination}"
     end
 
     ##
@@ -786,13 +844,11 @@ module Curl
     ## @return [Boolean] true if hostnames match
     ##
     def same_origin?(href)
-
-
-
-
-
-      false
-    end
+      uri = URI(href)
+      origin = URI(@url)
+      uri.host == origin.host
+    rescue StandardError
+      false
     end
   end
 end
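The rewritten `same_origin?` reduces to a host comparison with a blanket rescue. A self-contained version makes the edge cases easy to check; `same_host?` and its `page_url` parameter are hypothetical stand-ins for the method and `@url`.

```ruby
require 'uri'

# Standalone sketch of the comparison in `same_origin?`.
# `same_host?` and `page_url` are hypothetical names for this example.
def same_host?(href, page_url)
  URI(href).host == URI(page_url).host
rescue StandardError
  # Unparseable URLs are treated as a different origin
  false
end
```

Two consequences follow from comparing hosts only: scheme and port are ignored, and a relative `href` such as `/about` has a `nil` host, so it is reported as not matching an absolute page URL.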
data/lib/curly/version.rb
CHANGED
data/src/_README.md
CHANGED
@@ -13,7 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
 [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
 [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
 
-The current version of `curlyq` is <!--VER-->0.0.
+The current version of `curlyq` is <!--VER-->0.0.12<!--END VER-->.
 
 CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
 
@@ -68,13 +68,11 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
 
 You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
 
-If you end the query with a specific key, only that key will be output
+If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
 
     curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
 
-
-    "<h3 id=\"whats-next\">What’s Next</h3>"
-    ]
+    <h3 id="whats-next">What’s Next</h3>
 
 #### Commands
 
@@ -97,6 +95,22 @@ This specifies a before and after string and includes them (`-i`) in the result.
 ```
 
 
+##### execute
+
+You can execute JavaScript on a given web page using the `execute` subcommand.
+
+Example:
+
+    curlyq execute -s "NiftyAPI.find('file/save').arrow().shoot('file-save')" file:///Users/ttscoff/Desktop/Code/niftymenu/dist/MultiMarkdown-Composer.html
+
+You can specify an element id to wait for using `--id`, and define a pause to wait after executing a script with `--wait` (defaults to 2 seconds). Scripts can be read from the command line arguments with `--script "SCRIPT"`, from STDIN with `--script -`, or from a file using `--script PATH`.
+
+If you expect a return value, be sure to include a `return` statement in your executed script. Results will be output to STDOUT.
+
+```
+@cli(bundle exec bin/curlyq help execute)
+```
+
 ##### headlinks
 
 Example:
@@ -323,6 +337,9 @@ Example:
 
     Screenshot saved to /Users/ttscoff/Desktop/test.png
 
+You can wait for an element ID to be visible using `--id`. This can be any `#ID` on the page. If the ID doesn't exist on the page, though, the screenshot will hang for a timeout of 10 seconds.
+
+You can execute a script before taking the screenshot with the `--script` flag. If this is set to `-`, it will read the script from STDIN. If it's set to an existing file path, that file will be read for script input. Specify an interval (in seconds) to wait after executing the script with `--wait`.
 
 ```
 @cli(bundle exec bin/curlyq help screenshot)
data/test/curlyq_html_test.rb
CHANGED
@@ -11,7 +11,7 @@ class CurlyQHtmlTest < Test::Unit::TestCase
   include CurlyQHelpers
 
   def test_html_search_query
-    result = curlyq('html', '-s', '#main article .aligncenter', '-q', 'images[
+    result = curlyq('html', '-s', '#main article .aligncenter', '-q', 'images[0]', 'https://brettterpstra.com/2024/10/19/web-excursions-for-october-19-2024/')
     json = JSON.parse(result)
 
     assert_match(/aligncenter/, json[0]['class'], 'Should have found an image with class "aligncenter"')
data/test/curlyq_scrape_test.rb
CHANGED
@@ -13,13 +13,13 @@ class CurlyQScrapeTest < Test::Unit::TestCase
   def setup
     @screenshot = File.join(File.dirname(__FILE__), 'screenshot_test')
     FileUtils.rm_f("#{@screenshot}.pdf") if File.exist?("#{@screenshot}.pdf")
-    FileUtils.rm_f(
+    FileUtils.rm_f("#{@screenshot}.png") if File.exist?("#{@screenshot}.png")
     FileUtils.rm_f("#{@screenshot}_full.png") if File.exist?("#{@screenshot}_full.png")
   end
 
   def teardown
     FileUtils.rm_f("#{@screenshot}.pdf") if File.exist?("#{@screenshot}.pdf")
-    FileUtils.rm_f(
+    FileUtils.rm_f("#{@screenshot}.png") if File.exist?("#{@screenshot}.png")
     FileUtils.rm_f("#{@screenshot}_full.png") if File.exist?("#{@screenshot}_full.png")
   end
 
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: curlyq
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.13
 platform: ruby
 authors:
 - Brett Terpstra
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-
+date: 2024-10-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake