curlyq 0.0.7 → 0.0.9
- checksums.yaml +4 -4
- data/CHANGELOG.md +20 -0
- data/Gemfile.lock +1 -1
- data/README.md +20 -4
- data/bin/curlyq +24 -19
- data/lib/curly/array.rb +8 -10
- data/lib/curly/hash.rb +142 -37
- data/lib/curly/version.rb +1 -1
- data/src/_README.md +19 -3
- data/test/curlyq_extract_test.rb +1 -1
- data/test/curlyq_html_test.rb +2 -2
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 44e01914de08789721522e24e506fe88a49106610fac9a16736efcba0916be88
|
4
|
+
data.tar.gz: ec2887fee0dab67c64c0095f59091e867445b22559674b31f2eea64d8f4b9fea
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: aa8338482e3d9414e6347195abc7ba8645adbc120f13469f219df7f2aa5fa1ba3c209740c608f728539969502965f6c29afc9d65b5ba411c8387b26ebd640c9d
|
7
|
+
data.tar.gz: 4fe071cb872a259795163da084851afdb5d003f7607a82a5ab6fe868f7f2edae22f0caa31327dfe1bad31e28e48fddde1857d882344359db8bf47b8579aef22c
|
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,23 @@
|
|
1
|
+
### 0.0.9
|
2
|
+
|
3
|
+
2024-01-16 12:38
|
4
|
+
|
5
|
+
#### IMPROVED
|
6
|
+
|
7
|
+
- You can now use dot syntax inside of a square bracket comparison in --query (`[attrs.id*=what]`)
|
8
|
+
- *=, ^=, $=, and == work with array values
|
9
|
+
- [] comparisons with no comparison, e.g. [attrs.id], will return every match that has that element populated
|
10
|
+
|
11
|
+
### 0.0.8
|
12
|
+
|
13
|
+
2024-01-15 16:45
|
14
|
+
|
15
|
+
#### IMPROVED
|
16
|
+
|
17
|
+
- Dot syntax query can now operate on a full array using empty set []
|
18
|
+
- Dot syntax query should output a specific key, e.g. attrs[id*=news].content (work in progress)
|
19
|
+
- Dot query syntax handling touch-ups. Piping to jq is still more flexible, but the basics are there.
|
20
|
+
|
1
21
|
### 0.0.7
|
2
22
|
|
3
23
|
2024-01-12 17:03
|
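The 0.0.9 entries above say that `*=`, `^=`, `$=`, and `==` now work with array values. A minimal sketch of that rule follows, assuming an array matches when any one of its elements matches. The helper names `text_match?` and `value_match?` are hypothetical and are not part of curlyq; the sketch also escapes the query text before building a regular expression, which is a choice made here rather than something the changelog states.

```ruby
# Hypothetical helpers for illustration only; not curlyq's own API.

# Compare a single string against a query value using one of the
# comparison operators listed in the changelog.
def text_match?(text, op, value)
  pattern = Regexp.escape(value.to_s)
  case op
  when '^=' then text.to_s.match?(/\A#{pattern}/i)   # starts with
  when '$=' then text.to_s.match?(/#{pattern}\z/i)   # ends with
  when '*=' then text.to_s.match?(/#{pattern}/i)     # contains
  else text.to_s.match?(/\A#{pattern}\z/i)           # '=' or '==': exact
  end
end

# A scalar is treated as a one-element array, so strings and arrays
# share one code path; an array matches if any element matches.
def value_match?(value, op, query)
  Array(value).any? { |el| text_match?(el, op, query) }
end

tag = { 'class' => %w[post aligncenter], 'id' => 'whats-next' }
value_match?(tag['class'], '*=', 'align') # => true
value_match?(tag['id'], '^=', 'what')     # => true
value_match?(tag['class'], '==', 'pos')   # => false
```

Wrapping scalars with `Array()` keeps a missing key (`nil`) from matching anything, since `Array(nil)` is empty.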
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -10,10 +10,13 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
|
|
10
10
|
[donate]: https://brettterpstra.com/donate
|
11
11
|
|
12
12
|
|
13
|
-
|
13
|
+
[jq]: https://github.com/jqlang/jq "Command-line JSON processor"
|
14
|
+
[yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
|
15
|
+
|
16
|
+
The current version of `curlyq` is 0.0.9
|
14
17
|
.
|
15
18
|
|
16
|
-
CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like
|
19
|
+
CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
|
17
20
|
|
18
21
|
[github]: https://github.com/ttscoff/curlyq/
|
19
22
|
|
@@ -44,7 +47,7 @@ SYNOPSIS
|
|
44
47
|
curlyq [global options] command [command options] [arguments...]
|
45
48
|
|
46
49
|
VERSION
|
47
|
-
0.0.
|
50
|
+
0.0.9
|
48
51
|
|
49
52
|
GLOBAL OPTIONS
|
50
53
|
--help - Show this message
|
@@ -71,6 +74,9 @@ You can shape the results using `--search` (`-s`) and `--query` (`-q`) on some c
|
|
71
74
|
|
72
75
|
A search uses either CSS or XPath syntax to locate elements. For example, if you wanted to locate all of the `<article>` elements with a class of `post` inside of the div with an id of `main`, you would run `--search '#main article.post'`. Searches can target tags, ids, and classes, and can accept `>` to target direct descendents. You can also use XPaths, but I hate those so I'm not going to document them.
|
73
76
|
|
77
|
+
> I've tried to make the query function useful, but if you want to do any kind of advanced shaping, you're better off piping the JSON output to [jq] or [yq].
|
78
|
+
|
79
|
+
|
74
80
|
Queries are specifically for shaping CurlyQ output. If you're using the `html` command, it returns a key called `images`, so you can target just the images in the response with `-q 'images'`. The queries accept array syntax, so to get the first image, you would use `-q 'images[0]'`. Ranges are accepted as well, so `-q 'images[1..4]'` will return the 2nd through 5th images found on the page. You can also do comparisons, e.g. `images[rel=me]'` to target only images with a `rel` attribute of `me`.
|
75
81
|
|
76
82
|
The comparisons for the query flag are:
|
@@ -84,6 +90,16 @@ The comparisons for the query flag are:
|
|
84
90
|
- `^=` starts with text
|
85
91
|
- `$=` ends with text
|
86
92
|
|
93
|
+
Comparisons can be numeric or string comparisons. A numeric comparison like `curlyq images -q '[width>500]' URL` would return all of the images on the page with a width attribute greater than 500.
|
94
|
+
|
95
|
+
You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
|
96
|
+
|
97
|
+
If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
|
98
|
+
|
99
|
+
curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
|
100
|
+
|
101
|
+
<h3 id="whats-next">What’s Next</h3>
|
102
|
+
|
87
103
|
#### Commands
|
88
104
|
|
89
105
|
curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.
|
@@ -440,7 +456,7 @@ COMMAND OPTIONS
|
|
440
456
|
|
441
457
|
Return a hierarchy of all tags in a page. Use `-t` to limit to a specific tag.
|
442
458
|
|
443
|
-
curlyq tags --search '#main .post h3' -q 'attrs
|
459
|
+
curlyq tags --search '#main .post h3' -q '[attrs.id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
|
444
460
|
|
445
461
|
[
|
446
462
|
{
|
data/bin/curlyq
CHANGED
@@ -130,13 +130,13 @@ command %i[html curl] do |c|
|
|
130
130
|
out = res.parse(source)
|
131
131
|
|
132
132
|
if options[:query]
|
133
|
-
out = out.to_data(url: url, clean: options[:clean]).dot_query(options[:query])
|
133
|
+
out = out.to_data(url: url, clean: options[:clean]).dot_query(options[:query], full_tag: false)
|
134
134
|
else
|
135
135
|
out = out.to_data
|
136
136
|
end
|
137
137
|
output.push([out])
|
138
138
|
elsif options[:query]
|
139
|
-
queried = res.to_data.dot_query(options[:query])
|
139
|
+
queried = res.to_data.dot_query(options[:query], full_tag: false)
|
140
140
|
output.push(queried) if queried
|
141
141
|
else
|
142
142
|
output.push(res.to_data(url: url))
|
@@ -147,6 +147,12 @@ command %i[html curl] do |c|
|
|
147
147
|
# output = output[0] if output.count == 1
|
148
148
|
output.map! { |o| o[options[:raw].to_sym] } if options[:raw]
|
149
149
|
|
150
|
+
if output.is_a?(Array)
|
151
|
+
while output.length == 1
|
152
|
+
output = output[0]
|
153
|
+
end
|
154
|
+
end
|
155
|
+
|
150
156
|
print_out(output, global_options[:yaml], raw: options[:raw], pretty: global_options[:pretty])
|
151
157
|
end
|
152
158
|
end
|
@@ -342,9 +348,7 @@ command :tags do |c|
|
|
342
348
|
out = out.dot_query(options[:query]) if options[:query]
|
343
349
|
output.push(out)
|
344
350
|
elsif options[:query]
|
345
|
-
|
346
|
-
|
347
|
-
output = res.to_data.dot_query(query)
|
351
|
+
output = res.to_data.dot_query(options[:query])
|
348
352
|
elsif tags.count.positive?
|
349
353
|
tags.each { |tag| output.concat(res.tags(tag)) }
|
350
354
|
else
|
@@ -352,7 +356,9 @@ command :tags do |c|
|
|
352
356
|
end
|
353
357
|
end
|
354
358
|
|
355
|
-
|
359
|
+
while output.is_a?(Array) && output.count == 1
|
360
|
+
output = output[0]
|
361
|
+
end
|
356
362
|
|
357
363
|
if options[:source]
|
358
364
|
puts output.to_html
|
@@ -393,13 +399,13 @@ command :images do |c|
|
|
393
399
|
res.curl
|
394
400
|
|
395
401
|
res = res.images(types: types)
|
402
|
+
res = { images: res }.dot_query(options[:query], 'images', full_tag: false) if options[:query]
|
396
403
|
|
397
|
-
if
|
398
|
-
|
399
|
-
|
404
|
+
if res.is_a?(Array)
|
405
|
+
output.concat(res)
|
406
|
+
else
|
407
|
+
output.push(res)
|
400
408
|
end
|
401
|
-
|
402
|
-
output.concat(res)
|
403
409
|
end
|
404
410
|
|
405
411
|
print_out(output, global_options[:yaml], pretty: global_options[:pretty])
|
@@ -439,9 +445,9 @@ command :links do |c|
|
|
439
445
|
res.curl
|
440
446
|
|
441
447
|
if options[:query]
|
442
|
-
|
443
|
-
|
444
|
-
output.concat(queried) if queried
|
448
|
+
queried = res.to_data.dot_query(options[:query], 'links', full_tag: false)
|
449
|
+
|
450
|
+
queried.is_a?(Array) ? output.concat(queried) : output.push(queried) if queried
|
445
451
|
else
|
446
452
|
output.concat(res.body_links)
|
447
453
|
end
|
@@ -469,9 +475,8 @@ command :headlinks do |c|
|
|
469
475
|
res.curl
|
470
476
|
|
471
477
|
if options[:query]
|
472
|
-
|
473
|
-
queried
|
474
|
-
output.concat(queried) if queried
|
478
|
+
queried = { links: res.to_data[:meta_links] }.dot_query(options[:query], 'links', full_tag: false)
|
479
|
+
output.push(queried) if queried
|
475
480
|
else
|
476
481
|
output.push(res.to_data[:meta_links])
|
477
482
|
end
|
@@ -516,10 +521,10 @@ command :scrape do |c|
|
|
516
521
|
if options[:search]
|
517
522
|
out = res.search(options[:search])
|
518
523
|
|
519
|
-
out = out.dot_query(options[:query]) if options[:query]
|
524
|
+
out = out.dot_query(options[:query], full_tag: false) if options[:query]
|
520
525
|
output.push(out)
|
521
526
|
elsif options[:query]
|
522
|
-
queried = res.to_data(url: url).dot_query(options[:query])
|
527
|
+
queried = res.to_data(url: url).dot_query(options[:query], full_tag: false)
|
523
528
|
output.push(queried) if queried
|
524
529
|
else
|
525
530
|
output.push(res.to_data(url: url))
|
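Several hunks in `bin/curlyq` above add a loop that strips redundant array nesting from `output` before printing. The standalone sketch below shows the same idea in isolation; `unwrap_single` is a hypothetical name, not a method in the gem.

```ruby
# Hypothetical helper illustrating the unwrapping loop; not part of curlyq.

# Peel off array wrappers for as long as the value is an array holding
# exactly one element. Empty and multi-element arrays are left alone.
def unwrap_single(value)
  value = value[0] while value.is_a?(Array) && value.length == 1
  value
end

unwrap_single([[{ 'id' => 'a' }]]) # => {"id"=>"a"}
unwrap_single([1, 2])              # => [1, 2]
unwrap_single([])                  # => []
```

One consequence of this design is that a single match is returned as a bare value and multiple matches as an array, so anything consuming the output has to handle both shapes.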
data/lib/curly/array.rb
CHANGED
@@ -74,20 +74,18 @@ class ::Array
|
|
74
74
|
## @return [Array] elements matching dot query
|
75
75
|
##
|
76
76
|
def dot_query(path)
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
81
|
-
|
82
|
-
r
|
83
|
-
end
|
84
|
-
end
|
77
|
+
res = map { |el| el.dot_query(path) }
|
78
|
+
res.delete_if { |r| !r }
|
79
|
+
res.delete_if(&:empty?)
|
80
|
+
res
|
81
|
+
end
|
85
82
|
|
86
|
-
|
83
|
+
def get_value(path)
|
84
|
+
map { |el| el.get_value(path) }
|
87
85
|
end
|
88
86
|
|
89
87
|
def to_html
|
90
|
-
map
|
88
|
+
map(&:to_html)
|
91
89
|
end
|
92
90
|
|
93
91
|
##
|
data/lib/curly/hash.rb
CHANGED
@@ -29,24 +29,62 @@ class ::Hash
|
|
29
29
|
end
|
30
30
|
end
|
31
31
|
|
32
|
+
def get_value(query)
|
33
|
+
return nil if self.empty?
|
34
|
+
stringify_keys!
|
35
|
+
|
36
|
+
query.split('.').inject(self) do |v, k|
|
37
|
+
if v.is_a? Array
|
38
|
+
return v.map { |el| el.get_value(k) }
|
39
|
+
end
|
40
|
+
# k = k.to_i if v.is_a? Array
|
41
|
+
next unless v.key?(k)
|
42
|
+
|
43
|
+
v.fetch(k)
|
44
|
+
end
|
45
|
+
end
|
46
|
+
|
32
47
|
# Extract data using a dot-syntax path
|
33
48
|
#
|
34
49
|
# @param path [String] The path
|
35
50
|
#
|
36
51
|
# @return Result of path query
|
37
52
|
#
|
38
|
-
def dot_query(path)
|
53
|
+
def dot_query(path, root = nil, full_tag: true)
|
39
54
|
res = stringify_keys
|
55
|
+
res = res[root] unless root.nil?
|
56
|
+
|
57
|
+
unless path =~ /\[/
|
58
|
+
return res.get_value(path)
|
59
|
+
end
|
40
60
|
|
61
|
+
path.gsub!(/\[(.*?)\]/) do
|
62
|
+
inter = Regexp.last_match(1).gsub(/\./, '%')
|
63
|
+
"[#{inter}]"
|
64
|
+
end
|
65
|
+
|
66
|
+
enumerate = false
|
41
67
|
out = []
|
42
68
|
q = path.split(/(?<![\d.])\./)
|
43
|
-
|
44
|
-
|
45
|
-
pth.
|
69
|
+
|
70
|
+
while q.count.positive?
|
71
|
+
pth = q.shift
|
72
|
+
pth.gsub!(/%/, '.')
|
73
|
+
|
74
|
+
return nil if res.nil?
|
75
|
+
|
76
|
+
unless pth =~ /\[/
|
77
|
+
return res.get_value(pth)
|
78
|
+
end
|
79
|
+
|
80
|
+
el = Regexp.last_match(1) if pth =~ /\[([0-9,.]+)?\]/
|
81
|
+
pth.sub!(/\[([0-9,.]+)?\]/, '')
|
82
|
+
|
46
83
|
ats = []
|
47
84
|
at = []
|
48
|
-
while pth =~ /\[[+&,]
|
49
|
-
m = pth.match(/\[(?<com>[,+&])? *(?<key
|
85
|
+
while pth =~ /\[[+&,]?[\w.]+( *[\^*$=<>]=? *\w+)?/
|
86
|
+
m = pth.match(/\[(?<com>[,+&])? *(?<key>[\w.]+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))? */)
|
87
|
+
|
50
88
|
comp = [m['key'], m['op'], m['val']]
|
51
89
|
case m['com']
|
52
90
|
when ','
|
@@ -56,16 +94,32 @@ class ::Hash
|
|
56
94
|
at.push(comp)
|
57
95
|
end
|
58
96
|
|
59
|
-
pth.sub!(/\[(?<com>[,&+])? *(?<key
|
97
|
+
pth.sub!(/\[(?<com>[,&+])? *(?<key>[\w.]+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))?/, '[')
|
60
98
|
end
|
61
99
|
ats.push(at) unless at.empty?
|
62
100
|
pth.sub!(/\[\]/, '')
|
63
101
|
|
64
|
-
res = res[0] if res.is_a?(Array)
|
102
|
+
res = res[0] if res.is_a?(Array) && res.count == 1
|
103
|
+
if ats.empty? && el.nil? && res.is_a?(Array) && res[0]&.key?(pth)
|
104
|
+
res.map! { |r| r[pth] }
|
105
|
+
next
|
106
|
+
end
|
107
|
+
|
108
|
+
res.map!(&:stringify_keys) if res.is_a?(Array) && res[0].is_a?(Hash)
|
109
|
+
# if res.is_a?(String) || (res.is_a?(Array) && res[0].is_a?(String))
|
110
|
+
# out.push(res)
|
111
|
+
# next
|
112
|
+
# end
|
65
113
|
|
66
|
-
|
114
|
+
# if res.is_a?(Array) && !pth.nil?
|
115
|
+
# return res.delete_if { |r| !r.key?(pth) }
|
116
|
+
# else
|
117
|
+
# return false if el.nil? && ats.empty? && res.is_a?(Hash) && (res.nil? || !res.key?(pth))
|
118
|
+
# end
|
119
|
+
tag = res
|
120
|
+
res = res[pth] unless pth.nil? || pth.empty?
|
67
121
|
|
68
|
-
|
122
|
+
pth = ''
|
69
123
|
|
70
124
|
return false if res.nil?
|
71
125
|
|
@@ -73,22 +127,49 @@ class ::Hash
|
|
73
127
|
while ats.count.positive?
|
74
128
|
atr = ats.shift
|
75
129
|
res = [res] if res.is_a?(Hash)
|
76
|
-
|
77
|
-
evaluate_comp(r, atr)
|
130
|
+
res.each do |r|
|
131
|
+
out.push(full_tag ? tag : r) if evaluate_comp(r, atr)
|
78
132
|
end
|
79
|
-
|
80
|
-
out.concat(keepers)
|
81
133
|
end
|
82
134
|
else
|
83
135
|
out = res
|
84
136
|
end
|
85
137
|
|
86
|
-
out = out
|
138
|
+
out = out.get_value(pth) unless pth.nil?
|
139
|
+
|
140
|
+
if el.nil? && out.is_a?(Array) && out[0].is_a?(Hash)
|
141
|
+
out.map! { |o|
|
142
|
+
o.stringify_keys
|
143
|
+
# o.key?(pth) ? o[pth] : o
|
144
|
+
}
|
145
|
+
elsif out.is_a?(Array) && el =~ /^[\d.,]+$/
|
146
|
+
out = out[eval(el)]
|
147
|
+
end
|
148
|
+
res = out
|
87
149
|
end
|
88
150
|
|
151
|
+
out = out[0] if out&.count == 1
|
89
152
|
out
|
90
153
|
end
|
91
154
|
|
155
|
+
def array_match(array, key, comp)
|
156
|
+
keep = false
|
157
|
+
array.each do |el|
|
158
|
+
keep = case comp
|
159
|
+
when /^\^/
|
160
|
+
key =~ /^#{el}/i ? true : false
|
161
|
+
when /^\$/
|
162
|
+
key =~ /#{el}$/i ? true : false
|
163
|
+
when /^\*/
|
164
|
+
key =~ /#{el}/i ? true : false
|
165
|
+
else
|
166
|
+
key =~ /^#{el}$/i ? true : false
|
167
|
+
end
|
168
|
+
break if keep
|
169
|
+
end
|
170
|
+
keep
|
171
|
+
end
|
172
|
+
|
92
173
|
##
|
93
174
|
## Evaluate a comparison
|
94
175
|
##
|
@@ -112,39 +193,59 @@ class ::Hash
|
|
112
193
|
else
|
113
194
|
a[2]
|
114
195
|
end
|
196
|
+
r = r.get_value(key.to_s) if key.to_s =~ /\./
|
197
|
+
|
198
|
+
if val.nil?
|
199
|
+
if r.is_a?(Hash)
|
200
|
+
return r.key?(key) && !r[key].nil? && !r[key].empty?
|
201
|
+
elsif r.is_a?(String)
|
202
|
+
return r.nil? ? false : true
|
203
|
+
elsif r.is_a?(Array)
|
204
|
+
return r.empty? ? false : true
|
205
|
+
end
|
206
|
+
end
|
115
207
|
|
116
|
-
if
|
208
|
+
if r.nil?
|
117
209
|
keep = false
|
118
|
-
elsif r
|
119
|
-
valid = r
|
120
|
-
|
121
|
-
|
122
|
-
k =~ /^#{a[2]}/i ? true : false
|
123
|
-
when /^\$/
|
124
|
-
k =~ /#{a[2]}$/i ? true : false
|
125
|
-
when /^\*/
|
126
|
-
k =~ /#{a[2]}/i ? true : false
|
210
|
+
elsif r.is_a?(Array)
|
211
|
+
valid = r.filter do |k|
|
212
|
+
if k.is_a? Array
|
213
|
+
array_match(k, a[2], a[1])
|
127
214
|
else
|
128
|
-
|
215
|
+
case a[1]
|
216
|
+
when /^\^/
|
217
|
+
k =~ /^#{a[2]}/i ? true : false
|
218
|
+
when /^\$/
|
219
|
+
k =~ /#{a[2]}$/i ? true : false
|
220
|
+
when /^\*/
|
221
|
+
k =~ /#{a[2]}/i ? true : false
|
222
|
+
else
|
223
|
+
k =~ /^#{a[2]}$/i ? true : false
|
224
|
+
end
|
129
225
|
end
|
130
226
|
end
|
131
227
|
|
132
228
|
keep = valid.count.positive?
|
133
229
|
elsif val.is_a?(Numeric) && a[1] =~ /^[<>=]{1,2}$/
|
134
|
-
k = r
|
230
|
+
k = r.to_i
|
135
231
|
comp = a[1] =~ /^=$/ ? '==' : a[1]
|
136
232
|
keep = eval("#{k}#{comp}#{val}")
|
137
233
|
else
|
138
|
-
|
139
|
-
|
140
|
-
|
141
|
-
|
142
|
-
|
143
|
-
|
144
|
-
|
145
|
-
|
146
|
-
|
147
|
-
|
234
|
+
v = r.is_a?(Hash) ? r[key] : r
|
235
|
+
if v.is_a? Array
|
236
|
+
keep = array_match(v, a[2], a[1])
|
237
|
+
else
|
238
|
+
keep = case a[1]
|
239
|
+
when /^\^/
|
240
|
+
v =~ /^#{a[2]}/i ? true : false
|
241
|
+
when /^\$/
|
242
|
+
v =~ /#{a[2]}$/i ? true : false
|
243
|
+
when /^\*/
|
244
|
+
v =~ /#{a[2]}/i ? true : false
|
245
|
+
else
|
246
|
+
v =~ /^#{a[2]}$/i ? true : false
|
247
|
+
end
|
248
|
+
end
|
148
249
|
end
|
149
250
|
|
150
251
|
return false unless keep
|
@@ -251,4 +352,8 @@ class ::Hash
|
|
251
352
|
hsh[k.to_s] = v.is_a?(Hash) ? v.stringify_keys : v
|
252
353
|
end
|
253
354
|
end
|
355
|
+
|
356
|
+
def stringify_keys!
|
357
|
+
replace stringify_keys
|
358
|
+
end
|
254
359
|
end
|
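The `get_value` method added in this file walks a dot-separated path such as `attrs.id` through nested hashes, fanning out over arrays. The following is a simplified standalone sketch of that lookup, not a copy of the gem's code: `dig_path` is a hypothetical name, it accepts either string or symbol keys instead of calling `stringify_keys!`, and it keeps following the remaining path segments after it meets an array.

```ruby
# Hypothetical helper; a simplified take on dot-path lookup.

# Follow each path segment in turn. Hashes are indexed by key (string
# first, then symbol), arrays are mapped element by element, and any
# other value ends the walk with nil.
def dig_path(data, path)
  path.split('.').reduce(data) do |value, key|
    case value
    when Array then value.map { |el| dig_path(el, key) }
    when Hash then value.fetch(key) { value[key.to_sym] }
    else return nil
    end
  end
end

data = {
  'attrs' => { 'id' => 'whats-next' },
  links: [{ 'rel' => ['me'] }, { 'rel' => ['nofollow'] }]
}

dig_path(data, 'attrs.id')             # => "whats-next"
dig_path(data, 'links.rel')            # => [["me"], ["nofollow"]]
dig_path(data, 'attrs.missing.deeper') # => nil
```

Returning `nil` for a missing segment, rather than raising, lets a caller treat "no such key" and "no match" the same way.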
data/lib/curly/version.rb
CHANGED
data/src/_README.md
CHANGED
@@ -10,9 +10,12 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
|
|
10
10
|
[donate]: https://brettterpstra.com/donate
|
11
11
|
<!--END GITHUB-->
|
12
12
|
|
13
|
-
|
13
|
+
[jq]: https://github.com/jqlang/jq "Command-line JSON processor"
|
14
|
+
[yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
|
14
15
|
|
15
|
-
|
16
|
+
The current version of `curlyq` is <!--VER-->0.0.4<!--END VER-->.
|
17
|
+
|
18
|
+
CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
|
16
19
|
|
17
20
|
[github]: https://github.com/ttscoff/curlyq/
|
18
21
|
|
@@ -45,6 +48,9 @@ You can shape the results using `--search` (`-s`) and `--query` (`-q`) on some c
|
|
45
48
|
|
46
49
|
A search uses either CSS or XPath syntax to locate elements. For example, if you wanted to locate all of the `<article>` elements with a class of `post` inside of the div with an id of `main`, you would run `--search '#main article.post'`. Searches can target tags, ids, and classes, and can accept `>` to target direct descendents. You can also use XPaths, but I hate those so I'm not going to document them.
|
47
50
|
|
51
|
+
> I've tried to make the query function useful, but if you want to do any kind of advanced shaping, you're better off piping the JSON output to [jq] or [yq].
|
52
|
+
<!--JEKYLL{:.warn}-->
|
53
|
+
|
48
54
|
Queries are specifically for shaping CurlyQ output. If you're using the `html` command, it returns a key called `images`, so you can target just the images in the response with `-q 'images'`. The queries accept array syntax, so to get the first image, you would use `-q 'images[0]'`. Ranges are accepted as well, so `-q 'images[1..4]'` will return the 2nd through 5th images found on the page. You can also do comparisons, e.g. `images[rel=me]'` to target only images with a `rel` attribute of `me`.
|
49
55
|
|
50
56
|
The comparisons for the query flag are:
|
@@ -58,6 +64,16 @@ The comparisons for the query flag are:
|
|
58
64
|
- `^=` starts with text
|
59
65
|
- `$=` ends with text
|
60
66
|
|
67
|
+
Comparisons can be numeric or string comparisons. A numeric comparison like `curlyq images -q '[width>500]' URL` would return all of the images on the page with a width attribute greater than 500.
|
68
|
+
|
69
|
+
You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
|
70
|
+
|
71
|
+
If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
|
72
|
+
|
73
|
+
curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
|
74
|
+
|
75
|
+
<h3 id="whats-next">What’s Next</h3>
|
76
|
+
|
61
77
|
#### Commands
|
62
78
|
|
63
79
|
curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.
|
@@ -314,7 +330,7 @@ Example:
|
|
314
330
|
|
315
331
|
Return a hierarchy of all tags in a page. Use `-t` to limit to a specific tag.
|
316
332
|
|
317
|
-
curlyq tags --search '#main .post h3' -q 'attrs
|
333
|
+
curlyq tags --search '#main .post h3' -q '[attrs.id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
|
318
334
|
|
319
335
|
[
|
320
336
|
{
|
data/test/curlyq_extract_test.rb
CHANGED
data/test/curlyq_html_test.rb
CHANGED
@@ -12,9 +12,9 @@ class CurlyQHtmlTest < Test::Unit::TestCase
|
|
12
12
|
|
13
13
|
def test_html_search_query
|
14
14
|
result = curlyq('html', '-s', '#main article .aligncenter', '-q', 'images[1]', 'https://brettterpstra.com')
|
15
|
-
json = JSON.parse(result)
|
15
|
+
json = JSON.parse(result)
|
16
16
|
|
17
|
-
assert_match(/aligncenter/, json[
|
17
|
+
assert_match(/aligncenter/, json['class'], 'Should have found an image with class "aligncenter"')
|
18
18
|
end
|
19
19
|
|
20
20
|
def test_html_query
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: curlyq
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.9
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Brett Terpstra
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-01-
|
11
|
+
date: 2024-01-16 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rake
|