curlyq 0.0.8 → 0.0.9
- checksums.yaml +4 -4
- data/CHANGELOG.md +10 -0
- data/Gemfile.lock +1 -1
- data/README.md +20 -4
- data/bin/curlyq +5 -3
- data/lib/curly/array.rb +1 -2
- data/lib/curly/hash.rb +75 -28
- data/lib/curly/version.rb +1 -1
- data/src/_README.md +18 -2
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 44e01914de08789721522e24e506fe88a49106610fac9a16736efcba0916be88
|
4
|
+
data.tar.gz: ec2887fee0dab67c64c0095f59091e867445b22559674b31f2eea64d8f4b9fea
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: aa8338482e3d9414e6347195abc7ba8645adbc120f13469f219df7f2aa5fa1ba3c209740c608f728539969502965f6c29afc9d65b5ba411c8387b26ebd640c9d
|
7
|
+
data.tar.gz: 4fe071cb872a259795163da084851afdb5d003f7607a82a5ab6fe868f7f2edae22f0caa31327dfe1bad31e28e48fddde1857d882344359db8bf47b8579aef22c
|
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,13 @@
|
|
1
|
+
### 0.0.9
|
2
|
+
|
3
|
+
2024-01-16 12:38
|
4
|
+
|
5
|
+
#### IMPROVED
|
6
|
+
|
7
|
+
- You can now use dot syntax inside of a square bracket comparison in --query (`[attrs.id*=what]`)
|
8
|
+
- *=, ^=, $=, and == work with array values
|
9
|
+
- [] comparisons with no comparison, e.g. [attrs.id], will return every match that has that element populated
|
10
|
+
|
1
11
|
### 0.0.8
|
2
12
|
|
3
13
|
2024-01-15 16:45
|
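The 0.0.9 changelog entry says a bracket with no comparison, e.g. `[attrs.id]`, returns every match that has that element populated. A minimal sketch of that idea follows; the `populated?` helper and the sample `tags` data are hypothetical and are not taken from the gem.

```ruby
# Hypothetical helper: true when a value counts as "populated",
# i.e. it is not nil and, if it can be empty, it is not empty.
def populated?(value)
  return false if value.nil?
  return !value.empty? if value.respond_to?(:empty?)

  true
end

# Hypothetical sample data, loosely shaped like tag output.
tags = [
  { 'tag' => 'h3', 'attrs' => { 'id' => 'whats-next' } },
  { 'tag' => 'h3', 'attrs' => {} },
  { 'tag' => 'h3', 'attrs' => { 'id' => '' } }
]

# Keep only the tags whose attrs.id is present and non-empty.
with_id = tags.select { |t| populated?(t.dig('attrs', 'id')) }
```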
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -10,10 +10,13 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
|
|
10
10
|
[donate]: https://brettterpstra.com/donate
|
11
11
|
|
12
12
|
|
13
|
-
|
13
|
+
[jq]: https://github.com/jqlang/jq "Command-line JSON processor"
|
14
|
+
[yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
|
15
|
+
|
16
|
+
The current version of `curlyq` is 0.0.9
|
14
17
|
.
|
15
18
|
|
16
|
-
CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like
|
19
|
+
CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
|
17
20
|
|
18
21
|
[github]: https://github.com/ttscoff/curlyq/
|
19
22
|
|
@@ -44,7 +47,7 @@ SYNOPSIS
|
|
44
47
|
curlyq [global options] command [command options] [arguments...]
|
45
48
|
|
46
49
|
VERSION
|
47
|
-
0.0.
|
50
|
+
0.0.9
|
48
51
|
|
49
52
|
GLOBAL OPTIONS
|
50
53
|
--help - Show this message
|
@@ -71,6 +74,9 @@ You can shape the results using `--search` (`-s`) and `--query` (`-q`) on some c
|
|
71
74
|
|
72
75
|
A search uses either CSS or XPath syntax to locate elements. For example, if you wanted to locate all of the `<article>` elements with a class of `post` inside of the div with an id of `main`, you would run `--search '#main article.post'`. Searches can target tags, ids, and classes, and can accept `>` to target direct descendents. You can also use XPaths, but I hate those so I'm not going to document them.
|
73
76
|
|
77
|
+
> I've tried to make the query function useful, but if you want to do any kind of advanced shaping, you're better off piping the JSON output to [jq] or [yq].
|
78
|
+
|
79
|
+
|
74
80
|
Queries are specifically for shaping CurlyQ output. If you're using the `html` command, it returns a key called `images`, so you can target just the images in the response with `-q 'images'`. The queries accept array syntax, so to get the first image, you would use `-q 'images[0]'`. Ranges are accepted as well, so `-q 'images[1..4]'` will return the 2nd through 5th images found on the page. You can also do comparisons, e.g. `images[rel=me]'` to target only images with a `rel` attribute of `me`.
|
75
81
|
|
76
82
|
The comparisons for the query flag are:
|
@@ -84,6 +90,16 @@ The comparisons for the query flag are:
|
|
84
90
|
- `^=` starts with text
|
85
91
|
- `$=` ends with text
|
86
92
|
|
93
|
+
Comparisons can be numeric or string comparisons. A numeric comparison like `curlyq images -q '[width>500]' URL` would return all of the images on the page with a width attribute greater than 500.
|
94
|
+
|
95
|
+
You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
|
96
|
+
|
97
|
+
If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
|
98
|
+
|
99
|
+
curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
|
100
|
+
|
101
|
+
<h3 id="whats-next">What’s Next</h3>

|
102
|
+
|
87
103
|
#### Commands
|
88
104
|
|
89
105
|
curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.
|
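The README hunk above describes numeric comparisons such as `[width>500]`. As a rough illustration, the comparison can be expressed with an operator lookup and `public_send`. This is an assumption about one way to do it, not the gem's code: the `lib/curly/hash.rb` diff builds a string and passes it to `eval`. The names `NUMERIC_OPS` and `numeric_compare` and the sample `images` data are hypothetical.

```ruby
# Map query operators to Ruby comparison methods. A single `=` is
# treated as `==`, mirroring the substitution made in the diff.
NUMERIC_OPS = {
  '>' => :>, '<' => :<, '>=' => :>=, '<=' => :<=, '=' => :==, '==' => :==
}.freeze

# Hypothetical helper: compare two values numerically without eval.
def numeric_compare(actual, operator, expected)
  method = NUMERIC_OPS[operator]
  return false if method.nil?

  actual.to_i.public_send(method, expected.to_i)
end

# Hypothetical sample data; widths are strings, as attributes often are.
images = [
  { 'src' => 'a.png', 'width' => '640' },
  { 'src' => 'b.png', 'width' => '120' }
]

wide = images.select { |img| numeric_compare(img['width'], '>', 500) }
```

Looking the operator up in a fixed table means an unexpected operator simply fails to match rather than being executed.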
@@ -440,7 +456,7 @@ COMMAND OPTIONS
|
|
440
456
|
|
441
457
|
Return a hierarchy of all tags in a page. Use `-t` to limit to a specific tag.
|
442
458
|
|
443
|
-
curlyq tags --search '#main .post h3' -q 'attrs
|
459
|
+
curlyq tags --search '#main .post h3' -q '[attrs.id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
|
444
460
|
|
445
461
|
[
|
446
462
|
{
|
data/bin/curlyq
CHANGED
@@ -130,13 +130,13 @@ command %i[html curl] do |c|
|
|
130
130
|
out = res.parse(source)
|
131
131
|
|
132
132
|
if options[:query]
|
133
|
-
out = out.to_data(url: url, clean: options[:clean]).dot_query(options[:query])
|
133
|
+
out = out.to_data(url: url, clean: options[:clean]).dot_query(options[:query], full_tag: false)
|
134
134
|
else
|
135
135
|
out = out.to_data
|
136
136
|
end
|
137
137
|
output.push([out])
|
138
138
|
elsif options[:query]
|
139
|
-
queried = res.to_data.dot_query(options[:query])
|
139
|
+
queried = res.to_data.dot_query(options[:query], full_tag: false)
|
140
140
|
output.push(queried) if queried
|
141
141
|
else
|
142
142
|
output.push(res.to_data(url: url))
|
@@ -356,7 +356,9 @@ command :tags do |c|
|
|
356
356
|
end
|
357
357
|
end
|
358
358
|
|
359
|
-
|
359
|
+
while output.is_a?(Array) && output.count == 1
|
360
|
+
output = output[0]
|
361
|
+
end
|
360
362
|
|
361
363
|
if options[:source]
|
362
364
|
puts output.to_html
|
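The `tags` hunk above adds a loop that unwraps `output` while it is an array holding exactly one element. A standalone sketch of the same loop, wrapped in a hypothetical `unwrap_single` method for illustration:

```ruby
# Repeatedly unwrap arrays that contain exactly one element.
# Arrays with zero or several elements, and non-arrays, are returned as-is.
def unwrap_single(output)
  output = output[0] while output.is_a?(Array) && output.count == 1
  output
end
```

This is consistent with the README note that a single match is output as a raw value while multiple matches stay an array.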
data/lib/curly/array.rb
CHANGED
data/lib/curly/hash.rb
CHANGED
@@ -31,9 +31,15 @@ class ::Hash
|
|
31
31
|
|
32
32
|
def get_value(query)
|
33
33
|
return nil if self.empty?
|
34
|
+
stringify_keys!
|
35
|
+
|
34
36
|
query.split('.').inject(self) do |v, k|
|
35
|
-
|
37
|
+
if v.is_a? Array
|
38
|
+
return v.map { |el| el.get_value(k) }
|
39
|
+
end
|
40
|
+
# k = k.to_i if v.is_a? Array
|
36
41
|
next unless v.key?(k)
|
42
|
+
|
37
43
|
v.fetch(k)
|
38
44
|
end
|
39
45
|
end
|
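The `get_value` change above walks a dot-separated path and, when it meets an array, maps the lookup over the array's elements. The sketch below shows the same idea as a standalone recursive function. `dig_path` is a hypothetical name, and unlike the diff (which returns as soon as it maps the current key) this variation keeps applying any remaining keys to each element.

```ruby
# Hypothetical helper: follow a dot-separated path through nested
# hashes, mapping over arrays encountered along the way.
def dig_path(data, keys)
  keys = keys.split('.') if keys.is_a?(String)
  return data if keys.empty?

  case data
  when Array
    # Apply the remaining path to every element.
    data.map { |el| dig_path(el, keys) }
  when Hash
    key = keys.first
    return nil unless data.key?(key)

    dig_path(data[key], keys.drop(1))
  end
end

# Hypothetical sample data.
page = { 'links' => [{ 'rel' => ['me'] }, { 'rel' => ['nofollow'] }] }
rels = dig_path(page, 'links.rel')
```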
@@ -52,12 +58,18 @@ class ::Hash
|
|
52
58
|
return res.get_value(path)
|
53
59
|
end
|
54
60
|
|
61
|
+
path.gsub!(/\[(.*?)\]/) do
|
62
|
+
inter = Regexp.last_match(1).gsub(/\./, '%')
|
63
|
+
"[#{inter}]"
|
64
|
+
end
|
65
|
+
|
55
66
|
enumerate = false
|
56
67
|
out = []
|
57
68
|
q = path.split(/(?<![\d.])\./)
|
58
69
|
|
59
70
|
while q.count.positive?
|
60
71
|
pth = q.shift
|
72
|
+
pth.gsub!(/%/, '.')
|
61
73
|
|
62
74
|
return nil if res.nil?
|
63
75
|
|
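The hunk above swaps dots inside square brackets for `%` before the path is split on dots, then restores them per segment, so that `[attrs.id*=what].source` splits into two segments rather than three. A simplified sketch of that technique: `split_query` is a hypothetical name, and it splits on every dot, whereas the diff uses a lookbehind that leaves dots after digits or other dots alone.

```ruby
# Hypothetical helper: split a query on dots while keeping dots that
# appear inside [...] intact.
def split_query(path)
  # Temporarily replace dots inside brackets with a placeholder.
  protected_path = path.gsub(/\[(.*?)\]/) do
    "[#{Regexp.last_match(1).tr('.', '%')}]"
  end

  # Split on the remaining dots, then restore the placeholder.
  protected_path.split('.').map { |segment| segment.tr('%', '.') }
end
```

One consequence of the placeholder approach, in this sketch at least, is that a literal `%` in a query would be turned into a dot.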
@@ -70,8 +82,8 @@ class ::Hash
|
|
70
82
|
|
71
83
|
ats = []
|
72
84
|
at = []
|
73
|
-
while pth =~ /\[[+&,]
|
74
|
-
m = pth.match(/\[(?<com>[,+&])? *(?<key
|
85
|
+
while pth =~ /\[[+&,]?[\w.]+( *[\^*$=<>]=? *\w+)?/
|
86
|
+
m = pth.match(/\[(?<com>[,+&])? *(?<key>[\w.]+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))? */)
|
75
87
|
|
76
88
|
comp = [m['key'], m['op'], m['val']]
|
77
89
|
case m['com']
|
@@ -82,7 +94,7 @@ class ::Hash
|
|
82
94
|
at.push(comp)
|
83
95
|
end
|
84
96
|
|
85
|
-
pth.sub!(/\[(?<com>[,&+])? *(?<key
|
97
|
+
pth.sub!(/\[(?<com>[,&+])? *(?<key>[\w.]+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))?/, '[')
|
86
98
|
end
|
87
99
|
ats.push(at) unless at.empty?
|
88
100
|
pth.sub!(/\[\]/, '')
|
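To see what the updated pattern in this hunk captures, the sketch below applies the same regular expression (copied from the added `pth.match` line) to a few bracket expressions. The constant name `COMPARISON` and the `parse_comparison` wrapper are hypothetical.

```ruby
# Pattern copied from the added line in the diff: an optional combiner,
# a key that may contain dots, and an optional operator/value pair.
COMPARISON = /\[(?<com>[,+&])? *(?<key>[\w.]+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))? */

# Hypothetical helper: return the key, operator and value of the first
# comparison in a segment, or nil when nothing matches.
def parse_comparison(segment)
  m = segment.match(COMPARISON)
  return nil if m.nil?

  { key: m['key'], op: m['op'], val: m['val'] }
end
```

Because the operator/value group is optional, a bare `[attrs.id]` still matches with a nil operator and value, which is what allows the "populated" check described in the changelog.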
@@ -110,11 +122,11 @@ class ::Hash
|
|
110
122
|
pth = ''
|
111
123
|
|
112
124
|
return false if res.nil?
|
125
|
+
|
113
126
|
if ats.count.positive?
|
114
127
|
while ats.count.positive?
|
115
128
|
atr = ats.shift
|
116
129
|
res = [res] if res.is_a?(Hash)
|
117
|
-
|
118
130
|
res.each do |r|
|
119
131
|
out.push(full_tag ? tag : r) if evaluate_comp(r, atr)
|
120
132
|
end
|
@@ -140,6 +152,24 @@ class ::Hash
|
|
140
152
|
out
|
141
153
|
end
|
142
154
|
|
155
|
+
def array_match(array, key, comp)
|
156
|
+
keep = false
|
157
|
+
array.each do |el|
|
158
|
+
keep = case comp
|
159
|
+
when /^\^/
|
160
|
+
key =~ /^#{el}/i ? true : false
|
161
|
+
when /^\$/
|
162
|
+
key =~ /#{el}$/i ? true : false
|
163
|
+
when /^\*/
|
164
|
+
key =~ /#{el}/i ? true : false
|
165
|
+
else
|
166
|
+
key =~ /^#{el}$/i ? true : false
|
167
|
+
end
|
168
|
+
break if keep
|
169
|
+
end
|
170
|
+
keep
|
171
|
+
end
|
172
|
+
|
143
173
|
##
|
144
174
|
## Evaluate a comparison
|
145
175
|
##
|
@@ -165,40 +195,57 @@ class ::Hash
|
|
165
195
|
end
|
166
196
|
r = r.get_value(key.to_s) if key.to_s =~ /\./
|
167
197
|
|
168
|
-
|
198
|
+
if val.nil?
|
199
|
+
if r.is_a?(Hash)
|
200
|
+
return r.key?(key) && !r[key].nil? && !r[key].empty?
|
201
|
+
elsif r.is_a?(String)
|
202
|
+
return r.nil? ? false : true
|
203
|
+
elsif r.is_a?(Array)
|
204
|
+
return r.empty? ? false : true
|
205
|
+
end
|
206
|
+
end
|
169
207
|
|
170
|
-
if
|
208
|
+
if r.nil?
|
171
209
|
keep = false
|
172
|
-
elsif r
|
173
|
-
valid = r
|
174
|
-
|
175
|
-
|
176
|
-
k =~ /^#{a[2]}/i ? true : false
|
177
|
-
when /^\$/
|
178
|
-
k =~ /#{a[2]}$/i ? true : false
|
179
|
-
when /^\*/
|
180
|
-
k =~ /#{a[2]}/i ? true : false
|
210
|
+
elsif r.is_a?(Array)
|
211
|
+
valid = r.filter do |k|
|
212
|
+
if k.is_a? Array
|
213
|
+
array_match(k, a[2], a[1])
|
181
214
|
else
|
182
|
-
|
215
|
+
case a[1]
|
216
|
+
when /^\^/
|
217
|
+
k =~ /^#{a[2]}/i ? true : false
|
218
|
+
when /^\$/
|
219
|
+
k =~ /#{a[2]}$/i ? true : false
|
220
|
+
when /^\*/
|
221
|
+
k =~ /#{a[2]}/i ? true : false
|
222
|
+
else
|
223
|
+
k =~ /^#{a[2]}$/i ? true : false
|
224
|
+
end
|
183
225
|
end
|
184
226
|
end
|
185
227
|
|
186
228
|
keep = valid.count.positive?
|
187
229
|
elsif val.is_a?(Numeric) && a[1] =~ /^[<>=]{1,2}$/
|
188
|
-
k = r
|
230
|
+
k = r.to_i
|
189
231
|
comp = a[1] =~ /^=$/ ? '==' : a[1]
|
190
232
|
keep = eval("#{k}#{comp}#{val}")
|
191
233
|
else
|
192
|
-
|
193
|
-
|
194
|
-
|
195
|
-
|
196
|
-
|
197
|
-
|
198
|
-
|
199
|
-
|
200
|
-
|
201
|
-
|
234
|
+
v = r.is_a?(Hash) ? r[key] : r
|
235
|
+
if v.is_a? Array
|
236
|
+
keep = array_match(v, a[2], a[1])
|
237
|
+
else
|
238
|
+
keep = case a[1]
|
239
|
+
when /^\^/
|
240
|
+
v =~ /^#{a[2]}/i ? true : false
|
241
|
+
when /^\$/
|
242
|
+
v =~ /#{a[2]}$/i ? true : false
|
243
|
+
when /^\*/
|
244
|
+
v =~ /#{a[2]}/i ? true : false
|
245
|
+
else
|
246
|
+
v =~ /^#{a[2]}$/i ? true : false
|
247
|
+
end
|
248
|
+
end
|
202
249
|
end
|
203
250
|
|
204
251
|
return false unless keep
|
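The README text says a comparison against an array value (like `class` or `rel`) matches if any element matches. The sketch below illustrates that rule under two stated assumptions that differ from the diff: it builds the pattern from the query value and tests each element against it (the diff's `array_match` interpolates each element into the pattern and tests the query value), and it escapes the query with `Regexp.escape`. The name `any_element_matches?` is hypothetical.

```ruby
# Hypothetical helper: true if any element of `elements` satisfies the
# comparison `operator` against `query` (case-insensitive).
def any_element_matches?(elements, operator, query)
  pattern = Regexp.escape(query.to_s)
  regex = case operator
          when /^\^/ then /^#{pattern}/i   # ^= starts with
          when /^\$/ then /#{pattern}$/i   # $= ends with
          when /^\*/ then /#{pattern}/i    # *= contains
          else /^#{pattern}$/i             # == or = exact match
          end

  elements.any? { |el| el.to_s.match?(regex) }
end
```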
data/lib/curly/version.rb
CHANGED
data/src/_README.md
CHANGED
@@ -10,9 +10,12 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
|
|
10
10
|
[donate]: https://brettterpstra.com/donate
|
11
11
|
<!--END GITHUB-->
|
12
12
|
|
13
|
+
[jq]: https://github.com/jqlang/jq "Command-line JSON processor"
|
14
|
+
[yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
|
15
|
+
|
13
16
|
The current version of `curlyq` is <!--VER-->0.0.4<!--END VER-->.
|
14
17
|
|
15
|
-
CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like
|
18
|
+
CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
|
16
19
|
|
17
20
|
[github]: https://github.com/ttscoff/curlyq/
|
18
21
|
|
@@ -45,6 +48,9 @@ You can shape the results using `--search` (`-s`) and `--query` (`-q`) on some c
|
|
45
48
|
|
46
49
|
A search uses either CSS or XPath syntax to locate elements. For example, if you wanted to locate all of the `<article>` elements with a class of `post` inside of the div with an id of `main`, you would run `--search '#main article.post'`. Searches can target tags, ids, and classes, and can accept `>` to target direct descendents. You can also use XPaths, but I hate those so I'm not going to document them.
|
47
50
|
|
51
|
+
> I've tried to make the query function useful, but if you want to do any kind of advanced shaping, you're better off piping the JSON output to [jq] or [yq].
|
52
|
+
<!--JEKYLL{:.warn}-->
|
53
|
+
|
48
54
|
Queries are specifically for shaping CurlyQ output. If you're using the `html` command, it returns a key called `images`, so you can target just the images in the response with `-q 'images'`. The queries accept array syntax, so to get the first image, you would use `-q 'images[0]'`. Ranges are accepted as well, so `-q 'images[1..4]'` will return the 2nd through 5th images found on the page. You can also do comparisons, e.g. `images[rel=me]'` to target only images with a `rel` attribute of `me`.
|
49
55
|
|
50
56
|
The comparisons for the query flag are:
|
@@ -58,6 +64,16 @@ The comparisons for the query flag are:
|
|
58
64
|
- `^=` starts with text
|
59
65
|
- `$=` ends with text
|
60
66
|
|
67
|
+
Comparisons can be numeric or string comparisons. A numeric comparison like `curlyq images -q '[width>500]' URL` would return all of the images on the page with a width attribute greater than 500.
|
68
|
+
|
69
|
+
You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
|
70
|
+
|
71
|
+
If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
|
72
|
+
|
73
|
+
curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
|
74
|
+
|
75
|
+
<h3 id="whats-next">What’s Next</h3>
|
76
|
+
|
61
77
|
#### Commands
|
62
78
|
|
63
79
|
curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.
|
@@ -314,7 +330,7 @@ Example:
|
|
314
330
|
|
315
331
|
Return a hierarchy of all tags in a page. Use `-t` to limit to a specific tag.
|
316
332
|
|
317
|
-
curlyq tags --search '#main .post h3' -q 'attrs
|
333
|
+
curlyq tags --search '#main .post h3' -q '[attrs.id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
|
318
334
|
|
319
335
|
[
|
320
336
|
{
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: curlyq
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.9
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Brett Terpstra
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-01-
|
11
|
+
date: 2024-01-16 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rake
|