curlyq 0.0.8 → 0.0.9

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: d3e32b382d7318b067ee3fb22f2e9057cf6aa9facfac41c74a0ebb5d4fb4743d
- data.tar.gz: d379da3f0db621052e61230356f5c58b587eefccbb0a4c997216516a4159b44a
+ metadata.gz: 44e01914de08789721522e24e506fe88a49106610fac9a16736efcba0916be88
+ data.tar.gz: ec2887fee0dab67c64c0095f59091e867445b22559674b31f2eea64d8f4b9fea
  SHA512:
- metadata.gz: ae63654deb943771e5f6f3aa0f6a037b1015336abbd696a8ce77acc22f361a3b6a18b03f3b7d02e5c7d5dcaa8d3608248bed240679acfce22ba2e462d84b529f
- data.tar.gz: 481f8499e45a65cb3981fcf20ef7fc9f01f97a1b7014c6566aa2f3bf7a6611fd2d5d35f78e742e4063eea192b938c0642f0ca764e5032f330778d2815a191a41
+ metadata.gz: aa8338482e3d9414e6347195abc7ba8645adbc120f13469f219df7f2aa5fa1ba3c209740c608f728539969502965f6c29afc9d65b5ba411c8387b26ebd640c9d
+ data.tar.gz: 4fe071cb872a259795163da084851afdb5d003f7607a82a5ab6fe868f7f2edae22f0caa31327dfe1bad31e28e48fddde1857d882344359db8bf47b8579aef22c
data/CHANGELOG.md CHANGED
@@ -1,3 +1,13 @@
+ ### 0.0.9
+
+ 2024-01-16 12:38
+
+ #### IMPROVED
+
+ - You can now use dot syntax inside of a square bracket comparison in --query (`[attrs.id*=what]`)
+ - *=, ^=, $=, and == work with array values
+ - [] comparisons with no comparison, e.g. [attrs.id], will return every match that has that element populated
+
  ### 0.0.8
 
  2024-01-15 16:45
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- curlyq (0.0.8)
+ curlyq (0.0.9)
  gli (~> 2.21.0)
  nokogiri (~> 1.16.0)
  selenium-webdriver (~> 4.16.0)
data/README.md CHANGED
@@ -10,10 +10,13 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
  [donate]: https://brettterpstra.com/donate
 
 
- The current version of `curlyq` is 0.0.8
+ [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
+ [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
+
+ The current version of `curlyq` is 0.0.9
  .
 
- CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.
+ CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
 
  [github]: https://github.com/ttscoff/curlyq/
 
@@ -44,7 +47,7 @@ SYNOPSIS
  curlyq [global options] command [command options] [arguments...]
 
  VERSION
- 0.0.8
+ 0.0.9
 
  GLOBAL OPTIONS
  --help - Show this message
@@ -71,6 +74,9 @@ You can shape the results using `--search` (`-s`) and `--query` (`-q`) on some c
 
  A search uses either CSS or XPath syntax to locate elements. For example, if you wanted to locate all of the `<article>` elements with a class of `post` inside of the div with an id of `main`, you would run `--search '#main article.post'`. Searches can target tags, ids, and classes, and can accept `>` to target direct descendents. You can also use XPaths, but I hate those so I'm not going to document them.
 
+ > I've tried to make the query function useful, but if you want to do any kind of advanced shaping, you're better off piping the JSON output to [jq] or [yq].
+
+
  Queries are specifically for shaping CurlyQ output. If you're using the `html` command, it returns a key called `images`, so you can target just the images in the response with `-q 'images'`. The queries accept array syntax, so to get the first image, you would use `-q 'images[0]'`. Ranges are accepted as well, so `-q 'images[1..4]'` will return the 2nd through 5th images found on the page. You can also do comparisons, e.g. `images[rel=me]'` to target only images with a `rel` attribute of `me`.
 
  The comparisons for the query flag are:
@@ -84,6 +90,16 @@ The comparisons for the query flag are:
  - `^=` starts with text
  - `$=` ends with text
 
+ Comparisons can be numeric or string comparisons. A numeric comparison like `curlyq images -q '[width>500]' URL` would return all of the images on the page with a width attribute greater than 500.
+
+ You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
+
+ If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
+
+ curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
+
+ <h3 id="whats-next">What’s Next</h3>
+
  #### Commands
 
  curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.
@@ -440,7 +456,7 @@ COMMAND OPTIONS
 
  Return a hierarchy of all tags in a page. Use `-t` to limit to a specific tag.
 
- curlyq tags --search '#main .post h3' -q 'attrs[id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
+ curlyq tags --search '#main .post h3' -q '[attrs.id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
 
  [
  {
data/bin/curlyq CHANGED
@@ -130,13 +130,13 @@ command %i[html curl] do |c|
  out = res.parse(source)
 
  if options[:query]
- out = out.to_data(url: url, clean: options[:clean]).dot_query(options[:query])
+ out = out.to_data(url: url, clean: options[:clean]).dot_query(options[:query], full_tag: false)
  else
  out = out.to_data
  end
  output.push([out])
  elsif options[:query]
- queried = res.to_data.dot_query(options[:query])
+ queried = res.to_data.dot_query(options[:query], full_tag: false)
  output.push(queried) if queried
  else
  output.push(res.to_data(url: url))
@@ -356,7 +356,9 @@ command :tags do |c|
  end
  end
 
- output = output[0] if output.count == 1
+ while output.is_a?(Array) && output.count == 1
+ output = output[0]
+ end
 
  if options[:source]
  puts output.to_html
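The `tags` command now peels off nested single-element arrays in a loop instead of unwrapping once, which matters because `Array#get_value` (in the `array.rb` change below) no longer collapses a one-element result itself. That loop is what lets a single match print as a raw string, as the README describes. A minimal standalone sketch of the same idea; `unwrap_singletons` is a hypothetical name, not part of curlyq:

```ruby
# Hypothetical helper mirroring the loop added to bin/curlyq (not curlyq's API).
# Repeatedly replaces a one-element Array with its only element, stopping at a
# non-Array value or at an Array with zero or several elements.
def unwrap_singletons(output)
  output = output[0] while output.is_a?(Array) && output.count == 1
  output
end

p unwrap_singletons([[['only match']]]) # => "only match"
p unwrap_singletons([%w[a b]])          # => ["a", "b"]
p unwrap_singletons([])                 # => []
```

The `while` modifier is only a compact spelling of the `while ... end` block in the diff; the behavior is the same.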
data/lib/curly/array.rb CHANGED
@@ -81,8 +81,7 @@ class ::Array
  end
 
  def get_value(path)
- res = map { |el| el.get_value(path) }
- res.is_a?(Array) && res.count == 1 ? res[0] : res
+ map { |el| el.get_value(path) }
  end
 
  def to_html
data/lib/curly/hash.rb CHANGED
@@ -31,9 +31,15 @@ class ::Hash
 
  def get_value(query)
  return nil if self.empty?
+ stringify_keys!
+
  query.split('.').inject(self) do |v, k|
- k = k.to_i if v.is_a? Array
+ if v.is_a? Array
+ return v.map { |el| el.get_value(k) }
+ end
+ # k = k.to_i if v.is_a? Array
  next unless v.key?(k)
+
  v.fetch(k)
  end
  end
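In the new `Hash#get_value`, reaching an Array partway through the path maps the lookup over the array's elements, so a path like `links.rel` yields one value per link. As I read the diff, the `return` inside `inject` applies only the single segment that follows the array, so later segments (the `c` in `a.b.c` when the value at `a` is an array) are not applied. The sketch below is an alternative under that reading, not curlyq's implementation: the hypothetical `dig_path` passes the whole remaining path to each element. It assumes string keys, which the added `stringify_keys!` call suggests, and the sample data is made up.

```ruby
# Hypothetical helper, not curlyq's Hash#get_value: walks a dot-separated path
# through nested Hashes (string keys) and, on reaching an Array, applies the
# remaining path to every element.
def dig_path(data, path)
  segments = path.is_a?(Array) ? path : path.split('.')
  return data if segments.empty?

  case data
  when Array
    # Fan out over the elements with the *remaining* segments.
    data.map { |el| dig_path(el, segments) }
  when Hash
    key = segments.first
    return nil unless data.key?(key)

    dig_path(data[key], segments.drop(1))
  end # any other type with segments left over yields nil
end

# Made-up sample data shaped loosely like a list of links.
page = {
  'links' => [
    { 'href' => '/a', 'attrs' => { 'rel' => ['me'] } },
    { 'href' => '/b', 'attrs' => { 'rel' => [] } }
  ]
}

p dig_path(page, 'links.href')      # => ["/a", "/b"]
p dig_path(page, 'links.attrs.rel') # => [["me"], []]
```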
@@ -52,12 +58,18 @@ class ::Hash
  return res.get_value(path)
  end
 
+ path.gsub!(/\[(.*?)\]/) do
+ inter = Regexp.last_match(1).gsub(/\./, '%')
+ "[#{inter}]"
+ end
+
  enumerate = false
  out = []
  q = path.split(/(?<![\d.])\./)
 
  while q.count.positive?
  pth = q.shift
+ pth.gsub!(/%/, '.')
 
  return nil if res.nil?
 
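Before splitting the query on dots, the new code rewrites every dot inside square brackets to `%` and restores the dots in each segment afterwards, so `[attrs.id*=what].source` splits into the comparison and `source` instead of breaking at `attrs.id`. A standalone sketch of that placeholder trick, using a hypothetical `split_query` helper and the same split regex as the diff:

```ruby
# Hypothetical helper reproducing the placeholder trick from the diff:
# dots inside [...] are swapped for '%' so the split only happens on dots
# outside brackets, then each segment gets its dots back.
def split_query(path)
  protected_path = path.gsub(/\[(.*?)\]/) { "[#{Regexp.last_match(1).tr('.', '%')}]" }
  # Same split as the diff: skip dots preceded by a digit or another dot.
  protected_path.split(/(?<![\d.])\./).map { |seg| seg.tr('%', '.') }
end

p split_query('[attrs.id*=what].source') # => ["[attrs.id*=what]", "source"]
p split_query('images[1..4]')            # => ["images[1..4]"]
p split_query('links[rel*=me].href')     # => ["links[rel*=me]", "href"]
```

The sketch uses non-bang `gsub` and `tr`, so the caller's string is left unchanged, whereas `gsub!` in the diff rewrites `path` in place. As in the diff, a literal `%` anywhere in the query comes back as `.` after the split.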
@@ -70,8 +82,8 @@ class ::Hash
 
  ats = []
  at = []
- while pth =~ /\[[+&,]?\w+( *[\^*$=<>]=? *\w+)?/
- m = pth.match(/\[(?<com>[,+&])? *(?<key>\w+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))? */)
+ while pth =~ /\[[+&,]?[\w.]+( *[\^*$=<>]=? *\w+)?/
+ m = pth.match(/\[(?<com>[,+&])? *(?<key>[\w.]+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))? */)
 
  comp = [m['key'], m['op'], m['val']]
  case m['com']
@@ -82,7 +94,7 @@ class ::Hash
  at.push(comp)
  end
 
- pth.sub!(/\[(?<com>[,&+])? *(?<key>\w+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))?/, '[')
+ pth.sub!(/\[(?<com>[,&+])? *(?<key>[\w.]+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))?/, '[')
  end
  ats.push(at) unless at.empty?
  pth.sub!(/\[\]/, '')
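The change to the comparison parser is a widening of the `key` capture from `\w+` to `[\w.]+` in the `while` test, the `match`, and the `sub!` that consumes each comparison, so a dotted key such as `attrs.id` is captured whole. The sketch below runs the widened `match` pattern on its own; `COMPARISON` and `parse_comparison` are hypothetical names, and the captured values come back as plain strings.

```ruby
# The comparison pattern from the diff, with `key` widened from \w+ to [\w.]+
# so a dotted key such as attrs.id is captured whole.
COMPARISON = /\[(?<com>[,+&])? *(?<key>[\w.]+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))? */

# Hypothetical helper: returns [key, op, val] for the first comparison in
# +segment+, or nil when the segment contains no bracketed comparison.
def parse_comparison(segment)
  m = segment.match(COMPARISON)
  return nil unless m

  [m[:key], m[:op], m[:val]]
end

p parse_comparison('[attrs.id*=what]') # => ["attrs.id", "*=", "what"]
p parse_comparison('[width>500]')      # => ["width", ">", "500"]
p parse_comparison('[attrs.id]')       # => ["attrs.id", nil, nil]
```

The last case is the bare existence check from the changelog: with no operator, both `op` and `val` are nil.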
@@ -110,11 +122,11 @@ class ::Hash
  pth = ''
 
  return false if res.nil?
+
  if ats.count.positive?
  while ats.count.positive?
  atr = ats.shift
  res = [res] if res.is_a?(Hash)
-
  res.each do |r|
  out.push(full_tag ? tag : r) if evaluate_comp(r, atr)
  end
@@ -140,6 +152,24 @@ class ::Hash
  out
  end
 
+ def array_match(array, key, comp)
+ keep = false
+ array.each do |el|
+ keep = case comp
+ when /^\^/
+ key =~ /^#{el}/i ? true : false
+ when /^\$/
+ key =~ /#{el}$/i ? true : false
+ when /^\*/
+ key =~ /#{el}/i ? true : false
+ else
+ key =~ /^#{el}$/i ? true : false
+ end
+ break if keep
+ end
+ keep
+ end
+
  ##
  ## Evaluate a comparison
  ##
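`array_match` returns true as soon as one element satisfies the operator, which is how `[links.rel*=me]` can match a `rel` array such as `["me", "noopener"]`. Reading the diff, the helper builds the pattern from the array element and tests the comparison value against it (`key =~ /#{el}/i`, where `key` receives `a[2]`), while the scalar branch of `evaluate_comp` builds the pattern from the comparison value (`k =~ /#{a[2]}/i`). For `*=`, `^=` and `$=` those can disagree: `me-link` contains `me`, but `me` does not contain `me-link`. The sketch below follows the scalar branch's order and escapes the query value with `Regexp.escape`; `any_element_matches?` is a hypothetical name, not curlyq's API.

```ruby
# Hypothetical alternative to the diff's array_match: true if ANY element of
# +values+ satisfies +op+ against the user-supplied +target+ (case-insensitive).
def any_element_matches?(values, op, target)
  text = Regexp.escape(target.to_s) # treat the query value literally
  pattern = case op
            when /\A\^/ then /\A#{text}/i # ^= starts with
            when /\A\$/ then /#{text}\z/i # $= ends with
            when /\A\*/ then /#{text}/i   # *= contains
            else /\A#{text}\z/i           # = / == exact match
            end
  values.any? { |el| pattern.match?(el.to_s) }
end

p any_element_matches?(%w[me noopener], '*=', 'me')     # => true
p any_element_matches?(%w[me-link], '*=', 'me')         # => true
p any_element_matches?(%w[nofollow], '==', 'me')        # => false
p any_element_matches?(%w[post featured], '^=', 'feat') # => true
```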
@@ -165,40 +195,57 @@ class ::Hash
  end
  r = r.get_value(key.to_s) if key.to_s =~ /\./
 
- return r.key?(key) && !r[key].nil? && !r[key].empty? if val.nil?
+ if val.nil?
+ if r.is_a?(Hash)
+ return r.key?(key) && !r[key].nil? && !r[key].empty?
+ elsif r.is_a?(String)
+ return r.nil? ? false : true
+ elsif r.is_a?(Array)
+ return r.empty? ? false : true
+ end
+ end
 
- if !r.key?(key)
+ if r.nil?
  keep = false
- elsif r[key].is_a?(Array)
- valid = r[key].filter do |k|
- case a[1]
- when /^\^/
- k =~ /^#{a[2]}/i ? true : false
- when /^\$/
- k =~ /#{a[2]}$/i ? true : false
- when /^\*/
- k =~ /#{a[2]}/i ? true : false
+ elsif r.is_a?(Array)
+ valid = r.filter do |k|
+ if k.is_a? Array
+ array_match(k, a[2], a[1])
  else
- k =~ /^#{a[2]}$/i ? true : false
+ case a[1]
+ when /^\^/
+ k =~ /^#{a[2]}/i ? true : false
+ when /^\$/
+ k =~ /#{a[2]}$/i ? true : false
+ when /^\*/
+ k =~ /#{a[2]}/i ? true : false
+ else
+ k =~ /^#{a[2]}$/i ? true : false
+ end
  end
  end
 
  keep = valid.count.positive?
  elsif val.is_a?(Numeric) && a[1] =~ /^[<>=]{1,2}$/
- k = r[key].to_i
+ k = r.to_i
  comp = a[1] =~ /^=$/ ? '==' : a[1]
  keep = eval("#{k}#{comp}#{val}")
  else
- keep = case a[1]
- when /^\^/
- r[key] =~ /^#{a[2]}/i ? true : false
- when /^\$/
- r[key] =~ /#{a[2]}$/i ? true : false
- when /^\*/
- r[key] =~ /#{a[2]}/i ? true : false
- else
- r[key] =~ /^#{a[2]}$/i ? true : false
- end
+ v = r.is_a?(Hash) ? r[key] : r
+ if v.is_a? Array
+ keep = array_match(v, a[2], a[1])
+ else
+ keep = case a[1]
+ when /^\^/
+ v =~ /^#{a[2]}/i ? true : false
+ when /^\$/
+ v =~ /#{a[2]}$/i ? true : false
+ when /^\*/
+ v =~ /#{a[2]}/i ? true : false
+ else
+ v =~ /^#{a[2]}$/i ? true : false
+ end
+ end
  end
 
  return false unless keep
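With no operator (`[attrs.id]`), `evaluate_comp` now decides "populated" by type: for a Hash the value at `key` must be present and non-empty, an Array must be non-empty, and a String counts as populated whenever it is present, even when it is empty. Numeric comparisons still build a string such as `640>500` and pass it to `eval`. The sketch below covers the same two checks with hypothetical helpers (`populated?`, `numeric_compare`) and deliberately differs in two ways: an empty string counts as unpopulated, and the operator is checked against a whitelist and dispatched with `public_send` rather than `eval`, so only `<`, `>`, `<=`, `>=` and `==` can ever run.

```ruby
# Hypothetical helpers sketching the two checks in evaluate_comp; not curlyq's code.

# Existence check for a bare [key] comparison.
def populated?(value)
  case value
  when nil then false
  when String, Array, Hash then !value.empty? # empty string/array/hash: not populated
  else true                                   # other present scalars count as populated
  end
end

NUMERIC_OPS = %w[< > <= >= ==].freeze

# Numeric comparison without eval. '=' is normalized to '==' as in the diff,
# and anything outside the whitelist is rejected before public_send runs.
def numeric_compare(value, op, target)
  op = '==' if op == '='
  raise ArgumentError, "unsupported operator: #{op}" unless NUMERIC_OPS.include?(op)

  value.to_i.public_send(op, target)
end

attrs = { 'id' => 'whats-next', 'class' => [] }
p populated?(attrs['id'])          # => true
p populated?(attrs['class'])       # => false
p populated?(attrs['missing'])     # => false
p numeric_compare('640', '>', 500) # => true
p numeric_compare('500', '=', 500) # => true
```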
data/lib/curly/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Curly
- VERSION = '0.0.8'
+ VERSION = '0.0.9'
  end
data/src/_README.md CHANGED
@@ -10,9 +10,12 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
  [donate]: https://brettterpstra.com/donate
  <!--END GITHUB-->
 
+ [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
+ [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
+
  The current version of `curlyq` is <!--VER-->0.0.4<!--END VER-->.
 
- CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.
+ CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
 
  [github]: https://github.com/ttscoff/curlyq/
 
@@ -45,6 +48,9 @@ You can shape the results using `--search` (`-s`) and `--query` (`-q`) on some c
 
  A search uses either CSS or XPath syntax to locate elements. For example, if you wanted to locate all of the `<article>` elements with a class of `post` inside of the div with an id of `main`, you would run `--search '#main article.post'`. Searches can target tags, ids, and classes, and can accept `>` to target direct descendents. You can also use XPaths, but I hate those so I'm not going to document them.
 
+ > I've tried to make the query function useful, but if you want to do any kind of advanced shaping, you're better off piping the JSON output to [jq] or [yq].
+ <!--JEKYLL{:.warn}-->
+
  Queries are specifically for shaping CurlyQ output. If you're using the `html` command, it returns a key called `images`, so you can target just the images in the response with `-q 'images'`. The queries accept array syntax, so to get the first image, you would use `-q 'images[0]'`. Ranges are accepted as well, so `-q 'images[1..4]'` will return the 2nd through 5th images found on the page. You can also do comparisons, e.g. `images[rel=me]'` to target only images with a `rel` attribute of `me`.
 
  The comparisons for the query flag are:
@@ -58,6 +64,16 @@ The comparisons for the query flag are:
  - `^=` starts with text
  - `$=` ends with text
 
+ Comparisons can be numeric or string comparisons. A numeric comparison like `curlyq images -q '[width>500]' URL` would return all of the images on the page with a width attribute greater than 500.
+
+ You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
+
+ If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
+
+ curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
+
+ <h3 id="whats-next">What’s Next</h3>
+
  #### Commands
 
  curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.
@@ -314,7 +330,7 @@ Example:
 
  Return a hierarchy of all tags in a page. Use `-t` to limit to a specific tag.
 
- curlyq tags --search '#main .post h3' -q 'attrs[id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
+ curlyq tags --search '#main .post h3' -q '[attrs.id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
 
  [
  {
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: curlyq
  version: !ruby/object:Gem::Version
- version: 0.0.8
+ version: 0.0.9
  platform: ruby
  authors:
  - Brett Terpstra
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-01-15 00:00:00.000000000 Z
+ date: 2024-01-16 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rake