curlyq 0.0.2 → 0.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4a73a5990b9c07f4d564216cd13c1ea0d73a833191c3f7734e7e3e5af2954b40
- data.tar.gz: 8444276e61febd7b3e517eec56155a4f8754809fa8dc46c0d6e173737bca79e0
+ metadata.gz: 74306df5fa01c7d69f341fb38f0ef966ef1acdad835db8e2d49feb6e064ebd28
+ data.tar.gz: e8b117b64755738951adfd6e9265fb813118e4e187f5d1cd138788f6759f0d3d
  SHA512:
- metadata.gz: ca1a8c0bfc122e8020b356018276e27647449834f30eb66d7561acf187ec6cd837b59564a722ceaad5b3e99ac47de4a9944dfc370e69b92575155988a81fcfd4
- data.tar.gz: 3bc9ed736378cc70607d4f42ecbe1f8cc91fbe87243d0dda4f7dc9ff6e44f5cc33f687d00e188f25ae3494f47bbbfedd2f2e27e8b008048e22c6c10ce2dc3b7f
+ metadata.gz: a98b2b0d24cba28ef4487b5564668e1266b56cf1e097488510865e26795cb4d752ef15c8f83360cc7208bc529e48b6bc857821eb8193b9077302bc24fa9a33e0
+ data.tar.gz: e6c2c276a6bf265612da74085a62c3b475c7dd401dd4f149ff274902c1a21d0f7c241f6a269a5c28dceddf46b62e175d1b45c6940a2f99364be8967e2d67a67c
.github/FUNDING.yml ADDED
@@ -0,0 +1,2 @@
+ github: [ttscoff]
+ custom: ['https://brettterpstra.com/support/', 'https://brettterpstra.com/donate/']
data/CHANGELOG.md CHANGED
@@ -1,3 +1,14 @@
+ ### 0.0.3
+
+ 2024-01-10 13:38
+
+ #### IMPROVED
+
+ - Refactor Curl and Json libs to allow setting of options after creation of object
+ - Allow setting of headers on most subcommands
+ - --clean now affects source, head, and body keys of output
+ - Also remove tabs when cleaning whitespace
+
  ### 0.0.2

  2024-01-10 09:18
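The first IMPROVED item changes the object lifecycle: options that previously had to be passed to the constructor can now be assigned through accessors after the object is created, with the request fired explicitly by a `curl` call (as the `data/bin/curlyq` diff below shows). A minimal self-contained sketch of that deferred-configuration pattern, using a hypothetical `LazyRequest` class rather than the gem's actual code:

```ruby
# Sketch of the deferred-configuration pattern adopted in 0.0.3:
# create the object, adjust options via accessors, then run the request.
class LazyRequest
  attr_accessor :headers, :clean
  attr_reader :result

  def initialize(url, options = {})
    @url = url
    @headers = options[:headers] || {}
    @clean = options[:clean]
    @result = nil
  end

  # Nothing happens until #curl is called, so options can be changed
  # at any point between creation and the request.
  def curl
    @result = "GET #{@url} (#{@headers.size} headers, clean: #{!!@clean})"
  end
end

req = LazyRequest.new('https://example.com')
req.headers = { 'User-Agent' => 'curlyq' } # set after creation
req.clean = true
req.curl
```

The same shape appears throughout the 0.0.3 `bin/curlyq` changes: `Curl::Html.new(url, { ... })` followed by accessor assignments and `res.curl`.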
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     curlyq (0.0.2)
+     curlyq (0.0.3)
        gli (~> 2.21.0)
        nokogiri (~> 1.16.0)
        selenium-webdriver (~> 4.16.0)
data/README.md CHANGED
@@ -1,4 +1,4 @@
- # curlyq
+ # CurlyQ

  [![Gem](https://img.shields.io/gem/v/na.svg)](https://rubygems.org/gems/curlyq)
  [![GitHub license](https://img.shields.io/github/license/ttscoff/curlyq.svg)](./LICENSE.txt)
@@ -7,11 +7,13 @@

  _If you find this useful, feel free to [buy me some coffee][donate]._

+ [donate]: https://brettterpstra.com/donate

- The current version of `curlyq` is 0.0.2
+
+ The current version of `curlyq` is 0.0.3
  .

- `curlyq` is a command that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.
+ CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.

  [github]: https://github.com/ttscoff/curlyq/

@@ -24,11 +26,15 @@ If you're using Homebrew, you have the option to install via [brew-gem](https://
  brew install brew-gem
  brew gem install curlyq

- If you don't have Ruby/RubyGems, you can install them pretty easily with Homebrew, rvm, or asdf.
+ If you don't have Ruby/RubyGems, you can install them pretty easily with [Homebrew], [rvm], or [asdf].
+
+ [Homebrew]: https://brew.sh/ "Homebrew—The Missing Package Manager for macOS (or Linux)"
+ [rvm]: https://rvm.io/ "Ruby Version Manager (RVM)"
+ [asdf]: https://github.com/asdf-vm/asdf "asdf-vm/asdf: Extendable version manager with support for ..."

  ### Usage

- Run `curlyq help` for a list of commands. Run `curlyq help SUBCOMMAND` for details on a particular subcommand and its options.
+ Run `curlyq help` for a list of subcommands. Run `curlyq help SUBCOMMAND` for details on a particular subcommand and its options.

  ```
  NAME
@@ -38,7 +44,7 @@ SYNOPSIS
  curlyq [global options] command [command options] [arguments...]

  VERSION
- 0.0.2
+ 0.0.3

  GLOBAL OPTIONS
  --help - Show this message
@@ -61,7 +67,7 @@ COMMANDS

  #### Commands

- curlyq makes use of subcommands, e.g. `curlyq html` or `curlyq extract`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command.
+ curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.

  ##### extract

@@ -135,6 +141,7 @@ SYNOPSIS
  COMMAND OPTIONS
  -c, --[no-]compressed - Expect compressed results
  --[no-]clean - Remove extra whitespace from results
+ -h, --header=arg - Define a header to send as key=value (may be used more than once, default: none)
  -t, --type=arg - Type of images to return (img, srcset, opengraph, all) (may be used more than once, default: ["all"])
  ```

@@ -193,6 +200,8 @@ COMMAND OPTIONS

  ##### screenshot

+ Full-page screenshots require Firefox to be installed and specified with `--browser firefox`.
+
  ```
  NAME
  screenshot - Save a screenshot of a URL
@@ -203,6 +212,7 @@ SYNOPSIS

  COMMAND OPTIONS
  -b, --browser=arg - Browser to use (firefox, chrome) (default: chrome)
+ -h, --header=arg - Define a header to send as key=value (may be used more than once, default: none)
  -o, --out, --file=arg - File destination (default: none)
  -t, --type=arg - Type of screenshot to save (full (requires firefox), print, visible) (default: full)
  ```
@@ -230,4 +240,4 @@ PayPal link: [paypal.me/ttscoff](https://paypal.me/ttscoff)

  ## Changelog

- See [CHANGELOG.md](https://github.com/ttscoff/na_gem/blob/master/CHANGELOG.md)
+ See [CHANGELOG.md](https://github.com/ttscoff/curlyq/blob/main/CHANGELOG.md)
data/bin/curlyq CHANGED
@@ -110,12 +110,13 @@ command %i[html curl] do |c|
  output = []

  urls.each do |url|
- res = Curl::Html.new(url, browser: options[:browser], fallback: options[:fallback],
- headers: headers, headers_only: options[:info],
- compressed: options[:compressed], clean: options[:clean],
- ignore_local_links: options[:ignore_relative],
- ignore_fragment_links: options[:ignore_fragments],
- external_links_only: options[:external_links_only])
+ res = Curl::Html.new(url, { browser: options[:browser], fallback: options[:fallback],
+ headers: headers, headers_only: options[:info],
+ compressed: options[:compressed], clean: options[:clean],
+ ignore_local_links: options[:ignore_relative],
+ ignore_fragment_links: options[:ignore_fragments],
+ external_links_only: options[:external_links_only] })
+ res.curl

  if options[:info]
  output.push(res.headers)
@@ -156,12 +157,18 @@ command :screenshot do |c|
  c.desc 'File destination'
  c.flag %i[o out file]

+ c.desc 'Define a header to send as key=value'
+ c.flag %i[h header], multiple: true
+
  c.action do |_, options, args|
  urls = args.join(' ').split(/[, ]+/)
+ headers = break_headers(options[:header])

  urls.each do |url|
  c = Curl::Html.new(url)
- c.screenshot(options[:out], browser: options[:browser], type: options[:type])
+ c.headers = headers
+ c.browser = options[:browser]
+ c.screenshot(options[:out], type: options[:type])
  end
  end
  end
@@ -185,7 +192,11 @@ command :json do |c|
  output = []

  urls.each do |url|
- res = Curl::Json.new(url, headers: headers, compressed: options[:compressed], symbolize_names: false)
+ res = Curl::Json.new(url)
+ res.request_headers = headers
+ res.compressed = options[:compressed]
+ res.symbolize_names = false
+ res.curl

  json = res.json

@@ -235,8 +246,9 @@ command :extract do |c|
  output = []

  urls.each do |url|
- res = Curl::Html.new(url, headers: headers, headers_only: false,
- compressed: options[:compressed], clean: options[:clean])
+ res = Curl::Html.new(url, { headers: headers, headers_only: false,
+ compressed: options[:compressed], clean: options[:clean] })
+ res.curl
  extracted = res.extract(options[:before], options[:after])
  extracted.strip_tags! if options[:strip]
  output.concat(extracted)
@@ -271,8 +283,9 @@ command :tags do |c|
  output = []

  urls.each do |url|
- res = Curl::Html.new(url, headers: headers, headers_only: options[:headers],
- compressed: options[:compressed], clean: options[:clean])
+ res = Curl::Html.new(url, { headers: headers, headers_only: options[:headers],
+ compressed: options[:compressed], clean: options[:clean] })
+ res.curl
  output = []
  if options[:search]
  output = res.tags.search(options[:search])
@@ -299,15 +312,20 @@ command :images do |c|
  c.desc 'Remove extra whitespace from results'
  c.switch %i[clean]

+ c.desc 'Define a header to send as key=value'
+ c.flag %i[h header], multiple: true
+
  c.action do |global_options, options, args|
  urls = args.join(' ').split(/[, ]+/)
+ headers = break_headers(options[:header])

  output = []

  types = options[:type].join(' ').split(/[ ,]+/).map(&:normalize_image_type)

  urls.each do |url|
- res = Curl::Html.new(url, compressed: options[:compressed], clean: options[:clean])
+ res = Curl::Html.new(url, { compressed: options[:compressed], clean: options[:clean] })
+ res.curl
  output.concat(res.images(types: types))
  end

@@ -339,10 +357,13 @@ command :links do |c|
  output = []

  urls.each do |url|
- res = Curl::Html.new(url, compressed: options[:compressed], clean: options[:clean],
- ignore_local_links: options[:ignore_relative],
- ignore_fragment_links: options[:ignore_fragments],
- external_links_only: options[:external_links_only])
+ res = Curl::Html.new(url, {
+ compressed: options[:compressed], clean: options[:clean],
+ ignore_local_links: options[:ignore_relative],
+ ignore_fragment_links: options[:ignore_fragments],
+ external_links_only: options[:external_links_only]
+ })
+ res.curl

  if options[:query]
  query = options[:query] =~ /^links/ ? options[:query] : "links#{options[:query]}"
@@ -371,7 +392,8 @@ command :headlinks do |c|
  output = []

  urls.each do |url|
- res = Curl::Html.new(url, compressed: options[:compressed], clean: options[:clean])
+ res = Curl::Html.new(url, { compressed: options[:compressed], clean: options[:clean] })
+ res.curl

  if options[:query]
  query = options[:query] =~ /^links/ ? options[:query] : "links#{options[:query]}"
@@ -420,7 +442,8 @@ command :scrape do |c|
  driver.get url
  res = driver.page_source

- res = Curl::Html.new(nil, source: res, clean: options[:clean])
+ res = Curl::Html.new(nil, { source: res, clean: options[:clean] })
+ res.curl
  if options[:search]
  out = res.search(options[:search])
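Several subcommands above now call `break_headers(options[:header])` to turn repeated `-h key=value` flags into a headers hash. The helper's actual implementation is not part of this diff; the sketch below is a plausible stand-in showing the kind of parsing such a helper performs:

```ruby
# Hypothetical sketch of a break_headers-style helper: converts an array
# of "key=value" strings (as collected by a repeatable CLI flag) into a
# hash. The real curlyq implementation is not shown in this diff.
def break_headers(headers)
  Array(headers).each_with_object({}) do |header, hash|
    key, value = header.split('=', 2) # split only on the first '='
    next if value.nil?                # skip malformed entries without '='

    hash[key.strip] = value.strip
  end
end

break_headers(['User-Agent=curlyq', 'Accept=application/json'])
```

Splitting on the first `=` only matters for values that themselves contain `=`, such as signed tokens.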
@@ -10,8 +10,11 @@ module Curl

  # Class for CURLing an HTML page
  class Html
- attr_reader :url, :code, :headers, :meta, :links, :head, :body,
- :source, :title, :description, :body_links, :body_images, :clean
+ attr_accessor :settings, :browser, :source, :headers, :headers_only, :compressed, :clean, :fallback,
+ :ignore_local_links, :ignore_fragment_links, :external_links_only
+
+ attr_reader :url, :code, :meta, :links, :head, :body,
+ :title, :description, :body_links, :body_images

  def to_data(url: nil)
  {
@@ -20,9 +23,9 @@ module Curl
  headers: @headers,
  meta: @meta,
  meta_links: @links,
- head: @head,
- body: @body,
- source: @source,
+ head: @clean ? @head&.strip&.clean : @head,
+ body: @clean ? @body&.strip&.clean : @body,
+ source: @clean ? @source&.strip&.clean : @source,
  title: @title,
  description: @description,
  links: @body_links,
@@ -33,29 +36,48 @@ module Curl
  ##
  ## Create a new page object from a URL
  ##
- ## @param url [String] The url
- ## @param headers [Hash] The headers to use in the curl call
- ## @param headers_only [Boolean] Return headers only
- ## @param compressed [Boolean] Expect compressed result
+ ## @param url [String] The url
+ ## @param options [Hash] The options
+ ##
+ ## @option options :browser [Symbol] the browser to use instead of curl (:chrome, :firefox)
+ ## @option options :source [String] source provided instead of curl
+ ## @option options :headers [Hash] headers to send in the request
+ ## @option options :headers_only [Boolean] whether to return just response headers
+ ## @option options :compressed [Boolean] expect compressed response
+ ## @option options :clean [Boolean] clean whitespace from response
+ ## @option options :fallback [Symbol] browser to fall back to if curl doesn't work (:chrome, :firefox)
+ ## @option options :ignore_local_links [Boolean] when collecting links, ignore local/relative links
+ ## @option options :ignore_fragment_links [Boolean] when collecting links, ignore links that are just #fragments
+ ## @option options :external_links_only [Boolean] only collect links outside of current site
  ##
  ## @return [HTMLCurl] new page object
  ##
- def initialize(url, browser: nil, source: nil, headers: nil,
- headers_only: false, compressed: false, clean: false, fallback: false,
- ignore_local_links: false, ignore_fragment_links: false, external_links_only: false)
- @clean = clean
- @ignore_local_links = ignore_local_links
- @ignore_fragment_links = ignore_fragment_links
- @external_links_only = external_links_only
+ def initialize(url, options = {})
+ @browser = options[:browser] || :none
+ @source = options[:source]
+ @headers = options[:headers] || {}
+ @headers_only = options[:headers_only]
+ @compressed = options[:compressed]
+ @clean = options[:clean]
+ @fallback = options[:fallback]
+ @ignore_local_links = options[:ignore_local_links]
+ @ignore_fragment_links = options[:ignore_fragment_links]
+ @external_links_only = options[:external_links_only]
+
  @curl = TTY::Which.which('curl')
  @url = url
- res = if url && browser && browser != :none
- source = curl_dynamic_html(url, browser, headers)
- curl_html(nil, source: source, headers: headers)
+ end
+
+ def curl
+ res = if @url && @browser && @browser != :none
+ source = curl_dynamic_html
+ curl_html(nil, source: source, headers: @headers)
  elsif url.nil? && !source.nil?
- curl_html(nil, source: source, headers: headers, headers_only: headers_only, compressed: compressed, fallback: false)
+ curl_html(nil, source: @source, headers: @headers, headers_only: @headers_only,
+ compressed: @compressed, fallback: false)
  else
- curl_html(url, headers: headers, headers_only: headers_only, compressed: compressed, fallback: fallback)
+ curl_html(@url, headers: @headers, headers_only: @headers_only,
+ compressed: @compressed, fallback: @fallback)
  end
  @url = res[:url]
  @code = res[:code]
@@ -82,10 +104,10 @@ module Curl
  ## save (:full_page,
  ## :print_page, :visible)
  ##
- def screenshot(destination = nil, browser: :chrome, type: :full_page)
+ def screenshot(destination = nil, type: :full_page)
  full_page = type.to_sym == :full_page
  print_page = type.to_sym == :print_page
- save_screenshot(destination, browser: browser, type: type)
+ save_screenshot(destination, type: type)
  end

  ##
@@ -297,7 +319,7 @@ module Curl

  {
  tag: el.name,
- source: el.to_html,
+ source: @clean ? el.to_html&.strip&.clean : el.to_html,
  attrs: attributes,
  content: @clean ? el.text&.strip&.clean : el.text.strip,
  tags: recurse_children(el)
@@ -511,14 +533,14 @@ module Curl
  ##
  ## @return [String] page source
  ##
- def curl_dynamic_html(url, browser, headers)
- browser = browser.normalize_browser_type if browser.is_a?(String)
+ def curl_dynamic_html
+ browser = @browser.normalize_browser_type if @browser.is_a?(String)
  res = nil

  driver = Selenium::WebDriver.for browser
  driver.manage.timeouts.implicit_wait = 4
  begin
- driver.get url
+ driver.get @url
  res = driver.page_source
  ensure
  driver.quit
@@ -534,7 +556,7 @@ module Curl
  ## @param browser [Symbol] The browser (:chrome or :firefox)
  ## @param type [Symbol] The type of screenshot (:full_page, :print_page, or :visible)
  ##
- def save_screenshot(destination = nil, browser: :chrome, type: :full_page)
+ def save_screenshot(destination = nil, type: :full_page)
  raise 'No URL provided' if url.nil?

  raise 'No file destination provided' if destination.nil?
@@ -554,7 +576,7 @@ module Curl
  "#{destination.sub(/\.(pdf|jpe?g|png)$/, '')}.png"
  end

- driver = Selenium::WebDriver.for browser
+ driver = Selenium::WebDriver.for @browser
  driver.manage.timeouts.implicit_wait = 4
  begin
  driver.get @url
@@ -587,38 +609,38 @@ module Curl
  headers_only: false, compressed: false, fallback: false)
  unless url.nil?
  flags = 'SsL'
- flags += headers_only ? 'I' : 'i'
+ flags += @headers_only ? 'I' : 'i'
  agents = [
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.1',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.3',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.'
  ]
- headers = headers.nil? ? '' : headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')
- compress = compressed ? '--compressed' : ''
- source = `#{@curl} -#{flags} #{compress} #{headers} '#{url}' 2>/dev/null`
+ headers = @headers.nil? ? '' : @headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')
+ compress = @compressed ? '--compressed' : ''
+ @source = `#{@curl} -#{flags} #{compress} #{headers} '#{@url}' 2>/dev/null`
  agent = 0
  while source.nil? || source.empty?
- source = `#{@curl} -#{flags} #{compress} -A "#{agents[agent]}" #{headers} '#{url}' 2>/dev/null`
+ source = `#{@curl} -#{flags} #{compress} -A "#{agents[agent]}" #{headers} '#{@url}' 2>/dev/null`
  break if agent >= agents.count - 1
  end

- unless $?.success? || fallback
- warn "Error curling #{url}"
+ unless $?.success? || @fallback
+ warn "Error curling #{@url}"
  Process.exit 1
  end

- if fallback && (source.nil? || source.empty?)
- source = curl_dynamic_html(url, fallback, headers)
+ if @fallback && (@source.nil? || @source.empty?)
+ @source = curl_dynamic_html(@url, @fallback, @headers)
  end
  end

  return false if source.nil? || source.empty?

- source.strip!
+ @source.strip!

- headers = { 'location' => url }
- lines = source.split(/\r\n/)
+ headers = { 'location' => @url }
+ lines = @source.split(/\r\n/)
  code = lines[0].match(/(\d\d\d)/)[1]
  lines.shift
  lines.each_with_index do |line, idx|
@@ -626,7 +648,7 @@ module Curl
  m = Regexp.last_match
  headers[m[1]] = m[2]
  else
- source = lines[idx..].join("\n")
+ @source = lines[idx..].join("\n")
  break
  end
  end
@@ -636,21 +658,21 @@ module Curl
  end

  if headers['content-type'] =~ /json/
- return { url: url, code: code, headers: headers, meta: nil, links: nil,
- head: nil, body: source.strip, source: source.strip, body_links: nil, body_images: nil }
+ return { url: @url, code: code, headers: headers, meta: nil, links: nil,
+ head: nil, body: @source.strip, source: @source.strip, body_links: nil, body_images: nil }
  end

  head = source.match(%r{(?<=<head>)(.*?)(?=</head>)}mi)

  if head.nil?
- { url: url, code: code, headers: headers, meta: nil, links: nil, head: nil, body: source.strip,
- source: source.strip, body_links: nil, body_images: nil }
+ { url: @url, code: code, headers: headers, meta: nil, links: nil, head: nil, body: @source.strip,
+ source: @source.strip, body_links: nil, body_images: nil }
  else
  meta = meta_tags(head[1])
  links = link_tags(head[1])
- body = source.match(%r{<body.*?>(.*?)</body>}mi)[1]
- { url: url, code: code, headers: headers, meta: meta, links: links, head: head[1], body: body,
- source: source.strip, body_links: body_links, body_images: body_images }
+ body = @source.match(%r{<body.*?>(.*?)</body>}mi)[1]
+ { url: @url, code: code, headers: headers, meta: meta, links: links, head: head[1], body: body,
+ source: @source.strip, body_links: body_links, body_images: body_images }
  end
  end
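`curl_html` builds its shell command by mapping the headers hash onto repeated `-H` flags, as in the `@headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')` line above. Extracted into a standalone helper (string formatting only; no request is made):

```ruby
# Mirrors the header-to-flag mapping used in curl_html: each key/value
# pair in the hash becomes one quoted -H argument for the curl command.
def header_flags(headers)
  headers.nil? ? '' : headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')
end

header_flags({ 'Accept' => 'text/html', 'X-Token' => 'abc' })
# => '-H "Accept: text/html" -H "X-Token: abc"'
```

Note that values are interpolated into a shell string, so headers containing double quotes would need escaping in a hardened version.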
@@ -3,7 +3,11 @@
  module Curl
  # Class for CURLing a JSON response
  class Json
- attr_reader :url, :code, :json, :headers
+ attr_accessor :url
+
+ attr_writer :compressed, :request_headers, :symbolize_names
+
+ attr_reader :code, :json, :headers

  def to_data
  {
@@ -23,9 +27,17 @@ module Curl
  ##
  ## @return [Curl::Json] Curl::Json object with url, code, parsed json, and response headers
  ##
- def initialize(url, headers: nil, compressed: false, symbolize_names: false)
+ def initialize(url, options = {})
+ @url = url
+ @request_headers = options[:headers]
+ @compressed = options[:compressed]
+ @symbolize_names = options[:symbolize_names]
+
  @curl = TTY::Which.which('curl')
- page = curl_json(url, headers: headers, compressed: compressed, symbolize_names: symbolize_names)
+ end
+
+ def curl
+ page = curl_json

  raise "Error retrieving #{url}" if page.nil? || page.empty?

@@ -60,7 +72,7 @@ module Curl
  ##
  ## @return [Hash] hash of url, code, headers, and parsed json
  ##
- def curl_json(url, headers: nil, compressed: false, symbolize_names: false)
+ def curl_json
  flags = 'SsLi'
  agents = [
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.1',
@@ -69,12 +81,12 @@ module Curl
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.'
  ]

- headers = headers.nil? ? '' : headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')
- compress = compressed ? '--compressed' : ''
- source = `#{@curl} -#{flags} #{compress} #{headers} '#{url}' 2>/dev/null`
+ headers = @headers.nil? ? '' : @headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')
+ compress = @compressed ? '--compressed' : ''
+ source = `#{@curl} -#{flags} #{compress} #{headers} '#{@url}' 2>/dev/null`
  agent = 0
  while source.nil? || source.empty?
- source = `#{@curl} -#{flags} #{compress} -A "#{agents[agent]}" #{headers} '#{url}' 2>/dev/null`
+ source = `#{@curl} -#{flags} #{compress} -A "#{agents[agent]}" #{headers} '#{@url}' 2>/dev/null`
  break if agent >= agents.count - 1
  end

@@ -99,9 +111,9 @@ module Curl
  json = source.strip.force_encoding('utf-8')
  begin
  json.gsub!(/[\u{1F600}-\u{1F6FF}]/, '')
- { url: url, code: code, headers: headers, json: JSON.parse(json, symbolize_names: symbolize_names) }
- rescue StandardError => e
- { url: url, code: code, headers: headers, json: nil}
+ { url: @url, code: code, headers: headers, json: JSON.parse(json, symbolize_names: @symbolize_names) }
+ rescue StandardError
+ { url: @url, code: code, headers: headers, json: nil }
  end
  end
  end
data/lib/curly/string.rb CHANGED
@@ -7,7 +7,7 @@
  ##
  class ::String
  def clean
- gsub(/[\n ]+/m, ' ').gsub(/> +</, '><')
+ gsub(/[\t\n ]+/m, ' ').gsub(/> +</, '><')
  end

  ##
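The change above adds `\t` to the character class, so runs of tabs are collapsed along with newlines and spaces (the "Also remove tabs when cleaning whitespace" changelog item). The patched method in action:

```ruby
# The 0.0.3 version of String#clean: collapse runs of tabs, newlines,
# and spaces to a single space, then close up any remaining gap
# between adjacent tags.
class ::String
  def clean
    gsub(/[\t\n ]+/m, ' ').gsub(/> +</, '><')
  end
end

"<ul>\n\t<li>one</li>\n\t<li>two</li>\n</ul>".clean
# => '<ul><li>one</li><li>two</li></ul>'
```

Under 0.0.2, the tabs in that input would have survived, leaving `\t` characters between the tags.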
data/lib/curly/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Curly
- VERSION = '0.0.2'
+ VERSION = '0.0.3'
  end
data/src/_README.md CHANGED
@@ -1,4 +1,4 @@
- <!--README--><!--GITHUB--># curlyq
+ <!--README--><!--GITHUB--># CurlyQ

  [![Gem](https://img.shields.io/gem/v/na.svg)](https://rubygems.org/gems/curlyq)
  [![GitHub license](https://img.shields.io/github/license/ttscoff/curlyq.svg)](./LICENSE.txt)
@@ -6,11 +6,13 @@
  **A command line helper for curl and web scraping**

  _If you find this useful, feel free to [buy me some coffee][donate]._
+
+ [donate]: https://brettterpstra.com/donate
  <!--END GITHUB-->

- The current version of `curlyq` is <!--VER--><!--END VER-->.
+ The current version of `curlyq` is <!--VER-->0.0.2<!--END VER-->.

- `curlyq` is a command that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.
+ CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.

  [github]: https://github.com/ttscoff/curlyq/

@@ -23,11 +25,15 @@ If you're using Homebrew, you have the option to install via [brew-gem](https://
  brew install brew-gem
  brew gem install curlyq

- If you don't have Ruby/RubyGems, you can install them pretty easily with Homebrew, rvm, or asdf.
+ If you don't have Ruby/RubyGems, you can install them pretty easily with [Homebrew], [rvm], or [asdf].
+
+ [Homebrew]: https://brew.sh/ "Homebrew—The Missing Package Manager for macOS (or Linux)"
+ [rvm]: https://rvm.io/ "Ruby Version Manager (RVM)"
+ [asdf]: https://github.com/asdf-vm/asdf "asdf-vm/asdf: Extendable version manager with support for ..."

  ### Usage

- Run `curlyq help` for a list of commands. Run `curlyq help SUBCOMMAND` for details on a particular subcommand and its options.
+ Run `curlyq help` for a list of subcommands. Run `curlyq help SUBCOMMAND` for details on a particular subcommand and its options.

  ```
  @cli(bundle exec bin/curlyq help)
@@ -35,7 +41,7 @@ Run `curlyq help` for a list of commands. Run `curlyq help SUBCOMMAND` for detai

  #### Commands

- curlyq makes use of subcommands, e.g. `curlyq html` or `curlyq extract`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command.
+ curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.

  ##### extract

@@ -82,6 +88,8 @@ curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq ext

  ##### screenshot

+ Full-page screenshots require Firefox to be installed and specified with `--browser firefox`.
+
  ```
  @cli(bundle exec bin/curlyq help screenshot)
  ```
@@ -97,5 +105,5 @@ PayPal link: [paypal.me/ttscoff](https://paypal.me/ttscoff)

  ## Changelog

- See [CHANGELOG.md](https://github.com/ttscoff/na_gem/blob/master/CHANGELOG.md)
+ See [CHANGELOG.md](https://github.com/ttscoff/curlyq/blob/main/CHANGELOG.md)
  <!--END GITHUB--><!--END README-->
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: curlyq
  version: !ruby/object:Gem::Version
-   version: 0.0.2
+   version: 0.0.3
  platform: ruby
  authors:
  - Brett Terpstra
@@ -137,6 +137,7 @@ extra_rdoc_files:
  - README.rdoc
  - curlyq.rdoc
  files:
+ - ".github/FUNDING.yml"
  - ".gitignore"
  - CHANGELOG.md
  - Gemfile