curlyq 0.0.2 → 0.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4a73a5990b9c07f4d564216cd13c1ea0d73a833191c3f7734e7e3e5af2954b40
- data.tar.gz: 8444276e61febd7b3e517eec56155a4f8754809fa8dc46c0d6e173737bca79e0
+ metadata.gz: 74306df5fa01c7d69f341fb38f0ef966ef1acdad835db8e2d49feb6e064ebd28
+ data.tar.gz: e8b117b64755738951adfd6e9265fb813118e4e187f5d1cd138788f6759f0d3d
  SHA512:
- metadata.gz: ca1a8c0bfc122e8020b356018276e27647449834f30eb66d7561acf187ec6cd837b59564a722ceaad5b3e99ac47de4a9944dfc370e69b92575155988a81fcfd4
- data.tar.gz: 3bc9ed736378cc70607d4f42ecbe1f8cc91fbe87243d0dda4f7dc9ff6e44f5cc33f687d00e188f25ae3494f47bbbfedd2f2e27e8b008048e22c6c10ce2dc3b7f
+ metadata.gz: a98b2b0d24cba28ef4487b5564668e1266b56cf1e097488510865e26795cb4d752ef15c8f83360cc7208bc529e48b6bc857821eb8193b9077302bc24fa9a33e0
+ data.tar.gz: e6c2c276a6bf265612da74085a62c3b475c7dd401dd4f149ff274902c1a21d0f7c241f6a269a5c28dceddf46b62e175d1b45c6940a2f99364be8967e2d67a67c
.github/FUNDING.yml ADDED
@@ -0,0 +1,2 @@
+ github: [ttscoff]
+ custom: ['https://brettterpstra.com/support/', 'https://brettterpstra.com/donate/']
data/CHANGELOG.md CHANGED
@@ -1,3 +1,14 @@
+ ### 0.0.3
+
+ 2024-01-10 13:38
+
+ #### IMPROVED
+
+ - Refactor Curl and Json libs to allow setting of options after creation of object
+ - Allow setting of headers on most subcommands
+ - --clean now affects source, head, and body keys of output
+ - Also remove tabs when cleaning whitespace
+
  ### 0.0.2

  2024-01-10 09:18
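The first IMPROVED item changes the object lifecycle: options that previously had to be passed to the constructor can now be assigned through accessors after the object is created, with the request fired explicitly by a `curl` call (as the `data/bin/curlyq` diff below shows). A minimal self-contained sketch of that deferred-configuration pattern, using a hypothetical `LazyRequest` class rather than the gem's actual code:

```ruby
# Sketch of the deferred-configuration pattern adopted in 0.0.3:
# create the object, adjust options via accessors, then run the request.
class LazyRequest
  attr_accessor :headers, :clean
  attr_reader :result

  def initialize(url, options = {})
    @url = url
    @headers = options[:headers] || {}
    @clean = options[:clean]
    @result = nil
  end

  # Nothing happens until #curl is called, so options can be changed
  # at any point between creation and the request.
  def curl
    @result = "GET #{@url} (#{@headers.size} headers, clean: #{!!@clean})"
  end
end

req = LazyRequest.new('https://example.com')
req.headers = { 'User-Agent' => 'curlyq' } # set after creation
req.clean = true
req.curl
```

The same shape appears throughout the 0.0.3 `bin/curlyq` changes: `Curl::Html.new(url, { ... })` followed by accessor assignments and `res.curl`.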
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     curlyq (0.0.2)
+     curlyq (0.0.3)
        gli (~> 2.21.0)
        nokogiri (~> 1.16.0)
        selenium-webdriver (~> 4.16.0)
data/README.md CHANGED
@@ -1,4 +1,4 @@
- # curlyq
+ # CurlyQ

  [![Gem](https://img.shields.io/gem/v/na.svg)](https://rubygems.org/gems/curlyq)
  [![GitHub license](https://img.shields.io/github/license/ttscoff/curlyq.svg)](./LICENSE.txt)
@@ -7,11 +7,13 @@

  _If you find this useful, feel free to [buy me some coffee][donate]._

+ [donate]: https://brettterpstra.com/donate

- The current version of `curlyq` is 0.0.2
+
+ The current version of `curlyq` is 0.0.3
  .

- `curlyq` is a command that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.
+ CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.

  [github]: https://github.com/ttscoff/curlyq/

@@ -24,11 +26,15 @@ If you're using Homebrew, you have the option to install via [brew-gem](https://
  brew install brew-gem
  brew gem install curlyq

- If you don't have Ruby/RubyGems, you can install them pretty easily with Homebrew, rvm, or asdf.
+ If you don't have Ruby/RubyGems, you can install them pretty easily with [Homebrew], [rvm], or [asdf].
+
+ [Homebrew]: https://brew.sh/ "Homebrew—The Missing Package Manager for macOS (or Linux)"
+ [rvm]: https://rvm.io/ "Ruby Version Manager (RVM)"
+ [asdf]: https://github.com/asdf-vm/asdf "asdf-vm/asdf: Extendable version manager with support for ..."

  ### Usage

- Run `curlyq help` for a list of commands. Run `curlyq help SUBCOMMAND` for details on a particular subcommand and its options.
+ Run `curlyq help` for a list of subcommands. Run `curlyq help SUBCOMMAND` for details on a particular subcommand and its options.

  ```
  NAME
@@ -38,7 +44,7 @@ SYNOPSIS
  curlyq [global options] command [command options] [arguments...]

  VERSION
- 0.0.2
+ 0.0.3

  GLOBAL OPTIONS
  --help - Show this message
@@ -61,7 +67,7 @@ COMMANDS

  #### Commands

- curlyq makes use of subcommands, e.g. `curlyq html` or `curlyq extract`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command.
+ curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.

  ##### extract

@@ -135,6 +141,7 @@ SYNOPSIS
  COMMAND OPTIONS
  -c, --[no-]compressed - Expect compressed results
  --[no-]clean - Remove extra whitespace from results
+ -h, --header=arg - Define a header to send as key=value (may be used more than once, default: none)
  -t, --type=arg - Type of images to return (img, srcset, opengraph, all) (may be used more than once, default: ["all"])
  ```

@@ -193,6 +200,8 @@ COMMAND OPTIONS

  ##### screenshot

+ Full-page screenshots require Firefox to be installed and specified with `--browser firefox`.
+
  ```
  NAME
  screenshot - Save a screenshot of a URL
@@ -203,6 +212,7 @@ SYNOPSIS

  COMMAND OPTIONS
  -b, --browser=arg - Browser to use (firefox, chrome) (default: chrome)
+ -h, --header=arg - Define a header to send as key=value (may be used more than once, default: none)
  -o, --out, --file=arg - File destination (default: none)
  -t, --type=arg - Type of screenshot to save (full (requires firefox), print, visible) (default: full)
  ```
@@ -230,4 +240,4 @@ PayPal link: [paypal.me/ttscoff](https://paypal.me/ttscoff)

  ## Changelog

- See [CHANGELOG.md](https://github.com/ttscoff/na_gem/blob/master/CHANGELOG.md)
+ See [CHANGELOG.md](https://github.com/ttscoff/curlyq/blob/main/CHANGELOG.md)
data/bin/curlyq CHANGED
@@ -110,12 +110,13 @@ command %i[html curl] do |c|
  output = []

  urls.each do |url|
- res = Curl::Html.new(url, browser: options[:browser], fallback: options[:fallback],
- headers: headers, headers_only: options[:info],
- compressed: options[:compressed], clean: options[:clean],
- ignore_local_links: options[:ignore_relative],
- ignore_fragment_links: options[:ignore_fragments],
- external_links_only: options[:external_links_only])
+ res = Curl::Html.new(url, { browser: options[:browser], fallback: options[:fallback],
+ headers: headers, headers_only: options[:info],
+ compressed: options[:compressed], clean: options[:clean],
+ ignore_local_links: options[:ignore_relative],
+ ignore_fragment_links: options[:ignore_fragments],
+ external_links_only: options[:external_links_only] })
+ res.curl

  if options[:info]
  output.push(res.headers)
@@ -156,12 +157,18 @@ command :screenshot do |c|
  c.desc 'File destination'
  c.flag %i[o out file]

+ c.desc 'Define a header to send as key=value'
+ c.flag %i[h header], multiple: true
+
  c.action do |_, options, args|
  urls = args.join(' ').split(/[, ]+/)
+ headers = break_headers(options[:header])

  urls.each do |url|
  c = Curl::Html.new(url)
- c.screenshot(options[:out], browser: options[:browser], type: options[:type])
+ c.headers = headers
+ c.browser = options[:browser]
+ c.screenshot(options[:out], type: options[:type])
  end
  end
  end
@@ -185,7 +192,11 @@ command :json do |c|
  output = []

  urls.each do |url|
- res = Curl::Json.new(url, headers: headers, compressed: options[:compressed], symbolize_names: false)
+ res = Curl::Json.new(url)
+ res.request_headers = headers
+ res.compressed = options[:compressed]
+ res.symbolize_names = false
+ res.curl

  json = res.json

@@ -235,8 +246,9 @@ command :extract do |c|
  output = []

  urls.each do |url|
- res = Curl::Html.new(url, headers: headers, headers_only: false,
- compressed: options[:compressed], clean: options[:clean])
+ res = Curl::Html.new(url, { headers: headers, headers_only: false,
+ compressed: options[:compressed], clean: options[:clean] })
+ res.curl
  extracted = res.extract(options[:before], options[:after])
  extracted.strip_tags! if options[:strip]
  output.concat(extracted)
@@ -271,8 +283,9 @@ command :tags do |c|
  output = []

  urls.each do |url|
- res = Curl::Html.new(url, headers: headers, headers_only: options[:headers],
- compressed: options[:compressed], clean: options[:clean])
+ res = Curl::Html.new(url, { headers: headers, headers_only: options[:headers],
+ compressed: options[:compressed], clean: options[:clean] })
+ res.curl
  output = []
  if options[:search]
  output = res.tags.search(options[:search])
@@ -299,15 +312,20 @@ command :images do |c|
  c.desc 'Remove extra whitespace from results'
  c.switch %i[clean]

+ c.desc 'Define a header to send as key=value'
+ c.flag %i[h header], multiple: true
+
  c.action do |global_options, options, args|
  urls = args.join(' ').split(/[, ]+/)
+ headers = break_headers(options[:header])

  output = []

  types = options[:type].join(' ').split(/[ ,]+/).map(&:normalize_image_type)

  urls.each do |url|
- res = Curl::Html.new(url, compressed: options[:compressed], clean: options[:clean])
+ res = Curl::Html.new(url, { compressed: options[:compressed], clean: options[:clean] })
+ res.curl
  output.concat(res.images(types: types))
  end

@@ -339,10 +357,13 @@ command :links do |c|
  output = []

  urls.each do |url|
- res = Curl::Html.new(url, compressed: options[:compressed], clean: options[:clean],
- ignore_local_links: options[:ignore_relative],
- ignore_fragment_links: options[:ignore_fragments],
- external_links_only: options[:external_links_only])
+ res = Curl::Html.new(url, {
+ compressed: options[:compressed], clean: options[:clean],
+ ignore_local_links: options[:ignore_relative],
+ ignore_fragment_links: options[:ignore_fragments],
+ external_links_only: options[:external_links_only]
+ })
+ res.curl

  if options[:query]
  query = options[:query] =~ /^links/ ? options[:query] : "links#{options[:query]}"
@@ -371,7 +392,8 @@ command :headlinks do |c|
  output = []

  urls.each do |url|
- res = Curl::Html.new(url, compressed: options[:compressed], clean: options[:clean])
+ res = Curl::Html.new(url, { compressed: options[:compressed], clean: options[:clean] })
+ res.curl

  if options[:query]
  query = options[:query] =~ /^links/ ? options[:query] : "links#{options[:query]}"
@@ -420,7 +442,8 @@ command :scrape do |c|
  driver.get url
  res = driver.page_source

- res = Curl::Html.new(nil, source: res, clean: options[:clean])
+ res = Curl::Html.new(nil, { source: res, clean: options[:clean] })
+ res.curl
  if options[:search]
  out = res.search(options[:search])
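Several subcommands above now call `break_headers(options[:header])` to turn repeated `-h key=value` flags into a headers hash. The helper's actual implementation is not part of this diff; the sketch below is a plausible stand-in showing the kind of parsing such a helper performs:

```ruby
# Hypothetical sketch of a break_headers-style helper: converts an array
# of "key=value" strings (as collected by a repeatable CLI flag) into a
# hash. The real curlyq implementation is not shown in this diff.
def break_headers(headers)
  Array(headers).each_with_object({}) do |header, hash|
    key, value = header.split('=', 2) # split only on the first '='
    next if value.nil?                # skip malformed entries without '='

    hash[key.strip] = value.strip
  end
end

break_headers(['User-Agent=curlyq', 'Accept=application/json'])
```

Splitting on the first `=` only matters for values that themselves contain `=`, such as signed tokens.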
@@ -10,8 +10,11 @@ module Curl

  # Class for CURLing an HTML page
  class Html
- attr_reader :url, :code, :headers, :meta, :links, :head, :body,
- :source, :title, :description, :body_links, :body_images, :clean
+ attr_accessor :settings, :browser, :source, :headers, :headers_only, :compressed, :clean, :fallback,
+ :ignore_local_links, :ignore_fragment_links, :external_links_only
+
+ attr_reader :url, :code, :meta, :links, :head, :body,
+ :title, :description, :body_links, :body_images

  def to_data(url: nil)
  {
@@ -20,9 +23,9 @@ module Curl
  headers: @headers,
  meta: @meta,
  meta_links: @links,
- head: @head,
- body: @body,
- source: @source,
+ head: @clean ? @head&.strip&.clean : @head,
+ body: @clean ? @body&.strip&.clean : @body,
+ source: @clean ? @source&.strip&.clean : @source,
  title: @title,
  description: @description,
  links: @body_links,
@@ -33,29 +36,48 @@ module Curl
  ##
  ## Create a new page object from a URL
  ##
- ## @param url [String] The url
- ## @param headers [Hash] The headers to use in the curl call
- ## @param headers_only [Boolean] Return headers only
- ## @param compressed [Boolean] Expect compressed result
+ ## @param url [String] The url
+ ## @param options [Hash] The options
+ ##
+ ## @option options :browser [Symbol] the browser to use instead of curl (:chrome, :firefox)
+ ## @option options :source [String] source provided instead of curl
+ ## @option options :headers [Hash] headers to send in the request
+ ## @option options :headers_only [Boolean] whether to return just response headers
+ ## @option options :compressed [Boolean] expect compressed response
+ ## @option options :clean [Boolean] clean whitespace from response
+ ## @option options :fallback [Symbol] browser to fall back to if curl doesn't work (:chrome, :firefox)
+ ## @option options :ignore_local_links [Boolean] when collecting links, ignore local/relative links
+ ## @option options :ignore_fragment_links [Boolean] when collecting links, ignore links that are just #fragments
+ ## @option options :external_links_only [Boolean] only collect links outside of current site
  ##
  ## @return [HTMLCurl] new page object
  ##
- def initialize(url, browser: nil, source: nil, headers: nil,
- headers_only: false, compressed: false, clean: false, fallback: false,
- ignore_local_links: false, ignore_fragment_links: false, external_links_only: false)
- @clean = clean
- @ignore_local_links = ignore_local_links
- @ignore_fragment_links = ignore_fragment_links
- @external_links_only = external_links_only
+ def initialize(url, options = {})
+ @browser = options[:browser] || :none
+ @source = options[:source]
+ @headers = options[:headers] || {}
+ @headers_only = options[:headers_only]
+ @compressed = options[:compressed]
+ @clean = options[:clean]
+ @fallback = options[:fallback]
+ @ignore_local_links = options[:ignore_local_links]
+ @ignore_fragment_links = options[:ignore_fragment_links]
+ @external_links_only = options[:external_links_only]
+
  @curl = TTY::Which.which('curl')
  @url = url
- res = if url && browser && browser != :none
- source = curl_dynamic_html(url, browser, headers)
- curl_html(nil, source: source, headers: headers)
+ end
+
+ def curl
+ res = if @url && @browser && @browser != :none
+ source = curl_dynamic_html
+ curl_html(nil, source: source, headers: @headers)
  elsif url.nil? && !source.nil?
- curl_html(nil, source: source, headers: headers, headers_only: headers_only, compressed: compressed, fallback: false)
+ curl_html(nil, source: @source, headers: @headers, headers_only: @headers_only,
+ compressed: @compressed, fallback: false)
  else
- curl_html(url, headers: headers, headers_only: headers_only, compressed: compressed, fallback: fallback)
+ curl_html(@url, headers: @headers, headers_only: @headers_only,
+ compressed: @compressed, fallback: @fallback)
  end
  @url = res[:url]
  @code = res[:code]
@@ -82,10 +104,10 @@ module Curl
  ## save (:full_page,
  ## :print_page, :visible)
  ##
- def screenshot(destination = nil, browser: :chrome, type: :full_page)
+ def screenshot(destination = nil, type: :full_page)
  full_page = type.to_sym == :full_page
  print_page = type.to_sym == :print_page
- save_screenshot(destination, browser: browser, type: type)
+ save_screenshot(destination, type: type)
  end

  ##
@@ -297,7 +319,7 @@ module Curl

  {
  tag: el.name,
- source: el.to_html,
+ source: @clean ? el.to_html&.strip&.clean : el.to_html,
  attrs: attributes,
  content: @clean ? el.text&.strip&.clean : el.text.strip,
  tags: recurse_children(el)
@@ -511,14 +533,14 @@ module Curl
  ##
  ## @return [String] page source
  ##
- def curl_dynamic_html(url, browser, headers)
- browser = browser.normalize_browser_type if browser.is_a?(String)
+ def curl_dynamic_html
+ browser = @browser.normalize_browser_type if @browser.is_a?(String)
  res = nil

  driver = Selenium::WebDriver.for browser
  driver.manage.timeouts.implicit_wait = 4
  begin
- driver.get url
+ driver.get @url
  res = driver.page_source
  ensure
  driver.quit
@@ -534,7 +556,7 @@ module Curl
  ## @param browser [Symbol] The browser (:chrome or :firefox)
  ## @param type [Symbol] The type of screenshot (:full_page, :print_page, or :visible)
  ##
- def save_screenshot(destination = nil, browser: :chrome, type: :full_page)
+ def save_screenshot(destination = nil, type: :full_page)
  raise 'No URL provided' if url.nil?

  raise 'No file destination provided' if destination.nil?
@@ -554,7 +576,7 @@ module Curl
  "#{destination.sub(/\.(pdf|jpe?g|png)$/, '')}.png"
  end

- driver = Selenium::WebDriver.for browser
+ driver = Selenium::WebDriver.for @browser
  driver.manage.timeouts.implicit_wait = 4
  begin
  driver.get @url
@@ -587,38 +609,38 @@ module Curl
  headers_only: false, compressed: false, fallback: false)
  unless url.nil?
  flags = 'SsL'
- flags += headers_only ? 'I' : 'i'
+ flags += @headers_only ? 'I' : 'i'
  agents = [
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.1',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.3',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.'
  ]
- headers = headers.nil? ? '' : headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')
- compress = compressed ? '--compressed' : ''
- source = `#{@curl} -#{flags} #{compress} #{headers} '#{url}' 2>/dev/null`
+ headers = @headers.nil? ? '' : @headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')
+ compress = @compressed ? '--compressed' : ''
+ @source = `#{@curl} -#{flags} #{compress} #{headers} '#{@url}' 2>/dev/null`
  agent = 0
  while source.nil? || source.empty?
- source = `#{@curl} -#{flags} #{compress} -A "#{agents[agent]}" #{headers} '#{url}' 2>/dev/null`
+ source = `#{@curl} -#{flags} #{compress} -A "#{agents[agent]}" #{headers} '#{@url}' 2>/dev/null`
  break if agent >= agents.count - 1
  end

- unless $?.success? || fallback
- warn "Error curling #{url}"
+ unless $?.success? || @fallback
+ warn "Error curling #{@url}"
  Process.exit 1
  end

- if fallback && (source.nil? || source.empty?)
- source = curl_dynamic_html(url, fallback, headers)
+ if @fallback && (@source.nil? || @source.empty?)
+ @source = curl_dynamic_html(@url, @fallback, @headers)
  end
  end

  return false if source.nil? || source.empty?

- source.strip!
+ @source.strip!

- headers = { 'location' => url }
- lines = source.split(/\r\n/)
+ headers = { 'location' => @url }
+ lines = @source.split(/\r\n/)
  code = lines[0].match(/(\d\d\d)/)[1]
  lines.shift
  lines.each_with_index do |line, idx|
@@ -626,7 +648,7 @@ module Curl
  m = Regexp.last_match
  headers[m[1]] = m[2]
  else
- source = lines[idx..].join("\n")
+ @source = lines[idx..].join("\n")
  break
  end
  end
@@ -636,21 +658,21 @@ module Curl
  end

  if headers['content-type'] =~ /json/
- return { url: url, code: code, headers: headers, meta: nil, links: nil,
- head: nil, body: source.strip, source: source.strip, body_links: nil, body_images: nil }
+ return { url: @url, code: code, headers: headers, meta: nil, links: nil,
+ head: nil, body: @source.strip, source: @source.strip, body_links: nil, body_images: nil }
  end

  head = source.match(%r{(?<=<head>)(.*?)(?=</head>)}mi)

  if head.nil?
- { url: url, code: code, headers: headers, meta: nil, links: nil, head: nil, body: source.strip,
- source: source.strip, body_links: nil, body_images: nil }
+ { url: @url, code: code, headers: headers, meta: nil, links: nil, head: nil, body: @source.strip,
+ source: @source.strip, body_links: nil, body_images: nil }
  else
  meta = meta_tags(head[1])
  links = link_tags(head[1])
- body = source.match(%r{<body.*?>(.*?)</body>}mi)[1]
- { url: url, code: code, headers: headers, meta: meta, links: links, head: head[1], body: body,
- source: source.strip, body_links: body_links, body_images: body_images }
+ body = @source.match(%r{<body.*?>(.*?)</body>}mi)[1]
+ { url: @url, code: code, headers: headers, meta: meta, links: links, head: head[1], body: body,
+ source: @source.strip, body_links: body_links, body_images: body_images }
  end
  end
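`curl_html` builds its shell command by mapping the headers hash onto repeated `-H` flags, as in the `@headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')` line above. Extracted into a standalone helper (string formatting only; no request is made):

```ruby
# Mirrors the header-to-flag mapping used in curl_html: each key/value
# pair in the hash becomes one quoted -H argument for the curl command.
def header_flags(headers)
  headers.nil? ? '' : headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')
end

header_flags({ 'Accept' => 'text/html', 'X-Token' => 'abc' })
# => '-H "Accept: text/html" -H "X-Token: abc"'
```

Note that values are interpolated into a shell string, so headers containing double quotes would need escaping in a hardened version.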
@@ -3,7 +3,11 @@
  module Curl
  # Class for CURLing a JSON response
  class Json
- attr_reader :url, :code, :json, :headers
+ attr_accessor :url
+
+ attr_writer :compressed, :request_headers, :symbolize_names
+
+ attr_reader :code, :json, :headers

  def to_data
  {
@@ -23,9 +27,17 @@ module Curl
  ##
  ## @return [Curl::Json] Curl::Json object with url, code, parsed json, and response headers
  ##
- def initialize(url, headers: nil, compressed: false, symbolize_names: false)
+ def initialize(url, options = {})
+ @url = url
+ @request_headers = options[:headers]
+ @compressed = options[:compressed]
+ @symbolize_names = options[:symbolize_names]
+
  @curl = TTY::Which.which('curl')
- page = curl_json(url, headers: headers, compressed: compressed, symbolize_names: symbolize_names)
+ end
+
+ def curl
+ page = curl_json

  raise "Error retrieving #{url}" if page.nil? || page.empty?

@@ -60,7 +72,7 @@ module Curl
  ##
  ## @return [Hash] hash of url, code, headers, and parsed json
  ##
- def curl_json(url, headers: nil, compressed: false, symbolize_names: false)
+ def curl_json
  flags = 'SsLi'
  agents = [
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.1',
@@ -69,12 +81,12 @@ module Curl
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.'
  ]

- headers = headers.nil? ? '' : headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')
- compress = compressed ? '--compressed' : ''
- source = `#{@curl} -#{flags} #{compress} #{headers} '#{url}' 2>/dev/null`
+ headers = @headers.nil? ? '' : @headers.map { |h, v| %(-H "#{h}: #{v}") }.join(' ')
+ compress = @compressed ? '--compressed' : ''
+ source = `#{@curl} -#{flags} #{compress} #{headers} '#{@url}' 2>/dev/null`
  agent = 0
  while source.nil? || source.empty?
- source = `#{@curl} -#{flags} #{compress} -A "#{agents[agent]}" #{headers} '#{url}' 2>/dev/null`
+ source = `#{@curl} -#{flags} #{compress} -A "#{agents[agent]}" #{headers} '#{@url}' 2>/dev/null`
  break if agent >= agents.count - 1
  end

@@ -99,9 +111,9 @@ module Curl
  json = source.strip.force_encoding('utf-8')
  begin
  json.gsub!(/[\u{1F600}-\u{1F6FF}]/, '')
- { url: url, code: code, headers: headers, json: JSON.parse(json, symbolize_names: symbolize_names) }
- rescue StandardError => e
- { url: url, code: code, headers: headers, json: nil}
+ { url: @url, code: code, headers: headers, json: JSON.parse(json, symbolize_names: @symbolize_names) }
+ rescue StandardError
+ { url: @url, code: code, headers: headers, json: nil }
  end
  end
  end
data/lib/curly/string.rb CHANGED
@@ -7,7 +7,7 @@
  ##
  class ::String
  def clean
- gsub(/[\n ]+/m, ' ').gsub(/> +</, '><')
+ gsub(/[\t\n ]+/m, ' ').gsub(/> +</, '><')
  end

  ##
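The change above adds `\t` to the character class, so runs of tabs are collapsed along with newlines and spaces (the "Also remove tabs when cleaning whitespace" changelog item). The patched method in action:

```ruby
# The 0.0.3 version of String#clean: collapse runs of tabs, newlines,
# and spaces to a single space, then close up any remaining gap
# between adjacent tags.
class ::String
  def clean
    gsub(/[\t\n ]+/m, ' ').gsub(/> +</, '><')
  end
end

"<ul>\n\t<li>one</li>\n\t<li>two</li>\n</ul>".clean
# => '<ul><li>one</li><li>two</li></ul>'
```

Under 0.0.2, the tabs in that input would have survived, leaving `\t` characters between the tags.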
data/lib/curly/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Curly
- VERSION = '0.0.2'
+ VERSION = '0.0.3'
  end
data/src/_README.md CHANGED
@@ -1,4 +1,4 @@
- <!--README--><!--GITHUB--># curlyq
+ <!--README--><!--GITHUB--># CurlyQ

  [![Gem](https://img.shields.io/gem/v/na.svg)](https://rubygems.org/gems/curlyq)
  [![GitHub license](https://img.shields.io/github/license/ttscoff/curlyq.svg)](./LICENSE.txt)
@@ -6,11 +6,13 @@
  **A command line helper for curl and web scraping**

  _If you find this useful, feel free to [buy me some coffee][donate]._
+
+ [donate]: https://brettterpstra.com/donate
  <!--END GITHUB-->

- The current version of `curlyq` is <!--VER--><!--END VER-->.
+ The current version of `curlyq` is <!--VER-->0.0.2<!--END VER-->.

- `curlyq` is a command that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.
+ CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.

  [github]: https://github.com/ttscoff/curlyq/

@@ -23,11 +25,15 @@ If you're using Homebrew, you have the option to install via [brew-gem](https://
  brew install brew-gem
  brew gem install curlyq

- If you don't have Ruby/RubyGems, you can install them pretty easily with Homebrew, rvm, or asdf.
+ If you don't have Ruby/RubyGems, you can install them pretty easily with [Homebrew], [rvm], or [asdf].
+
+ [Homebrew]: https://brew.sh/ "Homebrew—The Missing Package Manager for macOS (or Linux)"
+ [rvm]: https://rvm.io/ "Ruby Version Manager (RVM)"
+ [asdf]: https://github.com/asdf-vm/asdf "asdf-vm/asdf: Extendable version manager with support for ..."

  ### Usage

- Run `curlyq help` for a list of commands. Run `curlyq help SUBCOMMAND` for details on a particular subcommand and its options.
+ Run `curlyq help` for a list of subcommands. Run `curlyq help SUBCOMMAND` for details on a particular subcommand and its options.

  ```
  @cli(bundle exec bin/curlyq help)
@@ -35,7 +41,7 @@ Run `curlyq help` for a list of commands. Run `curlyq help SUBCOMMAND` for detai

  #### Commands

- curlyq makes use of subcommands, e.g. `curlyq html` or `curlyq extract`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command.
+ curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.

  ##### extract

@@ -82,6 +88,8 @@ curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq ext

  ##### screenshot

+ Full-page screenshots require Firefox to be installed and specified with `--browser firefox`.
+
  ```
  @cli(bundle exec bin/curlyq help screenshot)
  ```
@@ -97,5 +105,5 @@ PayPal link: [paypal.me/ttscoff](https://paypal.me/ttscoff)

  ## Changelog

- See [CHANGELOG.md](https://github.com/ttscoff/na_gem/blob/master/CHANGELOG.md)
+ See [CHANGELOG.md](https://github.com/ttscoff/curlyq/blob/main/CHANGELOG.md)
  <!--END GITHUB--><!--END README-->
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: curlyq
  version: !ruby/object:Gem::Version
-   version: 0.0.2
+   version: 0.0.3
  platform: ruby
  authors:
  - Brett Terpstra
@@ -137,6 +137,7 @@ extra_rdoc_files:
  - README.rdoc
  - curlyq.rdoc
  files:
+ - ".github/FUNDING.yml"
  - ".gitignore"
  - CHANGELOG.md
  - Gemfile