RubyGems - curlyq - Versions diffs - 0.0.8 → 0.0.10 - Mend

curlyq 0.0.8 → 0.0.10

Files changed (20)

checksums.yaml +4 -4
data/CHANGELOG.md +20 -0
data/Gemfile.lock +1 -1
data/README.md +20 -4
data/Rakefile +17 -0
data/bin/curlyq +8 -13
data/lib/curly/array.rb +40 -4
data/lib/curly/curl/html.rb +22 -0
data/lib/curly/hash.rb +128 -31
data/lib/curly/numeric.rb +11 -0
data/lib/curly/string.rb +27 -3
data/lib/curly/version.rb +3 -1
data/lib/curly.rb +1 -0
data/src/_README.md +19 -3
data/test/curlyq_headlinks_test.rb +3 -2
data/test/curlyq_html_test.rb +3 -3
data/test/curlyq_scrape_test.rb +32 -2
data/test/curlyq_tags_test.rb +12 -4
data/test/helpers/curlyq-helpers.rb +1 -0
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: d3e32b382d7318b067ee3fb22f2e9057cf6aa9facfac41c74a0ebb5d4fb4743d
-  data.tar.gz: d379da3f0db621052e61230356f5c58b587eefccbb0a4c997216516a4159b44a
+  metadata.gz: 6109483b8869733f9e21ecab9bc8bcda0aa3b58ca1f13f9b96fe7739d019df1f
+  data.tar.gz: 98a8d46fe68bc88ea030dfb8e04262fbab5418005390ff79693d6f636a3bf276
 SHA512:
-  metadata.gz: ae63654deb943771e5f6f3aa0f6a037b1015336abbd696a8ce77acc22f361a3b6a18b03f3b7d02e5c7d5dcaa8d3608248bed240679acfce22ba2e462d84b529f
-  data.tar.gz: 481f8499e45a65cb3981fcf20ef7fc9f01f97a1b7014c6566aa2f3bf7a6611fd2d5d35f78e742e4063eea192b938c0642f0ca764e5032f330778d2815a191a41
+  metadata.gz: 1d75b4af2d6c1fadb83501fa707184ef41d061c08de14666b86d296048e8f21540fe2ad53a79985d5b042c93fa629cdbe8d101828edbb02832d1b55b920d5834
+  data.tar.gz: 238855918e3e765a2edf1864dd2663a959b099cfa5f1b89942f94eb20ba428c1700adee85590879662f0cf8de659328fbe752e8648ee210eefe0769639c57da2

data/CHANGELOG.md CHANGED

@@ -1,3 +1,23 @@
+### 0.0.10
+2024-01-17 13:50
+#### IMPROVED
+- Update YARD documentation
+- Breaking change, ensure all return types are Arrays, even with single objects, to aid in scriptability
+- Screenshot test suite
+### 0.0.9
+2024-01-16 12:38
+#### IMPROVED
+- You can now use dot syntax inside of a square bracket comparison in --query (`[attrs.id*=what]`)
+- *=, ^=, $=, and == work with array values
+- [] comparisons with no comparison, e.g. [attrs.id], will return every match that has that element populated
 ### 0.0.8
 2024-01-15 16:45
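The 0.0.10 entry above is a breaking change for scripts that expected a bare object when there was a single match. A minimal sketch of a consumer that tolerates both shapes follows; the helper name `normalize_result` and the sample JSON strings are illustrative assumptions, not part of curlyq.

```ruby
require 'json'

# Hypothetical helper: always hand callers an Array, whichever
# curlyq version produced the JSON.
def normalize_result(json_text)
  parsed = JSON.parse(json_text)
  parsed.is_a?(Array) ? parsed : [parsed]
end

# Illustrative payloads only (not captured from a real run).
old_style = '{"rel":"stylesheet","href":"/screen.css"}'   # single match before 0.0.10
new_style = '[{"rel":"stylesheet","href":"/screen.css"}]' # single match from 0.0.10 on

puts normalize_result(old_style).first['rel']
puts normalize_result(new_style).first['rel']
```

With a wrapper like this, downstream code can index with `[0]` or iterate without checking the result type first.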

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    curlyq (0.0.8)
+    curlyq (0.0.10)
       gli (~> 2.21.0)
       nokogiri (~> 1.16.0)
       selenium-webdriver (~> 4.16.0)

data/README.md CHANGED

@@ -10,10 +10,13 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
 [donate]: https://brettterpstra.com/donate
-The current version of `curlyq` is 0.0.8
+[jq]: https://github.com/jqlang/jq "Command-line JSON processor"
+[yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
+The current version of `curlyq` is 0.0.10
 .
-CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.
+CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
 [github]: https://github.com/ttscoff/curlyq/
@@ -44,7 +47,7 @@ SYNOPSIS
     curlyq [global options] command [command options] [arguments...]
 VERSION
-    0.0.8
+    0.0.10
 GLOBAL OPTIONS
     --help          - Show this message
@@ -71,6 +74,9 @@ You can shape the results using `--search` (`-s`) and `--query` (`-q`) on some c
 A search uses either CSS or XPath syntax to locate elements. For example, if you wanted to locate all of the `<article>` elements with a class of `post` inside of the div with an id of `main`, you would run `--search '#main article.post'`. Searches can target tags, ids, and classes, and can accept `>` to target direct descendants. You can also use XPaths, but I hate those so I'm not going to document them.
+> I've tried to make the query function useful, but if you want to do any kind of advanced shaping, you're better off piping the JSON output to [jq] or [yq].
 Queries are specifically for shaping CurlyQ output. If you're using the `html` command, it returns a key called `images`, so you can target just the images in the response with `-q 'images'`. The queries accept array syntax, so to get the first image, you would use `-q 'images[0]'`. Ranges are accepted as well, so `-q 'images[1..4]'` will return the 2nd through 5th images found on the page. You can also do comparisons, e.g. `-q 'images[rel=me]'` to target only images with a `rel` attribute of `me`.
 The comparisons for the query flag are:
@@ -84,6 +90,16 @@ The comparisons for the query flag are:
 - `^=` starts with text
 - `$=` ends with text
+Comparisons can be numeric or string comparisons. A numeric comparison like `curlyq images -q '[width>500]' URL` would return all of the images on the page with a width attribute greater than 500.
+You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
+If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
+    curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
+    <h3 id="whats-next">What’s Next</h3>
 #### Commands
 curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.
@@ -440,7 +456,7 @@ COMMAND OPTIONS
 Return a hierarchy of all tags in a page. Use `-t` to limit to a specific tag.
-    curlyq tags --search '#main .post h3' -q 'attrs[id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
+    curlyq tags --search '#main .post h3' -q '[attrs.id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
     [
       {
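The README text above describes numeric comparisons such as `[width>500]`. The following is a rough sketch of how such a comparison could be evaluated through an operator whitelist and `public_send`, rather than by building a string for `eval`. The names `NUMERIC_OPERATORS` and `numeric_match?` and the sample image hashes are assumptions made for illustration; this is not the gem's implementation.

```ruby
# Hypothetical whitelist mapping query operators to Ruby comparison methods.
NUMERIC_OPERATORS = {
  '>' => :>, '<' => :<, '>=' => :>=, '<=' => :<=, '=' => :==, '==' => :==
}.freeze

# Compare an attribute (often a String such as "640") against a number.
# Unknown operators simply do not match.
def numeric_match?(attribute, operator, expected)
  method_name = NUMERIC_OPERATORS.fetch(operator) { return false }
  attribute.to_i.public_send(method_name, expected)
end

# Illustrative data shaped like image results with a width attribute.
images = [
  { 'src' => 'a.png', 'width' => '640' },
  { 'src' => 'b.png', 'width' => '320' }
]

wide = images.select { |img| numeric_match?(img['width'], '>', 500) }
p wide.map { |img| img['src'] }
```

Because only whitelisted operators ever reach `public_send`, text typed into the query is never executed as code.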

data/Rakefile CHANGED

@@ -56,6 +56,23 @@ task :test, :pattern, :threads, :max_tests do |_, args|
   ThreadedTests.new.run(pattern: pattern, max_threads: args[:threads].to_i, max_tests: args[:max_tests])
 end
+desc 'Install current gem in all versions of asdf-controlled ruby'
+task :install do
+  Rake::Task['clobber'].invoke
+  Rake::Task['package'].invoke
+  Dir.chdir 'pkg'
+  file = Dir.glob('*.gem').last
+  current_ruby = `asdf current ruby`.match(/(\d+\.\d+\.\d+)/)[1]
+  `asdf list ruby`.split.map { |ruby| ruby.strip.sub(/^\*/, '') }.each do |ruby|
+    `asdf shell ruby #{ruby}`
+    puts `gem install #{file}`
+  end
+  `asdf shell ruby #{current_ruby}`
+end
 desc 'Development version check'
 task :ver do
   gver = `git ver`
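One thing to keep in mind with the `:install` task above: each backtick call runs in its own child process, so a version selected by one call is unlikely to carry over to the next `gem install`. A hedged sketch of an alternative passes the version through the environment of each install command. It assumes asdf honors an `ASDF_RUBY_VERSION` override; the helper names are hypothetical.

```ruby
# Hypothetical helper: environment plus argv for installing a gem under
# one asdf-managed Ruby. ASDF_RUBY_VERSION is assumed to be respected by asdf.
def install_command(ruby_version, gem_file)
  [{ 'ASDF_RUBY_VERSION' => ruby_version }, 'gem', 'install', gem_file]
end

# Run the install once per Ruby version. The env hash only affects the
# child process started by this particular system call.
def install_everywhere(versions, gem_file)
  versions.each do |version|
    env, *argv = install_command(version, gem_file)
    system(env, *argv)
  end
end
```

A call such as `install_everywhere(%w[3.2.2 3.3.0], 'curlyq-0.0.10.gem')` (version numbers chosen only as examples) would then need no restore step, since the parent process environment is never changed.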

data/bin/curlyq CHANGED

@@ -49,7 +49,7 @@ end
 def self.print_out(output, yaml, raw: false, pretty: true)
   output = output.to_data if output.respond_to?(:to_data)
   # Was intended to flatten single responses, but not getting an array back is unpredictable
-  # output = output[0] if output&.is_a?(Array) && output.count == 1
+  output = output.clean_output
   if output.is_a?(String)
     print output
   elsif raw
@@ -130,13 +130,13 @@ command %i[html curl] do |c|
         out = res.parse(source)
         if options[:query]
-          out = out.to_data(url: url, clean: options[:clean]).dot_query(options[:query])
+          out = out.to_data(url: url, clean: options[:clean]).dot_query(options[:query], full_tag: false)
         else
           out = out.to_data
         end
         output.push([out])
       elsif options[:query]
-        queried = res.to_data.dot_query(options[:query])
+        queried = res.to_data.dot_query(options[:query], full_tag: false)
         output.push(queried) if queried
       else
         output.push(res.to_data(url: url))
@@ -144,14 +144,9 @@ command %i[html curl] do |c|
     end
     output.delete_if(&:nil?)
     output.delete_if(&:empty?)
-    # output = output[0] if output.count == 1
     output.map! { |o| o[options[:raw].to_sym] } if options[:raw]
-    if output.is_a?(Array)
-      while output.length == 1
-        output = output[0]
-      end
-    end
+    output = output.clean_output
     print_out(output, global_options[:yaml], raw: options[:raw], pretty: global_options[:pretty])
   end
@@ -246,7 +241,7 @@ command :json do |c|
       end
     end
-    # output = output[0] if output.count == 1
+    output = output.clean_output
     print_out(output, global_options[:yaml], pretty: global_options[:pretty])
   end
@@ -356,7 +351,7 @@ command :tags do |c|
       end
     end
-    output = output[0] if output.count == 1
+    output = output.clean_output
     if options[:source]
       puts output.to_html
@@ -480,7 +475,7 @@ command :headlinks do |c|
       end
     end
-    output = output[0] if output.count == 1
+    output = output.clean_output
     print_out(output, global_options[:yaml], pretty: global_options[:pretty])
   end
@@ -531,7 +526,7 @@ command :scrape do |c|
     output.delete_if(&:empty?)
-    output = output[0] if output.count == 1
+    output = output.clean_output
     if options[:raw]
       output.map! { |o| o[options[:raw].to_sym] }

data/lib/curly/array.rb CHANGED

@@ -66,7 +66,7 @@ class ::Array
     replace dedup_links
   end
-  #---------------------------------------------------------
+  ##
   ## Run a query on array elements
   ##
   ## @param      path [String] dot.syntax path to compare
@@ -80,17 +80,29 @@ class ::Array
     res
   end
+  ##
+  ## Gets the value of every item in the array
+  ##
+  ## @param      path  The query path (dot syntax)
+  ##
+  ## @return     [Array] array of values
+  ##
   def get_value(path)
-    res = map { |el| el.get_value(path) }
-    res.is_a?(Array) && res.count == 1 ? res[0] : res
+    map { |el| el.get_value(path) }
   end
+  ##
+  ## Convert every item in the array to HTML
+  ##
+  ## @return     [String] Html representation of the object.
+  ##
   def to_html
     map(&:to_html)
   end
   ##
-  ## Test if a tag contains an attribute matching filter queries
+  ## Test if a tag contains an attribute matching filter
+  ## queries
   ##
   ## @param      tag_name    [String] The tag name
   ## @param      classes     [String] The classes to match
@@ -102,6 +114,8 @@ class ::Array
   ## @param      value       [String] The value to match
   ## @param      descendant  [Boolean] Check descendant tags
   ##
+  ## @return     [Boolean] tag matches
+  ##
   def tag_match(tag_name, classes, id, attribute, operator, value, descendant: false)
     tag = self
     keep = true
@@ -155,4 +169,26 @@ class ::Array
       keep
     end
   end
+  ##
+  ## Clean up output, shrink single-item arrays, ensure array output
+  ##
+  ## @return [Array] cleaned up array
+  ##
+  def clean_output
+    output = dup
+    while output.is_a?(Array) && output.count == 1
+      output = output[0]
+    end
+    output.ensure_array
+  end
+  ##
+  ## Ensure that an object is an array
+  ##
+  ## @return     [Array] object as Array
+  ##
+  def ensure_array
+    return self
+  end
 end
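To make the effect of `clean_output` and `ensure_array` easier to see, here is a standalone sketch of the same idea written as a plain method instead of core-class extensions. It is an approximation for illustration and takes its input as an argument; the sample values are made up.

```ruby
# Unwrap nested single-item arrays, then make sure the result is an Array.
def clean_output(value)
  value = value[0] while value.is_a?(Array) && value.count == 1
  value.is_a?(Array) ? value : [value]
end

p clean_output([[{ 'rel' => 'me' }]]) # nested single match collapses to one level
p clean_output('title')               # a bare string gets wrapped
p clean_output([1, 2])                # multi-item arrays pass through unchanged
p clean_output([])                    # empty arrays stay empty
```

The design goal stated in the changelog is scriptability: a caller can always iterate or index with `[0]`, whether there was one match or many.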

data/lib/curly/curl/html.rb CHANGED

@@ -16,6 +16,12 @@ module Curl
     attr_reader :url, :code, :meta, :links, :head, :body,
                 :title, :description, :body_links, :body_images
+    # Convert self to a hash of data
+    #
+    # @param      url   [String]  A base url to fall back to
+    #
+    # @return     [Hash] a hash of data
+    #
     def to_data(url: nil)
       {
         url: @url || url,
@@ -68,12 +74,23 @@ module Curl
       @url = url.nil? ? options[:url] : url
     end
+    ##
+    # Parse raw HTML source instead of curling
+    #
+    # @param      source  [String] The source
+    #
+    #
+    # @return     [Hash] Hash of data after processing #
+    #
     def parse(source)
       @body = source
       { url: @url, code: @code, headers: @headers, meta: @meta, links: @links, head: @head, body: source,
         source: source.strip, body_links: content_links, body_images: content_images }
     end
+    ##
+    ## Curl a url, either with curl or Selenium based on browser settings
+    ##
     def curl
       res = if @url && @browser && @browser != :none
               source = curl_dynamic_html
@@ -283,6 +300,11 @@ module Curl
       output
     end
+    ##
+    ## String representation
+    ##
+    ## @return     String representation of the object.
+    ##
     def to_s
       headers = @headers.nil? ? 0 : @headers.count
       meta = @meta.nil? ? 0 : @meta.count

data/lib/curly/hash.rb CHANGED

@@ -2,6 +2,14 @@
 # Hash helpers
 class ::Hash
+  ## Convert a Curly object to data hash
+  ##
+  ## @return     [Hash] return a hash with keys renamed and
+  ##             cleaned up
+  ##
+  ## @param      url    [String] A url to fall back to
+  ## @param      clean  [Boolean] Clean extra spaces and newlines in sources
+  ##
   def to_data(url: nil, clean: false)
     if key?(:body_links)
       {
@@ -23,17 +31,33 @@ class ::Hash
     end
   end
+  ##
+  ## Return the raw HTML of the object
+  ##
+  ## @return    [String] Html representation of the object.
+  ##
   def to_html
     if key?(:source)
       self[:source]
     end
   end
+  ##
+  ## Get a value from the hash using a dot-syntax query
+  ##
+  ## @param      query  [String] The query (dot notation)
+  ##
+  ## @return     [Object] result of querying the hash
+  ##
   def get_value(query)
     return nil if self.empty?
+    stringify_keys!
     query.split('.').inject(self) do |v, k|
-      k = k.to_i if v.is_a? Array
-      next unless v.key?(k)
+      return v.map { |el| el.get_value(k) } if v.is_a? Array
+      # k = k.to_i if v.is_a? Array
+      next v unless v.key?(k)
       v.fetch(k)
     end
   end
@@ -42,7 +66,7 @@ class ::Hash
   #
   # @param      path  [String] The path
   #
-  # @return     Result of path query
+  # @return     [Object] Result of path query
   #
   def dot_query(path, root = nil, full_tag: true)
     res = stringify_keys
@@ -52,12 +76,17 @@ class ::Hash
       return res.get_value(path)
     end
-    enumerate = false
+    path.gsub!(/\[(.*?)\]/) do
+      inter = Regexp.last_match(1).gsub(/\./, '%')
+      "[#{inter}]"
+    end
     out = []
     q = path.split(/(?<![\d.])\./)
     while q.count.positive?
       pth = q.shift
+      pth.gsub!(/%/, '.')
       return nil if res.nil?
@@ -70,8 +99,8 @@ class ::Hash
       ats = []
       at = []
-      while pth =~ /\[[+&,]?\w+( *[\^*$=<>]=? *\w+)?/
-        m = pth.match(/\[(?<com>[,+&])? *(?<key>\w+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))? */)
+      while pth =~ /\[[+&,]?[\w.]+( *[\^*$=<>]=? *\w+)?/
+        m = pth.match(/\[(?<com>[,+&])? *(?<key>[\w.]+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))? */)
         comp = [m['key'], m['op'], m['val']]
         case m['com']
@@ -82,7 +111,7 @@ class ::Hash
           at.push(comp)
         end
-        pth.sub!(/\[(?<com>[,&+])? *(?<key>\w+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))?/, '[')
+        pth.sub!(/\[(?<com>[,&+])? *(?<key>[\w.]+)( *(?<op>[\^*$=<>]{1,2}) *(?<val>[^,&\]]+))?/, '[')
       end
       ats.push(at) unless at.empty?
       pth.sub!(/\[\]/, '')
@@ -110,11 +139,11 @@ class ::Hash
       pth = ''
       return false if res.nil?
       if ats.count.positive?
         while ats.count.positive?
           atr = ats.shift
           res = [res] if res.is_a?(Hash)
           res.each do |r|
             out.push(full_tag ? tag : r) if evaluate_comp(r, atr)
           end
@@ -140,6 +169,32 @@ class ::Hash
     out
   end
+  ##
+  ## Test if values in an array match an operator
+  ##
+  ## @param      array [Array] The array
+  ## @param      key   [String] The key
+  ## @param      comp  [String] The comparison, e.g. *= or $=
+  ##
+  ## @return [Boolean] true if array contains match
+  def array_match(array, key, comp)
+    keep = false
+    array.each do |el|
+      keep = case comp
+             when /^\^/
+               key =~ /^#{el}/i ? true : false
+             when /^\$/
+               key =~ /#{el}$/i ? true : false
+             when /^\*/
+               key =~ /#{el}/i ? true : false
+             else
+               key =~ /^#{el}$/i ? true : false
+             end
+      break if keep
+    end
+    keep
+  end
   ##
   ## Evaluate a comparison
   ##
@@ -165,40 +220,57 @@ class ::Hash
             end
       r = r.get_value(key.to_s) if key.to_s =~ /\./
-      return r.key?(key) && !r[key].nil? && !r[key].empty? if val.nil?
+      if val.nil?
+        if r.is_a?(Hash)
+          return r.key?(key) && !r[key].nil? && !r[key].empty?
+        elsif r.is_a?(String)
+          return r.nil? ? false : true
+        elsif r.is_a?(Array)
+          return r.empty? ? false : true
+        end
+      end
-      if !r.key?(key)
+      if r.nil?
         keep = false
-      elsif r[key].is_a?(Array)
-        valid = r[key].filter do |k|
-          case a[1]
-          when /^\^/
-            k =~ /^#{a[2]}/i ? true : false
-          when /^\$/
-            k =~ /#{a[2]}$/i ? true : false
-          when /^\*/
-            k =~ /#{a[2]}/i ? true : false
+      elsif r.is_a?(Array)
+        valid = r.filter do |k|
+          if k.is_a? Array
+            array_match(k, a[2], a[1])
           else
-            k =~ /^#{a[2]}$/i ? true : false
+            case a[1]
+            when /^\^/
+              k =~ /^#{a[2]}/i ? true : false
+            when /^\$/
+              k =~ /#{a[2]}$/i ? true : false
+            when /^\*/
+              k =~ /#{a[2]}/i ? true : false
+            else
+              k =~ /^#{a[2]}$/i ? true : false
+            end
           end
         end
         keep = valid.count.positive?
       elsif val.is_a?(Numeric) && a[1] =~ /^[<>=]{1,2}$/
-        k = r[key].to_i
+        k = r.to_i
         comp = a[1] =~ /^=$/ ? '==' : a[1]
         keep = eval("#{k}#{comp}#{val}")
       else
-        keep = case a[1]
-               when /^\^/
-                 r[key] =~ /^#{a[2]}/i ? true : false
-               when /^\$/
-                 r[key] =~ /#{a[2]}$/i ? true : false
-               when /^\*/
-                 r[key] =~ /#{a[2]}/i ? true : false
-               else
-                 r[key] =~ /^#{a[2]}$/i ? true : false
-               end
+        v = r.is_a?(Hash) ? r[key] : r
+        if v.is_a? Array
+          keep = array_match(v, a[2], a[1])
+        else
+          keep = case a[1]
+                 when /^\^/
+                   v =~ /^#{a[2]}/i ? true : false
+                 when /^\$/
+                   v =~ /#{a[2]}$/i ? true : false
+                 when /^\*/
+                   v =~ /#{a[2]}/i ? true : false
+                 else
+                   v =~ /^#{a[2]}$/i ? true : false
+                 end
+        end
       end
       return false unless keep
@@ -306,7 +378,32 @@ class ::Hash
     end
   end
+  ##
+  ## Destructive version of #stringify_keys
+  ##
+  ## @see        #stringify_keys
+  ##
   def stringify_keys!
     replace stringify_keys
   end
+  ##
+  ## Clean up empty arrays and return an array with one or
+  ## more elements
+  ##
+  ## @return     [Array] output array
+  ##
+  def clean_output
+    output = ensure_array
+    output.clean_output
+  end
+  ##
+  ## Ensure that an object is an array
+  ##
+  ## @return     [Array] object as Array
+  ##
+  def ensure_array
+    return [self]
+  end
 end
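The `dot_query` change above protects dots that appear inside square brackets before the path is split on `.`, then restores them per segment. A small sketch of that placeholder technique in isolation follows. The method name `split_query_path` is hypothetical, and unlike the diff it does not mutate the caller's string.

```ruby
# Split a query path on dots, leaving dots inside [...] untouched.
# '%' is the temporary placeholder, as in the diff; a literal '%' in a
# query would be turned into '.', which this sketch does not handle.
def split_query_path(path)
  protected_path = path.gsub(/\[(.*?)\]/) do
    "[#{Regexp.last_match(1).tr('.', '%')}]"
  end
  protected_path.split(/(?<![\d.])\./).map { |segment| segment.tr('%', '.') }
end

p split_query_path('links[attrs.id*=what].source')
p split_query_path('images[1..4]')
p split_query_path('meta.title')
```

The first call keeps `attrs.id` together inside the brackets and still splits off the trailing `source` key, which is the behavior the new `[attrs.id*=what].source` query form relies on.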

data/lib/curly/numeric.rb ADDED

@@ -0,0 +1,11 @@
+# Numeric helpers
+class ::Numeric
+  ##
+  ## Return an array version of self
+  ##
+  ## @return     [Array] self enclosed in an array
+  ##
+  def ensure_array
+    [self]
+  end
+end

data/lib/curly/string.rb CHANGED

@@ -6,6 +6,11 @@
 ## @return     [String] cleaned string
 ##
 class ::String
+  ## Remove extra spaces and newlines, compress space
+  ## between tags
+  ##
+  ## @return     [String] cleaned string
+  ##
   def clean
     gsub(/[\t\n ]+/m, ' ').gsub(/> +</, '><')
   end
@@ -40,7 +45,7 @@ class ::String
   ##
   ## Convert an image type string to a symbol
   ##
-  ## @return     Symbol :srcset, :img, :opengraph, :all
+  ## @return     [Symbol] :srcset, :img, :opengraph, :all
   ##
   def normalize_image_type(default = :all)
     case self.to_s
@@ -58,7 +63,7 @@ class ::String
   ##
   ## Convert a browser type string to a symbol
   ##
-  ## @return     Symbol :chrome, :firefox
+  ## @return     [Symbol] :chrome, :firefox
   ##
   def normalize_browser_type(default = :none)
     case self.to_s
@@ -74,7 +79,7 @@ class ::String
   ##
   ## Convert a screenshot type string to a symbol
   ##
-  ## @return     Symbol :full_page, :print_page, :visible
+  ## @return     [Symbol] :full_page, :print_page, :visible
   ##
   def normalize_screenshot_type(default = :none)
     case self.to_s
@@ -88,4 +93,23 @@ class ::String
       default.is_a?(Symbol) ? default.to_sym : default.normalize_browser_type
     end
   end
+  ##
+  ## Clean up output and return a single-item array
+  ##
+  ## @return     [Array] output array
+  ##
+  def clean_output
+    output = ensure_array
+    output.clean_output
+  end
+  ##
+  ## Ensure that an object is an array
+  ##
+  ## @return     [Array] object as Array
+  ##
+  def ensure_array
+    return [self]
+  end
 end

data/lib/curly/version.rb CHANGED

@@ -1,3 +1,5 @@
+# Top level module for CurlyQ
 module Curly
-  VERSION = '0.0.8'
+  # Current version number
+  VERSION = '0.0.10'
 end

data/lib/curly.rb CHANGED

@@ -4,6 +4,7 @@ require 'curly/version'
 require 'curly/hash'
 require 'curly/string'
 require 'curly/array'
+require 'curly/numeric'
 require 'json'
 require 'yaml'
 require 'uri'

data/src/_README.md CHANGED

@@ -10,9 +10,12 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
 [donate]: https://brettterpstra.com/donate
 <!--END GITHUB-->
-The current version of `curlyq` is <!--VER-->0.0.4<!--END VER-->.
+[jq]: https://github.com/jqlang/jq "Command-line JSON processor"
+[yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
-CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like `jq` to parse the output.
+The current version of `curlyq` is <!--VER-->0.0.9<!--END VER-->.
+CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
 [github]: https://github.com/ttscoff/curlyq/
@@ -45,6 +48,9 @@ You can shape the results using `--search` (`-s`) and `--query` (`-q`) on some c
 A search uses either CSS or XPath syntax to locate elements. For example, if you wanted to locate all of the `<article>` elements with a class of `post` inside of the div with an id of `main`, you would run `--search '#main article.post'`. Searches can target tags, ids, and classes, and can accept `>` to target direct descendants. You can also use XPaths, but I hate those so I'm not going to document them.
+> I've tried to make the query function useful, but if you want to do any kind of advanced shaping, you're better off piping the JSON output to [jq] or [yq].
+<!--JEKYLL{:.warn}-->
 Queries are specifically for shaping CurlyQ output. If you're using the `html` command, it returns a key called `images`, so you can target just the images in the response with `-q 'images'`. The queries accept array syntax, so to get the first image, you would use `-q 'images[0]'`. Ranges are accepted as well, so `-q 'images[1..4]'` will return the 2nd through 5th images found on the page. You can also do comparisons, e.g. `-q 'images[rel=me]'` to target only images with a `rel` attribute of `me`.
 The comparisons for the query flag are:
@@ -58,6 +64,16 @@ The comparisons for the query flag are:
 - `^=` starts with text
 - `$=` ends with text
+Comparisons can be numeric or string comparisons. A numeric comparison like `curlyq images -q '[width>500]' URL` would return all of the images on the page with a width attribute greater than 500.
+You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
+If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
+    curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
+    <h3 id="whats-next">What’s Next</h3>
 #### Commands
 curlyq makes use of subcommands, e.g. `curlyq html [options] URL` or `curlyq extract [options] URL`. Each subcommand takes its own options, but I've made an effort to standardize the choices between each command as much as possible.
@@ -314,7 +330,7 @@ Example:
 Return a hierarchy of all tags in a page. Use `-t` to limit to a specific tag.
-    curlyq tags --search '#main .post h3' -q 'attrs[id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
+    curlyq tags --search '#main .post h3' -q '[attrs.id*=what]' https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/
     [
       {
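The README text above says that a comparison against an array value (such as `class` or `rel`) matches if any element matches. As a minimal sketch of that rule, assuming case-insensitive matching and literal (escaped) query text, one could write the following. The name `value_match?` is hypothetical and this is not the gem's code.

```ruby
# Test a scalar or array value against a query operator.
# Operators follow the README list: ^= starts with, $= ends with,
# *= contains, anything else is treated as an exact match.
def value_match?(value, operator, target)
  pattern = Regexp.escape(target)
  regex = case operator
          when '^=' then /\A#{pattern}/i
          when '$=' then /#{pattern}\z/i
          when '*=' then /#{pattern}/i
          else /\A#{pattern}\z/i
          end
  # Array() wraps a single string and leaves arrays as they are.
  Array(value).any? { |element| regex.match?(element.to_s) }
end

p value_match?(%w[me nofollow], '*=', 'me')
p value_match?('whats-next', '^=', 'what')
p value_match?(%w[external nofollow], '==', 'me')
```

Escaping the target keeps characters like `.` or `*` in a query from being interpreted as regular expression syntax, which is a deliberate choice in this sketch.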

data/test/curlyq_headlinks_test.rb CHANGED

@@ -17,8 +17,9 @@ class CurlyQHeadlinksTest < Test::Unit::TestCase
     result = curlyq('headlinks', '-q', '[rel=stylesheet]', 'https://brettterpstra.com')
     json = JSON.parse(result)
-    assert_match(/stylesheet/, json['rel'], 'Should have retrieved a single result with rel stylesheet')
-    assert_match(/screen\.\d+\.css$/, json['href'], 'Stylesheet should be correct primary stylesheet')
+    assert_equal(Array, json.class, 'Result should be an array')
+    assert_match(/stylesheet/, json[0]['rel'], 'Should have retrieved a single result with rel stylesheet')
+    assert_match(/screen\.\d+\.css$/, json[0]['href'], 'Stylesheet should be correct primary stylesheet')
   end
   def test_headlinks

data/test/curlyq_html_test.rb CHANGED

@@ -14,12 +14,12 @@ class CurlyQHtmlTest < Test::Unit::TestCase
     result = curlyq('html', '-s', '#main article .aligncenter', '-q', 'images[1]', 'https://brettterpstra.com')
     json = JSON.parse(result)
-    assert_match(/aligncenter/, json['class'], 'Should have found an image with class "aligncenter"')
+    assert_match(/aligncenter/, json[0]['class'], 'Should have found an image with class "aligncenter"')
   end
   def test_html_query
     result = curlyq('html', '-q', 'meta.title', 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/')
-    assert_match(/Introducing CurlyQ/, result, 'Should have retrived the page title')
+    json = JSON.parse(result)
+    assert_match(/Introducing CurlyQ/, json[0], 'Should have retrieved the page title')
   end
 end

data/test/curlyq_scrape_test.rb CHANGED

@@ -11,12 +11,42 @@ class CurlyQScrapeTest < Test::Unit::TestCase
   include CurlyQHelpers
   def setup
+    @screenshot = File.join(File.dirname(__FILE__), 'screenshot_test')
+    FileUtils.rm_f("#{@screenshot}.pdf") if File.exist?("#{@screenshot}.pdf")
+    FileUtils.rm_f("#{@screenshot}.png") if File.exist?("#{@screenshot}.png")
+    FileUtils.rm_f("#{@screenshot}_full.png") if File.exist?("#{@screenshot}_full.png")
   end
-  def test_scrape
+  def teardown
+    FileUtils.rm_f("#{@screenshot}.pdf") if File.exist?("#{@screenshot}.pdf")
+    FileUtils.rm_f("#{@screenshot}.png") if File.exist?("#{@screenshot}.png")
+    FileUtils.rm_f("#{@screenshot}_full.png") if File.exist?("#{@screenshot}_full.png")
+  end
+  def test_scrape_firefox
     result = curlyq('scrape', '-b', 'firefox', '-q', 'links[rel=me&content*=mastodon][0]', 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/')
     json = JSON.parse(result)
-    assert_match(/Mastodon/, json['content'], 'Should have retrieved a Mastodon link')
+    assert_equal(Array, json.class, 'Result should be an Array')
+    assert_match(/Mastodon/, json[0]['content'], 'Should have retrieved a Mastodon link')
+  end
+  def test_scrape_chrome
+    result = curlyq('scrape', '-b', 'chrome', '-q', 'links[rel=me&content*=mastodon][0]', 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/')
+    json = JSON.parse(result)
+    assert_equal(Array, json.class, 'Result should be an Array')
+    assert_match(/Mastodon/, json[0]['content'], 'Should have retrieved a Mastodon link')
+  end
+  def test_screenshot
+    curlyq('screenshot', '-b', 'firefox', '-o', @screenshot, '-t', 'print', 'https://brettterpstra.com')
+    assert(File.exist?("#{@screenshot}.pdf"), 'PDF Screenshot should exist')
+    curlyq('screenshot', '-b', 'chrome', '-o', @screenshot, '-t', 'visible', 'https://brettterpstra.com')
+    assert(File.exist?("#{@screenshot}.png"), 'PNG Screenshot should exist')
+    curlyq('screenshot', '-b', 'firefox', '-o', "#{@screenshot}_full", '-t', 'full', 'https://brettterpstra.com')
+    assert(File.exist?("#{@screenshot}_full.png"), 'PNG Screenshot should exist')
   end
 end

data/test/curlyq_tags_test.rb CHANGED

@@ -14,18 +14,26 @@ class CurlyQTagsTest < Test::Unit::TestCase
   end
   def test_tags
-    result = curlyq('tags', '--search', '#main .post h3', '-q', 'attrs[id*=what]', 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/')
+    result = curlyq('tags', '--search', '#main .post h3', 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/')
     json = JSON.parse(result)
-    assert_equal(json.count, 1, 'Should have 1 result')
-    assert_match(/whats-next/, json[0]['attrs']['id'], 'Should have matched #whats-next')
+    assert_equal(Array, json.class, 'Should be an array of matches')
+    assert_equal(6, json.count, 'Should be six results')
   end
   def test_clean
     result = curlyq('tags', '--search', '#main section.related', '--clean', 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/')
     json = JSON.parse(result)
-    assert_equal(json.count, 1, 'Should have 1 result')
+    assert_equal(Array, json.class, 'Should be a single Array')
+    assert_equal(1, json.count, 'Should be one element')
     assert_match(%r{Last.fm</h5></a></li>}, json[0]['source'], 'Should have matched #whats-next')
   end
+  def test_query
+    result = curlyq('tags', '--search', '#main .post h3', '-q', '[attrs.id*=what].source', 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/')
+    json = JSON.parse(result)
+    assert_equal(Array, json.class, 'Should be an array')
+    assert_match(%r{^<h3 id="whats-next">What’s Next</h3>$}, json[0], 'Should have returned just source')
+  end
 end

data/test/helpers/curlyq-helpers.rb CHANGED

@@ -1,5 +1,6 @@
 require 'open3'
 require 'time'
+require 'fileutils'
 $LOAD_PATH.unshift File.join(__dir__, '..', '..', 'lib')
 require 'curly'

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: curlyq
 version: !ruby/object:Gem::Version
-  version: 0.0.8
+  version: 0.0.10
 platform: ruby
 authors:
 - Brett Terpstra
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-01-15 00:00:00.000000000 Z
+date: 2024-01-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -236,6 +236,7 @@ files:
 - lib/curly/curl/html.rb
 - lib/curly/curl/json.rb
 - lib/curly/hash.rb
+- lib/curly/numeric.rb
 - lib/curly/string.rb
 - lib/curly/version.rb
 - src/_README.md