RubyGems - curlyq - Versions diffs - 0.0.11 → 0.0.12 - Mend

curlyq 0.0.11 → 0.0.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a9b0847eb3dd79e15b96bed47858ad0eb0df2ba7db8cf2e3395cb9e08e71c194
-  data.tar.gz: '06623683ff93c02087432750a150ac663c4558b7d18323bbbb367e004abd58ab'
+  metadata.gz: af564949987eb7b96f7dfc730fea6ac79fb4a903c3eaff7a412af446c4a0e699
+  data.tar.gz: 58fb83c8132da551813adad468a6eb546a0d758df2b7c756b794a2a019963084
 SHA512:
-  metadata.gz: 8b7098dde55f9b76a53eff1f71a5d821a2db6d5828fb67428f2aa3ef5d6ab8e2bdbb79f5375fb5291b965ff3d0b9677cf0084782c078c2bb5575a8383bd26906
-  data.tar.gz: c0b02267ea0de1c490b2c2dcd171f8a992fa659733aa9bd9e0dc590988af3d7c5f4b6e38e0371ce72c879a1f956ec7f8b87e8432e684d8f7dad4f019314fa834
+  metadata.gz: 70e9485a7729acbd45a501278ceaae575bf358205c05da49de5c36f08ab8fd263824c8f8bd302c32b058986db3783b7abc6d9f0e58bb11f18879cdfde5210eac
+  data.tar.gz: 8d2e664b9e786c0a9d13fb8177c703be84c0a784835dba14354189978c34501334cc1dbdd68cc0112ad363c73554c170e9eb5fea64b60ee229bc4105b08d3fc6

data/CHANGELOG.md CHANGED

@@ -1,3 +1,7 @@
+### 0.0.12
+2024-04-04 13:06
 ### 0.0.11
 2024-01-21 15:29

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    curlyq (0.0.11)
+    curlyq (0.0.12)
       gli (~> 2.21.0)
       nokogiri (~> 1.16.0)
       selenium-webdriver (~> 4.16.0)
@@ -11,7 +11,7 @@ GEM
   remote: https://rubygems.org/
   specs:
     gli (2.21.1)
-    nokogiri (1.16.0-arm64-darwin)
+    nokogiri (1.16.2-arm64-darwin)
       racc (~> 1.4)
     parallel (1.23.0)
     parallel_tests (3.13.0)

data/README.md CHANGED

@@ -13,8 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
 [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
 [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
-The current version of `curlyq` is 0.0.11
-.
+The current version of `curlyq` is 0.0.12.
 CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
@@ -47,7 +46,7 @@ SYNOPSIS
     curlyq [global options] command [command options] [arguments...]
 VERSION
-    0.0.11
+    0.0.12
 GLOBAL OPTIONS
     --help          - Show this message
@@ -56,6 +55,7 @@ GLOBAL OPTIONS
     -y, --[no-]yaml - Output YAML instead of json
 COMMANDS
+    execute    - Execute JavaScript on a URL
     extract    - Extract contents between two regular expressions
     headlinks  - Return all <head> links on URL's page
     help       - Shows a list of commands or help for one command
@@ -94,13 +94,11 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
 You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
-If you end the query with a specific key, only that key will be output, but it will be in an array. If there's only one match, it will be output as a raw string as a single element in an array.
+If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
     curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
-    [
-      "<h3 id=\"whats-next\">What’s Next</h3>"
-    ]
+    <h3 id="whats-next">What’s Next</h3>
 #### Commands
@@ -138,6 +136,34 @@ COMMAND OPTIONS
 ```
+##### execute
+You can execute JavaScript on a given web page using the `execute` subcommand.
+Example:
+    curlyq execute -s "NiftyAPI.find('file/save').arrow().shoot('file-save')" file:///Users/ttscoff/Desktop/Code/niftymenu/dist/MultiMarkdown-Composer.html
+You can specify an element id to wait for using `--id`, and define a pause to wait after executing a script with `--wait` (defaults to 2 seconds). Scripts can be read from the command line arguments with `--script "SCRIPT"`, from STDIN with `--script -`, or from a file using `--script PATH`.
+If you expect a return value, be sure to include a `return` statement in your executed script. Results will be output to STDOUT.
+```
+NAME
+    execute - Execute JavaScript on a URL
+SYNOPSIS
+    curlyq [global options] execute [command options] URL...
+COMMAND OPTIONS
+    -b, --browser=arg - Browser to use (firefox, chrome) (default: chrome)
+    -h, --header=arg  - Define a header to send as key=value (may be used more than once, default: none)
+    -i, --id=arg      - Element ID to wait for before executing (default: none)
+    -s, --script=arg  - Script to execute, use - to read from STDIN (may be used more than once, default: none)
+    -w, --wait=arg    - Seconds to wait after executing JS (default: 2)
+```
 ##### headlinks
 Example:
@@ -440,6 +466,9 @@ Example:
     Screenshot saved to /Users/ttscoff/Desktop/test.png
+You can wait for an element ID to be visible using `--id`. This can be any `#ID` on the page. If the ID doesn't exist on the page, though, the screenshot will hang for a timeout of 10 seconds.
+You can execute a script before taking the screenshot with the `--script` flag. If this is set to `-`, it will read the script from STDIN. If it's set to an existing file path, that file will be read for script input. Specify an interval (in seconds) to wait after executing the script with `--wait`.
 ```
 NAME
@@ -452,8 +481,11 @@ SYNOPSIS
 COMMAND OPTIONS
     -b, --browser=arg     - Browser to use (firefox, chrome) (default: chrome)
     -h, --header=arg      - Define a header to send as key=value (may be used more than once, default: none)
+    -i, --id=arg          - Element ID to wait for before taking screenshot (default: none)
     -o, --out, --file=arg - File destination (required, default: none)
+    -s, --script=arg      - Script to execute before taking screenshot (may be used more than once, default: none)
     -t, --type=arg        - Type of screenshot to save (full (requires firefox), print, visible) (default: visible)
+    -w, --wait=arg        - Time to wait before taking screenshot (default: 0)
 ```
 ##### tags
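
The README change above documents a new output rule for queries that end in a specific key: one match is printed as a raw string, several matches as an array. A minimal sketch of that rule follows. The helper name `render_matches` is hypothetical and is not taken from curlyq, and the sketch covers only JSON output, not the `--yaml` option.

```ruby
require 'json'

# Hypothetical illustration of the documented rule, not curlyq's actual code:
# a single match is emitted as a raw string, several matches as a JSON array.
def render_matches(matches)
  return matches.first.to_s if matches.length == 1

  JSON.pretty_generate(matches)
end

# One match: the bare string, with no surrounding brackets or quotes.
puts render_matches(['<h3 id="whats-next">Next</h3>'])

# Two matches: a JSON array.
puts render_matches(%w[first second])
```

For a pipeline this means a consumer such as [jq] may receive either a plain string or an array, depending on how many elements matched.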

data/bin/curlyq CHANGED

@@ -156,6 +156,61 @@ command %i[html curl] do |c|
   end
 end
+desc 'Execute JavaScript on a URL'
+arg_name 'URL', multiple: true
+command :execute do |c|
+  c.desc 'Browser to use (firefox, chrome)'
+  c.flag %i[b browser], type: BrowserType, must_match: /^[fc].*?$/, default_value: 'chrome'
+  c.desc 'Define a header to send as key=value'
+  c.flag %i[h header], multiple: true
+  c.desc 'Script to execute, use - to read from STDIN'
+  c.flag %i[s script], multiple: true
+  c.desc 'Element ID to wait for before executing'
+  c.flag %i[i id]
+  c.desc 'Seconds to wait after executing JS'
+  c.flag %i[w wait], default_value: 2
+  c.action do |_, options, args|
+    urls = args.join(' ').split(/[, ]+/)
+    raise 'Script input required' unless options[:file] || options[:script]
+    compiled_script = []
+    if options[:script].count.positive?
+      options[:script].each do |scr|
+        scr.strip!
+        if scr == '-'
+          compiled_script << $stdin.read
+        elsif File.exist?(File.expand_path(scr))
+          compiled_script << IO.read(File.expand_path(scr))
+        else
+          compiled_script << scr
+        end
+      end
+    end
+    script = compiled_script.count.positive? ? compiled_script.join(';') : nil
+    headers = break_headers(options[:header])
+    browser = options[:browser]
+    browser = browser.is_a?(Symbol) ? browser : browser.normalize_browser_type
+    urls.each do |url|
+      c = Curl::Html.new(url)
+      c.headers = headers
+      c.browser = browser
+      $stdout.puts c.execute(script, options[:wait], options[:id])
+    end
+  end
+end
 desc 'Save a screenshot of a URL'
 arg_name 'URL', multiple: true
 command :screenshot do |c|
@@ -171,6 +226,15 @@ command :screenshot do |c|
   c.desc 'Define a header to send as key=value'
   c.flag %i[h header], multiple: true
+  c.desc 'Script to execute before taking screenshot'
+  c.flag %i[s script], multiple: true
+  c.desc 'Element ID to wait for before taking screenshot'
+  c.flag %i[i id]
+  c.desc 'Time to wait before taking screenshot'
+  c.flag %i[w wait], default_value: 0, type: Integer
   c.action do |_, options, args|
     urls = args.join(' ').split(/[, ]+/)
     headers = break_headers(options[:header])
@@ -181,13 +245,30 @@ command :screenshot do |c|
     type = type.is_a?(Symbol) ? type : type.normalize_screenshot_type
     browser = browser.is_a?(Symbol) ? browser : browser.normalize_browser_type
+    compiled_script = []
+    if options[:script].count.positive?
+      options[:script].each do |scr|
+        scr.strip!
+        if scr == '-'
+          compiled_script << $stdin.read
+        elsif File.exist?(File.expand_path(scr))
+          compiled_script << IO.read(File.expand_path(scr))
+        else
+          compiled_script << scr
+        end
+      end
+    end
+    script = compiled_script.count.positive? ? compiled_script.join(';') : nil
     raise 'Full page screen shots only available with Firefox' if type == :full_page && browser != :firefox
     urls.each do |url|
       c = Curl::Html.new(url)
       c.headers = headers
       c.browser = browser
-      c.screenshot(options[:out], type: type)
+      c.screenshot(options[:out], type: type, script: script, id: options[:id], wait: options[:wait])
     end
   end
 end
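
Both the `execute` and `screenshot` actions above resolve each `--script` value the same way: `-` reads STDIN, an existing file path is read from disk, and anything else is treated as inline JavaScript, with the pieces joined by `;`. The sketch below shows one way that shared logic could be factored into a single helper. The name `compile_script` and its `stdin:` keyword are hypothetical and are not part of curlyq. Unlike the diff, the sketch checks `File.file?` rather than `File.exist?` and does not mutate the option strings.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper (not part of curlyq): resolve each --script value
# to JavaScript source and join the pieces with ';'.
# Returns nil when no sources are given.
def compile_script(sources, stdin: $stdin)
  parts = Array(sources).map do |src|
    src = src.strip
    path = File.expand_path(src)
    if src == '-'
      stdin.read        # '-' means read the script from STDIN
    elsif File.file?(path)
      File.read(path)   # an existing file is read from disk
    else
      src               # anything else is inline JavaScript
    end
  end
  parts.empty? ? nil : parts.join(';')
end

# Usage: one inline script, one from (simulated) STDIN, one from a file.
file = Tempfile.new(['snippet', '.js'])
file.write('return document.title')
file.close

puts compile_script(['window.scrollTo(0, 0)', '-', file.path],
                    stdin: StringIO.new('console.log(1)'))
file.unlink
```

Passing the input stream as a keyword argument keeps the helper testable without touching the real `$stdin`.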

data/lib/curly/curl/html.rb CHANGED

@@ -128,10 +128,19 @@ module Curl
     ##                          save (:full_page,
     ##                          :print_page, :visible)
     ##
-    def screenshot(destination = nil, type: :full_page)
-      full_page = type.to_sym == :full_page
-      print_page = type.to_sym == :print_page
-      save_screenshot(destination, type: type)
+    def screenshot(destination = nil, type: :full_page, script: nil, id: nil, wait: 0)
+      # full_page = type.to_sym == :full_page
+      # print_page = type.to_sym == :print_page
+      save_screenshot(destination, type: type, script: script, id: id, wait_seconds: wait)
+    end
+    ##
+    ## @brief      Execute JavaScript
+    ##
+    ## @param      script  The script to run
+    ##
+    def execute(script, wait, element_id)
+      run_js(script, wait, element_id)
     end
     ##
@@ -145,12 +154,11 @@ module Curl
     def extract(before, after, inclusive: false)
       before = /#{Regexp.escape(before)}/ unless before.is_a?(Regexp)
       after = /#{Regexp.escape(after)}/ unless after.is_a?(Regexp)
-      if inclusive
-        rx = /(#{before.source}.*?#{after.source})/m
-      else
-        rx = /(?<=#{before.source})(.*?)(?=#{after.source})/m
-      end
+      rx = if inclusive
+             /(#{before.source}.*?#{after.source})/m
+           else
+             /(?<=#{before.source})(.*?)(?=#{after.source})/m
+           end
       @body.scan(rx).map { |r| @clean ? r[0].clean : r[0] }
     end
@@ -604,14 +612,46 @@ module Curl
       res
     end
+    ##
+    ## Run JavaScript on a URL
+    ##
+    ## @param      script      The JavaScript to execute
+    ## @param      wait        Seconds to wait after executing JS
+    ## @param      element_id  The element identifier
+    ##
+    def run_js(script, wait_seconds = 2, element_id = nil)
+      raise 'No script provided' if script.nil?
+      browser = @browser.is_a?(String) ? @browser.normalize_browser_type : @browser
+      driver = Selenium::WebDriver.for browser
+      driver.manage.timeouts.implicit_wait = 15
+      res = nil
+      begin
+        driver.get @url
+        if element_id
+          wait = Selenium::WebDriver::Wait.new(timeout: 10) # seconds
+          wait.until { driver.find_element(id: element_id) }
+        end
+        res = driver.execute_script(script)
+        sleep wait_seconds.to_i
+      ensure
+        driver.quit
+      end
+      $stderr.puts "Executed JS on #{@url}"
+      res
+    end
     ##
     ## Save a screenshot of a url
     ##
     ## @param      destination  [String] File path destination
-    ## @param      browser      [Symbol] The browser (:chrome or :firefox)
     ## @param      type         [Symbol] The type of screenshot (:full_page, :print_page, or :visible)
     ##
-    def save_screenshot(destination = nil, type: :full_page)
+    def save_screenshot(destination = nil, type: :full_page, script: nil, wait_seconds: 0, id: nil)
       raise 'No URL provided' if url.nil?
       raise 'No file destination provided' if destination.nil?
@@ -620,7 +660,7 @@ module Curl
       raise 'Path doesn\'t exist' unless File.directory?(File.dirname(destination))
-      browser = browser.normalize_browser_type if browser.is_a?(String)
+      browser = @browser.is_a?(String) ? @browser.normalize_browser_type : @browser
       type = type.normalize_screenshot_type if type.is_a?(String)
       raise 'Can not save full screen with Chrome, use Firefox' if type == :full_page && browser == :chrome
@@ -631,10 +671,21 @@ module Curl
                       "#{destination.sub(/\.(pdf|jpe?g|png)$/, '')}.png"
                     end
-      driver = Selenium::WebDriver.for @browser
+      driver = Selenium::WebDriver.for browser
       driver.manage.timeouts.implicit_wait = 4
       begin
         driver.get @url
+        if id
+          wait = Selenium::WebDriver::Wait.new(timeout: 10) # seconds
+          wait.until { driver.find_element(id: id) }
+        end
+        if script
+          res = driver.execute_script(script)
+        end
+        sleep wait_seconds.to_i
         case type
         when :print_page
           driver.save_print_page(destination)
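
The `extract` hunk above only restructures how `rx` is assigned; the two patterns themselves are unchanged. As a standalone illustration of what they match, the sketch below applies the same patterns to a plain string. The free function `extract_between` is a hypothetical stand-in for the method and omits the `@clean` handling.

```ruby
# Hypothetical stand-in for Curl::Html#extract, operating on a plain string.
def extract_between(body, before, after, inclusive: false)
  before = /#{Regexp.escape(before)}/ unless before.is_a?(Regexp)
  after = /#{Regexp.escape(after)}/ unless after.is_a?(Regexp)

  rx = if inclusive
         # Keep the delimiters in the captured text.
         /(#{before.source}.*?#{after.source})/m
       else
         # Lookbehind and lookahead match the delimiters without capturing them.
         /(?<=#{before.source})(.*?)(?=#{after.source})/m
       end

  # scan returns one array per match; the first element is the capture group.
  body.scan(rx).map { |r| r[0] }
end

html = '<p>one</p><p>two</p>'
p extract_between(html, '<p>', '</p>')                  # delimiters excluded
p extract_between(html, '<p>', '</p>', inclusive: true) # delimiters included
```

Because the exclusive form relies on a lookbehind, a `before` pattern passed as a Regexp has to be one that Ruby's regex engine accepts inside `(?<=...)`; plain strings are always safe.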

data/lib/curly/version.rb CHANGED

@@ -1,5 +1,5 @@
 # Top level module for CurlyQ
 module Curly
   # Current version number
-  VERSION = '0.0.11'
+  VERSION = '0.0.12'
 end

data/src/_README.md CHANGED

@@ -13,7 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
 [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
 [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
-The current version of `curlyq` is <!--VER-->0.0.10<!--END VER-->.
+The current version of `curlyq` is <!--VER-->0.0.9<!--END VER-->.
 CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
@@ -68,13 +68,11 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
 You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
-If you end the query with a specific key, only that key will be output, but it will be in an array. If there's only one match, it will be output as a raw string as a single element in an array.
+If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
     curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
-    [
-      "<h3 id=\"whats-next\">What’s Next</h3>"
-    ]
+    <h3 id="whats-next">What’s Next</h3>
 #### Commands
@@ -97,6 +95,22 @@ This specifies a before and after string and includes them (`-i`) in the result.
 ```
+##### execute
+You can execute JavaScript on a given web page using the `execute` subcommand.
+Example:
+    curlyq execute -s "NiftyAPI.find('file/save').arrow().shoot('file-save')" file:///Users/ttscoff/Desktop/Code/niftymenu/dist/MultiMarkdown-Composer.html
+You can specify an element id to wait for using `--id`, and define a pause to wait after executing a script with `--wait` (defaults to 2 seconds). Scripts can be read from the command line arguments with `--script "SCRIPT"`, from STDIN with `--script -`, or from a file using `--script PATH`.
+If you expect a return value, be sure to include a `return` statement in your executed script. Results will be output to STDOUT.
+```
+@cli(bundle exec bin/curlyq help execute)
+```
 ##### headlinks
 Example:
@@ -323,6 +337,9 @@ Example:
     Screenshot saved to /Users/ttscoff/Desktop/test.png
+You can wait for an element ID to be visible using `--id`. This can be any `#ID` on the page. If the ID doesn't exist on the page, though, the screenshot will hang for a timeout of 10 seconds.
+You can execute a script before taking the screenshot with the `--script` flag. If this is set to `-`, it will read the script from STDIN. If it's set to an existing file path, that file will be read for script input. Specify an interval (in seconds) to wait after executing the script with `--wait`.
 ```
 @cli(bundle exec bin/curlyq help screenshot)

data/test/curlyq_scrape_test.rb CHANGED

@@ -13,13 +13,13 @@ class CurlyQScrapeTest < Test::Unit::TestCase
   def setup
     @screenshot = File.join(File.dirname(__FILE__), 'screenshot_test')
     FileUtils.rm_f("#{@screenshot}.pdf") if File.exist?("#{@screenshot}.pdf")
-    FileUtils.rm_f('screenshot_test.png') if File.exist?("#{@screenshot}.png")
+    FileUtils.rm_f("#{@screenshot}.png") if File.exist?("#{@screenshot}.png")
     FileUtils.rm_f("#{@screenshot}_full.png") if File.exist?("#{@screenshot}_full.png")
   end
   def teardown
     FileUtils.rm_f("#{@screenshot}.pdf") if File.exist?("#{@screenshot}.pdf")
-    FileUtils.rm_f('screenshot_test.png') if File.exist?("#{@screenshot}.png")
+    FileUtils.rm_f("#{@screenshot}.png") if File.exist?("#{@screenshot}.png")
     FileUtils.rm_f("#{@screenshot}_full.png") if File.exist?("#{@screenshot}_full.png")
   end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: curlyq
 version: !ruby/object:Gem::Version
-  version: 0.0.11
+  version: 0.0.12
 platform: ruby
 authors:
 - Brett Terpstra
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-01-21 00:00:00.000000000 Z
+date: 2024-04-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake