RubyGems - curlyq - Versions diffs - 0.0.10 → 0.0.11 - Mend

curlyq 0.0.10 → 0.0.11

Files changed (9)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6109483b8869733f9e21ecab9bc8bcda0aa3b58ca1f13f9b96fe7739d019df1f
-  data.tar.gz: 98a8d46fe68bc88ea030dfb8e04262fbab5418005390ff79693d6f636a3bf276
+  metadata.gz: a9b0847eb3dd79e15b96bed47858ad0eb0df2ba7db8cf2e3395cb9e08e71c194
+  data.tar.gz: '06623683ff93c02087432750a150ac663c4558b7d18323bbbb367e004abd58ab'
 SHA512:
-  metadata.gz: 1d75b4af2d6c1fadb83501fa707184ef41d061c08de14666b86d296048e8f21540fe2ad53a79985d5b042c93fa629cdbe8d101828edbb02832d1b55b920d5834
-  data.tar.gz: 238855918e3e765a2edf1864dd2663a959b099cfa5f1b89942f94eb20ba428c1700adee85590879662f0cf8de659328fbe752e8648ee210eefe0769639c57da2
+  metadata.gz: 8b7098dde55f9b76a53eff1f71a5d821a2db6d5828fb67428f2aa3ef5d6ab8e2bdbb79f5375fb5291b965ff3d0b9677cf0084782c078c2bb5575a8383bd26906
+  data.tar.gz: c0b02267ea0de1c490b2c2dcd171f8a992fa659733aa9bd9e0dc590988af3d7c5f4b6e38e0371ce72c879a1f956ec7f8b87e8432e684d8f7dad4f019314fa834

data/CHANGELOG.md CHANGED

@@ -1,3 +1,11 @@
+### 0.0.11
+2024-01-21 15:29
+#### IMPROVED
+- Add option for --local_links_only to html and links command, only returning links with the same origin site
 ### 0.0.10
 2024-01-17 13:50

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    curlyq (0.0.10)
+    curlyq (0.0.11)
       gli (~> 2.21.0)
       nokogiri (~> 1.16.0)
       selenium-webdriver (~> 4.16.0)

data/README.md CHANGED

@@ -13,7 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
 [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
 [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
-The current version of `curlyq` is 0.0.10
+The current version of `curlyq` is 0.0.11
 .
 CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
@@ -47,7 +47,7 @@ SYNOPSIS
     curlyq [global options] command [command options] [arguments...]
 VERSION
-    0.0.10
+    0.0.11
 GLOBAL OPTIONS
     --help          - Show this message
@@ -94,11 +94,13 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
 You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
-If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
+If you end the query with a specific key, only that key will be output, but it will be in an array. If there's only one match, it will be output as a raw string as a single element in an array.
     curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
-    <h3 id="whats-next">What’s Next</h3>
+    [
+      "<h3 id=\"whats-next\">What’s Next</h3>"
+    ]
 #### Commands
@@ -237,6 +239,7 @@ COMMAND OPTIONS
     -h, --header=arg          - Define a header to send as "key=value" (may be used more than once, default: none)
     --[no-]ignore_fragments   - Ignore fragment hrefs when gathering content links
     --[no-]ignore_relative    - Ignore relative hrefs when gathering content links
+    -l, --local_links_only    - Only gather internal (same-site) links
     -q, --query, --filter=arg - Filter output using dot-syntax path (default: none)
     -r, --raw=arg             - Output a raw value for a key (default: none)
     -s, --search=arg          - Return an array of matches to a CSS or XPath query (default: none)
@@ -379,6 +382,7 @@ COMMAND OPTIONS
     -d, --[no-]dedup          - Filter out duplicate links, preserving only first one
     --[no-]ignore_fragments   - Ignore fragment hrefs when gathering content links
     --[no-]ignore_relative    - Ignore relative hrefs when gathering content links
+    -l, --local_links_only    - Only gather internal (same-site) links
     -q, --query, --filter=arg - Filter output using dot-syntax path (default: none)
     -x, --external_links_only - Only gather external links
 ```

data/bin/curlyq CHANGED

@@ -103,6 +103,9 @@ command %i[html curl] do |c|
   c.desc 'Only gather external links'
   c.switch %i[x external_links_only], default_value: false, negatable: false
+  c.desc 'Only gather internal (same-site) links'
+  c.switch %i[l local_links_only], default_value: false, negatable: false
   c.action do |global_options, options, args|
     urls = args.join(' ').split(/[, ]+/)
     headers = break_headers(options[:header])
@@ -115,7 +118,8 @@ command %i[html curl] do |c|
                         compressed: options[:compressed], clean: options[:clean],
                         ignore_local_links: options[:ignore_relative],
                         ignore_fragment_links: options[:ignore_fragments],
-                        external_links_only: options[:external_links_only] }
+                        external_links_only: options[:external_links_only],
+                        local_links_only: options[:local_links_only] }
       res = Curl::Html.new(url, curl_settings)
       res.curl
@@ -417,6 +421,9 @@ command :links do |c|
   c.desc 'Only gather external links'
   c.switch %i[x external_links_only], default_value: false, negatable: false
+  c.desc 'Only gather internal (same-site) links'
+  c.switch %i[l local_links_only], default_value: false, negatable: false
   c.desc 'Filter output using dot-syntax path'
   c.flag %i[q query filter]
@@ -433,7 +440,8 @@ command :links do |c|
                              compressed: options[:compressed], clean: options[:clean],
                              ignore_local_links: options[:ignore_relative],
                              ignore_fragment_links: options[:ignore_fragments],
-                             external_links_only: options[:external_links_only]
+                             external_links_only: options[:external_links_only],
+                             local_links_only: options[:local_links_only]
                            })
       res.curl

data/lib/curly/curl/html.rb CHANGED

@@ -11,7 +11,7 @@ module Curl
   # Class for CURLing an HTML page
   class Html
     attr_accessor :settings, :browser, :source, :headers, :headers_only, :compressed, :clean, :fallback,
-                  :ignore_local_links, :ignore_fragment_links, :external_links_only
+                  :ignore_local_links, :ignore_fragment_links, :external_links_only, :local_links_only
     attr_reader :url, :code, :meta, :links, :head, :body,
                 :title, :description, :body_links, :body_images
@@ -69,6 +69,7 @@ module Curl
       @ignore_local_links = options[:ignore_local_links]
       @ignore_fragment_links = options[:ignore_fragment_links]
       @external_links_only = options[:external_links_only]
+      @local_links_only = options[:local_links_only]
       @curl = TTY::Which.which('curl')
       @url = url.nil? ? options[:url] : url
@@ -490,11 +491,19 @@ module Curl
         link_href = link_href[2]
-        next if link_href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
+        if @local_links_only
+          next if @ignore_fragment_links && link_href =~ /^#/
-        next if link_href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)
+          next unless same_origin?(link_href)
-        next if same_origin?(link_href) && @external_links_only
+        else
+          next if link_href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
+          next if link_href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)
+          next if same_origin?(link_href) && @external_links_only
+        end
         link_title = tag.match(/title=(['"])(.*?)\1/)
         link_title = link_title.nil? ? nil : link_title[2]
@@ -522,11 +531,19 @@ module Curl
       link_tags.each do |m|
         href = m['tag'].match(/href=(["'])(.*?)\1/)
         href = href[2] unless href.nil?
-        next if href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
+        if @local_links_only
+          next if href =~ /^#/ && @ignore_fragment_links
-        next if href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)
+          next unless same_origin?(href)
-        next if same_origin?(href) && @external_links_only
+        else
+          next if href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
+          next if href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)
+          next if same_origin?(href) && @external_links_only
+        end
         title = m['tag'].match(/title=(["'])(.*?)\1/)
         title = title[2] unless title.nil?
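
The two hunks above apply the same filtering rules in both link-gathering loops. As a rough illustration only, the sketch below restates that decision as a standalone predicate. The names `keep_link?` and `page_url` are hypothetical and do not appear in the gem, and the `same_origin?` shown here is a simplified stand-in, since the real helper's body is not part of this diff. It assumes that hrefs without a host (relative paths and fragments) count as same-origin.

```ruby
require 'uri'

# Hypothetical stand-in for the gem's same_origin? helper (its body is not
# shown in this diff). Assumption: an href with no host is same-origin.
def same_origin?(href, page_url)
  uri = URI.parse(href)
  return true if uri.host.nil?

  uri.host == URI.parse(page_url).host
rescue URI::InvalidURIError
  false
end

# Hypothetical predicate mirroring the branch structure added in 0.0.11.
# Returns true when a link would be kept, false when the loop would `next`.
def keep_link?(href, page_url, opts = {})
  if opts[:local_links_only]
    # In local-only mode, fragments are dropped only when explicitly ignored,
    # and everything that is not same-origin is skipped.
    return false if opts[:ignore_fragment_links] && href.start_with?('#')

    return same_origin?(href, page_url)
  end

  # Pre-0.0.11 behavior, unchanged inside the else branch.
  return false if href.start_with?('#') && (opts[:ignore_fragment_links] || opts[:external_links_only])
  return false if href !~ %r{^(\w+:)?//} && (opts[:ignore_local_links] || opts[:external_links_only])
  return false if same_origin?(href, page_url) && opts[:external_links_only]

  true
end
```

Note that in the local-only branch the `ignore_local_links` setting is never consulted, which matches the hunks: only `ignore_fragment_links` and the same-origin check apply there.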

data/lib/curly/version.rb CHANGED

@@ -1,5 +1,5 @@
 # Top level module for CurlyQ
 module Curly
   # Current version number
-  VERSION = '0.0.10'
+  VERSION = '0.0.11'
 end

data/src/_README.md CHANGED

@@ -13,7 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
 [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
 [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
-The current version of `curlyq` is <!--VER-->0.0.9<!--END VER-->.
+The current version of `curlyq` is <!--VER-->0.0.10<!--END VER-->.
 CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
@@ -68,11 +68,13 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
 You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
-If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
+If you end the query with a specific key, only that key will be output, but it will be in an array. If there's only one match, it will be output as a raw string as a single element in an array.
     curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
-    <h3 id="whats-next">What’s Next</h3>
+    [
+      "<h3 id=\"whats-next\">What’s Next</h3>"
+    ]
 #### Commands
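
The README change above means a query ending in a specific key now yields an array even for a single match. A script that consumed the old raw-string output would need to unwrap the first element. The snippet below is a minimal sketch of that, using the sample output from the README hunk as a hard-coded string rather than a live `curlyq` call; the variable names are illustrative only.

```ruby
require 'json'

# Sample output copied from the README hunk above (not a live curlyq call).
raw_output = <<~'JSON'
  [
    "<h3 id=\"whats-next\">What’s Next</h3>"
  ]
JSON

# As of 0.0.11 the result is a JSON array, so parse it and take the first
# element to recover the single matched string.
matches = JSON.parse(raw_output)
first_source = matches.first

puts first_source
```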

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: curlyq
 version: !ruby/object:Gem::Version
-  version: 0.0.10
+  version: 0.0.11
 platform: ruby
 authors:
 - Brett Terpstra
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-01-17 00:00:00.000000000 Z
+date: 2024-01-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake