curlyq 0.0.10 → 0.0.11

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 6109483b8869733f9e21ecab9bc8bcda0aa3b58ca1f13f9b96fe7739d019df1f
- data.tar.gz: 98a8d46fe68bc88ea030dfb8e04262fbab5418005390ff79693d6f636a3bf276
+ metadata.gz: a9b0847eb3dd79e15b96bed47858ad0eb0df2ba7db8cf2e3395cb9e08e71c194
+ data.tar.gz: '06623683ff93c02087432750a150ac663c4558b7d18323bbbb367e004abd58ab'
  SHA512:
- metadata.gz: 1d75b4af2d6c1fadb83501fa707184ef41d061c08de14666b86d296048e8f21540fe2ad53a79985d5b042c93fa629cdbe8d101828edbb02832d1b55b920d5834
- data.tar.gz: 238855918e3e765a2edf1864dd2663a959b099cfa5f1b89942f94eb20ba428c1700adee85590879662f0cf8de659328fbe752e8648ee210eefe0769639c57da2
+ metadata.gz: 8b7098dde55f9b76a53eff1f71a5d821a2db6d5828fb67428f2aa3ef5d6ab8e2bdbb79f5375fb5291b965ff3d0b9677cf0084782c078c2bb5575a8383bd26906
+ data.tar.gz: c0b02267ea0de1c490b2c2dcd171f8a992fa659733aa9bd9e0dc590988af3d7c5f4b6e38e0371ce72c879a1f956ec7f8b87e8432e684d8f7dad4f019314fa834
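The SHA256 and SHA512 entries in `checksums.yaml` are digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. As a rough sketch, assuming you have unpacked `curlyq-0.0.11.gem` with `tar -xf` so both archives sit in the current directory, Ruby's standard `digest` library can check them against the new 0.0.11 values. `checksum_matches?` and `EXPECTED_SHA256` are illustrative names, not part of RubyGems or curlyq.

```ruby
require 'digest'

# SHA256 values copied from the 0.0.11 (+) side of checksums.yaml.
# The YAML quotes around the data.tar.gz digest are not part of the value.
EXPECTED_SHA256 = {
  'metadata.gz' => 'a9b0847eb3dd79e15b96bed47858ad0eb0df2ba7db8cf2e3395cb9e08e71c194',
  'data.tar.gz' => '06623683ff93c02087432750a150ac663c4558b7d18323bbbb367e004abd58ab'
}.freeze

# Hypothetical helper: true when the file's SHA256 matches the expected hex digest.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Only runs when this file is executed directly; skips archives that are not present.
if $PROGRAM_NAME == __FILE__
  EXPECTED_SHA256.each do |name, hex|
    next unless File.exist?(name)

    status = checksum_matches?(name, hex) ? 'OK' : 'MISMATCH'
    puts "#{name}: #{status}"
  end
end
```

The same pattern works for the SHA512 column by swapping in `Digest::SHA512`.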
data/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
+ ### 0.0.11
+
+ 2024-01-21 15:29
+
+ #### IMPROVED
+
+ - Add --local_links_only option to the html and links commands, returning only links with the same origin as the requested site
+
  ### 0.0.10
 
  2024-01-17 13:50
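The changelog describes `--local_links_only` as keeping only links with the same origin. The diff calls a `same_origin?` method but does not show its body, so the following is a hypothetical stand-in, not curlyq's implementation. It assumes "origin" means the usual scheme + host + port triple, and that relative hrefs (`/path`, `#fragment`, `//host/path`) are resolved against the page URL with `URI.join` before comparing.

```ruby
require 'uri'

# Hypothetical stand-in for curlyq's same_origin? (its body is not in this diff).
# Assumes "origin" means scheme + host + port, and resolves relative hrefs
# (including "#fragment" and "/path") against the page URL first.
def same_origin_sketch?(page_url, href)
  base = URI.parse(page_url)
  target = URI.join(page_url, href)
  origin = ->(u) { [u.scheme&.downcase, u.host&.downcase, u.port] }
  origin.call(base) == origin.call(target)
rescue URI::Error
  # Unparseable hrefs are treated as not same-origin.
  false
end
```

Under these assumptions, `same_origin_sketch?(page, '/projects/')` is true for any `https://brettterpstra.com/...` page, while an `http://` link to the same host is not, because the scheme (and default port) differ.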
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- curlyq (0.0.10)
+ curlyq (0.0.11)
  gli (~> 2.21.0)
  nokogiri (~> 1.16.0)
  selenium-webdriver (~> 4.16.0)
data/README.md CHANGED
@@ -13,7 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
  [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
  [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
 
- The current version of `curlyq` is 0.0.10
+ The current version of `curlyq` is 0.0.11
  .
 
  CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
@@ -47,7 +47,7 @@ SYNOPSIS
  curlyq [global options] command [command options] [arguments...]
 
  VERSION
- 0.0.10
+ 0.0.11
 
  GLOBAL OPTIONS
  --help - Show this message
@@ -94,11 +94,13 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
 
  You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
 
- If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
+ If you end the query with a specific key, only that key will be output, always wrapped in an array. If there's only one match, the output is a single-element array containing the raw string:
 
  curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
 
- <h3 id="whats-next">What’s Next</h3>
+ [
+ "<h3 id=\"whats-next\">What’s Next</h3>"
+ ]
 
  #### Commands
 
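Because a trailing key now always produces an array, scripts that previously read a bare string need to unwrap the first element. A minimal Ruby sketch, assuming the default JSON output: `first_match` parses the array and returns its first item (nil when nothing matched), and `run_curlyq` is a hypothetical helper that shells out to an installed `curlyq` via `Open3.capture2`. The sample string is the README's example output.

```ruby
require 'json'
require 'open3'

# Sample stdout copied from the README example (0.0.11 prints an array).
SAMPLE_OUTPUT = <<~'OUT'
  [
  "<h3 id=\"whats-next\">What’s Next</h3>"
  ]
OUT

# Returns the first value from a curlyq result array, or nil if nothing matched.
def first_match(json_text)
  JSON.parse(json_text).first
end

# Hypothetical wrapper: runs curlyq (must be on PATH) and returns its stdout.
def run_curlyq(*args)
  stdout, status = Open3.capture2('curlyq', *args)
  raise "curlyq exited with #{status.exitstatus}" unless status.success?

  stdout
end

puts first_match(SAMPLE_OUTPUT) if $PROGRAM_NAME == __FILE__
```

With curlyq installed, the README query would become `first_match(run_curlyq('tags', '--search', '#main .post h3', '-q', '[attrs.id*=what].source', url))`.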
@@ -237,6 +239,7 @@ COMMAND OPTIONS
  -h, --header=arg - Define a header to send as "key=value" (may be used more than once, default: none)
  --[no-]ignore_fragments - Ignore fragment hrefs when gathering content links
  --[no-]ignore_relative - Ignore relative hrefs when gathering content links
+ -l, --local_links_only - Only gather internal (same-site) links
  -q, --query, --filter=arg - Filter output using dot-syntax path (default: none)
  -r, --raw=arg - Output a raw value for a key (default: none)
  -s, --search=arg - Return an array of matches to a CSS or XPath query (default: none)
@@ -379,6 +382,7 @@ COMMAND OPTIONS
  -d, --[no-]dedup - Filter out duplicate links, preserving only first one
  --[no-]ignore_fragments - Ignore fragment hrefs when gathering content links
  --[no-]ignore_relative - Ignore relative hrefs when gathering content links
+ -l, --local_links_only - Only gather internal (same-site) links
  -q, --query, --filter=arg - Filter output using dot-syntax path (default: none)
  -x, --external_links_only - Only gather external links
  ```
data/bin/curlyq CHANGED
@@ -103,6 +103,9 @@ command %i[html curl] do |c|
  c.desc 'Only gather external links'
  c.switch %i[x external_links_only], default_value: false, negatable: false
 
+ c.desc 'Only gather internal (same-site) links'
+ c.switch %i[l local_links_only], default_value: false, negatable: false
+
  c.action do |global_options, options, args|
  urls = args.join(' ').split(/[, ]+/)
  headers = break_headers(options[:header])
@@ -115,7 +118,8 @@ command %i[html curl] do |c|
  compressed: options[:compressed], clean: options[:clean],
  ignore_local_links: options[:ignore_relative],
  ignore_fragment_links: options[:ignore_fragments],
- external_links_only: options[:external_links_only] }
+ external_links_only: options[:external_links_only],
+ local_links_only: options[:local_links_only] }
  res = Curl::Html.new(url, curl_settings)
  res.curl
 
@@ -417,6 +421,9 @@ command :links do |c|
  c.desc 'Only gather external links'
  c.switch %i[x external_links_only], default_value: false, negatable: false
 
+ c.desc 'Only gather internal (same-site) links'
+ c.switch %i[l local_links_only], default_value: false, negatable: false
+
  c.desc 'Filter output using dot-syntax path'
  c.flag %i[q query filter]
 
@@ -433,7 +440,8 @@ command :links do |c|
  compressed: options[:compressed], clean: options[:clean],
  ignore_local_links: options[:ignore_relative],
  ignore_fragment_links: options[:ignore_fragments],
- external_links_only: options[:external_links_only]
+ external_links_only: options[:external_links_only],
+ local_links_only: options[:local_links_only]
  })
  res.curl
 
@@ -11,7 +11,7 @@ module Curl
  # Class for CURLing an HTML page
  class Html
  attr_accessor :settings, :browser, :source, :headers, :headers_only, :compressed, :clean, :fallback,
- :ignore_local_links, :ignore_fragment_links, :external_links_only
+ :ignore_local_links, :ignore_fragment_links, :external_links_only, :local_links_only
 
  attr_reader :url, :code, :meta, :links, :head, :body,
  :title, :description, :body_links, :body_images
@@ -69,6 +69,7 @@ module Curl
  @ignore_local_links = options[:ignore_local_links]
  @ignore_fragment_links = options[:ignore_fragment_links]
  @external_links_only = options[:external_links_only]
+ @local_links_only = options[:local_links_only]
 
  @curl = TTY::Which.which('curl')
  @url = url.nil? ? options[:url] : url
@@ -490,11 +491,19 @@ module Curl
 
  link_href = link_href[2]
 
- next if link_href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
+ if @local_links_only
+ next if @ignore_fragment_links && link_href =~ /^#/
 
- next if link_href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)
+ next unless same_origin?(link_href)
 
- next if same_origin?(link_href) && @external_links_only
+ else
+ next if link_href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
+
+ next if link_href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)
+
+ next if same_origin?(link_href) && @external_links_only
+
+ end
 
  link_title = tag.match(/title=(['"])(.*?)\1/)
  link_title = link_title.nil? ? nil : link_title[2]
@@ -522,11 +531,19 @@ module Curl
  link_tags.each do |m|
  href = m['tag'].match(/href=(["'])(.*?)\1/)
  href = href[2] unless href.nil?
- next if href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
+ if @local_links_only
+ next if href =~ /^#/ && @ignore_fragment_links
 
- next if href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)
+ next unless same_origin?(href)
 
- next if same_origin?(href) && @external_links_only
+ else
+ next if href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
+
+ next if href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)
+
+ next if same_origin?(href) && @external_links_only
+
+ end
 
  title = m['tag'].match(/title=(["'])(.*?)\1/)
  title = title[2] unless title.nil?
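To see how the new branch behaves, here is a minimal sketch that lifts the decision out of `Curl::Html` into a standalone method, with each `next` becoming `return false`. `keep_link?`, the `opts` hash, and the `SITE` predicate are hypothetical; the origin test is passed in as a lambda because the diff does not show how `same_origin?` is implemented. Reading straight off the `if`/`else`, two consequences stand out: `--local_links_only` wins if `--external_links_only` is also passed, and `--ignore_relative` is not consulted in local mode. Whether fragment-only hrefs like `#top` survive depends on `same_origin?` treating them as same-site, which this sketch assumes.

```ruby
# Sketch of the 0.0.11 filtering branch, lifted out of Curl::Html for testing.
# Each `next` in the diff becomes `return false`. `same_origin` is a
# caller-supplied predicate standing in for curlyq's same_origin?.
def keep_link?(href, opts, same_origin)
  if opts[:local_links_only]
    # Local mode: only the fragment toggle and the origin test apply.
    return false if opts[:ignore_fragment_links] && href =~ /^#/
    return false unless same_origin.call(href)
  else
    # Pre-0.0.11 rules, unchanged.
    return false if href =~ /^#/ && (opts[:ignore_fragment_links] || opts[:external_links_only])
    return false if href !~ %r{^(\w+:)?//} && (opts[:ignore_local_links] || opts[:external_links_only])
    return false if same_origin.call(href) && opts[:external_links_only]
  end
  true
end

# Hypothetical demo predicate: relative hrefs and brettterpstra.com URLs count as same-site.
SITE = ->(href) { href !~ %r{\A(\w+:)?//} || href.match?(%r{\A(https?:)?//brettterpstra\.com(/|\z)}) }

if $PROGRAM_NAME == __FILE__
  %w[#top /about/ https://example.com/].each do |href|
    puts "#{href}: local=#{keep_link?(href, { local_links_only: true }, SITE)} " \
         "external=#{keep_link?(href, { external_links_only: true }, SITE)}"
  end
end
```

Keeping the two modes in separate branches, as the diff does, means the older external/relative rules never run when local mode is on, so the new flag cannot accidentally combine with them.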
data/lib/curly/version.rb CHANGED
@@ -1,5 +1,5 @@
  # Top level module for CurlyQ
  module Curly
  # Current version number
- VERSION = '0.0.10'
+ VERSION = '0.0.11'
  end
data/src/_README.md CHANGED
@@ -13,7 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
  [jq]: https://github.com/jqlang/jq "Command-line JSON processor"
  [yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
 
- The current version of `curlyq` is <!--VER-->0.0.9<!--END VER-->.
+ The current version of `curlyq` is <!--VER-->0.0.10<!--END VER-->.
 
  CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
 
@@ -68,11 +68,13 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
 
  You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
 
- If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string. If there are multiple matches, output will be an array:
+ If you end the query with a specific key, only that key will be output, always wrapped in an array. If there's only one match, the output is a single-element array containing the raw string:
 
  curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
 
- <h3 id="whats-next">What’s Next</h3>
+ [
+ "<h3 id=\"whats-next\">What’s Next</h3>"
+ ]
 
  #### Commands
 
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: curlyq
  version: !ruby/object:Gem::Version
- version: 0.0.10
+ version: 0.0.11
  platform: ruby
  authors:
  - Brett Terpstra
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-01-17 00:00:00.000000000 Z
+ date: 2024-01-21 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rake