curlyq 0.0.10 → 0.0.11
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/Gemfile.lock +1 -1
- data/README.md +8 -4
- data/bin/curlyq +10 -2
- data/lib/curly/curl/html.rb +24 -7
- data/lib/curly/version.rb +1 -1
- data/src/_README.md +5 -3
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: a9b0847eb3dd79e15b96bed47858ad0eb0df2ba7db8cf2e3395cb9e08e71c194
|
4
|
+
data.tar.gz: '06623683ff93c02087432750a150ac663c4558b7d18323bbbb367e004abd58ab'
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 8b7098dde55f9b76a53eff1f71a5d821a2db6d5828fb67428f2aa3ef5d6ab8e2bdbb79f5375fb5291b965ff3d0b9677cf0084782c078c2bb5575a8383bd26906
|
7
|
+
data.tar.gz: c0b02267ea0de1c490b2c2dcd171f8a992fa659733aa9bd9e0dc590988af3d7c5f4b6e38e0371ce72c879a1f956ec7f8b87e8432e684d8f7dad4f019314fa834
|
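The digests recorded in checksums.yaml can be checked against the gem's `metadata.gz` and `data.tar.gz` members. The following is a minimal sketch, not RubyGems' own verification code: `checksum_matches?` is a hypothetical helper name, and it assumes the member's bytes have already been read into a string.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper (not part of curlyq or RubyGems): returns true when
# both the SHA256 and SHA512 entries recorded for member_name in a
# checksums.yaml-style document match the digests of member_bytes.
def checksum_matches?(checksums_yaml, member_name, member_bytes)
  checksums = YAML.safe_load(checksums_yaml)
  {
    'SHA256' => Digest::SHA256,
    'SHA512' => Digest::SHA512
  }.all? do |algorithm, digest_class|
    expected = checksums.dig(algorithm, member_name)
    !expected.nil? && expected == digest_class.hexdigest(member_bytes)
  end
end
```

A caller would pass the text of checksums.yaml, a member name such as `data.tar.gz`, and that member's raw bytes.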
data/CHANGELOG.md
CHANGED
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -13,7 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
|
|
13
13
|
[jq]: https://github.com/jqlang/jq "Command-line JSON processor"
|
14
14
|
[yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
|
15
15
|
|
16
|
-
The current version of `curlyq` is 0.0.
|
16
|
+
The current version of `curlyq` is 0.0.11
|
17
17
|
.
|
18
18
|
|
19
19
|
CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
|
@@ -47,7 +47,7 @@ SYNOPSIS
|
|
47
47
|
curlyq [global options] command [command options] [arguments...]
|
48
48
|
|
49
49
|
VERSION
|
50
|
-
0.0.
|
50
|
+
0.0.11
|
51
51
|
|
52
52
|
GLOBAL OPTIONS
|
53
53
|
--help - Show this message
|
@@ -94,11 +94,13 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
|
|
94
94
|
|
95
95
|
You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
|
96
96
|
|
97
|
-
If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string
|
97
|
+
If you end the query with a specific key, only that key will be output, but it will be in an array. If there's only one match, it will be output as a raw string as a single element in an array.
|
98
98
|
|
99
99
|
curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
|
100
100
|
|
101
|
-
|
101
|
+
[
|
102
|
+
"<h3 id=\"whats-next\">What’s Next</h3>"
|
103
|
+
]
|
102
104
|
|
103
105
|
#### Commands
|
104
106
|
|
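Because a query ending in a key now returns an array even when there is a single match, a consumer can always parse the result as an array. A minimal sketch, assuming JSON output and using the sample from the README example above; `first_match` is a hypothetical helper, not part of curlyq:

```ruby
require 'json'

# Sample output copied from the README example in this diff.
SAMPLE_OUTPUT = <<~'JSON'
  [
    "<h3 id=\"whats-next\">What’s Next</h3>"
  ]
JSON

# Hypothetical helper: parse the query output and return the first match,
# or nil when the array is empty.
def first_match(json_output)
  JSON.parse(json_output).first
end

puts first_match(SAMPLE_OUTPUT)
```

Scripts written against 0.0.10 that expected a bare string for a single match would need a similar `.first` step after this change.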
@@ -237,6 +239,7 @@ COMMAND OPTIONS
|
|
237
239
|
-h, --header=arg - Define a header to send as "key=value" (may be used more than once, default: none)
|
238
240
|
--[no-]ignore_fragments - Ignore fragment hrefs when gathering content links
|
239
241
|
--[no-]ignore_relative - Ignore relative hrefs when gathering content links
|
242
|
+
-l, --local_links_only - Only gather internal (same-site) links
|
240
243
|
-q, --query, --filter=arg - Filter output using dot-syntax path (default: none)
|
241
244
|
-r, --raw=arg - Output a raw value for a key (default: none)
|
242
245
|
-s, --search=arg - Return an array of matches to a CSS or XPath query (default: none)
|
@@ -379,6 +382,7 @@ COMMAND OPTIONS
|
|
379
382
|
-d, --[no-]dedup - Filter out duplicate links, preserving only first one
|
380
383
|
--[no-]ignore_fragments - Ignore fragment hrefs when gathering content links
|
381
384
|
--[no-]ignore_relative - Ignore relative hrefs when gathering content links
|
385
|
+
-l, --local_links_only - Only gather internal (same-site) links
|
382
386
|
-q, --query, --filter=arg - Filter output using dot-syntax path (default: none)
|
383
387
|
-x, --external_links_only - Only gather external links
|
384
388
|
```
|
data/bin/curlyq
CHANGED
@@ -103,6 +103,9 @@ command %i[html curl] do |c|
|
|
103
103
|
c.desc 'Only gather external links'
|
104
104
|
c.switch %i[x external_links_only], default_value: false, negatable: false
|
105
105
|
|
106
|
+
c.desc 'Only gather internal (same-site) links'
|
107
|
+
c.switch %i[l local_links_only], default_value: false, negatable: false
|
108
|
+
|
106
109
|
c.action do |global_options, options, args|
|
107
110
|
urls = args.join(' ').split(/[, ]+/)
|
108
111
|
headers = break_headers(options[:header])
|
@@ -115,7 +118,8 @@ command %i[html curl] do |c|
|
|
115
118
|
compressed: options[:compressed], clean: options[:clean],
|
116
119
|
ignore_local_links: options[:ignore_relative],
|
117
120
|
ignore_fragment_links: options[:ignore_fragments],
|
118
|
-
external_links_only: options[:external_links_only]
|
121
|
+
external_links_only: options[:external_links_only],
|
122
|
+
local_links_only: options[:local_links_only] }
|
119
123
|
res = Curl::Html.new(url, curl_settings)
|
120
124
|
res.curl
|
121
125
|
|
@@ -417,6 +421,9 @@ command :links do |c|
|
|
417
421
|
c.desc 'Only gather external links'
|
418
422
|
c.switch %i[x external_links_only], default_value: false, negatable: false
|
419
423
|
|
424
|
+
c.desc 'Only gather internal (same-site) links'
|
425
|
+
c.switch %i[l local_links_only], default_value: false, negatable: false
|
426
|
+
|
420
427
|
c.desc 'Filter output using dot-syntax path'
|
421
428
|
c.flag %i[q query filter]
|
422
429
|
|
@@ -433,7 +440,8 @@ command :links do |c|
|
|
433
440
|
compressed: options[:compressed], clean: options[:clean],
|
434
441
|
ignore_local_links: options[:ignore_relative],
|
435
442
|
ignore_fragment_links: options[:ignore_fragments],
|
436
|
-
external_links_only: options[:external_links_only]
|
443
|
+
external_links_only: options[:external_links_only],
|
444
|
+
local_links_only: options[:local_links_only]
|
437
445
|
})
|
438
446
|
res.curl
|
439
447
|
|
data/lib/curly/curl/html.rb
CHANGED
@@ -11,7 +11,7 @@ module Curl
|
|
11
11
|
# Class for CURLing an HTML page
|
12
12
|
class Html
|
13
13
|
attr_accessor :settings, :browser, :source, :headers, :headers_only, :compressed, :clean, :fallback,
|
14
|
-
:ignore_local_links, :ignore_fragment_links, :external_links_only
|
14
|
+
:ignore_local_links, :ignore_fragment_links, :external_links_only, :local_links_only
|
15
15
|
|
16
16
|
attr_reader :url, :code, :meta, :links, :head, :body,
|
17
17
|
:title, :description, :body_links, :body_images
|
@@ -69,6 +69,7 @@ module Curl
|
|
69
69
|
@ignore_local_links = options[:ignore_local_links]
|
70
70
|
@ignore_fragment_links = options[:ignore_fragment_links]
|
71
71
|
@external_links_only = options[:external_links_only]
|
72
|
+
@local_links_only = options[:local_links_only]
|
72
73
|
|
73
74
|
@curl = TTY::Which.which('curl')
|
74
75
|
@url = url.nil? ? options[:url] : url
|
@@ -490,11 +491,19 @@ module Curl
|
|
490
491
|
|
491
492
|
link_href = link_href[2]
|
492
493
|
|
493
|
-
|
494
|
+
if @local_links_only
|
495
|
+
next if @ignore_fragment_links && link_href =~ /^#/
|
494
496
|
|
495
|
-
|
497
|
+
next unless same_origin?(link_href)
|
496
498
|
|
497
|
-
|
499
|
+
else
|
500
|
+
next if link_href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
|
501
|
+
|
502
|
+
next if link_href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)
|
503
|
+
|
504
|
+
next if same_origin?(link_href) && @external_links_only
|
505
|
+
|
506
|
+
end
|
498
507
|
|
499
508
|
link_title = tag.match(/title=(['"])(.*?)\1/)
|
500
509
|
link_title = link_title.nil? ? nil : link_title[2]
|
@@ -522,11 +531,19 @@ module Curl
|
|
522
531
|
link_tags.each do |m|
|
523
532
|
href = m['tag'].match(/href=(["'])(.*?)\1/)
|
524
533
|
href = href[2] unless href.nil?
|
525
|
-
|
534
|
+
if @local_links_only
|
535
|
+
next if href =~ /^#/ && @ignore_fragment_links
|
526
536
|
|
527
|
-
|
537
|
+
next unless same_origin?(href)
|
528
538
|
|
529
|
-
|
539
|
+
else
|
540
|
+
next if href =~ /^#/ && (@ignore_fragment_links || @external_links_only)
|
541
|
+
|
542
|
+
next if href !~ %r{^(\w+:)?//} && (@ignore_local_links || @external_links_only)
|
543
|
+
|
544
|
+
next if same_origin?(href) && @external_links_only
|
545
|
+
|
546
|
+
end
|
530
547
|
|
531
548
|
title = m['tag'].match(/title=(["'])(.*?)\1/)
|
532
549
|
title = title[2] unless title.nil?
|
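The filtering rules added in both hunks above can be read as a single predicate. The sketch below restates them outside the class, under stated assumptions: `keep_link?` is a hypothetical name, and `same_origin?` here is a stand-in written for illustration, since the gem's own `same_origin?` is called in this diff but its body is not shown.

```ruby
require 'uri'

# Stand-in for the gem's same_origin? helper (an assumption, not the gem's
# code). Hrefs with no host and no scheme (relative paths, fragments) are
# treated as same-site; otherwise the hosts must match.
def same_origin?(href, base_url)
  uri = URI.parse(href)
  return uri.scheme.nil? if uri.host.nil?

  uri.host == URI.parse(base_url).host
rescue URI::InvalidURIError
  false
end

# Hypothetical predicate mirroring the branch structure in the diff:
# local_links_only is checked first, so it takes precedence when both
# local_links_only and external_links_only are set.
def keep_link?(href, base_url, opts = {})
  if opts[:local_links_only]
    return false if opts[:ignore_fragment_links] && href.start_with?('#')

    same_origin?(href, base_url)
  else
    external = opts[:external_links_only]
    return false if href.start_with?('#') && (opts[:ignore_fragment_links] || external)
    return false if href !~ %r{^(\w+:)?//} && (opts[:ignore_local_links] || external)

    !(same_origin?(href, base_url) && external)
  end
end
```

For example, with a base of `https://example.com/`, `keep_link?('/about', base, local_links_only: true)` keeps the link, while the same call with `https://other.test/x` drops it.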
data/lib/curly/version.rb
CHANGED
data/src/_README.md
CHANGED
@@ -13,7 +13,7 @@ _If you find this useful, feel free to [buy me some coffee][donate]._
|
|
13
13
|
[jq]: https://github.com/jqlang/jq "Command-line JSON processor"
|
14
14
|
[yq]: https://github.com/mikefarah/yq "yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor"
|
15
15
|
|
16
|
-
The current version of `curlyq` is <!--VER-->0.0.
|
16
|
+
The current version of `curlyq` is <!--VER-->0.0.10<!--END VER-->.
|
17
17
|
|
18
18
|
CurlyQ is a utility that provides a simple interface for curl, with additional features for things like extracting images and links, finding elements by CSS selector or XPath, getting detailed header info, and more. It's designed to be part of a scripting pipeline, outputting everything as structured data (JSON or YAML). It also has rudimentary support for making calls to JSON endpoints easier, but it's expected that you'll use something like [jq] to parse the output.
|
19
19
|
|
@@ -68,11 +68,13 @@ Comparisons can be numeric or string comparisons. A numeric comparison like `cur
|
|
68
68
|
|
69
69
|
You can also use dot syntax inside of comparisons, e.g. `[links.rel*=me]` to target the links object (`html` command), and return only the links with a `rel=me` attribute. If the comparison is to an array object (like `class` or `rel`), it will match if any of the elements of the array match your comparison.
|
70
70
|
|
71
|
-
If you end the query with a specific key, only that key will be output. If there's only one match, it will be output as a raw string
|
71
|
+
If you end the query with a specific key, only that key will be output, but it will be in an array. If there's only one match, it will be output as a raw string as a single element in an array.
|
72
72
|
|
73
73
|
curlyq tags --search '#main .post h3' -q '[attrs.id*=what].source' 'https://brettterpstra.com/2024/01/10/introducing-curlyq-a-pipeline-oriented-curl-helper/'
|
74
74
|
|
75
|
-
|
75
|
+
[
|
76
|
+
"<h3 id=\"whats-next\">What’s Next</h3>"
|
77
|
+
]
|
76
78
|
|
77
79
|
#### Commands
|
78
80
|
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: curlyq
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.11
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Brett Terpstra
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-01-
|
11
|
+
date: 2024-01-21 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rake
|