metainspector 5.16.0 → 5.17.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8311486a8156f619d20a7cc93283e57ae055dd9fdcb222e14d3900adfae6d6a2
- data.tar.gz: 9f1288adf02bc224d5d5e6915a0c24aaab342cabe3d6a85e07989f553966324a
+ metadata.gz: 293357cd2d7638ce799d31024e708ba8b82ae8c6bb376304ab8b400aa959be05
+ data.tar.gz: a358b95df5bb28c81325d8f48ff6dfff85b423054ddd25195a91d3906e04c409
  SHA512:
- metadata.gz: 0ef843e07a8af813a4ed27f18b53eab03f5bfeaa60af5014fb3474346f3d02081891b94828ea034ffd7c5734f40c4fcb0e4028a2ccfe682152408d8f69820aec
- data.tar.gz: c8da85989e7c11ad7bccff3d9f53c94daf75962492c4fa0d54550db2a02bee7928747b36354c89d47f0875c459310ea404bea9154f7e3cbd414b489fb45110e3
+ metadata.gz: 113e3d9d929b7e9df03b107ba40a5b813ce9732510ccf858764048a607074cee10208a50686f0f1b3efb952de1185dff5b1739d41d8a278ec8097de010677137
+ data.tar.gz: 8c090ead5bb14f40a2c6c867da1e9178d6cc8665bdae7853395c28ed9ddee3de235880bc9727622964afab3ba24408c67a0a68e6b7c082ebb67f88c6a6eae42e
data/.circleci/config.yml CHANGED
@@ -2,27 +2,36 @@ version: 2.1
  orbs:
  ruby: circleci/ruby@1.0.4
  jobs:
- test_3_1:
+ test_3_2:
  docker:
- - image: cimg/ruby:3.1.7
+ - image: cimg/ruby:3.2.9
  steps:
  - checkout
  - ruby/install-deps
  - run:
  name: Run tests
  command: bundle exec rake
- test_3_2:
+ test_3_3:
  docker:
- - image: cimg/ruby:3.2.6
+ - image: cimg/ruby:3.3.10
  steps:
  - checkout
  - ruby/install-deps
  - run:
  name: Run tests
  command: bundle exec rake
- test_3_3:
+ test_3_4:
+ docker:
+ - image: cimg/ruby:3.4.8
+ steps:
+ - checkout
+ - ruby/install-deps
+ - run:
+ name: Run tests
+ command: bundle exec rake
+ test_4_0:
  docker:
- - image: cimg/ruby:3.3.8
+ - image: cimg/ruby:4.0.0
  steps:
  - checkout
  - ruby/install-deps
@@ -33,6 +42,7 @@ workflows:
  version: 2
  deploy:
  jobs:
- - test_3_1
  - test_3_2
  - test_3_3
+ - test_3_4
+ - test_4_0
data/.gitignore CHANGED
@@ -9,3 +9,4 @@ pkg/*
  .rubocop_todo.yml
  .rubocop.yml
  .tool-versions
+ Gemfile.lock
data/README.md CHANGED
@@ -21,24 +21,65 @@ gem 'metainspector'
 
  Supported Ruby versions are defined in [`.circleci/config.yml`](.circleci/config.yml).
 
+ ## Supporting MetaInspector
+
+ MetaInspector has been downloaded more than 3 million times via RubyGems.org and has more than 1K stars on Github.
+
+ If you're using MetaInspector, it would be much appreciated if you can give back to the project in order to help ensure its continued development.
+
+ ### Gold Sponsors
+
+ <div style="display: flex; gap: 10px;">
+
+ <a href="https://github.com/sponsors/jaimeiniesta/sponsorships?sponsor=jaimeiniesta&tier_id=507665">
+ <img height="175" src="assets/sponsor_logo_placeholder.gif" alt="Become a Gold Sponsor">
+ </a>
+
+ </div>
+
+ ### Silver Sponsors
+
+ <div style="display: flex; gap: 10px;">
+
+ <a href="https://rocketvalidator.com">
+ <img height="150" src="assets/sponsor_logo_rocket_validator.png" alt="Rocket Validator - accessibility and HTML site-wide validator">
+ </a>
+
+ <a href="https://github.com/sponsors/jaimeiniesta/sponsorships?sponsor=jaimeiniesta&tier_id=63404">
+ <img height="150" src="assets/sponsor_logo_placeholder.gif" alt="Become a Silver Sponsor">
+ </a>
+
+ </div>
+
+ ### Bronze Sponsors
+
+ <div style="display: flex; gap: 10px;">
+
+ <a href="https://github.com/sponsors/jaimeiniesta/sponsorships?sponsor=jaimeiniesta&tier_id=63403">
+ <img height="125" src="assets/sponsor_logo_placeholder.gif" alt="Become a Bronze Sponsor">
+ </a>
+
+ </div>
+
+
  ## Usage
 
  Initialize a MetaInspector instance for an URL, like this:
 
  ```ruby
- page = MetaInspector.new('http://sitevalidator.com')
+ page = MetaInspector.new('https://github.com')
  ```
 
  If you don't include the scheme on the URL, http:// will be used by default:
 
  ```ruby
- page = MetaInspector.new('sitevalidator.com')
+ page = MetaInspector.new('github.com')
  ```
 
  You can also include the html which will be used as the document to scrape:
 
  ```ruby
- page = MetaInspector.new("http://sitevalidator.com",
+ page = MetaInspector.new("https://github.com",
  :document => "<html>...</html>")
  ```
 
@@ -62,8 +103,8 @@ page.tracked? # returns true if the url contains known tracking param
  page.untracked_url # returns the url with the known tracking parameters removed
  page.untrack! # removes the known tracking parameters from the url
  page.scheme # Scheme of the page (http, https)
- page.host # Hostname of the page (like, sitevalidator.com, without the scheme)
- page.root_url # Root url (scheme + host, like http://sitevalidator.com/)
+ page.host # Hostname of the page (like, github.com, without the scheme)
+ page.root_url # Root url (scheme + host, like https://github.com/)
  ```
 
  ### Head links
@@ -236,8 +277,8 @@ page.content_type # content-type returned by the server when the url was
  You can also access most of the scraped data as a hash:
 
  ```ruby
- page.to_hash # { "url" => "http://sitevalidator.com",
- "title" => "MarkupValidator :: site-wide markup validation tool", ... }
+ page.to_hash # { "url" => "https://github.com",
+ "title" => "GitHub", ... }
  ```
 
  The original document is accessible from:
@@ -366,7 +407,7 @@ MetaInspector.new('https://example.com', faraday_options: { ssl: { verify: false
  MetaInspector will by default raise an exception when trying to parse a non-HTML URL (one that has a content-type different than text/html). You can disable this behaviour with:
 
  ```ruby
- page = MetaInspector.new('sitevalidator.com', :allow_non_html_content => true)
+ page = MetaInspector.new('github.com', :allow_non_html_content => true)
  ```
 
  ```ruby
@@ -385,8 +426,8 @@ By default, URLs are normalized using the Addressable gem. For example:
 
  ```ruby
  # Normalization will add a default scheme and a trailing slash...
- page = MetaInspector.new('sitevalidator.com')
- page.url # http://sitevalidator.com/
+ page = MetaInspector.new('github.com')
+ page.url # https://github.com/
 
  # ...and it will also convert international characters
  page = MetaInspector.new('http://www.詹姆斯.com')
@@ -434,23 +475,14 @@ $ irb
  >> require 'metainspector'
  => true
 
- >> page = MetaInspector.new('http://sitevalidator.com')
- => #<MetaInspector:0x11330c0 @url="http://sitevalidator.com">
+ >> page = MetaInspector.new('http://github.com')
+ => #<MetaInspector:0x11330c0 @url="http://github.com">
 
  >> page.title
- => "MarkupValidator :: site-wide markup validation tool"
+ => "GitHub"
 
  >> page.meta['description']
- => "Site-wide markup validation tool. Validate the markup of your whole site with just one click."
-
- >> page.meta['keywords']
- => "html, markup, validation, validator, tool, w3c, development, standards, free"
-
- >> page.links.size
- => 15
-
- >> page.links[4]
- => "/plans-and-pricing"
+ => "Join the most widely adopted, AI-powered developer platform where millions of developers, businesses, and the largest open source community build software that advances humanity."
  ```
 
  ## Contributing guidelines
Binary file
@@ -1,3 +1,3 @@
  module MetaInspector
- VERSION = '5.16.0'
+ VERSION = '5.17.0'
  end
@@ -14,12 +14,12 @@ Gem::Specification.new do |gem|
  gem.require_paths = ["lib"]
  gem.version = MetaInspector::VERSION
 
- gem.add_dependency 'nokogiri', '~> 1.18.8'
+ gem.add_dependency 'nokogiri', '~> 1.19.0'
  gem.add_dependency 'faraday', '~> 2.5'
  gem.add_dependency 'faraday-cookie_jar', '~> 0.0'
  gem.add_dependency 'faraday-encoding', '~> 0.0'
  gem.add_dependency 'faraday-follow_redirects', '~> 0.3'
- gem.add_dependency 'faraday-gzip', '>= 0.1', '< 3.0'
+ gem.add_dependency 'faraday-gzip', '>= 0.1', '< 4.0'
  gem.add_dependency 'faraday-http-cache', '~> 2.5'
  gem.add_dependency 'faraday-retry', '~> 2.0'
  gem.add_dependency 'addressable', '~> 2.8.4'
@@ -31,8 +31,8 @@ Gem::Specification.new do |gem|
  gem.add_development_dependency 'awesome_print', '~> 1.9'
  gem.add_development_dependency 'rake', '~> 13.0'
  gem.add_development_dependency 'pry', '~> 0.14'
- gem.add_development_dependency 'puma', '~> 6.4.0'
+ gem.add_development_dependency 'puma', '~> 7.0.2'
  gem.add_development_dependency 'rubocop', '~> 1.34'
- gem.add_development_dependency 'resolv', '~> 0.2.2'
- gem.add_development_dependency 'sinatra', '~> 3.0.6'
+ gem.add_development_dependency 'resolv', '~> 0.6.2'
+ gem.add_development_dependency 'sinatra', '~> 4.2.0'
  end
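Note on the runtime constraint changes in the gemspec hunks above: a minimal sketch, using RubyGems' own `Gem::Requirement` and `Gem::Version` classes, of which installed versions each old and new constraint accepts. The helper name `allows?` and the sample version strings are hypothetical and chosen only for illustration; they are not claims about which releases of nokogiri or faraday-gzip exist.

```ruby
require 'rubygems'

# Constraints copied from the gemspec diff (old vs. new).
OLD_NOKOGIRI = Gem::Requirement.new('~> 1.18.8')
NEW_NOKOGIRI = Gem::Requirement.new('~> 1.19.0')
OLD_GZIP     = Gem::Requirement.new('>= 0.1', '< 3.0')
NEW_GZIP     = Gem::Requirement.new('>= 0.1', '< 4.0')

# Hypothetical helper: true when the version string satisfies the requirement.
def allows?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

# Illustrative version strings only.
%w[1.18.10 1.19.0 1.19.5 1.20.0].each do |v|
  puts "nokogiri #{v}: old=#{allows?(OLD_NOKOGIRI, v)} new=#{allows?(NEW_NOKOGIRI, v)}"
end

%w[2.0.1 3.0.0 4.0.0].each do |v|
  puts "faraday-gzip #{v}: old=#{allows?(OLD_GZIP, v)} new=#{allows?(NEW_GZIP, v)}"
end
```

Because `~> 1.19.0` is a pessimistic constraint on the patch level, it excludes the whole 1.18.x series, so a project that pins nokogiri to 1.18.x would likely need to relax that pin before moving to 5.17.0. The faraday-gzip change only raises the upper bound, so it widens the accepted range rather than shifting it.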
metadata CHANGED
@@ -1,14 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: metainspector
  version: !ruby/object:Gem::Version
- version: 5.16.0
+ version: 5.17.0
  platform: ruby
  authors:
  - Jaime Iniesta
- autorequire:
  bindir: bin
  cert_chain: []
- date: 2025-05-15 00:00:00.000000000 Z
+ date: 1980-01-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
@@ -16,14 +15,14 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 1.18.8
+ version: 1.19.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 1.18.8
+ version: 1.19.0
  - !ruby/object:Gem::Dependency
  name: faraday
  requirement: !ruby/object:Gem::Requirement
@@ -89,7 +88,7 @@ dependencies:
  version: '0.1'
  - - "<"
  - !ruby/object:Gem::Version
- version: '3.0'
+ version: '4.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
@@ -99,7 +98,7 @@ dependencies:
  version: '0.1'
  - - "<"
  - !ruby/object:Gem::Version
- version: '3.0'
+ version: '4.0'
  - !ruby/object:Gem::Dependency
  name: faraday-http-cache
  requirement: !ruby/object:Gem::Requirement
@@ -246,14 +245,14 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 6.4.0
+ version: 7.0.2
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 6.4.0
+ version: 7.0.2
  - !ruby/object:Gem::Dependency
  name: rubocop
  requirement: !ruby/object:Gem::Requirement
@@ -274,28 +273,28 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.2.2
+ version: 0.6.2
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.2.2
+ version: 0.6.2
  - !ruby/object:Gem::Dependency
  name: sinatra
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 3.0.6
+ version: 4.2.0
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 3.0.6
+ version: 4.2.0
  description: MetaInspector lets you scrape a web page and get its links, images, texts,
  meta tags...
  email: jaimeiniesta@gmail.com
@@ -311,10 +310,11 @@ files:
  - ".rubocop.yml.example"
  - CHANGELOG.md
  - Gemfile
- - Gemfile.lock
  - MIT-LICENSE
  - README.md
  - Rakefile
+ - assets/sponsor_logo_placeholder.gif
+ - assets/sponsor_logo_rocket_validator.png
  - bin/console
  - examples/basic_scraping.rb
  - examples/faraday_redirect_options.rb
@@ -417,7 +417,6 @@ homepage: https://github.com/jaimeiniesta/metainspector
  licenses:
  - MIT
  metadata: {}
- post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -432,8 +431,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.5.22
- signing_key:
+ rubygems_version: 4.0.3
  specification_version: 4
  summary: MetaInspector is a ruby gem for web scraping purposes, that returns metadata
  from a given URL
data/Gemfile.lock DELETED
@@ -1,146 +0,0 @@
- PATH
- remote: .
- specs:
- metainspector (5.16.0)
- addressable (~> 2.8.4)
- faraday (~> 2.5)
- faraday-cookie_jar (~> 0.0)
- faraday-encoding (~> 0.0)
- faraday-follow_redirects (~> 0.3)
- faraday-gzip (>= 0.1, < 3.0)
- faraday-http-cache (~> 2.5)
- faraday-retry (~> 2.0)
- fastimage (~> 2.2)
- nesty (~> 1.0)
- nokogiri (~> 1.18.8)
-
- GEM
- remote: http://rubygems.org/
- specs:
- addressable (2.8.5)
- public_suffix (>= 2.0.2, < 6.0)
- ast (2.4.2)
- awesome_print (1.9.2)
- coderay (1.1.3)
- crack (0.4.5)
- rexml
- diff-lcs (1.5.0)
- domain_name (0.6.20240107)
- faraday (2.13.1)
- faraday-net_http (>= 2.0, < 3.5)
- json
- logger
- faraday-cookie_jar (0.0.7)
- faraday (>= 0.8.0)
- http-cookie (~> 1.0.0)
- faraday-encoding (0.0.6)
- faraday
- faraday-follow_redirects (0.3.0)
- faraday (>= 1, < 3)
- faraday-gzip (2.0.1)
- faraday (>= 1.0)
- zlib (~> 3.0)
- faraday-http-cache (2.5.1)
- faraday (>= 0.8)
- faraday-net_http (3.4.0)
- net-http (>= 0.5.0)
- faraday-retry (2.3.1)
- faraday (~> 2.0)
- fastimage (2.4.0)
- hashdiff (1.0.1)
- http-cookie (1.0.8)
- domain_name (~> 0.5)
- json (2.7.1)
- language_server-protocol (3.17.0.3)
- logger (1.7.0)
- method_source (1.0.0)
- mustermann (3.0.0)
- ruby2_keywords (~> 0.0.1)
- nesty (1.0.2)
- net-http (0.6.0)
- uri
- nio4r (2.5.9)
- nokogiri (1.18.8-arm64-darwin)
- racc (~> 1.4)
- nokogiri (1.18.8-x86_64-linux-gnu)
- racc (~> 1.4)
- parallel (1.24.0)
- parser (3.3.0.5)
- ast (~> 2.4.1)
- racc
- pry (0.14.2)
- coderay (~> 1.1)
- method_source (~> 1.0)
- public_suffix (5.0.3)
- puma (6.4.0)
- nio4r (~> 2.0)
- racc (1.8.1)
- rack (2.2.14)
- rack-protection (3.0.6)
- rack
- rainbow (3.1.1)
- rake (13.1.0)
- regexp_parser (2.9.0)
- resolv (0.2.2)
- rexml (3.2.6)
- rspec (3.12.0)
- rspec-core (~> 3.12.0)
- rspec-expectations (~> 3.12.0)
- rspec-mocks (~> 3.12.0)
- rspec-core (3.12.1)
- rspec-support (~> 3.12.0)
- rspec-expectations (3.12.2)
- diff-lcs (>= 1.2.0, < 2.0)
- rspec-support (~> 3.12.0)
- rspec-mocks (3.12.3)
- diff-lcs (>= 1.2.0, < 2.0)
- rspec-support (~> 3.12.0)
- rspec-support (3.12.0)
- rubocop (1.62.0)
- json (~> 2.3)
- language_server-protocol (>= 3.17.0)
- parallel (~> 1.10)
- parser (>= 3.3.0.2)
- rainbow (>= 2.2.2, < 4.0)
- regexp_parser (>= 1.8, < 3.0)
- rexml (>= 3.2.5, < 4.0)
- rubocop-ast (>= 1.31.1, < 2.0)
- ruby-progressbar (~> 1.7)
- unicode-display_width (>= 2.4.0, < 3.0)
- rubocop-ast (1.31.1)
- parser (>= 3.3.0.4)
- ruby-progressbar (1.13.0)
- ruby2_keywords (0.0.5)
- sinatra (3.0.6)
- mustermann (~> 3.0)
- rack (~> 2.2, >= 2.2.4)
- rack-protection (= 3.0.6)
- tilt (~> 2.0)
- tilt (2.1.0)
- unicode-display_width (2.5.0)
- uri (1.0.3)
- webmock (3.18.1)
- addressable (>= 2.8.0)
- crack (>= 0.3.2)
- hashdiff (>= 0.4.0, < 2.0.0)
- zlib (3.2.1)
-
- PLATFORMS
- arm64-darwin-22
- arm64-darwin-24
- x86_64-linux
-
- DEPENDENCIES
- awesome_print (~> 1.9)
- metainspector!
- pry (~> 0.14)
- puma (~> 6.4.0)
- rake (~> 13.0)
- resolv (~> 0.2.2)
- rspec (~> 3.11)
- rubocop (~> 1.34)
- sinatra (~> 3.0.6)
- webmock (~> 3.17)
-
- BUNDLED WITH
- 2.3.20