uri_resolver 0.1.0

checksums.yaml.gz ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 0b5218486467a58c9767f9416999e04e3a9c3a7a
4
+ data.tar.gz: 0a2747914812689af0cf8d49f3a8ab3d1e8dc1fe
5
+ SHA512:
6
+ metadata.gz: 4744dfb882f5cf5dd1603922bdd786d5ec6424bd55ffb2f5abf4c66af2278bb02f897dd7a64d721187c67cf45f36b4ff3d0fcff3760685f79891c23d30f0875d
7
+ data.tar.gz: 65c1e5fc818c305d4f243144cd99ba3506aa553be60345caa1312e85671212efef99f5a5ff31d1207aa283308c9d0a076bbba86f63475b689da0cfed36fe77a2
data/.gitignore ADDED
@@ -0,0 +1,9 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,4 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.3.0
4
+ before_install: gem install bundler -v 1.11.2
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in uri_resolver.gemspec
4
+ gemspec
data/README.md ADDED
@@ -0,0 +1,74 @@
1
+ # UriResolver
2
+
3
+ Checking whether a URI resolves manually is easy: You simply enter it into your
4
+ browser and wait to see what happens!
5
+
6
+ Scripting this is normally quite easy, too. Some things you might try are:
7
+
8
+ * Check if the URI is a valid format, i.e. does it match `URI::regexp`?
9
+ * Ping the URI, and check for a response.
10
+ * Try using `URI.parse` or `Net::HTTP.get`, and rescue `Errno::ECONNREFUSED`.
11
+
12
+ For *most* URIs a simple method like the above works fine.
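A minimal sketch of the simple approach listed above, using only Ruby's standard library. The names `naive_resolves?`, `DEFAULT_CONNECTOR` and the `connector` parameter are hypothetical and not part of this gem; the sketch also validates with `URI.parse` rather than `URI::regexp`, which is an assumption made for brevity.

```ruby
require 'socket'
require 'uri'

# Hypothetical helper (not part of uri_resolver): the "simple" check described above.
# `connector` is an assumed injection point so the sketch can be exercised offline.
DEFAULT_CONNECTOR = ->(host) { TCPSocket.new(host, 80).close }

def naive_resolves?(host, connector: DEFAULT_CONNECTOR)
  # Step 1: reject anything that cannot form a valid HTTP URI.
  uri = URI.parse("http://#{host}")
  return false if uri.host.nil? || uri.host.empty?

  # Step 2: try to connect, treating the usual failures as "does not resolve".
  connector.call(uri.host)
  true
rescue URI::InvalidURIError, SocketError, Errno::ECONNREFUSED
  false
end
```

Note that nothing here bounds how long the connection attempt may block, which is exactly the weakness described next.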
13
+
14
+ However, for many "obscure" websites, such as those registered in new GTLDs, you
15
+ run into all sorts of trouble attempting this - especially when checking a very
16
+ long list! For example:
17
+
18
+ * Even a simple `ping` (i.e. opening a TCP connection) can *freeze* while
19
+ connecting to the DNS server - which causes it to take ~20 seconds to check
20
+ just one URI!!
21
+ * Some URIs resolve via many layers of redirections.
22
+ * Some URIs resolve, but to "invalid" URIs which contain escape sequences.
23
+ * The URI may resolve to an HTTPS connection with a missing or misconfigured SSL certificate.
24
+ * The URI may connect, but then fail or time out when attempting to *read* any data.
25
+
26
+ This Ruby gem attempts to account gracefully for all of these edge cases (and more)
27
+ by rescuing each known failure mode and mapping it to a status. It also runs each
28
+ check in a separate thread with a deadline, to prevent system-blocking timeouts.
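The threaded timeout idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the gem's own implementation (which polls `Thread#alive?` in a loop); `run_with_deadline` and `DeadlineExceeded` are hypothetical names.

```ruby
# Hypothetical names throughout; not part of uri_resolver.
DeadlineExceeded = Class.new(StandardError)

def run_with_deadline(seconds)
  worker = Thread.new { yield }
  # Keep stderr quiet: the block's exception is re-raised in the caller by #join.
  worker.report_on_exception = false

  # Thread#join(limit) returns nil if the thread is still running after `limit` seconds.
  if worker.join(seconds).nil?
    worker.kill
    raise DeadlineExceeded, "no result within #{seconds}s"
  end

  worker.value # the block's return value
end
```

A caller would wrap each slow step, for example `run_with_deadline(1) { TCPSocket.new(host, 80).close }`, and rescue `DeadlineExceeded` to report an uncertain result rather than hanging.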
29
+
30
+ It's a pretty simplistic solution, but will hopefully be useful to others.
31
+
32
+ ## Installation
33
+
34
+ Feel free to use this as you like, but I'm currently experimenting with it;
35
+ implementation may change significantly, until `v1.0.0` is published.
36
+
37
+
38
+ Add this line to your application's Gemfile:
39
+
40
+ ```ruby
41
+ gem 'uri_resolver'
42
+ ```
43
+
44
+ And then execute:
45
+
46
+ $ bundle
47
+
48
+ ## Usage
49
+
50
+ UriResolver.resolve_status "google.com" # => :resolves
51
+ UriResolver.resolve_status "sakjflkdjsfh.com" # => :does_not_resolve
52
+
53
+ # If the connection times out, then the gem returns :maybe_resolves
54
+ UriResolver.resolve_status "getmintedpoker.tv" # => :maybe_resolves
55
+ # Such URIs *probably* don't resolve, but a manual check may be a good idea
56
+
57
+ Warning: This is *not perfect*; you can still get some false negatives. For example:
58
+
59
+ # Intermittent and very slow... This often times out, but sometimes does resolve!
60
+ UriResolver.resolve_status "bet-and-win.gr" # => :maybe_resolves
61
+
62
+ # This IS a real website, but (currently) has "Bandwidth Limit Exceeded" error:
63
+ UriResolver.resolve_status "notarealwebsite.com" # => :does_not_resolve
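When checking a long list, it may help to bucket hosts by status so that the uncertain ones can be reviewed by hand. The sketch below is one way to do that; `triage` is a hypothetical helper, and the stub resolver with canned answers is an assumption that stands in for `UriResolver.method(:resolve_status)` so the example runs without a network.

```ruby
# Hypothetical helper, not part of uri_resolver: bucket hosts by status.
# `resolver` is assumed to be any callable returning :resolves,
# :maybe_resolves or :does_not_resolve for a given host.
def triage(hosts, resolver)
  buckets = { resolves: [], maybe_resolves: [], does_not_resolve: [] }
  hosts.each { |host| buckets.fetch(resolver.call(host)) << host }
  buckets
end

# Stand-in resolver with canned answers, so the sketch runs offline.
canned = {
  "google.com" => :resolves,
  "getmintedpoker.tv" => :maybe_resolves
}
stub = ->(host) { canned.fetch(host, :does_not_resolve) }

report = triage(%w[google.com getmintedpoker.tv sakjflkdjsfh.com], stub)
puts "Check manually: #{report[:maybe_resolves].join(', ')}"
```

Using `fetch` on the buckets means an unexpected status raises a `KeyError` instead of being silently dropped.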
64
+
65
+ ## Development
66
+
67
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
68
+
69
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
70
+
71
+ ## Contributing
72
+
73
+ Bug reports and pull requests are welcome on GitHub at https://github.com/tom-lord/uri_resolver.
74
+
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "uri_resolver"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/uri_resolver.rb ADDED
@@ -0,0 +1,84 @@
1
+ require "uri_resolver/version"
2
+ require 'httparty'
3
+
4
+ module UriResolver
5
+ module Status
6
+ # Closest thing ruby has to an Enum....
7
+ Resolves = :resolves
8
+ MaybeResolves = :maybe_resolves
9
+ DoesNotResolve = :does_not_resolve
10
+ end
11
+
12
+ ConnectionFailed = Class.new(StandardError)
13
+ include HTTParty
14
+ default_timeout 1 # HTTParty
15
+
16
+ def self.resolve_status(uri)
17
+ # TODO: I used to use code like this, when the below was inside a class:
18
+ #@@responses[uri] ||= try_get_uri(uri) # Might raise error, if DNS error/timeout
19
+ # Is it safe/sensible to memoize the responses here, for performance?
20
+
21
+ response = try_get_uri(uri) # Might raise error, if DNS error/timeout
22
+ case
23
+ # TODO: .dev domains behave differently on Ubuntu vs Windows!!
24
+ when response.code != 200 || response.headers['server'] == "Apache/2.4.7 (Ubuntu)"
25
+ Status::DoesNotResolve
26
+ else
27
+ # TODO: Is it possible to analyse anything further?
28
+ # E.g. does the web page redirect? Does it contain certain strings? Is it at least a minimum size? ....
29
+ Status::Resolves
30
+ end
31
+
32
+ rescue SocketError # from TCPSocket#new
33
+ Status::DoesNotResolve
34
+ rescue Net::OpenTimeout # from HTTParty#get
35
+ Status::DoesNotResolve
36
+ rescue Net::ReadTimeout # from HTTParty#get
37
+ Status::DoesNotResolve
38
+ rescue Errno::ECONNREFUSED # From HTTParty#get
39
+ Status::DoesNotResolve
40
+ rescue Errno::ECONNRESET # From HTTParty#get
41
+ Status::DoesNotResolve
42
+ rescue ConnectionFailed # From TCPSocket#new or HTTParty#get -- something funny happened, check manually!
43
+ Status::MaybeResolves
44
+ rescue OpenSSL::SSL::SSLError # Resolves, but SSL certificate is not set up correctly
45
+ Status::Resolves
46
+ rescue URI::InvalidURIError # Resolves, but to weird URL (contains escape sequences??)
47
+ Status::Resolves
48
+ rescue HTTParty::RedirectionTooDeep # Resolves, but with lots of redirection!
49
+ Status::Resolves
50
+ rescue StandardError => e # Something else happened??!!
51
+ warn "URI #{uri} did not resolve, for unknown reason:"
52
+ warn e.message
53
+ Status::MaybeResolves
54
+ end
55
+
56
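+ # NOTE: a bare `private` has no effect on `def self.` methods, so each helper is marked explicitly.
57
+
58
+ private_class_method def self.try_get_uri(uri, timeout = 1)
59
+ thread_with_timeout(timeout) { TCPSocket.new(uri, 80).close } # Close the probe socket so it does not leak
60
+ # TODO: Do not prepend with "http://" if inappropriate
61
+ thread_with_timeout(timeout*5) { HTTParty.get("http://#{uri}") } # This can take longer...
62
+ end
63
+
64
+ private_class_method def self.thread_with_timeout(timeout)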
65
+ th = Thread.new { yield }
66
+ th.priority = 2 # Make sure this runs immediately
67
+
68
+ # Do not use Timeout::timeout, as this does not play nicely with multi-threading
69
+ start = Time.now
70
+ loop do
71
+ if th.alive?
72
+ if (Time.now - start > timeout)
73
+ Thread.kill(th)
74
+ raise ConnectionFailed
75
+ end
76
+ else
77
+ return th.value
78
+ end
79
+ sleep 0.1 # Prevent system hammering
80
+ end
81
+ end
82
+
83
+ end
84
+
data/lib/uri_resolver/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module UriResolver
2
+ VERSION = "0.1.0"
3
+ end
data/uri_resolver.gemspec ADDED
@@ -0,0 +1,25 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'uri_resolver/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "uri_resolver"
8
+ spec.version = UriResolver::VERSION
9
+ spec.authors = ["Tom Lord"]
10
+ spec.email = ["tom.lord@comlaude.com"]
11
+
12
+ spec.summary = %q{Like "ping", but better -- especially for new GTLDs}
13
+ spec.homepage = "https://github.com/tom-lord/uri_resolver"
14
+
15
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
16
+ spec.bindir = "exe"
17
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
18
+ spec.require_paths = ["lib"]
19
+
20
+ spec.add_development_dependency "bundler", "~> 1.11"
21
+ spec.add_development_dependency "rake", "~> 10.0"
22
+ spec.add_development_dependency "rspec", "~> 3.0"
23
+
24
+ spec.add_runtime_dependency 'httparty'
25
+ end
metadata ADDED
@@ -0,0 +1,110 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: uri_resolver
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Tom Lord
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2016-04-06 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.11'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.11'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: httparty
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ description:
70
+ email:
71
+ - tom.lord@comlaude.com
72
+ executables: []
73
+ extensions: []
74
+ extra_rdoc_files: []
75
+ files:
76
+ - ".gitignore"
77
+ - ".rspec"
78
+ - ".travis.yml"
79
+ - Gemfile
80
+ - README.md
81
+ - Rakefile
82
+ - bin/console
83
+ - bin/setup
84
+ - lib/uri_resolver.rb
85
+ - lib/uri_resolver/version.rb
86
+ - uri_resolver.gemspec
87
+ homepage: https://github.com/tom-lord/uri_resolver
88
+ licenses: []
89
+ metadata: {}
90
+ post_install_message:
91
+ rdoc_options: []
92
+ require_paths:
93
+ - lib
94
+ required_ruby_version: !ruby/object:Gem::Requirement
95
+ requirements:
96
+ - - ">="
97
+ - !ruby/object:Gem::Version
98
+ version: '0'
99
+ required_rubygems_version: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: '0'
104
+ requirements: []
105
+ rubyforge_project:
106
+ rubygems_version: 2.5.1
107
+ signing_key:
108
+ specification_version: 4
109
+ summary: Like "ping", but better -- especially for new GTLDs
110
+ test_files: []