uri_resolver 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 0b5218486467a58c9767f9416999e04e3a9c3a7a
4
+ data.tar.gz: 0a2747914812689af0cf8d49f3a8ab3d1e8dc1fe
5
+ SHA512:
6
+ metadata.gz: 4744dfb882f5cf5dd1603922bdd786d5ec6424bd55ffb2f5abf4c66af2278bb02f897dd7a64d721187c67cf45f36b4ff3d0fcff3760685f79891c23d30f0875d
7
+ data.tar.gz: 65c1e5fc818c305d4f243144cd99ba3506aa553be60345caa1312e85671212efef99f5a5ff31d1207aa283308c9d0a076bbba86f63475b689da0cfed36fe77a2
data/.gitignore ADDED
@@ -0,0 +1,9 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,4 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.3.0
4
+ before_install: gem install bundler -v 1.11.2
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in uri_resolver.gemspec
4
+ gemspec
data/README.md ADDED
@@ -0,0 +1,74 @@
1
+ # UriResolver
2
+
3
+ Checking whether a URI resolves manually is easy: You simply enter it into your
4
+ browser and wait to see what happens!
5
+
6
+ Scripting this is normally quite easy, too. Some things you might try are:
7
+
8
+ * Check if the URI is a valid format, i.e. does it match `URI::regexp`?
9
+ * Ping the URI, and check for a response.
10
+ * Try using `URI.parse` or `Net::HTTP.get`, and rescue if `Errno::ECONNREFUSED`?
11
+
12
+ For *most* URIs a simple method like the above works fine.
13
+
14
+ However, for many "obscure" websites, such as those registered in new GTLDs, you
15
+ run into all sorts of trouble attempting this - especially when checking a very
16
+ long list! For example:
17
+
18
+ * Even a simple `ping` (i.e. opening a TCP connection) can *freeze* while
19
+ connecting to the DNS server - which causes it to take ~20 seconds to check
20
+ just one URI!!
21
+ * Some URIs resolve via many layers of redirections.
22
+ * Some URIs resolve, but to "invalid" URIs which contain escape sequences.
23
+ * The URI may resolve to an HTTPS connection with a missing or misconfigured SSL certificate.
24
+ * The URI may connect, but fail or time out when attempting to *read* any data.
25
+
26
+ This Ruby gem attempts to gracefully account for all of these edge cases (and more),
27
+ by use of intelligent error handling. It also uses a multi-threaded approach to prevent
28
+ system-blocking timeouts.
29
+
30
+ It's a pretty simplistic solution, but will hopefully be useful to others.
31
+
32
+ ## Installation
33
+
34
+ Feel free to use this as you like, but I'm currently experimenting with it;
35
+ the implementation may change significantly until `v1.0.0` is published.
36
+
37
+
38
+ Add this line to your application's Gemfile:
39
+
40
+ ```ruby
41
+ gem 'uri_resolver'
42
+ ```
43
+
44
+ And then execute:
45
+
46
+ $ bundle
47
+
48
+ ## Usage
49
+
50
+ UriResolver.resolve_status "google.com" # => :resolves
51
+ UriResolver.resolve_status "sakjflkdjsfh.com" # => :does_not_resolve
52
+
53
+ # If the connection times out, then the gem returns :maybe_resolves
54
+ UriResolver.resolve_status "getmintedpoker.tv" # => :maybe_resolves
55
+ # Such URIs *probably* don't resolve, but a manual check may be a good idea
56
+
57
+ Warning: This is *not perfect*; you can still get some false negatives. For example:
58
+
59
+ # Intermittent and very slow... This often times out, but sometimes does resolve!
60
+ UriResolver.resolve_status "bet-and-win.gr" # => :maybe_resolves
61
+
62
+ # This IS a real website, but (currently) has "Bandwidth Limit Exceeded" error:
63
+ UriResolver.resolve_status "notarealwebsite.com" # => :does_not_resolve
64
+
65
+ ## Development
66
+
67
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
68
+
69
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
70
+
71
+ ## Contributing
72
+
73
+ Bug reports and pull requests are welcome on GitHub at https://github.com/tom-lord/uri_resolver.
74
+
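The "simple" checks listed near the top of the README can be sketched roughly as below. This is an illustration only, not code from the gem: the method names `plausible_uri?` and `naive_resolves?` are hypothetical, and the sketch uses `URI.parse` and `Net::HTTP.get_response` from the standard library rather than `URI::regexp` or a real ping.

```ruby
require "uri"
require "net/http"

# Hypothetical helper: true if the string parses as an absolute http(s) URI
# with a non-empty host. A bare host such as "google.com" fails this check.
def plausible_uri?(string)
  uri = URI.parse(string)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: issue one GET request and treat the most common
# network failures as "does not resolve". Redirects are not followed,
# so a 301/302 response counts as a failure here.
def naive_resolves?(string)
  return false unless plausible_uri?(string)

  Net::HTTP.get_response(URI.parse(string)).is_a?(Net::HTTPSuccess)
rescue SocketError, Errno::ECONNREFUSED, Net::OpenTimeout, Net::ReadTimeout
  false
end
```

A check like this has no answer for the edge cases the README goes on to list (slow DNS lookups, long redirect chains, certificate errors, read timeouts), which is the gap the gem is trying to cover.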
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "uri_resolver"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/uri_resolver.rb ADDED
@@ -0,0 +1,84 @@
1
+ require "uri_resolver/version"
2
+ require 'httparty'
3
+
4
+ module UriResolver
5
+ module Status
6
+ # Closest thing ruby has to an Enum....
7
+ Resolves = :resolves
8
+ MaybeResolves = :maybe_resolves
9
+ DoesNotResolve = :does_not_resolve
10
+ end
11
+
12
+ ConnectionFailed = Class.new(StandardError)
13
+ include HTTParty
14
+ default_timeout 1 # HTTParty
15
+
16
+ def self.resolve_status(uri)
17
+ # TODO: I used to use code like this, when the below was inside a class:
18
+ #@@responses[uri] ||= try_get_uri(uri) # Might raise error, if DNS error/timeout
19
+ # Is it safe/sensible to memoize the responses here, for performance?
20
+
21
+ response = try_get_uri(uri) # Might raise error, if DNS error/timeout
22
+ case
23
+ # TODO: .dev domains behave differently on Ubuntu vs Windows!!
24
+ when response.code != 200 || response.headers['server'] == "Apache/2.4.7 (Ubuntu)"
25
+ Status::DoesNotResolve
26
+ else
27
+ # TODO: Is it possible to analyse anything further?
28
+ # E.g. does the web page redirect? Does it contain certain strings? Is it at least a minimum size? ....
29
+ Status::Resolves
30
+ end
31
+
32
+ rescue SocketError # from TCPSocket#new
33
+ Status::DoesNotResolve
34
+ rescue Net::OpenTimeout # from HTTParty#get
35
+ Status::DoesNotResolve
36
+ rescue Net::ReadTimeout # from HTTParty#get
37
+ Status::DoesNotResolve
38
+ rescue Errno::ECONNREFUSED # From HTTParty#get
39
+ Status::DoesNotResolve
40
+ rescue Errno::ECONNRESET # From HTTParty#get
41
+ Status::DoesNotResolve
42
+ rescue ConnectionFailed # From TCPSocket#new or HTTParty#get -- something funny happened, check manually!
43
+ Status::MaybeResolves
44
+ rescue OpenSSL::SSL::SSLError # Resolves, but SSL certificate is not set up correctly
45
+ Status::Resolves
46
+ rescue URI::InvalidURIError # Resolves, but to weird URL (contains escape sequences??)
47
+ Status::Resolves
48
+ rescue HTTParty::RedirectionTooDeep # Resolves, but with lots of redirection!
49
+ Status::Resolves
50
+ rescue StandardError => e # Something else happened??!!
51
+ warn "URI #{uri} did not resolve, for unknown reason:"
52
+ warn e.message
53
+ Status::MaybeResolves
54
+ end
55
+
56
+ private
57
+
58
+ def self.try_get_uri(uri, timeout = 1)
59
+ thread_with_timeout(timeout) { TCPSocket.new(uri, 80) }
60
+ # TODO: Do not prepend with "http://" if inappropriate
61
+ thread_with_timeout(timeout*5) { HTTParty.get("http://#{uri}") } # This can take longer...
62
+ end
63
+
64
+ def self.thread_with_timeout(timeout)
65
+ th = Thread.new { yield }
66
+ th.priority = 2 # Make sure this runs immediately
67
+
68
+ # Do not use Timeout::timeout, as this does not play nicely with multi-threading
69
+ start = Time.now
70
+ loop do
71
+ if th.alive?
72
+ if (Time.now - start > timeout)
73
+ Thread.kill(th)
74
+ raise ConnectionFailed
75
+ end
76
+ else
77
+ return th.value
78
+ end
79
+ sleep 0.1 # Prevent system hammering
80
+ end
81
+ end
82
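+ private_class_method :try_get_uri, :thread_with_timeout # `private` has no effect on `def self.` methods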
83
+ end
84
+
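The polling loop in `thread_with_timeout` can also be written with `Thread#join`, which accepts a limit in seconds and returns `nil` if the thread is still running when the limit expires. The following is a minimal sketch of that alternative, not the gem's implementation: `run_with_deadline` and `DeadlineExceeded` are hypothetical names, and the sketch has not been tested against the slow-DNS case that motivated the original loop.

```ruby
# Hypothetical error class for this sketch; the gem itself raises ConnectionFailed.
DeadlineExceeded = Class.new(StandardError)

# Run the block in a worker thread and wait at most `seconds` for its result.
def run_with_deadline(seconds)
  worker = Thread.new { yield }
  # Silence the "terminated with exception" report on Rubies that support it;
  # the exception is still re-raised in the caller by #join below.
  worker.report_on_exception = false if worker.respond_to?(:report_on_exception=)

  if worker.join(seconds)  # returns the thread when it finished in time, nil otherwise
    worker.value           # the block's return value
  else
    worker.kill            # give up on the worker, as the original loop does
    raise DeadlineExceeded, "no result within #{seconds}s"
  end
end
```

With this shape, an exception raised inside the block surfaces in the calling thread, so a `rescue` chain like the one in `resolve_status` would still see errors such as `SocketError`.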
data/lib/uri_resolver/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module UriResolver
2
+ VERSION = "0.1.0"
3
+ end
data/uri_resolver.gemspec ADDED
@@ -0,0 +1,25 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'uri_resolver/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "uri_resolver"
8
+ spec.version = UriResolver::VERSION
9
+ spec.authors = ["Tom Lord"]
10
+ spec.email = ["tom.lord@comlaude.com"]
11
+
12
+ spec.summary = %q{Like "ping", but better -- especially for new GTLDs}
13
+ spec.homepage = "https://github.com/tom-lord/uri_resolver"
14
+
15
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
16
+ spec.bindir = "exe"
17
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
18
+ spec.require_paths = ["lib"]
19
+
20
+ spec.add_development_dependency "bundler", "~> 1.11"
21
+ spec.add_development_dependency "rake", "~> 10.0"
22
+ spec.add_development_dependency "rspec", "~> 3.0"
23
+
24
+ spec.add_runtime_dependency 'httparty'
25
+ end
metadata ADDED
@@ -0,0 +1,110 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: uri_resolver
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Tom Lord
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2016-04-06 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.11'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.11'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: httparty
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ description:
70
+ email:
71
+ - tom.lord@comlaude.com
72
+ executables: []
73
+ extensions: []
74
+ extra_rdoc_files: []
75
+ files:
76
+ - ".gitignore"
77
+ - ".rspec"
78
+ - ".travis.yml"
79
+ - Gemfile
80
+ - README.md
81
+ - Rakefile
82
+ - bin/console
83
+ - bin/setup
84
+ - lib/uri_resolver.rb
85
+ - lib/uri_resolver/version.rb
86
+ - uri_resolver.gemspec
87
+ homepage: https://github.com/tom-lord/uri_resolver
88
+ licenses: []
89
+ metadata: {}
90
+ post_install_message:
91
+ rdoc_options: []
92
+ require_paths:
93
+ - lib
94
+ required_ruby_version: !ruby/object:Gem::Requirement
95
+ requirements:
96
+ - - ">="
97
+ - !ruby/object:Gem::Version
98
+ version: '0'
99
+ required_rubygems_version: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: '0'
104
+ requirements: []
105
+ rubyforge_project:
106
+ rubygems_version: 2.5.1
107
+ signing_key:
108
+ specification_version: 4
109
+ summary: Like "ping", but better -- especially for new GTLDs
110
+ test_files: []