gman 4.3.1 → 4.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 11648796c6d730940bc52ae469f1e8d3ae6d0e0d
4
- data.tar.gz: 29a866b7263245d010583e8ac2578d17f9ae7a73
3
+ metadata.gz: 94f77accd5bd6449f6f58a0cc0e2e3566555003d
4
+ data.tar.gz: 570a30e2c58019f9ca18bed1c797f6d0d2260cce
5
5
  SHA512:
6
- metadata.gz: 554d33a96022f7db57269e42ba0e819e9af5eb83e40565f94ed79a78446522f7d9aa14b5965da5dd2da4cef78c96a6a3994877943461e556d847082999c33262
7
- data.tar.gz: 486cf049f003be3409df9db65336adcbf177e57fb9561b5818d06dac36396293e8d801011db577872cc1868f0c0e57936c8fc7af3c774408e9f067dd357022fc
6
+ metadata.gz: ffd38055c145a9a48b85c0c83de8b2e0a60636750f89978859e52aa1f908551d90d12b6f117c058fa02c9aa742fe660ba56e0d0fdee8e0166e4d8784bfce762b
7
+ data.tar.gz: e937086398523d2030af950c07ded2b644d0a2d491eb7e5f4b23fa8f4f33a21e9af285f0bd561a392359ff5ff5085e084cecc45e7e4d9138e1940c5d1d9ae87e
@@ -0,0 +1,4 @@
1
+ *.gem
2
+ .bundle
3
+ tmp
4
+ Gemfile.lock
@@ -0,0 +1,4 @@
1
+ language: ruby
2
+ script: "script/cibuild"
3
+ sudo: false
4
+ cache: bundler
@@ -0,0 +1,22 @@
1
+ # Contributing to Gman
2
+
3
+ ## How to contribute
4
+
5
+ 1. Fork the project
6
+ 2. Create a descriptive branch
7
+ 3. Make your change
8
+ 4. Submit a pull request
9
+
10
+ ## Code
11
+
12
+ Open an issue, or submit a pull request
13
+
14
+ ## Domains
15
+
16
+ Domains live in `./config/domains.txt` as a list of TLDs and SLD+TLDs.
17
+
18
+ Right now, the only valid government top-level domains (TLDs) are `.gov` and `.mil`, both of which represent the US government.
19
+
20
+ Second-level domains (e.g., `gov.uk` or `mil.au`) identify non-US government entities.
21
+
22
+ To add or remove a domain from the list of known government domains, simply edit the `domains.txt` file.
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source 'https://rubygems.org'
2
+ gemspec
3
+
4
+ group :development, :test do
5
+ gem 'net-dns'
6
+ end
@@ -0,0 +1,119 @@
1
+ # Gman Gem
2
+
3
+ [![Build Status](https://travis-ci.org/benbalter/gman.png)](https://travis-ci.org/benbalter/gman) [![Gem Version](https://badge.fury.io/rb/gman.png)](http://badge.fury.io/rb/gman)
4
+
5
+ A Ruby gem to check if the owner of a given email address or website is working for THE MAN (a.k.a. verifies government domains). It will also provide you with metadata about the domain, such as the country, state, city, or agency, where applicable. It does this by leveraging the power of [Naughty or Nice](https://github.com/benbalter/naughty_or_nice), the [Public Suffix List](http://publicsuffix.org/), and the associated [Ruby Gem](https://github.com/weppos/publicsuffix-ruby).
6
+
7
+ You could theoretically [use regex](https://gist.github.com/benbalter/6147066), but either you'll get a bunch of false positives, or your regex will be insanely complicated. `gov.uk` may be valid, for example, but `gov.fr` is not (it's `gouv.fr`, for what it's worth). The solution? Use Public Suffix to verify that it's a valid public domain, then maintain [a crowd-sourced sub-list of known global government and military domains](https://github.com/benbalter/gman/blob/master/config/domains.txt). It should cover all US and international government and military domains for both email and website verification.
8
+
9
+ See a domain that's missing or one that shouldn't be there? [We'd love you to contribute](CONTRIBUTING.md).
10
+
11
+ ## Installation
12
+
13
+ Gman is a Ruby gem, so you'll need a little Ruby-fu to get it working. Simply run:
14
+
15
+ `gem install gman`
16
+
17
+ Or add this to your `Gemfile` before doing a `bundle install`:
18
+
19
+ `gem 'gman'`
20
+
21
+ ## Usage
22
+
23
+ ### In general
24
+
25
+ ### Verify email addresses
26
+
27
+ ```ruby
28
+ Gman.valid? "foo@bar.gov" #=> true
29
+ Gman.valid? "foo@bar.com" #=> false
30
+ ```
31
+
32
+ ### Verify domain
33
+
34
+ ```ruby
35
+ Gman.valid? "http://foo.bar.gov" #=> true
36
+ Gman.valid? "foo.bar.gov" #=> true
37
+ Gman.valid? "foo.gov" #=> true
38
+ Gman.valid? "foo.biz" #=> false
39
+ ```
40
+
41
+ ### Determine the type of domain
42
+
43
+ ```ruby
44
+ domain = Gman.new "whitehouse.gov"
45
+ domain.type #=> :federal
46
+ domain.federal? #=> true
47
+ domain.state? #=> false
48
+ domain.city? #=> false
49
+ domain.county? #=> false
50
+ ```
51
+
52
+ ### Get information about the domain's geographic location (.gov and .us only)
53
+
54
+ ```ruby
55
+ domain = Gman.new "illinois.gov"
56
+ domain.state #=> "IL"
57
+ domain.city #=> "Springfield"
58
+ ```
59
+
60
+ ### Get information about a .gov domain's owner
61
+
62
+ ```ruby
63
+ domain = Gman.new "whitehouse.gov"
64
+ domain.agency #=> "Executive Office of the President"
65
+ ```
66
+
67
+ ### Get the ISO Country Code information represented by a government domain
68
+
69
+ ```ruby
70
+ domain = Gman.new "whitehouse.gov" #=> #<Gman domain="whitehouse.gov" valid=true>
71
+ domain.country.name #=> "United States"
72
+ domain.country.alpha2 #=> "US"
73
+ domain.country.alpha3 #=> "USA"
74
+ domain.country.currency #=> "USD"
75
+ domain.country.calling_code #=> "+1"
76
+ ```
77
+
78
+ ### Command line
79
+
80
+ #### Getting information about a given domain
81
+
82
+ ```
83
+ $ gman whitehouse.gov
84
+ Domain : whitehouse.gov
85
+ Status : Valid government domain
86
+ Type : federal
87
+ Country : United States
88
+ State : DC
89
+ City : Washington
90
+ Agency : Executive Office of the President
91
+ ```
92
+
93
+ The command line tool will accept any domain-like string (email, URL, etc.):
94
+
95
+ ```
96
+ $ gman foo@illinois.gov
97
+ Domain : illinois.gov
98
+ Status : Valid government domain
99
+ Type : state
100
+ Country : United States
101
+ State : IL
102
+ City : Springfield
103
+ ```
104
+
105
+ #### Filter
106
+
107
+ Filters newline-separated email addresses from stdin. Example usage:
108
+
109
+ ```
110
+ $ gman_filter < path/to/list/of/addresses.txt
111
+ ```
112
+
113
+ ## Contributing
114
+
115
+ Contributions welcome! Please see [the contribution guidelines](CONTRIBUTING.md) for code contributions or for details on how to add, update, or delete government domains.
116
+
117
+ ## Credits
118
+
119
+ Heavily inspired by [swot](https://github.com/leereilly/swot). Thanks [@leereilly](https://github.com/leereilly)!
@@ -0,0 +1,29 @@
1
+ require 'rubygems'
2
+ require 'bundler'
3
+ begin
4
+ Bundler.setup(:default, :development)
5
+ rescue Bundler::BundlerError => e
6
+ $stderr.puts e.message
7
+ $stderr.puts "Run `bundle install` to install missing gems"
8
+ exit e.status_code
9
+ end
10
+ require 'rake'
11
+
12
+ require 'rake/testtask'
13
+ Rake::TestTask.new(:test) do |test|
14
+ test.libs << 'lib' << 'test'
15
+ test.pattern = 'test/**/test_gman*.rb'
16
+ test.verbose = true
17
+ end
18
+
19
+ desc "Open console with gman loaded"
20
+ task :console do
21
+ exec "irb -r ./lib/gman.rb"
22
+ end
23
+
24
+ desc "Validate the domain list"
25
+ Rake::TestTask.new(:domains) do |test|
26
+ test.libs << 'lib' << 'test'
27
+ test.pattern = 'test/**/test_domains*.rb'
28
+ test.verbose = true
29
+ end
@@ -0,0 +1,42 @@
1
+ #! /usr/bin/env ruby
2
+ # Given a domain-like string, returns information about the domain
3
+
4
+ require 'colorize'
5
+ require_relative "../lib/gman"
6
+
7
+ # Convenience method to simplify the command-line logic
8
+ class IsoCountryCodes
9
+ class Code
10
+ def to_s
11
+ name
12
+ end
13
+ end
14
+ end
15
+
16
+ domain = ARGV[0]
17
+
18
+ if domain.nil?
19
+ puts "USAGE: gman <domain or email address>".red
20
+ exit 1
21
+ end
22
+
23
+ gman = Gman.new(domain)
24
+
25
+ puts "Domain : #{gman.domain}"
26
+
27
+ if gman.domain_parts.nil?
28
+ puts "Status : " + "Invalid domain".red
29
+ exit 1
30
+ end
31
+
32
+ if !gman.valid?
33
+ puts "Status : " + "Not a government domain".red
34
+ exit 1
35
+ end
36
+
37
+ puts "Status : " + "Valid government domain".green
38
+
39
+ ["type", "country", "state", "city", "agency"].each do |key|
40
+ value = gman.send(key)
41
+ puts "#{key.capitalize.ljust(8)}: #{value}" if value
42
+ end
@@ -0,0 +1,29 @@
1
+ Gem::Specification.new do |s|
2
+ s.name = "gman"
3
+ s.summary = "Check if a given domain or email address belongs to a government entity"
4
+ s.description = "A ruby gem to check if the owner of a given email address is working for THE MAN."
5
+ s.version = '4.4.0'
6
+ s.authors = ["Ben Balter"]
7
+ s.email = "ben.balter@github.com"
8
+ s.homepage = "https://github.com/benbalter/gman"
9
+ s.licenses = ["MIT"]
10
+
11
+ s.files = `git ls-files`.split("\n")
12
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
13
+ s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
14
+ s.require_paths = ["lib"]
15
+
16
+ s.require_paths = ["lib"]
17
+ s.add_dependency( "swot", '~> 0.4.2' )
18
+ s.add_dependency( "iso_country_codes", "~> 0.6" )
19
+ s.add_dependency( "naughty_or_nice", "~> 0.0.2" )
20
+ s.add_dependency( "colorize", "~> 0.7" )
21
+
22
+ s.add_development_dependency( "rake" )
23
+ s.add_development_dependency( "shoulda" )
24
+ s.add_development_dependency( "rdoc" )
25
+ s.add_development_dependency( "bundler" )
26
+ s.add_development_dependency( "pry" )
27
+ s.add_development_dependency( "parallel" )
28
+
29
+ end
@@ -13,6 +13,8 @@ class Gman < NaughtyOrNice
13
13
  :federal
14
14
  elsif county?
15
15
  :county
16
+ elsif list_category.nil?
17
+ nil
16
18
  elsif list_category.include?("usagov")
17
19
  :unknown
18
20
  else
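The added `list_category.nil?` branch returns `nil` before the next branch calls `include?`, which would raise `NoMethodError` on `nil`. A minimal sketch of that ordering, assuming a hypothetical stand-alone `type_for` helper that is not part of gman. Only the tail of the branch chain is modeled, and because the body of the final `else` is not visible in this hunk, the fallback shown is an assumption inferred from the tests later in this diff:

```ruby
# Hypothetical helper mirroring only the tail of the branch chain in the hunk.
def type_for(list_category)
  if list_category.nil?
    nil                      # domain is in no list group, so it has no type
  elsif list_category.include?("usagov")
    :unknown
  else
    list_category.to_sym     # assumption: fall back to the group name
  end
end

p type_for(nil)                # nil, instead of raising NoMethodError
p type_for("usagovIN")         # :unknown
p type_for("Canada municipal") # :"Canada municipal"
```

Placing the `nil` check ahead of any branch that sends a message to `list_category` is what makes the order of the `elsif` clauses significant.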
@@ -0,0 +1,59 @@
1
+ # Utility functions for parsing and manipulating public-suffix formatted domain lists
2
+ require 'net/dns'
3
+ require 'net/dns/resolver'
4
+
5
+ class Gman < NaughtyOrNice
6
+ class Parser
7
+
8
+ COMMENT_REGEX = /\/\/[\/\s]*(.*)$/i
9
+
10
+ class << self
11
+
12
+ # Given a public-suffix list formatted file
13
+ # Converts to a hash in the form of :group => [domain1, domain2...]
14
+ def file_to_hash(file)
15
+ array_to_hash(file_to_array(file))
16
+ end
17
+
18
+ # Given a public-suffix list formatted file
19
+ # Convert it into an array of comments and domains representing each line
20
+ def file_to_array(file)
21
+ domains = File.open(file).read
22
+ domains.gsub! /\r\n?/, "\n" # Normalize line endings
23
+ domains = domains.split("\n")
24
+ end
25
+
26
+ # Given an array of comments/domains in public suffix format
27
+ # Converts to a hash in the form of :group => [domain1, domain2...]
28
+ def array_to_hash(domains)
29
+ group = ""
30
+ domain_hash = {}
31
+ domains.each do |line|
32
+ next if line.empty?
33
+ if match = COMMENT_REGEX.match(line)
34
+ group = match[1]
35
+ else
36
+ domain_hash[group] = [] if domain_hash[group].nil?
37
+ domain_hash[group].push line.downcase
38
+ end
39
+ end
40
+ domain_hash
41
+ end
42
+
43
+ def resolver
44
+ @resolver ||= begin
45
+ resolver = Net::DNS::Resolver.new
46
+ resolver.nameservers = ["8.8.8.8","8.8.4.4", "208.67.222.222", "208.67.220.220"]
47
+ resolver
48
+ end
49
+ end
50
+
51
+ # Verifies that the given domain has at least one A, NS, or MX record, and thus resolves
52
+ def domain_resolves?(domain)
53
+ resolver.search(domain).header.anCount > 0 ||
54
+ resolver.search(domain, Net::DNS::NS).header.anCount > 0 ||
55
+ resolver.search(domain, Net::DNS::MX).header.anCount > 0
56
+ end
57
+ end
58
+ end
59
+ end
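To illustrate the grouping performed by `array_to_hash` above, here is a minimal self-contained sketch that applies the same comment pattern to an in-memory string rather than a file. The `parse_groups` name and the sample list are hypothetical and exist only for this illustration:

```ruby
# Hypothetical stand-alone version of the grouping logic, operating on a string.
def parse_groups(text)
  comment = /\/\/[\/\s]*(.*)$/i          # same pattern as COMMENT_REGEX
  group = ""
  groups = {}
  text.gsub(/\r\n?/, "\n").split("\n").each do |line|
    next if line.empty?
    if (match = comment.match(line))
      group = match[1]                   # a comment line starts a new group
    else
      (groups[group] ||= []) << line.downcase
    end
  end
  groups
end

sample = "// US Federal\nGOV\nmil\r\n\r\n// non-us gov\ngov.uk\n"
p parse_groups(sample)
# {"US Federal"=>["gov", "mil"], "non-us gov"=>["gov.uk"]}
```

Every non-comment line is filed under the most recent comment, so the order of lines in the list determines group membership.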
@@ -0,0 +1,42 @@
1
+ #! /usr/bin/env ruby
2
+ #
3
+ # Alphabetizes entries in the domains.txt file
4
+ #
5
+ # usage: script/alphabetize
6
+
7
+ require_relative "../lib/gman"
8
+
9
+ # Read in existing list
10
+ domains = File.open(Gman.list_path).read
11
+ domains = domains.gsub /\r\n?/, "\n" #normalize line endings
12
+ domains = domains.split("\n")
13
+
14
+ # Split list into grouped hash
15
+ group = ""
16
+ domain_hash = {}
17
+ domains.each do |line|
18
+ next if line.empty?
19
+ if match = /\/\/[\/\s]*(.*)$/i.match(line)
20
+ group = match[1]
21
+ else
22
+ domain_hash[group] = [] if domain_hash[group].nil?
23
+ domain_hash[group].push line.downcase
24
+ end
25
+ end
26
+
27
+ # Sort by groups
28
+ domain_hash = domain_hash.sort_by { |k,v| k.downcase }.to_h
29
+
30
+ # Sort within groups
31
+ domain_hash.each do |group, domains|
32
+ domain_hash[group].sort!
33
+ end
34
+
35
+ output = ""
36
+ domain_hash.each do |group, domains|
37
+ output << "// #{group}\n"
38
+ output << domains.join("\n")
39
+ output << "\n\n"
40
+ end
41
+
42
+ File.write Gman.list_path, output.strip
@@ -0,0 +1,27 @@
1
+ #!/usr/bin/env ruby
2
+ # Populates an initial list of best-guess government domains
3
+
4
+ require "public_suffix"
5
+ require "yaml"
6
+
7
+ # https://gist.github.com/benbalter/6147066
8
+ REGEX = /(\.g[ou]{1,2}(v|b|vt)|\.mil|\.gc|\.fed)(\.[a-z]{2})?$/i
9
+
10
+ YAML_FILE = File.dirname(__FILE__) + "/../lib/domains.yml"
11
+ domains = YAML.load_file YAML_FILE
12
+ domains = [] unless domains
13
+
14
+ PublicSuffix::List.default.each do |rule|
15
+ domain = nil
16
+
17
+ if rule.parts.length == 1
18
+ domain = rule.parts.first if ".#{rule.value}" =~ REGEX
19
+ else
20
+ domain = rule.parts.pop(2).join(".") if ".#{rule.value}" =~ REGEX
21
+ end
22
+
23
+ domains.push domain unless domain.nil? or domains.include? domain
24
+ end
25
+
26
+ domains = domains.sort
27
+ File.open(YAML_FILE, 'w+') {|f| f.write(domains.to_yaml)}
@@ -0,0 +1,7 @@
1
+ #!/bin/sh
2
+
3
+ set -e
4
+
5
+ bundle exec rake test
6
+ bundle exec script/dedupe
7
+ bundle exec script/state-domains
@@ -0,0 +1,3 @@
1
+ #! /bin/sh
2
+
3
+ bundle exec pry -r ./lib/gman.rb
@@ -0,0 +1,38 @@
1
+ #! /usr/bin/env ruby
2
+
3
+ require 'yaml'
4
+ require 'open-uri'
5
+ require './lib/gman'
6
+ require './lib/gman/parser'
7
+
8
+
9
+ current = Gman::Parser.file_to_array( Gman::list_path )
10
+ domain_hash = Gman::Parser.array_to_hash(current)
11
+ domain_list = domain_hash.flat_map { |k,v| v }
12
+
13
+ puts "Checking for duplicate domains in the domain list..."
14
+ puts "Current list contains #{domain_list.count} domains..."
15
+
16
+ SOURCE = "https://raw.githubusercontent.com/GSA/govt-urls/master/government-urls.yaml"
17
+ source_hash = YAML.load(open(SOURCE).read)
18
+ source_list = source_hash.flat_map { |k,v| v }
19
+
20
+ dupes = []
21
+ domain_hash.each do |group,domains|
22
+ domains.each do |domain|
23
+ if domain_list.count(domain) > 1 && source_list.count(domain) <= 1
24
+ dupes.push(domain)
25
+ end
26
+ end
27
+ end
28
+
29
+ dupes.uniq!
30
+
31
+ puts "Found #{dupes.count} dupes!"
32
+
33
+ if dupes.count > 0
34
+ puts dupes.inspect
35
+ exit 1
36
+ else
37
+ exit 0
38
+ end
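The duplicate check above flags a domain only when it appears more than once in the local list while the upstream source lists it at most once. A minimal sketch of that rule on in-memory arrays, where `find_dupes` and the sample data are hypothetical names used for illustration:

```ruby
# Hypothetical helper: domains repeated locally, unless the upstream
# source itself repeats them.
def find_dupes(domain_list, source_list)
  domain_list.select { |domain|
    domain_list.count(domain) > 1 && source_list.count(domain) <= 1
  }.uniq
end

local  = %w[a.gov b.gov a.gov c.gov c.gov]
source = %w[a.gov c.gov c.gov]
p find_dupes(local, source) # ["a.gov"]
```

Here `c.gov` is not reported because the source list also contains it twice, which the script treats as an upstream duplicate rather than a local one.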
@@ -0,0 +1,17 @@
1
+ #! /usr/bin/env ruby
2
+ # Given an array of domains, removes them from the list
3
+ # Example usage: script/prune foo.invalid, bar.invalid, foo.bar.invalid
4
+
5
+ domains = ARGV
6
+ domains = domains.clone.map { |d| d.gsub ",", "" }
7
+
8
+ list = File.open("./config/domains.txt").read
9
+ puts "Starting list: #{list.lines.size} lines"
10
+
11
+ domains.each do |domain|
12
+ list.gsub! /^#{Regexp.escape(domain)}$\n/, ""
13
+ end
14
+
15
+ puts "Ending list: #{list.lines.size} lines"
16
+
17
+ File.write "./config/domains.txt", list
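The prune script removes each domain with a line-anchored pattern. Because `.` in a regular expression matches any character, escaping the domain keeps the match exact. A minimal sketch on an in-memory string, where the `prune` helper and the sample list are hypothetical:

```ruby
# Hypothetical helper: remove exact-match lines for each domain from the list text.
def prune(list, domains)
  domains.reduce(list) do |text, domain|
    text.gsub(/^#{Regexp.escape(domain)}$\n/, "")
  end
end

list = "foo.gov\nfooxgov\nbar.gov\n"
print prune(list, ["foo.gov"])
# fooxgov
# bar.gov
```

Without `Regexp.escape`, the pattern for `foo.gov` would also match the `fooxgov` line.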
@@ -0,0 +1,38 @@
1
+ #!/bin/sh
2
+ # Tag and push a release.
3
+
4
+ set -e
5
+
6
+ # Make sure we're in the project root.
7
+
8
+ cd $(dirname "$0")/..
9
+
10
+ # Build a new gem archive.
11
+
12
+ rm -rf gman-*.gem
13
+ gem build -q gman.gemspec
14
+
15
+ # Make sure we're on the master branch.
16
+
17
+ (git branch | grep -q '* master') || {
18
+ echo "Only release from the master branch."
19
+ exit 1
20
+ }
21
+
22
+ # Figure out what version we're releasing.
23
+
24
+ tag=v`ls gman-*.gem | sed 's/^gman-\(.*\)\.gem$/\1/'`
25
+
26
+ # Make sure we haven't released this version before.
27
+
28
+ git fetch -t origin
29
+
30
+ (git tag -l | grep -q "$tag") && {
31
+ echo "Whoops, there's already a '${tag}' tag."
32
+ exit 1
33
+ }
34
+
35
+ # Tag it and bag it.
36
+
37
+ gem push gman-*.gem && git tag "$tag" &&
38
+ git push origin master && git push origin "$tag"
@@ -0,0 +1,38 @@
1
+ #! /usr/bin/env ruby
2
+ # Strips domains in the form of e.g., city.<locality>.<state>.us from the domain list
3
+
4
+ require './lib/gman'
5
+ require './lib/gman/parser'
6
+
7
+ current = Gman::Parser.file_to_array( Gman::list_path )
8
+ domain_hash = Gman::Parser.array_to_hash(current)
9
+
10
+ puts "Checking for state gov regex'd domains in the list..."
11
+ puts "Starting with #{current.size} domains..."
12
+
13
+ domain_hash.each do |group, domains|
14
+ next unless group =~ /usagov[A-Z]{2}/
15
+ state = group[-2,2].downcase
16
+ domain_hash[group].reject! { |d| d =~ Gman::LOCALITY_REGEX }
17
+ domain_hash[group].uniq!
18
+ domain_hash[group].sort!
19
+ end
20
+
21
+ # PublicSuffix Formatted Output
22
+ current_group = ""
23
+ output = ""
24
+ domain_hash.each do |group, domains|
25
+ if group != current_group
26
+ output << "\n\n" unless current_group.empty? # first entry
27
+ output << "// #{group}\n"
28
+ current_group = group
29
+ end
30
+ output << domains.join("\n")
31
+ end
32
+
33
+ File.open(Gman.list_path, "w") { |file| file.write output }
34
+
35
+ result = Gman::Parser.file_to_array( Gman::list_path )
36
+ puts "New list contains #{result.size} domains. Fin."
37
+
38
+ exit 1 if current.size != result.size
@@ -0,0 +1,44 @@
1
+ #! /usr/bin/env ruby
2
+
3
+ require 'csv'
4
+ require 'open-uri'
5
+ require './lib/gman'
6
+ require './lib/gman/parser'
7
+
8
+ source = "http://www.mik.nrw.de/nc/themen-aufgaben/kommunales/kommunale-adressen.html?tx_szkommunaldb_pi1%5Bexport%5D=csv"
9
+
10
+ csv = open(source).read.force_encoding("iso-8859-1").encode("UTF-8")
11
+
12
+ # For some reason, the header row is actually the last row
13
+ # Pop the last line off the file and prepend it at the beginning
14
+ # So that when we pass it to CSV it detects the headers properly
15
+ lines = csv.split("\n")
16
+ lines.unshift lines.pop
17
+ csv = lines.join("\n")
18
+
19
+ data = CSV.parse(csv, :headers => true, :col_sep => ";")
20
+ domains = data.map { |row| row["Internet"].to_s.downcase.strip.gsub /^www\./, "" }
21
+
22
+ domains.reject! { |domain| domain.empty? }
23
+ domains.select! { |domain| PublicSuffix.valid?(".#{domain}") } # Validate domain
24
+ domains.reject! { |domain| Swot::is_academic?(domain) } # Reject academic domains
25
+
26
+ current = Gman::Parser.file_to_array( Gman::list_path )
27
+ current_hash = Gman::Parser.array_to_hash(current)
28
+
29
+ current_hash["German Municipalities"] = domains
30
+ current_hash = current_hash.sort_by { |group, domains| group.downcase }
31
+
32
+ # PublicSuffix Formatted Output
33
+ current_group = ""
34
+ output = ""
35
+ current_hash.each do |group, domains|
36
+ if group != current_group
37
+ output << "\n\n" unless current_group.empty? # first entry
38
+ output << "// #{group}\n"
39
+ current_group = group
40
+ end
41
+ output << domains.join("\n")
42
+ end
43
+
44
+ File.open(Gman.list_path, "w") { |file| file.write output }
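The header-row handling in the script above can be tried on a small in-memory export. This is a minimal sketch assuming a semicolon-separated file whose header row comes last; the `extract_domains` helper and the sample rows are hypothetical, and only the `Internet` column name is taken from the script:

```ruby
require 'csv'

# Hypothetical helper: parse a semicolon-separated export whose header row
# comes last, and return the normalized "Internet" column.
def extract_domains(raw)
  lines = raw.split("\n")
  lines.unshift(lines.pop)            # move the trailing header row to the top
  table = CSV.parse(lines.join("\n"), headers: true, col_sep: ";")
  table.map { |row| row["Internet"].to_s.downcase.strip.sub(/\Awww\./, "") }
       .reject(&:empty?)
end

raw = "Musterstadt;WWW.musterstadt.example\nBeispieldorf;\nName;Internet"
p extract_domains(raw) # ["musterstadt.example"]
```

Rows without a value in the `Internet` column parse as `nil`, which `to_s` turns into an empty string that is then rejected.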
@@ -0,0 +1,5 @@
1
+ #!/bin/sh
2
+
3
+ DATE=2014-12-01
4
+
5
+ wget "https://raw.githubusercontent.com/GSA/data/gh-pages/dotgov-domains/$DATE-full.csv" -O config/vendor/dotgovs.csv
@@ -0,0 +1,82 @@
1
+ #! /usr/bin/env ruby
2
+ #
3
+ # Vendors the USA.gov-maintained list of US domains into domains.txt
4
+ # Source: https://github.com/GSA-OCSIT/govt-urls
5
+ #
6
+ # Normalizes and cleans inputs, validates domains, rejects academic domains, and
7
+ # sorts, ensures uniqueness, and merges into the existing config/domains.txt list
8
+ #
9
+ # Usage: script/vendor-us
10
+ #
11
+ # Will automatically fetch latest version of the list and merge
12
+ # You can check for changes and commit via `git status`
13
+ #
14
+ # It's also probably a good idea to run `script/cibuild` for good measure
15
+
16
+ require 'rubygems'
17
+ require 'public_suffix'
18
+ require 'swot'
19
+ require 'yaml'
20
+ require 'open-uri'
21
+ require './lib/gman'
22
+ require './lib/gman/parser'
23
+
24
+ SOURCE = "https://raw.githubusercontent.com/GSA/govt-urls/master/government-urls.yaml"
25
+ BLACKLIST = ["usagovQUASI", "usagovFED", "usagovPW"]
26
+ domain_hash = {}
27
+
28
+ domain_hash = YAML.load(open(SOURCE).read)
29
+ puts "found #{domain_hash.map { |group,domains| domains.count }.inject(:+)} domains..."
30
+
31
+ # Normalize ALL THE THINGS
32
+ domain_hash.each do |group, domains|
33
+ domains.map! { |domain| domain.strip } # Strip leading and trailing whitespace
34
+ domains.map! { |domain| domain.gsub /\/$/, "" } # Strip trailing slashes
35
+ domains.map! { |domain| domain.downcase } # make lower case
36
+ domains.reject! { |domain| domain.empty? } # Reject empty strings
37
+ end
38
+
39
+ # filter
40
+ domain_hash.reject! { |group,domain| BLACKLIST.include?(group) } # Group blacklist
41
+ domain_hash.each do |group, domains|
42
+ puts "Filtering #{group}..."
43
+ domains.reject! { |domain| domain.match /\// } # Reject URLs
44
+ domains.select! { |domain| PublicSuffix.valid?(domain) } # Validate domain
45
+ domains.reject! { |domain| Swot::is_academic?(domain) } # Reject academic domains
46
+ end
47
+ puts "Filtered down to #{domain_hash.map { |group,domains| domains.count }.inject(:+)} domains"
48
+
49
+ # Grab existing list
50
+ current = Gman::Parser.file_to_array( Gman::list_path )
51
+ current_hash = Gman::Parser.array_to_hash(current)
52
+ puts "Current list contains #{current.size} domains... merging"
53
+
54
+ # Lazy deep merge
55
+ domain_hash.each do |group,domains|
56
+ current_hash[group] = [] if current_hash[group].nil?
57
+ current_hash[group].concat domains
58
+ current_hash[group].sort! # Alphabetize
59
+ current_hash[group].uniq! # Ensure uniqueness
60
+ end
61
+
62
+ # Sort by group
63
+ current_hash = current_hash.sort_by { |group, domains| group.downcase }
64
+
65
+ # PublicSuffix Formatted Output
66
+ current_group = ""
67
+ output = ""
68
+ current_hash.each do |group, domains|
69
+ if group != current_group
70
+ output << "\n\n" unless current_group.empty? # first entry
71
+ output << "// #{group}\n"
72
+ current_group = group
73
+ end
74
+ output << domains.join("\n")
75
+ end
76
+
77
+ puts "merged. Writing..."
78
+
79
+ File.open(Gman.list_path, "w") { |file| file.write output }
80
+
81
+ result = Gman::Parser.file_to_array( Gman::list_path )
82
+ puts "New list contains #{result.size} domains. Fin."
@@ -0,0 +1,26 @@
1
+ require 'rubygems'
2
+ require 'bundler'
3
+ require 'minitest/autorun'
4
+ require 'parallel'
5
+ require 'open3'
6
+
7
+ begin
8
+ Bundler.setup(:default, :development)
9
+ rescue Bundler::BundlerError => e
10
+ $stderr.puts e.message
11
+ $stderr.puts "Run `bundle install` to install missing gems"
12
+ exit e.status_code
13
+ end
14
+
15
+ require 'shoulda'
16
+
17
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
18
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
19
+ require 'gman'
20
+ require 'net/dns'
21
+ require 'net/dns/resolver'
22
+ require './lib/gman/parser'
23
+
24
+ def test_bin(*args)
25
+ output, status = Open3.capture2e("bundle", "exec", "gman", *args)
26
+ end
@@ -0,0 +1,5 @@
1
+ barry@dcpchicago.org
2
+ prof.obama@uchicago.edu
3
+ mr.senator@obama.senate.gov
4
+ president@whitehouse.gov
5
+ commander.in.chief@us.army.mil
@@ -0,0 +1,60 @@
1
+ require File.join(File.dirname(__FILE__), 'helper')
2
+
3
+ class TestDomains < Minitest::Test
4
+
5
+ WHITELIST = [ "non-us gov", "non-us mil", "US Federal"]
6
+ DOMAINS = Gman::Parser.file_to_hash(Gman.list_path)
7
+
8
+ def whitelisted?(domain)
9
+ WHITELIST.each do |group|
10
+ return true if DOMAINS[group].include? domain
11
+ end
12
+ false
13
+ end
14
+
15
+ should "only contain resolvable domains" do
16
+ unresolvables = []
17
+ Parallel.each(Gman.list, :in_threads => 2) do |entry|
18
+ next if whitelisted? entry.name
19
+ resolves = Gman::Parser.domain_resolves?(entry.name)
20
+ unresolvables.push entry.name unless resolves
21
+ end
22
+ assert_equal [], unresolvables
23
+ end
24
+
25
+ should "not contain any educational domains" do
26
+ Parallel.each(Gman.list, :in_threads => 2) do |entry|
27
+ assert_equal false, Swot::is_academic?(entry.name), "#{entry.name} is an academic domain"
28
+ end
29
+ end
30
+
31
+ should "not contain any invalid domains" do
32
+ Parallel.each(Gman.list, :in_threads => 2) do |entry|
33
+ assert_equal true, PublicSuffix.valid?("foo.#{entry.name}"), "#{entry.name} is not a valid domain"
34
+ end
35
+ end
36
+
37
+ should "pass any url on the list" do
38
+ Parallel.each(Gman.list, :in_threads => 2) do |entry|
39
+ assert_equal true, Gman.valid?("http://foo.#{entry.name}/bar"), "http://foo.#{entry.name}/bar is not valid"
40
+ end
41
+ end
42
+
43
+ should "pass any email on the list" do
44
+ Parallel.each(Gman.list, :in_threads => 2) do |entry|
45
+ assert_equal true, Gman.valid?("foo@bar.#{entry.name}"), "foo@bar.#{entry.name} is not valid"
46
+ end
47
+ end
48
+
49
+ should "pass any domain on the list" do
50
+ Parallel.each(Gman.list, :in_threads => 2) do |entry|
51
+ assert_equal true, Gman.valid?("foo.#{entry.name}"), "foo.#{entry.name} is not a valid domain"
52
+ end
53
+ end
54
+
55
+ should "identify the country for any domain on the list" do
56
+ Parallel.each(Gman.list, :in_threads => 2) do |entry|
57
+ Gman.new("foo.#{entry.name}").country.name
58
+ end
59
+ end
60
+ end
@@ -0,0 +1,57 @@
1
+ require File.join(File.dirname(__FILE__), 'helper')
2
+
3
+ VALID = [ "foo.gov",
4
+ "http://foo.mil",
5
+ "foo@bar.gc.ca",
6
+ "foo.gov.au",
7
+ "https://www.foo.gouv.fr",
8
+ "foo@ci.champaign.il.us",
9
+ "foo.bar.baz.gov.au",
10
+ "foo@bar.gov.uk",
11
+ ".gov",
12
+ "foo.fed.us",
13
+ "foo.state.il.us",
14
+ "state.il.us",
15
+ "foo@af.mil",
16
+ "gov.in"
17
+ ]
18
+
19
+ INVALID = [ "foo.bar.com",
20
+ "bar@foo.biz",
21
+ "http://www.foo.biz",
22
+ "foo.uk",
23
+ "gov",
24
+ "foo@k12.champaign.il.us",
25
+ "foo@kii.gov.by",
26
+ "foo",
27
+ "",
28
+ nil,
29
+ " ",
30
+ "foo.city.il.us",
31
+ "foo.ci.il.us",
32
+ "foo.zx.us",
33
+ "foo@mail.gov.ua"
34
+ ]
35
+
36
+ class TestGman < Minitest::Test
37
+
38
+ VALID.each do |domain|
39
+ should "recognize #{domain} as a government domain" do
40
+ assert Gman::valid?(domain)
41
+ end
42
+ end
43
+
44
+ INVALID.each do |domain|
45
+ should "recognize #{domain} as a non-government domain" do
46
+ refute Gman::valid?(domain)
47
+ end
48
+ end
49
+
50
+ should "not allow educational domains" do
51
+ assert_equal false, Gman::valid?("foo@gwu.edu")
52
+ end
53
+
54
+ should "return the path to domains.txt" do
55
+ assert_equal true, File.exist?(Gman.list_path)
56
+ end
57
+ end
@@ -0,0 +1,56 @@
1
+ require_relative "helper"
2
+
3
+ class TestGmanBin < Minitest::Test
4
+
5
+ def setup
6
+ @output, @status = test_bin("whitehouse.gov")
7
+ end
8
+
9
+ should "parse the domain" do
10
+ output, status = test_bin("bar.gov")
11
+ assert_match /Domain : bar.gov/, output
12
+
13
+ output, status = test_bin("foo@bar.gov")
14
+ assert_match /Domain : bar.gov/, output
15
+
16
+ output, status = test_bin("http://bar.gov/foo")
17
+ assert_match /Domain : bar.gov/, output
18
+ end
19
+
20
+ should "err on invalid domains" do
21
+ output, status = test_bin("foo.invalid")
22
+ assert_equal 1, status.exitstatus
23
+ assert_match /Invalid domain/, output
24
+ end
25
+
26
+ should "err on non-government domains" do
27
+ output, status = test_bin("github.com")
28
+ assert_equal 1, status.exitstatus
29
+ assert_match /Not a government domain/, output
30
+ end
31
+
32
+ should "know the type" do
33
+ assert_match /federal/, @output
34
+ assert_equal 0, @status.exitstatus
35
+ end
36
+
37
+ should "know the agency" do
38
+ assert_match /Executive Office of the President/, @output
39
+ assert_equal 0, @status.exitstatus
40
+ end
41
+
42
+ should "know the country" do
43
+ assert_match /United States/, @output
44
+ assert_equal 0, @status.exitstatus
45
+ end
46
+
47
+ should "know the city" do
48
+ assert_match /Washington/, @output
49
+ assert_equal 0, @status.exitstatus
50
+ end
51
+
52
+ should "know the state" do
53
+ assert_match /DC/, @output
54
+ assert_equal 0, @status.exitstatus
55
+ end
56
+ end
@@ -0,0 +1,10 @@
1
+ require File.join(File.dirname(__FILE__), 'helper')
2
+
3
+ class TestGmanCountryCodes < Minitest::Test
4
+ should "determine a domain's country" do
5
+ assert_equal "United States", Gman.new("whitehouse.gov").country.name
6
+ assert_equal "United States", Gman.new("army.mil").country.name
7
+ assert_equal "United Kingdom", Gman.new("foo.gov.uk").country.name
8
+ assert_equal "Canada", Gman.new("foo.gc.ca").country.name
9
+ end
10
+ end
@@ -0,0 +1,19 @@
1
+ HERE = File.dirname(__FILE__)
2
+ require File.join(HERE, 'helper')
3
+
4
+ class TestGmanFilter < Minitest::Test
5
+
6
+ txt_path = File.join(HERE, "obama.txt")
7
+ exec_path = File.join(HERE, "..", "bin", "gman_filter")
8
+
9
+ should "remove non-gov/mil addresses" do
10
+ filtered = `#{exec_path} < #{txt_path}`
11
+ expected = %w(
12
+ mr.senator@obama.senate.gov
13
+ president@whitehouse.gov
14
+ commander.in.chief@us.army.mil
15
+ ).join("\n") + "\n"
16
+ assert_equal expected, filtered
17
+ end
18
+
19
+ end
@@ -0,0 +1,106 @@
1
+ require File.join(File.dirname(__FILE__), 'helper')
2
+
3
+ class TestGmanIdentifier < Minitest::Test
4
+ should "Parse the dotgov list" do
5
+ assert Gman.dotgov_list
6
+ assert_equal CSV::Table, Gman.dotgov_list.class
7
+ assert_equal CSV::Row, Gman.dotgov_list.first.class
8
+ assert Gman.dotgov_list.first["Domain Name"]
9
+ end
10
+
11
+ context "locality domains" do
12
+ should "detect state domains" do
13
+ domain = Gman.new("state.ak.us")
14
+ assert domain.state?
15
+
16
+ refute domain.dotgov?
17
+ refute domain.city?
18
+ refute domain.federal?
19
+ refute domain.county?
20
+
21
+ assert_equal :state, domain.type
22
+ assert_equal "AK", domain.state
23
+ end
24
+
25
+ should "detect city domains" do
26
+ domain = Gman.new("ci.champaign.il.us")
27
+ assert domain.city?
28
+
29
+ refute domain.dotgov?
30
+ refute domain.state?
31
+ refute domain.federal?
32
+ refute domain.county?
33
+
34
+ assert_equal :city, domain.type
35
+ assert_equal "IL", domain.state
36
+ end
37
+ end
38
+
39
+ context "dotgovs" do
40
+ should "detect federal dotgovs" do
41
+ domain = Gman.new "whitehouse.gov"
42
+ assert domain.federal?
43
+ assert domain.dotgov?
44
+
45
+ refute domain.city?
46
+ refute domain.state?
47
+ refute domain.county?
48
+
49
+ assert_equal :federal, domain.type
50
+ assert_equal "DC", domain.state
51
+ assert_equal "Washington", domain.city
52
+ assert_equal "Executive Office of the President", domain.agency
53
+ end
54
+
55
+ should "detect state dotgovs" do
56
+ domain = Gman.new "illinois.gov"
57
+ assert domain.state?
58
+ assert domain.dotgov?
59
+
60
+ refute domain.city?
61
+ refute domain.federal?
62
+ refute domain.county?
63
+
64
+ assert_equal :state, domain.type
65
+ assert_equal "IL", domain.state
66
+ assert_equal "Springfield", domain.city
67
+ end
68
+
69
+ should "detect county dotgovs" do
70
+ domain = Gman.new "ALLEGHENYCOUNTYPA.GOV"
71
+ assert domain.county?
72
+ assert domain.dotgov?
73
+
74
+ refute domain.city?
75
+ refute domain.federal?
76
+ refute domain.state?
77
+
78
+ assert_equal :county, domain.type
79
+ assert_equal "PA", domain.state
80
+ assert_equal "Pittsburgh", domain.city
81
+ end
82
+
83
+ should "detect the list category" do
84
+ assert_equal "US Federal", Gman.new("whitehouse.gov").send("list_category")
85
+ end
86
+ end
87
+
88
+ context "non-dotgov domains" do
89
+ should "determine a domain's group" do
90
+ assert_equal "usagovIN", Gman.new("cityofperu.org").send("list_category")
91
+ assert_equal :unknown, Gman.new("cityofperu.org").type
92
+
93
+ assert_equal "Canada municipal", Gman.new("acme.ca").send("list_category")
94
+ assert_equal :"Canada municipal", Gman.new("acme.ca").type
95
+
96
+ assert_equal "Canada federal", Gman.new("canada.ca").send("list_category")
97
+ assert_equal :"Canada federal", Gman.new("canada.ca").type
98
+ end
99
+
100
+ should "detect the state" do
101
+ assert_equal "PR", Gman.new("sanjuan.pr").state
102
+ assert_equal "OR", Gman.new("ashland.or.us").state
103
+ refute Gman.new("canada.ca").state
104
+ end
105
+ end
106
+ end
@@ -0,0 +1,10 @@
1
+ require File.join(File.dirname(__FILE__), 'helper')
2
+
3
+ class TestGmanLocality < Minitest::Test
4
+ should "parse the alpha2" do
5
+ assert_equal "us", Gman.new("whitehouse.gov").alpha2
6
+ assert_equal "us", Gman.new("army.mil").alpha2
7
+ assert_equal "gb", Gman.new("foo.gov.uk").alpha2
8
+ assert_equal "ca", Gman.new("gov.ca").alpha2
9
+ end
10
+ end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: gman
3
3
  version: !ruby/object:Gem::Version
4
- version: 4.3.1
4
+ version: 4.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ben Balter
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-01-22 00:00:00.000000000 Z
11
+ date: 2015-01-28 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: swot
@@ -30,14 +30,14 @@ dependencies:
30
30
  requirements:
31
31
  - - "~>"
32
32
  - !ruby/object:Gem::Version
33
- version: '0.4'
33
+ version: '0.6'
34
34
  type: :runtime
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
38
  - - "~>"
39
39
  - !ruby/object:Gem::Version
40
- version: '0.4'
40
+ version: '0.6'
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: naughty_or_nice
43
43
  requirement: !ruby/object:Gem::Requirement
@@ -52,6 +52,20 @@ dependencies:
52
52
  - - "~>"
53
53
  - !ruby/object:Gem::Version
54
54
  version: 0.0.2
55
+ - !ruby/object:Gem::Dependency
56
+ name: colorize
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '0.7'
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '0.7'
55
69
  - !ruby/object:Gem::Dependency
56
70
  name: rake
57
71
  requirement: !ruby/object:Gem::Requirement
@@ -140,18 +154,48 @@ description: A ruby gem to check if the owner of a given email address is workin
140
154
  for THE MAN.
141
155
  email: ben.balter@github.com
142
156
  executables:
157
+ - gman
143
158
  - gman_filter
144
159
  extensions: []
145
160
  extra_rdoc_files: []
146
161
  files:
162
+ - ".gitignore"
163
+ - ".travis.yml"
164
+ - CONTRIBUTING.md
165
+ - Gemfile
147
166
  - LICENSE
167
+ - README.md
168
+ - Rakefile
169
+ - bin/gman
148
170
  - bin/gman_filter
149
171
  - config/domains.txt
150
172
  - config/vendor/dotgovs.csv
173
+ - gman.gemspec
151
174
  - lib/gman.rb
152
175
  - lib/gman/country_codes.rb
153
176
  - lib/gman/identifier.rb
154
177
  - lib/gman/locality.rb
178
+ - lib/gman/parser.rb
179
+ - script/alphabetize
180
+ - script/build
181
+ - script/cibuild
182
+ - script/console
183
+ - script/dedupe
184
+ - script/prune
185
+ - script/release
186
+ - script/state-domains
187
+ - script/vendor-de
188
+ - script/vendor-gov-list
189
+ - script/vendor-us
190
+ - test/helper.rb
191
+ - test/obama.txt
192
+ - test/test_domains.rb
193
+ - test/test_gman.rb
194
+ - test/test_gman_bin.rb
195
+ - test/test_gman_country_codes.rb
196
+ - test/test_gman_filter.rb
197
+ - test/test_gman_identifier.rb
198
+ - test/test_gman_locality.rb
155
199
  homepage: https://github.com/benbalter/gman
156
200
  licenses:
157
201
  - MIT
@@ -176,4 +220,13 @@ rubygems_version: 2.2.0
176
220
  signing_key:
177
221
  specification_version: 4
178
222
  summary: Check if a given domain or email address belongs to a government entity
179
- test_files: []
223
+ test_files:
224
+ - test/helper.rb
225
+ - test/obama.txt
226
+ - test/test_domains.rb
227
+ - test/test_gman.rb
228
+ - test/test_gman_bin.rb
229
+ - test/test_gman_country_codes.rb
230
+ - test/test_gman_filter.rb
231
+ - test/test_gman_identifier.rb
232
+ - test/test_gman_locality.rb