kevintyll-ofac 1.0.0
- data/History.txt +9 -0
- data/LICENSE +20 -0
- data/PostInstall.txt +11 -0
- data/README.rdoc +109 -0
- data/Rakefile +57 -0
- data/VERSION.yml +4 -0
- data/generators/ofac_migration/ofac_migration_generator.rb +12 -0
- data/generators/ofac_migration/templates/migration.rb +30 -0
- data/lib/ofac.rb +9 -0
- data/lib/ofac/models/ofac.rb +119 -0
- data/lib/ofac/models/ofac_sdn.rb +5 -0
- data/lib/ofac/models/ofac_sdn_loader.rb +305 -0
- data/lib/ofac/ofac_match.rb +132 -0
- data/lib/ofac/ruby_string_extensions.rb +22 -0
- data/lib/tasks/ofac.rake +8 -0
- data/test/files/test_address_data_load.pip +10 -0
- data/test/files/test_alt_data_load.pip +10 -0
- data/test/files/test_sdn_data_load.pip +9 -0
- data/test/files/valid_flattened_file.csv +19 -0
- data/test/mocks/test/ofac_sdn_loader.rb +20 -0
- data/test/ofac_sdn_loader_test.rb +40 -0
- data/test/ofac_test.rb +76 -0
- data/test/test_helper.rb +48 -0
- metadata +90 -0
data/History.txt
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright (c) 2009 Kevin Tyll
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
4
|
+
a copy of this software and associated documentation files (the
|
5
|
+
"Software"), to deal in the Software without restriction, including
|
6
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
7
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8
|
+
permit persons to whom the Software is furnished to do so, subject to
|
9
|
+
the following conditions:
|
10
|
+
|
11
|
+
The above copyright notice and this permission notice shall be
|
12
|
+
included in all copies or substantial portions of the Software.
|
13
|
+
|
14
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/PostInstall.txt
ADDED
@@ -0,0 +1,11 @@
|
|
1
|
+
For more information on ofac, see http://kevintyll.github.com/ofac/
|
2
|
+
|
3
|
+
* To create the necessary db migration, from the command line, run:
|
4
|
+
script/generate ofac_migration
|
5
|
+
* Require the gem in your environment.rb file in the Rails::Initializer block:
|
6
|
+
config.gem 'kevintyll-ofac', :lib => 'ofac'
|
7
|
+
* To load your table with the current OFAC data, from the command line, run:
|
8
|
+
rake ofac:update_data
|
9
|
+
|
10
|
+
* The OFAC data is not updated with any regularity, but you can sign up for email notifications when the data changes at
|
11
|
+
http://www.treas.gov/offices/enforcement/ofac/sdn/index.shtml.
|
data/README.rdoc
ADDED
@@ -0,0 +1,109 @@
|
|
1
|
+
= ofac
|
2
|
+
|
3
|
+
* http://kevintyll.github.com/ofac
|
4
|
+
* http://www.drexel-labs.com
|
5
|
+
|
6
|
+
* http://www.treas.gov/offices/enforcement/ofac/sdn/index.shtml
|
7
|
+
|
8
|
+
== DESCRIPTION:
|
9
|
+
|
10
|
+
ofac is a ruby gem that tries to find a match of a person's name and address against the
|
11
|
+
Office of Foreign Assets Control's Specially Designated Nationals list, the so-called
|
12
|
+
terrorist watch list.
|
13
|
+
|
14
|
+
This gem, like the ssn_validator gem, started as a need for the company I work for, Clarity Services Inc.
|
15
|
+
We decided once again to create a gem out of it and share it with the community. Much
|
16
|
+
thanks goes to the management at Clarity Services Inc. for allowing this code to be open sourced. Thanks
|
17
|
+
also to Larry Berland at Clarity Services Inc. The matching logic in the ofac_match.rb file was derived from
|
18
|
+
his work.
|
19
|
+
|
20
|
+
== FEATURES:
|
21
|
+
|
22
|
+
Creates a score, 1 - 100, based on how well the name, address and city match the data on the SDN list. Since
|
23
|
+
we have to match on strings, the likelihood of an exact match is virtually nil. So we've created an
|
24
|
+
algorithm that creates a score. The better the match, the higher the score. A score of 100 would be
|
25
|
+
a perfect match.
|
26
|
+
|
27
|
+
The score is calculated by adding up the weightings of each part that is matched. So
|
28
|
+
if only name is matched, then the max score is the weight for <tt>:name</tt> which is 60
|
29
|
+
|
30
|
+
It's possible to get partial matches, which will add partial weight to the score. If there
|
31
|
+
is not a match on the element as it is passed in, then each word element gets broken down
|
32
|
+
and matches are tried on each partial element. The weighting is distributed equally for
|
33
|
+
each partial that is matched.
|
34
|
+
|
35
|
+
If exact matches are not made, then a sounds like match is attempted. Any match made by sounds like
|
36
|
+
is given 75% of its weight to the score.
|
37
|
+
Example:
|
38
|
+
|
39
|
+
If you are trying to match the name Kevin Tyll and there is a record for Smith, Kevin in the database, then
|
40
|
+
we will try to match both Kevin and Tyll separately, with each element Smith and Kevin. Since only Kevin
|
41
|
+
will find a match, and there were 2 elements in the searched name, the score will be increased by half the weighting
|
42
|
+
for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 30 to the score.
|
43
|
+
|
44
|
+
If you are trying to match the name Kevin Gregory Tyll and there is a record for Tyll, Kevin in the database, then
|
45
|
+
we will try to match Kevin and Gregory and Tyll separately, with each element Tyll and Kevin. Since both Kevin
|
46
|
+
and Tyll will find a match, and there were 3 elements in the searched name, the score will be increased by 2/3 the weighting
|
47
|
+
for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 40 to the score.
|
48
|
+
|
49
|
+
If you are trying to match the name Kevin Tyll and there is a record for Kevin Gregory Tyll in the database, then
|
50
|
+
we will try to match Kevin and Tyll separately, with each element Tyll and Kevin and Gregory. Since both Kevin
|
51
|
+
and Tyll will find a match, and there were 2 elements in the searched name, the score will be increased by 2/2 the weighting
|
52
|
+
for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 60 to the score.
|
53
|
+
|
54
|
+
If you are trying to match the name Kevin Tyll, and there is a record for Teel, Kevin in the database, then an exact match
|
55
|
+
will be found for Kevin, and a sounds like match will be made for Tyll. Since there were 2 elements in the searched name,
|
56
|
+
and the weight for <tt>:name</tt> is 60, then each element is worth 30. Since Kevin was an exact match, it will add 30, and
|
57
|
+
since Tyll was a sounds like match, it will add 30 * .75. So the <tt>:name</tt> portion of the search will be worth 53.
|
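The four name-matching examples above can be reproduced with a small sketch. This is not the gem's OfacMatch code: the method name_score, its arguments, and the constants are hypothetical, and the caller states which words count as sounds-like matches instead of the sketch computing them.

```ruby
# Hypothetical illustration of the README's name-scoring arithmetic.
# It is NOT the gem's OfacMatch implementation.
NAME_WEIGHT = 60           # weight for :name, as stated in the README
SOUNDS_LIKE_FACTOR = 0.75  # a sounds-like match earns 75% of its share

# searched     - words of the name being searched for
# record_words - words of the SDN record's name (exact matches)
# sounds_like  - searched words assumed to match by sound only
def name_score(searched, record_words, sounds_like = [])
  per_word = NAME_WEIGHT.to_f / searched.length
  searched.inject(0.0) do |sum, word|
    if record_words.include?(word)
      sum + per_word                       # exact match: full share
    elsif sounds_like.include?(word)
      sum + per_word * SOUNDS_LIKE_FACTOR  # sounds-like match: 75% share
    else
      sum                                  # no match: nothing added
    end
  end
end

name_score(%w[KEVIN TYLL], %w[SMITH KEVIN])           # => 30.0
name_score(%w[KEVIN GREGORY TYLL], %w[TYLL KEVIN])    # => 40.0
name_score(%w[KEVIN TYLL], %w[KEVIN GREGORY TYLL])    # => 60.0
name_score(%w[KEVIN TYLL], %w[TEEL KEVIN], %w[TYLL])  # => 52.5
```

The last call gives 52.5, while the README reports 53, which suggests the gem rounds the result; the sketch leaves the value unrounded.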
58
|
+
|
59
|
+
Matches for name are made for both the name and any aliases in the OFAC database.
|
60
|
+
|
61
|
+
Matches for <tt>:city</tt> and <tt>:address</tt> will only be added to the score if there is first a match on <tt>:name</tt>.
|
62
|
+
|
63
|
+
== SYNOPSIS:
|
64
|
+
Accepts a hash with the identity's demographic information
|
65
|
+
|
66
|
+
Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'})
|
67
|
+
|
68
|
+
<tt>:name</tt> is required to get a score. If <tt>:name</tt> is missing, an error will not be thrown, but a score of 0 will be returned.
|
69
|
+
|
70
|
+
The more information provided, the higher the score could be. A score of 100 would mean all fields
|
71
|
+
were passed in, and all fields were 100% matches. If only the name is passed in without an address,
|
72
|
+
it will be impossible to get a score of 100, even if the name matches perfectly.
|
73
|
+
|
74
|
+
Acceptable hash keys and their weighting in score calculation:
|
75
|
+
|
76
|
+
* <tt>:name</tt> (weighting = 60%) (required) This can be a person, business, or marine vessel
|
77
|
+
* <tt>:address</tt> (weighting = 10%)
|
78
|
+
* <tt>:city</tt> (weighting = 30%)
|
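As a rough sketch of why a name-only search cannot reach 100, the highest achievable score can be treated as the sum of the weights for the keys actually supplied. The helper max_possible_score is hypothetical and is not part of the gem; it only restates the weights listed above.

```ruby
# Hypothetical helper, not part of the gem: upper bound on the score
# for a given identity hash, using the weights from the README.
WEIGHTS = { :name => 60, :address => 10, :city => 30 }

def max_possible_score(identity)
  # Without a name the README says the score is 0.
  return 0 if identity[:name].to_s.empty?

  WEIGHTS.inject(0) do |sum, (key, weight)|
    identity[key].to_s.empty? ? sum : sum + weight
  end
end

max_possible_score(:name => 'Kevin Tyll')                         # => 60
max_possible_score(:name => 'Kevin Tyll', :city => 'Clearwater')  # => 90
max_possible_score(:city => 'Clearwater')                         # => 0
```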
79
|
+
|
80
|
+
* Instantiate the object with the identity's name, street address, and city.
|
81
|
+
ofac = Ofac.new(:name => 'Kevin Tyll', :city => 'Clearwater', :address => '123 Somewhere Ln.')
|
82
|
+
|
83
|
+
* Then get the score
|
84
|
+
ofac.score => returns the score, 1 - 100
|
85
|
+
|
86
|
+
* You can also get the list of all the partial matches with the score of each record.
|
87
|
+
ofac.possible_hits => returns an array of hashes.
|
88
|
+
|
89
|
+
== REQUIREMENTS:
|
90
|
+
|
91
|
+
* Rails 2.0.0 or greater
|
92
|
+
|
93
|
+
== INSTALL:
|
94
|
+
|
95
|
+
* To install the gem:
|
96
|
+
sudo gem install kevintyll-ofac
|
97
|
+
* To create the necessary db migration, from the command line, run:
|
98
|
+
script/generate ofac_migration
|
99
|
+
* Require the gem in your environment.rb file in the Rails::Initializer block:
|
100
|
+
config.gem 'kevintyll-ofac', :lib => 'ofac'
|
101
|
+
* To load your table with the current OFAC data, from the command line, run:
|
102
|
+
rake ofac:update_data
|
103
|
+
|
104
|
+
* The OFAC data is not updated with any regularity, but you can sign up for email notifications when the data changes at
|
105
|
+
http://www.treas.gov/offices/enforcement/ofac/sdn/index.shtml.
|
106
|
+
|
107
|
+
== Copyright
|
108
|
+
|
109
|
+
Copyright (c) 2009 Kevin Tyll. See LICENSE for details.
|
data/Rakefile
ADDED
@@ -0,0 +1,57 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'rake'
|
3
|
+
|
4
|
+
begin
|
5
|
+
require 'jeweler'
|
6
|
+
Jeweler::Tasks.new do |gem|
|
7
|
+
gem.name = "ofac"
|
8
|
+
gem.summary = %Q{Attempts to find a hit on the Office of Foreign Assets Control's Specially Designated Nationals list.}
|
9
|
+
gem.description = %Q{Attempts to find a hit on the Office of Foreign Assets Control's Specially Designated Nationals list.}
|
10
|
+
gem.email = "kevintyll@gmail.com"
|
11
|
+
gem.homepage = "http://github.com/kevintyll/ofac"
|
12
|
+
gem.authors = ["Kevin Tyll"]
|
13
|
+
gem.post_install_message = File.readlines("PostInstall.txt").join("")
|
14
|
+
# gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
|
15
|
+
end
|
16
|
+
rescue LoadError
|
17
|
+
puts "Jeweler not available. Install it with: sudo gem install technicalpickles-jeweler -s http://gems.github.com"
|
18
|
+
end
|
19
|
+
|
20
|
+
require 'rake/testtask'
|
21
|
+
Rake::TestTask.new(:test) do |test|
|
22
|
+
test.libs << 'lib' << 'test'
|
23
|
+
test.pattern = 'test/**/*_test.rb'
|
24
|
+
test.verbose = true
|
25
|
+
end
|
26
|
+
|
27
|
+
begin
|
28
|
+
require 'rcov/rcovtask'
|
29
|
+
Rcov::RcovTask.new do |test|
|
30
|
+
test.libs << 'test'
|
31
|
+
test.pattern = 'test/**/*_test.rb'
|
32
|
+
test.verbose = true
|
33
|
+
end
|
34
|
+
rescue LoadError
|
35
|
+
task :rcov do
|
36
|
+
abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
|
37
|
+
end
|
38
|
+
end
|
39
|
+
|
40
|
+
|
41
|
+
task :default => :test
|
42
|
+
|
43
|
+
require 'rake/rdoctask'
|
44
|
+
Rake::RDocTask.new do |rdoc|
|
45
|
+
if File.exist?('VERSION.yml')
|
46
|
+
config = YAML.load(File.read('VERSION.yml'))
|
47
|
+
version = "#{config[:major]}.#{config[:minor]}.#{config[:patch]}"
|
48
|
+
else
|
49
|
+
version = ""
|
50
|
+
end
|
51
|
+
|
52
|
+
rdoc.rdoc_dir = 'rdoc'
|
53
|
+
rdoc.title = "ofac #{version}"
|
54
|
+
rdoc.rdoc_files.include('README*')
|
55
|
+
rdoc.rdoc_files.include('lib/**/*.rb')
|
56
|
+
end
|
57
|
+
|
data/VERSION.yml
ADDED
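data/generators/ofac_migration/ofac_migration_generator.rb
ADDED
data/generators/ofac_migration/templates/migration.rb
ADDED
@@ -0,0 +1,30 @@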
|
|
1
|
+
class CreateOfacSdnTable < ActiveRecord::Migration
|
2
|
+
|
3
|
+
def self.up
|
4
|
+
create_table :ofac_sdns do |t|
|
5
|
+
t.text :name
|
6
|
+
t.string :sdn_type
|
7
|
+
t.string :program
|
8
|
+
t.string :title
|
9
|
+
t.string :vessel_call_sign
|
10
|
+
t.string :vessel_type
|
11
|
+
t.string :vessel_tonnage
|
12
|
+
t.string :gross_registered_tonnage
|
13
|
+
t.string :vessel_flag
|
14
|
+
t.string :vessel_owner
|
15
|
+
t.text :remarks
|
16
|
+
t.text :address
|
17
|
+
t.string :city
|
18
|
+
t.string :country
|
19
|
+
t.string :address_remarks
|
20
|
+
t.string :alternate_identity_type
|
21
|
+
t.text :alternate_identity_name
|
22
|
+
t.string :alternate_identity_remarks
|
23
|
+
t.timestamps
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
def self.down
|
28
|
+
drop_table :ofac_sdns
|
29
|
+
end
|
30
|
+
end
|
data/lib/ofac.rb
ADDED
data/lib/ofac/models/ofac.rb
ADDED
@@ -0,0 +1,119 @@
|
|
1
|
+
require 'activerecord'
|
2
|
+
require 'active_record/connection_adapters/mysql_adapter'
|
3
|
+
|
4
|
+
class Ofac
|
5
|
+
|
6
|
+
|
7
|
+
# Accepts a hash with the identity's demographic information
|
8
|
+
#
|
9
|
+
# Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'})
|
10
|
+
#
|
11
|
+
# <tt>:name</tt> is required to get a score. If <tt>:name</tt> is missing, an error will not be thrown, but a score of 0 will be returned.
|
12
|
+
#
|
13
|
+
# The more information provided, the higher the score could be. A score of 100 would mean all fields
|
14
|
+
# were passed in, and all fields were 100% matches. If only the name is passed in without an address,
|
15
|
+
# it will be impossible to get a score of 100, even if the name matches perfectly.
|
16
|
+
#
|
17
|
+
# Acceptable hash keys and their weighting in score calculation:
|
18
|
+
#
|
19
|
+
# * <tt>:name</tt> (weighting = 60%) (required) This can be a person, business, or marine vessel
|
20
|
+
# * <tt>:address</tt> (weighting = 10%)
|
21
|
+
# * <tt>:city</tt> (weighting = 30%)
|
22
|
+
def initialize(identity)
|
23
|
+
@identity = identity
|
24
|
+
end
|
25
|
+
|
26
|
+
# Creates a score, 1 - 100, based on how well the name and address match the data on the
|
27
|
+
# SDN (Specially Designated Nationals) list.
|
28
|
+
#
|
29
|
+
# The score is calculated by adding up the weightings of each part that is matched. So
|
30
|
+
# if only name is matched, then the max score is the weight for <tt>:name</tt> which is 60
|
31
|
+
#
|
32
|
+
# It's possible to get partial matches, which will add partial weight to the score. If there
|
33
|
+
# is not a match on the element as it is passed in, then each word element gets broken down
|
34
|
+
# and matches are tried on each partial element. The weighting is distributed equally for
|
35
|
+
# each partial that is matched.
|
36
|
+
#
|
37
|
+
# If exact matches are not made, then a sounds like match is attempted. Any match made by sounds like
|
38
|
+
# is given 75% of its weight to the score.
|
39
|
+
#
|
40
|
+
# Example:
|
41
|
+
#
|
42
|
+
# If you are trying to match the name Kevin Tyll and there is a record for Smith, Kevin in the database, then
|
43
|
+
# we will try to match both Kevin and Tyll separately, with each element Smith and Kevin. Since only Kevin
|
44
|
+
# will find a match, and there were 2 elements in the searched name, the score will be increased by half the weighting
|
45
|
+
# for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 30 to the score.
|
46
|
+
#
|
47
|
+
# If you are trying to match the name Kevin Gregory Tyll and there is a record for Tyll, Kevin in the database, then
|
48
|
+
# we will try to match Kevin and Gregory and Tyll separately, with each element Tyll and Kevin. Since both Kevin
|
49
|
+
# and Tyll will find a match, and there were 3 elements in the searched name, the score will be increased by 2/3 the weighting
|
50
|
+
# for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 40 to the score.
|
51
|
+
#
|
52
|
+
# If you are trying to match the name Kevin Tyll and there is a record for Kevin Gregory Tyll in the database, then
|
53
|
+
# we will try to match Kevin and Tyll separately, with each element Tyll and Kevin and Gregory. Since both Kevin
|
54
|
+
# and Tyll will find a match, and there were 2 elements in the searched name, the score will be increased by 2/2 the weighting
|
55
|
+
# for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 60 to the score.
|
56
|
+
#
|
57
|
+
# If you are trying to match the name Kevin Tyll, and there is a record for Teel, Kevin in the database, then an exact match
|
58
|
+
# will be found for Kevin, and a sounds like match will be made for Tyll. Since there were 2 elements in the searched name,
|
59
|
+
# and the weight for <tt>:name</tt> is 60, then each element is worth 30. Since Kevin was an exact match, it will add 30, and
|
60
|
+
# since Tyll was a sounds like match, it will add 30 * .75. So the <tt>:name</tt> portion of the search will be worth 53.
|
61
|
+
#
|
62
|
+
# Matches for name are made for both the name and any aliases in the OFAC database.
|
63
|
+
#
|
64
|
+
# Matches for <tt>:city</tt> and <tt>:address</tt> will only be added to the score if there is first a match on <tt>:name</tt>.
|
65
|
+
def score
|
66
|
+
@score || calculate_score
|
67
|
+
end
|
68
|
+
|
69
|
+
# Returns an array of hashes of records in the OFAC data that found partial matches with that record's score.
|
70
|
+
#
|
71
|
+
# Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'}).possible_hits
|
72
|
+
#returns
|
73
|
+
# [{:address=>"123 Somewhere Ln", :score=>100, :name=>"HERNANDEZ, Oscar|GUAMATUR, S.A.", :city=>"Clearwater"}, {:address=>"123 Somewhere Ln", :score=>100, :name=>"HERNANDEZ, Oscar|Alternate Name", :city=>"Clearwater"}]
|
74
|
+
#
|
75
|
+
def possible_hits
|
76
|
+
@possible_hits || retrieve_possible_hits
|
77
|
+
end
|
78
|
+
|
79
|
+
private
|
80
|
+
|
81
|
+
def retrieve_possible_hits
|
82
|
+
score
|
83
|
+
@possible_hits
|
84
|
+
end
|
85
|
+
|
86
|
+
def calculate_score
|
87
|
+
unless @identity[:name].to_s == ''
|
88
|
+
if OfacSdn.connection.kind_of?(ActiveRecord::ConnectionAdapters::MysqlAdapter)
|
89
|
+
#first get a list from the database of possible matches by name
|
90
|
+
#this query is pretty liberal, we just want to get a list of possible
|
91
|
+
#matches from the database that we can run through our ruby matching algorithm
|
92
|
+
partial_name = @identity[:name].gsub(/\W/,'|') # gsub! returns nil when nothing is replaced (e.g. a one-word name)
|
93
|
+
name_array = partial_name.split('|')
|
94
|
+
name_array.delete('')
|
95
|
+
sql_name_partial = name_array.collect {|partial_name| "INSTR(SUBSTR(SOUNDEX(concat('O',name)), 2), REPLACE(SUBSTR(SOUNDEX('O#{partial_name}'), 2), '0', '')) > 0"}.join(' and ')
|
96
|
+
sql_alt_name_partial = name_array.collect {|partial_name| "INSTR(SUBSTR(SOUNDEX(concat('O',alternate_identity_name)), 2), REPLACE(SUBSTR(SOUNDEX('O#{partial_name}'), 2), '0', '')) > 0"}.join(' and ')
|
97
|
+
##this sql for getting "accurate sounds like" functionality comes from:
|
98
|
+
#http://jgeewax.wordpress.com/2006/07/21/efficient-sounds-like-searches-in-mysql/
|
99
|
+
possible_sdns = OfacSdn.connection.select_all("select concat(name,'|', alternate_identity_name) name, address, city
|
100
|
+
from ofac_sdns
|
101
|
+
where name is not null
|
102
|
+
and (((#{sql_name_partial}))
|
103
|
+
or ((#{sql_alt_name_partial})))")
|
104
|
+
else
|
105
|
+
possible_sdns = OfacSdn.find(:all, :select => 'name, alternate_identity_name, address, city').collect{|sdn| {:name => "#{sdn.name}|#{sdn.alternate_identity_name}", :address => sdn.address, :city => sdn.city}}
|
106
|
+
end
|
107
|
+
|
108
|
+
match = OfacMatch.new({:name => {:weight => 60, :token => "#{@identity[:name]}"},
|
109
|
+
:address => {:weight => 10, :token => @identity[:address]},
|
110
|
+
:city => {:weight => 30, :token => @identity[:city]}})
|
111
|
+
|
112
|
+
score = match.score(possible_sdns)
|
113
|
+
@possible_hits = match.possible_hits
|
114
|
+
end
|
115
|
+
@score = score || 0
|
116
|
+
return @score
|
117
|
+
end
|
118
|
+
|
119
|
+
end
|
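The MySQL branch of calculate_score builds one sounds-like predicate per word of the searched name and joins them with "and". A minimal, database-free sketch of that string construction follows; the helper name soundex_predicate is hypothetical, and it uses a non-mutating split instead of the gsub!/split/delete sequence in the file.

```ruby
# Hypothetical helper, not part of the gem: builds the SOUNDEX-based
# WHERE fragment for one column, one clause per word of the name.
def soundex_predicate(column, name)
  # Split on non-word characters and drop empty pieces, so only
  # [A-Za-z0-9_] characters are ever interpolated into the SQL text.
  words = name.split(/\W+/).reject { |w| w.empty? }

  words.map { |word|
    "INSTR(SUBSTR(SOUNDEX(concat('O',#{column})), 2), " \
    "REPLACE(SUBSTR(SOUNDEX('O#{word}'), 2), '0', '')) > 0"
  }.join(' and ')
end

puts soundex_predicate('name', 'Kevin Tyll')
puts soundex_predicate('alternate_identity_name', 'Kevin')
```

Because the split never returns nil, a one-word name such as a vessel name produces a single clause instead of raising an error.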
data/lib/ofac/models/ofac_sdn.rb
ADDED
data/lib/ofac/models/ofac_sdn_loader.rb
ADDED
@@ -0,0 +1,305 @@
|
|
1
|
+
require 'net/http'
|
2
|
+
require 'activerecord'
|
3
|
+
require 'active_record/connection_adapters/mysql_adapter'
|
4
|
+
|
5
|
+
class OfacSdnLoader
|
6
|
+
|
7
|
+
|
8
|
+
#Loads the most recent file from http://www.treas.gov/offices/enforcement/ofac/sdn/delimit/index.shtml
|
9
|
+
def self.load_current_sdn_file
|
10
|
+
puts "Reloading OFAC sdn data"
|
11
|
+
puts "Downloading OFAC data from http://www.treas.gov/offices/enforcement/ofac/sdn"
|
12
|
+
#get the 3 data files
|
13
|
+
sdn = Tempfile.new('sdn')
|
14
|
+
sdn.write(Net::HTTP.get(URI.parse('http://www.treas.gov/offices/enforcement/ofac/sdn/delimit/sdn.pip')))
|
15
|
+
sdn.rewind
|
16
|
+
address = Tempfile.new('sdn')
|
17
|
+
address.write(Net::HTTP.get(URI.parse('http://www.treas.gov/offices/enforcement/ofac/sdn/delimit/add.pip')))
|
18
|
+
address.rewind
|
19
|
+
alt = Tempfile.new('sdn')
|
20
|
+
alt.write(Net::HTTP.get(URI.parse('http://www.treas.gov/offices/enforcement/ofac/sdn/delimit/alt.pip')))
|
21
|
+
alt.rewind
|
22
|
+
|
23
|
+
if OfacSdn.connection.kind_of?(ActiveRecord::ConnectionAdapters::MysqlAdapter)
|
24
|
+
puts "Converting file to csv format for Mysql import. This could take several minutes."
|
25
|
+
|
26
|
+
csv_file = convert_to_flattened_csv(sdn, address, alt)
|
27
|
+
|
28
|
+
bulk_mysql_update(csv_file)
|
29
|
+
else
|
30
|
+
active_record_file_load(sdn, address, alt)
|
31
|
+
end
|
32
|
+
|
33
|
+
sdn.close
|
34
|
+
@address.close
|
35
|
+
@alt.close
|
36
|
+
end
|
37
|
+
|
38
|
+
|
39
|
+
private
|
40
|
+
|
41
|
+
#convert the file's null value to an empty string
|
42
|
+
#and removes " chars.
|
43
|
+
def self.clean_file_string(line)
|
44
|
+
line.gsub!(/-0-(\s)?/,'')
|
45
|
+
line.gsub!(/\n/,'')
|
46
|
+
line.gsub(/\"/,'')
|
47
|
+
end
|
48
|
+
|
49
|
+
#split the line into an array
|
50
|
+
def self.convert_line_to_array(line)
|
51
|
+
clean_file_string(line).split('|') unless line.nil?
|
52
|
+
end
|
53
|
+
|
54
|
+
#returns 2 arrays of the records matching the sdn primary key:
|
55
|
+
#1 array of address records and one array of alt records
|
56
|
+
def self.foreign_key_records(sdn_id)
|
57
|
+
address_records = []
|
58
|
+
alt_records = []
|
59
|
+
|
60
|
+
#the first element in each array is the primary and foreign keys
|
61
|
+
#we are denormalizing the data
|
62
|
+
if @current_address_hash && @current_address_hash[:id] == sdn_id
|
63
|
+
address_records << @current_address_hash
|
64
|
+
loop do
|
65
|
+
@current_address_hash = address_text_to_hash(@address.gets)
|
66
|
+
if @current_address_hash && @current_address_hash[:id] == sdn_id
|
67
|
+
address_records << @current_address_hash
|
68
|
+
else
|
69
|
+
break
|
70
|
+
end
|
71
|
+
end
|
72
|
+
end
|
73
|
+
|
74
|
+
if @current_alt_hash && @current_alt_hash[:id] == sdn_id
|
75
|
+
alt_records << @current_alt_hash
|
76
|
+
loop do
|
77
|
+
@current_alt_hash = alt_text_to_hash(@alt.gets)
|
78
|
+
if @current_alt_hash && @current_alt_hash[:id] == sdn_id
|
79
|
+
alt_records << @current_alt_hash
|
80
|
+
else
|
81
|
+
break
|
82
|
+
end
|
83
|
+
end
|
84
|
+
end
|
85
|
+
return address_records, alt_records
|
86
|
+
end
|
87
|
+
|
88
|
+
def self.sdn_text_to_hash(line)
|
89
|
+
unless line.nil?
|
90
|
+
value_array = convert_line_to_array(line)
|
91
|
+
{:id => value_array[0],
|
92
|
+
:name => value_array[1],
|
93
|
+
:sdn_type => value_array[2],
|
94
|
+
:program => value_array[3],
|
95
|
+
:title => value_array[4],
|
96
|
+
:vessel_call_sign => value_array[5],
|
97
|
+
:vessel_type => value_array[6],
|
98
|
+
:vessel_tonnage => value_array[7],
|
99
|
+
:gross_registered_tonnage => value_array[8],
|
100
|
+
:vessel_flag => value_array[9],
|
101
|
+
:vessel_owner => value_array[10],
|
102
|
+
:remarks => value_array[11]
|
103
|
+
}
|
104
|
+
end
|
105
|
+
end
|
106
|
+
|
107
|
+
def self.address_text_to_hash(line)
|
108
|
+
unless line.nil?
|
109
|
+
value_array = convert_line_to_array(line)
|
110
|
+
{:id => value_array[0],
|
111
|
+
:address => value_array[2],
|
112
|
+
:city => value_array[3],
|
113
|
+
:country => value_array[4],
|
114
|
+
:address_remarks => value_array[5]
|
115
|
+
}
|
116
|
+
end
|
117
|
+
end
|
118
|
+
|
119
|
+
def self.alt_text_to_hash(line)
|
120
|
+
unless line.nil?
|
121
|
+
value_array = convert_line_to_array(line)
|
122
|
+
{:id => value_array[0],
|
123
|
+
:alternate_identity_type => value_array[2],
|
124
|
+
:alternate_identity_name => value_array[3],
|
125
|
+
:alternate_identity_remarks => value_array[4]
|
126
|
+
}
|
127
|
+
end
|
128
|
+
end
|
129
|
+
|
130
|
+
def self.convert_hash_to_mysql_import_string(record_hash)
|
131
|
+
# empty field for id to be generated by mysql.
|
132
|
+
new_line = "``|" +
|
133
|
+
# :name
|
134
|
+
"`#{record_hash[:name]}`|" +
|
135
|
+
# :sdn_type
|
136
|
+
"`#{record_hash[:sdn_type]}`|" +
|
137
|
+
# :program
|
138
|
+
"`#{record_hash[:program]}`|" +
|
139
|
+
# :title
|
140
|
+
"`#{record_hash[:title]}`|" +
|
141
|
+
# :vessel_call_sign
|
142
|
+
"`#{record_hash[:vessel_call_sign]}`|" +
|
143
|
+
# :vessel_type
|
144
|
+
"`#{record_hash[:vessel_type]}`|" +
|
145
|
+
# :vessel_tonnage
|
146
|
+
"`#{record_hash[:vessel_tonnage]}`|" +
|
147
|
+
# :gross_registered_tonnage
|
148
|
+
"`#{record_hash[:gross_registered_tonnage]}`|" +
|
149
|
+
# :vessel_flag
|
150
|
+
"`#{record_hash[:vessel_flag]}`|" +
|
151
|
+
# :vessel_owner
|
152
|
+
"`#{record_hash[:vessel_owner]}`|" +
|
153
|
+
# :remarks
|
154
|
+
"`#{record_hash[:remarks]}`|" +
|
155
|
+
# :address
|
156
|
+
"`#{record_hash[:address]}`|" +
|
157
|
+
# :city
|
158
|
+
"`#{record_hash[:city]}`|" +
|
159
|
+
# :country
|
160
|
+
"`#{record_hash[:country]}`|" +
|
161
|
+
# :address_remarks
|
162
|
+
"`#{record_hash[:address_remarks]}`|" +
|
163
|
+
# :alternate_identity_type
|
164
|
+
"`#{record_hash[:alternate_identity_type]}`|" +
|
165
|
+
# :alternate_identity_name
|
166
|
+
"`#{record_hash[:alternate_identity_name]}`|" +
|
167
|
+
# :alternate_identity_remarks
|
168
|
+
"`#{record_hash[:alternate_identity_remarks]}`|" +
|
169
|
+
#:created_at
|
170
|
+
"`#{Time.now.to_s(:db)}`|" +
|
171
|
+
# updated_at
|
172
|
+
"`#{Time.now.to_s(:db)}`" + "\n"
|
173
|
+
|
174
|
+
new_line
|
175
|
+
end
|
176
|
+
|
177
|
+
def self.convert_to_flattened_csv(sdn_file, address_file, alt_file)
|
178
|
+
@address = address_file
|
179
|
+
@alt = alt_file
|
180
|
+
|
181
|
+
csv_file = Tempfile.new("ofac") # create temp file for converted csv format.
|
182
|
+
#get the first line from the address and alt files
|
183
|
+
@current_address_hash = address_text_to_hash(@address.gets)
|
184
|
+
@current_alt_hash = alt_text_to_hash(@alt.gets)
|
185
|
+
|
186
|
+
start = Time.now
|
187
|
+
|
188
|
+
sdn_file.each_with_index do |line, i|
|
189
|
+
|
190
|
+
#initialize the address and alt attributes to empty strings
|
191
|
+
address_attributes = address_text_to_hash("|||||")
|
192
|
+
alt_attributes = alt_text_to_hash("||||")
|
193
|
+
|
194
|
+
sdn_attributes = sdn_text_to_hash(line)
|
195
|
+
|
196
|
+
#get the foreign key records for this sdn
|
197
|
+
address_records, alt_records = foreign_key_records(sdn_attributes[:id])
|
198
|
+
|
199
|
+
if address_records.empty?
|
200
|
+
#no matching address records, so initialized blank values will be used.
|
201
|
+
if alt_records.empty?
|
202
|
+
#no matching alt records, so initialized blank values will be used.
|
203
|
+
csv_file.syswrite(convert_hash_to_mysql_import_string(sdn_attributes.merge(address_attributes).merge(alt_attributes)))
|
204
|
+
else
|
205
|
+
alt_records.each do |alt|
|
206
|
+
csv_file.syswrite(convert_hash_to_mysql_import_string(sdn_attributes.merge(address_attributes).merge(alt)))
|
207
|
+
end
|
208
|
+
end
|
209
|
+
else
|
210
|
+
address_records.each do |address|
|
211
|
+
if alt_records.empty?
|
212
|
+
#no matching alt records, so initialized blank values will be used.
|
213
|
+
csv_file.syswrite(convert_hash_to_mysql_import_string(sdn_attributes.merge(address).merge(alt_attributes)))
|
214
|
+
else
|
215
|
+
alt_records.each do |alt|
|
216
|
+
csv_file.syswrite(convert_hash_to_mysql_import_string(sdn_attributes.merge(address).merge(alt)))
|
217
|
+
end
|
218
|
+
end
|
219
|
+
end
|
220
|
+
end
|
221
|
+
puts "#{i} records processed." if (i % 1000 == 0) && (i > 0)
|
222
|
+
end
|
223
|
+
puts "File conversion ran for #{(Time.now - start) / 60} minutes."
|
224
|
+
return csv_file
|
225
|
+
end
|
226
|
+
|
227
|
+
def self.active_record_file_load(sdn_file, address_file, alt_file)
|
228
|
+
@address = address_file
|
229
|
+
@alt = alt_file
|
230
|
+
|
231
|
+
#OFAC data is a complete list, so we have to dump and load
|
232
|
+
OfacSdn.delete_all
|
233
|
+
|
234
|
+
#get the first line from the address and alt files
|
235
|
+
@current_address_hash = address_text_to_hash(@address.gets)
|
236
|
+
@current_alt_hash = alt_text_to_hash(@alt.gets)
|
237
|
+
attributes = {}
|
238
|
+
sdn_file.each_with_index do |line, i|
|
239
|
+
|
240
|
+
#initialize the address and alt attributes to empty strings
|
241
|
+
address_attributes = address_text_to_hash("|||||")
|
242
|
+
alt_attributes = alt_text_to_hash("||||")
|
243
|
+
|
244
|
+
sdn_attributes = sdn_text_to_hash(line)
|
245
|
+
|
246
|
+
#get the foreign key records for this sdn
|
247
|
+
address_records, alt_records = foreign_key_records(sdn_attributes[:id])
|
248
|
+
|
249
|
+
if address_records.empty?
|
250
|
+
#no matching address records, so initialized blank values will be used.
|
251
|
+
if alt_records.empty?
|
252
|
+
#no matching alt records, so initialized blank values will be used.
|
253
|
+
attributes = sdn_attributes.merge(address_attributes).merge(alt_attributes)
|
254
|
+
attributes.delete(:id)
|
255
|
+
OfacSdn.create(attributes)
|
256
|
+
else
|
257
|
+
alt_records.each do |alt|
|
258
|
+
attributes = sdn_attributes.merge(address_attributes).merge(alt)
|
259
|
+
attributes.delete(:id)
|
260
|
+
OfacSdn.create(attributes)
|
261
|
+
end
|
262
|
+
end
|
263
|
+
else
|
264
|
+
address_records.each do |address|
|
265
|
+
if alt_records.empty?
|
266
|
+
#no matching alt records, so initialized blank values will be used.
|
267
|
+
attributes = sdn_attributes.merge(address).merge(alt_attributes)
|
268
|
+
attributes.delete(:id)
|
269
|
+
OfacSdn.create(attributes)
|
270
|
+
else
|
271
|
+
alt_records.each do |alt|
|
272
|
+
attributes = sdn_attributes.merge(address).merge(alt)
|
273
|
+
attributes.delete(:id)
|
274
|
+
OfacSdn.create(attributes)
|
275
|
+
end
|
276
|
+
end
|
277
|
+
end
|
278
|
+
end
|
279
|
+
|
280
|
+
puts "#{i} records processed." if (i % 5000 == 0) && (i > 0)
|
281
|
+
end
|
282
|
+
end
|
283
|
+
|
284
|
+
# For mysql, use:
|
285
|
+
# LOAD DATA LOCAL INFILE 'ssdm1.csv' INTO TABLE death_master_files FIELDS TERMINATED BY '|' ENCLOSED BY "`" LINES TERMINATED BY '\n';
|
286
|
+
# This is a much faster way of loading large amounts of data into mysql. For information on the LOAD DATA command
|
287
|
+
# see http://dev.mysql.com/doc/refman/5.1/en/load-data.html
|
288
|
+
def self.bulk_mysql_update(csv_file)
|
289
|
+
puts "Deleting all records in ofac_sdn..."
|
290
|
+
|
291
|
+
#OFAC data is a complete list, so we have to dump and load
|
292
|
+
OfacSdn.delete_all
|
293
|
+
|
294
|
+
puts "Importing into Mysql..."
|
295
|
+
|
296
|
+
mysql_command = <<-TEXT
|
297
|
+
LOAD DATA LOCAL INFILE '#{csv_file.path}' REPLACE INTO TABLE ofac_sdns FIELDS TERMINATED BY '|' ENCLOSED BY "`" LINES TERMINATED BY '\n';
|
298
|
+
TEXT
|
299
|
+
|
300
|
+
OfacSdn.connection.execute(mysql_command)
|
301
|
+
puts "Mysql import complete."
|
302
|
+
|
303
|
+
end
|
304
|
+
|
305
|
+
end
|
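The loader above denormalizes three pipe-delimited files into one row per combination of SDN record, address, and alternate identity. The sketch below illustrates the two core steps, cleaning a line and flattening the records, under simplifying assumptions: the helper names parse_pip_line and flatten are hypothetical, the sample line is made up for illustration, and an empty hash stands in for the blank address/alt attributes the loader initializes.

```ruby
# Hypothetical helpers, not part of the gem.

# Clean one line of a .pip file: drop the "-0-" null marker (and one
# trailing whitespace character), strip double quotes and the newline,
# then split on '|'. The -1 limit keeps trailing empty fields, which
# differs from the plain split('|') used by the loader.
def parse_pip_line(line)
  line.gsub(/-0-\s?/, '').delete('"').chomp.split('|', -1)
end

# Produce one merged hash per address/alt combination. When a list is
# empty, a single blank entry is used so the SDN row is still emitted.
def flatten(sdn, addresses, alts)
  addresses = [{}] if addresses.empty?
  alts = [{}] if alts.empty?
  addresses.flat_map do |address|
    alts.map { |alt| sdn.merge(address).merge(alt) }
  end
end

fields = parse_pip_line(%Q{36|"EXAMPLE AIRLINES"|-0- |"CUBA"\n})
# fields is ["36", "EXAMPLE AIRLINES", "", "CUBA"]

rows = flatten({ :name => fields[1] },
               [{ :city => 'Havana' }, { :city => 'Panama City' }],
               [])
# rows has two entries, one per address
```

An SDN record with two addresses and three aliases would therefore yield six rows, which is why name matches in possible_hits can repeat with different address or alias values.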
data/lib/ofac/ofac_match.rb
ADDED
@@ -0,0 +1,132 @@
|
|
1
|
+
class OfacMatch
|
2
|
+
|
3
|
+
attr_reader :possible_hits
|
4
|
+
|
5
|
+
#Initialize a Match object with a record hash of fields you want to match on.
|
6
|
+
#Each key in the hash also has a data hash value for the weight, token, and type.
|
7
|
+
#
|
8
|
+
# match = OfacMatch.new({:name => {:weight => 10, :token => 'Kevin Tyll'},
|
9
|
+
# :city => {:weight => 40, :token => 'Clearwater', },
|
10
|
+
# :address => {:weight => 40, :token => '1234 Park St.', },
|
11
|
+
# :zip => {:weight => 10, :token => '33759', :type => :number}})
|
12
|
+
#
|
13
|
+
# data hash keys:
|
14
|
+
# * <tt>data[:weight]</tt> - value to apply to the score if there is a match (Default is 100/number of keys in the record hash)
|
15
|
+
# * <tt>data[:token]</tt> - string to match
|
16
|
+
# * <tt>data[:match]</tt> - set from records hash
|
17
|
+
# * <tt>data[:score]</tt> - output field
|
18
|
+
# * <tt>data[:type]</tt> - the type of match that should be performed (valid values are +:sound+ | +:number+) (Default is +:sound+)
|
19
|
+
def initialize(stats={})
|
20
|
+
@possible_hits = []
|
21
|
+
@stats = stats.dup
|
22
|
+
weight = 100
|
23
|
+
weight = 100 / @stats.length if @stats.length > 0
|
24
|
+
@stats.each_value do |data|
|
25
|
+
data[:weight] ||= weight
|
26
|
+
data[:match] ||= ''
|
27
|
+
data[:type] ||= :sound
|
28
|
+
data[:score] ||= 0
|
29
|
+
data[:token] = data[:token].to_s.upcase
|
30
|
+
end
|
31
|
+
end
|
32
|
+
|
33
|
+
# match_records is an array of hashes.
|
34
|
+
#
|
35
|
+
# The hash keys must match the record hash keys set when initialized.
|
36
|
+
#
|
37
|
+
# score will return the highest score of all the records that
|
38
|
+
# are sent in match_records.
|
39
|
+
def score(match_records)
|
40
|
+
score_results = Array.new
|
41
|
+
unless match_records.empty?
|
42
|
+
#place the match_records information
|
43
|
+
#into our @stats hash
|
44
|
+
match_records.each do |match|
|
45
|
+
match.each do |key, value|
|
46
|
+
@stats[key.to_sym][:match] = value.to_s.upcase
|
47
|
+
end
|
48
|
+
record_score = calculate_record
|
49
|
+
score_results.push(record_score)
|
50
|
+
@possible_hits << match.merge(:score => record_score) if record_score > 0
|
51
|
+
end
|
52
|
+
score = score_results.max #take max score
|
53
|
+
end
|
54
|
+
@possible_hits.uniq!
|
55
|
+
score ||= 0
|
56
|
+
end
|
57
|
+
|
58
|
+
private
|
59
|
+
|
60
|
+
|
61
|
+
# calculate the score for this record
|
62
|
+
# comparing the token to the match fields in the @stats hash
|
63
|
+
# and storing the score into the record
|
64
|
+
def calculate_record
|
65
|
+
score = 0
|
66
|
+
unless @stats.nil?
|
67
|
+
#need to make sure we check the name first, since city and address don't
|
68
|
+
#get added to the score unless there is a name match
|
69
|
+
[:name,:city,:address].each do |field|
|
70
|
+
data = @stats[field]
|
71
|
+
if (data[:token].blank?)
|
72
|
+
value = 0 #token is blank; can't be sure of a match with nothing to match against
|
73
|
+
else
|
74
|
+
if (data[:match].blank?)
|
75
|
+
value = 0 #token has value match is blank
|
76
|
+
else
|
77
|
+
#token and match both have values
|
78
|
+
if (data[:type] == :number)
|
79
|
+
value = data[:token] == data[:match] ? 1 : 0
|
80
|
+
else
|
81
|
+
#first see if there is an exact match
|
82
|
+
value = data[:token] == data[:match] ? 1 : 0
|
83
|
+
|
84
|
+
unless value > 0
|
85
|
+
#do a sounds like with the data as given to see if we get a match
|
86
|
+
#if match on sounds_like, only give .75 of the weight.
|
87
|
+
value = data[:token].ofac_sounds_like(data[:match],false) ? 0.75 : 0
|
88
|
+
end
|
89
|
+
|
90
|
+
#if no match, then break the data down and see if we can find matches on the
|
91
|
+
#individual words
|
92
|
+
unless value > 0
|
93
|
+
token_data = data[:token].gsub(/\W/,'|')
|
94
|
+
token_array = token_data.split('|')
|
95
|
+
token_array.delete('')
|
96
|
+
|
97
|
+
match_data = data[:match].gsub(/\W/,'|')
|
98
|
+
match_array = match_data.split('|')
|
99
|
+
match_array.delete('')
|
100
|
+
|
101
|
+
value = 0
|
102
|
+
partial_weight = 1/token_array.length.to_f
|
103
|
+
token_array.each do |partial_token|
|
104
|
+
#first see if we get an exact match of the partial
|
105
|
+
if match_array.include?(partial_token)
|
106
|
+
value += partial_weight
|
107
|
+
else
|
108
|
+
#otherwise, see if the partial sounds like any part of the OFAC record
|
109
|
+
match_array.each do |partial_match|
|
110
|
+
if partial_match.ofac_sounds_like(partial_token,false)
|
111
|
+
#give partial value for every part of token that is matched.
|
112
|
+
value += partial_weight * 0.75
|
113
|
+
break
|
114
|
+
end
|
115
|
+
end
|
116
|
+
end
|
117
|
+
end
|
118
|
+
end
|
119
|
+
end
|
120
|
+
end
|
121
|
+
end
|
122
|
+
data[:score] = data[:weight] * value
|
123
|
+
score += data[:score]
|
124
|
+
break if field == :name && data[:score] == 0
|
125
|
+
end
|
126
|
+
|
127
|
+
end
|
128
|
+
score.round
|
129
|
+
end
|
130
|
+
|
131
|
+
end
|
132
|
+
|
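The word-by-word fallback in `calculate_record` can be sketched on its own. The method name `partial_match_value` is hypothetical, and the sounds-like comparison is passed in as a lambda (a stand-in for `String#ofac_sounds_like`) so the sketch runs without the String extension. Splitting on `/\W+/` is assumed to be equivalent to the `gsub`/`split`/`delete` sequence above.

```ruby
# Fraction (0.0 to 1.0) of the token's words that are found in the match
# string. An exact word match counts fully; a sounds-like match counts 75%.
def partial_match_value(token, match, sounds_like)
  token_words = token.upcase.split(/\W+/).reject { |w| w.empty? }
  match_words = match.upcase.split(/\W+/).reject { |w| w.empty? }
  return 0.0 if token_words.empty?

  per_word = 1.0 / token_words.length
  token_words.inject(0.0) do |value, word|
    if match_words.include?(word)
      value + per_word
    elsif match_words.any? { |candidate| sounds_like.call(candidate, word) }
      value + per_word * 0.75
    else
      value
    end
  end
end

# Stand-in comparison that never matches by sound, so only exact words count.
exact_only = lambda { |_candidate, _word| false }

# Two of the three token words appear in the record: 60 * 2/3 = 40.
puts (60 * partial_match_value('Oscar middlename Hernandez', 'HERNANDEZ, Oscar', exact_only)).round
```

With a name weight of 60, this gives the same 40 and 30 that the package's tests expect for the partial name matches.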
@@ -0,0 +1,22 @@
|
|
1
|
+
class String
|
2
|
+
|
3
|
+
Ofac_SoundexChars = 'BPFVCSKGJQXZDTLMNR'
|
4
|
+
Ofac_SoundexNums = '111122222222334556'
|
5
|
+
Ofac_SoundexCharsEx = '^' + Ofac_SoundexChars
|
6
|
+
Ofac_SoundexCharsDel = '^A-Z'
|
7
|
+
|
8
|
+
# desc: http://en.wikipedia.org/wiki/Soundex
|
9
|
+
def ofac_soundex(census = true)
|
10
|
+
str = upcase.delete(Ofac_SoundexCharsDel).squeeze
|
11
|
+
|
12
|
+
str[0 .. 0] + str[1 .. -1].
|
13
|
+
delete(Ofac_SoundexCharsEx).
|
14
|
+
tr(Ofac_SoundexChars, Ofac_SoundexNums)[0 .. (census ? 2 : -1)].
|
15
|
+
ljust(3, '0') rescue ''
|
16
|
+
end
|
17
|
+
|
18
|
+
def ofac_sounds_like(other, census = true)
|
19
|
+
ofac_soundex(census) == other.ofac_soundex(census)
|
20
|
+
end
|
21
|
+
|
22
|
+
end
|
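To make the `ofac_soundex` chain easier to follow, here is a standalone restatement as a plain method. The name `soundex_sketch` is hypothetical, and the explicit empty-string check is an assumed replacement for the `rescue ''` modifier. Like the original, it only squeezes identical adjacent letters and does not merge adjacent letters that share a code, so it can differ from textbook Soundex.

```ruby
SOUNDEX_CHARS = 'BPFVCSKGJQXZDTLMNR'
SOUNDEX_NUMS  = '111122222222334556'

# First letter, followed by the digit codes of the remaining consonants.
# With census = true the digits are capped at three; either way they are
# padded with zeros to at least three.
def soundex_sketch(text, census = true)
  letters = text.upcase.delete('^A-Z').squeeze
  return '' if letters.empty? # nothing but digits or punctuation

  digits = letters[1..-1].delete('^' + SOUNDEX_CHARS).tr(SOUNDEX_CHARS, SOUNDEX_NUMS)
  letters[0, 1] + digits[0..(census ? 2 : -1)].ljust(3, '0')
end

puts soundex_sketch('Luis')            # L200
puts soundex_sketch('Louis')           # L200
puts soundex_sketch('summer', false)   # S560
puts soundex_sketch('somewhere', false) # S560
```

Because every non-letter is deleted first, a purely numeric string such as `'12358'` reduces to the empty string, which is why the tests note that all numbers sound alike.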
data/lib/tasks/ofac.rake
ADDED
@@ -0,0 +1,10 @@
|
|
1
|
+
10|7|-0- |-0- |"Panama"|-0-
|
2
|
+
15|12|-0- |-0- |"Panama"|-0-
|
3
|
+
22|14|"123 Somewhere Ln"|"Clearwater"|"United States"|-0-
|
4
|
+
39|27|-0- |"Managua"|"Nicaragua"|-0-
|
5
|
+
39|29|"Bal Harbour Shopping Center, Via Italia"|"Panama City"|"Panama"|-0-
|
6
|
+
41|41|"Avenida de Concha, Espina 8, E-28036"|"Madrid"|"Spain"|-0-
|
7
|
+
41|102|-0- |-0- |-0- |-0-
|
8
|
+
66|111|-0- |"Milan"|"Italy"|-0-
|
9
|
+
66|117|-0- |-0- |"Panama"|-0-
|
10
|
+
66|125|"1840 West 49th Street"|"Hialeah, FL"|"United States"|-0-
|
@@ -0,0 +1,10 @@
|
|
1
|
+
15|14|"aka"|"VIAJES GUAMA TOURS"|-0-
|
2
|
+
22|15|"aka"|"HERNANDEZ, Oscar Grouch"|-0-
|
3
|
+
22|16|"aka"|"Alternate Name"|-0-
|
4
|
+
25|57|"aka"|"AVIA IMPORT"|-0-
|
5
|
+
36|219|"aka"|"BNC"|-0-
|
6
|
+
36|220|"aka"|"NATIONAL BANK OF CUBA"|-0-
|
7
|
+
36|221|"aka"|"BNC"|-0-
|
8
|
+
41|222|"aka"|"NATIONAL BANK OF CUBA"|-0-
|
9
|
+
66|223|"aka"|"BNC"|-0-
|
10
|
+
66|224|"aka"|"NATIONAL BANK OF CUBA"|-0-
|
@@ -0,0 +1,9 @@
|
|
1
|
+
10|"ABASTECEDORA NAVAL Y INDUSTRIAL, S.A."|-0- |"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
2
|
+
15|"ABDELNUR| Nury de Jesus"|"individual"|"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
3
|
+
22|"HERNANDEZ, Oscar"|"individual"|"CUBA"|-0- |-0- |"Unknown vessel type"|-0- |-0- |-0- |"Acechilly Navigation Co., Malta"|-0-
|
4
|
+
24|"LOPEZ MENDEZ, Luis Eduardo"|"individual"|"CUBA"|-0- |-0- |"Unknown vessel type"|-0- |-0- |-0- |"Acefrosty Shipping Co., Malta"|-0-
|
5
|
+
25|"ACEFROSTY SHIPPING CO., LTD."|-0- |"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
6
|
+
36|"AEROCARIBBEAN AIRLINES"|-0- |"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
7
|
+
39|"AEROTAXI EJECUTIVO, S.A."|-0- |"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
8
|
+
41|"AGENCIA DE VIAJES GUAMA"|-0- |"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
9
|
+
66|"AGUIAR, Raul"|"individual"|"CUBA"|"Director, Banco Nacional de Cuba"|-0- |-0- |-0- |-0- |-0- |-0- |"; Director, Banco Nacional de Cuba."
|
@@ -0,0 +1,19 @@
|
|
1
|
+
``|`ABASTECEDORA NAVAL Y INDUSTRIAL, S.A.`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|`Panama`|``|``|``|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
2
|
+
``|`ABDELNUR`|` Nury de Jesus`|`individual`|`CUBA`|``|``|``|``|``|``|``|``|``|`Panama`|``|`aka`|`VIAJES GUAMA TOURS`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
3
|
+
``|`HERNANDEZ, Oscar`|`individual`|`CUBA`|``|``|`Unknown vessel type`|``|``|``|`Acechilly Navigation Co., Malta`|``|`123 Somewhere Ln`|`Clearwater`|`United States`|``|`aka`|`HERNANDEZ, Oscar Grouch`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
4
|
+
``|`HERNANDEZ, Oscar`|`individual`|`CUBA`|``|``|`Unknown vessel type`|``|``|``|`Acechilly Navigation Co., Malta`|``|`123 Somewhere Ln`|`Clearwater`|`United States`|``|`aka`|`Alternate Name`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
5
|
+
``|`LOPEZ MENDEZ, Luis Eduardo`|`individual`|`CUBA`|``|``|`Unknown vessel type`|``|``|``|`Acefrosty Shipping Co., Malta`|``|``|``|``|``|``|``|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
6
|
+
``|`ACEFROSTY SHIPPING CO., LTD.`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|``|``|`aka`|`AVIA IMPORT`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
7
|
+
``|`AEROCARIBBEAN AIRLINES`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|``|``|`aka`|`BNC`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
8
|
+
``|`AEROCARIBBEAN AIRLINES`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|``|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
9
|
+
``|`AEROCARIBBEAN AIRLINES`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|``|``|`aka`|`BNC`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
10
|
+
``|`AEROTAXI EJECUTIVO, S.A.`|``|`CUBA`|``|``|``|``|``|``|``|``|``|`Managua`|`Nicaragua`|``|``|``|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
11
|
+
``|`AEROTAXI EJECUTIVO, S.A.`|``|`CUBA`|``|``|``|``|``|``|``|``|`Bal Harbour Shopping Center, Via Italia`|`Panama City`|`Panama`|``|``|``|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
12
|
+
``|`AGENCIA DE VIAJES GUAMA`|``|`CUBA`|``|``|``|``|``|``|``|``|`Avenida de Concha, Espina 8, E-28036`|`Madrid`|`Spain`|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
13
|
+
``|`AGENCIA DE VIAJES GUAMA`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|``|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
14
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|``|`Milan`|`Italy`|``|`aka`|`BNC`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
15
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|``|`Milan`|`Italy`|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
16
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|``|``|`Panama`|``|`aka`|`BNC`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
17
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|``|``|`Panama`|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
18
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|`1840 West 49th Street`|`Hialeah, FL`|`United States`|``|`aka`|`BNC`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
19
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|`1840 West 49th Street`|`Hialeah, FL`|`United States`|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
@@ -0,0 +1,20 @@
|
|
1
|
+
require 'ofac/models/ofac_sdn_loader'
|
2
|
+
|
3
|
+
class OfacSdnLoader
|
4
|
+
|
5
|
+
def self.load_current_sdn_file
|
6
|
+
sdn = File.new(File.dirname(__FILE__) + '/../../files/test_sdn_data_load.pip')
|
7
|
+
address = File.new(File.dirname(__FILE__) + '/../../files/test_address_data_load.pip')
|
8
|
+
alt = File.new(File.dirname(__FILE__) + '/../../files/test_alt_data_load.pip')
|
9
|
+
active_record_file_load(sdn, address, alt)
|
10
|
+
sdn.close
|
11
|
+
address.close
|
12
|
+
alt.close
|
13
|
+
end
|
14
|
+
|
15
|
+
#Gives access to the private convert_to_flattened_csv method
|
16
|
+
def self.create_csv_file(sdn, address, alt)
|
17
|
+
convert_to_flattened_csv(sdn, address, alt)
|
18
|
+
end
|
19
|
+
|
20
|
+
end
|
@@ -0,0 +1,40 @@
|
|
1
|
+
require 'test_helper'
|
2
|
+
|
3
|
+
class OfacSdnLoaderTest < Test::Unit::TestCase
|
4
|
+
|
5
|
+
context '' do
|
6
|
+
setup do setup_ofac_sdn_table end
|
7
|
+
|
8
|
+
should "load table from files multiple times and always have the same record count" do
|
9
|
+
assert_equal(0,OfacSdn.count)
|
10
|
+
OfacSdnLoader.load_current_sdn_file #this method is mocked to load test files instead of the live files from the web.
|
11
|
+
assert_equal(19, OfacSdn.count)
|
12
|
+
OfacSdnLoader.load_current_sdn_file
|
13
|
+
assert_equal(19, OfacSdn.count)
|
14
|
+
end
|
15
|
+
|
16
|
+
should "create flattened_csv_file_for_mysql_import" do
|
17
|
+
#since I'm using sqlite3 for its in-memory db, I can't test the mysql load
|
18
|
+
#but I can test the csv file creation.
|
19
|
+
sdn = File.new(File.dirname(__FILE__) + '/files/test_sdn_data_load.pip')
|
20
|
+
address = File.new(File.dirname(__FILE__) + '/files/test_address_data_load.pip')
|
21
|
+
alt = File.new(File.dirname(__FILE__) + '/files/test_alt_data_load.pip')
|
22
|
+
|
23
|
+
csv = OfacSdnLoader.create_csv_file(sdn, address, alt) #this method was created in the mock only to call the private convert_to_flattened_csv method
|
24
|
+
correctly_formatted_csv = File.open(File.dirname(__FILE__) + '/files/valid_flattened_file.csv')
|
25
|
+
|
26
|
+
csv.rewind
|
27
|
+
generated_file = csv.readlines
|
28
|
+
#compare the values of each csv line with the correctly formatted "control file"
|
29
|
+
correctly_formatted_csv.each_with_index do |line,i|
|
30
|
+
csv_line = generated_file[i]
|
31
|
+
correctly_formatted_record_array = line.split('|')
|
32
|
+
csv_record_array = csv_line.split('|')
|
33
|
+
(0..18).each do |col| #skip indices 19 and 20; they are the created_at and updated_at fields and will never match.
|
34
|
+
assert_equal correctly_formatted_record_array[col], csv_record_array[col]
|
35
|
+
end
|
36
|
+
end
|
37
|
+
end
|
38
|
+
|
39
|
+
end
|
40
|
+
end
|
data/test/ofac_test.rb
ADDED
@@ -0,0 +1,76 @@
|
|
1
|
+
require 'test_helper'
|
2
|
+
|
3
|
+
class OfacTest < Test::Unit::TestCase
|
4
|
+
|
5
|
+
context '' do
|
6
|
+
setup do
|
7
|
+
setup_ofac_sdn_table
|
8
|
+
OfacSdnLoader.load_current_sdn_file #this method is mocked to load test files instead of the live files from the web.
|
9
|
+
end
|
10
|
+
|
11
|
+
should "give a score of 0 if no name is given" do
|
12
|
+
assert_equal 0, Ofac.new({:address => '123 somewhere'}).score
|
13
|
+
end
|
14
|
+
|
15
|
+
should "give a score of 0 if there is no name match" do
|
16
|
+
assert_equal 0, Ofac.new({:name => 'Kevin'}).score
|
17
|
+
end
|
18
|
+
|
19
|
+
should "give a score of 0 if there is no name match but there is an address and city match" do
|
20
|
+
assert_equal 0, Ofac.new({:name => 'Kevin', :address => '123 somewhere ln', :city => 'Clearwater'}).score
|
21
|
+
end
|
22
|
+
|
23
|
+
should "give a score of 60 if there is a name match" do
|
24
|
+
assert_equal 60, Ofac.new({:name => 'Oscar Hernandez'}).score
|
25
|
+
assert_equal 60, Ofac.new({:name => 'Oscar Hernandez', :city => 'no match', :address => 'no match'}).score
|
26
|
+
assert_equal 60, Ofac.new({:name => 'Oscar Hernandez', :city => 'Las Vegas', :address => 'no match'}).score
|
27
|
+
assert_equal 60, Ofac.new({:name => 'Luis Lopez', :city => 'Las Vegas', :address => 'no match'}).score
|
28
|
+
end
|
29
|
+
|
30
|
+
should "give a score of 60 if there is a name match on alternate identity name" do
|
31
|
+
assert_equal 60, Ofac.new({:name => 'Alternate Name'}).score
|
32
|
+
end
|
33
|
+
|
34
|
+
should "give a partial score if there is a partial name match" do
|
35
|
+
assert_equal 40, Ofac.new({:name => 'Oscar middlename Hernandez'}).score
|
36
|
+
assert_equal 30, Ofac.new({:name => 'Oscar WrongLastName'}).score
|
37
|
+
assert_equal 70, Ofac.new({:name => 'Oscar middlename Hernandez',:city => 'Clearwater'}).score
|
38
|
+
end
|
39
|
+
|
40
|
+
should "give a score of 90 if there is a name and city match" do
|
41
|
+
assert_equal 90, Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => 'no match'}).score
|
42
|
+
end
|
43
|
+
|
44
|
+
should "give a score of 100 if there is a name and city and address match" do
|
45
|
+
assert_equal 100, Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'}).score
|
46
|
+
end
|
47
|
+
|
48
|
+
should "give partial scores for sounds like matches" do
|
49
|
+
|
50
|
+
#'32456 summer lane' sounds like '123 Somewhere Ln' (digits are ignored), so it adds 75% of the address weight (10) to the score: 7.5, which rounds the total up to 98.
|
51
|
+
assert_equal 98, Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '32456 summer lane'}).score
|
52
|
+
|
53
|
+
#summer sounds like somewhere, and all numbers sound alike, so 2 of the 3 address elements match by sound.
|
54
|
+
#Each element is worth 10/3 or 3.33. An exact match adds 3.33, and a sounds like match adds 3.33 * .75 or 2.5,
|
55
|
+
#because sounds like matches only add 75% of their weight.
|
56
|
+
#2.5 + 2.5 = 5
|
57
|
+
assert_equal 95, Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '12358 summer blvd'}).score
|
58
|
+
|
59
|
+
|
60
|
+
#Louis sounds like Luis, and Lopez is an exact match:
|
61
|
+
#:name has a weight of 60, so each element is worth 30. A sounds like match is worth 30 * .75 = 22.5, so the name scores 30 + 22.5 = 52.5, which rounds to 53.
|
62
|
+
assert_equal 53, Ofac.new({:name => 'Louis Lopez', :city => 'Las Vegas', :address => 'no match'}).score
|
63
|
+
end
|
64
|
+
|
65
|
+
should "return an array of possible hits" do
|
66
|
+
#it shouldn't matter in which order you call score and possible_hits.
|
67
|
+
sdn = Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'})
|
68
|
+
assert sdn.score > 0
|
69
|
+
assert !sdn.possible_hits.empty?
|
70
|
+
|
71
|
+
sdn = Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'})
|
72
|
+
assert !sdn.possible_hits.empty?
|
73
|
+
assert sdn.score > 0
|
74
|
+
end
|
75
|
+
end
|
76
|
+
end
|
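The expected scores in these tests follow from simple arithmetic once the field weights are known. The sketch below assumes weights of 60, 30 and 10 for name, city and address, which is inferred from the assertions above and not taken from the `Ofac` model itself. `WEIGHTS` and `total_score` are hypothetical names. Each input value is the match fraction for a field (1 for exact, 0.75 for sounds-like, or a partial fraction).

```ruby
# Assumed field weights, inferred from the expected scores in the tests.
WEIGHTS = { :name => 60, :city => 30, :address => 10 }

# Sums weight * match fraction per field. The name is checked first, and a
# name score of zero stops the loop so city and address add nothing.
def total_score(values)
  total = 0.0
  [:name, :city, :address].each do |field|
    field_score = WEIGHTS[field] * values.fetch(field, 0)
    total += field_score
    break if field == :name && field_score == 0
  end
  total.round
end

puts total_score(:name => 1, :city => 1, :address => 0.75)  # 60 + 30 + 7.5 = 97.5, rounds to 98
puts total_score(:name => 1, :city => 1, :address => 0.5)   # 60 + 30 + 5 = 95
puts total_score(:name => 0.875, :city => 0, :address => 0) # 30 + 22.5 = 52.5, rounds to 53
puts total_score(:name => 0, :city => 1, :address => 1)     # no name match, so 0
```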
data/test/test_helper.rb
ADDED
@@ -0,0 +1,48 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'test/unit'
|
3
|
+
require 'shoulda'
|
4
|
+
require 'mocks/test/ofac_sdn_loader'
|
5
|
+
|
6
|
+
$LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
|
7
|
+
$LOAD_PATH.unshift(File.dirname(__FILE__))
|
8
|
+
require 'ofac'
|
9
|
+
|
10
|
+
ActiveRecord::Base.establish_connection :adapter => 'sqlite3', :database => ':memory:'
|
11
|
+
|
12
|
+
class Test::Unit::TestCase
|
13
|
+
def setup_ofac_sdn_table
|
14
|
+
ActiveRecord::Base.connection.tables.each { |table| ActiveRecord::Base.connection.drop_table(table) }
|
15
|
+
create_ofac_sdn_table
|
16
|
+
end
|
17
|
+
|
18
|
+
private
|
19
|
+
|
20
|
+
def create_ofac_sdn_table
|
21
|
+
silence_stream(STDOUT) do
|
22
|
+
ActiveRecord::Schema.define(:version => 1) do
|
23
|
+
create_table :ofac_sdns do |t|
|
24
|
+
t.text :name
|
25
|
+
t.string :sdn_type
|
26
|
+
t.string :program
|
27
|
+
t.string :title
|
28
|
+
t.string :vessel_call_sign
|
29
|
+
t.string :vessel_type
|
30
|
+
t.string :vessel_tonnage
|
31
|
+
t.string :gross_registered_tonnage
|
32
|
+
t.string :vessel_flag
|
33
|
+
t.string :vessel_owner
|
34
|
+
t.text :remarks
|
35
|
+
t.text :address
|
36
|
+
t.string :city
|
37
|
+
t.string :country
|
38
|
+
t.string :address_remarks
|
39
|
+
t.string :alternate_identity_type
|
40
|
+
t.text :alternate_identity_name
|
41
|
+
t.string :alternate_identity_remarks
|
42
|
+
t.timestamps
|
43
|
+
end
|
44
|
+
end
|
45
|
+
end
|
46
|
+
end
|
47
|
+
|
48
|
+
end
|
metadata
ADDED
@@ -0,0 +1,90 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: kevintyll-ofac
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 1.0.0
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- Kevin Tyll
|
8
|
+
autorequire:
|
9
|
+
bindir: bin
|
10
|
+
cert_chain: []
|
11
|
+
|
12
|
+
date: 2009-05-11 00:00:00 -07:00
|
13
|
+
default_executable:
|
14
|
+
dependencies: []
|
15
|
+
|
16
|
+
description: Attempts to find a hit on the Office of Foreign Assets Control's Specially Designated Nationals list.
|
17
|
+
email: kevintyll@gmail.com
|
18
|
+
executables: []
|
19
|
+
|
20
|
+
extensions: []
|
21
|
+
|
22
|
+
extra_rdoc_files:
|
23
|
+
- LICENSE
|
24
|
+
- README.rdoc
|
25
|
+
files:
|
26
|
+
- History.txt
|
27
|
+
- LICENSE
|
28
|
+
- PostInstall.txt
|
29
|
+
- README.rdoc
|
30
|
+
- Rakefile
|
31
|
+
- VERSION.yml
|
32
|
+
- generators/ofac_migration/ofac_migration_generator.rb
|
33
|
+
- generators/ofac_migration/templates/migration.rb
|
34
|
+
- lib/ofac.rb
|
35
|
+
- lib/ofac/models/ofac.rb
|
36
|
+
- lib/ofac/models/ofac_sdn.rb
|
37
|
+
- lib/ofac/models/ofac_sdn_loader.rb
|
38
|
+
- lib/ofac/ofac_match.rb
|
39
|
+
- lib/ofac/ruby_string_extensions.rb
|
40
|
+
- lib/tasks/ofac.rake
|
41
|
+
- test/files/test_address_data_load.pip
|
42
|
+
- test/files/test_alt_data_load.pip
|
43
|
+
- test/files/test_sdn_data_load.pip
|
44
|
+
- test/files/valid_flattened_file.csv
|
45
|
+
- test/mocks/test/ofac_sdn_loader.rb
|
46
|
+
- test/ofac_sdn_loader_test.rb
|
47
|
+
- test/ofac_test.rb
|
48
|
+
- test/test_helper.rb
|
49
|
+
has_rdoc: true
|
50
|
+
homepage: http://github.com/kevintyll/ofac
|
51
|
+
post_install_message: |-
|
52
|
+
For more information on ofac, see http://kevintyll.github.com/ofac/
|
53
|
+
|
54
|
+
* To create the necessary db migration, from the command line, run:
|
55
|
+
script/generate ofac_migration
|
56
|
+
* Require the gem in your environment.rb file in the Rails::Initializer block:
|
57
|
+
config.gem 'kevintyll-ofac', :lib => 'ofac'
|
58
|
+
* To load your table with the current OFAC data, from the command line, run:
|
59
|
+
rake ofac:update_data
|
60
|
+
|
61
|
+
* The OFAC data is not updated with any regularity, but you can sign up for email notifications when the data changes at
|
62
|
+
http://www.treas.gov/offices/enforcement/ofac/sdn/index.shtml.
|
63
|
+
rdoc_options:
|
64
|
+
- --charset=UTF-8
|
65
|
+
require_paths:
|
66
|
+
- lib
|
67
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
68
|
+
requirements:
|
69
|
+
- - ">="
|
70
|
+
- !ruby/object:Gem::Version
|
71
|
+
version: "0"
|
72
|
+
version:
|
73
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
74
|
+
requirements:
|
75
|
+
- - ">="
|
76
|
+
- !ruby/object:Gem::Version
|
77
|
+
version: "0"
|
78
|
+
version:
|
79
|
+
requirements: []
|
80
|
+
|
81
|
+
rubyforge_project:
|
82
|
+
rubygems_version: 1.2.0
|
83
|
+
signing_key:
|
84
|
+
specification_version: 2
|
85
|
+
summary: Attempts to find a hit on the Office of Foreign Assets Control's Specially Designated Nationals list.
|
86
|
+
test_files:
|
87
|
+
- test/mocks/test/ofac_sdn_loader.rb
|
88
|
+
- test/ofac_sdn_loader_test.rb
|
89
|
+
- test/ofac_test.rb
|
90
|
+
- test/test_helper.rb
|