kevintyll-ofac 1.0.0
- data/History.txt +9 -0
- data/LICENSE +20 -0
- data/PostInstall.txt +11 -0
- data/README.rdoc +109 -0
- data/Rakefile +57 -0
- data/VERSION.yml +4 -0
- data/generators/ofac_migration/ofac_migration_generator.rb +12 -0
- data/generators/ofac_migration/templates/migration.rb +30 -0
- data/lib/ofac.rb +9 -0
- data/lib/ofac/models/ofac.rb +119 -0
- data/lib/ofac/models/ofac_sdn.rb +5 -0
- data/lib/ofac/models/ofac_sdn_loader.rb +305 -0
- data/lib/ofac/ofac_match.rb +132 -0
- data/lib/ofac/ruby_string_extensions.rb +22 -0
- data/lib/tasks/ofac.rake +8 -0
- data/test/files/test_address_data_load.pip +10 -0
- data/test/files/test_alt_data_load.pip +10 -0
- data/test/files/test_sdn_data_load.pip +9 -0
- data/test/files/valid_flattened_file.csv +19 -0
- data/test/mocks/test/ofac_sdn_loader.rb +20 -0
- data/test/ofac_sdn_loader_test.rb +40 -0
- data/test/ofac_test.rb +76 -0
- data/test/test_helper.rb +48 -0
- metadata +90 -0
data/History.txt
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright (c) 2009 Kevin Tyll
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
4
|
+
a copy of this software and associated documentation files (the
|
5
|
+
"Software"), to deal in the Software without restriction, including
|
6
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
7
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8
|
+
permit persons to whom the Software is furnished to do so, subject to
|
9
|
+
the following conditions:
|
10
|
+
|
11
|
+
The above copyright notice and this permission notice shall be
|
12
|
+
included in all copies or substantial portions of the Software.
|
13
|
+
|
14
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/PostInstall.txt
ADDED
@@ -0,0 +1,11 @@
|
|
1
|
+
For more information on ofac, see http://kevintyll.github.com/ofac/
|
2
|
+
|
3
|
+
* To create the necessary db migration, from the command line, run:
|
4
|
+
script/generate ofac_migration
|
5
|
+
* Require the gem in your environment.rb file in the Rails::Initializer block:
|
6
|
+
config.gem 'kevintyll-ofac', :lib => 'ofac'
|
7
|
+
* To load your table with the current OFAC data, from the command line, run:
|
8
|
+
rake ofac:update_data
|
9
|
+
|
10
|
+
* The OFAC data is not updated with any regularity, but you can sign up for email notifications when the data changes at
|
11
|
+
http://www.treas.gov/offices/enforcement/ofac/sdn/index.shtml.
|
data/README.rdoc
ADDED
@@ -0,0 +1,109 @@
|
|
1
|
+
= ofac
|
2
|
+
|
3
|
+
* http://kevintyll.github.com/ofac
|
4
|
+
* http://www.drexel-labs.com
|
5
|
+
|
6
|
+
* http://www.treas.gov/offices/enforcement/ofac/sdn/index.shtml
|
7
|
+
|
8
|
+
== DESCRIPTION:
|
9
|
+
|
10
|
+
ofac is a ruby gem that tries to find a match of a person's name and address against the
|
11
|
+
Office of Foreign Assets Control's Specially Designated Nationals list...the so-called
|
12
|
+
terrorist watch list.
|
13
|
+
|
14
|
+
This gem, like the ssn_validator gem, started as a need for the company I work for, Clarity Services Inc.
|
15
|
+
We decided once again to create a gem out of it and share it with the community. Much
|
16
|
+
thanks goes to the management at Clarity Services Inc. for allowing this code to be open sourced. Thanks
|
17
|
+
also to Larry Berland at Clarity Services Inc. The matching logic in the ofac_match.rb file was derived from
|
18
|
+
his work.
|
19
|
+
|
20
|
+
== FEATURES:
|
21
|
+
|
22
|
+
Creates a score, 0 - 100, based on how well the name, address and city match the data on the SDN list. Since
|
23
|
+
we have to match on strings, the likelihood of an exact match is virtually nil. So we've created an
|
24
|
+
algorithm that creates a score. The better the match, the higher the score. A score of 100 would be
|
25
|
+
a perfect match.
|
26
|
+
|
27
|
+
The score is calculated by adding up the weightings of each part that is matched. So
|
28
|
+
if only name is matched, then the max score is the weight for <tt>:name</tt>, which is 60.
|
29
|
+
|
30
|
+
It's possible to get partial matches, which will add partial weight to the score. If there
|
31
|
+
is not a match on the element as it is passed in, then each word element gets broken down
|
32
|
+
and matches are tried on each partial element. The weighting is distributed equally for
|
33
|
+
each partial that is matched.
|
34
|
+
|
35
|
+
If exact matches are not made, then a sounds like match is attempted. Any match made by sounds like
|
36
|
+
contributes 75% of its weight to the score.
|
37
|
+
Example:
|
38
|
+
|
39
|
+
If you are trying to match the name Kevin Tyll and there is a record for Smith, Kevin in the database, then
|
40
|
+
we will try to match both Kevin and Tyll separately, with each element Smith and Kevin. Since only Kevin
|
41
|
+
will find a match, and there were 2 elements in the searched name, the score will be added by half the weighting
|
42
|
+
for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 30 to the score.
|
43
|
+
|
44
|
+
If you are trying to match the name Kevin Gregory Tyll and there is a record for Tyll, Kevin in the database, then
|
45
|
+
we will try to match Kevin and Gregory and Tyll separately, with each element Tyll and Kevin. Since both Kevin
|
46
|
+
and Tyll will find a match, and there were 3 elements in the searched name, the score will be added by 2/3 the weighting
|
47
|
+
for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 40 to the score.
|
48
|
+
|
49
|
+
If you are trying to match the name Kevin Tyll and there is a record for Kevin Gregory Tyll in the database, then
|
50
|
+
we will try to match Kevin and Tyll separately, with each element Tyll and Kevin and Gregory. Since both Kevin
|
51
|
+
and Tyll will find a match, and there were 2 elements in the searched name, the score will be added by 2/2 the weighting
|
52
|
+
for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 60 to the score.
|
53
|
+
|
54
|
+
If you are trying to match the name Kevin Tyll, and there is a record for Teel, Kevin in the database, then an exact match
|
55
|
+
will be found for Kevin, and a sounds like match will be made for Tyll. Since there were 2 elements in the searched name,
|
56
|
+
and the weight for <tt>:name</tt> is 60, then each element is worth 30. Since Kevin was an exact match, it will add 30, and
|
57
|
+
since Tyll was a sounds like match, it will add 30 * .75. So the <tt>:name</tt> portion of the search will be worth 53.
|
58
|
+
|
59
|
+
Matches for name are made for both the name and any aliases in the OFAC database.
|
60
|
+
|
61
|
+
Matches for <tt>:city</tt> and <tt>:address</tt> will only be added to the score if there is first a match on <tt>:name</tt>.
|
62
|
+
|
63
|
+
== SYNOPSIS:
|
64
|
+
Accepts a hash with the identity's demographic information
|
65
|
+
|
66
|
+
Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'})
|
67
|
+
|
68
|
+
<tt>:name</tt> is required to get a score. If <tt>:name</tt> is missing, an error will not be thrown, but a score of 0 will be returned.
|
69
|
+
|
70
|
+
The more information provided, the higher the score could be. A score of 100 would mean all fields
|
71
|
+
were passed in, and all fields were 100% matches. If only the name is passed in without an address,
|
72
|
+
it will be impossible to get a score of 100, even if the name matches perfectly.
|
73
|
+
|
74
|
+
Acceptable hash keys and their weighting in score calculation:
|
75
|
+
|
76
|
+
* <tt>:name</tt> (weighting = 60%) (required) This can be a person, business, or marine vessel
|
77
|
+
* <tt>:address</tt> (weighting = 10%)
|
78
|
+
* <tt>:city</tt> (weighting = 30%)
|
79
|
+
|
80
|
+
* Instantiate the object with the identity's name, street address, and city.
|
81
|
+
ofac = Ofac.new(:name => 'Kevin Tyll', :city => 'Clearwater', :address => '123 Somewhere Ln.')
|
82
|
+
|
83
|
+
* Then get the score
|
84
|
+
ofac.score => returns the score, 0 - 100
|
85
|
+
|
86
|
+
* You can also get the list of all the partial matches with the score of each record.
|
87
|
+
ofac.possible_hits => returns an array of hashes.
|
88
|
+
|
89
|
+
== REQUIREMENTS:
|
90
|
+
|
91
|
+
* Rails 2.0.0 or greater
|
92
|
+
|
93
|
+
== INSTALL:
|
94
|
+
|
95
|
+
* To install the gem:
|
96
|
+
sudo gem install kevintyll-ofac
|
97
|
+
* To create the necessary db migration, from the command line, run:
|
98
|
+
script/generate ofac_migration
|
99
|
+
* Require the gem in your environment.rb file in the Rails::Initializer block:
|
100
|
+
config.gem 'kevintyll-ofac', :lib => 'ofac'
|
101
|
+
* To load your table with the current OFAC data, from the command line, run:
|
102
|
+
rake ofac:update_data
|
103
|
+
|
104
|
+
* The OFAC data is not updated with any regularity, but you can sign up for email notifications when the data changes at
|
105
|
+
http://www.treas.gov/offices/enforcement/ofac/sdn/index.shtml.
|
106
|
+
|
107
|
+
== Copyright
|
108
|
+
|
109
|
+
Copyright (c) 2009 Kevin Tyll. See LICENSE for details.
|
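The scoring rules described in the README above can be sketched in a few lines of Ruby. This is a simplified illustration under stated assumptions, not the gem's `OfacMatch` code: the helper names `tokens` and `name_score` are hypothetical, the sounds-like comparison is passed in as a lambda rather than implemented, and only the `:name` portion (weight 60) is modeled.

```ruby
NAME_WEIGHT = 60            # weight for :name, as stated in the README
SOUNDS_LIKE_FACTOR = 0.75   # a sounds-like match earns 75% of its share

# Hypothetical helper: upcase and split a name into word elements.
def tokens(str)
  str.to_s.upcase.split(/\W+/).reject(&:empty?)
end

# Hypothetical helper: score the :name portion of a search.
# Each searched element is worth an equal share of NAME_WEIGHT.
# sounds_like is an assumed callable; the default never matches.
def name_score(searched, record, sounds_like = ->(_a, _b) { false })
  searched_tokens = tokens(searched)
  return 0 if searched_tokens.empty?

  record_tokens = tokens(record)
  per_token = NAME_WEIGHT.to_f / searched_tokens.length

  points = searched_tokens.map do |token|
    if record_tokens.include?(token)
      per_token                                   # exact match: full share
    elsif record_tokens.any? { |r| sounds_like.call(token, r) }
      per_token * SOUNDS_LIKE_FACTOR              # sounds-like: 75% of share
    else
      0.0                                         # no match: nothing added
    end
  end

  points.sum.round
end

# Stand-in for a real phonetic comparison, for this demo only.
tyll_teel = ->(a, b) { [a, b].sort == %w[TEEL TYLL] }

name_score('Kevin Tyll', 'Smith, Kevin')            # => 30
name_score('Kevin Gregory Tyll', 'Tyll, Kevin')     # => 40
name_score('Kevin Tyll', 'Kevin Gregory Tyll')      # => 60
name_score('Kevin Tyll', 'Teel, Kevin', tyll_teel)  # => 53
```

The four calls reproduce the four worked examples in the README. The last one is 30 + 30 * 0.75 = 52.5, which rounds to the 53 quoted there.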
data/Rakefile
ADDED
@@ -0,0 +1,57 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'rake'
|
3
|
+
|
4
|
+
begin
|
5
|
+
require 'jeweler'
|
6
|
+
Jeweler::Tasks.new do |gem|
|
7
|
+
gem.name = "ofac"
|
8
|
+
gem.summary = %Q{Attempts to find a hit on the Office of Foreign Assets Control's Specially Designated Nationals list.}
|
9
|
+
gem.description = %Q{Attempts to find a hit on the Office of Foreign Assets Control's Specially Designated Nationals list.}
|
10
|
+
gem.email = "kevintyll@gmail.com"
|
11
|
+
gem.homepage = "http://github.com/kevintyll/ofac"
|
12
|
+
gem.authors = ["Kevin Tyll"]
|
13
|
+
gem.post_install_message = File.readlines("PostInstall.txt").join("")
|
14
|
+
# gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
|
15
|
+
end
|
16
|
+
rescue LoadError
|
17
|
+
puts "Jeweler not available. Install it with: sudo gem install technicalpickles-jeweler -s http://gems.github.com"
|
18
|
+
end
|
19
|
+
|
20
|
+
require 'rake/testtask'
|
21
|
+
Rake::TestTask.new(:test) do |test|
|
22
|
+
test.libs << 'lib' << 'test'
|
23
|
+
test.pattern = 'test/**/*_test.rb'
|
24
|
+
test.verbose = true
|
25
|
+
end
|
26
|
+
|
27
|
+
begin
|
28
|
+
require 'rcov/rcovtask'
|
29
|
+
Rcov::RcovTask.new do |test|
|
30
|
+
test.libs << 'test'
|
31
|
+
test.pattern = 'test/**/*_test.rb'
|
32
|
+
test.verbose = true
|
33
|
+
end
|
34
|
+
rescue LoadError
|
35
|
+
task :rcov do
|
36
|
+
abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
|
37
|
+
end
|
38
|
+
end
|
39
|
+
|
40
|
+
|
41
|
+
task :default => :test
|
42
|
+
|
43
|
+
require 'rake/rdoctask'
|
44
|
+
Rake::RDocTask.new do |rdoc|
|
45
|
+
if File.exist?('VERSION.yml')
|
46
|
+
config = YAML.load(File.read('VERSION.yml'))
|
47
|
+
version = "#{config[:major]}.#{config[:minor]}.#{config[:patch]}"
|
48
|
+
else
|
49
|
+
version = ""
|
50
|
+
end
|
51
|
+
|
52
|
+
rdoc.rdoc_dir = 'rdoc'
|
53
|
+
rdoc.title = "ofac #{version}"
|
54
|
+
rdoc.rdoc_files.include('README*')
|
55
|
+
rdoc.rdoc_files.include('lib/**/*.rb')
|
56
|
+
end
|
57
|
+
|
data/VERSION.yml
ADDED
data/generators/ofac_migration/templates/migration.rb
ADDED
@@ -0,0 +1,30 @@
|
|
1
|
+
class CreateOfacSdnTable < ActiveRecord::Migration
|
2
|
+
|
3
|
+
def self.up
|
4
|
+
create_table :ofac_sdns do |t|
|
5
|
+
t.text :name
|
6
|
+
t.string :sdn_type
|
7
|
+
t.string :program
|
8
|
+
t.string :title
|
9
|
+
t.string :vessel_call_sign
|
10
|
+
t.string :vessel_type
|
11
|
+
t.string :vessel_tonnage
|
12
|
+
t.string :gross_registered_tonnage
|
13
|
+
t.string :vessel_flag
|
14
|
+
t.string :vessel_owner
|
15
|
+
t.text :remarks
|
16
|
+
t.text :address
|
17
|
+
t.string :city
|
18
|
+
t.string :country
|
19
|
+
t.string :address_remarks
|
20
|
+
t.string :alternate_identity_type
|
21
|
+
t.text :alternate_identity_name
|
22
|
+
t.string :alternate_identity_remarks
|
23
|
+
t.timestamps
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
def self.down
|
28
|
+
drop_table :ofac_sdns
|
29
|
+
end
|
30
|
+
end
|
data/lib/ofac.rb
ADDED
data/lib/ofac/models/ofac.rb
ADDED
@@ -0,0 +1,119 @@
|
|
1
|
+
require 'activerecord'
|
2
|
+
require 'active_record/connection_adapters/mysql_adapter'
|
3
|
+
|
4
|
+
class Ofac
|
5
|
+
|
6
|
+
|
7
|
+
# Accepts a hash with the identity's demographic information
|
8
|
+
#
|
9
|
+
# Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'})
|
10
|
+
#
|
11
|
+
# <tt>:name</tt> is required to get a score. If <tt>:name</tt> is missing, an error will not be thrown, but a score of 0 will be returned.
|
12
|
+
#
|
13
|
+
# The more information provided, the higher the score could be. A score of 100 would mean all fields
|
14
|
+
# were passed in, and all fields were 100% matches. If only the name is passed in without an address,
|
15
|
+
# it will be impossible to get a score of 100, even if the name matches perfectly.
|
16
|
+
#
|
17
|
+
# Acceptable hash keys and their weighting in score calculation:
|
18
|
+
#
|
19
|
+
# * <tt>:name</tt> (weighting = 60%) (required) This can be a person, business, or marine vessel
|
20
|
+
# * <tt>:address</tt> (weighting = 10%)
|
21
|
+
# * <tt>:city</tt> (weighting = 30%)
|
22
|
+
def initialize(identity)
|
23
|
+
@identity = identity
|
24
|
+
end
|
25
|
+
|
26
|
+
# Creates a score, 1 - 100, based on how well the name and address match the data on the
|
27
|
+
# SDN (Specially Designated Nationals) list.
|
28
|
+
#
|
29
|
+
# The score is calculated by adding up the weightings of each part that is matched. So
|
30
|
+
# if only name is matched, then the max score is the weight for <tt>:name</tt> which is 60
|
31
|
+
#
|
32
|
+
# It's possible to get partial matches, which will add partial weight to the score. If there
|
33
|
+
# is not a match on the element as it is passed in, then each word element gets broken down
|
34
|
+
# and matches are tried on each partial element. The weighting is distributed equally for
|
35
|
+
# each partial that is matched.
|
36
|
+
#
|
37
|
+
# If exact matches are not made, then a sounds like match is attempted. Any match made by sounds like
|
38
|
+
# contributes 75% of its weight to the score.
|
39
|
+
#
|
40
|
+
# Example:
|
41
|
+
#
|
42
|
+
# If you are trying to match the name Kevin Tyll and there is a record for Smith, Kevin in the database, then
|
43
|
+
# we will try to match both Kevin and Tyll separately, with each element Smith and Kevin. Since only Kevin
|
44
|
+
# will find a match, and there were 2 elements in the searched name, the score will be added by half the weighting
|
45
|
+
# for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 30 to the score.
|
46
|
+
#
|
47
|
+
# If you are trying to match the name Kevin Gregory Tyll and there is a record for Tyll, Kevin in the database, then
|
48
|
+
# we will try to match Kevin and Gregory and Tyll separately, with each element Tyll and Kevin. Since both Kevin
|
49
|
+
# and Tyll will find a match, and there were 3 elements in the searched name, the score will be added by 2/3 the weighting
|
50
|
+
# for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 40 to the score.
|
51
|
+
#
|
52
|
+
# If you are trying to match the name Kevin Tyll and there is a record for Kevin Gregory Tyll in the database, then
|
53
|
+
# we will try to match Kevin and Tyll separately, with each element Tyll and Kevin and Gregory. Since both Kevin
|
54
|
+
# and Tyll will find a match, and there were 2 elements in the searched name, the score will be added by 2/2 the weighting
|
55
|
+
# for <tt>:name</tt>. So since the weight for <tt>:name</tt> is 60, then we will add 60 to the score.
|
56
|
+
#
|
57
|
+
# If you are trying to match the name Kevin Tyll, and there is a record for Teel, Kevin in the database, then an exact match
|
58
|
+
# will be found for Kevin, and a sounds like match will be made for Tyll. Since there were 2 elements in the searched name,
|
59
|
+
# and the weight for <tt>:name</tt> is 60, then each element is worth 30. Since Kevin was an exact match, it will add 30, and
|
60
|
+
# since Tyll was a sounds like match, it will add 30 * .75. So the <tt>:name</tt> portion of the search will be worth 53.
|
61
|
+
#
|
62
|
+
# Matches for name are made for both the name and any aliases in the OFAC database.
|
63
|
+
#
|
64
|
+
# Matches for <tt>:city</tt> and <tt>:address</tt> will only be added to the score if there is first a match on <tt>:name</tt>.
|
65
|
+
def score
|
66
|
+
@score || calculate_score
|
67
|
+
end
|
68
|
+
|
69
|
+
# Returns an array of hashes of records in the OFAC data that found partial matches with that record's score.
|
70
|
+
#
|
71
|
+
# Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'}).possible_hits
|
72
|
+
#returns
|
73
|
+
# [{:address=>"123 Somewhere Ln", :score=>100, :name=>"HERNANDEZ, Oscar|GUAMATUR, S.A.", :city=>"Clearwater"}, {:address=>"123 Somewhere Ln", :score=>100, :name=>"HERNANDEZ, Oscar|Alternate Name", :city=>"Clearwater"}]
|
74
|
+
#
|
75
|
+
def possible_hits
|
76
|
+
@possible_hits || retrieve_possible_hits
|
77
|
+
end
|
78
|
+
|
79
|
+
private
|
80
|
+
|
81
|
+
def retrieve_possible_hits
|
82
|
+
score
|
83
|
+
@possible_hits
|
84
|
+
end
|
85
|
+
|
86
|
+
def calculate_score
|
87
|
+
unless @identity[:name].to_s == ''
|
88
|
+
if OfacSdn.connection.kind_of?(ActiveRecord::ConnectionAdapters::MysqlAdapter)
|
89
|
+
#first get a list from the database of possible matches by name
|
90
|
+
#this query is pretty liberal, we just want to get a list of possible
|
91
|
+
#matches from the database that we can run through our ruby matching algorithm
|
92
|
+
partial_name = @identity[:name].gsub(/\W/,'|')
|
93
|
+
name_array = partial_name.split('|')
|
94
|
+
name_array.delete('')
|
95
|
+
sql_name_partial = name_array.collect {|partial_name| "INSTR(SUBSTR(SOUNDEX(concat('O',name)), 2), REPLACE(SUBSTR(SOUNDEX('O#{partial_name}'), 2), '0', '')) > 0"}.join(' and ')
|
96
|
+
sql_alt_name_partial = name_array.collect {|partial_name| "INSTR(SUBSTR(SOUNDEX(concat('O',alternate_identity_name)), 2), REPLACE(SUBSTR(SOUNDEX('O#{partial_name}'), 2), '0', '')) > 0"}.join(' and ')
|
97
|
+
##this sql for getting "accurate sounds like" functionality comes from:
|
98
|
+
#http://jgeewax.wordpress.com/2006/07/21/efficient-sounds-like-searches-in-mysql/
|
99
|
+
possible_sdns = OfacSdn.connection.select_all("select concat(name,'|', alternate_identity_name) name, address, city
|
100
|
+
from ofac_sdns
|
101
|
+
where name is not null
|
102
|
+
and (((#{sql_name_partial}))
|
103
|
+
or ((#{sql_alt_name_partial})))")
|
104
|
+
else
|
105
|
+
possible_sdns = OfacSdn.find(:all, :select => 'name, alternate_identity_name, address, city').collect{|sdn| {:name => "#{sdn.name}|#{sdn.alternate_identity_name}", :address => sdn.address, :city => sdn.city}}
|
106
|
+
end
|
107
|
+
|
108
|
+
match = OfacMatch.new({:name => {:weight => 60, :token => "#{@identity[:name]}"},
|
109
|
+
:address => {:weight => 10, :token => @identity[:address]},
|
110
|
+
:city => {:weight => 30, :token => @identity[:city]}})
|
111
|
+
|
112
|
+
score = match.score(possible_sdns)
|
113
|
+
@possible_hits = match.possible_hits
|
114
|
+
end
|
115
|
+
@score = score || 0
|
116
|
+
return @score
|
117
|
+
end
|
118
|
+
|
119
|
+
end
|
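The MySQL pre-filter in `calculate_score` builds one SOUNDEX condition per word of the searched name and joins them with `and`. The sketch below isolates that string-building step so it can be run without a database. The helper names `name_partials` and `soundex_clause` are hypothetical, and the SQL fragment is copied from the query above rather than taken from any other source. Note that in Ruby `gsub!` returns `nil` when nothing is replaced, so the sketch uses a non-mutating split instead.

```ruby
# Hypothetical helper: break a name into its word elements.
# Splitting on non-word characters also strips quotes, so the
# partials are safe to interpolate into the SQL fragment below.
def name_partials(name)
  name.to_s.split(/\W+/).reject(&:empty?)
end

# Hypothetical helper: build the "sounds like" condition for one column.
# Every partial must be found phonetically somewhere in the column.
def soundex_clause(column, name)
  name_partials(name).map do |partial|
    "INSTR(SUBSTR(SOUNDEX(concat('O',#{column})), 2), " \
      "REPLACE(SUBSTR(SOUNDEX('O#{partial}'), 2), '0', '')) > 0"
  end.join(' and ')
end

puts soundex_clause('name', 'Kevin Tyll')
puts soundex_clause('alternate_identity_name', 'Kevin Tyll')
```

A single-word name such as `'Madonna'` yields one condition with no `and`, and an empty name yields an empty string, so a caller would want to skip the query in that case.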
data/lib/ofac/models/ofac_sdn_loader.rb
ADDED
@@ -0,0 +1,305 @@
|
|
1
|
+
require 'net/http'
|
2
|
+
require 'activerecord'
|
3
|
+
require 'active_record/connection_adapters/mysql_adapter'
|
4
|
+
|
5
|
+
class OfacSdnLoader
|
6
|
+
|
7
|
+
|
8
|
+
#Loads the most recent file from http://www.treas.gov/offices/enforcement/ofac/sdn/delimit/index.shtml
|
9
|
+
def self.load_current_sdn_file
|
10
|
+
puts "Reloading OFAC sdn data"
|
11
|
+
puts "Downloading OFAC data from http://www.treas.gov/offices/enforcement/ofac/sdn"
|
12
|
+
#get the 3 data files
|
13
|
+
sdn = Tempfile.new('sdn')
|
14
|
+
sdn.write(Net::HTTP.get(URI.parse('http://www.treas.gov/offices/enforcement/ofac/sdn/delimit/sdn.pip')))
|
15
|
+
sdn.rewind
|
16
|
+
address = Tempfile.new('sdn')
|
17
|
+
address.write(Net::HTTP.get(URI.parse('http://www.treas.gov/offices/enforcement/ofac/sdn/delimit/add.pip')))
|
18
|
+
address.rewind
|
19
|
+
alt = Tempfile.new('sdn')
|
20
|
+
alt.write(Net::HTTP.get(URI.parse('http://www.treas.gov/offices/enforcement/ofac/sdn/delimit/alt.pip')))
|
21
|
+
alt.rewind
|
22
|
+
|
23
|
+
if OfacSdn.connection.kind_of?(ActiveRecord::ConnectionAdapters::MysqlAdapter)
|
24
|
+
puts "Converting file to csv format for Mysql import. This could take several minutes."
|
25
|
+
|
26
|
+
csv_file = convert_to_flattened_csv(sdn, address, alt)
|
27
|
+
|
28
|
+
bulk_mysql_update(csv_file)
|
29
|
+
else
|
30
|
+
active_record_file_load(sdn, address, alt)
|
31
|
+
end
|
32
|
+
|
33
|
+
sdn.close
|
34
|
+
@address.close
|
35
|
+
@alt.close
|
36
|
+
end
|
37
|
+
|
38
|
+
|
39
|
+
private
|
40
|
+
|
41
|
+
#convert the file's null value to an empty string
|
42
|
+
#and removes " chars.
|
43
|
+
def self.clean_file_string(line)
|
44
|
+
line.gsub!(/-0-(\s)?/,'')
|
45
|
+
line.gsub!(/\n/,'')
|
46
|
+
line.gsub(/\"/,'')
|
47
|
+
end
|
48
|
+
|
49
|
+
#split the line into an array
|
50
|
+
def self.convert_line_to_array(line)
|
51
|
+
clean_file_string(line).split('|') unless line.nil?
|
52
|
+
end
|
53
|
+
|
54
|
+
#returns 2 arrays of the records matching the sdn primary key:
|
55
|
+
#one array of address records and one array of alt records
|
56
|
+
def self.foreign_key_records(sdn_id)
|
57
|
+
address_records = []
|
58
|
+
alt_records = []
|
59
|
+
|
60
|
+
#the first element in each array is the primary and foreign keys
|
61
|
+
#we are denormalizing the data
|
62
|
+
if @current_address_hash && @current_address_hash[:id] == sdn_id
|
63
|
+
address_records << @current_address_hash
|
64
|
+
loop do
|
65
|
+
@current_address_hash = address_text_to_hash(@address.gets)
|
66
|
+
if @current_address_hash && @current_address_hash[:id] == sdn_id
|
67
|
+
address_records << @current_address_hash
|
68
|
+
else
|
69
|
+
break
|
70
|
+
end
|
71
|
+
end
|
72
|
+
end
|
73
|
+
|
74
|
+
if @current_alt_hash && @current_alt_hash[:id] == sdn_id
|
75
|
+
alt_records << @current_alt_hash
|
76
|
+
loop do
|
77
|
+
@current_alt_hash = alt_text_to_hash(@alt.gets)
|
78
|
+
if @current_alt_hash && @current_alt_hash[:id] == sdn_id
|
79
|
+
alt_records << @current_alt_hash
|
80
|
+
else
|
81
|
+
break
|
82
|
+
end
|
83
|
+
end
|
84
|
+
end
|
85
|
+
return address_records, alt_records
|
86
|
+
end
|
87
|
+
|
88
|
+
def self.sdn_text_to_hash(line)
|
89
|
+
unless line.nil?
|
90
|
+
value_array = convert_line_to_array(line)
|
91
|
+
{:id => value_array[0],
|
92
|
+
:name => value_array[1],
|
93
|
+
:sdn_type => value_array[2],
|
94
|
+
:program => value_array[3],
|
95
|
+
:title => value_array[4],
|
96
|
+
:vessel_call_sign => value_array[5],
|
97
|
+
:vessel_type => value_array[6],
|
98
|
+
:vessel_tonnage => value_array[7],
|
99
|
+
:gross_registered_tonnage => value_array[8],
|
100
|
+
:vessel_flag => value_array[9],
|
101
|
+
:vessel_owner => value_array[10],
|
102
|
+
:remarks => value_array[11]
|
103
|
+
}
|
104
|
+
end
|
105
|
+
end
|
106
|
+
|
107
|
+
def self.address_text_to_hash(line)
|
108
|
+
unless line.nil?
|
109
|
+
value_array = convert_line_to_array(line)
|
110
|
+
{:id => value_array[0],
|
111
|
+
:address => value_array[2],
|
112
|
+
:city => value_array[3],
|
113
|
+
:country => value_array[4],
|
114
|
+
:address_remarks => value_array[5]
|
115
|
+
}
|
116
|
+
end
|
117
|
+
end
|
118
|
+
|
119
|
+
def self.alt_text_to_hash(line)
|
120
|
+
unless line.nil?
|
121
|
+
value_array = convert_line_to_array(line)
|
122
|
+
{:id => value_array[0],
|
123
|
+
:alternate_identity_type => value_array[2],
|
124
|
+
:alternate_identity_name => value_array[3],
|
125
|
+
:alternate_identity_remarks => value_array[4]
|
126
|
+
}
|
127
|
+
end
|
128
|
+
end
|
129
|
+
|
130
|
+
def self.convert_hash_to_mysql_import_string(record_hash)
|
131
|
+
# empty field for id to be generated by mysql.
|
132
|
+
new_line = "``|" +
|
133
|
+
# :name
|
134
|
+
"`#{record_hash[:name]}`|" +
|
135
|
+
# :sdn_type
|
136
|
+
"`#{record_hash[:sdn_type]}`|" +
|
137
|
+
# :program
|
138
|
+
"`#{record_hash[:program]}`|" +
|
139
|
+
# :title
|
140
|
+
"`#{record_hash[:title]}`|" +
|
141
|
+
# :vessel_call_sign
|
142
|
+
"`#{record_hash[:vessel_call_sign]}`|" +
|
143
|
+
# :vessel_type
|
144
|
+
"`#{record_hash[:vessel_type]}`|" +
|
145
|
+
# :vessel_tonnage
|
146
|
+
"`#{record_hash[:vessel_tonnage]}`|" +
|
147
|
+
# :gross_registered_tonnage
|
148
|
+
"`#{record_hash[:gross_registered_tonnage]}`|" +
|
149
|
+
# :vessel_flag
|
150
|
+
"`#{record_hash[:vessel_flag]}`|" +
|
151
|
+
# :vessel_owner
|
152
|
+
"`#{record_hash[:vessel_owner]}`|" +
|
153
|
+
# :remarks
|
154
|
+
"`#{record_hash[:remarks]}`|" +
|
155
|
+
# :address
|
156
|
+
"`#{record_hash[:address]}`|" +
|
157
|
+
# :city
|
158
|
+
"`#{record_hash[:city]}`|" +
|
159
|
+
# :country
|
160
|
+
"`#{record_hash[:country]}`|" +
|
161
|
+
# :address_remarks
|
162
|
+
"`#{record_hash[:address_remarks]}`|" +
|
163
|
+
# :alternate_identity_type
|
164
|
+
"`#{record_hash[:alternate_identity_type]}`|" +
|
165
|
+
# :alternate_identity_name
|
166
|
+
"`#{record_hash[:alternate_identity_name]}`|" +
|
167
|
+
# :alternate_identity_remarks
|
168
|
+
"`#{record_hash[:alternate_identity_remarks]}`|" +
|
169
|
+
#:created_at
|
170
|
+
"`#{Time.now.to_s(:db)}`|" +
|
171
|
+
# updated_at
|
172
|
+
"`#{Time.now.to_s(:db)}`" + "\n"
|
173
|
+
|
174
|
+
new_line
|
175
|
+
end
|
176
|
+
|
177
|
+
def self.convert_to_flattened_csv(sdn_file, address_file, alt_file)
|
178
|
+
@address = address_file
|
179
|
+
@alt = alt_file
|
180
|
+
|
181
|
+
csv_file = Tempfile.new("ofac") # create temp file for converted csv format.
|
182
|
+
#get the first line from the address and alt files
|
183
|
+
@current_address_hash = address_text_to_hash(@address.gets)
|
184
|
+
@current_alt_hash = alt_text_to_hash(@alt.gets)
|
185
|
+
|
186
|
+
start = Time.now
|
187
|
+
|
188
|
+
sdn_file.each_with_index do |line, i|
|
189
|
+
|
190
|
+
#initialize the address and alt attributes to empty strings
|
191
|
+
address_attributes = address_text_to_hash("|||||")
|
192
|
+
alt_attributes = alt_text_to_hash("||||")
|
193
|
+
|
194
|
+
sdn_attributes = sdn_text_to_hash(line)
|
195
|
+
|
196
|
+
#get the foreign key records for this sdn
|
197
|
+
address_records, alt_records = foreign_key_records(sdn_attributes[:id])
|
198
|
+
|
199
|
+
if address_records.empty?
|
200
|
+
#no matching address records, so initialized blank values will be used.
|
201
|
+
if alt_records.empty?
|
202
|
+
#no matching alt records, so initialized blank values will be used.
|
203
|
+
csv_file.syswrite(convert_hash_to_mysql_import_string(sdn_attributes.merge(address_attributes).merge(alt_attributes)))
|
204
|
+
else
|
205
|
+
alt_records.each do |alt|
|
206
|
+
csv_file.syswrite(convert_hash_to_mysql_import_string(sdn_attributes.merge(address_attributes).merge(alt)))
|
207
|
+
end
|
208
|
+
end
|
209
|
+
else
|
210
|
+
address_records.each do |address|
|
211
|
+
if alt_records.empty?
|
212
|
+
#no matching alt records, so initialized blank values will be used.
|
213
|
+
csv_file.syswrite(convert_hash_to_mysql_import_string(sdn_attributes.merge(address).merge(alt_attributes)))
|
214
|
+
else
|
215
|
+
alt_records.each do |alt|
|
216
|
+
csv_file.syswrite(convert_hash_to_mysql_import_string(sdn_attributes.merge(address).merge(alt)))
|
217
|
+
end
|
218
|
+
end
|
219
|
+
end
|
220
|
+
end
|
221
|
+
puts "#{i} records processed." if (i % 1000 == 0) && (i > 0)
|
222
|
+
end
|
223
|
+
puts "File conversion ran for #{(Time.now - start) / 60} minutes."
|
224
|
+
return csv_file
|
225
|
+
end
|
226
|
+
|
227
|
+
def self.active_record_file_load(sdn_file, address_file, alt_file)
|
228
|
+
@address = address_file
|
229
|
+
@alt = alt_file
|
230
|
+
|
231
|
+
#OFAC data is a complete list, so we have to dump and load
|
232
|
+
OfacSdn.delete_all
|
233
|
+
|
234
|
+
#get the first line from the address and alt files
|
235
|
+
@current_address_hash = address_text_to_hash(@address.gets)
|
236
|
+
@current_alt_hash = alt_text_to_hash(@alt.gets)
|
237
|
+
attributes = {}
|
238
|
+
sdn_file.each_with_index do |line, i|
|
239
|
+
|
240
|
+
#initialize the address and alt attributes to empty strings
|
241
|
+
address_attributes = address_text_to_hash("|||||")
|
242
|
+
alt_attributes = alt_text_to_hash("||||")
|
243
|
+
|
244
|
+
sdn_attributes = sdn_text_to_hash(line)
|
245
|
+
|
246
|
+
#get the foreign key records for this sdn
|
247
|
+
address_records, alt_records = foreign_key_records(sdn_attributes[:id])
|
248
|
+
|
249
|
+
if address_records.empty?
|
250
|
+
#no matching address records, so initialized blank values will be used.
|
251
|
+
if alt_records.empty?
|
252
|
+
#no matching alt records, so initialized blank values will be used.
|
253
|
+
attributes = sdn_attributes.merge(address_attributes).merge(alt_attributes)
|
254
|
+
attributes.delete(:id)
|
255
|
+
OfacSdn.create(attributes)
|
256
|
+
else
|
257
|
+
alt_records.each do |alt|
|
258
|
+
attributes = sdn_attributes.merge(address_attributes).merge(alt)
|
259
|
+
attributes.delete(:id)
|
260
|
+
OfacSdn.create(attributes)
|
261
|
+
end
|
262
|
+
end
|
263
|
+
else
|
264
|
+
address_records.each do |address|
|
265
|
+
if alt_records.empty?
|
266
|
+
#no matching alt records, so initialized blank values will be used.
|
267
|
+
attributes = sdn_attributes.merge(address).merge(alt_attributes)
|
268
|
+
attributes.delete(:id)
|
269
|
+
OfacSdn.create(attributes)
|
270
|
+
else
|
271
|
+
alt_records.each do |alt|
|
272
|
+
attributes = sdn_attributes.merge(address).merge(alt)
|
273
|
+
attributes.delete(:id)
|
274
|
+
OfacSdn.create(attributes)
|
275
|
+
end
|
276
|
+
end
|
277
|
+
end
|
278
|
+
end
|
279
|
+
|
280
|
+
puts "#{i} records processed." if (i % 5000 == 0) && (i > 0)
|
281
|
+
end
|
282
|
+
end
|
283
|
+
|
284
|
+
# For mysql, use:
|
285
|
+
# LOAD DATA LOCAL INFILE '#{csv_file.path}' REPLACE INTO TABLE ofac_sdns FIELDS TERMINATED BY '|' ENCLOSED BY "`" LINES TERMINATED BY '\n';
|
286
|
+
# This is a much faster way of loading large amounts of data into mysql. For information on the LOAD DATA command
|
287
|
+
# see http://dev.mysql.com/doc/refman/5.1/en/load-data.html
|
288
|
+
def self.bulk_mysql_update(csv_file)
|
289
|
+
puts "Deleting all records in ofac_sdn..."
|
290
|
+
|
291
|
+
#OFAC data is a complete list, so we have to dump and load
|
292
|
+
OfacSdn.delete_all
|
293
|
+
|
294
|
+
puts "Importing into Mysql..."
|
295
|
+
|
296
|
+
mysql_command = <<-TEXT
|
297
|
+
LOAD DATA LOCAL INFILE '#{csv_file.path}' REPLACE INTO TABLE ofac_sdns FIELDS TERMINATED BY '|' ENCLOSED BY "`" LINES TERMINATED BY '\n';
|
298
|
+
TEXT
|
299
|
+
|
300
|
+
OfacSdn.connection.execute(mysql_command)
|
301
|
+
puts "Mysql import complete."
|
302
|
+
|
303
|
+
end
|
304
|
+
|
305
|
+
end
|
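The loader above does two things that are easy to lose in the nested branches: it cleans and splits each pipe-delimited line, and it denormalizes one SDN record into one row per address/alias combination, falling back to blank values when either list is empty. The following is a compact sketch of both steps, assuming the same field names as the migration. The names `parse_pip_line`, `flatten_sdn`, `BLANK_ADDRESS` and `BLANK_ALT` are hypothetical and do not appear in the gem.

```ruby
# Blank defaults used when an SDN record has no address or no alias.
BLANK_ADDRESS = { :address => '', :city => '', :country => '',
                  :address_remarks => '' }.freeze
BLANK_ALT = { :alternate_identity_type => '', :alternate_identity_name => '',
              :alternate_identity_remarks => '' }.freeze

# Hypothetical helper: drop the "-0-" null marker, newlines and double
# quotes, then split on the pipe delimiter (mirrors clean_file_string
# plus convert_line_to_array, without mutating the input).
def parse_pip_line(line)
  return nil if line.nil?
  line.gsub(/-0-\s?/, '').delete("\n\"").split('|')
end

# Hypothetical helper: one output row per (address, alias) pair,
# in the same order as the nested loops (address outer, alias inner).
def flatten_sdn(sdn, address_records, alt_records)
  addresses = address_records.empty? ? [BLANK_ADDRESS] : address_records
  alts      = alt_records.empty?     ? [BLANK_ALT]     : alt_records

  addresses.product(alts).map do |address, alt|
    row = sdn.merge(address).merge(alt)
    row.delete(:id) # the source file's key is not a column in ofac_sdns
    row
  end
end

sdn = { :id => '10', :name => 'ABC CORP' }
addresses = [{ :id => '10', :address => '1 Main St', :city => 'Havana' },
             { :id => '10', :address => '2 Side St', :city => 'Panama City' }]
alts = [{ :id => '10', :alternate_identity_name => 'ABC' }]

flatten_sdn(sdn, addresses, alts).each { |row| puts row.inspect }
```

Using `Array#product` replaces the four near-identical branches with a single path. The trade-off is that all rows for one SDN record are built in memory before being written, which is small here because a record rarely has more than a handful of addresses and aliases.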
data/lib/ofac/ofac_match.rb
ADDED
@@ -0,0 +1,132 @@
|
|
1
|
+
class OfacMatch
|
2
|
+
|
3
|
+
attr_reader :possible_hits
|
4
|
+
|
5
|
+
#Initialize a Match object with a record hash of fields you want to match on.
|
6
|
+
#Each key in the hash also has a data hash value for the weight, token, and type.
|
7
|
+
#
|
8
|
+
# match = OfacMatch.new({:name => {:weight => 10, :token => 'Kevin Tyll'},
|
9
|
+
# :city => {:weight => 40, :token => 'Clearwater', },
|
10
|
+
# :address => {:weight => 40, :token => '1234 Park St.', },
|
11
|
+
# :zip => {:weight => 10, :token => '33759', :type => :number}})
|
12
|
+
#
|
13
|
+
# data hash keys:
|
14
|
+
# * <tt>data[:weight]</tt> - value to apply to the score if there is a match (Default is 100/number of keys in the record hash)
|
15
|
+
# * <tt>data[:token]</tt> - string to match
|
16
|
+
# * <tt>data[:match]</tt> - set from records hash
|
17
|
+
# * <tt>data[:score]</tt> - output field
|
18
|
+
# * <tt>data[:type]</tt> - the type of match that should be performed (valid values are +:sound+ | +:number+) (Default is +:sound+)
|
19
|
+
def initialize(stats={})
|
20
|
+
@possible_hits = []
|
21
|
+
@stats = stats.dup
|
22
|
+
weight = 100
|
23
|
+
weight = 100 / @stats.length if @stats.length > 0
|
24
|
+
@stats.each_value do |data|
|
25
|
+
data[:weight] ||= weight
|
26
|
+
data[:match] ||= ''
|
27
|
+
data[:type] ||= :sound
|
28
|
+
data[:score] ||= 0
|
29
|
+
data[:token] = data[:token].to_s.upcase
|
30
|
+
end
|
31
|
+
end
|
32
|
+
|
33
|
+
# match_records is an array of hashes.
|
34
|
+
#
|
35
|
+
# The hash keys must match the record hash keys set when initialized.
|
36
|
+
#
|
37
|
+
# score will return the highest score of all the records that
|
38
|
+
# are sent in match_records.
|
39
|
+
def score(match_records)
|
40
|
+
score_results = Array.new
|
41
|
+
unless match_records.empty?
|
42
|
+
#place the match_records information
|
43
|
+
#into our @stats hash
|
44
|
+
match_records.each do |match|
|
45
|
+
match.each do |key, value|
|
46
|
+
@stats[key.to_sym][:match] = value.to_s.upcase
|
47
|
+
end
|
48
|
+
record_score = calculate_record
|
49
|
+
score_results.push(record_score)
|
50
|
+
@possible_hits << match.merge(:score => record_score) if record_score > 0
|
51
|
+
end
|
52
|
+
score = score_results.max #take max score
|
53
|
+
end
|
54
|
+
@possible_hits.uniq!
|
55
|
+
score ||= 0
|
56
|
+
end
|
57
|
+
|
58
|
+
private
|
59
|
+
|
60
|
+
|
61
|
+
# calculate the score for this record
|
62
|
+
# comparing the token to the match fields in the @stats hash
|
63
|
+
# and storing the score into the record
|
64
|
+
def calculate_record
|
65
|
+
score = 0
|
66
|
+
unless @stats.nil?
|
67
|
+
#need to make sure we check the name first, since city and address don't
|
68
|
+
#get added to the score unless there is a name match
|
69
|
+
[:name,:city,:address].each do |field|
|
70
|
+
data = @stats[field]
|
71
|
+
if (data[:token].blank?)
|
72
|
+
value = 0 #token is blank; can't be sure of a match if there is nothing to match against
|
73
|
+
else
|
74
|
+
if (data[:match].blank?)
|
75
|
+
value = 0 #token has a value but match is blank
|
76
|
+
else
|
77
|
+
#token and match both have values
|
78
|
+
if (data[:type] == :number)
|
79
|
+
value = data[:token] == data[:match] ? 1 : 0
|
80
|
+
else
|
81
|
+
#first see if there is an exact match
|
82
|
+
value = data[:token] == data[:match] ? 1 : 0
|
83
|
+
|
84
|
+
unless value > 0
|
85
|
+
#do a sounds like with the data as given to see if we get a match
|
86
|
+
#if match on sounds_like, only give .75 of the weight.
|
87
|
+
value = data[:token].ofac_sounds_like(data[:match],false) ? 0.75 : 0
|
88
|
+
end
|
89
|
+
|
90
|
+
#if no match, then break the data down and see if we can find matches on the
|
91
|
+
#individual words
|
92
|
+
unless value > 0
|
93
|
+
token_data = data[:token].gsub(/\W/,'|')
|
94
|
+
token_array = token_data.split('|')
|
95
|
+
token_array.delete('')
|
96
|
+
|
97
|
+
match_data = data[:match].gsub(/\W/,'|')
|
98
|
+
match_array = match_data.split('|')
|
99
|
+
match_array.delete('')
|
100
|
+
|
101
|
+
value = 0
|
102
|
+
partial_weight = 1/token_array.length.to_f
|
103
|
+
token_array.each do |partial_token|
|
104
|
+
#first see if we get an exact match of the partial
|
105
|
+
if match_array.include?(partial_token)
|
106
|
+
value += partial_weight
|
107
|
+
else
|
108
|
+
#otherwise, see if the partial sounds like any part of the OFAC record
|
109
|
+
match_array.each do |partial_match|
|
110
|
+
if partial_match.ofac_sounds_like(partial_token,false)
|
111
|
+
#give partial value for every part of token that is matched.
|
112
|
+
value += partial_weight * 0.75
|
113
|
+
break
|
114
|
+
end
|
115
|
+
end
|
116
|
+
end
|
117
|
+
end
|
118
|
+
end
|
119
|
+
end
|
120
|
+
end
|
121
|
+
end
|
122
|
+
data[:score] = data[:weight] * value
|
123
|
+
score += data[:score]
|
124
|
+
break if field == :name && data[:score] == 0
|
125
|
+
end
|
126
|
+
|
127
|
+
end
|
128
|
+
score.round
|
129
|
+
end
|
130
|
+
|
131
|
+
end
|
132
|
+
|
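The word-by-word fallback in `calculate_record` can be hard to follow through the nesting. The following is a minimal standalone sketch of that step only, not the gem's code: `partial_field_value` is a hypothetical helper, and the `sounds_like` comparator is passed in as a lambda so the sketch does not depend on the `String` extension.

```ruby
# Hypothetical helper (not part of the gem): scores one field the way the
# word-by-word fallback does. Each token word is worth 1/N of the field;
# an exact word match earns the full share, a sounds-like match earns 75%.
def partial_field_value(token, match, sounds_like)
  token_words = token.upcase.split(/\W+/).reject(&:empty?)
  match_words = match.upcase.split(/\W+/).reject(&:empty?)
  return 0.0 if token_words.empty?

  partial_weight = 1.0 / token_words.length
  token_words.sum(0.0) do |word|
    if match_words.include?(word)
      partial_weight
    elsif match_words.any? { |candidate| sounds_like.call(candidate, word) }
      partial_weight * 0.75
    else
      0.0
    end
  end
end

# Stand-in comparator (an assumption for this sketch only): words that share
# a first letter are treated as sounding alike.
crude_sounds_like = ->(a, b) { a[0] == b[0] }

value = partial_field_value('Louis Lopez', 'LOPEZ MENDEZ, Luis Eduardo', crude_sounds_like)
puts((60 * value).round) # 0.5 exact + 0.375 sounds-like, times a weight of 60
```

Passing the comparator in keeps the sketch testable without monkey-patching `String`; in the gem itself the comparison is done by `ofac_sounds_like`.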
@@ -0,0 +1,22 @@
|
|
1
|
+
class String
|
2
|
+
|
3
|
+
Ofac_SoundexChars = 'BPFVCSKGJQXZDTLMNR'
|
4
|
+
Ofac_SoundexNums = '111122222222334556'
|
5
|
+
Ofac_SoundexCharsEx = '^' + Ofac_SoundexChars
|
6
|
+
Ofac_SoundexCharsDel = '^A-Z'
|
7
|
+
|
8
|
+
# desc: http://en.wikipedia.org/wiki/Soundex
|
9
|
+
def ofac_soundex(census = true)
|
10
|
+
str = upcase.delete(Ofac_SoundexCharsDel).squeeze
|
11
|
+
|
12
|
+
str[0 .. 0] + str[1 .. -1].
|
13
|
+
delete(Ofac_SoundexCharsEx).
|
14
|
+
tr(Ofac_SoundexChars, Ofac_SoundexNums)[0 .. (census ? 2 : -1)].
|
15
|
+
ljust(3, '0') rescue ''
|
16
|
+
end
|
17
|
+
|
18
|
+
def ofac_sounds_like(other, census = true)
|
19
|
+
ofac_soundex(census) == other.ofac_soundex(census)
|
20
|
+
end
|
21
|
+
|
22
|
+
end
|
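For readers who want to try the encoding without reopening `String`, here is a rough standalone sketch of the same steps. `soundex_sketch` is a hypothetical name, and the explicit empty-string guard stands in for the `rescue ''` modifier used above; this is an approximation of the extension, not a replacement for it.

```ruby
SOUNDEX_CHARS = 'BPFVCSKGJQXZDTLMNR'
SOUNDEX_NUMS  = '111122222222334556'

# Hypothetical standalone version of the encoding above.
# census = true keeps at most three digits; census = false keeps them all.
def soundex_sketch(text, census = true)
  # Keep letters only and collapse runs of the same letter.
  letters = text.upcase.delete('^A-Z').squeeze
  # Strings with no letters (for example street numbers) encode to ''.
  return '' if letters.empty?

  # Drop vowels and other uncoded letters after the first, then map to digits.
  digits = letters[1..-1].delete('^' + SOUNDEX_CHARS).tr(SOUNDEX_CHARS, SOUNDEX_NUMS)
  digits = digits[0..2] if census
  letters[0] + digits.ljust(3, '0')
end

puts soundex_sketch('Luis', false) == soundex_sketch('Louis', false)
puts soundex_sketch('32456').inspect
```

Because every digits-only string encodes to the empty string, any two numbers compare as sounding alike, which is the behavior the address tests rely on.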
data/lib/tasks/ofac.rake
ADDED
@@ -0,0 +1,10 @@
|
|
1
|
+
10|7|-0- |-0- |"Panama"|-0-
|
2
|
+
15|12|-0- |-0- |"Panama"|-0-
|
3
|
+
22|14|"123 Somewhere Ln"|"Clearwater"|"United States"|-0-
|
4
|
+
39|27|-0- |"Managua"|"Nicaragua"|-0-
|
5
|
+
39|29|"Bal Harbour Shopping Center, Via Italia"|"Panama City"|"Panama"|-0-
|
6
|
+
41|41|"Avenida de Concha, Espina 8, E-28036"|"Madrid"|"Spain"|-0-
|
7
|
+
41|102|-0- |-0- |-0- |-0-
|
8
|
+
66|111|-0- |"Milan"|"Italy"|-0-
|
9
|
+
66|117|-0- |-0- |"Panama"|-0-
|
10
|
+
66|125|"1840 West 49th Street"|"Hialeah, FL"|"United States"|-0-
|
@@ -0,0 +1,10 @@
|
|
1
|
+
15|14|"aka"|"VIAJES GUAMA TOURS"|-0-
|
2
|
+
22|15|"aka"|"HERNANDEZ, Oscar Grouch"|-0-
|
3
|
+
22|16|"aka"|"Alternate Name"|-0-
|
4
|
+
25|57|"aka"|"AVIA IMPORT"|-0-
|
5
|
+
36|219|"aka"|"BNC"|-0-
|
6
|
+
36|220|"aka"|"NATIONAL BANK OF CUBA"|-0-
|
7
|
+
36|221|"aka"|"BNC"|-0-
|
8
|
+
41|222|"aka"|"NATIONAL BANK OF CUBA"|-0-
|
9
|
+
66|223|"aka"|"BNC"|-0-
|
10
|
+
66|224|"aka"|"NATIONAL BANK OF CUBA"|-0-
|
@@ -0,0 +1,9 @@
|
|
1
|
+
10|"ABASTECEDORA NAVAL Y INDUSTRIAL, S.A."|-0- |"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
2
|
+
15|"ABDELNUR| Nury de Jesus"|"individual"|"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
3
|
+
22|"HERNANDEZ, Oscar"|"individual"|"CUBA"|-0- |-0- |"Unknown vessel type"|-0- |-0- |-0- |"Acechilly Navigation Co., Malta"|-0-
|
4
|
+
24|"LOPEZ MENDEZ, Luis Eduardo"|"individual"|"CUBA"|-0- |-0- |"Unknown vessel type"|-0- |-0- |-0- |"Acefrosty Shipping Co., Malta"|-0-
|
5
|
+
25|"ACEFROSTY SHIPPING CO., LTD."|-0- |"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
6
|
+
36|"AEROCARIBBEAN AIRLINES"|-0- |"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
7
|
+
39|"AEROTAXI EJECUTIVO, S.A."|-0- |"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
8
|
+
41|"AGENCIA DE VIAJES GUAMA"|-0- |"CUBA"|-0- |-0- |-0- |-0- |-0- |-0- |-0- |-0-
|
9
|
+
66|"AGUIAR, Raul"|"individual"|"CUBA"|"Director, Banco Nacional de Cuba"|-0- |-0- |-0- |-0- |-0- |-0- |"; Director, Banco Nacional de Cuba."
|
@@ -0,0 +1,19 @@
|
|
1
|
+
``|`ABASTECEDORA NAVAL Y INDUSTRIAL, S.A.`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|`Panama`|``|``|``|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
2
|
+
``|`ABDELNUR`|` Nury de Jesus`|`individual`|`CUBA`|``|``|``|``|``|``|``|``|``|`Panama`|``|`aka`|`VIAJES GUAMA TOURS`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
3
|
+
``|`HERNANDEZ, Oscar`|`individual`|`CUBA`|``|``|`Unknown vessel type`|``|``|``|`Acechilly Navigation Co., Malta`|``|`123 Somewhere Ln`|`Clearwater`|`United States`|``|`aka`|`HERNANDEZ, Oscar Grouch`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
4
|
+
``|`HERNANDEZ, Oscar`|`individual`|`CUBA`|``|``|`Unknown vessel type`|``|``|``|`Acechilly Navigation Co., Malta`|``|`123 Somewhere Ln`|`Clearwater`|`United States`|``|`aka`|`Alternate Name`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
5
|
+
``|`LOPEZ MENDEZ, Luis Eduardo`|`individual`|`CUBA`|``|``|`Unknown vessel type`|``|``|``|`Acefrosty Shipping Co., Malta`|``|``|``|``|``|``|``|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
6
|
+
``|`ACEFROSTY SHIPPING CO., LTD.`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|``|``|`aka`|`AVIA IMPORT`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
7
|
+
``|`AEROCARIBBEAN AIRLINES`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|``|``|`aka`|`BNC`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
8
|
+
``|`AEROCARIBBEAN AIRLINES`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|``|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
9
|
+
``|`AEROCARIBBEAN AIRLINES`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|``|``|`aka`|`BNC`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
10
|
+
``|`AEROTAXI EJECUTIVO, S.A.`|``|`CUBA`|``|``|``|``|``|``|``|``|``|`Managua`|`Nicaragua`|``|``|``|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
11
|
+
``|`AEROTAXI EJECUTIVO, S.A.`|``|`CUBA`|``|``|``|``|``|``|``|``|`Bal Harbour Shopping Center, Via Italia`|`Panama City`|`Panama`|``|``|``|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
12
|
+
``|`AGENCIA DE VIAJES GUAMA`|``|`CUBA`|``|``|``|``|``|``|``|``|`Avenida de Concha, Espina 8, E-28036`|`Madrid`|`Spain`|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
13
|
+
``|`AGENCIA DE VIAJES GUAMA`|``|`CUBA`|``|``|``|``|``|``|``|``|``|``|``|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
14
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|``|`Milan`|`Italy`|``|`aka`|`BNC`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
15
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|``|`Milan`|`Italy`|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
16
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|``|``|`Panama`|``|`aka`|`BNC`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
17
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|``|``|`Panama`|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
18
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|`1840 West 49th Street`|`Hialeah, FL`|`United States`|``|`aka`|`BNC`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
19
|
+
``|`AGUIAR, Raul`|`individual`|`CUBA`|`Director, Banco Nacional de Cuba`|``|``|``|``|``|``|`; Director, Banco Nacional de Cuba.`|`1840 West 49th Street`|`Hialeah, FL`|`United States`|``|`aka`|`NATIONAL BANK OF CUBA`|``|`2009-05-06 15:55:24`|`2009-05-06 15:55:24`
|
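The fixture above suggests that each field is wrapped in backticks and the fields are joined with pipes, which matches the `ENCLOSED BY` and `TERMINATED BY` options passed to `LOAD DATA`. A small sketch of writing and reading that layout follows; both helper names are hypothetical and the gem's own writer may differ.

```ruby
# Hypothetical helpers for the backtick-enclosed, pipe-delimited layout.
def format_flattened_line(values)
  values.map { |value| "`#{value}`" }.join('|')
end

def parse_flattened_line(line)
  body = line.chomp
  return [] if body.empty?

  # Strip the outer backticks, then split on the closing/opening pair.
  # The -1 limit keeps trailing empty fields.
  body[1..-2].split('`|`', -1)
end

record = ['', 'HERNANDEZ, Oscar', 'individual', 'CUBA', '']
line = format_flattened_line(record)
puts line
puts parse_flattened_line(line) == record
```

Splitting on the three-character sequence rather than on a bare pipe keeps commas and single pipes inside a value intact, though a value that itself contains a backtick, pipe, backtick sequence would still be split.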
@@ -0,0 +1,20 @@
|
|
1
|
+
require 'ofac/models/ofac_sdn_loader'
|
2
|
+
|
3
|
+
class OfacSdnLoader
|
4
|
+
|
5
|
+
def self.load_current_sdn_file
|
6
|
+
sdn = File.new(File.dirname(__FILE__) + '/../../files/test_sdn_data_load.pip')
|
7
|
+
address = File.new(File.dirname(__FILE__) + '/../../files/test_address_data_load.pip')
|
8
|
+
alt = File.new(File.dirname(__FILE__) + '/../../files/test_alt_data_load.pip')
|
9
|
+
active_record_file_load(sdn, address, alt)
|
10
|
+
sdn.close
|
11
|
+
address.close
|
12
|
+
alt.close
|
13
|
+
end
|
14
|
+
|
15
|
+
#Gives access to the private convert_to_flattened_csv method
|
16
|
+
def self.create_csv_file(sdn, address, alt)
|
17
|
+
convert_to_flattened_csv(sdn, address, alt)
|
18
|
+
end
|
19
|
+
|
20
|
+
end
|
@@ -0,0 +1,40 @@
|
|
1
|
+
require 'test_helper'
|
2
|
+
|
3
|
+
class OfacSdnLoaderTest < Test::Unit::TestCase
|
4
|
+
|
5
|
+
context '' do
|
6
|
+
setup do setup_ofac_sdn_table end
|
7
|
+
|
8
|
+
should "load table from files multiple times and always have the same record count" do
|
9
|
+
assert_equal(0,OfacSdn.count)
|
10
|
+
OfacSdnLoader.load_current_sdn_file #this method is mocked to load test files instead of the live files from the web.
|
11
|
+
assert_equal(19, OfacSdn.count)
|
12
|
+
OfacSdnLoader.load_current_sdn_file
|
13
|
+
assert_equal(19, OfacSdn.count)
|
14
|
+
end
|
15
|
+
|
16
|
+
should "create flattened_csv_file_for_mysql_import" do
|
17
|
+
#since I'm using sqlite3 for its in-memory db, I can't test the mysql load
|
18
|
+
#but I can test the csv file creation.
|
19
|
+
sdn = File.new(File.dirname(__FILE__) + '/files/test_sdn_data_load.pip')
|
20
|
+
address = File.new(File.dirname(__FILE__) + '/files/test_address_data_load.pip')
|
21
|
+
alt = File.new(File.dirname(__FILE__) + '/files/test_alt_data_load.pip')
|
22
|
+
|
23
|
+
csv = OfacSdnLoader.create_csv_file(sdn, address, alt) #this method was created in the mock only to call the private convert_to_flattened_csv method
|
24
|
+
correctly_formatted_csv = File.open(File.dirname(__FILE__) + '/files/valid_flattened_file.csv')
|
25
|
+
|
26
|
+
csv.rewind
|
27
|
+
generated_file = csv.readlines
|
28
|
+
#compare the values of each csv line with the correctly formatted "control file"
|
29
|
+
correctly_formatted_csv.each_with_index do |line,i|
|
30
|
+
csv_line = generated_file[i]
|
31
|
+
correctly_formatted_record_array = line.split('|')
|
32
|
+
csv_record_array = csv_line.split('|')
|
33
|
+
(0..18).each do |field_index| #skip indices 19 and 20, they are the created_at and updated_at fields, they will never match.
|
34
|
+
assert_equal correctly_formatted_record_array[field_index], csv_record_array[field_index]
|
35
|
+
end
|
36
|
+
end
|
37
|
+
end
|
38
|
+
|
39
|
+
end
|
40
|
+
end
|
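The expected count of 19 follows from the flattening rule visible in the loader: each SDN record yields one row per address and alternate identity combination, and a missing address or alternate identity still contributes one blank entry. The sketch below reproduces that arithmetic; the per-id counts were tallied by hand from the three `.pip` fixtures, and `flattened_row_count` is a hypothetical helper.

```ruby
# Address and alternate-identity rows per SDN id, counted from the fixtures.
FIXTURE_COUNTS = {
  10 => { addresses: 1, alts: 0 },
  15 => { addresses: 1, alts: 1 },
  22 => { addresses: 1, alts: 2 },
  24 => { addresses: 0, alts: 0 },
  25 => { addresses: 0, alts: 1 },
  36 => { addresses: 0, alts: 3 },
  39 => { addresses: 2, alts: 0 },
  41 => { addresses: 2, alts: 1 },
  66 => { addresses: 3, alts: 2 }
}

# Hypothetical helper: an empty list still produces one (blank) entry,
# so each factor is at least 1.
def flattened_row_count(counts)
  counts.values.sum { |c| [c[:addresses], 1].max * [c[:alts], 1].max }
end

puts flattened_row_count(FIXTURE_COUNTS)
```

SDN 66, with three addresses and two alternate identities, accounts for six of the rows on its own.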
data/test/ofac_test.rb
ADDED
@@ -0,0 +1,76 @@
|
|
1
|
+
require 'test_helper'
|
2
|
+
|
3
|
+
class OfacTest < Test::Unit::TestCase
|
4
|
+
|
5
|
+
context '' do
|
6
|
+
setup do
|
7
|
+
setup_ofac_sdn_table
|
8
|
+
OfacSdnLoader.load_current_sdn_file #this method is mocked to load test files instead of the live files from the web.
|
9
|
+
end
|
10
|
+
|
11
|
+
should "give a score of 0 if no name is given" do
|
12
|
+
assert_equal 0, Ofac.new({:address => '123 somewhere'}).score
|
13
|
+
end
|
14
|
+
|
15
|
+
should "give a score of 0 if there is no name match" do
|
16
|
+
assert_equal 0, Ofac.new({:name => 'Kevin'}).score
|
17
|
+
end
|
18
|
+
|
19
|
+
should "give a score of 0 if there is no name match but there is an address and city match" do
|
20
|
+
assert_equal 0, Ofac.new({:name => 'Kevin', :address => '123 somewhere ln', :city => 'Clearwater'}).score
|
21
|
+
end
|
22
|
+
|
23
|
+
should "give a score of 60 if there is a name match" do
|
24
|
+
assert_equal 60, Ofac.new({:name => 'Oscar Hernandez'}).score
|
25
|
+
assert_equal 60, Ofac.new({:name => 'Oscar Hernandez', :city => 'no match', :address => 'no match'}).score
|
26
|
+
assert_equal 60, Ofac.new({:name => 'Oscar Hernandez', :city => 'Las Vegas', :address => 'no match'}).score
|
27
|
+
assert_equal 60, Ofac.new({:name => 'Luis Lopez', :city => 'Las Vegas', :address => 'no match'}).score
|
28
|
+
end
|
29
|
+
|
30
|
+
should "give a score of 60 if there is a name match on alternate identity name" do
|
31
|
+
assert_equal 60, Ofac.new({:name => 'Alternate Name'}).score
|
32
|
+
end
|
33
|
+
|
34
|
+
should "give a partial score if there is a partial name match" do
|
35
|
+
assert_equal 40, Ofac.new({:name => 'Oscar middlename Hernandez'}).score
|
36
|
+
assert_equal 30, Ofac.new({:name => 'Oscar WrongLastName'}).score
|
37
|
+
assert_equal 70, Ofac.new({:name => 'Oscar middlename Hernandez',:city => 'Clearwater'}).score
|
38
|
+
end
|
39
|
+
|
40
|
+
should "give a score of 90 if there is a name and city match" do
|
41
|
+
assert_equal 90, Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => 'no match'}).score
|
42
|
+
end
|
43
|
+
|
44
|
+
should "give a score of 100 if there is a name and city and address match" do
|
45
|
+
assert_equal 100, Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'}).score
|
46
|
+
end
|
47
|
+
|
48
|
+
should "give partial scores for sounds like matches" do
|
49
|
+
|
50
|
+
#32456 summer lane sounds like 123 Somewhere Ln, so it adds 75% of the address weight (10) to the score: 7.5, which brings the rounded total to 98.
|
51
|
+
assert_equal 98, Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '32456 summer lane'}).score
|
52
|
+
|
53
|
+
#summer sounds like somewhere, and all numbers sound alike, so 2 of the 3 address elements match by sound.
|
54
|
+
#Each element is worth 10/3 or 3.33. Exact matches add 3.33 each, and a sounds like match adds 3.33 * .75 or 2.5
|
55
|
+
#because sounds like matches only add 75% of their weight.
|
56
|
+
#2.5 + 2.5 = 5
|
57
|
+
assert_equal 95, Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '12358 summer blvd'}).score
|
58
|
+
|
59
|
+
|
60
|
+
#Louis sounds like Luis, and Lopez is an exact match:
|
61
|
+
#:name has a weight of 60, so each element is worth 30. A sounds like match is worth 30 * .75 = 22.5, so the score is 52.5, which rounds to 53.
|
62
|
+
assert_equal 53, Ofac.new({:name => 'Louis Lopez', :city => 'Las Vegas', :address => 'no match'}).score
|
63
|
+
end
|
64
|
+
|
65
|
+
should "return an array of possible hits" do
|
66
|
+
#it shouldn't matter in which order you call score and possible_hits.
|
67
|
+
sdn = Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'})
|
68
|
+
assert sdn.score > 0
|
69
|
+
assert !sdn.possible_hits.empty?
|
70
|
+
|
71
|
+
sdn = Ofac.new({:name => 'Oscar Hernandez', :city => 'Clearwater', :address => '123 somewhere ln'})
|
72
|
+
assert !sdn.possible_hits.empty?
|
73
|
+
assert sdn.score > 0
|
74
|
+
end
|
75
|
+
end
|
76
|
+
end
|
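The expected totals in these tests can be checked by hand. The sketch below assumes field weights of 60 for name, 30 for city, and 10 for address; those numbers are inferred from the assertions and comments above, not read from the model. `total_score` is a hypothetical helper that mirrors the weighting, the name gate, and the final rounding in `calculate_record`.

```ruby
# Assumed weights, inferred from the expected scores in the tests.
WEIGHTS = { name: 60, city: 30, address: 10 }

# values maps each field to its match value between 0 and 1.
def total_score(values)
  score = 0.0
  [:name, :city, :address].each do |field|
    field_score = WEIGHTS[field] * values.fetch(field, 0)
    score += field_score
    # City and address count only when the name matched at least partially.
    break if field == :name && field_score == 0
  end
  score.round
end

puts total_score(name: 1, city: 1, address: 0.75) # whole address sounds alike
puts total_score(name: 1, city: 1, address: 0.5)  # two of three address words sound alike
puts total_score(name: 0.875)                     # one exact word, one sounds-like word
puts total_score(name: 0, city: 1, address: 1)    # no name match, so nothing counts
```

Rounding happens once on the total rather than per field, which is why 97.5 becomes 98 and 52.5 becomes 53.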
data/test/test_helper.rb
ADDED
@@ -0,0 +1,48 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'test/unit'
|
3
|
+
require 'shoulda'
|
4
|
+
require 'mocks/test/ofac_sdn_loader'
|
5
|
+
|
6
|
+
$LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
|
7
|
+
$LOAD_PATH.unshift(File.dirname(__FILE__))
|
8
|
+
require 'ofac'
|
9
|
+
|
10
|
+
ActiveRecord::Base.establish_connection :adapter => 'sqlite3', :database => ':memory:'
|
11
|
+
|
12
|
+
class Test::Unit::TestCase
|
13
|
+
def setup_ofac_sdn_table
|
14
|
+
ActiveRecord::Base.connection.tables.each { |table| ActiveRecord::Base.connection.drop_table(table) }
|
15
|
+
create_ofac_sdn_table
|
16
|
+
end
|
17
|
+
|
18
|
+
private
|
19
|
+
|
20
|
+
def create_ofac_sdn_table
|
21
|
+
silence_stream(STDOUT) do
|
22
|
+
ActiveRecord::Schema.define(:version => 1) do
|
23
|
+
create_table :ofac_sdns do |t|
|
24
|
+
t.text :name
|
25
|
+
t.string :sdn_type
|
26
|
+
t.string :program
|
27
|
+
t.string :title
|
28
|
+
t.string :vessel_call_sign
|
29
|
+
t.string :vessel_type
|
30
|
+
t.string :vessel_tonnage
|
31
|
+
t.string :gross_registered_tonnage
|
32
|
+
t.string :vessel_flag
|
33
|
+
t.string :vessel_owner
|
34
|
+
t.text :remarks
|
35
|
+
t.text :address
|
36
|
+
t.string :city
|
37
|
+
t.string :country
|
38
|
+
t.string :address_remarks
|
39
|
+
t.string :alternate_identity_type
|
40
|
+
t.text :alternate_identity_name
|
41
|
+
t.string :alternate_identity_remarks
|
42
|
+
t.timestamps
|
43
|
+
end
|
44
|
+
end
|
45
|
+
end
|
46
|
+
end
|
47
|
+
|
48
|
+
end
|
metadata
ADDED
@@ -0,0 +1,90 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: kevintyll-ofac
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 1.0.0
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- Kevin Tyll
|
8
|
+
autorequire:
|
9
|
+
bindir: bin
|
10
|
+
cert_chain: []
|
11
|
+
|
12
|
+
date: 2009-05-11 00:00:00 -07:00
|
13
|
+
default_executable:
|
14
|
+
dependencies: []
|
15
|
+
|
16
|
+
description: Attempts to find a hit on the Office of Foreign Assets Control's Specially Designated Nationals list.
|
17
|
+
email: kevintyll@gmail.com
|
18
|
+
executables: []
|
19
|
+
|
20
|
+
extensions: []
|
21
|
+
|
22
|
+
extra_rdoc_files:
|
23
|
+
- LICENSE
|
24
|
+
- README.rdoc
|
25
|
+
files:
|
26
|
+
- History.txt
|
27
|
+
- LICENSE
|
28
|
+
- PostInstall.txt
|
29
|
+
- README.rdoc
|
30
|
+
- Rakefile
|
31
|
+
- VERSION.yml
|
32
|
+
- generators/ofac_migration/ofac_migration_generator.rb
|
33
|
+
- generators/ofac_migration/templates/migration.rb
|
34
|
+
- lib/ofac.rb
|
35
|
+
- lib/ofac/models/ofac.rb
|
36
|
+
- lib/ofac/models/ofac_sdn.rb
|
37
|
+
- lib/ofac/models/ofac_sdn_loader.rb
|
38
|
+
- lib/ofac/ofac_match.rb
|
39
|
+
- lib/ofac/ruby_string_extensions.rb
|
40
|
+
- lib/tasks/ofac.rake
|
41
|
+
- test/files/test_address_data_load.pip
|
42
|
+
- test/files/test_alt_data_load.pip
|
43
|
+
- test/files/test_sdn_data_load.pip
|
44
|
+
- test/files/valid_flattened_file.csv
|
45
|
+
- test/mocks/test/ofac_sdn_loader.rb
|
46
|
+
- test/ofac_sdn_loader_test.rb
|
47
|
+
- test/ofac_test.rb
|
48
|
+
- test/test_helper.rb
|
49
|
+
has_rdoc: true
|
50
|
+
homepage: http://github.com/kevintyll/ofac
|
51
|
+
post_install_message: |-
|
52
|
+
For more information on ofac, see http://kevintyll.github.com/ofac/
|
53
|
+
|
54
|
+
* To create the necessary db migration, from the command line, run:
|
55
|
+
script/generate ofac_migration
|
56
|
+
* Require the gem in your environment.rb file in the Rails::Initializer block:
|
57
|
+
config.gem 'kevintyll-ofac', :lib => 'ofac'
|
58
|
+
* To load your table with the current OFAC data, from the command line, run:
|
59
|
+
rake ofac:update_data
|
60
|
+
|
61
|
+
* The OFAC data is not updated with any regularity, but you can sign up for email notifications when the data changes at
|
62
|
+
http://www.treas.gov/offices/enforcement/ofac/sdn/index.shtml.
|
63
|
+
rdoc_options:
|
64
|
+
- --charset=UTF-8
|
65
|
+
require_paths:
|
66
|
+
- lib
|
67
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
68
|
+
requirements:
|
69
|
+
- - ">="
|
70
|
+
- !ruby/object:Gem::Version
|
71
|
+
version: "0"
|
72
|
+
version:
|
73
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
74
|
+
requirements:
|
75
|
+
- - ">="
|
76
|
+
- !ruby/object:Gem::Version
|
77
|
+
version: "0"
|
78
|
+
version:
|
79
|
+
requirements: []
|
80
|
+
|
81
|
+
rubyforge_project:
|
82
|
+
rubygems_version: 1.2.0
|
83
|
+
signing_key:
|
84
|
+
specification_version: 2
|
85
|
+
summary: Attempts to find a hit on the Office of Foreign Assets Control's Specially Designated Nationals list.
|
86
|
+
test_files:
|
87
|
+
- test/mocks/test/ofac_sdn_loader.rb
|
88
|
+
- test/ofac_sdn_loader_test.rb
|
89
|
+
- test/ofac_test.rb
|
90
|
+
- test/test_helper.rb
|