address_tokens 0.1.3

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: dc428acad65025604ed184b2c5c243b48715fed99e00dc5b3bfee54208e169f5
4
+ data.tar.gz: 791c48d742010fc338cae65e03471796b98867dce09d07d3697adad638c150fc
5
+ SHA512:
6
+ metadata.gz: afc4a676c93f9e0d2aa6be9431acc7cdebb0113c192e93cde33f4053d3cf405e09024be3ed211705d514a02a1e1f61395b40f91c444271dba7e228b6dbc00ec2
7
+ data.tar.gz: 5d49a5903fb85a89562a52151fd55aa394ec1528ea22558ccc6a4121e516c79ec7ef93c4a3aabdf56755ab3c6d3450e0a72e3ee280e6f65a3fc1fb2dc780a510
checksums.yaml.gz.sig ADDED
Binary file
data/.gitignore ADDED
@@ -0,0 +1,8 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
1
+ sudo: false
2
+ language: ruby
3
+ rvm:
4
+ - 2.5.0
5
+ before_install: gem install bundler -v 1.16.1
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in address_tokens.gemspec
6
+ gemspec
data/README.md ADDED
@@ -0,0 +1,150 @@
1
+ # AddressTokens
2
+
3
+ Gem to find tokens on address strings. Can identify address, city and state name
4
+ and abbreviation.
5
+
6
+ ## Installation
7
+
8
+ Add this line to your application's Gemfile:
9
+
10
+ ```ruby
11
+ gem 'address_tokens'
12
+ ```
13
+
14
+ And then execute:
15
+
16
+ $ bundle
17
+
18
+ Or install it yourself as:
19
+
20
+ $ gem install address_tokens
21
+
22
+ ## Usage
23
+
24
+ Always hated when you get some address like
25
+
26
+ ```
27
+ 88 Colin P Kelly Jr St San Francisco, CA 94107
28
+ ```
29
+
30
+ and you don't know where the address, the state, or the city name is?
31
+
32
+ Ok, me too. So I made this class.
33
+
34
+ We need state and city data to find out what the tokens are. To load the data,
35
+ we'll need YAML files like the ones [I have in this
36
+ repo](https://github.com/taq/brstatescities). They follow formats like these:
37
+
38
+ ```
39
+ # states.yml:
40
+ ---
41
+ CA: California
42
+
43
+ # cities.yml
44
+ CA:
45
+ - San Francisco
46
+ ```
47
+
48
+ Finding by exact match is fast and a piece of cake. The problem is when there are
49
+ several ways to write or abbreviate a city name, as we have here in
50
+ Brazil. For example, my city, São José do Rio Preto, can be written as:
51
+
52
+ 1. São José do Rio Preto
53
+ 2. São José Rio Preto
54
+ 3. S. J. do Rio Preto
55
+ 4. S. J. R. Preto
56
+
57
+ And so on. But I think I found some ways to match those.
58
+
59
+ ### Finding stuff
60
+
61
+ First, we create a Finder object:
62
+
63
+ ```ruby
64
+ finder = AddressTokens::Finder.new('88 Colin P Kelly Jr St San Francisco, CA 94107')
65
+ ```
66
+
67
+ Load the cities and states:
68
+
69
+ ```ruby
70
+ finder.load(:states, '/tmp/states.yml')
71
+ finder.load(:cities, '/tmp/cities.yml')
72
+ finder.find
73
+ ```
74
+
75
+ And ask to find:
76
+
77
+ ```ruby
78
+ matches = finder.find
79
+ ```
80
+
81
+ We'll get something like this:
82
+
83
+ ```ruby
84
+ p matches
85
+ {
86
+ :state_abbr => "CA",
87
+ :state_name => "California",
88
+ :state_start_at => 36,
89
+ :city_name => "San Francisco",
90
+ :city_string => "San Francisco",
91
+ :city_start_at => 23,
92
+ :address => "88 Colin P Kelly Jr St"
93
+ }
94
+ ```
95
+ The `start_at` values show where the strings were found. The `city_string` is
96
+ the form in which the city name appears in the input string.
97
+
98
+ ### Custom options
99
+
100
+ As we saw, the default city and state separator is a comma (','), as in the USA, but in
101
+ Brazil it is a hyphen ('-'), so, **before asking to find the address**, we must
102
+ change it:
103
+
104
+ ```ruby
105
+ finder.state_separator = '-'
106
+ ```
107
+
108
+ Using a Brazilian address:
109
+
110
+ ```ruby
111
+ finder = AddressTokens::Finder.new('Rua Tratado de Tordesihas, 88, Pq. Estoril,
112
+ S. J. do Rio Preto - SP')
113
+ finder.state_separator = '-'
114
+ finder.load(:states, '/tmp/states.yml')
115
+ finder.load(:cities, '/tmp/cities.yml')
116
+ finder.find
117
+ ```
118
+
119
+ will return:
120
+
121
+ ```ruby
122
+ {
123
+ :state_abbr => "SP",
124
+ :state_name => "São Paulo",
125
+ :state_start_at => 63,
126
+ :city_name => "São José do Rio Preto",
127
+ :city_string => "S. J. do Rio Preto",
128
+ :city_start_at => 44,
129
+ :address => "Rua Tratado de Tordesihas, 88, Pq. Estoril,"
130
+ }
131
+ ```
132
+
133
+ ## Development
134
+
135
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run
136
+ `rake test` to run the tests. You can also run `bin/console` for an interactive
137
+ prompt that will allow you to experiment.
138
+
139
+ To install this gem onto your local machine, run `bundle exec rake install`. To
140
+ release a new version, update the version number in `version.rb`, and then run
141
+ `bundle exec rake release`, which will create a git tag for the version, push
142
+ git commits and tags, and push the `.gem` file to
143
+ [rubygems.org](https://rubygems.org).
144
+
145
+ Please run the tests using `rake test`.
146
+
147
+ ## Contributing
148
+
149
+ Bug reports and pull requests are welcome on GitHub at
150
+ https://github.com/taq/address_tokens.
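The data format and the separator behaviour described in the README can be tried without installing the gem. The following is a minimal sketch that uses only Ruby's standard library. The `state_from` helper and the temporary file names are hypothetical and not part of address_tokens; the helper only imitates the idea of reading the token that follows the last separator.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper (not part of address_tokens): returns the token that
# follows the last occurrence of the separator, e.g. "CA" for ", CA 94107".
def state_from(address, separator = ',')
  tail_start = address.rindex(separator)
  return nil if tail_start.nil?

  # Escape the separator so characters such as "." or "|" are taken literally.
  pattern = /#{Regexp.escape(separator)}\s?(?<state>\w+)/
  match = address[tail_start..-1].match(pattern)
  match && match[:state]
end

Dir.mktmpdir do |dir|
  states_file = File.join(dir, 'states.yml')
  cities_file = File.join(dir, 'cities.yml')

  # Same shapes as in the README: abbreviation => name, abbreviation => list.
  File.write(states_file, { 'CA' => 'California' }.to_yaml)
  File.write(cities_file, { 'CA' => ['San Francisco'] }.to_yaml)

  states = YAML.safe_load(File.read(states_file))
  cities = YAML.safe_load(File.read(cities_file))

  abbr = state_from('88 Colin P Kelly Jr St San Francisco, CA 94107')
  puts "#{abbr}: #{states[abbr]} (#{cities[abbr].join(', ')})"
end
```

Passing `'-'` as the second argument imitates the Brazilian separator, for example `state_from('S. J. do Rio Preto - SP', '-')`.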
data/Rakefile ADDED
@@ -0,0 +1,10 @@
1
+ require "bundler/gem_tasks"
2
+ require "rake/testtask"
3
+
4
+ Rake::TestTask.new(:test) do |t|
5
+ t.libs << "test"
6
+ t.libs << "lib"
7
+ t.test_files = FileList["test/**/*_test.rb"]
8
+ end
9
+
10
+ task :default => :test
data/address_tokens.gemspec ADDED
@@ -0,0 +1,30 @@
1
+
2
+ lib = File.expand_path("../lib", __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require "address_tokens/version"
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "address_tokens"
8
+ spec.version = AddressTokens::VERSION
9
+ spec.authors = ["Eustaquio Rangel"]
10
+ spec.email = ["taq@eustaquiorangel.com"]
11
+
12
+ spec.summary = %q{Find address tokens on a string}
13
+ spec.description = %q{Always want to find where address, city and state are on a string? Use this gem.}
14
+ spec.homepage = "http://github.com/taq/address_tokens"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
17
+ f.match(%r{^(test|spec|features)/})
18
+ end
19
+ spec.bindir = "exe"
20
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
21
+ spec.require_paths = ["lib"]
22
+
23
+ spec.add_development_dependency "bundler", "~> 1.16"
24
+ spec.add_development_dependency "rake", "~> 10.0"
25
+ spec.add_development_dependency "minitest", "~> 5.0"
26
+ spec.add_development_dependency "i18n"
27
+
28
+ spec.signing_key = '/home/taq/.gemcert/gem-private_key.pem'
29
+ spec.cert_chain = ['gem-public_cert.pem']
30
+ end
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "address_tokens"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/address_tokens/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module AddressTokens
2
+ VERSION = "0.1.3"
3
+ end
data/lib/address_tokens.rb ADDED
@@ -0,0 +1,8 @@
1
+ require 'address_tokens/version'
2
+ require 'city_not_found'
3
+ require 'state_not_found'
4
+ require 'finder'
5
+ require 'matcher'
6
+
7
+ module AddressTokens
8
+ end
data/lib/city_not_found.rb ADDED
@@ -0,0 +1,5 @@
1
+ module AddressTokens
2
+ class CityNotFound < StandardError
3
+ end
4
+ end
5
+
data/lib/finder.rb ADDED
@@ -0,0 +1,57 @@
1
+ require 'yaml'
2
+ require 'i18n'
3
+
4
+ module AddressTokens
5
+ class Finder
6
+ attr_accessor :states, :cities, :state_separator, :string
7
+ attr_reader :city_tokens
8
+
9
+ def initialize(str)
10
+ raise ArgumentError, 'String is null or empty' if str.nil? || str.strip.empty?
11
+ @string, @states, @cities = str, {}, {}
12
+ I18n.config.available_locales = :en
13
+ @state_separator = ','
14
+ end
15
+
16
+ def load(var, file)
17
+ load_file(var, file)
18
+ end
19
+
20
+ def find
21
+ raise ArgumentError, 'No states found' if @states.size == 0
22
+ raise ArgumentError, 'No cities found' if @cities.size == 0
23
+ transliterate_cities
24
+ Matcher.new(self).match(@string)
25
+ end
26
+
27
+ private
28
+
29
+ def load_file(var, file)
30
+ raise IOError, "File #{file} was not found" if !File.exist?(file)
31
+ data = YAML.safe_load(File.read(file))
32
+ raise TypeError, "File #{file} is not a valid YAML file" if !data.kind_of?(Hash)
33
+ instance_variable_set("@#{var}", data)
34
+ end
35
+
36
+ def transliterate_cities
37
+ @city_tokens = {}
38
+
39
+ for state, cities in @cities
40
+ @city_tokens[state] = []
41
+
42
+ for city in cities
43
+ tokens = []
44
+ city.gsub!(/\s{2,}/, ' ')
45
+ tokens << city
46
+
47
+ tdown = I18n.transliterate(city).downcase
48
+ tokens << tdown
49
+
50
+ splitted = tdown.split
51
+ tokens << splitted if splitted.size > 1
52
+ @city_tokens[state] << tokens
53
+ end
54
+ end
55
+ end
56
+ end
57
+ end
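The token forms that `transliterate_cities` builds for each city can be illustrated in isolation. This is a minimal sketch, not the gem's code: `plain` is a hypothetical stand-in for `I18n.transliterate` that relies on Unicode decomposition from the standard library, so it may differ from I18n for characters that have no decomposed form, and `city_tokens` is an illustrative name.

```ruby
# Hypothetical stand-in for I18n.transliterate (an assumption): decompose
# accented characters and drop the combining marks that carry the accents.
def plain(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Builds the three token forms kept per city: the original name, the
# accent-free lowercase name and, for multi-word names, its list of words.
def city_tokens(city)
  name = city.gsub(/\s{2,}/, ' ') # collapse repeated whitespace
  lowered = plain(name).downcase
  words = lowered.split

  tokens = [name, lowered]
  tokens << words if words.size > 1
  tokens
end

p city_tokens('São José do  Rio Preto')
p city_tokens('Olímpia')
```

Unlike the original method, this sketch returns a new string instead of modifying the city name in place with `gsub!`, which keeps the loaded data untouched.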
data/lib/matcher.rb ADDED
@@ -0,0 +1,90 @@
1
+ module AddressTokens
2
+ class Matcher
3
+ def initialize(finder)
4
+ @finder = finder
5
+ end
6
+
7
+ def match(str)
8
+ state_info = find_state(str)
9
+ city_info = find_city(str, state_info)
10
+ address_info = find_address(str, city_info)
11
+
12
+ {
13
+ state_abbr: state_info[:state],
14
+ state_name: @finder.states[state_info[:state]],
15
+ state_start_at: state_info[:start_at],
16
+ city_name: city_info[:city_name],
17
+ city_string: city_info[:city_string],
18
+ city_start_at: city_info[:start_at],
19
+ address: address_info[:address]
20
+ }
21
+ end
22
+
23
+ private
24
+
25
+ def find_state(str)
26
+ last_char = str.rindex(@finder.state_separator)
27
+ token = str[last_char .. -1]
28
+ token = token.gsub(/\s{2,}/, ' ')
29
+ matches = token.match(Regexp.new("#{Regexp.escape(@finder.state_separator)}\\s?(?<state>\\w+)"))
30
+ { state: matches[:state], start_at: last_char }
31
+ end
32
+
33
+ def find_city(str, state_info)
34
+ cities = @finder.city_tokens[state_info[:state]]
35
+ raise StateNotFound, "State #{state_info[:state]} not found on state data" if cities.nil?
36
+
37
+ without_state = str[0 .. state_info[:start_at] - 1].strip.gsub(/\s{2,}/, ' ')
38
+ transliterated = I18n.transliterate(without_state).downcase
39
+
40
+ exact_or_trans = find_city_by_exact_or_trans(cities, without_state, transliterated)
41
+ return exact_or_trans if !exact_or_trans.nil?
42
+
43
+ tokenized = find_city_by_tokenized(cities, transliterated, str)
44
+ return tokenized if !tokenized.nil?
45
+
46
+ raise CityNotFound, "City not found"
47
+ end
48
+
49
+ def find_city_by_tokenized(cities, transliterated, str)
50
+ choices = []
51
+
52
+ cities.each do |city|
53
+ tokens = transliterated.split
54
+ city_tokens = I18n.transliterate(city[0]).downcase.split
55
+
56
+ if tokens[-1] == city_tokens[-1]
57
+ first_tokens = tokens.map { |token| token[0] }.join
58
+ first_city_tokens = city_tokens.map { |token| token[0] }.join
59
+ first_shorten_city_tokens = city_tokens.select { |token| token.size > 2 }.map { |token| token[0] }.join
60
+
61
+ choices << [city[0], first_tokens, first_city_tokens] if Regexp.new("#{first_city_tokens}$").match? first_tokens
62
+ choices << [city[0], first_tokens, first_shorten_city_tokens] if Regexp.new("#{first_shorten_city_tokens}$").match? first_tokens
63
+ end
64
+ end
65
+
66
+ return nil if choices.size == 0
67
+
68
+ reversed = choices.sort_by { |choice| choice[2].size }.reverse
69
+ regex = reversed[0][2].scan(/./).map { |char| "#{char}[\\p{Latin}\.]*\\s+"}.join.strip[0..-4]
70
+ matches = Regexp.new(regex, 'i').match(str)
71
+ { city_name: reversed[0][0], city_string: matches ? matches[0].strip : nil, start_at: matches ? str.index(matches[0].strip) : -1 }
72
+ end
73
+
74
+ def find_city_by_exact_or_trans(cities, without_state, transliterated)
75
+ cities.each do |city|
76
+ exact = Regexp.new("#{city[0]}$")
77
+ trans = Regexp.new("#{city[1]}$")
78
+
79
+ return { city_name: city[0], start_at: without_state.rindex(city[0]) , city_string: city[0] } if exact.match? without_state
80
+ return { city_name: city[0], start_at: transliterated.rindex(city[1]), city_string: without_state[transliterated.rindex(trans)..-1] } if trans.match? transliterated
81
+ end
82
+ nil
83
+ end
84
+
85
+ def find_address(str, city_info)
86
+ return { address: str[0 ... city_info[:start_at]].strip } if city_info[:start_at] > 0
87
+ { address: str.split(city_info[:city_string])[0].strip }
88
+ end
89
+ end
90
+ end
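The initials comparison that `find_city_by_tokenized` relies on can be shown on its own. The sketch below is a simplified illustration under stated assumptions: `initials` and `abbreviates?` are hypothetical names, both inputs are expected to be lowercase and accent-free already, and `String#end_with?` is used in place of the anchored regular expressions from the original.

```ruby
# Joins the first character of every word, e.g. "s. j. do rio preto" -> "sjdrp".
def initials(words)
  words.map { |word| word[0] }.join
end

# Hypothetical helper: true when the end of the address could be an
# abbreviated spelling of the city. Inputs are lowercase and accent-free.
def abbreviates?(address, city)
  address_words = address.split
  city_words = city.split

  # The last word must be written in full, as in the original check.
  return false unless address_words.last == city_words.last

  full = initials(city_words)
  # Short connecting words ("do", "de") are often dropped in abbreviations.
  short = initials(city_words.select { |word| word.size > 2 })

  address_initials = initials(address_words)
  address_initials.end_with?(full) || address_initials.end_with?(short)
end

puts abbreviates?('pq. estoril, s. j. do rio preto', 'sao jose do rio preto')
puts abbreviates?('pq. estoril, s. j. r. preto', 'sao jose do rio preto')
puts abbreviates?('av. paulista, sao paulo', 'sao jose do rio preto')
```

The first two calls match through the full and the shortened initials respectively, while the third fails the last-word check. The original method additionally ranks candidates by the length of the matched initials and then locates the abbreviated text in the input string.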
data/lib/state_not_found.rb ADDED
@@ -0,0 +1,4 @@
1
+ module AddressTokens
2
+ class StateNotFound < StandardError
3
+ end
4
+ end
data.tar.gz.sig ADDED
Binary file
metadata ADDED
@@ -0,0 +1,137 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: address_tokens
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.3
5
+ platform: ruby
6
+ authors:
7
+ - Eustaquio Rangel
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain:
11
+ - |
12
+ -----BEGIN CERTIFICATE-----
13
+ MIIDjjCCAnagAwIBAgIBATANBgkqhkiG9w0BAQUFADBGMRgwFgYDVQQDDA9ldXN0
14
+ YXF1aW9yYW5nZWwxFTATBgoJkiaJk/IsZAEZFgVnbWFpbDETMBEGCgmSJomT8ixk
15
+ ARkWA2NvbTAeFw0xNzA2MDMyMjM3MjdaFw0xODA2MDMyMjM3MjdaMEYxGDAWBgNV
16
+ BAMMD2V1c3RhcXVpb3JhbmdlbDEVMBMGCgmSJomT8ixkARkWBWdtYWlsMRMwEQYK
17
+ CZImiZPyLGQBGRYDY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA
18
+ pI58OaGELZWRnqvArECgbbOOU5ThDxGRoBz/91KaXysVAalPUVtEXQjsZSc6wOKb
19
+ a+m2vEZ6j1LWMfG1xKh07KN/EwcmUoG9uj/U+OcZ+M6YA+DSDBQJozXEUgKiD0e0
20
+ crZqrq+Qo9jg19lPvYcsFS4WumC3gtrx8Fe8K5JTEFR71Md3nnuBuKp1pouYTC3b
21
+ UQ2Xv1URDITKa3N3Kyu/6MJOYBJ2nvDKdEdMFbXxKiC/Njv7eIF+1iNWrzp8osgL
22
+ 5LWITluL2n+/14QVDJehLRuh9Zg9WGT6FEji4PsmNQtZKwYpGGzNX5dDeYZcbhPk
23
+ ri9MK8ztMYBLkNzd4sbbPQIDAQABo4GGMIGDMAkGA1UdEwQCMAAwCwYDVR0PBAQD
24
+ AgSwMB0GA1UdDgQWBBR9bkdMJUe5Lyc1V8NQm1hbjPpKBzAkBgNVHREEHTAbgRll
25
+ dXN0YXF1aW9yYW5nZWxAZ21haWwuY29tMCQGA1UdEgQdMBuBGWV1c3RhcXVpb3Jh
26
+ bmdlbEBnbWFpbC5jb20wDQYJKoZIhvcNAQEFBQADggEBAF8KZx0njczbBRXzjGo0
27
+ mTsAOlB474ZqQjKN1zL/sGw8G4Q/UpeVEclBZrP9P9PUomYliGP38oYM4DHHMpyd
28
+ Qh9Wsou0ZJ5oD0O3nRraOVGFN7Azm05+xJ4fV1Zi8nsUdpF3za7s27Non9cYF4/2
29
+ iwJPWOjl/AxvAS2efkECSGbuZtuNKWrWYMH+aGbtavd5hfb8voGbTFYB4azbQQf+
30
+ ESf+WEicRlMQuQUn314wSFq1pq65S9GrZWMmSP5gc0kKfa51fOmIYmO3eSlmWG/p
31
+ G+/hO8DFcpmRPCD/YXu/rFkKHquizBGSfr1BR/es/HhfrIoT3BUN2uDUT02aaWmB
32
+ JMg=
33
+ -----END CERTIFICATE-----
34
+ date: 2018-01-08 00:00:00.000000000 Z
35
+ dependencies:
36
+ - !ruby/object:Gem::Dependency
37
+ name: bundler
38
+ requirement: !ruby/object:Gem::Requirement
39
+ requirements:
40
+ - - "~>"
41
+ - !ruby/object:Gem::Version
42
+ version: '1.16'
43
+ type: :development
44
+ prerelease: false
45
+ version_requirements: !ruby/object:Gem::Requirement
46
+ requirements:
47
+ - - "~>"
48
+ - !ruby/object:Gem::Version
49
+ version: '1.16'
50
+ - !ruby/object:Gem::Dependency
51
+ name: rake
52
+ requirement: !ruby/object:Gem::Requirement
53
+ requirements:
54
+ - - "~>"
55
+ - !ruby/object:Gem::Version
56
+ version: '10.0'
57
+ type: :development
58
+ prerelease: false
59
+ version_requirements: !ruby/object:Gem::Requirement
60
+ requirements:
61
+ - - "~>"
62
+ - !ruby/object:Gem::Version
63
+ version: '10.0'
64
+ - !ruby/object:Gem::Dependency
65
+ name: minitest
66
+ requirement: !ruby/object:Gem::Requirement
67
+ requirements:
68
+ - - "~>"
69
+ - !ruby/object:Gem::Version
70
+ version: '5.0'
71
+ type: :development
72
+ prerelease: false
73
+ version_requirements: !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - "~>"
76
+ - !ruby/object:Gem::Version
77
+ version: '5.0'
78
+ - !ruby/object:Gem::Dependency
79
+ name: i18n
80
+ requirement: !ruby/object:Gem::Requirement
81
+ requirements:
82
+ - - ">="
83
+ - !ruby/object:Gem::Version
84
+ version: '0'
85
+ type: :development
86
+ prerelease: false
87
+ version_requirements: !ruby/object:Gem::Requirement
88
+ requirements:
89
+ - - ">="
90
+ - !ruby/object:Gem::Version
91
+ version: '0'
92
+ description: Always want to find where address, city and state are on a string? Use
93
+ this gem.
94
+ email:
95
+ - taq@eustaquiorangel.com
96
+ executables: []
97
+ extensions: []
98
+ extra_rdoc_files: []
99
+ files:
100
+ - ".gitignore"
101
+ - ".travis.yml"
102
+ - Gemfile
103
+ - README.md
104
+ - Rakefile
105
+ - address_tokens.gemspec
106
+ - bin/console
107
+ - bin/setup
108
+ - lib/address_tokens.rb
109
+ - lib/address_tokens/version.rb
110
+ - lib/city_not_found.rb
111
+ - lib/finder.rb
112
+ - lib/matcher.rb
113
+ - lib/state_not_found.rb
114
+ homepage: http://github.com/taq/address_tokens
115
+ licenses: []
116
+ metadata: {}
117
+ post_install_message:
118
+ rdoc_options: []
119
+ require_paths:
120
+ - lib
121
+ required_ruby_version: !ruby/object:Gem::Requirement
122
+ requirements:
123
+ - - ">="
124
+ - !ruby/object:Gem::Version
125
+ version: '0'
126
+ required_rubygems_version: !ruby/object:Gem::Requirement
127
+ requirements:
128
+ - - ">="
129
+ - !ruby/object:Gem::Version
130
+ version: '0'
131
+ requirements: []
132
+ rubyforge_project:
133
+ rubygems_version: 2.7.3
134
+ signing_key:
135
+ specification_version: 4
136
+ summary: Find address tokens on a string
137
+ test_files: []
metadata.gz.sig ADDED
Binary file