address_tokens 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: dc428acad65025604ed184b2c5c243b48715fed99e00dc5b3bfee54208e169f5
4
+ data.tar.gz: 791c48d742010fc338cae65e03471796b98867dce09d07d3697adad638c150fc
5
+ SHA512:
6
+ metadata.gz: afc4a676c93f9e0d2aa6be9431acc7cdebb0113c192e93cde33f4053d3cf405e09024be3ed211705d514a02a1e1f61395b40f91c444271dba7e228b6dbc00ec2
7
+ data.tar.gz: 5d49a5903fb85a89562a52151fd55aa394ec1528ea22558ccc6a4121e516c79ec7ef93c4a3aabdf56755ab3c6d3450e0a72e3ee280e6f65a3fc1fb2dc780a510
checksums.yaml.gz.sig ADDED
Binary file
data/.gitignore ADDED
@@ -0,0 +1,8 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
1
+ sudo: false
2
+ language: ruby
3
+ rvm:
4
+ - 2.5.0
5
+ before_install: gem install bundler -v 1.16.1
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in address_tokens.gemspec
6
+ gemspec
data/README.md ADDED
@@ -0,0 +1,150 @@
1
+ # AddressTokens
2
+
3
+ Gem to find tokens on address strings. Can identify address, city and state name
4
+ and abbreviation.
5
+
6
+ ## Installation
7
+
8
+ Add this line to your application's Gemfile:
9
+
10
+ ```ruby
11
+ gem 'address_tokens'
12
+ ```
13
+
14
+ And then execute:
15
+
16
+ $ bundle
17
+
18
+ Or install it yourself as:
19
+
20
+ $ gem install address_tokens
21
+
22
+ ## Usage
23
+
24
+ Always hated when you get some address like
25
+
26
+ ```
27
+ 88 Colin P Kelly Jr St San Francisco, CA 94107
28
+ ```
29
+
30
+ and can't tell where the address, the state, or the city name is?
31
+
32
+ Ok, me too. So I made this class.
33
+
34
+ We need state and city data to find out what the tokens are. To load the data,
35
+ we'll need YAML files like [I have on this
36
+ repo](https://github.com/taq/brstatescities). They follow formats like these:
37
+
38
+ ```
39
+ # states.yml:
40
+ ---
41
+ CA: California
42
+
43
+ # cities.yml
44
+ CA:
45
+ - San Francisco
46
+ ```
47
+
48
+ Finding by exact match is fast and a piece of cake. The problem is when there are
49
+ several ways to write or abbreviate a city name, as we have here in
50
+ Brazil. For example, my city, São José do Rio Preto, can take several forms:
51
+
52
+ 1. São José do Rio Preto
53
+ 2. São José Rio Preto
54
+ 3. S. J. do Rio Preto
55
+ 4. S. J. R. Preto
56
+
57
+ And keep going. But I think I could find some ways to find that.
58
+
59
+ ### Finding stuff
60
+
61
+ First, we create a Finder object:
62
+
63
+ ```ruby
64
+ finder = AddressTokens::Finder.new('88 Colin P Kelly Jr St San Francisco, CA 94107')
65
+ ```
66
+
67
+ Load the cities and states:
68
+
69
+ ```ruby
70
+ finder.load(:states, '/tmp/states.yml')
71
+ finder.load(:cities, '/tmp/cities.yml')
72
+ finder.find
73
+ ```
74
+
75
+ And ask to find:
76
+
77
+ ```ruby
78
+ matches = finder.find
79
+ ```
80
+
81
+ We'll get something like this:
82
+
83
+ ```ruby
84
+ p matches
85
+ {
86
+ :state_abbr => "CA",
87
+ :state_name => "California",
88
+ :state_start_at => 36,
89
+ :city_name => "San Francisco",
90
+ :city_string => "San Francisco",
91
+ :city_start_at => 23,
92
+ :address => "88 Colin P Kelly Jr St"
93
+ }
94
+ ```
95
+ The `start_at` values show where the strings were found. The `city_string` is
96
+ the way the city name was found.
97
+
98
+ ### Custom options
99
+
100
+ As we saw, the default city and state separator in the USA is a comma (','), but in
101
+ Brazil it is a hyphen ('-'), so, **before asking to find the address**, we must
102
+ change it:
103
+
104
+ ```ruby
105
+ finder.state_separator = '-'
106
+ ```
107
+
108
+ Using a Brazilian address:
109
+
110
+ ```ruby
111
+ finder = AddressTokens::Finder.new('Rua Tratado de Tordesihas, 88, Pq. Estoril,
112
+ S. J. do Rio Preto - SP')
113
+ finder.state_separator = '-'
114
+ finder.load(:states, '/tmp/states.yml')
115
+ finder.load(:cities, '/tmp/cities.yml')
116
+ finder.find
117
+ ```
118
+
119
+ will return:
120
+
121
+ ```ruby
122
+ {
123
+ :state_abbr => "SP",
124
+ :state_name => "São Paulo",
125
+ :state_start_at => 63,
126
+ :city_name => "São José do Rio Preto",
127
+ :city_string => "S. J. do Rio Preto",
128
+ :city_start_at => 44,
129
+ :address => "Rua Tratado de Tordesihas, 88, Pq. Estoril,"
130
+ }
131
+ ```
132
+
133
+ ## Development
134
+
135
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run
136
+ `rake test` to run the tests. You can also run `bin/console` for an interactive
137
+ prompt that will allow you to experiment.
138
+
139
+ To install this gem onto your local machine, run `bundle exec rake install`. To
140
+ release a new version, update the version number in `version.rb`, and then run
141
+ `bundle exec rake release`, which will create a git tag for the version, push
142
+ git commits and tags, and push the `.gem` file to
143
+ [rubygems.org](https://rubygems.org).
144
+
145
+ Please run the tests using `rake test`.
146
+
147
+ ## Contributing
148
+
149
+ Bug reports and pull requests are welcome on GitHub at
150
+ https://github.com/taq/address_tokens.
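As a reading aid for the README above: the `state_start_at` and `city_start_at` values in its first example appear to be zero-based character offsets into the input string. The sketch below checks that reading with plain `String#index` and `String#rindex`, and loads the two YAML shapes the README describes. It uses only the standard library and does not call the gem; the inline YAML documents and the variable names are illustrative assumptions, not part of the package.

```ruby
require 'yaml'

# Inline stand-ins for states.yml and cities.yml (illustrative data only).
states = YAML.safe_load("---\nCA: California\n")
cities = YAML.safe_load("---\nCA:\n  - San Francisco\n")

address = '88 Colin P Kelly Jr St San Francisco, CA 94107'

# Offset of the last separator, which is where the state token begins.
state_start = address.rindex(',')

# First word after the separator is taken as the state abbreviation.
state_abbr = address[state_start..-1][/\w+/]

# First known city of that state that occurs verbatim in the string.
city       = cities.fetch(state_abbr).find { |name| address.include?(name) }
city_start = address.index(city)

result = {
  state_abbr: state_abbr,
  state_name: states[state_abbr],
  state_start_at: state_start,
  city_name: city,
  city_start_at: city_start,
  address: address[0...city_start].strip
}
p result
```

With this input the offsets come out as 36 for the state and 23 for the city, which agrees with the values shown in the README example. Only the exact-match case is covered here; abbreviated city names need the token matching in `lib/matcher.rb`.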
data/Rakefile ADDED
@@ -0,0 +1,10 @@
1
+ require "bundler/gem_tasks"
2
+ require "rake/testtask"
3
+
4
+ Rake::TestTask.new(:test) do |t|
5
+ t.libs << "test"
6
+ t.libs << "lib"
7
+ t.test_files = FileList["test/**/*_test.rb"]
8
+ end
9
+
10
+ task :default => :test
data/address_tokens.gemspec ADDED
@@ -0,0 +1,30 @@
1
+
2
+ lib = File.expand_path("../lib", __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require "address_tokens/version"
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "address_tokens"
8
+ spec.version = AddressTokens::VERSION
9
+ spec.authors = ["Eustaquio Rangel"]
10
+ spec.email = ["taq@eustaquiorangel.com"]
11
+
12
+ spec.summary = %q{Find address tokens on a string}
13
+ spec.description = %q{Always want to find where address, city and state are on a string? Use this gem.}
14
+ spec.homepage = "http://github.com/taq/address_tokens"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
17
+ f.match(%r{^(test|spec|features)/})
18
+ end
19
+ spec.bindir = "exe"
20
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
21
+ spec.require_paths = ["lib"]
22
+
23
+ spec.add_development_dependency "bundler", "~> 1.16"
24
+ spec.add_development_dependency "rake", "~> 10.0"
25
+ spec.add_development_dependency "minitest", "~> 5.0"
26
+ spec.add_development_dependency "i18n"
27
+
28
+ spec.signing_key = '/home/taq/.gemcert/gem-private_key.pem'
29
+ spec.cert_chain = ['gem-public_cert.pem']
30
+ end
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "address_tokens"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/address_tokens/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module AddressTokens
2
+ VERSION = "0.1.3"
3
+ end
data/lib/address_tokens.rb ADDED
@@ -0,0 +1,8 @@
1
+ require 'address_tokens/version'
2
+ require 'city_not_found'
3
+ require 'state_not_found'
4
+ require 'finder'
5
+ require 'matcher'
6
+
7
+ module AddressTokens
8
+ end
data/lib/city_not_found.rb ADDED
@@ -0,0 +1,5 @@
1
+ module AddressTokens
2
+ class CityNotFound < StandardError
3
+ end
4
+ end
5
+
data/lib/finder.rb ADDED
@@ -0,0 +1,57 @@
1
+ require 'yaml'
2
+ require 'i18n'
3
+
4
+ module AddressTokens
5
+ class Finder
6
+ attr_accessor :states, :cities, :state_separator, :string
7
+ attr_reader :city_tokens
8
+
9
+ def initialize(str)
10
+ raise ArgumentError, 'String is null or empty' if str.nil? || str.strip.empty?
11
+ @string, @states, @cities, = str, {}, {}
12
+ I18n.config.available_locales = :en
13
+ @state_separator = ','
14
+ end
15
+
16
+ def load(var, file)
17
+ load_file(var, file)
18
+ end
19
+
20
+ def find
21
+ raise ArgumentError, 'No states found' if @states.size == 0
22
+ raise ArgumentError, 'No cities found' if @cities.size == 0
23
+ transliterate_cities
24
+ Matcher.new(self).match(@string)
25
+ end
26
+
27
+ private
28
+
29
+ def load_file(var, file)
30
+ raise IOError, "File #{file} was not found" if !File.exist?(file)
31
+ data = YAML.load(File.read(file))
32
+ raise TypeError, "File #{file} is not a valid YAML file" if !data.kind_of?(Hash)
33
+ instance_variable_set("@#{var}", data)
34
+ end
35
+
36
+ def transliterate_cities
37
+ @city_tokens = {}
38
+
39
+ for state, cities in @cities
40
+ @city_tokens[state] = []
41
+
42
+ for city in cities
43
+ tokens = []
44
+ city.gsub!(/\s{2,}/, ' ')
45
+ tokens << city
46
+
47
+ tdown = I18n.transliterate(city).downcase
48
+ tokens << tdown
49
+
50
+ splitted = tdown.split
51
+ tokens << splitted if splitted.size > 1
52
+ @city_tokens[state] << tokens
53
+ end
54
+ end
55
+ end
56
+ end
57
+ end
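To make the shape of `city_tokens` easier to picture, here is a minimal sketch of the per-city token list that `transliterate_cities` builds: the original name, a transliterated lowercase form, and (for multi-word names) the list of words. This is an approximation under stated assumptions, not the gem's code: `ascii_fold` and `city_tokens` are hypothetical helper names, and `String#unicode_normalize` stands in for `I18n.transliterate` so that no gem is required. The two can differ for characters that do not decompose into a base letter plus combining marks.

```ruby
# Hypothetical stand-in for I18n.transliterate: decompose accented letters
# (NFD) and drop the combining marks, leaving the base letters.
def ascii_fold(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Builds [original name, folded lowercase name, words] for one city.
# The words array is only appended for names with more than one word,
# mirroring the `splitted.size > 1` check in finder.rb.
def city_tokens(city)
  name   = city.gsub(/\s{2,}/, ' ') # collapse runs of whitespace
  folded = ascii_fold(name).downcase
  words  = folded.split

  tokens = [name, folded]
  tokens << words if words.size > 1
  tokens
end

tokens = city_tokens('São José do  Rio Preto')
p tokens
```

Unlike the `gsub!` call in `finder.rb`, this sketch does not modify the city string it is given, so the loaded data stays untouched.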
data/lib/matcher.rb ADDED
@@ -0,0 +1,90 @@
1
+ module AddressTokens
2
+ class Matcher
3
+ def initialize(finder)
4
+ @finder = finder
5
+ end
6
+
7
+ def match(str)
8
+ state_info = find_state(str)
9
+ city_info = find_city(str, state_info)
10
+ address_info = find_address(str, city_info)
11
+
12
+ {
13
+ state_abbr: state_info[:state],
14
+ state_name: @finder.states[state_info[:state]],
15
+ state_start_at: state_info[:start_at],
16
+ city_name: city_info[:city_name],
17
+ city_string: city_info[:city_string],
18
+ city_start_at: city_info[:start_at],
19
+ address: address_info[:address]
20
+ }
21
+ end
22
+
23
+ private
24
+
25
+ def find_state(str)
26
+ last_char = str.rindex(@finder.state_separator)
27
+ token = str[last_char .. -1]
28
+ token = token.gsub(/\s{2,}/, ' ')
29
+ matches = token.match(Regexp.new("#{Regexp.escape(@finder.state_separator)}\\s?(?<state>\\w+)"))
30
+ { state: matches[:state], start_at: last_char }
31
+ end
32
+
33
+ def find_city(str, state_info)
34
+ cities = @finder.city_tokens[state_info[:state]]
35
+ raise StateNotFound, "State #{state_info[:state]} not found on state data" if cities.nil?
36
+
37
+ without_state = str[0 .. state_info[:start_at] - 1].strip.gsub(/\s{2,}/, ' ')
38
+ transliterated = I18n.transliterate(without_state).downcase
39
+
40
+ exact_or_trans = find_city_by_exact_or_trans(cities, without_state, transliterated)
41
+ return exact_or_trans if !exact_or_trans.nil?
42
+
43
+ tokenized = find_city_by_tokenized(cities, transliterated, str)
44
+ return tokenized if !tokenized.nil?
45
+
46
+ raise CityNotFound, "City not found"
47
+ end
48
+
49
+ def find_city_by_tokenized(cities, transliterated, str)
50
+ choices = []
51
+
52
+ cities.each do |city|
53
+ tokens = transliterated.split
54
+ city_tokens = I18n.transliterate(city[0]).downcase.split
55
+
56
+ if tokens[-1] == city_tokens[-1]
57
+ first_tokens = tokens.map { |token| token[0] }.join
58
+ first_city_tokens = city_tokens.map { |token| token[0] }.join
59
+ first_shorten_city_tokens = city_tokens.select { |token| token.size > 2 }.map { |token| token[0] }.join
60
+
61
+ choices << [city[0], first_tokens, first_city_tokens] if Regexp.new("#{first_city_tokens}$").match? first_tokens
62
+ choices << [city[0], first_tokens, first_shorten_city_tokens] if Regexp.new("#{first_shorten_city_tokens}$").match? first_tokens
63
+ end
64
+ end
65
+
66
+ return nil if choices.size == 0
67
+
68
+ reversed = choices.sort_by { |choice| choice[2].size }.reverse
69
+ regex = reversed[0][2].scan(/./).map { |char| "#{char}[\\p{Latin}\.]*\\s+"}.join.strip[0..-4]
70
+ matches = Regexp.new(regex, 'i').match(str)
71
+ { city_name: reversed[0][0], city_string: matches ? matches[0].strip : nil, start_at: matches ? str.index(matches[0].strip) : -1 }
72
+ end
73
+
74
+ def find_city_by_exact_or_trans(cities, without_state, transliterated)
75
+ cities.each do |city|
76
+ exact = Regexp.new("#{city[0]}$")
77
+ trans = Regexp.new("#{city[1]}$")
78
+
79
+ return { city_name: city[0], start_at: without_state.rindex(city[0]) , city_string: city[0] } if exact.match? without_state
80
+ return { city_name: city[0], start_at: transliterated.rindex(city[1]), city_string: without_state[transliterated.rindex(trans)..-1] } if trans.match? transliterated
81
+ end
82
+ nil
83
+ end
84
+
85
+ def find_address(str, city_info)
86
+ return { address: str[0 ... city_info[:start_at]].strip } if city_info[:start_at] > 0
87
+ { address: str.split(city_info[:city_string])[0].strip }
88
+ end
89
+ end
90
+ end
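The fallback in `find_city_by_tokenized` is the least obvious part of this file, so here is a small sketch of the idea as I read it: compare the last word, compare the first letters of each word, then build a pattern from those initials to locate the abbreviated city in the original string. The helper names `initials`, `initials_match?` and `initials_regex` are hypothetical, and the sketch deliberately swaps in `String#end_with?` and `Regexp.escape` where the original interpolates strings into `Regexp.new`. It is not a drop-in replacement for the method above.

```ruby
# First character of each word, joined: %w[sao jose] -> "sj".
def initials(words)
  words.map { |word| word[0] }.join
end

# A city is a candidate when its last word equals the last word of the
# address and its initials close out the initials of the address.
def initials_match?(address_words, city_words)
  return false unless address_words.last == city_words.last

  initials(address_words).end_with?(initials(city_words))
end

# Pattern that lets each initial be followed by letters or dots, with
# whitespace between words, e.g. "S. J. do Rio Preto" for "sjdrp".
def initials_regex(letters)
  pattern = letters.chars
                   .map { |char| "#{Regexp.escape(char)}[\\p{Latin}.]*" }
                   .join('\\s+')
  Regexp.new(pattern, Regexp::IGNORECASE)
end

original      = 'Rua Tratado de Tordesihas, 88, Pq. Estoril, S. J. do Rio Preto - SP'
address_words = 'rua tratado de tordesihas, 88, pq. estoril, s. j. do rio preto'.split
city_words    = %w[sao jose do rio preto]

candidate = initials_match?(address_words, city_words)
m = initials_regex(initials(city_words)).match(original)
p candidate, m[0], m.begin(0)
```

For this single-line address the pattern matches `S. J. do Rio Preto` starting at offset 44, the same `city_start_at` the README reports for its Brazilian example.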
data/lib/state_not_found.rb ADDED
@@ -0,0 +1,4 @@
1
+ module AddressTokens
2
+ class StateNotFound < StandardError
3
+ end
4
+ end
data.tar.gz.sig ADDED
Binary file
metadata ADDED
@@ -0,0 +1,137 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: address_tokens
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.3
5
+ platform: ruby
6
+ authors:
7
+ - Eustaquio Rangel
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain:
11
+ - |
12
+ -----BEGIN CERTIFICATE-----
13
+ MIIDjjCCAnagAwIBAgIBATANBgkqhkiG9w0BAQUFADBGMRgwFgYDVQQDDA9ldXN0
14
+ YXF1aW9yYW5nZWwxFTATBgoJkiaJk/IsZAEZFgVnbWFpbDETMBEGCgmSJomT8ixk
15
+ ARkWA2NvbTAeFw0xNzA2MDMyMjM3MjdaFw0xODA2MDMyMjM3MjdaMEYxGDAWBgNV
16
+ BAMMD2V1c3RhcXVpb3JhbmdlbDEVMBMGCgmSJomT8ixkARkWBWdtYWlsMRMwEQYK
17
+ CZImiZPyLGQBGRYDY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA
18
+ pI58OaGELZWRnqvArECgbbOOU5ThDxGRoBz/91KaXysVAalPUVtEXQjsZSc6wOKb
19
+ a+m2vEZ6j1LWMfG1xKh07KN/EwcmUoG9uj/U+OcZ+M6YA+DSDBQJozXEUgKiD0e0
20
+ crZqrq+Qo9jg19lPvYcsFS4WumC3gtrx8Fe8K5JTEFR71Md3nnuBuKp1pouYTC3b
21
+ UQ2Xv1URDITKa3N3Kyu/6MJOYBJ2nvDKdEdMFbXxKiC/Njv7eIF+1iNWrzp8osgL
22
+ 5LWITluL2n+/14QVDJehLRuh9Zg9WGT6FEji4PsmNQtZKwYpGGzNX5dDeYZcbhPk
23
+ ri9MK8ztMYBLkNzd4sbbPQIDAQABo4GGMIGDMAkGA1UdEwQCMAAwCwYDVR0PBAQD
24
+ AgSwMB0GA1UdDgQWBBR9bkdMJUe5Lyc1V8NQm1hbjPpKBzAkBgNVHREEHTAbgRll
25
+ dXN0YXF1aW9yYW5nZWxAZ21haWwuY29tMCQGA1UdEgQdMBuBGWV1c3RhcXVpb3Jh
26
+ bmdlbEBnbWFpbC5jb20wDQYJKoZIhvcNAQEFBQADggEBAF8KZx0njczbBRXzjGo0
27
+ mTsAOlB474ZqQjKN1zL/sGw8G4Q/UpeVEclBZrP9P9PUomYliGP38oYM4DHHMpyd
28
+ Qh9Wsou0ZJ5oD0O3nRraOVGFN7Azm05+xJ4fV1Zi8nsUdpF3za7s27Non9cYF4/2
29
+ iwJPWOjl/AxvAS2efkECSGbuZtuNKWrWYMH+aGbtavd5hfb8voGbTFYB4azbQQf+
30
+ ESf+WEicRlMQuQUn314wSFq1pq65S9GrZWMmSP5gc0kKfa51fOmIYmO3eSlmWG/p
31
+ G+/hO8DFcpmRPCD/YXu/rFkKHquizBGSfr1BR/es/HhfrIoT3BUN2uDUT02aaWmB
32
+ JMg=
33
+ -----END CERTIFICATE-----
34
+ date: 2018-01-08 00:00:00.000000000 Z
35
+ dependencies:
36
+ - !ruby/object:Gem::Dependency
37
+ name: bundler
38
+ requirement: !ruby/object:Gem::Requirement
39
+ requirements:
40
+ - - "~>"
41
+ - !ruby/object:Gem::Version
42
+ version: '1.16'
43
+ type: :development
44
+ prerelease: false
45
+ version_requirements: !ruby/object:Gem::Requirement
46
+ requirements:
47
+ - - "~>"
48
+ - !ruby/object:Gem::Version
49
+ version: '1.16'
50
+ - !ruby/object:Gem::Dependency
51
+ name: rake
52
+ requirement: !ruby/object:Gem::Requirement
53
+ requirements:
54
+ - - "~>"
55
+ - !ruby/object:Gem::Version
56
+ version: '10.0'
57
+ type: :development
58
+ prerelease: false
59
+ version_requirements: !ruby/object:Gem::Requirement
60
+ requirements:
61
+ - - "~>"
62
+ - !ruby/object:Gem::Version
63
+ version: '10.0'
64
+ - !ruby/object:Gem::Dependency
65
+ name: minitest
66
+ requirement: !ruby/object:Gem::Requirement
67
+ requirements:
68
+ - - "~>"
69
+ - !ruby/object:Gem::Version
70
+ version: '5.0'
71
+ type: :development
72
+ prerelease: false
73
+ version_requirements: !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - "~>"
76
+ - !ruby/object:Gem::Version
77
+ version: '5.0'
78
+ - !ruby/object:Gem::Dependency
79
+ name: i18n
80
+ requirement: !ruby/object:Gem::Requirement
81
+ requirements:
82
+ - - ">="
83
+ - !ruby/object:Gem::Version
84
+ version: '0'
85
+ type: :development
86
+ prerelease: false
87
+ version_requirements: !ruby/object:Gem::Requirement
88
+ requirements:
89
+ - - ">="
90
+ - !ruby/object:Gem::Version
91
+ version: '0'
92
+ description: Always want to find where address, city and state are on a string? Use
93
+ this gem.
94
+ email:
95
+ - taq@eustaquiorangel.com
96
+ executables: []
97
+ extensions: []
98
+ extra_rdoc_files: []
99
+ files:
100
+ - ".gitignore"
101
+ - ".travis.yml"
102
+ - Gemfile
103
+ - README.md
104
+ - Rakefile
105
+ - address_tokens.gemspec
106
+ - bin/console
107
+ - bin/setup
108
+ - lib/address_tokens.rb
109
+ - lib/address_tokens/version.rb
110
+ - lib/city_not_found.rb
111
+ - lib/finder.rb
112
+ - lib/matcher.rb
113
+ - lib/state_not_found.rb
114
+ homepage: http://github.com/taq/address_tokens
115
+ licenses: []
116
+ metadata: {}
117
+ post_install_message:
118
+ rdoc_options: []
119
+ require_paths:
120
+ - lib
121
+ required_ruby_version: !ruby/object:Gem::Requirement
122
+ requirements:
123
+ - - ">="
124
+ - !ruby/object:Gem::Version
125
+ version: '0'
126
+ required_rubygems_version: !ruby/object:Gem::Requirement
127
+ requirements:
128
+ - - ">="
129
+ - !ruby/object:Gem::Version
130
+ version: '0'
131
+ requirements: []
132
+ rubyforge_project:
133
+ rubygems_version: 2.7.3
134
+ signing_key:
135
+ specification_version: 4
136
+ summary: Find address tokens on a string
137
+ test_files: []
metadata.gz.sig ADDED
Binary file