crm_formatter 2.0 → 2.1

Files changed (5)
  1. checksums.yaml +4 -4
  2. data/README.md +298 -315
  3. data/Rakefile +21 -29
  4. data/lib/crm_formatter/version.rb +1 -1
  5. metadata +1 -1
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 96ed8a01bb47d8aac9c3bb7b95a6e8d261ecc107d6d47c8813088157978239c4
- data.tar.gz: 88125eff101ca1ab5e5fcec3015d8cdb3f5f45571e644cfd54547ab8259f268c
+ metadata.gz: 44e32d7fa28b037a18bcef82d65c47ad11c0bf740e871b5b3fb3716947604a74
+ data.tar.gz: 6e1772615e3b91e3c2fd30b36964378acd8aaf7de2c2a0d0ada055f9a722ba09
  SHA512:
- metadata.gz: 27ba92aa172d3de3813b6338ac77281f840ce7d158a29b54ffa329edf6eac4f2c6394f3608522ec78bb7fe8efaeb9fcd2f034b47e6676c2f275a61e3c39aa2b9
- data.tar.gz: 7b497e726c925b6cf402c3efb20c0baae87adad9bfd5e18d4a024915bc2127533356aa331804282e9325e4b0055db55eb911aa5cb14ec85eb68870d2a78631e5
+ metadata.gz: 2565ec3cd04042a5d3fd0a7bcef2434a10a472692d252e1288f28fd0cc5aae74e2ba46b991ab460e1af2eeaa2bd2c709f995ea1c12a38ca614718588d5c6b17e
+ data.tar.gz: 3bed59dfc2c244be215acc7375b4a75d25f02f79f9e1cb1d7f943a9560dcb09fc1c2006ce6f58a31dedbb8099498ccf435d5fba967cffa40a2623310b8097610
data/README.md CHANGED
@@ -6,352 +6,335 @@ CRM Wrap is perfect for curating high-volume enterprise-scale web scraping, and
6
6
 
7
7
  It's also perfect for processing API data, web forms, and routine DB normalizing and scrubbing processes. Not only does it reformat Address, Phone, and Web data, it can also accept lists to scrub against and then provide detailed reports about how each piece of data compares with your criteria lists.
8
8
 
9
- The CRM Wrap Gem is currently in '--pre versioning', or 'beta mode' as the process of reorganizing these proprietary, production environment processes from their native app environment into this newly created open source gem. Formal tests in the gem environment are still on the way, as is the documentation. But the processes themselves have been very reliable and an integral part of a very large app dedicated to such services.
10
-
11
9
  ## Getting Started
12
10
  CRM Wrap is compatible with Rails 4.2 and 5.0, 5.1 and 5.2 on Ruby 2.2 and later.
13
11
 
14
12
  In your Gemfile add:
15
13
  ```
16
- gem 'crm_formatter', '~> 1.0.8.pre.rc.1'
14
+ gem 'crm_formatter'
17
15
  ```
18
16
  Or to install locally:
19
17
  ```
20
- gem install crm_formatter --pre
18
+ gem install crm_formatter
21
19
  ```
22
20
  ## Usage
23
- Using CRM Wrap in your app is very simple, and could be accessed from your app's concerns, , helpers, lib, models, or services, but depends on the scope, location, and size of your application and server. For simple form submission validations the model callback is typically ideal. For database normalizing tasks the concerns, helpers, or lib is typically ideal. For long running processes like web scraping or high volume APIs calls, like Google Linkedin, or Twitter the lib or services might be ideal (asynchronous multithreaded even better)
24
21
 
25
- ### Class Names
26
- CrmFormatter contains three classes, which can be accessed like below with local or instance variables; you can name them anything you like.
27
- ```
28
- adr_formatter = CrmFormatter::Address.new
29
- @adr_formatter = CrmFormatter::Address.new
30
22
 
31
- ph_formatter = CrmFormatter::Phone.new
32
- @ph_formatter = CrmFormatter::Phone.new
23
+ ### I. Basic Usage
33
24
 
34
- web_formatter = CrmFormatter::Web.new
35
- @web_formatter = CrmFormatter::Web.new
25
+ 1. Format Array of Phone Numbers:
36
26
  ```
37
-
38
- ### Available Methods in Each Class
39
-
40
- ## Address Methods
41
- These are the methods available to you. You can use them a la cart, for example if you just wanted to wrap all your states, or you could combine the entire address into `get_full_address()` which will run each of the related methods for you. It also adds an additional hash pair containing the full address as a single string. There is also an indicator pair to report if there were any changes from the original version to the newly formatted.
42
- ```
43
- addr_formatter = CrmFormatter::Address.new
44
- full_address_hash = {street: street, city: city, state: state, zip: zip}
45
- addr_formatter.get_full_address(full_address_hash)
46
- addr_formatter.format_street(street_string)
47
- addr_formatter.format_city(city_string)
48
- addr_formatter.format_state(state_string)
49
- addr_formatter.format_zip(zip)
50
- addr_formatter.format_full_address(adr = {})
51
- addr_formatter.compare_versions(original, formatted)
52
- ```
53
-
54
- #### Phone Methods
55
- Phone only has two methods, with a subtle but important distinction between them. For simply formatting a known phone, use `format_phone` to convert to the normalized (555) 123-4567 wrap. Use `validate_phone` if either your phone data has a bunch of text and special characters to remove, or if you aren't even sure that it is a phone, as it will help determine if the phone number seem legitimate. If so, it then passes it along to `format_phone`.
56
- ```
57
- ph_formatter = CrmFormatter::Phone.new
58
- ph_formatter.validate_phone(phone)
59
- ph_formatter.format_phone(phone)
60
- ```
61
-
62
- #### Web Methods
63
- The examples on this README are from `format_url` method. The others are for web scraping, which will be documented in the near future.
64
- ```
65
- web_formatter = CrmFormatter::Web.new
66
- web_formatter.format_url(url)
67
- web_formatter.extract_path(url_path)
68
- web_formatter.remove_invalid_links(link)
69
- web_formatter.remove_invalid_hrefs(href)
70
- web_formatter.convert_to_scheme_host(url)
71
- ```
72
-
73
- ## Examples
74
- #### Below are two examples using the Web `format_url(url)` method:
75
-
76
- ### Example 1: 6 Example URLs Submitted:
77
- Custom Method to Query URLs
78
- ```
79
- def self.get_urls
80
- urls = %w(website.com website.business.site website website.fake website.fake.com website.com.fake)
81
- end
82
- ```
83
- Custom Wrapper Method
84
- ```
85
- def self.run_webs
86
- web = CrmFormatter::Web.new
87
- formatted_url_hashes = get_urls.map do |url|
88
- url_hash = web.format_url(url)
89
- end
90
- end
91
- ```
92
- Results as Hash: 3/6 Reformatted due to invalid or no url extensions. 3 Reformatted and Normalized with `http://www.`
93
- URL Extensions, **.com, .net, .fake** cross referenced with official IANA list.
94
- ```
95
- [ {:reformatted=>true, :url_path=>"website.com", :formatted_url=>"http://www.website.com", :neg=>[], :pos=>[]},
96
- {:reformatted=>false, :url_path=>"website.business.site", :formatted_url=>nil, :neg=>["error: ext.valid > 1 [business, site]"], :pos=>[]}, {:reformatted=>false, :url_path=>"website", :formatted_url=>nil, :neg=>["error: ext.none"], :pos=>[]},
97
- {:reformatted=>false, :url_path=>"website.fake", :formatted_url=>nil, :neg=>["error: ext.invalid [fake]"], :pos=>[]},
98
- {:reformatted=>true, :url_path=>"website.fake.com", :formatted_url=>"http://www.website.com", :neg=>[], :pos=>[]},
99
- {:reformatted=>true, :url_path=>"website.com.fake", :formatted_url=>"http://www.website.com", :neg=>[], :pos=>[]}
27
+ array_of_phones = %w[
28
+ 555-457-4391 555-888-4391
29
+ 555-457-4334
30
+ 555-555 555.555.1234
31
+ not_a_number
100
32
  ]
101
- ```
102
-
103
- ### Example 2: 6 Real URLs with Scrubbing Feature, but same configuration as above:
104
- **Intentionally partially obfuscated**
105
- ```
106
- urls = %w(approvXXXutosales.org autXXXartfinance.com leXXXummitautorepair.net melXXXtoyota.com norXXXastacura.com XXXmazda.com)
107
- ```
108
- These results list 'neg' and 'pos', which are the criteria I was scrubbing against. I wanted to find the URLs of franchise auto dealers and exclude ancillary URLs.
109
- ```
110
- [{:reformatted=>true, :url_path=>"approvXXXutosales.org", :formatted_url=>"http://www.approvXXXutosales.org", :neg=>["neg_urls: approv"], :pos=>[]},
111
- {:reformatted=>true, :url_path=>"autXXXartfinance.com", :formatted_url=>"http://www.autXXXartfinance.com", :neg=>["neg_urls: financ"], :pos=>["pos_urls: smart"]},
112
- {:reformatted=>true, :url_path=>"leXXXummitautorepair.net", :formatted_url=>"http://www.leXXXummitautorepair.net", :neg=>["neg_urls: repair"], :pos=>[]},
113
- {:reformatted=>true, :url_path=>"melXXXtoyota.com", :formatted_url=>"http://www.melXXXtoyota.com", :neg=>[], :pos=>["pos_urls: toyota"]},
114
- {:reformatted=>true, :url_path=>"norXXXastacura.com", :formatted_url=>"http://www.norXXXastacura.com", :neg=>[], :pos=>["pos_urls: acura"]},
115
- {:reformatted=>true, :url_path=>"XXXmazda.com", :formatted_url=>"http://www.XXXmazda.com", :neg=>[], :pos=>["pos_urls: mazda"]}
116
- ]
117
- ```
118
33
 
119
- ## Quick Setup Guide
120
-
121
- #### Create a Wrapper with a custom Class and Method(s)
122
- This is just one of several ways to configure. If you only need the gem for formatting form data, you could just create a callback method in your model, but to scrub a database or process API and Harvested data, you'll want a dedicated process so you can manage the queue, criteria, and results. If you don't already have one, this example will show you how. Concerns, Helpers and Models might be fine for smaller tasks, but for heavier tasks Lib and Services are ideal, but depends on your specifications.
123
- ```
124
- # /app/lib/start_crm.rb
125
- ```
126
- ```
127
- class StartCrm
128
- def initialize
129
- @web = CrmFormatter::Web.new
130
- end
131
-
132
- def run_webs
133
- formatted_url_hashes = urls.map do |url|
134
- url_hash = @web.format_url(url)
135
- end
136
- end
137
- end
138
- ```
139
- You may need to edit your application config file to recognize your new class.
34
+ formatted_phone_hashes = CrmFormatter.format_phones(array_of_phones)
35
+ ```
36
+
37
+ Formatted Phone Numbers:
38
+ ```
39
+ formatted_phone_hashes = [
40
+ {
41
+ phone_status: 'formatted',
42
+ phone: '555-457-4391',
43
+ phone_f: '(555) 457-4391'
44
+ },
45
+ {
46
+ phone_status: 'formatted',
47
+ phone: '555-888-4391',
48
+ phone_f: '(555) 888-4391'
49
+ },
50
+ {
51
+ phone_status: 'formatted',
52
+ phone: '555-457-4334',
53
+ phone_f: '(555) 457-4334'
54
+ },
55
+ {
56
+ phone_status: 'invalid',
57
+ phone: '555-555',
58
+ phone_f: nil
59
+ },
60
+ {
61
+ phone_status: 'formatted',
62
+ phone: '555.555.1234',
63
+ phone_f: '(555) 555-1234'
64
+ },
65
+ {
66
+ phone_status: 'invalid',
67
+ phone: 'not_a_number',
68
+ phone_f: nil
69
+ }
70
+ ]
140
71
  ```
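The result hashes above can be split by `phone_status` with plain Ruby. This is a minimal sketch, assuming only the keys shown in the sample output (`phone_status`, `phone`, `phone_f`). The sample data is copied from that output so the gem itself is not needed to run it, and the variable names `clean_phones` and `rejected_inputs` are illustrative.

```ruby
# Sample results copied (abbreviated) from the README output above.
# In real use this array would come from CrmFormatter.format_phones.
formatted_phone_hashes = [
  { phone_status: 'formatted', phone: '555-457-4391', phone_f: '(555) 457-4391' },
  { phone_status: 'invalid',   phone: '555-555',      phone_f: nil },
  { phone_status: 'formatted', phone: '555.555.1234', phone_f: '(555) 555-1234' },
  { phone_status: 'invalid',   phone: 'not_a_number', phone_f: nil }
]

# Split the results into usable numbers and rejects in one pass.
valid, invalid = formatted_phone_hashes.partition do |hash|
  hash[:phone_status] == 'formatted'
end

clean_phones    = valid.map { |hash| hash[:phone_f] }   # normalized numbers
rejected_inputs = invalid.map { |hash| hash[:phone] }   # original bad inputs

puts "formatted: #{clean_phones.inspect}"
puts "rejected:  #{rejected_inputs.inspect}"
```

Keeping the original `phone` value for rejected rows makes it easy to send them back for manual review.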
141
- #/app/config/application.rb
142
72
 
143
- config.eager_load_paths << Rails.root.join('lib/**')
144
- config.eager_load_paths += Dir["#{config.root}/lib/**/"]
145
- ```
146
- #### Run in Rails Console
147
- In this example, we'll run it in Rails Console like below, but you could also create a Rake Task and integrate it with a scheduled Cron Job. You could also run the process through your contoller actions in a GUI. If accessing through the front end, you might want to do it asynchronously with gems like Delayed_job or SideKick so you can free-up your controllers and prevent your front end from freezing while waiting for the job to complete; if running very large tasks.
73
+ 2. Format Array of URLs:
148
74
  ```
149
- 2.5.1 :001 > StartCrm.new.run_webs
150
- ```
151
- #### Instance vs Class Methods in your Wrapper
152
- In the above example, `run_webs` is an instance method, but a class method `self.run_webs` could work well too, like the example below. At lease in the early stages, this is a little easier if you keep running it in Rails C, because not requiring initializing means less to type to call it. Next you could setup your class with various methods to assist your process, like so:
153
- ```
154
- class StartCrm
155
- def self.run_webs
156
- web = CrmFormatter::Web.new
157
-
158
- formatted_url_hashes = query_accounts.map do |act|
159
- url_hsh = web.format_url(act.url)
160
-
161
- if url_hash[:reformatted]
162
-
163
- act_hsh = { url: url_hsh[:formatted_url],
164
- url_sts: url_hsh[:formatted_url],
165
- scrub_date: Time.now
166
- }
167
- else
168
- act_hsh = { scrub_date: Time.now }
169
- end
170
-
171
- act.update(act_hsh)
172
- end
173
- end
75
+ array_of_urls = %w[
76
+ sample01.com/staff
77
+ www.sample02.net.com
78
+ http://www.sample3.net
79
+ www.sample04.net/contact_us
80
+ http://sample05.net
81
+ www.sample06.sofake
82
+ www.sample07.com.sofake
83
+ example08.not.real
84
+ www.sample09.net/staff/management
85
+ www.www.sample10.com
86
+ ]
174
87
 
175
- def self.query_accounts
176
- accounts = Account.where(url_sts: 'Invalid').limit(50)
177
- end
178
- end
88
+ formatted_url_hashes = CrmFormatter.format_urls(array_of_urls)
89
+ ```
90
+
91
+ Formatted URLs:
92
+ ```
93
+ formatted_url_hashes = [
94
+ {
95
+ web_status: 'invalid',
96
+ url: 'www.sample01.net.com',
97
+ url_f: nil,
98
+ url_path: nil,
99
+ web_neg: 'error: ext.valid > 1 [com, net]'
100
+ },
101
+ {
102
+ web_status: 'formatted',
103
+ url: 'sample02.com',
104
+ url_f: 'http://www.sample02.com',
105
+ url_path: nil,
106
+ web_neg: nil
107
+ },
108
+ {
109
+ web_status: 'unchanged',
110
+ url: 'http://www.sample3.net',
111
+ url_f: 'http://www.sample3.net',
112
+ url_path: nil,
113
+ web_neg: nil
114
+ },
115
+ {
116
+ web_status: 'formatted',
117
+ url: 'www.sample04.net/contact_us',
118
+ url_f: 'http://www.sample04.net',
119
+ url_path: '/contact_us',
120
+ web_neg: nil
121
+ },
122
+ {
123
+ web_status: 'formatted',
124
+ url: 'http://sample05.net',
125
+ url_f: 'http://www.sample05.net',
126
+ url_path: nil,
127
+ web_neg: nil
128
+ },
129
+ {
130
+ web_status: 'invalid',
131
+ url: 'www.sample06.sofake',
132
+ url_f: nil,
133
+ url_path: nil,
134
+ web_neg: 'error: ext.invalid [sofake]'
135
+ },
136
+ {
137
+ web_status: 'formatted',
138
+ url: 'www.sample07.com.sofake',
139
+ url_f: 'http://www.sample07.com',
140
+ url_path: nil,
141
+ web_neg: nil
142
+ },
143
+ {
144
+ web_status: 'invalid',
145
+ url: 'example08.not.real',
146
+ url_f: nil,
147
+ url_path: nil,
148
+ web_neg: 'error: ext.invalid [not, real]'
149
+ },
150
+ {
151
+ web_status: 'formatted',
152
+ url: 'www.sample09.net/staff/management',
153
+ url_f: 'http://www.sample09.net',
154
+ url_path: '/staff/management',
155
+ web_neg: nil
156
+ },
157
+ {
158
+ web_status: 'formatted',
159
+ url: 'www.www.sample10.com',
160
+ url_f: 'http://www.sample10.com',
161
+ url_path: nil,
162
+ web_neg: nil
163
+ }
164
+ ]
179
165
  ```
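Because `url_f` holds only the scheme and host while `url_path` holds any trailing path, the two can be rejoined when the full address is needed. The sketch below assumes only the keys shown in the sample output (`web_status`, `url`, `url_f`, `url_path`, `web_neg`) and uses a few rows copied from it; `full_urls` and `errors_by_url` are illustrative names, not part of the gem.

```ruby
# A few result rows copied from the README output above.
# In real use this array would come from CrmFormatter.format_urls.
formatted_url_hashes = [
  { web_status: 'unchanged', url: 'http://www.sample3.net',
    url_f: 'http://www.sample3.net', url_path: nil, web_neg: nil },
  { web_status: 'formatted', url: 'www.sample04.net/contact_us',
    url_f: 'http://www.sample04.net', url_path: '/contact_us', web_neg: nil },
  { web_status: 'invalid', url: 'www.sample06.sofake',
    url_f: nil, url_path: nil, web_neg: 'error: ext.invalid [sofake]' }
]

# Bucket the rows by status for reporting.
by_status = formatted_url_hashes.group_by { |hash| hash[:web_status] }

# Rejoin host and path for every row that produced a formatted URL.
# A nil url_path interpolates to an empty string.
full_urls = formatted_url_hashes
            .reject { |hash| hash[:url_f].nil? }
            .map { |hash| "#{hash[:url_f]}#{hash[:url_path]}" }

# Map each rejected input to the reason reported in web_neg.
errors_by_url = formatted_url_hashes
                .select { |hash| hash[:web_neg] }
                .map { |hash| [hash[:url], hash[:web_neg]] }
                .to_h

puts by_status.keys.inspect
puts full_urls.inspect
puts errors_by_url.inspect
```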
180
166
 
181
- #### Data Response in a Hash
182
- CRM Wrap returns data as a hash, which includes your original unaltered data you submitted, the formatted data, a T/F boolean indicator regarding if the original and formatted data are different, and for some methods, negs and pos regarding your criteria to scrub against. In the above example, the returned data from each submitted url would resemble the one below.
167
+ 3. Format Array of Addresses (each as a hash):
183
168
  ```
184
- # format_url method returns data like below this example...
185
- # url_hash = {:reformatted=>false,
186
- :url_path=>"https://www.steXXXXXXmitsubishiserviceandpartscenter.com",
187
- :formatted_url=>"https://www.steXXXXXXmitsubishiserviceandpartscenter.com",
188
- :neg=>["neg_urls: parts, rv, service"],
189
- :pos=>["pos_urls: mitsubishi"]
169
+ array_of_addresses = [
170
+ { street: '1234 EAST FAIR BOULEVARD', city: 'AUSTIN', state: 'TEXAS', zip: '78734' },
171
+ { street: '5678 North Lake Shore Drive', city: '555-123-4567', state: 'Illinois', zip: '610' },
172
+ { street: '9123 West Flagler Street', city: '1233144', state: 'NotAState', zip: 'Miami' }
173
+ ]
174
+ formatted_address_hashes = CrmFormatter.format_addresses(array_of_addresses)
175
+ ```
176
+
177
+ Formatted Addresses:
178
+ ```
179
+ formatted_address_hashes = [
180
+ {
181
+ address_status: 'formatted',
182
+ full_addr: '1234 East Fair Boulevard, Austin, Texas, 78734',
183
+ full_addr_f: '1234 E Fair Blvd, Austin, TX, 78734',
184
+ street_f: '1234 E Fair Blvd',
185
+ city_f: 'Austin',
186
+ state_f: 'TX',
187
+ zip_f: '78734'
188
+ },
189
+ {
190
+ address_status: 'formatted',
191
+ full_addr: '5678 North Lake Shore Drive, 555-123-4567, Illinois, 610',
192
+ full_addr_f: '5678 N Lake Shore Dr, IL',
193
+ street_f: '5678 N Lake Shore Dr',
194
+ city_f: nil,
195
+ state_f: 'IL',
196
+ zip_f: nil
197
+ },
198
+ {
199
+ address_status: 'formatted',
200
+ full_addr: '9123 West Flagler Street, 1233144, NotAState, Miami',
201
+ full_addr_f: '9123 W Flagler St',
202
+ street_f: '9123 W Flagler St',
203
+ city_f: nil,
204
+ state_f: nil,
205
+ zip_f: nil
190
206
  }
207
+ ]
191
208
  ```
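Note that in the sample output above, `address_status` is `'formatted'` even for rows whose `city_f` or `zip_f` came back `nil`, so a caller may want a separate completeness check. The following is a minimal sketch under that reading of the sample output; `REQUIRED_PARTS` and `missing_parts` are hypothetical helpers, not part of the gem.

```ruby
# Keys that must be present for an address to count as complete
# (a hypothetical rule chosen for this sketch).
REQUIRED_PARTS = %i[street_f city_f state_f zip_f].freeze

# Rows copied (abbreviated) from the README output above.
formatted_address_hashes = [
  { address_status: 'formatted', street_f: '1234 E Fair Blvd',
    city_f: 'Austin', state_f: 'TX', zip_f: '78734' },
  { address_status: 'formatted', street_f: '5678 N Lake Shore Dr',
    city_f: nil, state_f: 'IL', zip_f: nil },
  { address_status: 'formatted', street_f: '9123 W Flagler St',
    city_f: nil, state_f: nil, zip_f: nil }
]

# Returns the required keys whose formatted value is nil.
def missing_parts(address_hash)
  REQUIRED_PARTS.select { |key| address_hash[key].nil? }
end

complete, incomplete = formatted_address_hashes.partition do |hash|
  missing_parts(hash).empty?
end

puts "complete: #{complete.size}, incomplete: #{incomplete.size}"
incomplete.each do |hash|
  puts "#{hash[:street_f]} is missing #{missing_parts(hash).join(', ')}"
end
```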
192
209
 
193
- #### Optional Arguments OA
194
- A class can be instantiated with optional arguments 'OA', to load your criteria to scrub against. Only list the OA K-V Pairs you're using. No need to list empty values. It's not all or nothing. These are empty to illustrate the expected datatypes.
195
- **OA is currently only available for the Web class, but will soon be available in the Address & Phone classes.**
196
-
197
- Below is how the OA are received in the Web class at initialization.
198
- ```
199
- def initialize(args={})
200
- @empty_oa = args.empty?
201
- @pos_urls = args.fetch(:pos_urls, [])
202
- @neg_urls = args.fetch(:neg_urls, [])
203
- @pos_links = args.fetch(:pos_links, [])
204
- @neg_links = args.fetch(:neg_links, [])
205
- @pos_hrefs = args.fetch(:pos_hrefs, [])
206
- @neg_hrefs = args.fetch(:neg_hrefs, [])
207
- @pos_exts = args.fetch(:pos_exts, [])
208
- @neg_exts = args.fetch(:neg_exts, [])
209
- @min_length = args.fetch(:min_length, 2)
210
- @max_length = args.fetch(:max_length, 100)
211
- end
210
+ ### II. Advanced Usage
211
+ Advanced usage can parse a CSV file or accept large data sets passed as hashes. It also uses the Utf8Sanitizer gem to check for and remove non-UTF8 characters and extra whitespace (double spaces, newlines, paragraph breaks, carriage returns, etc.). The results include a detailed report with the line numbers of altered data, along with the before and after for comparison. The sanitized data is then passed to CrmFormatter, which formats all parts of the CRM data together (Address, Phone, Web).
212
+
213
+ Access advanced usage via the `format_with_report(args)` method, passing either a CSV `file_path` or an array of data hashes.
214
+
215
+ 1. Parse and Format CSV via File Path (Must be absolute path to root and follow the syntax as below)
216
+ ```
217
+ formatted_csv_results = CrmFormatter.format_with_report(file_path: './path/to/your/csv.csv')
218
+ ```
219
+
220
+ Parsed & Formatted CSV Results:
221
+ ```
222
+ formatted_csv_results = {
223
+ stats:
224
+ {
225
+ total_rows: 2,
226
+ header_row: 1,
227
+ valid_rows: 1,
228
+ error_rows: 0,
229
+ defective_rows: 0,
230
+ perfect_rows: 0,
231
+ encoded_rows: 1,
232
+ wchar_rows: 0
233
+ },
234
+ data:
235
+ {
236
+ valid_data:
237
+ [
238
+ {
239
+ row_id: 1,
240
+ act_name: 'Courtesy Ford',
241
+ street: '1410 West Pine Street Hattiesburg',
242
+ city: 'Wexford',
243
+ state: 'MS',
244
+ zip: '39401',
245
+ full_addr: '1410 West Pine Street Hattiesburg, Wexford, MS, 39401',
246
+ phone: '512-555-1212',
247
+ url: 'http://www.courtesyfordsales.com',
248
+ street_f: '1410 W Pine St Hattiesburg',
249
+ city_f: 'Wexford',
250
+ state_f: 'MS',
251
+ zip_f: '39401',
252
+ full_addr_f: '1410 W Pine St Hattiesburg, Wexford, MS, 39401',
253
+ phone_f: '(512) 555-1212',
254
+ url_f: 'http://www.courtesyfordsales.com',
255
+ url_path: nil,
256
+ web_neg: nil,
257
+ address_status: 'formatted',
258
+ phone_status: 'formatted',
259
+ web_status: 'unchanged',
260
+ utf_status: 'encoded'
261
+ }
262
+ ],
263
+ encoded_data:
264
+ [
265
+ { row_id: 1,
266
+ text: "http://www.courtesyfordsales.com,Courtesy Ford,__\xD5\xCB\xEB\x8F\xEB__\xD5\xCB\xEB\x8F\xEB____1410 West Pine Street Hattiesburg,Wexford,MS,39401,512-555-1212" }
267
+ ],
268
+ defective_data: [],
269
+ error_data: []
270
+ },
271
+ file_path: './path/to/your/csv.csv'
272
+ }
212
273
  ```
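The report hash above can be read with ordinary hash access. The sketch below assumes only the structure shown in the sample output (`stats`, `data[:valid_data]`, `data[:encoded_data]`) and uses it to find which valid rows were altered during sanitizing by matching on `row_id`. The hash is an abbreviated copy of the sample, and `altered_rows` is an illustrative name.

```ruby
# Abbreviated copy of the report shown above. In real use this hash would
# be the return value of CrmFormatter.format_with_report.
report = {
  stats: { total_rows: 2, header_row: 1, valid_rows: 1, error_rows: 0,
           defective_rows: 0, perfect_rows: 0, encoded_rows: 1, wchar_rows: 0 },
  data: {
    valid_data: [
      { row_id: 1, act_name: 'Courtesy Ford', phone_f: '(512) 555-1212',
        utf_status: 'encoded' }
    ],
    # The raw line is omitted here; the real report carries the original text.
    encoded_data: [{ row_id: 1, text: '(raw CSV line omitted)' }],
    defective_data: [],
    error_data: []
  },
  file_path: './path/to/your/csv.csv'
}

stats = report[:stats]
puts "#{stats[:valid_rows]} valid, #{stats[:error_rows]} errors, " \
     "#{stats[:encoded_rows]} with non-UTF8 characters removed"

# Row ids that needed encoding fixes, then the matching formatted rows.
altered_ids  = report[:data][:encoded_data].map { |row| row[:row_id] }
altered_rows = report[:data][:valid_data].select do |row|
  altered_ids.include?(row[:row_id])
end

altered_rows.each { |row| puts "row #{row[:row_id]}: #{row[:act_name]}" }
```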
213
274
 
214
- Below is the syntax for how to use OA. Positive and Negative options available, and essentially function the same, but allow additional options for scrubbing data.
215
- ```
216
- oa_args = { neg_urls: %w(approv insur invest loan quick rent repair),
217
- neg_links: %w(buy call cash cheap click gas insta),
218
- neg_hrefs: %w(after anounc apply approved blog buy call click),
219
- neg_exts: %w(au ca edu es gov in ru uk us),
220
- min_length: 0,
221
- max_length: 30
275
+ 2. Format Data Hashes
276
+ ```
277
+ data_hashes_array = [{ row_id: '1', url: 'abcacura.com/twitter', act_name: "Stanley Chevrolet Kaufman\x99_\xCC", street: '825 East Fair Street', city: 'Kaufman', state: 'Texas', zip: '75142', phone: "555-457-4391\r\n" }]
278
+
279
+ formatted_data_hash_results = CrmFormatter.format_with_report(data: data_hashes_array)
280
+ ```
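When the data already lives in application objects, the array of hashes can be built with a small mapping step. This is a sketch only: `Account` and `to_data_hash` are hypothetical stand-ins for your own records, and the key names and the string `row_id` simply mirror the example above rather than a documented requirement.

```ruby
# Hypothetical record type standing in for whatever your app stores.
Account = Struct.new(:id, :website, :name, :street, :city, :state, :zip, :phone)

# Maps one record to a hash using the keys from the README example.
def to_data_hash(account)
  {
    row_id: account.id.to_s, # the example above passes row_id as a string
    url: account.website,
    act_name: account.name,
    street: account.street,
    city: account.city,
    state: account.state,
    zip: account.zip,
    phone: account.phone
  }
end

accounts = [
  Account.new(1, 'abcacura.com/twitter', 'Stanley Chevrolet Kaufman',
              '825 East Fair Street', 'Kaufman', 'Texas', '75142', '555-457-4391')
]

data_hashes_array = accounts.map { |account| to_data_hash(account) }
puts data_hashes_array.inspect

# The array could then be passed on as in the example above:
# CrmFormatter.format_with_report(data: data_hashes_array)
```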
281
+
282
+ Formatted Data Hashes Results:
283
+ ```
284
+ formatted_data_hash_results = { stats:
285
+ {
286
+ total_rows: '1',
287
+ header_row: 1,
288
+ valid_rows: 1,
289
+ error_rows: 0,
290
+ defective_rows: 0,
291
+ perfect_rows: 0,
292
+ encoded_rows: 1,
293
+ wchar_rows: 1
294
+ },
295
+ data:
296
+ {
297
+ valid_data:
298
+ [
299
+ {
300
+ row_id: '1',
301
+ act_name: 'Stanley Chevrolet Kaufman',
302
+ street: '825 East Fair Street',
303
+ city: 'Kaufman',
304
+ state: 'Texas',
305
+ zip: '75142',
306
+ full_addr: '825 East Fair Street, Kaufman, Texas, 75142',
307
+ phone: '555-457-4391',
308
+ url: 'abcacura.com/twitter',
309
+ street_f: '825 E Fair St',
310
+ city_f: 'Kaufman',
311
+ state_f: 'TX',
312
+ zip_f: '75142',
313
+ full_addr_f: '825 E Fair St, Kaufman, TX, 75142',
314
+ phone_f: '(555) 457-4391',
315
+ url_f: 'http://www.abcacura.com',
316
+ url_path: '/twitter',
317
+ web_neg: nil,
318
+ address_status: 'formatted',
319
+ phone_status: 'formatted',
320
+ web_status: 'formatted',
321
+ utf_status: 'encoded, wchar'
322
+ }
323
+ ],
324
+ encoded_data:
325
+ [
326
+ {
327
+ row_id: '1',
328
+ text: "1,abcacura.com/twitter,Stanley Chevrolet Kaufman\x99_\xCC,825 East Fair Street,Kaufman,Texas,75142,555-457-4391\r\n"
222
329
  }
223
- @web_formatter = CrmFormatter::Web.new(oa_args)
330
+ ],
331
+ defective_data: [],
332
+ error_data: []
333
+ },
334
+ file_path: nil
335
+ }
224
336
  ```
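In the sample output, `utf_status` is a comma-separated string such as `'encoded, wchar'`. A short sketch for counting how often each flag occurs across rows follows; it assumes that `', '` is always the separator, which is inferred from the single sample value, and the second row is hypothetical data added for illustration.

```ruby
# First row mirrors the sample output above; the second row is hypothetical.
valid_data = [
  { row_id: '1', utf_status: 'encoded, wchar' },
  { row_id: '2', utf_status: 'encoded' }
]

# Count each flag; to_s guards against a nil utf_status.
flag_counts = Hash.new(0)
valid_data.each do |row|
  row[:utf_status].to_s.split(', ').each { |flag| flag_counts[flag] += 1 }
end

flag_counts.each { |flag, count| puts "#{flag}: #{count}" }
```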
225
337
 
226
- ### III. Detailed Examples
227
- Some of the examples are excessively verbose to help illustrate the datatypes and processes. Here are a few guidelines and tips:
228
-
229
- *These are just examples, not strict usage guides ...*
230
-
231
- #### 1. Address Examples
232
- ```
233
- def self.run_adrs
234
-
235
- crm_address_formatter = CrmFormatter::Address.new
236
-
237
- contacts = Contact.where.not(full_address: nil)
238
-
239
- contacts.each do |contact|
240
-
241
- cont_adr_hsh = { street: contact.street, city: contact.city,
242
- state: contact.state, zip: contact.zip }
243
-
244
- formatted_address_hsh = crm_address_formatter.format_full_address(cont_adr_hsh)
245
-
246
- end
247
-
248
- end
249
-
250
- ```
251
-
252
- #### 2. Phone Examples
253
- In the phone example, format_all_phone_in_my_db could be a custom wrapper method, which when called by Rails C or from a front end GUI process, could grab all phones in db meeting certain criteria to be scrubbed. The results will always be in hash wrap, such as below.... phone_hash
254
- ```
255
- @crm_phone = CrmFormatter::Phone.new
256
-
257
- def self.format_all_phone_in_my_db
258
- phones_from_contacts = Contacts.where.not(phone: nil)
259
-
260
- phones_from_contacts.each do |contact|
261
- phone_hash = @crm_phone.validate_phone(contact.phone)
262
- end
263
-
264
- end
265
-
266
- phone_hash = { phone: 555-123-4567, phone_f: (555) 123-4567, phone_status: true }
267
- ```
268
-
269
- #### 3. Web Examples
270
- The steps below will show you an option for how you could integrate larger processes in your app. Create a wrapper method you can call from an action or Rails C. In this example, a new class was also created in Lib for that purpose, as there could be related methods to create.
271
- ```
272
- # /app/lib/start_crm.rb
273
-
274
- class StartCrm
275
-
276
- ##Rails C: StartCrm.run_webs
277
- def self.run_webs
278
- oa_args = get_args
279
- web = CrmFormatter::Web.new(oa_args)
280
-
281
- formatted_url_hashes = get_urls.map do |url|
282
- url_hash = web.format_url(url)
283
- end
284
-
285
- formatted_url_hashes
286
- end
287
-
288
- end
289
- ```
290
- Application Config
291
- ```
292
- #/app/config/application.rb
293
-
294
- config.eager_load_paths << Rails.root.join('lib/**')
295
- config.eager_load_paths += Dir["#{config.root}/lib/**/"]
296
- ```
297
- Create your db query or put together a list of URLs to process, along with any OA to include. The below example is very verbose, but designed to be helpful. In reality, you might have various criteria saved in the db rather than writing it out.
298
- In this example, we have auto dealer URLs. In this process, we're focusing on franchise dealers.
299
- ```
300
- def self.get_args
301
- neg_urls = %w(approv avis budget collis eat enterprise facebook financ food google gourmet hertz hotel hyatt insur invest loan lube mobility motel motorola parts quick rent repair restaur rv ryder service softwar travel twitter webhost yellowpages yelp youtube)
302
-
303
- pos_urls = ["acura", "alfa romeo", "aston martin", "audi", "bmw", "bentley", "bugatti", "buick", "cdjr", "cadillac", "chevrolet", "chrysler", "dodge", "ferrari", "fiat", "ford", "gmc", "group", "group", "honda", "hummer", "hyundai", "infiniti", "isuzu", "jaguar", "jeep", "kia", "lamborghini", "lexus", "lincoln", "lotus", "mini", "maserati", "mazda", "mclaren", "mercedes-benz", "mitsubishi", "nissan", "porsche", "ram", "rolls-royce", "saab", "scion", "smart", "subaru", "suzuki", "toyota", "volkswagen", "volvo"]
304
-
305
- neg_exts = %w(au ca edu es gov in ru uk us)
306
- oa_args = {neg_urls: neg_urls, pos_urls: pos_urls, neg_exts: neg_exts}
307
- end
308
-
309
- def self.get_urls
310
- urls = ["https://www.stevXXXXXXmitsubishiserviceandpartscenter.com", "https://www.perXXXXXXchryslerjeepcenterville.com", "http://www.peXXXXXXchryslerjeepcenterville.com", "http://www.colXXXXXXchryslerdodgejeepram.com"]
311
- end
312
- ```
313
- Run your class and wrapper method in Rails C. By creating the wrapper method, you have set up the entire process to run like a runner. In reality, you might have several different criteria accessible from a GUI or even running in Cron Jobs.
314
- ```
315
- 2.5.1 :001 > StartCrm.run_webs
316
- ```
317
- Results are always in a Hash, like below. The URLs are slightly obfuscated out of respect (it's not a bug). These are examples from a large DB that runs on a loop 24/7 and gets to each organization about once a week, so it's already pretty well up to date, so there aren't any big changes below, but there are still a few things to point out below the code example.
318
- ```
319
- [ {:reformatted=>false,
320
- :url_path=>"https://www.steXXXXXXmitsubishiserviceandpartscenter.com",
321
- :formatted_url=>"https://www.steXXXXXXmitsubishiserviceandpartscenter.com",
322
- :neg=>["neg_urls: parts, rv, service"],
323
- :pos=>["pos_urls: mitsubishi"]},
324
-
325
- {:reformatted=>false,
326
- :url_path=>"https://www.perXXXXXXchryslerjeepcenterville.com",
327
- :formatted_url=>"https://www.perXXXXXXchryslerjeepcenterville.com",
328
- :neg=>["neg_urls: rv"],
329
- :pos=>["pos_urls: chrysler, jeep"]},
330
-
331
- {:reformatted=>false,
332
- :url_path=>"http://www.pXXXXXXchryslerjeepcenterville.com",
333
- :formatted_url=>"http://www.XXXXXXechryslerjeepcenterville.com",
334
- :neg=>["neg_urls: rv"],
335
- :pos=>["pos_urls: chrysler, jeep"]},
336
-
337
- {:reformatted=>false,
338
- :url_path=>"http://www.colXXXXXXchryslerdodgejeepram.com",
339
- :formatted_url=>"http://www.colXXXXXXchryslerdodgejeepram.com",
340
- :neg=>["neg_urls: rv"],
341
- :pos=>["pos_urls: chrysler, dodge, jeep, ram"]}
342
- ]
343
- ```
344
- `:reformatted` indicates T/F if url_path and `:formatted_url` differ. If False, then it means they are the same, or the `:url_path` had significant errors which prevented it from being formatted, thus `:formatted_url` would be nil in such a case. The reality is that you might have some URLs that are so far off that, that they can't be reliably reformatted, so better to only let them pass if we are confident that they are reliable.
345
-
346
- `:url_path` is the url originally submitted by the client. It can include directory links on the end too, '/careers/, '/about-us/', etc.
347
-
348
- `:formatted_url` is the formatted version of `:url_path`. It will be stripped of additional paths, '/deals/', '/staff/', etc. Also, often times people ommit 'http://:' and 'www' in CRMs. This can sometimes cause errors for users or Mechanized Web Scrapers. So, those will always be included to ensure consistency. In our production app we follow up the formatting with url redirect following, which our configurations require the entire path, so it will always be included. The redirect following gem is already being worked on and will be released as an additional gem shortly.
349
-
350
- `:neg` is an array of all the errors and negative, undesirable criteria to scrub against. If you include the criteria in OA `neg_urls:`, like above, it will automatically scrub and report. Regardless, any errors will also be included in there. So, if the url was not ultimately formatted, there will be details regarding why in `:neg`.
351
-
352
- `:pos` is the opposite, which highlights positive criteria you might be looking for. It too is available in OA via `pos_urls:`, like above.
353
-
354
-
355
338
  ## Author
356
339
 
357
340
  Adam J Booth - [4rlm](https://github.com/4rlm)
data/Rakefile CHANGED
@@ -15,11 +15,10 @@ task :console do
  require 'active_support/all'
  ARGV.clear
 
- formatted_data = format_with_report
- # formatted_phones = format_phones
+ # formatted_data = format_with_report
+ formatted_phones = format_phones
  # formatted_urls = format_urls
  # formatted_addresses = format_addresses
- binding.pry
  IRB.start
  end
 
@@ -43,36 +42,29 @@ def format_addresses
  end
 
  def format_phones
- array_of_phones = %w[555-457-4391 555-888-4391 555-457-4334 555-555 555.555.1234 not_a_number]
+ array_of_phones = %w[
+ 555-457-4391 555-888-4391
+ 555-457-4334
+ 555-555 555.555.1234
+ not_a_number
+ ]
  formatted_phones = CrmFormatter.format_phones(array_of_phones)
  end
 
  def format_urls
+ array_of_urls = %w[
+ sample01.com/staff
+ www.sample02.net.com
+ http://www.sample3.net
+ www.sample04.net/contact_us
+ http://sample05.net
+ www.sample06.sofake
+ www.sample07.com.sofake
+ example08.not.real
+ www.sample09.net/staff/management
+ www.www.sample10.com
+ ]
+
  array_of_urls = %w[sample01.com/staff www.sample02.net.com http://www.sample3.net www.sample04.net/contact_us http://sample05.net www.sample06.sofake www.sample07.com.sofake example08.not.real www.sample09.net/staff/management www.www.sample10.com]
  formatted_urls = CrmFormatter.format_urls(array_of_urls)
  end
-
- # gem install activesupport -v 5.0.0
- # gem install activesupport
-
- ##################################################################
- ####### !ORIGINAL! SAVE #######
- # Perfect!
- # 1. 'cd-crm' ||crm_formatter/lib
- # 2. Load runner at bottom before start.
- # 3. Allows for Active Record & Binding.pry.
- # task :console do
- # require 'irb'
- # require 'irb/completion'
- # require 'crm_formatter' # You know what to do.
- # ARGV.clear
- # CrmFormatter.run
- # IRB.start
- # end
- #############################
- # alias xx='exit exit'
- # alias ss='rake console'
- # alias cd-crm="cd ~/Desktop/gemdev/crm_formatter"
- # alias cd-gem.app="cd ~/Desktop/gemdev/gem_tester"
- # alias cd-lib="cd ~/Desktop/gemdev/crm_formatter/lib"
- #############################
data/lib/crm_formatter/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: false
 
  module CrmFormatter
- VERSION = "2.0"
+ VERSION = "2.1"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: crm_formatter
  version: !ruby/object:Gem::Version
- version: '2.0'
+ version: '2.1'
  platform: ruby
  authors:
  - Adam Booth