crm_formatter 2.0 → 2.1
- checksums.yaml +4 -4
- data/README.md +298 -315
- data/Rakefile +21 -29
- data/lib/crm_formatter/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 44e32d7fa28b037a18bcef82d65c47ad11c0bf740e871b5b3fb3716947604a74
|
4
|
+
data.tar.gz: 6e1772615e3b91e3c2fd30b36964378acd8aaf7de2c2a0d0ada055f9a722ba09
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 2565ec3cd04042a5d3fd0a7bcef2434a10a472692d252e1288f28fd0cc5aae74e2ba46b991ab460e1af2eeaa2bd2c709f995ea1c12a38ca614718588d5c6b17e
|
7
|
+
data.tar.gz: 3bed59dfc2c244be215acc7375b4a75d25f02f79f9e1cb1d7f943a9560dcb09fc1c2006ce6f58a31dedbb8099498ccf435d5fba967cffa40a2623310b8097610
|
data/README.md
CHANGED
@@ -6,352 +6,335 @@ CRM Wrap is perfect for curating high-volume enterprise-scale web scraping, and
|
|
6
6
|
|
7
7
|
It's also perfect for processing API data, Web Forms, and routine DB normalizing and scrubbing processes. Not only does it reformat Address, Phone, and Web data, it can also accept lists to scrub against and then provide detailed reports about how each piece of data compares with your criteria lists.
|
8
8
|
|
9
|
-
The CRM Wrap Gem is currently in '--pre versioning', or 'beta mode', while these proprietary, production-environment processes are reorganized from their native app environment into this newly created open source gem. Formal tests in the gem environment are still on the way, as is the documentation. But the processes themselves have been very reliable and are an integral part of a very large app dedicated to such services.
|
10
|
-
|
11
9
|
## Getting Started
|
12
10
|
CRM Wrap is compatible with Rails 4.2, 5.0, 5.1, and 5.2 on Ruby 2.2 and later.
|
13
11
|
|
14
12
|
In your Gemfile add:
|
15
13
|
```
|
16
|
-
gem 'crm_formatter'
|
14
|
+
gem 'crm_formatter'
|
17
15
|
```
|
18
16
|
Or to install locally:
|
19
17
|
```
|
20
|
-
gem install crm_formatter
|
18
|
+
gem install crm_formatter
|
21
19
|
```
|
22
20
|
## Usage
|
23
|
-
Using CRM Wrap in your app is very simple. It can be accessed from your app's concerns, helpers, lib, models, or services, depending on the scope, location, and size of your application and server. For simple form submission validations, a model callback is typically ideal. For database normalizing tasks, the concerns, helpers, or lib are typically ideal. For long-running processes like web scraping or high-volume API calls (Google, LinkedIn, or Twitter), the lib or services might be ideal (asynchronous and multithreaded is even better).
|
24
21
|
|
25
|
-
### Class Names
|
26
|
-
CrmFormatter contains three classes, which can be accessed like below with local or instance variables; you can name them anything you like.
|
27
|
-
```
|
28
|
-
adr_formatter = CrmFormatter::Address.new
|
29
|
-
@adr_formatter = CrmFormatter::Address.new
|
30
22
|
|
31
|
-
|
32
|
-
@ph_formatter = CrmFormatter::Phone.new
|
23
|
+
### I. Basic Usage
|
33
24
|
|
34
|
-
|
35
|
-
@web_formatter = CrmFormatter::Web.new
|
25
|
+
1. Format Array of Phone Numbers:
|
36
26
|
```
|
37
|
-
|
38
|
-
|
39
|
-
|
40
|
-
|
41
|
-
|
42
|
-
```
|
43
|
-
addr_formatter = CrmFormatter::Address.new
|
44
|
-
full_address_hash = {street: street, city: city, state: state, zip: zip}
|
45
|
-
addr_formatter.get_full_address(full_address_hash)
|
46
|
-
addr_formatter.format_street(street_string)
|
47
|
-
addr_formatter.format_city(city_string)
|
48
|
-
addr_formatter.format_state(state_string)
|
49
|
-
addr_formatter.format_zip(zip)
|
50
|
-
addr_formatter.format_full_address(adr = {})
|
51
|
-
addr_formatter.compare_versions(original, formatted)
|
52
|
-
```
|
53
|
-
|
54
|
-
#### Phone Methods
|
55
|
-
Phone only has two methods, with a subtle but important distinction between them. To simply format a known phone, use `format_phone` to convert it to the normalized (555) 123-4567 format. Use `validate_phone` if your phone data has a lot of text and special characters to remove, or if you aren't even sure that it is a phone number, as it helps determine whether the number seems legitimate. If so, it then passes it along to `format_phone`.
|
56
|
-
```
|
57
|
-
ph_formatter = CrmFormatter::Phone.new
|
58
|
-
ph_formatter.validate_phone(phone)
|
59
|
-
ph_formatter.format_phone(phone)
|
60
|
-
```
|
61
|
-
|
62
|
-
#### Web Methods
|
63
|
-
The examples on this README are from `format_url` method. The others are for web scraping, which will be documented in the near future.
|
64
|
-
```
|
65
|
-
web_formatter = CrmFormatter::Web.new
|
66
|
-
web_formatter.format_url(url)
|
67
|
-
web_formatter.extract_path(url_path)
|
68
|
-
web_formatter.remove_invalid_links(link)
|
69
|
-
web_formatter.remove_invalid_hrefs(href)
|
70
|
-
web_formatter.convert_to_scheme_host(url)
|
71
|
-
```
|
72
|
-
|
73
|
-
## Examples
|
74
|
-
#### Below are two examples using the Web `format_url(url)` method:
|
75
|
-
|
76
|
-
### Example 1: 6 Example URLs Submitted:
|
77
|
-
Custom Method to Query URLs
|
78
|
-
```
|
79
|
-
def self.get_urls
|
80
|
-
urls = %w(website.com website.business.site website website.fake website.fake.com website.com.fake)
|
81
|
-
end
|
82
|
-
```
|
83
|
-
Custom Wrapper Method
|
84
|
-
```
|
85
|
-
def self.run_webs
|
86
|
-
web = CrmFormatter::Web.new
|
87
|
-
formatted_url_hashes = get_urls.map do |url|
|
88
|
-
url_hash = web.format_url(url)
|
89
|
-
end
|
90
|
-
end
|
91
|
-
```
|
92
|
-
Results as Hash: 3/6 not reformatted due to invalid or missing URL extensions. 3/6 reformatted and normalized with `http://www.`
|
93
|
-
URL extensions (**.com, .net, .fake**) are cross-referenced with the official IANA list.
|
94
|
-
```
|
95
|
-
[ {:reformatted=>true, :url_path=>"website.com", :formatted_url=>"http://www.website.com", :neg=>[], :pos=>[]},
|
96
|
-
{:reformatted=>false, :url_path=>"website.business.site", :formatted_url=>nil, :neg=>["error: ext.valid > 1 [business, site]"], :pos=>[]}, {:reformatted=>false, :url_path=>"website", :formatted_url=>nil, :neg=>["error: ext.none"], :pos=>[]},
|
97
|
-
{:reformatted=>false, :url_path=>"website.fake", :formatted_url=>nil, :neg=>["error: ext.invalid [fake]"], :pos=>[]},
|
98
|
-
{:reformatted=>true, :url_path=>"website.fake.com", :formatted_url=>"http://www.website.com", :neg=>[], :pos=>[]},
|
99
|
-
{:reformatted=>true, :url_path=>"website.com.fake", :formatted_url=>"http://www.website.com", :neg=>[], :pos=>[]}
|
27
|
+
array_of_phones = %w[
|
28
|
+
555-457-4391 555-888-4391
|
29
|
+
555-457-4334
|
30
|
+
555-555 555.555.1234
|
31
|
+
not_a_number
|
100
32
|
]
|
101
|
-
```
|
102
|
-
|
103
|
-
### Example 2: 6 Real URLs with Scrubbing Feature, but same configuration as above:
|
104
|
-
**Intentionally partially obfuscated**
|
105
|
-
```
|
106
|
-
urls = %w(approvXXXutosales.org autXXXartfinance.com leXXXummitautorepair.net melXXXtoyota.com norXXXastacura.com XXXmazda.com)
|
107
|
-
```
|
108
|
-
These results list 'neg' and 'pos', which are the criteria I was scrubbing against. I wanted to find the URLs of franchise auto dealers and exclude ancillary URLs.
|
109
|
-
```
|
110
|
-
[{:reformatted=>true, :url_path=>"approvXXXutosales.org", :formatted_url=>"http://www.approvXXXutosales.org", :neg=>["neg_urls: approv"], :pos=>[]},
|
111
|
-
{:reformatted=>true, :url_path=>"autXXXartfinance.com", :formatted_url=>"http://www.autXXXartfinance.com", :neg=>["neg_urls: financ"], :pos=>["pos_urls: smart"]},
|
112
|
-
{:reformatted=>true, :url_path=>"leXXXummitautorepair.net", :formatted_url=>"http://www.leXXXummitautorepair.net", :neg=>["neg_urls: repair"], :pos=>[]},
|
113
|
-
{:reformatted=>true, :url_path=>"melXXXtoyota.com", :formatted_url=>"http://www.melXXXtoyota.com", :neg=>[], :pos=>["pos_urls: toyota"]},
|
114
|
-
{:reformatted=>true, :url_path=>"norXXXastacura.com", :formatted_url=>"http://www.norXXXastacura.com", :neg=>[], :pos=>["pos_urls: acura"]},
|
115
|
-
{:reformatted=>true, :url_path=>"XXXmazda.com", :formatted_url=>"http://www.XXXmazda.com", :neg=>[], :pos=>["pos_urls: mazda"]}
|
116
|
-
]
|
117
|
-
```
|
118
33
|
|
119
|
-
|
120
|
-
|
121
|
-
|
122
|
-
|
123
|
-
```
|
124
|
-
|
125
|
-
|
126
|
-
|
127
|
-
|
128
|
-
|
129
|
-
|
130
|
-
|
131
|
-
|
132
|
-
|
133
|
-
|
134
|
-
|
135
|
-
|
136
|
-
|
137
|
-
|
138
|
-
|
139
|
-
|
34
|
+
formatted_phone_hashes = CrmFormatter.format_phones(array_of_phones)
|
35
|
+
```
|
36
|
+
|
37
|
+
Formatted Phone Numbers:
|
38
|
+
```
|
39
|
+
formatted_phone_hashes = [
|
40
|
+
{
|
41
|
+
phone_status: 'formatted',
|
42
|
+
phone: '555-457-4391',
|
43
|
+
phone_f: '(555) 457-4391'
|
44
|
+
},
|
45
|
+
{
|
46
|
+
phone_status: 'formatted',
|
47
|
+
phone: '555-888-4391',
|
48
|
+
phone_f: '(555) 888-4391'
|
49
|
+
},
|
50
|
+
{
|
51
|
+
phone_status: 'formatted',
|
52
|
+
phone: '555-457-4334',
|
53
|
+
phone_f: '(555) 457-4334'
|
54
|
+
},
|
55
|
+
{
|
56
|
+
phone_status: 'invalid',
|
57
|
+
phone: '555-555',
|
58
|
+
phone_f: nil
|
59
|
+
},
|
60
|
+
{
|
61
|
+
phone_status: 'formatted',
|
62
|
+
phone: '555.555.1234',
|
63
|
+
phone_f: '(555) 555-1234'
|
64
|
+
},
|
65
|
+
{
|
66
|
+
phone_status: 'invalid',
|
67
|
+
phone: 'not_a_number',
|
68
|
+
phone_f: nil
|
69
|
+
}
|
70
|
+
]
|
140
71
|
```
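The result hashes above can be split by `phone_status` before anything is written back to a CRM. The following is a minimal sketch in plain Ruby that assumes only the hash shape shown in this README (`phone_status`, `phone`, `phone_f`). The helper name `split_phone_results` is hypothetical and not part of crm_formatter, and the sample rows are copied by hand from the output above rather than produced by calling the gem.

```ruby
# Hypothetical helper (not part of crm_formatter): separate formatted
# numbers from inputs that were rejected and need manual review.
def split_phone_results(phone_hashes)
  formatted, rejected = phone_hashes.partition { |h| h[:phone_status] == 'formatted' }
  {
    formatted: formatted.map { |h| h[:phone_f] }, # normalized numbers
    rejected: rejected.map { |h| h[:phone] }      # original, unusable inputs
  }
end

# Rows hand-copied from the sample output above.
sample = [
  { phone_status: 'formatted', phone: '555-457-4391', phone_f: '(555) 457-4391' },
  { phone_status: 'invalid',   phone: '555-555',      phone_f: nil },
  { phone_status: 'formatted', phone: '555.555.1234', phone_f: '(555) 555-1234' },
  { phone_status: 'invalid',   phone: 'not_a_number', phone_f: nil }
]

summary = split_phone_results(sample)
puts summary[:formatted].inspect
puts summary[:rejected].inspect
```

Keeping the original `phone` value for rejected rows makes it easier to trace a bad record back to its source.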
|
141
|
-
#/app/config/application.rb
|
142
72
|
|
143
|
-
|
144
|
-
config.eager_load_paths += Dir["#{config.root}/lib/**/"]
|
145
|
-
```
|
146
|
-
#### Run in Rails Console
|
147
|
-
In this example, we'll run it in the Rails console as shown below, but you could also create a Rake task and integrate it with a scheduled cron job. You could also run the process through your controller actions in a GUI. If accessing it through the front end and running very large tasks, you might want to do it asynchronously with gems like Delayed Job or Sidekiq, so you can free up your controllers and prevent your front end from freezing while waiting for the job to complete.
|
73
|
+
2. Format Array of URLs:
|
148
74
|
```
|
149
|
-
|
150
|
-
|
151
|
-
|
152
|
-
|
153
|
-
|
154
|
-
|
155
|
-
|
156
|
-
|
157
|
-
|
158
|
-
|
159
|
-
|
160
|
-
|
161
|
-
if url_hash[:reformatted]
|
162
|
-
|
163
|
-
act_hsh = { url: url_hsh[:formatted_url],
|
164
|
-
url_sts: url_hsh[:formatted_url],
|
165
|
-
scrub_date: Time.now
|
166
|
-
}
|
167
|
-
else
|
168
|
-
act_hsh = { scrub_date: Time.now }
|
169
|
-
end
|
170
|
-
|
171
|
-
act.update(act_hsh)
|
172
|
-
end
|
173
|
-
end
|
75
|
+
array_of_urls = %w[
|
76
|
+
sample01.com/staff
|
77
|
+
www.sample02.net.com
|
78
|
+
http://www.sample3.net
|
79
|
+
www.sample04.net/contact_us
|
80
|
+
http://sample05.net
|
81
|
+
www.sample06.sofake
|
82
|
+
www.sample07.com.sofake
|
83
|
+
example08.not.real
|
84
|
+
www.sample09.net/staff/management
|
85
|
+
www.www.sample10.com
|
86
|
+
]
|
174
87
|
|
175
|
-
|
176
|
-
|
177
|
-
|
178
|
-
|
88
|
+
formatted_url_hashes = CrmFormatter.format_urls(array_of_urls)
|
89
|
+
```
|
90
|
+
|
91
|
+
Formatted URLs:
|
92
|
+
```
|
93
|
+
formatted_url_hashes = [
|
94
|
+
{
|
95
|
+
web_status: 'invalid',
|
96
|
+
url: 'www.sample01.net.com',
|
97
|
+
url_f: nil,
|
98
|
+
url_path: nil,
|
99
|
+
web_neg: 'error: ext.valid > 1 [com, net]'
|
100
|
+
},
|
101
|
+
{
|
102
|
+
web_status: 'formatted',
|
103
|
+
url: 'sample02.com',
|
104
|
+
url_f: 'http://www.sample02.com',
|
105
|
+
url_path: nil,
|
106
|
+
web_neg: nil
|
107
|
+
},
|
108
|
+
{
|
109
|
+
web_status: 'unchanged',
|
110
|
+
url: 'http://www.sample3.net',
|
111
|
+
url_f: 'http://www.sample3.net',
|
112
|
+
url_path: nil,
|
113
|
+
web_neg: nil
|
114
|
+
},
|
115
|
+
{
|
116
|
+
web_status: 'formatted',
|
117
|
+
url: 'www.sample04.net/contact_us',
|
118
|
+
url_f: 'http://www.sample04.net',
|
119
|
+
url_path: '/contact_us',
|
120
|
+
web_neg: nil
|
121
|
+
},
|
122
|
+
{
|
123
|
+
web_status: 'formatted',
|
124
|
+
url: 'http://sample05.net',
|
125
|
+
url_f: 'http://www.sample05.net',
|
126
|
+
url_path: nil,
|
127
|
+
web_neg: nil
|
128
|
+
},
|
129
|
+
{
|
130
|
+
web_status: 'invalid',
|
131
|
+
url: 'www.sample06.sofake',
|
132
|
+
url_f: nil,
|
133
|
+
url_path: nil,
|
134
|
+
web_neg: 'error: ext.invalid [sofake]'
|
135
|
+
},
|
136
|
+
{
|
137
|
+
web_status: 'formatted',
|
138
|
+
url: 'www.sample07.com.sofake',
|
139
|
+
url_f: 'http://www.sample07.com',
|
140
|
+
url_path: nil,
|
141
|
+
web_neg: nil
|
142
|
+
},
|
143
|
+
{
|
144
|
+
web_status: 'invalid',
|
145
|
+
url: 'example08.not.real',
|
146
|
+
url_f: nil,
|
147
|
+
url_path: nil,
|
148
|
+
web_neg: 'error: ext.invalid [not, real]'
|
149
|
+
},
|
150
|
+
{
|
151
|
+
web_status: 'formatted',
|
152
|
+
url: 'www.sample09.net/staff/management',
|
153
|
+
url_f: 'http://www.sample09.net',
|
154
|
+
url_path: '/staff/management',
|
155
|
+
web_neg: nil
|
156
|
+
},
|
157
|
+
{
|
158
|
+
web_status: 'formatted',
|
159
|
+
url: 'www.www.sample10.com',
|
160
|
+
url_f: 'http://www.sample10.com',
|
161
|
+
url_path: nil,
|
162
|
+
web_neg: nil
|
163
|
+
}
|
164
|
+
]
|
179
165
|
```
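Because `url_f` holds only the scheme and host while `url_path` holds the stripped path, a caller who wants the full address has to rejoin them. The following is a minimal sketch assuming the hash keys shown above (`web_status`, `url`, `url_f`, `url_path`, `web_neg`). Both helper names are hypothetical and not part of crm_formatter, and the rows are hand-copied from the sample output.

```ruby
# Hypothetical helper: rebuild full URLs for every row that was not rejected.
# A nil url_path interpolates as an empty string, so host-only rows pass through.
def full_urls(url_hashes)
  url_hashes
    .reject { |h| h[:web_status] == 'invalid' }
    .map { |h| "#{h[:url_f]}#{h[:url_path]}" }
end

# Hypothetical helper: map each rejected input to the reason reported in web_neg.
def rejection_reasons(url_hashes)
  url_hashes
    .select { |h| h[:web_status] == 'invalid' }
    .map { |h| [h[:url], h[:web_neg]] }
    .to_h
end

# Rows hand-copied from the sample output above.
sample = [
  { web_status: 'unchanged', url: 'http://www.sample3.net',
    url_f: 'http://www.sample3.net', url_path: nil, web_neg: nil },
  { web_status: 'formatted', url: 'www.sample04.net/contact_us',
    url_f: 'http://www.sample04.net', url_path: '/contact_us', web_neg: nil },
  { web_status: 'invalid', url: 'www.sample06.sofake',
    url_f: nil, url_path: nil, web_neg: 'error: ext.invalid [sofake]' }
]

puts full_urls(sample).inspect
puts rejection_reasons(sample).inspect
```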
|
180
166
|
|
181
|
-
|
182
|
-
CRM Wrap returns data as a hash, which includes the original unaltered data you submitted, the formatted data, a boolean indicating whether the original and formatted data differ, and, for some methods, the negs and pos matched from the criteria you scrub against. In the above example, the returned data from each submitted url would resemble the one below.
|
167
|
+
3. Format Array of Addresses (each as a hash):
|
183
168
|
```
|
184
|
-
|
185
|
-
|
186
|
-
|
187
|
-
|
188
|
-
|
189
|
-
|
169
|
+
array_of_addresses = [
|
170
|
+
{ street: '1234 EAST FAIR BOULEVARD', city: 'AUSTIN', state: 'TEXAS', zip: '78734' },
|
171
|
+
{ street: '5678 North Lake Shore Drive', city: '555-123-4567', state: 'Illinois', zip: '610' },
|
172
|
+
{ street: '9123 West Flagler Street', city: '1233144', state: 'NotAState', zip: 'Miami' }
|
173
|
+
]
|
174
|
+
formatted_address_hashes = CrmFormatter.format_addresses(array_of_addresses)
|
175
|
+
```
|
176
|
+
|
177
|
+
Formatted Addresses:
|
178
|
+
```
|
179
|
+
formatted_address_hashes = [
|
180
|
+
{
|
181
|
+
address_status: 'formatted',
|
182
|
+
full_addr: '1234 East Fair Boulevard, Austin, Texas, 78734',
|
183
|
+
full_addr_f: '1234 E Fair Blvd, Austin, TX, 78734',
|
184
|
+
street_f: '1234 E Fair Blvd',
|
185
|
+
city_f: 'Austin',
|
186
|
+
state_f: 'TX',
|
187
|
+
zip_f: '78734'
|
188
|
+
},
|
189
|
+
{
|
190
|
+
address_status: 'formatted',
|
191
|
+
full_addr: '5678 North Lake Shore Drive, 555-123-4567, Illinois, 610',
|
192
|
+
full_addr_f: '5678 N Lake Shore Dr, IL',
|
193
|
+
street_f: '5678 N Lake Shore Dr',
|
194
|
+
city_f: nil,
|
195
|
+
state_f: 'IL',
|
196
|
+
zip_f: nil
|
197
|
+
},
|
198
|
+
{
|
199
|
+
address_status: 'formatted',
|
200
|
+
full_addr: '9123 West Flagler Street, 1233144, NotAState, Miami',
|
201
|
+
full_addr_f: '9123 W Flagler St',
|
202
|
+
street_f: '9123 W Flagler St',
|
203
|
+
city_f: nil,
|
204
|
+
state_f: nil,
|
205
|
+
zip_f: nil
|
190
206
|
}
|
207
|
+
]
|
191
208
|
```
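In the sample output above, `address_status` is `'formatted'` even for rows where `city_f`, `state_f`, or `zip_f` came back as `nil`, so the status alone does not indicate a complete address. The following is a minimal sketch of a completeness check, assuming only the `*_f` keys shown above. The constant and helper names are hypothetical and not part of crm_formatter.

```ruby
# Keys taken from the sample output above; the constant name is illustrative.
ADDRESS_PARTS = %i[street_f city_f state_f zip_f].freeze

# Hypothetical helper: list which formatted parts are missing from one result hash.
def missing_address_parts(address_hash)
  ADDRESS_PARTS.select { |part| address_hash[part].nil? }
end

# Row hand-copied from the second entry of the sample output.
partial = {
  address_status: 'formatted',
  street_f: '5678 N Lake Shore Dr',
  city_f: nil,
  state_f: 'IL',
  zip_f: nil
}

puts missing_address_parts(partial).inspect
```

A caller could treat any row with a non-empty result as needing review, regardless of its `address_status`.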
|
192
209
|
|
193
|
-
|
194
|
-
|
195
|
-
|
196
|
-
|
197
|
-
|
198
|
-
|
199
|
-
|
200
|
-
|
201
|
-
|
202
|
-
|
203
|
-
|
204
|
-
|
205
|
-
|
206
|
-
|
207
|
-
|
208
|
-
|
209
|
-
|
210
|
-
|
211
|
-
|
210
|
+
### II. Advanced Usage
|
211
|
+
Advanced usage can parse a CSV file or accept large data sets. It also leverages the Utf8Sanitizer gem to check for and remove any non-UTF8 characters and extra whitespace (double spaces, new lines, new paragraphs, carriage returns, etc.). The results include a detailed report with the line numbers of altered data, along with the before and after for comparison. It then passes that data to the CrmFormatter gem's advanced usage to format all parts of the CRM data together (Address, Phone, Web).
|
212
|
+
|
213
|
+
Access advanced usage via the `format_with_report(args)` method and pass a CSV file_path or data hashes.
|
214
|
+
|
215
|
+
1. Parse and Format CSV via File Path (must be an absolute path to root and follow the syntax shown below)
|
216
|
+
```
|
217
|
+
formatted_csv_results = CrmFormatter.format_with_report(file_path: './path/to/your/csv.csv')
|
218
|
+
```
|
219
|
+
|
220
|
+
Parsed & Formatted CSV Results:
|
221
|
+
```
|
222
|
+
formatted_csv_results = {
|
223
|
+
stats:
|
224
|
+
{
|
225
|
+
total_rows: 2,
|
226
|
+
header_row: 1,
|
227
|
+
valid_rows: 1,
|
228
|
+
error_rows: 0,
|
229
|
+
defective_rows: 0,
|
230
|
+
perfect_rows: 0,
|
231
|
+
encoded_rows: 1,
|
232
|
+
wchar_rows: 0
|
233
|
+
},
|
234
|
+
data:
|
235
|
+
{
|
236
|
+
valid_data:
|
237
|
+
[
|
238
|
+
{
|
239
|
+
row_id: 1,
|
240
|
+
act_name: 'Courtesy Ford',
|
241
|
+
street: '1410 West Pine Street Hattiesburg',
|
242
|
+
city: 'Wexford',
|
243
|
+
state: 'MS',
|
244
|
+
zip: '39401',
|
245
|
+
full_addr: '1410 West Pine Street Hattiesburg, Wexford, MS, 39401',
|
246
|
+
phone: '512-555-1212',
|
247
|
+
url: 'http://www.courtesyfordsales.com',
|
248
|
+
street_f: '1410 W Pine St Hattiesburg',
|
249
|
+
city_f: 'Wexford',
|
250
|
+
state_f: 'MS',
|
251
|
+
zip_f: '39401',
|
252
|
+
full_addr_f: '1410 W Pine St Hattiesburg, Wexford, MS, 39401',
|
253
|
+
phone_f: '(512) 555-1212',
|
254
|
+
url_f: 'http://www.courtesyfordsales.com',
|
255
|
+
url_path: nil,
|
256
|
+
web_neg: nil,
|
257
|
+
address_status: 'formatted',
|
258
|
+
phone_status: 'formatted',
|
259
|
+
web_status: 'unchanged',
|
260
|
+
utf_status: 'encoded'
|
261
|
+
}
|
262
|
+
],
|
263
|
+
encoded_data:
|
264
|
+
[
|
265
|
+
{ row_id: 1,
|
266
|
+
text: "http://www.courtesyfordsales.com,Courtesy Ford,__\xD5\xCB\xEB\x8F\xEB__\xD5\xCB\xEB\x8F\xEB____1410 West Pine Street Hattiesburg,Wexford,MS,39401,512-555-1212" }
|
267
|
+
],
|
268
|
+
defective_data: [],
|
269
|
+
error_data: []
|
270
|
+
},
|
271
|
+
file_path: './path/to/your/csv.csv'
|
272
|
+
}
|
212
273
|
```
|
213
274
|
|
214
|
-
|
215
|
-
```
|
216
|
-
|
217
|
-
|
218
|
-
|
219
|
-
|
220
|
-
|
221
|
-
|
275
|
+
2. Format Data Hashes
|
276
|
+
```
|
277
|
+
data_hashes_array = [{ row_id: '1', url: 'abcacura.com/twitter', act_name: "Stanley Chevrolet Kaufman\x99_\xCC", street: '825 East Fair Street', city: 'Kaufman', state: 'Texas', zip: '75142', phone: "555-457-4391\r\n" }]
|
278
|
+
|
279
|
+
formatted_data_hash_results = CrmFormatter.format_with_report(data: data_hashes_array)
|
280
|
+
```
|
281
|
+
|
282
|
+
Formatted Data Hashes Results:
|
283
|
+
```
|
284
|
+
formatted_data_hash_results = { stats:
|
285
|
+
{
|
286
|
+
total_rows: '1',
|
287
|
+
header_row: 1,
|
288
|
+
valid_rows: 1,
|
289
|
+
error_rows: 0,
|
290
|
+
defective_rows: 0,
|
291
|
+
perfect_rows: 0,
|
292
|
+
encoded_rows: 1,
|
293
|
+
wchar_rows: 1
|
294
|
+
},
|
295
|
+
data:
|
296
|
+
{
|
297
|
+
valid_data:
|
298
|
+
[
|
299
|
+
{
|
300
|
+
row_id: '1',
|
301
|
+
act_name: 'Stanley Chevrolet Kaufman',
|
302
|
+
street: '825 East Fair Street',
|
303
|
+
city: 'Kaufman',
|
304
|
+
state: 'Texas',
|
305
|
+
zip: '75142',
|
306
|
+
full_addr: '825 East Fair Street, Kaufman, Texas, 75142',
|
307
|
+
phone: '555-457-4391',
|
308
|
+
url: 'abcacura.com/twitter',
|
309
|
+
street_f: '825 E Fair St',
|
310
|
+
city_f: 'Kaufman',
|
311
|
+
state_f: 'TX',
|
312
|
+
zip_f: '75142',
|
313
|
+
full_addr_f: '825 E Fair St, Kaufman, TX, 75142',
|
314
|
+
phone_f: '(555) 457-4391',
|
315
|
+
url_f: 'http://www.abcacura.com',
|
316
|
+
url_path: '/twitter',
|
317
|
+
web_neg: nil,
|
318
|
+
address_status: 'formatted',
|
319
|
+
phone_status: 'formatted',
|
320
|
+
web_status: 'formatted',
|
321
|
+
utf_status: 'encoded, wchar'
|
322
|
+
}
|
323
|
+
],
|
324
|
+
encoded_data:
|
325
|
+
[
|
326
|
+
{
|
327
|
+
row_id: '1',
|
328
|
+
text: "1,abcacura.com/twitter,Stanley Chevrolet Kaufman\x99_\xCC,825 East Fair Street,Kaufman,Texas,75142,555-457-4391\r\n"
|
222
329
|
}
|
223
|
-
|
330
|
+
],
|
331
|
+
defective_data: [],
|
332
|
+
error_data: []
|
333
|
+
},
|
334
|
+
file_path: nil
|
335
|
+
}
|
224
336
|
```
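The `encoded` and `wchar` statuses above refer to invalid UTF-8 bytes and stray whitespace in the input. The following is a rough sketch of that kind of cleanup using only Ruby core methods (`String#scrub`, `gsub`, `strip`). It is not Utf8Sanitizer's implementation or API, and the helper name `sanitize_field` is hypothetical. Unlike the sample output above, this sketch keeps the underscore in the account name, because it only drops bytes that are invalid UTF-8.

```ruby
# Hypothetical helper, not Utf8Sanitizer's API: drop invalid UTF-8 bytes,
# collapse runs of whitespace, and report whether anything changed.
def sanitize_field(value)
  raw = value.to_s.dup.force_encoding(Encoding::UTF_8)
  cleaned = raw.scrub('').gsub(/\s+/, ' ').strip
  { before: raw, after: cleaned, changed: cleaned != raw }
end

# Inputs taken from the data hash in the example above.
phone = sanitize_field("555-457-4391\r\n")
name  = sanitize_field("Stanley Chevrolet Kaufman\x99_\xCC")

puts phone[:after]
puts name[:after]
puts name[:changed]
```

Returning the `before` value alongside the `after` value mirrors the before-and-after comparison described for the report.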
|
225
337
|
|
226
|
-
### III. Detailed Examples
|
227
|
-
Some of the examples are excessively verbose to help illustrate the datatypes and processes. Here are a few guidelines and tips:
|
228
|
-
|
229
|
-
*These are just examples, not strict usage guides ...*
|
230
|
-
|
231
|
-
#### 1. Address Examples
|
232
|
-
```
|
233
|
-
def self.run_adrs
|
234
|
-
|
235
|
-
crm_address_formatter = CrmFormatter::Address.new
|
236
|
-
|
237
|
-
contacts = Contact.where.not(full_address: nil)
|
238
|
-
|
239
|
-
contacts.each do |contact|
|
240
|
-
|
241
|
-
cont_adr_hsh = { street: contact.street, city: contact.city,
|
242
|
-
state: contact.state, zip: contact.zip }
|
243
|
-
|
244
|
-
formatted_address_hsh = crm_address_formatter.format_full_address(cont_adr_hsh)
|
245
|
-
|
246
|
-
end
|
247
|
-
|
248
|
-
end
|
249
|
-
|
250
|
-
```
|
251
|
-
|
252
|
-
#### 2. Phone Examples
|
253
|
-
In the phone example, `format_all_phone_in_my_db` could be a custom wrapper method which, when called from the Rails console or from a front-end GUI process, grabs all phones in the db meeting certain criteria to be scrubbed. The results will always be in hash format, such as `phone_hash` below.
|
254
|
-
```
|
255
|
-
@crm_phone = CrmFormatter::Phone.new
|
256
|
-
|
257
|
-
def self.format_all_phone_in_my_db
|
258
|
-
phones_from_contacts = Contact.where.not(phone: nil)
|
259
|
-
|
260
|
-
phones_from_contacts.each do |contact|
|
261
|
-
phone_hash = @crm_phone.validate_phone(contact.phone)
|
262
|
-
end
|
263
|
-
|
264
|
-
end
|
265
|
-
|
266
|
-
phone_hash = { phone: '555-123-4567', phone_f: '(555) 123-4567', phone_status: true }
|
267
|
-
```
|
268
|
-
|
269
|
-
#### 3. Web Examples
|
270
|
-
The steps below will show you an option for how you could integrate larger processes in your app. Create a wrapper method you can call from an action or Rails C. In this example, a new class was also created in Lib for that purpose, as there could be related methods to create.
|
271
|
-
```
|
272
|
-
# /app/lib/start_crm.rb
|
273
|
-
|
274
|
-
class StartCrm
|
275
|
-
|
276
|
-
##Rails C: StartCrm.run_webs
|
277
|
-
def self.run_webs
|
278
|
-
oa_args = get_args
|
279
|
-
web = CrmFormatter::Web.new(oa_args)
|
280
|
-
|
281
|
-
formatted_url_hashes = get_urls.map do |url|
|
282
|
-
url_hash = web.format_url(url)
|
283
|
-
end
|
284
|
-
|
285
|
-
formatted_url_hashes
|
286
|
-
end
|
287
|
-
|
288
|
-
end
|
289
|
-
```
|
290
|
-
Application Config
|
291
|
-
```
|
292
|
-
#/app/config/application.rb
|
293
|
-
|
294
|
-
config.eager_load_paths << Rails.root.join('lib/**')
|
295
|
-
config.eager_load_paths += Dir["#{config.root}/lib/**/"]
|
296
|
-
```
|
297
|
-
Create your db query or put together a list of URLs to process, along with any OA to include. The below example is very verbose, but designed to be helpful. In reality, you might have various criteria saved in the db rather than writing it out.
|
298
|
-
In this example, we have auto dealer URLs. In this process, we're focusing on franchise dealers.
|
299
|
-
```
|
300
|
-
def self.get_args
|
301
|
-
neg_urls = %w(approv avis budget collis eat enterprise facebook financ food google gourmet hertz hotel hyatt insur invest loan lube mobility motel motorola parts quick rent repair restaur rv ryder service softwar travel twitter webhost yellowpages yelp youtube)
|
302
|
-
|
303
|
-
pos_urls = ["acura", "alfa romeo", "aston martin", "audi", "bmw", "bentley", "bugatti", "buick", "cdjr", "cadillac", "chevrolet", "chrysler", "dodge", "ferrari", "fiat", "ford", "gmc", "group", "group", "honda", "hummer", "hyundai", "infiniti", "isuzu", "jaguar", "jeep", "kia", "lamborghini", "lexus", "lincoln", "lotus", "mini", "maserati", "mazda", "mclaren", "mercedes-benz", "mitsubishi", "nissan", "porsche", "ram", "rolls-royce", "saab", "scion", "smart", "subaru", "suzuki", "toyota", "volkswagen", "volvo"]
|
304
|
-
|
305
|
-
neg_exts = %w(au ca edu es gov in ru uk us)
|
306
|
-
oa_args = {neg_urls: neg_urls, pos_urls: pos_urls, neg_exts: neg_exts}
|
307
|
-
end
|
308
|
-
|
309
|
-
def self.get_urls
|
310
|
-
urls = ["https://www.stevXXXXXXmitsubishiserviceandpartscenter.com", "https://www.perXXXXXXchryslerjeepcenterville.com", "http://www.peXXXXXXchryslerjeepcenterville.com", "http://www.colXXXXXXchryslerdodgejeepram.com"]
|
311
|
-
end
|
312
|
-
```
|
313
|
-
Run your class and wrapper method in Rails C. By creating the wrapper method, you have set up the entire process to run like a runner. In reality, you might have several different criteria accessible from a GUI or even running in Cron Jobs.
|
314
|
-
```
|
315
|
-
2.5.1 :001 > StartCrm.run_webs
|
316
|
-
```
|
317
|
-
Results are always in a Hash, like below. The URLs are slightly obfuscated out of respect (it's not a bug). These are examples from a large DB that runs on a loop 24/7 and gets to each organization about once a week, so it's already pretty well up to date, so there aren't any big changes below, but there are still a few things to point out below the code example.
|
318
|
-
```
|
319
|
-
[ {:reformatted=>false,
|
320
|
-
:url_path=>"https://www.steXXXXXXmitsubishiserviceandpartscenter.com",
|
321
|
-
:formatted_url=>"https://www.steXXXXXXmitsubishiserviceandpartscenter.com",
|
322
|
-
:neg=>["neg_urls: parts, rv, service"],
|
323
|
-
:pos=>["pos_urls: mitsubishi"]},
|
324
|
-
|
325
|
-
{:reformatted=>false,
|
326
|
-
:url_path=>"https://www.perXXXXXXchryslerjeepcenterville.com",
|
327
|
-
:formatted_url=>"https://www.perXXXXXXchryslerjeepcenterville.com",
|
328
|
-
:neg=>["neg_urls: rv"],
|
329
|
-
:pos=>["pos_urls: chrysler, jeep"]},
|
330
|
-
|
331
|
-
{:reformatted=>false,
|
332
|
-
:url_path=>"http://www.pXXXXXXchryslerjeepcenterville.com",
|
333
|
-
:formatted_url=>"http://www.XXXXXXechryslerjeepcenterville.com",
|
334
|
-
:neg=>["neg_urls: rv"],
|
335
|
-
:pos=>["pos_urls: chrysler, jeep"]},
|
336
|
-
|
337
|
-
{:reformatted=>false,
|
338
|
-
:url_path=>"http://www.colXXXXXXchryslerdodgejeepram.com",
|
339
|
-
:formatted_url=>"http://www.colXXXXXXchryslerdodgejeepram.com",
|
340
|
-
:neg=>["neg_urls: rv"],
|
341
|
-
:pos=>["pos_urls: chrysler, dodge, jeep, ram"]}
|
342
|
-
]
|
343
|
-
```
|
344
|
-
`:reformatted` is a boolean indicating whether `:url_path` and `:formatted_url` differ. If false, either they are the same, or the `:url_path` had significant errors which prevented it from being formatted, in which case `:formatted_url` is nil. In reality, some URLs are so far off that they can't be reliably reformatted, so it is better to let them pass only when we are confident that they are reliable.
|
345
|
-
|
346
|
-
`:url_path` is the url originally submitted by the client. It can include directory links on the end too: '/careers/', '/about-us/', etc.
|
347
|
-
|
348
|
-
`:formatted_url` is the formatted version of `:url_path`. It will be stripped of additional paths: '/deals/', '/staff/', etc. Also, people often omit 'http://' and 'www' in CRMs. This can sometimes cause errors for users or mechanized web scrapers, so those will always be included to ensure consistency. In our production app we follow up the formatting with URL redirect following, and our configuration requires the entire path, so it will always be included. The redirect-following gem is already being worked on and will be released as an additional gem shortly.
|
349
|
-
|
350
|
-
`:neg` is an array of all the errors and negative, undesirable criteria to scrub against. If you include the criteria in OA `neg_urls:`, like above, it will automatically scrub and report. Regardless, any errors will also be included in there. So, if the url was not ultimately formatted, there will be details regarding why in `:neg`.
|
351
|
-
|
352
|
-
`:pos` is the opposite, which highlights positive criteria you might be looking for. It too is available in OA via `pos_urls:`, like above.
|
353
|
-
|
354
|
-
|
355
338
|
## Author
|
356
339
|
|
357
340
|
Adam J Booth - [4rlm](https://github.com/4rlm)
|
data/Rakefile
CHANGED
@@ -15,11 +15,10 @@ task :console do
|
|
15
15
|
require 'active_support/all'
|
16
16
|
ARGV.clear
|
17
17
|
|
18
|
-
formatted_data = format_with_report
|
19
|
-
|
18
|
+
# formatted_data = format_with_report
|
19
|
+
formatted_phones = format_phones
|
20
20
|
# formatted_urls = format_urls
|
21
21
|
# formatted_addresses = format_addresses
|
22
|
-
binding.pry
|
23
22
|
IRB.start
|
24
23
|
end
|
25
24
|
|
@@ -43,36 +42,29 @@ def format_addresses
|
|
43
42
|
end
|
44
43
|
|
45
44
|
def format_phones
|
46
|
-
array_of_phones = %w[
|
45
|
+
array_of_phones = %w[
|
46
|
+
555-457-4391 555-888-4391
|
47
|
+
555-457-4334
|
48
|
+
555-555 555.555.1234
|
49
|
+
not_a_number
|
50
|
+
]
|
47
51
|
formatted_phones = CrmFormatter.format_phones(array_of_phones)
|
48
52
|
end
|
49
53
|
|
50
54
|
def format_urls
|
55
|
+
array_of_urls = %w[
|
56
|
+
sample01.com/staff
|
57
|
+
www.sample02.net.com
|
58
|
+
http://www.sample3.net
|
59
|
+
www.sample04.net/contact_us
|
60
|
+
http://sample05.net
|
61
|
+
www.sample06.sofake
|
62
|
+
www.sample07.com.sofake
|
63
|
+
example08.not.real
|
64
|
+
www.sample09.net/staff/management
|
65
|
+
www.www.sample10.com
|
66
|
+
]
|
67
|
+
|
51
68
|
array_of_urls = %w[sample01.com/staff www.sample02.net.com http://www.sample3.net www.sample04.net/contact_us http://sample05.net www.sample06.sofake www.sample07.com.sofake example08.not.real www.sample09.net/staff/management www.www.sample10.com]
|
52
69
|
formatted_urls = CrmFormatter.format_urls(array_of_urls)
|
53
70
|
end
|
54
|
-
|
55
|
-
# gem install activesupport -v 5.0.0
|
56
|
-
# gem install activesupport
|
57
|
-
|
58
|
-
##################################################################
|
59
|
-
####### !ORIGINAL! SAVE #######
|
60
|
-
# Perfect!
|
61
|
-
# 1. 'cd-crm' ||crm_formatter/lib
|
62
|
-
# 2. Load runner at bottom before start.
|
63
|
-
# 3. Allows for Active Record & Binding.pry.
|
64
|
-
# task :console do
|
65
|
-
# require 'irb'
|
66
|
-
# require 'irb/completion'
|
67
|
-
# require 'crm_formatter' # You know what to do.
|
68
|
-
# ARGV.clear
|
69
|
-
# CrmFormatter.run
|
70
|
-
# IRB.start
|
71
|
-
# end
|
72
|
-
#############################
|
73
|
-
# alias xx='exit exit'
|
74
|
-
# alias ss='rake console'
|
75
|
-
# alias cd-crm="cd ~/Desktop/gemdev/crm_formatter"
|
76
|
-
# alias cd-gem.app="cd ~/Desktop/gemdev/gem_tester"
|
77
|
-
# alias cd-lib="cd ~/Desktop/gemdev/crm_formatter/lib"
|
78
|
-
#############################
|