email_signature_parser 0.1.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 276004e620bb311863f8af53b6247b3d65160242e78e1eedde1cb4dba302cc1b
4
+ data.tar.gz: b92b2222b0b64f97c67f19ecd6a4026cf93e622c7a685d8ba61090b0ecc0391f
5
+ SHA512:
6
+ metadata.gz: 6bab1405757b03908ffd6a973f45c24689c0b6eebf5b9f832c0d2bda35dfa7dde9bf6c46e7e4dbf50b8060baf00d91e15a031499d4ac1a429883ab567e9fdf4a
7
+ data.tar.gz: 840e2be844581a467912f5af19b99035f87dd0932dd652f89d0215ee71af342c81193e22eef505a4d6d18e4b00f9f0acffeee4bd5f44f2c40185974b7a325b8a
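
The digests above can be checked against the archive members they name. The sketch below is a minimal illustration, not RubyGems' own verification code. It assumes `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` file and that `checksums.yaml` has been loaded into a nested hash. The helper name `checksum_matches?` and the sample payload are hypothetical.

```ruby
require 'digest'

# Map the algorithm names used in checksums.yaml to Ruby's digest classes.
DIGEST_CLASSES = {
  'SHA256' => Digest::SHA256,
  'SHA512' => Digest::SHA512
}.freeze

# Returns true when the hex digest of `bytes` equals the recorded checksum.
# `checksums` is the nested hash obtained by loading checksums.yaml,
# e.g. { 'SHA256' => { 'data.tar.gz' => '<hex digest>' } }.
def checksum_matches?(checksums, algorithm, entry, bytes)
  expected = checksums.dig(algorithm, entry)
  return false if expected.nil?

  DIGEST_CLASSES.fetch(algorithm).hexdigest(bytes) == expected
end

# Stand-in bytes; in practice this would be File.binread('data.tar.gz').
payload = 'stand-in bytes for data.tar.gz'
checksums = { 'SHA256' => { 'data.tar.gz' => Digest::SHA256.hexdigest(payload) } }

puts checksum_matches?(checksums, 'SHA256', 'data.tar.gz', payload)
puts checksum_matches?(checksums, 'SHA256', 'data.tar.gz', payload + 'x')
```

A mismatch on either algorithm suggests the archive member differs from what was published.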
data/README.md ADDED
@@ -0,0 +1,105 @@
1
+ # EmailSignatureParser
2
+
3
+ A Ruby gem for parsing email signatures. The gem tries to find the signature based on the name, if available, or the email address, and tries to extract as much information as it can from the signature.
4
+
5
+ ## Prerequisites
6
+
7
+ This library uses [ruby_postal](https://github.com/openvenues/ruby_postal), which uses [libpostal](https://github.com/openvenues/libpostal). You need to install the libpostal C library. First, make sure you have the following prerequisites installed:
8
+
9
+ **On Ubuntu/Debian**
10
+
11
+ ```bash
12
+ sudo apt-get install curl autoconf automake libtool pkg-config
13
+ ```
14
+
15
+ **On CentOS/RHEL**
16
+
17
+ ```bash
18
+ sudo yum install curl autoconf automake libtool pkgconfig
19
+ ```
20
+
21
+ **On Mac OSX**
22
+
23
+ ```bash
24
+ brew install curl autoconf automake libtool pkg-config
25
+ ```
26
+
27
+ **Installing libpostal**
28
+
29
+ ```bash
30
+ git clone https://github.com/openvenues/libpostal
31
+ cd libpostal
32
+ ./bootstrap.sh
33
+ ./configure --datadir=[...some dir with a few GB of space...]
34
+ make
35
+ sudo make install
36
+
37
+ # On Linux it's probably a good idea to run
38
+ sudo ldconfig
39
+ ```
40
+
41
+ ## Installation
42
+
43
+ Add this line to your application's Gemfile:
44
+
45
+ ```ruby
46
+ gem 'email_signature_parser'
47
+ ```
48
+
49
+ And then execute:
50
+
51
+ ```bash
52
+ bundle install
53
+ ```
54
+
55
+ Or install it yourself as:
56
+
57
+ ```bash
58
+ gem install email_signature_parser
59
+ ```
60
+
61
+ ## Usage
62
+
63
+ To extract information from an email signature, you can pass the gem an eml file, the HTML body of an email, or the plain text of an email:
64
+
65
+ ```ruby
66
+ require 'email_signature_parser'
67
+
68
+ result = EmailSignatureParser.from_file('/path/to/email.eml')
69
+ result = EmailSignatureParser.from_html('John Doe <jdoe@email.com>', email_body_html)
70
+ result = EmailSignatureParser.from_text('John Doe <jdoe@email.com>', email_text)
71
+ ```
72
+
73
+ It returns a hash with whatever could be extracted from the signature:
74
+
75
+ ```json
76
+ {
77
+ "name": "John Doe",
78
+ "email_address": "jdoe@testcompany.com",
79
+ "address": "Alhambra Circle Street, 125, Coral Gables, FL, 33134 USA",
80
+ "phones": [
81
+ {
82
+ "type": "Mobile",
83
+ "phone_number": "+1 5056223073",
84
+ "country": "US/CA"
85
+ }
86
+ ],
87
+ "links": {
88
+ "social_media": {
89
+ "linkedin": "https://www.linkedin.com/company/testcompany/"
90
+ },
91
+ "other": [
92
+ ]
93
+ },
94
+ "job_title": {
95
+ "title": "",
96
+ "acronym": "CEO"
97
+ },
98
+ "text": "Text of the signature",
99
+ "company_name": "TestCompany Ltd"
100
+ }
101
+ ```
102
+
103
+ ## Enron Data
104
+
105
+ I've tested this library using, among other things, the Enron data. You can get the data [here](https://www.cs.cmu.edu/~enron/). Running `rake process_enron_data[input_path,output_path]` will process all emails and generate JSON files (with a copy of the original email) for all signatures found.
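
The result hash documented in the README's Usage section can be consumed with ordinary `Hash` methods. The sketch below does not call the gem. It parses a trimmed copy of the sample output with the standard `json` library, so the string keys are an assumption: the README does not say whether the gem returns string or symbol keys. The helper names `phone_numbers` and `display_title` are hypothetical.

```ruby
require 'json'

# Trimmed copy of the README's sample output; not produced by calling the gem.
SAMPLE_RESULT = JSON.parse(<<~DOC)
  {
    "name": "John Doe",
    "phones": [
      { "type": "Mobile", "phone_number": "+1 5056223073", "country": "US/CA" }
    ],
    "links": {
      "social_media": { "linkedin": "https://www.linkedin.com/company/testcompany/" },
      "other": []
    },
    "job_title": { "title": "", "acronym": "CEO" },
    "company_name": "TestCompany Ltd"
  }
DOC

# Phone numbers of a given type, e.g. "Mobile"; empty when none were extracted.
def phone_numbers(result, type)
  result.fetch('phones', [])
        .select { |phone| phone['type'] == type }
        .map { |phone| phone['phone_number'] }
end

# Prefer the spelled-out title; fall back to the acronym when the title is blank.
def display_title(result)
  job = result.fetch('job_title', {})
  [job['title'], job['acronym']].find { |value| value && !value.empty? }
end

puts phone_numbers(SAMPLE_RESULT, 'Mobile').inspect
puts display_title(SAMPLE_RESULT)
puts SAMPLE_RESULT.dig('links', 'social_media', 'linkedin')
```

Using `fetch` with a default and `dig` keeps the code safe when a field is missing, which matters because the hash only contains whatever could be extracted.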
@@ -0,0 +1,440 @@
1
+ ---
2
+ # Executive Titles
3
+ - CEO # Chief Executive Officer
4
+ - COO # Chief Operating Officer
5
+ - CFO # Chief Financial Officer
6
+ - CTO # Chief Technology Officer
7
+ - CIO # Chief Information Officer
8
+ - CHRO # Chief Human Resources Officer
9
+ - CMO # Chief Marketing Officer
10
+ - CPO # Chief Product Officer
11
+ - CSO # Chief Strategy Officer
12
+ - CDO # Chief Data Officer
13
+ - CRO # Chief Revenue Officer
14
+ - CLO # Chief Legal Officer
15
+ - CCO # Chief Compliance Officer
16
+ - CISO # Chief Information Security Officer
17
+ - EVP # Executive Vice President
18
+ - SVP # Senior Vice President
19
+ - AVP # Assistant Vice President
20
+ - VP # Vice President
21
+ - Pres # President
22
+ - Dir # Director
23
+ - MGR # Manager
24
+ - Mgr # Manager
25
+ - GM # General Manager
26
+ - AGM # Assistant General Manager
27
+ - DGM # Deputy General Manager
28
+
29
+ # Academic Degrees
30
+ - PhD # Doctor of Philosophy
31
+ - Ph.D # Doctor of Philosophy
32
+ - Ph.D. # Doctor of Philosophy
33
+ - MD # Doctor of Medicine
34
+ - M.D # Doctor of Medicine
35
+ - M.D. # Doctor of Medicine
36
+ - JD # Juris Doctor
37
+ - J.D # Juris Doctor
38
+ - J.D. # Juris Doctor
39
+ - EdD # Doctor of Education
40
+ - Ed.D # Doctor of Education
41
+ - DDS # Doctor of Dental Surgery
42
+ - DMD # Doctor of Dental Medicine
43
+ - DVM # Doctor of Veterinary Medicine
44
+ - DSc # Doctor of Science
45
+ - D.Sc # Doctor of Science
46
+ - DSW # Doctor of Social Work
47
+ - DNP # Doctor of Nursing Practice
48
+ - PharmD # Doctor of Pharmacy
49
+ - MBA # Master of Business Administration
50
+ - M.B.A # Master of Business Administration
51
+ - M.B.A. # Master of Business Administration
52
+ - MPA # Master of Public Administration
53
+ - MPH # Master of Public Health
54
+ - MSc # Master of Science
55
+ - M.Sc # Master of Science
56
+ - M.Sc. # Master of Science
57
+ - MS # Master of Science
58
+ - M.S # Master of Science
59
+ - M.S. # Master of Science
60
+ - MSW # Master of Social Work
61
+ - MSN # Master of Science in Nursing
62
+ - MFT # Master of Family Therapy
63
+ - MEd # Master of Education
64
+ - M.Ed # Master of Education
65
+ - M.Ed. # Master of Education
66
+ - MA # Master of Arts
67
+ - M.A # Master of Arts
68
+ - M.A. # Master of Arts
69
+ - ME # Master of Engineering
70
+ - M.E # Master of Engineering
71
+ - MT # Master of Teaching
72
+ - MPT # Master of Physical Therapy
73
+ - MLS # Master of Library Science
74
+ - BSc # Bachelor of Science
75
+ - B.Sc # Bachelor of Science
76
+ - B.Sc. # Bachelor of Science
77
+ - BA # Bachelor of Arts
78
+ - B.A # Bachelor of Arts
79
+ - B.A. # Bachelor of Arts
80
+ - BS # Bachelor of Science
81
+ - B.S # Bachelor of Science
82
+ - B.S. # Bachelor of Science
83
+ - BBA # Bachelor of Business Administration
84
+ - BSN # Bachelor of Science in Nursing
85
+ - BSW # Bachelor of Social Work
86
+ - BE # Bachelor of Engineering
87
+ - B.E # Bachelor of Engineering
88
+ - BSEE # Bachelor of Science in Electrical Engineering
89
+ - BSME # Bachelor of Science in Mechanical Engineering
90
+ - BSCE # Bachelor of Science in Civil Engineering
91
+ - BSCS # Bachelor of Science in Computer Science
92
+
93
+ # Professional Certifications
94
+ - CPA # Certified Public Accountant
95
+ - C.P.A # Certified Public Accountant
96
+ - C.P.A. # Certified Public Accountant
97
+ - CFA # Chartered Financial Analyst
98
+ - CFP # Certified Financial Planner
99
+ - PMP # Project Management Professional
100
+ - CISSP # Certified Information Systems Security Professional
101
+ - CISA # Certified Information Systems Auditor
102
+ - CISM # Certified Information Security Manager
103
+ - PMI # Project Management Institute
104
+ - SHRM # Society for Human Resource Management
105
+ - SPHR # Senior Professional in Human Resources
106
+ - PHR # Professional in Human Resources
107
+ - CHRP # Certified Human Resources Professional
108
+ - CCP # Certified Compensation Professional
109
+ - GCIH # GIAC Certified Incident Handler
110
+ - GSEC # GIAC Security Essentials
111
+ - RHCE # Red Hat Certified Engineer
112
+ - MCSE # Microsoft Certified Systems Engineer
113
+ - CCNA # Cisco Certified Network Associate
114
+ - CCNP # Cisco Certified Network Professional
115
+ - CCIE # Cisco Certified Internetwork Expert
116
+
117
+ # Medical & Healthcare
118
+ - RN # Registered Nurse
119
+ - R.N # Registered Nurse
120
+ - R.N. # Registered Nurse
121
+ - LPN # Licensed Practical Nurse
122
+ - L.P.N # Licensed Practical Nurse
123
+ - NP # Nurse Practitioner
124
+ - N.P # Nurse Practitioner
125
+ - PA # Physician Assistant
126
+ - P.A # Physician Assistant
127
+ - RT # Respiratory Therapist
128
+ - R.T # Respiratory Therapist
129
+ - EMT # Emergency Medical Technician
130
+ - RD # Registered Dietitian
131
+ - R.D # Registered Dietitian
132
+ - OT # Occupational Therapist
133
+ - O.T # Occupational Therapist
134
+ - PT # Physical Therapist
135
+ - P.T # Physical Therapist
136
+ - ST # Speech Therapist
137
+ - S.T # Speech Therapist
138
+ - MT # Medical Technologist
139
+ - M.T # Medical Technologist
140
+ - MLT # Medical Laboratory Technician
141
+ - CNA # Certified Nursing Assistant
142
+ - LVN # Licensed Vocational Nurse
143
+ - CNS # Clinical Nurse Specialist
144
+ - CRNA # Certified Registered Nurse Anesthetist
145
+ - FNP # Family Nurse Practitioner
146
+ - ANP # Adult Nurse Practitioner
147
+ - GNP # Gerontological Nurse Practitioner
148
+ - PNP # Pediatric Nurse Practitioner
149
+ - WHNP # Women's Health Nurse Practitioner
150
+
151
+ # Engineering & Technology
152
+ - PE # Professional Engineer
153
+ - P.E # Professional Engineer
154
+ - P.E. # Professional Engineer
155
+ - FE # Fundamentals of Engineering
156
+ - EIT # Engineer in Training
157
+ - SE # Software Engineer
158
+ - SWE # Software Engineer
159
+ - Dev # Developer
160
+ - Eng # Engineer
161
+ - Arch # Architect
162
+ - Sys # Systems
163
+ - Net # Network
164
+ - DBA # Database Administrator
165
+ - SA # Systems Administrator
166
+ - IT # Information Technology
167
+ - I.T # Information Technology
168
+ - IS # Information Systems
169
+ - CS # Computer Science
170
+ - C.S # Computer Science
171
+ - ECE # Electrical and Computer Engineering
172
+ - ME # Mechanical Engineering
173
+ - M.E # Mechanical Engineering
174
+ - CE # Civil Engineering
175
+ - C.E # Civil Engineering
176
+ - ChE # Chemical Engineering
177
+ - IE # Industrial Engineering
178
+ - I.E # Industrial Engineering
179
+ - AE # Aerospace Engineering
180
+ - A.E # Aerospace Engineering
181
+
182
+ # Academic Titles
183
+ - Prof # Professor
184
+ - Prof. # Professor
185
+ - Asst # Assistant
186
+ - Asst. # Assistant
187
+ - Assoc # Associate
188
+ - Assoc. # Associate
189
+ - Adj # Adjunct
190
+ - Adj. # Adjunct
191
+ - Emer # Emeritus
192
+ - Emer. # Emeritus
193
+ - Lect # Lecturer
194
+ - Lect. # Lecturer
195
+ - Instr # Instructor
196
+ - Instr. # Instructor
197
+ - Res # Researcher
198
+ - Sci # Scientist
199
+
200
+ # Specific Professional Acronyms
201
+ - FACAAI
202
+ - FACOOG
203
+ - FFARCS
204
+ - FAAAI
205
+ - FAAFP
206
+ - FAAOS
207
+ - FAAPM
208
+ - FACCP
209
+ - FACEP
210
+ - FACOG
211
+ - FACPM
212
+ - FACSM
213
+ - FAOTA
214
+ - FAPHA
215
+ - AEMSN
216
+ - AOCN
217
+ - APMC
218
+ - APNP
219
+ - APRN
220
+ - ARNP
221
+ - ARRT
222
+ - ASCP
223
+ - ASNC
224
+ - ASPO
225
+ - ARTS
226
+ - BCLS
227
+ - BDSC
228
+ - BHYG
229
+ - BPHN
230
+ - BSED
231
+ - BSEH
232
+ - BSPH
233
+ - BVMS
234
+ - CANP
235
+ - CAPA
236
+ - CARN
237
+ - CCCN
238
+ - CCES
239
+ - CCNS
240
+ - CCRN
241
+ - CCSP
242
+ - CCST
243
+ - CCTC
244
+ - CCTN
245
+ - CDDN
246
+ - CDMS
247
+ - CETN
248
+ - CFCN
249
+ - CFNP
250
+ - CFRN
251
+ - CGRN
252
+ - CHES
253
+ - CHPN
254
+ - CHRN
255
+ - CHUC
256
+ - CLNC
257
+ - CMCN
258
+ - CNAA
259
+ - CNMT
260
+ - CNNP
261
+ - CNRN
262
+ - CNSN
263
+ - COCN
264
+ - COHN
265
+ - COMA
266
+ - CORN
267
+ - COTA
268
+ - CPAN
269
+ - CPDN
270
+ - CPFT
271
+ - CPHQ
272
+ - CPNA
273
+ - CPNL
274
+ - CPNP
275
+ - CPON
276
+ - CPSN
277
+ - CRTT
278
+ - CTRN
279
+ - CUNP
280
+ - CURN
281
+ - CWCN
282
+ - DDSC
283
+ - DMSC
284
+ - DNSC
285
+ - LATC
286
+ - LCCE
287
+ - LCPC
288
+ - LCSW
289
+ - LMSW
290
+ - LNCC
291
+ - LRCP
292
+ - LRCS
293
+ - MBBS
294
+ - MICN
295
+ - MPAS
296
+ - MRAD
297
+ - MRCP
298
+ - MRCS
299
+ - MSEE
300
+ - MSEH
301
+ - MSLS
302
+ - MSSW
303
+ - NCSN
304
+ - OGNP
305
+ - PALS
306
+ - PCCN
307
+ - PCNS
308
+ - PHRN
309
+ - RDCS
310
+ - RDMS
311
+ - REPT
312
+ - RHIA
313
+ - RHIT
314
+ - RNCS
315
+ - RNFA
316
+ - FAAN
317
+ - FAAO
318
+ - FAAP
319
+ - FACC
320
+ - FACD
321
+ - FACE
322
+ - FACG
323
+ - FACP
324
+ - FACR
325
+ - FACS
326
+ - FAEN
327
+ - FAGD
328
+ - FAMA
329
+ - FAPA
330
+ - FCAP
331
+ - FCCM
332
+ - FCPS
333
+ - FICA
334
+ - FICC
335
+ - FICS
336
+ - ACLS
337
+ - ACRN
338
+ - ALNC
339
+ - AMSC
340
+ - BCH
341
+ - BDS
342
+ - BHS
343
+ - BMS
344
+ - BSM
345
+ - CBE
346
+ - CBI
347
+ - CCE
348
+ - CCM
349
+ - CDA
350
+ - CDN
351
+ - CEN
352
+ - CFN
353
+ - CGN
354
+ - CGT
355
+ - CHB
356
+ - CHD
357
+ - CHN
358
+ - CIC
359
+ - CLA
360
+ - CLS
361
+ - CLT
362
+ - CMA
363
+ - CNE
364
+ - CNI
365
+ - CNM
366
+ - CNN
367
+ - CNO
368
+ - CNP
369
+ - CPN
370
+ - CRN
371
+ - CRT
372
+ - CSN
373
+ - CST
374
+ - CTN
375
+ - CUA
376
+ - CVN
377
+ - DCH
378
+ - DCP
379
+ - DDR
380
+ - DME
381
+ - DMT
382
+ - DMV
383
+ - DNC
384
+ - DNE
385
+ - DNS
386
+ - DON
387
+ - DOS
388
+ - DPH
389
+ - LDN
390
+ - LDO
391
+ - LNC
392
+ - LPC
393
+ - LRN
394
+ - LSN
395
+ - MCH
396
+ - MDS
397
+ - MHE
398
+ - MHN
399
+ - MHS
400
+ - MTA
401
+ - NCT
402
+ - NIC
403
+ - NMT
404
+ - NNP
405
+ - NPC
406
+ - NPP
407
+ - OCN
408
+ - OCS
409
+ - ONC
410
+ - OTA
411
+ - OTC
412
+ - OTL
413
+ - OTR
414
+ - PHN
415
+ - PTA
416
+ - RDA
417
+ - RDH
418
+ - RMA
419
+ - RNA
420
+ - RNC
421
+ - RNP
422
+ - RPH
423
+ - RPN
424
+ - RPT
425
+ - RRA
426
+ - RRT
427
+ - RTR
428
+ - RVT
429
+ - SBB
430
+ - SCD
431
+ - SCT
432
+ - SEN
433
+ - SHN
434
+ - SLS
435
+ - SPN
436
+ - SVN
437
+ - TNP
438
+ - TNS
439
+ - VMD
440
+ - WCC
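
A list like the one above loads as a plain array of strings, because YAML discards the trailing `# ...` comments. The sketch below shows one way such a list could be used to spot acronyms in a signature line. How the gem itself consumes this file is not visible in the diff, so the matching logic and the name `acronyms_in` are assumptions. The excerpt is inlined because the file's path is not shown.

```ruby
require 'yaml'
require 'set'

# Inline excerpt of the acronym list; comments are dropped by the YAML parser.
ACRONYM_EXCERPT = <<~YML
  ---
  # Executive Titles
  - CEO # Chief Executive Officer
  - VP # Vice President
  # Academic Degrees
  - PhD # Doctor of Philosophy
  - Ph.D. # Doctor of Philosophy
  - MT # Master of Teaching
  - MT # Medical Technologist
YML

# A Set collapses repeated entries (MT appears twice) and gives fast lookups.
ACRONYMS = YAML.safe_load(ACRONYM_EXCERPT).to_set

# Returns the tokens of `line` that exactly match a known acronym.
# Matching is case-sensitive, so "it" in running text does not match "IT".
def acronyms_in(line, acronyms)
  line.split(/[\s,]+/).select { |token| acronyms.include?(token) }
end

puts acronyms_in('Jane Roe, PhD, VP Engineering', ACRONYMS).inspect
```

Splitting on whitespace and commas keeps dotted forms such as `Ph.D.` intact, so they can match their own list entries.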
@@ -0,0 +1,182 @@
1
+ ---
2
+ - accountant
3
+ - administrative
4
+ - assistant
5
+ - analyst
6
+ - architect
7
+ - manager
8
+ - management
9
+ - attorney
10
+ - auditor
11
+ - business
12
+ - carpenter
13
+ - cashier
14
+ - chef
15
+ - chief
16
+ - executive
17
+ - officer
18
+ - financial
19
+ - operating
20
+ - technology
21
+ - clerk
22
+ - coach
23
+ - consultant
24
+ - controller
25
+ - coordinator
26
+ - counselor
27
+ - customer
28
+ - service
29
+ - representative
30
+ - data
31
+ - dentist
32
+ - designer
33
+ - developer
34
+ - director
35
+ - doctor
36
+ - driver
37
+ - economist
38
+ - editor
39
+ - educator
40
+ - engineer
41
+ - facility
42
+ - general
43
+ - graphic
44
+ - human
45
+ - resources
46
+ - instructor
47
+ - lawyer
48
+ - librarian
49
+ - marketing
50
+ - mechanic
51
+ - nurse
52
+ - operations
53
+ - pharmacist
54
+ - photographer
55
+ - physician
56
+ - pilot
57
+ - president
58
+ - principal
59
+ - professor
60
+ - programmer
61
+ - project
62
+ - psychologist
63
+ - receptionist
64
+ - registered
65
+ - researcher
66
+ - sales
67
+ - scientist
68
+ - secretary
69
+ - security
70
+ - guard
71
+ - software
72
+ - specialist
73
+ - supervisor
74
+ - teacher
75
+ - technician
76
+ - therapist
77
+ - trainer
78
+ - treasurer
79
+ - vice
80
+ - waiter
81
+ - web
82
+ - writer
83
+ - development
84
+ - clinical
85
+ - computer
86
+ - construction
87
+ - worker
88
+ - success
89
+ - entry
90
+ - database
91
+ - administrator
92
+ - advisor
93
+ - information
94
+ - network
95
+ - office
96
+ - product
97
+ - program
98
+ - quality
99
+ - assurance
100
+ - research
101
+ - social
102
+ - systems
103
+ - technical
104
+ - branch
105
+ - communications
106
+ - compliance
107
+ - event
108
+ - public
109
+ - relations
110
+ - control
111
+ - inspector
112
+ - senior
113
+ - training
114
+ - account
115
+ - application
116
+ - art
117
+ - associate
118
+ - owner
119
+ - creative
120
+ - deputy
121
+ - planner
122
+ - artist
123
+ - health
124
+ - insurance
125
+ - agent
126
+ - nursing
127
+ - personnel
128
+ - production
129
+ - producer
130
+ - real
131
+ - estate
132
+ - user
133
+ - experience
134
+ - veterinarian
135
+ - accounting
136
+ - benefits
137
+ - contract
138
+ - education
139
+ - electrical
140
+ - facilities
141
+ - grant
142
+ - services
143
+ - it
144
+ - support
145
+ - laboratory
146
+ - maintenance
147
+ - medical
148
+ - payroll
149
+ - purchasing
150
+ - records
151
+ - safety
152
+ - territory
153
+ - volunteer
154
+ - warehouse
155
+ - dean
156
+ - budget
157
+ - case
158
+ - community
159
+ - credit
160
+ - employment
161
+ - engineering
162
+ - field
163
+ - finance
164
+ - grants
165
+ - inventory
166
+ - legal
167
+ - manufacturing
168
+ - technologist
169
+ - outreach
170
+ - physical
171
+ - affairs
172
+ - recruiting
173
+ - resource
174
+ - media
175
+ - staff
176
+ - supply
177
+ - chain
178
+ - transportation
179
+ - registrar
180
+ - content
181
+ - environmental
182
+ - realtor
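
A vocabulary of lowercase job-title words like the one above lends itself to a simple heuristic: score each signature line by the share of its words that appear in the list. The sketch below illustrates that idea with a small excerpt. It is an assumption about how such a list might be used, not the gem's actual logic, and `job_title_score` is a hypothetical name.

```ruby
require 'set'

# Small excerpt of the word list; the full file has 181 entries.
JOB_WORDS = %w[senior software engineer vice president sales manager].to_set

# Fraction of alphabetic words in `line` that are known job-title words.
# Returns 0.0 for lines without any alphabetic words.
def job_title_score(line, vocabulary)
  words = line.downcase.scan(/[a-z]+/)
  return 0.0 if words.empty?

  words.count { |word| vocabulary.include?(word) }.fdiv(words.size)
end

puts job_title_score('Senior Software Engineer', JOB_WORDS)
puts job_title_score('Vice President of Sales', JOB_WORDS)
puts job_title_score('125 Alhambra Circle', JOB_WORDS)
```

A caller could treat lines above some threshold as job-title candidates. Any such threshold would need tuning against real signatures.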