email_address 0.0.3 → 0.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 77e8fb26631e827d41da7eee015141495316f430
4
- data.tar.gz: 0fa66926ca1dbbdc13154aae789850eec7e1eb75
3
+ metadata.gz: 570217e88251b8510966cf63bd1a90a7fa79b458
4
+ data.tar.gz: c23a8a65b94f8a5a6de14f221e8db7f16d0df6e1
5
5
  SHA512:
6
- metadata.gz: 5d41c4ca234699c636811187718fc7b73fe52ff55256ebbeb3e32e518d3330c016a72c3185f9632be0cb4d612064e09408517f893927ddd23db9518d0c219066
7
- data.tar.gz: 93c7fec757b3a635b39aea3bdc6a259ad5c69b0b5369d4ea48e41cde6009b406a8c2eec94a0bc1e42ed837b220a240d971079135e58455a57d0db5ad1e2df3dd
6
+ metadata.gz: d3e7f8dd2a92753b889bc4d8734961fd12626a40d28a06a52ce7e5c9c2691a5a2c62a8a05fc061741d54ec3c3576e1a6bde782b60e1647c3d4770bce95080c52
7
+ data.tar.gz: 9c05fb4ace40a99b51bf5c6646b0cc1d398a857e9e2f3074e3b05418c0562610e51bc7b0cef55ca7be4946b74de53bbff3b8fa69e121c8f064c5b08fd97e854d
data/.travis.yml ADDED
@@ -0,0 +1,10 @@
1
+ language: ruby
2
+ rvm:
3
+ - ruby-head
4
+ - 2.2.2
5
+ - jruby-9.0.4.0
6
+ #- rbx
7
+
8
+ addons:
9
+ code_climate:
10
+ repo_token: 729b2d0e2cc94f63b7da91042da0b6d77bbe30c2024dba04881f46caf702636f
data/Gemfile CHANGED
@@ -2,4 +2,3 @@ source 'https://rubygems.org'
2
2
 
3
3
  # Specify your gem's dependencies in email_address.gemspec
4
4
  gemspec
5
- gem 'minitest'
data/README.md CHANGED
@@ -1,156 +1,140 @@
1
1
  # Email Address
2
2
 
3
3
  [![Gem Version](https://badge.fury.io/rb/email_address.svg)](http://rubygems.org/gems/email_address)
4
+ [![Build Status](https://travis-ci.org/afair/email_address.svg?branch=v0.1)](https://travis-ci.org/afair/email_address)
5
+ [![Code Climate](https://codeclimate.com/github/afair/email_address/badges/gpa.svg)](https://codeclimate.com/github/afair/email_address)
4
6
 
5
- The EmailAddress gem provides a structured datatype for email addresses
6
- and pushes for an _opinionated_ model for which RFC patterns should be
7
- accepted as a "best practice" and which should not be supported (in the
8
- name of sanity).
9
-
10
- This library provides:
11
-
12
- * Email Address Validation
13
- * Converting between email address forms
14
- * **Original:** From the user or data source
15
- * **Normalized:** A standardized format for identification
16
- * **Canonical:** A format used to identify a unique user
17
- * **Redacted:** A format used to store an email address privately
18
- * **Reference:** Digest formats for sharing addresses without exposing
19
- them.
20
- * Matching addresses to Email/Internet Service Providers. Per-provider
21
- rules for:
22
- * Validation
23
- * Address Tag formats
24
- * Canonicalization
25
- * Unicode Support
26
-
27
- ## Email Addresses: The Good Parts
28
-
29
- Email Addresses are split into two parts: the `local` and `host` part,
30
- separated by the `@` symbol, or of the generalized format:
31
-
32
- mailbox+tag@subdomain.domain.tld
33
-
34
- The **Mailbox** usually identifies the user, role account, or application.
35
- A **Tag** is any suffix for the mailbox useful for separating and filtering
36
- incoming email. It is usually preceded by a '+' or other character. Tags are
37
- not always available for a given ESP or MTA.
38
-
39
- Local Parts should consist of lower-case 7-bit ASCII alphanumeric and these characters:
40
- `-+'.,` It should start with and end with an alphanumeric character and
41
- no more than one special character should appear together.
42
-
43
- Host parts contain a lower-case version of any standard domain name.
44
- International Domain Names are allowed, and can be converted to
45
- [Punycode](http://en.wikipedia.org/wiki/Punycode),
46
- an encoding system of Unicode strings into the 7-bit ASCII character set.
47
- Domain names should be configured with MX records in DNS to receive
48
- email, though this is sometimes mis-configured and the A record can be
49
- used as a backup.
50
-
51
- This is the subset of the RFC Email Address specification that should be
52
- used.
53
-
54
- ## Email Addresses: The Bad Parts
55
-
56
- Email addresses are defined and redefined in a series of RFC standards.
57
- Conforming to the full standards is not recommended for easily
58
- identifying and supporting email addresses. Among these specification,
59
- we reject are:
7
+ The `email_address` gem provides a Ruby language library for working
8
+ with email addresses.
60
9
 
61
- * Case-sensitive local parts: `First.Last@example.com`
62
- * Spaces and Special Characters: `"():;<>@[\\]`
63
- * Quoting and Escaping Requirements: `"first \"nickname\" last"@example.com`
64
- * Comment Parts: `(comment)mailbox@example.com`
65
- * IP and IPv6 addresses as hosts: `mailbox@[127.0.0.1]`
66
- * Non-ASCII (7-bit) characters in the local part: `Pelé@example.com`
67
- * Validation by regular expressions like:
68
- ```
69
- (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
70
- | "(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
71
- | \\[\x01-\x09\x0b\x0c\x0e-\x7f])*")
72
- @ (?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
73
- | \[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
74
- (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:
75
- (?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]
76
- | \\[\x01-\x09\x0b\x0c\x0e-\x7f])+)
77
- \])
78
- ```
79
-
80
- ## Internationalization
10
+ By default, it validates against conventional usage,
11
+ the format preferred for user email addresses.
12
+ It can be configured to validate against RFC "Standard" formats,
13
+ common email service provider formats, and perform DNS validation.
81
14
 
82
- The industry is moving to support Unicode characters in the local part
83
- of the email address. Currently, SMTP supports only 7-bit ASCII, but a
84
- new `SMTPUTF8` standard is available, but not yet widely implemented.
85
- To work properly, global Email systems must be converted to UTF-8
86
- encoded databases and upgraded to the new email standards.
15
+ Using `email_address` to validate user email addresses results in
16
+ fewer "false positives" due to typing errors and gibberish data.
17
+ It validates syntax more strictly for popular email providers,
18
+ and can deal with gmail's "optional dots" in addresses.
87
19
 
88
- The problem with i18n email addresses is that support outside of the
89
- given locale becomes hard to enter addresses on keyboards for another
90
- locale. Because of this, internationalized local parts are not yet
91
- supported by default. They are more likely to be erroneous.
20
+ It provides Active Record (Rails) extensions, including an
21
+ address validator and attributes API custom datatypes.
92
22
 
93
- Proper personal identity can still be provided using
94
- [MIME Encoded-Words](http://en.wikipedia.org/wiki/MIME#Encoded-Word)
95
- in Email headers.
23
+ *Note:* Version 0.1.0 contains significant API and internal changes over the 0.0.3
24
+ version. If you have been using the 0.0.x series of the gem, you may
25
+ want to continue using your current version.
96
26
 
97
- ## Email Addresses Forms
98
-
99
- * The **original** email address is of the format given by the user.
100
- * The **Normalized** address has:
101
- * Lower-case the local and domain part
102
- * Tags are kept as they are important for the user
103
- * Remove comments and any "bad parts"
104
- * This format is what should be used to identify the account.
105
- * The **Canonical** form is used to uniquely identify the mailbox.
106
- * Domains stored as punycode for IDN
107
- * Address Tags removed
108
- * Special characters removed (dots in gmail addresses are not
109
- significant)
110
- * Lower cased and "bad parts" removed
111
- * Useful for locating a user who forgets registering with a tag or
112
- with a "Bad part" in the email address.
113
- * The **Redacted** format is used to store email address fingerprints
114
- instead of the actual addresses:
115
- * Format: sha1(canonical_address)@domain
116
- * Given an email address, the record can be found
117
- * Useful for treating email addresses as sensitive data and
118
- complying with requests to remove the address from your database and
119
- still maintain the state of the account.
120
- * The **Reference** form allows you to publicly share an address without
121
- revealing the actual address.
122
- * Can be the MD5 or SHA1 of the normalized or canonical address
123
- * Useful for "do not email" lists
124
- * Useful for cookies that do not reveal the actual account
125
-
126
- ## Treating Email Addresses as Sensitive Data
27
+ Requires Ruby 2.0 or later.
127
28
 
128
- Like Social Security and Credit Card Numbers, email addresses are
129
- becoming more important as a personal identifier on the internet.
130
- Increasingly, we should treat email addresses as sensitive data. If your
131
- site/database becomes compromised by hackers, these email addresses can
132
- be stolen and used to spam your users and to try to gain access to their
133
- accounts. You should not be storing passwords in plain text; perhaps you
134
- don't need to store email addresses un-encoded either.
29
+ ## Background
135
30
 
136
- Consider this: upon registration, store the redacted email address for
137
- the user, and of course, the salted, encrypted password.
138
- When the user logs in, compute the redacted email address from
139
- the user-supplied one and look up the record. Store the original address
140
- in the session for the user, which goes away when the user logs out.
31
+ The email address specification is complex and often not what you want
32
+ when working with personal email addresses in applications. This library
33
+ introduces terms to distinguish types of email addresses.
141
34
 
142
- Sometimes, users demand you strike their information from the database.
143
- Instead of deleting their account, you can "redact" their email
144
- address, retaining the state of the account to prevent future
145
- access. Given the original email address again, the redacted account can
146
- be identified if necessary.
35
+ * *Normal* - The edited form of any input email address. Typically, it
36
+ is lower-cased and minor "fixes" can be performed, depending on the
37
+ configurations and email address provider.
147
38
 
148
- Because of these use cases, the **redact** method on the email address
149
- instance has been provided.
39
+ CKENT@DAILYPLANET.NEWS => ckent@dailyplanet.news
40
+
41
+ * *Conventional* - Most personal account addresses are in this basic
42
+ format, one or more "words" separated by a single simple punctuation
43
+ character. It consists of a mailbox (user name or role account) and
44
+ an optional address "tag" assigned by the user.
45
+
46
+ miles.o'brien@ncc-1701-d.ufp
47
+
48
+ * *Relaxed* - A less strict form of Conventional, same character set,
49
+ must begin and end with an alpha-numeric character, but order within
50
+ is not enforced.
51
+
52
+ aasdf-34-.z@example.com
53
+
54
+ * *Standard* - The RFC-Compliant syntax of an email address. This is
55
+ useful when working with software-generated addresses or handling
56
+ existing email addresses, but otherwise not useful for personal
57
+ addresses.
58
+
59
+ madness!."()<>[]:,;@\\\"!#$%&'*+-/=?^_`{}| ~.a(comment )"@example.org
150
60
 
151
- ## Installation
61
+ * *Canonical* - A unique account address, lower-cased, without the
62
+ tag, and with irrelevant characters stripped.
152
63
 
153
- Add this line to your application's Gemfile:
64
+ clark.kent+scoops@gmail.com => clarkkent@gmail.com
65
+
66
+ * *Reference* - The MD5 of the Canonical format, used to share account
67
+ references without exposing the private email address directly.
68
+
69
+ Clark.Kent+scoops@gmail.com => c5be3597c391169a5ad2870f9ca51901
70
+
71
+ * *Redacted* - A form of the email address where it is replaced by
72
+ a SHA1-based version to remove the original address from the
73
+ database, or to store the address privately, yet still keep it
74
+ accessible at query time by converting the queried address to
75
+ the redacted form.
76
+
77
+ Clark.Kent+scoops@gmail.com => {bea3f3560a757f8142d38d212a931237b218eb5e}@gmail.com
78
+
79
+ * *Munged* - An obfuscated version of the email address suitable for
80
+ publishing on the internet, where email address harvesting
81
+ could occur.
82
+
83
+ Clark.Kent+scoops@gmail.com => cl*****@gm*****
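The canonical, reference, redacted, and munged forms listed above can be approximated with the Ruby standard library alone. The following is a minimal sketch under stated assumptions, not the gem's implementation: the method names and the `GMAIL_HOSTS` list are hypothetical, only the Gmail "dots are irrelevant" rule is modeled, and the digests it produces are not guaranteed to match the values shown in the README.

```ruby
require "digest/md5"
require "digest/sha1"

# Hypothetical list of hosts where dots in the mailbox are not significant.
GMAIL_HOSTS = %w[gmail.com googlemail.com].freeze

# Lower-case the address, drop the tag, and strip irrelevant characters.
def canonical_form(address)
  local, host = address.downcase.split("@", 2)
  mailbox = local.split("+", 2).first
  mailbox = mailbox.delete(".") if GMAIL_HOSTS.include?(host)
  "#{mailbox}@#{host}"
end

# MD5 of the canonical form, for sharing a reference without the address.
def reference_form(address)
  Digest::MD5.hexdigest(canonical_form(address))
end

# SHA1 of the canonical form wrapped in braces, keeping the host readable.
def redacted_form(address)
  host = address.downcase.split("@", 2).last
  "{#{Digest::SHA1.hexdigest(canonical_form(address))}}@#{host}"
end

# Keep the first two characters of each side and replace the rest.
def munged_form(address, filler = "*****")
  local, host = address.downcase.split("@", 2)
  "#{local[0, 2]}#{filler}@#{host[0, 2]}#{filler}"
end
```

Because the redacted form is derived from the canonical form, `Clark.Kent+scoops@gmail.com` and `clarkkent@gmail.com` redact to the same string in this sketch, which is what makes lookup by redacted address possible.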
84
+
85
+ Other terms:
86
+
87
+ * *Local* - The left-hand side of the "@", representing the user,
88
+ mailbox, or role, and an optional "tag".
89
+
90
+ mailbox+tag@example.com; Local part: mailbox+tag
91
+
92
+ * *Mailbox* - The destination user account or role account.
93
+ * *Tag* - A parameter added after the mailbox, usually after the
94
+ "+" symbol, set by the user for mail filtering and sub-accounts.
95
+ Not all mail systems support this.
96
+ * *Host* (sometimes called *Domain*) - The right-hand side of the "@"
97
+ indicating the domain or host name server to which the email is delivered.
98
+ If missing, "localhost" is assumed, or if not a fully-qualified
99
+ domain name, it is assumed to be another computer on the same network, but
100
+ this is increasingly rare.
101
+ * *Provider* - The Email Service Provider (ESP) providing the email
102
+ service. Each provider may have its own email address validation
103
+ and canonicalization rules.
104
+ * *Punycode* - A host name with Unicode characters (International
105
+ Domain Name or IDN) needs conversion to this ASCII-encoded format
106
+ for DNS lookup.
107
+
108
+ "HIRO@こんにちは世界.com" => "hiro@xn--28j2a3ar1pp75ovm7c.com"
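The terms above (local, mailbox, tag, host) map onto a simple split of the address string. This is a rough sketch only: `address_parts` is a hypothetical helper, it ignores quoting and comments, and it assumes "+" as the tag separator unless another one is given.

```ruby
# Split an address into the parts named above.
# Hypothetical helper; not part of the email_address gem.
def address_parts(address, tag_separator = "+")
  # Split on the LAST "@" so the host is always the right-hand side.
  local, at, host = address.rpartition("@")
  # With no "@" at all, treat the whole string as local and assume "localhost".
  local, host = address, "localhost" if at.empty?
  mailbox, tag = local.split(tag_separator, 2)
  { local: local, mailbox: mailbox, tag: tag, host: host.downcase }
end

address_parts("mailbox+tag@example.com")
```

The tag is `nil` when the local part contains no separator, which mirrors the note that not all mail systems support tags.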
109
+
110
+ Wikipedia has a great article on
111
+ [Email Addresses](https://en.wikipedia.org/wiki/Email_address),
112
+ much more readable than the section within
113
+ [RFC 5322](https://tools.ietf.org/html/rfc5322#section-3.4).
114
+
115
+ ## Avoiding the Bad Parts of RFC Specification
116
+
117
+ Following the RFC specification sounds like a good idea, until you
118
+ learn about all the madness contained therein. This library can
119
+ validate the RFC syntax, but this is rarely useful, especially when
120
+ validating user email address submissions. By default, it validates
121
+ to the *conventional* format.
122
+
123
+ Here are a few parts of the RFC specification you should avoid:
124
+
125
+ * Case-sensitive local parts: `First.Last@example.com`
126
+ * Spaces and Special Characters: `"():;<>@[\\]`
127
+ * Quoting and Escaping Requirements: `"first \"nickname\" last"@example.com`
128
+ * Comment Parts: `(comment)mailbox@example.com`
129
+ * IP and IPv6 addresses as hosts: `mailbox@[127.0.0.1]`
130
+ * Non-ASCII (7-bit) characters in the local part: `Pelé@example.com`
131
+ * Validation by voodoo regular expressions
132
+ * Gmail allows ".." in addresses since they are not meaningful, but
133
+ the standard does not.
134
+
135
+ ## Installation With Rails or Bundler
136
+
137
+ If you are using Rails or a project with Bundler, add this line to your application's Gemfile:
154
138
 
155
139
  gem 'email_address'
156
140
 
@@ -158,100 +142,361 @@ And then execute:
158
142
 
159
143
  $ bundle
160
144
 
161
- Or install it yourself as:
145
+ ## Installation Without Bundler
146
+
147
+ If you are not using Bundler, you need to install the gem yourself.
162
148
 
163
149
  $ gem install email_address
164
150
 
151
+ Require the gem inside your script.
152
+
153
+ require 'rubygems'
154
+ require 'email_address'
155
+
165
156
  ## Usage
166
157
 
167
- Inspect your email address string by creating an instance of
168
- EmailAddress:
158
+ Use `EmailAddress` to do transformations and validations. You can also
159
+ instantiate an object to inspect the address.
169
160
 
170
- email = EmailAddress.new("USER+tag@EXAMPLE.com")
171
- email.normalize #=> "user+tag@example.com"
172
- email.canonical #=> "user@example.com"
173
- email.redact #=> "63a710569261a24b3766275b7000ce8d7b32e2f7@example.com"
174
- email.sha1 #=> "63a710569261a24b3766275b7000ce8d7b32e2f7"
175
- email.md5 #=> "dea073fb289e438a6d69c5384113454c"
161
+ These top-level helpers return edited email addresses and validation
162
+ check.
176
163
 
177
- Email Service Provider (ESP) specific edits can be created to provide
178
- validations and canonical manipulations. A few are given out of the box.
179
- Providers can be defined bu email domain match rules, or by match rules
180
- for the MX host names using domains or CIDR addresses.
164
+ address = "Clark.Kent+scoops@gmail.com"
165
+ EmailAddress.valid?(address) #=> true
166
+ EmailAddress.normal(address) #=> "clark.kent+scoops@gmail.com"
167
+ EmailAddress.canonical(address) #=> "clarkkent@gmail.com"
168
+ EmailAddress.reference(address) #=> "c5be3597c391169a5ad2870f9ca51901"
169
+ EmailAddress.redact(address) #=> "{bea3f3560a757f8142d38d212a931237b218eb5e}@gmail.com"
170
+ EmailAddress.munge(address) #=> "cl*****@gm*****"
171
+ EmailAddress.matches?(address, 'google') #=> 'google' (true)
172
+ EmailAddress.error("#bad@example.com") #=> "Invalid Mailbox"
181
173
 
182
- email = EmailAddress.new("First.Last+Tag@Gmail.Com")
174
+ Or you can create an instance of the email address to work with it.
175
+
176
+ email = EmailAddress.new(address) #=> #<EmailAddress::Address:0x007fe6ee150540 ...>
177
+ email.normal #=> "clark.kent+scoops@gmail.com"
178
+ email.canonical #=> "clarkkent@gmail.com"
179
+ email.original #=> "Clark.Kent+scoops@gmail.com"
180
+ email.valid? #=> true
181
+
182
+ Here are some other methods that are available.
183
+
184
+ email.redact #=> "{bea3f3560a757f8142d38d212a931237b218eb5e}@gmail.com"
185
+ email.sha1 #=> "bea3f3560a757f8142d38d212a931237b218eb5e"
186
+ email.md5 #=> "c5be3597c391169a5ad2870f9ca51901"
187
+ email.host_name #=> "gmail.com"
183
188
  email.provider #=> :google
184
- email.canonical #=> "firstlast@gmail.com"
189
+ email.mailbox #=> "clark.kent"
190
+ email.tag #=> "scoops"
191
+
192
+ email.host.exchanger.first[:ip] #=> "2a00:1450:400b:c02::1a"
193
+ email.host.txt_hash #=> {:v=>"spf1", :redirect=>"_spf.google.com"}
194
+
195
+ EmailAddress.normal("HIRO@こんにちは世界.com")
196
+ #=> "hiro@xn--28j2a3ar1pp75ovm7c.com"
197
+ EmailAddress.normal("hiro@xn--28j2a3ar1pp75ovm7c.com", host_encoding: :unicode)
198
+ #=> "hiro@こんにちは世界.com"
199
+
200
+ #### Rails Validator
201
+
202
+ For Rails' ActiveRecord classes, EmailAddress provides an ActiveRecordValidator.
203
+ Specify your email address attributes with `field: :user_email`, or
204
+ `fields: [:email1, :email2]`. If neither is given, it defaults to the
205
+ `email` or `email_address` attribute.
206
+
207
+ class User < ActiveRecord::Base
208
+ validates_with EmailAddress::ActiveRecordValidator, field: :email
209
+ end
210
+
211
+ #### Rails Email Address Type Attribute
212
+
213
+ Initial support is provided for Active Record 5.0 attributes API.
214
+
215
+ First, you need to register the type in
216
+ `config/initializers/email_address.rb` along with any global
217
+ configurations you want.
218
+
219
+ ActiveRecord::Type.register(:email_address, EmailAddress::EmailAddressType)
220
+ ActiveRecord::Type.register(:canonical_email_address,
221
+ EmailAddress::CanonicalEmailAddressType)
222
+
223
+ Assume the Users table contains the columns "email" and "canonical_email".
224
+ We want to normalize the address in "email" and store the canonical/unique
225
+ version in "canonical_email". This code will set the canonical_email when
226
+ the email attribute is assigned. With the canonical_email column,
227
+ we can look up the User, even if the given email address didn't exactly
228
+ match the registered version.
229
+
230
+ class User < ApplicationRecord
231
+ attribute :email, :email_address
232
+ attribute :canonical_email, :canonical_email_address
185
233
 
186
- Storing the canonical address with the request address (don't remove
187
- tags given by users), you can lookup email addresses without the
188
- original formatting, case, and tag information.
234
+ validates_with EmailAddress::ActiveRecordValidator,
235
+ fields: %i(email canonical_email)
189
236
 
190
- You can inspect the MX (Mail Exchanger) records
237
+ def email=(email_address)
238
+ self[:canonical_email] = email_address
239
+ self[:email] = email_address
240
+ end
191
241
 
192
- email.host.exchanger.mxers.first
193
- #=> {:host=>"alt3.gmail-smtp-in.l.google.com", :ip=>"173.194.70.27", :priority=>30}
242
+ def self.find_by_email(email)
243
+ user = self.find_by(email: EmailAddress.normal(email))
244
+ user ||= self.find_by(canonical_email: EmailAddress.canonical(email))
245
+ user ||= self.find_by(canonical_email: EmailAddress.redact(email))
246
+ user
247
+ end
194
248
 
195
- You can see if it validates as an opinionated address:
249
+ def redact!
250
+ self[:canonical_email] = EmailAddress.redact(self.canonical_email)
251
+ self[:email] = self[:canonical_email]
252
+ end
253
+ end
254
+
255
+ Here is how the User model works:
256
+
257
+ user = User.create(email:"Pat.Smith+registrations@gmail.com")
258
+ user.email #=> "pat.smith+registrations@gmail.com"
259
+ user.canonical_email #=> "patsmith@gmail.com"
260
+ User.find_by_email("PAT.SMITH@GMAIL.COM")
261
+ #=> #<User email="pat.smith+registrations@gmail.com">
262
+
263
+
264
+ The `find_by_email` method looks up a given email address by the
265
+ normalized form (lower case), then by the canonical form, then finally
266
+ by the redacted form.
267
+
268
+ #### Validation
196
269
 
197
- email.valid? # Resonably valid?
198
- email.errors #=> [:mx]
199
- email.valid_host? # Host name is defined in DNS
200
- email.strict? # Strictly valid?
270
+ The only true validation is to send a message to the email address and
271
+ have the user (or process) verify it has been received. Syntax checks
272
+ help prevent erroneous input. Even sent messages can be silently
273
+ dropped, or bounced back after acceptance. Conditions such as a
274
+ "Mailbox Full" can mean the email address is known, but abandoned.
275
+
276
+ There are different levels of validations you can perform. By default, it will
277
+ validate to the "Provider" (if known), or "Conventional" format defined as the
278
+ "default" provider. You may pass a list of parameters to select
279
+ which syntax and network validations to perform.
280
+
281
+ #### Comparison
201
282
 
202
283
  You can compare email addresses:
203
284
 
204
- e1 = EmailAddress.new("First.Last@Gmail.com")
205
- e1.to_s #=> "first.last@gmail.com"
206
- e2 = EmailAddress.new("FirstLast+tag@Gmail.com")
207
- e3.to_s #=> "firstlast+tag@gmail.com"
285
+ e1 = EmailAddress.new("Clark.Kent@Gmail.com")
286
+ e2 = EmailAddress.new("clark.kent+Superman@Gmail.com")
208
287
  e3 = EmailAddress.new(e2.redact)
209
- e3.to_s #=> "554d32017ab3a7fcf51c88ffce078689003bc521@gmail.com"
288
+ e1.to_s #=> "clark.kent@gmail.com"
289
+ e2.to_s #=> "clark.kent+superman@gmail.com"
290
+ e3.to_s #=> "{bea3f3560a757f8142d38d212a931237b218eb5e}@gmail.com"
210
291
 
211
292
  e1 == e2 #=> false (Matches by normalized address)
212
293
  e1.same_as?(e2) #=> true (Matches as canonical address)
213
294
  e1.same_as?(e3) #=> true (Matches as redacted address)
214
295
  e1 < e2 #=> true (Compares using normalized address)
215
296
 
216
- ## Host Inspection
297
+ #### Matching
217
298
 
218
- The `EmailAddress::Host` can be used to inspect the email domain.
299
+ Matching addresses by simple patterns:
219
300
 
220
- ```ruby
221
- e1 = EmailAddress.new("First.Last@Gmail.com")
222
- e1.host.name #=> "gmail.com"
223
- e1.host.exchanger.mxers #=> [["alt4.gmail-smtp-in.l.google.com", "2a00:1450:400c:c01::1b", 30],...]
224
- e1.host.exchanger.mx_ips #=> ["2a00:1450:400c:c01::1b", ...]
225
- e1.host.matches?('.com') #=> true
226
- e1.host.txt #=> "v=spf1 redirect=_spf.google.com"
227
- ```
301
+ * Top-Level-Domain: .org
302
+ * Domain Name: example.com
303
+ * Registration Name: hotmail. (matches any TLD)
304
+ * Domain Glob: *.exampl?.com
305
+ * Provider Name: google
306
+ * Mailbox Name or Glob: user00*@
307
+ * Address or Glob: postmaster@domain*.com
308
+ * Provider or Registration: msn
228
309
 
229
- ## Domain Matching
310
+ Usage:
230
311
 
231
- You can also employ domain matching rules
312
+ e = EmailAddress.new("Clark.Kent@Gmail.com")
313
+ e.matches?("gmail.com") #=> true
314
+ e.matches?("google") #=> true
315
+ e.matches?(".org") #=> false
316
+ e.matches?("g*com") #=> true
317
+ e.matches?("gmail.") #=> true
318
+ e.matches?("*kent*@") #=> true
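To make the rule shapes above concrete, here is a small sketch of how such patterns could be evaluated with `File.fnmatch?` from the standard library. It is an illustration under assumptions, not the gem's matcher: `rule_matches?` is a hypothetical name, and provider names such as `google` are deliberately left out because they need a provider table.

```ruby
# Evaluate one match rule against an address.
# Hypothetical helper; provider-name rules are not handled here.
def rule_matches?(address, rule)
  local, host = address.downcase.split("@", 2)
  rule = rule.downcase

  if rule.end_with?("@")
    # Mailbox name or glob, e.g. "user00*@"
    File.fnmatch?(rule.chomp("@"), local)
  elsif rule.include?("@")
    # Full address or glob, e.g. "postmaster@domain*.com"
    File.fnmatch?(rule, "#{local}@#{host}")
  elsif rule.start_with?(".")
    # Top-level domain, e.g. ".org"
    host.end_with?(rule)
  elsif rule.end_with?(".")
    # Registration name under any TLD, e.g. "hotmail."
    host.split(".")[0...-1].include?(rule.chomp("."))
  else
    # Domain name or glob, e.g. "example.com" or "*.exampl?.com"
    File.fnmatch?(rule, host)
  end
end
```

The branch order matters: rules containing "@" are checked first so that a trailing "." or leading "." is only interpreted for host-only rules.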
232
319
 
233
- email.host.matches?('gmail.com', '.us', '.msn.com', 'yahoo')
320
+ ### Configuration
234
321
 
235
- This tests the address can be matched in the given list of domain rules:
322
+ You can pass an options hash on the `.new()` and helper class methods to
323
+ control how the library treats that address. These can also be
324
+ configured during initialization by provider and default (see below).
236
325
 
237
- * Full host name. (subdomain.example.com)
238
- * TLD and domain wildcards (.us, .msg.com)
239
- * Registration names matching without the TLD. 'yahoo' matches:
240
- * "www.yahoo.com" (with Subdomains)
241
- * "yahoo.ca" (any TLD)
242
- * "yahoo.co.jp" (2-char TLD with 2-char Second-level)
243
- * But _may_ also match non-Yahoo domain names (yahoo.xxx)
326
+ EmailAddress.new("clark.kent@gmail.com",
327
+ dns_lookup: :off, host_encoding: :unicode)
244
328
 
245
- ## Customizing
329
+ Globally, you can change and query configuration options:
246
330
 
247
- You can change configuration options and add new providers such as:
331
+ EmailAddress::Config.setting(:dns_lookup, :mx)
332
+ EmailAddress::Config.setting(:dns_lookup) #=> :mx
248
333
 
249
- EmailAddress::Config.setup do
250
- provider :github, domains:%w(github.com github.io)
251
- option :check_dns, false
252
- end
334
+ Or set multiple settings at once:
335
+
336
+ EmailAddress::Config.configure(local_downcase: false, dns_lookup: :off)
337
+
338
+ You can add special rules by domain or provider. It takes the options
339
+ above and adds the :domain_match and :exchanger_match rules.
340
+
341
+ EmailAddress.define_provider('google',
342
+ domain_match: %w(gmail.com googlemail.com),
343
+ exchanger_match: %w(google.com), # Requires dns_lookup==:mx
344
+ local_size: 5..64,
345
+ mailbox_canonical: ->(m) {m.gsub('.','')})
346
+
347
+ The library ships with the most common set of provider rules. It is not meant
348
+ to house a database of all providers, but a separate `email_address-providers`
349
+ gem may be created to hold this data for those who need more complete rules.
350
+
351
+ Personal and corporate email systems are not intended to be covered by either solution.
352
+ Any of these email systems may be configured locally.
353
+
354
+ Pre-configured email address providers include: Google (gmail), AOL, MSN
355
+ (hotmail, live, outlook), and Yahoo. Any address not matching one of
356
+ those patterns uses the "default" provider rule set. Exchanger rules
357
+ match against the Mail Exchanger (SMTP receiver) hosts defined in
358
+ DNS. Specifying an exchanger pattern requires a DNS MX lookup.
359
+
360
+ For a Rails application, create an initializer file with your default
361
+ configuration options:
362
+
363
+ # ./config/initializers/email_address.rb
364
+ EmailAddress::Config.setting( local_format: :relaxed )
365
+ EmailAddress::Config.provider(:github,
366
+ host_match: %w(github.com), local_format: :standard)
367
+
368
+ ### Available Configuration Settings
369
+
370
+ * dns_lookup: Enables DNS lookup for validation by
371
+ * :mx - DNS MX Record lookup
372
+ * :a - DNS A Record lookup (as some domains incorrectly omit an MX record)
373
+ * :off - Do not perform DNS lookup (Test mode, network unavailable)
374
+
375
+ * sha1_secret -
376
+ This application-level secret is appended to the email_address to compute
377
+ the SHA1 Digest, making it unique to your application so it can't easily be
378
+ discovered by comparing against a known list of email/sha1 pairs.
379
+
380
+ * munge_string - "*****", the string to replace into munged addresses.
381
+
382
+ For local part configuration:
253
383
 
254
- See `lib/email_address/config.rb` for more options.
384
+ * local_downcase: true.
385
+ Downcase the local part. You probably want this for uniqueness.
386
+ The RFC says the local part is case-sensitive; that's a bad part.
387
+
388
+ * local_fix: true.
389
+ Make simple fixes when available: remove spaces, condense repeated punctuation.
390
+
391
+ * local_encoding: :ascii, :unicode,
392
+ Enable Unicode in local part. Most mail systems do not yet support this.
393
+ You probably want to stay with ASCII for now.
394
+
395
+ * local_parse: nil, ->(local) { [mailbox, tag, comment] }
396
+ Specify an optional lambda/Proc to parse the local part. It should return an
397
+ array (tuple) of mailbox, tag, and comment.
398
+
399
+ * local_format:
400
+ * :conventional - word ( punctuation{1} word )*
401
+ * :relaxed - alphanum ( allowed_characters)* alphanum
402
+ * :standard - RFC Compliant email addresses (anything goes!)
403
+
404
+ * local_size: 1..64,
405
+ A Range specifying the allowed size for mailbox + tags + comment
406
+
407
+ * tag_separator: nil, character (+)
408
+ Nil, or a character used to split the tag from the mailbox
409
+
410
+ For the mailbox (AKA account, role), without the tag
411
+ * mailbox_size: 1..64
412
+ A Range specifying the allowed size for mailbox
413
+
414
+ * mailbox_canonical: nil, ->(mailbox) { mailbox }
415
+ An optional lambda/Proc taking a mailbox name, returning a canonical
416
+ version of it. (E.G.: gmail removes '.' characters)
417
+
418
+ * mailbox_validator: nil, ->(mailbox) { true }
419
+ An optional lambda/Proc taking a mailbox name, returning true or false.
420
+
421
+ * host_encoding: :punycode, :unicode,
422
+ How to treat International Domain Names (IDN). Note that most mail and
423
+ DNS systems do not support unicode, so punycode needs to be passed.
424
+ :punycode Convert Unicode names to punycode representation
425
+ :unicode Keep Unicode names as is.
426
+
427
+ * host_validation:
428
+ :mx Ensure host is configured with DNS MX records
429
+ :a Ensure host is known to DNS (A Record)
430
+ :syntax Validate by syntax only, no Network verification
431
+ :connect Attempt host connection (not implemented, BAD!)
432
+
433
+ * host_size: 1..253,
434
+ A range specifying the size limit of the host part,
435
+
436
+ * host_allow_ip: false,
437
+ Allow IP address format in host: [127.0.0.1], [IPv6:::1]
438
+
439
+ * address_validation: :parts, :smtp, ->(address) { true }
440
+ Address validation policy
441
+ :parts Validate local and host.
442
+ :smtp Validate via SMTP (not implemented, BAD!)
443
+ A lambda/Proc taking the address string, returning true or false
444
+
445
+ * address_size: 3..254,
446
+ A range specifying the size limit of the complete address
447
+
448
+ * address_local: false,
449
+ Allow localhost, no domain, or local subdomains.
450
+
451
+ For provider rules to match to domain names and Exchanger hosts
452
+ The value is an array of match tokens.
453
+ * host_match: %w(.org example.com hotmail. user*@ sub.*.com)
454
+ * exchanger_match: %w(google.com 127.0.0.1 10.9.8.0/24 ::1/64)
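The `local_format` grammars in the settings above (`word ( punctuation{1} word )*` for conventional, `alphanum ( allowed_characters )* alphanum` for relaxed) translate fairly directly into regular expressions. The sketch below is an approximation: the punctuation set `-+'.,_` is an assumption, the constant and method names are hypothetical, and the gem's actual character classes may differ.

```ruby
# Assumed punctuation set: - + ' . , _
# Conventional: words separated by exactly one punctuation character.
CONVENTIONAL_LOCAL = /\A[a-z0-9]+(?:[-+'.,_][a-z0-9]+)*\z/i

# Relaxed: same characters in any order, but alphanumeric at both ends.
RELAXED_LOCAL = /\A[a-z0-9](?:[-a-z0-9+'.,_]*[a-z0-9])?\z/i

# Classify a local part by the strictest format it satisfies.
def local_format(local)
  return :conventional if CONVENTIONAL_LOCAL.match?(local)
  return :relaxed if RELAXED_LOCAL.match?(local)
  :other
end

local_format("miles.o'brien") # single punctuation between words
local_format("aasdf-34-.z")   # consecutive punctuation, still relaxed
```

Anything classified as `:other` here would need the `:standard` (RFC) rules, which are not attempted in this sketch.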
455
+
456
+ ## Notes
457
+
458
+ #### Internationalization
459
+
460
+ The industry is moving to support Unicode characters in the local part
461
+ of the email address. Currently, SMTP supports only 7-bit ASCII, but a
462
+ new `SMTPUTF8` standard is available, though not yet widely implemented.
463
+ To work properly, global Email systems must be converted to UTF-8
464
+ encoded databases and upgraded to the new email standards.
465
+
466
+ The problem with i18n email addresses is that, outside of the
467
+ given locale, it becomes hard to enter addresses on keyboards for another
468
+ locale. Because of this, internationalized local parts are not yet
469
+ supported by default. They are more likely to be erroneous.
470
+
471
+ Proper personal identity can still be provided using
472
+ [MIME Encoded-Words](http://en.wikipedia.org/wiki/MIME#Encoded-Word)
473
+ in Email headers.
474
+
475
+
476
+ #### Email Addresses as Sensitive Data
477
+
478
+ Like Social Security and Credit Card Numbers, email addresses are
479
+ becoming more important as a personal identifier on the internet.
480
+ Increasingly, we should treat email addresses as sensitive data. If your
481
+ site/database becomes compromised by hackers, these email addresses can
482
+ be stolen and used to spam your users and to try to gain access to their
483
+ accounts. You should not be storing passwords in plain text; perhaps you
484
+ don't need to store email addresses un-encoded either.
485
+
486
+ Consider this: upon registration, store the redacted email address for
487
+ the user, and of course, the salted, hashed password.
488
+ When the user logs in, compute the redacted email address from
489
+ the user-supplied one and look up the record. Store the original address
490
+ in the session for the user, which goes away when the user logs out.
491
+
492
+ Sometimes, users demand you strike their information from the database.
493
+ Instead of deleting their account, you can "redact" their email
494
+ address, retaining the state of the account to prevent future
495
+ access. Given the original email address again, the redacted account can
496
+ be identified if necessary.
497
+
498
+ Because of these use cases, the **redact** method on the email address
499
+ instance has been provided.
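The register-then-look-up flow described above can be sketched without the gem. Everything here is an assumption made for illustration: `SECRET` stands in for the `sha1_secret` setting (appended to the address before hashing, as that setting describes), `ACCOUNTS` is an in-memory stand-in for a database table, and the method names are hypothetical. Tags and provider rules are not handled.

```ruby
require "digest/sha1"

SECRET = "change-me" # hypothetical application-level secret
ACCOUNTS = {}        # hypothetical store keyed by redacted address

# Replace the local part with a SHA1 digest; the host stays readable.
def redact(address)
  local, host = address.strip.downcase.split("@", 2)
  digest = Digest::SHA1.hexdigest("#{local}@#{host}#{SECRET}")
  "{#{digest}}@#{host}"
end

# Store the account under the redacted address only.
def register(address, profile)
  ACCOUNTS[redact(address)] = profile
end

# Redact the address supplied at login and look the record up.
def find_account(address)
  ACCOUNTS[redact(address)]
end

register("Pat.Smith@example.com", { name: "Pat" })
find_account("pat.smith@EXAMPLE.com") # finds the record stored above
```

Since only the digest is stored, the plain address never appears in `ACCOUNTS`, yet any later request that supplies the same address resolves to the same key.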
255
500
 
256
501
  ## Contributing
257
502
 
@@ -260,3 +505,12 @@ See `lib/email_address/config.rb` for more options.
260
505
  3. Commit your changes (`git commit -am 'Add some feature'`)
261
506
  4. Push to the branch (`git push origin my-new-feature`)
262
507
  5. Create new Pull Request
508
+
509
+ #### Project
510
+
511
+ This project lives at [https://github.com/afair/email_address/](https://github.com/afair/email_address/)
512
+
513
+ #### Authors
514
+
515
+ * [Allen Fair](https://github.com/afair) ([@allenfair](https://twitter.com/allenfair)):
516
+ I've worked with email-based applications and email addresses since 1999.