email_address 0.0.3 → 0.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 77e8fb26631e827d41da7eee015141495316f430
4
- data.tar.gz: 0fa66926ca1dbbdc13154aae789850eec7e1eb75
3
+ metadata.gz: 570217e88251b8510966cf63bd1a90a7fa79b458
4
+ data.tar.gz: c23a8a65b94f8a5a6de14f221e8db7f16d0df6e1
5
5
  SHA512:
6
- metadata.gz: 5d41c4ca234699c636811187718fc7b73fe52ff55256ebbeb3e32e518d3330c016a72c3185f9632be0cb4d612064e09408517f893927ddd23db9518d0c219066
7
- data.tar.gz: 93c7fec757b3a635b39aea3bdc6a259ad5c69b0b5369d4ea48e41cde6009b406a8c2eec94a0bc1e42ed837b220a240d971079135e58455a57d0db5ad1e2df3dd
6
+ metadata.gz: d3e7f8dd2a92753b889bc4d8734961fd12626a40d28a06a52ce7e5c9c2691a5a2c62a8a05fc061741d54ec3c3576e1a6bde782b60e1647c3d4770bce95080c52
7
+ data.tar.gz: 9c05fb4ace40a99b51bf5c6646b0cc1d398a857e9e2f3074e3b05418c0562610e51bc7b0cef55ca7be4946b74de53bbff3b8fa69e121c8f064c5b08fd97e854d
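The digests recorded in `checksums.yaml` can be checked by hand. The following is a minimal sketch, not RubyGems' own verification code. It assumes `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` archive; the helper name `digests_match?` and the stand-in data are hypothetical.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: compare the SHA1 and SHA512 hex digests of +bytes+
# against the values recorded in checksums.yaml for one archive member.
def digests_match?(bytes, expected)
  Digest::SHA1.hexdigest(bytes) == expected.fetch('SHA1') &&
    Digest::SHA512.hexdigest(bytes) == expected.fetch('SHA512')
end

# Stand-in data; in practice, read the member: File.binread('data.tar.gz')
SAMPLE_BYTES = 'example archive bytes'
RECORDED = {
  'SHA1'   => Digest::SHA1.hexdigest(SAMPLE_BYTES),
  'SHA512' => Digest::SHA512.hexdigest(SAMPLE_BYTES)
}

puts digests_match?(SAMPLE_BYTES, RECORDED)        # true: content unchanged
puts digests_match?(SAMPLE_BYTES + 'x', RECORDED)  # false: content altered
```

Both digests must match for the check to pass, which is why every entry changes together between the two versions.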
data/.travis.yml ADDED
@@ -0,0 +1,10 @@
1
+ language: ruby
2
+ rvm:
3
+ - ruby-head
4
+ - 2.2.2
5
+ - jruby-9.0.4.0
6
+ #- rbx
7
+
8
+ addons:
9
+ code_climate:
10
+ repo_token: 729b2d0e2cc94f63b7da91042da0b6d77bbe30c2024dba04881f46caf702636f
data/Gemfile CHANGED
@@ -2,4 +2,3 @@ source 'https://rubygems.org'
2
2
 
3
3
  # Specify your gem's dependencies in email_address.gemspec
4
4
  gemspec
5
- gem 'minitest'
data/README.md CHANGED
@@ -1,156 +1,140 @@
1
1
  # Email Address
2
2
 
3
3
  [![Gem Version](https://badge.fury.io/rb/email_address.svg)](http://rubygems.org/gems/email_address)
4
+ [![Build Status](https://travis-ci.org/afair/email_address.svg?branch=v0.1)](https://travis-ci.org/afair/email_address)
5
+ [![Code Climate](https://codeclimate.com/github/afair/email_address/badges/gpa.svg)](https://codeclimate.com/github/afair/email_address)
4
6
 
5
- The EmailAddress gem provides a structured datatype for email addresses
6
- and pushes for an _opinionated_ model for which RFC patterns should be
7
- accepted as a "best practice" and which should not be supported (in the
8
- name of sanity).
9
-
10
- This library provides:
11
-
12
- * Email Address Validation
13
- * Converting between email address forms
14
- * **Original:** From the user or data source
15
- * **Normalized:** A standardized format for identification
16
- * **Canonical:** A format used to identify a unique user
17
- * **Redacted:** A format used to store an email address privately
18
- * **Reference:** Digest formats for sharing addresses without exposing
19
- them.
20
- * Matching addresses to Email/Internet Service Providers. Per-provider
21
- rules for:
22
- * Validation
23
- * Address Tag formats
24
- * Canonicalization
25
- * Unicode Support
26
-
27
- ## Email Addresses: The Good Parts
28
-
29
- Email Addresses are split into two parts: the `local` and `host` part,
30
- separated by the `@` symbol, or of the generalized format:
31
-
32
- mailbox+tag@subdomain.domain.tld
33
-
34
- The **Mailbox** usually identifies the user, role account, or application.
35
- A **Tag** is any suffix for the mailbox useful for separating and filtering
36
- incoming email. It is usually preceded by a '+' or other character. Tags are
37
- not always available for a given ESP or MTA.
38
-
39
- Local Parts should consist of lower-case 7-bit ASCII alphanumeric and these characters:
40
- `-+'.,` It should start with and end with an alphanumeric character and
41
- no more than one special character should appear together.
42
-
43
- Host parts contain a lower-case version of any standard domain name.
44
- International Domain Names are allowed, and can be converted to
45
- [Punycode](http://en.wikipedia.org/wiki/Punycode),
46
- an encoding system of Unicode strings into the 7-bit ASCII character set.
47
- Domain names should be configured with MX records in DNS to receive
48
- email, though this is sometimes mis-configured and the A record can be
49
- used as a backup.
50
-
51
- This is the subset of the RFC Email Address specification that should be
52
- used.
53
-
54
- ## Email Addresses: The Bad Parts
55
-
56
- Email addresses are defined and redefined in a series of RFC standards.
57
- Conforming to the full standards is not recommended for easily
58
- identifying and supporting email addresses. Among these specification,
59
- we reject are:
7
+ The `email_address` gem provides a Ruby language library for working
8
+ with email addresses.
60
9
 
61
- * Case-sensitive local parts: `First.Last@example.com`
62
- * Spaces and Special Characters: `"():;<>@[\\]`
63
- * Quoting and Escaping Requirements: `"first \"nickname\" last"@example.com`
64
- * Comment Parts: `(comment)mailbox@example.com`
65
- * IP and IPv6 addresses as hosts: `mailbox@[127.0.0.1]`
66
- * Non-ASCII (7-bit) characters in the local part: `Pelé@example.com`
67
- * Validation by regular expressions like:
68
- ```
69
- (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
70
- | "(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
71
- | \\[\x01-\x09\x0b\x0c\x0e-\x7f])*")
72
- @ (?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
73
- | \[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
74
- (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:
75
- (?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]
76
- | \\[\x01-\x09\x0b\x0c\x0e-\x7f])+)
77
- \])
78
- ```
79
-
80
- ## Internationalization
10
+ By default, it validates against conventional usage,
11
+ the format preferred for user email addresses.
12
+ It can be configured to validate against RFC "Standard" formats,
13
+ common email service provider formats, and perform DNS validation.
81
14
 
82
- The industry is moving to support Unicode characters in the local part
83
- of the email address. Currently, SMTP supports only 7-bit ASCII, but a
84
- new `SMTPUTF8` standard is available, but not yet widely implemented.
85
- To work properly, global Email systems must be converted to UTF-8
86
- encoded databases and upgraded to the new email standards.
15
+ Using `email_address` to validate user email addresses results in
16
+ fewer "false positives" due to typing errors and gibberish data.
17
+ It validates syntax more strictly for popular email providers,
18
+ and can deal with gmail's "optional dots" in addresses.
87
19
 
88
- The problem with i18n email addresses is that support outside of the
89
- given locale becomes hard to enter addresses on keyboards for another
90
- locale. Because of this, internationalized local parts are not yet
91
- supported by default. They are more likely to be erroneous.
20
+ It provides Active Record (Rails) extensions, including an
21
+ address validator and attributes API custom datatypes.
92
22
 
93
- Proper personal identity can still be provided using
94
- [MIME Encoded-Words](http://en.wikipedia.org/wiki/MIME#Encoded-Word)
95
- in Email headers.
23
+ *Note:* Version 0.1.0 contains significant API and internal changes over the 0.0.3
24
+ version. If you have been using the 0.0.x series of the gem, you may
25
+ want to continue using your current version.
96
26
 
97
- ## Email Addresses Forms
98
-
99
- * The **original** email address is of the format given by the user.
100
- * The **Normalized** address has:
101
- * Lower-case the local and domain part
102
- * Tags are kept as they are important for the user
103
- * Remove comments and any "bad parts"
104
- * This format is what should be used to identify the account.
105
- * The **Canonical** form is used to uniquely identify the mailbox.
106
- * Domains stored as punycode for IDN
107
- * Address Tags removed
108
- * Special characters removed (dots in gmail addresses are not
109
- significant)
110
- * Lower cased and "bad parts" removed
111
- * Useful for locating a user who forgets registering with a tag or
112
- with a "Bad part" in the email address.
113
- * The **Redacted** format is used to store email address fingerprints
114
- instead of the actual addresses:
115
- * Format: sha1(canonical_address)@domain
116
- * Given an email address, the record can be found
117
- * Useful for treating email addresses as sensitive data and
118
- complying with requests to remove the address from your database and
119
- still maintain the state of the account.
120
- * The **Reference** form allows you to publicly share an address without
121
- revealing the actual address.
122
- * Can be the MD5 or SHA1 of the normalized or canonical address
123
- * Useful for "do not email" lists
124
- * Useful for cookies that do not reveal the actual account
125
-
126
- ## Treating Email Addresses as Sensitive Data
27
+ Requires Ruby 2.0 or later.
127
28
 
128
- Like Social Security and Credit Card Numbers, email addresses are
129
- becoming more important as a personal identifier on the internet.
130
- Increasingly, we should treat email addresses as sensitive data. If your
131
- site/database becomes compromised by hackers, these email addresses can
132
- be stolen and used to spam your users and to try to gain access to their
133
- accounts. You should not be storing passwords in plain text; perhaps you
134
- don't need to store email addresses un-encoded either.
29
+ ## Background
135
30
 
136
- Consider this: upon registration, store the redacted email address for
137
- the user, and of course, the salted, encrypted password.
138
- When the user logs in, compute the redacted email address from
139
- the user-supplied one and look up the record. Store the original address
140
- in the session for the user, which goes away when the user logs out.
31
+ The email address specification is complex and often not what you want
32
+ when working with personal email addresses in applications. This library
33
+ introduces terms to distinguish types of email addresses.
141
34
 
142
- Sometimes, users demand you strike their information from the database.
143
- Instead of deleting their account, you can "redact" their email
144
- address, retaining the state of the account to prevent future
145
- access. Given the original email address again, the redacted account can
146
- be identified if necessary.
35
+ * *Normal* - The edited form of any input email address. Typically, it
36
+ is lower-cased and minor "fixes" can be performed, depending on the
37
+ configurations and email address provider.
147
38
 
148
- Because of these use cases, the **redact** method on the email address
149
- instance has been provided.
39
+ CKENT@DAILYPLANET.NEWS => ckent@dailyplanet.news
40
+
41
+ * *Conventional* - Most personal account addresses are in this basic
42
+ format, one or more "words" separated by a single simple punctuation
43
+ character. It consists of a mailbox (user name or role account) and
44
+ an optional address "tag" assigned by the user.
45
+
46
+ miles.o'brien@ncc-1701-d.ufp
47
+
48
+ * *Relaxed* - A less strict form of Conventional, same character set,
49
+ must begin and end with an alpha-numeric character, but order within
50
+ is not enforced.
51
+
52
+ aasdf-34-.z@example.com
53
+
54
+ * *Standard* - The RFC-Compliant syntax of an email address. This is
55
+ useful when working with software-generated addresses or handling
56
+ existing email addresses, but otherwise not useful for personal
57
+ addresses.
58
+
59
+ madness!."()<>[]:,;@\\\"!#$%&'*+-/=?^_`{}| ~.a(comment )"@example.org
150
60
 
151
- ## Installation
61
+ * *Canonical* - A unique account address, lower-cased, without the
62
+ tag, and with irrelevant characters stripped.
152
63
 
153
- Add this line to your application's Gemfile:
64
+ clark.kent+scoops@gmail.com => clarkkent@gmail.com
65
+
66
+ * *Reference* - The MD5 of the Canonical format, used to share account
67
+ references without exposing the private email address directly.
68
+
69
+ Clark.Kent+scoops@gmail.com => c5be3597c391169a5ad2870f9ca51901
70
+
71
+ * *Redacted* - A form of the email address where it is replaced by
72
+ a SHA1-based version to remove the original address from the
73
+ database, or to store the address privately, yet still keep it
74
+ accessible at query time by converting the queried address to
75
+ the redacted form.
76
+
77
+ Clark.Kent+scoops@gmail.com => {bea3f3560a757f8142d38d212a931237b218eb5e}@gmail.com
78
+
79
+ * *Munged* - An obfuscated version of the email address suitable for
80
+ publishing on the internet, where email address harvesting
81
+ could occur.
82
+
83
+ Clark.Kent+scoops@gmail.com => cl\*\*\*\*\*@gm\*\*\*\*\*
84
+
85
+ Other terms:
86
+
87
+ * *Local* - The left-hand side of the "@", representing the user,
88
+ mailbox, or role, and an optional "tag".
89
+
90
+ mailbox+tag@example.com; Local part: mailbox+tag
91
+
92
+ * *Mailbox* - The destination user account or role account.
93
+ * *Tag* - A parameter added after the mailbox, usually after the
94
+ "+" symbol, set by the user for mail filtering and sub-accounts.
95
+ Not all mail systems support this.
96
+ * *Host* (sometimes called *Domain*) - The right-hand side of the "@"
97
+ indicating the domain or host name server to which the email is delivered.
98
+ If missing, "localhost" is assumed, or if not a fully-qualified
99
+ domain name, it is assumed to be another computer on the same network, but
100
+ this is increasingly rare.
101
+ * *Provider* - The Email Service Provider (ESP) providing the email
102
+ service. Each provider may have its own email address validation
103
+ and canonicalization rules.
104
+ * *Punycode* - A host name with Unicode characters (International
105
+ Domain Name or IDN) needs conversion to this ASCII-encoded format
106
+ for DNS lookup.
107
+
108
+ "HIRO@こんにちは世界.com" => "hiro@xn--28j2a3ar1pp75ovm7c.com"
109
+
110
+ Wikipedia has a great article on
111
+ [Email Addresses](https://en.wikipedia.org/wiki/Email_address),
112
+ much more readable than the section within
113
+ [RFC 5322](https://tools.ietf.org/html/rfc5322#section-3.4)
114
+
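The address forms defined in this section can be illustrated without the gem. This is a rough sketch using only Ruby's standard library, assuming Gmail-style rules (a `+` tag separator and insignificant dots in the mailbox). The `sketch_*` method names are hypothetical, and this is not the library's implementation; the gem's digests may differ, for example when a secret is configured.

```ruby
require 'digest/md5'
require 'digest/sha1'

# Normal form: trimmed and lower-cased.
def sketch_normal(address)
  address.strip.downcase
end

# Canonical form under the assumed Gmail-style rules: drop the "+tag"
# and remove dots from the mailbox.
def sketch_canonical(address)
  local, host = sketch_normal(address).split('@', 2)
  mailbox = local.split('+', 2).first.delete('.')
  "#{mailbox}@#{host}"
end

# Reference form: MD5 hex digest of the canonical address.
def sketch_reference(address)
  Digest::MD5.hexdigest(sketch_canonical(address))
end

# Redacted form: {SHA1 of the canonical address}@host.
def sketch_redact(address)
  canonical = sketch_canonical(address)
  host = canonical.split('@', 2).last
  "{#{Digest::SHA1.hexdigest(canonical)}}@#{host}"
end

# Munged form: keep the first two characters of each side.
def sketch_munge(address, filler = '*****')
  local, host = sketch_normal(address).split('@', 2)
  "#{local[0, 2]}#{filler}@#{host[0, 2]}#{filler}"
end

puts sketch_normal('Clark.Kent+scoops@gmail.com')     # clark.kent+scoops@gmail.com
puts sketch_canonical('Clark.Kent+scoops@gmail.com')  # clarkkent@gmail.com
puts sketch_munge('Clark.Kent+scoops@gmail.com')      # cl*****@gm*****
puts sketch_reference('Clark.Kent+scoops@gmail.com')  # 32 hex characters
puts sketch_redact('Clark.Kent+scoops@gmail.com')     # {40 hex characters}@gmail.com
```

Because the reference and redacted forms are derived from the canonical form, two spellings of the same Gmail account produce the same digest.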
115
+ ## Avoiding the Bad Parts of RFC Specification
116
+
117
+ Following the RFC specification sounds like a good idea, until you
118
+ learn about all the madness contained therein. This library can
119
+ validate the RFC syntax, but this is rarely useful, especially when
120
+ validating user email address submissions. By default, it validates
121
+ to the *conventional* format.
122
+
123
+ Here are a few parts of the RFC specification you should avoid:
124
+
125
+ * Case-sensitive local parts: `First.Last@example.com`
126
+ * Spaces and Special Characters: `"():;<>@[\\]`
127
+ * Quoting and Escaping Requirements: `"first \"nickname\" last"@example.com`
128
+ * Comment Parts: `(comment)mailbox@example.com`
129
+ * IP and IPv6 addresses as hosts: `mailbox@[127.0.0.1]`
130
+ * Non-ASCII (7-bit) characters in the local part: `Pelé@example.com`
131
+ * Validation by voodoo regular expressions
132
+ * Gmail allows ".." in addresses since they are not meaningful, but
133
+ the standard does not.
134
+
135
+ ## Installation With Rails or Bundler
136
+
137
+ If you are using Rails or a project with Bundler, add this line to your application's Gemfile:
154
138
 
155
139
  gem 'email_address'
156
140
 
@@ -158,100 +142,361 @@ And then execute:
158
142
 
159
143
  $ bundle
160
144
 
161
- Or install it yourself as:
145
+ ## Installation Without Bundler
146
+
147
+ If you are not using Bundler, you need to install the gem yourself.
162
148
 
163
149
  $ gem install email_address
164
150
 
151
+ Require the gem inside your script.
152
+
153
+ require 'rubygems'
154
+ require 'email_address'
155
+
165
156
  ## Usage
166
157
 
167
- Inspect your email address string by creating an instance of
168
- EmailAddress:
158
+ Use `EmailAddress` to do transformations and validations. You can also
159
+ instantiate an object to inspect the address.
169
160
 
170
- email = EmailAddress.new("USER+tag@EXAMPLE.com")
171
- email.normalize #=> "user+tag@example.com"
172
- email.canonical #=> "user@example.com"
173
- email.redact #=> "63a710569261a24b3766275b7000ce8d7b32e2f7@example.com"
174
- email.sha1 #=> "63a710569261a24b3766275b7000ce8d7b32e2f7"
175
- email.md5 #=> "dea073fb289e438a6d69c5384113454c"
161
+ These top-level helpers return edited email addresses and validation
162
+ check.
176
163
 
177
- Email Service Provider (ESP) specific edits can be created to provide
178
- validations and canonical manipulations. A few are given out of the box.
179
- Providers can be defined bu email domain match rules, or by match rules
180
- for the MX host names using domains or CIDR addresses.
164
+ address = "Clark.Kent+scoops@gmail.com"
165
+ EmailAddress.valid?(address) #=> true
166
+ EmailAddress.normal(address) #=> "clark.kent+scoops@gmail.com"
167
+ EmailAddress.canonical(address) #=> "clarkkent@gmail.com"
168
+ EmailAddress.reference(address) #=> "c5be3597c391169a5ad2870f9ca51901"
169
+ EmailAddress.redact(address) #=> "{bea3f3560a757f8142d38d212a931237b218eb5e}@gmail.com"
170
+ EmailAddress.munge(address) #=> "cl*****@gm*****"
171
+ EmailAddress.matches?(address, 'google') #=> 'google' (true)
172
+ EmailAddress.error("#bad@example.com") #=> "Invalid Mailbox"
181
173
 
182
- email = EmailAddress.new("First.Last+Tag@Gmail.Com")
174
+ Or you can create an instance of the email address to work with it.
175
+
176
+ email = EmailAddress.new(address) #=> #<EmailAddress::Address:0x007fe6ee150540 ...>
177
+ email.normal #=> "clark.kent+scoops@gmail.com"
178
+ email.canonical #=> "clarkkent@gmail.com"
179
+ email.original #=> "Clark.Kent+scoops@gmail.com"
180
+ email.valid? #=> true
181
+
182
+ Here are some other methods that are available.
183
+
184
+ email.redact #=> "{bea3f3560a757f8142d38d212a931237b218eb5e}@gmail.com"
185
+ email.sha1 #=> "bea3f3560a757f8142d38d212a931237b218eb5e"
186
+ email.md5 #=> "c5be3597c391169a5ad2870f9ca51901"
187
+ email.host_name #=> "gmail.com"
183
188
  email.provider #=> :google
184
- email.canonical #=> "firstlast@gmail.com"
189
+ email.mailbox #=> "clark.kent"
190
+ email.tag #=> "scoops"
191
+
192
+ email.host.exchanger.first[:ip] #=> "2a00:1450:400b:c02::1a"
193
+ email.host.txt_hash #=> {:v=>"spf1", :redirect=>"\_spf.google.com"}
194
+
195
+ EmailAddress.normal("HIRO@こんにちは世界.com")
196
+ #=> "hiro@xn--28j2a3ar1pp75ovm7c.com"
197
+ EmailAddress.normal("hiro@xn--28j2a3ar1pp75ovm7c.com", host_encoding: :unicode)
198
+ #=> "hiro@こんにちは世界.com"
199
+
200
+ #### Rails Validator
201
+
202
+ For Rails' ActiveRecord classes, EmailAddress provides an ActiveRecordValidator.
203
+ Specify your email address attributes with `field: :user_email`, or
204
+ `fields: [:email1, :email2]`. If neither is given, it assumes the
205
+ `email` or `email_address` attribute.
206
+
207
+ class User < ActiveRecord::Base
208
+ validates_with EmailAddress::ActiveRecordValidator, field: :email
209
+ end
210
+
211
+ #### Rails Email Address Type Attribute
212
+
213
+ Initial support is provided for Active Record 5.0 attributes API.
214
+
215
+ First, you need to register the type in
216
+ `config/initializers/email_address.rb` along with any global
217
+ configurations you want.
218
+
219
+ ActiveRecord::Type.register(:email_address, EmailAddress::EmailAddressType)
220
+ ActiveRecord::Type.register(:canonical_email_address,
221
+ EmailAddress::CanonicalEmailAddressType)
222
+
223
+ Assume the Users table contains the columns "email" and "canonical_email".
224
+ We want to normalize the address in "email" and store the canonical/unique
225
+ version in "canonical_email". This code will set the canonical_email when
226
+ the email attribute is assigned. With the canonical_email column,
227
+ we can look up the User, even if the given email address didn't exactly
228
+ match the registered version.
229
+
230
+ class User < ApplicationRecord
231
+ attribute :email, :email_address
232
+ attribute :canonical_email, :canonical_email_address
185
233
 
186
- Storing the canonical address with the request address (don't remove
187
- tags given by users), you can lookup email addresses without the
188
- original formatting, case, and tag information.
234
+ validates_with EmailAddress::ActiveRecordValidator,
235
+ fields: %i(email canonical_email)
189
236
 
190
- You can inspect the MX (Mail Exchanger) records
237
+ def email=(email_address)
238
+ self[:canonical_email] = email_address
239
+ self[:email] = email_address
240
+ end
191
241
 
192
- email.host.exchanger.mxers.first
193
- #=> {:host=>"alt3.gmail-smtp-in.l.google.com", :ip=>"173.194.70.27", :priority=>30}
242
+ def self.find_by_email(email)
243
+ user = self.find_by(email: EmailAddress.normal(email))
244
+ user ||= self.find_by(canonical_email: EmailAddress.canonical(email))
245
+ user ||= self.find_by(canonical_email: EmailAddress.redact(email))
246
+ user
247
+ end
194
248
 
195
- You can see if it validates as an opinionated address:
249
+ def redact!
250
+ self[:canonical_email] = EmailAddress.redact(self.canonical_email)
251
+ self[:email] = self[:canonical_email]
252
+ end
253
+ end
254
+
255
+ Here is how the User model works:
256
+
257
+ user = User.create(email:"Pat.Smith+registrations@gmail.com")
258
+ user.email #=> "pat.smith+registrations@gmail.com"
259
+ user.canonical_email #=> "patsmith@gmail.com"
260
+ User.find_by_email("PAT.SMITH@GMAIL.COM")
261
+ #=> #<User email="pat.smith+registrations@gmail.com">
262
+
263
+
264
+ The `find_by_email` method looks up a given email address by the
265
+ normalized form (lower case), then by the canonical form, then finally
266
+ by the redacted form.
267
+
268
+ #### Validation
196
269
 
197
- email.valid? # Resonably valid?
198
- email.errors #=> [:mx]
199
- email.valid_host? # Host name is defined in DNS
200
- email.strict? # Strictly valid?
270
+ The only true validation is to send a message to the email address and
271
+ have the user (or process) verify it has been received. Syntax checks
272
+ help prevent erroneous input. Even sent messages can be silently
273
+ dropped, or bounced back after acceptance. Conditions such as a
274
+ "Mailbox Full" can mean the email address is known, but abandoned.
275
+
276
+ There are different levels of validations you can perform. By default, it will
277
+ validate to the "Provider" (if known), or "Conventional" format defined as the
278
+ "default" provider. You may pass a list of parameters to select
279
+ which syntax and network validations to perform.
280
+
281
+ #### Comparison
201
282
 
202
283
  You can compare email addresses:
203
284
 
204
- e1 = EmailAddress.new("First.Last@Gmail.com")
205
- e1.to_s #=> "first.last@gmail.com"
206
- e2 = EmailAddress.new("FirstLast+tag@Gmail.com")
207
- e3.to_s #=> "firstlast+tag@gmail.com"
285
+ e1 = EmailAddress.new("Clark.Kent@Gmail.com")
286
+ e2 = EmailAddress.new("clark.kent+Superman@Gmail.com")
208
287
  e3 = EmailAddress.new(e2.redact)
209
- e3.to_s #=> "554d32017ab3a7fcf51c88ffce078689003bc521@gmail.com"
288
+ e1.to_s #=> "clark.kent@gmail.com"
289
+ e2.to_s #=> "clark.kent+superman@gmail.com"
290
+ e3.to_s #=> "{bea3f3560a757f8142d38d212a931237b218eb5e}@gmail.com"
210
291
 
211
292
  e1 == e2 #=> false (Matches by normalized address)
212
293
  e1.same_as?(e2) #=> true (Matches as canonical address)
213
294
  e1.same_as?(e3) #=> true (Matches as redacted address)
214
295
  e1 < e2 #=> true (Compares using normalized address)
215
296
 
216
- ## Host Inspection
297
+ #### Matching
217
298
 
218
- The `EmailAddress::Host` can be used to inspect the email domain.
299
+ Matching addresses by simple patterns:
219
300
 
220
- ```ruby
221
- e1 = EmailAddress.new("First.Last@Gmail.com")
222
- e1.host.name #=> "gmail.com"
223
- e1.host.exchanger.mxers #=> [["alt4.gmail-smtp-in.l.google.com", "2a00:1450:400c:c01::1b", 30],...]
224
- e1.host.exchanger.mx_ips #=> ["2a00:1450:400c:c01::1b", ...]
225
- e1.host.matches?('.com') #=> true
226
- e1.host.txt #=> "v=spf1 redirect=_spf.google.com"
227
- ```
301
+ * Top-Level-Domain: .org
302
+ * Domain Name: example.com
303
+ * Registration Name: hotmail. (matches any TLD)
304
+ * Domain Glob: *.exampl?.com
305
+ * Provider Name: google
306
+ * Mailbox Name or Glob: user00*@
307
+ * Address or Glob: postmaster@domain*.com
308
+ * Provider or Registration: msn
228
309
 
229
- ## Domain Matching
310
+ Usage:
230
311
 
231
- You can also employ domain matching rules
312
+ e = EmailAddress.new("Clark.Kent@Gmail.com")
313
+ e.matches?("gmail.com") #=> true
314
+ e.matches?("google") #=> true
315
+ e.matches?(".org") #=> false
316
+ e.matches?("g*com") #=> true
317
+ e.matches?("gmail.") #=> true
318
+ e.matches?("*kent*@") #=> true
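The rule kinds listed in this section can be approximated with shell-style globs. The following is a sketch only, built on `File.fnmatch?` from Ruby's standard library; `sketch_matches?` is a hypothetical name, and provider names such as `google` are deliberately not covered because they need the gem's provider table.

```ruby
# Hypothetical matcher covering a subset of the rule kinds listed above.
def sketch_matches?(address, rule)
  local, host = address.downcase.split('@', 2)
  rule = rule.downcase
  flags = File::FNM_DOTMATCH          # let "*" match a leading dot too

  if rule.end_with?('@')              # mailbox name or glob: "user00*@"
    File.fnmatch?(rule.chomp('@'), local, flags)
  elsif rule.include?('@')            # full address or glob
    File.fnmatch?(rule, "#{local}@#{host}", flags)
  elsif rule.start_with?('.')         # top-level domain: ".org"
    host.end_with?(rule)
  elsif rule.end_with?('.')           # registration name: "hotmail."
    host.start_with?(rule) || host.include?(".#{rule}")
  else                                # domain name or domain glob
    File.fnmatch?(rule, host, flags)
  end
end

puts sketch_matches?('Clark.Kent@Gmail.com', 'gmail.com')  # true
puts sketch_matches?('Clark.Kent@Gmail.com', '.org')       # false
puts sketch_matches?('Clark.Kent@Gmail.com', 'g*com')      # true
puts sketch_matches?('Clark.Kent@Gmail.com', 'gmail.')     # true
puts sketch_matches?('Clark.Kent@Gmail.com', '*kent*@')    # true
```

The position of the `@` or `.` in the rule decides which part of the address is compared, so one string argument can express every rule kind.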
232
319
 
233
- email.host.matches?('gmail.com', '.us', '.msn.com', 'yahoo')
320
+ ### Configuration
234
321
 
235
- This tests the address can be matched in the given list of domain rules:
322
+ You can pass an options hash on the `.new()` and helper class methods to
323
+ control how the library treats that address. These can also be
324
+ configured during initialization by provider and default (see below).
236
325
 
237
- * Full host name. (subdomain.example.com)
238
- * TLD and domain wildcards (.us, .msg.com)
239
- * Registration names matching without the TLD. 'yahoo' matches:
240
- * "www.yahoo.com" (with Subdomains)
241
- * "yahoo.ca" (any TLD)
242
- * "yahoo.co.jp" (2-char TLD with 2-char Second-level)
243
- * But _may_ also match non-Yahoo domain names (yahoo.xxx)
326
+ EmailAddress.new("clark.kent@gmail.com",
327
+ dns_lookup: :off, host_encoding: :unicode)
244
328
 
245
- ## Customizing
329
+ Globally, you can change and query configuration options:
246
330
 
247
- You can change configuration options and add new providers such as:
331
+ EmailAddress::Config.setting(:dns_lookup, :mx)
332
+ EmailAddress::Config.setting(:dns_lookup) #=> :mx
248
333
 
249
- EmailAddress::Config.setup do
250
- provider :github, domains:%w(github.com github.io)
251
- option :check_dns, false
252
- end
334
+ Or set multiple settings at once:
335
+
336
+ EmailAddress::Config.configure(local_downcase:false, dns_lookup: :off)
337
+
338
+ You can add special rules by domain or provider. It takes the options
339
+ above and adds the :domain_match and :exchanger_match rules.
340
+
341
+ EmailAddress.define_provider('google',
342
+ domain_match: %w(gmail.com googlemail.com),
343
+ exchanger_match: %w(google.com), # Requires dns_lookup==:mx
344
+ local_size: 5..64,
345
+ mailbox_canonical: ->(m) {m.gsub('.','')})
346
+
347
+ The library ships with the most common set of provider rules. It is not meant
348
+ to house a database of all providers, but a separate `email_address-providers`
349
+ gem may be created to hold this data for those who need more complete rules.
350
+
351
+ Personal and Corporate email systems are not intended for either solution.
352
+ Any of these email systems may be configured locally.
353
+
354
+ Pre-configured email address providers include: Google (gmail), AOL, MSN
355
+ (hotmail, live, outlook), and Yahoo. Any address not matching one of
356
+ those patterns uses the "default" provider rule set. Exchanger matches
357
+ match against the Mail Exchanger (SMTP receivers) hosts defined in
358
+ DNS. Specifying an exchanger pattern requires a DNS MX lookup.
359
+
360
+ For Rails applications, create an initializer file with your default
361
+ configuration options:
362
+
363
+ # ./config/initializers/email_address.rb
364
+ EmailAddress::Config.setting( local_format: :relaxed )
365
+ EmailAddress::Config.provider(:github,
366
+ host_match: %w(github.com), local_format: :standard)
367
+
368
+ ### Available Configuration Settings
369
+
370
+ * dns_lookup: Enables DNS lookup for validation by
371
+ * :mx - DNS MX Record lookup
372
+ * :a - DNS A Record lookup (as some domains incorrectly omit the MX record)
373
+ * :off - Do not perform DNS lookup (Test mode, network unavailable)
374
+
375
+ * sha1_secret -
376
+ This application-level secret is appended to the email_address to compute
377
+ the SHA1 Digest, making it unique to your application so it can't easily be
378
+ discovered by comparing against a known list of email/sha1 pairs.
379
+
380
+ * munge_string - "*****", the string to replace into munged addresses.
381
+
382
+ For local part configuration:
253
383
 
254
- See `lib/email_address/config.rb` for more options.
384
+ * local_downcase: true.
385
+ Downcase the local part. You probably want this for uniqueness.
386
+ The RFC says the local part is case sensitive; that's a bad part.
387
+
388
+ * local_fix: true.
389
+ Make simple fixes when available, remove spaces, condense multiple punctuations
390
+
391
+ * local_encoding: :ascii, :unicode,
392
+ Enable Unicode in local part. Most mail systems do not yet support this.
393
+ You probably want to stay with ASCII for now.
394
+
395
+ * local_parse: nil, ->(local) { [mailbox, tag, comment] }
396
+ Specify an optional lambda/Proc to parse the local part. It should return an
397
+ array (tuple) of mailbox, tag, and comment.
398
+
399
+ * local_format:
400
+ * :conventional - word ( punctuation{1} word )*
401
+ * :relaxed - alphanum ( allowed_characters)* alphanum
402
+ * :standard - RFC Compliant email addresses (anything goes!)
403
+
404
+ * local_size: 1..64,
405
+ A Range specifying the allowed size for mailbox + tags + comment
406
+
407
+ * tag_separator: nil, character (+)
408
+ Nil, or a character used to split the tag from the mailbox
409
+
410
+ For the mailbox (AKA account, role), without the tag
411
+ * mailbox_size: 1..64
412
+ A Range specifying the allowed size for mailbox
413
+
414
+ * mailbox_canonical: nil, ->(mailbox) { mailbox }
415
+ An optional lambda/Proc taking a mailbox name, returning a canonical
416
+ version of it. (E.G.: gmail removes '.' characters)
417
+
418
+ * mailbox_validator: nil, ->(mailbox) { true }
419
+ An optional lambda/Proc taking a mailbox name, returning true or false.
420
+
421
+ * host_encoding: :punycode, :unicode,
422
+ How to treat International Domain Names (IDN). Note that most mail and
423
+ DNS systems do not support unicode, so punycode needs to be passed.
424
+ :punycode Convert Unicode names to punycode representation
425
+ :unicode Keep Unicode names as is.
426
+
427
+ * host_validation:
428
+ :mx Ensure host is configured with DNS MX records
429
+ :a Ensure host is known to DNS (A Record)
430
+ :syntax Validate by syntax only, no Network verification
431
+ :connect Attempt host connection (not implemented, BAD!)
432
+
433
+ * host_size: 1..253,
434
+ A range specifying the size limit of the host part,
435
+
436
+ * host_allow_ip: false,
437
+ Allow IP address format in host: [127.0.0.1], [IPv6:::1]
438
+
439
+ * address_validation: :parts, :smtp, ->(address) { true }
440
+ Address validation policy
441
+ :parts Validate local and host.
442
+ :smtp Validate via SMTP (not implemented, BAD!)
443
+ A lambda/Proc taking the address string, returning true or false
444
+
445
+ * address_size: 3..254,
446
+ A range specifying the size limit of the complete address
447
+
448
+ * address_local: false,
449
+ Allow localhost, no domain, or local subdomains.
450
+
451
+ For provider rules to match to domain names and Exchanger hosts
452
+ The value is an array of match tokens.
453
+ * host_match: %w(.org example.com hotmail. user*@ sub.*.com)
454
+ * exchanger_match: %w(google.com 127.0.0.1 10.9.8.0/24 ::1/64)
455
+
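The `:conventional` and `:relaxed` shapes given for `local_format` translate naturally into regular expressions. This is an illustrative sketch, not the gem's own patterns: the punctuation set (`-+'.,`) is an assumption taken from the characters named in the earlier README text, and the constant and method names are hypothetical.

```ruby
# word ( punctuation{1} word )*  with an assumed punctuation set
CONVENTIONAL_LOCAL = /\A[a-z0-9]+(?:[-+'.,][a-z0-9]+)*\z/i

# alphanum ( allowed_characters )* alphanum
RELAXED_LOCAL = /\A[a-z0-9](?:[-+'.,a-z0-9]*[a-z0-9])?\z/i

# Report the strictest of the two formats that a local part satisfies.
def sketch_local_format(local)
  return :conventional if CONVENTIONAL_LOCAL.match?(local)
  return :relaxed if RELAXED_LOCAL.match?(local)
  :neither
end

puts sketch_local_format("miles.o'brien")  # conventional
puts sketch_local_format('aasdf-34-.z')    # relaxed
puts sketch_local_format('#bad')           # neither
```

The only difference between the two patterns is whether punctuation characters may appear next to each other, which is why `aasdf-34-.z` passes the relaxed check but not the conventional one.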
456
+ ## Notes
457
+
458
+ #### Internationalization
459
+
460
+ The industry is moving to support Unicode characters in the local part
461
+ of the email address. Currently, SMTP supports only 7-bit ASCII, but a
462
+ new `SMTPUTF8` standard is available, but not yet widely implemented.
463
+ To work properly, global Email systems must be converted to UTF-8
464
+ encoded databases and upgraded to the new email standards.
465
+
466
+ The problem with i18n email addresses is that, outside of the
466
+ given locale, addresses become hard to enter on keyboards for another
468
+ locale. Because of this, internationalized local parts are not yet
469
+ supported by default. They are more likely to be erroneous.
470
+
471
+ Proper personal identity can still be provided using
472
+ [MIME Encoded-Words](http://en.wikipedia.org/wiki/MIME#Encoded-Word)
473
+ in Email headers.
474
+
475
+
476
+ #### Email Addresses as Sensitive Data
477
+
478
+ Like Social Security and Credit Card Numbers, email addresses are
479
+ becoming more important as a personal identifier on the internet.
480
+ Increasingly, we should treat email addresses as sensitive data. If your
481
+ site/database becomes compromised by hackers, these email addresses can
482
+ be stolen and used to spam your users and to try to gain access to their
483
+ accounts. You should not be storing passwords in plain text; perhaps you
484
+ don't need to store email addresses un-encoded either.
485
+
486
+ Consider this: upon registration, store the redacted email address for
487
+ the user, and of course, the salted, encrypted password.
488
+ When the user logs in, compute the redacted email address from
489
+ the user-supplied one and look up the record. Store the original address
490
+ in the session for the user, which goes away when the user logs out.
491
+
492
+ Sometimes, users demand you strike their information from the database.
493
+ Instead of deleting their account, you can "redact" their email
494
+ address, retaining the state of the account to prevent future
495
+ access. Given the original email address again, the redacted account can
496
+ be identified if necessary.
497
+
498
+ Because of these use cases, the **redact** method on the email address
499
+ instance has been provided.
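The registration and lookup flow described in this section can be sketched with a Hash standing in for the users table. Several assumptions apply: canonicalization is reduced to trimming and lower-casing, the secret is appended before hashing as described for `sha1_secret`, and all names (`SKETCH_SECRET`, `sketch_redacted`, `sketch_register`, `sketch_find`) are hypothetical rather than part of the gem.

```ruby
require 'digest/sha1'

SKETCH_SECRET = 'application-level-secret' # stands in for sha1_secret

# Redact: replace the address with {SHA1(address + secret)}@host.
def sketch_redacted(address, secret = SKETCH_SECRET)
  normal = address.strip.downcase
  host = normal.split('@', 2).last
  "{#{Digest::SHA1.hexdigest(normal + secret)}}@#{host}"
end

# A Hash stands in for the users table, keyed by the redacted address.
SKETCH_USERS = {}

def sketch_register(address, profile)
  SKETCH_USERS[sketch_redacted(address)] = profile
end

# Look up a record by redacting the address supplied at login.
def sketch_find(address)
  SKETCH_USERS[sketch_redacted(address)]
end

sketch_register('Pat.Smith@example.com', { name: 'Pat' })
p sketch_find('pat.smith@EXAMPLE.com')  # the stored profile
p sketch_find('someone@example.com')    # nil, no such record
p SKETCH_USERS.keys.first               # only the digest and host are stored
```

The stored key never contains the mailbox name, yet the same address always maps to the same key, so the record stays reachable whenever the user supplies the address again.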
255
500
 
256
501
  ## Contributing
257
502
 
@@ -260,3 +505,12 @@ See `lib/email_address/config.rb` for more options.
260
505
  3. Commit your changes (`git commit -am 'Add some feature'`)
261
506
  4. Push to the branch (`git push origin my-new-feature`)
262
507
  5. Create new Pull Request
508
+
509
+ #### Project
510
+
511
+ This project lives at [https://github.com/afair/email_address/](https://github.com/afair/email_address/)
512
+
513
+ #### Authors
514
+
515
+ * [Allen Fair](https://github.com/afair) ([@allenfair](https://twitter.com/allenfair)):
516
+ I've worked with email-based applications and email addresses since 1999.