extended_email_reply_parser 0.1.0 → 0.2.0

data/CHANGELOG.md CHANGED
@@ -10,6 +10,28 @@ This project adheres to [Semantic Versioning](http://semver.org/).
10
10
  ### Removed
11
11
  ### Fixed
12
12
 
13
+ ## ExtendedEmailReplyParser 0.2.0 (2016-07-22)
14
+ ### Added
15
+ - `Parsers::Base#hide_everything_after(expressions)` is useful when email clients do not quote the previous conversation. This parser method hides everything that follows a given series of expressions, e.g. `hide_everything_after %w(From: Sent: To:)`.
16
+ - `Parsers::Base#except_in_visible_block_quotes`. Within this block, `hide_everything_after` is not applied to visible quotes. This is useful when a quote is already marked as visible.
17
+ - German parser `Parsers::I18nDe`, which removes the previous conversation by searching for the phrases "Von: Gesendet: An:" and "Am ... schrieb ...:".
18
+ - Support for internationalized quote header lines. The github parser only knows "On ... wrote:". Because the patterns are needed when the github parser runs, specify additional regexes in the class head of the parsers using `add_quote_header_regex`, for example: `add_quote_header_regex '^Am .* schrieb.*$'`.
19
+ - The German parser adds the regex for quote headers like "Am ... schrieb ...:".
20
+ - Remove empty lines between quote lines:
21
+
22
+ > Hi,
23
+ > how are you doing?
24
+ > Cheers
25
+
26
+ rather than
27
+
28
+ > Hi,
29
+
30
+ > how are you doing?
31
+
32
+ > Cheers
33
+ - English parser `Parsers::I18nEn`, which removes the previous conversation by searching for the phrases "From: Sent: To:".
34
+
13
35
  ## ExtendedEmailReplyParser 0.1.0 (2016-07-22)
14
36
  ### Added
15
37
  - `ExtendedEmailReplyParser.read "/path/to/email.eml"` returns the corresponding `Mail::Message` object.
data/Gemfile CHANGED
@@ -2,3 +2,5 @@ source 'https://rubygems.org'
2
2
 
3
3
  # Specify your gem's dependencies in extended_email_reply_parser.gemspec
4
4
  gemspec
5
+
6
+ gem "codeclimate-test-reporter", group: :test, require: nil
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # ExtendedEmailReplyParser
2
2
 
3
- [![Build Status](https://travis-ci.org/fiedl/extended_email_reply_parser.svg?branch=master)](https://travis-ci.org/fiedl/extended_email_reply_parser)
3
+ [![Join the chat at https://gitter.im/fiedl/extended_email_reply_parser](https://badges.gitter.im/fiedl/extended_email_reply_parser.svg)](https://gitter.im/fiedl/extended_email_reply_parser?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![Build Status](https://travis-ci.org/fiedl/extended_email_reply_parser.svg?branch=master)](https://travis-ci.org/fiedl/extended_email_reply_parser) [![Code Climate](https://codeclimate.com/github/fiedl/extended_email_reply_parser/badges/gpa.svg)](https://codeclimate.com/github/fiedl/extended_email_reply_parser) [![Test Coverage](https://codeclimate.com/github/fiedl/extended_email_reply_parser/badges/coverage.svg)](https://codeclimate.com/github/fiedl/extended_email_reply_parser/coverage) [![Gem Version](https://badge.fury.io/rb/extended_email_reply_parser.svg)](https://badge.fury.io/rb/extended_email_reply_parser) [![Documentation](https://img.shields.io/badge/documentation-rubydoc.info-blue.svg)](http://www.rubydoc.info/github/fiedl/extended_email_reply_parser/)
4
4
 
5
5
  When implementing a "reply or comment by email" feature, it's necessary to filter out signatures and the previous conversation. One needs to extract just the relevant parts for the conversation or comment section of the application. This is what this [ruby](https://www.ruby-lang.org) gem helps to do.
6
6
 
@@ -81,7 +81,6 @@ EmailParsers::ShoutParser.parse \
81
81
  ```
82
82
 
83
83
 
84
-
85
84
  ## Installation
86
85
 
87
86
  Add this line to your application's Gemfile:
@@ -105,6 +104,108 @@ After checking out the repo, run `bin/setup` to install dependencies. Then, run
105
104
 
106
105
  To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
107
106
 
107
+ ### Helper methods for writing parsers
108
+
109
+ To accomplish the most common parsing operations, there are a couple of helper methods.
110
+
111
+ This, for example, is the [English parser](lib/extended_email_reply_parser/parsers/i18n_en.rb).
112
+
113
+ ```ruby
114
+ module ExtendedEmailReplyParser
115
+ class Parsers::I18nEn < Parsers::Base
116
+
117
+ def parse
118
+ except_in_visible_block_quotes do
119
+ hide_everything_after ["From: ", "Sent: ", "To: "]
120
+ end
121
+ end
122
+
123
+ end
124
+ end
125
+ ```
126
+
127
+ #### `add_quote_header_regex`
128
+
129
+ The [github parser](https://github.com/github/email_reply_parser) needs to know how to identify the header line of quotes, for example "On Tue, 2011-03-01 at 18:02 +0530, Abhishek Kona wrote":
130
+
131
+ Hi,
132
+
133
+ On Tue, 2011-03-01 at 18:02 +0530, Abhishek Kona wrote:
134
+ > Hi folks
135
+ >
136
+ > What is the best way to clear a Riak bucket of all key, values after
137
+ > running a test?
138
+ > I am currently using the Java HTTP API.
139
+
140
+ You can list the keys for the bucket and call delete for each. Or if you
141
+ put the keys (and kept track of them in your test) you can delete them
142
+ one at a time (without incurring the cost of calling list first.)
143
+
144
+ By default, it uses the regex `/^On .* wrote:$/` for that. To make it recognize other header lines, specify their patterns using `add_quote_header_regex`.
145
+
146
+ Since this is needed by the github parser, i.e. possibly before the `parse` method of your custom parser is run, make sure to add the quote header regex in the class head:
147
+
148
+ ```ruby
149
+ module ExtendedEmailReplyParser
150
+ class Parsers::I18nDe < Parsers::Base
151
+ add_quote_header_regex '^Am .* schrieb.*$'
152
+ # ...
153
+ end
154
+ end
155
+ ```
156
+
157
+ #### `hide_everything_after`
158
+
159
+ Some email clients do not quote the previous conversation.
160
+
161
+ Hi Chris,
162
+ this is great, thanks!
163
+ Cheers, John
164
+
165
+
166
+ From: Chris <chris@example.com>
167
+ Sent: Saturday, July 09, 2016 3:27 PM
168
+ To: John <john@example.com>
169
+ Subject: The solution!
170
+
171
+ Hi John,
172
+ I've just found a solution to our big problem!
173
+ ...
174
+
175
+ To remove the previous conversation, give the parser expressions that identify where the previous conversation starts:
176
+
177
+ ```ruby
178
+ module ExtendedEmailReplyParser
179
+ class Parsers::I18nEn < Parsers::Base
180
+ def parse
181
+ except_in_visible_block_quotes do
182
+ hide_everything_after ["From: ", "Sent: ", "To: "]
183
+ end
184
+ # ...
185
+ end
186
+ end
187
+ end
188
+ ```
189
+
190
+ (The parser will combine the expressions to a regex: `/(#{expressions.join(".*?")}.*?\n)/m`, for example: `/(From: .*?Sent: .*?To: .*?\n)/m`.)
191
+
192
+ To avoid cutting off the email within a visible quote, wrap the `hide_everything_after` call in an `except_in_visible_block_quotes` block as shown above.
193
+
194
+ Hi Chris,
195
+
196
+ > From: Chris <chris@example.com>
197
+ > Sent: Saturday, July 09, 2016 3:27 PM
198
+ > To: John <john@example.com>
199
+ > Subject: The solution!
200
+ >
201
+ > Hi John,
202
+ > I've just found a solution to our big problem!
203
+
204
+ this is great, thanks!
205
+ Cheers, John
206
+
207
+ If not wrapped in `except_in_visible_block_quotes`, the parsed email would just be "Hi Chris,", because everything after "From: Sent: To:" would be cut off.
208
+
108
209
  ## Contributing
109
210
 
110
211
  Bug reports and pull requests are welcome on GitHub at https://github.com/fiedl/extended_email_reply_parser.
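
The regex combination described in the README section on `hide_everything_after` can be tried in isolation. The following is a minimal standalone sketch that does not load the gem; it only reuses the regex construction quoted in the README and the sample email from that section. The variable names `split_regex`, `visible` and `rest` are chosen for illustration.

```ruby
# Standalone sketch: the gem itself is not required here.
expressions = ["From: ", "Sent: ", "To: "]

# Same construction as quoted in the README: the expressions are joined
# with a lazy wildcard, and the /m flag lets "." match newlines.
split_regex = /(#{expressions.join(".*?")}.*?\n)/m

email = <<~EMAIL
  Hi Chris,
  this is great, thanks!
  Cheers, John

  From: Chris <chris@example.com>
  Sent: Saturday, July 09, 2016 3:27 PM
  To: John <john@example.com>
  Subject: The solution!

  Hi John,
  I've just found a solution to our big problem!
EMAIL

# Because the regex has a capture group, String#split also returns the
# matched header lines, so nothing is lost from the original text.
visible, *rest = email.split(split_regex)

puts visible
```

Here `visible` holds the reply text before the "From: ... Sent: ... To: ..." lines, and `rest` holds the matched header lines plus everything after them, which is the part a parser would mark as hidden.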
@@ -0,0 +1,68 @@
1
+ class EmailReplyParser
2
+ class Email
3
+ def hide_everything_after(expressions)
4
+ split_regex = /(#{expressions.join(".*?")}.*?\n)/m
5
+ split_fragments_at split_regex
6
+ end
7
+
8
+ def remove_empty_lines_between_block_quote_lines
9
+ @fragments = @fragments.collect do |fragment|
10
+ if fragment.quoted?
11
+ fragment.content = fragment.content.gsub /\n *?\n>/m, "\n>"
12
+ end
13
+ fragment
14
+ end
15
+ end
16
+
17
+ def split_fragments_at(regex)
18
+ @fragments = @fragments.collect do |fragment|
19
+ if fragment.to_s
20
+ first_text, *rest = fragment.to_s.split(regex)
21
+
22
+ first_fragment = Fragment.new(false, first_text)
23
+ first_fragment.quoted = fragment.quoted
24
+ first_fragment.hidden = fragment.hidden
25
+ first_fragment.signature = fragment.signature
26
+ first_fragment.content = first_text
27
+
28
+ hidden_fragment = Fragment.new(true, rest.join("\n"))
29
+ hidden_fragment.content = rest.join("\n")
30
+
31
+ hidden_fragment.quoted = true
32
+ if @except_in_visible_block_quotes
33
+ hidden_fragment.hidden = true unless fragment.quoted? and not fragment.hidden?
34
+ else
35
+ hidden_fragment.hidden = true
36
+ end
37
+
38
+ [first_fragment, hidden_fragment]
39
+ end
40
+ end.flatten - [nil]
41
+ @fragments = @fragments.select { |fragment| fragment.to_s && fragment.to_s != "" }
42
+ end
43
+
44
+ def except_in_visible_block_quotes
45
+ @except_in_visible_block_quotes = true
46
+ yield
47
+ @except_in_visible_block_quotes = false
48
+ end
49
+
50
+ private
51
+
52
+ # Detects if a given line is a header above a quoted area. It is only
53
+ # checked for lines preceding quoted regions.
54
+ #
55
+ # line - A String line of text from the email.
56
+ #
57
+ # Returns true if the line is a valid header, or false.
58
+ #
59
+ # This method overrides the original in order to include the different
60
+ # regex defined in the different ExtendedEmailReplyParser::Parsers.
61
+ #
62
+ def quote_header?(line)
63
+ regex = ExtendedEmailReplyParser::Parsers::Base.quote_header_regexes.join("|")
64
+ line.reverse =~ /#{regex}/
65
+ end
66
+
67
+ end
68
+ end
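
The substitution used by `remove_empty_lines_between_block_quote_lines` in the file above can be checked on a plain string. This is a small sketch, assuming the quoted fragment content is available as a string; the names `quoted` and `compacted` are illustrative and not part of the gem.

```ruby
# A block quote in which the email client inserted blank lines
# between the quoted lines.
quoted = "> Hi,\n\n> how are you doing?\n\n> Cheers\n"

# Same pattern as in the diff: a newline, optional spaces, another
# newline and a ">" collapse into a single newline followed by ">".
compacted = quoted.gsub(/\n *?\n>/m, "\n>")

puts compacted
```

A line that contains only spaces between two quote lines is removed by the same pattern, since ` *?` may match those spaces.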
@@ -0,0 +1,9 @@
1
+ class EmailReplyParser
2
+ class Fragment
3
+
4
+ def content=(new_content)
5
+ @content = new_content
6
+ end
7
+
8
+ end
9
+ end
@@ -1,16 +1,172 @@
1
1
  module ExtendedEmailReplyParser
2
2
  class Parsers::Base
3
+ @@quote_header_regexes ||= []
3
4
 
4
5
  attr_accessor :text
5
6
 
6
7
  def initialize(text_before_parsing)
7
8
  self.text = text_before_parsing
9
+
10
+ # The `EmailReplyParser::Email` is extended in this gem.
11
+ # Have a look at:
12
+ #
13
+ # lib/extended_email_reply_parser/email_reply_parser/email.rb
14
+ #
15
+ @email = EmailReplyParser::Email.new.read(text)
8
16
  end
9
17
 
18
+ # This `parse` method of the `Parsers::Base` will be overridden
19
+ # by the individual parsers.
20
+ #
21
+ # The text before parsing is accessed with `text`.
22
+ # The method `parse` is expected to return the parsed text.
23
+ #
10
24
  def parse
11
25
  return text
12
26
  end
13
27
 
28
+ # To avoid cutting off the email within a visible quote, wrap the
29
+ # `hide_everything_after` calls within an `except_in_visible_block_quotes`
30
+ # block:
31
+ #
32
+ # module ExtendedEmailReplyParser
33
+ # class Parsers::I18nEn < Parsers::Base
34
+ # def parse
35
+ # except_in_visible_block_quotes do
36
+ # hide_everything_after ["From: ", "Sent: ", "To: "]
37
+ # end
38
+ # # ...
39
+ # end
40
+ # end
41
+ # end
42
+ #
43
+ # Otherwise, the following email would be completely cut off after
44
+ # "Hi Chris,".
45
+ #
46
+ # Hi Chris,
47
+ #
48
+ # > From: Chris <chris@example.com>
49
+ # > Sent: Saturday, July 09, 2016 3:27 PM
50
+ # > To: John <john@example.com>
51
+ # > Subject: The solution!
52
+ # >
53
+ # > Hi John,
54
+ # > I've just found a solution to our big problem!
55
+ #
56
+ # this is great, thanks!
57
+ # Cheers, John
58
+ #
59
+ def except_in_visible_block_quotes(&block)
60
+ @email.except_in_visible_block_quotes(&block)
61
+ return @email.visible_text
62
+ end
63
+
64
+ # Boil quotes like these
65
+ #
66
+ # > Hi,
67
+ #
68
+ # > how are you doing?
69
+ #
70
+ # > Cheers
71
+ #
72
+ # down to
73
+ #
74
+ # > Hi,
75
+ # > how are you doing?
76
+ # > Cheers
77
+ #
78
+ #
79
+ def remove_empty_lines_between_block_quote_lines
80
+ @email.remove_empty_lines_between_block_quote_lines
81
+ return @email.visible_text
82
+ end
83
+
84
+ # Some email clients do not quote the previous conversation.
85
+ #
86
+ # Hi Chris,
87
+ # this is great, thanks!
88
+ # Cheers, John
89
+ #
90
+ #
91
+ # From: Chris <chris@example.com>
92
+ # Sent: Saturday, July 09, 2016 3:27 PM
93
+ # To: John <john@example.com>
94
+ # Subject: The solution!
95
+ #
96
+ # Hi John,
97
+ # I've just found a solution to our big problem!
98
+ # ...
99
+ #
100
+ # To remove the previous conversation, give the parser expressions
101
+ # that identify where the previous conversation starts:
102
+ #
103
+ # module ExtendedEmailReplyParser
104
+ # class Parsers::I18nEn < Parsers::Base
105
+ # def parse
106
+ # except_in_visible_block_quotes do
107
+ # hide_everything_after ["From: ", "Sent: ", "To: "]
108
+ # end
109
+ # # ...
110
+ # end
111
+ # end
112
+ # end
113
+ #
114
+ # The parser will combine the expressions to a regex:
115
+ # /(#{expressions.join(".*?")}.*?\n)/m
116
+ # for example:
117
+ # /(From: .*?Sent: .*?To: .*?\n)/m
118
+ #
119
+ def hide_everything_after(expressions)
120
+ @email.hide_everything_after(expressions)
121
+ return @email.visible_text
122
+ end
123
+
124
+ # "On ... wrote:" (English)
125
+ # "Am ... schrieb ...:" (German)
126
+ # ...
127
+ #
128
+ def self.quote_header_regexes
129
+ @@quote_header_regexes
130
+ end
131
+
132
+ # The github parser (https://github.com/github/email_reply_parser) needs to
133
+ # know how to identify the header line of quotes, for example
134
+ #
135
+ # "On Tue, 2011-03-01 at 18:02 +0530, Abhishek Kona wrote"
136
+ #
137
+ # Example email:
138
+ #
139
+ # Hi,
140
+ #
141
+ # On Tue, 2011-03-01 at 18:02 +0530, Abhishek Kona wrote:
142
+ # > Hi folks
143
+ # >
144
+ # > What is the best way to clear a Riak bucket of all key, values after
145
+ # > running a test?
146
+ # > I am currently using the Java HTTP API.
147
+ #
148
+ # You can list the keys for the bucket and call delete for each. Or if you
149
+ # put the keys (and kept track of them in your test) you can delete them
150
+ # one at a time (without incurring the cost of calling list first.)
151
+ #
152
+ # By default, the github parser uses the regex `/^On .* wrote:$/` for that.
153
+ # To make it recognize other header lines, specify their patterns using
154
+ # `add_quote_header_regex`.
155
+ #
156
+ # Since this is needed by the github parser, i.e. possibly before the `parse`
157
+ # method of your custom parser is run, make sure to add the quote header
158
+ # regex in the class head:
159
+ #
160
+ # module ExtendedEmailReplyParser
161
+ # class Parsers::I18nDe < Parsers::Base
162
+ # add_quote_header_regex '^Am .* schrieb.*$'
163
+ # # ...
164
+ # end
165
+ # end
166
+ #
167
+ def self.add_quote_header_regex(regex_string)
168
+ @@quote_header_regexes << regex_string
169
+ end
14
170
 
15
171
  def self.subclasses
16
172
  ObjectSpace.each_object(Class).select { |klass| klass < self }
@@ -1,5 +1,6 @@
1
1
  module ExtendedEmailReplyParser
2
2
  class Parsers::Github < Parsers::Base
3
+ add_quote_header_regex '^On .* wrote:$'
3
4
 
4
5
  def parse
5
6
  EmailReplyParser.parse_reply text
@@ -0,0 +1,14 @@
1
+ module ExtendedEmailReplyParser
2
+ class Parsers::I18nDe < Parsers::Base
3
+ add_quote_header_regex '^Am .* schrieb.*$'
4
+
5
+ def parse
6
+ remove_empty_lines_between_block_quote_lines
7
+ except_in_visible_block_quotes do
8
+ hide_everything_after ["Von: ", "Gesendet: ", "An: "]
9
+ hide_everything_after ["Am ", "schrieb "]
10
+ end
11
+ end
12
+
13
+ end
14
+ end
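
How the registered quote header patterns are combined can be sketched without the gem. The diff shows `add_quote_header_regex` collecting pattern strings and the overridden `quote_header?` joining them with `|`. The constant `QUOTE_HEADER_REGEXES` below is a hypothetical stand-in for that collection, and the German sample line is made up for illustration. The diff's version calls `line.reverse` before matching, which suggests the upstream parser hands it reversed lines; this sketch takes forward lines to stay simple.

```ruby
# Hypothetical stand-in for the patterns registered by the Github and
# I18nDe parsers via add_quote_header_regex.
QUOTE_HEADER_REGEXES = ['^On .* wrote:$', '^Am .* schrieb.*$']

# Returns true if the (forward) line matches any registered pattern.
def quote_header?(line)
  combined = QUOTE_HEADER_REGEXES.join("|")
  !!(line =~ /#{combined}/)
end

puts quote_header?("On Tue, 2011-03-01 at 18:02 +0530, Abhishek Kona wrote:")
puts quote_header?("Am 09.07.2016 um 15:27 schrieb Chris:")
puts quote_header?("Hi Chris,")
```

Joining with `|` keeps each pattern's own `^` and `$` anchors, so a line has to match one pattern as a whole rather than a mix of two.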
@@ -0,0 +1,11 @@
1
+ module ExtendedEmailReplyParser
2
+ class Parsers::I18nEn < Parsers::Base
3
+
4
+ def parse
5
+ except_in_visible_block_quotes do
6
+ hide_everything_after ["From: ", "Sent: ", "To: "]
7
+ end
8
+ end
9
+
10
+ end
11
+ end
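
The visibility rule that `split_fragments_at` applies when `except_in_visible_block_quotes` is active can be restated on its own. This is a sketch of that one condition only, not of the gem's fragment handling; the `Fragment` struct and the method name `hide_tail?` are hypothetical and exist only for this example.

```ruby
# Hypothetical minimal fragment with only the two flags the rule reads.
Fragment = Struct.new(:quoted, :hidden)

# Decides whether the text after a matched expression series is hidden.
# Mirrors the condition in the diff: inside except_in_visible_block_quotes,
# the tail stays visible only if the source fragment is a quote that is
# not hidden; otherwise it is always hidden.
def hide_tail?(fragment, except_in_visible_block_quotes)
  return true unless except_in_visible_block_quotes
  !(fragment.quoted && !fragment.hidden)
end

plain_text    = Fragment.new(false, false)
visible_quote = Fragment.new(true, false)
hidden_quote  = Fragment.new(true, true)

puts hide_tail?(plain_text, true)
puts hide_tail?(visible_quote, true)
puts hide_tail?(hidden_quote, true)
puts hide_tail?(visible_quote, false)
```

Only the visible quote inside the block keeps its tail, which matches the README example where a "> From: ... > Sent: ... > To: ..." quote is preserved.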