extended_email_reply_parser 0.4.0 → 0.5.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +5 -0
- data/README.md +9 -0
- data/lib/extended_email_reply_parser/mail/message.rb +4 -0
- data/lib/extended_email_reply_parser/version.rb +1 -1
- data/lib/extended_email_reply_parser.rb +20 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1343c2a228d59390f28c51ba8c2e5283f380e6fa
+  data.tar.gz: e2efc52cb6c93b4b6a8e271d47a0a0b894fbba61
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 851e7e03fbbf44ec84f40269f61d86e09393cba5df7cef4e5d1ce364379d3ee13a0ea9237585c480e499d93c7429c7212e305b155f61044c7c5e9f687ed92dea
+  data.tar.gz: d4d987194cec11ea53ba3d7779db48accdfcebee35f531fc61428e2c8dfb05e5bde52158e5e4dfa0fc353bf59d4384fa3922b9830aa60543a137c80b8e444b9f
data/CHANGELOG.md
CHANGED
@@ -10,6 +10,11 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 ### Removed
 ### Fixed

+## ExtendedEmailReplyParser 0.5.0 (2016-11-15)
+### Added
+- Adding `ExtendedEmailReplyParser.extract_text_or_html` which falls back to the html part if the text part is missing.
+- `ExtendedEmailReplyParser.parse` uses `extract_text_or_html`. This is good, because html-only mails do not just return `nil`. But, be aware that `ExtendedEmailReplyParser.parse` may include html tags this way. If this causes you trouble, use `ExtendedEmailReplyParser.parse(ExtendedEmailReplyParser.extract_text(message_or_path))` instead, which only uses the text part. We might correct the behavior in the future in order to strip the html tags during the parsing.
+
 ## ExtendedEmailReplyParser 0.4.0 (2016-11-14)
 ### Added
 - `Mail::Message#extract_html` extracts the html part as a counterpart for `Mail::Message#extract_text`. This is useful when an email has no text part.
data/README.md
CHANGED
@@ -54,6 +54,15 @@ ExtendedEmailReplyParser.read("/path/to/email.eml") # => Mail::Message
 ExtendedEmailReplyParser.read("/path/to/email.eml").extract_text # => String
 ```

+**Optional: How to handle html-only emails**: There are emails that do not have a text part but only an html part. For those emails, `ExtendedEmailReplyParser.parse` uses the html part. But, to give you more control over how to handle those situations, `ExtendedEmailReplyParser.extract_text` returns `nil` for those emails. If you want your text extraction to fall back to the html part if the text part is missing, use this:
+
+```ruby
+ExtendedEmailReplyParser.extract_text_or_html message
+ExtendedEmailReplyParser.extract_text_or_html '/path/to/email.eml'
+```
+
+The `Mail::Message` object itself is extended to support `message.extract_text`, `message.extract_html_body_content` as well as `message.extract_text_or_html`.
+
 ### Writing a parser

 The parsing system allows you to add your own parser to the parsing chain. Just define a class inheriting from `ExtendedEmailReplyParser::Parsers::Base` and implement a `parse` method. The text before parsing is accessed via `text`.
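Note on the "Writing a parser" context line above: the shape of such a parser can be sketched as follows. To keep the sketch runnable without the gem, `SketchParsers::Base` is a hypothetical stand-in for `ExtendedEmailReplyParser::Parsers::Base`; the real base class may have a different constructor. It is also an assumption that `parse` returns the parsed text.

```ruby
module SketchParsers
  # Hypothetical stand-in for ExtendedEmailReplyParser::Parsers::Base.
  # It only provides the `text` reader that the README mentions.
  class Base
    attr_reader :text

    def initialize(text)
      @text = text
    end
  end

  # Example parser (hypothetical): removes a trailing "Sent from my ..."
  # footer, including everything that follows it.
  class SentFromMyDevice < Base
    def parse
      # With /m, `.` also matches newlines, so the match runs to the end.
      text.sub(/^Sent from my .*\z/m, "").rstrip
    end
  end
end

parsed = SketchParsers::SentFromMyDevice.new("Sounds good.\n\nSent from my phone").parse
```

With the real gem, the class would inherit from `ExtendedEmailReplyParser::Parsers::Base` instead of the stand-in.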
data/lib/extended_email_reply_parser/mail/message.rb
CHANGED
@@ -25,6 +25,10 @@ module Mail
       (self.html_part || (self if self.content_type.include?('text/html'))).try(:body_in_utf8)
     end

+    def extract_text_or_html
+      extract_text || extract_html_body_content
+    end
+
     def extract_html_body_content
       # http://stackoverflow.com/a/356376/2066546
       extract_html.match(/(.*<\s*body[^>]*>)(.*)(<\s*\/\s*body\s*\>.+)/m)[2] || extract_html
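Note on the hunk above: `extract_text_or_html` relies on `||`, so the html body is only consulted when the text part is `nil`. In the context line, `match(...)[2]` would raise `NoMethodError` whenever the pattern does not match, because `match` then returns `nil`; the trailing `|| extract_html` is never reached in that case. The following is a minimal sketch of a nil-safe variant using `String#[]` with a capture index. Both method names are hypothetical, and the regex is a simplified one, not the gem's.

```ruby
# Hypothetical helper, not the gem's implementation: returns the content
# between <body ...> and </body>, or the whole string if no body tag is found.
def html_body_content(html)
  # String#[regexp, capture] returns nil instead of raising when nothing matches.
  html[/<\s*body[^>]*>(.*)<\s*\/\s*body\s*>/m, 1] || html
end

# Mirrors the `extract_text || extract_html_body_content` fallback:
# prefer the text part, otherwise use the html body content (if any).
def text_or_html(text, html)
  text || (html && html_body_content(html))
end

full_document = "<html><body><p>Hi</p></body></html>"
fragment      = "<p>Hi</p>"
```

Here `text_or_html(nil, nil)` yields `nil`, which keeps the "no usable part" case visible to the caller.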
data/lib/extended_email_reply_parser.rb
CHANGED
@@ -58,6 +58,25 @@ module ExtendedEmailReplyParser
     end
   end

+  # Extract the body text from the given Mail::Message.
+  # If there is no text part, extract the content of the body tag
+  # of the html part.
+  #
+  # ExtendedEmailReplyParser.extract_text_or_html message
+  # ExtendedEmailReplyParser.extract_text_or_html '/path/to/email.eml'
+  #
+  # This is the same as:
+  #
+  # message.extract_text_or_html
+  #
+  def self.extract_text_or_html(message_or_path)
+    if message_or_path.kind_of? Mail::Message
+      message_or_path.extract_text_or_html
+    elsif message_or_path.kind_of? String and File.file? message_or_path
+      Mail.read(message_or_path).extract_text_or_html
+    end
+  end
+
   # This parses the given object, i.e. removes quoted replies etc.
   #
   # Examples:
@@ -81,7 +100,7 @@ module ExtendedEmailReplyParser
   end

   def self.parse_message(message)
-    self.parse_text(message.
+    self.parse_text(message.extract_text_or_html)
   end

   def self.parse_text(text)
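Note on `ExtendedEmailReplyParser.extract_text_or_html` in this file: the method accepts either a message object or a path to an existing file, and returns `nil` for anything else because neither branch of the `if`/`elsif` is taken. The following is a minimal sketch of that dispatch which runs without the `mail` gem. `StubMessage`, `read_stub_message` and `extract_text_or_html_sketch` are hypothetical stand-ins for `Mail::Message`, `Mail.read` and the gem's method.

```ruby
require "tempfile"

# Hypothetical stand-in for Mail::Message with the text-or-html fallback.
StubMessage = Struct.new(:text, :html) do
  def extract_text_or_html
    text || html
  end
end

# Hypothetical stand-in for Mail.read: treats the file content as the text part.
def read_stub_message(path)
  StubMessage.new(File.read(path), nil)
end

# Same branching as in the hunk: message object first, then an existing file path.
def extract_text_or_html_sketch(message_or_path)
  if message_or_path.is_a?(StubMessage)
    message_or_path.extract_text_or_html
  elsif message_or_path.is_a?(String) && File.file?(message_or_path)
    read_stub_message(message_or_path).extract_text_or_html
  end
  # Any other input matches neither branch, so the method returns nil.
end

from_message = extract_text_or_html_sketch(StubMessage.new(nil, "<p>Hi</p>"))
```

Callers that pass user-supplied paths may want to treat a `nil` result as "not a readable message" rather than as an empty body.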
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: extended_email_reply_parser
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.5.0
 platform: ruby
 authors:
 - Sebastian Fiedlschuster
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-11-
+date: 2016-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler