extended_email_reply_parser 0.4.0 → 0.5.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +5 -0
- data/README.md +9 -0
- data/lib/extended_email_reply_parser/mail/message.rb +4 -0
- data/lib/extended_email_reply_parser/version.rb +1 -1
- data/lib/extended_email_reply_parser.rb +20 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1343c2a228d59390f28c51ba8c2e5283f380e6fa
+  data.tar.gz: e2efc52cb6c93b4b6a8e271d47a0a0b894fbba61
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 851e7e03fbbf44ec84f40269f61d86e09393cba5df7cef4e5d1ce364379d3ee13a0ea9237585c480e499d93c7429c7212e305b155f61044c7c5e9f687ed92dea
+  data.tar.gz: d4d987194cec11ea53ba3d7779db48accdfcebee35f531fc61428e2c8dfb05e5bde52158e5e4dfa0fc353bf59d4384fa3922b9830aa60543a137c80b8e444b9f
data/CHANGELOG.md
CHANGED
@@ -10,6 +10,11 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 ### Removed
 ### Fixed

+## ExtendedEmailReplyParser 0.5.0 (2016-11-15)
+### Added
+- Adding `ExtendedEmailReplyParser.extract_text_or_html` which falls back to the html part if the text part is missing.
+- `ExtendedEmailReplyParser.parse` uses `extract_text_or_html`. This is good, because html-only mails do not just return `nil`. But, be aware that `ExtendedEmailReplyParser.parse` may include html tags this way. If this causes you troble, use `ExtendedEmailReplyParser.parse(ExtendedEmailReplyParser.extract_text(message_or_path))` instead, which only uses the text part. We might correct the behavior in the future in order to strip the html tags during the parsing.
+
 ## ExtendedEmailReplyParser 0.4.0 (2016-11-14)
 ### Added
 - `Mail::Message#extract_html` extracts the html part as a counterpart for `Mail::Message#extract_text`. This is useful when an email has no text part.
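The 0.5.0 changelog entry notes that `ExtendedEmailReplyParser.parse` may now return html tags for html-only mails. As a rough sketch of how a caller might post-process such a result, the helper below removes tags with regular expressions. `strip_html_tags_naively` is a hypothetical name and is not part of the gem; it does not decode html entities, and a real html parser would be more robust.

```ruby
# Hypothetical helper, not part of extended_email_reply_parser.
# Naively removes anything that looks like an html tag and tidies whitespace.
def strip_html_tags_naively(text)
  return nil if text.nil?

  text
    .gsub(/<br\s*\/?>/i, "\n") # keep line breaks that were encoded as <br>
    .gsub(/<[^>]+>/, "")       # drop all remaining tags
    .gsub(/[ \t]+\n/, "\n")    # trim trailing spaces left behind
    .strip
end

html_reply = "<p>Thanks, that works for me.</p><br/><div>Best,<br>Anna</div>"
plain_reply = strip_html_tags_naively(html_reply)
puts plain_reply
```

A caller who prefers never to see html can instead keep to the text-only workaround described in the changelog entry.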
data/README.md
CHANGED
@@ -54,6 +54,15 @@ ExtendedEmailReplyParser.read("/path/to/email.eml") # => Mail::Message
 ExtendedEmailReplyParser.read("/path/to/email.eml").extract_text # => String
 ```

+**Optional: How to handle html-only emails**: There are emails that do not have a text part but only an html part. For those emails, `ExtendedEmailReplyParser.parse` uses the html part. But, to give you more control over how to handle those situations, `ExtendedEmailReplyParser.extract_text` returns `nil` for those emails. If you want your text extraction to fall back to the html part if the text part is missing, use this:
+
+```ruby
+ExtendedEmailReplyParser.extract_text_or_html message
+ExtendedEmailReplyParser.extract_text_or_html '/path/to/email.eml'
+```
+
+The `Mail::Message` object it self is extended to support `message.extract_text`, `message.extract_html_body_content` as well as `message.extract_text_or_html`.
+
 ### Writing a parser

 The parsing system allows you to add your own parser to the parsing chain. Just define a class inheriting from `ExtendedEmailReplyParser::Parsers::Base` and implement a `parse` method. The text before parsing is accessed via `text`.
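The README addition distinguishes a strict extraction, which returns `nil` for html-only mails, from a fallback extraction. The sketch below illustrates that difference without the `mail` gem. `FakeMessage` and its fields are hypothetical stand-ins for `Mail::Message`, so this shows only the `||` fallback idea, not the gem's actual behavior.

```ruby
# Stand-in for a parsed email; the real gem works on Mail::Message objects.
# FakeMessage and its fields are hypothetical and exist only for this sketch.
FakeMessage = Struct.new(:text_part, :html_body) do
  # Strict variant: nil when there is no text part.
  def extract_text
    text_part
  end

  # Fallback variant: mirrors `extract_text || extract_html_body_content`.
  def extract_text_or_html
    extract_text || html_body
  end
end

multipart = FakeMessage.new("Hello from the text part", "<p>Hello from html</p>")
html_only = FakeMessage.new(nil, "<p>Only html here</p>")

p multipart.extract_text_or_html # text part wins when it exists
p html_only.extract_text         # strict variant yields nil
p html_only.extract_text_or_html # fallback variant yields the html
```

The strict variant lets a caller detect html-only mails explicitly, while the fallback variant trades that signal for always returning some content when either part exists.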
data/lib/extended_email_reply_parser/mail/message.rb
CHANGED
@@ -25,6 +25,10 @@ module Mail
       (self.html_part || (self if self.content_type.include?('text/html'))).try(:body_in_utf8)
     end

+    def extract_text_or_html
+      extract_text || extract_html_body_content
+    end
+
     def extract_html_body_content
       # http://stackoverflow.com/a/356376/2066546
       extract_html.match(/(.*<\s*body[^>]*>)(.*)(<\s*\/\s*body\s*\>.+)/m)[2] || extract_html
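Judging only from the lines visible in this hunk, `extract_html_body_content` indexes the result of `String#match` directly. In Ruby, `match` returns `nil` when the pattern does not apply, and `nil[2]` raises `NoMethodError` before the `|| extract_html` fallback can run. The following is a minimal sketch of a nil-safe alternative using `String#[]` with a capture group. `body_content_or_all` is a hypothetical name and not the gem's implementation, and the rest of the real method may already guard against this case.

```ruby
# Hypothetical nil-safe variant; not the gem's implementation.
# Returns the content between <body ...> and </body> when present,
# the whole input when there is no body element, and nil for nil input.
def body_content_or_all(html)
  return nil if html.nil?

  # String#[] with a regexp and a capture index returns nil on no match,
  # so the `|| html` fallback is actually reachable.
  html[/<\s*body[^>]*>(.*)<\s*\/\s*body\s*>/m, 1] || html
end

with_body = "<html><body class=\"mail\"><p>Hi</p></body></html>"
without_body = "<p>Fragment without body tag</p>"

p body_content_or_all(with_body)
p body_content_or_all(without_body)
p body_content_or_all(nil)
```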
data/lib/extended_email_reply_parser.rb
CHANGED
@@ -58,6 +58,25 @@ module ExtendedEmailReplyParser
     end
   end

+  # Extract the body text from the given Mail::Message.
+  # If there is no text part, extract the content of the body tag
+  # of the html part.
+  #
+  # ExtendedEmailReplyParser.extract_text_or_html message
+  # ExtendedEmailReplyParser.extract_text_or_html '/path/to/email.eml'
+  #
+  # This is the same as:
+  #
+  # message.extract_text_or_html
+  #
+  def self.extract_text_or_html(message_or_path)
+    if message_or_path.kind_of? Mail::Message
+      message_or_path.extract_text_or_html
+    elsif message_or_path.kind_of? String and File.file? message_or_path
+      Mail.read(message_or_path).extract_text_or_html
+    end
+  end
+
   # This parses the given object, i.e. removes quoted replies etc.
   #
   # Examples:
@@ -81,7 +100,7 @@ module ExtendedEmailReplyParser
   end

   def self.parse_message(message)
-    self.parse_text(message.
+    self.parse_text(message.extract_text_or_html)
   end

   def self.parse_text(text)
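The new `self.extract_text_or_html` dispatches on its argument: message objects are used directly, strings naming an existing file are read first, and any other input falls through both branches and yields `nil`. The sketch below reproduces only that dispatch shape. `SketchMessage`, `extract_text_or_html_sketch` and `read_message` are hypothetical stand-ins for `Mail::Message`, the gem method and `Mail.read`, so the example runs without the `mail` gem.

```ruby
require "tempfile"

# Hypothetical stand-in for Mail::Message; only what the sketch needs.
SketchMessage = Struct.new(:body) do
  def extract_text_or_html
    body
  end
end

# Mirrors the shape of the dispatch in the diff: message objects are used
# directly, existing file paths are read first, anything else yields nil.
# `read_message` is a hypothetical replacement for Mail.read.
def extract_text_or_html_sketch(message_or_path)
  read_message = ->(path) { SketchMessage.new(File.read(path)) }

  if message_or_path.is_a?(SketchMessage)
    message_or_path.extract_text_or_html
  elsif message_or_path.is_a?(String) && File.file?(message_or_path)
    read_message.call(message_or_path).extract_text_or_html
  end
end

from_object = extract_text_or_html_sketch(SketchMessage.new("Hi there"))

# Tempfile.create with a block returns the block's value and removes the file.
from_path = Tempfile.create("email") do |file|
  file.write("Hi from disk")
  file.flush
  extract_text_or_html_sketch(file.path)
end

from_other = extract_text_or_html_sketch("not a path, just a string")

p from_object
p from_path
p from_other
```

One consequence of this shape is that a raw email body passed as a plain string is treated as a path candidate and, when no such file exists, produces `nil` rather than an error.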
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: extended_email_reply_parser
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.5.0
 platform: ruby
 authors:
 - Sebastian Fiedlschuster
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-11-
+date: 2016-11-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler