ruby-msg 1.3.1 → 1.4.0

data/README CHANGED
@@ -1,121 +1,116 @@
1
- #summary ruby-msg - A library for reading Outlook msg files, and for converting them to RFC2822 emails.
1
+ = Introduction
2
2
 
3
- = Introduction =
3
+ Generally, the goal of the project is the conversion of .msg files
4
+ into proper rfc2822 emails, independent of outlook, or any platform
5
+ dependencies etc. In fact it's currently pure ruby, so it should be
6
+ easy to get started with.
4
7
 
5
- Generally, the goal of the project is the conversion of .msg files into proper rfc2822
6
- emails, independent of outlook, or any platform dependencies etc.
7
- In fact its currently pure ruby, so it should be easy to get started with.
8
+ There's also work-in-progress pst support (unfortunately outlook 97
9
+ only currently), based on libpst, making this project more of a general
10
+ ruby mapi message store conversion library now (though some significant
11
+ cleaning up has to happen first).
8
12
 
9
- It draws on `msgconvert.pl`, but tries to take a cleaner and more complete approach.
10
- Neither are complete yet, however, but I think that this project provides a clean foundation upon which to work on a good converter for msg files for use in outlook migrations etc.
13
+ It draws on <tt>msgconvert.pl</tt>, but tries to take a cleaner and
14
+ more complete approach. Neither is complete yet, but I think
15
+ that this project provides a clean foundation upon which to work on
16
+ a good converter for msg files for use in outlook migrations etc.
11
17
 
12
18
  I am happy to accept patches, give commit bits etc.
13
19
 
14
20
  Please let me know how it works for you, any feedback would be welcomed.
15
21
 
16
- = Usage =
17
-
18
- Higher level access to the msg, can be had through the top level data accessors.
19
-
20
- {{{
21
- require 'msg'
22
-
23
- msg = Msg.load open(filename)
24
-
25
- # access to the 3 main data stores, if you want to poke with the msg
26
- # internals
27
- msg.recipients
28
- # => [#<Recipient:'\'Marley, Bob\' <bob.marley@gmail.com>'>]
29
- msg.attachments
30
- # => [#<Attachment filename='blah1.tif'>, #<Attachment filename='blah2.tif'>]
31
- msg.properties
32
- # => #<Properties ... normalized_subject='Testing' ...
33
- # creation_time=#<DateTime: 2454042.45074714,0,2299161> ...>
34
- }}}
35
-
36
- To completely abstract away all msg peculiarities, convert the msg to a mime object.
37
- The message as a whole, and some of its main parts support conversion to mime objects.
38
-
39
- {{{
40
- msg.attachments.first.to_mime
41
- # => #<Mime content_type='application/octet-stream'>
42
- mime = msg.to_mime
43
- puts mime.to_tree
44
- # =>
45
- - #<Mime content_type='multipart/mixed'>
46
- |- #<Mime content_type='multipart/alternative'>
47
- | |- #<Mime content_type='text/plain'>
48
- | \- #<Mime content_type='text/html'>
49
- |- #<Mime content_type='application/octet-stream'>
50
- \- #<Mime content_type='application/octet-stream'>
51
-
52
- # convert mime object to serialised form,
53
- # inclusive of attachments etc. (not ideal in memory, but its wip).
54
- puts mime.to_s
55
- }}}
56
-
57
- You can also access the underlying ole object, and see all the gory details of how msgs are serialised:
58
-
59
- {{{
60
- puts msg.ole.root.to_tree
61
- # =>
62
- - #<OleDir:"Root Entry" size=3840 time="2006-11-03T00:52:53Z">
63
- |- #<OleDir:"__nameid_version1.0" size=0 time="2006-11-03T00:52:53Z">
64
- | |- #<OleDir:"__substg1.0_00020102" size=16 data="CCAGAAAAAADAAA...">
65
- | |- #<OleDir:"__substg1.0_00030102" size=64 data="DoUAAAYAAABShQ...">
66
- | |- #<OleDir:"__substg1.0_00040102" size=0 data="">
67
- | |- #<OleDir:"__substg1.0_10010102" size=16 data="UoUAAAYAAQAQhQ...">
68
- | |- #<OleDir:"__substg1.0_10090102" size=8 data="GIUAAAYABgA=">
69
- | |- #<OleDir:"__substg1.0_100A0102" size=8 data="BoUAAAYABwA=">
70
- | |- #<OleDir:"__substg1.0_100F0102" size=8 data="A4UAAAYABAA=">
71
- | |- #<OleDir:"__substg1.0_10110102" size=8 data="AYUAAAYAAwA=">
72
- | |- #<OleDir:"__substg1.0_10120102" size=8 data="DoUAAAYAAAA=">
73
- | \- #<OleDir:"__substg1.0_101E0102" size=8 data="VIUAAAYAAgA=">
74
- |- #<OleDir:"__substg1.0_001A001E" size=8 data="SVBNLk5vdGU=">
75
- ...
76
- |- #<OleDir:"__substg1.0_8002001E" size=4 data="MTEuMA==">
77
- |- #<OleDir:"__properties_version1.0" size=800 data="AAAAAAAAAAABAA...">
78
- \- #<OleDir:"__recip_version1.0_#00000000" size=0 time="2006-11-03T00:52:53Z">
79
-
80
- |- #<OleDir:"__substg1.0_0FF60102" size=4 data="AAAAAA==">
81
- |- #<OleDir:"__substg1.0_3001001E" size=4 data="YXNkZg==">
82
- |- #<OleDir:"__substg1.0_5FF6001E" size=4 data="YXNkZg==">
83
- \- #<OleDir:"__properties_version1.0" size=152 data="AAAAAAAAAAAeAA...">
84
- }}}
85
-
86
- = Further Details =
87
-
88
- Named properties have recently been implemented, and Msg::Properties now allows associated guids. Keys are represented by Msg::Properties::Key, which contains the relevant code.
89
-
90
- You can now write code like:
91
- {{{
92
- props = msg.properties
93
-
94
- props[0x0037] # access subject by mapi code
95
- props[0x0037, Msg::Properties::PS_MAPI] # equivalent, with explicit GUID.
96
- key = Msg::Properties::Key.new 0x0037 # => 0x0037
97
- props[key] # same again
98
-
99
- # keys support being converted to symbols, and then use a symbolic lookup
100
- key.to_sym # => :subject
101
- props[:subject] # as above
102
- props.subject # still good
103
- }}}
104
-
105
- Under the hood, there is complete support for named properties:
106
- {{{
107
- # to get the categories as set by outlook
108
- props['Keywords', Msg::Properties::PS_PUBLIC_STRINGS]
109
- # => ["Business", "Competition", "Favorites"]
110
-
111
- # and as a fallback, the symbolic lookup will automatically use named properties,
112
- # which can be seen:
113
- props.resolve :keywords
114
- # => #<Key {00020329-0000-0000-c000-000000000046}/"Keywords">
115
-
116
- # which allows this to work:
117
- props.keywords # as above
118
- }}}
119
-
120
- With some more work, the property storage model should be able to reach feature
121
- completion.
22
+ = Features
23
+
24
+ Broad features of the project:
25
+
26
+ * Can be used as a general msg library, where conversion to and working
27
+ on a standard format doesn't make sense.
28
+
29
+ * Supports conversion of msg files to standard formats, like rfc2822
30
+ emails, vCards, etc.
31
+
32
+ * Well commented, and easily extended.
33
+
34
+ * Most key .msg structures are understood, and only the parsing
35
+ code should require minor tweaks. Most of the remaining work is in achieving
36
+ high-fidelity conversion to standard formats (see [TODO]).
37
+
38
+ Features of the lower-level msg handling:
39
+
40
+ * Supports both types of property storage (large ones in +substg+
41
+ files, and small ones in the +properties+ file).
42
+
43
+ * Complete support for named properties in different GUID namespaces.
44
+
45
+ * Support for mapping property codes to symbolic names, with many
46
+ included.
47
+
48
+ * RTF decompression support included, as well as HTML extraction from
49
+ RTF where appropriate (both in pure ruby, see <tt>lib/msg/rtf.rb</tt>)
50
+
51
+ * Initial RTF converter, for providing a readable body when only RTF
52
+ exists (needs work)
53
+
54
+ * Initial support for handling embedded ole files, converting nested
55
+ .msg files to message/rfc822 attachments, and serializing others
56
+ as ole file attachments (allows you to view embedded excel for example).
57
+
58
+ = Usage
59
+
60
+ At the command line, it is simple to convert individual msg files
61
+ to .eml, or to convert a batch to an mbox format file. See help for
62
+ details:
63
+
64
+ msgtool -c some_email.msg > some_email.eml
65
+ msgtool -m *.msg > mbox
66
+
67
+ There is also a fairly complete and easy to use high level library
68
+ access:
69
+
70
+ require 'msg'
71
+
72
+ msg = Msg.open filename
73
+
74
+ # access to the 3 main data stores, if you want to poke with the msg
75
+ # internals
76
+ msg.recipients
77
+ # => [#<Recipient:'\'Marley, Bob\' <bob.marley@gmail.com>'>]
78
+ msg.attachments
79
+ # => [#<Attachment filename='blah1.tif'>, #<Attachment filename='blah2.tif'>]
80
+ msg.properties
81
+ # => #<Properties ... normalized_subject='Testing' ...
82
+ # creation_time=#<DateTime: 2454042.45074714,0,2299161> ...>
83
+
84
+ To completely abstract away all msg peculiarities, convert the msg
85
+ to a mime object. The message as a whole, and some of its main parts
86
+ support conversion to mime objects.
87
+
88
+ msg.attachments.first.to_mime
89
+ # => #<Mime content_type='application/octet-stream'>
90
+ mime = msg.to_mime
91
+ puts mime.to_tree
92
+ # =>
93
+ - #<Mime content_type='multipart/mixed'>
94
+ |- #<Mime content_type='multipart/alternative'>
95
+ | |- #<Mime content_type='text/plain'>
96
+ | \- #<Mime content_type='text/html'>
97
+ |- #<Mime content_type='application/octet-stream'>
98
+ \- #<Mime content_type='application/octet-stream'>
99
+
100
+ # convert mime object to serialised form,
101
+ # inclusive of attachments etc. (not ideal in memory, but it's WIP).
102
+ puts mime.to_s
103
+
104
+ = Other
105
+
106
+ For more information, see
107
+
108
+ * TODO
109
+
110
+ * MsgDetails[http://code.google.com/p/ruby-msg/wiki/MsgDetails]
111
+
112
+ * OleDetails[http://code.google.com/p/ruby-ole/wiki/OleDetails]
113
+
114
+ * msgconv[http://www.matijs.net/software/msgconv/], the original
115
+ perl converter.
116
+
data/Rakefile CHANGED
@@ -8,55 +8,69 @@ require 'fileutils'
8
8
 
9
9
  $:.unshift 'lib'
10
10
 
11
- require 'msg'
11
+ require 'mapi/msg'
12
12
 
13
13
  PKG_NAME = 'ruby-msg'
14
- PKG_VERSION = Msg::VERSION
14
+ PKG_VERSION = Mapi::VERSION
15
15
 
16
16
  task :default => [:test]
17
17
 
18
18
  Rake::TestTask.new(:test) do |t|
19
- t.test_files = FileList["test/test_*.rb"]
20
- t.warning = true
19
+ t.test_files = FileList["test/test_*.rb"] - ['test/test_pst.rb']
20
+ t.warning = false
21
21
  t.verbose = true
22
22
  end
23
23
 
24
- # RDocTask wasn't working for me
25
- desc 'Build the rdoc HTML Files'
26
- task :rdoc do
27
- system "rdoc -S -N --main Msg --tab-width 2 --title '#{PKG_NAME} documentation' lib"
24
+ begin
25
+ require 'rcov/rcovtask'
26
+ # NOTE: this will not do anything until you add some tests
27
+ desc "Create a cross-referenced code coverage report"
28
+ Rcov::RcovTask.new do |t|
29
+ t.test_files = FileList['test/test*.rb']
30
+ t.ruby_opts << "-Ilib" # in order to use this rcov
31
+ t.rcov_opts << "--xrefs" # comment to disable cross-references
32
+ t.rcov_opts << "--exclude /usr/local/lib/site_ruby"
33
+ t.verbose = true
34
+ end
35
+ rescue LoadError
36
+ # Rcov not available
37
+ end
38
+
39
+ Rake::RDocTask.new do |t|
40
+ t.rdoc_dir = 'doc'
41
+ t.title = "#{PKG_NAME} documentation"
42
+ t.options += %w[--main README --line-numbers --inline-source --tab-width 2]
43
+ t.rdoc_files.include 'lib/**/*.rb'
44
+ t.rdoc_files.include 'README'
28
45
  end
29
46
 
30
47
  spec = Gem::Specification.new do |s|
31
- s.name = PKG_NAME
32
- s.version = PKG_VERSION
33
- s.summary = %q{Ruby Msg library.}
48
+ s.name = PKG_NAME
49
+ s.version = PKG_VERSION
50
+ s.summary = %q{Ruby Msg library.}
34
51
  s.description = %q{A library for reading Outlook msg files, and for converting them to RFC2822 emails.}
35
- s.authors = ["Charles Lowe"]
36
- s.email = %q{aquasync@gmail.com}
37
- s.homepage = %q{http://code.google.com/p/ruby-msg}
38
- #s.rubyforge_project = %q{ruby-msg}
39
-
40
- s.executables = ['msgtool']
41
- s.files = Dir.glob('data/*.yaml') + ['Rakefile', 'README', 'FIXES']
42
- s.files += Dir.glob("lib/**/*.rb")
43
- s.files += Dir.glob("test/test_*.rb")
44
- s.files += Dir.glob("bin/*")
52
+ s.authors = ["Charles Lowe"]
53
+ s.email = %q{aquasync@gmail.com}
54
+ s.homepage = %q{http://code.google.com/p/ruby-msg}
55
+ s.rubyforge_project = %q{ruby-msg}
56
+
57
+ s.executables = ['mapitool']
58
+ s.files = FileList['data/*.yaml', 'Rakefile', 'README', 'FIXES']
59
+ s.files += FileList['lib/**/*.rb', 'test/test_*.rb', 'bin/*']
45
60
 
46
- s.has_rdoc = true
47
- s.rdoc_options += ['--main', 'Msg',
61
+ s.has_rdoc = true
62
+ s.extra_rdoc_files = ['README']
63
+ s.rdoc_options += ['--main', 'README',
48
64
  '--title', "#{PKG_NAME} documentation",
49
65
  '--tab-width', '2']
50
66
 
51
-
52
- s.autorequire = 'msg'
53
-
54
- s.add_dependency 'ruby-ole', '>=1.2.1'
67
+ s.add_dependency 'ruby-ole', '>=1.2.4'
68
+ s.add_dependency 'vpim', '>=0.360'
55
69
  end
56
70
 
57
71
  Rake::GemPackageTask.new(spec) do |p|
58
72
  p.gem_spec = spec
59
- p.need_tar = true
73
+ p.need_tar = false #true
60
74
  p.need_zip = false
61
75
  p.package_dir = 'build'
62
76
  end
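
The Rakefile change above wraps the rcov task in `begin` / `rescue LoadError`, so the build keeps working when rcov is not installed. A minimal sketch of that optional-dependency pattern follows; the helper name `optional_require` and the made-up library name are illustrative assumptions, not part of ruby-msg:

```ruby
# Hypothetical helper (not part of ruby-msg): try to load an optional
# library and report whether it is available, instead of aborting.
def optional_require(name)
  require name
  true
rescue LoadError
  # The library is not installed; callers can skip the optional feature.
  false
end

# 'set' ships with Ruby, so this load succeeds.
puts "set available: #{optional_require('set')}"

# This library name is made up for the example, so the load fails quietly.
puts "missing available: #{optional_require('surely_not_an_installed_library_xyz')}"
```

A Rakefile can then define the coverage task only when the helper returns true, which has the same effect as the inline `begin` / `rescue` block in the diff.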
data/bin/mapitool ADDED
@@ -0,0 +1,195 @@
1
+ #! /usr/bin/ruby
2
+
3
+ $:.unshift File.dirname(__FILE__) + '/../lib'
4
+
5
+ require 'optparse'
6
+ require 'rubygems'
7
+ require 'mapi/msg'
8
+ require 'mapi/pst'
9
+ require 'mapi/convert'
10
+ require 'time'
11
+
12
+ class Mapitool
13
+ attr_reader :files, :opts
14
+ def initialize files, opts
15
+ @files, @opts = files, opts
16
+ seen_pst = false
17
+ raise ArgumentError, 'Must specify 1 or more input files.' if files.empty?
18
+ files.map! do |f|
19
+ ext = File.extname(f.downcase)[1..-1]
20
+ raise ArgumentError, 'Unsupported file type - %s' % f unless ext =~ /^(msg|pst)$/
21
+ raise ArgumentError, 'Experimental pst support not enabled' if ext == 'pst' and !opts[:enable_pst]
22
+ [ext.to_sym, f]
23
+ end
24
+ if dir = opts[:output_dir]
25
+ Dir.mkdir(dir) unless File.directory?(dir)
26
+ end
27
+ end
28
+
29
+ def each_message(&block)
30
+ files.each do |format, filename|
31
+ if format == :pst
32
+ if filter_path = opts[:filter_path]
33
+ filter_path = filter_path.tr("\\", '/').gsub(/\/+/, '/').sub(/^\//, '').sub(/\/$/, '')
34
+ end
35
+ open filename do |io|
36
+ pst = Mapi::Pst.new io
37
+ pst.each do |message|
38
+ next unless message.type == :message
39
+ if filter_path
40
+ next unless message.path =~ /^#{Regexp.quote filter_path}(\/|$)/i
41
+ end
42
+ yield message
43
+ end
44
+ end
45
+ else
46
+ Mapi::Msg.open filename, &block
47
+ end
48
+ end
49
+ end
50
+
51
+ def run
52
+ each_message(&method(:process_message))
53
+ end
54
+
55
+ def make_unique filename
56
+ @map ||= {}
57
+ return @map[filename] if !opts[:individual] and @map[filename]
58
+ try = filename
59
+ i = 1
60
+ try = filename.gsub(/(\.[^.]+)$/, ".#{i += 1}\\1") while File.exist?(try)
61
+ @map[filename] = try
62
+ try
63
+ end
64
+
65
+ def process_message message
66
+ # TODO make this more informative
67
+ mime_type = message.mime_type
68
+ return unless pair = Mapi::Message::CONVERSION_MAP[mime_type]
69
+
70
+ combined_map = {
71
+ 'eml' => 'Mail.mbox',
72
+ 'vcf' => 'Contacts.vcf',
73
+ 'txt' => 'Posts.txt'
74
+ }
75
+
76
+ # TODO handle merged mode, pst, etc etc...
77
+ case message
78
+ when Mapi::Msg
79
+ if opts[:individual]
80
+ filename = message.root.ole.io.path.gsub(/msg$/i, pair.last)
81
+ else
82
+ filename = combined_map[pair.last] or raise NotImplementedError
83
+ end
84
+ when Mapi::Pst::Item
85
+ if opts[:individual]
86
+ filename = "#{message.subject.tr ' ', '_'}.#{pair.last}".gsub(/[^A-Za-z0-9.()\[\]{}-]/, '_')
87
+ else
88
+ filename = combined_map[pair.last] or raise NotImplementedError
89
+ filename = (message.path.tr(' /', '_.').gsub(/[^A-Za-z0-9.()\[\]{}-]/, '_') + '.' + File.extname(filename)).squeeze('.')
90
+ end
91
+ dir = File.dirname(message.instance_variable_get(:@desc).pst.io.path)
92
+ filename = File.join dir, filename
93
+ else
94
+ raise "Unsupported message type - #{message.class}"
95
+ end
96
+
97
+ if dir = opts[:output_dir]
98
+ filename = File.join dir, File.basename(filename)
99
+ end
100
+
101
+ filename = make_unique filename
102
+
103
+ write_message = proc do |f|
104
+ data = message.send(pair.first).to_s
105
+ if !opts[:individual] and pair.last == 'eml'
106
+ # we do the append > style mbox quoting (mboxrd, I think it's called), as it
107
+ # is the only one that can be robustly un-quoted. evolution doesn't use this!
108
+ f.puts "From mapitool@localhost #{Time.now.rfc2822}"
109
+ #munge_headers mime, opts
110
+ data.each_line do |line|
111
+ if line =~ /^>*From /o
112
+ f.print '>' + line
113
+ else
114
+ f.print line
115
+ end
116
+ end
117
+ else
118
+ f.write data
119
+ end
120
+ end
121
+
122
+ if opts[:stdout]
123
+ write_message[STDOUT]
124
+ else
125
+ open filename, 'a', &write_message
126
+ end
127
+ end
128
+
129
+ def munge_headers mime, opts
130
+ opts[:header_defaults].each do |s|
131
+ key, val = s.match(/(.*?):\s+(.*)/)[1..-1]
132
+ mime.headers[key] = [val] if mime.headers[key].empty?
133
+ end
134
+ end
135
+ end
136
+
137
+ def mapitool
138
+ opts = {:verbose => false, :action => :convert, :header_defaults => []}
139
+ op = OptionParser.new do |op|
140
+ op.banner = "Usage: mapitool [options] [files]"
141
+ #op.separator ''
142
+ #op.on('-c', '--convert', 'Convert input files (default)') { opts[:action] = :convert }
143
+ op.separator ''
144
+ op.on('-o', '--output-dir DIR', 'Put all output files in DIR') { |d| opts[:output_dir] = d }
145
+ op.on('-i', '--[no-]individual', 'Do not combine converted files') { |i| opts[:individual] = i }
146
+ op.on('-s', '--stdout', 'Write all data to stdout') { opts[:stdout] = true }
147
+ op.on('-f', '--filter-path PATH', 'Only process pst items in PATH') { |path| opts[:filter_path] = path }
148
+ op.on( '--enable-pst', 'Turn on experimental PST support') { opts[:enable_pst] = true }
149
+ #op.on('-d', '--header-default STR', 'Provide a default value for top level mail header') { |hd| opts[:header_defaults] << hd }
150
+ # --enable-pst
151
+ op.separator ''
152
+ op.on('-v', '--[no-]verbose', 'Run verbosely') { |v| opts[:verbose] = v }
153
+ op.on_tail('-h', '--help', 'Show this message') { puts op; exit }
154
+ end
155
+
156
+ files = op.parse ARGV
157
+
158
+ # for windows. see issue #2
159
+ STDOUT.binmode
160
+
161
+ Mapi::Log.level = Ole::Log.level = opts[:verbose] ? Logger::WARN : Logger::FATAL
162
+
163
+ tool = begin
164
+ Mapitool.new(files, opts)
165
+ rescue ArgumentError
166
+ puts $!
167
+ puts op
168
+ exit 1
169
+ end
170
+
171
+ tool.run
172
+ end
173
+
174
+ mapitool
175
+
176
+ __END__
177
+
178
+ mapitool [options] [files]
179
+
180
+ files is a list of *.msg & *.pst files.
181
+
182
+ one of the options should be some sort of path filter to apply to pst items.
183
+
184
+ --filter-path=
185
+ --filter-type=eml,vcf
186
+
187
+ with that out of the way, the entire list of files can be converted into a
188
+ list of items (with meta data about the source).
189
+
190
+ --convert
191
+ --[no-]separate one output file per item or combined output
192
+ --stdout
193
+ --output-dir=.
194
+
195
+
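
The `write_message` proc in the new script quotes body lines matching `/^>*From /` by prepending `>`, which its comment identifies (tentatively) as mboxrd-style quoting. The sketch below isolates that step and shows why it can be reversed. The helper names `mboxrd_quote` and `mboxrd_unquote` are hypothetical and do not appear in ruby-msg, and the unquoting half is an assumption about how a reader of the mbox file would undo it:

```ruby
# Hypothetical helpers illustrating the quoting step; not part of ruby-msg.

# Prefix '>' to every line that starts with zero or more '>' followed by
# 'From ', so no body line can be mistaken for an mbox message separator.
def mboxrd_quote(data)
  data.each_line.map { |line| line =~ /\A>*From / ? ">#{line}" : line }.join
end

# Reverse the quoting by stripping exactly one leading '>' from lines of
# the form '>From ', '>>From ', and so on.
def mboxrd_unquote(data)
  data.each_line.map { |line| line.sub(/\A>(>*From )/, '\1') }.join
end

body = "Hello\nFrom here on it gets tricky\n>From already quoted\nFromage is fine\n"
quoted = mboxrd_quote(body)
puts quoted
puts "round trip ok: #{mboxrd_unquote(quoted) == body}"
```

Because already-quoted lines gain one more `>`, removing a single `>` always restores the original text, which is the property the script's comment relies on.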