ruby-msg 1.3.1 → 1.4.0

data/README CHANGED
@@ -1,121 +1,116 @@
1
- #summary ruby-msg - A library for reading Outlook msg files, and for converting them to RFC2822 emails.
1
+ = Introduction
2
2
 
3
- = Introduction =
3
+ Generally, the goal of the project is the conversion of .msg files
4
+ into proper rfc2822 emails, independent of outlook, or any platform
5
+ dependencies etc. In fact it's currently pure ruby, so it should be
6
+ easy to get started with.
4
7
 
5
- Generally, the goal of the project is the conversion of .msg files into proper rfc2822
6
- emails, independent of outlook, or any platform dependencies etc.
7
- In fact its currently pure ruby, so it should be easy to get started with.
8
+ There's also work-in-progress pst support (unfortunately outlook 97
9
+ only currently), based on libpst, making this project more of a general
10
+ ruby mapi message store conversion library now (though some significant
11
+ cleaning up has to happen first).
8
12
 
9
- It draws on `msgconvert.pl`, but tries to take a cleaner and more complete approach.
10
- Neither are complete yet, however, but I think that this project provides a clean foundation upon which to work on a good converter for msg files for use in outlook migrations etc.
13
+ It draws on <tt>msgconvert.pl</tt>, but tries to take a cleaner and
14
+ more complete approach. Neither is complete yet, but I think
15
+ that this project provides a clean foundation upon which to work on
16
+ a good converter for msg files for use in outlook migrations etc.
11
17
 
12
18
  I am happy to accept patches, give commit bits etc.
13
19
 
14
20
  Please let me know how it works for you, any feedback would be welcomed.
15
21
 
16
- = Usage =
17
-
18
- Higher level access to the msg, can be had through the top level data accessors.
19
-
20
- {{{
21
- require 'msg'
22
-
23
- msg = Msg.load open(filename)
24
-
25
- # access to the 3 main data stores, if you want to poke with the msg
26
- # internals
27
- msg.recipients
28
- # => [#<Recipient:'\'Marley, Bob\' <bob.marley@gmail.com>'>]
29
- msg.attachments
30
- # => [#<Attachment filename='blah1.tif'>, #<Attachment filename='blah2.tif'>]
31
- msg.properties
32
- # => #<Properties ... normalized_subject='Testing' ...
33
- # creation_time=#<DateTime: 2454042.45074714,0,2299161> ...>
34
- }}}
35
-
36
- To completely abstract away all msg peculiarities, convert the msg to a mime object.
37
- The message as a whole, and some of its main parts support conversion to mime objects.
38
-
39
- {{{
40
- msg.attachments.first.to_mime
41
- # => #<Mime content_type='application/octet-stream'>
42
- mime = msg.to_mime
43
- puts mime.to_tree
44
- # =>
45
- - #<Mime content_type='multipart/mixed'>
46
- |- #<Mime content_type='multipart/alternative'>
47
- | |- #<Mime content_type='text/plain'>
48
- | \- #<Mime content_type='text/html'>
49
- |- #<Mime content_type='application/octet-stream'>
50
- \- #<Mime content_type='application/octet-stream'>
51
-
52
- # convert mime object to serialised form,
53
- # inclusive of attachments etc. (not ideal in memory, but its wip).
54
- puts mime.to_s
55
- }}}
56
-
57
- You can also access the underlying ole object, and see all the gory details of how msgs are serialised:
58
-
59
- {{{
60
- puts msg.ole.root.to_tree
61
- # =>
62
- - #<OleDir:"Root Entry" size=3840 time="2006-11-03T00:52:53Z">
63
- |- #<OleDir:"__nameid_version1.0" size=0 time="2006-11-03T00:52:53Z">
64
- | |- #<OleDir:"__substg1.0_00020102" size=16 data="CCAGAAAAAADAAA...">
65
- | |- #<OleDir:"__substg1.0_00030102" size=64 data="DoUAAAYAAABShQ...">
66
- | |- #<OleDir:"__substg1.0_00040102" size=0 data="">
67
- | |- #<OleDir:"__substg1.0_10010102" size=16 data="UoUAAAYAAQAQhQ...">
68
- | |- #<OleDir:"__substg1.0_10090102" size=8 data="GIUAAAYABgA=">
69
- | |- #<OleDir:"__substg1.0_100A0102" size=8 data="BoUAAAYABwA=">
70
- | |- #<OleDir:"__substg1.0_100F0102" size=8 data="A4UAAAYABAA=">
71
- | |- #<OleDir:"__substg1.0_10110102" size=8 data="AYUAAAYAAwA=">
72
- | |- #<OleDir:"__substg1.0_10120102" size=8 data="DoUAAAYAAAA=">
73
- | \- #<OleDir:"__substg1.0_101E0102" size=8 data="VIUAAAYAAgA=">
74
- |- #<OleDir:"__substg1.0_001A001E" size=8 data="SVBNLk5vdGU=">
75
- ...
76
- |- #<OleDir:"__substg1.0_8002001E" size=4 data="MTEuMA==">
77
- |- #<OleDir:"__properties_version1.0" size=800 data="AAAAAAAAAAABAA...">
78
- \- #<OleDir:"__recip_version1.0_#00000000" size=0 time="2006-11-03T00:52:53Z">
79
-
80
- |- #<OleDir:"__substg1.0_0FF60102" size=4 data="AAAAAA==">
81
- |- #<OleDir:"__substg1.0_3001001E" size=4 data="YXNkZg==">
82
- |- #<OleDir:"__substg1.0_5FF6001E" size=4 data="YXNkZg==">
83
- \- #<OleDir:"__properties_version1.0" size=152 data="AAAAAAAAAAAeAA...">
84
- }}}
85
-
86
- = Further Details =
87
-
88
- Named properties have recently been implemented, and Msg::Properties now allows associated guids. Keys are represented by Msg::Properties::Key, which contains the relevant code.
89
-
90
- You can now write code like:
91
- {{{
92
- props = msg.properties
93
-
94
- props[0x0037] # access subject by mapi code
95
- props[0x0037, Msg::Properties::PS_MAPI] # equivalent, with explicit GUID.
96
- key = Msg::Properties::Key.new 0x0037 # => 0x0037
97
- props[key] # same again
98
-
99
- # keys support being converted to symbols, and then use a symbolic lookup
100
- key.to_sym # => :subject
101
- props[:subject] # as above
102
- props.subject # still good
103
- }}}
104
-
105
- Under the hood, there is complete support for named properties:
106
- {{{
107
- # to get the categories as set by outlook
108
- props['Keywords', Msg::Properties::PS_PUBLIC_STRINGS]
109
- # => ["Business", "Competition", "Favorites"]
110
-
111
- # and as a fallback, the symbolic lookup will automatically use named properties,
112
- # which can be seen:
113
- props.resolve :keywords
114
- # => #<Key {00020329-0000-0000-c000-000000000046}/"Keywords">
115
-
116
- # which allows this to work:
117
- props.keywords # as above
118
- }}}
119
-
120
- With some more work, the property storage model should be able to reach feature
121
- completion.
22
+ = Features
23
+
24
+ Broad features of the project:
25
+
26
+ * Can be used as a general msg library, where conversion to and working
27
+ on a standard format doesn't make sense.
28
+
29
+ * Supports conversion of msg files to standard formats, like rfc2822
30
+ emails, vCards, etc.
31
+
32
+ * Well commented, and easily extended.
33
+
34
+ * Most key .msg structures are understood, and only the parsing
35
+ code should require minor tweaks. Most of the remaining work is in achieving
36
+ high-fidelity conversion to standard formats (see [TODO]).
37
+
38
+ Features of the lower-level msg handling:
39
+
40
+ * Supports both types of property storage (large ones in +substg+
41
+ files, and small ones in the +properties+ file).
42
+
43
+ * Complete support for named properties in different GUID namespaces.
44
+
45
+ * Support for mapping property codes to symbolic names, with many
46
+ included.
47
+
48
+ * RTF decompression support included, as well as HTML extraction from
49
+ RTF where appropriate (both in pure ruby, see <tt>lib/msg/rtf.rb</tt>)
50
+
51
+ * Initial RTF converter, for providing a readable body when only RTF
52
+ exists (needs work)
53
+
54
+ * Initial support for handling embedded ole files, converting nested
55
+ .msg files to message/rfc822 attachments, and serializing others
56
+ as ole file attachments (allows you to view embedded excel for example).
57
+
58
+ = Usage
59
+
60
+ At the command line, it is simple to convert individual msg files
61
+ to .eml, or to convert a batch to an mbox format file. See help for
62
+ details:
63
+
64
+ msgtool -c some_email.msg > some_email.eml
65
+ msgtool -m *.msg > mbox
66
+
67
+ There is also a fairly complete and easy to use high level library
68
+ access:
69
+
70
+ require 'msg'
71
+
72
+ msg = Msg.open filename
73
+
74
+ # access to the 3 main data stores, if you want to poke with the msg
75
+ # internals
76
+ msg.recipients
77
+ # => [#<Recipient:'\'Marley, Bob\' <bob.marley@gmail.com>'>]
78
+ msg.attachments
79
+ # => [#<Attachment filename='blah1.tif'>, #<Attachment filename='blah2.tif'>]
80
+ msg.properties
81
+ # => #<Properties ... normalized_subject='Testing' ...
82
+ # creation_time=#<DateTime: 2454042.45074714,0,2299161> ...>
83
+
84
+ To completely abstract away all msg peculiarities, convert the msg
85
+ to a mime object. The message as a whole, and some of its main parts
86
+ support conversion to mime objects.
87
+
88
+ msg.attachments.first.to_mime
89
+ # => #<Mime content_type='application/octet-stream'>
90
+ mime = msg.to_mime
91
+ puts mime.to_tree
92
+ # =>
93
+ - #<Mime content_type='multipart/mixed'>
94
+ |- #<Mime content_type='multipart/alternative'>
95
+ | |- #<Mime content_type='text/plain'>
96
+ | \- #<Mime content_type='text/html'>
97
+ |- #<Mime content_type='application/octet-stream'>
98
+ \- #<Mime content_type='application/octet-stream'>
99
+
100
+ # convert mime object to serialised form,
101
+ # inclusive of attachments etc. (not ideal in memory, but it's wip).
102
+ puts mime.to_s
103
+
104
+ = Other
105
+
106
+ For more information, see
107
+
108
+ * TODO
109
+
110
+ * MsgDetails[http://code.google.com/p/ruby-msg/wiki/MsgDetails]
111
+
112
+ * OleDetails[http://code.google.com/p/ruby-ole/wiki/OleDetails]
113
+
114
+ * msgconv[http://www.matijs.net/software/msgconv/], the original
115
+ perl converter.
116
+
data/Rakefile CHANGED
@@ -8,55 +8,69 @@ require 'fileutils'
8
8
 
9
9
  $:.unshift 'lib'
10
10
 
11
- require 'msg'
11
+ require 'mapi/msg'
12
12
 
13
13
  PKG_NAME = 'ruby-msg'
14
- PKG_VERSION = Msg::VERSION
14
+ PKG_VERSION = Mapi::VERSION
15
15
 
16
16
  task :default => [:test]
17
17
 
18
18
  Rake::TestTask.new(:test) do |t|
19
- t.test_files = FileList["test/test_*.rb"]
20
- t.warning = true
19
+ t.test_files = FileList["test/test_*.rb"] - ['test/test_pst.rb']
20
+ t.warning = false
21
21
  t.verbose = true
22
22
  end
23
23
 
24
- # RDocTask wasn't working for me
25
- desc 'Build the rdoc HTML Files'
26
- task :rdoc do
27
- system "rdoc -S -N --main Msg --tab-width 2 --title '#{PKG_NAME} documentation' lib"
24
+ begin
25
+ require 'rcov/rcovtask'
26
+ # NOTE: this will not do anything until you add some tests
27
+ desc "Create a cross-referenced code coverage report"
28
+ Rcov::RcovTask.new do |t|
29
+ t.test_files = FileList['test/test*.rb']
30
+ t.ruby_opts << "-Ilib" # in order to use this rcov
31
+ t.rcov_opts << "--xrefs" # comment to disable cross-references
32
+ t.rcov_opts << "--exclude /usr/local/lib/site_ruby"
33
+ t.verbose = true
34
+ end
35
+ rescue LoadError
36
+ # Rcov not available
37
+ end
38
+
39
+ Rake::RDocTask.new do |t|
40
+ t.rdoc_dir = 'doc'
41
+ t.title = "#{PKG_NAME} documentation"
42
+ t.options += %w[--main README --line-numbers --inline-source --tab-width 2]
43
+ t.rdoc_files.include 'lib/**/*.rb'
44
+ t.rdoc_files.include 'README'
28
45
  end
29
46
 
30
47
  spec = Gem::Specification.new do |s|
31
- s.name = PKG_NAME
32
- s.version = PKG_VERSION
33
- s.summary = %q{Ruby Msg library.}
48
+ s.name = PKG_NAME
49
+ s.version = PKG_VERSION
50
+ s.summary = %q{Ruby Msg library.}
34
51
  s.description = %q{A library for reading Outlook msg files, and for converting them to RFC2822 emails.}
35
- s.authors = ["Charles Lowe"]
36
- s.email = %q{aquasync@gmail.com}
37
- s.homepage = %q{http://code.google.com/p/ruby-msg}
38
- #s.rubyforge_project = %q{ruby-msg}
39
-
40
- s.executables = ['msgtool']
41
- s.files = Dir.glob('data/*.yaml') + ['Rakefile', 'README', 'FIXES']
42
- s.files += Dir.glob("lib/**/*.rb")
43
- s.files += Dir.glob("test/test_*.rb")
44
- s.files += Dir.glob("bin/*")
52
+ s.authors = ["Charles Lowe"]
53
+ s.email = %q{aquasync@gmail.com}
54
+ s.homepage = %q{http://code.google.com/p/ruby-msg}
55
+ s.rubyforge_project = %q{ruby-msg}
56
+
57
+ s.executables = ['mapitool']
58
+ s.files = FileList['data/*.yaml', 'Rakefile', 'README', 'FIXES']
59
+ s.files += FileList['lib/**/*.rb', 'test/test_*.rb', 'bin/*']
45
60
 
46
- s.has_rdoc = true
47
- s.rdoc_options += ['--main', 'Msg',
61
+ s.has_rdoc = true
62
+ s.extra_rdoc_files = ['README']
63
+ s.rdoc_options += ['--main', 'README',
48
64
  '--title', "#{PKG_NAME} documentation",
49
65
  '--tab-width', '2']
50
66
 
51
-
52
- s.autorequire = 'msg'
53
-
54
- s.add_dependency 'ruby-ole', '>=1.2.1'
67
+ s.add_dependency 'ruby-ole', '>=1.2.4'
68
+ s.add_dependency 'vpim', '>=0.360'
55
69
  end
56
70
 
57
71
  Rake::GemPackageTask.new(spec) do |p|
58
72
  p.gem_spec = spec
59
- p.need_tar = true
73
+ p.need_tar = false #true
60
74
  p.need_zip = false
61
75
  p.package_dir = 'build'
62
76
  end
@@ -0,0 +1,195 @@
1
+ #! /usr/bin/ruby
2
+
3
+ $:.unshift File.dirname(__FILE__) + '/../lib'
4
+
5
+ require 'optparse'
6
+ require 'rubygems'
7
+ require 'mapi/msg'
8
+ require 'mapi/pst'
9
+ require 'mapi/convert'
10
+ require 'time'
11
+
12
+ class Mapitool
13
+ attr_reader :files, :opts
14
+ def initialize files, opts
15
+ @files, @opts = files, opts
16
+ seen_pst = false
17
+ raise ArgumentError, 'Must specify 1 or more input files.' if files.empty?
18
+ files.map! do |f|
19
+ ext = File.extname(f.downcase)[1..-1]
20
+ raise ArgumentError, 'Unsupported file type - %s' % f unless ext =~ /^(msg|pst)$/
21
+ raise ArgumentError, 'Experimental pst support not enabled' if ext == 'pst' and !opts[:enable_pst]
22
+ [ext.to_sym, f]
23
+ end
24
+ if dir = opts[:output_dir]
25
+ Dir.mkdir(dir) unless File.directory?(dir)
26
+ end
27
+ end
28
+
29
+ def each_message(&block)
30
+ files.each do |format, filename|
31
+ if format == :pst
32
+ if filter_path = opts[:filter_path]
33
+ filter_path = filter_path.tr("\\", '/').gsub(/\/+/, '/').sub(/^\//, '').sub(/\/$/, '')
34
+ end
35
+ open filename do |io|
36
+ pst = Mapi::Pst.new io
37
+ pst.each do |message|
38
+ next unless message.type == :message
39
+ if filter_path
40
+ next unless message.path =~ /^#{Regexp.quote filter_path}(\/|$)/i
41
+ end
42
+ yield message
43
+ end
44
+ end
45
+ else
46
+ Mapi::Msg.open filename, &block
47
+ end
48
+ end
49
+ end
50
+
51
+ def run
52
+ each_message(&method(:process_message))
53
+ end
54
+
55
+ def make_unique filename
56
+ @map ||= {}
57
+ return @map[filename] if !opts[:individual] and @map[filename]
58
+ try = filename
59
+ i = 1
60
+ try = filename.gsub(/(\.[^.]+)$/, ".#{i += 1}\\1") while File.exist?(try)
61
+ @map[filename] = try
62
+ try
63
+ end
64
+
65
+ def process_message message
66
+ # TODO make this more informative
67
+ mime_type = message.mime_type
68
+ return unless pair = Mapi::Message::CONVERSION_MAP[mime_type]
69
+
70
+ combined_map = {
71
+ 'eml' => 'Mail.mbox',
72
+ 'vcf' => 'Contacts.vcf',
73
+ 'txt' => 'Posts.txt'
74
+ }
75
+
76
+ # TODO handle merged mode, pst, etc etc...
77
+ case message
78
+ when Mapi::Msg
79
+ if opts[:individual]
80
+ filename = message.root.ole.io.path.gsub(/msg$/i, pair.last)
81
+ else
82
+ filename = combined_map[pair.last] or raise NotImplementedError
83
+ end
84
+ when Mapi::Pst::Item
85
+ if opts[:individual]
86
+ filename = "#{message.subject.tr ' ', '_'}.#{pair.last}".gsub(/[^A-Za-z0-9.()\[\]{}-]/, '_')
87
+ else
88
+ filename = combined_map[pair.last] or raise NotImplementedError
89
+ filename = (message.path.tr(' /', '_.').gsub(/[^A-Za-z0-9.()\[\]{}-]/, '_') + '.' + File.extname(filename)).squeeze('.')
90
+ end
91
+ dir = File.dirname(message.instance_variable_get(:@desc).pst.io.path)
92
+ filename = File.join dir, filename
93
+ else
94
+ raise
95
+ end
96
+
97
+ if dir = opts[:output_dir]
98
+ filename = File.join dir, File.basename(filename)
99
+ end
100
+
101
+ filename = make_unique filename
102
+
103
+ write_message = proc do |f|
104
+ data = message.send(pair.first).to_s
105
+ if !opts[:individual] and pair.last == 'eml'
106
+ # we do the append > style mbox quoting (mboxrd i think it's called), as it
107
+ # is the only one that can be robustly un-quoted. evolution doesn't use this!
108
+ f.puts "From mapitool@localhost #{Time.now.rfc2822}"
109
+ #munge_headers mime, opts
110
+ data.each_line do |line|
111
+ if line =~ /^>*From /o
112
+ f.print '>' + line
113
+ else
114
+ f.print line
115
+ end
116
+ end
117
+ else
118
+ f.write data
119
+ end
120
+ end
121
+
122
+ if opts[:stdout]
123
+ write_message[STDOUT]
124
+ else
125
+ open filename, 'a', &write_message
126
+ end
127
+ end
128
+
129
+ def munge_headers mime, opts
130
+ opts[:header_defaults].each do |s|
131
+ key, val = s.match(/(.*?):\s+(.*)/)[1..-1]
132
+ mime.headers[key] = [val] if mime.headers[key].empty?
133
+ end
134
+ end
135
+ end
136
+
137
+ def mapitool
138
+ opts = {:verbose => false, :action => :convert, :header_defaults => []}
139
+ op = OptionParser.new do |op|
140
+ op.banner = "Usage: mapitool [options] [files]"
141
+ #op.separator ''
142
+ #op.on('-c', '--convert', 'Convert input files (default)') { opts[:action] = :convert }
143
+ op.separator ''
144
+ op.on('-o', '--output-dir DIR', 'Put all output files in DIR') { |d| opts[:output_dir] = d }
145
+ op.on('-i', '--[no-]individual', 'Do not combine converted files') { |i| opts[:individual] = i }
146
+ op.on('-s', '--stdout', 'Write all data to stdout') { opts[:stdout] = true }
147
+ op.on('-f', '--filter-path PATH', 'Only process pst items in PATH') { |path| opts[:filter_path] = path }
148
+ op.on( '--enable-pst', 'Turn on experimental PST support') { opts[:enable_pst] = true }
149
+ #op.on('-d', '--header-default STR', 'Provide a default value for top level mail header') { |hd| opts[:header_defaults] << hd }
150
+ # --enable-pst
151
+ op.separator ''
152
+ op.on('-v', '--[no-]verbose', 'Run verbosely') { |v| opts[:verbose] = v }
153
+ op.on_tail('-h', '--help', 'Show this message') { puts op; exit }
154
+ end
155
+
156
+ files = op.parse ARGV
157
+
158
+ # for windows. see issue #2
159
+ STDOUT.binmode
160
+
161
+ Mapi::Log.level = Ole::Log.level = opts[:verbose] ? Logger::WARN : Logger::FATAL
162
+
163
+ tool = begin
164
+ Mapitool.new(files, opts)
165
+ rescue ArgumentError
166
+ puts $!
167
+ puts op
168
+ exit 1
169
+ end
170
+
171
+ tool.run
172
+ end
173
+
174
+ mapitool
175
+
176
+ __END__
177
+
178
+ mapitool [options] [files]
179
+
180
+ files is a list of *.msg & *.pst files.
181
+
182
+ one of the options should be some sort of path filter to apply to pst items.
183
+
184
+ --filter-path=
185
+ --filter-type=eml,vcf
186
+
187
+ with that out of the way, the entire list of files can be converted into a
188
+ list of items (with meta data about the source).
189
+
190
+ --convert
191
+ --[no-]separate one output file per item or combined output
192
+ --stdout
193
+ --output-dir=.
194
+
195
+
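
The combined-output branch of process_message in the new tool escapes body lines that begin with "From " (optionally preceded by ">" characters) so that they are not mistaken for mbox message separators. A minimal standalone sketch of that mboxrd-style quoting, together with the reverse step, might look like the following. The helper names mboxrd_quote and mboxrd_unquote are hypothetical and are not part of ruby-msg.

```ruby
# Hypothetical helpers illustrating mboxrd quoting; not part of ruby-msg.

# Prefix every line that starts with zero or more '>' followed by "From "
# with one extra '>'.
def mboxrd_quote(data)
  data.each_line.map { |line| line =~ /\A>*From / ? ">#{line}" : line }.join
end

# Reverse step: strip exactly one '>' from lines that start with one or
# more '>' followed by "From ".
def mboxrd_unquote(data)
  data.each_line.map { |line| line =~ /\A>+From / ? line[1..-1] : line }.join
end

body = "Hello\nFrom here on\n>From quoted\n"
quoted = mboxrd_quote(body)
puts quoted
# Hello
# >From here on
# >>From quoted
```

Because every matching line gains exactly one ">" and loses exactly one on the way back, unquoting the quoted text returns the original body unchanged, which is the property the code comment in process_message refers to.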