sup 0.0.7 → 0.0.8


@@ -0,0 +1,31 @@
+ #!/usr/bin/env ruby
+
+ require 'rubygems'
+ require 'trollop'
+ require "sup"
+
+ $opts = Trollop::options do
+   version "sup-dump (sup #{Redwood::VERSION})"
+   banner <<EOS
+ Dumps all message state from the sup index to standard out. You can
+ later use sup-sync --restored --restore <filename> to recover the index.
+
+ This tool is primarily useful in the event that a Ferret upgrade breaks
+ the index format. This happened, for example, at Ferret version 0.11.
+
+ Usage:
+   sup-dump > <filename>
+   sup-dump | bzip2 > <filename> # even better
+
+ No options.
+ EOS
+ end
+
+ index = Redwood::Index.new
+ index.load
+
+ (1 ... index.index.reader.max_doc).each do |i|
+   next if index.index.deleted? i
+   d = index.index[i]
+   puts [d[:message_id], "(" + d[:label] + ")"] * " "
+ end
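
Reviewer note: the dump format written by sup-dump and read back by sup-sync is one line per message, of the form `<message_id> (<label> <label> ...)`. The round trip can be sketched in isolation as below. This is only an illustration, not part of the release: `format_dump_line` and `parse_dump_line` are hypothetical helper names, the sample message id is made up, and the formatter takes an array of labels where sup-dump itself uses the label string stored in the index.

```ruby
# Hypothetical helpers illustrating the dump line format; not part of sup.

def format_dump_line(message_id, labels)
  # Same shape as sup-dump's output: the id, then the labels,
  # space-separated and wrapped in parentheses.
  [message_id, "(" + labels.join(" ") + ")"] * " "
end

def parse_dump_line(line)
  # Same pattern sup-sync applies to each line of the dump file.
  line =~ /^(\S+) \((.*?)\)$/ or raise "Can't read dump line: #{line.inspect}"
  mid, labels = $1, $2
  [mid, labels.split(" ").map { |x| x.intern }]
end

line = format_dump_line("abc123@example.com", [:inbox, :unread])
mid, labels = parse_dump_line(line)

puts line            # abc123@example.com (inbox unread)
puts labels.inspect  # [:inbox, :unread]
```

Because the message id is matched with `\S+`, the format relies on message ids containing no whitespace.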
@@ -0,0 +1,235 @@
+ #!/usr/bin/env ruby
+
+ require 'uri'
+ require 'rubygems'
+ require 'trollop'
+ require "sup"
+
+ class Float
+   def to_s; sprintf '%.2f', self; end
+ end
+
+ class Numeric
+   def to_time_s
+     i = to_i
+     sprintf "%d:%02d:%02d", i / 3600, (i / 60) % 60, i % 60
+   end
+ end
+
+ def time
+   startt = Time.now
+   yield
+   Time.now - startt
+ end
+
+ opts = Trollop::options do
+   version "sup-sync (sup #{Redwood::VERSION})"
+   banner <<EOS
+ Synchronizes the Sup index with one or more message sources by adding
+ messages, deleting messages, or changing message state in the index as
+ appropriate.
+
+ "Message state" means read/unread, archived/inbox, starred/unstarred,
+ and all user-defined labels on each message.
+
+ "Default source state" refers to any state that a source itself
+ keeps about a message. Sup-sync uses this information when adding a
+ new message to the index. The source state is typically limited to
+ read/unread, archived/inbox status and a single label based on the
+ source name. Messages using the default source state are placed in
+ the inbox (i.e. not archived) and unstarred.
+
+ Usage:
+   sup-sync [options] <source>*
+
+ where <source>* is zero or more source URIs. If no sources are given,
+ sync from all usual sources. All supported source URI schemes can
+ be seen by running "sup-add --help".
+
+ Options controlling WHICH messages sup-sync operates on:
+ EOS
+   opt :new, "Operate on new messages only. Don't scan over the entire source. (Default.)", :short => :none
+   opt :changed, "Scan over the entire source for messages that have been deleted, altered, or moved from another source. (In the case of mbox sources, this includes all messages AFTER an altered message.)"
+   opt :restored, "Operate only on those messages included in a dump file as specified by --restore which have changed state."
+   opt :all, "Operate on all messages in the source, regardless of newness or changedness."
+   opt :start_at, "For --changed and --all, start at a particular offset.", :type => :int
+
+   text <<EOS
+
+ Options controlling HOW message state is altered:
+ EOS
+   opt :asis, "If the message is already in the index, preserve its state. Otherwise, use default source state. (Default.)", :short => :none
+   opt :restore, "Restore message state from a dump file created with sup-dump. If a message is not in this dumpfile, act as --asis.", :type => String, :short => :none
+   opt :discard, "Discard any message state in the index and use the default source state. Dangerous!", :short => :none
+   opt :archive, "When using the default source state, mark messages as archived.", :short => "-x"
+   opt :read, "When using the default source state, mark messages as read."
+   opt :extra_labels, "When using the default source state, also apply these user-defined labels. Should be a comma-separated list.", :type => String, :short => :none
+
+   text <<EOS
+
+ Other options:
+ EOS
+   opt :verbose, "Print message ids as they're processed."
+   opt :optimize, "As the final operation, optimize the index."
+   opt :all_sources, "Scan over all sources.", :short => :none
+   opt :dry_run, "Don't actually modify the index. Probably only useful with --verbose.", :short => "-n"
+   opt :version, "Show version information", :short => :none
+
+   conflicts :changed, :all, :new, :restored
+   conflicts :asis, :restore, :discard
+ end
+ Trollop::die :restored, "requires --restore" if opts[:restored] unless opts[:restore]
+ if opts[:start_at]
+   Trollop::die :start_at, "must be non-negative" if opts[:start_at] < 0
+   Trollop::die :start_at, "requires either --changed or --all" unless opts[:changed] || opts[:all]
+ end
+
+ target = [:new, :changed, :all, :restored].find { |x| opts[x] } || :new
+ op = [:asis, :restore, :discard].find { |x| opts[x] } || :asis
+
+ Redwood::start
+ index = Redwood::Index.new
+ index.load
+
+ restored_state =
+   if opts[:restore]
+     dump = {}
+     $stderr.puts "Loading state dump from #{opts[:restore]}..."
+     IO.foreach opts[:restore] do |l|
+       l =~ /^(\S+) \((.*?)\)$/ or raise "Can't read dump line: #{l.inspect}"
+       mid, labels = $1, $2
+       dump[mid] = labels.split(" ").map { |x| x.intern }
+     end
+     $stderr.puts "Read #{dump.size} entries from dump file."
+     dump
+   else
+     {}
+   end
+
+ sources = ARGV.map do |uri|
+   uri = "mbox://#{uri}" unless uri =~ %r!://!
+   index.source_for uri or Trollop::die "Unknown source: #{uri}. Did you add it with sup-add first?"
+ end
+
+ sources = index.usual_sources if sources.empty?
+ sources = index.sources if opts[:all_sources]
+
+ unless target == :new
+   if opts[:start_at]
+     sources.each { |s| s.seek_to! opts[:start_at] }
+   else
+     sources.each { |s| s.reset! }
+   end
+ end
+
+ seen = {}
+ begin
+   sources.each do |source|
+     $stderr.puts "Scanning #{source}..."
+     num_added = num_updated = num_scanned = num_restored = 0
+     last_info_time = start_time = Time.now
+
+     Redwood::PollManager.add_messages_from source do |m, offset, entry|
+       num_scanned += 1
+       seen[m.id] = true
+
+       ## skip if we're operating only on changed messages, the message
+       ## is in the index, and it's unchanged from what the source is
+       ## reporting.
+       next if target == :changed && entry && entry[:source_id].to_i == source.id && entry[:source_info].to_i == offset
+
+       ## get the state currently in the index
+       index_state =
+         if entry
+           entry[:label].split(/\s+/).map { |x| x.intern }
+         else
+           nil
+         end
+
+       ## skip if we're operating on restored messages, and this one
+       ## ain't.
+       next if target == :restored && (!restored_state[m.id] || restored_state[m.id].sort_by { |s| s.to_s } == index_state.sort_by { |s| s.to_s })
+
+       ## m.labels is the default source labels. tweak these according
+       ## to default source state modification flags.
+       m.labels -= [:inbox] if opts[:archive]
+       m.labels -= [:unread] if opts[:read]
+       m.labels += opts[:extra_labels].split(/\s*,\s*/).map { |x| x.intern } if opts[:extra_labels]
+
+       ## assign message labels based on the operation we're performing
+       case op
+       when :asis
+         m.labels = index_state if index_state
+       when :restore
+         ## if the entry exists on disk
+         if restored_state[m.id]
+           m.labels = restored_state[m.id]
+           num_restored += 1
+         elsif index_state
+           m.labels = index_state
+         end
+       when :discard
+         ## nothin! use default source labels
+       end
+
+       if Time.now - last_info_time > 60
+         last_info_time = Time.now
+         elapsed = last_info_time - start_time
+         pctdone = source.respond_to?(:pct_done) ? source.pct_done : 100.0 * (source.cur_offset.to_f - source.start_offset).to_f / (source.end_offset - source.start_offset).to_f
+         remaining = (100.0 - pctdone) * (elapsed.to_f / pctdone)
+         $stderr.puts "## #{num_added + num_updated} (#{pctdone}% done) read; #{elapsed.to_time_s} elapsed; est. #{remaining.to_time_s} remaining (for this source)"
+       end
+
+       if index_state.nil?
+         puts "Adding message #{source}##{offset} with state {#{m.labels * ', '}}" if opts[:verbose]
+         num_added += 1
+       else
+         puts "Updating message #{source}##{offset}, source #{entry[:source_id]} => #{source.id}, offset #{entry[:source_info]} => #{offset}, state {#{index_state * ', '}} => {#{m.labels * ', '}}" if opts[:verbose]
+         num_updated += 1
+       end
+
+       opts[:dry_run] ? nil : m
+     end
+     $stderr.puts "Scanned #{num_scanned}, added #{num_added}, updated #{num_updated} messages from #{source}."
+     $stderr.puts "Restored state on #{num_restored} (#{100.0 * num_restored / num_scanned}%) messages." if num_restored > 0
+   end
+ rescue Exception => e
+   File.open("sup-exception-log.txt", "w") { |f| f.puts e.backtrace }
+   raise
+ ensure
+   index.save
+   Redwood::finish
+ end
+
+ ## delete any messages in the index that claim they're from one of
+ ## these sources, but that we didn't see.
+ ##
+ ## kinda crappy code here, because we delve directly into the Ferret
+ ## API.
+ ##
+ ## TODO: move this to Index, i suppose.
+ if target == :all || target == :changed
+   $stderr.puts "Deleting missing messages from the index..."
+   num_del, num_scanned = 0, 0
+   sources.each do |source|
+     raise "no source id for #{source}" unless source.id
+     q = "+source_id:#{source.id}"
+     q += " +source_info: >= #{opts[:start_at]}" if opts[:start_at]
+     index.index.search_each(q, :limit => :all) do |docid, score|
+       num_scanned += 1
+       mid = index.index[docid][:message_id]
+       unless seen[mid]
+         puts "Deleting #{mid}" if opts[:verbose]
+         index.index.delete docid unless opts[:dry_run]
+         num_del += 1
+       end
+     end
+   end
+   $stderr.puts "Deleted #{num_del} / #{num_scanned} messages"
+ end
+
+ if opts[:optimize]
+   $stderr.puts "Optimizing index..."
+   optt = time { index.index.optimize unless opts[:dry_run] }
+   $stderr.puts "Optimized index of size #{index.size} in #{optt}s."
+ end
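
Reviewer note: the progress line in the sync loop extrapolates the remaining time linearly from the elapsed time and the percentage done, then formats it as h:mm:ss. A standalone sketch of that arithmetic follows. The function names are hypothetical (sup-sync defines `to_time_s` on `Numeric` rather than as a free function), and the guard against a zero percentage is an addition of this sketch, not something the released code does.

```ruby
# Hypothetical standalone versions of the arithmetic used in the
# progress report; not the released code.

def to_time_s(seconds)
  # Same formatting as Numeric#to_time_s in sup-sync: h:mm:ss.
  i = seconds.to_i
  sprintf "%d:%02d:%02d", i / 3600, (i / 60) % 60, i % 60
end

def estimate_remaining(elapsed, pct_done)
  # Assumption of this sketch: return nil when no progress has been
  # made yet, since dividing by a zero percentage gives no usable estimate.
  return nil if pct_done <= 0
  (100.0 - pct_done) * (elapsed.to_f / pct_done)
end

puts to_time_s(3725)            # 1:02:05

remaining = estimate_remaining(100, 25.0)
puts to_time_s(remaining)       # 0:05:00
```

With 25% done after 100 seconds, each percentage point has taken 4 seconds, so the remaining 75 points are estimated at 300 seconds.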
@@ -8,11 +8,9 @@ Q: If you love GMail so much, why not just use it?
  A: I hate ads, I hate using a mouse, and I hate non-programmability
  and non-extensibility.

- Also, GMail encourages top-posting in a variety of ways. THIS
- CANNOT BE TOLERATED!
+ Also, GMail encourages top-posting. THIS CANNOT BE TOLERATED!

  Q: Why the console?
-
  A: Because a keystroke is worth a hundred mouse clicks (as any Unix
  user knows). Because you don't need a web browser. Because you get
  instantaneous response and a simple interface.
@@ -23,33 +21,71 @@ A: You can manually mark messages as spam, which prevents them from
  filtering should be done by a dedicated tool like SpamAssassin.

  Q: How do I delete a message?
- A: Press the 'd' key.
+ A: Why delete? Unless it's spam, you might as well just archive it.
+
+ Q: C'mon, really now!
+ A: Ok, press the 'd' key.

  Q: But I want to delete it for real, not just add a 'deleted' flag in
  the index. I want it gone from disk!
- A: Deleting a message is an old-fashioned concept. In the modern
- world, disk space is cheap enough that you should never have to
- delete a message. If it's spam, save it for future analysis.
-
- Q: C'mon, really now!
  A: Ok, at some point I plan to have a batch deletion tool that will
  run through a source and delete all messages that have a 'spam' or
- 'deleted' tags (and, for mbox sources, will update the offsets of
- all later messages). But that doesn't exist yet.
+ 'deleted' tag. But that doesn't exist yet.

- Q: I got some error message about needing to run sup-import --rescan
+ Q: I got some error message about needing to run sup-sync --changed
  when I tried to read a message. What's that about?
  A: If messages have been moved, deleted, or altered in a source, Sup
  may have to rebuild its index for that source. For example, for
- mbox files, even reading a message changes the offsets of every
- file on disk. Rather than rescanning every time, Sup assumes
+ mbox files, reading a single unread message changes the offsets of
+ every file on disk. Rather than rescanning every time, Sup assumes
  sources don't change except by having new messages added. If that
- assumption is violated, you'll have to run sup-import --rescan.
+ assumption is violated, you'll have to sync the index.

  The alternative is to rescan every source when Sup starts
  up. Because Sup is designed to work with arbitrarily large mbox
  files, this would not be a good idea.

+ Q: How do I back up my index?
+ Q: How do I make a state dump?
+ A: Since the contents of the messages are recoverable from their
+ sources using sup-sync, all you need to back up is the message
+ state. To do this, simply run:
+ sup-dump > <dumpfile>
+ This will save all message state in a big text file, which you
+ should probably compress.
+
+ Q: How do I restore the message state I saved in my state dump?
+ A: Run:
+ sup-sync [<source>+] --restored --restore <dumpfile>
+ where <dumpfile> was created as above.
+
+ Q: I see this message from Ferret:
+ Error occured in index.c:825 - sis_find_segments_file
+ A: Yikes! You've upgraded Ferret and the index format changed beneath
+ you. Follow the index rebuild instructions below.
+
+ Q: I upgraded Ferret and the index format changed. I need to
+ completely rebuild my index. How do I do this?
+ A: First, you'll need a complete state dump. If you haven't made
+ one, you'll need to downgrade Ferret and make a state dump as
+ above. Then run these commands:
+ rm -rf ~/.sup/ferret # omg wtf
+ sup-sync --all-sources --all --restore <dumpfile>
+ Voila! A brand new index.
+
+ Q: I want to move messages from one source to another. (E.g., my
+ primary inbox is an IMAP server with a quota, and I want to move
+ some of those messages to local mbox files.) How do I do that while
+ preserving message state?
+ A: Move the messages from the source to the target using whatever tool
+ you'd like. Then (and this is the important part), run:
+ sup-sync --changed <source1> <source2>
+
+ If you sup-sync only one source at a time, depending on the order,
+ the messages may be treated as missing and then deleted from the
+ index, which means that their state will be lost when you sync the
+ other source.
+
  Q: What are all these "Redwood" references I see in the code?
  A: That was Sup's original name. (Think pine, elm. Although I am a
  Mutt user, I couldn't think of a good progression there.) But it was
@@ -59,18 +95,9 @@ A: That was Sup's original name. (Think pine, elm. Although I am a
  Maybe one day I'll do a huge search-and-replace on the code, but it
  doesn't seem that important at this point.

- Q: I want to move messages from one source to another. (E.g., my
- primary inbox is an IMAP server with a quota, and I want to move
- some of those messages to local mbox files.) How do I do that while
- preserving message state?
- A: Move the messages from the source to the target using whatever tool
- you'd like. Then (and this is the important part), sup-import
- --rebuild both sources at once. If you do it one at a time, you may
- lose message state. (Depending, actually, on which order you do it
- in. But just do them both at once.)
-
  Q: How is Sup possible?
  A: Sup is only possible through the hard work of Dave Balmain, the
  author of ferret, which is the search engine behind Sup. Ferret is
  really a first-class piece of software, and it's due to the
  tremendous amount of time and effort he's put in to it.
+
@@ -1,24 +1,24 @@
- Should an email client have a philosophy? I think so. For many people,
- email is one of our primary means of communication. Something so
- important ought to warrant a little thought.
+ Should an email client have a philosophy? For many people, email is
+ one of our primary means of communication. Something so important
+ ought to warrant a little thought.

- So here's Sup's philosophy.
+ Here's Sup's philosophy.

  Using "traditional" email clients today is increasingly problematic.
  Anyone who's on a high-traffic mailing list knows this. My ruby-talk
- folder is 350 megs and Mutt sits there for 60 seconds while it opens
+ folder is 430 megs and Mutt sits there for 60 seconds while it opens
  it. Keeping up with all the new traffic is painful, even with
  Mutt's excellent threading features, simply because there's so much of
  it. A single thread can span several pages in the folder index view
  alone! And Mutt is probably the fastest email client out there, and
- the most featureful and in terms of threading and mailing list
+ certainly the most featureful in terms of threading and mailing list
  support. God help me if I try and throw Thunderbird at that.

  The principal problem with traditional clients is that they deal with
  individual pieces of email. This places a high mental cost on the user
  for each incoming email, by forcing them to ask: Should I keep this
- email, or delete it? If I keep it, where should I file it?
- For example, I've spent the last 10 years of my life laboriously
+ email, or delete it? If I keep it, where should I file it? For
+ example, I've spent the last 10 years of my life laboriously
  hand-filing every email message I received and feeling a mild sense of
  panic every time an email was both "from Mom" and "about school". The
  massive amounts of email that many people receive, and the cheap cost
@@ -26,14 +26,14 @@ of storage, have made these questions both more costly and less useful
  to answer.

  I think GMail has taken the right approach. As a long-time Mutt user,
- I was blown away when I first saw people use GMail, because I saw them
- treat their email differently from how I had ever treated mine. I saw
+ I was pretty much blown away when I first saw people use GMail,
+ because I saw them treat email differently from how I ever had. I saw
  that making certain operations quantitatively easier (namely, search)
- resulted in a qualitative difference in usage. Uou didn't have to
+ resulted in a qualitative difference in usage. You didn't have to
  worry about filing things into folders correctly, because you could
- just find things later by searching for them. I also saw that
- thread-centrism had many advantages over message-centrism when message
- volume was high.
+ just find things later by searching for them. I also saw how
+ thread-centrism was advantageous over message-centrism when message
+ volume was high: if nothing else, there's simply less of them.

  Much of the inspiration for Sup was based on GMail. I think it's to
  the GMail designers' credit that they started with a somewhat ad-hoc
@@ -43,11 +43,11 @@ actually better than everything else out there. At least, that's how I
  imagine it happened. Maybe they knew what they were doing from the
  start.

- But ultimately, GMail wasn't right for me, which is why the idea for
- Sup was born.
+ But ultimately, GMail wasn't right for me (fuck top posting and HTML
+ mail), which is why the idea for Sup was born.

- Sup is based on the following principles, which I more or less stole
- directly from GMail:
+ Sup is based on the following principles, which I stole directly from
+ GMail:

  - An immediately accessible and fast search capability over the
  entire email archive eliminates most of the need for folders,
@@ -65,4 +65,4 @@ do with the fantastic productivity of a console- and keyboard-based
  application, the usefulness of multiple buffers, the necessity of
  handling multiple email accounts, etc. But those are just details!

- So give it a go and let me know what you think.
+ Let me know what you think.