s3sync 0.3.4 → 1.2.5

@@ -0,0 +1,175 @@
1
+ Change Log:
2
+ -----------
3
+
4
+ 2006-09-29:
5
+ Added support for --expires and --cache-control. Eg:
6
+ --expires="Thu, 01 Dec 2007 16:00:00 GMT"
7
+ --cache-control="no-cache"
8
+
9
+ Thanks to Charles for pointing out the need for this, and supplying a patch
10
+ proving that it would be trivial to add =) Apologies for not including the short
11
+ form (-e) for the expires. I have a rule that options taking arguments should
12
+ use the long form.
13
+ ----------
14
+
15
+ 2006-10-04
16
+ Several minor debugs and edge cases.
17
+ Fixed a bug where retries didn't rewind the stream to start over.
18
+ ----------
19
+
20
+ 2006-10-12
21
+ Version 1.0.5
22
+ Finally figured out and fixed bug of trying to follow local symlink-to-directory.
23
+ Fixed a really nasty sorting discrepancy that caused problems when files started
24
+ with the same name as a directory.
25
+ Retry on connection-reset on the S3 side.
26
+ Skip files that we can't read instead of dying.
27
+ ----------
28
+
29
+ 2006-10-12
30
+ Version 1.0.6
31
+ Some GC voodoo to try and keep a handle on the memory footprint a little better.
32
+ There is still room for improvement here.
33
+ ----------
34
+
35
+ 2006-10-13
36
+ Version 1.0.7
37
+ Fixed symlink dirs being stored to S3 as real dirs (and failing with 400)
38
+ Added a retry catch for connection timeout error.
39
+ (Hopefully) caught a bug that expected every S3 listing to contain results
40
+ ----------
41
+
42
+ 2006-10-14
43
+ Version 1.0.8
44
+ Was testing for file? before symlink? in localnode.stream. This meant that for
45
+ symlink files it was trying to shove the real file contents into the symlink
46
+ body on s3.
47
+ ----------
48
+
49
+ 2006-10-14
50
+ Version 1.0.9
51
+ Woops, I was using "max-entries" for some reason but the proper parameter is
52
+ "max-keys". Not a big deal.
53
+ Broke out the S3try stuff into a separate file so I could re-use it for s3cmd.rb
54
+ ----------
55
+
56
+ 2006-10-16
57
+ Added a couple debug lines; not even enough to call it a version revision.
58
+ ----------
59
+
60
+ 2006-10-25
61
+ Version 1.0.10
62
+ UTF-8 fixes.
63
+ Catching a couple more retry-able errors in s3try (instead of aborting the
64
+ program).
65
+ ----------
66
+
67
+ 2006-10-26
68
+ Version 1.0.11
69
+ Revamped some details of the generators and comparator so that directories are
70
+ handled in a more exact and uniform fashion across local and S3.
71
+ ----------
72
+
73
+ 2006-11-28
74
+ Version 1.0.12
75
+ Added a couple more error catches to s3try.
76
+ ----------
77
+
78
+ 2007-01-08
79
+ Version 1.0.13
80
+ Numerous small changes to slash and path handling, in order to catch several
81
+ cases where "root" directory nodes were not being created on S3.
82
+ This makes restores work a lot more intuitively in many cases.
83
+ ----------
84
+
85
+ 2007-01-25
86
+ Version 1.0.14
87
+ Peter Fales' marker fix.
88
+ Also, markers should be decoded into native charset (because that's what s3
89
+ expects to see).
90
+ ----------
91
+
92
+ 2007-02-19
93
+ Version 1.1.0
94
+ *WARNING* Lots of path-handling changes. *PLEASE* test safely before you just
95
+ swap this in for your working 1.0.x version.
96
+
97
+ - Adding --exclude (and there was much rejoicing).
98
+ - Found Yet Another Leading Slash Bug with respect to local nodes. It was always
99
+ "recursing" into the first folder even if there was no trailing slash and -r
100
+ wasn't specified. What it should have done in this case is simply create a node
101
+ for the directory itself, then stop (not check the dir's contents).
102
+ - Local node canonicalization was (potentially) stripping the trailing slash,
103
+ which we need in order to make some decisions in the local generator.
104
+ - Fixed problem where it would prepend a "/" to s3 key names even with blank
105
+ prefix.
106
+ - Fixed S3->local when there's no "/" in the source so it doesn't try to create
107
+ a folder with the bucket name.
108
+ - Updated s3try and s3_s3sync_mod to allow SSL_CERT_FILE
109
+ ----------
110
+
111
+ 2007-02-22
112
+ Version 1.1.1
113
+ Fixed dumb regression bug caused by the S3->local bucket name fix in 1.1.0
114
+ ----------
115
+
116
+ 2007-02-25
117
+ Version 1.1.2
118
+ Added --progress
119
+ ----------
120
+
121
+ 2007-06-02
122
+ Version 1.1.3
123
+ IMPORTANT!
124
+ Pursuant to http://s3sync.net/forum/index.php?topic=49.0 , the tar.gz now
125
+ expands into its own sub-directory named "s3sync" instead of dumping all the
126
+ files into the current directory.
127
+
128
+ In the case of commands of the form:
129
+ s3sync -r somedir somebucket:
130
+ The root directory node in s3 was being stored as "somedir/" instead of "somedir"
131
+ which caused restores to mess up when you say:
132
+ s3sync -r somebucket: restoredir
133
+ The fix to this, by coincidence, actually makes s3fox work even *less* well with
134
+ s3sync. I really need to build my own xul+javascript s3 GUI some day.
135
+
136
+ Also fixed some of the NoMethodError stuff for when --progress is used
137
+ and caught Errno::ETIMEDOUT
138
+ ----------
139
+
140
+ 2007-07-12
141
+ Version 1.1.4
142
+ Added Alastair Brunton's yaml config code.
143
+ ----------
144
+
145
+ 2007-11-17
146
+ Version 1.2.1
147
+ Compatibility for S3 API revisions.
148
+ When retries are exhausted, emit an error.
149
+ Don't ever try to delete the 'root' local dir.
150
+ ----------
151
+
152
+ 2007-11-20
153
+ Version 1.2.2
154
+ Handle EU bucket 307 redirects (in s3try.rb)
155
+ --make-dirs added
156
+ ----------
157
+
158
+ 2007-11-20
159
+ Version 1.2.3
160
+ Fix SSL verification settings that broke in new S3 API.
161
+ ----------
162
+
163
+ 2008-01-06
164
+ Version 1.2.4
165
+ Run from any dir (search "here" for includes).
166
+ Search out s3config.yml in some likely places.
167
+ Reset connection (properly) on retry-able non-50x errors.
168
+ Fix calling format bug preventing it from working from yml.
169
+ Added http proxy support.
170
+ ----------
171
+
172
+ 2008-05-11
173
+ Version 1.2.5
174
+ Added option --no-md5
175
+ ----------
data/README ADDED
@@ -0,0 +1,401 @@
1
+ Welcome to s3sync.rb
2
+ --------------------
3
+ Home page, wiki, forum, bug reports, etc: http://s3sync.net
4
+
5
+ This is a ruby program that easily transfers directories between a local
6
+ directory and an S3 bucket:prefix. It behaves somewhat, but not precisely, like
7
+ the rsync program. In particular, it shares rsync's peculiar behavior that
8
+ trailing slashes on the source side are meaningful. See examples below.
9
+
10
+ One benefit over some other comparable tools is that s3sync goes out of its way
11
+ to mirror the directory structure on S3. Meaning you don't *need* to use s3sync
12
+ later in order to view your files on S3. You can just as easily use an S3
13
+ shell, a web browser (if you used the --public-read option), etc. Note that
14
+ s3sync is NOT necessarily going to be able to read files you uploaded via some
15
+ other tool. This includes things uploaded with the old perl version! For best
16
+ results, start fresh!
17
+
18
+ s3sync runs happily on linux, probably other *ix, and also Windows (except that
19
+ symlinks and permissions management features don't do anything on Windows). If
20
+ you get it running somewhere interesting, let me know (see below).
21
+
22
+ s3sync is free, and license terms are included in all the source files. If you
23
+ decide to make it better, or find bugs, please let me know.
24
+
25
+ The original inspiration for this tool is the perl script by the same name which
26
+ was made by Thorsten von Eicken (and later updated by me). This ruby program
27
+ does not share any components or logic from that utility; the only relation is
28
+ that it performs a similar task.
29
+
30
+
31
+ Examples:
32
+ ---------
33
+ (using S3 bucket 'mybucket' and prefix 'pre')
34
+ Put the local etc directory itself into S3
35
+ s3sync.rb -r /etc mybucket:pre
36
+ (This will yield S3 keys named pre/etc/...)
37
+ Put the contents of the local /etc dir into S3, rename dir:
38
+ s3sync.rb -r /etc/ mybucket:pre/etcbackup
39
+ (This will yield S3 keys named pre/etcbackup/...)
40
+ Put contents of S3 "directory" etc into local dir
41
+ s3sync.rb -r mybucket:pre/etc/ /root/etcrestore
42
+ (This will yield local files at /root/etcrestore/...)
43
+ Put the contents of S3 "directory" etc into a local dir named etc
44
+ s3sync.rb -r mybucket:pre/etc /root
45
+ (This will yield local files at /root/etc/...)
46
+ Put S3 nodes under the key pre/etc/ to the local dir etcrestore
47
+ **and create local dirs even if S3 side lacks dir nodes**
48
+ s3sync.rb -r --make-dirs mybucket:pre/etc/ /root/etcrestore
49
+ (This will yield local files at /root/etcrestore/...)
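
The trailing-slash rule in the examples above can be sketched roughly as follows. This is only an illustration of the rsync-like convention, not s3sync's actual code; the helper name dest_path and its arguments are hypothetical.

```ruby
# Hypothetical helper (not part of s3sync): given a source path, a destination
# prefix, and an item's path relative to the source, compute where it lands.
def dest_path(source, dest_prefix, relative)
  parts = [dest_prefix]
  # No trailing slash on the source: the directory itself is transferred,
  # so its own name becomes part of the destination path.
  parts << File.basename(source) unless source.end_with?('/')
  parts << relative
  # Drop empty pieces (e.g. a blank prefix) so no stray leading "/" appears.
  parts.reject(&:empty?).join('/')
end

puts dest_path('/etc',  'pre',           'hosts')
puts dest_path('/etc/', 'pre/etcbackup', 'hosts')
```

The first call keeps the "etc" component because the source has no trailing slash; the second drops it, which is what lets you rename the directory on the destination side.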
50
+
51
+
52
+ Prerequisites:
53
+ --------------
54
+ You need a functioning Ruby (>=1.8.4) installation, as well as the OpenSSL ruby
55
+ library (which may or may not come with your ruby).
56
+
57
+ How you get these items working on your system is really not any of my
58
+ business, but you might find the following things helpful. If you're using
59
+ Windows, the ruby site has a useful "one click installer" (although it takes
60
+ more clicks than that, really). On debian (and ubuntu, and other debian-like
61
+ things), there are apt packages available for ruby and the OpenSSL lib.
62
+
63
+
64
+ Your environment:
65
+ -----------------
66
+ s3sync needs to know several interesting values to work right. It looks for
67
+ them in the following environment variables -or- a s3config.yml file.
68
+ In the yml case, the names need to be lowercase (see example file).
69
+ Furthermore, the yml is searched for in the following locations, in order:
70
+ $S3CONF/s3config.yml
71
+ $HOME/.s3conf/s3config.yml
72
+ /etc/s3conf/s3config.yml
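
A minimal sketch of that search order, for illustration. The function names are hypothetical, and upper-casing the yml names to match the environment variable names is an assumption made here, not something the README confirms about s3sync's internals.

```ruby
require 'yaml'

# Candidate locations in the order listed above. Entries whose environment
# variable is unset are dropped.
def config_candidates(env = ENV)
  [
    env['S3CONF'] && File.join(env['S3CONF'], 's3config.yml'),
    env['HOME']   && File.join(env['HOME'], '.s3conf', 's3config.yml'),
    '/etc/s3conf/s3config.yml'
  ].compact
end

# Load the first file that exists; return {} when none is found.
# Assumption: lowercase yml names correspond to the upper-case variable names.
def load_s3_config(env = ENV)
  path = config_candidates(env).find { |p| File.exist?(p) }
  return {} unless path
  (YAML.load_file(path) || {}).each_with_object({}) do |(name, value), out|
    out[name.to_s.upcase] = value.to_s
  end
end
```

Passing the environment in as an argument keeps the lookup easy to try out with a plain Hash instead of the real ENV.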
73
+
74
+ Required:
75
+ AWS_ACCESS_KEY_ID
76
+ AWS_SECRET_ACCESS_KEY
77
+
78
+ If you don't know what these are, then s3sync is probably not the
79
+ right tool for you to be starting out with.
80
+ Optional:
81
+ AWS_S3_HOST - I don't see why the default would ever be wrong
82
+ HTTP_PROXY_HOST,HTTP_PROXY_PORT,HTTP_PROXY_USER,HTTP_PROXY_PASSWORD - proxy
83
+ SSL_CERT_DIR - Where your Cert Authority keys live; for verification
84
+ SSL_CERT_FILE - If you have just one PEM file for CA verification
85
+ S3SYNC_RETRIES - How many HTTP errors to tolerate before exiting
86
+ S3SYNC_WAITONERROR - How many seconds to wait after an http error
87
+ S3SYNC_MIME_TYPES_FILE - Where is your mime.types file
88
+ S3SYNC_NATIVE_CHARSET - For example Windows-1252. Defaults to ISO-8859-1.
89
+ AWS_CALLING_FORMAT - Defaults to REGULAR
90
+ REGULAR # http://s3.amazonaws.com/bucket/key
91
+ SUBDOMAIN # http://bucket.s3.amazonaws.com/key
92
+ VANITY # http://<vanity_domain>/key
93
+
94
+ Important: For EU-located buckets you should set the calling format to SUBDOMAIN
95
+ Important: For US buckets with CAPS or other weird traits set the calling format
96
+ to REGULAR
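
To make the three calling formats concrete, here is a small sketch that builds the URL shapes listed above. The function s3_url is hypothetical, and treating the bucket name as the vanity domain is an assumption for the VANITY case.

```ruby
# Illustrative only: build the request URL for each calling format.
def s3_url(format, bucket, key, host = 's3.amazonaws.com')
  case format
  when 'REGULAR'   then "http://#{host}/#{bucket}/#{key}"
  when 'SUBDOMAIN' then "http://#{bucket}.#{host}/#{key}"
  # Assumption: the bucket is named after the vanity domain that points at S3.
  when 'VANITY'    then "http://#{bucket}/#{key}"
  else raise ArgumentError, "unknown calling format: #{format}"
  end
end

puts s3_url('REGULAR',   'mybucket', 'pre/etc/hosts')
puts s3_url('SUBDOMAIN', 'mybucket', 'pre/etc/hosts')
```

SUBDOMAIN puts the bucket name into the hostname, which is why bucket names with capitals or other unusual characters are better served by REGULAR.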
97
+
98
+ I use "envdir" from the daemontools package to set up my env
99
+ variables easily: http://cr.yp.to/daemontools/envdir.html
100
+ For example:
101
+ envdir /root/s3sync/env /root/s3sync/s3sync.rb -etc etc etc
102
+ I know there are other similar tools out there as well.
103
+
104
+ You can also just call it in a shell script where you have exported the vars
105
+ first such as:
106
+ #!/bin/bash
107
+ export AWS_ACCESS_KEY_ID=valueGoesHere
108
+ ...
109
+ s3sync.rb -etc etc etc
110
+
111
+ But by far the easiest (and newest) way to set this up is to put the name:value
112
+ pairs in a file named s3config.yml and let the yaml parser pick them up. There
113
+ is an .example file shipped with the tar.gz to show what a yaml file looks like.
114
+ Thanks to Alastair Brunton for this addition.
115
+
116
+ You can also use some combination of s3config.yml and environment variables, if you
117
+ want. Go nuts.
118
+
119
+
120
+ Management tasks
121
+ ----------------
122
+ For low-level S3 operations not encapsulated by the sync paradigm, try the
123
+ companion utility s3cmd.rb. See README_s3cmd.txt.
124
+
125
+
126
+ About single files
127
+ ------------------
128
+ s3sync lacks the special case code that would be needed in order to handle a
129
+ source/dest that's a single file. This isn't one of the supported use cases so
130
+ don't expect it to work. You can use the companion utility s3cmd.rb for single
131
+ get/puts.
132
+
133
+
134
+ About Directories, the bane of any S3 sync-er
135
+ ---------------------------------------------
136
+ In S3 there's no actual concept of folders, just keys and nodes. So, every tool
137
+ uses its own proprietary way of storing dir info (my scheme being the best
138
+ naturally) and in general the methods are not compatible.
139
+
140
+ If you populate S3 by some means *other than* s3sync and then try to use s3sync
141
+ to "get" the S3 stuff to a local filesystem, you will want to use the
142
+ --make-dirs option. This causes the local dirs to be created even if there is no
143
+ s3sync-compatible directory node info stored on the S3 side. In other words,
144
+ local folders are conjured into existence whenever they are needed to make the
145
+ "get" succeed.
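
The idea behind --make-dirs can be sketched like this. It is a hypothetical helper, not s3sync's own code: parent directories are created on demand just before a downloaded item is written.

```ruby
require 'fileutils'

# Illustrative only: make sure the parent directories of a downloaded item
# exist, whether or not S3 holds a directory node for them.
def write_with_dirs(local_root, key, data)
  target = File.join(local_root, key)
  FileUtils.mkdir_p(File.dirname(target)) # no-op if the directory already exists
  File.open(target, 'wb') { |f| f.write(data) }
  target
end
```

For example, write_with_dirs('/root/etcrestore', 'ssh/sshd_config', body) would create /root/etcrestore/ssh first if it is missing.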
146
+
147
+
148
+ About MD5 hashes
149
+ ----------------
150
+ s3sync's normal operation is to compare the file size and MD5 hash of each item
151
+ to decide whether it needs syncing. On the S3 side, these hashes are stored and
152
+ returned to us as the "ETag" of each item when the bucket is listed, so it's
153
+ very easy. On the local side, the MD5 must be calculated by pushing every byte
154
+ in the file through the MD5 algorithm. This is CPU and IO intensive!
155
+
156
+ Thus you can specify the option --no-md5. This will compare the upload time on
157
+ S3 to the "last modified" time on the local item, and not do md5 calculations
158
+ locally at all. This might cause more transfers than are absolutely necessary,
159
+ for example if the file is "touched" to a newer modified date, but its contents
160
+ didn't change. Conversely if a file's contents are modified but the date is not
161
+ updated, then the sync will pass over it. Lastly, if your clock is very
162
+ different from the one on the S3 servers, then you may see unanticipated
163
+ behavior.
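
The two comparison modes described above can be sketched as follows. The function names are hypothetical and the exact rules s3sync applies are not shown in this README, so treat the timestamp comparison in particular as an assumption.

```ruby
require 'digest/md5'

# Stream the file through MD5 in 1 MiB chunks so large files are never held
# in memory all at once. This is the CPU- and IO-heavy part.
def local_md5(path)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |f|
    while (chunk = f.read(1 << 20))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Default mode: sync unless both size and hash match. S3 returns the ETag
# wrapped in double quotes, so strip them before comparing.
def needs_sync_md5?(path, remote_size, remote_etag)
  File.size(path) != remote_size || local_md5(path) != remote_etag.delete('"')
end

# --no-md5 style (assumed rule): sync when the local file is newer than the upload.
def needs_sync_mtime?(path, remote_uploaded_at)
  File.mtime(path) > remote_uploaded_at
end
```

The size check runs first, so a file whose size differs is flagged without hashing it at all.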
164
+
165
+
166
+ A word on SSL_CERT_DIR:
167
+ -----------------------
168
+ On my debian install I didn't find any root authority public keys. I installed
169
+ some by running this shell archive:
170
+ http://mirbsd.mirsolutions.de/cvs.cgi/src/etc/ssl.certs.shar
171
+ (You have to click download, and then run it wherever you want the certs to be
172
+ placed). I do not in any way assert that these certificates are good,
173
+ comprehensive, moral, noble, or otherwise correct. But I am using them.
174
+
175
+ If you don't set up a cert dir, and try to use ssl, then you'll 1) get an ugly
176
+ warning message slapped down by ruby, and 2) not have any protection AT ALL from
177
+ malicious servers posing as s3.amazonaws.com. Seriously... you want to get
178
+ this right if you're going to have any sensitive data being tossed around.
179
+ --
180
+ There is a debian package ca-certificates; this is what I'm using now.
181
+ apt-get install ca-certificates
182
+ and then use:
183
+ SSL_CERT_DIR=/etc/ssl/certs
184
+
185
+ You used to be able to use just one certificate, but recently AWS has started
186
+ using more than one CA.
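
For reference, this is roughly how a Ruby program can turn those settings into a verified HTTPS connection with the standard Net::HTTP library. It is a sketch under the assumption that the variables are read straight from the environment; verified_https is a hypothetical name, and nothing here opens a connection.

```ruby
require 'net/http'
require 'openssl'

# Illustrative only: configure (but do not open) an HTTPS connection that
# verifies the server certificate against the configured CA locations.
def verified_https(host, env = ENV)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.ca_path = env['SSL_CERT_DIR']  if env['SSL_CERT_DIR']   # directory of CA certs
  http.ca_file = env['SSL_CERT_FILE'] if env['SSL_CERT_FILE']  # single PEM bundle
  http
end
```

With VERIFY_PEER set and a CA directory such as /etc/ssl/certs, a server that cannot present a certificate chaining to a trusted CA is rejected at connect time.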
187
+
188
+
189
+ Getting started:
190
+ ----------------
191
+ Invoke by typing s3sync.rb and you should get a nice usage screen.
192
+ Options can be specified in short or long form (except --delete, which has no
193
+ short form)
194
+
195
+ ALWAYS TEST NEW COMMANDS using --dryrun(-n) if you want to see what will be
196
+ affected before actually doing it. ESPECIALLY if you use --delete. Otherwise, do
197
+ not be surprised if you misplace a '/' or two and end up deleting all your
198
+ precious, precious files.
199
+
200
+ If you use the --public-read(-p) option, items sent to S3 will be ACL'd so that
201
+ anonymous web users can download them, given the correct URL. This could be
202
+ useful if you intend to publish directories of information for others to see.
203
+ For example, I use s3sync to publish itself to its home on S3 via the following
204
+ command: "s3sync.rb -v -p publish/ ServEdge_pub:s3sync", where the files live in a
205
+ local folder called "publish" and I wish them to be copied to the URL:
206
+ http://s3.amazonaws.com/ServEdge_pub/s3sync/... If you use --ssl(-s) then your
207
+ connections with S3 will be encrypted. Otherwise your data will be sent in clear
208
+ form, i.e. easy to intercept by malicious parties.
209
+
210
+ If you want to prune items from the destination side which are not found on the
211
+ source side, you can use --delete. Always test this with -n first to make sure
212
+ the command line you specify is not going to do something terrible to your
213
+ cherished and irreplaceable data.
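
Conceptually, pruning is a set difference between the two listings, and a dry run only reports it. The sketch below uses hypothetical names and plain arrays of keys; it is not how s3sync walks its listings.

```ruby
# Illustrative only: items present at the destination but absent from the source.
def prune_candidates(source_keys, dest_keys)
  dest_keys - source_keys
end

# With dry_run true, only report; otherwise hand each key to the caller's block.
def prune(source_keys, dest_keys, dry_run: true)
  prune_candidates(source_keys, dest_keys).each do |key|
    if dry_run
      puts "would delete #{key}"
    else
      yield key
    end
  end
end

prune(%w[etc/hosts], %w[etc/hosts etc/old.conf])
```

A misplaced slash changes every key on one side, and then the difference is the entire destination; that is the case a dry run is meant to catch.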
214
+
215
+
216
+ Updates and other discussion:
217
+ -----------------------------
218
+ The latest version of s3sync should normally be at:
219
+ http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
220
+ and the Amazon S3 forums probably have a few threads going on it at any given
221
+ time. I may not always see things posted to the threads, so if you want you can
222
+ contact me at gbs-s3@10forward.com too.
223
+
224
+
225
+ Change Log:
226
+ -----------
227
+
228
+ 2006-09-29:
229
+ Added support for --expires and --cache-control. Eg:
230
+ --expires="Thu, 01 Dec 2007 16:00:00 GMT"
231
+ --cache-control="no-cache"
232
+
233
+ Thanks to Charles for pointing out the need for this, and supplying a patch
234
+ proving that it would be trivial to add =) Apologies for not including the short
235
+ form (-e) for the expires. I have a rule that options taking arguments should
236
+ use the long form.
237
+ ----------
238
+
239
+ 2006-10-04
240
+ Several minor debugs and edge cases.
241
+ Fixed a bug where retries didn't rewind the stream to start over.
242
+ ----------
243
+
244
+ 2006-10-12
245
+ Version 1.0.5
246
+ Finally figured out and fixed bug of trying to follow local symlink-to-directory.
247
+ Fixed a really nasty sorting discrepancy that caused problems when files started
248
+ with the same name as a directory.
249
+ Retry on connection-reset on the S3 side.
250
+ Skip files that we can't read instead of dying.
251
+ ----------
252
+
253
+ 2006-10-12
254
+ Version 1.0.6
255
+ Some GC voodoo to try and keep a handle on the memory footprint a little better.
256
+ There is still room for improvement here.
257
+ ----------
258
+
259
+ 2006-10-13
260
+ Version 1.0.7
261
+ Fixed symlink dirs being stored to S3 as real dirs (and failing with 400)
262
+ Added a retry catch for connection timeout error.
263
+ (Hopefully) caught a bug that expected every S3 listing to contain results
264
+ ----------
265
+
266
+ 2006-10-14
267
+ Version 1.0.8
268
+ Was testing for file? before symlink? in localnode.stream. This meant that for
269
+ symlink files it was trying to shove the real file contents into the symlink
270
+ body on s3.
271
+ ----------
272
+
273
+ 2006-10-14
274
+ Version 1.0.9
275
+ Woops, I was using "max-entries" for some reason but the proper parameter is
276
+ "max-keys". Not a big deal.
277
+ Broke out the S3try stuff into a separate file so I could re-use it for s3cmd.rb
278
+ ----------
279
+
280
+ 2006-10-16
281
+ Added a couple debug lines; not even enough to call it a version revision.
282
+ ----------
283
+
284
+ 2006-10-25
285
+ Version 1.0.10
286
+ UTF-8 fixes.
287
+ Catching a couple more retry-able errors in s3try (instead of aborting the
288
+ program).
289
+ ----------
290
+
291
+ 2006-10-26
292
+ Version 1.0.11
293
+ Revamped some details of the generators and comparator so that directories are
294
+ handled in a more exact and uniform fashion across local and S3.
295
+ ----------
296
+
297
+ 2006-11-28
298
+ Version 1.0.12
299
+ Added a couple more error catches to s3try.
300
+ ----------
301
+
302
+ 2007-01-08
303
+ Version 1.0.13
304
+ Numerous small changes to slash and path handling, in order to catch several
305
+ cases where "root" directory nodes were not being created on S3.
306
+ This makes restores work a lot more intuitively in many cases.
307
+ ----------
308
+
309
+ 2007-01-25
310
+ Version 1.0.14
311
+ Peter Fales' marker fix.
312
+ Also, markers should be decoded into native charset (because that's what s3
313
+ expects to see).
314
+ ----------
315
+
316
+ 2007-02-19
317
+ Version 1.1.0
318
+ *WARNING* Lots of path-handling changes. *PLEASE* test safely before you just
319
+ swap this in for your working 1.0.x version.
320
+
321
+ - Adding --exclude (and there was much rejoicing).
322
+ - Found Yet Another Leading Slash Bug with respect to local nodes. It was always
323
+ "recursing" into the first folder even if there was no trailing slash and -r
324
+ wasn't specified. What it should have done in this case is simply create a node
325
+ for the directory itself, then stop (not check the dir's contents).
326
+ - Local node canonicalization was (potentially) stripping the trailing slash,
327
+ which we need in order to make some decisions in the local generator.
328
+ - Fixed problem where it would prepend a "/" to s3 key names even with blank
329
+ prefix.
330
+ - Fixed S3->local when there's no "/" in the source so it doesn't try to create
331
+ a folder with the bucket name.
332
+ - Updated s3try and s3_s3sync_mod to allow SSL_CERT_FILE
333
+ ----------
334
+
335
+ 2007-02-22
336
+ Version 1.1.1
337
+ Fixed dumb regression bug caused by the S3->local bucket name fix in 1.1.0
338
+ ----------
339
+
340
+ 2007-02-25
341
+ Version 1.1.2
342
+ Added --progress
343
+ ----------
344
+
345
+ 2007-06-02
346
+ Version 1.1.3
347
+ IMPORTANT!
348
+ Pursuant to http://s3sync.net/forum/index.php?topic=49.0 , the tar.gz now
349
+ expands into its own sub-directory named "s3sync" instead of dumping all the
350
+ files into the current directory.
351
+
352
+ In the case of commands of the form:
353
+ s3sync -r somedir somebucket:
354
+ The root directory node in s3 was being stored as "somedir/" instead of "somedir"
355
+ which caused restores to mess up when you say:
356
+ s3sync -r somebucket: restoredir
357
+ The fix to this, by coincidence, actually makes s3fox work even *less* well with
358
+ s3sync. I really need to build my own xul+javascript s3 GUI some day.
359
+
360
+ Also fixed some of the NoMethodError stuff for when --progress is used
361
+ and caught Errno::ETIMEDOUT
362
+ ----------
363
+
364
+ 2007-07-12
365
+ Version 1.1.4
366
+ Added Alastair Brunton's yaml config code.
367
+ ----------
368
+
369
+ 2007-11-17
370
+ Version 1.2.1
371
+ Compatibility for S3 API revisions.
372
+ When retries are exhausted, emit an error.
373
+ Don't ever try to delete the 'root' local dir.
374
+ ----------
375
+
376
+ 2007-11-20
377
+ Version 1.2.2
378
+ Handle EU bucket 307 redirects (in s3try.rb)
379
+ --make-dirs added
380
+ ----------
381
+
382
+ 2007-11-20
383
+ Version 1.2.3
384
+ Fix SSL verification settings that broke in new S3 API.
385
+ ----------
386
+
387
+ 2008-01-06
388
+ Version 1.2.4
389
+ Run from any dir (search "here" for includes).
390
+ Search out s3config.yml in some likely places.
391
+ Reset connection (properly) on retry-able non-50x errors.
392
+ Fix calling format bug preventing it from working from yml.
393
+ Added http proxy support.
394
+ ----------
395
+
396
+ 2008-05-11
397
+ Version 1.2.5
398
+ Added option --no-md5
399
+ ----------
400
+
401
+ FNORD