s3-sync 1.2.6

@@ -0,0 +1,184 @@
+ === 0.0.1 2009-08-05
+
+ * 1 major enhancement:
+   * Initial release
+
+
+
+ 2006-09-29:
+ Added support for --expires and --cache-control. E.g.:
+   --expires="Sat, 01 Dec 2007 16:00:00 GMT"
+   --cache-control="no-cache"
+
+ Thanks to Charles for pointing out the need for this, and supplying a patch
+ proving that it would be trivial to add =) Apologies for not including the short
+ form (-e) for the expires. I have a rule that options taking arguments should
+ use the long form.
+ ----------
+
+ 2006-10-04
+ Several minor debugs and edge cases.
+ Fixed a bug where retries didn't rewind the stream to start over.
+ ----------
+
+ 2006-10-12
+ Version 1.0.5
+ Finally figured out and fixed the bug of trying to follow a local
+ symlink-to-directory.
+ Fixed a really nasty sorting discrepancy that caused problems when files started
+ with the same name as a directory.
+ Retry on connection-reset on the S3 side.
+ Skip files that we can't read instead of dying.
+ ----------
+
+ 2006-10-12
+ Version 1.0.6
+ Some GC voodoo to try and keep a handle on the memory footprint a little better.
+ There is still room for improvement here.
+ ----------
+
+ 2006-10-13
+ Version 1.0.7
+ Fixed symlink dirs being stored to S3 as real dirs (and failing with 400)
+ Added a retry catch for connection timeout error.
+ (Hopefully) caught a bug that expected every S3 listing to contain results
+ ----------
+
+ 2006-10-14
+ Version 1.0.8
+ Was testing for file? before symlink? in localnode.stream. This meant that for
+ symlink files it was trying to shove the real file contents into the symlink
+ body on s3.
+ ----------
+
+ 2006-10-14
+ Version 1.0.9
+ Whoops, I was using "max-entries" for some reason but the proper header is
+ "max-keys". Not a big deal.
+ Broke out the S3try stuff into a separate file so I could re-use it for s3cmd.rb
+ ----------
+
+ 2006-10-16
+ Added a couple debug lines; not even enough to call it a version revision.
+ ----------
+
+ 2006-10-25
+ Version 1.0.10
+ UTF-8 fixes.
+ Catching a couple more retry-able errors in s3try (instead of aborting the
+ program).
+ ----------
+
+ 2006-10-26
+ Version 1.0.11
+ Revamped some details of the generators and comparator so that directories are
+ handled in a more exact and uniform fashion across local and S3.
+ ----------
+
+ 2006-11-28
+ Version 1.0.12
+ Added a couple more error catches to s3try.
+ ----------
+
+ 2007-01-08
+ Version 1.0.13
+ Numerous small changes to slash and path handling, in order to catch several
+ cases where "root" directory nodes were not being created on S3.
+ This makes restores work a lot more intuitively in many cases.
+ ----------
+
+ 2007-01-25
+ Version 1.0.14
+ Peter Fales' marker fix.
+ Also, markers should be decoded into native charset (because that's what s3
+ expects to see).
+ ----------
+
+ 2007-02-19
+ Version 1.1.0
+ *WARNING* Lots of path-handling changes. *PLEASE* test safely before you just
+ swap this in for your working 1.0.x version.
+
+ - Adding --exclude (and there was much rejoicing).
+ - Found Yet Another Leading Slash Bug with respect to local nodes. It was always
+   "recursing" into the first folder even if there was no trailing slash and -r
+   wasn't specified. What it should have done in this case is simply create a node
+   for the directory itself, then stop (not check the dir's contents).
+ - Local node canonicalization was (potentially) stripping the trailing slash,
+   which we need in order to make some decisions in the local generator.
+ - Fixed problem where it would prepend a "/" to s3 key names even with blank
+   prefix.
+ - Fixed S3->local when there's no "/" in the source so it doesn't try to create
+   a folder with the bucket name.
+ - Updated s3try and s3_s3sync_mod to allow SSL_CERT_FILE
+ ----------
+
+ 2007-02-22
+ Version 1.1.1
+ Fixed dumb regression bug caused by the S3->local bucket name fix in 1.1.0
+ ----------
+
+ 2007-02-25
+ Version 1.1.2
+ Added --progress
+ ----------
+
+ 2007-06-02
+ Version 1.1.3
+ IMPORTANT!
+ Pursuant to http://s3sync.net/forum/index.php?topic=49.0 , the tar.gz now
+ expands into its own sub-directory named "s3sync" instead of dumping all the
+ files into the current directory.
+
+ In the case of commands of the form:
+   s3sync -r somedir somebucket:
+ The root directory node in s3 was being stored as "somedir/" instead of "somedir"
+ which caused restores to mess up when you say:
+   s3sync -r somebucket: restoredir
+ The fix to this, by coincidence, actually makes s3fox work even *less* well with
+ s3sync. I really need to build my own xul+javascript s3 GUI some day.
+
+ Also fixed some of the NoMethodError stuff for when --progress is used
+ and caught Errno::ETIMEDOUT
+ ----------
+
+ 2007-07-12
+ Version 1.1.4
+ Added Alastair Brunton's yaml config code.
+ ----------
+
+ 2007-11-17
+ Version 1.2.1
+ Compatibility for S3 API revisions.
+ When retries are exhausted, emit an error.
+ Don't ever try to delete the 'root' local dir.
+ ----------
+
+ 2007-11-20
+ Version 1.2.2
+ Handle EU bucket 307 redirects (in s3try.rb)
+ --make-dirs added
+ ----------
+
+ 2007-11-20
+ Version 1.2.3
+ Fix SSL verification settings that broke in new S3 API.
+ ----------
+
+ 2008-01-06
+ Version 1.2.4
+ Run from any dir (search "here" for includes).
+ Search out s3config.yml in some likely places.
+ Reset connection (properly) on retry-able non-50x errors.
+ Fix calling format bug preventing it from working from yml.
+ Added http proxy support.
+ ----------
+
+ 2008-05-11
+ Version 1.2.5
+ Added option --no-md5
+ ----------
+
+ 2008-06-16
+ Version 1.2.6
+ Catch connect errors and retry.
+ ----------
@@ -0,0 +1,318 @@
+ = s3sync
+
+ * http://github.com/mitchc2/s3sync
+
+ == DESCRIPTION:
+
+ Welcome to s3sync.rb
+ --------------------
+ Home page, wiki, forum, bug reports, etc: http://s3sync.net
+
+ This is a Ruby program that easily transfers directories between a local
+ directory and an S3 bucket:prefix. It behaves somewhat, but not precisely, like
+ the rsync program. In particular, it shares rsync's peculiar behavior that
+ trailing slashes on the source side are meaningful. See examples below.
+
+ One benefit over some other comparable tools is that s3sync goes out of its way
+ to mirror the directory structure on S3, meaning you don't *need* to use s3sync
+ later in order to view your files on S3. You can just as easily use an S3
+ shell, a web browser (if you used the --public-read option), etc. Note that
+ s3sync is NOT necessarily going to be able to read files you uploaded via some
+ other tool. This includes things uploaded with the old Perl version! For best
+ results, start fresh!
+
+ s3sync runs happily on Linux, probably other *ix, and also Windows (except that
+ symlinks and permissions management features don't do anything on Windows). If
+ you get it running somewhere interesting let me know (see below).
+
+ s3sync is free, and license terms are included in all the source files. If you
+ decide to make it better, or find bugs, please let me know.
+
+ The original inspiration for this tool is the Perl script by the same name which
+ was made by Thorsten von Eicken (and later updated by me). This Ruby program
+ does not share any components or logic with that utility; the only relation is
+ that it performs a similar task.
+
+
+ Management tasks
+ ----------------
+ For low-level S3 operations not encapsulated by the sync paradigm, try the
+ companion utility s3cmd.rb. See README_s3cmd.txt.
+
+
+ About single files
+ ------------------
+ s3sync lacks the special case code that would be needed in order to handle a
+ source/dest that's a single file. This isn't one of the supported use cases so
+ don't expect it to work. You can use the companion utility s3cmd.rb for single
+ get/puts.
+
+
+ About Directories, the bane of any S3 sync-er
+ ---------------------------------------------
+ In S3 there's no actual concept of folders, just keys and nodes. So, every tool
+ uses its own proprietary way of storing dir info (my scheme being the best
+ naturally) and in general the methods are not compatible.
+
+ If you populate S3 by some means *other than* s3sync and then try to use s3sync
+ to "get" the S3 stuff to a local filesystem, you will want to use the
+ --make-dirs option. This causes the local dirs to be created even if there is no
+ s3sync-compatible directory node info stored on the S3 side. In other words,
+ local folders are conjured into existence whenever they are needed to make the
+ "get" succeed.
+
+
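As a rough illustration of what --make-dirs implies, here is a minimal Ruby sketch. The helper name store_object and its arguments are made up for this example and are not taken from s3sync's source; it only shows the idea of creating missing parent directories before a fetched object is written.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not s3sync's real code): write one fetched object
# below dest_root. With make_dirs, missing parent directories are created
# first; without it, a missing parent makes File.open raise Errno::ENOENT.
def store_object(dest_root, relative_key, body, make_dirs: false)
  target = File.join(dest_root, relative_key)
  FileUtils.mkdir_p(File.dirname(target)) if make_dirs
  File.open(target, 'wb') { |f| f.write(body) }
  target
end

Dir.mktmpdir do |root|
  begin
    store_object(root, 'etc/ssh/sshd_config', 'Port 22')
  rescue Errno::ENOENT
    puts 'parent directory is missing and make_dirs was not requested'
  end
  path = store_object(root, 'etc/ssh/sshd_config', 'Port 22', make_dirs: true)
  puts File.read(path)
end
```

The first call fails because nothing has created etc/ssh locally, which mirrors a "get" from a bucket that carries no directory nodes; the second call succeeds because the folders are conjured up on demand.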
+ About MD5 hashes
+ ----------------
+ s3sync's normal operation is to compare the file size and MD5 hash of each item
+ to decide whether it needs syncing. On the S3 side, these hashes are stored and
+ returned to us as the "ETag" of each item when the bucket is listed, so it's
+ very easy. On the local side, the MD5 must be calculated by pushing every byte
+ in the file through the MD5 algorithm. This is CPU and IO intensive!
+
+ Thus you can specify the option --no-md5. This will compare the upload time on
+ S3 to the "last modified" time on the local item, and not do md5 calculations
+ locally at all. This might cause more transfers than are absolutely necessary,
+ for example if the file is "touched" to a newer modified date but its contents
+ didn't change. Conversely, if a file's contents are modified but the date is not
+ updated, then the sync will pass over it. Lastly, if your clock is very
+ different from the one on the S3 servers, then you may see unanticipated
+ behavior.
+
+
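The two comparison strategies can be sketched in Ruby roughly as follows. This is an illustration under stated assumptions, not s3sync's own code: the method names are invented, the ETag is assumed to be the plain MD5 hex digest wrapped in double quotes (as it is for simple single-request uploads), and the time-based check assumes a local-to-S3 sync.

```ruby
require 'digest/md5'
require 'tempfile'

# Stream a file through MD5 in fixed-size chunks so that large files are
# never held in memory all at once.
def local_md5(path, chunk_size = 64 * 1024)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |io|
    while (chunk = io.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Default-style check: size first (cheap), then MD5 against the listed ETag.
# The ETag is assumed to arrive wrapped in double quotes, so strip them.
def needs_sync?(path, remote_size, remote_etag)
  return true unless File.size(path) == remote_size
  local_md5(path) != remote_etag.delete('"')
end

# --no-md5 style check (assumed direction, local to S3): transfer only when
# the local file was modified after the object was uploaded.
def needs_sync_by_time?(path, remote_uploaded_at)
  File.mtime(path) > remote_uploaded_at
end

Tempfile.create('s3sync-demo') do |f|
  f.write('hello')
  f.flush
  etag = %("#{local_md5(f.path)}")
  puts needs_sync?(f.path, 5, etag)                  # false: same size and hash
  puts needs_sync_by_time?(f.path, Time.now - 3600)  # true: modified after "upload"
end
```

The second result shows the trade-off described above: the content is identical, yet the time-based check would still transfer the file because its modification time is newer than the upload time.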
+ A word on SSL_CERT_DIR:
+ -----------------------
+ On my debian install I didn't find any root authority public keys. I installed
+ some by running this shell archive:
+ http://mirbsd.mirsolutions.de/cvs.cgi/src/etc/ssl.certs.shar
+ (You have to click download, and then run it wherever you want the certs to be
+ placed). I do not in any way assert that these certificates are good,
+ comprehensive, moral, noble, or otherwise correct. But I am using them.
+
+ If you don't set up a cert dir, and try to use ssl, then you'll 1) get an ugly
+ warning message slapped down by ruby, and 2) not have any protection AT ALL from
+ malicious servers posing as s3.amazonaws.com. Seriously... you want to get
+ this right if you're going to have any sensitive data being tossed around.
+ --
+ There is a debian package ca-certificates; this is what I'm using now.
+   apt-get install ca-certificates
+ and then use:
+   SSL_CERT_DIR=/etc/ssl/certs
+
+ You used to be able to use just one certificate, but recently AWS has started
+ using more than one CA.
+
+
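For readers wondering what the cert dir is used for, here is a minimal sketch of a verified HTTPS connection object in Ruby using the standard Net::HTTP and OpenSSL libraries. The method name verified_s3_connection is hypothetical, and how s3sync wires this up internally is not shown here; the sketch only demonstrates the general technique of pointing peer verification at SSL_CERT_DIR or SSL_CERT_FILE.

```ruby
require 'net/http'
require 'openssl'

# Hypothetical helper: build (but do not open) an HTTPS connection that
# verifies the server certificate against the configured CA material.
def verified_s3_connection(host = 's3.amazonaws.com')
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  # VERIFY_PEER is what protects against servers posing as the real host.
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.ca_path = ENV['SSL_CERT_DIR'] if ENV['SSL_CERT_DIR']
  http.ca_file = ENV['SSL_CERT_FILE'] if ENV['SSL_CERT_FILE']
  http
end

http = verified_s3_connection
puts http.use_ssl?
puts http.verify_mode == OpenSSL::SSL::VERIFY_PEER
```

No network traffic happens in this sketch; the connection is only configured. A directory of CA certificates (ca_path) is the better fit when the server may present certificates from more than one CA.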
+ Getting started:
+ ----------------
+ Invoke by typing s3sync.rb and you should get a nice usage screen.
+ Options can be specified in short or long form (except --delete, which has no
+ short form).
+
+ ALWAYS TEST NEW COMMANDS using --dryrun(-n) if you want to see what will be
+ affected before actually doing it. ESPECIALLY if you use --delete. Otherwise, do
+ not be surprised if you misplace a '/' or two and end up deleting all your
+ precious, precious files.
+
+ If you use the --public-read(-p) option, items sent to S3 will be ACL'd so that
+ anonymous web users can download them, given the correct URL. This could be
+ useful if you intend to publish directories of information for others to see.
+ For example, I use s3sync to publish itself to its home on S3 via the following
+ command:
+   s3sync.rb -v -p publish/ ServEdge_pub:s3sync
+ Where the files live in a local folder called "publish" and I wish them to be
+ copied to the URL:
+   http://s3.amazonaws.com/ServEdge_pub/s3sync/...
+
+ If you use --ssl(-s) then your connections with S3 will be encrypted. Otherwise
+ your data will be sent in clear form, i.e. easy to intercept by malicious
+ parties.
+
+ If you want to prune items from the destination side which are not found on the
+ source side, you can use --delete. Always test this with -n first to make sure
+ the command line you specify is not going to do something terrible to your
+ cherished and irreplaceable data.
+
+
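The idea behind --delete combined with a dry run can be sketched as a set difference. The Ruby below is only an illustration with invented helper names (prune_candidates, prune); it is not how s3sync walks its source and destination listings.

```ruby
# Hypothetical helpers (names are illustrative, not from s3sync).
# Entries that exist on the destination but not on the source are the ones
# a --delete style prune would remove.
def prune_candidates(source_names, dest_names)
  dest_names - source_names
end

# With dry_run the candidates are only reported; otherwise each one is
# handed to the block, which would perform the real deletion.
def prune(source_names, dest_names, dry_run: true)
  prune_candidates(source_names, dest_names).each do |name|
    if dry_run
      puts "would delete #{name}"
    else
      yield name
    end
  end
end

source = %w[etc/hosts etc/passwd]
dest   = %w[etc/hosts etc/passwd etc/old.conf]
prune(source, dest)                                      # only reports
prune(source, dest, dry_run: false) { |name| puts "deleting #{name}" }
```

A misplaced '/' changes the names on the source side, which in turn changes the whole candidate list, so reporting the list before acting on it is cheap insurance.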
+ Updates and other discussion:
+ -----------------------------
+ The latest version of s3sync should normally be at:
+   http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
+ and the Amazon S3 forums probably have a few threads going on it at any given
+ time. I may not always see things posted to the threads, so if you want you can
+ contact me at gbs-s3@10forward.com too.
+
+
+ == FEATURES/PROBLEMS:
+
+ * FIX (list of features or problems)
+
+ == SYNOPSIS:
+
+ Examples:
+ ---------
+ (using S3 bucket 'mybucket' and prefix 'pre')
+ Put the local etc directory itself into S3
+   s3sync.rb -r /etc mybucket:pre
+   (This will yield S3 keys named pre/etc/...)
+ Put the contents of the local /etc dir into S3, rename dir:
+   s3sync.rb -r /etc/ mybucket:pre/etcbackup
+   (This will yield S3 keys named pre/etcbackup/...)
+ Put contents of S3 "directory" etc into local dir
+   s3sync.rb -r mybucket:pre/etc/ /root/etcrestore
+   (This will yield local files at /root/etcrestore/...)
+ Put the contents of S3 "directory" etc into a local dir named etc
+   s3sync.rb -r mybucket:pre/etc /root
+   (This will yield local files at /root/etc/...)
+ Put S3 nodes under the key pre/etc/ to the local dir etcrestore
+ **and create local dirs even if S3 side lacks dir nodes**
+   s3sync.rb -r --make-dirs mybucket:pre/etc/ /root/etcrestore
+   (This will yield local files at /root/etcrestore/...)
+
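The trailing-slash rule in the local-to-S3 examples can be modelled with a few lines of Ruby. This is a sketch of the rule as described in the examples, not s3sync's implementation; s3_key_for and its parameters are names invented for the illustration.

```ruby
# Hypothetical helper: compute the S3 key for one file found below `source`.
# Without a trailing slash on the source, the directory's own name becomes
# part of the key; with a trailing slash, only its contents are mapped.
def s3_key_for(source, relative_file, prefix)
  parts = [prefix]
  parts << File.basename(source) unless source.end_with?('/')
  parts << relative_file
  # Dropping empty parts avoids a leading "/" when the prefix is blank.
  parts.reject(&:empty?).join('/')
end

puts s3_key_for('/etc',  'hosts', 'pre')            # pre/etc/hosts
puts s3_key_for('/etc/', 'hosts', 'pre/etcbackup')  # pre/etcbackup/hosts
puts s3_key_for('/etc',  'hosts', '')               # etc/hosts
```

The first two calls correspond to the first two s3sync.rb invocations in the examples.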
+ List all the buckets your account owns:
+   s3cmd.rb listbuckets
+
+ Create a new bucket:
+   s3cmd.rb createbucket BucketName
+
+ Create a new bucket in the EU:
+   s3cmd.rb createbucket BucketName EU
+
+ Find out the location constraint of a bucket:
+   s3cmd.rb location BucketName
+
+ Delete an old bucket you don't want any more:
+   s3cmd.rb deletebucket BucketName
+
+ Find out what's in a bucket, 10 lines at a time:
+   s3cmd.rb list BucketName 10
+
+ Only look in a particular prefix:
+   s3cmd.rb list BucketName:startsWithThis
+
+ Look in the virtual "directory" named foo;
+ lists sub-"directories" and keys that are at this level.
+ Note that if you specify a delimiter you must specify a max before it.
+ (until I make the options parsing smarter)
+   s3cmd.rb list BucketName:foo/ 10 /
+
+ Delete a key:
+   s3cmd.rb delete BucketName:AKey
+
+ Delete all keys that match (like a combo between list and delete):
+   s3cmd.rb deleteall BucketName:SomePrefix
+
+ Only pretend you're going to delete all keys that match, but list them:
+   s3cmd.rb --dryrun deleteall BucketName:SomePrefix
+
+ Delete all keys in a bucket (leaving the bucket):
+   s3cmd.rb deleteall BucketName
+
+ Get a file from S3 and store it to a local file
+   s3cmd.rb get BucketName:TheFileOnS3.txt ALocalFile.txt
+
+ Put a local file up to S3
+ Note we don't automatically set mime type, etc.
+ NOTE that the order of the options doesn't change. S3 stays first!
+   s3cmd.rb put BucketName:TheFileOnS3.txt ALocalFile.txt
+
+
+ A note about [headers]
+ ----------------------
+ For some S3 operations, such as "put", you might want to specify certain headers
+ to the request such as Cache-Control, Expires, x-amz-acl, etc. Rather than
+ supporting a load of separate command-line options for these, I just allow
+ header specification. So to upload a file with public-read access you could
+ say:
+   s3cmd.rb put MyBucket:TheFile.txt ALocalFile.txt x-amz-acl:public-read
+
+ If you don't need to add any particular headers then you can just ignore this
+ whole [headers] thing and pretend it's not there. This is somewhat of an
+ advanced option.
+
+
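To make the name:value header form concrete, here is a small Ruby sketch that turns such trailing arguments into a hash. The function parse_header_args is an invented name and the splitting rule (first colon separates name from value) is an assumption for illustration, not a statement about s3cmd.rb's parser.

```ruby
# Hypothetical parser for trailing "Name:value" arguments.
# Only the first colon separates name from value, so values that contain
# colons themselves (dates, for example) survive intact.
def parse_header_args(args)
  args.each_with_object({}) do |arg, headers|
    name, value = arg.split(':', 2)
    if value.nil? || name.empty?
      raise ArgumentError, "expected name:value, got #{arg.inspect}"
    end
    headers[name] = value
  end
end

headers = parse_header_args(['x-amz-acl:public-read', 'Cache-Control:no-cache'])
headers.each { |name, value| puts "#{name}: #{value}" }
```

Splitting on the first colon only matters for headers such as Expires, whose values contain further colons in the time portion.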
+ == REQUIREMENTS:
+
+ * FIX (list of requirements)
+
+ == INSTALL:
+
+ sudo gem install s3sync
+
+
+ Your environment:
+ -----------------
+ s3sync needs to know several interesting values to work right. It looks for
+ them in the following environment variables -or- an s3config.yml file.
+ In the yml case, the names need to be lowercase (see example file).
+ Furthermore, the yml is searched for in the following locations, in order:
+   $S3CONF/s3config.yml
+   $HOME/.s3conf/s3config.yml
+   /etc/s3conf/s3config.yml
+
+ Required:
+   AWS_ACCESS_KEY_ID
+   AWS_SECRET_ACCESS_KEY
+
+   If you don't know what these are, then s3sync is probably not the
+   right tool for you to be starting out with.
+ Optional:
+   AWS_S3_HOST - I don't see why the default would ever be wrong
+   HTTP_PROXY_HOST,HTTP_PROXY_PORT,HTTP_PROXY_USER,HTTP_PROXY_PASSWORD - proxy
+   SSL_CERT_DIR - Where your Cert Authority keys live; for verification
+   SSL_CERT_FILE - If you have just one PEM file for CA verification
+   S3SYNC_RETRIES - How many HTTP errors to tolerate before exiting
+   S3SYNC_WAITONERROR - How many seconds to wait after an http error
+   S3SYNC_MIME_TYPES_FILE - Where your mime.types file lives
+   S3SYNC_NATIVE_CHARSET - For example Windows-1252. Defaults to ISO-8859-1.
+   AWS_CALLING_FORMAT - Defaults to REGULAR
+     REGULAR # http://s3.amazonaws.com/bucket/key
+     SUBDOMAIN # http://bucket.s3.amazonaws.com/key
+     VANITY # http://<vanity_domain>/key
+
+ Important: For EU-located buckets you should set the calling format to SUBDOMAIN
+ Important: For US buckets with CAPS or other weird traits set the calling format
+ to REGULAR
+
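The three calling formats differ only in where the bucket name appears in the URL. A minimal Ruby sketch of that mapping follows; s3_url is a hypothetical helper, the VANITY branch assumes the bucket name doubles as the vanity domain, and URL-escaping of keys is left out to keep the example short.

```ruby
# Hypothetical helper: build the request URL for a bucket/key pair under
# each calling format named above.
def s3_url(bucket, key, calling_format = 'REGULAR', host = 's3.amazonaws.com')
  case calling_format
  when 'REGULAR'   then "http://#{host}/#{bucket}/#{key}"
  when 'SUBDOMAIN' then "http://#{bucket}.#{host}/#{key}"
  # Assumption: the bucket is named after the vanity domain itself.
  when 'VANITY'    then "http://#{bucket}/#{key}"
  else raise ArgumentError, "unknown calling format: #{calling_format}"
  end
end

%w[REGULAR SUBDOMAIN].each do |format|
  puts s3_url('mybucket', 'pre/etc/hosts', format)
end
```

Because SUBDOMAIN puts the bucket name into the host name, bucket names with capitals or other characters that are awkward in DNS names are the ones that need REGULAR.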
+ I use "envdir" from the daemontools package to set up my env
+ variables easily: http://cr.yp.to/daemontools/envdir.html
+ For example:
+   envdir /root/s3sync/env /root/s3sync/s3sync.rb -etc etc etc
+ I know there are other similar tools out there as well.
+
+ You can also just call it in a shell script where you have exported the vars
+ first such as:
+   #!/bin/bash
+   export AWS_ACCESS_KEY_ID=valueGoesHere
+   ...
+   s3sync.rb -etc etc etc
+
+ But by far the easiest (and newest) way to set this up is to put the name:value
+ pairs in a file named s3config.yml and let the yaml parser pick them up. There
+ is an .example file shipped with the tar.gz to show what a yaml file looks like.
+ Thanks to Alastair Brunton for this addition.
+
+ You can also use some combination of .yml and environment variables, if you
+ want. Go nuts.
+
+
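The lookup order for s3config.yml and the lowercase-name convention can be sketched in Ruby as below. This is an illustration only: the function names are invented, and the choice to let environment variables override values from the yml file is an assumption of this sketch, since the text above does not say which source wins when both are set.

```ruby
require 'yaml'
require 'tmpdir'

# Candidate config files in the documented search order.
def s3config_candidates(env = ENV)
  dirs = [env['S3CONF'], env['HOME'] && File.join(env['HOME'], '.s3conf'), '/etc/s3conf']
  dirs.compact.reject(&:empty?).map { |dir| File.join(dir, 's3config.yml') }
end

# Hypothetical loader: read the first config file found, upcase its
# lowercase names, then let environment variables win (an assumption).
def load_settings(env = ENV)
  path = s3config_candidates(env).find { |candidate| File.file?(candidate) }
  from_yaml = path ? (YAML.load_file(path) || {}) : {}
  settings = {}
  from_yaml.each { |name, value| settings[name.to_s.upcase] = value.to_s }
  %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_CALLING_FORMAT].each do |name|
    settings[name] = env[name] if env[name]
  end
  settings
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 's3config.yml'), "aws_calling_format: SUBDOMAIN\n")
  # A plain hash stands in for the process environment in this demo.
  settings = load_settings({ 'S3CONF' => dir, 'AWS_ACCESS_KEY_ID' => 'valueGoesHere' })
  puts settings.keys.sort.join(', ')
end
```

The demo passes a plain hash instead of the real environment so that it never touches an actual configuration file.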
+ == LICENSE:
+
+ (The MIT License)
+
+ Copyright (c) 2009 FIXME full name
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ 'Software'), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.