iostreams 1.0.0.beta7 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4d8a7eaa8009e55339490c405a111986838219094c3df4d0b2303f7e1d16e951
- data.tar.gz: 3a0635cb14723f81054674759f6c0d1752196a5d8d5c6c69dc96d9910fa63b84
+ metadata.gz: e81d84bc2bb265b09acc66d39fa3ebc168d045127cf9942ce49cb06c5da369f2
+ data.tar.gz: 534e1bf05113578b1848fe196981b21c516115c783d778bb31ec78b0bc60b271
  SHA512:
- metadata.gz: e61d3ed29557eb7abd74737c6dd2b81ecfb2667c0a1f753a4a66fd746816f7083ea7dfc7fd8c86339f0dc671418692ccdef0948875b522d39e16a262c820c188
- data.tar.gz: 5d6b8cfc44627bd41b5170cc5e0c55ad10089d624bab6409ef864738c114fa614a9d92895c3e29eed20b88ff6627a06834e4aa722fba369fd11018efb38a5a4d
+ metadata.gz: 7a9bdf64f5142ab31c6e3f1f620d6e619041c7b9802928fd2b9c2508b9c90b95e943188299632c2407660594a0aeecbe3b5fc61a7bee80703701cfdf2827d906
+ data.tar.gz: d9584118b6bb8088c2e1e54f60fc2370213d693b1130e0e104c5a6cfcc4027b776e736f39e71393e14c0510ab1353d76398fad9ff44ec32479d17bbb5f45c86d
data/README.md CHANGED
@@ -9,19 +9,20 @@ Production Ready, but API is subject to breaking changes until V1 is released.
 
  ## Features
 
- Supported file / stream types:
+ Supported streams:
 
  * Zip
  * Gzip
  * BZip2
- * CSV
- * PGP (Uses GnuPG)
+ * PGP (Requires GnuPG)
  * Xlsx (Reading)
  * Encryption using [Symmetric Encryption](https://github.com/reidmorrison/symmetric-encryption)
 
- Streaming support currently under development:
+ Supported sources and/or targets:
 
- * S3
+ * File
+ * HTTP (Read only)
+ * AWS S3
  * SFTP
 
  Supported file formats:
@@ -31,6 +32,71 @@ Supported file formats:
  * JSON
  * PSV
 
+ ## Quick examples
+
+ Read an entire file into memory:
+
+ ```ruby
+ IOStreams.path('example.txt').read
+ ```
+
+ Decompress an entire gzip file into memory:
+
+ ```ruby
+ IOStreams.path('example.gz').read
+ ```
+
+ Read and decompress the first file in a zip file into memory:
+
+ ```ruby
+ IOStreams.path('example.zip').read
+ ```
+
+ Read a file one line at a time
+
+ ```ruby
+ IOStreams.path('example.txt').each do |line|
+ puts line
+ end
+ ```
+
+ Read a CSV file one line at a time, returning each line as an array:
+
+ ```ruby
+ IOStreams.path('example.csv').each(:array) do |array|
+ p array
+ end
+ ```
+
+ Read a CSV file a record at a time, returning each line as a hash.
+ The first line of the file is assumed to be the header line:
+
+ ```ruby
+ IOStreams.path('example.csv').each(:hash) do |hash|
+ p hash
+ end
+ ```
+
+ Read a file using an http get,
+ decompressing the named file in the zip file,
+ returning each records from the named file as a hash:
+
+ ```ruby
+ IOStreams.
+ path("https://www5.fdic.gov/idasp/Offices2.zip").
+ option(:zip, entry_file_name: 'OFFICES2_ALL.CSV').
+ reader(:hash) do |stream|
+ p stream.read
+ end
+ ```
+
+ Read the file without unzipping and streaming the first file in the zip:
+
+ ```ruby
+ IOStreams.path('https://www5.fdic.gov/idasp/Offices2.zip').stream(:none).reader {|file| puts file.read}
+ ```
+
+
  ## Introduction
 
  If all files were small, they could just be loaded into memory in their entirety. With the
@@ -53,7 +119,7 @@ together several streams, `iostreams` attempts to offer similar features for Rub
 
  ```ruby
  # Read a compressed file:
- IOStreams.reader('hello.gz') do |reader|
+ IOStreams.path("hello.gz").reader do |reader|
  data = reader.read(1024)
  puts "Read: #{data}"
  end
@@ -65,9 +131,9 @@ any temporary files to process the stream.
 
  ```ruby
  # Create a file that is compressed with GZip and then encrypted with Symmetric Encryption:
- IOStreams.writer('hello.gz.enc') do |writer|
- writer.write('Hello World')
- writer.write('and some more')
+ IOStreams.path("hello.gz.enc").writer do |writer|
+ writer.write("Hello World")
+ writer.write("and some more")
  end
  ```
 
@@ -91,16 +157,18 @@ un-encrypted data.
  While decompressing the file, display 128 characters at a time from the file.
 
  ~~~ruby
- require 'iostreams'
- IOStreams.reader('abc.csv') do |io|
- p data while (data = io.read(128))
+ require "iostreams"
+ IOStreams.path("abc.csv").reader do |io|
+ while (data = io.read(128))
+ p data
+ end
  end
  ~~~
 
  While decompressing the file, display one line at a time from the file.
 
  ~~~ruby
- IOStreams.each_line('abc.csv') do |line|
+ IOStreams.path("abc.csv").each do |line|
  puts line
  end
  ~~~
@@ -108,7 +176,7 @@ end
  While decompressing the file, display each row from the csv file as an array.
 
  ~~~ruby
- IOStreams.each_row('abc.csv') do |array|
+ IOStreams.path("abc.csv").each(:array) do |array|
  p array
  end
  ~~~
@@ -117,20 +185,7 @@ While decompressing the file, display each record from the csv file as a hash.
  The first line is assumed to be the header row.
 
  ~~~ruby
- IOStreams.each_record('abc.csv') do |hash|
- p hash
- end
- ~~~
-
- Display each line from the array as a hash.
- The first line is assumed to be the header row.
-
- ~~~ruby
- array = [
- 'name, address, zip_code',
- 'Jack, Down Under, 12345'
- ]
- IOStreams.each_record(array) do |hash|
+ IOStreams.path("abc.csv").each(:hash) do |hash|
  p hash
  end
  ~~~
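
The README hunks above replace `each_line`, `each_row` and `each_record` with a single `each` that takes a mode (`:array`, `:hash`, or none for lines). As a rough illustration of what each mode yields, here is a stdlib-only sketch. `each_mode` is a hypothetical helper, not part of iostreams, and the naive comma split is an assumption standing in for real CSV parsing.

```ruby
require "stringio"

# Hypothetical helper (not the iostreams API): yields each line of `io`
# as a String (:line), an Array (:array), or a Hash keyed by the first
# line, which is treated as the header row (:hash).
def each_mode(io, mode = :line)
  header = nil
  io.each_line(chomp: true) do |line|
    case mode
    when :line
      yield line
    when :array
      yield line.split(",")
    when :hash
      row = line.split(",")
      if header.nil?
        header = row # first line is assumed to be the header
      else
        yield header.zip(row).to_h
      end
    else
      raise ArgumentError, "unknown mode: #{mode.inspect}"
    end
  end
end

data = "name,zip_code\nJack,1234\nJoe,5678\n"

records = []
each_mode(StringIO.new(data), :hash) { |hash| records << hash }
p records
```

In this sketch the header row is consumed in `:hash` mode and never yielded, which matches the README's statement that the first line is assumed to be the header.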
@@ -138,9 +193,9 @@ end
  Write data while compressing the file.
 
  ~~~ruby
- IOStreams.writer('abc.csv') do |io|
- io.write('This')
- io.write(' is ')
+ IOStreams.path("abc.csv").writer do |io|
+ io.write("This")
+ io.write(" is ")
  io.write(" one line\n")
  end
  ~~~
@@ -148,12 +203,12 @@ end
  Write a line at a time while compressing the file.
 
  ~~~ruby
- IOStreams.line_writer('abc.csv') do |file|
- file << 'these'
- file << 'are'
- file << 'all'
- file << 'separate'
- file << 'lines'
+ IOStreams.path("abc.csv").writer(:line) do |file|
+ file << "these"
+ file << "are"
+ file << "all"
+ file << "separate"
+ file << "lines"
  end
  ~~~
 
@@ -161,10 +216,10 @@ Write an array (row) at a time while compressing the file.
  Each array is converted to csv before being compressed with zip.
 
  ~~~ruby
- IOStreams.row_writer('abc.csv') do |io|
+ IOStreams.path("abc.csv").writer(:array) do |io|
  io << %w[name address zip_code]
  io << %w[Jack There 1234]
- io << ['Joe', 'Over There somewhere', 1234]
+ io << ["Joe", "Over There somewhere", 1234]
  end
  ~~~
 
@@ -173,9 +228,9 @@ Each hash is converted to csv before being compressed with zip.
  The header row is extracted from the first hash supplied.
 
  ~~~ruby
- IOStreams.record_writer('abc.csv') do |stream|
- stream << {name: 'Jack', address: 'There', zip_code: 1234}
- stream << {name: 'Joe', address: 'Over There somewhere', zip_code: 1234}
+ IOStreams.path("abc.csv").writer(:hash) do |stream|
+ stream << {name: "Jack", address: "There", zip_code: 1234}
+ stream << {name: "Joe", address: "Over There somewhere", zip_code: 1234}
  end
  ~~~
 
@@ -183,9 +238,9 @@ Write to a string IO for testing, supplying the filename so that the streams can
 
  ~~~ruby
  io = StringIO.new
- IOStreams::Tabular::Writer(io, file_name: 'abc.csv') do |stream|
- stream << {name: 'Jack', address: 'There', zip_code: 1234}
- stream << {name: 'Joe', address: 'Over There somewhere', zip_code: 1234}
+ IOStreams.stream(io, file_name: "abc.csv").writer(:hash) do |stream|
+ stream << {name: "Jack", address: "There", zip_code: 1234}
+ stream << {name: "Joe", address: "Over There somewhere", zip_code: 1234}
  end
  puts io.string
  ~~~
@@ -193,8 +248,8 @@ puts io.string
  Read a CSV file and write the output to an encrypted file in JSON format.
 
  ~~~ruby
- IOStreams.record_writer('sample.json.enc') do |output|
- IOStreams.each_record('sample.csv') do |record|
+ IOStreams.path("sample.json.enc").writer(:hash) do |output|
+ IOStreams.path("sample.csv").each(:hash) do |record|
  output << record
  end
  end
@@ -207,38 +262,49 @@ Stream based file copying. Changes the file type without changing the file forma
  Encrypt the contents of the file `sample.json` and write to `sample.json.enc`
 
  ~~~ruby
- IOStreams.copy('sample.json', 'sample.json.enc')
+ input = IOStreams.path("sample.json")
+ IOStreams.path("sample.json.enc").copy_from(input)
  ~~~
 
  Encrypt and compress the contents of the file `sample.json` with Symmetric Encryption and write to `sample.json.enc`
 
  ~~~ruby
- IOStreams.copy('sample.json', 'sample.json.enc', target_options: {streams: {enc: {compress: true}}})
+ input = IOStreams.path("sample.json")
+ IOStreams.path("sample.json.enc").option(:enc, compress: true).copy_from(input)
  ~~~
 
  Encrypt and compress the contents of the file `sample.json` with pgp and write to `sample.json.enc`
 
  ~~~ruby
- IOStreams.copy('sample.json', 'sample.json.pgp', target_options: {streams: {pgp: {recipient: 'sender@example.org'}}})
+ input = IOStreams.path("sample.json")
+ IOStreams.path("sample.json.pgp").option(:pgp, recipient: "sender@example.org").copy_from(input)
  ~~~
 
  Decrypt the file `abc.csv.enc` and write it to `xyz.csv`.
 
  ~~~ruby
- IOStreams.copy('abc.csv.enc', 'xyz.csv')
+ input = IOStreams.path("abc.csv.enc")
+ IOStreams.path("xyz.csv").copy_from(input)
+ ~~~
+
+ Decrypt file `ABC` that was encrypted with Symmetric Encryption,
+ PGP encrypt the output file and write it to `xyz.csv.pgp` using the pgp key that was imported for `a@a.com`.
+
+ ~~~ruby
+ input = IOStreams.path("ABC").stream(:enc)
+ IOStreams.path("xyz.csv.pgp").option(:pgp, recipient: "a@a.com").copy_from(input)
  ~~~
 
- Read `ABC`, PGP encrypt the file and write to `xyz.csv.pgp`, applying
+ To copy a file _without_ performing any conversions (ignore file extensions), set `convert` to `false`:
 
  ~~~ruby
- IOStreams.copy('ABC', 'xyz.csv.pgp',
- source_options: [:enc],
- target_options: [pgp: {email_recipient: 'a@a.com'})
+ input = IOStreams.path("sample.json.zip")
+ IOStreams.path("sample.copy").copy_from(input, convert: false)
  ~~~
 
  ## Philosopy
 
- IOStreams can be used to work against a single stream. it's real capability becomes apparent when chainging together
+ IOStreams can be used to work against a single stream. it's real capability becomes apparent when chaining together
  multiple streams to process data, without loading entire files into memory.
 
  #### Linux Pipes
@@ -298,7 +364,7 @@ Since IOStreams can autodetect file types based on the file extension, `IOStream
  to start with:
  ~~~ruby
  line_count = 0
- IOStreams.reader("hello.csv.gz") do |input|
+ IOStreams.path("hello.csv.gz").reader do |input|
  IOStreams::Line::Reader.open(input) do |lines|
  lines.each { line_count += 1}
  end
@@ -306,19 +372,19 @@ to start with:
  puts "hello.csv.gz contains #{line_count} lines"
  ~~~
 
- Since we know we want a line reader, it can be simplified using `IOStreams.line_reader`:
+ Since we know we want a line reader, it can be simplified using `#reader(:line)`:
  ~~~ruby
  line_count = 0
- IOStreams.line_reader("hello.csv.gz") do |lines|
+ IOStreams.path("hello.csv.gz").reader(:line) do |lines|
  lines.each { line_count += 1}
  end
  puts "hello.csv.gz contains #{line_count} lines"
  ~~~
 
- It can be simplified even further using `IOStreams.each_line`:
+ It can be simplified even further using `#each`:
  ~~~ruby
  line_count = 0
- IOStreams.each_line("hello.csv.gz") { line_count += 1}
+ IOStreams.path("hello.csv.gz").each { line_count += 1}
  puts "hello.csv.gz contains #{line_count} lines"
  ~~~
 
@@ -336,25 +402,25 @@ and converting to valid US ASCII.
  apple_count = 0
  IOStreams::Gzip::Reader.open("hello.csv.gz") do |input|
  IOStreams::Encode::Reader.open(input,
- encoding: 'US-ASCII',
- encode_replace: '',
+ encoding: "US-ASCII",
+ encode_replace: "",
  encode_cleaner: :printable) do |cleansed|
  IOStreams::Line::Reader.open(cleansed) do |lines|
- lines.each { |line| apple_count += line.scan('apple').count}
+ lines.each { |line| apple_count += line.scan("apple").count}
  end
  end
  puts "Found the word 'apple' #{apple_count} times in hello.csv.gz"
  ~~~
 
  Let IOStreams perform the above stream chaining automatically under the covers:
+
  ~~~ruby
  apple_count = 0
- IOStreams.each_line("hello.csv.gz",
- encoding: 'US-ASCII',
- encode_replace: '',
- encode_cleaner: :printable) do |line|
- apple_count += line.scan('apple').count
- end
+ IOStreams.path("hello.csv.gz").
+ option(:encode, encoding: "US-ASCII", replace: "", cleaner: :printable).
+ each do |line|
+ apple_count += line.scan("apple").count
+ end
 
  puts "Found the word 'apple' #{apple_count} times in hello.csv.gz"
  ~~~
@@ -363,16 +429,10 @@ Let IOStreams perform the above stream chaining automatically under the covers:
 
  * Due to the nature of Zip, both its Reader and Writer methods will create
  a temp file when reading from or writing to a stream.
- Recommended to use Gzip over Zip since it can be streamed.
+ Recommended to use Gzip over Zip since it can be streamed without requiring temp files.
  * Zip becomes exponentially slower with very large files, especially files
  that exceed 4GB when uncompressed. Highly recommend using GZip for large files.
 
- To completely implement io streaming for Ruby will take a lot more input and thoughts
- from the Ruby community. This gem represents a starting point to get the discussion going.
-
- By keeping this gem a 0.x version and not going V1, we can change the interface as needed
- to implement community feedback.
-
  ## Versioning
 
  This project adheres to [Semantic Versioning](http://semver.org/).
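
The README note in this hunk recommends Gzip over Zip because Gzip can be streamed without temp files. The sketch below shows that property using only Ruby's standard `zlib` and `stringio` libraries rather than iostreams itself. The in-memory `StringIO` buffer is an assumption that stands in for any non-seekable target such as a socket or pipe.

```ruby
require "zlib"
require "stringio"

# Compress directly into an IO object; no temp file is involved.
buffer = StringIO.new
gz = Zlib::GzipWriter.new(buffer)
gz.write("Hello World\n")
gz.write("and some more\n")
gz.finish # writes the gzip trailer but leaves `buffer` open

# Decompress from the same IO, one line at a time.
buffer.rewind
lines = []
Zlib::GzipReader.new(buffer).each_line { |line| lines << line.chomp }
p lines
```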
@@ -383,7 +443,7 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 
  ## License
 
- Copyright 2018 Reid Morrison
+ Copyright 2020 Reid Morrison
 
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
@@ -1,5 +1,6 @@
  module IOStreams
- class Streams
+ # Build the streams that need to be applied to a path druing reading or writing.
+ class Builder
  attr_accessor :file_name
  attr_reader :streams, :options
 
@@ -114,10 +115,10 @@ module IOStreams
  block.call(io_stream)
  elsif pipeline.size == 1
  stream, opts = pipeline.first
- class_for_stream(type, stream).stream(io_stream, opts, &block)
+ class_for_stream(type, stream).open(io_stream, opts, &block)
  else
  # Daisy chain multiple streams together
- last = pipeline.keys.inject(block) { |inner, stream_sym| ->(io) { class_for_stream(type, stream_sym).stream(io, pipeline[stream_sym], &inner) } }
+ last = pipeline.keys.inject(block) { |inner, stream_sym| ->(io) { class_for_stream(type, stream_sym).open(io, pipeline[stream_sym], &inner) } }
  last.call(io_stream)
  end
  end
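
The `inject` line in the hunk above folds the pipeline into nested lambdas, so that each stream's `open` wraps the IO and hands it to the next stage. Here is a self-contained sketch of that pattern. The stage classes (`UpcaseReader`, `DeleteReader`), the `STAGES` lookup and the `chain` helper are hypothetical stand-ins for the gem's stream classes and `class_for_stream`, and they are assumptions for illustration only.

```ruby
require "stringio"

# Hypothetical stage: upper-cases whatever is read from the wrapped IO.
class UpcaseReader
  def self.open(io, _opts = {})
    yield new(io)
  end

  def initialize(io)
    @io = io
  end

  def read(*args)
    data = @io.read(*args)
    data&.upcase
  end
end

# Hypothetical stage: deletes the configured characters from what it reads.
class DeleteReader
  def self.open(io, opts = {})
    yield new(io, opts.fetch(:chars))
  end

  def initialize(io, chars)
    @io = io
    @chars = chars
  end

  def read(*args)
    data = @io.read(*args)
    data&.delete(@chars)
  end
end

STAGES = {upcase: UpcaseReader, delete: DeleteReader}.freeze

# Same shape as the diff: start from the caller's block and wrap it once
# per pipeline entry. The last key ends up as the outermost lambda, so its
# stage is the one that wraps the raw IO.
def chain(io, pipeline, &block)
  last = pipeline.keys.inject(block) do |inner, stream_sym|
    ->(stream) { STAGES.fetch(stream_sym).open(stream, pipeline[stream_sym], &inner) }
  end
  last.call(io)
end

pipeline = {upcase: {}, delete: {chars: "-"}}
result = chain(StringIO.new("a-b-c"), pipeline) { |io| io.read }
p result
```

Because each `open` yields to the next lambda, the block's return value propagates back out through every stage, and no stage needs to know how many others are in the chain.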
@@ -26,19 +26,19 @@ module IOStreams
  # DEPRECATED
  def each_line(file_name_or_io, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.each_line(**args, &block)
+ path.each(:line, **args, &block)
  end
 
  # DEPRECATED
  def each_row(file_name_or_io, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.each_row(**args, &block)
+ path.each(:array, **args, &block)
  end
 
  # DEPRECATED
  def each_record(file_name_or_io, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.each_record(**args, &block)
+ path.each(:hash, **args, &block)
  end
 
  # DEPRECATED. Use `#path` or `#io`
@@ -57,19 +57,19 @@ module IOStreams
  # DEPRECATED
  def line_writer(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.line_writer(**args, &block)
+ path.writer(:line, **args, &block)
  end
 
  # DEPRECATED
  def row_writer(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.row_writer(**args, &block)
+ path.writer(:array, **args, &block)
  end
 
  # DEPRECATED
  def record_writer(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.record_writer(**args, &block)
+ path.writer(:hash, **args, &block)
  end
 
  # Copies the source file/stream to the target file/stream.
@@ -170,19 +170,19 @@ module IOStreams
  # DEPRECATED
  def line_reader(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.line_reader(**args, &block)
+ path.reader(:line, **args, &block)
  end
 
  # DEPRECATED
  def row_reader(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.line_reader(**args, &block)
+ path.reader(:line, **args, &block)
  end
 
  # DEPRECATED
  def record_reader(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.record_reader(**args, &block)
+ path.reader(:hash, **args, &block)
  end
 
  private