iostreams 1.0.0.beta7 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4d8a7eaa8009e55339490c405a111986838219094c3df4d0b2303f7e1d16e951
- data.tar.gz: 3a0635cb14723f81054674759f6c0d1752196a5d8d5c6c69dc96d9910fa63b84
+ metadata.gz: e81d84bc2bb265b09acc66d39fa3ebc168d045127cf9942ce49cb06c5da369f2
+ data.tar.gz: 534e1bf05113578b1848fe196981b21c516115c783d778bb31ec78b0bc60b271
  SHA512:
- metadata.gz: e61d3ed29557eb7abd74737c6dd2b81ecfb2667c0a1f753a4a66fd746816f7083ea7dfc7fd8c86339f0dc671418692ccdef0948875b522d39e16a262c820c188
- data.tar.gz: 5d6b8cfc44627bd41b5170cc5e0c55ad10089d624bab6409ef864738c114fa614a9d92895c3e29eed20b88ff6627a06834e4aa722fba369fd11018efb38a5a4d
+ metadata.gz: 7a9bdf64f5142ab31c6e3f1f620d6e619041c7b9802928fd2b9c2508b9c90b95e943188299632c2407660594a0aeecbe3b5fc61a7bee80703701cfdf2827d906
+ data.tar.gz: d9584118b6bb8088c2e1e54f60fc2370213d693b1130e0e104c5a6cfcc4027b776e736f39e71393e14c0510ab1353d76398fad9ff44ec32479d17bbb5f45c86d
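
For context, the `SHA256` and `SHA512` entries above are hex digests of the packaged `metadata.gz` and `data.tar.gz` files. Below is a minimal sketch of how such digests could be recomputed and compared, using only Ruby's standard `digest` library. The helper names `archive_digests` and `digest_matches?` and the sample bytes are illustrative assumptions; they are not part of RubyGems or iostreams.

```ruby
require "digest"
require "digest/sha2"

# Hypothetical helper: compute both digests for a blob of bytes,
# keyed the same way as the sections in checksums.yaml.
def archive_digests(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Hypothetical helper: compare a recomputed digest with a recorded one.
def digest_matches?(bytes, algorithm, expected_hex)
  archive_digests(bytes).fetch(algorithm) == expected_hex
end

# Stand-in bytes; a real check would read the archive with File.binread.
sample   = "example archive bytes"
recorded = archive_digests(sample)

puts digest_matches?(sample, "SHA256", recorded["SHA256"])       # unchanged bytes match
puts digest_matches?(sample + "x", "SHA256", recorded["SHA256"]) # altered bytes do not
```

Any change to the packaged bytes changes both digests, which is why all four values differ between the two versions.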
data/README.md CHANGED
@@ -9,19 +9,20 @@ Production Ready, but API is subject to breaking changes until V1 is released.

  ## Features

- Supported file / stream types:
+ Supported streams:

  * Zip
  * Gzip
  * BZip2
- * CSV
- * PGP (Uses GnuPG)
+ * PGP (Requires GnuPG)
  * Xlsx (Reading)
  * Encryption using [Symmetric Encryption](https://github.com/reidmorrison/symmetric-encryption)

- Streaming support currently under development:
+ Supported sources and/or targets:

- * S3
+ * File
+ * HTTP (Read only)
+ * AWS S3
  * SFTP

  Supported file formats:
@@ -31,6 +32,71 @@ Supported file formats:
  * JSON
  * PSV

+ ## Quick examples
+
+ Read an entire file into memory:
+
+ ```ruby
+ IOStreams.path('example.txt').read
+ ```
+
+ Decompress an entire gzip file into memory:
+
+ ```ruby
+ IOStreams.path('example.gz').read
+ ```
+
+ Read and decompress the first file in a zip file into memory:
+
+ ```ruby
+ IOStreams.path('example.zip').read
+ ```
+
+ Read a file one line at a time
+
+ ```ruby
+ IOStreams.path('example.txt').each do |line|
+ puts line
+ end
+ ```
+
+ Read a CSV file one line at a time, returning each line as an array:
+
+ ```ruby
+ IOStreams.path('example.csv').each(:array) do |array|
+ p array
+ end
+ ```
+
+ Read a CSV file a record at a time, returning each line as a hash.
+ The first line of the file is assumed to be the header line:
+
+ ```ruby
+ IOStreams.path('example.csv').each(:hash) do |hash|
+ p hash
+ end
+ ```
+
+ Read a file using an http get,
+ decompressing the named file in the zip file,
+ returning each records from the named file as a hash:
+
+ ```ruby
+ IOStreams.
+ path("https://www5.fdic.gov/idasp/Offices2.zip").
+ option(:zip, entry_file_name: 'OFFICES2_ALL.CSV').
+ reader(:hash) do |stream|
+ p stream.read
+ end
+ ```
+
+ Read the file without unzipping and streaming the first file in the zip:
+
+ ```ruby
+ IOStreams.path('https://www5.fdic.gov/idasp/Offices2.zip').stream(:none).reader {|file| puts file.read}
+ ```
+
+
  ## Introduction

  If all files were small, they could just be loaded into memory in their entirety. With the
@@ -53,7 +119,7 @@ together several streams, `iostreams` attempts to offer similar features for Rub

  ```ruby
  # Read a compressed file:
- IOStreams.reader('hello.gz') do |reader|
+ IOStreams.path("hello.gz").reader do |reader|
  data = reader.read(1024)
  puts "Read: #{data}"
  end
@@ -65,9 +131,9 @@ any temporary files to process the stream.

  ```ruby
  # Create a file that is compressed with GZip and then encrypted with Symmetric Encryption:
- IOStreams.writer('hello.gz.enc') do |writer|
- writer.write('Hello World')
- writer.write('and some more')
+ IOStreams.path("hello.gz.enc").writer do |writer|
+ writer.write("Hello World")
+ writer.write("and some more")
  end
  ```
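
The two README examples above rely on iostreams picking the gzip stream from the file extension. As a rough stdlib-only analogue of what a gzip writer and a chunked gzip reader do, here is a sketch using `Zlib` with an in-memory `StringIO` instead of a file. It does not call iostreams and leaves out the encryption step. The helper names `gzip_parts` and `gunzip_chunks` and the 8-byte chunk size are assumptions made for illustration.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: compress each part in turn into an in-memory gzip stream.
def gzip_parts(parts)
  buffer = StringIO.new("".b)
  gz = Zlib::GzipWriter.new(buffer)
  parts.each { |part| gz.write(part) }
  gz.finish # write the gzip trailer without closing the StringIO
  buffer.string
end

# Hypothetical helper: decompress a few bytes at a time, so the whole
# payload never has to be requested in a single read.
def gunzip_chunks(compressed, chunk_size)
  chunks = []
  reader = Zlib::GzipReader.new(StringIO.new(compressed))
  while (data = reader.read(chunk_size)) # read returns nil at end of stream
    chunks << data
  end
  reader.close
  chunks
end

compressed = gzip_parts(["Hello World", "and some more"])
gunzip_chunks(compressed, 8).each { |chunk| puts "Read: #{chunk}" }
```

The loop has the same shape as the README's `while (data = io.read(128))` example: keep asking for a bounded number of bytes until the reader signals the end of the stream.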
 
@@ -91,16 +157,18 @@ un-encrypted data.
  While decompressing the file, display 128 characters at a time from the file.

  ~~~ruby
- require 'iostreams'
- IOStreams.reader('abc.csv') do |io|
- p data while (data = io.read(128))
+ require "iostreams"
+ IOStreams.path("abc.csv").reader do |io|
+ while (data = io.read(128))
+ p data
+ end
  end
  ~~~

  While decompressing the file, display one line at a time from the file.

  ~~~ruby
- IOStreams.each_line('abc.csv') do |line|
+ IOStreams.path("abc.csv").each do |line|
  puts line
  end
  ~~~
@@ -108,7 +176,7 @@ end
  While decompressing the file, display each row from the csv file as an array.

  ~~~ruby
- IOStreams.each_row('abc.csv') do |array|
+ IOStreams.path("abc.csv").each(:array) do |array|
  p array
  end
  ~~~
@@ -117,20 +185,7 @@ While decompressing the file, display each record from the csv file as a hash.
  The first line is assumed to be the header row.

  ~~~ruby
- IOStreams.each_record('abc.csv') do |hash|
- p hash
- end
- ~~~
-
- Display each line from the array as a hash.
- The first line is assumed to be the header row.
-
- ~~~ruby
- array = [
- 'name, address, zip_code',
- 'Jack, Down Under, 12345'
- ]
- IOStreams.each_record(array) do |hash|
+ IOStreams.path("abc.csv").each(:hash) do |hash|
  p hash
  end
  ~~~
@@ -138,9 +193,9 @@ end
  Write data while compressing the file.

  ~~~ruby
- IOStreams.writer('abc.csv') do |io|
- io.write('This')
- io.write(' is ')
+ IOStreams.path("abc.csv").writer do |io|
+ io.write("This")
+ io.write(" is ")
  io.write(" one line\n")
  end
  ~~~
@@ -148,12 +203,12 @@ end
  Write a line at a time while compressing the file.

  ~~~ruby
- IOStreams.line_writer('abc.csv') do |file|
- file << 'these'
- file << 'are'
- file << 'all'
- file << 'separate'
- file << 'lines'
+ IOStreams.path("abc.csv").writer(:line) do |file|
+ file << "these"
+ file << "are"
+ file << "all"
+ file << "separate"
+ file << "lines"
  end
  ~~~

@@ -161,10 +216,10 @@ Write an array (row) at a time while compressing the file.
  Each array is converted to csv before being compressed with zip.

  ~~~ruby
- IOStreams.row_writer('abc.csv') do |io|
+ IOStreams.path("abc.csv").writer(:array) do |io|
  io << %w[name address zip_code]
  io << %w[Jack There 1234]
- io << ['Joe', 'Over There somewhere', 1234]
+ io << ["Joe", "Over There somewhere", 1234]
  end
  ~~~

@@ -173,9 +228,9 @@ Each hash is converted to csv before being compressed with zip.
  The header row is extracted from the first hash supplied.

  ~~~ruby
- IOStreams.record_writer('abc.csv') do |stream|
- stream << {name: 'Jack', address: 'There', zip_code: 1234}
- stream << {name: 'Joe', address: 'Over There somewhere', zip_code: 1234}
+ IOStreams.path("abc.csv").writer(:hash) do |stream|
+ stream << {name: "Jack", address: "There", zip_code: 1234}
+ stream << {name: "Joe", address: "Over There somewhere", zip_code: 1234}
  end
  ~~~

@@ -183,9 +238,9 @@ Write to a string IO for testing, supplying the filename so that the streams can

  ~~~ruby
  io = StringIO.new
- IOStreams::Tabular::Writer(io, file_name: 'abc.csv') do |stream|
- stream << {name: 'Jack', address: 'There', zip_code: 1234}
- stream << {name: 'Joe', address: 'Over There somewhere', zip_code: 1234}
+ IOStreams.stream(io, file_name: "abc.csv").writer(:hash) do |stream|
+ stream << {name: "Jack", address: "There", zip_code: 1234}
+ stream << {name: "Joe", address: "Over There somewhere", zip_code: 1234}
  end
  puts io.string
  ~~~
@@ -193,8 +248,8 @@ puts io.string
  Read a CSV file and write the output to an encrypted file in JSON format.

  ~~~ruby
- IOStreams.record_writer('sample.json.enc') do |output|
- IOStreams.each_record('sample.csv') do |record|
+ IOStreams.path("sample.json.enc").writer(:hash) do |output|
+ IOStreams.path("sample.csv").each(:hash) do |record|
  output << record
  end
  end
@@ -207,38 +262,49 @@ Stream based file copying. Changes the file type without changing the file forma
  Encrypt the contents of the file `sample.json` and write to `sample.json.enc`

  ~~~ruby
- IOStreams.copy('sample.json', 'sample.json.enc')
+ input = IOStreams.path("sample.json")
+ IOStreams.path("sample.json.enc").copy_from(input)
  ~~~

  Encrypt and compress the contents of the file `sample.json` with Symmetric Encryption and write to `sample.json.enc`

  ~~~ruby
- IOStreams.copy('sample.json', 'sample.json.enc', target_options: {streams: {enc: {compress: true}}})
+ input = IOStreams.path("sample.json")
+ IOStreams.path("sample.json.enc").option(:enc, compress: true).copy_from(input)
  ~~~

  Encrypt and compress the contents of the file `sample.json` with pgp and write to `sample.json.enc`

  ~~~ruby
- IOStreams.copy('sample.json', 'sample.json.pgp', target_options: {streams: {pgp: {recipient: 'sender@example.org'}}})
+ input = IOStreams.path("sample.json")
+ IOStreams.path("sample.json.pgp").option(:pgp, recipient: "sender@example.org").copy_from(input)
  ~~~

  Decrypt the file `abc.csv.enc` and write it to `xyz.csv`.

  ~~~ruby
- IOStreams.copy('abc.csv.enc', 'xyz.csv')
+ input = IOStreams.path("abc.csv.enc")
+ IOStreams.path("xyz.csv").copy_from(input)
+ ~~~
+
+ Decrypt file `ABC` that was encrypted with Symmetric Encryption,
+ PGP encrypt the output file and write it to `xyz.csv.pgp` using the pgp key that was imported for `a@a.com`.
+
+ ~~~ruby
+ input = IOStreams.path("ABC").stream(:enc)
+ IOStreams.path("xyz.csv.pgp").option(:pgp, recipient: "a@a.com").copy_from(input)
  ~~~

- Read `ABC`, PGP encrypt the file and write to `xyz.csv.pgp`, applying
+ To copy a file _without_ performing any conversions (ignore file extensions), set `convert` to `false`:

  ~~~ruby
- IOStreams.copy('ABC', 'xyz.csv.pgp',
- source_options: [:enc],
- target_options: [pgp: {email_recipient: 'a@a.com'})
+ input = IOStreams.path("sample.json.zip")
+ IOStreams.path("sample.copy").copy_from(input, convert: false)
  ~~~

  ## Philosopy

- IOStreams can be used to work against a single stream. it's real capability becomes apparent when chainging together
+ IOStreams can be used to work against a single stream. it's real capability becomes apparent when chaining together
  multiple streams to process data, without loading entire files into memory.

  #### Linux Pipes
@@ -298,7 +364,7 @@ Since IOStreams can autodetect file types based on the file extension, `IOStream
  to start with:
  ~~~ruby
  line_count = 0
- IOStreams.reader("hello.csv.gz") do |input|
+ IOStreams.path("hello.csv.gz").reader do |input|
  IOStreams::Line::Reader.open(input) do |lines|
  lines.each { line_count += 1}
  end
@@ -306,19 +372,19 @@ to start with:
  puts "hello.csv.gz contains #{line_count} lines"
  ~~~

- Since we know we want a line reader, it can be simplified using `IOStreams.line_reader`:
+ Since we know we want a line reader, it can be simplified using `#reader(:line)`:
  ~~~ruby
  line_count = 0
- IOStreams.line_reader("hello.csv.gz") do |lines|
+ IOStreams.path("hello.csv.gz").reader(:line) do |lines|
  lines.each { line_count += 1}
  end
  puts "hello.csv.gz contains #{line_count} lines"
  ~~~

- It can be simplified even further using `IOStreams.each_line`:
+ It can be simplified even further using `#each`:
  ~~~ruby
  line_count = 0
- IOStreams.each_line("hello.csv.gz") { line_count += 1}
+ IOStreams.path("hello.csv.gz").each { line_count += 1}
  puts "hello.csv.gz contains #{line_count} lines"
  ~~~

@@ -336,25 +402,25 @@ and converting to valid US ASCII.
  apple_count = 0
  IOStreams::Gzip::Reader.open("hello.csv.gz") do |input|
  IOStreams::Encode::Reader.open(input,
- encoding: 'US-ASCII',
- encode_replace: '',
+ encoding: "US-ASCII",
+ encode_replace: "",
  encode_cleaner: :printable) do |cleansed|
  IOStreams::Line::Reader.open(cleansed) do |lines|
- lines.each { |line| apple_count += line.scan('apple').count}
+ lines.each { |line| apple_count += line.scan("apple").count}
  end
  end
  puts "Found the word 'apple' #{apple_count} times in hello.csv.gz"
  ~~~

  Let IOStreams perform the above stream chaining automatically under the covers:
+
  ~~~ruby
  apple_count = 0
- IOStreams.each_line("hello.csv.gz",
- encoding: 'US-ASCII',
- encode_replace: '',
- encode_cleaner: :printable) do |line|
- apple_count += line.scan('apple').count
- end
+ IOStreams.path("hello.csv.gz").
+ option(:encode, encoding: "US-ASCII", replace: "", cleaner: :printable).
+ each do |line|
+ apple_count += line.scan("apple").count
+ end

  puts "Found the word 'apple' #{apple_count} times in hello.csv.gz"
  ~~~
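
To make the effect of the encode options more concrete, here is a stdlib-only sketch of roughly what "convert to US-ASCII, replace unconvertible characters with an empty string, keep only printable characters" means for a single line. It uses `String#encode` and a POSIX character class and is only an approximation; the helper names `cleanse_to_ascii` and `count_word` are hypothetical, and iostreams' own cleaner may differ in detail (for example in how it treats line endings).

```ruby
# Hypothetical helper: transcode to US-ASCII, dropping anything that cannot
# be represented, then strip the remaining non-printable characters.
def cleanse_to_ascii(line)
  ascii = line.encode("US-ASCII", invalid: :replace, undef: :replace, replace: "")
  ascii.gsub(/[^[:print:]]/, "")
end

# Hypothetical helper: count occurrences of a word after cleansing each line.
def count_word(text, word)
  text.each_line.sum { |line| cleanse_to_ascii(line).scan(word).count }
end

# The second line hides "apple" behind a NUL byte and contains a non-ASCII character.
sample = "apple pie\napp\u0000le tart, p\u00E2te\n"
puts count_word(sample, "apple")
```

Without the cleansing step, the second occurrence would be missed because of the embedded NUL byte.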
@@ -363,16 +429,10 @@ Let IOStreams perform the above stream chaining automatically under the covers:

  * Due to the nature of Zip, both its Reader and Writer methods will create
  a temp file when reading from or writing to a stream.
- Recommended to use Gzip over Zip since it can be streamed.
+ Recommended to use Gzip over Zip since it can be streamed without requiring temp files.
  * Zip becomes exponentially slower with very large files, especially files
  that exceed 4GB when uncompressed. Highly recommend using GZip for large files.

- To completely implement io streaming for Ruby will take a lot more input and thoughts
- from the Ruby community. This gem represents a starting point to get the discussion going.
-
- By keeping this gem a 0.x version and not going V1, we can change the interface as needed
- to implement community feedback.
-
  ## Versioning

  This project adheres to [Semantic Versioning](http://semver.org/).
@@ -383,7 +443,7 @@ This project adheres to [Semantic Versioning](http://semver.org/).

  ## License

- Copyright 2018 Reid Morrison
+ Copyright 2020 Reid Morrison

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
@@ -1,5 +1,6 @@
  module IOStreams
- class Streams
+ # Build the streams that need to be applied to a path druing reading or writing.
+ class Builder
  attr_accessor :file_name
  attr_reader :streams, :options

@@ -114,10 +115,10 @@ module IOStreams
  block.call(io_stream)
  elsif pipeline.size == 1
  stream, opts = pipeline.first
- class_for_stream(type, stream).stream(io_stream, opts, &block)
+ class_for_stream(type, stream).open(io_stream, opts, &block)
  else
  # Daisy chain multiple streams together
- last = pipeline.keys.inject(block) { |inner, stream_sym| ->(io) { class_for_stream(type, stream_sym).stream(io, pipeline[stream_sym], &inner) } }
+ last = pipeline.keys.inject(block) { |inner, stream_sym| ->(io) { class_for_stream(type, stream_sym).open(io, pipeline[stream_sym], &inner) } }
  last.call(io_stream)
  end
  end
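
The `inject` call in the hunk above builds a chain of lambdas, each of which opens one stream and hands the resulting IO to the next. The sketch below reproduces that pattern with two toy stream classes so the nesting order can be observed. `ToyStreams`, `Reverse`, `Suffix`, `REGISTRY` and `execute` are hypothetical names invented for this illustration and are not part of iostreams; the real builder resolves classes through `class_for_stream`.

```ruby
require "stringio"

module ToyStreams
  # Toy stream: yields an IO containing the reversed input.
  class Reverse
    def self.open(io, _opts = {})
      yield StringIO.new(io.read.reverse)
    end
  end

  # Toy stream: yields an IO containing the input plus a suffix.
  class Suffix
    def self.open(io, opts = {})
      yield StringIO.new(io.read + opts.fetch(:text, ""))
    end
  end

  REGISTRY = {reverse: Reverse, suffix: Suffix}.freeze

  # Daisy chain the streams named in `pipeline` (name => options) around `block`.
  def self.execute(io, pipeline, &block)
    return block.call(io) if pipeline.empty?

    # Start with the caller's block and wrap it once per stream. The stream
    # wrapped last becomes the outermost lambda, so it sees the raw IO first.
    last = pipeline.keys.inject(block) do |inner, name|
      ->(stream) { REGISTRY.fetch(name).open(stream, pipeline[name], &inner) }
    end
    last.call(io)
  end
end

result = ToyStreams.execute(StringIO.new("hello"), {reverse: {}, suffix: {text: "!"}}) { |io| io.read }
puts result # prints "!olleh": the suffix is applied to the raw input, then the result is reversed
```

Because each lambda receives the previous one as its block, no intermediate result has to be stored by the builder itself; every stream only sees the IO object it is handed.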
@@ -26,19 +26,19 @@ module IOStreams
  # DEPRECATED
  def each_line(file_name_or_io, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.each_line(**args, &block)
+ path.each(:line, **args, &block)
  end

  # DEPRECATED
  def each_row(file_name_or_io, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.each_row(**args, &block)
+ path.each(:array, **args, &block)
  end

  # DEPRECATED
  def each_record(file_name_or_io, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.each_record(**args, &block)
+ path.each(:hash, **args, &block)
  end

  # DEPRECATED. Use `#path` or `#io`
@@ -57,19 +57,19 @@ module IOStreams
  # DEPRECATED
  def line_writer(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.line_writer(**args, &block)
+ path.writer(:line, **args, &block)
  end

  # DEPRECATED
  def row_writer(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.row_writer(**args, &block)
+ path.writer(:array, **args, &block)
  end

  # DEPRECATED
  def record_writer(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.record_writer(**args, &block)
+ path.writer(:hash, **args, &block)
  end

  # Copies the source file/stream to the target file/stream.
@@ -170,19 +170,19 @@ module IOStreams
  # DEPRECATED
  def line_reader(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.line_reader(**args, &block)
+ path.reader(:line, **args, &block)
  end

  # DEPRECATED
  def row_reader(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.line_reader(**args, &block)
+ path.reader(:line, **args, &block)
  end

  # DEPRECATED
  def record_reader(file_name_or_io, streams: nil, file_name: nil, encoding: nil, encode_cleaner: nil, encode_replace: nil, **args, &block)
  path = build_path(file_name_or_io, streams: streams, file_name: file_name, encoding: encoding, encode_cleaner: encode_cleaner, encode_replace: encode_replace)
- path.record_reader(**args, &block)
+ path.reader(:hash, **args, &block)
  end

  private
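
The deprecated wrappers in the last three hunks all follow one pattern: the old per-format method names stay in place but delegate to a single method that takes a mode symbol (`:line`, `:array` or `:hash`). Below is a minimal, self-contained sketch of that delegation pattern. `ToyPath` is a hypothetical stand-in rather than the real path class, and its comma splitting is a deliberate simplification of proper CSV parsing.

```ruby
# Toy stand-in for a path object; NOT the real iostreams implementation.
class ToyPath
  def initialize(text)
    @text = text
  end

  # Single entry point: the mode symbol selects how each line is presented.
  def each(mode = :line, &block)
    lines = @text.each_line(chomp: true)
    case mode
    when :line
      lines.each(&block)
    when :array
      lines.each { |line| block.call(line.split(",")) }
    when :hash
      header = nil
      lines.each do |line|
        row = line.split(",")
        if header.nil?
          header = row # the first line is treated as the header row
        else
          block.call(header.zip(row).to_h)
        end
      end
    else
      raise ArgumentError, "Unknown mode: #{mode.inspect}"
    end
  end

  # Older method names kept as thin wrappers around #each.
  def each_line(&block)
    each(:line, &block)
  end

  def each_row(&block)
    each(:array, &block)
  end

  def each_record(&block)
    each(:hash, &block)
  end
end

path = ToyPath.new("name,zip_code\nJack,12345\n")
path.each(:hash) { |record| p record }  # new style
path.each_record { |record| p record }  # old name, same behaviour
```

Keeping the old names as one-line wrappers lets existing callers keep working while the mode-based method becomes the only place where behaviour is defined.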