iostreams 1.2.0 → 1.2.1
- checksums.yaml +4 -4
- data/README.md +3 -425
- data/lib/io_streams/line/reader.rb +2 -2
- data/lib/io_streams/paths/http.rb +1 -0
- data/lib/io_streams/pgp.rb +0 -76
- data/lib/io_streams/record/reader.rb +33 -8
- data/lib/io_streams/record/writer.rb +33 -8
- data/lib/io_streams/row/reader.rb +4 -4
- data/lib/io_streams/row/writer.rb +4 -4
- data/lib/io_streams/stream.rb +6 -6
- data/lib/io_streams/tabular.rb +29 -1
- data/lib/io_streams/utils.rb +1 -0
- data/lib/io_streams/version.rb +1 -1
- data/test/io_streams_test.rb +49 -0
- metadata +3 -4
- data/lib/io_streams/utils/reliable_http.rb +0 -98
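The functional change in 1.2.1, visible in the `record`, `row`, `stream`, and `tabular` diffs below, is that the original file name is now passed through to the tabular readers and writers, so the record format can be inferred from the file extension when no explicit `:format` is given. A minimal sketch of the behavior exercised by the new tests (the `/tmp` path is illustrative):

```ruby
require "iostreams"

# Writing hashes to a *.json path now renders JSON lines without specifying a format.
IOStreams.path("/tmp/io_streams/abc.json").writer(:hash) do |io|
  io << {"name" => "Jack Jones", "login" => "jjones"}
end

# Reading the same path back as hashes infers the JSON format from the file name as well.
IOStreams.path("/tmp/io_streams/abc.json").each(:hash) { |record| p record }
```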
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1dad581b0665992975c33f75b23f50964ae1311e025b7a1524fca4004f0ede2b
+  data.tar.gz: 4db01e4d6c2d36ce522df3b323a6e0d9f42de0d1644a282a0cea06479e979289
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4057a5c484129c60dbc9c84e462026da862900e17b0604b385164210f14814fbae6d065d015ee9171402eb9f793f33ac26c0ee7658f94b8cdeb0724c796cbe63
+  data.tar.gz: 5a84fe37c1eebc775bd84b9903181ff035c325b1233ab64990e586f5b0bd3fd51c21d4f1429f9b0e8ab64733e9b63be5c5e05df7bf026e9d1d8c0cd8a7716417
data/README.md
CHANGED
@@ -5,433 +5,11 @@ Input and Output streaming for Ruby.
 
 ## Project Status
 
-Production Ready.
+Production Ready, heavily used in production environments, many as part of Rocket Job.
 
-##
+## Documentation
 
-
-
-* Zip
-* Gzip
-* BZip2
-* PGP (Requires GnuPG)
-* Xlsx (Reading)
-* Encryption using [Symmetric Encryption](https://github.com/reidmorrison/symmetric-encryption)
-
-Supported sources and/or targets:
-
-* File
-* HTTP (Read only)
-* AWS S3
-* SFTP
-
-Supported file formats:
-
-* CSV
-* Fixed width formats
-* JSON
-* PSV
-
-## Quick examples
-
-Read an entire file into memory:
-
-```ruby
-IOStreams.path('example.txt').read
-```
-
-Decompress an entire gzip file into memory:
-
-```ruby
-IOStreams.path('example.gz').read
-```
-
-Read and decompress the first file in a zip file into memory:
-
-```ruby
-IOStreams.path('example.zip').read
-```
-
-Read a file one line at a time
-
-```ruby
-IOStreams.path('example.txt').each do |line|
-  puts line
-end
-```
-
-Read a CSV file one line at a time, returning each line as an array:
-
-```ruby
-IOStreams.path('example.csv').each(:array) do |array|
-  p array
-end
-```
-
-Read a CSV file a record at a time, returning each line as a hash.
-The first line of the file is assumed to be the header line:
-
-```ruby
-IOStreams.path('example.csv').each(:hash) do |hash|
-  p hash
-end
-```
-
-Read a file using an http get,
-decompressing the named file in the zip file,
-returning each records from the named file as a hash:
-
-```ruby
-IOStreams.
-  path("https://www5.fdic.gov/idasp/Offices2.zip").
-  option(:zip, entry_file_name: 'OFFICES2_ALL.CSV').
-  reader(:hash) do |stream|
-  p stream.read
-end
-```
-
-Read the file without unzipping and streaming the first file in the zip:
-
-```ruby
-IOStreams.path('https://www5.fdic.gov/idasp/Offices2.zip').stream(:none).reader {|file| puts file.read}
-```
-
-
-## Introduction
-
-If all files were small, they could just be loaded into memory in their entirety. With the
-advent of very large files, often into several Gigabytes, or even Terabytes in size, loading
-them into memory is not feasible.
-
-In linux it is common to use pipes to stream data between processes.
-For example:
-
-```
-# Count the number of lines in a file that has been compressed with gzip
-cat abc.gz | gunzip -c | wc -l
-```
-
-For large files it is critical to be able to read and write these files as streams. Ruby has support
-for reading and writing files using streams, but has no built-in way of passing one stream through
-another to support for example compressing the data, encrypting it and then finally writing the result
-to a file. Several streaming implementations exist for languages such as `C++` and `Java` to chain
-together several streams, `iostreams` attempts to offer similar features for Ruby.
-
-```ruby
-# Read a compressed file:
-IOStreams.path("hello.gz").reader do |reader|
-  data = reader.read(1024)
-  puts "Read: #{data}"
-end
-```
-
-The true power of streams is shown when many streams are chained together to achieve the end
-result, without holding the entire file in memory, or ideally without needing to create
-any temporary files to process the stream.
-
-```ruby
-# Create a file that is compressed with GZip and then encrypted with Symmetric Encryption:
-IOStreams.path("hello.gz.enc").writer do |writer|
-  writer.write("Hello World")
-  writer.write("and some more")
-end
-```
-
-The power of the above example applies when the data being written starts to exceed hundreds of megabytes,
-or even gigabytes.
-
-By looking at the file name supplied above, `iostreams` is able to determine which streams to apply
-to the data being read or written. For example:
-* `hello.zip` => Compressed using Zip
-* `hello.zip.enc` => Compressed using Zip and then encrypted using Symmetric Encryption
-* `hello.gz.enc` => Compressed using GZip and then encrypted using Symmetric Encryption
-
-The objective is that all of these streaming processes are performed used streaming
-so that only the current portion of the file is loaded into memory as it moves
-through the entire file.
-Where possible each stream never goes to disk, which for example could expose
-un-encrypted data.
-
-## Examples
-
-While decompressing the file, display 128 characters at a time from the file.
-
-~~~ruby
-require "iostreams"
-IOStreams.path("abc.csv").reader do |io|
-  while (data = io.read(128))
-    p data
-  end
-end
-~~~
-
-While decompressing the file, display one line at a time from the file.
-
-~~~ruby
-IOStreams.path("abc.csv").each do |line|
-  puts line
-end
-~~~
-
-While decompressing the file, display each row from the csv file as an array.
-
-~~~ruby
-IOStreams.path("abc.csv").each(:array) do |array|
-  p array
-end
-~~~
-
-While decompressing the file, display each record from the csv file as a hash.
-The first line is assumed to be the header row.
-
-~~~ruby
-IOStreams.path("abc.csv").each(:hash) do |hash|
-  p hash
-end
-~~~
-
-Write data while compressing the file.
-
-~~~ruby
-IOStreams.path("abc.csv").writer do |io|
-  io.write("This")
-  io.write(" is ")
-  io.write(" one line\n")
-end
-~~~
-
-Write a line at a time while compressing the file.
-
-~~~ruby
-IOStreams.path("abc.csv").writer(:line) do |file|
-  file << "these"
-  file << "are"
-  file << "all"
-  file << "separate"
-  file << "lines"
-end
-~~~
-
-Write an array (row) at a time while compressing the file.
-Each array is converted to csv before being compressed with zip.
-
-~~~ruby
-IOStreams.path("abc.csv").writer(:array) do |io|
-  io << %w[name address zip_code]
-  io << %w[Jack There 1234]
-  io << ["Joe", "Over There somewhere", 1234]
-end
-~~~
-
-Write a hash (record) at a time while compressing the file.
-Each hash is converted to csv before being compressed with zip.
-The header row is extracted from the first hash supplied.
-
-~~~ruby
-IOStreams.path("abc.csv").writer(:hash) do |stream|
-  stream << {name: "Jack", address: "There", zip_code: 1234}
-  stream << {name: "Joe", address: "Over There somewhere", zip_code: 1234}
-end
-~~~
-
-Write to a string IO for testing, supplying the filename so that the streams can be determined.
-
-~~~ruby
-io = StringIO.new
-IOStreams.stream(io, file_name: "abc.csv").writer(:hash) do |stream|
-  stream << {name: "Jack", address: "There", zip_code: 1234}
-  stream << {name: "Joe", address: "Over There somewhere", zip_code: 1234}
-end
-puts io.string
-~~~
-
-Read a CSV file and write the output to an encrypted file in JSON format.
-
-~~~ruby
-IOStreams.path("sample.json.enc").writer(:hash) do |output|
-  IOStreams.path("sample.csv").each(:hash) do |record|
-    output << record
-  end
-end
-~~~
-
-## Copying between files
-
-Stream based file copying. Changes the file type without changing the file format. For example, compress or encrypt.
-
-Encrypt the contents of the file `sample.json` and write to `sample.json.enc`
-
-~~~ruby
-input = IOStreams.path("sample.json")
-IOStreams.path("sample.json.enc").copy_from(input)
-~~~
-
-Encrypt and compress the contents of the file `sample.json` with Symmetric Encryption and write to `sample.json.enc`
-
-~~~ruby
-input = IOStreams.path("sample.json")
-IOStreams.path("sample.json.enc").option(:enc, compress: true).copy_from(input)
-~~~
-
-Encrypt and compress the contents of the file `sample.json` with pgp and write to `sample.json.enc`
-
-~~~ruby
-input = IOStreams.path("sample.json")
-IOStreams.path("sample.json.pgp").option(:pgp, recipient: "sender@example.org").copy_from(input)
-~~~
-
-Decrypt the file `abc.csv.enc` and write it to `xyz.csv`.
-
-~~~ruby
-input = IOStreams.path("abc.csv.enc")
-IOStreams.path("xyz.csv").copy_from(input)
-~~~
-
-Decrypt file `ABC` that was encrypted with Symmetric Encryption,
-PGP encrypt the output file and write it to `xyz.csv.pgp` using the pgp key that was imported for `a@a.com`.
-
-~~~ruby
-input = IOStreams.path("ABC").stream(:enc)
-IOStreams.path("xyz.csv.pgp").option(:pgp, recipient: "a@a.com").copy_from(input)
-~~~
-
-To copy a file _without_ performing any conversions (ignore file extensions), set `convert` to `false`:
-
-~~~ruby
-input = IOStreams.path("sample.json.zip")
-IOStreams.path("sample.copy").copy_from(input, convert: false)
-~~~
-
-## Philosopy
-
-IOStreams can be used to work against a single stream. it's real capability becomes apparent when chaining together
-multiple streams to process data, without loading entire files into memory.
-
-#### Linux Pipes
-
-Linux has built-in support for streaming using the `|` (pipe operator) to send the output from one process to another.
-
-Example: count the number of lines in a compressed file:
-
-    gunzip -c hello.csv.gz | wc -l
-
-The file `hello.csv.gz` is uncompressed and returned to standard output, which in turn is piped into the standard
-input for `wc -l`, which counts the number of lines in the uncompressed data.
-
-As each block of data is returned from `gunzip` it is immediately passed into `wc` so that it
-can start counting lines of uncompressed data, without waiting until the entire file is decompressed.
-The uncompressed contents of the file are not written to disk before passing to `wc -l` and the file is not loaded
-into memory before passing to `wc -l`.
-
-In this way extremely large files can be processed with very little memory being used.
-
-#### Push Model
-
-In the Linux pipes example above this would be considered a "push model" where each task in the list pushes
-its output to the input of the next task.
-
-A major challenge or disadvantage with the push model is that buffering would need to occur between tasks since
-each task could complete at very different speeds. To prevent large memory usage the standard output from a previous
-task would have to be blocked to try and make it slow down.
-
-#### Pull Model
-
-Another approach with multiple tasks that need to process a single stream, is to move to a "pull model" where the
-task at the end of the list pulls a block from a previous task when it is ready to process it.
-
-#### IOStreams
-
-IOStreams uses the pull model when reading data, where each stream performs a read against the previous stream
-when it is ready for more data.
-
-When writing to an output stream, IOStreams uses the push model, where each block of data that is ready to be written
-is pushed to the task/stream in the list. The write push only returns once it has traversed all the way down to
-the final task / stream in the list, this avoids complex buffering issues between each task / stream in the list.
-
-Example: Implementing in Ruby: `gunzip -c hello.csv.gz | wc -l`
-
-~~~ruby
-line_count = 0
-IOStreams::Gzip::Reader.open("hello.csv.gz") do |input|
-  IOStreams::Line::Reader.open(input) do |lines|
-    lines.each { line_count += 1}
-  end
-end
-puts "hello.csv.gz contains #{line_count} lines"
-~~~
-
-Since IOStreams can autodetect file types based on the file extension, `IOStreams.reader` can figure which stream
-to start with:
-~~~ruby
-line_count = 0
-IOStreams.path("hello.csv.gz").reader do |input|
-  IOStreams::Line::Reader.open(input) do |lines|
-    lines.each { line_count += 1}
-  end
-end
-puts "hello.csv.gz contains #{line_count} lines"
-~~~
-
-Since we know we want a line reader, it can be simplified using `#reader(:line)`:
-~~~ruby
-line_count = 0
-IOStreams.path("hello.csv.gz").reader(:line) do |lines|
-  lines.each { line_count += 1}
-end
-puts "hello.csv.gz contains #{line_count} lines"
-~~~
-
-It can be simplified even further using `#each`:
-~~~ruby
-line_count = 0
-IOStreams.path("hello.csv.gz").each { line_count += 1}
-puts "hello.csv.gz contains #{line_count} lines"
-~~~
-
-The benefit in all of the above cases is that the file can be any arbitrary size and only one block of the file
-is held in memory at any time.
-
-#### Chaining
-
-In the above example only 2 streams were used. Streams can be nested as deep as necessary to process data.
-
-Example, search for all occurrences of the word apple, cleansing the input data stream of non printable characters
-and converting to valid US ASCII.
-
-~~~ruby
-apple_count = 0
-IOStreams::Gzip::Reader.open("hello.csv.gz") do |input|
-  IOStreams::Encode::Reader.open(input,
-                                 encoding: "US-ASCII",
-                                 encode_replace: "",
-                                 encode_cleaner: :printable) do |cleansed|
-    IOStreams::Line::Reader.open(cleansed) do |lines|
-      lines.each { |line| apple_count += line.scan("apple").count}
-    end
-end
-puts "Found the word 'apple' #{apple_count} times in hello.csv.gz"
-~~~
-
-Let IOStreams perform the above stream chaining automatically under the covers:
-
-~~~ruby
-apple_count = 0
-IOStreams.path("hello.csv.gz").
-  option(:encode, encoding: "US-ASCII", replace: "", cleaner: :printable).
-  each do |line|
-  apple_count += line.scan("apple").count
-end
-
-puts "Found the word 'apple' #{apple_count} times in hello.csv.gz"
-~~~
-
-## Notes
-
-* Due to the nature of Zip, both its Reader and Writer methods will create
-  a temp file when reading from or writing to a stream.
-  Recommended to use Gzip over Zip since it can be streamed without requiring temp files.
-* Zip becomes exponentially slower with very large files, especially files
-  that exceed 4GB when uncompressed. Highly recommend using GZip for large files.
+[Semantic Logger Guide](http://rocketjob.github.io/iostreams)
 
 ## Versioning
 
data/lib/io_streams/line/reader.rb
CHANGED
@@ -9,7 +9,7 @@ module IOStreams
     LINEFEED_REGEXP = Regexp.compile(/\r\n|\n|\r/).freeze
 
     # Read a line at a time from a stream
-    def self.stream(input_stream,
+    def self.stream(input_stream, **args)
      # Pass-through if already a line reader
      return yield(input_stream) if input_stream.is_a?(self.class)
 
@@ -44,7 +44,7 @@ module IOStreams
    # - Skip "empty" / "blank" lines. RegExp?
    # - Extract header line(s) / first non-comment, non-blank line
    # - Embedded newline support, RegExp? or Proc?
-    def initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil)
+    def initialize(input_stream, delimiter: nil, buffer_size: 65_536, embedded_within: nil, original_file_name: nil)
      super(input_stream)
 
      @embedded_within = embedded_within
data/lib/io_streams/pgp.rb
CHANGED
@@ -2,85 +2,9 @@ require "open3"
 module IOStreams
   # Read/Write PGP/GPG file or stream.
   #
-  # Example Setup:
-  #
-  #   1. Install OpenPGP
-  #      Mac OSX (homebrew) : `brew install gpg2`
-  #      Redhat Linux: `rpm install gpg2`
-  #
-  #   2. # Generate senders private and public key
-  #      IOStreams::Pgp.generate_key(name: 'Sender', email: 'sender@example.org', passphrase: 'sender_passphrase')
-  #
-  #   3. # Generate receivers private and public key
-  #      IOStreams::Pgp.generate_key(name: 'Receiver', email: 'receiver@example.org', passphrase: 'receiver_passphrase')
-  #
-  # Example 1:
-  #
-  #   # Generate encrypted file for a specific recipient and sign it with senders credentials
-  #   data = %w(this is some data that should be encrypted using pgp)
-  #   IOStreams::Pgp::Writer.open('secure.gpg', recipient: 'receiver@example.org', signer: 'sender@example.org', signer_passphrase: 'sender_passphrase') do |output|
-  #     data.each { |word| output.puts(word) }
-  #   end
-  #
-  #   # Decrypt the file sent to `receiver@example.org` using its private key
-  #   # Recipient must also have the senders public key to verify the signature
-  #   IOStreams::Pgp::Reader.open('secure.gpg', passphrase: 'receiver_passphrase') do |stream|
-  #     while !stream.eof?
-  #       p stream.read(10)
-  #       puts
-  #     end
-  #   end
-  #
-  # Example 2:
-  #
-  #   # Default user and passphrase to sign the output file:
-  #   IOStreams::Pgp::Writer.default_signer = 'sender@example.org'
-  #   IOStreams::Pgp::Writer.default_signer_passphrase = 'sender_passphrase'
-  #
-  #   # Default passphrase for decrypting recipients files.
-  #   # Note: Usually this would be the senders passphrase, but in this example
-  #   #       it is decrypting the file intended for the recipient.
-  #   IOStreams::Pgp::Reader.default_passphrase = 'receiver_passphrase'
-  #
-  #   # Generate encrypted file for a specific recipient and sign it with senders credentials
-  #   data = %w(this is some data that should be encrypted using pgp)
-  #   IOStreams.writer('secure.gpg', streams: {pgp: {recipient: 'receiver@example.org'}}) do |output|
-  #     data.each { |word| output.puts(word) }
-  #   end
-  #
-  #   # Decrypt the file sent to `receiver@example.org` using its private key
-  #   # Recipient must also have the senders public key to verify the signature
-  #   IOStreams.reader('secure.gpg') do |stream|
-  #     while data = stream.read(10)
-  #       p data
-  #     end
-  #   end
-  #
-  # FAQ:
-  # - If you get not trusted errors
-  #     gpg --edit-key sender@example.org
-  #   Select highest level: 5
-  #
-  # Delete test keys:
-  #   IOStreams::Pgp.delete_keys(email: 'sender@example.org', private: true)
-  #   IOStreams::Pgp.delete_keys(email: 'receiver@example.org', private: true)
-  #
   # Limitations
   # - Designed for processing larger files since a process is spawned for each file processed.
   # - For small in memory files or individual emails, use the 'opengpgme' library.
-  #
-  # Compression Performance:
-  #   Running tests on an Early 2015 Macbook Pro Dual Core with Ruby v2.3.1
-  #
-  #   Input file: test.log 3.6GB
-  #     :none:  size: 3.6GB  write: 52s   read: 45s
-  #     :zip:   size: 411MB  write: 75s   read: 31s
-  #     :zlib:  size: 241MB  write: 66s   read: 23s  ( 756KB Memory )
-  #     :bzip2: size: 129MB  write: 430s  read: 130s ( 5MB Memory )
-  #
-  # Notes:
-  #   - Tested against gnupg v1.4.21 and v2.0.30
-  #   - Does not work yet with gnupg v2.1. Pull Requests welcome.
   module Pgp
     autoload :Reader, "io_streams/pgp/reader"
     autoload :Writer, "io_streams/pgp/writer"
data/lib/io_streams/record/reader.rb
CHANGED
@@ -7,7 +7,7 @@ module IOStreams
    # Read a record at a time from a line stream
    # Note:
    # - The supplied stream _must_ already be a line stream, or a stream that responds to :each
-    def self.stream(line_reader,
+    def self.stream(line_reader, **args)
      # Pass-through if already a record reader
      return yield(line_reader) if line_reader.is_a?(self.class)
 
@@ -17,7 +17,7 @@ module IOStreams
    # When reading from a file also add the line reader stream
    def self.file(file_name, original_file_name: file_name, delimiter: $/, **args)
      IOStreams::Line::Reader.file(file_name, original_file_name: original_file_name, delimiter: delimiter) do |io|
-        yield new(io, **args)
+        yield new(io, original_file_name: original_file_name, **args)
      end
    end
 
@@ -25,19 +25,44 @@ module IOStreams
    # Parse a delimited data source.
    #
    # Parameters
-    #   delimited: [#each]
-    #     Anything that returns one line / record at a time when #each is called on it.
-    #
    #   format: [Symbol]
    #     :csv, :hash, :array, :json, :psv, :fixed
    #
-    #
-    def initialize(line_reader, cleanse_header: true, **args)
+    #   file_name: [String]
+    #     When `:format` is not supplied the file name can be used to infer the required format.
+    #     Optional. Default: nil
+    #
+    #   format_options: [Hash]
+    #     Any specialized format specific options. For example, `:fixed` format requires the file definition.
+    #
+    #   columns [Array<String>]
+    #     The header columns when the file does not include a header row.
+    #     Note:
+    #       It is recommended to keep all columns as strings to avoid any issues when persistence
+    #       with MongoDB when it converts symbol keys to strings.
+    #
+    #   allowed_columns [Array<String>]
+    #     List of columns to allow.
+    #     Default: nil ( Allow all columns )
+    #     Note:
+    #       When supplied any columns that are rejected will be returned in the cleansed columns
+    #       as nil so that they can be ignored during processing.
+    #
+    #   required_columns [Array<String>]
+    #     List of columns that must be present, otherwise an Exception is raised.
+    #
+    #   skip_unknown [true|false]
+    #     true:
+    #       Skip columns not present in the `allowed_columns` by cleansing them to nil.
+    #       #as_hash will skip these additional columns entirely as if they were not in the file at all.
+    #     false:
+    #       Raises Tabular::InvalidHeader when a column is supplied that is not in the whitelist.
+    def initialize(line_reader, cleanse_header: true, original_file_name: nil, **args)
      unless line_reader.respond_to?(:each)
        raise(ArgumentError, "Stream must be a IOStreams::Line::Reader or implement #each")
      end
 
-      @tabular = IOStreams::Tabular.new(**args)
+      @tabular = IOStreams::Tabular.new(file_name: original_file_name, **args)
      @line_reader = line_reader
      @cleanse_header = cleanse_header
    end
data/lib/io_streams/record/writer.rb
CHANGED
@@ -9,7 +9,7 @@ module IOStreams
    # Write a record as a Hash at a time to a stream.
    # Note:
    # - The supplied stream _must_ already be a line stream, or a stream that responds to :<<
-    def self.stream(line_writer,
+    def self.stream(line_writer, **args)
      # Pass-through if already a record writer
      return yield(line_writer) if line_writer.is_a?(self.class)
 
@@ -19,7 +19,7 @@ module IOStreams
    # When writing to a file also add the line writer stream
    def self.file(file_name, original_file_name: file_name, delimiter: $/, **args, &block)
      IOStreams::Line::Writer.file(file_name, original_file_name: original_file_name, delimiter: delimiter) do |io|
-        yield new(io, **args, &block)
+        yield new(io, original_file_name: original_file_name, **args, &block)
      end
    end
 
@@ -27,17 +27,42 @@ module IOStreams
    # Parse a delimited data source.
    #
    # Parameters
-    #   delimited: [#<<]
-    #     Anything that accepts a line / record at a time when #<< is called on it.
-    #
    #   format: [Symbol]
    #     :csv, :hash, :array, :json, :psv, :fixed
    #
-    #
-    def initialize(line_writer, columns: nil, **args)
+    #   file_name: [String]
+    #     When `:format` is not supplied the file name can be used to infer the required format.
+    #     Optional. Default: nil
+    #
+    #   format_options: [Hash]
+    #     Any specialized format specific options. For example, `:fixed` format requires the file definition.
+    #
+    #   columns [Array<String>]
+    #     The header columns when the file does not include a header row.
+    #     Note:
+    #       It is recommended to keep all columns as strings to avoid any issues when persistence
+    #       with MongoDB when it converts symbol keys to strings.
+    #
+    #   allowed_columns [Array<String>]
+    #     List of columns to allow.
+    #     Default: nil ( Allow all columns )
+    #     Note:
+    #       When supplied any columns that are rejected will be returned in the cleansed columns
+    #       as nil so that they can be ignored during processing.
+    #
+    #   required_columns [Array<String>]
+    #     List of columns that must be present, otherwise an Exception is raised.
+    #
+    #   skip_unknown [true|false]
+    #     true:
+    #       Skip columns not present in the `allowed_columns` by cleansing them to nil.
+    #       #as_hash will skip these additional columns entirely as if they were not in the file at all.
+    #     false:
+    #       Raises Tabular::InvalidHeader when a column is supplied that is not in the whitelist.
+    def initialize(line_writer, columns: nil, original_file_name: nil, **args)
      raise(ArgumentError, "Stream must be a IOStreams::Line::Writer or implement #<<") unless line_writer.respond_to?(:<<)
 
-      @tabular = IOStreams::Tabular.new(columns: columns, **args)
+      @tabular = IOStreams::Tabular.new(columns: columns, file_name: original_file_name, **args)
      @line_writer = line_writer
 
      # Render header line when `columns` is supplied.
data/lib/io_streams/row/reader.rb
CHANGED
@@ -5,7 +5,7 @@ module IOStreams
    # Read a line as an Array at a time from a stream.
    # Note:
    # - The supplied stream _must_ already be a line stream, or a stream that responds to :each
-    def self.stream(line_reader,
+    def self.stream(line_reader, **args)
      # Pass-through if already a row reader
      return yield(line_reader) if line_reader.is_a?(self.class)
 
@@ -15,7 +15,7 @@ module IOStreams
    # When reading from a file also add the line reader stream
    def self.file(file_name, original_file_name: file_name, delimiter: $/, **args)
      IOStreams::Line::Reader.file(file_name, original_file_name: original_file_name, delimiter: delimiter) do |io|
-        yield new(io, **args)
+        yield new(io, original_file_name: original_file_name, **args)
      end
    end
 
@@ -29,12 +29,12 @@ module IOStreams
    #     :csv, :hash, :array, :json, :psv, :fixed
    #
    # For all other parameters, see Tabular::Header.new
-    def initialize(line_reader, cleanse_header: true, **args)
+    def initialize(line_reader, cleanse_header: true, original_file_name: nil, **args)
      unless line_reader.respond_to?(:each)
        raise(ArgumentError, "Stream must be a IOStreams::Line::Reader or implement #each")
      end
 
-      @tabular = IOStreams::Tabular.new(**args)
+      @tabular = IOStreams::Tabular.new(file_name: original_file_name, **args)
      @line_reader = line_reader
      @cleanse_header = cleanse_header
    end
data/lib/io_streams/row/writer.rb
CHANGED
@@ -12,7 +12,7 @@ module IOStreams
    #
    # Note:
    # - The supplied stream _must_ already be a line stream, or a stream that responds to :<<
-    def self.stream(line_writer,
+    def self.stream(line_writer, **args)
      # Pass-through if already a row writer
      return yield(line_writer) if line_writer.is_a?(self.class)
 
@@ -22,7 +22,7 @@ module IOStreams
    # When writing to a file also add the line writer stream
    def self.file(file_name, original_file_name: file_name, delimiter: $/, **args, &block)
      IOStreams::Line::Writer.file(file_name, original_file_name: original_file_name, delimiter: delimiter) do |io|
-        yield new(io, **args, &block)
+        yield new(io, original_file_name: original_file_name, **args, &block)
      end
    end
 
@@ -36,10 +36,10 @@ module IOStreams
    #     :csv, :hash, :array, :json, :psv, :fixed
    #
    # For all other parameters, see Tabular::Header.new
-    def initialize(line_writer, columns: nil, **args)
+    def initialize(line_writer, columns: nil, original_file_name: nil, **args)
      raise(ArgumentError, "Stream must be a IOStreams::Line::Writer or implement #<<") unless line_writer.respond_to?(:<<)
 
-      @tabular = IOStreams::Tabular.new(columns: columns, **args)
+      @tabular = IOStreams::Tabular.new(columns: columns, file_name: original_file_name, **args)
      @line_writer = line_writer
 
      # Render header line when `columns` is supplied.
data/lib/io_streams/stream.rb
CHANGED
@@ -282,20 +282,20 @@ module IOStreams
    def line_reader(embedded_within: nil, **args)
      embedded_within = '"' if embedded_within.nil? && builder.file_name&.include?(".csv")
 
-      stream_reader { |io| yield IOStreams::Line::Reader.new(io, embedded_within: embedded_within, **args) }
+      stream_reader { |io| yield IOStreams::Line::Reader.new(io, original_file_name: builder.file_name, embedded_within: embedded_within, **args) }
    end
 
    # Iterate over a file / stream returning each line as an array, one at a time.
    def row_reader(delimiter: nil, embedded_within: nil, **args)
      line_reader(delimiter: delimiter, embedded_within: embedded_within) do |io|
-        yield IOStreams::Row::Reader.new(io, **args)
+        yield IOStreams::Row::Reader.new(io, original_file_name: builder.file_name, **args)
      end
    end
 
    # Iterate over a file / stream returning each line as a hash, one at a time.
    def record_reader(delimiter: nil, embedded_within: nil, **args)
      line_reader(delimiter: delimiter, embedded_within: embedded_within) do |io|
-        yield IOStreams::Record::Reader.new(io, **args)
+        yield IOStreams::Record::Reader.new(io, original_file_name: builder.file_name, **args)
      end
    end
 
@@ -306,19 +306,19 @@ module IOStreams
    def line_writer(**args, &block)
      return block.call(io_stream) if io_stream&.is_a?(IOStreams::Line::Writer)
 
-      writer { |io| IOStreams::Line::Writer.stream(io, **args, &block) }
+      writer { |io| IOStreams::Line::Writer.stream(io, original_file_name: builder.file_name, **args, &block) }
    end
 
    def row_writer(delimiter: $/, **args, &block)
      return block.call(io_stream) if io_stream&.is_a?(IOStreams::Row::Writer)
 
-      line_writer(delimiter: delimiter) { |io| IOStreams::Row::Writer.stream(io, **args, &block) }
+      line_writer(delimiter: delimiter) { |io| IOStreams::Row::Writer.stream(io, original_file_name: builder.file_name, **args, &block) }
    end
 
    def record_writer(delimiter: $/, **args, &block)
      return block.call(io_stream) if io_stream&.is_a?(IOStreams::Record::Writer)
 
-      line_writer(delimiter: delimiter) { |io| IOStreams::Record::Writer.stream(io, **args, &block) }
+      line_writer(delimiter: delimiter) { |io| IOStreams::Record::Writer.stream(io, original_file_name: builder.file_name, **args, &block) }
    end
  end
 end
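Because `builder.file_name` is now forwarded from `Stream` into the line/row/record readers and writers, the same format inference should also apply to wrapped in-memory streams when a file name is supplied, as in the (removed) README example that used `IOStreams.stream(io, file_name: ...)`. A small sketch under that assumption:

```ruby
require "iostreams"
require "stringio"
require "json"

io = StringIO.new
# The file_name: hint becomes builder.file_name and is passed down to Record::Writer,
# which hands it to Tabular as file_name:, so the :json format is inferred from ".json".
IOStreams.stream(io, file_name: "sample.json").writer(:hash) do |stream|
  stream << {"name" => "Jack Jones", "login" => "jjones"}
end
puts io.string # {"name":"Jack Jones","login":"jjones"}
```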
data/lib/io_streams/tabular.rb
CHANGED
@@ -52,7 +52,35 @@ module IOStreams
    #   format: [Symbol]
    #     :csv, :hash, :array, :json, :psv, :fixed
    #
-    #
+    #   file_name: [String]
+    #     When `:format` is not supplied the file name can be used to infer the required format.
+    #     Optional. Default: nil
+    #
+    #   format_options: [Hash]
+    #     Any specialized format specific options. For example, `:fixed` format requires the file definition.
+    #
+    #   columns [Array<String>]
+    #     The header columns when the file does not include a header row.
+    #     Note:
+    #       It is recommended to keep all columns as strings to avoid any issues when persistence
+    #       with MongoDB when it converts symbol keys to strings.
+    #
+    #   allowed_columns [Array<String>]
+    #     List of columns to allow.
+    #     Default: nil ( Allow all columns )
+    #     Note:
+    #       When supplied any columns that are rejected will be returned in the cleansed columns
+    #       as nil so that they can be ignored during processing.
+    #
+    #   required_columns [Array<String>]
+    #     List of columns that must be present, otherwise an Exception is raised.
+    #
+    #   skip_unknown [true|false]
+    #     true:
+    #       Skip columns not present in the `allowed_columns` by cleansing them to nil.
+    #       #as_hash will skip these additional columns entirely as if they were not in the file at all.
+    #     false:
+    #       Raises Tabular::InvalidHeader when a column is supplied that is not in the whitelist.
    def initialize(format: nil, file_name: nil, format_options: nil, **args)
      @header = Header.new(**args)
      klass =
data/lib/io_streams/utils.rb
CHANGED
data/lib/io_streams/version.rb
CHANGED
data/test/io_streams_test.rb
CHANGED
@@ -1,8 +1,24 @@
 require_relative "test_helper"
+require "json"
 
 module IOStreams
   class PathTest < Minitest::Test
     describe IOStreams do
+      let :records do
+        [
+          {"name" => "Jack Jones", "login" => "jjones"},
+          {"name" => "Jill Smith", "login" => "jsmith"}
+        ]
+      end
+
+      let :expected_json do
+        records.collect(&:to_json).join("\n") + "\n"
+      end
+
+      let :json_file_name do
+        "/tmp/io_streams/abc.json"
+      end
+
       describe ".root" do
         it "return default path" do
           path = ::File.expand_path(::File.join(__dir__, "../tmp/default"))
@@ -60,6 +76,39 @@ module IOStreams
          IOStreams.path("s3://a.xyz")
          assert_equal :s3, path
        end
+
+        it "hash writer detects json format from file name" do
+          path = IOStreams.path("/tmp/io_streams/abc.json")
+          path.writer(:hash) do |io|
+            records.each { |hash| io << hash }
+          end
+          actual = path.read
+          path.delete
+          assert_equal expected_json, actual
+        end
+
+        it "hash reader detects json format from file name" do
+          ::File.open(json_file_name, "wb") { |file| file.write(expected_json) }
+          rows = []
+          path = IOStreams.path("/tmp/io_streams/abc.json")
+          path.each(:hash) do |row|
+            rows << row
+          end
+          actual = rows.collect(&:to_json).join("\n") + "\n"
+          path.delete
+          assert_equal expected_json, actual
+        end
+
+        it "array writer detects json format from file name" do
+          path = IOStreams.path("/tmp/io_streams/abc.json")
+          path.writer(:array, columns: %w[name login]) do |io|
+            io << ["Jack Jones", "jjones"]
+            io << ["Jill Smith", "jsmith"]
+          end
+          actual = path.read
+          path.delete
+          assert_equal expected_json, actual
+        end
      end
 
      describe ".temp_file" do
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: iostreams
 version: !ruby/object:Gem::Version
-  version: 1.2.0
+  version: 1.2.1
 platform: ruby
 authors:
 - Reid Morrison
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-
+date: 2020-05-19 00:00:00.000000000 Z
 dependencies: []
 description:
 email:
@@ -60,7 +60,6 @@ files:
 - lib/io_streams/tabular/parser/psv.rb
 - lib/io_streams/tabular/utility/csv_row.rb
 - lib/io_streams/utils.rb
-- lib/io_streams/utils/reliable_http.rb
 - lib/io_streams/version.rb
 - lib/io_streams/writer.rb
 - lib/io_streams/xlsx/reader.rb
@@ -131,7 +130,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 3.0.
+rubygems_version: 3.0.6
 signing_key:
 specification_version: 4
 summary: Input and Output streaming for Ruby.
data/lib/io_streams/utils/reliable_http.rb
DELETED
@@ -1,98 +0,0 @@
-require "net/http"
-require "uri"
-module IOStreams
-  module Utils
-    class ReliableHTTP
-      attr_reader :username, :password, :max_redirects, :url
-
-      # Reliable HTTP implementation with support for:
-      # * HTTP Redirects
-      # * Basic authentication
-      # * Raises an exception anytime the HTTP call is not successful.
-      # * TODO: Automatic retries with a logarithmic backoff strategy.
-      #
-      # Parameters:
-      #   url: [String]
-      #     URI of the file to download.
-      #     Example:
-      #       https://www5.fdic.gov/idasp/Offices2.zip
-      #       http://hostname/path/file_name
-      #
-      #     Full url showing all the optional elements that can be set via the url:
-      #       https://username:password@hostname/path/file_name
-      #
-      #   username: [String]
-      #     When supplied, basic authentication is used with the username and password.
-      #
-      #   password: [String]
-      #     Password to use use with basic authentication when the username is supplied.
-      #
-      #   max_redirects: [Integer]
-      #     Maximum number of http redirects to follow.
-      def initialize(url, username: nil, password: nil, max_redirects: 10)
-        uri = URI.parse(url)
-        unless %w[http https].include?(uri.scheme)
-          raise(ArgumentError, "Invalid URL. Required Format: 'http://<host_name>/<file_name>', or 'https://<host_name>/<file_name>'")
-        end
-
-        @username      = username || uri.user
-        @password      = password || uri.password
-        @max_redirects = max_redirects
-        @url           = url
-      end
-
-      # Read a file using an http get.
-      #
-      # For example:
-      #   IOStreams.path('https://www5.fdic.gov/idasp/Offices2.zip').reader {|file| puts file.read}
-      #
-      # Read the file without unzipping and streaming the first file in the zip:
-      #   IOStreams.path('https://www5.fdic.gov/idasp/Offices2.zip').stream(:none).reader {|file| puts file.read}
-      #
-      # Notes:
-      # * Since Net::HTTP download only supports a push stream, the data is streamed into a tempfile first.
-      def post(&block)
-        handle_redirects(Net::HTTP::Post, url, max_redirects, &block)
-      end
-
-      def get(&block)
-        handle_redirects(Net::HTTP::Get, url, max_redirects, &block)
-      end
-
-      private
-
-      def handle_redirects(request_class, uri, max_redirects, &block)
-        uri    = URI.parse(uri) unless uri.is_a?(URI)
-        result = nil
-        raise(IOStreams::Errors::CommunicationsFailure, "Too many redirects") if max_redirects < 1
-
-        Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == "https") do |http|
-          request = request_class.new(uri)
-          request.basic_auth(username, password) if username
-
-          http.request(request) do |response|
-            raise(IOStreams::Errors::CommunicationsFailure, "Invalid URL: #{uri}") if response.is_a?(Net::HTTPNotFound)
-
-            if response.is_a?(Net::HTTPUnauthorized)
-              raise(IOStreams::Errors::CommunicationsFailure, "Authorization Required: Invalid :username or :password.")
-            end
-
-            if response.is_a?(Net::HTTPRedirection)
-              new_uri = response["location"]
-              return handle_redirects(request_class, new_uri, max_redirects: max_redirects - 1, &block)
-            end
-
-            unless response.is_a?(Net::HTTPSuccess)
-              raise(IOStreams::Errors::CommunicationsFailure, "Invalid response code: #{response.code}")
-            end
-
-            yield(response) if block_given?
-
-            result = response
-          end
-        end
-        result
-      end
-    end
-  end
-end