oak 0.4.1 → 0.4.2
- checksums.yaml +4 -4
- data/.travis.yml +5 -3
- data/CHANGELOG.md +10 -1
- data/DESIDERATA.md +9 -4
- data/DESIGN.md +410 -0
- data/ENCRYPTION.md +893 -0
- data/README.md +2 -6
- data/bin/oak +1 -0
- data/bin/oak.rb +122 -124
- data/lib/oak.rb +13 -0
- data/lib/oak/version.rb +1 -1
- metadata +5 -3
- data/bin/oak +0 -245
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c5a9c1b9b0290b6e92de16b26030631e9f795fec
+  data.tar.gz: 898af2d521ee3fba60258c21411fe4dcc839ce16
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f724eac8c05c5f948192ac73c70c136623beae95289ddff8f2d1b024d0351a2f55032841c347e54b0caf6c8d3b604f698b4e86981f716a988599b709c79e95b9
+  data.tar.gz: 75137b8bdf2d4807c018253f7535335bad3d74b94afa7c71ebd085a03de423015dfa6cd12e95d3d3c8f61ce4c8d366ca2cc7ecb5f58e89686e6e2a7cf4ca1451
data/.travis.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,12 @@
+## 0.4.2 (TBD)
+
+- `oak`, `oak.rb` patched to play nicer with bundler
+- Fix exception type bug in ruby >= 2.3 and openssl >= 2.0.0 when
+  parsing some corrupt encrypted data.
+- Updated .travis.yml to update and expand the versions of Ruby covered.
+- Migrate DESIGN.md and ENCRYPTION.md in from Google Docs.
+
+
 ## 0.4.1 (2018-10-01)
 
 - `oak`, `oak.rb` published as executables from gem.
@@ -8,7 +17,7 @@
 
 ## 0.4.0 (2018-09-24)
 
-- First export from ProsperWorks/ALI.
+- First export from [ProsperWorks/ALI](https://github.com/ProsperWorks/ALI).
 - First conversion to gem.
 - Not open (yet).
 - OAK3 emitted by default.
|
data/DESIDERATA.md
CHANGED
@@ -266,10 +266,15 @@ than hypothesis.
 
 ## Encryption Choices
 
-
-
-bodies
-
+OAK has been live in `ALI` (copper.com's primary web service) in our
+Redis cache layer since 2016-06-02 and for archiving correspondence
+bodies in S3 since 2016-07-06.
+
+There had been only Rubocop updates and nary a bugfix since
+2016-07-01.
+
+Encryption is the first extension since OAK first went live.
+
 
 ### Encryption-in-OAK Design Decisions (see arch doc for discussion):
 
|
data/DESIGN.md
ADDED
@@ -0,0 +1,410 @@
# OAK: The Object ArKive

## All that to avoid JSON?

OAK is a serialization and envelope format which encodes simple Ruby
objects as strings. It bundles together a variety of well-understood
encoding libraries into a succinct self-describing package.

This document covers the existing OAK format and the proposed
encryption extension.

OAK is comparable to JSON, YAML, and Marshal. OAK is more precise than
JSON or YAML, but slightly more Ruby-esque, and supports fewer types
than Marshal. OAK also has features similar to OpenPGP
([https://tools.ietf.org/html/rfc4880](https://tools.ietf.org/html/rfc4880))
(though not, so far, encryption).

The main value proposition for OAK is operational flexibility. OAK
lets you defer choices between compression, checksumming, and 7-bit
cleanliness algorithms until after a system is live and under load.

As of 2017-09-13, OAK is used by `ALI` (copper.com's primary web
service) for volatile caches in Redis, durable archives in S3, and for
7-bit clean encoding of complex configuration data.

Author: [JHW](https://github.com/jhwillett)
Advisors: Marshall, Gerald, Kelly, Neil

Here is a sneak preview of some OAK strings:
```
$ echo 'HelloWorld!' | bin/oak.rb --format none
oak_3CNN_1336599037_18_F1SU11_HelloWorld!_ok

$ echo 'HelloWorld!' | bin/oak.rb
oak_3CNB_1336599037_24_RjFTVTExX0hlbGxvV29ybGQh_ok

$ echo 'HelloWorld!' | bin/oak.rb --compression lz4 --force
oak_3C4B_1336599037_28_EvADRjFTVTExX0hlbGxvV29ybGQh_ok

$ echo 'HelloWorld!' | bin/oak.rb | bin/oak.rb --mode decode-lines
HelloWorld!
```

## OAK Version History

This arch document was prepared retroactively, after OAK had been live
for 16 months.

* [https://github.com/ProsperWorks/ALI/pull/1245](https://github.com/ProsperWorks/ALI/pull/1245) **oak**
  * Merged to major_2016_04_dragonfruit.
  * Initial implementation. Not integrated or active.
  * Version oak_1
* [https://github.com/ProsperWorks/ALI/pull/1350](https://github.com/ProsperWorks/ALI/pull/1350) **oak-in-summary_accessor**
  * Merged to major_2016_04_dragonfruit.
  * Reworked SummaryAccessor with ENV flag to switch to OAK serialization.
  * Exposure to live data revealed some issues missed in the lab.
* [https://github.com/ProsperWorks/ALI/pull/1631](https://github.com/ProsperWorks/ALI/pull/1631) **oak-remove-json-and-yaml**
  * Merged to major_2016_06_goat.
  * Simplified down to only the one serialization algorithm, "FRIZZY".
  * Version oak_2
* [https://github.com/ProsperWorks/ALI/pull/1618](https://github.com/ProsperWorks/ALI/pull/1618) **fix-oak-utf8**
  * Merged in major_2016_06_goat.
  * Fixed issues uncovered in SummaryAccessor ramp.
  * Version oak_3
* [https://github.com/ProsperWorks/ALI/pull/1655](https://github.com/ProsperWorks/ALI/pull/1655) **volatile_cache_accessor**
  * Merged in major_2016_06_goat.
  * Introduced RedisCache. Later PRs use RedisCache for:
    * SummaryAccessor (OAK)
    * RussianDoll caches (Marshal+OAK)
    * S3 cache (OAK)
    * COMPANY_ID_CACHE (JSON)
  * Commits to OAK for volatile use cases.
* [https://github.com/ProsperWorks/ALI/pull/1757](https://github.com/ProsperWorks/ALI/pull/1757) **oak-woe-2016-06-30**
  * Merged in major_2016_07_hotpocket.
  * Fixed a regexp which had broken some Float parsing.
* [https://github.com/ProsperWorks/ALI/pull/1724](https://github.com/ProsperWorks/ALI/pull/1724) **correspondences-in-s3-fixes**
  * Merged in major_2016_07_hotpocket.
  * Stores Correspondence bodies in S3 as OAK strings.
  * Commits to OAK for durable use cases at oak_3.

See also [OAK: Encryption-in-OAK](ENCRYPTION.md) for later
developments and the introduction of OAK_4.

## Overview of OAK strings

`OAK.encode` includes a manifest of its options explicitly in the
OAK string output. There is no need for an options back channel to
`OAK.decode`.

We could encode every OAK string with different options, and
`OAK.decode` can reverse all of them with no extra info.
```
>> OAK.encode('HelloWorld',redundancy: :none)
=> "oak_3NNB_0_23_RjFTVTEwX0hlbGxvV29ybGQ_ok"

>> OAK.encode('HelloWorld',format: :none,redundancy: :none)
=> "oak_3NNN_0_17_F1SU10_HelloWorld_ok"

>> OAK.encode('HelloWorld',compression: :zlib,force: true)
=> "oak_3CZB_3789329355_34_eJxzMwwONTSI90jNyckPzy_KSQEAL2gF3A_ok"

>> OAK.decode(OAK.encode('HelloWorld',redundancy: :none))
=> "HelloWorld"

>> OAK.decode(OAK.encode('HelloWorld',format: :none,redundancy: :none))
=> "HelloWorld"

>> OAK.decode(OAK.encode('HelloWorld',compression: :zlib,force: true))
=> "HelloWorld"
```

We use this to defer our choice of time-space tradeoffs until runtime.
`ALI`'s `Caches::RedisCache` mechanism enshrines this pattern by parsing
OAK options from the ENV:
```
# in Caches::RedisCache#_serialize
OAK.encode(
  pre_obj,
  redundancy:  (ENV["CACHE_OAK_REDUNDANCY_#{name}"]  || 'sha1').intern,
  compression: (ENV["CACHE_OAK_COMPRESSION_#{name}"] || 'bzip2').intern,
  force:       (ENV["CACHE_OAK_FORCE_#{name}"] == 'true'),
  format:      (ENV["CACHE_OAK_FORMAT_#{name}"]      || 'base64').intern,
)
```
The `:redundancy` and `:compression` defaults here (`:sha1` and
`:bzip2`) differ from those of `OAK.encode` (`:crc32` and `:none`).
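
Pulled out of the cache class, the same ENV-driven option parsing fits in a small helper that can be tested without touching the real environment. This is a minimal sketch, not `ALI`'s code: the method name `oak_options_from_env` and its injectable `env` argument are hypothetical, while the ENV keys and fallback values are copied from the snippet above.

```ruby
# Hypothetical helper mirroring the option parsing in
# Caches::RedisCache#_serialize above. `env` defaults to the real ENV but
# can be any Hash-like object, which keeps the parsing easy to test.
def oak_options_from_env(name, env = ENV)
  {
    redundancy:  (env["CACHE_OAK_REDUNDANCY_#{name}"]  || 'sha1').intern,
    compression: (env["CACHE_OAK_COMPRESSION_#{name}"] || 'bzip2').intern,
    force:       env["CACHE_OAK_FORCE_#{name}"] == 'true',
    format:      (env["CACHE_OAK_FORMAT_#{name}"]      || 'base64').intern
  }
end

# Usage with the oak gem loaded (not exercised here):
#   OAK.encode(obj, **oak_options_from_env('summary'))
```

Passing `{}` as `env` yields the fallbacks (`:sha1`, `:bzip2`, `false`, `:base64`), so a deployment only sets the variables it wants to override.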

Here is a quick parse of some OAK strings.
```
>> OAK.encode('Hi',format: :none)
=> "oak_3CNN_3475096913_8_F1SU2_Hi_ok"
oak_3             # OAK ver 3
C                 # checksum Crc32
N                 # compression None
N                 # format None
3475096913        # checksum value
8                 # 8 data bytes
F1SU2_Hi          # data
ok                # end of sequence

>> OAK.encode([1,'2'],redundancy: :none,format: :none)
=> "oak_3NNN_0_15_F3A2_1_2I1SU1_2_ok"
oak_3             # OAK ver 3
N                 # checksum None
N                 # compression None
N                 # format None
0                 # checksum value
15                # 15 data bytes
F3A2_1_2I1SU1_2   # data
ok                # end of sequence
```
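
Because every header field is either fixed-width or length-prefixed, the annotated layout can be parsed without splitting on underscores, which the payload itself may contain (as in the `:zlib` example in the overview). The sketch below assumes only the oak_3 layout as annotated here; `parse_oak3` and the flag tables are hypothetical names, it is not the gem's `OAK.decode`, and it neither verifies the checksum nor decodes the payload. Checking the declared length against the `_ok` terminator is enough to reject a truncated string.

```ruby
# Flag tables for the oak_3 header, taken from the option sections below.
OAK3_REDUNDANCY  = { 'C' => :crc32, 'N' => :none, 'S' => :sha1 }.freeze
OAK3_COMPRESSION = { 'N' => :none, '4' => :lz4, 'Z' => :zlib,
                     'B' => :bzip2, 'M' => :lzma }.freeze
OAK3_FORMAT      = { 'B' => :base64, 'N' => :none }.freeze

# oak_3<redundancy><compression><format>_<checksum>_<length>_ then <data>_ok
OAK3_HEADER = /\Aoak_3([CNS])([N4ZBM])([BN])_([0-9a-f]+)_(\d+)_/

def parse_oak3(str)
  m = OAK3_HEADER.match(str)
  raise ArgumentError, 'not an oak_3 string' unless m

  length = Integer(m[5], 10)
  rest   = str[m.end(0)..-1]
  # Slice the data by its declared byte length, then demand exactly "_ok".
  if rest.bytesize != length + 3 || rest.byteslice(length, 3) != '_ok'
    raise ArgumentError, 'truncated or unterminated oak_3 string'
  end

  {
    redundancy:  OAK3_REDUNDANCY.fetch(m[1]),
    compression: OAK3_COMPRESSION.fetch(m[2]),
    format:      OAK3_FORMAT.fetch(m[3]),
    checksum:    m[4],
    data:        rest.byteslice(0, length)
  }
end
```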
The FRIZZY format encodes all the objects in the graph as a vector,
with element 0 implicitly the top-level object and compound objects
encoded with indices into the main object vector.
```
F        # FRIZZY serializer
3        # 3 objects
A2_1_2   # obj 0 an Array
         #   w/ 2 slots:
         #   obj 1 and
         #   obj 2
I1       # obj 1 Int 1
SU1_2    # obj 2 Str
         #   UTF-8
         #   1 string byte
         #   bytes '2'
```
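
To make the object-vector idea concrete, here is a minimal Ruby sketch. It is not the FRIZZY wire format: `flatten_graph` is a hypothetical helper, it only distinguishes Arrays from scalar leaves, and its breadth-first slot numbering is an assumption. What it shares with FRIZZY is that each distinct object, compared by identity rather than equality, gets exactly one slot, so shared references and cycles come out as repeated indices instead of copies or infinite recursion.

```ruby
# Flatten an object graph into a vector with one slot per distinct object
# (compared by identity, not ==). Slot 0 is the root; an Array becomes
# [:array, [child slot indices]] and any other object becomes
# [:<lowercased class name>, value]. Slots are numbered breadth-first.
def flatten_graph(root)
  slot_of = {}.compare_by_identity
  order   = []
  queue   = [root]
  until queue.empty?
    obj = queue.shift
    next if slot_of.key?(obj)   # already numbered: shared or cyclic
    slot_of[obj] = order.size
    order << obj
    queue.concat(obj) if obj.is_a?(Array)
  end
  order.map do |obj|
    if obj.is_a?(Array)
      [:array, obj.map { |child| slot_of.fetch(child) }]
    else
      [obj.class.name.downcase.to_sym, obj]
    end
  end
end
```

For `[1, '2']` this yields three slots, matching the object count of 3 in the breakdown above; for an array holding the same String object twice it yields only two.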

## option :redundancy => :crc32, :none, or :sha1

The `:redundancy` option selects which algorithm is used to compute
the checksum included by `OAK.encode`. This checksum lets `OAK.decode`
detect stream errors. The choice of `:redundancy` at encode time is
recorded in the 6th character of the OAK string.

`:redundancy => :crc32`, the default, is flagged as a `C` and is
`'%d' % Zlib.crc32(str)`.

Advantages:

* Encodes in only 12 bytes.
* Plenty good enough for all natural stream errors.

Disadvantages:

* Encodes in 12 whole bytes!
* Easily spoofed: not cryptographically secure.

`:redundancy => :none` is flagged as an `N` and is simply `_0`.

I chose to leave an explicit placeholder field, even at the cost of 2
useless bytes, to keep the number of metadata fields constant.

Advantages:

* Encodes in only 2 bytes.
* OAK will still catch truncation errors.

Disadvantages:

* Encodes in 2 whole bytes!
* OAK will not catch twiddled bits.
  * All compression algorithms do their own checksumming.
  * So this is only a disadvantage with `:compression => :none`.
  * Caution: `:force => false` is default.
    * So fallback to `:compression => :none` is likely.
    * So `:compression` is not a substitute for `:redundancy`.

`:redundancy => :sha1` is flagged as an `S`. It is very large and is
not recommended for most use cases.

Advantages:

* Harder for a malicious hacker to fool, though as an unkeyed digest it
  can still be recomputed by anyone who alters the payload.

Disadvantages:

* Encodes in 41 bytes!
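
The three checksum fields are simple to reproduce with Ruby's standard library. This is a sketch under an explicit assumption: this document does not say exactly which bytes (before or after compression and formatting) `OAK.encode` feeds to the checksum, so `payload` is simply whatever string is passed in, and `redundancy_field` and `checksum_ok?` are hypothetical helpers.

```ruby
require 'zlib'
require 'digest/sha1'

# Flag character and checksum field for each :redundancy option, as
# described above. Which bytes OAK really checksums is an assumption here.
def redundancy_field(payload, redundancy = :crc32)
  case redundancy
  when :crc32 then ['C', '%d' % Zlib.crc32(payload)]
  when :none  then ['N', '0']
  when :sha1  then ['S', Digest::SHA1.hexdigest(payload)]
  else raise ArgumentError, "unknown redundancy: #{redundancy.inspect}"
  end
end

# Recompute the field for the given flag and compare it with the stored one.
def checksum_ok?(payload, flag, value)
  redundancy = { 'C' => :crc32, 'N' => :none, 'S' => :sha1 }.fetch(flag)
  redundancy_field(payload, redundancy).last == value
end
```

Note that `checksum_ok?` always succeeds for the `N` flag, which is the "will not catch twiddled bits" disadvantage in executable form.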

## option :compression => :none, :lz4, :zlib, :bzip2, or :lzma

`:compression` is recorded in the 7th character and selects which
algorithm compresses the payload.

`:compression => :none`, the default, is flagged as an `N`. No compression.

* No compression or decompression costs.
* Human-readable if the source content is human readable and format is none.

`:compression => :lz4` is flagged as a `4`. [LZ4](https://github.com/lz4/lz4) is in the [Lempel-Ziv](https://en.wikipedia.org/wiki/LZ77_and_LZ78) family of dictionary-based redundancy eaters, which are popular for low-latency online systems.

* Low compression costs, low decompression costs.
* Compression ratios of 1.8-2.1 for English.

`:compression => :zlib` is flagged as a `Z`. [Zlib](https://en.wikipedia.org/wiki/DEFLATE) wraps RFC 1951 DEFLATE, the widely-used compression in pkzip, zip, and gzip. It crunches LZ77 with a follow-on Huffman step.

* Medium compression costs, medium decompression costs.
* Compression ratios around 4.0 for English.

`:compression => :bzip2` is flagged as a `B`. [Burrows-Wheeler transform](https://en.wikipedia.org/wiki/Bzip2) with some Huffman, delta, and sparse array encoding thrown in for good measure.

* Higher compression and decompression costs.
* Compression ratios around 5.0 for English.

`:compression => :lzma` is flagged as an `M` and uses the [Lempel-Ziv-Markov chain algorithm](https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm). It is an unusual choice for an online system.

* Very high compression costs, medium-low decompression costs.
* Compression ratios around 5.2 for English.

## option :force => false or true

By default, `OAK.encode` will fall back to `:compression => :none` if
the compressed string is larger than the source string.

`:force => true` overrides this fail-safe.
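
The fallback rule is easy to sketch with zlib, the only one of these compressors that ships in Ruby's standard library (lz4, bzip2, and lzma would need third-party gems, so they are left out). `compress_payload` and `decompress_payload` are hypothetical helpers, and comparing byte sizes is an assumption about how "larger" is measured. As a sanity check on using `Zlib::Deflate`, the `eJx` prefix of the `:zlib` example in the overview decodes from Base64 to the bytes `0x78 0x9C`, the standard zlib header that `Zlib::Deflate.deflate` emits at its default level.

```ruby
require 'zlib'

# Sketch of the :force fallback using zlib only. Returns [flag, bytes],
# where the flag is the compression character described above.
def compress_payload(str, compression: :zlib, force: false)
  return ['N', str] if compression == :none
  raise ArgumentError, 'this sketch only supports :zlib' unless compression == :zlib

  packed = Zlib::Deflate.deflate(str)
  # Without force, keep the original when compression makes it larger.
  return ['N', str] if !force && packed.bytesize > str.bytesize

  ['Z', packed]
end

def decompress_payload(flag, bytes)
  case flag
  when 'N' then bytes
  when 'Z' then Zlib::Inflate.inflate(bytes)
  else raise ArgumentError, "unsupported compression flag: #{flag}"
  end
end
```

For a tiny input like `'Hi'`, zlib's own header and trailer make the output larger, so without `force: true` the sketch keeps `'N'`; this is presumably why the `:zlib` example in the overview passes `force: true`.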

## option :format => :base64 or :none

The `:format` option selects the character set used in the main body
of the OAK string: the payload part, which follows the flags,
checksum, and length and precedes the `_ok` terminator. The choice of
`:format` at encode time is recorded in the 8th character of the OAK
string.

`:format => :base64`, the default, is flagged as a `B` and is
`Base64.urlsafe_encode64(str)` with any trailing `=` padding stripped.

Advantages:

* 7-bit clean (not that it matters in the TCP age).
* Prints prettily in ASCII terminals and editors.
* Easy to eyeball: no spaces, commas, slashes, colons, etc.
* In most text GUIs, double-clicking an OAK string auto-highlights
  exactly the whole string.
  * Personally, this is my favorite feature of OAK.

Disadvantages:

* Not human readable.
  * Only matters when compression is off.
* Size bloat: every 6 bits of input become one 8-bit character, so the
  output is 4/3 (about 133%) the size of the input.
  * Reverses some compression gains.

`:format => :none` is flagged as an `N` and does nothing. The source
string, including any zeros, bells, form feeds, and umlauts, is
catenated nakedly into the OAK string.

Advantages:

* Human-readable if the source content is human readable.
* No size bloat.

Disadvantages:

* Hard to parse visually in most cases.
* Nasty in logs when data is binary or compressed.
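
The padding and bloat rules can be checked with a short sketch. It uses `pack('m0')` plus a character swap, which is the same transformation `Base64.urlsafe_encode64` performs, so it needs no extra library; `format_base64` and `unformat_base64` are hypothetical names. The 17-byte FRIZZY string `F1SU10_HelloWorld` from the overview comes out as the 23 characters seen in `oak_3NNB_0_23_...`: 24 Base64 characters minus one `=` of padding.

```ruby
# URL-safe Base64 with trailing '=' padding stripped, as described above.
# [bytes].pack('m0') is strict (newline-free) Base64; tr swaps in the
# URL-safe alphabet, matching Base64.urlsafe_encode64.
def format_base64(bytes)
  [bytes].pack('m0').tr('+/', '-_').delete('=')
end

def unformat_base64(text)
  # Restore the stripped padding to a multiple of 4, then decode strictly.
  padded = text + '=' * ((4 - text.length % 4) % 4)
  padded.tr('-_', '+/').unpack1('m0')
end
```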

## Why FRIZZY? Why not JSON, YAML, XML, or Marshal?

JSON treats everything as a value type: it knows nothing about object
identity.

In Ruby, each distinct string literal is a distinct String
object. There is a subtle difference between a pair of equivalent
Strings and a pair of identical Strings:
```
>> arr = ['x','x'] ; arr[0].object_id == arr[1].object_id
=> false

>> str = 'x'; arr = [str,str] ; arr[0].object_id == arr[1].object_id
=> true
```
The JSON for `['x','x']` and `[str,str]` is the same. The difference
is lost in translation.
```
>> str = 'x' ; JSON.dump([str,str]) == JSON.dump(['x','x'])
=> true

>> JSON.dump(['x','x'])
=> "[\"x\",\"x\"]"

>> str = 'x' ; JSON.dump([str,str])
=> "[\"x\",\"x\"]"

>> arr = JSON.load(JSON.dump([str,str]))
=> ["x", "x"]

>> arr[0].object_id == arr[1].object_id
=> false
```
With OAK, vive la différence:
```
>> str = 'x' ; OAK.encode([str,str]) == OAK.encode(['x','x'])
=> false

>> OAK.encode(['x','x'],format: :none)
=> "oak_3CNN_3737537744_16_F3A2_1_2SU1_xsU0_ok"

>> str = 'x' ; OAK.encode([str,str],format: :none)
=> "oak_3CNN_2865617390_13_F2A2_1_1SU1_x_ok"
```

The JSON format does not support `Infinity`, `-Infinity`, or `NaN`,
though Ruby's JSON encoder transcodes these via a nonstandard
extension.
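
Ruby's `json` library shows both halves of that statement. A minimal sketch, assuming a reasonably recent `json` gem (option defaults have shifted between versions): strict `JSON.generate` raises on non-finite floats, `JSON.dump`, whose default options enable `allow_nan`, writes the non-standard tokens, and `JSON.parse` accepts them back only when `allow_nan: true` is passed.

```ruby
require 'json'

values = [Float::NAN, Float::INFINITY, -Float::INFINITY]

# Strict generation rejects non-finite floats, as standard JSON requires.
strict_rejects = begin
  JSON.generate(values)
  false
rescue JSON::GeneratorError
  true
end

# JSON.dump's default options turn on allow_nan, emitting NaN/Infinity tokens.
dumped = JSON.dump(values)

# Reading those tokens back requires opting in to the extension.
restored = JSON.parse(dumped, allow_nan: true)
```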

YAML handles `Infinity`, `-Infinity`, and `NaN`. YAML also handles
DAGs, but not cycles.

XML is ... XML. And huge. And Nokogiri is weird.

Who cares? These are just strings. It's better to treat them as
immutable anyhow, right? What about compound objects like lists or
hashes?

It turns out that capturing identity is the key to serializing any
non-tree objects.
```
>> a = ['a','TBD']
=> ["a", "TBD"]

>> b = ['b',a]
=> ["b", ["a", "TBD"]]

>> a[1] = b   # a cycle!
=> ["b", ["a", [...]]]

>> JSON.dump(a)
SystemStackError: stack level too deep

>> OAK.encode(a,format: :none)
=> "oak_3CNN_3573295141_24_F4A2_1_2SU1_aA2_3_0SU1_b_ok"
```
The essence of serializing non-tree objects is capturing identity.

Does this matter in `ALI`? Honestly, I don't know. Cycles and DAGs are
irrelevant for Correspondence bodies. We do have Summaries which are
DAGgy on Strings, but that is probably irrelevant in all logic.

But "Do we need it?" is the wrong question. Data in the wild is
diverse and surprising. The right question is, "Can we *prove* that
we do not need it now or tomorrow?" With a cycle-aware serializer, I
don't *need* to prove nonexistence or constrain the future.

What about
[Marshal](http://jakegoulding.com/blog/2013/01/15/a-little-dip-into-rubys-marshal-format/)?
It handles (almost) all Ruby types including user-defined classes, has
no problems with cycles, and is widely available and accepted.
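
Marshal's handling of cycles and shared references is easy to confirm. This sketch rebuilds the cyclic pair from the JSON example above and round-trips it with the standard `Marshal.dump` and `Marshal.load`; it only touches trusted, in-process data, so it says nothing about the security concerns that follow.

```ruby
# Rebuild the cyclic structure from the example above.
a = ['a', 'TBD']
b = ['b', a]
a[1] = b                      # a cycle: a -> b -> a

copy = Marshal.load(Marshal.dump(a))

copy[1][1].equal?(copy)       # => true: the cycle is rebuilt, not unrolled
copy.equal?(a)                # => false: it is a brand-new graph

# Shared (not merely equal) objects keep their identity as well.
s = 'x'
pair = Marshal.load(Marshal.dump([s, s]))
pair[0].equal?(pair[1])       # => true
```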

My reasons for not using Marshal are:

* Security
  * [https://ruby-doc.org/core-2.2.2/Marshal.html](https://ruby-doc.org/core-2.2.2/Marshal.html)
  * "By design, `load` can deserialize almost any class loaded into
    the Ruby process. In many cases this can lead to remote code
    execution if the Marshal data is loaded from an untrusted
    source."
* Marshal can have problems if a user-defined class changes between
  encoding time and decoding time.
* I wanted OAK to refuse to encode objects whose structure it
  could not guarantee to recover with perfect fidelity.
* I have ambitions for OAK's portability to other languages.
  * Specious: porting a subset of Marshal would be no harder than
    porting OAK.

To be fair, we use Marshal anyhow, wrapped in OAK, in our cache layers
which store full ActiveRecord model objects. So any arguments about
architectural purity vis-a-vis OAK are part hype.