oak 0.4.1 → 0.4.2
- checksums.yaml +4 -4
- data/.travis.yml +5 -3
- data/CHANGELOG.md +10 -1
- data/DESIDERATA.md +9 -4
- data/DESIGN.md +410 -0
- data/ENCRYPTION.md +893 -0
- data/README.md +2 -6
- data/bin/oak +1 -0
- data/bin/oak.rb +122 -124
- data/lib/oak.rb +13 -0
- data/lib/oak/version.rb +1 -1
- metadata +5 -3
- data/bin/oak +0 -245
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: c5a9c1b9b0290b6e92de16b26030631e9f795fec
|
4
|
+
data.tar.gz: 898af2d521ee3fba60258c21411fe4dcc839ce16
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: f724eac8c05c5f948192ac73c70c136623beae95289ddff8f2d1b024d0351a2f55032841c347e54b0caf6c8d3b604f698b4e86981f716a988599b709c79e95b9
|
7
|
+
data.tar.gz: 75137b8bdf2d4807c018253f7535335bad3d74b94afa7c71ebd085a03de423015dfa6cd12e95d3d3c8f61ce4c8d366ca2cc7ecb5f58e89686e6e2a7cf4ca1451
|
data/.travis.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,12 @@
|
|
1
|
+
## 0.4.2 (TBD)
|
2
|
+
|
3
|
+
- `oak`, `oak.rb` patched to play nicer with bundler
|
4
|
+
- Fix exception type bug in ruby >= 2.3 and openssl >= 2.0.0 when
|
5
|
+
parsing some corrupt encrypted data.
|
6
|
+
- Updated .travis.yml to update and expand the versions of Ruby covered.
|
7
|
+
- Migrate DESIGN.md and ENCRYPTION.md in from Google Docs.
|
8
|
+
|
9
|
+
|
1
10
|
## 0.4.1 (2018-10-01)
|
2
11
|
|
3
12
|
- `oak`, `oak.rb` published as executables from gem.
|
@@ -8,7 +17,7 @@
|
|
8
17
|
|
9
18
|
## 0.4.0 (2018-09-24)
|
10
19
|
|
11
|
-
- First export from ProsperWorks/ALI.
|
20
|
+
- First export from [ProsperWorks/ALI](https://github.com/ProsperWorks/ALI).
|
12
21
|
- First conversion to gem.
|
13
22
|
- Not open (yet).
|
14
23
|
- OAK3 emitted by default.
|
data/DESIDERATA.md
CHANGED
@@ -266,10 +266,15 @@ than hypothesis.
|
|
266
266
|
|
267
267
|
## Encryption Choices
|
268
268
|
|
269
|
-
|
270
|
-
|
271
|
-
bodies
|
272
|
-
|
269
|
+
OAK has been live in `ALI` (copper.com's primary web service) in our
|
270
|
+
Redis cache layer since 2016-06-02 and for archiving correspondence
|
271
|
+
bodies in S3 since 2016-07-06.
|
272
|
+
|
273
|
+
There had been only Rubocop updates and nary a bugfix since
|
274
|
+
2016-07-01.
|
275
|
+
|
276
|
+
Encryption is the first extension since OAK first went live.
|
277
|
+
|
273
278
|
|
274
279
|
### Encryption-in-OAK Design Decisions (see arch doc for discussion):
|
275
280
|
|
data/DESIGN.md
ADDED
@@ -0,0 +1,410 @@
|
|
1
|
+
# OAK: The Object ArKive
|
2
|
+
|
3
|
+
## All that to avoid JSON?
|
4
|
+
|
5
|
+
OAK is a serialization and envelope format which encodes simple Ruby
|
6
|
+
objects as strings. It bundles together a variety of well-understood
|
7
|
+
encoding libraries into a succinct self-describing package.
|
8
|
+
|
9
|
+
This document covers the existing OAK format and the proposed encryption
|
10
|
+
extension.
|
11
|
+
|
12
|
+
OAK compares to JSON, YAML, and Marshal. OAK is more precise than
|
13
|
+
JSON or YAML, but slightly more Ruby-esque, and supports fewer types
|
14
|
+
than Marshal. OAK also has features similar to OpenPGP
|
15
|
+
([https://tools.ietf.org/html/rfc4880](https://tools.ietf.org/html/rfc4880))
|
16
|
+
(though not, so far, encryption).
|
17
|
+
|
18
|
+
The main value proposition for OAK is operational flexibility. OAK
|
19
|
+
lets you defer choices between compression, checksumming, and 7-bit
|
20
|
+
cleanliness algorithms until after a system is live and under load.
|
21
|
+
|
22
|
+
As of 2017-09-13, OAK is used by `ALI` (copper.com's primary web
|
23
|
+
service) for volatile caches in Redis, durable archives in S3, and for
|
24
|
+
7-bit clean encoding of complex configuration data.
|
25
|
+
|
26
|
+
Author: [JHW](https://github.com/jhwillett)
|
27
|
+
Advisors: Marshall, Gerald, Kelly, Neil
|
28
|
+
|
29
|
+
Here is a sneak preview of some OAK strings:
|
30
|
+
```
|
31
|
+
$ echo 'HelloWorld!' | bin/oak.rb --format none
|
32
|
+
oak_3CNN_1336599037_18_F1SU11_HelloWorld!_ok
|
33
|
+
|
34
|
+
$ echo 'HelloWorld!' | bin/oak.rb
|
35
|
+
oak_3CNB_1336599037_24_RjFTVTExX0hlbGxvV29ybGQh_ok
|
36
|
+
|
37
|
+
$ echo 'HelloWorld!' | bin/oak.rb --compression lz4 --force
|
38
|
+
oak_3C4B_1336599037_28_EvADRjFTVTExX0hlbGxvV29ybGQh_ok
|
39
|
+
|
40
|
+
$ echo 'HelloWorld!' | bin/oak.rb | bin/oak.rb --mode decode-lines
|
41
|
+
HelloWorld!
|
42
|
+
```
|
43
|
+
|
44
|
+
## OAK Version History
|
45
|
+
|
46
|
+
OAK had been live for 16 months by the time this arch document was
|
47
|
+
prepared retroactively.
|
48
|
+
|
49
|
+
* [https://github.com/ProsperWorks/ALI/pull/1245](https://github.com/ProsperWorks/ALI/pull/1245) **oak**
|
50
|
+
* Merged to major_2016_04_dragonfruit.
|
51
|
+
* Initial implementation. Not integrated or active.
|
52
|
+
* Version oak_1
|
53
|
+
* [https://github.com/ProsperWorks/ALI/pull/1350](https://github.com/ProsperWorks/ALI/pull/1350) **oak-in-summary_accessor**
|
54
|
+
* Merged to major_2016_04_dragonfruit.
|
55
|
+
* Reworked SummaryAccessor with ENV flag to switch to OAK serialization.
|
56
|
+
* Exposure to live data revealed some issues missed in the lab.
|
57
|
+
* [https://github.com/ProsperWorks/ALI/pull/1631](https://github.com/ProsperWorks/ALI/pull/1631) **oak-remove-json-and-yaml**
|
58
|
+
* Merged to major_2016_06_goat.
|
59
|
+
* Simplified down to only the one serialization algorithm, "FRIZZY".
|
60
|
+
* Version oak_2
|
61
|
+
* [https://github.com/ProsperWorks/ALI/pull/1618](https://github.com/ProsperWorks/ALI/pull/1618) **fix-oak-utf8**
|
62
|
+
* Merged in major_2016_06_goat.
|
63
|
+
* Fixed issues uncovered in SummaryAccessor ramp.
|
64
|
+
* Version oak_3
|
65
|
+
* [https://github.com/ProsperWorks/ALI/pull/1655](https://github.com/ProsperWorks/ALI/pull/1655) **volatile_cache_accessor**
|
66
|
+
* Merged in major_2016_06_goat.
|
67
|
+
* Introduced RedisCache. Later PRs use RedisCache for
|
68
|
+
* SummaryAccessor (OAK)
|
69
|
+
* RussianDoll caches (Marshal+OAK)
|
70
|
+
* S3 cache (OAK)
|
71
|
+
* COMPANY_ID_CACHE (JSON)
|
72
|
+
* Commits to OAK for volatile use cases.
|
73
|
+
* [https://github.com/ProsperWorks/ALI/pull/1757](https://github.com/ProsperWorks/ALI/pull/1757) **oak-woe-2016-06-30**
|
74
|
+
* Merged in major_2016_07_hotpocket.
|
75
|
+
* Fixed a regexp which had broken some Float parsing.
|
76
|
+
* [https://github.com/ProsperWorks/ALI/pull/1724](https://github.com/ProsperWorks/ALI/pull/1724) **correspondences-in-s3-fixes**
|
77
|
+
* Merged in major_2016_07_hotpocket.
|
78
|
+
* Stores Correspondence bodies in S3 as OAK strings.
|
79
|
+
* Commits to OAK for durable use cases at oak_3.
|
80
|
+
|
81
|
+
See also [OAK: Encryption-in-OAK](ENCRYPTION.md) for later
|
82
|
+
developments and the introduction of OAK_4.
|
83
|
+
|
84
|
+
## Overview of OAK strings.
|
85
|
+
|
86
|
+
`OAK.encode` includes a manifest of its options explicitly in the
|
87
|
+
OAK string output. There is no need for an options back channel to
|
88
|
+
`OAK.decode`.
|
89
|
+
|
90
|
+
We could encode every OAK string with different options, and
|
91
|
+
`OAK.decode` can reverse all of them with no extra info.
|
92
|
+
```
|
93
|
+
>> OAK.encode('HelloWorld',redundancy: :none)
|
94
|
+
=> "oak_3NNB_0_23_RjFTVTEwX0hlbGxvV29ybGQ_ok"
|
95
|
+
|
96
|
+
>> OAK.encode('HelloWorld',format: :none,redundancy: :none)
|
97
|
+
=> "oak_3NNN_0_17_F1SU10_HelloWorld_ok"
|
98
|
+
|
99
|
+
>> OAK.encode('HelloWorld',compression: :zlib,force: true)
|
100
|
+
=> "oak_3CZB_3789329355_34_eJxzMwwONTSI90jNyckPzy_KSQEAL2gF3A_ok"
|
101
|
+
|
102
|
+
>> OAK.decode(OAK.encode('HelloWorld',redundancy: :none))
|
103
|
+
=> "HelloWorld"
|
104
|
+
|
105
|
+
>> OAK.decode(OAK.encode('HelloWorld',format: :none,redundancy: :none))
|
106
|
+
=> "HelloWorld"
|
107
|
+
|
108
|
+
>> OAK.decode(OAK.encode('HelloWorld',compression: :zlib,force: true))
|
109
|
+
=> "HelloWorld"
|
110
|
+
```
|
111
|
+
|
112
|
+
We use this to defer our choice of time-space tradeoffs until runtime.
|
113
|
+
`ALI`'s `Caches::RedisCache` mechanism enshrines this pattern by parsing
|
114
|
+
OAK options from the ENV:
|
115
|
+
```
|
116
|
+
# in Caches::RedisCache#_serialize
|
117
|
+
OAK.encode(
|
118
|
+
pre_obj,
|
119
|
+
redundancy: (ENV["CACHE_OAK_REDUNDANCY_#{name}"] || 'sha1').intern,
|
120
|
+
compression: (ENV["CACHE_OAK_COMPRESSION_#{name}"]|| 'bzip2').intern,
|
121
|
+
force: (ENV["CACHE_OAK_FORCE_#{name}"] == 'true'),
|
122
|
+
format: (ENV["CACHE_OAK_FORMAT_#{name}"] || 'base64').intern,
|
123
|
+
)
|
124
|
+
```
|
125
|
+
These defaults differ from those in `OAK.encode`.
|
126
|
+
|
127
|
+
Here is a quick parse of some OAK strings.
|
128
|
+
```
|
129
|
+
>> OAK.encode('Hi',format: :none)
|
130
|
+
|
131
|
+
=> "oak_3CNN_3475096913_8_F1SU2_Hi_ok"
|
132
|
+
oak_3 # OAK ver 3
|
133
|
+
C # checksum Crc32
|
134
|
+
N # compression None
|
135
|
+
N # format None
|
136
|
+
3475096913 # checksum value
|
137
|
+
8 # 8 data bytes
|
138
|
+
F1SU2_Hi # data
|
139
|
+
ok # end of sequence
|
140
|
+
|
141
|
+
>> OAK.encode([1,'2'],redundancy: :none,format: :none)
|
142
|
+
|
143
|
+
=> "oak_3NNN_0_15_F3A2_1_2I1SU1_2_ok"
|
144
|
+
oak_3 # OAK ver 3
|
145
|
+
N # checksum None
|
146
|
+
N # compression None
|
147
|
+
N # format None
|
148
|
+
0 # checksum value
|
149
|
+
15 # 15 data bytes
|
150
|
+
F3A2_1_2I1SU1_2 # data
|
151
|
+
ok # end of sequence
|
152
|
+
```
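
The field-by-field breakdown above can be sketched as a small parser. This is only an illustration built from the layout shown in these examples, not the gem's own decoder: `split_oak_string` and `OAK3_HEADER` are hypothetical names, and the assumption that the checksum field never contains an underscore comes from the examples rather than from a spec.

```ruby
# Hypothetical helper, not part of the oak gem: split an OAK_3 string
# into its envelope fields using only the layout shown above.
OAK3_HEADER = /\Aoak_(\d)(.)(.)(.)_([^_]+)_(\d+)_/

def split_oak_string(str)
  header = OAK3_HEADER.match(str)
  raise ArgumentError, 'not an OAK string' unless header

  length = Integer(header[6])
  body   = header.post_match
  # The data may itself contain underscores, so rely on the length
  # field rather than on splitting by "_".
  data = body.byteslice(0, length)
  tail = body.byteslice(length, body.bytesize)
  unless tail == '_ok'
    raise ArgumentError, 'length field or _ok terminator does not match'
  end

  {
    version:     Integer(header[1]),
    redundancy:  header[2],
    compression: header[3],
    format:      header[4],
    checksum:    header[5],
    length:      length,
    data:        data
  }
end

p split_oak_string('oak_3CNN_3475096913_8_F1SU2_Hi_ok')
```

Note that the sketch only splits the envelope. It does not verify the checksum, decompress, or interpret the FRIZZY payload.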
|
153
|
+
The FRIZZY format encodes all the objects in the graph as a vector,
|
154
|
+
with the element 0 implicitly the top-level object and compound
|
155
|
+
objects encoded with indices into the main object vector.
|
156
|
+
```
|
157
|
+
F # FRIZZY serializer
|
158
|
+
3 # 3 objects
|
159
|
+
A2_1_2 # obj 0 an Array
|
160
|
+
# w/ 2 slots:
|
161
|
+
# obj 1 and
|
162
|
+
# obj 2
|
163
|
+
I1 # obj 1 Int 1
|
164
|
+
SU1_2 # obj 2 Str
|
165
|
+
# UTF-8
|
166
|
+
# 1 string bytes
|
167
|
+
# bytes '2'
|
168
|
+
```
|
169
|
+
|
170
|
+
## option :redundancy => :crc32, :none, or :sha1
|
171
|
+
|
172
|
+
The `:redundancy` option selects which algorithm is used to compute
|
173
|
+
the checksum included by OAK.encode. This checksum lets OAK.decode
|
174
|
+
detect stream errors. The choice of `:redundancy` at encode time is
|
175
|
+
recorded in the 6th character of the OAK string.
|
176
|
+
|
177
|
+
`:redundancy => :crc32`, the default, is flagged as a `C` and is
|
178
|
+
`'%d' % Zlib.crc32(str)`.
|
179
|
+
|
180
|
+
Advantages:
|
181
|
+
|
182
|
+
* Encodes in only 12 bytes.
|
183
|
+
* Plenty good enough for all natural stream errors.
|
184
|
+
|
185
|
+
Disadvantages:
|
186
|
+
|
187
|
+
* Encodes in 12 whole bytes!
|
188
|
+
* Easily spoofed: not cryptographically secure.
|
189
|
+
|
190
|
+
`:redundancy => :none` is flagged as an `N` and is simply `_0`.
|
191
|
+
|
192
|
+
I chose to leave an explicit place-holder field even at the cost of 2
|
193
|
+
useless bytes to keep the number of meta-data fields constant.
|
194
|
+
|
195
|
+
Advantages:
|
196
|
+
|
197
|
+
* Encodes in only 2 bytes.
|
198
|
+
* OAK will still catch truncation errors.
|
199
|
+
|
200
|
+
Disadvantages:
|
201
|
+
|
202
|
+
* Encodes in 2 whole bytes!
|
203
|
+
* OAK will not catch twiddled bits.
|
204
|
+
* All compression algorithms do their own checksumming.
|
205
|
+
* So this is only a disadvantage with `:compression => :none`.
|
206
|
+
* Caution: `:force => false` is default.
|
207
|
+
* So fallback to `:compression => :none` is likely.
|
208
|
+
* So `:compression` is not a substitute for `:redundancy`.
|
209
|
+
|
210
|
+
`:redundancy => :sha1` is flagged as an `S`. It is very large and is
|
211
|
+
not recommended for most use cases.
|
212
|
+
|
213
|
+
Advantages:
|
214
|
+
|
215
|
+
* Harder for a malicious hacker to fool.
|
216
|
+
|
217
|
+
Disadvantages:
|
218
|
+
|
219
|
+
* Encodes in 41 bytes!
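
To compare the three choices side by side, here is a minimal sketch that builds a checksum field for each. `checksum_field` is a hypothetical helper, not the gem's API. The hex encoding for `:sha1` is an assumption that happens to be consistent with the 41-byte figure above (one underscore plus 40 hex digits), and exactly which bytes OAK feeds to the checksum is not stated here, so the input string is only a stand-in.

```ruby
require 'zlib'
require 'digest/sha1'

# Hypothetical helper: build the checksum field for each :redundancy
# choice, including the underscore that separates it from the flags.
def checksum_field(str, redundancy)
  case redundancy
  when :crc32 then '_%d' % Zlib.crc32(str)
  when :none  then '_0'
  when :sha1  then '_' + Digest::SHA1.hexdigest(str) # hex is an assumption
  else raise ArgumentError, "unknown redundancy #{redundancy.inspect}"
  end
end

# Print each field and its size for a stand-in payload.
%i[crc32 none sha1].each do |redundancy|
  field = checksum_field('F1SU2_Hi', redundancy)
  puts format('%-6s %2d bytes  %s', redundancy, field.bytesize, field)
end
```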
|
220
|
+
|
221
|
+
## option :compression => :none, :lz4, :zlib, :bzip2, or :lzma
|
222
|
+
|
223
|
+
`:compression` is recorded in the 7th char and selects which
|
224
|
+
algorithm compresses the payload.
|
225
|
+
|
226
|
+
`:compression => :none`, the default, is flagged as a `N`. No compression.
|
227
|
+
|
228
|
+
* No compression or decompression costs.
|
229
|
+
* Human-readable if the source content is human readable and format is none.
|
230
|
+
|
231
|
+
`:compression => :lz4` is flagged as a `4`. [LZ4](https://github.com/lz4/lz4) is in the [Lempel-Ziv](https://en.wikipedia.org/wiki/LZ77_and_LZ78) family of dictionary-based redundancy eaters which are popular for low-latency online systems.
|
232
|
+
|
233
|
+
* Low compression costs, low decompression costs.
|
234
|
+
* Compression ratios of 1.8-2.1 for English.
|
235
|
+
|
236
|
+
`:compression => :zlib` is flagged as a `Z`. [RFC 1951 Zlib](https://en.wikipedia.org/wiki/DEFLATE) is the widely used compression found in pkzip, zip, and gzip. It crunches LZ77 with a follow-on Huffman step.
|
237
|
+
|
238
|
+
* Medium compression costs, medium decompression costs.
|
239
|
+
* Compression ratios around 4.0 for English.
|
240
|
+
|
241
|
+
`:compression => :bzip2` is flagged as a `B`. It uses the [Burrows-Wheeler transform](https://en.wikipedia.org/wiki/Bzip2) with some Huffman, delta, and sparse array encoding thrown in for good measure.
|
242
|
+
|
243
|
+
* Higher compression and decompression costs.
|
244
|
+
* Compression ratios around 5.0 for English.
|
245
|
+
|
246
|
+
`:compression => :lzma` is flagged as an `M` and uses the [Lempel-Ziv-Markov chain algorithm](https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm). It is an unusual choice for an online system.
|
247
|
+
|
248
|
+
Advantages:
|
249
|
+
|
250
|
+
* Very high compression costs, medium-low decompression costs.
|
251
|
+
* Compression ratios around 5.2 for English.
|
252
|
+
|
253
|
+
## option :force => false or true
|
254
|
+
|
255
|
+
By default, `OAK.encode` will fall back to `:compression => :none` if
|
256
|
+
the compressed string is larger than the source string.
|
257
|
+
|
258
|
+
`:force => true` overrides this fail safe.
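
The fallback rule can be sketched with the standard library alone. This is a guess at the shape of the logic, not the gem's code: `compress_with_fallback` is a hypothetical name, and `Zlib::Deflate.deflate` stands in for whichever compressor was requested.

```ruby
require 'zlib'

# Hypothetical helper: returns [compression_used, bytes].
# Without force, keep the compressed bytes only when they are no
# larger than the source; otherwise fall back to :none.
def compress_with_fallback(str, force: false)
  compressed = Zlib::Deflate.deflate(str)
  if force || compressed.bytesize <= str.bytesize
    [:zlib, compressed]
  else
    [:none, str]
  end
end

p compress_with_fallback('HelloWorld').first              # short input, falls back
p compress_with_fallback('HelloWorld', force: true).first # forced
p compress_with_fallback('a' * 1000).first                # repetitive input compresses
```

Short strings rarely shrink under zlib because of its fixed header and trailer, which is why the examples earlier in this document need `force: true` to show a compressed `HelloWorld`.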
|
259
|
+
|
260
|
+
## option :format => :base64 or :none
|
261
|
+
|
262
|
+
The `:format` option selects the character set used in the main body
|
263
|
+
of the OAK string - the payload part which follows the flags and
|
264
|
+
checksum and precedes the `_ok` terminator. The choice of `:format`
|
265
|
+
at encode time is recorded in the 8th character of the OAK string.
|
266
|
+
|
267
|
+
`:format => :base64`, the default, is flagged as a `B` and is
|
268
|
+
`Base64.urlsafe_encode64(str)` with any trailing `=` padding stripped.
|
269
|
+
|
270
|
+
Advantages:
|
271
|
+
|
272
|
+
* 7-bit clean (not that it matters in the TCP age)
|
273
|
+
* Prints prettily in ASCII terminals and editors.
|
274
|
+
* Easy to eyeball: no spaces, commas, slashes, colons, etc.
|
275
|
+
* In most text GUIs, the auto-highlighting feature when you
|
276
|
+
double-click an OAK string will exactly select it.
|
277
|
+
* Personally, this is my favorite feature of OAK.
|
278
|
+
|
279
|
+
Disadvantages:
|
280
|
+
|
281
|
+
* Not human readable.
|
282
|
+
* Only matters when compression is off.
|
283
|
+
* Size bloat: each output character carries 6 bits but occupies 8, so the output grows to about 133% of the input size.
|
284
|
+
* Reverses some compression gains.
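
As a concrete illustration of the `:format => :base64` step, the sketch below strips and restores the padding around the standard library's URL-safe Base64. `format_base64` and `unformat_base64` are hypothetical names. The sample input is the unformatted body from the `HelloWorld!` preview at the top of this document.

```ruby
require 'base64'

# Hypothetical helpers illustrating the :base64 format step.
def format_base64(bytes)
  Base64.urlsafe_encode64(bytes).delete('=')
end

def unformat_base64(text)
  # Restore the padding that was stripped at encode time.
  padded = text + '=' * ((-text.length) % 4)
  Base64.urlsafe_decode64(padded)
end

body = 'F1SU11_HelloWorld!'
text = format_base64(body)
puts text
puts format('%d bytes -> %d characters', body.bytesize, text.length)
puts unformat_base64(text)
```

The 18-byte body becomes 24 characters, which is the 4:3 growth described in the list above.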
|
285
|
+
|
286
|
+
`:format => :none` is flagged as a `N` and does nothing. The source
|
287
|
+
string, including any zeros, bells, form feeds, and umlauts, is
|
288
|
+
catenated nakedly into the OAK string.
|
289
|
+
|
290
|
+
Advantages:
|
291
|
+
|
292
|
+
* Human-readable if the source content is human readable.
|
293
|
+
* No size bloat.
|
294
|
+
|
295
|
+
Disadvantages:
|
296
|
+
|
297
|
+
* Hard to parse visually in most cases.
|
298
|
+
* Nasty in logs when data is binary or compressed.
|
299
|
+
|
300
|
+
## Why FRIZZY? Why not JSON, YAML, XML, or Marshal?
|
301
|
+
|
302
|
+
JSON treats everything as a value type - it knows nothing about object
|
303
|
+
identity.
|
304
|
+
|
305
|
+
In Ruby, each distinct string literal is a distinct String
|
306
|
+
object. There is a subtle difference between a pair of equivalent
|
307
|
+
Strings and a pair of identical strings:
|
308
|
+
```
|
309
|
+
>> arr = ['x','x'] ; arr[0].object_id == arr[1].object_id
|
310
|
+
=> false
|
311
|
+
|
312
|
+
>> str = 'x'; arr = [str,str] ; arr[0].object_id == arr[1].object_id
|
313
|
+
=> true
|
314
|
+
```
|
315
|
+
JSON is the same for `['x','x']` and `[str,str]`. The difference
|
316
|
+
is lost in translation.
|
317
|
+
```
|
318
|
+
>> str = 'x' ; JSON.dump([str,str]) == JSON.dump(['x','x'])
|
319
|
+
=> true
|
320
|
+
|
321
|
+
>> JSON.dump(['x','x'])
|
322
|
+
=> "[\"x\",\"x\"]"
|
323
|
+
|
324
|
+
>> str = 'x' ; JSON.dump([str,str])
|
325
|
+
=> "[\"x\",\"x\"]"
|
326
|
+
|
327
|
+
>> arr = JSON.load(JSON.dump([str,str]))
|
328
|
+
=> ["x", "x"]
|
329
|
+
|
330
|
+
>> arr[0].object_id == arr[1].object_id
|
331
|
+
=> false
|
332
|
+
```
|
333
|
+
With OAK, vive la différence:
|
334
|
+
```
|
335
|
+
>> str = 'x' ; OAK.encode([str,str]) == OAK.encode(['x','x'])
|
336
|
+
=> false
|
337
|
+
|
338
|
+
>> OAK.encode(['x','x'],format: :none)
|
339
|
+
=> "oak_3CNN_3737537744_16_F3A2_1_2SU1_xsU0_ok"
|
340
|
+
|
341
|
+
>> str = 'x' ; OAK.encode([str,str],format: :none)
|
342
|
+
=> "oak_3CNN_2865617390_13_F2A2_1_1SU1_x_ok"
|
343
|
+
```
|
344
|
+
|
345
|
+
The JSON format does not support `Infinity`, `-Infinity`, or `NaN` -
|
346
|
+
though Ruby's JSON encoder transcodes these via a nonstandard
|
347
|
+
extension.
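
A quick way to see the nonstandard extension is to compare the strict generator with `JSON.dump`. This sketch assumes the `json` library that ships with Ruby, where `JSON.generate` rejects these values unless `allow_nan: true` is passed and `JSON.dump` allows them by default. Exact messages vary between `json` versions, so none are shown.

```ruby
require 'json'

specials = [Float::INFINITY, -Float::INFINITY, Float::NAN]

# The strict generator follows the JSON grammar and refuses these values.
begin
  JSON.generate(specials)
rescue JSON::GeneratorError => e
  puts "strict generate refused: #{e.message}"
end

# JSON.dump emits bare tokens that a strict JSON parser will not accept.
puts JSON.dump(specials)
```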
|
348
|
+
|
349
|
+
YAML handles `Infinity`, `-Infinity`, and `NaN`. YAML also handles
|
350
|
+
DAGs - but not cycles.
|
351
|
+
|
352
|
+
XML is ... XML. And huge. And Nokogiri is weird.
|
353
|
+
|
354
|
+
Who cares? These are just strings. It's better to treat them as
|
355
|
+
immutable anyhow, right? What about compound objects like lists or
|
356
|
+
hashes?
|
357
|
+
|
358
|
+
It turns out that capturing identity is the key to serializing any
|
359
|
+
non-tree objects.
|
360
|
+
```
|
361
|
+
>> a = ['a','TBD']
|
362
|
+
=> ["a", "TBD"]
|
363
|
+
|
364
|
+
>> b = ['b',a]
|
365
|
+
=> ["b", ["a", "TBD"]]
|
366
|
+
|
367
|
+
>> a[1] = b # a cycle!
|
368
|
+
=> ["b", ["a", [...]]]
|
369
|
+
|
370
|
+
>> JSON.dump(a)
|
371
|
+
SystemStackError: stack level too deep
|
372
|
+
|
373
|
+
>> OAK.encode(a,format: :none)
|
374
|
+
=> "oak_3CNN_3573295141_24_F4A2_1_2SU1_aA2_3_0SU1_b_ok"
|
375
|
+
```
|
376
|
+
The essence of serializing non-tree objects is capturing identity.
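
To make that concrete, here is a toy sketch of the idea, not FRIZZY itself: walk the graph breadth-first, give each distinct object one slot keyed by identity, and have compound objects refer to their children by slot number. `flatten_graph` is a hypothetical name and only Arrays are treated as compound.

```ruby
# Toy illustration only: flatten an object graph into a vector of
# entries, giving each distinct object (by identity) exactly one slot.
def flatten_graph(root)
  index = {}.compare_by_identity # object -> slot number, keyed by identity
  slots = []                     # slot number -> object
  queue = [root]
  until queue.empty?
    obj = queue.shift
    next if index.key?(obj)      # already has a slot: shared or cyclic

    index[obj] = slots.size
    slots << obj
    queue.concat(obj) if obj.is_a?(Array)
  end
  slots.map do |obj|
    if obj.is_a?(Array)
      [:array, obj.map { |child| index.fetch(child) }]
    else
      [:leaf, obj]
    end
  end
end

shared = 'x'.dup
p flatten_graph([shared, shared])      # two slots: the array and one string
p flatten_graph(['x'.dup, 'x'.dup])    # three slots: the array and two strings

a = ['a'.dup, 'TBD'.dup]
b = ['b'.dup, a]
a[1] = b                               # a cycle!
p flatten_graph(a)                     # four slots, with slot 2 pointing back at slot 0
```

The slot counts and child indices line up with the `F2`, `F3`, and `F4` payloads in the OAK examples above, which is the point: once identity is recorded, DAGs and cycles need no special handling.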
|
377
|
+
|
378
|
+
Does this matter in `ALI`? Honestly, I don't know. Cycles and DAGs are
|
379
|
+
irrelevant for Correspondence bodies. We do have Summaries which are
|
380
|
+
DAGgy on Strings but that is probably irrelevant in all logic.
|
381
|
+
|
382
|
+
But "Do we need it?" is the wrong question. Data in the wild is
|
383
|
+
diverse and surprising. The right question is, "Can we *prove* that
|
384
|
+
we do not need it now or tomorrow?" With a cycle-aware serializer, I
|
385
|
+
don't *need* to prove nonexistence or constrain the future.
|
386
|
+
|
387
|
+
What about
|
388
|
+
[Marshal](http://jakegoulding.com/blog/2013/01/15/a-little-dip-into-rubys-marshal-format/)?
|
389
|
+
It handles (almost) all Ruby types including user-defined classes, has
|
390
|
+
no problems with cycles, and is widely available and accepted.
|
391
|
+
|
392
|
+
My reasons for not using Marshal are:
|
393
|
+
|
394
|
+
* Security
|
395
|
+
* [https://ruby-doc.org/core-2.2.2/Marshal.html](https://ruby-doc.org/core-2.2.2/Marshal.html)
|
396
|
+
* "By design, `load` can deserialize almost any class loaded into
|
397
|
+
the Ruby process. In many cases this can lead to remote code
|
398
|
+
execution if the Marshal data is loaded from an untrusted
|
399
|
+
source."
|
400
|
+
* Marshal can have problems if a user-defined class changes between
|
401
|
+
encoding time and decoding time.
|
402
|
+
* I wanted OAK to refuse to encode objects whose structure it
|
403
|
+
could not guarantee to recover with perfect fidelity.
|
404
|
+
* I have ambitions for language portability for OAK.
|
405
|
+
* Specious: porting a subset of Marshal would be no harder than
|
406
|
+
porting OAK.
|
407
|
+
|
408
|
+
To be fair, we use Marshal anyhow, wrapped in OAK, in our cache layers
|
409
|
+
which store full ActiveRecord model objects. So any arguments about
|
410
|
+
architectural purity vis-a-vis OAK are part hype.
|