oak 0.0.3 → 0.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: 1894c4827e6cb478e373686a2c5c198530a6eabf
+ data.tar.gz: 157bf6e8b64b962cefdabfee31eadd0955b58fed
+ SHA512:
+ metadata.gz: fb82da115acd3abd4cc992bfa0896c0ad8ec0a13e56a4843ce332672ba7dc8c1e5b6608dcc0ac7215debf16644e3ac2bf709cb043c826d35aeaf8c07a6a92cb8
+ data.tar.gz: 30200fc38b86dd7e5953a9c70ae0d84825f6008042c1283a66554ad47480eae06658ef626eb6eaa7b3d93156d718952669cd4677add032b3b232408a8acdbf3f
@@ -0,0 +1,51 @@
+ *.gem
+ *.rbc
+ /.config
+ /coverage/
+ /InstalledFiles
+ /pkg/
+ /spec/reports/
+ /spec/examples.txt
+ /test/tmp/
+ /test/version_tmp/
+ /tmp/
+
+ # Used by dotenv library to load environment variables.
+ # .env
+
+ ## Specific to RubyMotion:
+ .dat*
+ .repl_history
+ build/
+ *.bridgesupport
+ build-iPhoneOS/
+ build-iPhoneSimulator/
+
+ ## Specific to RubyMotion (use of CocoaPods):
+ #
+ # We recommend against adding the Pods directory to your .gitignore. However
+ # you should judge for yourself, the pros and cons are mentioned at:
+ # https://guides.cocoapods.org/using/using-cocoapods.html#should-i-check-the-pods-directory-into-source-control
+ #
+ # vendor/Pods/
+
+ ## Documentation cache and generated files:
+ /.yardoc/
+ /_yardoc/
+ /doc/
+ /rdoc/
+
+ ## Environment normalization:
+ /.bundle/
+ /vendor/bundle
+ /lib/bundler/man/
+
+ # For a library or gem, you might want to ignore these files since the code is
+ # intended to run in multiple environments; otherwise, check them in:
+ #
+ Gemfile.lock
+ .ruby-version
+ .ruby-gemset
+
+ # unless supporting rvm < 1.11.0 or doing something fancy, ignore this:
+ .rvmrc
@@ -0,0 +1,74 @@
+ AllCops:
+   Include:
+     - Rakefile
+     - Gemfile
+     - '*.gemspec'
+
+ # I like the Metrics family in principle, but OAK was built in a
+ # lower-level style much like C. The default thresholds for these are
+ # too tight for the style I chose for OAK.
+ #
+ # Moreover, IMO test code is not the place to get pedantic about class
+ # length, method complexity, etc. One should be encouraged to add
+ # more tests with minimal friction, not forced to make a hard choice
+ # between cutting tests and splitting up my test suites.
+ #
+ Metrics/ParameterLists:
+   Max: 10
+ Metrics/BlockLength:
+   Max: 150
+   Exclude:
+     - 'test/**/*.rb'
+ Metrics/ClassLength:
+   Max: 400
+   Exclude:
+     - 'test/**/*.rb'
+ Metrics/MethodLength:
+   Max: 150
+   Exclude:
+     - 'test/**/*.rb'
+ Metrics/ModuleLength:
+   Max: 1000
+   Exclude:
+     - 'test/**/*.rb'
+ Metrics/AbcSize:
+   Max: 150
+   Exclude:
+     - 'test/**/*.rb'
+ Metrics/BlockNesting:
+   Max: 5
+ Metrics/CyclomaticComplexity:
+   Max: 50
+ Metrics/PerceivedComplexity:
+   Max: 25
+ #
+ # Normally I am a pedantic adherent to 80-column lines.
+ #
+ # Over in test/oak.rb however, there are necessarily a lot of OAK
+ # strings which are much larger than 80 characters.
+ #
+ # I have decided that disablement in .rubocop.yml is less disruptive
+ # than repeated use of inline rubocop: comments.
+ #
+ Metrics/LineLength:
+   Exclude:
+     - 'test/**/*.rb'
+ Naming/UncommunicativeMethodParamName:
+   Enabled: false
+
+ # I put extra spaces in a lot of expressions for a lot of different
+ # reasons, including especially readability.
+ #
+ # I reject these cops.
+ #
+ Layout:
+   Enabled: false
+
+ # As a group, the Style cops are bewilderingly opinionated.
+ #
+ # In some cases IMO they are harmful, e.g. Style/TernaryParentheses.
+ #
+ # I reject these cops.
+ #
+ Style:
+   Enabled: false
@@ -0,0 +1,17 @@
+ sudo: true
+ language: ruby
+ before_install:
+   - gem install bundler -v 1.16.1
+ rvm:
+   - 2.1.6
+   - 2.2.9
+   - 2.4.3
+   - 2.5.0
+ script:
+   #
+   # Run several tests in parallel, and be happy if they are all happy.
+   #
+   # If any fail, rerun serially so we get clean output from the ones
+   # which failed.
+   #
+   - make test -j 3 || make test
@@ -0,0 +1,24 @@
+ ## 0.4.1 (2018-10-01)
+
+ - `oak`, `oak.rb` published as executables from gem.
+ - Removed heavier dep on `contracts`, switched to manual checks and looser spec.
+ - Documentation reorg and cleanup.
+ - Open-sourced with MIT License, published as https://rubygems.org/gems/oak!
+
+
+ ## 0.4.0 (2018-09-24)
+
+ - First export from ProsperWorks/ALI.
+ - First conversion to gem.
+ - Not open (yet).
+ - OAK3 emitted by default.
+ - OAK4 with AES-256-GCM encryption with random IV supported.
+
+
+ ## 0.0.3 (2011-11-07) and earlier
+
+ - https://rubygems.org/gems/oak had an earlier incarnation as a
+   secret management utility, https://github.com/imonyse/oak.
+ - Special thanks and regards to https://github.com/imonyse, who
+   generously released the gem name `oak` so it could have a second
+   life.
@@ -0,0 +1,318 @@
+ # oak design desiderata
+
+ Some design goals with which I started this project.
+
+ - P1 means "top priority"
+ - P2 means "very important"
+ - P3 means "nice to have"
+ - P4 means "not harmful if cheap"
+
+ - `+` means "accomplished"
+ - `-` means "not accomplished"
+ - `?` means "accomplished, but only for some combinations of arguments"
+
+ Desiderata for the structure layer:
+
+ - P1 + losslessly handle nil, true, false, Integer, and String
+ - P1 + losslessly handle List with arbitrary values and deep nesting
+ - P1 + losslessly handle Hash with string keys and deep nesting in values
+ - P1 + detect cycles and DAGs in input structures, fail or handle
+ - P1 + handle all Integer types without loss
+ - P1 - handle Floats with no more than a small quantified loss
+ - P2 + Hash key ordering is preserved in Ruby-Ruby transcoding
+ - P3 - convenient: vaguely human-readable representations available
+ - P3 + encode cycles and DAGs
+ - P3 + handle Hash with non-string keys and deep nesting in keys
+ - P3 + losslessly handle Symbol distinct from String
+ - P3 - handle Times and Dates
+
+ Desiderata for the byte layer:
+
+ - P1 + reversible: original string can be reconstructed from only OAK string
+ - P1 + unambiguous: no OAK string is the prefix of any other OAK string
+ - P1 + extensible: OAK strings contain ids for ver, format, compression, etc
+ - P1 + robust: error detection in OAK strings
+ - P2 + flexible: multiple compression modes available
+ - P3 + convenient: available representation without `{}`, comma, whitespace
+ - P3 + convenient: 7-bit clean representations available
+ - P3 + convenient: representations which are selectable with double-click
+ - P3 + convenient: vaguely human-readable representations available
+ - P3 - streamable: reversing can be accomplished with definite-size buffers
+ - P4 - embeddable: reversing can be accomplished with fixed-size buffers
+ - P4 - defensive: error correction available (no good libs found)
+
+ Techniques used in the byte layer to accomplish these goals:
+
+ - manifest type id for self-identification
+ - manifest version id in case format changes in future
+ - salient encoding algorithm choices stored in output stream
+ - error detection algorithm aka redundancy
+ - compression
+ - formatting
+ - microchoices made to confine metadata characters to [_0-9a-z]
+ - algorithm menu constructed to offer data characters in [-_0-9a-z]
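To make the redundancy idea concrete, here is a minimal sketch of a checksum-sealed byte layer. This is illustrative only, assuming a CRC32 hex suffix; it is not OAK's actual redundancy algorithm, and `seal`/`unseal` are names invented for this sketch:

```ruby
require 'zlib'

# Illustrative only: append a CRC32 checksum (hex, so confined to
# [0-9a-f]) behind a '_' separator, and verify it when reversing.
def seal(payload)
  format('%s_%08x', payload, Zlib.crc32(payload))
end

def unseal(sealed)
  payload = sealed[0..-10]
  crc     = sealed[-8, 8]
  raise 'corrupt' unless format('%08x', Zlib.crc32(payload)) == crc
  payload
end

unseal(seal('hello'))  # => "hello"
```

Any single-bit corruption of the payload or the suffix makes `unseal` raise instead of silently returning garbage, which is the "robust" desideratum above in miniature.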
+
+
+ ## Serialization Choices
+
+ A survey of alternatives considered for the serialization layer.
+
+ ### Considering Marshal
+
+ The Marshal format has some major drawbacks which I believe make it
+ a nonstarter.
+
+ - strictly Ruby-specific
+ - readability across major versions not guaranteed
+ - too powerful: can be used to execute arbitrary code
+ - binary and non-human-readable
+   - many option combos for oak make oak strings also non-human-readable
+   - still, it is nice to have a layer which is at least potentially clear
+
+ Marshal does offer some major advantages:
+
+ - transcodes all Ruby value types and user-defined value-like classes
+ - reported to be much faster than JSON or YAML for serializing
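A quick sketch of that fidelity, and of why Marshal stays Ruby-only: it round-trips symbols and non-string keys losslessly, but its output is a versioned Ruby-specific binary format:

```ruby
# Marshal round-trips Ruby value types that JSON cannot, but the
# blob it produces is a Ruby-only binary format.
obj  = { name: :oak, [1, 2, 3] => 'baz' }
blob = Marshal.dump(obj)

Marshal.load(blob) == obj  # => true: symbols and array keys survive
blob[0, 2].unpack('c2')    # => [4, 8]: the embedded Marshal format version
```

That leading version pair is exactly the "readability across major versions not guaranteed" concern: a future Marshal format can refuse blobs written by an older one.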
+
+ ### Considering JSON
+
+ JSON is awesome most of the time, especially in highly constrained
+ environments such as API specifications and simple ad-hoc caching
+ situations.
+
+ JSON offers advantages:
+
+ - a portable object model
+ - easy to read
+ - widely deployed
+ - the go-to choice for interchange in recent years
+
+ But it has some shortcomings which lead me to reject it for the
+ structural level in OAK.
+
+ - floating point precision is implementation-dependent
+ - always decodes as a tree - fails to transcode DAGiness
+ - cannot represent cycles - encoder reject, stack overflow, or infinite loop
+ - no native date or time handling
+ - table keys may only be strings
+   - e.g. `{'123'=>'x'} == JSON.parse(JSON.dump({123=>'x'}))`
+ - type information symbol-vs-string lost, symbols transcode to strings
+   - e.g. `'foo' == JSON.parse(JSON.dump(:foo))`
+   - e.g. `{'foo'=>'x'} == JSON.parse(JSON.dump({:foo=>'x'}))`
+ - official grammar only allows {} or [] as top-level object
+   - e.g. lenient parsers return `123` for `'123'`, but strict ones raise `ParserError`
+   - many parsers in the wild support only this strict official grammar
+ - JSON is suitable only for document encoding, not streams
+   - allows only one object per file
+   - multiple objects must be members of a list
+   - lists must be fully scanned and parsed before being processed
+   - no possibility of streamy processing
+
+ Biggest limitation of JSON IMO is that Hash keys can only be strings:
+ ```
+ 2.1.6 :008 > obj = {'str'=>'bar',[1,2,3]=>'baz'}
+ => {"str"=>"bar", [1, 2, 3]=>"baz"}
+ 2.1.6 :009 > JSON.dump(obj)
+ => "{\"str\":\"bar\",\"[1, 2, 3]\":\"baz\"}"
+ 2.1.6 :010 > JSON.parse(JSON.dump(obj))
+ => {"str"=>"bar", "[1, 2, 3]"=>"baz"}
+ 2.1.6 :011 > JSON.parse(JSON.dump(obj)) == obj
+ => false
+ ```
+
+ ### Considering YAML
+
+ YAML is strong where JSON is strong, and also strong in many places
+ where JSON is weak. In fact, YAML includes JSON as a subformat: JSON
+ strings *are* YAML strings!
+
+ Some of the advantages of YAML over JSON are:
+
+ - handles any directed graph, including DAGy bits and cycles
+ - arguably more human-readable than JSON
+ - YAML spec subsumes JSON spec: JSON files are YAML files
+ - supports non-string keys
+   - e.g. `{123=>'x'} == YAML.load(YAML.dump({123=>'x'}))`
+ - supports symbols
+   - e.g. `:foo == YAML.load(YAML.dump(:foo))`
+   - e.g. `{:foo=>'x'} == YAML.load(YAML.dump({:foo=>'x'}))`
+ - allows integer or string as top-level object
+
+ YAML overcomes the biggest limitation of JSON by supporting non-string
+ hash keys:
+ ```
+ 2.1.6 :008 > obj = {'str'=>'bar',[1,2,3]=>'baz'}
+ => {"str"=>"bar", [1, 2, 3]=>"baz"}
+ 2.1.6 :012 > YAML.dump(obj)
+ => "---\nstr: bar\n? - 1\n - 2\n - 3\n: baz\n"
+ 2.1.6 :013 > YAML.load(YAML.dump(obj))
+ => {"str"=>"bar", [1, 2, 3]=>"baz"}
+ 2.1.6 :014 > YAML.load(YAML.dump(obj)) == obj
+ => true
+ ```
+
+ Note: YAML's support for Symbols is due to Psych, not strictly the
+ YAML format itself. I've taken both `YAML.dump(:foo)` and
+ `YAML.dump(':foo')` into Python and done `yaml.load()` on them. Both
+ result in `':foo'`. So this nicety is not portable.
+
+ But YAML still has some shortcomings:
+
+ - floating point precision is implementation-dependent
+ - no native date or time handling
+ - unclear whether available parsers support stream processing
+ - DAGs and cycles of Arrays and Hashes are handled, but Strings are not
+
+ ### Considering FRIZZY
+
+ FRIZZY is a home-grown serialization format which I ended up
+ committing to for OAK.
+
+ The name FRIZZY means nothing, and survives only as the rogue `F`
+ character at the start of a serialized object:
+
+ ```
+ 2.1.6 :006 > OAK.encode('Hello, World!',redundancy: :none,format: :none)
+ => "oak_3NNN_0_20_F1SU13_Hello, World!_ok"
+ ```
+
+ Advantages:
+
+ - Recognizes when Strings are identical, not just equivalent.
+ - It is much more compact than YAML.
+ - Has built-in folding of String and Symbol representation.
+
+ Disadvantages:
+
+ - Home grown.
+ - Very much not human readable.
+ - Floating point precision is incompletely specified.
+ - Current implementation crudely uses `Numeric#to_s` and `String#to_f`.
+
+ I decided to reinvent the wheel and go with FRIZZY. We had
+ discovered Summaries which are DAGs on strings. It might be
+ acceptable to lose that information, but I did not want to *prove* it
+ was acceptable to lose that information.
+
+ It may have been an ego-driven sin to go custom here, but I did not
+ want to pessimize future use cases on fidelity or control.
+
+
+ ## Compression Choices
+
+ A survey of alternatives considered for the compression layer.
+
+ ### Considering LZO, LZF, and LZ4
+
+ These compression formats are similar in performance and algorithm.
+ All are in the Lempel-Ziv family of dictionary-based
+ redundancy-eaters. They will all be cheap to compress, cheap to
+ uncompress, but will deliver only modest compression ratios.
+
+ This family of algorithms is unfamiliar to those accustomed to
+ archive formats, but it is widely used in low-latency applications
+ (such as server caches ;) ).
+
+ To keep things simple, I settled on supporting only LZ4 because its
+ gem, `lz4-ruby`, seems to have more mindshare and momentum. It is
+ weaker but faster than the other weak+fast options - which seems like
+ the way to be.
+
+ Based on previous experience, I expect this to be a clear win for use
+ in Redis caches vs being uncompressed.
+
+ ### Considering ZLIB
+
+ Including ZLIB felt like a no-brainer. ZLIB is familiar,
+ widely-deployed, and standardized in RFC 1951. It uses the L-Z
+ process with an additional Huffman encoding phase. It will deliver
+ intermediate cost for intermediate compression.
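The trade-off is easy to probe from Ruby's standard library. A sketch with stdlib `zlib` (independent of OAK's own wiring; the variable names are mine):

```ruby
require 'zlib'

# DEFLATE via Ruby's stdlib zlib binding: Lempel-Ziv matching plus a
# Huffman coding phase, as standardized in RFC 1951.
plain    = 'hello world ' * 1_000
deflated = Zlib::Deflate.deflate(plain)

deflated.bytesize < plain.bytesize        # => true: redundancy squeezed out
Zlib::Inflate.inflate(deflated) == plain  # => true: fully reversible
```

Highly redundant input like this shrinks by orders of magnitude; typical structured payloads land somewhere between that and the modest ratios of the LZ4 family.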
+
+ Based on previous experience, I expect this option will usually be
+ dominated by either LZ4 for low-latency applications or BZIP2 for
+ archival applications, but I'm including it for comparisons and
+ because it would feel strange not to.
+
+ ### Considering BZIP2
+
+ BZIP2 is an aggressive compression format which uses the
+ Burrows-Wheeler, move-to-front, and run-length-encoding transforms
+ with Huffman coding. It will be several times slower but tens of
+ percent stronger than ZLIB. I chose the gem bzip2-ffi over the more
+ flexible rbzip2 to make absolutely certain that we use the native
+ libbz2 implementation and do not fall back silently to a Ruby version
+ which is 100x slower if/when Heroku does not offer FFI.
+
+ Based on previous experience, I expect this option will dominate where
+ data is generally cold or where storage is very expensive compared to
+ CPU.
+
+ ### Considering LZMA
+
+ LZMA is the Lempel-Ziv-Markov chain algorithm. It will be an order
+ of magnitude more expensive to compress than BZIP2, but will
+ decompress slightly faster and will yield better compression ratios
+ by a few percent.
+
+ This will be useful only for cases where read-write ratios are over 10
+ and storage:cpu cost ratios are high. When read-write ratios are
+ close to unity, LZ4 will dominate where storage:cpu is low and BZIP2
+ will dominate where storage:cpu is high.
+
+ Nonetheless, I have a soft spot in my heart for this algorithm so I am
+ including it - if only so we can rule it out by demonstration rather
+ than hypothesis.
+
+
+ ## Encryption Choices
+
+ Encryption is the first extension of OAK since it went live in
+ ProsperWorks's Redis layer on 2016-06-02 and in the S3 Correspondence
+ bodies since 2016-07-06. There had been only Rubocop updates and nary
+ a bugfix since 2016-07-01.
+
+ ### Encryption-in-OAK Design Decisions (see arch doc for discussion)
+
+ - Encryption is the only change in OAK4.
+ - OAK4 will only support AES-256-GCM with random IVs chosen for
+   each encryption event.
+ - OAK4 will use no salt other than the random IV.
+ - Encrypted OAK strings will be nondeterministic.
+   - This crushes the desideratum of making OAK.encode a pure function.
+   - This is necessary to avoid a blatant security hole.
+ - OAK4 dramatically changes how headers are managed from OAK3.
+   - Encrypts all headers which are not required for decryption.
+   - Authenticates all headers and the encrypted stream.
+ - Key rotation is supported.
+   - Via an ENV-specified key chain.
+   - Can hold multiple master keys.
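The cipher choice above maps directly onto Ruby's OpenSSL binding. A minimal sketch, in which the key handling and the `'oak_4'` authenticated header are simplified stand-ins rather than OAK4's actual layout:

```ruby
require 'openssl'

key    = OpenSSL::Random.random_bytes(32)  # one 256-bit master key
cipher = OpenSSL::Cipher.new('aes-256-gcm')
cipher.encrypt
cipher.key       = key
iv               = cipher.random_iv        # fresh random IV per encryption
cipher.auth_data = 'oak_4'                 # authenticated but not encrypted
ciphertext       = cipher.update('secret payload') + cipher.final
tag              = cipher.auth_tag

# Decryption must present the same IV, tag, and authenticated header;
# a mismatch in any of them makes #final raise.
decipher = OpenSSL::Cipher.new('aes-256-gcm')
decipher.decrypt
decipher.key       = key
decipher.iv        = iv
decipher.auth_tag  = tag
decipher.auth_data = 'oak_4'
decipher.update(ciphertext) + decipher.final  # => "secret payload"
```

Because the IV is random, encrypting the same payload twice yields different ciphertexts, which is the nondeterminism accepted above.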
+
+ ### Encryption-in-OAK Backward Compatibility
+
+ Before encryption was added, the format identifier for OAK strings
+ was `'oak_3'`.
+
+ To indicate that we are making a non-backward-compatible change, I am
+ bumping that up to `'oak_4'` for encrypted strings.
+
+ Legacy OAK3 strings are still supported, both on read and on write.
+
+ By default, OAK4 is used only when encryption is requested.
+
+ ### Encryption-in-OAK Regarding Compression vs Encryption
+
+ Note that compression of encrypted strings is next to useless. By
+ design, encryption algorithms obscure exploitable redundancy in
+ plaintext and produce incompressible ciphertext.
+
+ On the other hand, in the wild there have been a handful of successful
+ chosen-plaintext attacks on compress-then-encrypt encodings. See:
+
+ - https://blog.appcanary.com/2016/encrypt-or-compress.html
+ - https://en.wikipedia.org/wiki/CRIME
+
+ OAK4 supports compression and does compression-then-encryption.
+
+ The extremely paranoid are encouraged to use `compression: :none`.
+ Note however that the source data may already be compressed.
+ Furthermore, for larger objects FRIZZY itself is, in part, a
+ compression algorithm.
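Both halves of that ordering argument can be demonstrated with stdlib pieces. A generic compress-then-encrypt sketch, not OAK4's exact framing (the `encrypt` helper here is invented for the illustration):

```ruby
require 'openssl'
require 'zlib'

# Illustrative helper: AES-256-GCM with a fresh random IV.
def encrypt(plain, key)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  [iv, cipher.update(plain) + cipher.final, cipher.auth_tag]
end

key     = OpenSSL::Random.random_bytes(32)
payload = 'redundant ' * 1_000

# Compress first: the plaintext's redundancy is available to DEFLATE ...
ct_small = encrypt(Zlib.deflate(payload), key)[1]
# ... encrypt first, and the incompressible ciphertext no longer shrinks.
ct_big   = Zlib.deflate(encrypt(payload, key)[1])

ct_small.bytesize < ct_big.bytesize  # => true
```

This is exactly why the CRIME-style attacks target compress-then-encrypt: the ciphertext length leaks how compressible the plaintext was.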