cborg 4.0.9 → 4.1.1

package/CHANGELOG.md CHANGED
@@ -1,3 +1,22 @@
+ ## [4.1.1](https://github.com/rvagg/cborg/compare/v4.1.0...v4.1.1) (2024-03-11)
+
+
+ ### Trivial Changes
+
+ * add byte strings example using tags to retain typedarray types ([5bf989b](https://github.com/rvagg/cborg/commit/5bf989bf7dde0a6124fb1605f3caf899273c6d4e))
+
+ ## [4.1.0](https://github.com/rvagg/cborg/compare/v4.0.9...v4.1.0) (2024-02-29)
+
+
+ ### Features
+
+ * export `Tokenizer`, document how it can be used ([fbba395](https://github.com/rvagg/cborg/commit/fbba395b6848bd8dcb66a1d18c36e1033581b5ef))
+
+
+ ### Bug Fixes
+
+ * update types for new exports ([986035d](https://github.com/rvagg/cborg/commit/986035dbffbc682b06d7bfecb8d2c3cc02048429))
+
  ## [4.0.9](https://github.com/rvagg/cborg/compare/v4.0.8...v4.0.9) (2024-02-07)


package/README.md CHANGED
@@ -31,10 +31,12 @@
  * [`encodedLength(data[, options])`](#encodedlengthdata-options)
  * [Type encoders](#type-encoders)
  * [Tag decoders](#tag-decoders)
+ * [Decoding with a custom tokeniser](#decoding-with-a-custom-tokeniser)
  * [Deterministic encoding recommendations](#deterministic-encoding-recommendations)
  * [Round-trip consistency](#round-trip-consistency)
  * [JSON mode](#json-mode)
  * [Example](#example-1)
+ * [Advanced types and tags](#advanced-types-and-tags)
  * [License and Copyright](#license-and-copyright)

  ## Example
  ## Example
@@ -243,7 +245,7 @@ Decode valid CBOR bytes from a `Uint8Array` (or `Buffer`) and return a JavaScrip
  * `rejectDuplicateMapKeys` (boolean, default `false`): when the decoder encounters duplicate keys for the same map, an error will be thrown when this option is set. This is an additional _strictness_ option, disallowing data-hiding and reducing the number of same-data different-bytes possibilities where it matters.
  * `retainStringBytes` (boolean, default `false`): when decoding strings, retain the original bytes on the `Token` object as `byteValue`. Since it is possible to encode non-UTF-8 characters in strings in CBOR, and JavaScript doesn't properly handle non-UTF-8 in its conversion from bytes (`TextEncoder` or `Buffer`), this can result in a loss of data (and an inability to round-trip). Where this is important, a token stream should be consumed instead of a plain `decode()` and the `byteValue` property on string tokens can be inspected (see [lib/diagnostic.js](lib/diagnostic.js) for an example of its use.)
  * `tags` (array): a mapping of tag number to tag decoder function. By default no tags are supported. See [Tag decoders](#tag-decoders).
- * `tokenizer` (object): an object with two methods, `next()` which returns a `Token` and `done()` which returns a `boolean`. Can be used to implement custom input decoding. See the source code for examples.
+ * `tokenizer` (object): an object with three methods: `next()`, which returns a `Token`; `done()`, which returns a `boolean`; and `pos()`, which returns the current byte position being decoded. Can be used to implement custom input decoding. See the source code for examples. (Note that the en-US spelling "tokenizer" is used for exported methods and types, while these docs otherwise use "tokeniser".)

  ### `decodeFirst(data[, options])`

@@ -383,6 +385,45 @@ function bigNegIntDecoder (bytes) {
  }
  ```

+ ## Decoding with a custom tokeniser
+
+ `decode()` allows overriding the `tokenizer` option to provide a custom tokeniser. This object can be described with the following interface:
+
+ ```typescript
+ export interface DecodeTokenizer {
+   next(): Token,
+   done(): boolean,
+   pos(): number,
+ }
+ ```
+
+ `next()` should return the next token in the stream, `done()` should return `true` when the stream is finished, and `pos()` should return the current byte position in the stream.
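To make the shape concrete, here is a minimal stand-in object satisfying the `DecodeTokenizer` interface. This is a sketch only: it replays a fixed token list rather than parsing real CBOR bytes, and the plain-object tokens with an `encodedLength` field are illustrative stand-ins, not cborg's `Token` class.

```js
// A DecodeTokenizer-shaped object that replays precomputed tokens.
// A real implementation would parse CBOR bytes incrementally.
function arrayTokenizer (tokens) {
  let index = 0
  return {
    next () {
      // Hand back the next token and advance
      const token = tokens[index]
      index += 1
      return token
    },
    done () {
      // True once every token has been consumed
      return index >= tokens.length
    },
    pos () {
      // In a real tokeniser this is the byte offset into the input; here we
      // sum the encoded lengths recorded on the stand-in tokens consumed so far
      return tokens.slice(0, index).reduce((n, t) => n + t.encodedLength, 0)
    }
  }
}

const t = arrayTokenizer([
  { type: 'uint', value: 1, encodedLength: 1 },
  { type: 'string', value: 'hi', encodedLength: 3 }
])
console.log(t.done()) // false
t.next()
t.next()
console.log(t.done(), t.pos()) // true 4
```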
+
+ Overriding the default tokeniser can be useful for changing the decoding rules. For example, it is used to turn cborg into a JSON decoder by changing how bytes are parsed into tokens. See the source code for how this works.
+
+ The default `Tokenizer` class is available from the default export. Providing `options.tokenizer = new Tokenizer(bytes, options)` would result in the same decode path using this tokeniser. However, this can also be used to override or modify default decode paths by intercepting the token stream. For example, to perform a decode that disallows bytes, the following code would work:
+
+ ```js
+ import { decode, Tokenizer, Type } from 'cborg'
+
+ class CustomTokeniser extends Tokenizer {
+   next () {
+     const nextToken = super.next()
+     if (nextToken.type === Type.bytes) {
+       throw new Error('Unsupported type: bytes')
+     }
+     return nextToken
+   }
+ }
+
+ function customDecode (data, options) {
+   options = Object.assign({}, options, {
+     tokenizer: new CustomTokeniser(data, options)
+   })
+   return decode(data, options)
+ }
+ ```
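The same intercept pattern can be sketched standalone, without cborg installed. `BaseTokenizer` and the string token types below are hypothetical stand-ins for illustration, not cborg's real classes; only the subclass-and-veto shape carries over.

```js
// Stand-in base tokeniser: replays a fixed token list
class BaseTokenizer {
  constructor (tokens) {
    this.tokens = tokens
    this.index = 0
  }

  next () { return this.tokens[this.index++] }
  done () { return this.index >= this.tokens.length }
}

// Override next() to veto a token type before it reaches the decoder,
// mirroring the CustomTokeniser example above
class NoBytesTokenizer extends BaseTokenizer {
  next () {
    const token = super.next()
    if (token.type === 'bytes') {
      throw new Error('Unsupported type: bytes')
    }
    return token
  }
}

const ok = new NoBytesTokenizer([{ type: 'uint', value: 1 }])
console.log(ok.next().value) // 1

const bad = new NoBytesTokenizer([{ type: 'bytes', value: new Uint8Array(2) }])
try {
  bad.next()
} catch (err) {
  console.log(err.message) // Unsupported type: bytes
}
```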
426
+
386
427
  ## Deterministic encoding recommendations
387
428
 
388
429
  cborg is designed with deterministic encoding forms as a primary feature. It is suitable for use with content addressed systems or other systems where convergence of binary forms is important. The ideal is to have strictly _one way_ of mapping a set of data into a binary form. Unfortunately CBOR has many opportunities for flexibility, including:
@@ -428,7 +469,7 @@ There are a number of forms where an object will not round-trip precisely, if th

  Use `import { encode, decode, decodeFirst } from 'cborg/json'` to access the JSON handling encoder and decoder.

- Many of the same encode and decode options available for CBOR can be used to manage JSON handling. These include strictness requirements for decode and custom tag encoders for encode. Tag encoders can't create new tags as there are no tags in JSON, but they can replace JavaScript object forms with custom JSON forms (e.g. convert a `Uint8Array` to a valid JSON form rather than having the encoder throw an error). The inverse is also possible, turning specific JSON forms into JavaScript forms, by using a custom tokenizer on decode.
+ Many of the same encode and decode options available for CBOR can be used to manage JSON handling. These include strictness requirements for decode and custom tag encoders for encode. Tag encoders can't create new tags as there are no tags in JSON, but they can replace JavaScript object forms with custom JSON forms (e.g. convert a `Uint8Array` to a valid JSON form rather than having the encoder throw an error). The inverse is also possible, turning specific JSON forms into JavaScript forms, by using a custom tokeniser on decode.

  Special notes on options specific to the JSON:

@@ -461,6 +502,10 @@ encoded: Uint8Array(34) [
  encoded (string): {"this":{"is":"JSON!","yay":true}}
  ```

+ ## Advanced types and tags
+
+ As demonstrated above, the ability to provide custom `typeEncoders` to `encode()`, and `tags` and even a custom `tokenizer` to `decode()`, allows for quite a bit of flexibility in manipulating both the encode and decode process. An advanced example that uses all of these features can be found in [example-bytestrings.js](./example-bytestrings.js), which demonstrates how one might implement [RFC 8746](https://www.rfc-editor.org/rfc/rfc8746.html) to allow typed arrays to round-trip through CBOR and retain their original types. Since cborg is designed to speak purely in terms of `Uint8Array`s, its default behaviour will squash all typed arrays down to their byte array forms and materialise them as plain `Uint8Array`s. Where round-trip fidelity is important and CBOR tags are acceptable, this form of usage may be appropriate.
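The core mechanics that such an RFC 8746 round-trip relies on can be sketched standalone in plain JavaScript (this is not taken from the linked example file): a typed array may be a view over a larger `ArrayBuffer`, so its `byteOffset` and `byteLength` must be honoured when reading its bytes, and the byte form can later be rebuilt into the original typed-array type.

```js
// A BigUint64Array viewing only part of a larger backing buffer
const backing = new ArrayBuffer(32)
const view = new BigUint64Array(backing, 8, 2) // 2 elements starting at byte 8
view[0] = 10000000000000000n
view[1] = 20000000000000000n

// Down to bytes: what a tag encoder would emit as the tag's content.
// Using buffer/byteOffset/byteLength captures only the viewed region.
const bytes = new Uint8Array(view.buffer, view.byteOffset, view.byteLength)

// Back up to the typed array: what a tag decoder would do. Copy first so the
// restored array is not aliased to the original backing buffer.
const copy = bytes.slice()
const restored = new BigUint64Array(copy.buffer, copy.byteOffset, copy.byteLength / 8)

console.log(restored[0] === view[0] && restored[1] === view[1]) // true
```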
+
  ## License and Copyright

  Copyright 2020 Rod Vagg
package/cborg.js CHANGED
@@ -1,5 +1,5 @@
  import { encode } from './lib/encode.js'
- import { decode, decodeFirst } from './lib/decode.js'
+ import { decode, decodeFirst, Tokeniser, tokensToObject } from './lib/decode.js'
  import { Token, Type } from './lib/token.js'

  /**
@@ -14,6 +14,8 @@ import { Token, Type } from './lib/token.js'
  export {
    decode,
    decodeFirst,
+   Tokeniser as Tokenizer,
+   tokensToObject,
    encode,
    Token,
    Type
package/example-bytestrings.js ADDED
@@ -0,0 +1,180 @@
+ /*
+ RFC 8746 defines a set of tags to use for typed arrays. Out of the box, cborg doesn't care about
+ tags and just squashes all concerns around byte arrays to Uint8Array with major type 2. This is
+ fine for most use cases, but it is lossy: you can't round-trip and retain your original type.
+
+ This example shows how to use cborg to round-trip a typed array with tags.
+
+ https://www.rfc-editor.org/rfc/rfc8746.html
+ */
+
+ import { encode, decode, Token, Tokenizer, Type } from 'cborg'
+
+ const tagUint8Array = 64
+ const tagUint64Array = 71
+ // etc... see https://www.rfc-editor.org/rfc/rfc8746.html#name-iana-considerations
+
+ /* ENCODERS */
+
+ /**
+  * @param {any} obj
+  * @returns {Token[]}
+  */
+ function uint8ArrayEncoder (obj) {
+   if (!(obj instanceof Uint8Array)) {
+     throw new Error('expected Uint8Array')
+   }
+   return [
+     new Token(Type.tag, tagUint8Array),
+     new Token(Type.bytes, obj)
+   ]
+ }
+
+ /**
+  * @param {any} obj
+  * @returns {Token[]}
+  */
+ function uint64ArrayEncoder (obj) {
+   if (!(obj instanceof BigUint64Array)) {
+     throw new Error('expected BigUint64Array')
+   }
+   return [
+     new Token(Type.tag, tagUint64Array),
+     // Convert the BigUint64Array to a Uint8Array, paying attention to the possibility of it
+     // being a view of a larger ArrayBuffer.
+     new Token(Type.bytes, new Uint8Array(obj.buffer, obj.byteOffset, obj.byteLength))
+   ]
+ }
+
+ // etc...
+
+ const typeEncoders = {
+   Uint8Array: uint8ArrayEncoder,
+   BigUint64Array: uint64ArrayEncoder
+ }
+
+ /* DECODERS */
+
+ /**
+  * @param {ArrayBuffer} bytes
+  * @returns {any}
+  */
+ function uint8ArrayDecoder (bytes) {
+   if (!(bytes instanceof ArrayBuffer)) {
+     throw new Error('expected ArrayBuffer')
+   }
+   return new Uint8Array(bytes)
+ }
+
+ /**
+  * @param {ArrayBuffer} bytes
+  * @returns {any}
+  */
+ function uint64ArrayDecoder (bytes) {
+   if (!(bytes instanceof ArrayBuffer)) {
+     throw new Error('expected ArrayBuffer')
+   }
+   return new BigUint64Array(bytes)
+ }
+
+ // etc...
+
+ const tags = []
+ tags[tagUint8Array] = uint8ArrayDecoder
+ tags[tagUint64Array] = uint64ArrayDecoder
+
+ /* TOKENIZER */
+
+ // We have to deal with the fact that cborg talks in Uint8Arrays but we now want it to treat major 2
+ // as ArrayBuffers, so we have to transform the token stream to replace the Uint8Array with an
+ // ArrayBuffer.
+
+ class ArrayBufferTransformingTokeniser extends Tokenizer {
+   next () {
+     const nextToken = super.next()
+     if (nextToken.type === Type.bytes) {
+       // Transform the (assumed) Uint8Array value to an ArrayBuffer of the same bytes. Note though
+       // that all tags we care about are going to be <tag><bytes>, so we're also transforming those
+       // into ArrayBuffers, meaning our tag decoders need to also assume they are getting
+       // ArrayBuffers now. An alternative would be to watch the token stream for <tag> and not
+       // transform the next token if it's <bytes>, but that's a bit more complicated for demo
+       // purposes.
+       nextToken.value = nextToken.value.buffer
+     }
+     return nextToken
+   }
+ }
+
+ // Optional: a new decode() wrapper, mainly so we don't have to deal with the complications of
+ // instantiating a Tokenizer, which needs both the data and the options.
+ function byteStringDecoder (data, options) {
+   options = Object.assign({}, options, {
+     tags,
+     tokenizer: new ArrayBufferTransformingTokeniser(data, options)
+   })
+   return decode(data, options)
+ }
116
+
117
+ /* ROUND-TRIP */
118
+
119
+ const original = {
120
+ u8: new Uint8Array([1, 2, 3, 4, 5]),
121
+ u64: new BigUint64Array([10000000000000000n, 20000000000000000n, 30000000000000000n, 40000000000000000n, 50000000000000000n]),
122
+ ab: new Uint8Array([6, 7, 8, 9, 10]).buffer
123
+ }
124
+
125
+ const encoded = encode(original, { typeEncoders })
126
+
127
+ const decoded = byteStringDecoder(encoded)
128
+
129
+ console.log('Original:', original)
130
+ console.log('Encoded:', Buffer.from(encoded).toString('hex')) // excuse the Buffer, sorry browser peeps
131
+ console.log('Decoded:', decoded)
132
+
133
+ /* Output:
134
+
135
+ Original: {
136
+ u8: Uint8Array(5) [ 1, 2, 3, 4, 5 ],
137
+ u64: BigUint64Array(5) [
138
+ 10000000000000000n,
139
+ 20000000000000000n,
140
+ 30000000000000000n,
141
+ 40000000000000000n,
142
+ 50000000000000000n
143
+ ],
144
+ ab: ArrayBuffer { [Uint8Contents]: <06 07 08 09 0a>, byteLength: 5 }
145
+ }
146
+ Encoded: a362616245060708090a627538d84045010203040563753634d84758280000c16ff2862300000082dfe40d47000000434fd7946a00000004bfc91b8e000000c52ebca2b100
147
+ Decoded: {
148
+ ab: ArrayBuffer { [Uint8Contents]: <06 07 08 09 0a>, byteLength: 5 },
149
+ u8: Uint8Array(5) [ 1, 2, 3, 4, 5 ],
150
+ u64: BigUint64Array(5) [
151
+ 10000000000000000n,
152
+ 20000000000000000n,
153
+ 30000000000000000n,
154
+ 40000000000000000n,
155
+ 50000000000000000n
156
+ ]
157
+ }
158
+
159
+ */
160
+
161
+ /* Diagnostic:
162
+
163
+ $ cborg hex2diag a362616245060708090a627538d84045010203040563753634d84758280000c16ff2862300000082dfe40d47000000434fd7946a00000004bfc91b8e000000c52ebca2b100
164
+ a3 # map(3)
165
+ 62 # string(2)
166
+ 6162 # "ab"
167
+ 45 # bytes(5)
168
+ 060708090a # "\x06\x07\x08\x09\x0a"
169
+ 62 # string(2)
170
+ 7538 # "u8"
171
+ d8 40 # tag(64)
172
+ 45 # bytes(5)
173
+ 0102030405 # "\x01\x02\x03\x04\x05"
174
+ 63 # string(3)
175
+ 753634 # "u64"
176
+ d8 47 # tag(71)
177
+ 58 28 # bytes(40)
178
+ 0000c16ff2862300000082dfe40d47000000434fd7 # "\x00\x00Áoò\x86#\x00\x00\x00\x82ßä\x0dG\x00\x00\x00CO×"
179
+ 946a00000004bfc91b8e000000c52ebca2b100 # "\x94j\x00\x00\x00\x04¿É\x1b\x8e\x00\x00\x00Å.¼¢±\x00
180
+ */
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "cborg",
-   "version": "4.0.9",
+   "version": "4.1.1",
    "description": "Fast CBOR with a focus on strictness",
    "main": "cborg.js",
    "type": "module",
package/types/cborg.d.ts CHANGED
@@ -16,8 +16,10 @@ export type DecodeOptions = import('./interface').DecodeOptions;
  export type EncodeOptions = import('./interface').EncodeOptions;
  import { decode } from './lib/decode.js';
  import { decodeFirst } from './lib/decode.js';
+ import { Tokeniser } from './lib/decode.js';
+ import { tokensToObject } from './lib/decode.js';
  import { encode } from './lib/encode.js';
  import { Token } from './lib/token.js';
  import { Type } from './lib/token.js';
- export { decode, decodeFirst, encode, Token, Type };
+ export { decode, decodeFirst, Tokeniser as Tokenizer, tokensToObject, encode, Token, Type };
  //# sourceMappingURL=cborg.d.ts.map
package/types/cborg.d.ts.map CHANGED
@@ -1 +1 @@
- {"version":3,"file":"cborg.d.ts","sourceRoot":"","sources":["../cborg.js"],"names":[],"mappings":";;;yBAMa,OAAO,aAAa,EAAE,UAAU;;;;0BAEhC,OAAO,aAAa,EAAE,mBAAmB;;;;4BACzC,OAAO,aAAa,EAAE,aAAa;;;;4BACnC,OAAO,aAAa,EAAE,aAAa;uBATZ,iBAAiB;4BAAjB,iBAAiB;uBAD9B,iBAAiB;sBAEZ,gBAAgB;qBAAhB,gBAAgB"}
+ {"version":3,"file":"cborg.d.ts","sourceRoot":"","sources":["../cborg.js"],"names":[],"mappings":";;;yBAMa,OAAO,aAAa,EAAE,UAAU;;;;0BAEhC,OAAO,aAAa,EAAE,mBAAmB;;;;4BACzC,OAAO,aAAa,EAAE,aAAa;;;;4BACnC,OAAO,aAAa,EAAE,aAAa;uBATe,iBAAiB;4BAAjB,iBAAiB;0BAAjB,iBAAiB;+BAAjB,iBAAiB;uBADzD,iBAAiB;sBAEZ,gBAAgB;qBAAhB,gBAAgB"}