wscodec 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,13 +1,17 @@
  # wscodec
 
- Pure-JS codec for Soulmask `actor_data` property streams the binary
- payload UE 4.27 emits for every actor's serialized state, with the
- Soulmask-specific quirks layered on top.
+ Pure-JS codec for Soulmask `actor_data` property streams. Soulmask is a
+ survival game whose dedicated server stores world state in a `world.db`
+ SQLite file; every actor's serialized state lives in the `actor_data`
+ column as an LZ4-compressed UE 4.27 `FPropertyTag` byte stream with a
+ few Soulmask-specific quirks layered on top.
+
+ wscodec parses the property stream into a JavaScript object tree and
+ serializes it back. Repo: https://github.com/auroris/SoulmaskCodec.
 
  Zero runtime dependencies. Accepts uncompressed bytes, returns
  JavaScript objects, and vice versa. Round-trip is byte-identical
- against every actor in a tested `world.db` (174.6 MB across 11,667
- rows; `npm test`).
+ against every actor in a tested `world.db` (`npm test`).
 
  ## Scope
 
@@ -23,7 +27,7 @@ actor_data column bytes
  ```
 
  wscodec handles the bottom half (the bytes that come out of LZ4
- decompression). The caller handles LZ4 see "LZ4 integration" below
+ decompression). The caller handles LZ4. See "LZ4 integration" below
  for a copy-paste recipe with `lz4-wasm-nodejs`.
 
  The SQLite `actor_table.data_version` column stores the NEGATIVE of
@@ -31,11 +35,36 @@ the on-wire DataVersion. A healthy blob with wire `DataVersion=2`
  lives in a row whose `data_version` column reads `-2`. The wire
  bytes themselves are always the unsigned `0x00000002`.
 
- ## Install
+ ## Setup
 
- ```sh
- npm install wscodec
- ```
+ wscodec itself has zero runtime dependencies, but a realistic workflow
+ also needs LZ4 decompression and a SQLite reader. The recommended
+ stack:
+
+ 1. **Node.js LTS.** Install from <https://nodejs.org/>. On Windows, tick
+ the "Automatically install the necessary tools" checkbox in the
+ installer; this pulls in the Visual Studio Build Tools and Python
+ that `better-sqlite3` needs to compile its native bindings. Without
+ them `npm install better-sqlite3` will fail with a node-gyp error.
+
+ 2. **Install wscodec:**
+
+ ```sh
+ npm install wscodec
+ ```
+
+ 3. **Install the LZ4 + SQLite peers** when you need them:
+
+ ```sh
+ npm install lz4-wasm-nodejs better-sqlite3
+ ```
+
+ - `lz4-wasm-nodejs` is pure WASM, no build step.
+ - `better-sqlite3` builds native bindings (hence the optional tools above).
+
+ The test suite uses both peers; if you're only consuming wscodec
+ programmatically against bytes you already have in memory, neither
+ peer is required.
 
  ## API
 
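The `data_version` sign convention in the hunk above is easy to get backwards when writing SQL filters or cross-checking a blob. This is a minimal sketch, not part of wscodec: `columnToWireVersion` and `wireVersionBytes` are hypothetical names, and the little-endian byte order is an assumption based on the `Cursor`/`Writer` DataView calls in `io.mjs`, which all pass `true` for little-endian. It only converts the number and says nothing about where the version tag sits inside a blob.

```javascript
// Hypothetical helpers illustrating the README's sign convention:
// actor_table.data_version stores the NEGATIVE of the on-wire DataVersion.

/** Convert the SQLite column value (e.g. -2) to the wire DataVersion (e.g. 2). */
function columnToWireVersion(columnValue) {
  if (!Number.isInteger(columnValue) || columnValue > 0) {
    // A positive column value would contradict the "stored negated" convention.
    throw new RangeError(`expected a non-positive integer, got ${columnValue}`);
  }
  return 0 - columnValue; // 0 - x avoids producing -0 for a zero column
}

/** Encode a wire DataVersion as 4 unsigned bytes (little-endian is an assumption). */
function wireVersionBytes(wireVersion) {
  const out = new Uint8Array(4);
  new DataView(out.buffer).setUint32(0, wireVersion >>> 0, true);
  return out;
}

const wire = columnToWireVersion(-2);
console.log(wire);                               // 2
console.log(Array.from(wireVersionBytes(wire))); // [ 2, 0, 0, 0 ]
```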
@@ -44,9 +73,9 @@ npm install wscodec
  ```js
  import { UnrealBlob } from 'wscodec';
 
- const blob = UnrealBlob.decode(uncompressedBytes); // Uint8Array blob
- const bytes = blob.serialize(); // blob Uint8Array
- UnrealBlob.detect(u8); // sniff version tag boolean
+ const blob = UnrealBlob.decode(uncompressedBytes); // Uint8Array to blob
+ const bytes = blob.serialize(); // blob to Uint8Array
+ UnrealBlob.detect(u8); // sniff version tag, returns boolean
  ```
 
  `UnrealBlob.decode(u8)` parses the version tag + property stream and
@@ -68,25 +97,28 @@ is true it re-emits the property stream from `properties` via
  `writePropertyStream`.
 
  `blob.findProperty(name)` returns the first top-level property whose
- tag name matches, or `null`.
+ tag name matches, or `null`. It does NOT traverse into embedded
+ streams, struct values, array elements, or map entries; use
+ `blob.findPropertyDeep(name)` for a depth-first walk across the whole
+ property tree.
 
  ### Property tree
 
  `blob.properties` is an array of `Property` instances. Each carries
- a `PropertyTag` (`name`, `type`, `size`, ) and a `value` whose
+ a `PropertyTag` (`name`, `type`, `size`, ...) and a `value` whose
  JavaScript shape depends on the tag's type:
 
  | tag type | value shape |
  |---|---|
- | `IntProperty`, `FloatProperty`, `BoolProperty`, | plain JS primitive |
+ | `IntProperty`, `FloatProperty`, `BoolProperty`, ... | plain JS primitive |
  | `StrProperty`, `NameProperty` | string / `FName` |
- | `StructProperty` | `StructValue` `.value` is either a plain object (known binary structs like `Vector`, `Quat`, `Transform`, `Guid`, …) or a nested property array |
+ | `StructProperty` | `StructValue`. `.value` is either a plain object for known binary structs (`Vector`, `Quat`, `Transform`, ...), an `FGuid` instance for the `Guid` struct, or a nested property array for unknown structs |
  | `ArrayProperty`, `SetProperty` | `ArrayValue` / `SetValue` with `.elements` |
- | `MapProperty` | `MapValue` with `.entries: [[key, value], ]` |
+ | `MapProperty` | `MapValue` with `.entries: [[key, value], ...]` |
  | `ObjectProperty`, `ClassProperty`, `Weak*`, `Lazy*`, `WSObjectProperty` | `ObjectRef` (kind + optional path/classPath/embedded stream) |
  | `SoftObjectProperty`, `SoftClassProperty` | `SoftObjectRef` (`assetPath`, `subPath`) |
  | `TextProperty` | `FTextValue` (handles UE4 FText history types -1, 0, 2, 4) |
- | anything wscodec couldn't structurally decode | `OpaqueValue` bytes retained verbatim |
+ | anything wscodec couldn't structurally decode | `OpaqueValue`. Bytes retained verbatim |
 
  Submodule re-exports make the value classes importable directly:
 
@@ -97,33 +129,103 @@ import { FName, FGuid } from 'wscodec';
  ```
 
  Lower-level helpers (`Cursor`, `Writer`, `readPropertyStream`,
- `writePropertyStream`, `readValue`, `writeValue`, `STRUCT_HANDLERS`)
- are also exported for callers building custom workflows on top.
+ `writePropertyStream`, `readValue`, `writeValue`, `STRUCT_HANDLERS`,
+ `registerStructHandler`) are also exported for callers building
+ custom workflows on top.
+
+ ### Extending the struct registry
+
+ `STRUCT_HANDLERS` is a mutable registry of binary struct handlers
+ (`Vector`, `Quat`, `Transform`, `Guid`, ...). Unknown struct names
+ fall through to the nested-property-stream path, which is correct
+ when the struct is tagged and byte-identical via `OpaqueValue` when
+ it isn't. To teach the codec a new binary shape, use
+ `registerStructHandler`:
+
+ ```js
+ import { registerStructHandler, Cursor, Writer } from 'wscodec';
+
+ registerStructHandler('MyVector', {
+ read: (c) => ({ x: c.readFloat32(), y: c.readFloat32(), z: c.readFloat32() }),
+ write: (w, v) => { w.writeFloat32(v.x); w.writeFloat32(v.y); w.writeFloat32(v.z); },
+ });
+ ```
+
+ The helper validates that both `read(cursor)` and `write(writer, value)`
+ are functions. Register before calling `UnrealBlob.decode` on any blob
+ that uses the type.
 
  ### Editing
 
+ The library does not provide typed mutators. Callers manipulate the
+ `properties` tree directly, then set `_dirty` on the ROOT blob to
+ force a re-encode.
+
  ```js
- import { UnrealBlob } from 'wscodec';
+ import { UnrealBlob, FName } from 'wscodec';
 
  const blob = UnrealBlob.decode(inner);
 
- // Mutate the tree directly. The library does not provide typed
- // mutators; callers manipulate `properties` and set `_dirty` to
- // force re-encode.
+ // (1) Edit a primitive value.
+ // JianZhuHP (jianzhu = "building") is the building's HP property.
  blob.findProperty('JianZhuHP').value = 100;
+
+ // (2) Replace an FName-typed value. NameProperty values are FName
+ // instances, not bare strings.
+ blob.findProperty('CharacterClass').value = new FName('NPC_Skeleton');
+
+ // (3) Mutate a nested struct. Known binary structs (Vector, Quat, ...)
+ // expose .value as a plain object.
+ const transform = blob.findProperty('Transform');
+ transform.value.value.translation = { x: 100, y: 200, z: 50 };
+
+ // (4) Append to an array. ArrayValue.elements is a plain JS array.
+ const inventory = blob.findProperty('InventoryItems');
+ inventory.value.elements.push(new FName('Item_Wood'));
+
+ // (5) Remove an element. Just splice it out; don't set null.
+ inventory.value.elements.splice(0, 1);
+
+ // Always set _dirty on the ROOT blob (not on nested properties). The
+ // flag is read by blob.serialize() to decide pass-through vs re-encode.
  blob._dirty = true;
 
  const updatedBytes = blob.serialize(); // re-emits from properties
  ```
 
+ Gotchas:
+
+ - `_dirty` lives on the root `UnrealBlob`, not on nested `Property` /
+ `ArrayValue` / `StructValue` objects. Mutating a deep value without
+ setting `blob._dirty = true` returns the original `_raw` bytes
+ unchanged.
+ - `BoolProperty` values live in the `tag` (`tag.boolVal`), not in
+ `property.value`. To flip a bool, edit `prop.tag.boolVal`.
+ - Removing a property means splicing it out of `blob.properties`, not
+ setting `property.value = null`.
+ - If you change a value's encoded SIZE (e.g. extending an FString),
+ the property's `tag.size` is recomputed on write, but any property
+ that previously carried a `_sizeMismatch` annotation refuses to
+ re-emit. Such properties are extremely rare in healthy world.db
+ files and are reported by `npm test`.
+ - `serialize()` throws if `_dirty` is true AND `error` is set:
+ re-emitting from an empty properties array would produce a malformed
+ stream. Leave `_dirty=false` to pass through `_raw` verbatim, or
+ clear `.error` first if you've replaced `.properties` manually.
+ - 64-bit integer values (`Int64Property`, `UInt64Property`,
+ `DateTime`, `Timespan`) round-trip as decimal strings. If you
+ replace such a value with a Number, it must be a safe integer
+ (`|v| <= Number.MAX_SAFE_INTEGER`); otherwise the writer throws
+ rather than silently lose precision.
+
  `serialize()` for a dirty blob is byte-identical to a fresh
- `decode + serialize` cycle on its output verified on every row
- of the tested `world.db`.
+ `decode + serialize` cycle on its output, verified on every row of
+ the tested `world.db`.
 
  ## LZ4 integration
 
  `actor_data` column bytes come out of LZ4 compression. wscodec
- doesn't bundle an LZ4 implementation that's a caller concern. A
+ doesn't bundle an LZ4 implementation; that's a caller concern. A
  working recipe using `lz4-wasm-nodejs`:
 
  ```js
@@ -161,11 +263,11 @@ for (const row of db.prepare('SELECT actor_serial, actor_data FROM actor_table')
  }
  ```
 
- Note: LZ4 compression is not deterministic two compressors will
- produce different bytes for the same input. wscodec's
- byte-identical guarantee covers the inner property-stream bytes;
- the outer column bytes round-trip only for unmodified blobs (cache
- the input column bytes if you need that).
+ Note: LZ4 compression is not deterministic. Two compressors will
+ produce different bytes for the same input. wscodec's byte-identical
+ guarantee covers the inner property-stream bytes; the outer column
+ bytes round-trip only for unmodified blobs (cache the input column
+ bytes if you need that).
 
  ## Round-trip guarantees
 
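The LZ4 note in the hunk above suggests caching the input column bytes when the outer bytes must round-trip. A minimal sketch of that policy follows, assuming a blob object that exposes `_dirty` and `serialize()` as the README describes. `columnBytesForWrite` is a hypothetical helper, and `compress` is a placeholder for whichever LZ4 compressor you wire in; it is not a specific `lz4-wasm-nodejs` call.

```javascript
// Hypothetical helper: pick the bytes to write back to actor_table.actor_data.
// `compress` stands in for your LZ4 compressor; no real LZ4 API is assumed.
function columnBytesForWrite(originalColumnBytes, blob, compress) {
  if (!blob._dirty) {
    // Unmodified: reuse the cached input column bytes so the outer
    // (compressed) bytes round-trip exactly.
    return originalColumnBytes;
  }
  // Modified: re-encode the inner stream, then recompress. The recompressed
  // bytes will generally differ from the original column bytes.
  return compress(blob.serialize());
}

// Demo with stand-ins (no real LZ4 involved).
const original = Uint8Array.of(0xaa, 0xbb, 0xcc);
const fakeCompress = (inner) => Uint8Array.of(0xff, ...inner);
const cleanBlob = { _dirty: false, serialize: () => Uint8Array.of(1, 2) };
const dirtyBlob = { _dirty: true, serialize: () => Uint8Array.of(1, 2) };

console.log(columnBytesForWrite(original, cleanBlob, fakeCompress) === original); // true
console.log(Array.from(columnBytesForWrite(original, dirtyBlob, fakeCompress)));  // [ 255, 1, 2 ]
```

The design choice is simply to never recompress an unmodified blob: because two compressors may produce different bytes for the same input, reusing the cached input is the only way to guarantee the column is left unchanged.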
@@ -175,36 +277,56 @@ For every row in the tested `world.db`:
  - `blob.serialize()` with `_dirty = false` returns the input bytes byte-identical.
  - `blob.serialize()` with `_dirty = true` re-emits from `properties` and is byte-identical to the input.
 
- Coverage: 174,610,207 bytes across 11,667 actors, including every
- known Soulmask wire-format quirk:
-
- - `kind=0x01` ObjectProperty with the 4-byte actor-ref prefix.
- - Embedded ObjectProperty streams with the 4-byte FName.Number trailer
- (the Soulmask `JianZhuInstGLQComponent`-style nested format).
- - ArrayProperty<ObjectProperty> with the JianZhuInstYuanXings
- per-element placement-binary blocks (8-byte header + three
- stride/count sections per yuan-xing prototype).
- - ArrayProperty<TextProperty> elements with FText history types
- -1, 0, 2, and 4 (including the legacy UE3-style uint32 booleans in
- FNumberFormattingOptions).
- - SetProperty<StructProperty> with implicit FGuid struct keys.
- - Custom Soulmask Map<Struct,Struct> framing.
+ Coverage includes every known Soulmask wire-format quirk:
+
+ - **kind=0x01 ObjectProperty with the 4-byte actor-ref prefix.**
+ Soulmask's hard actor references (a pawn pointing at its inventory
+ actor, for example) prepend an extra 4-byte field between the kind
+ byte and the path FString. Observed value is always 1; semantic is
+ unknown but the bytes are captured and replayed verbatim.
+ - **Embedded ObjectProperty streams with the 4-byte FName.Number trailer.**
+ Some Soulmask nested ObjectProperty values (`JianZhuInstGLQComponent`
+ is the canonical example; `JianZhu` = "building") carry the
+ outermost-stream None trailer (a 4-byte FName.Number = 0) after their
+ embedded property stream, where stock UE 4.27 nested streams do not.
+ - **ArrayProperty<ObjectProperty> with per-element placement-binary blocks.**
+ `JianZhuInstYuanXings` arrays (`YuanXing` = "prototype", so
+ "building-zone yuan-xing" is the list of building-piece prototypes
+ inside a building zone) interleave a fixed-shape binary block after
+ each ObjectProperty element: an 8-byte header + three stride/count
+ sections (per-piece world transforms, ids, and aux data).
+ - **ArrayProperty<TextProperty> with mixed FText history types.**
+ Elements use history types -1 (culture-invariant), 0 (localized),
+ 2 (ordered format), and 4 (`FTextHistory_AsNumber`). History type 4
+ embeds a legacy UE3-style `FNumberFormattingOptions` whose boolean
+ fields are 4 bytes wide rather than the modern 1 byte; the codec
+ emits this correctly.
+ - **SetProperty<StructProperty> with implicit FGuid struct keys.**
+ Soulmask `SetProperty` declarations whose inner is `StructProperty`
+ don't carry an inner struct shape; every populated instance in
+ `world.db` uses raw 16-byte FGuids as elements.
+ - **Custom Soulmask Map<Struct,Struct> framing.** The guild-data maps
+ (`GongHuiMap`, `PlayerGongHuiDataMap`; `GongHui` = "guild") use a
+ non-standard layout. The map's tag.size lies (observed 632838 vs
+ actual 636422); pair shapes are detected by peeking at the next
+ bytes rather than trusting the declared size.
 
  ## Running the test
 
  ```sh
- git clone … # repo with world.db at the root
- cd repo/wscodec
+ git clone https://github.com/auroris/SoulmaskCodec.git
+ cd SoulmaskCodec
  npm install
- npm test # looks for world.db two dirs up by default
+ npm test # looks for world.db two dirs up by default
  # or
  node test/test-roundtrip.mjs /path/to/world.db
  ```
 
- Test deps: `lz4-wasm-nodejs` (LZ4 inside the test), `better-sqlite3`
- (reads the `world.db` SQLite file). Both are picked up via npm
- module resolution; if `better-sqlite3` isn't installed at the package
- root the test will surface that with a clear error.
+ Test deps: `lz4-wasm-nodejs` (LZ4 inside the test) and
+ `better-sqlite3` (reads the `world.db` SQLite file). Both are picked
+ up via npm module resolution; if `better-sqlite3` isn't installed at
+ the package root the test will surface that with a clear error. See
+ the Setup section above for the build-tools prerequisite on Windows.
 
  ## License
 
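The Editing gotchas in the README diff above describe three outcomes for `serialize()`: pass through `_raw` when `_dirty` is false, throw when `_dirty` is true and `error` is set, and re-encode from `properties` otherwise. The toy model below sketches that decision in isolation. `ToyBlob` is a hypothetical class, not wscodec's `UnrealBlob`, and a JSON encoder stands in for the real property-stream writer so the example runs without the package.

```javascript
// A toy model (NOT wscodec's UnrealBlob) of the serialize() decision the
// Editing section describes. `encodeProperties` is a hypothetical stand-in
// for wscodec's property-stream writer.
class ToyBlob {
  constructor(raw, properties, encodeProperties) {
    this._raw = raw;              // original inner bytes
    this.properties = properties; // decoded tree, mutated by callers
    this.error = null;            // set when decoding failed
    this._dirty = false;          // lives on the ROOT object only
    this._encode = encodeProperties;
  }

  serialize() {
    if (!this._dirty) return this._raw; // pass-through: mutations are ignored
    if (this.error) {
      // Re-emitting a tree that failed to decode would produce a malformed stream.
      throw new Error('refusing to re-emit: blob has a decode error');
    }
    return this._encode(this.properties);
  }
}

const encode = (props) => new TextEncoder().encode(JSON.stringify(props));
const blob = new ToyBlob(Uint8Array.of(9, 9), [{ name: 'JianZhuHP', value: 50 }], encode);

blob.properties[0].value = 100;
console.log(Array.from(blob.serialize())); // [ 9, 9 ]  (not dirty yet)

blob._dirty = true;
console.log(new TextDecoder().decode(blob.serialize())); // [{"name":"JianZhuHP","value":100}]
```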
package/io.mjs CHANGED
@@ -1,5 +1,5 @@
  /**
- * Cursor + Writer byte-level read/write primitives over a Uint8Array.
+ * Cursor + Writer: byte-level read/write primitives over a Uint8Array.
  *
  * No Unreal semantics here. FString lives here too because it's a stateful
  * read/write on the same DataView; everything else (FName, FGuid, structs,
@@ -15,8 +15,33 @@ export class Cursor {
  pos() { return this.offset; }
  eof() { return this.offset >= this.bytes.length; }
  remaining() { return this.bytes.length - this.offset; }
- skip(n) { this.offset += n; }
- seek(n) { this.offset = n; }
+
+ /**
+ * Advance the cursor by `n` bytes. Throws RangeError if `n` is negative or
+ * would take the cursor past the end of the buffer. Use `seek(n)` to jump
+ * to an absolute offset (including backwards).
+ */
+ skip(n) {
+ if (!Number.isFinite(n) || n < 0) {
+ throw new RangeError(`Cursor.skip: n must be a non-negative finite number, got ${n}`);
+ }
+ if (this.offset + n > this.bytes.length) {
+ throw new RangeError(`Cursor.skip: ${n} bytes from offset ${this.offset} exceeds buffer length ${this.bytes.length}`);
+ }
+ this.offset += n;
+ }
+
+ /**
+ * Move the cursor to absolute offset `n`. Throws RangeError if `n` is out
+ * of `[0, buffer.length]` (note: length is allowed; the cursor is then at
+ * EOF and any further read would throw).
+ */
+ seek(n) {
+ if (!Number.isFinite(n) || n < 0 || n > this.bytes.length) {
+ throw new RangeError(`Cursor.seek: offset ${n} out of range [0, ${this.bytes.length}]`);
+ }
+ this.offset = n;
+ }
 
  readUint8() { const v = this.dv.getUint8(this.offset); this.offset += 1; return v; }
  readInt8() { const v = this.dv.getInt8(this.offset); this.offset += 1; return v; }
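The new bounds checks on `skip` and `seek` matter because the two read paths in `Cursor` fail differently once the offset runs past the end: `readBytes` uses `subarray`, which clamps out-of-range indices, while the numeric readers go through `DataView`, which throws. The standalone snippet below uses plain typed arrays (no wscodec import) to show both behaviors; the variable names are illustrative only.

```javascript
// Why an unchecked `offset += n` is dangerous: the two read paths fail
// differently once the offset runs past the end of the buffer.
const bytes = Uint8Array.of(1, 2, 3, 4);
const dv = new DataView(bytes.buffer);

let offset = 0;
offset += 10; // what the old skip(n) allowed: no error, offset is now 10

// subarray() clamps out-of-range indices, so a readBytes-style call
// silently returns an EMPTY view instead of failing.
const view = bytes.subarray(offset, offset + 2);
console.log(view.length); // 0

// DataView getters, by contrast, throw RangeError on out-of-range offsets.
try {
  dv.getUint32(offset, true);
} catch (e) {
  console.log(e instanceof RangeError); // true
}
```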
@@ -28,6 +53,14 @@ export class Cursor {
  readInt64() { const v = this.dv.getBigInt64(this.offset, true); this.offset += 8; return v; }
  readFloat32() { const v = this.dv.getFloat32(this.offset, true); this.offset += 4; return v; }
  readFloat64() { const v = this.dv.getFloat64(this.offset, true); this.offset += 8; return v; }
+
+ /**
+ * Read `n` bytes and return them as a Uint8Array VIEW over the underlying
+ * buffer (no copy). The returned subarray shares storage with this cursor's
+ * buffer: mutating it mutates the buffer, and the view becomes stale if
+ * the buffer is detached. Callers that need to retain the bytes past the
+ * buffer's lifetime should `.slice()` the result.
+ */
  readBytes(n) { const out = this.bytes.subarray(this.offset, this.offset + n); this.offset += n; return out; }
 
  /**
@@ -94,8 +127,18 @@ export class Writer {
  writeInt16(v) { this._ensure(2); this.dv.setInt16(this.offset, v, true); this.offset += 2; }
  writeUint32(v) { this._ensure(4); this.dv.setUint32(this.offset, v >>> 0, true); this.offset += 4; }
  writeInt32(v) { this._ensure(4); this.dv.setInt32(this.offset, v | 0, true); this.offset += 4; }
- writeUint64(v) { this._ensure(8); this.dv.setBigUint64(this.offset, BigInt(v), true); this.offset += 8; }
- writeInt64(v) { this._ensure(8); this.dv.setBigInt64(this.offset, BigInt(v), true); this.offset += 8; }
+
+ /**
+ * Write a 64-bit unsigned integer. Accepts BigInt, a decimal string, or a
+ * safe-integer Number (|v| <= Number.MAX_SAFE_INTEGER = 2^53 - 1). A Number
+ * outside that range throws RangeError rather than silently losing precision
+ * via `BigInt(largeNumber)`. The codec's decoders return I64/U64 values as
+ * strings for this reason; this guard catches accidental mutation that
+ * substitutes an unsafe Number.
+ */
+ writeUint64(v) { this._ensure(8); this.dv.setBigUint64(this.offset, _toBigInt64(v, 'Writer.writeUint64'), true); this.offset += 8; }
+ /** Signed 64-bit integer. See writeUint64 for accepted value forms. */
+ writeInt64(v) { this._ensure(8); this.dv.setBigInt64(this.offset, _toBigInt64(v, 'Writer.writeInt64'), true); this.offset += 8; }
  writeFloat32(v) { this._ensure(4); this.dv.setFloat32(this.offset, v, true); this.offset += 4; }
  writeFloat64(v) { this._ensure(8); this.dv.setFloat64(this.offset, v, true); this.offset += 8; }
  writeBytes(u8) { this._ensure(u8.length); this.bytes.set(u8, this.offset); this.offset += u8.length; }
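The new `writeUint64`/`writeInt64` doc comment warns that `BigInt(largeNumber)` silently loses precision. The snippet below demonstrates that with standard JavaScript only, using 2^53 + 1 as the first integer a Number cannot hold exactly; it does not call wscodec.

```javascript
// 2^53 + 1 cannot be represented as a JS Number: the literal below is
// rounded to 2^53 before BigInt() ever sees it.
const unsafe = 9007199254740993;           // intended value: 2^53 + 1
console.log(Number.isSafeInteger(unsafe)); // false
console.log(BigInt(unsafe));               // 9007199254740992n (precision lost)

// A decimal string carries every digit, so BigInt() is exact.
console.log(BigInt('9007199254740993'));   // 9007199254740993n

// Writing both through a DataView shows the on-wire difference.
const dv = new DataView(new ArrayBuffer(16));
dv.setBigUint64(0, BigInt(unsafe), true);
dv.setBigUint64(8, BigInt('9007199254740993'), true);
console.log(dv.getBigUint64(0, true) === dv.getBigUint64(8, true)); // false
```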
@@ -148,3 +191,23 @@ export class Writer {
  }
  }
  }
+
+ /**
+ * Coerce a 64-bit integer value into a BigInt suitable for
+ * DataView.setBig{Int,Uint}64. Accepts BigInt directly; converts string and
+ * safe-integer Number; throws on unsafe Number or unsupported types.
+ *
+ * The motivation: `BigInt(largeNumber)` silently loses precision for
+ * |v| > 2^53. The decoder paths return I64/U64 values as decimal strings
+ * specifically to avoid this; tightening the writer's contract catches
+ * accidental round-trip-breaking mutation at the source.
+ */
+ function _toBigInt64(v, fnName) {
+ if (typeof v === 'bigint') return v;
+ if (typeof v === 'string') return BigInt(v);
+ if (typeof v === 'number') {
+ if (Number.isInteger(v) && Math.abs(v) <= Number.MAX_SAFE_INTEGER) return BigInt(v);
+ throw new RangeError(`${fnName}: Number ${v} is unsafe for 64-bit conversion (non-integer or |v| > 2^53). Pass a BigInt or decimal string.`);
+ }
+ throw new TypeError(`${fnName}: expected BigInt, string, or safe-integer Number; got ${typeof v}`);
+ }
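The `_toBigInt64` comment notes that the decoders return 64-bit values as decimal strings. A sketch of why that representation is lossless: read eight bytes as a BigInt, keep the decimal string, and write it back. `readUint64AsString` and `writeUint64FromString` are hypothetical helpers, not wscodec exports, and the little-endian flag mirrors the `io.mjs` DataView calls.

```javascript
// Sketch of a string-based 64-bit path: decode to a decimal string, then
// encode back. Helper names are hypothetical, not wscodec exports.
function readUint64AsString(dv, offset) {
  return dv.getBigUint64(offset, true).toString();
}

function writeUint64FromString(dv, offset, s) {
  dv.setBigUint64(offset, BigInt(s), true);
}

// Max unsigned 64-bit value: far beyond Number.MAX_SAFE_INTEGER.
const input = Uint8Array.of(0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff);
const s = readUint64AsString(new DataView(input.buffer), 0);
console.log(s); // 18446744073709551615

const output = new Uint8Array(8);
writeUint64FromString(new DataView(output.buffer), 0, s);
console.log(output.every((b, i) => b === input[i])); // true
```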