wscodec 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,13 +1,17 @@
1
1
  # wscodec
2
2
 
3
- Pure-JS codec for Soulmask `actor_data` property streams the binary
4
- payload UE 4.27 emits for every actor's serialized state, with the
5
- Soulmask-specific quirks layered on top.
3
+ Pure-JS codec for Soulmask `actor_data` property streams. Soulmask is a
4
+ survival game whose dedicated server stores world state in a `world.db`
5
+ SQLite file; every actor's serialized state lives in the `actor_data`
6
+ column as an LZ4-compressed UE 4.27 `FPropertyTag` byte stream with a
7
+ few Soulmask-specific quirks layered on top.
8
+
9
+ wscodec parses the property stream into a JavaScript object tree and
10
+ serializes it back. Repo: https://github.com/auroris/SoulmaskCodec.
6
11
 
7
12
  Zero runtime dependencies. Accepts uncompressed bytes, returns
8
13
  JavaScript objects, and vice versa. Round-trip is byte-identical
9
- against every actor in a tested `world.db` (174.6 MB across 11,667
10
- rows; `npm test`).
14
+ against every actor in a tested `world.db` (`npm test`).
11
15
 
12
16
  ## Scope
13
17
 
@@ -23,7 +27,7 @@ actor_data column bytes
23
27
  ```
24
28
 
25
29
  wscodec handles the bottom half (the bytes that come out of LZ4
26
- decompression). The caller handles LZ4 see "LZ4 integration" below
30
+ decompression). The caller handles LZ4. See "LZ4 integration" below
27
31
  for a copy-paste recipe with `lz4-wasm-nodejs`.
28
32
 
29
33
  The SQLite `actor_table.data_version` column stores the NEGATIVE of
@@ -31,11 +35,36 @@ the on-wire DataVersion. A healthy blob with wire `DataVersion=2`
31
35
  lives in a row whose `data_version` column reads `-2`. The wire
32
36
  bytes themselves are always the unsigned `0x00000002`.
33
37
 
34
- ## Install
38
+ ## Setup
35
39
 
36
- ```sh
37
- npm install wscodec
38
- ```
40
+ wscodec itself has zero runtime dependencies, but a realistic workflow
41
+ also needs LZ4 decompression and a SQLite reader. The recommended
42
+ stack:
43
+
44
+ 1. **Node.js LTS.** Install from <https://nodejs.org/>. On Windows, tick
45
+ the "Automatically install the necessary tools" checkbox in the
46
+ installer; this pulls in the Visual Studio Build Tools and Python
47
+ that `better-sqlite3` needs when it must compile its native bindings
48
+ (no prebuilt binary matches your Node version and platform). Without them that build fails with a node-gyp error.
49
+
50
+ 2. **Install wscodec:**
51
+
52
+ ```sh
53
+ npm install wscodec
54
+ ```
55
+
56
+ 3. **Install the LZ4 + SQLite peers** when you need them:
57
+
58
+ ```sh
59
+ npm install lz4-wasm-nodejs better-sqlite3
60
+ ```
61
+
62
+ - `lz4-wasm-nodejs` is pure WASM, no build step.
63
+ - `better-sqlite3` builds native bindings (hence the optional tools above).
64
+
65
+ The test suite uses both peers; if you're only consuming wscodec
66
+ programmatically against bytes you already have in memory, neither
67
+ peer is required.
39
68
 
40
69
  ## API
41
70
 
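Back to the `data_version` convention described in the Scope section. Below is a minimal sketch of checking a decompressed blob against its SQLite row.

It assumes, without confirmation from this README, that the wire DataVersion is a little-endian uint32 at the very start of the decompressed bytes. `wireDataVersion` and `matchesColumn` are hypothetical helpers, not wscodec exports.

```javascript
// Hypothetical helpers illustrating the data_version sign convention.
// ASSUMPTION: the wire DataVersion is a little-endian uint32 at `offset`
// (default 0) of the decompressed actor_data bytes.
function wireDataVersion(inner, offset = 0) {
  // Respect byteOffset so subarray views (e.g. from readBytes) work too.
  const dv = new DataView(inner.buffer, inner.byteOffset, inner.byteLength);
  return dv.getUint32(offset, true);
}

// actor_table.data_version stores the NEGATIVE of the on-wire value,
// so a wire DataVersion of 2 pairs with a column value of -2.
function matchesColumn(inner, columnValue) {
  return wireDataVersion(inner) === -columnValue;
}
```

Usage: `matchesColumn(innerBytes, row.data_version)` flags rows whose column value and wire version disagree. Comparing against `-columnValue` keeps the wire value unsigned, as the README describes.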
@@ -44,9 +73,9 @@ npm install wscodec
44
73
  ```js
45
74
  import { UnrealBlob } from 'wscodec';
46
75
 
47
- const blob = UnrealBlob.decode(uncompressedBytes); // Uint8Array blob
48
- const bytes = blob.serialize(); // blob Uint8Array
49
- UnrealBlob.detect(u8); // sniff version tag boolean
76
+ const blob = UnrealBlob.decode(uncompressedBytes); // Uint8Array to blob
77
+ const bytes = blob.serialize(); // blob to Uint8Array
78
+ UnrealBlob.detect(u8); // sniff version tag, returns boolean
50
79
  ```
51
80
 
52
81
  `UnrealBlob.decode(u8)` parses the version tag + property stream and
@@ -68,25 +97,28 @@ is true it re-emits the property stream from `properties` via
68
97
  `writePropertyStream`.
69
98
 
70
99
  `blob.findProperty(name)` returns the first top-level property whose
71
- tag name matches, or `null`.
100
+ tag name matches, or `null`. It does NOT traverse into embedded
101
+ streams, struct values, array elements, or map entries; use
102
+ `blob.findPropertyDeep(name)` for a depth-first walk across the whole
103
+ property tree.
72
104
 
73
105
  ### Property tree
74
106
 
75
107
  `blob.properties` is an array of `Property` instances. Each carries
76
- a `PropertyTag` (`name`, `type`, `size`, …) and a `value` whose
108
+ a `PropertyTag` (`name`, `type`, `size`, ...) and a `value` whose
77
109
  JavaScript shape depends on the tag's type:
78
110
 
79
111
  | tag type | value shape |
80
112
  |---|---|
81
- | `IntProperty`, `FloatProperty`, `BoolProperty`, … | plain JS primitive |
113
+ | `IntProperty`, `FloatProperty`, `BoolProperty`, ... | plain JS primitive |
82
114
  | `StrProperty`, `NameProperty` | string / `FName` |
83
- | `StructProperty` | `StructValue` `.value` is either a plain object (known binary structs like `Vector`, `Quat`, `Transform`, `Guid`, …) or a nested property array |
115
+ | `StructProperty` | `StructValue`. `.value` is either a plain object for known binary structs (`Vector`, `Quat`, `Transform`, ...), an `FGuid` instance for the `Guid` struct, or a nested property array for unknown structs |
84
116
  | `ArrayProperty`, `SetProperty` | `ArrayValue` / `SetValue` with `.elements` |
85
- | `MapProperty` | `MapValue` with `.entries: [[key, value], …]` |
117
+ | `MapProperty` | `MapValue` with `.entries: [[key, value], ...]` |
86
118
  | `ObjectProperty`, `ClassProperty`, `Weak*`, `Lazy*`, `WSObjectProperty` | `ObjectRef` (kind + optional path/classPath/embedded stream) |
87
119
  | `SoftObjectProperty`, `SoftClassProperty` | `SoftObjectRef` (`assetPath`, `subPath`) |
88
120
  | `TextProperty` | `FTextValue` (handles UE4 FText history types -1, 0, 2, 4) |
89
- | anything wscodec couldn't structurally decode | `OpaqueValue` bytes retained verbatim |
121
+ | anything wscodec couldn't structurally decode | `OpaqueValue` (bytes retained verbatim) |
90
122
 
91
123
  Submodule re-exports make the value classes importable directly:
92
124
 
@@ -97,33 +129,103 @@ import { FName, FGuid } from 'wscodec';
97
129
  ```
98
130
 
99
131
  Lower-level helpers (`Cursor`, `Writer`, `readPropertyStream`,
100
- `writePropertyStream`, `readValue`, `writeValue`, `STRUCT_HANDLERS`)
101
- are also exported for callers building custom workflows on top.
132
+ `writePropertyStream`, `readValue`, `writeValue`, `STRUCT_HANDLERS`,
133
+ `registerStructHandler`) are also exported for callers building
134
+ custom workflows on top.
135
+
136
+ ### Extending the struct registry
137
+
138
+ `STRUCT_HANDLERS` is a mutable registry of binary struct handlers
139
+ (`Vector`, `Quat`, `Transform`, `Guid`, ...). Unknown struct names
140
+ fall through to the nested-property-stream path, which decodes the
141
+ struct correctly when it is a tagged property stream and still round-trips it byte-identically via `OpaqueValue` when
142
+ it isn't. To teach the codec a new binary shape, use
143
+ `registerStructHandler`:
144
+
145
+ ```js
146
+ import { registerStructHandler, Cursor, Writer } from 'wscodec';
147
+
148
+ registerStructHandler('MyVector', {
149
+ read: (c) => ({ x: c.readFloat32(), y: c.readFloat32(), z: c.readFloat32() }),
150
+ write: (w, v) => { w.writeFloat32(v.x); w.writeFloat32(v.y); w.writeFloat32(v.z); },
151
+ });
152
+ ```
153
+
154
+ The helper validates that both `read(cursor)` and `write(writer, value)`
155
+ are functions. Register before calling `UnrealBlob.decode` on any blob
156
+ that uses the type.
102
157
 
103
158
  ### Editing
104
159
 
160
+ The library does not provide typed mutators. Callers manipulate the
161
+ `properties` tree directly, then set `_dirty` on the ROOT blob to
162
+ force a re-encode.
163
+
105
164
  ```js
106
- import { UnrealBlob } from 'wscodec';
165
+ import { UnrealBlob, FName } from 'wscodec';
107
166
 
108
167
  const blob = UnrealBlob.decode(inner);
109
168
 
110
- // Mutate the tree directly. The library does not provide typed
111
- // mutators; callers manipulate `properties` and set `_dirty` to
112
- // force re-encode.
169
+ // (1) Edit a primitive value.
170
+ // JianZhuHP (jianzhu = "building") is the building's HP property.
113
171
  blob.findProperty('JianZhuHP').value = 100;
172
+
173
+ // (2) Replace an FName-typed value. NameProperty values are FName
174
+ // instances, not bare strings.
175
+ blob.findProperty('CharacterClass').value = new FName('NPC_Skeleton');
176
+
177
+ // (3) Mutate a nested struct. Known binary structs (Vector, Quat, ...)
178
+ // expose .value as a plain object.
179
+ const transform = blob.findProperty('Transform');
180
+ transform.value.value.translation = { x: 100, y: 200, z: 50 };
181
+
182
+ // (4) Append to an array. ArrayValue.elements is a plain JS array.
183
+ const inventory = blob.findProperty('InventoryItems');
184
+ inventory.value.elements.push(new FName('Item_Wood'));
185
+
186
+ // (5) Remove an element. Just splice it out; don't set null.
187
+ inventory.value.elements.splice(0, 1);
188
+
189
+ // Always set _dirty on the ROOT blob (not on nested properties). The
190
+ // flag is read by blob.serialize() to decide pass-through vs re-encode.
114
191
  blob._dirty = true;
115
192
 
116
193
  const updatedBytes = blob.serialize(); // re-emits from properties
117
194
  ```
118
195
 
196
+ Gotchas:
197
+
198
+ - `_dirty` lives on the root `UnrealBlob`, not on nested `Property` /
199
+ `ArrayValue` / `StructValue` objects. Mutating a deep value without
200
+ setting `blob._dirty = true` returns the original `_raw` bytes
201
+ unchanged.
202
+ - `BoolProperty` values live in the `tag` (`tag.boolVal`), not in
203
+ `property.value`. To flip a bool, edit `prop.tag.boolVal`.
204
+ - Removing a property means splicing it out of `blob.properties`, not
205
+ setting `property.value = null`.
206
+ - If you change a value's encoded SIZE (e.g. extending an FString),
207
+ the property's `tag.size` is recomputed on write, but any property
208
+ that previously carried a `_sizeMismatch` annotation refuses to
209
+ re-emit. Such properties are extremely rare in healthy world.db
210
+ files and are reported by `npm test`.
211
+ - `serialize()` throws if `_dirty` is true AND `error` is set:
212
+ re-emitting from an empty properties array would produce a malformed
213
+ stream. Leave `_dirty=false` to pass through `_raw` verbatim, or
214
+ clear `.error` first if you've replaced `.properties` manually.
215
+ - 64-bit integer values (`Int64Property`, `UInt64Property`,
216
+ `DateTime`, `Timespan`) round-trip as decimal strings. If you
217
+ replace such a value with a Number, it must be a safe integer
218
+ (`|v| <= Number.MAX_SAFE_INTEGER`); otherwise the writer throws
219
+ rather than silently lose precision.
220
+
119
221
  `serialize()` for a dirty blob is byte-identical to a fresh
120
- `decode + serialize` cycle on its output verified on every row
121
- of the tested `world.db`.
222
+ `decode + serialize` cycle on its output, verified on every row of
223
+ the tested `world.db`.
122
224
 
123
225
  ## LZ4 integration
124
226
 
125
227
  `actor_data` column bytes come out of LZ4 compression. wscodec
126
- doesn't bundle an LZ4 implementation that's a caller concern. A
228
+ doesn't bundle an LZ4 implementation; that's a caller concern. A
127
229
  working recipe using `lz4-wasm-nodejs`:
128
230
 
129
231
  ```js
@@ -161,11 +263,11 @@ for (const row of db.prepare('SELECT actor_serial, actor_data FROM actor_table')
161
263
  }
162
264
  ```
163
265
 
164
- Note: LZ4 compression is not deterministic two compressors will
165
- produce different bytes for the same input. wscodec's
166
- byte-identical guarantee covers the inner property-stream bytes;
167
- the outer column bytes round-trip only for unmodified blobs (cache
168
- the input column bytes if you need that).
266
+ Note: LZ4 compression is not deterministic. Two compressors will
267
+ produce different bytes for the same input. wscodec's byte-identical
268
+ guarantee covers the inner property-stream bytes; the outer column
269
+ bytes round-trip only for unmodified blobs (cache the input column
270
+ bytes if you need that).
169
271
 
170
272
  ## Round-trip guarantees
171
273
 
@@ -175,36 +277,56 @@ For every row in the tested `world.db`:
175
277
  - `blob.serialize()` with `_dirty = false` returns the input bytes byte-identical.
176
278
  - `blob.serialize()` with `_dirty = true` re-emits from `properties` and is byte-identical to the input.
177
279
 
178
- Coverage: 174,610,207 bytes across 11,667 actors, including every
179
- known Soulmask wire-format quirk:
180
-
181
- - `kind=0x01` ObjectProperty with the 4-byte actor-ref prefix.
182
- - Embedded ObjectProperty streams with the 4-byte FName.Number trailer
183
- (the Soulmask `JianZhuInstGLQComponent`-style nested format).
184
- - ArrayProperty<ObjectProperty> with the JianZhuInstYuanXings
185
- per-element placement-binary blocks (8-byte header + three
186
- stride/count sections per yuan-xing prototype).
187
- - ArrayProperty<TextProperty> elements with FText history types
188
- -1, 0, 2, and 4 (including the legacy UE3-style uint32 booleans in
189
- FNumberFormattingOptions).
190
- - SetProperty<StructProperty> with implicit FGuid struct keys.
191
- - Custom Soulmask Map<Struct,Struct> framing.
280
+ Coverage includes every known Soulmask wire-format quirk:
281
+
282
+ - **kind=0x01 ObjectProperty with the 4-byte actor-ref prefix.**
283
+ Soulmask's hard actor references (a pawn pointing at its inventory
284
+ actor, for example) prepend an extra 4-byte field between the kind
285
+ byte and the path FString. The observed value is always 1; its meaning is
286
+ unknown, but the bytes are captured and replayed verbatim.
287
+ - **Embedded ObjectProperty streams with the 4-byte FName.Number trailer.**
288
+ Some Soulmask nested ObjectProperty values (`JianZhuInstGLQComponent`
289
+ is the canonical example; `JianZhu` = "building") carry the
290
+ outermost-stream None trailer (a 4-byte FName.Number = 0) after their
291
+ embedded property stream, where stock UE 4.27 nested streams do not.
292
+ - **ArrayProperty<ObjectProperty> with per-element placement-binary blocks.**
293
+ `JianZhuInstYuanXings` arrays (`YuanXing` = "prototype", so
294
+ "building-zone yuan-xing" is the list of building-piece prototypes
295
+ inside a building zone) interleave a fixed-shape binary block after
296
+ each ObjectProperty element: an 8-byte header + three stride/count
297
+ sections (per-piece world transforms, ids, and aux data).
298
+ - **ArrayProperty<TextProperty> with mixed FText history types.**
299
+ Elements use history types -1 (culture-invariant), 0 (localized),
300
+ 2 (ordered format), and 4 (`FTextHistory_AsNumber`). History type 4
301
+ embeds a legacy UE3-style `FNumberFormattingOptions` whose boolean
302
+ fields are 4 bytes wide rather than the modern 1 byte; the codec
303
+ emits this correctly.
304
+ - **SetProperty<StructProperty> with implicit FGuid struct keys.**
305
+ Soulmask `SetProperty` declarations whose inner is `StructProperty`
306
+ don't carry an inner struct shape; every populated instance in
307
+ `world.db` uses raw 16-byte FGuids as elements.
308
+ - **Custom Soulmask Map<Struct,Struct> framing.** The guild-data maps
309
+ (`GongHuiMap`, `PlayerGongHuiDataMap`; `GongHui` = "guild") use a
310
+ non-standard layout. The map's declared `tag.size` is inaccurate (observed 632838 vs
311
+ actual 636422); pair shapes are detected by peeking at the next
312
+ bytes rather than trusting the declared size.
192
313
 
193
314
  ## Running the test
194
315
 
195
316
  ```sh
196
- git clone … # repo with world.db at the root
197
- cd repo/wscodec
317
+ git clone https://github.com/auroris/SoulmaskCodec.git
318
+ cd SoulmaskCodec
198
319
  npm install
199
- npm test # looks for world.db two dirs up by default
320
+ npm test # looks for world.db two dirs up by default
200
321
  # or
201
322
  node test/test-roundtrip.mjs /path/to/world.db
202
323
  ```
203
324
 
204
- Test deps: `lz4-wasm-nodejs` (LZ4 inside the test), `better-sqlite3`
205
- (reads the `world.db` SQLite file). Both are picked up via npm
206
- module resolution; if `better-sqlite3` isn't installed at the package
207
- root the test will surface that with a clear error.
325
+ Test deps: `lz4-wasm-nodejs` (LZ4 inside the test) and
326
+ `better-sqlite3` (reads the `world.db` SQLite file). Both are picked
327
+ up via npm module resolution; if `better-sqlite3` isn't installed at
328
+ the package root the test will surface that with a clear error. See
329
+ the Setup section above for the build-tools prerequisite on Windows.
208
330
 
209
331
  ## License
210
332
 
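`findProperty` only scans the top level, while `findPropertyDeep` walks the whole tree. The sketch below shows one way a depth-first walk can be written over the value shapes in the property-tree table. It is not wscodec's implementation:

- It works on plain objects shaped like `{ tag: { name }, value }`.
- It recognises containers only by the `.elements`, `.entries`, and nested `.value` arrays the README describes.
- It does not descend into `ObjectRef` embedded streams, whose field layout is not spelled out here.

`findDeep` and `searchValue` are hypothetical names.

```javascript
// Hypothetical depth-first search over a wscodec-style property tree.
// Returns the first Property (pre-order) whose tag name matches, or null.
function findDeep(properties, name) {
  for (const prop of properties) {
    // String() covers plain-string names and FName-like objects with toString().
    if (prop && prop.tag && String(prop.tag.name) === name) return prop;
    const hit = searchValue(prop && prop.value, name);
    if (hit) return hit;
  }
  return null;
}

// Descend into the container shapes listed in the property-tree table.
// ObjectRef embedded streams are NOT handled in this sketch.
function searchValue(value, name) {
  if (value == null || typeof value !== 'object') return null;
  // StructValue whose .value is a nested property array.
  if (Array.isArray(value.value)) return findDeep(value.value, name);
  // ArrayValue / SetValue: every element is itself a value.
  if (Array.isArray(value.elements)) {
    for (const el of value.elements) {
      const hit = searchValue(el, name);
      if (hit) return hit;
    }
  }
  // MapValue: check each [key, value] pair, key first.
  if (Array.isArray(value.entries)) {
    for (const [k, v] of value.entries) {
      const hit = searchValue(k, name) || searchValue(v, name);
      if (hit) return hit;
    }
  }
  return null;
}
```

Because this sketch checks each property before its children, a top-level match wins over a deeper one with the same name.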
package/io.mjs CHANGED
@@ -1,5 +1,5 @@
1
1
  /**
2
- * Cursor + Writer byte-level read/write primitives over a Uint8Array.
2
+ * Cursor + Writer: byte-level read/write primitives over a Uint8Array.
3
3
  *
4
4
  * No Unreal semantics here. FString lives here too because it's a stateful
5
5
  * read/write on the same DataView; everything else (FName, FGuid, structs,
@@ -15,8 +15,33 @@ export class Cursor {
15
15
  pos() { return this.offset; }
16
16
  eof() { return this.offset >= this.bytes.length; }
17
17
  remaining() { return this.bytes.length - this.offset; }
18
- skip(n) { this.offset += n; }
19
- seek(n) { this.offset = n; }
18
+
19
+ /**
20
+ * Advance the cursor by `n` bytes. Throws RangeError if `n` is negative or
21
+ * would take the cursor past the end of the buffer. Use `seek(n)` to jump
22
+ * to an absolute offset (including backwards).
23
+ */
24
+ skip(n) {
25
+ if (!Number.isFinite(n) || n < 0) {
26
+ throw new RangeError(`Cursor.skip: n must be a non-negative finite number, got ${n}`);
27
+ }
28
+ if (this.offset + n > this.bytes.length) {
29
+ throw new RangeError(`Cursor.skip: ${n} bytes from offset ${this.offset} exceeds buffer length ${this.bytes.length}`);
30
+ }
31
+ this.offset += n;
32
+ }
33
+
34
+ /**
35
+ * Move the cursor to absolute offset `n`. Throws RangeError if `n` is out
36
+ * of `[0, buffer.length]` (note: length is allowed; the cursor is then at
37
+ * EOF and any further read would throw).
38
+ */
39
+ seek(n) {
40
+ if (!Number.isFinite(n) || n < 0 || n > this.bytes.length) {
41
+ throw new RangeError(`Cursor.seek: offset ${n} out of range [0, ${this.bytes.length}]`);
42
+ }
43
+ this.offset = n;
44
+ }
20
45
 
21
46
  readUint8() { const v = this.dv.getUint8(this.offset); this.offset += 1; return v; }
22
47
  readInt8() { const v = this.dv.getInt8(this.offset); this.offset += 1; return v; }
@@ -28,6 +53,14 @@ export class Cursor {
28
53
  readInt64() { const v = this.dv.getBigInt64(this.offset, true); this.offset += 8; return v; }
29
54
  readFloat32() { const v = this.dv.getFloat32(this.offset, true); this.offset += 4; return v; }
30
55
  readFloat64() { const v = this.dv.getFloat64(this.offset, true); this.offset += 8; return v; }
56
+
57
+ /**
58
+ * Read `n` bytes and return them as a Uint8Array VIEW over the underlying
59
+ * buffer (no copy). The returned subarray shares storage with this cursor's
60
+ * buffer: mutating it mutates the buffer, and the view becomes stale if
61
+ * the buffer is detached. Callers that need to retain the bytes past the
62
+ * buffer's lifetime should `.slice()` the result.
63
+ */
31
64
  readBytes(n) { const out = this.bytes.subarray(this.offset, this.offset + n); this.offset += n; return out; }
32
65
 
33
66
  /**
@@ -94,8 +127,18 @@ export class Writer {
94
127
  writeInt16(v) { this._ensure(2); this.dv.setInt16(this.offset, v, true); this.offset += 2; }
95
128
  writeUint32(v) { this._ensure(4); this.dv.setUint32(this.offset, v >>> 0, true); this.offset += 4; }
96
129
  writeInt32(v) { this._ensure(4); this.dv.setInt32(this.offset, v | 0, true); this.offset += 4; }
97
- writeUint64(v) { this._ensure(8); this.dv.setBigUint64(this.offset, BigInt(v), true); this.offset += 8; }
98
- writeInt64(v) { this._ensure(8); this.dv.setBigInt64(this.offset, BigInt(v), true); this.offset += 8; }
130
+
131
+ /**
132
+ * Write a 64-bit unsigned integer. Accepts BigInt, a decimal string, or a
133
+ * safe-integer Number (|v| <= Number.MAX_SAFE_INTEGER = 2^53 - 1). A Number
134
+ * outside that range throws RangeError rather than silently losing precision
135
+ * via `BigInt(largeNumber)`. The codec's decoders return I64/U64 values as
136
+ * strings for this reason; this guard catches accidental mutation that
137
+ * substitutes an unsafe Number.
138
+ */
139
+ writeUint64(v) { this._ensure(8); this.dv.setBigUint64(this.offset, _toBigInt64(v, 'Writer.writeUint64'), true); this.offset += 8; }
140
+ /** Signed 64-bit integer. See writeUint64 for accepted value forms. */
141
+ writeInt64(v) { this._ensure(8); this.dv.setBigInt64(this.offset, _toBigInt64(v, 'Writer.writeInt64'), true); this.offset += 8; }
99
142
  writeFloat32(v) { this._ensure(4); this.dv.setFloat32(this.offset, v, true); this.offset += 4; }
100
143
  writeFloat64(v) { this._ensure(8); this.dv.setFloat64(this.offset, v, true); this.offset += 8; }
101
144
  writeBytes(u8) { this._ensure(u8.length); this.bytes.set(u8, this.offset); this.offset += u8.length; }
@@ -148,3 +191,23 @@ export class Writer {
148
191
  }
149
192
  }
150
193
  }
194
+
195
+ /**
196
+ * Coerce a 64-bit integer value into a BigInt suitable for
197
+ * DataView.setBig{Int,Uint}64. Accepts BigInt directly; converts string and
198
+ * safe-integer Number; throws on unsafe Number or unsupported types.
199
+ *
200
+ * The motivation: `BigInt(largeNumber)` silently loses precision for
201
+ * |v| > 2^53. The decoder paths return I64/U64 values as decimal strings
202
+ * specifically to avoid this; tightening the writer's contract catches
203
+ * accidental round-trip-breaking mutation at the source.
204
+ */
205
+ function _toBigInt64(v, fnName) {
206
+ if (typeof v === 'bigint') return v;
207
+ if (typeof v === 'string') return BigInt(v);
208
+ if (typeof v === 'number') {
209
+ if (Number.isInteger(v) && Math.abs(v) <= Number.MAX_SAFE_INTEGER) return BigInt(v);
210
+ throw new RangeError(`${fnName}: Number ${v} is unsafe for 64-bit conversion (non-integer or |v| > 2^53). Pass a BigInt or decimal string.`);
211
+ }
212
+ throw new TypeError(`${fnName}: expected BigInt, string, or safe-integer Number; got ${typeof v}`);
213
+ }
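The `_toBigInt64` guard exists because a JavaScript Number cannot hold every 64-bit integer. The standalone snippet below uses plain JavaScript with no wscodec imports. It shows two things:

- the silent precision loss the guard rejects;
- a decimal-string round trip through `DataView.setBigUint64` / `getBigUint64` that preserves the exact value.

`roundTripU64` is a hypothetical helper.

```javascript
// 2^53 + 1 has no exact double representation: the literal rounds to 2^53
// before BigInt() sees it, so the conversion "succeeds" with the wrong value.
const lossy = BigInt(9007199254740993); // 9007199254740992n
const exact = BigInt('9007199254740993'); // 9007199254740993n

// Number.isSafeInteger(v) is equivalent to the guard's
// Number.isInteger(v) && Math.abs(v) <= Number.MAX_SAFE_INTEGER check.
const rejected = !Number.isSafeInteger(9007199254740993); // true

// Hypothetical helper: write a decimal-string uint64 little-endian and read
// it back as a decimal string, the form the codec's decoders return.
function roundTripU64(decimal) {
  const dv = new DataView(new ArrayBuffer(8));
  dv.setBigUint64(0, BigInt(decimal), true);
  return dv.getBigUint64(0, true).toString();
}
```

Keeping 64-bit values as strings or BigInts end to end is what lets a mutated blob still round-trip byte-identically.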
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "wscodec",
3
- "version": "0.1.0",
4
- "description": "Pure-JS codec for Soulmask actor_data property streams (UE4.27 FPropertyTag wire format). Zero runtime dependencies accepts uncompressed bytes, returns JS objects, and vice versa. Round-trip byte-identical against every actor.",
3
+ "version": "0.2.0",
4
+ "description": "Pure-JS codec for Soulmask actor_data property streams (UE 4.27 FPropertyTag wire format). Zero runtime dependencies. Accepts uncompressed bytes, returns JS objects, and vice versa. Round-trip byte-identical against every actor.",
5
5
  "type": "module",
6
6
  "main": "./wscodec.mjs",
7
7
  "exports": {
@@ -29,9 +29,23 @@
29
29
  "unreal",
30
30
  "ue4",
31
31
  "fpropertytag",
32
- "codec"
32
+ "codec",
33
+ "world.db",
34
+ "actor_data"
33
35
  ],
36
+ "repository": {
37
+ "type": "git",
38
+ "url": "git+https://github.com/auroris/SoulmaskCodec.git"
39
+ },
40
+ "homepage": "https://github.com/auroris/SoulmaskCodec#readme",
41
+ "bugs": {
42
+ "url": "https://github.com/auroris/SoulmaskCodec/issues"
43
+ },
44
+ "author": "auroris",
34
45
  "license": "MIT",
46
+ "engines": {
47
+ "node": ">=20"
48
+ },
35
49
  "devDependencies": {
36
50
  "better-sqlite3": "^12.10.0",
37
51
  "lz4-wasm-nodejs": "^0.9.2"
package/primitives.mjs CHANGED
@@ -1,15 +1,28 @@
1
1
  /**
2
- * FName and FGuid.
2
+ * FName and FGuid: the two pervasive identifier types in UE serialization.
3
3
  *
4
- * In this Soulmask format FName is serialized as a plain FString (no
5
- * trailing FName.Number int32). The `number` field stays 0 and exists
6
- * only for symmetry with full UE FNames.
4
+ * Soulmask's quirk for FName: the property-stream wire form is a plain
5
+ * FString (no trailing FName.Number int32). Stock UE 4.27 serializes FName
6
+ * inside a property tag as FString + int32 Number; Soulmask drops the int32
7
+ * everywhere except the OUTERMOST None terminator (which still carries a
8
+ * 4-byte FName.Number = 0 trailer, handled by the property-stream reader).
9
+ *
10
+ * `FName.read` / instance `write` therefore use the Soulmask form (bare
11
+ * FString). The full UE form is exposed separately as `FName.readWithNumber`
12
+ * / instance `writeWithNumber`: not used by Soulmask today, but wired up so
13
+ * the codec can speak the standard wire format if the game's serializer ever
14
+ * adopts it.
7
15
  */
8
16
 
17
+ const ZERO_GUID = '00000000-0000-0000-0000-000000000000';
18
+
9
19
  export class FName {
10
20
  constructor(value, { isUnicode = false, number = 0, isNull = false } = {}) {
11
21
  this.value = value;
12
22
  this.isUnicode = isUnicode;
23
+ // FName.Number: zero in every observed Soulmask FName (the wire form
24
+ // omits it). Preserved on the instance for round-trip when the full UE
25
+ // form is used via readWithNumber/writeWithNumber.
13
26
  this.number = number;
14
27
  // Tracks the wire-form distinction between an FString with SaveNum=0
15
28
  // (the "null" form) and SaveNum=1 (empty-with-terminator). Only ever
@@ -18,15 +31,39 @@ export class FName {
18
31
  this.isNull = isNull;
19
32
  }
20
33
  toString() { return this.value; }
34
+ /** JSON-friendly form: the bare name string (drops isUnicode/number/isNull metadata). */
35
+ toJSON() { return this.value; }
21
36
 
37
+ /**
38
+ * Read an FName in the Soulmask property-stream form: a bare FString,
39
+ * no trailing FName.Number. `number` is left at 0.
40
+ */
22
41
  static read(cursor) {
23
42
  const s = cursor.readFString();
24
43
  return new FName(s.value, { isUnicode: s.isUnicode, isNull: !!s.isNull });
25
44
  }
26
45
 
46
+ /** Write the Soulmask form (FString only). */
27
47
  write(writer) { writer.writeFString(this.value, this.isUnicode, this.isNull); }
28
48
 
29
- /** Accepts an FName, a bare string, or a plain {value,isUnicode,isNull} record. */
49
+ /**
50
+ * Read an FName in the stock UE 4.27 property-tag form: FString + int32
51
+ * Number. Use this if you're decoding a non-Soulmask stream or a future
52
+ * Soulmask wire format that re-adopts the int32 suffix.
53
+ */
54
+ static readWithNumber(cursor) {
55
+ const s = cursor.readFString();
56
+ const number = cursor.readInt32();
57
+ return new FName(s.value, { isUnicode: s.isUnicode, isNull: !!s.isNull, number });
58
+ }
59
+
60
+ /** Write the stock UE form (FString + int32 Number). */
61
+ writeWithNumber(writer) {
62
+ writer.writeFString(this.value, this.isUnicode, this.isNull);
63
+ writer.writeInt32(this.number | 0);
64
+ }
65
+
66
+ /** Accepts an FName, a bare string, or a plain {value,isUnicode,isNull,number} record. */
30
67
  static from(x) {
31
68
  if (x instanceof FName) return x;
32
69
  if (typeof x === 'string') return new FName(x);
@@ -46,6 +83,31 @@ export class FGuid {
46
83
  constructor(value) { this.value = value; }
47
84
  toString() { return this.value; }
48
85
 
86
+ /**
87
+ * JSON-friendly form: the bare GUID string, so `JSON.stringify(fguid)`
88
+ * yields `"AABBCCDD-..."` rather than `{"value":"AABBCCDD-..."}`.
89
+ */
90
+ toJSON() { return this.value; }
91
+
92
+ /**
93
+ * Structural equality. Case-insensitive: an FGuid constructed from a
94
+ * lowercase string compares equal to one read off the wire (uppercase).
95
+ * Accepts an FGuid or a string; anything else returns false.
96
+ */
97
+ equals(other) {
98
+ const otherStr = other instanceof FGuid ? other.value
99
+ : typeof other === 'string' ? other
100
+ : null;
101
+ if (otherStr == null) return false;
102
+ return String(this.value).toUpperCase() === otherStr.toUpperCase();
103
+ }
104
+
105
+ /** True iff the GUID is all zeros (the conventional null-GUID sentinel). */
106
+ isZero() { return String(this.value).toUpperCase() === ZERO_GUID; }
107
+
108
+ /** All-zero FGuid sentinel. New instance per call (FGuid is mutable). */
109
+ static zero() { return new FGuid(ZERO_GUID); }
110
+
49
111
  static read(cursor) {
50
112
  const A = cursor.readUint32(), B = cursor.readUint32(), C = cursor.readUint32(), D = cursor.readUint32();
51
113
  const h = (n, w) => n.toString(16).padStart(w, '0').toUpperCase();
@@ -62,6 +124,7 @@ export class FGuid {
62
124
  writer.writeUint32(A); writer.writeUint32(B); writer.writeUint32(C); writer.writeUint32(D);
63
125
  }
64
126
 
127
+ /** Accepts an FGuid or a canonical 8-4-4-4-12 hex string. */
65
128
  static from(x) {
66
129
  if (x instanceof FGuid) return x;
67
130
  if (typeof x === 'string') return new FGuid(x);
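`FGuid.read` consumes four little-endian uint32 words (A, B, C, D), and `FGuid.from` accepts the canonical 8-4-4-4-12 hex string. The lines that assemble the string fall outside this diff.

The sketch below assumes the layout of UE's `EGuidFormats::DigitsWithHyphens`: A, then B and C each split into high and low 16-bit halves, then D, in uppercase hex like the `h()` helper above. `guidToString`, `readGuid`, and `guidEquals` are hypothetical names. `guidEquals` mirrors the case-insensitive comparison `FGuid.equals` performs.

```javascript
// Format four uint32 words as an 8-4-4-4-12 GUID string.
// ASSUMED layout (UE DigitsWithHyphens): A, B hi/lo, C hi/lo, D.
function guidToString(A, B, C, D) {
  const h = (n, w) => n.toString(16).padStart(w, '0').toUpperCase();
  return `${h(A, 8)}-${h(B >>> 16, 4)}-${h(B & 0xffff, 4)}-` +
         `${h(C >>> 16, 4)}-${h(C & 0xffff, 4)}${h(D, 8)}`;
}

// Read a GUID from 16 bytes at `offset` as four little-endian uint32 words.
function readGuid(bytes, offset = 0) {
  const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const word = (i) => dv.getUint32(offset + 4 * i, true);
  return guidToString(word(0), word(1), word(2), word(3));
}

// Case-insensitive comparison, as FGuid.equals does for string forms.
function guidEquals(a, b) {
  return a.toUpperCase() === b.toUpperCase();
}
```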
package/properties.mjs CHANGED
@@ -30,7 +30,7 @@ import { StructValue, STRUCT_HANDLERS } from './structs.mj
30
30
  import { ObjectRef, SoftObjectRef, FTextValue, OpaqueValue } from './values.mjs';
31
31
 
32
32
  // ==========================================================================
33
- // PropertyTag the header preceding each property's value bytes.
33
+ // PropertyTag: the header preceding each property's value bytes.
34
34
  // ==========================================================================
35
35
  export class PropertyTag {
36
36
  constructor(fields = {}) {
@@ -124,7 +124,7 @@ export class MapValue {
124
124
  }
125
125
 
126
126
  // ==========================================================================
127
- // Property one tag + its decoded value.
127
+ // Property: one tag + its decoded value.
128
128
  // ==========================================================================
129
129
  export class Property {
130
130
  constructor(tag, value, { sizeMismatch = null } = {}) {
@@ -137,7 +137,7 @@ export class Property {
137
137
  }
138
138
 
139
139
  // ==========================================================================
140
- // Value codec dispatches on tag.type.value.
140
+ // Value codec: dispatches on tag.type.value.
141
141
  //
142
142
  // sizeHint is the tag's Size field (bytes following the tag). Containers
143
143
  // (Array/Set/Map) and StructProperty use it as the byte budget for nested
@@ -163,7 +163,7 @@ export function readValue(cursor, tag, sizeHint) {
163
163
  case 'ClassProperty':
164
164
  case 'WeakObjectProperty':
165
165
  case 'LazyObjectProperty':
166
- case 'WSObjectProperty': // Soulmask-specific alias (per BLOB_FORMAT.md)
166
+ case 'WSObjectProperty': // Soulmask alias for ObjectProperty (same wire layout, different tag name)
167
167
  return readObjectValue(cursor, sizeHint);
168
168
  case 'SoftObjectProperty':
169
169
  case 'SoftClassProperty':
@@ -207,7 +207,7 @@ export function readValue(cursor, tag, sizeHint) {
207
207
  }
208
208
 
209
209
  export function writeValue(writer, tag, value) {
210
- // Decode may have fallen back to OpaqueValue for any property type
210
+ // Decode may have fallen back to OpaqueValue for any property type:
211
211
  // Array/Set/Map/Struct/Text decode failures, unknown property types,
212
212
  // overshoot recoveries, etc. Emit the captured bytes verbatim so the
213
213
  // outer stream stays aligned regardless of which slot held the opaque.
@@ -279,7 +279,7 @@ function readFText(cursor, sizeHint) {
279
279
  if (historyType === 0) {
280
280
  // Base / localized: namespace + key + source string. Empty strings on
281
281
  // the wire may use either null-form (SaveNum=0) or empty-with-terminator
282
- // (SaveNum=1) capture `isNull` per-field so the writer reproduces the
282
+ // (SaveNum=1). Capture `isNull` per-field so the writer reproduces the
283
283
  // exact wire form.
284
284
  const nsFS = cursor.readFString();
285
285
  const kFS = cursor.readFString();
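The `isNull` distinction in the comment above comes from how UE serializes an FString: an int32 `SaveNum` followed by the characters. The encoder below is a standalone sketch of that standard UE layout, not wscodec's `writeFString`, whose body is outside this diff:

- `SaveNum = 0` is the null form, with no payload.
- A positive `SaveNum` counts single-byte characters, including the trailing NUL.
- A negative `SaveNum` counts UTF-16LE code units, including the NUL.

`encodeFString` is a hypothetical name, and its single-byte path assumes Latin-1 input.

```javascript
// Encode a UE FString: little-endian int32 SaveNum + characters + NUL.
// isNull selects the SaveNum=0 form; isUnicode selects UTF-16LE (negative SaveNum).
function encodeFString(value, { isUnicode = false, isNull = false } = {}) {
  if (isNull) {
    if (value !== '') throw new Error('null-form FString must be empty');
    return new Uint8Array(4); // SaveNum = 0, no payload
  }
  const units = value.length + 1; // code units + NUL terminator
  const width = isUnicode ? 2 : 1;
  const out = new Uint8Array(4 + units * width);
  const dv = new DataView(out.buffer);
  dv.setInt32(0, isUnicode ? -units : units, true);
  for (let i = 0; i < value.length; i++) {
    const code = value.charCodeAt(i);
    if (isUnicode) {
      dv.setUint16(4 + 2 * i, code, true);
    } else {
      if (code > 0xff) throw new Error('non-Latin-1 character; use isUnicode');
      out[4 + i] = code;
    }
  }
  // The trailing NUL is already zero: Uint8Array is zero-initialised.
  return out;
}
```

The empty string therefore has two distinct encodings, 4 bytes (null form) and 5 bytes (SaveNum=1 plus NUL). That is why the reader records `isNull` per field.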
@@ -297,7 +297,7 @@ function readFText(cursor, sizeHint) {
297
297
  // the value for that type:
298
298
  // 0=Int(int64) 1=UInt(uint64) 2=Float(f32) 3=Double(f64)
299
299
  // 4=Text(FText, recursive) 5=Gender(int8)
300
- // No argument names on the wire arguments are positional ({0}, {1} ).
300
+ // No argument names on the wire; arguments are positional ({0}, {1} ...).
301
301
  const sourceFmt = readFText(cursor, Infinity);
302
302
  const numArgs = cursor.readInt32();
303
303
  const args = [];
@@ -328,7 +328,7 @@ function readFText(cursor, sizeHint) {
328
328
  // Inside FNumberFormattingOptions, AlwaysSign and UseGrouping are also
329
329
  // uint32 booleans. Only RoundingMode (int8) and the four digit-count
330
330
  // fields (int32) follow the modern sizes. This matches the actual wire
331
- // bytes empirically MaxIntDigits = ~324 (close to DBL_MAX_10_EXP+1
331
+ // bytes: empirically MaxIntDigits = ~324 (UE's default is DBL_MAX_10_EXP +
332
332
  // DBL_DIG + 1 = 324) and MaxFracDigits = 3 (UE default) under this interpretation.
333
333
  const argType = cursor.readInt8();
334
334
  let argValue;
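The AsNumber layout is easiest to see as a byte-level reader. This sketch makes two assumptions. Fields are little-endian. The four int32 digit counts appear in UE4's declaration order: MinimumIntegralDigits, MaximumIntegralDigits, MinimumFractionalDigits, MaximumFractionalDigits. `readFormatOptions` is a hypothetical name. When present, the options block is 4 + 4 + 1 + 16 = 25 bytes, and it follows the 4-byte bHasFormatOptions flag.

```javascript
// Hypothetical reader for the optional FNumberFormattingOptions block of an
// AsNumber FText history, using the legacy 4-byte booleans described above.
function readFormatOptions(bytes, offset = 0) {
  const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let p = offset;
  const hasFormatOptions = dv.getUint32(p, true) !== 0; p += 4; // uint32 bool
  if (!hasFormatOptions) return { formatOptions: null, next: p };
  const alwaysSign = dv.getUint32(p, true) !== 0; p += 4;       // uint32 bool
  const useGrouping = dv.getUint32(p, true) !== 0; p += 4;      // uint32 bool
  const roundingMode = dv.getInt8(p); p += 1;                   // int8
  const digits = [];
  for (let i = 0; i < 4; i++) { digits.push(dv.getInt32(p, true)); p += 4; }
  const [minIntDigits, maxIntDigits, minFracDigits, maxFracDigits] = digits;
  return {
    formatOptions: { alwaysSign, useGrouping, roundingMode,
      minIntDigits, maxIntDigits, minFracDigits, maxFracDigits },
    next: p,
  };
}
```

If the booleans were read as 1-byte values instead, every later field would shift by 3 bytes. That shift is why the ~324 / 3 digit counts are a useful sanity check.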
@@ -366,7 +366,7 @@ function readFText(cursor, sizeHint) {
366
366
  }
367
367
  // Unknown history type: preserve remaining bytes verbatim for round-trip.
368
368
  // When called from an array-element context sizeHint is Infinity because
369
- // the per-element byte budget is unknown throw so the callers can decide
369
+ // the per-element byte budget is unknown; throw so the callers can decide
370
370
  // whether to fall back to OpaqueValue at the element or array level.
371
371
  if (!isFinite(sizeHint)) throw new Error(`readFText: unimplemented historyType ${historyType} (no size budget; cannot store raw bytes)`);
372
372
  const remaining = sizeHint - (cursor.pos() - start);
@@ -420,7 +420,7 @@ function writeFText(writer, value) {
420
420
  case 5: writer.writeInt64(sv.value); break;
421
421
  default: throw new Error(`writeFText: unknown FFormatArgumentValue type ${sv.type} in AsNumber`);
422
422
  }
423
- // Legacy uint32 booleans see readFText AsNumber for rationale.
423
+ // Legacy uint32 booleans (see readFText AsNumber for rationale).
424
424
  const hasFormatOptions = value.formatOptions != null;
425
425
  writer.writeUint32(hasFormatOptions ? 1 : 0);
426
426
  if (hasFormatOptions) {
@@ -459,7 +459,7 @@ function writeFText(writer, value) {
459
459
  // preserves the wire's choice between null-form (SaveNum=0, 4 bytes) and
460
460
  // empty-with-terminator (SaveNum=1 plus 1-byte NUL, 5 bytes). The previous
461
461
  // version always wrote `writeFString(this.path)` for ObjectRef and emitted
462
- // a 4-byte null FString even for kind-only values silently inflating the
462
+ // a 4-byte null FString even for kind-only values, silently inflating the
463
463
  // encoded blob by 4 B for every kind-only reference.
464
464
  function readObjectValue(cursor, sizeHint) {
465
465
  const start = cursor.pos();
@@ -471,7 +471,7 @@ function readObjectValue(cursor, sizeHint) {
471
471
  }
472
472
  // Soulmask kind=0x01 (hard actor reference, e.g. HBindBGCompActor on
473
473
  // NPC pawns) prepends a 4-byte field between the kind byte and the
474
- // path FString. Observed value is always 1; semantic unknown — captured
474
+ // path FString. Observed value is always 1; semantic unknown. Captured
475
475
  // verbatim and replayed on write. Without this branch the reader treats
476
476
  // those four bytes as the path FString's SaveNum, which overshoots the
477
477
  // budget and falls back to OpaqueValue (the symptom that hid every
@@ -484,7 +484,7 @@ function readObjectValue(cursor, sizeHint) {
484
484
  }
485
485
  }
486
486
  const pathFS = cursor.readFString();
487
- // Guard against path FStrings whose SaveNum overshoots the value budget
487
+ // Guard against path FStrings whose SaveNum overshoots the value budget:
488
488
  // this happens for properties whose format differs from kind+path+... and
489
489
  // whose first "path" bytes happen to encode a huge length.
490
490
  if (cursor.pos() - start > sizeHint) throw new Error('path FString exceeded value budget');
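The kind=0x01 prefix and the overshoot guard can be sketched together as one bounded reader. This is a simplified model, not `readObjectValue` itself. It handles only ANSI paths and treats kind=0 or an exhausted budget as kind-only. The result field names are illustrative.

```javascript
// Simplified, hypothetical model of the kind / kindOnePrefix / path prefix.
// `start` is the value's first byte; `sizeHint` is the tag's Size field.
function readObjectPrefix(bytes, start, sizeHint) {
  const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let p = start;
  const kind = dv.getUint8(p); p += 1;
  if (kind === 0 || p - start >= sizeHint) return { kind, next: p }; // kind-only
  let kindOnePrefix = null;
  if (kind === 1) {
    // Soulmask kind=1: 4 extra bytes (observed value 1) before the path.
    kindOnePrefix = dv.getUint32(p, true); p += 4;
  }
  const saveNum = dv.getInt32(p, true); p += 4;
  if (saveNum < 0) throw new Error('UTF-16 path not handled in this sketch');
  // Same guard as above: a SaveNum that runs past the budget is not a path.
  if (p + saveNum - start > sizeHint) throw new Error('path FString exceeded value budget');
  let path = '';
  for (let i = 0; i < saveNum - 1; i++) path += String.fromCharCode(bytes[p + i]);
  p += saveNum;
  return { kind, kindOnePrefix, path, next: p };
}
```

Without the `kind === 1` branch, the prefix bytes `01 00 00 00` would be taken as the path's SaveNum. That is the misread the comment describes.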
@@ -598,14 +598,14 @@ function readArrayValue(cursor, tag, sizeHint) {
598
598
  * [u32 stride=64] [u32 count] [count×64 bytes] per-piece aux (bbox + scale-ish floats)
599
599
  *
600
600
  * Returns { header, sections } on success. Returns null (cursor rolled back)
601
- * when the bytes don't match — non-JianZhuInstYuanXings ObjectProperty arrays
601
+ * when the bytes don't match. Non-JianZhuInstYuanXings ObjectProperty arrays
602
602
  * have no such block, so peeking-and-rolling-back keeps them unaffected.
603
603
  *
604
604
  * Verified by in-game experiment 2026-05-18: numElements counts UNIQUE
605
605
  * prototypes (foundation, wall, door frame, …); section 0/1 counts are the
606
606
  * placed-piece count for that prototype; section 2 count is typically that
607
607
  * count or one greater. The earlier "single trailing block after all
608
- * elements" model was wrong these blocks are interleaved per element.
608
+ * elements" model was wrong; these blocks are interleaved per element.
609
609
  */
610
610
  function tryReadObjectArrayPerElementBlock(cursor, endOff) {
611
611
  const start = cursor.pos();
@@ -688,7 +688,7 @@ function readSetValue(cursor, tag) {
688
688
  // Set elements for StructProperty inner type are raw binary structs with no
689
689
  // inner PropertyTag wrapper (unlike ArrayProperty<StructProperty>, which does
690
690
  // have one). Every observed Set<StructProperty> in world.db uses 16-byte Guids
691
- // as elements the same assumption MapProperty makes for Struct keys.
691
+ // as elements: the same assumption MapProperty makes for Struct keys.
692
692
  function readSetElement(cursor, innerType) {
693
693
  if (innerType === 'StructProperty') return FGuid.read(cursor).value;
694
694
  return readArrayElement(cursor, innerType);
@@ -737,7 +737,7 @@ function writeMapValue(writer, tag, value) {
737
737
 
738
738
  /**
739
739
  * Map element (one key or one value) when the map's inner/value type is
740
- * StructProperty Soulmask uses several conventions that diverge from
740
+ * StructProperty: Soulmask uses several conventions that diverge from
741
741
  * stock UE 4.27 here:
742
742
  *
743
743
  * Key (StructProperty) → a raw 16-byte FGuid. The map tag declares
@@ -751,7 +751,7 @@ function writeMapValue(writer, tag, value) {
751
751
  * `GeRenJianZhuYingHuoList`, `GeRenMapRiZhi`)
752
752
  * OR a raw 16-byte FGuid (`PlayerGongHuiMap`,
753
753
  * a player→guild membership lookup).
754
- * We sniff which by peeking ahead — a
754
+ * We sniff which by peeking ahead. A
755
755
  * property stream starts with an FString
756
756
  * length prefix for the first tag's name
757
757
  * (small positive int, body is identifier
@@ -767,7 +767,7 @@ function writeMapValue(writer, tag, value) {
767
767
  * NOT match the actual byte span of the data section (observed:
768
768
  * tag.size=632838, actual=636422 for a populated GongHuiMap). The
769
769
  * decoder advances the cursor based on pair count + per-pair shape,
770
- * NOT the tag.size which is why this works despite the size lie.
770
+ * NOT the tag.size, which is why this works despite the size lie.
771
771
  */
772
772
  function readMapElement(cursor, type, isKey) {
773
773
  if (type !== 'StructProperty') return readArrayElement(cursor, type);
@@ -798,17 +798,25 @@ function writeMapElement(writer, type, value, isKey) {
798
798
 
799
799
  /**
800
800
  * Peek the next bytes of `cursor` (without advancing): do they look like
801
- * the start of a PropertyTag i.e. an FString that names a property?
801
+ * the start of a PropertyTag (i.e. an FString that names a property)?
802
802
  *
803
803
  * A property name FString is:
804
804
  * - int32 SaveNum > 0 and reasonably small (<= 64 chars in Soulmask)
805
805
  * - SaveNum bytes of ANSI body whose last byte is NUL
806
806
  * - body chars (minus NUL) are identifier-safe: A-Z, a-z, 0-9, _.
807
807
  *
808
- * Random GUID bytes effectively never satisfy this the first uint32
808
+ * Random GUID bytes effectively never satisfy this: the first uint32
809
809
  * of a Guid is ~uniform over [0..2^32), and even when it lands in a
810
810
  * "plausible length" range the identifier-character + NUL-terminator check
811
811
  * eliminates the false positives.
812
+ *
813
+ * Caveat: we only match ANSI property names (SaveNum > 0). Every Soulmask
814
+ * property name observed in world.db is ASCII, so a negative-SaveNum
815
+ * (UTF-16) tag is currently treated as "not a tag" and the caller falls
816
+ * through to the alternate read path. If a future Soulmask version emits
817
+ * UTF-16 property names inside a Map<Struct,Struct> value, this needs an
818
+ * additional branch matching saveNum < 0 with the equivalent UTF-16
819
+ * identifier-character + NUL-terminator check.
812
820
  */
813
821
  function peekLooksLikePropertyTag(cursor) {
814
822
  if (cursor.remaining() < 8) return false;
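The ANSI-only peek reduces to a few byte checks. This sketch takes an explicit offset instead of a Cursor. The 64-character ceiling comes from the comment. Rejecting SaveNum=1 (an empty name) is this sketch's own assumption, and `looksLikePropertyTagName` is a hypothetical name.

```javascript
// Hypothetical non-advancing check: do the bytes at `offset` look like an
// ANSI FString naming a property (identifier characters, NUL-terminated)?
function looksLikePropertyTagName(bytes, offset = 0) {
  if (bytes.length - offset < 8) return false;
  const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const saveNum = dv.getInt32(offset, true);
  if (saveNum <= 1 || saveNum > 65) return false; // 1..64 name chars plus the NUL
  if (bytes.length - offset - 4 < saveNum) return false;
  const body = offset + 4;
  if (bytes[body + saveNum - 1] !== 0) return false; // NUL terminator
  for (let i = 0; i < saveNum - 1; i++) {
    const c = bytes[body + i];
    const ok = (c >= 0x41 && c <= 0x5a) || // A-Z
               (c >= 0x61 && c <= 0x7a) || // a-z
               (c >= 0x30 && c <= 0x39) || // 0-9
               c === 0x5f;                 // _
    if (!ok) return false;
  }
  return true;
}
```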
@@ -836,6 +844,49 @@ function peekLooksLikePropertyTag(cursor) {
836
844
  // kind+path, kind+path+classPath, or full kind+path+classPath+embedded).
837
845
  // Other inner types have fixed sizes determined by the type itself, so
838
846
  // they ignore the hint.
847
+ //
848
+ // =====================================================================
849
+ // Heuristics preamble: how this reader disambiguates ObjectProperty
850
+ // elements that don't carry a per-element delimiter on the wire.
851
+ //
852
+ // Stock UE ArrayProperty<ObjectProperty> writes a sequence of object
853
+ // values back-to-back with no length tag and no separator between
854
+ // elements. Each element's wire form is one of:
855
+ //
856
+ // (A) kind-only: 1 byte
856
+ // (B) kind+path: 1 byte + FString
857
+ // (C) kind+path+class: 1 byte + FString + FString
858
+ // (D) kind+path+class+embedded: (C) + property stream (terminated by None)
860
+ //
861
+ // Without per-element bounds we'd read past the element's actual end
862
+ // into either the next element's kind byte or a trailing binary section
863
+ // (origin / placement data) and cascade-fail.
864
+ //
865
+ // We address this with four guards, each cheap and orthogonal:
866
+ //
867
+ // Guard 1: budget exhaustion. After path, if there's no room for even
868
+ // a null-form classPath FString (4 bytes), stop here.
869
+ // Guard 2: implausible saveNum magnitude. A real classPath is short
870
+ // (<= 1024 chars); a peek that decodes to a huge magnitude
871
+ // usually means we're looking at the start of the next
872
+ // element's bytes instead.
873
+ // Guard 3: classPath starts with '/'. Soulmask asset paths are always
874
+ // "/Script/..." or "/Game/...". A peek whose first content
875
+ // byte isn't '/' (or '/' '\0' for UTF-16) is the next
876
+ // element's payload, not a real classPath.
877
+ // Guard 4: embedded-stream signature. The bytes following classPath
878
+ // either start a PropertyTag (identifier-character name with
879
+ // a small ANSI SaveNum) or they don't; if they don't, the
880
+ // element ends without an embedded stream.
881
+ //
882
+ // The same logic governs whether a 4-byte trailer at the element's tail
883
+ // is consumed: only when the next 4 bytes are 0x00000000 (FName.Number)
884
+ // AND we're still within budget.
885
+ //
886
+ // In practice this catches every known Soulmask actor in the tested
887
+ // world.db. Adding a new game-specific element shape means adding a
888
+ // new guard, not relaxing the existing ones.
889
+ // =====================================================================
839
890
  function readArrayElement(cursor, innerType, sizeHint = Infinity) {
840
891
  switch (innerType) {
841
892
  case 'IntProperty': return cursor.readInt32();
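Guards 1 to 3 can be read as a single predicate over the bytes that follow an element's path. This is only a sketch: `plausibleClassPathAt` is hypothetical and the budget arithmetic is simplified. A null-form classPath (SaveNum=0) is conservatively reported as not plausible, because it has no content byte for Guard 3 to check.

```javascript
// Hypothetical predicate combining Guards 1-3. `offset` points just past the
// element's path FString; `remainingBudget` is what is left of the element's
// byte budget at that point.
function plausibleClassPathAt(bytes, offset, remainingBudget) {
  // Guard 1: no room for even a null-form FString (its 4-byte SaveNum).
  if (remainingBudget < 4 || bytes.length - offset < 4) return false;
  const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const saveNum = dv.getInt32(offset, true);
  // Guard 2: implausible magnitude; real class paths are <= 1024 chars.
  if (saveNum === 0 || Math.abs(saveNum) > 1024) return false;
  // The body must also fit: ANSI uses 1 byte per char, UTF-16 uses 2.
  const bodyLen = saveNum > 0 ? saveNum : -saveNum * 2;
  if (4 + bodyLen > remainingBudget || offset + 4 + bodyLen > bytes.length) return false;
  // Guard 3: first content character must be '/' ("/Script/..." or "/Game/...").
  const first = offset + 4;
  if (bytes[first] !== 0x2f) return false;
  if (saveNum < 0 && bytes[first + 1] !== 0x00) return false; // UTF-16LE '/'
  return true;
}
```

Consider the `01 01 00 00` example from the Guard 3 comment. It decodes to SaveNum 257, which passes Guard 2, but it fails Guard 3 because the next byte is not `/`.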
@@ -864,7 +915,7 @@ function readArrayElement(cursor, innerType, sizeHint = Infinity) {
864
915
  case 'WSObjectProperty': {
865
916
  // Bounded read, mirroring readObjectValue. The variable wire shapes
866
917
  // (kind-only, +path, +path+classPath, +embedded) are disambiguated by
867
- // sizeHint without the bound we'd read past the element into the
918
+ // sizeHint. Without the bound we'd read past the element into the
868
919
  // next property's tag, which causes catastrophic cascade failures
869
920
  // (cf. ChengHaoList in serial 92 and friends). Capture per-FString
870
921
  // isNull flags for byte-identical round-trip of empty wire-FStrings.
@@ -878,7 +929,7 @@ function readArrayElement(cursor, innerType, sizeHint = Infinity) {
878
929
  // kind byte with no path, classPath, or embedded stream. Without this
879
930
  // early-out, the FString reader would interpret the next 4 bytes (which
880
931
  // belong to either the next element or the trailing binary section)
881
- // as a path saveNum typically a huge garbage value that overshoots
932
+ // as a path saveNum: typically a huge garbage value that overshoots
882
933
  // the array. Seen in JianZhuInstYuanXings, ZhuangBeiLanDaoJuJiYiList,
883
934
  // and KuaiJieLanDaoJuJiYiList trailing slots.
884
935
  if (kind === 0) {
@@ -915,7 +966,7 @@ function readArrayElement(cursor, innerType, sizeHint = Infinity) {
915
966
  // "/Script/Module.Class" or "/Game/...". The first content character
916
967
  // is therefore "/" (0x2F). When the bytes after path are actually the
917
968
  // start of the NEXT element (kind byte + optional kindOnePrefix +
918
- // path saveNum), peekSN can fall in the [-1024, 1024] range e.g.
969
+ // path saveNum), peekSN can fall in the [-1024, 1024] range, e.g.
919
970
  // bytes `01 01 00 00` from a kind=1 element with kindOnePrefix=1 read
920
971
  // as int32 = 257. The previous saveNum-magnitude guard misses this;
921
972
  // checking the first content byte for '/' catches it cleanly. Allow
@@ -947,7 +998,7 @@ function readArrayElement(cursor, innerType, sizeHint = Infinity) {
947
998
  });
948
999
  }
949
1000
  // Guard 4: embedded-stream presence. Same problem as the classPath
950
- // guards but one level deeper when the element has classPath but
1001
+ // guards but one level deeper: when the element has classPath but
951
1002
  // NO embedded stream, the bytes that follow classPath are the start
952
1003
  // of the NEXT element (kind byte) or the trailing binary section
953
1004
  // (12 zero bytes of origin). An embedded stream begins with a
@@ -1103,7 +1154,7 @@ export function writePropertyStream(writer, properties, emitTerminatorTrailer =
1103
1154
  // StructValue.write) to avoid needing a writePropertyStream re-export.
1104
1155
  // `emitTerminatorTrailer` defaults to false because nested streams in
1105
1156
  // stock UE 4.27 don't carry the 4-byte FName.Number trailer that the
1106
- // outermost stream does but some Soulmask embedded ObjectProperty
1157
+ // outermost stream does. But some Soulmask embedded ObjectProperty
1107
1158
  // streams DO (see ObjectRef.hasTerminatorTrailer / readObjectValue's
1108
1159
  // trailer-skip detection), so callers can opt in.
1109
1160
  export function writeNestedPropertyStream(writer, properties, emitTerminatorTrailer = false) {
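For reference, here is a sketch of the bytes that close a property stream. It assumes the None terminator is written as an ANSI name FString, like any other tag name, and that the optional trailer is a zero FName.Number. Both assumptions are consistent with the comments in this file but are not confirmed by the writer code shown here. `encodeStreamTerminator` is a hypothetical name.

```javascript
// Hypothetical encoder for the end of a property stream: "None" as an ANSI
// FString, optionally followed by a 4-byte zero FName.Number trailer.
function encodeStreamTerminator(emitTerminatorTrailer = false) {
  const name = 'None';
  const saveNum = name.length + 1; // includes the NUL
  const out = new Uint8Array(4 + saveNum + (emitTerminatorTrailer ? 4 : 0));
  new DataView(out.buffer).setInt32(0, saveNum, true);
  for (let i = 0; i < name.length; i++) out[4 + i] = name.charCodeAt(i);
  // out[4 + name.length] stays 0 (NUL); the optional trailer bytes stay 0 too.
  return out;
}
```

Under these assumptions, nested streams end after 9 bytes and the outermost stream ends after 13.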
package/structs.mjs CHANGED
@@ -3,16 +3,20 @@
3
3
  *
4
4
  * Soulmask is UE 4.27, so "core" structs (Vector etc.) use 32-bit floats.
5
5
  * Known struct names read directly as binary; unknown struct names fall
6
- * through to a nested property stream (handled in properties.mjs, not here
7
- * StructValue.read is supplied a `streamReader` callback to avoid a
6
+ * through to a nested property stream (handled in properties.mjs, not here;
7
+ * StructValue.read is supplied a `streamReader` callback to avoid a
8
8
  * load-order cycle).
9
9
  */
10
10
 
11
11
  import { FGuid } from './primitives.mjs';
12
12
 
13
- // Binary struct handlers. Each entry has read(cursor) plain object and
13
+ // Binary struct handlers. Each entry has read(cursor) -> plain object and
14
14
  // write(writer, plainObject). The plain object is what callers see as
15
15
  // `structValue.value` when the struct is one of these known shapes.
16
+ //
17
+ // Consumers can extend this registry to teach the codec about additional
18
+ // known-binary structs; prefer the `registerStructHandler` helper below over
19
+ // mutating this object directly, since it validates the handler shape.
16
20
  export const STRUCT_HANDLERS = {
17
21
  Vector: { read: c => ({ x: c.readFloat32(), y: c.readFloat32(), z: c.readFloat32() }),
18
22
  write: (w, v) => { w.writeFloat32(v.x); w.writeFloat32(v.y); w.writeFloat32(v.z); } },
@@ -24,12 +28,20 @@ export const STRUCT_HANDLERS = {
24
28
  write: (w, v) => { w.writeFloat32(v.pitch); w.writeFloat32(v.yaw); w.writeFloat32(v.roll); } },
25
29
  Quat: { read: c => ({ x: c.readFloat32(), y: c.readFloat32(), z: c.readFloat32(), w: c.readFloat32() }),
26
30
  write: (w, v) => { w.writeFloat32(v.x); w.writeFloat32(v.y); w.writeFloat32(v.z); w.writeFloat32(v.w); } },
31
+ // FColor wire order is B, G, R, A (not R, G, B, A). This matches UE4's
32
+ // FColor::Serialize, where the in-memory union exposes the bytes in BGRA
33
+ // order to match Windows DIB / DirectX texture layout. Don't "fix" the
34
+ // ordering; it's correct as-is.
27
35
  Color: { read: c => ({ b: c.readUint8(), g: c.readUint8(), r: c.readUint8(), a: c.readUint8() }),
28
36
  write: (w, v) => { w.writeUint8(v.b); w.writeUint8(v.g); w.writeUint8(v.r); w.writeUint8(v.a); } },
29
37
  LinearColor: { read: c => ({ r: c.readFloat32(), g: c.readFloat32(), b: c.readFloat32(), a: c.readFloat32() }),
30
38
  write: (w, v) => { w.writeFloat32(v.r); w.writeFloat32(v.g); w.writeFloat32(v.b); w.writeFloat32(v.a); } },
31
- Guid: { read: c => FGuid.read(c).value,
32
- write: (w, v) => new FGuid(v).write(w) },
39
+ // Guid returns an FGuid INSTANCE (not a bare string). FGuid carries
40
+ // toJSON/equals/isZero helpers; the write path accepts FGuid or a bare
41
+ // 8-4-4-4-12 string for backward compatibility with code that built the
42
+ // struct value from a literal.
43
+ Guid: { read: c => FGuid.read(c),
44
+ write: (w, v) => FGuid.from(v).write(w) },
33
45
  DateTime: { read: c => c.readInt64().toString(),
34
46
  write: (w, v) => w.writeInt64(v) },
35
47
  Timespan: { read: c => c.readInt64().toString(),
@@ -48,6 +60,27 @@ export const STRUCT_HANDLERS = {
48
60
  write: (w, v) => { STRUCT_HANDLERS.Quat.write(w, v.rotation); STRUCT_HANDLERS.Vector.write(w, v.translation); STRUCT_HANDLERS.Vector.write(w, v.scale3D); } },
49
61
  };
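Here is a quick way to see why the BGRA order is right. On a little-endian platform, the four wire bytes B, G, R, A read back as a single uint32 of the form 0xAARRGGBB. This is a sketch with hypothetical helper names; wscodec itself exposes only the `{ b, g, r, a }` object.

```javascript
// Hypothetical helpers illustrating FColor's B, G, R, A wire order.
function readColor(bytes, offset = 0) {
  return { b: bytes[offset], g: bytes[offset + 1], r: bytes[offset + 2], a: bytes[offset + 3] };
}

function writeColor({ r, g, b, a }) {
  return Uint8Array.of(b, g, r, a); // wire order: B, G, R, A
}

// The same four bytes viewed as one little-endian uint32: 0xAARRGGBB.
function packedArgb(bytes, offset = 0) {
  return new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getUint32(offset, true);
}
```

For example, opaque red `{ r: 255, g: 0, b: 0, a: 255 }` is written as `00 00 FF FF`, which reads back as the packed value 0xFFFF0000.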
50
62
 
63
+ /**
64
+ * Register (or replace) a struct handler. Callers can use this to teach the
65
+ * codec about additional binary structs the game emits that aren't in the
66
+ * stock registry. Without a handler, an unknown struct name falls through to
67
+ * the nested-property-stream path; that's still correct when the struct is
68
+ * actually tagged, and is byte-identical on round-trip via OpaqueValue when
69
+ * it isn't.
70
+ *
71
+ * Validates that `handler` has both `read(cursor)` and `write(writer, value)`
72
+ * functions and that `name` is a non-empty string.
73
+ */
74
+ export function registerStructHandler(name, handler) {
75
+ if (typeof name !== 'string' || name.length === 0) {
76
+ throw new TypeError('registerStructHandler: name must be a non-empty string');
77
+ }
78
+ if (!handler || typeof handler.read !== 'function' || typeof handler.write !== 'function') {
79
+ throw new TypeError('registerStructHandler: handler must expose read(cursor) and write(writer, value) functions');
80
+ }
81
+ STRUCT_HANDLERS[name] = handler;
82
+ }
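The handler contract is just a `read(cursor)` / `write(writer, value)` pair. Real callers would `import { registerStructHandler } from 'wscodec'`. To keep this sketch runnable on its own, it uses a local copy of the function with the same validation as above, plus tiny cursor/writer stand-ins. `ExampleIntPair`, `MiniCursor` and `MiniWriter` are hypothetical.

```javascript
// Local stand-in for the registry and registerStructHandler shown above.
const STRUCT_HANDLERS = {};

function registerStructHandler(name, handler) {
  if (typeof name !== 'string' || name.length === 0) {
    throw new TypeError('registerStructHandler: name must be a non-empty string');
  }
  if (!handler || typeof handler.read !== 'function' || typeof handler.write !== 'function') {
    throw new TypeError('registerStructHandler: handler must expose read(cursor) and write(writer, value) functions');
  }
  STRUCT_HANDLERS[name] = handler;
}

// Minimal little-endian stand-ins for wscodec's Cursor and Writer.
class MiniCursor {
  constructor(u8) { this.dv = new DataView(u8.buffer, u8.byteOffset, u8.byteLength); this.p = 0; }
  readInt32() { const v = this.dv.getInt32(this.p, true); this.p += 4; return v; }
}

class MiniWriter {
  constructor() { this.parts = []; }
  writeInt32(v) {
    const b = new Uint8Array(4);
    new DataView(b.buffer).setInt32(0, v, true);
    this.parts.push(...b);
  }
  bytes() { return Uint8Array.from(this.parts); }
}

// Hypothetical binary struct: two int32 fields, read and written in the same order.
registerStructHandler('ExampleIntPair', {
  read: (c) => ({ x: c.readInt32(), y: c.readInt32() }),
  write: (w, v) => { w.writeInt32(v.x); w.writeInt32(v.y); },
});
```

Keeping `read` and `write` as mirror images is what preserves byte-identical round-trip for the new struct.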
83
+
51
84
  export class StructValue {
52
85
  constructor(structName, { value = null, terminated = false, decodeError = null, opaqueTail = null } = {}) {
53
86
  this._structName = structName;
@@ -70,7 +103,7 @@ export class StructValue {
70
103
  * PropertyTag (FString name with identifier-character ASCII content).
71
104
  * When supplied and the wire bytes look tagged, the read switches to
72
105
  * the property-stream path even for structs that have a known binary
73
- * handler Soulmask encodes known-binary structs (Transform, Box, ...)
106
+ * handler: Soulmask encodes known-binary structs (Transform, Box, ...)
74
107
  * as TAGGED property streams inside Map struct values, which would
75
108
  * otherwise be misread as raw 40-byte Transforms / etc. The decision
76
109
  * is recorded on the returned StructValue via `Array.isArray(value)`,
package/values.mjs CHANGED
@@ -1,9 +1,9 @@
1
1
  /**
2
2
  * Wrapper classes for non-trivial property-value shapes:
3
- * ObjectRef ObjectProperty / ClassProperty / Weak / Lazy
4
- * SoftObjectRef SoftObjectProperty / SoftClassProperty
5
- * FTextValue TextProperty (FText: localized / culture-invariant string)
6
- * OpaqueValue bytes we don't decode (fallback for unknown/unimplemented)
3
+ * ObjectRef : ObjectProperty / ClassProperty / Weak / Lazy
4
+ * SoftObjectRef : SoftObjectProperty / SoftClassProperty
5
+ * FTextValue : TextProperty (FText: localized / culture-invariant string)
6
+ * OpaqueValue : bytes we don't decode (fallback for unknown/unimplemented)
7
7
  *
8
8
  * Array/Set/Map values live in properties.mjs because they're tightly
9
9
  * coupled to PropertyTag (struct arrays carry an inner tag).
@@ -16,7 +16,7 @@
16
16
  import { writeNestedPropertyStream } from './properties.mjs';
17
17
 
18
18
  /**
19
- * Soulmask ObjectProperty value layout. Each field is optional the wire
19
+ * Soulmask ObjectProperty value layout. Each field is optional; the wire
20
20
  * shape is bounded by the property tag's size budget and the reader stops
21
21
  * at whichever boundary it hits first:
22
22
  *
@@ -25,7 +25,7 @@ import { writeNestedPropertyStream } from './properties.mjs';
25
25
  * 4-byte field sitting between the kind byte
26
26
  * and pathFS. Observed value is always 1; the
27
27
  * semantic meaning is unclear (a flag, an
28
- * FName.Number, or a count) we capture and
28
+ * FName.Number, or a count); we capture and
29
29
  * replay it verbatim for byte-identical
30
30
  * round-trip. Seen on hard actor references
31
31
  * like NPC `HBindBGCompActor` (the pawn's
@@ -67,7 +67,7 @@ export class ObjectRef {
67
67
  this.terminated = terminated;
68
68
  // When true, the embedded property stream was followed by a 4-byte
69
69
  // FName.Number trailer (the outermost-stream None-trailer convention,
70
- // applied here by Soulmask to some nested ObjectProperty embeddeds
70
+ // applied here by Soulmask to some nested ObjectProperty embeddeds,
71
71
  // e.g. JianZhuInstGLQComponent). The reader detects this when exactly
72
72
  // 4 trailing bytes remain inside the tag's size budget; the writer
73
73
  // replays them so round-trip stays byte-identical.
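The trailer rule in that comment is a pure budget check. This sketch uses hypothetical names and takes byte offsets rather than a Cursor. After the embedded stream's None terminator, a trailer is treated as present only when exactly four bytes of the tag's budget remain. The writer then re-emits it based on the recorded flag.

```javascript
// Hypothetical model of the hasTerminatorTrailer decision. `valueStart` is
// where the ObjectProperty value began, `pos` is just past the embedded
// stream's None terminator, and `sizeHint` is the tag's Size field.
function readTrailerFlag(pos, valueStart, sizeHint) {
  const remaining = sizeHint - (pos - valueStart);
  if (remaining === 4) return { hasTerminatorTrailer: true, next: pos + 4 };
  return { hasTerminatorTrailer: false, next: pos };
}

// On write, emit four bytes only when the reader saw them. This sketch writes
// zeros (an FName.Number of 0); replaying captured bytes would also work.
function trailerBytes(hasTerminatorTrailer) {
  return new Uint8Array(hasTerminatorTrailer ? 4 : 0);
}
```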
@@ -118,28 +118,28 @@ export class SoftObjectRef {
118
118
  * UE4 FText wire format:
119
119
  * uint32 Flags
120
120
  * int8 HistoryType
121
- * HistoryType -1 (None / culture-invariant):
121
+ * HistoryType -1 (None / culture-invariant):
122
122
  * int32 bHasCultureInvariantString
123
123
  * [FString displayString] (only when bHasCultureInvariantString != 0)
124
- * HistoryType 0 (Base / localized):
124
+ * HistoryType 0 (Base / localized):
125
125
  * FString Namespace
126
126
  * FString Key
127
127
  * FString SourceString
128
- * HistoryType 2 (OrderedFormat):
129
- * FText SourceFmt the format pattern, e.g. "{0} < {1} >"
128
+ * HistoryType 2 (OrderedFormat):
129
+ * FText SourceFmt (the format pattern, e.g. "{0} < {1} >")
130
130
  * int32 NumArguments
131
131
  * for each: int8 ContentType + value
132
132
  * 0=Int(int64) 1=UInt(uint64) 2=Float(f32) 3=Double(f64)
133
133
  * 4=Text(FText) 5=Gender(int8)
134
- * HistoryType 4 (AsNumber, FTextHistory_AsNumber):
134
+ * HistoryType 4 (AsNumber, FTextHistory_AsNumber):
135
135
  * FFormatArgumentValue SourceValue (int8 type + value-by-type)
136
136
  * uint32 bHasFormatOptions ← legacy UE3-style 4-byte bool, NOT 1-byte
137
137
  * [FNumberFormattingOptions FormatOptions]
138
138
  * uint32 bHasCulture ← also a uint32 bool
139
139
  * [FString TargetCulture]
140
140
  * FNumberFormattingOptions = AlwaysSign(uint32) + UseGrouping(uint32) +
141
- * RoundingMode(int8) + 4 × int32 digit-count fields.
142
- * All other types: remaining bytes stored in _raw for verbatim round-trip.
141
+ * RoundingMode(int8) + 4 x int32 digit-count fields.
142
+ * All other types: remaining bytes stored in _raw for verbatim round-trip.
143
143
  */
144
144
  export class FTextValue {
145
145
  constructor({
@@ -195,20 +195,21 @@ export class FTextValue {
195
195
  }
196
196
 
197
197
  /**
198
- * Holds raw bytes we couldn't (or wouldn't) decode. `reason` is for
199
- * debugging only; encoding writes the bytes back verbatim.
198
+ * Holds raw bytes we couldn't (or wouldn't) decode. `reason` is a free-form
199
+ * string for debugging only; encoding writes the bytes back verbatim, so a
200
+ * value the codec couldn't parse still round-trips byte-identical.
200
201
  *
201
- * Two access paths intentionally co-exist:
202
- * - `.bytes` / `.reason` the canonical getter API.
203
- * - `._opaque` / `._opaqueReason` backing-store fields, also publicly
204
- * readable for consumers that pre-date the getter API.
202
+ * OpaqueValue is the codec's universal fallback for everything from unknown
203
+ * property types to mid-decode recoveries (Struct/Array/Set/Map/Text whose
204
+ * inner shape didn't parse cleanly). The reader's contract is: on any
205
+ * structural failure inside a finite byte budget, rewind to the value's
206
+ * start and capture the budget verbatim into an OpaqueValue, so the outer
207
+ * stream stays byte-aligned regardless of what went wrong inside.
205
208
  */
206
209
  export class OpaqueValue {
207
210
  constructor(bytes, reason = null) {
208
- this._opaque = bytes;
209
- if (reason) this._opaqueReason = reason;
211
+ this.bytes = bytes;
212
+ this.reason = reason;
210
213
  }
211
- get bytes() { return this._opaque; }
212
- get reason() { return this._opaqueReason ?? null; }
213
- write(writer) { writer.writeBytes(this._opaque); }
214
+ write(writer) { writer.writeBytes(this.bytes); }
214
215
  }
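The reader contract in the OpaqueValue doc comment can be sketched as a wrapper around any structured decoder. `OpaqueValueSketch` mirrors the new public `bytes` / `reason` fields, and `decodeOrOpaque` is hypothetical. One rule is this sketch's own assumption: a finite budget must be consumed exactly.

```javascript
// Stand-in with the same public fields as the new OpaqueValue.
class OpaqueValueSketch {
  constructor(bytes, reason = null) { this.bytes = bytes; this.reason = reason; }
}

// decodeFn(bytes, start) must return { value, next }. Positions are plain
// offsets, so "rewinding to the value's start" just means reusing `start`.
function decodeOrOpaque(bytes, start, sizeHint, decodeFn) {
  try {
    const { value, next } = decodeFn(bytes, start);
    // Assumption: a finite budget must be consumed exactly.
    if (Number.isFinite(sizeHint) && next - start !== sizeHint) {
      throw new Error(`consumed ${next - start} of ${sizeHint} bytes`);
    }
    return { value, next };
  } catch (err) {
    // No finite budget (e.g. array-element context): nothing safe to capture.
    if (!Number.isFinite(sizeHint)) throw err;
    const raw = bytes.slice(start, start + sizeHint);
    return { value: new OpaqueValueSketch(raw, err.message), next: start + sizeHint };
  }
}
```

Either way, the outer cursor ends at `start + sizeHint` whenever a budget exists. That is what keeps the surrounding stream aligned.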
package/wscodec.mjs CHANGED
@@ -1,9 +1,9 @@
1
1
  /**
2
- * wscodec pure-JS codec for Soulmask actor_data property streams.
2
+ * wscodec: pure-JS codec for Soulmask actor_data property streams.
3
3
  *
4
4
  * The library accepts uncompressed bytes (the payload that comes out of
5
5
  * Soulmask's outer LZ4 wrapper) and returns a JavaScript object tree, and
6
- * vice versa. It has zero runtime dependencies LZ4 handling, SQLite
6
+ * vice versa. It has zero runtime dependencies; LZ4 handling, SQLite
7
7
  * access, etc. are the caller's responsibility.
8
8
  *
9
9
  * Wire layout (the bytes accepted by `UnrealBlob.decode` and produced by
@@ -18,14 +18,14 @@
18
18
  * The SQLite `actor_table.data_version` column stores the NEGATIVE of the
19
19
  * wire-format DataVersion. A healthy blob with DataVersion=2 lives in a row
20
20
  * whose `data_version` column reads -2. The wire bytes themselves are
21
- * always the unsigned 0x00000002 the negation is purely a column-side
21
+ * always the unsigned 0x00000002; the negation is purely a column-side
22
22
  * convention.
23
23
  *
24
24
  * Round-trip safety: when `_dirty` is false, `serialize` returns the
25
25
  * original input bytes verbatim. When `_dirty` is true, it re-emits the
26
26
  * property stream from scratch via `writePropertyStream`. Both paths are
27
27
  * verified byte-identical against every row in a tested world.db
28
- * (174.6 MB, 11,667 rows; `npm test`).
28
+ * (`npm test`).
29
29
  *
30
30
  * Re-exports the most commonly used types so callers can do
31
31
  * import { UnrealBlob, FName, FGuid, ObjectRef, ... } from 'wscodec';
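The column-side negation is easy to get backwards when cross-checking rows, so here is a sketch. It assumes the wire DataVersion is the leading little-endian uint32 of the uncompressed bytes. `serialize()` suggests this, since it writes `versionTag` first. The helper names are hypothetical.

```javascript
// Hypothetical helpers for the DataVersion convention described above.
function wireDataVersion(u8) {
  // Assumed: the first 4 bytes of the uncompressed payload are the DataVersion.
  return new DataView(u8.buffer, u8.byteOffset, u8.byteLength).getUint32(0, true);
}

// actor_table.data_version stores the NEGATIVE of the wire value.
function columnMatchesBlob(columnValue, u8) {
  return columnValue === -wireDataVersion(u8);
}
```

A row with `data_version = -2` therefore matches a blob whose first bytes are `02 00 00 00`.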
@@ -38,7 +38,7 @@ import { readPropertyStream, writePropertyStream } from './properties.mjs';
38
38
  // Convenience re-exports for the public API surface.
39
39
  export { Cursor, Writer } from './io.mjs';
40
40
  export { FName, FGuid } from './primitives.mjs';
41
- export { StructValue, STRUCT_HANDLERS } from './structs.mjs';
41
+ export { StructValue, STRUCT_HANDLERS, registerStructHandler } from './structs.mjs';
42
42
  export { ObjectRef, SoftObjectRef, FTextValue, OpaqueValue } from './values.mjs';
43
43
  export {
44
44
  PropertyTag, Property,
@@ -69,10 +69,25 @@ export class UnrealBlob {
69
69
  this._dirty = false;
70
70
  }
71
71
 
72
- get kind() { return NAME; }
72
+ /**
73
+ * Codec-adapter name (`'unreal-properties'`). Surfaced for registries that
74
+ * dispatch on a codec's `kind` field; matches the `name` on the bare
75
+ * `codec` adapter exported at the bottom of this module.
76
+ */
77
+ get kind() { return NAME; }
78
+
79
+ /**
80
+ * Number of bytes the blob was decoded from (`_raw.length`), or 0 if the
81
+ * blob was constructed without an input buffer. NOT the post-serialize
82
+ * size; for that, call `serialize().length`.
83
+ */
73
84
  get totalSize() { return this._raw ? this._raw.length : 0; }
74
85
 
75
- /** First top-level property with the given name, or null. */
86
+ /**
87
+ * First TOP-LEVEL property with the given tag name, or null. Does NOT
88
+ * traverse into embedded streams, struct values, array elements, or map
89
+ * entries. Use `findPropertyDeep` to walk the full tree.
90
+ */
76
91
  findProperty(propName) {
77
92
  for (const p of this.properties) {
78
93
  if (p.tag && p.tag.name && p.tag.name.value === propName) return p;
@@ -80,6 +95,23 @@ export class UnrealBlob {
80
95
  return null;
81
96
  }
82
97
 
98
+ /**
99
+ * First property with the given tag name found anywhere in the property
100
+ * tree, or null. Performs a depth-first traversal across:
101
+ *
102
+ * - top-level properties
103
+ * - ObjectRef.embedded streams (nested ObjectProperty values)
104
+ * - StructValue.value when it's a tagged property array
105
+ * - ArrayProperty / SetProperty struct elements
106
+ * - MapProperty entries: both key (if StructValue) and value
107
+ *
108
+ * Returns the first match in traversal order; later matches are not
109
+ * surfaced. For all matches, walk the tree manually.
110
+ */
111
+ findPropertyDeep(propName) {
112
+ return _findPropertyDeep(this.properties, propName);
113
+ }
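`findPropertyDeep` stops at the first hit, and its doc comment leaves "all matches" to the caller. Below is a sketch of that manual walk as a generator. It assumes the node shapes that `_findPropertyDeep` inspects further down in this file: `tag.name.value`, `value.embedded`, a StructValue's `_structName` plus array `value`, and `value.elements`. MapProperty entries are left out because their shape is not shown here. The function names are hypothetical.

```javascript
// Hypothetical depth-first walk over the property tree (maps omitted).
function* walkProperties(properties) {
  if (!Array.isArray(properties)) return;
  for (const p of properties) {
    yield p;
    const v = p.value;
    if (v == null) continue;
    if (v.embedded) yield* walkProperties(v.embedded);                           // ObjectRef
    if (v._structName && Array.isArray(v.value)) yield* walkProperties(v.value); // tagged struct
    if (Array.isArray(v.elements)) {                                             // Array / Set
      for (const e of v.elements) {
        if (e && e._structName && Array.isArray(e.value)) yield* walkProperties(e.value);
        if (e && e.embedded) yield* walkProperties(e.embedded);
      }
    }
  }
}

// Every property with the given tag name, in traversal order.
function findAllPropertiesDeep(properties, propName) {
  const hits = [];
  for (const p of walkProperties(properties)) {
    if (p.tag && p.tag.name && p.tag.name.value === propName) hits.push(p);
  }
  return hits;
}
```

Usage would look like `findAllPropertiesDeep(blob.properties, 'ChengHaoList')`, with `blob` coming from `UnrealBlob.decode`.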
114
+
83
115
  static detect(u8) {
84
116
  if (!u8 || u8.length < VERSION_HEADER_SIZE) return false;
85
117
  const dv = new DataView(u8.buffer, u8.byteOffset, u8.byteLength);
@@ -90,7 +122,7 @@ export class UnrealBlob {
90
122
  * Parse uncompressed property-stream bytes into an UnrealBlob.
91
123
  *
92
124
  * On unrecoverable structural failure the returned blob has `error` set
93
- * and `properties` empty callers that need a hard failure should check
125
+ * and `properties` empty. Callers that need a hard failure should check
94
126
  * `blob.error` after decode.
95
127
  */
96
128
  static decode(u8) {
@@ -125,14 +157,27 @@ export class UnrealBlob {
  /**
  * Return the uncompressed property-stream bytes for this blob.
  *
- * Pass-through when `_dirty` is false: returns the input bytes verbatim.
- * Re-encodes from `properties` when `_dirty` is true. `bodyTrailing`, if
- * present, is appended after the None terminator + 4-byte FName.Number
+ * Pass-through when `_dirty` is false: returns the input bytes verbatim,
+ * even if `error` is set (the original bytes round-trip even when decode
+ * was incomplete). Re-encodes from `properties` when `_dirty` is true,
+ * appending `bodyTrailing` after the None terminator + 4-byte FName.Number
  * trailer that `writePropertyStream` emits.
+ *
+ * Throws if `_dirty` is true AND `error` is set: re-emitting would produce
+ * a malformed stream (the property tree is empty after a structural
+ * failure). Clear `.error` first if you intentionally want to emit from
+ * an externally-constructed properties array.
  */
  serialize() {
  if (!this._dirty && this._raw instanceof Uint8Array) return this._raw;
 
+ if (this.error != null) {
+ throw new Error(
+ `UnrealBlob.serialize: cannot re-emit a blob with decode error (${this.error}). ` +
+ `Leave _dirty=false to pass through _raw verbatim, or clear .error if you've replaced .properties manually.`
+ );
+ }
+
  const w = new Writer(this._raw?.length || 256);
  w.writeUint32(this.versionTag);
  writePropertyStream(w, this.properties, /*emitTerminatorTrailer=*/true);
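The new `serialize` guard produces three outcomes depending on `_dirty` and `error`. A minimal model of just that branching follows. `GuardModel` and its `encodeProperties` callback are hypothetical stand-ins for the real `Writer` + `writePropertyStream` path, which this sketch does not reproduce. The byte values are placeholders.

```javascript
// Minimal model of the serialize() guard order in this diff. Only the
// branching is reproduced; `encodeProperties` is a hypothetical stand-in
// for the real Writer + writePropertyStream re-encode path.
class GuardModel {
  constructor(raw, error, encodeProperties) {
    this._raw = raw;
    this._dirty = false;
    this.error = error;
    this.properties = [];
    this.encodeProperties = encodeProperties;
  }

  serialize() {
    // 1. Clean blob: return the original bytes, even if decode failed.
    if (!this._dirty && this._raw instanceof Uint8Array) return this._raw;
    // 2. Dirty blob with a decode error: refuse, the tree is incomplete.
    if (this.error != null) {
      throw new Error(`cannot re-emit a blob with decode error (${this.error})`);
    }
    // 3. Dirty, error-free blob: re-encode from properties.
    return this.encodeProperties(this.properties);
  }
}

const raw = new Uint8Array([2, 0, 0, 0]);
const b = new GuardModel(raw, "bad tag", () => new Uint8Array([9]));
console.log(b.serialize() === raw); // true: pass-through despite the error
b._dirty = true;
try {
  b.serialize();
} catch (e) {
  console.log(e.message); // cannot re-emit a blob with decode error (bad tag)
}
b.error = null; // caller has replaced .properties by hand
console.log(b.serialize()[0]); // 9
```

Checking the pass-through case first is what lets a blob with a decode error still round-trip byte-identically, as long as nobody marks it dirty.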
@@ -143,6 +188,59 @@ export class UnrealBlob {
  }
  }
 
+ // Deep-search helper. Walks the property tree in depth-first order and
+ // returns the first Property whose tag.name matches. Kept out of the class
+ // body so the recursion can reach into nested shapes uniformly without
+ // having to thread `this` around.
+ function _findPropertyDeep(properties, propName) {
+ if (!Array.isArray(properties)) return null;
+ for (const p of properties) {
+ if (p.tag && p.tag.name && p.tag.name.value === propName) return p;
+ const v = p.value;
+ if (v == null) continue;
+ // ObjectRef with embedded property stream.
+ if (v.embedded) {
+ const hit = _findPropertyDeep(v.embedded, propName);
+ if (hit) return hit;
+ }
+ // StructValue: .value is either a property array (unknown struct) or a
+ // plain binary record (known struct). Only the array form is searchable.
+ if (v._structName && Array.isArray(v.value)) {
+ const hit = _findPropertyDeep(v.value, propName);
+ if (hit) return hit;
+ }
+ // ArrayProperty / SetProperty struct elements + ObjectRef embeddeds.
+ if (Array.isArray(v.elements)) {
+ for (const e of v.elements) {
+ if (e && e._structName && Array.isArray(e.value)) {
+ const hit = _findPropertyDeep(e.value, propName);
+ if (hit) return hit;
+ }
+ if (e && e.embedded) {
+ const hit = _findPropertyDeep(e.embedded, propName);
+ if (hit) return hit;
+ }
+ }
+ }
+ // MapProperty entries: both key (if StructValue) and value can hold a
+ // nested property stream.
+ if (Array.isArray(v.entries)) {
+ for (const ent of v.entries) {
+ if (ent.key && ent.key._structName && Array.isArray(ent.key.value)) {
+ const hit = _findPropertyDeep(ent.key.value, propName);
+ if (hit) return hit;
+ }
+ const ev = ent.value;
+ if (ev && ev._structName && Array.isArray(ev.value)) {
+ const hit = _findPropertyDeep(ev.value, propName);
+ if (hit) return hit;
+ }
+ }
+ }
+ }
+ return null;
+ }
+
  // Generic codec-adapter shape (name + detect + decode + encode), suitable
  // for plugging into any registry that uses that quartet. Operates on the
  // uncompressed bytes that `UnrealBlob.decode` accepts; for callers reading