@zio.dev/zio-blocks 0.0.1

@@ -0,0 +1,282 @@
1
+ ---
2
+ id: registers
3
+ title: "Register System"
4
+ sidebar_label: "Register"
5
+ ---
6
+
7
+ The register system is one of the key innovations in ZIO Blocks (ZIO Schema 2) that enables **zero-allocation, box-free construction and deconstruction** of data types.
8
+
9
+ ## The Problem: Boxing/Unboxing Overheads
10
+
11
+ When building generic abstractions over data types (like serialization libraries), you need to describe all possible constructions and deconstructions uniformly. The traditional approach uses tuples and boxed primitives. For example, assume we have a simple record data type:
12
+
13
+ ```scala mdoc:compile-only
14
+ case class Person(name: String, age: Int)
15
+ ```
16
+
17
+ A traditional library might represent it as a tuple when serializing/deserializing:
18
+
19
+ ```scala mdoc:compile-only
20
+ trait Tuple
21
+ case class Tuple2[A, B]( _1: A, _2: B) extends Tuple
22
+ case class Tuple3[A, B, C](_1: A, _2: B, _3: C) extends Tuple
23
+ // ...
24
+ ```
25
+
26
+ A tuple is a generic data structure that can hold values of any type. So serializing `Person` would involve converting it to/from `Tuple2[String, Int]`:
27
+
28
+ ```scala mdoc:compile-only
29
+ case class Person(name: String, age: Int)
30
+ val person = Person("john", 42)
31
+
32
+ // Generic construction via tuples:
33
+ val tuple: (String, Int) = ("john", 42) // Tuple2[String, Int]
34
+ ```
35
+
36
+ The problem is that `Int` is a primitive type, and in order to fit it into a tuple, it must be **boxed** into `java.lang.Integer`. Why? Because tuples can only hold references to objects, not raw primitive values. They are generic containers that work uniformly for any type.
37
+
38
+ So the actual memory representation of the tuple looks like this:
39
+
40
+ ```text
41
+ Stack: Heap:
42
+
43
+
44
+ ┌────────────────────────┐ ┌─────────────────┐
45
+ ┌─────────────┐ │ Tuple2 object │ │ String "john" │
46
+ │ tuple (ref) │─────▶│ _1: ────────────────────────▶│ char[]/byte[]: ───────▶[j][o][h][n]
47
+ └─────────────┘ │ _2: ──────┐ │ └─────────────────┘
48
+ └────────────────────────┘
49
+
50
+
51
+ ┌─────────────────┐
52
+ │ Integer object │
53
+ │ value: 42 │ ← 4 bytes for int
54
+ └─────────────────┘
55
+ ```
56
+
57
+ This boxing creates significant runtime overhead because:
58
+
59
+ 1. **Primitive boxing**: Values like `Int`, `Long`, `Double` must be wrapped in heap-allocated objects (`java.lang.Integer`, etc.)
60
+ 2. **Tuple allocation**: All constructor arguments get wrapped in tuple objects. This creates extra allocations for every construction/deconstruction.
61
+ 3. **Garbage collection pressure**: Each serialization/deserialization creates temporary objects
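+
+ The boxing described above can be observed directly on the JVM. The following is a minimal sketch that uses only the Scala standard library; the `BoxingDemo` object is a hypothetical name chosen for illustration and is not part of ZIO Blocks:
+
+ ```scala
+ object BoxingDemo {
+   val tuple: (String, Int) = ("john", 42)
+
+   // Reading the field through the generic Product interface returns Any,
+   // so the Int comes back as a heap-allocated java.lang.Integer.
+   val ageAsAny: Any = tuple.productElement(1)
+   val isBoxed: Boolean = ageAsAny.isInstanceOf[java.lang.Integer]
+
+   def main(args: Array[String]): Unit =
+     println(s"boxed: $isBoxed")
+ }
+ ```
+
+ Any generic code that treats fields as `Any` or `AnyRef` goes through the same conversion for every primitive field.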
62
+
63
+ ## The Solution: Register-Based Architecture
64
+
65
+ ZIO Blocks introduces a novel register-based design that completely eliminates tupling and boxing:
66
+
67
+ > "Zero allocation, zero boxing, construction and deconstruction of records. It doesn't get faster than this. You can't make generic code in Scala faster than what's been done here."
68
+ > — John De Goes, LambdaConf 2025
69
+
70
+ Instead of tuples, ZIO Blocks uses the `Registers` data structure, which contains:
71
+
72
+ 1. A **byte array** for storing primitives (Int, Long, Double, Float, Boolean, Byte, Char, Short)
73
+ 2. An **object array** for storing references to heap-allocated objects (AnyRef, which is the supertype of all reference types in Scala including String, custom classes, etc.)
74
+
75
+ This classification determines where values are stored in the `Registers` data structure. All primitive types use the same `bytes` register to store raw values, and all reference types use the same `objects` register to store references:
76
+
77
+ ```scala
78
+ // Conceptual structure of Registers
79
+ class Registers {
80
+ var bytes: Array[Byte] = new Array[Byte](byteArrayLength) // Stores all primitives efficiently
81
+ var objects: Array[AnyRef] = new Array[AnyRef](objectArrayLength) // Stores object references
82
+ }
83
+ ```
84
+
85
+ The `Registers` class is a mutable data structure that serves as an intermediate buffer for **construction** and **deconstruction** operations. So it has methods to set and get values for each primitive type and for object references:
86
+
87
+ ```scala
88
+ class Registers {
89
+ private var bytes: Array[Byte] = new Array[Byte](byteArrayLength)
90
+ private var objects: Array[AnyRef] = new Array[AnyRef](objectArrayLength)
91
+
92
+ // Methods to get/set primitive values in byte array (getInt/setInt, getBoolean/setBoolean, etc.)
93
+ def getInt(offset: RegisterOffset): Int = ???
94
+ def setInt(offset: RegisterOffset, value: Int): Unit = ???
95
+
96
+ def setBoolean(offset: RegisterOffset, value: Boolean): Unit = ???
97
+ def getBoolean(offset: RegisterOffset): Boolean = ???
98
+
99
+ // Two methods to get/set object references in object array
100
+ def getObject(offset: RegisterOffset): AnyRef = ???
101
+ def setObject(offset: RegisterOffset, value: AnyRef): Unit = ???
102
+ }
103
+ ```
104
+
105
+ This design allows primitives to be stored directly in their native binary representation without boxing, while objects are stored as simple references. So, when you use registers, the library adds zero overhead to construction and deconstruction — no tuples and no boxing of primitives.
106
+
107
+ ```text
108
+ ┌─────────────────────────────────────────────────────────────────┐
109
+ │ Registers │
110
+ ├─────────────────────────────────────────────────────────────────┤
111
+ │ Byte Array (primitives) │ Object Array (references) │
112
+ │ ┌───┬───┬───┬───┬───┬───┬───┐ │ ┌───────┬───────┬───────┐ │
113
+ │ │ B │ B │ S │ S │ I │ I │...│ │ │ Obj0 │ Obj1 │ Obj2 │ │
114
+ │ └───┴───┴───┴───┴───┴───┴───┘ │ └───────┴───────┴───────┘ │
115
+ │ (raw bytes, no boxing) │ (String, etc.) │
116
+ └─────────────────────────────────────────────────────────────────┘
117
+ ```
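+
+ To make the two-array idea concrete, here is a minimal, self-contained sketch. It is not the ZIO Blocks implementation: `SketchRegisters`, `Point`, and the plain `Int` indexes are hypothetical stand-ins, and `java.nio.ByteBuffer` is used only as a convenient way to read and write primitives in a byte array:
+
+ ```scala
+ import java.nio.ByteBuffer
+
+ // Hypothetical stand-in for illustration; NOT the ZIO Blocks `Registers` class.
+ final class SketchRegisters(byteCapacity: Int, objectCapacity: Int) {
+   private val bytes   = ByteBuffer.allocate(byteCapacity) // heap buffer backed by a byte array
+   private val objects = new Array[AnyRef](objectCapacity)
+
+   // Primitives are written as raw bytes at an absolute index, without boxing.
+   def setInt(byteIndex: Int, value: Int): Unit       = { bytes.putInt(byteIndex, value); () }
+   def getInt(byteIndex: Int): Int                    = bytes.getInt(byteIndex)
+   def setDouble(byteIndex: Int, value: Double): Unit = { bytes.putDouble(byteIndex, value); () }
+   def getDouble(byteIndex: Int): Double              = bytes.getDouble(byteIndex)
+
+   // References are stored as-is in the object array.
+   def setObject(index: Int, value: AnyRef): Unit = { objects(index) = value }
+   def getObject(index: Int): AnyRef              = objects(index)
+ }
+
+ object SketchRegistersDemo {
+   final case class Point(label: String, x: Int, y: Double)
+
+   // Allocated once: 4 bytes for the Int, 8 bytes for the Double, 1 object slot.
+   private val registers = new SketchRegisters(byteCapacity = 12, objectCapacity = 1)
+
+   def roundTrip(p: Point): Point = {
+     // Deconstruct: copy each field into its slot.
+     registers.setObject(0, p.label)
+     registers.setInt(0, p.x)
+     registers.setDouble(4, p.y)
+     // Construct: read the fields back and rebuild the value.
+     Point(registers.getObject(0).asInstanceOf[String], registers.getInt(0), registers.getDouble(4))
+   }
+ }
+ ```
+
+ In this sketch, the only object created per call to `roundTrip` is the resulting `Point` itself; no intermediate tuple, `java.lang.Integer`, or `java.lang.Double` is needed.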
118
+
119
+ One powerful aspect of the register system is that registers are reusable: once a `Registers` instance has been allocated, it can construct and deconstruct any number of values with no further allocation. This is particularly valuable in high-throughput scenarios like:
120
+ - Deserializing streams of records
121
+ - Batch processing operations
122
+ - Real-time data pipelines
123
+
124
+ ## RegisterOffset: Tracking Positions
125
+
126
+ `RegisterOffset` is a compact way to track positions within the `Registers` structure. It uses a single `Int` to encode two pieces of information:
127
+ 1. Byte offset (for primitives) in the upper 16 bits
128
+ 2. Object offset (for references) in the lower 16 bits
129
+
130
+ You can think of `RegisterOffset` as a cursor that tells you where to read/write the next primitive or object value. For example:
131
+ 1. `RegisterOffset.Zero` represents the starting position with zero indexes for both primitives and objects.
132
+ 2. `RegisterOffset(objects = 1)` indicates that the next object reference should be stored at index 1 in the object array, with zero byte offset for primitives.
133
+ 3. `RegisterOffset(bytes = 10, objects = 3)` indicates that the next primitive value should be retrieved/stored starting at byte index 10 in the byte array, and the next object reference should be stored at index 3 in the object array.
134
+ 4. `RegisterOffset(bytes = 4, ints = 2, objects = 3)` indicates that the next primitive value should be retrieved/stored starting at byte index 12 (4 bytes + 2 ints × 4 bytes each) in the byte array, and the next object reference should be stored at index 3 in the object array.
135
+
136
+ The byte offset is calculated by weighting each type by its size:
137
+
138
+ ```text
139
+ primitiveBytes = booleans + bytes
140
+ + (chars + shorts) × 2
141
+ + (floats + ints) × 4
142
+ + (doubles + longs) × 8
143
+ ```
144
+
145
+ The object offset is simply the count of object references.
146
+
147
+ You don't need to do these calculations manually; the `RegisterOffset.getBytes` method computes the byte offset of the primitive register, and `RegisterOffset.getObjects` computes the offset of the object register.
148
+
149
+ ```scala
150
+ RegisterOffset.getBytes(RegisterOffset(bytes = 4, ints = 2, objects = 3)) // output = (4 + 2*4) = 12
151
+ RegisterOffset.getObjects(RegisterOffset(bytes = 4, ints = 2, objects = 3)) // output = 3
152
+ ```
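+
+ The packing scheme described in this section can be sketched in a few lines. This is a hypothetical re-implementation for illustration only, based on the description above (byte offset in the upper 16 bits, object offset in the lower 16 bits); `OffsetSketch` and its method names are assumptions and not the library's source:
+
+ ```scala
+ // Hypothetical sketch; NOT the ZIO Blocks `RegisterOffset` implementation.
+ object OffsetSketch {
+   // Weighted byte count, following the formula above.
+   def primitiveBytes(
+     booleans: Int = 0, bytes: Int = 0, chars: Int = 0, shorts: Int = 0,
+     floats: Int = 0, ints: Int = 0, doubles: Int = 0, longs: Int = 0
+   ): Int =
+     booleans + bytes + (chars + shorts) * 2 + (floats + ints) * 4 + (doubles + longs) * 8
+
+   // Byte offset goes in the upper 16 bits, object offset in the lower 16 bits.
+   def pack(byteOffset: Int, objectOffset: Int): Int =
+     (byteOffset << 16) | (objectOffset & 0xFFFF)
+
+   def getBytes(offset: Int): Int   = offset >>> 16
+   def getObjects(offset: Int): Int = offset & 0xFFFF
+
+   // Same inputs as the example above: bytes = 4, ints = 2, objects = 3.
+   val example: Int = pack(primitiveBytes(bytes = 4, ints = 2), objectOffset = 3)
+ }
+ ```
+
+ With these definitions, `OffsetSketch.getBytes(OffsetSketch.example)` evaluates to 12 and `OffsetSketch.getObjects(OffsetSketch.example)` to 3. One consequence of a 16-bit split is that each half can address at most 65,535 positions.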
153
+
154
+ ## Creating Registers
155
+
156
+ `Registers` is a mutable container that holds values. It's created with an initial capacity:
157
+
158
+ ```scala mdoc:compile-only
159
+ import zio.blocks.schema.binding._
160
+ import zio.blocks.schema.binding.RegisterOffset._
161
+
162
+ // Create registers with space for 3 bytes, 1 int, and 2 objects
163
+ val registers = Registers(RegisterOffset(bytes = 3, ints = 1, objects = 2))
164
+ ```
165
+
166
+ While creating `Registers`, you specify how much space is needed for primitives and objects, using `RegisterOffset`. The library allocates the necessary arrays internally. After creation, when you set values, the library ensures that it has sufficient capacity, growing the arrays if necessary.
167
+
168
+ ## Setting and Getting Values
169
+
170
+ The `Registers` data type has methods to set and get values for each primitive type and for object references. For example, the `setInt` method sets an `Int` value in the byte array, while `setObject` sets an object reference in the object array:
171
+
172
+ ```scala
173
+ class Registers {
174
+ private var bytes: Array[Byte] = new Array[Byte](byteArrayLength)
175
+ private var objects: Array[AnyRef] = new Array[AnyRef](objectArrayLength)
176
+
177
+ // Methods to get/set primitive values in byte array (getInt/setInt, getBoolean/setBoolean, etc.)
178
+ def getInt(offset: RegisterOffset): Int = ???
179
+ def setInt(offset: RegisterOffset, value: Int): Unit = ???
180
+
181
+ def setBoolean(offset: RegisterOffset, value: Boolean): Unit = ???
182
+ def getBoolean(offset: RegisterOffset): Boolean = ???
183
+
184
+ // Two methods to get/set object references in object array
185
+ def getObject(offset: RegisterOffset): AnyRef = ???
186
+ def setObject(offset: RegisterOffset, value: AnyRef): Unit = ???
187
+ }
188
+ ```
189
+
190
+ Here are all the available methods:
191
+ - `setBoolean` / `getBoolean`
192
+ - `setByte` / `getByte`
193
+ - `setShort` / `getShort`
194
+ - `setInt` / `getInt`
195
+ - `setLong` / `getLong`
196
+ - `setFloat` / `getFloat`
197
+ - `setDouble` / `getDouble`
198
+ - `setChar` / `getChar`
199
+ - `setObject` / `getObject`
200
+
201
+ Getters take a single `offset` parameter, while setters take both an `offset` and a `value`:
202
+
203
+ 1. `offset`: The `RegisterOffset` indicating where the specific field starts in the `Registers`. It can point to where an inner record starts in a nested structure, to each record's starting position when several records share the same registers, or to the anchor position of a variant type's cases.
204
+ 2. `value`: The actual value to set. For the primitives, this value is a typed primitive (e.g. `Int`, `Double`); for objects, this is an `AnyRef`.
205
+
206
+ ## Example: Encoding/Decoding a Record Data Type
207
+
208
+ Encoding data instances into registers involves mapping each field of a data type to its corresponding position in the two register arrays (byte array for primitives and object array for references). For example, assume we have a `Person` data type as below:
209
+
210
+ ```scala mdoc:silent
211
+ case class Person(
212
+ name: String,
213
+ email: String,
214
+ age: Int,
215
+ height: Double,
216
+ weight: Double
217
+ )
218
+ ```
219
+
220
+ We can encode it with registers, as follows:
221
+
222
+ ```scala mdoc:silent
223
+ import zio.blocks.schema.binding._
224
+ import zio.blocks.schema.binding.RegisterOffset._
225
+
226
+ // Person("John", "john@example.com", 42, 180.0, 67.0)
227
+ val registers = Registers(RegisterOffset(objects = 2, ints = 1, doubles = 2))
228
+
229
+ registers.setObject(
230
+ RegisterOffset.Zero, // Object index: 0
231
+ "John"
232
+ )
233
+
234
+ registers.setObject(
235
+ RegisterOffset(objects = 1), // Object index: 1
236
+ "john@example.com"
237
+ )
238
+
239
+ registers.setInt(
240
+ RegisterOffset(objects = 2), // Byte index: 0
241
+ 42
242
+ )
243
+
244
+ registers.setDouble(
245
+ RegisterOffset(objects = 2, ints = 1), // Byte index: (1 * 4) = 4
246
+ 180.0
247
+ )
248
+
249
+ registers.setDouble(
250
+ RegisterOffset(objects = 2, ints = 1, doubles = 1), // Byte index: (1 * 4) + (1 * 8) = 12
251
+ 67.0
252
+ )
253
+ ```
254
+
255
+ Conversely, to decode the `Person` data type from registers, you would read the values back from their respective positions:
256
+
257
+ ```scala mdoc:silent
258
+ import zio.blocks.schema.binding._
259
+ import zio.blocks.schema.binding.RegisterOffset._
260
+ // Decode Person from registers
261
+ val name = registers.getObject(
262
+ RegisterOffset.Zero // Object index: 0
263
+ ).asInstanceOf[String]
264
+
265
+ val email = registers.getObject(
266
+ RegisterOffset(objects = 1) // Object index: 1
267
+ ).asInstanceOf[String]
268
+
269
+ val age = registers.getInt(
270
+ RegisterOffset(objects = 2) // Byte index: 0
271
+ )
272
+
273
+ val height = registers.getDouble(
274
+ RegisterOffset(objects = 2, ints = 1) // Byte index: (1 * 4) = 4
275
+ )
276
+
277
+ val weight = registers.getDouble(
278
+ RegisterOffset(objects = 2, ints = 1, doubles = 1) // Byte index: (1 * 4) + (1 * 8) = 12
279
+ )
280
+
281
+ val person = Person(name, email, age, height, weight)
282
+ ```