pgoutput-parser 0.1.1 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 58a2b938bf7d260f461f769757d66f2a0ef1a8127b85ba415091fabfde9aee19
4
- data.tar.gz: efacd6a3e298aa165066170a4a4c826a35a48d75f10d2a8a281ba90e709ac1e6
3
+ metadata.gz: 5822df660843e811e03208f6004f7700821953d9c602d70dd3d059f7fb9b82f7
4
+ data.tar.gz: 0231a9f0d3fcc295976b534bfa7350e08f0156c96a1daebb5528e095df2a63db
5
5
  SHA512:
6
- metadata.gz: 83ecef417fc9b40a21c4c4440dfc5be4d0de4a03cb7c6b4439efb17d176888bf0080d803d61d1ad334cc440ae661287073f242b976aac3fa85991115fe558fcf
7
- data.tar.gz: 1e31f54d1c0ea212aeb1e78a01f6c6a8dba6a982717d8bffb75ec87d61e32f89001f4e828c7ad2be643bcabcfb347146e66170750eedce66560b5a612c227fc7
6
+ metadata.gz: 6221a4d4a884f2877fdb9fe0086ce37b955137f204de2c207892de697be28a4d67f4290e12b027531e12ee9ae4277233124989d6d0073a4d9a7584fc587eac78
7
+ data.tar.gz: c868e1446cb07ee253f0ece9a5f8fc104a9ca5bf9e3181e169a12d4affcd03e58a1ca240bf86107a35b388b007b0911985ebba214a6d667b2a886af5f8c90995
data/CHANGELOG.md CHANGED
@@ -7,9 +7,41 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
 
8
8
  ## [Unreleased]
9
9
 
10
+ ---
11
+
12
+ ## [0.2.0] - 2026-06-06
13
+
10
14
  ### Added
11
15
 
12
- * Placeholder for future development.
16
+ * Added `Pgoutput::TupleArityError` for relation/tuple column-count mismatches.
17
+ * Added a dependency-free parser throughput benchmark for `BinaryParser` and
18
+ cached `RelationTracker` DML processing.
19
+ * Added parser support for non-streaming Message (`M`), Origin (`O`), Type (`Y`),
20
+ and Truncate (`T`) messages.
21
+ * Added benchmark tuning controls for iterations, warmup, Ractor count, and
22
+ scenario selection.
23
+ * Added benchmark relation-cache tuning for comparing the default Hash cache
24
+ with optional `Ratomic::Map`.
25
+ * Added generated documentation pages for the glossary and detailed
26
+ `RelationTracker` usage.
27
+ * Added optional `Ratomic::Map` relation-cache coverage and benchmark smoke
28
+ paths.
29
+ * Added grouped unit, integration, and behavior test layout aligned with the CDC
30
+ component test shape.
31
+ * Added explicit BinaryParser coverage for unterminated C strings and negative
32
+ byte lengths.
33
+
34
+ ### Changed
35
+
36
+ * `Pgoutput::RelationTracker` now rejects DML tuples whose value count differs from
37
+ the cached `Relation` column count instead of silently assigning incomplete
38
+ column metadata.
39
+ * Raised the minimum supported Ruby version to Ruby 4.0 to align with the CDC
40
+ ecosystem's Ractor-first parser/runtime direction.
41
+ * `Pgoutput::RelationTracker` now accepts an injectable `relation_cache:` object
42
+ compatible with Hash-style `#[]=` and `#fetch`.
43
+ * GitHub Actions now validates RBS signatures and runs a small benchmark smoke
44
+ test against Hash and Ratomic relation-cache paths.
13
45
 
14
46
  ---
15
47
 
@@ -112,6 +144,6 @@ A future companion project (`pgoutput-decoder`) may provide PostgreSQL type deco
112
144
 
113
145
  ---
114
146
 
115
- [Unreleased]: https://github.com/your-github-username/pgoutput-parser/compare/v0.1.0...HEAD
116
- [0.1.0]: https://github.com/your-github-username/pgoutput-parser/releases/tag/v0.1.0
117
-
147
+ [Unreleased]: https://github.com/kanutocd/pgoutput-parser/compare/v0.2.0...HEAD
148
+ [0.2.0]: https://github.com/kanutocd/pgoutput-parser/compare/v0.1.0...v0.2.0
149
+ [0.1.0]: https://github.com/kanutocd/pgoutput-parser/releases/tag/v0.1.0
data/README.md CHANGED
@@ -10,7 +10,7 @@ It intentionally does **not** convert PostgreSQL values into application-specifi
10
10
 
11
11
  ## Requirements
12
12
 
13
- - Ruby 3.4+
13
+ - Ruby 4+
14
14
  - PostgreSQL 10+
15
15
 
16
16
  ---
@@ -18,7 +18,7 @@ It intentionally does **not** convert PostgreSQL values into application-specifi
18
18
  ## Features
19
19
 
20
20
  - Pure Ruby implementation
21
- - Ruby 3.4+
21
+ - Ruby 4+
22
22
  - Ractor-safe parsed messages
23
23
  - Immutable protocol message objects
24
24
  - PostgreSQL logical replication protocol support
@@ -28,6 +28,9 @@ It intentionally does **not** convert PostgreSQL values into application-specifi
28
28
  - YARD documentation included
29
29
  - No runtime dependencies
30
30
 
31
+ The generated documentation also includes a project glossary:
32
+ [docs/glossary.md](docs/glossary.md).
33
+
31
34
  ---
32
35
 
33
36
  ## Why Another pgoutput Library?
@@ -46,13 +49,17 @@ This keeps the parser small, predictable, dependency-free, and faithful to Postg
46
49
 
47
50
  ## Supported MVP Scope
48
51
 
49
- Supports the core pgoutput row-change replication messages:
52
+ Supports the core non-streaming pgoutput logical replication messages:
50
53
 
51
54
  - Begin (`B`)
55
+ - Message (`M`)
56
+ - Origin (`O`)
52
57
  - Relation (`R`)
58
+ - Type (`Y`)
53
59
  - Insert (`I`)
54
60
  - Update (`U`)
55
61
  - Delete (`D`)
62
+ - Truncate (`T`)
56
63
  - Commit (`C`)
57
64
 
58
65
  The currently supported message formats are stable across PostgreSQL 10 through PostgreSQL 18.
@@ -70,10 +77,6 @@ TupleData supports all base column markers:
70
77
 
71
78
  Future releases may add support for:
72
79
 
73
- - Message (`M`)
74
- - Truncate (`T`)
75
- - Origin (`O`)
76
- - Type (`Y`)
77
80
  - Stream Start (`S`)
78
81
  - Stream Stop (`E`)
79
82
  - Stream Commit (`c`)
@@ -275,6 +278,13 @@ delete.old_tuple
275
278
 
276
279
  `RelationTracker` keeps a local relation cache so tuple values can be associated with PostgreSQL column OIDs defined by preceding Relation (`R`) messages.
277
280
 
281
+ The tracker accepts an optional `relation_cache:` argument. The default is a
282
+ plain Hash, but callers can inject `Ratomic::Map` for a Ractor-safe cache in
283
+ experimental or parallel setups.
284
+
285
+ For a deeper guide, including stream-order behavior, tuple arity validation, and
286
+ `Ratomic::Map` usage, see [docs/relation_tracker.md](docs/relation_tracker.md).
287
+
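As an illustration of the Hash-style contract described above (`#[]=` and `#fetch`), here is a minimal sketch of a custom cache object. The class name `LockedRelationCache` is hypothetical and not part of the gem; it only shows the duck type and makes no claim about Ractor shareability.

```ruby
# Hypothetical cache wrapper; the class name is illustrative and not part of the gem.
class LockedRelationCache
  def initialize
    @relations = {}
    @lock = Mutex.new
  end

  # Store relation metadata under its relation ID.
  def []=(relation_id, relation)
    @lock.synchronize { @relations[relation_id] = relation }
  end

  # Look up relation metadata. Like Hash#fetch, this raises KeyError when the
  # key is absent unless a fallback block is given.
  def fetch(relation_id, &fallback)
    @lock.synchronize { @relations.fetch(relation_id, &fallback) }
  end
end

cache = LockedRelationCache.new
cache[16_384] = { schema: "public", table: "users" }
cache.fetch(16_384)
```

An object of this shape would presumably be handed to the tracker through the `relation_cache:` argument; check docs/relation_tracker.md for the exact contract before relying on it.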
278
288
  No type conversion is performed.
279
289
 
280
290
  Only protocol metadata is attached.
@@ -290,6 +300,10 @@ message.new_tuple.map(&:oid)
290
300
 
291
301
  The relation tracker itself is stateful and maintains relation metadata encountered in the replication stream.
292
302
 
303
+ If a DML tuple's value count does not match the cached relation column count, `RelationTracker` raises
304
+ `Pgoutput::TupleArityError`. This keeps malformed payloads or mismatched stream state from being silently
305
+ annotated with incomplete column metadata.
306
+
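The arity check described above can be sketched in a few lines. This is not the gem's implementation: `SketchTupleArityError`, `SketchColumn`, and `annotate_tuple` are hypothetical stand-ins that only illustrate rejecting a tuple whose value count differs from the cached column count.

```ruby
# Hypothetical stand-ins for illustration; not the gem's actual classes.
class SketchTupleArityError < StandardError; end

SketchColumn = Struct.new(:name, :type_oid)

# Pair each raw tuple value with its column's type OID, refusing to annotate
# when the counts disagree.
def annotate_tuple(columns, raw_values)
  unless columns.length == raw_values.length
    raise SketchTupleArityError,
          "relation has #{columns.length} columns, tuple has #{raw_values.length} values"
  end

  columns.zip(raw_values).map do |column, raw|
    { name: column.name, oid: column.type_oid, raw: raw }
  end
end

columns = [SketchColumn.new("id", 23), SketchColumn.new("email", 25)]
annotate_tuple(columns, ["7", "a@example.com"])
```

Raising early keeps a short tuple from being zipped against the column list and padded with `nil` values, which is the silent failure mode the changelog describes.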
293
307
  ---
294
308
 
295
309
  ## Ractor Safety
@@ -392,12 +406,16 @@ The tracker maintains relation metadata discovered during the stream and therefo
392
406
 
393
407
  ```ruby
394
408
  Pgoutput::Messages::Begin
409
+ Pgoutput::Messages::Message
410
+ Pgoutput::Messages::Origin
395
411
  Pgoutput::Messages::Relation
412
+ Pgoutput::Messages::Type
396
413
  Pgoutput::Messages::Column
397
414
  Pgoutput::Messages::TupleValue
398
415
  Pgoutput::Messages::Insert
399
416
  Pgoutput::Messages::Update
400
417
  Pgoutput::Messages::Delete
418
+ Pgoutput::Messages::Truncate
401
419
  Pgoutput::Messages::Commit
402
420
  ```
403
421
 
@@ -435,6 +453,50 @@ COVERAGE=true bundle exec rake test
435
453
 
436
454
  ---
437
455
 
456
+ ## Benchmarking
457
+
458
+ Run the parser throughput benchmark:
459
+
460
+ ```bash
461
+ ruby benchmark/parser_throughput.rb
462
+ ```
463
+
464
+ The benchmark reports single-process parser throughput, relation-tracker throughput, and Ractor-parallel throughput. It is intended to show both the single-thread baseline and the Ruby 4 Ractor path this parser is designed to support. Relation-tracker scenarios can also compare the default Hash relation cache with an optional `Ratomic::Map` cache.
465
+
466
+ Tune the run with environment variables:
467
+
468
+ | Variable | Default | Description |
469
+ | -------- | ------- | ----------- |
470
+ | `PGOUTPUT_BENCH_ITERATIONS` | `100000` | Iterations per selected scenario. |
471
+ | `PGOUTPUT_BENCH_WARMUP` | `1000` | Warmup iterations before timing. |
472
+ | `PGOUTPUT_BENCH_RACTORS` | `2` or CPU count, whichever is lower | Number of Ractor workers for Ractor scenarios. |
473
+ | `PGOUTPUT_BENCH_SCENARIOS` | `all` | Comma-separated scenarios: `binary`, `tracker_dml`, `tracker_mixed`, `ractor_binary`, `ractor_tracker`, or `all`. |
474
+ | `PGOUTPUT_BENCH_RELATION_CACHE` | `hash` | Comma-separated relation-cache backends for tracker scenarios: `hash`, `ratomic`, or `all`. |
475
+
476
+ Examples:
477
+
478
+ ```bash
479
+ PGOUTPUT_BENCH_ITERATIONS=10000 ruby benchmark/parser_throughput.rb
480
+ PGOUTPUT_BENCH_SCENARIOS=binary,tracker_mixed ruby benchmark/parser_throughput.rb
481
+ PGOUTPUT_BENCH_RACTORS=4 PGOUTPUT_BENCH_SCENARIOS=ractor_binary,ractor_tracker ruby benchmark/parser_throughput.rb
482
+ PGOUTPUT_BENCH_RELATION_CACHE=all PGOUTPUT_BENCH_SCENARIOS=tracker_mixed,ractor_tracker ruby benchmark/parser_throughput.rb
483
+ ```
484
+
485
+ Sample Ruby 4 output:
486
+
487
+ ```text
488
+ pgoutput-parser throughput
489
+ iterations=1000 warmup=10 ractors=2 scenarios=tracker_mixed,ractor_tracker relation_cache=hash,ratomic ruby=4.0.5
490
+ RelationTracker hash 7000 messages in 0.163s 42891 msg/s
491
+ RelationTracker ratomic 7000 messages in 0.131s 53579 msg/s
492
+ Ractor RelationTracker hash 14000 messages in 0.197s 71097 msg/s
493
+ Ractor RelationTracker ratomic 14000 messages in 0.146s 96190 msg/s
494
+ ```
495
+
496
+ Interpret the Ractor rows as aggregate throughput across workers. They are not a replacement for the single-process rows; they demonstrate the parser's shareable-message design under parallel execution.
497
+
498
+ ---
499
+
438
500
  ## Development
439
501
 
440
502
  Generate YARD documentation:
data/docs/glossary.md ADDED
@@ -0,0 +1,145 @@
1
+ # Glossary
2
+
3
+ This glossary defines terms as they are used by `pgoutput-parser`. It focuses on
4
+ the protocol parsing layer and avoids application-level decoding concepts that
5
+ belong to higher CDC components.
6
+
7
+ ## Binary Value
8
+
9
+ A tuple value sent by PostgreSQL with the `b` TupleData marker. The parser
10
+ preserves the bytes exactly as received and does not interpret the PostgreSQL
11
+ binary format.
12
+
13
+ ## BinaryParser
14
+
15
+ The stateless entry point for parsing one pgoutput message payload. It decodes
16
+ the wire-format tag and fields into an immutable `Pgoutput::Messages` object.
17
+
18
+ ## Column Flag
19
+
20
+ The per-column flag byte in a Relation (`R`) message. PostgreSQL uses flag `1`
21
+ to identify replica identity key columns.
22
+
23
+ ## Commit LSN
24
+
25
+ The PostgreSQL log sequence number associated with a transaction commit. The
26
+ parser exposes it as protocol metadata and does not use it for ordering logic.
27
+
28
+ ## CopyData Payload
29
+
30
+ The PostgreSQL replication protocol frame body that contains one pgoutput
31
+ message. This gem expects callers to provide that payload; it does not manage
32
+ the PostgreSQL connection or replication stream.
33
+
34
+ ## DML Message
35
+
36
+ A data manipulation message emitted by pgoutput for row or table changes. In
37
+ this parser, DML messages are Insert (`I`), Update (`U`), Delete (`D`), and
38
+ Truncate (`T`).
39
+
40
+ ## Immutable Message
41
+
42
+ A parsed protocol object that is deeply frozen and therefore shareable across Ractors. Parsed
43
+ messages can be passed across Ractors without sharing mutable parser state.
44
+
45
+ ## LSN
46
+
47
+ Log sequence number. PostgreSQL uses LSN values to identify positions in the
48
+ write-ahead log. This parser keeps LSN values as integers and does not convert
49
+ them into PostgreSQL's textual `X/Y` notation.
50
+
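Callers that need the textual form can convert the integer themselves. The helpers below are a sketch with hypothetical names (`format_lsn`, `parse_lsn`); they split the 64-bit value into its high and low 32-bit halves and print each in hexadecimal.

```ruby
# Hypothetical helpers; the parser itself leaves LSNs as integers.

# Render a 64-bit LSN as "high/low" in uppercase hexadecimal.
def format_lsn(lsn)
  format("%X/%X", lsn >> 32, lsn & 0xFFFF_FFFF)
end

# Parse the textual form back into a 64-bit integer.
def parse_lsn(text)
  high, low = text.split("/", 2).map { |part| Integer(part, 16) }
  (high << 32) | low
end

format_lsn(0x16_B374_D848) # => "16/B374D848"
parse_lsn("16/B374D848")   # => 97500059720
```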
51
+ ## Message Tag
52
+
53
+ The first byte of a pgoutput payload. It identifies the message type, such as
54
+ `R` for Relation, `I` for Insert, or `C` for Commit.
55
+
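A tag lookup for the non-streaming messages listed in this release might look like the sketch below. The constant and method names are hypothetical, and the gem's own dispatch may be structured differently.

```ruby
# Hypothetical lookup table; tags match the non-streaming messages in this release.
SKETCH_MESSAGE_TAGS = {
  "B" => :begin, "M" => :message, "O" => :origin, "R" => :relation, "Y" => :type,
  "I" => :insert, "U" => :update, "D" => :delete, "T" => :truncate, "C" => :commit
}.freeze

# Read the first byte of a payload and map it to a message kind.
def message_kind(payload)
  tag = payload.byteslice(0, 1)
  SKETCH_MESSAGE_TAGS.fetch(tag) do
    raise ArgumentError, "unsupported message tag: #{tag.inspect}"
  end
end

message_kind("I\x00\x00\x00\x2A".b) # => :insert
```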
56
+ ## OID
57
+
58
+ Object identifier. In this gem, OIDs are most commonly relation IDs and
59
+ PostgreSQL type IDs exposed by Relation (`R`) and Type (`Y`) messages.
60
+
61
+ ## pgoutput
62
+
63
+ PostgreSQL's built-in logical replication output plugin. It serializes
64
+ transaction metadata, relation metadata, and row changes into a binary protocol.
65
+
66
+ ## Relation
67
+
68
+ PostgreSQL table metadata sent by a Relation (`R`) message. It includes the
69
+ relation ID, schema name, table name, replica identity setting, and columns.
70
+
71
+ ## RelationTracker
72
+
73
+ The stream-order parser wrapper that remembers Relation (`R`) messages and uses
74
+ them to annotate later DML tuple values with PostgreSQL type OIDs.
75
+
76
+ ## Replica Identity
77
+
78
+ PostgreSQL metadata that determines what old-row data is available for Update
79
+ and Delete messages. The parser exposes the replica identity byte from Relation
80
+ messages and preserves old-key or old-full tuples when PostgreSQL sends them.
81
+
82
+ ## Ractor-Safe
83
+
84
+ Safe to pass between Ruby Ractors. Parsed messages are Ractor-safe; parser and
85
+ tracker instances remain mutable and should be scoped to one owner unless the
86
+ caller supplies an explicitly Ractor-safe relation cache.
87
+
88
+ ## Ratomic
89
+
90
+ An optional Ruby library that provides Ractor-oriented concurrent data
91
+ structures. `pgoutput-parser` does not require Ratomic at runtime, but benchmark
92
+ and development code can use it to evaluate Ractor-safe relation cache behavior.
93
+
94
+ ## Ratomic::Map
95
+
96
+ A Ractor-safe map implementation from Ratomic. `RelationTracker` can use
97
+ `Ratomic::Map` as its `relation_cache:` when callers need relation metadata in a
98
+ cache that can be shared across Ractor-oriented execution designs.
99
+
100
+ ## Raw Tuple Value
101
+
102
+ The uninterpreted bytes for a tuple column value. Text values and binary values
103
+ are both kept as raw strings; NULL and unchanged TOAST markers have no raw
104
+ payload.
105
+
106
+ ## Text Value
107
+
108
+ A tuple value sent by PostgreSQL with the `t` TupleData marker. The parser
109
+ preserves the text bytes and leaves type conversion to a decoder layer.
110
+
111
+ ## TOAST
112
+
113
+ PostgreSQL's storage mechanism for large column values. In pgoutput TupleData,
114
+ the `u` marker means an unchanged TOAST value was not resent.
115
+
116
+ ## Truncate
117
+
118
+ A pgoutput DML message that reports table truncation. It contains relation IDs
119
+ and option bits such as CASCADE and RESTART IDENTITY.
120
+
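In the PostgreSQL protocol the Truncate option byte uses bit value 1 for CASCADE and 2 for RESTART IDENTITY. The sketch below decodes that byte; the constant and method names are hypothetical and do not reflect the gem's API.

```ruby
# Bit values from the PostgreSQL logical replication protocol; names are illustrative.
SKETCH_TRUNCATE_CASCADE = 0b01
SKETCH_TRUNCATE_RESTART_IDENTITY = 0b10

# Decode the Truncate option byte into named booleans.
def truncate_options(bits)
  {
    cascade: (bits & SKETCH_TRUNCATE_CASCADE) != 0,
    restart_identity: (bits & SKETCH_TRUNCATE_RESTART_IDENTITY) != 0
  }
end

truncate_options(3) # => {cascade: true, restart_identity: true}
```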
121
+ ## Tuple Arity
122
+
123
+ The number of values in tuple data. `RelationTracker` validates DML tuple arity
124
+ against cached Relation column metadata before annotating type OIDs.
125
+
126
+ ## TupleData
127
+
128
+ The pgoutput structure that carries row values for Insert, Update, and Delete
129
+ messages. Each value is marked as NULL, unchanged TOAST, text, or binary.
130
+
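To make the marker layout concrete, here is a minimal sketch of reading a TupleData block: a big-endian Int16 column count, then one marker byte per column, with `t` and `b` values preceded by an Int32 byte length. The method name and the returned hashes are hypothetical; the gem returns its own message objects.

```ruby
require "stringio"

# Hypothetical reader for one TupleData block; not the gem's implementation.
def parse_tuple_data(bytes)
  io = StringIO.new(bytes.b)
  count = io.read(2).unpack1("n") # Int16 column count, big-endian

  Array.new(count) do
    marker = io.read(1)
    case marker
    when "n" then { kind: :null, raw: nil }
    when "u" then { kind: :unchanged_toast, raw: nil }
    when "t", "b"
      length = io.read(4).unpack1("l>") # signed Int32 byte length
      raise ArgumentError, "negative value length: #{length}" if length.negative?

      { kind: marker == "t" ? :text : :binary, raw: io.read(length) }
    else
      raise ArgumentError, "unknown TupleData marker: #{marker.inspect}"
    end
  end
end

payload = [3].pack("n") + "n" + "t" + [1].pack("N") + "7" + "u"
parse_tuple_data(payload).map { |value| value[:kind] }
# => [:null, :text, :unchanged_toast]
```

Reading the length as a signed integer lets the sketch reject negative lengths explicitly, similar in spirit to the negative byte length coverage mentioned in the changelog.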
131
+ ## Type Decoding
132
+
133
+ Conversion from PostgreSQL raw tuple bytes into application-level Ruby values.
134
+ This gem intentionally does not perform type decoding; that responsibility
135
+ belongs to a higher-level decoder component.
136
+
137
+ ## Type Modifier
138
+
139
+ PostgreSQL column type metadata carried in Relation messages. For example,
140
+ typmods can encode precision or length constraints for some PostgreSQL types.
141
+
142
+ ## WAL
143
+
144
+ Write-ahead log. PostgreSQL logical replication streams changes derived from WAL
145
+ through output plugins such as pgoutput.
data/docs/index.md ADDED
@@ -0,0 +1,239 @@
1
+ # pgoutput-parser
2
+
3
+ A high-performance, Ractor-safe PostgreSQL `pgoutput` logical replication protocol parser written in pure Ruby.
4
+
5
+ `pgoutput-parser` parses PostgreSQL logical replication `CopyData` payloads into immutable protocol message objects.
6
+
7
+ It focuses exclusively on PostgreSQL's `pgoutput` wire format:
8
+
9
+ * Transaction boundaries
10
+ * Relation metadata
11
+ * DML message structure
12
+ * Tuple payload markers
13
+ * Raw tuple values
14
+
15
+ It intentionally does **not** decode PostgreSQL values into application-level Ruby objects.
16
+
17
+ That responsibility belongs to higher layers such as:
18
+
19
+ ```text
20
+ pgoutput-decoder
21
+ ```
22
+
23
+ ---
24
+
25
+ # Architecture
26
+
27
+ ```text
28
+ PostgreSQL WAL
29
+ |
30
+ v
31
+ CopyData payload
32
+ |
33
+ v
34
+ pgoutput-parser
35
+ |
36
+ v
37
+ Immutable protocol messages
38
+ ```
39
+
40
+ The parser is the protocol layer of the CDC ecosystem.
41
+
42
+ ```text
43
+ PostgreSQL
44
+ |
45
+ v
46
+ pgoutput-parser
47
+ |
48
+ v
49
+ pgoutput-decoder
50
+ |
51
+ v
52
+ cdc-core
53
+ ```
54
+
55
+ ---
56
+
57
+ # Quick Start
58
+
59
+ ```ruby
60
+ require "pgoutput"
61
+
62
+ stream = Pgoutput::RelationTracker.new
63
+
64
+ stream.process(relation_payload)
65
+
66
+ insert = stream.process(insert_payload)
67
+
68
+ insert.relation_id
69
+ # => 42
70
+
71
+ insert.tuple.first.raw
72
+ # => "7"
73
+
74
+ insert.tuple.first.oid
75
+ # => 23
76
+ ```
77
+
78
+ ---
79
+
80
+ # Core Concepts
81
+
82
+ ## BinaryParser
83
+
84
+ Parses a single `pgoutput` payload.
85
+
86
+ ```ruby
87
+ message = Pgoutput::BinaryParser
88
+ .new(payload)
89
+ .parse
90
+ ```
91
+
92
+ Use this when stream state is not required.
93
+
94
+ ---
95
+
96
+ ## RelationTracker
97
+
98
+ Tracks PostgreSQL relation metadata across a replication stream.
99
+
100
+ ```ruby
101
+ stream = Pgoutput::RelationTracker.new
102
+
103
+ stream.process(relation_payload)
104
+
105
+ message = stream.process(insert_payload)
106
+ ```
107
+
108
+ This allows tuple values to be associated with PostgreSQL column OIDs.
109
+
110
+ If tuple data does not match the cached relation column count, `RelationTracker`
111
+ raises `Pgoutput::TupleArityError`.
112
+
113
+ ---
114
+
115
+ # Supported Messages
116
+
117
+ Current MVP support:
118
+
119
+ * Begin (`B`)
120
+ * Message (`M`)
121
+ * Origin (`O`)
122
+ * Relation (`R`)
123
+ * Type (`Y`)
124
+ * Insert (`I`)
125
+ * Update (`U`)
126
+ * Delete (`D`)
127
+ * Truncate (`T`)
128
+ * Commit (`C`)
129
+
130
+ Supported across PostgreSQL 10–18.
131
+
132
+ ---
133
+
134
+ # Tuple Values
135
+
136
+ Tuple values are preserved exactly as PostgreSQL sends them.
137
+
138
+ ```ruby
139
+ value.raw
140
+ # => "2026-05-31 12:34:56+00"
141
+ ```
142
+
143
+ No application-level decoding occurs.
144
+
145
+ ---
146
+
147
+ # Binary Values
148
+
149
+ Binary tuple values are preserved exactly as received.
150
+
151
+ ```ruby
152
+ value.raw
153
+ # => "\x00\x00\x00\x07".b
154
+ ```
155
+
156
+ The parser does not interpret binary payloads.
157
+
158
+ ---
159
+
160
+ # Ractor Safety
161
+
162
+ All parsed protocol messages are immutable and shareable.
163
+
164
+ ```ruby
165
+ message = stream.process(update_payload)
166
+
167
+ Ractor.shareable?(message)
168
+ # => true
169
+ ```
170
+
171
+ Passing messages across Ractors:
172
+
173
+ ```ruby
174
+ message = stream.process(update_payload)
175
+
176
+ result = Ractor.new(message) do |update|
177
+ update.new_tuple.map(&:raw)
178
+ end.value
179
+ ```
180
+
181
+ ---
182
+
183
+ # Non-Goals
184
+
185
+ `pgoutput-parser` intentionally does not:
186
+
187
+ * Open replication connections
188
+ * Manage replication slots
189
+ * Track WAL positions
190
+ * Reconnect to PostgreSQL
191
+ * Decode PostgreSQL values
192
+ * Build CDC pipelines
193
+ * Integrate with ActiveRecord
194
+
195
+ Its sole responsibility is protocol parsing.
196
+
197
+ ---
198
+
199
+ # Public API
200
+
201
+ See the generated API documentation for:
202
+
203
+ * `Pgoutput::BinaryParser`
204
+ * `Pgoutput::RelationTracker`
205
+ * `Pgoutput::Messages::*`
206
+
207
+ ---
208
+
209
+ # Development
210
+
211
+ Generate documentation:
212
+
213
+ ```bash
214
+ bundle exec yard doc
215
+ ```
216
+
217
+ Run tests:
218
+
219
+ ```bash
220
+ bundle exec rake test
221
+ ```
222
+
223
+ Run coverage:
224
+
225
+ ```bash
226
+ COVERAGE=true bundle exec rake test
227
+ ```
228
+
229
+ Run Steep:
230
+
231
+ ```bash
232
+ bundle exec steep check
233
+ ```
234
+
235
+ ---
236
+
237
+ # License
238
+
239
+ MIT