pgoutput-parser 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+   metadata.gz: af80f7a3e1fadcd7bac5878116c55b4c21c8b9b4ccb5583153c13703abe2211d
4
+   data.tar.gz: eb474803e09cde2cc8aaae6daefb8f214461781e93d469d8684698338b230d97
5
+ SHA512:
6
+   metadata.gz: cf622ea5b01013ed1005cae3958cc758126e42936e46c9e8f6af79c25daaa8d0db71a9fd51f698a8182e7870a5b208dcd4d83af4d6b6d52c4c6a230457fffa00
7
+   data.tar.gz: 9866a5d67ddd3463566e46cc81bff3ef433117254484bbddd962207aa2af9a1d2af5a409d578c571b9459dbcc6db37e7d3090c366f2a54e70e0da1e41b59a464
data/CHANGELOG.md ADDED
@@ -0,0 +1,117 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ### Added
11
+
12
+ * Placeholder for future development.
13
+
14
+ ---
15
+
16
+ ## [0.1.0] - 2026-05-31
17
+
18
+ ### Added
19
+
20
+ #### Protocol Support
21
+
22
+ * Added support for PostgreSQL `pgoutput` logical replication protocol parsing.
23
+ * Added support for Begin (`B`) messages.
24
+ * Added support for Relation (`R`) messages.
25
+ * Added support for Insert (`I`) messages.
26
+ * Added support for Update (`U`) messages.
27
+ * Added support for Delete (`D`) messages.
28
+ * Added support for Commit (`C`) messages.
29
+ * Added support for TupleData value markers:
30
+
31
+ * `n` (NULL)
32
+ * `u` (Unchanged TOAST)
33
+ * `t` (Text)
34
+ * `b` (Binary)
35
+
36
+ #### Message Models
37
+
38
+ * Added immutable protocol message classes:
39
+
40
+ * `Pgoutput::Messages::Begin`
41
+ * `Pgoutput::Messages::Relation`
42
+ * `Pgoutput::Messages::Column`
43
+ * `Pgoutput::Messages::TupleValue`
44
+ * `Pgoutput::Messages::Insert`
45
+ * `Pgoutput::Messages::Update`
46
+ * `Pgoutput::Messages::Delete`
47
+ * `Pgoutput::Messages::Commit`
48
+
49
+ #### Parsing Infrastructure
50
+
51
+ * Added `Pgoutput::BinaryParser`.
52
+ * Added offset-based binary parsing implementation.
53
+ * Added support for parsing PostgreSQL null-terminated strings.
54
+ * Added support for parsing PostgreSQL integer wire types.
55
+ * Added protocol validation and truncation detection.
56
+ * Added parser error hierarchy.
57
+
58
+ #### Relation Tracking
59
+
60
+ * Added `Pgoutput::RelationTracker`.
61
+ * Added relation metadata cache.
62
+ * Added relation ID to column OID mapping.
63
+ * Added tuple annotation with PostgreSQL type OIDs.
64
+ * Added validation for unknown relation references.
65
+
66
+ #### Concurrency
67
+
68
+ * Added Ractor-safe message generation.
69
+ * Added deep-shareable protocol message objects.
70
+ * Added immutable arrays and strings throughout parsed outputs.
71
+
72
+ #### Type System
73
+
74
+ * Added RBS type signatures.
75
+ * Added Steep compatibility.
76
+
77
+ #### Documentation
78
+
79
+ * Added YARD documentation coverage for public API.
80
+ * Added project README.
81
+ * Added architecture documentation.
82
+ * Added protocol examples.
83
+ * Added Ractor usage examples.
84
+
85
+ #### Testing
86
+
87
+ * Added Minitest test suite.
88
+ * Added protocol message unit tests.
89
+ * Added end-to-end stream integration tests.
90
+ * Added Ractor compatibility tests.
91
+ * Added binary payload builders for test fixtures.
92
+
93
+ #### Tooling
94
+
95
+ * Added Bundler project setup.
96
+ * Added GitHub Actions CI workflow.
97
+ * Added SimpleCov coverage support.
98
+ * Added Rake tasks for development workflows.
99
+
100
+ ### Notes
101
+
102
+ This release intentionally focuses on protocol parsing only.
103
+
104
+ The library does not:
105
+
106
+ * Manage replication connections
107
+ * Manage replication slots
108
+ * Track WAL positions
109
+ * Decode PostgreSQL values into Ruby objects
110
+
111
+ A future companion project (`pgoutput-decoder`) may provide PostgreSQL type decoding and higher-level row representations.
112
+
113
+ ---
114
+
115
+ [Unreleased]: https://github.com/your-github-username/pgoutput-parser/compare/v0.1.0...HEAD
116
+ [0.1.0]: https://github.com/your-github-username/pgoutput-parser/releases/tag/v0.1.0
117
+
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2026 Kenneth C. Demanawa
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,471 @@
1
+ # pgoutput-parser
2
+
3
+ A high-performance, Ractor-safe PostgreSQL `pgoutput` logical replication protocol parser written in pure Ruby.
4
+
5
+ `pgoutput-parser` parses PostgreSQL logical replication `CopyData` payloads into immutable protocol message objects. It focuses on the `pgoutput` wire format: transaction boundaries, relation metadata, DML message structure, tuple payload markers, and raw tuple bytes.
6
+
7
+ It intentionally does **not** convert PostgreSQL values into application-specific Ruby objects. That belongs to a higher-level decoder layer, such as a future `pgoutput-decoder` gem.
8
+
9
+ ---
10
+
11
+ ## Requirements
12
+
13
+ - Ruby 3.4+
14
+ - PostgreSQL 10+
15
+
16
+ ---
17
+
18
+ ## Features
19
+
20
+ - Pure Ruby implementation
21
+ - Ruby 3.4+
22
+ - Ractor-safe parsed messages
23
+ - Immutable protocol message objects
24
+ - PostgreSQL logical replication protocol support
25
+ - Relation metadata tracking
26
+ - Binary-safe tuple parsing
27
+ - RBS type signatures included
28
+ - YARD documentation included
29
+ - No runtime dependencies
30
+
31
+ ---
32
+
33
+ ## Why Another pgoutput Library?
34
+
35
+ This gem focuses exclusively on protocol parsing.
36
+
37
+ It intentionally separates:
38
+
39
+ - Protocol parsing (`pgoutput-parser`)
40
+ - Type decoding (`pgoutput-decoder`)
41
+ - Replication transport/client management
42
+
43
+ This keeps the parser small, predictable, dependency-free, and faithful to PostgreSQL's wire format.
44
+
45
+ ---
46
+
47
+ ## Supported MVP Scope
48
+
49
+ Supports the core pgoutput row-change replication messages:
50
+
51
+ - Begin (`B`)
52
+ - Relation (`R`)
53
+ - Insert (`I`)
54
+ - Update (`U`)
55
+ - Delete (`D`)
56
+ - Commit (`C`)
57
+
58
+ The currently supported message formats are stable across PostgreSQL 10 through PostgreSQL 18.
59
+
60
+ TupleData supports all base column markers:
61
+
62
+ | Tuple Value Tag | Meaning |
63
+ | --------------- | -------------------------- |
64
+ | `n` | NULL |
65
+ | `u` | Unchanged TOAST value |
66
+ | `t` | Text-formatted raw value |
67
+ | `b` | Binary-formatted raw value |
68
+
69
+ ### Planned Support
70
+
71
+ Future releases may add support for:
72
+
73
+ - Message (`M`)
74
+ - Truncate (`T`)
75
+ - Origin (`O`)
76
+ - Type (`Y`)
77
+ - Stream Start (`S`)
78
+ - Stream Stop (`E`)
79
+ - Stream Commit (`c`)
80
+ - Stream Abort (`A`)
81
+ - Two-Phase Commit messages
82
+
83
+ ---
84
+
85
+ ## What This Gem Does
86
+
87
+ ```text
88
+ PostgreSQL CopyData payload
89
+
90
+
91
+ pgoutput-parser
92
+
93
+
94
+ Immutable protocol messages
95
+ ```
96
+
97
+ The parser understands:
98
+
99
+ - Message tags and binary field sizes
100
+ - Transaction begin metadata
101
+ - Transaction commit metadata
102
+ - Relation metadata
103
+ - Column flags
104
+ - Column names
105
+ - PostgreSQL type OIDs
106
+ - PostgreSQL type modifiers
107
+ - Insert tuples
108
+ - Update old-key tuples
109
+ - Update old full tuples
110
+ - Update new tuples
111
+ - Delete old-key tuples
112
+ - Delete old full tuples
113
+ - Tuple value markers (`n`, `u`, `t`, `b`)
114
+
115
+ ---
116
+
117
+ ## What This Gem Does Not Do
118
+
119
+ The parser does not perform application-level type decoding.
120
+
121
+ It does not convert:
122
+
123
+ - UUID
124
+ - JSONB
125
+ - Timestamp
126
+ - Numeric
127
+ - Array
128
+ - Range
129
+ - PostGIS
130
+ - Custom PostgreSQL types
131
+
132
+ Example:
133
+
134
+ ```ruby
135
+ value.raw
136
+ # => "2026-05-31 12:34:56+00"
137
+ ```
138
+
139
+ The raw value is preserved exactly as received.
140
+
141
+ A higher-level decoder may later interpret it.
142
+
143
+ ---
144
+
145
+ ## Non-goals
146
+
147
+ This project intentionally does not:
148
+
149
+ - Manage replication slots
150
+ - Open replication connections
151
+ - Maintain WAL positions
152
+ - Reconnect to PostgreSQL
153
+ - Decode PostgreSQL types
154
+ - Integrate with ActiveRecord
155
+ - Publish events
156
+ - Build CDC pipelines
157
+
158
+ Its sole responsibility is parsing pgoutput protocol messages.
159
+
160
+ ---
161
+
162
+ ## Installation
163
+
164
+ Add this line to your Gemfile:
165
+
166
+ ```ruby
167
+ gem "pgoutput-parser"
168
+ ```
169
+
170
+ Then run:
171
+
172
+ ```bash
173
+ bundle install
174
+ ```
175
+
176
+ Require the library:
177
+
178
+ ```ruby
179
+ require "pgoutput"
180
+ ```
181
+
182
+ ---
183
+
184
+ ## Quick Start
185
+
186
+ ```ruby
187
+ require "pgoutput"
188
+
189
+ stream = Pgoutput::RelationTracker.new
190
+
191
+ stream.process(relation_payload)
192
+
193
+ insert = stream.process(insert_payload)
194
+
195
+ insert.relation_id
196
+ # => 42
197
+
198
+ insert.tuple.first.raw
199
+ # => "7"
200
+
201
+ insert.tuple.first.oid
202
+ # => 23
203
+ ```
204
+
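The Quick Start assumes `relation_payload` and `insert_payload` already hold raw pgoutput bytes. For experimenting without a live replication connection, a payload can be assembled by hand. The sketch below builds a Relation (`R`) payload in the field order that `BinaryParser#parse_relation` reads; the helper name `build_relation_payload` and the sample table are illustrative assumptions, not part of the gem.

```ruby
# Hypothetical helper (not part of pgoutput-parser): assembles a Relation (`R`)
# payload field by field.
def build_relation_payload(relation_id:, schema:, table:, replica_identity:, columns:)
  cstring = ->(text) { "#{text}\0".b }            # null-terminated, binary-safe

  payload = ["R", relation_id].pack("a1N")        # tag + uint32 relation id
  payload << cstring.(schema) << cstring.(table)  # namespace and relation name
  payload << [replica_identity.ord, columns.size].pack("Cn") # uint8 + uint16 count
  columns.each do |flags, name, oid, type_modifier|
    # uint8 flags, cstring name, uint32 type OID, int32 type modifier
    payload << [flags].pack("C") << cstring.(name) << [oid, type_modifier].pack("Nl>")
  end
  payload
end

relation_payload = build_relation_payload(
  relation_id: 42, schema: "public", table: "users", replica_identity: "d",
  columns: [[1, "id", 23, -1]] # key column "id", int4 (OID 23), no type modifier
)
```

The resulting string should be usable wherever the examples in this README pass `relation_payload`.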
205
+ ---
206
+
207
+ ## Binary Tuple Values
208
+
209
+ When PostgreSQL publishes tuple values using binary format (`b`), the parser preserves the raw bytes exactly as received.
210
+
211
+ ```ruby
212
+ value.raw
213
+ # => "\x00\x00\x00\x07".b
214
+ ```
215
+
216
+ The parser does not interpret binary values.
217
+
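A downstream consumer that knows the column's type OID can unpack these bytes itself. As an illustration only, assuming the column is `int4` (OID 23), whose binary form is a 4-byte big-endian signed integer, the raw value above could be read like this:

```ruby
raw = "\x00\x00\x00\x07".b # bytes exactly as preserved by the parser

# "l>" reads a 32-bit signed big-endian integer.
decoded = raw.unpack1("l>")
# => 7
```

This step is outside the scope of `pgoutput-parser`; other types need their own unpacking rules.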
218
+ ---
219
+
220
+ ## Update Messages
221
+
222
+ PostgreSQL `Update` messages may contain:
223
+
224
+ - No old tuple
225
+ - An old key tuple (`K`)
226
+ - An old full tuple (`O`)
227
+
228
+ They always contain a new tuple (`N`).
229
+
230
+ ```ruby
231
+ update = stream.process(update_payload)
232
+
233
+ update.old_key_tuple
234
+ # => [Pgoutput::Messages::TupleValue, ...] or nil
235
+
236
+ update.old_tuple
237
+ # => [Pgoutput::Messages::TupleValue, ...] or nil
238
+
239
+ update.new_tuple
240
+ # => [Pgoutput::Messages::TupleValue, ...]
241
+ ```
242
+
243
+ ### Update Tuple Example
244
+
245
+ ```ruby
246
+ update = stream.process(update_payload)
247
+
248
+ update.old_key_tuple
249
+ update.old_tuple
250
+ update.new_tuple
251
+ ```
252
+
253
+ ---
254
+
255
+ ## Delete Messages
256
+
257
+ PostgreSQL `Delete` messages contain either:
258
+
259
+ - An old key tuple (`K`)
260
+ - An old full tuple (`O`)
261
+
262
+ ```ruby
263
+ delete = stream.process(delete_payload)
264
+
265
+ delete.old_key_tuple
266
+ # => [Pgoutput::Messages::TupleValue, ...] or nil
267
+
268
+ delete.old_tuple
269
+ # => [Pgoutput::Messages::TupleValue, ...] or nil
270
+ ```
271
+
272
+ ---
273
+
274
+ ## Relation Metadata Tracking
275
+
276
+ `RelationTracker` keeps a local relation cache so tuple values can be associated with PostgreSQL column OIDs defined by preceding Relation (`R`) messages.
277
+
278
+ No type conversion is performed.
279
+
280
+ Only protocol metadata is attached.
281
+
282
+ ```ruby
283
+ stream.process(relation_payload)
284
+
285
+ message = stream.process(update_payload)
286
+
287
+ message.new_tuple.map(&:oid)
288
+ # => [23, 25, 16]
289
+ ```
290
+
291
+ The relation tracker itself is stateful and maintains relation metadata encountered in the replication stream.
292
+
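The caching idea can be sketched without the gem. The structs below are hypothetical stand-ins for `Pgoutput::Messages::Column` and `Pgoutput::Messages::TupleValue`, and the hash plays the role of the tracker's relation cache; the sketch only illustrates the positional pairing of values and columns, it is not the library's code.

```ruby
# Hypothetical stand-ins for the gem's classes; only the caching idea is shown.
column_type = Struct.new(:name, :oid)
value_type  = Struct.new(:format, :raw, :oid)

relations = {} # relation id => column metadata from the latest Relation message

# A Relation message arrives first and populates the cache.
relations[42] = [column_type.new("id", 23), column_type.new("name", 25)]

# A later DML tuple for relation 42 arrives without type information.
tuple = [value_type.new(:text, "7", nil), value_type.new(:text, "Ada", nil)]

# An unseen relation id is an error, since the OIDs cannot be known.
columns = relations.fetch(42) { raise KeyError, "unknown relation id 42" }

# Pair each value with the column at the same position and copy its OID.
annotated = tuple.each_with_index.map do |value, index|
  value_type.new(value.format, value.raw, columns[index]&.oid)
end

oids = annotated.map(&:oid)
# => [23, 25]
```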
293
+ ---
294
+
295
+ ## Ractor Safety
296
+
297
+ ```ruby
298
+ message = stream.process(update_payload)
299
+
300
+ Ractor.shareable?(message)
301
+ # => true
302
+ ```
303
+
304
+ Passing parsed messages to a Ractor:
305
+
306
+ ```ruby
307
+ message = stream.process(update_payload)
308
+
309
+ result = Ractor.new(message) do |update|
310
+   update.new_tuple.map(&:raw)
311
+ end.take # Ruby 3.4 API; later Ruby releases replace Ractor#take with Ractor#value
312
+ ```
313
+
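What "shareable" means here can be checked with plain Ruby. The sketch below uses a hypothetical struct in place of a parsed message and shows the effect of `Ractor.make_shareable`, the call the parser applies to its results.

```ruby
# Hypothetical stand-in for a parsed message; the gem's own classes are not used.
message_type = Struct.new(:relation_id, :raw)

message = message_type.new(42, +"7") # unfrozen String inside an unfrozen Struct
before = Ractor.shareable?(message)  # => false

Ractor.make_shareable(message)       # deep-freezes the object graph in place
after = Ractor.shareable?(message)   # => true
raw_frozen = message.raw.frozen?     # => true
```

Because the whole object graph is frozen, a shareable object can be handed to another Ractor without being copied.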
314
+ ---
315
+
316
+ ## Architecture
317
+
318
+ ```text
319
+ PostgreSQL
320
+
321
+
322
+ CopyData payload
323
+
324
+
325
+ Pgoutput::BinaryParser
326
+
327
+
328
+ Parsed protocol message
329
+
330
+
331
+ Pgoutput::RelationTracker
332
+
333
+
334
+ Protocol message with relation metadata
335
+
336
+
337
+ Ractor-safe protocol message
338
+ ```
339
+
340
+ ---
341
+
342
+ ## Public API
343
+
344
+ ### Pgoutput::BinaryParser
345
+
346
+ Parses a single pgoutput payload without stream state.
347
+
348
+ ```ruby
349
+ message = Pgoutput::BinaryParser.new(payload).parse
350
+ ```
351
+
352
+ ### Pgoutput::RelationTracker
353
+
354
+ Parses messages in stream order and remembers relation metadata.
355
+
356
+ ```ruby
357
+ stream = Pgoutput::RelationTracker.new
358
+
359
+ stream.process(relation_payload)
360
+
361
+ message = stream.process(insert_payload)
362
+ ```
363
+
364
+ ### Optional Usage
365
+
366
+ `RelationTracker` is optional.
367
+
368
+ If relation metadata tracking is not required, payloads can be parsed directly:
369
+
370
+ ```ruby
371
+ message =
372
+   Pgoutput::BinaryParser
373
+     .new(payload)
374
+     .parse
375
+ ```
376
+
377
+ ---
378
+
379
+ ## RelationTracker Lifecycle
380
+
381
+ A `RelationTracker` should be created per logical replication stream.
382
+
383
+ ```ruby
384
+ stream = Pgoutput::RelationTracker.new
385
+ ```
386
+
387
+ The tracker maintains relation metadata discovered during the stream and therefore should not be reused across unrelated replication sessions.
388
+
389
+ ---
390
+
391
+ ## Message Classes
392
+
393
+ ```ruby
394
+ Pgoutput::Messages::Begin
395
+ Pgoutput::Messages::Relation
396
+ Pgoutput::Messages::Column
397
+ Pgoutput::Messages::TupleValue
398
+ Pgoutput::Messages::Insert
399
+ Pgoutput::Messages::Update
400
+ Pgoutput::Messages::Delete
401
+ Pgoutput::Messages::Commit
402
+ ```
403
+
404
+ ---
405
+
406
+ ## Type Signatures
407
+
408
+ RBS signatures are included:
409
+
410
+ ```text
411
+ sig/pgoutput.rbs
412
+ ```
413
+
414
+ Run Steep:
415
+
416
+ ```bash
417
+ bundle exec steep check
418
+ ```
419
+
420
+ ---
421
+
422
+ ## Testing
423
+
424
+ Run all tests:
425
+
426
+ ```bash
427
+ bundle exec rake test
428
+ ```
429
+
430
+ Run with coverage:
431
+
432
+ ```bash
433
+ COVERAGE=true bundle exec rake test
434
+ ```
435
+
436
+ ---
437
+
438
+ ## Development
439
+
440
+ Generate YARD documentation:
441
+
442
+ ```bash
443
+ bundle exec yard doc
444
+ ```
445
+
446
+ ---
447
+
448
+ ## Ecosystem Direction
449
+
450
+ This gem is the protocol layer.
451
+
452
+ ```text
453
+ pgoutput-parser
454
+
455
+
456
+ Protocol messages
457
+
458
+
459
+ pgoutput-decoder
460
+
461
+
462
+ Application objects
463
+ ```
464
+
465
+ `pgoutput-parser` should remain small, dependency-free, binary-safe, and faithful to PostgreSQL's wire format.
466
+
467
+ ---
468
+
469
+ ## License
470
+
471
+ MIT.
data/lib/pgoutput/binary_parser.rb ADDED
@@ -0,0 +1,171 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pgoutput
4
+ # Offset-based binary parser for one PostgreSQL pgoutput logical replication
5
+ # message payload.
6
+ #
7
+ # A parser instance is intentionally short-lived and mutable while reading a
8
+ # single payload. Its returned message object is deeply frozen/shareable and may
9
+ # cross Ractor boundaries safely.
10
+ #
11
+ # @api public
12
+ class BinaryParser
13
+ # @param payload [String] one pgoutput message payload from a CopyData frame.
14
+ # @return [void]
15
+ def initialize(payload)
16
+ @payload = payload.b
17
+ @offset = 0
18
+ end
19
+
20
+ # Parse one supported pgoutput message.
21
+ #
22
+ # Supported MVP tags are `B`, `R`, `I`, `U`, `D`, and `C`.
23
+ #
24
+ # @return [Pgoutput::Messages::Begin, Pgoutput::Messages::Relation,
25
+ # Pgoutput::Messages::Insert, Pgoutput::Messages::Update,
26
+ # Pgoutput::Messages::Delete, Pgoutput::Messages::Commit]
27
+ # @raise [UnsupportedMessageError] if the message tag is unsupported.
28
+ # @raise [TruncatedMessageError] if the payload is incomplete.
29
+ def parse
30
+ case read_byte_chr
31
+ when "B" then parse_begin
32
+ when "R" then parse_relation
33
+ when "I" then parse_insert
34
+ when "U" then parse_update
35
+ when "D" then parse_delete
36
+ when "C" then parse_commit
37
+ else
38
+ raise UnsupportedMessageError, "unsupported pgoutput message tag"
39
+ end
40
+ end
41
+
42
+ private
43
+
44
+ def parse_begin
45
+ share(Messages::Begin.new(read_uint64, read_uint64, read_uint32))
46
+ end
47
+
48
+ def parse_relation
49
+ relation_id = read_uint32
50
+ schema = read_cstring
51
+ table = read_cstring
52
+ replica_identity = read_uint8
53
+ column_count = read_uint16
54
+
55
+ columns = Array.new(column_count) do
56
+ Messages::Column.new(read_uint8, read_cstring, read_uint32, read_int32)
57
+ end.freeze
58
+
59
+ share(Messages::Relation.new(relation_id, schema, table, replica_identity, columns))
60
+ end
61
+
62
+ def parse_insert
63
+ relation_id = read_uint32
64
+ tuple_tag = read_byte_chr
65
+ raise UnsupportedMessageError, "expected insert tuple tag N, got #{tuple_tag.inspect}" unless tuple_tag == "N"
66
+
67
+ share(Messages::Insert.new(relation_id, parse_tuple_data))
68
+ end
69
+
70
+ def parse_update
71
+ relation_id = read_uint32
72
+ old_key_tuple = nil
73
+ old_tuple = nil
74
+
75
+ first_tag = read_byte_chr
76
+ case first_tag
77
+ when "K"
78
+ old_key_tuple = parse_tuple_data
79
+ new_tag = read_byte_chr
80
+ when "O"
81
+ old_tuple = parse_tuple_data
82
+ new_tag = read_byte_chr
83
+ when "N"
84
+ new_tag = first_tag
85
+ else
86
+ raise UnsupportedMessageError, "expected update tuple tag K, O, or N, got #{first_tag.inspect}"
87
+ end
88
+
89
+ raise UnsupportedMessageError, "expected update new tuple tag N, got #{new_tag.inspect}" unless new_tag == "N"
90
+
91
+ share(Messages::Update.new(relation_id, old_key_tuple, old_tuple, parse_tuple_data))
92
+ end
93
+
94
+ def parse_delete
95
+ relation_id = read_uint32
96
+ tuple_tag = read_byte_chr
97
+
98
+ case tuple_tag
99
+ when "K"
100
+ share(Messages::Delete.new(relation_id, parse_tuple_data, nil))
101
+ when "O"
102
+ share(Messages::Delete.new(relation_id, nil, parse_tuple_data))
103
+ else
104
+ raise UnsupportedMessageError, "expected delete tuple tag K or O, got #{tuple_tag.inspect}"
105
+ end
106
+ end
107
+
108
+ def parse_commit
109
+ share(Messages::Commit.new(read_uint8, read_uint64, read_uint64, read_uint64))
110
+ end
111
+
112
+ def parse_tuple_data
113
+ column_count = read_uint16
114
+
115
+ Array.new(column_count) do
116
+ tag = read_byte_chr
117
+
118
+ case tag
119
+ when "n"
120
+ Messages::TupleValue.new(:null, nil, nil)
121
+ when "u"
122
+ Messages::TupleValue.new(:unchanged_toast, nil, nil)
123
+ when "t", "b"
124
+ raw = read_bytes(read_int32).freeze
125
+ Messages::TupleValue.new(tag == "t" ? :text : :binary, raw, nil)
126
+ else
127
+ raise UnsupportedMessageError, "unsupported tuple data tag: #{tag.inspect}"
128
+ end
129
+ end.freeze
130
+ end
131
+
132
+ def read_uint8 = read_bytes(1).unpack1("C")
133
+
134
+ def read_uint16 = read_bytes(2).unpack1("n")
135
+
136
+ def read_uint32 = read_bytes(4).unpack1("N")
137
+
138
+ def read_int32
139
+ value = read_uint32
140
+ value >= 0x8000_0000 ? value - 0x1_0000_0000 : value
141
+ end
142
+
143
+ def read_uint64 = read_bytes(8).unpack1("Q>")
144
+
145
+ def read_byte_chr = read_bytes(1)
146
+
147
+ def read_cstring
148
+ zero = @payload.index("\0", @offset)
149
+ raise TruncatedMessageError, "unterminated cstring at offset #{@offset}" unless zero
150
+
151
+ value = @payload.byteslice(@offset, zero - @offset).freeze
152
+ @offset = zero + 1
153
+ value
154
+ end
155
+
156
+ def read_bytes(length)
157
+ raise TruncatedMessageError, "negative byte length #{length}" if length.negative?
158
+ if @offset + length > @payload.bytesize
159
+ raise TruncatedMessageError, "need #{length} bytes at offset #{@offset}, payload has #{@payload.bytesize} bytes"
160
+ end
161
+
162
+ value = @payload.byteslice(@offset, length)
163
+ @offset += length
164
+ value
165
+ end
166
+
167
+ def share(message)
168
+ Ractor.make_shareable(message)
169
+ end
170
+ end
171
+ end
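To make the offset-based approach concrete, here is a standalone sketch that builds an Insert (`I`) payload by hand and walks it with a moving offset, mirroring the field order used by `parse_insert` and `parse_tuple_data`. The `take` lambda and the sample values are assumptions for illustration; the sketch does not use the gem.

```ruby
# Hand-built Insert (`I`) payload: relation 42, two columns ("7" as text, NULL).
payload = ["I", 42, "N", 2].pack("a1Na1n") # tag, uint32 id, tuple tag, uint16 count
payload << "t" << [1].pack("N") << "7"     # text marker, int32 length, value bytes
payload << "n"                             # NULL marker carries no length or bytes

offset = 0
# Read `length` bytes at the current offset and advance past them.
take = lambda do |length|
  if length.negative? || offset + length > payload.bytesize
    raise ArgumentError, "cannot read #{length} bytes at offset #{offset}"
  end
  bytes = payload.byteslice(offset, length)
  offset += length
  bytes
end

tag         = take.(1)              # "I"
relation_id = take.(4).unpack1("N") # 42
tuple_tag   = take.(1)              # "N"

values = Array.new(take.(2).unpack1("n")) do
  case take.(1)
  when "n" then [:null, nil]
  when "u" then [:unchanged_toast, nil]
  when "t" then [:text, take.(take.(4).unpack1("l>"))]
  when "b" then [:binary, take.(take.(4).unpack1("l>"))]
  end
end
# values => [[:text, "7"], [:null, nil]]
```

Every read goes through one bounds-checked helper, so a short payload fails at the first field that cannot be read instead of yielding partial data.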
data/lib/pgoutput/errors.rb ADDED
@@ -0,0 +1,24 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pgoutput
4
+ # Base error for all parser failures.
5
+ #
6
+ # @api public
7
+ class Error < StandardError; end
8
+
9
+ # Raised when a payload ends before the requested protocol field can be read.
10
+ #
11
+ # @api public
12
+ class TruncatedMessageError < Error; end
13
+
14
+ # Raised when the parser sees a message or tuple tag outside this MVP scope.
15
+ #
16
+ # @api public
17
+ class UnsupportedMessageError < Error; end
18
+
19
+ # Raised when row data references a relation id that has not been observed via
20
+ # a preceding Relation (`R`) message in the current {RelationTracker}.
21
+ #
22
+ # @api public
23
+ class UnknownRelationError < Error; end
24
+ end
data/lib/pgoutput/messages.rb ADDED
@@ -0,0 +1,106 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pgoutput
4
+ # Immutable message model classes for the PostgreSQL pgoutput protocol.
5
+ #
6
+ # Every value returned by the parser is deeply shareable via
7
+ # {Ractor.make_shareable}. These classes are protocol-level structures only;
8
+ # they preserve tuple bytes and metadata but do not convert PostgreSQL values
9
+ # into application-specific Ruby types.
10
+ #
11
+ # @api public
12
+ module Messages
13
+ # Transaction begin message.
14
+ #
15
+ # @!attribute [r] final_lsn
16
+ # @return [Integer] final transaction LSN.
17
+ # @!attribute [r] commit_timestamp
18
+ # @return [Integer] microseconds since PostgreSQL epoch.
19
+ # @!attribute [r] xid
20
+ # @return [Integer] transaction id.
21
+ Begin = Data.define(:final_lsn, :commit_timestamp, :xid)
22
+
23
+ # Relation column metadata.
24
+ #
25
+ # @!attribute [r] flags
26
+ # @return [Integer] column flags; key columns use flag 1.
27
+ # @!attribute [r] name
28
+ # @return [String] column name.
29
+ # @!attribute [r] oid
30
+ # @return [Integer] PostgreSQL type OID.
31
+ # @!attribute [r] type_modifier
32
+ # @return [Integer] PostgreSQL type modifier.
33
+ Column = Data.define(:flags, :name, :oid, :type_modifier)
34
+
35
+ # Relation metadata message.
36
+ #
37
+ # @!attribute [r] relation_id
38
+ # @return [Integer] relation OID used by later DML messages.
39
+ # @!attribute [r] schema
40
+ # @return [String] namespace name.
41
+ # @!attribute [r] table
42
+ # @return [String] relation name.
43
+ # @!attribute [r] replica_identity
44
+ # @return [Integer] relation replica identity setting.
45
+ # @!attribute [r] columns
46
+ # @return [Array<Column>] immutable column metadata.
47
+ Relation = Data.define(:relation_id, :schema, :table, :replica_identity, :columns)
48
+
49
+ # One tuple column value.
50
+ #
51
+ # @!attribute [r] format
52
+ # @return [:null, :unchanged_toast, :text, :binary] protocol value format.
53
+ # @!attribute [r] raw
54
+ # @return [String, nil] immutable raw payload, or nil for NULL/TOAST markers.
55
+ # @!attribute [r] oid
56
+ # @return [Integer, nil] PostgreSQL type OID when relation metadata is known.
57
+ TupleValue = Data.define(:format, :raw, :oid)
58
+
59
+ # Insert DML message.
60
+ #
61
+ # @!attribute [r] relation_id
62
+ # @return [Integer] relation OID.
63
+ # @!attribute [r] tuple
64
+ # @return [Array<TupleValue>] new tuple data.
65
+ Insert = Data.define(:relation_id, :tuple)
66
+
67
+ # Update DML message.
68
+ #
69
+ # The message may contain either an old key tuple, an old full tuple, or
70
+ # neither; it always contains a new tuple.
71
+ #
72
+ # @!attribute [r] relation_id
73
+ # @return [Integer] relation OID.
74
+ # @!attribute [r] old_key_tuple
75
+ # @return [Array<TupleValue>, nil] replica identity key tuple.
76
+ # @!attribute [r] old_tuple
77
+ # @return [Array<TupleValue>, nil] full old tuple when replica identity is FULL.
78
+ # @!attribute [r] new_tuple
79
+ # @return [Array<TupleValue>] new tuple data.
80
+ Update = Data.define(:relation_id, :old_key_tuple, :old_tuple, :new_tuple)
81
+
82
+ # Delete DML message.
83
+ #
84
+ # The message contains either an old key tuple or an old full tuple.
85
+ #
86
+ # @!attribute [r] relation_id
87
+ # @return [Integer] relation OID.
88
+ # @!attribute [r] old_key_tuple
89
+ # @return [Array<TupleValue>, nil] replica identity key tuple.
90
+ # @!attribute [r] old_tuple
91
+ # @return [Array<TupleValue>, nil] full old tuple when replica identity is FULL.
92
+ Delete = Data.define(:relation_id, :old_key_tuple, :old_tuple)
93
+
94
+ # Transaction commit message.
95
+ #
96
+ # @!attribute [r] flags
97
+ # @return [Integer] commit flags; currently unused by PostgreSQL.
98
+ # @!attribute [r] commit_lsn
99
+ # @return [Integer] commit LSN.
100
+ # @!attribute [r] transaction_end_lsn
101
+ # @return [Integer] transaction end LSN.
102
+ # @!attribute [r] commit_timestamp
103
+ # @return [Integer] microseconds since PostgreSQL epoch.
104
+ Commit = Data.define(:flags, :commit_lsn, :transaction_end_lsn, :commit_timestamp)
105
+ end
106
+ end
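The LSN and timestamp attributes above are plain Integers. A consumer that wants familiar representations could convert them along these lines; both helper names are hypothetical and not part of the gem. The sketch relies on two PostgreSQL conventions: timestamps count microseconds from 2000-01-01 00:00:00 UTC, and LSNs are displayed as two hexadecimal halves separated by a slash.

```ruby
# Hypothetical helper: microseconds since the PostgreSQL epoch -> Ruby Time (UTC).
def pg_timestamp_to_time(microseconds)
  Time.utc(2000, 1, 1) + Rational(microseconds, 1_000_000)
end

# Hypothetical helper: 64-bit LSN Integer -> PostgreSQL's "high/low" hex notation.
def format_lsn(lsn)
  format("%X/%X", lsn >> 32, lsn & 0xFFFF_FFFF)
end

pg_timestamp_to_time(1_000_000) # => 2000-01-01 00:00:01 UTC
format_lsn(0x16_B374_D848)      # => "16/B374D848"
```

Using a `Rational` offset keeps microsecond precision without floating-point rounding.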
data/lib/pgoutput/relation_tracker.rb ADDED
@@ -0,0 +1,109 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pgoutput
4
+ # Stateful relation tracker for pgoutput message sequences.
5
+ #
6
+ # The relation tracker remembers Relation (`R`) messages so DML tuple values can
7
+ # be annotated with PostgreSQL type OIDs. It does not decode or convert values.
8
+ # It only adds protocol metadata to tuple values while keeping returned objects
9
+ # deeply shareable.
10
+ #
11
+ # The instance contains mutable relation-cache state and should not be shared
12
+ # across Ractors. Returned message objects are Ractor-safe.
13
+ #
14
+ # @api public
15
+ class RelationTracker
16
+ # @return [void]
17
+ def initialize
18
+ @relations = {}
19
+ end
20
+
21
+ # Process one pgoutput payload in stream order.
22
+ #
23
+ # @param payload [String] one pgoutput logical replication message payload.
24
+ # @return [Pgoutput::Messages::Begin, Pgoutput::Messages::Relation,
25
+ # Pgoutput::Messages::Insert, Pgoutput::Messages::Update,
26
+ # Pgoutput::Messages::Delete, Pgoutput::Messages::Commit]
27
+ # @raise [UnknownRelationError] if DML references an unseen relation id.
28
+ def process(payload)
29
+ message = BinaryParser.new(payload).parse
30
+
31
+ case message
32
+ when Messages::Relation
33
+ @relations[message.relation_id] = message
34
+ message
35
+ when Messages::Insert
36
+ annotate_insert(message)
37
+ when Messages::Update
38
+ annotate_update(message)
39
+ when Messages::Delete
40
+ annotate_delete(message)
41
+ else
42
+ message
43
+ end
44
+ end
45
+
46
+ # Backwards-compatible alias for {#process}.
47
+ #
48
+ # @param payload [String] one pgoutput logical replication message payload.
49
+ # @return [Pgoutput::Messages::Begin, Pgoutput::Messages::Relation,
50
+ # Pgoutput::Messages::Insert, Pgoutput::Messages::Update,
51
+ # Pgoutput::Messages::Delete, Pgoutput::Messages::Commit]
52
+ def decode(payload)
53
+ process(payload)
54
+ end
55
+
56
+ private
57
+
58
+ def annotate_insert(message)
59
+ relation = relation_for(message.relation_id)
60
+
61
+ Ractor.make_shareable(
62
+ Messages::Insert.new(
63
+ message.relation_id,
64
+ annotate_tuple(message.tuple, relation)
65
+ )
66
+ )
67
+ end
68
+
69
+ def annotate_update(message)
70
+ relation = relation_for(message.relation_id)
71
+
72
+ Ractor.make_shareable(
73
+ Messages::Update.new(
74
+ message.relation_id,
75
+ annotate_tuple(message.old_key_tuple, relation),
76
+ annotate_tuple(message.old_tuple, relation),
77
+ annotate_tuple(message.new_tuple, relation)
78
+ )
79
+ )
80
+ end
81
+
82
+ def annotate_delete(message)
83
+ relation = relation_for(message.relation_id)
84
+
85
+ Ractor.make_shareable(
86
+ Messages::Delete.new(
87
+ message.relation_id,
88
+ annotate_tuple(message.old_key_tuple, relation),
89
+ annotate_tuple(message.old_tuple, relation)
90
+ )
91
+ )
92
+ end
93
+
94
+ def annotate_tuple(tuple, relation)
95
+ return nil if tuple.nil?
96
+
97
+ tuple.each_with_index.map do |value, index|
98
+ column = relation.columns[index]
99
+ Messages::TupleValue.new(value.format, value.raw, column&.oid)
100
+ end.freeze
101
+ end
102
+
103
+ def relation_for(relation_id)
104
+ @relations.fetch(relation_id) do
105
+ raise UnknownRelationError, "unknown relation id #{relation_id}; parse Relation message first"
106
+ end
107
+ end
108
+ end
109
+ end
data/lib/pgoutput/version.rb ADDED
@@ -0,0 +1,8 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pgoutput
4
+ # Current gem version.
5
+ #
6
+ # @return [String] semantic version string.
7
+ VERSION = "0.1.0"
8
+ end
data/lib/pgoutput.rb ADDED
@@ -0,0 +1,17 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "pgoutput/version"
4
+ require_relative "pgoutput/errors"
5
+ require_relative "pgoutput/messages"
6
+ require_relative "pgoutput/binary_parser"
7
+ require_relative "pgoutput/relation_tracker"
8
+
9
+ # Top-level namespace for pgoutput-parser.
10
+ #
11
+ # pgoutput-parser parses PostgreSQL `pgoutput` logical replication protocol
12
+ # payloads into immutable Ruby protocol message objects. The namespace is kept
13
+ # short as `Pgoutput`, while the RubyGems package name is `pgoutput-parser`.
14
+ #
15
+ # @api public
16
+ module Pgoutput
17
+ end
data/sig/pgoutput.rbs ADDED
@@ -0,0 +1,129 @@
+ module Pgoutput
+   VERSION: String
+
+   type message =
+     Messages::Begin |
+     Messages::Relation |
+     Messages::Insert |
+     Messages::Update |
+     Messages::Delete |
+     Messages::Commit
+
+   class Error < StandardError
+   end
+
+   class TruncatedMessageError < Error
+   end
+
+   class UnsupportedMessageError < Error
+   end
+
+   class UnknownRelationError < Error
+   end
+
+   module Messages
+     class Begin < Data
+       attr_reader final_lsn: Integer
+       attr_reader commit_timestamp: Integer
+       attr_reader xid: Integer
+
+       def self.new: (Integer final_lsn, Integer commit_timestamp, Integer xid) -> Begin
+     end
+
+     class Column < Data
+       attr_reader flags: Integer
+       attr_reader name: String
+       attr_reader oid: Integer
+       attr_reader type_modifier: Integer
+
+       def self.new: (Integer flags, String name, Integer oid, Integer type_modifier) -> Column
+     end
+
+     class Relation < Data
+       attr_reader relation_id: Integer
+       attr_reader schema: String
+       attr_reader table: String
+       attr_reader replica_identity: Integer
+       attr_reader columns: Array[Column]
+
+       def self.new: (
+         Integer relation_id,
+         String schema,
+         String table,
+         Integer replica_identity,
+         Array[Column] columns
+       ) -> Relation
+     end
+
+     class TupleValue < Data
+       attr_reader format: :null | :unchanged_toast | :text | :binary
+       attr_reader raw: String?
+       attr_reader oid: Integer?
+
+       def self.new: (
+         :null | :unchanged_toast | :text | :binary format,
+         String? raw,
+         Integer? oid
+       ) -> TupleValue
+     end
+
+     class Insert < Data
+       attr_reader relation_id: Integer
+       attr_reader tuple: Array[TupleValue]
+
+       def self.new: (Integer relation_id, Array[TupleValue] tuple) -> Insert
+     end
+
+     class Update < Data
+       attr_reader relation_id: Integer
+       attr_reader old_key_tuple: Array[TupleValue]?
+       attr_reader old_tuple: Array[TupleValue]?
+       attr_reader new_tuple: Array[TupleValue]
+
+       def self.new: (
+         Integer relation_id,
+         Array[TupleValue]? old_key_tuple,
+         Array[TupleValue]? old_tuple,
+         Array[TupleValue] new_tuple
+       ) -> Update
+     end
+
+     class Delete < Data
+       attr_reader relation_id: Integer
+       attr_reader old_key_tuple: Array[TupleValue]?
+       attr_reader old_tuple: Array[TupleValue]?
+
+       def self.new: (
+         Integer relation_id,
+         Array[TupleValue]? old_key_tuple,
+         Array[TupleValue]? old_tuple
+       ) -> Delete
+     end
+
+     class Commit < Data
+       attr_reader flags: Integer
+       attr_reader commit_lsn: Integer
+       attr_reader transaction_end_lsn: Integer
+       attr_reader commit_timestamp: Integer
+
+       def self.new: (
+         Integer flags,
+         Integer commit_lsn,
+         Integer transaction_end_lsn,
+         Integer commit_timestamp
+       ) -> Commit
+     end
+   end
+
+   class BinaryParser
+     def initialize: (String payload) -> void
+
+     def parse: () -> message
+   end
+
+   class RelationTracker
+     def initialize: () -> void
+
+     def process: (String payload) -> message
+   end
+ end
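The signatures above describe `Messages::Begin` as an immutable `Data` object with three integer fields. As a rough illustration of how such a payload could be decoded, the sketch below follows the Begin layout in PostgreSQL's logical replication message formats (a `B` tag byte, an Int64 final LSN, an Int64 commit timestamp, an Int32 transaction ID, all big-endian). It is not the gem's implementation: the top-level `Begin`, `TruncatedMessageError`, `BEGIN_LENGTH`, and `parse_begin` names are hypothetical stand-ins defined here only so the example is self-contained.

```ruby
# Illustrative sketch only; the names below are assumptions, not the gem's API.
Begin = Data.define(:final_lsn, :commit_timestamp, :xid)

class TruncatedMessageError < StandardError; end

# Tag byte + Int64 final LSN + Int64 commit timestamp + Int32 xid.
BEGIN_LENGTH = 1 + 8 + 8 + 4

def parse_begin(payload)
  bytes = payload.b # work on a binary-encoded copy of the payload
  raise ArgumentError, "not a Begin message" unless bytes.start_with?("B")

  if bytes.bytesize < BEGIN_LENGTH
    raise TruncatedMessageError,
          "expected #{BEGIN_LENGTH} bytes, got #{bytes.bytesize}"
  end

  # x  = skip the tag byte
  # Q> = unsigned 64-bit big-endian (LSN)
  # q> = signed 64-bit big-endian (microseconds since 2000-01-01)
  # L> = unsigned 32-bit big-endian (transaction ID)
  final_lsn, commit_timestamp, xid = bytes.unpack("x Q> q> L>")
  Begin.new(final_lsn, commit_timestamp, xid)
end

# Build a synthetic payload and decode it.
payload = ["B", 0x16B3748, 1_000_000, 42].pack("a Q> q> L>")
message = parse_begin(payload)
```

Because `Data` instances are frozen, a decoded message like this can be shared without defensive copying, which fits the gem's stated goal of immutable protocol message objects.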
metadata ADDED
@@ -0,0 +1,153 @@
+ --- !ruby/object:Gem::Specification
+ name: pgoutput-parser
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Ken C. Demanawa
+ bindir: bin
+ cert_chain: []
+ date: 1980-01-02 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: minitest
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '5.27'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '5.27'
+ - !ruby/object:Gem::Dependency
+   name: pry
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.16.0
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.16.0
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '13.4'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '13.4'
+ - !ruby/object:Gem::Dependency
+   name: rubocop
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.87'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.87'
+ - !ruby/object:Gem::Dependency
+   name: simplecov
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.22'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0.22'
+ - !ruby/object:Gem::Dependency
+   name: steep
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.10'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.10'
+ - !ruby/object:Gem::Dependency
+   name: yard
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.9.44
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.9.44
+ description: A pure Ruby parser for PostgreSQL pgoutput logical replication CopyData
+   payloads.
+ email:
+ - kenneth.c.demanawa@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - CHANGELOG.md
+ - LICENSE.txt
+ - README.md
+ - lib/pgoutput.rb
+ - lib/pgoutput/binary_parser.rb
+ - lib/pgoutput/errors.rb
+ - lib/pgoutput/messages.rb
+ - lib/pgoutput/relation_tracker.rb
+ - lib/pgoutput/version.rb
+ - sig/pgoutput.rbs
+ homepage: https://github.com/kanutocd/pgoutput-parser
+ licenses:
+ - MIT
+ metadata:
+   homepage_uri: https://github.com/kanutocd/pgoutput-parser
+   source_code_uri: https://github.com/kanutocd/pgoutput-parser
+   changelog_uri: https://github.com/kanutocd/pgoutput-parser/CHANGELOG.md
+   rubygems_mfa_required: 'true'
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '3.4'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubygems_version: 3.6.9
+ specification_version: 4
+ summary: Ractor-safe PostgreSQL pgoutput logical replication protocol parser.
+ test_files: []