pgoutput-parser 0.1.1 → 0.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +36 -4
- data/README.md +69 -7
- data/docs/glossary.md +145 -0
- data/docs/index.md +239 -0
- data/docs/relation_tracker.md +168 -0
- data/lib/pgoutput/binary_parser.rb +47 -11
- data/lib/pgoutput/errors.rb +10 -4
- data/lib/pgoutput/messages.rb +40 -2
- data/lib/pgoutput/relation_tracker.rb +79 -18
- data/lib/pgoutput/version.rb +1 -1
- data/lib/pgoutput.rb +1 -1
- data/sig/pgoutput.rbs +85 -30
- metadata +7 -102
data/docs/relation_tracker.md
|
@@ -0,0 +1,168 @@
|
|
|
1
|
+
# RelationTracker Guide
|
|
2
|
+
|
|
3
|
+
`Pgoutput::RelationTracker` is the stream-order parser wrapper for callers that
|
|
4
|
+
want DML tuple values annotated with PostgreSQL type OIDs.
|
|
5
|
+
|
|
6
|
+
Use `Pgoutput::BinaryParser` when each payload can be decoded independently. Use
|
|
7
|
+
`RelationTracker` when Insert, Update, and Delete messages need the Relation
|
|
8
|
+
metadata that PostgreSQL sent earlier in the same logical replication stream.
|
|
9
|
+
|
|
10
|
+
## Why Relation Tracking Exists
|
|
11
|
+
|
|
12
|
+
pgoutput row-change messages reference a relation ID, but tuple values do not
|
|
13
|
+
repeat column names or PostgreSQL type OIDs. PostgreSQL sends that metadata in a
|
|
14
|
+
Relation (`R`) message.
|
|
15
|
+
|
|
16
|
+
The tracker caches Relation messages:
|
|
17
|
+
|
|
18
|
+
```text
|
|
19
|
+
R users(id int4, email text, active bool)
|
|
20
|
+
I relation_id=42 tuple=["7", "dev@example.test", "t"]
|
|
21
|
+
U relation_id=42 tuple=["7", "ops@example.test", "t"]
|
|
22
|
+
D relation_id=42 old_key=["7"]
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
After the Relation message has been seen, later DML tuple values can be
|
|
26
|
+
annotated:
|
|
27
|
+
|
|
28
|
+
```ruby
|
|
29
|
+
stream = Pgoutput::RelationTracker.new
|
|
30
|
+
|
|
31
|
+
stream.process(relation_payload)
|
|
32
|
+
insert = stream.process(insert_payload)
|
|
33
|
+
|
|
34
|
+
insert.tuple.map(&:oid)
|
|
35
|
+
# => [23, 25, 16]
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
The raw tuple bytes are unchanged. The tracker only attaches metadata.
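To make the annotation step concrete, here is a minimal sketch that does not use the gem itself. `SketchColumn`, `SketchValue`, `SKETCH_USERS_COLUMNS`, `sketch_annotate`, and `sketch_insert_tuple` are hypothetical stand-ins; only the OIDs (23 for `int4`, 25 for `text`, 16 for `bool`) and the `format`/`raw`/`oid` field names come from the guide and the library source.

```ruby
# Hypothetical stand-ins; the gem's real classes live under Pgoutput::Messages.
SketchColumn = Struct.new(:name, :oid)
SketchValue = Struct.new(:format, :raw, :oid)

# Column metadata as a Relation message for users(id, email, active) might describe it.
SKETCH_USERS_COLUMNS = [
  SketchColumn.new("id", 23),     # int4
  SketchColumn.new("email", 25),  # text
  SketchColumn.new("active", 16)  # bool
].freeze

# Pair each tuple value with the OID of the column at the same position.
# The raw bytes are copied through unchanged.
def sketch_annotate(tuple, columns)
  tuple.each_with_index.map do |value, index|
    SketchValue.new(value.format, value.raw, columns.fetch(index).oid)
  end
end

# Build the un-annotated tuple from the Insert in the stream example above.
def sketch_insert_tuple
  ["7", "dev@example.test", "t"].map { |raw| SketchValue.new("t", raw, nil) }
end

annotated = sketch_annotate(sketch_insert_tuple, SKETCH_USERS_COLUMNS)
annotated.map(&:oid) # => [23, 25, 16]
annotated.map(&:raw) # => ["7", "dev@example.test", "t"]
```

The sketch builds new value objects instead of mutating the parsed ones, which keeps the un-annotated input usable if annotation fails.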
|
|
39
|
+
|
|
40
|
+
## Stream-Order Contract
|
|
41
|
+
|
|
42
|
+
`RelationTracker` assumes payloads are processed in pgoutput stream order.
|
|
43
|
+
|
|
44
|
+
This order matters because DML messages depend on earlier Relation messages. If
|
|
45
|
+
an Insert, Update, or Delete references a relation ID that has not been cached,
|
|
46
|
+
the tracker raises `Pgoutput::UnknownRelationError`.
|
|
47
|
+
|
|
48
|
+
```ruby
|
|
49
|
+
stream = Pgoutput::RelationTracker.new
|
|
50
|
+
|
|
51
|
+
stream.process(insert_payload)
|
|
52
|
+
# raises Pgoutput::UnknownRelationError
|
|
53
|
+
```
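The lookup behind this error can be sketched with plain Ruby. `SketchUnknownRelationError` and `sketch_relation_for` are hypothetical names; the only borrowed idea is calling `#fetch` with a block that raises, as the library source does.

```ruby
# Hypothetical stand-in for Pgoutput::UnknownRelationError.
class SketchUnknownRelationError < StandardError; end

# Look up cached relation metadata, raising when the relation id was never stored.
def sketch_relation_for(cache, relation_id)
  cache.fetch(relation_id) do
    raise SketchUnknownRelationError, "unknown relation id #{relation_id}"
  end
end

cache = {}
begin
  sketch_relation_for(cache, 42) # nothing cached yet, so this raises
rescue SketchUnknownRelationError => e
  e.message # => "unknown relation id 42"
end

cache[42] = :users_relation    # in stream order, the Relation message arrives first
sketch_relation_for(cache, 42) # => :users_relation
```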
|
|
54
|
+
|
|
55
|
+
The tracker does not reorder messages, buffer future DML, deduplicate events, or
|
|
56
|
+
validate row lifecycle semantics such as whether an Insert occurred before a
|
|
57
|
+
Delete for the same primary key. Those guarantees belong to higher CDC pipeline
|
|
58
|
+
layers.
|
|
59
|
+
|
|
60
|
+
## Tuple Arity Validation
|
|
61
|
+
|
|
62
|
+
The tracker validates tuple arity before annotating OIDs. If PostgreSQL sends a
|
|
63
|
+
tuple with a different number of values than the cached Relation column count,
|
|
64
|
+
the tracker raises `Pgoutput::TupleArityError`.
|
|
65
|
+
|
|
66
|
+
This avoids silently assigning the wrong type OIDs to tuple positions.
|
|
67
|
+
|
|
68
|
+
```ruby
|
|
69
|
+
stream = Pgoutput::RelationTracker.new
|
|
70
|
+
|
|
71
|
+
stream.process(relation_payload)
|
|
72
|
+
stream.process(malformed_insert_payload)
|
|
73
|
+
# raises Pgoutput::TupleArityError
|
|
74
|
+
```
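A sketch of the check, written without the gem. `SketchTupleArityError` and `sketch_validate_arity!` are hypothetical names, and the message wording is modelled on the library source rather than guaranteed to match it.

```ruby
# Hypothetical stand-in for Pgoutput::TupleArityError.
class SketchTupleArityError < StandardError; end

# Refuse to pair values with columns unless both sides have the same length.
def sketch_validate_arity!(tuple, column_oids, relation_id)
  return if tuple.length == column_oids.length

  raise SketchTupleArityError,
        "tuple has #{tuple.length} values but relation #{relation_id} " \
        "has #{column_oids.length} columns"
end

column_oids = [23, 25, 16]
short_tuple = ["7", "dev@example.test"]

# Without a check, positional pairing pads the missing value with nil.
column_oids.zip(short_tuple) # => [[23, "7"], [25, "dev@example.test"], [16, nil]]

begin
  sketch_validate_arity!(short_tuple, column_oids, 42)
rescue SketchTupleArityError => e
  e.message # => "tuple has 2 values but relation 42 has 3 columns"
end
```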
|
|
75
|
+
|
|
76
|
+
## Default Relation Cache
|
|
77
|
+
|
|
78
|
+
By default, each tracker owns a plain Ruby `Hash`:
|
|
79
|
+
|
|
80
|
+
```ruby
|
|
81
|
+
stream = Pgoutput::RelationTracker.new
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
That is the right default for a single stream owner. The tracker instance itself
|
|
85
|
+
is mutable and should be scoped to the code path that processes that logical
|
|
86
|
+
replication stream.
|
|
87
|
+
|
|
88
|
+
Parsed message objects returned from `process` are Ractor-shareable. The mutable
|
|
89
|
+
tracker is not the shareable value; the parsed messages are.
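`Ractor.shareable?` is one way to check this distinction. The sketch below uses a plain `Struct` (`SketchParsedMessage`, a hypothetical stand-in) rather than the gem's own message classes.

```ruby
# Hypothetical stand-in for a parsed message object.
SketchParsedMessage = Struct.new(:relation_id, :raw_values)

# Deep-freeze a stand-in message, as the parser does via Ractor.make_shareable.
def sketch_shareable_message
  Ractor.make_shareable(SketchParsedMessage.new(42, ["7", "dev@example.test"]))
end

message = sketch_shareable_message
Ractor.shareable?(message)        # => true
message.frozen?                   # => true
message.raw_values.first.frozen?  # => true

# A mutable cache, like the tracker's default Hash, is not shareable.
Ractor.shareable?({ 42 => message }) # => false
```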
|
|
90
|
+
|
|
91
|
+
## Swappable Relation Cache
|
|
92
|
+
|
|
93
|
+
`RelationTracker` accepts a `relation_cache:` object:
|
|
94
|
+
|
|
95
|
+
```ruby
|
|
96
|
+
stream = Pgoutput::RelationTracker.new(relation_cache: {})
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
The cache object must support:
|
|
100
|
+
|
|
101
|
+
- `#[]=` for storing Relation messages by relation ID
|
|
102
|
+
- `#fetch` for reading Relation messages and raising through the provided block
|
|
103
|
+
when a relation ID is unknown
|
|
104
|
+
|
|
105
|
+
This keeps `RelationTracker` independent from a specific cache implementation.
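As an illustration of that contract, here is a minimal sketch of a custom cache. `SketchLockedCache` is a hypothetical class, not part of the gem; it guards a Hash with a `Mutex`, which makes it safe across threads but does not make it shareable across Ractors.

```ruby
# Hypothetical cache that satisfies the two-method contract: #[]= and #fetch.
class SketchLockedCache
  def initialize
    @lock = Mutex.new
    @entries = {}
  end

  # Store relation metadata under its relation id.
  def []=(relation_id, relation)
    @lock.synchronize { @entries[relation_id] = relation }
  end

  # Read relation metadata; on a miss, defer to the caller's block
  # (or raise KeyError when no block is given, as Hash#fetch does).
  def fetch(relation_id, &on_missing)
    @lock.synchronize { @entries.fetch(relation_id, &on_missing) }
  end
end

cache = SketchLockedCache.new
cache[42] = :users_relation
cache.fetch(42)                         # => :users_relation
cache.fetch(7) { |id| "missing #{id}" } # => "missing 7"
```

Assuming the constructor shown above, such an object would be passed as `Pgoutput::RelationTracker.new(relation_cache: SketchLockedCache.new)`.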
|
|
106
|
+
|
|
107
|
+
## Ratomic::Map Cache
|
|
108
|
+
|
|
109
|
+
For experimental or parallel Ruby 4 setups, callers can inject
|
|
110
|
+
`Ratomic::Map`:
|
|
111
|
+
|
|
112
|
+
```ruby
|
|
113
|
+
require "ratomic"
|
|
114
|
+
require "pgoutput"
|
|
115
|
+
|
|
116
|
+
relation_cache = Ratomic::Map.new
|
|
117
|
+
stream = Pgoutput::RelationTracker.new(relation_cache: relation_cache)
|
|
118
|
+
|
|
119
|
+
stream.process(relation_payload)
|
|
120
|
+
insert = stream.process(insert_payload)
|
|
121
|
+
|
|
122
|
+
insert.tuple.map(&:oid)
|
|
123
|
+
# => [23, 25, 16]
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
`Ratomic::Map` is useful when relation metadata must live in a Ractor-safe cache.
|
|
127
|
+
This gem keeps Ratomic as an optional development/benchmark dependency rather
|
|
128
|
+
than a runtime dependency; applications that want this cache backend should add
|
|
129
|
+
Ratomic directly.
|
|
130
|
+
|
|
131
|
+
Prefer the default Hash unless a pipeline design specifically needs a shared
|
|
132
|
+
Ractor-safe relation metadata cache.
|
|
133
|
+
|
|
134
|
+
## Ractor Pattern
|
|
135
|
+
|
|
136
|
+
A common pattern is to keep one tracker per stream-processing lane and pass only
|
|
137
|
+
parsed immutable messages across Ractors:
|
|
138
|
+
|
|
139
|
+
```ruby
|
|
140
|
+
stream = Pgoutput::RelationTracker.new
|
|
141
|
+
|
|
142
|
+
stream.process(relation_payload)
|
|
143
|
+
message = stream.process(update_payload)
|
|
144
|
+
|
|
145
|
+
worker = Ractor.new(message) do |event|
|
|
146
|
+
event.new_tuple.map(&:raw)
|
|
147
|
+
end
|
|
148
|
+
|
|
149
|
+
worker.take
|
|
150
|
+
```
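The Ractor API has changed between Ruby releases: newer versions provide `Ractor#value` where older ones used `Ractor#take`. The following standalone sketch picks whichever method is available. `SketchCell`, `SketchUpdateEvent`, `sketch_ractor_result`, and `sketch_raw_values_via_ractor` are hypothetical names, and the `Struct` objects stand in for the gem's parsed messages.

```ruby
# Hypothetical stand-ins for an Update message and its tuple values.
SketchCell = Struct.new(:format, :raw, :oid)
SketchUpdateEvent = Struct.new(:relation_id, :new_tuple)

# Collect a Ractor's result with whichever method this Ruby provides.
def sketch_ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# Send a deeply frozen event to a worker Ractor and return the raw values it extracts.
def sketch_raw_values_via_ractor(event)
  worker = Ractor.new(event) do |received|
    received.new_tuple.map(&:raw)
  end
  sketch_ractor_result(worker)
end

event = Ractor.make_shareable(
  SketchUpdateEvent.new(
    42,
    [SketchCell.new("t", "7", 23), SketchCell.new("t", "ops@example.test", 25)]
  )
)
sketch_raw_values_via_ractor(event) # => ["7", "ops@example.test"]
```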
|
|
151
|
+
|
|
152
|
+
If relation metadata itself must be shared across lanes, use an explicit
|
|
153
|
+
Ractor-safe cache such as `Ratomic::Map` and benchmark the result for the target
|
|
154
|
+
workload.
|
|
155
|
+
|
|
156
|
+
## Boundary
|
|
157
|
+
|
|
158
|
+
`RelationTracker` is still a parser-layer utility. It does not perform:
|
|
159
|
+
|
|
160
|
+
- PostgreSQL value decoding
|
|
161
|
+
- application-level type conversion
|
|
162
|
+
- event ordering across sink workers
|
|
163
|
+
- checkpointing
|
|
164
|
+
- retry coordination
|
|
165
|
+
- row lifecycle validation
|
|
166
|
+
|
|
167
|
+
Its job is narrow: remember Relation metadata, annotate DML tuple values with
|
|
168
|
+
type OIDs, validate tuple arity, and return immutable protocol messages.
|
data/lib/pgoutput/binary_parser.rb
CHANGED
|
@@ -8,10 +8,11 @@ module Pgoutput
|
|
|
8
8
|
# single payload. Its returned message object is deeply frozen/shareable and may
|
|
9
9
|
# cross Ractor boundaries safely.
|
|
10
10
|
#
|
|
11
|
-
# @api public
|
|
11
|
+
# @api public Public parser for decoding one pgoutput protocol message payload.
|
|
12
|
+
# rubocop:disable Metrics/ClassLength
|
|
12
13
|
class BinaryParser
|
|
13
14
|
# @param payload [String] one pgoutput message payload from a CopyData frame.
|
|
14
|
-
# @return [void]
|
|
15
|
+
# @return [void] initializes parser state for the supplied payload.
|
|
15
16
|
def initialize(payload)
|
|
16
17
|
@payload = payload.b
|
|
17
18
|
@offset = 0
|
|
@@ -19,25 +20,34 @@ module Pgoutput
|
|
|
19
20
|
|
|
20
21
|
# Parse one supported pgoutput message.
|
|
21
22
|
#
|
|
22
|
-
# Supported MVP tags are `B`, `R`, `I`, `U`, `D`, and `C`.
|
|
23
|
+
# Supported MVP tags are `B`, `M`, `O`, `R`, `Y`, `I`, `U`, `D`, `T`, and `C`.
|
|
23
24
|
#
|
|
24
|
-
# @return [Pgoutput::Messages::Begin, Pgoutput::Messages::
|
|
25
|
+
# @return [Pgoutput::Messages::Begin, Pgoutput::Messages::Message,
|
|
26
|
+
# Pgoutput::Messages::Origin, Pgoutput::Messages::Relation,
|
|
27
|
+
# Pgoutput::Messages::Type, Pgoutput::Messages::Truncate,
|
|
25
28
|
# Pgoutput::Messages::Insert, Pgoutput::Messages::Update,
|
|
26
|
-
# Pgoutput::Messages::Delete, Pgoutput::Messages::Commit]
|
|
29
|
+
# Pgoutput::Messages::Delete, Pgoutput::Messages::Commit] parsed immutable
|
|
30
|
+
# message object for the payload tag.
|
|
27
31
|
# @raise [UnsupportedMessageError] if the message tag is unsupported.
|
|
28
32
|
# @raise [TruncatedMessageError] if the payload is incomplete.
|
|
33
|
+
# rubocop:disable Metrics/CyclomaticComplexity
|
|
29
34
|
def parse
|
|
30
35
|
case read_byte_chr
|
|
31
36
|
when "B" then parse_begin
|
|
37
|
+
when "M" then parse_message
|
|
38
|
+
when "O" then parse_origin
|
|
32
39
|
when "R" then parse_relation
|
|
40
|
+
when "Y" then parse_type
|
|
33
41
|
when "I" then parse_insert
|
|
34
42
|
when "U" then parse_update
|
|
35
43
|
when "D" then parse_delete
|
|
44
|
+
when "T" then parse_truncate
|
|
36
45
|
when "C" then parse_commit
|
|
37
46
|
else
|
|
38
47
|
raise UnsupportedMessageError, "unsupported pgoutput message tag"
|
|
39
48
|
end
|
|
40
49
|
end
|
|
50
|
+
# rubocop:enable Metrics/CyclomaticComplexity
|
|
41
51
|
|
|
42
52
|
private
|
|
43
53
|
|
|
@@ -45,6 +55,19 @@ module Pgoutput
|
|
|
45
55
|
share(Messages::Begin.new(read_uint64, read_uint64, read_uint32))
|
|
46
56
|
end
|
|
47
57
|
|
|
58
|
+
def parse_message
|
|
59
|
+
flags = read_uint8
|
|
60
|
+
lsn = read_uint64
|
|
61
|
+
prefix = read_cstring
|
|
62
|
+
content = read_bytes(read_int32).freeze
|
|
63
|
+
|
|
64
|
+
share(Messages::Message.new(flags, lsn, prefix, content))
|
|
65
|
+
end
|
|
66
|
+
|
|
67
|
+
def parse_origin
|
|
68
|
+
share(Messages::Origin.new(read_uint64, read_cstring))
|
|
69
|
+
end
|
|
70
|
+
|
|
48
71
|
def parse_relation
|
|
49
72
|
relation_id = read_uint32
|
|
50
73
|
schema = read_cstring
|
|
@@ -59,6 +82,10 @@ module Pgoutput
|
|
|
59
82
|
share(Messages::Relation.new(relation_id, schema, table, replica_identity, columns))
|
|
60
83
|
end
|
|
61
84
|
|
|
85
|
+
def parse_type
|
|
86
|
+
share(Messages::Type.new(read_uint32, read_cstring, read_cstring))
|
|
87
|
+
end
|
|
88
|
+
|
|
62
89
|
def parse_insert
|
|
63
90
|
relation_id = read_uint32
|
|
64
91
|
tuple_tag = read_byte_chr
|
|
@@ -105,6 +132,14 @@ module Pgoutput
|
|
|
105
132
|
end
|
|
106
133
|
end
|
|
107
134
|
|
|
135
|
+
def parse_truncate
|
|
136
|
+
relation_count = read_uint32
|
|
137
|
+
options = read_uint8
|
|
138
|
+
relation_ids = Array.new(relation_count) { read_uint32 }.freeze
|
|
139
|
+
|
|
140
|
+
share(Messages::Truncate.new(relation_ids, options))
|
|
141
|
+
end
|
|
142
|
+
|
|
108
143
|
def parse_commit
|
|
109
144
|
share(Messages::Commit.new(read_uint8, read_uint64, read_uint64, read_uint64))
|
|
110
145
|
end
|
|
@@ -129,18 +164,18 @@ module Pgoutput
|
|
|
129
164
|
end.freeze
|
|
130
165
|
end
|
|
131
166
|
|
|
132
|
-
def read_uint8 = read_bytes(1).unpack1("C")
|
|
167
|
+
def read_uint8 = Integer(read_bytes(1).unpack1("C"))
|
|
133
168
|
|
|
134
|
-
def read_uint16 = read_bytes(2).unpack1("n")
|
|
169
|
+
def read_uint16 = Integer(read_bytes(2).unpack1("n"))
|
|
135
170
|
|
|
136
|
-
def read_uint32 = read_bytes(4).unpack1("N")
|
|
171
|
+
def read_uint32 = Integer(read_bytes(4).unpack1("N"))
|
|
137
172
|
|
|
138
173
|
def read_int32
|
|
139
174
|
value = read_uint32
|
|
140
175
|
value >= 0x8000_0000 ? value - 0x1_0000_0000 : value
|
|
141
176
|
end
|
|
142
177
|
|
|
143
|
-
def read_uint64 = read_bytes(8).unpack1("Q>")
|
|
178
|
+
def read_uint64 = Integer(read_bytes(8).unpack1("Q>"))
|
|
144
179
|
|
|
145
180
|
def read_byte_chr = read_bytes(1)
|
|
146
181
|
|
|
@@ -148,7 +183,7 @@ module Pgoutput
|
|
|
148
183
|
zero = @payload.index("\0", @offset)
|
|
149
184
|
raise TruncatedMessageError, "unterminated cstring at offset #{@offset}" unless zero
|
|
150
185
|
|
|
151
|
-
value = @payload.byteslice(@offset, zero - @offset).freeze
|
|
186
|
+
value = String(@payload.byteslice(@offset, zero - @offset)).freeze
|
|
152
187
|
@offset = zero + 1
|
|
153
188
|
value
|
|
154
189
|
end
|
|
@@ -159,7 +194,7 @@ module Pgoutput
|
|
|
159
194
|
raise TruncatedMessageError, "need #{length} bytes at offset #{@offset}, payload has #{@payload.bytesize} bytes"
|
|
160
195
|
end
|
|
161
196
|
|
|
162
|
-
value = @payload.byteslice(@offset, length)
|
|
197
|
+
value = String(@payload.byteslice(@offset, length))
|
|
163
198
|
@offset += length
|
|
164
199
|
value
|
|
165
200
|
end
|
|
@@ -168,4 +203,5 @@ module Pgoutput
|
|
|
168
203
|
Ractor.make_shareable(message)
|
|
169
204
|
end
|
|
170
205
|
end
|
|
206
|
+
# rubocop:enable Metrics/ClassLength
|
|
171
207
|
end
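The field order that `parse_truncate` reads (a `uint32` relation count, a `uint8` options byte, then one `uint32` per relation) can be exercised with `String#unpack`. This is a standalone sketch with hypothetical helper names (`sketch_truncate_payload`, `sketch_decode_truncate`); it does not call the gem, and the option bit meanings are taken from the `Truncate` attribute comments in `messages.rb`.

```ruby
# Build a Truncate payload: tag byte "T", uint32 count, uint8 options, uint32 ids.
def sketch_truncate_payload(relation_ids, options)
  ["T", relation_ids.length, options, *relation_ids].pack("a N C N*")
end

# Decode the same layout using big-endian (network order) integers.
def sketch_decode_truncate(payload)
  # The header is 1 + 4 + 1 = 6 bytes.
  raise ArgumentError, "payload is truncated" if payload.bytesize < 6

  tag, count, options = payload.unpack("a N C")
  raise ArgumentError, "not a Truncate payload" unless tag == "T"

  # Each relation id takes 4 bytes after the header.
  id_bytes = payload.byteslice(6, count * 4)
  raise ArgumentError, "payload is truncated" if id_bytes.bytesize < count * 4

  {
    relation_ids: id_bytes.unpack("N*"),
    cascade: (options & 1) != 0,          # option bit 1
    restart_identity: (options & 2) != 0  # option bit 2
  }
end

decoded = sketch_decode_truncate(sketch_truncate_payload([16_384, 16_390], 2))
decoded[:relation_ids]     # => [16384, 16390]
decoded[:cascade]          # => false
decoded[:restart_identity] # => true
```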
|
data/lib/pgoutput/errors.rb
CHANGED
|
@@ -3,22 +3,28 @@
|
|
|
3
3
|
module Pgoutput
|
|
4
4
|
# Base error for all parser failures.
|
|
5
5
|
#
|
|
6
|
-
# @api public
|
|
6
|
+
# @api public Public base class for rescuing all pgoutput-parser errors.
|
|
7
7
|
class Error < StandardError; end
|
|
8
8
|
|
|
9
9
|
# Raised when a payload ends before the requested protocol field can be read.
|
|
10
10
|
#
|
|
11
|
-
# @api public
|
|
11
|
+
# @api public Public parser error for incomplete binary payloads.
|
|
12
12
|
class TruncatedMessageError < Error; end
|
|
13
13
|
|
|
14
14
|
# Raised when the parser sees a message or tuple tag outside this MVP scope.
|
|
15
15
|
#
|
|
16
|
-
# @api public
|
|
16
|
+
# @api public Public parser error for pgoutput protocol features outside this scope.
|
|
17
17
|
class UnsupportedMessageError < Error; end
|
|
18
18
|
|
|
19
19
|
# Raised when row data references a relation id that has not been observed via
|
|
20
20
|
# a preceding Relation (`R`) message in the current stream decoder.
|
|
21
21
|
#
|
|
22
|
-
# @api public
|
|
22
|
+
# @api public Public tracker error for DML messages missing relation metadata.
|
|
23
23
|
class UnknownRelationError < Error; end
|
|
24
|
+
|
|
25
|
+
# Raised when tuple data does not match the column count advertised by the
|
|
26
|
+
# cached Relation (`R`) message.
|
|
27
|
+
#
|
|
28
|
+
# @api public Public tracker error for malformed tuple/relation metadata pairs.
|
|
29
|
+
class TupleArityError < Error; end
|
|
24
30
|
end
|
data/lib/pgoutput/messages.rb
CHANGED
|
@@ -4,11 +4,11 @@ module Pgoutput
|
|
|
4
4
|
# Immutable message model classes for the PostgreSQL pgoutput protocol.
|
|
5
5
|
#
|
|
6
6
|
# Every value returned by the parser is deeply shareable via
|
|
7
|
-
#
|
|
7
|
+
# `Ractor.make_shareable`. These classes are protocol-level structures only;
|
|
8
8
|
# they preserve tuple bytes and metadata but do not convert PostgreSQL values
|
|
9
9
|
# into application-specific Ruby types.
|
|
10
10
|
#
|
|
11
|
-
# @api public
|
|
11
|
+
# @api public Public immutable message model namespace returned by parsers.
|
|
12
12
|
module Messages
|
|
13
13
|
# Transaction begin message.
|
|
14
14
|
#
|
|
@@ -20,6 +20,26 @@ module Pgoutput
|
|
|
20
20
|
# @return [Integer] transaction id.
|
|
21
21
|
Begin = Data.define(:final_lsn, :commit_timestamp, :xid)
|
|
22
22
|
|
|
23
|
+
# Logical decoding message.
|
|
24
|
+
#
|
|
25
|
+
# @!attribute [r] flags
|
|
26
|
+
# @return [Integer] message flags; bit 0 marks transactional messages.
|
|
27
|
+
# @!attribute [r] lsn
|
|
28
|
+
# @return [Integer] LSN of the logical decoding message.
|
|
29
|
+
# @!attribute [r] prefix
|
|
30
|
+
# @return [String] message prefix.
|
|
31
|
+
# @!attribute [r] content
|
|
32
|
+
# @return [String] immutable raw message content.
|
|
33
|
+
Message = Data.define(:flags, :lsn, :prefix, :content)
|
|
34
|
+
|
|
35
|
+
# Replication origin message.
|
|
36
|
+
#
|
|
37
|
+
# @!attribute [r] origin_lsn
|
|
38
|
+
# @return [Integer] commit LSN on the origin server.
|
|
39
|
+
# @!attribute [r] name
|
|
40
|
+
# @return [String] origin name.
|
|
41
|
+
Origin = Data.define(:origin_lsn, :name)
|
|
42
|
+
|
|
23
43
|
# Relation column metadata.
|
|
24
44
|
#
|
|
25
45
|
# @!attribute [r] flags
|
|
@@ -46,6 +66,16 @@ module Pgoutput
|
|
|
46
66
|
# @return [Array<Column>] immutable column metadata.
|
|
47
67
|
Relation = Data.define(:relation_id, :schema, :table, :replica_identity, :columns)
|
|
48
68
|
|
|
69
|
+
# PostgreSQL type metadata message.
|
|
70
|
+
#
|
|
71
|
+
# @!attribute [r] oid
|
|
72
|
+
# @return [Integer] PostgreSQL type OID.
|
|
73
|
+
# @!attribute [r] schema
|
|
74
|
+
# @return [String] namespace name.
|
|
75
|
+
# @!attribute [r] name
|
|
76
|
+
# @return [String] type name.
|
|
77
|
+
Type = Data.define(:oid, :schema, :name)
|
|
78
|
+
|
|
49
79
|
# One tuple column value.
|
|
50
80
|
#
|
|
51
81
|
# @!attribute [r] format
|
|
@@ -91,6 +121,14 @@ module Pgoutput
|
|
|
91
121
|
# @return [Array<TupleValue>, nil] full old tuple when replica identity is FULL.
|
|
92
122
|
Delete = Data.define(:relation_id, :old_key_tuple, :old_tuple)
|
|
93
123
|
|
|
124
|
+
# Truncate DML message.
|
|
125
|
+
#
|
|
126
|
+
# @!attribute [r] relation_ids
|
|
127
|
+
# @return [Array<Integer>] relation OIDs affected by the truncate.
|
|
128
|
+
# @!attribute [r] options
|
|
129
|
+
# @return [Integer] option bits; 1 is CASCADE, 2 is RESTART IDENTITY.
|
|
130
|
+
Truncate = Data.define(:relation_ids, :options)
|
|
131
|
+
|
|
94
132
|
# Transaction commit message.
|
|
95
133
|
#
|
|
96
134
|
# @!attribute [r] flags
|
data/lib/pgoutput/relation_tracker.rb
CHANGED
|
@@ -8,23 +8,68 @@ module Pgoutput
|
|
|
8
8
|
# It only adds protocol metadata to tuple values while keeping returned objects
|
|
9
9
|
# deeply shareable.
|
|
10
10
|
#
|
|
11
|
-
#
|
|
12
|
-
#
|
|
11
|
+
# pgoutput DML messages carry a relation id and tuple values, but they do not
|
|
12
|
+
# repeat column names or type OIDs. PostgreSQL sends that metadata separately
|
|
13
|
+
# in Relation (`R`) messages. Call {#process} with payloads in stream order so
|
|
14
|
+
# Relation messages are cached before the Insert, Update, or Delete messages
|
|
15
|
+
# that reference them.
|
|
13
16
|
#
|
|
14
|
-
#
|
|
17
|
+
# The relation cache is injectable. The default cache is a plain Hash, which is
|
|
18
|
+
# appropriate when one stream owner processes payloads sequentially. Callers
|
|
19
|
+
# with an explicit Ractor-oriented design can supply a compatible cache object,
|
|
20
|
+
# such as `Ratomic::Map`, through the `relation_cache:` keyword.
|
|
21
|
+
#
|
|
22
|
+
# A custom relation cache must implement `#[]=` and `#fetch`. The tracker
|
|
23
|
+
# stores cached Relation messages by relation id and uses `#fetch` with a block
|
|
24
|
+
# so unknown relation ids still raise {UnknownRelationError}.
|
|
25
|
+
#
|
|
26
|
+
# `RelationTracker` does not reorder messages, buffer DML until metadata
|
|
27
|
+
# arrives, enforce per-record lifecycle ordering, or coordinate sink retries.
|
|
28
|
+
# Those guarantees belong to higher CDC pipeline layers. This class only
|
|
29
|
+
# preserves parser-layer stream semantics and validates tuple arity against
|
|
30
|
+
# cached Relation metadata.
|
|
31
|
+
#
|
|
32
|
+
# Returned message objects are Ractor-safe.
|
|
33
|
+
#
|
|
34
|
+
# @example Default Hash-backed relation cache
|
|
35
|
+
# stream = Pgoutput::RelationTracker.new
|
|
36
|
+
# stream.process(relation_payload)
|
|
37
|
+
# insert = stream.process(insert_payload)
|
|
38
|
+
# insert.tuple.map(&:oid)
|
|
39
|
+
#
|
|
40
|
+
# @example Ractor-safe relation cache with Ratomic::Map
|
|
41
|
+
# require "ratomic"
|
|
42
|
+
#
|
|
43
|
+
# relation_cache = Ratomic::Map.new
|
|
44
|
+
# stream = Pgoutput::RelationTracker.new(relation_cache: relation_cache)
|
|
45
|
+
# stream.process(relation_payload)
|
|
46
|
+
# update = stream.process(update_payload)
|
|
47
|
+
# update.new_tuple.map(&:oid)
|
|
48
|
+
#
|
|
49
|
+
# @api public Public stream-order decoder that annotates DML with relation OIDs.
|
|
15
50
|
class RelationTracker
|
|
16
|
-
#
|
|
17
|
-
|
|
18
|
-
|
|
51
|
+
# Create a tracker with an optional relation cache.
|
|
52
|
+
#
|
|
53
|
+
# @param relation_cache [Hash, #fetch, #[]=] cache for relation metadata,
|
|
54
|
+
# keyed by pgoutput relation id. The default Hash is suitable for one
|
|
55
|
+
# stream owner; callers may inject `Ratomic::Map` or another compatible
|
|
56
|
+
# cache for explicit Ractor-safe relation metadata sharing.
|
|
57
|
+
# @return [void] initializes an empty tracker using the supplied cache object.
|
|
58
|
+
def initialize(relation_cache: {})
|
|
59
|
+
@relations = relation_cache
|
|
19
60
|
end
|
|
20
61
|
|
|
21
62
|
# Process one pgoutput payload in stream order.
|
|
22
63
|
#
|
|
23
64
|
# @param payload [String] one pgoutput logical replication message payload.
|
|
24
|
-
# @return [Pgoutput::Messages::Begin, Pgoutput::Messages::
|
|
65
|
+
# @return [Pgoutput::Messages::Begin, Pgoutput::Messages::Message,
|
|
66
|
+
# Pgoutput::Messages::Origin, Pgoutput::Messages::Relation,
|
|
67
|
+
# Pgoutput::Messages::Type, Pgoutput::Messages::Truncate,
|
|
25
68
|
# Pgoutput::Messages::Insert, Pgoutput::Messages::Update,
|
|
26
|
-
# Pgoutput::Messages::Delete, Pgoutput::Messages::Commit]
|
|
69
|
+
# Pgoutput::Messages::Delete, Pgoutput::Messages::Commit] parsed immutable
|
|
70
|
+
# message object, with DML tuple OIDs annotated when relation metadata exists.
|
|
27
71
|
# @raise [UnknownRelationError] if DML references an unseen relation id.
|
|
72
|
+
# @raise [TupleArityError] if DML tuple data does not match relation metadata.
|
|
28
73
|
def process(payload)
|
|
29
74
|
message = BinaryParser.new(payload).parse
|
|
30
75
|
|
|
@@ -43,12 +88,15 @@ module Pgoutput
|
|
|
43
88
|
end
|
|
44
89
|
end
|
|
45
90
|
|
|
46
|
-
# Backwards-compatible alias for callers migrating
|
|
91
|
+
# Backwards-compatible alias for callers migrating to `process`.
|
|
47
92
|
#
|
|
48
93
|
# @param payload [String] one pgoutput logical replication message payload.
|
|
49
|
-
# @return [Pgoutput::Messages::Begin, Pgoutput::Messages::
|
|
94
|
+
# @return [Pgoutput::Messages::Begin, Pgoutput::Messages::Message,
|
|
95
|
+
# Pgoutput::Messages::Origin, Pgoutput::Messages::Relation,
|
|
96
|
+
# Pgoutput::Messages::Type, Pgoutput::Messages::Truncate,
|
|
50
97
|
# Pgoutput::Messages::Insert, Pgoutput::Messages::Update,
|
|
51
|
-
# Pgoutput::Messages::Delete, Pgoutput::Messages::Commit]
|
|
98
|
+
# Pgoutput::Messages::Delete, Pgoutput::Messages::Commit] parsed immutable
|
|
99
|
+
# message object, with DML tuple OIDs annotated when relation metadata exists.
|
|
52
100
|
def decode(payload)
|
|
53
101
|
process(payload)
|
|
54
102
|
end
|
|
@@ -72,8 +120,8 @@ module Pgoutput
|
|
|
72
120
|
Ractor.make_shareable(
|
|
73
121
|
Messages::Update.new(
|
|
74
122
|
message.relation_id,
|
|
75
|
-
|
|
76
|
-
|
|
123
|
+
annotate_optional_tuple(message.old_key_tuple, relation),
|
|
124
|
+
annotate_optional_tuple(message.old_tuple, relation),
|
|
77
125
|
annotate_tuple(message.new_tuple, relation)
|
|
78
126
|
)
|
|
79
127
|
)
|
|
@@ -85,21 +133,34 @@ module Pgoutput
|
|
|
85
133
|
Ractor.make_shareable(
|
|
86
134
|
Messages::Delete.new(
|
|
87
135
|
message.relation_id,
|
|
88
|
-
|
|
89
|
-
|
|
136
|
+
annotate_optional_tuple(message.old_key_tuple, relation),
|
|
137
|
+
annotate_optional_tuple(message.old_tuple, relation)
|
|
90
138
|
)
|
|
91
139
|
)
|
|
92
140
|
end
|
|
93
141
|
|
|
94
|
-
def
|
|
142
|
+
def annotate_optional_tuple(tuple, relation)
|
|
95
143
|
return nil if tuple.nil?
|
|
96
144
|
|
|
145
|
+
annotate_tuple(tuple, relation)
|
|
146
|
+
end
|
|
147
|
+
|
|
148
|
+
def annotate_tuple(tuple, relation)
|
|
149
|
+
validate_tuple_arity!(tuple, relation)
|
|
150
|
+
|
|
97
151
|
tuple.each_with_index.map do |value, index|
|
|
98
|
-
|
|
99
|
-
Messages::TupleValue.new(value.format, value.raw, column&.oid)
|
|
152
|
+
Messages::TupleValue.new(value.format, value.raw, relation.columns.fetch(index).oid)
|
|
100
153
|
end.freeze
|
|
101
154
|
end
|
|
102
155
|
|
|
156
|
+
def validate_tuple_arity!(tuple, relation)
|
|
157
|
+
return if tuple.length == relation.columns.length
|
|
158
|
+
|
|
159
|
+
raise TupleArityError,
|
|
160
|
+
"tuple has #{tuple.length} values but relation #{relation.relation_id} " \
|
|
161
|
+
"has #{relation.columns.length} columns"
|
|
162
|
+
end
|
|
163
|
+
|
|
103
164
|
def relation_for(relation_id)
|
|
104
165
|
@relations.fetch(relation_id) do
|
|
105
166
|
raise UnknownRelationError, "unknown relation id #{relation_id}; parse Relation message first"
|
data/lib/pgoutput/version.rb
CHANGED
data/lib/pgoutput.rb
CHANGED
|
@@ -12,6 +12,6 @@ require_relative "pgoutput/relation_tracker"
|
|
|
12
12
|
# payloads into immutable Ruby protocol message objects. The namespace is kept
|
|
13
13
|
# short as `Pgoutput`, while the RubyGems package name is `pgoutput-parser`.
|
|
14
14
|
#
|
|
15
|
-
# @api public
|
|
15
|
+
# @api public Public namespace for parser entry points, message models, and errors.
|
|
16
16
|
module Pgoutput
|
|
17
17
|
end
|