pgoutput-parser 0.1.1 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +36 -4
- data/README.md +69 -7
- data/docs/glossary.md +145 -0
- data/docs/index.md +239 -0
- data/docs/relation_tracker.md +168 -0
- data/lib/pgoutput/binary_parser.rb +47 -11
- data/lib/pgoutput/errors.rb +10 -4
- data/lib/pgoutput/messages.rb +40 -2
- data/lib/pgoutput/relation_tracker.rb +79 -18
- data/lib/pgoutput/version.rb +1 -1
- data/lib/pgoutput.rb +1 -1
- data/sig/pgoutput.rbs +85 -30
- metadata +7 -102
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 5822df660843e811e03208f6004f7700821953d9c602d70dd3d059f7fb9b82f7
+data.tar.gz: 0231a9f0d3fcc295976b534bfa7350e08f0156c96a1daebb5528e095df2a63db
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 6221a4d4a884f2877fdb9fe0086ce37b955137f204de2c207892de697be28a4d67f4290e12b027531e12ee9ae4277233124989d6d0073a4d9a7584fc587eac78
+data.tar.gz: c868e1446cb07ee253f0ece9a5f8fc104a9ca5bf9e3181e169a12d4affcd03e58a1ca240bf86107a35b388b007b0911985ebba214a6d667b2a886af5f8c90995
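The SHA256 values published in the 0.2.0 `checksums.yaml` can be used to check a downloaded copy of the gem. A `.gem` file is a plain tar archive whose members include `metadata.gz` and `data.tar.gz`, so after extracting it (for example with `tar -xf pgoutput-parser-0.2.0.gem`) each member can be hashed and compared. The Ruby sketch below assumes the members have already been extracted into a directory; `PGOUTPUT_PARSER_0_2_0_SHA256` and `verify_sha256` are illustrative names, not part of RubyGems or of this gem.

```ruby
require "digest"

# SHA256 values copied from the 0.2.0 checksums.yaml above.
# The constant name is illustrative only.
PGOUTPUT_PARSER_0_2_0_SHA256 = {
  "metadata.gz" => "5822df660843e811e03208f6004f7700821953d9c602d70dd3d059f7fb9b82f7",
  "data.tar.gz" => "0231a9f0d3fcc295976b534bfa7350e08f0156c96a1daebb5528e095df2a63db"
}.freeze

# Hypothetical helper: hashes each expected member found in +dir+ and reports
# whether it matches. A missing file is reported as false instead of raising.
def verify_sha256(dir, expected)
  expected.to_h do |name, hex|
    path = File.join(dir, name)
    [name, File.file?(path) && Digest::SHA256.file(path).hexdigest == hex]
  end
end

# Usage, from the directory where the .gem was extracted:
#   verify_sha256(".", PGOUTPUT_PARSER_0_2_0_SHA256)
```

Every entry should report `true`; a `false` means the member is missing or its bytes differ from what the registry published.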
data/CHANGELOG.md
CHANGED
@@ -7,9 +7,41 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+---
+
+## [0.2.0] - 2026-06-06
+
 ### Added
 
-*
+* Added `Pgoutput::TupleArityError` for relation/tuple column-count mismatches.
+* Added a dependency-free parser throughput benchmark for `BinaryParser` and
+cached `RelationTracker` DML processing.
+* Added parser support for non-streaming Message (`M`), Origin (`O`), Type (`Y`),
+and Truncate (`T`) messages.
+* Added benchmark tuning controls for iterations, warmup, Ractor count, and
+scenario selection.
+* Added benchmark relation-cache tuning for comparing the default Hash cache
+with optional `Ratomic::Map`.
+* Added generated documentation pages for the glossary and detailed
+`RelationTracker` usage.
+* Added optional `Ratomic::Map` relation-cache coverage and benchmark smoke
+paths.
+* Added grouped unit, integration, and behavior test layout aligned with the CDC
+component test shape.
+* Added explicit BinaryParser coverage for unterminated C strings and negative
+byte lengths.
+
+### Changed
+
+* `Pgoutput::RelationTracker` now rejects DML tuples whose value count differs from
+the cached `Relation` column count instead of silently assigning incomplete
+column metadata.
+* Raised the minimum supported Ruby version to Ruby 4.0 to align with the CDC
+ecosystem's Ractor-first parser/runtime direction.
+* `Pgoutput::RelationTracker` now accepts an injectable `relation_cache:` object
+compatible with Hash-style `#[]=` and `#fetch`.
+* GitHub Actions now validates RBS signatures and runs a small benchmark smoke
+test against Hash and Ratomic relation-cache paths.
 
 ---
 
@@ -112,6 +144,6 @@ A future companion project (`pgoutput-decoder`) may provide PostgreSQL type deco
 
 ---
 
-[Unreleased]: https://github.com/
-[0.
-
+[Unreleased]: https://github.com/kanutocd/pgoutput-parser/compare/v0.2.0...HEAD
+[0.2.0]: https://github.com/kanutocd/pgoutput-parser/compare/v0.1.0...v0.2.0
+[0.1.0]: https://github.com/kanutocd/pgoutput-parser/releases/tag/v0.1.0
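The two `RelationTracker` changes in this release (an injectable `relation_cache:` that only needs Hash-style `#[]=` and `#fetch`, and rejecting tuples whose value count differs from the cached Relation) can be illustrated with a small stand-alone Ruby sketch. `MiniTracker`, `SketchRelation`, `SketchColumn`, and the local `TupleArityError` are hypothetical stand-ins written for this example; they are not the gem's classes, and the real `Pgoutput::RelationTracker` may differ in signature and behavior.

```ruby
# Hypothetical stand-ins; these are NOT pgoutput-parser's own classes.
TupleArityError = Class.new(StandardError)
SketchColumn = Data.define(:name, :type_oid)
SketchRelation = Data.define(:id, :columns)

class MiniTracker
  # Any object with Hash-style #[]= and #fetch can serve as the cache.
  def initialize(relation_cache: {})
    @relation_cache = relation_cache
  end

  # Called for each Relation (R) message, in stream order.
  def remember(relation)
    @relation_cache[relation.id] = relation
  end

  # Pairs each raw value with its column type OID, refusing mismatched arity.
  # Hash#fetch raises KeyError when no Relation was seen for relation_id.
  def annotate(relation_id, raw_values)
    relation = @relation_cache.fetch(relation_id)
    expected = relation.columns.size
    if raw_values.size != expected
      raise TupleArityError,
            "relation #{relation_id}: expected #{expected} values, got #{raw_values.size}"
    end
    relation.columns.zip(raw_values).map { |column, raw| [column.type_oid, raw] }
  end
end

tracker = MiniTracker.new
tracker.remember(
  SketchRelation.new(id: 42, columns: [SketchColumn.new(name: "id", type_oid: 23),
                                       SketchColumn.new(name: "title", type_oid: 25)])
)
tracker.annotate(42, ["7", "hello"]) # => [[23, "7"], [25, "hello"]]
```

Because the sketch only touches its cache through `#[]=` and `#fetch`, any object exposing those two methods could be passed as `relation_cache:` without changing the tracking logic, which matches the contract the changelog entry describes.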
data/README.md
CHANGED
@@ -10,7 +10,7 @@ It intentionally does **not** convert PostgreSQL values into application-specifi
 
 ## Requirements
 
-- Ruby
+- Ruby 4+
 - PostgreSQL 10+
 
 ---
@@ -18,7 +18,7 @@ It intentionally does **not** convert PostgreSQL values into application-specifi
 ## Features
 
 - Pure Ruby implementation
-- Ruby
+- Ruby 4+
 - Ractor-safe parsed messages
 - Immutable protocol message objects
 - PostgreSQL logical replication protocol support
@@ -28,6 +28,9 @@ It intentionally does **not** convert PostgreSQL values into application-specifi
 - YARD documentation included
 - No runtime dependencies
 
+The generated documentation also includes a project glossary:
+[docs/glossary.md](docs/glossary.md).
+
 ---
 
 ## Why Another pgoutput Library?
@@ -46,13 +49,17 @@ This keeps the parser small, predictable, dependency-free, and faithful to Postg
 
 ## Supported MVP Scope
 
-Supports the core pgoutput
+Supports the core non-streaming pgoutput logical replication messages:
 
 - Begin (`B`)
+- Message (`M`)
+- Origin (`O`)
 - Relation (`R`)
+- Type (`Y`)
 - Insert (`I`)
 - Update (`U`)
 - Delete (`D`)
+- Truncate (`T`)
 - Commit (`C`)
 
 The currently supported message formats are stable across PostgreSQL 10 through PostgreSQL 18.
@@ -70,10 +77,6 @@ TupleData supports all base column markers:
 
 Future releases may add support for:
 
-- Message (`M`)
-- Truncate (`T`)
-- Origin (`O`)
-- Type (`Y`)
 - Stream Start (`S`)
 - Stream Stop (`E`)
 - Stream Commit (`c`)
@@ -275,6 +278,13 @@ delete.old_tuple
 
 `RelationTracker` keeps a local relation cache so tuple values can be associated with PostgreSQL column OIDs defined by preceding Relation (`R`) messages.
 
+The tracker accepts an optional `relation_cache:` argument. The default is a
+plain Hash, but callers can inject `Ratomic::Map` for a Ractor-safe cache in
+experimental or parallel setups.
+
+For a deeper guide, including stream-order behavior, tuple arity validation, and
+`Ratomic::Map` usage, see [docs/relation_tracker.md](docs/relation_tracker.md).
+
 No type conversion is performed.
 
 Only protocol metadata is attached.
@@ -290,6 +300,10 @@ message.new_tuple.map(&:oid)
 
 The relation tracker itself is stateful and maintains relation metadata encountered in the replication stream.
 
+If a DML tuple's value count does not match the cached relation column count, `RelationTracker` raises
+`Pgoutput::TupleArityError`. This keeps malformed payloads or mismatched stream state from being silently
+annotated with incomplete column metadata.
+
 ---
 
 ## Ractor Safety
@@ -392,12 +406,16 @@ The tracker maintains relation metadata discovered during the stream and therefo
 
 ```ruby
 Pgoutput::Messages::Begin
+Pgoutput::Messages::Message
+Pgoutput::Messages::Origin
 Pgoutput::Messages::Relation
+Pgoutput::Messages::Type
 Pgoutput::Messages::Column
 Pgoutput::Messages::TupleValue
 Pgoutput::Messages::Insert
 Pgoutput::Messages::Update
 Pgoutput::Messages::Delete
+Pgoutput::Messages::Truncate
 Pgoutput::Messages::Commit
 ```
 
@@ -435,6 +453,50 @@ COVERAGE=true bundle exec rake test
 
 ---
 
+## Benchmarking
+
+Run the parser throughput benchmark:
+
+```bash
+ruby benchmark/parser_throughput.rb
+```
+
+The benchmark reports single-process parser throughput, relation-tracker throughput, and Ractor-parallel throughput. It is intended to show both the single-thread baseline and the Ruby 4 Ractor path this parser is designed to support. Relation-tracker scenarios can also compare the default Hash relation cache with an optional `Ratomic::Map` cache.
+
+Tune the run with environment variables:
+
+| Variable | Default | Description |
+| -------- | ------- | ----------- |
+| `PGOUTPUT_BENCH_ITERATIONS` | `100000` | Iterations per selected scenario. |
+| `PGOUTPUT_BENCH_WARMUP` | `1000` | Warmup iterations before timing. |
+| `PGOUTPUT_BENCH_RACTORS` | `2` or CPU count, whichever is lower | Number of Ractor workers for Ractor scenarios. |
+| `PGOUTPUT_BENCH_SCENARIOS` | `all` | Comma-separated scenarios: `binary`, `tracker_dml`, `tracker_mixed`, `ractor_binary`, `ractor_tracker`, or `all`. |
+| `PGOUTPUT_BENCH_RELATION_CACHE` | `hash` | Comma-separated relation-cache backends for tracker scenarios: `hash`, `ratomic`, or `all`. |
+
+Examples:
+
+```bash
+PGOUTPUT_BENCH_ITERATIONS=10000 ruby benchmark/parser_throughput.rb
+PGOUTPUT_BENCH_SCENARIOS=binary,tracker_mixed ruby benchmark/parser_throughput.rb
+PGOUTPUT_BENCH_RACTORS=4 PGOUTPUT_BENCH_SCENARIOS=ractor_binary,ractor_tracker ruby benchmark/parser_throughput.rb
+PGOUTPUT_BENCH_RELATION_CACHE=all PGOUTPUT_BENCH_SCENARIOS=tracker_mixed,ractor_tracker ruby benchmark/parser_throughput.rb
+```
+
+Sample Ruby 4 output:
+
+```text
+pgoutput-parser throughput
+iterations=1000 warmup=10 ractors=2 scenarios=tracker_mixed,ractor_tracker relation_cache=hash,ratomic ruby=4.0.5
+RelationTracker hash 7000 messages in 0.163s 42891 msg/s
+RelationTracker ratomic 7000 messages in 0.131s 53579 msg/s
+Ractor RelationTracker hash 14000 messages in 0.197s 71097 msg/s
+Ractor RelationTracker ratomic 14000 messages in 0.146s 96190 msg/s
+```
+
+Interpret the Ractor rows as aggregate throughput across workers. They are not a replacement for the single-process rows; they demonstrate the parser's shareable-message design under parallel execution.
+
+---
+
 ## Development
 
 Generate YARD documentation:
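The benchmark's environment-variable table maps naturally onto a small configuration reader. The Ruby sketch below is a hypothetical reconstruction of how a script could apply those documented defaults and validate the comma-separated lists; it is not the code in `benchmark/parser_throughput.rb`, and `bench_config`, `parse_bench_list`, `BENCH_SCENARIOS`, and `BENCH_CACHES` are illustrative names. It uses the standard library's `Etc.nprocessors` for the CPU count.

```ruby
require "etc"

# Scenario and cache names come from the README table; everything else in this
# block is a hypothetical sketch, not benchmark/parser_throughput.rb.
BENCH_SCENARIOS = %w[binary tracker_dml tracker_mixed ractor_binary ractor_tracker].freeze
BENCH_CACHES = %w[hash ratomic].freeze

# Splits a comma-separated list; "all" expands to every allowed name.
def parse_bench_list(value, allowed)
  names = value.split(",").map(&:strip).reject(&:empty?)
  return allowed if names.include?("all")

  unknown = names - allowed
  raise ArgumentError, "unknown names: #{unknown.join(", ")}" unless unknown.empty?

  names
end

# Reads the documented variables, falling back to the documented defaults.
def bench_config(env = ENV)
  default_ractors = [2, Etc.nprocessors].min # "2 or CPU count, whichever is lower"
  {
    iterations: Integer(env.fetch("PGOUTPUT_BENCH_ITERATIONS", "100000")),
    warmup: Integer(env.fetch("PGOUTPUT_BENCH_WARMUP", "1000")),
    ractors: Integer(env.fetch("PGOUTPUT_BENCH_RACTORS", default_ractors.to_s)),
    scenarios: parse_bench_list(env.fetch("PGOUTPUT_BENCH_SCENARIOS", "all"), BENCH_SCENARIOS),
    relation_cache: parse_bench_list(env.fetch("PGOUTPUT_BENCH_RELATION_CACHE", "hash"), BENCH_CACHES)
  }
end
```

In this sketch, unknown scenario or cache names raise `ArgumentError`, so a typo such as `tracker_mix` fails loudly rather than silently running fewer scenarios; the README does not say whether the real script behaves the same way.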
data/docs/glossary.md
ADDED
@@ -0,0 +1,145 @@
+# Glossary
+
+This glossary defines terms as they are used by `pgoutput-parser`. It focuses on
+the protocol parsing layer and avoids application-level decoding concepts that
+belong to higher CDC components.
+
+## Binary Value
+
+A tuple value sent by PostgreSQL with the `b` TupleData marker. The parser
+preserves the bytes exactly as received and does not interpret the PostgreSQL
+binary format.
+
+## BinaryParser
+
+The stateless entry point for parsing one pgoutput message payload. It decodes
+the wire-format tag and fields into an immutable `Pgoutput::Messages` object.
+
+## Column Flag
+
+The per-column flag byte in a Relation (`R`) message. PostgreSQL uses flag `1`
+to identify replica identity key columns.
+
+## Commit LSN
+
+The PostgreSQL log sequence number associated with a transaction commit. The
+parser exposes it as protocol metadata and does not use it for ordering logic.
+
+## CopyData Payload
+
+The PostgreSQL replication protocol frame body that contains one pgoutput
+message. This gem expects callers to provide that payload; it does not manage
+the PostgreSQL connection or replication stream.
+
+## DML Message
+
+A data manipulation message emitted by pgoutput for row or table changes. In
+this parser, DML messages are Insert (`I`), Update (`U`), Delete (`D`), and
+Truncate (`T`).
+
+## Immutable Message
+
+A parsed protocol object that has been made shareable with `Ractor`. Parsed
+messages can be passed across Ractors without sharing mutable parser state.
+
+## LSN
+
+Log sequence number. PostgreSQL uses LSN values to identify positions in the
+write-ahead log. This parser keeps LSN values as integers and does not convert
+them into PostgreSQL's textual `X/Y` notation.
+
+## Message Tag
+
+The first byte of a pgoutput payload. It identifies the message type, such as
+`R` for Relation, `I` for Insert, or `C` for Commit.
+
+## OID
+
+Object identifier. In this gem, OIDs are most commonly relation IDs and
+PostgreSQL type IDs exposed by Relation (`R`) and Type (`Y`) messages.
+
+## pgoutput
+
+PostgreSQL's built-in logical replication output plugin. It serializes
+transaction metadata, relation metadata, and row changes into a binary protocol.
+
+## Relation
+
+PostgreSQL table metadata sent by a Relation (`R`) message. It includes the
+relation ID, schema name, table name, replica identity setting, and columns.
+
+## RelationTracker
+
+The stream-order parser wrapper that remembers Relation (`R`) messages and uses
+them to annotate later DML tuple values with PostgreSQL type OIDs.
+
+## Replica Identity
+
+PostgreSQL metadata that determines what old-row data is available for Update
+and Delete messages. The parser exposes the replica identity byte from Relation
+messages and preserves old-key or old-full tuples when PostgreSQL sends them.
+
+## Ractor-Safe
+
+Safe to pass between Ruby Ractors. Parsed messages are Ractor-safe; parser and
+tracker instances remain mutable and should be scoped to one owner unless the
+caller supplies an explicitly Ractor-safe relation cache.
+
+## Ratomic
+
+An optional Ruby library that provides Ractor-oriented concurrent data
+structures. `pgoutput-parser` does not require Ratomic at runtime, but benchmark
+and development code can use it to evaluate Ractor-safe relation cache behavior.
+
+## Ratomic::Map
+
+A Ractor-safe map implementation from Ratomic. `RelationTracker` can use
+`Ratomic::Map` as its `relation_cache:` when callers need relation metadata in a
+cache that can be shared across Ractor-oriented execution designs.
+
+## Raw Tuple Value
+
+The uninterpreted bytes for a tuple column value. Text values and binary values
+are both kept as raw strings; NULL and unchanged TOAST markers have no raw
+payload.
+
+## Text Value
+
+A tuple value sent by PostgreSQL with the `t` TupleData marker. The parser
+preserves the text bytes and leaves type conversion to a decoder layer.
+
+## TOAST
+
+PostgreSQL's storage mechanism for large column values. In pgoutput TupleData,
+the `u` marker means an unchanged TOAST value was not resent.
+
+## Truncate
+
+A pgoutput DML message that reports table truncation. It contains relation IDs
+and option bits such as CASCADE and RESTART IDENTITY.
+
+## Tuple Arity
+
+The number of values in tuple data. `RelationTracker` validates DML tuple arity
+against cached Relation column metadata before annotating type OIDs.
+
+## TupleData
+
+The pgoutput structure that carries row values for Insert, Update, and Delete
+messages. Each value is marked as NULL, unchanged TOAST, text, or binary.
+
+## Type Decoding
+
+Conversion from PostgreSQL raw tuple bytes into application-level Ruby values.
+This gem intentionally does not perform type decoding; that responsibility
+belongs to a higher-level decoder component.
+
+## Type Modifier
+
+PostgreSQL column type metadata carried in Relation messages. For example,
+typmods can encode precision or length constraints for some PostgreSQL types.
+
+## WAL
+
+Write-ahead log. PostgreSQL logical replication streams changes derived from WAL
+through output plugins such as pgoutput.
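The glossary's TupleData, Text Value, Binary Value, TOAST, and Raw Tuple Value entries all describe one wire structure. Per PostgreSQL's logical replication message-format documentation, TupleData is an Int16 column count followed by one marker byte per column: `n` (NULL), `u` (unchanged TOAST, no payload), or `t`/`b` followed by an Int32 byte length and that many bytes. The Ruby sketch below decodes that layout with `String#unpack1` and keeps every value raw, mirroring the glossary's description. `ParsedValue` and `parse_tuple_data` are illustrative names, not `BinaryParser` internals.

```ruby
# Illustrative value object (not the gem's TupleValue).
# kind is :null, :unchanged_toast, :text, or :binary; raw is nil or a binary String.
ParsedValue = Data.define(:kind, :raw)

# Hypothetical decoder: reads TupleData starting at byte offset +pos+ of a
# binary String and returns [values, next_offset].
def parse_tuple_data(bytes, pos = 0)
  count = bytes.unpack1("s>", offset: pos) # Int16, big-endian column count
  raise ArgumentError, "truncated column count" if count.nil?
  pos += 2

  values = Array.new(count) do
    marker = bytes.byteslice(pos, 1)
    pos += 1
    case marker
    when "n" then ParsedValue.new(kind: :null, raw: nil)
    when "u" then ParsedValue.new(kind: :unchanged_toast, raw: nil)
    when "t", "b"
      length = bytes.unpack1("l>", offset: pos) # Int32, big-endian byte length
      raise ArgumentError, "truncated length" if length.nil?
      raise ArgumentError, "negative length #{length}" if length.negative?
      pos += 4
      raw = bytes.byteslice(pos, length)
      raise ArgumentError, "truncated value" if raw.nil? || raw.bytesize != length
      pos += length
      ParsedValue.new(kind: marker == "t" ? :text : :binary, raw: raw)
    else
      raise ArgumentError, "unknown TupleData marker #{marker.inspect}"
    end
  end

  [values, pos]
end

# Three columns: NULL, text "7", and a 4-byte binary value.
payload = ([3].pack("s>") + "n" + "t" + [1].pack("l>") + "7" +
           "b" + [4].pack("l>") + "\x00\x00\x00\x07").b
values, next_pos = parse_tuple_data(payload)
values.map(&:kind) # => [:null, :text, :binary]
```

Returning the next offset lets a caller continue with whatever follows the TupleData in a larger message, and the explicit negative-length and truncation checks correspond to the kinds of malformed input the 0.2.0 changelog says `BinaryParser` is now tested against.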
data/docs/index.md
ADDED
@@ -0,0 +1,239 @@
+# pgoutput-parser
+
+A high-performance, Ractor-safe PostgreSQL `pgoutput` logical replication protocol parser written in pure Ruby.
+
+`pgoutput-parser` parses PostgreSQL logical replication `CopyData` payloads into immutable protocol message objects.
+
+It focuses exclusively on PostgreSQL's `pgoutput` wire format:
+
+* Transaction boundaries
+* Relation metadata
+* DML message structure
+* Tuple payload markers
+* Raw tuple values
+
+It intentionally does **not** decode PostgreSQL values into application-level Ruby objects.
+
+That responsibility belongs to higher layers such as:
+
+```text
+pgoutput-decoder
+```
+
+---
+
+# Architecture
+
+```text
+PostgreSQL WAL
+|
+v
+CopyData payload
+|
+v
+pgoutput-parser
+|
+v
+Immutable protocol messages
+```
+
+The parser is the protocol layer of the CDC ecosystem.
+
+```text
+PostgreSQL
+|
+v
+pgoutput-parser
+|
+v
+pgoutput-decoder
+|
+v
+cdc-core
+```
+
+---
+
+# Quick Start
+
+```ruby
+require "pgoutput"
+
+stream = Pgoutput::RelationTracker.new
+
+stream.process(relation_payload)
+
+insert = stream.process(insert_payload)
+
+insert.relation_id
+# => 42
+
+insert.tuple.first.raw
+# => "7"
+
+insert.tuple.first.oid
+# => 23
+```
+
+---
+
+# Core Concepts
+
+## BinaryParser
+
+Parses a single `pgoutput` payload.
+
+```ruby
+message = Pgoutput::BinaryParser
+.new(payload)
+.parse
+```
+
+Use this when stream state is not required.
+
+---
+
+## RelationTracker
+
+Tracks PostgreSQL relation metadata across a replication stream.
+
+```ruby
+stream = Pgoutput::RelationTracker.new
+
+stream.process(relation_payload)
+
+message = stream.process(insert_payload)
+```
+
+This allows tuple values to be associated with PostgreSQL column OIDs.
+
+If tuple data does not match the cached relation column count, `RelationTracker`
+raises `Pgoutput::TupleArityError`.
+
+---
+
+# Supported Messages
+
+Current MVP support:
+
+* Begin (`B`)
+* Message (`M`)
+* Origin (`O`)
+* Relation (`R`)
+* Type (`Y`)
+* Insert (`I`)
+* Update (`U`)
+* Delete (`D`)
+* Truncate (`T`)
+* Commit (`C`)
+
+Supported across PostgreSQL 10–18.
+
+---
+
+# Tuple Values
+
+Tuple values are preserved exactly as PostgreSQL sends them.
+
+```ruby
+value.raw
+# => "2026-05-31 12:34:56+00"
+```
+
+No application-level decoding occurs.
+
+---
+
+# Binary Values
+
+Binary tuple values are preserved exactly as received.
+
+```ruby
+value.raw
+# => "\x00\x00\x00\x07".b
+```
+
+The parser does not interpret binary payloads.
+
+---
+
+# Ractor Safety
+
+All parsed protocol messages are immutable and shareable.
+
+```ruby
+message = stream.process(update_payload)
+
+Ractor.shareable?(message)
+# => true
+```
+
+Passing messages across Ractors:
+
+```ruby
+message = stream.process(update_payload)
+
+result = Ractor.new(message) do |update|
+update.new_tuple.map(&:raw)
+end.take
+```
+
+---
+
+# Non-Goals
+
+`pgoutput-parser` intentionally does not:
+
+* Open replication connections
+* Manage replication slots
+* Track WAL positions
+* Reconnect to PostgreSQL
+* Decode PostgreSQL values
+* Build CDC pipelines
+* Integrate with ActiveRecord
+
+Its sole responsibility is protocol parsing.
+
+---
+
+# Public API
+
+See the generated API documentation for:
+
+* `Pgoutput::BinaryParser`
+* `Pgoutput::RelationTracker`
+* `Pgoutput::Messages::*`
+
+---
+
+# Development
+
+Generate documentation:
+
+```bash
+bundle exec yard doc
+```
+
+Run tests:
+
+```bash
+bundle exec rake test
+```
+
+Run coverage:
+
+```bash
+COVERAGE=true bundle exec rake test
+```
+
+Run Steep:
+
+```bash
+bundle exec steep check
+```
+
+---
+
+# License
+
+MIT
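The Ractor Safety section in `docs/index.md` relies on parsed messages being deeply frozen. The stand-alone Ruby sketch below shows the same pattern without the gem: an immutable `Data`-based message is made shareable with `Ractor.make_shareable`, handed to a worker Ractor, and the worker's result is collected. `TupleValue` and `UpdateMessage` here are hypothetical stand-ins for the gem's message classes. Ruby 4.0's reworked Ractor API collects a block's return value with `Ractor#value`, while the pre-4.0 API used `Ractor#take`, so the sketch calls whichever one the running interpreter provides.

```ruby
# Hypothetical immutable message types standing in for the gem's classes.
TupleValue = Data.define(:raw, :oid)
UpdateMessage = Data.define(:relation_id, :new_tuple)

# Deep-freeze the message (Data instance, Array, and Strings) so it is shareable.
message = Ractor.make_shareable(
  UpdateMessage.new(
    relation_id: 42,
    new_tuple: [TupleValue.new(raw: "7", oid: 23), TupleValue.new(raw: "hello", oid: 25)]
  )
)

# Shareable arguments are passed to the worker by reference instead of being copied.
worker = Ractor.new(message) do |update|
  update.new_tuple.map { |value| value.raw }
end

# Ractor#value on Ruby 4.0+, Ractor#take on older Rubies.
raws = worker.respond_to?(:value) ? worker.value : worker.take
```

Making the message shareable before sending it is what lets a single parsed object be read by several Ractors without copying; a mutable tracker, by contrast, stays with one owner, as the glossary's Ractor-Safe entry describes.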