dms-parser 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 8c14f1d20d52d7187a1d070cd97a73489c7ecf26d33f36e1b7caec2800e5ebf7
4
+ data.tar.gz: c23cd13f0f751281b72245f2ae5c4193b705c44ffc7154208146d2af06535bfd
5
+ SHA512:
6
+ metadata.gz: e54215e05be86f064095579e879b0a45518b2b221dfea42d0c2314bce3f45863ba2be8098f460ebd82acd5e1387339b280ddb7718271f3f3fd0e61c42fec2999
7
+ data.tar.gz: aa9925ba49da2e63580c171a6d08f91db92ad5b4a6a5623fcca75ac8a26ef21102935ebed7323069f16cc3d03f01493f98a4cc5636fb7625ed492d402c7523a5
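The `checksums.yaml` hunk above records SHA-256 and SHA-512 digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). The following is a minimal sketch for re-checking them with Ruby's standard `digest` and `yaml` libraries. It assumes the `.gem` (a plain tar) has already been unpacked and `checksums.yaml.gz` gunzipped next to the archives. `verify_gem_checksums` is a hypothetical helper, not a RubyGems API.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not part of RubyGems): re-check the digests listed in
# an unpacked gem's checksums.yaml against the archives in the same directory.
# Returns a Hash such as { "SHA256 data.tar.gz" => true, ... }.
def verify_gem_checksums(dir)
  checksums = YAML.safe_load(File.read(File.join(dir, "checksums.yaml")))
  digesters = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }

  results = {}
  checksums.each do |algorithm, files|
    digester = digesters.fetch(algorithm) # KeyError for an unexpected algorithm
    files.each do |name, expected|
      actual = digester.file(File.join(dir, name)).hexdigest
      results["#{algorithm} #{name}"] = (actual == expected.to_s)
    end
  end
  results
end

# Usage, after `tar -xf dms-parser-0.5.0.gem` and `gunzip checksums.yaml.gz`
# in the current directory:
#   verify_gem_checksums(".").each { |entry, ok| puts "#{entry}: #{ok ? "OK" : "MISMATCH"}" }
```

Every algorithm and file listed in the YAML gets its own entry. A single mismatch therefore points at the specific archive that changed, not just at the gem as a whole.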
data/LICENSE-APACHE ADDED
@@ -0,0 +1,15 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ Licensed under the Apache License, Version 2.0 (the "License");
6
+ you may not use this file except in compliance with the License.
7
+ You may obtain a copy of the License at
8
+
9
+ http://www.apache.org/licenses/LICENSE-2.0
10
+
11
+ Unless required by applicable law or agreed to in writing, software
12
+ distributed under the License is distributed on an "AS IS" BASIS,
13
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ See the License for the specific language governing permissions and
15
+ limitations under the License.
data/LICENSE-MIT ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Filip Lopes
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,166 @@
1
+ <p align="center"><a href="https://gitlab.com/flo-labs/pub/dms"><img src="assets/logo.png" alt="DMS" width="120"></a></p>
2
+
3
+ # dms-rb
4
+
5
+ Ruby parser for **[DMS](https://gitlab.com/flo-labs/pub/dms)**, a data syntax with strong typing, ordered maps,
6
+ multi-line heredocs, and front-matter metadata.
7
+
8
+ Two gems live in this repo, both with the same Ruby API and, datetimes aside, the same value shape:
9
+
10
+ | gem | implementation | when to use |
11
+ | ------ | ------------------------------------ | ------------------------------------ |
12
+ | `dms` | pure Ruby | portable; no C toolchain required |
13
+ | `dms-c`| C extension wrapping the dms-c decoder | hot paths; ~2× faster than pure Ruby |
14
+
15
+ ## Install
16
+
17
+ ```sh
18
+ gem install dms # pure Ruby
19
+ gem install dms-c # native (C) extension, same API
20
+ ```
21
+
22
+ ## Usage
23
+
24
+ ```ruby
25
+ require "dms" # or: require "dms_c"
26
+
27
+ src = File.read("config.dms")
28
+
29
+ # Body only: returns the decoded root value; front matter and comments are dropped.
30
+ body = Dms.decode(src) # or: DmsC.decode(src)
31
+
32
+ # Full document (preserves comments + literal forms for encode round-trip).
33
+ doc = Dms.decode_document(src)
34
+ doc.meta # Hash | nil — nil when there is no `+++` block
35
+ doc.body # decoded root value
36
+ doc.comments # Array of Dms::AttachedComment
37
+ doc.original_forms # Array of [path, Dms::OriginalLiteral]
38
+
39
+ # Re-emit DMS source.
40
+ output = Dms.encode(doc)
41
+ ```
42
+
43
+ > **Migrating from `parse`/`to_dms`?** SPEC v0.14 renamed the canonical
44
+ > entry points. The old names (`Dms.parse`, `Dms.parse_document`,
45
+ > `Dms.parse_lite`, `Dms.to_dms`, `Dms.to_dms_lite`, and the matching
46
+ > `DmsC.parse*`) still work as deprecated aliases — each emits a
47
+ > one-shot warning on first call, then forwards to the canonical name.
48
+ > They will be removed in the next release.
49
+
50
+ Tables are insertion-ordered Hashes (Ruby Hashes preserve insertion
51
+ order since 1.9). Lists are Arrays. Datetimes are wrapped types: the
52
+ pure module returns `Dms::LocalDate` / `Dms::LocalTime` /
53
+ `Dms::LocalDateTime` / `Dms::OffsetDateTime` class instances; the C
54
+ extension returns plain `{ __dms_type:, value: }` hashes with the same
55
+ data. Encoders that detect via `__dms_type` + `value` work unchanged
56
+ across both gems.
57
+
58
+ ## Working with comments and heredocs
59
+
60
+ DMS preserves comments through decode → mutate → re-emit (SPEC
61
+ §Comments). Attach a comment to a value *after* decoding and have it
62
+ round-trip through `Dms.encode`:
63
+
64
+ ```ruby
65
+ require "dms"
66
+
67
+ doc = Dms.decode_document("db:\n port: 8080\n")
68
+
69
+ # Mutate a value in place.
70
+ doc.body["db"]["port"] = 5432
71
+
72
+ # Attach a leading line comment to db.port.
73
+ doc.comments << Dms::AttachedComment.new(
74
+   Dms::Comment.new("# bumped after LB change", :line),
75
+   :leading,
76
+   ["db", "port"],
77
+ )
78
+
79
+ puts Dms.encode(doc)
80
+ ```
81
+
82
+ ### Forcing a heredoc on emit
83
+
84
+ Strings parse and re-emit in their source form. To switch a basic-quoted
85
+ string to a heredoc (or to construct one from scratch), append an
86
+ `OriginalLiteral.string` record to `doc.original_forms` keyed by the
87
+ value's path:
88
+
89
+ ```ruby
90
+ doc.body["db"]["greeting"] = "Hello, friend.\nWelcome aboard.\n"
91
+
92
+ doc.original_forms << [
93
+   ["db", "greeting"],
94
+   Dms::OriginalLiteral.string(
95
+     Dms::StringForm.heredoc(
96
+       :basic_triple,   # or :literal_triple for '''
97
+       nil,             # nil = unlabeled (terminator is """ / ''')
98
+       [],              # _trim(...), _fold_paragraphs(), …
99
+     ),
100
+   ),
101
+ ]
102
+ ```
103
+
104
+ Round-trip rules (SPEC §Round-trip semantics): comments stick to
105
+ *still-present* nodes; deleting a node drops its comments; newly
106
+ inserted nodes start with no comments. The first `original_forms`
107
+ entry per path wins, so override a parser-recorded form by replacing
108
+ rather than appending if the key is already present.
109
+
110
+ ## Performance
111
+
112
+ 50,000-key flat document (~700 KB), best-of-5, startup-subtracted,
113
+ Ruby 3.3 on Windows 11:
114
+
115
+ | tier | DMS gem | time | JSON peer | time | YAML peer | time | DMS / JSON | DMS / YAML |
116
+ |-------------|-----------|-----------|-----------|----------|-----------|-----------|------------|------------|
117
+ | pure Ruby | `dms` | 115.8 ms | n/a | — | n/a | — | n/a | n/a |
118
+ | native (C) | `dms-c` | 56.5 ms | `json` | 21.4 ms | `psych` | 260.4 ms | 2.63× | **0.22× — DMS ~4.6× faster** |
119
+
120
+ Ruby's stdlib `json` and `psych` are both C-backed; there's no
121
+ widely used pure-Ruby alternative for either, so JSON and YAML peers
122
+ only appear in the native (C) tier (same situation as Node). The pure-Ruby
123
+ DMS port is reported on its own — no fair pure-vs-pure peer to
124
+ compare against.
125
+
126
+ The C extension is ~2× faster than pure Ruby; against C-backed peers
127
+ DMS costs ~2.6× as much as JSON (the price of carrying comments, ordered
128
+ keys, and source-form metadata) and runs ~4.6× faster than `psych` (libyaml).
129
+
130
+ Reproduce with:
131
+
132
+ ```sh
133
+ ruby bench/run_formats.rb
134
+ ```
135
+
136
+ ## Build & test
137
+
138
+ ```sh
139
+ # pure gem:
140
+ bundle install
141
+ bundle exec rake test
142
+
143
+ # native (C) gem:
144
+ cd dms-c/ext/dms_c && ruby extconf.rb && make
145
+ ```
146
+
147
+ The C-extension build needs RubyInstaller's MSYS2 DevKit on Windows or a
148
+ standard `cc` + `make` on Unix; mkmf handles the platform detection.
149
+
150
+ ## Conformance
151
+
152
+ The fixture corpus lives in
153
+ [dms-tests](https://gitlab.com/flo-labs/pub/dms-tests) (4500+ pairs).
154
+ Clone it once as a sibling:
155
+
156
+ ```sh
157
+ cd ..
158
+ git clone https://gitlab.com/flo-labs/pub/dms-tests.git
159
+ ```
160
+
161
+ The `bin/dms-encoder` script reads DMS from stdin and writes tagged JSON to
162
+ stdout, matching the format the conformance runner consumes.
163
+
164
+ ## License
165
+
166
+ Dual-licensed: MIT or Apache-2.0, your choice.
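The README's round-trip rules say that the first `original_forms` entry for a path wins. To override a form the parser already recorded, you must therefore replace that entry rather than append a new one. Below is a minimal sketch of that replace-or-append step. It assumes only what the README states: `doc.original_forms` is an Array of `[path, form]` pairs. `upsert_original_form` and `effective_form` are hypothetical helpers, not part of the gem's API. Plain symbols stand in for `Dms::OriginalLiteral` records so the snippet runs without the gem installed.

```ruby
# Hypothetical helper (not part of dms-rb): make `form` the effective entry for
# `path` by replacing the first existing [path, form] pair, or appending a new
# pair when the path is absent. Mutates and returns `forms`.
def upsert_original_form(forms, path, form)
  index = forms.index { |(existing_path, _form)| existing_path == path }
  if index
    forms[index] = [path, form]
  else
    forms << [path, form]
  end
  forms
end

# Models the README's "first entry per path wins" rule for a lookup.
def effective_form(forms, path)
  entry = forms.find { |(existing_path, _form)| existing_path == path }
  entry && entry[1]
end

# Symbols stand in for Dms::OriginalLiteral records.
appended = [[["db", "greeting"], :basic_quoted]]
appended << [["db", "greeting"], :heredoc]
effective_form(appended, ["db", "greeting"]) # => :basic_quoted (the append is shadowed)

replaced = [[["db", "greeting"], :basic_quoted]]
upsert_original_form(replaced, ["db", "greeting"], :heredoc)
effective_form(replaced, ["db", "greeting"]) # => :heredoc
```

With the gem loaded, you would pass `doc.original_forms` as `forms` and the `Dms::OriginalLiteral.string(...)` record from the heredoc example as `form`.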
data/bin/dms-encoder ADDED
@@ -0,0 +1,234 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ # DMS -> tagged-JSON encoder for the conformance suite.
5
+ # Reads DMS source from stdin, writes tagged JSON to stdout.
6
+
7
+ $LOAD_PATH.unshift(File.expand_path("../lib", __dir__))
8
+
9
+ require "json"
10
+ require "dms"
11
+
12
+ # Force UTF-8 I/O: tagged JSON is written as raw UTF-8 (the equivalent of
13
+ # Python's ensure_ascii=False). On Windows, binmode disables CRLF translation.
14
+ $stdout.set_encoding("UTF-8")
15
+ $stdin.set_encoding("UTF-8")
16
+ if RUBY_PLATFORM =~ /mswin|mingw/
17
+   $stdout.binmode
18
+ end
19
+
20
+ # Match Rust ryu / Python repr shortest float. Ruby's Float#to_s is shortest
21
+ # round-trip, but writes exponents as "1.0e+100"; strip the "+".
22
+ def shortest_float(v)
23
+   s = v.to_s
24
+   if s.include?("e")
25
+     mantissa, exp = s.split("e", 2)
26
+     exp = exp[1..] if exp.start_with?("+")
27
+     "#{mantissa}e#{exp}"
28
+   elsif !s.include?(".")
29
+     s + ".0"
30
+   else
31
+     s
32
+   end
33
+ end
34
+
35
+ def tag(v)
36
+   case v
37
+   when true
38
+     { "type" => "bool", "value" => "true" }
39
+   when false
40
+     { "type" => "bool", "value" => "false" }
41
+   when Integer
42
+     { "type" => "integer", "value" => v.to_s }
43
+   when Float
44
+     s =
45
+       if v.nan?
46
+         "nan"
47
+       elsif v.infinite?
48
+         v.infinite? > 0 ? "inf" : "-inf"
49
+       else
50
+         shortest_float(v)
51
+       end
52
+     { "type" => "float", "value" => s }
53
+   when Dms::OffsetDateTime
54
+     { "type" => "offset-datetime", "value" => v.value }
55
+   when Dms::LocalDateTime
56
+     { "type" => "local-datetime", "value" => v.value }
57
+   when Dms::LocalDate
58
+     { "type" => "local-date", "value" => v.value }
59
+   when Dms::LocalTime
60
+     { "type" => "local-time", "value" => v.value }
61
+   when String
62
+     { "type" => "string", "value" => v }
63
+   when Hash
64
+     out = {}
65
+     v.each { |k, x| out[k] = tag(x) }
66
+     out
67
+   when Array
68
+     v.map { |x| tag(x) }
69
+   else
70
+     raise "cannot encode #{v.class}"
71
+   end
72
+ end
73
+
74
+ # Walks ARGV looking for `--name N` and returns N (or default if missing /
75
+ # invalid). Mirror of the Rust encoder's parse_usize_flag for protocol
76
+ # parity. See dms-tests/TESTS.md §3a.
77
+ def parse_int_flag(args, name, default = 0)
78
+   i = args.index(name)
79
+   return default if i.nil? || i + 1 >= args.length
80
+
81
+   Integer(args[i + 1], 10) # base 10, so "010" parses as 10 rather than octal 8
82
+ rescue ArgumentError, TypeError
83
+   default
84
+ end
85
+
86
+ def parse_str_flag(args, name)
87
+   i = args.index(name)
88
+   return nil if i.nil? || i + 1 >= args.length
89
+
90
+   args[i + 1]
91
+ end
92
+
93
+ # Build the tagged-JSON output string for `doc` (matches default path).
94
+ def emit_tagged_json(doc)
95
+   body_tagged = tag(doc.body)
96
+   out =
97
+     if doc.meta.nil?
98
+       body_tagged
99
+     else
100
+       meta_tagged = {}
101
+       doc.meta.each { |k, v| meta_tagged[k] = tag(v) }
102
+       { "_meta" => meta_tagged, "_body" => body_tagged }
103
+     end
104
+   # Match Python's `json.dumps(out, indent=2, ensure_ascii=False) + "\n"`.
105
+   JSON.pretty_generate(out, indent: "  ") + "\n"
106
+ end
107
+
108
+ # Dispatch full vs lite encode. Errors (e.g. a Document the chosen encoder
109
+ # refuses) propagate to the caller.
110
+ def emit_dms(doc, lite)
111
+   if lite
112
+     # Lite emit (canonical form).
113
+     Dms.encode_lite(doc)
114
+   else
115
+     Dms.encode(doc)
116
+   end
117
+ end
118
+
119
+ # Bench mode: parse stdin once (untimed, already done by caller), then loop
120
+ # the emit step N times printing one `iter <i> <ns>` per call. Uses
121
+ # Process::CLOCK_MONOTONIC, read in nanoseconds (resolution is platform-dependent).
122
+ # See dms-tests/TESTS.md §3a.
123
+ def run_bench(doc, roundtrip, lite, iters, warmup)
124
+   # Warmup: a few untimed emits so allocator / JIT / icache settle.
125
+   warmup.times do
126
+     if roundtrip
127
+       _ = emit_dms(doc, lite)
128
+     else
129
+       _ = emit_tagged_json(doc)
130
+     end
131
+   end
132
+   out = $stdout
133
+   iters.times do |i|
134
+     t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
135
+     s =
136
+       if roundtrip
137
+         emit_dms(doc, lite)
138
+       else
139
+         emit_tagged_json(doc)
140
+       end
141
+     elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond) - t0
142
+     # Keep `s` referenced so the optimizer can't elide the work.
143
+     s.length if s
144
+     out.printf("iter %d %d\n", i, elapsed)
145
+   end
146
+   out.flush
147
+ end
148
+
149
+ args = ARGV
150
+ roundtrip = args.include?("--roundtrip")
151
+
152
+ # --tier=1: tier-1 wrapper JSON output.
153
+ tier_flag = args.include?("--tier=1") ? 1 : 0
154
+
155
+ # --ignore-order: opts into the unordered parse mode (SPEC §Unordered
156
+ # tables). dms-rb honours this at runtime by building body tables as
157
+ # `Dms::UnorderedHash` with shuffled key order; full-mode `Dms.encode` then
158
+ # refuses such Documents (use `--mode lite` for canonical emit). Also
159
+ # enabled by a non-empty DMS_IGNORE_ORDER env var.
160
+ ignore_order = args.include?("--ignore-order") || !ENV["DMS_IGNORE_ORDER"].to_s.empty?
161
+
162
+ # --mode {full,lite} (default: full). In --roundtrip mode this picks
163
+ # `Dms.encode` vs `Dms.encode_lite`. Falls back to DMS_MODE env var. In
164
+ # tagged-JSON mode it only selects the decoder (the output shape is fixed
165
+ # by the conformance contract).
166
+ mode_arg = parse_str_flag(args, "--mode") || ENV["DMS_MODE"] || "full"
167
+ case mode_arg
168
+ when "full"
169
+   lite = false
170
+ when "lite"
171
+   lite = true
172
+ else
173
+   $stderr.puts "0:0: --mode must be full|lite, got #{mode_arg.inspect}"
174
+   exit 1
175
+ end
176
+
177
+ # Bench mode flags. See dms-tests/TESTS.md §3a.
178
+ bench_iters = parse_int_flag(args, "--bench-iters", 0)
179
+ bench_warmup = parse_int_flag(args, "--bench-warmup", 3)
180
+
181
+ src = $stdin.read
182
+
183
+ # Empty / whitespace-only stdin → exit 0 (startup probe used by bench harness).
184
+ if src.nil? || src.strip.empty?
185
+   exit 0
186
+ end
187
+
188
+ begin
189
+   # Tier-1 decode path: parse with decorator awareness, emit wrapper JSON.
190
+   if tier_flag == 1
191
+     doc_t1 = Dms::Tier1.parse(src)
192
+     out = Dms::Tier1.emit_t1_json(doc_t1, method(:tag))
193
+     $stdout.write(JSON.pretty_generate(out, indent: "  ") + "\n")
194
+     exit 0
195
+   end
196
+
197
+   # Decode mode dispatch: `--ignore-order` opts into unordered backing
198
+   # (SPEC §"Unordered tables"); `--mode lite` (or DMS_MODE=lite) routes
199
+   # through the lite decoder even on the tagged-JSON conformance path.
200
+   # Tagged JSON is mode-invariant (same value tree); the dispatch lets
201
+   # the conformance harness exercise either decoder under one driver.
202
+   doc =
203
+     if lite && ignore_order
204
+       Dms.decode_lite_document_unordered(src)
205
+     elsif lite
206
+       Dms.decode_lite_document(src)
207
+     elsif ignore_order
208
+       Dms.decode_document_unordered(src)
209
+     else
210
+       Dms.decode_document(src)
211
+     end
212
+ rescue Dms::DecodeError => e
213
+   warn e.message
214
+   exit 1
215
+ rescue StandardError => e
216
+   warn "0:0: #{e.message}"
217
+   exit 1
218
+ end
219
+
220
+ if bench_iters > 0
221
+   run_bench(doc, roundtrip, lite, bench_iters, bench_warmup)
222
+   exit 0
223
+ end
224
+
225
+ begin
226
+   if roundtrip
227
+     $stdout.write(emit_dms(doc, lite))
228
+   else
229
+     $stdout.write(emit_tagged_json(doc))
230
+   end
231
+ rescue StandardError => e
232
+   warn "0:0: #{e.message}"
233
+   exit 1
234
+ end
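In bench mode (`--bench-iters N`, preceded by `--bench-warmup` untimed runs), `run_bench` writes one `iter <i> <ns>` line per timed emit to stdout. The harness that consumes these lines is specified in `dms-tests/TESTS.md §3a` and is not part of this package. The sketch below is a hypothetical consumer that assumes nothing beyond that line format and reports the best and median iteration in milliseconds. `summarize_bench` is an illustrative name, not a dms-tests API.

```ruby
# Hypothetical consumer of `dms-encoder --bench-iters N` output.
# Each timed iteration is reported as "iter <index> <nanoseconds>".
ITER_LINE = /\Aiter (\d+) (\d+)\z/

def summarize_bench(output)
  # Keep only well-formed iter lines; anything else is ignored.
  times_ns = output.each_line.filter_map do |line|
    m = ITER_LINE.match(line.chomp)
    m && Integer(m[2], 10)
  end
  raise ArgumentError, "no iter lines found" if times_ns.empty?

  sorted = times_ns.sort
  mid = sorted.length / 2
  median = sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0

  {
    iters: sorted.length,
    best_ms: sorted.first / 1_000_000.0,
    median_ms: median / 1_000_000.0
  }
end

sample = "iter 0 1500000\niter 1 1200000\niter 2 1300000\n"
summarize_bench(sample) # => {iters: 3, best_ms: 1.2, median_ms: 1.3}
```

Reporting the minimum alongside the median is one common choice. The minimum approximates the uncontended cost of an emit, and the median shows how noisy the runs were.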