encode_m 2.0.0 → 3.0.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +37 -0
- data/README.md +139 -41
- data/encode_m.gemspec +27 -16
- data/lib/encode_m/composite.rb +105 -0
- data/lib/encode_m/decoder.rb +59 -5
- data/lib/encode_m/numeric.rb +24 -11
- data/lib/encode_m/string.rb +85 -0
- data/lib/encode_m/version.rb +2 -2
- data/lib/encode_m.rb +47 -9
- data/logo.png +0 -0
- metadata +17 -15
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 97b4b00c071667466ef61b65805c3143abcbc42720f629b1a0ee30f9fef0d200
+  data.tar.gz: 07e37e38818a96b8ba30330422d6ec32f31c33694a3c36a3515433b91cc6994e
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b8cfd69a708969bdc2e2f16940f9597d12a4fd1b83021c60eaa82e1101626ee0d036d96433c3e930399c152022371dce01c4880d831ae39552135fa5d1db4ae7
+  data.tar.gz: 10146bf5686a83fa4036586f2380c46c29d9edd7a7808488646c7994f1a8ff0305eef3ace164aa1e9296cc77f155162e9674ed9ac5fce7cbb06ad2d3c2002ef2
data/CHANGELOG.md
CHANGED
@@ -5,6 +5,43 @@ All notable changes to the EncodeM project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [3.0.0] - 2025-01-03
+
+### 🎉 Major Features
+- **Complete M language subscript support!** Now includes strings and composite keys
+- String encoding with proper `0xFF` prefix and escape sequences
+- Composite keys for hierarchical data structures (e.g., `M("users", 42, "email")`)
+- Full compatibility with YottaDB/GT.M subscript encoding
+
+### Added
+- `EncodeM::String` class for string subscripts
+- `EncodeM::Composite` class for multi-component keys
+- Support for variadic arguments in `M()` function
+- Automatic type detection (numeric strings parse as numbers)
+- Comprehensive test suite for string and composite features
+- Support for nil values (converted to empty strings)
+
+### Changed
+- Float values are now truncated to integers (M language only supports integer encoding)
+- `M()` function can now accept multiple arguments for composite keys
+- Decoder enhanced to handle strings and composite keys
+- Division operations now perform integer division
+
+### Examples
+```ruby
+# Strings
+M("Hello") # String encoding
+M("") # Empty string
+
+# Composite keys
+M("users", 42, "email") # Database-style keys
+M(2025, 1, 15) # Date as composite
+M("cache", namespace, key) # Cache keys
+
+# Mixed types
+M("user", 123, "posts", -1) # All types work together
+```
+
 ## [2.0.0] - 2025-09-03
 
 ### Changed
data/README.md
CHANGED
@@ -3,15 +3,25 @@
 [](https://badge.fury.io/rb/encode_m)
 [](LICENSE)
 
-
+**🎉 Version 3.0: Complete M language subscript encoding - numbers, strings, and composite keys!**
+
+Bringing the power of M language (MUMPS) subscript encoding to Ruby. Build hierarchical database keys like `M("users", 42, "email")` with perfect sort order. Based on YottaDB/GT.M's 40-year production-tested algorithm.
 
 ## Why You Should Use EncodeM
 
-
+**Version 3.0 brings complete M language subscript support!** Not just numbers anymore - now you can encode strings and build powerful composite keys for hierarchical data structures.
+
+If you're building anything that stores data in a database or key-value store, EncodeM is a game-changer. The magic is simple but powerful: when you encode values with EncodeM, the resulting byte strings maintain perfect sort order. This means your database can compare and sort **without ever decoding** - just pure byte comparison like strcmp().
 
-
+### What's New in v3.0:
+- **String encoding**: Strings sort correctly after all numbers
+- **Composite keys**: Build hierarchical keys like `M("users", 42, "profile", "email")`
+- **Full M compatibility**: Generate YottaDB/GT.M compatible subscripts
+- **Mixed types**: Combine numbers, strings, and more in a single key
 
-
+Imagine building a user database where `M("users", userId, "posts", postId)` creates perfectly sortable hierarchical keys. Or time-series data with `M(2025, 1, 15, sensorId, "temperature")`. The encoding ensures all components sort correctly - numbers before strings, maintaining hierarchical order.
+
+This is production-tested technology - literally the same encoding that's been processing medical records and financial transactions since the 1980s in YottaDB/GT.M systems. Epic (70% of US hospitals) and VistA use this exact algorithm for their global arrays. Drop it in, encode your data, and watch your database operations get faster.
 
 ## About the M Language Heritage
 
@@ -19,11 +29,13 @@ The M language (formerly MUMPS - Massachusetts General Hospital Utility Multi-Pr
 
 ## Key Features
 
-- **
+- **Complete M Language Support**: Numbers, strings, and composite keys
+- **Sortable Byte Encoding**: All types encode to bytes that sort correctly without decoding
+- **Hierarchical Keys**: Build multi-component database keys with perfect sort order
 - **Production-Tested**: Algorithm proven in healthcare and finance for 40 years
-- **
-- **Memory Efficient**: Compact representation
-- **Database-Friendly**: Perfect for
+- **YottaDB Compatible**: Generate valid YottaDB/GT.M subscripts
+- **Memory Efficient**: Compact representation for all data types
+- **Database-Friendly**: Perfect for B-tree indexes and key-value stores
 
 ## Installation
 
@@ -41,40 +53,84 @@ $ gem install encode_m
 
 ## Usage
 
+### Numbers (Classic M encoding)
 ```ruby
 require 'encode_m'
 
 # Create numbers using the M() convenience method
 a = M(42)
-b = M(3.14)
+b = M(3.14) # Floats are truncated to integers
 c = M(-100)
 
 # Arithmetic works naturally
-sum = a + b # =>
-product = a * M(2) # =>
+sum = a + b # => M(45)
+product = a * M(2) # => M(84)
 
 # The magic: encoded bytes sort correctly!
 numbers = [M(5), M(-10), M(0), M(100), M(-5)]
 sorted = numbers.sort # Correctly sorted: -10, -5, 0, 5, 100
 
 # Perfect for databases - compare without decoding
-encoded_a = a.to_encoded # => "\xBD\
+encoded_a = a.to_encoded # => "\xBD\x2B"
 encoded_b = b.to_encoded # => "\xBC\x04"
-encoded_a < encoded_b # => false (42 > 3
+encoded_a < encoded_b # => false (42 > 3)
+```
+
+### Strings (New in v3.0!)
+```ruby
+# Encode strings - they sort after all numbers
+name = M("Alice")
+empty = M("") # Empty string
+
+# M language ordering: all numbers < all strings
+M(999999) < M("0") # => true
 
-#
-
+# String comparison maintains byte order
+M("apple") < M("banana") # => true
+```
+
+### Composite Keys (New in v3.0!)
+```ruby
+# Build hierarchical database keys
+user_email = M("users", 42, "email")
+user_name = M("users", 42, "name")
+user_post = M("users", 42, "posts", 1)
+
+# Perfect for time-series data
+event = M(2025, 1, 15, 14, 30, "sensor_123", "temperature")
+
+# Keys sort hierarchically
+keys = [
+  M("users", 2, "email"),
+  M("users", 1, "name"),
+  M("users", 1, "email"),
+  M("users", 2, "name")
+].sort
+# Result order:
+# ["users", 1, "email"]
+# ["users", 1, "name"]
+# ["users", 2, "email"]
+# ["users", 2, "name"]
+
+# Access components
+user_email[0].value # => "users"
+user_email[1].value # => 42
+user_email.to_a # => ["users", 42, "email"]
+
+# Decode composite keys
+encoded = user_email.to_encoded
+decoded = EncodeM.decode_composite(encoded) # => ["users", 42, "email"]
 ```
 
 ## Format Specification
 
-EncodeM uses the M language
+EncodeM uses the complete M language subscript encoding that guarantees lexicographic byte ordering matches logical ordering for all data types.
 
 ### Encoding Structure
 
 ```
-0x00 KEY_DELIMITER (
-0x01 STR_SUB_ESCAPE (escape
+0x00 KEY_DELIMITER (separates components in composite keys)
+0x01 STR_SUB_ESCAPE (escape byte for strings)
 ------- NEGATIVE NUMBERS (decreasing magnitude) -------
 0x3B -999,999,999 to -100,000,000 (9 digits)
 0x3C -99,999,999 to -10,000,000 (8 digits)
@@ -97,28 +153,46 @@ EncodeM uses the M language numeric encoding that guarantees lexicographic byte
 0xC2 1,000,000 to 9,999,999 (7 digits)
 0xC3 10,000,000 to 99,999,999 (8 digits)
 0xC4 100,000,000 to 999,999,999 (9 digits)
-
+------- STRINGS -------
+0xFF STR_SUB_PREFIX (all strings start with this)
 ```
 
+### Numeric Encoding
 - **First byte**: Determines sign and magnitude range
 - **Following bytes**: Encode digit pairs (00-99) using lookup tables
 - **Terminator**: Negative numbers end with `0xFF` to maintain sort order
 
+### String Encoding
+- **Prefix**: All strings start with `0xFF`
+- **Content**: UTF-8 bytes of the string
+- **Escaping**: Special bytes are escaped:
+  - `0x00` → `0x01 0xFF`
+  - `0x01` → `0x01 0xFE`
+
+### Composite Key Encoding
+- **Structure**: Components separated by `0x00` (KEY_DELIMITER)
+- **Ordering**: Maintains hierarchical sort order
+- **Example**: `M("users", 42)` → `[0xFF "users" 0x00 0xBD 0x2B]`
+
 ### Encoding Examples
 
-|
-
-| -1000 | `
-| -
-| -10 | `42 EE FF` | 2-digit negative, mantissa, terminator |
-| -1 | `43 FD FF` | 1-digit negative, mantissa, terminator |
+| Value | Hex Bytes | Description |
+|-------|-----------|-------------|
+| -1000 | `3F FD EF FF` | 4-digit negative |
+| -1 | `43 FB FF` | 1-digit negative |
 | 0 | `80` | Zero (single byte) |
-| 1 | `BC 02` | 1-digit positive
-|
-|
-|
-
-
+| 1 | `BC 02` | 1-digit positive |
+| 42 | `BD 2B` | 2-digit positive |
+| 1000 | `BF 0B 01` | 4-digit positive |
+| "Hello" | `FF 48 65 6C 6C 6F` | String with 0xFF prefix |
+| "" | `FF` | Empty string |
+| ["users", 42] | `FF 75 73 65 72 73 00 BD 2B` | Composite key |
+| [2025, 1, 15] | `BF 14 19 00 BC 02 00 BD 10` | Date as composite |
+
+The encoding ensures:
+- `bytewise_compare(encode(x), encode(y)) == logical_compare(x, y)`
+- All numbers sort before all strings
+- Composite keys maintain hierarchical order
 
 ## Ordering Guarantees
 
@@ -138,13 +212,15 @@ This enables direct byte comparison in databases without decoding.
 
 | Method | Description | Example |
 |--------|-------------|---------|
-| `M(value)` | Create
-| `
-| `EncodeM.
-|
-|
-|
-| `#
+| `M(value)` | Create encoded value | `M(42)`, `M("hello")` |
+| `M(*values)` | Create composite key | `M("users", 42, "email")` |
+| `EncodeM.new(value)` | Create encoded value | `EncodeM.new(42)` |
+| `EncodeM.new(*values)` | Create composite key | `EncodeM.new("users", 42)` |
+| `EncodeM.decode(bytes)` | Decode bytes to value | `EncodeM.decode("\xBD\x2B")` → `42` |
+| `EncodeM.decode_composite(bytes)` | Decode composite key | Returns array of components |
+| `#to_encoded` | Get encoded byte string | `M(42).to_encoded` → `"\xBD\x2B"` |
+| `#value` | Get original value | `M(42).value` → `42` |
+| `#to_a` | Get composite components | `M("a", 1).to_a` → `["a", 1]` |
 
 ### Arithmetic Operations
 
@@ -167,21 +243,43 @@ This enables direct byte comparison in databases without decoding.
 | `>=` | Greater or equal | `M(10) >= M(5)` → `true` |
 | `<=>` | Spaceship operator | `M(5) <=> M(10)` → `-1` |
 
-###
+### Numeric Methods
 
 | Method | Description | Example |
 |--------|-------------|---------|
+| `#to_i` | Convert to Integer | `M(3.14).to_i` → `3` |
+| `#to_f` | Convert to Float | `M(42).to_f` → `42.0` |
+| `#to_s` | Convert to String | `M(42).to_s` → `"42"` |
 | `#zero?` | Check if zero | `M(0).zero?` → `true` |
 | `#positive?` | Check if positive | `M(42).positive?` → `true` |
 | `#negative?` | Check if negative | `M(-5).negative?` → `true` |
 
+### String Methods
+
+| Method | Description | Example |
+|--------|-------------|---------|
+| `#to_s` | Get string value | `M("hello").to_s` → `"hello"` |
+| `#length` | String length | `M("hello").length` → `5` |
+| `#empty?` | Check if empty | `M("").empty?` → `true` |
+
+### Composite Methods
+
+| Method | Description | Example |
+|--------|-------------|---------|
+| `#[]` | Access component | `M("a", 1)[0]` → `M("a")` |
+| `#length` | Number of components | `M("a", 1, "b").length` → `3` |
+| `#to_a` | Get all components | `M("a", 1).to_a` → `["a", 1]` |
+
 ## Edge Cases & Limits
 
 ### Supported Values
 - **Integers**: Full range up to 18 digits
-- **
-- **
+- **Floats**: Truncated to integers (M language design)
+- **Strings**: Any UTF-8 string, with automatic escaping
+- **Composite Keys**: Unlimited components of mixed types
+- **Zero**: Handled as special case (single byte: `0x80`)
 - **Negative numbers**: Full support with proper ordering
+- **Nil**: Converted to empty string `""`
 
 ### Not Supported
 - **NaN**: Raises `ArgumentError`
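The README's ordering claim (plain byte comparison reproduces the logical order, with all numbers before all strings) can be checked without installing the gem. The following is a minimal sketch that assumes the hex bytes in the README's "Encoding Examples" table are accurate; `DOCUMENTED`, `hex_to_bytes`, and `sort_by_encoding` are illustrative names, not part of EncodeM.

```ruby
# Hex bytes copied from the README's "Encoding Examples" table in this diff,
# deliberately listed out of order.
DOCUMENTED = {
  'Hello' => 'FF 48 65 6C 6C 6F',
  42      => 'BD 2B',
  -1      => '43 FB FF',
  ''      => 'FF',
  1000    => 'BF 0B 01',
  0       => '80',
  -1000   => '3F FD EF FF',
  1       => 'BC 02'
}.freeze

# Convert "BD 2B" into the two-byte binary string "\xBD\x2B".
def hex_to_bytes(hex)
  [hex.delete(' ')].pack('H*')
end

# Sort values purely by their encoded bytes; nothing is decoded.
def sort_by_encoding(table)
  table.sort_by { |_value, hex| hex_to_bytes(hex) }.map(&:first)
end

BYTE_ORDER = sort_by_encoding(DOCUMENTED)
p BYTE_ORDER
```

If the table is right, the byte order should match the logical order: negatives, zero, positives, then the empty string, then "Hello".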
data/encode_m.gemspec
CHANGED
@@ -5,46 +5,57 @@ Gem::Specification.new do |spec|
   spec.version = EncodeM::VERSION
   spec.authors = ['Steve Shreeve']
   spec.email = ['steve.shreeve@gmail.com']
-
-  spec.summary = 'M language
-  spec.description = 'EncodeM brings
-    '
-    '
-    '
-    '
-    '
-    '
+
+  spec.summary = 'Complete M language subscript encoding - numbers, strings, and composite keys'
+  spec.description = 'EncodeM v3.0 brings complete M language (MUMPS) subscript encoding to Ruby, ' \
+    'supporting numbers, strings, and composite keys with perfect sort order. ' \
+    'Build hierarchical database keys like M("users", 42, "email") that sort ' \
+    'correctly as raw bytes. This 40-year production-tested algorithm from ' \
+    'YottaDB/GT.M powers Epic (70% of US hospitals) and VistA. Perfect for ' \
+    'B-tree indexes, key-value stores, and any system requiring sortable ' \
+    'hierarchical keys. All types maintain correct ordering when compared ' \
+    'as byte strings - no decoding needed.'
   spec.homepage = 'https://github.com/shreeve/encode_m'
   spec.license = 'MIT'
   spec.required_ruby_version = '>= 2.5.0'
-
+
   spec.metadata['homepage_uri'] = spec.homepage
   spec.metadata['source_code_uri'] = spec.homepage
   spec.metadata['changelog_uri'] = "#{spec.homepage}/blob/main/CHANGELOG.md"
   spec.metadata['bug_tracker_uri'] = "#{spec.homepage}/issues"
   spec.metadata['documentation_uri'] = "https://rubydoc.info/gems/encode_m"
-
+
   spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
-    `git ls-files -z`.split("\x0").reject { |f|
+    `git ls-files -z`.split("\x0").reject { |f|
       f.match(%r{^(test|spec|features)/}) ||
       f.match(%r{^\.}) ||
       f == 'Gemfile.lock'
     }
   end
   spec.require_paths = ['lib']
-
+
   spec.add_development_dependency 'bundler', '~> 2.0'
   spec.add_development_dependency 'rake', '~> 13.0'
   spec.add_development_dependency 'minitest', '~> 5.0'
   spec.add_development_dependency 'minitest-reporters', '~> 1.6'
   spec.add_development_dependency 'benchmark-ips', '~> 2.10'
-
+
   spec.post_install_message = <<-MSG
-Thank you for installing EncodeM!
+Thank you for installing EncodeM v3.0!
+
+🎉 NEW: Complete M language support - numbers, strings, and composite keys!
 
 Quick start:
   require 'encode_m'
-
+
+  # Numbers
+  M(42)
+
+  # Strings
+  M("Hello")
+
+  # Composite keys
+  M("users", 42, "email")
 
 Learn more: https://github.com/shreeve/encode_m
   MSG
data/lib/encode_m/composite.rb
ADDED
@@ -0,0 +1,105 @@
+# Composite key encoding for M language subscripts
+module EncodeM
+  class Composite
+    include Comparable
+
+    attr_reader :components, :encoded
+
+    def initialize(*components)
+      raise ArgumentError, "Composite key requires at least one component" if components.empty?
+
+      @components = components.map { |c| normalize_component(c) }
+      @encoded = encode_composite(@components)
+    end
+
+    def to_a
+      @components.map do |component|
+        case component
+        when EncodeM::Numeric
+          component.value
+        when EncodeM::String
+          component.value
+        else
+          component
+        end
+      end
+    end
+
+    def to_encoded
+      @encoded
+    end
+
+    def inspect
+      "EncodeM::Composite(#{to_a.map(&:inspect).join(', ')})"
+    end
+
+    def [](index)
+      @components[index]
+    end
+
+    def length
+      @components.length
+    end
+
+    alias size length
+
+    # Comparison operations
+    def <=>(other)
+      case other
+      when EncodeM::Composite
+        @encoded <=> other.encoded
+      when EncodeM::Numeric, EncodeM::String
+        # Single values sort before composites with same first element
+        # This maintains hierarchical ordering
+        first_comparison = @components.first <=> other
+        first_comparison == 0 ? 1 : first_comparison
+      else
+        nil
+      end
+    end
+
+    def ==(other)
+      case other
+      when EncodeM::Composite
+        @components == other.components
+      when Array
+        to_a == other
+      else
+        false
+      end
+    end
+
+    alias eql? ==
+
+    def hash
+      @components.hash
+    end
+
+    private
+
+    def normalize_component(value)
+      case value
+      when EncodeM::Numeric, EncodeM::String
+        value
+      when EncodeM::Composite
+        raise ArgumentError, "Cannot nest composite keys"
+      when ::Numeric # Use :: to ensure we get Ruby's Numeric
+        EncodeM::Numeric.new(value)
+      when ::String
+        EncodeM::String.new(value)
+      when NilClass
+        EncodeM::String.new("") # nil becomes empty string in M
+      else
+        raise ArgumentError, "Unsupported type in composite key: #{value.class}"
+      end
+    end
+
+    def encode_composite(components)
+      encoded_parts = components.map(&:to_encoded)
+
+      # Join with KEY_DELIMITER (0x00)
+      # Each component is separated by 0x00 to maintain hierarchical sorting
+      encoded_parts.join([Encoder::KEY_DELIMITER].pack('C'))
+    end
+  end
+end
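The joining step in `encode_composite` can be illustrated in isolation. This is a rough sketch, not the gem's code: it assumes `Encoder::KEY_DELIMITER` is `0x00` (as the README states) and reuses pre-encoded component bytes from the README table instead of calling the encoder. `DELIMITER`, `join_components`, and `hex` are hypothetical helper names.

```ruby
# Assumed to match Encoder::KEY_DELIMITER (0x00) from the README.
DELIMITER = "\x00".b

# Pre-encoded components, taken from the README's example table.
USERS     = "\xFFusers".b  # "users" -> FF 75 73 65 72 73
ONE       = "\xBC\x02".b   # 1       -> BC 02
FORTY_TWO = "\xBD\x2B".b   # 42      -> BD 2B

# Join already-encoded components with the 0x00 delimiter.
def join_components(*parts)
  parts.join(DELIMITER)
end

# Render a binary string as lowercase hex for display.
def hex(bytes)
  bytes.unpack('H*').first
end

# Sorting the raw byte strings puts a parent key before its children,
# and children in numeric order (1 before 42).
KEYS = [
  join_components(USERS, FORTY_TWO),
  join_components(USERS),
  join_components(USERS, ONE)
].sort

puts KEYS.map { |key| hex(key) }
```

Because a shorter byte string sorts before any longer string it prefixes, the bare `"users"` key lands first, which is the hierarchical ordering the comments in `Composite#<=>` describe.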
data/lib/encode_m/decoder.rb
CHANGED
@@ -1,4 +1,4 @@
-# Decoder for M language numeric
+# Decoder for M language encoding (numeric and string)
 module EncodeM
   class Decoder
     POS_DECODE = Encoder::POS_CODE.each_with_index.map { |v, i| [v, i] }.to_h.freeze
@@ -6,14 +6,49 @@ module EncodeM
 
     def self.decode(encoded_bytes)
       bytes = encoded_bytes.unpack('C*')
-
-
+
+      # Check for string prefix
+      if bytes[0] == Encoder::STR_SUB_PREFIX
+        decode_string(bytes)
+      elsif bytes[0] == Encoder::SUBSCRIPT_ZERO
+        0
+      else
+        decode_numeric(bytes)
+      end
+    end
+
+    def self.decode_composite(encoded_bytes)
+      components = []
+      bytes = encoded_bytes.unpack('C*')
+      current = []
+
+      bytes.each do |byte|
+        if byte == Encoder::KEY_DELIMITER
+          # End of component
+          unless current.empty?
+            components << decode(current.pack('C*'))
+            current = []
+          end
+        else
+          current << byte
+        end
+      end
+
+      # Don't forget the last component
+      components << decode(current.pack('C*')) unless current.empty?
+
+      components
+    end
+
+    private
+
+    def self.decode_numeric(bytes)
       first_byte = bytes[0]
-
+
       # Determine if negative based on first byte
       # Negative: 0x3B-0x43, Positive: 0xBC-0xC4
       is_negative = first_byte < Encoder::SUBSCRIPT_ZERO
-
+
       if is_negative
         decode_table = NEG_DECODE
       else
@@ -37,5 +72,24 @@ module EncodeM
 
       is_negative ? -result : result
     end
+
+    def self.decode_string(bytes)
+      result = []
+      i = 1 # Skip the 0xFF prefix
+
+      while i < bytes.length
+        if bytes[i] == Encoder::STR_SUB_ESCAPE && i + 1 < bytes.length
+          # Unescape: next byte is XORed with 0xFF
+          result << (bytes[i + 1] ^ 0xFF)
+          i += 2
+        else
+          result << bytes[i]
+          i += 1
+        end
+      end
+
+      # Force UTF-8 encoding for proper string handling
+      result.pack('C*').force_encoding('UTF-8')
+    end
   end
 end
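The dispatch in `decode` and the splitting loop in `decode_composite` can be sketched on their own. The version below is a simplified illustration under stated assumptions: the three constants mirror the values the README gives for `Encoder::KEY_DELIMITER`, `Encoder::STR_SUB_PREFIX`, and `Encoder::SUBSCRIPT_ZERO`, and numeric components are only classified, not decoded, because the lookup tables are not part of this diff. `split_components` and `component_kind` are hypothetical names.

```ruby
# Values as documented in the README; assumed to match the Encoder constants.
KEY_DELIMITER  = 0x00
STR_SUB_PREFIX = 0xFF
SUBSCRIPT_ZERO = 0x80

# Split an encoded composite key into per-component byte arrays,
# cutting at every 0x00 delimiter and skipping empty runs.
def split_components(encoded)
  parts = []
  current = []
  encoded.bytes.each do |byte|
    if byte == KEY_DELIMITER
      parts << current unless current.empty?
      current = []
    else
      current << byte
    end
  end
  parts << current unless current.empty?
  parts
end

# Classify a component by its first byte, as Decoder.decode does.
def component_kind(bytes)
  case bytes.first
  when STR_SUB_PREFIX then :string
  when SUBSCRIPT_ZERO then :zero
  else :numeric
  end
end

key = "\xFFusers\x00\xBD\x2B".b # M("users", 42) per the README
p split_components(key).map { |part| component_kind(part) }
```

Splitting on every `0x00` is only safe because string components escape embedded `0x00` bytes before they are joined.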
data/lib/encode_m/numeric.rb
CHANGED
@@ -59,12 +59,30 @@ module EncodeM
 
     # M language feature: encoded comparison
     def <=>(other)
-
+      case other
+      when EncodeM::Numeric
+        @encoded <=> other.encoded
+      when EncodeM::String
+        -1 # Numbers always sort before strings in M language
+      when EncodeM::Composite
+        # Let Composite handle the comparison
+        -(other <=> self)
+      when Numeric
+        @encoded <=> self.class.new(other).encoded
+      else
+        nil
+      end
     end
 
     def ==(other)
-
-
+      case other
+      when EncodeM::Numeric
+        @value == other.value
+      when Numeric
+        @value == other
+      else
+        false
+      end
     end
 
     def abs
@@ -91,11 +109,6 @@ module EncodeM
       end
     end
 
-    # Direct encoded comparison - key M language feature
-    def encoded_compare(other)
-      @encoded <=> other.encoded
-    end
-
     private
 
     def parse_value(val)
@@ -105,10 +118,10 @@ module EncodeM
       when Float
         raise ArgumentError, "Cannot represent Infinity" if val.infinite?
         raise ArgumentError, "Cannot represent NaN" if val.nan?
-        val
-      when String
+        val.to_i # M language only supports integer encoding
+      when ::String
         if val.include?('.')
           Float(val).to_i # M language only supports integer encoding
         else
           Integer(val)
         end
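One detail in the hunks above is worth a closer look: `parse_value` switches from `String` to `::String`, but `<=>` and `==` still use a bare `when Numeric`. Assuming the class is declared as `class Numeric` nested inside `module EncodeM` (the hunk headers suggest this, but the declaration itself is not shown), Ruby's lexical constant lookup would resolve that bare `Numeric` to `EncodeM::Numeric`, not Ruby's core `::Numeric`. The sketch below reproduces the lookup rule with a stand-in `Demo` module; it does not use the gem.

```ruby
# Stand-in for the gem's nesting; Demo::Numeric plays the role of
# EncodeM::Numeric and does not inherit from Ruby's ::Numeric.
module Demo
  class Numeric
    # A bare constant is looked up lexically, so this is Demo::Numeric.
    def self.classify_bare(other)
      case other
      when Numeric then :matched
      else :unmatched
      end
    end

    # The leading :: forces Ruby's core Numeric class.
    def self.classify_core(other)
      case other
      when ::Numeric then :matched
      else :unmatched
      end
    end
  end
end

p Demo::Numeric.classify_bare(5) # the Integer 5 is not a Demo::Numeric
p Demo::Numeric.classify_core(5) # the Integer 5 is a ::Numeric
```

If the gem's class is nested the same way, comparing an encoded number with a plain Ruby integer would fall through to the `else` branch; that reading is an inference from the diff and should be confirmed against the full file.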
data/lib/encode_m/string.rb
ADDED
@@ -0,0 +1,85 @@
+# String encoding for M language subscripts
+module EncodeM
+  class String
+    include Comparable
+
+    attr_reader :value, :encoded
+
+    def initialize(value)
+      @value = value.to_s
+      @encoded = encode_string(@value)
+    end
+
+    def to_s
+      @value
+    end
+
+    def to_encoded
+      @encoded
+    end
+
+    def inspect
+      "EncodeM::String(#{@value.inspect})"
+    end
+
+    # String-specific predicates
+    def empty?
+      @value.empty?
+    end
+
+    def length
+      @value.length
+    end
+
+    # Comparison operations
+    def <=>(other)
+      case other
+      when EncodeM::String
+        @encoded <=> other.encoded
+      when EncodeM::Numeric
+        1 # Strings always sort after numbers in M language
+      when EncodeM::Composite
+        # Let Composite handle the comparison
+        -(other <=> self)
+      else
+        nil
+      end
+    end
+
+    def ==(other)
+      case other
+      when EncodeM::String
+        @value == other.value
+      when ::String
+        @value == other
+      else
+        false
+      end
+    end
+
+    alias eql? ==
+
+    def hash
+      @value.hash
+    end
+
+    private
+
+    def encode_string(str)
+      result = [Encoder::STR_SUB_PREFIX] # 0xFF prefix for strings
+
+      str.bytes.each do |byte|
+        if byte == Encoder::KEY_DELIMITER || byte == Encoder::STR_SUB_ESCAPE
+          # Escape special bytes: 0x00 and 0x01
+          # Use 0x01 followed by (byte XOR 0xFF)
+          result << Encoder::STR_SUB_ESCAPE
+          result << (byte ^ 0xFF)
+        else
+          result << byte
+        end
+      end
+
+      result.pack('C*')
+    end
+  end
+end
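The escaping in `encode_string` and its inverse in `Decoder.decode_string` form a round trip that can be exercised standalone. This is a minimal sketch mirroring the logic shown in the diff; `PREFIX`, `ESCAPE`, and `DELIM` are local stand-ins assumed to equal `Encoder::STR_SUB_PREFIX` (0xFF), `Encoder::STR_SUB_ESCAPE` (0x01), and `Encoder::KEY_DELIMITER` (0x00).

```ruby
PREFIX = 0xFF # assumed value of Encoder::STR_SUB_PREFIX
ESCAPE = 0x01 # assumed value of Encoder::STR_SUB_ESCAPE
DELIM  = 0x00 # assumed value of Encoder::KEY_DELIMITER

# Prefix with 0xFF; replace 0x00 and 0x01 with 0x01 followed by (byte XOR 0xFF).
def encode_string(str)
  out = [PREFIX]
  str.bytes.each do |byte|
    if byte == DELIM || byte == ESCAPE
      out << ESCAPE << (byte ^ 0xFF)
    else
      out << byte
    end
  end
  out.pack('C*')
end

# Skip the prefix; undo each escape pair by XORing the second byte with 0xFF.
def decode_string(encoded)
  bytes = encoded.bytes
  out = []
  i = 1
  while i < bytes.length
    if bytes[i] == ESCAPE && i + 1 < bytes.length
      out << (bytes[i + 1] ^ 0xFF)
      i += 2
    else
      out << bytes[i]
      i += 1
    end
  end
  out.pack('C*').force_encoding('UTF-8')
end

encoded = encode_string("a\x00b")
puts encoded.unpack('H*').first
puts decode_string(encoded) == "a\x00b"
```

One consequence of the XOR scheme as written: `0x00` becomes `01 FF` and `0x01` becomes `01 FE`, so two strings that differ only in those bytes compare in reverse order once encoded. This sketch mirrors the diff and does not attempt to correct that.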
data/lib/encode_m/version.rb
CHANGED
data/lib/encode_m.rb
CHANGED
@@ -1,31 +1,69 @@
-# EncodeM -
+# EncodeM - Complete M language subscript encoding for Ruby
 # Based on YottaDB/GT.M's 40-year production-tested algorithm
 
 require 'encode_m/version'
 require 'encode_m/encoder'
 require 'encode_m/decoder'
 require 'encode_m/numeric'
+require 'encode_m/string'
+require 'encode_m/composite'
 
 module EncodeM
   class Error < StandardError; end
 
-  # Factory method
-  def self.new(
-
+  # Factory method supporting all M types
+  def self.new(*values)
+    if values.length == 1
+      create_single(values[0])
+    else
+      Composite.new(*values)
+    end
   end
 
   # Decode - reverse the M encoding
   def self.decode(encoded)
     Decoder.decode(encoded)
   end
+
+  # Decode composite keys
+  def self.decode_composite(encoded)
+    Decoder.decode_composite(encoded)
+  end
 
-  #
-  def self.M(
-
+  # M language style constructor
+  def self.M(*values)
+    if values.length == 1
+      create_single(values[0])
+    else
+      Composite.new(*values)
+    end
+  end
+
+  private
+
+  def self.create_single(value)
+    case value
+    when EncodeM::Numeric, EncodeM::String, EncodeM::Composite
+      value # Already encoded
+    when ::Numeric # Use :: to ensure we get Ruby's Numeric, not EncodeM::Numeric
+      Numeric.new(value)
+    when ::String
+      # Try to parse as a number first
+      begin
+        Numeric.new(value)
+      rescue ArgumentError
+        # Not a number, treat as string
+        String.new(value)
+      end
+    when NilClass
+      String.new("") # nil becomes empty string in M
+    else
+      raise ArgumentError, "Unsupported type: #{value.class}"
+    end
   end
 end
 
 # Global convenience method (like M language global functions)
-def M(
-  EncodeM
+def M(*values)
+  EncodeM.M(*values)
 end
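The automatic type detection in `create_single` (try numeric parsing first, fall back to a string) can be sketched without the gem. The version below is an approximation: `parse_numeric` copies only the `::String` branch of `parse_value` that is visible in this diff, and `classify` returns a tagged pair instead of building `EncodeM` objects. Both names are hypothetical.

```ruby
# Mirrors the visible ::String branch of Numeric#parse_value.
def parse_numeric(str)
  if str.include?('.')
    Float(str).to_i # floats are truncated, per the changelog
  else
    Integer(str)
  end
end

# Rough stand-in for EncodeM.create_single: returns [kind, value].
def classify(value)
  case value
  when ::Numeric
    [:numeric, value.to_i]
  when ::String
    begin
      [:numeric, parse_numeric(value)]
    rescue ArgumentError
      [:string, value] # not parseable as a number
    end
  when nil
    [:string, '']
  else
    raise ArgumentError, "Unsupported type: #{value.class}"
  end
end

p classify('42')
p classify('3.14')
p classify('Hello')
p classify(nil)
```

Two things follow if the real `parse_value` behaves like this sketch. First, a single-argument call such as `M("0")` would yield the number 0 and not a string, which seems at odds with the README example `M(999999) < M("0") # => true`. Second, `Composite#normalize_component` wraps every `::String` in `EncodeM::String` without parsing, so `"42"` would be numeric in `M("42")` but a string in `M("users", "42")`. Note also that `Kernel#Integer` accepts prefixed literals, so a string like `"0x1A"` would parse as 26 under this logic.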
data/logo.png
ADDED
Binary file
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: encode_m
 version: !ruby/object:Gem::Version
-  version:
+  version: 3.0.0
 platform: ruby
 authors:
 - Steve Shreeve
@@ -79,11 +79,13 @@ dependencies:
     - - "~>"
     - !ruby/object:Gem::Version
       version: '2.10'
-description: EncodeM brings
-
-
-
-
+description: EncodeM v3.0 brings complete M language (MUMPS) subscript encoding to
+  Ruby, supporting numbers, strings, and composite keys with perfect sort order. Build
+  hierarchical database keys like M("users", 42, "email") that sort correctly as raw
+  bytes. This 40-year production-tested algorithm from YottaDB/GT.M powers Epic (70%
+  of US hospitals) and VistA. Perfect for B-tree indexes, key-value stores, and any
+  system requiring sortable hierarchical keys. All types maintain correct ordering
+  when compared as byte strings - no decoding needed.
 email:
 - steve.shreeve@gmail.com
 executables: []
@@ -97,10 +99,13 @@ files:
 - Rakefile
 - encode_m.gemspec
 - lib/encode_m.rb
+- lib/encode_m/composite.rb
 - lib/encode_m/decoder.rb
 - lib/encode_m/encoder.rb
 - lib/encode_m/numeric.rb
+- lib/encode_m/string.rb
 - lib/encode_m/version.rb
+- logo.png
 homepage: https://github.com/shreeve/encode_m
 licenses:
 - MIT
@@ -110,14 +115,10 @@ metadata:
   changelog_uri: https://github.com/shreeve/encode_m/blob/main/CHANGELOG.md
   bug_tracker_uri: https://github.com/shreeve/encode_m/issues
   documentation_uri: https://rubydoc.info/gems/encode_m
-post_install_message:
-
-
-
-  require 'encode_m'
-  a = M(42) # Create a number with M language encoding
-
-  Learn more: https://github.com/shreeve/encode_m
+post_install_message: "Thank you for installing EncodeM v3.0!\n\n\U0001F389 NEW: Complete
+  M language support - numbers, strings, and composite keys!\n\nQuick start:\n require
+  'encode_m'\n\n # Numbers\n M(42)\n\n # Strings\n M(\"Hello\")\n\n # Composite
+  keys\n M(\"users\", 42, \"email\")\n\nLearn more: https://github.com/shreeve/encode_m\n"
 rdoc_options: []
 require_paths:
 - lib
@@ -134,5 +135,6 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 rubygems_version: 3.7.1
 specification_version: 4
-summary: M language
+summary: Complete M language subscript encoding - numbers, strings, and composite
+  keys
 test_files: []