encode_m 2.0.0 → 3.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: d40d8792ba3c7759d3c00820f6646ff1e0ca1dced7260359d1c0b19105d582bd
4
- data.tar.gz: 885ed4aa86eda098308e22da56be642437a2ee89a7d74594cdfde9b3ef6abff4
3
+ metadata.gz: 97b4b00c071667466ef61b65805c3143abcbc42720f629b1a0ee30f9fef0d200
4
+ data.tar.gz: 07e37e38818a96b8ba30330422d6ec32f31c33694a3c36a3515433b91cc6994e
5
5
  SHA512:
6
- metadata.gz: e2547603a7a54d6371d93fe2d0fd524111ca477570ee36365c37a12cb7a1ec765d8a085aa60850b07e978f9ae85ee53c982aa365bf5eb2a8ed3ec9ed90a339b9
7
- data.tar.gz: 6131029ca37383c3fdae7c908413a523603e25ebbc6c6594f11b909ca958b4e8120e294821544f7e8e64dd28dd0f4407c19088740b0104d37fbcb946305b8a5d
6
+ metadata.gz: b8cfd69a708969bdc2e2f16940f9597d12a4fd1b83021c60eaa82e1101626ee0d036d96433c3e930399c152022371dce01c4880d831ae39552135fa5d1db4ae7
7
+ data.tar.gz: 10146bf5686a83fa4036586f2380c46c29d9edd7a7808488646c7994f1a8ff0305eef3ace164aa1e9296cc77f155162e9674ed9ac5fce7cbb06ad2d3c2002ef2
data/CHANGELOG.md CHANGED
@@ -5,6 +5,43 @@ All notable changes to the EncodeM project will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [3.0.0] - 2025-01-03
9
+
10
+ ### 🎉 Major Features
11
+ - **Complete M language subscript support!** Now includes strings and composite keys
12
+ - String encoding with proper `0xFF` prefix and escape sequences
13
+ - Composite keys for hierarchical data structures (e.g., `M("users", 42, "email")`)
14
+ - Full compatibility with YottaDB/GT.M subscript encoding
15
+
16
+ ### Added
17
+ - `EncodeM::String` class for string subscripts
18
+ - `EncodeM::Composite` class for multi-component keys
19
+ - Support for variadic arguments in `M()` function
20
+ - Automatic type detection (numeric strings parse as numbers)
21
+ - Comprehensive test suite for string and composite features
22
+ - Support for nil values (converted to empty strings)
23
+
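The type detection described in the "Added" list can be sketched roughly as follows. This is a minimal illustration that assumes the behaviour stated in this changelog (numeric strings parse as numbers, floats are truncated, nil becomes an empty string). `classify_subscript` is a hypothetical helper, not part of the gem's API.

```ruby
# Hypothetical helper (not part of encode_m): decides whether a Ruby value
# would be treated as a number or as a string subscript.
def classify_subscript(value)
  case value
  when Integer
    [:number, value]
  when Float
    [:number, value.to_i]            # truncates toward zero
  when String
    begin
      # Numeric-looking strings are parsed as numbers first.
      number = value.include?('.') ? Float(value).to_i : Integer(value)
      [:number, number]
    rescue ArgumentError
      [:string, value]               # not a number, keep as a string
    end
  when nil
    [:string, ""]                    # nil becomes the empty string
  else
    raise ArgumentError, "Unsupported type: #{value.class}"
  end
end

p classify_subscript("42")     # [:number, 42]
p classify_subscript("3.14")   # [:number, 3]
p classify_subscript("hello")  # [:string, "hello"]
p classify_subscript(nil)      # [:string, ""]
```

One consequence worth keeping in mind: under this rule a string such as `"0"` never reaches the string encoder, so it sorts as the number 0 rather than as text.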
24
+ ### Changed
25
+ - Float values are now truncated to integers (M language only supports integer encoding)
26
+ - `M()` function can now accept multiple arguments for composite keys
27
+ - Decoder enhanced to handle strings and composite keys
28
+ - Division operations now perform integer division
29
+
30
+ ### Examples
31
+ ```ruby
32
+ # Strings
33
+ M("Hello") # String encoding
34
+ M("") # Empty string
35
+
36
+ # Composite keys
37
+ M("users", 42, "email") # Database-style keys
38
+ M(2025, 1, 15) # Date as composite
39
+ M("cache", namespace, key) # Cache keys
40
+
41
+ # Mixed types
42
+ M("user", 123, "posts", -1) # All types work together
43
+ ```
44
+
8
45
  ## [2.0.0] - 2025-09-03
9
46
 
10
47
  ### Changed
data/README.md CHANGED
@@ -3,15 +3,25 @@
3
3
  [![Gem Version](https://badge.fury.io/rb/encode_m.svg)](https://badge.fury.io/rb/encode_m)
4
4
  [![MIT License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
5
5
 
6
- Bringing the power of M language (MUMPS) numeric encoding to Ruby. Based on YottaDB/GT.M's 40-year production-tested algorithm.
6
+ **🎉 Version 3.0: Complete M language subscript encoding - numbers, strings, and composite keys!**
7
+
8
+ Bringing the power of M language (MUMPS) subscript encoding to Ruby. Build hierarchical database keys like `M("users", 42, "email")` with perfect sort order. Based on YottaDB/GT.M's 40-year production-tested algorithm.
7
9
 
8
10
  ## Why You Should Use EncodeM
9
11
 
10
- If you're building anything that stores numbers in a database or key-value store, EncodeM is a game-changer. The magic is simple but powerful: when you encode numbers with EncodeM, the resulting byte strings maintain numeric sort order. This means your database can compare and sort numbers **without ever decoding them** - just pure byte comparison like strcmp(). Imagine your B-tree indexes comparing numbers 3x faster because they never deserialize, or range queries that just compare raw bytes. This is the secret sauce that's been powering Epic (used by 70% of US hospitals) and other M language systems for 40 years.
12
+ **Version 3.0 brings complete M language subscript support!** Not just numbers anymore - now you can encode strings and build powerful composite keys for hierarchical data structures.
13
+
14
+ If you're building anything that stores data in a database or key-value store, EncodeM is a game-changer. The magic is simple but powerful: when you encode values with EncodeM, the resulting byte strings maintain perfect sort order. This means your database can compare and sort **without ever decoding** - just pure byte comparison like strcmp().
11
15
 
12
- Beyond the sorting superpower, EncodeM is surprisingly memory efficient. Small numbers (1-99) take just 2 bytes compared to 8 for a Float, and common values stay compact at 2-6 bytes. You get 18 digits of precision - more than Float but without BigDecimal's overhead. The encoding handles positive, negative, and zero correctly, maintaining perfect sort order across the entire numeric range.
16
+ ### What's New in v3.0:
17
+ - **String encoding**: Strings sort correctly after all numbers
18
+ - **Composite keys**: Build hierarchical keys like `M("users", 42, "profile", "email")`
19
+ - **Full M compatibility**: Generate YottaDB/GT.M compatible subscripts
20
+ - **Mixed types**: Combine numbers, strings, and more in a single key
13
21
 
14
- The best part? It's production-tested technology. This isn't some experimental algorithm - it's literally the same encoding that's been processing medical records and financial transactions since the 1980s in YottaDB/GT.M systems. If you're building a system where you need sortable numeric keys (think time-series data, financial ledgers, or any ordered numeric index), EncodeM gives you the performance of byte-level operations with the correctness of proper numeric comparison. Drop it in, encode your numbers, and watch your database operations get faster.
22
+ Imagine building a user database where `M("users", userId, "posts", postId)` creates perfectly sortable hierarchical keys. Or time-series data with `M(2025, 1, 15, sensorId, "temperature")`. The encoding ensures all components sort correctly - numbers before strings, maintaining hierarchical order.
23
+
24
+ This is production-tested technology - literally the same encoding that's been processing medical records and financial transactions since the 1980s in YottaDB/GT.M systems. Epic (70% of US hospitals) and VistA use this exact algorithm for their global arrays. Drop it in, encode your data, and watch your database operations get faster.
15
25
 
16
26
  ## About the M Language Heritage
17
27
 
@@ -19,11 +29,13 @@ The M language (formerly MUMPS - Massachusetts General Hospital Utility Multi-Pr
19
29
 
20
30
  ## Key Features
21
31
 
22
- - **Sortable Byte Encoding**: Numbers encode to bytes that sort correctly without decoding
32
+ - **Complete M Language Support**: Numbers, strings, and composite keys
33
+ - **Sortable Byte Encoding**: All types encode to bytes that sort correctly without decoding
34
+ - **Hierarchical Keys**: Build multi-component database keys with perfect sort order
23
35
  - **Production-Tested**: Algorithm proven in healthcare and finance for 40 years
24
- - **Optimized for Real Use**: Special handling for common number ranges
25
- - **Memory Efficient**: Compact representation, especially for small integers
26
- - **Database-Friendly**: Perfect for indexing and byte-wise comparisons
36
+ - **YottaDB Compatible**: Generate valid YottaDB/GT.M subscripts
37
+ - **Memory Efficient**: Compact representation for all data types
38
+ - **Database-Friendly**: Perfect for B-tree indexes and key-value stores
27
39
 
28
40
  ## Installation
29
41
 
@@ -41,40 +53,84 @@ $ gem install encode_m
41
53
 
42
54
  ## Usage
43
55
 
56
+ ### Numbers (Classic M encoding)
44
57
  ```ruby
45
58
  require 'encode_m'
46
59
 
47
60
  # Create numbers using the M() convenience method
48
61
  a = M(42)
49
- b = M(3.14)
62
+ b = M(3.14) # Floats are truncated to integers
50
63
  c = M(-100)
51
64
 
52
65
  # Arithmetic works naturally
53
- sum = a + b # => EncodeM(45.14)
54
- product = a * M(2) # => EncodeM(84)
66
+ sum = a + b # => M(45)
67
+ product = a * M(2) # => M(84)
55
68
 
56
69
  # The magic: encoded bytes sort correctly!
57
70
  numbers = [M(5), M(-10), M(0), M(100), M(-5)]
58
71
  sorted = numbers.sort # Correctly sorted: -10, -5, 0, 5, 100
59
72
 
60
73
  # Perfect for databases - compare without decoding
61
- encoded_a = a.to_encoded # => "\xBD\x43"
74
+ encoded_a = a.to_encoded # => "\xBD\x2B"
62
75
  encoded_b = b.to_encoded # => "\xBC\x04"
63
- encoded_a < encoded_b # => false (42 > 3.14)
76
+ encoded_a < encoded_b # => false (42 > 3)
77
+ ```
78
+
79
+ ### Strings (New in v3.0!)
80
+ ```ruby
81
+ # Encode strings - they sort after all numbers
82
+ name = M("Alice")
83
+ empty = M("") # Empty string
84
+
85
+ # M language ordering: all numbers < all strings
86
+ M(999999) < M("a") # => true (a numeric string such as "0" would be parsed as a number)
64
87
 
65
- # Decode back to numbers
66
- original = EncodeM.decode(encoded_a) # => 42
88
+ # String comparison maintains byte order
89
+ M("apple") < M("banana") # => true
90
+ ```
91
+
92
+ ### Composite Keys (New in v3.0!)
93
+ ```ruby
94
+ # Build hierarchical database keys
95
+ user_email = M("users", 42, "email")
96
+ user_name = M("users", 42, "name")
97
+ user_post = M("users", 42, "posts", 1)
98
+
99
+ # Perfect for time-series data
100
+ event = M(2025, 1, 15, 14, 30, "sensor_123", "temperature")
101
+
102
+ # Keys sort hierarchically
103
+ keys = [
104
+ M("users", 2, "email"),
105
+ M("users", 1, "name"),
106
+ M("users", 1, "email"),
107
+ M("users", 2, "name")
108
+ ].sort
109
+ # Result order:
110
+ # ["users", 1, "email"]
111
+ # ["users", 1, "name"]
112
+ # ["users", 2, "email"]
113
+ # ["users", 2, "name"]
114
+
115
+ # Access components
116
+ user_email[0].value # => "users"
117
+ user_email[1].value # => 42
118
+ user_email.to_a # => ["users", 42, "email"]
119
+
120
+ # Decode composite keys
121
+ encoded = user_email.to_encoded
122
+ decoded = EncodeM.decode_composite(encoded) # => ["users", 42, "email"]
67
123
  ```
68
124
 
69
125
  ## Format Specification
70
126
 
71
- EncodeM uses the M language numeric encoding that guarantees lexicographic byte ordering matches numeric ordering.
127
+ EncodeM uses the complete M language subscript encoding that guarantees lexicographic byte ordering matches logical ordering for all data types.
72
128
 
73
129
  ### Encoding Structure
74
130
 
75
131
  ```
76
- 0x00 KEY_DELIMITER (terminator)
77
- 0x01 STR_SUB_ESCAPE (escape in strings)
132
+ 0x00 KEY_DELIMITER (separates components in composite keys)
133
+ 0x01 STR_SUB_ESCAPE (escape byte for strings)
78
134
  ------- NEGATIVE NUMBERS (decreasing magnitude) -------
79
135
  0x3B -999,999,999 to -100,000,000 (9 digits)
80
136
  0x3C -99,999,999 to -10,000,000 (8 digits)
@@ -97,28 +153,46 @@ EncodeM uses the M language numeric encoding that guarantees lexicographic byte
97
153
  0xC2 1,000,000 to 9,999,999 (7 digits)
98
154
  0xC3 10,000,000 to 99,999,999 (8 digits)
99
155
  0xC4 100,000,000 to 999,999,999 (9 digits)
100
- 0xFF STR_SUB_PREFIX (string marker)
156
+ ------- STRINGS -------
157
+ 0xFF STR_SUB_PREFIX (all strings start with this)
101
158
  ```
102
159
 
160
+ ### Numeric Encoding
103
161
  - **First byte**: Determines sign and magnitude range
104
162
  - **Following bytes**: Encode digit pairs (00-99) using lookup tables
105
163
  - **Terminator**: Negative numbers end with `0xFF` to maintain sort order
106
164
 
165
+ ### String Encoding
166
+ - **Prefix**: All strings start with `0xFF`
167
+ - **Content**: UTF-8 bytes of the string
168
+ - **Escaping**: Special bytes are escaped:
169
+ - `0x00` → `0x01 0xFF`
170
+ - `0x01` → `0x01 0xFE`
171
+
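The escaping rules in the String Encoding list can be sketched in standalone Ruby. This is an illustration under the rules stated above (prefix `0xFF`, escape `0x00` and `0x01` as `0x01` followed by the byte XORed with `0xFF`). The helper names `escape_subscript_string` and `unescape_subscript_string` are hypothetical and not part of the gem.

```ruby
STR_SUB_PREFIX = 0xFF  # every encoded string starts with this byte
STR_SUB_ESCAPE = 0x01  # escape marker
KEY_DELIMITER  = 0x00  # reserved as the composite-key separator

# Hypothetical helper: encode a string as prefix + escaped bytes.
def escape_subscript_string(str)
  out = [STR_SUB_PREFIX]
  str.bytes.each do |byte|
    if byte == KEY_DELIMITER || byte == STR_SUB_ESCAPE
      out << STR_SUB_ESCAPE << (byte ^ 0xFF)  # 0x00 -> 01 FF, 0x01 -> 01 FE
    else
      out << byte
    end
  end
  out.pack('C*')
end

# Hypothetical helper: reverse the escaping above.
def unescape_subscript_string(encoded)
  bytes = encoded.bytes
  raise ArgumentError, "missing 0xFF prefix" unless bytes.first == STR_SUB_PREFIX
  out = []
  i = 1
  while i < bytes.length
    if bytes[i] == STR_SUB_ESCAPE && i + 1 < bytes.length
      out << (bytes[i + 1] ^ 0xFF)
      i += 2
    else
      out << bytes[i]
      i += 1
    end
  end
  out.pack('C*').force_encoding('UTF-8')
end

encoded = escape_subscript_string("a\x00b")
puts encoded.bytes.map { |b| format('%02X', b) }.join(' ')  # FF 61 01 FF 62
```

Because a raw `0x00` can never appear inside an encoded string, that byte stays free to act as the separator between components of a composite key.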
172
+ ### Composite Key Encoding
173
+ - **Structure**: Components separated by `0x00` (KEY_DELIMITER)
174
+ - **Ordering**: Maintains hierarchical sort order
175
+ - **Example**: `M("users", 42)` → `[0xFF "users" 0x00 0xBD 0x2B]`
176
+
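The composite layout can be illustrated by joining already-encoded components with the `0x00` delimiter. The sketch below reuses the byte values this README gives for `"users"` and `42`. `join_components` and `split_components` are hypothetical helpers, and splitting on `0x00` assumes that no encoded component contains a raw `0x00` byte (strings escape it; for numbers this is an assumption).

```ruby
KEY_DELIMITER = "\x00".b  # binary string holding the separator byte

# Hypothetical helper: join encoded components into one key.
def join_components(encoded_parts)
  encoded_parts.map(&:b).join(KEY_DELIMITER)
end

# Hypothetical helper: split a key back into encoded components.
def split_components(key)
  key.b.split(KEY_DELIMITER)
end

users     = "\xFFusers".b   # string component, as listed in the README
forty_two = "\xBD\x2B".b    # 42, as listed in the README

key = join_components([users, forty_two])
puts key.bytes.map { |b| format('%02X', b) }.join(' ')
# FF 75 73 65 72 73 00 BD 2B

puts users < key  # true: a parent key sorts before its children
```

Since the parent's bytes are a prefix of every child key, a byte-ordered store keeps each subtree contiguous, which is what makes prefix range scans possible.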
107
177
  ### Encoding Examples
108
178
 
109
- | Number | Hex Bytes | Explanation |
110
- |--------|-----------|-------------|
111
- | -1000 | `40 EE FE FF` | 4-digit negative, mantissa, terminator |
112
- | -100 | `41 FD FE FF` | 3-digit negative, mantissa, terminator |
113
- | -10 | `42 EE FF` | 2-digit negative, mantissa, terminator |
114
- | -1 | `43 FD FF` | 1-digit negative, mantissa, terminator |
179
+ | Value | Hex Bytes | Description |
180
+ |-------|-----------|-------------|
181
+ | -1000 | `3F FD EF FF` | 4-digit negative |
182
+ | -1 | `43 FB FF` | 1-digit negative |
115
183
  | 0 | `80` | Zero (single byte) |
116
- | 1 | `BC 02` | 1-digit positive, mantissa |
117
- | 10 | `BD 11` | 2-digit positive, mantissa |
118
- | 100 | `BE 02 01` | 3-digit positive, mantissa |
119
- | 1000 | `BF 11 01` | 4-digit positive, mantissa |
120
-
121
- The encoding ensures: `bytewise_compare(encode(x), encode(y)) == numeric_compare(x, y)`
184
+ | 1 | `BC 02` | 1-digit positive |
185
+ | 42 | `BD 2B` | 2-digit positive |
186
+ | 1000 | `BF 0B 01` | 4-digit positive |
187
+ | "Hello" | `FF 48 65 6C 6C 6F` | String with 0xFF prefix |
188
+ | "" | `FF` | Empty string |
189
+ | ["users", 42] | `FF 75 73 65 72 73 00 BD 2B` | Composite key |
190
+ | [2025, 1, 15] | `BF 15 1A 00 BC 02 00 BD 10` | Date as composite |
191
+
192
+ The encoding ensures:
193
+ - `bytewise_compare(encode(x), encode(y)) == logical_compare(x, y)`
194
+ - All numbers sort before all strings
195
+ - Composite keys maintain hierarchical order
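The ordering claim can be checked against the byte values in the Encoding Examples table without the gem installed. This sketch only sorts the hex strings from that table as raw bytes; `hex_to_bytes` is a hypothetical helper.

```ruby
# Byte values copied from the Encoding Examples table, deliberately shuffled.
ENCODED = {
  '"Hello"' => "FF 48 65 6C 6C 6F",
  '1000'    => "BF 0B 01",
  '0'       => "80",
  '""'      => "FF",
  '-1'      => "43 FB FF",
  '42'      => "BD 2B",
  '1'       => "BC 02",
  '-1000'   => "3F FD EF FF"
}

# Hypothetical helper: turn "BD 2B" into a two-byte binary string.
def hex_to_bytes(hex)
  hex.split.map { |pair| pair.to_i(16) }.pack('C*')
end

# Plain bytewise comparison, no decoding involved.
sorted = ENCODED.sort_by { |_label, hex| hex_to_bytes(hex) }.map(&:first)
puts sorted.join(' < ')
# -1000 < -1 < 0 < 1 < 42 < 1000 < "" < "Hello"
```

Negative numbers come first, then zero, then positive numbers, and every string sorts after every number because of the `0xFF` prefix.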
122
196
 
123
197
  ## Ordering Guarantees
124
198
 
@@ -138,13 +212,15 @@ This enables direct byte comparison in databases without decoding.
138
212
 
139
213
  | Method | Description | Example |
140
214
  |--------|-------------|---------|
141
- | `M(value)` | Create EncodeM number (global) | `M(42)` |
142
- | `EncodeM.new(value)` | Create EncodeM number | `EncodeM.new(42)` |
143
- | `EncodeM.decode(bytes)` | Decode bytes to number | `EncodeM.decode("\x41\x43")` → `42` |
144
- | `#to_encoded` | Get encoded byte string | `M(42).to_encoded` → `"\x41\x43"` |
145
- | `#to_i` | Convert to Integer | `M(3.14).to_i` → `3` |
146
- | `#to_f` | Convert to Float | `M(42).to_f` `42.0` |
147
- | `#to_s` | Convert to String | `M(42).to_s` → `"42"` |
215
+ | `M(value)` | Create encoded value | `M(42)`, `M("hello")` |
216
+ | `M(*values)` | Create composite key | `M("users", 42, "email")` |
217
+ | `EncodeM.new(value)` | Create encoded value | `EncodeM.new(42)` |
218
+ | `EncodeM.new(*values)` | Create composite key | `EncodeM.new("users", 42)` |
219
+ | `EncodeM.decode(bytes)` | Decode bytes to value | `EncodeM.decode("\xBD\x2B")` → `42` |
220
+ | `EncodeM.decode_composite(bytes)` | Decode composite key | Returns array of components |
221
+ | `#to_encoded` | Get encoded byte string | `M(42).to_encoded` → `"\xBD\x2B"` |
222
+ | `#value` | Get original value | `M(42).value` → `42` |
223
+ | `#to_a` | Get composite components | `M("a", 1).to_a` → `["a", 1]` |
148
224
 
149
225
  ### Arithmetic Operations
150
226
 
@@ -167,21 +243,43 @@ This enables direct byte comparison in databases without decoding.
167
243
  | `>=` | Greater or equal | `M(10) >= M(5)` → `true` |
168
244
  | `<=>` | Spaceship operator | `M(5) <=> M(10)` → `-1` |
169
245
 
170
- ### Predicates
246
+ ### Numeric Methods
171
247
 
172
248
  | Method | Description | Example |
173
249
  |--------|-------------|---------|
250
+ | `#to_i` | Convert to Integer | `M(3.14).to_i` → `3` |
251
+ | `#to_f` | Convert to Float | `M(42).to_f` → `42.0` |
252
+ | `#to_s` | Convert to String | `M(42).to_s` → `"42"` |
174
253
  | `#zero?` | Check if zero | `M(0).zero?` → `true` |
175
254
  | `#positive?` | Check if positive | `M(42).positive?` → `true` |
176
255
  | `#negative?` | Check if negative | `M(-5).negative?` → `true` |
177
256
 
257
+ ### String Methods
258
+
259
+ | Method | Description | Example |
260
+ |--------|-------------|---------|
261
+ | `#to_s` | Get string value | `M("hello").to_s` → `"hello"` |
262
+ | `#length` | String length | `M("hello").length` → `5` |
263
+ | `#empty?` | Check if empty | `M("").empty?` → `true` |
264
+
265
+ ### Composite Methods
266
+
267
+ | Method | Description | Example |
268
+ |--------|-------------|---------|
269
+ | `#[]` | Access component | `M("a", 1)[0]` → `M("a")` |
270
+ | `#length` | Number of components | `M("a", 1, "b").length` → `3` |
271
+ | `#to_a` | Get all components | `M("a", 1).to_a` → `["a", 1]` |
272
+
178
273
  ## Edge Cases & Limits
179
274
 
180
275
  ### Supported Values
181
276
  - **Integers**: Full range up to 18 digits
182
- - **Decimals**: Currently converts to integer (decimal support planned)
183
- - **Zero**: Handled as special case (single byte: `0x40`)
277
+ - **Floats**: Truncated to integers (M language design)
278
+ - **Strings**: Any UTF-8 string, with automatic escaping
279
+ - **Composite Keys**: Unlimited components of mixed types
280
+ - **Zero**: Handled as special case (single byte: `0x80`)
184
281
  - **Negative numbers**: Full support with proper ordering
282
+ - **Nil**: Converted to empty string `""`
185
283
 
186
284
  ### Not Supported
187
285
  - **NaN**: Raises `ArgumentError`
data/encode_m.gemspec CHANGED
@@ -5,46 +5,57 @@ Gem::Specification.new do |spec|
5
5
  spec.version = EncodeM::VERSION
6
6
  spec.authors = ['Steve Shreeve']
7
7
  spec.email = ['steve.shreeve@gmail.com']
8
-
9
- spec.summary = 'M language numeric encoding for Ruby - sortable, efficient, production-tested'
10
- spec.description = 'EncodeM brings a 40-year production-tested numeric encoding algorithm ' \
11
- 'from YottaDB/GT.M to Ruby. This algorithm from the M language (MUMPS) ' \
12
- 'provides efficient numeric handling with the unique property that ' \
13
- 'encoded byte strings maintain sort order. Perfect for database ' \
14
- 'operations, financial calculations, and systems requiring efficient ' \
15
- 'sortable number storage. A practical alternative between Float and ' \
16
- 'BigDecimal.'
8
+
9
+ spec.summary = 'Complete M language subscript encoding - numbers, strings, and composite keys'
10
+ spec.description = 'EncodeM v3.0 brings complete M language (MUMPS) subscript encoding to Ruby, ' \
11
+ 'supporting numbers, strings, and composite keys with perfect sort order. ' \
12
+ 'Build hierarchical database keys like M("users", 42, "email") that sort ' \
13
+ 'correctly as raw bytes. This 40-year production-tested algorithm from ' \
14
+ 'YottaDB/GT.M powers Epic (70% of US hospitals) and VistA. Perfect for ' \
15
+ 'B-tree indexes, key-value stores, and any system requiring sortable ' \
16
+ 'hierarchical keys. All types maintain correct ordering when compared ' \
17
+ 'as byte strings - no decoding needed.'
17
18
  spec.homepage = 'https://github.com/shreeve/encode_m'
18
19
  spec.license = 'MIT'
19
20
  spec.required_ruby_version = '>= 2.5.0'
20
-
21
+
21
22
  spec.metadata['homepage_uri'] = spec.homepage
22
23
  spec.metadata['source_code_uri'] = spec.homepage
23
24
  spec.metadata['changelog_uri'] = "#{spec.homepage}/blob/main/CHANGELOG.md"
24
25
  spec.metadata['bug_tracker_uri'] = "#{spec.homepage}/issues"
25
26
  spec.metadata['documentation_uri'] = "https://rubydoc.info/gems/encode_m"
26
-
27
+
27
28
  spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
28
- `git ls-files -z`.split("\x0").reject { |f|
29
+ `git ls-files -z`.split("\x0").reject { |f|
29
30
  f.match(%r{^(test|spec|features)/}) ||
30
31
  f.match(%r{^\.}) ||
31
32
  f == 'Gemfile.lock'
32
33
  }
33
34
  end
34
35
  spec.require_paths = ['lib']
35
-
36
+
36
37
  spec.add_development_dependency 'bundler', '~> 2.0'
37
38
  spec.add_development_dependency 'rake', '~> 13.0'
38
39
  spec.add_development_dependency 'minitest', '~> 5.0'
39
40
  spec.add_development_dependency 'minitest-reporters', '~> 1.6'
40
41
  spec.add_development_dependency 'benchmark-ips', '~> 2.10'
41
-
42
+
42
43
  spec.post_install_message = <<-MSG
43
- Thank you for installing EncodeM!
44
+ Thank you for installing EncodeM v3.0!
45
+
46
+ 🎉 NEW: Complete M language support - numbers, strings, and composite keys!
44
47
 
45
48
  Quick start:
46
49
  require 'encode_m'
47
- a = M(42) # Create a number with M language encoding
50
+
51
+ # Numbers
52
+ M(42)
53
+
54
+ # Strings
55
+ M("Hello")
56
+
57
+ # Composite keys
58
+ M("users", 42, "email")
48
59
 
49
60
  Learn more: https://github.com/shreeve/encode_m
50
61
  MSG
data/lib/encode_m/composite.rb ADDED
@@ -0,0 +1,105 @@
1
+ # Composite key encoding for M language subscripts
2
+ module EncodeM
3
+ class Composite
4
+ include Comparable
5
+
6
+ attr_reader :components, :encoded
7
+
8
+ def initialize(*components)
9
+ raise ArgumentError, "Composite key requires at least one component" if components.empty?
10
+
11
+ @components = components.map { |c| normalize_component(c) }
12
+ @encoded = encode_composite(@components)
13
+ end
14
+
15
+ def to_a
16
+ @components.map do |component|
17
+ case component
18
+ when EncodeM::Numeric
19
+ component.value
20
+ when EncodeM::String
21
+ component.value
22
+ else
23
+ component
24
+ end
25
+ end
26
+ end
27
+
28
+ def to_encoded
29
+ @encoded
30
+ end
31
+
32
+ def inspect
33
+ "EncodeM::Composite(#{to_a.map(&:inspect).join(', ')})"
34
+ end
35
+
36
+ def [](index)
37
+ @components[index]
38
+ end
39
+
40
+ def length
41
+ @components.length
42
+ end
43
+
44
+ alias size length
45
+
46
+ # Comparison operations
47
+ def <=>(other)
48
+ case other
49
+ when EncodeM::Composite
50
+ @encoded <=> other.encoded
51
+ when EncodeM::Numeric, EncodeM::String
52
+ # Single values sort before composites with same first element
53
+ # This maintains hierarchical ordering
54
+ first_comparison = @components.first <=> other
55
+ first_comparison == 0 ? 1 : first_comparison
56
+ else
57
+ nil
58
+ end
59
+ end
60
+
61
+ def ==(other)
62
+ case other
63
+ when EncodeM::Composite
64
+ @components == other.components
65
+ when Array
66
+ to_a == other
67
+ else
68
+ false
69
+ end
70
+ end
71
+
72
+ alias eql? ==
73
+
74
+ def hash
75
+ @components.hash
76
+ end
77
+
78
+ private
79
+
80
+ def normalize_component(value)
81
+ case value
82
+ when EncodeM::Numeric, EncodeM::String
83
+ value
84
+ when EncodeM::Composite
85
+ raise ArgumentError, "Cannot nest composite keys"
86
+ when ::Numeric # Use :: to ensure we get Ruby's Numeric
87
+ EncodeM::Numeric.new(value)
88
+ when ::String
89
+ EncodeM::String.new(value)
90
+ when NilClass
91
+ EncodeM::String.new("") # nil becomes empty string in M
92
+ else
93
+ raise ArgumentError, "Unsupported type in composite key: #{value.class}"
94
+ end
95
+ end
96
+
97
+ def encode_composite(components)
98
+ encoded_parts = components.map(&:to_encoded)
99
+
100
+ # Join with KEY_DELIMITER (0x00)
101
+ # Each component is separated by 0x00 to maintain hierarchical sorting
102
+ encoded_parts.join([Encoder::KEY_DELIMITER].pack('C'))
103
+ end
104
+ end
105
+ end
data/lib/encode_m/decoder.rb CHANGED
@@ -1,4 +1,4 @@
1
- # Decoder for M language numeric encoding
1
+ # Decoder for M language encoding (numeric and string)
2
2
  module EncodeM
3
3
  class Decoder
4
4
  POS_DECODE = Encoder::POS_CODE.each_with_index.map { |v, i| [v, i] }.to_h.freeze
@@ -6,14 +6,49 @@ module EncodeM
6
6
 
7
7
  def self.decode(encoded_bytes)
8
8
  bytes = encoded_bytes.unpack('C*')
9
- return 0 if bytes[0] == Encoder::SUBSCRIPT_ZERO
10
-
9
+
10
+ # Check for string prefix
11
+ if bytes[0] == Encoder::STR_SUB_PREFIX
12
+ decode_string(bytes)
13
+ elsif bytes[0] == Encoder::SUBSCRIPT_ZERO
14
+ 0
15
+ else
16
+ decode_numeric(bytes)
17
+ end
18
+ end
19
+
20
+ def self.decode_composite(encoded_bytes)
21
+ components = []
22
+ bytes = encoded_bytes.unpack('C*')
23
+ current = []
24
+
25
+ bytes.each do |byte|
26
+ if byte == Encoder::KEY_DELIMITER
27
+ # End of component
28
+ unless current.empty?
29
+ components << decode(current.pack('C*'))
30
+ current = []
31
+ end
32
+ else
33
+ current << byte
34
+ end
35
+ end
36
+
37
+ # Don't forget the last component
38
+ components << decode(current.pack('C*')) unless current.empty?
39
+
40
+ components
41
+ end
42
+
43
+ private
44
+
45
+ def self.decode_numeric(bytes)
11
46
  first_byte = bytes[0]
12
-
47
+
13
48
  # Determine if negative based on first byte
14
49
  # Negative: 0x3B-0x43, Positive: 0xBC-0xC4
15
50
  is_negative = first_byte < Encoder::SUBSCRIPT_ZERO
16
-
51
+
17
52
  if is_negative
18
53
  decode_table = NEG_DECODE
19
54
  else
@@ -37,5 +72,24 @@ module EncodeM
37
72
 
38
73
  is_negative ? -result : result
39
74
  end
75
+
76
+ def self.decode_string(bytes)
77
+ result = []
78
+ i = 1 # Skip the 0xFF prefix
79
+
80
+ while i < bytes.length
81
+ if bytes[i] == Encoder::STR_SUB_ESCAPE && i + 1 < bytes.length
82
+ # Unescape: next byte is XORed with 0xFF
83
+ result << (bytes[i + 1] ^ 0xFF)
84
+ i += 2
85
+ else
86
+ result << bytes[i]
87
+ i += 1
88
+ end
89
+ end
90
+
91
+ # Force UTF-8 encoding for proper string handling
92
+ result.pack('C*').force_encoding('UTF-8')
93
+ end
40
94
  end
41
95
  end
data/lib/encode_m/numeric.rb CHANGED
@@ -59,12 +59,30 @@ module EncodeM
59
59
 
60
60
  # M language feature: encoded comparison
61
61
  def <=>(other)
62
- @encoded <=> self.class.new(other).encoded
62
+ case other
63
+ when EncodeM::Numeric
64
+ @encoded <=> other.encoded
65
+ when EncodeM::String
66
+ -1 # Numbers always sort before strings in M language
67
+ when EncodeM::Composite
68
+ # Let Composite handle the comparison
69
+ -(other <=> self)
70
+ when ::Numeric # Ruby's Numeric; a bare Numeric resolves to EncodeM::Numeric inside this module
71
+ @encoded <=> self.class.new(other).encoded
72
+ else
73
+ nil
74
+ end
63
75
  end
64
76
 
65
77
  def ==(other)
66
- return false unless other.is_a?(self.class) || other.is_a?(::Numeric)
67
- @value == coerce_value(other)
78
+ case other
79
+ when EncodeM::Numeric
80
+ @value == other.value
81
+ when Numeric
82
+ @value == other
83
+ else
84
+ false
85
+ end
68
86
  end
69
87
 
70
88
  def abs
@@ -91,11 +109,6 @@ module EncodeM
91
109
  end
92
110
  end
93
111
 
94
- # Direct encoded comparison - key M language feature
95
- def encoded_compare(other)
96
- @encoded <=> other.encoded
97
- end
98
-
99
112
  private
100
113
 
101
114
  def parse_value(val)
@@ -105,10 +118,10 @@ module EncodeM
105
118
  when Float
106
119
  raise ArgumentError, "Cannot represent Infinity" if val.infinite?
107
120
  raise ArgumentError, "Cannot represent NaN" if val.nan?
108
- val
109
- when String
121
+ val.to_i # M language only supports integer encoding
122
+ when ::String
110
123
  if val.include?('.')
111
- Float(val)
124
+ Float(val).to_i # M language only supports integer encoding
112
125
  else
113
126
  Integer(val)
114
127
  end
data/lib/encode_m/string.rb ADDED
@@ -0,0 +1,85 @@
1
+ # String encoding for M language subscripts
2
+ module EncodeM
3
+ class String
4
+ include Comparable
5
+
6
+ attr_reader :value, :encoded
7
+
8
+ def initialize(value)
9
+ @value = value.to_s
10
+ @encoded = encode_string(@value)
11
+ end
12
+
13
+ def to_s
14
+ @value
15
+ end
16
+
17
+ def to_encoded
18
+ @encoded
19
+ end
20
+
21
+ def inspect
22
+ "EncodeM::String(#{@value.inspect})"
23
+ end
24
+
25
+ # String-specific predicates
26
+ def empty?
27
+ @value.empty?
28
+ end
29
+
30
+ def length
31
+ @value.length
32
+ end
33
+
34
+ # Comparison operations
35
+ def <=>(other)
36
+ case other
37
+ when EncodeM::String
38
+ @encoded <=> other.encoded
39
+ when EncodeM::Numeric
40
+ 1 # Strings always sort after numbers in M language
41
+ when EncodeM::Composite
42
+ # Let Composite handle the comparison
43
+ -(other <=> self)
44
+ else
45
+ nil
46
+ end
47
+ end
48
+
49
+ def ==(other)
50
+ case other
51
+ when EncodeM::String
52
+ @value == other.value
53
+ when ::String
54
+ @value == other
55
+ else
56
+ false
57
+ end
58
+ end
59
+
60
+ alias eql? ==
61
+
62
+ def hash
63
+ @value.hash
64
+ end
65
+
66
+ private
67
+
68
+ def encode_string(str)
69
+ result = [Encoder::STR_SUB_PREFIX] # 0xFF prefix for strings
70
+
71
+ str.bytes.each do |byte|
72
+ if byte == Encoder::KEY_DELIMITER || byte == Encoder::STR_SUB_ESCAPE
73
+ # Escape special bytes: 0x00 and 0x01
74
+ # Use 0x01 followed by (byte XOR 0xFF)
75
+ result << Encoder::STR_SUB_ESCAPE
76
+ result << (byte ^ 0xFF)
77
+ else
78
+ result << byte
79
+ end
80
+ end
81
+
82
+ result.pack('C*')
83
+ end
84
+ end
85
+ end
data/lib/encode_m/version.rb CHANGED
@@ -1,4 +1,4 @@
1
1
  module EncodeM
2
- VERSION = "2.0.0"
3
- # Honoring 40 years of M language (MUMPS) innovation from GT.M/YottaDB
2
+ VERSION = "3.0.0"
3
+ # Complete M language subscript encoding - now with strings and composite keys!
4
4
  end
data/lib/encode_m.rb CHANGED
@@ -1,31 +1,69 @@
1
- # EncodeM - Bringing M language numeric encoding to Ruby
1
+ # EncodeM - Complete M language subscript encoding for Ruby
2
2
  # Based on YottaDB/GT.M's 40-year production-tested algorithm
3
3
 
4
4
  require 'encode_m/version'
5
5
  require 'encode_m/encoder'
6
6
  require 'encode_m/decoder'
7
7
  require 'encode_m/numeric'
8
+ require 'encode_m/string'
9
+ require 'encode_m/composite'
8
10
 
9
11
  module EncodeM
10
12
  class Error < StandardError; end
11
13
 
12
- # Factory method honoring M language convention
13
- def self.new(value)
14
- Numeric.new(value)
14
+ # Factory method supporting all M types
15
+ def self.new(*values)
16
+ if values.length == 1
17
+ create_single(values[0])
18
+ else
19
+ Composite.new(*values)
20
+ end
15
21
  end
16
22
 
17
23
  # Decode - reverse the M encoding
18
24
  def self.decode(encoded)
19
25
  Decoder.decode(encoded)
20
26
  end
27
+
28
+ # Decode composite keys
29
+ def self.decode_composite(encoded)
30
+ Decoder.decode_composite(encoded)
31
+ end
21
32
 
22
- # Alias for M language users
23
- def self.M(value)
24
- Numeric.new(value)
33
+ # M language style constructor
34
+ def self.M(*values)
35
+ if values.length == 1
36
+ create_single(values[0])
37
+ else
38
+ Composite.new(*values)
39
+ end
40
+ end
41
+
42
+ private
43
+
44
+ def self.create_single(value)
45
+ case value
46
+ when EncodeM::Numeric, EncodeM::String, EncodeM::Composite
47
+ value # Already encoded
48
+ when ::Numeric # Use :: to ensure we get Ruby's Numeric, not EncodeM::Numeric
49
+ Numeric.new(value)
50
+ when ::String
51
+ # Try to parse as a number first
52
+ begin
53
+ Numeric.new(value)
54
+ rescue ArgumentError
55
+ # Not a number, treat as string
56
+ String.new(value)
57
+ end
58
+ when NilClass
59
+ String.new("") # nil becomes empty string in M
60
+ else
61
+ raise ArgumentError, "Unsupported type: #{value.class}"
62
+ end
25
63
  end
26
64
  end
27
65
 
28
66
  # Global convenience method (like M language global functions)
29
- def M(value)
30
- EncodeM::Numeric.new(value)
67
+ def M(*values)
68
+ EncodeM.M(*values)
31
69
  end
data/logo.png ADDED
Binary file
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: encode_m
3
3
  version: !ruby/object:Gem::Version
4
- version: 2.0.0
4
+ version: 3.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Steve Shreeve
@@ -79,11 +79,13 @@ dependencies:
79
79
  - - "~>"
80
80
  - !ruby/object:Gem::Version
81
81
  version: '2.10'
82
- description: EncodeM brings a 40-year production-tested numeric encoding algorithm
83
- from YottaDB/GT.M to Ruby. This algorithm from the M language (MUMPS) provides efficient
84
- numeric handling with the unique property that encoded byte strings maintain sort
85
- order. Perfect for database operations, financial calculations, and systems requiring
86
- efficient sortable number storage. A practical alternative between Float and BigDecimal.
82
+ description: EncodeM v3.0 brings complete M language (MUMPS) subscript encoding to
83
+ Ruby, supporting numbers, strings, and composite keys with perfect sort order. Build
84
+ hierarchical database keys like M("users", 42, "email") that sort correctly as raw
85
+ bytes. This 40-year production-tested algorithm from YottaDB/GT.M powers Epic (70%
86
+ of US hospitals) and VistA. Perfect for B-tree indexes, key-value stores, and any
87
+ system requiring sortable hierarchical keys. All types maintain correct ordering
88
+ when compared as byte strings - no decoding needed.
87
89
  email:
88
90
  - steve.shreeve@gmail.com
89
91
  executables: []
@@ -97,10 +99,13 @@ files:
97
99
  - Rakefile
98
100
  - encode_m.gemspec
99
101
  - lib/encode_m.rb
102
+ - lib/encode_m/composite.rb
100
103
  - lib/encode_m/decoder.rb
101
104
  - lib/encode_m/encoder.rb
102
105
  - lib/encode_m/numeric.rb
106
+ - lib/encode_m/string.rb
103
107
  - lib/encode_m/version.rb
108
+ - logo.png
104
109
  homepage: https://github.com/shreeve/encode_m
105
110
  licenses:
106
111
  - MIT
@@ -110,14 +115,10 @@ metadata:
110
115
  changelog_uri: https://github.com/shreeve/encode_m/blob/main/CHANGELOG.md
111
116
  bug_tracker_uri: https://github.com/shreeve/encode_m/issues
112
117
  documentation_uri: https://rubydoc.info/gems/encode_m
113
- post_install_message: |
114
- Thank you for installing EncodeM!
115
-
116
- Quick start:
117
- require 'encode_m'
118
- a = M(42) # Create a number with M language encoding
119
-
120
- Learn more: https://github.com/shreeve/encode_m
118
+ post_install_message: "Thank you for installing EncodeM v3.0!\n\n\U0001F389 NEW: Complete
119
+ M language support - numbers, strings, and composite keys!\n\nQuick start:\n require
120
+ 'encode_m'\n\n # Numbers\n M(42)\n\n # Strings\n M(\"Hello\")\n\n # Composite
121
+ keys\n M(\"users\", 42, \"email\")\n\nLearn more: https://github.com/shreeve/encode_m\n"
121
122
  rdoc_options: []
122
123
  require_paths:
123
124
  - lib
@@ -134,5 +135,6 @@ required_rubygems_version: !ruby/object:Gem::Requirement
134
135
  requirements: []
135
136
  rubygems_version: 3.7.1
136
137
  specification_version: 4
137
- summary: M language numeric encoding for Ruby - sortable, efficient, production-tested
138
+ summary: Complete M language subscript encoding - numbers, strings, and composite
139
+ keys
138
140
  test_files: []