smart_message 0.0.10 → 0.0.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (169)
  1. checksums.yaml +4 -4
  2. data/.github/workflows/deploy-github-pages.yml +38 -0
  3. data/.gitignore +5 -0
  4. data/CHANGELOG.md +30 -0
  5. data/Gemfile.lock +35 -4
  6. data/README.md +169 -71
  7. data/Rakefile +29 -4
  8. data/docs/assets/images/ddq_architecture.svg +130 -0
  9. data/docs/assets/images/dlq_architecture.svg +115 -0
  10. data/docs/assets/images/enhanced-dual-publishing.svg +136 -0
  11. data/docs/assets/images/enhanced-fluent-api.svg +149 -0
  12. data/docs/assets/images/enhanced-microservices-routing.svg +115 -0
  13. data/docs/assets/images/enhanced-pattern-matching.svg +107 -0
  14. data/docs/assets/images/fluent-api-demo.svg +59 -0
  15. data/docs/assets/images/performance-comparison.svg +161 -0
  16. data/docs/assets/images/redis-basic-architecture.svg +53 -0
  17. data/docs/assets/images/redis-enhanced-architecture.svg +88 -0
  18. data/docs/assets/images/redis-queue-architecture.svg +101 -0
  19. data/docs/assets/images/smart_message.jpg +0 -0
  20. data/docs/assets/images/smart_message_walking.jpg +0 -0
  21. data/docs/assets/images/smartmessage_architecture_overview.svg +173 -0
  22. data/docs/assets/images/transport-comparison-matrix.svg +171 -0
  23. data/docs/assets/javascripts/mathjax.js +17 -0
  24. data/docs/assets/stylesheets/extra.css +51 -0
  25. data/docs/{addressing.md → core-concepts/addressing.md} +5 -7
  26. data/docs/{architecture.md → core-concepts/architecture.md} +78 -138
  27. data/docs/{dispatcher.md → core-concepts/dispatcher.md} +21 -21
  28. data/docs/{message_filtering.md → core-concepts/message-filtering.md} +2 -3
  29. data/docs/{message_processing.md → core-concepts/message-processing.md} +17 -17
  30. data/docs/{troubleshooting.md → development/troubleshooting.md} +7 -7
  31. data/docs/{examples.md → getting-started/examples.md} +115 -89
  32. data/docs/{getting-started.md → getting-started/quick-start.md} +47 -18
  33. data/docs/guides/redis-queue-getting-started.md +697 -0
  34. data/docs/guides/redis-queue-patterns.md +889 -0
  35. data/docs/guides/redis-queue-production.md +1091 -0
  36. data/docs/index.md +64 -0
  37. data/docs/{dead_letter_queue.md → reference/dead-letter-queue.md} +2 -3
  38. data/docs/{logging.md → reference/logging.md} +1 -1
  39. data/docs/{message_deduplication.md → reference/message-deduplication.md} +1 -0
  40. data/docs/{proc_handlers_summary.md → reference/proc-handlers.md} +7 -6
  41. data/docs/{serializers.md → reference/serializers.md} +3 -5
  42. data/docs/{transports.md → reference/transports.md} +133 -11
  43. data/docs/transports/memory-transport.md +374 -0
  44. data/docs/transports/redis-enhanced-transport.md +524 -0
  45. data/docs/transports/redis-queue-transport.md +1304 -0
  46. data/docs/transports/redis-transport-comparison.md +496 -0
  47. data/docs/transports/redis-transport.md +509 -0
  48. data/examples/README.md +98 -5
  49. data/examples/city_scenario/911_emergency_call_flow.svg +99 -0
  50. data/examples/city_scenario/README.md +515 -0
  51. data/examples/city_scenario/ai_visitor_intelligence_flow.svg +108 -0
  52. data/examples/city_scenario/citizen.rb +195 -0
  53. data/examples/city_scenario/city_diagram.svg +125 -0
  54. data/examples/city_scenario/common/health_monitor.rb +80 -0
  55. data/examples/city_scenario/common/logger.rb +30 -0
  56. data/examples/city_scenario/emergency_dispatch_center.rb +270 -0
  57. data/examples/city_scenario/fire_department.rb +446 -0
  58. data/examples/city_scenario/fire_emergency_flow.svg +95 -0
  59. data/examples/city_scenario/health_department.rb +100 -0
  60. data/examples/city_scenario/health_monitoring_system.svg +130 -0
  61. data/examples/city_scenario/house.rb +244 -0
  62. data/examples/city_scenario/local_bank.rb +217 -0
  63. data/examples/city_scenario/messages/emergency_911_message.rb +81 -0
  64. data/examples/city_scenario/messages/emergency_resolved_message.rb +43 -0
  65. data/examples/city_scenario/messages/fire_dispatch_message.rb +43 -0
  66. data/examples/city_scenario/messages/fire_emergency_message.rb +45 -0
  67. data/examples/city_scenario/messages/health_check_message.rb +22 -0
  68. data/examples/city_scenario/messages/health_status_message.rb +35 -0
  69. data/examples/city_scenario/messages/police_dispatch_message.rb +46 -0
  70. data/examples/city_scenario/messages/silent_alarm_message.rb +38 -0
  71. data/examples/city_scenario/police_department.rb +316 -0
  72. data/examples/city_scenario/redis_monitor.rb +129 -0
  73. data/examples/city_scenario/redis_stats.rb +743 -0
  74. data/examples/city_scenario/room_for_improvement.md +240 -0
  75. data/examples/city_scenario/security_emergency_flow.svg +95 -0
  76. data/examples/city_scenario/service_internal_architecture.svg +154 -0
  77. data/examples/city_scenario/smart_message_ai_agent.rb +364 -0
  78. data/examples/city_scenario/start_demo.sh +236 -0
  79. data/examples/city_scenario/stop_demo.sh +106 -0
  80. data/examples/city_scenario/visitor.rb +631 -0
  81. data/examples/{10_message_deduplication.rb → memory/01_message_deduplication_demo.rb} +1 -1
  82. data/examples/{09_dead_letter_queue_demo.rb → memory/02_dead_letter_queue_demo.rb} +13 -40
  83. data/examples/{01_point_to_point_orders.rb → memory/03_point_to_point_orders.rb} +1 -1
  84. data/examples/{02_publish_subscribe_events.rb → memory/04_publish_subscribe_events.rb} +2 -2
  85. data/examples/{03_many_to_many_chat.rb → memory/05_many_to_many_chat.rb} +4 -4
  86. data/examples/{show_me.rb → memory/06_pretty_print_demo.rb} +1 -1
  87. data/examples/{05_proc_handlers.rb → memory/07_proc_handlers_demo.rb} +2 -2
  88. data/examples/{06_custom_logger_example.rb → memory/08_custom_logger_demo.rb} +17 -14
  89. data/examples/{07_error_handling_scenarios.rb → memory/09_error_handling_demo.rb} +4 -4
  90. data/examples/{08_entity_addressing_basic.rb → memory/10_entity_addressing_basic.rb} +8 -8
  91. data/examples/{08_entity_addressing_with_filtering.rb → memory/11_entity_addressing_with_filtering.rb} +6 -6
  92. data/examples/{09_regex_filtering_microservices.rb → memory/12_regex_filtering_microservices.rb} +2 -2
  93. data/examples/{10_header_block_configuration.rb → memory/13_header_block_configuration.rb} +6 -6
  94. data/examples/{11_global_configuration_example.rb → memory/14_global_configuration_demo.rb} +19 -8
  95. data/examples/{show_logger.rb → memory/15_logger_demo.rb} +1 -1
  96. data/examples/memory/README.md +163 -0
  97. data/examples/memory/memory_transport_architecture.svg +90 -0
  98. data/examples/memory/point_to_point_pattern.svg +94 -0
  99. data/examples/memory/publish_subscribe_pattern.svg +125 -0
  100. data/examples/{04_redis_smart_home_iot.rb → redis/01_smart_home_iot_demo.rb} +5 -5
  101. data/examples/redis/README.md +230 -0
  102. data/examples/redis/alert_system_flow.svg +127 -0
  103. data/examples/redis/dashboard_status_flow.svg +107 -0
  104. data/examples/redis/device_command_flow.svg +113 -0
  105. data/examples/redis/redis_transport_architecture.svg +115 -0
  106. data/examples/{smart_home_iot_dataflow.md → redis/smart_home_iot_dataflow.md} +4 -116
  107. data/examples/redis/smart_home_system_architecture.svg +133 -0
  108. data/examples/redis_enhanced/README.md +319 -0
  109. data/examples/redis_enhanced/enhanced_01_basic_patterns.rb +233 -0
  110. data/examples/redis_enhanced/enhanced_02_fluent_api.rb +331 -0
  111. data/examples/redis_enhanced/enhanced_03_dual_publishing.rb +281 -0
  112. data/examples/redis_enhanced/enhanced_04_advanced_routing.rb +419 -0
  113. data/examples/redis_queue/01_basic_messaging.rb +221 -0
  114. data/examples/redis_queue/01_comprehensive_examples.rb +508 -0
  115. data/examples/redis_queue/02_pattern_routing.rb +405 -0
  116. data/examples/redis_queue/03_fluent_api.rb +422 -0
  117. data/examples/redis_queue/04_load_balancing.rb +486 -0
  118. data/examples/redis_queue/05_microservices.rb +735 -0
  119. data/examples/redis_queue/06_emergency_alerts.rb +777 -0
  120. data/examples/redis_queue/07_queue_management.rb +587 -0
  121. data/examples/redis_queue/README.md +366 -0
  122. data/examples/redis_queue/enhanced_01_basic_patterns.rb +233 -0
  123. data/examples/redis_queue/enhanced_02_fluent_api.rb +331 -0
  124. data/examples/redis_queue/enhanced_03_dual_publishing.rb +281 -0
  125. data/examples/redis_queue/enhanced_04_advanced_routing.rb +419 -0
  126. data/examples/redis_queue/redis_queue_architecture.svg +148 -0
  127. data/ideas/README.md +41 -0
  128. data/ideas/agents.md +1001 -0
  129. data/ideas/database_transport.md +980 -0
  130. data/ideas/improvement.md +359 -0
  131. data/ideas/meshage.md +1788 -0
  132. data/ideas/message_discovery.md +178 -0
  133. data/ideas/message_schema.md +1381 -0
  134. data/lib/smart_message/.idea/.gitignore +8 -0
  135. data/lib/smart_message/.idea/markdown.xml +6 -0
  136. data/lib/smart_message/.idea/misc.xml +4 -0
  137. data/lib/smart_message/.idea/modules.xml +8 -0
  138. data/lib/smart_message/.idea/smart_message.iml +16 -0
  139. data/lib/smart_message/.idea/vcs.xml +6 -0
  140. data/lib/smart_message/addressing.rb +15 -0
  141. data/lib/smart_message/base.rb +0 -2
  142. data/lib/smart_message/configuration.rb +1 -1
  143. data/lib/smart_message/logger.rb +15 -4
  144. data/lib/smart_message/plugins.rb +5 -2
  145. data/lib/smart_message/serializer.rb +14 -0
  146. data/lib/smart_message/transport/redis_enhanced_transport.rb +399 -0
  147. data/lib/smart_message/transport/redis_queue_transport.rb +555 -0
  148. data/lib/smart_message/transport/registry.rb +1 -0
  149. data/lib/smart_message/transport.rb +34 -1
  150. data/lib/smart_message/version.rb +1 -1
  151. data/lib/smart_message.rb +5 -52
  152. data/mkdocs.yml +184 -0
  153. data/p2p_plan.md +326 -0
  154. data/p2p_roadmap.md +287 -0
  155. data/smart_message.gemspec +2 -0
  156. data/smart_message.svg +51 -0
  157. metadata +170 -44
  158. data/docs/README.md +0 -57
  159. data/examples/dead_letters.jsonl +0 -12
  160. data/examples/temp.txt +0 -94
  161. data/examples/tmux_chat/README.md +0 -283
  162. data/examples/tmux_chat/bot_agent.rb +0 -278
  163. data/examples/tmux_chat/human_agent.rb +0 -199
  164. data/examples/tmux_chat/room_monitor.rb +0 -160
  165. data/examples/tmux_chat/shared_chat_system.rb +0 -328
  166. data/examples/tmux_chat/start_chat_demo.sh +0 -190
  167. data/examples/tmux_chat/stop_chat_demo.sh +0 -22
  168. /data/docs/{properties.md → core-concepts/properties.md} +0 -0
  169. /data/docs/{ideas_to_think_about.md → development/ideas.md} +0 -0
data/ideas/meshage.md ADDED
@@ -0,0 +1,1788 @@
1
+ # Meshage: True Mesh Network Transport for SmartMessage
2
+
3
+ ## Overview
4
+
5
+ Meshage (Mesh + Message) would be a fully decentralized mesh network transport for SmartMessage that enables resilient message passing without any central coordination. In a true mesh network, publishers don't need to know where subscribers are located - they simply publish messages addressed to a subscriber/service, and the mesh network automatically routes the message through intermediate nodes until it reaches the destination or expires.
6
+
7
+ ## Lessons from P2P2 Gem
8
+
9
+ The p2p2 Ruby gem provides excellent patterns for NAT traversal and P2P connection management that directly apply to mesh networking:
10
+
11
+ ### NAT Hole Punching Architecture
12
+ P2P2 uses a pairing daemon (paird) that coordinates P2P connections between clients behind NATs:
13
+
14
+ ```ruby
15
+ # Adapted from p2p2's approach for SmartMessage mesh nodes
16
+ class MeshHolePunchingService
17
+ def initialize(coordination_port = 4040)
18
+ @coordination_servers = [] # Multiple servers for redundancy
19
+ @active_sessions = {} # node_id => session_info
20
+
21
+ # Create multiple UDP sockets on different ports (like p2p2)
22
+ 10.times do |i|
23
+ port = coordination_port + i
24
+ socket = UDPSocket.new
25
+ socket.bind("0.0.0.0", port)
26
+ @coordination_servers << { socket: socket, port: port }
27
+ end
28
+ end
29
+
30
+ # Node announces itself to establish P2P connections
31
+ def announce_node(node_id, capabilities)
32
+ # Similar to p2p2's "title" concept but for mesh nodes
33
+ session_data = {
34
+ node_id: node_id,
35
+ local_services: capabilities[:services],
36
+ is_bridge: capabilities[:bridge_node],
37
+ announced_at: Time.now
38
+ }
39
+
40
+ # Send to random coordination port (load balancing like p2p2)
41
+ server = @coordination_servers.sample
42
+ server[:socket].send(session_data.to_json, 0,
43
+ coordination_address, server[:port])
44
+ end
45
+
46
+ # Coordinate hole punching between two nodes
47
+ def coordinate_connection(node1_id, node2_id)
48
+ node1_session = @active_sessions[node1_id]
49
+ node2_session = @active_sessions[node2_id]
50
+
51
+ return unless node1_session && node2_session
52
+
53
+ # Exchange address info (like p2p2's paird logic)
54
+ send_peer_address(node1_session, node2_session[:address])
55
+ send_peer_address(node2_session, node1_session[:address])
56
+ end
57
+ end
58
+ ```
59
+
60
+ ### Connection Management Patterns
61
+ P2P2's worker architecture with role-based socket management:
62
+
63
+ ```ruby
64
+ class MeshNodeWorker
65
+ def initialize
66
+ @sockets = {}
67
+ @socket_roles = {} # socket => :mesh_peer, :local_service, :bridge
68
+ @read_sockets = []
69
+ @write_sockets = []
70
+
71
+ # Buffer management (adapted from p2p2's buffering)
72
+ @peer_buffers = {} # peer_id => { read_buffer: "", write_buffer: "" }
73
+ @buffer_limits = {
74
+ max_buffer_size: 50 * 1024 * 1024, # 50MB like p2p2
75
+ resume_threshold: 25 * 1024 * 1024 # Resume when below 25MB
76
+ }
77
+ end
78
+
79
+ def main_loop
80
+ loop do
81
+ readable, writable = IO.select(@read_sockets, @write_sockets)
82
+
83
+ readable.each do |socket|
84
+ role = @socket_roles[socket]
85
+ case role
86
+ when :mesh_peer
87
+ handle_peer_message(socket)
88
+ when :local_service
89
+ handle_service_message(socket)
90
+ when :bridge
91
+ handle_bridge_message(socket)
92
+ end
93
+ end
94
+
95
+ writable.each do |socket|
96
+ flush_write_buffer(socket)
97
+ end
98
+ end
99
+ end
100
+
101
+ # Flow control like p2p2 - pause reading when buffers full
102
+ def handle_buffer_overflow(peer_id)
103
+ peer_socket = find_peer_socket(peer_id)
104
+ @read_sockets.delete(peer_socket) # Pause reading
105
+
106
+ # Resume when buffer drains (checked periodically)
107
+ schedule_buffer_check(peer_id)
108
+ end
109
+ end
110
+ ```
111
+
112
+ ### Multi-Port UDP Coordination
113
+ P2P2 uses multiple UDP ports to improve NAT traversal success:
114
+
115
+ ```ruby
116
+ class MeshCoordinationService
117
+ def initialize(base_port = 4040)
118
+ @coordination_ports = []
119
+
120
+ # Create 10 coordination ports like p2p2
121
+ 10.times do |i|
122
+ port = base_port + i
123
+ socket = UDPSocket.new
124
+ socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEPORT, 1)
125
+ socket.bind("0.0.0.0", port)
126
+
127
+ @coordination_ports << {
128
+ socket: socket,
129
+ port: port,
130
+ active_sessions: {}
131
+ }
132
+ end
133
+ end
134
+
135
+ def coordinate_mesh_connection(requester_id, target_service)
136
+ # Find nodes that provide target_service
137
+ candidate_nodes = find_service_providers(target_service)
138
+
139
+ candidate_nodes.each do |node_info|
140
+ # Attempt hole punching to each candidate
141
+ attempt_hole_punch(requester_id, node_info[:node_id])
142
+ end
143
+ end
144
+
145
+ # P2P2-style room/session management for mesh
146
+ def manage_mesh_sessions
147
+ @coordination_ports.each do |port_info|
148
+ port_info[:active_sessions].each do |session_id, session|
149
+ if session_expired?(session)
150
+ cleanup_session(session_id)
151
+ end
152
+ end
153
+ end
154
+ end
155
+ end
156
+ ```
157
+
158
+ ### TCP Tunneling Over UDP Holes
159
+ P2P2 establishes UDP holes then creates TCP connections through them:
160
+
161
+ ```ruby
162
+ class MeshTCPTunnel
163
+ def initialize(local_service_port, remote_peer_address)
164
+ @local_service_port = local_service_port
165
+ @remote_peer_address = remote_peer_address
166
+ @tcp_connections = {}
167
+
168
+ # Create tunnel socket through UDP hole (like p2p2)
169
+ @tunnel_socket = establish_tcp_through_udp_hole
170
+ end
171
+
172
+ def establish_tcp_through_udp_hole
173
+ # First establish UDP hole
174
+ udp_socket = create_udp_hole(@remote_peer_address)
175
+
176
+ # Then create TCP connection using same local port
177
+ tcp_socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
178
+ tcp_socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
179
+ tcp_socket.bind(udp_socket.local_address) # Reuse UDP hole port
180
+
181
+ # Connect through the hole (may require multiple attempts like p2p2)
182
+ retry_count = 0
183
+ begin
184
+ tcp_socket.connect_nonblock(@remote_peer_address)
185
+ rescue IO::WaitWritable
186
+ retry_count += 1
187
+ if retry_count < 5 # P2P2's PUNCH_LIMIT
188
+ sleep(0.1)
189
+ retry
190
+ else
191
+ raise "Failed to establish TCP tunnel after #{retry_count} attempts"
192
+ end
193
+ end
194
+
195
+ tcp_socket
196
+ end
197
+
198
+ # Bridge local service to remote mesh node
199
+ def bridge_service_traffic
200
+ local_service = TCPSocket.new("127.0.0.1", @local_service_port)
201
+
202
+ # Bidirectional forwarding like p2p2's tun/dst pattern
203
+ Thread.new do
204
+ loop do
205
+ data = local_service.read_nonblock(1024 * 1024) # P2P2's READ_SIZE
206
+ @tunnel_socket.write(data)
207
+ rescue IO::WaitReadable
208
+ # Handle using IO.select like p2p2
209
+ end
210
+ end
211
+
212
+ Thread.new do
213
+ loop do
214
+ data = @tunnel_socket.read_nonblock(1024 * 1024)
215
+ local_service.write(data)
216
+ rescue IO::WaitReadable
217
+ # Handle using IO.select like p2p2
218
+ end
219
+ end
220
+ end
221
+ end
222
+ ```
223
+
224
+ ### Key Improvements for SmartMessage Mesh
225
+
226
+ **Better Service Discovery:**
227
+ P2P2 uses simple "room" names. Mesh needs service-based discovery:
228
+
229
+ ```ruby
230
+ # P2P2 style
231
+ "room_name" # Simple string matching
232
+
233
+ # Mesh style
234
+ {
235
+ service_name: "inventory-service",
236
+ capabilities: [:read, :write],
237
+ version: "2.1",
238
+ region: "us-west"
239
+ }
240
+ ```
241
+
242
+ **Message Routing vs Direct Tunneling:**
243
+ P2P2 creates direct tunnels. Mesh needs multi-hop routing:
244
+
245
+ ```
246
+ # P2P2: Direct tunnel
247
+ Client A → Coordination Server → Client B
248
+ (establish direct tunnel)
249
+
250
+ # Mesh: Multi-hop routing
251
+ Publisher → Node A → Node C → Node F → Subscriber
252
+ (route through intermediate nodes)
253
+ ```
254
+
255
+ **SmartMessage Integration:**
256
+ P2P2 forwards raw TCP streams. Mesh handles typed messages:
257
+
258
+ ```ruby
259
+ # P2P2: Raw data forwarding
260
+ tun_socket.write(raw_data)
261
+
262
+ # Mesh: SmartMessage integration
263
+ mesh_connection.send_message(order_message)
264
+ ```
265
+
266
+ ### P2P2 Advantages for Mesh
267
+
268
+ 1. **Proven NAT Traversal**: P2P2's hole punching works reliably across different NAT types
269
+ 2. **Efficient Buffering**: Flow control prevents memory exhaustion during high traffic
270
+ 3. **Multi-Port Strategy**: Increases success rate of establishing connections
271
+ 4. **Graceful Degradation**: Handles connection failures and retries intelligently
272
+ 5. **Resource Management**: Proper cleanup of expired sessions and connections
273
+ 6. **Non-Blocking I/O**: Uses IO.select for efficient concurrent connection handling
274
+
275
+ The p2p2 gem provides the low-level P2P connection primitives that mesh networking builds upon - specifically NAT traversal, connection establishment, and traffic forwarding. For SmartMessage mesh, we'd use these patterns as the foundation layer while adding service discovery, message routing, and distributed coordination on top.
276
+
277
+ ## Key Mesh Network Principles
278
+
279
+ ### 1. Complete Decentralization
280
+ Every node in the mesh can route messages. No central authority, brokers, or coordination points.
281
+
282
+ ### 2. Location-Agnostic Publishing
283
+ Publishers send messages to subscriber IDs or service names without knowing which physical node hosts them:
284
+
285
+ ```ruby
286
+ # Publisher doesn't know or care where inventory-service runs
287
+ OrderMessage.new(
288
+ order_id: "123",
289
+ items: ["widget", "gadget"]
290
+ ).publish(to: "inventory-service")
291
+
292
+ # The mesh network figures out routing automatically
293
+ ```
294
+
295
+ ### 3. Multi-Hop Message Routing
296
+ Messages travel through intermediate nodes to reach their destination:
297
+
298
+ ```
299
+ Node A → Node C → Node F → Node K (inventory-service)
300
+ ```
301
+
302
+ ### 4. Self-Terminating Messages
303
+ Messages include TTL (Time To Live) or hop limits to prevent infinite routing loops.
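+
+ A minimal sketch of the hop-limit check a forwarding node applies, matching the `_sm_header.ttl` handling used in the routing sketches later in this document (the method name here is otherwise an assumption):
+
+ ```ruby
+ # Hypothetical sketch: consume one hop, drop the message once its TTL is spent.
+ def forward_with_ttl(message, peer_connections)
+   message._sm_header.ttl -= 1
+   return if message._sm_header.ttl <= 0   # expired: stop routing instead of looping
+
+   peer_connections.each { |peer| peer.send_message(message) }
+ end
+ ```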
304
+
305
+ ## Core Concepts
306
+
307
+ ### Peer Discovery - Local vs Inter-Network
308
+
309
+ Mesh networks need different discovery mechanisms for local vs remote networks:
310
+
311
+ #### Local Network Discovery (UDP Broadcast)
312
+ ```ruby
313
+ class LocalNetworkDiscovery
314
+ def initialize(mesh_node)
315
+ @mesh_node = mesh_node
316
+ @udp_port = 31337
317
+ @multicast_address = '224.220.221.222'
318
+ end
319
+
320
+ def start_discovery
321
+ # UDP multicast for local network discovery
322
+ start_udp_broadcaster
323
+ start_udp_listener
324
+ end
325
+
326
+ def broadcast_presence
327
+ message = {
328
+ node_id: @mesh_node.id,
329
+ services: @mesh_node.local_services,
330
+ tcp_port: @mesh_node.tcp_port,
331
+ is_bridge: @mesh_node.bridge_node?,
332
+ bridge_networks: @mesh_node.bridge_networks
333
+ }
334
+
335
+ @udp_socket.send(message.to_json, 0, @multicast_address, @udp_port)
336
+ end
337
+ end
338
+ ```
339
+
340
+ #### Bridge Nodes for Inter-Network Connectivity
341
+ ```ruby
342
+ class BridgeNode < MeshNode
343
+ def initialize(options = {})
344
+ super
345
+ @bridge_networks = options[:bridge_networks] || []
346
+ @external_connections = {} # network_id => [P2PConnection]
347
+ @bootstrap_nodes = options[:bootstrap_nodes] || []
348
+ end
349
+
350
+ def bridge_node?
351
+ true
352
+ end
353
+
354
+ def start
355
+ super
356
+
357
+ # Connect to other networks via TCP to known bridge nodes
358
+ connect_to_external_networks
359
+
360
+ # Advertise bridge capability in local UDP broadcasts
361
+ advertise_bridge_capability
362
+ end
363
+
364
+ private
365
+
366
+ def connect_to_external_networks
367
+ @bootstrap_nodes.each do |external_address|
368
+ # TCP connection to bridge nodes in other networks
369
+ connection = P2PConnection.new(external_address, protocol: :tcp)
370
+
371
+ begin
372
+ connection.establish_secure_channel(@keypair)
373
+ network_id = determine_network_id(external_address)
374
+ @external_connections[network_id] ||= []
375
+ @external_connections[network_id] << connection
376
+
377
+ # Exchange routing information with remote network
378
+ exchange_inter_network_routes(connection, network_id)
379
+ rescue => e
380
+ logger.warn "Failed to connect to external network #{external_address}: #{e}"
381
+ end
382
+ end
383
+ end
384
+
385
+ def route_message(message)
386
+ if local_network_destination?(message.to)
387
+ # Route within local network using UDP-discovered nodes
388
+ route_locally(message)
389
+ else
390
+ # Route to external network via TCP bridge connections
391
+ route_to_external_network(message)
392
+ end
393
+ end
394
+
395
+ def route_to_external_network(message)
396
+ target_network = determine_target_network(message.to)
397
+
398
+ if bridge_connections = @external_connections[target_network]
399
+ # Send via TCP to bridge nodes in target network
400
+ bridge_connections.each do |connection|
401
+ connection.send_message(message)
402
+ end
403
+ else
404
+ # Don't know target network - flood to all external connections
405
+ @external_connections.each_value do |connections|
406
+ connections.each { |conn| conn.send_message(message) }
407
+ end
408
+ end
409
+ end
410
+ end
411
+ ```
412
+
413
+ #### Network Topology Examples
414
+
415
+ **Single Local Network:**
416
+ ```
417
+ [Node A] ←UDP→ [Node B] ←UDP→ [Node C]
418
+ ↑ ↑
419
+ UDP multicast discovery works for all nodes
420
+ ```
421
+
422
+ **Multi-Network Mesh with Bridges:**
423
+ ```
424
+ Local Network 1: Local Network 2:
425
+ [Node A] ←UDP→ [Bridge B] [Bridge D] ←UDP→ [Node E]
426
+ ↑ ↑
427
+ TCP Bridge Connection
428
+ (crosses router boundaries)
429
+
430
+ Bridge B connects:
431
+ - Local nodes via UDP (A, others on network 1)
432
+ - Remote networks via TCP (Bridge D on network 2)
433
+ ```
434
+
435
+ ### Local Knowledge Model
436
+
437
+ Each node only knows about its immediate connections - this keeps the system scalable:
438
+
439
+ ```ruby
440
+ class MeshNode
441
+ def initialize
442
+ @node_id = generate_node_id
443
+
444
+ # ONLY know about directly connected peers
445
+ @connected_peers = {} # peer_id => P2PConnection
446
+
447
+ # ONLY know about local subscribers
448
+ @local_subscribers = {} # service_name => [callback_handlers]
449
+
450
+ # NO global knowledge of who subscribes to what on other nodes
451
+ @routing_cache = LRU.new(100) # Cache successful routes
452
+ end
453
+
454
+ def knows_local_subscribers_for?(service_name)
455
+ @local_subscribers.key?(service_name)
456
+ end
457
+
458
+ def knows_connected_peers
459
+ @connected_peers.keys
460
+ end
461
+
462
+ # This node does NOT know what services exist on remote nodes
463
+ def knows_remote_subscribers?
464
+ false # This is the key insight!
465
+ end
466
+ end
467
+ ```
468
+
469
+ ### Message Routing with Local Knowledge Only
470
+
471
+ ```ruby
472
+ class MeshRouter
473
+ def route_message(message)
474
+ return if already_processed?(message)
475
+ mark_as_processed(message)
476
+
477
+ # Check if we have local subscribers for this service
478
+ if has_local_subscribers?(message.to)
479
+ deliver_to_local_subscribers(message)
480
+ # Note: Don't return - message might also need to go to other nodes
481
+ end
482
+
483
+ # We DON'T know if other nodes have subscribers
484
+ # So we use discovery routing to all connected peers
485
+ forward_to_discovery(message)
486
+ end
487
+
488
+ private
489
+
490
+ def forward_to_discovery(message)
491
+ # Decrement TTL to prevent infinite loops
492
+ message._sm_header.ttl -= 1
493
+ return if message._sm_header.ttl <= 0
494
+
495
+ # Check routing cache for previously successful routes
496
+ if cached_route = @routing_cache[message.to]
497
+ forward_to_cached_peers(message, cached_route)
498
+ else
499
+ # No cached route - flood to all connected peers
500
+ flood_to_connected_peers(message)
501
+ end
502
+ end
503
+
504
+ def flood_to_connected_peers(message)
505
+ @connected_peers.each do |peer_id, connection|
506
+ # Don't send back to where it came from
507
+ next if message._sm_header.came_from == peer_id
508
+
509
+ connection.send_message(message)
510
+ end
511
+ end
512
+
513
+ def forward_to_cached_peers(message, cached_peers)
514
+ cached_peers.each do |peer_id|
515
+ if connection = @connected_peers[peer_id]
516
+ connection.send_message(message)
517
+ end
518
+ end
519
+
520
+ # If cached route fails, fall back to flooding
521
+ # (This would be detected by lack of response/acknowledgment)
522
+ end
523
+
524
+ # When a message is successfully delivered, cache the route
525
+ def learn_successful_route(service_name, peer_id)
526
+ @routing_cache[service_name] ||= []
527
+ @routing_cache[service_name] << peer_id unless
528
+ @routing_cache[service_name].include?(peer_id)
529
+ end
530
+ end
531
+ ```
532
+
533
+ ### Publisher Knowledge Model
534
+
535
+ ```ruby
536
+ class Publisher
537
+ def initialize(mesh_transport)
538
+ @mesh = mesh_transport
539
+ @known_local_services = Set.new # Services on same node as publisher
540
+ @connected_peer_nodes = Set.new # Node IDs we can directly reach
541
+ end
542
+
543
+ def publish_message(message, to:)
544
+ message.to = to
545
+
546
+ # Publisher knows about local services on same node
547
+ if @known_local_services.include?(to)
548
+ @mesh.deliver_locally(message)
549
+ return
550
+ end
551
+
552
+ # Publisher knows which peer nodes it can connect to
553
+ # But does NOT know what subscribers are on those nodes
554
+ @mesh.send_to_connected_peers(message)
555
+
556
+ # The mesh network handles discovery from there
557
+ end
558
+
559
+ def discover_local_services
560
+ # Publisher only discovers services on its own node
561
+ @known_local_services = @mesh.local_services
562
+ end
563
+
564
+ def discover_connected_peers
565
+ # Publisher knows which nodes it can directly connect to
566
+ @connected_peer_nodes = @mesh.connected_peer_ids
567
+ end
568
+
569
+ # Publisher does NOT have this method:
570
+ # def discover_remote_services # ← This doesn't exist!
571
+ end
572
+ ```
573
574
+
575
+ ## Implementation Architecture
576
+
577
+ ### Node Structure - P2P Connection Management
578
+
579
+ Each mesh node manages multiple P2P connections and routes messages between them:
580
+
581
+ ```ruby
582
+ class MeshNode
583
+ attr_reader :id, :address, :public_key
584
+
585
+ def initialize
586
+ @id = generate_node_id
587
+ @p2p_connections = {} # peer_id => P2PConnection
588
+ @local_subscribers = {} # message_class => [callbacks]
589
+ @service_registry = ServiceRegistry.new
590
+ @routing_table = RoutingTable.new
591
+
592
+ # Cryptographic identity
593
+ @keypair = OpenSSL::PKey::RSA.new(2048)
594
+ @public_key = @keypair.public_key
595
+ end
596
+
597
+ # Establish P2P connection to another mesh node
598
+ def connect_to_peer(peer_address)
599
+ connection = P2PConnection.new(peer_address)
600
+ connection.establish_secure_channel(@keypair)
601
+
602
+ @p2p_connections[connection.peer_id] = connection
603
+ exchange_routing_info(connection)
604
+ end
605
+
606
+ # Publish message into the mesh via P2P connections
607
+ def publish_to_mesh(message)
608
+ message._sm_header.from = @id
609
+ message._sm_header.ttl ||= 10 # Prevent infinite routing
610
+
611
+ if service_is_local?(message.to)
612
+ # Deliver locally via P2P to local subscribers
613
+ deliver_to_local_subscribers(message)
614
+ else
615
+ # Route to other nodes via P2P connections
616
+ route_to_remote_nodes(message)
617
+ end
618
+ end
619
+
620
+ # Receive message from peer and decide: deliver locally or route further
621
+ def receive_from_peer(message, from_peer_id)
622
+ return if already_seen?(message)
623
+
624
+ if service_is_local?(message.to)
625
+ # Final delivery via P2P to local subscribers
626
+ deliver_to_local_subscribers(message)
627
+ else
628
+ # Continue routing via P2P to other nodes
629
+ forward_to_other_peers(message, exclude: from_peer_id)
630
+ end
631
+ end
632
+
633
+ private
634
+
635
+ def route_to_remote_nodes(message)
636
+ target_peers = @routing_table.find_routes(message.to)
637
+
638
+ if target_peers.any?
639
+ # Send via P2P to known routes
640
+ target_peers.each do |peer_id|
641
+ @p2p_connections[peer_id].send_message(message)
642
+ end
643
+ else
644
+ # Flood via P2P to all neighbors for discovery
645
+ @p2p_connections.each_value do |connection|
646
+ connection.send_message(message)
647
+ end
648
+ end
649
+ end
650
+
651
+ def forward_to_other_peers(message, exclude:)
652
+ message._sm_header.ttl -= 1
653
+ return if message._sm_header.ttl <= 0
654
+
655
+ @p2p_connections.each do |peer_id, connection|
656
+ next if peer_id == exclude
657
+ connection.send_message(message)
658
+ end
659
+ end
660
+ end
661
+
662
+ # P2P connection handles the actual networking
663
+ class P2PConnection
664
+ def initialize(peer_address)
665
+ @peer_address = peer_address
666
+ @socket = nil
667
+ @message_queue = Queue.new
668
+ @send_thread = nil
669
+ end
670
+
671
+ def send_message(message)
672
+ @message_queue.push(message)
673
+ ensure_send_thread_running
674
+ end
675
+
676
+ private
677
+
678
+ def ensure_send_thread_running
679
+ return if @send_thread&.alive?
680
+
681
+ @send_thread = Thread.new do
682
+ while message = @message_queue.pop
683
+ deliver_message_via_socket(message)
684
+ end
685
+ end
686
+ end
687
+ end
688
+ ```
689
+
690
+ ### P2P Transport Implementation
691
+
692
+ ```ruby
693
+ module SmartMessage
694
+ module Transport
695
+ class P2PTransport < Base
696
+ def initialize(options = {})
697
+ super
698
+ @mesh_node = MeshNode.new
699
+ @mesh_node.start(options)
700
+ end
701
+
702
+ def publish(message, routing_key = nil)
703
+ # P2P doesn't use routing keys, uses message.to field
704
+ message._sm_header.from = @mesh_node.id
705
+
706
+ if message.to
707
+ # Direct message to specific peer
708
+ @mesh_node.send_to_peer(message.to, message)
709
+ else
710
+ # Broadcast to all peers subscribed to this message type
711
+ @mesh_node.broadcast(message)
712
+ end
713
+ end
714
+
715
+ def subscribe(routing_key = nil, &block)
716
+ # Subscribe to message types, not routing keys
717
+ message_class = routing_key || SmartMessage::Base
718
+ @mesh_node.subscribe(message_class, &block)
719
+ end
720
+ end
721
+ end
722
+ end
723
+ ```
724
+
725
+ ## Advanced Features
726
+
727
+ ### Distributed Hash Table (DHT) for Message Storage
728
+
729
+ ```ruby
730
+ class DistributedMessageStore
731
+ def initialize(node)
732
+ @node = node
733
+ @dht = Kademlia::DHT.new(node.id)
734
+ end
735
+
736
+ def store_message(message)
737
+ key = Digest::SHA256.hexdigest(message.uuid)
738
+
739
+ # Find nodes responsible for this key
740
+ nodes = @dht.find_nodes(key, k: 3)
741
+
742
+ # Replicate to multiple nodes
743
+ nodes.each do |node|
744
+ node.store(key, message.to_json)
745
+ end
746
+ end
747
+
748
+ def retrieve_message(uuid)
749
+ key = Digest::SHA256.hexdigest(uuid)
750
+ nodes = @dht.find_nodes(key)
751
+
752
+ nodes.each do |node|
753
+ if data = node.retrieve(key)
754
+ return SmartMessage.from_json(data)
755
+ end
756
+ end
757
+ nil
758
+ end
759
+ end
760
+ ```
761
+
762
+ ### Gossip Protocol for State Synchronization
763
+
764
+ ```ruby
765
+ class GossipProtocol
766
+ def initialize(node, interval: 1.0)
767
+ @node = node
768
+ @interval = interval
769
+ @state_version = 0
770
+ @peer_states = {}
771
+ end
772
+
773
+ def start
774
+ Thread.new do
775
+ loop do
776
+ sleep @interval
777
+ gossip_with_random_peer
778
+ end
779
+ end
780
+ end
781
+
782
+ def gossip_with_random_peer
783
+ peer = @node.connections.values.sample
784
+ return unless peer
785
+
786
+ # Exchange state information
787
+ my_state = {
788
+ version: @state_version,
789
+ subscriptions: @node.subscriptions.keys,
790
+ peers_count: @node.connections.size,
791
+ message_types: known_message_types
792
+ }
793
+
794
+ peer_state = peer.exchange_gossip(my_state)
795
+ merge_peer_state(peer_state)
796
+ end
797
+ end
798
+ ```
799
+
800
+ ## Use Cases
801
+
802
+ ### 1. Decentralized IoT Networks with Bridge Nodes
803
+
804
+ ```ruby
805
+ class IoTSensorReading < SmartMessage::Base
806
+ property :sensor_id, required: true
807
+ property :temperature, type: Float
808
+ property :humidity, type: Float
809
+ property :timestamp, type: Time
810
+
811
+ transport SmartMessage::Transport::MeshTransport.new
812
+ end
813
+
814
+ # Sensor on local factory network publishes to cloud analytics
815
+ sensor = IoTSensorReading.new(
816
+ sensor_id: "factory_sensor_01",
817
+ temperature: 72.5,
818
+ humidity: 45.0,
819
+ timestamp: Time.now
820
+ )
821
+
822
+ # Routes across network boundaries via bridge nodes
823
+ sensor.publish(to: "cloud-analytics-service")
824
+
825
+ # Routing path:
826
+ # Factory Sensor → Local Gateway (Bridge) → Internet → Cloud Bridge → Analytics
827
+ # (UDP local) (TCP bridge) (UDP cloud)
828
+ ```
829
+
830
+ ### 2. Resilient Microservices
831
+
832
+ ```ruby
833
+ class OrderService
834
+ def initialize
835
+ @transport = SmartMessage::Transport::MeshTransport.new(
836
+ service_name: "order-service"
837
+ )
838
+
839
+ # Register this node as providing "order-service"
840
+ @transport.register_service("order-service")
841
+
842
+ # Subscribe to payment confirmations (may come from any payment node)
843
+ PaymentConfirmed.transport(@transport)
844
+ PaymentConfirmed.subscribe do |payment|
845
+ process_payment_confirmation(payment)
846
+ end
847
+ end
848
+
849
+ def create_order(data)
850
+ # Send to inventory service - mesh will find it wherever it runs
851
+ InventoryCheck.new(
852
+ order_id: data[:order_id],
853
+ items: data[:items]
854
+ ).publish(to: "inventory-service")
855
+
856
+ # Send to payment service - could be on any node in the mesh
857
+ PaymentRequest.new(
858
+ order_id: data[:order_id],
859
+ amount: data[:amount]
860
+ ).publish(to: "payment-service")
861
+ end
862
+ end
863
+
864
+ # Messages route through the mesh automatically:
865
+ # Order Node → Edge Node → Cloud Node → Payment Service Node
866
+ # Order Node → Local Node → Inventory Service Node
867
+ ```
868
+
869
+ ### 3. Edge Computing Mesh
870
+
871
+ ```ruby
872
+ # Edge nodes form a mesh for distributed computation
873
+ class EdgeComputeNode
874
+ def initialize
875
+ @mesh = SmartMessage::Transport::P2PTransport.new(
876
+ capabilities: [:gpu, :high_memory],
877
+ region: "us-west"
878
+ )
879
+
880
+ ComputeTask.transport(@mesh)
881
+ ComputeTask.subscribe do |task|
882
+ if can_handle?(task)
883
+ result = execute_task(task)
884
+
885
+ # Send result back through mesh
886
+ TaskResult.new(
887
+ task_id: task.id,
888
+ result: result,
889
+ to: task.from # Route back to originator
890
+ ).publish
891
+ else
892
+ # Forward to more capable peer
893
+ forward_to_capable_peer(task)
894
+ end
895
+ end
896
+ end
897
+ end
898
+ ```
899
+
900
+ ## P2P as Mesh Foundation
901
+
902
+ **P2P connections are the foundation** - every hop in the mesh is a peer-to-peer connection:
903
+
904
+ ```
905
+ Publisher → Node A → Node C → Node F → Subscriber
906
+ ↑ ↑ ↑ ↑
907
+ P2P P2P P2P P2P
908
+ ```
909
+
910
+ **Each step involves P2P:**
911
+ 1. **Publisher → First Node**: P2P connection to inject message into mesh
912
+ 2. **Node → Node**: P2P connections for routing between mesh nodes
913
+ 3. **Final Node → Subscriber**: P2P connection for final delivery
914
+
915
+ **Key Difference:**
916
+
917
+ **Simple P2P (journeta-style):**
918
+ - Single-hop: Publisher directly connects to subscriber's node
919
+ - Publisher must discover which specific node hosts the service
920
+
921
+ **Mesh P2P (meshage):**
922
+ - Multi-hop: Publisher connects to any mesh node, message routes through multiple P2P hops
923
+ - Publisher only needs to know service name, not location
924
+
925
+ ```ruby
926
+ # Simple P2P: Publisher must know exact location
927
+ peer_node = discover_node_hosting("inventory-service")
928
+ peer_node.send_message(inventory_check)
929
+
930
+ # Mesh P2P: Publisher connects to any mesh node
931
+ mesh.publish(inventory_check, to: "inventory-service")
932
+ # Mesh handles: local_node → intermediate_nodes → destination_node
933
+ ```
934
+
935
+ **Mesh Network = P2P + Routing Intelligence**
936
+
937
+ ## Benefits
938
+
939
+ 1. **No Single Point of Failure**: No central broker, no single routing node
940
+ 2. **Self-Healing**: Network routes around failed nodes and discovers new paths
941
+ 3. **Location Independence**: Services can move between nodes transparently
942
+ 4. **Fault Tolerance**: Multiple routing paths provide redundancy
943
+ 5. **Dynamic Discovery**: Services are found through routing, not pre-configuration
944
+ 6. **Scalability**: Mesh grows organically, routing distributes automatically
945
+ 7. **Privacy**: Onion routing and encryption possible
946
+ 8. **Partition Tolerance**: Network segments can operate independently
947
+
948
+ ## Challenges
949
+
950
+ 1. **Network Partitions**: Mesh can split into islands
951
+ 2. **Message Ordering**: No global ordering guarantees
952
+ 3. **Security**: Need peer authentication and encryption
953
+ 4. **Discovery Overhead**: Finding peers can be expensive
954
+ 5. **NAT Traversal**: Peers behind firewalls need special handling
955
+ 6. **Bridge Node Reliability**: Bridge failure isolates entire network segments
956
+ 7. **UDP vs TCP Coordination**: Local UDP discovery vs remote TCP connections
957
+ 8. **Bootstrap Node Dependencies**: Need known addresses to establish inter-network bridges
958
+
959
+ ### Bridge Node Challenges
960
+
961
+ **Single Point of Failure:**
962
+ ```
963
+ Network A ←→ [Single Bridge] ←→ Network B
964
+ ↓ FAILS
965
+ Networks A and B become isolated
966
+ ```
967
+
968
+ **Solution - Multiple Bridge Nodes:**
969
+ ```
970
+ Network A ←→ [Bridge 1] ←→ Network B
971
+ ↑ ←→ [Bridge 2] ←→ ↑
972
+ Multiple redundant bridge connections
973
+ ```
974
+
975
+ **NAT Traversal for Bridge Nodes:**
976
+ - Bridge nodes behind NAT need port forwarding or STUN/TURN
977
+ - Or use reverse connections where bridge initiates outbound connections
978
+ - WebRTC-style techniques for hole punching
979
+
980
+ ## Lessons from Journeta
981
+
982
+ The journeta codebase provides excellent patterns for P2P networking that directly apply to our meshage implementation:
983
+
984
+ ### Discovery Architecture
985
+ Journeta uses UDP multicast for presence broadcasting - a simple but effective approach:
986
+
987
+ ```ruby
988
+ # From journeta/presence_broadcaster.rb - simplified
989
+ class PresenceBroadcaster
990
+ def broadcast_presence
991
+ socket = UDPSocket.open
992
+ note = PresenceMessage.new(uuid, peer_port, groups)
993
+ socket.send(note.to_yaml, 0, multicast_address, port)
994
+ end
995
+ end
996
+
997
+ # For SmartMessage meshage:
998
+ class MeshPresence < SmartMessage::Base
999
+ property :node_id, required: true
1000
+ property :address, required: true
1001
+ property :port, required: true
1002
+ property :capabilities, type: Array
1003
+ property :message_types, type: Array # What messages this node handles
1004
+ property :timestamp, type: Time
1005
+ end
1006
+ ```
1007
+
1008
+ ### Peer Registry with Automatic Cleanup
1009
+ Journeta's PeerRegistry manages peer lifecycle with automatic reaping - crucial for mesh reliability:
1010
+
1011
+ ```ruby
1012
+ # Adapted from journeta/peer_registry.rb
1013
+ class MeshPeerRegistry
1014
+ def initialize(mesh_node)
1015
+ @peers = {}
1016
+ @mutex = Mutex.new
1017
+ @reaper_tolerance = 10.0 # seconds
1018
+ start_reaper
1019
+ end
1020
+
1021
+ def reap_stale_peers
1022
+ @mutex.synchronize do
1023
+ stale_peers = @peers.select do |id, peer|
1024
+ peer.last_seen < (Time.now - @reaper_tolerance)
1025
+ end
1026
+
1027
+ stale_peers.each do |id, peer|
1028
+ @peers.delete(id)
1029
+ notify_peer_offline(peer)
1030
+ end
1031
+ end
1032
+ end
1033
+ end
1034
+ ```
1035
+
1036
+ ### Connection Management
1037
+ Journeta uses queued message sending with separate threads per peer - good pattern for mesh:
1038
+
1039
+ ```ruby
1040
+ # From journeta/peer_connection.rb concept
1041
+ class MeshPeerConnection
1042
+ def initialize(peer_info)
1043
+ @peer = peer_info
1044
+ @message_queue = Queue.new
1045
+ @connection_thread = nil
1046
+ end
1047
+
1048
+ def send_message(message)
1049
+ @message_queue.push(message)
1050
+ ensure_connection_thread_running
1051
+ end
1052
+
1053
+ private
1054
+
1055
+ def connection_worker
1056
+ while message = @message_queue.pop
1057
+ begin
1058
+ deliver_message(message)
1059
+ rescue => e
1060
+ handle_delivery_failure(message, e)
1061
+ end
1062
+ end
1063
+ end
1064
+ end
1065
+ ```
1066
+
1067
+ ### Group-Based Messaging
1068
+ Journeta's group concept maps perfectly to SmartMessage's message types and routing:
1069
+
1070
+ ```ruby
1071
+ # Enhanced meshage with group/topic support
1072
+ class MeshTransport < SmartMessage::Transport::Base
1073
+ def initialize(options = {})
1074
+ @groups = options[:groups] || [] # Which message types we handle
1075
+ @mesh_node = MeshNode.new(
1076
+ groups: @groups,
1077
+ capabilities: options[:capabilities] || []
1078
+ )
1079
+ end
1080
+
1081
+ def subscribe(message_class, &block)
1082
+ # Register interest in this message type
1083
+ @groups << message_class.name
1084
+ @mesh_node.update_presence_info
1085
+
1086
+ # Set up message handler
1087
+ @mesh_node.on_message(message_class) do |message|
1088
+ block.call(message) if block
1089
+ end
1090
+ end
1091
+ end
1092
+ ```
1093
+
1094
+ ### Threading Model
1095
+ Journeta's use of dedicated threads for each component is solid for mesh networking:
1096
+
1097
+ ```ruby
1098
+ class MeshNode
1099
+ def start
1100
+ @presence_broadcaster.start # Periodic UDP broadcast
1101
+ @presence_listener.start # UDP listener for peer discovery
1102
+ @message_listener.start # TCP listener for direct messages
1103
+ @peer_registry.start # Peer lifecycle management
1104
+ end
1105
+
1106
+ def stop
1107
+ [@presence_broadcaster, @presence_listener,
1108
+ @message_listener, @peer_registry].each(&:stop)
1109
+ end
1110
+ end
1111
+ ```
1112
+
1113
+ ### Key Improvements for Meshage
1114
+
1115
+ 1. **Better Routing**: Journeta only does direct peer-to-peer. Meshage needs routing through intermediate nodes.
1116
+
1117
+ 2. **Encryption**: Journeta sends YAML in plaintext. Meshage should encrypt all communications (one possible envelope is sketched after this list).
1118
+
1119
+ 3. **NAT Traversal**: Journeta assumes LAN connectivity. Meshage needs hole punching for internet-scale mesh.
1120
+
1121
+ 4. **Message Types**: Journeta sends arbitrary Ruby objects. Meshage should integrate with SmartMessage's typed message system.
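+
+ The `MessageCrypto` referenced in the Architecture Synthesis below is not defined nearby, so here is a minimal sketch of one possible shape, assuming the per-node RSA keypair from the `MeshNode` sketch and a hybrid RSA + AES-256-GCM envelope (the two-argument `encrypt` taking the recipient's public key is an assumption, not part of the original design):
+
+ ```ruby
+ require 'openssl'
+ require 'json'
+ require 'base64'
+
+ # Hypothetical sketch: encrypt the serialized message with a random AES-256-GCM
+ # key, then wrap that key with the recipient node's RSA public key.
+ class MessageCrypto
+   def initialize(keypair)
+     @keypair = keypair  # OpenSSL::PKey::RSA, as generated per MeshNode above
+   end
+
+   def encrypt(serialized_message, recipient_public_key)
+     cipher = OpenSSL::Cipher.new('aes-256-gcm').encrypt
+     key = cipher.random_key
+     iv  = cipher.random_iv
+     ciphertext = cipher.update(serialized_message) + cipher.final
+
+     {
+       key:  Base64.strict_encode64(recipient_public_key.public_encrypt(key)),
+       iv:   Base64.strict_encode64(iv),
+       tag:  Base64.strict_encode64(cipher.auth_tag),
+       data: Base64.strict_encode64(ciphertext)
+     }.to_json
+   end
+
+   def decrypt(envelope_json)
+     env = JSON.parse(envelope_json)
+     aes_key = @keypair.private_decrypt(Base64.strict_decode64(env['key']))
+
+     cipher = OpenSSL::Cipher.new('aes-256-gcm').decrypt
+     cipher.key = aes_key
+     cipher.iv  = Base64.strict_decode64(env['iv'])
+     cipher.auth_tag = Base64.strict_decode64(env['tag'])
+     cipher.update(Base64.strict_decode64(env['data'])) + cipher.final
+   end
+ end
+ ```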
1122
+
1123
+ ## Architecture Synthesis
1124
+
1125
+ Combining journeta's proven patterns with SmartMessage's features:
1126
+
1127
+ ```ruby
1128
+ class SmartMeshTransport < SmartMessage::Transport::Base
1129
+ def initialize(options = {})
1130
+ @mesh_engine = JournetaEngine.new(
1131
+ peer_handler: SmartMessagePeerHandler.new(self),
1132
+ groups: extract_message_types_from_subscriptions
1133
+ )
1134
+
1135
+ # Enhanced with routing, encryption, and SmartMessage integration
1136
+ @mesh_router = MeshRouter.new(@mesh_engine)
1137
+ @message_crypto = MessageCrypto.new(options[:keypair])
1138
+ end
1139
+
1140
+ def publish(message, routing_key = nil)
1141
+ encrypted_message = @message_crypto.encrypt(message)
1142
+
1143
+ if message.to
1144
+ @mesh_router.route_to_peer(message.to, encrypted_message)
1145
+ else
1146
+ @mesh_router.broadcast_to_subscribers(message.class, encrypted_message)
1147
+ end
1148
+ end
1149
+ end
1150
+ ```
1151
+
1152
+ ## Key Insight: Local Knowledge with Network Discovery
1153
+
1154
+ The fundamental characteristic is **limited local knowledge with network-wide discovery**:
1155
+
1156
+ ```ruby
1157
+ # Publisher knows:
1158
+ # - Local services on same node ✓
1159
+ # - Which peer nodes it can connect to ✓
1160
+ # - What subscribers are on remote nodes ✗
1161
+
1162
+ OrderMessage.new(data: order_data).publish(to: "inventory-service")
1163
+
1164
+ # Each node in the route knows:
1165
+ # - Its local subscribers ✓
1166
+ # - Its connected peer nodes ✓
1167
+ # - Subscribers on other nodes ✗
1168
+
1169
+ # Network discovery works via:
1170
+ # 1. Check local subscribers first
1171
+ # 2. Forward to connected peers (they don't know either)
1172
+ # 3. Each peer checks locally, forwards if not found
1173
+ # 4. Eventually reaches node(s) with matching subscribers
1174
+ # 5. Route gets cached for future messages
1175
+ ```
1176
+
1177
+ **This approach is scalable because:**
1178
+ - No node needs global knowledge of all services
1179
+ - No centralized service directory to maintain
1180
+ - Discovery happens naturally through message routing
1181
+ - Successful routes are cached to avoid repeated flooding
1182
+
1183
+ The mesh network acts as a **distributed discovery system** where each node only knows about its immediate neighborhood, but the collective network can find services anywhere through progressive forwarding.
1184
+
1185
+ ## Message Deduplication for Multi-Node Subscribers
1186
+
1187
+ **Critical Challenge:** A subscriber connected to multiple nodes can receive the same message via different routing paths:
1188
+
1189
+ ```
1190
+ Publisher → Node A → Subscriber
1191
+ ↘ Node B ↗
1192
+
1193
+ Subscriber receives same message twice!
1194
+ ```
1195
+
1196
+ ### Deduplication Architecture
1197
+
1198
+ ```ruby
1199
+ class MeshSubscriber
1200
+ def initialize(service_name)
1201
+ @service_name = service_name
1202
+ @message_cache = LRU.new(1000) # Recent message UUIDs
1203
+ @connected_nodes = Set.new # Multiple mesh nodes
1204
+ end
1205
+
1206
+ # Connect to multiple mesh nodes for redundancy
1207
+ def connect_to_mesh_nodes(node_addresses)
1208
+ node_addresses.each do |address|
1209
+ mesh_connection = MeshConnection.new(address)
1210
+ mesh_connection.subscribe(@service_name) do |message|
1211
+ handle_message_with_deduplication(message)
1212
+ end
1213
+ @connected_nodes.add(mesh_connection)
1214
+ end
1215
+ end
1216
+
1217
+ private
1218
+
1219
+ def handle_message_with_deduplication(message)
1220
+ # Check if we've already processed this message
1221
+ return if @message_cache.include?(message._sm_header.uuid)
1222
+
1223
+ # Mark as processed to prevent duplicates
1224
+ @message_cache[message._sm_header.uuid] = Time.now
1225
+
1226
+ # Process the message only once
1227
+ process_message(message)
1228
+ end
1229
+ end
1230
+ ```
1231
+
1232
+ ### Multi-Path Routing Example
1233
+
1234
+ ```ruby
1235
+ class InventoryService
1236
+ def initialize
1237
+ # Connect to multiple nodes for fault tolerance
1238
+ @mesh_subscriber = MeshSubscriber.new("inventory-service")
1239
+ @mesh_subscriber.connect_to_mesh_nodes([
1240
+ "mesh-node-1:8080",
1241
+ "mesh-node-2:8080",
1242
+ "mesh-node-3:8080"
1243
+ ])
1244
+ end
1245
+
1246
+ # This will only be called once per unique message
1247
+ # even though connected to multiple nodes
1248
+ def process_message(order_message)
1249
+ puts "Processing order #{order_message.order_id} - will only see this once!"
1250
+ update_inventory(order_message.items)
1251
+ end
1252
+ end
1253
+
1254
+ # Message flow with deduplication:
1255
+ # Publisher → Node A → InventoryService ✓ (processed)
1256
+ # ↘ Node B → InventoryService ✗ (deduplicated)
1257
+ # ↘ Node C → InventoryService ✗ (deduplicated)
1258
+ ```
1259
+
1260
+ ### Node-Level Deduplication - Critical for Multi-Peer Nodes
1261
+
1262
+ **Challenge:** Nodes connected to multiple peers receive the same message via different routes:
1263
+
1264
+ ```
1265
+ Peer A → Node X ← Peer B
1266
+
1267
+ Same message arrives twice!
1268
+ ```
1269
+
1270
+ **Node DDQ (Deduplication Queue) Implementation:**
1271
+
1272
+ ```ruby
1273
+ class MeshNode
1274
+ def initialize
1275
+ @processed_messages = LRU.new(2000) # Track processed message UUIDs
1276
+ @connected_peers = {} # Multiple peer connections
1277
+ @local_subscribers = {} # Local service handlers
1278
+ end
1279
+
1280
+ def receive_message_from_peer(message, from_peer_id)
1281
+ # CRITICAL: Check if we've already processed this message
1282
+ if @processed_messages.include?(message._sm_header.uuid)
1283
+ log_debug("Dropping duplicate message #{message._sm_header.uuid} from #{from_peer_id}")
1284
+ return # Don't process duplicates!
1285
+ end
1286
+
1287
+ # Mark as processed IMMEDIATELY to prevent re-processing
1288
+ @processed_messages[message._sm_header.uuid] = {
1289
+ first_received_from: from_peer_id,
1290
+ received_at: Time.now
1291
+ }
1292
+
1293
+ # Now safe to process the message
1294
+ route_message_internally(message, from_peer_id)
1295
+ end
1296
+
1297
+ private
1298
+
1299
+ def route_message_internally(message, from_peer_id)
1300
+ # Deliver to local subscribers if we have them
1301
+ if has_local_subscribers?(message.to)
1302
+ deliver_to_local_subscribers(message)
1303
+ # Note: Don't return - message may need to continue routing
1304
+ end
1305
+
1306
+ # Forward to other connected peers (excluding sender)
1307
+ forward_to_other_peers(message, exclude: from_peer_id)
1308
+ end
1309
+
1310
+ def forward_to_other_peers(message, exclude:)
1311
+ # Decrement TTL to prevent infinite routing
1312
+ message._sm_header.ttl -= 1
1313
+ return if message._sm_header.ttl <= 0
1314
+
1315
+ @connected_peers.each do |peer_id, connection|
1316
+ next if peer_id == exclude # Don't send back to sender
1317
+
1318
+ connection.send_message(message)
1319
+ end
1320
+ end
1321
+ end
1322
+ ```
1323
+
1324
+ **Multi-Peer Node Scenario:**
1325
+
1326
+ ```ruby
1327
+ # Node connected to 4 peers for redundancy
1328
+ class HighAvailabilityMeshNode < MeshNode
1329
+ def initialize
1330
+ super
1331
+
1332
+ # Connect to multiple peers for fault tolerance
1333
+ connect_to_peers([
1334
+ "mesh-peer-1:8080",
1335
+ "mesh-peer-2:8080",
1336
+ "mesh-peer-3:8080",
1337
+ "mesh-peer-4:8080"
1338
+ ])
1339
+ end
1340
+
1341
+ # Same message might arrive from multiple peers:
1342
+ # Peer 1 → This Node ✓ (processed)
1343
+ # Peer 2 → This Node ✗ (deduplicated)
1344
+ # Peer 3 → This Node ✗ (deduplicated)
1345
+ # Peer 4 → This Node ✗ (deduplicated)
1346
+ end
1347
+ ```
1348
+
1349
+ **DDQ Prevents Multiple Issues:**
1350
+
1351
+ 1. **Duplicate Local Delivery:**
1352
+ ```ruby
1353
+ # Without DDQ:
1354
+ # Peer A sends OrderMessage → Node processes → Delivers to local InventoryService
1355
+ # Peer B sends same OrderMessage → Node processes → Delivers AGAIN to InventoryService!
1356
+
1357
+ # With DDQ:
1358
+ # Peer A sends OrderMessage → Node processes → Delivers to local InventoryService ✓
1359
+ # Peer B sends same OrderMessage → Node deduplicates → NO duplicate delivery ✓
1360
+ ```
1361
+
1362
+ 2. **Duplicate Forwarding:**
1363
+ ```ruby
1364
+ # Without DDQ:
1365
+ # Peer A → Node X → forwards to Peers C,D,E
1366
+ # Peer B → Node X → forwards AGAIN to Peers C,D,E (message storm!)
1367
+
1368
+ # With DDQ:
1369
+ # Peer A → Node X → forwards to Peers C,D,E ✓
1370
+ # Peer B → Node X → deduplicated, no forwarding ✓
1371
+ ```
1372
+
1373
+ 3. **Routing Loops:**
1374
+ ```ruby
1375
+ # Without DDQ, messages can loop forever:
1376
+ # Node A → Node B → Node C → Node A → Node B...
1377
+
1378
+ # With DDQ, each node only processes each message once:
1379
+ # Node A → Node B → Node C → (back to Node A but deduplicated)
1380
+ ```
1381
+
1382
+ **Enhanced Message Flow with Node DDQ:**
1383
+
1384
+ ```
1385
+ Publisher
1386
+
1387
+ Node 1 (receives original message)
1388
+ ↙ ↘
1389
+ Node 2 Node 3 (both forward to Node 4)
1390
+ ↘ ↙
1391
+ Node 4 (receives same message from both Node 2 and Node 3)
1392
+ DDQ prevents duplicate processing! ✓
1393
+ ```
1394
+
1395
+ ### SmartMessage Header Enhancement
1396
+
1397
+ ```ruby
1398
+ class SmartMessageHeader
1399
+ property :uuid, required: true # For deduplication
1400
+ property :ttl, type: Integer, default: 10 # Prevent infinite routing
1401
+ property :route_path, type: Array # Track routing path
1402
+ property :came_from, type: String # Prevent backtracking
1403
+
1404
+ def add_to_route_path(node_id)
1405
+ @route_path ||= []
1406
+ @route_path << node_id
1407
+ end
1408
+
1409
+ def visited_node?(node_id)
1410
+ @route_path&.include?(node_id)
1411
+ end
1412
+ end
1413
+ ```
1414
+
1415
+ ### Deduplication Benefits in Mesh
1416
+
1417
+ 1. **Subscriber Reliability**: Subscribers can connect to multiple nodes without receiving duplicates
1418
+ 2. **Node Reliability**: Nodes can connect to multiple peers without processing duplicates
1419
+ 3. **Fault Tolerance**: If connections fail, redundant paths still work without creating duplicates
1420
+ 4. **Load Distribution**: Messages can flow through different paths but are processed exactly once
1421
+ 5. **Network Efficiency**: Prevents message storms and routing loops
1422
+ 6. **Mesh Scalability**: Enables dense connectivity without duplicate processing overhead
1423
+
1424
+ ### DDQ at Every Level
1425
+
1426
+ **Complete Deduplication Stack:**
1427
+
1428
+ ```ruby
1429
+ # Level 1: Publisher (sends once but to multiple entry points)
1430
+ Publisher → [Node A, Node B] (same message to multiple nodes)
1431
+
1432
+ # Level 2: Entry Nodes (deduplicate between entry points)
1433
+ Node A → downstream peers (processes once)
1434
+ Node B → downstream peers (deduplicates, doesn't reprocess)
1435
+
1436
+ # Level 3: Intermediate Nodes (deduplicate multi-path routing)
1437
+ Node C ← [Node A, Node B] (receives from both, processes once)
1438
+
1439
+ # Level 4: Subscriber Nodes (deduplicate final delivery)
1440
+ Subscriber Node ← [Path 1, Path 2] (receives via multiple paths, processes once)
1441
+
1442
+ # Level 5: Subscribers (deduplicate multi-node connections)
1443
+ Subscriber ← [Node X, Node Y] (connected to multiple nodes, processes once)
1444
+ ```
1445
+
1446
+ **Every layer needs DDQ because every layer can receive duplicates!**
1447
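+ 
+ For the subscriber layer specifically, the same structure can sit directly in front of a handler. The snippet below reuses the hypothetical `DeduplicationQueue` sketched earlier; `OrderMessage`, `process_order`, and reading the UUID via `_sm_header` are assumptions that follow this document's conventions rather than a finalized API.
+ 
+ ```ruby
+ # Subscriber-level dedup: a service connected to several mesh nodes
+ # processes each logical message once, whichever node delivers it first.
+ SUBSCRIBER_DDQ = DeduplicationQueue.new(max_size: 10_000)
+ 
+ OrderMessage.subscribe do |message|
+   next if SUBSCRIBER_DDQ.duplicate?(message._sm_header.uuid)
+ 
+   process_order(message)
+ end
+ ```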
+
1448
+ ### Use Case: Resilient Payment Service
1449
+
1450
+ ```ruby
1451
+ class PaymentService
1452
+ def initialize
1453
+ # Connect to multiple mesh nodes across different data centers
1454
+ @mesh_subscriber = MeshSubscriber.new("payment-service")
1455
+ @mesh_subscriber.connect_to_mesh_nodes([
1456
+ "dc1-mesh-node:8080", # Data center 1
1457
+ "dc2-mesh-node:8080", # Data center 2
1458
+ "edge-mesh-node:8080" # Edge location
1459
+ ])
1460
+ end
1461
+
1462
+ def process_payment(payment_request)
1463
+ # Critical: This must only execute once per payment
1464
+ # Even though we're connected to multiple mesh nodes
1465
+ charge_credit_card(payment_request.amount)
1466
+ end
1467
+ end
1468
+
1469
+ # Scenario: Network partition heals
1470
+ # - Payment request sent during partition reached DC1
1471
+ # - When partition heals, might also route through DC2
1472
+ # - Deduplication ensures payment only processed once
1473
+ ```
1474
+
1475
+ ## Network Control Messages
1476
+
1477
+ Mesh networks need control messages for management and coordination. These are routed differently from application messages:
1478
+
1479
+ ### Control Message Types
1480
+
1481
+ ```ruby
1482
+ module SmartMessage
1483
+ module MeshControl
1484
+ # Node presence announcement (local network broadcast)
1485
+ class PresenceAnnouncement < SmartMessage::Base
1486
+ property :node_id, required: true
1487
+ property :node_address, required: true
1488
+ property :tcp_port, type: Integer
1489
+ property :capabilities, type: Array, default: []
1490
+ property :local_services, type: Array, default: []
1491
+ property :is_bridge_node, type: TrueClass, default: false
1492
+ property :bridge_networks, type: Array, default: []
1493
+ property :mesh_version, type: String
1494
+ property :announced_at, type: Time, default: -> { Time.now }
1495
+
1496
+ # Presence messages use UDP broadcast, not mesh routing
1497
+ transport SmartMessage::Transport::UDPBroadcast.new
1498
+ end
1499
+
1500
+ # Graceful shutdown notification (mesh-routed)
1501
+ class NodeShutdown < SmartMessage::Base
1502
+ property :node_id, required: true
1503
+ property :reason, type: String, default: "graceful_shutdown"
1504
+ property :estimated_downtime, type: Integer # seconds
1505
+ property :replacement_nodes, type: Array, default: []
1506
+ property :shutdown_at, type: Time, default: -> { Time.now }
1507
+
1508
+ # Shutdown messages route through mesh to all nodes
1509
+ transport SmartMessage::Transport::MeshTransport.new
1510
+ end
1511
+
1512
+ # Health check / heartbeat (peer-to-peer)
1513
+ class HealthCheck < SmartMessage::Base
1514
+ property :node_id, required: true
1515
+ property :sequence_number, type: Integer
1516
+ property :timestamp, type: Time, default: -> { Time.now }
1517
+ property :load_average, type: Float
1518
+ property :active_connections, type: Integer
1519
+ property :message_queue_depth, type: Integer
1520
+
1521
+ # Health checks go directly between connected peers
1522
+ transport SmartMessage::Transport::P2PTransport.new
1523
+ end
1524
+
1525
+ # Route learning/sharing between nodes
1526
+ class RouteAdvertisement < SmartMessage::Base
1527
+ property :node_id, required: true
1528
+ property :known_services, type: Hash # service_name => [node_paths]
1529
+ property :route_costs, type: Hash # service_name => hop_count
1530
+ property :last_seen, type: Hash # service_name => timestamp
1531
+
1532
+ # Route ads propagate through mesh with limited TTL
1533
+ transport SmartMessage::Transport::MeshTransport.new(ttl: 3)
1534
+ end
1535
+ end
1536
+ end
1537
+ ```
1538
+
1539
+ ### Control Message Routing Patterns
1540
+
1541
+ ```ruby
1542
+ class MeshControlHandler
+ include SmartMessage::MeshControl  # resolve PresenceAnnouncement, NodeShutdown, etc. without full namespacing
1543
+ def initialize(mesh_node)
1544
+ @mesh_node = mesh_node
1545
+ setup_control_message_subscriptions
1546
+ end
1547
+
1548
+ private
1549
+
1550
+ def setup_control_message_subscriptions
1551
+ # Handle presence announcements (UDP broadcast)
1552
+ PresenceAnnouncement.subscribe do |announcement|
1553
+ handle_peer_presence(announcement)
1554
+ end
1555
+
1556
+ # Handle shutdown notifications (mesh-routed)
1557
+ NodeShutdown.subscribe do |shutdown|
1558
+ handle_peer_shutdown(shutdown)
1559
+ end
1560
+
1561
+ # Handle health checks (direct P2P)
1562
+ HealthCheck.subscribe do |health|
1563
+ handle_peer_health(health)
1564
+ end
1565
+
1566
+ # Handle route advertisements (mesh-routed, limited TTL)
1567
+ RouteAdvertisement.subscribe do |route_ad|
1568
+ handle_route_advertisement(route_ad)
1569
+ end
1570
+ end
1571
+
1572
+ def handle_peer_presence(announcement)
1573
+ if announcement.node_id != @mesh_node.id
1574
+ # Update peer registry
1575
+ @mesh_node.register_or_update_peer(
1576
+ id: announcement.node_id,
1577
+ address: announcement.node_address,
1578
+ port: announcement.tcp_port,
1579
+ services: announcement.local_services,
1580
+ is_bridge: announcement.is_bridge_node,
1581
+ last_seen: announcement.announced_at
1582
+ )
1583
+
1584
+ # Attempt connection if beneficial
1585
+ consider_connecting_to_peer(announcement)
1586
+ end
1587
+ end
1588
+
1589
+ def handle_peer_shutdown(shutdown)
1590
+ # Remove from routing tables
1591
+ @mesh_node.remove_peer(shutdown.node_id)
1592
+
1593
+ # Update route cache to avoid the shutting down node
1594
+ @mesh_node.invalidate_routes_through(shutdown.node_id)
1595
+
1596
+ # If replacement nodes suggested, consider connecting
1597
+ shutdown.replacement_nodes.each do |replacement|
1598
+ consider_connecting_to_peer(replacement)
1599
+ end
1600
+ end
1601
+
1602
+ def handle_peer_health(health)
1603
+ # Update peer health metrics
1604
+ @mesh_node.update_peer_health(
1605
+ health.node_id,
1606
+ load: health.load_average,
1607
+ connections: health.active_connections,
1608
+ queue_depth: health.message_queue_depth,
1609
+ last_heartbeat: health.timestamp
1610
+ )
1611
+
1612
+ # Respond with our health if this is a health check request
1613
+ respond_to_health_check(health) if health.sequence_number > 0
1614
+ end
1615
+
1616
+ def handle_route_advertisement(route_ad)
1617
+ # Update routing table with learned routes
1618
+ route_ad.known_services.each do |service_name, node_paths|
1619
+ cost = route_ad.route_costs[service_name] + 1 # Add one hop
1620
+ last_seen = route_ad.last_seen[service_name]
1621
+
1622
+ @mesh_node.learn_route(service_name, node_paths, cost, last_seen)
1623
+ end
1624
+ end
1625
+ end
1626
+ ```
1627
+
1628
+ ### Periodic Control Message Generation
1629
+
1630
+ ```ruby
1631
+ class MeshControlScheduler
+ include SmartMessage::MeshControl  # PresenceAnnouncement, HealthCheck, RouteAdvertisement
1632
+ def initialize(mesh_node)
1633
+ @mesh_node = mesh_node
1634
+ @running = false
1635
+ end
1636
+
1637
+ def start
1638
+ @running = true
1639
+
1640
+ # Start presence broadcasting (local network)
1641
+ @presence_thread = Thread.new { presence_broadcast_loop }
1642
+
1643
+ # Start health checking (peer connections)
1644
+ @health_thread = Thread.new { health_check_loop }
1645
+
1646
+ # Start route sharing (mesh network)
1647
+ @route_thread = Thread.new { route_advertisement_loop }
1648
+ end
1649
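+ 
+ # Counterpart to #start, assumed by the graceful-shutdown protocol below
+ # (stop_control_scheduler); each loop exits after its current sleep interval.
+ def stop
+   @running = false
+ end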
+
1650
+ private
1651
+
1652
+ def presence_broadcast_loop
1653
+ sequence = 0
1654
+ while @running
1655
+ PresenceAnnouncement.new(
1656
+ node_id: @mesh_node.id,
1657
+ node_address: @mesh_node.external_address,
1658
+ tcp_port: @mesh_node.port,
1659
+ capabilities: @mesh_node.capabilities,
1660
+ local_services: @mesh_node.local_service_names,
1661
+ is_bridge_node: @mesh_node.bridge_node?,
1662
+ bridge_networks: @mesh_node.bridge_networks
1663
+ ).publish
1664
+
1665
+ sleep 5 # Broadcast every 5 seconds
1666
+ sequence += 1
1667
+ end
1668
+ end
1669
+
1670
+ def health_check_loop
1671
+ sequence = 0
1672
+ while @running
1673
+ # Send health check to each connected peer
1674
+ @mesh_node.connected_peers.each do |peer_id, connection|
1675
+ HealthCheck.new(
1676
+ node_id: @mesh_node.id,
1677
+ sequence_number: sequence,
1678
+ load_average: system_load_average,
1679
+ active_connections: @mesh_node.connection_count,
1680
+ message_queue_depth: @mesh_node.queue_depth
1681
+ ).publish(to: peer_id)
1682
+ end
1683
+
1684
+ sleep 10 # Health check every 10 seconds
1685
+ sequence += 1
1686
+ end
1687
+ end
1688
+
1689
+ def route_advertisement_loop
1690
+ while @running
1691
+ # Share known routes with mesh (limited propagation)
1692
+ RouteAdvertisement.new(
1693
+ node_id: @mesh_node.id,
1694
+ known_services: @mesh_node.routing_table.known_services,
1695
+ route_costs: @mesh_node.routing_table.route_costs,
1696
+ last_seen: @mesh_node.routing_table.last_seen_times
1697
+ ).publish # Mesh-routed with TTL=3
1698
+
1699
+ sleep 30 # Route sharing every 30 seconds
1700
+ end
1701
+ end
1702
+ end
1703
+ ```
1704
+
1705
+ ### Graceful Shutdown Protocol
1706
+
1707
+ ```ruby
1708
+ class GracefulShutdown
+ include SmartMessage::MeshControl  # NodeShutdown
1709
+ def initialize(mesh_node)
1710
+ @mesh_node = mesh_node
1711
+ end
1712
+
1713
+ def initiate_shutdown(reason: "graceful_shutdown", drain_time: 10)
1714
+ # 1. Stop accepting new connections
1715
+ @mesh_node.stop_accepting_connections
1716
+
1717
+ # 2. Announce shutdown to mesh network
1718
+ NodeShutdown.new(
1719
+ node_id: @mesh_node.id,
1720
+ reason: reason,
1721
+ estimated_downtime: drain_time,
1722
+ replacement_nodes: suggest_replacement_nodes
1723
+ ).publish
1724
+
1725
+ # 3. Wait for message queues to drain
1726
+ wait_for_queue_drain(timeout: drain_time)
1727
+
1728
+ # 4. Close peer connections gracefully
1729
+ @mesh_node.close_all_connections
1730
+
1731
+ # 5. Stop control message generation
1732
+ @mesh_node.stop_control_scheduler
1733
+ end
1734
+
1735
+ private
1736
+
1737
+ def suggest_replacement_nodes
1738
+ # Suggest peer nodes that could handle our local services
1739
+ @mesh_node.connected_peers.select do |peer_id, peer_info|
1740
+ peer_info.capabilities.intersect?(@mesh_node.local_services)
1741
+ end.keys
1742
+ end
1743
+ end
1744
+ ```
1745
+
1746
+ ### Control Message Benefits
1747
+
1748
+ 1. **Network Awareness**: Nodes discover each other and their capabilities
1749
+ 2. **Health Monitoring**: Detect failed nodes and connection issues
1750
+ 3. **Route Learning**: Build efficient routing tables through shared knowledge
1751
+ 4. **Graceful Degradation**: Handle planned shutdowns and maintenance
1752
+ 5. **Load Balancing**: Route messages based on node health and capacity (see the sketch after this list)
1753
+ 6. **Bridge Discovery**: Find nodes that can route to other networks
1754
+
1755
+ ## Implementation Summary
1756
+
1757
+ This design for Meshage provides a complete architecture for true mesh networking in SmartMessage:
1758
+
1759
+ ### Core Architecture Components
1760
+ 1. **P2P Foundation**: Uses p2p2-style NAT traversal and connection management as the networking foundation
1761
+ 2. **Multi-Hop Routing**: Messages route through intermediate nodes with local knowledge only
1762
+ 3. **Bridge Nodes**: Enable inter-network connectivity beyond UDP broadcast limitations
1763
+ 4. **Multi-Layer Deduplication**: Prevents message storms at subscriber, node, and network levels
1764
+ 5. **Network Control Messages**: Management protocols for presence, health, shutdown, and route discovery
1765
+
1766
+ ### Key Design Principles Achieved
1767
+ - **Complete Decentralization**: No central brokers or coordination points
1768
+ - **Location-Agnostic Publishing**: Publishers don't need to know subscriber locations
1769
+ - **Local Knowledge Model**: Nodes only know immediate connections, ensuring scalability
1770
+ - **Progressive Discovery**: Services found through network-wide routing, not pre-configuration
1771
+ - **Fault Tolerance**: Multiple routing paths and redundant connections
1772
+ - **Self-Healing**: Network automatically routes around failed nodes
1773
+
1774
+ ### Innovation Synthesis
1775
+ - **P2P2 NAT Traversal**: Proven hole punching techniques for internet-scale connectivity
1776
+ - **Journeta Threading**: Robust concurrent connection management patterns
1777
+ - **SmartMessage Integration**: Typed messages with validation and lifecycle management
1778
+ - **Mesh Routing Intelligence**: Multi-hop discovery with route caching and TTL protection
1779
+
1780
+ This design transforms SmartMessage from a traditional message bus into a resilient, decentralized mesh networking platform suitable for IoT, edge computing, and distributed microservices architectures.
1781
+
1782
+ ## Next Steps for Implementation
1783
+
1784
+ - **Phase 1**: Basic P2P connections with SmartMessage integration
1785
+ - **Phase 2**: Local network mesh with UDP discovery and multi-hop routing
1786
+ - **Phase 3**: Bridge nodes for inter-network connectivity with TCP tunneling
1787
+ - **Phase 4**: Advanced features (DHT storage, gossip protocols, encryption)
1788
+ - **Phase 5**: Production hardening (monitoring, metrics, debugging tools)