smart_message 0.0.10 → 0.0.12
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.github/workflows/deploy-github-pages.yml +38 -0
- data/.gitignore +5 -0
- data/CHANGELOG.md +30 -0
- data/Gemfile.lock +35 -4
- data/README.md +169 -71
- data/Rakefile +29 -4
- data/docs/assets/images/ddq_architecture.svg +130 -0
- data/docs/assets/images/dlq_architecture.svg +115 -0
- data/docs/assets/images/enhanced-dual-publishing.svg +136 -0
- data/docs/assets/images/enhanced-fluent-api.svg +149 -0
- data/docs/assets/images/enhanced-microservices-routing.svg +115 -0
- data/docs/assets/images/enhanced-pattern-matching.svg +107 -0
- data/docs/assets/images/fluent-api-demo.svg +59 -0
- data/docs/assets/images/performance-comparison.svg +161 -0
- data/docs/assets/images/redis-basic-architecture.svg +53 -0
- data/docs/assets/images/redis-enhanced-architecture.svg +88 -0
- data/docs/assets/images/redis-queue-architecture.svg +101 -0
- data/docs/assets/images/smart_message.jpg +0 -0
- data/docs/assets/images/smart_message_walking.jpg +0 -0
- data/docs/assets/images/smartmessage_architecture_overview.svg +173 -0
- data/docs/assets/images/transport-comparison-matrix.svg +171 -0
- data/docs/assets/javascripts/mathjax.js +17 -0
- data/docs/assets/stylesheets/extra.css +51 -0
- data/docs/{addressing.md → core-concepts/addressing.md} +5 -7
- data/docs/{architecture.md → core-concepts/architecture.md} +78 -138
- data/docs/{dispatcher.md → core-concepts/dispatcher.md} +21 -21
- data/docs/{message_filtering.md → core-concepts/message-filtering.md} +2 -3
- data/docs/{message_processing.md → core-concepts/message-processing.md} +17 -17
- data/docs/{troubleshooting.md → development/troubleshooting.md} +7 -7
- data/docs/{examples.md → getting-started/examples.md} +115 -89
- data/docs/{getting-started.md → getting-started/quick-start.md} +47 -18
- data/docs/guides/redis-queue-getting-started.md +697 -0
- data/docs/guides/redis-queue-patterns.md +889 -0
- data/docs/guides/redis-queue-production.md +1091 -0
- data/docs/index.md +64 -0
- data/docs/{dead_letter_queue.md → reference/dead-letter-queue.md} +2 -3
- data/docs/{logging.md → reference/logging.md} +1 -1
- data/docs/{message_deduplication.md → reference/message-deduplication.md} +1 -0
- data/docs/{proc_handlers_summary.md → reference/proc-handlers.md} +7 -6
- data/docs/{serializers.md → reference/serializers.md} +3 -5
- data/docs/{transports.md → reference/transports.md} +133 -11
- data/docs/transports/memory-transport.md +374 -0
- data/docs/transports/redis-enhanced-transport.md +524 -0
- data/docs/transports/redis-queue-transport.md +1304 -0
- data/docs/transports/redis-transport-comparison.md +496 -0
- data/docs/transports/redis-transport.md +509 -0
- data/examples/README.md +98 -5
- data/examples/city_scenario/911_emergency_call_flow.svg +99 -0
- data/examples/city_scenario/README.md +515 -0
- data/examples/city_scenario/ai_visitor_intelligence_flow.svg +108 -0
- data/examples/city_scenario/citizen.rb +195 -0
- data/examples/city_scenario/city_diagram.svg +125 -0
- data/examples/city_scenario/common/health_monitor.rb +80 -0
- data/examples/city_scenario/common/logger.rb +30 -0
- data/examples/city_scenario/emergency_dispatch_center.rb +270 -0
- data/examples/city_scenario/fire_department.rb +446 -0
- data/examples/city_scenario/fire_emergency_flow.svg +95 -0
- data/examples/city_scenario/health_department.rb +100 -0
- data/examples/city_scenario/health_monitoring_system.svg +130 -0
- data/examples/city_scenario/house.rb +244 -0
- data/examples/city_scenario/local_bank.rb +217 -0
- data/examples/city_scenario/messages/emergency_911_message.rb +81 -0
- data/examples/city_scenario/messages/emergency_resolved_message.rb +43 -0
- data/examples/city_scenario/messages/fire_dispatch_message.rb +43 -0
- data/examples/city_scenario/messages/fire_emergency_message.rb +45 -0
- data/examples/city_scenario/messages/health_check_message.rb +22 -0
- data/examples/city_scenario/messages/health_status_message.rb +35 -0
- data/examples/city_scenario/messages/police_dispatch_message.rb +46 -0
- data/examples/city_scenario/messages/silent_alarm_message.rb +38 -0
- data/examples/city_scenario/police_department.rb +316 -0
- data/examples/city_scenario/redis_monitor.rb +129 -0
- data/examples/city_scenario/redis_stats.rb +743 -0
- data/examples/city_scenario/room_for_improvement.md +240 -0
- data/examples/city_scenario/security_emergency_flow.svg +95 -0
- data/examples/city_scenario/service_internal_architecture.svg +154 -0
- data/examples/city_scenario/smart_message_ai_agent.rb +364 -0
- data/examples/city_scenario/start_demo.sh +236 -0
- data/examples/city_scenario/stop_demo.sh +106 -0
- data/examples/city_scenario/visitor.rb +631 -0
- data/examples/{10_message_deduplication.rb → memory/01_message_deduplication_demo.rb} +1 -1
- data/examples/{09_dead_letter_queue_demo.rb → memory/02_dead_letter_queue_demo.rb} +13 -40
- data/examples/{01_point_to_point_orders.rb → memory/03_point_to_point_orders.rb} +1 -1
- data/examples/{02_publish_subscribe_events.rb → memory/04_publish_subscribe_events.rb} +2 -2
- data/examples/{03_many_to_many_chat.rb → memory/05_many_to_many_chat.rb} +4 -4
- data/examples/{show_me.rb → memory/06_pretty_print_demo.rb} +1 -1
- data/examples/{05_proc_handlers.rb → memory/07_proc_handlers_demo.rb} +2 -2
- data/examples/{06_custom_logger_example.rb → memory/08_custom_logger_demo.rb} +17 -14
- data/examples/{07_error_handling_scenarios.rb → memory/09_error_handling_demo.rb} +4 -4
- data/examples/{08_entity_addressing_basic.rb → memory/10_entity_addressing_basic.rb} +8 -8
- data/examples/{08_entity_addressing_with_filtering.rb → memory/11_entity_addressing_with_filtering.rb} +6 -6
- data/examples/{09_regex_filtering_microservices.rb → memory/12_regex_filtering_microservices.rb} +2 -2
- data/examples/{10_header_block_configuration.rb → memory/13_header_block_configuration.rb} +6 -6
- data/examples/{11_global_configuration_example.rb → memory/14_global_configuration_demo.rb} +19 -8
- data/examples/{show_logger.rb → memory/15_logger_demo.rb} +1 -1
- data/examples/memory/README.md +163 -0
- data/examples/memory/memory_transport_architecture.svg +90 -0
- data/examples/memory/point_to_point_pattern.svg +94 -0
- data/examples/memory/publish_subscribe_pattern.svg +125 -0
- data/examples/{04_redis_smart_home_iot.rb → redis/01_smart_home_iot_demo.rb} +5 -5
- data/examples/redis/README.md +230 -0
- data/examples/redis/alert_system_flow.svg +127 -0
- data/examples/redis/dashboard_status_flow.svg +107 -0
- data/examples/redis/device_command_flow.svg +113 -0
- data/examples/redis/redis_transport_architecture.svg +115 -0
- data/examples/{smart_home_iot_dataflow.md → redis/smart_home_iot_dataflow.md} +4 -116
- data/examples/redis/smart_home_system_architecture.svg +133 -0
- data/examples/redis_enhanced/README.md +319 -0
- data/examples/redis_enhanced/enhanced_01_basic_patterns.rb +233 -0
- data/examples/redis_enhanced/enhanced_02_fluent_api.rb +331 -0
- data/examples/redis_enhanced/enhanced_03_dual_publishing.rb +281 -0
- data/examples/redis_enhanced/enhanced_04_advanced_routing.rb +419 -0
- data/examples/redis_queue/01_basic_messaging.rb +221 -0
- data/examples/redis_queue/01_comprehensive_examples.rb +508 -0
- data/examples/redis_queue/02_pattern_routing.rb +405 -0
- data/examples/redis_queue/03_fluent_api.rb +422 -0
- data/examples/redis_queue/04_load_balancing.rb +486 -0
- data/examples/redis_queue/05_microservices.rb +735 -0
- data/examples/redis_queue/06_emergency_alerts.rb +777 -0
- data/examples/redis_queue/07_queue_management.rb +587 -0
- data/examples/redis_queue/README.md +366 -0
- data/examples/redis_queue/enhanced_01_basic_patterns.rb +233 -0
- data/examples/redis_queue/enhanced_02_fluent_api.rb +331 -0
- data/examples/redis_queue/enhanced_03_dual_publishing.rb +281 -0
- data/examples/redis_queue/enhanced_04_advanced_routing.rb +419 -0
- data/examples/redis_queue/redis_queue_architecture.svg +148 -0
- data/ideas/README.md +41 -0
- data/ideas/agents.md +1001 -0
- data/ideas/database_transport.md +980 -0
- data/ideas/improvement.md +359 -0
- data/ideas/meshage.md +1788 -0
- data/ideas/message_discovery.md +178 -0
- data/ideas/message_schema.md +1381 -0
- data/lib/smart_message/.idea/.gitignore +8 -0
- data/lib/smart_message/.idea/markdown.xml +6 -0
- data/lib/smart_message/.idea/misc.xml +4 -0
- data/lib/smart_message/.idea/modules.xml +8 -0
- data/lib/smart_message/.idea/smart_message.iml +16 -0
- data/lib/smart_message/.idea/vcs.xml +6 -0
- data/lib/smart_message/addressing.rb +15 -0
- data/lib/smart_message/base.rb +0 -2
- data/lib/smart_message/configuration.rb +1 -1
- data/lib/smart_message/logger.rb +15 -4
- data/lib/smart_message/plugins.rb +5 -2
- data/lib/smart_message/serializer.rb +14 -0
- data/lib/smart_message/transport/redis_enhanced_transport.rb +399 -0
- data/lib/smart_message/transport/redis_queue_transport.rb +555 -0
- data/lib/smart_message/transport/registry.rb +1 -0
- data/lib/smart_message/transport.rb +34 -1
- data/lib/smart_message/version.rb +1 -1
- data/lib/smart_message.rb +5 -52
- data/mkdocs.yml +184 -0
- data/p2p_plan.md +326 -0
- data/p2p_roadmap.md +287 -0
- data/smart_message.gemspec +2 -0
- data/smart_message.svg +51 -0
- metadata +170 -44
- data/docs/README.md +0 -57
- data/examples/dead_letters.jsonl +0 -12
- data/examples/temp.txt +0 -94
- data/examples/tmux_chat/README.md +0 -283
- data/examples/tmux_chat/bot_agent.rb +0 -278
- data/examples/tmux_chat/human_agent.rb +0 -199
- data/examples/tmux_chat/room_monitor.rb +0 -160
- data/examples/tmux_chat/shared_chat_system.rb +0 -328
- data/examples/tmux_chat/start_chat_demo.sh +0 -190
- data/examples/tmux_chat/stop_chat_demo.sh +0 -22
- /data/docs/{properties.md → core-concepts/properties.md} +0 -0
- /data/docs/{ideas_to_think_about.md → development/ideas.md} +0 -0
data/ideas/meshage.md
ADDED
@@ -0,0 +1,1788 @@
# Meshage: True Mesh Network Transport for SmartMessage

## Overview

Meshage (Mesh + Message) would be a fully decentralized mesh network transport for SmartMessage that enables resilient message passing without any central coordination. In a true mesh network, publishers don't need to know where subscribers are located: they simply publish messages addressed to a subscriber or service, and the mesh automatically routes each message through intermediate nodes until it reaches its destination or expires.

## Lessons from the p2p2 Gem

The p2p2 Ruby gem provides excellent patterns for NAT traversal and P2P connection management that apply directly to mesh networking:

### NAT Hole Punching Architecture

P2P2 uses a pairing daemon (paird) that coordinates P2P connections between clients behind NATs:

```ruby
# Adapted from p2p2's approach for SmartMessage mesh nodes
class MeshHolePunchingService
  def initialize(coordination_port = 4040)
    @coordination_servers = []  # Multiple servers for redundancy
    @active_sessions      = {}  # node_id => session_info

    # Create multiple UDP sockets on different ports (like p2p2)
    10.times do |i|
      port   = coordination_port + i
      socket = UDPSocket.new
      socket.bind("0.0.0.0", port)
      @coordination_servers << { socket: socket, port: port }
    end
  end

  # Node announces itself to establish P2P connections
  def announce_node(node_id, capabilities)
    # Similar to p2p2's "title" concept but for mesh nodes
    session_data = {
      node_id: node_id,
      local_services: capabilities[:services],
      is_bridge: capabilities[:bridge_node],
      announced_at: Time.now
    }

    # Send to a random coordination port (load balancing like p2p2)
    server = @coordination_servers.sample
    server[:socket].send(session_data.to_json, 0,
                         coordination_address, server[:port])
  end

  # Coordinate hole punching between two nodes
  def coordinate_connection(node1_id, node2_id)
    node1_session = @active_sessions[node1_id]
    node2_session = @active_sessions[node2_id]

    return unless node1_session && node2_session

    # Exchange address info (like p2p2's paird logic)
    send_peer_address(node1_session, node2_session[:address])
    send_peer_address(node2_session, node1_session[:address])
  end
end
```

### Connection Management Patterns

P2P2's worker architecture uses role-based socket management:

```ruby
class MeshNodeWorker
  def initialize
    @sockets       = {}
    @socket_roles  = {}  # socket => :mesh_peer, :local_service, :bridge
    @read_sockets  = []
    @write_sockets = []

    # Buffer management (adapted from p2p2's buffering)
    @peer_buffers = {}  # peer_id => { read_buffer: "", write_buffer: "" }
    @buffer_limits = {
      max_buffer_size:  50 * 1024 * 1024,  # 50MB like p2p2
      resume_threshold: 25 * 1024 * 1024   # Resume when below 25MB
    }
  end

  def main_loop
    loop do
      readable, writable = IO.select(@read_sockets, @write_sockets)

      readable.each do |socket|
        role = @socket_roles[socket]
        case role
        when :mesh_peer
          handle_peer_message(socket)
        when :local_service
          handle_service_message(socket)
        when :bridge
          handle_bridge_message(socket)
        end
      end

      writable.each do |socket|
        flush_write_buffer(socket)
      end
    end
  end

  # Flow control like p2p2: pause reading when buffers are full
  def handle_buffer_overflow(peer_id)
    peer_socket = find_peer_socket(peer_id)
    @read_sockets.delete(peer_socket)  # Pause reading

    # Resume when the buffer drains (checked periodically)
    schedule_buffer_check(peer_id)
  end
end
```

### Multi-Port UDP Coordination

P2P2 uses multiple UDP ports to improve NAT traversal success:

```ruby
class MeshCoordinationService
  def initialize(base_port = 4040)
    @coordination_ports = []

    # Create 10 coordination ports like p2p2
    10.times do |i|
      port   = base_port + i
      socket = UDPSocket.new
      socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEPORT, 1)
      socket.bind("0.0.0.0", port)

      @coordination_ports << {
        socket: socket,
        port: port,
        active_sessions: {}
      }
    end
  end

  def coordinate_mesh_connection(requester_id, target_service)
    # Find nodes that provide target_service
    candidate_nodes = find_service_providers(target_service)

    candidate_nodes.each do |node_info|
      # Attempt hole punching to each candidate
      attempt_hole_punch(requester_id, node_info[:node_id])
    end
  end

  # P2P2-style room/session management for mesh
  def manage_mesh_sessions
    @coordination_ports.each do |port_info|
      port_info[:active_sessions].each do |session_id, session|
        if session_expired?(session)
          cleanup_session(session_id)
        end
      end
    end
  end
end
```

### TCP Tunneling Over UDP Holes

P2P2 establishes UDP holes, then creates TCP connections through them:

```ruby
class MeshTCPTunnel
  def initialize(local_service_port, remote_peer_address)
    @local_service_port  = local_service_port
    @remote_peer_address = remote_peer_address
    @tcp_connections     = {}

    # Create a tunnel socket through the UDP hole (like p2p2)
    @tunnel_socket = establish_tcp_through_udp_hole
  end

  def establish_tcp_through_udp_hole
    # First establish a UDP hole
    udp_socket = create_udp_hole(@remote_peer_address)

    # Then create a TCP connection using the same local port
    tcp_socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
    tcp_socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
    tcp_socket.bind(udp_socket.local_address)  # Reuse the UDP hole's port

    # Connect through the hole (may require multiple attempts like p2p2)
    retry_count = 0
    begin
      tcp_socket.connect_nonblock(@remote_peer_address)
    rescue IO::WaitWritable
      retry_count += 1
      if retry_count < 5  # P2P2's PUNCH_LIMIT
        sleep(0.1)
        retry
      else
        raise "Failed to establish TCP tunnel after #{retry_count} attempts"
      end
    end

    tcp_socket
  end

  # Bridge local service traffic to the remote mesh node
  def bridge_service_traffic
    local_service = TCPSocket.new("127.0.0.1", @local_service_port)

    # Bidirectional forwarding like p2p2's tun/dst pattern
    Thread.new do
      loop do
        data = local_service.read_nonblock(1024 * 1024)  # P2P2's READ_SIZE
        @tunnel_socket.write(data)
      rescue IO::WaitReadable
        # Handle using IO.select like p2p2
      end
    end

    Thread.new do
      loop do
        data = @tunnel_socket.read_nonblock(1024 * 1024)
        local_service.write(data)
      rescue IO::WaitReadable
        # Handle using IO.select like p2p2
      end
    end
  end
end
```

### Key Improvements for SmartMessage Mesh

**Better Service Discovery:**
P2P2 uses simple "room" names; a mesh needs service-based discovery:

```ruby
# P2P2 style
"room_name"  # Simple string matching

# Mesh style
{
  service_name: "inventory-service",
  capabilities: [:read, :write],
  version: "2.1",
  region: "us-west"
}
```

**Message Routing vs Direct Tunneling:**
P2P2 creates direct tunnels; a mesh needs multi-hop routing:

```
# P2P2: Direct tunnel
Client A → Coordination Server → Client B
          (establish direct tunnel)

# Mesh: Multi-hop routing
Publisher → Node A → Node C → Node F → Subscriber
          (route through intermediate nodes)
```

**SmartMessage Integration:**
P2P2 forwards raw TCP streams; a mesh handles typed messages:

```ruby
# P2P2: Raw data forwarding
tun_socket.write(raw_data)

# Mesh: SmartMessage integration
mesh_connection.send_message(order_message)
```

### P2P2 Advantages for Mesh

1. **Proven NAT Traversal**: P2P2's hole punching works reliably across different NAT types
2. **Efficient Buffering**: Flow control prevents memory exhaustion during high traffic
3. **Multi-Port Strategy**: Increases the success rate of establishing connections
4. **Graceful Degradation**: Handles connection failures and retries intelligently
5. **Resource Management**: Proper cleanup of expired sessions and connections
6. **Non-Blocking I/O**: Uses IO.select for efficient concurrent connection handling

The p2p2 gem provides the low-level P2P connection primitives that mesh networking builds upon: NAT traversal, connection establishment, and traffic forwarding. A SmartMessage mesh would use these patterns as its foundation layer while adding service discovery, message routing, and distributed coordination on top.

## Key Mesh Network Principles

### 1. Complete Decentralization
Every node in the mesh can route messages. There is no central authority, broker, or coordination point.

### 2. Location-Agnostic Publishing
Publishers send messages to subscriber IDs or service names without knowing which physical node hosts them:

```ruby
# Publisher doesn't know or care where inventory-service runs
OrderMessage.new(
  order_id: "123",
  items: ["widget", "gadget"]
).publish(to: "inventory-service")

# The mesh network figures out routing automatically
```

### 3. Multi-Hop Message Routing
Messages travel through intermediate nodes to reach their destination:

```
Node A → Node C → Node F → Node K (inventory-service)
```

### 4. Self-Terminating Messages
Messages include a TTL (Time To Live) or hop limit to prevent infinite routing loops.

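The hop-limit idea can be sketched in a few lines (a minimal sketch; the `ttl` field name follows the `_sm_header.ttl` used by the routing code later in this document):

```ruby
# Minimal sketch of self-terminating messages via a hop-limited header.
# A message starts with a TTL; each hop consumes one unit, and a node
# drops the message once the TTL reaches zero instead of forwarding it.
Header = Struct.new(:ttl)

def forwardable?(header)
  header.ttl -= 1  # consume one hop
  header.ttl > 0   # still alive? keep routing
end

header = Header.new(3)
hops = 0
hops += 1 while forwardable?(header)
puts hops  # the message survives two forwards, then expires
```

However the mesh is shaped, this guarantees a looping message eventually dies rather than circulating forever.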
## Core Concepts

### Peer Discovery - Local vs Inter-Network

Mesh networks need different discovery mechanisms for local versus remote networks:

#### Local Network Discovery (UDP Broadcast)

```ruby
class LocalNetworkDiscovery
  def initialize(mesh_node)
    @mesh_node         = mesh_node
    @udp_port          = 31337
    @multicast_address = '224.220.221.222'
  end

  def start_discovery
    # UDP multicast for local network discovery
    start_udp_broadcaster
    start_udp_listener
  end

  def broadcast_presence
    message = {
      node_id: @mesh_node.id,
      services: @mesh_node.local_services,
      tcp_port: @mesh_node.tcp_port,
      is_bridge: @mesh_node.bridge_node?,
      bridge_networks: @mesh_node.bridge_networks
    }

    @udp_socket.send(message.to_json, 0, @multicast_address, @udp_port)
  end
end
```

#### Bridge Nodes for Inter-Network Connectivity

```ruby
class BridgeNode < MeshNode
  def initialize(options = {})
    super
    @bridge_networks      = options[:bridge_networks] || []
    @external_connections = {}  # network_id => [P2PConnection]
    @bootstrap_nodes      = options[:bootstrap_nodes] || []
  end

  def bridge_node?
    true
  end

  def start
    super

    # Connect to other networks via TCP to known bridge nodes
    connect_to_external_networks

    # Advertise bridge capability in local UDP broadcasts
    advertise_bridge_capability
  end

  private

  def connect_to_external_networks
    @bootstrap_nodes.each do |external_address|
      # TCP connection to bridge nodes in other networks
      connection = P2PConnection.new(external_address, protocol: :tcp)

      begin
        connection.establish_secure_channel(@keypair)
        network_id = determine_network_id(external_address)
        @external_connections[network_id] ||= []
        @external_connections[network_id] << connection

        # Exchange routing information with the remote network
        exchange_inter_network_routes(connection, network_id)
      rescue => e
        logger.warn "Failed to connect to external network #{external_address}: #{e}"
      end
    end
  end

  def route_message(message)
    if local_network_destination?(message.to)
      # Route within the local network using UDP-discovered nodes
      route_locally(message)
    else
      # Route to an external network via TCP bridge connections
      route_to_external_network(message)
    end
  end

  def route_to_external_network(message)
    target_network = determine_target_network(message.to)

    if bridge_connections = @external_connections[target_network]
      # Send via TCP to bridge nodes in the target network
      bridge_connections.each do |connection|
        connection.send_message(message)
      end
    else
      # Unknown target network - flood to all external connections
      @external_connections.each_value do |connections|
        connections.each { |conn| conn.send_message(message) }
      end
    end
  end
end
```

#### Network Topology Examples

**Single Local Network:**
```
[Node A] ←UDP→ [Node B] ←UDP→ [Node C]
    ↑                             ↑
    UDP multicast discovery works for all nodes
```

**Multi-Network Mesh with Bridges:**
```
Local Network 1:               Local Network 2:
[Node A] ←UDP→ [Bridge B]      [Bridge D] ←UDP→ [Node E]
                   ↑               ↑
                  TCP Bridge Connection
                (crosses router boundaries)

Bridge B connects:
- Local nodes via UDP (A, others on network 1)
- Remote networks via TCP (Bridge D on network 2)
```

### Local Knowledge Model

Each node knows only about its immediate connections; this keeps the system scalable:

```ruby
class MeshNode
  def initialize
    @node_id = generate_node_id

    # ONLY know about directly connected peers
    @connected_peers = {}  # peer_id => P2PConnection

    # ONLY know about local subscribers
    @local_subscribers = {}  # service_name => [callback_handlers]

    # NO global knowledge of who subscribes to what on other nodes
    @routing_cache = LRU.new(100)  # Cache successful routes
  end

  def knows_local_subscribers_for?(service_name)
    @local_subscribers.key?(service_name)
  end

  def knows_connected_peers
    @connected_peers.keys
  end

  # This node does NOT know what services exist on remote nodes
  def knows_remote_subscribers?
    false  # This is the key insight!
  end
end
```

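The `LRU.new(100)` above assumes an LRU class that is not part of Ruby's standard library. A minimal bounded cache supporting only the hash-style `[]`/`[]=` access the routing code uses could look like this (a sketch, relying on Ruby hashes preserving insertion order):

```ruby
# Minimal least-recently-used cache sketch for @routing_cache above.
# Re-inserting a key on access moves it to the back of the hash, so the
# front entry is always the least recently used and is evicted first.
class LRU
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  def [](key)
    return nil unless @store.key?(key)
    @store[key] = @store.delete(key)  # touch: move to most-recent position
  end

  def []=(key, value)
    @store.delete(key)
    @store[key] = value
    @store.delete(@store.keys.first) while @store.size > @max_size  # evict oldest
  end
end

cache = LRU.new(2)
cache[:a] = 1
cache[:b] = 2
cache[:c] = 3          # evicts :a, the oldest entry
puts cache[:a].inspect # nil
```

A production version would likely also need `delete` and size introspection, but this is enough for the route-caching pattern sketched here.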
### Message Routing with Local Knowledge Only

```ruby
class MeshRouter
  def route_message(message)
    return if already_processed?(message)
    mark_as_processed(message)

    # Check if we have local subscribers for this service
    if has_local_subscribers?(message.to)
      deliver_to_local_subscribers(message)
      # Note: Don't return - message might also need to go to other nodes
    end

    # We DON'T know if other nodes have subscribers
    # So we use discovery routing to all connected peers
    forward_to_discovery(message)
  end

  private

  def forward_to_discovery(message)
    # Decrement TTL to prevent infinite loops
    message._sm_header.ttl -= 1
    return if message._sm_header.ttl <= 0

    # Check routing cache for previously successful routes
    if cached_route = @routing_cache[message.to]
      forward_to_cached_peers(message, cached_route)
    else
      # No cached route - flood to all connected peers
      flood_to_connected_peers(message)
    end
  end

  def flood_to_connected_peers(message)
    @connected_peers.each do |peer_id, connection|
      # Don't send back to where it came from
      next if message._sm_header.came_from == peer_id

      connection.send_message(message)
    end
  end

  def forward_to_cached_peers(message, cached_peers)
    cached_peers.each do |peer_id|
      if connection = @connected_peers[peer_id]
        connection.send_message(message)
      end
    end

    # If cached route fails, fall back to flooding
    # (This would be detected by lack of response/acknowledgment)
  end

  # When a message is successfully delivered, cache the route
  def learn_successful_route(service_name, peer_id)
    @routing_cache[service_name] ||= []
    @routing_cache[service_name] << peer_id unless
      @routing_cache[service_name].include?(peer_id)
  end
end
```

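The `already_processed?`/`mark_as_processed` pair above implies per-message deduplication, so a message that loops back through another peer is dropped instead of re-flooded. A bounded seen-set is one way to implement it; `SeenMessages` and its `uuid` parameter are illustrative names, not part of SmartMessage:

```ruby
require 'set'

# Sketch of flood deduplication: remember recently seen message IDs,
# bounded so the set can't grow without limit on long-running nodes.
class SeenMessages
  def initialize(max_size = 10_000)
    @max_size = max_size
    @seen  = Set.new  # fast membership checks
    @order = []       # insertion order, for eviction
  end

  def seen_before?(uuid)
    @seen.include?(uuid)
  end

  def mark(uuid)
    return if @seen.include?(uuid)
    @seen << uuid
    @order << uuid
    @seen.delete(@order.shift) while @order.size > @max_size  # evict oldest
  end
end

seen = SeenMessages.new(2)
seen.mark("a")
seen.mark("b")
seen.mark("c")  # evicts "a"
```

In practice the UUID would come from the SmartMessage header, and the bound would be tuned to the expected flood fan-out and message rate.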
|
533
|
+
### Publisher Knowledge Model
|
534
|
+
|
535
|
+
```ruby
|
536
|
+
class Publisher
|
537
|
+
def initialize(mesh_transport)
|
538
|
+
@mesh = mesh_transport
|
539
|
+
@known_local_services = Set.new # Services on same node as publisher
|
540
|
+
@connected_peer_nodes = Set.new # Node IDs we can directly reach
|
541
|
+
end
|
542
|
+
|
543
|
+
def publish_message(message, to:)
|
544
|
+
message.to = to
|
545
|
+
|
546
|
+
# Publisher knows about local services on same node
|
547
|
+
if @known_local_services.include?(to)
|
548
|
+
@mesh.deliver_locally(message)
|
549
|
+
return
|
550
|
+
end
|
551
|
+
|
552
|
+
# Publisher knows which peer nodes it can connect to
|
553
|
+
# But does NOT know what subscribers are on those nodes
|
554
|
+
@mesh.send_to_connected_peers(message)
|
555
|
+
|
556
|
+
# The mesh network handles discovery from there
|
557
|
+
end
|
558
|
+
|
559
|
+
def discover_local_services
|
560
|
+
# Publisher only discovers services on its own node
|
561
|
+
@known_local_services = @mesh.local_services
|
562
|
+
end
|
563
|
+
|
564
|
+
def discover_connected_peers
|
565
|
+
# Publisher knows which nodes it can directly connect to
|
566
|
+
@connected_peer_nodes = @mesh.connected_peer_ids
|
567
|
+
end
|
568
|
+
|
569
|
+
# Publisher does NOT have this method:
|
570
|
+
# def discover_remote_services # ← This doesn't exist!
|
571
|
+
end
|
572
|
+
```
|
573
|
+
```

## Implementation Architecture

### Node Structure - P2P Connection Management

Each mesh node manages multiple P2P connections and routes messages between them:

```ruby
class MeshNode
  attr_reader :id, :address, :public_key

  def initialize
    @id = generate_node_id
    @p2p_connections = {}    # peer_id => P2PConnection
    @local_subscribers = {}  # message_class => [callbacks]
    @service_registry = ServiceRegistry.new
    @routing_table = RoutingTable.new

    # Cryptographic identity
    @keypair = OpenSSL::PKey::RSA.new(2048)
    @public_key = @keypair.public_key
  end

  # Establish P2P connection to another mesh node
  def connect_to_peer(peer_address)
    connection = P2PConnection.new(peer_address)
    connection.establish_secure_channel(@keypair)

    @p2p_connections[connection.peer_id] = connection
    exchange_routing_info(connection)
  end

  # Publish message into the mesh via P2P connections
  def publish_to_mesh(message)
    message._sm_header.from = @id
    message._sm_header.ttl ||= 10  # Prevent infinite routing

    if service_is_local?(message.to)
      # Deliver locally via P2P to local subscribers
      deliver_to_local_subscribers(message)
    else
      # Route to other nodes via P2P connections
      route_to_remote_nodes(message)
    end
  end

  # Receive message from peer and decide: deliver locally or route further
  def receive_from_peer(message, from_peer_id)
    return if already_seen?(message)

    if service_is_local?(message.to)
      # Final delivery via P2P to local subscribers
      deliver_to_local_subscribers(message)
    else
      # Continue routing via P2P to other nodes
      forward_to_other_peers(message, exclude: from_peer_id)
    end
  end

  private

  def route_to_remote_nodes(message)
    target_peers = @routing_table.find_routes(message.to)

    if target_peers.any?
      # Send via P2P to known routes
      target_peers.each do |peer_id|
        @p2p_connections[peer_id].send_message(message)
      end
    else
      # Flood via P2P to all neighbors for discovery
      @p2p_connections.each_value do |connection|
        connection.send_message(message)
      end
    end
  end

  def forward_to_other_peers(message, exclude:)
    message._sm_header.ttl -= 1
    return if message._sm_header.ttl <= 0

    @p2p_connections.each do |peer_id, connection|
      next if peer_id == exclude
      connection.send_message(message)
    end
  end
end

# P2P connection handles the actual networking
class P2PConnection
  def initialize(peer_address)
    @peer_address = peer_address
    @socket = nil
    @message_queue = Queue.new
    @send_thread = nil
  end

  def send_message(message)
    @message_queue.push(message)
    ensure_send_thread_running
  end

  private

  def ensure_send_thread_running
    return if @send_thread&.alive?

    @send_thread = Thread.new do
      while (message = @message_queue.pop)
        deliver_message_via_socket(message)
      end
    end
  end
end
```

### P2P Transport Implementation

```ruby
module SmartMessage
  module Transport
    class P2PTransport < Base
      def initialize(options = {})
        super
        @mesh_node = MeshNode.new
        @mesh_node.start(options)
      end

      def publish(message, routing_key = nil)
        # P2P doesn't use routing keys, uses message.to field
        message._sm_header.from = @mesh_node.id

        if message.to
          # Direct message to specific peer
          @mesh_node.send_to_peer(message.to, message)
        else
          # Broadcast to all peers subscribed to this message type
          @mesh_node.broadcast(message)
        end
      end

      def subscribe(routing_key = nil, &block)
        # Subscribe to message types, not routing keys
        message_class = routing_key || SmartMessage::Base
        @mesh_node.subscribe(message_class, &block)
      end
    end
  end
end
```

## Advanced Features

### Distributed Hash Table (DHT) for Message Storage

```ruby
class DistributedMessageStore
  def initialize(node)
    @node = node
    @dht = Kademlia::DHT.new(node.id)
  end

  def store_message(message)
    key = Digest::SHA256.hexdigest(message.uuid)

    # Find nodes responsible for this key
    nodes = @dht.find_nodes(key, k: 3)

    # Replicate to multiple nodes
    nodes.each do |node|
      node.store(key, message.to_json)
    end
  end

  def retrieve_message(uuid)
    key = Digest::SHA256.hexdigest(uuid)
    nodes = @dht.find_nodes(key)

    nodes.each do |node|
      if (data = node.retrieve(key))
        return SmartMessage.from_json(data)
      end
    end
    nil
  end
end
```

### Gossip Protocol for State Synchronization

```ruby
class GossipProtocol
  def initialize(node, interval: 1.0)
    @node = node
    @interval = interval
    @state_version = 0
    @peer_states = {}
  end

  def start
    Thread.new do
      loop do
        sleep @interval
        gossip_with_random_peer
      end
    end
  end

  def gossip_with_random_peer
    peer = @node.connections.values.sample
    return unless peer

    # Exchange state information
    my_state = {
      version: @state_version,
      subscriptions: @node.subscriptions.keys,
      peers_count: @node.connections.size,
      message_types: known_message_types
    }

    peer_state = peer.exchange_gossip(my_state)
    merge_peer_state(peer_state)
  end
end
```

## Use Cases

### 1. Decentralized IoT Networks with Bridge Nodes

```ruby
class IoTSensorReading < SmartMessage::Base
  property :sensor_id, required: true
  property :temperature, type: Float
  property :humidity, type: Float
  property :timestamp, type: Time

  transport SmartMessage::Transport::MeshTransport.new
end

# Sensor on local factory network publishes to cloud analytics
sensor = IoTSensorReading.new(
  sensor_id: "factory_sensor_01",
  temperature: 72.5,
  humidity: 45.0,
  timestamp: Time.now
)

# Routes across network boundaries via bridge nodes
sensor.publish(to: "cloud-analytics-service")

# Routing path:
# Factory Sensor → Local Gateway (Bridge) → Internet → Cloud Bridge → Analytics
#   (UDP local)        (TCP bridge)                      (UDP cloud)
```

### 2. Resilient Microservices

```ruby
class OrderService
  def initialize
    @transport = SmartMessage::Transport::MeshTransport.new(
      service_name: "order-service"
    )

    # Register this node as providing "order-service"
    @transport.register_service("order-service")

    # Subscribe to payment confirmations (may come from any payment node)
    PaymentConfirmed.transport(@transport)
    PaymentConfirmed.subscribe do |payment|
      process_payment_confirmation(payment)
    end
  end

  def create_order(data)
    # Send to inventory service - mesh will find it wherever it runs
    InventoryCheck.new(
      order_id: data[:order_id],
      items: data[:items]
    ).publish(to: "inventory-service")

    # Send to payment service - could be on any node in the mesh
    PaymentRequest.new(
      order_id: data[:order_id],
      amount: data[:amount]
    ).publish(to: "payment-service")
  end
end

# Messages route through the mesh automatically:
# Order Node → Edge Node → Cloud Node → Payment Service Node
# Order Node → Local Node → Inventory Service Node
```

### 3. Edge Computing Mesh

```ruby
# Edge nodes form a mesh for distributed computation
class EdgeComputeNode
  def initialize
    @mesh = SmartMessage::Transport::P2PTransport.new(
      capabilities: [:gpu, :high_memory],
      region: "us-west"
    )

    ComputeTask.transport(@mesh)
    ComputeTask.subscribe do |task|
      if can_handle?(task)
        result = execute_task(task)

        # Send result back through mesh
        TaskResult.new(
          task_id: task.id,
          result: result,
          to: task.from  # Route back to originator
        ).publish
      else
        # Forward to more capable peer
        forward_to_capable_peer(task)
      end
    end
  end
end
```

## P2P as Mesh Foundation

**P2P connections are the foundation** - every hop in the mesh is a peer-to-peer connection:

```
Publisher → Node A → Node C → Node F → Subscriber
        ↑        ↑        ↑        ↑
       P2P      P2P      P2P      P2P
```

**Each step involves P2P:**
1. **Publisher → First Node**: P2P connection to inject message into mesh
2. **Node → Node**: P2P connections for routing between mesh nodes
3. **Final Node → Subscriber**: P2P connection for final delivery

**Key Difference:**

**Simple P2P (journeta-style):**
- Single-hop: Publisher directly connects to subscriber's node
- Publisher must discover which specific node hosts the service

**Mesh P2P (meshage):**
- Multi-hop: Publisher connects to any mesh node, message routes through multiple P2P hops
- Publisher only needs to know service name, not location

```ruby
# Simple P2P: Publisher must know exact location
peer_node = discover_node_hosting("inventory-service")
peer_node.send_message(inventory_check)

# Mesh P2P: Publisher connects to any mesh node
mesh.publish(inventory_check, to: "inventory-service")
# Mesh handles: local_node → intermediate_nodes → destination_node
```

**Mesh Network = P2P + Routing Intelligence**

## Benefits

1. **No Single Point of Failure**: No central broker, no single routing node
2. **Self-Healing**: Network routes around failed nodes and discovers new paths
3. **Location Independence**: Services can move between nodes transparently
4. **Fault Tolerance**: Multiple routing paths provide redundancy
5. **Dynamic Discovery**: Services are found through routing, not pre-configuration
6. **Scalability**: Mesh grows organically, routing distributes automatically
7. **Privacy**: Onion routing and encryption possible
8. **Partition Tolerance**: Network segments can operate independently

## Challenges

1. **Network Partitions**: Mesh can split into islands
2. **Message Ordering**: No global ordering guarantees
3. **Security**: Need peer authentication and encryption
4. **Discovery Overhead**: Finding peers can be expensive
5. **NAT Traversal**: Peers behind firewalls need special handling
6. **Bridge Node Reliability**: Bridge failure isolates entire network segments
7. **UDP vs TCP Coordination**: Local UDP discovery vs remote TCP connections
8. **Bootstrap Node Dependencies**: Need known addresses to establish inter-network bridges

### Bridge Node Challenges

**Single Point of Failure:**
```
Network A ←→ [Single Bridge] ←→ Network B
                  ↓ FAILS
     Networks A and B become isolated
```

**Solution - Multiple Bridge Nodes:**
```
Network A ←→ [Bridge 1] ←→ Network B
    ↑    ←→ [Bridge 2] ←→    ↑
  Multiple redundant bridge connections
```

**NAT Traversal for Bridge Nodes:**
- Bridge nodes behind NAT need port forwarding or STUN/TURN
- Or use reverse connections where bridge initiates outbound connections
- WebRTC-style techniques for hole punching
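
The reverse-connection idea can be sketched in a few lines. This is a hypothetical illustration, not part of meshage: `RendezvousServer` and `NattedBridge` are invented names, and the example runs both ends on loopback so the NAT itself is only simulated. The key point is that the NAT-ed bridge dials *out* and holds the socket open, so the publicly reachable node can push messages back down the already-established connection.

```ruby
require "socket"

# Hypothetical sketch: a bridge node behind NAT cannot accept inbound
# connections, so it dials OUT to a publicly reachable rendezvous node
# and keeps that socket open. The rendezvous node then pushes messages
# for the bridge's network down the existing connection.
class RendezvousServer
  def initialize
    @server = TCPServer.new("127.0.0.1", 0)  # port 0 = pick a free port
    @bridges = Queue.new
    # Accept loop: each bridge that dials in is remembered for later pushes
    Thread.new { loop { @bridges << @server.accept } }
  end

  def port
    @server.addr[1]
  end

  # Push a message down a registered bridge connection
  # (Queue#pop blocks until a bridge has dialed in)
  def deliver_to_bridge(payload)
    @bridges.pop.puts(payload)
  end
end

class NattedBridge
  def initialize(rendezvous_port)
    # Outbound connection from behind NAT - almost always permitted
    @socket = TCPSocket.new("127.0.0.1", rendezvous_port)
  end

  # Read one message pushed by the rendezvous node
  def receive
    @socket.gets&.chomp
  end
end

server = RendezvousServer.new
bridge = NattedBridge.new(server.port)
server.deliver_to_bridge("hello-from-mesh")
received = bridge.receive
puts received  # → hello-from-mesh
```

A production bridge would authenticate the rendezvous node, reconnect on failure, and multiplex many logical streams over the one outbound socket.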

## Lessons from Journeta

The journeta codebase provides excellent patterns for P2P networking that directly apply to our meshage implementation:

### Discovery Architecture
Journeta uses UDP multicast for presence broadcasting - a simple but effective approach:

```ruby
# From journeta/presence_broadcaster.rb - simplified
class PresenceBroadcaster
  def broadcast_presence
    socket = UDPSocket.open
    note = PresenceMessage.new(uuid, peer_port, groups)
    socket.send(note.to_yaml, 0, multicast_address, port)
  end
end

# For SmartMessage meshage:
class MeshPresence < SmartMessage::Base
  property :node_id, required: true
  property :address, required: true
  property :port, required: true
  property :capabilities, type: Array
  property :message_types, type: Array  # What messages this node handles
  property :timestamp, type: Time
end
```

### Peer Registry with Automatic Cleanup
Journeta's PeerRegistry manages peer lifecycle with automatic reaping - crucial for mesh reliability:

```ruby
# Adapted from journeta/peer_registry.rb
class MeshPeerRegistry
  def initialize(mesh_node)
    @peers = {}
    @mutex = Mutex.new
    @reaper_tolerance = 10.0  # seconds
    start_reaper
  end

  def reap_stale_peers
    @mutex.synchronize do
      stale_peers = @peers.select do |id, peer|
        peer.last_seen < (Time.now - @reaper_tolerance)
      end

      stale_peers.each do |id, peer|
        @peers.delete(id)
        notify_peer_offline(peer)
      end
    end
  end
end
```

### Connection Management
Journeta uses queued message sending with separate threads per peer - a good pattern for mesh:

```ruby
# From journeta/peer_connection.rb concept
class MeshPeerConnection
  def initialize(peer_info)
    @peer = peer_info
    @message_queue = Queue.new
    @connection_thread = nil
  end

  def send_message(message)
    @message_queue.push(message)
    ensure_connection_thread_running
  end

  private

  def connection_worker
    while (message = @message_queue.pop)
      begin
        deliver_message(message)
      rescue => e
        handle_delivery_failure(message, e)
      end
    end
  end
end
```

### Group-Based Messaging
Journeta's group concept maps perfectly to SmartMessage's message types and routing:

```ruby
# Enhanced meshage with group/topic support
class MeshTransport < SmartMessage::Transport::Base
  def initialize(options = {})
    @groups = options[:groups] || []  # Which message types we handle
    @mesh_node = MeshNode.new(
      groups: @groups,
      capabilities: options[:capabilities] || []
    )
  end

  def subscribe(message_class, &block)
    # Register interest in this message type
    @groups << message_class.name
    @mesh_node.update_presence_info

    # Set up message handler
    @mesh_node.on_message(message_class) do |message|
      block.call(message) if block
    end
  end
end
```

### Threading Model
Journeta's use of dedicated threads for each component is solid for mesh networking:

```ruby
class MeshNode
  def start
    @presence_broadcaster.start  # Periodic UDP broadcast
    @presence_listener.start     # UDP listener for peer discovery
    @message_listener.start      # TCP listener for direct messages
    @peer_registry.start         # Peer lifecycle management
  end

  def stop
    [@presence_broadcaster, @presence_listener,
     @message_listener, @peer_registry].each(&:stop)
  end
end
```

### Key Improvements for Meshage

1. **Better Routing**: Journeta only does direct peer-to-peer. Meshage needs routing through intermediate nodes.

2. **Encryption**: Journeta sends YAML in plaintext. Meshage should encrypt all communications.

3. **NAT Traversal**: Journeta assumes LAN connectivity. Meshage needs hole punching for internet-scale mesh.

4. **Message Types**: Journeta sends arbitrary Ruby objects. Meshage should integrate with SmartMessage's typed message system.

## Architecture Synthesis

Combining journeta's proven patterns with SmartMessage's features:

```ruby
class SmartMeshTransport < SmartMessage::Transport::Base
  def initialize(options = {})
    @mesh_engine = JournetaEngine.new(
      peer_handler: SmartMessagePeerHandler.new(self),
      groups: extract_message_types_from_subscriptions
    )

    # Enhanced with routing, encryption, and SmartMessage integration
    @mesh_router = MeshRouter.new(@mesh_engine)
    @message_crypto = MessageCrypto.new(options[:keypair])
  end

  def publish(message, routing_key = nil)
    encrypted_message = @message_crypto.encrypt(message)

    if message.to
      @mesh_router.route_to_peer(message.to, encrypted_message)
    else
      @mesh_router.broadcast_to_subscribers(message.class, encrypted_message)
    end
  end
end
```
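
`MessageCrypto` is referenced above but not defined anywhere in this document; a minimal sketch might look like the following. It is an assumption, not the real meshage API: it uses a shared symmetric AES-256-GCM key for authenticated encryption, whereas a real implementation would negotiate that key per peer (for example by wrapping it with the node's RSA keypair) and would serialize the message first.

```ruby
require "openssl"

# Hypothetical sketch of the MessageCrypto collaborator referenced above.
# AES-256-GCM gives confidentiality plus tamper detection (auth tag).
class MessageCrypto
  def initialize(key = OpenSSL::Cipher.new("aes-256-gcm").random_key)
    @key = key
  end

  # Returns iv, auth tag and ciphertext together so the peer can decrypt
  def encrypt(plaintext)
    cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
    cipher.key = @key
    iv = cipher.random_iv
    ciphertext = cipher.update(plaintext) + cipher.final
    { iv: iv, tag: cipher.auth_tag, data: ciphertext }
  end

  def decrypt(payload)
    cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
    cipher.key = @key
    cipher.iv = payload[:iv]
    cipher.auth_tag = payload[:tag]  # must be set before #final
    cipher.update(payload[:data]) + cipher.final
  end
end

crypto = MessageCrypto.new
payload = crypto.encrypt('{"order_id":42}')
decrypted = crypto.decrypt(payload)
puts decrypted  # → {"order_id":42}
```

Decryption raises `OpenSSL::Cipher::CipherError` if the ciphertext or tag was tampered with, which is exactly the property a mesh hop-by-hop transport wants.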

## Key Insight: Local Knowledge with Network Discovery

The fundamental characteristic is **limited local knowledge with network-wide discovery**:

```ruby
# Publisher knows:
# - Local services on same node           ✓
# - Which peer nodes it can connect to    ✓
# - What subscribers are on remote nodes  ✗

OrderMessage.new(data: order_data).publish(to: "inventory-service")

# Each node in the route knows:
# - Its local subscribers       ✓
# - Its connected peer nodes    ✓
# - Subscribers on other nodes  ✗

# Network discovery works via:
# 1. Check local subscribers first
# 2. Forward to connected peers (they don't know either)
# 3. Each peer checks locally, forwards if not found
# 4. Eventually reaches node(s) with matching subscribers
# 5. Route gets cached for future messages
```

**This approach is scalable because:**
- No node needs global knowledge of all services
- No centralized service directory to maintain
- Discovery happens naturally through message routing
- Successful routes are cached to avoid repeated flooding

The mesh network acts as a **distributed discovery system** where each node only knows about its immediate neighborhood, but the collective network can find services anywhere through progressive forwarding.
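
The route-caching step can be made concrete with a small per-node cache. `RouteCache` is a hypothetical name for illustration; the key design point is that entries expire, so a service that migrates to another node is rediscovered by falling back to flooding:

```ruby
# Hypothetical sketch of per-node route caching: once a message for a
# service has been delivered via some peer, remember that peer so
# future messages skip the flood step. Entries expire so that services
# which move to another node get rediscovered by flooding.
class RouteCache
  def initialize(ttl_seconds: 300)
    @ttl = ttl_seconds
    @routes = {}  # service_name => { peer_id:, cached_at: }
  end

  # Called when a successful delivery is confirmed via peer_id
  def learn(service_name, peer_id)
    @routes[service_name] = { peer_id: peer_id, cached_at: Time.now }
  end

  # Returns the cached peer, or nil => caller must flood to all peers
  def route_for(service_name, now: Time.now)
    entry = @routes[service_name]
    return nil unless entry
    if now - entry[:cached_at] > @ttl
      @routes.delete(service_name)  # stale: force rediscovery
      return nil
    end
    entry[:peer_id]
  end
end

cache = RouteCache.new(ttl_seconds: 300)
cache.route_for("inventory-service")          # => nil (unknown: flood)
cache.learn("inventory-service", "node-c")
hit = cache.route_for("inventory-service")    # => "node-c"
puts hit
```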

## Message Deduplication for Multi-Node Subscribers

**Critical Challenge:** A subscriber connected to multiple nodes can receive the same message via different routing paths:

```
Publisher → Node A → Subscriber
        ↘ Node B ↗

Subscriber receives same message twice!
```

### Deduplication Architecture

```ruby
class MeshSubscriber
  def initialize(service_name)
    @service_name = service_name
    @message_cache = LRU.new(1000)  # Recent message UUIDs
    @connected_nodes = Set.new      # Multiple mesh nodes
  end

  # Connect to multiple mesh nodes for redundancy
  def connect_to_mesh_nodes(node_addresses)
    node_addresses.each do |address|
      mesh_connection = MeshConnection.new(address)
      mesh_connection.subscribe(@service_name) do |message|
        handle_message_with_deduplication(message)
      end
      @connected_nodes.add(mesh_connection)
    end
  end

  private

  def handle_message_with_deduplication(message)
    # Check if we've already processed this message
    return if @message_cache.include?(message._sm_header.uuid)

    # Mark as processed to prevent duplicates
    @message_cache[message._sm_header.uuid] = Time.now

    # Process the message only once
    process_message(message)
  end
end
```
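
`LRU.new(1000)` above assumes an LRU-cache library (such as the `lru_redux` gem); Ruby's standard library does not ship one. Since Ruby Hashes preserve insertion order, a minimal stand-in supporting the two operations the deduplication code uses is easy to sketch:

```ruby
# Minimal LRU cache sketch backed by Ruby's insertion-ordered Hash.
# Supports the interface used by the deduplication code above:
# #[]= to record an entry, #include? to test (and refresh) one.
class LRU
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  def []=(key, value)
    @store.delete(key)  # re-insert so this key becomes most recently used
    @store[key] = value
    @store.shift while @store.size > @max_size  # evict least recently used
  end

  def include?(key)
    return false unless @store.key?(key)
    @store[key] = @store.delete(key)  # refresh recency on lookup
    true
  end
end

cache = LRU.new(2)
cache["uuid-1"] = Time.now
cache["uuid-2"] = Time.now
cache["uuid-3"] = Time.now      # evicts uuid-1 (least recently used)
cache.include?("uuid-1")        # => false
cache.include?("uuid-3")        # => true
```

Bounding the cache is what makes mesh-wide deduplication practical: memory stays constant, at the cost that a duplicate arriving after eviction would be reprocessed, so the cache size should comfortably exceed the message rate times the maximum routing delay.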

### Multi-Path Routing Example

```ruby
class InventoryService
  def initialize
    # Connect to multiple nodes for fault tolerance
    @mesh_subscriber = MeshSubscriber.new("inventory-service")
    @mesh_subscriber.connect_to_mesh_nodes([
      "mesh-node-1:8080",
      "mesh-node-2:8080",
      "mesh-node-3:8080"
    ])
  end

  # This will only be called once per unique message
  # even though connected to multiple nodes
  def process_message(order_message)
    puts "Processing order #{order_message.order_id} - will only see this once!"
    update_inventory(order_message.items)
  end
end

# Message flow with deduplication:
# Publisher → Node A → InventoryService ✓ (processed)
#         ↘ Node B → InventoryService ✗ (deduplicated)
#         ↘ Node C → InventoryService ✗ (deduplicated)
```

### Node-Level Deduplication - Critical for Multi-Peer Nodes

**Challenge:** Nodes connected to multiple peers receive the same message via different routes:

```
Peer A → Node X ← Peer B
           ↓
Same message arrives twice!
```

**Node DDQ Implementation:**

```ruby
class MeshNode
  def initialize
    @processed_messages = LRU.new(2000)  # Track processed message UUIDs
    @connected_peers = {}                # Multiple peer connections
    @local_subscribers = {}              # Local service handlers
  end

  def receive_message_from_peer(message, from_peer_id)
    # CRITICAL: Check if we've already processed this message
    if @processed_messages.include?(message._sm_header.uuid)
      log_debug("Dropping duplicate message #{message._sm_header.uuid} from #{from_peer_id}")
      return  # Don't process duplicates!
    end

    # Mark as processed IMMEDIATELY to prevent re-processing
    @processed_messages[message._sm_header.uuid] = {
      first_received_from: from_peer_id,
      received_at: Time.now
    }

    # Now safe to process the message
    route_message_internally(message, from_peer_id)
  end

  private

  def route_message_internally(message, from_peer_id)
    # Deliver to local subscribers if we have them
    if has_local_subscribers?(message.to)
      deliver_to_local_subscribers(message)
      # Note: Don't return - message may need to continue routing
    end

    # Forward to other connected peers (excluding sender)
    forward_to_other_peers(message, exclude: from_peer_id)
  end

  def forward_to_other_peers(message, exclude:)
    # Decrement TTL to prevent infinite routing
    message._sm_header.ttl -= 1
    return if message._sm_header.ttl <= 0

    @connected_peers.each do |peer_id, connection|
      next if peer_id == exclude  # Don't send back to sender

      connection.send_message(message)
    end
  end
end
```

**Multi-Peer Node Scenario:**

```ruby
# Node connected to 4 peers for redundancy
class HighAvailabilityMeshNode < MeshNode
  def initialize
    super

    # Connect to multiple peers for fault tolerance
    connect_to_peers([
      "mesh-peer-1:8080",
      "mesh-peer-2:8080",
      "mesh-peer-3:8080",
      "mesh-peer-4:8080"
    ])
  end

  # Same message might arrive from multiple peers:
  # Peer 1 → This Node ✓ (processed)
  # Peer 2 → This Node ✗ (deduplicated)
  # Peer 3 → This Node ✗ (deduplicated)
  # Peer 4 → This Node ✗ (deduplicated)
end
```

**DDQ Prevents Multiple Issues:**

1. **Duplicate Local Delivery:**
```ruby
# Without DDQ:
# Peer A sends OrderMessage → Node processes → Delivers to local InventoryService
# Peer B sends same OrderMessage → Node processes → Delivers AGAIN to InventoryService!

# With DDQ:
# Peer A sends OrderMessage → Node processes → Delivers to local InventoryService ✓
# Peer B sends same OrderMessage → Node deduplicates → NO duplicate delivery ✓
```

2. **Duplicate Forwarding:**
```ruby
# Without DDQ:
# Peer A → Node X → forwards to Peers C,D,E
# Peer B → Node X → forwards AGAIN to Peers C,D,E (message storm!)

# With DDQ:
# Peer A → Node X → forwards to Peers C,D,E ✓
# Peer B → Node X → deduplicated, no forwarding ✓
```

3. **Routing Loops:**
```ruby
# Without DDQ, messages can loop forever:
# Node A → Node B → Node C → Node A → Node B...

# With DDQ, each node only processes each message once:
# Node A → Node B → Node C → (back to Node A but deduplicated)
```

**Enhanced Message Flow with Node DDQ:**

```
        Publisher
            ↓
Node 1 (receives original message)
          ↙  ↘
     Node 2    Node 3 (both forward to Node 4)
          ↘  ↙
Node 4 (receives same message from both Node 2 and Node 3)
     DDQ prevents duplicate processing! ✓
```
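
The diamond topology above can be simulated in a few lines. This hedged sketch stands in for the full `MeshNode` class: each `SimNode` keeps a plain Set of seen UUIDs and counts how many times it actually processes a message, showing that flooding through the diamond still processes the message exactly once per node:

```ruby
require "set"

# Sketch: simulate the diamond (Node 1 fans out to Nodes 2 and 3, both
# forward to Node 4). A per-node set of seen UUIDs means every node
# processes the flooded message exactly once despite the duplicate paths.
class SimNode
  attr_reader :processed_count

  def initialize(name)
    @name = name
    @seen = Set.new
    @peers = []
    @processed_count = 0
  end

  def connect(*nodes)
    @peers.concat(nodes)
  end

  def receive(uuid, from: nil)
    return unless @seen.add?(uuid)  # Set#add? returns nil if already seen
    @processed_count += 1
    # Flood to all peers except the one the message came from
    @peers.each { |peer| peer.receive(uuid, from: self) unless peer == from }
  end
end

n1, n2, n3, n4 = %w[node1 node2 node3 node4].map { |n| SimNode.new(n) }
n1.connect(n2, n3)
n2.connect(n1, n4)
n3.connect(n1, n4)
n4.connect(n2, n3)

n1.receive("msg-uuid-123")  # publisher injects at Node 1
counts = [n1, n2, n3, n4].map(&:processed_count)
puts counts.inspect  # → [1, 1, 1, 1]
```

Node 4 is reached twice (via Node 2 and via Node 3), but the second arrival fails the `@seen.add?` check and is dropped, which is exactly the DDQ behavior the diagram describes.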

### SmartMessage Header Enhancement

```ruby
class SmartMessageHeader
  property :uuid, required: true             # For deduplication
  property :ttl, type: Integer, default: 10  # Prevent infinite routing
  property :route_path, type: Array          # Track routing path
  property :came_from, type: String          # Prevent backtracking

  def add_to_route_path(node_id)
    @route_path ||= []
    @route_path << node_id
  end

  def visited_node?(node_id)
    @route_path&.include?(node_id)
  end
end
```
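
A plain-Ruby usage sketch (standing in for the `property` DSL above, which is not runnable on its own) shows how `route_path` tracking lets a forwarding node skip peers the message has already visited:

```ruby
# Plain-Ruby stand-in for the header above, illustrating how route_path
# tracking prevents a message from backtracking to nodes it has visited.
class Header
  def add_to_route_path(node_id)
    @route_path ||= []
    @route_path << node_id
  end

  def visited_node?(node_id)
    !!(@route_path&.include?(node_id))
  end
end

header = Header.new
%w[node-a node-b].each { |id| header.add_to_route_path(id) }

# When forwarding, a node skips any peer already on the path:
candidate_peers = %w[node-a node-c node-d]
next_hops = candidate_peers.reject { |peer| header.visited_node?(peer) }
puts next_hops.inspect  # → ["node-c", "node-d"]
```

Path tracking complements the UUID-based DDQ: the DDQ stops reprocessing after a duplicate arrives, while `route_path` avoids sending the duplicate in the first place, at the cost of growing the header by one entry per hop.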

### Deduplication Benefits in Mesh

1. **Subscriber Reliability**: Subscribers can connect to multiple nodes without receiving duplicates
2. **Node Reliability**: Nodes can connect to multiple peers without processing duplicates
3. **Fault Tolerance**: If connections fail, redundant paths still work without creating duplicates
4. **Load Distribution**: Messages can flow through different paths but are processed exactly once
5. **Network Efficiency**: Prevents message storms and routing loops
6. **Mesh Scalability**: Enables dense connectivity without duplicate processing overhead

### DDQ at Every Level

**Complete Deduplication Stack:**

```
# Level 1: Publisher (sends once but to multiple entry points)
Publisher → [Node A, Node B]  (same message to multiple nodes)

# Level 2: Entry Nodes (deduplicate between entry points)
Node A → downstream peers  (processes once)
Node B → downstream peers  (deduplicates, doesn't reprocess)

# Level 3: Intermediate Nodes (deduplicate multi-path routing)
Node C ← [Node A, Node B]  (receives from both, processes once)

# Level 4: Subscriber Nodes (deduplicate final delivery)
Subscriber Node ← [Path 1, Path 2]  (receives via multiple paths, processes once)

# Level 5: Subscribers (deduplicate multi-node connections)
Subscriber ← [Node X, Node Y]  (connected to multiple nodes, processes once)
```

**Every layer needs DDQ because every layer can receive duplicates!**

### Use Case: Resilient Payment Service

```ruby
class PaymentService
  def initialize
    # Connect to multiple mesh nodes across different data centers
    @mesh_subscriber = MeshSubscriber.new("payment-service")
    @mesh_subscriber.connect_to_mesh_nodes([
      "dc1-mesh-node:8080",   # Data center 1
      "dc2-mesh-node:8080",   # Data center 2
      "edge-mesh-node:8080"   # Edge location
    ])
  end

  def process_payment(payment_request)
    # Critical: This must only execute once per payment
    # even though we're connected to multiple mesh nodes
    charge_credit_card(payment_request.amount)
  end
end

# Scenario: Network partition heals
# - Payment request sent during partition reached DC1
# - When partition heals, might also route through DC2
# - Deduplication ensures payment only processed once
```
|
1474
|
+
|
1475
|
+
## Network Control Messages
|
1476
|
+
|
1477
|
+
Mesh networks need control messages for management and coordination - these have different routing patterns than application messages:

### Control Message Types

```ruby
module SmartMessage
  module MeshControl
    # Node presence announcement (local network broadcast)
    class PresenceAnnouncement < SmartMessage::Base
      property :node_id, required: true
      property :node_address, required: true
      property :tcp_port, type: Integer
      property :capabilities, type: Array, default: []
      property :local_services, type: Array, default: []
      property :is_bridge_node, type: TrueClass, default: false
      property :bridge_networks, type: Array, default: []
      property :mesh_version, type: String
      property :announced_at, type: Time, default: -> { Time.now }

      # Presence messages use UDP broadcast, not mesh routing
      transport SmartMessage::Transport::UDPBroadcast.new
    end

    # Graceful shutdown notification (mesh-routed)
    class NodeShutdown < SmartMessage::Base
      property :node_id, required: true
      property :reason, type: String, default: "graceful_shutdown"
      property :estimated_downtime, type: Integer # seconds
      property :replacement_nodes, type: Array, default: []
      property :shutdown_at, type: Time, default: -> { Time.now }

      # Shutdown messages route through mesh to all nodes
      transport SmartMessage::Transport::MeshTransport.new
    end

    # Health check / heartbeat (peer-to-peer)
    class HealthCheck < SmartMessage::Base
      property :node_id, required: true
      property :sequence_number, type: Integer
      property :timestamp, type: Time, default: -> { Time.now }
      property :load_average, type: Float
      property :active_connections, type: Integer
      property :message_queue_depth, type: Integer

      # Health checks go directly between connected peers
      transport SmartMessage::Transport::P2PTransport.new
    end

    # Route learning/sharing between nodes
    class RouteAdvertisement < SmartMessage::Base
      property :node_id, required: true
      property :known_services, type: Hash # service_name => [node_paths]
      property :route_costs, type: Hash   # service_name => hop_count
      property :last_seen, type: Hash     # service_name => timestamp

      # Route ads propagate through mesh with limited TTL
      transport SmartMessage::Transport::MeshTransport.new(ttl: 3)
    end
  end
end
```
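The `ttl: 3` option on `MeshTransport` above implies hop-count limiting. A hedged sketch of the forwarding decision a relay node might make (the envelope shape and `forward_with_ttl` name are assumptions, not the transport's actual API):

```ruby
# Illustrative TTL check for mesh-routed control messages such as
# RouteAdvertisement (ttl: 3 above). A relay forwards a message only
# while its remaining TTL is positive, decrementing at each hop.
def forward_with_ttl(envelope)
  ttl = envelope[:ttl].to_i
  return :dropped if ttl <= 0              # TTL exhausted: stop propagation

  forwarded = envelope.merge(ttl: ttl - 1) # decrement before relaying
  [:forwarded, forwarded]
end
```

With a starting TTL of 3, a route advertisement reaches peers at most three hops away, which bounds flooding while still letting nearby regions of the mesh learn each other's routes.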

### Control Message Routing Patterns

```ruby
class MeshControlHandler
  include SmartMessage::MeshControl # resolve the control message classes defined above

  def initialize(mesh_node)
    @mesh_node = mesh_node
    setup_control_message_subscriptions
  end

  private

  def setup_control_message_subscriptions
    # Handle presence announcements (UDP broadcast)
    PresenceAnnouncement.subscribe do |announcement|
      handle_peer_presence(announcement)
    end

    # Handle shutdown notifications (mesh-routed)
    NodeShutdown.subscribe do |shutdown|
      handle_peer_shutdown(shutdown)
    end

    # Handle health checks (direct P2P)
    HealthCheck.subscribe do |health|
      handle_peer_health(health)
    end

    # Handle route advertisements (mesh-routed, limited TTL)
    RouteAdvertisement.subscribe do |route_ad|
      handle_route_advertisement(route_ad)
    end
  end

  def handle_peer_presence(announcement)
    return if announcement.node_id == @mesh_node.id # ignore our own broadcasts

    # Update peer registry
    @mesh_node.register_or_update_peer(
      id: announcement.node_id,
      address: announcement.node_address,
      port: announcement.tcp_port,
      services: announcement.local_services,
      is_bridge: announcement.is_bridge_node,
      last_seen: announcement.announced_at
    )

    # Attempt connection if beneficial
    consider_connecting_to_peer(announcement)
  end

  def handle_peer_shutdown(shutdown)
    # Remove from routing tables
    @mesh_node.remove_peer(shutdown.node_id)

    # Update route cache to avoid the shutting-down node
    @mesh_node.invalidate_routes_through(shutdown.node_id)

    # If replacement nodes suggested, consider connecting
    shutdown.replacement_nodes.each do |replacement|
      consider_connecting_to_peer(replacement)
    end
  end

  def handle_peer_health(health)
    # Update peer health metrics
    @mesh_node.update_peer_health(
      health.node_id,
      load: health.load_average,
      connections: health.active_connections,
      queue_depth: health.message_queue_depth,
      last_heartbeat: health.timestamp
    )

    # Respond with our health if this is a health check request
    respond_to_health_check(health) if health.sequence_number > 0
  end

  def handle_route_advertisement(route_ad)
    # Update routing table with learned routes
    route_ad.known_services.each do |service_name, node_paths|
      cost = (route_ad.route_costs[service_name] || 0) + 1 # add our hop; guard missing entries
      last_seen = route_ad.last_seen[service_name]

      @mesh_node.learn_route(service_name, node_paths, cost, last_seen)
    end
  end
end
```
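The `learn_route` call above implies a routing table that prefers cheaper (fewer-hop) routes. A minimal sketch under that assumption (`RoutingTable`, `learn_route`, and `cost_for` are illustrative names, not SmartMessage API):

```ruby
# Illustrative routing-table learner: keep a route only when it is new
# or cheaper (fewer hops) than the route already known for a service.
class RoutingTable
  Route = Struct.new(:paths, :cost, :last_seen)

  def initialize
    @routes = {} # service_name => Route
  end

  def learn_route(service_name, node_paths, cost, last_seen)
    current = @routes[service_name]
    return if current && current.cost <= cost # existing route is as good or better

    @routes[service_name] = Route.new(node_paths, cost, last_seen)
  end

  def cost_for(service_name)
    @routes[service_name]&.cost
  end
end
```

A fuller implementation would also expire routes whose `last_seen` is stale, so that advertisements from failed nodes age out of the table.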

### Periodic Control Message Generation

```ruby
class MeshControlScheduler
  include SmartMessage::MeshControl # resolve the control message classes

  def initialize(mesh_node)
    @mesh_node = mesh_node
    @running = false
  end

  def start
    @running = true

    # Start presence broadcasting (local network)
    @presence_thread = Thread.new { presence_broadcast_loop }

    # Start health checking (peer connections)
    @health_thread = Thread.new { health_check_loop }

    # Start route sharing (mesh network)
    @route_thread = Thread.new { route_advertisement_loop }
  end

  def stop
    @running = false
    [@presence_thread, @health_thread, @route_thread].each { |t| t&.join }
  end

  private

  def presence_broadcast_loop
    while @running
      PresenceAnnouncement.new(
        node_id: @mesh_node.id,
        node_address: @mesh_node.external_address,
        tcp_port: @mesh_node.port,
        capabilities: @mesh_node.capabilities,
        local_services: @mesh_node.local_service_names,
        is_bridge_node: @mesh_node.bridge_node?,
        bridge_networks: @mesh_node.bridge_networks
      ).publish

      sleep 5 # Broadcast every 5 seconds
    end
  end

  def health_check_loop
    sequence = 0
    while @running
      # Send health check to each connected peer
      @mesh_node.connected_peers.each do |peer_id, connection|
        HealthCheck.new(
          node_id: @mesh_node.id,
          sequence_number: sequence,
          load_average: system_load_average,
          active_connections: @mesh_node.connection_count,
          message_queue_depth: @mesh_node.queue_depth
        ).publish(to: peer_id)
      end

      sleep 10 # Health check every 10 seconds
      sequence += 1
    end
  end

  def route_advertisement_loop
    while @running
      # Share known routes with mesh (limited propagation)
      RouteAdvertisement.new(
        node_id: @mesh_node.id,
        known_services: @mesh_node.routing_table.known_services,
        route_costs: @mesh_node.routing_table.route_costs,
        last_seen: @mesh_node.routing_table.last_seen_times
      ).publish # Mesh-routed with TTL=3

      sleep 30 # Route sharing every 30 seconds
    end
  end
end
```

### Graceful Shutdown Protocol

```ruby
class GracefulShutdown
  include SmartMessage::MeshControl # resolve NodeShutdown

  def initialize(mesh_node)
    @mesh_node = mesh_node
  end

  def initiate_shutdown(reason: "graceful_shutdown", drain_time: 10)
    # 1. Stop accepting new connections
    @mesh_node.stop_accepting_connections

    # 2. Announce shutdown to mesh network
    NodeShutdown.new(
      node_id: @mesh_node.id,
      reason: reason,
      estimated_downtime: drain_time,
      replacement_nodes: suggest_replacement_nodes
    ).publish

    # 3. Wait for message queues to drain
    wait_for_queue_drain(timeout: drain_time)

    # 4. Close peer connections gracefully
    @mesh_node.close_all_connections

    # 5. Stop control message generation
    @mesh_node.stop_control_scheduler
  end

  private

  def suggest_replacement_nodes
    # Suggest peer nodes that could handle our local services
    # (Array#intersect? requires Ruby 3.1+)
    @mesh_node.connected_peers.select do |peer_id, peer_info|
      peer_info.capabilities.intersect?(@mesh_node.local_services)
    end.keys
  end
end
```
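`wait_for_queue_drain` is referenced above but not shown. One plausible sketch polls the queue depth until it reaches zero or the timeout elapses; the callable parameter here stands in for `@mesh_node.queue_depth` and is an assumption for illustration:

```ruby
# Illustrative queue-drain wait: poll a depth-returning callable until
# it reports zero, or give up when the timeout elapses.
def wait_for_queue_drain(queue_depth, timeout:, interval: 0.05)
  deadline = Time.now + timeout
  until queue_depth.call.zero?
    return false if Time.now >= deadline # timed out with messages still pending

    sleep interval
  end
  true
end
```

Returning `false` on timeout lets the shutdown sequence decide whether to proceed anyway or extend the drain window before closing connections.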

### Control Message Benefits

1. **Network Awareness**: Nodes discover each other and their capabilities
2. **Health Monitoring**: Detect failed nodes and connection issues
3. **Route Learning**: Build efficient routing tables through shared knowledge
4. **Graceful Degradation**: Handle planned shutdowns and maintenance
5. **Load Balancing**: Route messages based on node health and capacity
6. **Bridge Discovery**: Find nodes that can route to other networks
## Implementation Summary

This comprehensive design for Meshage provides a complete architecture for true mesh networking in SmartMessage:

### Core Architecture Components
1. **P2P Foundation**: Uses p2p2-style NAT traversal and connection management as the networking foundation
2. **Multi-Hop Routing**: Messages route through intermediate nodes with local knowledge only
3. **Bridge Nodes**: Enable inter-network connectivity beyond UDP broadcast limitations
4. **Multi-Layer Deduplication**: Prevents message storms at subscriber, node, and network levels
5. **Network Control Messages**: Management protocols for presence, health, shutdown, and route discovery

### Key Design Principles Achieved
- **Complete Decentralization**: No central brokers or coordination points
- **Location-Agnostic Publishing**: Publishers don't need to know subscriber locations
- **Local Knowledge Model**: Nodes only know immediate connections, ensuring scalability
- **Progressive Discovery**: Services found through network-wide routing, not pre-configuration
- **Fault Tolerance**: Multiple routing paths and redundant connections
- **Self-Healing**: Network automatically routes around failed nodes

### Innovation Synthesis
- **P2P2 NAT Traversal**: Proven hole-punching techniques for internet-scale connectivity
- **Journeta Threading**: Robust concurrent connection management patterns
- **SmartMessage Integration**: Typed messages with validation and lifecycle management
- **Mesh Routing Intelligence**: Multi-hop discovery with route caching and TTL protection

This design transforms SmartMessage from a traditional message bus into a resilient, decentralized mesh networking platform suitable for IoT, edge computing, and distributed microservices architectures.

## Next Steps for Implementation

- **Phase 1**: Basic P2P connections with SmartMessage integration
- **Phase 2**: Local network mesh with UDP discovery and multi-hop routing
- **Phase 3**: Bridge nodes for inter-network connectivity with TCP tunneling
- **Phase 4**: Advanced features (DHT storage, gossip protocols, encryption)
- **Phase 5**: Production hardening (monitoring, metrics, debugging tools)