@diagrammo/dgmo 0.8.12 → 0.8.14
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/cli.cjs +115 -719
- package/dist/index.cjs +11 -5
- package/dist/index.cjs.map +1 -1
- package/dist/index.js +9 -6
- package/dist/index.js.map +1 -1
- package/docs/guide/chart-arc.md +71 -0
- package/docs/guide/chart-area.md +73 -0
- package/docs/guide/chart-bar-stacked.md +61 -0
- package/docs/guide/chart-bar.md +62 -0
- package/docs/guide/chart-boxes-and-lines.md +243 -0
- package/docs/guide/chart-c4.md +300 -0
- package/docs/guide/chart-chord.md +43 -0
- package/docs/guide/chart-class.md +204 -0
- package/docs/guide/chart-doughnut.md +38 -0
- package/docs/guide/chart-er.md +218 -0
- package/docs/guide/chart-flowchart.md +102 -0
- package/docs/guide/chart-function.md +56 -0
- package/docs/guide/chart-funnel.md +38 -0
- package/docs/guide/chart-gantt.md +368 -0
- package/docs/guide/chart-heatmap.md +41 -0
- package/docs/guide/chart-infra.md +694 -0
- package/docs/guide/chart-kanban.md +156 -0
- package/docs/guide/chart-line.md +79 -0
- package/docs/guide/chart-multi-line.md +84 -0
- package/docs/guide/chart-org.md +209 -0
- package/docs/guide/chart-pie.md +39 -0
- package/docs/guide/chart-polar-area.md +38 -0
- package/docs/guide/chart-quadrant.md +69 -0
- package/docs/guide/chart-radar.md +38 -0
- package/docs/guide/chart-sankey.md +103 -0
- package/docs/guide/chart-scatter.md +94 -0
- package/docs/guide/chart-sequence.md +332 -0
- package/docs/guide/chart-sitemap.md +248 -0
- package/docs/guide/chart-slope.md +56 -0
- package/docs/guide/chart-state.md +171 -0
- package/docs/guide/chart-timeline.md +229 -0
- package/docs/guide/chart-venn.md +81 -0
- package/docs/guide/chart-wordcloud.md +66 -0
- package/docs/guide/colors.md +283 -0
- package/docs/guide/index.md +55 -0
- package/docs/guide/keyboard-shortcuts.md +49 -0
- package/docs/guide/registry.json +51 -0
- package/gallery/fixtures/boxes-and-lines.dgmo +4 -6
- package/package.json +2 -2
- package/src/sharing.ts +3 -4
@@ -0,0 +1,694 @@
# Infrastructure Diagram

```dgmo
infra SaaS API Platform

tag Team alias t
  Platform(teal) default
  Backend(blue)
  Data(purple)

Edge
  rps 8000
  -> CDN

CDN | t: Platform
  description Edge cache — static assets and cacheable API responses
  cache-hit 75%
  -> WAF

WAF | t: Platform
  description Web application firewall
  firewall-block 8%
  ratelimit-rps 20000
  -> LB

LB | t: Platform
  -/api-> [API Cluster] | split: 60%
  -/search-> SearchAPI | split: 30%
  -/static-> StaticCDN | split: 10%

[API Cluster]
  instances 3

  APIServer | t: Backend
    description Core REST API — auth, billing, user data
    instances 2
    max-rps 1200
    latency-ms 40
    cb-error-threshold 50%
    -> PostgreSQL
    -> JobQueue

PostgreSQL | t: Data
  max-rps 6000
  latency-ms 8
  uptime 99.999%

JobQueue | t: Data
  buffer 250000
  drain-rate 1200
  retention-hours 48
  partitions 8
  -> JobWorker

[Job Workers]
  instances 1-6

  JobWorker | t: Backend
    max-rps 400
    latency-ms 180

SearchAPI | t: Backend
  concurrency 800
  duration-ms 120
  cold-start-ms 700
  -> SearchShards | fanout: 6

SearchShards | t: Data
  max-rps 30000
  latency-ms 3

StaticCDN | t: Platform
  cache-hit 98%
  latency-ms 4
```

## Overview

Infrastructure diagrams model system topology as a **directed traffic flow graph**. You declare components, wire them together, and set behavioral properties — DGMO computes downstream RPS, latency percentiles, availability, circuit breaker states, and queue metrics automatically.

Unlike static architecture diagrams, infra diagrams are **live simulations**. Change the entry RPS or switch a scenario and every downstream metric updates.

Components don't have explicit types — their **role is inferred from their properties**. A node with `cache-hit` is a cache. One with `buffer` is a queue. One with `concurrency` is serverless.

## Settings

| Key | Description | Default |
| --- | --- | --- |
| `chart` | Must be `infra` | — |
| `title` | Diagram title | None |
| `direction-tb` | Top-to-bottom layout (omit for left-to-right) | off (LR) |
| `default-latency-ms` | Latency for components without explicit `latency-ms` | `0` |
| `default-uptime` | Uptime % for components without explicit `uptime` | `100` |
| `animate` | Flow animation (boolean; `no-animate` to disable) | on |

## Entry Point (Edge)

Every diagram needs exactly one **edge entry point** — the source of all inbound traffic. Name a component `Edge` (or `Internet`) and give it an `rps` property:

```
Edge
  rps 100000
  -> FirstComponent
```

The `rps` property is **only valid on the edge node**. All downstream RPS values are computed from this single number.

## Components

A component is any named node — server, database, cache, queue, or service. Write the name on its own line, then indent properties below it:

```
APIServer
  description Handles REST API requests for the mobile app
  max-rps 500
  latency-ms 30
  uptime 99.95%
```

Names must start with a letter or underscore and can contain letters, numbers, and underscores.

## Connections

Connect components with arrow syntax:

```
-> Target                      // unlabeled
-/api-> Target                 // labeled
-/api-> Target | split: 60%    // with traffic split
-> [Group Name]                // to a group
```

Connections define the directed acyclic graph (DAG) that traffic flows through. **Cycles are not allowed** — DGMO will report an error.

## Traffic Splits

When a component has multiple outbound connections, traffic is distributed across them. Add `| split: N%` after the target to declare an explicit percentage.

### All splits declared

When every outbound edge has a split, they **must sum to 100%**. DGMO warns if they don't.

```
LB
  -/api-> API | split: 60%
  -/web-> Web | split: 30%
  -/static-> Static | split: 10%
```

If the LB receives 10,000 RPS after upstream behaviors: API gets 6,000, Web gets 3,000, Static gets 1,000.

### No splits declared

When no outbound edge has a split, traffic is distributed **evenly**:

```
LB
  -> API
  -> Web
  -> Static
```

Three targets → each gets 33.3% of the LB's output RPS.

### Some splits declared

When only some edges have splits, the declared percentages are used and undeclared targets **share the remainder equally**:

```
LB
  -/api-> API | split: 60%
  -/web-> Web | split: 20%
  -/health-> Health
  -/metrics-> Metrics
```

API gets 60%, Web gets 20%, and the remaining 20% is split evenly between Health (10%) and Metrics (10%).

If the declared splits exceed 100%, DGMO produces a warning.
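
As a rough sketch, the remainder-sharing rule can be expressed like this (`resolveSplits` is a hypothetical helper for illustration, not part of DGMO's API):

```typescript
// Resolve split percentages for a component's outbound edges.
// Declared splits are kept; undeclared edges share the remainder equally.
function resolveSplits(declared: (number | undefined)[]): number[] {
  const declaredSum = declared.reduce<number>((sum, d) => sum + (d ?? 0), 0);
  const undeclaredCount = declared.filter((d) => d === undefined).length;
  if (undeclaredCount === 0) return declared.map((d) => d ?? 0);
  // Undeclared edges share whatever percentage remains, never less than 0.
  const share = Math.max(0, 100 - declaredSum) / undeclaredCount;
  return declared.map((d) => d ?? share);
}

// The example above: 60% and 20% declared, Health and Metrics undeclared.
resolveSplits([60, 20, undefined, undefined]); // → [60, 20, 10, 10]
```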

### Splits happen after behaviors

Split percentages apply to the **post-behavior** RPS — after cache, firewall, and rate-limit reductions. If a component receives 10,000 RPS and has `cache-hit: 50%`, only 5,000 RPS reach the split point:

```
CDN
  cache-hit 50%
  -/api-> API | split: 70%        // 3,500 RPS
  -/static-> Static | split: 30%  // 1,500 RPS
```

### Single outbound connection

A component with one outbound edge always sends 100% of its (post-behavior) traffic — no split annotation needed:

```
CDN
  cache-hit 80%
  -> API
```

---

## Component Properties

Each property maps to a specific behavior in the traffic simulation. The sections below are grouped by capability.

### Description — `description`

Add a short prose annotation to any non-edge component. The description is **display metadata only** — it has no effect on traffic simulation.

```
AuthService
  description Handles JWT issuance and session validation
  max-rps 2000
  latency-ms 15
```

The description appears as a muted subtitle below the component name **only when the component is selected** (clicked). This keeps the diagram clean at rest while surfacing context on demand.

- Single line — no wrapping
- Space-separated: `description Handles auth: JWT, sessions`
- Silently ignored on the `Edge`/`Internet` entry-point node

### Cache — `cache-hit`

A cache layer absorbs traffic before it reaches downstream. The percentage is the fraction served from cache.

```
CDN
  cache-hit 80%
  -> AppServer
```

**Effect on downstream RPS:**

```
downstream_rps = inbound_rps × (1 − cache_hit / 100)
```

100K RPS with `cache-hit: 80%` → only 20K forwarded downstream.

### Firewall — `firewall-block`

Drops a percentage of inbound traffic (malicious requests, bots, blocked IPs):

```
WAF
  firewall-block 5%
  -> Gateway
```

**Effect:** `downstream_rps = inbound_rps × (1 − firewall_block / 100)`

Cache and firewall compose multiplicatively. Traffic through `cache-hit: 80%` then `firewall-block: 5%` keeps only `20% × 95% = 19%`.

### Rate Limiting — `ratelimit-rps`

Caps throughput at a fixed threshold. Excess traffic is rejected:

```
Gateway
  ratelimit-rps 10000
  -> API
```

**Effect:** `downstream_rps = min(effective_inbound_rps, ratelimit_rps)`

The effective inbound RPS is calculated after cache and firewall reductions. Rate limiting also reduces [availability](#how-availability-is-computed) — rejected traffic counts against it.

### Capacity — `max-rps` and `instances`

Define a component's throughput capacity. `max-rps` is per-instance, `instances` multiplies it:

```
API
  instances 3
  max-rps 400
  latency-ms 30
```

**Total capacity:** `max_rps × instances = 400 × 3 = 1,200 RPS`

When computed RPS exceeds capacity, the component is **overloaded** — shown with red indicators.

If `instances` is omitted it defaults to 1. If `max-rps` is omitted the component has unlimited capacity.

### Dynamic Scaling — `instances: min-max`

A range like `instances: 1-8` makes DGMO compute the needed instance count:

```
API
  instances 1-8
  max-rps 300
  latency-ms 25
```

**Formula:**

```
needed = ceil(computed_rps / max_rps)
actual = clamp(needed, min, max)
```

If the API receives 2,000 RPS: `needed = ceil(2000/300) = 7`, `actual = clamp(7, 1, 8) = 7`. But at 5,000 RPS: `needed = 17`, `actual = 8` (maxed out, overloaded).
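
The scaling formula is a one-liner in code. A minimal sketch (hypothetical `scaleInstances` helper, illustrating the math rather than DGMO's internals):

```typescript
// Dynamic scaling: instances needed for a given load, clamped to [min, max].
function scaleInstances(rps: number, maxRps: number, min: number, max: number): number {
  const needed = Math.ceil(rps / maxRps); // instances required to absorb the load
  return Math.min(Math.max(needed, min), max); // clamp(needed, min, max)
}

scaleInstances(2000, 300, 1, 8); // → 7
scaleInstances(5000, 300, 1, 8); // → 8 (maxed out)
```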

### Latency — `latency-ms`

Per-component response time in milliseconds. Latency accumulates along the path:

```
CDN
  latency-ms 5
  -> API

API
  latency-ms 40
  -> DB

DB
  latency-ms 8
```

A request traversing CDN → API → DB has cumulative latency of `5 + 40 + 8 = 53ms`.

If omitted, a component contributes 0ms (or `default-latency-ms` if set).

### Uptime — `uptime`

Component reliability as a percentage. Uptime propagates as the **product** along paths:

```
API
  uptime 99.95%
  -> DB

DB
  uptime 99.99%
```

End-to-end: `99.95% × 99.99% ≈ 99.94%`

Defaults to 100% (or `default-uptime` if set globally).
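
The product rule, as a sketch (hypothetical `pathUptime` helper, not DGMO's API):

```typescript
// Path uptime: product of per-component uptime percentages.
function pathUptime(uptimePercents: number[]): number {
  // Multiply fractional uptimes, then convert back to a percentage.
  return uptimePercents.reduce((acc, u) => acc * (u / 100), 1) * 100;
}

pathUptime([99.95, 99.99]); // ≈ 99.94
```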

### Circuit Breakers — `cb-error-threshold` and `cb-latency-threshold-ms`

Circuit breakers trip when failure conditions are met. DGMO models **closed** (normal) and **open** (tripped) states.

```
API
  max-rps 300
  instances 2
  cb-error-threshold 50%
```

**Error-rate trigger:**

```
capacity = max_rps × instances
error_rate = (computed_rps − capacity) / computed_rps × 100
if error_rate ≥ cb_error_threshold → OPEN
```

The error rate comes from overload — excess RPS beyond capacity is treated as errors.

**Latency trigger:**

```
if cumulative_latency_ms > cb_latency_threshold_ms → OPEN
```

Both triggers can be combined. The breaker opens if **either** fires.

### Serverless — `concurrency`, `duration-ms`, `cold-start-ms`

Serverless functions use a different capacity model based on concurrency and execution time:

```
ProcessOrder
  concurrency 1000
  duration-ms 200
  cold-start-ms 800
```

**Capacity:** `concurrency / (duration_ms / 1000) = 1000 / 0.2 = 5,000 RPS`

**Cold starts:** DGMO splits traffic into two paths for percentile computation:

- 95% warm → latency = `duration-ms`
- 5% cold → latency = `duration-ms + cold-start-ms`

This means cold starts primarily affect p99 latency: a function with `duration-ms: 200` and `cold-start-ms: 800` shows p50 ≈ 200ms but p99 ≈ 1,000ms.

> **Mutual exclusion:** `concurrency` cannot be combined with `instances` or `max-rps`. A component is either serverless or traditional.
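
The serverless capacity and warm/cold split can be sketched as follows. The helper names are illustrative; the 95/5 warm/cold ratio is the one stated above:

```typescript
// Serverless capacity: concurrent slots divided by per-invocation duration.
function serverlessCapacity(concurrency: number, durationMs: number): number {
  return concurrency / (durationMs / 1000);
}

// Warm (95% of invocations) vs. cold (5%) latency, per the model above.
function coldAwareLatencies(durationMs: number, coldStartMs: number) {
  return {
    warm: durationMs,               // 95% of invocations
    cold: durationMs + coldStartMs, // 5% of invocations
  };
}

serverlessCapacity(1000, 200); // → 5000 RPS
coldAwareLatencies(200, 800);  // → { warm: 200, cold: 1000 }
```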

### Queues — `buffer`, `drain-rate`, `retention-hours`, `partitions`

Queues decouple producers from consumers. They absorb traffic bursts and **reset the latency boundary**.

```
OrderQueue
  buffer 50000
  drain-rate 1000
  retention-hours 72
  -> Worker
```

| Property | What it does |
| --- | --- |
| `buffer` | Max queue depth (messages). Determines overflow risk |
| `drain-rate` | Messages consumed per second. Caps downstream RPS |
| `retention-hours` | Message retention duration (informational) |
| `partitions` | Number of partitions (informational) |

**How queues transform traffic:**

1. **RPS capping** — downstream receives at most `drain-rate` RPS, regardless of inbound volume
2. **Overflow** — when `inbound_rps > drain_rate`, the buffer fills:
   ```
   fill_rate = inbound_rps − drain_rate
   time_to_overflow = buffer / fill_rate   (seconds)
   ```
3. **Latency reset** — queues break the cumulative latency chain. Downstream latency starts from queue wait time:
   ```
   wait_time_ms = (fill_rate / drain_rate) × 1000
   ```
4. **Availability decoupling** — producer and consumer sides have independent availability

> **Mutual exclusion:** `buffer` cannot be combined with `max-rps`. A component is either a queue or a service.

---

## Groups

Groups represent clusters, pods, or replica sets. Wrap components in `[Group Name]`:

```
[API Cluster]
  instances 3
  APIServer
    max-rps 500
    latency-ms 45
    -> DB
```

### Group properties

| Property | Description |
| --- | --- |
| `instances` | Multiplier on child capacity. Can be a range. |
| `collapsed` | Visual hint — start collapsed in the app |

The group's `instances` acts as a **capacity multiplier** on its children. If `APIServer` has `max-rps: 500` and the group has `instances: 3`, total capacity is `500 × 3 = 1,500 RPS`.

### Connecting to groups

Traffic sent to a group is distributed to its children:

```
LB
  -/api-> [API Cluster] | split: 60%
```

### Bottleneck capacity

When a group contains multiple components in a chain, the group's effective capacity is the **bottleneck** — the minimum capacity among its children:

```
[Backend Pod]
  instances 3
  API             // max-rps 500 per instance
    max-rps 500
    -> Cache
  Cache           // max-rps 2000 per instance
    max-rps 2000
```

Effective capacity: `500 × 3 = 1,500` (bottlenecked on API, not Cache).
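
The bottleneck rule reduces to a minimum over child capacities, scaled by the group multiplier. A sketch (hypothetical `groupCapacity` helper):

```typescript
// Group capacity: the weakest child's per-instance capacity times group instances.
function groupCapacity(childMaxRps: number[], groupInstances: number): number {
  return Math.min(...childMaxRps) * groupInstances;
}

// The Backend Pod above: children at 500 and 2000 RPS, 3 group instances.
groupCapacity([500, 2000], 3); // → 1500
```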

### Queue scaling in groups

For queues inside groups, `drain-rate` scales with group instances (more consumers = faster draining), but `buffer` does **not** scale (fixed capacity per queue).

---

## Tags

Tags add metadata dimensions — team ownership, environment, region. They appear as colored badges in the legend.

```
tag Team alias t
  Backend(blue)
  Platform(teal) default
  Data(violet)

CDN | t: Platform
API | t: Backend
DB | t: Data
```

- `tag Name alias x` — declare a tag group with optional shorthand
- `Value(color)` — a tag value with its color
- `default` — auto-applies to components without this tag
- `| alias: Value` — assign inline on a component

---

## How Calculations Work

DGMO runs a full traffic simulation. This section explains the math so you can predict what the diagram will show.

### How RPS Propagates

Traffic flows from the edge via breadth-first traversal. At each component, behaviors transform the RPS in order:

1. **Cache:** `rps = rps × (1 − cache_hit / 100)`
2. **Firewall:** `rps = rps × (1 − firewall_block / 100)`
3. **Rate limiter:** `rps = min(rps, ratelimit_rps)`
4. **Queue:** `rps = min(rps, drain_rate × group_instances)`

Then the post-behavior RPS is split across outbound edges by their split percentages. If a node receives traffic from multiple sources, values are summed.

**Example:**

```
Edge (100K) → CDN (cache 80%) → 20K → WAF (block 5%) → 19K → LB
LB splits: /api 60% → 11,400   /static 40% → 7,600
```
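
The four-step behavior pipeline can be sketched as a single function. This is a minimal model of the rules above, not DGMO's actual code, and all names are illustrative:

```typescript
// Per-component traffic behaviors, applied in the order listed above.
interface Behaviors {
  cacheHit?: number;      // percent served from cache
  firewallBlock?: number; // percent dropped
  ratelimitRps?: number;  // max RPS forwarded
  drainRate?: number;     // queue drain, already scaled by group instances
}

function postBehaviorRps(inbound: number, b: Behaviors): number {
  let rps = inbound;
  if (b.cacheHit !== undefined) rps *= 1 - b.cacheHit / 100;
  if (b.firewallBlock !== undefined) rps *= 1 - b.firewallBlock / 100;
  if (b.ratelimitRps !== undefined) rps = Math.min(rps, b.ratelimitRps);
  if (b.drainRate !== undefined) rps = Math.min(rps, b.drainRate);
  return rps;
}

// The example above: Edge 100K → CDN (cache 80%) → WAF (block 5%)
const afterCdn = postBehaviorRps(100_000, { cacheHit: 80 });    // ≈ 20,000
const afterWaf = postBehaviorRps(afterCdn, { firewallBlock: 5 }); // ≈ 19,000
```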

### How Latency Is Computed

Latency accumulates along the path from edge to each leaf:

- Each component adds its `latency-ms`
- Multiple incoming paths → DGMO takes the **maximum** (worst case)
- Queues **reset** the chain — downstream starts from queue wait time

**Percentiles (p50 / p90 / p99):** DGMO collects all edge-to-leaf paths, weights them by traffic volume, sorts by latency, and interpolates at the 50th/90th/99th weight thresholds.

For serverless with cold starts, each path splits into a 95% warm sub-path and a 5% cold sub-path. Cold starts therefore affect p99 more than p50.

### How Availability Is Computed

Availability has two layers, which are then combined:

**1. Uptime (path-based):** Product of all `uptime` values along the path from the edge:

```
path_uptime = product(each component's uptime / 100)
```

Multiple paths → take the minimum (most conservative).

**2. Local availability (load-dependent):** Each component's availability based on current load:

- **Under capacity:** 1.0 (100%)
- **Overloaded:** `capacity / inbound_rps` (degrades linearly)
- **Rate-limited:** `ratelimit_rps / effective_inbound_rps`
- **Queue filling (overflow < 60s):** `drain_rate / inbound_rps`

**3. Compound:** Product of all local availabilities along the path.

Queues decouple availability — the consumer side doesn't inherit producer overload.
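
The load-dependent cases can be sketched as below. This is an illustrative model of the rules above; the order in which the cases are checked is an assumption, not documented behavior:

```typescript
// Local availability of one component, given its inbound load.
function localAvailability(
  inboundRps: number,
  opts: { capacity?: number; ratelimitRps?: number; drainRate?: number },
): number {
  if (opts.drainRate !== undefined && inboundRps > opts.drainRate)
    return opts.drainRate / inboundRps;     // queue filling
  if (opts.ratelimitRps !== undefined && inboundRps > opts.ratelimitRps)
    return opts.ratelimitRps / inboundRps;  // rate-limited
  if (opts.capacity !== undefined && inboundRps > opts.capacity)
    return opts.capacity / inboundRps;      // overloaded
  return 1;                                 // under capacity
}

localAvailability(2000, { capacity: 1500 }); // → 0.75
localAvailability(1000, { capacity: 1500 }); // → 1
```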

### How Circuit Breakers Trip

| Condition | Check | Trips when |
| --- | --- | --- |
| Error rate | `(rps − capacity) / rps × 100` | `≥ cb-error-threshold` |
| Latency | cumulative latency at this node | `> cb-latency-threshold-ms` |

Either condition triggers the breaker to open.
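
Both triggers together, as a sketch of the model (illustrative names, not DGMO's API):

```typescript
// Circuit-breaker state from the two triggers in the table above.
function breakerOpen(
  rps: number,
  capacity: number,
  cumulativeLatencyMs: number,
  thresholds: { errorPct?: number; latencyMs?: number },
): boolean {
  // Overload beyond capacity is treated as the error rate.
  const errorRate = rps > capacity ? ((rps - capacity) / rps) * 100 : 0;
  const errorTrip =
    thresholds.errorPct !== undefined && errorRate >= thresholds.errorPct;
  const latencyTrip =
    thresholds.latencyMs !== undefined && cumulativeLatencyMs > thresholds.latencyMs;
  return errorTrip || latencyTrip; // either condition opens the breaker
}

breakerOpen(1500, 600, 0, { errorPct: 50 }); // → true (60% error rate)
```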

### How Queue Metrics Are Derived

| Metric | Formula | Meaning |
| --- | --- | --- |
| Fill rate | `max(0, inbound_rps − drain_rate)` | Queue growth speed (msg/s) |
| Time to overflow | `buffer / fill_rate` | Seconds until full |
| Wait time | `(fill_rate / drain_rate) × 1000` | Per-message wait (ms) |

If `fill_rate = 0` (drain keeps up), time to overflow is infinite and wait time is 0.
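
The three formulas, including the `fill_rate = 0` edge case, as a sketch (hypothetical `queueMetrics` helper):

```typescript
// Queue metrics from the table above.
function queueMetrics(inboundRps: number, drainRate: number, buffer: number) {
  const fillRate = Math.max(0, inboundRps - drainRate);
  return {
    fillRate, // msg/s the queue grows by
    timeToOverflowSec: fillRate === 0 ? Infinity : buffer / fillRate,
    waitTimeMs: fillRate === 0 ? 0 : (fillRate / drainRate) * 1000,
  };
}

queueMetrics(1500, 1000, 50000);
// → { fillRate: 500, timeToOverflowSec: 100, waitTimeMs: 500 }
```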

---

## Validation

DGMO validates your diagram and reports diagnostics:

| Check | What it catches |
| --- | --- |
| Cycle detection | Circular connections (must be a DAG) |
| Split sum | Split percentages not adding to 100% |
| Orphan detection | Components not reachable from the edge |
| Overload | RPS exceeding component capacity |
| Rate-limit excess | Inbound RPS exceeding the rate limiter |
| System uptime | Overall uptime below 99% |
| Property conflicts | Mixing incompatible properties (e.g., `concurrency` + `instances`) |

---

## Property Quick Reference

| Property | Type | Valid on | Effect |
| --- | --- | --- | --- |
| `description` | text | Non-edge | Prose annotation shown on click (display only, no simulation effect) |
| `rps` | number | Edge only | Total inbound requests per second |
| `cache-hit` | percentage | Any | Fraction served from cache, not forwarded |
| `firewall-block` | percentage | Any | Fraction dropped (blocked) |
| `ratelimit-rps` | number | Any | Max RPS forwarded; excess rejected |
| `max-rps` | number | Non-queue | Per-instance max throughput |
| `instances` | number / range | Non-serverless | Replica count (e.g., `3` or `1-8`) |
| `latency-ms` | number | Any | Response time (ms) |
| `uptime` | percentage | Any | Component reliability |
| `cb-error-threshold` | percentage | Any | CB trips at this error rate |
| `cb-latency-threshold-ms` | number | Any | CB trips when cumulative latency exceeds it |
| `concurrency` | number | Serverless | Max concurrent executions |
| `duration-ms` | number | Serverless | Execution time per invocation |
| `cold-start-ms` | number | Serverless | Extra latency on cold starts |
| `buffer` | number | Queue | Max queue depth (messages) |
| `drain-rate` | number | Queue | Messages consumed per second |
| `retention-hours` | number | Queue | Message retention (informational) |
| `partitions` | number | Queue | Partition count (informational) |

## Comments

```
// This line is ignored by the parser
```

## Complete Example

```dgmo
infra E-Commerce Platform

tag Team alias t
  Backend(blue)
  Platform(teal) default
  Data(violet)

Edge
  rps 100000
  -> CloudFront

CloudFront | t: Platform
  cache-hit 80%
  -> WAF

WAF | t: Platform
  firewall-block 5%
  -> ALB

ALB | t: Platform
  -/api-> [API Pods] | split: 60%
  -/purchase-> [Commerce Pods] | split: 30%
  -/static-> StaticServer | split: 10%

[API Pods]
  instances 3

  APIServer | t: Backend
    description Core REST API — auth, orders, user data
    max-rps 500
    latency-ms 45
    cb-error-threshold 50%
    -> OrderDB

[Commerce Pods]
  PurchaseMS | t: Backend
    description Checkout and payment processing
    instances 1-8
    max-rps 300
    latency-ms 120
    -> OrderQueue

OrderDB | t: Data
  description Primary Postgres — orders and inventory
  latency-ms 8
  uptime 99.99%

OrderQueue
  buffer 50000
  drain-rate 1000
  retention-hours 72
  -> Worker

Worker | t: Backend
  instances 3
  max-rps 400
  latency-ms 100

StaticServer | t: Platform
  latency-ms 5
```