@teckedd-code2save/b2dp 1.0.1 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +43 -119
- package/dist/index.js +29 -10
- package/dist/index.js.map +1 -1
- package/package.json +2 -1
- package/skills/api-test-generator/SKILL.md +72 -0
- package/skills/business-to-data-platform/SKILL.md +206 -0
- package/skills/cloud-solution-architect/SKILL.md +317 -0
- package/skills/cloud-solution-architect/references/acceptance-criteria.md +436 -0
- package/skills/cloud-solution-architect/references/architecture-styles.md +365 -0
- package/skills/cloud-solution-architect/references/best-practices.md +311 -0
- package/skills/cloud-solution-architect/references/design-patterns.md +873 -0
- package/skills/cloud-solution-architect/references/design-principles.md +328 -0
- package/skills/cloud-solution-architect/references/mission-critical.md +285 -0
- package/skills/cloud-solution-architect/references/performance-antipatterns.md +242 -0
- package/skills/cloud-solution-architect/references/technology-choices.md +159 -0
- package/skills/context7-mcp/SKILL.md +53 -0
- package/skills/frontend-data-consumer/SKILL.md +75 -0
- package/skills/frontend-design-review/SKILL.md +138 -0
- package/skills/frontend-design-review/references/pattern-examples.md +21 -0
- package/skills/frontend-design-review/references/quick-checklist.md +38 -0
- package/skills/frontend-design-review/references/review-output-format.md +68 -0
- package/skills/frontend-design-review/references/review-type-modifiers.md +31 -0
- package/skills/infrastructure-as-code-architect/SKILL.md +56 -0
package/skills/cloud-solution-architect/references/architecture-styles.md
@@ -0,0 +1,365 @@

# Azure Architecture Styles Reference

## Comparison Table

| Style | Dependency Management | Domain Type |
|---|---|---|
| N-tier | Horizontal tiers divided by subnet | Traditional business, low update frequency |
| Web-Queue-Worker | Front/back-end decoupled by async messaging | Simple domain, resource-intensive tasks |
| Microservices | Vertically decomposed services via APIs | Complex domain, frequent updates |
| Event-driven | Producer/consumer, independent views | IoT, real-time systems |
| Big data | Divide into small chunks, parallel processing | Batch/real-time data analysis, ML |
| Big compute | Data allocation to thousands of cores | Compute-intensive (simulation) |

---

## 1. N-tier

Traditional architecture that divides an application into logical layers and physical tiers. Each layer has a specific responsibility and communicates only with the layer directly below it.

### Logical Diagram

```
┌──────────────────────────────────┐
│        Presentation Tier         │ ← Web / UI
│            (Subnet A)            │
├──────────────────────────────────┤
│       Business Logic Tier        │ ← Rules / Workflows
│            (Subnet B)            │
├──────────────────────────────────┤
│         Data Access Tier         │ ← Database / Storage
│            (Subnet C)            │
└──────────────────────────────────┘
```

### Benefits

- Familiar pattern for most development teams
- Natural mapping for migrating existing layered applications to Azure
- Clear separation of concerns between tiers

### Challenges

- Horizontal layering makes cross-cutting changes difficult — a single feature may touch every tier
- Limits agility and release velocity as tiers are tightly coupled vertically

### Best Practices

- Use VNet subnets to isolate tiers and control traffic flow with NSGs
- Keep each tier stateless where possible to enable horizontal scaling
- Use managed services (App Service, Azure SQL) to reduce operational overhead

### Dependency Management

Horizontal tiers divided by subnet. Each tier depends only on the tier directly below it, enforced through network segmentation.

### Recommended Azure Services

- Azure App Service
- Azure SQL Database
- Azure Virtual Machines
- Azure Virtual Network (subnets)

---

## 2. Web-Queue-Worker

A web front end handles HTTP requests while a worker process performs resource-intensive or long-running tasks. The two components communicate through an asynchronous message queue.

### Logical Diagram

```
                ┌───────────┐
HTTP ──────────►│    Web    │
Requests        │ Front End │
                └─────┬─────┘
                      │
                      ▼
               ┌──────────────┐
               │   Message    │
               │    Queue     │
               └──────┬───────┘
                      │
                      ▼
                ┌───────────┐
                │  Worker   │
                │  Process  │
                └─────┬─────┘
                      │
                      ▼
                ┌───────────┐
                │ Database  │
                └───────────┘
```

### Benefits

- Easy to understand and deploy, especially with managed compute services
- Clean separation between interactive and background workloads
- Each component can scale independently

### Challenges

- Without careful design, the front end and worker can become monolithic components that are hard to maintain and update
- Hidden dependencies may emerge if front end and worker share data schemas or storage

### Best Practices

- Keep the web front end thin — delegate heavy processing to the worker
- Use durable message queues to ensure work is not lost on failure
- Design idempotent worker operations to handle message retries safely
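
The idempotency guidance above can be sketched as follows. This is a minimal illustration, not an Azure SDK sample: an in-memory set stands in for a durable deduplication store, and the message shape (`id`, `amount`) is hypothetical.

```python
import uuid

class IdempotentWorker:
    """Applies each queue message at most once, even when the
    queue redelivers a message after a retry or timeout."""

    def __init__(self):
        self.processed_ids = set()  # in production: a durable store, not memory
        self.balance = 0

    def handle(self, message: dict) -> bool:
        """Returns True if applied, False if skipped as a duplicate."""
        if message["id"] in self.processed_ids:
            return False  # duplicate delivery: safe to acknowledge and drop
        self.balance += message["amount"]
        self.processed_ids.add(message["id"])
        return True

worker = IdempotentWorker()
msg = {"id": str(uuid.uuid4()), "amount": 50}
worker.handle(msg)  # first delivery: applied
worker.handle(msg)  # redelivery after a visibility timeout: skipped
```

Because the worker records message IDs before acknowledging, a queue that guarantees at-least-once delivery still yields exactly-once effects.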

### Dependency Management

Front-end and back-end jobs are decoupled by asynchronous messaging. The web tier never calls the worker directly; all communication flows through the queue.

### Recommended Azure Services

- Azure App Service
- Azure Functions
- Azure Queue Storage
- Azure Service Bus

---

## 3. Microservices

A collection of small, autonomous services where each service implements a single business capability. Each service owns its bounded context and data, and communicates with other services via well-defined APIs.

### Logical Diagram

```
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Service  │  │ Service  │  │ Service  │
│    A     │  │    B     │  │    C     │
│ ┌──────┐ │  │ ┌──────┐ │  │ ┌──────┐ │
│ │ Data │ │  │ │ Data │ │  │ │ Data │ │
│ └──────┘ │  │ └──────┘ │  │ └──────┘ │
└────┬─────┘  └────┬─────┘  └────┬─────┘
     │             │             │
     └─────────────┼─────────────┘
                   ▼
            ┌──────────────┐
            │ API Gateway  │
            └──────┬───────┘
                   │
                Clients
```

### Benefits

- Autonomous teams can develop, deploy, and scale services independently
- Enables frequent updates and higher release velocity
- Technology diversity — each service can use the stack best suited to its task

### Challenges

- Service discovery and inter-service communication add complexity
- Data consistency across services requires patterns like Saga or eventual consistency
- Distributed system management (monitoring, debugging, tracing) is inherently harder

### Best Practices

- Define clear bounded contexts — avoid sharing databases between services
- Use an API gateway for cross-cutting concerns (auth, rate limiting, routing)
- Implement health checks, circuit breakers, and distributed tracing from day one
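
The circuit-breaker bullet above can be sketched as a minimal state machine. This is an illustrative sketch, not a production library: thresholds are arbitrary, and in real services you would typically reach for an existing resilience library rather than hand-rolling one.

```python
import time

class CircuitBreaker:
    """Trips open after `max_failures` consecutive failures, then
    fails fast until `reset_after` seconds pass (half-open trial)."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result

breaker = CircuitBreaker(max_failures=2, reset_after=60.0)

def flaky():
    raise ConnectionError("downstream service unavailable")

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass  # counted as a consecutive failure

# the circuit is now open: further calls fail fast without
# touching the struggling downstream service
```

Failing fast gives the downstream service time to recover instead of hammering it with doomed requests.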

### Dependency Management

Vertically decomposed services calling each other via APIs. Each service is independently deployable with its own data store, minimizing coupling.

### Recommended Azure Services

- Azure Kubernetes Service (AKS)
- Azure Container Apps
- Azure API Management
- Azure Service Bus
- Azure Cosmos DB

---

## 4. Event-driven

A publish-subscribe architecture where event producers emit events and event consumers react to them. Producers and consumers are fully decoupled, communicating only through event channels or brokers.

### Logical Diagram

```
┌──────────┐     ┌──────────────────┐     ┌──────────┐
│ Producer │────►│                  │────►│ Consumer │
│    A     │     │   Event Broker   │     │    A     │
└──────────┘     │    / Channel     │     └──────────┘
                 │                  │
┌──────────┐     │  ┌────────────┐  │     ┌──────────┐
│ Producer │────►│  │  Pub/Sub   │  │────►│ Consumer │
│    B     │     │  │ or Stream  │  │     │    B     │
└──────────┘     │  └────────────┘  │     └──────────┘
                 └──────────────────┘
```

**Two models:** Pub/Sub (events delivered to subscribers) and Event Streaming (events written to an ordered log for consumers to read).

**Consumer variations:** Simple event processing, basic correlation, complex event processing, event stream processing.

### Benefits

- Producers and consumers are fully decoupled — they can evolve independently
- Highly scalable — add consumers without affecting producers
- Responsive and well-suited to real-time processing pipelines

### Challenges

- Guaranteed delivery requires careful broker configuration and dead-letter handling
- Event ordering can be difficult to maintain across partitions
- Eventual consistency — consumers may see stale data temporarily
- Error handling and poison message management add operational complexity

### Best Practices

- Design events as immutable facts with clear schemas
- Use dead-letter queues for events that fail processing
- Implement idempotent consumers to handle duplicate delivery safely
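
The dead-letter bullet above can be sketched as follows. Plain Python lists stand in for the main queue and the dead-letter queue, and `MAX_ATTEMPTS` is an illustrative retry budget; a real broker (Service Bus, Event Grid) provides these mechanics natively.

```python
MAX_ATTEMPTS = 3

def run_consumer(queue, dead_letter_queue, handler):
    """Drains `queue`, retrying each event up to MAX_ATTEMPTS times and
    parking events that keep failing in `dead_letter_queue` for inspection."""
    while queue:
        event = queue.pop(0)
        attempts = event.setdefault("attempts", 0)
        try:
            handler(event["payload"])
        except Exception as exc:
            event["attempts"] = attempts + 1
            if event["attempts"] >= MAX_ATTEMPTS:
                event["error"] = repr(exc)
                dead_letter_queue.append(event)  # poison message: park it
            else:
                queue.append(event)  # redeliver for another attempt

processed = []

def handler(payload):
    if payload == "malformed":
        raise ValueError("cannot parse event")
    processed.append(payload)

main_q = [{"payload": "ok-1"}, {"payload": "malformed"}, {"payload": "ok-2"}]
dlq = []
run_consumer(main_q, dlq, handler)
# healthy events are processed; the malformed one lands in the DLQ
```

Parking the poison message keeps one bad event from blocking the rest of the stream while preserving it for later diagnosis.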

### Dependency Management

Producer/consumer model with independent views per subsystem. Producers have no knowledge of consumers; each subsystem maintains its own projection of the event stream.

### Recommended Azure Services

- Azure Event Grid
- Azure Event Hubs
- Azure Functions
- Azure Service Bus
- Azure Stream Analytics

---

## 5. Big Data

Architecture designed to handle ingestion, processing, and analysis of data that is too large or complex for traditional database systems.

### Logical Diagram

```
┌─────────────┐     ┌──────────────────────────────────┐
│ Data Sources│────►│           Data Storage           │
│ (logs, IoT, │     │           (Data Lake)            │
│   files)    │     └──┬──────────────┬────────────────┘
└─────────────┘        │              │
                       ▼              ▼
               ┌──────────────┐ ┌──────────────┐
               │    Batch     │ │  Real-time   │
               │  Processing  │ │  Processing  │
               └──────┬───────┘ └──────┬───────┘
                      │                │
                      ▼                ▼
               ┌───────────────────────────────┐
               │     Analytical Data Store     │
               └──────────────┬────────────────┘
                              │
               ┌──────────────▼────────────────┐
               │     Analysis & Reporting      │
               │    (Dashboards, ML Models)    │
               └───────────────────────────────┘

Orchestration manages the full pipeline
```

**Components:** Data sources → Data storage (data lake) → Batch processing → Real-time processing → Analytical data store → Analysis and reporting → Orchestration.

### Benefits

- Process massive datasets that exceed traditional database capacity
- Support both batch and real-time analytics in a single architecture
- Enable predictive analytics and machine learning at scale

### Challenges

- Complexity of coordinating batch and real-time processing paths
- Data quality and governance across a data lake require disciplined schema management
- Cost management — large-scale storage and compute can grow unpredictably

### Best Practices

- Use parallelism for both batch and real-time processing
- Partition data to enable parallel reads and writes
- Apply schema-on-read semantics to keep ingestion flexible
- Process data in batches on arrival rather than waiting for scheduled windows
- Balance usage costs against time-to-insight requirements
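
The partition-for-parallelism guidance above can be sketched in miniature. This is a toy illustration of the split/process/merge shape, with a hypothetical `count_events` task over fabricated log records standing in for a real batch job on a data lake.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(records, size):
    """Split a large dataset into fixed-size chunks for parallel processing."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def count_events(batch):
    """Per-chunk task: runs independently of every other chunk."""
    return sum(1 for r in batch if r["level"] == "error")

# fabricated sample data: every 10th record is an error
records = [{"level": "error" if i % 10 == 0 else "info"} for i in range(1000)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(count_events, chunk(records, 250)))

total_errors = sum(partials)  # the merge step combines per-chunk results
```

Because each chunk is processed independently, the same shape scales from four threads here to thousands of cores under an orchestrator.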

### Dependency Management

Divide huge datasets into small chunks for parallel processing. Each chunk can be processed independently, with an orchestration layer coordinating the overall pipeline.

### Recommended Azure Services

- Microsoft Fabric
- Azure Data Lake Storage
- Azure Event Hubs
- Azure SQL Database
- Azure Cosmos DB
- Power BI

---

## 6. Big Compute

Architecture for large-scale workloads that require hundreds or thousands of cores running in parallel. Tasks can be independent (embarrassingly parallel) or tightly coupled requiring inter-node communication.

### Logical Diagram

```
┌─────────────────────────────────────────────┐
│                Job Scheduler                │
│          (submit, monitor, manage)          │
└─────────────────┬───────────────────────────┘
                  │
    ┌─────────────┼────────────┐
    ▼             ▼            ▼
┌────────┐    ┌────────┐   ┌────────────────┐
│  Core  │    │  Core  │   │      Core      │
│ Pool 1 │    │ Pool 2 │   │     Pool N     │
│ (100s) │    │ (100s) │   │(1000s of cores)│
└───┬────┘    └───┬────┘   └───┬────────────┘
    │             │            │
    └─────────────┼────────────┘
                  ▼
           ┌──────────────┐
           │   Results    │
           │   Storage    │
           └──────────────┘
```

**Use cases:** Simulations, financial risk modeling, oil exploration, drug design, image rendering.

### Benefits

- High performance through massive parallel processing
- Access to specialized hardware (GPU, FPGA, InfiniBand) for compute-intensive workloads
- Scales to thousands of cores for embarrassingly parallel problems

### Challenges

- Managing VM infrastructure at scale (provisioning, patching, decommissioning)
- Provisioning thousands of cores in a timely manner to meet job deadlines
- Cost control — idle compute resources are expensive

### Best Practices

- Use low-priority or spot VMs to reduce cost for fault-tolerant workloads
- Auto-scale compute pools based on job queue depth
- Partition work into independent tasks when possible to maximize parallelism

### Dependency Management

Data allocation to thousands of cores. The job scheduler distributes work units across the compute pool, with each core processing its assigned data partition independently.

### Recommended Azure Services

- Azure Batch
- Microsoft HPC Pack
- H-series Virtual Machines (HPC-optimized)

---

> Source: [Azure Architecture Center](https://learn.microsoft.com/en-us/azure/architecture/)

package/skills/cloud-solution-architect/references/best-practices.md
@@ -0,0 +1,311 @@

# Cloud Application Best Practices

Twelve best practices from the Azure Architecture Center for designing, building, and operating cloud applications.

---

## 1. API Design

Design RESTful web APIs that promote platform independence and loose coupling between clients and services.

### Key Recommendations

- Organize APIs around resources using nouns, not verbs, in URIs
- Use standard HTTP methods (GET, POST, PUT, PATCH, DELETE) with correct semantics
- Use plural nouns for collection endpoints (e.g., `/orders`, `/customers`)
- Support HATEOAS to enable client navigation of the API without prior knowledge
- Design coarse-grained operations to avoid chatty request patterns
- Do not expose internal database structure through the API surface
- Version APIs to manage breaking changes without disrupting existing clients
- Return appropriate HTTP status codes and consistent error response bodies
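
Several of these recommendations can be sketched in one small handler. The `/orders` route, the `value`/`nextLink` response shape, and the sample data are all illustrative assumptions, not a prescribed contract.

```python
# fabricated sample collection: 25 orders
ORDERS = [{"id": i, "total": i * 10} for i in range(1, 26)]

def list_orders(limit=10, offset=0):
    """GET /orders?limit=10&offset=0 : a paginated collection resource.
    Plural noun, no verb in the path; paging keeps each response bounded."""
    page = ORDERS[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(ORDERS) else None
    body = {
        "value": page,
        # a HATEOAS-style link tells the client how to fetch the next page
        "nextLink": (f"/orders?limit={limit}&offset={next_offset}"
                     if next_offset is not None else None),
    }
    return 200, body  # status code + response body

status, body = list_orders(limit=10, offset=20)
# last page: 5 items remain, so nextLink is None
```

Contrast this with a verb-style `/getOrders` endpoint returning the whole table: the resource-oriented, paginated shape stays stable as the collection grows.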

### WAF Pillar Alignment

Performance Efficiency · Operational Excellence

### Common Mistakes

- Using verbs in URIs (e.g., `/getOrders`) instead of resource-based paths
- Exposing database schema directly through API contracts
- Creating chatty APIs that require multiple round-trips for a single logical operation

---

## 2. API Implementation

Implement web APIs to be efficient, responsive, scalable, and available for consuming clients.

### Key Recommendations

- Make actions idempotent so retries are safe (especially PUT and DELETE)
- Support content negotiation via `Accept` and `Content-Type` headers
- Follow the HTTP specification for status codes, methods, and headers
- Handle exceptions gracefully and return meaningful error responses
- Support resource discovery through links and metadata
- Limit and paginate large result sets to minimize network traffic
- Handle large requests asynchronously using `202 Accepted` with status polling
- Compress responses where appropriate to reduce payload size
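
The `202 Accepted` pattern above can be sketched as three handlers. The route names, the `JOBS` dict standing in for durable job storage, and the report payload are all hypothetical.

```python
import uuid

JOBS = {}  # job id -> status record; stands in for durable job storage

def submit_report_request(params):
    """POST /reports : long-running work is accepted, not executed inline.
    Returns 202 plus a status URL the client can poll."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "running", "result": None}
    return 202, {"statusUrl": f"/reports/status/{job_id}"}, job_id

def get_status(job_id):
    """GET /reports/status/{id} : poll until the job completes."""
    job = JOBS[job_id]
    if job["status"] == "running":
        return 200, {"status": "running"}
    return 303, {"location": f"/reports/{job_id}"}  # done: point at the result

def complete(job_id, result):
    """Called by the background worker when processing finishes."""
    JOBS[job_id] = {"status": "done", "result": result}

code, body, job_id = submit_report_request({"month": "2024-01"})
polling_code, _ = get_status(job_id)  # 200 while the job is still running
complete(job_id, {"rows": 42})        # the background worker finishes
done_code, _ = get_status(job_id)     # 303 redirects to the finished report
```

The client never holds a connection open for the duration of the job, which avoids the timeout failures named under Common Mistakes below.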

### WAF Pillar Alignment

Operational Excellence

### Common Mistakes

- Not handling large requests asynchronously, causing timeouts
- Not minimizing network traffic through pagination, filtering, or compression

---

## 3. Autoscaling

Dynamically allocate and deallocate resources to match performance requirements while optimizing cost.

### Key Recommendations

- Use Azure Monitor autoscale and built-in platform autoscaling features
- Scale based on metrics that directly correlate with load (CPU, queue length, request rate)
- Combine schedule-based and metric-based scaling for predictable traffic patterns
- Set appropriate minimum, maximum, and default instance counts
- Configure scale-in rules as carefully as scale-out rules
- Use cooldown periods to prevent oscillation (flapping)
- Plan for the delay between triggering a scale event and resources becoming available
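
The interplay of bounds and cooldown above can be sketched as a single decision function. The thresholds and the 300-second cooldown are illustrative defaults, not Azure Monitor's; a real autoscale rule is configured declaratively rather than coded.

```python
def decide_scale(cpu_percent, instances, last_scale_ts, now,
                 low=30.0, high=70.0, min_n=2, max_n=10, cooldown=300.0):
    """Metric-based scale decision with min/max bounds and a cooldown
    period so the pool does not oscillate (flap) between sizes."""
    if now - last_scale_ts < cooldown:
        return instances  # still cooling down from the previous scale event
    if cpu_percent > high and instances < max_n:
        return instances + 1  # scale out
    if cpu_percent < low and instances > min_n:
        return instances - 1  # scale in, bounded below by the minimum
    return instances

n1 = decide_scale(90.0, 4, last_scale_ts=0.0, now=400.0)    # hot, scales out
n2 = decide_scale(90.0, 4, last_scale_ts=300.0, now=400.0)  # cooling down
n3 = decide_scale(10.0, 2, last_scale_ts=0.0, now=400.0)    # already at minimum
```

Note how the cooldown suppresses a scale-out even under high CPU, and the minimum bound stops scale-in from draining the pool.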

### WAF Pillar Alignment

Performance Efficiency · Cost Optimization

### Common Mistakes

- Not setting appropriate minimum and maximum limits for scaling
- Not considering scale-in behavior, leading to premature resource removal
- Using metrics that do not accurately reflect application load

---

## 4. Background Jobs

Implement batch processing, long-running tasks, and workflows as background jobs decoupled from the user interface.

### Key Recommendations

- Use Azure platform services such as Functions, WebJobs, and Batch for hosting
- Trigger background jobs with events, schedules, or message queues
- Return results to calling tasks through queues, events, or shared storage
- Design jobs to be independently deployable, scalable, and versioned
- Handle partial failures and support safe restarts with checkpointing
- Monitor job health with logging, metrics, and alerting
- Implement graceful shutdown to allow in-progress work to complete
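
The checkpointing bullet above can be sketched as follows. The in-memory `checkpoint` dict stands in for durable progress storage, and the one-time failure is simulated; the point is that a restart resumes mid-batch instead of starting over.

```python
def run_batch(items, process, checkpoint):
    """Processes `items`, recording progress in `checkpoint` so a restart
    resumes after the last completed item instead of starting over."""
    start = checkpoint.get("done", 0)
    for i in range(start, len(items)):
        process(items[i])
        checkpoint["done"] = i + 1  # persist after each item in production

handled = []
checkpoint = {}
failed_once = {"flag": False}

def flaky_process(item):
    if item == "c" and not failed_once["flag"]:
        failed_once["flag"] = True
        raise RuntimeError("transient failure mid-job")
    handled.append(item)

items = ["a", "b", "c", "d"]
try:
    run_batch(items, flaky_process, checkpoint)  # crashes while handling "c"
except RuntimeError:
    pass
run_batch(items, flaky_process, checkpoint)      # resumes at "c", not "a"
# every item is handled exactly once across the two runs
```

Combined with idempotent processing, the checkpoint also makes it safe if the crash lands between processing an item and persisting progress.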

### WAF Pillar Alignment

Operational Excellence

### Common Mistakes

- Not handling failures or partial completion within long-running jobs
- Not monitoring background job health, missing silent failures

---

## 5. Caching

Copy frequently read, rarely modified data to fast storage close to the application to improve performance.

### Key Recommendations

- Cache data that is read often but changes infrequently
- Use Azure Cache for Redis for distributed, high-throughput caching
- Set appropriate expiration policies (TTL) to balance freshness and hit rates
- Handle cache misses gracefully with a cache-aside pattern
- Address concurrency issues when multiple processes update the same cached data
- Implement cache invalidation strategies aligned with data change patterns
- Pre-populate caches for known hot data during application startup
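
The cache-aside bullet above can be sketched as follows. A local dict stands in for Azure Cache for Redis, and `load_fn` stands in for the slow backing store; TTL handling is simplified to a single expiry timestamp per key.

```python
import time

class CacheAside:
    """Cache-aside: on a miss, load from the backing store, cache with a TTL,
    and serve subsequent reads from the cache until the entry expires."""

    def __init__(self, load_fn, ttl=60.0):
        self.load_fn = load_fn  # e.g. a database query
        self.ttl = ttl
        self.store = {}         # key -> (value, expires_at)
        self.misses = 0

    def get(self, key):
        entry = self.store.get(key)
        now = time.monotonic()
        if entry and entry[1] > now:
            return entry[0]            # cache hit: no trip to the slow store
        self.misses += 1
        value = self.load_fn(key)      # cache miss: load from the source
        self.store[key] = (value, now + self.ttl)
        return value

db = {"user:1": {"name": "Ada"}}  # stand-in for the real data store
cache = CacheAside(load_fn=db.__getitem__, ttl=60.0)
cache.get("user:1")  # miss: loads from db and caches
cache.get("user:1")  # hit: served from the cache
```

A production version would also guard the miss path (e.g. with a lock or request coalescing) against the cache-stampede problem listed under Common Mistakes below.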

### WAF Pillar Alignment

Performance Efficiency

### Common Mistakes

- Caching highly volatile data that expires before it can be served
- Not handling cache invalidation, serving stale data to users
- Cache stampede — many concurrent requests regenerating the same expired entry

---

## 6. CDN (Content Delivery Network)

Use CDNs to deliver static and dynamic web content efficiently to users from edge locations worldwide.

### Key Recommendations

- Offload static assets (images, scripts, stylesheets) to the CDN to reduce origin load
- Configure appropriate cache-control headers for each content type
- Version static content via file names or query strings for reliable cache busting
- Use HTTPS and enforce TLS for secure delivery
- Plan for CDN fallback so the application degrades gracefully if the CDN is unavailable
- Handle deployment and versioning so users receive updated content promptly
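
The file-name versioning bullet above can be sketched with a content hash. The eight-character digest length and the naming scheme are arbitrary choices for illustration; build tools typically do this automatically.

```python
import hashlib

def versioned_name(filename: str, content: bytes) -> str:
    """Embed a content hash in the file name so each deployment of changed
    content yields a new URL, bypassing the CDN's long-lived cache for the
    old URL instead of waiting for it to expire."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, _, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}"

v1 = versioned_name("app.css", b"body { color: black; }")
v2 = versioned_name("app.css", b"body { color: navy; }")
# changed content -> different hash -> new cache-busting URL
```

With immutable, hash-named assets the CDN can safely serve very long `Cache-Control: max-age` values, since stale delivery becomes impossible by construction.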

### WAF Pillar Alignment

Performance Efficiency

### Common Mistakes

- Not versioning content, causing users to receive stale cached assets after deployments
- Setting improper cache headers, resulting in under-caching or over-caching

---

## 7. Data Partitioning

Divide data stores into partitions to improve scalability, availability, and performance while reducing contention and storage costs.

### Key Recommendations

- Choose partition keys that distribute data and load evenly across partitions
- Use horizontal (sharding), vertical, or functional partitioning based on access patterns
- Minimize cross-partition queries to avoid performance degradation
- Design partitions to match the most common query patterns
- Plan for rebalancing as data volume and access patterns evolve
- Consider partition limits and throughput caps of the target data store
- Reduce contention and storage costs by separating hot and cold data
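
The even-distribution recommendation can be sketched with a stable hash of the partition key. The shard count and customer-ID key are illustrative; real stores (e.g. Cosmos DB) apply an equivalent hash internally to the key you choose.

```python
import hashlib

NUM_PARTITIONS = 8

def partition_for(key: str) -> int:
    """Map a partition key to a shard with a stable hash. A high-cardinality,
    evenly distributed key keeps any single shard from becoming a hotspot."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# customer IDs spread across partitions instead of piling onto one shard
counts = [0] * NUM_PARTITIONS
for i in range(10_000):
    counts[partition_for(f"customer-{i}")] += 1
# each partition ends up holding roughly 10_000 / 8 = 1250 keys
```

Contrast with partitioning by a skewed key such as `country`: a few giant partitions would absorb most of the load, which is exactly the hotspot mistake this section warns about.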

### WAF Pillar Alignment

Performance Efficiency · Cost Optimization

### Common Mistakes

- Creating hotspots by selecting a partition key with skewed distribution
- Not considering the cost and latency of cross-partition queries

---

## 8. Data Partitioning Strategies (by Service)

Apply service-specific partitioning strategies across Azure SQL Database, Azure Table Storage, Azure Blob Storage, and other data services.

### Key Recommendations

- Shard Azure SQL Database to distribute data for horizontal scaling
- Design Azure Table Storage partition keys around query access patterns
- Organize Azure Blob Storage using virtual directories and naming conventions
- Align partition boundaries with the most frequent query predicates
- Reduce latency by co-locating related data within the same partition
- Monitor partition metrics and rebalance when skew is detected

### WAF Pillar Alignment

Performance Efficiency · Cost Optimization

### Common Mistakes

- Not aligning the partition strategy with actual query patterns, causing full scans
- Ignoring service-specific partition limits and throttling thresholds

---

## 9. Host Name Preservation

Preserve the original HTTP host name between reverse proxies and backend web applications to avoid issues with cookies, redirects, and CORS.

### Key Recommendations

- Forward the original `Host` header from the reverse proxy to the backend
- Configure Azure Front Door, Application Gateway, and API Management for host preservation
- Ensure cookies are set with the correct domain matching the original host name
- Verify redirect URLs reference the external host name, not the internal backend address
- Test CORS configurations end-to-end with the preserved host name
- Document host name flow across all network hops in the architecture

### WAF Pillar Alignment

Reliability

### Common Mistakes

- Not preserving host headers, causing redirect loops or incorrect absolute URLs
- Breaking session cookies because the cookie domain does not match the forwarded host

---

## 10. Message Encoding

Choose the right payload structure, encoding format, and serialization library for asynchronous messages exchanged between distributed components.

### Key Recommendations

- Evaluate JSON, Avro, Protobuf, and MessagePack based on performance and interoperability needs
- Use schema registries to enforce and version message contracts
- Validate incoming messages against their schemas before processing
- Prefer compact binary formats (Protobuf, Avro) for high-throughput, latency-sensitive paths
- Use JSON for human-readable messages and broad ecosystem compatibility
- Consider backward and forward compatibility when evolving message schemas
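
The validate-before-processing recommendation can be sketched with a toy type-checking contract. The `ORDER_SCHEMA` mapping is purely illustrative; a real pipeline would use a schema registry with Avro, Protobuf, or JSON Schema instead of hand-written checks.

```python
ORDER_SCHEMA = {"order_id": str, "quantity": int}  # illustrative contract

def validate(message: dict, schema: dict) -> list:
    """Check a decoded message against its contract before processing,
    returning a list of violations (empty means the message is valid)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif not isinstance(message[field], expected_type):
            errors.append(f"bad type for {field}")
    return errors

good = {"order_id": "A-17", "quantity": 3}
bad = {"order_id": "A-18", "quantity": "three"}

ok_errors = validate(good, ORDER_SCHEMA)   # no violations
bad_errors = validate(bad, ORDER_SCHEMA)   # quantity has the wrong type
```

Rejecting (or dead-lettering) the malformed message at the boundary keeps bad data out of the downstream pipeline, the failure mode listed under Common Mistakes below.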

### WAF Pillar Alignment

Security

### Common Mistakes

- Using an inefficient encoding format for high-volume message streams
- Not validating message schemas, allowing malformed data into the processing pipeline

---

## 11. Monitoring and Diagnostics

Track system health, usage, and performance through a comprehensive monitoring pipeline that turns raw data into alerts, reports, and automated triggers.

### Key Recommendations

- Instrument applications with structured logging, metrics, and distributed tracing
- Use Azure Monitor, Application Insights, and Log Analytics as the monitoring backbone
- Define actionable alerts with clear thresholds, severity levels, and response procedures
- Detect and correct issues before they affect users by monitoring leading indicators
- Correlate telemetry across services using distributed trace context (e.g., correlation IDs)
- Establish performance baselines and track deviations over time
- Build dashboards for operational visibility across all tiers of the architecture
- Review and tune alert rules regularly to reduce noise
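
The correlation-ID recommendation can be sketched as structured JSON log lines sharing one ID. The field names are illustrative; Application Insights SDKs propagate an equivalent operation ID for you.

```python
import json
import uuid

def log(event: str, correlation_id: str, **fields) -> str:
    """Emit one structured log line; a shared correlation_id lets the log
    backend stitch together entries from every service a request touched."""
    record = {"event": event, "correlation_id": correlation_id, **fields}
    line = json.dumps(record)
    print(line)
    return line

# the id is minted at the edge and passed through every downstream call
cid = str(uuid.uuid4())
log("request.received", cid, path="/orders")
entry = log("db.query", cid, table="orders", duration_ms=12)
# both lines carry the same correlation_id, so a query for that id
# reconstructs the request's full path through the system
```

Structured fields (rather than free-form strings) are what make the "query by correlation ID" workflow possible in Log Analytics or any log aggregator.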

### WAF Pillar Alignment

Operational Excellence

### Common Mistakes

- Insufficient logging, making incident root-cause analysis slow or impossible
- Not using distributed tracing in microservice or multi-tier architectures
- Alert fatigue from poorly tuned thresholds that generate excessive false positives

---

## 12. Transient Fault Handling

Detect and handle transient faults caused by momentary loss of network connectivity, temporary service unavailability, or resource throttling.

### Key Recommendations

- Implement retry logic with exponential backoff and jitter for transient failures
- Use a circuit breaker pattern to stop retrying when failures are persistent
- Distinguish transient faults (e.g., HTTP 429, 503) from permanent errors (e.g., 400, 404)
- Leverage built-in retry policies in Azure SDKs before adding custom retry logic
- Avoid duplicating retry layers across middleware and application code
- Log every retry attempt for post-incident analysis
- Set a maximum retry count and total timeout to bound retry duration
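
Several of these recommendations combine into one small retry helper. This is a sketch, not a substitute for the Azure SDKs' built-in policies: the status codes treated as transient, the attempt budget, and the injectable `sleep` are illustrative choices.

```python
import random

TRANSIENT_STATUSES = {429, 503}  # retryable; e.g. 400/404 are permanent

def call_with_retry(op, max_attempts=4, base_delay=0.5, sleep=lambda s: None):
    """Retry transient failures with exponential backoff plus jitter;
    permanent errors and exhausted budgets are surfaced immediately.
    `sleep` is injectable so the sketch runs instantly under test."""
    delays = []  # recorded for logging/post-incident analysis
    for attempt in range(max_attempts):
        status, body = op()
        if status not in TRANSIENT_STATUSES:
            return status, body, delays  # success or a permanent error
        if attempt == max_attempts - 1:
            break  # budget exhausted: give up and surface the fault
        # exponential backoff (0.5s, 1s, 2s, ...) with multiplicative jitter
        delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
        delays.append(delay)
        sleep(delay)  # real code: time.sleep(delay)
    return status, body, delays

# simulated downstream responses: two transient faults, then success
responses = iter([(503, None), (429, None), (200, "ok")])
status, body, delays = call_with_retry(lambda: next(responses))
# succeeds on the third attempt after two backed-off retries
```

The jitter term spreads out retries from many clients so a recovering service is not hit by a synchronized retry storm, the failure mode listed under Common Mistakes below.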

### WAF Pillar Alignment

Reliability

### Common Mistakes

- Retrying non-transient faults (e.g., authentication failures, bad requests)
- Not using exponential backoff, overwhelming a recovering service with constant retries
- Retry storms caused by multiple layers retrying simultaneously without coordination

---

> Source: [Azure Architecture Center](https://learn.microsoft.com/en-us/azure/architecture/)