@teckedd-code2save/b2dp 1.0.1 → 1.1.0

@@ -0,0 +1,206 @@
---
name: business-to-data-platform
description: >
  Converts any business specification into a fully provisioned, production-grade data platform.
  Trigger whenever a user describes a business system, app, or backend need, even casually.
  Covers: schema design, database provisioning, migrations, ORM setup, Redis/Elasticsearch
  integration, UI data contracts, project scaffolding, and repository code following FAANG-grade
  directory structures and industry design patterns. Use for phrases like "build a system for",
  "I need a backend", "design a database", "create schema", "bootstrap an app", "add a feature",
  "update the data model", or any business domain description (e-commerce, SaaS, fintech, etc.).
  Always use this skill: never hardcode data, never guess at structure.
compatibility:
  requires:
    - Datafy MCP Server (@teckedd-code2save/datafy) connected and available as MCP tool
  optional:
    - Redis (for caching, sessions, queues, pub/sub)
    - Elasticsearch (for full-text search, analytics, event streaming)
    - PostgreSQL connection via Datafy
    - Prisma MCP (for migrations and DB exploration)
    - GitHub MCP (for repo discovery and CI/CD setup)
    - Context7 MCP (for up-to-date documentation and patterns)
---

# Business → Data Platform Skill

Turn any business description into a **production-grade data platform** (schema, migrations, ORM
code, API contracts, caching strategy, search layer, and scaffolded project) following industry
standards used at scale.

---

## Core Principles (Never Violate These)

1. **Never hardcode data in UIs.** All data displayed in any UI component must come from a real
   DB query or API endpoint. No static arrays, no mock objects inline in components.

2. **Always use real ORMs and live DB connections.** Repository code must use the stack-appropriate
   ORM with a real connection string supplied through environment variables. **Never run raw
   psql/pg/sqlite commands in the terminal or issue raw driver calls from code.** Always use Datafy
   MCP tools when they are available.

3. **Follow FAANG-grade project structure.** Every scaffolded project must use the widely accepted
   directory layout for its stack. No flat file dumps.

4. **Use approved design patterns.** Repository pattern, Service layer, Dependency Injection,
   CQRS, and event-driven designs where appropriate.

5. **Migrations for every schema change.** Every change must be expressed as a versioned migration.
   Never mutate a live schema with raw DDL after initial provisioning.

6. **Proactively leverage other skills.** This skill is the "Orchestrator". When finishing the
   backend, you MUST trigger the testing, frontend, and infra phases using `api-test-generator`,
   `frontend-data-consumer`, and `infrastructure-as-code-architect`.

7. **Proactively suggest Redis, Elasticsearch, and streaming solutions.** Identify where caching,
   search, or streaming would benefit the design and propose integration.
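
The Repository pattern, Service layer, and Dependency Injection from principle 4 fit together as in the minimal Python sketch below. It assumes a hypothetical `Customer` entity, a `CustomerRepository` interface, and a `CustomerService`. None of these names come from this skill. The service depends only on the abstract repository it receives in its constructor. That lets an ORM-backed implementation in production and the in-memory test double shown here be swapped without touching business logic.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Customer:
    """Hypothetical domain entity used only for illustration."""
    email: str
    name: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


class CustomerRepository(Protocol):
    """Persistence boundary: services depend on this interface, not on a concrete ORM."""

    def get_by_email(self, email: str) -> Optional[Customer]: ...

    def add(self, customer: Customer) -> Customer: ...


class InMemoryCustomerRepository:
    """Test double; a production version would wrap the stack's ORM session instead."""

    def __init__(self) -> None:
        self._rows: Dict[uuid.UUID, Customer] = {}

    def get_by_email(self, email: str) -> Optional[Customer]:
        return next((c for c in self._rows.values() if c.email == email), None)

    def add(self, customer: Customer) -> Customer:
        self._rows[customer.id] = customer
        return customer


class CustomerService:
    """Service layer: owns business rules and receives its repository by injection."""

    def __init__(self, repo: CustomerRepository) -> None:
        self._repo = repo

    def register(self, email: str, name: str) -> Customer:
        normalized = email.strip().lower()
        if self._repo.get_by_email(normalized) is not None:
            raise ValueError(f"email already registered: {normalized}")
        return self._repo.add(Customer(email=normalized, name=name))


# Composition root: pick the implementation once, e.g. at startup or in a test fixture.
service = CustomerService(InMemoryCustomerRepository())
alice = service.register("  Alice@Example.com ", "Alice")
print(alice.email)  # alice@example.com
```

To switch to a real ORM-backed repository, only the composition root changes. `CustomerService` stays the same.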

---

## Step-by-Step Workflow

### Step 0: Architecture Design (cloud-solution-architect & github-mcp-server)

**MANDATORY:** Before any coding, invoke the `cloud-solution-architect` skill to design the platform.

1. **Repo Discovery:** Use `github-mcp-server` to search for existing repositories or identify where the new code should live.
2. Define the architecture style (Microservices for K8s, Web-Queue-Worker for simpler apps).
3. Select the technology stack, focusing on **Docker** for containerization and **Kubernetes (K8s)** for orchestration.
4. Design the CI/CD pipeline using **GitHub Actions**.
5. Document Architecture Decision Records (ADRs) for these choices.

---

### Step 1: Clarify Before Building (context7-mcp)

Resolve any ambiguity about the stack, scale, existing schema, auth model, and deployment target.

**Mandatory Question**: "What is your preferred technology stack (e.g., Next.js, FastAPI, Go)? Do you have a preferred Cloud provider and instance size for the K8s cluster?"

**Intelligence Layer:** Use `context7-mcp` to fetch the latest best practices and patterns for the chosen stack.

1. Call `resolve-library-id` for the main frameworks.
2. Call `query-docs` to ensure the proposed design follows up-to-date industry standards.

---

### Step 2: Extract Entities

Identify core actors, resources, transactions, relationships, and state/events.
Flag candidates for Redis (hot reads), Elasticsearch (search), and streaming (audit trails).

---

### Step 3: Design the Schema

Generate a 3NF PostgreSQL schema with UUID primary keys, explicit foreign-key `ON DELETE`/`ON UPDATE` policies,
`NUMERIC` for money, and `TIMESTAMPTZ` for timestamps. Present the schema for approval before executing it.
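
The rule of using `NUMERIC` for money exists because binary floating point cannot represent most decimal fractions exactly. The short Python comparison below illustrates this. PostgreSQL drivers such as psycopg return `NUMERIC` values as `Decimal`. The 8.25% tax rate and the `NUMERIC(12, 2)` scale are illustrative assumptions.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floats accumulate representation error on decimal amounts.
float_total = sum([0.10] * 3)
print(float_total == 0.30)                  # False (0.30000000000000004)

# Decimal, the Python type NUMERIC maps to, keeps cents exact.
exact_total = sum([Decimal("0.10")] * 3, Decimal("0"))
print(exact_total == Decimal("0.30"))       # True

# Round computed amounts to the column's scale, e.g. NUMERIC(12, 2), with an explicit rule.
tax = (Decimal("19.99") * Decimal("0.0825")).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
print(tax)                                  # 1.65
```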

---

### Step 4: Verify or Provision the Target Database

1. **Check for existing connections**: Use `tool_search` or `list_mcp_tools` to find available
   `execute_admin_sql_<id>` or `execute_sql_<id>` tools.
2. **Provision Database**: If the target database doesn't exist, use an available
   `execute_admin_sql_<id>` tool (e.g., the `pet_market` or `ride_sharing` connections, which often have admin access)
   to run the following statement on its own. PostgreSQL does not allow `CREATE DATABASE` inside a transaction block.
   ```sql
   CREATE DATABASE <name>;
   ```
3. **Verify Success**: **ALWAYS read the command output** to confirm the database was created.
   If the tool returns an error, do not proceed to table provisioning.
4. **Update dbhub.toml**: If a new host/connection is required, modify `dbhub.toml`
   using file edit tools and **PROMPT the user to restart their MCP server** (e.g., Cursor/Claude)
   to pick up the new tool IDs.

> **CRITICAL:** Avoid killing processes manually unless requested. Prompting the user for a restart
> is the safest path to avoid connection instability.

---

### Step 5: Provision via Datafy MCP (with Rollback)

Execute the schema through Datafy in dependency order, creating referenced tables before the tables that reference them.
Run everything inside a single `BEGIN; ... COMMIT;` block, and issue `ROLLBACK;` if any statement fails so no partial schema is left behind.
Report each object as ✅ created or ⚠️ already exists.
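
The dependency order is a topological sort of tables by their foreign keys. The sketch below uses Python's standard `graphlib` module (3.9+) on a hypothetical four-table schema. It only builds the script text that would then be passed to a Datafy SQL tool and does not execute anything. The table names and DDL are illustrative assumptions.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical schema: each table maps to the tables its foreign keys reference.
FOREIGN_KEYS: Dict[str, Set[str]] = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}

# Hypothetical DDL for those tables (one single-line statement per table).
DDL: Dict[str, str] = {
    "users": (
        "CREATE TABLE IF NOT EXISTS users (id UUID PRIMARY KEY, "
        "email TEXT NOT NULL UNIQUE, created_at TIMESTAMPTZ NOT NULL DEFAULT now());"
    ),
    "products": (
        "CREATE TABLE IF NOT EXISTS products (id UUID PRIMARY KEY, "
        "name TEXT NOT NULL, price NUMERIC(12, 2) NOT NULL);"
    ),
    "orders": (
        "CREATE TABLE IF NOT EXISTS orders (id UUID PRIMARY KEY, "
        "user_id UUID NOT NULL REFERENCES users (id) ON DELETE RESTRICT, "
        "placed_at TIMESTAMPTZ NOT NULL DEFAULT now());"
    ),
    "order_items": (
        "CREATE TABLE IF NOT EXISTS order_items ("
        "order_id UUID NOT NULL REFERENCES orders (id) ON DELETE CASCADE, "
        "product_id UUID NOT NULL REFERENCES products (id) ON DELETE RESTRICT, "
        "quantity INTEGER NOT NULL CHECK (quantity > 0), "
        "PRIMARY KEY (order_id, product_id));"
    ),
}


def creation_order(fks: Dict[str, Set[str]]) -> List[str]:
    """Referenced tables come first; graphlib raises CycleError on circular dependencies."""
    return list(TopologicalSorter(fks).static_order())


def build_provisioning_script(fks: Dict[str, Set[str]], ddl: Dict[str, str]) -> str:
    """Wrap every CREATE TABLE in one transaction so a failure leaves no partial schema."""
    statements = [ddl[table] for table in creation_order(fks)]
    return "\n".join(["BEGIN;", *statements, "COMMIT;"])


print(build_provisioning_script(FOREIGN_KEYS, DDL))
```

Leave self-referencing foreign keys out of the graph. When tables genuinely reference each other circularly, `static_order()` fails, and those constraints are typically added afterwards with `ALTER TABLE ... ADD CONSTRAINT` inside the same transaction.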

---

### Step 6: Migrations (Not Raw DDL)

Generate a versioned migration for every schema change after initial setup, using the stack's migration tool (Prisma Migrate, Alembic, Flyway, or EF Core Migrations).
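
Migration tools record migration state in a table inside the database, for example Flyway's `flyway_schema_history` or Alembic's `alembic_version`. The in-memory Python sketch below shows the core idea, using a hypothetical `Migration` type and `migrate` function rather than any tool's API:

- Migrations are ordered by version.
- Only pending ones run.
- Each applied version is recorded, so a rerun is a no-op.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Migration:
    version: int       # monotonically increasing: a sequence number or timestamp
    description: str
    up_sql: str        # forward DDL; real tools usually also track a down/undo step


def migrate(migrations: List[Migration], applied: Set[int],
            execute: Callable[[str], None]) -> List[int]:
    """Apply pending migrations in version order and return the versions applied now.

    `applied` stands in for the tool's history table and `execute` for running
    SQL against the database; both are hypothetical stand-ins here.
    """
    versions = [m.version for m in migrations]
    if len(versions) != len(set(versions)):
        raise ValueError("duplicate migration version")
    newly_applied: List[int] = []
    for m in sorted(migrations, key=lambda m: m.version):
        if m.version in applied:
            continue                     # already recorded: rerunning is a no-op
        execute(m.up_sql)
        applied.add(m.version)
        newly_applied.append(m.version)
    return newly_applied


executed: List[str] = []
history: Set[int] = {1}                  # version 1 was applied at initial provisioning
migrations = [
    Migration(2, "add orders.status",
              "ALTER TABLE orders ADD COLUMN status TEXT NOT NULL DEFAULT 'pending';"),
    Migration(1, "create orders", "CREATE TABLE orders (id UUID PRIMARY KEY);"),
    Migration(3, "index orders.status",
              "CREATE INDEX idx_orders_status ON orders (status);"),
]
print(migrate(migrations, history, executed.append))  # [2, 3]
print(migrate(migrations, history, executed.append))  # []
```

Real tools typically also run each migration in a transaction where the database supports transactional DDL, as PostgreSQL does for most statements.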

---

### Step 7: Seed Realistic Data

Seed inside `DO $$ ... $$` blocks that declare UUID variables for parent rows, so child rows can reference those IDs and the foreign-key chain stays consistent.

---

### Step 8: ORM Setup & Repository Code (prisma-mcp-server)

Generate repository code (Prisma, SQLAlchemy, JPA, EF Core).
**Prisma Integration:** If using Prisma, use `prisma-mcp-server` to validate the schema and explore the database structure.
Use environment variables for connection strings.

---

### Step 9: Generate Integration Tests (api-test-generator)

**MANDATORY:** Once the backend is scaffolded, invoke the `api-test-generator` skill.

1. Analyze the new controllers and routes.
2. Generate comprehensive integration tests (e.g., Vitest/Supertest for Node).
3. Ensure tests run against a dedicated test database.

---

### Step 10: Scaffold Frontend Features (frontend-data-consumer)

**MANDATORY:** Trigger `frontend-data-consumer` to:

1. Ingest the new API contracts.
2. Scaffold **Vite/Next.js** components (Data Tables, Forms) using **Tailwind CSS** and **Shadcn/UI**.
3. Wire up real API hooks (React Query/SWR). **NO HARDCODING.**

---

### Step 11: Provision Infrastructure (infrastructure-as-code-architect)

**MANDATORY:** Trigger `infrastructure-as-code-architect` to:

1. Analyze the dependency tree (Postgres, Redis, ES).
2. Generate **Dockerfiles** for all services.
3. Provision **Kubernetes (K8s) manifests/Helm charts**.
4. Generate **GitHub Actions deployment workflows** for staging and production.

---

### Step 12: Local Verification & Troubleshooting

1. **Test Locally**: Provide the commands to start the dev server (e.g., `npm run dev`) and run the tests.
2. **Troubleshooting Connectors**: If Datafy tools aren't showing up, run
   `ps aux | grep dbhub.toml`
   to check the running process, and note its config path and version.
3. **Analytics Queries**: Provide 6–10 ready-to-run business impact queries.

---

## Internal Agent Prompt

Sequence:

1. **CALL `cloud-solution-architect` to design the Docker/K8s + GitHub Actions stack.**
2. Clarify ambiguities and cloud preferences.
3. Extract entities and relationships.
4. Design 3NF schema; get approval.
5. Provision DB using `execute_admin_sql`; verify output carefully.
6. If modifying `dbhub.toml`, prompt user for MCP restart.
7. Provision tables via Datafy inside transactions.
8. Generate migrations for updates.
9. Seed realistic data using `DO $$` blocks.
10. Generate ORM-based repository code.
11. CALL `api-test-generator` to validate everything.
12. **CALL `frontend-data-consumer` to build the UI with Tailwind + Shadcn/UI.**
13. **CALL `infrastructure-as-code-architect` for Docker, K8s, and GitHub Actions.**
14. Generate analytics queries.

Hard rules:

- **NEVER use raw psql in the terminal.** Use Datafy MCP tools.
- **NEVER hardcode data in UI components.** Use real API hooks.
- **ALWAYS mandate Tailwind CSS and Shadcn/UI for frontend.**
- **ALWAYS provide Docker and Kubernetes configurations.**
- **ALWAYS include GitHub Actions CI/CD workflows.**
@@ -0,0 +1,317 @@
---
name: cloud-solution-architect
description: >-
  Transform the agent into a Cloud Solution Architect following Azure Architecture Center best practices.
  Use when designing cloud architectures, reviewing system designs, selecting architecture styles,
  applying cloud design patterns, making technology choices, or conducting Well-Architected Framework reviews.
---

# Cloud Solution Architect

## Overview

Design well-architected, production-grade cloud systems following Azure Architecture Center best practices. This skill provides:

- **10 design principles** for Azure applications
- **6 architecture styles** with selection guidance
- **44 cloud design patterns** mapped to WAF pillars
- **Technology choice frameworks** for compute, storage, data, and messaging
- **Performance antipatterns** to avoid
- **Architecture review workflow** for systematic design validation

---

## Ten Design Principles for Azure Applications

| # | Principle | Key Tactics |
|---|-----------|-------------|
| 1 | **Design for self-healing** | Retry with backoff, circuit breaker, bulkhead isolation, health endpoint monitoring, graceful degradation |
| 2 | **Make all things redundant** | Eliminate single points of failure, use availability zones, deploy multi-region, replicate data |
| 3 | **Minimize coordination** | Decouple services, use async messaging, embrace eventual consistency, use domain events |
| 4 | **Design to scale out** | Horizontal scaling, autoscaling rules, stateless services, avoid session stickiness, partition workloads |
| 5 | **Partition around limits** | Data partitioning (shard/hash/range), respect compute & network limits, use CDNs for static content |
| 6 | **Design for operations** | Structured logging, distributed tracing, metrics & dashboards, runbook automation, infrastructure as code |
| 7 | **Use managed services** | Prefer PaaS over IaaS, reduce operational burden, leverage built-in HA/DR/scaling |
| 8 | **Use an identity service** | Microsoft Entra ID, managed identity, RBAC, avoid storing credentials, zero-trust principles |
| 9 | **Design for evolution** | Loose coupling, versioned APIs, backward compatibility, async messaging for integration, feature flags |
| 10 | **Build for business needs** | Define SLAs/SLOs, establish RTO/RPO targets, domain-driven design, cost modeling, composite SLAs |
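
Principle 10 mentions composite SLAs. When a request depends on several services in series, the composite availability is the product of the individual values, so it is lower than any single component. Redundant deployments combine the other way: the system is down only when every copy is down. The Python calculation below uses illustrative SLA figures, not quoted service guarantees. It also assumes independent failures and instant failover, which real systems only approximate.

```python
from math import prod


def serial_sla(*slas: float) -> float:
    """Every dependency on the request path must be up, so availabilities multiply."""
    return prod(slas)


def parallel_sla(*slas: float) -> float:
    """Redundant deployments are unavailable only when all of them are down.

    Assumes independent failures and instant failover (an approximation).
    """
    return 1 - prod(1 - s for s in slas)


web, db = 0.9995, 0.9999               # illustrative SLAs for a web tier and a database
single_region = serial_sla(web, db)    # lower than either component on its own
two_regions = parallel_sla(single_region, single_region)
print(f"{single_region:.4%} -> {two_regions:.5%}")  # 99.9400% -> 99.99996%
```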

---

## Architecture Styles

| Style | Description | When to Use | Key Services |
|-------|-------------|-------------|--------------|
| **N-tier** | Horizontal layers (presentation, business, data) | Traditional enterprise apps, lift-and-shift | App Service, SQL Database, VNets |
| **Web-Queue-Worker** | Web frontend → message queue → backend worker | Moderate-complexity apps with long-running tasks | App Service, Service Bus, Functions |
| **Microservices** | Small autonomous services, bounded contexts, independent deploy | Complex domains, independent team scaling | AKS, Container Apps, API Management |
| **Event-driven** | Pub/sub model, event producers/consumers | Real-time processing, IoT, reactive systems | Event Hubs, Event Grid, Functions |
| **Big data** | Batch + stream processing pipeline | Analytics, ML pipelines, large-scale data | Synapse, Data Factory, Databricks |
| **Big compute** | HPC, parallel processing | Simulations, modeling, rendering, genomics | Batch, CycleCloud, HPC VMs |

### Selection Criteria

- **Domain complexity** → Microservices (high), N-tier (low-medium)
- **Team autonomy** → Microservices (independent teams), N-tier (single team)
- **Data volume** → Big data (TB+), others (GB)
- **Latency requirements** → Event-driven (real-time), Web-Queue-Worker (tolerant)

---

## Cloud Design Patterns

44 patterns organized by primary concern. WAF pillar mapping: **R**=Reliability, **S**=Security, **CO**=Cost Optimization, **OE**=Operational Excellence, **PE**=Performance Efficiency.

### Messaging & Communication

| Pattern | Summary | Pillars |
|---------|---------|---------|
| **Asynchronous Request-Reply** | Decouple request/response with polling or callbacks | R, PE |
| **Claim Check** | Split large messages; store payload separately, pass reference | R, PE |
| **Choreography** | Services coordinate via events without central orchestrator | R, OE |
| **Competing Consumers** | Multiple consumers process messages from shared queue concurrently | R, PE |
| **Messaging Bridge** | Connect incompatible messaging systems | R, OE |
| **Pipes and Filters** | Decompose complex processing into reusable filter stages | R, OE |
| **Priority Queue** | Prioritize requests so higher-priority work is processed first | R, PE |
| **Publisher/Subscriber** | Decouple senders from receivers via topics/subscriptions | R, PE |
| **Queue-Based Load Leveling** | Buffer requests with a queue to smooth intermittent loads | R, PE |
| **Sequential Convoy** | Process related messages in order while allowing parallel groups | R, PE |
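
Queue-Based Load Leveling and Competing Consumers combine naturally. Producers enqueue work at whatever rate requests arrive, and a fixed pool of consumers drains a bounded queue at a sustainable rate. The sketch below uses Python's in-process `queue.Queue` and threads as a stand-in for a broker such as Service Bus. The `handle` function, message shape, and pool sizes are hypothetical.

```python
import queue
import threading
from typing import Dict, List, Optional


def handle(message: int) -> int:
    """Hypothetical unit of work; real consumers should be idempotent because brokers redeliver."""
    return message * message


def run_consumers(messages: List[int], workers: int = 3, buffer_size: int = 10) -> Dict[int, int]:
    work: "queue.Queue[Optional[int]]" = queue.Queue(maxsize=buffer_size)  # bounded buffer
    results: Dict[int, int] = {}
    lock = threading.Lock()

    def consumer() -> None:
        while True:
            msg = work.get()                 # consumers compete for the next message
            try:
                if msg is None:              # sentinel: the producer is done
                    return
                outcome = handle(msg)
                with lock:
                    results[msg] = outcome
            finally:
                work.task_done()

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()
    for msg in messages:                     # put() blocks while the buffer is full
        work.put(msg)
    for _ in threads:
        work.put(None)                       # one sentinel per consumer
    for t in threads:
        t.join()
    return results


print(len(run_consumers(list(range(100)))))  # 100
```

The bounded `maxsize` is what levels the load. When consumers fall behind, producers block instead of overwhelming the downstream service.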

### Reliability & Resilience

| Pattern | Summary | Pillars |
|---------|---------|---------|
| **Bulkhead** | Isolate resources per workload to prevent cascading failure | R |
| **Circuit Breaker** | Stop calling a failing service; fail fast to protect resources | R |
| **Compensating Transaction** | Undo previously committed steps when a later step fails | R |
| **Health Endpoint Monitoring** | Expose health checks for load balancers and orchestrators | R, OE |
| **Leader Election** | Coordinate distributed instances by electing a leader | R |
| **Retry** | Handle transient faults by retrying with exponential backoff | R |
| **Saga** | Manage data consistency across microservices with compensating transactions | R |
| **Scheduler Agent Supervisor** | Coordinate distributed actions with retry and failure handling | R |
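
A minimal Circuit Breaker state machine can be sketched in Python. It assumes a consecutive-failure threshold and a fixed cool-down. The class, exception, and parameter names are illustrative, not a specific library's API.

- **Closed:** calls pass through. After `failure_threshold` consecutive failures, the breaker opens.
- **Open:** calls fail fast without reaching the dependency.
- **Half-open:** once `reset_timeout` seconds have passed, a trial call is allowed. Success closes the breaker; failure reopens it.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock                  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency unavailable; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call in half-open state reopens the breaker immediately.
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0                   # success closes the breaker
        self._opened_at = None
        return result


now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def flaky() -> str:
    raise ConnectionError("upstream down")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
print(breaker.state)                         # open
now[0] = 10.0
print(breaker.state)                         # half-open
print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```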

### Data Management

| Pattern | Summary | Pillars |
|---------|---------|---------|
| **Cache-Aside** | Load data on demand into cache from data store | PE |
| **CQRS** | Separate read and write models for independent scaling | PE, R |
| **Event Sourcing** | Store state as append-only sequence of domain events | R, OE |
| **Index Table** | Create indexes over frequently queried fields in data stores | PE |
| **Materialized View** | Pre-compute views over data for efficient queries | PE |
| **Sharding** | Distribute data across partitions for scale and performance | PE, R |
| **Static Content Hosting** | Serve static content from cloud storage/CDN directly | PE, CO |
| **Valet Key** | Grant clients limited direct access to storage resources | S, PE |
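
Cache-Aside works as follows. Reads check the cache first; on a miss they load from the data store and populate the cache with a TTL. Writes update the data store and then invalidate the cached key, so the next read reloads fresh data. The Python sketch below illustrates this under stated assumptions. A dict-backed `TTLCache` stands in for Redis, roughly mirroring `GET`, `SET key value EX seconds`, and `DEL`. A hypothetical `load_product` loader stands in for a repository query.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:      # expired entries behave like misses
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)


def get_cache_aside(cache: TTLCache, key: str, loader: Callable[[], Any], ttl: float = 60.0) -> Any:
    """Read through the cache; on a miss, load from the data store and populate the cache."""
    value = cache.get(key)
    if value is None:                        # a cached None is indistinguishable from a miss
        value = loader()
        cache.set(key, value, ttl)
    return value


# Hypothetical data store and loader standing in for a repository query.
store = {"p1": {"name": "Leash", "price": "12.50"}}
loads = []


def load_product(pid: str) -> dict:
    loads.append(pid)
    return dict(store[pid])


def update_price(pid: str, price: str) -> None:
    store[pid]["price"] = price              # write to the system of record first...
    cache.delete(f"product:{pid}")           # ...then invalidate so the next read reloads


cache = TTLCache()
for _ in range(3):
    get_cache_aside(cache, "product:p1", lambda: load_product("p1"))
print(len(loads))  # 1
```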

### Design & Structure

| Pattern | Summary | Pillars |
|---------|---------|---------|
| **Ambassador** | Offload cross-cutting concerns to a helper sidecar proxy | OE |
| **Anti-Corruption Layer** | Translate between new and legacy system models | OE, R |
| **Backends for Frontends** | Create separate backends per frontend type (mobile, web, etc.) | OE, PE |
| **Compute Resource Consolidation** | Combine multiple workloads into fewer compute instances | CO |
| **External Configuration Store** | Externalize configuration from deployment packages | OE |
| **Sidecar** | Deploy helper components alongside the main service | OE |
| **Strangler Fig** | Incrementally migrate legacy systems by replacing pieces | OE, R |

### Security & Access

| Pattern | Summary | Pillars |
|---------|---------|---------|
| **Federated Identity** | Delegate authentication to an external identity provider | S |
| **Gatekeeper** | Protect services using a dedicated broker that validates requests | S |
| **Quarantine** | Isolate and validate external assets before allowing use | S |
| **Rate Limiting** | Control consumption rate of resources by consumers | R, S |
| **Throttling** | Control resource consumption to sustain SLAs under load | R, PE |
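
Rate Limiting is commonly implemented with a token bucket. Tokens refill at a steady rate up to a burst capacity, and each request spends one token or is rejected, typically with HTTP 429. The per-client Python sketch below uses an illustrative class name and parameters. A multi-instance deployment would keep bucket state in a shared store such as Redis rather than in process memory.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity              # start full to allow an initial burst
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill proportionally to elapsed time, never beyond the burst capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False                         # caller should reject, e.g. with HTTP 429


now = [0.0]
bucket = TokenBucket(rate_per_sec=2.0, capacity=5.0, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(7)]
print(burst.count(True))                     # 5 (the burst capacity)
now[0] = 1.0                                 # one second later, 2 tokens have refilled
print(bucket.allow(), bucket.allow(), bucket.allow())  # True True False
```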

### Deployment & Scaling

| Pattern | Summary | Pillars |
|---------|---------|---------|
| **Deployment Stamps** | Deploy multiple independent copies of application components | R, PE |
| **Edge Workload Configuration** | Configure workloads differently across diverse edge devices | OE |
| **Gateway Aggregation** | Aggregate multiple backend calls into a single client request | PE |
| **Gateway Offloading** | Offload shared functionality (SSL, auth) to a gateway | OE, S |
| **Gateway Routing** | Route requests to multiple backends using a single endpoint | OE |
| **Geode** | Deploy backends to multiple regions for active-active serving | R, PE |

See [Design Patterns Reference](./references/design-patterns.md) for detailed implementation guidance.

---

## Technology Choices

### Decision Framework

For each technology area, evaluate: **requirements → constraints → tradeoffs → select**.

| Area | Key Options | Selection Criteria |
|------|-------------|-------------------|
| **Compute** | App Service, Functions, Container Apps, AKS, VMs, Batch | Hosting model, scaling, cost, team skills |
| **Storage** | Blob Storage, Data Lake, Files, Disks, Managed Lustre | Access patterns, throughput, cost tier |
| **Data stores** | SQL Database, Cosmos DB, PostgreSQL, Redis, Table Storage | Consistency model, query patterns, scale |
| **Messaging** | Service Bus, Event Hubs, Event Grid, Queue Storage | Ordering, throughput, pub/sub vs queue |
| **Networking** | Front Door, Application Gateway, Load Balancer, Traffic Manager | Global vs regional, L4 vs L7, WAF |
| **AI services** | Azure OpenAI, AI Search, AI Foundry, Document Intelligence | Model needs, data grounding, orchestration |
| **Containers** | Container Apps, AKS, Container Instances | Operational control vs simplicity |

See [Technology Choices Reference](./references/technology-choices.md) for detailed decision trees.

---

## Best Practices

| Practice | Key Guidance |
|----------|-------------|
| **API design** | RESTful conventions, resource-oriented URIs, HATEOAS, versioning via URL path or header |
| **API implementation** | Async operations, pagination, idempotent PUT/DELETE, content negotiation, ETag caching |
| **Autoscaling** | Scale on metrics (CPU, queue depth, custom), cool-down periods, predictive scaling, scale-in protection |
| **Background jobs** | Use queues or scheduled triggers, idempotent processing, poison message handling, graceful shutdown |
| **Caching** | Cache-aside pattern, TTL policies, cache invalidation strategies, distributed cache for multi-instance |
| **CDN** | Static asset offloading, cache-busting with versioned URLs, geo-distribution, HTTPS enforcement |
| **Data partitioning** | Horizontal (sharding), vertical, functional partitioning; partition key selection for even distribution |
| **Partitioning strategies** | Hash-based, range-based, directory-based; rebalancing approach, cross-partition query avoidance |
| **Host name preservation** | Preserve original host header through proxies/gateways for cookies, redirects, auth flows |
| **Message encoding** | Schema evolution (Avro/Protobuf), backward/forward compatibility, schema registry |
| **Monitoring & diagnostics** | Structured logging, distributed tracing (W3C Trace Context), metrics, alerts, dashboards |
| **Transient fault handling** | Retry with exponential backoff + jitter, circuit breaker, idempotency keys, timeout budgets |
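
Transient fault handling usually pairs a bounded number of retries with exponential backoff and jitter. The jitter keeps many clients that recover at the same moment from retrying in lockstep, which is the Retry Storm antipattern. The Python sketch below uses the "full jitter" variant, where each delay is drawn uniformly between zero and the capped exponential backoff. The function name, its defaults, and the choice of `ConnectionError`/`TimeoutError` as "transient" are assumptions for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    *,
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float, float], float] = random.uniform,
) -> T:
    """Call fn, retrying transient errors with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except transient:                    # non-transient errors propagate immediately
            if attempt == attempts - 1:
                raise                        # retry budget exhausted
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(jitter(0.0, cap))          # full jitter: uniform in [0, cap]
    raise RuntimeError("unreachable")


calls = {"n": 0}
delays = []


def flaky_fetch() -> str:
    """Hypothetical dependency that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "payload"


print(retry_with_backoff(flaky_fetch, sleep=delays.append))  # payload
print(calls["n"], len(delays))                                # 3 2
```

Retries are only safe for idempotent operations, or for operations protected by the idempotency keys listed in the table.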

See [Best Practices Reference](./references/best-practices.md) for implementation details.

---

## Performance Antipatterns

Avoid these common patterns that degrade performance under load:

| Antipattern | Problem | Fix |
|-------------|---------|-----|
| **Busy Database** | Offloading too much processing to the database | Move logic to application tier, use caching |
| **Busy Front End** | Resource-intensive work on frontend request threads | Offload to background workers/queues |
| **Chatty I/O** | Many small I/O requests instead of fewer large ones | Batch requests, use bulk APIs, buffer writes |
| **Extraneous Fetching** | Retrieving more data than needed | Project only required fields, paginate, filter server-side |
| **Improper Instantiation** | Recreating expensive objects per request | Use singletons, connection pooling, HttpClientFactory |
| **Monolithic Persistence** | Single data store for all data types | Polyglot persistence: the right store for each workload |
| **No Caching** | Repeatedly fetching unchanged data | Cache-aside pattern, CDN, output caching, Redis |
| **Noisy Neighbor** | One tenant consuming all shared resources | Bulkhead isolation, per-tenant quotas, throttling |
| **Retry Storm** | Aggressive retries overwhelming a recovering service | Exponential backoff + jitter, circuit breaker, retry budgets |
| **Synchronous I/O** | Blocking threads on I/O operations | Async/await, non-blocking I/O, reactive streams |

---

## Mission-Critical Design

For workloads targeting **99.99%+ SLO**, address these design areas:

| Design Area | Key Considerations |
|-------------|-------------------|
| **Application platform** | Multi-region active-active, availability zones, Container Apps or AKS with zone redundancy |
| **Application design** | Stateless services, idempotent operations, graceful degradation, bulkhead isolation |
| **Networking** | Azure Front Door (global LB), DDoS Protection, private endpoints, redundant connectivity |
| **Data platform** | Multi-region Cosmos DB, zone-redundant SQL, async replication, conflict resolution |
| **Deployment & testing** | Blue-green deployments, canary releases, chaos engineering, automated rollback |
| **Health modeling** | Composite health scores, dependency health tracking, automated remediation, SLI dashboards |
| **Security** | Zero-trust, managed identity everywhere, key rotation, WAF policies, threat modeling |
| **Operational procedures** | Automated runbooks, incident response playbooks, game days, postmortems |

See [Mission-Critical Reference](./references/mission-critical.md) for detailed guidance.

---

## Well-Architected Framework (WAF) Pillars

Every architecture decision should be evaluated against all five pillars:

| Pillar | Focus | Key Questions |
|--------|-------|---------------|
| **Reliability** | Resiliency, availability, disaster recovery | What is the RTO/RPO? How does it handle failures? Is there redundancy? |
| **Security** | Threat protection, identity, data protection | Is identity managed? Is data encrypted? Are there network controls? |
| **Cost Optimization** | Cost management, efficiency, right-sizing | Is compute right-sized? Are there reserved instances? Is there waste? |
| **Operational Excellence** | Monitoring, deployment, automation | Is deployment automated? Is there observability? Are there runbooks? |
| **Performance Efficiency** | Scaling, load testing, performance targets | Can it scale horizontally? Are there performance baselines? Is caching used? |

### WAF Tradeoff Matrix

| Optimizing for... | May impact... |
|-------------------|---------------|
| Reliability (redundancy) | Cost (more resources) |
| Security (isolation) | Performance (added latency) |
| Cost (consolidation) | Reliability (shared failure domains) |
| Performance (caching) | Cost (cache infrastructure), Reliability (stale data) |

---

## Architecture Review Workflow

When reviewing or designing a system, follow this structured approach:

### Step 1: Identify Requirements

```
Functional: What must the system do?
Non-functional:
  - Availability target (e.g., 99.9%, 99.99%)
  - Latency requirements (p50, p95, p99)
  - Throughput (requests/sec, messages/sec)
  - Data residency and compliance
  - Recovery targets (RTO, RPO)
  - Cost constraints
```
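
To make an availability target concrete during requirements gathering, convert it into an allowed-downtime (error) budget. The quick Python calculation below assumes a 30-day month; calendar months vary slightly.

```python
def downtime_budget_minutes(availability: float, period_days: float = 30.0) -> float:
    """Minutes of allowed unavailability per period for a given availability target."""
    if not 0.0 < availability <= 1.0:
        raise ValueError("availability must be in (0, 1]")
    return (1.0 - availability) * period_days * 24 * 60


for target in (0.999, 0.9995, 0.9999):
    print(f"{target:.2%}: {downtime_budget_minutes(target):.1f} min/month")
# 99.90%: 43.2 min/month
# 99.95%: 21.6 min/month
# 99.99%: 4.3 min/month
```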

### Step 2: Select Architecture Style

Match the requirements to an architecture style using the selection criteria above.

### Step 3: Choose Technology Stack

Use the technology choices decision framework. Prefer managed services (PaaS) over IaaS.

### Step 4: Apply Design Patterns

Select relevant patterns from the 44 cloud design patterns based on identified concerns.

### Step 5: Address Cross-Cutting Concerns

- **Identity & access**: Microsoft Entra ID, managed identity, RBAC
- **Monitoring**: Application Insights, Azure Monitor, Log Analytics
- **Security**: Network segmentation, encryption at rest/in transit, Key Vault
- **CI/CD**: GitHub Actions, Azure DevOps Pipelines, infrastructure as code

### Step 6: Validate Against WAF Pillars

Review each pillar systematically. Document tradeoffs explicitly.

### Step 7: Document Decisions

Use Architecture Decision Records (ADRs):

```markdown
# ADR-NNN: [Decision Title]

## Status: [Proposed | Accepted | Deprecated]

## Context
[What is the issue we're addressing?]

## Decision
[What did we decide and why?]

## Consequences
[What are the positive and negative impacts?]
```

---

## References

- [Design Patterns Reference](./references/design-patterns.md): Detailed pattern implementations
- [Technology Choices Reference](./references/technology-choices.md): Decision trees for Azure services
- [Best Practices Reference](./references/best-practices.md): Implementation guidance
- [Mission-Critical Reference](./references/mission-critical.md): High-availability design

---

## Source

Content derived from the [Azure Architecture Center](https://learn.microsoft.com/en-us/azure/architecture/), Microsoft's official guidance for cloud solution architecture on Azure. Covers design principles, architecture styles, cloud design patterns, technology choices, best practices, performance antipatterns, mission-critical design, and the Well-Architected Framework.