@zigrivers/scaffold 3.6.0 → 3.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (115)
  1. package/README.md +127 -12
  2. package/content/knowledge/backend/backend-api-design.md +103 -0
  3. package/content/knowledge/backend/backend-architecture.md +100 -0
  4. package/content/knowledge/backend/backend-async-patterns.md +101 -0
  5. package/content/knowledge/backend/backend-auth-patterns.md +100 -0
  6. package/content/knowledge/backend/backend-conventions.md +105 -0
  7. package/content/knowledge/backend/backend-data-modeling.md +102 -0
  8. package/content/knowledge/backend/backend-deployment.md +100 -0
  9. package/content/knowledge/backend/backend-dev-environment.md +102 -0
  10. package/content/knowledge/backend/backend-observability.md +102 -0
  11. package/content/knowledge/backend/backend-project-structure.md +100 -0
  12. package/content/knowledge/backend/backend-requirements.md +103 -0
  13. package/content/knowledge/backend/backend-security.md +104 -0
  14. package/content/knowledge/backend/backend-testing.md +101 -0
  15. package/content/knowledge/backend/backend-worker-patterns.md +100 -0
  16. package/content/knowledge/cli/cli-architecture.md +101 -0
  17. package/content/knowledge/cli/cli-conventions.md +117 -0
  18. package/content/knowledge/cli/cli-dev-environment.md +121 -0
  19. package/content/knowledge/cli/cli-distribution-patterns.md +106 -0
  20. package/content/knowledge/cli/cli-interactivity-patterns.md +116 -0
  21. package/content/knowledge/cli/cli-output-patterns.md +107 -0
  22. package/content/knowledge/cli/cli-project-structure.md +124 -0
  23. package/content/knowledge/cli/cli-requirements.md +101 -0
  24. package/content/knowledge/cli/cli-shell-integration.md +130 -0
  25. package/content/knowledge/cli/cli-testing.md +134 -0
  26. package/content/knowledge/library/library-api-design.md +306 -0
  27. package/content/knowledge/library/library-architecture.md +247 -0
  28. package/content/knowledge/library/library-bundling.md +244 -0
  29. package/content/knowledge/library/library-conventions.md +229 -0
  30. package/content/knowledge/library/library-dev-environment.md +220 -0
  31. package/content/knowledge/library/library-documentation.md +300 -0
  32. package/content/knowledge/library/library-project-structure.md +237 -0
  33. package/content/knowledge/library/library-requirements.md +173 -0
  34. package/content/knowledge/library/library-security.md +257 -0
  35. package/content/knowledge/library/library-testing.md +319 -0
  36. package/content/knowledge/library/library-type-definitions.md +284 -0
  37. package/content/knowledge/library/library-versioning.md +300 -0
  38. package/content/knowledge/mobile-app/mobile-app-architecture.md +283 -0
  39. package/content/knowledge/mobile-app/mobile-app-conventions.md +180 -0
  40. package/content/knowledge/mobile-app/mobile-app-deployment.md +298 -0
  41. package/content/knowledge/mobile-app/mobile-app-dev-environment.md +257 -0
  42. package/content/knowledge/mobile-app/mobile-app-distribution.md +264 -0
  43. package/content/knowledge/mobile-app/mobile-app-observability.md +317 -0
  44. package/content/knowledge/mobile-app/mobile-app-offline-patterns.md +311 -0
  45. package/content/knowledge/mobile-app/mobile-app-project-structure.md +245 -0
  46. package/content/knowledge/mobile-app/mobile-app-push-notifications.md +321 -0
  47. package/content/knowledge/mobile-app/mobile-app-requirements.md +147 -0
  48. package/content/knowledge/mobile-app/mobile-app-security.md +338 -0
  49. package/content/knowledge/mobile-app/mobile-app-testing.md +400 -0
  50. package/content/knowledge/web-app/web-app-api-patterns.md +224 -0
  51. package/content/knowledge/web-app/web-app-architecture.md +116 -0
  52. package/content/knowledge/web-app/web-app-auth-patterns.md +256 -0
  53. package/content/knowledge/web-app/web-app-conventions.md +121 -0
  54. package/content/knowledge/web-app/web-app-data-patterns.md +218 -0
  55. package/content/knowledge/web-app/web-app-deployment-workflow.md +143 -0
  56. package/content/knowledge/web-app/web-app-deployment.md +134 -0
  57. package/content/knowledge/web-app/web-app-design-system.md +158 -0
  58. package/content/knowledge/web-app/web-app-dev-environment.md +173 -0
  59. package/content/knowledge/web-app/web-app-observability.md +221 -0
  60. package/content/knowledge/web-app/web-app-project-structure.md +160 -0
  61. package/content/knowledge/web-app/web-app-rendering-strategies.md +133 -0
  62. package/content/knowledge/web-app/web-app-requirements.md +112 -0
  63. package/content/knowledge/web-app/web-app-security.md +193 -0
  64. package/content/knowledge/web-app/web-app-session-patterns.md +214 -0
  65. package/content/knowledge/web-app/web-app-testing.md +249 -0
  66. package/content/knowledge/web-app/web-app-ux-patterns.md +162 -0
  67. package/content/methodology/backend-overlay.yml +73 -0
  68. package/content/methodology/cli-overlay.yml +69 -0
  69. package/content/methodology/library-overlay.yml +67 -0
  70. package/content/methodology/mobile-app-overlay.yml +71 -0
  71. package/content/methodology/web-app-overlay.yml +79 -0
  72. package/dist/cli/commands/init.d.ts +21 -0
  73. package/dist/cli/commands/init.d.ts.map +1 -1
  74. package/dist/cli/commands/init.js +261 -13
  75. package/dist/cli/commands/init.js.map +1 -1
  76. package/dist/cli/commands/init.test.js +206 -0
  77. package/dist/cli/commands/init.test.js.map +1 -1
  78. package/dist/config/schema.d.ts +1392 -64
  79. package/dist/config/schema.d.ts.map +1 -1
  80. package/dist/config/schema.js +82 -5
  81. package/dist/config/schema.js.map +1 -1
  82. package/dist/config/schema.test.js +302 -1
  83. package/dist/config/schema.test.js.map +1 -1
  84. package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
  85. package/dist/core/assembly/overlay-loader.js +2 -1
  86. package/dist/core/assembly/overlay-loader.js.map +1 -1
  87. package/dist/core/assembly/overlay-loader.test.js +56 -0
  88. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  89. package/dist/e2e/game-pipeline.test.js +1 -0
  90. package/dist/e2e/game-pipeline.test.js.map +1 -1
  91. package/dist/e2e/project-type-overlays.test.d.ts +16 -0
  92. package/dist/e2e/project-type-overlays.test.d.ts.map +1 -0
  93. package/dist/e2e/project-type-overlays.test.js +834 -0
  94. package/dist/e2e/project-type-overlays.test.js.map +1 -0
  95. package/dist/types/config.d.ts +19 -2
  96. package/dist/types/config.d.ts.map +1 -1
  97. package/dist/types/index.d.ts +0 -1
  98. package/dist/types/index.d.ts.map +1 -1
  99. package/dist/types/index.js +0 -1
  100. package/dist/types/index.js.map +1 -1
  101. package/dist/wizard/questions.d.ts +27 -1
  102. package/dist/wizard/questions.d.ts.map +1 -1
  103. package/dist/wizard/questions.js +142 -3
  104. package/dist/wizard/questions.js.map +1 -1
  105. package/dist/wizard/questions.test.js +206 -8
  106. package/dist/wizard/questions.test.js.map +1 -1
  107. package/dist/wizard/wizard.d.ts +21 -0
  108. package/dist/wizard/wizard.d.ts.map +1 -1
  109. package/dist/wizard/wizard.js +27 -1
  110. package/dist/wizard/wizard.js.map +1 -1
  111. package/package.json +1 -1
  112. package/dist/types/wizard.d.ts +0 -14
  113. package/dist/types/wizard.d.ts.map +0 -1
  114. package/dist/types/wizard.js +0 -2
  115. package/dist/types/wizard.js.map +0 -1
@@ -0,0 +1,105 @@
1
+ ---
2
+ name: backend-conventions
3
+ description: Service and handler naming conventions, structured error handling patterns, structured logging with correlation IDs, and file organization standards for backend codebases
4
+ topics: [backend, conventions, error-handling, logging, naming, file-organization]
5
+ ---
6
+
7
+ Consistent conventions in a backend codebase reduce cognitive load, make code reviewable at a glance, and prevent entire classes of bugs. Naming, error handling, and logging are the three highest-leverage areas — they touch every layer of the stack and every engineer on the team. Establish these conventions before the first PR, codify them in linting rules where possible, and treat violations as blocking review comments.
8
+
9
+ ## Summary
10
+
11
+ Backend conventions cover naming, error handling, logging, and file organization. Services are named after domain concepts (noun form), handlers after HTTP resources, and repositories after data entities. Structured errors use machine-readable codes; structured logs emit JSON with correlation IDs. Group by layer in small services, by feature/domain in large codebases.
12
+
13
+ Conventions enforced only by humans drift immediately. Automate with ESLint, TypeScript strict mode, commit hooks, and schema validation in integration tests.
14
+
15
+ ## Deep Guidance
16
+
17
+ ### Service and Handler Naming
18
+
19
+ Names should reveal intent and be consistent across the codebase:
20
+
21
+ - **Services**: Named after the domain concept they own, noun form. `OrderService`, `UserAuthService`, `PaymentProcessor`. One service per domain concept. Services contain business logic only — no HTTP, no database drivers.
22
+ - **Handlers / Controllers**: Named after the HTTP resource or operation. `OrdersHandler`, `UserController`, `AuthRouter`. One handler per resource family. Handlers translate HTTP concerns to service calls — no business logic.
23
+ - **Repository / DAO classes**: `OrderRepository`, `UserStore`. Contain only data-access logic. No business rules, no HTTP.
24
+ - **Method naming**: Use verb-noun patterns that reveal intent. Prefer `createOrder`, `cancelOrder`, `findOrdersByCustomer` over generic `get`, `set`, `update`. Reserve `get` for trivial accessors.
25
+ - **File names**: Match the class/module name in kebab-case. `order-service.ts`, `orders-handler.ts`, `order-repository.ts`.
26
+
27
+ ### Structured Error Handling
28
+
29
+ Unstructured errors (`throw new Error("something went wrong")`) are a maintenance liability. Use structured errors throughout:
30
+
31
+ - **Error codes**: Define an enum or constant map of machine-readable error codes. `ORDER_NOT_FOUND`, `PAYMENT_DECLINED`, `INSUFFICIENT_INVENTORY`. Codes enable consumers to handle specific failure cases without string parsing.
32
+ - **Error shape**: Standardize the error response object across all endpoints. A minimal shape: `{ code: string, message: string, details?: unknown, requestId: string }`. Never leak stack traces to API consumers in production.
33
+ - **Error categories**: Separate operational errors (expected, business logic failures) from programmer errors (unexpected, indicative of a bug). Operational errors usually map to `4xx` responses; programmer errors surface as `5xx`. Log programmer errors with full context.
34
+ - **Never swallow errors**: A bare `catch {}` or `catch (e) { /* ignore */ }` is a production bug waiting to surface. At minimum, log the error with context. If the error is expected and recoverable, handle it explicitly and document why.
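
The bullets above can be sketched in TypeScript as follows. This is a minimal illustration, not code from the package: the `AppError` class, the `toErrorResponse` helper, and the three error codes are hypothetical names chosen for the example.

```typescript
// Hypothetical error codes; a real service would define its own set.
const ErrorCode = {
  ORDER_NOT_FOUND: "ORDER_NOT_FOUND",
  PAYMENT_DECLINED: "PAYMENT_DECLINED",
  INTERNAL_ERROR: "INTERNAL_ERROR",
} as const;
type ErrorCode = (typeof ErrorCode)[keyof typeof ErrorCode];

// The minimal error shape described above.
interface ErrorResponse {
  code: string;
  message: string;
  details?: unknown;
  requestId: string;
}

// Operational error: an expected failure that carries a machine-readable code.
class AppError extends Error {
  readonly code: ErrorCode;
  readonly status: number;
  readonly details: unknown;

  constructor(code: ErrorCode, message: string, status: number, details?: unknown) {
    super(message);
    // Keep `instanceof AppError` working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "AppError";
    this.code = code;
    this.status = status;
    this.details = details;
  }
}

// Map any thrown value to the standard shape. Anything that is not an
// AppError is treated as a programmer error: the caller gets a generic
// message and no stack trace.
function toErrorResponse(
  err: unknown,
  requestId: string,
): { status: number; body: ErrorResponse } {
  if (err instanceof AppError) {
    return {
      status: err.status,
      body: { code: err.code, message: err.message, details: err.details, requestId },
    };
  }
  return {
    status: 500,
    body: { code: ErrorCode.INTERNAL_ERROR, message: "Internal server error", requestId },
  };
}
```

A handler would call `toErrorResponse` in its single catch block and log the original error (with its stack) before returning the sanitized body.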
35
+
36
+ ### Structured Logging
37
+
38
+ Log entries are your primary debugging tool in production. Treat them as a first-class API:
39
+
40
+ - **Structured format**: Always emit JSON logs. Human-readable log lines are unsearchable at scale. Every entry should be a JSON object parseable by log aggregation tools (Datadog, Loki, CloudWatch).
41
+ - **Required fields on every log entry**: `timestamp` (ISO 8601), `level` (debug/info/warn/error), `message`, `service`, `requestId` (correlation ID), `environment`.
42
+ - **Correlation IDs**: Generate a `requestId` (UUID v4) at the API gateway or first handler and propagate it through every downstream service call via a header (`X-Request-ID`) and a context object (AsyncLocalStorage in Node.js). Every log line and error response includes this ID. A support ticket becomes: "give me requestId X" → full trace in 10 seconds.
43
+ - **Log levels**: `debug` — internal state, loop iterations (disabled in production); `info` — request start/end, state transitions; `warn` — recoverable anomalies, deprecated usage; `error` — failures requiring human attention.
44
+ - **Avoid over-logging**: Logging every field of every object in a high-throughput path fills disks and incurs cost. Log decision points, not data dumps.
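
A minimal sketch of the correlation-ID idea, assuming Node.js and its built-in `AsyncLocalStorage`. The helper names (`withRequestContext`, `formatLog`) and the `orders-api` / `development` values are placeholders for illustration, not part of the package.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

type Level = "debug" | "info" | "warn" | "error";

interface RequestContext {
  requestId: string;
}

// One store per process; each request runs inside its own context.
const requestContext = new AsyncLocalStorage<RequestContext>();

// Build one JSON log line carrying the required fields listed above.
function formatLog(level: Level, message: string, extra: Record<string, unknown> = {}): string {
  return JSON.stringify({
    ...extra, // extra fields first so they cannot overwrite the required ones
    timestamp: new Date().toISOString(),
    level,
    message,
    service: "orders-api", // hypothetical service name
    requestId: requestContext.getStore()?.requestId ?? "unknown",
    environment: "development", // assumption: normally read from configuration
  });
}

function log(level: Level, message: string, extra?: Record<string, unknown>): void {
  console.log(formatLog(level, message, extra));
}

// Reuse an incoming X-Request-ID value when present, otherwise generate a UUID v4.
function withRequestContext<T>(incomingId: string | undefined, fn: () => T): T {
  return requestContext.run({ requestId: incomingId ?? randomUUID() }, fn);
}
```

Wrapping each request handler in `withRequestContext` means code further down the call chain, including code after an `await`, can call `log` without passing the ID explicitly.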
45
+
46
+ ### File Organization
47
+
48
+ Enforce a predictable directory layout:
49
+
50
+ - Group by layer, not by feature, in small services. In larger codebases, group by feature/domain with layer sub-directories.
51
+ - Test files co-located with source files (`order-service.test.ts` next to `order-service.ts`) or in a parallel `__tests__/` directory — choose one pattern and enforce it.
52
+ - Configuration files at the root of `src/`, not scattered across subdirectories.
53
+
54
+ ### Enforcing Conventions with Tooling
55
+
56
+ Conventions enforced by humans alone drift. Use automated tools:
57
+
58
+ - **ESLint / Biome**: Enforce naming patterns, ban `console.log` in favor of a logger, require explicit error handling.
59
+ - **TypeScript strict mode**: Catches the class of errors that naming conventions alone cannot prevent — `strictNullChecks`, `noImplicitAny`, `noUncheckedIndexedAccess`.
60
+ - **Commit hooks**: Run linting on staged files via `lint-staged` + `husky`. Prevents convention drift from reaching the repository.
61
+ - **OpenAPI response schema validation**: Validate that error responses match the standard shape in integration tests. Catches endpoints that return non-standard errors before they reach production.
62
+
63
+ ### Naming Conventions by Language
64
+
65
+ Each language ecosystem has established naming conventions. Follow them without exception:
66
+
67
+ **TypeScript/Node.js:**
68
+ - Files: `kebab-case.ts` (`order-service.ts`, `payment-handler.ts`)
69
+ - Classes: `PascalCase` (`OrderService`, `PaymentHandler`)
70
+ - Functions/methods: `camelCase` (`createOrder`, `findUserById`)
71
+ - Constants: `UPPER_SNAKE_CASE` (`MAX_RETRIES`, `DEFAULT_TIMEOUT_MS`)
72
+ - Interfaces/Types: `PascalCase` with no prefix (`OrderData`, not `IOrderData`)
73
+
74
+ **Python:**
75
+ - Files: `snake_case.py` (`order_service.py`)
76
+ - Classes: `PascalCase` (`OrderService`)
77
+ - Functions/methods: `snake_case` (`create_order`, `find_user_by_id`)
78
+ - Constants: `UPPER_SNAKE_CASE` (`MAX_RETRIES`)
79
+
80
+ **Go:**
81
+ - Files: `snake_case.go` (`order_service.go`)
82
+ - Exported: `PascalCase` (`CreateOrder`)
83
+ - Unexported: `camelCase` (`validateInput`)
84
+ - Packages: lowercase, single word (`orders`, not `orderService`)
85
+
86
+ ### Request/Response Envelope Standards
87
+
88
+ Standardize the shape of all API responses:
89
+
90
+ ```json
91
+ {
92
+ "data": { "id": "ord_123", "status": "created" },
93
+ "meta": { "requestId": "req_abc", "timestamp": "2024-01-15T10:30:00Z" }
94
+ }
95
+ ```
96
+
97
+ For error responses:
98
+ ```json
99
+ {
100
+ "error": { "code": "NOT_FOUND", "message": "Order not found", "details": [] },
101
+ "meta": { "requestId": "req_abc", "timestamp": "2024-01-15T10:30:00Z" }
102
+ }
103
+ ```
104
+
105
+ Never mix success and error shapes. A response has either `data` or `error`, never both. This makes client-side type discrimination trivial. Include `meta` on every response for debugging and correlation.
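
One way to express the "either `data` or `error`, never both" rule on the client side is a discriminated union. The type and function names below (`ApiResponse`, `ok`, `fail`, `isFailure`, `summarize`) are illustrative assumptions, not names defined by the package.

```typescript
interface Meta {
  requestId: string;
  timestamp: string;
}

interface ApiError {
  code: string;
  message: string;
  details: unknown[];
}

// `never` on the opposite key makes a mixed shape a compile-time error.
type SuccessResponse<T> = { data: T; meta: Meta; error?: never };
type FailureResponse = { error: ApiError; meta: Meta; data?: never };
type ApiResponse<T> = SuccessResponse<T> | FailureResponse;

function ok<T>(data: T, requestId: string): SuccessResponse<T> {
  return { data, meta: { requestId, timestamp: new Date().toISOString() } };
}

function fail(
  code: string,
  message: string,
  requestId: string,
  details: unknown[] = [],
): FailureResponse {
  return {
    error: { code, message, details },
    meta: { requestId, timestamp: new Date().toISOString() },
  };
}

// Type guard: narrows the union based on the presence of `error`.
function isFailure<T>(res: ApiResponse<T>): res is FailureResponse {
  return res.error !== undefined;
}

function summarize<T>(res: ApiResponse<T>): string {
  return isFailure(res) ? `error ${res.error.code}` : `ok ${JSON.stringify(res.data)}`;
}
```

With this shape, a client only needs one check (`isFailure`) before the compiler knows which fields are available.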
@@ -0,0 +1,102 @@
1
+ ---
2
+ name: backend-data-modeling
3
+ description: Relational vs document modeling tradeoffs, migration strategies, connection pooling, ORM vs query builder tradeoffs, multi-tenancy patterns, and eventual consistency
4
+ topics: [backend, data-modeling, database, migrations, orm, multi-tenancy, eventual-consistency, connection-pooling]
5
+ ---
6
+
7
+ Data modeling decisions have the highest reversal cost of any backend choice. A schema design that seemed reasonable at launch can become an operational crisis at scale — queries that worked at 10,000 rows fail at 100 million. The goal is to match the data model to the access patterns of the application, not to normalize for its own sake or to denormalize prematurely. Design the schema with the queries in mind from day one.
8
+
9
+ ## Summary
10
+
11
+ Data modeling decisions have the highest reversal cost of any backend choice. Choose relational databases for transactional integrity and complex queries, document stores for hierarchical data read as a unit. Use versioned migrations for all schema changes with non-destructive patterns for zero-downtime deploys. Every production backend requires connection pooling; pool exhaustion under load is a complete outage.
12
+
13
+ Multi-tenancy, eventual consistency, and ORM vs query builder selection are load-bearing choices that must be made with explicit tradeoff acknowledgment before the first schema ships.
14
+
15
+ ## Deep Guidance
16
+
17
+ ### Relational vs Document Modeling
18
+
19
+ Choose based on your access patterns and data structure, not familiarity:
20
+
21
+ **Relational (PostgreSQL, MySQL):**
22
+ - Strong fit for: highly relational data with many join paths, transactional integrity across multiple entities (financial records, inventory), complex queries with ad hoc filter combinations, strict schema enforcement.
23
+ - Use normalized forms (3NF) as the default. Denormalize only for specific, measured performance bottlenecks, not speculatively.
24
+ - PostgreSQL's JSONB column type gives you document storage inside a relational database — defer the choice of full document store until the access pattern clearly demands it.
25
+
26
+ **Document (MongoDB, DynamoDB, Firestore):**
27
+ - Strong fit for: hierarchical or nested data read as a unit (a blog post with all its comments), schema-less or rapidly evolving schemas, single-entity lookups by a known ID, extreme write throughput.
28
+ - Weak fit for: ad hoc querying across multiple attributes, transactions spanning multiple documents, highly relational data with many access patterns.
29
+ - The most common document modeling mistake is replicating relational joins via application-level fetching — this is an N+1 at the data layer and scales poorly.
30
+
31
+ ### Migration Strategies
32
+
33
+ Every schema change must go through a versioned migration:
34
+
35
+ - **Non-destructive first**: Add new columns as nullable or with defaults. Only make existing columns NOT NULL after backfilling all rows. This enables zero-downtime deployments where the old and new app versions run simultaneously.
36
+ - **Expand-contract pattern**: For large schema changes — (1) expand: add new column/table while keeping old; (2) backfill data; (3) deploy code that writes to both; (4) contract: remove old column/table once all consumers use the new one.
37
+ - **Irreversible migrations**: Some changes cannot be undone (dropping a column, deleting data). Flag these explicitly in code and in your migration changelog. Require explicit human sign-off before running in production.
38
+ - **Test migrations against production data size**: A migration that runs in 2 ms on 1,000 rows may lock a table for 20 minutes on 100 million rows. Use `pt-online-schema-change` (MySQL) or `pg_repack` / `CREATE INDEX CONCURRENTLY` (PostgreSQL) to keep locking to a minimum when migrating large tables.
39
+
40
+ ### Connection Pooling
41
+
42
+ Every production backend must use a connection pool. Direct connections to the database at high concurrency exhaust database connection limits within minutes:
43
+
44
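+ - **Pooling layer**: Use an external pooler such as PgBouncer (PostgreSQL) or ProxySQL (MySQL), or the pool built into the driver or ORM (Prisma, Sequelize, Knex). Configure the minimum size, maximum size, and idle timeout (for example `min`, `max`, and `idleTimeoutMillis` in Knex).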
45
+ - **Right-sizing the pool**: A pool larger than the database can handle is worse than no pool. PostgreSQL's default `max_connections` limit is 100, shared across every client of the database. Rule of thumb: pool size = (number of database server CPU cores × 2) + effective spindle count.
46
+ - **Monitor pool saturation**: Alert when the number of clients waiting for a connection (for example `cl_waiting` in PgBouncer's `SHOW POOLS` output) stays above zero for more than a few seconds. Pool exhaustion under load is a complete service outage.
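
The rule of thumb above is simple arithmetic, sketched here in TypeScript. `recommendedPoolSize` follows the formula as stated; `perInstancePoolMax` is an added assumption (not from the source) showing one way to split a database-wide connection budget across application instances.

```typescript
// Rule of thumb from the text: (CPU cores × 2) + effective spindle count.
function recommendedPoolSize(cpuCores: number, effectiveSpindles: number): number {
  if (!Number.isInteger(cpuCores) || cpuCores < 1) {
    throw new RangeError("cpuCores must be a positive integer");
  }
  if (!Number.isInteger(effectiveSpindles) || effectiveSpindles < 0) {
    throw new RangeError("effectiveSpindles must be a non-negative integer");
  }
  return cpuCores * 2 + effectiveSpindles;
}

// Assumption for illustration: divide the database connection limit, minus a
// reserve for admin and migration sessions, evenly across app instances.
function perInstancePoolMax(
  dbMaxConnections: number,
  reservedConnections: number,
  appInstances: number,
): number {
  const available = dbMaxConnections - reservedConnections;
  return Math.max(1, Math.floor(available / appInstances));
}

console.log(recommendedPoolSize(4, 1)); // 4 cores, 1 spindle → 9
console.log(perInstancePoolMax(100, 10, 6)); // (100 - 10) / 6 → 15
```

Treat both numbers as starting points and adjust them against measured pool wait times under load.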
47
+
48
+ ### ORM vs Query Builder
49
+
50
+ | | ORM (Prisma, TypeORM, Hibernate) | Query Builder (Knex, jOOQ, Drizzle) |
51
+ |---|---|---|
52
+ | Productivity | Higher for standard CRUD | Higher for complex SQL |
53
+ | Complex queries | Awkward, often forces raw SQL | Natural SQL expression |
54
+ | Type safety | High (Prisma) to medium | Medium to high (Drizzle) |
55
+ | Performance | Can hide N+1s | Explicit, predictable |
56
+ | Migration | Built-in (Prisma) | Manual or separate tool |
57
+
58
+ Use an ORM for standard CRUD-heavy domains. Use a query builder or raw SQL for analytics, reporting, or any service where query control is critical. Do not mix ORMs in the same service — pick one.
59
+
60
+ ### Multi-Tenancy Patterns
61
+
62
+ Three approaches, ordered from weakest to strongest isolation:
63
+
64
+ - **Shared schema / shared tables**: Row-level isolation via a `tenant_id` column. Simplest to operate; highest risk of data leakage if a query omits the tenant filter. Always enforce tenant filtering via middleware or database row-level security (PostgreSQL RLS).
65
+ - **Shared database / separate schemas**: Each tenant gets their own schema. Better isolation, more complex migrations (must apply to all tenant schemas), good balance for B2B SaaS.
66
+ - **Separate databases per tenant**: Full isolation. High operational overhead — connection pools multiply by tenant count. Justified for enterprise customers with contractual data isolation requirements.
67
+
68
+ ### Eventual Consistency
69
+
70
+ Distributed systems and event-driven architectures introduce eventual consistency:
71
+
72
+ - **Identify consistency requirements explicitly**: Which operations require immediate consistency (payment confirmation, inventory reservation) vs eventual (email notification, analytics event)? Document this per use case.
73
+ - **Idempotency**: All event consumers and queue workers must be idempotent — processing the same message twice produces the same outcome as processing it once. Use an idempotency key stored in the database to detect duplicates.
74
+ - **Sagas for distributed transactions**: When a business operation spans multiple services with no shared transaction, use the Saga pattern — a sequence of local transactions with compensating transactions for rollback. Choreography (event-driven) or orchestration (central coordinator) variants.
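
The idempotency bullet above can be sketched as a wrapper around a message handler. This is an in-memory illustration only: `ProcessedKeys` stands in for a database table with a unique constraint on the key, and all names are hypothetical.

```typescript
interface Message<T> {
  idempotencyKey: string;
  payload: T;
}

// In-memory stand-in for a table with a unique constraint on the key.
class ProcessedKeys {
  private readonly seen = new Set<string>();

  // Returns true only the first time a key is claimed.
  claim(key: string): boolean {
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

// Wrap a handler so that redelivered messages are detected and skipped.
// Note: in a real system the key insert and the handler's writes should share
// one database transaction, so a failed handler does not leave the key claimed.
function makeIdempotent<T>(
  store: ProcessedKeys,
  handler: (payload: T) => void,
): (msg: Message<T>) => "processed" | "duplicate" {
  return (msg) => {
    if (!store.claim(msg.idempotencyKey)) return "duplicate";
    handler(msg.payload);
    return "processed";
  };
}
```

Delivering the same message twice through the wrapped handler applies its effect once and reports the second delivery as a duplicate.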
75
+
76
+ ### Index Strategy
77
+
78
+ An unindexed query on a large table forces a full scan, which shows up as a latency spike and as heavy I/O that slows other queries. Index columns that appear in `WHERE`, `ORDER BY`, `GROUP BY`, and `JOIN ON` clauses. Over-indexing is also a problem because every index slows writes. Review the query plan (`EXPLAIN ANALYZE`) for every significant query before shipping. Index maintenance is a recurring task, not a one-time setup.
79
+
80
+ ### Soft Deletes vs Hard Deletes
81
+
82
+ - **Soft deletes**: Add a `deleted_at` timestamp column. Rows are never physically removed; queries filter with `WHERE deleted_at IS NULL`. Benefits: audit trail, easy recovery, referential integrity preserved. Cost: every query must include the filter (use a database view or ORM default scope to enforce this), storage grows indefinitely.
83
+ - **Hard deletes**: Rows are physically removed with `DELETE`. Benefits: simpler queries, smaller tables, no filter overhead. Cost: data is gone, and rows referenced by foreign keys either block the delete or cascade it to dependent rows.
84
+ - **Archive pattern**: Move deleted rows to a separate `_archive` table in the same transaction. Active table stays clean; historical data is preserved. Best of both worlds at the cost of archive table maintenance.
85
+
86
+ Default to soft deletes for business entities (users, orders, products). Use hard deletes for ephemeral data (sessions, temp tokens, analytics events past retention).
87
+
88
+ ### Data Retention and Purging
89
+
90
+ Define retention policies explicitly for each table:
91
+
92
+ - **Regulatory**: Financial records may require 7-year retention. Medical records vary by jurisdiction. Document the legal basis for each retention period.
93
+ - **Operational**: Log tables, event tables, and session tables should have automated purging. Use database partitioning by date and drop old partitions — this is orders of magnitude faster than `DELETE WHERE created_at < ?`.
94
+ - **Implementation**: Schedule a recurring job (cron, pg_cron, Kubernetes CronJob) that purges expired data. Alert if the job fails silently — unpurged data grows disk usage and degrades query performance over months.
95
+
96
+ ### Query Optimization Fundamentals
97
+
98
+ Before optimizing, measure. Use `EXPLAIN ANALYZE` (PostgreSQL) or `EXPLAIN FORMAT=JSON` (MySQL) to understand the query plan:
99
+
100
+ - **Sequential scans on large tables**: Add an index on the filtered column. A sequential scan on a million-row table that should be an index scan is the most common performance bug.
101
+ - **Index-only scans**: When the query only needs columns in the index, the database reads the index without touching the table. Use covering indexes (`CREATE INDEX ... INCLUDE (col)`) for frequently queried column combinations.
102
+ - **Join order**: The query planner usually picks the right join order, but with 5+ joins, it may choose poorly. Use `EXPLAIN` to verify. If the planner is wrong, consider rewriting as CTEs or materializing intermediate results.
@@ -0,0 +1,100 @@
1
+ ---
2
+ name: backend-deployment
3
+ description: Containerization best practices, serverless patterns, health check endpoints, graceful shutdown, and deployment strategies
4
+ topics: [backend, deployment, docker, serverless, health-checks, graceful-shutdown, blue-green, canary]
5
+ ---
6
+
7
+ Deployment reliability is a multiplier on every other engineering investment — a well-written service that deploys poorly will cause more incidents than a mediocre service that deploys safely and rolls back cleanly.
8
+
9
+ ## Summary
10
+
11
+ Backend deployment covers containerization with multi-stage builds, serverless cold-start mitigation, health check endpoints, graceful shutdown, and deployment strategies. Every service needs liveness (`/health`) and readiness (`/ready`) endpoints. Handle `SIGTERM` for clean request draining during deploys.
12
+
13
+ Use blue-green or canary deployment strategies for production to minimize downtime and catch regressions under real traffic. Always run containers as non-root users with read-only filesystems.
14
+
15
+ ## Deep Guidance
16
+
17
+ ### Containerization
18
+
19
+ **Multi-stage Dockerfile:** Use separate build and runtime stages. The build stage installs all dependencies and compiles. The runtime stage copies only the compiled output and production dependencies — no compiler toolchain, no dev dependencies, no source maps in production. This reduces image size by 60–80% and eliminates build-time tools as attack surface.
20
+
21
+ ```dockerfile
22
+ FROM node:20-alpine AS builder
23
+ WORKDIR /app
24
+ COPY package*.json ./
25
+ RUN npm ci
26
+ COPY . .
27
+ RUN npm run build && npm prune --omit=dev
28
+
29
+ FROM node:20-alpine AS runtime
30
+ WORKDIR /app
31
+ COPY --from=builder /app/dist ./dist
32
+ COPY --from=builder /app/node_modules ./node_modules
33
+ USER node
34
+ EXPOSE 3000
35
+ CMD ["node", "dist/main.js"]
36
+ ```
37
+
38
+ **Distroless base images:** For production, consider `gcr.io/distroless/nodejs` or `cgr.dev/chainguard/node`. These images contain only the runtime and its dependencies — no shell, no package manager, no OS utilities. Smaller attack surface, pass most container security scanners. Trade-off: harder to debug; use a separate debug image with a shell for incident investigation.
39
+
40
+ **Non-root user:** Always run the container process as a non-root user (`USER node` or `USER nobody`). Combine with read-only filesystems (`--read-only`) and a writable tmpfs for temp files. Set resource limits (CPU, memory) to prevent noisy-neighbor problems in shared clusters.
41
+
42
+ ### Serverless Patterns
43
+
44
+ **Cold start mitigation:** Cold starts add 200ms–2s of latency on the first request. Minimize cold starts by: keeping the function bundle small (tree-shake, avoid heavy dependencies), using provisioned concurrency for latency-sensitive paths, initializing database connections in the module scope (not inside the handler), and using connection proxies (RDS Proxy, PgBouncer) that maintain connection pools outside the function lifecycle.
45
+
46
+ **Connection pooling:** Serverless functions can spawn thousands of instances simultaneously, overwhelming a traditional database's connection limit. Use a connection pooler (PgBouncer, RDS Proxy, Neon serverless driver) that sits between functions and the database. Configure pool size per function instance conservatively (1–5 connections).
47
+
48
+ **Stateless design:** Serverless functions must be stateless. Store all session state, cache, and shared state in external services (Redis, DynamoDB). Write output to S3 or a database, never to the local filesystem.
49
+
50
+ ### Health Check Endpoints
51
+
52
+ Every service must expose two health endpoints:
53
+
54
+ **`GET /health` (liveness):** Returns 200 if the process is running and not deadlocked. Checked by the orchestrator to decide whether to restart the container. Must not check external dependencies — if the database is down, the container should not be restarted (it won't help). Respond within 50ms.
55
+
56
+ **`GET /ready` (readiness):** Returns 200 if the service is ready to serve traffic; returns 503 otherwise. Checked before routing traffic to a new instance. Should verify critical dependencies: database connectivity, required cache availability. The load balancer takes the instance out of rotation while it returns 503. Add a startup check (in Kubernetes, a startup probe or `initialDelaySeconds`) so new instances don't receive traffic before warming up.
57
+
58
+ Include response body with version, uptime, and dependency statuses for operational visibility.
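
A framework-agnostic sketch of the two endpoints, assuming each dependency check is an async function that resolves to `true` when healthy. The function names and the body fields are illustrative choices, not an API defined by the package.

```typescript
type DependencyCheck = () => Promise<boolean>;

interface HealthResult {
  status: 200 | 503;
  body: Record<string, unknown>;
}

const startedAt = Date.now();

// Liveness: answers from process state only, never touches dependencies.
function liveness(version: string): HealthResult {
  const uptimeSeconds = Math.floor((Date.now() - startedAt) / 1000);
  return { status: 200, body: { status: "ok", version, uptimeSeconds } };
}

// Turn raw check results into a readiness response.
function readinessFrom(results: Record<string, boolean>): HealthResult {
  const dependencies: Record<string, "up" | "down"> = {};
  let ready = true;
  for (const [name, healthy] of Object.entries(results)) {
    dependencies[name] = healthy ? "up" : "down";
    if (!healthy) ready = false;
  }
  return {
    status: ready ? 200 : 503,
    body: { status: ready ? "ready" : "not_ready", dependencies },
  };
}

// Readiness: run every critical dependency check in parallel.
async function readiness(checks: Record<string, DependencyCheck>): Promise<HealthResult> {
  const results: Record<string, boolean> = {};
  await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      try {
        results[name] = await check();
      } catch {
        results[name] = false; // a throwing check counts as unhealthy
      }
    }),
  );
  return readinessFrom(results);
}
```

An HTTP handler for `/health` would return `liveness(...)` directly, while `/ready` would await `readiness({ database: ..., cache: ... })` and send back its `status` and `body`.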
59
+
60
+ ### Graceful Shutdown
61
+
62
+ Handle `SIGTERM` (sent by Kubernetes, Docker, and process managers before killing the process):
63
+
64
+ 1. Stop accepting new connections (close the HTTP server's listening socket).
65
+ 2. Allow in-flight requests to complete (drain the request queue).
66
+ 3. Close database connections and message queue consumers cleanly.
67
+ 4. Flush buffered logs and metrics.
68
+ 5. Exit with code 0.
69
+
70
+ Set a shutdown timeout (10–30 seconds). If draining takes longer, force exit. In Kubernetes, set `terminationGracePeriodSeconds` to at least this timeout so the kubelet does not send `SIGKILL` first. Without graceful shutdown, deployments cause dropped requests and incomplete transactions.
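
The shutdown sequence and timeout can be sketched as an ordered list of steps raced against a timer. `gracefulShutdown` and `ShutdownStep` are hypothetical names, and the wiring to `SIGTERM` is shown only as a comment because it depends on the server and clients in use.

```typescript
interface ShutdownStep {
  name: string;
  run: () => Promise<void>;
}

// Run the steps strictly in order: stop accepting, drain, close, flush.
async function runSteps(steps: ShutdownStep[]): Promise<void> {
  for (const step of steps) {
    await step.run();
  }
}

// Resolves with the exit code the process should use:
// 0 when every step finished in time, 1 on timeout or step failure.
function gracefulShutdown(steps: ShutdownStep[], timeoutMs: number): Promise<number> {
  return new Promise<number>((resolve) => {
    const timer = setTimeout(() => resolve(1), timeoutMs); // forced-exit path
    runSteps(steps).then(
      () => {
        clearTimeout(timer);
        resolve(0);
      },
      () => {
        clearTimeout(timer);
        resolve(1);
      },
    );
  });
}

// Possible wiring in a Node.js service (the step bodies are placeholders):
//
//   process.on("SIGTERM", () => {
//     gracefulShutdown(
//       [
//         { name: "stop-accepting-and-drain", run: closeHttpServer },
//         { name: "close-db-and-consumers", run: closeConnections },
//         { name: "flush-logs-and-metrics", run: flushTelemetry },
//       ],
//       15_000,
//     ).then((code) => process.exit(code));
//   });
```

Returning the exit code instead of calling `process.exit` inside the helper keeps the sequence testable without terminating the process.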
71
+
72
+ ### Blue-Green and Canary Deploys
73
+
74
+ **Blue-green:** Maintain two identical production environments (blue and green). Deploy the new version to the inactive environment, run smoke tests, then cut over traffic in one atomic switch. Instant rollback: switch traffic back. Cost: double the infrastructure during deployment.
75
+
76
+ **Canary:** Route a small percentage of traffic (1%, 5%, 25%) to the new version. Monitor error rates, latency, and business metrics. Gradually increase the percentage if metrics are healthy, or roll back if they degrade. Better for catching issues that only appear under real traffic patterns. Requires traffic splitting at the load balancer or service mesh layer. Define explicit rollback criteria before deploying.
77
+
78
+ ### Infrastructure as Code
79
+
80
+ Define all infrastructure in version-controlled configuration:
81
+
82
+ - **Terraform / Pulumi**: Define cloud resources (load balancers, databases, queues, DNS) as code. Every infrastructure change goes through PR review and CI validation. `terraform plan` shows the diff before `terraform apply` makes changes.
83
+ - **Docker Compose for local**: Mirror production infrastructure locally. Pin exact versions to prevent local-production drift.
84
+ - **Kubernetes manifests**: Use Helm charts or Kustomize for templating. Keep environment-specific values in separate overlay files, not hardcoded in templates.
85
+
86
+ Never create production infrastructure manually through a cloud console. Manual changes create configuration drift that causes incidents when the next Terraform apply overwrites them.
87
+
88
+ ### Resource Limits and Autoscaling
89
+
90
+ Every container must have explicit resource requests and limits:
91
+
92
+ - **CPU and memory requests**: The minimum resources the container needs. The scheduler uses these to place the container on a node with sufficient capacity.
93
+ - **CPU and memory limits**: The maximum resources the container can consume. Exceeding memory limits causes an OOM kill; exceeding CPU limits causes throttling.
94
+ - **Autoscaling**: Configure horizontal pod autoscaling (HPA) based on CPU utilization or custom metrics (request rate, queue depth). Set minimum replicas to handle baseline traffic without cold starts. Set maximum replicas to prevent runaway scaling from exhausting the cluster.
95
+
96
+ Right-size by observing actual resource usage under load, not by guessing. Over-provisioning wastes money; under-provisioning causes throttling and OOM kills.
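To see how minimum and maximum replicas interact with a utilization target, the core of the Kubernetes HPA calculation (desired = ceil(current × currentMetric / targetMetric)) can be sketched as below. The function and parameter names are hypothetical, and the real controller also applies a tolerance band and stabilization windows that are omitted here.

```typescript
interface ScalingInput {
  currentReplicas: number;
  currentMetric: number; // e.g. average CPU utilization in percent
  targetMetric: number;  // e.g. target CPU utilization in percent
  minReplicas: number;
  maxReplicas: number;
}

// Scale proportionally to how far the metric is from its target,
// then clamp to the configured replica bounds.
function desiredReplicas(input: ScalingInput): number {
  const raw = Math.ceil(input.currentReplicas * (input.currentMetric / input.targetMetric));
  return Math.min(input.maxReplicas, Math.max(input.minReplicas, raw));
}
```

With 4 replicas at 90% CPU against a 60% target, this asks for 6 replicas; a runaway metric is capped at `maxReplicas`, and a quiet period never drops below `minReplicas`.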
97
+
98
+ ### Deployment Checklist
99
+
100
+ Before every production deployment: Are database migrations backward-compatible with both the old and new app version? Has the new version been validated in a staging environment? Are rollback procedures tested and documented? Are monitoring dashboards open and ready to observe the deployment? Is the on-call engineer aware of the deployment? Are feature flags configured to allow incremental rollout?
@@ -0,0 +1,102 @@
1
+ ---
2
+ name: backend-dev-environment
3
+ description: Docker Compose for local databases and queues, database seeding and migration scripts, API testing tools, environment variable management, and local SSL setup
4
+ topics: [backend, dev-environment, docker, migrations, testing, environment-variables]
5
+ ---
6
+
7
+ A backend development environment that requires manual setup steps is a productivity drain and an onboarding failure. The standard should be: clone the repo, run one command, and have a fully functional local environment in under five minutes. Docker Compose is the primary tool for achieving this — it pins the exact versions of every external dependency and makes the environment reproducible across all developer machines.
8
+
9
+ ## Summary
10
+
11
+ A backend dev environment should go from fresh checkout to running API in under five minutes. Docker Compose pins infrastructure dependencies (databases, caches, queues) to exact production-matching versions. Migrations are the source of truth for schema evolution; seed scripts provide a baseline dataset. Validate all environment variables at startup with a schema.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Docker Compose for Infrastructure
16
+
17
+ Use Docker Compose to run all stateful infrastructure dependencies locally:
18
+
19
+ - **Databases**: Pin the exact version that matches production. `postgres:16.2-alpine`, not `postgres:latest`. Version drift between local and production is a common source of query behavior differences.
20
+ - **Cache / message queues**: Redis, RabbitMQ, Kafka, or SQS-compatible LocalStack — all run well in Compose. Define realistic resource limits to catch memory pressure issues locally.
21
+ - **Health checks**: Add `healthcheck` to each service so `docker compose up --wait` blocks until the database is actually accepting connections, not just when the container starts.
22
+ - **Named volumes**: Use named volumes (not bind mounts) for database data. This prevents accidental data loss when rebuilding containers.
23
+
24
+ A `compose.yml` at the repo root is the convention. A `compose.override.yml` (gitignored) allows developer-specific customizations without touching the shared file.
25
+
26
+ ### Database Migrations
27
+
28
+ Migrations are the source of truth for schema evolution. Never modify the database directly in development or production:
29
+
30
+ - **Migration tool selection**: Flyway (Java, SQL-based), Liquibase (XML/YAML/SQL), Alembic (Python), golang-migrate, Prisma Migrate, or Knex — choose one and commit to it.
31
+ - **Up/down migrations**: Every migration needs a reversible `down` script. Test the down migration in CI — it is the escape hatch for a bad deployment.
32
+ - **Naming convention**: `V20240115__add_order_status_index.sql` — timestamp prefix ensures ordering is deterministic across branches.
33
+ - **Run on startup**: In development, configure the app to run pending migrations automatically on startup. In production, run migrations as a pre-deployment step before traffic shifts.
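The value of the timestamp prefix is easiest to see in code. The sketch below parses names in the convention shown above and returns the migrations still to be applied, oldest first. It is an illustration only: real migration tools track applied versions in their own history table, and `parseMigration` and `pendingMigrations` are hypothetical names.

```typescript
// Matches names like V20240115__add_order_status_index.sql
const MIGRATION_NAME = /^V(\d{8,14})__([a-z0-9_]+)\.sql$/;

interface Migration { version: string; description: string; file: string }

// Split a file name into its version and description; reject anything else.
function parseMigration(file: string): Migration {
  const match = MIGRATION_NAME.exec(file);
  if (!match) throw new Error(`Invalid migration name: ${file}`);
  return { version: match[1], description: match[2], file };
}

// Return migrations that have not been applied yet, sorted by version.
function pendingMigrations(files: string[], applied: Set<string>): Migration[] {
  return files
    .map(parseMigration)
    .filter((m) => !applied.has(m.version))
    .sort((a, b) => (a.version < b.version ? -1 : a.version > b.version ? 1 : 0));
}
```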
34
+
35
+ ### Database Seeding
36
+
37
+ Seed scripts provide a consistent baseline dataset for development and testing:
38
+
39
+ - **Seed data separate from migrations**: Never embed seed data in migration files. Migrations are schema changes; seed scripts populate data for local development.
40
+ - **Idempotent seeds**: Seed scripts must be safe to run multiple times. Use `INSERT ... ON CONFLICT DO NOTHING` or check for existence before inserting.
41
+ - **Representative data**: Seed data should cover edge cases — empty states, boundary values, long strings — not just happy-path records.
42
+ - **`npm run db:seed` or `make seed`**: One command to seed the database from a fresh state.
43
+
44
+ ### API Testing Tools
45
+
46
+ Maintain machine-readable API test artifacts in the repository:
47
+
48
+ - **Postman / Bruno collections**: Commit a `postman_collection.json` or `bruno/` directory with requests for every endpoint. Parameterize base URLs and auth tokens via environment files. Bruno is preferred for new projects — it uses plain-text files that diff well in git.
49
+ - **curl scripts**: A `scripts/api/` directory with shell scripts exercising each endpoint is a low-overhead alternative. Always useful for CI health checks.
50
+ - **REST Client files**: `.http` files (VS Code REST Client extension) are readable and committable. Good for simple endpoint documentation.
51
+
52
+ ### Environment Variable Management
53
+
54
+ - **`.env.example`**: Committed to the repo. Contains all required variable names with placeholder values and comments explaining each. New developers copy this to `.env`.
55
+ - **`.env`**: Gitignored. Developer-specific values. Never committed.
56
+ - **Validation at startup**: Parse and validate the entire env config with a schema before the server starts, for example Zod (with validators such as `z.string().url()`) or envalid. Fail with a clear error listing every missing or invalid variable, not a cryptic runtime crash 30 minutes later.
57
+ - **Secrets vs config**: Secrets (database passwords, private keys) are fetched from a secrets manager in production environments. Config (feature flags, timeouts) can be env vars.
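Startup validation can be sketched without any library to show the intended behavior: collect every problem, then fail once with a complete list. The variable names (`DATABASE_URL`, `PORT`, `LOG_LEVEL`) and the `loadEnv` function are assumptions for illustration; a schema library such as Zod or envalid would replace the hand-written checks.

```typescript
type EnvSource = Record<string, string | undefined>;

interface AppEnv { databaseUrl: string; port: number; logLevel: string }

const LOG_LEVELS = ["debug", "info", "warn", "error"];

// Validate everything up front and report all problems in a single error.
function loadEnv(source: EnvSource): AppEnv {
  const problems: string[] = [];

  const databaseUrl = source.DATABASE_URL ?? "";
  if (databaseUrl === "") {
    problems.push("DATABASE_URL is missing");
  } else {
    try {
      new URL(databaseUrl); // throws if the value is not a parseable URL
    } catch {
      problems.push("DATABASE_URL is not a valid URL");
    }
  }

  const port = Number(source.PORT ?? "3000"); // default is an assumption
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    problems.push("PORT must be an integer between 1 and 65535");
  }

  const logLevel = source.LOG_LEVEL ?? "info";
  if (!LOG_LEVELS.includes(logLevel)) {
    problems.push(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(", ")}`);
  }

  if (problems.length > 0) {
    throw new Error(`Invalid environment:\n- ${problems.join("\n- ")}`);
  }
  return { databaseUrl, port, logLevel };
}
```

At process start this would be called once as `loadEnv(process.env)`, before the HTTP server begins listening.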
58
+
59
+ ### Local SSL
60
+
61
+ For services that require HTTPS locally (OAuth redirects, secure cookies, mixed-content testing):
62
+
63
+ - **mkcert**: Generates locally trusted TLS certificates signed by a local CA. Run `mkcert -install` once to add that CA to the system trust store; after that, `mkcert localhost 127.0.0.1` produces a certificate that browsers trust without warnings.
64
+ - **Caddy reverse proxy**: Configure Caddy as a local reverse proxy with automatic HTTPS to avoid configuring TLS in the app itself.
65
+
66
+ ### One-Command Setup
67
+
68
+ The `make setup` or `./scripts/dev-setup.sh` target should: install required tool versions (via asdf or mise), pull Docker images, copy `.env.example` to `.env`, start Compose services, run migrations, and run seed scripts. A developer who has never seen the project should be running a working API in under five minutes.
69
+
70
+ Document any manual steps that cannot be automated (hardware keys, corporate VPN certificates) in a `docs/dev-setup.md`. Every manual step is a future support ticket.
71
+
72
+ ### Database GUI Tools
73
+
74
+ Provide a web-based database admin tool in the Compose stack for development:
75
+
76
+ - **pgAdmin** or **Adminer**: Low-overhead database browsing. Adminer supports PostgreSQL, MySQL, SQLite, and MongoDB in a single container.
77
+ - **TablePlus / DBeaver**: Recommend as desktop alternatives in the dev setup docs. Include connection strings for copy-paste.
78
+
79
+ A developer who cannot see the data cannot debug the application. Make database access a one-click experience in the dev environment.
80
+
81
+ ### Test Data Generation
82
+
83
+ Beyond simple seeds, provide tools for generating realistic test data at scale:
84
+
85
+ - **Faker libraries**: `@faker-js/faker` (Node), `Faker` (Python), `gofakeit` (Go) generate realistic names, addresses, emails, and domain-specific values.
86
+ - **Factory patterns**: Build factory functions that produce valid domain objects with sensible defaults and explicit overrides. `createOrder({ status: 'cancelled' })` should handle all required fields automatically.
87
+ - **Volume seeding**: For performance testing locally, generate 10,000–100,000 rows. This catches query performance issues that are invisible at 100 rows. Include a `make seed-large` target for this scenario.
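A factory in the style described above might look like the following sketch. The `Order` shape, its default values, and `createOrders` are hypothetical; a real project would draw field values from a faker library instead of the fixed placeholders used here.

```typescript
type OrderStatus = "pending" | "paid" | "cancelled";

// Hypothetical domain object, for illustration only.
interface Order {
  id: string;
  customerEmail: string;
  status: OrderStatus;
  totalCents: number;
  createdAt: Date;
}

let sequence = 0;

// Produce a valid order with sensible defaults; callers override only what matters.
function createOrder(overrides: Partial<Order> = {}): Order {
  sequence += 1;
  return {
    id: `order-${sequence}`,
    customerEmail: `customer${sequence}@example.com`,
    status: "pending",
    totalCents: 1999,
    createdAt: new Date("2024-01-15T00:00:00Z"),
    ...overrides,
  };
}

// Volume seeding: generate many rows that share the same overrides.
function createOrders(count: number, overrides: Partial<Order> = {}): Order[] {
  return Array.from({ length: count }, () => createOrder(overrides));
}
```

A test that only cares about cancellation can then call `createOrder({ status: "cancelled" })` and ignore every other field.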
88
+
89
+ ### Hot Reloading Backend Services
90
+
91
+ For Node.js services, use `tsx watch` or `nodemon` for automatic restarts on file change:
92
+
93
+ ```json
94
+ {
95
+ "scripts": {
96
+ "dev": "tsx watch src/server.ts",
97
+ "dev:debug": "tsx watch --inspect src/server.ts"
98
+ }
99
+ }
100
+ ```
101
+
102
+ For Go: `air` provides automatic rebuild and restart. For Python: `uvicorn --reload` handles FastAPI and Starlette. For Rust: `cargo watch -x run`. The goal is sub-second feedback loops for backend changes matching what frontend developers expect from HMR.
@@ -0,0 +1,102 @@
1
+ ---
2
+ name: backend-observability
3
+ description: Structured logging, distributed tracing, RED method metrics, SLO-based alerting, and operational dashboards
4
+ topics: [backend, observability, logging, tracing, metrics, alerting, opentelemetry, slo]
5
+ ---
6
+
7
+ Observability is the ability to answer arbitrary questions about a system's behavior using its outputs alone — without deploying new code. Investing in structured logging, distributed tracing, and SLO-based alerting before the first incident makes the difference between a 5-minute diagnosis and a 5-hour outage.
8
+
9
+ ## Summary
10
+
11
+ Observability requires three pillars: structured JSON logging with correlation IDs, distributed tracing via OpenTelemetry, and RED method metrics (Rate, Errors, Duration) for every service endpoint. SLO-based alerting replaces noisy threshold alerts. Build operational dashboards around these signals before the first incident, not after.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Structured Logging
16
+
17
+ Log in JSON to make logs machine-parseable and searchable in aggregation systems (Datadog, Splunk, CloudWatch Logs Insights, Grafana Loki).
18
+
19
+ **Standard fields every log line should include:**
20
+ - `timestamp` — ISO 8601 with milliseconds
21
+ - `level` — `debug`, `info`, `warn`, `error`
22
+ - `message` — human-readable description
23
+ - `service` — service name and version
24
+ - `traceId` / `requestId` — correlation ID propagated from the incoming request
25
+
26
+ **Log levels:**
27
+ - `debug` — verbose development information; disabled in production by default
28
+ - `info` — normal operational events (request received, job completed, user logged in)
29
+ - `warn` — recoverable issues (retry attempted, fallback used, deprecated API called)
30
+ - `error` — failures requiring attention (unhandled exception, external service down, database error)
31
+
32
+ **Never log:** Passwords, tokens, API keys, credit card numbers, SSNs, or full request bodies of sensitive endpoints. Implement a logging middleware that redacts known-sensitive field names before any log line is written.
33
+
34
+ **Correlation IDs:** Generate a UUID per request at the entry point (API gateway or first middleware). Propagate it through all downstream service calls via the `X-Request-ID` or `traceparent` header. Include it in every log line. This makes it possible to trace a single user request across all services and log streams.
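The standard fields, redaction, and correlation ID can be combined in one small formatter. This is a minimal sketch rather than a replacement for a logging library: the sensitive key list, the service label, and the function names are assumptions.

```typescript
// Field names treated as sensitive (illustrative; extend per project).
const SENSITIVE_KEYS = new Set(["password", "token", "apikey", "authorization", "cardnumber", "ssn"]);

// Recursively replace values of sensitive keys before anything is written.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value instanceof Date) return value;
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}

type Level = "debug" | "info" | "warn" | "error";

// Build one JSON log line carrying the standard fields listed above.
function formatLog(
  level: Level,
  message: string,
  requestId: string,
  context: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    timestamp: new Date().toISOString(), // ISO 8601 with milliseconds
    level,
    message,
    service: "orders-api@1.4.2", // hypothetical service name and version
    requestId,
    context: redact(context),
  });
}
```

Writing the result with `console.log(formatLog(...))` emits one JSON object per line on stdout, which is the shape log shippers expect.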
35
+
36
+ ### Distributed Tracing
37
+
38
+ Use OpenTelemetry (OTel) — the vendor-neutral standard for distributed tracing, metrics, and logs. Instrument once, export to any backend (Jaeger, Zipkin, Datadog APM, Honeycomb, AWS X-Ray).
39
+
40
+ **Trace context propagation:** The OTel SDK automatically propagates `traceparent` and `tracestate` headers when you use instrumented HTTP clients. Always use the instrumented clients — never make raw HTTP calls that bypass propagation.
41
+
42
+ **Auto-instrumentation:** Use OTel auto-instrumentation packages for your framework (Express, FastAPI, Spring Boot) to capture traces for all incoming requests and outgoing calls without manual spans.
43
+
44
+ **Custom spans:** Add manual spans for business operations that span multiple functions: `processOrder`, `chargePayment`, `sendNotification`. Annotate spans with relevant attributes (user ID, order ID, amount). This provides visibility into where time is spent within a service.
45
+
46
+ **Sampling:** In development and staging, trace 100% of requests. In high-throughput production services, use head-based sampling (5–10%) to control costs while preserving statistically representative data. Head-based sampling decides before the outcome of a request is known, so retaining 100% of errored traces regardless of the sampling rate requires tail-based sampling (for example in the OpenTelemetry Collector).
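The difference between the two sampling decisions can be sketched as follows. A head-based decision is made when the trace starts, from the trace ID alone; a tail-based decision is made after the trace completes, so it can take errors into account. This is a simplified illustration, not the OpenTelemetry SDK's sampler implementation, and the function names are hypothetical.

```typescript
// Head-based: map the first 32 bits of the hex trace ID to [0, 1) and compare
// with the ratio. The same trace ID always yields the same decision.
function headSample(traceId: string, ratio: number): boolean {
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
  return bucket < ratio;
}

// Tail-based: decided once the trace has finished. Errored traces are always
// kept; the rest fall back to the ratio-based decision.
function tailKeep(traceId: string, hasError: boolean, ratio: number): boolean {
  return hasError || headSample(traceId, ratio);
}
```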
47
+
48
+ ### Metrics — RED Method
49
+
50
+ The RED method (Rate, Errors, Duration) provides a minimal but complete view of service health:
51
+
52
+ - **Rate:** Requests per second. Baseline normal traffic and alert on significant drops (traffic lost) or spikes (traffic surge, potential attack).
53
+ - **Errors:** Request error rate as a percentage. Alert when the error rate exceeds the SLO threshold. Track by error type (4xx client errors vs. 5xx server errors).
54
+ - **Duration:** Request latency. Track p50, p95, and p99. Alert on p99 exceeding the SLO latency threshold. Latency degradation often precedes error rate spikes.
55
+
56
+ Instrument these three metrics for every service endpoint. For background workers, instrument job throughput (rate), job failures (errors), and job duration.
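As an illustration of what the three RED signals measure, the sketch below computes them from raw request samples over a time window. Production systems normally derive these from histogram metrics instead of raw samples; the types and the nearest-rank percentile used here are assumptions chosen for clarity.

```typescript
interface RequestSample { status: number; durationMs: number }

interface RedSnapshot {
  ratePerSecond: number;
  clientErrorRate: number; // share of 4xx responses
  serverErrorRate: number; // share of 5xx responses
  p50Ms: number;
  p95Ms: number;
  p99Ms: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Summarize one window of samples into Rate, Errors, and Duration.
function redMetrics(samples: RequestSample[], windowSeconds: number): RedSnapshot {
  const total = samples.length;
  const durations = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  const clientErrors = samples.filter((s) => s.status >= 400 && s.status < 500).length;
  const serverErrors = samples.filter((s) => s.status >= 500).length;
  return {
    ratePerSecond: total / windowSeconds,
    clientErrorRate: total === 0 ? 0 : clientErrors / total,
    serverErrorRate: total === 0 ? 0 : serverErrors / total,
    p50Ms: percentile(durations, 50),
    p95Ms: percentile(durations, 95),
    p99Ms: percentile(durations, 99),
  };
}
```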
57
+
58
+ ### SLO-Based Alerting
59
+
60
+ Define SLOs (Service Level Objectives) before writing alert rules. An SLO is a measurable target: "99.9% of requests complete successfully" or "p99 latency < 500ms over a 28-day window."
61
+
62
+ **Error budget:** The tolerance implied by the SLO. A 99.9% availability SLO has a 0.1% error budget — about 44 minutes of downtime per month. Track error budget consumption and alert when it is burning faster than expected.
63
+
64
+ **Burn rate alerts:** Alert when the error budget is being consumed at a rate that would exhaust it before the window ends. A burn rate of 2x means the budget runs out in half the window. Use multi-window burn rate alerts (fast burn: 5-minute window at 14x burn rate; slow burn: 1-hour window at 2x burn rate) to catch both sudden incidents and gradual degradations.
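The arithmetic behind error budgets and burn rates is short enough to sketch directly. The function names are hypothetical, and alert thresholds are passed in as parameters so they can be set per SLO.

```typescript
// Minutes of full downtime the SLO tolerates over the window.
function errorBudgetMinutes(sloTarget: number, windowDays: number): number {
  return (1 - sloTarget) * windowDays * 24 * 60;
}

// How fast the budget is being spent: 1 means exactly on budget,
// 2 means the budget is gone in half the window.
function burnRate(observedErrorRate: number, sloTarget: number): number {
  return observedErrorRate / (1 - sloTarget);
}

// Days until the budget is exhausted if the current burn rate persists.
function daysToExhaustion(rate: number, windowDays: number): number {
  return rate <= 0 ? Infinity : windowDays / rate;
}

// A burn rate alert fires when the rate observed in a window exceeds its threshold.
function isBurning(observedErrorRate: number, sloTarget: number, threshold: number): boolean {
  return burnRate(observedErrorRate, sloTarget) > threshold;
}
```

For a 99.9% SLO over a 30-day window, `errorBudgetMinutes(0.999, 30)` comes to about 43.2 minutes, and an observed error rate of 0.2% corresponds to a burn rate of about 2, exhausting the budget in roughly 15 days.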
65
+
66
+ Avoid threshold alerts on raw metrics — they generate too many false positives. SLO-based alerting reduces alert fatigue and ensures alerts correspond to actual user impact.
67
+
68
+ ### Dashboards
69
+
70
+ Build dashboards around the RED metrics plus infrastructure health. Every service dashboard should show: request rate, error rate, p50/p95/p99 latency, error budget remaining, database query latency, and downstream service call rates.
71
+
72
+ Add runbook links to every alert. An alert without a runbook wastes incident response time.
73
+
74
+ ### Log Aggregation Pipeline
75
+
76
+ Production logging requires a pipeline from application to searchable dashboard:
77
+
78
+ 1. **Application** emits structured JSON logs to stdout (never to files in containerized environments)
79
+ 2. **Log shipper** (Fluentd, Fluent Bit, Vector, or the platform's native agent) collects logs from stdout
80
+ 3. **Log store** (Elasticsearch, Grafana Loki, CloudWatch Logs, Datadog) indexes and stores logs
81
+ 4. **Dashboard** (Kibana, Grafana, CloudWatch Insights) provides search, filtering, and alerting
82
+
83
+ Keep log retention proportional to the debugging window: 14–30 days for verbose logs, 90 days for error logs, 1 year for audit logs. Storage costs scale linearly with retention — budget accordingly.
84
+
85
+ ### Infrastructure Metrics
86
+
87
+ Beyond application metrics, monitor the infrastructure layer:
88
+
89
+ - **Container metrics**: CPU usage, memory usage, restart count, OOM kill events
90
+ - **Database metrics**: Active connections, query latency p95, replication lag, disk usage
91
+ - **Queue metrics**: Queue depth, consumer lag, DLQ depth, message age
92
+ - **Network metrics**: Request rate at the load balancer, 5xx rate, connection errors
93
+
94
+ Set alerts on infrastructure metrics that precede application failures: database connection pool saturation, disk usage above 80%, and replication lag above 10 seconds. Catching these early prevents user-facing incidents.
95
+
96
+ ### Incident Response Observability
97
+
98
+ During an incident, observability tools must answer three questions in under 5 minutes:
99
+
100
+ 1. **What changed?** — Deployment timeline overlaid on error rate graphs. Correlate the incident start time with the most recent deployment.
101
+ 2. **What is affected?** — Service dependency map showing which services are degraded. Error rate by endpoint to identify the blast radius.
102
+ 3. **What is the root cause?** — Distributed trace for a failing request showing where time is spent and where errors originate. Log search by `requestId` for the specific failure context.
@@ -0,0 +1,100 @@
1
+ ---
2
+ name: backend-project-structure
3
+ description: Canonical directory layout for backend services — routes/controllers, services, models/repositories, middleware, utils, config resolution, and dependency injection patterns
4
+ topics: [backend, project-structure, architecture, dependency-injection, layers]
5
+ ---
6
+
7
+ A well-organized backend project is readable to a new engineer in minutes. The directory layout should communicate the architecture: which layer owns which responsibility, where to add a new feature, and where to find any piece of behavior. The most common structural failure is mixing concerns — business logic in controllers, database calls in services, HTTP parsing in repositories. Enforce boundaries through structure first, tooling second.
8
+
9
+ ## Summary
10
+
11
+ A backend project structure should communicate its architecture at a glance: routes own HTTP registration, handlers translate HTTP to service calls, services contain business logic, and repositories abstract data access. Config is loaded and validated at startup. Dependencies are composed explicitly via constructor injection at a single composition root.
12
+
13
+ For small services, group by layer. For large codebases (10+ engineers or 20+ domain concepts), reorganize by feature/domain with layer sub-directories and enforce import boundaries between feature modules.
14
+
15
+ ## Deep Guidance
16
+
17
+ ### Canonical Directory Layout
18
+
19
+ ```
20
+ src/
21
+ routes/ # HTTP route definitions and handler registration
22
+ handlers/ # (or controllers/) Request parsing, validation, response formatting
23
+ services/ # Business logic — no HTTP, no database drivers
24
+ repositories/ # (or models/) Data access — queries, mutations, ORM models
25
+ middleware/ # Cross-cutting HTTP concerns: auth, logging, rate limiting
26
+ utils/ # Pure utility functions: date formatting, hashing, parsing
27
+ config/ # Configuration loading and validation
28
+ types/ # Shared TypeScript types and interfaces
29
+ errors/ # Custom error classes and error code definitions
30
+ jobs/ # Background jobs, queue workers, scheduled tasks
31
+ events/ # Event emitters, message queue publishers/consumers
32
+ app.ts # Application composition root
33
+ server.ts # HTTP server startup (separate from app composition)
34
+ ```
35
+
36
+ ### Layer Responsibilities
37
+
38
+ **routes/** (or routers): Define HTTP paths, methods, and which handler each maps to. No logic — only route registration. In Express: `router.post('/orders', ordersHandler.create)`. In Fastify: schema-decorated route objects.
39
+
40
+ **handlers/** (or controllers/): Translate HTTP into service calls. Parse and validate request parameters and bodies. Call one or more services. Format and send the HTTP response. No business rules, no SQL.
41
+
42
+ **services/**: Contain all business logic. Receive plain objects (not request/response objects). Return plain objects or throw domain errors. Depend on repository interfaces, not concrete implementations — this enables testing without a real database.
43
+
44
+ **repositories/**: Abstract data access behind a consistent interface. A `UserRepository` exposes `findById(id)`, `create(data)`, `update(id, data)` — callers never see SQL or ORM query syntax. Swap implementations (Postgres ↔ in-memory ↔ DynamoDB) without changing services.
45
+
46
+ **middleware/**: Applied globally or per-route at the framework level. Authentication, request ID injection, body parsing, rate limiting, CORS headers. Each middleware does one thing.
47
+
48
+ **config/**: Load environment variables, validate them with a schema (Zod, Joi, envalid), and export a typed config object. Never access `process.env` directly outside this module.
49
+
50
+ ### Config Resolution
51
+
52
+ Config follows a priority order: environment variables override file-based config, which overrides built-in defaults. Validate the entire config object at startup with a strict schema — fail fast with a clear error if a required variable is missing. Never lazy-load config in a request handler; config should be loaded and validated once at process start.
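The priority order can be expressed as a single merge performed once at startup. In this sketch the `Config` shape, the default values, and the environment variable names are assumptions; the file-based layer is passed in as an already-parsed object.

```typescript
interface Config { port: number; requestTimeoutMs: number; logLevel: string }

// Built-in defaults: the lowest-priority layer (values are illustrative).
const DEFAULTS: Config = { port: 3000, requestTimeoutMs: 5000, logLevel: "info" };

// Merge defaults < file config < environment variables, then validate once.
function resolveConfig(
  fileConfig: Partial<Config>,
  env: Record<string, string | undefined>,
): Readonly<Config> {
  const fromEnv: Partial<Config> = {};
  if (env.PORT !== undefined) fromEnv.port = Number(env.PORT);
  if (env.REQUEST_TIMEOUT_MS !== undefined) fromEnv.requestTimeoutMs = Number(env.REQUEST_TIMEOUT_MS);
  if (env.LOG_LEVEL !== undefined) fromEnv.logLevel = env.LOG_LEVEL;

  const merged: Config = { ...DEFAULTS, ...fileConfig, ...fromEnv };

  // Fail fast: an invalid value stops the process before it serves traffic.
  if (!Number.isInteger(merged.port) || merged.port < 1 || merged.port > 65535) {
    throw new Error(`Invalid config: port=${merged.port}`);
  }
  if (!Number.isFinite(merged.requestTimeoutMs) || merged.requestTimeoutMs <= 0) {
    throw new Error(`Invalid config: requestTimeoutMs=${merged.requestTimeoutMs}`);
  }
  return Object.freeze(merged);
}
```

Freezing the result makes it harder for request-time code to mutate configuration after startup.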
53
+
54
+ Separate config from secrets. Config (feature flags, timeouts, rate limits) lives in environment variables or a config file. Secrets (database passwords, API keys) are fetched from a secrets manager (AWS Secrets Manager, HashiCorp Vault, Doppler) at startup or via a sidecar — never committed to version control.
55
+
56
+ ### Dependency Injection Patterns
57
+
58
+ Avoid module-level singletons for dependencies with side effects (database connections, HTTP clients). Instead, compose dependencies explicitly:
59
+
60
+ - **Constructor injection**: Pass dependencies as constructor arguments to service and repository classes. `new OrderService(orderRepository, paymentClient, eventEmitter)`. Testable without framework magic.
61
+ - **Composition root**: Wire all dependencies together in `app.ts` or a dedicated `container.ts`. This is the only place that instantiates concrete implementations.
62
+ - **DI frameworks**: InversifyJS, tsyringe, or NestJS's built-in IoC container add decorator-based injection. Use only if the manual wiring becomes unmanageable — DI frameworks add complexity and build overhead.
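Constructor injection with a composition root can be sketched in a few lines. The interfaces and classes below are hypothetical and trimmed to two dependencies; the point is that `OrderService` depends only on interfaces, so a test can pass in-memory implementations.

```typescript
interface Order { id: string; totalCents: number; status: "pending" | "paid" }

interface OrderRepository {
  findById(id: string): Promise<Order | undefined>;
  save(order: Order): Promise<void>;
}

interface PaymentClient {
  charge(orderId: string, amountCents: number): Promise<boolean>;
}

// Business logic only: no HTTP objects and no database driver.
class OrderService {
  private readonly orders: OrderRepository;
  private readonly payments: PaymentClient;

  constructor(orders: OrderRepository, payments: PaymentClient) {
    this.orders = orders;
    this.payments = payments;
  }

  async pay(orderId: string): Promise<Order> {
    const order = await this.orders.findById(orderId);
    if (!order) throw new Error(`Order not found: ${orderId}`);
    const charged = await this.payments.charge(order.id, order.totalCents);
    if (!charged) throw new Error(`Payment declined for order ${orderId}`);
    const paid: Order = { ...order, status: "paid" };
    await this.orders.save(paid);
    return paid;
  }
}

// In-memory implementation, usable in tests or local development.
class InMemoryOrderRepository implements OrderRepository {
  private readonly rows = new Map<string, Order>();
  async findById(id: string): Promise<Order | undefined> { return this.rows.get(id); }
  async save(order: Order): Promise<void> { this.rows.set(order.id, order); }
}

// Composition root: the one place that chooses concrete implementations.
function buildOrderService(payments: PaymentClient): { service: OrderService; orders: OrderRepository } {
  const orders = new InMemoryOrderRepository();
  return { service: new OrderService(orders, payments), orders };
}
```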
63
+
64
+ ### Feature-Based Structure (Large Codebases)
65
+
66
+ When a codebase grows beyond ~10 engineers or ~20 domain concepts, flat layer directories become unwieldy. Reorganize by feature/domain with layer sub-directories:
67
+
68
+ ```
69
+ src/
70
+ orders/
71
+ orders.handler.ts
72
+ orders.service.ts
73
+ orders.repository.ts
74
+ orders.types.ts
75
+ orders.test.ts
76
+ users/
77
+ users.handler.ts
78
+ users.service.ts
79
+ ...
80
+ shared/
81
+ middleware/
82
+ utils/
83
+ config/
84
+ ```
85
+
86
+ This makes the blast radius of a domain change obvious and allows teams to own vertical slices. Enforce a rule: code in `orders/` may not import from `users/` — communication happens through a service interface or event. Cross-domain dependencies are always explicit.
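The import rule above can be stated as a small predicate over file paths. This is a sketch of the rule itself, assuming the `src/<feature>/` layout shown and a `shared` directory that any feature may import; in practice a lint rule such as ESLint's `no-restricted-imports` or a dependency-analysis tool would do the checking. The function names are hypothetical.

```typescript
// Extract the top-level feature directory from a path like src/orders/orders.service.ts
function featureOf(path: string): string | undefined {
  const match = /^src\/([^/]+)\//.exec(path);
  return match ? match[1] : undefined;
}

// True when a file in one feature imports directly from another feature.
// Imports within a feature, and imports of shared code, are allowed.
function violatesBoundary(importer: string, imported: string): boolean {
  const from = featureOf(importer);
  const to = featureOf(imported);
  if (from === undefined || to === undefined) return false;
  if (from === to || to === "shared") return false;
  return true;
}
```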
87
+
88
+ ### When to Create a New Layer
89
+
90
+ The urge to add layers (`managers/`, `helpers/`, `transformers/`) should be resisted unless the new concept is meaningfully distinct and will appear in more than one place. Every layer is a conceptual tax on new engineers. When something does not fit cleanly, prefer one obvious, deliberate exception to the existing convention over an ad hoc proliferation of vaguely named directories.
91
+
92
+ ### Test File Organization
93
+
94
+ Choose one test file placement pattern and enforce it project-wide:
95
+
96
+ - **Co-located**: `order-service.ts` and `order-service.test.ts` in the same directory. Easier to find the test for a given file. Preferred for unit tests.
97
+ - **Mirror directory**: `src/services/order-service.ts` and `tests/services/order-service.test.ts`. Keeps the `src/` directory clean. Preferred when test files significantly outnumber source files.
98
+ - **Integration tests**: Always in a dedicated `tests/integration/` or `tests/e2e/` directory. These test cross-layer behavior and do not belong next to any single source file.
99
+
100
+ Whichever pattern you choose, enforce it with tooling (for example an ESLint plugin rule or a CI check on file paths) so new files follow the convention automatically.