@coralai/sps-cli 0.41.2 → 0.43.0

Files changed (168)
  1. package/README.md +34 -3
  2. package/dist/commands/cardAdd.d.ts +1 -1
  3. package/dist/commands/cardAdd.d.ts.map +1 -1
  4. package/dist/commands/cardAdd.js +16 -6
  5. package/dist/commands/cardAdd.js.map +1 -1
  6. package/dist/commands/cardDashboard.js +1 -1
  7. package/dist/commands/cardDashboard.js.map +1 -1
  8. package/dist/commands/doctor.d.ts +9 -0
  9. package/dist/commands/doctor.d.ts.map +1 -1
  10. package/dist/commands/doctor.js +3 -314
  11. package/dist/commands/doctor.js.map +1 -1
  12. package/dist/commands/hookCommand.d.ts.map +1 -1
  13. package/dist/commands/hookCommand.js +6 -7
  14. package/dist/commands/hookCommand.js.map +1 -1
  15. package/dist/commands/pmCommand.js +1 -1
  16. package/dist/commands/pmCommand.js.map +1 -1
  17. package/dist/commands/projectInit.d.ts.map +1 -1
  18. package/dist/commands/projectInit.js +60 -37
  19. package/dist/commands/projectInit.js.map +1 -1
  20. package/dist/commands/setup.d.ts.map +1 -1
  21. package/dist/commands/setup.js +3 -30
  22. package/dist/commands/setup.js.map +1 -1
  23. package/dist/commands/skillCommand.d.ts +2 -0
  24. package/dist/commands/skillCommand.d.ts.map +1 -0
  25. package/dist/commands/skillCommand.js +235 -0
  26. package/dist/commands/skillCommand.js.map +1 -0
  27. package/dist/commands/tick.js +1 -1
  28. package/dist/commands/tick.js.map +1 -1
  29. package/dist/core/checklist.d.ts +22 -0
  30. package/dist/core/checklist.d.ts.map +1 -0
  31. package/dist/core/checklist.js +38 -0
  32. package/dist/core/checklist.js.map +1 -0
  33. package/dist/core/checklist.test.d.ts +2 -0
  34. package/dist/core/checklist.test.d.ts.map +1 -0
  35. package/dist/core/checklist.test.js +74 -0
  36. package/dist/core/checklist.test.js.map +1 -0
  37. package/dist/core/config.d.ts +1 -1
  38. package/dist/core/config.d.ts.map +1 -1
  39. package/dist/core/config.js +1 -1
  40. package/dist/core/config.js.map +1 -1
  41. package/dist/core/config.test.js +7 -4
  42. package/dist/core/config.test.js.map +1 -1
  43. package/dist/core/context.d.ts +1 -1
  44. package/dist/core/context.d.ts.map +1 -1
  45. package/dist/core/skillStore.d.ts +46 -0
  46. package/dist/core/skillStore.d.ts.map +1 -0
  47. package/dist/core/skillStore.js +197 -0
  48. package/dist/core/skillStore.js.map +1 -0
  49. package/dist/core/skillStore.test.d.ts +2 -0
  50. package/dist/core/skillStore.test.d.ts.map +1 -0
  51. package/dist/core/skillStore.test.js +190 -0
  52. package/dist/core/skillStore.test.js.map +1 -0
  53. package/dist/engines/EventHandler.test.js +3 -3
  54. package/dist/engines/EventHandler.test.js.map +1 -1
  55. package/dist/engines/MonitorEngine.js +2 -2
  56. package/dist/engines/MonitorEngine.js.map +1 -1
  57. package/dist/engines/SchedulerEngine.js +1 -1
  58. package/dist/engines/SchedulerEngine.js.map +1 -1
  59. package/dist/engines/StageEngine.js +3 -3
  60. package/dist/engines/StageEngine.js.map +1 -1
  61. package/dist/engines/engine-pipeline-adapter.test.js +2 -2
  62. package/dist/engines/engine-pipeline-adapter.test.js.map +1 -1
  63. package/dist/interfaces/TaskBackend.d.ts +3 -1
  64. package/dist/interfaces/TaskBackend.d.ts.map +1 -1
  65. package/dist/main.js +19 -17
  66. package/dist/main.js.map +1 -1
  67. package/dist/models/types.d.ts +16 -1
  68. package/dist/models/types.d.ts.map +1 -1
  69. package/dist/providers/MarkdownTaskBackend.d.ts +2 -1
  70. package/dist/providers/MarkdownTaskBackend.d.ts.map +1 -1
  71. package/dist/providers/MarkdownTaskBackend.js +28 -5
  72. package/dist/providers/MarkdownTaskBackend.js.map +1 -1
  73. package/dist/providers/registry.d.ts.map +1 -1
  74. package/dist/providers/registry.js +5 -7
  75. package/dist/providers/registry.js.map +1 -1
  76. package/package.json +1 -1
  77. package/project-template/.claude/hooks/start.sh +44 -0
  78. package/project-template/.claude/settings.json +1 -1
  79. package/skills/architecture-decision-records/SKILL.md +207 -0
  80. package/skills/backend/SKILL.md +62 -0
  81. package/skills/backend/references/api-design.md +168 -0
  82. package/skills/backend/references/caching.md +181 -0
  83. package/skills/backend/references/data-access.md +173 -0
  84. package/skills/backend/references/layering.md +181 -0
  85. package/skills/backend/references/observability.md +190 -0
  86. package/skills/backend/references/resilience.md +201 -0
  87. package/skills/backend/references/security.md +186 -0
  88. package/skills/backend-architect/SKILL.md +119 -0
  89. package/skills/code-reviewer/SKILL.md +143 -0
  90. package/skills/coding-standards/SKILL.md +60 -0
  91. package/skills/coding-standards/references/clean-code.md +258 -0
  92. package/skills/coding-standards/references/code-review.md +192 -0
  93. package/skills/coding-standards/references/commits-and-prs.md +226 -0
  94. package/skills/coding-standards/references/error-strategy.md +193 -0
  95. package/skills/coding-standards/references/naming.md +185 -0
  96. package/skills/coding-standards/references/tdd.md +171 -0
  97. package/skills/database/SKILL.md +53 -0
  98. package/skills/database/references/indexing.md +190 -0
  99. package/skills/database/references/migrations.md +199 -0
  100. package/skills/database/references/nosql.md +185 -0
  101. package/skills/database/references/queries.md +295 -0
  102. package/skills/database/references/scaling.md +203 -0
  103. package/skills/database/references/schema.md +191 -0
  104. package/skills/database-optimizer/SKILL.md +168 -0
  105. package/skills/debugging-workflow/SKILL.md +244 -0
  106. package/skills/devops/SKILL.md +55 -0
  107. package/skills/devops/references/ci-cd.md +204 -0
  108. package/skills/devops/references/containers.md +272 -0
  109. package/skills/devops/references/deploy.md +201 -0
  110. package/skills/devops/references/iac.md +252 -0
  111. package/skills/devops/references/observability.md +228 -0
  112. package/skills/devops/references/secrets.md +178 -0
  113. package/skills/devops-automator/SKILL.md +164 -0
  114. package/skills/frontend/SKILL.md +52 -0
  115. package/skills/frontend/references/accessibility.md +222 -0
  116. package/skills/frontend/references/components.md +206 -0
  117. package/skills/frontend/references/performance.md +219 -0
  118. package/skills/frontend/references/routing.md +209 -0
  119. package/skills/frontend/references/state.md +190 -0
  120. package/skills/frontend/references/testing.md +216 -0
  121. package/skills/frontend-developer/SKILL.md +115 -0
  122. package/skills/git-workflow/SKILL.md +355 -0
  123. package/skills/golang/SKILL.md +49 -0
  124. package/skills/golang/references/concurrency.md +284 -0
  125. package/skills/golang/references/errors.md +241 -0
  126. package/skills/golang/references/idioms.md +285 -0
  127. package/skills/golang/references/testing.md +238 -0
  128. package/skills/java/SKILL.md +50 -0
  129. package/skills/java/references/concurrency.md +194 -0
  130. package/skills/java/references/idioms.md +283 -0
  131. package/skills/java/references/testing.md +228 -0
  132. package/skills/kotlin/SKILL.md +47 -0
  133. package/skills/kotlin/references/coroutines.md +240 -0
  134. package/skills/kotlin/references/idioms.md +268 -0
  135. package/skills/kotlin/references/testing.md +219 -0
  136. package/skills/mobile/SKILL.md +50 -0
  137. package/skills/mobile/references/architecture.md +204 -0
  138. package/skills/mobile/references/navigation.md +158 -0
  139. package/skills/mobile/references/performance.md +152 -0
  140. package/skills/mobile/references/platform.md +166 -0
  141. package/skills/mobile/references/state-and-data.md +174 -0
  142. package/skills/python/SKILL.md +51 -0
  143. package/skills/python/THIRD_PARTY.md +14 -0
  144. package/skills/python/references/async.md +218 -0
  145. package/skills/python/references/error-handling.md +254 -0
  146. package/skills/python/references/idioms.md +279 -0
  147. package/skills/python/references/packaging.md +233 -0
  148. package/skills/python/references/testing.md +269 -0
  149. package/skills/python/references/typing.md +292 -0
  150. package/skills/qa-tester/SKILL.md +186 -0
  151. package/skills/rust/SKILL.md +50 -0
  152. package/skills/rust/references/async.md +224 -0
  153. package/skills/rust/references/errors.md +240 -0
  154. package/skills/rust/references/ownership.md +263 -0
  155. package/skills/rust/references/testing.md +274 -0
  156. package/skills/rust/references/traits.md +250 -0
  157. package/skills/security-engineer/SKILL.md +157 -0
  158. package/skills/swift/SKILL.md +48 -0
  159. package/skills/swift/references/concurrency.md +280 -0
  160. package/skills/swift/references/idioms.md +334 -0
  161. package/skills/swift/references/testing.md +229 -0
  162. package/skills/typescript/SKILL.md +51 -0
  163. package/skills/typescript/references/async.md +241 -0
  164. package/skills/typescript/references/errors.md +208 -0
  165. package/skills/typescript/references/idioms.md +246 -0
  166. package/skills/typescript/references/testing.md +225 -0
  167. package/skills/typescript/references/tooling.md +208 -0
  168. package/skills/typescript/references/types.md +259 -0
@@ -0,0 +1,207 @@
1
+ ---
2
+ name: architecture-decision-records
3
+ description: Workflow skill — write, review, and maintain ADRs. Capture the *why* behind technical decisions so future readers don't re-litigate them.
4
+ origin: ecc-fork (https://github.com/affaan-m/everything-claude-code, MIT)
5
+ ---
6
+
7
+ # Architecture Decision Records (ADRs)
8
+
9
+ Short, versioned documents capturing a single technical decision: what we decided, why, and what would make us reconsider it.
10
+
11
+ ## When to load
12
+
13
+ - Making a technical decision with non-trivial reach (affects multiple teams or components, or will stand for more than 6 months).
14
+ - Introducing a new technology, service, pattern.
15
+ - Deprecating a significant piece of infrastructure.
16
+ - Reviewing someone's proposed ADR.
17
+ - Wondering "why do we do X?" and finding no record.
18
+
19
+ ## Why ADRs matter
20
+
21
+ A codebase without ADRs has this conversation every six months:
22
+
23
+ > "Why are we using Kafka here? MQ would be simpler."
24
+ > "I think… performance? I wasn't here when we decided."
25
+
26
+ The decision gets made again, people compromise on different tradeoffs, the choice drifts. An ADR records the decision while the context is fresh, so the next discussion starts from facts, not vibes.
27
+
28
+ ## Anatomy of an ADR
29
+
30
+ ```
31
+ # ADR-0007: Adopt Postgres as the primary OLTP store
32
+
33
+ Date: 2026-04-21
34
+ Status: Accepted
35
+ Deciders: Alice (CTO), Bob (Staff Eng), Carol (Platform)
36
+
37
+ ## Context
38
+
39
+ We need a primary OLTP store for the new user service. Current options
40
+ considered: Postgres, MySQL, DynamoDB, CockroachDB.
41
+
42
+ Constraints:
43
+ - Must run in both AWS and on-prem (current requirement from Customer X).
44
+ - Expect 10k QPS peak, 1 TB at year 2.
45
+ - Team has strong Postgres experience; no DynamoDB experience.
46
+ - Budget constraint: self-hosted preferred over managed where reasonable.
47
+
48
+ ## Decision
49
+
50
+ Adopt Postgres 16 as the primary OLTP store for the user service,
51
+ managed via RDS in AWS and self-hosted on-prem.
52
+
53
+ ## Consequences
54
+
55
+ Positive:
56
+ - Team already fluent; hiring pool large.
57
+ - JSONB + strong relational semantics covers 95% of our model.
58
+ - Rich ecosystem (partitioning, logical replication, extensions).
59
+
60
+ Negative:
61
+ - Horizontal scaling requires sharding (future problem if we grow past
62
+ a single-instance + read-replica topology).
63
+ - Less native cloud integration than DynamoDB on AWS.
64
+
65
+ ## Alternatives considered
66
+
67
+ - MySQL: team less familiar; similar capability otherwise.
68
+ - DynamoDB: no on-prem story, access-pattern-locked schema design.
69
+ - CockroachDB: stronger horizontal scaling; team has no ops experience.
70
+
71
+ ## Reconsider if
72
+
73
+ - We need genuine multi-region write active/active.
74
+ - On-prem requirement is dropped.
75
+ - Operational burden of sharding exceeds the effort to migrate.
76
+
77
+ ## Related
78
+ - ADR-0003 (split auth from the user service)
79
+ - ADR-0005 (picked AWS as primary cloud)
80
+ ```
81
+
82
+ ## Structure — keep it short
83
+
84
+ Sections:
85
+
86
+ 1. **Context** — the situation and constraints.
87
+ 2. **Decision** — one paragraph. What we're doing.
88
+ 3. **Consequences** — positive + negative + neutral effects.
89
+ 4. **Alternatives considered** — what else we weighed.
90
+ 5. **Reconsider if** — conditions that should trigger a revisit.
91
+ 6. **Related** — links to prior ADRs, docs, tickets.
92
+
93
+ Two pages max. ADRs that bloat into design docs stop getting read.
94
+
95
+ ## Numbering & status
96
+
97
+ Sequential: `ADR-0001-...md` in `docs/adr/` or similar. Status:
98
+
99
+ | Status | Meaning |
100
+ |---|---|
101
+ | **Proposed** | Up for review |
102
+ | **Accepted** | Approved and in effect |
103
+ | **Rejected** | Considered, not adopted |
104
+ | **Deprecated** | No longer applied; kept for history |
105
+ | **Superseded by ADR-XXXX** | Replaced; link the successor |
106
+
107
+ Don't edit the content of accepted ADRs. To change a decision, write a new ADR that supersedes the old one; the only edit to the old ADR is its status, which becomes `Superseded by ADR-NNNN`.
108
+
109
+ ## When to write one
110
+
111
+ Rule of thumb: if someone will ask "why did we do this?" in six months, there should be an ADR.
112
+
113
+ Triggers:
114
+ - Adopting or replacing infrastructure (DB, queue, cache, build tool).
115
+ - Choosing a communication style (REST vs. gRPC, sync vs. async).
116
+ - Non-obvious architectural constraints (single-writer model, tenant isolation scheme).
117
+ - Significant policy: code style, review rules, SLO definitions.
118
+ - Deprecations and removals.
119
+
120
+ Don't write one for:
121
+ - Naming a variable.
122
+ - Choosing an icon size.
123
+ - Local refactors without reach beyond the file.
124
+
125
+ ## The review
126
+
127
+ Treat an ADR like a PR. Open it for comment with `Status: Proposed`. Reviewers focus on:
128
+ - Are the constraints accurate?
129
+ - Are the alternatives real alternatives?
130
+ - Are the consequences honest (including the painful ones)?
131
+ - Is the "reconsider if" section a real re-opener?
132
+
133
+ Timebox the review — ADRs that linger in review lose momentum. A week is usually enough.
134
+
135
+ ## Who writes / approves
136
+
137
+ - **Author**: the engineer proposing or doing the work.
138
+ - **Reviewers**: peers, tech lead, any team directly affected.
139
+ - **Approver**: usually the senior engineer / architect responsible for the area. One approver is enough; more than three is a committee.
140
+
141
+ ## Living with ADRs
142
+
143
+ The document isn't the point — the decision is. Refer to ADRs in:
144
+
145
+ - PR descriptions ("This implements the approach in ADR-0012").
146
+ - Onboarding docs ("Our conventions live in `docs/adr/`").
147
+ - Incident postmortems (when a decision's tradeoff bit).
148
+
149
+ A directory of ADRs is the most compact onboarding material you can give a new engineer.
150
+
151
+ ## Tools
152
+
153
+ Minimal stack:
154
+ - `docs/adr/NNNN-short-title.md` in the repo.
155
+ - A script or `adr-tools` / `log4brains` for numbering.
156
+ - Index file listing all ADRs and statuses.
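The numbering script and index file listed above can be very small. The following is a minimal sketch, assuming the `docs/adr/NNNN-short-title.md` layout and a `Status:` line near the top of each ADR as in the example earlier; the helper names `next_adr_path` and `render_index` are hypothetical and not part of `adr-tools` or `log4brains`.

```python
import re
from pathlib import Path

# Matches files such as "0007-adopt-postgres.md" (assumed naming layout).
ADR_NAME = re.compile(r"^(\d{4})-[a-z0-9-]+\.md$")


def next_adr_path(adr_dir: Path, title: str) -> Path:
    """Return the path for the next sequentially numbered ADR."""
    numbers = [
        int(m.group(1))
        for p in adr_dir.glob("*.md")
        if (m := ADR_NAME.match(p.name))
    ]
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return adr_dir / f"{max(numbers, default=0) + 1:04d}-{slug}.md"


def render_index(adr_dir: Path) -> str:
    """Build a markdown table listing every ADR file and its status."""
    lines = ["| ADR | Status |", "|---|---|"]
    for path in sorted(adr_dir.glob("*.md")):
        if not ADR_NAME.match(path.name):
            continue  # skip the index itself and other notes
        status = next(
            (line.split(":", 1)[1].strip()
             for line in path.read_text().splitlines()
             if line.startswith("Status:")),
            "Unknown",
        )
        lines.append(f"| {path.stem} | {status} |")
    return "\n".join(lines)
```

Running `render_index` in CI and failing when the committed index differs is one way to keep the index honest without extra tooling.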
157
+
158
+ Heavier options (Confluence, Notion) work, but markdown-in-repo wins for:
159
+ - Version control (the decision is versioned with the code that enacts it).
160
+ - Easy diff when an ADR is updated.
161
+ - No hunting across multiple surfaces.
162
+
163
+ ## What a good ADR feels like
164
+
165
+ - A reader can decide "should I care about this?" from the title + first sentence.
166
+ - A new hire reading it a year later can understand the choice without asking.
167
+ - The "reconsider if" section is specific enough that an engineer in 2028 knows when to revisit.
168
+
169
+ ## What a bad ADR looks like
170
+
171
+ - Title: "ADR-12: Kafka" (no decision; no context).
172
+ - 15 pages describing the system in full, decision buried on page 9.
173
+ - No alternatives. No constraints. Reads like a sales pitch for the chosen option.
174
+ - No "reconsider if" — the decision looks eternal.
175
+ - Written after the decision was shipped, recast to fit what was built.
176
+
177
+ ## Tradeoffs to always name
178
+
179
+ - **Write now vs. write later**: writing during the decision takes 30 min; reconstructing it a year later takes hours and produces lies.
180
+ - **Rigor vs. effort**: short-and-honest beats long-and-idealized.
181
+ - **Formal vs. casual process**: start casual; formalize as the org grows.
182
+ - **Centralized vs. team-local ADRs**: team-local for team-scoped decisions; central for cross-team.
183
+
184
+ ## Common failure modes
185
+
186
+ | Failure | Why |
187
+ |---|---|
188
+ | No ADRs written | Decisions get re-litigated; tribal knowledge rots |
189
+ | ADRs written but ignored | Not linked from PRs / docs; unfindable |
190
+ | ADRs written post-hoc to justify | Lose the "we considered X and Y" honesty |
191
+ | ADRs that are 20 pages | Nobody reads them; collapse to summary |
192
+ | ADRs that keep getting edited | The original reasoning gets overwritten; write a new one that supersedes |
193
+ | "ADR" that just says "we'll use X" | Decision without context / alternatives / consequences |
194
+
195
+ ## Anti-patterns
196
+
197
+ - Writing an ADR to lock down a decision that hasn't actually been discussed.
198
+ - Using ADRs as RFC-lite without a clear question and clear options.
199
+ - Updating an accepted ADR to change the decision — write a new superseding ADR.
200
+ - Endless review cycles (> 2 weeks) — call consensus and accept; iterate if reality disagrees later.
201
+ - Hiding ADRs in Confluence under three levels of navigation — in the repo is best.
202
+ - Treating ADRs as permission — the ADR records a decision, it doesn't replace engineering judgment on specifics.
203
+
204
+ ## Pair with
205
+
206
+ - [`coding-standards/references/code-review.md`](../coding-standards/references/code-review.md) — review discipline.
207
+ - [`backend-architect`](../backend-architect/SKILL.md) — the role that most often drives ADRs.
@@ -0,0 +1,62 @@
1
+ ---
2
+ name: backend
3
+ description: Backend end skill — API design, layering, data access, caching, auth, resilience, observability. Language-neutral. Combine with a language skill (`python`, `typescript`, `golang`, etc.) for syntax, and with persona skills (`backend-architect`, `database-optimizer`) for mindset.
4
+ origin: ecc-fork + original (https://github.com/affaan-m/everything-claude-code, MIT)
5
+ ---
6
+
7
+ # Backend
8
+
9
+ Server-side architecture patterns. **Language-neutral by design** — examples use pseudocode or diagrams, never a specific language. Pair with a language skill for idiomatic implementation.
10
+
11
+ ## When to load
12
+
13
+ - Designing or reviewing server-side code (API, service, worker)
14
+ - Deciding layering (repository, service, controller, domain)
15
+ - Data access: queries, transactions, migrations, N+1, connection pooling
16
+ - Caching, queuing, background jobs
17
+ - Authentication, authorization, rate limiting, input validation
18
+ - Resilience: retries, timeouts, circuit breakers, idempotency
19
+ - Observability: structured logging, metrics, traces, health checks
20
+
21
+ ## Core principles
22
+
23
+ 1. **Keep the domain ignorant of infrastructure.** Business logic doesn't import HTTP, DB drivers, or queues directly — those cross the boundary through interfaces.
24
+ 2. **The caller should be able to swap the implementation.** If you can't replace the DB with an in-memory fake in tests, your layering is wrong.
25
+ 3. **Every write is either idempotent or transactional.** Retries must be safe.
26
+ 4. **Input validation happens at the edge.** Once data is inside the domain, it is trusted.
27
+ 5. **Timeouts on every outbound call.** No unbounded network wait. Ever.
28
+ 6. **Never log secrets, tokens, PII.** Redact at the logger, not at the call site.
29
+ 7. **Observability is not optional.** A request you can't trace is a bug you can't fix.
30
+ 8. **Errors cross the boundary as data, not as exceptions.** The HTTP layer decides status codes; the domain raises domain errors.
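Principles 1, 2, and 8 can be seen together in a few lines. This is a sketch only, written in Python purely for illustration (the skill itself stays language-neutral); `User`, `UserRepository`, `InMemoryUserRepository`, `UserNotFound`, and `to_http_status` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class User:
    id: str
    email: str


class UserNotFound(Exception):
    """Domain error; it carries no HTTP knowledge."""


class UserRepository(Protocol):
    """The boundary the domain depends on, instead of a DB driver."""

    def get(self, user_id: str) -> Optional[User]: ...


class InMemoryUserRepository:
    """Test fake that satisfies the same interface as a real DB adapter."""

    def __init__(self, users: list) -> None:
        self._users = {u.id: u for u in users}

    def get(self, user_id: str) -> Optional[User]:
        return self._users.get(user_id)


def get_user_email(repo: UserRepository, user_id: str) -> str:
    """Domain logic: no HTTP objects, no SQL, only the interface."""
    user = repo.get(user_id)
    if user is None:
        raise UserNotFound(user_id)
    return user.email


def to_http_status(error: Exception) -> int:
    """Edge-layer mapping from domain errors to status codes."""
    return 404 if isinstance(error, UserNotFound) else 500
```

Because `get_user_email` only sees the `UserRepository` interface, a test can pass the in-memory fake and production code can pass a database-backed adapter without touching the domain function.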
31
+
32
+ ## How to use references
33
+
34
+ | Reference | When to load |
35
+ |---|---|
36
+ | [`references/api-design.md`](references/api-design.md) | REST/GraphQL/gRPC conventions, versioning, error format, pagination |
37
+ | [`references/layering.md`](references/layering.md) | Repository / service / controller, hexagonal, dependency direction |
38
+ | [`references/data-access.md`](references/data-access.md) | Transactions, N+1, migrations, connection pooling |
39
+ | [`references/caching.md`](references/caching.md) | Cache-aside, write-through, TTL, invalidation, stampede protection |
40
+ | [`references/security.md`](references/security.md) | AuthN vs authZ, sessions vs tokens, RBAC, rate limiting, input validation |
41
+ | [`references/resilience.md`](references/resilience.md) | Retries, timeouts, circuit breakers, idempotency, background jobs |
42
+ | [`references/observability.md`](references/observability.md) | Structured logging, metrics, traces, health checks, correlation IDs |
43
+
44
+ ## Language binding
45
+
46
+ This skill has no language-specific content. For concrete syntax:
47
+
48
+ - Python backend → load `python` + this skill
49
+ - TypeScript/Node → load `typescript` + this skill
50
+ - Go → load `golang` + this skill
51
+ - etc.
52
+
53
+ ## Forbidden patterns (auto-reject)
54
+
55
+ - Business logic that imports HTTP request/response objects directly
56
+ - DB queries issued from controllers (bypass the repository)
57
+ - Outbound HTTP / DB call with no timeout
58
+ - Writes that aren't idempotent AND aren't in a transaction
59
+ - Secrets or tokens in logs
60
+ - Unvalidated input reaching the domain layer
61
+ - Catch-all `500 Internal Server Error` as the only error response
62
+ - Silent swallowing of background-job failures (no dead-letter, no alert)
@@ -0,0 +1,168 @@
1
+ # API Design
2
+
3
+ REST, GraphQL, gRPC conventions. Focus on contracts, not implementations.
4
+
5
+ ## Style selection
6
+
7
+ | Style | Good for | Weak for |
8
+ |---|---|---|
9
+ | REST | CRUD resources, public APIs, cacheable reads | Rich queries, partial responses, real-time |
10
+ | GraphQL | Client-driven shape, many UIs against one backend | Simple CRUD, caching, rate limiting per field |
11
+ | gRPC | Service-to-service, strict schemas, streaming | Browsers without a proxy, public APIs |
12
+
13
+ When in doubt, start with REST. Switch later if the pain justifies the churn.
14
+
15
+ ## REST resource conventions
16
+
17
+ Nouns, plural, lowercase, hyphenated. Hierarchy reflects ownership.
18
+
19
+ ```
20
+ GET /users # list
21
+ GET /users/{id} # read one
22
+ POST /users # create
23
+ PUT /users/{id} # full replace
24
+ PATCH /users/{id} # partial update
25
+ DELETE /users/{id} # delete
26
+
27
+ GET /users/{id}/orders # sub-resources
28
+ POST /users/{id}/orders
29
+ ```
30
+
31
+ Avoid verbs in paths (`/getUser`, `/createOrder`). If an action truly doesn't fit CRUD, sub-resource it: `POST /orders/{id}/cancel`.
32
+
33
+ ## Pagination
34
+
35
+ Cursor-based for anything that can grow. Offset/limit is fine for small fixed sets.
36
+
37
+ ```
38
+ # Cursor (preferred for large / infinite lists)
39
+ GET /events?cursor=eyJpZCI6MTIzfQ&limit=50
40
+ # Response
41
+ {
42
+ "data": [...],
43
+ "next_cursor": "eyJpZCI6MTczfQ",
44
+ "has_more": true
45
+ }
46
+
47
+ # Offset (fine for small admin views)
48
+ GET /users?offset=0&limit=20
49
+ ```
50
+
51
+ Offset pagination silently skips or repeats rows when rows are inserted or deleted during paging; cursors anchored on a stable sort key don't.
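One common way to build such cursors is to base64url-encode the position of the last row returned. The sketch below assumes rows are ordered by a positive, unique integer `id`; `encode_cursor`, `decode_cursor`, and `list_events` are hypothetical helpers, and an in-memory list stands in for the database query.

```python
import base64
import json
from typing import Optional


def encode_cursor(position: dict) -> str:
    """Serialize the last-seen position as unpadded base64url JSON."""
    raw = json.dumps(position, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def decode_cursor(cursor: str) -> dict:
    """Restore the padding that encode_cursor stripped, then parse."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def list_events(rows: list, cursor: Optional[str], limit: int) -> dict:
    """Return one page of rows whose id is greater than the cursor's id."""
    last_id = decode_cursor(cursor)["id"] if cursor else 0
    ordered = sorted(rows, key=lambda r: r["id"])
    # Fetch one extra row to learn whether another page exists.
    window = [r for r in ordered if r["id"] > last_id][: limit + 1]
    has_more = len(window) > limit
    page = window[:limit]
    return {
        "data": page,
        "next_cursor": encode_cursor({"id": page[-1]["id"]}) if has_more else None,
        "has_more": has_more,
    }
```

With this encoding, `encode_cursor({"id": 123})` yields the `eyJpZCI6MTIzfQ` value used in the request example above. Treat the cursor as opaque on the client side; in production it is usually worth signing or validating it so that callers cannot craft arbitrary positions.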
52
+
53
+ ## Filtering, sorting
54
+
55
+ ```
56
+ GET /orders?status=paid&created_after=2026-01-01&sort=-created_at&limit=20
57
+
58
+ # Sort prefix: - for descending
59
+ sort=-created_at,name
60
+ ```
61
+
62
+ Whitelist allowed filter and sort fields. Never pass user-provided strings into query builders without validation.
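A small parser makes the whitelist rule concrete. This is a sketch under assumptions: `ALLOWED_SORT_FIELDS` is a hypothetical per-endpoint allow-list and `parse_sort` is an illustrative helper, not a framework API.

```python
# Hypothetical allow-list for the /orders endpoint.
ALLOWED_SORT_FIELDS = {"created_at", "name", "status"}


def parse_sort(param: str) -> list:
    """Turn 'sort=-created_at,name' style input into (field, direction) pairs.

    Raises ValueError for any field outside the allow-list, so nothing
    user-supplied reaches the query builder unchecked.
    """
    result = []
    for token in param.split(","):
        token = token.strip()
        descending = token.startswith("-")
        field = token[1:] if descending else token
        if field not in ALLOWED_SORT_FIELDS:
            raise ValueError(f"unsupported sort field: {field!r}")
        result.append((field, "desc" if descending else "asc"))
    return result
```

The edge layer would translate the `ValueError` into a `400` or `422` response; the query builder only ever sees field names taken from the allow-list.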
63
+
64
+ ## Error responses
65
+
66
+ Consistent shape everywhere. Problem Details (RFC 9457) is a reasonable default.
67
+
68
+ ```json
69
+ {
70
+ "type": "https://errors.example.com/validation",
71
+ "title": "Validation failed",
72
+ "status": 422,
73
+ "detail": "email is required",
74
+ "errors": [
75
+ { "field": "email", "code": "required" },
76
+ { "field": "age", "code": "out_of_range" }
77
+ ],
78
+ "request_id": "req_01HX..."
79
+ }
80
+ ```
81
+
82
+ Rules:
83
+ - `status` matches the HTTP status.
84
+ - `request_id` correlates with server logs.
85
+ - Never leak stack traces to clients.
86
+
87
+ ## HTTP status codes
88
+
89
+ | Code | Use for |
90
+ |---|---|
91
+ | 200 | Successful read or update |
92
+ | 201 | Resource created; include `Location` header |
93
+ | 202 | Async accepted; polling URL in body or `Location` |
94
+ | 204 | Success with no body (e.g. DELETE) |
95
+ | 400 | Malformed request (bad JSON, missing path param) |
96
+ | 401 | No / invalid auth |
97
+ | 403 | Authenticated but forbidden |
98
+ | 404 | Resource does not exist (or is hidden from this caller) |
99
+ | 409 | Conflict (duplicate, version mismatch) |
100
+ | 422 | Well-formed but semantically invalid |
101
+ | 429 | Rate limited; include `Retry-After` |
102
+ | 500 | Unexpected server error |
103
+ | 503 | Dependency down / overloaded |
104
+
105
+ `400` vs `422`: parse error vs validation error. `403` vs `404`: exposing 403 leaks existence of the resource — return `404` when that leak matters.
106
+
107
+ ## Versioning
108
+
109
+ Pick one and be consistent.
110
+
111
+ | Strategy | Example | Trade-off |
112
+ |---|---|---|
113
+ | URL | `/v1/users`, `/v2/users` | Simple; clutters paths; forces clients to migrate wholesale |
114
+ | Header | `Accept: application/vnd.api+json;version=2` | Clean URLs; harder to test in curl |
115
+ | Query | `/users?v=2` | Easy; often accidentally cached |
116
+
117
+ Bump the major version only for breaking changes. Additive changes (new optional fields) go in the same version.
118
+
119
+ ## Idempotency
120
+
121
+ Any non-GET request that a client may retry must be safe to replay. `PUT` and `DELETE` are idempotent by definition; accept an `Idempotency-Key` header for methods that are not, such as `POST`.
122
+
123
+ ```
124
+ POST /payments
125
+ Idempotency-Key: 7a8b9c...
126
+ ```
127
+
128
+ Server stores `(key, request_hash) -> response` for N hours. Same key + same body → return stored response. Same key + different body → 409.
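The store-and-replay rule can be sketched as follows. This is an in-memory illustration under assumptions: `IdempotencyStore` and `IdempotencyConflict` are hypothetical names, a real deployment would use a shared store such as a database or Redis, and the sketch does not guard against two concurrent first requests with the same key.

```python
import hashlib
import time


class IdempotencyConflict(Exception):
    """Same key reused with a different request body (maps to 409)."""


class IdempotencyStore:
    def __init__(self, ttl_seconds: float = 24 * 3600) -> None:
        self._ttl = ttl_seconds
        self._entries = {}  # key -> (request_hash, expires_at, response)

    def run(self, key: str, body: bytes, handler):
        """Execute handler(body) once per key; replay the stored response after."""
        request_hash = hashlib.sha256(body).hexdigest()
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            stored_hash, _, response = entry
            if stored_hash != request_hash:
                raise IdempotencyConflict(key)
            return response  # same key + same body: replay
        response = handler(body)
        self._entries[key] = (request_hash, now + self._ttl, response)
        return response
```

A retry of `POST /payments` with the same key and body then returns the first response without charging again, while a different body under the same key surfaces as a conflict for the HTTP layer to report.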
129
+
130
+ ## GraphQL conventions
131
+
132
+ - One endpoint: `POST /graphql`.
133
+ - Mutations return the modified object plus a client-defined selection, so the UI can update without a refetch.
134
+ - Don't expose database IDs; use opaque global IDs (Relay spec) if you want Relay-style pagination or federation.
135
+ - Enforce max query depth and complexity to prevent DoS-by-query.
136
+
137
+ ## gRPC conventions
138
+
139
+ - Use proto3.
140
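+ - No field is required: an unset field reads as its default value. Breaking changes happen when you renumber a field, change its type, or reuse a removed field number; renaming keeps the wire format but breaks generated code and the JSON mapping.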
141
+ - Stream only when the payload doesn't fit one response.
142
+ - Put auth in metadata, not in the request message.
143
+
144
+ ## Response shape
145
+
146
+ Keep it flat. Don't wrap with `{ success: true, data: ... }` unless your framework forces it — HTTP status already signals success.
147
+
148
+ ```json
149
+ # Single resource
150
+ { "id": "u_01H", "name": "Alice", "email": "a@x.com" }
151
+
152
+ # Collection
153
+ { "data": [...], "next_cursor": "...", "has_more": false }
154
+
155
+ # Errors: see above
156
+ ```
157
+
158
+ Consistency matters more than cleverness. Pick a shape, document it, follow it.
159
+
160
+ ## Documentation
161
+
162
+ Every public endpoint has:
163
+ - path, method, auth requirement
164
+ - request body schema
165
+ - success response schema (with example)
166
+ - listed error codes
167
+
168
+ OpenAPI / Protobuf schemas are the contract. Hand-written prose docs drift and lie.
@@ -0,0 +1,181 @@
1
+ # Caching
2
+
3
+ Rules, strategies, pitfalls. Cache-aside covers 90% of cases.
4
+
5
+ ## Cache-aside (lazy loading)
6
+
7
+ Application checks cache first; on miss, loads from source and populates cache.
8
+
9
+ ```
10
+ get(id):
11
+     v = cache.get(key(id))
12
+     if v is not None:
13
+         return v # hit
14
+     v = source.load(id) # miss
15
+     if v is not None:
16
+         cache.set(key(id), v, ttl=5min)
17
+     return v
18
+ ```
19
+
20
+ Pros: simple; stale data only appears on cached keys; source is authoritative.
21
+ Cons: first reader after expiry pays full latency; risk of **cache stampede** when many readers miss together.
22
+
23
+ ## Write-through
24
+
25
+ Application writes to source AND cache in the same operation (usually: write source first, then cache). The two writes are not truly atomic.
26
+
27
+ ```
28
+ save(entity):
29
+     source.save(entity)
30
+     cache.set(key(entity.id), entity, ttl=5min)
31
+ ```
32
+
33
+ Pros: cache is always fresh after a write.
34
+ Cons: writes are slower; if cache write fails, you have stale data (decide: rollback, or fire-and-forget with expiry).
35
+
36
+ ## Write-behind (deferred)
37
+
38
+ Application writes to cache; a background job flushes to source later. Rare — only for very high write volume and tolerance for delayed durability. Almost always the wrong choice; you're trading data loss risk for write throughput.
39
+
40
+ ## What to cache (and what NOT to)
41
+
42
+ **Cache-friendly**:
43
+ - Read-heavy, changes rarely (config, product catalog, user profile)
44
+ - Expensive to compute (rendered HTML, aggregations, vector search)
45
+ - Idempotent reads
46
+
47
+ **Avoid caching**:
48
+ - Per-user personalized data with high cardinality (cache hit rate too low)
49
+ - Rapidly changing data (reconciliation cost > cache benefit)
50
+ - Anything where staleness is a correctness bug (balances, seat availability)
51
+
52
+ ## TTL strategy
53
+
54
+ Every cache entry must expire. No TTL = memory leak.
55
+
56
+ | Data type | Starting TTL |
57
+ |---|---|
58
+ | Static config | 1–24 h |
59
+ | User profile | 5–60 min |
60
+ | Hot aggregation | 10 s – 5 min |
61
+ | Computed render | minutes |
62
+ | Feature flag eval | 30–60 s |
63
+
64
+ Add a small random jitter (±10%) so entries don't all expire at the same instant and trigger a stampede.
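Jitter is a one-line change at the point where the TTL is chosen. A minimal sketch, assuming TTLs are expressed in seconds; `jittered_ttl` is a hypothetical helper and the optional `rng` parameter exists only to make tests repeatable.

```python
import random


def jittered_ttl(base_seconds: float, spread: float = 0.10, rng=None) -> float:
    """Return base_seconds scaled by a random factor in [1 - spread, 1 + spread]."""
    rng = rng or random
    return base_seconds * (1 + rng.uniform(-spread, spread))
```

For a 5-minute base TTL this yields values between 270 and 330 seconds, so keys written in the same burst expire across a one-minute window instead of all at once.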
65
+
66
+ ## Invalidation
67
+
68
+ The second hardest problem in computing. Three approaches:
69
+
70
+ 1. **TTL only** — simple; tolerate staleness up to TTL. Default choice.
71
+ 2. **Explicit invalidation** — on write, delete the cache key. Works if your mutation paths are countable.
72
+ ```
73
+ save(user):
74
+     db.update(user)
75
+     cache.delete(key(user.id))
76
+ ```
77
+ 3. **Event-driven** — publish `UserUpdated`; subscribers invalidate their caches. Needed when many services cache the same entity.
78
+
79
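+ Don't try to *update* the cache on write in complex systems; delete instead and let the next read repopulate. Concurrent updates can land out of order and leave an older value cached; deletes are much less exposed to that race, and the TTL bounds whatever staleness remains.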
80
+
81
+ ## Cache key design
82
+
83
+ Stable, explicit, version-prefixed.
84
+
85
+ ```
86
+ # ✅
87
+ user:v2:{user_id}
88
+ product:v1:{sku}:detail
89
+ list:orders:v1:user={uid}:status=paid:cursor={c}
90
+
91
+ # ❌
92
+ u_123 # ambiguous across services
93
+ users:123:details # no version
94
+ ${JSON.stringify(query)} # fragile; order-dependent
95
+ ```
96
+
97
+ A version prefix lets you deploy a new value format without flushing the cache or misreading entries written in the old format; old keys simply age out.
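A key builder that sorts its parameters avoids the order-dependence flagged in the bad examples. This is a sketch only: `cache_key` is a hypothetical helper, and its output layout (`entity:vN:name=value...`) is one possible convention rather than the exact format shown above.

```python
def cache_key(entity: str, version: int, **params: object) -> str:
    """Build a stable, version-prefixed key; parameter order never matters."""
    parts = [entity, f"v{version}"]
    parts += [f"{name}={params[name]}" for name in sorted(params)]
    return ":".join(parts)
```

Bumping `version` when the cached value's shape changes makes new code read and write a fresh key space while entries under the old prefix expire on their own TTL.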
98
+
99
+ ## Stampede protection
100
+
101
+ When a hot key expires, many requests miss at once and pile onto the source. Two fixes:
102
+
103
+ ### Single-flight / coalescing
104
+
105
+ In-process: at most one loader per key; concurrent callers wait for the same result.
106
+
107
+ ```
108
+ load(key):
109
+     with singleFlight(key):
110
+         return source.load(key)
111
+ ```
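For a threaded runtime, the coalescing above might look like the following sketch. `SingleFlight` is a hypothetical class written for illustration (Go's `singleflight` package is the well-known reference for the idea); it coalesces callers only within one process and shares the leader's exception with every waiter.

```python
import threading


class SingleFlight:
    """Allow at most one in-flight loader per key; other callers wait for it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls = {}  # key -> record of the in-flight call

    def do(self, key, loader):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None, "error": None}
                self._calls[key] = call
        if not leader:
            # Follower: block until the leader finishes, then share its outcome.
            call["done"].wait()
            if call["error"] is not None:
                raise call["error"]
            return call["value"]
        try:
            call["value"] = loader()
            return call["value"]
        except Exception as exc:
            call["error"] = exc
            raise
        finally:
            # Forget the call first so later requests start a fresh load,
            # then wake everyone who was waiting on this one.
            with self._lock:
                del self._calls[key]
            call["done"].set()
```

Usage would be along the lines of `flight.do(key, lambda: source.load(key))`. Across several instances, each process still issues its own load, so a distributed lock or early refresh is needed if even that is too much.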
112
+
113
+ ### Probabilistic early expiration (XFetch)
114
+
115
+ Before the TTL, some fraction of readers voluntarily refresh.
116
+
117
+ ```
118
+ get(key):
119
+     v, ttl_remaining = cache.get_with_ttl(key)
120
+     if v is None or should_refresh_early(ttl_remaining):
121
+         v = source.load(key)
122
+         cache.set(key, v, ttl=5min)
123
+     return v
124
+ ```
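The pseudocode leaves `should_refresh_early` undefined. One way to fill it in is the XFetch rule: refresh when `-delta * beta * ln(u)` is at least the remaining TTL, where `delta` is how long a recompute takes and `u` is uniform on (0, 1]. The sketch below makes assumptions beyond the text: the extra `recompute_seconds` and `beta` parameters, and the `rng` hook for testing, are additions for illustration.

```python
import math
import random


def should_refresh_early(
    ttl_remaining: float,
    recompute_seconds: float,
    beta: float = 1.0,
    rng=random,
) -> bool:
    """XFetch-style check: the chance of refreshing rises as expiry nears.

    beta > 1 refreshes earlier, beta < 1 later; recompute_seconds is the
    measured cost of reloading the value from the source.
    """
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is always defined
    return -recompute_seconds * beta * math.log(u) >= ttl_remaining
```

An expired entry (`ttl_remaining <= 0`) always refreshes, while an entry far from expiry almost never does, so refreshes are spread over the final stretch of the TTL instead of piling up at the expiry instant.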
125
+
126
+ ## Negative caching
127
+
128
+ Cache misses are expensive if they happen constantly (e.g., 404 lookups). Cache the absence too, with a short TTL.
129
+
130
+ ```
131
+ get(id):
132
+     v = cache.get(key(id))
133
+     if v is MISSING_SENTINEL:
134
+         return None # known-not-found
135
+     if v is not None:
136
+         return v
137
+     v = source.load(id)
138
+     cache.set(key(id), v if v is not None else MISSING_SENTINEL, ttl=30s) # falsy values (0, "") are still real hits
139
+     return v
140
+ ```
141
+
142
+ Short TTL — don't cache `None` for hours; the item may just have been created.
143
+
144
+ ## HTTP-level caching
145
+
146
+ For public GET endpoints, let the HTTP layer cache. Free, correctly implemented, respected by CDNs.
147
+
148
+ ```
149
+ Cache-Control: public, max-age=300, s-maxage=600, stale-while-revalidate=60
150
+ ETag: "abc123"
151
+ ```
152
+
153
+ - `max-age`: browser/client
154
+ - `s-maxage`: shared caches (CDN)
155
+ - `stale-while-revalidate`: serve stale while refreshing in the background
156
+ - `ETag` + `If-None-Match`: 304 responses save bandwidth
157
+
158
+ ## Local (in-process) cache vs distributed
159
+
160
+ | | Local (in-process) | Distributed (Redis, Memcached) |
161
+ |---|---|---|
162
+ | Latency | Nanoseconds | ~1 ms |
163
+ | Consistency across instances | No — each pod has its own | Yes |
164
+ | Size | Limited to process memory | Limited to cluster |
165
+ | Eviction | LRU, LFU | LRU, LFU, TTL |
166
+ | Cost | Free | Infra + ops |
167
+ | Invalidation | Hard across pods | One call |
168
+
169
+ Use local for small hot data; distributed for shared state. Don't mix carelessly — a per-pod cache that's supposed to be consistent will drift.
170
+
171
+ ## Anti-patterns
172
+
173
+ | Anti-pattern | Why bad |
174
+ |---|---|
175
+ | No TTL anywhere | Memory leak; stale data forever |
176
+ | Caching mutable objects by reference | Next reader mutates the cached copy |
177
+ | Caching per-user data with high cardinality | Low hit rate; wastes memory |
178
+ | Cache key includes a timestamp that changes every request | Every request is a miss |
179
+ | Serializing cache writes into the request path without timeout | Cache outage → requests hang |
180
+ | Reading cache without a fallback path | Cache is a dependency; treat it as optional |
181
+ | Storing secrets in shared cache | Secret sprawl across cluster |