@coralai/sps-cli 0.41.2 → 0.43.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (168)
  1. package/README.md +34 -3
  2. package/dist/commands/cardAdd.d.ts +1 -1
  3. package/dist/commands/cardAdd.d.ts.map +1 -1
  4. package/dist/commands/cardAdd.js +16 -6
  5. package/dist/commands/cardAdd.js.map +1 -1
  6. package/dist/commands/cardDashboard.js +1 -1
  7. package/dist/commands/cardDashboard.js.map +1 -1
  8. package/dist/commands/doctor.d.ts +9 -0
  9. package/dist/commands/doctor.d.ts.map +1 -1
  10. package/dist/commands/doctor.js +3 -314
  11. package/dist/commands/doctor.js.map +1 -1
  12. package/dist/commands/hookCommand.d.ts.map +1 -1
  13. package/dist/commands/hookCommand.js +6 -7
  14. package/dist/commands/hookCommand.js.map +1 -1
  15. package/dist/commands/pmCommand.js +1 -1
  16. package/dist/commands/pmCommand.js.map +1 -1
  17. package/dist/commands/projectInit.d.ts.map +1 -1
  18. package/dist/commands/projectInit.js +60 -37
  19. package/dist/commands/projectInit.js.map +1 -1
  20. package/dist/commands/setup.d.ts.map +1 -1
  21. package/dist/commands/setup.js +3 -30
  22. package/dist/commands/setup.js.map +1 -1
  23. package/dist/commands/skillCommand.d.ts +2 -0
  24. package/dist/commands/skillCommand.d.ts.map +1 -0
  25. package/dist/commands/skillCommand.js +235 -0
  26. package/dist/commands/skillCommand.js.map +1 -0
  27. package/dist/commands/tick.js +1 -1
  28. package/dist/commands/tick.js.map +1 -1
  29. package/dist/core/checklist.d.ts +22 -0
  30. package/dist/core/checklist.d.ts.map +1 -0
  31. package/dist/core/checklist.js +38 -0
  32. package/dist/core/checklist.js.map +1 -0
  33. package/dist/core/checklist.test.d.ts +2 -0
  34. package/dist/core/checklist.test.d.ts.map +1 -0
  35. package/dist/core/checklist.test.js +74 -0
  36. package/dist/core/checklist.test.js.map +1 -0
  37. package/dist/core/config.d.ts +1 -1
  38. package/dist/core/config.d.ts.map +1 -1
  39. package/dist/core/config.js +1 -1
  40. package/dist/core/config.js.map +1 -1
  41. package/dist/core/config.test.js +7 -4
  42. package/dist/core/config.test.js.map +1 -1
  43. package/dist/core/context.d.ts +1 -1
  44. package/dist/core/context.d.ts.map +1 -1
  45. package/dist/core/skillStore.d.ts +46 -0
  46. package/dist/core/skillStore.d.ts.map +1 -0
  47. package/dist/core/skillStore.js +197 -0
  48. package/dist/core/skillStore.js.map +1 -0
  49. package/dist/core/skillStore.test.d.ts +2 -0
  50. package/dist/core/skillStore.test.d.ts.map +1 -0
  51. package/dist/core/skillStore.test.js +190 -0
  52. package/dist/core/skillStore.test.js.map +1 -0
  53. package/dist/engines/EventHandler.test.js +3 -3
  54. package/dist/engines/EventHandler.test.js.map +1 -1
  55. package/dist/engines/MonitorEngine.js +2 -2
  56. package/dist/engines/MonitorEngine.js.map +1 -1
  57. package/dist/engines/SchedulerEngine.js +1 -1
  58. package/dist/engines/SchedulerEngine.js.map +1 -1
  59. package/dist/engines/StageEngine.js +3 -3
  60. package/dist/engines/StageEngine.js.map +1 -1
  61. package/dist/engines/engine-pipeline-adapter.test.js +2 -2
  62. package/dist/engines/engine-pipeline-adapter.test.js.map +1 -1
  63. package/dist/interfaces/TaskBackend.d.ts +3 -1
  64. package/dist/interfaces/TaskBackend.d.ts.map +1 -1
  65. package/dist/main.js +19 -17
  66. package/dist/main.js.map +1 -1
  67. package/dist/models/types.d.ts +16 -1
  68. package/dist/models/types.d.ts.map +1 -1
  69. package/dist/providers/MarkdownTaskBackend.d.ts +2 -1
  70. package/dist/providers/MarkdownTaskBackend.d.ts.map +1 -1
  71. package/dist/providers/MarkdownTaskBackend.js +28 -5
  72. package/dist/providers/MarkdownTaskBackend.js.map +1 -1
  73. package/dist/providers/registry.d.ts.map +1 -1
  74. package/dist/providers/registry.js +5 -7
  75. package/dist/providers/registry.js.map +1 -1
  76. package/package.json +1 -1
  77. package/project-template/.claude/hooks/start.sh +44 -0
  78. package/project-template/.claude/settings.json +1 -1
  79. package/skills/architecture-decision-records/SKILL.md +207 -0
  80. package/skills/backend/SKILL.md +62 -0
  81. package/skills/backend/references/api-design.md +168 -0
  82. package/skills/backend/references/caching.md +181 -0
  83. package/skills/backend/references/data-access.md +173 -0
  84. package/skills/backend/references/layering.md +181 -0
  85. package/skills/backend/references/observability.md +190 -0
  86. package/skills/backend/references/resilience.md +201 -0
  87. package/skills/backend/references/security.md +186 -0
  88. package/skills/backend-architect/SKILL.md +119 -0
  89. package/skills/code-reviewer/SKILL.md +143 -0
  90. package/skills/coding-standards/SKILL.md +60 -0
  91. package/skills/coding-standards/references/clean-code.md +258 -0
  92. package/skills/coding-standards/references/code-review.md +192 -0
  93. package/skills/coding-standards/references/commits-and-prs.md +226 -0
  94. package/skills/coding-standards/references/error-strategy.md +193 -0
  95. package/skills/coding-standards/references/naming.md +185 -0
  96. package/skills/coding-standards/references/tdd.md +171 -0
  97. package/skills/database/SKILL.md +53 -0
  98. package/skills/database/references/indexing.md +190 -0
  99. package/skills/database/references/migrations.md +199 -0
  100. package/skills/database/references/nosql.md +185 -0
  101. package/skills/database/references/queries.md +295 -0
  102. package/skills/database/references/scaling.md +203 -0
  103. package/skills/database/references/schema.md +191 -0
  104. package/skills/database-optimizer/SKILL.md +168 -0
  105. package/skills/debugging-workflow/SKILL.md +244 -0
  106. package/skills/devops/SKILL.md +55 -0
  107. package/skills/devops/references/ci-cd.md +204 -0
  108. package/skills/devops/references/containers.md +272 -0
  109. package/skills/devops/references/deploy.md +201 -0
  110. package/skills/devops/references/iac.md +252 -0
  111. package/skills/devops/references/observability.md +228 -0
  112. package/skills/devops/references/secrets.md +178 -0
  113. package/skills/devops-automator/SKILL.md +164 -0
  114. package/skills/frontend/SKILL.md +52 -0
  115. package/skills/frontend/references/accessibility.md +222 -0
  116. package/skills/frontend/references/components.md +206 -0
  117. package/skills/frontend/references/performance.md +219 -0
  118. package/skills/frontend/references/routing.md +209 -0
  119. package/skills/frontend/references/state.md +190 -0
  120. package/skills/frontend/references/testing.md +216 -0
  121. package/skills/frontend-developer/SKILL.md +115 -0
  122. package/skills/git-workflow/SKILL.md +355 -0
  123. package/skills/golang/SKILL.md +49 -0
  124. package/skills/golang/references/concurrency.md +284 -0
  125. package/skills/golang/references/errors.md +241 -0
  126. package/skills/golang/references/idioms.md +285 -0
  127. package/skills/golang/references/testing.md +238 -0
  128. package/skills/java/SKILL.md +50 -0
  129. package/skills/java/references/concurrency.md +194 -0
  130. package/skills/java/references/idioms.md +283 -0
  131. package/skills/java/references/testing.md +228 -0
  132. package/skills/kotlin/SKILL.md +47 -0
  133. package/skills/kotlin/references/coroutines.md +240 -0
  134. package/skills/kotlin/references/idioms.md +268 -0
  135. package/skills/kotlin/references/testing.md +219 -0
  136. package/skills/mobile/SKILL.md +50 -0
  137. package/skills/mobile/references/architecture.md +204 -0
  138. package/skills/mobile/references/navigation.md +158 -0
  139. package/skills/mobile/references/performance.md +152 -0
  140. package/skills/mobile/references/platform.md +166 -0
  141. package/skills/mobile/references/state-and-data.md +174 -0
  142. package/skills/python/SKILL.md +51 -0
  143. package/skills/python/THIRD_PARTY.md +14 -0
  144. package/skills/python/references/async.md +218 -0
  145. package/skills/python/references/error-handling.md +254 -0
  146. package/skills/python/references/idioms.md +279 -0
  147. package/skills/python/references/packaging.md +233 -0
  148. package/skills/python/references/testing.md +269 -0
  149. package/skills/python/references/typing.md +292 -0
  150. package/skills/qa-tester/SKILL.md +186 -0
  151. package/skills/rust/SKILL.md +50 -0
  152. package/skills/rust/references/async.md +224 -0
  153. package/skills/rust/references/errors.md +240 -0
  154. package/skills/rust/references/ownership.md +263 -0
  155. package/skills/rust/references/testing.md +274 -0
  156. package/skills/rust/references/traits.md +250 -0
  157. package/skills/security-engineer/SKILL.md +157 -0
  158. package/skills/swift/SKILL.md +48 -0
  159. package/skills/swift/references/concurrency.md +280 -0
  160. package/skills/swift/references/idioms.md +334 -0
  161. package/skills/swift/references/testing.md +229 -0
  162. package/skills/typescript/SKILL.md +51 -0
  163. package/skills/typescript/references/async.md +241 -0
  164. package/skills/typescript/references/errors.md +208 -0
  165. package/skills/typescript/references/idioms.md +246 -0
  166. package/skills/typescript/references/testing.md +225 -0
  167. package/skills/typescript/references/tooling.md +208 -0
  168. package/skills/typescript/references/types.md +259 -0
@@ -0,0 +1,201 @@
# Resilience

Timeouts, retries, circuit breakers, idempotency, background jobs. Make failures cheap.

## Timeouts — every outbound call

No exceptions. A dependency that never answers will exhaust threads, sockets, and memory.

```
# Wrong: no timeout
response = http.get("https://upstream/api")

# Right: fail fast
response = http.get("https://upstream/api", timeout=2.0)
```

Timeout budget, layered:

```
client                     10s
 └ gateway                  8s
    └ service               5s
       └ dependency call    2s   ← must be smaller than parent budget
```

If an inner call's timeout is ≥ its parent's, the parent's deadline fires first. The caller gives up with a generic timeout while the inner call keeps holding a thread and a connection, and the service never gets to return its own clean 504 or a fallback.
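One way to keep the "inner < outer" rule automatic is to pass an absolute deadline down the chain and derive each outbound timeout from what is left of it. The following is a minimal Python sketch. It assumes a hypothetical `Deadline` helper and a 100 ms safety margin. How the deadline travels between services (for example, in a request header) is out of scope.

```python
import time


class Deadline:
    """Absolute deadline passed down the call chain (hypothetical helper)."""

    def __init__(self, budget_s: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def child_timeout(self, preferred_s: float, margin_s: float = 0.1) -> float:
        """Timeout for one outbound call: its preferred value, but never more than
        the remaining budget minus a margin for building a clean error response."""
        available = self.remaining() - margin_s
        if available <= 0:
            raise TimeoutError("budget exhausted before the outbound call")
        return min(preferred_s, available)


# A service with a 5 s request budget calls a dependency it would normally give 2 s.
deadline = Deadline(5.0)
dependency_timeout = deadline.child_timeout(2.0)   # 2.0 while plenty of budget is left
```

As the budget drains, `child_timeout` shrinks below the preferred value. Once the budget is gone, it refuses to make the call at all.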

## Retries — only for safe, transient failures

**Retryable**:
- Network timeouts
- 5xx on GET/idempotent calls
- 429 (with `Retry-After`)
- Explicit DB "retry" errors (e.g., serialization failures)

**NOT retryable**:
- 4xx other than 429 (client bug; retry won't help)
- Any non-idempotent call without an `Idempotency-Key`
- "Connection reset" on a non-idempotent write that may already have landed

### Exponential backoff with jitter

Pure exponential backoff keeps clients that failed together in lockstep, so every retry wave hits the dependency at once (a thundering herd). Always add jitter.

```
attempt(n):                      # n = 0, 1, 2, ...
  base  = 100ms
  max   = 10s
  sleep = min(max, base * 2^n * random(0.5, 1.5))   # jitter inside the cap, so sleep ≤ max
```

Bound the total attempts and total time; don't let retries outlive the user's patience.
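The following Python sketch applies these rules. It assumes the caller maps transient failures (timeouts, 5xx on idempotent calls, 429) to a hypothetical `RetryableError`. Any other exception propagates on the first attempt. The `sleep`, `clock`, and `rng` parameters are injectable only so the behavior can be tested.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for transient failures: timeouts, 5xx on idempotent calls, 429."""


def backoff_delays(retries, base=0.1, cap=10.0, rng=random.random):
    """Yield one sleep per retry: base * 2^n with jitter in [0.5, 1.5), capped after jitter."""
    for n in range(retries):
        jitter = 0.5 + rng()                 # random.random() returns a value in [0.0, 1.0)
        yield min(cap, base * 2 ** n * jitter)


def call_with_retries(fn, max_attempts=4, max_total_s=5.0,
                      sleep=time.sleep, clock=time.monotonic, rng=random.random):
    """Call fn(), retrying only RetryableError; bounded by attempts AND total time."""
    start = clock()
    delays = backoff_delays(max_attempts - 1, rng=rng)
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts:
                raise                        # out of attempts
            delay = next(delays)
            if clock() - start + delay > max_total_s:
                raise                        # the next sleep would exceed the time budget
            sleep(delay)
```

This sketch never consults `Retry-After`. When a 429 or 503 response carries that header, a real client should wait at least that long instead of using the computed delay.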

## Circuit breakers

When a dependency is sick, stop hammering it. Three states:

```
CLOSED (normal)
   │ failures exceed threshold
   ▼
OPEN (fail fast, short-circuit calls)
   │ after cool-down, try one request
   ▼
HALF_OPEN ──success──► CLOSED
   │
   └─failure──────────► OPEN
```

Thresholds to tune: error rate (e.g., >50% of last 20 calls), minimum sample size, cool-down time, half-open probe count.

Open-circuit response: fall back to cache, degraded response, or fail fast with 503. Never silently return empty data.
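The following is a compact Python sketch of the state machine above, built around a hypothetical `CircuitBreaker` class:

- It opens when more than `error_rate` of the last `window` calls failed, once at least `min_calls` have been seen.
- During the cool-down it fails fast.
- After the cool-down, a single probe decides between CLOSED and OPEN.

The thresholds are placeholders. Production code would normally use an existing resilience library rather than this sketch.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Error-rate breaker over the last `window` calls. Single-threaded sketch:
    a concurrent version needs a lock and must admit only one half-open probe."""

    def __init__(self, window=20, min_calls=10, error_rate=0.5,
                 cooldown_s=30.0, clock=time.monotonic):
        self.min_calls, self.error_rate = min_calls, error_rate
        self.cooldown_s, self.clock = cooldown_s, clock
        self.state = "CLOSED"
        self.opened_at = 0.0
        self.outcomes = deque(maxlen=window)       # True = the call failed

    def call(self, fn):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            self.state = "HALF_OPEN"               # cool-down over: allow one probe
        try:
            result = fn()
        except Exception:
            self._record(failed=True)
            raise
        self._record(failed=False)
        return result

    def _record(self, failed):
        if self.state == "HALF_OPEN":
            if failed:
                self._open()                       # probe failed: back to OPEN
            else:
                self.state = "CLOSED"              # probe succeeded: reset history
                self.outcomes.clear()
            return
        self.outcomes.append(failed)
        if (len(self.outcomes) >= self.min_calls
                and sum(self.outcomes) / len(self.outcomes) > self.error_rate):
            self._open()

    def _open(self):
        self.state = "OPEN"
        self.opened_at = self.clock()
```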
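
## Idempotency

Any operation that might be retried must be safe to run twice.

### Idempotency keys

For non-idempotent HTTP writes (POST, and PATCH where it isn't idempotent), accept an `Idempotency-Key` header. The key must be claimed atomically; otherwise two concurrent retries can both miss the store and both execute.

```
POST /payments
Idempotency-Key: 7a8b9c...

server:
  # claim the key atomically (e.g., SET NX); only one request per key executes
  if not store.set_if_absent(key, (hash(body), IN_PROGRESS), ttl=24h):
    stored = store.get(key)
    if stored.request_hash != hash(body):
      return 409   # same key, different body → conflict
    if stored.response == IN_PROGRESS:
      return 409   # first request still running; retry later
    return stored.response
  response = execute()   # on failure, delete the key so a retry can run
  store.set(key, (hash(body), response), ttl=24h)
  return response
```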
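Here is the same flow as a runnable Python sketch. An in-memory dict guarded by a `threading.Lock` stands in for a shared store such as Redis or a database table. The `IdempotencyStore` class and the `(status, body)` response tuples are illustrative, and TTL expiry is left out.

```python
import hashlib
import json
import threading

IN_PROGRESS = object()   # sentinel: the first request with this key is still running


class IdempotencyStore:
    """In-memory stand-in for a shared store (Redis, a DB table); TTLs omitted."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}   # key -> (body_hash, response or IN_PROGRESS)

    def handle(self, key, body, execute):
        body_hash = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        with self._lock:                          # atomic check-and-claim
            stored = self._entries.get(key)
            if stored is None:
                self._entries[key] = (body_hash, IN_PROGRESS)
            elif stored[0] != body_hash:
                return 409, "idempotency key reused with a different body"
            elif stored[1] is IN_PROGRESS:
                return 409, "original request still in progress"
            else:
                return stored[1]                  # replay the saved response
        try:
            response = execute()
        except Exception:
            with self._lock:
                del self._entries[key]            # release the claim so a retry can run
            raise
        with self._lock:
            self._entries[key] = (body_hash, response)
        return response
```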

### Natural idempotency

Often better than keys: design the operation so repeats are harmless.

```
-- Not idempotent: each replay adds another 10
UPDATE balance SET amount = amount + 10 WHERE id = 1;

-- Idempotent: a replay hits the primary key and is absorbed
INSERT INTO ledger (id, account, amount) VALUES (:tx_id, 1, 10)
  ON CONFLICT (id) DO NOTHING;
```
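You can try the ledger pattern with Python's built-in `sqlite3` module. SQLite 3.24 and later accept the same `ON CONFLICT (id) DO NOTHING` clause. The table, the transaction id, and the helper functions below are made up for the demo. The balance is derived by summing ledger rows instead of being updated in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (id TEXT PRIMARY KEY, account INTEGER, amount INTEGER)")


def apply_credit(tx_id: str, account: int, amount: int) -> None:
    # tx_id is the primary key, so a redelivered message inserts nothing.
    with conn:  # commits the transaction (or rolls back on error)
        conn.execute(
            "INSERT INTO ledger (id, account, amount) VALUES (?, ?, ?) "
            "ON CONFLICT (id) DO NOTHING",
            (tx_id, account, amount),
        )


def balance(account: int) -> int:
    # The balance is derived from the ledger instead of being updated in place.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM ledger WHERE account = ?", (account,)
    ).fetchone()
    return total


apply_credit("tx-42", 1, 10)
apply_credit("tx-42", 1, 10)   # duplicate delivery of the same transaction
print(balance(1))              # 10
```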

## Graceful degradation

When a non-critical dependency is down, return a usable response, not an error.

```
product = productRepo.get(id)
try:
  product.recommendations = recService.for(id, timeout=300ms)
except (Timeout, ServiceError):
  product.recommendations = []   # degrade, don't fail
return product
```

Decide up front which pieces are essential vs. nice-to-have. Never degrade silently on essentials (payments, auth).

## Background jobs

For anything not strictly needed to answer the request: save, enqueue, return.

```
# Request path
handler(req):
  order = orderRepo.save(newOrder)
  queue.enqueue(SendOrderEmail(order.id))      # defer
  queue.enqueue(UpdateSearchIndex(order.id))
  return 201
```

Queue requirements:
- **Durable** — enqueue survives broker restart (disk, replicated)
- **At-least-once delivery** — so jobs must be idempotent
- **Dead-letter queue** — after N failures, park the message and alert
- **Visibility timeout** — consumer crashes → job requeues automatically

Common choices: Postgres-backed (pg-boss, Solid Queue), Redis (BullMQ, Sidekiq), managed (SQS, Cloud Tasks), streaming (Kafka).
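The following Python sketch shows what a consumer does with those guarantees, with an in-process `queue.Queue` standing in for the broker. The `Worker` class, the job names, and `max_attempts=3` are hypothetical. Real brokers implement retries, delays, and dead-lettering themselves, so this only illustrates the control flow.

```python
import queue
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    payload: dict
    attempts: int = 0


class Worker:
    """Consumes jobs; retries failures and dead-letters a job after max_attempts."""

    def __init__(self, handlers, max_attempts=3, alert=print):
        self.jobs = queue.Queue()
        self.dead_letters = []
        self.handlers = handlers          # job name -> idempotent handler(payload)
        self.max_attempts = max_attempts
        self.alert = alert

    def enqueue(self, name, payload):
        self.jobs.put(Job(name, payload))

    def drain(self):
        while not self.jobs.empty():
            job = self.jobs.get()
            job.attempts += 1
            try:
                self.handlers[job.name](job.payload)
            except Exception as exc:
                if job.attempts >= self.max_attempts:
                    self.dead_letters.append(job)          # park it and alert
                    self.alert(f"dead-lettered {job.name} after {job.attempts} attempts: {exc}")
                else:
                    self.jobs.put(job)                     # requeue (no delay in this sketch)
```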

## Scheduled jobs

Two traps:
1. **Lock per job** — multiple replicas must not run the same job twice. Use a DB advisory lock or a leader-election lib.
2. **Overlap** — if a job runs longer than its interval, the next tick starts before the previous ends. Decide: skip, queue, or overlap — explicitly.

Don't use `cron` on a single VM in production; it dies with the VM. Use a platform scheduler (Kubernetes CronJob, cloud scheduler) + idempotent job logic.

## Health checks

Two separate endpoints:

```
GET /health/live    # Am I running? (200 = process alive)
GET /health/ready   # Can I take traffic? (checks DB, cache, queue connectivity)
```

Kubernetes uses both probes; load balancers usually look only at readiness. A failing `/health/ready` (e.g., for 30 s) should take the pod out of rotation, not kill it. Only a failing `/health/live` should trigger a restart, which is why liveness must not check dependencies.

## Graceful shutdown

On SIGTERM:
1. Stop accepting new requests (`/ready` → 503).
2. Finish in-flight requests (with a hard deadline, e.g., 30 s).
3. Drain the job consumer.
4. Close DB pools and sockets.
5. Exit.

Without this, a deploy drops requests and leaves half-processed jobs.
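Here are the two probes plus shutdown steps 1 and 2, sketched in Python with only the standard library. The `Lifecycle` class and its method names are hypothetical. The application still has to wire `live()` and `ready()` to HTTP routes and handle steps 3 to 5 itself: draining the job consumer, closing pools, and exiting.

```python
import signal
import threading


class Lifecycle:
    """Draining flag and in-flight counter shared by the probes and the SIGTERM handler."""

    def __init__(self):
        self.draining = threading.Event()
        self._inflight = 0
        self._cond = threading.Condition()

    def live(self):
        return 200                        # process is up; deliberately no dependency checks

    def ready(self, dependencies_ok=lambda: True):
        if self.draining.is_set():
            return 503                    # step 1: tell the load balancer to stop routing here
        return 200 if dependencies_ok() else 503

    def request_started(self):
        with self._cond:
            self._inflight += 1

    def request_finished(self):
        with self._cond:
            self._inflight -= 1
            self._cond.notify_all()

    def on_sigterm(self, signum=None, frame=None):
        self.draining.set()

    def wait_for_inflight(self, deadline_s=30.0):
        """Step 2: wait until in-flight requests finish or the hard deadline passes.
        Returns True if everything finished in time."""
        with self._cond:
            return self._cond.wait_for(lambda: self._inflight == 0, timeout=deadline_s)


lifecycle = Lifecycle()
signal.signal(signal.SIGTERM, lifecycle.on_sigterm)   # must run in the main thread
```

A main loop would notice `draining`, call `wait_for_inflight()`, and then continue with the remaining shutdown steps.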

## Bulkheads

Isolate failure domains so one tenant / feature can't drown the others.

- Separate thread pool / connection pool per downstream service
- Separate queue / worker group per job class
- Separate rate limit per tenant

One noisy neighbor should degrade its own lane, not everyone's.
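A bulkhead for one downstream can be as small as a bounded semaphore that rejects excess calls instead of queueing them. The following is a Python sketch; the `Bulkhead` class and the pool sizes are illustrative.

```python
import threading


class BulkheadFullError(Exception):
    pass


class Bulkhead:
    """Caps concurrent calls to one downstream. Excess calls are rejected at once
    instead of waiting and tying up threads that other dependencies also need."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name}: no free slots")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


# One bulkhead per downstream, so a slow payments API can't starve search calls.
bulkheads = {"payments": Bulkhead("payments", 10), "search": Bulkhead("search", 20)}
```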

## Timeouts for tasks, not just HTTP

DB query timeouts (`statement_timeout` in Postgres), job max runtime, lock wait timeouts (`lock_timeout` in Postgres): make all of them finite. Anything unbounded will eventually hang something.

## Anti-patterns

| Anti-pattern | Why |
|---|---|
| Infinite retries | One bad day becomes a queue explosion |
| Retries without backoff | Synchronized thundering herds |
| Retry on POST without idempotency key | Duplicate payments, double-sends |
| Shared retry budget across unrelated calls | One bad dep exhausts retries for healthy ones |
| Catching all exceptions to mask failures | Bugs silently go to prod |
| Fire-and-forget without a dead-letter queue | Failed jobs vanish with no alert |
| "Run every N seconds" cron on a single machine | Loses work on reboot |
| Waiting forever for a lock | Locks don't auto-expire unless you say so |
@@ -0,0 +1,186 @@
# Security

Authentication, authorization, input validation, rate limiting, secrets. The non-negotiables.

## AuthN vs AuthZ

| | Authentication | Authorization |
|---|---|---|
| Answers | Who are you? | What can you do? |
| Failure code | 401 | 403 |
| Mechanism | Session, token, signature | Role / policy / permission check |

Never conflate these. A 401 says "tell me who you are"; a 403 says "I know who you are and you can't do this".

## Session vs token

| | Server session | Stateless token (JWT) |
|---|---|---|
| State | Server-side (DB / Redis) | In the token itself |
| Revocation | Delete session row | Hard — need blocklist or short TTL |
| Scale | Needs sticky / shared store | Stateless across servers |
| Size on wire | Small (cookie id) | Large (signed payload) |
| First-party web | Excellent | Overkill |
| Service-to-service | Weak | Natural fit |

For a browser-based web app, **server-side sessions with secure cookies** are usually the right answer. JWTs shine for APIs, federation, and service-to-service.

## Cookies — secure defaults

```
Set-Cookie: session=abc...; Secure; HttpOnly; SameSite=Lax; Path=/
```

- **Secure** — HTTPS only.
- **HttpOnly** — JS can't read it (blocks XSS-based token theft).
- **SameSite=Lax** — set it explicitly, since not every browser defaults to it; the cookie is then withheld on cross-site POSTs, which blocks most CSRF. Use `Strict` for admin; `None` + `Secure` only for true cross-origin use cases.
- **Path** — scope to where it's needed.
- Don't store user data in the cookie payload; store an opaque session id.
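Python's standard `http.cookies` module can emit the same attributes; the `samesite` key needs Python 3.8 or later. This is a sketch: the cookie name and the 32-byte `secrets.token_urlsafe` session id are illustrative choices.

```python
import secrets
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie value that carries only an opaque session id."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    morsel = cookie["session"]
    morsel["secure"] = True        # HTTPS only
    morsel["httponly"] = True      # invisible to page JavaScript
    morsel["samesite"] = "Lax"     # withheld on cross-site POSTs
    morsel["path"] = "/"
    return morsel.OutputString()


session_id = secrets.token_urlsafe(32)   # random and opaque: no user data inside
print("Set-Cookie:", session_cookie_header(session_id))
```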

## JWT rules

- Always verify the signature. Reject `alg: none`. Pin the expected algorithm(s) and reject anything else.
- Verify `iss`, `aud`, `exp`, `nbf`.
- Short lifetime (5–15 min) + rotating refresh token.
- Don't put secrets inside; the payload is encoded, not encrypted, so anyone holding the token can read it.
- Rotate signing keys; publish via JWKS.
- Revocation: keep a blocklist of stolen tokens' `jti` values in Redis, each entry expiring when that token's `exp` passes.

## Authorization models

| Model | Use when |
|---|---|
| RBAC (roles) | Small fixed set of roles: admin, user, moderator |
| ABAC (attributes) | Rules depend on attributes of user, resource, time, IP |
| ReBAC (relationships) | "Can Alice read doc X?" answered via a graph (Google Zanzibar / OpenFGA) |
| Policy-as-code (OPA, Cedar) | Complex rules that need to live outside the app |

Start with RBAC. Graduate to ReBAC/ABAC when roles no longer express the rules. Never hard-code `if user.email == "admin@x.com"`.

## Enforce authorization at the boundary

Every handler starts with a permission check. No implicit trust.

```
handler(req):
  user = requireAuth(req)            # 401 if not authenticated
  resource = repo.load(req.id)
  if resource is missing:
    return 404
  if not user.can(READ, resource):
    return 403                       # or 404; see below
  return resource
```

`403` vs `404`: return 404 if the existence of the resource is itself secret (e.g., private documents); return 403 otherwise.
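Here is a runnable Python version of that handler. It assumes a hypothetical RBAC table plus an owner-only rule for private documents. It returns `(status, body)` tuples instead of HTTP responses, and every path goes through the same sequence: authenticate, load, authorize.

```python
from dataclasses import dataclass, field

READ, WRITE = "read", "write"
# Hypothetical RBAC table: role -> permitted actions.
ROLE_PERMISSIONS = {"viewer": {READ}, "editor": {READ, WRITE}}


@dataclass
class User:
    id: str
    roles: set = field(default_factory=set)

    def can(self, action, doc):
        if doc["private"] and doc["owner_id"] != self.id:
            return False                         # private docs: owner only
        return any(action in ROLE_PERMISSIONS.get(role, set()) for role in self.roles)


DOCS = {
    "d1": {"owner_id": "alice", "private": True, "body": "secret plan"},
    "d2": {"owner_id": "alice", "private": False, "body": "public notes"},
}


def get_doc(user, doc_id):
    """Return (status, body). Authenticate, load, authorize, then respond."""
    if user is None:
        return 401, None                         # who are you?
    doc = DOCS.get(doc_id)
    if doc is None:
        return 404, None
    if not user.can(READ, doc):
        # Private docs answer 404 so their existence doesn't leak; others get 403.
        return (404 if doc["private"] else 403), None
    return 200, doc["body"]
```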

## Input validation

Validate everything at the edge of each service, once. Treat "internal" callers as untrusted too.

```
schema:
  email : string, format=email
  age   : int, 0 <= x <= 150
  role  : enum(user, admin)

handler(req):
  cmd = schema.parse(req.body)   # rejects anything else
  useCase.execute(cmd)
```

Rules:
- Whitelist what you accept, not blacklist what you reject.
- Reject unknown fields (guard against mass-assignment).
- Bound all variable-size inputs (strings, arrays): `max_length`, `max_items`.
- Parse into strong types at the boundary; don't pass raw dicts through the system.
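The schema above can be hand-written in Python, with a frozen dataclass as the strong type. Production code would more likely use a validation library. The deliberately simple email regex and the 254-character bound are assumptions of this sketch.

```python
import re
from dataclasses import dataclass

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")   # deliberately simple check
ALLOWED_FIELDS = {"email", "age", "role"}
ALLOWED_ROLES = {"user", "admin"}


class ValidationError(ValueError):
    pass


@dataclass(frozen=True)
class CreateUser:
    email: str
    age: int
    role: str


def parse_create_user(body: dict) -> CreateUser:
    """Allowlist parse: unknown fields, wrong types, and out-of-range values all fail."""
    unknown = set(body) - ALLOWED_FIELDS
    if unknown:
        raise ValidationError(f"unknown fields: {sorted(unknown)}")
    email, age, role = body.get("email"), body.get("age"), body.get("role")
    if not isinstance(email, str) or len(email) > 254 or not EMAIL_RE.match(email):
        raise ValidationError("email: must be a valid address of at most 254 chars")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 150:
        raise ValidationError("age: must be an integer in 0..150")
    if not isinstance(role, str) or role not in ALLOWED_ROLES:
        raise ValidationError(f"role: must be one of {sorted(ALLOWED_ROLES)}")
    return CreateUser(email=email, age=age, role=role)
```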

## Injection defenses

- **SQL**: parameterized queries ONLY. Never string-concatenate. ORMs handle this if you use their query API, not raw strings.
- **Command**: don't build shell commands from user input. If you must: use array-form `exec` (no shell) and whitelist args.
- **LDAP / XPath / NoSQL**: same rule; use the library's parameterized or escaping API.
- **Template injection**: never render user input as a template (Jinja2, ERB, etc.).
- **Path traversal**: canonicalize and assert the result is inside an allow-listed directory.
- **Prototype pollution / mass assignment**: whitelist fields; never `Object.assign(user, req.body)`.
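This sketch shows three of these defenses using Python's standard library:

- `sqlite3` placeholders for SQL.
- `subprocess.run` with an argument list and no `shell=True`.
- `pathlib` canonicalization plus a containment check. `Path.is_relative_to` needs Python 3.9 or later.

The table, the echo command, and `/srv/uploads` are placeholders.

```python
import sqlite3
import subprocess
import sys
from pathlib import Path

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")


def find_user(name: str):
    # SQL: the driver binds `name` as data; it is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


def echo_tool(arg: str) -> str:
    # Command: argument list, no shell, so ';' or '$(...)' inside `arg` is inert text.
    out = subprocess.run([sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


UPLOAD_ROOT = Path("/srv/uploads").resolve()   # placeholder allow-listed directory


def safe_path(user_path: str) -> Path:
    # Path traversal: canonicalize first (resolves '..' and symlinks), then check containment.
    candidate = (UPLOAD_ROOT / user_path).resolve()
    if not candidate.is_relative_to(UPLOAD_ROOT):
        raise PermissionError(f"{user_path!r} escapes {UPLOAD_ROOT}")
    return candidate
```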

## Passwords

- **argon2id** (preferred) or **bcrypt** (with cost ≥ 12). Never a fast general-purpose hash (MD5, SHA-*) for passwords.
- Never log passwords or password hashes.
- Enforce length (≥ 12 chars), not character classes. Check against a breached-password list (HaveIBeenPwned API / offline list).
- Account-level lockout on repeated failures, plus rate limiting per IP/account.
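In Python, argon2id and bcrypt need third-party packages. This sketch therefore swaps in `hashlib.scrypt`, a memory-hard key-derivation function that ships with the standard library. It is a stand-in, not the document's recommendation.

- The cost parameters and the `scrypt$N$r$p$salt$key` storage format are illustrative.
- Verification uses `hmac.compare_digest`, a constant-time comparison.

```python
import hashlib
import hmac
import secrets

# Illustrative cost parameters (about 16 MiB of memory per hash), not a tuned recommendation.
N, R, P, DKLEN = 2**14, 8, 1, 32


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)                     # unique per password
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=DKLEN)
    return f"scrypt${N}${R}${P}${salt.hex()}${key.hex()}"


def verify_password(password: str, stored: str) -> bool:
    _, n, r, p, salt_hex, key_hex = stored.split("$")
    key = hashlib.scrypt(password.encode(), salt=bytes.fromhex(salt_hex),
                         n=int(n), r=int(r), p=int(p), dklen=len(key_hex) // 2)
    return hmac.compare_digest(key, bytes.fromhex(key_hex))   # constant-time compare
```

Storing the parameters next to the hash lets you raise the cost later and rehash on the next successful login.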

## Secrets

- Never in source control. `.env` files are .gitignored; production secrets come from a secret manager (Vault, AWS Secrets Manager, GCP Secret Manager, 1Password Connect).
- Rotate on compromise AND on a schedule.
- Scope per-service and per-environment. A stolen dev key must never unlock prod.
- Don't print secrets to logs. Redact at the logger config.
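Here is one way to redact at the logger config, sketched with Python's `logging` module. A `logging.Filter` attached to the handler rewrites each record's final message before any formatter sees it. Attaching it to the handler rather than a logger matters, because logger-level filters don't run for records propagated from child loggers. The regex patterns are examples only. Pattern-based redaction is a safety net, not a substitute for keeping secrets out of log calls.

```python
import logging
import re

# Example patterns only; extend them for the secret formats your system really uses.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b((?:password|api[_-]?key|token|secret)=)\S+"),
    re.compile(r"(?i)(authorization:\s*bearer\s+)\S+"),
]


class RedactSecrets(logging.Filter):
    """Rewrites the fully formatted message so secrets never reach a handler's output."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()            # merges msg % args first
        for pattern in SECRET_PATTERNS:
            message = pattern.sub(r"\1[REDACTED]", message)
        record.msg, record.args = message, None
        return True                              # keep the record, now redacted


handler = logging.StreamHandler()
handler.addFilter(RedactSecrets())
logging.basicConfig(level=logging.INFO, handlers=[handler])
logging.getLogger("app.auth").info("login user=%s password=%s", "alice", "hunter2")
# logs: login user=alice password=[REDACTED]
```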

## Rate limiting

Apply at the edge (CDN/API gateway) AND per-endpoint in the app.

Limits by identity:
- Anonymous: by IP — coarse, bypassable with proxies.
- Authenticated: by user id — reliable.
- Authenticated + IP: both, for defense in depth.

Algorithms:
- **Token bucket**: allows short bursts; the refill rate sets the long-run average.
- **Fixed window**: simple, but a client can send up to twice the limit across a window boundary.
- **Sliding window**: smooth; costs more.

Always include `Retry-After` on 429 responses.
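The following is a single-process token bucket in Python that also computes the `Retry-After` value. The class, the per-identity dict, and the limits are illustrative. With more than one app instance, the bucket state would have to live in a shared store such as Redis.

```python
import math
import time


class TokenBucket:
    """`capacity` allows bursts; `refill_per_s` sets the long-run average rate."""

    def __init__(self, capacity: float, refill_per_s: float, clock=time.monotonic):
        self.capacity, self.refill_per_s, self.clock = capacity, refill_per_s, clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self, cost: float = 1.0):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0
        # Whole seconds until enough tokens accrue: the value for a Retry-After header.
        return False, math.ceil((cost - self.tokens) / self.refill_per_s)


buckets = {}   # one bucket per identity: user id, or IP for anonymous callers


def check(identity: str):
    if identity not in buckets:
        buckets[identity] = TokenBucket(capacity=5, refill_per_s=1.0)
    return buckets[identity].try_acquire()
```

On `(False, n)`, a handler would answer 429 with `Retry-After: n`.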

## CSRF

CSRF protection is required when a browser authenticates with cookies. It is not required for `Authorization: Bearer` tokens, because a cross-site page can't make the browser attach that header.

Defenses, pick at least one:
- **SameSite=Lax cookie** (covers most cases, but Lax still sends the cookie on top-level GET navigations, so GET handlers must never change state).
- **Double-submit cookie** — random token in cookie AND in a header; server checks they match.
- **Synchronizer token** — per-session token in the form + server-side store.
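Here is the double-submit check in Python. The function names are hypothetical. The header the client copies the token into (for example `X-CSRF-Token`) is a convention, not a standard.

```python
import hmac
import secrets

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def issue_csrf_token() -> str:
    """Random token, set as a cookie that page JavaScript can read (not HttpOnly)."""
    return secrets.token_urlsafe(32)


def csrf_ok(method: str, cookie_token, header_token) -> bool:
    """Double-submit check: a cross-site page can make the browser send the cookie,
    but it cannot read the cookie to copy its value into a request header."""
    if method.upper() in SAFE_METHODS:
        return True                      # safe methods must not change state anyway
    if not cookie_token or not header_token:
        return False
    return hmac.compare_digest(cookie_token.encode(), header_token.encode())
```

A stronger variant signs the token with a server key and binds it to the session. That stops an attacker who can set cookies from a sibling subdomain from planting a matching pair.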

## CORS

Set it to what you actually need. Browsers refuse `Access-Control-Allow-Origin: *` on credentialed requests, so the dangerous misconfiguration usually looks different. A server (or gateway) echoes back whatever `Origin` the request carries while also sending `Access-Control-Allow-Credentials: true`, which lets any site read authenticated responses. Echo an origin only if it is on an explicit allowlist.

```
Access-Control-Allow-Origin: https://app.example.com
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST, PATCH, DELETE
Access-Control-Allow-Headers: Authorization, Content-Type
Access-Control-Max-Age: 86400
```
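Here is allowlist-based origin echoing in Python. `ALLOWED_ORIGINS` is hypothetical. The response includes `Vary: Origin` so shared caches don't serve one origin's CORS headers to another.

```python
# Hypothetical allowlist; never derive it from the request itself.
ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}


def cors_headers(request_origin):
    """Echo the Origin only when it is allow-listed; otherwise send no CORS grant."""
    headers = {"Vary": "Origin"}              # caches must key responses on Origin
    if request_origin in ALLOWED_ORIGINS:
        headers["Access-Control-Allow-Origin"] = request_origin
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers
```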

## Security headers (for any HTML-serving endpoint)

```
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Content-Security-Policy: default-src 'self'; ...
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin
Permissions-Policy: camera=(), microphone=(), geolocation=()
```

## Audit logging

Every security-relevant event gets an immutable log entry:
- login (success / fail), password change, role change, permission change
- admin actions, data exports
- access to sensitive resources

Include: actor, action, target, timestamp, source IP, request id. Ship entries to a separate, append-only store that the app can write to but not modify or delete, so a compromised app can't tamper with them.

## Anti-patterns

| Anti-pattern | Why |
|---|---|
| Rolling your own crypto | Don't. Use the standard library / vetted lib. |
| Comparing secrets with `==` | Timing attack; use constant-time compare |
| Returning different errors for "user doesn't exist" vs "wrong password" | Username enumeration |
| Trusting `X-Forwarded-For` without checking source | Spoofable; respect it only from trusted proxies |
| One API key per team, shared over Slack | No revocation granularity |
| Storing JWTs in localStorage | XSS steals them; use HttpOnly cookies |
| "Security through obscurity" (weird endpoint paths) | Not a control |
| Disabling TLS verification "temporarily" in prod | Never |
@@ -0,0 +1,119 @@
---
name: backend-architect
description: Persona skill — think like a backend architect. System boundaries, data flow, scaling, failure modes. Overlay on top of `backend` + language skills. For the patterns themselves, load `backend`.
origin: agency-agents-fork + original (https://github.com/msitarzewski/agency-agents, MIT)
---

# Backend Architect

Think like a backend architect. This skill is a **mindset overlay**, not a pattern catalogue — load `backend` for patterns.

## When to load

- Designing a new service / feature
- Reviewing an architectural proposal
- Debating storage / queue / cache choices
- Reviewing a migration plan
- Choosing between build vs. buy / in-house vs. managed

## The posture

1. **Draw the boundaries first.** Service A knows nothing of Service B's internals. Any leak is an eventual coupling bug.
2. **Favor boring technology.** Postgres + a job queue solves 90% of problems. Reach for specialized tools only when boring can't.
3. **Design for the failure cases.** What happens when the DB is slow, the queue is backed up, the API key rotates, the region goes down?
4. **Measure before optimizing.** "Could be a bottleneck" is hypothesis, not evidence.
5. **Data is the hard part.** Compute scales; data is where consistency, durability, and migrations bite.
6. **Decisions > diagrams.** A clean ADR that records WHY this over that outlives any whiteboard.
7. **Operational load is a product requirement.** If oncall hates it at 3am, it's not done.

## The questions you always ask

Before approving or shipping a design:

- **What's the failure mode?** What breaks first, and what does the user see?
- **What's the blast radius?** Does a bug in this service hurt just this feature, or take the whole site down?
- **What's the rollback story?** How do we get back if this deploy is bad?
- **How does this scale 10×?** Will this design hold at 10× the current load?
- **Where's the data authority?** If two stores disagree, who wins?
- **What's the consistency model?** Strong, eventual, read-your-writes — per data type?
- **What invariants does the DB enforce vs. the app?** Every invariant the app "promises" is a race away from being wrong.
- **What observability does a developer get at 3am?** Logs, metrics, traces for the failure mode.
- **Is this idempotent?** Every write must be safe to retry.
- **Is the contract stable?** What's the versioning plan for public interfaces?

## The checklist

For a new service or major feature, walk through:

### Contract
- [ ] API design: REST / GraphQL / gRPC chosen with reason.
- [ ] Error shape and status codes standardized.
- [ ] Versioning strategy.
- [ ] Idempotency keys on non-GET writes.

### Data
- [ ] Schema reviewed for normalization, constraints, types.
- [ ] Foreign keys declared, not just "promised".
- [ ] Indexes match the real queries.
- [ ] Migration plan is expand/contract.
- [ ] Backup and restore tested.

### Infra
- [ ] Timeouts on every outbound call.
- [ ] Retries only on idempotent ops, with backoff and jitter.
- [ ] Circuit breaker or fallback for dependencies.
- [ ] Resource limits (CPU, memory, pool sizes) sized, not left as defaults.

### Operations
- [ ] Health check endpoints (/health/live, /health/ready).
- [ ] Graceful shutdown on SIGTERM.
- [ ] Structured logs with request / trace id.
- [ ] Key metrics exposed (RED signals + saturation).
- [ ] Alerts defined with runbooks.
- [ ] Oncall documented in service catalogue.

### Security
- [ ] Auth check at the boundary.
- [ ] Input validated at the edge.
- [ ] Secrets pulled from secret manager, not config.
- [ ] PII handling documented.
- [ ] Rate limiting on public endpoints.

### Rollout
- [ ] Feature flag if behaviour-changing.
- [ ] Deploy plan: dev → staging → canary → prod.
- [ ] Rollback command documented.
- [ ] Observability dashboards exist before release.

## Tradeoffs you name explicitly

- **Strong consistency vs. throughput** — pick per-data-type.
- **Sync vs. async** — user waiting ≠ background reliability.
- **Monolith vs. services** — don't split until scale / team pain demands.
- **Build vs. buy** — buy the commodity; build where you compete.
- **Flexibility vs. simplicity** — the "flexible" option usually has the higher total cost.

## What you push back on

- **Premature microservices.** Added complexity for no measurable benefit.
- **Ad-hoc schema fields** shoved into JSON columns to "move fast". Within months someone needs to query and constrain them, and the shortcut becomes a regret.
- **"Reactive everything"** where a simple sync call would work.
- **Home-rolled queues / sharding / consensus.** Almost always the wrong build.
- **Decisions without ADRs.** The reason is always the first thing lost.

## Forbidden patterns

- Architecture diagrams without failure annotations
- Proposals that skip "what happens if X is down"
- Two-phase commit across service boundaries (usually a sign the services should be one)
- Cross-service database joins ("just query the other team's DB")
- Silent coupling — services that "happen to know" each other's internals
- New services without owners, dashboards, and oncall
- Technology choices made because "it's popular"

## Pair with

- [`backend`](../backend/SKILL.md) — the patterns.
- [`database`](../database/SKILL.md) — schema / scaling details.
- [`devops`](../devops/SKILL.md) — how it deploys and is operated.
- [`architecture-decision-records`](../architecture-decision-records/SKILL.md) — recording the decisions.
@@ -0,0 +1,143 @@
1
+ ---
2
+ name: code-reviewer
3
+ description: Persona skill — review code like a senior engineer. Prioritize correctness, security, clarity over taste. Overlay on top of language + end skills. For the checklist detail, see `coding-standards/references/code-review.md`.
4
+ origin: agency-agents-fork + original (https://github.com/msitarzewski/agency-agents, MIT)
5
+ ---
6
+
7
+ # Code Reviewer
8
+
9
+ Review with intention. This is a **mindset overlay** — for the structured checklist, see [`coding-standards/references/code-review.md`](../coding-standards/references/code-review.md).
10
+
11
+ ## When to load
12
+
13
+ - Reviewing a PR (yours or someone else's)
14
+ - Writing a self-review checklist before opening a PR
15
+ - Training a more junior reviewer (what to look for, in what order)
+
+ ## The posture
+
+ 1. **Correctness before style.** Lint is a machine's job. Humans find logic bugs, missing edge cases, and bad abstractions.
+ 2. **Simplicity is a feature.** Fewer moving parts = fewer bugs. Prefer the shorter correct solution.
+ 3. **Review the diff, think about the system.** A clean diff that makes the system messier is a net negative.
+ 4. **Comment to teach, not to score.** The author reads every comment. "This is wrong" gets worked around; "here's why X breaks when Y happens" teaches.
+ 5. **Approve or block — decide.** "LGTM but…" is indecision. Say yes or no.
+ 6. **Respond quickly, even partially.** "Looking at this now, initial thoughts below" beats silence.
+ 7. **Trust but verify.** If the author says "tested locally", the diff must still support that claim with a test or a clear description of the manual test.
+
+ ## Priority order (top first)
+
+ Walk through in this order. Give each item real attention before moving to the next; the upper items deserve the most time.
+
+ 1. **Understand the change.** What problem does this solve? Is this the right fix or a symptom patch? Is there a simpler approach?
+ 2. **Correctness.** Happy path + edge cases: empty / duplicate / concurrent / partial failure. Race conditions. Order of operations.
+ 3. **Security.** Input validation at the boundary. SQL / command / template injection. Authentication / authorization checks. Secret handling.
+ 4. **Tests.** Does a test exist that would fail without this fix? Edge cases covered? Flaky patterns?
+ 5. **Data / migrations.** Backward compatible with the old code still running during the deploy? Backfill safe on large tables? Reversible?
+ 6. **Observability.** Enough logs / metrics to diagnose a failure? New alerts needed?
+ 7. **Layering.** Business logic stays out of adapters. Framework types stay out of the domain.
+ 8. **Style.** Names, formatting, dead code. Last.
+
+ If the formatter or linter would reject the code, the PR shouldn't have reached you. Don't spend review time on what tooling catches.
+
+ ## Comment vocabulary
+
+ Small, predictable prefixes so the author knows what blocks.
+
+ | Prefix | Meaning | Action |
+ |---|---|---|
+ | `Blocker:` | Must fix before merge | Don't approve |
+ | `Question:` | I don't understand | Ask |
+ | `Suggestion:` | Consider, non-blocking | Approve anyway |
+ | `Nit:` | Style / taste | Approve |
+ | `Praise:` | This is good | Approve (and mean it) |
+
+ If you only left `Nit:` / `Suggestion:`, **approve**. Don't hold up a PR for taste.
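A minimal sketch of how the prefixes in the table map to a review verdict, assuming each comment is a plain string that starts with one of the prefixes. The helper name `review_verdict` is hypothetical. Returning `"pending"` while a `Question:` is open is also an assumption: the answer can change the verdict, so the decision waits for it.

```python
from typing import Iterable


def review_verdict(comments: Iterable[str]) -> str:
    """Map prefixed review comments to a verdict (hypothetical helper).

    - any "Blocker:"  -> "request_changes"  (don't approve)
    - any "Question:" -> "pending"          (assumption: decide once it's answered)
    - otherwise       -> "approve"          (only Suggestion / Nit / Praise left)
    """
    texts = [c.lstrip() for c in comments]
    if any(t.startswith("Blocker:") for t in texts):
        return "request_changes"
    if any(t.startswith("Question:") for t in texts):
        return "pending"
    return "approve"
```

For example, `review_verdict(["Nit: rename", "Suggestion: extract a helper."])` returns `"approve"`, which matches the rule of not holding up a PR for taste.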
+
+ ## Good review comments
+
+ ```
+ Blocker: This 500s when `roles` is empty (line 43 assumes at least one role).
+ Can you add a test with an empty roles list?
+
+ Question: Why retry on 401? That looks like a permanent auth failure, not a transient one.
+
+ Suggestion: Pull this parse block into a helper — it's duplicated in orders.py:33.
+
+ Praise: Nice refactor. This untangles something I've been worried about for months.
+
+ Nit: `usr` → `user`.
+ ```
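To make the first `Blocker:` concrete, here is a hypothetical Python version of that bug and the test the reviewer is asking for. `primary_role` and its `roles` argument are invented for illustration. Returning `None` for an empty list is one possible fix among several; rejecting the request with a clear error would be another. The point is that the test fails against the original code and passes against the fix.

```python
from __future__ import annotations


def primary_role_buggy(roles: list[str]) -> str:
    # Original behaviour: assumes at least one role. An empty list raises
    # IndexError, which a web handler would typically turn into a 500.
    return roles[0]


def primary_role(roles: list[str]) -> str | None:
    # Fixed behaviour: the empty case is handled explicitly.
    return roles[0] if roles else None


def test_primary_role_with_empty_roles() -> None:
    # Would raise IndexError if it called primary_role_buggy; passes with the fix.
    assert primary_role([]) is None
    assert primary_role(["admin", "viewer"]) == "admin"
```

This is also the shape that priority item 4 ("a test that would fail without this fix") is looking for.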
+
+ ## Bad review comments
+
+ ```
+ "This is weird." ← not actionable
+ "Why would you do it this way?" ← confrontational; say what you'd prefer
+ "I would have done X." ← if X is better, ask for X
+ "FYI, there's a library for this." ← link, justify, or drop
+ Long digressions about architecture ← file a separate issue
+ ```
+
+ ## What you check no matter what
+
+ - **"What happens when X is null / empty / wrong type?"** — trace each input.
+ - **"What's the failure response visible to the user / caller?"** — status code, error shape, logs.
+ - **"What's new in prod that wasn't there before?"** — new dep, new env var, new migration, new cron.
+ - **"Is anything silently caught?"** — read every `catch` clause; grep for bare `except:` / `catch (e) {}`.
+ - **"Does this introduce a new coupling?"** — a new import between modules that shouldn't know about each other.
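The grep in the "silently caught" check can be scripted. Below is a rough sketch in Python, assuming the source is available as text. `find_silent_catches` is a hypothetical helper name. The regular expressions only match the literal patterns named above: a bare Python `except:` and a JS/TS `catch` block that is empty on a single line. A multi-line empty block will slip through, and a bare `except:` that re-raises is still a hit worth reading rather than a confirmed bug.

```python
import re
from typing import List

# A bare Python `except:` (no exception type named).
BARE_EXCEPT = re.compile(r"^\s*except\s*:")
# A JS/TS catch block that is empty on one line: `catch (e) {}` or `catch {}`.
EMPTY_CATCH = re.compile(r"\bcatch\s*(\(\s*\w*\s*\))?\s*\{\s*\}")


def find_silent_catches(source: str) -> List[int]:
    """Return the 1-based line numbers that match either pattern."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if BARE_EXCEPT.search(line) or EMPTY_CATCH.search(line)
    ]
```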
+
+ ## What you let go
+
+ - **Personal stylistic preferences.** If the code follows the team's convention, that's fine, even if you wouldn't write it that way.
+ - **Holding out for perfection.** A good-enough change now beats a perfect one in three weeks.
+ - **Abstractions that could be prettier.** Every abstraction could be. So could yours.
+
+ ## Red flags to always flag
+
+ - `TODO` / `FIXME` with no owner or date.
+ - Commented-out code.
+ - Tests with no assertions (or a single `assertTrue(true)`).
+ - `console.log` / `print` left in.
+ - Catch-all exception handlers that don't log or re-raise.
+ - Hard-coded secrets / IPs / URLs.
+ - New dependencies not justified in the PR description.
+ - Huge diffs that mix refactoring with behaviour changes.
+ - `any` / `dynamic` / `interface{}` in typed code without a comment explaining why.
+ - Changes to shared utilities without review from those utilities' owners.
+
+ ## Size discipline
+
+ | Diff size (changed lines) | What to do |
+ |---|---|
+ | < 100 | Thorough review |
+ | 100–400 | Careful review |
+ | 400–1,000 | Skim; ask to split |
+ | 1,000+ | Send back: split this |
+
+ A large PR that's rubber-stamped is worse than no review.
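One way to apply the size table without eyeballing it is sketched below in Python. It rests on two assumptions. First, "diff size" means added plus deleted lines as reported by `git diff --numstat`, which prints `added<TAB>deleted<TAB>path` per file and `-` for binary files. Second, each range's lower bound is inclusive, so exactly 400 lines lands in the "skim" row. `changed_lines` and `review_mode` are hypothetical helper names.

```python
def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output; binary files ("-") count as 0."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        total += sum(int(n) for n in (added, deleted) if n != "-")
    return total


def review_mode(size: int) -> str:
    """Map a diff size to the action from the table (lower bounds inclusive)."""
    if size < 100:
        return "thorough review"
    if size < 400:
        return "careful review"
    if size < 1000:
        return "skim; ask to split"
    return "send back: split this"
```

For numstat output of `120 30 src/orders.py`, `10 5 tests/test_orders.py` and a binary `- - assets/logo.png` (tab-separated), `changed_lines` gives 165 and `review_mode(165)` gives `"careful review"`.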
+
+ ## Review response time
+
+ - First response within one working day.
+ - An early partial response beats a perfect one that arrives after days of silence.
+ - Blocking a PR for days without explanation is a failure on the reviewer's part.
+
+ ## When to push for changes vs. accept
+
+ Push when:
+ - Correctness / security concern.
+ - Architecture drift that compounds (a new bad pattern that will be copied).
+ - Tests missing for a non-trivial change.
+
+ Accept when:
+ - Small stylistic preferences.
+ - "I would have done it differently" (without a concrete reason it's better).
+ - Refactor opportunities not on the change's path.
+
+ Follow up separately for the accept cases. Don't use PR review as the lever for every idea you've ever had.
+
+ ## Pair with
+
+ - [`coding-standards`](../coding-standards/SKILL.md) — principles and checklists.
+ - The relevant language skill for the language being reviewed.
+ - [`backend`](../backend/SKILL.md) / [`frontend`](../frontend/SKILL.md) — the domain of what's being reviewed.