npm - @coralai/sps-cli - Versions diffs - 0.42.0 → 0.43.0 - Mend

@coralai/sps-cli 0.42.0 → 0.43.0


Files changed (109)

package/README.md +34 -3
package/dist/commands/projectInit.d.ts.map +1 -1
package/dist/commands/projectInit.js +40 -53
package/dist/commands/projectInit.js.map +1 -1
package/dist/commands/skillCommand.d.ts +2 -0
package/dist/commands/skillCommand.d.ts.map +1 -0
package/dist/commands/skillCommand.js +235 -0
package/dist/commands/skillCommand.js.map +1 -0
package/dist/core/skillStore.d.ts +46 -0
package/dist/core/skillStore.d.ts.map +1 -0
package/dist/core/skillStore.js +197 -0
package/dist/core/skillStore.js.map +1 -0
package/dist/core/skillStore.test.d.ts +2 -0
package/dist/core/skillStore.test.d.ts.map +1 -0
package/dist/core/skillStore.test.js +190 -0
package/dist/core/skillStore.test.js.map +1 -0
package/dist/main.js +19 -17
package/dist/main.js.map +1 -1
package/package.json +1 -1
package/skills/architecture-decision-records/SKILL.md +207 -0
package/skills/backend/SKILL.md +62 -0
package/skills/backend/references/api-design.md +168 -0
package/skills/backend/references/caching.md +181 -0
package/skills/backend/references/data-access.md +173 -0
package/skills/backend/references/layering.md +181 -0
package/skills/backend/references/observability.md +190 -0
package/skills/backend/references/resilience.md +201 -0
package/skills/backend/references/security.md +186 -0
package/skills/backend-architect/SKILL.md +119 -0
package/skills/code-reviewer/SKILL.md +143 -0
package/skills/coding-standards/SKILL.md +60 -0
package/skills/coding-standards/references/clean-code.md +258 -0
package/skills/coding-standards/references/code-review.md +192 -0
package/skills/coding-standards/references/commits-and-prs.md +226 -0
package/skills/coding-standards/references/error-strategy.md +193 -0
package/skills/coding-standards/references/naming.md +185 -0
package/skills/coding-standards/references/tdd.md +171 -0
package/skills/database/SKILL.md +53 -0
package/skills/database/references/indexing.md +190 -0
package/skills/database/references/migrations.md +199 -0
package/skills/database/references/nosql.md +185 -0
package/skills/database/references/queries.md +295 -0
package/skills/database/references/scaling.md +203 -0
package/skills/database/references/schema.md +191 -0
package/skills/database-optimizer/SKILL.md +168 -0
package/skills/debugging-workflow/SKILL.md +244 -0
package/skills/devops/SKILL.md +55 -0
package/skills/devops/references/ci-cd.md +204 -0
package/skills/devops/references/containers.md +272 -0
package/skills/devops/references/deploy.md +201 -0
package/skills/devops/references/iac.md +252 -0
package/skills/devops/references/observability.md +228 -0
package/skills/devops/references/secrets.md +178 -0
package/skills/devops-automator/SKILL.md +164 -0
package/skills/frontend/SKILL.md +52 -0
package/skills/frontend/references/accessibility.md +222 -0
package/skills/frontend/references/components.md +206 -0
package/skills/frontend/references/performance.md +219 -0
package/skills/frontend/references/routing.md +209 -0
package/skills/frontend/references/state.md +190 -0
package/skills/frontend/references/testing.md +216 -0
package/skills/frontend-developer/SKILL.md +115 -0
package/skills/git-workflow/SKILL.md +355 -0
package/skills/golang/SKILL.md +49 -0
package/skills/golang/references/concurrency.md +284 -0
package/skills/golang/references/errors.md +241 -0
package/skills/golang/references/idioms.md +285 -0
package/skills/golang/references/testing.md +238 -0
package/skills/java/SKILL.md +50 -0
package/skills/java/references/concurrency.md +194 -0
package/skills/java/references/idioms.md +283 -0
package/skills/java/references/testing.md +228 -0
package/skills/kotlin/SKILL.md +47 -0
package/skills/kotlin/references/coroutines.md +240 -0
package/skills/kotlin/references/idioms.md +268 -0
package/skills/kotlin/references/testing.md +219 -0
package/skills/mobile/SKILL.md +50 -0
package/skills/mobile/references/architecture.md +204 -0
package/skills/mobile/references/navigation.md +158 -0
package/skills/mobile/references/performance.md +152 -0
package/skills/mobile/references/platform.md +166 -0
package/skills/mobile/references/state-and-data.md +174 -0
package/skills/python/SKILL.md +51 -0
package/skills/python/THIRD_PARTY.md +14 -0
package/skills/python/references/async.md +218 -0
package/skills/python/references/error-handling.md +254 -0
package/skills/python/references/idioms.md +279 -0
package/skills/python/references/packaging.md +233 -0
package/skills/python/references/testing.md +269 -0
package/skills/python/references/typing.md +292 -0
package/skills/qa-tester/SKILL.md +186 -0
package/skills/rust/SKILL.md +50 -0
package/skills/rust/references/async.md +224 -0
package/skills/rust/references/errors.md +240 -0
package/skills/rust/references/ownership.md +263 -0
package/skills/rust/references/testing.md +274 -0
package/skills/rust/references/traits.md +250 -0
package/skills/security-engineer/SKILL.md +157 -0
package/skills/swift/SKILL.md +48 -0
package/skills/swift/references/concurrency.md +280 -0
package/skills/swift/references/idioms.md +334 -0
package/skills/swift/references/testing.md +229 -0
package/skills/typescript/SKILL.md +51 -0
package/skills/typescript/references/async.md +241 -0
package/skills/typescript/references/errors.md +208 -0
package/skills/typescript/references/idioms.md +246 -0
package/skills/typescript/references/testing.md +225 -0
package/skills/typescript/references/tooling.md +208 -0
package/skills/typescript/references/types.md +259 -0

package/skills/devops/references/ci-cd.md ADDED

@@ -0,0 +1,204 @@
+# CI / CD
+Pipelines, caching, parallelism, artifacts, gates.
+## Pipeline stages — the standard shape
+```
+┌──────────┐ ┌───────┐ ┌──────┐ ┌──────┐ ┌──────────┐ ┌──────────┐
+│ checkout │▶│ lint  │▶│ test │▶│ build│▶│ scan/sign│▶│  deploy  │
+└──────────┘ └───────┘ └──────┘ └──────┘ └──────────┘ └──────────┘
+                  │
+                  └─► parallel jobs where possible
+```
+Order matters: cheap-and-fast first (lint, typecheck). Expensive and slow last (E2E, image build). Failing lint should fail the pipeline in under a minute.
+## Keep CI fast
+Target: **< 10 min end-to-end on a typical change**. Slow CI punishes every commit.
+Levers:
+- **Cache dependencies.** Lockfile as cache key. `actions/cache` / equivalent.
+- **Parallelize independent jobs.** Lint + typecheck + unit tests can all run at once.
+- **Shard tests.** A 10-minute test suite split into 4 shards = 2.5 min each.
+- **Run integration / E2E on critical paths only**, or only on main.
+- **Test only what changed** for monorepos. `nx affected`, `turbo run --filter`, `bazel query`.
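The sharding lever above can be sketched in a few lines. This is a minimal illustration, not how any particular test runner does it: the function names `shard_for` and `select_tests` are hypothetical, and hashing balances the number of tests per shard, not their duration.

```python
import hashlib


def shard_for(test_id: str, total_shards: int) -> int:
    """Map a test ID to a shard index that is stable across runs and machines."""
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_shards


def select_tests(test_ids: list[str], shard_index: int, total_shards: int) -> list[str]:
    """Return the tests this shard should run, preserving input order."""
    return [t for t in test_ids if shard_for(t, total_shards) == shard_index]
```

Each CI job would call `select_tests` with its own shard index. Every test lands in exactly one shard, so the shards together cover the full suite.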
+## Cache keys
+```yaml
+# ✅ stable, invalidates only when deps change
+key: "${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}"
+# ❌ too narrow — cache misses every run
+key: "${{ runner.os }}-node-${{ github.sha }}"
+# ❌ too broad — may return incompatible cache
+key: "${{ runner.os }}-node"
+```
+Cache the right things:
+- `node_modules` / `pip wheels` / `cargo target` / `gradle caches` — big wins.
+- Build output (`.next`, `dist`) if subsequent jobs use it.
+- Don't cache test reports or transient artifacts.
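To make the key design above concrete, here is a rough Python equivalent of the "stable" key: OS, tool name, and a hash of the lockfile contents. The function name `cache_key` and its parameters are assumptions for illustration; CI platforms compute this for you.

```python
import hashlib


def cache_key(os_name: str, tool: str, lockfile_bytes: bytes) -> str:
    """Build a key that changes only when the OS, tool, or lockfile changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{os_name}-{tool}-{digest}"
```

Two runs with the same lockfile produce the same key (cache hit). Any dependency change alters the hash and forces a fresh cache.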
+## Build artifacts, promote them
+Build once. The same artifact flows dev → staging → prod.
+```
+PR:    build → test                (no artifact published)
+main:  build → test → publish      (publish image:sha)
+deploy dev:     pull image:sha → deploy
+deploy staging: pull image:sha → deploy      (same image)
+deploy prod:    pull image:sha → deploy      (same image)
+```
+Building again per environment re-runs tests and invites "it worked in staging" surprises when the new build differs (new dependency version, timestamp).
+## Artifact tagging
+```
+image: myapp:sha-abc1234          # immutable, references a commit
+image: myapp:v1.2.3               # semver release
+image: myapp:main                 # mutable — latest main
+image: myapp:latest               # mutable — last push to whatever
+```
+Deploy by immutable tag (`sha-abc1234` or `v1.2.3`). Mutable tags (`main`, `latest`) are convenient for humans but make rollbacks ambiguous.
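A deploy script could enforce this rule with a small check. The sketch below assumes the `sha-<commit>` and `vX.Y.Z` naming conventions shown above; `is_immutable_tag` is a hypothetical helper, and a registry only guarantees immutability if it is configured to reject tag overwrites.

```python
import re

_SHA_TAG = re.compile(r"^sha-[0-9a-f]{7,40}$")
_SEMVER_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


def is_immutable_tag(image_ref: str) -> bool:
    """True when the tag follows the sha-<commit> or vX.Y.Z convention."""
    _, sep, tag = image_ref.rpartition(":")
    if not sep:
        return False  # no tag at all resolves to the mutable default
    return bool(_SHA_TAG.match(tag) or _SEMVER_TAG.match(tag))
```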
+## Gates and approvals
+Autodeploy to dev. Require a check/approval for staging → prod (or for sensitive envs).
+```
+merge to main
+  ▶ deploy dev (auto)
+  ▶ deploy staging (auto, smoke tests)
+  ▶ deploy prod (manual approval)
+```
+Manual gate is the pause for "should this actually ship now?" — release freeze, cross-team sync.
+## Required status checks
+On the PR branch, block merge unless:
+- Lint / typecheck pass
+- Unit tests pass
+- Coverage above threshold (if enforced)
+- Review approval received
+Configure in the VCS (GitHub branch protection, GitLab protected branches and merge request approval rules).
+## Secrets in CI
+- **Never** store secrets in CI config files or env files checked into the repo.
+- CI platforms have secret stores (GitHub Secrets, GitLab Variables, environment-scoped).
+- Scope per environment (`PROD_DB_URL`, not a shared one).
+- Prefer short-lived credentials (OIDC) over long-lived keys.
+  ```yaml
+  # GitHub Actions → AWS via OIDC, no AWS_ACCESS_KEY stored in GitHub
+  permissions:
+    id-token: write   # lets the job request an OIDC token
+  steps:
+    - uses: aws-actions/configure-aws-credentials@v4
+      with:
+        role-to-assume: arn:aws:iam::...:role/github-prod
+        aws-region: us-east-1   # example region; the action requires one
+  ```
+- Mask secrets in logs (most CI tools do this automatically).
+## Supply-chain security
+- **Pin** third-party actions / images by SHA, not version tag.
+  ```yaml
+  uses: actions/checkout@11bd71901bbe5b1630ceea73d27796261f9...   # v4.2.2
+  ```
+  Tags are mutable; an attacker who takes over the repo can repoint a tag.
+- **Dependency scanning**: Dependabot / Renovate for updates; Snyk / Trivy / Grype for vulnerabilities.
+- **SBOM generation**: produce one per build, store it.
+- **Image signing**: cosign + Sigstore; verify at deploy.
+## Matrix builds
+For multi-version / multi-OS testing:
+```yaml
+strategy:
+  matrix:
+    node: [18, 20, 22]
+    os: [ubuntu-latest, macos-latest]
+```
+Keep matrices narrow — `3 × 2 = 6` jobs, not 30. CI-minutes add up.
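The job count is the product of the axis lengths, which is why matrices grow quickly. A short sketch of the expansion (the helper name `expand_matrix` is hypothetical):

```python
from itertools import product


def expand_matrix(matrix: dict[str, list]) -> list[dict]:
    """Expand a build matrix into one dict per job (Cartesian product of axes)."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


jobs = expand_matrix({"node": [18, 20, 22], "os": ["ubuntu-latest", "macos-latest"]})
print(len(jobs))  # 6
```

Adding a third axis with five values would turn these 6 jobs into 30.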
+## Flaky tests — triage immediately
+One flaky test poisons the signal.
+- Tag the test as flaky, move to a separate job, investigate within a week.
+- A test that fails intermittently is ALWAYS a bug: race condition, shared state, timing assumption. Don't accept "just retry".
+- Quarantine + retry is a short-term fix only. Delete the test rather than leave it quarantined forever.
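One way to spot flaky tests automatically, assuming you can export per-run results from CI: a test that both passed and failed on the same commit cannot be explained by a code change. The record shape `(test_id, commit_sha, passed)` and the name `find_flaky` are assumptions for this sketch.

```python
from collections import defaultdict


def find_flaky(results: list[tuple[str, str, bool]]) -> set[str]:
    """Return tests that produced both outcomes on at least one commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test_id, sha, passed in results:
        outcomes[(test_id, sha)].add(passed)
    return {test_id for (test_id, _), seen in outcomes.items() if len(seen) == 2}
```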
+## Pull-request vs. main pipelines
+Different triggers, often different scopes:
+| Trigger | Run |
+|---|---|
+| PR | Lint, typecheck, unit, key integration |
+| PR (target main) | + E2E happy path |
+| Merge to main | + build, publish artifact, deploy dev / staging |
+| Tag / release | + prod deploy gate |
+| Scheduled | + full E2E, perf tests, security scans |
+Don't run everything on every PR. Keep PRs fast; save heavy tests for main.
+## Monorepo considerations
+- **Change-aware testing**: don't rebuild / test the whole monorepo if only one package changed.
+- **Project graph tools**: Turborepo, Nx, Bazel, Pants.
+- **Shared cache**: remote cache (Turbo Cloud, Nx Cloud, BuildBuddy) pays for itself on larger teams.
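Change-aware testing boils down to a walk over the reverse dependency graph: start from the changed packages and collect everything that depends on them, directly or transitively. The sketch below is a simplified stand-in for what the project graph tools compute; the `deps` mapping format and the name `affected` are assumptions.

```python
from collections import deque


def affected(changed: set[str], deps: dict[str, set[str]]) -> set[str]:
    """deps maps package -> packages it depends on.

    Returns the changed packages plus every package that depends on them.
    """
    dependents: dict[str, set[str]] = {}
    for pkg, needs in deps.items():
        for dep in needs:
            dependents.setdefault(dep, set()).add(pkg)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```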
+## Deploy previews
+Ephemeral environments per PR:
+```
+PR #123 → https://pr-123.preview.myapp.com
+```
+Great for frontend, reasonable for APIs, expensive for heavy backends. Tear down on PR close.
+Tools: Vercel / Netlify / Cloudflare for frontends; Render / Fly / Kubernetes preview envs / Garden / Uffizzi for full-stack.
+## Concurrency control
+Don't let two prod deploys race:
+```yaml
+concurrency:
+  group: deploy-prod
+  cancel-in-progress: false
+```
+For PR previews, cancel old runs when a new commit arrives:
+```yaml
+concurrency:
+  group: pr-${{ github.ref }}
+  cancel-in-progress: true
+```
+## Anti-patterns
+| Anti-pattern | Fix |
+|---|---|
+| `\|\| true` to hide test failures | Fix or delete the test |
+| `:latest` tag in prod deploy manifest | Immutable tag |
+| Deploying code untested in staging | Dev → staging → prod, same artifact |
+| Secrets via commits / CI log | Secret store, masked |
+| 40-minute CI runs on every PR | Split; run heavy tests on main |
+| Tests that share mutable state | Isolate / reset per test |
+| Action pinned by tag only | Pin by SHA |
+| Deploying from a dev laptop | CI-only deploy path |
+| No automated rollback plan | See `deploy.md` |
+| Ignoring flaky tests | Quarantine + fix within a week; don't normalize |

package/skills/devops/references/containers.md ADDED

@@ -0,0 +1,272 @@
+# Containers
+Dockerfile, multi-stage, size, rootless, base images.
+## The goals
+1. **Small** — ship fewer bytes, fewer CVEs, faster pulls.
+2. **Reproducible** — same Dockerfile + same lockfile → same image.
+3. **Secure** — no root, no shell if possible, minimal deps, signed.
+4. **Fast to build** — layer caching aligned with change frequency.
+## Base image selection
+```
+Prefer:  distroless > alpine > slim > full distro
+```
+| Base | Size | Trade-off |
+|---|---|---|
+| `gcr.io/distroless/static` | ~2 MB | No shell, no package manager. Static binaries only. |
+| `gcr.io/distroless/base` | ~15 MB | libc, openssl, etc. Good for most compiled langs. |
+| `alpine:3.20` | ~5 MB | musl libc (incompatible with some packages); apk package manager |
+| `debian:12-slim` | ~75 MB | glibc; widest compatibility; still small |
+| `debian:12` / `ubuntu:24.04` | 120–200 MB | Full distro; use only when you need dev tools at runtime |
+Choose the smallest one that runs your workload. Distroless is ideal for production runtime.
+## Multi-stage builds
+Separate build env from runtime env.
+```dockerfile
+# syntax=docker/dockerfile:1
+# --- Build stage
+FROM node:20-alpine AS build
+WORKDIR /app
+# Cache deps separately from source
+COPY package.json package-lock.json ./
+RUN npm ci
+COPY . .
+RUN npm run build
+# Prune dev deps after build
+RUN npm prune --omit=dev
+# --- Runtime stage
+FROM gcr.io/distroless/nodejs20-debian12
+WORKDIR /app
+COPY --from=build /app/node_modules ./node_modules
+COPY --from=build /app/dist ./dist
+COPY --from=build /app/package.json ./
+USER nonroot
+EXPOSE 3000
+CMD ["dist/server.js"]
+```
+Benefits:
+- Build tools don't ship to prod.
+- Different base OS for build vs runtime (Alpine to build, distroless to run).
+- Smaller image, smaller attack surface.
+## Layer order matters for caching
+Stable layers first, volatile last.
+```dockerfile
+# ✅
+COPY package.json package-lock.json ./     # stable; rarely changes
+RUN npm ci                                    # cached unless lockfile changed
+COPY . .                                       # volatile; changes every commit
+RUN npm run build
+# ❌ re-runs npm ci on every code change
+COPY . .
+RUN npm ci
+RUN npm run build
+```
+## Pin versions
+```dockerfile
+FROM node:20.11.1-alpine3.19        # not node:20, not node:latest
+# or pin by digest for strictest reproducibility:
+FROM node@sha256:5b57a...
+```
+Tags are mutable; digests aren't. For prod, pin by digest; for dev, a tag is usually fine.
+## Don't run as root
+```dockerfile
+# Debian-based
+RUN groupadd --system app && useradd --system --gid app app
+USER app
+# Alpine
+RUN addgroup -S app && adduser -S -G app app
+USER app
+# Distroless already provides a `nonroot` user
+USER nonroot
+```
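+Root inside a container is not an isolation boundary: by default it is the same UID 0 the host kernel sees, so a container escape runs with full privileges. The principle of least privilege applies here too.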
+## Don't bake secrets
+```dockerfile
+# ❌
+ARG API_KEY
+ENV API_KEY=$API_KEY         # baked into the image layer
+# ✅ pass at runtime instead (shell command, not a Dockerfile instruction):
+#   docker run -e API_KEY=... myapp
+# or mount from a secret manager
+```
+Secrets that land in a layer are visible to anyone with the image. Rotating is painful.
+## Build secrets (BuildKit)
+For secrets needed **during build** only (e.g., private npm registry token):
+```dockerfile
+# syntax=docker/dockerfile:1.4
+RUN --mount=type=secret,id=npm_token \
+    NPM_TOKEN=$(cat /run/secrets/npm_token) npm ci
+```
+The secret doesn't persist in the final image.
+## `.dockerignore`
+Exclude the junk. Every extra file bloats the build context.
+```
+.git
+node_modules
+**/__pycache__
+**/*.log
+.DS_Store
+.idea/
+.vscode/
+.env
+tests/
+.venv
+target/
+coverage/
+dist/
+```
+A 2 GB build context on a 200 MB repo is a signal you need a `.dockerignore`.
+## HEALTHCHECK
+Declare how to know the container is alive.
+```dockerfile
+HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
+  CMD curl -fsS http://localhost:3000/health || exit 1
+```
+Kubernetes ignores `HEALTHCHECK` and uses its own liveness / readiness probes; set them in the manifest, not the Dockerfile. Docker Swarm and standalone Docker use `HEALTHCHECK`.
+## Signal handling
+```dockerfile
+# ✅ node handles SIGTERM directly (no shell in between)
+CMD ["node", "server.js"]
+# ❌ shell form — shell gets SIGTERM, may not forward to node
+CMD node server.js
+```
+Exec form (array) runs the binary directly. Shell form runs `/bin/sh -c` and can swallow signals. Use exec form for the main process.
+For apps that don't handle signals, use `tini` as init:
+```dockerfile
+ENTRYPOINT ["/sbin/tini", "--"]
+CMD ["./my-binary"]
+```
+## Logging
+Write to stdout / stderr. The container runtime collects these.
+```dockerfile
+# ❌ logs go into the container filesystem; lost on restart
+CMD ["my-binary", "--log-file", "/var/log/app.log"]
+# ✅ stdout
+CMD ["my-binary"]       # app logs to stdout by default
+```
+## Image scanning
+Run a scanner in CI (Trivy, Grype, Snyk).
+```
+trivy image myapp:sha-abc123
+# fail build on HIGH/CRITICAL unfixed CVEs
+```
+Scan at build, again on a schedule (CVEs are disclosed after you build).
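A CI gate on scanner output might look like the sketch below. It assumes the report was produced with `trivy image --format json` and follows Trivy's JSON layout (`Results[].Vulnerabilities[]` with `Severity`, `VulnerabilityID`, and an optional `FixedVersion`). The function name `blocking_findings` and the `ignore_unfixed` switch are illustrative choices, not part of any tool.

```python
import json

BLOCKING = {"HIGH", "CRITICAL"}


def blocking_findings(report_json: str, ignore_unfixed: bool = False) -> list[str]:
    """Return IDs of HIGH/CRITICAL vulnerabilities that should fail the build."""
    report = json.loads(report_json)
    found = []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") not in BLOCKING:
                continue
            # Optionally skip findings that have no patched version yet.
            if ignore_unfixed and not vuln.get("FixedVersion"):
                continue
            found.append(vuln.get("VulnerabilityID", "unknown"))
    return found
```

The pipeline step would exit non-zero when the returned list is not empty.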
+## Image signing (supply chain)
+Sign images so the cluster can verify.
+```
+cosign sign --key cosign.key myregistry/myapp:sha-abc123
+```
+At deploy, policy (Kyverno, Gatekeeper, ECR policy, Sigstore policy-controller) verifies the signature before admission.
+## Don't install what you don't need
+Every package installed is:
+- Bytes on the wire
+- Disk on the node
+- A CVE waiting to be reported
+```dockerfile
+# ❌
+RUN apt-get update && apt-get install -y \
+    curl vim git build-essential python3 netcat ...
+# ✅
+RUN apt-get update && apt-get install --no-install-recommends -y \
+    ca-certificates && rm -rf /var/lib/apt/lists/*
+```
+Clean apt lists, yum caches, pip caches in the same layer that installed them.
+## Distroless specifics
+No shell. No `ls`, no `cat`, no `curl`. This is a feature.
+- Healthcheck: use the binary itself (`myapp healthcheck`) or a Kubernetes probe over HTTP.
+- Debugging: `kubectl exec` won't give you a shell. Use ephemeral debug containers (`kubectl debug`) or log more.
+Downsides: ops is harder at first. Upside: drastically smaller attack surface.
+## Image size — typical targets
+| App | Target size |
+|---|---|
+| Go / Rust static binary | 5–15 MB |
+| Node.js app | 100–200 MB |
+| Python app | 100–250 MB |
+| Java app (with JRE) | 150–250 MB |
+If your Node image is 1.5 GB, you probably shipped `node_modules/` twice or left dev dependencies in (forgot `--omit=dev`).
+## Anti-patterns
+| Anti-pattern | Fix |
+|---|---|
+| `FROM ubuntu:latest` | Specific version; smaller base |
+| `COPY . .` before `RUN npm ci` | Deps before source for cache |
+| `apt-get install -y *` no cleanup | `rm -rf /var/lib/apt/lists/*` same layer |
+| `RUN chmod ...` in 10 separate layers | Combine; each layer has size cost |
+| Running as root | `USER` before CMD |
+| Shell form CMD | Exec form, signals work |
+| Logs to file inside container | stdout/stderr |
+| `ADD` for local files | `COPY` — `ADD` also untars and downloads, surprising |
+| Secrets in `ENV` / `ARG` | Mount at runtime |
+| Every service uses a different base image | Standardize; fewer bases to scan and update |

package/skills/devops/references/deploy.md ADDED

@@ -0,0 +1,201 @@
+# Deploy
+Rolling, blue-green, canary, feature flags. Rollback plan always.
+## Deploy ≠ release
+- **Deploy**: new code is running on the infra.
+- **Release**: new code is serving user traffic.
+Decoupling them (deploy first, flag on later) is how you ship safely. The deploy can be tested under real load without user impact; the release is a quick toggle.
+## Rolling update
+Replace instances one / a few at a time. Default in Kubernetes, ECS, most orchestrators.
+```
+v1 v1 v1 v1      (4 pods)
+v1 v1 v1 v2      (replace one)
+v1 v1 v2 v2      (replace next)
+...              (eventually all v2)
+```
+Parameters to tune:
+- **maxSurge**: how many extra pods can exist during roll (e.g., +25%).
+- **maxUnavailable**: how many pods can be missing (e.g., 0 for strict).
+- **readinessProbe**: traffic waits until new pod reports ready.
+Rollback: roll back to the previous replica set / task definition.
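The two percentages translate into hard pod-count bounds. The sketch below follows the Kubernetes Deployment rounding rules (percentage `maxSurge` rounds up, percentage `maxUnavailable` rounds down); the helper name `rollout_bounds` is hypothetical.

```python
def rollout_bounds(replicas: int, max_surge_pct: int, max_unavailable_pct: int) -> tuple[int, int]:
    """Return (max total pods, min available pods) during a rolling update."""
    surge = -(-replicas * max_surge_pct // 100)          # ceiling division
    unavailable = replicas * max_unavailable_pct // 100  # floor division
    return replicas + surge, replicas - unavailable


print(rollout_bounds(4, 25, 0))    # (5, 4)
print(rollout_bounds(10, 25, 25))  # (13, 8)
```

With 4 replicas, `maxSurge: 25%`, and `maxUnavailable: 0`, the roll adds one new pod at a time and never drops below 4 available.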
+## Blue-green
+Two full environments. Cut over all traffic at once.
+```
+blue (prod traffic) — current version
+green (idle)        — new version, warmed up
+Cut over: point load balancer to green.
+Rollback: point back to blue (instant).
+```
+Pros: clean cutover, instant rollback.
+Cons: 2× infra cost during the overlap window.
+Use for:
+- Critical releases where rolling drag-out is risky.
+- DB schema changes, provided the schema works with both versions during the overlap.
+## Canary
+Send a small percentage of traffic to the new version; scale up if healthy, roll back if not.
+```
+100% v1
+  5% v2, 95% v1   — watch metrics for 10 min
+ 25% v2, 75% v1   — watch
+ 50% v2, 50% v1
+100% v2
+```
+Observation:
+- Error rate on v2 vs. v1
+- Latency p95/p99
+- Business metrics (conversion, checkout success)
+- Custom alarms (signup, payment)
+Automated: Argo Rollouts, Flagger, AWS CodeDeploy canary. Manual also works — humans read dashboards and decide.
+Rollback: pull the canary (route 100% back to v1).
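The decision at each canary step can be reduced to a comparison of v2 against v1. This is a minimal sketch on error rate only; the function name `canary_verdict` and the thresholds (2x the baseline, a 0.1% floor, at least 100 canary requests) are assumptions to tune for your service.

```python
def canary_verdict(
    v1_errors: int,
    v1_requests: int,
    v2_errors: int,
    v2_requests: int,
    max_ratio: float = 2.0,
    floor_rate: float = 0.001,
    min_requests: int = 100,
) -> str:
    """Return "wait", "promote", or "rollback" for one canary step."""
    if v2_requests < min_requests:
        return "wait"  # not enough canary traffic to judge
    v1_rate = v1_errors / v1_requests if v1_requests else 0.0
    v2_rate = v2_errors / v2_requests
    # The floor keeps a near-zero baseline from making any single error fatal.
    allowed = max(v1_rate * max_ratio, floor_rate)
    return "rollback" if v2_rate > allowed else "promote"
```

A real rollout would also check latency and business metrics before promoting, as listed above.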
+## Progressive delivery
+Canary + automated analysis. The rollout controller evaluates metrics at each step and promotes / rolls back automatically.
+```yaml
+# Argo Rollouts (sketch)
+strategy:
+  canary:
+    steps:
+      - setWeight: 10
+      - pause: { duration: 5m }
+      - analysis: { templates: [error-rate, latency-p95] }
+      - setWeight: 50
+      - analysis: [...]
+      - setWeight: 100
+```
+The analysis step is a check against a metric threshold. Fail → auto-rollback.
+## Feature flags
+Ship code dark; flip for a percentage of users; roll back with a toggle.
+```python
+if feature_enabled('new_checkout', user):
+    new_checkout()
+else:
+    old_checkout()
+```
+Benefits:
+- Deploy-release decoupling.
+- A/B testing for correctness, not just design.
+- Instant rollback without a redeploy.
+Discipline:
+- Every flag is temporary. Clean up after full rollout (or after hypothesis failure).
+- Document owner + expiry for each flag. Stale flags accrete and become impossible to remove.
+- Manage flags in one place: a flag service (LaunchDarkly, ConfigCat, Unleash, Flagsmith) or a home-grown equivalent.
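Flipping a flag "for a percentage of users" is commonly done by hashing the user into a stable bucket. A minimal sketch, assuming string user IDs; unlike the pseudocode above, `feature_enabled` here takes the rollout percentage as an explicit argument, and `bucket` is a hypothetical helper. Flag services implement their own variants of this.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def feature_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """True when the user's bucket falls inside the rollout percentage."""
    return bucket(flag, user_id) < rollout_percent
```

Because the bucket is fixed per user and flag, raising the percentage only adds users; nobody who already has the feature loses it. Setting it to 0 is the instant rollback.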
+## Database migrations + deploys
+See `database/migrations.md` — expand / contract is the safe dance. Key rules for deploy:
+- **Migration BEFORE code that needs it.** Don't deploy v2 of the app before running its required migration.
+- **No destructive migrations during peak hours.** Schedule them in a quiet window, or at minimum reduce the blast radius.
+- **New code tolerates old schema AND old code tolerates new schema** at the overlap.
+## Preflight checks
+Before actually rolling out:
+- **Smoke test** against staging with the exact artifact going to prod.
+- **Load test** for major changes (perf regressions hide in staging noise).
+- **Dependency audit** (new CVE in the image?).
+- **Release notes** drafted; rollback plan documented.
+## During deploy
+Monitor:
+- **Deployment health**: readiness failures, crash loops.
+- **Service health**: error rate, latency, saturation.
+- **Downstream**: DB, cache, message broker metrics — did the new code change call patterns?
+- **Business metrics**: signups / second, checkout completion.
+Alerting during a deploy is different: some jitter is normal while instances restart. Return to the usual, tighter thresholds once the deploy has settled.
+## Rollback
+Every deploy plan includes: **how do we get back?**
+```
+If <condition>, roll back by <step 1>, <step 2>, ...
+```
+Common "conditions":
+- Error rate > 2× baseline for 5 min
+- Latency p99 > 2× baseline
+- Business metric (signup, checkout) drops > 20%
+- Manual oncall decision
+Rollback command should be one thing — not a checklist. Automate it.
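The conditions above can be written down as code so the trigger is not a judgement call at 3 a.m. This sketch checks a single snapshot against a baseline; the `Snapshot` fields and `rollback_reasons` are hypothetical names, and the "for 5 min" sustained-window check is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    error_rate: float        # fraction of failed requests
    latency_p99_ms: float
    signups_per_min: float   # stand-in for any business metric


def rollback_reasons(baseline: Snapshot, current: Snapshot) -> list[str]:
    """Return every rollback condition the current snapshot violates."""
    reasons = []
    if current.error_rate > 2 * baseline.error_rate:
        reasons.append("error rate > 2x baseline")
    if current.latency_p99_ms > 2 * baseline.latency_p99_ms:
        reasons.append("latency p99 > 2x baseline")
    if current.signups_per_min < 0.8 * baseline.signups_per_min:
        reasons.append("business metric dropped > 20%")
    return reasons
```

A non-empty list would trigger the single automated rollback command.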
+### What's rollback-safe?
+| Change | Rollback |
+|---|---|
+| Code-only | Deploy previous artifact |
+| Migration that's additive | Old code works; no DB action needed |
+| Migration that removed a column | Restore from backup (painful); don't drop columns until rolling back to the old code is no longer an option |
+| Feature flag on | Turn it off |
+| Config change | Revert config |
+Design changes so rolling back code is sufficient. That dictates the migration pattern (expand/contract).
+## Shutdown gracefully
+On `SIGTERM`:
+1. Stop accepting new connections (readiness → unhealthy; LB removes pod).
+2. Finish in-flight requests (with a grace period — 30 s typical).
+3. Drain queues / finish current job.
+4. Close DB / external connections.
+5. Exit 0.
+Without this, rolling updates drop requests and leave half-processed jobs.
+Kubernetes: `terminationGracePeriodSeconds: 30` + a `preStop` hook + SIGTERM handling in app.
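Steps 1 and 2 of the shutdown sequence can be sketched as a small coordinator that the request handlers and the readiness endpoint share. This is an illustration under assumptions: the class name `GracefulShutdown` and its methods are hypothetical, and draining queues, closing connections, and exiting are left to the caller.

```python
import signal
import threading


class GracefulShutdown:
    """Tracks in-flight requests and refuses new ones after SIGTERM."""

    def __init__(self) -> None:
        self._stopping = threading.Event()
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._in_flight = 0

    def install(self) -> None:
        """Register the SIGTERM handler (call from the main thread)."""
        signal.signal(signal.SIGTERM, lambda signum, frame: self.begin())

    def begin(self) -> None:
        self._stopping.set()

    def is_ready(self) -> bool:
        """Readiness probe: report unhealthy once shutdown has begun."""
        return not self._stopping.is_set()

    def try_start_request(self) -> bool:
        with self._lock:
            if self._stopping.is_set():
                return False
            self._in_flight += 1
            return True

    def finish_request(self) -> None:
        with self._idle:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def wait_for_drain(self, grace_seconds: float = 30.0) -> bool:
        """Block until in-flight requests finish; False if the grace period ran out."""
        with self._idle:
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout=grace_seconds)
```

After `wait_for_drain` returns, the process would close its connections and exit. Keep the grace period below `terminationGracePeriodSeconds` so Kubernetes does not kill the process mid-drain.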
+## Deploy cadence
+- **Fast** (multiple / day, per team) — best for small changes, high automation, strong tests.
+- **Batched** (weekly / fortnightly) — when risk of each release is high.
+High-performing orgs deploy multiple times per day. The trick is making each deploy small.
+## "Release" vs. "deploy" for mobile
+Mobile can't roll back or instantly replace installed app versions, but server-side flags can still control the behaviour of the installed app. Design APIs so server-side flags let you turn off client features without a new app version.
+## Anti-patterns
+| Anti-pattern | Fix |
+|---|---|
+| Deploying on Fridays without cause | Deploy midweek, when people are around to handle a rollback |
+| Rollback as "SSH in and run..." | Automated, one-command rollback |
+| Waiting for the deploy to "look OK" by refreshing the app | Instrument; set specific metrics |
+| Manual canary percentage math | Use the orchestrator's progressive rollout |
+| Schema migration in the same step as code rollout | Pre-migrate; expand/contract |
+| Long-lived feature flags | Set expiry; clean up |
+| Sidecar fetching config at startup with no timeout | Fail fast; bounded retry |
+| Skipping grace period on SIGTERM | Handle SIGTERM and drain in-flight requests; otherwise every deploy loses requests |
+| Deploys that don't produce an event in observability | Emit a deploy event so spikes can be correlated with releases |