npm - @jetrabbits/agentic - Versions diffs - 0.0.3 → 0.0.4 - Mend

@jetrabbits/agentic 0.0.3 → 0.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/areas/devops/ci-cd/workflows/release-pipeline.md CHANGED

@@ -2,10 +2,11 @@
 name: release-pipeline
 type: workflow
 trigger: /release-pipeline
-description: Run a full production release — version tagging, changelog generation, image signing, staging validation, canary deploy to production.
+description: Run a production release with supply-chain verification, database compatibility controls, progressive delivery, and measurable rollback criteria.
 inputs:
   - version (semver: v1.2.3)
   - release_notes (optional)
+  - risk_level (low|medium|high)
 outputs:
   - published_release
   - deployed_version
@@ -15,6 +16,7 @@ roles:
   - developer
   - team-lead
   - pm
+  - qa
 execution:
   initiator: developer
 related-rules:
@@ -27,89 +29,97 @@ uses-skills:
   - pipeline-security
 quality-gates:
   - all CI gates pass on release commit
-  - image signed and SBOM attached before deploy
-  - staging deploy healthy ≥ 15 min before production gate
+  - image signed, provenance generated, and SBOM attached before deploy
+  - staging deploy healthy >= 15 min before production gate
   - manual approval from team-lead for production
+  - rollback criteria defined before canary starts
 ---
 ## Steps
-### 1. Pre-Release Checks — `@devops-engineer` + `@team-lead`
+### 1. Release Readiness and Freeze Check — `@team-lead` + `@pm`
 - **Actions:**
-  - Confirm no active P0/P1 incidents
-  - Verify staging is healthy and running the release candidate
-  - Run final security scan on release image: `trivy image <image>:<version>`
-  - Check dependency review — no new Critical/High CVEs introduced
-  - Confirm changelog complete and reviewed
-- **Done when:** all checks green; team-lead approves release to proceed
+  - Confirm no active P0/P1 incidents.
+  - Confirm release window is approved (freeze policy respected).
+  - Assign release owner, rollback owner, and incident commander.
+  - Confirm stakeholder communication plan (`#deployments`, support, customer-facing status if needed).
+- **Done when:** readiness checklist signed by team-lead.
-### 2. Tag Release — `@developer`
+### 2. Database Compatibility Gate — `@developer` + `@devops-engineer`
 - **Actions:**
-  ```bash
-  # Create annotated git tag
-  git tag -a v${VERSION} -m "Release v${VERSION}: ${RELEASE_NOTES}"
-  git push origin v${VERSION}
-  ```
-- **Output:** git tag triggers release pipeline in CI
-- **Done when:** CI pipeline starts on the tag event
+  - Validate schema changes follow **expand/contract** strategy.
+  - Forbid destructive migrations in same release as dependent app change.
+  - Ensure old and new app versions can run concurrently during canary.
+  - Prepare rollback-safe migration plan.
+- **Done when:** DB compatibility checklist is green.
-### 3. CI Release Pipeline (automated) — CI system
+### 3. Tag Release — `@developer`
+```bash
+git tag -a v${VERSION} -m "Release v${VERSION}: ${RELEASE_NOTES}"
+git push origin v${VERSION}
+```
+- **Done when:** tag-triggered pipeline starts.
+### 4. CI Release Pipeline (automated) — CI system
 - **Stages:**
-  1. `validate` — lint + test suite must pass on tagged commit
-  2. `build` — Docker image tagged with semver + SHA digest
-  3. `sign` — `cosign sign` + `syft` SBOM generation + `cosign attach sbom`
-  4. `scan` — Trivy image scan on the exact release image; block on Critical/High
-  5. `publish` — push to releases registry; create GitHub Release with changelog
-- **Done when:** CI pipeline green; release published to registry
+  1. `validate` — lint/test/type/security checks.
+  2. `build` — immutable image digest produced.
+  3. `sign` — keyless `cosign sign` on digest.
+  4. `attest` — SLSA provenance generated.
+  5. `sbom` — CycloneDX/SPDX SBOM generated + attached.
+  6. `verify` — signature/provenance identity checks.
+  7. `publish` — publish artifact and release notes.
+- **Done when:** pipeline green with verifiable artifact metadata.
-### 4. Deploy Staging — `@devops-engineer`
+### 5. Deploy Staging — `@devops-engineer`
 ```bash
 helm upgrade --install order-service charts/order-service \
-  --set image.tag=v${VERSION} \
+  --set image.digest=sha256:${DIGEST} \
   --namespace staging \
-  --atomic --timeout 5m
+  --atomic --timeout 10m
 ```
-- Monitor for 15 minutes: error rate, p99 latency, pod restarts
-- Run automated smoke test suite against staging
-- **Done when:** 15 min stable; smoke tests pass
+- Run smoke + integration critical path tests.
+- Observe golden signals for at least 15 minutes.
+- **Done when:** staging stable and tests pass.
-### 5. Production Gate — `@team-lead` (manual approval)
-- Review staging metrics: confirm no anomalies
-- Check error budget: confirm budget not exhausted
-- Approve in CI platform (GitHub Environment approval / GitLab manual job)
-- **Done when:** approval recorded
+### 6. Production Gate — `@team-lead` + `@qa`
+- **Actions:**
+  - Confirm error budget not exhausted.
+  - Confirm rollback command and previous digest ready.
+  - Confirm canary SLO thresholds and observation duration are set.
+- **Done when:** manual approval is recorded.
-### 6. Deploy Production (canary) — `@devops-engineer`
-```bash
-# Canary: 10% traffic to new version
-helm upgrade --install order-service charts/order-service \
-  --set image.tag=v${VERSION} \
-  --set canary.enabled=true \
-  --set canary.weight=10 \
-  --namespace production \
-  --atomic --timeout 5m
+### 7. Canary Deployment — `@devops-engineer`
+- **Sequence:**
+  - 5% traffic for 10 min.
+  - 25% traffic for 15 min.
+  - 50% traffic for 15 min (high-risk releases only).
+  - 100% traffic only if all gates pass.
+- **Automatic rollback triggers (example baseline):**
+  - 5xx rate > 1% for 5 min,
+  - p99 latency regression > 20% for 10 min,
+  - fast burn-rate alert firing (>14.4x, 1h window).
-# Watch for 5 minutes
-# If SLO breach → auto-rollback
-# If healthy → progress to 100%
-helm upgrade order-service charts/order-service \
-  --set image.tag=v${VERSION} \
-  --set canary.enabled=false \
-  --namespace production \
-  --atomic --timeout 5m
-```
-- **Done when:** 100% traffic on new version; no SLO breaches
+### 8. Feature Flag Progression — `@developer` + `@qa`
+- Keep high-risk features behind flags during rollout.
+- Enable by cohorts (internal → 5% users → 25% → 100%).
+- Roll back by disabling flag if service health degrades without binary rollback.
+### 9. Post-Deploy Validation — `@qa` + `@pm`
+- Run production smoke checks.
+- Verify business KPIs (conversion, checkout success, error funnel).
+- Publish deployment report with links to metrics, logs, and release artifact metadata.
-### 7. Post-Deploy Validation — `@qa` + `@pm`
-- Run production smoke tests
-- Verify key business metrics not degraded
-- Announce release in #deployments channel
+## Rollback
-### Rollback (if needed at any step)
 ```bash
 helm rollback order-service -n production
-# or: deploy previous version tag explicitly
+# or redeploy previous verified digest
 ```
+- Rollback is mandatory when any SLO rollback trigger is met.
+- If DB migrations were expanded, execute rollback-safe contract plan only after traffic is stable.
 ## Exit
-Production 100% + smoke tests pass + team notified + deployment report = release complete.
+Release is complete when 100% traffic is healthy, post-deploy checks pass, and release report is published.
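
The canary sequence and rollback triggers in the new `release-pipeline.md` can be sketched as a small decision function. This is an illustration only, not part of the package: `CanaryWindow`, `canary_steps`, and `rollback_reasons` are hypothetical names, and the sketch assumes each metric has already been aggregated over the duration named in the workflow (5 min for 5xx, 10 min for p99, 1h for burn rate).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryWindow:
    """Pre-aggregated canary metrics (hypothetical shape, not a real API)."""
    error_rate_5xx: float   # fraction of 5xx responses sustained over 5 min
    p99_latency_ms: float   # canary p99 sustained over 10 min
    baseline_p99_ms: float  # stable-version p99 over the same 10 min
    burn_rate_1h: float     # error-budget burn rate over the 1h window


def canary_steps(risk_level: str) -> list[tuple[int, int]]:
    """Return (traffic percent, observation minutes) pairs for the rollout."""
    steps = [(5, 10), (25, 15)]
    if risk_level == "high":
        steps.append((50, 15))  # the 50% step applies to high-risk releases only
    steps.append((100, 0))      # full traffic only after every gate has passed
    return steps


def rollback_reasons(w: CanaryWindow) -> list[str]:
    """List every baseline trigger that is currently met; empty means healthy."""
    reasons = []
    if w.error_rate_5xx > 0.01:
        reasons.append("5xx rate above 1%")
    if w.baseline_p99_ms > 0:
        regression = (w.p99_latency_ms - w.baseline_p99_ms) / w.baseline_p99_ms
        if regression > 0.20:
            reasons.append("p99 latency regression above 20%")
    if w.burn_rate_1h > 14.4:
        reasons.append("fast burn-rate alert firing")
    return reasons
```

A rollout controller could call `rollback_reasons` at the end of each step from `canary_steps(risk_level)` and run `helm rollback` as soon as the returned list is non-empty.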

package/areas/devops/kubernetes/skills/pod-troubleshooting/SKILL.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: pod-troubleshooting
 type: skill
-description: Systematic diagnosis of pod failures — CrashLoopBackOff, OOMKilled, Pending, ImagePullBackOff, and service connectivity issues.
+description: "Systematic diagnosis of Kubernetes pod failures — CrashLoopBackOff, OOMKilled, Pending, ImagePullBackOff, and service connectivity issues. Use when the user encounters pods not starting, container restart loops, scheduling failures, or service unreachability in a K8s cluster."
 related-rules:
   - resource-governance.md
   - workload-security.md

package/areas/devops/observability/rules/alerting-standards.md CHANGED

@@ -1,36 +1,42 @@
 # Rule: Alerting Standards
-**Priority**: P1 — Alerts without runbooks are not deployed.
+**Priority**: P1 — Alerts must be actionable, SLO-aligned, and mapped to ownership.
 ## Alert Quality Rules
-1. **Every alert has a runbook** — `annotations.runbook_url` is mandatory.
-2. **No alert fires without a human action** — if no one can do anything about it, it's not an alert (it's a dashboard).
-3. **Alert on symptoms, not causes** — `HighErrorRate` is an alert; `HighCPU` is a warning unless it causes user impact.
-4. **Severity classification**
-   | Severity | Meaning | Response |
-   |:---|:---|:---|
-   | `critical` | User-facing outage or data loss risk | Page on-call immediately |
-   | `warning` | Degraded but not broken; trending toward critical | Notify Slack; fix in business hours |
-   | `info` | Informational; no action required | Dashboard only |
-5. **`for:` duration** — critical alerts: `for: 2m`; warning alerts: `for: 10m`. Instant alerts cause false positives.
-6. **Alert fatigue policy** — if an alert fires more than 3 times in a week without action → reduce sensitivity or fix root cause.
-## Notification Routing (Alertmanager)
-```yaml
-route:
-  group_by: [alertname, namespace]
-  group_wait: 30s
-  group_interval: 5m
-  repeat_interval: 4h
-  receiver: slack-warning
-  routes:
-    - matchers: [severity="critical"]
-      receiver: pagerduty-oncall
-      continue: true
-    - matchers: [severity="critical"]
-      receiver: slack-critical
-```
+1. **Runbook required** — every alert includes `runbook_url` and service owner.
+2. **Actionability required** — alerts without a defined human or automated action are downgraded to dashboard signals.
+3. **Symptom-first** — page on user impact, not raw infrastructure noise.
+## Severity Model
+| Severity | Meaning | Response |
+|:---|:---|:---|
+| `critical` | Active user-impacting incident / fast error-budget burn | Page on-call immediately |
+| `warning` | Degradation trending toward SLO breach | Notify team channel, triage in business hours or sooner |
+| `info` | Context signal only | Dashboard or ticket, no paging |
+## Multi-Window Burn-Rate Standard
+4. Define at least:
+   - **fast burn** alert (e.g., ~1h window),
+   - **slow burn** alert (e.g., ~6h window).
+5. Fast burn pages on-call; slow burn creates prioritized reliability action.
+6. Burn-rate thresholds must map to error-budget policy and release gating.
+## Anti-Fatigue and Signal Hygiene
+7. Configure `for:` durations to reduce noise.
+8. If an alert fires repeatedly without action, either improve runbook/automation or retire the alert.
+9. Track alert precision/recall metrics during reliability reviews.
+## Routing and Escalation
+10. Route by service ownership and environment (prod vs non-prod).
+11. Define escalation path and timeout for unacknowledged critical alerts.
+12. Support maintenance windows and silence policies with audit logging.
+## Auto-Remediation
+13. For known-safe remediations (e.g., restart stateless worker), allow guarded auto-remediation.
+14. Auto-remediation actions must emit events and be reversible.
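
The multi-window burn-rate standard in the new `alerting-standards.md` rests on one ratio: the observed error ratio divided by the error budget (1 minus the SLO target). The sketch below is a minimal illustration under stated assumptions. The 14.4x fast-burn threshold for the 1h window appears in the release workflow in this diff; the 6x slow-burn threshold for the 6h window is a commonly cited value and is an assumption here, as are the function names.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than budgeted the error budget is being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def classify(
    fast_error_ratio: float,        # error ratio over the ~1h window
    slow_error_ratio: float,        # error ratio over the ~6h window
    slo_target: float,
    fast_threshold: float = 14.4,   # from the release workflow baseline
    slow_threshold: float = 6.0,    # assumed value, tune to your budget policy
) -> str:
    """Map the two windows to an action: page, ticket, or ok."""
    if burn_rate(fast_error_ratio, slo_target) > fast_threshold:
        return "page"      # fast burn pages on-call
    if burn_rate(slow_error_ratio, slo_target) > slow_threshold:
        return "ticket"    # slow burn creates prioritized reliability work
    return "ok"
```

With a 99.9% SLO, a sustained 2% error ratio in the 1h window is a burn rate of about 20 and pages on-call, while 0.8% in the 6h window alone is about 8 and opens a ticket.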

package/areas/devops/observability/rules/golden-signals.md CHANGED

@@ -1,28 +1,37 @@
-# Rule: Golden Signals & Observability Baseline
+# Rule: Golden Signals & SLO-First Observability Baseline
-**Priority**: P1 — Services without golden signals cannot be promoted to production.
+**Priority**: P1 — Services without measurable user-impact SLIs and enforceable SLO alerts cannot be promoted to production.
-## Four Golden Signals (mandatory for every service)
+## Required Coverage
-| Signal | Metric | Alert threshold |
+1. **Golden signals are mandatory**: latency, traffic, errors, saturation.
+2. **User-journey SLIs are mandatory** for critical flows (e.g., checkout success, login success, payment confirmation latency).
+3. **Instrumentation is vendor-neutral**: do not hardcode stack-specific ports/endpoints in policy; enforce metric contract and discoverability.
+## Signal Baseline
+| Signal | Minimum metric coverage | Alerting baseline |
 |:---|:---|:---|
-| **Latency** | p50, p95, p99 request duration | p99 > 1s for 5 min |
-| **Traffic** | Requests per second (RPS) | Drop > 50% from baseline |
-| **Errors** | 5xx rate / error rate | > 1% for 2 min |
-| **Saturation** | CPU %, memory %, queue depth | CPU > 80%, Memory > 85% |
+| Latency | p50/p95/p99 by endpoint/operation | p99 SLO burn alert |
+| Traffic | request rate + success volume | anomaly vs rolling baseline |
+| Errors | error rate by class (4xx/5xx/domain) | user-impacting error budget burn |
+| Saturation | CPU, memory, queue/concurrency, DB saturation | sustained saturation with user impact |
-## Instrumentation Requirements
+## SLO and Alerting Requirements
-1. **Every HTTP service exposes** `/metrics` in Prometheus format on port 9090 (or sidecar).
-2. **Every service has** a `ServiceMonitor` (kube-prometheus-stack) or scrape config.
-3. **Structured JSON logging** — no unstructured log lines in production.
-4. **Trace context propagated** — W3C TraceContext headers forwarded between all services.
-5. **Health endpoints** — `/health/ready` (readiness) and `/health/live` (liveness) separate.
+4. Define at least one availability and one latency SLO per critical service.
+5. Use multi-window multi-burn-rate alerting (fast + slow burn).
+6. Link alert severity to error-budget policy actions.
+7. Every alert must include runbook URL and primary owner.
-## Three Pillars Coverage
+## Cardinality and Cost Governance
-| Pillar | Stack | Retention |
-|:---|:---|:---|
-| Metrics | Prometheus + VictoriaMetrics (long-term) | 15 days hot / 1 year cold |
-| Logs | Loki or ELK (Elasticsearch+Logstash+Kibana) | 30 days |
-| Traces | Tempo or Jaeger (via OpenTelemetry) | 7 days |
+8. Define metric label cardinality budget per service.
+9. For high-cardinality telemetry, apply sampling, aggregation, or drop policy with documented rationale.
+10. Retention tiers must be explicit and mapped to compliance + incident forensics needs.
+## Trace and Log Correlation
+11. Propagate trace context across service boundaries.
+12. Ensure logs, metrics, and traces can be correlated via request/trace IDs.
+13. Sensitive data must be redacted before ingestion.
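
Rules 11 to 13 of the new `golden-signals.md` (trace propagation, correlation by ID, redaction before ingestion) can be illustrated with a structured log helper. This is a sketch, not the package's implementation: `SENSITIVE_KEYS`, `redact`, and `log_record` are hypothetical, and the trace and span IDs are assumed to come from whatever tracing SDK the service already uses.

```python
import json

# Hypothetical list of field names that must never reach the log pipeline.
SENSITIVE_KEYS = {"password", "token", "authorization", "card_number"}


def redact(fields: dict) -> dict:
    """Replace values of sensitive keys before the record is serialized."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in fields.items()
    }


def log_record(message: str, trace_id: str, span_id: str, **fields) -> str:
    """Build one JSON log line carrying the IDs needed for trace correlation."""
    record = {
        "message": message,
        "trace_id": trace_id,  # lets the log backend link back to the trace
        "span_id": span_id,
        **redact(fields),
    }
    return json.dumps(record, sort_keys=True)
```

For example, `log_record("payment failed", trace_id, span_id, order_id="o-1", token="abc")` keeps `order_id` and emits `"[REDACTED]"` for `token`.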

package/areas/devops/observability/skills/distributed-tracing/SKILL.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: distributed-tracing
 type: skill
-description: Implement distributed tracing with OpenTelemetry, Tempo/Jaeger — instrumentation, sampling, and trace-to-log correlation.
+description: "Implement distributed tracing with OpenTelemetry, Tempo/Jaeger — instrumentation, sampling, and trace-to-log correlation. Use when the user asks about distributed tracing, OpenTelemetry setup, span instrumentation, trace propagation, or connecting traces to logs and metrics."
 related-rules:
   - golden-signals.md
   - data-retention.md
@@ -16,6 +16,15 @@ allowed-tools: Read, Write, Edit
 When adding tracing to a service, debugging slow distributed transactions, or setting up trace → log → metric correlation.
+## End-to-End Setup Workflow
+1. **Deploy collector** — configure and deploy the OTel Collector as a DaemonSet (see config below)
+2. **Instrument service** — add SDK initialization and auto-instrumentation for your framework (Python/Go examples below)
+3. **Verify traces** — confirm traces appear in Tempo/Jaeger: `curl -sG http://tempo:3200/api/search --data-urlencode 'q={}' --data-urlencode 'limit=5'`
+4. **Add log correlation** — inject `trace_id` and `span_id` into log lines for Loki/Grafana linkage
+5. **Validate linkage** — click a trace in Grafana → Explore → verify it links to the corresponding log entries
+6. **Tune sampling** — apply tail-based sampling policies for errors and slow traces (see strategy table)
 ## OpenTelemetry Collector (K8s DaemonSet)
 ```yaml

package/areas/software/backend/rules/security.md CHANGED

@@ -1,20 +1,40 @@
 # Rule: Backend Security & OWASP Standards
-**Priority**: P0 — Security vulnerabilities are release blockers.
+**Priority**: P0 — Security vulnerabilities affecting confidentiality, integrity, or availability are release blockers.
 ## OWASP-aligned baseline
-1. **Access control**
-   - Protect all endpoints with a default-deny posture.
-   - Use RBAC or ABAC and enforce resource-level authorization.
+1. **Access control and authorization**
+   - Default-deny endpoint access.
+   - Enforce RBAC/ABAC and object-level authorization for every resource operation.
+   - Test negative authorization paths (forbidden cross-tenant access).
-2. **Cryptography and secrets**
-   - Use Argon2id or bcrypt for password hashing.
-   - Store secrets in a dedicated secret manager (Vault/AWS Secrets Manager/etc.).
+2. **Authentication and session/token lifecycle**
+   - Validate issuer, audience, expiry, and signature for tokens.
+   - Use short-lived access tokens and revocable refresh strategy.
+   - Detect and alert on anomalous auth patterns.
-3. **Injection prevention**
-   - Never build SQL via string concatenation; use parameterized queries only.
-   - Validate input at system boundaries with typed DTO/schema validation.
+3. **Cryptography and secrets**
+   - Use modern algorithms (Argon2id/bcrypt for password hashing).
+   - Secrets only from dedicated manager (Vault/AWS/GCP/Azure secret stores).
+   - Define key/secret rotation and emergency revocation procedures.
-4. **Authentication hardening**
-   - Implement strong token/session validation, rotation, and auditability.
+4. **Injection and input safety**
+   - Parameterized queries only; no string-concatenated SQL.
+   - Strict DTO/schema validation at boundaries.
+   - Block mass assignment with explicit writable-field allowlists.
+5. **Critical web attack defenses**
+   - SSRF protections for outbound HTTP integrations.
+   - Safe deserialization only; reject untrusted executable payload formats.
+   - File upload protection: type validation, scanning, quarantine storage.
+6. **Service identity and least privilege**
+   - Prefer workload identity/mTLS for service-to-service auth.
+   - Avoid static credentials in runtime environments.
+   - Separate service accounts per environment and per bounded context.
+7. **Security telemetry and audit**
+   - Log auth events, privilege changes, and sensitive config access.
+   - Mask/redact sensitive fields in logs and traces.
+   - Propagate correlation IDs for incident reconstruction.
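
Item 4 of the new backend `security.md` combines two controls: parameterized queries and an explicit writable-field allowlist against mass assignment. The sketch below shows both together using the standard-library `sqlite3` module. The `users` table, the `WRITABLE_FIELDS` set, and the function names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical allowlist: the only columns a client payload may change.
WRITABLE_FIELDS = {"display_name", "email"}


def filter_writable(payload: dict) -> dict:
    """Drop every key that is not explicitly allowlisted (e.g. is_admin)."""
    return {k: v for k, v in payload.items() if k in WRITABLE_FIELDS}


def update_user(conn: sqlite3.Connection, user_id: int, payload: dict) -> int:
    """Apply allowlisted fields with bound parameters; return rows changed."""
    updates = filter_writable(payload)
    if not updates:
        return 0
    columns = sorted(updates)
    # Column names come only from the constant allowlist, never from input;
    # every value is passed as a bound parameter, not concatenated into SQL.
    assignments = ", ".join(f"{column} = ?" for column in columns)
    params = [updates[column] for column in columns] + [user_id]
    cursor = conn.execute(f"UPDATE users SET {assignments} WHERE id = ?", params)
    conn.commit()
    return cursor.rowcount
```

A payload such as `{"display_name": "Ada", "is_admin": 1}` updates the name only, and a value containing SQL syntax is stored as literal text.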

package/areas/software/frontend/skills/component-design/SKILL.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: component-design
 type: skill
-description: Design reusable React/Vue components with correct patterns, typed APIs, state handling, and accessibility.
+description: "Design reusable React components with compound patterns, controlled/uncontrolled hybrids, typed prop APIs, async state handling, and ARIA accessibility. Use when the user creates, refactors, or reviews React components, or mentions props, hooks, .tsx files, component APIs, or accessible UI patterns."
 related-rules:
   - architecture.md
   - accessibility.md
@@ -12,6 +12,18 @@ allowed-tools: Read, Write, Edit, Bash
 > **Expertise:** Compound components, controlled/uncontrolled, render props, component API design, accessibility requirements.
+## When to load
+When creating, refactoring, or reviewing React components — especially when choosing between compound, controlled/uncontrolled, or headless patterns, designing typed prop APIs, or implementing accessible interactive widgets.
+## Component Design Workflow
+1. **Choose pattern** — use the decision tree below to select the right component pattern
+2. **Define typed props** — follow the Props API Design Rules (explicit variants, no boolean explosion)
+3. **Implement all states** — loading, error, empty, success for any async data
+4. **Add accessibility** — use the ARIA requirements table to add correct roles and keyboard support
+5. **Verify** — confirm keyboard navigation works, screen reader announces states, and TypeScript compiles with `--strict`
 ## Pattern Selection Guide
 ```

package/areas/software/full-stack/rules/security-guide.md CHANGED

@@ -1,22 +1,58 @@
 ---
 trigger: model_decision
 glob: security-guide
-description: enforce secrets handling, input validation, and least privilege
+description: "enforce cloud-native security controls: authZ, secure service identity, secrets lifecycle, and high-risk input protections"
 ---
 # Security Rule
-**Rules:**
+**Priority**: P0 — Security regressions in authentication, authorization, data protection, or high-risk input handling block release.
-- Never hardcode secrets or credentials.
-- Validate all external input (API, DB, files).
-- Use Bearer Auth in headers.
-- Apply the least privilege for DB, API, files.
-- Encrypt sensitive data in transit and at rest.
-- Audit and sanitize logs to avoid secrets leakage.
+## Mandatory Controls
-**Violations:**
+1. **AuthN/AuthZ baseline**
+   - Default-deny authorization at route and resource level.
+   - Enforce object-level access checks (prevent IDOR/BOLA).
+   - Short-lived tokens with rotation and revocation strategy.
-- Raw secrets in code.
-- Unvalidated user input.
-- Elevated privileges without justification.
+2. **Service-to-service identity**
+   - Use workload identity / mTLS where possible.
+   - No long-lived static credentials between services.
+   - Scope service permissions to least privilege.
+3. **Input and output hardening**
+   - Validate all external input with strict schema and type constraints.
+   - Prevent mass assignment via explicit allowlists of writable fields.
+   - Apply output encoding/sanitization where user-generated content is rendered.
+4. **High-risk attack classes to address explicitly**
+   - SSRF protections (deny internal metadata ranges, allowlist outbound domains).
+   - Insecure deserialization prevention (safe parsers, signed payloads where needed).
+   - Unsafe file upload controls (MIME + extension + antivirus + storage isolation).
+5. **Secrets lifecycle**
+   - Secrets stored only in secret manager/vault.
+   - Rotation cadence defined per secret class.
+   - Emergency revocation runbook required.
+6. **Data and transport protection**
+   - Encrypt in transit (TLS 1.2+ minimum) and at rest.
+   - Avoid logging secrets, tokens, or sensitive PII.
+7. **Auditability**
+   - Security-relevant actions (auth, privilege changes, key operations) must be audit logged.
+   - Logs should be tamper-evident and correlated with request/user/service identity.
+## Release Security Checklist (required)
+- Threat model updated for new trust boundaries/data flows.
+- AuthZ checks validated by tests on critical endpoints.
+- Secret exposure checks run in CI and deployment logs.
+- Dependency risks triaged with exploitability status.
+## Violations
+- Hardcoded secrets or tokens in code/config.
+- Missing object-level authorization.
+- Unvalidated/deserialized untrusted payloads.
+- Service credentials shared across environments without isolation.
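
The SSRF control in the new `security-guide.md` (deny internal metadata ranges, allowlist outbound domains) can be sketched with the standard library. This is a minimal illustration under assumptions: `ALLOWED_HOSTS` and both function names are hypothetical, and the caller is assumed to pass in the addresses the hostname actually resolved to, so that an allowlisted name pointing at an internal address is still rejected.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist of outbound integration hosts.
ALLOWED_HOSTS = {"api.partner.example"}


def is_internal_address(address: str) -> bool:
    """True for private, loopback, link-local (cloud metadata), and similar ranges."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def is_outbound_url_allowed(url: str, resolved_addresses: list) -> bool:
    """Allow only HTTPS to allowlisted hosts that resolve to public addresses."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        return False
    # Checking the resolved addresses guards against DNS answers that point
    # an allowlisted name at an internal range.
    return bool(resolved_addresses) and not any(
        is_internal_address(address) for address in resolved_addresses
    )
```

In a real client the connection should be made to the same address that was checked; resolving once for the check and again for the request would reopen the gap.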