mindforge-cc 10.0.3 → 11.0.0
- package/.mindforge/MINDFORGE-V2-SCHEMA.json +43 -10
- package/.mindforge/config.json +30 -2
- package/.mindforge/engine/cross-model-eval.md +74 -0
- package/.mindforge/engine/proactive/signal-detector.md +60 -0
- package/.mindforge/engine/proactive/suggestion-engine.md +100 -0
- package/.mindforge/personas/agent-architect.md +57 -0
- package/.mindforge/personas/agent-evaluator.md +162 -0
- package/.mindforge/personas/agent-memory-designer.md +157 -0
- package/.mindforge/personas/agent-ops-engineer.md +120 -0
- package/.mindforge/personas/agent-orchestrator.md +112 -0
- package/.mindforge/personas/ai-economist.md +57 -0
- package/.mindforge/personas/ai-safety-engineer.md +57 -0
- package/.mindforge/personas/analytics-engineer.md +57 -0
- package/.mindforge/personas/anti-pattern-hunter.md +61 -0
- package/.mindforge/personas/api-gateway-designer.md +132 -0
- package/.mindforge/personas/auth-engineer.md +112 -0
- package/.mindforge/personas/build-engineer.md +57 -0
- package/.mindforge/personas/business-analyst.md +56 -0
- package/.mindforge/personas/cache-architect.md +100 -0
- package/.mindforge/personas/causal-scientist.md +57 -0
- package/.mindforge/personas/cdn-architect.md +118 -0
- package/.mindforge/personas/change-agent.md +104 -0
- package/.mindforge/personas/code-narrator.md +52 -0
- package/.mindforge/personas/codegen-specialist.md +68 -0
- package/.mindforge/personas/communication-architect.md +102 -0
- package/.mindforge/personas/compliance-engineer.md +96 -0
- package/.mindforge/personas/consensus-engineer.md +116 -0
- package/.mindforge/personas/contract-tester.md +60 -192
- package/.mindforge/personas/data-architect.md +108 -0
- package/.mindforge/personas/data-mesh-architect.md +57 -0
- package/.mindforge/personas/data-pipeline-architect.md +120 -0
- package/.mindforge/personas/de-sloppifier.md +60 -0
- package/.mindforge/personas/debt-manager.md +66 -0
- package/.mindforge/personas/decision-architect.md +82 -51
- package/.mindforge/personas/deployment-captain.md +74 -0
- package/.mindforge/personas/design-system-lead.md +112 -0
- package/.mindforge/personas/dmux-orchestrator.md +75 -0
- package/.mindforge/personas/dx-engineer.md +96 -0
- package/.mindforge/personas/ecommerce-engineer.md +57 -0
- package/.mindforge/personas/edge-engineer.md +94 -0
- package/.mindforge/personas/edtech-architect.md +106 -0
- package/.mindforge/personas/embedding-architect.md +57 -0
- package/.mindforge/personas/environment-engineer.md +57 -0
- package/.mindforge/personas/eval-judge.md +55 -0
- package/.mindforge/personas/event-architect.md +102 -0
- package/.mindforge/personas/experiment-designer.md +138 -0
- package/.mindforge/personas/feature-store-engineer.md +57 -0
- package/.mindforge/personas/finops-analyst.md +66 -0
- package/.mindforge/personas/fintech-architect.md +57 -0
- package/.mindforge/personas/flutter-engineer.md +104 -0
- package/.mindforge/personas/gaming-engineer.md +57 -0
- package/.mindforge/personas/graphql-designer.md +73 -0
- package/.mindforge/personas/healthcare-engineer.md +57 -0
- package/.mindforge/personas/hiring-strategist.md +105 -0
- package/.mindforge/personas/hitl-architect.md +165 -0
- package/.mindforge/personas/i18n-architect.md +69 -0
- package/.mindforge/personas/iot-architect.md +105 -0
- package/.mindforge/personas/knowledge-curator.md +139 -0
- package/.mindforge/personas/knowledge-engineer.md +57 -0
- package/.mindforge/personas/lakehouse-architect.md +57 -0
- package/.mindforge/personas/llm-orchestrator.md +57 -0
- package/.mindforge/personas/logistics-architect.md +106 -0
- package/.mindforge/personas/market-analyst.md +53 -0
- package/.mindforge/personas/marketplace-engineer.md +105 -0
- package/.mindforge/personas/mcp-designer.md +54 -0
- package/.mindforge/personas/meeting-designer.md +104 -0
- package/.mindforge/personas/mentorship-lead.md +106 -0
- package/.mindforge/personas/migration-architect.md +57 -0
- package/.mindforge/personas/ml-ops-engineer.md +101 -0
- package/.mindforge/personas/mobile-architect.md +105 -0
- package/.mindforge/personas/mobile-security-engineer.md +106 -0
- package/.mindforge/personas/multi-tenancy-architect.md +71 -0
- package/.mindforge/personas/multimodal-engineer.md +57 -0
- package/.mindforge/personas/offline-specialist.md +105 -0
- package/.mindforge/personas/onboarding-navigator.md +63 -0
- package/.mindforge/personas/payments-engineer.md +135 -0
- package/.mindforge/personas/pipeline-engineer.md +115 -0
- package/.mindforge/personas/platform-engineer.md +97 -0
- package/.mindforge/personas/platform-lead.md +57 -0
- package/.mindforge/personas/privacy-engineer.md +57 -0
- package/.mindforge/personas/product-owner.md +56 -0
- package/.mindforge/personas/productivity-analyst.md +57 -0
- package/.mindforge/personas/prompt-architect.md +101 -0
- package/.mindforge/personas/proofreader.md +53 -0
- package/.mindforge/personas/pwa-architect.md +105 -0
- package/.mindforge/personas/quality-scorer.md +63 -0
- package/.mindforge/personas/react-native-engineer.md +106 -0
- package/.mindforge/personas/resilience-engineer.md +69 -0
- package/.mindforge/personas/rfc-architect.md +64 -0
- package/.mindforge/personas/saga-orchestrator.md +80 -0
- package/.mindforge/personas/secrets-engineer.md +57 -0
- package/.mindforge/personas/skill-smith.md +79 -0
- package/.mindforge/personas/sre-lead.md +107 -0
- package/.mindforge/personas/stream-engineer.md +57 -0
- package/.mindforge/personas/streaming-engineer.md +64 -0
- package/.mindforge/personas/swarm-templates.json +674 -44
- package/.mindforge/personas/system-designer.md +57 -0
- package/.mindforge/personas/team-coach.md +120 -0
- package/.mindforge/personas/tech-lead-coach.md +103 -0
- package/.mindforge/personas/technical-writer-lead.md +111 -0
- package/.mindforge/personas/vibe-checker.md +75 -0
- package/.mindforge/personas/worktree-manager.md +56 -0
- package/.mindforge/personas/zero-trust-engineer.md +113 -0
- package/.mindforge/skills/a11y-testing/SKILL.md +143 -0
- package/.mindforge/skills/agent-evaluation-framework/SKILL.md +227 -0
- package/.mindforge/skills/agent-memory-design/SKILL.md +199 -0
- package/.mindforge/skills/agent-orchestration-patterns/SKILL.md +129 -0
- package/.mindforge/skills/agent-tool-selection/SKILL.md +204 -0
- package/.mindforge/skills/ai-agent-deployment/SKILL.md +176 -0
- package/.mindforge/skills/ai-cost-management/SKILL.md +57 -0
- package/.mindforge/skills/ai-safety-alignment/SKILL.md +53 -0
- package/.mindforge/skills/analytics-instrumentation/SKILL.md +172 -0
- package/.mindforge/skills/api-gateway-patterns/SKILL.md +177 -0
- package/.mindforge/skills/api-marketplace/SKILL.md +56 -0
- package/.mindforge/skills/api-versioning/SKILL.md +100 -0
- package/.mindforge/skills/app-store-deployment/SKILL.md +44 -0
- package/.mindforge/skills/architecture-tradeoff-analysis/SKILL.md +97 -0
- package/.mindforge/skills/audit-logging/SKILL.md +140 -0
- package/.mindforge/skills/auth-patterns/SKILL.md +148 -0
- package/.mindforge/skills/autonomous-agent-harness/SKILL.md +218 -0
- package/.mindforge/skills/autonomous-agents/SKILL.md +59 -0
- package/.mindforge/skills/build-system-optimization/SKILL.md +54 -0
- package/.mindforge/skills/build-vs-buy/SKILL.md +80 -0
- package/.mindforge/skills/bundle-optimization/SKILL.md +174 -0
- package/.mindforge/skills/business-analyst/SKILL.md +82 -0
- package/.mindforge/skills/caching-strategies/SKILL.md +132 -0
- package/.mindforge/skills/capacity-planning/SKILL.md +96 -0
- package/.mindforge/skills/causal-inference/SKILL.md +42 -0
- package/.mindforge/skills/cdn-optimization/SKILL.md +212 -0
- package/.mindforge/skills/change-management/SKILL.md +106 -0
- package/.mindforge/skills/chaos-engineering/SKILL.md +99 -0
- package/.mindforge/skills/ci-cd-pipeline/SKILL.md +118 -0
- package/.mindforge/skills/cli-design/SKILL.md +118 -0
- package/.mindforge/skills/code-generation-patterns/SKILL.md +92 -0
- package/.mindforge/skills/code-review-methodology/SKILL.md +180 -0
- package/.mindforge/skills/code-tour/SKILL.md +145 -0
- package/.mindforge/skills/codebase-onboarding/SKILL.md +95 -0
- package/.mindforge/skills/compliance-as-code/SKILL.md +195 -0
- package/.mindforge/skills/conflict-resolution/SKILL.md +87 -0
- package/.mindforge/skills/connection-pooling/SKILL.md +151 -0
- package/.mindforge/skills/container-security/SKILL.md +151 -0
- package/.mindforge/skills/context-engineering/SKILL.md +114 -0
- package/.mindforge/skills/contract-testing/SKILL.md +85 -0
- package/.mindforge/skills/cost-estimation/SKILL.md +82 -0
- package/.mindforge/skills/cqrs-event-sourcing/SKILL.md +95 -0
- package/.mindforge/skills/cross-platform-testing/SKILL.md +43 -0
- package/.mindforge/skills/data-governance/SKILL.md +42 -0
- package/.mindforge/skills/data-lakehouse/SKILL.md +42 -0
- package/.mindforge/skills/data-mesh/SKILL.md +42 -0
- package/.mindforge/skills/data-modeling/SKILL.md +107 -0
- package/.mindforge/skills/data-pipeline-design/SKILL.md +171 -0
- package/.mindforge/skills/data-privacy-engineering/SKILL.md +42 -0
- package/.mindforge/skills/database-performance/SKILL.md +174 -0
- package/.mindforge/skills/database-sharding-advanced/SKILL.md +206 -0
- package/.mindforge/skills/de-sloppify/SKILL.md +120 -0
- package/.mindforge/skills/defense-in-depth/SKILL.md +84 -0
- package/.mindforge/skills/delegation-patterns/SKILL.md +123 -0
- package/.mindforge/skills/dependency-management/SKILL.md +94 -0
- package/.mindforge/skills/deployment-workflow/SKILL.md +135 -0
- package/.mindforge/skills/design-system/SKILL.md +113 -0
- package/.mindforge/skills/developer-onboarding/SKILL.md +99 -0
- package/.mindforge/skills/developer-productivity-metrics/SKILL.md +59 -0
- package/.mindforge/skills/distributed-consensus/SKILL.md +141 -0
- package/.mindforge/skills/dmux-workflows/SKILL.md +141 -0
- package/.mindforge/skills/dns-architecture/SKILL.md +167 -0
- package/.mindforge/skills/ecommerce-architecture/SKILL.md +41 -0
- package/.mindforge/skills/edge-computing/SKILL.md +91 -0
- package/.mindforge/skills/edtech-platform/SKILL.md +41 -0
- package/.mindforge/skills/email-deliverability/SKILL.md +177 -0
- package/.mindforge/skills/embedding-systems/SKILL.md +55 -0
- package/.mindforge/skills/environment-management/SKILL.md +54 -0
- package/.mindforge/skills/error-handling-architecture/SKILL.md +118 -0
- package/.mindforge/skills/estimation-techniques/SKILL.md +113 -0
- package/.mindforge/skills/eval-harness/SKILL.md +180 -0
- package/.mindforge/skills/event-driven-architecture/SKILL.md +162 -0
- package/.mindforge/skills/experiment-design/SKILL.md +139 -0
- package/.mindforge/skills/experiment-platform/SKILL.md +43 -0
- package/.mindforge/skills/feature-engineering/SKILL.md +42 -0
- package/.mindforge/skills/feature-flag-management/SKILL.md +183 -0
- package/.mindforge/skills/fine-tuning-workflow/SKILL.md +189 -0
- package/.mindforge/skills/fintech-patterns/SKILL.md +41 -0
- package/.mindforge/skills/flutter-architecture/SKILL.md +42 -0
- package/.mindforge/skills/gaming-backend/SKILL.md +41 -0
- package/.mindforge/skills/git-workflow-design/SKILL.md +129 -0
- package/.mindforge/skills/graceful-degradation/SKILL.md +95 -0
- package/.mindforge/skills/graphql-patterns/SKILL.md +243 -0
- package/.mindforge/skills/guardrails-and-safety/SKILL.md +137 -0
- package/.mindforge/skills/healthcare-systems/SKILL.md +40 -0
- package/.mindforge/skills/hiring-engineering/SKILL.md +119 -0
- package/.mindforge/skills/human-in-the-loop-design/SKILL.md +234 -0
- package/.mindforge/skills/i18n-architecture/SKILL.md +147 -0
- package/.mindforge/skills/idempotency-patterns/SKILL.md +84 -0
- package/.mindforge/skills/incident-communication/SKILL.md +96 -0
- package/.mindforge/skills/incident-management/SKILL.md +97 -0
- package/.mindforge/skills/infrastructure-as-code/SKILL.md +98 -0
- package/.mindforge/skills/instinct-clustering/SKILL.md +190 -0
- package/.mindforge/skills/internal-developer-platform/SKILL.md +51 -0
- package/.mindforge/skills/iot-platform/SKILL.md +41 -0
- package/.mindforge/skills/k8s-deployment/SKILL.md +358 -0
- package/.mindforge/skills/knowledge-graphs/SKILL.md +56 -0
- package/.mindforge/skills/knowledge-sharing-systems/SKILL.md +112 -0
- package/.mindforge/skills/llm-cost-optimization/SKILL.md +198 -0
- package/.mindforge/skills/llm-orchestration/SKILL.md +56 -0
- package/.mindforge/skills/load-testing/SKILL.md +84 -0
- package/.mindforge/skills/logistics-optimization/SKILL.md +40 -0
- package/.mindforge/skills/market-researcher/SKILL.md +99 -0
- package/.mindforge/skills/marketplace-trust/SKILL.md +40 -0
- package/.mindforge/skills/mcp-server-patterns/SKILL.md +264 -0
- package/.mindforge/skills/media-streaming/SKILL.md +41 -0
- package/.mindforge/skills/meeting-architecture/SKILL.md +146 -0
- package/.mindforge/skills/mentoring-patterns/SKILL.md +77 -0
- package/.mindforge/skills/microservices-patterns/SKILL.md +83 -0
- package/.mindforge/skills/migration-platform/SKILL.md +61 -0
- package/.mindforge/skills/migration-strategies/SKILL.md +129 -0
- package/.mindforge/skills/ml-feature-store/SKILL.md +56 -0
- package/.mindforge/skills/ml-monitoring/SKILL.md +42 -0
- package/.mindforge/skills/mobile-performance/SKILL.md +44 -0
- package/.mindforge/skills/mobile-security/SKILL.md +45 -0
- package/.mindforge/skills/model-evaluation/SKILL.md +53 -0
- package/.mindforge/skills/monorepo-management/SKILL.md +100 -0
- package/.mindforge/skills/multi-tenancy-patterns/SKILL.md +145 -0
- package/.mindforge/skills/multi-turn-conversation-design/SKILL.md +206 -0
- package/.mindforge/skills/multimodal-ai/SKILL.md +51 -0
- package/.mindforge/skills/mutation-testing/SKILL.md +97 -0
- package/.mindforge/skills/notification-system-design/SKILL.md +168 -0
- package/.mindforge/skills/observability-stack/SKILL.md +136 -0
- package/.mindforge/skills/offline-first-design/SKILL.md +43 -0
- package/.mindforge/skills/on-call-design/SKILL.md +111 -0
- package/.mindforge/skills/pagination-patterns/SKILL.md +230 -0
- package/.mindforge/skills/payment-integration/SKILL.md +176 -0
- package/.mindforge/skills/performance-reviews/SKILL.md +140 -0
- package/.mindforge/skills/platform-observability/SKILL.md +58 -0
- package/.mindforge/skills/platform-reliability/SKILL.md +52 -0
- package/.mindforge/skills/post-incident-learning/SKILL.md +96 -0
- package/.mindforge/skills/product-manager/SKILL.md +104 -0
- package/.mindforge/skills/progressive-web-app/SKILL.md +44 -0
- package/.mindforge/skills/prompt-engineering/SKILL.md +94 -0
- package/.mindforge/skills/proofreader/SKILL.md +158 -0
- package/.mindforge/skills/push-notification-architecture/SKILL.md +45 -0
- package/.mindforge/skills/python-performance/SKILL.md +183 -0
- package/.mindforge/skills/quality-audit/SKILL.md +171 -0
- package/.mindforge/skills/queue-design/SKILL.md +85 -0
- package/.mindforge/skills/rag-architecture/SKILL.md +176 -0
- package/.mindforge/skills/rate-limiting-design/SKILL.md +94 -0
- package/.mindforge/skills/react-native-patterns/SKILL.md +42 -0
- package/.mindforge/skills/react-performance/SKILL.md +229 -0
- package/.mindforge/skills/real-time-analytics/SKILL.md +42 -0
- package/.mindforge/skills/real-time-sync/SKILL.md +83 -0
- package/.mindforge/skills/responsive-native/SKILL.md +44 -0
- package/.mindforge/skills/responsive-patterns/SKILL.md +141 -0
- package/.mindforge/skills/rfc-pipeline/SKILL.md +114 -0
- package/.mindforge/skills/saas-multi-tenant/SKILL.md +41 -0
- package/.mindforge/skills/santa-method/SKILL.md +134 -0
- package/.mindforge/skills/search-implementation/SKILL.md +98 -0
- package/.mindforge/skills/secrets-platform/SKILL.md +56 -0
- package/.mindforge/skills/secrets-rotation/SKILL.md +173 -0
- package/.mindforge/skills/self-serve-infrastructure/SKILL.md +51 -0
- package/.mindforge/skills/serverless-patterns/SKILL.md +119 -0
- package/.mindforge/skills/skill-creator-meta/SKILL.md +146 -0
- package/.mindforge/skills/sprint-retrospective-facilitation/SKILL.md +112 -0
- package/.mindforge/skills/stakeholder-communication/SKILL.md +85 -0
- package/.mindforge/skills/state-management/SKILL.md +104 -0
- package/.mindforge/skills/stream-processing/SKILL.md +43 -0
- package/.mindforge/skills/streaming-architecture/SKILL.md +81 -0
- package/.mindforge/skills/supply-chain-security/SKILL.md +145 -0
- package/.mindforge/skills/synthetic-data-generation/SKILL.md +52 -0
- package/.mindforge/skills/system-design/SKILL.md +88 -0
- package/.mindforge/skills/team-topology-design/SKILL.md +107 -0
- package/.mindforge/skills/technical-debt-management/SKILL.md +86 -0
- package/.mindforge/skills/technical-interview-design/SKILL.md +98 -0
- package/.mindforge/skills/technical-leadership/SKILL.md +75 -0
- package/.mindforge/skills/technical-writing/SKILL.md +237 -0
- package/.mindforge/skills/technology-radar/SKILL.md +88 -0
- package/.mindforge/skills/testing-anti-patterns/SKILL.md +288 -0
- package/.mindforge/skills/tool-design/SKILL.md +138 -0
- package/.mindforge/skills/typescript-advanced/SKILL.md +198 -0
- package/.mindforge/skills/using-git-worktrees/SKILL.md +139 -0
- package/.mindforge/skills/verification-loop/SKILL.md +13 -1
- package/.mindforge/skills/vibe-security/SKILL.md +165 -0
- package/.mindforge/skills/visual-regression-testing/SKILL.md +97 -0
- package/.mindforge/skills/websocket-patterns/SKILL.md +203 -0
- package/.mindforge/skills/writing-plans/SKILL.md +170 -0
- package/.mindforge/skills/writing-skills/SKILL.md +216 -0
- package/.mindforge/skills/zero-trust-architecture/SKILL.md +166 -0
- package/CHANGELOG.md +240 -0
- package/MINDFORGE.md +4 -4
- package/README.md +49 -4
- package/RELEASENOTES.md +80 -0
- package/SECURITY.md +20 -8
- package/bin/autonomous/audit-writer.js +13 -0
- package/bin/autonomous/auto-runner.js +74 -16
- package/bin/autonomous/context-refactorer.js +26 -11
- package/bin/autonomous/state-manager.js +62 -6
- package/bin/autonomous/stuck-monitor.js +46 -7
- package/bin/autonomous/wave-executor.js +66 -25
- package/bin/dashboard/api-router.js +43 -0
- package/bin/dashboard/metrics-aggregator.js +28 -1
- package/bin/dashboard/server.js +67 -4
- package/bin/dashboard/sse-bridge.js +4 -4
- package/bin/engine/feedback-loop.js +8 -0
- package/bin/engine/intelligence-interlock.js +32 -15
- package/bin/engine/logic-drift-detector.js +2 -1
- package/bin/engine/nexus-tracer.js +3 -2
- package/bin/engine/remediation-engine.js +155 -32
- package/bin/engine/self-corrective-synthesizer.js +84 -10
- package/bin/engine/sre-manager.js +12 -4
- package/bin/engine/temporal-hub.js +131 -34
- package/bin/governance/approve.js +41 -5
- package/bin/governance/impact-analyzer.js +28 -0
- package/bin/governance/policy-engine.js +10 -3
- package/bin/governance/quantum-crypto.js +32 -19
- package/bin/governance/rbac-manager.js +74 -2
- package/bin/governance/ztai-manager.js +49 -7
- package/bin/hindsight-injector.js +3 -3
- package/bin/memory/eis-client.js +71 -34
- package/bin/memory/embedding-engine.js +61 -0
- package/bin/memory/knowledge-graph.js +58 -5
- package/bin/memory/knowledge-indexer.js +53 -6
- package/bin/memory/knowledge-store.js +22 -0
- package/bin/migrations/10.7.0-to-11.0.0.js +110 -0
- package/bin/migrations/schema-versions.js +13 -0
- package/bin/models/anthropic-provider.js +45 -0
- package/bin/models/cloud-broker.js +68 -20
- package/bin/models/gemini-provider.js +51 -0
- package/bin/models/model-client.js +20 -0
- package/bin/models/model-router.js +28 -8
- package/bin/models/openai-provider.js +44 -0
- package/bin/utils/file-io.js +63 -1
- package/bin/utils/index.js +58 -0
- package/docs/getting-started.md +1 -1
- package/docs/user-guide.md +2 -2
- package/package.json +2 -2
- package/.mindforge/personas/data-privacy-engineer.md +0 -187
@@ -0,0 +1,176 @@
---
name: payment-integration
version: 1.0.0
min_mindforge_version: 10.0.4
status: stable
triggers: payment integration, stripe architecture, payment webhook, idempotent charge, refund flow, PCI scope, payment state machine, subscription billing, payment retry, payment reconciliation, checkout flow, payment method tokenization
compose: security-review
---

# Skill — Payment Integration (Idempotent Payment Architecture)

## When this skill activates
When building or modifying payment flows, integrating payment providers (Stripe,
PayPal, Braintree), handling subscriptions, processing refunds, or dealing with
any money movement in the system. Also activates for PCI compliance considerations.

Core principle: **Idempotency is life** — every payment operation must be safe to
retry without charging the customer twice. When in doubt, err on the side of NOT
charging.

## Mandatory actions when this skill is active

### Payment State Machine

1. **Every payment has a well-defined state machine:**
```
States:
created → processing → succeeded
          processing → failed → (retry) → processing
succeeded → refund_pending → refunded
succeeded → disputed → dispute_won (funds returned)
            disputed → dispute_lost (funds lost)

State transitions:
- created → processing: charge initiated with provider
- processing → succeeded: provider confirms capture
- processing → failed: provider declines or errors
- succeeded → refund_pending: refund initiated
- refund_pending → refunded: provider confirms refund
```

Rules:
- State transitions are APPEND-ONLY (never delete payment records)
- Every transition logged with timestamp, actor, and reason
- Failed payments can retry (max 3 attempts with exponential backoff)
- Terminal states: refunded, dispute_won, dispute_lost, and failed once retries are exhausted (succeeded is settled but can still move to refund_pending or disputed)
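
The state diagram and rules above can be sketched as a transition table with an append-only history. This is a minimal in-memory illustration, not the package's implementation: `createPayment`, `transition`, `retryDelayMs`, and the 1-second base delay are hypothetical names and values chosen for the example.

```javascript
// Allowed transitions, taken from the state diagram above.
const TRANSITIONS = {
  created: ["processing"],
  processing: ["succeeded", "failed"],
  failed: ["processing"], // retry
  succeeded: ["refund_pending", "disputed"],
  refund_pending: ["refunded"],
  disputed: ["dispute_won", "dispute_lost"],
  refunded: [],
  dispute_won: [],
  dispute_lost: [],
};

const MAX_ATTEMPTS = 3; // "max 3 attempts" from the rules above

// Hypothetical helper: an in-memory payment with an append-only history.
function createPayment(id) {
  return { id, state: "created", attempts: 0, history: [] };
}

// Applies a transition by appending to the history; nothing is ever removed.
function transition(payment, to, { actor, reason, at = new Date() }) {
  const allowed = TRANSITIONS[payment.state] || [];
  if (!allowed.includes(to)) {
    throw new Error(`Illegal transition ${payment.state} -> ${to}`);
  }
  if (payment.state === "failed" && payment.attempts >= MAX_ATTEMPTS) {
    throw new Error(`Retry limit of ${MAX_ATTEMPTS} attempts reached`);
  }
  if (to === "processing") payment.attempts += 1;
  payment.history.push({
    from: payment.state,
    to,
    actor,
    reason,
    at: at.toISOString(),
  });
  payment.state = to;
  return payment;
}

// Exponential backoff before attempt number `attempt` (1-based).
// The base delay is an assumption; the source only says "exponential backoff".
function retryDelayMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** (attempt - 1);
}
```

Keeping the allowed transitions in one table means an illegal move (for example `refunded` back to `processing`) fails loudly instead of silently rewriting state. In a real system the history would live in a database table that is only ever inserted into.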

### Idempotency

2. **Idempotency key on every charge call:**
```
Idempotency key format: [user_id]-[order_id]-[attempt_number]
Example: usr_abc123-ord_xyz789-1

Rules:
- Generate idempotency key BEFORE calling payment provider
- Store key in database alongside payment intent
- If the outcome is unknown (network timeout, unclear response): retry with the SAME key
- If the provider returned a definitive failure and a new attempt is needed: increment attempt number, generate new key
- Provider stores result by key — retrying same key returns same result
- Key expiry: 24 hours (Stripe default) — don't retry after that
```

Critical: If the client retries with the same key (network timeout, unclear response), the
idempotency key ensures no double charge. This is non-negotiable.
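
A small sketch of the key format and the retry decision follows. It assumes that a retry after an unclear response reuses the stored key, while a fresh attempt after a definitive failure gets an incremented attempt number. The function names, the `outcome` labels, and the record shape are hypothetical.

```javascript
// Builds a key in the [user_id]-[order_id]-[attempt_number] format described above.
function idempotencyKey(userId, orderId, attempt) {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new Error("attempt must be a positive integer");
  }
  return `${userId}-${orderId}-${attempt}`;
}

const KEY_TTL_MS = 24 * 60 * 60 * 1000; // 24-hour expiry from the rules above

// Decides which key the next provider call should use.
// `outcome` is a hypothetical classification of the previous call:
//   "unknown"  - timeout or dropped connection, result never seen
//   "declined" - provider returned a definitive failure
// `record` is the stored row: { key, userId, orderId, attempt, createdAt }.
function nextKey(record, outcome, now = Date.now()) {
  if (now - record.createdAt > KEY_TTL_MS) return null; // past the window: do not retry
  if (outcome === "unknown") return record.key; // same key, provider replays the stored result
  if (outcome === "declined") {
    return idempotencyKey(record.userId, record.orderId, record.attempt + 1);
  }
  throw new Error(`Unhandled outcome: ${outcome}`);
}
```

Returning `null` after the expiry window forces the caller to stop and escalate rather than risk a second charge under a key the provider may no longer remember.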

### Webhook Processing

3. **Webhook handler requirements:**
```
1. Verify signature FIRST (reject if invalid — no processing)
2. Respond 200 immediately (within 5 seconds)
3. Process the event ASYNCHRONOUSLY (queue for background processing)
4. Process IDEMPOTENTLY (same webhook delivered twice = same outcome)
5. Handle OUT-OF-ORDER delivery (payment_intent.succeeded before payment_intent.created)
```

Implementation:
```
POST /webhooks/stripe
1. Verify: stripe.webhooks.constructEvent(body, sig, secret)
2. Dedup: check event.id against processed_events table
3. If already processed: return 200 (idempotent)
4. Queue: enqueue event for async processing
5. Return 200
6. [Async worker]: process event, update payment state, mark as processed
```

Rules:
- NEVER do business logic synchronously in the webhook handler
- Store raw webhook payload for debugging/replay
- Implement webhook replay for missed events (fetch from provider API)
- Monitor webhook lag (time between event creation and processing)
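
The six implementation steps above might look roughly like this. It is a framework-free sketch with in-memory stand-ins for the `processed_events` table and the queue. `handleWebhook`, `runWorker`, and the injected `verify` function are hypothetical; in practice `verify` could wrap `stripe.webhooks.constructEvent` called with the raw request body.

```javascript
// In-memory stand-ins for the processed_events table, the job queue,
// and raw payload storage (all assumptions for illustration).
const processedEvents = new Set();
const queue = [];
const rawPayloads = [];

// `verify` returns the parsed event or throws on a bad signature.
function handleWebhook({ rawBody, signature }, verify) {
  let event;
  try {
    event = verify(rawBody, signature); // 1. verify FIRST
  } catch (err) {
    return { status: 400 }; // invalid signature: no processing at all
  }
  rawPayloads.push({ id: event.id, rawBody }); // keep the raw payload for debugging/replay
  if (processedEvents.has(event.id)) return { status: 200 }; // 2-3. dedup
  queue.push(event); // 4. enqueue for async processing
  return { status: 200 }; // 5. respond right away
}

// 6. Worker: applies each event once; a repeated delivery is a no-op.
function runWorker(applyEvent) {
  let applied = 0;
  while (queue.length > 0) {
    const event = queue.shift();
    if (processedEvents.has(event.id)) continue; // duplicate enqueued before the first finished
    applyEvent(event);
    processedEvents.add(event.id);
    applied += 1;
  }
  return applied;
}
```

The worker repeats the dedup check because two deliveries of the same event can both be enqueued before either is marked as processed. Out-of-order handling would live inside `applyEvent`, for example by consulting the payment state machine before applying a transition.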

### PCI Scope Minimization

4. **Never touch raw card numbers:**
```
Client-side tokenization flow:
1. User enters card → Stripe.js/Elements captures it
2. Card data goes DIRECTLY to Stripe (never touches your server)
3. Stripe returns a token/PaymentMethod ID
4. Your server uses the token to create charges

Your server NEVER sees: card number, CVV, expiration date
Your PCI scope: SAQ-A (lowest level — just a questionnaire)
```

Rules:
- Use Stripe Elements, PayPal JS SDK, or equivalent client-side tokenization
- Never log request bodies that might contain card data
- Never store card data in your database (only token references)
- If using iframes: ensure they're from the payment provider's domain
- A formal PCI-DSS audit is generally not required if you stay at SAQ-A level
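
As a defensive complement to the logging rule above, a scrubber can mask anything that looks like a card number before a line is written. This is a sketch only and no substitute for keeping card data off the server in the first place. `redactCardNumbers`, `passesLuhn`, and the `[REDACTED_PAN]` marker are hypothetical names.

```javascript
// Luhn checksum, used here only to reduce false positives when scrubbing logs.
function passesLuhn(digits) {
  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i -= 1) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// Masks 13-19 digit runs (optionally separated by single spaces or dashes)
// that pass the Luhn check; other digit runs are left untouched.
function redactCardNumbers(text) {
  return text.replace(/\b\d(?:[ -]?\d){12,18}\b/g, (match) => {
    const digits = match.replace(/[ -]/g, "");
    return passesLuhn(digits) ? "[REDACTED_PAN]" : match;
  });
}
```

For example, `redactCardNumbers("card=4242 4242 4242 4242 ok")` masks the well-known Stripe test number, while an order ID that merely happens to be 16 digits long is usually left alone because it fails the checksum.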

### Subscription Billing

5. **Subscription lifecycle:**
```
States: trial → active → past_due → canceled → expired

trial → active: trial period ends, first charge succeeds
active → past_due: renewal charge fails
past_due → active: retry succeeds
past_due → canceled: all retries exhausted + grace period ended
canceled → active: user resubscribes (new subscription)
```

Dunning (failed payment recovery):
```
Day 0: Charge fails → retry immediately
Day 1: Second retry
Day 3: Third retry + email notification ("update payment method")
Day 7: Final retry + urgent email + in-app banner
Day 14: Cancel subscription + final email ("your subscription has ended")
```

Rules:
- Dunning schedule is configurable per plan tier
- Always give users a way to update payment method without re-subscribing
- Prorate upgrades/downgrades (charge difference immediately or credit)
- Webhook: handle invoice.payment_failed for dunning triggers
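
The dunning schedule above can be expressed as data, which is one way to make it configurable per plan tier. The sketch below assumes a daily job that asks which steps have become due since its last run. The field names and notification labels are hypothetical.

```javascript
// Default schedule mirroring the dunning table above.
// Passing a different array per plan tier is an assumption about how
// "configurable per plan tier" might be modeled.
const DEFAULT_DUNNING = [
  { day: 0, retry: true, notify: [] },
  { day: 1, retry: true, notify: [] },
  { day: 3, retry: true, notify: ["email:update_payment_method"] },
  { day: 7, retry: true, notify: ["email:urgent", "in_app_banner"] },
  { day: 14, retry: false, cancel: true, notify: ["email:subscription_ended"] },
];

// Returns the steps that became due after `lastHandledDay`, up to `daysSinceFailure`.
function dueSteps(daysSinceFailure, lastHandledDay = -1, schedule = DEFAULT_DUNNING) {
  return schedule.filter(
    (step) => step.day > lastHandledDay && step.day <= daysSinceFailure
  );
}

// Maps the due steps onto the subscription states listed above.
function nextState(currentState, steps, retrySucceeded) {
  if (currentState !== "past_due") return currentState;
  if (retrySucceeded) return "active";
  return steps.some((step) => step.cancel) ? "canceled" : "past_due";
}
```

Tracking `lastHandledDay` per subscription keeps the job idempotent: rerunning it on the same day returns no new steps, so no duplicate retries or emails are triggered.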

### Reconciliation

6. **Daily reconciliation process:**
```
Every 24 hours:
1. Fetch all payments from provider API (last 48 hours, overlap for safety)
2. Match against internal payment records
3. Flag discrepancies:
   - Payment in provider but not in our DB (missed webhook)
   - Payment in our DB but not in provider (ghost record)
   - Amount mismatch (partial capture, currency conversion)
   - Status mismatch (we say succeeded, provider says failed)
4. Auto-resolve simple cases (missed webhook → replay)
5. Alert on unresolvable discrepancies (requires human review)
```

Rules:
- Reconciliation runs daily minimum, hourly for high-volume systems
- Use 48-hour overlap window to catch delayed settlements
- Discrepancy alerts go to finance + engineering
- Never auto-resolve amount mismatches (always flag for human)
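
The matching step above can be sketched as a pure function over two lists. The record shape `{ id, amount, currency, status }`, amounts in integer minor units, and the discrepancy type names are assumptions for illustration. Both lists are expected to cover the same 48-hour window.

```javascript
// Compares provider payments with internal records and classifies each discrepancy.
// Only a missed webhook is marked auto-resolvable; everything else goes to a human.
function reconcile(providerPayments, internalPayments) {
  const internalById = new Map(internalPayments.map((p) => [p.id, p]));
  const providerIds = new Set(providerPayments.map((p) => p.id));
  const discrepancies = [];

  for (const remote of providerPayments) {
    const local = internalById.get(remote.id);
    if (!local) {
      // In provider but not in our DB: missed webhook, safe to replay.
      discrepancies.push({ id: remote.id, type: "missing_internal", autoResolve: true });
      continue;
    }
    if (local.amount !== remote.amount || local.currency !== remote.currency) {
      discrepancies.push({ id: remote.id, type: "amount_mismatch", autoResolve: false });
    }
    if (local.status !== remote.status) {
      discrepancies.push({ id: remote.id, type: "status_mismatch", autoResolve: false });
    }
  }

  for (const local of internalPayments) {
    if (!providerIds.has(local.id)) {
      // In our DB but not in provider: ghost record.
      discrepancies.push({ id: local.id, type: "ghost_record", autoResolve: false });
    }
  }
  return discrepancies;
}
```

Keeping the comparison free of side effects makes it straightforward to run hourly for high-volume systems and to route the `autoResolve: false` entries to finance and engineering.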

## Self-check before task completion

Before marking a task done when this skill was active:

- [ ] Is there a well-defined state machine for payment lifecycle?
- [ ] Does every charge call include an idempotency key?
- [ ] Are webhooks verified (signature), deduplicated, and processed async?
- [ ] Is PCI scope minimized (client-side tokenization, no raw card data on server)?
- [ ] For subscriptions: is the dunning sequence defined with escalating notifications?
- [ ] Is daily reconciliation implemented (provider vs internal records)?
- [ ] Are all payment state transitions logged with timestamp and reason?
- [ ] Has the security-review skill been co-activated for this payment code?

@@ -0,0 +1,140 @@
---
name: performance-reviews
version: 1.0.0
min_mindforge_version: 10.3.0
status: stable
triggers: performance review engineering, promotion case writing, feedback framework, calibration session, engineering evaluation criteria, performance improvement plan, impact documentation, promotion packet, peer feedback engineering, engineering levels, growth assessment, performance calibration
---

# Performance Reviews

## When this skill activates

This skill activates when conducting engineering performance evaluations, writing promotion cases, designing feedback frameworks, participating in calibration sessions, creating performance improvement plans, or assessing growth against engineering levels. It applies to engineering managers, tech leads, and senior engineers involved in performance management.

## Mandatory actions when this skill is active

### Before performance reviews

1. **Define evaluation criteria explicitly** — What does success look like at each engineering level? Common dimensions: technical execution, system design, code quality, communication, collaboration, ownership, impact, mentorship. Map criteria to levels (junior, mid, senior, staff, principal).
2. **Collect evidence throughout the cycle** — Don't rely on memory. Keep a running doc of: projects shipped, PRs reviewed, incidents handled, design docs written, mentoring moments. Real-time logging prevents recency bias.
3. **Gather 360-degree feedback** — Ask peers, cross-functional partners, and direct reports (if applicable) for input. Single-source feedback is incomplete. Use structured prompts: "What does [Engineer] do well?" "Where could they grow?"
4. **Review the engineer's self-assessment** — Ask them to evaluate their own performance before writing your review. Gaps between self-assessment and manager assessment are learning opportunities.

### During performance evaluation

#### Engineering Level Expectations

Use a competency matrix to define clear expectations at each level. Example dimensions:

| Dimension | Junior | Mid-Level | Senior | Staff | Principal |
|-----------|--------|-----------|--------|-------|-----------|
| Scope of Work | Well-defined tasks | Small features | Full features/services | Multi-team projects | Org-wide initiatives |
| Technical Complexity | Low complexity | Medium complexity | High complexity | Architectural decisions | Strategic direction |
| Autonomy | Needs guidance | Some autonomy | Fully autonomous | Defines direction | Sets vision |
| Code Quality | Learns best practices | Applies best practices | Role models best practices | Defines standards | Elevates org quality |
| Design | Implements designs | Designs small features | Designs systems | Designs platforms | Shapes architecture |
| Mentorship | Learns from others | Helps peers | Mentors 1-2 juniors | Mentors team | Mentors org |
| Communication | Within team | Cross-team (technical) | Cross-functional | Executives + external | Industry thought leader |
| Impact | Individual tasks | Team features | Service/product | Organization | Company/industry |

#### Evaluation Process

- **Rate performance on each dimension** — Use a 1-5 scale: 1 = Below expectations, 2 = Partially meets, 3 = Meets, 4 = Exceeds, 5 = Greatly exceeds.
- **Provide specific examples** — Don't say "Strong communicator." Say "Led design review for Payment Service rewrite, clearly articulated tradeoffs, and incorporated feedback from 5 engineers."
- **Distinguish between performance at level vs readiness for next level** — Meeting expectations at Senior level doesn't automatically mean ready for Staff. Promotion requires sustained performance at the next level.
- **Identify growth areas** — Every engineer has gaps. Name them specifically and provide actionable guidance: "To reach Staff, you need to mentor 2-3 engineers and lead a cross-team project."

48
|
+
#### Feedback Framework: SBI + Coaching
|
|
49
|
+
|
|
50
|
+
Use **Situation-Behavior-Impact (SBI)** for developmental feedback:
|
|
51
|
+
- **Situation**: "In last week's design review..."
|
|
52
|
+
- **Behavior**: "...you dismissed Sarah's concern about edge cases without discussing it..."
|
|
53
|
+
- **Impact**: "...which made the team hesitant to raise concerns in future reviews."
|
|
54
|
+
|
|
55
|
+
Follow SBI with **Coaching**:
|
|
56
|
+
- "Next time, try acknowledging the concern and discussing it openly. Even if you ultimately disagree, demonstrating openness builds trust."
|
|
57
|
+
|
|
58
|
+
#### Writing Performance Reviews
|
|
59
|
+
|
|
60
|
+
**Structure:**
|
|
61
|
+
1. **Summary** — Overall performance (meets/exceeds expectations), 2-3 key strengths, 1-2 growth areas.
|
|
62
|
+
2. **Key Accomplishments** — 3-5 most impactful projects or contributions with specific outcomes (metrics, launches, quality improvements).
|
|
63
|
+
3. **Dimension-by-Dimension Assessment** — Rate and provide examples for each competency (technical execution, collaboration, ownership, etc.).
|
|
64
|
+
4. **Growth Areas** — 1-3 specific areas for development with actionable suggestions.
|
|
65
|
+
5. **Career Development** — If promotion-track, outline path to next level. If not promotion-track, outline how to grow within current level.
|
|
66
|
+
|
|
67
|
+
**Tone:**
|
|
68
|
+
- Be direct but supportive. Sugarcoating developmental feedback doesn't help.
|
|
69
|
+
- Use "I observed" not "you are." Focus on behavior, not identity.
|
|
70
|
+
- Balance positive and developmental. If someone is strong, say so. If they have gaps, name them.
|
|
71
|
+
|
|
72
|
+
#### Promotion Case Writing
|
|
73
|
+
|
|
74
|
+
**Promotion Readiness Criteria:**
|
|
75
|
+
- **Sustained performance at the next level** — For 6-12 months, not just one stellar project. Consistency matters.
|
|
76
|
+
- **Demonstrated scope expansion** — Taking on bigger, more complex, more ambiguous work.
|
|
77
|
+
- **Organizational impact** — Contributed beyond their immediate team (mentoring, tooling, process improvements).
|
|
78
|
+
|
|
79
|
+
**Promotion Packet Structure:**
|
|
80
|
+
1. **Summary** — Candidate name, current level, target level, tenure, manager endorsement.
|
|
81
|
+
2. **Case for Promotion** — Why are they ready? Use the competency matrix. Show where they meet or exceed next-level expectations.
|
|
82
|
+
3. **Key Accomplishments** — 3-5 high-impact projects with measurable outcomes. Align each to next-level competencies.
|
|
83
|
+
4. **Peer Feedback** — 3-5 quotes from peers, cross-functional partners, or senior engineers. Shows they operate at the next level already.
|
|
84
|
+
5. **Growth Areas** — Even strong candidates have gaps. Acknowledge them but show they're manageable.
|
|
85
|
+
6. **Comparison to Peers** — How does this candidate compare to others at the target level? Calibration context matters.
|
|
86
|
+
|
|
87
|
+
**Pitfall:** Nominating someone for promotion because they've been around a long time, not because they perform at the next level. Tenure is not a promotion criterion.
|
|
88
|
+
|
|
89
|
+
#### Performance Improvement Plans (PIPs)
|
|
90
|
+
|
|
91
|
+
**When to Use PIPs:**
|
|
92
|
+
- Performance is significantly below expectations for 2+ months.
|
|
93
|
+
- Specific, documented performance issues that haven't improved despite feedback.
|
|
94
|
+
- The PIP is not a surprise. It should be the culmination of ongoing feedback, not a sudden shock.
|
|
95
|
+
|
|
96
|
+
**PIP Structure:**
|
|
97
|
+
1. **Performance Gaps** — Specific areas where performance is below expectations. Use examples.
|
|
98
|
+
2. **Success Criteria** — What does improvement look like? Measurable, time-bound goals (e.g., "Ship 2 features with <2 rounds of rework within 60 days").
|
|
99
|
+
3. **Support Provided** — What will the manager, mentor, or team do to support improvement? (e.g., weekly 1:1s, pairing sessions, dedicated mentor).
|
|
100
|
+
4. **Timeline** — Typically 30-60 days. Clear checkpoints (15 days, 30 days).
|
|
101
|
+
5. **Consequences** — If performance doesn't improve, what happens? (Usually termination or role change.)
|
|
102
|
+
|
|
103
|
+
**Facilitation:**
|
|
104
|
+
- Weekly check-ins during the PIP period. Don't wait until the end to give feedback.
|
|
105
|
+
- Document everything. Notes from 1:1s, progress on goals, feedback given.
|
|
106
|
+
- Be honest but supportive. The goal is improvement, not punishment.
|
|
107
|
+
|
|
108
|
+
#### Calibration Sessions
|
|
109
|
+
|
|
110
|
+
**Purpose:** Ensure consistency in performance ratings across managers. Prevents rating inflation or deflation.
|
|
111
|
+
|
|
112
|
+
**Process:**
|
|
113
|
+
1. **Managers submit preliminary ratings** — Each manager rates their team members on the 1-5 scale.
|
|
114
|
+
2. **Group discussion** — Managers present outlier cases (all 5s, any 1s or 2s). Justify ratings with evidence.
|
|
115
|
+
3. **Identify inconsistencies** — If Manager A rates their team higher than Manager B for similar performance, probe why. Normalize.
|
|
116
|
+
4. **Adjust ratings** — Based on discussion, managers adjust ratings to reflect consistent standards.
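A minimal sketch of step 3 (identifying inconsistencies), assuming preliminary ratings are available as plain lists per manager. The function name `flag_rating_outliers`, the `tolerance` of 0.5 points, and the sample data are illustrative assumptions, not part of this skill. A flagged manager is a prompt for discussion with evidence, not an automatic adjustment.

```python
from statistics import mean


def flag_rating_outliers(ratings_by_manager, tolerance=0.5):
    """Return managers whose mean rating differs from the overall mean
    by more than `tolerance` points on the 1-5 scale."""
    all_ratings = [r for ratings in ratings_by_manager.values() for r in ratings]
    overall = mean(all_ratings)
    flagged = {}
    for manager, ratings in ratings_by_manager.items():
        gap = mean(ratings) - overall
        if abs(gap) > tolerance:
            # Positive gap: rates higher than the group; negative: lower.
            flagged[manager] = round(gap, 2)
    return flagged


# Hypothetical preliminary ratings submitted before the session.
preliminary = {
    "manager_a": [4, 5, 4, 5],
    "manager_b": [3, 3, 4, 2],
    "manager_c": [3, 4, 3, 4],
}
print(flag_rating_outliers(preliminary))
```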
|
|
117
|
+
|
|
118
|
+
**Best Practices:**
|
|
119
|
+
- Come prepared with evidence. Don't rely on vague impressions.
|
|
120
|
+
- Challenge inflation. If a manager rates an entire team 4s and 5s, ask for the evidence; uniformly high ratings more often signal grade inflation than a uniformly high-performing team.
|
|
121
|
+
- Use the competency matrix as the source of truth. Ratings should map to observable behaviors at each level.
|
|
122
|
+
|
|
123
|
+
### After performance reviews
|
|
124
|
+
|
|
125
|
+
- **Deliver feedback in 1:1s** — Don't just send the written review. Walk through it together. Give the engineer space to ask questions or disagree.
|
|
126
|
+
- **Create a growth plan** — Based on the review, define 30-60-90 day goals tied to growth areas. Make it concrete and actionable.
|
|
127
|
+
- **Follow up regularly** — Check progress on growth goals in 1:1s. Don't wait for the next review cycle to give feedback.
|
|
128
|
+
- **Track promotion pipeline** — Identify engineers who are on a promotion track. Ensure they get the projects and visibility needed to demonstrate readiness.
|
|
129
|
+
- **Document outcomes** — If performance improves or worsens, note it. Builds a longitudinal record that's useful for calibration and promotion discussions.
|
|
130
|
+
|
|
131
|
+
## Self-check before task completion
|
|
132
|
+
|
|
133
|
+
- [ ] Evaluation criteria are explicitly defined with clear expectations at each engineering level
|
|
134
|
+
- [ ] Evidence is collected throughout the review cycle (projects, PRs, incidents, mentoring)
|
|
135
|
+
- [ ] 360-degree feedback is gathered from peers, cross-functional partners, and direct reports (if applicable)
|
|
136
|
+
- [ ] Performance review includes specific examples for each competency, not vague statements
|
|
137
|
+
- [ ] Developmental feedback uses SBI framework (Situation-Behavior-Impact) with coaching
|
|
138
|
+
- [ ] Promotion case demonstrates sustained performance at the next level for 6-12 months
|
|
139
|
+
- [ ] Promotion packet includes key accomplishments, peer feedback, and calibration context
|
|
140
|
+
- [ ] PIPs include specific performance gaps, measurable success criteria, and support plan
|
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: platform-observability
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
min_mindforge_version: 10.7.0
|
|
5
|
+
status: stable
|
|
6
|
+
triggers: platform observability design, unified observability stack, trace correlation platform, log aggregation architecture, metrics cardinality management, observability platform, telemetry pipeline, distributed tracing platform, observability cost, observability data model, observability self-service, alert routing platform
|
|
7
|
+
compose: observability-stack
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Skill — Platform Observability
|
|
11
|
+
|
|
12
|
+
## When this skill activates
|
|
13
|
+
|
|
14
|
+
This skill activates when the user is designing or implementing platform observability capabilities. This includes unified observability stacks, trace correlation, log aggregation architecture, metrics cardinality management, telemetry pipelines, distributed tracing platforms, observability cost optimization, observability data models, self-service observability, and alert routing platforms.
|
|
15
|
+
|
|
16
|
+
## Mandatory actions when this skill is active
|
|
17
|
+
|
|
18
|
+
### Before writing any code
|
|
19
|
+
|
|
20
|
+
1. Audit current observability tooling: metrics (Prometheus, Datadog), logs (Elasticsearch, Splunk), traces (Jaeger, Honeycomb). Identify gaps and redundancies.
|
|
21
|
+
2. Assess observability cost: cost per service, cost per metric, cost per log line, cost per trace. Identify high-cardinality metrics and expensive log patterns.
|
|
22
|
+
3. Define observability requirements per service tier: critical services need full traces, non-critical services need sampled traces.
|
|
23
|
+
4. Map existing alert sprawl: how many alerts fire per week, what percentage are actionable. Target: reduce noise by 70-90%.
|
|
24
|
+
5. Establish observability SLOs: query latency (p95 < 3 seconds), data freshness (under 30 seconds), retention (metrics 30 days, logs 7 days, traces 7 days).
|
|
25
|
+
|
|
26
|
+
### During implementation
|
|
27
|
+
|
|
28
|
+
- **Unified Observability Stack:** Use OpenTelemetry for vendor-neutral instrumentation. Collect metrics, logs, and traces through a single SDK and export to the backend(s) of your choice (for example Prometheus, Loki, Tempo), which avoids vendor lock-in.
|
|
29
|
+
- **Trace Correlation:** Link traces, logs, and metrics via trace ID. Every log line should include the trace ID and span ID; metrics link to traces through exemplars. This enables root-cause analysis by jumping from metric spike → trace → logs.
|
|
30
|
+
- **Log Aggregation Architecture:** Centralized log storage (Elasticsearch, Loki, CloudWatch). Use structured logging (JSON) with consistent schema. Index on: service, environment, level, trace_id. Retain logs for 7-30 days (compliance may require longer).
|
|
31
|
+
- **Metrics Cardinality Management:** High-cardinality labels (user_id, request_id) explode metric storage cost. Use exemplars (link to trace) instead. Limit labels to: service, environment, region, status_code. Target: under 10,000 time series per service.
|
|
32
|
+
- **Telemetry Pipeline:** Decouple collection from storage. Use OpenTelemetry Collector as aggregation layer. Enables: sampling, filtering, enrichment, multi-backend export. Pipeline should handle 100k+ events/second.
|
|
33
|
+
- **Distributed Tracing:** Instrument all services with OpenTelemetry. Use head-based sampling (sample 1-10% of traces) or tail-based sampling (sample slow/error traces at 100%, fast traces at 1%). Traces should include: service name, operation, duration, status, attributes (for example `http.request.method` and `db.query.text`, which older semantic conventions name `http.method` and `db.statement`).
|
|
34
|
+
- **Observability Cost Optimization:** Sample aggressively (1-10% for most services). Drop high-volume, low-value logs (health checks, debug logs in prod). Use tiered storage (hot: 7 days, warm: 30 days, cold: 90 days). Target: observability cost under 5% of infrastructure cost.
|
|
35
|
+
- **Observability Data Model:** Use semantic conventions (OpenTelemetry) for consistent attribute naming. Define standard labels: service.name, deployment.environment, service.version. Enables cross-service queries and dashboards.
|
|
36
|
+
- **Self-Service Observability:** Developers provision dashboards and alerts via IaC (Terraform, Jsonnet). Pre-built dashboard templates for common patterns (RED, USE, Golden Signals). Query language accessible to non-SREs (LogQL, PromQL with examples).
|
|
37
|
+
- **Alert Routing:** Route alerts to appropriate team via PagerDuty, Opsgenie, or Slack. Use severity levels: SEV1 (page), SEV2 (urgent Slack), SEV3 (non-urgent Slack). Alerts should include: runbook link, suggested queries, recent changes.
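The trace correlation and structured logging rules above can be sketched with the standard library alone. This is an illustration under assumptions: the class `JsonTraceFormatter`, the helper `new_trace_context`, and the service name `checkout` are hypothetical, and in a real deployment the IDs would come from the active OpenTelemetry span context rather than being generated here. The field names follow the index list given above (service, environment, level, trace_id).

```python
import json
import logging
import secrets


class JsonTraceFormatter(logging.Formatter):
    """Render each record as one JSON object with a fixed schema."""

    def format(self, record):
        payload = {
            "service": getattr(record, "service", "unknown"),
            "environment": getattr(record, "environment", "unknown"),
            "level": record.levelname,
            "message": record.getMessage(),
            # Correlation fields: absent values are logged as null.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def new_trace_context():
    """Stand-in for a real span context: 16-byte trace ID, 8-byte span ID (hex)."""
    return {"trace_id": secrets.token_hex(16), "span_id": secrets.token_hex(8)}


logger = logging.getLogger("checkout")
handler = logging.StreamHandler()
handler.setFormatter(JsonTraceFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

ctx = new_trace_context()
logger.info(
    "payment authorized",
    extra={"service": "checkout", "environment": "prod", **ctx},
)
```

Because every line carries `trace_id`, a log backend indexed on that field can return all log lines for a trace found through a metric exemplar.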
|
|
38
|
+
|
|
39
|
+
### After implementation
|
|
40
|
+
|
|
41
|
+
- Verify all services emit metrics, logs, and traces via OpenTelemetry.
|
|
42
|
+
- Confirm trace IDs are propagated and linked across logs, metrics, and traces.
|
|
43
|
+
- Validate metrics cardinality is under 10,000 time series per service.
|
|
44
|
+
- Ensure telemetry pipeline handles 100k+ events/second with sampling and filtering.
|
|
45
|
+
- Check that observability cost is under 5% of infrastructure cost.
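The cardinality check above can be approximated offline by counting unique label combinations per service. The sketch below assumes series are available as label dictionaries (for example, exported from the metrics backend); the function names and the sample data are hypothetical, and the 10,000 limit is the target stated in this skill.

```python
from collections import defaultdict

MAX_SERIES_PER_SERVICE = 10_000  # target from this skill


def series_per_service(series_labels):
    """Count distinct time series (unique label sets) per service."""
    unique = defaultdict(set)
    for labels in series_labels:
        service = labels.get("service", "unknown")
        # A time series is identified by its full set of label pairs.
        unique[service].add(frozenset(labels.items()))
    return {svc: len(combos) for svc, combos in unique.items()}


def over_budget(series_labels, limit=MAX_SERIES_PER_SERVICE):
    """Return the services whose series count exceeds `limit`."""
    counts = series_per_service(series_labels)
    return sorted(svc for svc, n in counts.items() if n > limit)


# A user_id label creates one series per user; the "web" service does not.
sample = [
    {"__name__": "http_requests_total", "service": "api",
     "status_code": "200", "user_id": str(i)}
    for i in range(50)
] + [
    {"__name__": "http_requests_total", "service": "web", "status_code": "200"}
] * 3

print(series_per_service(sample))
print(over_budget(sample, limit=10))
```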
|
|
46
|
+
|
|
47
|
+
## Self-check before task completion
|
|
48
|
+
|
|
49
|
+
- [ ] Unified observability stack uses OpenTelemetry for vendor-neutral instrumentation.
|
|
50
|
+
- [ ] Trace correlation links metrics, logs, and traces via trace ID and span ID.
|
|
51
|
+
- [ ] Log aggregation uses structured logging (JSON) with consistent schema.
|
|
52
|
+
- [ ] Metrics cardinality is managed: under 10,000 time series per service.
|
|
53
|
+
- [ ] Telemetry pipeline decouples collection from storage and handles 100k+ events/second.
|
|
54
|
+
- [ ] Distributed tracing uses sampling (head or tail-based) to control costs.
|
|
55
|
+
- [ ] Observability cost is under 5% of infrastructure cost via sampling and tiered storage.
|
|
56
|
+
- [ ] Observability data model uses OpenTelemetry semantic conventions.
|
|
57
|
+
- [ ] Self-service observability enables developers to provision dashboards and alerts via IaC.
|
|
58
|
+
- [ ] Alert routing uses severity levels and includes runbook links.
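For the tail-based option in the sampling item, the decision rule (keep slow and error traces at 100%, fast traces at about 1%) can be sketched as a pure function. This is an illustration, not the OpenTelemetry Collector's implementation: the name `keep_trace`, the 1000 ms slow threshold, and the hash-based bucketing are assumptions. Hashing the trace ID makes the decision deterministic, so every component that sees the same trace reaches the same verdict.

```python
import hashlib


def keep_trace(trace_id, duration_ms, is_error,
               slow_threshold_ms=1000, baseline_rate=0.01):
    """Decide whether to keep a completed trace.

    Error and slow traces are always kept; the rest are kept at
    `baseline_rate`, chosen deterministically from the trace ID.
    """
    if is_error or duration_ms >= slow_threshold_ms:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < baseline_rate * 2**64


print(keep_trace("4bf92f3577b34da6a3ce929d0e0e4736", 2500, False))  # slow trace
print(keep_trace("4bf92f3577b34da6a3ce929d0e0e4736", 12, True))     # error trace
```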
|
|
@@ -0,0 +1,52 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: platform-reliability
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
min_mindforge_version: 10.7.0
|
|
5
|
+
status: stable
|
|
6
|
+
triggers: platform reliability engineering, SLO management platform, error budget policy, platform availability design, capacity management platform, platform SLA, reliability target, platform health metric, platform uptime, error budget spending, toil reduction platform, platform incident prevention
|
|
7
|
+
compose: incident-management
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Skill — Platform Reliability
|
|
11
|
+
|
|
12
|
+
## When this skill activates
|
|
13
|
+
|
|
14
|
+
This skill activates when the user is designing or implementing platform reliability capabilities. This includes SLO management systems, error budget policies, platform availability design, capacity management, platform health metrics, reliability targets, uptime monitoring, error budget tracking, toil reduction initiatives, and platform incident prevention strategies.
|
|
15
|
+
|
|
16
|
+
## Mandatory actions when this skill is active
|
|
17
|
+
|
|
18
|
+
### Before writing any code
|
|
19
|
+
|
|
20
|
+
1. Inventory all platform services and their current reliability posture (uptime, error rates, latency, throughput).
|
|
21
|
+
2. Define SLOs for each platform capability (e.g., 99.9% for APIs, 99.5% for batch jobs, 99.99% for critical path services).
|
|
22
|
+
3. Establish error budget policy: what happens when budget is exhausted (freeze launches, prioritize reliability work).
|
|
23
|
+
4. Identify toil sources (manual escalations, runbook execution, repetitive debugging) and quantify hours spent per week.
|
|
24
|
+
5. Map platform dependencies and identify single points of failure that require redundancy.
|
|
25
|
+
|
|
26
|
+
### During implementation
|
|
27
|
+
|
|
28
|
+
- **SLO Management:** Define latency SLOs as percentiles over rolling windows (e.g., 99th percentile latency < 200ms over 28 days) and availability SLOs as the ratio of good to total requests over the same kind of window. Avoid averages (they hide outliers). Each SLO should have: objective, measurement window, error budget calculation, and owner.
|
|
29
|
+
- **Error Budget Policy:** If error budget is exhausted, automatically freeze non-critical deployments and redirect engineering time to reliability improvements. Budget resets monthly or quarterly. Include exemptions for security patches.
|
|
30
|
+
- **Platform Availability Design:** Use multi-region active-active for critical path services. Implement circuit breakers, rate limiting, and graceful degradation. Platform should survive single availability zone failure with zero downtime.
|
|
31
|
+
- **Capacity Management:** Track resource utilization (CPU, memory, disk, network) and predict exhaustion 30-90 days in advance. Automate horizontal scaling for stateless services. Capacity alerts should fire before user-visible impact.
|
|
32
|
+
- **Platform Health Metrics:** Track: request rate, error rate, latency (p50, p95, p99), saturation, and throughput. Use RED (Rate, Errors, Duration) for services and USE (Utilization, Saturation, Errors) for infrastructure. Dashboards should load in under 3 seconds.
|
|
33
|
+
- **Toil Reduction:** Automate repetitive tasks that consume more than 2 hours per week. Toil reduction should free up 30-50% of on-call time within 6 months. Track toil hours saved as a platform metric.
|
|
34
|
+
- **Incident Prevention:** Use chaos engineering to validate failure modes (kill instances, partition networks, inject latency). Run game days quarterly. Each incident should produce at least one actionable prevention task.
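The error budget calculation and freeze policy described above can be sketched as follows, assuming an event-based availability SLO (good requests over total requests in the window). The function names and the example figures are hypothetical; the exemption for security patches mirrors the policy stated in this skill.

```python
def error_budget(slo_target, total_events, bad_events):
    """Compute error budget consumption for an event-based SLO.

    slo_target is a fraction, e.g. 0.999 for 99.9%.
    """
    allowed_bad = (1 - slo_target) * total_events
    if allowed_bad == 0:
        # No traffic (or a 100% target): any failure exhausts the budget.
        consumed = 0.0 if bad_events == 0 else float("inf")
    else:
        consumed = bad_events / allowed_bad
    return {
        "allowed_bad": allowed_bad,
        "consumed": consumed,          # 1.0 means fully spent
        "exhausted": consumed >= 1.0,
    }


def deploy_allowed(budget, is_critical=False, is_security_patch=False):
    """Freeze non-critical deployments once the budget is exhausted."""
    if not budget["exhausted"]:
        return True
    return is_critical or is_security_patch


# 99.9% over 1,000,000 requests allows about 1,000 failures.
budget = error_budget(0.999, 1_000_000, 1_500)
print(budget["exhausted"], deploy_allowed(budget),
      deploy_allowed(budget, is_security_patch=True))
```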
|
|
35
|
+
|
|
36
|
+
### After implementation
|
|
37
|
+
|
|
38
|
+
- Verify each platform service has defined SLOs, error budgets, and dashboards tracking compliance.
|
|
39
|
+
- Confirm error budget policy is enforced automatically (deployment freezes when budget exhausted).
|
|
40
|
+
- Validate multi-region failover works via chaos engineering tests (kill a region, verify zero downtime).
|
|
41
|
+
- Ensure capacity management predicts exhaustion 30-90 days in advance with alerts.
|
|
42
|
+
- Check that toil reduction initiatives have freed up measurable on-call time (tracked weekly).
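For the capacity check above, one simple way to predict exhaustion is a least-squares line through recent daily utilization. This is a sketch under a strong assumption (roughly linear growth); the function name `days_until_exhaustion` and the sample numbers are hypothetical, and seasonal or bursty workloads need a better model.

```python
def days_until_exhaustion(daily_usage, capacity):
    """Estimate days until `capacity` is reached by fitting a straight
    line to `daily_usage` (one sample per day, oldest first).

    Returns None when usage is flat or shrinking.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage) / n
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in enumerate(daily_usage))
    slope = cov_xy / var_x  # growth per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    fitted_today = intercept + slope * (n - 1)
    return max(0.0, (capacity - fitted_today) / slope)


# Disk usage in percent over five days, growing one point per day.
remaining = days_until_exhaustion([50, 51, 52, 53, 54], capacity=100)
if remaining is not None and remaining <= 90:
    print(f"capacity alert: about {remaining:.0f} days until exhaustion")
```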
|
|
43
|
+
|
|
44
|
+
## Self-check before task completion
|
|
45
|
+
|
|
46
|
+
- [ ] Each platform service has SLOs defined as percentiles over rolling windows.
|
|
47
|
+
- [ ] Error budget policy automatically freezes non-critical deployments when budget exhausted.
|
|
48
|
+
- [ ] Platform survives single availability zone failure with zero user-visible downtime.
|
|
49
|
+
- [ ] Capacity management predicts resource exhaustion 30-90 days in advance.
|
|
50
|
+
- [ ] Platform health metrics use RED for services and USE for infrastructure.
|
|
51
|
+
- [ ] Toil reduction initiatives free up 30-50% of on-call time within 6 months.
|
|
52
|
+
- [ ] Chaos engineering validates failure modes quarterly via game days.
|
|
@@ -0,0 +1,96 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: post-incident-learning
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
min_mindforge_version: 10.1.0
|
|
5
|
+
status: stable
|
|
6
|
+
triggers: post-incident learning, systemic pattern, defense mechanism, recurrence prevention, incident class, contributing factor analysis, improvement measurement, learning review, incident pattern, failure class prevention, defense layer, systemic fix
|
|
7
|
+
compose: incident-management
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Post-Incident Learning
|
|
11
|
+
|
|
12
|
+
## When this skill activates
|
|
13
|
+
|
|
14
|
+
This skill activates after an incident has been resolved and the team needs to extract
|
|
15
|
+
lasting organizational learning. It goes beyond traditional postmortems (which document
|
|
16
|
+
what happened) to identify failure classes, build defense mechanisms, and measure
|
|
17
|
+
whether the organization is actually getting better at preventing recurrence.
|
|
18
|
+
|
|
19
|
+
## Mandatory actions when this skill is active
|
|
20
|
+
|
|
21
|
+
### Before
|
|
22
|
+
|
|
23
|
+
1. **Gather incident data** — Timeline, logs, communications, actions taken, resolution
|
|
24
|
+
steps. Ensure the raw facts are documented before memory fades.
|
|
25
|
+
2. **Identify participants** — Who was involved in detection, response, and resolution?
|
|
26
|
+
Schedule the learning review within 72 hours of resolution.
|
|
27
|
+
3. **Set the frame** — This is a learning exercise, not a blame exercise. Establish
|
|
28
|
+
psychological safety explicitly. No individual fault-finding.
|
|
29
|
+
|
|
30
|
+
### During
|
|
31
|
+
|
|
32
|
+
4. **Distinguish postmortem from learning:**
|
|
33
|
+
- Postmortem = What happened? (timeline, root cause, impact)
|
|
34
|
+
- Learning = What patterns do we now defend against? (systemic improvement)
|
|
35
|
+
- This skill focuses on the LEARNING phase that follows the postmortem.
|
|
36
|
+
|
|
37
|
+
5. **Contributing factor analysis (go beyond root cause):**
|
|
38
|
+
- **Proximate cause** — The immediate trigger (e.g., bad deploy, config change).
|
|
39
|
+
- **Contributing factors** — Conditions that allowed the trigger to cause harm
|
|
40
|
+
(e.g., missing validation, no canary, insufficient monitoring).
|
|
41
|
+
- **Systemic conditions** — Organizational patterns that created the contributing
|
|
42
|
+
factors (e.g., time pressure, unclear ownership, technical debt tolerance).
|
|
43
|
+
- Map all three levels. Fixes applied only at the proximate level leave the contributing and systemic conditions in place, so recurrence is likely.
|
|
44
|
+
|
|
45
|
+
6. **Identify the incident CLASS:**
|
|
46
|
+
- This is not just one incident — what CLASS of failure does it represent?
|
|
47
|
+
- Examples: "deploy without validation," "silent dependency failure," "config
|
|
48
|
+
drift between environments," "cascading timeout."
|
|
49
|
+
- Name the class explicitly. Search history for past incidents of the same class.
|
|
50
|
+
- If this class has occurred before, the previous fixes were insufficient.
|
|
51
|
+
|
|
52
|
+
7. **Design defense mechanisms (layered):**
|
|
53
|
+
- **Layer 1: Automated prevention** — Make the failure impossible through code,
|
|
54
|
+
infrastructure, or tooling changes (strongest defense).
|
|
55
|
+
- **Layer 2: Automated detection** — If prevention is impossible, detect instantly
|
|
56
|
+
and auto-remediate or alert within seconds.
|
|
57
|
+
- **Layer 3: Process improvement** — Checklists, review steps, approval gates
|
|
58
|
+
(weakest defense — humans forget).
|
|
59
|
+
- Never accept "be more careful" as a defense mechanism.
|
|
60
|
+
- Always prefer Layer 1 over Layer 2, and Layer 2 over Layer 3.
|
|
61
|
+
|
|
62
|
+
8. **Define improvement measurements:**
|
|
63
|
+
- **Mean time between incidents of same class** — Trending up means defenses work.
|
|
64
|
+
- **Detection time** — Time from failure occurrence to human awareness.
|
|
65
|
+
- **Recovery time** — Time from awareness to full resolution.
|
|
66
|
+
- **Blast radius** — Users/systems affected (should shrink over time).
|
|
67
|
+
- Set specific targets for each metric.
|
|
68
|
+
|
|
69
|
+
9. **Create action items with teeth:**
|
|
70
|
+
- Each action item must have: owner, deadline, definition of done, and verification
|
|
71
|
+
method.
|
|
72
|
+
- Classify priority: P0 (fix this week), P1 (fix this sprint), P2 (fix this quarter).
|
|
73
|
+
- Track completion publicly. Incomplete incident actions are organizational debt.
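The first improvement measurement in step 8, mean time between incidents of the same class, can be computed from incident dates alone. A minimal sketch, assuming each incident has already been tagged with its class; the function name and the sample dates are hypothetical.

```python
from datetime import date


def mean_days_between(incident_dates):
    """Mean gap in days between consecutive incidents of one class.

    Returns None when fewer than two incidents exist.
    """
    ordered = sorted(incident_dates)
    if len(ordered) < 2:
        return None
    gaps = [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical incidents of the class "deploy without validation".
deploy_without_validation = [
    date(2024, 1, 1),
    date(2024, 1, 31),
    date(2024, 4, 30),
]
print(mean_days_between(deploy_without_validation))
```

Comparing the gaps before and after a defense shipped (here 30 days, then 90) is what shows whether the trend is moving in the right direction.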
|
|
74
|
+
|
|
75
|
+
### After
|
|
76
|
+
|
|
77
|
+
10. **Share the learning broadly** — Publish findings to the engineering organization.
|
|
78
|
+
Other teams may have the same class of vulnerability.
|
|
79
|
+
11. **Update runbooks and alerts** — Ensure the detection and response improvements are
|
|
80
|
+
codified in operational documentation.
|
|
81
|
+
12. **Schedule verification** — In 30 days, verify that defense mechanisms are in place
|
|
82
|
+
and metrics show improvement. If not, escalate.
|
|
83
|
+
13. **Feed into incident class tracker** — Maintain an organizational record of incident
|
|
84
|
+
classes, their defenses, and recurrence rates.
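The incident class tracker in step 13 can start as a very small data structure. The sketch below is one possible shape, not a prescribed schema: the class names, fields, and incident IDs are hypothetical. Layers use the numbering from step 7 (1 = automated prevention, 2 = automated detection, 3 = process improvement).

```python
from dataclasses import dataclass, field


@dataclass
class IncidentClass:
    name: str
    incident_ids: list = field(default_factory=list)
    defenses: list = field(default_factory=list)  # (layer, description) pairs

    @property
    def recurrences(self):
        """Incidents after the first one count as recurrences."""
        return max(0, len(self.incident_ids) - 1)

    def strongest_layer(self):
        """Lowest layer number in place, or None if there is no defense."""
        return min((layer for layer, _ in self.defenses), default=None)


class IncidentClassTracker:
    def __init__(self):
        self.classes = {}

    def _get(self, class_name):
        return self.classes.setdefault(class_name, IncidentClass(class_name))

    def record_incident(self, class_name, incident_id):
        entry = self._get(class_name)
        entry.incident_ids.append(incident_id)
        return entry

    def add_defense(self, class_name, layer, description):
        if layer not in (1, 2, 3):
            raise ValueError("layer must be 1, 2, or 3")
        self._get(class_name).defenses.append((layer, description))


tracker = IncidentClassTracker()
tracker.record_incident("config drift between environments", "INC-101")
tracker.add_defense("config drift between environments", 3, "review checklist")
entry = tracker.record_incident("config drift between environments", "INC-187")
print(entry.recurrences, entry.strongest_layer())
```

A class that recurs while its strongest defense is still Layer 3 is a direct signal that the earlier fix was insufficient.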
|
|
85
|
+
|
|
86
|
+
## Self-check before task completion
|
|
87
|
+
|
|
88
|
+
- [ ] Contributing factors analyzed at all three levels (proximate, contributing, systemic)
|
|
89
|
+
- [ ] Incident class identified and named (not just this one incident)
|
|
90
|
+
- [ ] Historical incidents of same class searched and referenced
|
|
91
|
+
- [ ] Defense mechanisms designed with preference for automation over process
|
|
92
|
+
- [ ] No action item says "be more careful" or equivalent
|
|
93
|
+
- [ ] Metrics defined with specific improvement targets
|
|
94
|
+
- [ ] All action items have owner, deadline, and verification method
|
|
95
|
+
- [ ] Learning shared beyond the immediate team
|
|
96
|
+
- [ ] 30-day verification scheduled
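Two of the checks above (no "be more careful" items, and every item has owner, deadline, and verification method) are mechanical enough to automate. A minimal sketch, assuming action items are kept as structured records; the `ActionItem` fields follow step 9, while the phrase list and the function name `problems` are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative phrases that signal reliance on human vigilance.
VAGUE_PHRASES = ("be more careful", "pay more attention", "be more vigilant")


@dataclass
class ActionItem:
    description: str
    owner: Optional[str] = None
    deadline: Optional[date] = None
    definition_of_done: Optional[str] = None
    verification: Optional[str] = None


def problems(item):
    """Return a list of reasons this action item fails the self-check."""
    found = []
    for name in ("owner", "deadline", "definition_of_done", "verification"):
        if not getattr(item, name):
            found.append(f"missing {name}")
    if any(phrase in item.description.lower() for phrase in VAGUE_PHRASES):
        found.append("vague: relies on human vigilance")
    return found


weak = ActionItem("Be more careful when editing production config")
strong = ActionItem(
    "Reject config changes that fail schema validation in CI",
    owner="platform-team",
    deadline=date(2025, 1, 31),
    definition_of_done="CI blocks invalid config on every repo",
    verification="Submit a known-bad config and confirm the build fails",
)
print(problems(weak))
print(problems(strong))
```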
|
|
@@ -0,0 +1,104 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: product-manager
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
min_mindforge_version: 10.0.6
|
|
5
|
+
status: stable
|
|
6
|
+
triggers: user story, PRD, product requirements, backlog prioritization, RICE score, MoSCoW, jobs to be done, feature scoring, sprint planning, product backlog, acceptance criteria, user journey
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
# Skill — Product Manager
|
|
10
|
+
|
|
11
|
+
## When this skill activates
|
|
12
|
+
Any task involving product requirements, user story writing, backlog prioritization,
|
|
13
|
+
feature scoring, sprint planning, PRD creation, or user journey mapping.
|
|
14
|
+
|
|
15
|
+
## Mandatory actions when this skill is active
|
|
16
|
+
|
|
17
|
+
### Before
|
|
18
|
+
|
|
19
|
+
1. **Define the problem** — Articulate the user problem with evidence (tickets, data, interviews). No solutions yet.
|
|
20
|
+
2. **Identify personas** — 1-3 specific personas with context, goals, and frustrations.
|
|
21
|
+
3. **State success metrics** — Define how to measure success BEFORE designing the solution.
|
|
22
|
+
|
|
23
|
+
### During
|
|
24
|
+
|
|
25
|
+
#### PRD structure (6 mandatory sections)
|
|
26
|
+
1. **Problem Statement** — data-backed, who has it, how we know
|
|
27
|
+
2. **User Personas** — context, goal, frustration per persona
|
|
28
|
+
3. **Requirements** — functional (numbered, prioritized) + non-functional (perf, a11y)
|
|
29
|
+
4. **Success Metrics** — current baseline, target, measurement method per metric
|
|
30
|
+
5. **Scope + Timeline** — phases with explicit out-of-scope items
|
|
31
|
+
6. **Risks + Mitigations** — risk, probability, impact, mitigation plan
|
|
32
|
+
|
|
33
|
+
#### User story format
|
|
34
|
+
```
|
|
35
|
+
As a [persona], I want [action] so that [outcome/value].
|
|
36
|
+
|
|
37
|
+
Rules:
|
|
38
|
+
- One story = one testable behavior
|
|
39
|
+
- Always include "so that" (forces value articulation)
|
|
40
|
+
- Completable in one sprint (split if larger)
|
|
41
|
+
- Every story has acceptance criteria attached
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
#### Acceptance criteria (Given/When/Then)
|
|
45
|
+
```gherkin
|
|
46
|
+
Scenario: [descriptive name]
|
|
47
|
+
Given [precondition/context]
|
|
48
|
+
When [action taken]
|
|
49
|
+
Then [observable outcome]
|
|
50
|
+
And [additional assertions]
|
|
51
|
+
```
|
|
52
|
+
Cover: happy path, edge cases, error states, boundary conditions.
|
|
53
|
+
|
|
54
|
+
#### RICE scoring
|
|
55
|
+
```
|
|
56
|
+
Score = (Reach * Impact * Confidence) / Effort
|
|
57
|
+
Reach: users affected per quarter
|
|
58
|
+
Impact: 3=massive, 2=high, 1=medium, 0.5=low, 0.25=minimal
|
|
59
|
+
Confidence: 100%=high, 80%=medium, 50%=low
|
|
60
|
+
Effort: person-weeks
|
|
61
|
+
```
|
|
62
|
+
Show the math. Rank by score. Communicate rationale for top picks.
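A worked example of the formula above, with hypothetical backlog items and numbers chosen only to illustrate the arithmetic. The function name `rice_score` is an assumption; the scales for impact and confidence are the ones listed in this skill.

```python
def rice_score(reach, impact, confidence, effort):
    """RICE = (Reach * Impact * Confidence) / Effort.

    reach: users per quarter; impact: 0.25-3; confidence: 0.5-1.0;
    effort: person-weeks (must be positive).
    """
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * impact * confidence / effort


# Hypothetical backlog items.
backlog = [
    {"name": "bulk export", "reach": 2000, "impact": 1, "confidence": 0.8, "effort": 4},
    {"name": "sso login", "reach": 500, "impact": 3, "confidence": 1.0, "effort": 6},
    {"name": "dark mode", "reach": 4000, "impact": 0.5, "confidence": 0.5, "effort": 2},
]

ranked = sorted(
    backlog,
    key=lambda item: rice_score(
        item["reach"], item["impact"], item["confidence"], item["effort"]
    ),
    reverse=True,
)
for item in ranked:
    score = rice_score(item["reach"], item["impact"], item["confidence"], item["effort"])
    # Show the math next to the score so the ranking can be challenged.
    print(f'{item["name"]}: ({item["reach"]} * {item["impact"]} * '
          f'{item["confidence"]}) / {item["effort"]} = {score:.0f}')
```

In this example the low-effort item ranks first even at low confidence, which is the kind of result worth explaining when communicating rationale for top picks.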
|
|
63
|
+
|
|
64
|
+
#### MoSCoW prioritization
|
|
65
|
+
- Must Have: non-negotiable for launch (failure without these)
|
|
66
|
+
- Should Have: expected, but launch survives without them
|
|
67
|
+
- Could Have: nice-to-have if time permits
|
|
68
|
+
- Won't Have: explicitly deferred (prevents scope creep)
|
|
69
|
+
|
|
70
|
+
#### Jobs-to-be-Done framework
|
|
71
|
+
```
|
|
72
|
+
Interview structure (45-60 min):
|
|
73
|
+
1. First Thought — trigger that started the search
|
|
74
|
+
2. Passive Looking — alternatives considered
|
|
75
|
+
3. Active Looking — event that forced action NOW
|
|
76
|
+
4. Decision — why this solution, what almost stopped them
|
|
77
|
+
5. Satisfaction — does it deliver, what would cause switching
|
|
78
|
+
|
|
79
|
+
Output: "When [situation], I want to [motivation], so I can [outcome]."
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
#### User journey mapping
|
|
83
|
+
```
|
|
84
|
+
Stages: Awareness → Consideration → Setup → First Value → Expansion
|
|
85
|
+
Per stage: Actions, Touchpoints, Emotions, Pain Points, Opportunities, Metrics
|
|
86
|
+
```
|
|
87
|
+
Identify the critical drop-off points and design interventions for each.
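Drop-off points can be located by computing the conversion rate between consecutive stages. A minimal sketch, assuming a user count is available per stage; the stage names are the ones listed above, while the function names and the counts are hypothetical.

```python
STAGES = ["Awareness", "Consideration", "Setup", "First Value", "Expansion"]


def stage_conversion(counts):
    """Return (transition, rate) for each pair of consecutive stages."""
    rates = []
    for prev, cur in zip(STAGES, STAGES[1:]):
        rate = counts[cur] / counts[prev] if counts[prev] else 0.0
        rates.append((f"{prev} -> {cur}", rate))
    return rates


def worst_drop_off(counts):
    """Transition with the lowest conversion rate."""
    return min(stage_conversion(counts), key=lambda pair: pair[1])


# Hypothetical users reaching each stage in one quarter.
counts = {
    "Awareness": 10_000,
    "Consideration": 4_000,
    "Setup": 1_000,
    "First Value": 800,
    "Expansion": 400,
}
for transition, rate in stage_conversion(counts):
    print(f"{transition}: {rate:.0%}")
print("largest drop-off:", worst_drop_off(counts)[0])
```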
|
|
88
|
+
|
|
89
|
+
### After
|
|
90
|
+
|
|
91
|
+
1. **Validate with users** — Show PRD to 2-3 target users. Confirm problem resonates.
|
|
92
|
+
2. **Engineering feasibility** — Tech lead confirms effort estimates and constraints.
|
|
93
|
+
3. **Stakeholder sign-off** — Explicit agreement on v1 scope vs deferred.
|
|
94
|
+
4. **Define done** — What must be true (metrics hit, not just code deployed).
|
|
95
|
+
|
|
96
|
+
## Self-check before task completion
|
|
97
|
+
- [ ] Problem statement evidence-backed (data, quotes, ticket volume)
|
|
98
|
+
- [ ] Personas specific with context, goals, and frustrations
|
|
99
|
+
- [ ] Success metrics have baseline, target, and measurement method
|
|
100
|
+
- [ ] Stories follow "As a... I want... So that..." with acceptance criteria
|
|
101
|
+
- [ ] Backlog prioritized with visible rationale (RICE or WSJF math shown; MoSCoW categories justified)
|
|
102
|
+
- [ ] Scope states what is OUT as well as IN
|
|
103
|
+
- [ ] User journey maps full experience from awareness to expansion
|
|
104
|
+
- [ ] Engineering validated feasibility and effort
|