mindforge-cc 10.0.3 → 11.0.0
- package/.mindforge/MINDFORGE-V2-SCHEMA.json +43 -10
- package/.mindforge/config.json +30 -2
- package/.mindforge/engine/cross-model-eval.md +74 -0
- package/.mindforge/engine/proactive/signal-detector.md +60 -0
- package/.mindforge/engine/proactive/suggestion-engine.md +100 -0
- package/.mindforge/personas/agent-architect.md +57 -0
- package/.mindforge/personas/agent-evaluator.md +162 -0
- package/.mindforge/personas/agent-memory-designer.md +157 -0
- package/.mindforge/personas/agent-ops-engineer.md +120 -0
- package/.mindforge/personas/agent-orchestrator.md +112 -0
- package/.mindforge/personas/ai-economist.md +57 -0
- package/.mindforge/personas/ai-safety-engineer.md +57 -0
- package/.mindforge/personas/analytics-engineer.md +57 -0
- package/.mindforge/personas/anti-pattern-hunter.md +61 -0
- package/.mindforge/personas/api-gateway-designer.md +132 -0
- package/.mindforge/personas/auth-engineer.md +112 -0
- package/.mindforge/personas/build-engineer.md +57 -0
- package/.mindforge/personas/business-analyst.md +56 -0
- package/.mindforge/personas/cache-architect.md +100 -0
- package/.mindforge/personas/causal-scientist.md +57 -0
- package/.mindforge/personas/cdn-architect.md +118 -0
- package/.mindforge/personas/change-agent.md +104 -0
- package/.mindforge/personas/code-narrator.md +52 -0
- package/.mindforge/personas/codegen-specialist.md +68 -0
- package/.mindforge/personas/communication-architect.md +102 -0
- package/.mindforge/personas/compliance-engineer.md +96 -0
- package/.mindforge/personas/consensus-engineer.md +116 -0
- package/.mindforge/personas/contract-tester.md +60 -192
- package/.mindforge/personas/data-architect.md +108 -0
- package/.mindforge/personas/data-mesh-architect.md +57 -0
- package/.mindforge/personas/data-pipeline-architect.md +120 -0
- package/.mindforge/personas/de-sloppifier.md +60 -0
- package/.mindforge/personas/debt-manager.md +66 -0
- package/.mindforge/personas/decision-architect.md +82 -51
- package/.mindforge/personas/deployment-captain.md +74 -0
- package/.mindforge/personas/design-system-lead.md +112 -0
- package/.mindforge/personas/dmux-orchestrator.md +75 -0
- package/.mindforge/personas/dx-engineer.md +96 -0
- package/.mindforge/personas/ecommerce-engineer.md +57 -0
- package/.mindforge/personas/edge-engineer.md +94 -0
- package/.mindforge/personas/edtech-architect.md +106 -0
- package/.mindforge/personas/embedding-architect.md +57 -0
- package/.mindforge/personas/environment-engineer.md +57 -0
- package/.mindforge/personas/eval-judge.md +55 -0
- package/.mindforge/personas/event-architect.md +102 -0
- package/.mindforge/personas/experiment-designer.md +138 -0
- package/.mindforge/personas/feature-store-engineer.md +57 -0
- package/.mindforge/personas/finops-analyst.md +66 -0
- package/.mindforge/personas/fintech-architect.md +57 -0
- package/.mindforge/personas/flutter-engineer.md +104 -0
- package/.mindforge/personas/gaming-engineer.md +57 -0
- package/.mindforge/personas/graphql-designer.md +73 -0
- package/.mindforge/personas/healthcare-engineer.md +57 -0
- package/.mindforge/personas/hiring-strategist.md +105 -0
- package/.mindforge/personas/hitl-architect.md +165 -0
- package/.mindforge/personas/i18n-architect.md +69 -0
- package/.mindforge/personas/iot-architect.md +105 -0
- package/.mindforge/personas/knowledge-curator.md +139 -0
- package/.mindforge/personas/knowledge-engineer.md +57 -0
- package/.mindforge/personas/lakehouse-architect.md +57 -0
- package/.mindforge/personas/llm-orchestrator.md +57 -0
- package/.mindforge/personas/logistics-architect.md +106 -0
- package/.mindforge/personas/market-analyst.md +53 -0
- package/.mindforge/personas/marketplace-engineer.md +105 -0
- package/.mindforge/personas/mcp-designer.md +54 -0
- package/.mindforge/personas/meeting-designer.md +104 -0
- package/.mindforge/personas/mentorship-lead.md +106 -0
- package/.mindforge/personas/migration-architect.md +57 -0
- package/.mindforge/personas/ml-ops-engineer.md +101 -0
- package/.mindforge/personas/mobile-architect.md +105 -0
- package/.mindforge/personas/mobile-security-engineer.md +106 -0
- package/.mindforge/personas/multi-tenancy-architect.md +71 -0
- package/.mindforge/personas/multimodal-engineer.md +57 -0
- package/.mindforge/personas/offline-specialist.md +105 -0
- package/.mindforge/personas/onboarding-navigator.md +63 -0
- package/.mindforge/personas/payments-engineer.md +135 -0
- package/.mindforge/personas/pipeline-engineer.md +115 -0
- package/.mindforge/personas/platform-engineer.md +97 -0
- package/.mindforge/personas/platform-lead.md +57 -0
- package/.mindforge/personas/privacy-engineer.md +57 -0
- package/.mindforge/personas/product-owner.md +56 -0
- package/.mindforge/personas/productivity-analyst.md +57 -0
- package/.mindforge/personas/prompt-architect.md +101 -0
- package/.mindforge/personas/proofreader.md +53 -0
- package/.mindforge/personas/pwa-architect.md +105 -0
- package/.mindforge/personas/quality-scorer.md +63 -0
- package/.mindforge/personas/react-native-engineer.md +106 -0
- package/.mindforge/personas/resilience-engineer.md +69 -0
- package/.mindforge/personas/rfc-architect.md +64 -0
- package/.mindforge/personas/saga-orchestrator.md +80 -0
- package/.mindforge/personas/secrets-engineer.md +57 -0
- package/.mindforge/personas/skill-smith.md +79 -0
- package/.mindforge/personas/sre-lead.md +107 -0
- package/.mindforge/personas/stream-engineer.md +57 -0
- package/.mindforge/personas/streaming-engineer.md +64 -0
- package/.mindforge/personas/swarm-templates.json +674 -44
- package/.mindforge/personas/system-designer.md +57 -0
- package/.mindforge/personas/team-coach.md +120 -0
- package/.mindforge/personas/tech-lead-coach.md +103 -0
- package/.mindforge/personas/technical-writer-lead.md +111 -0
- package/.mindforge/personas/vibe-checker.md +75 -0
- package/.mindforge/personas/worktree-manager.md +56 -0
- package/.mindforge/personas/zero-trust-engineer.md +113 -0
- package/.mindforge/skills/a11y-testing/SKILL.md +143 -0
- package/.mindforge/skills/agent-evaluation-framework/SKILL.md +227 -0
- package/.mindforge/skills/agent-memory-design/SKILL.md +199 -0
- package/.mindforge/skills/agent-orchestration-patterns/SKILL.md +129 -0
- package/.mindforge/skills/agent-tool-selection/SKILL.md +204 -0
- package/.mindforge/skills/ai-agent-deployment/SKILL.md +176 -0
- package/.mindforge/skills/ai-cost-management/SKILL.md +57 -0
- package/.mindforge/skills/ai-safety-alignment/SKILL.md +53 -0
- package/.mindforge/skills/analytics-instrumentation/SKILL.md +172 -0
- package/.mindforge/skills/api-gateway-patterns/SKILL.md +177 -0
- package/.mindforge/skills/api-marketplace/SKILL.md +56 -0
- package/.mindforge/skills/api-versioning/SKILL.md +100 -0
- package/.mindforge/skills/app-store-deployment/SKILL.md +44 -0
- package/.mindforge/skills/architecture-tradeoff-analysis/SKILL.md +97 -0
- package/.mindforge/skills/audit-logging/SKILL.md +140 -0
- package/.mindforge/skills/auth-patterns/SKILL.md +148 -0
- package/.mindforge/skills/autonomous-agent-harness/SKILL.md +218 -0
- package/.mindforge/skills/autonomous-agents/SKILL.md +59 -0
- package/.mindforge/skills/build-system-optimization/SKILL.md +54 -0
- package/.mindforge/skills/build-vs-buy/SKILL.md +80 -0
- package/.mindforge/skills/bundle-optimization/SKILL.md +174 -0
- package/.mindforge/skills/business-analyst/SKILL.md +82 -0
- package/.mindforge/skills/caching-strategies/SKILL.md +132 -0
- package/.mindforge/skills/capacity-planning/SKILL.md +96 -0
- package/.mindforge/skills/causal-inference/SKILL.md +42 -0
- package/.mindforge/skills/cdn-optimization/SKILL.md +212 -0
- package/.mindforge/skills/change-management/SKILL.md +106 -0
- package/.mindforge/skills/chaos-engineering/SKILL.md +99 -0
- package/.mindforge/skills/ci-cd-pipeline/SKILL.md +118 -0
- package/.mindforge/skills/cli-design/SKILL.md +118 -0
- package/.mindforge/skills/code-generation-patterns/SKILL.md +92 -0
- package/.mindforge/skills/code-review-methodology/SKILL.md +180 -0
- package/.mindforge/skills/code-tour/SKILL.md +145 -0
- package/.mindforge/skills/codebase-onboarding/SKILL.md +95 -0
- package/.mindforge/skills/compliance-as-code/SKILL.md +195 -0
- package/.mindforge/skills/conflict-resolution/SKILL.md +87 -0
- package/.mindforge/skills/connection-pooling/SKILL.md +151 -0
- package/.mindforge/skills/container-security/SKILL.md +151 -0
- package/.mindforge/skills/context-engineering/SKILL.md +114 -0
- package/.mindforge/skills/contract-testing/SKILL.md +85 -0
- package/.mindforge/skills/cost-estimation/SKILL.md +82 -0
- package/.mindforge/skills/cqrs-event-sourcing/SKILL.md +95 -0
- package/.mindforge/skills/cross-platform-testing/SKILL.md +43 -0
- package/.mindforge/skills/data-governance/SKILL.md +42 -0
- package/.mindforge/skills/data-lakehouse/SKILL.md +42 -0
- package/.mindforge/skills/data-mesh/SKILL.md +42 -0
- package/.mindforge/skills/data-modeling/SKILL.md +107 -0
- package/.mindforge/skills/data-pipeline-design/SKILL.md +171 -0
- package/.mindforge/skills/data-privacy-engineering/SKILL.md +42 -0
- package/.mindforge/skills/database-performance/SKILL.md +174 -0
- package/.mindforge/skills/database-sharding-advanced/SKILL.md +206 -0
- package/.mindforge/skills/de-sloppify/SKILL.md +120 -0
- package/.mindforge/skills/defense-in-depth/SKILL.md +84 -0
- package/.mindforge/skills/delegation-patterns/SKILL.md +123 -0
- package/.mindforge/skills/dependency-management/SKILL.md +94 -0
- package/.mindforge/skills/deployment-workflow/SKILL.md +135 -0
- package/.mindforge/skills/design-system/SKILL.md +113 -0
- package/.mindforge/skills/developer-onboarding/SKILL.md +99 -0
- package/.mindforge/skills/developer-productivity-metrics/SKILL.md +59 -0
- package/.mindforge/skills/distributed-consensus/SKILL.md +141 -0
- package/.mindforge/skills/dmux-workflows/SKILL.md +141 -0
- package/.mindforge/skills/dns-architecture/SKILL.md +167 -0
- package/.mindforge/skills/ecommerce-architecture/SKILL.md +41 -0
- package/.mindforge/skills/edge-computing/SKILL.md +91 -0
- package/.mindforge/skills/edtech-platform/SKILL.md +41 -0
- package/.mindforge/skills/email-deliverability/SKILL.md +177 -0
- package/.mindforge/skills/embedding-systems/SKILL.md +55 -0
- package/.mindforge/skills/environment-management/SKILL.md +54 -0
- package/.mindforge/skills/error-handling-architecture/SKILL.md +118 -0
- package/.mindforge/skills/estimation-techniques/SKILL.md +113 -0
- package/.mindforge/skills/eval-harness/SKILL.md +180 -0
- package/.mindforge/skills/event-driven-architecture/SKILL.md +162 -0
- package/.mindforge/skills/experiment-design/SKILL.md +139 -0
- package/.mindforge/skills/experiment-platform/SKILL.md +43 -0
- package/.mindforge/skills/feature-engineering/SKILL.md +42 -0
- package/.mindforge/skills/feature-flag-management/SKILL.md +183 -0
- package/.mindforge/skills/fine-tuning-workflow/SKILL.md +189 -0
- package/.mindforge/skills/fintech-patterns/SKILL.md +41 -0
- package/.mindforge/skills/flutter-architecture/SKILL.md +42 -0
- package/.mindforge/skills/gaming-backend/SKILL.md +41 -0
- package/.mindforge/skills/git-workflow-design/SKILL.md +129 -0
- package/.mindforge/skills/graceful-degradation/SKILL.md +95 -0
- package/.mindforge/skills/graphql-patterns/SKILL.md +243 -0
- package/.mindforge/skills/guardrails-and-safety/SKILL.md +137 -0
- package/.mindforge/skills/healthcare-systems/SKILL.md +40 -0
- package/.mindforge/skills/hiring-engineering/SKILL.md +119 -0
- package/.mindforge/skills/human-in-the-loop-design/SKILL.md +234 -0
- package/.mindforge/skills/i18n-architecture/SKILL.md +147 -0
- package/.mindforge/skills/idempotency-patterns/SKILL.md +84 -0
- package/.mindforge/skills/incident-communication/SKILL.md +96 -0
- package/.mindforge/skills/incident-management/SKILL.md +97 -0
- package/.mindforge/skills/infrastructure-as-code/SKILL.md +98 -0
- package/.mindforge/skills/instinct-clustering/SKILL.md +190 -0
- package/.mindforge/skills/internal-developer-platform/SKILL.md +51 -0
- package/.mindforge/skills/iot-platform/SKILL.md +41 -0
- package/.mindforge/skills/k8s-deployment/SKILL.md +358 -0
- package/.mindforge/skills/knowledge-graphs/SKILL.md +56 -0
- package/.mindforge/skills/knowledge-sharing-systems/SKILL.md +112 -0
- package/.mindforge/skills/llm-cost-optimization/SKILL.md +198 -0
- package/.mindforge/skills/llm-orchestration/SKILL.md +56 -0
- package/.mindforge/skills/load-testing/SKILL.md +84 -0
- package/.mindforge/skills/logistics-optimization/SKILL.md +40 -0
- package/.mindforge/skills/market-researcher/SKILL.md +99 -0
- package/.mindforge/skills/marketplace-trust/SKILL.md +40 -0
- package/.mindforge/skills/mcp-server-patterns/SKILL.md +264 -0
- package/.mindforge/skills/media-streaming/SKILL.md +41 -0
- package/.mindforge/skills/meeting-architecture/SKILL.md +146 -0
- package/.mindforge/skills/mentoring-patterns/SKILL.md +77 -0
- package/.mindforge/skills/microservices-patterns/SKILL.md +83 -0
- package/.mindforge/skills/migration-platform/SKILL.md +61 -0
- package/.mindforge/skills/migration-strategies/SKILL.md +129 -0
- package/.mindforge/skills/ml-feature-store/SKILL.md +56 -0
- package/.mindforge/skills/ml-monitoring/SKILL.md +42 -0
- package/.mindforge/skills/mobile-performance/SKILL.md +44 -0
- package/.mindforge/skills/mobile-security/SKILL.md +45 -0
- package/.mindforge/skills/model-evaluation/SKILL.md +53 -0
- package/.mindforge/skills/monorepo-management/SKILL.md +100 -0
- package/.mindforge/skills/multi-tenancy-patterns/SKILL.md +145 -0
- package/.mindforge/skills/multi-turn-conversation-design/SKILL.md +206 -0
- package/.mindforge/skills/multimodal-ai/SKILL.md +51 -0
- package/.mindforge/skills/mutation-testing/SKILL.md +97 -0
- package/.mindforge/skills/notification-system-design/SKILL.md +168 -0
- package/.mindforge/skills/observability-stack/SKILL.md +136 -0
- package/.mindforge/skills/offline-first-design/SKILL.md +43 -0
- package/.mindforge/skills/on-call-design/SKILL.md +111 -0
- package/.mindforge/skills/pagination-patterns/SKILL.md +230 -0
- package/.mindforge/skills/payment-integration/SKILL.md +176 -0
- package/.mindforge/skills/performance-reviews/SKILL.md +140 -0
- package/.mindforge/skills/platform-observability/SKILL.md +58 -0
- package/.mindforge/skills/platform-reliability/SKILL.md +52 -0
- package/.mindforge/skills/post-incident-learning/SKILL.md +96 -0
- package/.mindforge/skills/product-manager/SKILL.md +104 -0
- package/.mindforge/skills/progressive-web-app/SKILL.md +44 -0
- package/.mindforge/skills/prompt-engineering/SKILL.md +94 -0
- package/.mindforge/skills/proofreader/SKILL.md +158 -0
- package/.mindforge/skills/push-notification-architecture/SKILL.md +45 -0
- package/.mindforge/skills/python-performance/SKILL.md +183 -0
- package/.mindforge/skills/quality-audit/SKILL.md +171 -0
- package/.mindforge/skills/queue-design/SKILL.md +85 -0
- package/.mindforge/skills/rag-architecture/SKILL.md +176 -0
- package/.mindforge/skills/rate-limiting-design/SKILL.md +94 -0
- package/.mindforge/skills/react-native-patterns/SKILL.md +42 -0
- package/.mindforge/skills/react-performance/SKILL.md +229 -0
- package/.mindforge/skills/real-time-analytics/SKILL.md +42 -0
- package/.mindforge/skills/real-time-sync/SKILL.md +83 -0
- package/.mindforge/skills/responsive-native/SKILL.md +44 -0
- package/.mindforge/skills/responsive-patterns/SKILL.md +141 -0
- package/.mindforge/skills/rfc-pipeline/SKILL.md +114 -0
- package/.mindforge/skills/saas-multi-tenant/SKILL.md +41 -0
- package/.mindforge/skills/santa-method/SKILL.md +134 -0
- package/.mindforge/skills/search-implementation/SKILL.md +98 -0
- package/.mindforge/skills/secrets-platform/SKILL.md +56 -0
- package/.mindforge/skills/secrets-rotation/SKILL.md +173 -0
- package/.mindforge/skills/self-serve-infrastructure/SKILL.md +51 -0
- package/.mindforge/skills/serverless-patterns/SKILL.md +119 -0
- package/.mindforge/skills/skill-creator-meta/SKILL.md +146 -0
- package/.mindforge/skills/sprint-retrospective-facilitation/SKILL.md +112 -0
- package/.mindforge/skills/stakeholder-communication/SKILL.md +85 -0
- package/.mindforge/skills/state-management/SKILL.md +104 -0
- package/.mindforge/skills/stream-processing/SKILL.md +43 -0
- package/.mindforge/skills/streaming-architecture/SKILL.md +81 -0
- package/.mindforge/skills/supply-chain-security/SKILL.md +145 -0
- package/.mindforge/skills/synthetic-data-generation/SKILL.md +52 -0
- package/.mindforge/skills/system-design/SKILL.md +88 -0
- package/.mindforge/skills/team-topology-design/SKILL.md +107 -0
- package/.mindforge/skills/technical-debt-management/SKILL.md +86 -0
- package/.mindforge/skills/technical-interview-design/SKILL.md +98 -0
- package/.mindforge/skills/technical-leadership/SKILL.md +75 -0
- package/.mindforge/skills/technical-writing/SKILL.md +237 -0
- package/.mindforge/skills/technology-radar/SKILL.md +88 -0
- package/.mindforge/skills/testing-anti-patterns/SKILL.md +288 -0
- package/.mindforge/skills/tool-design/SKILL.md +138 -0
- package/.mindforge/skills/typescript-advanced/SKILL.md +198 -0
- package/.mindforge/skills/using-git-worktrees/SKILL.md +139 -0
- package/.mindforge/skills/verification-loop/SKILL.md +13 -1
- package/.mindforge/skills/vibe-security/SKILL.md +165 -0
- package/.mindforge/skills/visual-regression-testing/SKILL.md +97 -0
- package/.mindforge/skills/websocket-patterns/SKILL.md +203 -0
- package/.mindforge/skills/writing-plans/SKILL.md +170 -0
- package/.mindforge/skills/writing-skills/SKILL.md +216 -0
- package/.mindforge/skills/zero-trust-architecture/SKILL.md +166 -0
- package/CHANGELOG.md +240 -0
- package/MINDFORGE.md +4 -4
- package/README.md +49 -4
- package/RELEASENOTES.md +80 -0
- package/SECURITY.md +20 -8
- package/bin/autonomous/audit-writer.js +13 -0
- package/bin/autonomous/auto-runner.js +74 -16
- package/bin/autonomous/context-refactorer.js +26 -11
- package/bin/autonomous/state-manager.js +62 -6
- package/bin/autonomous/stuck-monitor.js +46 -7
- package/bin/autonomous/wave-executor.js +66 -25
- package/bin/dashboard/api-router.js +43 -0
- package/bin/dashboard/metrics-aggregator.js +28 -1
- package/bin/dashboard/server.js +67 -4
- package/bin/dashboard/sse-bridge.js +4 -4
- package/bin/engine/feedback-loop.js +8 -0
- package/bin/engine/intelligence-interlock.js +32 -15
- package/bin/engine/logic-drift-detector.js +2 -1
- package/bin/engine/nexus-tracer.js +3 -2
- package/bin/engine/remediation-engine.js +155 -32
- package/bin/engine/self-corrective-synthesizer.js +84 -10
- package/bin/engine/sre-manager.js +12 -4
- package/bin/engine/temporal-hub.js +131 -34
- package/bin/governance/approve.js +41 -5
- package/bin/governance/impact-analyzer.js +28 -0
- package/bin/governance/policy-engine.js +10 -3
- package/bin/governance/quantum-crypto.js +32 -19
- package/bin/governance/rbac-manager.js +74 -2
- package/bin/governance/ztai-manager.js +49 -7
- package/bin/hindsight-injector.js +3 -3
- package/bin/memory/eis-client.js +71 -34
- package/bin/memory/embedding-engine.js +61 -0
- package/bin/memory/knowledge-graph.js +58 -5
- package/bin/memory/knowledge-indexer.js +53 -6
- package/bin/memory/knowledge-store.js +22 -0
- package/bin/migrations/10.7.0-to-11.0.0.js +110 -0
- package/bin/migrations/schema-versions.js +13 -0
- package/bin/models/anthropic-provider.js +45 -0
- package/bin/models/cloud-broker.js +68 -20
- package/bin/models/gemini-provider.js +51 -0
- package/bin/models/model-client.js +20 -0
- package/bin/models/model-router.js +28 -8
- package/bin/models/openai-provider.js +44 -0
- package/bin/utils/file-io.js +63 -1
- package/bin/utils/index.js +58 -0
- package/docs/getting-started.md +1 -1
- package/docs/user-guide.md +2 -2
- package/package.json +2 -2
- package/.mindforge/personas/data-privacy-engineer.md +0 -187
@@ -0,0 +1,129 @@
---
name: agent-orchestration-patterns
version: 1.0.0
min_mindforge_version: 10.0.7
status: stable
triggers: agent orchestration pattern, supervisor worker pattern, agent pipeline topology, agent debate pattern, agent consensus protocol, map reduce agents, handoff protocol design, multi-agent coordination, agent topology design, swarm pattern design, agent composition, failure propagation pattern
compose:
- tool-design
---

# Agent Orchestration Patterns

## When this skill activates

This skill activates when designing multi-agent systems, choosing coordination topologies, implementing handoff protocols, or debugging agent-to-agent communication failures. It applies to any system where two or more autonomous agents must collaborate, compete, or chain their outputs to accomplish a goal.

## Mandatory actions when this skill is active

### Before

1. **Map the problem space** — Identify all subtasks. Determine which require sequential execution (dependencies) and which are independent (parallelizable).
2. **Assess complexity** — Single-agent tasks masquerading as multi-agent problems waste coordination overhead. Only orchestrate when genuine specialization or parallelism is needed.
3. **Define boundaries** — Each agent must have a clear responsibility boundary. Overlapping responsibilities cause conflicts. Gaps cause dropped tasks.
4. **Choose state strategy** — Decide upfront: shared state (agents read/write common store) or isolated state (agents communicate only via messages).

### During

#### Pattern Catalog

**1. Supervisor/Worker (Hub and Spoke)**
- **Topology** — One coordinator agent decomposes the task and dispatches subtasks to N worker agents. Workers report results back to the supervisor.
- **When to use** — Task is decomposable into independent units. Workers are interchangeable or specialized but non-overlapping.
- **Supervisor responsibilities** — Task decomposition, worker assignment, result aggregation, error handling, timeout enforcement.
- **Worker responsibilities** — Execute assigned subtask, report structured results, signal failure early.
- **Pitfall** — Supervisor becomes bottleneck. Mitigate with async dispatch and parallel worker execution.
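
The supervisor/worker responsibilities above can be sketched roughly as follows. This is a minimal illustration, not code from the package: `worker`, `supervise`, the comma-based decomposition, and the one-second timeout are all hypothetical stand-ins. It uses `asyncio` for the async dispatch that the pitfall note recommends.

```python
import asyncio

async def worker(subtask: str) -> dict:
    """Hypothetical worker: returns a structured result, fails early on bad input."""
    if not subtask:
        raise ValueError("empty subtask")
    await asyncio.sleep(0)  # stand-in for real work
    return {"subtask": subtask, "result": subtask.upper()}

async def supervise(task: str, timeout_s: float = 1.0) -> dict:
    """Decompose, dispatch in parallel, enforce a timeout, aggregate results."""
    subtasks = [part.strip() for part in task.split(",")]  # naive decomposition (assumption)
    jobs = [asyncio.wait_for(worker(s), timeout=timeout_s) for s in subtasks]
    outcomes = await asyncio.gather(*jobs, return_exceptions=True)
    return {
        "results": [o for o in outcomes if not isinstance(o, Exception)],
        "errors": [repr(o) for o in outcomes if isinstance(o, Exception)],
    }

report = asyncio.run(supervise("lint, test, , build"))
```

Collecting exceptions instead of raising lets the supervisor aggregate what succeeded and report what failed in one pass.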

**2. Pipeline (Sequential Chain)**
- **Topology** — Agent A's output becomes Agent B's input. Linear flow through N stages.
- **When to use** — Tasks have natural ordering (research → draft → review → publish). Each stage transforms or enriches the previous output.
- **Stage contract** — Each stage must define its input schema and output schema. Type mismatch between stages is the most common pipeline failure.
- **Error handling** — Fail the pipeline on any stage failure. Partial results from earlier stages should be preserved for debugging.
- **Optimization** — Streaming between stages reduces latency. Agent B can begin processing as Agent A emits output.
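
One way to picture the stage contract and error-handling rules is the sketch below. The `Stage`, `PipelineError`, and `run_pipeline` names are invented for illustration, and plain Python types stand in for real input/output schemas. Streaming between stages is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    input_type: type    # stand-in for an input schema
    output_type: type   # stand-in for an output schema
    run: Callable

class PipelineError(Exception):
    def __init__(self, stage, partial, cause):
        super().__init__(f"stage {stage!r} failed: {cause}")
        self.stage = stage
        self.partial = partial  # outputs of completed stages, kept for debugging

def run_pipeline(stages, value):
    partial = {}
    for stage in stages:
        try:
            if not isinstance(value, stage.input_type):
                raise TypeError(f"input is {type(value).__name__}")
            value = stage.run(value)
            if not isinstance(value, stage.output_type):
                raise TypeError(f"output is {type(value).__name__}")
        except Exception as exc:
            # Fail the whole pipeline, but hand back what earlier stages produced.
            raise PipelineError(stage.name, partial, exc) from exc
        partial[stage.name] = value
    return value

stages = [
    Stage("research", str, list, lambda topic: [f"note on {topic}"]),
    Stage("draft", list, str, lambda notes: " ".join(notes)),
    Stage("review", str, str, lambda text: text.strip().capitalize()),
]
final = run_pipeline(stages, "caching")
```

Checking types at both ends of every stage surfaces a mismatch at the boundary where it occurs rather than several stages later.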

**3. Debate (Adversarial)**
- **Topology** — Two or more agents argue opposing positions. A synthesizer agent evaluates arguments and produces a final decision.
- **When to use** — High-stakes decisions where bias is a risk. Architecture choices, security reviews, strategic decisions.
- **Protocol** — Round 1: each debater states position with evidence. Round 2: each debater rebuts opponent's position. Round 3: synthesizer produces verdict with reasoning.
- **Constraint** — Debaters must not see each other's initial positions until after Round 1. Prevents anchoring.
- **Pitfall** — Debates can be unproductive without strict structure. Always time-box rounds.
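
The three-round protocol and the anchoring constraint could look roughly like this. It is a sketch under stated assumptions: debaters are modeled as plain dictionaries of callables, `run_debate` is a hypothetical helper, and time-boxing is left out for brevity.

```python
def run_debate(debaters, synthesize):
    # Round 1: collect every opening position before revealing any of them,
    # so no debater can anchor on an opponent's statement.
    openings = {name: d["open"]() for name, d in debaters.items()}
    # Round 2: each debater sees only the opponents' openings and rebuts them.
    rebuttals = {
        name: d["rebut"]({k: v for k, v in openings.items() if k != name})
        for name, d in debaters.items()
    }
    # Round 3: the synthesizer weighs both rounds and produces the verdict.
    return synthesize(openings, rebuttals)

debaters = {
    "monolith": {
        "open": lambda: "one deployable is simpler to operate",
        "rebut": lambda others: f"rebutting {sorted(others)}",
    },
    "services": {
        "open": lambda: "components can scale independently",
        "rebut": lambda others: f"rebutting {sorted(others)}",
    },
}
verdict = run_debate(debaters, lambda o, r: {"positions": sorted(o), "rebuttals": r})
```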

**4. Consensus (Agreement Required)**
- **Topology** — All agents must agree on the output. Disagreement triggers re-evaluation.
- **When to use** — Safety-critical decisions. Deployment approvals. Security assessments. Changes where false positives are acceptable but false negatives are dangerous.
- **Protocol** — Each agent independently evaluates. If all approve: proceed. If any reject: block and surface the dissenting reasoning.
- **Threshold variants** — Unanimous (all agree), Majority (>50%), Supermajority (>66%), Quorum (minimum N must vote).
- **Pitfall** — Consensus is expensive. Reserve for decisions where the cost of a wrong answer far exceeds the cost of deliberation.
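
The threshold variants can be expressed as a small decision function. The sketch below reads the listed percentages literally (for instance, supermajority as strictly more than 66%). The `decide` function, its return shape, and the agent names are assumptions made for illustration.

```python
from fractions import Fraction

# Threshold rules, taken literally from the list above.
RULES = {
    "unanimous": lambda share: share == 1,
    "majority": lambda share: share > Fraction(1, 2),
    "supermajority": lambda share: share > Fraction(66, 100),
}

def decide(votes: dict, rule: str = "unanimous", quorum: int = 1) -> dict:
    """votes maps agent name -> True (approve) or False (reject)."""
    dissenters = sorted(name for name, ok in votes.items() if not ok)
    if len(votes) < max(quorum, 1):
        return {"approved": False, "reason": "quorum not met", "dissenters": dissenters}
    share = Fraction(len(votes) - len(dissenters), len(votes))
    # Dissenters are always returned so their reasoning can be surfaced.
    return {"approved": RULES[rule](share), "reason": rule, "dissenters": dissenters}

votes = {"security": True, "sre": True, "qa": False}
unanimous = decide(votes, "unanimous")
majority = decide(votes, "majority")
```

Exact fractions avoid floating-point surprises at the boundaries, such as a 2-of-4 vote under the majority rule.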

**5. MapReduce (Parallel Processing)**
- **Topology** — Map phase: split input into N chunks, dispatch to N parallel agents. Reduce phase: aggregate results into final output.
- **When to use** — Large inputs that can be processed independently (code review across files, document analysis, test execution).
- **Map function** — Must produce non-overlapping chunks. Overlap causes duplicate work or conflicting results.
- **Reduce function** — Must handle partial failures gracefully. If 1 of 10 map workers fails, the reduce should still produce useful output from the other 9.
- **Scaling** — Add more workers linearly. Bottleneck is the reduce step, not the map step.
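
A compact way to see the map and reduce requirements together is shown below. It is only a sketch: `chunk`, `review`, and `map_reduce` are hypothetical names, a thread pool stands in for parallel agents, and "reviewing" a file is simulated.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, n):
    """Split items into at most n contiguous, non-overlapping chunks."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def review(files):
    """Hypothetical map worker: fails on input it cannot handle."""
    if any(f.endswith(".bin") for f in files):
        raise ValueError("cannot review binary file")
    return {f: "ok" for f in files}

def map_reduce(items, n_workers):
    chunks = chunk(items, n_workers)
    merged, gaps = {}, []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(review, c) for c in chunks]
        for c, fut in zip(chunks, futures):
            try:
                merged.update(fut.result())
            except Exception as exc:
                # Reduce tolerates partial failure: record the gap and keep going.
                gaps.append({"chunk": c, "error": str(exc)})
    return {"results": merged, "gaps": gaps}

out = map_reduce(["a.py", "b.py", "c.bin", "d.py", "e.py"], n_workers=3)
```

Returning the failed chunks alongside the merged results lets the caller decide whether the gaps are acceptable or need a retry.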

#### Handoff Protocol Design

Every agent-to-agent handoff must include a structured message:

```json
{
  "task_id": "unique-identifier",
  "from_agent": "agent-name",
  "to_agent": "agent-name",
  "task": "clear description of what to do",
  "context": "relevant background (minimal, not full history)",
  "constraints": ["must not modify X", "timeout 30s"],
  "acceptance_criteria": ["output matches schema Y", "all tests pass"],
  "artifacts": ["file paths or data references"]
}
```

- **Minimal context** — Send only what the receiving agent needs. Full history causes confusion and wastes tokens.
- **Explicit acceptance criteria** — The receiving agent must know when it has succeeded.
- **Typed artifacts** — Reference files or data by path/ID, not by embedding content in the message.
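
A receiving agent could validate the handoff message before acting on it. The sketch below mirrors the JSON fields shown above. The `Handoff` dataclass, `parse_handoff`, and the specific validation rules are illustrative assumptions, not part of the package.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass(frozen=True)
class Handoff:
    task_id: str
    from_agent: str
    to_agent: str
    task: str
    context: str
    constraints: list = field(default_factory=list)
    acceptance_criteria: list = field(default_factory=list)
    artifacts: list = field(default_factory=list)  # paths or IDs, never file contents

    def __post_init__(self):
        for name in ("task_id", "from_agent", "to_agent", "task"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be non-empty")
        if not self.acceptance_criteria:
            raise ValueError("acceptance_criteria must list at least one check")

def parse_handoff(raw: str) -> Handoff:
    # Unknown or missing required keys raise TypeError from the dataclass constructor.
    return Handoff(**json.loads(raw))

raw = json.dumps({
    "task_id": "t-1",
    "from_agent": "planner",
    "to_agent": "reviewer",
    "task": "review the cache module",
    "context": "cache layer was rewritten",
    "constraints": ["timeout 30s"],
    "acceptance_criteria": ["all tests pass"],
    "artifacts": ["src/cache.py"],
})
msg = parse_handoff(raw)
```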

#### Failure Propagation Strategies

| Strategy | Behavior | Use When |
|----------|----------|----------|
| Fail-fast | Abort immediately, surface error | Critical path, no recovery possible |
| Retry | Repeat N times with backoff | Transient failures (network, rate limits) |
| Escalate | Notify supervisor, request human input | Ambiguous failures, policy decisions |
| Degrade | Continue with partial results, flag gaps | Non-critical subtasks, best-effort acceptable |
| Circuit-break | Stop retrying after N failures, return cached/default | Dependency is unreliable |
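
Two of the strategies in the table, Retry and Circuit-break, are sketched below. The `retry` and `CircuitBreaker` helpers, the exponential backoff schedule, and the choice of `ConnectionError` as the transient failure are assumptions made for illustration.

```python
import time

def retry(fn, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Retry a transient failure with exponential backoff, then re-raise."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

class CircuitBreaker:
    """After max_failures consecutive errors, stop calling and return a default."""
    def __init__(self, max_failures, default):
        self.max_failures = max_failures
        self.default = default
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.max_failures:
            return self.default  # circuit is open: do not touch the dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            return self.default
        self.failures = 0  # a success closes the circuit again
        return result
```

The `sleep` parameter is injectable so the backoff schedule can be checked without waiting.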

#### State Management

- **Shared state** — All agents read/write a common store (database, shared memory). Simpler but requires conflict resolution (optimistic locking, CRDTs).
- **Isolated state** — Agents maintain private state, communicate only via messages. Safer but requires explicit state transfer in handoffs.
- **Hybrid** — Shared read-only state (project context, configuration) + isolated write state (each agent's working memory). Best balance for most systems.
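
The hybrid approach can be illustrated in a few lines. In this sketch, `AgentState` and the example keys are hypothetical. `types.MappingProxyType` from the standard library provides the read-only view of the shared context.

```python
from types import MappingProxyType

class AgentState:
    def __init__(self, name, shared):
        self.name = name
        self.shared = shared   # read-only view of project context and configuration
        self.working = {}      # private working memory, writable only by this agent

# One shared, read-only context for every agent.
shared = MappingProxyType({"project": "demo", "max_tokens": 4096})
planner = AgentState("planner", shared)
coder = AgentState("coder", shared)

planner.working["plan"] = ["design", "implement"]  # isolated write, invisible to coder
```

Writing through the proxy, as in `shared["project"] = "x"`, raises `TypeError`, so accidental mutation of shared context fails loudly.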

#### Decision Matrix: When to Use Which Pattern

| Scenario | Pattern |
|----------|---------|
| Task decomposes into independent subtasks | MapReduce or Supervisor/Worker |
| Tasks must execute in order | Pipeline |
| High-stakes decision needs scrutiny | Debate or Consensus |
| One coordinator manages many executors | Supervisor/Worker |
| System must tolerate partial failures | MapReduce with degraded reduce |
| Speed is critical, tasks are independent | MapReduce with max parallelism |

### After

1. **Validate handoff contracts** — Test that each agent produces output matching the next agent's expected input schema.
2. **Test failure modes** — Simulate each failure propagation path. Verify the system degrades gracefully, not catastrophically.
3. **Measure overhead** — Coordination cost should be <20% of total execution time. If higher, simplify the topology.
4. **Document topology** — Create a diagram showing agent relationships, handoff directions, and failure paths.
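
The overhead check in item 3 reduces to a ratio. As a sketch, suppose each timed span is tagged either `"coordination"` or `"work"`. That tagging scheme and the `overhead_ratio` helper are assumptions, while the 20% budget comes from the text above.

```python
OVERHEAD_BUDGET = 0.20  # from the guideline: coordination should stay under 20%

def overhead_ratio(timings):
    """timings: list of (kind, milliseconds), kind is 'coordination' or 'work'."""
    total = sum(ms for _, ms in timings)
    if total <= 0:
        raise ValueError("no time recorded")
    coordination = sum(ms for kind, ms in timings if kind == "coordination")
    return coordination / total

timings = [
    ("coordination", 400),  # task decomposition and dispatch
    ("work", 3000),
    ("coordination", 200),  # result aggregation
    ("work", 1400),
]
ratio = overhead_ratio(timings)          # 600 / 5000
within_budget = ratio < OVERHEAD_BUDGET
```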

## Self-check before task completion

- [ ] Pattern choice is justified by task structure (not over-engineered)
- [ ] Each agent has clear, non-overlapping responsibilities
- [ ] Handoff protocol includes task, context, constraints, and acceptance criteria
- [ ] Failure propagation strategy is defined for every inter-agent connection
- [ ] State management approach is explicit (shared vs isolated vs hybrid)
- [ ] Coordination overhead is measured and acceptable (<20% of total time)
- [ ] All agent-to-agent contracts are typed and validated
- [ ] System degrades gracefully under partial failure conditions

@@ -0,0 +1,204 @@
---
name: agent-tool-selection
version: 1.0.0
min_mindforge_version: 10.0.4
status: stable
triggers: agent tool selection, capability matching, tool routing, cost-aware tool choice, tool fallback chain, tool composition strategy, function selection, tool description optimization, tool disambiguation, tool preference, tool capability matrix, tool ranking
compose: tool-design
---

# Skill — Agent Tool Selection (Intelligent Capability Routing)

## When this skill activates
When designing tool selection logic for AI agents, optimizing tool descriptions,
building fallback chains, or analyzing tool usage patterns. Use for any scenario
where an agent must choose between multiple tools to accomplish a task.

Core principle: **Specificity over generality** — when two tools can both handle a
task, prefer the more specific one. It will be faster, cheaper, and more reliable.
A hammer works for screws, but a screwdriver is better.
|
|
20
|
+
|
|
21
|
+
## Mandatory actions when this skill is active
|
|
22
|
+
|
|
23
|
+
### Tool Selection Algorithm
|
|
24
|
+
|
|
25
|
+
1. **Selection pipeline (in order):**
|
|
26
|
+
```
|
|
27
|
+
Input: task description + available tools
|
|
28
|
+
|
|
29
|
+
Step 1 — Capability Match:
|
|
30
|
+
- For each tool: does its capability set cover the task requirements?
|
|
31
|
+
- Eliminate tools that CANNOT handle the task (hard filter)
|
|
32
|
+
|
|
33
|
+
Step 2 — Rank by Specificity:
|
|
34
|
+
- More specific tool > more general tool
|
|
35
|
+
- Example: "Read file" > "Bash (cat)" for reading files
|
|
36
|
+
- Specificity = how narrow is the tool's intended use case?
|
|
37
|
+
|
|
38
|
+
Step 3 — Rank by Cost-Efficiency:
|
|
39
|
+
- Among equally capable tools: prefer cheaper/faster
|
|
40
|
+
- Read (free, instant) > Bash cat (process spawn) > API call (network)
|
|
41
|
+
|
|
42
|
+
Step 4 — Rank by Reliability:
|
|
43
|
+
- Among equal cost: prefer higher success rate
|
|
44
|
+
- Check historical success rate per tool per task type
|
|
45
|
+
|
|
46
|
+
Step 5 — Verify Preconditions:
|
|
47
|
+
- Do the selected tool's preconditions hold?
|
|
48
|
+
- Example: Edit requires file was previously Read
|
|
49
|
+
- If preconditions not met: add prerequisite steps
|
|
50
|
+
|
|
51
|
+
Output: ordered list of tools to try (primary + fallbacks)
|
|
52
|
+
```
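
A minimal sketch of Steps 1-4 of the pipeline above, assuming each tool is described by a small record. The `Tool` fields, the numeric `specificity` scale, and the sample values are illustrative assumptions, not part of any real agent runtime. Step 5 (precondition checks) is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    capabilities: frozenset  # task types the tool can handle
    specificity: int         # higher = narrower intended use (hypothetical scale)
    cost_tier: int           # 1 = free/instant ... 4 = expensive
    success_rate: float      # historical success rate, 0.0-1.0


def select_tools(task_type, tools):
    """Return candidate tools ordered primary-first, then fallbacks."""
    # Step 1: hard filter on capability.
    capable = [t for t in tools if task_type in t.capabilities]
    # Steps 2-4: specificity first, then cost, then reliability.
    return sorted(
        capable,
        key=lambda t: (-t.specificity, t.cost_tier, -t.success_rate),
    )


# Hypothetical tool records for illustration only.
TOOLS = [
    Tool("Bash", frozenset({"read_file", "search", "run_tests"}), 1, 2, 0.90),
    Tool("Read", frozenset({"read_file"}), 3, 1, 0.99),
    Tool("Grep", frozenset({"search"}), 3, 1, 0.97),
]

ranked = [t.name for t in select_tools("read_file", TOOLS)]
print(ranked)  # ['Read', 'Bash']
```

Sorting on a single tuple key keeps the priority order explicit: a later criterion only matters when every earlier one ties.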
|
|
53
|
+
|
|
54
|
+
2. **Decision matrix template:**
|
|
55
|
+
```
|
|
56
|
+
| Task Type | Primary Tool | Fallback 1 | Fallback 2 | Anti-pattern |
|
|
57
|
+
|--------------------|--------------|--------------|------------|------------------|
|
|
58
|
+
| Read file content | Read | Bash (cat) | — | Grep (wrong use) |
|
|
59
|
+
| Search for pattern | Grep | Bash (grep) | Read + scan | Read all files |
|
|
60
|
+
| Edit existing file | Edit | Write | — | Bash (sed) |
|
|
61
|
+
| Create new file | Write | Bash (echo>) | — | Edit (no file) |
|
|
62
|
+
| Run tests | Bash | — | — | Read test output |
|
|
63
|
+
| Check file exists | Bash (ls) | Read (error) | — | Grep for path |
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
### Tool Description Optimization
|
|
67
|
+
|
|
68
|
+
3. **Writing effective tool descriptions (they ARE prompts):**
|
|
69
|
+
```
|
|
70
|
+
Good description (specific, with examples):
|
|
71
|
+
"Read a file from disk. Use when you need to see file contents.
|
|
72
|
+
Supports text, images, PDFs. Prefer over Bash cat/head/tail.
|
|
73
|
+
NOT for directories (use Bash ls)."
|
|
74
|
+
|
|
75
|
+
Bad description (vague):
|
|
76
|
+
"Reads things from the filesystem."
|
|
77
|
+
|
|
78
|
+
Good description (with when-to-use and when-NOT-to-use):
|
|
79
|
+
"Edit an existing file by replacing exact string matches.
|
|
80
|
+
Use when: modifying 1-5 specific locations in a file.
|
|
81
|
+
Do NOT use when: rewriting >50% of the file (use Write instead).
|
|
82
|
+
Requires: file must have been Read in this session first."
|
|
83
|
+
|
|
84
|
+
Bad description (missing boundaries):
|
|
85
|
+
"Edits files."
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
Rules:
|
|
89
|
+
- Include WHEN to use (positive examples)
|
|
90
|
+
- Include when NOT to use (negative examples — prevents misuse)
|
|
91
|
+
- State preconditions explicitly
|
|
92
|
+
- Keep descriptions under 100 words (concise > comprehensive)
|
|
93
|
+
- Use concrete examples, not abstract capabilities
|
|
94
|
+
|
|
95
|
+
### Cost-Aware Selection
|
|
96
|
+
|
|
97
|
+
4. **Cost hierarchy (prefer cheaper when quality is equal):**
|
|
98
|
+
```
|
|
99
|
+
Tier 1 — Free/Instant (prefer these):
|
|
100
|
+
- Read (file content)
|
|
101
|
+
- Edit (modify file)
|
|
102
|
+
- Write (create file)
|
|
103
|
+
- Grep (pattern search in known scope)
|
|
104
|
+
|
|
105
|
+
Tier 2 — Cheap/Fast (use when Tier 1 can't):
|
|
106
|
+
- Bash (shell commands — process spawn overhead)
|
|
107
|
+
- Glob (file path patterns)
|
|
108
|
+
|
|
109
|
+
Tier 3 — Moderate (use when necessary):
|
|
110
|
+
- LSP (language server queries)
|
|
111
|
+
- Web fetch (network requests)
|
|
112
|
+
|
|
113
|
+
Tier 4 — Expensive (use sparingly):
|
|
114
|
+
- Sub-agent spawn (full agent instantiation)
|
|
115
|
+
- Multi-file analysis (token-heavy)
|
|
116
|
+
- External API calls (rate-limited, costly)
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
Rules:
|
|
120
|
+
- Always check if a Tier 1 tool can handle the task before reaching for Tier 3-4
|
|
121
|
+
- Track cumulative cost during a session (don't let tool costs compound silently)
|
|
122
|
+
- For repeated operations: batch when possible (one Bash with && vs many Bash calls)
|
|
123
|
+
- Cost includes: token consumption, time, API calls, compute resources
|
|
124
|
+
|
|
125
|
+
### Fallback Chains
|
|
126
|
+
|
|
127
|
+
5. **Designing robust fallback sequences:**
|
|
128
|
+
```
|
|
129
|
+
Fallback chain structure:
|
|
130
|
+
1. Try primary tool (most specific, cheapest)
|
|
131
|
+
2. If fails (error, timeout, precondition unmet):
|
|
132
|
+
- Log failure reason
|
|
133
|
+
- Try fallback 1 (broader capability)
|
|
134
|
+
3. If fallback 1 fails:
|
|
135
|
+
- Try fallback 2 (most general/expensive)
|
|
136
|
+
4. If all fail:
|
|
137
|
+
- Escalate to user with: what was tried, why each failed, what's needed
|
|
138
|
+
|
|
139
|
+
Example:
|
|
140
|
+
Task: "Find where function X is defined"
|
|
141
|
+
1. Grep (fast, pattern-based) → found? done
|
|
142
|
+
2. LSP (semantic, language-aware) → found? done
|
|
143
|
+
3. Bash find + grep (brute force) → found? done
|
|
144
|
+
4. Escalate: "I couldn't locate function X. Can you point me to the file?"
|
|
145
|
+
```
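
The chain structure above could be sketched as follows. This is an illustration under assumptions: the three lookup functions are hypothetical stand-ins for Grep, LSP, and a brute-force shell search, and a `None` return is treated as "not found".

```python
def run_with_fallbacks(task, chain, max_levels=3):
    """Try each (name, fn) in order; return (name, result) or escalate."""
    failures = []
    for name, fn in chain[:max_levels]:
        try:
            result = fn(task)
        except Exception as exc:  # error, timeout, unmet precondition
            failures.append((name, str(exc)))
            continue
        if result is not None:
            return name, result  # caller can log which level succeeded
        failures.append((name, "no result"))
    # All levels failed: escalate with what was tried and why each failed.
    raise LookupError(f"escalate to user: {failures}")


def grep_lookup(symbol):
    return None  # stand-in: pattern search found nothing


def lsp_lookup(symbol):
    raise TimeoutError("language server not responding")


def brute_force_lookup(symbol):
    return f"src/utils.py: def {symbol}"  # hypothetical location


chain = [
    ("Grep", grep_lookup),
    ("LSP", lsp_lookup),
    ("Bash find+grep", brute_force_lookup),
]
level, location = run_with_fallbacks("parse_config", chain)
print(level, "->", location)
```

Returning the name of the level that succeeded supports the rule about logging chain depth, and the `max_levels` cap enforces escalation instead of open-ended retries.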
|
|
146
|
+
|
|
147
|
+
Rules:
|
|
148
|
+
- Fallback chains should be pre-defined per task type (not improvised)
|
|
149
|
+
- Each fallback should be DIFFERENT in approach (not just retry)
|
|
150
|
+
- Log which level of the chain succeeded (optimize primary over time)
|
|
151
|
+
- Max 3 fallback levels before escalation (avoid infinite retry loops)
|
|
152
|
+
|
|
153
|
+
### Tool Composition
|
|
154
|
+
|
|
155
|
+
6. **Combining tools for complex tasks:**
|
|
156
|
+
```
|
|
157
|
+
Composition patterns:
|
|
158
|
+
|
|
159
|
+
Sequential: Tool A output → Tool B input
|
|
160
|
+
Example: Grep (find file) → Read (get content) → Edit (modify)
|
|
161
|
+
|
|
162
|
+
Parallel: Tool A + Tool B independently → merge results
|
|
163
|
+
Example: Grep (find usages) + Read (get definition) → understand full context
|
|
164
|
+
|
|
165
|
+
Conditional: If Tool A succeeds → Tool B, else → Tool C
|
|
166
|
+
Example: If Read(file) succeeds → Edit, else → Write (file doesn't exist)
|
|
167
|
+
|
|
168
|
+
Iterative: Repeat Tool A until condition met
|
|
169
|
+
Example: Bash(test) → fails → Edit(fix) → Bash(test) → passes → done
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
Rules:
|
|
173
|
+
- Plan composition BEFORE executing (don't improvise mid-chain)
|
|
174
|
+
- Minimize total tool calls (combine steps where possible)
|
|
175
|
+
- Never call the same tool twice with identical inputs (cache/reuse results)
|
|
176
|
+
- If composition exceeds 5 sequential steps: consider if there's a more direct tool
|
|
177
|
+
|
|
178
|
+
### Tool Disambiguation
|
|
179
|
+
|
|
180
|
+
7. **When multiple tools seem equally valid:**
|
|
181
|
+
```
|
|
182
|
+
Disambiguation criteria (in priority order):
|
|
183
|
+
1. Fewer side effects: Read-only > Read-write (prefer observation over action)
|
|
184
|
+
2. More specific: Narrow tool > broad tool (Edit > Bash sed)
|
|
185
|
+
3. Cheaper: Less resource consumption > more
|
|
186
|
+
4. More reversible: Undoable > permanent (Edit > Write for existing files)
|
|
187
|
+
5. Better error messages: Tools with clear failure modes > opaque failures
|
|
188
|
+
|
|
189
|
+
If still tied after all criteria: pick the one that appears first in the
|
|
190
|
+
tool list (convention-based tie-breaking, prevents analysis paralysis)
|
|
191
|
+
```
|
|
192
|
+
|
|
193
|
+
## Self-check before task completion
|
|
194
|
+
|
|
195
|
+
Before marking a task done when this skill was active:
|
|
196
|
+
|
|
197
|
+
- [ ] Did I follow the selection pipeline (capability → specificity → cost → reliability)?
|
|
198
|
+
- [ ] Are tool descriptions specific, with positive AND negative usage examples?
|
|
199
|
+
- [ ] Is cost hierarchy respected (cheaper tools preferred when quality is equal)?
|
|
200
|
+
- [ ] Are fallback chains defined for each critical task type (max 3 levels)?
|
|
201
|
+
- [ ] Is tool composition planned before execution (not improvised)?
|
|
202
|
+
- [ ] Are disambiguations resolved by: side effects → specificity → cost → reversibility?
|
|
203
|
+
- [ ] Are tool calls minimized (no redundant calls, batched where possible)?
|
|
204
|
+
- [ ] Is escalation to user defined as the final fallback (not infinite retry)?
|
|
@@ -0,0 +1,176 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ai-agent-deployment
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
min_mindforge_version: 10.1.1
|
|
5
|
+
status: stable
|
|
6
|
+
triggers: ai agent deployment, agent hosting, agent scaling, agent versioning, agent A/B testing, agent monitoring production, agent rollback, agent health check, agent cost production, agent performance monitoring, agent canary, agent shadow testing
|
|
7
|
+
compose: deployment-workflow
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Skill — AI Agent Deployment
|
|
11
|
+
|
|
12
|
+
## When this skill activates
|
|
13
|
+
Any task involving deploying AI agents to production, versioning agent configurations,
|
|
14
|
+
A/B testing agent variants, monitoring agent quality in production,
|
|
15
|
+
or managing the operational lifecycle of AI agents.
|
|
16
|
+
|
|
17
|
+
## Mandatory actions when this skill is active
|
|
18
|
+
|
|
19
|
+
### Before writing any code
|
|
20
|
+
1. Define the agent version tuple: model + prompt + tools + config (all pinned together).
|
|
21
|
+
2. Identify success metrics (quality, latency, cost, user satisfaction).
|
|
22
|
+
3. Plan rollback strategy (instant version pointer switch).
|
|
23
|
+
4. Design monitoring (token usage, error rate, quality signal).
|
|
24
|
+
|
|
25
|
+
### During implementation
|
|
26
|
+
- Package agent as a versioned, immutable deployment artifact.
|
|
27
|
+
- Implement health check endpoint (synthetic task probe).
|
|
28
|
+
- Add structured logging for every agent action (input, output, tools used, tokens).
|
|
29
|
+
- Build traffic splitting capability for A/B and canary.
|
|
30
|
+
- Instrument cost tracking per-task and per-user.
|
|
31
|
+
- Implement graceful degradation (fallback to simpler model on failure).
|
|
32
|
+
|
|
33
|
+
### After implementation
|
|
34
|
+
- Verify shadow test shows no regression vs current version.
|
|
35
|
+
- Confirm monitoring dashboards capture all key metrics.
|
|
36
|
+
- Test rollback procedure end-to-end.
|
|
37
|
+
- Validate cost projections against actual usage.
|
|
38
|
+
- Run synthetic probes for health verification.
|
|
39
|
+
|
|
40
|
+
## Versioning Strategy
|
|
41
|
+
|
|
42
|
+
### Agent Version = Immutable Tuple
|
|
43
|
+
```json
|
|
44
|
+
{
|
|
45
|
+
"version": "agent-v2.3.1",
|
|
46
|
+
"model": "claude-sonnet-4-20250514",
|
|
47
|
+
"prompt_hash": "sha256:abc123...",
|
|
48
|
+
"tools": ["search_v2", "code_exec_v1", "web_browse_v3"],
|
|
49
|
+
"config": {
|
|
50
|
+
"temperature": 0.3,
|
|
51
|
+
"max_tokens": 4096,
|
|
52
|
+
"timeout_ms": 30000
|
|
53
|
+
}
|
|
54
|
+
}
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
### Rules
|
|
58
|
+
- Changing ANY component = new version.
|
|
59
|
+
- Never mutate a deployed version in place.
|
|
60
|
+
- Keep previous N versions warm for instant rollback.
|
|
61
|
+
- Version string includes all components for traceability.
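
One way to enforce "changing ANY component = new version" is to derive an identifier from the whole tuple. The sketch below hashes a canonical JSON form; the `version_fingerprint` helper and the 12-character truncation are assumptions for illustration, not a MindForge API.

```python
import hashlib
import json


def version_fingerprint(version_tuple):
    """Hash the canonical JSON form so any component change yields a new ID."""
    canonical = json.dumps(version_tuple, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


base = {
    "model": "claude-sonnet-4-20250514",
    "prompt_hash": "sha256:abc123...",  # placeholder copied from the example above
    "tools": ["search_v2", "code_exec_v1", "web_browse_v3"],
    "config": {"temperature": 0.3, "max_tokens": 4096, "timeout_ms": 30000},
}
# Same agent with only the temperature changed.
changed = {**base, "config": {**base["config"], "temperature": 0.5}}

print(version_fingerprint(base) != version_fingerprint(changed))  # True
```

Sorting keys before hashing makes the fingerprint independent of field order, so two records that pin the same components always map to the same identifier.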
|
|
62
|
+
|
|
63
|
+
## Hosting Patterns
|
|
64
|
+
|
|
65
|
+
### Containerized (Recommended)
|
|
66
|
+
- Docker container with model client, prompt, tool implementations.
|
|
67
|
+
- Auto-scale on queue depth (not CPU — agents are I/O bound).
|
|
68
|
+
- GPU allocation only if running local inference.
|
|
69
|
+
- Isolate per-tenant for data separation.
|
|
70
|
+
|
|
71
|
+
### Scaling Signals
|
|
72
|
+
| Signal | Scale Direction | Reason |
|
|
73
|
+
|--------|----------------|--------|
|
|
74
|
+
| Queue depth increasing | Scale up | Work is backing up |
|
|
75
|
+
| P95 latency rising | Scale up | Capacity insufficient |
|
|
76
|
+
| Queue empty for 5min | Scale down | Over-provisioned |
|
|
77
|
+
| Error rate > 5% | Pause scaling | Fix errors first |
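
The table can be read as a small decision function. The sketch below is one possible interpretation: the parameter names are hypothetical, trends are expressed as signed deltas, and checking the error rate first is an assumed priority order.

```python
def scaling_decision(queue_depth_trend, p95_latency_trend, queue_empty_minutes, error_rate):
    """Map the signals in the table to 'pause', 'up', 'down', or 'hold'."""
    if error_rate > 0.05:
        return "pause"  # fix errors before adding capacity
    if queue_depth_trend > 0 or p95_latency_trend > 0:
        return "up"     # work is backing up or capacity is insufficient
    if queue_empty_minutes >= 5:
        return "down"   # over-provisioned
    return "hold"


print(scaling_decision(queue_depth_trend=12, p95_latency_trend=0,
                       queue_empty_minutes=0, error_rate=0.01))  # up
```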
|
|
78
|
+
|
|
79
|
+
## A/B Testing
|
|
80
|
+
|
|
81
|
+
### Setup
|
|
82
|
+
1. Define hypothesis (e.g., "new prompt reduces hallucination by 20%").
|
|
83
|
+
2. Split traffic (e.g., 90/10 control/experiment).
|
|
84
|
+
3. Run for statistical significance (typically 1000+ samples per variant).
|
|
85
|
+
4. Measure: quality score, latency, cost, user feedback.
|
|
86
|
+
|
|
87
|
+
### Metrics to Compare
|
|
88
|
+
- Task success rate (did the agent complete the task correctly?).
|
|
89
|
+
- Token usage (cost proxy).
|
|
90
|
+
- Latency p50/p95/p99.
|
|
91
|
+
- Tool failure rate.
|
|
92
|
+
- User satisfaction signal (thumbs up/down, follow-up corrections).
|
|
93
|
+
- Hallucination rate (if measurable via ground truth).
|
|
94
|
+
|
|
95
|
+
### Graduation Criteria
|
|
96
|
+
- Improvement statistically significant (p < 0.05).
|
|
97
|
+
- No regression in any critical metric.
|
|
98
|
+
- Cost increase acceptable (<20% for same quality).
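
For a binary metric such as task success rate, the significance check could be a two-proportion z-test. This is a sketch using only the standard library; the counts are made-up illustration data, the `graduates` helper is hypothetical, and the case where the pooled rate is exactly 0 or 1 (zero standard error) is not handled.

```python
import math


def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Two-sided p-value for the difference between two success rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability under the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def graduates(p_value, improved, cost_increase, has_regression):
    """Apply the three graduation criteria listed above."""
    return improved and p_value < 0.05 and cost_increase < 0.20 and not has_regression


# Illustrative counts: control 800/1000 successes, experiment 850/1000.
p = two_proportion_p_value(800, 1000, 850, 1000)
print(graduates(p, improved=True, cost_increase=0.08, has_regression=False))  # True
```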
|
|
99
|
+
|
|
100
|
+
## Shadow Testing
|
|
101
|
+
|
|
102
|
+
### Pattern
|
|
103
|
+
```
|
|
104
|
+
User Request → Production Agent (responds to user)
|
|
105
|
+
→ Shadow Agent (runs silently, output logged)
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
### Purpose
|
|
109
|
+
- Test new version against real traffic without user impact.
|
|
110
|
+
- Compare outputs offline (human eval or automated scoring).
|
|
111
|
+
- Detect regressions before any user sees them.
|
|
112
|
+
|
|
113
|
+
### Rules
|
|
114
|
+
- Shadow agent output never reaches the user.
|
|
115
|
+
- Shadow uses same input but may have different model/prompt/tools.
|
|
116
|
+
- Compare at scale (1000+ requests) before promoting.
|
|
117
|
+
- Track divergence rate and categorize differences.
|
|
118
|
+
|
|
119
|
+
## Monitoring
|
|
120
|
+
|
|
121
|
+
### Key Metrics (Real-Time Dashboard)
|
|
122
|
+
| Metric | Alert Threshold | Action |
|
|
123
|
+
|--------|----------------|--------|
|
|
124
|
+
| Token usage/task | >2x baseline | Check for loops/verbose output |
|
|
125
|
+
| Latency p95 | >30s | Scale up or investigate bottleneck |
|
|
126
|
+
| Tool failure rate | >5% | Check tool availability |
|
|
127
|
+
| Hallucination rate | >3% | Rollback, investigate prompt |
|
|
128
|
+
| User negative feedback | >10% | Investigate, consider rollback |
|
|
129
|
+
| Cost per task | >$0.50 | Check for inefficiency |
|
|
130
|
+
|
|
131
|
+
### Structured Logging
|
|
132
|
+
Every agent invocation must log:
|
|
133
|
+
- Request ID, user ID, agent version.
|
|
134
|
+
- Input (sanitized of PII).
|
|
135
|
+
- Output summary.
|
|
136
|
+
- Tools invoked and their results.
|
|
137
|
+
- Token counts (input, output, total).
|
|
138
|
+
- Latency breakdown (thinking, tool calls, generation).
|
|
139
|
+
- Success/failure determination.
|
|
140
|
+
|
|
141
|
+
## Rollback
|
|
142
|
+
|
|
143
|
+
### Instant Rollback
|
|
144
|
+
- Version pointer in config store (not redeployment).
|
|
145
|
+
- Switch pointer → immediate traffic to previous version.
|
|
146
|
+
- Keep N previous versions warm (containers running, ready).
|
|
147
|
+
- Rollback decision within 5 minutes of detecting regression.
|
|
148
|
+
|
|
149
|
+
### Rollback Triggers (Automatic)
|
|
150
|
+
- Error rate > 10% for 3 consecutive minutes.
|
|
151
|
+
- P95 latency > 60s for 5 minutes.
|
|
152
|
+
- User negative feedback spike (3x normal rate).
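
The automatic triggers could be evaluated over per-minute samples, as in this sketch. It assumes one sample per minute with the most recent last, and it reads "for 5 minutes" as five consecutive samples; both are interpretations, not something the list above specifies.

```python
def should_rollback(error_rates, p95_latencies_s, negative_feedback_rate, normal_feedback_rate):
    """Return True if any automatic rollback trigger fires."""
    # Error rate > 10% for 3 consecutive minutes.
    error_trigger = len(error_rates) >= 3 and all(r > 0.10 for r in error_rates[-3:])
    # P95 latency > 60s for 5 minutes.
    latency_trigger = len(p95_latencies_s) >= 5 and all(s > 60 for s in p95_latencies_s[-5:])
    # Negative feedback at 3x the normal rate or more.
    feedback_trigger = (
        normal_feedback_rate > 0
        and negative_feedback_rate >= 3 * normal_feedback_rate
    )
    return error_trigger or latency_trigger or feedback_trigger


print(should_rollback([0.02, 0.11, 0.12, 0.13], [12, 14, 11, 13, 12], 0.01, 0.01))  # True
```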
|
|
153
|
+
|
|
154
|
+
## Health Checks
|
|
155
|
+
|
|
156
|
+
### Synthetic Probes
|
|
157
|
+
- Run a known-good task against the agent every 5 minutes.
|
|
158
|
+
- Verify output matches expected structure.
|
|
159
|
+
- Check latency within bounds.
|
|
160
|
+
- Alert if probe fails 2 consecutive times.
|
|
161
|
+
|
|
162
|
+
### Probe Design
|
|
163
|
+
- Task must be deterministic (or have verifiable structure).
|
|
164
|
+
- Must exercise core capabilities (reasoning + at least one tool).
|
|
165
|
+
- Must complete within health check timeout (10s recommended).
|
|
166
|
+
- Results logged for trend analysis.
|
|
167
|
+
|
|
168
|
+
## Self-check
|
|
169
|
+
- [ ] Agent version tuple defined (model + prompt + tools + config).
|
|
170
|
+
- [ ] Health check probes running every 5 minutes.
|
|
171
|
+
- [ ] Monitoring covers: tokens, latency, errors, quality, cost.
|
|
172
|
+
- [ ] Rollback tested and confirmed instant.
|
|
173
|
+
- [ ] Shadow test shows no regression.
|
|
174
|
+
- [ ] A/B framework ready for future experiments.
|
|
175
|
+
- [ ] Cost per task tracked and within budget.
|
|
176
|
+
- [ ] Graceful degradation implemented for failures.
|
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ai-cost-management
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
min_mindforge_version: 10.5.0
|
|
5
|
+
status: stable
|
|
6
|
+
triggers: AI cost management, token budget optimization, model cost selection, LLM response caching, batch inference processing, AI infrastructure cost, token usage monitoring, cost-aware model routing, inference cost reduction, GPU utilization optimization, AI spend tracking, cost per query optimization
|
|
7
|
+
compose:
|
|
8
|
+
- llm-cost-optimization
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
# AI Cost Management & Optimization
|
|
12
|
+
|
|
13
|
+
## When this skill activates
|
|
14
|
+
|
|
15
|
+
This skill activates when optimizing AI inference costs, implementing token budgeting, designing cost-aware model routing, or tracking AI spending at scale. It applies to production AI systems where compute costs are significant (>$10K/month) and must be controlled without degrading user experience.
|
|
16
|
+
|
|
17
|
+
## Mandatory actions when this skill is active
|
|
18
|
+
|
|
19
|
+
### Before writing any code
|
|
20
|
+
|
|
21
|
+
1. **Establish cost baseline** — Measure current costs per dimension: cost per request, cost per user, cost per model, cost per task type. Identify cost drivers: which models, which tasks, which users consume the most budget. Focus optimization efforts on the top 20% of cost drivers (Pareto principle).
|
|
22
|
+
2. **Set cost budgets and alerts** — Define acceptable cost thresholds per day, per week, per month. Set up alerts when spending exceeds thresholds (50%, 80%, 100%). Alerts prevent runaway costs from unexpected usage spikes or inefficient code.
|
|
23
|
+
3. **Design cost attribution model** — Assign costs to cost centers: per-team, per-product, per-customer tier (free vs. paid). Enables chargeback (internal teams pay for their usage) and informs product decisions (should free tier have lower model quality to reduce costs?).
|
|
24
|
+
4. **Identify optimization opportunities** — Rank opportunities by potential savings: caching (30-70% reduction for repetitive queries), model downgrade (10-50% reduction with minimal quality loss), batching (20-40% reduction via throughput optimization), prompt compression (10-30% reduction by reducing tokens).
|
|
25
|
+
|
|
26
|
+
### During implementation
|
|
27
|
+
|
|
28
|
+
- **Implement aggressive response caching** — Cache LLM responses keyed by prompt hash + model + hyperparameters. Cache hit rate >50% is achievable for many use cases. Use Redis or Memcached for low-latency lookups (<1ms). Set TTL based on content freshness requirements (hours for news, days for documentation, indefinite for static content).
|
|
29
|
+
- **Design cost-aware routing** — Route requests to the cheapest model that meets quality requirements. Example: simple classification → a small model such as Claude 3 Haiku ($0.25/MTok input), complex reasoning → a large model such as Claude 3 Opus ($15/MTok input). Measure quality degradation on a representative evaluation set when downgrading models. If the accuracy drop is <2%, the downgrade is generally acceptable.
|
|
30
|
+
- **Compress prompts aggressively** — Remove filler words, use abbreviations, compress whitespace. Test that compressed prompts produce equivalent outputs. Measure compression ratio (original tokens / compressed tokens). Target: 20-50% compression with <1% quality loss.
|
|
31
|
+
- **Batch inference for throughput** — Group requests into batches (10-100 per batch) to maximize GPU utilization. Batching increases throughput (requests/second) and reduces cost per request (amortize fixed overhead). Trade-off: higher latency (wait for batch to fill) for lower cost.
|
|
32
|
+
- **Implement prompt caching (for supported models)** — Use prompt caching for repeated prompt prefixes (system message, context). The Anthropic (Claude) and OpenAI (GPT-4o and later) APIs support prompt caching. Cached input tokens are billed at a discount of up to 90%, depending on the provider; output tokens are not discounted. Ensure prompts are structured with stable prefixes and variable suffixes.
|
|
33
|
+
- **Track token usage per request** — Log input tokens, output tokens, and total cost per request. Aggregate by model, task type, user, and time. Identify outliers: queries with unusually high token counts (may indicate inefficient prompts or bugs). Set up monitoring dashboards.
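
The response-caching bullet above could look roughly like this. It is an in-memory stand-in for Redis or Memcached, written to show the key construction (normalized prompt + model + hyperparameters), TTL expiry, and hit-rate tracking. The `ResponseCache` class, `fake_llm`, and the model name `model-x` are hypothetical.

```python
import hashlib
import json
import time


def cache_key(prompt, model, params):
    """Key = hash of normalized prompt + model + hyperparameters."""
    normalized = " ".join(prompt.split())  # collapse whitespace differences
    payload = json.dumps([normalized, model, params], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory cache with per-entry TTL and hit-rate tracking."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prompt, model, params, compute):
        key = cache_key(prompt, model, params)
        entry = self.store.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = compute(prompt)  # the expensive model call
        self.store[key] = (self.clock() + self.ttl, value)
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = ResponseCache(ttl_seconds=3600)
first = cache.get_or_compute("What is  TTL?", "model-x", {"temperature": 0}, fake_llm)
second = cache.get_or_compute("What is TTL?", "model-x", {"temperature": 0}, fake_llm)
print(len(calls), cache.hit_rate())  # 1 0.5
```

Normalizing whitespace before hashing addresses the "prompts subtly different" cause of cache misses; whether more aggressive normalization is safe depends on the application.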
|
|
34
|
+
|
|
35
|
+
### After implementation
|
|
36
|
+
|
|
37
|
+
- **Measure cache hit rate** — Track % of requests served from cache. Target: >50% hit rate for production systems with repetitive queries. If lower, analyze cache misses: are prompts subtly different (normalize prompts), is TTL too short (increase TTL), or is traffic too diverse (caching won't help)?
|
|
38
|
+
- **Validate quality after cost optimization** — Compare model accuracy, user satisfaction, or business metrics before and after optimization. Acceptable thresholds: <2% accuracy drop, <5% user satisfaction drop. If degradation is higher, roll back optimizations.
|
|
39
|
+
- **Benchmark cost reduction** — Measure cost per request after optimizations vs. baseline. Target: 30-50% cost reduction from caching + routing + compression. Document savings in $/month and ROI (developer time spent vs. cost saved).
|
|
40
|
+
- **Monitor for cost anomalies** — Set up alerts for sudden cost spikes (>2x daily average) or unusual patterns (single user consuming 10x typical usage). Anomalies indicate bugs (infinite loops, retry storms) or abuse (attackers exploiting free tier).
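
The spike rule (>2x daily average) reduces to a short comparison. A minimal sketch, assuming a list of recent daily totals is available; the function name and the sample figures are illustrative.

```python
def is_cost_spike(recent_daily_costs, today_cost, factor=2.0):
    """Flag today's spend if it exceeds `factor` times the recent daily average."""
    if not recent_daily_costs:
        return False  # no baseline yet, so nothing to compare against
    average = sum(recent_daily_costs) / len(recent_daily_costs)
    return today_cost > factor * average


print(is_cost_spike([100.0, 120.0, 80.0], 250.0))  # True
```

The same comparison can be applied per user by passing that user's typical usage as the baseline and a factor of 10.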
|
|
41
|
+
|
|
42
|
+
## Self-check before task completion
|
|
43
|
+
|
|
44
|
+
- [ ] Cost baseline is measured per request, user, model, and task type
|
|
45
|
+
- [ ] Cost budgets are set with alerts at 50%, 80%, 100% thresholds
|
|
46
|
+
- [ ] Cost attribution model assigns costs to teams, products, or customer tiers
|
|
47
|
+
- [ ] Optimization opportunities are ranked by potential savings and effort
|
|
48
|
+
- [ ] Response caching is implemented with cache hit rate tracked (target >50%)
|
|
49
|
+
- [ ] Cost-aware routing selects cheapest model that meets quality requirements
|
|
50
|
+
- [ ] Prompt compression achieves 20-50% token reduction with <1% quality loss
|
|
51
|
+
- [ ] Batch inference is implemented for throughput optimization (10-100 requests per batch)
|
|
52
|
+
- [ ] Prompt caching (if supported) is enabled for repeated prompt prefixes (cached-token cost reduced by up to 90%, depending on provider)
|
|
53
|
+
- [ ] Token usage is logged per request and aggregated by model, task, user, time
|
|
54
|
+
- [ ] Cache hit rate is validated at >50% for production systems with repetitive queries
|
|
55
|
+
- [ ] Model quality is validated post-optimization (<2% accuracy drop, <5% satisfaction drop)
|
|
56
|
+
- [ ] Cost reduction is benchmarked (target 30-50% reduction from baseline)
|
|
57
|
+
- [ ] Cost anomaly alerts are configured for spikes (>2x daily average) and abuse patterns
|
|
@@ -0,0 +1,53 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ai-safety-alignment
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
min_mindforge_version: 10.5.0
|
|
5
|
+
status: stable
|
|
6
|
+
triggers: AI safety implementation, AI alignment technique, output filtering AI, content moderation AI, bias detection model, AI guardrail implementation, harmful content prevention, AI red teaming, model safety evaluation, AI ethics implementation, alignment testing, responsible AI deployment
|
|
7
|
+
compose:
|
|
8
|
+
- guardrails-and-safety
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
# AI Safety & Alignment
|
|
12
|
+
|
|
13
|
+
## When this skill activates
|
|
14
|
+
|
|
15
|
+
This skill activates when implementing output filters, content moderation systems, bias detection, adversarial robustness, or alignment techniques for AI systems. It applies when deploying AI in production environments where safety failures have legal, reputational, or ethical consequences.
|
|
16
|
+
|
|
17
|
+
## Mandatory actions when this skill is active
|
|
18
|
+
|
|
19
|
+
### Before writing any code
|
|
20
|
+
|
|
21
|
+
1. **Define harm taxonomy** — Categorize potential harms specific to your domain: toxic content, misinformation, bias (demographic, cultural), privacy leaks, prompt injection, jailbreaks, copyright violations. Prioritize by severity and likelihood. Not all harms are equal.
|
|
22
|
+
2. **Establish safety thresholds** — Define numeric thresholds for each harm category: toxicity score <0.3, PII detection confidence >0.9, bias parity ratio 0.8-1.2. Thresholds must be tuned on representative data, not arbitrary guesses.
|
|
23
|
+
3. **Select detection models** — Choose specialized classifiers per harm type: Perspective API or custom models for toxicity, named entity recognition for PII, fairness metrics for bias. General-purpose LLMs are too slow and expensive for real-time safety filtering.
|
|
24
|
+
4. **Design layered defense** — Implement multiple safety layers: input validation (reject malicious prompts), model guardrails (constrain model behavior), output filtering (catch harmful completions), monitoring (detect safety failures post-deployment). Single-layer defense is insufficient.
|
|
25
|
+
|
|
26
|
+
### During implementation
|
|
27
|
+
|
|
28
|
+
- **Implement input sanitization first** — Validate all user inputs before they reach the model. Reject or sanitize: SQL injection patterns, prompt injection attempts (e.g., "ignore previous instructions"), PII in prompts, excessive length, and special characters that break parsing. Log rejected inputs for analysis.
|
|
29
|
+
- **Apply constitutional AI principles** — Constrain model behavior via system prompts: "You are a helpful assistant. You must not generate harmful, biased, or illegal content. If asked to do so, politely refuse and explain why." Test refusal behavior extensively.
|
|
30
|
+
- **Use classifier-guided generation** — For high-risk domains, run a safety classifier on every output before returning to the user. If toxicity/bias/PII is detected above threshold, retry generation with a modified prompt or return a safe default response.
|
|
31
|
+
- **Implement red-teaming as tests** — Create automated adversarial tests: jailbreak attempts, bias triggers, edge cases designed to elicit failures. Run these tests in CI/CD. Safety regressions must block deployment.
|
|
32
|
+
- **Log all safety events** — Record every safety filter activation: input rejected, output filtered, threshold exceeded. Include context: user ID, timestamp, input/output text, classifier scores. This data is critical for tuning thresholds and identifying attack patterns.
|
|
33
|
+
- **Design graceful degradation** — When safety filters trigger, provide user-friendly explanations: "I can't generate that content because [reason]." Do not expose internal classifier scores or filter logic (attackers use this to evade detection).
|
|
34
|
+
|
|
35
|
+
### After implementation
|
|
36
|
+
|
|
37
|
+
- **Validate safety coverage** — Test the system with a held-out safety dataset: known toxic examples, bias triggers, jailbreak prompts. Measure recall (% of harmful content caught) and precision (% of flagged content that is truly harmful). Target: recall >95%, precision >90%.
|
|
38
|
+
- **Measure bias across demographics** — Evaluate model outputs for demographic parity, equalized odds, and calibration across protected attributes (race, gender, age). Use fairness toolkits (Fairlearn, AI Fairness 360). Bias gaps >20% require mitigation.
|
|
39
|
+
- **Conduct human red-teaming** — Hire external red-teamers to attempt jailbreaks and adversarial attacks. Automated tests miss creative attack vectors. Budget 20-40 hours of red-teaming per major release.
|
|
40
|
+
- **Monitor safety in production** — Track safety metrics over time: filter activation rate, false positive rate, user complaints about incorrect filtering. Safety degrades as attackers adapt and user behavior shifts.
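
The recall and precision targets in the coverage check can be computed from filter decisions on a labeled safety set. A small sketch with made-up labels; `predicted_flags` is what the filter flagged and `true_labels` marks content that is actually harmful.

```python
def precision_recall(predicted_flags, true_labels):
    """Return (precision, recall) for a safety filter on labeled examples."""
    pairs = list(zip(predicted_flags, true_labels))
    tp = sum(1 for p, t in pairs if p and t)      # harmful and caught
    fp = sum(1 for p, t in pairs if p and not t)  # benign but flagged
    fn = sum(1 for p, t in pairs if not p and t)  # harmful but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative data: five examples, three truly harmful.
flags = [True, True, False, True, False]
truth = [True, True, True, False, False]
precision, recall = precision_recall(flags, truth)
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```

Both numbers here fall well short of the stated targets (recall >95%, precision >90%), which is the kind of result that should block promotion.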
|
|
41
|
+
|
|
42
|
+
## Self-check before task completion
|
|
43
|
+
|
|
44
|
+
- [ ] Harm taxonomy is defined with severity ratings and likelihood estimates
|
|
45
|
+
- [ ] Safety thresholds are set per harm type and validated on representative data
|
|
46
|
+
- [ ] Input sanitization rejects malicious prompts before reaching the model
|
|
47
|
+
- [ ] Output filtering runs on every completion with <100ms latency overhead
|
|
48
|
+
- [ ] Constitutional AI constraints are embedded in system prompts and tested
|
|
49
|
+
- [ ] Automated red-teaming tests run in CI/CD and block deployment on failures
|
|
50
|
+
- [ ] Safety events are logged with full context for post-hoc analysis
|
|
51
|
+
- [ ] Bias is measured across demographics with fairness parity within acceptable bounds
|
|
52
|
+
- [ ] Human red-teaming has been conducted with documented attack attempts and mitigations
|
|
53
|
+
- [ ] Production monitoring tracks filter activation rate and false positive trends
|