mindforge-cc 10.0.2 → 10.7.0
- package/.mindforge/config.json +73 -2
- package/.mindforge/engine/autonomous/cross-iteration-bridge.md +96 -0
- package/.mindforge/engine/cost-tracking/budget-enforcer.md +68 -0
- package/.mindforge/engine/cost-tracking/router.md +58 -0
- package/.mindforge/engine/cost-tracking/token-ledger.md +77 -0
- package/.mindforge/engine/council/council-protocol.md +96 -0
- package/.mindforge/engine/council/council-templates.md +85 -0
- package/.mindforge/engine/council/synthesis-engine.md +71 -0
- package/.mindforge/engine/cross-model-eval.md +74 -0
- package/.mindforge/engine/instincts/capture-engine.md +63 -0
- package/.mindforge/engine/instincts/instinct-schema.md +76 -0
- package/.mindforge/engine/instincts/promotion-engine.md +77 -0
- package/.mindforge/engine/proactive/signal-detector.md +60 -0
- package/.mindforge/engine/proactive/suggestion-engine.md +100 -0
- package/.mindforge/engine/skills/composition.md +83 -0
- package/.mindforge/engine/skills/loader.md +16 -0
- package/.mindforge/personas/agent-architect.md +57 -0
- package/.mindforge/personas/agent-evaluator.md +162 -0
- package/.mindforge/personas/agent-memory-designer.md +157 -0
- package/.mindforge/personas/agent-ops-engineer.md +120 -0
- package/.mindforge/personas/agent-orchestrator.md +112 -0
- package/.mindforge/personas/ai-economist.md +57 -0
- package/.mindforge/personas/ai-safety-engineer.md +57 -0
- package/.mindforge/personas/analytics-engineer.md +57 -0
- package/.mindforge/personas/anti-pattern-hunter.md +61 -0
- package/.mindforge/personas/api-gateway-designer.md +132 -0
- package/.mindforge/personas/auth-engineer.md +112 -0
- package/.mindforge/personas/build-engineer.md +57 -0
- package/.mindforge/personas/business-analyst.md +56 -0
- package/.mindforge/personas/cache-architect.md +100 -0
- package/.mindforge/personas/causal-scientist.md +57 -0
- package/.mindforge/personas/cdn-architect.md +118 -0
- package/.mindforge/personas/change-agent.md +104 -0
- package/.mindforge/personas/code-narrator.md +52 -0
- package/.mindforge/personas/codegen-specialist.md +68 -0
- package/.mindforge/personas/communication-architect.md +102 -0
- package/.mindforge/personas/compliance-engineer.md +96 -0
- package/.mindforge/personas/consensus-engineer.md +116 -0
- package/.mindforge/personas/contract-tester.md +60 -192
- package/.mindforge/personas/cost-optimizer.md +71 -0
- package/.mindforge/personas/council-architect.md +66 -0
- package/.mindforge/personas/council-critic.md +67 -0
- package/.mindforge/personas/council-pragmatist.md +71 -0
- package/.mindforge/personas/council-skeptic.md +73 -0
- package/.mindforge/personas/data-architect.md +108 -0
- package/.mindforge/personas/data-mesh-architect.md +57 -0
- package/.mindforge/personas/data-pipeline-architect.md +120 -0
- package/.mindforge/personas/de-sloppifier.md +60 -0
- package/.mindforge/personas/debt-manager.md +66 -0
- package/.mindforge/personas/decision-architect.md +82 -51
- package/.mindforge/personas/deployment-captain.md +74 -0
- package/.mindforge/personas/design-system-lead.md +112 -0
- package/.mindforge/personas/dmux-orchestrator.md +75 -0
- package/.mindforge/personas/doc-auditor.md +84 -0
- package/.mindforge/personas/dx-engineer.md +96 -0
- package/.mindforge/personas/ecommerce-engineer.md +57 -0
- package/.mindforge/personas/edge-engineer.md +94 -0
- package/.mindforge/personas/edtech-architect.md +106 -0
- package/.mindforge/personas/embedding-architect.md +57 -0
- package/.mindforge/personas/environment-engineer.md +57 -0
- package/.mindforge/personas/eval-judge.md +55 -0
- package/.mindforge/personas/event-architect.md +102 -0
- package/.mindforge/personas/experiment-designer.md +138 -0
- package/.mindforge/personas/feature-store-engineer.md +57 -0
- package/.mindforge/personas/finops-analyst.md +66 -0
- package/.mindforge/personas/fintech-architect.md +57 -0
- package/.mindforge/personas/flutter-engineer.md +104 -0
- package/.mindforge/personas/gaming-engineer.md +57 -0
- package/.mindforge/personas/graphql-designer.md +73 -0
- package/.mindforge/personas/healthcare-engineer.md +57 -0
- package/.mindforge/personas/hiring-strategist.md +105 -0
- package/.mindforge/personas/hitl-architect.md +165 -0
- package/.mindforge/personas/i18n-architect.md +69 -0
- package/.mindforge/personas/instinct-curator.md +83 -0
- package/.mindforge/personas/iot-architect.md +105 -0
- package/.mindforge/personas/knowledge-curator.md +139 -0
- package/.mindforge/personas/knowledge-engineer.md +57 -0
- package/.mindforge/personas/lakehouse-architect.md +57 -0
- package/.mindforge/personas/llm-orchestrator.md +57 -0
- package/.mindforge/personas/logistics-architect.md +106 -0
- package/.mindforge/personas/market-analyst.md +53 -0
- package/.mindforge/personas/marketplace-engineer.md +105 -0
- package/.mindforge/personas/mcp-designer.md +54 -0
- package/.mindforge/personas/meeting-designer.md +104 -0
- package/.mindforge/personas/mentorship-lead.md +106 -0
- package/.mindforge/personas/migration-architect.md +57 -0
- package/.mindforge/personas/ml-ops-engineer.md +101 -0
- package/.mindforge/personas/mobile-architect.md +105 -0
- package/.mindforge/personas/mobile-security-engineer.md +106 -0
- package/.mindforge/personas/multi-model-bridge.md +86 -0
- package/.mindforge/personas/multi-tenancy-architect.md +71 -0
- package/.mindforge/personas/multimodal-engineer.md +57 -0
- package/.mindforge/personas/offline-specialist.md +105 -0
- package/.mindforge/personas/onboarding-navigator.md +63 -0
- package/.mindforge/personas/payments-engineer.md +135 -0
- package/.mindforge/personas/pipeline-engineer.md +115 -0
- package/.mindforge/personas/platform-engineer.md +97 -0
- package/.mindforge/personas/platform-lead.md +57 -0
- package/.mindforge/personas/privacy-engineer.md +57 -0
- package/.mindforge/personas/product-owner.md +56 -0
- package/.mindforge/personas/productivity-analyst.md +57 -0
- package/.mindforge/personas/prompt-architect.md +101 -0
- package/.mindforge/personas/proofreader.md +53 -0
- package/.mindforge/personas/pwa-architect.md +105 -0
- package/.mindforge/personas/quality-scorer.md +63 -0
- package/.mindforge/personas/react-native-engineer.md +106 -0
- package/.mindforge/personas/resilience-engineer.md +69 -0
- package/.mindforge/personas/rfc-architect.md +64 -0
- package/.mindforge/personas/saga-orchestrator.md +80 -0
- package/.mindforge/personas/secrets-engineer.md +57 -0
- package/.mindforge/personas/skill-smith.md +79 -0
- package/.mindforge/personas/sre-lead.md +107 -0
- package/.mindforge/personas/stream-engineer.md +57 -0
- package/.mindforge/personas/streaming-engineer.md +64 -0
- package/.mindforge/personas/swarm-templates.json +695 -38
- package/.mindforge/personas/system-designer.md +57 -0
- package/.mindforge/personas/team-coach.md +120 -0
- package/.mindforge/personas/tech-lead-coach.md +103 -0
- package/.mindforge/personas/technical-writer-lead.md +111 -0
- package/.mindforge/personas/threat-modeler.md +82 -0
- package/.mindforge/personas/vibe-checker.md +75 -0
- package/.mindforge/personas/worktree-manager.md +56 -0
- package/.mindforge/personas/zero-trust-engineer.md +113 -0
- package/.mindforge/skills/a11y-testing/SKILL.md +143 -0
- package/.mindforge/skills/agent-evaluation-framework/SKILL.md +227 -0
- package/.mindforge/skills/agent-introspection-debugging/SKILL.md +88 -0
- package/.mindforge/skills/agent-loops/SKILL.md +84 -0
- package/.mindforge/skills/agent-memory-design/SKILL.md +199 -0
- package/.mindforge/skills/agent-orchestration-patterns/SKILL.md +129 -0
- package/.mindforge/skills/agent-tool-selection/SKILL.md +204 -0
- package/.mindforge/skills/ai-agent-deployment/SKILL.md +176 -0
- package/.mindforge/skills/ai-cost-management/SKILL.md +57 -0
- package/.mindforge/skills/ai-safety-alignment/SKILL.md +53 -0
- package/.mindforge/skills/analytics-instrumentation/SKILL.md +172 -0
- package/.mindforge/skills/api-gateway-patterns/SKILL.md +177 -0
- package/.mindforge/skills/api-marketplace/SKILL.md +56 -0
- package/.mindforge/skills/api-versioning/SKILL.md +100 -0
- package/.mindforge/skills/app-store-deployment/SKILL.md +44 -0
- package/.mindforge/skills/architecture-tradeoff-analysis/SKILL.md +97 -0
- package/.mindforge/skills/audit-logging/SKILL.md +140 -0
- package/.mindforge/skills/auth-patterns/SKILL.md +148 -0
- package/.mindforge/skills/autonomous-agent-harness/SKILL.md +218 -0
- package/.mindforge/skills/autonomous-agents/SKILL.md +59 -0
- package/.mindforge/skills/autonomous-loops/SKILL.md +105 -0
- package/.mindforge/skills/build-system-optimization/SKILL.md +54 -0
- package/.mindforge/skills/build-vs-buy/SKILL.md +80 -0
- package/.mindforge/skills/bundle-optimization/SKILL.md +174 -0
- package/.mindforge/skills/business-analyst/SKILL.md +82 -0
- package/.mindforge/skills/caching-strategies/SKILL.md +132 -0
- package/.mindforge/skills/capacity-planning/SKILL.md +96 -0
- package/.mindforge/skills/causal-inference/SKILL.md +42 -0
- package/.mindforge/skills/cdn-optimization/SKILL.md +212 -0
- package/.mindforge/skills/change-management/SKILL.md +106 -0
- package/.mindforge/skills/chaos-engineering/SKILL.md +99 -0
- package/.mindforge/skills/ci-cd-pipeline/SKILL.md +118 -0
- package/.mindforge/skills/cli-design/SKILL.md +118 -0
- package/.mindforge/skills/code-generation-patterns/SKILL.md +92 -0
- package/.mindforge/skills/code-review-methodology/SKILL.md +180 -0
- package/.mindforge/skills/code-tour/SKILL.md +145 -0
- package/.mindforge/skills/codebase-onboarding/SKILL.md +95 -0
- package/.mindforge/skills/compliance-as-code/SKILL.md +195 -0
- package/.mindforge/skills/conflict-resolution/SKILL.md +87 -0
- package/.mindforge/skills/connection-pooling/SKILL.md +151 -0
- package/.mindforge/skills/container-security/SKILL.md +151 -0
- package/.mindforge/skills/context-engineering/SKILL.md +114 -0
- package/.mindforge/skills/continuous-learning/SKILL.md +84 -0
- package/.mindforge/skills/contract-testing/SKILL.md +85 -0
- package/.mindforge/skills/cost-aware-routing/SKILL.md +83 -0
- package/.mindforge/skills/cost-estimation/SKILL.md +82 -0
- package/.mindforge/skills/council/SKILL.md +68 -0
- package/.mindforge/skills/cqrs-event-sourcing/SKILL.md +95 -0
- package/.mindforge/skills/cross-platform-testing/SKILL.md +43 -0
- package/.mindforge/skills/data-governance/SKILL.md +42 -0
- package/.mindforge/skills/data-lakehouse/SKILL.md +42 -0
- package/.mindforge/skills/data-mesh/SKILL.md +42 -0
- package/.mindforge/skills/data-modeling/SKILL.md +107 -0
- package/.mindforge/skills/data-pipeline-design/SKILL.md +171 -0
- package/.mindforge/skills/data-privacy-engineering/SKILL.md +42 -0
- package/.mindforge/skills/database-performance/SKILL.md +174 -0
- package/.mindforge/skills/database-sharding-advanced/SKILL.md +206 -0
- package/.mindforge/skills/de-sloppify/SKILL.md +120 -0
- package/.mindforge/skills/defense-in-depth/SKILL.md +84 -0
- package/.mindforge/skills/delegation-patterns/SKILL.md +123 -0
- package/.mindforge/skills/dependency-management/SKILL.md +94 -0
- package/.mindforge/skills/deployment-workflow/SKILL.md +135 -0
- package/.mindforge/skills/design-system/SKILL.md +113 -0
- package/.mindforge/skills/developer-onboarding/SKILL.md +99 -0
- package/.mindforge/skills/developer-productivity-metrics/SKILL.md +59 -0
- package/.mindforge/skills/distributed-consensus/SKILL.md +141 -0
- package/.mindforge/skills/dmux-workflows/SKILL.md +141 -0
- package/.mindforge/skills/dns-architecture/SKILL.md +167 -0
- package/.mindforge/skills/doc-health-audit/SKILL.md +102 -0
- package/.mindforge/skills/ecommerce-architecture/SKILL.md +41 -0
- package/.mindforge/skills/edge-computing/SKILL.md +91 -0
- package/.mindforge/skills/edtech-platform/SKILL.md +41 -0
- package/.mindforge/skills/email-deliverability/SKILL.md +177 -0
- package/.mindforge/skills/embedding-systems/SKILL.md +55 -0
- package/.mindforge/skills/environment-management/SKILL.md +54 -0
- package/.mindforge/skills/error-handling-architecture/SKILL.md +118 -0
- package/.mindforge/skills/estimation-techniques/SKILL.md +113 -0
- package/.mindforge/skills/eval-harness/SKILL.md +180 -0
- package/.mindforge/skills/event-driven-architecture/SKILL.md +162 -0
- package/.mindforge/skills/experiment-design/SKILL.md +139 -0
- package/.mindforge/skills/experiment-platform/SKILL.md +43 -0
- package/.mindforge/skills/feature-engineering/SKILL.md +42 -0
- package/.mindforge/skills/feature-flag-management/SKILL.md +183 -0
- package/.mindforge/skills/fine-tuning-workflow/SKILL.md +189 -0
- package/.mindforge/skills/fintech-patterns/SKILL.md +41 -0
- package/.mindforge/skills/flutter-architecture/SKILL.md +42 -0
- package/.mindforge/skills/gaming-backend/SKILL.md +41 -0
- package/.mindforge/skills/git-workflow-design/SKILL.md +129 -0
- package/.mindforge/skills/graceful-degradation/SKILL.md +95 -0
- package/.mindforge/skills/graphql-patterns/SKILL.md +243 -0
- package/.mindforge/skills/guardrails-and-safety/SKILL.md +137 -0
- package/.mindforge/skills/healthcare-systems/SKILL.md +40 -0
- package/.mindforge/skills/hiring-engineering/SKILL.md +119 -0
- package/.mindforge/skills/human-in-the-loop-design/SKILL.md +234 -0
- package/.mindforge/skills/i18n-architecture/SKILL.md +147 -0
- package/.mindforge/skills/idempotency-patterns/SKILL.md +84 -0
- package/.mindforge/skills/incident-communication/SKILL.md +96 -0
- package/.mindforge/skills/incident-management/SKILL.md +97 -0
- package/.mindforge/skills/infrastructure-as-code/SKILL.md +98 -0
- package/.mindforge/skills/instinct-clustering/SKILL.md +190 -0
- package/.mindforge/skills/internal-developer-platform/SKILL.md +51 -0
- package/.mindforge/skills/iot-platform/SKILL.md +41 -0
- package/.mindforge/skills/k8s-deployment/SKILL.md +358 -0
- package/.mindforge/skills/knowledge-graphs/SKILL.md +56 -0
- package/.mindforge/skills/knowledge-sharing-systems/SKILL.md +112 -0
- package/.mindforge/skills/llm-cost-optimization/SKILL.md +198 -0
- package/.mindforge/skills/llm-orchestration/SKILL.md +56 -0
- package/.mindforge/skills/load-testing/SKILL.md +84 -0
- package/.mindforge/skills/logistics-optimization/SKILL.md +40 -0
- package/.mindforge/skills/market-researcher/SKILL.md +99 -0
- package/.mindforge/skills/marketplace-trust/SKILL.md +40 -0
- package/.mindforge/skills/mcp-server-patterns/SKILL.md +264 -0
- package/.mindforge/skills/media-streaming/SKILL.md +41 -0
- package/.mindforge/skills/meeting-architecture/SKILL.md +146 -0
- package/.mindforge/skills/mentoring-patterns/SKILL.md +77 -0
- package/.mindforge/skills/microservices-patterns/SKILL.md +83 -0
- package/.mindforge/skills/migration-platform/SKILL.md +61 -0
- package/.mindforge/skills/migration-strategies/SKILL.md +129 -0
- package/.mindforge/skills/ml-feature-store/SKILL.md +56 -0
- package/.mindforge/skills/ml-monitoring/SKILL.md +42 -0
- package/.mindforge/skills/mobile-performance/SKILL.md +44 -0
- package/.mindforge/skills/mobile-security/SKILL.md +45 -0
- package/.mindforge/skills/model-evaluation/SKILL.md +53 -0
- package/.mindforge/skills/monorepo-management/SKILL.md +100 -0
- package/.mindforge/skills/multi-llm-consult/SKILL.md +75 -0
- package/.mindforge/skills/multi-tenancy-patterns/SKILL.md +145 -0
- package/.mindforge/skills/multi-turn-conversation-design/SKILL.md +206 -0
- package/.mindforge/skills/multimodal-ai/SKILL.md +51 -0
- package/.mindforge/skills/mutation-testing/SKILL.md +97 -0
- package/.mindforge/skills/notification-system-design/SKILL.md +168 -0
- package/.mindforge/skills/observability-stack/SKILL.md +136 -0
- package/.mindforge/skills/offline-first-design/SKILL.md +43 -0
- package/.mindforge/skills/on-call-design/SKILL.md +111 -0
- package/.mindforge/skills/pagination-patterns/SKILL.md +230 -0
- package/.mindforge/skills/payment-integration/SKILL.md +176 -0
- package/.mindforge/skills/performance-reviews/SKILL.md +140 -0
- package/.mindforge/skills/platform-observability/SKILL.md +58 -0
- package/.mindforge/skills/platform-reliability/SKILL.md +52 -0
- package/.mindforge/skills/post-incident-learning/SKILL.md +96 -0
- package/.mindforge/skills/product-manager/SKILL.md +104 -0
- package/.mindforge/skills/progressive-web-app/SKILL.md +44 -0
- package/.mindforge/skills/prompt-engineering/SKILL.md +94 -0
- package/.mindforge/skills/proofreader/SKILL.md +158 -0
- package/.mindforge/skills/push-notification-architecture/SKILL.md +45 -0
- package/.mindforge/skills/python-performance/SKILL.md +183 -0
- package/.mindforge/skills/quality-audit/SKILL.md +171 -0
- package/.mindforge/skills/queue-design/SKILL.md +85 -0
- package/.mindforge/skills/rag-architecture/SKILL.md +176 -0
- package/.mindforge/skills/rate-limiting-design/SKILL.md +94 -0
- package/.mindforge/skills/react-native-patterns/SKILL.md +42 -0
- package/.mindforge/skills/react-performance/SKILL.md +229 -0
- package/.mindforge/skills/real-time-analytics/SKILL.md +42 -0
- package/.mindforge/skills/real-time-sync/SKILL.md +83 -0
- package/.mindforge/skills/responsive-native/SKILL.md +44 -0
- package/.mindforge/skills/responsive-patterns/SKILL.md +141 -0
- package/.mindforge/skills/rfc-pipeline/SKILL.md +114 -0
- package/.mindforge/skills/saas-multi-tenant/SKILL.md +41 -0
- package/.mindforge/skills/santa-method/SKILL.md +134 -0
- package/.mindforge/skills/search-implementation/SKILL.md +98 -0
- package/.mindforge/skills/secrets-platform/SKILL.md +56 -0
- package/.mindforge/skills/secrets-rotation/SKILL.md +173 -0
- package/.mindforge/skills/self-serve-infrastructure/SKILL.md +51 -0
- package/.mindforge/skills/serverless-patterns/SKILL.md +119 -0
- package/.mindforge/skills/skill-creator-meta/SKILL.md +146 -0
- package/.mindforge/skills/sprint-retrospective-facilitation/SKILL.md +112 -0
- package/.mindforge/skills/stakeholder-communication/SKILL.md +85 -0
- package/.mindforge/skills/state-management/SKILL.md +104 -0
- package/.mindforge/skills/stream-processing/SKILL.md +43 -0
- package/.mindforge/skills/streaming-architecture/SKILL.md +81 -0
- package/.mindforge/skills/supply-chain-security/SKILL.md +145 -0
- package/.mindforge/skills/synthetic-data-generation/SKILL.md +52 -0
- package/.mindforge/skills/system-design/SKILL.md +88 -0
- package/.mindforge/skills/team-topology-design/SKILL.md +107 -0
- package/.mindforge/skills/technical-debt-management/SKILL.md +86 -0
- package/.mindforge/skills/technical-interview-design/SKILL.md +98 -0
- package/.mindforge/skills/technical-leadership/SKILL.md +75 -0
- package/.mindforge/skills/technical-writing/SKILL.md +237 -0
- package/.mindforge/skills/technology-radar/SKILL.md +88 -0
- package/.mindforge/skills/testing-anti-patterns/SKILL.md +288 -0
- package/.mindforge/skills/threat-modeling/SKILL.md +109 -0
- package/.mindforge/skills/tool-design/SKILL.md +138 -0
- package/.mindforge/skills/typescript-advanced/SKILL.md +198 -0
- package/.mindforge/skills/using-git-worktrees/SKILL.md +139 -0
- package/.mindforge/skills/verification-loop/SKILL.md +97 -0
- package/.mindforge/skills/vibe-security/SKILL.md +165 -0
- package/.mindforge/skills/visual-regression-testing/SKILL.md +97 -0
- package/.mindforge/skills/websocket-patterns/SKILL.md +203 -0
- package/.mindforge/skills/writing-plans/SKILL.md +170 -0
- package/.mindforge/skills/writing-skills/SKILL.md +216 -0
- package/.mindforge/skills/zero-trust-architecture/SKILL.md +166 -0
- package/CHANGELOG.md +195 -0
- package/MINDFORGE.md +4 -4
- package/README.md +2 -2
- package/RELEASENOTES.md +66 -0
- package/bin/installer-core.js +1 -1
- package/bin/wizard/theme.js +2 -2
- package/docs/commands-reference.md +18 -1
- package/package.json +2 -2
- package/.mindforge/personas/data-privacy-engineer.md +0 -187
@@ -0,0 +1,113 @@
+---
+name: mindforge-zero-trust-engineer
+description: Zero-trust network and identity architecture specialist. Designs systems where the network is hostile, location grants zero privilege, and every request proves identity through cryptographic verification.
+tools: Read, Write, Bash, Grep, Glob
+color: obsidian
+---
+
+<role>
+You are the MindForge Zero Trust Engineer. You own the identity and access architecture.
+Your job is to ensure that no service, user, or device is trusted by default — regardless
+of network location. Every request must cryptographically prove its identity and authorization.
+</role>
+
+<why_this_matters>
+The perimeter is dead. Attackers are already inside. Your architecture determines whether
+a single compromise cascades into full breach or stays contained:
+- **Architect** implements your trust boundaries in system design.
+- **Security Reviewer** validates your policies catch unauthorized access.
+- **Auth Engineer** implements the identity verification you specify.
+- **DevOps** deploys the mTLS and network policies you design.
+</why_this_matters>
+
+<philosophy>
+**The Network Is Hostile:**
+Every network segment — internal, external, VPN, or cloud — is treated as compromised.
+Location is not a trust signal. A request from inside the firewall is no more trusted
+than one from the public internet.
+
+**Trust Is Earned Per-Request:**
+Trust is not a state. It is computed fresh on every single request based on:
+identity (verified cryptographically) + device (health checked) + context (time, location,
+behavior) + risk score (anomaly detection).
+
+**Assume Breach, Limit Blast Radius:**
+Design as if the attacker already has a foothold. The question is not "how do we keep them out?"
+but "how do we prevent lateral movement when they're in?"
+</philosophy>
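
The per-request trust computation described in the philosophy section above can be sketched roughly as follows. This is not part of the package: `evaluateRequest`, the field names, and both thresholds are hypothetical, chosen only to show identity, device, context, and risk score being combined fresh on each request.

```javascript
// Illustrative only: field names and thresholds are assumptions,
// not values defined by the persona file.
const DENY_RISK = 0.8;    // assumed cutoff for outright denial
const STEP_UP_RISK = 0.5; // assumed cutoff for extra verification

function evaluateRequest(request) {
  // A missing risk score defaults to 1 so the check fails closed.
  const { identityVerified, deviceHealthy, context = {}, riskScore = 1 } = request;

  // Identity must be proven on every request.
  if (!identityVerified) return { decision: 'deny', reason: 'identity_unverified' };

  // Device health is re-checked per request, not cached per session.
  if (!deviceHealthy) return { decision: 'deny', reason: 'device_unhealthy' };

  // Network location is intentionally not an input: being "inside" grants nothing.
  if (riskScore >= DENY_RISK) return { decision: 'deny', reason: 'high_risk' };

  // Contextual anomalies (time, geography) ask for stronger proof instead of a flat deny.
  if (riskScore >= STEP_UP_RISK || context.unusualTime || context.unusualGeo) {
    return { decision: 'step_up', reason: 'elevated_risk' };
  }
  return { decision: 'allow', reason: 'all_signals_ok' };
}
```

Because nothing is cached between calls, the same caller can be allowed on one request and denied on the next if any signal changes.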
+
+<process>
+
+<step name="flow_inventory">
+Map every communication flow in the system:
+- User-to-service (external access).
+- Service-to-service (internal communication).
+- Service-to-data (database, cache, storage).
+- Admin-to-infrastructure (management plane).
+Document who talks to whom, over what protocol, carrying what data.
+</step>
+
+<step name="identity_model">
+Define identity for every actor:
+- Users: OIDC/SAML with MFA, short-lived tokens.
+- Services: SPIFFE/SPIRE workload identity, mTLS certificates.
+- Devices: MDM-managed, posture-checked.
+- Admins: Privileged access management, just-in-time elevation.
+</step>
+
+<step name="policy_design">
+Implement default-deny with explicit allows:
+- Micro-segmentation (NetworkPolicy, service mesh authorization).
+- Least privilege (minimum permissions, scoped tightly).
+- Time-bounded access (short-lived tokens, session limits).
+- Context-aware decisions (risk score, behavior analysis).
+</step>
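
A minimal sketch of the default-deny rule from the `policy_design` step above, assuming policies are held as an in-memory allow list. The rule shape, the `example.org` trust domain, and the `isAllowed` helper are hypothetical; in practice a service mesh or NetworkPolicy would enforce this.

```javascript
// Hypothetical allow list keyed by SPIFFE-style workload IDs.
// Anything not matched by a rule is denied.
const ALLOW_RULES = [
  {
    source: 'spiffe://example.org/frontend',
    destination: 'spiffe://example.org/orders',
    methods: ['GET', 'POST'],
  },
  {
    source: 'spiffe://example.org/orders',
    destination: 'spiffe://example.org/payments',
    methods: ['POST'],
  },
];

function isAllowed(rules, source, destination, method) {
  // Default deny: access is granted only when an explicit rule matches
  // the caller, the callee, and the method together.
  return rules.some(
    (rule) =>
      rule.source === source &&
      rule.destination === destination &&
      rule.methods.includes(method)
  );
}
```

There is no deny list in this design. The frontend cannot reach payments directly simply because no rule grants it, which is what limits lateral movement.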
+
+<step name="mtls_implementation">
+Enable mutual TLS for all service-to-service communication:
+- Service mesh (Istio/Linkerd) for automatic mTLS.
+- Short-lived certificates (24h rotation).
+- SPIFFE IDs for workload identity.
+- Certificate transparency logging.
+</step>
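
One way to reason about the short-lived certificates in the `mtls_implementation` step above is a renewal check that fires well before expiry. The sketch below assumes a 24-hour certificate lifetime and renewal at half of that lifetime. Both the `shouldRotate` helper and the 0.5 fraction are illustrative assumptions, not settings taken from the package or from any mesh.

```javascript
// Returns true once the assumed fraction of the certificate lifetime has elapsed.
// notBefore, notAfter, and now are Date objects.
function shouldRotate(notBefore, notAfter, now, renewFraction = 0.5) {
  const lifetimeMs = notAfter.getTime() - notBefore.getTime();
  const elapsedMs = now.getTime() - notBefore.getTime();
  // Renewing early leaves room for retries before the old certificate expires.
  return elapsedMs >= lifetimeMs * renewFraction;
}
```

Renewing at a fraction of the lifetime rather than at expiry means a failed renewal attempt does not immediately break service-to-service traffic.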
+
+<step name="continuous_verification">
+Implement re-verification triggers:
+- Privilege escalation requires step-up auth.
+- Anomalous behavior triggers session review.
+- Device posture changes restrict access.
+- Time-based re-authentication (every 1h for sensitive resources).
+</step>
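
The four triggers in the `continuous_verification` step above map naturally onto a function that inspects a session and returns the actions it now requires. The session fields and action names here are hypothetical. Only the one-hour interval for sensitive resources comes from the step itself.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000; // re-authentication interval from the step above

// session.lastAuthAt and nowMs are millisecond timestamps (assumed representation).
function reverificationActions(session, nowMs) {
  const actions = [];
  if (session.requestsPrivilegeEscalation) actions.push('step_up_auth');
  if (session.anomalousBehavior) actions.push('session_review');
  if (session.devicePostureChanged) actions.push('restrict_access');
  // Time-based re-authentication applies only to sensitive resources.
  if (session.sensitiveResource && nowMs - session.lastAuthAt >= ONE_HOUR_MS) {
    actions.push('reauthenticate');
  }
  return actions;
}
```

Returning a list rather than a single verdict lets several triggers fire at once, for example a posture change during a privilege escalation.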
+
+<step name="validation">
+Test the architecture:
+- Compromise one service → verify no lateral movement.
+- Revoke a certificate → verify immediate access loss.
+- Simulate network partition → verify default-deny holds.
+- Attempt unauthorized access from internal network → verify blocked.
+</step>
+
+</process>
+
+<critical_rules>
+- DEFAULT DENY EVERYTHING — explicitly allow only what's needed.
+- NEVER trust network location as a security signal.
+- mTLS is MANDATORY for all service-to-service communication.
+- Re-verify identity on EVERY privilege escalation.
+- Short-lived credentials only — no long-lived tokens or permanent keys.
+- Log ALL access decisions for forensic audit.
+- Certificate rotation must be automatic and tested.
+- Device posture check before granting user access.
+- Compromising one service MUST NOT grant access to others.
+- VPN is not a security boundary — it's a connectivity tool.
+</critical_rules>
+
+<outputs>
+- Communication flow map with trust boundaries.
+- Identity model (per actor type).
+- Network policies (default-deny + explicit allows).
+- mTLS configuration and certificate rotation strategy.
+- Continuous verification rules and triggers.
+- Lateral movement test results.
+- Access decision audit log schema.
+</outputs>
@@ -0,0 +1,143 @@
+---
+name: a11y-testing
+version: 1.0.0
+min_mindforge_version: 0.3.0
+status: stable
+triggers: a11y testing, axe-core, automated accessibility, screen reader testing, keyboard navigation audit, WCAG compliance test, aria validation, focus management test, color contrast check, accessibility CI, accessibility report, assistive technology
+compose: accessibility
+---
+
+# Skill — Accessibility Testing
+
+## When this skill activates
+Any task involving accessibility testing, WCAG compliance, screen reader validation,
+keyboard navigation audits, or automated a11y CI pipelines.
+
+## Mandatory actions when this skill is active
+
+### Before testing accessibility
+1. Identify the target WCAG conformance level (A, AA, or AAA).
+2. Determine which automated tools are available in the project.
+3. Plan manual testing scenarios for what automation cannot catch.
+
+### Automated testing (~30% of issues)
+
+**Unit level (jest-axe):**
+```javascript
+import { axe, toHaveNoViolations } from 'jest-axe';
+expect.extend(toHaveNoViolations);
+
+it('has no accessibility violations', async () => {
+  const { container } = render(<Component />);
+  const results = await axe(container);
+  expect(results).toHaveNoViolations();
+});
+```
+
+**Integration level (Playwright + axe):**
+```javascript
+import AxeBuilder from '@axe-core/playwright';
+
+test('page has no a11y violations', async ({ page }) => {
+  await page.goto('/dashboard');
+  const results = await new AxeBuilder({ page }).analyze();
+  expect(results.violations).toEqual([]);
+});
+```
+
+**CI pipeline:**
+- Run axe-core on every PR against all critical routes.
+- Fail the build on any "critical" or "serious" violations.
+- Report "moderate" violations as warnings (fix in next sprint).
+- Track violation count over time — must trend downward.
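
The CI gating rules above could be applied to an axe-core results object along these lines. The impact levels ("minor", "moderate", "serious", "critical") follow axe-core's result format. The `triageViolations` helper and its return shape are assumptions made for illustration and are not part of the package.

```javascript
// Impact levels that should fail the build, per the CI rules above.
const BLOCKING_IMPACTS = new Set(['critical', 'serious']);

function triageViolations(violations) {
  // Critical and serious findings block the merge.
  const failures = violations.filter((v) => BLOCKING_IMPACTS.has(v.impact));
  // Moderate findings are surfaced as warnings only.
  const warnings = violations.filter((v) => v.impact === 'moderate');
  return { failures, warnings, shouldFailBuild: failures.length > 0 };
}
```

In a Playwright test this might be called as `triageViolations(results.violations)`, asserting on `shouldFailBuild` instead of requiring an empty violations array.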
+
+### Manual testing checklist
+
+**Keyboard navigation:**
+- [ ] Tab through ALL interactive elements in logical order.
+- [ ] Shift+Tab moves backwards correctly.
+- [ ] Enter/Space activates buttons and links.
+- [ ] Escape closes modals, dropdowns, popovers.
+- [ ] Arrow keys navigate within composite widgets (tabs, menus, grids).
+- [ ] No keyboard traps (can always Tab out, except modals).
+- [ ] Focus indicator is clearly visible on all elements.
+
+**Screen reader testing:**
+- [ ] VoiceOver (macOS/iOS) — full user flow.
+- [ ] NVDA (Windows) — full user flow.
+- [ ] All images have meaningful alt text (or alt="" for decorative).
+- [ ] Form inputs have associated labels.
+- [ ] Dynamic content changes are announced (aria-live regions).
+- [ ] Headings form a logical hierarchy (h1 > h2 > h3, no skips).
+
+**Visual testing:**
+- [ ] Zoom to 200% — no horizontal scroll, no overlapping content.
+- [ ] Zoom to 400% — content still readable and usable.
+- [ ] High contrast mode — all content visible.
+- [ ] Reduced motion — animations respect prefers-reduced-motion.
+
+### WCAG conformance levels
+
+**Level A (minimum, always required):**
+- All non-text content has text alternative.
+- Content is navigable by keyboard.
+- No content causes seizures.
+
+**Level AA (target for most applications):**
+- Color contrast ratio 4.5:1 for normal text, 3:1 for large text.
+- Text can be resized to 200% without loss of content.
+- Focus order is meaningful and logical.
+- Error messages identify the field and suggest correction.
+
+**Level AAA (specialized — not typically a blanket requirement):**
+- Color contrast ratio 7:1 for normal text.
+- Sign language interpretation for media.
+- Reading level accommodations.
+
+### Focus management
+
+**Modal dialogs:**
+- Move focus into the modal when opened.
+- Trap focus within the modal (Tab cycles inside).
+- Return focus to the trigger element when closed.
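
The "Tab cycles inside" behaviour of a modal focus trap comes down to wrap-around index arithmetic over the modal's focusable elements. The sketch below shows only that arithmetic. `nextFocusIndex` is a hypothetical helper, and the DOM wiring (collecting focusable elements, handling `keydown`, calling `focus()`) is left out.

```javascript
// Returns the index of the element that should receive focus next.
// shiftKey is true for Shift+Tab (backwards), false for Tab (forwards).
function nextFocusIndex(currentIndex, focusableCount, shiftKey) {
  if (focusableCount === 0) return -1; // nothing inside the modal can take focus
  const step = shiftKey ? -1 : 1;
  // Adding focusableCount before the modulo keeps the result non-negative,
  // so Shift+Tab on the first element wraps to the last one.
  return (currentIndex + step + focusableCount) % focusableCount;
}
```

A keydown handler would call this on Tab, prevent the default behaviour, and focus the element at the returned index.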
+
+**Route changes (SPA):**
+- Move focus to the main content heading on navigation.
+- Announce the new page to screen readers (aria-live or document.title).
+
+**Dynamic content:**
+- New content added below the current focus: no announcement needed.
+- New content that requires attention: use aria-live="polite".
+- Urgent alerts: use aria-live="assertive" (sparingly).
+
+**Skip links:**
+- First focusable element should be "Skip to main content."
+- Links to bypass repetitive navigation blocks.
+
+### Color contrast
+
+**Tools:**
+- Browser DevTools (Accessibility panel shows contrast ratios).
+- axe-core catches contrast violations automatically.
+- Contrast checker plugins for design tools (Figma, Sketch).
+
+**Ratios:**
+- Normal text (< 18pt, or < 14pt bold; roughly 24px and 18.66px): minimum 4.5:1.
+- Large text (>= 18pt, or >= 14pt bold; roughly 24px and 18.66px): minimum 3:1.
+- UI components and graphical objects: minimum 3:1.
+- Never convey information by color alone (add icons, patterns, text).
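
The ratios above are computed from relative luminance as defined in WCAG 2.x. A minimal sketch of that calculation, assuming colors are given as `[r, g, b]` arrays of 0 to 255 sRGB values. The function names are illustrative and not part of the package.

```javascript
// Convert one 8-bit sRGB channel to its linear-light value (WCAG 2.x formula).
function channelToLinear(value) {
  const c = value / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linearized channels.
function relativeLuminance([r, g, b]) {
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio is (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground, background) {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// Level AA check: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(foreground, background, isLargeText = false) {
  return contrastRatio(foreground, background) >= (isLargeText ? 3 : 4.5);
}
```

Gray `#767676` on white sits just above the 4.5:1 threshold and `#777777` just below it, which makes that pair a convenient boundary test.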
+
+### Reporting format
+
+When reporting accessibility issues, include:
+1. **What** — the specific WCAG criterion violated.
+2. **Where** — page URL and element selector/description.
+3. **Impact** — who is affected and how severely.
+4. **Fix** — specific remediation recommendation.
+5. **Priority** — critical (blocks usage) / serious (difficult) / moderate (inconvenient).
+
+## Self-check before task completion
+- [ ] Did I follow the mandatory actions for this skill?
+- [ ] Did I apply the patterns appropriate to the context?
+- [ ] Did I verify the implementation meets the criteria above?
+- [ ] Did I document decisions and trade-offs made?
@@ -0,0 +1,227 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: agent-evaluation-framework
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
min_mindforge_version: 10.0.4
|
|
5
|
+
status: stable
|
|
6
|
+
triggers: agent evaluation, task completion rate, agent benchmark, reasoning quality, tool selection accuracy, agent cost efficiency, end-to-end agent test, agent regression, agent quality score, agent performance metric, evaluation harness design, agent grading
|
|
7
|
+
compose: eval-harness
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Skill — Agent Evaluation Framework (End-to-End Agent Performance Measurement)
|
|
11
|
+
|
|
12
|
+
## When this skill activates
|
|
13
|
+
When measuring agent performance, designing agent benchmarks, tracking quality
|
|
14
|
+
regressions, or comparing agent configurations. Use for any scenario where you
|
|
15
|
+
need to answer: "Is this agent good enough?" or "Did this change make the agent
|
|
16
|
+
better or worse?"
|
|
17
|
+
|
|
18
|
+
Core principle: **Multi-dimensional quality** — agent quality is not a single number.
|
|
19
|
+
A fast agent that's wrong is worse than a slow agent that's right. A cheap agent
|
|
20
|
+
that hallucinates is worse than an expensive agent that's accurate. Measure ALL
|
|
21
|
+
dimensions that matter.
|
|
22
|
+
|
|
23
|
+
## Mandatory actions when this skill is active
|
|
24
|
+
|
|
25
|
+
### Metric Taxonomy
|
|
26
|
+
|
|
27
|
+
1. **Core agent metrics (measure ALL of these):**
|
|
28
|
+
```
|
|
29
|
+
Correctness metrics:
|
|
30
|
+
- Task completion rate: % of tasks completed successfully (end-to-end)
|
|
31
|
+
- First-attempt success rate: % completed without retry or correction
|
|
32
|
+
- Factual accuracy: % of claims that are verifiable and correct
|
|
33
|
+
- Instruction adherence: % of explicit instructions followed correctly
|
|
34
|
+
|
|
35
|
+
Efficiency metrics:
|
|
36
|
+
- Cost per task: total API spend / successful completions
|
|
37
|
+
- Tokens per task: input + output tokens consumed
|
|
38
|
+
- Time per task: wall-clock time from task start to completion
|
|
39
|
+
- Tool calls per task: number of tool invocations (fewer = more efficient)
|
|
40
|
+
|
|
41
|
+
Quality metrics:
|
|
42
|
+
- Reasoning quality score: rubric-based assessment of reasoning chain
|
|
43
|
+
- Tool selection accuracy: % of tool calls that were appropriate
|
|
44
|
+
- Output quality score: rubric-based assessment of final output
|
|
45
|
+
- Hallucination rate: % of outputs containing ungrounded claims
|
|
46
|
+
|
|
47
|
+
Safety metrics:
|
|
48
|
+
- Harmful output rate: % of outputs flagged by safety classifiers
|
|
49
|
+
- Permission violation rate: % of actions exceeding authorized scope
|
|
50
|
+
- Information leakage rate: % of outputs exposing sensitive data
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
2. **Composite quality score:**
|
|
54
|
+
```
|
|
55
|
+
Agent Quality Score = weighted combination:
|
|
56
|
+
- Correctness (40%): task_completion * 0.25 + first_attempt * 0.15
|
|
57
|
+
- Quality (30%): reasoning_quality * 0.15 + output_quality * 0.15
|
|
58
|
+
- Efficiency (20%): normalized(1/cost) * 0.10 + normalized(1/time) * 0.10
|
|
59
|
+
- Safety (10%): (1 - harmful_rate) * 0.05 + (1 - violation_rate) * 0.05
|
|
60
|
+
|
|
61
|
+
Weights are defaults — adjust per use case (safety-critical → increase safety weight)
|
|
62
|
+
```
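
The weighted combination above can be sketched in a few lines. This is an illustration under assumptions: every input is taken to be a fraction in [0, 1], rubric scores and the `normalized(1/cost)` and `normalized(1/time)` terms are assumed to be normalized by the caller (the source does not define the normalization), and the names `AgentMetrics` and `agent_quality_score` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentMetrics:
    """All fields are fractions in [0, 1], normalized by the caller."""
    task_completion: float
    first_attempt: float
    reasoning_quality: float
    output_quality: float
    cost_efficiency: float   # stands in for normalized(1/cost)
    time_efficiency: float   # stands in for normalized(1/time)
    harmful_rate: float
    violation_rate: float


# Default weights copied from the composite score definition above.
DEFAULT_WEIGHTS = {
    "task_completion": 0.25, "first_attempt": 0.15,
    "reasoning_quality": 0.15, "output_quality": 0.15,
    "cost_efficiency": 0.10, "time_efficiency": 0.10,
    "harmful_rate": 0.05, "violation_rate": 0.05,
}


def agent_quality_score(m: AgentMetrics,
                        weights: Optional[dict] = None) -> float:
    """Weighted sum of the eight components; safety rates are inverted."""
    w = DEFAULT_WEIGHTS if weights is None else weights
    if abs(sum(w.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    components = {
        "task_completion": m.task_completion,
        "first_attempt": m.first_attempt,
        "reasoning_quality": m.reasoning_quality,
        "output_quality": m.output_quality,
        "cost_efficiency": m.cost_efficiency,
        "time_efficiency": m.time_efficiency,
        "harmful_rate": 1.0 - m.harmful_rate,
        "violation_rate": 1.0 - m.violation_rate,
    }
    return sum(w[name] * value for name, value in components.items())
```

Passing a custom `weights` dict is how a safety-critical deployment would raise the safety share; the sum check guards against weights that no longer add up to 1.0.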
|
|
63
|
+
|
|
64
|
+
### Benchmark Design
|
|
65
|
+
|
|
66
|
+
3. **Evaluation dataset structure:**
|
|
67
|
+
```
|
|
68
|
+
.mindforge/evals/agent-benchmark/
|
|
69
|
+
├── config.json # benchmark metadata and thresholds
|
|
70
|
+
├── tasks/
|
|
71
|
+
│ ├── easy/ # baseline tasks (should be ~100% success)
|
|
72
|
+
│ │ ├── task-001.json
|
|
73
|
+
│ │ └── task-002.json
|
|
74
|
+
│ ├── medium/ # standard tasks (target: 80%+ success)
|
|
75
|
+
│ │ ├── task-010.json
|
|
76
|
+
│ │ └── task-011.json
|
|
77
|
+
│ └── hard/ # stretch tasks (target: 50%+ success)
|
|
78
|
+
│ ├── task-020.json
|
|
79
|
+
│ └── task-021.json
|
|
80
|
+
├── rubrics/
|
|
81
|
+
│ ├── correctness.md # how to grade correctness
|
|
82
|
+
│ ├── reasoning.md # how to grade reasoning quality
|
|
83
|
+
│ └── output.md # how to grade output quality
|
|
84
|
+
└── results/
|
|
85
|
+
└── results.jsonl # append-only results log
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
4. **Task definition format:**
|
|
89
|
+
```json
|
|
90
|
+
{
|
|
91
|
+
"task_id": "task-001",
|
|
92
|
+
"difficulty": "easy",
|
|
93
|
+
"category": "code-generation",
|
|
94
|
+
"description": "Write a function that reverses a string",
|
|
95
|
+
"input": "Create a TypeScript function reverseString(s: string): string",
|
|
96
|
+
"expected_behavior": [
|
|
97
|
+
"Returns reversed string",
|
|
98
|
+
"Handles empty string",
|
|
99
|
+
"Handles unicode correctly"
|
|
100
|
+
],
|
|
101
|
+
"verification": {
|
|
102
|
+
"type": "code",
|
|
103
|
+
"test_cases": [
|
|
104
|
+
{"input": "hello", "expected": "olleh"},
|
|
105
|
+
{"input": "", "expected": ""},
|
|
106
|
+
{"input": "abc", "expected": "cba"}
|
|
107
|
+
]
|
|
108
|
+
},
|
|
109
|
+
"metadata": {
|
|
110
|
+
"tools_available": ["Read", "Write", "Bash"],
|
|
111
|
+
"time_limit_seconds": 120,
|
|
112
|
+
"cost_limit_usd": 0.50
|
|
113
|
+
}
|
|
114
|
+
}
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
Rules:
|
|
118
|
+
- Minimum 30 tasks per benchmark (10 easy, 15 medium, 5 hard)
|
|
119
|
+
- Tasks must be representative of real usage patterns
|
|
120
|
+
- Include both deterministic tasks (one right answer) and generative tasks (rubric-graded)
|
|
121
|
+
- Each task has explicit success criteria (not vague "good output")
|
|
122
|
+
- Stratify by difficulty to detect capability thresholds
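
For deterministic tasks, the `verification.test_cases` block can be graded mechanically. The sketch below assumes the agent's output has already been loaded as a Python callable (the sample task asks for TypeScript, so a real harness would need its own runner); `grade_code_task` is a hypothetical helper, and its return value mirrors the `grading` object used in the result log.

```python
from typing import Any, Callable


def grade_code_task(task: dict[str, Any],
                    candidate: Callable[[Any], Any]) -> dict[str, Any]:
    """Run every test case against the candidate and summarize the outcome."""
    cases = task["verification"]["test_cases"]
    failures = []
    for case in cases:
        try:
            actual = candidate(case["input"])
        except Exception as exc:  # a crash counts as a failed case
            failures.append(
                f"input={case['input']!r} raised {type(exc).__name__}: {exc}")
            continue
        if actual != case["expected"]:
            failures.append(
                f"input={case['input']!r}: expected {case['expected']!r}, "
                f"got {actual!r}")
    passed = not failures
    evidence = (f"All {len(cases)} test cases passed" if passed
                else "; ".join(failures))
    return {"method": "code", "pass": passed, "evidence": evidence}
```

Recording the failing inputs in `evidence` keeps the success criteria explicit, which is what the rules above ask for.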
|
|
123
|
+
|
|
124
|
+
### Running Benchmarks
|
|
125
|
+
|
|
126
|
+
5. **Execution protocol:**
|
|
127
|
+
```
|
|
128
|
+
For each task in benchmark:
|
|
129
|
+
1. Initialize fresh agent context (no contamination between tasks)
|
|
130
|
+
2. Provide task input + available tools
|
|
131
|
+
3. Record: start_time, all tool calls, all outputs, end_time
|
|
132
|
+
4. Grade output against verification criteria
|
|
133
|
+
5. Log full result to results.jsonl
|
|
134
|
+
|
|
135
|
+
Run N times per task (N >= 3) to measure variance:
|
|
136
|
+
- Report mean and standard deviation per metric
|
|
137
|
+
- Flag high-variance tasks (inconsistent agent behavior)
|
|
138
|
+
- Use the same random seed where possible for reproducibility
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
6. **Result logging:**
|
|
142
|
+
```json
|
|
143
|
+
{
|
|
144
|
+
"run_id": "uuid",
|
|
145
|
+
"timestamp": "ISO-8601",
|
|
146
|
+
"task_id": "task-001",
|
|
147
|
+
"agent_config": {"model": "claude-sonnet", "temperature": 0.0},
|
|
148
|
+
"metrics": {
|
|
149
|
+
"completed": true,
|
|
150
|
+
"first_attempt": true,
|
|
151
|
+
"time_seconds": 15.3,
|
|
152
|
+
"cost_usd": 0.012,
|
|
153
|
+
"tokens_used": {"input": 1200, "output": 450},
|
|
154
|
+
"tool_calls": 3,
|
|
155
|
+
"reasoning_quality": 4,
|
|
156
|
+
"output_quality": 5
|
|
157
|
+
},
|
|
158
|
+
"grading": {
|
|
159
|
+
"method": "code",
|
|
160
|
+
"pass": true,
|
|
161
|
+
"evidence": "All 3 test cases passed"
|
|
162
|
+
}
|
|
163
|
+
}
|
|
164
|
+
```
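
A minimal sketch of the logging and variance steps, assuming the record shape shown above. `append_result` opens the log in append mode so earlier runs are never overwritten, and `summarize_metric` reports mean and sample standard deviation per task. The helper names and the 0.25 coefficient-of-variation cutoff for "high variance" are assumptions, not values from the source.

```python
import json
import statistics
import uuid
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def append_result(log_path: Path, task_id: str, agent_config: dict,
                  metrics: dict, grading: dict) -> dict:
    """Append one run record to the JSONL log and return it."""
    record = {
        "run_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task_id": task_id,
        "agent_config": agent_config,
        "metrics": metrics,
        "grading": grading,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:  # "a" never truncates
        fh.write(json.dumps(record) + "\n")
    return record


def summarize_metric(log_path: Path, metric: str,
                     cv_threshold: float = 0.25) -> dict:
    """Per-task mean and sample stdev of one metric across all logged runs."""
    by_task = defaultdict(list)
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                rec = json.loads(line)
                by_task[rec["task_id"]].append(float(rec["metrics"][metric]))
    summary = {}
    for task_id, values in by_task.items():
        mean = statistics.mean(values)
        # stdev needs at least two samples; one run carries no variance data.
        stdev = statistics.stdev(values) if len(values) >= 2 else 0.0
        high_variance = mean != 0 and stdev / abs(mean) > cv_threshold
        summary[task_id] = {"n": len(values), "mean": mean,
                            "stdev": stdev, "high_variance": high_variance}
    return summary
```

Boolean metrics such as `completed` are converted to 0.0 or 1.0, so the same summary yields a per-task pass rate.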
|
|
165
|
+
|
|
166
|
+
### Regression Detection
|
|
167
|
+
|
|
168
|
+
7. **Regression detection algorithm:**
|
|
169
|
+
```
|
|
170
|
+
Compare current run vs baseline:
|
|
171
|
+
|
|
172
|
+
RED (regression detected — blocks deployment):
|
|
173
|
+
- Task completion rate drops > 5%
|
|
174
|
+
- Any previously-passing easy task now fails
|
|
175
|
+
- Cost per task increases > 50%
|
|
176
|
+
- Safety metric degrades at all
|
|
177
|
+
|
|
178
|
+
YELLOW (warning — investigate before deploying):
|
|
179
|
+
- Task completion rate drops 2-5%
|
|
180
|
+
- Medium/hard task pass rate drops > 10%
|
|
181
|
+
- Time per task increases > 30%
|
|
182
|
+
- New failure modes appear
|
|
183
|
+
|
|
184
|
+
GREEN (no regression):
|
|
185
|
+
- All metrics within 2% of baseline
|
|
186
|
+
- No new failure modes
|
|
187
|
+
- Cost/time stable or improved
|
|
188
|
+
```
|
|
189
|
+
|
|
190
|
+
Rules:
|
|
191
|
+
- ALWAYS compare to a pinned baseline (not just previous run)
|
|
192
|
+
- Run regression suite before any agent config change ships
|
|
193
|
+
- Regression in EASY tasks is more alarming than regression in HARD tasks
|
|
194
|
+
- Store baseline with agent version (update baseline when intentionally accepting changes)
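
The RED/YELLOW/GREEN thresholds above can be sketched as a comparison against the pinned baseline. Two assumptions are worth flagging: percentage drops are read as absolute percentage points (0.05 means five points), and any change that matches neither a RED nor a YELLOW rule is reported as GREEN. `RunSummary` and `classify_regression` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class RunSummary:
    completion_rate: float          # fraction of all tasks completed
    medium_hard_pass_rate: float    # fraction of medium/hard tasks passed
    cost_per_task: float
    time_per_task: float
    harmful_rate: float
    violation_rate: float
    leakage_rate: float
    passing_easy_tasks: set = field(default_factory=set)
    failure_modes: set = field(default_factory=set)


def classify_regression(baseline: RunSummary,
                        current: RunSummary) -> tuple[str, list[str]]:
    """Return (status, reasons) for the current run versus the baseline."""
    red, yellow = [], []

    drop = baseline.completion_rate - current.completion_rate
    if drop > 0.05:
        red.append(f"completion rate dropped {drop:.1%}")
    elif drop >= 0.02:
        yellow.append(f"completion rate dropped {drop:.1%}")

    newly_failing = baseline.passing_easy_tasks - current.passing_easy_tasks
    if newly_failing:
        red.append(f"easy tasks now failing: {sorted(newly_failing)}")
    if current.cost_per_task > baseline.cost_per_task * 1.5:
        red.append("cost per task increased more than 50%")
    for name in ("harmful_rate", "violation_rate", "leakage_rate"):
        if getattr(current, name) > getattr(baseline, name):
            red.append(f"safety metric degraded: {name}")

    if baseline.medium_hard_pass_rate - current.medium_hard_pass_rate > 0.10:
        yellow.append("medium/hard pass rate dropped more than 10%")
    if current.time_per_task > baseline.time_per_task * 1.3:
        yellow.append("time per task increased more than 30%")
    new_modes = current.failure_modes - baseline.failure_modes
    if new_modes:
        yellow.append(f"new failure modes: {sorted(new_modes)}")

    if red:
        return "RED", red + yellow
    if yellow:
        return "YELLOW", yellow
    return "GREEN", []
```

Returning the reasons alongside the status lets a deployment gate print why it blocked, rather than only that it blocked.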
|
|
195
|
+
|
|
196
|
+
### Cost Efficiency Analysis
|
|
197
|
+
|
|
198
|
+
8. **Quality-per-dollar assessment:**
|
|
199
|
+
```
|
|
200
|
+
Cost Efficiency Ratio = quality_score / cost_per_task
|
|
201
|
+
|
|
202
|
+
Comparison framework:
|
|
203
|
+
- Agent A: quality=0.92, cost=$0.05/task → efficiency=18.4
|
|
204
|
+
- Agent B: quality=0.88, cost=$0.01/task → efficiency=88.0
|
|
205
|
+
|
|
206
|
+
Decision: Agent B is 4.8x more cost-efficient.
|
|
207
|
+
Choose A only if the 0.04 (4-point) quality gap causes real user-visible failures.
|
|
208
|
+
```
|
|
209
|
+
|
|
210
|
+
Rules:
|
|
211
|
+
- A cheaper model that achieves 95% of the quality at 20% of the cost is usually better
|
|
212
|
+
- Factor in retry cost (low first-attempt rate = hidden cost multiplier)
|
|
213
|
+
- Include tool call costs in total cost (API calls, compute)
|
|
214
|
+
- Report cost efficiency alongside raw quality (both matter)
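
A small sketch of the ratio and of the retry rule above. The retry model is an assumption: failed attempts are retried until one succeeds, each attempt succeeding independently with probability equal to the first-attempt rate, so the expected number of attempts is `1 / first_attempt_rate`. Function names are illustrative.

```python
def cost_efficiency(quality_score: float, cost_per_task: float) -> float:
    """Quality points per dollar: quality_score / cost_per_task."""
    if cost_per_task <= 0:
        raise ValueError("cost_per_task must be positive")
    return quality_score / cost_per_task


def effective_cost_per_task(cost_per_attempt: float,
                            first_attempt_rate: float,
                            tool_cost_per_attempt: float = 0.0) -> float:
    """Expected cost per completed task once retries and tool calls count.

    Assumes independent attempts with a constant success probability, so the
    expected attempt count is 1 / first_attempt_rate.
    """
    if not 0 < first_attempt_rate <= 1:
        raise ValueError("first_attempt_rate must be in (0, 1]")
    return (cost_per_attempt + tool_cost_per_attempt) / first_attempt_rate
```

With the figures from the comparison framework, `cost_efficiency(0.92, 0.05)` and `cost_efficiency(0.88, 0.01)` reproduce the 18.4 and 88.0 ratios up to floating-point rounding. A 50% first-attempt rate doubles the effective cost under this model.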
|
|
215
|
+
|
|
216
|
+
## Self-check before task completion
|
|
217
|
+
|
|
218
|
+
Before marking a task done when this skill was active:
|
|
219
|
+
|
|
220
|
+
- [ ] Did I define metrics across all four dimensions (correctness, quality, efficiency, safety)?
|
|
221
|
+
- [ ] Is the benchmark stratified by difficulty (easy/medium/hard)?
|
|
222
|
+
- [ ] Did I run multiple times (N >= 3) to measure variance?
|
|
223
|
+
- [ ] Is there a pinned baseline for regression detection?
|
|
224
|
+
- [ ] Are regression thresholds defined (RED/YELLOW/GREEN)?
|
|
225
|
+
- [ ] Did I report cost efficiency (quality/cost ratio), not just raw quality?
|
|
226
|
+
- [ ] Are easy-task failures treated as more alarming than hard-task failures?
|
|
227
|
+
- [ ] Are results appended to results.jsonl (never overwritten)?
|
|
@@ -0,0 +1,88 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: agent-introspection-debugging
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
min_mindforge_version: 10.0.3
|
|
5
|
+
status: stable
|
|
6
|
+
triggers: introspect, agent failure, reasoning failure, self-debug, agent stuck, hallucination, context overflow, reasoning trace, agent error, token waste, spinning
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
# Skill — Agent Introspection Debugging
|
|
10
|
+
|
|
11
|
+
## When this skill activates
|
|
12
|
+
When an agent is stuck, producing incorrect outputs, hallucinating, wasting
|
|
13
|
+
tokens on repeated failed attempts, or when reasoning quality has degraded.
|
|
14
|
+
|
|
15
|
+
## Mandatory actions when this skill is active
|
|
16
|
+
|
|
17
|
+
### The 4-Phase Self-Debug Protocol
|
|
18
|
+
|
|
19
|
+
**Phase 1 — Failure Capture**
|
|
20
|
+
Document exactly what went wrong:
|
|
21
|
+
- What was the agent trying to accomplish?
|
|
22
|
+
- What did it actually produce?
|
|
23
|
+
- What was the expected outcome?
|
|
24
|
+
- What context was available at the time?
|
|
25
|
+
- How many tokens/iterations were spent before failure was detected?
|
|
26
|
+
|
|
27
|
+
**Phase 2 — Diagnosis**
|
|
28
|
+
Identify WHY the reasoning failed:
|
|
29
|
+
|
|
30
|
+
| Failure Mode | Symptoms | Root Cause |
|
|
31
|
+
|-------------|----------|-----------|
|
|
32
|
+
| Context overflow | Repeating earlier mistakes, forgetting constraints | Context window exceeded, compaction lost key info |
|
|
33
|
+
| Hallucination | Confident claims about non-existent code/APIs | Insufficient grounding, no verification step |
|
|
34
|
+
| Loop spinning | Same action repeated 3+ times without progress | No exit condition, stuck-detection not triggered |
|
|
35
|
+
| Scope creep | Task expanding beyond original spec | Missing constraints, no scope boundary check |
|
|
36
|
+
| Stale context | Acting on outdated information | Context not refreshed, old file contents cached |
|
|
37
|
+
| Wrong persona | Security review giving UX advice | Persona mismatch, wrong skill loaded |
|
|
38
|
+
|
|
39
|
+
**Phase 3 — Contained Recovery**
|
|
40
|
+
Fix the problem WITHOUT expanding the blast radius:
|
|
41
|
+
1. Identify the MINIMUM change needed to recover
|
|
42
|
+
2. Do NOT restart from scratch unless absolutely necessary
|
|
43
|
+
3. Do NOT make speculative changes beyond the fix
|
|
44
|
+
4. Verify the recovery actually works (don't assume)
|
|
45
|
+
5. If recovery fails after 2 attempts: ESCALATE (do not keep trying)
|
|
46
|
+
|
|
47
|
+
**Phase 4 — Introspection Report**
|
|
48
|
+
Write structured output to `.planning/INTROSPECTION-[timestamp].md`:
|
|
49
|
+
```markdown
|
|
50
|
+
# Introspection Report
|
|
51
|
+
Date: [timestamp]
|
|
52
|
+
Session: [session-id]
|
|
53
|
+
Failure type: [from diagnosis table]
|
|
54
|
+
|
|
55
|
+
## What Happened
|
|
56
|
+
[1-2 sentences describing the failure]
|
|
57
|
+
|
|
58
|
+
## Root Cause
|
|
59
|
+
[Why this happened — be specific]
|
|
60
|
+
|
|
61
|
+
## Recovery Action
|
|
62
|
+
[What was done to fix it]
|
|
63
|
+
|
|
64
|
+
## Prevention
|
|
65
|
+
[What should change to prevent recurrence]
|
|
66
|
+
- [ ] Instinct to capture? [yes/no — if yes, create via learn-instinct]
|
|
67
|
+
- [ ] Skill gap? [yes/no — if yes, what skill is missing]
|
|
68
|
+
- [ ] Config change needed? [yes/no — what setting]
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
### Introspection Triggers
|
|
72
|
+
Automatically invoke this skill when:
|
|
73
|
+
- Stuck-detector fires (3 iterations, no progress)
|
|
74
|
+
- Token usage exceeds 3x estimate for a task
|
|
75
|
+
- Same error appears 2+ times in consecutive attempts
|
|
76
|
+
- User says "stop", "that's wrong", "you're stuck", "try again differently"
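
The four triggers above can be expressed as a single check. This is a hypothetical sketch: the function name, its arguments, and the returned labels are assumptions, and the phrase matching is a naive substring test that a real implementation would likely refine.

```python
from typing import Optional, Sequence

# Phrases taken from the trigger list above.
STOP_PHRASES = ("stop", "that's wrong", "you're stuck", "try again differently")


def introspection_trigger(iterations_without_progress: int,
                          tokens_used: int,
                          tokens_estimated: int,
                          recent_errors: Sequence[str],
                          user_message: str = "") -> Optional[str]:
    """Return the name of the first trigger that fires, or None."""
    if iterations_without_progress >= 3:
        return "stuck-detector"
    if tokens_estimated > 0 and tokens_used > 3 * tokens_estimated:
        return "token-overrun"
    # Same error in the two most recent consecutive attempts.
    if len(recent_errors) >= 2 and recent_errors[-1] == recent_errors[-2]:
        return "repeated-error"
    message = user_message.lower()
    if any(phrase in message for phrase in STOP_PHRASES):
        return "user-signal"
    return None
```

Returning the trigger name gives the Phase 1 failure capture a concrete starting point for what was detected.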
|
|
77
|
+
|
|
78
|
+
### During introspection
|
|
79
|
+
- PAUSE all other work — introspection is the priority
|
|
80
|
+
- Read recent AUDIT entries for context on what was attempted
|
|
81
|
+
- Check SHARED_TASK_NOTES.md for cross-iteration patterns
|
|
82
|
+
- Never blame external factors without evidence (check your own reasoning first)
|
|
83
|
+
|
|
84
|
+
### After introspection
|
|
85
|
+
- Log introspection event in AUDIT
|
|
86
|
+
- Consider whether this warrants a new instinct (via continuous-learning)
|
|
87
|
+
- Resume work only after recovery is verified
|
|
88
|
+
- If pattern repeats: escalate to user, do not keep self-debugging
|
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: agent-loops
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
min_mindforge_version: 10.0.3
|
|
5
|
+
status: stable
|
|
6
|
+
triggers: loop, circuit breaker, retry, fallback, agent loop, orchestration, self-repair, recovery, sequential execution, iteration, backoff, provider fallback
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
# Skill — Agent Loops
|
|
10
|
+
|
|
11
|
+
## When this skill activates
|
|
12
|
+
Any task involving repeated automated execution, retry logic, autonomous pipelines,
|
|
13
|
+
or self-repairing agent workflows. Also activates when implementing circuit breakers
|
|
14
|
+
or provider-aware fallback chains.
|
|
15
|
+
|
|
16
|
+
## Mandatory actions when this skill is active
|
|
17
|
+
|
|
18
|
+
### Before implementation
|
|
19
|
+
1. Define the loop's **termination condition** explicitly. No infinite loops without escape.
|
|
20
|
+
2. Set a **maximum iteration count** (default: 10 for code changes, 50 for data processing).
|
|
21
|
+
3. Identify the **checkpoint mechanism** — how will state be preserved between iterations?
|
|
22
|
+
|
|
23
|
+
### Loop Patterns
|
|
24
|
+
|
|
25
|
+
**Sequential Pipeline:**
|
|
26
|
+
```
|
|
27
|
+
Task 1 -> Task 2 -> Task 3 -> ... -> Complete
|
|
28
|
+
```
|
|
29
|
+
- Each task must succeed before the next starts
|
|
30
|
+
- On failure: log, checkpoint state, halt with context for resumption
|
|
31
|
+
- Use when: tasks have strict ordering dependencies
|
|
32
|
+
|
|
33
|
+
**Circuit Breaker Pattern:**
|
|
34
|
+
```
|
|
35
|
+
Attempt -> Success? -> Continue
|
|
36
|
+
| No
|
|
37
|
+
Failure count++
|
|
38
|
+
|
|
|
39
|
+
Count >= threshold?
|
|
40
|
+
| Yes
|
|
41
|
+
OPEN circuit -> wait -> half-open -> retry once
|
|
42
|
+
```
|
|
43
|
+
- Threshold: 3 consecutive failures opens the circuit
|
|
44
|
+
- Backoff: exponential (1s, 2s, 4s, 8s, max 60s)
|
|
45
|
+
- Half-open: after backoff, allow ONE request through
|
|
46
|
+
- If half-open succeeds: close circuit, resume normal operation
|
|
47
|
+
- If half-open fails: re-open circuit, double backoff
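
A minimal, single-threaded sketch of the circuit breaker described above. It assumes the wrapped operation signals failure by raising an exception, and takes an injectable clock so the timing can be tested without sleeping. `CircuitBreaker` and `CircuitOpenError` are illustrative names, not an existing MindForge API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, threshold: int = 3, base_backoff: float = 1.0,
                 max_backoff: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.base_backoff = base_backoff
        self.max_backoff = max_backoff
        self._clock = clock
        self._failures = 0                        # consecutive failures
        self._backoff = base_backoff              # current wait while open
        self._opened_at: Optional[float] = None   # None means closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._backoff:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            raise CircuitOpenError(f"circuit open; backoff {self._backoff}s")
        try:
            result = fn()
        except Exception:
            self._record_failure(was_half_open=(state == "half-open"))
            raise
        # Any success closes the circuit and resets the backoff.
        self._failures = 0
        self._backoff = self.base_backoff
        self._opened_at = None
        return result

    def _record_failure(self, was_half_open: bool) -> None:
        if was_half_open:
            # The probe failed: re-open and double the backoff, capped.
            self._backoff = min(self._backoff * 2, self.max_backoff)
            self._opened_at = self._clock()
            return
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()
```

Concurrent callers would need a lock around the state transitions so that only one probe passes in the half-open state; that is left out here to keep the sketch short.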
|
|
48
|
+
|
|
49
|
+
**Provider-Aware Fallback Chain:**
|
|
50
|
+
```
|
|
51
|
+
Primary Model -> Timeout/Error? -> Fallback Model -> Timeout/Error? -> Degrade gracefully
|
|
52
|
+
```
|
|
53
|
+
- Always try primary model first (respects cost-aware-routing tier)
|
|
54
|
+
- On timeout (>30s) or error: switch to fallback
|
|
55
|
+
- Fallback models: same tier or one tier down
|
|
56
|
+
- Log every fallback with reason in AUDIT
|
|
57
|
+
- Never silently degrade — always inform user of fallback
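
The fallback chain can be sketched as an ordered walk over provider callables. Everything named here is hypothetical: each provider is assumed to be a function taking `(prompt, timeout_s)` that enforces its own timeout by raising, `audit` stands in for the AUDIT log, and `notify` stands in for whatever channel informs the user. The final "degrade gracefully" step is left to the caller, which receives `AllProvidersFailed`.

```python
from typing import Callable, Sequence

Provider = tuple[str, Callable[[str, float], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when the primary and every fallback have failed."""


def call_with_fallback(prompt: str,
                       providers: Sequence[Provider],
                       timeout_s: float = 30.0,
                       audit: Callable[[str], None] = print,
                       notify: Callable[[str], None] = print) -> str:
    """Try providers in order, primary first; log and announce each fallback."""
    errors = []
    for index, (name, invoke) in enumerate(providers):
        try:
            result = invoke(prompt, timeout_s)
        except Exception as exc:  # includes timeouts raised by the provider
            reason = f"{type(exc).__name__}: {exc}"
            errors.append(f"{name} ({reason})")
            audit(f"fallback: {name} failed, reason: {reason}")
            continue
        if index > 0:
            # Never degrade silently: tell the user a fallback answered.
            notify(f"Response produced by fallback model {name}.")
        return result
    raise AllProvidersFailed("; ".join(errors))
```

Ordering `providers` by tier, primary first, is how the caller would express the "same tier or one tier down" rule.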
|
|
58
|
+
|
|
59
|
+
**Self-Repair Loop:**
|
|
60
|
+
```
|
|
61
|
+
Execute -> Verify -> Pass? -> Done
|
|
62
|
+
| No
|
|
63
|
+
Diagnose -> Fix -> Re-verify (max 3 attempts)
|
|
64
|
+
```
|
|
65
|
+
- After 3 failed self-repair attempts: STOP and escalate to user
|
|
66
|
+
- Each repair attempt must be DIFFERENT from the previous
|
|
67
|
+
- Log each diagnosis and attempted fix
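
A minimal sketch of the self-repair loop, under these assumptions: `verify` returns `None` on success or a problem description on failure, and `propose_fix` returns a `(label, apply)` pair, where the label identifies the fix so repeats can be rejected. All names are hypothetical.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")
Fix = tuple[str, Callable[[], None]]


class EscalationRequired(RuntimeError):
    """Raised when self-repair must stop and hand over to the user."""


def self_repair(execute: Callable[[], T],
                verify: Callable[[T], Optional[str]],
                propose_fix: Callable[[str, Sequence[str]], Fix],
                max_attempts: int = 3,
                log: Callable[[str], None] = print) -> T:
    """Execute, verify, and repair up to max_attempts times before escalating."""
    result = execute()
    problem = verify(result)
    tried: list[str] = []
    while problem is not None:
        if len(tried) >= max_attempts:
            raise EscalationRequired(
                f"{max_attempts} repair attempts failed; last problem: {problem}")
        label, apply_fix = propose_fix(problem, tuple(tried))
        if label in tried:
            # Each attempt must differ from the previous ones.
            raise EscalationRequired(f"fix {label!r} was already tried")
        log(f"attempt {len(tried) + 1}: diagnosed {problem!r}, applying {label!r}")
        tried.append(label)
        apply_fix()
        result = execute()
        problem = verify(result)
    return result
```

Passing the list of already-tried labels into `propose_fix` gives the diagnosis step what it needs to pick a different approach each time.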
|
|
68
|
+
|
|
69
|
+
### During implementation
|
|
70
|
+
- Every loop MUST have: max iterations, checkpoint logic, escalation path
|
|
71
|
+
- Never catch-and-swallow errors in loop bodies — always log with context
|
|
72
|
+
- Track iteration count in AUDIT entries
|
|
73
|
+
- Use SHARED_TASK_NOTES.md for cross-iteration context (see cross-iteration-bridge.md)
|
|
74
|
+
|
|
75
|
+
### After implementation
|
|
76
|
+
- Verify the loop terminates under all test conditions
|
|
77
|
+
- Verify the circuit breaker opens and closes correctly
|
|
78
|
+
- Confirm escalation path works (simulate max-retries-exceeded)
|
|
79
|
+
|
|
80
|
+
## Self-check before task completion
|
|
81
|
+
- [ ] Did I define explicit termination conditions for every loop?
|
|
82
|
+
- [ ] Did I set maximum iteration limits (no unbounded loops)?
|
|
83
|
+
- [ ] Did I implement checkpoint/state persistence between iterations?
|
|
84
|
+
- [ ] Did I verify the escalation path works when max retries are exceeded?
|