mindforge-cc 10.0.3 → 11.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.mindforge/MINDFORGE-V2-SCHEMA.json +43 -10
- package/.mindforge/config.json +30 -2
- package/.mindforge/engine/cross-model-eval.md +74 -0
- package/.mindforge/engine/proactive/signal-detector.md +60 -0
- package/.mindforge/engine/proactive/suggestion-engine.md +100 -0
- package/.mindforge/personas/agent-architect.md +57 -0
- package/.mindforge/personas/agent-evaluator.md +162 -0
- package/.mindforge/personas/agent-memory-designer.md +157 -0
- package/.mindforge/personas/agent-ops-engineer.md +120 -0
- package/.mindforge/personas/agent-orchestrator.md +112 -0
- package/.mindforge/personas/ai-economist.md +57 -0
- package/.mindforge/personas/ai-safety-engineer.md +57 -0
- package/.mindforge/personas/analytics-engineer.md +57 -0
- package/.mindforge/personas/anti-pattern-hunter.md +61 -0
- package/.mindforge/personas/api-gateway-designer.md +132 -0
- package/.mindforge/personas/auth-engineer.md +112 -0
- package/.mindforge/personas/build-engineer.md +57 -0
- package/.mindforge/personas/business-analyst.md +56 -0
- package/.mindforge/personas/cache-architect.md +100 -0
- package/.mindforge/personas/causal-scientist.md +57 -0
- package/.mindforge/personas/cdn-architect.md +118 -0
- package/.mindforge/personas/change-agent.md +104 -0
- package/.mindforge/personas/code-narrator.md +52 -0
- package/.mindforge/personas/codegen-specialist.md +68 -0
- package/.mindforge/personas/communication-architect.md +102 -0
- package/.mindforge/personas/compliance-engineer.md +96 -0
- package/.mindforge/personas/consensus-engineer.md +116 -0
- package/.mindforge/personas/contract-tester.md +60 -192
- package/.mindforge/personas/data-architect.md +108 -0
- package/.mindforge/personas/data-mesh-architect.md +57 -0
- package/.mindforge/personas/data-pipeline-architect.md +120 -0
- package/.mindforge/personas/de-sloppifier.md +60 -0
- package/.mindforge/personas/debt-manager.md +66 -0
- package/.mindforge/personas/decision-architect.md +82 -51
- package/.mindforge/personas/deployment-captain.md +74 -0
- package/.mindforge/personas/design-system-lead.md +112 -0
- package/.mindforge/personas/dmux-orchestrator.md +75 -0
- package/.mindforge/personas/dx-engineer.md +96 -0
- package/.mindforge/personas/ecommerce-engineer.md +57 -0
- package/.mindforge/personas/edge-engineer.md +94 -0
- package/.mindforge/personas/edtech-architect.md +106 -0
- package/.mindforge/personas/embedding-architect.md +57 -0
- package/.mindforge/personas/environment-engineer.md +57 -0
- package/.mindforge/personas/eval-judge.md +55 -0
- package/.mindforge/personas/event-architect.md +102 -0
- package/.mindforge/personas/experiment-designer.md +138 -0
- package/.mindforge/personas/feature-store-engineer.md +57 -0
- package/.mindforge/personas/finops-analyst.md +66 -0
- package/.mindforge/personas/fintech-architect.md +57 -0
- package/.mindforge/personas/flutter-engineer.md +104 -0
- package/.mindforge/personas/gaming-engineer.md +57 -0
- package/.mindforge/personas/graphql-designer.md +73 -0
- package/.mindforge/personas/healthcare-engineer.md +57 -0
- package/.mindforge/personas/hiring-strategist.md +105 -0
- package/.mindforge/personas/hitl-architect.md +165 -0
- package/.mindforge/personas/i18n-architect.md +69 -0
- package/.mindforge/personas/iot-architect.md +105 -0
- package/.mindforge/personas/knowledge-curator.md +139 -0
- package/.mindforge/personas/knowledge-engineer.md +57 -0
- package/.mindforge/personas/lakehouse-architect.md +57 -0
- package/.mindforge/personas/llm-orchestrator.md +57 -0
- package/.mindforge/personas/logistics-architect.md +106 -0
- package/.mindforge/personas/market-analyst.md +53 -0
- package/.mindforge/personas/marketplace-engineer.md +105 -0
- package/.mindforge/personas/mcp-designer.md +54 -0
- package/.mindforge/personas/meeting-designer.md +104 -0
- package/.mindforge/personas/mentorship-lead.md +106 -0
- package/.mindforge/personas/migration-architect.md +57 -0
- package/.mindforge/personas/ml-ops-engineer.md +101 -0
- package/.mindforge/personas/mobile-architect.md +105 -0
- package/.mindforge/personas/mobile-security-engineer.md +106 -0
- package/.mindforge/personas/multi-tenancy-architect.md +71 -0
- package/.mindforge/personas/multimodal-engineer.md +57 -0
- package/.mindforge/personas/offline-specialist.md +105 -0
- package/.mindforge/personas/onboarding-navigator.md +63 -0
- package/.mindforge/personas/payments-engineer.md +135 -0
- package/.mindforge/personas/pipeline-engineer.md +115 -0
- package/.mindforge/personas/platform-engineer.md +97 -0
- package/.mindforge/personas/platform-lead.md +57 -0
- package/.mindforge/personas/privacy-engineer.md +57 -0
- package/.mindforge/personas/product-owner.md +56 -0
- package/.mindforge/personas/productivity-analyst.md +57 -0
- package/.mindforge/personas/prompt-architect.md +101 -0
- package/.mindforge/personas/proofreader.md +53 -0
- package/.mindforge/personas/pwa-architect.md +105 -0
- package/.mindforge/personas/quality-scorer.md +63 -0
- package/.mindforge/personas/react-native-engineer.md +106 -0
- package/.mindforge/personas/resilience-engineer.md +69 -0
- package/.mindforge/personas/rfc-architect.md +64 -0
- package/.mindforge/personas/saga-orchestrator.md +80 -0
- package/.mindforge/personas/secrets-engineer.md +57 -0
- package/.mindforge/personas/skill-smith.md +79 -0
- package/.mindforge/personas/sre-lead.md +107 -0
- package/.mindforge/personas/stream-engineer.md +57 -0
- package/.mindforge/personas/streaming-engineer.md +64 -0
- package/.mindforge/personas/swarm-templates.json +674 -44
- package/.mindforge/personas/system-designer.md +57 -0
- package/.mindforge/personas/team-coach.md +120 -0
- package/.mindforge/personas/tech-lead-coach.md +103 -0
- package/.mindforge/personas/technical-writer-lead.md +111 -0
- package/.mindforge/personas/vibe-checker.md +75 -0
- package/.mindforge/personas/worktree-manager.md +56 -0
- package/.mindforge/personas/zero-trust-engineer.md +113 -0
- package/.mindforge/skills/a11y-testing/SKILL.md +143 -0
- package/.mindforge/skills/agent-evaluation-framework/SKILL.md +227 -0
- package/.mindforge/skills/agent-memory-design/SKILL.md +199 -0
- package/.mindforge/skills/agent-orchestration-patterns/SKILL.md +129 -0
- package/.mindforge/skills/agent-tool-selection/SKILL.md +204 -0
- package/.mindforge/skills/ai-agent-deployment/SKILL.md +176 -0
- package/.mindforge/skills/ai-cost-management/SKILL.md +57 -0
- package/.mindforge/skills/ai-safety-alignment/SKILL.md +53 -0
- package/.mindforge/skills/analytics-instrumentation/SKILL.md +172 -0
- package/.mindforge/skills/api-gateway-patterns/SKILL.md +177 -0
- package/.mindforge/skills/api-marketplace/SKILL.md +56 -0
- package/.mindforge/skills/api-versioning/SKILL.md +100 -0
- package/.mindforge/skills/app-store-deployment/SKILL.md +44 -0
- package/.mindforge/skills/architecture-tradeoff-analysis/SKILL.md +97 -0
- package/.mindforge/skills/audit-logging/SKILL.md +140 -0
- package/.mindforge/skills/auth-patterns/SKILL.md +148 -0
- package/.mindforge/skills/autonomous-agent-harness/SKILL.md +218 -0
- package/.mindforge/skills/autonomous-agents/SKILL.md +59 -0
- package/.mindforge/skills/build-system-optimization/SKILL.md +54 -0
- package/.mindforge/skills/build-vs-buy/SKILL.md +80 -0
- package/.mindforge/skills/bundle-optimization/SKILL.md +174 -0
- package/.mindforge/skills/business-analyst/SKILL.md +82 -0
- package/.mindforge/skills/caching-strategies/SKILL.md +132 -0
- package/.mindforge/skills/capacity-planning/SKILL.md +96 -0
- package/.mindforge/skills/causal-inference/SKILL.md +42 -0
- package/.mindforge/skills/cdn-optimization/SKILL.md +212 -0
- package/.mindforge/skills/change-management/SKILL.md +106 -0
- package/.mindforge/skills/chaos-engineering/SKILL.md +99 -0
- package/.mindforge/skills/ci-cd-pipeline/SKILL.md +118 -0
- package/.mindforge/skills/cli-design/SKILL.md +118 -0
- package/.mindforge/skills/code-generation-patterns/SKILL.md +92 -0
- package/.mindforge/skills/code-review-methodology/SKILL.md +180 -0
- package/.mindforge/skills/code-tour/SKILL.md +145 -0
- package/.mindforge/skills/codebase-onboarding/SKILL.md +95 -0
- package/.mindforge/skills/compliance-as-code/SKILL.md +195 -0
- package/.mindforge/skills/conflict-resolution/SKILL.md +87 -0
- package/.mindforge/skills/connection-pooling/SKILL.md +151 -0
- package/.mindforge/skills/container-security/SKILL.md +151 -0
- package/.mindforge/skills/context-engineering/SKILL.md +114 -0
- package/.mindforge/skills/contract-testing/SKILL.md +85 -0
- package/.mindforge/skills/cost-estimation/SKILL.md +82 -0
- package/.mindforge/skills/cqrs-event-sourcing/SKILL.md +95 -0
- package/.mindforge/skills/cross-platform-testing/SKILL.md +43 -0
- package/.mindforge/skills/data-governance/SKILL.md +42 -0
- package/.mindforge/skills/data-lakehouse/SKILL.md +42 -0
- package/.mindforge/skills/data-mesh/SKILL.md +42 -0
- package/.mindforge/skills/data-modeling/SKILL.md +107 -0
- package/.mindforge/skills/data-pipeline-design/SKILL.md +171 -0
- package/.mindforge/skills/data-privacy-engineering/SKILL.md +42 -0
- package/.mindforge/skills/database-performance/SKILL.md +174 -0
- package/.mindforge/skills/database-sharding-advanced/SKILL.md +206 -0
- package/.mindforge/skills/de-sloppify/SKILL.md +120 -0
- package/.mindforge/skills/defense-in-depth/SKILL.md +84 -0
- package/.mindforge/skills/delegation-patterns/SKILL.md +123 -0
- package/.mindforge/skills/dependency-management/SKILL.md +94 -0
- package/.mindforge/skills/deployment-workflow/SKILL.md +135 -0
- package/.mindforge/skills/design-system/SKILL.md +113 -0
- package/.mindforge/skills/developer-onboarding/SKILL.md +99 -0
- package/.mindforge/skills/developer-productivity-metrics/SKILL.md +59 -0
- package/.mindforge/skills/distributed-consensus/SKILL.md +141 -0
- package/.mindforge/skills/dmux-workflows/SKILL.md +141 -0
- package/.mindforge/skills/dns-architecture/SKILL.md +167 -0
- package/.mindforge/skills/ecommerce-architecture/SKILL.md +41 -0
- package/.mindforge/skills/edge-computing/SKILL.md +91 -0
- package/.mindforge/skills/edtech-platform/SKILL.md +41 -0
- package/.mindforge/skills/email-deliverability/SKILL.md +177 -0
- package/.mindforge/skills/embedding-systems/SKILL.md +55 -0
- package/.mindforge/skills/environment-management/SKILL.md +54 -0
- package/.mindforge/skills/error-handling-architecture/SKILL.md +118 -0
- package/.mindforge/skills/estimation-techniques/SKILL.md +113 -0
- package/.mindforge/skills/eval-harness/SKILL.md +180 -0
- package/.mindforge/skills/event-driven-architecture/SKILL.md +162 -0
- package/.mindforge/skills/experiment-design/SKILL.md +139 -0
- package/.mindforge/skills/experiment-platform/SKILL.md +43 -0
- package/.mindforge/skills/feature-engineering/SKILL.md +42 -0
- package/.mindforge/skills/feature-flag-management/SKILL.md +183 -0
- package/.mindforge/skills/fine-tuning-workflow/SKILL.md +189 -0
- package/.mindforge/skills/fintech-patterns/SKILL.md +41 -0
- package/.mindforge/skills/flutter-architecture/SKILL.md +42 -0
- package/.mindforge/skills/gaming-backend/SKILL.md +41 -0
- package/.mindforge/skills/git-workflow-design/SKILL.md +129 -0
- package/.mindforge/skills/graceful-degradation/SKILL.md +95 -0
- package/.mindforge/skills/graphql-patterns/SKILL.md +243 -0
- package/.mindforge/skills/guardrails-and-safety/SKILL.md +137 -0
- package/.mindforge/skills/healthcare-systems/SKILL.md +40 -0
- package/.mindforge/skills/hiring-engineering/SKILL.md +119 -0
- package/.mindforge/skills/human-in-the-loop-design/SKILL.md +234 -0
- package/.mindforge/skills/i18n-architecture/SKILL.md +147 -0
- package/.mindforge/skills/idempotency-patterns/SKILL.md +84 -0
- package/.mindforge/skills/incident-communication/SKILL.md +96 -0
- package/.mindforge/skills/incident-management/SKILL.md +97 -0
- package/.mindforge/skills/infrastructure-as-code/SKILL.md +98 -0
- package/.mindforge/skills/instinct-clustering/SKILL.md +190 -0
- package/.mindforge/skills/internal-developer-platform/SKILL.md +51 -0
- package/.mindforge/skills/iot-platform/SKILL.md +41 -0
- package/.mindforge/skills/k8s-deployment/SKILL.md +358 -0
- package/.mindforge/skills/knowledge-graphs/SKILL.md +56 -0
- package/.mindforge/skills/knowledge-sharing-systems/SKILL.md +112 -0
- package/.mindforge/skills/llm-cost-optimization/SKILL.md +198 -0
- package/.mindforge/skills/llm-orchestration/SKILL.md +56 -0
- package/.mindforge/skills/load-testing/SKILL.md +84 -0
- package/.mindforge/skills/logistics-optimization/SKILL.md +40 -0
- package/.mindforge/skills/market-researcher/SKILL.md +99 -0
- package/.mindforge/skills/marketplace-trust/SKILL.md +40 -0
- package/.mindforge/skills/mcp-server-patterns/SKILL.md +264 -0
- package/.mindforge/skills/media-streaming/SKILL.md +41 -0
- package/.mindforge/skills/meeting-architecture/SKILL.md +146 -0
- package/.mindforge/skills/mentoring-patterns/SKILL.md +77 -0
- package/.mindforge/skills/microservices-patterns/SKILL.md +83 -0
- package/.mindforge/skills/migration-platform/SKILL.md +61 -0
- package/.mindforge/skills/migration-strategies/SKILL.md +129 -0
- package/.mindforge/skills/ml-feature-store/SKILL.md +56 -0
- package/.mindforge/skills/ml-monitoring/SKILL.md +42 -0
- package/.mindforge/skills/mobile-performance/SKILL.md +44 -0
- package/.mindforge/skills/mobile-security/SKILL.md +45 -0
- package/.mindforge/skills/model-evaluation/SKILL.md +53 -0
- package/.mindforge/skills/monorepo-management/SKILL.md +100 -0
- package/.mindforge/skills/multi-tenancy-patterns/SKILL.md +145 -0
- package/.mindforge/skills/multi-turn-conversation-design/SKILL.md +206 -0
- package/.mindforge/skills/multimodal-ai/SKILL.md +51 -0
- package/.mindforge/skills/mutation-testing/SKILL.md +97 -0
- package/.mindforge/skills/notification-system-design/SKILL.md +168 -0
- package/.mindforge/skills/observability-stack/SKILL.md +136 -0
- package/.mindforge/skills/offline-first-design/SKILL.md +43 -0
- package/.mindforge/skills/on-call-design/SKILL.md +111 -0
- package/.mindforge/skills/pagination-patterns/SKILL.md +230 -0
- package/.mindforge/skills/payment-integration/SKILL.md +176 -0
- package/.mindforge/skills/performance-reviews/SKILL.md +140 -0
- package/.mindforge/skills/platform-observability/SKILL.md +58 -0
- package/.mindforge/skills/platform-reliability/SKILL.md +52 -0
- package/.mindforge/skills/post-incident-learning/SKILL.md +96 -0
- package/.mindforge/skills/product-manager/SKILL.md +104 -0
- package/.mindforge/skills/progressive-web-app/SKILL.md +44 -0
- package/.mindforge/skills/prompt-engineering/SKILL.md +94 -0
- package/.mindforge/skills/proofreader/SKILL.md +158 -0
- package/.mindforge/skills/push-notification-architecture/SKILL.md +45 -0
- package/.mindforge/skills/python-performance/SKILL.md +183 -0
- package/.mindforge/skills/quality-audit/SKILL.md +171 -0
- package/.mindforge/skills/queue-design/SKILL.md +85 -0
- package/.mindforge/skills/rag-architecture/SKILL.md +176 -0
- package/.mindforge/skills/rate-limiting-design/SKILL.md +94 -0
- package/.mindforge/skills/react-native-patterns/SKILL.md +42 -0
- package/.mindforge/skills/react-performance/SKILL.md +229 -0
- package/.mindforge/skills/real-time-analytics/SKILL.md +42 -0
- package/.mindforge/skills/real-time-sync/SKILL.md +83 -0
- package/.mindforge/skills/responsive-native/SKILL.md +44 -0
- package/.mindforge/skills/responsive-patterns/SKILL.md +141 -0
- package/.mindforge/skills/rfc-pipeline/SKILL.md +114 -0
- package/.mindforge/skills/saas-multi-tenant/SKILL.md +41 -0
- package/.mindforge/skills/santa-method/SKILL.md +134 -0
- package/.mindforge/skills/search-implementation/SKILL.md +98 -0
- package/.mindforge/skills/secrets-platform/SKILL.md +56 -0
- package/.mindforge/skills/secrets-rotation/SKILL.md +173 -0
- package/.mindforge/skills/self-serve-infrastructure/SKILL.md +51 -0
- package/.mindforge/skills/serverless-patterns/SKILL.md +119 -0
- package/.mindforge/skills/skill-creator-meta/SKILL.md +146 -0
- package/.mindforge/skills/sprint-retrospective-facilitation/SKILL.md +112 -0
- package/.mindforge/skills/stakeholder-communication/SKILL.md +85 -0
- package/.mindforge/skills/state-management/SKILL.md +104 -0
- package/.mindforge/skills/stream-processing/SKILL.md +43 -0
- package/.mindforge/skills/streaming-architecture/SKILL.md +81 -0
- package/.mindforge/skills/supply-chain-security/SKILL.md +145 -0
- package/.mindforge/skills/synthetic-data-generation/SKILL.md +52 -0
- package/.mindforge/skills/system-design/SKILL.md +88 -0
- package/.mindforge/skills/team-topology-design/SKILL.md +107 -0
- package/.mindforge/skills/technical-debt-management/SKILL.md +86 -0
- package/.mindforge/skills/technical-interview-design/SKILL.md +98 -0
- package/.mindforge/skills/technical-leadership/SKILL.md +75 -0
- package/.mindforge/skills/technical-writing/SKILL.md +237 -0
- package/.mindforge/skills/technology-radar/SKILL.md +88 -0
- package/.mindforge/skills/testing-anti-patterns/SKILL.md +288 -0
- package/.mindforge/skills/tool-design/SKILL.md +138 -0
- package/.mindforge/skills/typescript-advanced/SKILL.md +198 -0
- package/.mindforge/skills/using-git-worktrees/SKILL.md +139 -0
- package/.mindforge/skills/verification-loop/SKILL.md +13 -1
- package/.mindforge/skills/vibe-security/SKILL.md +165 -0
- package/.mindforge/skills/visual-regression-testing/SKILL.md +97 -0
- package/.mindforge/skills/websocket-patterns/SKILL.md +203 -0
- package/.mindforge/skills/writing-plans/SKILL.md +170 -0
- package/.mindforge/skills/writing-skills/SKILL.md +216 -0
- package/.mindforge/skills/zero-trust-architecture/SKILL.md +166 -0
- package/CHANGELOG.md +240 -0
- package/MINDFORGE.md +4 -4
- package/README.md +49 -4
- package/RELEASENOTES.md +80 -0
- package/SECURITY.md +20 -8
- package/bin/autonomous/audit-writer.js +13 -0
- package/bin/autonomous/auto-runner.js +74 -16
- package/bin/autonomous/context-refactorer.js +26 -11
- package/bin/autonomous/state-manager.js +62 -6
- package/bin/autonomous/stuck-monitor.js +46 -7
- package/bin/autonomous/wave-executor.js +66 -25
- package/bin/dashboard/api-router.js +43 -0
- package/bin/dashboard/metrics-aggregator.js +28 -1
- package/bin/dashboard/server.js +67 -4
- package/bin/dashboard/sse-bridge.js +4 -4
- package/bin/engine/feedback-loop.js +8 -0
- package/bin/engine/intelligence-interlock.js +32 -15
- package/bin/engine/logic-drift-detector.js +2 -1
- package/bin/engine/nexus-tracer.js +3 -2
- package/bin/engine/remediation-engine.js +155 -32
- package/bin/engine/self-corrective-synthesizer.js +84 -10
- package/bin/engine/sre-manager.js +12 -4
- package/bin/engine/temporal-hub.js +131 -34
- package/bin/governance/approve.js +41 -5
- package/bin/governance/impact-analyzer.js +28 -0
- package/bin/governance/policy-engine.js +10 -3
- package/bin/governance/quantum-crypto.js +32 -19
- package/bin/governance/rbac-manager.js +74 -2
- package/bin/governance/ztai-manager.js +49 -7
- package/bin/hindsight-injector.js +3 -3
- package/bin/memory/eis-client.js +71 -34
- package/bin/memory/embedding-engine.js +61 -0
- package/bin/memory/knowledge-graph.js +58 -5
- package/bin/memory/knowledge-indexer.js +53 -6
- package/bin/memory/knowledge-store.js +22 -0
- package/bin/migrations/10.7.0-to-11.0.0.js +110 -0
- package/bin/migrations/schema-versions.js +13 -0
- package/bin/models/anthropic-provider.js +45 -0
- package/bin/models/cloud-broker.js +68 -20
- package/bin/models/gemini-provider.js +51 -0
- package/bin/models/model-client.js +20 -0
- package/bin/models/model-router.js +28 -8
- package/bin/models/openai-provider.js +44 -0
- package/bin/utils/file-io.js +63 -1
- package/bin/utils/index.js +58 -0
- package/docs/getting-started.md +1 -1
- package/docs/user-guide.md +2 -2
- package/package.json +2 -2
- package/.mindforge/personas/data-privacy-engineer.md +0 -187
@@ -0,0 +1,143 @@
---
name: a11y-testing
version: 1.0.0
min_mindforge_version: 0.3.0
status: stable
triggers: a11y testing, axe-core, automated accessibility, screen reader testing, keyboard navigation audit, WCAG compliance test, aria validation, focus management test, color contrast check, accessibility CI, accessibility report, assistive technology
compose: accessibility
---

# Skill — Accessibility Testing

## When this skill activates
Any task involving accessibility testing, WCAG compliance, screen reader validation,
keyboard navigation audits, or automated a11y CI pipelines.

## Mandatory actions when this skill is active

### Before testing accessibility
1. Identify the target WCAG conformance level (A, AA, or AAA).
2. Determine which automated tools are available in the project.
3. Plan manual testing scenarios for what automation cannot catch.

### Automated testing (~30% of issues)

**Unit level (jest-axe):**
```javascript
import { axe, toHaveNoViolations } from 'jest-axe';
expect.extend(toHaveNoViolations);

it('has no accessibility violations', async () => {
  const { container } = render(<Component />);
  const results = await axe(container);
  expect(results).toHaveNoViolations();
});
```

**Integration level (Playwright + axe):**
```javascript
import AxeBuilder from '@axe-core/playwright';

test('page has no a11y violations', async ({ page }) => {
  await page.goto('/dashboard');
  const results = await new AxeBuilder({ page }).analyze();
  expect(results.violations).toEqual([]);
});
```

**CI pipeline:**
- Run axe-core on every PR against all critical routes.
- Fail the build on any "critical" or "serious" violations.
- Report "moderate" violations as warnings (fix in next sprint).
- Track violation count over time — must trend downward.
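
The CI rules above could be enforced with a small gate over the `violations` array that an axe run returns. This is a minimal sketch, assuming each violation carries axe's `id` and `impact` fields. The function name `gateViolations` and the shape of its return value are hypothetical and not part of the package.

```javascript
// Hypothetical CI gate: splits axe violations into build-blocking and warning sets.
// Assumes each entry has axe's `impact` field: 'minor' | 'moderate' | 'serious' | 'critical'.
const BLOCKING = new Set(['critical', 'serious']);

function gateViolations(violations) {
  const blocking = violations.filter((v) => BLOCKING.has(v.impact));
  const warnings = violations.filter((v) => v.impact === 'moderate');
  return {
    blocking,
    warnings,
    // A non-zero exit code fails the build when critical or serious violations exist.
    exitCode: blocking.length > 0 ? 1 : 0,
  };
}

// Example input shaped like `results.violations` from an axe run.
const sample = [
  { id: 'color-contrast', impact: 'serious' },
  { id: 'region', impact: 'moderate' },
  { id: 'image-alt', impact: 'critical' },
];
const gate = gateViolations(sample);
console.log(gate.exitCode, gate.blocking.map((v) => v.id), gate.warnings.map((v) => v.id));
```

A CI script could pass `gate.exitCode` to `process.exit` and print `gate.warnings` without failing the job.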

### Manual testing checklist

**Keyboard navigation:**
- [ ] Tab through ALL interactive elements in logical order.
- [ ] Shift+Tab moves backwards correctly.
- [ ] Enter/Space activates buttons and links.
- [ ] Escape closes modals, dropdowns, popovers.
- [ ] Arrow keys navigate within composite widgets (tabs, menus, grids).
- [ ] No keyboard traps (can always Tab out, except modals).
- [ ] Focus indicator is clearly visible on all elements.

**Screen reader testing:**
- [ ] VoiceOver (macOS/iOS) — full user flow.
- [ ] NVDA (Windows) — full user flow.
- [ ] All images have meaningful alt text (or alt="" for decorative).
- [ ] Form inputs have associated labels.
- [ ] Dynamic content changes are announced (aria-live regions).
- [ ] Headings form a logical hierarchy (h1 > h2 > h3, no skips).

**Visual testing:**
- [ ] Zoom to 200% — no horizontal scroll, no overlapping content.
- [ ] Zoom to 400% — content still readable and usable.
- [ ] High contrast mode — all content visible.
- [ ] Reduced motion — animations respect prefers-reduced-motion.

### WCAG conformance levels

**Level A (minimum, always required):**
- All non-text content has text alternative.
- Content is navigable by keyboard.
- No content causes seizures.

**Level AA (target for most applications):**
- Color contrast ratio 4.5:1 for normal text, 3:1 for large text.
- Text can be resized to 200% without loss of content.
- Focus order is meaningful and logical.
- Error messages identify the field and suggest correction.

**Level AAA (specialized — not typically a blanket requirement):**
- Color contrast ratio 7:1 for normal text.
- Sign language interpretation for media.
- Reading level accommodations.

### Focus management

**Modal dialogs:**
- Move focus into the modal when opened.
- Trap focus within the modal (Tab cycles inside).
- Return focus to the trigger element when closed.

**Route changes (SPA):**
- Move focus to the main content heading on navigation.
- Announce the new page to screen readers (aria-live or document.title).

**Dynamic content:**
- New content added below the current focus: no announcement needed.
- New content that requires attention: use aria-live="polite".
- Urgent alerts: use aria-live="assertive" (sparingly).

**Skip links:**
- First focusable element should be "Skip to main content."
- Links to bypass repetitive navigation blocks.

### Color contrast

**Tools:**
- Browser DevTools (Accessibility panel shows contrast ratios).
- axe-core catches contrast violations automatically.
- Contrast checker plugins for design tools (Figma, Sketch).

**Ratios:**
- Normal text (< 18pt or < 14pt bold, roughly 24px and 18.66px): minimum 4.5:1.
- Large text (>= 18pt or >= 14pt bold): minimum 3:1.
- UI components and graphical objects: minimum 3:1.
- Never convey information by color alone (add icons, patterns, text).
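
The ratios listed above come from the WCAG 2.x contrast formula, which compares the relative luminance of the two colors. The sketch below shows that calculation for `#rrggbb` colors. The helper names (`linearize`, `relativeLuminance`, `contrastRatio`, `meetsAA`) are illustrative and not taken from the package or from axe-core.

```javascript
// Sketch of the WCAG 2.x contrast-ratio calculation; helper names are illustrative.

// Convert one 8-bit sRGB channel to its linear-light value.
function linearize(channel) {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a '#rrggbb' color.
function relativeLuminance(hex) {
  const n = parseInt(hex.slice(1), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground, background) {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// Check a color pair against the AA minimums: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(foreground, background, isLargeText = false) {
  return contrastRatio(foreground, background) >= (isLargeText ? 3 : 4.5);
}

console.log(contrastRatio('#000000', '#ffffff').toFixed(2));
console.log(meetsAA('#767676', '#ffffff'), meetsAA('#777777', '#ffffff'));
```

The two grays in the last line sit on either side of the 4.5:1 threshold against white, which makes them convenient boundary cases for a contrast check.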

### Reporting format

When reporting accessibility issues, include:
1. **What** — the specific WCAG criterion violated.
2. **Where** — page URL and element selector/description.
3. **Impact** — who is affected and how severely.
4. **Fix** — specific remediation recommendation.
5. **Priority** — critical (blocks usage) / serious (difficult) / moderate (inconvenient).
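
One way to keep reports consistent is to reject entries that omit any of the five fields. The following is a minimal sketch of that check. The field keys, the `buildIssueReport` name, and the plain-text output layout are assumptions made for illustration.

```javascript
// Hypothetical report-entry builder; field keys mirror the five items listed above.
const PRIORITIES = ['critical', 'serious', 'moderate'];
const REQUIRED = ['what', 'where', 'impact', 'fix', 'priority'];

function buildIssueReport(entry) {
  // Reject entries that leave out any required field.
  const missing = REQUIRED.filter((key) => !entry[key]);
  if (missing.length > 0) {
    throw new Error(`Missing report fields: ${missing.join(', ')}`);
  }
  if (!PRIORITIES.includes(entry.priority)) {
    throw new Error(`Unknown priority: ${entry.priority}`);
  }
  // Render one "FIELD: value" line per item, in the listed order.
  return REQUIRED.map((key) => `${key.toUpperCase()}: ${entry[key]}`).join('\n');
}

const report = buildIssueReport({
  what: 'WCAG 1.4.3 Contrast (Minimum)',
  where: '/dashboard, button.submit',
  impact: 'Low-vision users cannot read the button label',
  fix: 'Darken the label color to reach 4.5:1 against the background',
  priority: 'serious',
});
console.log(report);
```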

## Self-check before task completion
- [ ] Did I follow the mandatory actions for this skill?
- [ ] Did I apply the patterns appropriate to the context?
- [ ] Did I verify the implementation meets the criteria above?
- [ ] Did I document decisions and trade-offs made?
@@ -0,0 +1,227 @@
---
name: agent-evaluation-framework
version: 1.0.0
min_mindforge_version: 10.0.4
status: stable
triggers: agent evaluation, task completion rate, agent benchmark, reasoning quality, tool selection accuracy, agent cost efficiency, end-to-end agent test, agent regression, agent quality score, agent performance metric, evaluation harness design, agent grading
compose: eval-harness
---

# Skill — Agent Evaluation Framework (End-to-End Agent Performance Measurement)

## When this skill activates
When measuring agent performance, designing agent benchmarks, tracking quality
regressions, or comparing agent configurations. Use for any scenario where you
need to answer: "Is this agent good enough?" or "Did this change make the agent
better or worse?"

Core principle: **Multi-dimensional quality** — agent quality is not a single number.
A fast agent that's wrong is worse than a slow agent that's right. A cheap agent
that hallucinates is worse than an expensive agent that's accurate. Measure ALL
dimensions that matter.

## Mandatory actions when this skill is active

### Metric Taxonomy

1. **Core agent metrics (measure ALL of these):**
```
Correctness metrics:
  - Task completion rate: % of tasks completed successfully (end-to-end)
  - First-attempt success rate: % completed without retry or correction
  - Factual accuracy: % of claims that are verifiable and correct
  - Instruction adherence: % of explicit instructions followed correctly

Efficiency metrics:
  - Cost per task: total API spend / successful completions
  - Tokens per task: input + output tokens consumed
  - Time per task: wall-clock time from task start to completion
  - Tool calls per task: number of tool invocations (fewer = more efficient)

Quality metrics:
  - Reasoning quality score: rubric-based assessment of reasoning chain
  - Tool selection accuracy: % of tool calls that were appropriate
  - Output quality score: rubric-based assessment of final output
  - Hallucination rate: % of outputs containing ungrounded claims

Safety metrics:
  - Harmful output rate: % of outputs flagged by safety classifiers
  - Permission violation rate: % of actions exceeding authorized scope
  - Information leakage rate: % of outputs exposing sensitive data
```
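
Several of these metrics are simple ratios over per-run records. The sketch below aggregates four of them, assuming a non-empty list of runs. The record field names (`completed`, `firstAttempt`, `costUsd`, `toolCalls`, `appropriateToolCalls`) and the `summarizeRuns` helper are hypothetical and chosen only for illustration.

```javascript
// Sketch: aggregate a few taxonomy metrics from per-run records (assumed field names).
function summarizeRuns(runs) {
  const total = runs.length;
  const completed = runs.filter((r) => r.completed);
  const sum = (items, pick) => items.reduce((acc, r) => acc + pick(r), 0);
  const totalCost = sum(runs, (r) => r.costUsd);
  const totalToolCalls = sum(runs, (r) => r.toolCalls);
  return {
    taskCompletionRate: completed.length / total,
    firstAttemptRate: runs.filter((r) => r.completed && r.firstAttempt).length / total,
    // Spend on failed runs still counts: total API spend / successful completions.
    costPerTask: completed.length > 0 ? totalCost / completed.length : Infinity,
    toolSelectionAccuracy:
      totalToolCalls > 0 ? sum(runs, (r) => r.appropriateToolCalls) / totalToolCalls : 1,
  };
}

const summary = summarizeRuns([
  { completed: true, firstAttempt: true, costUsd: 0.01, toolCalls: 3, appropriateToolCalls: 3 },
  { completed: true, firstAttempt: false, costUsd: 0.03, toolCalls: 5, appropriateToolCalls: 4 },
  { completed: false, firstAttempt: false, costUsd: 0.02, toolCalls: 2, appropriateToolCalls: 1 },
  { completed: true, firstAttempt: true, costUsd: 0.02, toolCalls: 2, appropriateToolCalls: 2 },
]);
console.log(summary);
```

Dividing total spend by successful completions, rather than by all runs, means a failing configuration looks more expensive per useful result.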

2. **Composite quality score:**
```
Agent Quality Score = weighted combination:
  - Correctness (40%): task_completion * 0.25 + first_attempt * 0.15
  - Quality (30%): reasoning_quality * 0.15 + output_quality * 0.15
  - Efficiency (20%): normalized(1/cost) * 0.10 + normalized(1/time) * 0.10
  - Safety (10%): (1 - harmful_rate) * 0.05 + (1 - violation_rate) * 0.05

Weights are defaults — adjust per use case (safety-critical → increase safety weight)
```
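
The weighted combination can be sketched in a few lines. The source does not define `normalized(...)` or the rubric scale, so this example makes two assumptions: efficiency is scored as budget divided by actual value, clamped to [0, 1], and rubric scores are on a 1 to 5 scale. The `agentQualityScore` name and the `budget` argument are hypothetical.

```javascript
// Sketch of the default weighting; the normalization choices are assumptions.
// Rates are in [0, 1]; rubric scores are assumed to be on a 1-5 scale.
const clamp01 = (x) => Math.max(0, Math.min(1, x));

function agentQualityScore(m, budget) {
  // Assumed normalization: 1.0 at or under budget, shrinking as cost/time exceed it.
  const costScore = clamp01(budget.costUsd / m.costUsd);
  const timeScore = clamp01(budget.timeSeconds / m.timeSeconds);
  return (
    m.taskCompletion * 0.25 +
    m.firstAttempt * 0.15 +
    (m.reasoningQuality / 5) * 0.15 +
    (m.outputQuality / 5) * 0.15 +
    costScore * 0.10 +
    timeScore * 0.10 +
    (1 - m.harmfulRate) * 0.05 +
    (1 - m.violationRate) * 0.05
  );
}

const score = agentQualityScore(
  {
    taskCompletion: 0.8,
    firstAttempt: 0.6,
    reasoningQuality: 4,
    outputQuality: 5,
    costUsd: 0.012,
    timeSeconds: 15.3,
    harmfulRate: 0,
    violationRate: 0.1,
  },
  { costUsd: 0.5, timeSeconds: 120 }
);
console.log(score.toFixed(3));
```

Because the eight weights sum to 1.0, a run that is perfect on every dimension scores 1.0, which makes scores comparable across configurations as long as the same budget is used.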

### Benchmark Design

3. **Evaluation dataset structure:**
```
.mindforge/evals/agent-benchmark/
├── config.json              # benchmark metadata and thresholds
├── tasks/
│   ├── easy/                # baseline tasks (should be ~100% success)
│   │   ├── task-001.json
│   │   └── task-002.json
│   ├── medium/              # standard tasks (target: 80%+ success)
│   │   ├── task-010.json
│   │   └── task-011.json
│   └── hard/                # stretch tasks (target: 50%+ success)
│       ├── task-020.json
│       └── task-021.json
├── rubrics/
│   ├── correctness.md       # how to grade correctness
│   ├── reasoning.md         # how to grade reasoning quality
│   └── output.md            # how to grade output quality
└── results/
    └── results.jsonl        # append-only results log
```

4. **Task definition format:**
```json
{
  "task_id": "task-001",
  "difficulty": "easy",
  "category": "code-generation",
  "description": "Write a function that reverses a string",
  "input": "Create a TypeScript function reverseString(s: string): string",
  "expected_behavior": [
    "Returns reversed string",
    "Handles empty string",
    "Handles unicode correctly"
  ],
  "verification": {
    "type": "code",
    "test_cases": [
      {"input": "hello", "expected": "olleh"},
      {"input": "", "expected": ""},
      {"input": "abc", "expected": "cba"}
    ]
  },
  "metadata": {
    "tools_available": ["Read", "Write", "Bash"],
    "time_limit_seconds": 120,
    "cost_limit_usd": 0.50
  }
}
```

Rules:
- Minimum 30 tasks per benchmark (10 easy, 15 medium, 5 hard)
- Tasks must be representative of real usage patterns
- Include both deterministic tasks (one right answer) and generative tasks (rubric-graded)
- Each task has explicit success criteria (not vague "good output")
- Stratify by difficulty to detect capability thresholds
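
The composition rules above can be checked mechanically before a benchmark is run. The following is a minimal sketch, assuming tasks are already loaded as dictionaries in the task definition format and reading "10 easy, 15 medium, 5 hard" as per-difficulty minimums. `validate_benchmark` and `MIN_PER_DIFFICULTY` are hypothetical names for illustration, not part of MindForge.

```python
from collections import Counter

# Assumed reading of the rule "10 easy, 15 medium, 5 hard" as minimums.
MIN_PER_DIFFICULTY = {"easy": 10, "medium": 15, "hard": 5}


def validate_benchmark(tasks: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the rules hold."""
    problems = []
    counts = Counter(task.get("difficulty") for task in tasks)
    for difficulty, minimum in MIN_PER_DIFFICULTY.items():
        if counts[difficulty] < minimum:
            problems.append(
                f"{difficulty}: {counts[difficulty]} tasks, need at least {minimum}"
            )
    for task in tasks:
        # Explicit success criteria: expected behaviours plus a verification method.
        if not task.get("expected_behavior") or not task.get("verification"):
            problems.append(f"{task.get('task_id', '?')}: missing success criteria")
    return problems
```

Returning a list of problems rather than raising on the first one lets a benchmark author see every gap in a single pass.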

### Running Benchmarks

5. **Execution protocol:**
```
For each task in benchmark:
  1. Initialize fresh agent context (no contamination between tasks)
  2. Provide task input + available tools
  3. Record: start_time, all tool calls, all outputs, end_time
  4. Grade output against verification criteria
  5. Log full result to results.jsonl

Run N times per task (N >= 3) to measure variance:
  - Report mean and standard deviation per metric
  - Flag high-variance tasks (inconsistent agent behavior)
  - Use same random seed where possible for reproducibility
```
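
The variance step can be sketched with the standard library. This is an illustration under stated assumptions: `summarize_runs` is a hypothetical helper, and the coefficient-of-variation cut-off of 0.25 for "high variance" is an assumed default, since the protocol does not define one.

```python
from statistics import mean, stdev


def summarize_runs(values: list[float], cv_threshold: float = 0.25) -> dict:
    """Summarize one metric over repeated runs of the same task."""
    if len(values) < 3:
        raise ValueError("need at least 3 runs per task to estimate variance")
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if avg:
        cv = sd / abs(avg)  # coefficient of variation
    else:
        cv = float("inf") if sd else 0.0
    return {"mean": avg, "stdev": sd, "high_variance": cv > cv_threshold}
```

For example, `summarize_runs([1.0, 5.0, 9.0])` reports a mean of 5.0 and a standard deviation of 4.0, and flags the task as high-variance.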

6. **Result logging:**
```json
{
  "run_id": "uuid",
  "timestamp": "ISO-8601",
  "task_id": "task-001",
  "agent_config": {"model": "claude-sonnet", "temperature": 0.0},
  "metrics": {
    "completed": true,
    "first_attempt": true,
    "time_seconds": 15.3,
    "cost_usd": 0.012,
    "tokens_used": {"input": 1200, "output": 450},
    "tool_calls": 3,
    "reasoning_quality": 4,
    "output_quality": 5
  },
  "grading": {
    "method": "code",
    "pass": true,
    "evidence": "All 3 test cases passed"
  }
}
```
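
One way to keep `results.jsonl` append-only is to open it only in append mode. The sketch below assumes one JSON object per line in the shape shown above. `append_result` is a hypothetical helper, and filling `run_id` with a UUID4 and `timestamp` with UTC ISO-8601 is one reasonable reading of the placeholders.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def append_result(path: Path, task_id: str, agent_config: dict,
                  metrics: dict, grading: dict) -> dict:
    """Append one run record to the results log and return it."""
    record = {
        "run_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task_id": task_id,
        "agent_config": agent_config,
        "metrics": metrics,
        "grading": grading,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds lines; earlier results are never rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```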

### Regression Detection

7. **Regression detection algorithm:**
```
Compare current run vs baseline:

RED (regression detected — blocks deployment):
- Task completion rate drops > 5%
- Any previously-passing easy task now fails
- Cost per task increases > 50%
- Safety metric degrades at all

YELLOW (warning — investigate before deploying):
- Task completion rate drops 2-5%
- Medium/hard task pass rate drops > 10%
- Time per task increases > 30%
- New failure modes appear

GREEN (no regression):
- All metrics within 2% of baseline
- No new failure modes
- Cost/time stable or improved
```

Rules:
- ALWAYS compare to a pinned baseline (not just previous run)
- Run regression suite before any agent config change ships
- Regression in EASY tasks is more alarming than regression in HARD tasks
- Store baseline with agent version (update baseline when intentionally accepting changes)
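
The thresholds above translate into a small classifier. This is a sketch under stated assumptions, not a definitive implementation: `RunSummary` and `classify` are hypothetical names, rate drops are read as percentage points, cost and time increases are read as relative changes, and anything that matches neither RED nor YELLOW falls through to GREEN even though the GREEN conditions are stricter.

```python
from dataclasses import dataclass, field


@dataclass
class RunSummary:
    completion_rate: float          # fraction of tasks completed, 0.0-1.0
    medium_hard_pass_rate: float    # fraction of medium/hard tasks passed
    cost_per_task: float            # USD, assumed > 0
    time_per_task: float            # seconds, assumed > 0
    safety_score: float             # higher is safer
    passing_easy: set[str] = field(default_factory=set)
    failure_modes: set[str] = field(default_factory=set)


def classify(baseline: RunSummary, current: RunSummary) -> str:
    """Return "RED", "YELLOW" or "GREEN" for current measured against baseline."""
    drop = baseline.completion_rate - current.completion_rate
    mh_drop = baseline.medium_hard_pass_rate - current.medium_hard_pass_rate
    cost_up = current.cost_per_task / baseline.cost_per_task - 1
    time_up = current.time_per_task / baseline.time_per_task - 1

    newly_failing_easy = baseline.passing_easy - current.passing_easy
    new_failure_modes = current.failure_modes - baseline.failure_modes

    if (drop > 0.05 or newly_failing_easy or cost_up > 0.50
            or current.safety_score < baseline.safety_score):
        return "RED"
    if drop >= 0.02 or mh_drop > 0.10 or time_up > 0.30 or new_failure_modes:
        return "YELLOW"
    return "GREEN"
```

Passing the pinned baseline as an explicit argument, rather than reading "the previous run", keeps the comparison aligned with the first rule above.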

### Cost Efficiency Analysis

8. **Quality-per-dollar assessment:**
```
Cost Efficiency Ratio = quality_score / cost_per_task

Comparison framework:
- Agent A: quality=0.92, cost=$0.05/task → efficiency=18.4
- Agent B: quality=0.88, cost=$0.01/task → efficiency=88.0

Decision: Agent B is 4.8x more cost-efficient.
Choose A only if the 4-point quality gap (0.92 vs 0.88) causes real user-visible failures.
```

Rules:
- A cheaper model that achieves 95% of the quality at 20% of the cost is usually better
- Factor in retry cost (low first-attempt rate = hidden cost multiplier)
- Include tool call costs in total cost (API calls, compute)
- Report cost efficiency alongside raw quality (both matter)
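
The ratio and the retry rule can be combined in a few lines. A minimal sketch, assuming each attempt succeeds independently with the same probability, so the expected number of attempts is `1 / first_attempt_rate`. `cost_efficiency` is a hypothetical helper, and that retry model is an assumption rather than something the skill specifies.

```python
def cost_efficiency(quality: float, cost_per_attempt: float,
                    first_attempt_rate: float = 1.0) -> float:
    """Quality per dollar, with cost scaled by the expected number of attempts."""
    if not 0.0 < first_attempt_rate <= 1.0:
        raise ValueError("first_attempt_rate must be in (0, 1]")
    # Assumed retry model: expected attempts = 1 / first_attempt_rate.
    effective_cost = cost_per_attempt / first_attempt_rate
    return quality / effective_cost
```

With the figures above, Agent A scores about 18.4 and Agent B about 88.0. If Agent B only succeeds on the first attempt half the time, its ratio drops to about 44.0.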

## Self-check before task completion

Before marking a task done when this skill was active:

- [ ] Did I define metrics across all four dimensions (correctness, quality, efficiency, safety)?
- [ ] Is the benchmark stratified by difficulty (easy/medium/hard)?
- [ ] Did I run multiple times (N >= 3) to measure variance?
- [ ] Is there a pinned baseline for regression detection?
- [ ] Are regression thresholds defined (RED/YELLOW/GREEN)?
- [ ] Did I report cost efficiency (quality/cost ratio), not just raw quality?
- [ ] Are easy-task failures treated as more alarming than hard-task failures?
- [ ] Are results appended to results.jsonl (never overwritten)?
@@ -0,0 +1,199 @@
---
name: agent-memory-design
version: 1.0.0
min_mindforge_version: 10.0.4
status: stable
triggers: agent memory design, short-term memory, long-term memory, memory retrieval, memory consolidation, memory decay, episodic memory, semantic memory, memory architecture, working memory, memory indexing, knowledge persistence
---

# Skill — Agent Memory Design (Multi-Layer Knowledge Persistence)

## When this skill activates
When designing memory systems for AI agents, implementing context persistence across
sessions, building knowledge retrieval mechanisms, or architecting memory consolidation
pipelines. Use for any agent that needs to remember information beyond a single context
window.

Core principle: **Memory is retrieval, not storage.** The value of a memory system
is measured by whether the agent can find the RIGHT information at the RIGHT time.
Storage without retrieval is a graveyard.

## Mandatory actions when this skill is active

### Memory Layer Architecture

1. **Four-tier memory model:**
```
Layer 1 — Working Memory (context window)
  - Scope: current conversation turn
  - Capacity: model context limit (8K-200K tokens)
  - Persistence: none (gone when context resets)
  - Access: immediate (already in context)
  - Priority: highest — this is what the agent "sees"

Layer 2 — Short-Term Memory (session scratchpad)
  - Scope: current session/task
  - Capacity: 10-50 key facts
  - Persistence: session duration
  - Access: explicit retrieval (agent requests it)
  - Format: JSONL scratchpad file
  - Use for: intermediate results, task progress, recent user preferences

Layer 3 — Medium-Term Memory (project knowledge)
  - Scope: current project/workspace
  - Capacity: hundreds of entries
  - Persistence: project lifetime
  - Access: semantic search + key lookup
  - Format: structured markdown files, JSON indexes
  - Use for: architectural decisions, user patterns, project conventions

Layer 4 — Long-Term Memory (permanent knowledge)
  - Scope: cross-project, cross-session
  - Capacity: unbounded
  - Persistence: permanent (with decay)
  - Access: semantic similarity search + knowledge graph traversal
  - Format: vector DB + knowledge graph
  - Use for: learned patterns, user preferences, domain expertise
```

### Retrieval Strategies

2. **Retrieval by memory layer:**
```
Working Memory:
  - Already in context — no retrieval needed
  - Manage carefully: don't waste on retrievable facts

Short-Term Memory:
  - Recency-weighted retrieval (most recent = most relevant)
  - Key-based lookup: "what was the last error message?"
  - Cleared at session end (or explicit flush)

Medium-Term Memory:
  - Keyword + semantic hybrid search
  - Structured queries: "what auth pattern does this project use?"
  - Indexed by: topic, file path, date, relevance score

Long-Term Memory:
  - Semantic similarity search (embedding-based)
  - Knowledge graph traversal (entity → relationship → entity)
  - Confidence-weighted (higher confidence = higher ranking)
  - Decay-adjusted (older unreinforced memories rank lower)
```
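
For the long-term layer, confidence weighting can be sketched as a re-ranking step over embedding similarity. This is an illustration only: multiplying cosine similarity by confidence is an assumed way to combine the two signals, entries are assumed to be dictionaries with `embedding` and `confidence` keys, and decay is assumed to be already reflected in the stored confidence. `cosine` and `rank_memories` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_memories(query: list[float], entries: list[dict], top_k: int = 5) -> list[dict]:
    """Order entries by similarity x confidence, best first, and keep top_k."""
    scored = [(cosine(query, e["embedding"]) * e["confidence"], e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]
```

A real deployment would delegate the similarity search to the vector store; the point here is only that two equally similar memories are ordered by confidence.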

3. **Retrieval decision flow:**
```
When agent needs information:
1. Check working memory (already in context?) → use it
2. Check short-term memory (recent session fact?) → retrieve and inject
3. Check medium-term memory (project knowledge?) → search and inject summary
4. Check long-term memory (learned pattern?) → search, verify relevance, inject
5. If not found anywhere → ask user or research from scratch
```
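
The flow above amounts to trying each layer in order and stopping at the first hit. A minimal sketch, assuming each layer is exposed as a function that returns a fact or `None`. `Lookup` and `retrieve` are hypothetical names, and real layers would do search and relevance checks rather than a plain lookup.

```python
from typing import Callable, Optional

# A lookup takes a query and returns a fact, or None if the layer has nothing.
Lookup = Callable[[str], Optional[str]]


def retrieve(query: str, layers: list[tuple[str, Lookup]]) -> tuple[str, Optional[str]]:
    """Try each layer in order; return (layer_name, fact) for the first hit."""
    for name, lookup in layers:
        fact = lookup(query)
        if fact is not None:
            return name, fact
    # Step 5: nothing found, so the caller should ask the user or research.
    return "not_found", None
```

Returning the layer name alongside the fact makes it possible to log which tier answered, which helps when tuning what gets consolidated where.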

### Memory Consolidation

4. **End-of-session consolidation pipeline:**
```
Session ends → Consolidation triggers:

1. Extract key learnings from session:
   - New facts learned (user preferences, project decisions)
   - Patterns observed (what worked, what failed)
   - Corrections received (mistakes to avoid next time)

2. Classify each learning:
   - Short-term only (task-specific, won't matter next session) → discard
   - Medium-term (project-relevant) → write to project memory
   - Long-term (generalizable pattern) → write to permanent memory

3. Summarize, don't dump:
   - Raw conversation → extract 5-10 key facts
   - Include WHY something matters, not just WHAT happened
   - Link to existing memories (reinforce or update)

4. Update indexes:
   - Add new entries to search indexes
   - Update confidence scores on existing memories
   - Deprecate contradicted memories
```

Rules:
- Consolidation must be LOSSY (summarize, compress, extract — not raw dump)
- Every memory entry needs: content, source, timestamp, confidence, tags
- Contradictions: new information supersedes old (but keep old as "deprecated")
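
The entry fields and the contradiction rule can be sketched as a small data type. This is an assumption-laden illustration: `MemoryEntry`, `supersede`, and the `deprecated` flag are hypothetical names, and setting a contradicted entry's confidence to 0.0 follows the decay model described in this skill.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> str:
    """Current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class MemoryEntry:
    content: str
    source: str
    confidence: float
    tags: list[str] = field(default_factory=list)
    timestamp: str = field(default_factory=_now)
    deprecated: bool = False  # assumed flag marking superseded entries


def supersede(store: list[MemoryEntry], old: MemoryEntry, new: MemoryEntry) -> None:
    """Record new information that contradicts an existing entry."""
    old.deprecated = True   # kept for history, never deleted
    old.confidence = 0.0
    store.append(new)
```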

### Memory Decay

5. **Confidence decay model:**
```
Each memory entry has a confidence score [0.0 - 1.0]:

Initial confidence:
- User explicitly stated: 1.0
- Inferred from behavior: 0.7
- Assumed from patterns: 0.5

Decay rules:
- Unreinforced memory: confidence = max(0.0, confidence - 0.05) per week
- Reinforced memory (used successfully): confidence = min(1.0, confidence + 0.1)
- Contradicted memory: confidence = 0.0 (deprecated, kept for history)
- Below threshold (confidence < 0.2): excluded from retrieval results

Reinforcement triggers:
- Agent retrieves memory and uses it successfully
- User confirms a remembered fact
- Pattern matches current observation
```
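
The decay rules map directly onto three small functions. A minimal sketch using the constants stated above. The function names are hypothetical, and clamping decayed confidence at 0.0 is an assumption that keeps scores inside the stated [0.0, 1.0] range.

```python
DECAY_PER_WEEK = 0.05
REINFORCE_STEP = 0.1
RETRIEVAL_THRESHOLD = 0.2


def decay(confidence: float, weeks_unreinforced: float) -> float:
    """Linear decay for an unreinforced memory, clamped at 0.0."""
    return max(0.0, confidence - DECAY_PER_WEEK * weeks_unreinforced)


def reinforce(confidence: float) -> float:
    """Bump confidence after a successful use, capped at 1.0."""
    return min(1.0, confidence + REINFORCE_STEP)


def is_retrievable(confidence: float) -> bool:
    """Entries below the threshold are excluded from retrieval results."""
    return confidence >= RETRIEVAL_THRESHOLD
```

Under these constants, a user-stated fact (1.0) that is never reinforced falls below the retrieval threshold after 17 weeks.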

### Implementation Patterns

6. **Storage format by layer:**
```
Short-term (JSONL scratchpad):
  {"key": "last_error", "value": "TypeError at line 42", "ts": "...", "confidence": 1.0}
  {"key": "user_intent", "value": "refactoring auth module", "ts": "...", "confidence": 0.9}

Medium-term (structured markdown):
  ## Project: MindForge
  ### Architecture Decisions
  - Auth: JWT with refresh tokens (decided 2024-03-15, confidence: 1.0)
  - Database: PostgreSQL with Prisma ORM (decided 2024-03-10, confidence: 1.0)
  ### User Preferences
  - Prefers functional style over OOP (confidence: 0.8)
  - Wants verbose error messages in dev (confidence: 0.9)

Long-term (vector DB + knowledge graph):
  Entry: {id, embedding, content, source, timestamp, confidence, tags, relationships}
  Graph: (pattern)--[applies_to]-->(domain)--[learned_from]-->(project)
```
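
Reading the short-term scratchpad back can be as simple as letting later lines win. A sketch assuming the JSONL shape shown above and that lines are appended in chronological order; `latest_values` is a hypothetical helper.

```python
import json


def latest_values(jsonl_text: str) -> dict:
    """Map each scratchpad key to its most recently written value."""
    latest = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        latest[record["key"]] = record["value"]  # later lines overwrite earlier ones
    return latest
```

This gives the key-based lookup ("what was the last error message?") without needing an index, which is reasonable at the 10-50 fact capacity the short-term layer targets.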

7. **Working memory budget management:**
```
Context window is finite — manage it:

Priority for inclusion in working memory:
1. Current user message and task context (always)
2. Relevant retrieved memories (top-k by relevance)
3. System instructions and constraints (always)
4. Recent conversation history (sliding window)

If context is filling up:
- Summarize older conversation turns (don't drop, compress)
- Move detailed context to short-term memory (retrieve if needed)
- Keep only TOP-5 most relevant retrieved memories in context
- Never sacrifice system instructions for conversation history
```
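
The budget rules can be sketched as a greedy assembly step. This is a simplification under stated assumptions: token counts are approximated by whitespace-separated words, memories are assumed to arrive sorted by relevance, and older turns that do not fit are omitted here where the guidance above would summarize them. `approx_tokens` and `build_context` are hypothetical names.

```python
def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def build_context(system: str, user_message: str, memories: list[str],
                  history: list[str], budget: int, max_memories: int = 5) -> list[str]:
    """Assemble context parts within a token budget."""
    parts = [system, user_message]              # always included
    used = sum(approx_tokens(p) for p in parts)
    for memory in memories[:max_memories]:      # at most the top 5 memories
        if used + approx_tokens(memory) <= budget:
            parts.append(memory)
            used += approx_tokens(memory)
    kept = []
    for turn in reversed(history):              # newest turns first
        if used + approx_tokens(turn) > budget:
            break
        kept.append(turn)
        used += approx_tokens(turn)
    return parts + kept[::-1]                   # restore chronological order
```

System instructions and the current message are added before anything else, so history can never crowd them out.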

## Self-check before task completion

Before marking a task done when this skill was active:

- [ ] Did I define all four memory layers with clear scope and access patterns?
- [ ] Is retrieval strategy defined per layer (recency, semantic, key-based)?
- [ ] Is there a consolidation pipeline that extracts and summarizes key learnings?
- [ ] Is consolidation lossy (summarize, don't dump raw conversation)?
- [ ] Is memory decay implemented (confidence decreases without reinforcement)?
- [ ] Are contradicted memories deprecated (not deleted)?
- [ ] Is working memory budget managed (prioritized inclusion, compression when full)?
- [ ] Does every memory entry have: content, source, timestamp, confidence, tags?