@wazir-dev/cli 1.0.0
This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +111 -0
- package/CHANGELOG.md +14 -0
- package/CONTRIBUTING.md +101 -0
- package/LICENSE +21 -0
- package/README.md +314 -0
- package/assets/composition-engine.mmd +34 -0
- package/assets/demo-script.sh +17 -0
- package/assets/logo-dark.svg +14 -0
- package/assets/logo.svg +14 -0
- package/assets/pipeline.mmd +39 -0
- package/assets/record-demo.sh +51 -0
- package/docs/README.md +51 -0
- package/docs/adapters/context-mode.md +60 -0
- package/docs/concepts/architecture.md +87 -0
- package/docs/concepts/artifact-model.md +60 -0
- package/docs/concepts/composition-engine.md +36 -0
- package/docs/concepts/indexing-and-recall.md +160 -0
- package/docs/concepts/observability.md +41 -0
- package/docs/concepts/roles-and-workflows.md +59 -0
- package/docs/concepts/terminology-policy.md +27 -0
- package/docs/getting-started/01-installation.md +78 -0
- package/docs/getting-started/02-first-run.md +102 -0
- package/docs/getting-started/03-adding-to-project.md +15 -0
- package/docs/getting-started/04-host-setup.md +15 -0
- package/docs/guides/ci-integration.md +15 -0
- package/docs/guides/creating-skills.md +15 -0
- package/docs/guides/expertise-module-authoring.md +15 -0
- package/docs/guides/hook-development.md +15 -0
- package/docs/guides/memory-and-learnings.md +34 -0
- package/docs/guides/multi-host-export.md +15 -0
- package/docs/guides/troubleshooting.md +101 -0
- package/docs/guides/writing-custom-roles.md +15 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
- package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
- package/docs/readmes/INDEX.md +99 -0
- package/docs/readmes/features/expertise/README.md +171 -0
- package/docs/readmes/features/exports/README.md +222 -0
- package/docs/readmes/features/hooks/README.md +103 -0
- package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
- package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
- package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
- package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
- package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
- package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
- package/docs/readmes/features/hooks/session-start.md +119 -0
- package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
- package/docs/readmes/features/roles/README.md +157 -0
- package/docs/readmes/features/roles/clarifier.md +152 -0
- package/docs/readmes/features/roles/content-author.md +190 -0
- package/docs/readmes/features/roles/designer.md +193 -0
- package/docs/readmes/features/roles/executor.md +184 -0
- package/docs/readmes/features/roles/learner.md +210 -0
- package/docs/readmes/features/roles/planner.md +182 -0
- package/docs/readmes/features/roles/researcher.md +164 -0
- package/docs/readmes/features/roles/reviewer.md +184 -0
- package/docs/readmes/features/roles/specifier.md +162 -0
- package/docs/readmes/features/roles/verifier.md +215 -0
- package/docs/readmes/features/schemas/README.md +178 -0
- package/docs/readmes/features/skills/README.md +63 -0
- package/docs/readmes/features/skills/brainstorming.md +96 -0
- package/docs/readmes/features/skills/debugging.md +148 -0
- package/docs/readmes/features/skills/design.md +120 -0
- package/docs/readmes/features/skills/prepare-next.md +109 -0
- package/docs/readmes/features/skills/run-audit.md +159 -0
- package/docs/readmes/features/skills/scan-project.md +109 -0
- package/docs/readmes/features/skills/self-audit.md +176 -0
- package/docs/readmes/features/skills/tdd.md +137 -0
- package/docs/readmes/features/skills/using-skills.md +92 -0
- package/docs/readmes/features/skills/verification.md +120 -0
- package/docs/readmes/features/skills/writing-plans.md +104 -0
- package/docs/readmes/features/tooling/README.md +320 -0
- package/docs/readmes/features/workflows/README.md +186 -0
- package/docs/readmes/features/workflows/author.md +181 -0
- package/docs/readmes/features/workflows/clarify.md +154 -0
- package/docs/readmes/features/workflows/design-review.md +171 -0
- package/docs/readmes/features/workflows/design.md +169 -0
- package/docs/readmes/features/workflows/discover.md +162 -0
- package/docs/readmes/features/workflows/execute.md +173 -0
- package/docs/readmes/features/workflows/learn.md +167 -0
- package/docs/readmes/features/workflows/plan-review.md +165 -0
- package/docs/readmes/features/workflows/plan.md +170 -0
- package/docs/readmes/features/workflows/prepare-next.md +167 -0
- package/docs/readmes/features/workflows/review.md +169 -0
- package/docs/readmes/features/workflows/run-audit.md +191 -0
- package/docs/readmes/features/workflows/spec-challenge.md +159 -0
- package/docs/readmes/features/workflows/specify.md +160 -0
- package/docs/readmes/features/workflows/verify.md +177 -0
- package/docs/readmes/packages/README.md +50 -0
- package/docs/readmes/packages/ajv.md +117 -0
- package/docs/readmes/packages/context-mode.md +118 -0
- package/docs/readmes/packages/gray-matter.md +116 -0
- package/docs/readmes/packages/node-test.md +137 -0
- package/docs/readmes/packages/yaml.md +112 -0
- package/docs/reference/configuration-reference.md +159 -0
- package/docs/reference/expertise-index.md +52 -0
- package/docs/reference/git-flow.md +43 -0
- package/docs/reference/hooks.md +87 -0
- package/docs/reference/host-exports.md +50 -0
- package/docs/reference/launch-checklist.md +172 -0
- package/docs/reference/marketplace-listings.md +76 -0
- package/docs/reference/release-process.md +34 -0
- package/docs/reference/roles-reference.md +77 -0
- package/docs/reference/skills.md +33 -0
- package/docs/reference/templates.md +29 -0
- package/docs/reference/tooling-cli.md +94 -0
- package/docs/truth-claims.yaml +222 -0
- package/expertise/PROGRESS.md +63 -0
- package/expertise/README.md +18 -0
- package/expertise/antipatterns/PROGRESS.md +56 -0
- package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
- package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
- package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
- package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
- package/expertise/antipatterns/backend/index.md +24 -0
- package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
- package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
- package/expertise/antipatterns/code/async-antipatterns.md +622 -0
- package/expertise/antipatterns/code/code-smells.md +1186 -0
- package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
- package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
- package/expertise/antipatterns/code/index.md +27 -0
- package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
- package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
- package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
- package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
- package/expertise/antipatterns/design/dark-patterns.md +1121 -0
- package/expertise/antipatterns/design/index.md +22 -0
- package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
- package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
- package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
- package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
- package/expertise/antipatterns/frontend/index.md +23 -0
- package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
- package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
- package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
- package/expertise/antipatterns/index.md +31 -0
- package/expertise/antipatterns/performance/index.md +20 -0
- package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
- package/expertise/antipatterns/performance/premature-optimization.md +623 -0
- package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
- package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
- package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
- package/expertise/antipatterns/process/index.md +23 -0
- package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
- package/expertise/antipatterns/security/index.md +20 -0
- package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
- package/expertise/antipatterns/security/security-theater.md +843 -0
- package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
- package/expertise/architecture/PROGRESS.md +70 -0
- package/expertise/architecture/data/caching-architecture.md +671 -0
- package/expertise/architecture/data/data-consistency.md +574 -0
- package/expertise/architecture/data/data-modeling.md +536 -0
- package/expertise/architecture/data/event-streams-and-queues.md +634 -0
- package/expertise/architecture/data/index.md +25 -0
- package/expertise/architecture/data/search-architecture.md +663 -0
- package/expertise/architecture/data/sql-vs-nosql.md +708 -0
- package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
- package/expertise/architecture/decisions/build-vs-buy.md +616 -0
- package/expertise/architecture/decisions/index.md +23 -0
- package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
- package/expertise/architecture/decisions/technology-selection.md +616 -0
- package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
- package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
- package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
- package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
- package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
- package/expertise/architecture/distributed/index.md +25 -0
- package/expertise/architecture/distributed/saga-pattern.md +797 -0
- package/expertise/architecture/foundations/architectural-thinking.md +460 -0
- package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
- package/expertise/architecture/foundations/design-principles-solid.md +649 -0
- package/expertise/architecture/foundations/domain-driven-design.md +719 -0
- package/expertise/architecture/foundations/index.md +25 -0
- package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
- package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
- package/expertise/architecture/index.md +34 -0
- package/expertise/architecture/integration/api-design-graphql.md +638 -0
- package/expertise/architecture/integration/api-design-grpc.md +804 -0
- package/expertise/architecture/integration/api-design-rest.md +892 -0
- package/expertise/architecture/integration/index.md +25 -0
- package/expertise/architecture/integration/third-party-integration.md +795 -0
- package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
- package/expertise/architecture/integration/websockets-realtime.md +791 -0
- package/expertise/architecture/mobile-architecture/index.md +22 -0
- package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
- package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
- package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
- package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
- package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
- package/expertise/architecture/patterns/event-driven.md +797 -0
- package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
- package/expertise/architecture/patterns/index.md +27 -0
- package/expertise/architecture/patterns/layered-architecture.md +736 -0
- package/expertise/architecture/patterns/microservices.md +753 -0
- package/expertise/architecture/patterns/modular-monolith.md +692 -0
- package/expertise/architecture/patterns/monolith.md +626 -0
- package/expertise/architecture/patterns/plugin-architecture.md +735 -0
- package/expertise/architecture/patterns/serverless.md +780 -0
- package/expertise/architecture/scaling/database-scaling.md +615 -0
- package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
- package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
- package/expertise/architecture/scaling/index.md +24 -0
- package/expertise/architecture/scaling/multi-tenancy.md +800 -0
- package/expertise/architecture/scaling/stateless-design.md +787 -0
- package/expertise/backend/embedded-firmware.md +625 -0
- package/expertise/backend/go.md +853 -0
- package/expertise/backend/index.md +24 -0
- package/expertise/backend/java-spring.md +448 -0
- package/expertise/backend/node-typescript.md +625 -0
- package/expertise/backend/python-fastapi.md +724 -0
- package/expertise/backend/rust.md +458 -0
- package/expertise/backend/solidity.md +711 -0
- package/expertise/composition-map.yaml +443 -0
- package/expertise/content/foundations/content-modeling.md +395 -0
- package/expertise/content/foundations/editorial-standards.md +449 -0
- package/expertise/content/foundations/index.md +24 -0
- package/expertise/content/foundations/microcopy.md +455 -0
- package/expertise/content/foundations/terminology-governance.md +509 -0
- package/expertise/content/index.md +34 -0
- package/expertise/content/patterns/accessibility-copy.md +518 -0
- package/expertise/content/patterns/index.md +24 -0
- package/expertise/content/patterns/notification-content.md +433 -0
- package/expertise/content/patterns/sample-content.md +486 -0
- package/expertise/content/patterns/state-copy.md +439 -0
- package/expertise/design/PROGRESS.md +58 -0
- package/expertise/design/disciplines/dark-mode-theming.md +577 -0
- package/expertise/design/disciplines/design-systems.md +595 -0
- package/expertise/design/disciplines/index.md +25 -0
- package/expertise/design/disciplines/information-architecture.md +800 -0
- package/expertise/design/disciplines/interaction-design.md +788 -0
- package/expertise/design/disciplines/responsive-design.md +552 -0
- package/expertise/design/disciplines/usability-testing.md +516 -0
- package/expertise/design/disciplines/user-research.md +792 -0
- package/expertise/design/foundations/accessibility-design.md +796 -0
- package/expertise/design/foundations/color-theory.md +797 -0
- package/expertise/design/foundations/iconography.md +795 -0
- package/expertise/design/foundations/index.md +26 -0
- package/expertise/design/foundations/motion-and-animation.md +653 -0
- package/expertise/design/foundations/rtl-design.md +585 -0
- package/expertise/design/foundations/spacing-and-layout.md +607 -0
- package/expertise/design/foundations/typography.md +800 -0
- package/expertise/design/foundations/visual-hierarchy.md +761 -0
- package/expertise/design/index.md +32 -0
- package/expertise/design/patterns/authentication-flows.md +474 -0
- package/expertise/design/patterns/content-consumption.md +789 -0
- package/expertise/design/patterns/data-display.md +618 -0
- package/expertise/design/patterns/e-commerce.md +1494 -0
- package/expertise/design/patterns/feedback-and-states.md +642 -0
- package/expertise/design/patterns/forms-and-input.md +819 -0
- package/expertise/design/patterns/gamification.md +801 -0
- package/expertise/design/patterns/index.md +31 -0
- package/expertise/design/patterns/microinteractions.md +449 -0
- package/expertise/design/patterns/navigation.md +800 -0
- package/expertise/design/patterns/notifications.md +705 -0
- package/expertise/design/patterns/onboarding.md +700 -0
- package/expertise/design/patterns/search-and-filter.md +601 -0
- package/expertise/design/patterns/settings-and-preferences.md +768 -0
- package/expertise/design/patterns/social-and-community.md +748 -0
- package/expertise/design/platforms/desktop-native.md +612 -0
- package/expertise/design/platforms/index.md +25 -0
- package/expertise/design/platforms/mobile-android.md +825 -0
- package/expertise/design/platforms/mobile-cross-platform.md +983 -0
- package/expertise/design/platforms/mobile-ios.md +699 -0
- package/expertise/design/platforms/tablet.md +794 -0
- package/expertise/design/platforms/web-dashboard.md +790 -0
- package/expertise/design/platforms/web-responsive.md +550 -0
- package/expertise/design/psychology/behavioral-nudges.md +449 -0
- package/expertise/design/psychology/cognitive-load.md +1191 -0
- package/expertise/design/psychology/error-psychology.md +778 -0
- package/expertise/design/psychology/index.md +22 -0
- package/expertise/design/psychology/persuasive-design.md +736 -0
- package/expertise/design/psychology/user-mental-models.md +623 -0
- package/expertise/design/tooling/open-pencil.md +266 -0
- package/expertise/frontend/angular.md +1073 -0
- package/expertise/frontend/desktop-electron.md +546 -0
- package/expertise/frontend/flutter.md +782 -0
- package/expertise/frontend/index.md +27 -0
- package/expertise/frontend/native-android.md +409 -0
- package/expertise/frontend/native-ios.md +490 -0
- package/expertise/frontend/react-native.md +1160 -0
- package/expertise/frontend/react.md +808 -0
- package/expertise/frontend/vue.md +1089 -0
- package/expertise/humanize/domain-rules-code.md +79 -0
- package/expertise/humanize/domain-rules-content.md +67 -0
- package/expertise/humanize/domain-rules-technical-docs.md +56 -0
- package/expertise/humanize/index.md +35 -0
- package/expertise/humanize/self-audit-checklist.md +87 -0
- package/expertise/humanize/sentence-patterns.md +218 -0
- package/expertise/humanize/vocabulary-blacklist.md +105 -0
- package/expertise/i18n/PROGRESS.md +65 -0
- package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
- package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
- package/expertise/i18n/advanced/complex-scripts.md +30 -0
- package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
- package/expertise/i18n/advanced/testing-i18n.md +28 -0
- package/expertise/i18n/content/content-adaptation.md +23 -0
- package/expertise/i18n/content/locale-specific-formatting.md +23 -0
- package/expertise/i18n/content/machine-translation-integration.md +28 -0
- package/expertise/i18n/content/translation-management.md +29 -0
- package/expertise/i18n/foundations/date-time-calendars.md +67 -0
- package/expertise/i18n/foundations/i18n-architecture.md +272 -0
- package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
- package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
- package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
- package/expertise/i18n/foundations/string-externalization.md +236 -0
- package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
- package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
- package/expertise/i18n/index.md +38 -0
- package/expertise/i18n/platform/backend-i18n.md +31 -0
- package/expertise/i18n/platform/flutter-i18n.md +148 -0
- package/expertise/i18n/platform/native-android-i18n.md +36 -0
- package/expertise/i18n/platform/native-ios-i18n.md +36 -0
- package/expertise/i18n/platform/react-i18n.md +103 -0
- package/expertise/i18n/platform/web-css-i18n.md +81 -0
- package/expertise/i18n/rtl/arabic-specific.md +175 -0
- package/expertise/i18n/rtl/hebrew-specific.md +149 -0
- package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
- package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
- package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
- package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
- package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
- package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
- package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
- package/expertise/i18n/rtl/rtl-typography.md +160 -0
- package/expertise/index.md +113 -0
- package/expertise/index.yaml +216 -0
- package/expertise/infrastructure/cloud-aws.md +597 -0
- package/expertise/infrastructure/cloud-gcp.md +599 -0
- package/expertise/infrastructure/cybersecurity.md +816 -0
- package/expertise/infrastructure/database-mongodb.md +447 -0
- package/expertise/infrastructure/database-postgres.md +400 -0
- package/expertise/infrastructure/devops-cicd.md +787 -0
- package/expertise/infrastructure/index.md +27 -0
- package/expertise/performance/PROGRESS.md +50 -0
- package/expertise/performance/backend/api-latency.md +1204 -0
- package/expertise/performance/backend/background-jobs.md +506 -0
- package/expertise/performance/backend/connection-pooling.md +1209 -0
- package/expertise/performance/backend/database-query-optimization.md +515 -0
- package/expertise/performance/backend/index.md +23 -0
- package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
- package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
- package/expertise/performance/foundations/caching-strategies.md +489 -0
- package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
- package/expertise/performance/foundations/index.md +24 -0
- package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
- package/expertise/performance/foundations/memory-management.md +964 -0
- package/expertise/performance/foundations/performance-budgets.md +1314 -0
- package/expertise/performance/index.md +31 -0
- package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
- package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
- package/expertise/performance/infrastructure/index.md +22 -0
- package/expertise/performance/infrastructure/load-balancing.md +1081 -0
- package/expertise/performance/infrastructure/observability.md +1079 -0
- package/expertise/performance/mobile/index.md +23 -0
- package/expertise/performance/mobile/mobile-animations.md +544 -0
- package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
- package/expertise/performance/mobile/mobile-network.md +452 -0
- package/expertise/performance/mobile/mobile-rendering.md +599 -0
- package/expertise/performance/mobile/mobile-startup-time.md +505 -0
- package/expertise/performance/platform-specific/flutter-performance.md +647 -0
- package/expertise/performance/platform-specific/index.md +22 -0
- package/expertise/performance/platform-specific/node-performance.md +1307 -0
- package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
- package/expertise/performance/platform-specific/react-performance.md +1403 -0
- package/expertise/performance/web/bundle-optimization.md +1239 -0
- package/expertise/performance/web/image-and-media.md +636 -0
- package/expertise/performance/web/index.md +24 -0
- package/expertise/performance/web/network-optimization.md +1133 -0
- package/expertise/performance/web/rendering-performance.md +1098 -0
- package/expertise/performance/web/ssr-and-hydration.md +918 -0
- package/expertise/performance/web/web-vitals.md +1374 -0
- package/expertise/quality/accessibility.md +985 -0
- package/expertise/quality/evidence-based-verification.md +499 -0
- package/expertise/quality/index.md +24 -0
- package/expertise/quality/ml-model-audit.md +614 -0
- package/expertise/quality/performance.md +600 -0
- package/expertise/quality/testing-api.md +891 -0
- package/expertise/quality/testing-mobile.md +496 -0
- package/expertise/quality/testing-web.md +849 -0
- package/expertise/security/PROGRESS.md +54 -0
- package/expertise/security/agentic-identity.md +540 -0
- package/expertise/security/compliance-frameworks.md +601 -0
- package/expertise/security/data/data-encryption.md +364 -0
- package/expertise/security/data/data-privacy-gdpr.md +692 -0
- package/expertise/security/data/database-security.md +1171 -0
- package/expertise/security/data/index.md +22 -0
- package/expertise/security/data/pii-handling.md +531 -0
- package/expertise/security/foundations/authentication.md +1041 -0
- package/expertise/security/foundations/authorization.md +603 -0
- package/expertise/security/foundations/cryptography.md +1001 -0
- package/expertise/security/foundations/index.md +25 -0
- package/expertise/security/foundations/owasp-top-10.md +1354 -0
- package/expertise/security/foundations/secrets-management.md +1217 -0
- package/expertise/security/foundations/secure-sdlc.md +700 -0
- package/expertise/security/foundations/supply-chain-security.md +698 -0
- package/expertise/security/index.md +31 -0
- package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
- package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
- package/expertise/security/infrastructure/container-security.md +721 -0
- package/expertise/security/infrastructure/incident-response.md +1295 -0
- package/expertise/security/infrastructure/index.md +24 -0
- package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
- package/expertise/security/infrastructure/network-security.md +1337 -0
- package/expertise/security/mobile/index.md +23 -0
- package/expertise/security/mobile/mobile-android-security.md +1218 -0
- package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
- package/expertise/security/mobile/mobile-data-storage.md +1265 -0
- package/expertise/security/mobile/mobile-ios-security.md +1401 -0
- package/expertise/security/mobile/mobile-network-security.md +1520 -0
- package/expertise/security/smart-contract-security.md +594 -0
- package/expertise/security/testing/index.md +22 -0
- package/expertise/security/testing/penetration-testing.md +1258 -0
- package/expertise/security/testing/security-code-review.md +1765 -0
- package/expertise/security/testing/threat-modeling.md +1074 -0
- package/expertise/security/testing/vulnerability-scanning.md +1062 -0
- package/expertise/security/web/api-security.md +586 -0
- package/expertise/security/web/cors-and-headers.md +433 -0
- package/expertise/security/web/csrf.md +562 -0
- package/expertise/security/web/file-upload.md +1477 -0
- package/expertise/security/web/index.md +25 -0
- package/expertise/security/web/injection.md +1375 -0
- package/expertise/security/web/session-management.md +1101 -0
- package/expertise/security/web/xss.md +1158 -0
- package/exports/README.md +17 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
- package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
- package/exports/hosts/claude/.claude/agents/designer.md +55 -0
- package/exports/hosts/claude/.claude/agents/executor.md +55 -0
- package/exports/hosts/claude/.claude/agents/learner.md +51 -0
- package/exports/hosts/claude/.claude/agents/planner.md +53 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
- package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
- package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
- package/exports/hosts/claude/.claude/commands/author.md +42 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
- package/exports/hosts/claude/.claude/commands/design.md +44 -0
- package/exports/hosts/claude/.claude/commands/discover.md +37 -0
- package/exports/hosts/claude/.claude/commands/execute.md +48 -0
- package/exports/hosts/claude/.claude/commands/learn.md +38 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
- package/exports/hosts/claude/.claude/commands/plan.md +39 -0
- package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
- package/exports/hosts/claude/.claude/commands/review.md +40 -0
- package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
- package/exports/hosts/claude/.claude/commands/specify.md +38 -0
- package/exports/hosts/claude/.claude/commands/verify.md +37 -0
- package/exports/hosts/claude/.claude/settings.json +34 -0
- package/exports/hosts/claude/CLAUDE.md +19 -0
- package/exports/hosts/claude/export.manifest.json +38 -0
- package/exports/hosts/claude/host-package.json +67 -0
- package/exports/hosts/codex/AGENTS.md +19 -0
- package/exports/hosts/codex/export.manifest.json +38 -0
- package/exports/hosts/codex/host-package.json +41 -0
- package/exports/hosts/cursor/.cursor/hooks.json +16 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
- package/exports/hosts/cursor/export.manifest.json +38 -0
- package/exports/hosts/cursor/host-package.json +42 -0
- package/exports/hosts/gemini/GEMINI.md +19 -0
- package/exports/hosts/gemini/export.manifest.json +38 -0
- package/exports/hosts/gemini/host-package.json +41 -0
- package/hooks/README.md +18 -0
- package/hooks/definitions/loop_cap_guard.yaml +21 -0
- package/hooks/definitions/post_tool_capture.yaml +24 -0
- package/hooks/definitions/pre_compact_summary.yaml +19 -0
- package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
- package/hooks/definitions/protected_path_write_guard.yaml +19 -0
- package/hooks/definitions/session_start.yaml +19 -0
- package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
- package/hooks/loop-cap-guard +17 -0
- package/hooks/post-tool-lint +36 -0
- package/hooks/protected-path-write-guard +17 -0
- package/hooks/session-start +41 -0
- package/llms-full.txt +2355 -0
- package/llms.txt +43 -0
- package/package.json +79 -0
- package/roles/README.md +20 -0
- package/roles/clarifier.md +42 -0
- package/roles/content-author.md +63 -0
- package/roles/designer.md +55 -0
- package/roles/executor.md +55 -0
- package/roles/learner.md +51 -0
- package/roles/planner.md +53 -0
- package/roles/researcher.md +43 -0
- package/roles/reviewer.md +54 -0
- package/roles/specifier.md +47 -0
- package/roles/verifier.md +71 -0
- package/schemas/README.md +24 -0
- package/schemas/accepted-learning.schema.json +20 -0
- package/schemas/author-artifact.schema.json +156 -0
- package/schemas/clarification.schema.json +19 -0
- package/schemas/design-artifact.schema.json +80 -0
- package/schemas/docs-claim.schema.json +18 -0
- package/schemas/export-manifest.schema.json +20 -0
- package/schemas/hook.schema.json +67 -0
- package/schemas/host-export-package.schema.json +18 -0
- package/schemas/implementation-plan.schema.json +19 -0
- package/schemas/proposed-learning.schema.json +19 -0
- package/schemas/research.schema.json +18 -0
- package/schemas/review.schema.json +29 -0
- package/schemas/run-manifest.schema.json +18 -0
- package/schemas/spec-challenge.schema.json +18 -0
- package/schemas/spec.schema.json +20 -0
- package/schemas/usage.schema.json +102 -0
- package/schemas/verification-proof.schema.json +29 -0
- package/schemas/wazir-manifest.schema.json +173 -0
- package/skills/README.md +40 -0
- package/skills/brainstorming/SKILL.md +77 -0
- package/skills/debugging/SKILL.md +50 -0
- package/skills/design/SKILL.md +61 -0
- package/skills/dispatching-parallel-agents/SKILL.md +128 -0
- package/skills/executing-plans/SKILL.md +70 -0
- package/skills/finishing-a-development-branch/SKILL.md +169 -0
- package/skills/humanize/SKILL.md +123 -0
- package/skills/init-pipeline/SKILL.md +124 -0
- package/skills/prepare-next/SKILL.md +20 -0
- package/skills/receiving-code-review/SKILL.md +123 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +108 -0
- package/skills/run-audit/SKILL.md +197 -0
- package/skills/scan-project/SKILL.md +41 -0
- package/skills/self-audit/SKILL.md +153 -0
- package/skills/subagent-driven-development/SKILL.md +154 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
- package/skills/subagent-driven-development/implementer-prompt.md +102 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/tdd/SKILL.md +23 -0
- package/skills/using-git-worktrees/SKILL.md +163 -0
- package/skills/using-skills/SKILL.md +95 -0
- package/skills/verification/SKILL.md +22 -0
- package/skills/wazir/SKILL.md +463 -0
- package/skills/writing-plans/SKILL.md +30 -0
- package/skills/writing-skills/SKILL.md +157 -0
- package/skills/writing-skills/anthropic-best-practices.md +122 -0
- package/skills/writing-skills/persuasion-principles.md +50 -0
- package/templates/README.md +20 -0
- package/templates/artifacts/README.md +10 -0
- package/templates/artifacts/accepted-learning.md +19 -0
- package/templates/artifacts/accepted-learning.template.json +12 -0
- package/templates/artifacts/author.md +74 -0
- package/templates/artifacts/author.template.json +19 -0
- package/templates/artifacts/clarification.md +21 -0
- package/templates/artifacts/clarification.template.json +12 -0
- package/templates/artifacts/execute-notes.md +19 -0
- package/templates/artifacts/implementation-plan.md +21 -0
- package/templates/artifacts/implementation-plan.template.json +11 -0
- package/templates/artifacts/learning-proposal.md +19 -0
- package/templates/artifacts/next-run-handoff.md +21 -0
- package/templates/artifacts/plan-review.md +19 -0
- package/templates/artifacts/proposed-learning.template.json +12 -0
- package/templates/artifacts/research.md +21 -0
- package/templates/artifacts/research.template.json +12 -0
- package/templates/artifacts/review-findings.md +19 -0
- package/templates/artifacts/review.template.json +11 -0
- package/templates/artifacts/run-manifest.template.json +8 -0
- package/templates/artifacts/spec-challenge.md +19 -0
- package/templates/artifacts/spec-challenge.template.json +11 -0
- package/templates/artifacts/spec.md +21 -0
- package/templates/artifacts/spec.template.json +12 -0
- package/templates/artifacts/verification-proof.md +19 -0
- package/templates/artifacts/verification-proof.template.json +11 -0
- package/templates/examples/accepted-learning.example.json +14 -0
- package/templates/examples/author.example.json +152 -0
- package/templates/examples/clarification.example.json +15 -0
- package/templates/examples/docs-claim.example.json +8 -0
- package/templates/examples/export-manifest.example.json +7 -0
- package/templates/examples/host-export-package.example.json +11 -0
- package/templates/examples/implementation-plan.example.json +17 -0
- package/templates/examples/proposed-learning.example.json +13 -0
- package/templates/examples/research.example.json +15 -0
- package/templates/examples/research.example.md +6 -0
- package/templates/examples/review.example.json +17 -0
- package/templates/examples/run-manifest.example.json +9 -0
- package/templates/examples/spec-challenge.example.json +14 -0
- package/templates/examples/spec.example.json +21 -0
- package/templates/examples/verification-proof.example.json +21 -0
- package/templates/examples/wazir-manifest.example.yaml +65 -0
- package/templates/task-definition-schema.md +99 -0
- package/tooling/README.md +20 -0
- package/tooling/src/adapters/context-mode.js +50 -0
- package/tooling/src/capture/command.js +376 -0
- package/tooling/src/capture/store.js +99 -0
- package/tooling/src/capture/usage.js +270 -0
- package/tooling/src/checks/branches.js +50 -0
- package/tooling/src/checks/brand-truth.js +110 -0
- package/tooling/src/checks/changelog.js +231 -0
- package/tooling/src/checks/command-registry.js +36 -0
- package/tooling/src/checks/commits.js +102 -0
- package/tooling/src/checks/docs-drift.js +103 -0
- package/tooling/src/checks/docs-truth.js +201 -0
- package/tooling/src/checks/runtime-surface.js +156 -0
- package/tooling/src/cli.js +116 -0
- package/tooling/src/command-options.js +56 -0
- package/tooling/src/commands/validate.js +320 -0
- package/tooling/src/doctor/command.js +91 -0
- package/tooling/src/export/command.js +77 -0
- package/tooling/src/export/compiler.js +498 -0
- package/tooling/src/guards/loop-cap-guard.js +52 -0
- package/tooling/src/guards/protected-path-write-guard.js +67 -0
- package/tooling/src/index/command.js +152 -0
- package/tooling/src/index/storage.js +1061 -0
- package/tooling/src/index/summarizers.js +261 -0
- package/tooling/src/loaders.js +18 -0
- package/tooling/src/project-root.js +22 -0
- package/tooling/src/recall/command.js +225 -0
- package/tooling/src/schema-validator.js +30 -0
- package/tooling/src/state-root.js +40 -0
- package/tooling/src/status/command.js +71 -0
- package/wazir.manifest.yaml +135 -0
- package/workflows/README.md +19 -0
- package/workflows/author.md +42 -0
- package/workflows/clarify.md +38 -0
- package/workflows/design-review.md +46 -0
- package/workflows/design.md +44 -0
- package/workflows/discover.md +37 -0
- package/workflows/execute.md +48 -0
- package/workflows/learn.md +38 -0
- package/workflows/plan-review.md +42 -0
- package/workflows/plan.md +39 -0
- package/workflows/prepare-next.md +37 -0
- package/workflows/review.md +40 -0
- package/workflows/run-audit.md +41 -0
- package/workflows/spec-challenge.md +41 -0
- package/workflows/specify.md +38 -0
- package/workflows/verify.md +37 -0
|
@@ -0,0 +1,614 @@
# ML Model Audit: Expertise Module

> An ML model auditor validates correctness, fairness, calibration, and production readiness of machine learning models before and after deployment. The scope spans data quality verification, discrimination and calibration testing, fairness assessment against legal thresholds, interpretability analysis via SHAP, drift detection, and continuous production monitoring.

---

## Why Model Auditing Matters

| Incident | Year | Impact | Root Cause |
|---|---|---|---|
| Knight Capital algorithmic trading | 2012 | $440M loss in 45 minutes | Untested deployment; no rollback, no production monitoring |
| Amazon hiring tool gender bias | 2018 | Scrapped after Reuters exposure | Training data reflected historical hiring bias against women |
| Zillow Zestimate iBuyer model | 2021 | $569M write-down, 2,000 layoffs | Model drift; no recalibration when market shifted |
| COMPAS recidivism scoring | 2016 | ProPublica investigation, litigation | Racial disparity in FPR; Black defendants who did not reoffend were about 2x as likely to be flagged high-risk |

Pattern: each system passed aggregate metrics but failed on a dimension nobody measured: subgroup fairness, calibration under shift, or operational monitoring.

**Key references:**

- Google Model Cards paper (Mitchell et al., 2019): the standard for model documentation.
- EU AI Act (Regulation 2024/1689): four risk tiers; high-risk systems require conformity assessments; penalties up to 7% of global turnover.
- NIST AI RMF 1.0 (2023): four functions (Govern, Map, Measure, Manage).
- US EEOC Uniform Guidelines: the 4/5ths rule for adverse impact.

---

## 10-Domain Audit Framework

| # | Domain | What to Check | Key Metric | Threshold |
|---|---|---|---|---|
| 1 | Documentation | Model cards, data provenance, version history | Completeness | 100% required fields |
| 2 | Data Quality | Distribution, missing values, leakage, duplicates | PSI, missing rate | PSI < 0.1, missing < 5% |
| 3 | Feature Analysis | Importance stability, multicollinearity | SHAP values, VIF | VIF < 5, stable SHAP |
| 4 | Target/Label | Class balance, label noise, label leakage | Imbalance ratio | < 10:1, noise < 2% |
| 5 | Calibration | Predicted probability vs. observed frequency | Hosmer-Lemeshow, Brier | HL p > 0.05, Brier < 0.25 |
| 6 | Discrimination | Separating power for positive/negative classes | AUC-ROC, Gini, KS | AUC > 0.7, KS > 0.3 |
| 7 | Fairness | Protected group parity in outcomes and errors | Disparate impact | > 0.8 (4/5ths rule) |
| 8 | Interpretability | Feature explanations, local and global | SHAP consistency | Stable across samples |
| 9 | Monitoring | Drift detection, performance degradation | PSI per feature | PSI < 0.2, AUC drop < 5% |
| 10 | Business Impact | Decision quality, cost-weighted outcomes | Cost matrix | ROI positive |

**Execution order:** Documentation -> Data Quality -> Feature Analysis -> Target/Label -> Discrimination -> Calibration -> Fairness -> Interpretability -> Monitoring -> Business Impact. Discrimination runs before Calibration here, the reverse of the table numbering. Each domain's findings inform the next: data quality issues invalidate downstream metrics, and miscalibration corrupts fairness results.

---

## Domain 1: Documentation (Model Cards)

```markdown
# Model Card: [Model Name]
## Model Details
- Version, Type, Framework, Training date, Owner
## Intended Use
- Primary use case, Out-of-scope uses, Target population
## Training Data
- Source, Collection period, Size, Preprocessing, Known limitations
## Evaluation Results
| Metric | Train | Validation | Test | Production |
|---|---|---|---|---|
| AUC-ROC / Gini / KS / Brier / PR-AUC | | | | |
## Performance by Subgroup
| Subgroup | N | AUC | FPR | FNR | Disparate Impact |
|---|---|---|---|---|---|
## Ethical Considerations
- Protected attributes evaluated, Fairness metrics, Known biases, Mitigation
## Limitations & Monitoring
- Drift detection method, Retraining trigger, Rollback plan
```

```python
REQUIRED_SECTIONS = [
    'model_details', 'intended_use', 'training_data',
    'evaluation_results', 'performance_by_subgroup',
    'ethical_considerations', 'limitations', 'monitoring',
]
REQUIRED_FIELDS = {
    'model_details': ['version', 'type', 'framework', 'training_date', 'owner'],
    'intended_use': ['primary_use', 'out_of_scope', 'target_population'],
    'training_data': ['source', 'collection_period', 'size', 'preprocessing'],
    'evaluation_results': ['auc_roc', 'gini', 'brier_score'],
    'ethical_considerations': ['protected_attributes', 'fairness_metrics'],
    'monitoring': ['drift_detection', 'retraining_trigger', 'rollback_plan'],
}

def validate_model_card(card: dict) -> dict:
    """Score a model card (dict of section dicts) for completeness; passes only at 100%."""
    missing_sections = [s for s in REQUIRED_SECTIONS if s not in card]
    missing_fields = {}
    for section, fields in REQUIRED_FIELDS.items():
        if section in card:
            # Absent or falsy values (None, '', [], but also 0) count as missing.
            missing = [f for f in fields if not card[section].get(f)]
            if missing:
                missing_fields[section] = missing
    # A missing section is counted once; its fields are not counted again.
    total = len(REQUIRED_SECTIONS) + sum(len(v) for v in REQUIRED_FIELDS.values())
    total_missing = len(missing_sections) + sum(len(v) for v in missing_fields.values())
    completeness = (total - total_missing) / total
    return {'completeness': round(completeness, 3), 'passes': completeness == 1.0,
            'missing_sections': missing_sections, 'missing_fields': missing_fields}
```

---

## Domain 2: Data Quality (PSI and Integrity)

### Population Stability Index (PSI)

| PSI Value | Interpretation | Action |
|---|---|---|
| < 0.1 | No significant shift | Continue monitoring |
| 0.1 - 0.2 | Moderate shift | Investigate, consider recalibration |
| 0.2 - 0.25 | Significant shift | Retrain model |
| > 0.25 | Severe shift | Immediate review, potential rollback |

```python
import numpy as np

def compute_psi(
    expected: np.ndarray, actual: np.ndarray,
    bins: int = 10, method: str = 'quantile',
) -> float:
    """Population Stability Index: measures distributional shift."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    expected = expected[~np.isnan(expected)]
    actual = actual[~np.isnan(actual)]
    if len(expected) == 0 or len(actual) == 0:
        raise ValueError("Input arrays must contain non-NaN values.")

    # Bin edges always come from the reference (expected) sample.
    if method == 'quantile':
        breakpoints = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    elif method == 'uniform':
        breakpoints = np.linspace(expected.min(), expected.max(), bins + 1)
    else:
        raise ValueError(f"Unknown method '{method}'. Use 'quantile' or 'uniform'.")

    if len(breakpoints) < 3:  # Collapsed bins: fall back to uniform
        breakpoints = np.linspace(expected.min(), expected.max(), bins + 1)

    # Clip so out-of-range values land in the edge bins instead of being
    # silently dropped by np.histogram (which would hide exactly the shift we want).
    actual = np.clip(actual, breakpoints[0], breakpoints[-1])
    # Floor at 1e-4 to avoid log(0) and division by zero in empty bins.
    expected_pct = np.clip(np.histogram(expected, bins=breakpoints)[0] / len(expected), 1e-4, None)
    actual_pct = np.clip(np.histogram(actual, bins=breakpoints)[0] / len(actual), 1e-4, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

def compute_feature_psi(expected_df, actual_df, columns=None, bins=10) -> dict:
    """PSI for every numeric feature. Returns {feature: psi} sorted descending, NaN last."""
    if columns is None:
        columns = expected_df.select_dtypes(include=[np.number]).columns.tolist()
    results = {}
    for col in columns:
        try:
            results[col] = compute_psi(expected_df[col].values, actual_df[col].values, bins)
        except ValueError:
            results[col] = float('nan')  # Column was empty or all-NaN
    # NaN does not order reliably, so sort on (is_nan, -psi) instead of psi alone.
    return dict(sorted(results.items(), key=lambda kv: (np.isnan(kv[1]), -kv[1])))
```
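
To make the thresholds concrete, the PSI sum can be worked by hand on a toy example. The sketch below assumes four bins and made-up proportions; `psi_from_proportions` is a hypothetical helper introduced for illustration, not part of the module.

```python
import math

def psi_from_proportions(expected_pct, actual_pct, floor=1e-4):
    """PSI from two lists of bin proportions (hypothetical helper)."""
    total = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e, a = max(e, floor), max(a, floor)  # same floor idea as compute_psi
        total += (a - e) * math.log(a / e)   # one bin's contribution
    return total

# Illustrative proportions only: a uniform reference vs. a skewed sample.
expected = [0.25, 0.25, 0.25, 0.25]
actual = [0.40, 0.30, 0.20, 0.10]
psi = psi_from_proportions(expected, actual)
print(f"PSI = {psi:.3f}")  # about 0.228, above the 0.2 threshold
```

Every bin contributes a non-negative term, so a single badly shifted bin is enough to push the total over a threshold.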

### Data Quality Report

```python
import numpy as np
import pandas as pd

def data_quality_report(df: pd.DataFrame) -> dict:
    """Checks missing values, duplicates, constant columns, infinities, high cardinality."""
    n_rows = len(df)
    missing_pct = (df.isnull().sum() / n_rows * 100).round(2)
    n_dupes = int(df.duplicated().sum())
    constant_cols = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
    numeric = df.select_dtypes(include=[np.number]).columns
    inf_counts = {c: int(np.isinf(df[c]).sum()) for c in numeric if np.isinf(df[c]).any()}
    cat_cols = df.select_dtypes(include=['object', 'category']).columns
    # More distinct values than half the rows suggests an ID-like column.
    high_card = {c: int(df[c].nunique()) for c in cat_cols if df[c].nunique() > 0.5 * n_rows}
    return {
        'shape': df.shape, 'duplicate_rows': n_dupes,
        'columns_above_5pct_missing': [c for c, p in missing_pct.items() if p > 5],
        'constant_columns': constant_cols, 'infinite_values': inf_counts,
        'high_cardinality_categoricals': high_card,
    }
```

---

## Domain 3: Feature Analysis (SHAP and Multicollinearity)

### SHAP Audit

```python
import numpy as np
import shap
import matplotlib.pyplot as plt
from collections import Counter
from pathlib import Path

def _positive_class(shap_values):
    """Reduce binary-classifier SHAP output to the positive class."""
    if isinstance(shap_values, list) and len(shap_values) == 2:
        return shap_values[1]  # One array per class
    if getattr(shap_values, 'ndim', 2) == 3 and shap_values.shape[-1] == 2:
        return shap_values[:, :, 1]  # Assumed layout: (samples, features, classes)
    return shap_values

def run_shap_audit(model, X_test, output_dir: str = 'audit/shap') -> dict:
    """Global SHAP importance, summary plot, top-5 dependence plots."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    tree_types = {'XGBClassifier', 'LGBMClassifier', 'RandomForestClassifier',
                  'GradientBoostingClassifier', 'XGBRegressor', 'LGBMRegressor'}
    if type(model).__name__ in tree_types:
        explainer = shap.TreeExplainer(model)
    else:
        # Model-agnostic fallback: slow, so use a small background sample.
        background = shap.sample(X_test, min(100, len(X_test)))
        explainer = shap.KernelExplainer(model.predict_proba, background)

    shap_values = _positive_class(explainer.shap_values(X_test))

    mean_abs = np.abs(shap_values).mean(axis=0)
    names = X_test.columns.tolist() if hasattr(X_test, 'columns') else [f'f_{i}' for i in range(X_test.shape[1])]
    importance = dict(sorted(zip(names, mean_abs), key=lambda x: x[1], reverse=True))

    shap.summary_plot(shap_values, X_test, show=False)
    plt.savefig(f'{output_dir}/shap_summary.png', dpi=150, bbox_inches='tight')
    plt.close()

    for feat in list(importance.keys())[:5]:
        shap.dependence_plot(feat, shap_values, X_test, show=False)
        plt.savefig(f'{output_dir}/dep_{feat}.png', dpi=150, bbox_inches='tight')
        plt.close()

    return {'feature_importance': importance, 'top_5': list(importance.keys())[:5]}

def shap_consistency_check(model, X_test, n_bootstrap: int = 5, sample_frac: float = 0.8) -> dict:
    """Verify SHAP top-5 rankings are stable across subsamples (tree models only)."""
    explainer = shap.TreeExplainer(model)
    rankings = []
    for i in range(n_bootstrap):
        # Subsample without replacement; the seed makes each draw reproducible.
        sample = X_test.sample(frac=sample_frac, random_state=i)
        sv = _positive_class(explainer.shap_values(sample))
        ranked = np.argsort(-np.abs(sv).mean(axis=0)).tolist()[:5]
        rankings.append(ranked)
    all_top5 = [f for r in rankings for f in r]
    # Stable = column index appears in the top 5 of every subsample.
    stable = [f for f, c in Counter(all_top5).items() if c == n_bootstrap]
    return {'stable_top5': stable, 'stability_ratio': len(stable) / 5, 'passes': len(stable) >= 3}
```

### Variance Inflation Factor (VIF)

```python
import numpy as np
import pandas as pd

def compute_vif(X: pd.DataFrame) -> pd.DataFrame:
    """VIF per feature. VIF > 5 = moderate, > 10 = severe multicollinearity."""
    X_arr = X.values.astype(float)
    vif_data = []
    for i in range(X_arr.shape[1]):
        # Regress feature i on all other features plus an intercept.
        y_i = X_arr[:, i]
        X_i = np.column_stack([np.ones(X_arr.shape[0]), np.delete(X_arr, i, axis=1)])
        try:
            beta = np.linalg.lstsq(X_i, y_i, rcond=None)[0]
            ss_res = np.sum((y_i - X_i @ beta) ** 2)
            ss_tot = np.sum((y_i - y_i.mean()) ** 2)
            r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
            vif = 1 / (1 - r2) if r2 < 1.0 else float('inf')
        except np.linalg.LinAlgError:
            vif = float('inf')
        vif_data.append({'feature': X.columns[i], 'vif': round(vif, 2),
                         'flag': 'SEVERE' if vif > 10 else ('MODERATE' if vif > 5 else 'OK')})
    return pd.DataFrame(vif_data).sort_values('vif', ascending=False)
```
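
The 5 and 10 cut-offs are easier to read once translated back into R². The short sketch below uses only the identity VIF = 1 / (1 - R²); `vif_from_r2` is a hypothetical helper name used for illustration.

```python
def vif_from_r2(r2: float) -> float:
    """VIF = 1 / (1 - R^2); infinite when the feature is perfectly explained."""
    return float('inf') if r2 >= 1.0 else 1.0 / (1.0 - r2)

# R^2 here is how well the other features predict the feature in question.
for r2 in (0.0, 0.5, 0.8, 0.9):
    print(f"R^2 = {r2:.1f} -> VIF = {vif_from_r2(r2):.1f}")
```

A VIF of 5 therefore means the other features explain about 80% of this feature's variance, and a VIF of 10 means about 90%.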

---

## Domain 4: Target/Label Quality

```python
from collections import Counter

import numpy as np

def label_quality_report(y: np.ndarray) -> dict:
    """Assess class balance and recommend resampling strategy."""
    counts = Counter(y)
    total = len(y)
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    ratio = counts[majority] / counts[minority]  # e.g. 9.0 means 9:1
    if ratio < 3:
        strategy, severity = 'none', 'balanced'
    elif ratio < 10:
        strategy, severity = 'class_weight', 'moderate'
    elif ratio < 100:
        strategy, severity = 'SMOTE_or_class_weight', 'severe'
    else:
        strategy, severity = 'anomaly_detection_reframe', 'extreme'
    return {
        'class_distribution': dict(counts), 'imbalance_ratio': round(ratio, 1),
        'severity': severity, 'recommended_strategy': strategy, 'passes': ratio < 10,
    }
```
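
When the report recommends `class_weight`, one common choice is inverse-frequency weighting, `n_samples / (n_classes * class_count)`, which is the heuristic scikit-learn applies for `class_weight='balanced'`. The sketch below computes it by hand; `balanced_class_weights` is a hypothetical helper and the 9:1 label vector is made up.

```python
from collections import Counter

def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}

# Illustrative labels: 900 negatives, 100 positives (a 9:1 imbalance).
y = [0] * 900 + [1] * 100
weights = balanced_class_weights(y)
print(weights)  # the minority class gets the larger weight
```

With these weights each class contributes the same total weight to the loss, which is the point of the correction.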

---

## Domain 5: Calibration Testing

### Hosmer-Lemeshow Test

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow_test(y_true: np.ndarray, y_prob: np.ndarray, n_groups: int = 10) -> dict:
    """Goodness-of-fit test. H0: model is well-calibrated. Reject if p < 0.05."""
    # Sort by predicted probability, then cut into equal-sized risk groups.
    order = np.argsort(y_prob)
    y_true_s = np.asarray(y_true, dtype=float)[order]
    y_prob_s = np.asarray(y_prob, dtype=float)[order]
    groups = np.array_split(np.arange(len(y_true_s)), n_groups)
    hl_stat = 0.0
    group_details = []
    for idx in groups:
        n_g = len(idx)
        obs = y_true_s[idx].sum()  # Observed positives in the group
        exp = y_prob_s[idx].sum()  # Expected positives in the group
        if exp > 0:
            hl_stat += (obs - exp) ** 2 / exp
        if (n_g - exp) > 0:  # Same term for the negatives
            hl_stat += (n_g - obs - (n_g - exp)) ** 2 / (n_g - exp)
        group_details.append({'n': n_g, 'observed_rate': round(float(obs / n_g), 4),
                              'predicted_rate': round(float(y_prob_s[idx].mean()), 4)})
    # Chi-square with n_groups - 2 degrees of freedom; sf is the upper tail, 1 - cdf.
    p_value = float(chi2.sf(hl_stat, n_groups - 2))
    return {
        'statistic': round(float(hl_stat), 4), 'p_value': round(p_value, 4),
        'group_details': group_details, 'passes': p_value > 0.05,
        'interpretation': 'Well calibrated' if p_value > 0.05
                          else 'Miscalibrated: consider Platt scaling or isotonic regression',
    }
```

### Calibration Curve and Brier Score

```python
from pathlib import Path

import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

def calibration_audit(y_true, y_prob, n_bins=10, output_dir='audit/calibration') -> dict:
    """Brier score + reliability curve plot. Brier < 0.25 = acceptable."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    brier = brier_score_loss(y_true, y_prob)
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=n_bins, strategy='uniform')

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    # Left panel: reliability curve against the diagonal of perfect calibration.
    ax1.plot([0, 1], [0, 1], 'k--', label='Perfect')
    ax1.plot(mean_pred, frac_pos, 'o-', label=f'Model (Brier={brier:.4f})')
    ax1.set_xlabel('Mean predicted')
    ax1.set_ylabel('Fraction positive')
    ax1.set_title('Calibration Curve')
    ax1.legend()
    # Right panel: histogram of predicted probabilities.
    ax2.hist(y_prob, bins=50, alpha=0.7)
    ax2.set_title('Prediction Distribution')
    plt.tight_layout()
    plt.savefig(f'{output_dir}/calibration_plot.png', dpi=150, bbox_inches='tight')
    plt.close()

    return {'brier_score': round(brier, 4), 'passes': brier < 0.25,
            'calibration_bins': {'predicted': mean_pred.tolist(), 'observed': frac_pos.tolist()}}
```
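
It may help to see what the 0.25 Brier threshold corresponds to. A model that always predicts 0.5 scores exactly 0.25 on any labels, while a model that always predicts the base rate π scores π(1 - π). The sketch below checks both on a made-up label vector with a 10% base rate; `brier_score` is a hand-rolled stand-in written for illustration, not the scikit-learn function.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Illustrative labels: one positive in ten (10% base rate).
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
coin_flip = [0.5] * len(y_true)   # ignores the data entirely
base_rate = [0.1] * len(y_true)   # always predicts the prevalence

print(f"always 0.5:       {brier_score(y_true, coin_flip):.3f}")
print(f"always base rate: {brier_score(y_true, base_rate):.3f}")
```

On imbalanced data a constant predictor can sit well below 0.25, so it seems safer to compare the model's Brier score against the base-rate reference than against the fixed threshold alone.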

---

## Domain 6: Discrimination Metrics

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, roc_curve
from scipy.stats import ks_2samp

def discrimination_report(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """AUC-ROC, Gini, KS statistic, PR-AUC with grading."""
    # Boolean masks below need arrays, not lists.
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    auc = roc_auc_score(y_true, y_prob)
    gini = 2 * auc - 1
    # KS: largest gap between the score distributions of positives and negatives.
    ks_stat = ks_2samp(y_prob[y_true == 1], y_prob[y_true == 0]).statistic
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    # Threshold where TPR - FPR peaks, the same point the KS statistic measures.
    optimal_threshold = float(thresholds[np.argmax(tpr - fpr)])
    pr_auc = average_precision_score(y_true, y_prob)
    if auc >= 0.9:
        grade = 'EXCELLENT'
    elif auc >= 0.8:
        grade = 'GOOD'
    elif auc >= 0.7:
        grade = 'ACCEPTABLE'
    elif auc >= 0.6:
        grade = 'POOR'
    else:
        grade = 'FAIL'
    return {'AUC-ROC': round(auc, 4), 'Gini': round(gini, 4), 'KS': round(ks_stat, 4),
            'KS_optimal_threshold': round(optimal_threshold, 4), 'PR-AUC': round(pr_auc, 4),
            'grade': grade, 'passes': auc > 0.7}
```
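
The three headline metrics can be checked by hand on a small sample. The sketch below is a pure-Python illustration, assuming binary 0/1 labels and made-up scores: AUC is the share of positive/negative pairs ranked correctly, Gini is `2 * AUC - 1`, and KS is the largest gap between the two empirical score CDFs. `auc_by_pairs` and `ks_statistic` are hypothetical helper names, not library functions.

```python
def auc_by_pairs(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(y_true, y_score):
    """Largest gap between the score CDFs of positives and negatives."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    gaps = []
    for t in sorted(set(y_score)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        gaps.append(abs(cdf_pos - cdf_neg))
    return max(gaps)

# Illustrative data: one negative (0.6) outranks one positive (0.35).
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]

auc = auc_by_pairs(y_true, y_score)
gini = 2 * auc - 1
ks = ks_statistic(y_true, y_score)
print(auc, gini, ks)  # 0.9375 0.875 0.75
```

Fifteen of the sixteen pairs are ordered correctly, which gives the AUC of 0.9375. The pair loop is quadratic, so this form suits spot checks rather than production use.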

---

## Domain 7: Fairness Assessment

**Legal context:**

- US EEOC 4/5ths rule: the selection rate for a protected group must be >= 80% of the rate for the highest-rate group.
- EU AI Act Article 10: high-risk systems must use representative training data and examine it for biases.
- ECOA/Reg B: prohibits discrimination in credit by race, sex, age, etc.

### Disparate Impact and Equalized Odds

```python
import numpy as np

def disparate_impact_ratio(y_pred: np.ndarray, protected_attr: np.ndarray) -> dict:
    """4/5ths rule: ratio >= 0.8 for all groups."""
    y_pred, protected_attr = np.asarray(y_pred), np.asarray(protected_attr)
    groups = np.unique(protected_attr)
    # Selection rate = share of positive predictions within each group.
    rates = {str(g): float(y_pred[protected_attr == g].mean()) for g in groups}
    max_rate = max(rates.values())
    results = {}
    for g, rate in rates.items():
        ratio = rate / max_rate if max_rate > 0 else 0.0
        results[g] = {'rate': round(rate, 4), 'ratio': round(ratio, 4), 'passes': ratio >= 0.8}
    return {'group_results': results, 'overall_passes': all(r['passes'] for r in results.values())}

def equalized_odds_check(y_true, y_pred, protected_attr, threshold=0.05) -> dict:
    """FPR and TPR should be similar across groups (within threshold)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    protected_attr = np.asarray(protected_attr)
    groups = np.unique(protected_attr)
    metrics = {}
    for g in groups:
        mask = protected_attr == g
        yt, yp = y_true[mask], y_pred[mask]
        tp = ((yt == 1) & (yp == 1)).sum()
        fn = ((yt == 1) & (yp == 0)).sum()
        fp = ((yt == 0) & (yp == 1)).sum()
        tn = ((yt == 0) & (yp == 0)).sum()
        # Rates default to 0.0 when a group has no positives (or no negatives).
        tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0
        metrics[str(g)] = {'TPR': round(float(tpr), 4), 'FPR': round(float(fpr), 4)}
    tpr_gap = max(m['TPR'] for m in metrics.values()) - min(m['TPR'] for m in metrics.values())
    fpr_gap = max(m['FPR'] for m in metrics.values()) - min(m['FPR'] for m in metrics.values())
    return {'group_metrics': metrics, 'TPR_gap': round(tpr_gap, 4), 'FPR_gap': round(fpr_gap, 4),
            'passes_equalized_odds': tpr_gap <= threshold and fpr_gap <= threshold}
```
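
A worked example of the 4/5ths arithmetic may make the pass/fail boundary clearer. The sketch below starts from aggregate counts rather than per-row predictions; the group names and counts are made up, and `four_fifths_check` is a hypothetical helper.

```python
def four_fifths_check(selected, totals, threshold=0.8):
    """Compare each group's selection rate with the highest group's rate."""
    rates = {g: selected[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {
        g: {'rate': r,
            'ratio': r / best if best > 0 else 0.0,
            'passes': (r / best if best > 0 else 0.0) >= threshold}
        for g, r in rates.items()
    }

# Illustrative counts: 60 of 100 selected in group_a, 40 of 100 in group_b.
selected = {'group_a': 60, 'group_b': 40}
totals = {'group_a': 100, 'group_b': 100}
result = four_fifths_check(selected, totals)
for group, row in result.items():
    print(group, round(row['ratio'], 3), row['passes'])
```

Here group_b's rate is 0.4 / 0.6, about 0.667 of group_a's, so it falls below the 0.8 line. It would need a selection rate of at least 0.48 to pass.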

---

## Domain 8: Interpretability

**EU AI Act risk tiers and the interpretability this module expects for each** (the Act itself does not name specific techniques such as SHAP or LIME):

| Risk Level | Examples | Required Interpretability |
|---|---|---|
| Unacceptable | Social scoring, real-time biometric | Prohibited |
| High | Credit, hiring, criminal justice | Full SHAP/LIME, per-prediction explanations, human-in-the-loop |
| Limited | Chatbots, recommendations | Transparency obligations |
| Minimal | Spam filters, game AI | Best practice only |

### Local Explanation Stability

```python
import numpy as np
from scipy.stats import spearmanr

def explanation_stability_test(explainer, instance, n_perturbations=20, noise_scale=0.01) -> dict:
    """Test if local explanations are stable under small input perturbations."""
    base = explainer.shap_values(instance.reshape(1, -1))
    if isinstance(base, list):
        base = base[1]  # Positive class when one array per class is returned
    base = base.flatten()
    correlations = []
    for i in range(n_perturbations):
        # Seeded Gaussian noise keeps the test reproducible.
        noise = np.random.RandomState(i).normal(0, noise_scale, size=instance.shape)
        sv = explainer.shap_values((instance + noise).reshape(1, -1))
        if isinstance(sv, list):
            sv = sv[1]
        # Rank correlation between the base and perturbed attributions.
        corr, _ = spearmanr(base, sv.flatten())
        correlations.append(corr)
    mean_corr = float(np.mean(correlations))
    return {'mean_rank_correlation': round(mean_corr, 4),
            'min_rank_correlation': round(float(np.min(correlations)), 4),
            'passes': mean_corr > 0.8 and min(correlations) > 0.5}
```

---

## Domain 9: Production Monitoring Pipeline

```
         Alerting Layer
┌─────────────────────────────────┐
│ PSI > 0.2     → PagerDuty       │
│ AUC drop > 5% → Slack           │
│ Label drift   → Email           │
└──────────┬──────────────────────┘
           │
┌──────────▼──────────────────────┐
│ Drift Detection Engine          │
│ Feature PSI | Prediction shift  │
│ Label drift | Rolling AUC       │
└──────────┬──────────────────────┘
           │
┌──────────▼──────────────────────┐
│ Scoring Pipeline                │
│ Data → Features → Model → Score │
│ (log each stage for monitoring) │
└─────────────────────────────────┘
```

**Alerting thresholds:**

| Metric | Yellow (Investigate) | Red (Action Required) |
|---|---|---|
| Feature PSI (any feature) | > 0.1 | > 0.2 |
| Prediction PSI | > 0.1 | > 0.2 |
| Rolling AUC (7-day) | < baseline - 3% | < baseline - 5% |
| Missing value rate | > 2x training rate | > 5x training rate |
| Prediction volume | < 50% normal | < 20% normal |

```python
from dataclasses import dataclass
from typing import Optional

from sklearn.metrics import roc_auc_score

# compute_psi is the function defined under Domain 2.

@dataclass
class MonitoringResult:
    metric_name: str
    current_value: float
    threshold_yellow: float
    threshold_red: float
    status: str  # GREEN, YELLOW, RED
    details: Optional[str] = None

class ModelMonitor:
    """Production monitoring: feature drift, prediction drift, performance degradation."""
    def __init__(self, reference_features, reference_predictions, feature_names, baseline_auc):
        self.ref_features = reference_features        # 2-D array, one column per feature
        self.ref_predictions = reference_predictions  # 1-D array of reference scores
        self.feature_names = feature_names
        self.baseline_auc = baseline_auc

    def check_feature_drift(self, current_features) -> list:
        results = []
        for i, name in enumerate(self.feature_names):
            psi = compute_psi(self.ref_features[:, i], current_features[:, i])
            status = 'RED' if psi > 0.2 else ('YELLOW' if psi > 0.1 else 'GREEN')
            results.append(MonitoringResult(f'psi_{name}', round(psi, 4), 0.1, 0.2, status))
        return results

    def check_prediction_drift(self, current_predictions) -> MonitoringResult:
        psi = compute_psi(self.ref_predictions, current_predictions)
        status = 'RED' if psi > 0.2 else ('YELLOW' if psi > 0.1 else 'GREEN')
        return MonitoringResult('prediction_psi', round(psi, 4), 0.1, 0.2, status)

    def check_performance(self, y_true, y_prob) -> MonitoringResult:
        current_auc = roc_auc_score(y_true, y_prob)
        # Absolute drop in AUC points (0.05 = five points), not a relative change.
        drop = self.baseline_auc - current_auc
        status = 'RED' if drop > 0.05 else ('YELLOW' if drop > 0.03 else 'GREEN')
        return MonitoringResult('rolling_auc', round(current_auc, 4),
                                round(self.baseline_auc - 0.03, 4),
                                round(self.baseline_auc - 0.05, 4), status,
                                f'drop={drop:.4f}')

    def run_full_check(self, current_features, current_predictions,
                       y_true=None, y_prob=None) -> dict:
        results = self.check_feature_drift(current_features)
        results.append(self.check_prediction_drift(current_predictions))
        # Performance needs ground-truth labels, which often arrive late.
        if y_true is not None and y_prob is not None:
            results.append(self.check_performance(y_true, y_prob))
        statuses = [r.status for r in results]
        overall = 'RED' if 'RED' in statuses else ('YELLOW' if 'YELLOW' in statuses else 'GREEN')
        return {
            'overall_status': overall, 'n_checks': len(results),
            'red_alerts': [{'metric': r.metric_name, 'value': r.current_value}
                           for r in results if r.status == 'RED'],
            'yellow_alerts': [{'metric': r.metric_name, 'value': r.current_value}
                              for r in results if r.status == 'YELLOW'],
        }
```

---

## Domain 10: Business Impact

```python
import numpy as np

def cost_matrix_evaluation(y_true, y_pred, cost_tp=0.0, cost_fp=-100.0,
                           cost_fn=-500.0, cost_tn=0.0) -> dict:
    """Evaluate model using business cost matrix. Defaults: fraud detection scenario."""
    # Costs are signed payoffs: negative values are losses, so higher totals are better.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())
    tn = int(((y_true == 0) & (y_pred == 0)).sum())
    total_cost = tp * cost_tp + fp * cost_fp + fn * cost_fn + tn * cost_tn
    # Baseline without a model: every case is treated as negative.
    baseline = y_true.sum() * cost_fn + (len(y_true) - y_true.sum()) * cost_tn
    net_benefit = total_cost - baseline
    return {'confusion_matrix': {'TP': tp, 'FP': fp, 'FN': fn, 'TN': tn},
            'total_cost': round(float(total_cost), 2),
            'baseline_no_model': round(float(baseline), 2),
            'net_benefit': round(float(net_benefit), 2),
            'roi_positive': bool(net_benefit > 0)}
```
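
The arithmetic behind the cost matrix can be checked by hand. The sketch below works through one case with the default fraud-detection costs; the confusion-matrix counts are hypothetical and chosen only to make the numbers easy to follow. Because costs are signed payoffs, a positive `net_benefit` means the model loses less than the no-model baseline.

```python
# Hypothetical counts for 10,000 transactions (illustrative, not from real data).
tp, fp, fn, tn = 80, 50, 20, 9850
cost_tp, cost_fp, cost_fn, cost_tn = 0.0, -100.0, -500.0, 0.0

# With the model: 50 false positives and 20 missed frauds.
total_cost = tp * cost_tp + fp * cost_fp + fn * cost_fn + tn * cost_tn

# Without a model every case is treated as negative, so all frauds are missed.
positives = tp + fn
negatives = fp + tn
baseline = positives * cost_fn + negatives * cost_tn

net_benefit = total_cost - baseline
print(total_cost, baseline, net_benefit)
```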

---

## Anti-Patterns

### 1. Training on test data (data leakage)
Features include information unavailable at prediction time. Model appears brilliant in eval, fails in production. **Detect:** suspiciously high single-feature importance; future data in features; scaler fit before train/test split.
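
The "scaler fit before train/test split" signal can be illustrated with a minimal sketch. It uses only the standard library, and the helper names `fit_scaler` and `transform` are hypothetical; in practice a library scaler would play the same role. The point is that scaling statistics must come from the training rows only.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn (mean, std) from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero on constant columns
    return mu, sigma

def transform(values, mu, sigma):
    """Standardize values with previously learned statistics."""
    return [(v - mu) / sigma for v in values]

train = [1.0, 2.0, 3.0, 4.0]
test = [100.0, 110.0]

# Leaky: statistics are computed on train + test, so test information
# flows into the features the model is trained on.
leaky_mu, leaky_sigma = fit_scaler(train + test)

# Correct: fit on train only, then apply the same statistics to test.
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)
```

Comparing `leaky_mu` with `mu` shows how far the test rows pull the statistics when the scaler is fit too early.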

### 2. Optimizing aggregate metrics only
Overall AUC 0.85, minority subgroup AUC 0.55. Aggregate masks subgroup failure. **Prevent:** always stratify metrics by protected attributes, geography, business segments (Domain 7).

### 3. Deploy and forget
Model degrades silently as distributions shift. Zillow's $569M write-down is the canonical example. **Prevent:** implement monitoring pipeline (Domain 9). No model ships without drift detection.

### 4. Fairness washing
Computing disparate impact to check a box, taking no action when ratios fall below 0.8. **Prevent:** fairness metrics must have automated deployment gates, same as failing tests.
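
An automated gate of this kind could look like the sketch below. It assumes disparate impact is computed as the lowest group selection rate divided by the highest, which is one common convention; the function names, the exception class, and the input shape are all hypothetical.

```python
class FairnessGateError(Exception):
    """Raised when the disparate impact ratio falls below the gate threshold."""

def disparate_impact(selected_by_group):
    """selected_by_group maps group -> (n_selected, n_total).

    Returns the lowest selection rate divided by the highest.
    """
    rates = [selected / total for selected, total in selected_by_group.values()]
    highest = max(rates)
    return min(rates) / highest if highest > 0 else 1.0

def fairness_gate(selected_by_group, min_ratio=0.8):
    """Fail the deployment, like a failing test, when the ratio is below min_ratio."""
    ratio = disparate_impact(selected_by_group)
    if ratio < min_ratio:
        raise FairnessGateError(f"disparate impact {ratio:.2f} is below {min_ratio}")
    return ratio
```

Calling `fairness_gate({'a': (50, 100), 'b': (30, 100)})` raises, because the ratio of 0.6 is below 0.8, so a CI job running the gate would block the release.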

### 5. Overfitting to validation set
After 200 hyperparameter tuning rounds against the validation set, the selected configuration is effectively fit to it. **Prevent:** holdout test evaluated once at the end; use cross-validation for hyperparameter search.

### 6. Ignoring class imbalance
Predicting "not fraud" for every transaction yields 99.5% accuracy on 0.5% fraud data. **Prevent:** if imbalance > 10:1, accuracy is misleading. Use PR-AUC, F1, or cost-weighted metrics.
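
The 99.5% figure is easy to reproduce. This sketch builds a hypothetical 1,000-transaction dataset with 0.5% fraud and scores a model that always predicts "not fraud"; accuracy looks excellent while recall shows that no fraud is caught.

```python
# Hypothetical data: 5 fraud cases (1) among 1,000 transactions.
y_true = [1] * 5 + [0] * 995
y_pred = [0] * 1000  # always predict "not fraud"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```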

### 7. Single metric obsession
AUC is 0.92, but a predicted probability of 0.7 corresponds to a 30% actual event rate, so every threshold decision is wrong. **Prevent:** always audit calibration (Domain 5) alongside discrimination (Domain 6).

### 8. Missing data provenance
Cannot reproduce training dataset six months later for a regulator. **Prevent:** version training data alongside model artifacts. Record query, filters, date range, random seed.

### 9. Uncalibrated probability usage
Random forest `predict_proba` treated as true probability for risk tiers. RF outputs are averages over trees (vote or leaf-class fractions), not calibrated probabilities. **Prevent:** calibrate with Platt scaling or isotonic regression; test with Hosmer-Lemeshow.

### 10. Threshold selection on training data
Operating threshold optimized on training set; production distribution differs. **Prevent:** select thresholds on validation set using business cost matrix; re-evaluate periodically.
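
Selecting a threshold on validation data with a cost matrix could be sketched as follows. The function names and the toy validation data are hypothetical; the costs reuse the signed-payoff convention from Domain 10 (negative values are losses), so the best threshold is the one with the highest total.

```python
def expected_cost(y_true, y_prob, threshold, cost_fp=-100.0, cost_fn=-500.0):
    """Total signed payoff at a threshold; TP and TN are assumed to cost 0."""
    total = 0.0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 0:
            total += cost_fp  # false positive
        elif predicted == 0 and truth == 1:
            total += cost_fn  # false negative
    return total

def select_threshold(y_val, p_val, candidates):
    """Pick the candidate with the highest payoff on the validation set."""
    return max(candidates, key=lambda th: expected_cost(y_val, p_val, th))

# Hypothetical validation labels and predicted probabilities.
y_val = [0, 0, 0, 1, 1, 0, 1, 0]
p_val = [0.1, 0.2, 0.35, 0.4, 0.8, 0.6, 0.9, 0.05]
best = select_threshold(y_val, p_val, [0.3, 0.5, 0.7])
```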

---

## Recalibration Strategies

When calibration fails (Hosmer-Lemeshow p < 0.05 or Brier > 0.25), apply one of these post-hoc methods:

| Method | When to Use | Pros | Cons |
|---|---|---|---|
| Platt Scaling | Binary classification, sigmoid-shaped miscalibration | Simple, works well for SVMs and neural nets | Assumes sigmoid relationship |
| Isotonic Regression | Non-parametric miscalibration | No shape assumption, flexible | Requires more data, can overfit on small sets |
| Beta Calibration | Skewed prediction distributions | Handles asymmetric miscalibration | More complex, less widely supported |
| Temperature Scaling | Neural network confidence calibration | Single parameter, preserves ranking | Only adjusts sharpness, not shape |

Always recalibrate on a held-out calibration set (not training or test). Re-run Hosmer-Lemeshow after recalibration to confirm improvement.
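
To make the isotonic regression row concrete, here is a minimal pool-adjacent-violators sketch in plain Python. It is illustrative only: the function name and toy calibration data are hypothetical, and a production pipeline would more likely rely on a library implementation such as scikit-learn's `IsotonicRegression`. The fit is done on held-out calibration scores and labels, and the result is a non-decreasing mapping from score to observed event rate.

```python
def isotonic_fit(scores, labels):
    """Pool-adjacent-violators fit.

    Returns (sorted_scores, fitted_values), where fitted_values is
    non-decreasing in the score.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum_of_labels, count]
    for _, label in pairs:
        blocks.append([float(label), 1])
        # Merge backwards while the previous block's mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            label_sum, count = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
    fitted = []
    for label_sum, count in blocks:
        fitted.extend([label_sum / count] * count)
    return [score for score, _ in pairs], fitted

# Hypothetical held-out calibration set.
cal_scores = [0.1, 0.3, 0.5, 0.7, 0.9]
cal_labels = [0, 1, 0, 1, 1]
xs, calibrated = isotonic_fit(cal_scores, cal_labels)
```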

---

## Deployment Audit Checklist

| # | Check | Pass Criteria |
|---|---|---|
| 1 | Model card complete | All required sections filled |
| 2 | Data quality | No columns > 5% missing, no leakage |
| 3 | Feature VIF | < 5 for all features (or justified) |
| 4 | Class imbalance | < 10:1 (or mitigation documented) |
| 5 | Calibration | Hosmer-Lemeshow p > 0.05 |
| 6 | Discrimination | AUC-ROC > 0.7 on holdout |
| 7 | Fairness | Disparate impact > 0.8 all groups |
| 8 | SHAP stability | Rankings stable across bootstrap |
| 9 | Monitoring | Pipeline deployed with PSI alerts |
| 10 | Business impact | Cost matrix shows positive ROI |
| 11 | Data versioning | Training data reproducible |
| 12 | Rollback plan | Documented and tested |
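
The quantitative rows of the checklist (2 through 7) lend themselves to an automated gate. The sketch below is one possible shape, not a prescribed implementation: the metric names and the `audit` function are hypothetical, and the qualitative rows (model card, rollback plan, and so on) still need human sign-off. The "or justified" and "or mitigation documented" exceptions are not modeled.

```python
# Hypothetical metric names; thresholds mirror checklist rows 2 to 7.
CHECKS = {
    'max_missing_fraction': lambda v: v <= 0.05,   # row 2: no column > 5% missing
    'max_vif': lambda v: v < 5,                    # row 3
    'imbalance_ratio': lambda v: v < 10,           # row 4: majority:minority
    'hosmer_lemeshow_p': lambda v: v > 0.05,       # row 5
    'holdout_auc': lambda v: v > 0.7,              # row 6
    'min_disparate_impact': lambda v: v > 0.8,     # row 7
}

def audit(metrics):
    """Return the names of failed or missing checks; an empty list means all pass."""
    failures = []
    for name, passes in CHECKS.items():
        if name not in metrics or not passes(metrics[name]):
            failures.append(name)
    return failures

# Hypothetical audit inputs for a candidate model.
candidate = {
    'max_missing_fraction': 0.02, 'max_vif': 3.1, 'imbalance_ratio': 8.0,
    'hosmer_lemeshow_p': 0.21, 'holdout_auc': 0.68, 'min_disparate_impact': 0.85,
}
failed = audit(candidate)
```

Here `failed` contains only `'holdout_auc'`, since 0.68 does not exceed the 0.7 bar; a missing metric is treated as a failure rather than silently skipped.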