@wazir-dev/cli 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (629)
  1. package/AGENTS.md +111 -0
  2. package/CHANGELOG.md +14 -0
  3. package/CONTRIBUTING.md +101 -0
  4. package/LICENSE +21 -0
  5. package/README.md +314 -0
  6. package/assets/composition-engine.mmd +34 -0
  7. package/assets/demo-script.sh +17 -0
  8. package/assets/logo-dark.svg +14 -0
  9. package/assets/logo.svg +14 -0
  10. package/assets/pipeline.mmd +39 -0
  11. package/assets/record-demo.sh +51 -0
  12. package/docs/README.md +51 -0
  13. package/docs/adapters/context-mode.md +60 -0
  14. package/docs/concepts/architecture.md +87 -0
  15. package/docs/concepts/artifact-model.md +60 -0
  16. package/docs/concepts/composition-engine.md +36 -0
  17. package/docs/concepts/indexing-and-recall.md +160 -0
  18. package/docs/concepts/observability.md +41 -0
  19. package/docs/concepts/roles-and-workflows.md +59 -0
  20. package/docs/concepts/terminology-policy.md +27 -0
  21. package/docs/getting-started/01-installation.md +78 -0
  22. package/docs/getting-started/02-first-run.md +102 -0
  23. package/docs/getting-started/03-adding-to-project.md +15 -0
  24. package/docs/getting-started/04-host-setup.md +15 -0
  25. package/docs/guides/ci-integration.md +15 -0
  26. package/docs/guides/creating-skills.md +15 -0
  27. package/docs/guides/expertise-module-authoring.md +15 -0
  28. package/docs/guides/hook-development.md +15 -0
  29. package/docs/guides/memory-and-learnings.md +34 -0
  30. package/docs/guides/multi-host-export.md +15 -0
  31. package/docs/guides/troubleshooting.md +101 -0
  32. package/docs/guides/writing-custom-roles.md +15 -0
  33. package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
  34. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
  35. package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
  36. package/docs/readmes/INDEX.md +99 -0
  37. package/docs/readmes/features/expertise/README.md +171 -0
  38. package/docs/readmes/features/exports/README.md +222 -0
  39. package/docs/readmes/features/hooks/README.md +103 -0
  40. package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
  41. package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
  42. package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
  43. package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
  44. package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
  45. package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
  46. package/docs/readmes/features/hooks/session-start.md +119 -0
  47. package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
  48. package/docs/readmes/features/roles/README.md +157 -0
  49. package/docs/readmes/features/roles/clarifier.md +152 -0
  50. package/docs/readmes/features/roles/content-author.md +190 -0
  51. package/docs/readmes/features/roles/designer.md +193 -0
  52. package/docs/readmes/features/roles/executor.md +184 -0
  53. package/docs/readmes/features/roles/learner.md +210 -0
  54. package/docs/readmes/features/roles/planner.md +182 -0
  55. package/docs/readmes/features/roles/researcher.md +164 -0
  56. package/docs/readmes/features/roles/reviewer.md +184 -0
  57. package/docs/readmes/features/roles/specifier.md +162 -0
  58. package/docs/readmes/features/roles/verifier.md +215 -0
  59. package/docs/readmes/features/schemas/README.md +178 -0
  60. package/docs/readmes/features/skills/README.md +63 -0
  61. package/docs/readmes/features/skills/brainstorming.md +96 -0
  62. package/docs/readmes/features/skills/debugging.md +148 -0
  63. package/docs/readmes/features/skills/design.md +120 -0
  64. package/docs/readmes/features/skills/prepare-next.md +109 -0
  65. package/docs/readmes/features/skills/run-audit.md +159 -0
  66. package/docs/readmes/features/skills/scan-project.md +109 -0
  67. package/docs/readmes/features/skills/self-audit.md +176 -0
  68. package/docs/readmes/features/skills/tdd.md +137 -0
  69. package/docs/readmes/features/skills/using-skills.md +92 -0
  70. package/docs/readmes/features/skills/verification.md +120 -0
  71. package/docs/readmes/features/skills/writing-plans.md +104 -0
  72. package/docs/readmes/features/tooling/README.md +320 -0
  73. package/docs/readmes/features/workflows/README.md +186 -0
  74. package/docs/readmes/features/workflows/author.md +181 -0
  75. package/docs/readmes/features/workflows/clarify.md +154 -0
  76. package/docs/readmes/features/workflows/design-review.md +171 -0
  77. package/docs/readmes/features/workflows/design.md +169 -0
  78. package/docs/readmes/features/workflows/discover.md +162 -0
  79. package/docs/readmes/features/workflows/execute.md +173 -0
  80. package/docs/readmes/features/workflows/learn.md +167 -0
  81. package/docs/readmes/features/workflows/plan-review.md +165 -0
  82. package/docs/readmes/features/workflows/plan.md +170 -0
  83. package/docs/readmes/features/workflows/prepare-next.md +167 -0
  84. package/docs/readmes/features/workflows/review.md +169 -0
  85. package/docs/readmes/features/workflows/run-audit.md +191 -0
  86. package/docs/readmes/features/workflows/spec-challenge.md +159 -0
  87. package/docs/readmes/features/workflows/specify.md +160 -0
  88. package/docs/readmes/features/workflows/verify.md +177 -0
  89. package/docs/readmes/packages/README.md +50 -0
  90. package/docs/readmes/packages/ajv.md +117 -0
  91. package/docs/readmes/packages/context-mode.md +118 -0
  92. package/docs/readmes/packages/gray-matter.md +116 -0
  93. package/docs/readmes/packages/node-test.md +137 -0
  94. package/docs/readmes/packages/yaml.md +112 -0
  95. package/docs/reference/configuration-reference.md +159 -0
  96. package/docs/reference/expertise-index.md +52 -0
  97. package/docs/reference/git-flow.md +43 -0
  98. package/docs/reference/hooks.md +87 -0
  99. package/docs/reference/host-exports.md +50 -0
  100. package/docs/reference/launch-checklist.md +172 -0
  101. package/docs/reference/marketplace-listings.md +76 -0
  102. package/docs/reference/release-process.md +34 -0
  103. package/docs/reference/roles-reference.md +77 -0
  104. package/docs/reference/skills.md +33 -0
  105. package/docs/reference/templates.md +29 -0
  106. package/docs/reference/tooling-cli.md +94 -0
  107. package/docs/truth-claims.yaml +222 -0
  108. package/expertise/PROGRESS.md +63 -0
  109. package/expertise/README.md +18 -0
  110. package/expertise/antipatterns/PROGRESS.md +56 -0
  111. package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
  112. package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
  113. package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
  114. package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
  115. package/expertise/antipatterns/backend/index.md +24 -0
  116. package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
  117. package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
  118. package/expertise/antipatterns/code/async-antipatterns.md +622 -0
  119. package/expertise/antipatterns/code/code-smells.md +1186 -0
  120. package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
  121. package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
  122. package/expertise/antipatterns/code/index.md +27 -0
  123. package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
  124. package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
  125. package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
  126. package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
  127. package/expertise/antipatterns/design/dark-patterns.md +1121 -0
  128. package/expertise/antipatterns/design/index.md +22 -0
  129. package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
  130. package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
  131. package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
  132. package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
  133. package/expertise/antipatterns/frontend/index.md +23 -0
  134. package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
  135. package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
  136. package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
  137. package/expertise/antipatterns/index.md +31 -0
  138. package/expertise/antipatterns/performance/index.md +20 -0
  139. package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
  140. package/expertise/antipatterns/performance/premature-optimization.md +623 -0
  141. package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
  142. package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
  143. package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
  144. package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
  145. package/expertise/antipatterns/process/index.md +23 -0
  146. package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
  147. package/expertise/antipatterns/security/index.md +20 -0
  148. package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
  149. package/expertise/antipatterns/security/security-theater.md +843 -0
  150. package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
  151. package/expertise/architecture/PROGRESS.md +70 -0
  152. package/expertise/architecture/data/caching-architecture.md +671 -0
  153. package/expertise/architecture/data/data-consistency.md +574 -0
  154. package/expertise/architecture/data/data-modeling.md +536 -0
  155. package/expertise/architecture/data/event-streams-and-queues.md +634 -0
  156. package/expertise/architecture/data/index.md +25 -0
  157. package/expertise/architecture/data/search-architecture.md +663 -0
  158. package/expertise/architecture/data/sql-vs-nosql.md +708 -0
  159. package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
  160. package/expertise/architecture/decisions/build-vs-buy.md +616 -0
  161. package/expertise/architecture/decisions/index.md +23 -0
  162. package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
  163. package/expertise/architecture/decisions/technology-selection.md +616 -0
  164. package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
  165. package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
  166. package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
  167. package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
  168. package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
  169. package/expertise/architecture/distributed/index.md +25 -0
  170. package/expertise/architecture/distributed/saga-pattern.md +797 -0
  171. package/expertise/architecture/foundations/architectural-thinking.md +460 -0
  172. package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
  173. package/expertise/architecture/foundations/design-principles-solid.md +649 -0
  174. package/expertise/architecture/foundations/domain-driven-design.md +719 -0
  175. package/expertise/architecture/foundations/index.md +25 -0
  176. package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
  177. package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
  178. package/expertise/architecture/index.md +34 -0
  179. package/expertise/architecture/integration/api-design-graphql.md +638 -0
  180. package/expertise/architecture/integration/api-design-grpc.md +804 -0
  181. package/expertise/architecture/integration/api-design-rest.md +892 -0
  182. package/expertise/architecture/integration/index.md +25 -0
  183. package/expertise/architecture/integration/third-party-integration.md +795 -0
  184. package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
  185. package/expertise/architecture/integration/websockets-realtime.md +791 -0
  186. package/expertise/architecture/mobile-architecture/index.md +22 -0
  187. package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
  188. package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
  189. package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
  190. package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
  191. package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
  192. package/expertise/architecture/patterns/event-driven.md +797 -0
  193. package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
  194. package/expertise/architecture/patterns/index.md +27 -0
  195. package/expertise/architecture/patterns/layered-architecture.md +736 -0
  196. package/expertise/architecture/patterns/microservices.md +753 -0
  197. package/expertise/architecture/patterns/modular-monolith.md +692 -0
  198. package/expertise/architecture/patterns/monolith.md +626 -0
  199. package/expertise/architecture/patterns/plugin-architecture.md +735 -0
  200. package/expertise/architecture/patterns/serverless.md +780 -0
  201. package/expertise/architecture/scaling/database-scaling.md +615 -0
  202. package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
  203. package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
  204. package/expertise/architecture/scaling/index.md +24 -0
  205. package/expertise/architecture/scaling/multi-tenancy.md +800 -0
  206. package/expertise/architecture/scaling/stateless-design.md +787 -0
  207. package/expertise/backend/embedded-firmware.md +625 -0
  208. package/expertise/backend/go.md +853 -0
  209. package/expertise/backend/index.md +24 -0
  210. package/expertise/backend/java-spring.md +448 -0
  211. package/expertise/backend/node-typescript.md +625 -0
  212. package/expertise/backend/python-fastapi.md +724 -0
  213. package/expertise/backend/rust.md +458 -0
  214. package/expertise/backend/solidity.md +711 -0
  215. package/expertise/composition-map.yaml +443 -0
  216. package/expertise/content/foundations/content-modeling.md +395 -0
  217. package/expertise/content/foundations/editorial-standards.md +449 -0
  218. package/expertise/content/foundations/index.md +24 -0
  219. package/expertise/content/foundations/microcopy.md +455 -0
  220. package/expertise/content/foundations/terminology-governance.md +509 -0
  221. package/expertise/content/index.md +34 -0
  222. package/expertise/content/patterns/accessibility-copy.md +518 -0
  223. package/expertise/content/patterns/index.md +24 -0
  224. package/expertise/content/patterns/notification-content.md +433 -0
  225. package/expertise/content/patterns/sample-content.md +486 -0
  226. package/expertise/content/patterns/state-copy.md +439 -0
  227. package/expertise/design/PROGRESS.md +58 -0
  228. package/expertise/design/disciplines/dark-mode-theming.md +577 -0
  229. package/expertise/design/disciplines/design-systems.md +595 -0
  230. package/expertise/design/disciplines/index.md +25 -0
  231. package/expertise/design/disciplines/information-architecture.md +800 -0
  232. package/expertise/design/disciplines/interaction-design.md +788 -0
  233. package/expertise/design/disciplines/responsive-design.md +552 -0
  234. package/expertise/design/disciplines/usability-testing.md +516 -0
  235. package/expertise/design/disciplines/user-research.md +792 -0
  236. package/expertise/design/foundations/accessibility-design.md +796 -0
  237. package/expertise/design/foundations/color-theory.md +797 -0
  238. package/expertise/design/foundations/iconography.md +795 -0
  239. package/expertise/design/foundations/index.md +26 -0
  240. package/expertise/design/foundations/motion-and-animation.md +653 -0
  241. package/expertise/design/foundations/rtl-design.md +585 -0
  242. package/expertise/design/foundations/spacing-and-layout.md +607 -0
  243. package/expertise/design/foundations/typography.md +800 -0
  244. package/expertise/design/foundations/visual-hierarchy.md +761 -0
  245. package/expertise/design/index.md +32 -0
  246. package/expertise/design/patterns/authentication-flows.md +474 -0
  247. package/expertise/design/patterns/content-consumption.md +789 -0
  248. package/expertise/design/patterns/data-display.md +618 -0
  249. package/expertise/design/patterns/e-commerce.md +1494 -0
  250. package/expertise/design/patterns/feedback-and-states.md +642 -0
  251. package/expertise/design/patterns/forms-and-input.md +819 -0
  252. package/expertise/design/patterns/gamification.md +801 -0
  253. package/expertise/design/patterns/index.md +31 -0
  254. package/expertise/design/patterns/microinteractions.md +449 -0
  255. package/expertise/design/patterns/navigation.md +800 -0
  256. package/expertise/design/patterns/notifications.md +705 -0
  257. package/expertise/design/patterns/onboarding.md +700 -0
  258. package/expertise/design/patterns/search-and-filter.md +601 -0
  259. package/expertise/design/patterns/settings-and-preferences.md +768 -0
  260. package/expertise/design/patterns/social-and-community.md +748 -0
  261. package/expertise/design/platforms/desktop-native.md +612 -0
  262. package/expertise/design/platforms/index.md +25 -0
  263. package/expertise/design/platforms/mobile-android.md +825 -0
  264. package/expertise/design/platforms/mobile-cross-platform.md +983 -0
  265. package/expertise/design/platforms/mobile-ios.md +699 -0
  266. package/expertise/design/platforms/tablet.md +794 -0
  267. package/expertise/design/platforms/web-dashboard.md +790 -0
  268. package/expertise/design/platforms/web-responsive.md +550 -0
  269. package/expertise/design/psychology/behavioral-nudges.md +449 -0
  270. package/expertise/design/psychology/cognitive-load.md +1191 -0
  271. package/expertise/design/psychology/error-psychology.md +778 -0
  272. package/expertise/design/psychology/index.md +22 -0
  273. package/expertise/design/psychology/persuasive-design.md +736 -0
  274. package/expertise/design/psychology/user-mental-models.md +623 -0
  275. package/expertise/design/tooling/open-pencil.md +266 -0
  276. package/expertise/frontend/angular.md +1073 -0
  277. package/expertise/frontend/desktop-electron.md +546 -0
  278. package/expertise/frontend/flutter.md +782 -0
  279. package/expertise/frontend/index.md +27 -0
  280. package/expertise/frontend/native-android.md +409 -0
  281. package/expertise/frontend/native-ios.md +490 -0
  282. package/expertise/frontend/react-native.md +1160 -0
  283. package/expertise/frontend/react.md +808 -0
  284. package/expertise/frontend/vue.md +1089 -0
  285. package/expertise/humanize/domain-rules-code.md +79 -0
  286. package/expertise/humanize/domain-rules-content.md +67 -0
  287. package/expertise/humanize/domain-rules-technical-docs.md +56 -0
  288. package/expertise/humanize/index.md +35 -0
  289. package/expertise/humanize/self-audit-checklist.md +87 -0
  290. package/expertise/humanize/sentence-patterns.md +218 -0
  291. package/expertise/humanize/vocabulary-blacklist.md +105 -0
  292. package/expertise/i18n/PROGRESS.md +65 -0
  293. package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
  294. package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
  295. package/expertise/i18n/advanced/complex-scripts.md +30 -0
  296. package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
  297. package/expertise/i18n/advanced/testing-i18n.md +28 -0
  298. package/expertise/i18n/content/content-adaptation.md +23 -0
  299. package/expertise/i18n/content/locale-specific-formatting.md +23 -0
  300. package/expertise/i18n/content/machine-translation-integration.md +28 -0
  301. package/expertise/i18n/content/translation-management.md +29 -0
  302. package/expertise/i18n/foundations/date-time-calendars.md +67 -0
  303. package/expertise/i18n/foundations/i18n-architecture.md +272 -0
  304. package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
  305. package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
  306. package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
  307. package/expertise/i18n/foundations/string-externalization.md +236 -0
  308. package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
  309. package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
  310. package/expertise/i18n/index.md +38 -0
  311. package/expertise/i18n/platform/backend-i18n.md +31 -0
  312. package/expertise/i18n/platform/flutter-i18n.md +148 -0
  313. package/expertise/i18n/platform/native-android-i18n.md +36 -0
  314. package/expertise/i18n/platform/native-ios-i18n.md +36 -0
  315. package/expertise/i18n/platform/react-i18n.md +103 -0
  316. package/expertise/i18n/platform/web-css-i18n.md +81 -0
  317. package/expertise/i18n/rtl/arabic-specific.md +175 -0
  318. package/expertise/i18n/rtl/hebrew-specific.md +149 -0
  319. package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
  320. package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
  321. package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
  322. package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
  323. package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
  324. package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
  325. package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
  326. package/expertise/i18n/rtl/rtl-typography.md +160 -0
  327. package/expertise/index.md +113 -0
  328. package/expertise/index.yaml +216 -0
  329. package/expertise/infrastructure/cloud-aws.md +597 -0
  330. package/expertise/infrastructure/cloud-gcp.md +599 -0
  331. package/expertise/infrastructure/cybersecurity.md +816 -0
  332. package/expertise/infrastructure/database-mongodb.md +447 -0
  333. package/expertise/infrastructure/database-postgres.md +400 -0
  334. package/expertise/infrastructure/devops-cicd.md +787 -0
  335. package/expertise/infrastructure/index.md +27 -0
  336. package/expertise/performance/PROGRESS.md +50 -0
  337. package/expertise/performance/backend/api-latency.md +1204 -0
  338. package/expertise/performance/backend/background-jobs.md +506 -0
  339. package/expertise/performance/backend/connection-pooling.md +1209 -0
  340. package/expertise/performance/backend/database-query-optimization.md +515 -0
  341. package/expertise/performance/backend/index.md +23 -0
  342. package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
  343. package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
  344. package/expertise/performance/foundations/caching-strategies.md +489 -0
  345. package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
  346. package/expertise/performance/foundations/index.md +24 -0
  347. package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
  348. package/expertise/performance/foundations/memory-management.md +964 -0
  349. package/expertise/performance/foundations/performance-budgets.md +1314 -0
  350. package/expertise/performance/index.md +31 -0
  351. package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
  352. package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
  353. package/expertise/performance/infrastructure/index.md +22 -0
  354. package/expertise/performance/infrastructure/load-balancing.md +1081 -0
  355. package/expertise/performance/infrastructure/observability.md +1079 -0
  356. package/expertise/performance/mobile/index.md +23 -0
  357. package/expertise/performance/mobile/mobile-animations.md +544 -0
  358. package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
  359. package/expertise/performance/mobile/mobile-network.md +452 -0
  360. package/expertise/performance/mobile/mobile-rendering.md +599 -0
  361. package/expertise/performance/mobile/mobile-startup-time.md +505 -0
  362. package/expertise/performance/platform-specific/flutter-performance.md +647 -0
  363. package/expertise/performance/platform-specific/index.md +22 -0
  364. package/expertise/performance/platform-specific/node-performance.md +1307 -0
  365. package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
  366. package/expertise/performance/platform-specific/react-performance.md +1403 -0
  367. package/expertise/performance/web/bundle-optimization.md +1239 -0
  368. package/expertise/performance/web/image-and-media.md +636 -0
  369. package/expertise/performance/web/index.md +24 -0
  370. package/expertise/performance/web/network-optimization.md +1133 -0
  371. package/expertise/performance/web/rendering-performance.md +1098 -0
  372. package/expertise/performance/web/ssr-and-hydration.md +918 -0
  373. package/expertise/performance/web/web-vitals.md +1374 -0
  374. package/expertise/quality/accessibility.md +985 -0
  375. package/expertise/quality/evidence-based-verification.md +499 -0
  376. package/expertise/quality/index.md +24 -0
  377. package/expertise/quality/ml-model-audit.md +614 -0
  378. package/expertise/quality/performance.md +600 -0
  379. package/expertise/quality/testing-api.md +891 -0
  380. package/expertise/quality/testing-mobile.md +496 -0
  381. package/expertise/quality/testing-web.md +849 -0
  382. package/expertise/security/PROGRESS.md +54 -0
  383. package/expertise/security/agentic-identity.md +540 -0
  384. package/expertise/security/compliance-frameworks.md +601 -0
  385. package/expertise/security/data/data-encryption.md +364 -0
  386. package/expertise/security/data/data-privacy-gdpr.md +692 -0
  387. package/expertise/security/data/database-security.md +1171 -0
  388. package/expertise/security/data/index.md +22 -0
  389. package/expertise/security/data/pii-handling.md +531 -0
  390. package/expertise/security/foundations/authentication.md +1041 -0
  391. package/expertise/security/foundations/authorization.md +603 -0
  392. package/expertise/security/foundations/cryptography.md +1001 -0
  393. package/expertise/security/foundations/index.md +25 -0
  394. package/expertise/security/foundations/owasp-top-10.md +1354 -0
  395. package/expertise/security/foundations/secrets-management.md +1217 -0
  396. package/expertise/security/foundations/secure-sdlc.md +700 -0
  397. package/expertise/security/foundations/supply-chain-security.md +698 -0
  398. package/expertise/security/index.md +31 -0
  399. package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
  400. package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
  401. package/expertise/security/infrastructure/container-security.md +721 -0
  402. package/expertise/security/infrastructure/incident-response.md +1295 -0
  403. package/expertise/security/infrastructure/index.md +24 -0
  404. package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
  405. package/expertise/security/infrastructure/network-security.md +1337 -0
  406. package/expertise/security/mobile/index.md +23 -0
  407. package/expertise/security/mobile/mobile-android-security.md +1218 -0
  408. package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
  409. package/expertise/security/mobile/mobile-data-storage.md +1265 -0
  410. package/expertise/security/mobile/mobile-ios-security.md +1401 -0
  411. package/expertise/security/mobile/mobile-network-security.md +1520 -0
  412. package/expertise/security/smart-contract-security.md +594 -0
  413. package/expertise/security/testing/index.md +22 -0
  414. package/expertise/security/testing/penetration-testing.md +1258 -0
  415. package/expertise/security/testing/security-code-review.md +1765 -0
  416. package/expertise/security/testing/threat-modeling.md +1074 -0
  417. package/expertise/security/testing/vulnerability-scanning.md +1062 -0
  418. package/expertise/security/web/api-security.md +586 -0
  419. package/expertise/security/web/cors-and-headers.md +433 -0
  420. package/expertise/security/web/csrf.md +562 -0
  421. package/expertise/security/web/file-upload.md +1477 -0
  422. package/expertise/security/web/index.md +25 -0
  423. package/expertise/security/web/injection.md +1375 -0
  424. package/expertise/security/web/session-management.md +1101 -0
  425. package/expertise/security/web/xss.md +1158 -0
  426. package/exports/README.md +17 -0
  427. package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
  428. package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
  429. package/exports/hosts/claude/.claude/agents/designer.md +55 -0
  430. package/exports/hosts/claude/.claude/agents/executor.md +55 -0
  431. package/exports/hosts/claude/.claude/agents/learner.md +51 -0
  432. package/exports/hosts/claude/.claude/agents/planner.md +53 -0
  433. package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
  434. package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
  435. package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
  436. package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
  437. package/exports/hosts/claude/.claude/commands/author.md +42 -0
  438. package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
  439. package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
  440. package/exports/hosts/claude/.claude/commands/design.md +44 -0
  441. package/exports/hosts/claude/.claude/commands/discover.md +37 -0
  442. package/exports/hosts/claude/.claude/commands/execute.md +48 -0
  443. package/exports/hosts/claude/.claude/commands/learn.md +38 -0
  444. package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
  445. package/exports/hosts/claude/.claude/commands/plan.md +39 -0
  446. package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
  447. package/exports/hosts/claude/.claude/commands/review.md +40 -0
  448. package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
  449. package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
  450. package/exports/hosts/claude/.claude/commands/specify.md +38 -0
  451. package/exports/hosts/claude/.claude/commands/verify.md +37 -0
  452. package/exports/hosts/claude/.claude/settings.json +34 -0
  453. package/exports/hosts/claude/CLAUDE.md +19 -0
  454. package/exports/hosts/claude/export.manifest.json +38 -0
  455. package/exports/hosts/claude/host-package.json +67 -0
  456. package/exports/hosts/codex/AGENTS.md +19 -0
  457. package/exports/hosts/codex/export.manifest.json +38 -0
  458. package/exports/hosts/codex/host-package.json +41 -0
  459. package/exports/hosts/cursor/.cursor/hooks.json +16 -0
  460. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
  461. package/exports/hosts/cursor/export.manifest.json +38 -0
  462. package/exports/hosts/cursor/host-package.json +42 -0
  463. package/exports/hosts/gemini/GEMINI.md +19 -0
  464. package/exports/hosts/gemini/export.manifest.json +38 -0
  465. package/exports/hosts/gemini/host-package.json +41 -0
  466. package/hooks/README.md +18 -0
  467. package/hooks/definitions/loop_cap_guard.yaml +21 -0
  468. package/hooks/definitions/post_tool_capture.yaml +24 -0
  469. package/hooks/definitions/pre_compact_summary.yaml +19 -0
  470. package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
  471. package/hooks/definitions/protected_path_write_guard.yaml +19 -0
  472. package/hooks/definitions/session_start.yaml +19 -0
  473. package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
  474. package/hooks/loop-cap-guard +17 -0
  475. package/hooks/post-tool-lint +36 -0
  476. package/hooks/protected-path-write-guard +17 -0
  477. package/hooks/session-start +41 -0
  478. package/llms-full.txt +2355 -0
  479. package/llms.txt +43 -0
  480. package/package.json +79 -0
  481. package/roles/README.md +20 -0
  482. package/roles/clarifier.md +42 -0
  483. package/roles/content-author.md +63 -0
  484. package/roles/designer.md +55 -0
  485. package/roles/executor.md +55 -0
  486. package/roles/learner.md +51 -0
  487. package/roles/planner.md +53 -0
  488. package/roles/researcher.md +43 -0
  489. package/roles/reviewer.md +54 -0
  490. package/roles/specifier.md +47 -0
  491. package/roles/verifier.md +71 -0
  492. package/schemas/README.md +24 -0
  493. package/schemas/accepted-learning.schema.json +20 -0
  494. package/schemas/author-artifact.schema.json +156 -0
  495. package/schemas/clarification.schema.json +19 -0
  496. package/schemas/design-artifact.schema.json +80 -0
  497. package/schemas/docs-claim.schema.json +18 -0
  498. package/schemas/export-manifest.schema.json +20 -0
  499. package/schemas/hook.schema.json +67 -0
  500. package/schemas/host-export-package.schema.json +18 -0
  501. package/schemas/implementation-plan.schema.json +19 -0
  502. package/schemas/proposed-learning.schema.json +19 -0
  503. package/schemas/research.schema.json +18 -0
  504. package/schemas/review.schema.json +29 -0
  505. package/schemas/run-manifest.schema.json +18 -0
  506. package/schemas/spec-challenge.schema.json +18 -0
  507. package/schemas/spec.schema.json +20 -0
  508. package/schemas/usage.schema.json +102 -0
  509. package/schemas/verification-proof.schema.json +29 -0
  510. package/schemas/wazir-manifest.schema.json +173 -0
  511. package/skills/README.md +40 -0
  512. package/skills/brainstorming/SKILL.md +77 -0
  513. package/skills/debugging/SKILL.md +50 -0
  514. package/skills/design/SKILL.md +61 -0
  515. package/skills/dispatching-parallel-agents/SKILL.md +128 -0
  516. package/skills/executing-plans/SKILL.md +70 -0
  517. package/skills/finishing-a-development-branch/SKILL.md +169 -0
  518. package/skills/humanize/SKILL.md +123 -0
  519. package/skills/init-pipeline/SKILL.md +124 -0
  520. package/skills/prepare-next/SKILL.md +20 -0
  521. package/skills/receiving-code-review/SKILL.md +123 -0
  522. package/skills/requesting-code-review/SKILL.md +105 -0
  523. package/skills/requesting-code-review/code-reviewer.md +108 -0
  524. package/skills/run-audit/SKILL.md +197 -0
  525. package/skills/scan-project/SKILL.md +41 -0
  526. package/skills/self-audit/SKILL.md +153 -0
  527. package/skills/subagent-driven-development/SKILL.md +154 -0
  528. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
  529. package/skills/subagent-driven-development/implementer-prompt.md +102 -0
  530. package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
  531. package/skills/tdd/SKILL.md +23 -0
  532. package/skills/using-git-worktrees/SKILL.md +163 -0
  533. package/skills/using-skills/SKILL.md +95 -0
  534. package/skills/verification/SKILL.md +22 -0
  535. package/skills/wazir/SKILL.md +463 -0
  536. package/skills/writing-plans/SKILL.md +30 -0
  537. package/skills/writing-skills/SKILL.md +157 -0
  538. package/skills/writing-skills/anthropic-best-practices.md +122 -0
  539. package/skills/writing-skills/persuasion-principles.md +50 -0
  540. package/templates/README.md +20 -0
  541. package/templates/artifacts/README.md +10 -0
  542. package/templates/artifacts/accepted-learning.md +19 -0
  543. package/templates/artifacts/accepted-learning.template.json +12 -0
  544. package/templates/artifacts/author.md +74 -0
  545. package/templates/artifacts/author.template.json +19 -0
  546. package/templates/artifacts/clarification.md +21 -0
  547. package/templates/artifacts/clarification.template.json +12 -0
  548. package/templates/artifacts/execute-notes.md +19 -0
  549. package/templates/artifacts/implementation-plan.md +21 -0
  550. package/templates/artifacts/implementation-plan.template.json +11 -0
  551. package/templates/artifacts/learning-proposal.md +19 -0
  552. package/templates/artifacts/next-run-handoff.md +21 -0
  553. package/templates/artifacts/plan-review.md +19 -0
  554. package/templates/artifacts/proposed-learning.template.json +12 -0
  555. package/templates/artifacts/research.md +21 -0
  556. package/templates/artifacts/research.template.json +12 -0
  557. package/templates/artifacts/review-findings.md +19 -0
  558. package/templates/artifacts/review.template.json +11 -0
  559. package/templates/artifacts/run-manifest.template.json +8 -0
  560. package/templates/artifacts/spec-challenge.md +19 -0
  561. package/templates/artifacts/spec-challenge.template.json +11 -0
  562. package/templates/artifacts/spec.md +21 -0
  563. package/templates/artifacts/spec.template.json +12 -0
  564. package/templates/artifacts/verification-proof.md +19 -0
  565. package/templates/artifacts/verification-proof.template.json +11 -0
  566. package/templates/examples/accepted-learning.example.json +14 -0
  567. package/templates/examples/author.example.json +152 -0
  568. package/templates/examples/clarification.example.json +15 -0
  569. package/templates/examples/docs-claim.example.json +8 -0
  570. package/templates/examples/export-manifest.example.json +7 -0
  571. package/templates/examples/host-export-package.example.json +11 -0
  572. package/templates/examples/implementation-plan.example.json +17 -0
  573. package/templates/examples/proposed-learning.example.json +13 -0
  574. package/templates/examples/research.example.json +15 -0
  575. package/templates/examples/research.example.md +6 -0
  576. package/templates/examples/review.example.json +17 -0
  577. package/templates/examples/run-manifest.example.json +9 -0
  578. package/templates/examples/spec-challenge.example.json +14 -0
  579. package/templates/examples/spec.example.json +21 -0
  580. package/templates/examples/verification-proof.example.json +21 -0
  581. package/templates/examples/wazir-manifest.example.yaml +65 -0
  582. package/templates/task-definition-schema.md +99 -0
  583. package/tooling/README.md +20 -0
  584. package/tooling/src/adapters/context-mode.js +50 -0
  585. package/tooling/src/capture/command.js +376 -0
  586. package/tooling/src/capture/store.js +99 -0
  587. package/tooling/src/capture/usage.js +270 -0
  588. package/tooling/src/checks/branches.js +50 -0
  589. package/tooling/src/checks/brand-truth.js +110 -0
  590. package/tooling/src/checks/changelog.js +231 -0
  591. package/tooling/src/checks/command-registry.js +36 -0
  592. package/tooling/src/checks/commits.js +102 -0
  593. package/tooling/src/checks/docs-drift.js +103 -0
  594. package/tooling/src/checks/docs-truth.js +201 -0
  595. package/tooling/src/checks/runtime-surface.js +156 -0
  596. package/tooling/src/cli.js +116 -0
  597. package/tooling/src/command-options.js +56 -0
  598. package/tooling/src/commands/validate.js +320 -0
  599. package/tooling/src/doctor/command.js +91 -0
  600. package/tooling/src/export/command.js +77 -0
  601. package/tooling/src/export/compiler.js +498 -0
  602. package/tooling/src/guards/loop-cap-guard.js +52 -0
  603. package/tooling/src/guards/protected-path-write-guard.js +67 -0
  604. package/tooling/src/index/command.js +152 -0
  605. package/tooling/src/index/storage.js +1061 -0
  606. package/tooling/src/index/summarizers.js +261 -0
  607. package/tooling/src/loaders.js +18 -0
  608. package/tooling/src/project-root.js +22 -0
  609. package/tooling/src/recall/command.js +225 -0
  610. package/tooling/src/schema-validator.js +30 -0
  611. package/tooling/src/state-root.js +40 -0
  612. package/tooling/src/status/command.js +71 -0
  613. package/wazir.manifest.yaml +135 -0
  614. package/workflows/README.md +19 -0
  615. package/workflows/author.md +42 -0
  616. package/workflows/clarify.md +38 -0
  617. package/workflows/design-review.md +46 -0
  618. package/workflows/design.md +44 -0
  619. package/workflows/discover.md +37 -0
  620. package/workflows/execute.md +48 -0
  621. package/workflows/learn.md +38 -0
  622. package/workflows/plan-review.md +42 -0
  623. package/workflows/plan.md +39 -0
  624. package/workflows/prepare-next.md +37 -0
  625. package/workflows/review.md +40 -0
  626. package/workflows/run-audit.md +41 -0
  627. package/workflows/spec-challenge.md +41 -0
  628. package/workflows/specify.md +38 -0
  629. package/workflows/verify.md +37 -0
@@ -0,0 +1,614 @@
1
+ # ML Model Audit — Expertise Module
2
+
3
+ > An ML model auditor validates correctness, fairness, calibration, and production readiness of machine learning models before and after deployment. The scope spans data quality verification, discrimination and calibration testing, fairness assessment against legal thresholds, interpretability analysis via SHAP, drift detection, and continuous production monitoring.
4
+
5
+ ---
6
+
7
+ ## Why Model Auditing Matters
8
+
9
+ | Incident | Year | Impact | Root Cause |
10
+ |---|---|---|---|
11
+ | Knight Capital algorithmic trading | 2012 | $440M loss in 45 minutes | Untested deployment; no rollback, no production monitoring |
12
+ | Amazon hiring tool gender bias | 2018 | Scrapped after Reuters exposure | Training data reflected historical hiring bias against women |
13
+ | Zillow Zestimate iBuyer model | 2021 | $569M write-down, 2,000 layoffs | Model drift; no recalibration when market shifted |
14
+ | COMPAS recidivism scoring | 2016 | ProPublica investigation, litigation | Racial bias in false positive rate; Black defendants who did not reoffend were roughly 2x as likely to be flagged high-risk |
15
+
16
+ Pattern: models passed aggregate metrics but failed on unmeasured dimensions — subgroup fairness, calibration under shift, or operational monitoring.
17
+
18
+ **Key references:** Google Model Cards paper (Mitchell et al., 2019) — standard for model documentation. EU AI Act (Regulation 2024/1689) — four risk tiers, high-risk systems require conformity assessments, penalties up to 7% global turnover. NIST AI RMF 1.0 (2023) — Govern, Map, Measure, Manage. US EEOC Uniform Guidelines — 4/5ths rule for adverse impact.
19
+
20
+ ---
21
+
22
+ ## 10-Domain Audit Framework
23
+
24
+ | # | Domain | What to Check | Key Metric | Threshold |
25
+ |---|---|---|---|---|
26
+ | 1 | Documentation | Model cards, data provenance, version history | Completeness | 100% required fields |
27
+ | 2 | Data Quality | Distribution, missing values, leakage, duplicates | PSI, missing rate | PSI < 0.1, missing < 5% |
28
+ | 3 | Feature Analysis | Importance stability, multicollinearity | SHAP values, VIF | VIF < 5, stable SHAP |
29
+ | 4 | Target/Label | Class balance, label noise, label leakage | Imbalance ratio | < 10:1, noise < 2% |
30
+ | 5 | Calibration | Predicted probability vs. observed frequency | Hosmer-Lemeshow, Brier | HL p > 0.05, Brier < 0.25 |
31
+ | 6 | Discrimination | Separating power for positive/negative classes | AUC-ROC, Gini, KS | AUC > 0.7, KS > 0.3 |
32
+ | 7 | Fairness | Protected group parity in outcomes and errors | Disparate impact | > 0.8 (4/5ths rule) |
33
+ | 8 | Interpretability | Feature explanations, local and global | SHAP consistency | Stable across samples |
34
+ | 9 | Monitoring | Drift detection, performance degradation | PSI per feature | PSI < 0.2, AUC drop < 5% |
35
+ | 10 | Business Impact | Decision quality, cost-weighted outcomes | Cost matrix | ROI positive |
36
+
37
+ **Execution order:** Documentation -> Data Quality -> Feature Analysis -> Target/Label -> Discrimination -> Calibration -> Fairness -> Interpretability -> Monitoring -> Business Impact. Each domain's findings inform the next — data quality issues invalidate downstream metrics, miscalibration corrupts fairness results.
38
+
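The dependency between domains can be made explicit in code. The sketch below is one hypothetical way to run checks in the execution order above and mark every result produced after a failed domain as unreliable. `run_ordered_audit`, the domain keys, and the convention that each check returns a dict with a `passes` key are assumptions for illustration, not part of this module.

```python
from typing import Callable, Dict, List

# Execution order taken from the framework above (keys are hypothetical).
EXECUTION_ORDER = [
    'documentation', 'data_quality', 'feature_analysis', 'target_label',
    'discrimination', 'calibration', 'fairness', 'interpretability',
    'monitoring', 'business_impact',
]

def run_ordered_audit(checks: Dict[str, Callable[[], dict]]) -> List[dict]:
    """Run domain checks in order and flag results downstream of a failure.

    Each check is a zero-argument callable returning a dict with a boolean
    'passes' key (an assumed convention for this sketch).
    """
    results = []
    upstream_failed = None  # name of the first failing domain, if any
    for domain in EXECUTION_ORDER:
        if domain not in checks:
            continue
        outcome = checks[domain]()
        passed = bool(outcome.get('passes', False))
        results.append({
            'domain': domain,
            'passes': passed,
            # Results computed after an upstream failure should be re-run
            # once the upstream issue is fixed.
            'unreliable_due_to': upstream_failed,
        })
        if upstream_failed is None and not passed:
            upstream_failed = domain
    return results

report = run_ordered_audit({
    'fairness': lambda: {'passes': True},
    'data_quality': lambda: {'passes': False},
    'documentation': lambda: {'passes': True},
})
for row in report:
    print(row)
```

Here the fairness result is reported as passing but carries `unreliable_due_to='data_quality'`, which mirrors the point that a data quality failure invalidates downstream metrics.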
39
+ ---
40
+
41
+ ## Domain 1: Documentation — Model Cards
42
+
43
+ ```markdown
44
+ # Model Card: [Model Name]
45
+ ## Model Details
46
+ - Version, Type, Framework, Training date, Owner
47
+ ## Intended Use
48
+ - Primary use case, Out-of-scope uses, Target population
49
+ ## Training Data
50
+ - Source, Collection period, Size, Preprocessing, Known limitations
51
+ ## Evaluation Results
52
+ | Metric | Train | Validation | Test | Production |
53
+ |---|---|---|---|---|
54
+ | AUC-ROC / Gini / KS / Brier / PR-AUC | | | | |
55
+ ## Performance by Subgroup
56
+ | Subgroup | N | AUC | FPR | FNR | Disparate Impact |
57
+ ## Ethical Considerations
58
+ - Protected attributes evaluated, Fairness metrics, Known biases, Mitigation
59
+ ## Limitations & Monitoring
60
+ - Drift detection method, Retraining trigger, Rollback plan
61
+ ```
62
+
63
+ ```python
+ REQUIRED_SECTIONS = [
+     'model_details', 'intended_use', 'training_data',
+     'evaluation_results', 'performance_by_subgroup',
+     'ethical_considerations', 'limitations', 'monitoring',
+ ]
+ REQUIRED_FIELDS = {
+     'model_details': ['version', 'type', 'framework', 'training_date', 'owner'],
+     'intended_use': ['primary_use', 'out_of_scope', 'target_population'],
+     'training_data': ['source', 'collection_period', 'size', 'preprocessing'],
+     'evaluation_results': ['auc_roc', 'gini', 'brier_score'],
+     'ethical_considerations': ['protected_attributes', 'fairness_metrics'],
+     'monitoring': ['drift_detection', 'retraining_trigger', 'rollback_plan'],
+ }
+
+ def validate_model_card(card: dict) -> dict:
+     missing_sections = [s for s in REQUIRED_SECTIONS if s not in card]
+     missing_fields = {}
+     for section, fields in REQUIRED_FIELDS.items():
+         if section in card:
+             missing = [f for f in fields if not card[section].get(f)]
+             if missing:
+                 missing_fields[section] = missing
+     total = len(REQUIRED_SECTIONS) + sum(len(v) for v in REQUIRED_FIELDS.values())
+     total_missing = len(missing_sections) + sum(len(v) for v in missing_fields.values())
+     completeness = (total - total_missing) / total
+     return {'completeness': round(completeness, 3), 'passes': completeness == 1.0,
+             'missing_sections': missing_sections, 'missing_fields': missing_fields}
+ ```
92
+
93
+ ---
94
+
95
+ ## Domain 2: Data Quality — PSI and Integrity
96
+
97
+ ### Population Stability Index (PSI)
98
+
99
+ | PSI Value | Interpretation | Action |
100
+ |---|---|---|
101
+ | < 0.1 | No significant shift | Continue monitoring |
102
+ | 0.1 - 0.2 | Moderate shift | Investigate, consider recalibration |
103
+ | 0.2 - 0.25 | Significant shift | Retrain model |
104
+ | > 0.25 | Severe shift | Immediate review, potential rollback |
105
+
106
+ ```python
+ import numpy as np
+
+ def compute_psi(
+     expected: np.ndarray, actual: np.ndarray,
+     bins: int = 10, method: str = 'quantile',
+ ) -> float:
+     """Population Stability Index — measures distributional shift."""
+     expected = np.asarray(expected, dtype=float)
+     actual = np.asarray(actual, dtype=float)
+     expected = expected[~np.isnan(expected)]
+     actual = actual[~np.isnan(actual)]
+     if len(expected) == 0 or len(actual) == 0:
+         raise ValueError("Input arrays must contain non-NaN values.")
+
+     if method == 'quantile':
+         breakpoints = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
+     elif method == 'uniform':
+         breakpoints = np.linspace(expected.min(), expected.max(), bins + 1)
+     else:
+         raise ValueError(f"Unknown method '{method}'. Use 'quantile' or 'uniform'.")
+
+     if len(breakpoints) < 3:  # Collapsed bins — fall back to uniform
+         breakpoints = np.linspace(expected.min(), expected.max(), bins + 1)
+
+     # Bin edges come from the reference sample. Clip the current sample so values
+     # outside that range land in the edge bins instead of being dropped by np.histogram.
+     actual = np.clip(actual, breakpoints[0], breakpoints[-1])
+     expected_pct = np.clip(np.histogram(expected, bins=breakpoints)[0] / len(expected), 1e-4, None)
+     actual_pct = np.clip(np.histogram(actual, bins=breakpoints)[0] / len(actual), 1e-4, None)
+     return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))
+
+ def compute_feature_psi(expected_df, actual_df, columns=None, bins=10) -> dict:
+     """PSI for every numeric feature. Returns {feature: psi} sorted descending, NaN last."""
+     if columns is None:
+         columns = expected_df.select_dtypes(include=[np.number]).columns.tolist()
+     results = {}
+     for col in columns:
+         try:
+             results[col] = compute_psi(expected_df[col].values, actual_df[col].values, bins)
+         except ValueError:
+             results[col] = float('nan')
+     # NaN does not order consistently, so sort on (is_nan, -psi) to push failures last.
+     return dict(sorted(results.items(), key=lambda kv: (np.isnan(kv[1]), -kv[1])))
+ ```
146
+
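To make the PSI bands concrete, here is a small worked example of the arithmetic on already-binned proportions. The proportions are made up for illustration, and `psi_from_proportions` and `psi_action` are hypothetical helper names, not part of this module.

```python
import math

def psi_from_proportions(expected_pct, actual_pct):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected_pct, actual_pct))

def psi_action(psi):
    """Map a PSI value to the action bands in the table above."""
    if psi < 0.1:
        return 'continue monitoring'
    if psi <= 0.2:
        return 'investigate'
    if psi <= 0.25:
        return 'retrain'
    return 'immediate review'

expected_pct = [0.25, 0.25, 0.25, 0.25]  # reference sample: four equal quantile bins
actual_pct = [0.10, 0.20, 0.30, 0.40]    # hypothetical production sample

psi = psi_from_proportions(expected_pct, actual_pct)
print(round(psi, 3), psi_action(psi))    # 0.228 retrain
```

The first bin alone contributes about 0.137 of the total, which shows how a single depleted bin can push a feature past the 0.2 threshold.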
147
+ ### Data Quality Report
148
+
149
+ ```python
+ import pandas as pd
+
+ def data_quality_report(df: pd.DataFrame) -> dict:
+     """Checks missing values, duplicates, constant columns, infinities, high cardinality."""
+     n_rows = len(df)
+     missing_pct = (df.isnull().sum() / n_rows * 100).round(2)
+     n_dupes = int(df.duplicated().sum())
+     constant_cols = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
+     numeric = df.select_dtypes(include=[np.number]).columns
+     inf_counts = {c: int(np.isinf(df[c]).sum()) for c in numeric if np.isinf(df[c]).any()}
+     cat_cols = df.select_dtypes(include=['object', 'category']).columns
+     high_card = {c: int(df[c].nunique()) for c in cat_cols if df[c].nunique() > 0.5 * n_rows}
+     return {
+         'shape': df.shape, 'duplicate_rows': n_dupes,
+         'columns_above_5pct_missing': [c for c, p in missing_pct.items() if p > 5],
+         'constant_columns': constant_cols, 'infinite_values': inf_counts,
+         'high_cardinality_categoricals': high_card,
+     }
+ ```
169
+
170
+ ---
171
+
172
+ ## Domain 3: Feature Analysis — SHAP and Multicollinearity
173
+
174
+ ### SHAP Audit
175
+
176
+ ```python
+ import shap
+ import matplotlib.pyplot as plt
+ from collections import Counter
+ from pathlib import Path
+
+ def _positive_class(sv):
+     """Reduce binary-classifier SHAP output to the positive class."""
+     if isinstance(sv, list) and len(sv) == 2:
+         return sv[1]  # list of per-class arrays
+     if getattr(sv, 'ndim', 2) == 3 and sv.shape[-1] == 2:
+         return sv[..., 1]  # some shap versions return (n_samples, n_features, n_classes)
+     return sv
+
+ def run_shap_audit(model, X_test, output_dir: str = 'audit/shap') -> dict:
+     """Global SHAP importance, summary plot, top-5 dependence plots."""
+     Path(output_dir).mkdir(parents=True, exist_ok=True)
+     tree_types = {'XGBClassifier', 'LGBMClassifier', 'RandomForestClassifier',
+                   'GradientBoostingClassifier', 'XGBRegressor', 'LGBMRegressor'}
+     if type(model).__name__ in tree_types:
+         explainer = shap.TreeExplainer(model)
+     else:
+         background = shap.sample(X_test, min(100, len(X_test)))
+         explainer = shap.KernelExplainer(model.predict_proba, background)
+
+     shap_values = _positive_class(explainer.shap_values(X_test))
+
+     mean_abs = np.abs(shap_values).mean(axis=0)
+     names = X_test.columns.tolist() if hasattr(X_test, 'columns') else [f'f_{i}' for i in range(X_test.shape[1])]
+     importance = dict(sorted(zip(names, mean_abs), key=lambda x: x[1], reverse=True))
+
+     shap.summary_plot(shap_values, X_test, show=False)
+     plt.savefig(f'{output_dir}/shap_summary.png', dpi=150, bbox_inches='tight')
+     plt.close()
+
+     for feat in list(importance.keys())[:5]:
+         shap.dependence_plot(feat, shap_values, X_test, show=False)
+         plt.savefig(f'{output_dir}/dep_{feat}.png', dpi=150, bbox_inches='tight')
+         plt.close()
+
+     return {'feature_importance': importance, 'top_5': list(importance.keys())[:5]}
+
+ def shap_consistency_check(model, X_test, n_bootstrap: int = 5, sample_frac: float = 0.8) -> dict:
+     """Verify SHAP rankings are stable across bootstrap samples.
+
+     Assumes a tree model (TreeExplainer) and a DataFrame X_test.
+     Returns column positions, not names, in 'stable_top5'.
+     """
+     explainer = shap.TreeExplainer(model)  # built once, reused for every sample
+     rankings = []
+     for i in range(n_bootstrap):
+         sample = X_test.sample(frac=sample_frac, random_state=i)
+         sv = _positive_class(explainer.shap_values(sample))
+         ranked = np.argsort(-np.abs(sv).mean(axis=0)).tolist()[:5]
+         rankings.append(ranked)
+     all_top5 = [f for r in rankings for f in r]
+     stable = [f for f, c in Counter(all_top5).items() if c == n_bootstrap]
+     return {'stable_top5': stable, 'stability_ratio': len(stable) / 5, 'passes': len(stable) >= 3}
+ ```
225
+
226
+ ### Variance Inflation Factor (VIF)
227
+
228
+ ```python
+ def compute_vif(X: pd.DataFrame) -> pd.DataFrame:
+     """VIF per feature. VIF > 5 = moderate, > 10 = severe multicollinearity."""
+     X_arr = X.values.astype(float)
+     vif_data = []
+     for i in range(X_arr.shape[1]):
+         y_i = X_arr[:, i]
+         X_i = np.column_stack([np.ones(X_arr.shape[0]), np.delete(X_arr, i, axis=1)])
+         try:
+             beta = np.linalg.lstsq(X_i, y_i, rcond=None)[0]
+             ss_res = np.sum((y_i - X_i @ beta) ** 2)
+             ss_tot = np.sum((y_i - y_i.mean()) ** 2)
+             r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
+             vif = 1 / (1 - r2) if r2 < 1.0 else float('inf')
+         except np.linalg.LinAlgError:
+             vif = float('inf')
+         vif_data.append({'feature': X.columns[i], 'vif': round(vif, 2),
+                          'flag': 'SEVERE' if vif > 10 else ('MODERATE' if vif > 5 else 'OK')})
+     return pd.DataFrame(vif_data).sort_values('vif', ascending=False)
+ ```
248
+
249
+ ---
250
+
251
+ ## Domain 4: Target/Label Quality
252
+
253
+ ```python
+ from collections import Counter
+
+ def label_quality_report(y: np.ndarray) -> dict:
+     """Assess class balance and recommend resampling strategy."""
+     counts = Counter(y)
+     majority = max(counts, key=counts.get)
+     minority = min(counts, key=counts.get)
+     ratio = counts[majority] / counts[minority]
+     if ratio < 3: strategy, severity = 'none', 'balanced'
+     elif ratio < 10: strategy, severity = 'class_weight', 'moderate'
+     elif ratio < 100: strategy, severity = 'SMOTE_or_class_weight', 'severe'
+     else: strategy, severity = 'anomaly_detection_reframe', 'extreme'
+     return {
+         'class_distribution': dict(counts), 'imbalance_ratio': round(ratio, 1),
+         'severity': severity, 'recommended_strategy': strategy, 'passes': ratio < 10,
+     }
+ ```
272
+
273
+ ---
274
+
275
+ ## Domain 5: Calibration Testing
276
+
277
+ ### Hosmer-Lemeshow Test
278
+
279
+ ```python
+ from scipy.stats import chi2
+
+ def hosmer_lemeshow_test(y_true: np.ndarray, y_prob: np.ndarray, n_groups: int = 10) -> dict:
+     """Goodness-of-fit test. H0: model is well-calibrated. Reject if p < 0.05."""
+     order = np.argsort(y_prob)
+     y_true_s, y_prob_s = np.asarray(y_true, dtype=float)[order], np.asarray(y_prob, dtype=float)[order]
+     groups = np.array_split(np.arange(len(y_true)), n_groups)
+     hl_stat = 0.0
+     group_details = []
+     for idx in groups:
+         n_g = len(idx)
+         obs = y_true_s[idx].sum()
+         exp = y_prob_s[idx].sum()
+         if exp > 0: hl_stat += (obs - exp) ** 2 / exp
+         if (n_g - exp) > 0: hl_stat += (n_g - obs - (n_g - exp)) ** 2 / (n_g - exp)
+         group_details.append({'n': n_g, 'observed_rate': round(float(obs / n_g), 4),
+                               'predicted_rate': round(float(y_prob_s[idx].mean()), 4)})
+     p_value = 1 - chi2.cdf(hl_stat, n_groups - 2)
+     return {
+         'statistic': round(hl_stat, 4), 'p_value': round(p_value, 4),
+         'group_details': group_details, 'passes': p_value > 0.05,
+         'interpretation': 'Well calibrated' if p_value > 0.05
+                           else 'Miscalibrated — consider Platt scaling or isotonic regression',
+     }
+ ```
305
+
306
+ ### Calibration Curve and Brier Score
307
+
308
+ ```python
+ from sklearn.calibration import calibration_curve
+ from sklearn.metrics import brier_score_loss
+
+ def calibration_audit(y_true, y_prob, n_bins=10, output_dir='audit/calibration') -> dict:
+     """Brier score + reliability curve plot. Brier < 0.25 = acceptable."""
+     Path(output_dir).mkdir(parents=True, exist_ok=True)
+     brier = brier_score_loss(y_true, y_prob)
+     frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=n_bins, strategy='uniform')
+
+     fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
+     ax1.plot([0, 1], [0, 1], 'k--', label='Perfect')
+     ax1.plot(mean_pred, frac_pos, 'o-', label=f'Model (Brier={brier:.4f})')
+     ax1.set_xlabel('Mean predicted'); ax1.set_ylabel('Fraction positive')
+     ax1.set_title('Calibration Curve'); ax1.legend()
+     ax2.hist(y_prob, bins=50, alpha=0.7); ax2.set_title('Prediction Distribution')
+     plt.tight_layout()
+     plt.savefig(f'{output_dir}/calibration_plot.png', dpi=150, bbox_inches='tight'); plt.close()
+
+     return {'brier_score': round(brier, 4), 'passes': brier < 0.25,
+             'calibration_bins': {'predicted': mean_pred.tolist(), 'observed': frac_pos.tolist()}}
+ ```
330
+
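When the Hosmer-Lemeshow test flags miscalibration, one of the remedies it names is isotonic regression. The sketch below shows that remedy on synthetic data, using scikit-learn's `IsotonicRegression`. The data, the distortion applied to the scores, and the 50/50 calibration/holdout split are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)

# Synthetic data: scores are a monotone distortion of the true probability,
# so ranking is intact but the probabilities are systematically too low.
true_prob = rng.uniform(0.05, 0.95, size=5000)
y_true = (rng.uniform(size=5000) < true_prob).astype(int)
y_prob = true_prob ** 2

# Fit the monotone recalibration map on one split, evaluate on the other.
cal, hold = slice(0, 2500), slice(2500, None)
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds='clip')
iso.fit(y_prob[cal], y_true[cal])
y_recal = iso.predict(y_prob[hold])

brier_before = brier_score_loss(y_true[hold], y_prob[hold])
brier_after = brier_score_loss(y_true[hold], y_recal)
print(f'Brier before: {brier_before:.4f}  after: {brier_after:.4f}')
```

Because the map is monotone, recalibration changes the probabilities without reordering the scores (ties aside), so discrimination metrics such as AUC should move very little.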
331
+ ---
332
+
333
+ ## Domain 6: Discrimination Metrics
334
+
335
+ ```python
+ from sklearn.metrics import roc_auc_score, average_precision_score, roc_curve
+ from scipy.stats import ks_2samp
+
+ def discrimination_report(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
+     """AUC-ROC, Gini, KS statistic, PR-AUC with grading."""
+     auc = roc_auc_score(y_true, y_prob)
+     gini = 2 * auc - 1
+     ks_stat = ks_2samp(y_prob[y_true == 1], y_prob[y_true == 0]).statistic
+     fpr, tpr, thresholds = roc_curve(y_true, y_prob)
+     optimal_threshold = float(thresholds[np.argmax(tpr - fpr)])
+     pr_auc = average_precision_score(y_true, y_prob)
+     if auc >= 0.9: grade = 'EXCELLENT'
+     elif auc >= 0.8: grade = 'GOOD'
+     elif auc >= 0.7: grade = 'ACCEPTABLE'
+     elif auc >= 0.6: grade = 'POOR'
+     else: grade = 'FAIL'
+     return {'AUC-ROC': round(auc, 4), 'Gini': round(gini, 4), 'KS': round(ks_stat, 4),
+             'KS_optimal_threshold': round(optimal_threshold, 4), 'PR-AUC': round(pr_auc, 4),
+             'grade': grade, 'passes': auc > 0.7}
+ ```
356
+
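The report above takes the KS statistic from `ks_2samp` but reads the "KS optimal threshold" off the ROC curve. The sketch below, on synthetic scores, illustrates why the two agree: the largest gap between the class score distributions is the same quantity as the largest value of TPR minus FPR. The beta-distributed scores and class sizes are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import roc_curve

rng = np.random.default_rng(42)

# Synthetic scores: positives are shifted upward relative to negatives.
y_true = np.concatenate([np.ones(300, dtype=int), np.zeros(700, dtype=int)])
y_prob = np.concatenate([rng.beta(4, 2, 300), rng.beta(2, 4, 700)])

# KS as the maximum distance between the two empirical score distributions.
ks_stat = ks_2samp(y_prob[y_true == 1], y_prob[y_true == 0]).statistic

# The same quantity read off the ROC curve (all thresholds kept).
fpr, tpr, thresholds = roc_curve(y_true, y_prob, drop_intermediate=False)
best = int(np.argmax(tpr - fpr))
ks_from_roc = float(tpr[best] - fpr[best])

print(f'KS (ks_2samp) = {ks_stat:.4f}, max(TPR - FPR) = {ks_from_roc:.4f}')
print(f'threshold at the KS point: {thresholds[best]:.4f}')
```

This threshold maximizes separation only. It ignores misclassification costs, which Domain 10 addresses separately.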
357
+ ---
358
+
359
+ ## Domain 7: Fairness Assessment
360
+
361
+ **Legal context:** US EEOC 4/5ths rule — selection rate for protected group must be >= 80% of highest-rate group. EU AI Act Article 10 — high-risk systems must use representative training data, examine biases. ECOA/Reg B — prohibits discrimination in credit by race, sex, age, etc.
362
+
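As a quick worked example of the 4/5ths rule arithmetic, the sketch below uses made-up approval counts for two hypothetical groups. Each group's selection rate is divided by the highest group's rate, and any ratio below 0.8 is flagged.

```python
# Hypothetical approval counts per group (illustrative numbers only).
applicants = {'group_a': 200, 'group_b': 150}
approved = {'group_a': 120, 'group_b': 60}

rates = {g: approved[g] / applicants[g] for g in applicants}
reference = max(rates.values())  # highest selection rate across groups
ratios = {g: rates[g] / reference for g in rates}

for g in rates:
    verdict = 'ok' if ratios[g] >= 0.8 else 'adverse impact flagged'
    print(f'{g}: rate={rates[g]:.2f} ratio={ratios[g]:.2f} -> {verdict}')
```

Here group_b is selected at 40% against 60% for group_a, a ratio of about 0.67, so it falls below the 0.8 line.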
363
+ ### Disparate Impact and Equalized Odds
364
+
365
+ ```python
+ def disparate_impact_ratio(y_pred: np.ndarray, protected_attr: np.ndarray) -> dict:
+     """4/5ths rule: ratio >= 0.8 for all groups."""
+     groups = np.unique(protected_attr)
+     rates = {str(g): float(y_pred[protected_attr == g].mean()) for g in groups}
+     max_rate = max(rates.values())
+     results = {}
+     for g, rate in rates.items():
+         ratio = rate / max_rate if max_rate > 0 else 0.0
+         results[g] = {'rate': round(rate, 4), 'ratio': round(ratio, 4), 'passes': ratio >= 0.8}
+     return {'group_results': results, 'overall_passes': all(r['passes'] for r in results.values())}
+
+ def equalized_odds_check(y_true, y_pred, protected_attr, threshold=0.05) -> dict:
+     """FPR and TPR should be similar across groups (within threshold)."""
+     groups = np.unique(protected_attr)
+     metrics = {}
+     for g in groups:
+         mask = protected_attr == g
+         yt, yp = y_true[mask], y_pred[mask]
+         tp = ((yt == 1) & (yp == 1)).sum(); fn = ((yt == 1) & (yp == 0)).sum()
+         fp = ((yt == 0) & (yp == 1)).sum(); tn = ((yt == 0) & (yp == 0)).sum()
+         tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
+         fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0
+         metrics[str(g)] = {'TPR': round(tpr, 4), 'FPR': round(fpr, 4)}
+     tpr_gap = max(m['TPR'] for m in metrics.values()) - min(m['TPR'] for m in metrics.values())
+     fpr_gap = max(m['FPR'] for m in metrics.values()) - min(m['FPR'] for m in metrics.values())
+     return {'group_metrics': metrics, 'TPR_gap': round(tpr_gap, 4), 'FPR_gap': round(fpr_gap, 4),
+             'passes_equalized_odds': tpr_gap <= threshold and fpr_gap <= threshold}
+ ```
394
+
395
+ ---
396
+
397
+ ## Domain 8: Interpretability
398
+
399
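+ **Interpretability expectations by EU AI Act risk tier** (the Act defines the tiers; the specific techniques listed are this module's recommendation):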
400
+
401
+ | Risk Level | Examples | Required Interpretability |
402
+ |---|---|---|
403
+ | Unacceptable | Social scoring, real-time remote biometric identification in public spaces | Prohibited |
404
+ | High | Credit, hiring, criminal justice | Full SHAP/LIME, per-prediction explanations, human-in-the-loop |
405
+ | Limited | Chatbots, recommendations | Transparency obligations |
406
+ | Minimal | Spam filters, game AI | Best practice only |
407
+
408
+ ### Local Explanation Stability
409
+
410
+ ```python
+ from scipy.stats import spearmanr
+
+ def explanation_stability_test(explainer, instance, n_perturbations=20, noise_scale=0.01) -> dict:
+     """Test if local explanations are stable under small input perturbations."""
+     base = explainer.shap_values(instance.reshape(1, -1))
+     if isinstance(base, list): base = base[1]
+     base = base.flatten()
+     correlations = []
+     for i in range(n_perturbations):
+         noise = np.random.RandomState(i).normal(0, noise_scale, size=instance.shape)
+         sv = explainer.shap_values((instance + noise).reshape(1, -1))
+         if isinstance(sv, list): sv = sv[1]
+         corr, _ = spearmanr(base, sv.flatten())
+         correlations.append(corr)
+     mean_corr = float(np.mean(correlations))
+     return {'mean_rank_correlation': round(mean_corr, 4),
+             'min_rank_correlation': round(float(np.min(correlations)), 4),
+             'passes': mean_corr > 0.8 and min(correlations) > 0.5}
+ ```
430
+
431
+ ---
432
+
433
+ ## Domain 9: Production Monitoring Pipeline
434
+
435
+ ```
+ Alerting Layer
+ ┌─────────────────────────────────┐
+ │ PSI > 0.2 → PagerDuty           │
+ │ AUC drop > 5% → Slack           │
+ │ Label drift → Email             │
+ └──────────┬──────────────────────┘
+
+ ┌──────────▼──────────────────────┐
+ │ Drift Detection Engine          │
+ │ Feature PSI | Prediction shift  │
+ │ Label drift | Rolling AUC       │
+ └──────────┬──────────────────────┘
+
+ ┌──────────▼──────────────────────┐
+ │ Scoring Pipeline                │
+ │ Data → Features → Model → Score │
+ │ (log each stage for monitoring) │
+ └─────────────────────────────────┘
+ ```
455
+
456
+ **Alerting thresholds:**
457
+
458
+ | Metric | Yellow (Investigate) | Red (Action Required) |
459
+ |---|---|---|
460
+ | Feature PSI (any feature) | > 0.1 | > 0.2 |
461
+ | Prediction PSI | > 0.1 | > 0.2 |
462
+ | Rolling AUC (7-day) | < baseline - 3% | < baseline - 5% |
463
+ | Missing value rate | > 2x training rate | > 5x training rate |
464
+ | Prediction volume | < 50% normal | < 20% normal |
465
+
466
+ ```python
+ from dataclasses import dataclass
+ from typing import Optional
+
+ from sklearn.metrics import roc_auc_score
+
+ # compute_psi(reference, current) is not defined in this block; it is assumed
+ # to be the PSI helper defined elsewhere in this module.
+
+ @dataclass
+ class MonitoringResult:
+     metric_name: str
+     current_value: float
+     threshold_yellow: float
+     threshold_red: float
+     status: str  # GREEN, YELLOW, RED
+     details: Optional[str] = None
+
+ class ModelMonitor:
+     """Production monitoring: feature drift, prediction drift, performance degradation."""
+     def __init__(self, reference_features, reference_predictions, feature_names, baseline_auc):
+         self.ref_features = reference_features        # 2-D array, one column per feature
+         self.ref_predictions = reference_predictions  # 1-D array of reference scores
+         self.feature_names = feature_names
+         self.baseline_auc = baseline_auc
+
+     def check_feature_drift(self, current_features) -> list:
+         # One PSI check per feature column against the reference distribution.
+         results = []
+         for i, name in enumerate(self.feature_names):
+             psi = compute_psi(self.ref_features[:, i], current_features[:, i])
+             status = 'RED' if psi > 0.2 else ('YELLOW' if psi > 0.1 else 'GREEN')
+             results.append(MonitoringResult(f'psi_{name}', round(psi, 4), 0.1, 0.2, status))
+         return results
+
+     def check_prediction_drift(self, current_predictions) -> MonitoringResult:
+         psi = compute_psi(self.ref_predictions, current_predictions)
+         status = 'RED' if psi > 0.2 else ('YELLOW' if psi > 0.1 else 'GREEN')
+         return MonitoringResult('prediction_psi', round(psi, 4), 0.1, 0.2, status)
+
+     def check_performance(self, y_true, y_prob) -> MonitoringResult:
+         # Thresholds are absolute drops in AUC relative to the baseline.
+         current_auc = roc_auc_score(y_true, y_prob)
+         drop = self.baseline_auc - current_auc
+         status = 'RED' if drop > 0.05 else ('YELLOW' if drop > 0.03 else 'GREEN')
+         return MonitoringResult('rolling_auc', round(current_auc, 4),
+                                 round(self.baseline_auc - 0.03, 4),
+                                 round(self.baseline_auc - 0.05, 4), status,
+                                 f'drop={drop:.4f}')
+
+     def run_full_check(self, current_features, current_predictions,
+                        y_true=None, y_prob=None) -> dict:
+         results = self.check_feature_drift(current_features)
+         results.append(self.check_prediction_drift(current_predictions))
+         # Performance is checked only when ground-truth labels are available.
+         if y_true is not None and y_prob is not None:
+             results.append(self.check_performance(y_true, y_prob))
+         statuses = [r.status for r in results]
+         overall = 'RED' if 'RED' in statuses else ('YELLOW' if 'YELLOW' in statuses else 'GREEN')
+         return {
+             'overall_status': overall, 'n_checks': len(results),
+             'red_alerts': [{'metric': r.metric_name, 'value': r.current_value}
+                            for r in results if r.status == 'RED'],
+             'yellow_alerts': [{'metric': r.metric_name, 'value': r.current_value}
+                               for r in results if r.status == 'YELLOW'],
+         }
+ ```
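The threshold table also lists missing value rate and prediction volume, which `ModelMonitor` does not check. The following is a minimal sketch of those two checks, assuming the rates and counts are computed elsewhere. The function names `check_missing_rate` and `check_prediction_volume` are hypothetical and not part of the original module, and the handling of a zero training rate is an assumption.

```python
def check_missing_rate(current_rate: float, training_rate: float) -> str:
    """Return GREEN, YELLOW, or RED for a feature's missing-value rate.

    YELLOW above 2x the training rate, RED above 5x, per the threshold table.
    """
    if training_rate <= 0:
        # Assumption: any missing values where training had none count as RED.
        return 'RED' if current_rate > 0 else 'GREEN'
    ratio = current_rate / training_rate
    if ratio > 5:
        return 'RED'
    return 'YELLOW' if ratio > 2 else 'GREEN'


def check_prediction_volume(current_count: int, normal_count: int) -> str:
    """Return YELLOW below 50% of normal volume, RED below 20%."""
    if normal_count <= 0:
        raise ValueError('normal_count must be positive')
    share = current_count / normal_count
    if share < 0.2:
        return 'RED'
    return 'YELLOW' if share < 0.5 else 'GREEN'
```

Both functions return a bare status string; wrapping the result in a `MonitoringResult` would let it feed into `run_full_check`.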
+
+ ---
+
+ ## Domain 10: Business Impact
+
+ ```python
+ import numpy as np
+
+ def cost_matrix_evaluation(y_true, y_pred, cost_tp=0.0, cost_fp=-100.0,
+                            cost_fn=-500.0, cost_tn=0.0) -> dict:
+     """Evaluate model using business cost matrix. Defaults: fraud detection scenario."""
+     # Costs are signed payoffs: negative values are losses, so higher totals are better.
+     y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
+     tp = int(((y_true == 1) & (y_pred == 1)).sum())
+     fp = int(((y_true == 0) & (y_pred == 1)).sum())
+     fn = int(((y_true == 1) & (y_pred == 0)).sum())
+     tn = int(((y_true == 0) & (y_pred == 0)).sum())
+     total_cost = float(tp * cost_tp + fp * cost_fp + fn * cost_fn + tn * cost_tn)
+     # Baseline: no model, so every case is treated as negative (all positives become FN).
+     n_pos = int(y_true.sum())
+     baseline = float(n_pos * cost_fn + (len(y_true) - n_pos) * cost_tn)
+     net_benefit = total_cost - baseline
+     return {'confusion_matrix': {'TP': tp, 'FP': fp, 'FN': fn, 'TN': tn},
+             'total_cost': round(total_cost, 2), 'baseline_no_model': round(baseline, 2),
+             'net_benefit': round(net_benefit, 2), 'roi_positive': bool(net_benefit > 0)}
+ ```
+
+ ---
+
+ ## Anti-Patterns
+
+ ### 1. Training on test data (data leakage)
+ Features include information unavailable at prediction time. Model appears brilliant in eval, fails in production. **Detect:** suspiciously high single-feature importance; future data in features; scaler fit before train/test split.
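The "scaler fit before train/test split" case can be illustrated with a small sketch. It uses only the standard library, and the helper names `fit_standardizer` and `transform` are hypothetical. The point is that the mean and standard deviation come from the training split alone and are then reused unchanged on the test split.

```python
import statistics


def fit_standardizer(train_values):
    """Compute mean and standard deviation from the training split only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # avoid division by zero
    return mean, std


def transform(values, mean, std):
    """Apply previously fitted statistics; never refit on evaluation data."""
    return [(v - mean) / std for v in values]


values = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = values[:4], values[4:]   # split first

mean, std = fit_standardizer(train)    # fit on train only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)
```

Fitting on all five values instead would pull the mean toward the test-only outlier, which is exactly the information the model should not see during training.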
+
+ ### 2. Optimizing aggregate metrics only
+ Overall AUC 0.85, minority subgroup AUC 0.55. Aggregate masks subgroup failure. **Prevent:** always stratify metrics by protected attributes, geography, business segments (Domain 7).
+
+ ### 3. Deploy and forget
+ Model degrades silently as distributions shift. Zillow's $569M write-down is the canonical example. **Prevent:** implement monitoring pipeline (Domain 9). No model ships without drift detection.
+
+ ### 4. Fairness washing
+ Computing disparate impact to check a box, taking no action when ratios fall below 0.8. **Prevent:** fairness metrics must have automated deployment gates, same as failing tests.
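One way such a gate might look is sketched below. The function names, the input format (a dict of per-group selection rates), and the choice of reference group are all assumptions for illustration; only the 0.8 threshold comes from the text above.

```python
def disparate_impact(selection_rates: dict, reference_group: str) -> dict:
    """Ratio of each group's selection rate to the reference group's rate."""
    ref = selection_rates[reference_group]
    if ref <= 0:
        raise ValueError('reference group selection rate must be positive')
    return {group: rate / ref for group, rate in selection_rates.items()}


def fairness_gate(selection_rates: dict, reference_group: str,
                  min_ratio: float = 0.8) -> dict:
    """Fail the deployment when any group's ratio falls below min_ratio."""
    ratios = disparate_impact(selection_rates, reference_group)
    failing = {g: round(r, 3) for g, r in ratios.items() if r < min_ratio}
    return {'passed': not failing, 'failing_groups': failing}


# Hypothetical selection rates per group.
result = fairness_gate({'A': 0.50, 'B': 0.45, 'C': 0.30}, reference_group='A')
```

In a CI pipeline, a `passed` value of `False` would be turned into a non-zero exit code so the deployment stops the same way a failing test would.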
+
+ ### 5. Overfitting to validation set
+ After 200 hyperparameter tuning rounds against the validation set, the model memorizes it. **Prevent:** holdout test evaluated once at the end; use cross-validation for hyperparameter search.
+
+ ### 6. Ignoring class imbalance
+ Predicting "not fraud" for every transaction yields 99.5% accuracy on 0.5% fraud data. **Prevent:** if imbalance > 10:1, accuracy alone is uninformative. Use PR-AUC, F1, or cost-weighted metrics.
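The arithmetic behind this anti-pattern can be checked directly. The sketch below builds a hypothetical 1,000-transaction dataset with 0.5% fraud and scores an always-negative classifier; the helper name `accuracy_and_recall` is illustrative.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels encoded as 0/1."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    accuracy = correct / len(y_true)
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall


y_true = [1] * 5 + [0] * 995   # 0.5% fraud
y_pred = [0] * 1000            # always predict "not fraud"

accuracy, recall = accuracy_and_recall(y_true, y_pred)
```

Accuracy comes out at 0.995 while recall is 0.0: the classifier catches no fraud at all, which accuracy alone hides.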
+
+ ### 7. Single metric obsession
+ AUC is 0.92, but a predicted probability of 0.7 corresponds to a 30% actual event rate, so every threshold decision is wrong. **Prevent:** always audit calibration (Domain 5) alongside discrimination (Domain 6).
+
+ ### 8. Missing data provenance
+ Cannot reproduce training dataset six months later for a regulator. **Prevent:** version training data alongside model artifacts. Record query, filters, date range, random seed.
+
+ ### 9. Uncalibrated probability usage
+ Random forest `predict_proba` treated as true probability for risk tiers. RF outputs are averaged per-tree estimates (vote fractions in some implementations), not calibrated probabilities. **Prevent:** calibrate with Platt scaling or isotonic regression; test with Hosmer-Lemeshow.
+
+ ### 10. Threshold selection on training data
+ Operating threshold optimized on training set; production distribution differs. **Prevent:** select thresholds on validation set using business cost matrix; re-evaluate periodically.
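A minimal sketch of cost-based threshold selection on a validation set follows. The function names and the toy data are hypothetical. Costs are written here as positive magnitudes to be minimized, unlike the signed values used by `cost_matrix_evaluation` in Domain 10.

```python
def expected_cost(y_true, y_prob, threshold, cost_fp=100.0, cost_fn=500.0):
    """Total cost of false positives and false negatives at a threshold."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp
        elif predicted == 0 and label == 1:
            cost += cost_fn
    return cost


def select_threshold(y_val, p_val, candidates, cost_fp=100.0, cost_fn=500.0):
    """Pick the candidate threshold with the lowest cost on validation data."""
    return min(candidates,
               key=lambda th: expected_cost(y_val, p_val, th, cost_fp, cost_fn))


# Hypothetical validation labels and model scores (not training data).
y_val = [0, 0, 0, 1, 0, 1, 1, 0]
p_val = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.3]

best = select_threshold(y_val, p_val, candidates=[0.3, 0.5, 0.7])
```

Because a missed positive costs five times a false alarm in this setup, the lowest candidate wins. Re-running the selection on fresh validation data is what "re-evaluate periodically" amounts to in practice.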
+
+ ---
+
+ ## Recalibration Strategies
+
+ When calibration fails (Hosmer-Lemeshow p < 0.05 or Brier > 0.25), apply one of these post-hoc methods:
+
+ | Method | When to Use | Pros | Cons |
+ |---|---|---|---|
+ | Platt Scaling | Binary classification, sigmoid-shaped miscalibration | Simple, works well for SVMs and neural nets | Assumes sigmoid relationship |
+ | Isotonic Regression | Non-parametric miscalibration | No shape assumption, flexible | Requires more data, can overfit on small sets |
+ | Beta Calibration | Skewed prediction distributions | Handles asymmetric miscalibration | More complex, less widely supported |
+ | Temperature Scaling | Neural network confidence calibration | Single parameter, preserves ranking | Only adjusts sharpness, not shape |
+
+ Always recalibrate on a held-out calibration set (not training or test). Re-run Hosmer-Lemeshow after recalibration to confirm improvement.
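As an illustration of the simplest row in the table, here is a sketch of temperature scaling for a binary classifier. It fits a single temperature by grid search over negative log-likelihood on a calibration set. The grid range, the function names, and the toy logits are assumptions; a production version would more likely use a numerical optimizer.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nll(logits, labels, temperature: float) -> float:
    """Mean negative log-likelihood of labels under sigmoid(logit / T)."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def fit_temperature(logits, labels, grid=None) -> float:
    """Grid-search the temperature that minimizes NLL on the calibration set."""
    grid = grid or [0.25 + 0.05 * i for i in range(196)]  # 0.25 to 10.0
    return min(grid, key=lambda t: nll(logits, labels, t))


# Hypothetical held-out calibration set: confident logits, 75% correct.
logits = [4.0, 4.0, 4.0, 4.0, -4.0, -4.0, -4.0, -4.0]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

temperature = fit_temperature(logits, labels)
calibrated = [sigmoid(z / temperature) for z in logits]
```

A fitted temperature above 1 softens overconfident scores. Dividing every logit by the same positive constant leaves their order unchanged, which is why the method preserves ranking.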
+
+ ---
+
+ ## Deployment Audit Checklist
+
+ | # | Check | Pass Criteria |
+ |---|---|---|
+ | 1 | Model card complete | All required sections filled |
+ | 2 | Data quality | No columns > 5% missing, no leakage |
+ | 3 | Feature VIF | < 5 for all features (or justified) |
+ | 4 | Class imbalance | < 10:1 (or mitigation documented) |
+ | 5 | Calibration | Hosmer-Lemeshow p > 0.05 |
+ | 6 | Discrimination | AUC-ROC > 0.7 on holdout |
+ | 7 | Fairness | Disparate impact > 0.8 all groups |
+ | 8 | SHAP stability | Rankings stable across bootstrap |
+ | 9 | Monitoring | Pipeline deployed with PSI alerts |
+ | 10 | Business impact | Cost matrix shows positive ROI |
+ | 11 | Data versioning | Training data reproducible |
+ | 12 | Rollback plan | Documented and tested |
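The numeric rows of the checklist could be encoded as an automated gate along the following lines. This is only a sketch: the metric key names are hypothetical, the documented exceptions ("or justified", "or mitigation documented") are not modeled, and the non-numeric rows (model card, monitoring, rollback plan) still need manual or separate verification.

```python
def audit_gate(metrics: dict) -> dict:
    """Evaluate the numeric checklist criteria and list any failures."""
    checks = {
        'data_quality': metrics['max_missing_rate'] <= 0.05,
        'feature_vif': metrics['max_vif'] < 5,
        'class_imbalance': metrics['imbalance_ratio'] < 10,
        'calibration': metrics['hosmer_lemeshow_p'] > 0.05,
        'discrimination': metrics['auc_roc'] > 0.7,
        'fairness': metrics['min_disparate_impact'] > 0.8,
    }
    failed = sorted(name for name, ok in checks.items() if not ok)
    return {'approved': not failed, 'failed_checks': failed}


# Hypothetical audit inputs.
candidate = {
    'max_missing_rate': 0.02, 'max_vif': 3.1, 'imbalance_ratio': 4.0,
    'hosmer_lemeshow_p': 0.21, 'auc_roc': 0.65, 'min_disparate_impact': 0.86,
}
report = audit_gate(candidate)
```

Here the candidate is rejected on discrimination alone, since its holdout AUC of 0.65 is below the 0.7 criterion.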