@wazir-dev/cli 1.0.0
This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +111 -0
- package/CHANGELOG.md +14 -0
- package/CONTRIBUTING.md +101 -0
- package/LICENSE +21 -0
- package/README.md +314 -0
- package/assets/composition-engine.mmd +34 -0
- package/assets/demo-script.sh +17 -0
- package/assets/logo-dark.svg +14 -0
- package/assets/logo.svg +14 -0
- package/assets/pipeline.mmd +39 -0
- package/assets/record-demo.sh +51 -0
- package/docs/README.md +51 -0
- package/docs/adapters/context-mode.md +60 -0
- package/docs/concepts/architecture.md +87 -0
- package/docs/concepts/artifact-model.md +60 -0
- package/docs/concepts/composition-engine.md +36 -0
- package/docs/concepts/indexing-and-recall.md +160 -0
- package/docs/concepts/observability.md +41 -0
- package/docs/concepts/roles-and-workflows.md +59 -0
- package/docs/concepts/terminology-policy.md +27 -0
- package/docs/getting-started/01-installation.md +78 -0
- package/docs/getting-started/02-first-run.md +102 -0
- package/docs/getting-started/03-adding-to-project.md +15 -0
- package/docs/getting-started/04-host-setup.md +15 -0
- package/docs/guides/ci-integration.md +15 -0
- package/docs/guides/creating-skills.md +15 -0
- package/docs/guides/expertise-module-authoring.md +15 -0
- package/docs/guides/hook-development.md +15 -0
- package/docs/guides/memory-and-learnings.md +34 -0
- package/docs/guides/multi-host-export.md +15 -0
- package/docs/guides/troubleshooting.md +101 -0
- package/docs/guides/writing-custom-roles.md +15 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
- package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
- package/docs/readmes/INDEX.md +99 -0
- package/docs/readmes/features/expertise/README.md +171 -0
- package/docs/readmes/features/exports/README.md +222 -0
- package/docs/readmes/features/hooks/README.md +103 -0
- package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
- package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
- package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
- package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
- package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
- package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
- package/docs/readmes/features/hooks/session-start.md +119 -0
- package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
- package/docs/readmes/features/roles/README.md +157 -0
- package/docs/readmes/features/roles/clarifier.md +152 -0
- package/docs/readmes/features/roles/content-author.md +190 -0
- package/docs/readmes/features/roles/designer.md +193 -0
- package/docs/readmes/features/roles/executor.md +184 -0
- package/docs/readmes/features/roles/learner.md +210 -0
- package/docs/readmes/features/roles/planner.md +182 -0
- package/docs/readmes/features/roles/researcher.md +164 -0
- package/docs/readmes/features/roles/reviewer.md +184 -0
- package/docs/readmes/features/roles/specifier.md +162 -0
- package/docs/readmes/features/roles/verifier.md +215 -0
- package/docs/readmes/features/schemas/README.md +178 -0
- package/docs/readmes/features/skills/README.md +63 -0
- package/docs/readmes/features/skills/brainstorming.md +96 -0
- package/docs/readmes/features/skills/debugging.md +148 -0
- package/docs/readmes/features/skills/design.md +120 -0
- package/docs/readmes/features/skills/prepare-next.md +109 -0
- package/docs/readmes/features/skills/run-audit.md +159 -0
- package/docs/readmes/features/skills/scan-project.md +109 -0
- package/docs/readmes/features/skills/self-audit.md +176 -0
- package/docs/readmes/features/skills/tdd.md +137 -0
- package/docs/readmes/features/skills/using-skills.md +92 -0
- package/docs/readmes/features/skills/verification.md +120 -0
- package/docs/readmes/features/skills/writing-plans.md +104 -0
- package/docs/readmes/features/tooling/README.md +320 -0
- package/docs/readmes/features/workflows/README.md +186 -0
- package/docs/readmes/features/workflows/author.md +181 -0
- package/docs/readmes/features/workflows/clarify.md +154 -0
- package/docs/readmes/features/workflows/design-review.md +171 -0
- package/docs/readmes/features/workflows/design.md +169 -0
- package/docs/readmes/features/workflows/discover.md +162 -0
- package/docs/readmes/features/workflows/execute.md +173 -0
- package/docs/readmes/features/workflows/learn.md +167 -0
- package/docs/readmes/features/workflows/plan-review.md +165 -0
- package/docs/readmes/features/workflows/plan.md +170 -0
- package/docs/readmes/features/workflows/prepare-next.md +167 -0
- package/docs/readmes/features/workflows/review.md +169 -0
- package/docs/readmes/features/workflows/run-audit.md +191 -0
- package/docs/readmes/features/workflows/spec-challenge.md +159 -0
- package/docs/readmes/features/workflows/specify.md +160 -0
- package/docs/readmes/features/workflows/verify.md +177 -0
- package/docs/readmes/packages/README.md +50 -0
- package/docs/readmes/packages/ajv.md +117 -0
- package/docs/readmes/packages/context-mode.md +118 -0
- package/docs/readmes/packages/gray-matter.md +116 -0
- package/docs/readmes/packages/node-test.md +137 -0
- package/docs/readmes/packages/yaml.md +112 -0
- package/docs/reference/configuration-reference.md +159 -0
- package/docs/reference/expertise-index.md +52 -0
- package/docs/reference/git-flow.md +43 -0
- package/docs/reference/hooks.md +87 -0
- package/docs/reference/host-exports.md +50 -0
- package/docs/reference/launch-checklist.md +172 -0
- package/docs/reference/marketplace-listings.md +76 -0
- package/docs/reference/release-process.md +34 -0
- package/docs/reference/roles-reference.md +77 -0
- package/docs/reference/skills.md +33 -0
- package/docs/reference/templates.md +29 -0
- package/docs/reference/tooling-cli.md +94 -0
- package/docs/truth-claims.yaml +222 -0
- package/expertise/PROGRESS.md +63 -0
- package/expertise/README.md +18 -0
- package/expertise/antipatterns/PROGRESS.md +56 -0
- package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
- package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
- package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
- package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
- package/expertise/antipatterns/backend/index.md +24 -0
- package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
- package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
- package/expertise/antipatterns/code/async-antipatterns.md +622 -0
- package/expertise/antipatterns/code/code-smells.md +1186 -0
- package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
- package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
- package/expertise/antipatterns/code/index.md +27 -0
- package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
- package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
- package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
- package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
- package/expertise/antipatterns/design/dark-patterns.md +1121 -0
- package/expertise/antipatterns/design/index.md +22 -0
- package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
- package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
- package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
- package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
- package/expertise/antipatterns/frontend/index.md +23 -0
- package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
- package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
- package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
- package/expertise/antipatterns/index.md +31 -0
- package/expertise/antipatterns/performance/index.md +20 -0
- package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
- package/expertise/antipatterns/performance/premature-optimization.md +623 -0
- package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
- package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
- package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
- package/expertise/antipatterns/process/index.md +23 -0
- package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
- package/expertise/antipatterns/security/index.md +20 -0
- package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
- package/expertise/antipatterns/security/security-theater.md +843 -0
- package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
- package/expertise/architecture/PROGRESS.md +70 -0
- package/expertise/architecture/data/caching-architecture.md +671 -0
- package/expertise/architecture/data/data-consistency.md +574 -0
- package/expertise/architecture/data/data-modeling.md +536 -0
- package/expertise/architecture/data/event-streams-and-queues.md +634 -0
- package/expertise/architecture/data/index.md +25 -0
- package/expertise/architecture/data/search-architecture.md +663 -0
- package/expertise/architecture/data/sql-vs-nosql.md +708 -0
- package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
- package/expertise/architecture/decisions/build-vs-buy.md +616 -0
- package/expertise/architecture/decisions/index.md +23 -0
- package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
- package/expertise/architecture/decisions/technology-selection.md +616 -0
- package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
- package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
- package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
- package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
- package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
- package/expertise/architecture/distributed/index.md +25 -0
- package/expertise/architecture/distributed/saga-pattern.md +797 -0
- package/expertise/architecture/foundations/architectural-thinking.md +460 -0
- package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
- package/expertise/architecture/foundations/design-principles-solid.md +649 -0
- package/expertise/architecture/foundations/domain-driven-design.md +719 -0
- package/expertise/architecture/foundations/index.md +25 -0
- package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
- package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
- package/expertise/architecture/index.md +34 -0
- package/expertise/architecture/integration/api-design-graphql.md +638 -0
- package/expertise/architecture/integration/api-design-grpc.md +804 -0
- package/expertise/architecture/integration/api-design-rest.md +892 -0
- package/expertise/architecture/integration/index.md +25 -0
- package/expertise/architecture/integration/third-party-integration.md +795 -0
- package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
- package/expertise/architecture/integration/websockets-realtime.md +791 -0
- package/expertise/architecture/mobile-architecture/index.md +22 -0
- package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
- package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
- package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
- package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
- package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
- package/expertise/architecture/patterns/event-driven.md +797 -0
- package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
- package/expertise/architecture/patterns/index.md +27 -0
- package/expertise/architecture/patterns/layered-architecture.md +736 -0
- package/expertise/architecture/patterns/microservices.md +753 -0
- package/expertise/architecture/patterns/modular-monolith.md +692 -0
- package/expertise/architecture/patterns/monolith.md +626 -0
- package/expertise/architecture/patterns/plugin-architecture.md +735 -0
- package/expertise/architecture/patterns/serverless.md +780 -0
- package/expertise/architecture/scaling/database-scaling.md +615 -0
- package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
- package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
- package/expertise/architecture/scaling/index.md +24 -0
- package/expertise/architecture/scaling/multi-tenancy.md +800 -0
- package/expertise/architecture/scaling/stateless-design.md +787 -0
- package/expertise/backend/embedded-firmware.md +625 -0
- package/expertise/backend/go.md +853 -0
- package/expertise/backend/index.md +24 -0
- package/expertise/backend/java-spring.md +448 -0
- package/expertise/backend/node-typescript.md +625 -0
- package/expertise/backend/python-fastapi.md +724 -0
- package/expertise/backend/rust.md +458 -0
- package/expertise/backend/solidity.md +711 -0
- package/expertise/composition-map.yaml +443 -0
- package/expertise/content/foundations/content-modeling.md +395 -0
- package/expertise/content/foundations/editorial-standards.md +449 -0
- package/expertise/content/foundations/index.md +24 -0
- package/expertise/content/foundations/microcopy.md +455 -0
- package/expertise/content/foundations/terminology-governance.md +509 -0
- package/expertise/content/index.md +34 -0
- package/expertise/content/patterns/accessibility-copy.md +518 -0
- package/expertise/content/patterns/index.md +24 -0
- package/expertise/content/patterns/notification-content.md +433 -0
- package/expertise/content/patterns/sample-content.md +486 -0
- package/expertise/content/patterns/state-copy.md +439 -0
- package/expertise/design/PROGRESS.md +58 -0
- package/expertise/design/disciplines/dark-mode-theming.md +577 -0
- package/expertise/design/disciplines/design-systems.md +595 -0
- package/expertise/design/disciplines/index.md +25 -0
- package/expertise/design/disciplines/information-architecture.md +800 -0
- package/expertise/design/disciplines/interaction-design.md +788 -0
- package/expertise/design/disciplines/responsive-design.md +552 -0
- package/expertise/design/disciplines/usability-testing.md +516 -0
- package/expertise/design/disciplines/user-research.md +792 -0
- package/expertise/design/foundations/accessibility-design.md +796 -0
- package/expertise/design/foundations/color-theory.md +797 -0
- package/expertise/design/foundations/iconography.md +795 -0
- package/expertise/design/foundations/index.md +26 -0
- package/expertise/design/foundations/motion-and-animation.md +653 -0
- package/expertise/design/foundations/rtl-design.md +585 -0
- package/expertise/design/foundations/spacing-and-layout.md +607 -0
- package/expertise/design/foundations/typography.md +800 -0
- package/expertise/design/foundations/visual-hierarchy.md +761 -0
- package/expertise/design/index.md +32 -0
- package/expertise/design/patterns/authentication-flows.md +474 -0
- package/expertise/design/patterns/content-consumption.md +789 -0
- package/expertise/design/patterns/data-display.md +618 -0
- package/expertise/design/patterns/e-commerce.md +1494 -0
- package/expertise/design/patterns/feedback-and-states.md +642 -0
- package/expertise/design/patterns/forms-and-input.md +819 -0
- package/expertise/design/patterns/gamification.md +801 -0
- package/expertise/design/patterns/index.md +31 -0
- package/expertise/design/patterns/microinteractions.md +449 -0
- package/expertise/design/patterns/navigation.md +800 -0
- package/expertise/design/patterns/notifications.md +705 -0
- package/expertise/design/patterns/onboarding.md +700 -0
- package/expertise/design/patterns/search-and-filter.md +601 -0
- package/expertise/design/patterns/settings-and-preferences.md +768 -0
- package/expertise/design/patterns/social-and-community.md +748 -0
- package/expertise/design/platforms/desktop-native.md +612 -0
- package/expertise/design/platforms/index.md +25 -0
- package/expertise/design/platforms/mobile-android.md +825 -0
- package/expertise/design/platforms/mobile-cross-platform.md +983 -0
- package/expertise/design/platforms/mobile-ios.md +699 -0
- package/expertise/design/platforms/tablet.md +794 -0
- package/expertise/design/platforms/web-dashboard.md +790 -0
- package/expertise/design/platforms/web-responsive.md +550 -0
- package/expertise/design/psychology/behavioral-nudges.md +449 -0
- package/expertise/design/psychology/cognitive-load.md +1191 -0
- package/expertise/design/psychology/error-psychology.md +778 -0
- package/expertise/design/psychology/index.md +22 -0
- package/expertise/design/psychology/persuasive-design.md +736 -0
- package/expertise/design/psychology/user-mental-models.md +623 -0
- package/expertise/design/tooling/open-pencil.md +266 -0
- package/expertise/frontend/angular.md +1073 -0
- package/expertise/frontend/desktop-electron.md +546 -0
- package/expertise/frontend/flutter.md +782 -0
- package/expertise/frontend/index.md +27 -0
- package/expertise/frontend/native-android.md +409 -0
- package/expertise/frontend/native-ios.md +490 -0
- package/expertise/frontend/react-native.md +1160 -0
- package/expertise/frontend/react.md +808 -0
- package/expertise/frontend/vue.md +1089 -0
- package/expertise/humanize/domain-rules-code.md +79 -0
- package/expertise/humanize/domain-rules-content.md +67 -0
- package/expertise/humanize/domain-rules-technical-docs.md +56 -0
- package/expertise/humanize/index.md +35 -0
- package/expertise/humanize/self-audit-checklist.md +87 -0
- package/expertise/humanize/sentence-patterns.md +218 -0
- package/expertise/humanize/vocabulary-blacklist.md +105 -0
- package/expertise/i18n/PROGRESS.md +65 -0
- package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
- package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
- package/expertise/i18n/advanced/complex-scripts.md +30 -0
- package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
- package/expertise/i18n/advanced/testing-i18n.md +28 -0
- package/expertise/i18n/content/content-adaptation.md +23 -0
- package/expertise/i18n/content/locale-specific-formatting.md +23 -0
- package/expertise/i18n/content/machine-translation-integration.md +28 -0
- package/expertise/i18n/content/translation-management.md +29 -0
- package/expertise/i18n/foundations/date-time-calendars.md +67 -0
- package/expertise/i18n/foundations/i18n-architecture.md +272 -0
- package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
- package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
- package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
- package/expertise/i18n/foundations/string-externalization.md +236 -0
- package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
- package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
- package/expertise/i18n/index.md +38 -0
- package/expertise/i18n/platform/backend-i18n.md +31 -0
- package/expertise/i18n/platform/flutter-i18n.md +148 -0
- package/expertise/i18n/platform/native-android-i18n.md +36 -0
- package/expertise/i18n/platform/native-ios-i18n.md +36 -0
- package/expertise/i18n/platform/react-i18n.md +103 -0
- package/expertise/i18n/platform/web-css-i18n.md +81 -0
- package/expertise/i18n/rtl/arabic-specific.md +175 -0
- package/expertise/i18n/rtl/hebrew-specific.md +149 -0
- package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
- package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
- package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
- package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
- package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
- package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
- package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
- package/expertise/i18n/rtl/rtl-typography.md +160 -0
- package/expertise/index.md +113 -0
- package/expertise/index.yaml +216 -0
- package/expertise/infrastructure/cloud-aws.md +597 -0
- package/expertise/infrastructure/cloud-gcp.md +599 -0
- package/expertise/infrastructure/cybersecurity.md +816 -0
- package/expertise/infrastructure/database-mongodb.md +447 -0
- package/expertise/infrastructure/database-postgres.md +400 -0
- package/expertise/infrastructure/devops-cicd.md +787 -0
- package/expertise/infrastructure/index.md +27 -0
- package/expertise/performance/PROGRESS.md +50 -0
- package/expertise/performance/backend/api-latency.md +1204 -0
- package/expertise/performance/backend/background-jobs.md +506 -0
- package/expertise/performance/backend/connection-pooling.md +1209 -0
- package/expertise/performance/backend/database-query-optimization.md +515 -0
- package/expertise/performance/backend/index.md +23 -0
- package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
- package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
- package/expertise/performance/foundations/caching-strategies.md +489 -0
- package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
- package/expertise/performance/foundations/index.md +24 -0
- package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
- package/expertise/performance/foundations/memory-management.md +964 -0
- package/expertise/performance/foundations/performance-budgets.md +1314 -0
- package/expertise/performance/index.md +31 -0
- package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
- package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
- package/expertise/performance/infrastructure/index.md +22 -0
- package/expertise/performance/infrastructure/load-balancing.md +1081 -0
- package/expertise/performance/infrastructure/observability.md +1079 -0
- package/expertise/performance/mobile/index.md +23 -0
- package/expertise/performance/mobile/mobile-animations.md +544 -0
- package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
- package/expertise/performance/mobile/mobile-network.md +452 -0
- package/expertise/performance/mobile/mobile-rendering.md +599 -0
- package/expertise/performance/mobile/mobile-startup-time.md +505 -0
- package/expertise/performance/platform-specific/flutter-performance.md +647 -0
- package/expertise/performance/platform-specific/index.md +22 -0
- package/expertise/performance/platform-specific/node-performance.md +1307 -0
- package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
- package/expertise/performance/platform-specific/react-performance.md +1403 -0
- package/expertise/performance/web/bundle-optimization.md +1239 -0
- package/expertise/performance/web/image-and-media.md +636 -0
- package/expertise/performance/web/index.md +24 -0
- package/expertise/performance/web/network-optimization.md +1133 -0
- package/expertise/performance/web/rendering-performance.md +1098 -0
- package/expertise/performance/web/ssr-and-hydration.md +918 -0
- package/expertise/performance/web/web-vitals.md +1374 -0
- package/expertise/quality/accessibility.md +985 -0
- package/expertise/quality/evidence-based-verification.md +499 -0
- package/expertise/quality/index.md +24 -0
- package/expertise/quality/ml-model-audit.md +614 -0
- package/expertise/quality/performance.md +600 -0
- package/expertise/quality/testing-api.md +891 -0
- package/expertise/quality/testing-mobile.md +496 -0
- package/expertise/quality/testing-web.md +849 -0
- package/expertise/security/PROGRESS.md +54 -0
- package/expertise/security/agentic-identity.md +540 -0
- package/expertise/security/compliance-frameworks.md +601 -0
- package/expertise/security/data/data-encryption.md +364 -0
- package/expertise/security/data/data-privacy-gdpr.md +692 -0
- package/expertise/security/data/database-security.md +1171 -0
- package/expertise/security/data/index.md +22 -0
- package/expertise/security/data/pii-handling.md +531 -0
- package/expertise/security/foundations/authentication.md +1041 -0
- package/expertise/security/foundations/authorization.md +603 -0
- package/expertise/security/foundations/cryptography.md +1001 -0
- package/expertise/security/foundations/index.md +25 -0
- package/expertise/security/foundations/owasp-top-10.md +1354 -0
- package/expertise/security/foundations/secrets-management.md +1217 -0
- package/expertise/security/foundations/secure-sdlc.md +700 -0
- package/expertise/security/foundations/supply-chain-security.md +698 -0
- package/expertise/security/index.md +31 -0
- package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
- package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
- package/expertise/security/infrastructure/container-security.md +721 -0
- package/expertise/security/infrastructure/incident-response.md +1295 -0
- package/expertise/security/infrastructure/index.md +24 -0
- package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
- package/expertise/security/infrastructure/network-security.md +1337 -0
- package/expertise/security/mobile/index.md +23 -0
- package/expertise/security/mobile/mobile-android-security.md +1218 -0
- package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
- package/expertise/security/mobile/mobile-data-storage.md +1265 -0
- package/expertise/security/mobile/mobile-ios-security.md +1401 -0
- package/expertise/security/mobile/mobile-network-security.md +1520 -0
- package/expertise/security/smart-contract-security.md +594 -0
- package/expertise/security/testing/index.md +22 -0
- package/expertise/security/testing/penetration-testing.md +1258 -0
- package/expertise/security/testing/security-code-review.md +1765 -0
- package/expertise/security/testing/threat-modeling.md +1074 -0
- package/expertise/security/testing/vulnerability-scanning.md +1062 -0
- package/expertise/security/web/api-security.md +586 -0
- package/expertise/security/web/cors-and-headers.md +433 -0
- package/expertise/security/web/csrf.md +562 -0
- package/expertise/security/web/file-upload.md +1477 -0
- package/expertise/security/web/index.md +25 -0
- package/expertise/security/web/injection.md +1375 -0
- package/expertise/security/web/session-management.md +1101 -0
- package/expertise/security/web/xss.md +1158 -0
- package/exports/README.md +17 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
- package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
- package/exports/hosts/claude/.claude/agents/designer.md +55 -0
- package/exports/hosts/claude/.claude/agents/executor.md +55 -0
- package/exports/hosts/claude/.claude/agents/learner.md +51 -0
- package/exports/hosts/claude/.claude/agents/planner.md +53 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
- package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
- package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
- package/exports/hosts/claude/.claude/commands/author.md +42 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
- package/exports/hosts/claude/.claude/commands/design.md +44 -0
- package/exports/hosts/claude/.claude/commands/discover.md +37 -0
- package/exports/hosts/claude/.claude/commands/execute.md +48 -0
- package/exports/hosts/claude/.claude/commands/learn.md +38 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
- package/exports/hosts/claude/.claude/commands/plan.md +39 -0
- package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
- package/exports/hosts/claude/.claude/commands/review.md +40 -0
- package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
- package/exports/hosts/claude/.claude/commands/specify.md +38 -0
- package/exports/hosts/claude/.claude/commands/verify.md +37 -0
- package/exports/hosts/claude/.claude/settings.json +34 -0
- package/exports/hosts/claude/CLAUDE.md +19 -0
- package/exports/hosts/claude/export.manifest.json +38 -0
- package/exports/hosts/claude/host-package.json +67 -0
- package/exports/hosts/codex/AGENTS.md +19 -0
- package/exports/hosts/codex/export.manifest.json +38 -0
- package/exports/hosts/codex/host-package.json +41 -0
- package/exports/hosts/cursor/.cursor/hooks.json +16 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
- package/exports/hosts/cursor/export.manifest.json +38 -0
- package/exports/hosts/cursor/host-package.json +42 -0
- package/exports/hosts/gemini/GEMINI.md +19 -0
- package/exports/hosts/gemini/export.manifest.json +38 -0
- package/exports/hosts/gemini/host-package.json +41 -0
- package/hooks/README.md +18 -0
- package/hooks/definitions/loop_cap_guard.yaml +21 -0
- package/hooks/definitions/post_tool_capture.yaml +24 -0
- package/hooks/definitions/pre_compact_summary.yaml +19 -0
- package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
- package/hooks/definitions/protected_path_write_guard.yaml +19 -0
- package/hooks/definitions/session_start.yaml +19 -0
- package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
- package/hooks/loop-cap-guard +17 -0
- package/hooks/post-tool-lint +36 -0
- package/hooks/protected-path-write-guard +17 -0
- package/hooks/session-start +41 -0
- package/llms-full.txt +2355 -0
- package/llms.txt +43 -0
- package/package.json +79 -0
- package/roles/README.md +20 -0
- package/roles/clarifier.md +42 -0
- package/roles/content-author.md +63 -0
- package/roles/designer.md +55 -0
- package/roles/executor.md +55 -0
- package/roles/learner.md +51 -0
- package/roles/planner.md +53 -0
- package/roles/researcher.md +43 -0
- package/roles/reviewer.md +54 -0
- package/roles/specifier.md +47 -0
- package/roles/verifier.md +71 -0
- package/schemas/README.md +24 -0
- package/schemas/accepted-learning.schema.json +20 -0
- package/schemas/author-artifact.schema.json +156 -0
- package/schemas/clarification.schema.json +19 -0
- package/schemas/design-artifact.schema.json +80 -0
- package/schemas/docs-claim.schema.json +18 -0
- package/schemas/export-manifest.schema.json +20 -0
- package/schemas/hook.schema.json +67 -0
- package/schemas/host-export-package.schema.json +18 -0
- package/schemas/implementation-plan.schema.json +19 -0
- package/schemas/proposed-learning.schema.json +19 -0
- package/schemas/research.schema.json +18 -0
- package/schemas/review.schema.json +29 -0
- package/schemas/run-manifest.schema.json +18 -0
- package/schemas/spec-challenge.schema.json +18 -0
- package/schemas/spec.schema.json +20 -0
- package/schemas/usage.schema.json +102 -0
- package/schemas/verification-proof.schema.json +29 -0
- package/schemas/wazir-manifest.schema.json +173 -0
- package/skills/README.md +40 -0
- package/skills/brainstorming/SKILL.md +77 -0
- package/skills/debugging/SKILL.md +50 -0
- package/skills/design/SKILL.md +61 -0
- package/skills/dispatching-parallel-agents/SKILL.md +128 -0
- package/skills/executing-plans/SKILL.md +70 -0
- package/skills/finishing-a-development-branch/SKILL.md +169 -0
- package/skills/humanize/SKILL.md +123 -0
- package/skills/init-pipeline/SKILL.md +124 -0
- package/skills/prepare-next/SKILL.md +20 -0
- package/skills/receiving-code-review/SKILL.md +123 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +108 -0
- package/skills/run-audit/SKILL.md +197 -0
- package/skills/scan-project/SKILL.md +41 -0
- package/skills/self-audit/SKILL.md +153 -0
- package/skills/subagent-driven-development/SKILL.md +154 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
- package/skills/subagent-driven-development/implementer-prompt.md +102 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/tdd/SKILL.md +23 -0
- package/skills/using-git-worktrees/SKILL.md +163 -0
- package/skills/using-skills/SKILL.md +95 -0
- package/skills/verification/SKILL.md +22 -0
- package/skills/wazir/SKILL.md +463 -0
- package/skills/writing-plans/SKILL.md +30 -0
- package/skills/writing-skills/SKILL.md +157 -0
- package/skills/writing-skills/anthropic-best-practices.md +122 -0
- package/skills/writing-skills/persuasion-principles.md +50 -0
- package/templates/README.md +20 -0
- package/templates/artifacts/README.md +10 -0
- package/templates/artifacts/accepted-learning.md +19 -0
- package/templates/artifacts/accepted-learning.template.json +12 -0
- package/templates/artifacts/author.md +74 -0
- package/templates/artifacts/author.template.json +19 -0
- package/templates/artifacts/clarification.md +21 -0
- package/templates/artifacts/clarification.template.json +12 -0
- package/templates/artifacts/execute-notes.md +19 -0
- package/templates/artifacts/implementation-plan.md +21 -0
- package/templates/artifacts/implementation-plan.template.json +11 -0
- package/templates/artifacts/learning-proposal.md +19 -0
- package/templates/artifacts/next-run-handoff.md +21 -0
- package/templates/artifacts/plan-review.md +19 -0
- package/templates/artifacts/proposed-learning.template.json +12 -0
- package/templates/artifacts/research.md +21 -0
- package/templates/artifacts/research.template.json +12 -0
- package/templates/artifacts/review-findings.md +19 -0
- package/templates/artifacts/review.template.json +11 -0
- package/templates/artifacts/run-manifest.template.json +8 -0
- package/templates/artifacts/spec-challenge.md +19 -0
- package/templates/artifacts/spec-challenge.template.json +11 -0
- package/templates/artifacts/spec.md +21 -0
- package/templates/artifacts/spec.template.json +12 -0
- package/templates/artifacts/verification-proof.md +19 -0
- package/templates/artifacts/verification-proof.template.json +11 -0
- package/templates/examples/accepted-learning.example.json +14 -0
- package/templates/examples/author.example.json +152 -0
- package/templates/examples/clarification.example.json +15 -0
- package/templates/examples/docs-claim.example.json +8 -0
- package/templates/examples/export-manifest.example.json +7 -0
- package/templates/examples/host-export-package.example.json +11 -0
- package/templates/examples/implementation-plan.example.json +17 -0
- package/templates/examples/proposed-learning.example.json +13 -0
- package/templates/examples/research.example.json +15 -0
- package/templates/examples/research.example.md +6 -0
- package/templates/examples/review.example.json +17 -0
- package/templates/examples/run-manifest.example.json +9 -0
- package/templates/examples/spec-challenge.example.json +14 -0
- package/templates/examples/spec.example.json +21 -0
- package/templates/examples/verification-proof.example.json +21 -0
- package/templates/examples/wazir-manifest.example.yaml +65 -0
- package/templates/task-definition-schema.md +99 -0
- package/tooling/README.md +20 -0
- package/tooling/src/adapters/context-mode.js +50 -0
- package/tooling/src/capture/command.js +376 -0
- package/tooling/src/capture/store.js +99 -0
- package/tooling/src/capture/usage.js +270 -0
- package/tooling/src/checks/branches.js +50 -0
- package/tooling/src/checks/brand-truth.js +110 -0
- package/tooling/src/checks/changelog.js +231 -0
- package/tooling/src/checks/command-registry.js +36 -0
- package/tooling/src/checks/commits.js +102 -0
- package/tooling/src/checks/docs-drift.js +103 -0
- package/tooling/src/checks/docs-truth.js +201 -0
- package/tooling/src/checks/runtime-surface.js +156 -0
- package/tooling/src/cli.js +116 -0
- package/tooling/src/command-options.js +56 -0
- package/tooling/src/commands/validate.js +320 -0
- package/tooling/src/doctor/command.js +91 -0
- package/tooling/src/export/command.js +77 -0
- package/tooling/src/export/compiler.js +498 -0
- package/tooling/src/guards/loop-cap-guard.js +52 -0
- package/tooling/src/guards/protected-path-write-guard.js +67 -0
- package/tooling/src/index/command.js +152 -0
- package/tooling/src/index/storage.js +1061 -0
- package/tooling/src/index/summarizers.js +261 -0
- package/tooling/src/loaders.js +18 -0
- package/tooling/src/project-root.js +22 -0
- package/tooling/src/recall/command.js +225 -0
- package/tooling/src/schema-validator.js +30 -0
- package/tooling/src/state-root.js +40 -0
- package/tooling/src/status/command.js +71 -0
- package/wazir.manifest.yaml +135 -0
- package/workflows/README.md +19 -0
- package/workflows/author.md +42 -0
- package/workflows/clarify.md +38 -0
- package/workflows/design-review.md +46 -0
- package/workflows/design.md +44 -0
- package/workflows/discover.md +37 -0
- package/workflows/execute.md +48 -0
- package/workflows/learn.md +38 -0
- package/workflows/plan-review.md +42 -0
- package/workflows/plan.md +39 -0
- package/workflows/prepare-next.md +37 -0
- package/workflows/review.md +40 -0
- package/workflows/run-audit.md +41 -0
- package/workflows/spec-challenge.md +41 -0
- package/workflows/specify.md +38 -0
- package/workflows/verify.md +37 -0
@@ -0,0 +1,1053 @@
# Testing Anti-Patterns

> Testing is the primary safety net for software quality, yet the tests themselves are frequently riddled with anti-patterns that create a false sense of security, slow down development, and allow real bugs to escape into production. A bad test suite is worse than no test suite: it costs time to maintain, lies about what it verifies, and erodes developer trust until the team ignores test results entirely. The anti-patterns below represent the most common, most damaging, and most subtle ways test suites fail their purpose.

> **Domain:** Code
> **Anti-patterns covered:** 22
> **Highest severity:** Critical

---

## Anti-Patterns

### AP-01: The Mockery (Over-Mocking)

**Also known as:** Mock Happy, Mock Hell, Mockitis, Test Double Abuse
**Frequency:** Very Common
**Severity:** Critical
**Detection difficulty:** Moderate

**What it looks like:**
A test where the majority of the code is setting up mocks, stubs, and fakes. The system under test calls mocked dependencies that return pre-configured values, and the assertions verify that those pre-configured values flowed through correctly. The actual business logic is barely exercised because every collaborator has been replaced with a controlled stand-in.

```python
# BAD: More mock setup than actual testing
from unittest.mock import Mock

def test_process_order(self):
    mock_db = Mock()
    mock_payment = Mock()
    mock_inventory = Mock()
    mock_email = Mock()
    mock_logger = Mock()

    mock_inventory.check_stock.return_value = True
    mock_payment.charge.return_value = PaymentResult(success=True)
    mock_db.save.return_value = Order(id=1)

    service = OrderService(mock_db, mock_payment, mock_inventory, mock_email, mock_logger)
    result = service.process(order_data)

    # Testing that mocks returned what we told them to return
    assert result.success == True
```

**Why developers do it:**
Mocking makes tests fast and isolated. Developers learn early that "unit tests should not touch the database" and overcorrect by mocking everything. Mocking frameworks make it trivially easy to stub out any dependency. The resulting test runs in milliseconds and appears to provide coverage.

**What goes wrong:**
The test verifies the wiring between mocks, not the actual behavior. When the real database, payment gateway, or inventory service behaves differently than the mock (different error format, different null handling, different timing), the test passes but production breaks. Google's testing blog documented that over-mocked tests accounted for a significant portion of tests that passed in CI but failed to catch real integration bugs. The mocking approach also couples tests tightly to implementation details -- any refactoring of how dependencies are called breaks every test, even when the behavior is unchanged.

**The fix:**
Reserve mocks for true external boundaries (network calls, third-party APIs, clocks). Use real implementations for internal collaborators. For database interactions, use in-memory databases or test containers. Apply the "Sociable Unit Test" pattern, where the unit under test uses real collaborators.

```python
# BETTER: Real collaborators, mock only the external boundary
def test_process_order(self):
    db = InMemoryOrderRepository()
    payment = FakePaymentGateway(always_succeeds=True)
    inventory = InMemoryInventory({"SKU-1": 10})
    email = SpyEmailSender()

    service = OrderService(db, payment, inventory, email)
    result = service.process(order_data)

    assert result.success
    assert db.find(result.order_id) is not None
    assert email.sent_count == 1
```

**Detection rule:**
If a test method has more lines of mock setup than lines of assertions and action combined, suspect AP-01. Count the `Mock()`, `.return_value`, and `.side_effect` calls -- if they exceed 5, the test is likely over-mocked.

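The counting heuristic above can be turned into a quick scan. This is an illustrative sketch, not an existing linter: the signal patterns and the threshold of 5 are assumptions lifted directly from the rule above.

```python
import re

# Heuristic AP-01 detector: count mock-setup signals in a test's source.
# The pattern list is a simplifying assumption for unittest.mock-style code.
MOCK_SIGNALS = [r"\bMock\(", r"\bMagicMock\(", r"\.return_value\b", r"\.side_effect\b"]

def mock_signal_count(test_source: str) -> int:
    """Count mock-setup signals in the source text of one test."""
    return sum(len(re.findall(p, test_source)) for p in MOCK_SIGNALS)

def looks_over_mocked(test_source: str, threshold: int = 5) -> bool:
    """Apply the detection rule: more than `threshold` signals is suspect."""
    return mock_signal_count(test_source) > threshold
```

Run against the BAD example above, this counts five `Mock()` constructions plus three `.return_value` stubs and flags the test; a mock-free test passes the check.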
---

### AP-02: Testing Implementation, Not Behavior

**Also known as:** Structural Inspection, White-Box Obsession, Brittle Tests
**Frequency:** Very Common
**Severity:** Critical
**Detection difficulty:** Moderate

**What it looks like:**
Tests that assert on internal method calls, private state, execution order, or specific code paths rather than observable outputs. The test knows exactly how the code works and verifies that it works that way, rather than verifying what it produces.

```javascript
// BAD: Testing HOW it works, not WHAT it does
test('calculates discount', () => {
  const calculator = new PriceCalculator();
  const spy = jest.spyOn(calculator, '_applyTierDiscount');

  calculator.calculateTotal(items, customer);

  expect(spy).toHaveBeenCalledWith(items, 'gold');
  expect(spy).toHaveBeenCalledTimes(1);
});
```

**Why developers do it:**
It feels thorough. Developers reason: "I know the discount logic goes through `_applyTierDiscount`, so I should verify it gets called." It is also easier to write -- asserting on calls is simpler than computing the expected output. Code coverage tools reward this approach because every branch gets "tested."

**What goes wrong:**
Every refactoring breaks the tests even when the behavior is correct. Kent Beck's principle states: "Programmer tests should be sensitive to behavior changes and insensitive to structure changes." When tests are coupled to structure, developers stop refactoring because the cost of updating tests exceeds the perceived benefit. The codebase ossifies. Meanwhile, the tests provide a false sense of security because they verify the mechanism, not the result -- a method could be called correctly but produce the wrong output, and the test would still pass.

**The fix:**
Test the public API. Assert on outputs, side effects observable from outside, and state changes visible through the public interface.

```javascript
// GOOD: Testing WHAT it does
test('gold customers get 20% discount on orders over $100', () => {
  const calculator = new PriceCalculator();
  const goldCustomer = { tier: 'gold' };
  const items = [{ price: 50 }, { price: 80 }];

  const total = calculator.calculateTotal(items, goldCustomer);

  expect(total).toBe(104); // (50 + 80) * 0.80
});
```

**Detection rule:**
If a test uses `spyOn` on private/internal methods, accesses properties prefixed with `_`, uses reflection to read private fields, or asserts on the number of times an internal method was called, suspect AP-02.

---

### AP-03: Flaky / Non-Deterministic Tests

**Also known as:** The Blinking Test, Heisenbug Test, Random Failures
**Frequency:** Very Common
**Severity:** Critical
**Detection difficulty:** Hard

**What it looks like:**
A test that passes most of the time but occasionally fails without any code change. Re-running the test suite makes it pass again. Common causes include: reliance on system time, race conditions in async code, order-dependent tests, network calls to real services, and uncontrolled randomness.

```java
// BAD: Depends on timing
@Test
void testCacheExpiry() throws InterruptedException {
    cache.put("key", "value", Duration.ofMillis(100));
    Thread.sleep(150); // Might not be enough on a loaded CI server
    assertNull(cache.get("key"));
}
```

**Why developers do it:**
The test works on the developer's machine. CI servers are faster or slower than expected, but the developer does not see the failure locally. Some flakiness is introduced unknowingly through shared state or implicit ordering. The developer ships it, and the flakiness only manifests under load or on different hardware.

**What goes wrong:**
Google reported that approximately 16% of their tests exhibit flaky behavior, and flaky tests took 1.5 times longer to fix than non-flaky ones. At scale, flaky tests cost engineering organizations over $4.3M annually in lost productivity (investigation time, re-runs, lost trust). The worst consequence is cultural: developers learn to ignore test failures ("it's just a flaky test"), and real bugs slip through because the signal is buried in noise. Spotify reported that their pre-merge suite of 48,000 tests required dedicated tooling to skip slow and flaky tests, and they invested heavily in flakiness detection infrastructure.

**The fix:**
Eliminate non-determinism at the source. Use injectable clocks instead of `Thread.sleep`. Use deterministic seeds for randomness. Isolate test state so order does not matter. Replace real network calls with controlled fakes. For async operations, use explicit synchronization (latches, futures, polling with timeout) rather than arbitrary delays.

```java
// GOOD: Deterministic time control
@Test
void testCacheExpiry() {
    FakeClock clock = new FakeClock();
    Cache cache = new Cache(clock);
    cache.put("key", "value", Duration.ofMillis(100));
    clock.advance(Duration.ofMillis(150));
    assertNull(cache.get("key"));
}
```

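The polling-with-timeout alternative to arbitrary sleeps can be sketched as a small helper. This is a minimal illustration; real suites would usually reach for their framework's own facility (Awaitility in Java, for instance) rather than hand-rolling it.

```python
import time

def wait_until(condition, timeout=2.0, interval=0.01):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Unlike a fixed sleep, this returns as soon as the condition holds,
    so only the failing case pays the full timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one final check at the deadline
```

A test then asserts `wait_until(lambda: queue.is_empty())` instead of sleeping a magic number of milliseconds and hoping the CI machine keeps up.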
**Detection rule:**
If a test uses `Thread.sleep`, `time.sleep`, `setTimeout` with a magic number, `Date.now()`, `Math.random()` without a seed, or makes real HTTP calls, suspect AP-03. Also flag any test that has been re-run or marked `@Retry` in CI configuration.

---

### AP-04: Testing Private Methods Directly

**Also known as:** The Anal Probe, The Inspector, Encapsulation Violation
**Frequency:** Common
**Severity:** High
**Detection difficulty:** Easy

**What it looks like:**
Tests that use reflection, `@VisibleForTesting` annotations, or language-specific hacks to access and test private/internal methods directly. The test reaches inside the class to call methods that are not part of the public contract.

```csharp
// BAD: Using reflection to test a private method
[Test]
public void TestParseInternalFormat()
{
    var parser = new DataProcessor();
    var method = typeof(DataProcessor).GetMethod("ParseInternalFormat",
        BindingFlags.NonPublic | BindingFlags.Instance);
    var result = method.Invoke(parser, new object[] { rawData });
    Assert.AreEqual(expected, result);
}
```

**Why developers do it:**
The private method contains complex logic that the developer wants to test in isolation. Testing it through the public API feels indirect and requires more setup. The developer reasons: "This private method is the hard part; I should test it directly." Some code coverage tools flag uncovered private methods, pressuring developers to test them.

**What goes wrong:**
The test is now coupled to the internal structure. Any refactoring -- renaming the method, changing its signature, inlining it, or splitting it -- breaks the test. Worse, it signals a design problem: if a private method is complex enough to need its own tests, it likely belongs in a separate class. Vladimir Khorikov documented that the root issue is not encapsulation violation per se, but that testing private methods masks a missing abstraction that should be extracted and tested through its own public API.

**The fix:**
Test private methods indirectly through the public API. If the private method is too complex for that, extract it into a separate class with its own public interface and test that class directly.

```csharp
// GOOD: Extract the complex logic into its own testable class
public class InternalFormatParser
{
    public ParsedData Parse(byte[] rawData) { /* ... */ }
}

[Test]
public void TestInternalFormatParsing()
{
    var parser = new InternalFormatParser();
    var result = parser.Parse(rawData);
    Assert.AreEqual(expected, result);
}
```

**Detection rule:**
If a test uses reflection to access non-public members, uses `@VisibleForTesting` or `internal` access modifiers added solely for testing, or imports a method that starts with `_`, suspect AP-04.

---

### AP-05: The Coverage Obsession

**Also known as:** 100% Coverage Cult, Goodhart's Test, Metric Gaming
**Frequency:** Common
**Severity:** High
**Detection difficulty:** Moderate

**What it looks like:**
Teams enforce a hard 100% (or near-100%) code coverage requirement. Developers write trivial tests for getters, setters, constructors, and configuration code just to hit the number. Tests verify that code runs but not that it works correctly. Coverage becomes a KPI that is gamed rather than a signal that is interpreted.

```java
// BAD: Testing a getter to inflate coverage
@Test
void testGetName() {
    User user = new User("Alice");
    assertEquals("Alice", user.getName());
}

// BAD: Testing framework configuration
@Test
void testSpringContextLoads() {
    assertNotNull(applicationContext);
}
```

**Why developers do it:**
Management mandates coverage thresholds. CI pipelines reject PRs below the threshold. Developers fill the gap with the easiest possible tests rather than the most valuable ones. The metric is visible and gameable, while actual test quality is invisible and subjective.

**What goes wrong:**
Google's own testing guidelines recommend "60% as acceptable, 75% as commendable, and 90% as exemplary" -- not 100%. Research has shown that when coverage becomes a target, teams optimize for the metric rather than for quality (Goodhart's Law). The resulting tests are expensive to maintain, break on every refactoring, and provide no safety. Codecov's analysis documented that teams pursuing 100% coverage spent disproportionate effort on the last 10-20% of code (edge cases in generated code, third-party adapters, trivial boilerplate), producing tests that caught zero real bugs. Meanwhile, the complex business logic at 80% coverage was undertested because developers spent their time elsewhere.

**The fix:**
Set coverage floors (70-85%), not ceilings. Track coverage trends rather than absolute numbers. Measure mutation testing scores for critical modules -- mutation testing verifies that tests actually catch bugs, not just execute code. Exclude generated code, DTOs, and trivial boilerplate from coverage requirements.

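The gap between line coverage and mutation score shows up even in a toy example. Everything below is made up for the sketch: we hand-apply one mutation to a discount rule and check which test notices.

```python
# A toy illustration of mutation testing: mutate a function, then see
# whether the test suite "kills" the mutant or lets it survive.

def price_after_discount(total, tier):
    """Gold customers get 20% off orders over $100."""
    return total * 0.80 if tier == "gold" and total > 100 else total

def price_after_discount_mutant(total, tier):
    """Same function with one mutation applied: `>` became `>=`."""
    return total * 0.80 if tier == "gold" and total >= 100 else total

def weak_test(price_fn):
    # Executes the code (full line coverage) but never probes the
    # boundary, so the mutant passes too: the mutant survives.
    return price_fn(130, "gold") == 104

def strong_test(price_fn):
    # Probes the $100 boundary, so the mutant fails it: it is killed.
    return price_fn(100, "gold") == 100 and price_fn(130, "gold") == 104
```

Both tests give identical coverage numbers, but only the second one contributes to the mutation score -- which is exactly why mutation score is the better quality signal for critical modules.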
**Detection rule:**
If a test file contains only getter/setter tests, constructor tests, or tests that assert `assertNotNull` on injected dependencies, suspect AP-05. If the team has a 100% coverage requirement and tests are being added with no assertions beyond "it runs," this is active.

---

### AP-06: Assertion-Free Testing

**Also known as:** The Secret Catcher, The Placebo Test, Happy Path Smoke
**Frequency:** Common
**Severity:** High
**Detection difficulty:** Easy

**What it looks like:**
A test that calls production code but contains no assertions. It relies entirely on "no exception thrown" as the success criterion. The test method invokes a function and considers the test passed if execution completes without error.

```python
# BAD: No assertions at all
def test_process_payment():
    processor = PaymentProcessor()
    processor.process(valid_order)
    # ... that's it. No assertions.

# BAD: Assert only that no exception was thrown
def test_generate_report(self):
    generator = ReportGenerator()
    try:
        generator.generate(data)
    except Exception:
        self.fail("Report generation raised an exception")
    # But never checks the report content
```

**Why developers do it:**
The developer wants quick coverage credit. The function is hard to observe (it writes to a file, sends an email, updates a database) and the developer does not invest in making the output inspectable. Martin Fowler documented this as "Assertion Free Testing" and noted that the most common reason is lack of observability -- the system under test does not expose its results in a way that is easy to assert on.

**What goes wrong:**
The test passes regardless of whether the function produces correct output. A payment processor that silently charges the wrong amount, a report generator that produces empty files, a data pipeline that drops records -- all pass these tests. The developer and the team believe these paths are tested. They are not. Research on JUnit test suites found that tests without assertions can achieve 100% code coverage while catching zero defects.

**The fix:**
Every test must assert on at least one observable outcome. If the function's output is hard to observe, refactor for testability: return values instead of void, use spy objects for side effects, or inject observable collaborators.

```python
# GOOD: Assert on observable outcomes
def test_process_payment():
    ledger = InMemoryLedger()
    processor = PaymentProcessor(ledger)

    result = processor.process(valid_order)

    assert result.status == "completed"
    assert ledger.last_entry().amount == valid_order.total
    assert ledger.last_entry().merchant == valid_order.merchant_id
```

**Detection rule:**
If a test method contains zero `assert`, `expect`, `should`, or equivalent assertion calls, it is AP-06. Static analysis tools can flag test methods with no assertion statements.

---

### AP-07: The Ice Cream Cone (Inverted Test Pyramid)

**Also known as:** Inverted Pyramid, E2E Heavy, Manual Testing Addiction
**Frequency:** Common
**Severity:** High
**Detection difficulty:** Easy

**What it looks like:**
The test suite has many end-to-end and UI tests, fewer integration tests, and very few (or no) unit tests. The majority of test effort goes into manual testing. The test distribution is the inverse of the recommended testing pyramid (many unit tests at the base, fewer integration tests in the middle, few E2E tests at the top).

**Why developers do it:**
E2E tests feel more "real" and trustworthy because they test the whole system. Writing unit tests requires understanding dependency injection and test doubles, which feels harder. In organizations without a strong testing culture, QA teams write E2E/manual tests because that is what they know. The first tests a team writes are often E2E because they require no code changes to support testability.

**What goes wrong:**
LayerX, a fintech company, documented this pattern: their manual E2E test suite grew to 900 items, and despite a two-day release cycle, three bugs slipped through immediately after a stable release due to human error in manual E2E testing. The problems with the inverted pyramid compound: E2E tests are slow (minutes to hours per run), flaky (browser timeouts, network issues, CSS selector changes), expensive to maintain (UI changes break many tests), and provide poor failure localization (a failing E2E test does not tell you which module has the bug). Teams with this pattern ship slower because the feedback loop is measured in hours, not seconds.

**The fix:**
Adopt the testing pyramid: 70% unit tests (fast, isolated, precise), 20% integration tests (verify module interactions), 10% E2E tests (critical user journeys only). When an E2E test fails, write a unit test that catches the same bug, then consider removing the E2E test.

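The target distribution lends itself to a mechanical CI guardrail. A sketch, with the caveat that the counts per level would come from your own test runner's reporting, and the 70/20/10 split is a guideline rather than a hard rule:

```python
def pyramid_report(unit, integration, e2e):
    """Compare a suite's shape against the ~70/20/10 pyramid targets."""
    total = unit + integration + e2e
    if total == 0:
        raise ValueError("no tests counted")
    shares = {
        "unit": unit / total,
        "integration": integration / total,
        "e2e": e2e / total,
    }
    # The defining symptom of the ice cream cone: E2E tests outnumber unit tests.
    shares["inverted"] = e2e > unit
    return shares
```

A suite counting 700/200/100 reports a healthy shape; 50/150/400 reports `inverted`, which a CI step can turn into a warning or a failed check.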
|
|
333
|
+
|
|
334
|
+
**Detection rule:**
|
|
335
|
+
Count the tests at each level. If E2E/UI tests outnumber unit tests, or if the majority of testing is manual, suspect AP-07. If the full test suite takes more than 15 minutes, the pyramid is likely inverted.
|
|
336
|
+
|
|
337
|
+
---
|
|
338
|
+
|
|
339
|
+
### AP-08: Shared Mutable State Between Tests
|
|
340
|
+
|
|
341
|
+
**Also known as:** Generous Leftovers, Test Pollution, Order-Dependent Tests
|
|
342
|
+
**Frequency:** Common
|
|
343
|
+
**Severity:** High
|
|
344
|
+
**Detection difficulty:** Hard
|
|
345
|
+
|
|
346
|
+
**What it looks like:**
|
|
347
|
+
Tests share a database, a static/global variable, a singleton, a file on disk, or an in-memory collection that is mutated by one test and read by another. Tests pass when run in a specific order but fail when run in isolation, in parallel, or in a different order.
|
|
348
|
+
|
|
349
|
+
```python
|
|
350
|
+
# BAD: Shared class-level state
|
|
351
|
+
class TestUserService:
|
|
352
|
+
users_db = {} # Shared across all tests
|
|
353
|
+
|
|
354
|
+
def test_create_user(self):
|
|
355
|
+
self.users_db["alice"] = User("alice")
|
|
356
|
+
assert "alice" in self.users_db
|
|
357
|
+
|
|
358
|
+
def test_list_users(self):
|
|
359
|
+
# Depends on test_create_user having run first
|
|
360
|
+
assert len(self.users_db) == 1
|
|
361
|
+
```
|
|
362
|
+
|
|
363
|
+
**Why developers do it:**
|
|
364
|
+
Setting up test data is expensive. Developers reason: "The previous test already created the user, so I'll just use it." Shared setup methods (`@BeforeAll`, `setUpClass`) are convenient but dangerous when they create mutable state. Some developers are unaware that test execution order is not guaranteed in most frameworks.
|
|
365
|
+
|
|
366
|
+
**What goes wrong:**
|
|
367
|
+
Research from the University of Illinois found that Test Order Dependency accounts for 12% of flaky test failures, and 74% of these issues are fixed by cleaning shared state between test runs. When tests share state, failures become non-reproducible: a test fails in CI but passes locally because the local runner uses a different order. Debugging these failures is exceptionally time-consuming because the root cause is in a different test than the one that fails. In large test suites, shared state prevents parallelization, which multiplies execution time.
|
|
368
|
+
|
|
369
|
+
**The fix:**
|
|
370
|
+
Each test must create its own state and clean up after itself. Use `@BeforeEach`/`setUp` (not `@BeforeAll`) for test data. Use transactions that roll back after each test for database tests. Never use static mutable fields in test classes.
|
|
371
|
+
|
|
372
|
+
```python
|
|
373
|
+
# GOOD: Each test owns its state
|
|
374
|
+
class TestUserService:
|
|
375
|
+
def setup_method(self):
|
|
376
|
+
self.users_db = {} # Fresh state for each test
|
|
377
|
+
|
|
378
|
+
def test_create_user(self):
|
|
379
|
+
self.users_db["alice"] = User("alice")
|
|
380
|
+
assert "alice" in self.users_db
|
|
381
|
+
|
|
382
|
+
def test_list_users_empty(self):
|
|
383
|
+
assert len(self.users_db) == 0
|
|
384
|
+
```
|
|
385
|
+
|
|
386
|
+
**Detection rule:**
|
|
387
|
+
If tests use class-level mutable fields, static variables, singletons, or `@BeforeAll`/`setUpClass` that creates mutable data, suspect AP-08. If a test has `@Order` annotations or passes only when run in a specific sequence, this is confirmed.
|
|
388
|
+
|
|
389
|
+
---

### AP-09: The Slow Suite

**Also known as:** The Slow Poke, CI Bottleneck, Coffee Break Tests
**Frequency:** Common
**Severity:** High
**Detection difficulty:** Easy

**What it looks like:**
The test suite takes 10+ minutes to run locally and 30+ minutes in CI. Developers stop running tests before pushing because "it takes too long." CI feedback arrives after the developer has context-switched to another task.

**Why developers do it:**
Individual tests are added without considering their cumulative impact. Each test seems reasonable in isolation (100ms here, 200ms there), but 5,000 of them add up. Tests hit real databases, make real HTTP calls, or use `Thread.sleep` for synchronization. Nobody owns the test suite performance budget.

**What goes wrong:**
Dropbox documented that their Android test pipeline averaged 25 minutes with a worst case of 3 hours before they invested in optimization. The root cause was poor developer experience: developers stopped waiting for CI and pushed untested code. A study cited by DevOps.com estimated that a typical developer with 5 CI runs per day at 30 minutes each loses 2.5 hours daily to waiting -- equivalent to 3+ full-time engineers' time for a 10-person team. Slow suites also prevent continuous deployment: if tests take 45 minutes, you can deploy at most ~10 times per day, even with parallelization.

**The fix:**
Set a test suite time budget (e.g., 5 minutes for unit tests). Profile the slowest tests and fix them first. Replace real I/O with in-memory alternatives. Parallelize test execution. Separate fast unit tests from slow integration tests and run them in different CI stages. Use test impact analysis to run only the tests affected by a change.
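
A budget is easier to keep when overruns fail loudly. Here is a minimal sketch in plain Python; the `time_budget` decorator and the specific budget values are illustrative, not a standard pytest feature (real suites would typically enforce this in the runner, e.g. via a plugin):

```python
import functools
import time

def time_budget(seconds):
    """Fail a test outright if it exceeds its time budget (illustrative)."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = test_fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if elapsed > seconds:
                raise AssertionError(
                    f"{test_fn.__name__} took {elapsed:.2f}s, "
                    f"budget is {seconds:.2f}s"
                )
            return result
        return wrapper
    return decorator

# Hypothetical budgeted test: fails if it ever creeps past 1 second.
@time_budget(1.0)
def test_fast_enough():
    assert sum(range(1000)) == 499500

test_fast_enough()  # passes well within budget
```

Applied to a suite's slowest tests, this turns a creeping performance problem into an immediate, attributable failure instead of a number nobody watches.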

**Detection rule:**
If the full unit test suite takes more than 5 minutes, or if individual tests take more than 1 second, suspect AP-09. If developers have a habit of pushing without running tests ("CI will catch it"), this is confirmed.

---

### AP-10: Copy-Paste Test Code

**Also known as:** Copypasta Tests, Test Code Duplication, WET Tests
**Frequency:** Very Common
**Severity:** Medium
**Detection difficulty:** Easy

**What it looks like:**
Test methods that are near-identical copies of each other with minor variations in input data or expected values. The test setup, action, and assertion structure is duplicated across dozens of test methods. When the production API changes, every copied test must be updated individually.

```javascript
// BAD: Copy-pasted test with tiny variations
test('validates email with no @', () => {
  const validator = new EmailValidator();
  const result = validator.validate('invalidemail.com');
  expect(result.valid).toBe(false);
  expect(result.error).toBe('Invalid email format');
});

test('validates email with no domain', () => {
  const validator = new EmailValidator();
  const result = validator.validate('user@');
  expect(result.valid).toBe(false);
  expect(result.error).toBe('Invalid email format');
});

// ... 15 more identical copies with different input strings
```

**Why developers do it:**
Duplicating a test and changing one value is faster than designing a reusable test structure. The developer is "in the zone" and wants to add test cases quickly. Some developers have heard that "tests should be DAMP (Descriptive And Meaningful Phrases) not DRY" and interpret this as "duplication in tests is always fine."

**What goes wrong:**
When the `EmailValidator` API changes (e.g., the error message format changes), every copy must be updated. The xUnit Patterns wiki documents that Test Code Duplication causes a "very large increase in the cost to introduce new functionality because of the effort involved in updating all the tests that have copies of the affected code." In practice, developers update some copies but miss others, creating tests that fail for the wrong reason.

**The fix:**
Use parameterized tests (data-driven tests) for input variations. Extract common setup into helper methods or test fixtures. Keep each test readable but eliminate structural duplication.

```javascript
// GOOD: Parameterized test
test.each([
  ['invalidemail.com', 'no @ symbol'],
  ['user@', 'no domain'],
  ['@domain.com', 'no local part'],
  ['user@.com', 'domain starts with dot'],
])('validates email: %s (%s)', (email, _description) => {
  const result = new EmailValidator().validate(email);
  expect(result.valid).toBe(false);
});
```

**Detection rule:**
If two or more test methods in the same file share more than 80% of their code and differ only in input values, suspect AP-10. If a test file has more than 200 lines and most tests look structurally identical, this is confirmed.

---

### AP-11: Testing Too Many Things at Once

**Also known as:** The Giant, The Kitchen Sink Test, Mega-Test
**Frequency:** Common
**Severity:** Medium
**Detection difficulty:** Easy

**What it looks like:**
A single test method that verifies multiple independent behaviors. It creates data, calls several functions, and asserts on many unrelated outcomes. When it fails, the failure message does not indicate which behavior is broken.

```python
# BAD: Testing creation, validation, persistence, and notification in one test
def test_user_registration(self):
    user = UserService.register("alice", "alice@example.com", "password123")

    assert user.id is not None
    assert user.name == "alice"
    assert user.email == "alice@example.com"
    assert user.password != "password123"  # hashed
    assert user.created_at is not None
    assert UserRepository.find(user.id) is not None
    assert EmailService.last_sent_to == "alice@example.com"
    assert AuditLog.last_entry().action == "user_registered"
    assert RateLimiter.attempts_for("alice@example.com") == 1
```

**Why developers do it:**
Setting up the test context is expensive, so the developer wants to maximize assertions per setup. It feels efficient: "I already have the user object, why not test everything about it?" This is especially common when the system under test has complex setup requirements.

**What goes wrong:**
When the test fails on the 4th assertion, the developer does not know whether assertions 5-9 would also fail. Fixing the 4th assertion and re-running might reveal another failure, creating a slow debugging cycle. The test name (`test_user_registration`) does not describe what specifically is being tested, making the test suite less useful as documentation. When adding a new feature, the developer cannot tell which mega-test to update.

**The fix:**
One concept per test. Group related assertions (e.g., "user is persisted with correct fields") but separate unrelated behaviors (e.g., "notification is sent" vs. "rate limiter is updated") into distinct tests. Invest in test fixture builders to make setup cheap.
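
As an illustration, here is the mega-test split into focused tests, with the `UserService` pipeline reduced to a hypothetical in-memory stand-in (`register`, `saved_users`, and `emails_sent` are not real APIs) so each test stays self-contained:

```python
# Minimal in-memory stand-ins for the services in the example above.
saved_users = {}
emails_sent = []

def register(name, email, password):
    user = {"name": name, "email": email, "password": "hash:" + password}
    saved_users[email] = user
    emails_sent.append(email)
    return user

# One concept per test: persistence, hashing, and notification are
# verified separately, so a failure pinpoints the broken behavior.
def test_registration_persists_user():
    register("alice", "alice@example.com", "password123")
    assert "alice@example.com" in saved_users

def test_registration_hashes_password():
    user = register("bob", "bob@example.com", "password123")
    assert user["password"] != "password123"

def test_registration_sends_welcome_email():
    register("carol", "carol@example.com", "password123")
    assert "carol@example.com" in emails_sent
```

Each test name now documents one behavior, and a red test points directly at the broken one.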

**Detection rule:**
If a test method has more than 5-6 assertions on different objects or properties, or if the test name is generic (e.g., `testEverything`, `testUserFlow`), suspect AP-11.

---

### AP-12: Not Testing Edge Cases

**Also known as:** Happy Path Only, Golden Path Trap, Boundary Blindness
**Frequency:** Very Common
**Severity:** High
**Detection difficulty:** Hard

**What it looks like:**
Tests only cover the normal successful case. There are no tests for empty inputs, null values, boundary values (0, -1, MAX_INT), Unicode characters, concurrent access, or error conditions.

```go
// BAD: Only tests the happy path
func TestDivide(t *testing.T) {
	result := Divide(10, 2)
	assert.Equal(t, 5.0, result)
}

// Missing: Divide(0, 5), Divide(10, 0), Divide(-1, -1),
// Divide(math.MaxFloat64, 0.001), Divide(0, 0)
```

**Why developers do it:**
The happy path is the obvious test case. Edge cases require more thought and often reveal uncomfortable design questions ("what should happen when the input is nil?"). Developers under deadline pressure write the test that proves the feature works and move on. Edge cases "probably won't happen in production."

**What goes wrong:**
Edge cases are where the majority of production bugs live. The Ariane 5 rocket (Flight 501, 1996) exploded because a 64-bit float was converted to a 16-bit integer, causing an overflow -- a boundary condition that was never tested. More commonly, APIs that work perfectly with typical data fail on empty strings, null fields, very long inputs, or concurrent requests. These failures manifest as 500 errors in production, data corruption, or security vulnerabilities (integer overflows, buffer overruns).

**The fix:**
For every function, explicitly test: null/nil/undefined inputs, empty collections, zero values, negative numbers, boundary values (MAX/MIN), very large inputs, special characters (Unicode, emoji, newlines, SQL metacharacters), and concurrent access where applicable. Use property-based testing (QuickCheck, Hypothesis, fast-check) to automatically generate edge cases.
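
The checklist can be made concrete as a table-driven test. A sketch around a hypothetical `safe_divide` -- the function and its "return None on zero denominator" contract are illustrative, not an existing API:

```python
def safe_divide(a, b):
    """Illustrative function under test: a/b, or None when b is 0."""
    if b == 0:
        return None
    return a / b

# Edge cases drawn from the checklist: zero values, negatives, large inputs.
EDGE_CASES = [
    (10, 2, 5.0),       # happy path, kept for completeness
    (0, 5, 0.0),        # zero numerator
    (10, 0, None),      # zero denominator: the documented error value
    (-1, -1, 1.0),      # negative operands
    (10**9, 2, 5e8),    # large input
    (0, 0, None),       # both zero
]

def test_divide_edge_cases():
    for a, b, expected in EDGE_CASES:
        assert safe_divide(a, b) == expected, (a, b)
```

Writing the table forces the design question ("what *should* `Divide(10, 0)` return?") to be answered explicitly instead of discovered in production.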

**Detection rule:**
If a test file has only positive-case tests (inputs that exercise the main code path) and no tests with empty, null, zero, negative, or boundary inputs, suspect AP-12.

---

### AP-13: Commenting Out Failing Tests

**Also known as:** @Ignored Tests, Skipped Tests, Test Graveyard
**Frequency:** Common
**Severity:** High
**Detection difficulty:** Easy

**What it looks like:**
Tests that were commented out, annotated with `@Ignore`/`@Disabled`/`skip`, or wrapped in `if (false)` blocks. The test was failing, and instead of fixing it, someone disabled it. The comment says "TODO: fix this" or "temporarily disabled" -- from 18 months ago.

```java
// BAD: Disabled "temporarily" in 2024
@Disabled("Flaky on CI, will fix later")
@Test
void testPaymentRefund() {
    // ... test code that once worked ...
}
```

**Why developers do it:**
The test is failing and blocking the CI pipeline. The developer does not have time to investigate. Disabling it unblocks the build immediately. The developer genuinely intends to fix it later. Later never comes. Other developers see the pattern and follow it.

**What goes wrong:**
Disabled tests are dead code that rots. The production code they tested continues to change, so the disabled test becomes increasingly invalid. The behavior it tested is now unverified -- if a bug is introduced in that code path, nothing catches it. Over time, a test suite can accumulate dozens of disabled tests, representing a growing blind spot. The practical impact is test debt: teams have documented that when regression suites are not updated, "chaos forms quickly when testers attempt to execute regression tests with known defects or technical debt."

**The fix:**
Treat disabled tests as bugs. If a test fails, either fix it immediately or delete it and create a tracked ticket. Set a CI rule: no `@Disabled` without a linked issue. Run a weekly report of disabled tests. If a test has been disabled for more than 2 weeks without progress, delete it.
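
The "no skip without a linked issue" rule is straightforward to automate. A hedged sketch -- the marker list and the JIRA-style issue-key pattern are assumptions to adapt per project:

```python
import re

# Skip markers to flag, and the ticket pattern that excuses them
# (both lists are illustrative; extend per project).
SKIP_PATTERN = re.compile(
    r"@Disabled|@Ignore|@pytest\.mark\.skip|\bxit\(|\bxdescribe\("
)
ISSUE_PATTERN = re.compile(r"[A-Z]+-\d+")  # e.g. JIRA-style keys like PROJ-123

def untracked_skips(source):
    """Return line numbers of skip markers with no issue key on the same line."""
    offenders = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SKIP_PATTERN.search(line) and not ISSUE_PATTERN.search(line):
            offenders.append(lineno)
    return offenders

sample = '''@Disabled("Flaky on CI, will fix later")
@Test
void testPaymentRefund() {}
@Disabled("Tracked in PAY-142")
@Test
void testPaymentCapture() {}'''

# Only the first @Disabled lacks an issue reference.
print(untracked_skips(sample))  # -> [1]
```

Wired into CI, a non-empty result fails the build, turning the weekly report into an enforced rule.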

**Detection rule:**
Search for `@Ignore`, `@Disabled`, `skip(`, `xit(`, `xdescribe(`, `@pytest.mark.skip`, or commented-out test methods. If any exist without a linked issue tracker reference, suspect AP-13.

---

### AP-14: Test Data Coupling

**Also known as:** Database-Dependent Tests, Seed Data Addiction, Fixture Coupling
**Frequency:** Common
**Severity:** High
**Detection difficulty:** Moderate

**What it looks like:**
Tests depend on specific data existing in a shared database, fixture file, or seed script. The test assumes that user ID 42 exists, that the "admin" role is pre-loaded, or that the test database was seeded before the suite ran.

```ruby
# BAD: Depends on specific seed data
test "admin can delete users" do
  admin = User.find(1)   # Assumes seed data has admin with ID 1
  target = User.find(42) # Assumes user 42 exists

  delete :destroy, params: { id: target.id }, session: { user_id: admin.id }

  assert_response :success
end
```

**Why developers do it:**
Using pre-existing data is faster than creating it in each test. Seed files are "just there" in the test database. The developer tested locally where the seeds were loaded and did not realize the coupling.

**What goes wrong:**
Seed data changes break tests that depend on it. New developers run the tests without loading seeds and get mysterious failures. Tests cannot be run in parallel because they compete for the same database rows. Database migrations can invalidate fixture data, causing cascading test failures across the entire suite. In microservice architectures, shared test databases between services create implicit coupling that makes independent deployment impossible.

**The fix:**
Each test creates exactly the data it needs. Use factory patterns (Factory Bot, Fishery, test builders) to create test data declaratively. Use database transactions that roll back after each test. Never reference specific IDs or assume pre-existing data.

```ruby
# GOOD: Test creates its own data
test "admin can delete users" do
  admin = create(:user, role: :admin)
  target = create(:user)

  delete :destroy, params: { id: target.id }, session: { user_id: admin.id }

  assert_response :success
  assert_nil User.find_by(id: target.id)
end
```

**Detection rule:**
If a test calls `find(literal_id)`, references hardcoded database IDs, or has a comment like "make sure seeds are loaded," suspect AP-14.

---

### AP-15: Using Sleep/Delays for Synchronization

**Also known as:** Thread.sleep Testing, Arbitrary Waits, Timing Bombs
**Frequency:** Common
**Severity:** Medium
**Detection difficulty:** Easy

**What it looks like:**
Tests use fixed-duration waits (`Thread.sleep`, `time.sleep`, `setTimeout`) to wait for asynchronous operations to complete, rather than using explicit synchronization mechanisms.

```python
# BAD: Arbitrary sleep
def test_async_processing():
    queue.submit(job)
    time.sleep(2)  # Hope it's done by now
    assert job.status == "completed"
```

**Why developers do it:**
It is the simplest way to handle async behavior: "just wait long enough." The developer tested locally where 2 seconds was sufficient. The alternative (polling, callbacks, latches) requires more code and understanding of concurrency primitives.

**What goes wrong:**
The developer is stuck between two bad options: a short sleep that causes flaky failures on slow CI servers, or a long sleep that makes the test suite unnecessarily slow. A 2-second sleep across 100 async tests adds 3+ minutes to the suite. On loaded CI servers, even generous sleeps may be insufficient, causing intermittent failures. Enterprise Craftsmanship documented that "you cannot know for sure when exactly a job will complete and thus you are essentially guessing with the time interval."

**The fix:**
Use polling with a timeout: check the condition repeatedly with a short interval and a maximum wait time. Use synchronization primitives (CountDownLatch, Future, Promise). Use test-specific hooks that signal completion. For UI tests, use explicit wait conditions ("wait until element is visible") rather than `sleep`.

```python
# GOOD: Polling with timeout
def test_async_processing():
    queue.submit(job)
    wait_until(lambda: job.status == "completed", timeout=5.0, interval=0.1)
    assert job.status == "completed"
```
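
The `wait_until` helper used above is not a standard-library function; one minimal way to implement it:

```python
import time

def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Fails loudly on timeout instead of letting a later assertion
    produce a confusing error.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")

# Usage: a condition that becomes true after a few polls, standing in
# for an async job completing.
polls = {"count": 0}

def job_completed():
    polls["count"] += 1
    return polls["count"] >= 3

wait_until(job_completed, timeout=1.0, interval=0.01)
print(polls["count"])  # -> 3
```

The key property: the test finishes as soon as the condition holds, and only the pathological case pays the full timeout.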

**Detection rule:**
If a test contains `sleep`, `Thread.sleep`, `time.sleep`, `Task.Delay`, or `setTimeout` with a numeric literal, suspect AP-15. Every sleep in a test is a potential flakiness source.

---

### AP-16: Tautological Tests

**Also known as:** Self-Fulfilling Tests, Circular Assertions, Mirror Tests
**Frequency:** Occasional
**Severity:** High
**Detection difficulty:** Very Hard

**What it looks like:**
A test where the expected value is computed using the same logic as the production code, so the test is guaranteed to pass by construction. The test and the code use the same formula, making the test a tautology.

```javascript
// BAD: Computing expected value with the same logic
test('calculates tax', () => {
  const price = 100;
  const taxRate = 0.08;
  const expected = price * taxRate; // Same formula as production code

  const result = calculateTax(price, taxRate);

  expect(result).toBe(expected); // Will always pass, even if the formula is wrong
});

// BAD: Asserting what was just set up
test('user has correct name', () => {
  const user = new User({ name: 'Alice' });
  expect(user.name).toBe('Alice'); // Testing the constructor, not behavior
});
```

**Why developers do it:**
The developer wants to avoid hardcoding expected values (which feels fragile) and instead derives them. They reason: "If the formula changes, the test will automatically update." They do not realize this defeats the purpose of the test. As Randy Coulman documented, "test code that's impossible to edit without looking at the implementation is a strong indicator that you've got a tautological test."

**What goes wrong:**
If the formula in production code has a bug (e.g., tax should be `price * taxRate / 100` but is `price * taxRate`), the test replicates the same bug and passes. The test can never fail for a logic error because it uses the same logic. This creates maximum false confidence: the code has 100% coverage and all tests pass, but the result is wrong.

**The fix:**
Always use pre-computed literal values as expected results. Work from concrete examples, not from derived computations. If you need to test `calculateTax(100, 0.08)`, the expected result is `8.00`, hardcoded.

```javascript
// GOOD: Hardcoded expected value from requirements
test('calculates 8% tax on $100', () => {
  expect(calculateTax(100, 0.08)).toBe(8.00);
});

test('calculates 8% tax on $250', () => {
  expect(calculateTax(250, 0.08)).toBe(20.00);
});
```

**Detection rule:**
If a test's expected value is computed using a function or formula (not a literal), and that computation mirrors the production code, suspect AP-16. Also flag tests that only assert on values that were directly set in the test setup.

---

### AP-17: Not Testing Error Paths

**Also known as:** Sunny Day Testing, Exception Blindness, Error Path Amnesia
**Frequency:** Very Common
**Severity:** High
**Detection difficulty:** Moderate

**What it looks like:**
Tests verify that the function works when given valid input but never test what happens with invalid input, network failures, timeout conditions, permission errors, or resource exhaustion.

```python
# Tests exist for: create_user("valid_name", "valid@email.com")
# No tests for:
#   create_user("", "")                       -- empty inputs
#   create_user(None, None)                   -- null inputs
#   create_user("a"*1000, "valid@email.com")  -- oversized input
#   create_user("valid", "valid@email.com") when DB is down
#   create_user("valid", "valid@email.com") when duplicate exists
```

**Why developers do it:**
Error paths are less interesting to write. The developer focuses on making the feature work and writes tests to confirm it works. Error handling is often an afterthought in both the production code and the tests. Testing error paths requires simulating failures (database down, network timeout), which is harder than testing the happy path.

**What goes wrong:**
Error paths are where production incidents happen. When the database connection pool is exhausted, when a downstream service returns an unexpected 500, when disk space runs out -- these are the scenarios that cause outages. If these paths are untested, the error handling code may be incorrect (swallowing exceptions, returning null instead of throwing, leaking resources). The code may crash with an unhandled exception, exposing stack traces to users or creating security vulnerabilities.

**The fix:**
For every function, explicitly list the error conditions and write tests for each. Test: invalid inputs, null/undefined values, resource failures (DB down, network timeout), concurrent access conflicts, permission violations, and resource limits. Use fault injection to simulate infrastructure failures.
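
A sketch of what explicit error-path tests look like in plain Python; `create_user` and its error contract are illustrative, and `assert_raises` is a tiny stand-in for `pytest.raises` or unittest's `assertRaises`:

```python
def create_user(name, email):
    """Illustrative function with an explicit error contract."""
    if not name or not email:
        raise ValueError("name and email are required")
    if "@" not in email:
        raise ValueError("invalid email")
    if len(name) > 100:
        raise ValueError("name too long")
    return {"name": name, "email": email}

def assert_raises(exc_type, fn, *args):
    """Minimal stand-in for pytest.raises / unittest's assertRaises."""
    try:
        fn(*args)
    except exc_type:
        return
    raise AssertionError(f"expected {exc_type.__name__}")

# One test per error condition, mirroring the checklist above.
def test_rejects_empty_inputs():
    assert_raises(ValueError, create_user, "", "")

def test_rejects_missing_at_sign():
    assert_raises(ValueError, create_user, "alice", "not-an-email")

def test_rejects_oversized_name():
    assert_raises(ValueError, create_user, "a" * 1000, "a@example.com")
```

Infrastructure failures (DB down, timeouts) follow the same shape, with the failing dependency replaced by a test double that raises on demand.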

**Detection rule:**
If a test file has only tests for valid inputs and no tests that expect exceptions, error codes, or error states, suspect AP-17. If the production code has `try/catch` blocks or error handling, but no tests exercise those branches, this is confirmed.

---

### AP-18: Testing the Framework

**Also known as:** Framework Verification, Third-Party Testing, Library QA
**Frequency:** Common
**Severity:** Medium
**Detection difficulty:** Moderate

**What it looks like:**
Tests that verify the behavior of the framework, library, or language runtime rather than the application's own logic. The test is confirming that Spring injects dependencies, that React renders JSX, that Django ORM persists objects, or that Array.sort() works.

```java
// BAD: Testing that Spring DI works
@Test
void testServiceIsInjected() {
    assertNotNull(userService);
    assertNotNull(userRepository);
}

// BAD: Testing that JPA saves entities
@Test
void testUserIsSaved() {
    User user = new User("Alice");
    entityManager.persist(user);
    entityManager.flush();
    assertNotNull(user.getId()); // Testing that JPA generates IDs
}
```

**Why developers do it:**
It is easy to write and provides coverage numbers. The developer is learning the framework and writes tests to confirm their understanding. Some testing tutorials use framework verification as examples, and developers carry the pattern into production code. It feels safer to "verify everything."

**What goes wrong:**
These tests provide zero value in catching application bugs. The framework is already tested by its own test suite (Spring has thousands of tests; you do not need to re-test dependency injection). They add to the maintenance burden and suite execution time. When the framework is upgraded, these tests may break due to internal changes, creating noise that obscures real failures. As the Codepipes blog documented: "Writing software tests for trivial code because this is the correct way to 'do TDD' will get you nowhere."

**The fix:**
Test your application's behavior, not the framework's. If your code configures Spring beans, test that your application behaves correctly (e.g., "when a user signs up, they receive a welcome email"), not that Spring wired the beans. Trust the framework's own tests.
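
The contrast can be shown framework-free: assert an application-level outcome through a test double instead of asserting that wiring happened. All names here are illustrative, not an existing API:

```python
class FakeEmailService:
    """Test double standing in for a real mail gateway."""
    def __init__(self):
        self.sent = []

    def send(self, to, subject):
        self.sent.append((to, subject))

class SignupService:
    def __init__(self, email_service):
        self.email_service = email_service

    def sign_up(self, email):
        # Application logic worth testing: new users get a welcome email.
        self.email_service.send(email, "Welcome!")

# GOOD: asserts our behavior, not that a container injected a dependency.
def test_signup_sends_welcome_email():
    emails = FakeEmailService()
    SignupService(emails).sign_up("alice@example.com")
    assert emails.sent == [("alice@example.com", "Welcome!")]
```

If the wiring were broken, this test would fail anyway -- which is exactly why the `assertNotNull` version adds nothing.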

**Detection rule:**
If a test's only assertions are `assertNotNull` on injected dependencies, or if the test verifies behavior that is documented in the framework's own documentation (e.g., "JPA generates IDs"), suspect AP-18.

---

### AP-19: Logic in Tests

**Also known as:** Conditional Tests, Test Spaghetti, Computed Assertions
**Frequency:** Occasional
**Severity:** Medium
**Detection difficulty:** Moderate

**What it looks like:**
Tests that contain conditional logic (`if/else`), loops (`for/while`), or complex computations. The test code itself is complex enough to have bugs.

```python
# BAD: Logic in tests
def test_bulk_processing(self):
    results = processor.process_batch(items)

    for i, result in enumerate(results):
        if items[i].type == "premium":
            assert result.priority == "high"
        elif items[i].type == "standard":
            assert result.priority == "normal"
        else:
            assert result.priority == "low"
```

**Why developers do it:**
The developer wants to test multiple scenarios efficiently. Loops feel like a DRY approach to testing. The developer does not realize that the logic in the test introduces the same risk of bugs that testing is supposed to catch.

**What goes wrong:**
As Gil Zilberfeld documented: "Logic is a petri dish for bugs. The reason we're testing in the first place is to make sure code that contains logic works. Adding logic to tests is like inviting a vampire into your home." If the conditional in the test has a bug, the test may silently skip assertions or assert the wrong thing. The test becomes harder to read and debug because the reader must trace the logic to understand what is being verified.

**The fix:**
Use parameterized tests for multiple scenarios. Each test path should be explicit and linear -- no branching. If a test needs a loop, it should be a parameterized test where each iteration is an independent test case.

```python
# GOOD: Explicit, linear test cases
@pytest.mark.parametrize("item_type,expected_priority", [
    ("premium", "high"),
    ("standard", "normal"),
    ("budget", "low"),
])
def test_processing_priority(self, item_type, expected_priority):
    item = create_item(type=item_type)
    result = processor.process(item)
    assert result.priority == expected_priority
```

**Detection rule:**
If a test method contains `if`, `else`, `for`, `while`, or `switch`/`match` statements, suspect AP-19. Test methods should be linear: arrange, act, assert -- no branching.

---

### AP-20: Ignoring Test Maintenance

**Also known as:** Test Rot, Stale Tests, Abandoned Test Suite
**Frequency:** Common
**Severity:** High
**Detection difficulty:** Moderate

**What it looks like:**
The test suite has not been updated alongside the production code. Tests reference deprecated APIs, use outdated patterns, have hardcoded dates that have passed, or test features that no longer exist. Warning messages flood the test output. The suite "mostly passes" but the failures are background noise.

**Why developers do it:**
Tests are treated as second-class code. Production code has PR reviews, coding standards, and refactoring cycles; tests do not. When a feature changes, the developer updates the production code but says "I'll fix the tests later." Test maintenance is not tracked in sprint planning or velocity calculations. Nobody "owns" the test suite.

**What goes wrong:**
The test suite becomes unreliable and developers stop trusting it. Legitimate failures are ignored because "that test always fails." New developers cannot use the tests to understand the system because the tests describe old behavior. The cost of reviving an abandoned test suite grows rapidly -- after 6 months of neglect, it is often cheaper to rewrite than to fix. Practitioners have documented that test debt, unlike code debt, directly affects how effectively teams validate quality: accumulated gaps in test coverage erode the team's ability to release with confidence.

**The fix:**
Treat test code with the same quality standards as production code. Include test updates in the definition of done for every feature. Assign test suite ownership. Track and trend: number of disabled tests, test suite execution time, flaky test rate. Budget 15-20% of development time for test maintenance.

**Detection rule:**
If the test suite has `@SuppressWarnings`, deprecation warnings, tests referencing removed classes, or tests that have not been modified in 12+ months while the production code has changed, suspect AP-20.

---

### AP-21: Excessive Test Setup

**Also known as:** The Ceremony, Test Novel, Arrangement Overload
**Frequency:** Common
**Severity:** Medium
**Detection difficulty:** Easy

**What it looks like:**
A test method where 80% of the code is setting up the context (creating objects, configuring dependencies, loading data) and only 2-3 lines perform the action and assertion. The setup is so long that the reader cannot see what is actually being tested.

```java
// BAD: 30 lines of setup for a 2-line test
@Test
void testOrderDiscount() {
    Address address = new Address("123 Main St", "City", "ST", "12345");
    Customer customer = new Customer("Alice", "alice@test.com", address);
    customer.setTier(CustomerTier.GOLD);
    customer.setMemberSince(LocalDate.of(2020, 1, 1));
    Product product1 = new Product("Widget", 29.99, Category.ELECTRONICS);
    Product product2 = new Product("Gadget", 49.99, Category.ELECTRONICS);
    product1.setWeight(0.5);
    product2.setWeight(1.2);
    Order order = new Order(customer);
    order.addItem(product1, 2);
    order.addItem(product2, 1);
    order.setShippingMethod(ShippingMethod.STANDARD);
    order.setCouponCode("SAVE10");
    // ... 15 more lines of setup ...

    double total = order.calculateTotal();

    assertEquals(92.97, total, 0.01);
}
```

**Why developers do it:**
The system under test has many dependencies and requires a complex object graph. The developer does not invest in builder patterns or factory methods because each test "only" needs this setup. Over time, every test copies and slightly modifies the same setup.

**What goes wrong:**
Tests become unreadable: the reader cannot determine what matters for this specific test and what is incidental setup. When the domain model changes (e.g., `Customer` requires a new field), every test with this setup must be updated. The setup noise hides the intent of the test, making it useless as documentation.

**The fix:**
Use the Builder or Object Mother pattern to create test data with sensible defaults. Only specify values that matter for the specific test. Extract common setup into well-named helper methods.

```java
// GOOD: Builder with defaults, only specify what matters
@Test
void testGoldCustomerGets20PercentDiscount() {
    Order order = anOrder()
        .withGoldCustomer()
        .withItems(aProduct().priced(100.00))
        .build();

    double total = order.calculateTotal();

    assertEquals(80.00, total, 0.01);
}
```

**Detection rule:**
If a test method has more than 10 lines of object construction before the action, or if the setup-to-assertion ratio exceeds 5:1, suspect AP-21.
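
The builder behind a fluent entry point like `anOrder()` is small. A minimal sketch of the idea in Python (the `Order`/`OrderBuilder` names are hypothetical; the 20% gold discount matches the worked example above):

```python
# Hypothetical test-data builder: sensible defaults for everything,
# tests override only what matters to them.
class Order:
    def __init__(self, tier, items):
        self.tier = tier
        self.items = items  # list of unit prices

    def calculate_total(self):
        subtotal = sum(self.items)
        # Assumed rule (from the example above): gold customers get 20% off.
        return subtotal * 0.8 if self.tier == "GOLD" else subtotal

class OrderBuilder:
    def __init__(self):
        self._tier = "STANDARD"  # default: ordinary customer
        self._items = [10.0]     # default: one cheap item

    def with_gold_customer(self):
        self._tier = "GOLD"
        return self

    def with_items(self, *prices):
        self._items = list(prices)
        return self

    def build(self):
        return Order(self._tier, self._items)

def an_order():
    return OrderBuilder()

def test_gold_customer_gets_20_percent_discount():
    order = an_order().with_gold_customer().with_items(100.0).build()
    assert order.calculate_total() == 80.0
```

Because every field has a default, adding a new required field to the domain model touches the builder once instead of every test.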

---

### AP-22: Test Double Misuse

**Also known as:** Wrong Double, Stub-Mock Confusion, Fake Fragility
**Frequency:** Occasional
**Severity:** Medium
**Detection difficulty:** Hard

**What it looks like:**
Using the wrong type of test double for the situation: using a mock (which verifies interactions) when a stub (which provides canned answers) would suffice, or using a full mock framework when a simple hand-written fake would be clearer. Verifying every interaction with every mock, even when only the output matters.

```python
# BAD: Using mock verification when only the output matters
def test_calculate_shipping(self):
    mock_weight_service = Mock()
    mock_weight_service.get_weight.return_value = 5.0

    cost = calculator.calculate_shipping(order, mock_weight_service)

    assert cost == 12.50
    # Unnecessary: verifying HOW the weight was retrieved
    mock_weight_service.get_weight.assert_called_once_with(order.items)
    mock_weight_service.get_weight.assert_called_with(order.items)
```

**Why developers do it:**
Mock frameworks make it trivially easy to add verification calls. The developer adds them "just to be thorough." The difference between mocks, stubs, fakes, and spies is poorly understood -- many developers use "mock" as a generic term for all test doubles.

**What goes wrong:**
Over-verified mocks couple tests to implementation details (AP-02). When the production code changes how it calls collaborators (e.g., batching calls, reordering calls, adding caching), the mock verifications fail even though the behavior is correct. This creates maintenance overhead and discourages refactoring. As the Cash App engineering blog documented: "Mocking isn't evil, but avoid it anyway" -- not because mocking is wrong, but because it is chronically misused.

**The fix:**
Use stubs for queries (methods that return data) and mocks only for commands (methods that cause side effects you need to verify). Prefer hand-written fakes over mock frameworks for complex collaborators. Only verify interactions that are part of the test's purpose.
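
A corrected version of the shipping test above, sketched with a hand-written stub. The `ShippingCalculator` body and the $2.50/kg rate are assumptions made to match the numbers in the bad example (weight 5.0, cost 12.50):

```python
# GOOD (sketch): a plain stub answers the query; only the output is
# asserted, so refactoring how the weight is fetched cannot break the test.
class StubWeightService:
    def get_weight(self, items):
        return 5.0  # canned answer; how it is called is irrelevant here

class ShippingCalculator:
    RATE_PER_KG = 2.50  # assumed rate, chosen to match the example

    def calculate_shipping(self, items, weight_service):
        return weight_service.get_weight(items) * self.RATE_PER_KG

def test_calculate_shipping():
    calculator = ShippingCalculator()
    cost = calculator.calculate_shipping(["widget"], StubWeightService())
    assert cost == 12.50  # verify the output, not the interaction
```

The stub needs no mock framework at all, and there is nothing to "verify": the return value flowing into the asserted output is the whole contract.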

**Detection rule:**
If a test uses `.assert_called_with()` or `verify()` on a test double that is only used for its return value, suspect AP-22. If every mock in the test has verification calls, this is confirmed.

---

## Root Cause Analysis

| Anti-Pattern | Root Cause | Prevention |
|---|---|---|
| AP-01: The Mockery | Cargo culting ("unit tests must be isolated") | Prefer sociable unit tests; mock only external boundaries |
| AP-02: Testing Implementation | Ignorance of behavior vs. structure distinction | Train on Kent Beck's test desiderata; review tests for structural coupling |
| AP-03: Flaky Tests | Laziness (non-deterministic shortcuts) | Inject clocks, seeds, and controlled dependencies; quarantine flaky tests |
| AP-04: Testing Private Methods | Ignorance (missing abstraction) | Extract complex private methods into separate classes |
| AP-05: Coverage Obsession | Cargo culting (metrics as goals) | Use mutation testing; set coverage floors not ceilings |
| AP-06: Assertion-Free Testing | Laziness (quick coverage credit) | Enforce assertion count > 0 in CI linter rules |
| AP-07: Ice Cream Cone | Ignorance (no unit test culture) | Adopt testing pyramid; require unit tests in PR reviews |
| AP-08: Shared Mutable State | Laziness (reusing existing data) | Fresh state per test; rollback transactions |
| AP-09: The Slow Suite | Premature integration (testing too much via I/O) | Set time budgets; profile slowest tests; parallelize |
| AP-10: Copy-Paste Tests | Laziness (faster than designing abstractions) | Use parameterized tests and test builders |
| AP-11: Testing Too Many Things | Laziness (maximize assertions per setup) | One concept per test; cheap test setup via builders |
| AP-12: Not Testing Edge Cases | Laziness (deadline pressure) | Boundary value checklists; property-based testing |
| AP-13: Commenting Out Tests | Laziness (unblock CI quickly) | CI rule: no @Disabled without issue link; weekly report |
| AP-14: Test Data Coupling | Ignorance (unaware of seed dependency) | Factory patterns; each test creates its own data |
| AP-15: Sleep/Delays | Ignorance (simplest async solution) | Polling with timeout; explicit synchronization |
| AP-16: Tautological Tests | Ignorance (computed expectations feel robust) | Always use hardcoded expected values |
| AP-17: Not Testing Errors | Laziness (error paths are boring) | Error condition checklist per function |
| AP-18: Testing the Framework | Cargo culting ("test everything") | Only test your application's behavior |
| AP-19: Logic in Tests | Copy-paste from AI/SO (complex test patterns) | Parameterized tests; no branching in test methods |
| AP-20: Ignoring Maintenance | Laziness (tests are second-class) | Include test updates in definition of done |
| AP-21: Excessive Setup | Ignorance (no builder/factory patterns) | Object Mother / Builder pattern for test data |
| AP-22: Test Double Misuse | Ignorance (mock/stub/fake confusion) | Learn test double taxonomy; stubs for queries, mocks for commands |

---

## Self-Check Questions

Ask these questions during code review or while writing tests:

1. **Am I testing behavior or implementation?** If I refactored the internals without changing the output, would this test break?
2. **Does this test have at least one meaningful assertion?** Not just "it didn't throw" -- does it verify a specific output or state change?
3. **Could this test fail for the wrong reason?** Is it coupled to test execution order, system time, or specific database state?
4. **Am I mocking things I own?** Mocks should wrap external boundaries, not internal collaborators.
5. **What happens if I change the input to null, empty, zero, or MAX_INT?** Is that tested?
6. **Is the expected value hardcoded or computed?** If computed, am I just re-implementing the production logic in the test?
7. **Would a new team member understand what this test verifies from the test name alone?**
8. **If this test fails, will the failure message tell me exactly what broke?** Or will I need to debug?
9. **Am I testing my code or the framework's code?** Would this test be useful if I swapped frameworks?
10. **How long does this test take?** Would I notice if it was 10x slower?
11. **Does this test create its own data, or does it depend on data created elsewhere?**
12. **Am I using `sleep` or `Thread.sleep` in this test?** Is there a deterministic alternative?
13. **If I delete this test, what bug could slip into production undetected?** If none, the test may not be worth maintaining.
14. **Am I commenting out this test because it is flaky, or because it found a real bug I don't want to fix right now?**
15. **Does my test suite have tests for error paths, not just success paths?**

---

## Code Smell Quick Reference

| If you see... | Suspect... | Verify... |
|---|---|---|
| More mock setup lines than assertion lines | AP-01: The Mockery | Are internal collaborators mocked? Could real implementations be used? |
| `spyOn(obj, '_privateMethod')` | AP-02: Testing Implementation | Does the test still pass after an internal refactoring? |
| `Thread.sleep` or `time.sleep` in a test | AP-03/AP-15: Flaky/Sleep | Can this be replaced with polling or an injectable clock? |
| Reflection to access private members | AP-04: Testing Private Methods | Should this logic be extracted to its own public class? |
| Coverage report at 100% but bugs in production | AP-05: Coverage Obsession | Run mutation testing -- how many mutants survive? |
| Test method with zero `assert`/`expect` calls | AP-06: Assertion-Free | What observable outcome should be verified? |
| E2E test count > unit test count | AP-07: Ice Cream Cone | Can the same bug be caught by a unit test instead? |
| `static` or class-level mutable fields in tests | AP-08: Shared State | Does each test method get a fresh instance? |
| Full suite > 10 minutes | AP-09: Slow Suite | Which 10 tests are slowest? Do they use real I/O? |
| 3+ test methods with >80% identical code | AP-10: Copy-Paste | Can these become a parameterized test? |
| Test with 8+ assertions on different objects | AP-11: Too Many Things | Does the test name describe a single behavior? |
| All test inputs are "valid" or "normal" | AP-12: No Edge Cases | Where are the null, empty, zero, and boundary tests? |
| `@Disabled`, `@Ignore`, `skip(`, `xit(` | AP-13: Commented Out | Is there a linked issue? How old is this skip? |
| `User.find(1)` or `Order.find(42)` in tests | AP-14: Data Coupling | Does the test create this data or depend on seeds? |
| `expected = price * rate` in test assertion | AP-16: Tautological | Is the expected value a hardcoded literal? |
| Test file with zero tests for exceptions/errors | AP-17: No Error Paths | What error handling code exists but is untested? |
| `assertNotNull(injectedService)` as only assertion | AP-18: Testing Framework | Does this test verify application behavior? |
| `if`/`else`/`for` inside a test method | AP-19: Logic in Tests | Can this be a parameterized test instead? |
| Test warnings, deprecations in test output | AP-20: Stale Tests | When was this test last updated vs. production code? |
| 20+ lines of object construction before the action | AP-21: Excessive Setup | Is there a builder or factory pattern available? |
| `mock.verify()` on every mock in the test | AP-22: Test Double Misuse | Is this mock used for its return value or its side effect? |
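
Several of the "If you see..." cues in this table are textual and can feed a CI check. A minimal sketch of such a scanner (the regexes and smell labels are illustrative, not a complete linter; a real tool would parse the AST):

```python
import re

# Map a few greppable smells from the table to anti-pattern IDs.
SMELL_PATTERNS = {
    "AP-03/AP-15 sleep in test": re.compile(r"Thread\.sleep|time\.sleep"),
    "AP-13 skipped test": re.compile(r"@Disabled|@Ignore|\bskip\(|\bxit\("),
    "AP-16 computed expectation": re.compile(r"expected\s*=\s*\w+\s*[*+/-]"),
}

def scan(source: str):
    """Return the smell labels whose pattern matches the test source."""
    return [label for label, pattern in SMELL_PATTERNS.items()
            if pattern.search(source)]

# Usage sketch:
#   for path in glob.glob("tests/**/*.py", recursive=True):
#       hits = scan(open(path).read())
#       if hits:
#           print(path, hits)
```

A hit is a prompt for the "Verify..." column, not an automatic failure: each pattern only flags a suspect for human review.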

---

*Researched: 2026-03-08 | Sources: [Software Testing Anti-patterns (Codepipes)](https://blog.codepipes.com/testing/software-testing-antipatterns.html), [Unit Testing Anti-Patterns Full List (Yegor256)](https://www.yegor256.com/2018/12/11/unit-testing-anti-patterns.html), [Flaky Tests at Google (Google Testing Blog)](https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html), [Unit Testing Principles, Practices, and Patterns (Manning)](https://livebook.manning.com/book/unit-testing/chapter-11), [The Case Against 100% Code Coverage (Codecov)](https://about.codecov.io/blog/the-case-against-100-code-coverage/), [Assertion Free Testing (Martin Fowler)](https://martinfowler.com/bliki/AssertionFreeTesting.html), [Tautological Tests (Randy Coulman)](https://randycoulman.com/blog/2016/12/20/tautological-tests/), [Test Desiderata (Kent Beck)](https://medium.com/@kentbeck_7670/test-desiderata-94150638a4b3), [Mocking is an Anti-Pattern (AmazingCTO)](https://www.amazingcto.com/mocking-is-an-antipattern-how-to-test-without-mocking/), [Mocking isn't evil (Cash App)](https://code.cash.app/mocking), [Test Code Duplication (xUnit Patterns)](http://xunitpatterns.com/Test%20Code%20Duplication.html), [Ice Cream Cone Anti-Pattern (BugBug)](https://bugbug.io/blog/software-testing/ice-cream-cone-anti-pattern/), [Test Flakiness at Spotify (Spotify Engineering)](https://engineering.atspotify.com/2019/11/test-flakiness-methods-for-identifying-and-dealing-with-flaky-tests/), [Revamping Android Testing at Dropbox (Dropbox Tech)](https://dropbox.tech/mobile/revamping-the-android-testing-pipeline-at-dropbox), [Logic in Tests (TestinGil)](https://www.everydayunittesting.com/2016/08/unit-test-anti-patterns-logic-in-tests.html), [Private Methods and Encapsulation (Vladimir Khorikov)](https://khorikov.org/posts/2020-03-26-private-methods-encapsulation/), [Non-determinism in Tests (Enterprise Craftsmanship)](https://enterprisecraftsmanship.com/posts/non-determinism-tests/), [LayerX QA Initiative (Autify)](https://nocode.autify.com/blog/layerxs-qa-initiative-dont-be-tempted-by-the-ice-cream-cone), [Flaky Tests Cost $4.3M (Medium)](https://medium.com/@ran.algawi/its-just-a-flaky-test-the-most-expensive-lie-in-engineering-4b18b0207d96), [An Empirical Analysis of Flaky Tests (U of Illinois)](https://mir.cs.illinois.edu/lamyaa/publications/fse14.pdf)*