@wazir-dev/cli 1.0.0
- package/AGENTS.md +111 -0
- package/CHANGELOG.md +14 -0
- package/CONTRIBUTING.md +101 -0
- package/LICENSE +21 -0
- package/README.md +314 -0
- package/assets/composition-engine.mmd +34 -0
- package/assets/demo-script.sh +17 -0
- package/assets/logo-dark.svg +14 -0
- package/assets/logo.svg +14 -0
- package/assets/pipeline.mmd +39 -0
- package/assets/record-demo.sh +51 -0
- package/docs/README.md +51 -0
- package/docs/adapters/context-mode.md +60 -0
- package/docs/concepts/architecture.md +87 -0
- package/docs/concepts/artifact-model.md +60 -0
- package/docs/concepts/composition-engine.md +36 -0
- package/docs/concepts/indexing-and-recall.md +160 -0
- package/docs/concepts/observability.md +41 -0
- package/docs/concepts/roles-and-workflows.md +59 -0
- package/docs/concepts/terminology-policy.md +27 -0
- package/docs/getting-started/01-installation.md +78 -0
- package/docs/getting-started/02-first-run.md +102 -0
- package/docs/getting-started/03-adding-to-project.md +15 -0
- package/docs/getting-started/04-host-setup.md +15 -0
- package/docs/guides/ci-integration.md +15 -0
- package/docs/guides/creating-skills.md +15 -0
- package/docs/guides/expertise-module-authoring.md +15 -0
- package/docs/guides/hook-development.md +15 -0
- package/docs/guides/memory-and-learnings.md +34 -0
- package/docs/guides/multi-host-export.md +15 -0
- package/docs/guides/troubleshooting.md +101 -0
- package/docs/guides/writing-custom-roles.md +15 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
- package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
- package/docs/readmes/INDEX.md +99 -0
- package/docs/readmes/features/expertise/README.md +171 -0
- package/docs/readmes/features/exports/README.md +222 -0
- package/docs/readmes/features/hooks/README.md +103 -0
- package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
- package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
- package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
- package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
- package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
- package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
- package/docs/readmes/features/hooks/session-start.md +119 -0
- package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
- package/docs/readmes/features/roles/README.md +157 -0
- package/docs/readmes/features/roles/clarifier.md +152 -0
- package/docs/readmes/features/roles/content-author.md +190 -0
- package/docs/readmes/features/roles/designer.md +193 -0
- package/docs/readmes/features/roles/executor.md +184 -0
- package/docs/readmes/features/roles/learner.md +210 -0
- package/docs/readmes/features/roles/planner.md +182 -0
- package/docs/readmes/features/roles/researcher.md +164 -0
- package/docs/readmes/features/roles/reviewer.md +184 -0
- package/docs/readmes/features/roles/specifier.md +162 -0
- package/docs/readmes/features/roles/verifier.md +215 -0
- package/docs/readmes/features/schemas/README.md +178 -0
- package/docs/readmes/features/skills/README.md +63 -0
- package/docs/readmes/features/skills/brainstorming.md +96 -0
- package/docs/readmes/features/skills/debugging.md +148 -0
- package/docs/readmes/features/skills/design.md +120 -0
- package/docs/readmes/features/skills/prepare-next.md +109 -0
- package/docs/readmes/features/skills/run-audit.md +159 -0
- package/docs/readmes/features/skills/scan-project.md +109 -0
- package/docs/readmes/features/skills/self-audit.md +176 -0
- package/docs/readmes/features/skills/tdd.md +137 -0
- package/docs/readmes/features/skills/using-skills.md +92 -0
- package/docs/readmes/features/skills/verification.md +120 -0
- package/docs/readmes/features/skills/writing-plans.md +104 -0
- package/docs/readmes/features/tooling/README.md +320 -0
- package/docs/readmes/features/workflows/README.md +186 -0
- package/docs/readmes/features/workflows/author.md +181 -0
- package/docs/readmes/features/workflows/clarify.md +154 -0
- package/docs/readmes/features/workflows/design-review.md +171 -0
- package/docs/readmes/features/workflows/design.md +169 -0
- package/docs/readmes/features/workflows/discover.md +162 -0
- package/docs/readmes/features/workflows/execute.md +173 -0
- package/docs/readmes/features/workflows/learn.md +167 -0
- package/docs/readmes/features/workflows/plan-review.md +165 -0
- package/docs/readmes/features/workflows/plan.md +170 -0
- package/docs/readmes/features/workflows/prepare-next.md +167 -0
- package/docs/readmes/features/workflows/review.md +169 -0
- package/docs/readmes/features/workflows/run-audit.md +191 -0
- package/docs/readmes/features/workflows/spec-challenge.md +159 -0
- package/docs/readmes/features/workflows/specify.md +160 -0
- package/docs/readmes/features/workflows/verify.md +177 -0
- package/docs/readmes/packages/README.md +50 -0
- package/docs/readmes/packages/ajv.md +117 -0
- package/docs/readmes/packages/context-mode.md +118 -0
- package/docs/readmes/packages/gray-matter.md +116 -0
- package/docs/readmes/packages/node-test.md +137 -0
- package/docs/readmes/packages/yaml.md +112 -0
- package/docs/reference/configuration-reference.md +159 -0
- package/docs/reference/expertise-index.md +52 -0
- package/docs/reference/git-flow.md +43 -0
- package/docs/reference/hooks.md +87 -0
- package/docs/reference/host-exports.md +50 -0
- package/docs/reference/launch-checklist.md +172 -0
- package/docs/reference/marketplace-listings.md +76 -0
- package/docs/reference/release-process.md +34 -0
- package/docs/reference/roles-reference.md +77 -0
- package/docs/reference/skills.md +33 -0
- package/docs/reference/templates.md +29 -0
- package/docs/reference/tooling-cli.md +94 -0
- package/docs/truth-claims.yaml +222 -0
- package/expertise/PROGRESS.md +63 -0
- package/expertise/README.md +18 -0
- package/expertise/antipatterns/PROGRESS.md +56 -0
- package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
- package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
- package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
- package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
- package/expertise/antipatterns/backend/index.md +24 -0
- package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
- package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
- package/expertise/antipatterns/code/async-antipatterns.md +622 -0
- package/expertise/antipatterns/code/code-smells.md +1186 -0
- package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
- package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
- package/expertise/antipatterns/code/index.md +27 -0
- package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
- package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
- package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
- package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
- package/expertise/antipatterns/design/dark-patterns.md +1121 -0
- package/expertise/antipatterns/design/index.md +22 -0
- package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
- package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
- package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
- package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
- package/expertise/antipatterns/frontend/index.md +23 -0
- package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
- package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
- package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
- package/expertise/antipatterns/index.md +31 -0
- package/expertise/antipatterns/performance/index.md +20 -0
- package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
- package/expertise/antipatterns/performance/premature-optimization.md +623 -0
- package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
- package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
- package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
- package/expertise/antipatterns/process/index.md +23 -0
- package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
- package/expertise/antipatterns/security/index.md +20 -0
- package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
- package/expertise/antipatterns/security/security-theater.md +843 -0
- package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
- package/expertise/architecture/PROGRESS.md +70 -0
- package/expertise/architecture/data/caching-architecture.md +671 -0
- package/expertise/architecture/data/data-consistency.md +574 -0
- package/expertise/architecture/data/data-modeling.md +536 -0
- package/expertise/architecture/data/event-streams-and-queues.md +634 -0
- package/expertise/architecture/data/index.md +25 -0
- package/expertise/architecture/data/search-architecture.md +663 -0
- package/expertise/architecture/data/sql-vs-nosql.md +708 -0
- package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
- package/expertise/architecture/decisions/build-vs-buy.md +616 -0
- package/expertise/architecture/decisions/index.md +23 -0
- package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
- package/expertise/architecture/decisions/technology-selection.md +616 -0
- package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
- package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
- package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
- package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
- package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
- package/expertise/architecture/distributed/index.md +25 -0
- package/expertise/architecture/distributed/saga-pattern.md +797 -0
- package/expertise/architecture/foundations/architectural-thinking.md +460 -0
- package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
- package/expertise/architecture/foundations/design-principles-solid.md +649 -0
- package/expertise/architecture/foundations/domain-driven-design.md +719 -0
- package/expertise/architecture/foundations/index.md +25 -0
- package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
- package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
- package/expertise/architecture/index.md +34 -0
- package/expertise/architecture/integration/api-design-graphql.md +638 -0
- package/expertise/architecture/integration/api-design-grpc.md +804 -0
- package/expertise/architecture/integration/api-design-rest.md +892 -0
- package/expertise/architecture/integration/index.md +25 -0
- package/expertise/architecture/integration/third-party-integration.md +795 -0
- package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
- package/expertise/architecture/integration/websockets-realtime.md +791 -0
- package/expertise/architecture/mobile-architecture/index.md +22 -0
- package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
- package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
- package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
- package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
- package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
- package/expertise/architecture/patterns/event-driven.md +797 -0
- package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
- package/expertise/architecture/patterns/index.md +27 -0
- package/expertise/architecture/patterns/layered-architecture.md +736 -0
- package/expertise/architecture/patterns/microservices.md +753 -0
- package/expertise/architecture/patterns/modular-monolith.md +692 -0
- package/expertise/architecture/patterns/monolith.md +626 -0
- package/expertise/architecture/patterns/plugin-architecture.md +735 -0
- package/expertise/architecture/patterns/serverless.md +780 -0
- package/expertise/architecture/scaling/database-scaling.md +615 -0
- package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
- package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
- package/expertise/architecture/scaling/index.md +24 -0
- package/expertise/architecture/scaling/multi-tenancy.md +800 -0
- package/expertise/architecture/scaling/stateless-design.md +787 -0
- package/expertise/backend/embedded-firmware.md +625 -0
- package/expertise/backend/go.md +853 -0
- package/expertise/backend/index.md +24 -0
- package/expertise/backend/java-spring.md +448 -0
- package/expertise/backend/node-typescript.md +625 -0
- package/expertise/backend/python-fastapi.md +724 -0
- package/expertise/backend/rust.md +458 -0
- package/expertise/backend/solidity.md +711 -0
- package/expertise/composition-map.yaml +443 -0
- package/expertise/content/foundations/content-modeling.md +395 -0
- package/expertise/content/foundations/editorial-standards.md +449 -0
- package/expertise/content/foundations/index.md +24 -0
- package/expertise/content/foundations/microcopy.md +455 -0
- package/expertise/content/foundations/terminology-governance.md +509 -0
- package/expertise/content/index.md +34 -0
- package/expertise/content/patterns/accessibility-copy.md +518 -0
- package/expertise/content/patterns/index.md +24 -0
- package/expertise/content/patterns/notification-content.md +433 -0
- package/expertise/content/patterns/sample-content.md +486 -0
- package/expertise/content/patterns/state-copy.md +439 -0
- package/expertise/design/PROGRESS.md +58 -0
- package/expertise/design/disciplines/dark-mode-theming.md +577 -0
- package/expertise/design/disciplines/design-systems.md +595 -0
- package/expertise/design/disciplines/index.md +25 -0
- package/expertise/design/disciplines/information-architecture.md +800 -0
- package/expertise/design/disciplines/interaction-design.md +788 -0
- package/expertise/design/disciplines/responsive-design.md +552 -0
- package/expertise/design/disciplines/usability-testing.md +516 -0
- package/expertise/design/disciplines/user-research.md +792 -0
- package/expertise/design/foundations/accessibility-design.md +796 -0
- package/expertise/design/foundations/color-theory.md +797 -0
- package/expertise/design/foundations/iconography.md +795 -0
- package/expertise/design/foundations/index.md +26 -0
- package/expertise/design/foundations/motion-and-animation.md +653 -0
- package/expertise/design/foundations/rtl-design.md +585 -0
- package/expertise/design/foundations/spacing-and-layout.md +607 -0
- package/expertise/design/foundations/typography.md +800 -0
- package/expertise/design/foundations/visual-hierarchy.md +761 -0
- package/expertise/design/index.md +32 -0
- package/expertise/design/patterns/authentication-flows.md +474 -0
- package/expertise/design/patterns/content-consumption.md +789 -0
- package/expertise/design/patterns/data-display.md +618 -0
- package/expertise/design/patterns/e-commerce.md +1494 -0
- package/expertise/design/patterns/feedback-and-states.md +642 -0
- package/expertise/design/patterns/forms-and-input.md +819 -0
- package/expertise/design/patterns/gamification.md +801 -0
- package/expertise/design/patterns/index.md +31 -0
- package/expertise/design/patterns/microinteractions.md +449 -0
- package/expertise/design/patterns/navigation.md +800 -0
- package/expertise/design/patterns/notifications.md +705 -0
- package/expertise/design/patterns/onboarding.md +700 -0
- package/expertise/design/patterns/search-and-filter.md +601 -0
- package/expertise/design/patterns/settings-and-preferences.md +768 -0
- package/expertise/design/patterns/social-and-community.md +748 -0
- package/expertise/design/platforms/desktop-native.md +612 -0
- package/expertise/design/platforms/index.md +25 -0
- package/expertise/design/platforms/mobile-android.md +825 -0
- package/expertise/design/platforms/mobile-cross-platform.md +983 -0
- package/expertise/design/platforms/mobile-ios.md +699 -0
- package/expertise/design/platforms/tablet.md +794 -0
- package/expertise/design/platforms/web-dashboard.md +790 -0
- package/expertise/design/platforms/web-responsive.md +550 -0
- package/expertise/design/psychology/behavioral-nudges.md +449 -0
- package/expertise/design/psychology/cognitive-load.md +1191 -0
- package/expertise/design/psychology/error-psychology.md +778 -0
- package/expertise/design/psychology/index.md +22 -0
- package/expertise/design/psychology/persuasive-design.md +736 -0
- package/expertise/design/psychology/user-mental-models.md +623 -0
- package/expertise/design/tooling/open-pencil.md +266 -0
- package/expertise/frontend/angular.md +1073 -0
- package/expertise/frontend/desktop-electron.md +546 -0
- package/expertise/frontend/flutter.md +782 -0
- package/expertise/frontend/index.md +27 -0
- package/expertise/frontend/native-android.md +409 -0
- package/expertise/frontend/native-ios.md +490 -0
- package/expertise/frontend/react-native.md +1160 -0
- package/expertise/frontend/react.md +808 -0
- package/expertise/frontend/vue.md +1089 -0
- package/expertise/humanize/domain-rules-code.md +79 -0
- package/expertise/humanize/domain-rules-content.md +67 -0
- package/expertise/humanize/domain-rules-technical-docs.md +56 -0
- package/expertise/humanize/index.md +35 -0
- package/expertise/humanize/self-audit-checklist.md +87 -0
- package/expertise/humanize/sentence-patterns.md +218 -0
- package/expertise/humanize/vocabulary-blacklist.md +105 -0
- package/expertise/i18n/PROGRESS.md +65 -0
- package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
- package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
- package/expertise/i18n/advanced/complex-scripts.md +30 -0
- package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
- package/expertise/i18n/advanced/testing-i18n.md +28 -0
- package/expertise/i18n/content/content-adaptation.md +23 -0
- package/expertise/i18n/content/locale-specific-formatting.md +23 -0
- package/expertise/i18n/content/machine-translation-integration.md +28 -0
- package/expertise/i18n/content/translation-management.md +29 -0
- package/expertise/i18n/foundations/date-time-calendars.md +67 -0
- package/expertise/i18n/foundations/i18n-architecture.md +272 -0
- package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
- package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
- package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
- package/expertise/i18n/foundations/string-externalization.md +236 -0
- package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
- package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
- package/expertise/i18n/index.md +38 -0
- package/expertise/i18n/platform/backend-i18n.md +31 -0
- package/expertise/i18n/platform/flutter-i18n.md +148 -0
- package/expertise/i18n/platform/native-android-i18n.md +36 -0
- package/expertise/i18n/platform/native-ios-i18n.md +36 -0
- package/expertise/i18n/platform/react-i18n.md +103 -0
- package/expertise/i18n/platform/web-css-i18n.md +81 -0
- package/expertise/i18n/rtl/arabic-specific.md +175 -0
- package/expertise/i18n/rtl/hebrew-specific.md +149 -0
- package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
- package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
- package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
- package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
- package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
- package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
- package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
- package/expertise/i18n/rtl/rtl-typography.md +160 -0
- package/expertise/index.md +113 -0
- package/expertise/index.yaml +216 -0
- package/expertise/infrastructure/cloud-aws.md +597 -0
- package/expertise/infrastructure/cloud-gcp.md +599 -0
- package/expertise/infrastructure/cybersecurity.md +816 -0
- package/expertise/infrastructure/database-mongodb.md +447 -0
- package/expertise/infrastructure/database-postgres.md +400 -0
- package/expertise/infrastructure/devops-cicd.md +787 -0
- package/expertise/infrastructure/index.md +27 -0
- package/expertise/performance/PROGRESS.md +50 -0
- package/expertise/performance/backend/api-latency.md +1204 -0
- package/expertise/performance/backend/background-jobs.md +506 -0
- package/expertise/performance/backend/connection-pooling.md +1209 -0
- package/expertise/performance/backend/database-query-optimization.md +515 -0
- package/expertise/performance/backend/index.md +23 -0
- package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
- package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
- package/expertise/performance/foundations/caching-strategies.md +489 -0
- package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
- package/expertise/performance/foundations/index.md +24 -0
- package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
- package/expertise/performance/foundations/memory-management.md +964 -0
- package/expertise/performance/foundations/performance-budgets.md +1314 -0
- package/expertise/performance/index.md +31 -0
- package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
- package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
- package/expertise/performance/infrastructure/index.md +22 -0
- package/expertise/performance/infrastructure/load-balancing.md +1081 -0
- package/expertise/performance/infrastructure/observability.md +1079 -0
- package/expertise/performance/mobile/index.md +23 -0
- package/expertise/performance/mobile/mobile-animations.md +544 -0
- package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
- package/expertise/performance/mobile/mobile-network.md +452 -0
- package/expertise/performance/mobile/mobile-rendering.md +599 -0
- package/expertise/performance/mobile/mobile-startup-time.md +505 -0
- package/expertise/performance/platform-specific/flutter-performance.md +647 -0
- package/expertise/performance/platform-specific/index.md +22 -0
- package/expertise/performance/platform-specific/node-performance.md +1307 -0
- package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
- package/expertise/performance/platform-specific/react-performance.md +1403 -0
- package/expertise/performance/web/bundle-optimization.md +1239 -0
- package/expertise/performance/web/image-and-media.md +636 -0
- package/expertise/performance/web/index.md +24 -0
- package/expertise/performance/web/network-optimization.md +1133 -0
- package/expertise/performance/web/rendering-performance.md +1098 -0
- package/expertise/performance/web/ssr-and-hydration.md +918 -0
- package/expertise/performance/web/web-vitals.md +1374 -0
- package/expertise/quality/accessibility.md +985 -0
- package/expertise/quality/evidence-based-verification.md +499 -0
- package/expertise/quality/index.md +24 -0
- package/expertise/quality/ml-model-audit.md +614 -0
- package/expertise/quality/performance.md +600 -0
- package/expertise/quality/testing-api.md +891 -0
- package/expertise/quality/testing-mobile.md +496 -0
- package/expertise/quality/testing-web.md +849 -0
- package/expertise/security/PROGRESS.md +54 -0
- package/expertise/security/agentic-identity.md +540 -0
- package/expertise/security/compliance-frameworks.md +601 -0
- package/expertise/security/data/data-encryption.md +364 -0
- package/expertise/security/data/data-privacy-gdpr.md +692 -0
- package/expertise/security/data/database-security.md +1171 -0
- package/expertise/security/data/index.md +22 -0
- package/expertise/security/data/pii-handling.md +531 -0
- package/expertise/security/foundations/authentication.md +1041 -0
- package/expertise/security/foundations/authorization.md +603 -0
- package/expertise/security/foundations/cryptography.md +1001 -0
- package/expertise/security/foundations/index.md +25 -0
- package/expertise/security/foundations/owasp-top-10.md +1354 -0
- package/expertise/security/foundations/secrets-management.md +1217 -0
- package/expertise/security/foundations/secure-sdlc.md +700 -0
- package/expertise/security/foundations/supply-chain-security.md +698 -0
- package/expertise/security/index.md +31 -0
- package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
- package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
- package/expertise/security/infrastructure/container-security.md +721 -0
- package/expertise/security/infrastructure/incident-response.md +1295 -0
- package/expertise/security/infrastructure/index.md +24 -0
- package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
- package/expertise/security/infrastructure/network-security.md +1337 -0
- package/expertise/security/mobile/index.md +23 -0
- package/expertise/security/mobile/mobile-android-security.md +1218 -0
- package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
- package/expertise/security/mobile/mobile-data-storage.md +1265 -0
- package/expertise/security/mobile/mobile-ios-security.md +1401 -0
- package/expertise/security/mobile/mobile-network-security.md +1520 -0
- package/expertise/security/smart-contract-security.md +594 -0
- package/expertise/security/testing/index.md +22 -0
- package/expertise/security/testing/penetration-testing.md +1258 -0
- package/expertise/security/testing/security-code-review.md +1765 -0
- package/expertise/security/testing/threat-modeling.md +1074 -0
- package/expertise/security/testing/vulnerability-scanning.md +1062 -0
- package/expertise/security/web/api-security.md +586 -0
- package/expertise/security/web/cors-and-headers.md +433 -0
- package/expertise/security/web/csrf.md +562 -0
- package/expertise/security/web/file-upload.md +1477 -0
- package/expertise/security/web/index.md +25 -0
- package/expertise/security/web/injection.md +1375 -0
- package/expertise/security/web/session-management.md +1101 -0
- package/expertise/security/web/xss.md +1158 -0
- package/exports/README.md +17 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
- package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
- package/exports/hosts/claude/.claude/agents/designer.md +55 -0
- package/exports/hosts/claude/.claude/agents/executor.md +55 -0
- package/exports/hosts/claude/.claude/agents/learner.md +51 -0
- package/exports/hosts/claude/.claude/agents/planner.md +53 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
- package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
- package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
- package/exports/hosts/claude/.claude/commands/author.md +42 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
- package/exports/hosts/claude/.claude/commands/design.md +44 -0
- package/exports/hosts/claude/.claude/commands/discover.md +37 -0
- package/exports/hosts/claude/.claude/commands/execute.md +48 -0
- package/exports/hosts/claude/.claude/commands/learn.md +38 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
- package/exports/hosts/claude/.claude/commands/plan.md +39 -0
- package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
- package/exports/hosts/claude/.claude/commands/review.md +40 -0
- package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
- package/exports/hosts/claude/.claude/commands/specify.md +38 -0
- package/exports/hosts/claude/.claude/commands/verify.md +37 -0
- package/exports/hosts/claude/.claude/settings.json +34 -0
- package/exports/hosts/claude/CLAUDE.md +19 -0
- package/exports/hosts/claude/export.manifest.json +38 -0
- package/exports/hosts/claude/host-package.json +67 -0
- package/exports/hosts/codex/AGENTS.md +19 -0
- package/exports/hosts/codex/export.manifest.json +38 -0
- package/exports/hosts/codex/host-package.json +41 -0
- package/exports/hosts/cursor/.cursor/hooks.json +16 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
- package/exports/hosts/cursor/export.manifest.json +38 -0
- package/exports/hosts/cursor/host-package.json +42 -0
- package/exports/hosts/gemini/GEMINI.md +19 -0
- package/exports/hosts/gemini/export.manifest.json +38 -0
- package/exports/hosts/gemini/host-package.json +41 -0
- package/hooks/README.md +18 -0
- package/hooks/definitions/loop_cap_guard.yaml +21 -0
- package/hooks/definitions/post_tool_capture.yaml +24 -0
- package/hooks/definitions/pre_compact_summary.yaml +19 -0
- package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
- package/hooks/definitions/protected_path_write_guard.yaml +19 -0
- package/hooks/definitions/session_start.yaml +19 -0
- package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
- package/hooks/loop-cap-guard +17 -0
- package/hooks/post-tool-lint +36 -0
- package/hooks/protected-path-write-guard +17 -0
- package/hooks/session-start +41 -0
- package/llms-full.txt +2355 -0
- package/llms.txt +43 -0
- package/package.json +79 -0
- package/roles/README.md +20 -0
- package/roles/clarifier.md +42 -0
- package/roles/content-author.md +63 -0
- package/roles/designer.md +55 -0
- package/roles/executor.md +55 -0
- package/roles/learner.md +51 -0
- package/roles/planner.md +53 -0
- package/roles/researcher.md +43 -0
- package/roles/reviewer.md +54 -0
- package/roles/specifier.md +47 -0
- package/roles/verifier.md +71 -0
- package/schemas/README.md +24 -0
- package/schemas/accepted-learning.schema.json +20 -0
- package/schemas/author-artifact.schema.json +156 -0
- package/schemas/clarification.schema.json +19 -0
- package/schemas/design-artifact.schema.json +80 -0
- package/schemas/docs-claim.schema.json +18 -0
- package/schemas/export-manifest.schema.json +20 -0
- package/schemas/hook.schema.json +67 -0
- package/schemas/host-export-package.schema.json +18 -0
- package/schemas/implementation-plan.schema.json +19 -0
- package/schemas/proposed-learning.schema.json +19 -0
- package/schemas/research.schema.json +18 -0
- package/schemas/review.schema.json +29 -0
- package/schemas/run-manifest.schema.json +18 -0
- package/schemas/spec-challenge.schema.json +18 -0
- package/schemas/spec.schema.json +20 -0
- package/schemas/usage.schema.json +102 -0
- package/schemas/verification-proof.schema.json +29 -0
- package/schemas/wazir-manifest.schema.json +173 -0
- package/skills/README.md +40 -0
- package/skills/brainstorming/SKILL.md +77 -0
- package/skills/debugging/SKILL.md +50 -0
- package/skills/design/SKILL.md +61 -0
- package/skills/dispatching-parallel-agents/SKILL.md +128 -0
- package/skills/executing-plans/SKILL.md +70 -0
- package/skills/finishing-a-development-branch/SKILL.md +169 -0
- package/skills/humanize/SKILL.md +123 -0
- package/skills/init-pipeline/SKILL.md +124 -0
- package/skills/prepare-next/SKILL.md +20 -0
- package/skills/receiving-code-review/SKILL.md +123 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +108 -0
- package/skills/run-audit/SKILL.md +197 -0
- package/skills/scan-project/SKILL.md +41 -0
- package/skills/self-audit/SKILL.md +153 -0
- package/skills/subagent-driven-development/SKILL.md +154 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
- package/skills/subagent-driven-development/implementer-prompt.md +102 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/tdd/SKILL.md +23 -0
- package/skills/using-git-worktrees/SKILL.md +163 -0
- package/skills/using-skills/SKILL.md +95 -0
- package/skills/verification/SKILL.md +22 -0
- package/skills/wazir/SKILL.md +463 -0
- package/skills/writing-plans/SKILL.md +30 -0
- package/skills/writing-skills/SKILL.md +157 -0
- package/skills/writing-skills/anthropic-best-practices.md +122 -0
- package/skills/writing-skills/persuasion-principles.md +50 -0
- package/templates/README.md +20 -0
- package/templates/artifacts/README.md +10 -0
- package/templates/artifacts/accepted-learning.md +19 -0
- package/templates/artifacts/accepted-learning.template.json +12 -0
- package/templates/artifacts/author.md +74 -0
- package/templates/artifacts/author.template.json +19 -0
- package/templates/artifacts/clarification.md +21 -0
- package/templates/artifacts/clarification.template.json +12 -0
- package/templates/artifacts/execute-notes.md +19 -0
- package/templates/artifacts/implementation-plan.md +21 -0
- package/templates/artifacts/implementation-plan.template.json +11 -0
- package/templates/artifacts/learning-proposal.md +19 -0
- package/templates/artifacts/next-run-handoff.md +21 -0
- package/templates/artifacts/plan-review.md +19 -0
- package/templates/artifacts/proposed-learning.template.json +12 -0
- package/templates/artifacts/research.md +21 -0
- package/templates/artifacts/research.template.json +12 -0
- package/templates/artifacts/review-findings.md +19 -0
- package/templates/artifacts/review.template.json +11 -0
- package/templates/artifacts/run-manifest.template.json +8 -0
- package/templates/artifacts/spec-challenge.md +19 -0
- package/templates/artifacts/spec-challenge.template.json +11 -0
- package/templates/artifacts/spec.md +21 -0
- package/templates/artifacts/spec.template.json +12 -0
- package/templates/artifacts/verification-proof.md +19 -0
- package/templates/artifacts/verification-proof.template.json +11 -0
- package/templates/examples/accepted-learning.example.json +14 -0
- package/templates/examples/author.example.json +152 -0
- package/templates/examples/clarification.example.json +15 -0
- package/templates/examples/docs-claim.example.json +8 -0
- package/templates/examples/export-manifest.example.json +7 -0
- package/templates/examples/host-export-package.example.json +11 -0
- package/templates/examples/implementation-plan.example.json +17 -0
- package/templates/examples/proposed-learning.example.json +13 -0
- package/templates/examples/research.example.json +15 -0
- package/templates/examples/research.example.md +6 -0
- package/templates/examples/review.example.json +17 -0
- package/templates/examples/run-manifest.example.json +9 -0
- package/templates/examples/spec-challenge.example.json +14 -0
- package/templates/examples/spec.example.json +21 -0
- package/templates/examples/verification-proof.example.json +21 -0
- package/templates/examples/wazir-manifest.example.yaml +65 -0
- package/templates/task-definition-schema.md +99 -0
- package/tooling/README.md +20 -0
- package/tooling/src/adapters/context-mode.js +50 -0
- package/tooling/src/capture/command.js +376 -0
- package/tooling/src/capture/store.js +99 -0
- package/tooling/src/capture/usage.js +270 -0
- package/tooling/src/checks/branches.js +50 -0
- package/tooling/src/checks/brand-truth.js +110 -0
- package/tooling/src/checks/changelog.js +231 -0
- package/tooling/src/checks/command-registry.js +36 -0
- package/tooling/src/checks/commits.js +102 -0
- package/tooling/src/checks/docs-drift.js +103 -0
- package/tooling/src/checks/docs-truth.js +201 -0
- package/tooling/src/checks/runtime-surface.js +156 -0
- package/tooling/src/cli.js +116 -0
- package/tooling/src/command-options.js +56 -0
- package/tooling/src/commands/validate.js +320 -0
- package/tooling/src/doctor/command.js +91 -0
- package/tooling/src/export/command.js +77 -0
- package/tooling/src/export/compiler.js +498 -0
- package/tooling/src/guards/loop-cap-guard.js +52 -0
- package/tooling/src/guards/protected-path-write-guard.js +67 -0
- package/tooling/src/index/command.js +152 -0
- package/tooling/src/index/storage.js +1061 -0
- package/tooling/src/index/summarizers.js +261 -0
- package/tooling/src/loaders.js +18 -0
- package/tooling/src/project-root.js +22 -0
- package/tooling/src/recall/command.js +225 -0
- package/tooling/src/schema-validator.js +30 -0
- package/tooling/src/state-root.js +40 -0
- package/tooling/src/status/command.js +71 -0
- package/wazir.manifest.yaml +135 -0
- package/workflows/README.md +19 -0
- package/workflows/author.md +42 -0
- package/workflows/clarify.md +38 -0
- package/workflows/design-review.md +46 -0
- package/workflows/design.md +44 -0
- package/workflows/discover.md +37 -0
- package/workflows/execute.md +48 -0
- package/workflows/learn.md +38 -0
- package/workflows/plan-review.md +42 -0
- package/workflows/plan.md +39 -0
- package/workflows/prepare-next.md +37 -0
- package/workflows/review.md +40 -0
- package/workflows/run-audit.md +41 -0
- package/workflows/spec-challenge.md +41 -0
- package/workflows/specify.md +38 -0
- package/workflows/verify.md +37 -0
|
@@ -0,0 +1,850 @@
|
|
|
1
|
+
# Microservices Anti-Patterns
|
|
2
|
+
|
|
3
|
+
> Microservices promise independent deployability, team autonomy, and targeted scaling. In practice, most organizations recreate the monolith's problems in distributed form -- adding network unreliability, operational overhead, and debugging nightmares on top. These anti-patterns are drawn from post-mortems at Uber, DoorDash, Twitter/X, Knight Capital, and dozens of production incidents documented in public engineering blogs.
|
|
4
|
+
|
|
5
|
+
> **Domain:** Backend
|
|
6
|
+
> **Anti-patterns covered:** 20
|
|
7
|
+
> **Highest severity:** Critical
|
|
8
|
+
|
|
9
|
+
## Anti-Patterns
|
|
10
|
+
|
|
11
|
+
### AP-01: Distributed Monolith
|
|
12
|
+
|
|
13
|
+
**Also known as:** Monolith in Disguise, Coupled Microservices, Synchronized Deployment Cluster
|
|
14
|
+
**Frequency:** Very Common
|
|
15
|
+
**Severity:** Critical
|
|
16
|
+
**Detection difficulty:** Moderate
|
|
17
|
+
|
|
18
|
+
**What it looks like:**
|
|
19
|
+
|
|
20
|
+
Services are split into separate repositories and deployed independently in theory, but in practice every release requires coordinated deployment of multiple services. A change in Service A cannot ship without matching changes in Services B and C.
|
|
21
|
+
|
|
22
|
+
```
|
|
23
|
+
# Deployment runbook for "independent" microservices
|
|
24
|
+
1. Deploy user-service v2.3.1
|
|
25
|
+
2. Deploy order-service v1.8.0 (MUST follow user-service within 5 min)
|
|
26
|
+
3. Deploy payment-service v3.1.2 (MUST follow order-service)
|
|
27
|
+
4. Deploy notification-service v2.0.1 (MUST follow payment-service)
|
|
28
|
+
# If any step fails, roll back ALL four services
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
**Why developers do it:**
|
|
32
|
+
|
|
33
|
+
Teams lift-and-shift a monolith by splitting along code module boundaries rather than business domain boundaries. Shared data models, synchronous call chains, and direct database access between services create invisible coupling that only surfaces at deploy time.
|
|
34
|
+
|
|
35
|
+
**What goes wrong:**
|
|
36
|
+
|
|
37
|
+
A 2024 industry survey found that 85% of enterprises claim microservices adoption, yet many end up with a distributed monolith -- all the operational overhead of microservices with none of the independence benefits. Twitter/X experienced this when Elon Musk's team set out to shut down "microservices bloatware," claiming that fewer than 20% of services were actually needed for Twitter to work. Shutting down the rest cascaded into outages affecting tweeting, liking, and direct messaging. In a distributed monolith, teams spend more time on cross-service coordination than on feature work, and deployment windows grow rather than shrink.
|
|
38
|
+
|
|
39
|
+
**The fix:**
|
|
40
|
+
|
|
41
|
+
Redesign service boundaries around business capabilities using Domain-Driven Design bounded contexts. Each service owns its data, exposes a versioned API, and can be deployed without coordinating with other services. Apply the "can I deploy this service on a Friday afternoon without telling anyone?" test.
|
|
42
|
+
|
|
43
|
+
**Detection rule:**
|
|
44
|
+
|
|
45
|
+
Flag any deployment that requires more than one service to be released within the same time window. Track inter-service deployment coupling ratio: deploys requiring coordination / total deploys.
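The coupling ratio above can be approximated from deploy history. The following is a minimal sketch, assuming deploy records are available as (service, timestamp) pairs and treating "another service deployed within the same window" as a proxy for a coordinated release; the `Deploy` type and `coupling_ratio` function are hypothetical names, not part of any tool referenced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Deploy:
    service: str
    at: datetime


def coupling_ratio(deploys: list[Deploy], window: timedelta) -> float:
    """Share of deploys that had a *different* service deployed within `window`."""
    if not deploys:
        return 0.0
    coordinated = 0
    for d in deploys:
        # A deploy counts as coordinated if any other service shipped close to it.
        if any(
            other.service != d.service and abs(other.at - d.at) <= window
            for other in deploys
        ):
            coordinated += 1
    return coordinated / len(deploys)
```

Time proximity is only a heuristic: unrelated services can ship in the same window by coincidence, so a high ratio is a prompt to look at the runbooks rather than proof of coupling.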
|
|
46
|
+
|
|
47
|
+
---
|
|
48
|
+
|
|
49
|
+
### AP-02: Nano-Services
|
|
50
|
+
|
|
51
|
+
**Also known as:** Function-as-a-Service Abuse, Over-Decomposition, Microservice Sprawl
|
|
52
|
+
**Frequency:** Common
|
|
53
|
+
**Severity:** High
|
|
54
|
+
**Detection difficulty:** Moderate
|
|
55
|
+
|
|
56
|
+
**What it looks like:**
|
|
57
|
+
|
|
58
|
+
Every small function or CRUD operation gets its own service, deployment pipeline, repository, and infrastructure. A user signup flow touches 12 services for what is fundamentally one business transaction.
|
|
59
|
+
|
|
60
|
+
```
|
|
61
|
+
user-validation-service/
|
|
62
|
+
user-creation-service/
|
|
63
|
+
email-format-checker-service/
|
|
64
|
+
password-hash-service/
|
|
65
|
+
welcome-email-service/
|
|
66
|
+
user-preferences-default-service/
|
|
67
|
+
audit-log-write-service/
|
|
68
|
+
# Each with its own Dockerfile, CI pipeline, Kubernetes deployment, and on-call rotation
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
**Why developers do it:**
|
|
72
|
+
|
|
73
|
+
Enthusiasm for microservices leads to decomposing along technical function boundaries rather than business boundaries. "Micro" is misinterpreted as "as small as possible." Each new feature becomes a new service because creating one feels productive.
|
|
74
|
+
|
|
75
|
+
**What goes wrong:**
|
|
76
|
+
|
|
77
|
+
Uber scaled to over 4,000 microservices, creating what engineers internally called the "Death Star" -- a dependency graph so tangled that deploying a new service became difficult and tracing an API call was nearly impossible. They began consolidating back into "macroservices." The operational cost per service (CI/CD, monitoring, on-call, dependency upgrades, security patching) exceeded the development cost of the actual business logic. Spotify found that every microservice relied on a minimum of 10-15 other microservices for a single customer request.
|
|
78
|
+
|
|
79
|
+
**The fix:**
|
|
80
|
+
|
|
81
|
+
Apply the "two-pizza team" rule: a service should be owned by one team and represent a cohesive business capability. If a service has fewer than 500 lines of business logic, it is probably a library, not a service. Periodically review the service catalog and merge nano-services that always change together.
|
|
82
|
+
|
|
83
|
+
**Detection rule:**
|
|
84
|
+
|
|
85
|
+
Flag services with fewer than 3 API endpoints OR fewer than 500 lines of domain logic. Alert when total service count exceeds 10x the number of backend engineers.
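As a rough illustration, the rule can be expressed as a filter over a service catalog. This sketch assumes endpoint and line counts have already been collected by some other means; `ServiceStats`, `flag_nano_services`, and `sprawl_alert` are hypothetical names, and the thresholds are simply the ones stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceStats:
    name: str
    endpoints: int
    domain_loc: int  # lines of domain logic, however the team chooses to count them


def flag_nano_services(services, min_endpoints=3, min_loc=500):
    """Return names of services below either threshold."""
    return [
        s.name
        for s in services
        if s.endpoints < min_endpoints or s.domain_loc < min_loc
    ]


def sprawl_alert(service_count, backend_engineers, factor=10):
    """True when the service count exceeds `factor` times the engineer count."""
    return service_count > factor * backend_engineers
```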
|
|
86
|
+
|
|
87
|
+
---
|
|
88
|
+
|
|
89
|
+
### AP-03: Shared Database
|
|
90
|
+
|
|
91
|
+
**Also known as:** Data Monolith, Common Database, Integration Database
|
|
92
|
+
**Frequency:** Very Common
|
|
93
|
+
**Severity:** Critical
|
|
94
|
+
**Detection difficulty:** Easy
|
|
95
|
+
|
|
96
|
+
**What it looks like:**
|
|
97
|
+
|
|
98
|
+
Multiple microservices read from and write to the same database tables. Schema changes require coordinating across all consuming services.
|
|
99
|
+
|
|
100
|
+
```sql
|
|
101
|
+
-- orders_db used by: order-service, billing-service, shipping-service, analytics-service
|
|
102
|
+
-- All four services have direct SELECT/INSERT/UPDATE on the `orders` table
|
|
103
|
+
ALTER TABLE orders ADD COLUMN fulfillment_status VARCHAR(20);
|
|
104
|
+
-- Requires synchronized deployment of all four services
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
**Why developers do it:**
|
|
108
|
+
|
|
109
|
+
Sharing a database is the path of least resistance when extracting services from a monolith. Developers avoid the complexity of data synchronization, event-driven communication, and eventual consistency by keeping the "easy" joins and transactions.
|
|
110
|
+
|
|
111
|
+
**What goes wrong:**
|
|
112
|
+
|
|
113
|
+
The shared database becomes a single point of failure -- if it goes down, every service relying on it fails simultaneously. Schema changes become high-risk coordinated events. A Hacker News discussion describes teams discovering that their "microservices" were coupled through 47 shared tables, making independent deployment impossible. Lock contention between services writing to the same tables causes cascading latency spikes during peak traffic. One team's unoptimized query can starve every other service of database connections.
|
|
114
|
+
|
|
115
|
+
**The fix:**
|
|
116
|
+
|
|
117
|
+
Each service owns its database (database-per-service pattern). Services expose data through APIs, not shared tables. Use event-driven synchronization (Change Data Capture, domain events) for cross-service data needs. Accept eventual consistency where possible; use the Saga pattern for distributed transactions.
|
|
118
|
+
|
|
119
|
+
**Detection rule:**
|
|
120
|
+
|
|
121
|
+
Flag any database with connections from more than one service. Monitor for cross-service table access by tracking database user/role per connection.
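One way to apply this rule is to group observed connections by database and report any database reached by more than one service. The sketch below assumes connection records are available as (service, database) pairs, for example derived from per-service database roles; `shared_databases` is a hypothetical helper.

```python
from collections import defaultdict


def shared_databases(connections):
    """Map each database used by more than one service to its sorted service list.

    `connections` is an iterable of (service, database) pairs.
    """
    services_by_db = defaultdict(set)
    for service, database in connections:
        services_by_db[database].add(service)
    return {
        db: sorted(services)
        for db, services in services_by_db.items()
        if len(services) > 1
    }
```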
|
|
122
|
+
|
|
123
|
+
---
|
|
124
|
+
|
|
125
|
+
### AP-04: Synchronous Call Chains
|
|
126
|
+
|
|
127
|
+
**Also known as:** Temporal Coupling, REST Chain, Synchronous Death Spiral
|
|
128
|
+
**Frequency:** Very Common
|
|
129
|
+
**Severity:** Critical
|
|
130
|
+
**Detection difficulty:** Moderate
|
|
131
|
+
|
|
132
|
+
**What it looks like:**
|
|
133
|
+
|
|
134
|
+
Service A synchronously calls B, which synchronously calls C, which calls D. The entire chain blocks until D responds. Overall latency is the sum of all hops. Failure at any point fails the entire request.
|
|
135
|
+
|
|
136
|
+
```
|
|
137
|
+
Client -> API Gateway -> Order Service -> Inventory Service -> Warehouse Service -> Shipping Service
|
|
138
|
+
(waits) (waits) (waits) (waits)
|
|
139
|
+
Total latency = sum of all four service latencies + network overhead
|
|
140
|
+
Availability = 0.99 * 0.99 * 0.99 * 0.99 = 0.96 (not 0.99)
|
|
141
|
+
```
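The arithmetic in the diagram generalizes to any chain length. A small sketch, assuming failures of the individual hops are independent and that every hop must succeed; the function names and the per-hop network cost parameter are illustrative.

```python
import math


def chain_availability(availabilities):
    """Availability of a serial chain where every hop must succeed."""
    return math.prod(availabilities)


def chain_latency_ms(latencies_ms, per_hop_network_ms=0.0):
    """Total latency of a serial chain: service time plus network cost per hop."""
    return sum(latencies_ms) + per_hop_network_ms * len(latencies_ms)
```

With four hops at 99% each, `chain_availability([0.99] * 4)` is roughly 0.9606, matching the 0.96 figure above.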
|
|
142
|
+
|
|
143
|
+
**Why developers do it:**
|
|
144
|
+
|
|
145
|
+
Synchronous HTTP/REST is familiar, easy to debug locally, and mirrors the request-response model developers learned first. Async messaging feels complex and introduces new failure modes that developers have not encountered before.
|
|
146
|
+
|
|
147
|
+
**What goes wrong:**
|
|
148
|
+
|
|
149
|
+
DoorDash documented a major cascading outage triggered by database maintenance that increased latency on one downstream service. The latency bubbled up through synchronous call chains, causing thread exhaustion and connection pool saturation across upstream services. The error rates then triggered a misconfigured circuit breaker, which halted traffic between unrelated services, amplifying a minor database latency issue into a platform-wide outage. Microsoft's architecture guidance warns that if most internal microservice interaction relies on synchronous HTTP calls, partial failures are amplified into global failures.
|
|
150
|
+
|
|
151
|
+
**The fix:**
|
|
152
|
+
|
|
153
|
+
Use asynchronous messaging (events, message queues) for operations that do not require an immediate response. Where synchronous calls are necessary, set aggressive timeouts, implement bulkheads (separate thread pools per dependency), and never chain more than two synchronous hops.
|
|
154
|
+
|
|
155
|
+
**Detection rule:**
|
|
156
|
+
|
|
157
|
+
Trace request paths and flag any chain exceeding 2 synchronous hops, matching the limit in the fix above. Alert when p99 latency of a service exceeds 2x the sum of its direct dependency p99 latencies.
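Hop depth can be computed from a call graph extracted from traces. This is a minimal sketch, assuming the graph is available as a mapping from each service to the services it calls synchronously; `max_sync_depth` and `chains_over_limit` are hypothetical names, and the hop limit is left as a parameter.

```python
def max_sync_depth(calls, root):
    """Longest chain of synchronous hops reachable from `root`.

    `calls` maps a service name to the services it calls synchronously.
    Services already on the current path are skipped so cycles terminate.
    """
    def depth(node, on_path):
        children = [c for c in calls.get(node, []) if c not in on_path]
        if not children:
            return 0
        return 1 + max(depth(c, on_path | {c}) for c in children)

    return depth(root, {root})


def chains_over_limit(calls, entry_points, max_hops):
    """Entry points whose deepest synchronous chain exceeds `max_hops`."""
    return [e for e in entry_points if max_sync_depth(calls, e) > max_hops]
```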
|
|
158
|
+
|
|
159
|
+
---
|
|
160
|
+
|
|
161
|
+
### AP-05: Missing Circuit Breakers
|
|
162
|
+
|
|
163
|
+
**Also known as:** Unbounded Failure, Missing Bulkheads, No Fail-Fast
|
|
164
|
+
**Frequency:** Common
|
|
165
|
+
**Severity:** Critical
|
|
166
|
+
**Detection difficulty:** Easy
|
|
167
|
+
|
|
168
|
+
**What it looks like:**
|
|
169
|
+
|
|
170
|
+
Services make calls to dependencies without any failure isolation. When a dependency slows down or fails, the calling service exhausts its thread pool waiting for responses and becomes unresponsive itself.
|
|
171
|
+
|
|
172
|
+
```java
|
|
173
|
+
// No circuit breaker, no timeout, no fallback
|
|
174
|
+
public OrderResponse createOrder(OrderRequest request) {
|
|
175
|
+
InventoryResponse inventory = inventoryClient.checkStock(request.getItems()); // blocks forever
|
|
176
|
+
PaymentResponse payment = paymentClient.charge(request.getPayment()); // blocks forever
|
|
177
|
+
ShippingResponse shipping = shippingClient.schedule(request.getAddress()); // blocks forever
|
|
178
|
+
return new OrderResponse(inventory, payment, shipping);
|
|
179
|
+
}
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
**Why developers do it:**
|
|
183
|
+
|
|
184
|
+
Circuit breakers add code complexity and require defining fallback behaviors, timeout thresholds, and error budgets. In the "happy path" development mindset, failures feel like edge cases that can be handled later. Many HTTP clients default to generous timeouts or to no timeout at all.
|
|
185
|
+
|
|
186
|
+
**What goes wrong:**
|
|
187
|
+
|
|
188
|
+
Netflix built Hystrix specifically because missing circuit breakers caused cascading failures across their streaming infrastructure. The DoorDash outage saw a misconfigured circuit breaker (thresholds set too aggressively) stop traffic between unrelated services. Without circuit breakers, a single slow dependency can consume all threads in the calling service, making it unresponsive to all requests -- not just those involving the slow dependency. This is the "bulkhead failure" pattern: one leaking compartment sinks the entire ship.
|
|
189
|
+
|
|
190
|
+
**The fix:**
|
|
191
|
+
|
|
192
|
+
Implement circuit breakers on every external call (Resilience4j, Polly, or built-in service mesh policies). Define fallback responses for degraded operation. Use bulkhead patterns to isolate thread pools per dependency so a slow Inventory Service cannot exhaust threads needed for Payment processing.
|
|
193
|
+
|
|
194
|
+
```java
|
|
195
|
+
@CircuitBreaker(name = "inventory", fallbackMethod = "inventoryFallback")
|
|
196
|
+
@Retry(name = "inventory")
|
|
197
|
+
@TimeLimiter(name = "inventory")
|
|
198
|
+
public CompletableFuture<InventoryResponse> checkStock(List<Item> items) {
|
|
199
|
+
return CompletableFuture.supplyAsync(() -> inventoryClient.checkStock(items));
|
|
200
|
+
}
|
|
201
|
+
```
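To make the mechanism behind the annotations concrete, here is a minimal, library-free sketch of the closed / open / half-open state machine. It is an illustration under simplifying assumptions (single-threaded use, consecutive-failure counting, no timeout enforcement), not how Resilience4j or Polly are implemented; the `CircuitBreaker` class and its parameters are hypothetical.

```python
import time


class CircuitBreaker:
    """Consecutive-failure circuit breaker with a fixed cool-down period."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Open: fail fast without touching the dependency.
                return fallback()
            # Cool-down elapsed (half-open): let one trial call through below.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                # Trip the breaker, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A production breaker would also bound each call with a timeout and isolate dependencies in separate pools, as the fix above describes; this sketch only shows the fail-fast behavior.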
|
|
202
|
+
|
|
203
|
+
**Detection rule:**
|
|
204
|
+
|
|
205
|
+
Audit all outbound HTTP/gRPC calls for circuit breaker configuration. Flag any service-to-service call that lacks either an explicit timeout of 5 seconds or less or a circuit breaker with defined thresholds.
|
|
206
|
+
|
|
207
|
+
---
|
|
208
|
+
|
|
209
|
+
### AP-06: Service Mesh Explosion
|
|
210
|
+
|
|
211
|
+
**Also known as:** Infrastructure Complexity Creep, Sidecar Tax, Mesh Overhead
|
|
212
|
+
**Frequency:** Moderate
|
|
213
|
+
**Severity:** High
|
|
214
|
+
**Detection difficulty:** Hard
|
|
215
|
+
|
|
216
|
+
**What it looks like:**
|
|
217
|
+
|
|
218
|
+
Teams adopt a service mesh (Istio, Linkerd) for simple traffic routing but end up running a complex distributed system just to manage the other distributed system. Every pod gets a sidecar proxy, adding CPU/memory overhead and a new failure domain.
|
|
219
|
+
|
|
220
|
+
**Why developers do it:**
|
|
221
|
+
|
|
222
|
+
Service meshes promise observability, mTLS, traffic splitting, and retries "for free." The initial demo looks magical. Teams adopt the full mesh before they have the operational maturity to manage it, or before they even need most of its features.
|
|
223
|
+
|
|
224
|
+
**What goes wrong:**
|
|
225
|
+
|
|
226
|
+
Platform teams found themselves operating "yet another distributed system," with sidecar injection into every pod and configuration drift across hundreds of mesh policies becoming a full-time job. The resource overhead is non-trivial: each Envoy sidecar consumes 50-100MB of memory, multiplied across hundreds of pods. Debugging becomes harder because every network call now passes through an additional proxy layer. Commentary around Istio's CNCF graduation in 2023 acknowledged that running a mesh "was hard, sidecars added resource overhead, operational complexity ballooned, and for many, service mesh became an idea that looked better in theory than in practice."
|
|
227
|
+
|
|
228
|
+
**The fix:**
|
|
229
|
+
|
|
230
|
+
Start without a mesh. Use application-level libraries for retries and circuit breakers. Adopt a mesh only when you have 50+ services AND a dedicated platform team AND concrete requirements (mTLS everywhere, fine-grained traffic control) that cannot be met with simpler tools. Consider ambient mesh (sidecar-less) architectures.
|
|
231
|
+
|
|
232
|
+
**Detection rule:**
|
|
233
|
+
|
|
234
|
+
Track mesh resource overhead as a percentage of total cluster resources. Alert if sidecar CPU/memory exceeds 15% of workload resource consumption. Monitor mesh configuration object count and flag drift between environments.
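The overhead check reduces to a simple ratio. A sketch, assuming per-pod memory figures for the application container and its sidecar have been exported from cluster metrics; the function names are hypothetical and the 15% budget is the one stated above.

```python
def sidecar_overhead_pct(pods):
    """Sidecar memory as a percentage of workload memory.

    `pods` is an iterable of (app_mem_mb, sidecar_mem_mb) tuples, one per pod.
    """
    pods = list(pods)
    app_total = sum(app for app, _ in pods)
    sidecar_total = sum(sidecar for _, sidecar in pods)
    if app_total <= 0:
        raise ValueError("workload memory must be positive")
    return 100.0 * sidecar_total / app_total


def exceeds_mesh_budget(pods, budget_pct=15.0):
    """True when sidecar overhead is above the budget."""
    return sidecar_overhead_pct(pods) > budget_pct
```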
|
|
235
|
+
|
|
236
|
+
---
|
|
237
|
+
|
|
238
|
+
### AP-07: Ignoring Eventual Consistency
|
|
239
|
+
|
|
240
|
+
**Also known as:** Distributed ACID Assumption, Stale Read Blindness, Consistency Denial
|
|
241
|
+
**Frequency:** Common
|
|
242
|
+
**Severity:** High
|
|
243
|
+
**Detection difficulty:** Hard
|
|
244
|
+
|
|
245
|
+
**What it looks like:**
|
|
246
|
+
|
|
247
|
+
Developers write code assuming that data written by Service A is immediately visible to Service B, treating a distributed system like a single-database application. No compensation logic exists for stale or conflicting reads.
|
|
248
|
+
|
|
249
|
+
```python
|
|
250
|
+
# Order service writes order
|
|
251
|
+
order_service.create_order(order_id=123, status="CONFIRMED")
|
|
252
|
+
|
|
253
|
+
# Analytics service reads immediately -- assumes order is visible
|
|
254
|
+
analytics = analytics_service.get_order(order_id=123)
|
|
255
|
+
# Returns None or stale data because replication has not propagated yet
|
|
256
|
+
assert analytics is not None # FAILS intermittently in production
|
|
257
|
+
```
|
|
258
|
+
|
|
259
|
+
**Why developers do it:**
|
|
260
|
+
|
|
261
|
+
ACID transactions in a single database are familiar and reliable. Developers trained on relational databases expect read-after-write consistency everywhere. The system works in local testing (single machine, no replication lag) and only fails under production load and replication delays.
|
|
262
|
+
|
|
263
|
+
**What goes wrong:**
|
|
264
|
+
|
|
265
|
+
In CQRS implementations where transaction events are persisted in a write datastore and replicated to a read datastore, users can see stale data when they query recently written records. The result is support tickets, double-submissions, and data corruption when users retry operations they believe failed. An online travel platform experienced booking inconsistencies when the flight reservation service updated its local state but the hotel booking service read stale availability data, resulting in confirmed bookings for unavailable rooms.
|
|
266
|
+
|
|
267
|
+
**The fix:**
|
|
268
|
+
|
|
269
|
+
Design for eventual consistency from the start. Use read-your-own-writes patterns where the writing service can serve reads for recently written data. Implement idempotency keys so duplicate submissions are safe. Show appropriate UI states ("Your order is being processed") rather than assuming instant consistency.
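Idempotency keys are the easiest of these to show in code. The sketch below uses an in-memory dictionary as a stand-in for a durable store and assumes the client generates one key per logical submission and reuses it on retry; `OrderStore` and its fields are hypothetical.

```python
class OrderStore:
    """Creates orders at most once per idempotency key."""

    def __init__(self):
        self._by_key = {}
        self._next_id = 1

    def create_order(self, idempotency_key, payload):
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            # Retry of an earlier request: return the original result unchanged.
            return existing
        order = {"order_id": self._next_id, "status": "PROCESSING", **payload}
        self._next_id += 1
        self._by_key[idempotency_key] = order
        return order
```

A user who retries because the read side looked stale gets the same order back instead of a duplicate. A real implementation would persist the key in the same transaction as the order and expire old keys.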
|
|
270
|
+
|
|
271
|
+
**Detection rule:**
|
|
272
|
+
|
|
273
|
+
Flag any cross-service read that occurs within 500ms of a related write without a consistency guarantee. Audit for missing idempotency keys on state-changing operations.
|
|
274
|
+
|
|
275
|
+
---
|
|
276
|
+
|
|
277
|
+
### AP-08: No Saga Pattern
|
|
278
|
+
|
|
279
|
+
**Also known as:** Distributed Transaction Neglect, Missing Compensation Logic, Two-Phase Commit Everywhere
|
|
280
|
+
**Frequency:** Common
|
|
281
|
+
**Severity:** High
|
|
282
|
+
**Detection difficulty:** Moderate
|
|
283
|
+
|
|
284
|
+
**What it looks like:**
|
|
285
|
+
|
|
286
|
+
Multi-service business transactions have no coordinated rollback mechanism. When step 3 of a 5-step process fails, steps 1 and 2 leave permanent side effects (charges, reservations, notifications) with no compensation.
|
|
287
|
+
|
|
288
|
+
```
|
|
289
|
+
1. Order Service: Create order [SUCCESS]
|
|
290
|
+
2. Payment Service: Charge card [SUCCESS - $150 charged]
|
|
291
|
+
3. Inventory Service: Reserve stock [FAILURE - out of stock]
|
|
292
|
+
-- Customer is charged $150 for an order that cannot be fulfilled
|
|
293
|
+
-- No automated refund, no order cancellation
|
|
294
|
+
-- Manual intervention required
|
|
295
|
+
```
|
|
296
|
+
|
|
297
|
+
**Why developers do it:**
|
|
298
|
+
|
|
299
|
+
Distributed transactions (two-phase commit) are complex and perform poorly across services. Teams avoid the Saga pattern because it requires defining compensating actions for every step and handling partial failure states -- significant design effort that feels like over-engineering for "rare" failures.
|
|
300
|
+
|
|
301
|
+
**What goes wrong:**
|
|
302
|
+
|
|
303
|
+
Without sagas, partial failures create data inconsistencies that require manual intervention. Each step must also atomically update its database and publish a message or event; without a mechanism that guarantees this (for example, a transactional outbox), a crash between the two leaves other services out of sync. Teams discover that "rare" failures happen daily at scale. Compensating transactions are harder to retrofit than to design upfront, because the system accumulates inconsistent states that resist automated cleanup. Sagas are not free either: they lack isolation (the "I" in ACID), so concurrent saga executions can create data anomalies unless they are designed with countermeasures such as semantic locks.
|
|
304
|
+
|
|
305
|
+
**The fix:**
|
|
306
|
+
|
|
307
|
+
Implement choreography-based sagas for simple workflows (events trigger compensating actions) or orchestration-based sagas for complex workflows (a coordinator manages the sequence). Every step must have a defined compensating action. Use idempotent operations and unique saga IDs for traceability.
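The orchestration variant can be sketched in a few lines. This is a simplified illustration, assuming every step is a plain callable with a matching compensation and that compensations themselves succeed; `run_saga` is a hypothetical name, and a real coordinator would persist saga state and retry compensations.

```python
def run_saga(steps):
    """Run steps in order; on failure, undo completed steps in reverse.

    `steps` is a list of (name, action, compensation) tuples of callables.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            # Undo everything that already succeeded, most recent first.
            for _, undo in reversed(completed):
                undo()
            return {"status": "compensated", "failed_step": name, "error": str(exc)}
        completed.append((name, compensate))
    return {"status": "completed", "failed_step": None, "error": None}


# Usage mirroring the failed order above: stock reservation fails after the charge.
log = []


def reserve_stock():
    raise RuntimeError("out of stock")


result = run_saga([
    ("create_order", lambda: log.append("order created"), lambda: log.append("order cancelled")),
    ("charge_card", lambda: log.append("card charged"), lambda: log.append("card refunded")),
    ("reserve_stock", reserve_stock, lambda: log.append("stock released")),
])
```

Here the customer is refunded and the order cancelled automatically, and the compensation for the failed step itself is never run because that step did not complete.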
|
|
308
|
+
|
|
309
|
+
**Detection rule:**
|
|
310
|
+
|
|
311
|
+
Identify multi-service write operations that lack compensating transactions. Flag any business flow spanning 3+ services without a saga coordinator or event-driven compensation chain.
|
|
312
|
+
|
|
313
|
+
---
|
|
314
|
+
|
|
315
|
+
### AP-09: Chatty Services
|
|
316
|
+
|
|
317
|
+
**Also known as:** N+1 Service Calls, Fine-Grained APIs, Network Tax
|
|
318
|
+
**Frequency:** Very Common
|
|
319
|
+
**Severity:** High
|
|
320
|
+
**Detection difficulty:** Easy
|
|
321
|
+
|
|
322
|
+
**What it looks like:**
|
|
323
|
+
|
|
324
|
+
A single user-facing request triggers dozens or hundreds of inter-service calls, often in loops that fetch individual records one at a time.
|
|
325
|
+
|
|
326
|
+
```python
|
|
327
|
+
# Rendering a user dashboard
|
|
328
|
+
def get_dashboard(user_id):
|
|
329
|
+
    user = user_service.get_user(user_id)  # Call 1
|
|
330
|
+
    orders = order_service.get_orders(user_id)  # Call 2
|
|
331
|
+
    for order in orders:  # N orders = N calls
|
|
332
|
+
        order.items = catalog_service.get_items(order.id)  # Calls 3..N+2
|
|
333
|
+
        for item in order.items:  # M items per order
|
|
334
|
+
            item.review = review_service.get_review(item.id)  # Calls N+3..N+N*M+2
|
|
335
|
+
    recommendations = rec_service.get_recs(user_id)  # Call N+N*M+3
|
|
336
|
+
    # Total: potentially hundreds of network round-trips for one page load
|
|
337
|
+
```
|
|
338
|
+
|
|
339
|
+
**Why developers do it:**
|
|
340
|
+
|
|
341
|
+
Each service exposes simple, RESTful CRUD endpoints following textbook API design. Developers compose these fine-grained APIs from the calling service without realizing the network multiplication. The pattern works fast in development (localhost, zero latency) and only degrades under real network conditions.
|
|
342
|
+
|
|
343
|
+
**What goes wrong:**
|
|
344
|
+
|
|
345
|
+
Each inter-service call adds 1-10ms of network latency. With 100 sequential calls, a single page load spends 100ms to 1 second on network overhead alone -- before any actual processing. Under load, connection pools saturate, and the calling service becomes a bottleneck. Spotify discovered that each microservice depended on 10-15 others for a single customer request, creating fragile chains where any single dependency slowdown degraded the user experience.
|
|
346
|
+
|
|
347
|
+
**The fix:**
|
|
348
|
+
|
|
349
|
+
Design coarse-grained APIs that return aggregated data for specific use cases (Backend for Frontend pattern). Use batch endpoints instead of individual record fetches. Consider GraphQL or composite APIs that resolve multiple data needs in a single round-trip. Cache frequently accessed cross-service data.
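Applied to the dashboard example, batch endpoints turn a call count that grows with orders and items into a fixed one. The sketch below assumes the catalog and review services expose batch lookups; `get_items_batch`, `get_reviews_batch`, and the `FakeBackend` stand-in are hypothetical and exist only to make the example runnable.

```python
class FakeBackend:
    """In-memory stand-in for the five services; counts round-trips."""

    def __init__(self):
        self.calls = 0

    def get_user(self, user_id):
        self.calls += 1
        return {"id": user_id, "name": "Ada"}

    def get_orders(self, user_id):
        self.calls += 1
        return [{"id": 1}, {"id": 2}]

    def get_items_batch(self, order_ids):
        self.calls += 1
        return {oid: [{"id": oid * 10}, {"id": oid * 10 + 1}] for oid in order_ids}

    def get_reviews_batch(self, item_ids):
        self.calls += 1
        return {iid: f"review for {iid}" for iid in item_ids}

    def get_recs(self, user_id):
        self.calls += 1
        return ["rec-a", "rec-b"]


def get_dashboard(user_id, api):
    user = api.get_user(user_id)                                     # call 1
    orders = api.get_orders(user_id)                                 # call 2
    items_by_order = api.get_items_batch([o["id"] for o in orders])  # call 3
    item_ids = [item["id"] for items in items_by_order.values() for item in items]
    reviews = api.get_reviews_batch(item_ids)                        # call 4
    for order in orders:
        order["items"] = [
            {**item, "review": reviews.get(item["id"])}
            for item in items_by_order.get(order["id"], [])
        ]
    recommendations = api.get_recs(user_id)                          # call 5
    return {"user": user, "orders": orders, "recommendations": recommendations}
```

The page now costs five round-trips regardless of how many orders or items the user has.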
|
|
350
|
+
|
|
351
|
+
**Detection rule:**
|
|
352
|
+
|
|
353
|
+
Trace fan-out per incoming request. Alert when a single user-facing request triggers more than 10 inter-service calls. Monitor the ratio of internal to external API calls.
|
|
354
|
+
|
|
355
|
+
---
|
|
356
|
+
|
|
357
|
+
### AP-10: Shared Libraries Coupling
|
|
358
|
+
|
|
359
|
+
**Also known as:** Library Lock-Step, Common Library Hell, Diamond Dependency
|
|
360
|
+
**Frequency:** Common
|
|
361
|
+
**Severity:** High
|
|
362
|
+
**Detection difficulty:** Moderate
|
|
363
|
+
|
|
364
|
+
**What it looks like:**
|
|
365
|
+
|
|
366
|
+
A shared library (e.g., `company-commons`, `platform-core`) is used by every service for models, utilities, and cross-cutting concerns. Updating the library requires updating and redeploying every consuming service simultaneously.
|
|
367
|
+
|
|
368
|
+
```xml
|
|
369
|
+
<!-- 47 microservices all depend on: -->
|
|
370
|
+
<dependency>
|
|
371
|
+
<groupId>com.company</groupId>
|
|
372
|
+
<artifactId>platform-commons</artifactId>
|
|
373
|
+
<version>3.8.1</version> <!-- Changing this version requires redeploying all 47 services -->
|
|
374
|
+
</dependency>
|
|
375
|
+
```
|
|
376
|
+
|
|
377
|
+
**Why developers do it:**
|
|
378
|
+
|
|
379
|
+
Shared libraries reduce duplication and enforce consistency for logging, authentication, and data models. Extracting common code into a library follows the DRY principle -- seemingly a best practice.
|
|
380
|
+
|
|
381
|
+
**What goes wrong:**
|
|
382
|
+
|
|
383
|
+
The shared library becomes a coupling vector that defeats independent deployability. A security patch in `platform-commons` requires redeploying all 47 services. Version conflicts arise when different services need different versions, creating diamond dependency problems. Teams discovered that their microservices were coupled through shared libraries containing data transfer objects that encoded business logic, making it impossible to evolve one service's domain model without breaking others.
|
|
384
|
+
|
|
385
|
+
**The fix:**
|
|
386
|
+
|
|
387
|
+
Keep shared libraries thin: logging, tracing, and HTTP client configuration only. Never put domain models or business logic in shared libraries. Use API contracts (protobuf, OpenAPI) instead of shared DTOs. Allow services to independently version their dependencies. If two services need the same business logic, that logic might belong in a dedicated service.
|
|
388
|
+
|
|
389
|
+
**Detection rule:**
|
|
390
|
+
|
|
391
|
+
Flag shared libraries that contain domain model classes or business logic methods. Alert when a library update requires more than 3 services to redeploy simultaneously.
|
|
392
|
+
|
|
393
|
+
---
|
|
394
|
+
|
|
395
|
+
### AP-11: No Observability
|
|
396
|
+
|
|
397
|
+
**Also known as:** Blind Microservices, Missing Telemetry, Debug-by-Prayer
|
|
398
|
+
**Frequency:** Common
|
|
399
|
+
**Severity:** Critical
|
|
400
|
+
**Detection difficulty:** Easy
|
|
401
|
+
|
|
402
|
+
**What it looks like:**
|
|
403
|
+
|
|
404
|
+
Services are deployed without structured logging, distributed tracing, or meaningful metrics. When something fails, engineers SSH into individual containers and grep through unstructured log files, unable to trace a request across service boundaries.
|
|
405
|
+
|
|
406
|
+
**Why developers do it:**
|
|
407
|
+
|
|
408
|
+
Observability is treated as a "nice-to-have" that can be added later. Teams focus on features and deploy services before establishing logging standards, trace propagation, or alerting. In a monolith, a single stack trace tells the whole story; developers do not realize this breaks down in distributed systems.
|
|
409
|
+
|
|
410
|
+
**What goes wrong:**
|
|
411
|
+
|
|
412
|
+
After Twitter/X shut down a majority of its microservices in cost-cutting efforts, subsequent outages were extremely difficult to diagnose because the observability infrastructure was also degraded. Engineers could not trace which remaining services were failing or why. A 2024 New Relic Observability Forecast found that the median time to detect outages without proper observability was 5x longer than with it, and the mean time to resolution increased by 3x. Without distributed tracing, debugging a latency spike across 20 services becomes a multi-day investigation.
|
|
413
|
+
|
|
414
|
+
**The fix:**
|
|
415
|
+
|
|
416
|
+
Implement the three pillars from day one: structured logs (JSON with service name, trace ID, span ID), distributed tracing (OpenTelemetry), and RED metrics (Rate, Errors, Duration) for every service. Make observability a prerequisite for production deployment, not an afterthought.
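
The structured-log pillar can be sketched with Python's standard `logging` and `json` modules. This is a minimal illustration, not a prescribed schema: the `JsonFormatter` class, the `build_logger` helper, and the field names are assumptions made for the example, and in a real service the trace and span IDs would normally come from a tracing library such as OpenTelemetry rather than being passed by hand.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object (hypothetical field names)."""

    def __init__(self, service_name):
        super().__init__()
        self.service_name = service_name

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": self.service_name,
            "message": record.getMessage(),
            # trace_id / span_id are attached via `extra=`; None if absent
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def build_logger(service_name, stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(service_name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service_name))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger("order-service").info("Creating order", extra={"trace_id": "t1", "span_id": "s1"})` then emits a single JSON line that a log aggregator can index by `trace_id`.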
|
|
417
|
+
|
|
418
|
+
**Detection rule:**
|
|
419
|
+
|
|
420
|
+
Block deployment of any service lacking: (1) structured log output with trace ID propagation, (2) health check endpoint, (3) RED metrics exported to the monitoring system, (4) at least one alert configured for error rate threshold.
|
|
421
|
+
|
|
422
|
+
---
|
|
423
|
+
|
|
424
|
+
### AP-12: No API Gateway
|
|
425
|
+
|
|
426
|
+
**Also known as:** Direct Client-to-Service, Missing Edge Layer, Exposed Internals
|
|
427
|
+
**Frequency:** Moderate
|
|
428
|
+
**Severity:** High
|
|
429
|
+
**Detection difficulty:** Easy
|
|
430
|
+
|
|
431
|
+
**What it looks like:**
|
|
432
|
+
|
|
433
|
+
Clients (web, mobile) call individual microservices directly, each with its own authentication, rate limiting, and URL. The client must know the network location and API contract of every backend service.
|
|
434
|
+
|
|
435
|
+
```javascript
|
|
436
|
+
// Mobile app making direct calls to internal services
|
|
437
|
+
const user = await fetch('https://user-service.internal:8080/users/123');
|
|
438
|
+
const orders = await fetch('https://order-service.internal:8081/orders?user=123');
|
|
439
|
+
const recs = await fetch('https://rec-service.internal:8082/recommendations/123');
|
|
440
|
+
// Client knows about 3 internal services, their ports, and their APIs
|
|
441
|
+
```
|
|
442
|
+
|
|
443
|
+
**Why developers do it:**
|
|
444
|
+
|
|
445
|
+
An API gateway feels like unnecessary indirection for a small number of services. Direct calls are simpler to implement and debug initially. Teams plan to "add a gateway later" once the architecture stabilizes.
|
|
446
|
+
|
|
447
|
+
**What goes wrong:**
|
|
448
|
+
|
|
449
|
+
Without a gateway, every service must independently implement authentication, rate limiting, CORS headers, and request validation -- leading to inconsistent security policies. Internal service topology leaks to clients, making it impossible to restructure backend services without breaking every client. Mobile apps with hardcoded service URLs require app store releases to change routing. AWS architecture guidance specifically warns that without an API gateway, managing cross-cutting concerns across dozens of services becomes unsustainable.
|
|
450
|
+
|
|
451
|
+
**The fix:**
|
|
452
|
+
|
|
453
|
+
Deploy an API gateway (Kong, AWS API Gateway, Envoy) as the single entry point. Implement authentication, rate limiting, and request routing at the gateway. Use the Backend for Frontend pattern to create client-specific API compositions. Internal services are never exposed directly to clients.
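
To make the routing part of the fix concrete, here is a small sketch of the path-to-upstream mapping a gateway performs. It is not the configuration format of Kong, AWS API Gateway, or Envoy: the `ROUTES` table, the `/api/...` public prefixes, and the `resolve_upstream` function are hypothetical, and the upstream hosts are taken from the earlier client example.

```python
# Hypothetical route table: public path prefix -> internal upstream base URL
ROUTES = {
    "/api/users": "https://user-service.internal:8080/users",
    "/api/orders": "https://order-service.internal:8081/orders",
    "/api/recommendations": "https://rec-service.internal:8082/recommendations",
}


def resolve_upstream(public_path):
    """Map a public gateway path to an internal URL; None if not routed."""
    # Check longer prefixes first so more specific routes win
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if public_path == prefix or public_path.startswith(prefix + "/"):
            return ROUTES[prefix] + public_path[len(prefix):]
    return None
```

Clients only ever see the `/api/...` paths, so an internal service can move or be split by editing the table rather than shipping a new client release.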
|
|
454
|
+
|
|
455
|
+
**Detection rule:**
|
|
456
|
+
|
|
457
|
+
Scan network policies and load balancer configurations for any microservice directly accessible from outside the cluster. Flag services with public-facing ports that are not the designated API gateway.
|
|
458
|
+
|
|
459
|
+
---
|
|
460
|
+
|
|
461
|
+
### AP-13: Service Discovery Failures
|
|
462
|
+
|
|
463
|
+
**Also known as:** Hardcoded Endpoints, Stale DNS, Registry Blindness
|
|
464
|
+
**Frequency:** Moderate
|
|
465
|
+
**Severity:** High
|
|
466
|
+
**Detection difficulty:** Moderate
|
|
467
|
+
|
|
468
|
+
**What it looks like:**
|
|
469
|
+
|
|
470
|
+
Services use hardcoded IP addresses or hostnames to reach dependencies. When instances scale up/down or relocate, calls fail because the caller does not know about the new addresses.
|
|
471
|
+
|
|
472
|
+
```yaml
# application.yml - hardcoded service locations
payment-service:
  url: http://10.0.1.45:8080  # What happens when this instance is replaced?
inventory-service:
  url: http://10.0.1.46:8081  # This IP was valid 3 months ago
```
|
|
479
|
+
|
|
480
|
+
**Why developers do it:**
|
|
481
|
+
|
|
482
|
+
Hardcoded endpoints work in development and staging environments with fixed infrastructure. Service discovery adds complexity (Consul, Eureka, Kubernetes DNS) and a new failure domain. It feels over-engineered for "just a few services."
|
|
483
|
+
|
|
484
|
+
**What goes wrong:**
|
|
485
|
+
|
|
486
|
+
Cloud instances are ephemeral -- IPs change on every deployment, scaling event, or failover. Hardcoded endpoints cause silent failures when the target instance is replaced. Even with service discovery, misconfigured health checks can cause the registry to route traffic to unhealthy instances or deregister healthy ones during temporary network blips. Twitter/X experienced configuration change propagation issues in 2023 where an engineer's change "escalated to other services," causing a multi-hour outage -- a service discovery and configuration propagation failure.
|
|
487
|
+
|
|
488
|
+
**The fix:**
|
|
489
|
+
|
|
490
|
+
Use platform-native service discovery (Kubernetes DNS/Services, Consul, AWS Cloud Map). Never hardcode IPs or hostnames in application configuration. Implement health checks that accurately reflect service readiness (not just liveness). Use client-side load balancing with circuit breakers for resilience against discovery lag.
|
|
491
|
+
|
|
492
|
+
**Detection rule:**
|
|
493
|
+
|
|
494
|
+
Grep configuration files and environment variables for hardcoded IP addresses (regex: `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`). Flag any service configuration that does not use DNS-based or registry-based service resolution.
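
The grep step could look roughly like the sketch below. It uses the regex from the rule with two additions that are assumptions of this example, not part of the rule: word boundaries around the pattern and a range check on each octet. The function name `find_hardcoded_ips` is hypothetical.

```python
import re

# The regex from the rule above, with word boundaries added
IP_PATTERN = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")


def find_hardcoded_ips(config_text):
    """Return (line_number, ip) pairs for every IPv4-looking literal."""
    hits = []
    for line_number, line in enumerate(config_text.splitlines(), start=1):
        for match in IP_PATTERN.finditer(line):
            octets = match.group().split(".")
            # Discard matches whose octets are out of range (e.g. 999.1.1.1)
            if all(int(octet) <= 255 for octet in octets):
                hits.append((line_number, match.group()))
    return hits
```

Four-part version strings such as `1.2.3.4` still match, so results are candidates for review rather than confirmed violations.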
|
|
495
|
+
|
|
496
|
+
---
|
|
497
|
+
|
|
498
|
+
### AP-14: Configuration Drift
|
|
499
|
+
|
|
500
|
+
**Also known as:** Snowflake Services, Environment Mismatch, Config Sprawl
|
|
501
|
+
**Frequency:** Common
|
|
502
|
+
**Severity:** High
|
|
503
|
+
**Detection difficulty:** Hard
|
|
504
|
+
|
|
505
|
+
**What it looks like:**
|
|
506
|
+
|
|
507
|
+
Each service manages its own configuration independently. Over time, timeout values, retry policies, feature flags, and resource limits diverge across environments and services with no central visibility.
|
|
508
|
+
|
|
509
|
+
```
|
|
510
|
+
# Production configs across 30 services:
|
|
511
|
+
order-service: timeout=30s, retries=3, pool_size=50
|
|
512
|
+
payment-service: timeout=60s, retries=5, pool_size=20 # Why different?
|
|
513
|
+
user-service: timeout=10s, retries=1, pool_size=100 # Who set this?
|
|
514
|
+
shipping-service: timeout=30s, retries=3, pool_size=50 # Matches order, coincidence?
|
|
515
|
+
# Staging has completely different values for all of these
|
|
516
|
+
```
|
|
517
|
+
|
|
518
|
+
**Why developers do it:**
|
|
519
|
+
|
|
520
|
+
Each team configures their service based on local knowledge and immediate needs. Without a centralized configuration strategy, defaults are copied from Stack Overflow or inherited from whichever template was used to scaffold the service. Nobody owns the cross-cutting concern of configuration consistency.
|
|
521
|
+
|
|
522
|
+
**What goes wrong:**
|
|
523
|
+
|
|
524
|
+
Inconsistent timeout configurations between caller and callee create subtle failure modes: if Service A's timeout (30s) exceeds Service B's own processing timeout (10s), Service A waits for responses that Service B has already abandoned. Configuration drift between staging and production means bugs are not caught in pre-production testing. Istio platform teams reported that managing configuration drift across hundreds of mesh policies became a full-time job, with misconfigured policies causing unexpected traffic routing and dropped requests.
|
|
525
|
+
|
|
526
|
+
**The fix:**
|
|
527
|
+
|
|
528
|
+
Use a centralized configuration service (Consul KV, Spring Cloud Config, AWS Parameter Store). Define organization-wide defaults for timeouts, retries, and pool sizes. Implement configuration-as-code with version control and automated drift detection. Treat configuration changes like code changes: review, test, deploy progressively.
|
|
529
|
+
|
|
530
|
+
**Detection rule:**
|
|
531
|
+
|
|
532
|
+
Diff configuration values across all services and environments weekly. Alert on timeout mismatches between calling and called services. Flag any configuration value that differs between staging and production without an explicit override justification.
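
A minimal sketch of the staging-versus-production diff, assuming each environment's configuration has already been loaded into a nested dictionary of the form `{service: {key: value}}`. The function name `find_config_drift` and the `approved_overrides` set, which stands in for the "explicit override justification", are hypothetical.

```python
def find_config_drift(staging, production, approved_overrides=frozenset()):
    """Return (service, key, staging_value, production_value) tuples for
    values that differ between environments without an approved override."""
    drift = []
    for service in sorted(set(staging) | set(production)):
        stage_cfg = staging.get(service, {})
        prod_cfg = production.get(service, {})
        for key in sorted(set(stage_cfg) | set(prod_cfg)):
            if (service, key) in approved_overrides:
                continue  # difference is intentional and documented
            if stage_cfg.get(key) != prod_cfg.get(key):
                drift.append(
                    (service, key, stage_cfg.get(key), prod_cfg.get(key))
                )
    return drift
```

A key that exists in only one environment is reported with `None` on the other side, which surfaces settings that were added in production but never tested in staging.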
|
|
533
|
+
|
|
534
|
+
---
|
|
535
|
+
|
|
536
|
+
### AP-15: Missing Correlation IDs
|
|
537
|
+
|
|
538
|
+
**Also known as:** Untraceable Requests, Lost Context, Orphaned Logs
|
|
539
|
+
**Frequency:** Common
|
|
540
|
+
**Severity:** High
|
|
541
|
+
**Detection difficulty:** Easy
|
|
542
|
+
|
|
543
|
+
**What it looks like:**
|
|
544
|
+
|
|
545
|
+
Requests pass through multiple services without a shared identifier. Each service logs independently, making it impossible to reconstruct the full journey of a single user request.
|
|
546
|
+
|
|
547
|
+
```
|
|
548
|
+
# Order service logs
|
|
549
|
+
2026-03-08 10:23:45 INFO Creating order for user 789
|
|
550
|
+
2026-03-08 10:23:45 ERROR Payment failed
|
|
551
|
+
|
|
552
|
+
# Payment service logs
|
|
553
|
+
2026-03-08 10:23:45 ERROR Card declined for amount $150
|
|
554
|
+
2026-03-08 10:23:45 ERROR Connection timeout to fraud service
|
|
555
|
+
|
|
556
|
+
# Fraud service logs
|
|
557
|
+
2026-03-08 10:23:44 WARN High latency on ML model inference
|
|
558
|
+
# Which of these log lines are related? No way to tell.
|
|
559
|
+
```
|
|
560
|
+
|
|
561
|
+
**Why developers do it:**
|
|
562
|
+
|
|
563
|
+
Correlation IDs require every service to extract an ID from incoming requests and propagate it to all outbound calls and log statements. This is a cross-cutting concern that is easy to forget when each team builds their service independently. Skipping them works fine in a monolith, where a single thread ID correlates all log lines.
|
|
564
|
+
|
|
565
|
+
**What goes wrong:**
|
|
566
|
+
|
|
567
|
+
Without correlation IDs, diagnosing production issues across 20+ services becomes a manual timestamp-correlation exercise that takes hours instead of minutes. AWS distributed monitoring guidance specifically recommends correlation IDs as essential for microservices debugging. Teams waste engineering hours per incident manually piecing together log lines. When regulatory audits require tracing a specific user's data flow through the system, missing correlation IDs make compliance impossible.
|
|
568
|
+
|
|
569
|
+
**The fix:**
|
|
570
|
+
|
|
571
|
+
Generate a correlation ID (UUID) at the system edge (API gateway) and propagate it through all service calls via headers (e.g., `X-Correlation-Id` or W3C `traceparent`). Use OpenTelemetry for automatic context propagation. Include the correlation ID in every log line, metric tag, and error report.
|
|
572
|
+
|
|
573
|
+
```python
# Middleware that propagates correlation ID
import contextvars
from uuid import uuid4

# Request-scoped storage so log formatters can read the current ID
correlation_id_var = contextvars.ContextVar('correlation_id', default=None)

def correlation_middleware(request, call_next):
    correlation_id = request.headers.get('X-Correlation-Id', str(uuid4()))
    correlation_id_var.set(correlation_id)
    response = call_next(request)
    response.headers['X-Correlation-Id'] = correlation_id
    return response
```
|
|
582
|
+
|
|
583
|
+
**Detection rule:**
|
|
584
|
+
|
|
585
|
+
Sample 100 requests per hour and verify that the correlation ID appears in logs from every service in the call chain. Alert when any service log line is missing a correlation ID field.
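
The log audit might be sketched as follows, assuming services emit one JSON object per log line and store the ID under a `correlation_id` key. Both the key name and the function name are assumptions of this example.

```python
import json


def lines_missing_correlation_id(log_lines, field="correlation_id"):
    """Return indexes of log lines with no usable correlation ID.

    Assumes one JSON object per line; unparseable lines count as missing.
    """
    missing = []
    for index, line in enumerate(log_lines):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            missing.append(index)
            continue
        if not isinstance(record, dict) or not record.get(field):
            missing.append(index)
    return missing
```

Unstructured lines are treated as failures on purpose, since a line that cannot be parsed cannot be joined to a request either.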
|
|
586
|
+
|
|
587
|
+
---
|
|
588
|
+
|
|
589
|
+
### AP-16: Breaking API Contracts
|
|
590
|
+
|
|
591
|
+
**Also known as:** Unversioned APIs, Silent Contract Changes, Consumer-Blind Evolution
|
|
592
|
+
**Frequency:** Common
|
|
593
|
+
**Severity:** Critical
|
|
594
|
+
**Detection difficulty:** Moderate
|
|
595
|
+
|
|
596
|
+
**What it looks like:**
|
|
597
|
+
|
|
598
|
+
A service changes its API (removes a field, renames an endpoint, changes a type) without versioning and without notifying or testing against consumers. Existing clients break in production.
|
|
599
|
+
|
|
600
|
+
```json
|
|
601
|
+
// v1 response (what consumers expect)
|
|
602
|
+
{ "user_id": 123, "name": "Alice", "email": "alice@example.com" }
|
|
603
|
+
|
|
604
|
+
// v2 response (deployed without notice)
|
|
605
|
+
{ "id": 123, "full_name": "Alice", "email_address": "alice@example.com" }
|
|
606
|
+
// Every consumer parsing "user_id", "name", or "email" breaks silently
|
|
607
|
+
```
|
|
608
|
+
|
|
609
|
+
**Why developers do it:**
|
|
610
|
+
|
|
611
|
+
Teams own their service and feel they can change its API freely. Without contract testing in CI/CD, breaking changes are not detected until production. The producer team tests their service in isolation, confirming it works with the new schema, without testing against actual consumers.
|
|
612
|
+
|
|
613
|
+
**What goes wrong:**
|
|
614
|
+
|
|
615
|
+
A single incompatible API change can block the integration environment and the path to production for all dependent services. The most insidious failures are silent: a renamed field returns `null` instead of throwing an error, causing downstream logic to silently use default values. Consumer-driven contract testing (Pact) was invented specifically because teams at REA Group found that integration-time API breakages were their most common production incident category.
|
|
616
|
+
|
|
617
|
+
**The fix:**
|
|
618
|
+
|
|
619
|
+
Implement consumer-driven contract testing (Pact, Spring Cloud Contract) in CI/CD. Use semantic versioning for APIs. Follow additive-only evolution: new fields are optional, old fields are deprecated but never removed without a migration window. Run two API versions in production simultaneously during transitions.
|
|
620
|
+
|
|
621
|
+
**Detection rule:**
|
|
622
|
+
|
|
623
|
+
Run contract tests on every PR that modifies API response schemas. Flag any field removal or type change in OpenAPI/protobuf definitions. Alert when a service deploys an API version that has no registered consumers.
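
The "flag any field removal or type change" check can be illustrated with a simplified schema diff. This sketch assumes each schema version has been flattened to a `{field_name: type_name}` dictionary; it does not parse OpenAPI or protobuf files, and `breaking_changes` is a hypothetical helper, not part of Pact or any other tool.

```python
def breaking_changes(old_fields, new_fields):
    """Compare {field_name: type_name} maps and list breaking changes."""
    problems = []
    for name, old_type in sorted(old_fields.items()):
        if name not in new_fields:
            problems.append(f"removed field: {name}")
        elif new_fields[name] != old_type:
            problems.append(
                f"type change: {name} ({old_type} -> {new_fields[name]})"
            )
    # Fields present only in new_fields are additive and not reported
    return problems
```

Applied to the v1 and v2 responses shown earlier, every v1 field is reported as removed, because a rename looks like a removal from the consumer's point of view.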
|
|
624
|
+
|
|
625
|
+
---
|
|
626
|
+
|
|
627
|
+
### AP-17: Coordinated Releases
|
|
628
|
+
|
|
629
|
+
**Also known as:** Big-Bang Deployment, Release Train, Synchronized Rollout
|
|
630
|
+
**Frequency:** Common
|
|
631
|
+
**Severity:** Critical
|
|
632
|
+
**Detection difficulty:** Easy
|
|
633
|
+
|
|
634
|
+
**What it looks like:**
|
|
635
|
+
|
|
636
|
+
Despite having separate services, releases are batched into a weekly or biweekly "release train" where all changed services are deployed together. A failure in any service's deployment blocks the entire train.
|
|
637
|
+
|
|
638
|
+
**Why developers do it:**
|
|
639
|
+
|
|
640
|
+
Coordinated releases feel safer -- everything is tested together before going live. They match the release cadence of the monolith the team migrated from. Integration testing is only done against the full release bundle, not individual services.
|
|
641
|
+
|
|
642
|
+
**What goes wrong:**
|
|
643
|
+
|
|
644
|
+
Knight Capital lost $460 million in 45 minutes on August 1, 2012, due in part to a big-bang deployment failure. Engineers manually deployed new code to the eight servers running its SMARS order router but missed one, leaving deprecated "Power Peg" code active on that server. The inconsistency between servers caused the system to execute erroneous trades at a rate that lost roughly $10 million per minute. There were no automated deployment pipelines, no peer review of deployments, and no canary process. The aggressive delivery schedule demanded a synchronized big-bang release rather than a phased rollout. Knight Capital agreed to be acquired later that year. Coordinated releases also mean that one team's delay blocks every other team, destroying the independent deployment advantage of microservices.
|
|
645
|
+
|
|
646
|
+
**The fix:**
|
|
647
|
+
|
|
648
|
+
Deploy services independently with backward-compatible APIs. Use feature flags to decouple deployment from release. Implement canary deployments and progressive rollouts (1% -> 10% -> 50% -> 100%). Each service has its own deployment pipeline triggered by its own CI. If a service cannot be deployed independently, it is coupled (see AP-01).
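
One common way to implement the percentage stages is deterministic bucketing, sketched below. This is an assumption about how a rollout might be gated, not a description of any particular feature-flag product; `in_rollout` and `ROLLOUT_STAGES` are hypothetical names.

```python
import hashlib

ROLLOUT_STAGES = (1, 10, 50, 100)  # percentages from the text above


def in_rollout(user_id, feature, percent):
    """Deterministically decide whether user_id falls inside a percent% rollout."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 99]
    return bucket < percent
```

Because the bucket depends only on the feature name and user ID, a user admitted at 10% stays admitted at 50% and 100%, so each stage widens the audience without reshuffling it.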
|
|
649
|
+
|
|
650
|
+
**Detection rule:**
|
|
651
|
+
|
|
652
|
+
Track the percentage of deployments that are solo vs. batched. Alert when more than 20% of deployments in a sprint involve coordinated multi-service releases. Flag any deployment runbook that mentions another service by name.
|
|
653
|
+
|
|
654
|
+
---
|
|
655
|
+
|
|
656
|
+
### AP-18: Microservices for Small Teams
|
|
657
|
+
|
|
658
|
+
**Also known as:** Premature Decomposition, Resume-Driven Architecture, Complexity Before Scale
|
|
659
|
+
**Frequency:** Common
|
|
660
|
+
**Severity:** High
|
|
661
|
+
**Detection difficulty:** Easy
|
|
662
|
+
|
|
663
|
+
**What it looks like:**
|
|
664
|
+
|
|
665
|
+
A team of 3-5 engineers builds 15+ microservices for an application with hundreds of users. Each engineer is on-call for 5+ services. More time is spent on infrastructure, deployment pipelines, and inter-service debugging than on building features.
|
|
666
|
+
|
|
667
|
+
**Why developers do it:**
|
|
668
|
+
|
|
669
|
+
Microservices are perceived as the "modern" way to build software. Job postings and conference talks glorify microservices architectures. Teams adopt them for career development (resume-driven development) or because they anticipate scale that may never arrive. One documented case saw a CTO spend nine months building a microservices architecture for an application with forty-seven users.
|
|
670
|
+
|
|
671
|
+
**What goes wrong:**
|
|
672
|
+
|
|
673
|
+
The operational cost per service is constant regardless of team size: each service needs CI/CD, monitoring, alerting, dependency management, security updates, and on-call coverage. A 4-person team running 15 services spends 70%+ of their time on operational overhead. GitHub's core application remains largely a Ruby on Rails monolith serving millions of developers daily. Basecamp, Shopify (core), and many successful products run on modular monoliths. Early-stage startups have failed because premature decomposition created more PM-engineering coordination overhead than technical gain.
|
|
674
|
+
|
|
675
|
+
**The fix:**
|
|
676
|
+
|
|
677
|
+
Start with a modular monolith. Define clear module boundaries that can become service boundaries later. Decompose into microservices only when: (1) you have multiple teams needing independent deployment, (2) modules have genuinely different scaling requirements, (3) you have the operational maturity to run distributed systems. Apply Martin Fowler's "monolith first" strategy.
|
|
678
|
+
|
|
679
|
+
**Detection rule:**
|
|
680
|
+
|
|
681
|
+
Flag when the ratio of services to engineers exceeds 3:1. Alert when more than 40% of sprint velocity is consumed by infrastructure and operational tasks rather than feature development.
|
|
682
|
+
|
|
683
|
+
---
|
|
684
|
+
|
|
685
|
+
### AP-19: Not Handling Partial Failures
|
|
686
|
+
|
|
687
|
+
**Also known as:** All-or-Nothing Thinking, Missing Graceful Degradation, Brittle Composition
|
|
688
|
+
**Frequency:** Common
|
|
689
|
+
**Severity:** Critical
|
|
690
|
+
**Detection difficulty:** Moderate
|
|
691
|
+
|
|
692
|
+
**What it looks like:**
|
|
693
|
+
|
|
694
|
+
When any dependency is unavailable, the entire request fails. The system has no concept of degraded operation -- it is either fully functional or fully broken.
|
|
695
|
+
|
|
696
|
+
```python
def get_product_page(product_id):
    product = product_service.get(product_id)         # Required
    reviews = review_service.get(product_id)          # Nice-to-have
    recommendations = rec_service.get(product_id)     # Nice-to-have
    seller_info = seller_service.get(product.seller)  # Nice-to-have
    # If review-service is down, the ENTIRE product page returns 500
    # Even though we could show the product without reviews
    return render(product, reviews, recommendations, seller_info)
```
|
|
706
|
+
|
|
707
|
+
**Why developers do it:**
|
|
708
|
+
|
|
709
|
+
Composing responses from multiple services is implemented as a sequential pipeline where any failure aborts the entire operation. Defining which dependencies are "required" vs. "optional" requires product decisions that engineers defer. Exception handling defaults to "fail and propagate" rather than "degrade and continue."
|
|
710
|
+
|
|
711
|
+
**What goes wrong:**
|
|
712
|
+
|
|
713
|
+
Microsoft's microservices architecture guidance warns that in large applications, partial failures are amplified when most internal interaction relies on synchronous HTTP calls. A minor update to one service can unintentionally break the entire user experience. An online travel platform experienced complete booking page failures when the car rental recommendation service went down, even though users were booking flights -- the page composition treated all data sources as required. Blocking threads waiting for unresponsive services consumes resources until the application runtime runs out of threads and becomes globally unresponsive.
|
|
714
|
+
|
|
715
|
+
**The fix:**
|
|
716
|
+
|
|
717
|
+
Classify each dependency as critical (request fails without it) or optional (degrade gracefully without it). Use `CompletableFuture` / `Promise.allSettled` patterns to fetch optional data in parallel with timeouts. Return partial responses with explicit indicators of what data is missing. Implement fallback responses (cached data, empty states, default values) for optional dependencies.
|
|
718
|
+
|
|
719
|
+
```python
def get_product_page(product_id):
    product = product_service.get(product_id)  # Required - fail if unavailable
    reviews = safe_call(review_service.get, product_id, default=[])
    recommendations = safe_call(rec_service.get, product_id, default=[])
    seller_info = safe_call(seller_service.get, product.seller, default=None)
    return render(product, reviews, recommendations, seller_info)
```
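
The `safe_call` helper used above is not defined in the text. One possible sketch, assuming the intended behavior is "catch the failure, log it, and return the default", is:

```python
import logging

logger = logging.getLogger(__name__)


def safe_call(func, *args, default=None, **kwargs):
    """Call func(*args, **kwargs); on any exception, log it and return default."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # Optional dependency failed: degrade instead of propagating
        logger.warning(
            "optional call %r failed; using default", func, exc_info=True
        )
        return default
```

This version does not enforce a timeout by itself, so the underlying client still needs its own timeout; otherwise a hung dependency blocks the page instead of degrading it.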
|
|
727
|
+
|
|
728
|
+
**Detection rule:**
|
|
729
|
+
|
|
730
|
+
Trace error propagation paths: flag any service that returns 5xx when an optional dependency is unavailable. Audit response handlers for missing try/catch blocks around non-critical service calls.
|
|
731
|
+
|
|
732
|
+
---
|
|
733
|
+
|
|
734
|
+
### AP-20: Event Sourcing Misuse
|
|
735
|
+
|
|
736
|
+
**Also known as:** Event Store Abuse, Premature Event Sourcing, CQRS Everywhere
|
|
737
|
+
**Frequency:** Moderate
|
|
738
|
+
**Severity:** High
|
|
739
|
+
**Detection difficulty:** Hard
|
|
740
|
+
|
|
741
|
+
**What it looks like:**
|
|
742
|
+
|
|
743
|
+
Event sourcing is adopted for every service in the system, including simple CRUD services that do not benefit from an event log. The event store becomes the primary query interface, requiring complex projections for basic lookups.
|
|
744
|
+
|
|
745
|
+
```python
# Simple user profile CRUD -- does NOT need event sourcing
class UserProfileEvents:
    UserCreated = {"user_id": 1, "name": "Alice", "email": "alice@example.com"}
    NameChanged = {"user_id": 1, "old": "Alice", "new": "Alicia"}
    EmailChanged = {"user_id": 1, "old": "alice@example.com", "new": "alicia@example.com"}
    AvatarUpdated = {"user_id": 1, "url": "/avatars/1.png"}
    # To get current user state: replay ALL events for user 1
    # 50,000 events later... just to show a profile page
```
|
|
755
|
+
|
|
756
|
+
**Why developers do it:**
|
|
757
|
+
|
|
758
|
+
Event sourcing is intellectually appealing -- a complete audit trail, ability to replay and reconstruct any past state, natural fit for event-driven architectures. Conference talks showcase event sourcing in domains like financial trading where it is genuinely valuable, and teams generalize it to every domain.
|
|
759
|
+
|
|
760
|
+
**What goes wrong:**
|
|
761
|
+
|
|
762
|
+
The event store becomes difficult to query -- reconstructing current state requires replaying events, which is complex and inefficient for typical read-heavy workloads. Applications must handle eventually consistent data, and the learning curve is steep. Teams that adopt event sourcing without understanding its trade-offs end up with systems where simple queries (get user by ID) require rebuilding state from thousands of events, read models fall out of sync with write models, and debugging involves traversing event chains rather than inspecting rows. Schema evolution for events is significantly harder than for database tables -- events are immutable, so a badly designed event schema persists forever.
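
To show why reads become expensive, here is a sketch of rebuilding a profile by replaying its events. The `(event_type, data)` tuple format and the `replay` function are assumptions made for this example; the event names and payload shapes follow the `UserProfileEvents` example above.

```python
def replay(events):
    """Rebuild current profile state by folding over every event in order."""
    state = {}
    for event_type, data in events:
        if event_type == "UserCreated":
            state = dict(data)
        elif event_type == "NameChanged":
            state["name"] = data["new"]
        elif event_type == "EmailChanged":
            state["email"] = data["new"]
        elif event_type == "AvatarUpdated":
            state["avatar_url"] = data["url"]
    return state
```

Every read walks the full event history for that user unless snapshots or a separate read model are maintained, which is extra machinery a plain row lookup does not need.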
|
|
763
|
+
|
|
764
|
+
**The fix:**
|
|
765
|
+
|
|
766
|
+
Use event sourcing only for domains that genuinely benefit: financial transactions, audit-critical workflows, collaborative editing, or systems requiring temporal queries. For standard CRUD services, use a regular database with change data capture if you need an event stream. Do not conflate event-driven architecture (good for decoupling) with event sourcing (specific persistence pattern).
|
|
767
|
+
|
|
768
|
+
**Detection rule:**
|
|
769
|
+
|
|
770
|
+
Flag services using event sourcing where the read-to-write ratio exceeds 100:1 and no temporal queries are performed. Audit for event stores used as primary query interfaces without materialized views.
|
|
771
|
+
|
|
772
|
+
---
|
|
773
|
+
|
|
774
|
+
## Root Cause Analysis
|
|
775
|
+
|
|
776
|
+
| Root Cause | Anti-Patterns Triggered | Prevalence |
|
|
777
|
+
|---|---|---|
|
|
778
|
+
| Monolith mindset applied to distributed system | AP-01, AP-03, AP-04, AP-07, AP-17 | Very High |
|
|
779
|
+
| Missing operational maturity | AP-05, AP-06, AP-11, AP-14, AP-15 | High |
|
|
780
|
+
| Premature decomposition / hype-driven architecture | AP-02, AP-18, AP-20 | High |
|
|
781
|
+
| DRY principle misapplied across service boundaries | AP-03, AP-10, AP-16 | High |
|
|
782
|
+
| Feature-first delivery without resilience design | AP-05, AP-08, AP-19 | High |
|
|
783
|
+
| Synchronous-first communication default | AP-04, AP-09, AP-17 | High |
|
|
784
|
+
| Lacking domain-driven design expertise | AP-01, AP-02, AP-03 | High |
|
|
785
|
+
| No contract testing in CI/CD | AP-16, AP-17 | Moderate |
|
|
786
|
+
| Inadequate team-to-service ratio | AP-02, AP-06, AP-18 | Moderate |
|
|
787
|
+
| Cargo-culting enterprise architecture | AP-06, AP-12, AP-20 | Moderate |
|
|
788
|
+
|
|
789
|
+
## Self-Check Questions
|
|
790
|
+
|
|
791
|
+
Use these questions during architecture reviews and design sessions to catch anti-patterns before they reach production:
|
|
792
|
+
|
|
793
|
+
1. **Can every service be deployed independently on a Friday afternoon without notifying any other team?** If not, you have a distributed monolith (AP-01) or coordinated releases (AP-17).
|
|
794
|
+
|
|
795
|
+
2. **Does each service own its data exclusively, with no other service reading from or writing to its database?** If not, you have a shared database (AP-03).
|
|
796
|
+
|
|
797
|
+
3. **What happens to the user experience when any single non-critical service is completely unavailable for 5 minutes?** If the answer is "everything breaks," you lack partial failure handling (AP-19) and circuit breakers (AP-05).
|
|
798
|
+
|
|
799
|
+
4. **Can you trace a single user request from entry to completion across all services it touches, using one identifier?** If not, you are missing correlation IDs (AP-15) and observability (AP-11).
|
|
800
|
+
|
|
801
|
+
5. **If a multi-step business transaction fails at step 3 of 5, are steps 1 and 2 automatically compensated?** If not, you need sagas (AP-08).
|
|
802
|
+
|
|
803
|
+
6. **Is your engineer-to-service ratio at most 1:3?** If each engineer owns more than 3 services, you may have nano-services (AP-02) or premature decomposition (AP-18).
|
|
804
|
+
|
|
805
|
+
7. **Does a single user-facing API call trigger fewer than 10 internal service calls?** If not, you have chatty services (AP-09).
|
|
806
|
+
|
|
807
|
+
8. **Can you update a shared library without redeploying more than one service?** If not, you have shared library coupling (AP-10).
|
|
808
|
+
|
|
809
|
+
9. **Do all services have consistent timeout, retry, and circuit breaker configurations, managed from a central source?** If not, you have configuration drift (AP-14).
|
|
810
|
+
|
|
811
|
+
10. **Is every API change tested against consumer contracts before deployment?** If not, you risk breaking contracts (AP-16).
|
|
812
|
+
|
|
813
|
+
11. **Could you explain to a new engineer why each service exists as a separate service and not a module?** If the answer is "that is how it was set up," you may have accidental complexity (AP-02, AP-18).
|
|
814
|
+
|
|
815
|
+
12. **Does your service mesh or infrastructure layer consume less than 15% of cluster resources?** If not, you may have service mesh explosion (AP-06).
|
|
816
|
+
|
|
817
|
+
13. **Are events in your event-sourced services genuinely needed for temporal queries or audit, or are they just a persistence mechanism?** If the latter, you have event sourcing misuse (AP-20).
|
|
818
|
+
|
|
819
|
+
14. **When was the last time you merged two services that always changed together?** If never, you may be accumulating nano-services (AP-02).
|
|
820
|
+
|
|
821
|
+
15. **Do your staging and production environments have identical service configurations (timeouts, feature flags, pool sizes)?** If not, you have configuration drift (AP-14).
|
|
822
|
+
|
|
823
|
+
## Code Smell Quick Reference

| Smell | Likely Anti-Pattern | Severity | First Check |
|---|---|---|---|
| Deployment requires multi-service coordination | AP-01: Distributed Monolith | Critical | Deployment runbooks and release notes |
| Service has < 3 endpoints or < 500 LOC | AP-02: Nano-Services | High | Service catalog and code metrics |
| Multiple services share database tables | AP-03: Shared Database | Critical | Database connection audits |
| Request chain exceeds 3 synchronous hops | AP-04: Synchronous Chains | Critical | Distributed traces |
| No timeout or fallback on outbound calls | AP-05: Missing Circuit Breakers | Critical | Code review for HTTP client configuration |
| Sidecar proxy uses > 15% of pod resources | AP-06: Service Mesh Explosion | High | Kubernetes resource metrics |
| Cross-service read immediately after write | AP-07: Eventual Consistency Ignored | High | Code review and integration tests |
| Multi-service write with no compensation logic | AP-08: No Saga Pattern | High | Transaction flow diagrams |
| Single request triggers > 10 internal calls | AP-09: Chatty Services | High | Distributed trace fan-out metrics |
| Shared library contains domain model classes | AP-10: Shared Libraries Coupling | High | Dependency tree analysis |
| Cannot trace request across service boundaries | AP-11: No Observability | Critical | Log and trace sampling |
| Client directly calls internal service endpoints | AP-12: No API Gateway | High | Network policy and load balancer config |
| Hardcoded IPs in service configuration | AP-13: Service Discovery Failures | High | Configuration file grep |
| Timeout values differ between caller and callee | AP-14: Configuration Drift | High | Cross-service config diff |
| Log lines missing correlation/trace ID | AP-15: Missing Correlation IDs | High | Log sampling audit |
| API field removed without deprecation period | AP-16: Breaking Contracts | Critical | OpenAPI/protobuf diff in CI |
| Release notes mention "deploy X before Y" | AP-17: Coordinated Releases | Critical | Release process audit |
| Service-to-engineer ratio exceeds 3:1 | AP-18: Microservices for Small Teams | High | Service catalog vs. org chart |
| 500 error when optional dependency is down | AP-19: Partial Failure Unhandled | Critical | Chaos engineering / dependency kill tests |
| Event replay needed for simple lookups | AP-20: Event Sourcing Misuse | High | Query pattern analysis |
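Several smells in the table carry numeric thresholds, so they can be screened automatically before a human review. The sketch below encodes only those thresholds (AP-02, AP-04, AP-09, AP-18) as they appear in the table. It assumes the inputs have already been collected from a service catalog and from distributed traces; the `ServiceStats` and `TraceStats` types and all function names are hypothetical. A flag is a prompt to run the "First Check" column, not a diagnosis.

```python
from dataclasses import dataclass


@dataclass
class ServiceStats:
    name: str
    endpoints: int
    lines_of_code: int


@dataclass
class TraceStats:
    route: str
    sync_hops: int       # length of the longest synchronous call chain
    internal_calls: int  # total internal calls triggered by one request


def service_smells(stats: ServiceStats) -> list[str]:
    """Flag AP-02 when a service has < 3 endpoints or < 500 LOC."""
    if stats.endpoints < 3 or stats.lines_of_code < 500:
        return ["AP-02"]
    return []


def trace_smells(stats: TraceStats) -> list[str]:
    """Flag AP-04 (> 3 synchronous hops) and AP-09 (> 10 internal calls)."""
    smells = []
    if stats.sync_hops > 3:
        smells.append("AP-04")
    if stats.internal_calls > 10:
        smells.append("AP-09")
    return smells


def team_smells(service_count: int, engineer_count: int) -> list[str]:
    """Flag AP-18 when the service-to-engineer ratio exceeds 3:1."""
    return ["AP-18"] if service_count > 3 * engineer_count else []


# Hypothetical inputs for illustration only.
print(service_smells(ServiceStats("thumbnailer", endpoints=2, lines_of_code=800)))
print(trace_smells(TraceStats("/checkout", sync_hops=4, internal_calls=12)))
print(team_smells(service_count=10, engineer_count=3))
```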

---
*Researched: 2026-03-08 | Sources: DoorDash Engineering Blog (Aperture/failure mitigation), Netflix Hystrix (GitHub/resilience patterns), Uber Engineering (microservices scaling/Death Star architecture), Twitter/X engineering (microservices shutdown and outages), Knight Capital post-mortem (Henricodolfing, Honeybadger), Microsoft .NET microservices guidance, AWS microservices whitepapers, Chris Richardson microservices.io, Martin Fowler (monolith-first), Spotify engineering (dependency chains), New Relic 2024 Observability Forecast, CNCF Istio graduation analysis, Pact contract testing, vFunction anti-patterns survey, ArXiv microservices anti-patterns taxonomy (Taibi et al.), Hacker News shared database discussions*