@accelerationguy/accel 1.0.0
- package/CLAUDE.md +19 -0
- package/LICENSE +33 -0
- package/README.md +275 -0
- package/bin/install.js +661 -0
- package/docs/getting-started.md +164 -0
- package/docs/module-guide.md +139 -0
- package/modules/drive/LICENSE +21 -0
- package/modules/drive/PAUL-VS-GSD.md +171 -0
- package/modules/drive/README.md +555 -0
- package/modules/drive/assets/terminal.svg +67 -0
- package/modules/drive/bin/install.js +210 -0
- package/modules/drive/integration.js +76 -0
- package/modules/drive/package.json +38 -0
- package/modules/drive/src/commands/add-phase.md +36 -0
- package/modules/drive/src/commands/apply.md +83 -0
- package/modules/drive/src/commands/assumptions.md +37 -0
- package/modules/drive/src/commands/audit.md +57 -0
- package/modules/drive/src/commands/complete-milestone.md +36 -0
- package/modules/drive/src/commands/config.md +175 -0
- package/modules/drive/src/commands/consider-issues.md +41 -0
- package/modules/drive/src/commands/discover.md +48 -0
- package/modules/drive/src/commands/discuss-milestone.md +33 -0
- package/modules/drive/src/commands/discuss.md +34 -0
- package/modules/drive/src/commands/flows.md +73 -0
- package/modules/drive/src/commands/handoff.md +201 -0
- package/modules/drive/src/commands/help.md +525 -0
- package/modules/drive/src/commands/init.md +54 -0
- package/modules/drive/src/commands/map-codebase.md +34 -0
- package/modules/drive/src/commands/milestone.md +34 -0
- package/modules/drive/src/commands/pause.md +44 -0
- package/modules/drive/src/commands/plan-fix.md +216 -0
- package/modules/drive/src/commands/plan.md +36 -0
- package/modules/drive/src/commands/progress.md +138 -0
- package/modules/drive/src/commands/register.md +29 -0
- package/modules/drive/src/commands/remove-phase.md +37 -0
- package/modules/drive/src/commands/research-phase.md +209 -0
- package/modules/drive/src/commands/research.md +47 -0
- package/modules/drive/src/commands/resume.md +49 -0
- package/modules/drive/src/commands/status.md +78 -0
- package/modules/drive/src/commands/unify.md +87 -0
- package/modules/drive/src/commands/verify.md +60 -0
- package/modules/drive/src/references/checkpoints.md +234 -0
- package/modules/drive/src/references/context-management.md +219 -0
- package/modules/drive/src/references/git-strategy.md +206 -0
- package/modules/drive/src/references/loop-phases.md +254 -0
- package/modules/drive/src/references/plan-format.md +263 -0
- package/modules/drive/src/references/quality-principles.md +152 -0
- package/modules/drive/src/references/research-quality-control.md +247 -0
- package/modules/drive/src/references/sonarqube-integration.md +244 -0
- package/modules/drive/src/references/specialized-workflow-integration.md +186 -0
- package/modules/drive/src/references/subagent-criteria.md +179 -0
- package/modules/drive/src/references/tdd.md +219 -0
- package/modules/drive/src/references/work-units.md +161 -0
- package/modules/drive/src/rules/commands.md +108 -0
- package/modules/drive/src/rules/references.md +107 -0
- package/modules/drive/src/rules/style.md +123 -0
- package/modules/drive/src/rules/templates.md +51 -0
- package/modules/drive/src/rules/workflows.md +133 -0
- package/modules/drive/src/templates/CONTEXT.md +88 -0
- package/modules/drive/src/templates/DEBUG.md +164 -0
- package/modules/drive/src/templates/DISCOVERY.md +148 -0
- package/modules/drive/src/templates/HANDOFF.md +77 -0
- package/modules/drive/src/templates/ISSUES.md +93 -0
- package/modules/drive/src/templates/MILESTONES.md +167 -0
- package/modules/drive/src/templates/PLAN.md +328 -0
- package/modules/drive/src/templates/PROJECT.md +219 -0
- package/modules/drive/src/templates/RESEARCH.md +130 -0
- package/modules/drive/src/templates/ROADMAP.md +328 -0
- package/modules/drive/src/templates/SPECIAL-FLOWS.md +70 -0
- package/modules/drive/src/templates/STATE.md +210 -0
- package/modules/drive/src/templates/SUMMARY.md +221 -0
- package/modules/drive/src/templates/UAT-ISSUES.md +139 -0
- package/modules/drive/src/templates/codebase/architecture.md +259 -0
- package/modules/drive/src/templates/codebase/concerns.md +329 -0
- package/modules/drive/src/templates/codebase/conventions.md +311 -0
- package/modules/drive/src/templates/codebase/integrations.md +284 -0
- package/modules/drive/src/templates/codebase/stack.md +190 -0
- package/modules/drive/src/templates/codebase/structure.md +287 -0
- package/modules/drive/src/templates/codebase/testing.md +484 -0
- package/modules/drive/src/templates/config.md +181 -0
- package/modules/drive/src/templates/milestone-archive.md +236 -0
- package/modules/drive/src/templates/milestone-context.md +190 -0
- package/modules/drive/src/templates/paul-json.md +147 -0
- package/modules/drive/src/vector-config/PAUL +26 -0
- package/modules/drive/src/vector-config/PAUL.manifest +11 -0
- package/modules/drive/src/workflows/apply-phase.md +393 -0
- package/modules/drive/src/workflows/audit-plan.md +344 -0
- package/modules/drive/src/workflows/complete-milestone.md +479 -0
- package/modules/drive/src/workflows/configure-special-flows.md +283 -0
- package/modules/drive/src/workflows/consider-issues.md +172 -0
- package/modules/drive/src/workflows/create-milestone.md +268 -0
- package/modules/drive/src/workflows/debug.md +292 -0
- package/modules/drive/src/workflows/discovery.md +187 -0
- package/modules/drive/src/workflows/discuss-milestone.md +245 -0
- package/modules/drive/src/workflows/discuss-phase.md +231 -0
- package/modules/drive/src/workflows/init-project.md +698 -0
- package/modules/drive/src/workflows/map-codebase.md +459 -0
- package/modules/drive/src/workflows/pause-work.md +259 -0
- package/modules/drive/src/workflows/phase-assumptions.md +181 -0
- package/modules/drive/src/workflows/plan-phase.md +385 -0
- package/modules/drive/src/workflows/quality-gate.md +263 -0
- package/modules/drive/src/workflows/register-manifest.md +107 -0
- package/modules/drive/src/workflows/research.md +241 -0
- package/modules/drive/src/workflows/resume-project.md +200 -0
- package/modules/drive/src/workflows/roadmap-management.md +334 -0
- package/modules/drive/src/workflows/transition-phase.md +368 -0
- package/modules/drive/src/workflows/unify-phase.md +290 -0
- package/modules/drive/src/workflows/verify-work.md +241 -0
- package/modules/forge/README.md +281 -0
- package/modules/forge/bin/install.js +200 -0
- package/modules/forge/package.json +32 -0
- package/modules/forge/skillsmith/rules/checklists-rules.md +42 -0
- package/modules/forge/skillsmith/rules/context-rules.md +43 -0
- package/modules/forge/skillsmith/rules/entry-point-rules.md +44 -0
- package/modules/forge/skillsmith/rules/frameworks-rules.md +43 -0
- package/modules/forge/skillsmith/rules/tasks-rules.md +52 -0
- package/modules/forge/skillsmith/rules/templates-rules.md +43 -0
- package/modules/forge/skillsmith/skillsmith.md +82 -0
- package/modules/forge/skillsmith/tasks/audit.md +277 -0
- package/modules/forge/skillsmith/tasks/discover.md +145 -0
- package/modules/forge/skillsmith/tasks/distill.md +276 -0
- package/modules/forge/skillsmith/tasks/scaffold.md +349 -0
- package/modules/forge/specs/checklists.md +193 -0
- package/modules/forge/specs/context.md +223 -0
- package/modules/forge/specs/entry-point.md +320 -0
- package/modules/forge/specs/frameworks.md +228 -0
- package/modules/forge/specs/rules.md +245 -0
- package/modules/forge/specs/tasks.md +344 -0
- package/modules/forge/specs/templates.md +335 -0
- package/modules/forge/terminal.svg +70 -0
- package/modules/ignition/README.md +245 -0
- package/modules/ignition/bin/install.js +184 -0
- package/modules/ignition/checklists/planning-quality.md +55 -0
- package/modules/ignition/data/application/config.md +21 -0
- package/modules/ignition/data/application/guide.md +51 -0
- package/modules/ignition/data/application/skill-loadout.md +11 -0
- package/modules/ignition/data/campaign/config.md +18 -0
- package/modules/ignition/data/campaign/guide.md +36 -0
- package/modules/ignition/data/campaign/skill-loadout.md +10 -0
- package/modules/ignition/data/client/config.md +18 -0
- package/modules/ignition/data/client/guide.md +36 -0
- package/modules/ignition/data/client/skill-loadout.md +11 -0
- package/modules/ignition/data/utility/config.md +18 -0
- package/modules/ignition/data/utility/guide.md +31 -0
- package/modules/ignition/data/utility/skill-loadout.md +8 -0
- package/modules/ignition/data/workflow/config.md +19 -0
- package/modules/ignition/data/workflow/guide.md +41 -0
- package/modules/ignition/data/workflow/skill-loadout.md +10 -0
- package/modules/ignition/integration.js +54 -0
- package/modules/ignition/package.json +35 -0
- package/modules/ignition/seed.md +81 -0
- package/modules/ignition/tasks/add-type.md +164 -0
- package/modules/ignition/tasks/graduate.md +182 -0
- package/modules/ignition/tasks/ideate.md +221 -0
- package/modules/ignition/tasks/launch.md +137 -0
- package/modules/ignition/tasks/status.md +71 -0
- package/modules/ignition/templates/planning-application.md +193 -0
- package/modules/ignition/templates/planning-campaign.md +138 -0
- package/modules/ignition/templates/planning-client.md +149 -0
- package/modules/ignition/templates/planning-utility.md +112 -0
- package/modules/ignition/templates/planning-workflow.md +125 -0
- package/modules/ignition/terminal.svg +74 -0
- package/modules/mission-control/CONTEXT-CONTINUITY-SPEC.md +293 -0
- package/modules/mission-control/CONTEXT-ENGINEERING-GUIDE.md +282 -0
- package/modules/mission-control/README.md +91 -0
- package/modules/mission-control/assets/terminal.svg +80 -0
- package/modules/mission-control/examples/entities.example.json +133 -0
- package/modules/mission-control/examples/projects.example.json +318 -0
- package/modules/mission-control/examples/state.example.json +183 -0
- package/modules/mission-control/examples/vector.example.json +245 -0
- package/modules/mission-control/mission-control/checklists/install-verification.md +46 -0
- package/modules/mission-control/mission-control/frameworks/framework-registry.md +83 -0
- package/modules/mission-control/mission-control/mission-control.md +83 -0
- package/modules/mission-control/mission-control/tasks/insights.md +73 -0
- package/modules/mission-control/mission-control/tasks/install.md +194 -0
- package/modules/mission-control/mission-control/tasks/status.md +125 -0
- package/modules/mission-control/schemas/entities.schema.json +89 -0
- package/modules/mission-control/schemas/projects.schema.json +221 -0
- package/modules/mission-control/schemas/state.schema.json +108 -0
- package/modules/mission-control/schemas/vector.schema.json +200 -0
- package/modules/momentum/README.md +678 -0
- package/modules/momentum/bin/install.js +563 -0
- package/modules/momentum/integration.js +131 -0
- package/modules/momentum/package.json +42 -0
- package/modules/momentum/schemas/entities.schema.json +89 -0
- package/modules/momentum/schemas/projects.schema.json +221 -0
- package/modules/momentum/schemas/state.schema.json +108 -0
- package/modules/momentum/src/commands/audit-claude-md.md +31 -0
- package/modules/momentum/src/commands/audit.md +33 -0
- package/modules/momentum/src/commands/groom.md +35 -0
- package/modules/momentum/src/commands/history.md +27 -0
- package/modules/momentum/src/commands/pulse.md +33 -0
- package/modules/momentum/src/commands/scaffold.md +33 -0
- package/modules/momentum/src/commands/status.md +28 -0
- package/modules/momentum/src/commands/surface-convert.md +35 -0
- package/modules/momentum/src/commands/surface-create.md +34 -0
- package/modules/momentum/src/commands/surface-list.md +27 -0
- package/modules/momentum/src/commands/vector-hygiene.md +33 -0
- package/modules/momentum/src/framework/context/momentum-principles.md +71 -0
- package/modules/momentum/src/framework/frameworks/audit-strategies.md +53 -0
- package/modules/momentum/src/framework/frameworks/satellite-registration.md +44 -0
- package/modules/momentum/src/framework/tasks/audit-claude-md.md +68 -0
- package/modules/momentum/src/framework/tasks/audit.md +64 -0
- package/modules/momentum/src/framework/tasks/groom.md +164 -0
- package/modules/momentum/src/framework/tasks/history.md +34 -0
- package/modules/momentum/src/framework/tasks/pulse.md +83 -0
- package/modules/momentum/src/framework/tasks/scaffold.md +202 -0
- package/modules/momentum/src/framework/tasks/status.md +35 -0
- package/modules/momentum/src/framework/tasks/surface-convert.md +143 -0
- package/modules/momentum/src/framework/tasks/surface-create.md +184 -0
- package/modules/momentum/src/framework/tasks/surface-list.md +42 -0
- package/modules/momentum/src/framework/tasks/vector-hygiene.md +160 -0
- package/modules/momentum/src/framework/templates/workspace-json.md +96 -0
- package/modules/momentum/src/hooks/_template.py +129 -0
- package/modules/momentum/src/hooks/active-hook.py +178 -0
- package/modules/momentum/src/hooks/backlog-hook.py +115 -0
- package/modules/momentum/src/hooks/mission-control-insights.py +169 -0
- package/modules/momentum/src/hooks/momentum-pulse-check.py +351 -0
- package/modules/momentum/src/hooks/operator.py +53 -0
- package/modules/momentum/src/hooks/psmm-injector.py +67 -0
- package/modules/momentum/src/hooks/satellite-detection.py +248 -0
- package/modules/momentum/src/packages/momentum-mcp/index.js +119 -0
- package/modules/momentum/src/packages/momentum-mcp/package.json +10 -0
- package/modules/momentum/src/packages/momentum-mcp/tools/entities.js +226 -0
- package/modules/momentum/src/packages/momentum-mcp/tools/operator.js +106 -0
- package/modules/momentum/src/packages/momentum-mcp/tools/projects.js +322 -0
- package/modules/momentum/src/packages/momentum-mcp/tools/psmm.js +206 -0
- package/modules/momentum/src/packages/momentum-mcp/tools/state.js +199 -0
- package/modules/momentum/src/packages/momentum-mcp/tools/surfaces.js +404 -0
- package/modules/momentum/src/skill/momentum.md +111 -0
- package/modules/momentum/src/tasks/groom.md +164 -0
- package/modules/momentum/src/templates/operator.json +66 -0
- package/modules/momentum/src/templates/workspace.json +111 -0
- package/modules/momentum/terminal.svg +77 -0
- package/modules/radar/README.md +1552 -0
- package/modules/radar/commands/audit.md +233 -0
- package/modules/radar/commands/guardrails.md +194 -0
- package/modules/radar/commands/init.md +207 -0
- package/modules/radar/commands/playbook.md +176 -0
- package/modules/radar/commands/remediate.md +156 -0
- package/modules/radar/commands/report.md +172 -0
- package/modules/radar/commands/resume.md +176 -0
- package/modules/radar/commands/status.md +148 -0
- package/modules/radar/commands/transform.md +205 -0
- package/modules/radar/commands/validate.md +177 -0
- package/modules/radar/docs/ARCHITECTURE.md +336 -0
- package/modules/radar/docs/GETTING-STARTED.md +287 -0
- package/modules/radar/docs/standards/agents.md +197 -0
- package/modules/radar/docs/standards/commands.md +250 -0
- package/modules/radar/docs/standards/domains.md +191 -0
- package/modules/radar/docs/standards/personas.md +211 -0
- package/modules/radar/docs/standards/rules.md +218 -0
- package/modules/radar/docs/standards/runtime.md +445 -0
- package/modules/radar/docs/standards/schemas.md +269 -0
- package/modules/radar/docs/standards/tools.md +273 -0
- package/modules/radar/docs/standards/workflows.md +254 -0
- package/modules/radar/docs/terminal.svg +72 -0
- package/modules/radar/docs/validation/convention-compliance-report.md +183 -0
- package/modules/radar/docs/validation/cross-reference-report.md +195 -0
- package/modules/radar/docs/validation/validation-summary.md +118 -0
- package/modules/radar/docs/validation/version-manifest.yaml +363 -0
- package/modules/radar/install.sh +711 -0
- package/modules/radar/integration.js +53 -0
- package/modules/radar/src/core/agents/architect.md +25 -0
- package/modules/radar/src/core/agents/compliance-officer.md +25 -0
- package/modules/radar/src/core/agents/data-engineer.md +25 -0
- package/modules/radar/src/core/agents/devils-advocate.md +22 -0
- package/modules/radar/src/core/agents/performance-engineer.md +25 -0
- package/modules/radar/src/core/agents/principal-engineer.md +23 -0
- package/modules/radar/src/core/agents/reality-gap-analyst.md +22 -0
- package/modules/radar/src/core/agents/security-engineer.md +25 -0
- package/modules/radar/src/core/agents/senior-app-engineer.md +25 -0
- package/modules/radar/src/core/agents/sre.md +25 -0
- package/modules/radar/src/core/agents/staff-engineer.md +23 -0
- package/modules/radar/src/core/agents/test-engineer.md +25 -0
- package/modules/radar/src/core/personas/architect.md +111 -0
- package/modules/radar/src/core/personas/compliance-officer.md +104 -0
- package/modules/radar/src/core/personas/data-engineer.md +113 -0
- package/modules/radar/src/core/personas/devils-advocate.md +105 -0
- package/modules/radar/src/core/personas/performance-engineer.md +119 -0
- package/modules/radar/src/core/personas/principal-engineer.md +119 -0
- package/modules/radar/src/core/personas/reality-gap-analyst.md +111 -0
- package/modules/radar/src/core/personas/security-engineer.md +108 -0
- package/modules/radar/src/core/personas/senior-app-engineer.md +111 -0
- package/modules/radar/src/core/personas/sre.md +117 -0
- package/modules/radar/src/core/personas/staff-engineer.md +109 -0
- package/modules/radar/src/core/personas/test-engineer.md +109 -0
- package/modules/radar/src/core/workflows/disagreement-resolution.md +183 -0
- package/modules/radar/src/core/workflows/phase-0-context.md +148 -0
- package/modules/radar/src/core/workflows/phase-1-reconnaissance.md +169 -0
- package/modules/radar/src/core/workflows/phase-2-domain-audits.md +190 -0
- package/modules/radar/src/core/workflows/phase-3-cross-domain.md +177 -0
- package/modules/radar/src/core/workflows/phase-4-adversarial-review.md +165 -0
- package/modules/radar/src/core/workflows/phase-5-report.md +189 -0
- package/modules/radar/src/core/workflows/phase-checkpoint.md +222 -0
- package/modules/radar/src/core/workflows/session-handoff.md +152 -0
- package/modules/radar/src/domains/00-context.md +201 -0
- package/modules/radar/src/domains/01-architecture.md +248 -0
- package/modules/radar/src/domains/02-data.md +224 -0
- package/modules/radar/src/domains/03-correctness.md +230 -0
- package/modules/radar/src/domains/04-security.md +274 -0
- package/modules/radar/src/domains/05-compliance.md +228 -0
- package/modules/radar/src/domains/06-testing.md +228 -0
- package/modules/radar/src/domains/07-reliability.md +246 -0
- package/modules/radar/src/domains/08-performance.md +247 -0
- package/modules/radar/src/domains/09-maintainability.md +271 -0
- package/modules/radar/src/domains/10-operability.md +250 -0
- package/modules/radar/src/domains/11-change-risk.md +246 -0
- package/modules/radar/src/domains/12-team-risk.md +221 -0
- package/modules/radar/src/domains/13-risk-synthesis.md +202 -0
- package/modules/radar/src/rules/agent-boundaries.md +78 -0
- package/modules/radar/src/rules/disagreement-protocol.md +76 -0
- package/modules/radar/src/rules/epistemic-hygiene.md +78 -0
- package/modules/radar/src/schemas/confidence.md +185 -0
- package/modules/radar/src/schemas/disagreement.md +238 -0
- package/modules/radar/src/schemas/finding.md +287 -0
- package/modules/radar/src/schemas/report-section.md +150 -0
- package/modules/radar/src/schemas/signal.md +108 -0
- package/modules/radar/src/tools/checkov.md +463 -0
- package/modules/radar/src/tools/git-history.md +581 -0
- package/modules/radar/src/tools/gitleaks.md +447 -0
- package/modules/radar/src/tools/grype.md +611 -0
- package/modules/radar/src/tools/semgrep.md +378 -0
- package/modules/radar/src/tools/sonarqube.md +550 -0
- package/modules/radar/src/tools/syft.md +539 -0
- package/modules/radar/src/tools/trivy.md +439 -0
- package/modules/radar/src/transform/agents/change-risk-modeler.md +24 -0
- package/modules/radar/src/transform/agents/execution-validator.md +24 -0
- package/modules/radar/src/transform/agents/guardrail-generator.md +24 -0
- package/modules/radar/src/transform/agents/pedagogy-agent.md +24 -0
- package/modules/radar/src/transform/agents/remediation-architect.md +24 -0
- package/modules/radar/src/transform/personas/change-risk-modeler.md +95 -0
- package/modules/radar/src/transform/personas/execution-validator.md +95 -0
- package/modules/radar/src/transform/personas/guardrail-generator.md +103 -0
- package/modules/radar/src/transform/personas/pedagogy-agent.md +105 -0
- package/modules/radar/src/transform/personas/remediation-architect.md +95 -0
- package/modules/radar/src/transform/rules/change-risk-rules.md +87 -0
- package/modules/radar/src/transform/rules/safety-governance.md +87 -0
- package/modules/radar/src/transform/schemas/change-risk.md +139 -0
- package/modules/radar/src/transform/schemas/intervention-level.md +207 -0
- package/modules/radar/src/transform/schemas/playbook.md +205 -0
- package/modules/radar/src/transform/schemas/verification-plan.md +134 -0
- package/modules/radar/src/transform/workflows/phase-6-remediation.md +148 -0
- package/modules/radar/src/transform/workflows/phase-7-risk-validation.md +161 -0
- package/modules/radar/src/transform/workflows/phase-8-execution-planning.md +159 -0
- package/modules/radar/src/transform/workflows/transform-safety.md +158 -0
- package/modules/vector/.vector-template/sessions/.gitkeep +0 -0
- package/modules/vector/.vector-template/vector.json +72 -0
- package/modules/vector/AUDIT-CLAUDEMD.md +154 -0
- package/modules/vector/INSTALL.md +185 -0
- package/modules/vector/LICENSE +21 -0
- package/modules/vector/README.md +409 -0
- package/modules/vector/VECTOR-BLOCK.md +57 -0
- package/modules/vector/assets/terminal.svg +68 -0
- package/modules/vector/bin/install.js +455 -0
- package/modules/vector/bin/migrate-v1-to-v2.sh +492 -0
- package/modules/vector/commands/help.md +46 -0
- package/modules/vector/hooks/vector-hook.py +775 -0
- package/modules/vector/mcp/index.js +118 -0
- package/modules/vector/mcp/package.json +10 -0
- package/modules/vector/mcp/tools/decisions.js +269 -0
- package/modules/vector/mcp/tools/domains.js +361 -0
- package/modules/vector/mcp/tools/staging.js +252 -0
- package/modules/vector/mcp/tools/vector-json.js +647 -0
- package/modules/vector/package.json +38 -0
- package/modules/vector/schemas/vector.schema.json +237 -0
- package/package.json +39 -0
- package/shared/branding/branding.js +70 -0
- package/shared/config/defaults.json +59 -0
- package/shared/events/README.md +175 -0
- package/shared/events/event-bus.js +134 -0
- package/shared/events/event_bus.py +255 -0
- package/shared/events/integrations.js +161 -0
- package/shared/events/schemas/audit-complete.schema.json +21 -0
- package/shared/events/schemas/phase-progress.schema.json +23 -0
- package/shared/events/schemas/plan-created.schema.json +21 -0
@@ -0,0 +1,228 @@
---
id: domain-06
number: "06"
name: Testing Strategy & Verification
owner_agents: [test-engineer]
---

## Overview

This domain addresses test pyramid shape, determinism, coverage gaps, mutation resistance, contract testing, and failure path coverage. Senior engineers recognize that tests are the primary defense against regressions and the most reliable form of system documentation. The central question is: "Would these tests catch the most expensive failures?" This domain does NOT cover code correctness implementation details (domain 03), security vulnerability scanning (domain 04), or deployment verification and operational monitoring (domain 10).

## Audit Questions

- Does the test suite follow the test pyramid principle (many unit tests, fewer integration tests, minimal E2E tests)?
- Are tests deterministic and free from flakiness, or do they exhibit intermittent failures due to timing, concurrency, or external dependencies?
- What critical user flows, edge cases, or error paths are NOT covered by existing tests?
- Does the codebase include mutation testing or property-based testing to verify test effectiveness beyond line coverage?
- Are contract tests in place for external API integrations, microservice boundaries, or third-party dependencies?
- Do tests serve as executable documentation, clearly communicating expected behavior to future maintainers?
- Are failure paths (error handling, timeouts, invalid inputs) tested as rigorously as success paths?
- How is test data managed: does it rely on brittle fixtures, or are factories and builders used for flexible test setup?
- Are tests isolated from each other, or do they share state that causes order-dependent failures?
- Can the test suite run in parallel without race conditions or resource contention?
- Are slow tests (>1s for unit, >10s for integration) identified and optimized or moved to appropriate test layers?
- Does the CI pipeline fail fast on test failures with clear error messages and reproducible failure artifacts?

## Failure Patterns

### Inverted Test Pyramid
- **Description:** Test suite contains disproportionately more end-to-end or integration tests than unit tests, leading to slow feedback loops, brittle tests, and high maintenance costs. The test pyramid (Fowler) recommends a broad base of fast unit tests with fewer higher-level tests.
- **Indicators:**
  - Test execution time exceeds 10 minutes for a full run, with most time spent in browser automation or API integration tests
  - Test count ratio shows more E2E tests than unit tests (e.g., 200 E2E, 50 unit tests)
  - Test files primarily use frameworks like Selenium, Cypress, or Playwright rather than unit testing frameworks (Jest, pytest, JUnit)
  - Business logic embedded in controllers, views, or API handlers is untested in isolation
  - Developers skip running full test suite locally due to time constraints, relying solely on CI
- **Severity Tendency:** high
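
One way to address the "business logic embedded in handlers" indicator is to pull the rule into a pure function that a unit test can call directly. The sketch below is illustrative only: `apply_discount`, `checkout_handler`, and the discount rule itself are hypothetical names and values, not part of this package.

```python
def apply_discount(subtotal_cents: int, loyalty_years: int) -> int:
    """Hypothetical pricing rule, kept free of HTTP and database concerns."""
    percent = min(loyalty_years * 2, 10)  # assumed rule: 2% per year, capped at 10%
    return subtotal_cents - (subtotal_cents * percent) // 100


def checkout_handler(request: dict) -> dict:
    """Hypothetical thin handler: parse input, delegate, shape the response."""
    total = apply_discount(
        int(request["subtotal_cents"]), int(request["loyalty_years"])
    )
    return {"status": 200, "body": {"total_cents": total}}


def test_discount_is_capped():
    # Unit test: exercises the rule without starting a server or a browser.
    assert apply_discount(10_000, 9) == 9_000
```

With the rule isolated, most cases (zero years, the cap, rounding) can live in fast unit tests, and a single higher-level test can confirm that the handler wires it up.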

### Flaky Tests
- **Description:** Tests exhibit intermittent failures unrelated to code changes, caused by timing issues, race conditions, external service dependencies, or shared state. Flaky tests erode trust and force developers to re-run CI pipelines repeatedly.
- **Indicators:**
  - Test suite failures occur in CI but pass when re-run without code changes
  - Tests use `sleep()`, `setTimeout()`, or fixed delays instead of explicit waits for asynchronous operations
  - Tests depend on external services (third-party APIs, staging databases) without mocking or stubbing
  - Test execution order affects outcomes: tests pass individually but fail when run as a suite
  - Tests rely on system clock, random number generators, or filesystem state without deterministic seeding
- **Severity Tendency:** medium
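
The timing, clock, and randomness indicators above usually share one remedy: make the source of non-determinism an explicit input. The following is a minimal sketch using only the Python standard library; `wait_until`, `is_expired`, and `pick_sample` are hypothetical helpers written for illustration.

```python
import random
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 2.0,
               interval: float = 0.01) -> bool:
    """Explicit wait: poll until `predicate` holds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one last check at the deadline


def is_expired(issued_at: float, ttl: float,
               now: Callable[[], float] = time.time) -> bool:
    """Hypothetical code under test; the clock is injectable."""
    return now() - issued_at >= ttl


def pick_sample(items: list, k: int, rng: random.Random) -> list:
    """Hypothetical code under test; takes an explicit RNG, not the global one."""
    return rng.sample(items, k)


def test_expiry_with_frozen_clock():
    # The test controls "now", so the result never depends on wall-clock time.
    assert is_expired(issued_at=100.0, ttl=30.0, now=lambda: 130.0)
    assert not is_expired(issued_at=100.0, ttl=30.0, now=lambda: 129.9)


def test_sampling_is_reproducible():
    # Two generators with the same seed produce the same sample.
    first = pick_sample(list(range(10)), 3, random.Random(42))
    second = pick_sample(list(range(10)), 3, random.Random(42))
    assert first == second
```

A polling helper like `wait_until` still has a timeout, but it returns as soon as the condition holds, so it avoids both the slowness and the guesswork of a fixed `sleep()`.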

### Missing Failure Path Tests
- **Description:** Test suite validates success paths (happy paths) but omits error handling, edge cases, invalid inputs, timeouts, and exception scenarios. Production failures occur in untested error paths.
- **Indicators:**
  - Code coverage reports show high line coverage but low branch coverage (missing else/catch blocks)
  - Error handling blocks (`try/except`, `catch`, `if err != nil`) lack corresponding test cases
  - API tests only verify HTTP 200 responses, ignoring 400/401/403/404/500 error conditions
  - Timeout and retry logic in HTTP clients, database connections, or message queues is untested
  - Validation logic for user inputs or API payloads lacks test cases for malformed, missing, or boundary-exceeding values
- **Severity Tendency:** high
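
As a rough illustration of the last indicator, the sketch below tests a validator's rejection paths alongside its happy path. `parse_quantity` and its limits are hypothetical; the point is that missing, malformed, and out-of-range inputs each get an explicit case.

```python
import unittest


def parse_quantity(raw: str, maximum: int = 100) -> int:
    """Hypothetical validator: parse an order quantity between 1 and `maximum`."""
    if raw is None or raw.strip() == "":
        raise ValueError("quantity is required")
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"quantity must be an integer: {raw!r}") from None
    if not 1 <= value <= maximum:
        raise ValueError(f"quantity must be between 1 and {maximum}")
    return value


class ParseQuantityTests(unittest.TestCase):
    def test_accepts_valid_value(self):
        self.assertEqual(parse_quantity("3"), 3)

    def test_accepts_boundaries(self):
        self.assertEqual(parse_quantity("1"), 1)
        self.assertEqual(parse_quantity("100"), 100)

    def test_rejects_missing_malformed_and_out_of_range(self):
        # Each failure path is a separate sub-test, so one report line
        # identifies exactly which input was accepted by mistake.
        for raw in ["", "   ", "abc", "1.5", "0", "-1", "101"]:
            with self.subTest(raw=raw):
                with self.assertRaises(ValueError):
                    parse_quantity(raw)
```

Running the class with `python -m unittest` exercises every branch of the validator, which is what separates branch coverage from line coverage here.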

### Tautological Tests
- **Description:** Tests verify implementation details rather than behavior, or pass trivially by asserting the same logic as production code. These tests provide false confidence and fail to catch regressions when refactoring.
- **Indicators:**
  - Tests directly inspect private methods, internal state, or implementation-specific data structures
  - Test assertions duplicate production logic (e.g., `assert result == calculate_tax(input)` where `calculate_tax` is the function under test)
  - Mocks or stubs return hardcoded values that match expected outputs without verifying logic correctness
  - Tests pass even after intentionally introducing bugs (mutation testing would reveal ineffective assertions)
  - Refactoring code without changing behavior causes test failures due to tightly coupled test-implementation binding
- **Severity Tendency:** medium
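
The `calculate_tax` indicator can be made concrete with a hand-written mutant. In this sketch (all names and tax values are hypothetical), the tautological check accepts a deliberately broken implementation, while the check built on independently derived expected values rejects it.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Callable

TaxFn = Callable[[Decimal, Decimal], Decimal]


def calculate_tax(amount: Decimal, rate: Decimal) -> Decimal:
    """Hypothetical function under test: tax rounded to the nearest cent."""
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def buggy_tax(amount: Decimal, rate: Decimal) -> Decimal:
    """Deliberate mutant: adds instead of multiplying."""
    return amount + rate


def tautological_check(tax_fn: TaxFn) -> bool:
    # Anti-pattern: the expected value comes from the function under test,
    # so the comparison holds for any deterministic implementation.
    amount, rate = Decimal("19.99"), Decimal("0.0825")
    return tax_fn(amount, rate) == tax_fn(amount, rate)


def behavioral_check(tax_fn: TaxFn) -> bool:
    # Expected values worked out by hand: 19.99 * 0.0825 = 1.649175 -> 1.65
    return (
        tax_fn(Decimal("19.99"), Decimal("0.0825")) == Decimal("1.65")
        and tax_fn(Decimal("10.00"), Decimal("0.125")) == Decimal("1.25")
    )
```

Mutation testing tools automate the same idea by generating many such mutants and reporting which ones the suite fails to catch.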

### No Contract Tests
- **Description:** System lacks contract tests for external API integrations, microservice boundaries, or third-party dependencies. Schema changes or breaking API updates cause production failures without advance warning.
- **Indicators:**
  - No consumer-driven contract tests (Pact, Spring Cloud Contract) for microservice interactions
  - API integration tests mock external services without verifying mock responses match real API behavior
  - OpenAPI/Swagger schemas are not validated against live API responses in tests
  - Database schema migrations lack rollback tests or compatibility verification with existing application code
  - Third-party SDK version upgrades cause runtime errors despite passing test suite
- **Severity Tendency:** high
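
To show the idea behind the second indicator, here is a hand-rolled shape check, not Pact or an OpenAPI validator. It assumes a hypothetical user endpoint with three fields; the same check would be applied both to the stub used in unit tests and to a response recorded from the real provider.

```python
from typing import Any

# Hypothetical contract for a user resource; the field names are assumptions.
USER_CONTRACT: dict[str, type] = {"id": int, "email": str, "active": bool}


def contract_violations(payload: dict[str, Any],
                        contract: dict[str, type]) -> list[str]:
    """Return human-readable mismatches between a payload and the expected shape."""
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif type(payload[field]) is not expected:
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


# The stub that unit tests feed to client code.
STUBBED_USER = {"id": 7, "email": "ada@example.com", "active": True}

# A response from a provider that changed `id` to a string and dropped `active`.
DRIFTED_USER = {"id": "7", "email": "ada@example.com"}
```

If the stub passes and the recorded response does not, the mocks have drifted from the provider, which is the situation dedicated contract-testing tools are designed to detect before deployment.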

### Untested Critical Paths
- **Description:** Core business logic, revenue-generating flows, or safety-critical functionality lacks comprehensive test coverage. High-impact failures occur in features assumed to be reliable.
- **Indicators:**
  - Payment processing, checkout flows, or subscription management lack end-to-end test coverage
  - Authentication, authorization, or access control logic is tested only manually or via ad-hoc scripts
  - Data migration scripts, batch processing jobs, or cron tasks have no automated tests
  - Admin tools, internal dashboards, or operational scripts bypass test suite due to "internal use only" rationale
  - Features flagged as "critical" in incident postmortems lack corresponding test coverage improvements
- **Severity Tendency:** critical

### Shallow Mocking
- **Description:** Tests over-mock dependencies, isolating units to the point of testing trivial logic while ignoring integration failures. Mocks do not verify realistic interactions or return representative data.
- **Indicators:**
  - Every external dependency (database, HTTP client, file system) is mocked in all tests, with no integration tests
  - Mocks return empty arrays, null values, or unrealistic stub data that would never occur in production
  - Mock verification checks call counts (`verify(mock, times(1)).method()`) without asserting argument correctness
  - Tests pass with all dependencies mocked, but integration tests or staging deployments reveal compatibility issues
  - Refactoring internal method calls causes widespread test failures despite no behavior changes
- **Severity Tendency:** medium
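
The call-count indicator has a direct analogue in Python's `unittest.mock`. This sketch assumes a hypothetical `mailer` object with a `send` method; it contrasts a count-only verification with one that also pins down the arguments.

```python
from unittest.mock import Mock


def send_welcome_email(mailer, user_email: str) -> None:
    """Hypothetical code under test; `mailer.send` is an assumed interface."""
    mailer.send(to=user_email, subject="Welcome", template="welcome_v1")


def shallow_check(mailer: Mock) -> None:
    # Only proves that some call happened; a wrong recipient still passes.
    assert mailer.send.call_count == 1


def argument_check(mailer: Mock, user_email: str) -> None:
    # Raises AssertionError if the recipient, subject, or template differs.
    mailer.send.assert_called_once_with(
        to=user_email, subject="Welcome", template="welcome_v1"
    )


def test_welcome_email_goes_to_the_new_user():
    mailer = Mock()
    send_welcome_email(mailer, "ada@example.com")
    argument_check(mailer, "ada@example.com")
```

Argument assertions narrow the gap, but they do not replace at least one integration test against a real or realistic mail backend.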
|
|
98
|
+
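The call-count indicator above can be illustrated with a minimal Python sketch using `unittest.mock`. The `notify_user` function and the `mailer` collaborator are hypothetical names introduced only for this example.

```python
from unittest.mock import Mock


def notify_user(mailer, user_email: str, order_id: int) -> None:
    """Hypothetical unit under test: sends one confirmation email."""
    mailer.send(to=user_email, subject=f"Order {order_id} confirmed")


def test_notify_user_checks_arguments() -> None:
    mailer = Mock()
    notify_user(mailer, "ada@example.com", 42)

    # Weak check: passes even if the recipient or subject is wrong.
    assert mailer.send.call_count == 1

    # Stronger check: pins the arguments the collaborator receives.
    mailer.send.assert_called_once_with(
        to="ada@example.com", subject="Order 42 confirmed"
    )
```

The second assertion fails as soon as the recipient or subject drifts, which the call-count check alone would not catch.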
|
|
99
|
+
## Best Practice Patterns
|
|
100
|
+
|
|
101
|
+
### Balanced Test Pyramid
|
|
102
|
+
- **Replaces Failure Pattern:** Inverted Test Pyramid
|
|
103
|
+
- **Abstract Pattern:** Structure test suite with a broad base of fast, isolated unit tests (70%), moderate integration tests (20%), and minimal end-to-end tests (10%). Optimize feedback speed while maintaining confidence in system behavior.
|
|
104
|
+
- **Framework Mappings:**
|
|
105
|
+
- **Jest (JavaScript/TypeScript):** Write unit tests with `jest.fn()` mocks for dependencies, integration tests using `supertest` for API routes, and E2E tests with Playwright for critical user flows.
|
|
106
|
+
- **pytest (Python):** Use `pytest` with `unittest.mock` for unit tests, `pytest-flask` or `httpx` for integration tests, and `pytest-playwright` for E2E validation.
|
|
107
|
+
- **JUnit 5 (Java):** Apply `@Test` annotations for unit tests with Mockito, `@SpringBootTest` for integration tests, and Selenium/Testcontainers for E2E scenarios.
|
|
108
|
+
- **Language Patterns:**
|
|
109
|
+
- **TypeScript:** Separate test files by layer (`*.unit.test.ts`, `*.integration.test.ts`, `*.e2e.test.ts`) with distinct Jest configurations for parallelization and timeouts.
|
|
110
|
+
- **Python:** Use pytest markers (`@pytest.mark.unit`, `@pytest.mark.integration`) to selectively run test subsets in CI stages.
|
|
111
|
+
- **Go:** Leverage `testing` package with build tags (`//go:build integration`) to separate unit tests from integration tests.
|
|
112
|
+
|
|
113
|
+
### Deterministic Test Design
|
|
114
|
+
- **Replaces Failure Pattern:** Flaky Tests
|
|
115
|
+
- **Abstract Pattern:** Eliminate non-determinism by mocking external dependencies, using explicit waits for async operations, seeding random generators, and isolating test state. Ensure tests produce identical results on every run.
|
|
116
|
+
- **Framework Mappings:**
|
|
117
|
+
- **Cypress:** Use `cy.intercept()` to stub network requests, `cy.wait('@alias')` for explicit asynchronous waits, and `cy.clock()` to control time-dependent behavior.
|
|
118
|
+
- **pytest:** Apply `freezegun` to mock `datetime.now()`, `responses` or `httpx_mock` for HTTP requests, and `pytest-randomly` with fixed seeds for reproducible test order.
|
|
119
|
+
- **Testcontainers:** Use containerized databases (PostgreSQL, MongoDB) with `@Container` annotations to ensure clean state per test run without relying on external services.
|
|
120
|
+
- **Language Patterns:**
|
|
121
|
+
- **JavaScript:** Use `jest.useFakeTimers()` to control timer-based callbacks, `nock` for HTTP mocking, and `jest.mock('uuid', () => ({ v4: jest.fn(() => 'fixed-uuid') }))` for deterministic ID generation.
|
|
122
|
+
- **Python:** Apply `unittest.mock.patch()` to replace `random.choice`, `uuid.uuid4`, or `requests.get` with deterministic stubs.
|
|
123
|
+
- **Java:** Use `@MockBean` in Spring Boot tests to inject mocked dependencies, `WireMock` for HTTP service stubs, and `Clock.fixed()` for time-dependent logic.
|
|
124
|
+
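A minimal sketch of the Python language pattern above, using only the standard library. The `new_order` function is a hypothetical example of code with two sources of non-determinism.

```python
import random
import uuid
from unittest.mock import patch


def new_order() -> dict:
    """Hypothetical function that depends on a random ID and a random choice."""
    return {"id": str(uuid.uuid4()), "discount": random.choice([0, 5, 10])}


FIXED_ID = uuid.UUID("12345678-1234-5678-1234-567812345678")


def test_new_order_is_deterministic() -> None:
    # Replace both sources of randomness with fixed stubs for the test's duration.
    with patch("uuid.uuid4", return_value=FIXED_ID), \
         patch("random.choice", return_value=5):
        first = new_order()
        second = new_order()

    assert first == second == {"id": str(FIXED_ID), "discount": 5}
```

Patching works here because `new_order` looks up `uuid.uuid4` and `random.choice` through their modules at call time. Code that imports the functions directly (`from uuid import uuid4`) would need the patch applied to the importing module instead.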
|
|
125
|
+
### Comprehensive Failure Path Coverage
|
|
126
|
+
- **Replaces Failure Pattern:** Missing Failure Path Tests
|
|
127
|
+
- **Abstract Pattern:** Explicitly test error handling, timeouts, retries, invalid inputs, and exception scenarios. Verify that failure modes degrade gracefully and produce actionable error messages.
|
|
128
|
+
- **Framework Mappings:**
|
|
129
|
+
- **pytest:** Use `pytest.raises(ExceptionType)` to assert expected exceptions, `pytest.mark.parametrize` for boundary value testing, and `pytest-timeout` to fail tests that hang instead of returning within their time limit.
|
|
130
|
+
- **Jest:** Apply `.rejects.toThrow()` for promise rejections, `expect(fn).toThrow(ErrorClass)` for synchronous errors, and `jest.advanceTimersByTime()` to trigger timeout logic.
|
|
131
|
+
- **JUnit 5:** Use `assertThrows()` for exception validation (the JUnit 4 `@Test(expected = ...)` attribute is not available in JUnit 5), `@ParameterizedTest` for edge cases, and `@Timeout` for performance boundaries.
|
|
132
|
+
- **Language Patterns:**
|
|
133
|
+
- **Python:** Test exception paths with `with pytest.raises(ValueError, match="expected message"):` and parametrize edge cases (`@pytest.mark.parametrize("input", [None, "", -1, sys.maxsize])`).
|
|
134
|
+
- **TypeScript:** Validate error responses with `expect(response.status).toBe(400)` and assert error payloads match API schema definitions.
|
|
135
|
+
- **Go:** Assert that failure cases return a non-nil error (`if err == nil { t.Fatal(...) }`), use table-driven tests for invalid inputs, and use `context.WithTimeout()` to verify timeout handling.
|
|
136
|
+
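A minimal sketch of failure-path testing in plain Python. The `parse_quantity` validator and its limits are hypothetical; under pytest the loop would normally be written with `pytest.mark.parametrize` and `pytest.raises`.

```python
def parse_quantity(raw: object) -> int:
    """Hypothetical validator: accepts integers from 1 to 1000."""
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise ValueError("quantity must be an integer")
    if raw < 1 or raw > 1000:
        raise ValueError("quantity must be between 1 and 1000")
    return raw


# Boundary values and wrong types, not only the happy path.
INVALID_INPUTS = [None, "", "3", -1, 0, 1001, 2.5, True]


def test_invalid_quantities_are_rejected() -> None:
    for raw in INVALID_INPUTS:
        try:
            parse_quantity(raw)
        except ValueError as exc:
            # The message should name the field so the caller can act on it.
            assert "quantity" in str(exc)
        else:
            raise AssertionError(f"{raw!r} was accepted")


def test_boundaries_are_accepted() -> None:
    assert parse_quantity(1) == 1
    assert parse_quantity(1000) == 1000
```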
|
|
137
|
+
### Behavioral Test Assertions
|
|
138
|
+
- **Replaces Failure Pattern:** Tautological Tests
|
|
139
|
+
- **Abstract Pattern:** Test observable behavior and contracts rather than implementation details. Assert on outputs, side effects, and interactions without depending on internal state or private methods.
|
|
140
|
+
- **Framework Mappings:**
|
|
141
|
+
- **RSpec (Ruby):** Use `expect(result).to eq(expected_value)` rather than inspecting private instance variables, and `have_received(:method).with(args)` for interaction verification.
|
|
142
|
+
- **pytest:** Apply `assert` statements on public API outputs, `mock.assert_called_once_with(args)` for behavior verification, and `capsys` to validate stdout/stderr.
|
|
143
|
+
- **Jest:** Use `.toHaveBeenCalledWith(args)` for mock verification, snapshot testing for complex output structures, and `.toMatchObject()` for partial assertion flexibility.
|
|
144
|
+
- **Language Patterns:**
|
|
145
|
+
- **Python:** Test public methods with `assert service.process(input) == expected_output` rather than `assert service._internal_state == value`.
|
|
146
|
+
- **JavaScript:** Verify API responses with `expect(response.body).toEqual({ key: "value" })` without asserting on controller internal variables.
|
|
147
|
+
- **Java:** Use AssertJ fluent assertions (`assertThat(result).isNotNull().hasFieldOrPropertyWithValue("status", "success")`) focused on outcomes.
|
|
148
|
+
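The contrast between behavioral and implementation-coupled assertions can be sketched as follows. The `Cart` class is hypothetical, and `_lines` stands for any internal representation a test should not depend on.

```python
class Cart:
    """Hypothetical class; `_lines` is an implementation detail."""

    def __init__(self) -> None:
        self._lines = {}  # sku -> (unit_price_cents, quantity)

    def add(self, sku: str, unit_price_cents: int, qty: int = 1) -> None:
        _, current_qty = self._lines.get(sku, (unit_price_cents, 0))
        self._lines[sku] = (unit_price_cents, current_qty + qty)

    def total_cents(self) -> int:
        return sum(price * qty for price, qty in self._lines.values())


def test_total_reflects_added_items() -> None:
    cart = Cart()
    cart.add("pen", 150, qty=2)
    cart.add("pen", 150)
    cart.add("book", 1200)

    # Behavioral: asserts only on the public result.
    assert cart.total_cents() == 1650

    # Avoid: `assert cart._lines["pen"] == (150, 3)` would break if the
    # internal storage changed, even though the behavior did not.
```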
|
|
149
|
+
### Consumer-Driven Contract Testing
|
|
150
|
+
- **Replaces Failure Pattern:** No Contract Tests
|
|
151
|
+
- **Abstract Pattern:** Verify API contracts between services using consumer-driven tests or schema validation. Ensure producers and consumers agree on request/response formats, preventing integration failures.
|
|
152
|
+
- **Framework Mappings:**
|
|
153
|
+
- **Pact (Polyglot):** Define consumer expectations with `pact.given().uponReceiving().withRequest().willRespondWith()`, publish contracts to Pact Broker, and verify provider compliance.
|
|
154
|
+
- **Spring Cloud Contract:** Use Groovy DSL to define contracts in producer repos, generate WireMock stubs for consumers, and validate contracts in provider CI pipelines.
|
|
155
|
+
- **OpenAPI Validator:** Apply `openapi-validator` or `schemathesis` to test live API responses against OpenAPI schemas in integration tests.
|
|
156
|
+
- **Language Patterns:**
|
|
157
|
+
- **Python:** Use `pact-python` to define consumer expectations, publish them to a broker, and verify the provider with the library's `Verifier` from a pytest test.
|
|
158
|
+
- **JavaScript:** Apply `@pact-foundation/pact` to generate consumer contracts, integrate with Jest, and validate provider endpoints in CI.
|
|
159
|
+
- **Java:** Use `@PactTestFor` annotations with JUnit 5 to define consumer tests and `@Provider` to verify producer compliance.
|
|
160
|
+
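The idea behind a consumer-driven contract can be sketched without any framework. This is a hand-rolled illustration, not the Pact API: the contract format, the `CONSUMER_CONTRACT` fields, and `contract_violations` are all assumptions made for the example.

```python
# Fields the (hypothetical) consumer actually reads, with their expected types.
CONSUMER_CONTRACT = {"id": int, "email": str, "active": bool}


def contract_violations(response: dict, contract: dict) -> list:
    """Return a list of human-readable mismatches between response and contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif type(response[field]) is not expected_type:
            actual = type(response[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


def test_provider_response_satisfies_consumer() -> None:
    # Extra provider fields are fine; the consumer only pins what it uses.
    provider_response = {"id": 7, "email": "ada@example.com", "active": True, "plan": "pro"}
    assert contract_violations(provider_response, CONSUMER_CONTRACT) == []
```

In a real setup the provider response would come from the running provider in its own CI pipeline, and the contract would be shared through a broker rather than a module-level constant.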
|
|
161
|
+
### Critical Path Test Prioritization
|
|
162
|
+
- **Replaces Failure Pattern:** Untested Critical Paths
|
|
163
|
+
- **Abstract Pattern:** Identify high-impact business flows through risk analysis and ensure comprehensive test coverage for revenue-generating, safety-critical, or frequently used features. Prioritize test development based on failure cost.
|
|
164
|
+
- **Framework Mappings:**
|
|
165
|
+
- **Cypress (E2E):** Create dedicated test suites for checkout, authentication, and payment flows with `describe("Critical: Checkout Flow")` naming conventions.
|
|
166
|
+
- **pytest:** Use `@pytest.mark.critical` to tag high-priority tests, configure CI to fail fast on critical test failures, and track coverage separately for critical modules.
|
|
167
|
+
- **Postman/Newman:** Maintain integration test collections for critical API endpoints, run in CI with `newman run critical-flows.json`, and alert on failures.
|
|
168
|
+
- **Language Patterns:**
|
|
169
|
+
- **Python:** Enforce coverage thresholds with `pytest-cov` (`--cov-report=html --cov-fail-under=90`) specifically for critical modules (e.g., `payment/`, `auth/`).
|
|
170
|
+
- **TypeScript:** Use custom Jest reporters to highlight coverage gaps in critical paths, integrated with PR comments via CI.
|
|
171
|
+
- **Go:** Leverage `go test -coverprofile` with manual review of untested lines in critical packages (`auth`, `billing`, `encryption`).
|
|
172
|
+
|
|
173
|
+
### Realistic Integration Testing
|
|
174
|
+
- **Replaces Failure Pattern:** Shallow Mocking
|
|
175
|
+
- **Abstract Pattern:** Balance unit tests (isolated with mocks) and integration tests (real dependencies) to verify component interactions. Use test doubles that accurately represent production behavior.
|
|
176
|
+
- **Framework Mappings:**
|
|
177
|
+
- **Testcontainers:** Spin up ephemeral Docker containers (PostgreSQL, Redis, Kafka) for integration tests, ensuring realistic database queries and message queue interactions.
|
|
178
|
+
- **Spring Boot `@DataJpaTest`:** Use embedded H2 or Testcontainers PostgreSQL for repository layer tests with real SQL execution and transaction management.
|
|
179
|
+
- **SuperTest (Node.js):** Test Express/Fastify routes with real middleware and database connections, mocking only external third-party APIs.
|
|
180
|
+
- **Language Patterns:**
|
|
181
|
+
- **Python:** Use `pytest-docker` or `testcontainers-python` to provision databases, apply real ORM queries (SQLAlchemy, Django ORM), and validate transaction rollback.
|
|
182
|
+
- **Java:** Apply `@SpringBootTest(webEnvironment = RANDOM_PORT)` with Testcontainers for full integration tests, verifying Hibernate entity mappings and transaction boundaries.
|
|
183
|
+
- **Go:** Use `dockertest` to start PostgreSQL containers, run migrations with `golang-migrate`, and execute integration tests with real `database/sql` connections.
|
|
184
|
+
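A minimal sketch of testing a repository against a real SQL engine instead of a mock. It uses an in-memory SQLite database as a stand-in, which is an assumption of this example; SQLite differs from production engines such as PostgreSQL, so a container running the real engine gives stronger guarantees. `UserRepository` and its schema are hypothetical.

```python
import sqlite3

SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"


class UserRepository:
    """Hypothetical repository that executes real SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def add(self, email: str) -> int:
        # The connection context manager commits on success, rolls back on error.
        with self._conn:
            cursor = self._conn.execute(
                "INSERT INTO users (email) VALUES (?)", (email,)
            )
        return cursor.lastrowid

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def make_repository() -> UserRepository:
    conn = sqlite3.connect(":memory:")  # fresh, isolated database per test
    conn.execute(SCHEMA)
    return UserRepository(conn)


def test_duplicate_email_is_rejected_by_the_database() -> None:
    repo = make_repository()
    repo.add("ada@example.com")
    try:
        repo.add("ada@example.com")
    except sqlite3.IntegrityError:
        pass  # the UNIQUE constraint fired, which a mock would never do
    else:
        raise AssertionError("duplicate insert was accepted")
    assert repo.count() == 1
```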
|
|
185
|
+
## Red Flags
|
|
186
|
+
|
|
187
|
+
- Test suite execution exceeds 10 minutes, with the majority of time spent in browser automation or E2E tests
|
|
188
|
+
- CI pipelines frequently re-run due to intermittent test failures without code changes
|
|
189
|
+
- Code coverage reports show high line coverage (>80%) but low branch coverage (<50%)
|
|
190
|
+
- Error handling blocks (`try/catch`, `if err != nil`) lack corresponding test files
|
|
191
|
+
- No tests for authentication, payment processing, or data migration scripts
|
|
192
|
+
- Mocks return empty arrays, `null`, or hardcoded stubs without realistic data
|
|
193
|
+
- Tests depend on external staging environments or third-party APIs without fallback stubs
|
|
194
|
+
- Test files import private methods or internal modules from production code
|
|
195
|
+
- No contract tests for microservice boundaries or external API integrations
|
|
196
|
+
- Developers skip running tests locally due to flakiness or execution time, relying solely on CI
|
|
197
|
+
|
|
198
|
+
## Tool Affinities
|
|
199
|
+
|
|
200
|
+
| Tool ID | Signal Type | Relevance |
|
|
201
|
+
|---------|-------------|-----------|
|
|
202
|
+
| SonarQube | Test coverage metrics, branch coverage, code duplication in tests | primary |
|
|
203
|
+
| git-history | Test file churn, deleted tests without replacement, coverage trends over time | supporting |
|
|
204
|
+
| Semgrep | Insecure test patterns (hardcoded credentials, disabled SSL verification) | contextual |
|
|
205
|
+
| Trivy | Vulnerabilities in test dependencies (vulnerable test frameworks, outdated mocking libraries) | contextual |
|
|
206
|
+
|
|
207
|
+
## Standards & Frameworks
|
|
208
|
+
|
|
209
|
+
- **Test Pyramid (Martin Fowler):** Broad base of unit tests, moderate integration tests, minimal E2E tests for fast feedback and maintainability
|
|
210
|
+
- **Testing Trophy (Kent C. Dodds):** Emphasizes integration tests as the most cost-effective layer, with supporting unit and E2E tests
|
|
211
|
+
- **Test-Driven Development (TDD):** Red-green-refactor cycle ensuring tests drive design and validate behavior before implementation
|
|
212
|
+
- **Property-Based Testing:** Hypothesis (Python), fast-check (JavaScript), QuickCheck (Haskell) for generative testing of invariants
|
|
213
|
+
- **Mutation Testing:** Stryker (JavaScript), mutmut (Python), PIT (Java) to verify test suite effectiveness by introducing code mutations
|
|
214
|
+
- **Behavior-Driven Development (BDD):** Cucumber/Gherkin for specification by example, linking user stories to executable tests
|
|
215
|
+
- **Consumer-Driven Contracts:** Pact, Spring Cloud Contract for verifying API compatibility across service boundaries
|
|
216
|
+
|
|
217
|
+
## Metrics
|
|
218
|
+
|
|
219
|
+
| Metric | What It Measures | Healthy Range |
|
|
220
|
+
|--------|-----------------|---------------|
|
|
221
|
+
| Test Coverage (Line) | Percentage of codebase executed during test runs | ≥80% for critical paths, ≥70% overall |
|
|
222
|
+
| Test Coverage (Branch) | Percentage of conditional branches (if/else) covered by tests | ≥75% |
|
|
223
|
+
| Test-to-Code Ratio | Lines of test code divided by lines of production code | 1:1 to 2:1 (varies by domain) |
|
|
224
|
+
| Test Execution Time | Duration of full test suite run (unit + integration + E2E) | <5 minutes for unit, <10 minutes total |
|
|
225
|
+
| Flaky Test Rate | Percentage of tests with intermittent failures over 30-day period | <1% |
|
|
226
|
+
| Mutation Score | Percentage of introduced code mutations detected by test suite (mutation testing) | ≥75% |
|
|
227
|
+
| Critical Path Coverage | Percentage of revenue-generating or safety-critical features with comprehensive test coverage | 100% |
|
|
228
|
+
| Test Pyramid Ratio | Distribution of tests (unit:integration:E2E) | 70:20:10 or 60:30:10 |
|
|
@@ -0,0 +1,246 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: domain-07
|
|
3
|
+
number: "07"
|
|
4
|
+
name: Reliability & Resilience
|
|
5
|
+
owner_agents: [sre]
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Overview
|
|
9
|
+
|
|
10
|
+
Covers failure handling, retry strategies, timeouts, circuit breakers, graceful degradation, and whether the system can survive real-world failure conditions. Production systems fail constantly — network partitions, dependency outages, resource exhaustion, deployment errors — and a well-designed system degrades gracefully rather than catastrophically.
|
|
11
|
+
|
|
12
|
+
Scope: error handling strategy, retry policies, timeout configuration, circuit breaker patterns, graceful degradation paths, health checks, failure isolation, recovery procedures, and startup/shutdown safety. Does NOT cover data integrity during failures (domain 02), correctness of error handling logic (domain 03), or deployment infrastructure (domain 10).
|
|
13
|
+
|
|
14
|
+
## Audit Questions
|
|
15
|
+
|
|
16
|
+
- Is there a consistent error handling strategy, or is error handling ad-hoc per module?
|
|
17
|
+
- Are retries bounded with backoff and jitter, or can they cause retry storms?
|
|
18
|
+
- Are timeouts configured on all external calls (HTTP, database, message queue, gRPC)?
|
|
19
|
+
- Are circuit breakers implemented for critical dependency calls?
|
|
20
|
+
- What happens when a downstream dependency is unavailable — does the system degrade gracefully or fail completely?
|
|
21
|
+
- Are health check endpoints implemented, and do they test actual readiness (not just "process is running")?
|
|
22
|
+
- Are failures isolated — can one component's failure cascade to unrelated components?
|
|
23
|
+
- What is the startup sequence, and what happens if a dependency is unavailable at startup?
|
|
24
|
+
- What is the shutdown procedure, and are in-flight requests completed before shutdown?
|
|
25
|
+
- Are error conditions observable — do failures produce actionable logs, metrics, or alerts?
|
|
26
|
+
- Is there a distinction between transient failures (retry-safe) and permanent failures (fail-fast)?
|
|
27
|
+
|
|
28
|
+
## Failure Patterns
|
|
29
|
+
|
|
30
|
+
### Missing Error Handling
|
|
31
|
+
|
|
32
|
+
- **Description:** Errors are ignored, swallowed, or handled with generic catch-all blocks that obscure the nature and location of failures.
|
|
33
|
+
- **Indicators:**
|
|
34
|
+
- Empty catch blocks or catch blocks that only log and continue
|
|
35
|
+
- Generic `catch (Exception e)` without type-specific handling
|
|
36
|
+
- Functions that return null/undefined on error instead of propagating
|
|
37
|
+
- No error boundary or global error handler in the application
|
|
38
|
+
- **Severity Tendency:** high
|
|
39
|
+
|
|
40
|
+
### Retry Storms
|
|
41
|
+
|
|
42
|
+
- **Description:** Retry logic amplifies failures instead of recovering from them — when a downstream service is struggling, unbounded retries from multiple clients overwhelm it further, turning partial failures into total outages.
|
|
43
|
+
- **Indicators:**
|
|
44
|
+
- Retry logic without maximum retry count
|
|
45
|
+
- Fixed-interval retries without exponential backoff
|
|
46
|
+
- No jitter on retry timing (all clients retry at the same instant)
|
|
47
|
+
- Retries on non-idempotent operations (POST without idempotency key)
|
|
48
|
+
- **Severity Tendency:** critical
|
|
49
|
+
|
|
50
|
+
### Cascading Failures
|
|
51
|
+
|
|
52
|
+
- **Description:** A failure in one component propagates through the system because there are no isolation boundaries, causing unrelated functionality to fail.
|
|
53
|
+
- **Indicators:**
|
|
54
|
+
- No circuit breakers on external dependency calls
|
|
55
|
+
- Shared thread pools or connection pools across unrelated features
|
|
56
|
+
- Synchronous call chains where any link's failure blocks the entire chain
|
|
57
|
+
- Health check that reports unhealthy when a non-critical dependency is down
|
|
58
|
+
- **Severity Tendency:** critical
|
|
59
|
+
|
|
60
|
+
### Silent Failures
|
|
61
|
+
|
|
62
|
+
- **Description:** Operations fail without producing observable signals — no logs, no metrics, no alerts. The system appears healthy while silently producing incorrect results or dropping work.
|
|
63
|
+
- **Indicators:**
|
|
64
|
+
- Background jobs that fail without notification
|
|
65
|
+
- Catch blocks that log at DEBUG level or not at all
|
|
66
|
+
- No dead letter queue for failed message processing
|
|
67
|
+
- Monitoring dashboards that only show success metrics, not failure rates
|
|
68
|
+
- **Severity Tendency:** high
|
|
69
|
+
|
|
70
|
+
### Single Points of Failure
|
|
71
|
+
|
|
72
|
+
- **Description:** Critical system functionality depends on a single component instance with no redundancy, failover, or degradation path.
|
|
73
|
+
- **Indicators:**
|
|
74
|
+
- Single database instance without replica or failover
|
|
75
|
+
- In-memory state (caches, sessions) on a single server with no persistence
|
|
76
|
+
- Critical batch job running on one host with no scheduling redundancy
|
|
77
|
+
- Single external API dependency with no fallback or cache
|
|
78
|
+
- **Severity Tendency:** high
|
|
79
|
+
|
|
80
|
+
### Missing Health Checks
|
|
81
|
+
|
|
82
|
+
- **Description:** The system provides no mechanism for orchestration layers (load balancers, Kubernetes, etc.) to determine whether an instance is ready to serve traffic or should be replaced.
|
|
83
|
+
- **Indicators:**
|
|
84
|
+
- No health check endpoint, or endpoint that always returns 200
|
|
85
|
+
- Health check that doesn't verify actual dependencies (database, cache, external services)
|
|
86
|
+
- No distinction between liveness (process alive) and readiness (can serve traffic)
|
|
87
|
+
- Load balancer sending traffic to instances still initializing
|
|
88
|
+
- **Severity Tendency:** medium
|
|
89
|
+
|
|
90
|
+
### Unbounded Queues
|
|
91
|
+
|
|
92
|
+
- **Description:** Work queues, message buffers, or in-memory collections grow without limit, eventually exhausting memory and causing system-wide crashes.
|
|
93
|
+
- **Indicators:**
|
|
94
|
+
- Queue implementations without maximum capacity configured
|
|
95
|
+
- No backpressure mechanism — producers outpace consumers without throttling
|
|
96
|
+
- Memory usage that correlates with request rate rather than staying bounded
|
|
97
|
+
- OOM (Out of Memory) kills in production history
|
|
98
|
+
- **Severity Tendency:** high
|
|
99
|
+
|
|
100
|
+
### Missing Timeouts
|
|
101
|
+
|
|
102
|
+
- **Description:** External calls (HTTP, database, gRPC) have no timeout configured, allowing hung connections to block threads/goroutines indefinitely, leading to thread pool exhaustion.
|
|
103
|
+
- **Indicators:**
|
|
104
|
+
- HTTP client instantiation without timeout configuration
|
|
105
|
+
- Database connection strings without connection and query timeout parameters
|
|
106
|
+
- No deadline/context propagation in gRPC or async call chains
|
|
107
|
+
- Threads or connections that appear "stuck" in monitoring
|
|
108
|
+
- **Severity Tendency:** high
|
|
109
|
+
|
|
110
|
+
## Best Practice Patterns
|
|
111
|
+
|
|
112
|
+
### Structured Error Handling
|
|
113
|
+
|
|
114
|
+
- **Replaces Failure Pattern:** Missing Error Handling
|
|
115
|
+
- **Abstract Pattern:** Errors are categorized by type (transient vs permanent, expected vs unexpected), handled at the appropriate level, and always produce observable signals. No error is silently swallowed.
|
|
116
|
+
- **Framework Mappings:**
|
|
117
|
+
- Express: Centralized error handling middleware with typed error classes (NotFoundError, ValidationError, ServiceUnavailableError)
|
|
118
|
+
- Spring Boot: `@ControllerAdvice` with exception handler methods mapping domain exceptions to HTTP responses
|
|
119
|
+
- Go: Explicit error return values with error wrapping (`fmt.Errorf("operation failed: %w", err)`) preserving error chain
|
|
120
|
+
- **Language Patterns:**
|
|
121
|
+
- TypeScript: Custom error classes extending Error with `code` and `isRetryable` properties
|
|
122
|
+
- Rust: `Result<T, E>` with `?` operator for propagation and `thiserror` for typed error hierarchies
|
|
123
|
+
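A minimal Python sketch of typed errors carrying a `code` and a retryability flag, mapped to responses in one central place. The class names mirror the Express example above; the `to_response` function and the response shape are assumptions of this sketch.

```python
import logging

logger = logging.getLogger("app.errors")


class AppError(Exception):
    """Base class for expected, categorized application errors."""
    code = "internal_error"
    http_status = 500
    is_retryable = False


class ValidationError(AppError):
    code = "validation_failed"
    http_status = 400


class ServiceUnavailableError(AppError):
    code = "service_unavailable"
    http_status = 503
    is_retryable = True  # transient: callers may retry with backoff


def to_response(exc: Exception) -> tuple:
    """Central handler: every error is logged and mapped to (status, body)."""
    if isinstance(exc, AppError):
        logger.warning("handled error", extra={"error_code": exc.code})
        return exc.http_status, {"code": exc.code, "retryable": exc.is_retryable}
    # Unexpected errors are never swallowed: log with traceback, fail closed.
    logger.error("unexpected error", exc_info=exc)
    return 500, {"code": "internal_error", "retryable": False}
```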
|
|
124
|
+
### Exponential Backoff with Jitter
|
|
125
|
+
|
|
126
|
+
- **Replaces Failure Pattern:** Retry Storms
|
|
127
|
+
- **Abstract Pattern:** Retries use exponential backoff (each retry waits longer than the last) plus random jitter (each client retries at a slightly different time) with a maximum retry count and a maximum backoff ceiling.
|
|
128
|
+
- **Framework Mappings:**
|
|
129
|
+
- AWS SDK: Built-in exponential backoff with full jitter (recommended default for AWS services)
|
|
130
|
+
- Resilience4j (Java): `RetryConfig.custom().maxAttempts(3).intervalFunction(IntervalFunction.ofExponentialRandomBackoff()).build()` (configure the wait through `intervalFunction` alone; recent versions reject a config that also sets `waitDuration`)
|
|
131
|
+
- Polly (.NET): `WaitAndRetryAsync` with decorrelated jitter backoff strategy
|
|
132
|
+
- **Language Patterns:**
|
|
133
|
+
- Any: `delay = random(0, min(cap, base * 2^attempt))` (full jitter formula)
|
|
134
|
+
- Python: `tenacity` library with `wait_exponential_jitter()` and `stop_after_attempt()`
|
|
135
|
+
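A minimal sketch of bounded retries with full jitter, where each delay is drawn uniformly between zero and the capped exponential ceiling. The `TransientError` type, the default `base` and `cap` values, and the injectable `sleep` and `rng` parameters are assumptions made so the example can be tested without real waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def full_jitter_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                      rng=random.random) -> float:
    """Seconds to wait before retrying after failed attempt `attempt` (0-based)."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling  # uniform in [0, ceiling)


def retry(operation, max_attempts: int = 4, sleep=time.sleep, rng=random.random):
    """Call `operation` until it succeeds or `max_attempts` is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # bounded: give up instead of retrying forever
            sleep(full_jitter_delay(attempt, rng=rng))
```

Only `TransientError` is retried; any other exception propagates immediately, which keeps permanent failures on the fail-fast path.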
|
|
136
|
+
### Circuit Breaker Pattern
|
|
137
|
+
|
|
138
|
+
- **Replaces Failure Pattern:** Cascading Failures
|
|
139
|
+
- **Abstract Pattern:** External dependency calls are wrapped in a circuit breaker that monitors failure rates. When failures exceed a threshold, the circuit "opens" and immediately returns a fallback/error without attempting the call, preventing cascade. After a cooldown period, it "half-opens" to test if the dependency has recovered.
|
|
140
|
+
- **Framework Mappings:**
|
|
141
|
+
- Resilience4j (Java): `CircuitBreaker.ofDefaults("name")` with configurable failure rate threshold, wait duration, and sliding window
|
|
142
|
+
- Polly (.NET): `CircuitBreakerAsync(exceptionsAllowedBeforeBreaking, durationOfBreak)`, or `AdvancedCircuitBreakerAsync` for a failure-rate threshold over a sampling window
|
|
143
|
+
- Opossum (Node.js): `new CircuitBreaker(asyncFunction, { timeout: 3000, errorThresholdPercentage: 50, resetTimeout: 30000 })`
|
|
144
|
+
- **Language Patterns:**
|
|
145
|
+
- Go: `sony/gobreaker` with configurable thresholds and state transition callbacks
|
|
146
|
+
- Python: `pybreaker` with state change listeners and exclusion of specific exceptions
|
|
147
|
+
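The closed, open, and half-open states can be sketched in a few lines of Python. This is a simplified illustration and not the API of any library named above: it counts consecutive failures rather than a failure rate, it is not thread-safe, and it lets every call through while half-open instead of a single probe. The injectable `clock` is an assumption added for testability.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock=time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._reset_timeout:
            return "half-open"
        return "open"

    def call(self, operation):
        if self.state == "open":
            raise CircuitOpenError("dependency circuit is open")
        try:
            result = operation()
        except Exception:
            self._record_failure()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # Trip on threshold, or re-trip when a half-open probe fails.
        if self._opened_at is not None or self._failures >= self._failure_threshold:
            self._opened_at = self._clock()
```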
|
|
148
|
+
### Observable Failure Signals
|
|
149
|
+
|
|
150
|
+
- **Replaces Failure Pattern:** Silent Failures
|
|
151
|
+
- **Abstract Pattern:** Every failure produces an observable signal — structured log entry, metric increment, and/or dead letter queue entry. Background job failures trigger alerts. No operation can fail without the operations team being able to detect it from monitoring data alone.
|
|
152
|
+
- **Framework Mappings:**
|
|
153
|
+
- Sidekiq (Ruby): Dead set for failed jobs, retry metrics, integration with error tracking (Sentry, Honeybadger)
|
|
154
|
+
- Bull/BullMQ (Node.js): Failed job events, dead letter queues, job lifecycle metrics exported to Prometheus
|
|
155
|
+
- Celery (Python): `task_failure` signal handler, Flower monitoring dashboard, dead letter exchange in RabbitMQ
|
|
156
|
+
- **Language Patterns:**
|
|
157
|
+
- Any: Every catch block increments a failure counter metric and logs at ERROR or WARN level with structured context
|
|
158
|
+
- Any: Background job processors configured with dead letter queues and failure alerting thresholds
|
|
159
|
+
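A minimal sketch of a job loop in which no failure is silent: each one increments a counter, logs at ERROR level, and lands in a dead letter store. The `JobRunner` class is hypothetical; the `Counter` stands in for a metrics client and the list for a real dead letter queue.

```python
import logging
from collections import Counter

logger = logging.getLogger("jobs")


class JobRunner:
    def __init__(self) -> None:
        self.failure_counts = Counter()  # stand-in for a metrics client
        self.dead_letters = []           # stand-in for a dead letter queue

    def run_all(self, jobs, handler) -> int:
        """Run `handler` on each job; return the number that succeeded."""
        succeeded = 0
        for job in jobs:
            try:
                handler(job)
                succeeded += 1
            except Exception as exc:
                # Three signals per failure: metric, structured log, dead letter.
                self.failure_counts[type(exc).__name__] += 1
                logger.error(
                    "job failed",
                    extra={"job_id": job.get("id"), "error": repr(exc)},
                )
                self.dead_letters.append({"job": job, "error": repr(exc)})
        return succeeded
```

An alerting rule would then watch the failure counter or the dead letter depth rather than relying on someone reading logs.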
|
|
160
|
+
### Bulkhead Isolation
|
|
161
|
+
|
|
162
|
+
- **Replaces Failure Pattern:** Single Points of Failure
|
|
163
|
+
- **Abstract Pattern:** Critical resources (thread pools, connection pools, memory) are isolated per function or dependency so that one component's resource exhaustion cannot affect unrelated components. Redundancy is built in for critical paths.
|
|
164
|
+
- **Framework Mappings:**
|
|
165
|
+
- Resilience4j: `Bulkhead.ofDefaults("name")` (semaphore-based) or `ThreadPoolBulkhead.ofDefaults("name")` for a separate thread pool per external dependency
|
|
166
|
+
- Kubernetes: Resource limits per pod ensuring one service can't starve others, replica sets for redundancy
|
|
167
|
+
- Hystrix (legacy): Thread pool isolation per command group
|
|
168
|
+
- **Language Patterns:**
|
|
169
|
+
- Java: Separate `ExecutorService` instances per dependency with bounded thread pools
|
|
170
|
+
- Node.js: Worker thread pools with per-task timeout and separate event loop isolation for CPU-bound work
|
|
171
|
+
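A minimal semaphore-based bulkhead sketch in Python. The `Bulkhead` class and `BulkheadFullError` are hypothetical names; the design choice shown is to reject work when a dependency's slots are full rather than queue it, so that one saturated dependency cannot consume capacity meant for another.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a dependency's concurrency limit is reached."""


class Bulkhead:
    """Caps concurrent calls to one dependency; rejects instead of queueing."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, operation):
        # Non-blocking acquire: a full bulkhead fails fast.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(self.name)
        try:
            return operation()
        finally:
            self._slots.release()


# One bulkhead per dependency, so limits are independent of each other.
payments_bulkhead = Bulkhead("payments", max_concurrent=10)
search_bulkhead = Bulkhead("search", max_concurrent=50)
```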
|
|
172
|
+
### Health Check Endpoints
|
|
173
|
+
|
|
174
|
+
- **Replaces Failure Pattern:** Missing Health Checks
|
|
175
|
+
- **Abstract Pattern:** The system exposes separate liveness (is the process alive?) and readiness (can it serve traffic?) endpoints that verify actual dependency connectivity, not just process status.
|
|
176
|
+
- **Framework Mappings:**
|
|
177
|
+
- Spring Boot Actuator: `/actuator/health` with auto-configured health indicators for database, Redis, message brokers
|
|
178
|
+
- Kubernetes: `livenessProbe` (restart if failed) and `readinessProbe` (remove from service if failed) as separate configs
|
|
179
|
+
- Express: Custom `/healthz` (liveness) and `/readyz` (readiness) endpoints checking dependency connectivity
|
|
180
|
+
- **Language Patterns:**
|
|
181
|
+
- Any: Readiness check that pings database, cache, and critical external services with short timeout
|
|
182
|
+
- Any: Liveness check that verifies the process can handle requests (not deadlocked) — typically a simple 200 response
|
|
183
|
+
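The liveness and readiness split can be sketched framework-independently. The function names, response bodies, and the shape of the `checks` mapping are assumptions of this example; each check callable is expected to enforce its own short timeout.

```python
def liveness() -> tuple:
    """Process is up and able to answer; no dependencies are consulted."""
    return 200, {"status": "alive"}


def readiness(checks: dict) -> tuple:
    """Run each named dependency check; ready only if all of them pass.

    `checks` maps a dependency name to a zero-argument callable returning
    True when the dependency is reachable.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False
    ready = all(results.values())
    status = "ready" if ready else "not_ready"
    return (200 if ready else 503), {"status": status, "checks": results}
```

Only dependencies the instance cannot serve traffic without belong in `checks`; including non-critical ones would take healthy instances out of rotation.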
|
|
184
|
+
### Bounded Queues with Overflow Handling
|
|
185
|
+
|
|
186
|
+
- **Replaces Failure Pattern:** Unbounded Queues
|
|
187
|
+
- **Abstract Pattern:** All queues and buffers have explicit maximum capacity. When capacity is reached, a defined overflow strategy activates — backpressure, load shedding, or dead letter routing — rather than unbounded growth.
|
|
188
|
+
- **Framework Mappings:**
|
|
189
|
+
- RabbitMQ: Queue `x-max-length` with `x-overflow: reject-publish` or dead letter exchange for overflow routing
|
|
190
|
+
- Kafka: Topic retention and partition limits with consumer group lag monitoring
|
|
191
|
+
- SQS: Maximum message size, visibility timeout, and dead letter queue after N receive attempts
|
|
192
|
+
- **Language Patterns:**
|
|
193
|
+
- Go: Buffered channels with `select` + `default` case for non-blocking sends when full
|
|
194
|
+
- Java: `ArrayBlockingQueue` with capacity limit and `offer()` returning false when full instead of blocking indefinitely
|
|
195
|
+
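A Python counterpart to the Go and Java patterns above, using the standard library's bounded `queue.Queue`. The `BoundedInbox` wrapper and its load-shedding counter are hypothetical; the overflow strategy shown is load shedding, one of the options named in the abstract pattern.

```python
import queue


class BoundedInbox:
    """Fixed-capacity work queue that sheds load instead of growing."""

    def __init__(self, capacity: int) -> None:
        self._queue = queue.Queue(maxsize=capacity)
        self.shed_count = 0  # would normally be exported as a metric

    def offer(self, item) -> bool:
        """Enqueue without blocking; return False if the queue is full."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.shed_count += 1
            return False

    def take(self):
        """Dequeue without blocking; raises queue.Empty when nothing is queued."""
        return self._queue.get_nowait()
```

Returning `False` gives the producer an explicit signal it can turn into backpressure, for example an HTTP 429 or 503 response.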
|
|
196
|
+
### Timeout Budgets
|
|
197
|
+
|
|
198
|
+
- **Replaces Failure Pattern:** Missing Timeouts
|
|
199
|
+
- **Abstract Pattern:** Every external call has an explicit timeout configured. For call chains, a deadline or timeout budget propagates from the entry point so that downstream calls share a total time budget rather than each having an independent timeout.
|
|
200
|
+
- **Framework Mappings:**
|
|
201
|
+
- gRPC: Deadline propagation, with the initial deadline set at the edge and the remaining time carried through service-to-service calls (automatic in some language implementations, explicit in others)
|
|
202
|
+
- Spring Boot: `RestTemplate` and `WebClient` with `connectTimeout` and `readTimeout` configured at client creation
|
|
203
|
+
- Express/Axios: Request-level timeout with `AbortController` for cancellation on deadline
|
|
204
|
+
- **Language Patterns:**
|
|
205
|
+
- Go: `context.WithTimeout()` propagated through the entire call chain — every function accepts `ctx context.Context`
|
|
206
|
+
- Any: Timeouts on every external call, e.g. `http.get(url, { timeout: 5000 })`; do not rely on library defaults, which are often unlimited
|
|
207
|
+
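A minimal sketch of a shared timeout budget in Python. The `Deadline` class, `DeadlineExceeded`, and the injectable `clock` are assumptions of this example; the point is that each downstream call receives the smaller of its own cap and whatever remains of the request's total budget.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the request's total time budget is used up."""


class Deadline:
    def __init__(self, budget_seconds: float, clock=time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def timeout_for_call(self, per_call_cap: float) -> float:
        """Timeout to pass to the next external call."""
        remaining = self.remaining()
        if remaining <= 0:
            # Fail fast instead of starting work that cannot finish in time.
            raise DeadlineExceeded("request budget exhausted")
        return min(per_call_cap, remaining)
```

A handler would create one `Deadline` at the entry point and pass it down the call chain, using `timeout_for_call` for each HTTP or database call it makes.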
|
|
208
|
+
## Red Flags
|
|
209
|
+
|
|
210
|
+
- Empty catch/except blocks anywhere in the codebase
|
|
211
|
+
- HTTP clients instantiated without timeout configuration
|
|
212
|
+
- Retry logic with no maximum attempt count
|
|
213
|
+
- No circuit breaker library in dependencies
|
|
214
|
+
- Health check endpoint that returns 200 without checking any dependencies
|
|
215
|
+
- A single catch-all error handler that logs "something went wrong"
|
|
216
|
+
- Background job processor with no dead letter queue or failure alerting
|
|
217
|
+
- Thread pool sized at `Integer.MAX_VALUE` or equivalent unbounded configuration
|
|
218
|
+
- No graceful shutdown handler (SIGTERM handling)
|
|
219
|
+
- Error logging at DEBUG level or below
|
|
220
|
+
|
|
221
|
+
## Tool Affinities
|
|
222
|
+
|
|
223
|
+
| Tool ID | Signal Type | Relevance |
|
|
224
|
+
|---------|-------------|-----------|
|
|
225
|
+
| sonarqube | Empty catch blocks, unreachable error handling, exception handling code smells | primary |
|
|
226
|
+
| semgrep | Pattern detection for missing timeouts, unbounded retries, swallowed exceptions, missing error propagation | primary |
|
|
227
|
+
| git-history | Incident-correlated changes — commits tagged with incident IDs reveal reliability pain points | contextual |
|
|
228
|
+
|
|
229
|
+
## Standards & Frameworks
|
|
230
|
+
|
|
231
|
+
- Release It! (Michael Nygard) — Stability patterns: circuit breakers, bulkheads, timeouts, steady state
|
|
232
|
+
- Netflix resilience patterns — Hystrix, chaos engineering principles, adaptive concurrency
|
|
233
|
+
- SRE principles (Google) — Error budgets, SLOs, toil reduction, defense in depth
|
|
234
|
+
- Chaos Engineering (Principles of Chaos) — Proactive failure injection to surface reliability gaps before production incidents
|
|
235
|
+
- The Twelve-Factor App — Factor XII (Admin processes) for operational safety
|
|
236
|
+
|
|
237
|
+
## Metrics
|
|
238
|
+
|
|
239
|
+
| Metric | What It Measures | Healthy Range |
|
|
240
|
+
|--------|-----------------|---------------|
|
|
241
|
+
| Error handling coverage | Percentage of external calls with explicit error handling | >95% |
|
|
242
|
+
| Timeout configuration rate | Percentage of external calls with timeout configured | 100% |
|
|
243
|
+
| Circuit breaker count | Number of circuit breakers on external dependency calls | ≥1 per external dependency |
|
|
244
|
+
| Health check endpoint count | Number of health check endpoints (liveness + readiness) | ≥2 (liveness + readiness) |
|
|
245
|
+
| Mean time to detection | Average time between failure occurrence and alerting | <5 minutes |
|
|
246
|
+
| Graceful shutdown coverage | Whether SIGTERM handler exists and drains in-flight requests | Present and tested |
|