hatch3r 1.3.0 → 1.5.0
- package/README.md +12 -7
- package/agents/hatch3r-a11y-auditor.md +18 -11
- package/agents/hatch3r-architect.md +27 -12
- package/agents/hatch3r-ci-watcher.md +30 -9
- package/agents/hatch3r-context-rules.md +18 -8
- package/agents/hatch3r-dependency-auditor.md +30 -15
- package/agents/hatch3r-devops.md +18 -13
- package/agents/hatch3r-docs-writer.md +33 -12
- package/agents/hatch3r-fixer.md +46 -9
- package/agents/hatch3r-implementer.md +21 -9
- package/agents/hatch3r-learnings-loader.md +24 -7
- package/agents/hatch3r-lint-fixer.md +18 -9
- package/agents/hatch3r-perf-profiler.md +26 -10
- package/agents/hatch3r-researcher.md +57 -919
- package/agents/hatch3r-reviewer.md +29 -10
- package/agents/hatch3r-security-auditor.md +25 -10
- package/agents/hatch3r-test-writer.md +29 -9
- package/agents/modes/architecture.md +1 -0
- package/agents/modes/boundary-analysis.md +2 -1
- package/agents/modes/codebase-impact.md +1 -0
- package/agents/modes/complexity-risk.md +1 -0
- package/agents/modes/coverage-analysis.md +1 -0
- package/agents/modes/current-state.md +1 -0
- package/agents/modes/feature-design.md +1 -0
- package/agents/modes/impact-analysis.md +1 -0
- package/agents/modes/library-docs.md +2 -1
- package/agents/modes/migration-path.md +1 -0
- package/agents/modes/prior-art.md +1 -0
- package/agents/modes/refactoring-strategy.md +1 -0
- package/agents/modes/regression.md +1 -0
- package/agents/modes/requirements-elicitation.md +1 -0
- package/agents/modes/risk-assessment.md +1 -0
- package/agents/modes/risk-prioritization.md +1 -0
- package/agents/modes/root-cause.md +1 -0
- package/agents/modes/similar-implementation.md +2 -1
- package/agents/modes/symptom-trace.md +1 -0
- package/agents/modes/test-pattern.md +2 -1
- package/agents/shared/external-knowledge.md +31 -0
- package/agents/shared/quality-charter.md +96 -0
- package/checks/README.md +1 -0
- package/checks/accessibility.md +55 -0
- package/commands/board/pickup-azure-devops.md +5 -0
- package/commands/board/pickup-delegation-multi.md +9 -1
- package/commands/board/pickup-delegation.md +4 -0
- package/commands/board/pickup-github.md +5 -0
- package/commands/board/pickup-gitlab.md +5 -0
- package/commands/board/pickup-modes.md +1 -0
- package/commands/board/pickup-post-impl.md +9 -1
- package/commands/board/shared-azure-devops.md +14 -3
- package/commands/board/shared-board-overview.md +1 -0
- package/commands/board/shared-github.md +2 -0
- package/commands/board/shared-gitlab.md +10 -2
- package/commands/hatch3r-agent-customize.md +6 -1
- package/commands/hatch3r-api-spec.md +1 -0
- package/commands/hatch3r-benchmark.md +4 -3
- package/commands/hatch3r-board-fill.md +52 -9
- package/commands/hatch3r-board-groom.md +124 -7
- package/commands/hatch3r-board-init.md +7 -3
- package/commands/hatch3r-board-pickup.md +1 -0
- package/commands/hatch3r-board-refresh.md +1 -0
- package/commands/hatch3r-board-shared.md +71 -5
- package/commands/hatch3r-bug-plan.md +2 -1
- package/commands/hatch3r-codebase-map.md +4 -3
- package/commands/hatch3r-command-customize.md +6 -1
- package/commands/hatch3r-context-health.md +1 -0
- package/commands/hatch3r-cost-tracking.md +1 -0
- package/commands/hatch3r-debug.md +4 -3
- package/commands/hatch3r-dep-audit.md +3 -0
- package/commands/hatch3r-feature-plan.md +3 -2
- package/commands/hatch3r-healthcheck.md +1 -0
- package/commands/hatch3r-hooks.md +6 -1
- package/commands/hatch3r-learn.md +1 -0
- package/commands/hatch3r-migration-plan.md +3 -2
- package/commands/hatch3r-onboard.md +2 -1
- package/commands/hatch3r-project-spec.md +4 -3
- package/commands/hatch3r-quick-change.md +31 -3
- package/commands/hatch3r-recipe.md +1 -0
- package/commands/hatch3r-refactor-plan.md +2 -1
- package/commands/hatch3r-release.md +4 -1
- package/commands/hatch3r-revision.md +138 -17
- package/commands/hatch3r-roadmap.md +5 -4
- package/commands/hatch3r-rule-customize.md +5 -0
- package/commands/hatch3r-security-audit.md +1 -0
- package/commands/hatch3r-skill-customize.md +5 -0
- package/commands/hatch3r-test-plan.md +3 -2
- package/commands/hatch3r-workflow.md +15 -1
- package/dist/cli/index.js +7595 -4548
- package/dist/cli/index.js.map +1 -1
- package/hooks/hatch3r-ci-failure.md +1 -0
- package/hooks/hatch3r-file-save.md +1 -0
- package/hooks/hatch3r-post-merge.md +1 -0
- package/hooks/hatch3r-pre-commit.md +1 -0
- package/hooks/hatch3r-pre-push.md +1 -0
- package/hooks/hatch3r-session-start.md +1 -0
- package/package.json +30 -12
- package/rules/hatch3r-accessibility-standards.md +2 -1
- package/rules/hatch3r-accessibility-standards.mdc +1 -1
- package/rules/hatch3r-agent-orchestration-detail.md +207 -0
- package/rules/hatch3r-agent-orchestration-detail.mdc +202 -0
- package/rules/hatch3r-agent-orchestration.md +161 -318
- package/rules/hatch3r-agent-orchestration.mdc +212 -154
- package/rules/hatch3r-api-design.md +2 -1
- package/rules/hatch3r-api-design.mdc +1 -1
- package/rules/hatch3r-browser-verification.md +4 -2
- package/rules/hatch3r-browser-verification.mdc +1 -0
- package/rules/hatch3r-ci-cd.md +2 -1
- package/rules/hatch3r-ci-cd.mdc +1 -1
- package/rules/hatch3r-code-standards.md +15 -2
- package/rules/hatch3r-code-standards.mdc +22 -2
- package/rules/hatch3r-component-conventions.md +2 -1
- package/rules/hatch3r-component-conventions.mdc +1 -1
- package/rules/hatch3r-data-classification.md +2 -1
- package/rules/hatch3r-data-classification.mdc +1 -1
- package/rules/hatch3r-deep-context.md +26 -1
- package/rules/hatch3r-deep-context.mdc +54 -8
- package/rules/hatch3r-dependency-management.md +2 -1
- package/rules/hatch3r-dependency-management.mdc +17 -5
- package/rules/hatch3r-feature-flags.md +2 -0
- package/rules/hatch3r-feature-flags.mdc +1 -0
- package/rules/hatch3r-git-conventions.md +2 -1
- package/rules/hatch3r-git-conventions.mdc +2 -1
- package/rules/hatch3r-i18n.md +2 -1
- package/rules/hatch3r-i18n.mdc +1 -1
- package/rules/hatch3r-learning-consult.md +11 -1
- package/rules/hatch3r-learning-consult.mdc +11 -1
- package/rules/hatch3r-migrations.md +2 -1
- package/rules/hatch3r-migrations.mdc +12 -1
- package/rules/hatch3r-observability-logging.md +34 -0
- package/rules/hatch3r-observability-logging.mdc +30 -0
- package/rules/hatch3r-observability-metrics.md +74 -0
- package/rules/hatch3r-observability-metrics.mdc +70 -0
- package/rules/hatch3r-observability-tracing-detail.md +160 -0
- package/rules/hatch3r-observability-tracing-detail.mdc +63 -0
- package/rules/hatch3r-observability-tracing.md +86 -0
- package/rules/hatch3r-observability-tracing.mdc +77 -0
- package/rules/hatch3r-observability.md +9 -448
- package/rules/hatch3r-observability.mdc +7 -159
- package/rules/hatch3r-performance-budgets.md +2 -0
- package/rules/hatch3r-performance-budgets.mdc +1 -0
- package/rules/hatch3r-secrets-management.md +2 -1
- package/rules/hatch3r-secrets-management.mdc +1 -1
- package/rules/hatch3r-security-patterns.md +3 -2
- package/rules/hatch3r-security-patterns.mdc +12 -1
- package/rules/hatch3r-testing.md +12 -2
- package/rules/hatch3r-testing.mdc +11 -2
- package/rules/hatch3r-theming.md +3 -2
- package/rules/hatch3r-theming.mdc +1 -1
- package/rules/hatch3r-tooling-hierarchy.md +3 -2
- package/rules/hatch3r-tooling-hierarchy.mdc +19 -5
- package/skills/hatch3r-a11y-audit/SKILL.md +11 -4
- package/skills/hatch3r-agent-customize/SKILL.md +5 -72
- package/skills/hatch3r-api-spec/SKILL.md +9 -2
- package/skills/hatch3r-architecture-review/SKILL.md +7 -0
- package/skills/hatch3r-bug-fix/SKILL.md +16 -7
- package/skills/hatch3r-ci-pipeline/SKILL.md +8 -1
- package/skills/hatch3r-command-customize/SKILL.md +5 -62
- package/skills/hatch3r-context-health/SKILL.md +23 -2
- package/skills/hatch3r-cost-tracking/SKILL.md +16 -6
- package/skills/hatch3r-customize/SKILL.md +124 -0
- package/skills/hatch3r-dep-audit/SKILL.md +9 -2
- package/skills/hatch3r-feature/SKILL.md +12 -4
- package/skills/hatch3r-gh-agentic-workflows/SKILL.md +7 -0
- package/skills/hatch3r-incident-response/SKILL.md +7 -0
- package/skills/hatch3r-issue-workflow/SKILL.md +8 -1
- package/skills/hatch3r-logical-refactor/SKILL.md +8 -1
- package/skills/hatch3r-migration/SKILL.md +7 -0
- package/skills/hatch3r-perf-audit/SKILL.md +9 -2
- package/skills/hatch3r-pr-creation/SKILL.md +8 -1
- package/skills/hatch3r-qa-validation/SKILL.md +8 -1
- package/skills/hatch3r-recipe/SKILL.md +8 -1
- package/skills/hatch3r-refactor/SKILL.md +10 -2
- package/skills/hatch3r-release/SKILL.md +8 -1
- package/skills/hatch3r-rule-customize/SKILL.md +5 -65
- package/skills/hatch3r-skill-customize/SKILL.md +5 -62
- package/skills/hatch3r-visual-refactor/SKILL.md +12 -5
@@ -6,9 +6,9 @@ alwaysApply: true
 
 ## Core Conventions
 
--
+- Enable strict type checking. No type escape hatches (e.g., `any`, `@ts-ignore`, or equivalent) without a linked issue.
 - Functions: `camelCase`. Types/Interfaces: `PascalCase`. Constants: `SCREAMING_SNAKE`.
-- Component files: `PascalCase` (match framework convention). Logic files: `camelCase.
+- Component files: `PascalCase` (match framework convention). Logic files: `camelCase` (or language convention).
 - Max function length: 50 lines. Max file: 400 lines. Cyclomatic complexity: 10.
 - Use framework-recommended component patterns (e.g., typed props and emits).
 - Use stores or equivalent for shared state. Prefer composables/hooks over mixins.
@@ -88,6 +88,18 @@ alwaysApply: true
 - Include `correlationId` in all error logs for tracing across client and server.
 - No secrets, tokens, or PII in error messages or logs.
 
+### Error Handling Anti-Patterns (Prohibited)
+
+The following patterns are always wrong and must be flagged in review:
+
+| Anti-Pattern | Why It Is Wrong | Correct Alternative |
+|-------------|-----------------|---------------------|
+| `catch (e) {}` (empty catch) | Silently swallows errors; failures become invisible | `catch (e) { logger.error('context', e); throw e; }` or handle with Result type |
+| `catch (e) { return null; }` in auth paths | Fail-open: returns "no user" instead of "auth failed" | `catch (e) { throw new AuthError('auth_failed', e); }` |
+| `as any` to fix type errors | Bypasses type safety; hides real type mismatches | Fix the actual type or use a proper type guard |
+| `// @ts-ignore` without linked issue | Permanent type-safety hole | Fix the type error or add `// @ts-expect-error` with issue link |
+| `try { ... } catch { return defaultValue; }` for all errors | Treats transient errors (network) same as permanent ones (validation) | Discriminate error types: retry transient, fail permanent |
+
 ## Import Ordering
 
 Enforce consistent import ordering via linter rules (e.g., `eslint-plugin-import`). The canonical order:
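The "Correct Alternative" column of the Error Handling Anti-Patterns table added in the hunk above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not code from the package: the table names `AuthError` but the diff does not show its definition, so the class shape here, along with `TransientError`, `isTransient`, `authenticate`, and `withRetry`, are hypothetical names chosen for the example.

```typescript
// Hypothetical error types; the diff mentions AuthError but does not define it.
class AuthError extends Error {
  readonly code: string;
  readonly original?: unknown;
  constructor(code: string, original?: unknown) {
    super(code);
    this.name = "AuthError";
    this.code = code;
    this.original = original;
  }
}

// Marks failures that are worth retrying (e.g., a network timeout).
class TransientError extends Error {
  readonly transient = true;
}

// A type guard instead of `as any`: narrows `unknown` without disabling checks.
function isTransient(e: unknown): e is TransientError {
  return (
    typeof e === "object" &&
    e !== null &&
    (e as { transient?: unknown }).transient === true
  );
}

// Fail closed: an auth failure is rethrown, never converted into "no user".
function authenticate(token: string, verify: (t: string) => string): string {
  try {
    return verify(token);
  } catch (e) {
    throw new AuthError("auth_failed", e);
  }
}

// Retry only transient failures; permanent ones propagate on the first attempt.
// Assumes maxAttempts >= 1.
function withRetry<T>(op: () => T, maxAttempts: number): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return op();
    } catch (e) {
      if (!isTransient(e)) throw e;
      lastError = e;
    }
  }
  throw lastError;
}
```

The retry helper is synchronous only to keep the sketch short; an async variant would follow the same discrimination step before deciding whether to retry.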
@@ -100,6 +112,14 @@ Enforce consistent import ordering via linter rules (e.g., `eslint-plugin-import
 
 Separate each group with a blank line. Sort alphabetically within each group.
 
+## Monorepo Conventions
+
+When working in a monorepo (multiple packages or apps in a single repository):
+
+- **Scope changes to a single package at a time.** A PR should touch one package unless the change requires a coordinated cross-package update (e.g., a shared type change and its consumers). Coordinated changes must be documented in the PR description.
+- **Run tests only for affected packages.** Use the monorepo tool's filtering (e.g., `--filter`, `--scope`, `--since`) to run tests, lint, and builds only for packages affected by the current change.
+- **Respect package boundaries — do not import across packages without explicit dependency.** If package A needs something from package B, B must be declared as a dependency in A's `package.json` (or equivalent manifest). Direct file-path imports across package boundaries are forbidden.
+
 ## Dead Code Prevention
 
 - Remove unused imports, variables, functions, and type definitions immediately. Do not comment them out "for later."
@@ -5,6 +5,7 @@ description: Rules for component development in web applications
 scope: conditional
 globs: src/**/*.vue, src/**/*.tsx, src/**/*.jsx
 tags: [implementation]
+quality_charter: agents/shared/quality-charter.md
 ---
 # Component Conventions
 
@@ -78,7 +79,7 @@ tags: [implementation]
 - Group related fields with `<fieldset>` and `<legend>` (e.g., "Shipping Address", "Payment Details").
 - Use progressive disclosure for complex forms: show advanced options behind an expandable section or a follow-up step.
 - Autofocus the first input field on form mount.
--
+- Verify tab order follows the visual layout order by tabbing through the form — never use positive `tabindex` values.
 
 ### Submit Behavior
 - Disable the submit button when the form has known validation errors (but keep it focusable for screen readers).
@@ -2,8 +2,9 @@
 id: hatch3r-data-classification
 type: rule
 description: Data classification standards covering PII handling, encryption, retention policies, and regulatory compliance
-scope:
+scope: "**/models/**,**/schemas/**,**/schema*,**/database/**,**/db/**,**/*model*,**/*entity*,**/prisma/**,**/drizzle/**,**/*migration*"
 tags: [security]
+quality_charter: agents/shared/quality-charter.md
 ---
 # Data Classification Standards
 
@@ -1,6 +1,6 @@
 ---
 description: Data classification standards covering PII handling, encryption, retention policies, and regulatory compliance
-
+globs: ["**/models/**", "**/schemas/**", "**/schema*", "**/database/**", "**/db/**", "**/*model*", "**/*entity*", "**/prisma/**", "**/drizzle/**", "**/*migration*"]
 ---
 # Data Classification Standards
 
@@ -4,6 +4,7 @@ type: rule
 description: Adaptive pre-implementation analysis — complexity scoring, requirements elicitation, similar implementation discovery, and transitive dependency tracing before coding
 scope: always
 tags: [core]
+quality_charter: agents/shared/quality-charter.md
 ---
 # Deep Context Analysis
 
@@ -87,8 +88,32 @@ This rule augments — not replaces — the existing Universal Sub-Agent Pipelin
 - **`type:bug`**: existing modes `symptom-trace`, `root-cause`, `codebase-impact` + `requirements-elicitation` per tier (bugs often have underspecified reproduction steps)
 - **`type:refactor`**: existing modes `current-state`, `refactoring-strategy`, `migration-path` + `similar-implementation` per tier (refactors benefit most from convention alignment)
 
+## Scoring Examples
+
+To reduce ambiguity in tier assignment, here are worked examples:
+
+**Example 1: "Fix typo in error message" -- Tier 1 (score 0)**
+No signals triggered. Single file, no cross-module impact, no ambiguity.
+
+**Example 2: "Add email validation to signup form" -- Tier 2 (score 4)**
+- Multiple layers touched (API + UI): +3
+- Estimated 2-3 files: +0
+- Input validation is security-adjacent but not in a security-sensitive area: +0
+- Clear requirements ("validate email format"): +0
+- May trigger cross-cutting i18n for error messages: +1 (partial cross-cutting)
+
+**Example 3: "Migrate auth from session-based to JWT" -- Tier 3 (score 12)**
+- Multiple layers (auth middleware + API + UI + storage): +3
+- Vague term "migrate" (scope unclear): +2
+- Cross-cutting auth concern: +2
+- Security-sensitive area: +2
+- Behavioral contract change (session API to JWT API): +2
+- Estimated >5 files: +1 (partial -- easily >5)
+
+When a signal partially applies (e.g., "maybe 5 files, maybe 4"), round down. Tier upgrades from adaptation (see `hatch3r-agent-orchestration-detail`) compensate for underestimates.
+
 ## Exceptions
 
 - **`hatch3r-quick-change` command**: Tier 1 items proceed without research. Tier 2 items get lightweight `similar-implementation` at `quick` depth. Tier 3 items must be routed to `hatch3r-workflow` (hard block).
-- **Trivial single-line edits**: Always Tier 1 regardless of scoring signals. This is the only valid basis for skipping research
+- **Trivial single-line edits**: Always Tier 1 regardless of scoring signals. This is the only valid basis for skipping research -- label-based shortcuts (e.g., `risk:low AND priority:p3`) are not sufficient alone.
 - **`hatch3r-revision` command**: Operates on already-implemented code. Deep context analysis applies to the original implementation, not the revision pass.
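The arithmetic behind the Scoring Examples added in the hunk above can be checked with a short TypeScript sketch. The signal weights are copied from Examples 2 and 3; the tier cutoffs `TIER_2_MIN` and `TIER_3_MIN` are assumptions made for illustration, since this part of the diff does not state the actual thresholds, and the names `Signal`, `scoreTask`, and `tierFor` are hypothetical.

```typescript
interface Signal {
  label: string;
  weight: number;
}

// ASSUMPTION: illustrative cutoffs only. The diff shows scores 0, 4, and 12
// mapping to Tiers 1, 2, and 3 but does not state where the boundaries lie.
const TIER_2_MIN = 3;
const TIER_3_MIN = 8;

// A task's score is the sum of the weights of every signal it triggers.
function scoreTask(signals: Signal[]): number {
  return signals.reduce((sum, s) => sum + s.weight, 0);
}

function tierFor(score: number): 1 | 2 | 3 {
  if (score >= TIER_3_MIN) return 3;
  if (score >= TIER_2_MIN) return 2;
  return 1;
}

// Example 2 from the diff: "Add email validation to signup form".
const emailValidation: Signal[] = [
  { label: "Multiple layers touched (API + UI)", weight: 3 },
  { label: "Estimated 2-3 files", weight: 0 },
  { label: "Security-adjacent, not security-sensitive", weight: 0 },
  { label: "Clear requirements", weight: 0 },
  { label: "Partial cross-cutting i18n", weight: 1 },
];

// Example 3 from the diff: "Migrate auth from session-based to JWT".
const jwtMigration: Signal[] = [
  { label: "Multiple layers", weight: 3 },
  { label: "Vague term \"migrate\"", weight: 2 },
  { label: "Cross-cutting auth concern", weight: 2 },
  { label: "Security-sensitive area", weight: 2 },
  { label: "Behavioral contract change", weight: 2 },
  { label: "Estimated >5 files (partial)", weight: 1 },
];
```

With these inputs, `scoreTask(emailValidation)` sums to 4 and `scoreTask(jwtMigration)` sums to 12, matching the scores quoted in the examples.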
@@ -33,7 +33,9 @@ Score every task against these signals before implementation. Each signal adds w
 
 ### Tier 1 — Light
 
-Single-file changes, config tweaks, typo fixes, comment edits, constant value changes. No additional analysis required beyond existing researcher modes.
+Single-file changes, config tweaks, typo fixes, comment edits, constant value changes. No additional analysis required beyond existing researcher modes.
+
+Skip deep context analysis entirely. Proceed with the standard pipeline.
 
 ### Tier 2 — Standard
 
@@ -42,7 +44,7 @@ Multi-file changes with clear scope. Run these researcher modes at `quick` depth
 1. **`requirements-elicitation`** at `quick` — scan for top ambiguities and ask 3–5 clarifying questions.
 2. **`similar-implementation`** at `quick` — find 1 reference implementation and extract top-level patterns.
 
-Present findings to the user inline. Proceed after answers are received.
+Present findings to the user inline. Proceed after answers are received — no separate confirmation checkpoint required.
 
 ### Tier 3 — Deep
 
@@ -52,18 +54,62 @@ Cross-module features, architectural changes, multi-layer implementations, or ta
 2. **`similar-implementation`** at `deep` — find 2–3 reference implementations, full convention extraction, divergence analysis.
 3. **`codebase-impact`** at `deep` — full transitive dependency tracing, API consumer map, blast radius summary.
 
-**Mandatory checkpoint:** Present a consolidated "Pre-Implementation Summary" to the user and ASK for confirmation before proceeding
+**Mandatory checkpoint:** Present a consolidated "Pre-Implementation Summary" to the user and ASK for confirmation before proceeding to implementation:
+
+```
+Pre-Implementation Summary:
+Complexity: Tier 3 (Deep) — score {N}
+Resolved requirements: {N}/{M} dimensions addressed
+Unresolved questions: {list — these MUST be answered before proceeding}
+Reference implementation: {name} — conventions locked
+Blast radius: {N} files directly affected, {M} transitively at risk
+Cross-cutting concerns: {list with status}
+```
+
+Do NOT proceed to implementation until all unresolved questions are answered by the user.
 
 ## Passing Context to Implementer
 
-When the `similar-implementation` mode produces reference implementations, include them in the implementer sub-agent prompt as **"Reference Conventions"**.
+When the `similar-implementation` mode produces reference implementations, include them in the implementer sub-agent prompt as **"Reference Conventions"**. The implementer's Convention Lock step (Step 1b) uses these to align its architectural decisions with established codebase patterns.
+
+When the `requirements-elicitation` mode produces resolved requirements, include the user's answers in the implementer sub-agent prompt as **"Resolved Requirements"** so the implementer has explicit answers to all ambiguities rather than guessing.
+
+When the enhanced `codebase-impact` mode produces a blast radius summary, include it in the implementer sub-agent prompt so the implementer knows which consumers and contracts must be preserved.
 
 ## Integration with Existing Pipeline
 
-This rule augments the existing Universal Sub-Agent Pipeline from `hatch3r-agent-orchestration`. The complexity scoring happens at the start of Phase 1 (Research), and the additional researcher modes run alongside the existing task-type modes
+This rule augments — not replaces — the existing Universal Sub-Agent Pipeline from `hatch3r-agent-orchestration`. The complexity scoring happens at the start of Phase 1 (Research), and the additional researcher modes run alongside the existing task-type modes:
+
+- **`type:feature`**: existing modes `codebase-impact`, `feature-design`, `architecture` + new modes per tier
+- **`type:bug`**: existing modes `symptom-trace`, `root-cause`, `codebase-impact` + `requirements-elicitation` per tier (bugs often have underspecified reproduction steps)
+- **`type:refactor`**: existing modes `current-state`, `refactoring-strategy`, `migration-path` + `similar-implementation` per tier (refactors benefit most from convention alignment)
+
+## Scoring Examples
+
+To reduce ambiguity in tier assignment, here are worked examples:
+
+**Example 1: "Fix typo in error message" -- Tier 1 (score 0)**
+No signals triggered. Single file, no cross-module impact, no ambiguity.
+
+**Example 2: "Add email validation to signup form" -- Tier 2 (score 4)**
+- Multiple layers touched (API + UI): +3
+- Estimated 2-3 files: +0
+- Input validation is security-adjacent but not in a security-sensitive area: +0
+- Clear requirements ("validate email format"): +0
+- May trigger cross-cutting i18n for error messages: +1 (partial cross-cutting)
+
+**Example 3: "Migrate auth from session-based to JWT" -- Tier 3 (score 12)**
+- Multiple layers (auth middleware + API + UI + storage): +3
+- Vague term "migrate" (scope unclear): +2
+- Cross-cutting auth concern: +2
+- Security-sensitive area: +2
+- Behavioral contract change (session API to JWT API): +2
+- Estimated >5 files: +1 (partial -- easily >5)
+
+When a signal partially applies (e.g., "maybe 5 files, maybe 4"), round down. Tier upgrades from adaptation (see `hatch3r-agent-orchestration-detail`) compensate for underestimates.
 
 ## Exceptions
 
-- **`hatch3r-quick-change
-- **Trivial single-line edits**: Always Tier 1 regardless of scoring. This is the only valid basis for skipping research
-- **`hatch3r-revision
+- **`hatch3r-quick-change` command**: Tier 1 items proceed without research. Tier 2 items get lightweight `similar-implementation` at `quick` depth. Tier 3 items must be routed to `hatch3r-workflow` (hard block).
+- **Trivial single-line edits**: Always Tier 1 regardless of scoring signals. This is the only valid basis for skipping research -- label-based shortcuts (e.g., `risk:low AND priority:p3`) are not sufficient alone.
+- **`hatch3r-revision` command**: Operates on already-implemented code. Deep context analysis applies to the original implementation, not the revision pass.
@@ -2,8 +2,9 @@
 id: hatch3r-dependency-management
 type: rule
 description: Rules for managing project dependencies
-scope:
+scope: "**/package.json,**/package-lock.json,**/yarn.lock,**/pnpm-lock.yaml,**/Cargo.toml,**/Cargo.lock,**/requirements*.txt,**/pyproject.toml,**/go.mod,**/go.sum,**/Gemfile*"
 tags: [maintenance]
+quality_charter: agents/shared/quality-charter.md
 ---
 # Dependency Management
 
@@ -1,15 +1,27 @@
 ---
-description: Rules for managing
-
+description: Rules for managing project dependencies
+globs: ["**/package.json", "**/package-lock.json", "**/yarn.lock", "**/pnpm-lock.yaml", "**/Cargo.toml", "**/Cargo.lock", "**/requirements*.txt", "**/pyproject.toml", "**/go.mod", "**/go.sum", "**/Gemfile*"]
 ---
 # Dependency Management
 
-- Always commit
+- Always commit the lockfile. Never install without saving.
 - Justify new dependencies in PR description: what it does, why needed, alternatives considered, bundle size impact.
 - Prefer well-maintained packages: recent commits, active issues, no known CVEs.
-- Pin exact versions for production deps. Use `npm ci` in CI.
-- Run `npm audit` before merging dependency changes. Fix high/critical before merge.
+- Pin exact versions for production deps. Use clean install (e.g., `npm ci`, `pip install -r`, `cargo build`) in CI.
+- Run a dependency security scanner (e.g., `npm audit`, `pip-audit`, `cargo audit`) before merging dependency changes. Fix high/critical before merge.
 - No duplicate packages serving the same purpose. Consolidate on one.
 - Remove unused dependencies on every cleanup pass.
 - Security patches (CVEs) are P0/P1 priority. Patch within 48h for critical.
 - Check bundle size impact against budget. Reject deps that exceed.
+
+## Transitive Dependency Hygiene
+
+- Audit transitive dependencies, not just direct ones. A direct dependency with a compromised transitive dep is still a vulnerability. Use `npm ls`, `pip show`, or `cargo tree` to inspect the full dependency graph.
+- When a transitive dependency has a known CVE, determine whether the vulnerable code path is reachable from your project. If reachable, override or patch the transitive dep. If unreachable, document the finding with justification for deferral.
+- Avoid dependencies that pull in excessively large transitive trees for minimal functionality. If a package adds 50+ transitive deps for a single utility function, write the utility inline or find a lighter alternative.
+
+## Version Upgrade Strategy
+
+- Review changelogs and migration guides before upgrading major versions. Never blindly bump major versions and assume backward compatibility.
+- Run the full test suite after any dependency upgrade, including integration tests. A passing unit test suite does not guarantee compatibility with upgraded peer dependencies.
+- When upgrading a shared dependency used across multiple modules, upgrade all consumers in the same PR to avoid version skew within the monorepo or project.
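The "50+ transitive deps" guideline in the Transitive Dependency Hygiene section added above could be checked mechanically. The TypeScript sketch below assumes a dependency tree shaped like the nested `dependencies` objects that `npm ls --all --json` prints; the `DepNode` type, the helper names, and the default limit of 50 are illustrative assumptions, not part of the package.

```typescript
// ASSUMPTION: a nested tree in the style of `npm ls --all --json`, where each
// node may carry a version and a map of child dependencies.
interface DepNode {
  version?: string;
  dependencies?: Record<string, DepNode>;
}

// Collect every unique name@version reachable beneath a node.
function collectTransitive(
  node: DepNode,
  seen: Set<string> = new Set<string>(),
): Set<string> {
  const children = node.dependencies ?? {};
  for (const name of Object.keys(children)) {
    const child = children[name];
    const key = `${name}@${child.version ?? "unknown"}`;
    if (seen.has(key)) continue; // already counted; also guards against cycles
    seen.add(key);
    collectTransitive(child, seen);
  }
  return seen;
}

// Names of direct dependencies whose transitive tree reaches the limit.
function heavyDirectDeps(root: DepNode, limit: number = 50): string[] {
  const direct = root.dependencies ?? {};
  return Object.keys(direct).filter(
    (name) => collectTransitive(direct[name]).size >= limit,
  );
}
```

Because package managers may list a deduplicated package without repeating its subtree, counts from this sketch should be read as a lower bound rather than an exact figure.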
@@ -3,7 +3,9 @@ id: hatch3r-feature-flags
 type: rule
 description: Feature flag patterns and lifecycle for the project
 scope: conditional
+globs: "**/*feature-flag*,**/*featureFlag*,**/*feature_flag*,**/config/**"
 tags: [implementation]
+quality_charter: agents/shared/quality-charter.md
 ---
 # Feature Flags
 
@@ -2,8 +2,9 @@
 id: hatch3r-git-conventions
 type: rule
 description: Git commit message and branching conventions
-scope:
+scope: "**/.git/**,**/.gitignore,**/.gitattributes,**/.gitmodules,**/COMMIT_EDITMSG"
 tags: [core]
+quality_charter: agents/shared/quality-charter.md
 ---
 # Git Conventions
 
package/rules/hatch3r-i18n.md CHANGED
@@ -5,6 +5,7 @@ description: Internationalization, localization, and RTL support conventions for
 scope: conditional
 globs: src/**/*.vue, src/**/*.tsx, src/**/*.jsx, src/**/*.ts
 tags: [implementation]
+quality_charter: agents/shared/quality-charter.md
 ---
 # Internationalization & RTL
 
@@ -43,7 +44,7 @@ tags: [implementation]
 
 - Allow 30–40% text expansion for German/Finnish translations relative to English; test UI with pseudo-localization that pads strings.
 - Use `min-height` instead of fixed `height` on text containers to accommodate CJK line-height requirements (1.6–1.8 recommended).
-- Define font stacks per script family: Latin, CJK, Arabic, Devanagari —
+- Define font stacks per script family: Latin, CJK, Arabic, Devanagari — each stack must include a web-safe fallback and `system-ui`.
 - Truncate overflowing text with `text-overflow: ellipsis` plus `overflow: hidden` and `white-space: nowrap` only when semantically safe; provide title/tooltip with full text.
 - Avoid fixed-width containers for translatable text; prefer `min-width` / `max-width` with flex/grid layout.
 
package/rules/hatch3r-i18n.mdc CHANGED
@@ -1,6 +1,6 @@
 ---
 description: Internationalization, localization, and RTL support conventions for the project
-globs: src/**/*.vue, src/**/*.tsx, src/**/*.jsx, src/**/*.ts
+globs: ["src/**/*.vue", "src/**/*.tsx", "src/**/*.jsx", "src/**/*.ts", "**/locales/**", "**/i18n/**", "**/*i18n*", "**/*locale*"]
 alwaysApply: false
 ---
 # Internationalization & RTL
@@ -2,8 +2,9 @@
 id: hatch3r-learning-consult
 type: rule
 description: Auto-consult project learnings before implementation
-scope:
+scope: "**/.agents/learnings/**,**/learnings/**"
 tags: [core]
+quality_charter: agents/shared/quality-charter.md
 ---
 # Learning Consultation
 
@@ -29,3 +30,12 @@ Before implementing any task, check `.agents/learnings/` for relevant past learn
|
|
|
29
30
|
- **Pitfalls** are highest priority -- always surface these.
|
|
30
31
|
- **Patterns** are surfaced when the current task matches their trigger conditions.
|
|
31
32
|
- **Decisions** are surfaced when the current task might conflict with a past decision.
|
|
33
|
+
|
|
34
|
+
## Consultation Efficiency
|
|
35
|
+
|
|
36
|
+
To avoid excessive token usage during consultation:
|
|
37
|
+
|
|
38
|
+
1. **Scan frontmatter first.** Read only the YAML frontmatter (`tags`, `area`, `category`) of each learning file to determine relevance. Only read the full body of learnings that match the current task.
|
|
39
|
+
2. **Limit surfaced learnings.** Present at most 5 relevant learnings per consultation. If more are relevant, prioritize by confidence level (high > medium > low) and recency.
|
|
40
|
+
3. **Cache consultation results.** If consultation was already performed for this task (e.g., during board-pickup), do not re-consult during skill execution. The orchestrator passes relevant learnings to subagents as part of prompt enrichment.
|
|
41
|
+
4. **Skip when empty.** If `.agents/learnings/` has fewer than 3 files, consultation overhead exceeds value. Skip silently.
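The four efficiency steps above can be sketched as a small selection function. This is a minimal illustration, not the package's implementation: the `Learning` type and its `confidence` and `updated` fields are hypothetical names, and frontmatter is assumed to be parsed already.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical in-memory view of one learning file's frontmatter.
# `confidence` and `updated` are assumed field names, not confirmed by the rule.
@dataclass
class Learning:
    path: str
    tags: set[str]
    confidence: str  # "high" | "medium" | "low"
    updated: date


CONFIDENCE_RANK = {"high": 0, "medium": 1, "low": 2}
MIN_FILES = 3      # with fewer files than this, skip consultation silently
MAX_SURFACED = 5   # cap on learnings presented per consultation


def select_learnings(learnings: list[Learning], task_tags: set[str]) -> list[Learning]:
    """Pick the learnings worth reading in full, using frontmatter only."""
    if len(learnings) < MIN_FILES:
        return []
    # Relevance check on tags alone; bodies are read only for the survivors.
    relevant = [item for item in learnings if item.tags & task_tags]
    # Highest confidence first, then most recent first.
    relevant.sort(
        key=lambda item: (CONFIDENCE_RANK.get(item.confidence, 3), -item.updated.toordinal())
    )
    return relevant[:MAX_SURFACED]
```

Caching (step 3) is left to the caller: the orchestrator would keep the returned list and pass it to subagents rather than calling `select_learnings` again for the same task.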
|
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
description: Auto-consult project learnings before implementation
|
|
3
|
-
|
|
3
|
+
globs: "**/.agents/learnings/**,**/learnings/**"
|
|
4
|
+
alwaysApply: false
|
|
4
5
|
---
|
|
5
6
|
# Learning Consultation
|
|
6
7
|
|
|
@@ -26,3 +27,12 @@ Before implementing any task, check `.agents/learnings/` for relevant past learn
|
|
|
26
27
|
- **Pitfalls** are highest priority -- always surface these.
|
|
27
28
|
- **Patterns** are surfaced when the current task matches their trigger conditions.
|
|
28
29
|
- **Decisions** are surfaced when the current task might conflict with a past decision.
|
|
30
|
+
|
|
31
|
+
## Consultation Efficiency
|
|
32
|
+
|
|
33
|
+
To avoid excessive token usage during consultation:
|
|
34
|
+
|
|
35
|
+
1. **Scan frontmatter first.** Read only the YAML frontmatter (`tags`, `area`, `category`) of each learning file to determine relevance. Only read the full body of learnings that match the current task.
|
|
36
|
+
2. **Limit surfaced learnings.** Present at most 5 relevant learnings per consultation. If more are relevant, prioritize by confidence level (high > medium > low) and recency.
|
|
37
|
+
3. **Cache consultation results.** If consultation was already performed for this task (e.g., during board-pickup), do not re-consult during skill execution. The orchestrator passes relevant learnings to subagents as part of prompt enrichment.
|
|
38
|
+
4. **Skip when empty.** If `.agents/learnings/` has fewer than 3 files, consultation overhead exceeds value. Skip silently.
|
|
@@ -2,8 +2,9 @@
|
|
|
2
2
|
id: hatch3r-migrations
|
|
3
3
|
type: rule
|
|
4
4
|
description: Database migration and schema change patterns for the project
|
|
5
|
-
scope:
|
|
5
|
+
scope: "**/migrations/**,**/*migration*,**/migrate/**,**/seeds/**,**/seeders/**,**/prisma/migrations/**,**/drizzle/**,**/knex/**"
|
|
6
6
|
tags: [implementation, brownfield]
|
|
7
|
+
quality_charter: agents/shared/quality-charter.md
|
|
7
8
|
---
|
|
8
9
|
# Migrations
|
|
9
10
|
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
description: Database migration and schema change patterns for the project
|
|
3
|
-
|
|
3
|
+
globs: ["**/migrations/**", "**/*migration*", "**/migrate/**", "**/seeds/**", "**/seeders/**", "**/prisma/migrations/**", "**/drizzle/**", "**/knex/**"]
|
|
4
4
|
---
|
|
5
5
|
# Migrations
|
|
6
6
|
|
|
@@ -13,3 +13,14 @@ alwaysApply: false
|
|
|
13
13
|
- Document schema changes in project data model spec.
|
|
14
14
|
- Rollback plan required for every migration. Never run destructive migrations without backup verification.
|
|
15
15
|
- Hot documents must stay within size limits after migration.
|
|
16
|
+
|
|
17
|
+
## Data Validation During Migration
|
|
18
|
+
|
|
19
|
+
- Validate data integrity after each migration step, not just at the end. Check that migrated records match the expected schema, required fields are populated, and no data was silently dropped.
|
|
20
|
+
- Include count checks: the number of records processed should match the number of records in the source collection. Log discrepancies as errors, not warnings.
|
|
21
|
+
- For large datasets, migrate in batches with progress checkpoints. If a batch fails, resume from the last checkpoint rather than restarting the entire migration.
|
|
22
|
+
|
|
23
|
+
## Migration Coordination in Multi-Service Environments
|
|
24
|
+
|
|
25
|
+
- When a migration affects shared data (e.g., a schema used by multiple services), coordinate the migration order across services. The consuming services must be deployed with backward-compatible readers before the migration runs.
|
|
26
|
+
- Never assume that all service instances will be running the same code version during a migration window. Design migrations to tolerate mixed-version reads and writes during the rollout period.
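The batching, per-step validation, and count-check rules above might look like the following sketch. It assumes records are plain dictionaries held in memory; `migrate_in_batches`, `transform`, and `validate` are hypothetical names chosen for illustration, and a real migration would persist the checkpoint durably.

```python
from typing import Callable


def migrate_in_batches(
    source: list[dict],
    transform: Callable[[dict], dict],
    validate: Callable[[dict], bool],
    batch_size: int,
    checkpoint: int = 0,
) -> tuple[list[dict], int]:
    """Migrate records starting at `checkpoint`.

    Returns (migrated_records, next_checkpoint). The run is complete only
    when next_checkpoint == len(source); otherwise the caller fixes the
    failing batch and resumes from next_checkpoint.
    """
    migrated: list[dict] = []
    for start in range(checkpoint, len(source), batch_size):
        batch = source[start:start + batch_size]
        out = [transform(record) for record in batch]
        # Validate after each step, not only at the end.
        if len(out) != len(batch) or not all(validate(record) for record in out):
            return migrated, start  # resume point: the failed batch
        migrated.extend(out)
    return migrated, len(source)


def counts_match(source_count: int, processed_count: int) -> bool:
    """Count check: a mismatch should be logged as an error, not a warning."""
    return source_count == processed_count
```

Returning the failed batch's start index, rather than raising, keeps the already-migrated batches intact so a retry does not redo them.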
|
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: hatch3r-observability-logging
|
|
3
|
+
type: rule
|
|
4
|
+
description: Structured logging and error reporting conventions for the project
|
|
5
|
+
scope: conditional
|
|
6
|
+
globs: "**/*log*,**/*logger*,**/*logging*,**/*error*,**/observability/**"
|
|
7
|
+
tags: [devops]
|
|
8
|
+
quality_charter: agents/shared/quality-charter.md
|
|
9
|
+
---
|
|
10
|
+
# Observability -- Logging & Error Reporting
|
|
11
|
+
|
|
12
|
+
Logging and error reporting conventions. For metrics, SLOs, alerting, and dashboards see `hatch3r-observability-metrics`. For distributed tracing and OpenTelemetry conventions see `hatch3r-observability-tracing`.
|
|
13
|
+
|
|
14
|
+
## Structured Logging
|
|
15
|
+
|
|
16
|
+
- Use structured JSON logging. No `console.log` in production code.
|
|
17
|
+
- Log levels: `error` (failures), `warn` (degraded), `info` (state changes), `debug` (dev only).
|
|
18
|
+
- Every log entry includes `correlationId` and `userId` (if available).
|
|
19
|
+
- Never log secrets, PII, tokens, passwords, or sensitive content.
|
|
20
|
+
- Instrument key operations with timing metrics. Serverless functions log execution time and outcome.
|
|
21
|
+
- Client-side: log errors to a sink (e.g., error reporting service), not just `console.error`.
|
|
22
|
+
- Prefer event-based metrics over polling. Trace user flows end-to-end with `correlationId`.
|
|
23
|
+
- Respect performance budgets: logging must not add > 10ms latency to hot paths.
|
|
24
|
+
- Include `service`, `environment`, and `version` fields in every log entry for filtering.
|
|
25
|
+
- Use log sampling for high-volume debug logs in production (e.g., 1% sample rate).
|
|
26
|
+
|
|
27
|
+
## Structured Error Reporting
|
|
28
|
+
|
|
29
|
+
- Integrate Sentry (or equivalent) for automated error capture in both server and client environments.
|
|
30
|
+
- Configure release tracking: tag errors with `release` (git SHA or semver) and upload source maps for readable stack traces.
|
|
31
|
+
- Enable breadcrumbs: capture the last 50 user actions, network requests, and console messages leading to an error.
|
|
32
|
+
- Error grouping: use custom fingerprints for domain-specific errors to prevent over-grouping. Default fingerprinting is acceptable for unhandled exceptions.
|
|
33
|
+
- Enrich error context with `correlationId`, `userId`, environment, and relevant business state. Never attach PII or secrets.
|
|
34
|
+
- Set sample rates: 100% for errors, 10% for transactions in production. Adjust based on volume and budget.
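As an illustration of the structured logging rules above, here is a minimal sketch of a log-entry builder. The function name, the `timestamp` field, and the `SENSITIVE_KEYS` deny-list are assumptions made for this example; a key deny-list alone does not catch PII embedded in values.

```python
import json
import time

LEVELS = {"error", "warn", "info", "debug"}
# Assumed deny-list for illustration; real projects need a broader policy.
SENSITIVE_KEYS = {"password", "token", "secret", "authorization"}


def build_log_entry(level, message, *, correlation_id, service, environment,
                    version, user_id=None, **context):
    """Return one structured JSON log line with the required fields."""
    if level not in LEVELS:
        raise ValueError(f"unknown log level: {level}")
    entry = {
        "timestamp": time.time(),
        "level": level,
        "message": message,
        "correlationId": correlation_id,
        "service": service,
        "environment": environment,
        "version": version,
    }
    if user_id is not None:
        entry["userId"] = user_id
    # Copy extra context, dropping keys that look like secrets.
    for key, value in context.items():
        if key.lower() not in SENSITIVE_KEYS:
            entry[key] = value
    return json.dumps(entry)
```

Making `correlation_id`, `service`, `environment`, and `version` required keyword arguments means a call site cannot omit them by accident.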
|
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: Structured logging and error reporting conventions for the project
|
|
3
|
+
globs: ["**/*log*", "**/*logger*", "**/*logging*", "**/*error*", "**/observability/**"]
|
|
4
|
+
alwaysApply: false
|
|
5
|
+
---
|
|
6
|
+
# Observability -- Logging & Error Reporting
|
|
7
|
+
|
|
8
|
+
Logging and error reporting conventions. For metrics, SLOs, alerting, and dashboards see `hatch3r-observability-metrics`. For distributed tracing and OpenTelemetry conventions see `hatch3r-observability-tracing`.
|
|
9
|
+
|
|
10
|
+
## Structured Logging
|
|
11
|
+
|
|
12
|
+
- Use structured JSON logging. No `console.log` in production code.
|
|
13
|
+
- Log levels: `error` (failures), `warn` (degraded), `info` (state changes), `debug` (dev only).
|
|
14
|
+
- Every log entry includes `correlationId` and `userId` (if available).
|
|
15
|
+
- Never log secrets, PII, tokens, passwords, or sensitive content.
|
|
16
|
+
- Instrument key operations with timing metrics. Serverless functions log execution time and outcome.
|
|
17
|
+
- Client-side: log errors to a sink (e.g., error reporting service), not just `console.error`.
|
|
18
|
+
- Prefer event-based metrics over polling. Trace user flows end-to-end with `correlationId`.
|
|
19
|
+
- Respect performance budgets: logging must not add > 10ms latency to hot paths.
|
|
20
|
+
- Include `service`, `environment`, and `version` fields in every log entry for filtering.
|
|
21
|
+
- Use log sampling for high-volume debug logs in production (e.g., 1% sample rate).
|
|
22
|
+
|
|
23
|
+
## Structured Error Reporting
|
|
24
|
+
|
|
25
|
+
- Integrate Sentry (or equivalent) for automated error capture in both server and client environments.
|
|
26
|
+
- Configure release tracking: tag errors with `release` (git SHA or semver) and upload source maps for readable stack traces.
|
|
27
|
+
- Enable breadcrumbs: capture the last 50 user actions, network requests, and console messages leading to an error.
|
|
28
|
+
- Error grouping: use custom fingerprints for domain-specific errors to prevent over-grouping. Default fingerprinting is acceptable for unhandled exceptions.
|
|
29
|
+
- Enrich error context with `correlationId`, `userId`, environment, and relevant business state. Never attach PII or secrets.
|
|
30
|
+
- Set sample rates: 100% for errors, 10% for transactions in production. Adjust based on volume and budget.
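The sample rates mentioned above (1% for debug logs, 10% for transactions, 100% for errors) can be applied deterministically. The sketch below hashes the `correlationId` so that every entry in one flow is kept or dropped together; that choice is an assumption of this example, not something the rule prescribes.

```python
import hashlib

# Rates taken from the rule text; the dictionary keys are illustrative.
SAMPLE_RATES = {"error": 1.0, "transaction": 0.10, "debug": 0.01}


def keep_sample(correlation_id: str, rate: float) -> bool:
    """Decide whether to keep an event, based on a hash of its correlation ID."""
    digest = hashlib.sha256(correlation_id.encode("utf-8")).digest()
    # Map the first 8 bytes to an integer in [0, 2**64) and compare against
    # the rate scaled to the same range (integer math avoids float rounding).
    value = int.from_bytes(digest[:8], "big")
    return value < int(rate * 2**64)
```

For example, `keep_sample(correlation_id, SAMPLE_RATES["debug"])` would gate a debug log line, while errors pass through at rate 1.0.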
|
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: hatch3r-observability-metrics
|
|
3
|
+
type: rule
|
|
4
|
+
description: Metrics, SLO/SLI definitions, alerting, and dashboard conventions for the project
|
|
5
|
+
scope: conditional
|
|
6
|
+
globs: "**/*metric*,**/*slo*,**/*sli*,**/*alert*,**/*dashboard*,**/observability/**"
|
|
7
|
+
tags: [devops]
|
|
8
|
+
quality_charter: agents/shared/quality-charter.md
|
|
9
|
+
---
|
|
10
|
+
# Observability -- Metrics, SLOs & Alerting
|
|
11
|
+
|
|
12
|
+
Metrics, SLO/SLI, alerting, and dashboard conventions. For structured logging see `hatch3r-observability-logging`. For distributed tracing and OpenTelemetry conventions see `hatch3r-observability-tracing`.
|
|
13
|
+
|
|
14
|
+
## Metrics
|
|
15
|
+
|
|
16
|
+
- Use OpenTelemetry Metrics SDK. Expose Prometheus-compatible `/metrics` endpoint for scraping where applicable.
|
|
17
|
+
- Metric naming: `{service}.{domain}.{metric}_{unit}` in snake_case. Example: `api.auth.login_duration_ms`.
|
|
18
|
+
- Instrument types and when to use:
|
|
19
|
+
|
|
20
|
+
| Instrument | Use Case | Example |
|
|
21
|
+
| ----------- | ---------------------------------- | -------------------------------- |
|
|
22
|
+
| Counter | Monotonically increasing totals | `http.requests_total` |
|
|
23
|
+
| Histogram | Distributions (latency, size) | `http.request_duration_ms` |
|
|
24
|
+
| Gauge | Point-in-time values | `db.connection_pool_active` |
|
|
25
|
+
| UpDownCounter | Values that increase and decrease | `queue.messages_pending` |
|
|
26
|
+
|
|
27
|
+
- Histogram buckets for latency: `[5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000]` ms.
|
|
28
|
+
- Cardinality management: never use unbounded values (user IDs, request paths with params) as metric labels. Cap label cardinality to < 100 unique values per metric.
|
|
29
|
+
- Custom business metrics: track domain-significant events (sign-ups, purchases, feature usage) as counters with relevant dimensions.
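Two of the rules above, the latency bucket boundaries and the label cardinality cap, lend themselves to a short sketch. `bucket_upper_bound` and `LabelGuard` are hypothetical helpers, and collapsing overflow values into `"other"` is an assumption of this example; an actual setup would configure buckets through the metrics SDK rather than compute them by hand.

```python
import bisect

LATENCY_BUCKETS_MS = [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000]
MAX_LABEL_VALUES = 99  # "< 100 unique values per metric"


def bucket_upper_bound(duration_ms: float) -> float:
    """Return the inclusive upper bound of the bucket a latency falls into."""
    index = bisect.bisect_left(LATENCY_BUCKETS_MS, duration_ms)
    if index < len(LATENCY_BUCKETS_MS):
        return LATENCY_BUCKETS_MS[index]
    return float("inf")  # overflow bucket


class LabelGuard:
    """Caps unique label values per metric; extra values become "other"."""

    def __init__(self, limit: int = MAX_LABEL_VALUES):
        self.limit = limit
        self.seen: dict[str, set[str]] = {}

    def admit(self, metric: str, label_value: str) -> str:
        values = self.seen.setdefault(metric, set())
        if label_value in values or len(values) < self.limit:
            values.add(label_value)
            return label_value
        return "other"
```

A guard like this keeps an unexpectedly unbounded label (for instance a raw request path) from multiplying the number of time series.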
|
|
30
|
+
|
|
31
|
+
## SLO / SLI Definitions
|
|
32
|
+
|
|
33
|
+
- Define SLIs as ratios of good events to total events, measured from the user's perspective.
|
|
34
|
+
- Standard SLIs:
|
|
35
|
+
|
|
36
|
+
| SLI | Definition | Measurement Source |
|
|
37
|
+
| ---------------- | --------------------------------------------- | ------------------------ |
|
|
38
|
+
| Availability | Requests returning non-5xx / total requests | Load balancer logs |
|
|
39
|
+
| Latency | Requests completing < threshold / total | Tracing p99 |
|
|
40
|
+
| Error rate | Failed operations / total operations | Application metrics |
|
|
41
|
+
| Freshness | Data updated within SLA / total records | Background job metrics |
|
|
42
|
+
|
|
43
|
+
- SLO targets: set per-service. Typical starting points: 99.9% availability (43 min/month budget), p99 latency < 500ms.
|
|
44
|
+
- Error budgets: `budget = 1 - SLO_target`. Track remaining budget on a rolling 30-day window.
|
|
45
|
+
- Burn rate alerts: use multi-window approach (short + long window). Fast-burn alert: 2% budget consumed in 1 hour. Slow-burn alert: 5% consumed in 6 hours. Alert only when both windows confirm.
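The error budget and burn-rate figures above follow from simple arithmetic, worked through in the sketch below. The function names are illustrative; the conversion from "fraction of budget consumed in a window" to a burn-rate threshold assumes the 30-day rolling window stated in the rule.

```python
WINDOW_DAYS = 30  # rolling SLO window from the rule


def error_budget(slo_target: float) -> float:
    """budget = 1 - SLO_target, as a fraction of all events."""
    return 1.0 - slo_target


def budget_minutes(slo_target: float, window_days: int = WINDOW_DAYS) -> float:
    """Allowed full-outage minutes in the window for an availability SLO."""
    return error_budget(slo_target) * window_days * 24 * 60


def burn_rate_threshold(budget_fraction: float, alert_window_hours: float,
                        window_days: int = WINDOW_DAYS) -> float:
    """Burn rate at which `budget_fraction` of the budget is spent in the alert window."""
    return budget_fraction * (window_days * 24) / alert_window_hours


def should_alert(long_window_burn: float, short_window_burn: float,
                 threshold: float) -> bool:
    """Multi-window check: fire only when both windows exceed the threshold."""
    return long_window_burn >= threshold and short_window_burn >= threshold
```

With these definitions, a 99.9% target yields about 43.2 minutes per 30 days, the fast-burn rule (2% in 1 hour) corresponds to a burn rate of 14.4, and the slow-burn rule (5% in 6 hours) to a burn rate of 6.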
|
|
46
|
+
|
|
47
|
+
## Alerting
|
|
48
|
+
|
|
49
|
+
| Severity | Criteria | Response Time | Notification |
|
|
50
|
+
| -------- | ----------------------------------- | ------------- | ------------------- |
|
|
51
|
+
| P1 | Service down, data loss risk | 15 min | Page on-call + Slack |
|
|
52
|
+
| P2 | Degraded performance, SLO at risk | 1 hour | Page on-call |
|
|
53
|
+
| P3 | Non-critical issue, workaround exists | Next business day | Slack channel |
|
|
54
|
+
| P4 | Cosmetic / low-impact | Sprint backlog | Ticket only |
|
|
55
|
+
|
|
56
|
+
- Every alert must link to a runbook with: symptoms, likely causes, diagnostic steps, remediation actions.
|
|
57
|
+
- Alert fatigue prevention: tune thresholds to < 5 actionable alerts per on-call shift. Suppress duplicate alerts within a 10-minute dedup window.
|
|
58
|
+
- Route alerts by service ownership. Use escalation policies: if P1/P2 unacknowledged in 15 min, escalate to secondary.
|
|
59
|
+
- Review alert quality monthly: snooze/delete alerts with < 20% action rate.
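The 10-minute dedup window mentioned above could be implemented roughly as follows. This is a sketch under stated assumptions: `AlertDeduplicator` and the notion of an `alert_key` (for example service plus alert name) are hypothetical, and timestamps are passed in as seconds so the logic stays testable.

```python
class AlertDeduplicator:
    """Suppresses repeats of the same alert inside a fixed dedup window."""

    def __init__(self, window_seconds: float = 600.0):  # 10 minutes
        self.window_seconds = window_seconds
        self.last_sent: dict[str, float] = {}

    def should_send(self, alert_key: str, now: float) -> bool:
        last = self.last_sent.get(alert_key)
        if last is not None and now - last < self.window_seconds:
            return False  # duplicate within the window: suppress
        # Suppressed duplicates do not extend the window; only sent alerts do.
        self.last_sent[alert_key] = now
        return True
```

Measuring the window from the last sent alert, rather than the last seen one, ensures a continuously firing condition still notifies once per window instead of going silent.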
|
|
60
|
+
|
|
61
|
+
## Dashboard Standards
|
|
62
|
+
|
|
63
|
+
- Required dashboards per service:
|
|
64
|
+
|
|
65
|
+
| Dashboard | Contents |
|
|
66
|
+
| ---------------- | ----------------------------------------------------------- |
|
|
67
|
+
| Service Health | Request rate, error rate, latency p50/p95/p99, saturation |
|
|
68
|
+
| Business Metrics | Key domain counters, conversion funnels, feature adoption |
|
|
69
|
+
| Dependencies | Upstream/downstream latency, error rates, circuit breaker state |
|
|
70
|
+
| Infrastructure | CPU, memory, disk, connection pools, queue depth |
|
|
71
|
+
|
|
72
|
+
- Dashboard-as-code: define dashboards in version-controlled JSON/YAML (Grafana provisioning, Terraform, or equivalent). No manual dashboard creation in production.
|
|
73
|
+
- Every dashboard panel includes: descriptive title, unit labels, threshold lines for SLO targets, and a link to the relevant runbook or alert.
|
|
74
|
+
- Review dashboards quarterly: remove unused panels, update thresholds, verify data source accuracy.
|