antigravity-ai-kit 3.1.1 → 3.2.0

This diff compares package versions that have been publicly released to one of the supported registries. It is provided for informational purposes only and reflects the changes between those versions as they appear in their public registries.
Files changed (43)
  1. package/.agent/agents/planner.md +205 -62
  2. package/.agent/contexts/plan-quality-log.md +30 -0
  3. package/.agent/engine/loading-rules.json +37 -3
  4. package/.agent/hooks/hooks.json +10 -0
  5. package/.agent/manifest.json +4 -3
  6. package/.agent/skills/plan-validation/SKILL.md +192 -0
  7. package/.agent/skills/plan-writing/SKILL.md +47 -8
  8. package/.agent/skills/plan-writing/domain-enhancers.md +114 -0
  9. package/.agent/skills/plan-writing/plan-retrospective.md +116 -0
  10. package/.agent/skills/plan-writing/plan-schema.md +119 -0
  11. package/.agent/workflows/plan.md +49 -5
  12. package/README.md +30 -29
  13. package/bin/ag-kit.js +26 -5
  14. package/lib/agent-registry.js +17 -3
  15. package/lib/agent-reputation.js +3 -11
  16. package/lib/circuit-breaker.js +195 -0
  17. package/lib/cli-commands.js +88 -1
  18. package/lib/config-validator.js +274 -0
  19. package/lib/conflict-detector.js +29 -22
  20. package/lib/constants.js +35 -0
  21. package/lib/engineering-manager.js +9 -27
  22. package/lib/error-budget.js +105 -29
  23. package/lib/hook-system.js +8 -4
  24. package/lib/identity.js +22 -27
  25. package/lib/io.js +74 -0
  26. package/lib/loading-engine.js +248 -35
  27. package/lib/logger.js +118 -0
  28. package/lib/marketplace.js +43 -20
  29. package/lib/plugin-system.js +55 -31
  30. package/lib/plugin-verifier.js +197 -0
  31. package/lib/rate-limiter.js +113 -0
  32. package/lib/security-scanner.js +1 -4
  33. package/lib/self-healing.js +58 -24
  34. package/lib/session-manager.js +51 -48
  35. package/lib/skill-sandbox.js +1 -1
  36. package/lib/task-governance.js +10 -11
  37. package/lib/task-model.js +42 -27
  38. package/lib/updater.js +1 -1
  39. package/lib/verify.js +4 -4
  40. package/lib/workflow-engine.js +88 -68
  41. package/lib/workflow-events.js +166 -0
  42. package/lib/workflow-persistence.js +19 -19
  43. package/package.json +2 -2
package/.agent/skills/plan-validation/SKILL.md +192 -0
@@ -0,0 +1,192 @@
1
+ ---
2
+ name: plan-validation
3
+ description: Quality gate for implementation plans. Validates schema compliance, cross-cutting concerns, and completeness scoring before user presentation.
4
+ version: 1.0.0
5
+ triggers: [post-plan-creation]
6
+ allowed-tools: Read, Grep
7
+ ---
8
+
9
+ # Plan Validation
10
+
11
+ > Quality gate ensuring every implementation plan meets enterprise standards
12
+ > before being presented to the user for approval.
13
+
14
+ ---
15
+
16
+ ## Overview
17
+
18
+ This skill is used by the planner agent as a self-validation checklist after creating a plan but BEFORE presenting it to the user. The planner applies the validation pipeline below to its own output, verifying against the quality schema (`plan-schema.md`), checking cross-cutting concerns, and calculating a completeness score. Plans that fail validation are revised before presentation.
19
+
20
+ **Invocation**: The planner runs this checklist during `/plan` workflow step 3.5. This is NOT a separate agent — the planner validates its own plan against these criteria.
21
+
22
+ ---
23
+
24
+ ## Validation Pipeline
25
+
26
+ ### Step 1: Task Size Classification
27
+
28
+ Determine the task size from the plan content:
29
+
30
+ | Indicator | Classification |
31
+ |-----------|---------------|
32
+ | Plan references 1-2 files | **Trivial** |
33
+ | Plan references 3-10 files | **Medium** |
34
+ | Plan references 10+ files | **Large** |
35
+ | Estimated effort < 30 minutes | **Trivial** |
36
+ | Estimated effort 1-4 hours | **Medium** |
37
+ | Estimated effort > 4 hours | **Large** |
38
+
39
+ Use the HIGHER classification when indicators conflict.
40
+
41
+ ### Step 2: Schema Compliance
42
+
43
+ Verify all required sections are present and substantively populated:
44
+
45
+ **Tier 1 Sections (Always Required)**:
46
+
47
+ | # | Section | Check |
48
+ |---|---------|-------|
49
+ | 1 | Context & Problem Statement | Present and >= 2 sentences |
50
+ | 2 | Goals & Non-Goals | Both goals AND non-goals stated |
51
+ | 3 | Implementation Steps | Steps have file paths and verification criteria |
52
+ | 4 | Testing Strategy | Test types specified with coverage targets |
53
+ | 5 | Security Considerations | Substantive content or explicit "N/A — [reason]" |
54
+ | 6 | Risks & Mitigations | At least 1 risk with severity and mitigation |
55
+ | 7 | Success Criteria | Measurable, checkable outcomes |
56
+
57
+ **Tier 2 Sections (Required for Medium/Large)**:
58
+
59
+ | # | Section | Check |
60
+ |---|---------|-------|
61
+ | 8 | Architecture Impact | Components and files identified |
62
+ | 9 | API / Data Model Changes | Schemas defined (or N/A with reason) |
63
+ | 10 | Rollback Strategy | Concrete undo procedure |
64
+ | 11 | Observability | Logging/metrics plan |
65
+ | 12 | Performance Impact | Assessment provided |
66
+ | 13 | Documentation Updates | Specific docs identified |
67
+ | 14 | Dependencies | Blockers and dependents listed |
68
+ | 15 | Alternatives Considered | At least 1 rejected alternative with reasoning |
69
+
70
+ ### Step 3: Cross-Cutting Verification
71
+
72
+ These sections MUST be non-empty regardless of task domain:
73
+
74
+ | Section | Acceptable Content |
75
+ |---------|-------------------|
76
+ | **Security Considerations** | Specific requirements from `rules/security.md` OR `N/A — [valid justification]` |
77
+ | **Testing Strategy** | At least unit test plan with coverage target OR `N/A — [valid justification]` |
78
+ | **Documentation Updates** | Specific docs listed OR `N/A — no docs affected` |
79
+
80
+ **Unacceptable**: Empty section, placeholder text, section completely missing.
81
+
82
+ ### Step 4: Specificity Audit
83
+
84
+ Verify that implementation steps are actionable, not vague:
85
+
86
+ | Vague (FAIL) | Specific (PASS) |
87
+ |-------------|-----------------|
88
+ | "Update the component" | "Add `onSubmit` handler to `src/components/LoginForm.tsx`" |
89
+ | "Add tests" | "Create `tests/auth.test.js` with login success/failure cases" |
90
+ | "Fix the bug" | "Change line 42 of `lib/parser.js`: replace `==` with `===`" |
91
+ | "Style the UI" | "Add Tailwind classes `flex gap-4 p-6` to `Header.tsx`" |
92
+
93
+ **Rule**: Every implementation step MUST include a file path.
94
+
95
+ ### Step 5: Completeness Scoring
96
+
97
+ Calculate the score using the rubric from `plan-schema.md`:
98
+
99
+ **Tier 1 Scoring** (60 points max):
100
+
101
+ | Section | Points |
102
+ |---------|--------|
103
+ | Context & Problem Statement | 10 |
104
+ | Goals & Non-Goals | 10 |
105
+ | Implementation Steps | 10 |
106
+ | Testing Strategy | 10 |
107
+ | Security Considerations | 10 |
108
+ | Risks & Mitigations | 5 |
109
+ | Success Criteria | 5 |
110
+
111
+ **Tier 2 Scoring** (20 additional points):
112
+
113
+ | Section | Points |
114
+ |---------|--------|
115
+ | Architecture Impact | 4 |
116
+ | API / Data Model Changes | 3 |
117
+ | Rollback Strategy | 3 |
118
+ | Observability | 2 |
119
+ | Performance Impact | 2 |
120
+ | Documentation Updates | 2 |
121
+ | Dependencies | 2 |
122
+ | Alternatives Considered | 2 |
123
+
124
+ **Score Rules**:
125
+ - Section present and substantively populated = full points
126
+ - Section present but placeholder/minimal = half points
127
+ - Section missing = 0 points
128
+ - "N/A" with valid justification = full points
129
+
130
+ **Domain Enhancement Scoring** (bonus/penalty on top of tier score):
131
+ - For each domain in `matchedDomains` from the loading engine:
132
+ - Domain enhancer section present and substantive = **+2 bonus points**
133
+ - Domain matched but enhancer section missing = **-2 penalty points**
134
+ - Domain matched with "N/A — [valid reason]" = no bonus, no penalty
135
+ - Maximum domain bonus: +6 points (3 domains × 2 points)
136
+ - Domain scoring does not change the pass threshold — it provides additional quality signal
137
+
138
+ ### Step 6: Verdict
139
+
140
+ | Condition | Verdict | Action |
141
+ |-----------|---------|--------|
142
+ | Score >= 70% of tier max | **PASS** | Present plan to user with score |
143
+ | Score < 70% of tier max | **REVISE** | Identify gaps, revise, re-validate |
144
+
145
+ **Revision Protocol**:
146
+ 1. Identify the specific missing or weak sections
147
+ 2. Provide targeted instructions to the planner for revision
148
+ 3. Re-run validation after revision
149
+ 4. Maximum 2 revision cycles — then present with warnings
150
+
151
+ ---
152
+
153
+ ## Output Format
154
+
155
+ After validation, append to the plan:
156
+
157
+ ```markdown
158
+ ## Plan Quality Assessment
159
+
160
+ **Task Size**: [Trivial/Medium/Large]
161
+ **Quality Score**: [X]/[max] ([percentage]%) [+N domain bonus / -N domain penalty]
162
+ **Verdict**: [PASS/REVISE]
163
+
164
+ ### Validation Results
165
+
166
+ | Check | Status |
167
+ |-------|--------|
168
+ | Schema Compliance | [sections present]/[sections required] |
169
+ | Cross-Cutting Concerns | [All addressed / Missing: X, Y] |
170
+ | Specificity Audit | [All steps have file paths / X steps lack paths] |
171
+ | Domain Enhancement | [N domains matched, N enhancer sections present] |
172
+ | Rules Consulted | [list of rule files referenced] |
173
+ | Matched Domains | [list from loading engine] |
174
+ ```
175
+
176
+ ---
177
+
178
+ ## Integration
179
+
180
+ - **Invoked by**: `/plan` workflow (step 3.5, between plan creation and user presentation)
181
+ - **Depends on**: `plan-schema.md` for scoring rubric, `domain-enhancers.md` for domain sections
182
+ - **Feeds into**: Plan quality score shown to user alongside the plan
183
+ - **Learning**: Quality scores are logged to `.agent/contexts/plan-quality-log.md` for adaptive improvement
184
+
185
+ ---
186
+
187
+ ## Principles
188
+
189
+ 1. **Validate, don't block**: The goal is quality improvement, not gatekeeping. After 2 revision cycles, present the plan with warnings rather than blocking indefinitely.
190
+ 2. **Score transparently**: The user sees the quality score and understands what was checked.
191
+ 3. **Learn from outcomes**: Post-implementation retrospectives compare predicted vs. actual to calibrate future scoring.
192
+ 4. **Cross-cutting is non-negotiable**: Security, testing, and documentation sections must ALWAYS be addressed. This is the single most impactful quality gate.
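The scoring and verdict rules in Steps 5 and 6 of the plan-validation skill can be sketched roughly as follows. This is an illustrative sketch, not code shipped in the package: the status labels, the `Assessment` type, and the `assess` function are hypothetical names. It assumes the half-points rule from this skill (plan-schema.md gives zero points for placeholder text), assumes the verdict is taken from the tier score alone with the domain adjustment reported separately, and scores only the itemized sections, so a plan with Tier 2 has a maximum of 80 here even though plan-schema.md lists 100 for Large tasks.

```python
from dataclasses import dataclass

# Point values copied from the Tier 1 and Tier 2 scoring tables.
TIER1_POINTS = {
    "Context & Problem Statement": 10,
    "Goals & Non-Goals": 10,
    "Implementation Steps": 10,
    "Testing Strategy": 10,
    "Security Considerations": 10,
    "Risks & Mitigations": 5,
    "Success Criteria": 5,
}
TIER2_POINTS = {
    "Architecture Impact": 4,
    "API / Data Model Changes": 3,
    "Rollback Strategy": 3,
    "Observability": 2,
    "Performance Impact": 2,
    "Documentation Updates": 2,
    "Dependencies": 2,
    "Alternatives Considered": 2,
}

# Hypothetical status labels; the skill describes these states only in prose.
FULL, MINIMAL, MISSING, NOT_APPLICABLE = "full", "minimal", "missing", "n/a"


@dataclass
class Assessment:
    tier_score: float
    tier_max: int
    domain_adjustment: int
    verdict: str


def section_points(status, points):
    """Apply the Score Rules to a single section."""
    if status in (FULL, NOT_APPLICABLE):
        return points          # populated, or N/A with a valid justification
    if status == MINIMAL:
        return points / 2      # present but placeholder/minimal
    return 0                   # missing


def assess(statuses, include_tier2, domain_statuses=None):
    """Score a plan; `statuses` maps section name to a status label."""
    rubric = dict(TIER1_POINTS)
    if include_tier2:
        rubric.update(TIER2_POINTS)
    tier_max = sum(rubric.values())
    tier_score = sum(
        section_points(statuses.get(name, MISSING), pts)
        for name, pts in rubric.items()
    )
    # Domain enhancement: +2 per substantive section (capped at +6),
    # -2 per matched domain whose section is missing, nothing for N/A.
    domains = list((domain_statuses or {}).values())
    bonus = min(2 * domains.count(FULL), 6)
    penalty = 2 * domains.count(MISSING)
    # Integer form of "score >= 70% of tier max" to avoid float rounding.
    verdict = "PASS" if tier_score * 10 >= tier_max * 7 else "REVISE"
    return Assessment(tier_score, tier_max, bonus - penalty, verdict)
```

A caller following the Revision Protocol would re-run `assess` after each revision and stop after two cycles, presenting the plan with warnings if the verdict is still REVISE.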
package/.agent/skills/plan-writing/SKILL.md +47 -8
@@ -42,17 +42,29 @@ Framework for breaking down work into clear, actionable tasks with verification
42
42
 
43
43
  ## Planning Principles
44
44
 
45
- > 🔴 **NO fixed templates. Each plan is UNIQUE to the task.**
45
+ > 🔴 **NO fixed templates. Each plan's CONTENT is UNIQUE to the task.**
46
+ > ✅ **Every plan MUST satisfy the quality schema in `plan-schema.md`.**
47
+ > Dynamic content within a consistent structure = the standard.
46
48
 
47
- ### Principle 1: Keep It SHORT
49
+ ### Principle 1: Right-Size to Task Tier
48
50
 
49
- | Wrong | Right |
50
- | --------------------------- | --------------------- |
51
- | 50 tasks with sub-sub-tasks | 5-10 clear tasks max |
52
- | Every micro-step listed | Only actionable items |
53
- | Verbose descriptions | One-line per task |
51
+ Plan length MUST match task complexity:
54
52
 
55
- > **Rule:** If plan is longer than 1 page, it's too long. Simplify.
53
+ | Task Tier | Max Sections | Max Tasks | Guideline |
54
+ | --------- | ------------ | --------- | --------- |
55
+ | **Trivial** (1-2 files) | Tier 1 only (7 sections) | 5-8 tasks | ~1 page — concise, no specialist synthesis |
56
+ | **Medium** (3-10 files) | Tier 1 + Tier 2 (15 sections) | 8-15 tasks | 2-3 pages — includes specialist input |
57
+ | **Large** (10+ files) | Tier 1 + Tier 2 + domains (15+ sections) | 15-25 tasks | 3-5 pages — full multi-agent synthesis |
58
+
59
+ | ❌ Wrong | ✅ Right |
60
+ | -------- | -------- |
61
+ | 50 tasks with sub-sub-tasks | Right-sized task count per tier |
62
+ | Every micro-step listed | Only actionable items |
63
+ | Verbose descriptions | One-line per task |
64
+ | Large task crammed into 1 page | Large task gets full Tier 2 coverage |
65
+ | Trivial task with 15 sections | Trivial task uses Tier 1 only |
66
+
67
+ > **Rule:** Trivial tasks stay concise (~1 page). Medium/Large tasks expand to cover all required tier sections. Never sacrifice completeness for brevity on complex tasks.
56
68
 
57
69
  ---
58
70
 
@@ -98,6 +110,33 @@ Framework for breaking down work into clear, actionable tasks with verification
98
110
 
99
111
  ---
100
112
 
113
+ ### Principle 5: Cross-Cutting Concerns Are Mandatory
114
+
115
+ Every plan MUST explicitly address:
116
+
117
+ 1. **Security**: Reference `.agent/rules/security.md` — what security implications exist?
118
+ 2. **Testing**: Reference `.agent/rules/testing.md` — what test types are needed? Coverage targets?
119
+ 3. **Documentation**: Reference `.agent/rules/documentation.md` — which docs need updating?
120
+
121
+ If a concern is genuinely not applicable, state `N/A — [one-line justification]`.
122
+
123
+ **NEVER silently omit these sections.** Silent omission is a plan defect.
124
+
125
+ ---
126
+
127
+ ### Principle 6: Schema Compliance
128
+
129
+ Every plan MUST satisfy the quality schema defined in `plan-schema.md`:
130
+
131
+ - **Tier 1** sections are ALWAYS required. Omitting any Tier 1 section is a plan defect.
132
+ - **Tier 2** sections are required for Medium and Large tasks (3+ files or 1+ hours).
133
+ - Before presenting a plan, validate it against the schema checklist.
134
+ - Plans scoring below 70% of their tier maximum must be revised before presentation.
135
+
136
+ See also: `domain-enhancers.md` for domain-specific plan sections.
137
+
138
+ ---
139
+
101
140
  ## Plan Structure (Minimal)
102
141
 
103
142
  ```markdown
package/.agent/skills/plan-writing/domain-enhancers.md +114 -0
@@ -0,0 +1,114 @@
1
+ # Domain-Specific Plan Enhancers
2
+
3
+ > When the loading engine matches specific domains for a task, the planner
4
+ > MUST include the corresponding domain-specific sections below.
5
+ > These sections are additive to the base plan schema (Tier 1 + Tier 2).
6
+
7
+ ---
8
+
9
+ ## Frontend Domain
10
+
11
+ **Triggered when**: `frontend` domain matched (keywords: react, next.js, component, css, ui, ux, etc.)
12
+
13
+ Include in plan:
14
+
15
+ - **Accessibility (WCAG 2.1 AA)**: Identify components requiring ARIA labels, keyboard navigation, screen reader support, color contrast compliance
16
+ - **Responsive Design**: Specify breakpoints to test (mobile 375px, tablet 768px, desktop 1280px), identify layout changes per breakpoint
17
+ - **Bundle Size Impact**: Estimate size of new dependencies, identify tree-shaking opportunities, consider code splitting for new routes
18
+ - **Core Web Vitals**: Assess impact on LCP (largest contentful paint), CLS (cumulative layout shift), INP (interaction to next paint)
19
+ - **Component Composition**: Specify component hierarchy, prop interfaces, state management approach (local vs. global)
20
+
21
+ ---
22
+
23
+ ## Backend Domain
24
+
25
+ **Triggered when**: `backend` domain matched (keywords: api, server, node, express, middleware, endpoint, etc.)
26
+
27
+ Include in plan:
28
+
29
+ - **API Contract**: Define request/response schemas (Zod validation), HTTP methods, status codes, error response format
30
+ - **Error Handling**: Specify error response structure, error codes, client-facing messages vs. internal logging
31
+ - **Rate Limiting**: Identify endpoints requiring rate limits, specify limits (requests/minute/user), throttling strategy
32
+ - **Middleware Chain**: Document new middleware additions, execution order, impact on existing middleware stack
33
+ - **Database Interaction**: Query patterns (parameterized), transaction boundaries, connection pooling impact
34
+
35
+ ---
36
+
37
+ ## Database Domain
38
+
39
+ **Triggered when**: `database` domain matched (keywords: database, sql, migration, schema, query, orm, etc.)
40
+
41
+ Include in plan:
42
+
43
+ - **Migration Rollback**: Write both up and down migrations, test rollback procedure before deploying
44
+ - **Index Impact Analysis**: Identify queries affected by schema changes, recommend index additions/removals, estimate query performance impact
45
+ - **Data Integrity**: Define constraints (foreign keys, unique, not null, check), cascade behavior for deletions
46
+ - **Backup Verification**: Verify backup exists before destructive migrations, test restore procedure for critical tables
47
+ - **Query Performance**: Benchmark key queries before and after changes, set acceptable latency thresholds
48
+
49
+ ---
50
+
51
+ ## DevOps Domain
52
+
53
+ **Triggered when**: `devops` domain matched (keywords: deploy, ci, cd, docker, kubernetes, pipeline, etc.)
54
+
55
+ Include in plan:
56
+
57
+ - **Infrastructure Changes**: Specify IaC modifications (Dockerfile, docker-compose, CI config), environment variable additions
58
+ - **Monitoring & Alerting**: Define new metrics to track, alerting thresholds, dashboard updates
59
+ - **Progressive Rollout**: Strategy for deployment (canary → staged → full), rollback triggers, health check endpoints
60
+ - **Runbook Updates**: Document operational procedures for the new functionality, incident response steps
61
+ - **Environment Parity**: Verify changes work across dev, staging, and production environments
62
+
63
+ ---
64
+
65
+ ## Security Domain
66
+
67
+ **Triggered when**: `security` domain matched (keywords or implicit triggers: auth, login, signup, form, payment, etc.)
68
+
69
+ Include in plan (in addition to mandatory security considerations):
70
+
71
+ - **Threat Model (STRIDE)**: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege — assess each for the change
72
+ - **Authentication Flow Impact**: How the change affects login, session management, token lifecycle
73
+ - **Data Classification**: Identify data sensitivity levels (public, internal, confidential, restricted), storage and transmission requirements
74
+ - **Compliance Requirements**: GDPR/CCPA implications (data minimization, consent, right to erasure)
75
+ - **Secret Management**: New secrets required, rotation policy, storage mechanism (environment variables only)
76
+
77
+ ---
78
+
79
+ ## Performance Domain
80
+
81
+ **Triggered when**: `performance` domain matched (keywords: slow, optimize, speed, bundle, lighthouse, cache, etc.)
82
+
83
+ Include in plan:
84
+
85
+ - **Performance Budget**: Define acceptable thresholds (page load time, API response time, memory usage)
86
+ - **Profiling Strategy**: Tools and methods to measure before/after (Lighthouse, Chrome DevTools, load testing)
87
+ - **Caching Strategy**: Cache layers (browser, CDN, application, database), TTL values, invalidation approach
88
+ - **Lazy Loading**: Identify resources for deferred loading, intersection observer patterns, dynamic imports
89
+ - **Benchmarking**: Define benchmark suite, baseline measurements, regression detection
90
+
91
+ ---
92
+
93
+ ## Mobile Domain
94
+
95
+ **Triggered when**: `mobile` domain matched (keywords: mobile, react native, expo, ios, android, etc.)
96
+
97
+ Include in plan:
98
+
99
+ - **Platform Parity**: Identify iOS vs. Android differences in behavior, UI, or API access
100
+ - **Offline Support**: Define offline behavior, data sync strategy, conflict resolution
101
+ - **App Store Guidelines**: Compliance with Apple/Google review guidelines for the feature
102
+ - **Native Modules**: Bridge requirements, native module dependencies, build configuration changes
103
+ - **Device Testing**: Target device matrix, screen size variations, OS version compatibility
104
+
105
+ ---
106
+
107
+ ## Usage
108
+
109
+ The planner reads this file when domain-specific sections are needed:
110
+
111
+ 1. Loading engine returns `matchedDomains` array
112
+ 2. For each matched domain, include the corresponding enhancer section
113
+ 3. Domain sections are added AFTER the base plan schema sections
114
+ 4. Multiple domains can be active simultaneously (e.g., frontend + backend for a full-stack feature)
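The Usage steps in domain-enhancers.md amount to a lookup from matched domains to extra plan sections, appended after the base schema sections. A minimal sketch, assuming `matchedDomains` arrives as a list of lowercase domain names; `DOMAIN_ENHANCERS`, `enhancer_sections`, and `plan_outline` are hypothetical names, and silently skipping unknown or repeated domains is an assumption rather than documented behavior.

```python
# Section headings copied from the domain enhancer lists.
DOMAIN_ENHANCERS = {
    "frontend": ["Accessibility (WCAG 2.1 AA)", "Responsive Design",
                 "Bundle Size Impact", "Core Web Vitals", "Component Composition"],
    "backend": ["API Contract", "Error Handling", "Rate Limiting",
                "Middleware Chain", "Database Interaction"],
    "database": ["Migration Rollback", "Index Impact Analysis", "Data Integrity",
                 "Backup Verification", "Query Performance"],
    "devops": ["Infrastructure Changes", "Monitoring & Alerting",
               "Progressive Rollout", "Runbook Updates", "Environment Parity"],
    "security": ["Threat Model (STRIDE)", "Authentication Flow Impact",
                 "Data Classification", "Compliance Requirements", "Secret Management"],
    "performance": ["Performance Budget", "Profiling Strategy", "Caching Strategy",
                    "Lazy Loading", "Benchmarking"],
    "mobile": ["Platform Parity", "Offline Support", "App Store Guidelines",
               "Native Modules", "Device Testing"],
}


def enhancer_sections(matched_domains):
    """Return (domain, headings) pairs in match order, without duplicates."""
    selected, seen = [], set()
    for domain in matched_domains:
        if domain in seen or domain not in DOMAIN_ENHANCERS:
            continue  # assumption: unknown or repeated domains are ignored
        seen.add(domain)
        selected.append((domain, DOMAIN_ENHANCERS[domain]))
    return selected


def plan_outline(base_sections, matched_domains):
    """Append domain sections AFTER the base plan schema sections."""
    outline = list(base_sections)
    for domain, headings in enhancer_sections(matched_domains):
        outline.extend(f"{domain}: {heading}" for heading in headings)
    return outline
```

Several domains can be active at once, so a full-stack task matching `frontend` and `backend` would gain ten extra section headings under this sketch.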
package/.agent/skills/plan-writing/plan-retrospective.md +116 -0
@@ -0,0 +1,116 @@
1
+ # Plan Retrospective
2
+
3
+ > Post-implementation review protocol for measuring plan accuracy
4
+ > and feeding learnings back into future plan generation.
5
+
6
+ ---
7
+
8
+ ## Overview
9
+
10
+ After a planned task reaches the VERIFY phase (all implementation complete, tests running), this retrospective compares the original plan against actual implementation to identify accuracy gaps and improve future planning.
11
+
12
+ ---
13
+
14
+ ## When to Run
15
+
16
+ - **Primary Trigger**: The `plan-complete` hook in `.agent/hooks/hooks.json` fires when workflow state transitions to VERIFY phase
17
+ - **Manual Trigger**: User runs `/retrospective` on a completed plan
18
+ - **Data Flow**: The hook reads the original plan file (`docs/PLAN-{slug}.md`), compares against `git diff --name-only` from the plan's creation timestamp, then appends results to `.agent/contexts/plan-quality-log.md`
19
+ - **Frequency**: After every planned task completes implementation
20
+ - **Blocking**: No — this is a learning activity, not a quality gate (severity: medium, onFailure: log)
21
+ - **Planner Integration**: The planner reads `plan-quality-log.md` during Requirements Analysis (Step 1) to adjust estimates and predictions for future plans
22
+
23
+ ---
24
+
25
+ ## Retrospective Dimensions
26
+
27
+ ### 1. File Prediction Accuracy
28
+
29
+ Compare files listed in the plan vs. files actually modified:
30
+
31
+ | Metric | Measurement |
32
+ |--------|-------------|
33
+ | **Files Predicted** | Count of unique file paths in the plan |
34
+ | **Files Actually Modified** | Count from `git diff --name-only` against plan start |
35
+ | **Prediction Accuracy** | Predicted / Actual (percentage) |
36
+ | **Surprise Files** | Files modified that were NOT in the plan |
37
+ | **Unused Predictions** | Files in the plan that were NOT modified |
38
+
39
+ ### 2. Task Completeness
40
+
41
+ | Metric | Measurement |
42
+ |--------|-------------|
43
+ | **Tasks Planned** | Count of implementation steps in original plan |
44
+ | **Tasks Completed** | Steps that matched actual work |
45
+ | **Surprise Tasks** | Work done that wasn't in the plan |
46
+ | **Dropped Tasks** | Planned tasks that turned out unnecessary |
47
+ | **Completeness Score** | (Completed - Surprise) / Planned |
48
+
49
+ ### 3. Estimate Accuracy
50
+
51
+ | Metric | Measurement |
52
+ |--------|-------------|
53
+ | **Estimated Effort** | Total hours from plan |
54
+ | **Actual Effort** | Approximate actual time spent |
55
+ | **Drift** | Actual / Estimated (ratio; 1.0 = perfect) |
56
+ | **Drift Direction** | Over-estimated / Under-estimated / Accurate |
57
+
58
+ ### 4. Risk Prediction
59
+
60
+ | Metric | Measurement |
61
+ |--------|-------------|
62
+ | **Risks Identified** | Count of risks in plan |
63
+ | **Risks Materialized** | Planned risks that actually occurred |
64
+ | **Surprise Risks** | Unplanned risks that emerged |
65
+ | **Risk Prediction Rate** | Materialized / (Materialized + Surprise) |
66
+
67
+ ### 5. Specialist Contribution Value
68
+
69
+ | Specialist | Contribution Accurate? | Key Insight That Helped |
70
+ |-----------|----------------------|------------------------|
71
+ | Architect | Yes/No/Partial | [what was most useful] |
72
+ | Security-Reviewer | Yes/No/Partial | [what was most useful] |
73
+ | TDD-Guide | Yes/No/Partial | [what was most useful] |
74
+
75
+ ---
76
+
77
+ ## Output Format
78
+
79
+ Append one row to `.agent/contexts/plan-quality-log.md`:
80
+
81
+ ```markdown
82
+ | [date] | [plan name] | [quality score] | [files predicted] | [files actual] | [surprise count] | [estimate drift] | [key learning] |
83
+ ```
84
+
85
+ ### Key Learning Format
86
+
87
+ Capture the single most important learning in one sentence:
88
+
89
+ **Good examples**:
90
+ - "Auth tasks consistently require middleware changes not predicted in plans"
91
+ - "Database migration effort was 2x underestimated due to index rebuilding"
92
+ - "Frontend plans should always include accessibility testing as a task"
93
+
94
+ **Bad examples**:
95
+ - "The plan was good" (not actionable)
96
+ - "Everything went as expected" (no learning value)
97
+
98
+ ---
99
+
100
+ ## Adaptive Feedback
101
+
102
+ The planner agent reads `plan-quality-log.md` at the start of each planning session to:
103
+
104
+ 1. **Adjust estimates**: If historical drift is consistently 1.5x, multiply estimates by 1.5
105
+ 2. **Predict surprise files**: If auth tasks consistently miss middleware, proactively include middleware files
106
+ 3. **Weight risks**: If certain risk categories historically materialize, elevate their severity
107
+ 4. **Improve domain sections**: If specific domain enhancer sections are consistently unhelpful, deprioritize them
108
+ 5. **Value specialists**: If security-reviewer contributions are consistently accurate, weight their input more heavily
109
+
110
+ ---
111
+
112
+ ## Example Retrospective Entry
113
+
114
+ ```
115
+ | 2026-03-16 | PLAN-user-auth | 72/80 | 8 | 11 | 3 (middleware, session config, error handler) | 1.4x | Auth plans should include middleware and session store files by default |
116
+ ```
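The retrospective metrics in plan-retrospective.md are simple ratios, and a small sketch makes the direction of each one explicit. The function names below are hypothetical, the file paths used with them are invented for illustration, and the 10% tolerance for calling an estimate "Accurate" is an assumption because the document does not define one. Prediction Accuracy follows the table literally (predicted count divided by actual count), which does not check whether the predicted files are the ones that changed.

```python
def file_prediction(predicted, actual):
    """Dimension 1: compare planned file paths with files actually modified."""
    predicted, actual = set(predicted), set(actual)
    accuracy = round(100 * len(predicted) / len(actual), 1) if actual else None
    return {
        "files_predicted": len(predicted),
        "files_actual": len(actual),
        "accuracy_pct": accuracy,                       # Predicted / Actual
        "surprise_files": sorted(actual - predicted),   # modified, not planned
        "unused_predictions": sorted(predicted - actual),
    }


def estimate_drift(estimated_hours, actual_hours, tolerance=0.1):
    """Dimension 3: drift = Actual / Estimated; 1.0 means a perfect estimate."""
    drift = actual_hours / estimated_hours
    if abs(drift - 1.0) <= tolerance:   # assumed tolerance, not from the source
        direction = "Accurate"
    elif drift > 1.0:
        direction = "Under-estimated"   # the work took longer than planned
    else:
        direction = "Over-estimated"
    return drift, direction


def risk_prediction_rate(materialized, surprise):
    """Dimension 4: Materialized / (Materialized + Surprise)."""
    total = materialized + surprise
    return materialized / total if total else None


def log_row(date, plan, quality, files, drift, learning):
    """Format one row for plan-quality-log.md in the documented column order."""
    return (
        f"| {date} | {plan} | {quality} | {files['files_predicted']} "
        f"| {files['files_actual']} | {len(files['surprise_files'])} "
        f"| {drift:.1f}x | {learning} |"
    )
```

With 8 predicted files, 11 modified files, and 7 actual hours against a 5-hour estimate, this yields 72.7% prediction accuracy, 3 surprise files, and a drift of 1.4x, matching the shape of the example entry.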
package/.agent/skills/plan-writing/plan-schema.md +119 -0
@@ -0,0 +1,119 @@
1
+ # Plan Quality Schema
2
+
3
+ > Defines the mandatory structure and scoring rubric for implementation plans.
4
+ > Every plan produced by the `/plan` workflow MUST satisfy this schema.
5
+
6
+ ---
7
+
8
+ ## Task Size Classification
9
+
10
+ Before applying the schema, classify the task:
11
+
12
+ | Size | Criteria | Required Tiers |
13
+ |------|----------|----------------|
14
+ | **Trivial** | 1-2 files, <30 minutes estimated effort | Tier 1 only |
15
+ | **Medium** | 3-10 files, 1-4 hours estimated effort | Tier 1 + Tier 2 |
16
+ | **Large** | 10+ files, multi-day effort | Tier 1 + Tier 2 + architect consultation |
17
+
18
+ ---
19
+
20
+ ## Tier 1 — Always Required
21
+
22
+ Every plan, regardless of task size, MUST include these sections:
23
+
24
+ | # | Section | Description | Points |
25
+ |---|---------|-------------|--------|
26
+ | 1 | **Context & Problem Statement** | Why this change is needed. 2-3 sentences covering the problem, impact, and motivation. | 10 |
27
+ | 2 | **Goals & Non-Goals** | What the plan achieves (goals) and what is explicitly out of scope (non-goals). Prevents scope creep. | 10 |
28
+ | 3 | **Implementation Steps** | Ordered tasks with exact file paths, specific actions, and verification criteria per step. | 10 |
29
+ | 4 | **Testing Strategy** | Test types required (unit, integration, e2e), coverage targets, key test cases. Reference `.agent/rules/testing.md`. | 10 |
30
+ | 5 | **Security Considerations** | Applicable security requirements from `.agent/rules/security.md`. If genuinely not applicable, state `N/A — [one-line justification]`. | 10 |
31
+ | 6 | **Risks & Mitigations** | At least 1 risk with severity (Low/Medium/High) and concrete mitigation strategy. | 5 |
32
+ | 7 | **Success Criteria** | Measurable definition of done. Checkboxes with specific, verifiable outcomes. | 5 |
33
+
34
+ **Tier 1 Maximum: 60 points**
35
+
36
+ ---
37
+
38
+ ## Tier 2 — Required for Medium & Large Tasks
39
+
40
+ Plans for tasks affecting 3+ files or requiring 1+ hours MUST also include:
41
+
42
+ | # | Section | Description | Points |
43
+ |---|---------|-------------|--------|
44
+ | 8 | **Architecture Impact** | Affected components/modules, integration points, dependency changes. Include component diagram for Large tasks. | 4 |
45
+ | 9 | **API / Data Model Changes** | New or modified endpoints, request/response schemas, database schema changes. | 3 |
46
+ | 10 | **Rollback Strategy** | How to undo the change if deployment fails or defects are discovered post-release. | 3 |
47
+ | 11 | **Observability** | Logging additions, metrics to track, alerting changes, monitoring dashboards affected. | 2 |
48
+ | 12 | **Performance Impact** | Bundle size changes, query performance, latency estimates, memory usage. | 2 |
49
+ | 13 | **Documentation Updates** | Which docs need changing (ROADMAP, CHANGELOG, README, API docs, ADRs). Reference `.agent/rules/documentation.md`. | 2 |
50
+ | 14 | **Dependencies** | What blocks this work (prerequisites). What depends on this work (downstream impact). | 2 |
51
+ | 15 | **Alternatives Considered** | At least 1 rejected approach with reasoning for why the chosen approach is superior. | 2 |
52
+
53
+ **Tier 2 Maximum: 20 points (added to Tier 1)**
54
+
55
+ ---
56
+
57
+ ## Domain Enhancement Scoring
58
+
59
+ When the loading engine matches specific domains (e.g., frontend, backend, security), the corresponding domain enhancer sections from `domain-enhancers.md` MUST be included. Domain sections are scored as **bonus points** on top of the tier maximum:
60
+
61
+ | Condition | Scoring Impact |
62
+ |-----------|---------------|
63
+ | Domain matched and enhancer section present + substantive | +2 bonus points per domain |
64
+ | Domain matched but enhancer section missing | -2 penalty per missing domain (deducted from tier score) |
65
+ | Domain matched with "N/A — [valid reason]" | No bonus, no penalty |
66
+ | No domains matched | No impact |
67
+
68
+ **Maximum domain bonus**: +6 points (3 domains × 2 points each).
69
+
70
+ Domain scoring does NOT change the pass threshold — it provides additional quality signal. A plan can PASS without domain bonuses but will be penalized if matched domains are ignored.
71
+
72
+ ---
73
+
74
+ ## Scoring
75
+
76
+ | Task Size | Max Score | Pass Threshold (70%) |
77
+ |-----------|-----------|---------------------|
78
+ | Trivial | 60 | 42 |
79
+ | Medium | 80 | 56 |
80
+ | Large | 100 | 70 |
81
+
82
+ **Score Calculation**:
83
+ - A section earns full points when present and substantively populated
84
+ - A section earns zero points when missing or contains only placeholder text
85
+ - "N/A" with a valid justification counts as populated (earns full points)
86
+
87
+ **Verdict**:
88
+ - **PASS**: Score >= 70% of tier maximum
89
+ - **REVISE**: Score < 70% — identify missing sections and revise (max 2 revision cycles)
90
+
91
+ ---
92
+
93
+ ## Cross-Cutting Mandate
94
+
95
+ Regardless of task domain, these sections MUST be substantively addressed in every plan:
96
+
97
+ 1. **Security Considerations** (Tier 1, #5) — Reference `.agent/rules/security.md`
98
+ 2. **Testing Strategy** (Tier 1, #4) — Reference `.agent/rules/testing.md`
99
+ 3. **Documentation Updates** (Tier 2, #13) — Reference `.agent/rules/documentation.md`
100
+
101
+ If a cross-cutting section is genuinely not applicable, the plan MUST state:
102
+ ```
103
+ N/A — [specific reason this concern does not apply to this task]
104
+ ```
105
+
106
+ **NEVER silently omit a cross-cutting section.** Silent omission is a plan defect.
107
+
108
+ ---
109
+
110
+ ## Alignment Verification
111
+
112
+ Every plan MUST include an alignment check against operating constraints:
113
+
114
+ | Check | Question |
115
+ |-------|----------|
116
+ | Operating Constraints | Does this respect Trust > Optimization? |
117
+ | Existing Patterns | Does this follow project conventions? |
118
+ | Rules Consulted | Which rule files were reviewed? |
119
+ | Coding Style | Does this comply with `.agent/rules/coding-style.md`? |
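The Task Size Classification and Scoring tables in plan-schema.md, combined with the "use the HIGHER classification" rule from the plan-validation skill, can be sketched as below. The function names are hypothetical. Two boundary choices are assumptions: exactly 10 files is listed under both "3-10" and "10+", and the sketch resolves it upward to Large; effort between 30 minutes and 1 hour is not covered by any row, and the sketch treats it as Trivial because Tier 2 is only required at 1+ hours. The maximum scores and thresholds are taken from the Scoring table as written, including the Large maximum of 100.

```python
# Ordered from lowest to highest so that "higher" can be compared by index.
SIZES = ["Trivial", "Medium", "Large"]

# Values copied from the Scoring table.
MAX_SCORE = {"Trivial": 60, "Medium": 80, "Large": 100}
PASS_THRESHOLD = {"Trivial": 42, "Medium": 56, "Large": 70}


def size_from_files(file_count):
    """Classify by the number of files the plan references."""
    if file_count <= 2:
        return "Trivial"
    if file_count < 10:
        return "Medium"
    return "Large"  # assumption: exactly 10 files resolves upward


def size_from_effort(minutes):
    """Classify by estimated effort in minutes."""
    if minutes < 60:
        return "Trivial"  # assumption: the 30-60 minute gap stays Trivial
    if minutes <= 240:
        return "Medium"
    return "Large"


def classify(file_count, effort_minutes):
    """Use the HIGHER classification when the two indicators conflict."""
    candidates = [size_from_files(file_count), size_from_effort(effort_minutes)]
    return max(candidates, key=SIZES.index)


def passes(size, score):
    """PASS when the score reaches 70% of the maximum for the task size."""
    return score >= PASS_THRESHOLD[size]
```

For example, a plan touching 2 files with a 2-hour estimate classifies as Medium under this sketch, so it needs Tier 2 sections and at least 56 points to pass.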