antigravity-ai-kit 3.1.1 → 3.2.0
- package/.agent/agents/planner.md +205 -62
- package/.agent/contexts/plan-quality-log.md +30 -0
- package/.agent/engine/loading-rules.json +37 -3
- package/.agent/hooks/hooks.json +10 -0
- package/.agent/manifest.json +4 -3
- package/.agent/skills/plan-validation/SKILL.md +192 -0
- package/.agent/skills/plan-writing/SKILL.md +47 -8
- package/.agent/skills/plan-writing/domain-enhancers.md +114 -0
- package/.agent/skills/plan-writing/plan-retrospective.md +116 -0
- package/.agent/skills/plan-writing/plan-schema.md +119 -0
- package/.agent/workflows/plan.md +49 -5
- package/README.md +30 -29
- package/bin/ag-kit.js +26 -5
- package/lib/agent-registry.js +17 -3
- package/lib/agent-reputation.js +3 -11
- package/lib/circuit-breaker.js +195 -0
- package/lib/cli-commands.js +88 -1
- package/lib/config-validator.js +274 -0
- package/lib/conflict-detector.js +29 -22
- package/lib/constants.js +35 -0
- package/lib/engineering-manager.js +9 -27
- package/lib/error-budget.js +105 -29
- package/lib/hook-system.js +8 -4
- package/lib/identity.js +22 -27
- package/lib/io.js +74 -0
- package/lib/loading-engine.js +248 -35
- package/lib/logger.js +118 -0
- package/lib/marketplace.js +43 -20
- package/lib/plugin-system.js +55 -31
- package/lib/plugin-verifier.js +197 -0
- package/lib/rate-limiter.js +113 -0
- package/lib/security-scanner.js +1 -4
- package/lib/self-healing.js +58 -24
- package/lib/session-manager.js +51 -48
- package/lib/skill-sandbox.js +1 -1
- package/lib/task-governance.js +10 -11
- package/lib/task-model.js +42 -27
- package/lib/updater.js +1 -1
- package/lib/verify.js +4 -4
- package/lib/workflow-engine.js +88 -68
- package/lib/workflow-events.js +166 -0
- package/lib/workflow-persistence.js +19 -19
- package/package.json +2 -2
@@ -0,0 +1,192 @@
+---
+name: plan-validation
+description: Quality gate for implementation plans. Validates schema compliance, cross-cutting concerns, and completeness scoring before user presentation.
+version: 1.0.0
+triggers: [post-plan-creation]
+allowed-tools: Read, Grep
+---
+
+# Plan Validation
+
+> Quality gate ensuring every implementation plan meets enterprise standards
+> before being presented to the user for approval.
+
+---
+
+## Overview
+
+This skill is used by the planner agent as a self-validation checklist after creating a plan but BEFORE presenting it to the user. The planner applies the validation pipeline below to its own output, verifying against the quality schema (`plan-schema.md`), checking cross-cutting concerns, and calculating a completeness score. Plans that fail validation are revised before presentation.
+
+**Invocation**: The planner runs this checklist during `/plan` workflow step 3.5. This is NOT a separate agent — the planner validates its own plan against these criteria.
+
+---
+
+## Validation Pipeline
+
+### Step 1: Task Size Classification
+
+Determine the task size from the plan content:
+
+| Indicator | Classification |
+|-----------|---------------|
+| Plan references 1-2 files | **Trivial** |
+| Plan references 3-10 files | **Medium** |
+| Plan references 10+ files | **Large** |
+| Estimated effort < 30 minutes | **Trivial** |
+| Estimated effort 1-4 hours | **Medium** |
+| Estimated effort > 4 hours | **Large** |
+
+Use the HIGHER classification when indicators conflict.
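The Step 1 rules can be sketched as a small function. This is a minimal illustration, not code from the package: `TaskSize`, `size_from_files`, `size_from_effort`, and `classify_task` are hypothetical names. The table leaves some inputs ambiguous (exactly 10 files, and effort between 30 minutes and 1 hour), so the boundaries chosen for those cases are assumptions and are marked as such in the comments.

```python
from enum import IntEnum


class TaskSize(IntEnum):
    # Ordered so that max() picks the HIGHER classification.
    TRIVIAL = 1
    MEDIUM = 2
    LARGE = 3


def size_from_files(file_count: int) -> TaskSize:
    # Assumption: exactly 10 files counts as Medium; more than 10 is Large.
    if file_count <= 2:
        return TaskSize.TRIVIAL
    if file_count <= 10:
        return TaskSize.MEDIUM
    return TaskSize.LARGE


def size_from_effort(hours: float) -> TaskSize:
    # Assumption: the uncovered 30-minute-to-1-hour range is treated as Medium.
    if hours < 0.5:
        return TaskSize.TRIVIAL
    if hours <= 4:
        return TaskSize.MEDIUM
    return TaskSize.LARGE


def classify_task(file_count: int, hours: float) -> TaskSize:
    # "Use the HIGHER classification when indicators conflict."
    return max(size_from_files(file_count), size_from_effort(hours))
```

For example, a plan touching 2 files with a 6-hour estimate would be classified as Large, because the effort indicator outranks the file-count indicator.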
+
+### Step 2: Schema Compliance
+
+Verify all required sections are present and substantively populated:
+
+**Tier 1 Sections (Always Required)**:
+
+| # | Section | Check |
+|---|---------|-------|
+| 1 | Context & Problem Statement | Present and >= 2 sentences |
+| 2 | Goals & Non-Goals | Both goals AND non-goals stated |
+| 3 | Implementation Steps | Steps have file paths and verification criteria |
+| 4 | Testing Strategy | Test types specified with coverage targets |
+| 5 | Security Considerations | Substantive content or explicit "N/A — [reason]" |
+| 6 | Risks & Mitigations | At least 1 risk with severity and mitigation |
+| 7 | Success Criteria | Measurable, checkable outcomes |
+
+**Tier 2 Sections (Required for Medium/Large)**:
+
+| # | Section | Check |
+|---|---------|-------|
+| 8 | Architecture Impact | Components and files identified |
+| 9 | API / Data Model Changes | Schemas defined (or N/A with reason) |
+| 10 | Rollback Strategy | Concrete undo procedure |
+| 11 | Observability | Logging/metrics plan |
+| 12 | Performance Impact | Assessment provided |
+| 13 | Documentation Updates | Specific docs identified |
+| 14 | Dependencies | Blockers and dependents listed |
+| 15 | Alternatives Considered | At least 1 rejected alternative with reasoning |
+
+### Step 3: Cross-Cutting Verification
+
+These sections MUST be non-empty regardless of task domain:
+
+| Section | Acceptable Content |
+|---------|-------------------|
+| **Security Considerations** | Specific requirements from `rules/security.md` OR `N/A — [valid justification]` |
+| **Testing Strategy** | At least unit test plan with coverage target OR `N/A — [valid justification]` |
+| **Documentation Updates** | Specific docs listed OR `N/A — no docs affected` |
+
+**Unacceptable**: Empty section, placeholder text, section completely missing.
+
+### Step 4: Specificity Audit
+
+Verify that implementation steps are actionable, not vague:
+
+| Vague (FAIL) | Specific (PASS) |
+|-------------|-----------------|
+| "Update the component" | "Add `onSubmit` handler to `src/components/LoginForm.tsx`" |
+| "Add tests" | "Create `tests/auth.test.js` with login success/failure cases" |
+| "Fix the bug" | "Change line 42 of `lib/parser.js`: replace `==` with `===`" |
+| "Style the UI" | "Add Tailwind classes `flex gap-4 p-6` to `Header.tsx`" |
+
+**Rule**: Every implementation step MUST include a file path.
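The file-path rule lends itself to a mechanical first pass. The sketch below is a rough heuristic under stated assumptions, not the planner's actual check: it treats any token ending in a short alphabetic extension (such as `.tsx` or `.js`) as a file path, and the names `FILE_PATH_PATTERN` and `steps_missing_paths` are hypothetical. It can produce false positives on abbreviations like "e.g", so a human or agent review would still be needed.

```python
import re

# Heuristic (an assumption): a file path is a run of word characters, dots,
# slashes, or hyphens that ends in ".<1-5 letters>", e.g. "lib/parser.js".
FILE_PATH_PATTERN = re.compile(r"[\w./-]*\w\.[A-Za-z]{1,5}\b")


def steps_missing_paths(steps: list[str]) -> list[str]:
    """Return the implementation steps that contain no recognizable file path."""
    return [step for step in steps if not FILE_PATH_PATTERN.search(step)]
```

Applied to the table above, the four "Vague" entries would be reported and the four "Specific" entries would pass.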
+
+### Step 5: Completeness Scoring
+
+Calculate the score using the rubric from `plan-schema.md`:
+
+**Tier 1 Scoring** (60 points max):
+
+| Section | Points |
+|---------|--------|
+| Context & Problem Statement | 10 |
+| Goals & Non-Goals | 10 |
+| Implementation Steps | 10 |
+| Testing Strategy | 10 |
+| Security Considerations | 10 |
+| Risks & Mitigations | 5 |
+| Success Criteria | 5 |
+
+**Tier 2 Scoring** (20 additional points):
+
+| Section | Points |
+|---------|--------|
+| Architecture Impact | 4 |
+| API / Data Model Changes | 3 |
+| Rollback Strategy | 3 |
+| Observability | 2 |
+| Performance Impact | 2 |
+| Documentation Updates | 2 |
+| Dependencies | 2 |
+| Alternatives Considered | 2 |
+
+**Score Rules**:
+- Section present and substantively populated = full points
+- Section present but placeholder/minimal = half points
+- Section missing = 0 points
+- "N/A" with valid justification = full points
+
+**Domain Enhancement Scoring** (bonus/penalty on top of tier score):
+- For each domain in `matchedDomains` from the loading engine:
+- Domain enhancer section present and substantive = **+2 bonus points**
+- Domain matched but enhancer section missing = **-2 penalty points**
+- Domain matched with "N/A — [valid reason]" = no bonus, no penalty
+- Maximum domain bonus: +6 points (3 domains × 2 points)
+- Domain scoring does not change the pass threshold — it provides additional quality signal
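The Step 5 rubric can be expressed as a lookup table plus two small functions. This is a sketch under assumptions rather than the skill's real implementation: the status labels (`"full"`, `"minimal"`, `"missing"`, `"na_justified"`) and the function names are hypothetical, and the penalty is left uncapped because the text caps only the bonus.

```python
TIER1_POINTS = {
    "Context & Problem Statement": 10,
    "Goals & Non-Goals": 10,
    "Implementation Steps": 10,
    "Testing Strategy": 10,
    "Security Considerations": 10,
    "Risks & Mitigations": 5,
    "Success Criteria": 5,
}
TIER2_POINTS = {
    "Architecture Impact": 4,
    "API / Data Model Changes": 3,
    "Rollback Strategy": 3,
    "Observability": 2,
    "Performance Impact": 2,
    "Documentation Updates": 2,
    "Dependencies": 2,
    "Alternatives Considered": 2,
}
# Score Rules: full points, half points, zero, and justified N/A = full points.
STATUS_FACTOR = {"full": 1.0, "na_justified": 1.0, "minimal": 0.5, "missing": 0.0}
MAX_DOMAIN_BONUS = 6  # 3 domains x 2 points


def tier_score(statuses: dict[str, str], include_tier2: bool) -> tuple[float, int]:
    """Return (earned points, tier maximum); unlisted sections count as missing."""
    points = dict(TIER1_POINTS)
    if include_tier2:
        points.update(TIER2_POINTS)
    earned = sum(
        pts * STATUS_FACTOR[statuses.get(name, "missing")]
        for name, pts in points.items()
    )
    return earned, sum(points.values())


def domain_adjustment(domain_statuses: dict[str, str]) -> int:
    """+2 per substantive enhancer section (capped), -2 per missing one."""
    values = list(domain_statuses.values())
    bonus = min(2 * values.count("full"), MAX_DOMAIN_BONUS)
    penalty = 2 * values.count("missing")  # assumption: penalty is not capped
    return bonus - penalty
```

Keeping the domain adjustment separate from the tier score mirrors the statement that domain scoring is an additional signal and does not move the pass threshold.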
+
+### Step 6: Verdict
+
+| Condition | Verdict | Action |
+|-----------|---------|--------|
+| Score >= 70% of tier max | **PASS** | Present plan to user with score |
+| Score < 70% of tier max | **REVISE** | Identify gaps, revise, re-validate |
+
+**Revision Protocol**:
+1. Identify the specific missing or weak sections
+2. Provide targeted instructions to the planner for revision
+3. Re-run validation after revision
+4. Maximum 2 revision cycles — then present with warnings
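The verdict rule and the bounded revision loop might look roughly like this. It is a sketch only: `verdict` and `validate_with_revisions` are hypothetical names, and the `score_plan` and `revise_plan` callables stand in for whatever the planner actually does to score and revise a plan. The threshold is compared with integer arithmetic to avoid floating-point rounding at the 70% boundary.

```python
from typing import Any, Callable

MAX_REVISIONS = 2  # "Maximum 2 revision cycles"


def verdict(score: float, tier_max: int) -> str:
    # PASS when score >= 70% of tier max, written without float multiplication.
    return "PASS" if score * 10 >= 7 * tier_max else "REVISE"


def validate_with_revisions(
    plan: Any,
    score_plan: Callable[[Any], float],
    revise_plan: Callable[[Any], Any],
    tier_max: int,
) -> tuple[Any, float, bool]:
    """Return (final plan, final score, present_with_warnings)."""
    revisions = 0
    while True:
        score = score_plan(plan)
        if verdict(score, tier_max) == "PASS":
            return plan, score, False
        if revisions == MAX_REVISIONS:
            # Out of revision cycles: present anyway, flagged with warnings.
            return plan, score, True
        plan = revise_plan(plan)
        revisions += 1
```

The loop scores the plan at most three times (the initial pass plus two revisions), which matches the "validate, don't block" intent: a plan that still falls short is shown with warnings instead of being held back indefinitely.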
+
+---
+
+## Output Format
+
+After validation, append to the plan:
+
+```markdown
+## Plan Quality Assessment
+
+**Task Size**: [Trivial/Medium/Large]
+**Quality Score**: [X]/[max] ([percentage]%) [+N domain bonus / -N domain penalty]
+**Verdict**: [PASS/REVISE]
+
+### Validation Results
+
+| Check | Status |
+|-------|--------|
+| Schema Compliance | [sections present]/[sections required] |
+| Cross-Cutting Concerns | [All addressed / Missing: X, Y] |
+| Specificity Audit | [All steps have file paths / X steps lack paths] |
+| Domain Enhancement | [N domains matched, N enhancer sections present] |
+| Rules Consulted | [list of rule files referenced] |
+| Matched Domains | [list from loading engine] |
+```
+
+---
+
+## Integration
+
+- **Invoked by**: `/plan` workflow (step 3.5, between plan creation and user presentation)
+- **Depends on**: `plan-schema.md` for scoring rubric, `domain-enhancers.md` for domain sections
+- **Feeds into**: Plan quality score shown to user alongside the plan
+- **Learning**: Quality scores are logged to `.agent/contexts/plan-quality-log.md` for adaptive improvement
+
+---
+
+## Principles
+
+1. **Validate, don't block**: The goal is quality improvement, not gatekeeping. After 2 revision cycles, present the plan with warnings rather than blocking indefinitely.
+2. **Score transparently**: The user sees the quality score and understands what was checked.
+3. **Learn from outcomes**: Post-implementation retrospectives compare predicted vs. actual to calibrate future scoring.
+4. **Cross-cutting is non-negotiable**: Security, testing, and documentation sections must ALWAYS be addressed. This is the single most impactful quality gate.
@@ -42,17 +42,29 @@ Framework for breaking down work into clear, actionable tasks with verification
 
 ## Planning Principles
 
-> 🔴 **NO fixed templates. Each plan is UNIQUE to the task.**
+> 🔴 **NO fixed templates. Each plan's CONTENT is UNIQUE to the task.**
+> ✅ **Every plan MUST satisfy the quality schema in `plan-schema.md`.**
+> Dynamic content within a consistent structure = the standard.
 
-### Principle 1:
+### Principle 1: Right-Size to Task Tier
 
-
-| --------------------------- | --------------------- |
-| 50 tasks with sub-sub-tasks | 5-10 clear tasks max |
-| Every micro-step listed | Only actionable items |
-| Verbose descriptions | One-line per task |
+Plan length MUST match task complexity:
 
-
+| Task Tier | Max Sections | Max Tasks | Guideline |
+| --------- | ------------ | --------- | --------- |
+| **Trivial** (1-2 files) | Tier 1 only (7 sections) | 5-8 tasks | ~1 page — concise, no specialist synthesis |
+| **Medium** (3-10 files) | Tier 1 + Tier 2 (15 sections) | 8-15 tasks | 2-3 pages — includes specialist input |
+| **Large** (10+ files) | Tier 1 + Tier 2 + domains (15+ sections) | 15-25 tasks | 3-5 pages — full multi-agent synthesis |
+
+| ❌ Wrong | ✅ Right |
+| -------- | -------- |
+| 50 tasks with sub-sub-tasks | Right-sized task count per tier |
+| Every micro-step listed | Only actionable items |
+| Verbose descriptions | One-line per task |
+| Large task crammed into 1 page | Large task gets full Tier 2 coverage |
+| Trivial task with 15 sections | Trivial task uses Tier 1 only |
+
+> **Rule:** Trivial tasks stay concise (~1 page). Medium/Large tasks expand to cover all required tier sections. Never sacrifice completeness for brevity on complex tasks.
 
 ---
 
@@ -98,6 +110,33 @@ Framework for breaking down work into clear, actionable tasks with verification
 
 ---
 
+### Principle 5: Cross-Cutting Concerns Are Mandatory
+
+Every plan MUST explicitly address:
+
+1. **Security**: Reference `.agent/rules/security.md` — what security implications exist?
+2. **Testing**: Reference `.agent/rules/testing.md` — what test types are needed? Coverage targets?
+3. **Documentation**: Reference `.agent/rules/documentation.md` — which docs need updating?
+
+If a concern is genuinely not applicable, state `N/A — [one-line justification]`.
+
+**NEVER silently omit these sections.** Silent omission is a plan defect.
+
+---
+
+### Principle 6: Schema Compliance
+
+Every plan MUST satisfy the quality schema defined in `plan-schema.md`:
+
+- **Tier 1** sections are ALWAYS required. Omitting any Tier 1 section is a plan defect.
+- **Tier 2** sections are required for Medium and Large tasks (3+ files or 1+ hours).
+- Before presenting a plan, validate it against the schema checklist.
+- Plans scoring below 70% of their tier maximum must be revised before presentation.
+
+See also: `domain-enhancers.md` for domain-specific plan sections.
+
+---
+
 ## Plan Structure (Minimal)
 
 ```markdown
@@ -0,0 +1,114 @@
+# Domain-Specific Plan Enhancers
+
+> When the loading engine matches specific domains for a task, the planner
+> MUST include the corresponding domain-specific sections below.
+> These sections are additive to the base plan schema (Tier 1 + Tier 2).
+
+---
+
+## Frontend Domain
+
+**Triggered when**: `frontend` domain matched (keywords: react, next.js, component, css, ui, ux, etc.)
+
+Include in plan:
+
+- **Accessibility (WCAG 2.1 AA)**: Identify components requiring ARIA labels, keyboard navigation, screen reader support, color contrast compliance
+- **Responsive Design**: Specify breakpoints to test (mobile 375px, tablet 768px, desktop 1280px), identify layout changes per breakpoint
+- **Bundle Size Impact**: Estimate size of new dependencies, identify tree-shaking opportunities, consider code splitting for new routes
+- **Core Web Vitals**: Assess impact on LCP (largest contentful paint), CLS (cumulative layout shift), INP (interaction to next paint)
+- **Component Composition**: Specify component hierarchy, prop interfaces, state management approach (local vs. global)
+
+---
+
+## Backend Domain
+
+**Triggered when**: `backend` domain matched (keywords: api, server, node, express, middleware, endpoint, etc.)
+
+Include in plan:
+
+- **API Contract**: Define request/response schemas (Zod validation), HTTP methods, status codes, error response format
+- **Error Handling**: Specify error response structure, error codes, client-facing messages vs. internal logging
+- **Rate Limiting**: Identify endpoints requiring rate limits, specify limits (requests/minute/user), throttling strategy
+- **Middleware Chain**: Document new middleware additions, execution order, impact on existing middleware stack
+- **Database Interaction**: Query patterns (parameterized), transaction boundaries, connection pooling impact
+
+---
+
+## Database Domain
+
+**Triggered when**: `database` domain matched (keywords: database, sql, migration, schema, query, orm, etc.)
+
+Include in plan:
+
+- **Migration Rollback**: Write both up and down migrations, test rollback procedure before deploying
+- **Index Impact Analysis**: Identify queries affected by schema changes, recommend index additions/removals, estimate query performance impact
+- **Data Integrity**: Define constraints (foreign keys, unique, not null, check), cascade behavior for deletions
+- **Backup Verification**: Verify backup exists before destructive migrations, test restore procedure for critical tables
+- **Query Performance**: Benchmark key queries before and after changes, set acceptable latency thresholds
+
+---
+
+## DevOps Domain
+
+**Triggered when**: `devops` domain matched (keywords: deploy, ci, cd, docker, kubernetes, pipeline, etc.)
+
+Include in plan:
+
+- **Infrastructure Changes**: Specify IaC modifications (Dockerfile, docker-compose, CI config), environment variable additions
+- **Monitoring & Alerting**: Define new metrics to track, alerting thresholds, dashboard updates
+- **Progressive Rollout**: Strategy for deployment (canary → staged → full), rollback triggers, health check endpoints
+- **Runbook Updates**: Document operational procedures for the new functionality, incident response steps
+- **Environment Parity**: Verify changes work across dev, staging, and production environments
+
+---
+
+## Security Domain
+
+**Triggered when**: `security` domain matched (keywords or implicit triggers: auth, login, signup, form, payment, etc.)
+
+Include in plan (in addition to mandatory security considerations):
+
+- **Threat Model (STRIDE)**: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege — assess each for the change
+- **Authentication Flow Impact**: How the change affects login, session management, token lifecycle
+- **Data Classification**: Identify data sensitivity levels (public, internal, confidential, restricted), storage and transmission requirements
+- **Compliance Requirements**: GDPR/CCPA implications (data minimization, consent, right to erasure)
+- **Secret Management**: New secrets required, rotation policy, storage mechanism (environment variables only)
+
+---
+
+## Performance Domain
+
+**Triggered when**: `performance` domain matched (keywords: slow, optimize, speed, bundle, lighthouse, cache, etc.)
+
+Include in plan:
+
+- **Performance Budget**: Define acceptable thresholds (page load time, API response time, memory usage)
+- **Profiling Strategy**: Tools and methods to measure before/after (Lighthouse, Chrome DevTools, load testing)
+- **Caching Strategy**: Cache layers (browser, CDN, application, database), TTL values, invalidation approach
+- **Lazy Loading**: Identify resources for deferred loading, intersection observer patterns, dynamic imports
+- **Benchmarking**: Define benchmark suite, baseline measurements, regression detection
+
+---
+
+## Mobile Domain
+
+**Triggered when**: `mobile` domain matched (keywords: mobile, react native, expo, ios, android, etc.)
+
+Include in plan:
+
+- **Platform Parity**: Identify iOS vs. Android differences in behavior, UI, or API access
+- **Offline Support**: Define offline behavior, data sync strategy, conflict resolution
+- **App Store Guidelines**: Compliance with Apple/Google review guidelines for the feature
+- **Native Modules**: Bridge requirements, native module dependencies, build configuration changes
+- **Device Testing**: Target device matrix, screen size variations, OS version compatibility
+
+---
+
+## Usage
+
+The planner reads this file when domain-specific sections are needed:
+
+1. Loading engine returns `matchedDomains` array
+2. For each matched domain, include the corresponding enhancer section
+3. Domain sections are added AFTER the base plan schema sections
+4. Multiple domains can be active simultaneously (e.g., frontend + backend for a full-stack feature)
@@ -0,0 +1,116 @@
+# Plan Retrospective
+
+> Post-implementation review protocol for measuring plan accuracy
+> and feeding learnings back into future plan generation.
+
+---
+
+## Overview
+
+After a planned task reaches the VERIFY phase (all implementation complete, tests running), this retrospective compares the original plan against actual implementation to identify accuracy gaps and improve future planning.
+
+---
+
+## When to Run
+
+- **Primary Trigger**: The `plan-complete` hook in `.agent/hooks/hooks.json` fires when workflow state transitions to VERIFY phase
+- **Manual Trigger**: User runs `/retrospective` on a completed plan
+- **Data Flow**: The hook reads the original plan file (`docs/PLAN-{slug}.md`), compares against `git diff --name-only` from the plan's creation timestamp, then appends results to `.agent/contexts/plan-quality-log.md`
+- **Frequency**: After every planned task completes implementation
+- **Blocking**: No — this is a learning activity, not a quality gate (severity: medium, onFailure: log)
+- **Planner Integration**: The planner reads `plan-quality-log.md` during Requirements Analysis (Step 1) to adjust estimates and predictions for future plans
+
+---
+
+## Retrospective Dimensions
+
+### 1. File Prediction Accuracy
+
+Compare files listed in the plan vs. files actually modified:
+
+| Metric | Measurement |
+|--------|-------------|
+| **Files Predicted** | Count of unique file paths in the plan |
+| **Files Actually Modified** | Count from `git diff --name-only` against plan start |
+| **Prediction Accuracy** | Predicted / Actual (percentage) |
+| **Surprise Files** | Files modified that were NOT in the plan |
+| **Unused Predictions** | Files in the plan that were NOT modified |
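The file-prediction metrics above reduce to set arithmetic. The sketch below is illustrative only: `file_prediction_metrics` is a hypothetical helper, and it takes both file lists as plain arguments instead of extracting them from the plan or from `git diff --name-only`, since that extraction logic is not shown in this diff. Prediction accuracy follows the table's definition (predicted count divided by actual count), so it can exceed 100% when a plan over-predicts.

```python
def file_prediction_metrics(predicted: list[str], actual: list[str]) -> dict:
    """Compare planned file paths against the files that were really modified."""
    predicted_set, actual_set = set(predicted), set(actual)
    accuracy = (
        len(predicted_set) / len(actual_set) * 100 if actual_set else None
    )
    return {
        "files_predicted": len(predicted_set),
        "files_actual": len(actual_set),
        # Predicted / Actual as a percentage, per the table; None if nothing changed.
        "prediction_accuracy_pct": accuracy,
        "surprise_files": sorted(actual_set - predicted_set),
        "unused_predictions": sorted(predicted_set - actual_set),
    }
```

Returning the surprise and unused files as sorted lists, not just counts, keeps enough detail to write the one-sentence key learning later.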
+
+### 2. Task Completeness
+
+| Metric | Measurement |
+|--------|-------------|
+| **Tasks Planned** | Count of implementation steps in original plan |
+| **Tasks Completed** | Steps that matched actual work |
+| **Surprise Tasks** | Work done that wasn't in the plan |
+| **Dropped Tasks** | Planned tasks that turned out unnecessary |
+| **Completeness Score** | (Completed - Surprise) / Planned |
+
+### 3. Estimate Accuracy
+
+| Metric | Measurement |
+|--------|-------------|
+| **Estimated Effort** | Total hours from plan |
+| **Actual Effort** | Approximate actual time spent |
+| **Drift** | Actual / Estimated (ratio; 1.0 = perfect) |
+| **Drift Direction** | Over-estimated / Under-estimated / Accurate |
+
+### 4. Risk Prediction
+
+| Metric | Measurement |
+|--------|-------------|
+| **Risks Identified** | Count of risks in plan |
+| **Risks Materialized** | Planned risks that actually occurred |
+| **Surprise Risks** | Unplanned risks that emerged |
+| **Risk Prediction Rate** | Materialized / (Materialized + Surprise) |
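The ratio metrics in dimensions 2 to 4 are simple enough to write down directly. The functions below are a hedged sketch with hypothetical names. Two details are assumptions not stated in the tables: a zero denominator yields `None`, and a drift within 10% of 1.0 is labeled "Accurate".

```python
from typing import Optional


def completeness_score(planned: int, completed: int, surprise: int) -> Optional[float]:
    # (Completed - Surprise) / Planned
    return (completed - surprise) / planned if planned else None


def estimate_drift(estimated_hours: float, actual_hours: float) -> Optional[float]:
    # Actual / Estimated (ratio; 1.0 = perfect)
    return actual_hours / estimated_hours if estimated_hours else None


def drift_direction(drift: float, tolerance: float = 0.10) -> str:
    # Assumption: +/-10% around 1.0 counts as "Accurate".
    if abs(drift - 1.0) <= tolerance:
        return "Accurate"
    # Drift above 1.0 means the work took longer than the plan estimated.
    return "Under-estimated" if drift > 1.0 else "Over-estimated"


def risk_prediction_rate(materialized: int, surprise: int) -> Optional[float]:
    # Materialized / (Materialized + Surprise)
    total = materialized + surprise
    return materialized / total if total else None
```

Note that the completeness formula subtracts surprise tasks, so a plan whose steps were all completed can still score below 1.0 if unplanned work was needed.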
+
+### 5. Specialist Contribution Value
+
+| Specialist | Contribution Accurate? | Key Insight That Helped |
+|-----------|----------------------|------------------------|
+| Architect | Yes/No/Partial | [what was most useful] |
+| Security-Reviewer | Yes/No/Partial | [what was most useful] |
+| TDD-Guide | Yes/No/Partial | [what was most useful] |
+
+---
+
+## Output Format
+
+Append one row to `.agent/contexts/plan-quality-log.md`:
+
+```markdown
+| [date] | [plan name] | [quality score] | [files predicted] | [files actual] | [surprise count] | [estimate drift] | [key learning] |
+```
+
+### Key Learning Format
+
+Capture the single most important learning in one sentence:
+
+**Good examples**:
+- "Auth tasks consistently require middleware changes not predicted in plans"
+- "Database migration effort was 2x underestimated due to index rebuilding"
+- "Frontend plans should always include accessibility testing as a task"
+
+**Bad examples**:
+- "The plan was good" (not actionable)
+- "Everything went as expected" (no learning value)
+
+---
+
+## Adaptive Feedback
+
+The planner agent reads `plan-quality-log.md` at the start of each planning session to:
+
+1. **Adjust estimates**: If historical drift is consistently 1.5x, multiply estimates by 1.5
+2. **Predict surprise files**: If auth tasks consistently miss middleware, proactively include middleware files
+3. **Weight risks**: If certain risk categories historically materialize, elevate their severity
+4. **Improve domain sections**: If specific domain enhancer sections are consistently unhelpful, deprioritize them
+5. **Value specialists**: If security-reviewer contributions are consistently accurate, weight their input more heavily
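The first feedback item, adjusting estimates from historical drift, could be approximated as follows. This is a sketch under several assumptions: the log is assumed to contain eight-column rows in the format shown under Output Format, with drift in the seventh column written like `1.4x`; "consistently" is interpreted as the median over at least three entries; and `parse_drifts` and `adjusted_estimate` are hypothetical names, not functions from the package.

```python
from statistics import median


def parse_drifts(log_text: str) -> list[float]:
    """Extract the estimate-drift column from Markdown table rows in the log."""
    drifts = []
    for line in log_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 8:
            continue  # not a log row in the assumed eight-column format
        try:
            drifts.append(float(cells[6].rstrip("x")))
        except ValueError:
            continue  # header or separator row
    return drifts


def adjusted_estimate(hours: float, drifts: list[float], min_samples: int = 3) -> float:
    # Assumption: with fewer than min_samples entries there is no "consistent"
    # drift yet, so the original estimate is returned unchanged.
    if len(drifts) < min_samples:
        return hours
    return hours * median(drifts)
```

Using the median instead of the mean keeps one badly misjudged plan from skewing every later estimate.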
+
+---
+
+## Example Retrospective Entry
+
+```
+| 2026-03-16 | PLAN-user-auth | 72/80 | 8 | 11 | 3 (middleware, session config, error handler) | 1.4x | Auth plans should include middleware and session store files by default |
+```
@@ -0,0 +1,119 @@
+# Plan Quality Schema
+
+> Defines the mandatory structure and scoring rubric for implementation plans.
+> Every plan produced by the `/plan` workflow MUST satisfy this schema.
+
+---
+
+## Task Size Classification
+
+Before applying the schema, classify the task:
+
+| Size | Criteria | Required Tiers |
+|------|----------|----------------|
+| **Trivial** | 1-2 files, <30 minutes estimated effort | Tier 1 only |
+| **Medium** | 3-10 files, 1-4 hours estimated effort | Tier 1 + Tier 2 |
+| **Large** | 10+ files, multi-day effort | Tier 1 + Tier 2 + architect consultation |
+
+---
+
+## Tier 1 — Always Required
+
+Every plan, regardless of task size, MUST include these sections:
+
+| # | Section | Description | Points |
+|---|---------|-------------|--------|
+| 1 | **Context & Problem Statement** | Why this change is needed. 2-3 sentences covering the problem, impact, and motivation. | 10 |
+| 2 | **Goals & Non-Goals** | What the plan achieves (goals) and what is explicitly out of scope (non-goals). Prevents scope creep. | 10 |
+| 3 | **Implementation Steps** | Ordered tasks with exact file paths, specific actions, and verification criteria per step. | 10 |
+| 4 | **Testing Strategy** | Test types required (unit, integration, e2e), coverage targets, key test cases. Reference `.agent/rules/testing.md`. | 10 |
+| 5 | **Security Considerations** | Applicable security requirements from `.agent/rules/security.md`. If genuinely not applicable, state `N/A — [one-line justification]`. | 10 |
+| 6 | **Risks & Mitigations** | At least 1 risk with severity (Low/Medium/High) and concrete mitigation strategy. | 5 |
+| 7 | **Success Criteria** | Measurable definition of done. Checkboxes with specific, verifiable outcomes. | 5 |
+
+**Tier 1 Maximum: 60 points**
+
+---
+
+## Tier 2 — Required for Medium & Large Tasks
+
+Plans for tasks affecting 3+ files or requiring 1+ hours MUST also include:
+
+| # | Section | Description | Points |
+|---|---------|-------------|--------|
+| 8 | **Architecture Impact** | Affected components/modules, integration points, dependency changes. Include component diagram for Large tasks. | 4 |
+| 9 | **API / Data Model Changes** | New or modified endpoints, request/response schemas, database schema changes. | 3 |
+| 10 | **Rollback Strategy** | How to undo the change if deployment fails or defects are discovered post-release. | 3 |
+| 11 | **Observability** | Logging additions, metrics to track, alerting changes, monitoring dashboards affected. | 2 |
+| 12 | **Performance Impact** | Bundle size changes, query performance, latency estimates, memory usage. | 2 |
+| 13 | **Documentation Updates** | Which docs need changing (ROADMAP, CHANGELOG, README, API docs, ADRs). Reference `.agent/rules/documentation.md`. | 2 |
+| 14 | **Dependencies** | What blocks this work (prerequisites). What depends on this work (downstream impact). | 2 |
+| 15 | **Alternatives Considered** | At least 1 rejected approach with reasoning for why the chosen approach is superior. | 2 |
+
+**Tier 2 Maximum: 20 points (added to Tier 1)**
+
+---
+
+## Domain Enhancement Scoring
+
+When the loading engine matches specific domains (e.g., frontend, backend, security), the corresponding domain enhancer sections from `domain-enhancers.md` MUST be included. Domain sections are scored as **bonus points** on top of the tier maximum:
+
+| Condition | Scoring Impact |
+|-----------|---------------|
+| Domain matched and enhancer section present + substantive | +2 bonus points per domain |
+| Domain matched but enhancer section missing | -2 penalty per missing domain (deducted from tier score) |
+| Domain matched with "N/A — [valid reason]" | No bonus, no penalty |
+| No domains matched | No impact |
+
+**Maximum domain bonus**: +6 points (3 domains × 2 points each).
+
+Domain scoring does NOT change the pass threshold — it provides additional quality signal. A plan can PASS without domain bonuses but will be penalized if matched domains are ignored.
+
+---
+
+## Scoring
+
+| Task Size | Max Score | Pass Threshold (70%) |
+|-----------|-----------|---------------------|
+| Trivial | 60 | 42 |
+| Medium | 80 | 56 |
+| Large | 100 | 70 |
+
+**Score Calculation**:
+- A section earns full points when present and substantively populated
+- A section earns zero points when missing or contains only placeholder text
+- "N/A" with a valid justification counts as populated (earns full points)
+
+**Verdict**:
+- **PASS**: Score >= 70% of tier maximum
+- **REVISE**: Score < 70% — identify missing sections and revise (max 2 revision cycles)
+
+---
+
+## Cross-Cutting Mandate
+
+Regardless of task domain, these sections MUST be substantively addressed in every plan:
+
+1. **Security Considerations** (Tier 1, #5) — Reference `.agent/rules/security.md`
+2. **Testing Strategy** (Tier 1, #4) — Reference `.agent/rules/testing.md`
+3. **Documentation Updates** (Tier 2, #13) — Reference `.agent/rules/documentation.md`
+
+If a cross-cutting section is genuinely not applicable, the plan MUST state:
+```
+N/A — [specific reason this concern does not apply to this task]
+```
+
+**NEVER silently omit a cross-cutting section.** Silent omission is a plan defect.
+
+---
+
+## Alignment Verification
+
+Every plan MUST include an alignment check against operating constraints:
+
+| Check | Question |
+|-------|----------|
+| Operating Constraints | Does this respect Trust > Optimization? |
+| Existing Patterns | Does this follow project conventions? |
+| Rules Consulted | Which rule files were reviewed? |
+| Coding Style | Does this comply with `.agent/rules/coding-style.md`? |