moicle 1.6.0 → 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +82 -32
- package/assets/skills/docs/content/SKILL.md +269 -0
- package/assets/skills/{logo-design → docs/logo}/SKILL.md +31 -16
- package/assets/skills/{sync-docs → docs/sync}/SKILL.md +31 -1
- package/assets/skills/{video-content → docs/video}/SKILL.md +32 -17
- package/assets/skills/docs/write/SKILL.md +371 -0
- package/assets/skills/feature/api/SKILL.md +277 -0
- package/assets/skills/feature/deprecate/SKILL.md +276 -0
- package/assets/skills/{new-feature → feature/new}/SKILL.md +28 -12
- package/assets/skills/{refactor → feature/refactor}/SKILL.md +24 -12
- package/assets/skills/{hotfix → fix/hotfix}/SKILL.md +32 -30
- package/assets/skills/fix/incident/SKILL.md +272 -0
- package/assets/skills/{fix-pr-comment → fix/pr-comment}/SKILL.md +30 -1
- package/assets/skills/fix/root-cause/SKILL.md +219 -0
- package/assets/skills/{onboarding → research/onboarding}/SKILL.md +32 -31
- package/assets/skills/{spike → research/spike}/SKILL.md +33 -32
- package/assets/skills/research/web/SKILL.md +163 -0
- package/assets/skills/{architect-review → review/architect}/SKILL.md +37 -7
- package/assets/skills/{review-changes → review/branch}/SKILL.md +27 -7
- package/assets/skills/{pr-review → review/pr}/SKILL.md +31 -30
- package/assets/skills/review/tdd/SKILL.md +206 -0
- package/bin/cli.js +12 -1
- package/dist/commands/install.d.ts.map +1 -1
- package/dist/commands/install.js +219 -38
- package/dist/commands/install.js.map +1 -1
- package/dist/commands/list.d.ts.map +1 -1
- package/dist/commands/list.js +42 -2
- package/dist/commands/list.js.map +1 -1
- package/dist/commands/postinstall.d.ts.map +1 -1
- package/dist/commands/postinstall.js +2 -0
- package/dist/commands/postinstall.js.map +1 -1
- package/dist/commands/status.d.ts.map +1 -1
- package/dist/commands/status.js +31 -1
- package/dist/commands/status.js.map +1 -1
- package/dist/commands/uninstall.d.ts.map +1 -1
- package/dist/commands/uninstall.js +93 -38
- package/dist/commands/uninstall.js.map +1 -1
- package/dist/commands/upgrade.d.ts +7 -0
- package/dist/commands/upgrade.d.ts.map +1 -0
- package/dist/commands/upgrade.js +165 -0
- package/dist/commands/upgrade.js.map +1 -0
- package/dist/types.d.ts +1 -1
- package/dist/types.d.ts.map +1 -1
- package/dist/utils/symlink.d.ts +8 -0
- package/dist/utils/symlink.d.ts.map +1 -1
- package/dist/utils/symlink.js +100 -0
- package/dist/utils/symlink.js.map +1 -1
- package/package.json +3 -1
- package/assets/skills/api-integration/SKILL.md +0 -883
- package/assets/skills/content-writer/SKILL.md +0 -721
- package/assets/skills/deep-debug/SKILL.md +0 -114
- package/assets/skills/deprecation/SKILL.md +0 -923
- package/assets/skills/documentation/SKILL.md +0 -1333
- package/assets/skills/incident-response/SKILL.md +0 -946
- package/assets/skills/research/SKILL.md +0 -124
- package/assets/skills/tdd/SKILL.md +0 -828
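The renames above move every skill into a `{group}/{skill}` directory. The hunks below suggest that a skill's frontmatter name and its slash command are both derived from that path: `fix/incident` declares `name: fix-incident` and is referred to as `/fix:incident`, and `research/onboarding` declares `name: research-onboarding`. The sketch below assumes that convention holds for every skill. It is not the CLI's own code, because `dist/commands/install.js` is not shown in this diff.

```python
from typing import NamedTuple


class SkillId(NamedTuple):
    name: str      # frontmatter `name:` value, e.g. "fix-incident"
    command: str   # slash command, e.g. "/fix:incident"


def skill_id(skill_dir: str) -> SkillId:
    """Derive the frontmatter name and slash command from a 2.0.0 skill path.

    Assumes the convention visible in this diff: one group directory
    followed by one skill directory, e.g. "fix/root-cause".
    """
    parts = skill_dir.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '<group>/<skill>', got {skill_dir!r}")
    group, skill = parts
    return SkillId(name=f"{group}-{skill}", command=f"/{group}:{skill}")


# A few of the renames listed above (1.6.0 directory -> 2.0.0 directory).
RENAMES = {
    "hotfix": "fix/hotfix",
    "fix-pr-comment": "fix/pr-comment",
    "onboarding": "research/onboarding",
    "spike": "research/spike",
}

if __name__ == "__main__":
    for old, new in RENAMES.items():
        ident = skill_id(new)
        print(f"{old:<15} -> {ident.command:<20} name: {ident.name}")
```

A flat, pre-2.0.0 name such as `tdd` is rejected rather than guessed at. This makes stale references to removed skill directories fail loudly.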
--- /dev/null
+++ b/package/assets/skills/fix/incident/SKILL.md
@@ -0,0 +1,272 @@
+---
+name: fix-incident
+description: Incident response workflow for handling production issues. Use when there's an incident, outage, production down, or when user says "incident", "outage", "production down", "service down", "emergency".
+---
+
+# Incident Response Workflow
+
+Structured workflow for handling production incidents from triage to postmortem. **Speed > completeness** during P1/P2 — mitigation first, root cause after.
+
+## When to use this skill
+
+- ✅ Production is down or degraded right now
+- ✅ Users are impacted, on-call has been paged
+- ✅ Need to coordinate response, mitigation, comms, and postmortem
+- ❌ Bug is reported but no users impacted → use `/fix:hotfix`
+- ❌ Already mitigated, need to find root cause → use `/fix:root-cause`
+- ❌ Postmortem only, no active incident → jump to Phase 5
+
+---
+
+## Workflow
+
+```
+TRIAGE → INVESTIGATE → MITIGATE → RESOLVE → POSTMORTEM
+  ↓           ↓           ↓          ↓          ↑
+comms       comms       comms      comms      learn
+```
+
+## Read Architecture First
+
+Before responding:
+1. `~/.claude/architecture/ddd-architecture.md`
+2. Stack-specific doc for the affected service
+3. Runbook for the affected component (`docs/runbooks/` or wiki)
+
+---
+
+## Severity Levels
+
+| Level | Description | Response Time | Examples |
+|-------|-------------|---------------|----------|
+| **P1** | Critical — complete outage / data loss / security breach | <15 min | Site down, DB corrupted, secret leak |
+| **P2** | High — major degradation, 50%+ users affected | <1 h | Login broken, payment failing |
+| **P3** | Medium — partial degradation, <50% users | <4 h | One feature broken |
+| **P4** | Low — minor issue, no significant impact | <24 h | UI glitch |
+
+---
+
+## Phase 1: TRIAGE (5–15 min)
+
+**Goal:** classify severity, assemble response, start comms.
+
+### Actions
+1. Capture: what, when, who's affected, error messages, recent deploys
+2. Assign severity (P1–P4)
+3. Open incident channel (`#incident-{date}-{short-name}`)
+4. Assign roles:
+   - **Incident Commander (IC)** — coordinates, makes decisions
+   - **Tech Lead** — investigates / fixes
+   - **Comms Lead** — updates stakeholders
+5. Page on-call for affected service
+6. Send first status update (template below)
+
+### First update template
+```
+[INCIDENT] {severity} {service} — {one-line impact}
+Detected: {timestamp}
+Status: investigating
+Channel: #incident-{slug}
+Next update: {time}
+```
+
+### Gate
+- [ ] Severity assigned
+- [ ] IC + Tech Lead identified
+- [ ] Incident channel open
+- [ ] First status sent
+
+---
+
+## Phase 2: INVESTIGATE (parallel with comms)
+
+**Goal:** find what's broken — don't fix yet, identify it.
+
+### Check in this order
+1. **Recent deploys** — last 24h. Roll back candidate?
+2. **Monitoring dashboards** — error rate, latency, saturation, traffic (the 4 golden signals)
+3. **Logs** — error spikes, new error types
+4. **Dependencies** — upstream services, DB, cache, queue, external APIs
+5. **Infrastructure** — CPU, memory, disk, network, certificate expiry, DNS
+
+### Common patterns
+| Symptom | Likely cause |
+|---------|-------------|
+| Errors only on new code path | Bad deploy → rollback |
+| Errors across all paths | Infra (DB, cache, network) |
+| Slow but not erroring | Resource saturation / external API slowdown |
+| Intermittent | Cache, race condition, flapping dependency |
+| Started exactly on a cron tick | Scheduled job |
+
+### Gate
+- [ ] Suspected cause identified (even if not yet confirmed)
+- [ ] Mitigation candidates listed
+- [ ] Update sent to incident channel
+
+---
+
+## Phase 3: MITIGATE (fast, reversible)
+
+**Goal:** stop the bleeding. Don't fix root cause yet.
+
+### Mitigation options (pick fastest reversible)
+| Option | When | Reversibility |
+|--------|------|---------------|
+| Rollback deploy | Recent deploy is suspect | Easy |
+| Feature flag off | Bad feature behind flag | Easy |
+| Scale up | Resource saturation | Easy |
+| Failover | Region / instance down | Medium |
+| Block traffic | Bot / DDoS | Medium |
+| DB restore | Data corruption | Hard — last resort |
+
+### Rules
+- **One change at a time.** If you change 3 things and it gets better, you don't know which fixed it.
+- **Announce before doing.** "Rolling back to {sha} in 60s." Then do it.
+- **Time-box.** If a mitigation doesn't work in 10 min, try the next one.
+
+### Verify mitigation
+- [ ] Error rate dropping in monitoring
+- [ ] Users confirming (sample check)
+- [ ] No new alerts firing
+
+### Gate
+- [ ] Impact reduced (errors down, users unblocked)
+- [ ] Mitigation status sent to channel
+- [ ] Severity downgraded if appropriate
+
+---
+
+## Phase 4: RESOLVE
+
+**Goal:** apply the permanent fix and verify.
+
+### Actions
+1. Confirm root cause (use `/fix:root-cause` workflow if not obvious)
+2. Implement permanent fix following stack architecture
+3. Add test that would have caught this
+4. Deploy fix through normal pipeline (don't skip CI even under pressure — unless explicitly authorized)
+5. Monitor for {N} hours post-deploy
+6. Remove temporary mitigation (e.g., re-enable feature flag) only after fix proven stable
+
+### Gate
+- [ ] Root cause confirmed
+- [ ] Fix deployed
+- [ ] Regression test added and passing
+- [ ] Monitoring shows healthy metrics for ≥1 hour
+- [ ] Incident marked resolved in channel
+
+---
+
+## Phase 5: POSTMORTEM
+
+**Goal:** learn from this so it doesn't happen again. **Blameless.**
+
+### Schedule
+- Within 48h of resolution for P1/P2
+- Within 1 week for P3
+- Optional for P4 (only if recurring pattern)
+
+### Postmortem template
+
+```markdown
+# Postmortem: {service} {date}
+
+## Summary
+{1-paragraph: what happened, impact, duration}
+
+## Impact
+- Users affected: {N or %}
+- Duration: {detected → mitigated → resolved}
+- Revenue / SLA impact: {if applicable}
+
+## Timeline (UTC)
+| Time | Event |
+|------|-------|
+| HH:MM | Deploy of {sha} |
+| HH:MM | First alert fired |
+| HH:MM | IC paged |
+| HH:MM | Mitigation: rollback to {sha} |
+| HH:MM | Error rate normal |
+| HH:MM | Permanent fix deployed |
+
+## Root Cause
+{Technical cause — what code / config / data caused this}
+
+## Contributing Factors
+- {Factor 1, e.g., test gap}
+- {Factor 2, e.g., missing alert}
+
+## What Went Well
+- {e.g., fast detection, clear rollback path}
+
+## What Didn't Go Well
+- {e.g., took 20 min to identify suspect deploy}
+
+## Action Items
+| # | Action | Owner | Due | Severity |
+|---|--------|-------|-----|----------|
+| 1 | {add alert for X} | @alice | {date} | high |
+| 2 | {add regression test} | @bob | {date} | medium |
+```
+
+### Action item rules
+- Each action item has owner + due date
+- Track to completion (open a ticket, link from postmortem)
+- "Better testing" is NOT an action item — be specific ("add test for X case")
+
+---
+
+## Communication Templates
+
+**Status update (every {N} min during incident)**
+```
+[UPDATE {time}] {service} — {status}
+Impact: {what users see}
+Action: {what we're doing}
+Next update: {time}
+```
+
+**Resolved**
+```
+[RESOLVED {time}] {service}
+Root cause: {one line}
+Mitigation: {what we did}
+Permanent fix: {ETA or done}
+Postmortem: {date}
+```
+
+---
+
+## Hard Rules
+
+- **Speed > completeness during P1/P2** — mitigate first, perfect later
+- **One change at a time** during mitigation
+- **Blameless postmortems** — focus on system, not person
+- **Don't skip CI** for the fix unless explicitly authorized by IC
+- **Don't close the incident** until monitoring is healthy for ≥1 hour
+- **Action items must have owner + date** — otherwise they don't happen
+
+---
+
+## Related Skills
+
+| When | Use |
+|------|-----|
+| Bug reported but no active incident | `/fix:hotfix` |
+| Root cause unclear, need deep investigation | `/fix:root-cause` |
+| Need to roll back deployment | (use ops runbook, not a skill) |
+| Documenting postmortem as a doc | `/docs:write` |
+
+---
+
+## Recommended Agents
+
+| Phase | Agent | Purpose |
+|-------|-------|---------|
+| TRIAGE | `@devops` | Infra + monitoring context |
+| INVESTIGATE | Stack-specific dev agent | Code-level debugging |
+| INVESTIGATE | `@security-audit` | If suspected breach / leak |
+| INVESTIGATE | `@db-designer` | If DB / data issue |
+| MITIGATE | `@devops` | Rollback / scale / failover |
+| RESOLVE | Stack-specific dev agent | Permanent fix |
+| POSTMORTEM | `@docs-writer` | Write postmortem doc |
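The Severity Levels table in the new `fix/incident` skill can be read as a small decision procedure for the TRIAGE phase. The sketch below encodes it under stated assumptions. The thresholds come from the table: complete outage, data loss, or security breach means P1; degradation for 50% or more of users means P2; any smaller degradation means P3; otherwise P4. The `Signal` fields are hypothetical inputs that an on-call engineer would estimate by hand. The sketch also turns the table's response-time column into a concrete deadline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Response-time targets from the Severity Levels table.
RESPONSE_TIME = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
    "P4": timedelta(hours=24),
}


@dataclass(frozen=True)
class Signal:
    """Hypothetical triage inputs, filled in by the on-call engineer."""
    complete_outage: bool = False
    data_loss: bool = False
    security_breach: bool = False
    functionality_degraded: bool = False  # something users rely on is broken or slow
    pct_users_affected: float = 0.0       # best estimate, 0-100


def classify(s: Signal) -> str:
    """Map triage signals to P1-P4 following the table's descriptions."""
    if s.complete_outage or s.data_loss or s.security_breach:
        return "P1"
    if s.functionality_degraded and s.pct_users_affected >= 50:
        return "P2"
    if s.functionality_degraded and s.pct_users_affected > 0:
        return "P3"
    return "P4"  # minor issue, no significant impact


def respond_by(severity: str, detected_at: datetime) -> datetime:
    """Deadline for the first response, given the detection time."""
    return detected_at + RESPONSE_TIME[severity]


if __name__ == "__main__":
    sev = classify(Signal(functionality_degraded=True, pct_users_affected=65))
    detected = datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)
    print(sev, respond_by(sev, detected).isoformat())  # P2 2024-05-01T10:30:00+00:00
```

The order of the checks is deliberate. A security breach that touches 1% of users is still P1, so the "critical" conditions are tested before the percentage. Treat the result as a starting point that the Incident Commander can override, not as an automatic verdict.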
--- a/package/assets/skills/fix-pr-comment/SKILL.md
+++ b/package/assets/skills/fix/pr-comment/SKILL.md
@@ -8,10 +8,19 @@ args: PR_NUMBER
 
 Workflow for fetching and fixing review comments from a pull request.
 
+## When to use this skill
+
+- ✅ Reviewer left comments on an open PR and you need to address each one
+- ✅ You want to track which comments are resolved vs still open
+- ✅ Need to post structured responses back to GitHub
+- ❌ Reviewing PR yourself (you're the reviewer) → use `/review:pr`
+- ❌ No PR yet, just want to clean up branch → use `/review:branch`
+- ❌ Need to fix a brand-new bug found by reviewer → use `/fix:hotfix` first, then this skill
+
 ## Usage
 
 ```
-/fix
+/fix:pr-comment {PR_NUMBER}
 ```
 
 ## Workflow Overview
@@ -281,3 +290,23 @@ PR comments are properly addressed when:
 3. Changes pushed to the PR branch
 4. Reviewers notified for re-review
 5. No new issues introduced
+
+---
+
+## Related Skills
+
+| When | Use |
+|------|-----|
+| You are the reviewer | `/review:pr` |
+| Self-review before pushing again | `/review:branch` |
+| Fixing a bug surfaced by review | `/fix:hotfix` |
+| Comment requests architecture refactor | `refactor` → then this skill |
+
+## Recommended Agents
+
+| Phase | Agent | Purpose |
+|-------|-------|---------|
+| ANALYZE | `@code-reviewer` | Triage comments by severity |
+| FIX | Stack-specific dev agent | Apply code changes |
+| FIX | `@security-audit` | If comment flagged security |
+| RESPOND | `@docs-writer` | Polish written responses |
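The `fix/pr-comment` skill now lists "track which comments are resolved vs still open" as a use case. One way to build that list is to fetch the PR's review comments, for example with `gh api repos/{owner}/{repo}/pulls/{PR_NUMBER}/comments`, which returns GitHub's REST "list review comments on a pull request" payload (first page only). Replies can then be folded into threads via `in_reply_to_id`. This sketch assumes that payload shape. That REST payload carries no "resolved" flag; thread resolution is exposed through GitHub's GraphQL API instead. Here, `addressed` is a hypothetical set of thread ids that you maintain locally.

```python
def group_threads(comments: list[dict]) -> dict[int, list[dict]]:
    """Fold replies into threads keyed by the first comment's id.

    Assumes GitHub's REST review-comment shape: every comment has `id`,
    `path`, `line` (may be null) and `body`; replies also carry
    `in_reply_to_id` pointing at the thread's first comment.
    """
    threads: dict[int, list[dict]] = {}
    for c in sorted(comments, key=lambda c: c["id"]):
        root = c.get("in_reply_to_id") or c["id"]
        threads.setdefault(root, []).append(c)
    return threads


def render_checklist(comments: list[dict], addressed: set[int]) -> str:
    """One Markdown checkbox per thread; `addressed` is a hypothetical local record."""
    lines = []
    for root, msgs in group_threads(comments).items():
        head = msgs[0]
        mark = "x" if root in addressed else " "
        where = f"{head['path']}:{head.get('line') or '?'}"
        summary = next(iter((head.get("body") or "").splitlines()), "")
        lines.append(f"- [{mark}] {where} #{root} {summary}".rstrip())
    return "\n".join(lines)
```

The checkbox output uses the same `- [ ]` / `- [x]` style as the skills' Gate sections. This means the list can be pasted into the PR description and updated as each comment is addressed.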
--- /dev/null
+++ b/package/assets/skills/fix/root-cause/SKILL.md
@@ -0,0 +1,219 @@
+---
+name: fix-root-cause
+description: Deep bug investigation workflow for hard-to-trace errors. Systematic root cause analysis — no guessing, no blind fixes. Use when user says "deep debug", "deep-debug", "trace bug", "find root cause", "hard bug", "investigate bug".
+---
+
+# Deep Bug Investigation
+
+For hard bugs that have been "fixed" multiple times without success. **DO NOT guess** — trace step by step to the root cause.
+
+## When to use this skill
+
+- ✅ Same bug keeps coming back after multiple "fixes"
+- ✅ "Sometimes works, sometimes doesn't" — likely hidden state
+- ✅ Error is inside a vendor library / framework internals
+- ✅ Local and production behave differently
+- ❌ Bug is well understood and just needs a fix → use `/fix:hotfix`
+- ❌ Production is currently down with users impacted → use `/fix:incident` first, deep-debug after mitigation
+
+---
+
+## Workflow
+
+```
+COLLECT → VERIFY → TRACE → ROOT CAUSE → CHECK HIDDEN STATE → FIX → VERIFY
+```
+
+---
+
+## Step 1: Collect evidence
+
+Record exactly, **do NOT interpret**:
+
+- Exact error message
+- Stack trace: file, line number, call chain
+- Which environment is affected (production / staging / local)
+- Happens every time or only in certain cases
+
+### Gate
+- [ ] Error message captured verbatim
+- [ ] Stack trace captured (or "no stack trace" noted)
+- [ ] Reproduction frequency known
+
+---
+
+## Step 2: Verify the code that is actually running
+
+**DO NOT assume production code = local code.**
+
+- Identify the exact version / commit currently deployed
+- Compare it against the code you are reading locally
+- If they DIFFER → read the deployed version before analyzing further
+
+### Gate
+- [ ] Deployed commit identified
+- [ ] Local checkout matches deployed commit (or differences noted)
+
+---
+
+## Step 3: Trace the execution path
+
+The most important step. Go from entry point → failing line. Trace **EVERY** step, **DO NOT skip**.
+
+### 3a. Entry point → error line
+- Where does the request / event / job enter from?
+- Which function calls which? Follow the stack trace exactly
+- How is data passed through each layer?
+
+### 3b. Where does the faulty data come from?
+- Where is the faulty variable created / loaded from?
+- Loaded directly from source (DB, API) or from cache / session?
+- Does it go through serialize → unserialize?
+- Does it go through any transform / convert step?
+
+### 3c. Type & state at the moment of failure
+- What is the actual type of the variable? (string, object, null, enum...)
+- What type does the code expect?
+- Why does the actual type differ from the expected one?
+
+### 3d. Framework internals (when error is inside vendor / library)
+- Read the source code at the EXACT line number from the stack trace
+- Trace backwards: who calls that method, with what arguments?
+- What condition drives the code into the failing branch?
+
+### Gate
+- [ ] Full call chain from entry → failure documented
+- [ ] Source of faulty data identified
+- [ ] Actual vs expected type / state recorded
+
+---
+
+## Step 4: Find the root cause — answer 3 questions
+
+1. **Why does it fail?** — the specific technical cause
+2. **Why didn't it fail before?** — what changed
+3. **Reproduction conditions?** — when it fails, when it doesn't
+
+If you cannot answer all 3 → return to Step 3, trace further.
+
+### Gate
+- [ ] All 3 questions answered with evidence (not speculation)
+
+---
+
+## Step 5: Check hidden state sources
+
+"Sometimes works, sometimes doesn't" bugs are usually caused by hidden state. Check in this order:
+
+### Cache / Serialization
+- Does the cached object lose internal state? (transient fields, lazy-loaded properties)
+- Does stale cache contain the old data format while new code expects the new format?
+- Does serialize / unserialize change the type? (int↔float, null handling, enum↔string)
+
+### Database / Storage
+- Do collation / encoding affect comparisons?
+- Do default values in the DB match the code's expectations?
+- Has the schema been updated on production yet?
+
+### Runtime / Compiled cache
+- Any compiled / cached config, routes, or views not cleared?
+- Does the bytecode cache (OPcache, compiled assets) serve the old file?
+- Does CDN / proxy cache serve a stale asset?
+
+### Environment
+- Are env vars on production correct and complete?
+- Does the runtime version (PHP, Node, Go, Python...) differ from local?
+- Do dependency versions differ?
+
+### Gate
+- [ ] All 4 categories considered (even if N/A — write that down)
+
+---
+
+## Step 6: Fix
+
+Only fix once you have answered the 3 questions from Step 4. The fix MUST:
+
+- Address the root cause, not the symptom
+- Handle the edge case discovered (stale cache, type mismatch)
+- Be defensive at data boundaries (cache, DB, external API) — NOT in internal logic
+- NOT break the normal code path in order to patch an edge case
+
+### Gate
+- [ ] Fix targets root cause, not symptom
+- [ ] Normal code path not regressed
+
+---
+
+## Step 7: Verify
+
+- Reproduce the failure conditions from Step 4 → confirm the fix works
+- Check the normal code path still works
+- If cache-related → test BOTH fresh load and cached load
+- Verify against the actually deployed version (repeat Step 2)
+
+### Gate
+- [ ] Original failure no longer reproducible
+- [ ] Normal flow still works
+- [ ] Cached / serialized paths both tested (if applicable)
+
+---
+
+## Final Report
+
+```markdown
+## Bug: {short description}
+
+### Root Cause
+{Answer to "why does it fail" — the actual technical cause}
+
+### Why it didn't fail before
+{What changed: deploy, dependency, data shape, config}
+
+### Reproduction
+{Exact steps to reproduce}
+
+### Fix
+- File: `path/to/file:line`
+- Change: {what changed and why this addresses root cause, not symptom}
+
+### Hidden state source
+{Cache / Storage / Runtime / Env — or "none"}
+
+### Verification
+- [x] Original failure no longer reproducible
+- [x] Normal path works
+- [x] Cached path works (if applicable)
+```
+
+---
+
+## Hard Rules
+
+- **DO NOT GUESS** — trace evidence, do not infer from variable names or "maybe it's..."
+- **DO NOT FIX BEFORE UNDERSTANDING** — fixing without knowing the root cause = creating a new bug
+- **VERIFY DEPLOYED CODE** — always check the running version, never assume production = local
+- **CHECK CACHE FIRST** — most "sometimes works, sometimes doesn't" bugs come from stale cached state
+- **ONE ROOT CAUSE** — every bug has one root cause. If multiple possibilities remain → trace further
+
+---
+
+## Related Skills
+
+| When | Use |
+|------|-----|
+| Bug is understood, just needs a fix | `/fix:hotfix` |
+| Production is down, users impacted | `/fix:incident` |
+| Need to write a regression test after the fix | `/review:tdd` |
+| Need to research how others solved similar bugs | `/research:web` |
+
+---
+
+## Recommended Agents
+
+| Phase | Agent | Purpose |
+|-------|-------|---------|
+| Step 3 (trace) | `@code-reviewer` | Independent reading of call chain |
+| Step 5 (hidden state) | `@db-designer` | Schema / collation / index checks |
+| Step 6 (fix) | Stack-specific dev agent | Implement defensive fix at boundary |
+| Step 7 (verify) | `@test-writer` | Regression test for the failing condition |
--- a/package/assets/skills/onboarding/SKILL.md
+++ b/package/assets/skills/research/onboarding/SKILL.md
@@ -1,5 +1,5 @@
 ---
-name: onboarding
+name: research-onboarding
 description: Codebase onboarding workflow for understanding new projects. Use when joining a project, exploring codebase, or when user says "explain codebase", "onboard me", "new to project", "understand project", "explore codebase", "project overview".
 ---
 
@@ -7,39 +7,18 @@ description: Codebase onboarding workflow for understanding new projects. Use wh
 
 Complete workflow for understanding and navigating new codebases effectively.
 
-##
+## When to use this skill
 
-
+- ✅ Just joined a project / repo and need to ramp up
+- ✅ Returning to a codebase you haven't touched in months
+- ✅ Need to give a teammate a structured walkthrough
+- ❌ Want to generate full docs site → use `/docs:sync`
+- ❌ Need to fix a specific bug → use `/fix:hotfix` / `/fix:root-cause`
+- ❌ Only want a quick file lookup → just use `grep` / `find`
 
-
-```
-~/.claude/architecture/
-├── clean-architecture.md # Core principles for all projects
-├── flutter-mobile.md # Flutter + Riverpod
-├── react-frontend.md # React + Vite + TypeScript
-├── go-backend.md # Go + Gin
-├── laravel-backend.md # Laravel + PHP
-├── remix-fullstack.md # Remix fullstack
-└── monorepo.md # Monorepo structure
-```
-
-### Project-specific (if exists)
-```
-.claude/architecture/ # Project overrides
-```
-
-**Understanding the architecture first helps you grasp the codebase structure faster.**
+## Read Architecture First
 
-
-
-| Phase | Agent | Purpose |
-|-------|-------|---------|
-| SCAN | `@clean-architect` | Identify architecture patterns |
-| ANALYZE | `@clean-architect` | Analyze layer structure and boundaries |
-| ANALYZE | `@code-reviewer` | Code quality and conventions analysis |
-| ANALYZE | `@security-audit` | Security patterns review |
-| EXPLAIN | `@docs-writer` | Generate documentation/explanation |
-| GUIDE | `@docs-writer` | Create navigation guide |
+Read `ddd-architecture.md` + stack-specific doc first. Architecture context speeds up codebase comprehension dramatically.
 
 ## Workflow Overview
 
@@ -605,3 +584,25 @@ Onboarding is complete when you can:
 3. **Pair with Code** - Review existing features to learn patterns
 4. **Ask Questions** - Clarify anything unclear
 5. **Start Contributing** - Begin with small features following the architecture
+
+---
+
+## Related Skills
+
+| When | Use |
+|------|-----|
+| Generate full docs site after onboarding | `/docs:sync` |
+| Build first feature after onboarding | `/feature:new` |
+| First bug fix to learn the codebase | `/fix:hotfix` |
+| Author / update the onboarding doc itself | `/docs:write` |
+
+## Recommended Agents
+
+| Phase | Agent | Purpose |
+|-------|-------|---------|
+| SCAN | `@clean-architect` | Identify architecture patterns |
+| ANALYZE | `@clean-architect` | Layer structure + boundaries |
+| ANALYZE | `@code-reviewer` | Code conventions |
+| ANALYZE | `@security-audit` | Security patterns |
+| EXPLAIN | `@docs-writer` | Generate explanations |
+| GUIDE | `@docs-writer` | Navigation guide |
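The rewritten "Read Architecture First" step in `research/onboarding` (and `research/spike`) says to read `ddd-architecture.md` plus "the stack-specific doc", but it does not say how to choose that doc. A hypothetical helper could infer it from well-known marker files. Note these assumptions. The doc file names below are taken from the 1.6.0 `~/.claude/architecture/` tree that this diff removes (`flutter-mobile.md`, `go-backend.md`, ...), so they may not match what 2.0.0 ships. The marker files are common ecosystem conventions, not something moicle is known to check: `pubspec.yaml`, `go.mod`, `composer.json` plus Laravel's `artisan` script, and `package.json` dependencies.

```python
import json
from pathlib import Path


def _package_json(root: Path) -> dict:
    pkg = root / "package.json"
    return json.loads(pkg.read_text(encoding="utf-8")) if pkg.is_file() else {}


def stack_docs(root: Path) -> list[str]:
    """Architecture docs to read for the repo at `root`, most general first.

    File names are assumed (taken from the removed 1.6.0 listing).
    """
    pkg = _package_json(root)
    deps = set(pkg.get("dependencies", {})) | set(pkg.get("devDependencies", {}))
    docs = ["ddd-architecture.md"]
    if "workspaces" in pkg or (root / "pnpm-workspace.yaml").is_file():
        docs.append("monorepo.md")
    if (root / "pubspec.yaml").is_file():
        docs.append("flutter-mobile.md")
    if (root / "go.mod").is_file():
        docs.append("go-backend.md")
    if (root / "composer.json").is_file() and (root / "artisan").is_file():
        docs.append("laravel-backend.md")
    # Remix apps also depend on react, so check Remix first.
    if any(d.startswith("@remix-run/") for d in deps):
        docs.append("remix-fullstack.md")
    elif "react" in deps:
        docs.append("react-frontend.md")
    return docs


if __name__ == "__main__":
    arch_dir = Path.home() / ".claude" / "architecture"
    for doc in stack_docs(Path(".")):
        print(arch_dir / doc)
```

The function returns names rather than reading the files. This keeps it testable against a temporary directory and lets a missing doc surface as a visible gap instead of a silent skip.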
--- a/package/assets/skills/spike/SKILL.md
+++ b/package/assets/skills/research/spike/SKILL.md
@@ -1,5 +1,5 @@
 ---
-name: spike
+name: research-spike
 description: Spike/Research workflow for exploring unknowns before committing to implementation. Use when researching, prototyping, doing proof of concept, or when user says "spike", "research", "prototype", "poc", "explore", "investigate".
 ---
 
@@ -7,40 +7,18 @@ description: Spike/Research workflow for exploring unknowns before committing to
 
 Time-boxed exploration workflow for investigating unknowns, validating assumptions, and de-risking decisions before committing to full implementation.
 
-##
+## When to use this skill
 
-
+- ✅ Need to validate a technical assumption by building (not just reading)
+- ✅ Picking between 2+ technical options and need real comparison
+- ✅ De-risking a critical path before committing to it
+- ❌ Want to compare options via docs / SO / blogs only → use `/research:web`
+- ❌ Already decided on the approach → use `/feature:new`
+- ❌ Stuck on a known bug → use `/fix:root-cause`
 
-
-```
-~/.claude/architecture/
-├── clean-architecture.md # Core principles for all projects
-├── flutter-mobile.md # Flutter + Riverpod
-├── react-frontend.md # React + Vite + TypeScript
-├── go-backend.md # Go + Gin
-├── laravel-backend.md # Laravel + PHP
-├── remix-fullstack.md # Remix fullstack
-└── monorepo.md # Monorepo structure
-```
-
-### Project-specific (if exists)
-```
-.claude/architecture/ # Project overrides
-```
-
-**Understand the project structure and patterns before exploring alternatives.**
+## Read Architecture First
 
-
-
-| Phase | Agent | Purpose |
-|-------|-------|---------|
-| DEFINE | `@clean-architect` | Define research questions based on architecture |
-| RESEARCH | `@api-designer` | API/integration research |
-| RESEARCH | `@db-designer` | Data model research |
-| RESEARCH | `@security-audit` | Security implications |
-| PROTOTYPE | `@react-frontend-dev`, `@go-backend-dev`, `@laravel-backend-dev`, `@flutter-mobile-dev`, `@remix-fullstack-dev` | Stack-specific POC |
-| EVALUATE | `@clean-architect` | Architecture impact assessment |
-| EVALUATE | `@docs-writer` | Document findings |
+Read `ddd-architecture.md` + stack-specific doc to understand current patterns before exploring alternatives.
 
 ## Workflow Overview
 
@@ -533,3 +511,26 @@ echo "Date: [date]" >> experiments/README.md
 - ADR: [link]
 - Next steps: [list]
 ```
+
+---
+
+## Related Skills
+
+| When | Use |
+|------|-----|
+| Just need to compare options via docs (no code) | `/research:web` |
+| Spike done, ready to implement | `/feature:new` |
+| Spike revealed a bug to investigate | `/fix:root-cause` |
+| Document the ADR as a doc page | `/docs:write` |
+
+## Recommended Agents
+
+| Phase | Agent | Purpose |
+|-------|-------|---------|
+| DEFINE | `@clean-architect` | Frame research questions |
+| RESEARCH | `@api-designer` | API / integration research |
+| RESEARCH | `@db-designer` | Data model research |
+| RESEARCH | `@security-audit` | Security implications |
+| PROTOTYPE | Stack-specific dev agent | Build the POC |
+| EVALUATE | `@clean-architect` | Architecture impact |
+| EVALUATE | `@docs-writer` | Write the ADR |
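The new "When to use" list for `research/spike` covers cases where you are "picking between 2+ technical options and need real comparison" by building rather than reading. For performance-type questions, a minimal, hypothetical Python harness might look like the one below. It first checks that every candidate returns the same result on the same input, because a fast option that is wrong is not an option. It then reports the best-of-N time per call using the standard-library `timeit.repeat`. The two de-duplication candidates are placeholders for whatever the spike is actually comparing. Micro-benchmark numbers are only indicative, so record them in the spike write-up together with the input size and machine they were measured on.

```python
import timeit
from functools import partial
from typing import Any, Callable


def compare(options: dict[str, Callable[[Any], Any]], sample: Any,
            number: int = 1_000, repeat: int = 5) -> dict[str, float]:
    """Best-of-`repeat` seconds per call for each option, after a correctness check."""
    if not options:
        raise ValueError("need at least one option")
    results = {name: fn(sample) for name, fn in options.items()}
    ref_name, ref = next(iter(results.items()))
    for name, value in results.items():
        if value != ref:
            raise ValueError(f"{name!r} disagrees with {ref_name!r}; fix it before timing")
    return {
        name: min(timeit.repeat(partial(fn, sample), number=number, repeat=repeat)) / number
        for name, fn in options.items()
    }


# Two placeholder candidates: order-preserving de-duplication.
def dedupe_loop(xs: list) -> list:
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def dedupe_dict(xs: list) -> list:
    return list(dict.fromkeys(xs))  # dicts preserve insertion order (Python 3.7+)


if __name__ == "__main__":
    sample = [i % 100 for i in range(10_000)]
    timings = compare({"loop": dedupe_loop, "dict.fromkeys": dedupe_dict}, sample, number=200)
    for name, secs in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name:<14} {secs * 1e6:8.1f} us/call")
```

Taking the minimum of several repeats, rather than the mean, is the usual way to reduce noise from other processes. A large gap between runs is itself worth noting in the spike findings.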