feed-the-machine 1.5.0 → 1.6.0
- package/LICENSE +21 -21
- package/README.md +170 -170
- package/bin/generate-manifest.mjs +463 -463
- package/bin/install.mjs +491 -491
- package/docs/HOOKS.md +243 -243
- package/docs/INBOX.md +233 -233
- package/ftm/SKILL.md +122 -122
- package/ftm-audit/SKILL.md +623 -541
- package/ftm-audit/references/protocols/PROJECT-PATTERNS.md +91 -91
- package/ftm-audit/references/protocols/RUNTIME-WIRING.md +66 -66
- package/ftm-audit/references/protocols/WIRING-CONTRACTS.md +135 -135
- package/ftm-audit/references/strategies/AUTO-FIX-STRATEGIES.md +69 -69
- package/ftm-audit/references/templates/REPORT-FORMAT.md +96 -96
- package/ftm-audit/scripts/run-knip.sh +23 -23
- package/ftm-audit.yml +2 -2
- package/ftm-brainstorm/SKILL.md +498 -498
- package/ftm-brainstorm/evals/evals.json +100 -100
- package/ftm-brainstorm/evals/promptfoo.yaml +109 -109
- package/ftm-brainstorm/references/agent-prompts.md +224 -224
- package/ftm-brainstorm/references/plan-template.md +121 -121
- package/ftm-brainstorm.yml +2 -2
- package/ftm-browse/SKILL.md +454 -454
- package/ftm-browse/daemon/browser-manager.ts +206 -206
- package/ftm-browse/daemon/bun.lock +30 -30
- package/ftm-browse/daemon/cli.ts +347 -347
- package/ftm-browse/daemon/commands.ts +410 -410
- package/ftm-browse/daemon/main.ts +357 -357
- package/ftm-browse/daemon/package.json +17 -17
- package/ftm-browse/daemon/server.ts +189 -189
- package/ftm-browse/daemon/snapshot.ts +519 -519
- package/ftm-browse/daemon/tsconfig.json +22 -22
- package/ftm-browse.yml +4 -4
- package/ftm-capture/SKILL.md +370 -370
- package/ftm-capture.yml +4 -4
- package/ftm-codex-gate/SKILL.md +361 -361
- package/ftm-codex-gate.yml +2 -2
- package/ftm-config/SKILL.md +345 -345
- package/ftm-config.default.yml +82 -80
- package/ftm-config.yml +2 -2
- package/ftm-council/SKILL.md +416 -416
- package/ftm-council/references/prompts/CLAUDE-INVESTIGATION.md +60 -60
- package/ftm-council/references/prompts/CODEX-INVESTIGATION.md +58 -58
- package/ftm-council/references/prompts/GEMINI-INVESTIGATION.md +58 -58
- package/ftm-council/references/prompts/REBUTTAL-TEMPLATE.md +57 -57
- package/ftm-council/references/protocols/PREREQUISITES.md +47 -47
- package/ftm-council/references/protocols/STEP-0-FRAMING.md +46 -46
- package/ftm-council.yml +2 -2
- package/ftm-dashboard/SKILL.md +163 -163
- package/ftm-dashboard.yml +4 -4
- package/ftm-debug/SKILL.md +1037 -1037
- package/ftm-debug/references/phases/PHASE-0-INTAKE.md +58 -58
- package/ftm-debug/references/phases/PHASE-1-TRIAGE.md +46 -46
- package/ftm-debug/references/phases/PHASE-2-WAR-ROOM-AGENTS.md +279 -279
- package/ftm-debug/references/phases/PHASE-3-TO-6-EXECUTION.md +436 -436
- package/ftm-debug/references/protocols/BLACKBOARD.md +86 -86
- package/ftm-debug/references/protocols/EDGE-CASES.md +103 -103
- package/ftm-debug.yml +2 -2
- package/ftm-diagram/SKILL.md +277 -277
- package/ftm-diagram.yml +2 -2
- package/ftm-executor/SKILL.md +777 -767
- package/ftm-executor/references/STYLE-TEMPLATE.md +73 -73
- package/ftm-executor/references/phases/PHASE-0-VERIFICATION.md +62 -62
- package/ftm-executor/references/phases/PHASE-2-AGENT-ASSEMBLY.md +34 -34
- package/ftm-executor/references/phases/PHASE-3-WORKTREES.md +38 -38
- package/ftm-executor/references/phases/PHASE-4-5-AUDIT.md +72 -72
- package/ftm-executor/references/phases/PHASE-4-DISPATCH.md +66 -66
- package/ftm-executor/references/phases/PHASE-5-5-CODEX-GATE.md +73 -73
- package/ftm-executor/references/protocols/DOCUMENTATION-BOOTSTRAP.md +36 -36
- package/ftm-executor/references/protocols/MODEL-PROFILE.md +59 -44
- package/ftm-executor/references/protocols/PROGRESS-TRACKING.md +66 -66
- package/ftm-executor/runtime/ftm-runtime.mjs +252 -252
- package/ftm-executor/runtime/package.json +8 -8
- package/ftm-executor.yml +2 -2
- package/ftm-git/SKILL.md +441 -441
- package/ftm-git/evals/evals.json +26 -26
- package/ftm-git/evals/promptfoo.yaml +75 -75
- package/ftm-git/hooks/post-commit-experience.sh +92 -92
- package/ftm-git/references/patterns/SECRET-PATTERNS.md +104 -104
- package/ftm-git/references/protocols/REMEDIATION.md +139 -139
- package/ftm-git/scripts/pre-commit-secrets.sh +110 -110
- package/ftm-git.yml +2 -2
- package/ftm-inbox/backend/adapters/_retry.py +64 -64
- package/ftm-inbox/backend/adapters/base.py +230 -230
- package/ftm-inbox/backend/adapters/freshservice.py +104 -104
- package/ftm-inbox/backend/adapters/gmail.py +125 -125
- package/ftm-inbox/backend/adapters/jira.py +136 -136
- package/ftm-inbox/backend/adapters/registry.py +192 -192
- package/ftm-inbox/backend/adapters/slack.py +110 -110
- package/ftm-inbox/backend/db/connection.py +54 -54
- package/ftm-inbox/backend/db/schema.py +78 -78
- package/ftm-inbox/backend/executor/__init__.py +7 -7
- package/ftm-inbox/backend/executor/engine.py +149 -149
- package/ftm-inbox/backend/executor/step_runner.py +98 -98
- package/ftm-inbox/backend/main.py +103 -103
- package/ftm-inbox/backend/models/__init__.py +1 -1
- package/ftm-inbox/backend/models/unified_task.py +36 -36
- package/ftm-inbox/backend/planner/__init__.py +6 -6
- package/ftm-inbox/backend/planner/generator.py +127 -127
- package/ftm-inbox/backend/planner/schema.py +34 -34
- package/ftm-inbox/backend/requirements.txt +5 -5
- package/ftm-inbox/backend/routes/execute.py +186 -186
- package/ftm-inbox/backend/routes/health.py +52 -52
- package/ftm-inbox/backend/routes/inbox.py +68 -68
- package/ftm-inbox/backend/routes/plan.py +271 -271
- package/ftm-inbox/bin/launchagent.mjs +91 -91
- package/ftm-inbox/bin/setup.mjs +188 -188
- package/ftm-inbox/bin/start.sh +10 -10
- package/ftm-inbox/bin/status.sh +17 -17
- package/ftm-inbox/bin/stop.sh +8 -8
- package/ftm-inbox/config.example.yml +55 -55
- package/ftm-inbox/package-lock.json +2898 -2898
- package/ftm-inbox/package.json +26 -26
- package/ftm-inbox/postcss.config.js +6 -6
- package/ftm-inbox/src/app.css +199 -199
- package/ftm-inbox/src/app.html +18 -18
- package/ftm-inbox/src/lib/api.ts +166 -166
- package/ftm-inbox/src/lib/components/ExecutionLog.svelte +81 -81
- package/ftm-inbox/src/lib/components/InboxFeed.svelte +143 -143
- package/ftm-inbox/src/lib/components/PlanStep.svelte +271 -271
- package/ftm-inbox/src/lib/components/PlanView.svelte +206 -206
- package/ftm-inbox/src/lib/components/StreamPanel.svelte +99 -99
- package/ftm-inbox/src/lib/components/TaskCard.svelte +190 -190
- package/ftm-inbox/src/lib/components/ui/EmptyState.svelte +63 -63
- package/ftm-inbox/src/lib/components/ui/KawaiiCard.svelte +86 -86
- package/ftm-inbox/src/lib/components/ui/PillButton.svelte +106 -106
- package/ftm-inbox/src/lib/components/ui/StatusBadge.svelte +67 -67
- package/ftm-inbox/src/lib/components/ui/StreamDrawer.svelte +149 -149
- package/ftm-inbox/src/lib/components/ui/ThemeToggle.svelte +80 -80
- package/ftm-inbox/src/lib/theme.ts +47 -47
- package/ftm-inbox/src/routes/+layout.svelte +76 -76
- package/ftm-inbox/src/routes/+page.svelte +401 -401
- package/ftm-inbox/svelte.config.js +12 -12
- package/ftm-inbox/tailwind.config.ts +63 -63
- package/ftm-inbox/tsconfig.json +13 -13
- package/ftm-inbox/vite.config.ts +6 -6
- package/ftm-intent/SKILL.md +241 -241
- package/ftm-intent.yml +2 -2
- package/ftm-manifest.json +3794 -3794
- package/ftm-map/SKILL.md +291 -291
- package/ftm-map/scripts/db.py +712 -712
- package/ftm-map/scripts/index.py +415 -415
- package/ftm-map/scripts/parser.py +224 -224
- package/ftm-map/scripts/queries/go-tags.scm +20 -20
- package/ftm-map/scripts/queries/javascript-tags.scm +35 -35
- package/ftm-map/scripts/queries/python-tags.scm +31 -31
- package/ftm-map/scripts/queries/ruby-tags.scm +19 -19
- package/ftm-map/scripts/queries/rust-tags.scm +37 -37
- package/ftm-map/scripts/queries/typescript-tags.scm +41 -41
- package/ftm-map/scripts/query.py +301 -301
- package/ftm-map/scripts/ranker.py +377 -377
- package/ftm-map/scripts/requirements.txt +5 -5
- package/ftm-map/scripts/setup-hooks.sh +27 -27
- package/ftm-map/scripts/setup.sh +56 -56
- package/ftm-map/scripts/test_db.py +364 -364
- package/ftm-map/scripts/test_parser.py +174 -174
- package/ftm-map/scripts/test_query.py +183 -183
- package/ftm-map/scripts/test_ranker.py +199 -199
- package/ftm-map/scripts/views.py +591 -591
- package/ftm-map.yml +2 -2
- package/ftm-mind/SKILL.md +1943 -1943
- package/ftm-mind/evals/promptfoo.yaml +142 -142
- package/ftm-mind/references/blackboard-schema.md +328 -328
- package/ftm-mind/references/complexity-guide.md +110 -110
- package/ftm-mind/references/event-registry.md +319 -319
- package/ftm-mind/references/mcp-inventory.md +296 -296
- package/ftm-mind/references/protocols/COMPLEXITY-SIZING.md +72 -72
- package/ftm-mind/references/protocols/MCP-HEURISTICS.md +32 -32
- package/ftm-mind/references/protocols/PLAN-APPROVAL.md +80 -80
- package/ftm-mind/references/reflexion-protocol.md +249 -249
- package/ftm-mind/references/routing/SCENARIOS.md +22 -22
- package/ftm-mind/references/routing-scenarios.md +35 -35
- package/ftm-mind.yml +2 -2
- package/ftm-pause/SKILL.md +395 -395
- package/ftm-pause/references/protocols/SKILL-RESTORE-PROTOCOLS.md +186 -186
- package/ftm-pause/references/protocols/VALIDATION.md +80 -80
- package/ftm-pause.yml +2 -2
- package/ftm-researcher/SKILL.md +275 -275
- package/ftm-researcher/evals/agent-diversity.yaml +17 -17
- package/ftm-researcher/evals/synthesis-quality.yaml +12 -12
- package/ftm-researcher/evals/trigger-accuracy.yaml +39 -39
- package/ftm-researcher/references/adaptive-search.md +116 -116
- package/ftm-researcher/references/agent-prompts.md +193 -193
- package/ftm-researcher/references/council-integration.md +193 -193
- package/ftm-researcher/references/output-format.md +203 -203
- package/ftm-researcher/references/synthesis-pipeline.md +165 -165
- package/ftm-researcher/scripts/score_credibility.py +234 -234
- package/ftm-researcher/scripts/validate_research.py +92 -92
- package/ftm-researcher.yml +2 -2
- package/ftm-resume/SKILL.md +518 -518
- package/ftm-resume/references/protocols/VALIDATION.md +172 -172
- package/ftm-resume.yml +2 -2
- package/ftm-retro/SKILL.md +380 -380
- package/ftm-retro/references/protocols/SCORING-RUBRICS.md +89 -89
- package/ftm-retro/references/templates/REPORT-FORMAT.md +109 -109
- package/ftm-retro.yml +2 -2
- package/ftm-routine/SKILL.md +170 -170
- package/ftm-routine.yml +4 -4
- package/ftm-state/blackboard/capabilities.json +5 -5
- package/ftm-state/blackboard/capabilities.schema.json +27 -27
- package/ftm-state/blackboard/context.json +23 -23
- package/ftm-state/blackboard/experiences/index.json +9 -9
- package/ftm-state/blackboard/patterns.json +6 -6
- package/ftm-state/schemas/context.schema.json +130 -130
- package/ftm-state/schemas/experience-index.schema.json +77 -77
- package/ftm-state/schemas/experience.schema.json +78 -78
- package/ftm-state/schemas/patterns.schema.json +44 -44
- package/ftm-upgrade/SKILL.md +194 -194
- package/ftm-upgrade/scripts/check-version.sh +76 -76
- package/ftm-upgrade/scripts/upgrade.sh +143 -143
- package/ftm-upgrade.yml +2 -2
- package/ftm-verify.yml +2 -2
- package/ftm.yml +2 -2
- package/hooks/ftm-blackboard-enforcer.sh +93 -93
- package/hooks/ftm-discovery-reminder.sh +90 -90
- package/hooks/ftm-drafts-gate.sh +61 -61
- package/hooks/ftm-event-logger.mjs +107 -107
- package/hooks/ftm-map-autodetect.sh +79 -79
- package/hooks/ftm-pending-sync-check.sh +22 -22
- package/hooks/ftm-plan-gate.sh +92 -92
- package/hooks/ftm-post-commit-trigger.sh +57 -57
- package/hooks/settings-template.json +81 -81
- package/install.sh +363 -363
- package/package.json +84 -84
- package/uninstall.sh +25 -25
package/ftm-retro/SKILL.md
CHANGED
@@ -1,380 +1,380 @@
----
-name: ftm-retro
-description: Post-execution self-assessment skill. Automatically triggered after ftm-executor completes a plan. Scores execution across 5 dimensions, identifies what went well and what was slow, writes structured report with improvement suggestions. Use when user says "retro", "retrospective", "how did that go", "execution review", "self-assessment", "ftm retro".
----
-
-## Events
-
-### Emits
-- `experience_recorded` — when a task outcome, fix attempt, or blocker is written to the blackboard experience log
-- `pattern_discovered` — when a recurring pattern is identified from accumulated experiences and promoted to patterns.json
-- `task_completed` — when the retro report is saved and the self-assessment session concludes
-
-### Listens To
-- `task_completed` — micro-reflection trigger: record the task outcome as a structured experience entry
-- `error_encountered` — failure analysis: record the error context as a failure experience for pattern learning
-- `bug_fixed` — success recording: record the fix details as a positive experience (what worked, what the root cause was)
-
-# FTM Retro — Post-Execution Self-Assessment
-
-Structured retrospective system for ftm-executor plans. Scores execution across 5 evidence-based dimensions, surfaces bottlenecks with specifics, and builds a cumulative pattern library that makes each future execution smarter.
-
-## Why This Exists
-
-Execution without reflection is a loop with no exit. FTM-retro closes the feedback cycle: every plan run generates a scored report, every report feeds a pattern library, and recurring issues get escalated until they're fixed. The goal is measurable improvement across executions, not vibes.
-
-## Operating Modes
-
-### Mode 1: Auto-triggered by ftm-executor (Phase 6.5)
-
-FTM-executor calls this skill after all waves complete and the final commit is made. It passes execution context directly:
-
-- Plan title and absolute path
-- Task count and wave count
-- Total agents spawned
-- Per-task audit results: pass/fail/auto-fix counts per phase
-- Codex gate results per wave (pass/fail + any failures found)
-- Total execution duration
-- Errors, blockers, or manual interventions that occurred
-
-When invoked in this mode, proceed directly to scoring — all data is available.
-
-### Mode 2: Manual (`/ftm retro`)
-
-When invoked without execution context:
-
-1. Search the current project for the most recent `PROGRESS.md` file. Read it fully to reconstruct what ran.
-2. If no `PROGRESS.md` exists, check `~/.claude/ftm-retros/` for the most recent `.md` file and ask the user which execution they want to review.
-3. Once context is established, proceed to scoring.
-
-Never ask the user to provide data you can find yourself. Read the files.
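Editor's sketch (not part of the package): one way the Mode 2 lookup for the most recent `PROGRESS.md` could be done in Python. It assumes "most recent" means latest modification time and that the search is recursive from the project root; the function name `find_latest_progress` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_latest_progress(project_root: Path) -> Optional[Path]:
    """Return the most recently modified PROGRESS.md under project_root, or None.

    Assumption: recency is judged by file modification time (st_mtime).
    """
    candidates = [p for p in project_root.rglob("PROGRESS.md") if p.is_file()]
    if not candidates:
        # Caller would fall back to ~/.claude/ftm-retros/ as step 2 describes.
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

A `None` result corresponds to step 2 of the list, where the skill falls back to the retro directory.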
-
----
-
-## Scoring Dimensions
-
-Score each dimension 0–10 with a citation to specific data. Do not estimate without evidence — if data is missing, note it and score conservatively.
-
-### 1. Wave Parallelism Efficiency (0–10)
-
-Were independent tasks actually dispatched in parallel? Could more tasks have been parallelized?
-
-- **10**: Every task that could run in parallel did. No serial bottlenecks where parallelism was possible.
-- **7–9**: Minor serial steps that could have been parallel (e.g., final post-processing tasks run sequentially).
-- **4–6**: Significant parallelism opportunities missed. Tasks that had no dependencies ran serially.
-- **1–3**: Nearly all tasks ran serially despite having no dependencies on each other.
-- **0**: Everything was serial regardless of dependency structure.
-
-Evidence to cite: wave structure from PROGRESS.md, task dependency graph, agent dispatch timestamps.
-
-### 2. Audit Pass Rate (0–10)
-
-What percentage of tasks passed ftm-audit on the first attempt?
-
-- **10**: 100% first-pass. No task needed a fix cycle.
-- **8**: 90%+ first-pass. One or two tasks needed minor fixes.
-- **6**: 75–89% first-pass.
-- **4**: 50–74% first-pass. Roughly half the tasks needed audit remediation.
-- **2**: Below 50% first-pass.
-- **0**: Every single task failed audit on the first attempt.
-
-Evidence to cite: per-task audit results (pass/fail counts, auto-fix counts, manual-fix counts).
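Editor's sketch (not part of the package): the Audit Pass Rate breakpoints above, expressed as a Python function. The rubric lists only even scores, so this sketch returns only those values; the function name `audit_pass_score` and the use of integer comparison for the percentage bands are assumptions.

```python
def audit_pass_score(first_pass: int, total: int) -> int:
    """Map a first-pass audit count to the rubric score (0, 2, 4, 6, 8, or 10)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= first_pass <= total:
        raise ValueError("first_pass must be between 0 and total")
    if first_pass == 0:
        return 0   # every task failed audit on the first attempt
    if first_pass == total:
        return 10  # 100% first-pass
    # Integer arithmetic avoids floating-point edge cases at band boundaries.
    if first_pass * 100 >= 90 * total:
        return 8
    if first_pass * 100 >= 75 * total:
        return 6
    if first_pass * 100 >= 50 * total:
        return 4
    return 2
```

With 12 of 14 tasks passing on the first attempt (about 86%), the function lands in the 75–89% band.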
-
-### 3. Codex Gate Pass Rate (0–10)
-
-What percentage of waves passed the ftm-codex-gate on the first attempt?
-
-- **10**: All waves passed on first gate run.
-- **7–9**: One wave needed a fix-and-retry.
-- **4–6**: Multiple waves needed retries.
-- **1–3**: Most waves failed the gate at least once.
-- **0**: Every wave failed the gate.
-
-Evidence to cite: codex gate results per wave (pass/fail, failure types).
-
-### 4. Retry and Fix Count (0–10)
-
-How many total review-fix cycles were needed across all tasks and waves? Lower is better.
-
-Formula: score = max(0, 10 - (total_retries / task_count) * 5)
-
-- **10**: Zero retries.
-- **8**: Fewer than 0.5 retries per task on average.
-- **6**: 0.5–1.0 retries per task.
-- **4**: 1–2 retries per task.
-- **2**: 2–3 retries per task.
-- **0**: More than 3 retries per task on average.
-
-Evidence to cite: total retries, broken down by type (audit fix, codex gate retry, manual intervention).
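Editor's sketch (not part of the package): the Retry and Fix Count formula above, written as a Python function. It implements the stated formula only. The formula and the bullet breakpoints do not agree everywhere (for example, 2 retries per task yields 0 by the formula but falls in the 1–2 band listed as 4), and the skill file does not say which takes precedence. The function name `retry_score` is hypothetical.

```python
def retry_score(total_retries: int, task_count: int) -> float:
    """score = max(0, 10 - (total_retries / task_count) * 5), as stated in the skill file."""
    if task_count <= 0:
        raise ValueError("task_count must be positive")
    if total_retries < 0:
        raise ValueError("total_retries cannot be negative")
    return max(0.0, 10 - (total_retries / task_count) * 5)
```

For 7 retries across 14 tasks the ratio is 0.5, so the formula gives 10 - 2.5 = 7.5.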
-
-### 5. Execution Smoothness (0–10)
-
-Subjective but evidence-grounded assessment. Were there blockers, ambiguous plan steps, confusing errors, or required manual interventions?
-
-- **10**: Fully autonomous from start to finish. No blockers, no ambiguity, no manual steps.
-- **7–9**: Minor friction — one clarification needed, one unexpected error handled gracefully.
-- **4–6**: Moderate friction — multiple ambiguities, one blocker that paused execution, one manual intervention.
-- **1–3**: Significant friction — repeated blockers, unclear plan steps that caused wrong-direction work, multiple manual interventions.
-- **0**: Execution could not proceed without constant human steering.
-
-Evidence to cite: error log entries, any manual interventions recorded in PROGRESS.md, plan ambiguities encountered.
-
----
-
-## Report Generation
-
-### Step 1: Create retro directory
-
-```bash
-mkdir -p ~/.claude/ftm-retros/
-```
-
-### Step 2: Generate plan slug
-
-Take the plan title, lowercase it, replace spaces with hyphens, strip all non-alphanumeric characters except hyphens.
-
-Examples:
-- "FTM Ecosystem Expansion" → `ftm-ecosystem-expansion`
-- "Fix Auth Bug + Rate Limiting" → `fix-auth-bug-rate-limiting`
-- "v2.0 API Refactor" → `v20-api-refactor`
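Editor's sketch (not part of the package): the slug rule from Step 2 in Python. The three stated steps alone would turn "Fix Auth Bug + Rate Limiting" into `fix-auth-bug--rate-limiting`, so this sketch also collapses repeated hyphens and trims leading or trailing ones to match the listed examples. That extra step, the ASCII-only character class, and the name `plan_slug` are assumptions.

```python
import re


def plan_slug(title: str) -> str:
    """Lowercase, spaces to hyphens, drop everything except a-z, 0-9, and hyphens."""
    slug = title.lower().replace(" ", "-")
    slug = re.sub(r"[^a-z0-9-]", "", slug)
    # Assumption: collapse hyphen runs left behind by stripped symbols such as "+".
    slug = re.sub(r"-{2,}", "-", slug)
    return slug.strip("-")
```

Applied to the three example titles above, this reproduces the three slugs shown.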
-
-### Step 3: Check for past retros
-
-Before writing anything, check whether any `.md` files exist in `~/.claude/ftm-retros/`. If they do, read them all. You will use them for the Pattern Analysis section.
-
-### Step 4: Write the report
-
-Save to: `~/.claude/ftm-retros/{plan-slug}-{YYYY-MM-DD}.md`
-
-Use this exact format:
-
-```markdown
-# Retro: {Plan Title}
-
-**Date:** {YYYY-MM-DD}
-**Plan:** {absolute path to plan file}
-**Duration:** {total execution time, e.g. "47 minutes"}
-
-## Scores
-
-| Dimension | Score | Notes |
-|-----------|-------|-------|
-| Wave Parallelism | X/10 | {1-sentence justification with data} |
-| Audit Pass Rate | X/10 | {N}/{total} tasks first-pass |
-| Codex Gate Pass Rate | X/10 | {N}/{total} waves first-pass |
-| Retry/Fix Count | X/10 | {total retries} across {N} tasks |
-| Execution Smoothness | X/10 | {1-sentence justification} |
-
-**Overall: {sum}/50**
-
-## Raw Data
-
-- Tasks: {N}
-- Waves: {N}
-- Agents spawned: {N}
-- Audit findings: {N} total ({N} auto-fixed, {N} manual)
-- Codex gate results: Wave 1: pass | Wave 2: fail → pass | Wave 3: pass
-- Errors/blockers: {list any, or "none"}
-
-## What Went Well
-
-{2–4 specific observations, each grounded in a data point or task number.}
-
-Example format:
-- **Task 3 (auth middleware)** completed in a single commit with zero audit findings. The agent prompt had clear acceptance criteria and a scoped file list — the agent never wandered.
-- **Wave 2 parallelism** was fully utilized: all 4 tasks dispatched simultaneously, cutting estimated serial time from ~32 minutes to ~9 minutes.
-
-## What Was Slow
-
-{2–4 specific bottlenecks with timing data or retry counts where available.}
-
-Example format:
-- **ftm-audit Phase 1 (knip)** repeated full project analysis for each task in wave 3, even though tasks only touched 2–3 files each. Added ~40s × 5 tasks = ~3.5 minutes of unnecessary scanning.
-- **Task 7 needed 3 audit fix cycles** due to an import path that kept regenerating incorrectly. The agent prompt did not specify the alias configuration in tsconfig.paths.
-
-## Proposed Improvements
-
-{3–5 specific, actionable suggestions. Each must identify: which skill to change, what to change exactly, and why it would help.}
-
-Format each as:
-**N. {Short title}** — {Skill to change} — {Specific change} — {Expected impact}
-
-Examples:
-1. **Cache knip results within a wave** — ftm-audit — In Phase 1, check whether knip results are already cached for the current wave (via a temp file at `/tmp/ftm-knip-cache-{wave-id}.json`). Only re-run knip if the cache is missing or if the files changed by this task differ from cached scope. Expected: 3× speedup for ftm-audit on large projects with many tasks per wave.
-2. **Dispatch Instrumentor and Researcher in parallel** — ftm-debug — These two agents have no shared state and currently run sequentially. Dispatch them simultaneously. Expected: ~40% reduction in ftm-debug total runtime.
-3. **Add tsconfig.paths to agent context for TypeScript projects** — ftm-executor — When generating agent prompts for TypeScript tasks, include the relevant `paths` aliases from `tsconfig.json`. Expected: eliminates the import-alias regeneration loop that caused 3 retries on Task 7.
-
-## Pattern Analysis
-
-{Only include this section if past retros exist in ~/.claude/ftm-retros/}
-
-### Recurring Issues
-
-{List problems that appeared in 2 or more retros. Format: "Issue description — appeared in: retro-slug-1, retro-slug-2"}
-
-### Score Trends
-
-{Compare overall scores across retros. Are they improving, declining, or stable? Cite actual numbers.}
-
-Example: Overall scores: 32/50 → 38/50 → 41/50 across the last 3 retros. Parallelism and smoothness improving; audit pass rate stuck at 6/10 for all three runs.
-
-### Unaddressed Suggestions
-
-{List proposed improvements from past retros that have not yet been implemented. These get escalated — flag them explicitly.}
-
-Format: "**[ESCALATED]** {suggestion} — first proposed in {retro-slug-date}, appeared {N} times"
-```
-
----
-
-## Key Behaviors
-
-### Evidence-first scoring
-
-Every score needs a citation. "Tasks passed audit" is not a citation. "12/14 tasks passed audit on first attempt; Tasks 3 and 9 each needed one auto-fix cycle" is a citation. If the data to score a dimension is genuinely unavailable, note the gap explicitly and score conservatively (assume worst case for that dimension).
-
-### Improvement specificity
-
-"Improve parallelism" is not an improvement proposal. "Add a dependency pre-check step to ftm-executor Phase 2 that flags tasks with no declared dependencies as parallelizable, and warn when they are dispatched serially" is an improvement proposal. Every proposed improvement must be concrete enough that a future session could implement it from the description alone without asking clarifying questions.
-
-### Pattern escalation
-
-Recurring issues that have appeared in 3+ retros without being addressed should be flagged with `[ESCALATED - 3+ occurrences]` and moved to the top of the Proposed Improvements list. These are systemic problems, not one-off noise.
-
-### No vibes
-
-Do not write "the execution felt smooth" or "agents seemed efficient." Write "0 manual interventions were required and all errors were caught and auto-resolved by ftm-audit Phase 2." The report is read by future executions that need to calibrate behavior, not by humans looking for encouragement.
-
----
-
-## Output
-
-After saving the retro file, print to the user:
-
-```
-Retro saved: ~/.claude/ftm-retros/{filename}
-
-Overall: {score}/50
-Top issue: {single most impactful bottleneck in one sentence}
-Top suggestion: {single highest-value proposed improvement in one sentence}
-```
-
-Do not print the full report to the terminal — it lives in the file. The summary above is sufficient for the user to know the run completed and where to find details.
-
----
-
-## Micro-Reflection Mode
-
-Micro-reflections are lightweight experience entries recorded after significant actions — not just full executor runs. The mind triggers this mode via the `task_completed`, `error_encountered`, and `bug_fixed` events.
-
-### Trigger Events
-- `task_completed` — any task completion (micro through large)
-- `bug_fixed` — a bug was resolved
-- `error_encountered` — an unexpected error during execution
-- `code_committed` — a meaningful commit was made
-- `plan_generated` — a plan was created from brainstorming
-- `user_correction` — the user corrected the mind's approach
-
-### Reflection Format (Verbal RL)
-
-For each trigger, generate a structured reflection:
-
-"I [succeeded/failed/partially succeeded] at [task description] because [specific reason].
-Next time I should [concrete actionable adjustment].
-Confidence: [low/medium/high]"
-
-### Experience Entry Creation
-
-Write a structured experience entry to `~/.claude/ftm-state/blackboard/experiences/YYYY-MM-DD_task-slug.json` following the schema in blackboard-schema.md.
-
-Key fields:
-- `task_type`: derived from the task
-- `description`: 1-2 sentence summary
-- `approach`: what was tried
-- `outcome`: success/partial/failure
-- `lessons`: concrete, actionable takeaways — the verbal RL reflection above, decomposed into individual lesson strings
-- `complexity_estimated` vs `complexity_actual`: track both for calibration
-- `capabilities_used`: skills, MCPs, and agent types activated
-- `tags`: searchable labels
-- `confidence`: low for first-time observations, medium for confirmed patterns
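Editor's sketch (not part of the package): writing an experience entry to the `YYYY-MM-DD_task-slug.json` path named above. The key names come from the "Key fields" list. The value types, the sample values, and the helper names `experience_path` and `write_experience` are assumptions, because the authoritative schema lives in blackboard-schema.md and is not shown in this diff.

```python
import json
from datetime import date
from pathlib import Path


def experience_path(state_dir: Path, day: date, task_slug: str) -> Path:
    """Build <state_dir>/blackboard/experiences/YYYY-MM-DD_task-slug.json."""
    return state_dir / "blackboard" / "experiences" / f"{day.isoformat()}_{task_slug}.json"


def write_experience(state_dir: Path, day: date, task_slug: str, entry: dict) -> Path:
    """Serialize one experience entry as JSON and return the file path."""
    path = experience_path(state_dir, day, task_slug)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(entry, indent=2), encoding="utf-8")
    return path


# Illustrative entry only: every value below is made up for the example.
example_entry = {
    "task_type": "bugfix",
    "description": "Fixed an import alias that kept regenerating incorrectly.",
    "approach": "Added the tsconfig paths aliases to the agent prompt.",
    "outcome": "success",
    "lessons": ["Include path aliases in prompts for TypeScript tasks."],
    "complexity_estimated": "small",
    "complexity_actual": "medium",
    "capabilities_used": ["ftm-executor", "ftm-audit"],
    "tags": ["typescript", "imports"],
    "confidence": "low",
}
```

In real use `state_dir` would be `Path.home() / ".claude" / "ftm-state"`, matching the path in the text.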
-
-### Pattern Extraction
-
-After writing an experience, check for pattern promotion:
-
-1. Read `experiences/index.json`
-2. Count entries with overlapping `task_type` AND `tags` that share the same lesson theme
-3. If 3+ similar experiences exist with the same lesson → promote to `patterns.json`:
-   - Choose the appropriate category (codebase_insights, execution_patterns, user_behavior, recurring_issues)
-   - Set `confidence: "low"` for newly promoted patterns (3 occurrences)
-   - Raise to `"medium"` at 5+, `"high"` at 8+
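Editor's sketch (not part of the package): the promotion thresholds above as a Python function. It covers only the count-to-confidence mapping and does not attempt the "same lesson theme" matching, which the skill leaves to judgment. The name `promoted_confidence` is hypothetical.

```python
from typing import Optional


def promoted_confidence(occurrences: int) -> Optional[str]:
    """Return the pattern confidence for a count of similar experiences.

    None means the count is below the promotion threshold of 3.
    """
    if occurrences >= 8:
        return "high"
    if occurrences >= 5:
        return "medium"
    if occurrences >= 3:
        return "low"
    return None
```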
-
-### Pattern Decay
-
-Patterns that are not reinforced within 30 days should have their confidence reduced:
-- `high` → `medium`
-- `medium` → `low`
-- `low` → remove from patterns.json
-
-Check for decay when reading patterns.json during any blackboard operation.
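Editor's sketch (not part of the package): the decay rule above as a Python function. It assumes each pattern carries a timestamp of its last reinforcement (called `last_reinforced` here, a hypothetical name), that a pattern reinforced exactly 30 days ago has not yet decayed, and that a single check lowers confidence by one step at most.

```python
from datetime import datetime, timedelta
from typing import Optional

DECAY_AFTER = timedelta(days=30)

# One-step demotion table; None means "remove from patterns.json".
_DEMOTE = {"high": "medium", "medium": "low", "low": None}


def decay_confidence(confidence: str, last_reinforced: datetime, now: datetime) -> Optional[str]:
    """Return the confidence after a decay check, or None if the pattern should be removed."""
    if now - last_reinforced <= DECAY_AFTER:
        return confidence  # reinforced recently enough, no change
    return _DEMOTE[confidence]
```

The skill file does not say whether a pattern left untouched for 60 days should drop two levels at once; this sketch drops one level per check.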
-
-### Cold-Start Behavior
-
-During the first ~10 interactions (when `experiences/index.json` has `total_count < 10`):
-- Record EVERY completed task, even trivial ones
-- Set `confidence: "low"` on all entries
-- Prioritize breadth of recording over depth of analysis
-
-## Requirements
-
-- reference: `PROGRESS.md` | optional | executor progress log for auto-triggered mode
-- reference: `~/.claude/ftm-retros/` | optional | prior retro files for pattern analysis
-- reference: `references/protocols/SCORING-RUBRICS.md` | required | scoring scale breakpoints and evidence requirements
-- reference: `references/templates/REPORT-FORMAT.md` | required | retro report output template
-- reference: `~/.claude/ftm-state/blackboard/experiences/index.json` | optional | experience inventory for micro-reflection mode
-- reference: `~/.claude/ftm-state/blackboard/patterns.json` | optional | pattern registry for promotion and decay
-
-## Risk
-
-- level: low_write
-- scope: writes retro report to ~/.claude/ftm-retros/; writes experience files to blackboard; promotes patterns to patterns.json; does not modify project source files
-- rollback: delete retro report file; remove experience entry from blackboard
-
-## Approval Gates
-
-- trigger: pattern promotion triggered (3+ matching experiences) | action: auto-promote to patterns.json without user gate (learning system behavior)
-- complexity_routing: micro → auto | small → auto | medium → auto | large → auto | xl → auto
-
-## Fallbacks
-
-- condition: PROGRESS.md not found and manual mode | action: check ~/.claude/ftm-retros/ for most recent .md file; ask user which execution to review if multiple found
-- condition: execution context not provided by ftm-executor | action: reconstruct from PROGRESS.md or ask user for context
-- condition: scoring rubric file missing | action: apply built-in scoring heuristics from skill body
-- condition: experiences/index.json has fewer than 10 entries | action: cold-start mode — record every task, set all confidence to low
-
-## Capabilities
-
-- env: none required
-
-## Event Payloads
-
-### experience_recorded
-- skill: string — "ftm-retro"
-- experience_path: string — path to written experience file
-- task_type: string — type of task recorded
-- outcome: string — success | partial | failure
-- confidence: string — low | medium | high
-
-### pattern_discovered
-- skill: string — "ftm-retro"
-- pattern_name: string — name of the promoted pattern
-- category: string — codebase_insights | execution_patterns | user_behavior | recurring_issues
-- occurrence_count: number — number of experiences that triggered promotion
-- confidence: string — low | medium | high
-
-### task_completed
-- skill: string — "ftm-retro"
-- report_path: string — absolute path to saved retro report
-- overall_score: number — total score out of 50
-- top_issue: string — most impactful bottleneck identified
-- patterns_promoted: number — new patterns added to patterns.json
|
|
1
|
+
---
|
|
2
|
+
name: ftm-retro
|
|
3
|
+
description: Post-execution self-assessment skill. Automatically triggered after ftm-executor completes a plan. Scores execution across 5 dimensions, identifies what went well and what was slow, writes structured report with improvement suggestions. Use when user says "retro", "retrospective", "how did that go", "execution review", "self-assessment", "ftm retro".
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
## Events
|
|
7
|
+
|
|
8
|
+
### Emits
|
|
9
|
+
- `experience_recorded` — when a task outcome, fix attempt, or blocker is written to the blackboard experience log
|
|
10
|
+
- `pattern_discovered` — when a recurring pattern is identified from accumulated experiences and promoted to patterns.json
|
|
11
|
+
- `task_completed` — when the retro report is saved and the self-assessment session concludes
|
|
12
|
+
|
|
13
|
+
### Listens To
|
|
14
|
+
- `task_completed` — micro-reflection trigger: record the task outcome as a structured experience entry
|
|
15
|
+
- `error_encountered` — failure analysis: record the error context as a failure experience for pattern learning
|
|
16
|
+
- `bug_fixed` — success recording: record the fix details as a positive experience (what worked, what the root cause was)
|
|
17
|
+
|
|
18
|
+
# FTM Retro — Post-Execution Self-Assessment
|
|
19
|
+
|
|
20
|
+
Structured retrospective system for ftm-executor plans. Scores execution across 5 evidence-based dimensions, surfaces bottlenecks with specifics, and builds a cumulative pattern library that makes each future execution smarter.
|
|
21
|
+
|
|
22
|
+
## Why This Exists
|
|
23
|
+
|
|
24
|
+
Execution without reflection is a loop with no exit. FTM-retro closes the feedback cycle: every plan run generates a scored report, every report feeds a pattern library, and recurring issues get escalated until they're fixed. The goal is measurable improvement across executions, not vibes.
|
|
25
|
+
|
|
26
|
+
## Operating Modes
|
|
27
|
+
|
|
28
|
+
### Mode 1: Auto-triggered by ftm-executor (Phase 6.5)
|
|
29
|
+
|
|
30
|
+
FTM-executor calls this skill after all waves complete and the final commit is made. It passes execution context directly:
|
|
31
|
+
|
|
32
|
+
- Plan title and absolute path
|
|
33
|
+
- Task count and wave count
|
|
34
|
+
- Total agents spawned
|
|
35
|
+
- Per-task audit results: pass/fail/auto-fix counts per phase
|
|
36
|
+
- Codex gate results per wave (pass/fail + any failures found)
|
|
37
|
+
- Total execution duration
|
|
38
|
+
- Errors, blockers, or manual interventions that occurred
|
|
39
|
+
|
|
40
|
+
When invoked in this mode, proceed directly to scoring — all data is available.
|
|
41
|
+
|
|
42
|
+
### Mode 2: Manual (`/ftm retro`)
|
|
43
|
+
|
|
44
|
+
When invoked without execution context:
|
|
45
|
+
|
|
46
|
+
1. Search the current project for the most recent `PROGRESS.md` file. Read it fully to reconstruct what ran.
|
|
47
|
+
2. If no `PROGRESS.md` exists, check `~/.claude/ftm-retros/` for the most recent `.md` file and ask the user which execution they want to review.
|
|
48
|
+
3. Once context is established, proceed to scoring.
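
The lookup in step 1 can be sketched as below. This is a minimal illustration, assuming "most recent" means the latest modification time; the function name `find_latest_progress` is hypothetical and not part of the skill.

```python
from pathlib import Path
from typing import Optional


def find_latest_progress(project_root: Path) -> Optional[Path]:
    """Return the most recently modified PROGRESS.md under project_root, or None."""
    # Assumption: "most recent" is judged by file modification time.
    candidates = [p for p in project_root.rglob("PROGRESS.md") if p.is_file()]
    if not candidates:
        # Caller falls back to ~/.claude/ftm-retros/ as described in step 2.
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

A `None` result corresponds to the step 2 fallback, where the retro directory is checked and the user is asked which execution to review.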
|
|
49
|
+
|
|
50
|
+
Never ask the user to provide data you can find yourself. Read the files.
|
|
51
|
+
|
|
52
|
+
---
|
|
53
|
+
|
|
54
|
+
## Scoring Dimensions
|
|
55
|
+
|
|
56
|
+
Score each dimension 0–10 with a citation to specific data. Do not estimate without evidence — if data is missing, note it and score conservatively.
|
|
57
|
+
|
|
58
|
+
### 1. Wave Parallelism Efficiency (0–10)
|
|
59
|
+
|
|
60
|
+
Were independent tasks actually dispatched in parallel? Could more tasks have been parallelized?
|
|
61
|
+
|
|
62
|
+
- **10**: Every task that could run in parallel did. No serial bottlenecks where parallelism was possible.
|
|
63
|
+
- **7–9**: Minor serial steps that could have been parallel (e.g., final post-processing tasks run sequentially).
|
|
64
|
+
- **4–6**: Significant parallelism opportunities missed. Tasks that had no dependencies ran serially.
|
|
65
|
+
- **1–3**: Nearly all tasks ran serially despite having no dependencies on each other.
|
|
66
|
+
- **0**: Everything was serial regardless of dependency structure.
|
|
67
|
+
|
|
68
|
+
Evidence to cite: wave structure from PROGRESS.md, task dependency graph, agent dispatch timestamps.
|
|
69
|
+
|
|
70
|
+
### 2. Audit Pass Rate (0–10)
|
|
71
|
+
|
|
72
|
+
What percentage of tasks passed ftm-audit on the first attempt?
|
|
73
|
+
|
|
74
|
+
- **10**: 100% first-pass. No task needed a fix cycle.
|
|
75
|
+
- **8**: 90%+ first-pass. One or two tasks needed minor fixes.
|
|
76
|
+
- **6**: 75–89% first-pass.
|
|
77
|
+
- **4**: 50–74% first-pass. Roughly half the tasks needed audit remediation.
|
|
78
|
+
- **2**: Below 50% first-pass.
|
|
79
|
+
- **0**: Every single task failed audit on the first attempt.
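
The bands above map onto a small lookup. The sketch below is one possible reading, assuming the first-pass rate is computed as first-pass tasks divided by total tasks; `audit_pass_score` is a hypothetical helper name.

```python
def audit_pass_score(first_pass: int, total: int) -> int:
    """Map a first-pass audit count onto the 0/2/4/6/8/10 bands listed above."""
    if total <= 0:
        raise ValueError("total must be positive")
    if first_pass == 0:
        return 0   # every task failed audit on the first attempt
    if first_pass == total:
        return 10  # 100% first-pass
    rate = first_pass / total
    if rate >= 0.90:
        return 8
    if rate >= 0.75:
        return 6
    if rate >= 0.50:
        return 4
    return 2       # below 50% first-pass
```

For example, 12 of 14 tasks passing on the first attempt is about 86%, which lands in the 75–89% band and scores 6.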
|
|
80
|
+
|
|
81
|
+
Evidence to cite: per-task audit results (pass/fail counts, auto-fix counts, manual-fix counts).
|
|
82
|
+
|
|
83
|
+
### 3. Codex Gate Pass Rate (0–10)
|
|
84
|
+
|
|
85
|
+
What percentage of waves passed the ftm-codex-gate on the first attempt?
|
|
86
|
+
|
|
87
|
+
- **10**: All waves passed on first gate run.
|
|
88
|
+
- **7–9**: One wave needed a fix-and-retry.
|
|
89
|
+
- **4–6**: Multiple waves needed retries.
|
|
90
|
+
- **1–3**: Most waves failed the gate at least once.
|
|
91
|
+
- **0**: Every wave failed the gate.
|
|
92
|
+
|
|
93
|
+
Evidence to cite: codex gate results per wave (pass/fail, failure types).
|
|
94
|
+
|
|
95
|
+
### 4. Retry and Fix Count (0–10)
|
|
96
|
+
|
|
97
|
+
How many total review-fix cycles were needed across all tasks and waves? Lower is better.
|
|
98
|
+
|
|
99
|
+
Formula: score = max(0, 10 - (total_retries / task_count) * 5)
|
|
100
|
+
|
|
101
|
+
- **10**: Zero retries.
|
|
102
|
+
- **8**: Fewer than 0.5 retries per task on average.
|
|
103
|
+
- **6**: 0.5–1.0 retries per task.
|
|
104
|
+
- **4**: 1–2 retries per task.
|
|
105
|
+
- **2**: 2–3 retries per task.
|
|
106
|
+
- **0**: More than 3 retries per task on average.
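
A minimal sketch of the formula as written. Note that the formula and the bands do not agree at every point (the formula reaches 0 at 2 retries per task, while the bands give 4 for 1–2 retries per task), so this example implements the formula only; `retry_score` is a hypothetical name.

```python
def retry_score(total_retries: int, task_count: int) -> float:
    """score = max(0, 10 - (total_retries / task_count) * 5)"""
    if task_count <= 0:
        raise ValueError("task_count must be positive")
    retries_per_task = total_retries / task_count
    # Clamp at zero so heavy retry counts cannot produce a negative score.
    return max(0.0, 10 - retries_per_task * 5)
```

With 5 retries across 10 tasks (0.5 per task) this yields 7.5; with 10 retries across 10 tasks it yields 5.0.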
|
|
107
|
+
|
|
108
|
+
Evidence to cite: total retries, broken down by type (audit fix, codex gate retry, manual intervention).
|
|
109
|
+
|
|
110
|
+
### 5. Execution Smoothness (0–10)
|
|
111
|
+
|
|
112
|
+
Subjective but evidence-grounded assessment. Were there blockers, ambiguous plan steps, confusing errors, or required manual interventions?
|
|
113
|
+
|
|
114
|
+
- **10**: Fully autonomous from start to finish. No blockers, no ambiguity, no manual steps.
|
|
115
|
+
- **7–9**: Minor friction — one clarification needed, one unexpected error handled gracefully.
|
|
116
|
+
- **4–6**: Moderate friction — multiple ambiguities, one blocker that paused execution, one manual intervention.
|
|
117
|
+
- **1–3**: Significant friction — repeated blockers, unclear plan steps that caused wrong-direction work, multiple manual interventions.
|
|
118
|
+
- **0**: Execution could not proceed without constant human steering.
|
|
119
|
+
|
|
120
|
+
Evidence to cite: error log entries, any manual interventions recorded in PROGRESS.md, plan ambiguities encountered.
|
|
121
|
+
|
|
122
|
+
---
|
|
123
|
+
|
|
124
|
+
## Report Generation
|
|
125
|
+
|
|
126
|
+
### Step 1: Create retro directory
|
|
127
|
+
|
|
128
|
+
```bash
|
|
129
|
+
mkdir -p ~/.claude/ftm-retros/
|
|
130
|
+
```
|
|
131
|
+
|
|
132
|
+
### Step 2: Generate plan slug
|
|
133
|
+
|
|
134
|
+
Take the plan title, lowercase it, replace spaces with hyphens, strip all non-alphanumeric characters except hyphens, then collapse any run of consecutive hyphens into a single hyphen.
|
|
135
|
+
|
|
136
|
+
Examples:
|
|
137
|
+
- "FTM Ecosystem Expansion" → `ftm-ecosystem-expansion`
|
|
138
|
+
- "Fix Auth Bug + Rate Limiting" → `fix-auth-bug-rate-limiting`
|
|
139
|
+
- "v2.0 API Refactor" → `v20-api-refactor`
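
The slug rule can be sketched as follows. Collapsing repeated hyphens is needed to reproduce the second example, where stripping `+` would otherwise leave a double hyphen. `plan_slug` is a hypothetical name, and trimming leading or trailing hyphens is an extra assumption not stated in the rule.

```python
import re


def plan_slug(title: str) -> str:
    """Lowercase, hyphenate spaces, drop other punctuation, collapse hyphen runs."""
    slug = title.lower().replace(" ", "-")
    slug = re.sub(r"[^a-z0-9-]", "", slug)  # keep only a-z, 0-9 and hyphens
    slug = re.sub(r"-{2,}", "-", slug)      # "bug--rate" becomes "bug-rate"
    return slug.strip("-")                  # assumption: no leading/trailing hyphen
```

All three examples above come out as listed when passed through this function.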
|
|
140
|
+
|
|
141
|
+
### Step 3: Check for past retros
|
|
142
|
+
|
|
143
|
+
Before writing anything, check whether any `.md` files exist in `~/.claude/ftm-retros/`. If they do, read them all. You will use them for the Pattern Analysis section.
|
|
144
|
+
|
|
145
|
+
### Step 4: Write the report
|
|
146
|
+
|
|
147
|
+
Save to: `~/.claude/ftm-retros/{plan-slug}-{YYYY-MM-DD}.md`
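
A short sketch of how the save path might be assembled, assuming the slug has already been generated and the date is the run date in ISO format. `retro_report_path` is a hypothetical helper; it only builds the path and does not create the directory or the file.

```python
from datetime import date
from pathlib import Path


def retro_report_path(plan_slug: str, run_date: date) -> Path:
    """Build ~/.claude/ftm-retros/{plan-slug}-{YYYY-MM-DD}.md"""
    retro_dir = Path.home() / ".claude" / "ftm-retros"
    # date.isoformat() produces YYYY-MM-DD, matching the naming pattern above.
    return retro_dir / f"{plan_slug}-{run_date.isoformat()}.md"
```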
|
|
148
|
+
|
|
149
|
+
Use this exact format:
|
|
150
|
+
|
|
151
|
+
```markdown
|
|
152
|
+
# Retro: {Plan Title}
|
|
153
|
+
|
|
154
|
+
**Date:** {YYYY-MM-DD}
|
|
155
|
+
**Plan:** {absolute path to plan file}
|
|
156
|
+
**Duration:** {total execution time, e.g. "47 minutes"}
|
|
157
|
+
|
|
158
|
+
## Scores
|
|
159
|
+
|
|
160
|
+
| Dimension | Score | Notes |
|
|
161
|
+
|-----------|-------|-------|
|
|
162
|
+
| Wave Parallelism | X/10 | {1-sentence justification with data} |
|
|
163
|
+
| Audit Pass Rate | X/10 | {N}/{total} tasks first-pass |
|
|
164
|
+
| Codex Gate Pass Rate | X/10 | {N}/{total} waves first-pass |
|
|
165
|
+
| Retry/Fix Count | X/10 | {total retries} across {N} tasks |
|
|
166
|
+
| Execution Smoothness | X/10 | {1-sentence justification} |
|
|
167
|
+
|
|
168
|
+
**Overall: {sum}/50**
|
|
169
|
+
|
|
170
|
+
## Raw Data
|
|
171
|
+
|
|
172
|
+
- Tasks: {N}
|
|
173
|
+
- Waves: {N}
|
|
174
|
+
- Agents spawned: {N}
|
|
175
|
+
- Audit findings: {N} total ({N} auto-fixed, {N} manual)
|
|
176
|
+
- Codex gate results: Wave 1: pass | Wave 2: fail → pass | Wave 3: pass
|
|
177
|
+
- Errors/blockers: {list any, or "none"}
|
|
178
|
+
|
|
179
|
+
## What Went Well
|
|
180
|
+
|
|
181
|
+
{2–4 specific observations, each grounded in a data point or task number.}
|
|
182
|
+
|
|
183
|
+
Example format:
|
|
184
|
+
- **Task 3 (auth middleware)** completed in a single commit with zero audit findings. The agent prompt had clear acceptance criteria and a scoped file list — the agent never wandered.
|
|
185
|
+
- **Wave 2 parallelism** was fully utilized: all 4 tasks dispatched simultaneously, cutting estimated serial time from ~32 minutes to ~9 minutes.
|
|
186
|
+
|
|
187
|
+
## What Was Slow
|
|
188
|
+
|
|
189
|
+
{2–4 specific bottlenecks with timing data or retry counts where available.}
|
|
190
|
+
|
|
191
|
+
Example format:
|
|
192
|
+
- **ftm-audit Phase 1 (knip)** repeated full project analysis for each task in wave 3, even though tasks only touched 2–3 files each. Added ~40s × 5 tasks = ~3.5 minutes of unnecessary scanning.
|
|
193
|
+
- **Task 7 needed 3 audit fix cycles** due to an import path that kept regenerating incorrectly. The agent prompt did not specify the alias configuration in tsconfig.paths.
|
|
194
|
+
|
|
195
|
+
## Proposed Improvements
|
|
196
|
+
|
|
197
|
+
{3–5 specific, actionable suggestions. Each must identify: which skill to change, what to change exactly, and why it would help.}
|
|
198
|
+
|
|
199
|
+
Format each as:
|
|
200
|
+
**N. {Short title}** — {Skill to change} — {Specific change} — {Expected impact}
|
|
201
|
+
|
|
202
|
+
Examples:
|
|
203
|
+
1. **Cache knip results within a wave** — ftm-audit — In Phase 1, check whether knip results are already cached for the current wave (via a temp file at `/tmp/ftm-knip-cache-{wave-id}.json`). Only re-run knip if the cache is missing or if the files changed by this task differ from cached scope. Expected: 3× speedup for ftm-audit on large projects with many tasks per wave.
|
|
204
|
+
2. **Dispatch Instrumentor and Researcher in parallel** — ftm-debug — These two agents have no shared state and currently run sequentially. Dispatch them simultaneously. Expected: ~40% reduction in ftm-debug total runtime.
|
|
205
|
+
3. **Add tsconfig.paths to agent context for TypeScript projects** — ftm-executor — When generating agent prompts for TypeScript tasks, include the relevant `paths` aliases from `tsconfig.json`. Expected: eliminates the import-alias regeneration loop that caused 3 retries on Task 7.
|
|
206
|
+
|
|
207
|
+
## Pattern Analysis
|
|
208
|
+
|
|
209
|
+
{Only include this section if past retros exist in ~/.claude/ftm-retros/}
|
|
210
|
+
|
|
211
|
+
### Recurring Issues
|
|
212
|
+
|
|
213
|
+
{List problems that appeared in 2 or more retros. Format: "Issue description — appeared in: retro-slug-1, retro-slug-2"}
|
|
214
|
+
|
|
215
|
+
### Score Trends
|
|
216
|
+
|
|
217
|
+
{Compare overall scores across retros. Are they improving, declining, or stable? Cite actual numbers.}
|
|
218
|
+
|
|
219
|
+
Example: Overall scores: 32/50 → 38/50 → 41/50 across the last 3 retros. Parallelism and smoothness improving; audit pass rate stuck at 6/10 for all three runs.
|
|
220
|
+
|
|
221
|
+
### Unaddressed Suggestions
|
|
222
|
+
|
|
223
|
+
{List proposed improvements from past retros that have not yet been implemented. These get escalated — flag them explicitly.}
|
|
224
|
+
|
|
225
|
+
Format: "**[ESCALATED]** {suggestion} — first proposed in {retro-slug-date}, appeared {N} times"
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
---
|
|
229
|
+
|
|
230
|
+
## Key Behaviors
|
|
231
|
+
|
|
232
|
+
### Evidence-first scoring
|
|
233
|
+
|
|
234
|
+
Every score needs a citation. "Tasks passed audit" is not a citation. "12/14 tasks passed audit on first attempt; Tasks 3 and 9 each needed one auto-fix cycle" is a citation. If the data to score a dimension is genuinely unavailable, note the gap explicitly and score conservatively (assume worst case for that dimension).
|
|
235
|
+
|
|
236
|
+
### Improvement specificity
|
|
237
|
+
|
|
238
|
+
"Improve parallelism" is not an improvement proposal. "Add a dependency pre-check step to ftm-executor Phase 2 that flags tasks with no declared dependencies as parallelizable, and warn when they are dispatched serially" is an improvement proposal. Every proposed improvement must be concrete enough that a future session could implement it from the description alone without asking clarifying questions.
|
|
239
|
+
|
|
240
|
+
### Pattern escalation
|
|
241
|
+
|
|
242
|
+
Recurring issues that have appeared in 3+ retros without being addressed should be flagged with `[ESCALATED - 3+ occurrences]` and moved to the top of the Proposed Improvements list. These are systemic problems, not one-off noise.
|
|
243
|
+
|
|
244
|
+
### No vibes
|
|
245
|
+
|
|
246
|
+
Do not write "the execution felt smooth" or "agents seemed efficient." Write "0 manual interventions were required and all errors were caught and auto-resolved by ftm-audit Phase 2." The report is read by future executions that need to calibrate behavior, not by humans looking for encouragement.
|
|
247
|
+
|
|
248
|
+
---
|
|
249
|
+
|
|
250
|
+
## Output
|
|
251
|
+
|
|
252
|
+
After saving the retro file, print to the user:
|
|
253
|
+
|
|
254
|
+
```
|
|
255
|
+
Retro saved: ~/.claude/ftm-retros/{filename}
|
|
256
|
+
|
|
257
|
+
Overall: {score}/50
|
|
258
|
+
Top issue: {single most impactful bottleneck in one sentence}
|
|
259
|
+
Top suggestion: {single highest-value proposed improvement in one sentence}
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
Do not print the full report to the terminal — it lives in the file. The summary above is sufficient for the user to know the run completed and where to find details.
|
|
263
|
+
|
|
264
|
+
---
|
|
265
|
+
|
|
266
|
+
## Micro-Reflection Mode
|
|
267
|
+
|
|
268
|
+
Micro-reflections are lightweight experience entries recorded after significant actions — not just full executor runs. The mind triggers this mode via the `task_completed`, `error_encountered`, and `bug_fixed` events.
|
|
269
|
+
|
|
270
|
+
### Trigger Events
|
|
271
|
+
- `task_completed` — any task completion (micro through large)
|
|
272
|
+
- `bug_fixed` — a bug was resolved
|
|
273
|
+
- `error_encountered` — an unexpected error during execution
|
|
274
|
+
- `code_committed` — a meaningful commit was made
|
|
275
|
+
- `plan_generated` — a plan was created from brainstorming
|
|
276
|
+
- `user_correction` — the user corrected the mind's approach
|
|
277
|
+
|
|
278
|
+
### Reflection Format (Verbal RL)
|
|
279
|
+
|
|
280
|
+
For each trigger, generate a structured reflection:
|
|
281
|
+
|
|
282
|
+
"I [succeeded/failed/partially succeeded] at [task description] because [specific reason].
|
|
283
|
+
Next time I should [concrete actionable adjustment].
|
|
284
|
+
Confidence: [low/medium/high]"
|
|
285
|
+
|
|
286
|
+
### Experience Entry Creation
|
|
287
|
+
|
|
288
|
+
Write a structured experience entry to `~/.claude/ftm-state/blackboard/experiences/YYYY-MM-DD_task-slug.json` following the schema in blackboard-schema.md.
|
|
289
|
+
|
|
290
|
+
Key fields:
|
|
291
|
+
- `task_type`: derived from the task
|
|
292
|
+
- `description`: 1-2 sentence summary
|
|
293
|
+
- `approach`: what was tried
|
|
294
|
+
- `outcome`: success/partial/failure
|
|
295
|
+
- `lessons`: concrete, actionable takeaways — the verbal RL reflection above, decomposed into individual lesson strings
|
|
296
|
+
- `complexity_estimated` vs `complexity_actual`: track both for calibration
|
|
297
|
+
- `capabilities_used`: skills, MCPs, and agent types activated
|
|
298
|
+
- `tags`: searchable labels
|
|
299
|
+
- `confidence`: low for first-time observations, medium for confirmed patterns
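
For illustration, the key fields could be assembled like this. The authoritative schema lives in blackboard-schema.md, which is not shown here, so every field type below is an assumption, and `Experience`, `experience_filename`, and `to_json` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List

VALID_OUTCOMES = {"success", "partial", "failure"}


@dataclass
class Experience:
    # Field names come from the list above; the types are assumptions.
    task_type: str
    description: str
    approach: str
    outcome: str
    lessons: List[str]
    complexity_estimated: str
    complexity_actual: str
    capabilities_used: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    confidence: str = "low"  # first-time observations start low


def experience_filename(run_date: date, task_slug: str) -> str:
    """Return the YYYY-MM-DD_task-slug.json file name."""
    return f"{run_date.isoformat()}_{task_slug}.json"


def to_json(exp: Experience) -> str:
    """Serialize an entry, rejecting outcomes outside the documented set."""
    if exp.outcome not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {exp.outcome}")
    return json.dumps(asdict(exp), indent=2)
```

Keeping `lessons` as a list of strings mirrors the instruction to decompose the reflection into individual lesson strings.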
|
|
300
|
+
|
|
301
|
+
### Pattern Extraction
|
|
302
|
+
|
|
303
|
+
After writing an experience, check for pattern promotion:
|
|
304
|
+
|
|
305
|
+
1. Read `experiences/index.json`
|
|
306
|
+
2. Count entries with overlapping `task_type` AND `tags` that share the same lesson theme
|
|
307
|
+
3. If 3+ similar experiences exist with the same lesson → promote to `patterns.json`:
|
|
308
|
+
- Choose the appropriate category (codebase_insights, execution_patterns, user_behavior, recurring_issues)
|
|
309
|
+
- Set `confidence: "low"` for newly promoted patterns (3 occurrences)
|
|
310
|
+
- Raise to `"medium"` at 5+, `"high"` at 8+
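
The promotion thresholds in step 3 reduce to a simple mapping from occurrence count to confidence. A minimal sketch, with `promotion_confidence` as a hypothetical name; how "similar" experiences are matched is left out because the text does not define it precisely.

```python
from typing import Optional


def promotion_confidence(occurrences: int) -> Optional[str]:
    """Return the pattern confidence for a count of similar experiences.

    None means the count is below the promotion threshold of 3.
    """
    if occurrences >= 8:
        return "high"
    if occurrences >= 5:
        return "medium"
    if occurrences >= 3:
        return "low"
    return None
```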
|
|
311
|
+
|
|
312
|
+
### Pattern Decay
|
|
313
|
+
|
|
314
|
+
Patterns that are not reinforced within 30 days should have their confidence reduced:
|
|
315
|
+
- `high` → `medium`
|
|
316
|
+
- `medium` → `low`
|
|
317
|
+
- `low` → remove from patterns.json
|
|
318
|
+
|
|
319
|
+
Check for decay when reading patterns.json during any blackboard operation.
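
The decay rule can be sketched as a single step down per check. This assumes each pattern stores the date it was last reinforced (`last_reinforced` is a hypothetical field name) and that a pattern drops only one level per check even if several 30-day windows have passed; the text does not say which.

```python
from datetime import date, timedelta
from typing import Optional

DECAY_AFTER = timedelta(days=30)
# None means the pattern should be removed from patterns.json.
NEXT_LEVEL = {"high": "medium", "medium": "low", "low": None}


def decayed_confidence(
    confidence: str, last_reinforced: date, today: date
) -> Optional[str]:
    """Return the confidence after applying at most one decay step."""
    if today - last_reinforced <= DECAY_AFTER:
        return confidence  # reinforced within 30 days: unchanged
    return NEXT_LEVEL[confidence]
```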
|
|
320
|
+
|
|
321
|
+
### Cold-Start Behavior
|
|
322
|
+
|
|
323
|
+
During the first ~10 interactions (when `experiences/index.json` has `total_count < 10`):
|
|
324
|
+
- Record EVERY completed task, even trivial ones
|
|
325
|
+
- Set `confidence: "low"` on all entries
|
|
326
|
+
- Prioritize breadth of recording over depth of analysis
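
A small sketch of the cold-start check, assuming `experiences/index.json` has been parsed into a dict with a `total_count` key as described above. `is_cold_start` and `entry_confidence` are hypothetical helpers, and treating a missing `total_count` as zero is an assumption.

```python
def is_cold_start(index: dict) -> bool:
    """True while fewer than 10 experiences have been recorded."""
    # Assumption: a missing total_count is treated as an empty inventory.
    return index.get("total_count", 0) < 10


def entry_confidence(index: dict, proposed: str) -> str:
    """Force "low" confidence during cold start, else keep the proposed value."""
    return "low" if is_cold_start(index) else proposed
```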
|
|
327
|
+
|
|
328
|
+
## Requirements
|
|
329
|
+
|
|
330
|
+
- reference: `PROGRESS.md` | optional | executor progress log for auto-triggered mode
|
|
331
|
+
- reference: `~/.claude/ftm-retros/` | optional | prior retro files for pattern analysis
|
|
332
|
+
- reference: `references/protocols/SCORING-RUBRICS.md` | required | scoring scale breakpoints and evidence requirements
|
|
333
|
+
- reference: `references/templates/REPORT-FORMAT.md` | required | retro report output template
|
|
334
|
+
- reference: `~/.claude/ftm-state/blackboard/experiences/index.json` | optional | experience inventory for micro-reflection mode
|
|
335
|
+
- reference: `~/.claude/ftm-state/blackboard/patterns.json` | optional | pattern registry for promotion and decay
|
|
336
|
+
|
|
337
|
+
## Risk
|
|
338
|
+
|
|
339
|
+
- level: low_write
|
|
340
|
+
- scope: writes retro report to ~/.claude/ftm-retros/; writes experience files to blackboard; promotes patterns to patterns.json; does not modify project source files
|
|
341
|
+
- rollback: delete retro report file; remove experience entry from blackboard
|
|
342
|
+
|
|
343
|
+
## Approval Gates
|
|
344
|
+
|
|
345
|
+
- trigger: pattern promotion triggered (3+ matching experiences) | action: auto-promote to patterns.json without user gate (learning system behavior)
|
|
346
|
+
- complexity_routing: micro → auto | small → auto | medium → auto | large → auto | xl → auto
|
|
347
|
+
|
|
348
|
+
## Fallbacks
|
|
349
|
+
|
|
350
|
+
- condition: PROGRESS.md not found and manual mode | action: check ~/.claude/ftm-retros/ for most recent .md file; ask user which execution to review if multiple found
|
|
351
|
+
- condition: execution context not provided by ftm-executor | action: reconstruct from PROGRESS.md or ask user for context
|
|
352
|
+
- condition: scoring rubric file missing | action: apply built-in scoring heuristics from skill body
|
|
353
|
+
- condition: experiences/index.json has fewer than 10 entries | action: cold-start mode — record every task, set all confidence to low
|
|
354
|
+
|
|
355
|
+
## Capabilities
|
|
356
|
+
|
|
357
|
+
- env: none required
|
|
358
|
+
|
|
359
|
+
## Event Payloads
|
|
360
|
+
|
|
361
|
+
### experience_recorded
|
|
362
|
+
- skill: string — "ftm-retro"
|
|
363
|
+
- experience_path: string — path to written experience file
|
|
364
|
+
- task_type: string — type of task recorded
|
|
365
|
+
- outcome: string — success | partial | failure
|
|
366
|
+
- confidence: string — low | medium | high
|
|
367
|
+
|
|
368
|
+
### pattern_discovered
|
|
369
|
+
- skill: string — "ftm-retro"
|
|
370
|
+
- pattern_name: string — name of the promoted pattern
|
|
371
|
+
- category: string — codebase_insights | execution_patterns | user_behavior | recurring_issues
|
|
372
|
+
- occurrence_count: number — number of experiences that triggered promotion
|
|
373
|
+
- confidence: string — low | medium | high
|
|
374
|
+
|
|
375
|
+
### task_completed
|
|
376
|
+
- skill: string — "ftm-retro"
|
|
377
|
+
- report_path: string — absolute path to saved retro report
|
|
378
|
+
- overall_score: number — total score out of 50
|
|
379
|
+
- top_issue: string — most impactful bottleneck identified
|
|
380
|
+
- patterns_promoted: number — new patterns added to patterns.json
|