@dv.nghiem/flowdeck 0.4.11 → 0.4.12

@@ -0,0 +1,297 @@
---
name: context-steward
description: Unified context lifecycle for FlowDeck sessions, covering ingest, filter, prune, protect, summarize, and persist with telemetry.
origin: FlowDeck
---

# Context Steward

FlowDeck sessions accumulate noise. Tool outputs, rule loads, failed attempts, and multi-agent chatter fill the context window. This skill defines a unified lifecycle to keep context lean, relevant, and recoverable.

## When to Activate

Activate when:
- Context exceeds 50% of the window and response quality drops
- Multiple agents have contributed outputs in one session
- Tool results are large (logs, diffs, file reads)
- You are about to switch phases (plan → execute → verify)
- A `/fd-checkpoint` is imminent

## Core Principles

- **Context is a liability**: every token not serving the current task is a distraction
- **Prune with purpose**: never drop what the agent needs to continue
- **Protect the thread**: user intent, active plans, and safety records are non-negotiable
- **Telemetry is cheap**: record token counts before and after every prune so patterns are visible later

---

## Unified Context Lifecycle

### 1. Ingest

Everything that enters the session window:

| Source | Typical Size | Risk Level |
|--------|-------------|------------|
| User prompts | Small | Low (never prune) |
| Tool results (read, edit, bash) | Variable | High (can be huge) |
| Skill loads | Medium | Medium (load once per session) |
| Rule injections | Small-Medium | Medium (already stage-gated) |
| Agent outputs | Medium | Medium (may contain plans or decisions) |
| Memory queries | Small | Low |
| `codegraph` results | Small-Medium | Low |

**Ingest discipline**: Before any large output enters context, ask whether it will be needed within the next 5 turns. If not, summarize it or redirect it to a file.
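
A minimal TypeScript sketch of the ingest check follows. It assumes the agent supplies its own judgment of whether the output is needed within 5 turns. It also uses character counts as a rough proxy for tokens. `LARGE_OUTPUT_CHARS`, `REDIRECT_OUTPUT_CHARS`, and `ingestAction` are hypothetical names, not FlowDeck APIs.

```typescript
// Hypothetical thresholds (characters as a rough proxy for tokens).
// FlowDeck does not define these constants; tune them for your model's window.
const LARGE_OUTPUT_CHARS = 8_000;
const REDIRECT_OUTPUT_CHARS = 40_000;

type IngestAction = "inline" | "summarize" | "redirect-to-file";

// Hypothetical helper: decide how a tool output should enter the context window.
function ingestAction(text: string, neededInNext5Turns: boolean): IngestAction {
  // Small outputs, and large ones the agent needs soon, enter context unchanged.
  if (text.length <= LARGE_OUTPUT_CHARS || neededInNext5Turns) return "inline";
  // Large and not needed soon: summarize it, or park very large output in a file.
  return text.length > REDIRECT_OUTPUT_CHARS ? "redirect-to-file" : "summarize";
}
```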

---

### 2. Filter

FlowDeck already gates rules by stage. Extend this discipline to all context sources.

| Current Stage | Load | Defer / Skip |
|---------------|------|--------------|
| `discuss` | Behavioral rules, `AGENTS.md` | Coding standards, testing rules |
| `plan` | Architecture rules, planning rules | Security rules, lint rules |
| `execute` | Coding standards, language patterns, security | Debug rules (until needed) |
| `verify` | Testing, security, linting rules | Planning rules |
| `fix-bug` | Debug, testing rules | Architecture rules |

**Filter action**: If a skill or rule is not relevant to the current stage, do not load it. Use `load-rules` on demand rather than pre-loading.
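
The stage table can be encoded as a lookup. This is an illustrative sketch. The category strings (`"behavioral"`, `"security"`, and so on) are hypothetical labels standing in for the rule groups in the table. They are not identifiers that `load-rules` is known to accept.

```typescript
// Hypothetical encoding of the stage table; the category labels are illustrative,
// not identifiers that FlowDeck's `load-rules` is known to use.
type Stage = "discuss" | "plan" | "execute" | "verify" | "fix-bug";

const LOAD_BY_STAGE: Record<Stage, readonly string[]> = {
  discuss: ["behavioral", "agents-md"],
  plan: ["architecture", "planning"],
  execute: ["coding-standards", "language-patterns", "security"],
  verify: ["testing", "security", "linting"],
  "fix-bug": ["debug", "testing"],
};

// Preload only what the current stage lists; everything else is deferred
// until the agent asks for it on demand.
function shouldPreload(stage: Stage, category: string): boolean {
  return LOAD_BY_STAGE[stage].includes(category);
}
```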

---

### 3. Prune: Three-Pass Pipeline

Pruning is surgical. It runs when context exceeds 50% of the window or when switching tasks.

#### Pass 1: Deduplicate

**What gets pruned**:
- Identical tool outputs repeated across agents (e.g., two agents reading the same file)
- Duplicate skill loads (same skill invoked twice with identical parameters)
- Redundant `codegraph` queries returning the same symbols

**What stays**:
- First occurrence of any unique output
- Outputs with different parameters, or re-reads whose content changed since the earlier read
- User prompts (never deduplicated)

**How to invoke**:
- Agent-triggered: after parallel agent execution, the orchestrator deduplicates before presenting results
- Manual: agents may call a deduplication routine directly; there is no dedicated slash command

**FlowDeck-native pattern**: When `@parallel-coordinator` dispatches 3 agents that all read `src/config.ts`, keep only the first read result and have the other agents reference it by index.
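
Below is a sketch of Pass 1. It assumes each tool result carries the calling agent, the tool name, its serialized parameters, and its output. The dedup key includes the output itself, so a re-read that returns changed content is kept. `ToolResult`, `dedupe`, and the reference shape are hypothetical. User prompts are never passed through this function.

```typescript
// Hypothetical shapes for illustration; FlowDeck's orchestrator may represent results differently.
interface ToolResult {
  agent: string;   // which agent made the call
  tool: string;    // e.g. "read"
  params: string;  // serialized parameters, e.g. the file path
  output: string;
}

type Kept =
  | { kind: "result"; result: ToolResult }
  | { kind: "ref"; agent: string; refIndex: number }; // index of an earlier "result" entry

// Keep the first occurrence of each (tool, params, output) triple and replace
// later copies with a reference to the index of the kept copy.
function dedupe(results: ToolResult[]): Kept[] {
  const firstIndex = new Map<string, number>();
  const out: Kept[] = [];
  for (const r of results) {
    const key = JSON.stringify([r.tool, r.params, r.output]);
    const seen = firstIndex.get(key);
    if (seen === undefined) {
      firstIndex.set(key, out.length);
      out.push({ kind: "result", result: r });
    } else {
      out.push({ kind: "ref", agent: r.agent, refIndex: seen });
    }
  }
  return out;
}
```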

---

#### Pass 2: Purge Errors

**What gets pruned**:
- Failed tool executions that have been superseded by a later success
- Stack traces from resolved errors
- Old build failures after a successful build
- Retry loops where the final attempt succeeded

**What stays**:
- The most recent failure if the issue is still unresolved
- Failures linked to an active `FAILURES.json` entry
- Errors that inform the current debugging session

**How to invoke**:
- Agent-triggered: `@debug-specialist` purges resolved error chains after the root cause is found
- Automatic: after `bun test` exits 0, purge prior failing test output

**FlowDeck-native pattern**: If `@build-error-resolver` fixes a type error, purge the type-checker output but keep the fix description in `SESSION_SUMMARY.md`.
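
Below is a sketch of Pass 2. It assumes every attempt is tagged with a key naming what was attempted (for example, the command line) and an optional link to a `FAILURES.json` entry. It keeps successes and linked failures. For each unresolved key, it keeps only the latest failure. "Errors that inform the current debugging session" still needs agent judgment and is not modeled. All names are hypothetical.

```typescript
// Hypothetical attempt record; not a FlowDeck type.
interface Attempt {
  key: string;         // what was attempted, e.g. "bun test" or "tsc --noEmit"
  ok: boolean;
  output: string;
  failureId?: string;  // set when linked to a FAILURES.json entry
}

// Drop every failure that a later attempt with the same key superseded,
// unless it is linked to a FAILURES.json entry.
function purgeResolvedErrors(attempts: Attempt[]): Attempt[] {
  const lastIndex = new Map<string, number>();
  attempts.forEach((a, i) => lastIndex.set(a.key, i));
  return attempts.filter((a, i) => {
    if (a.ok || a.failureId !== undefined) return true;
    // A plain failure survives only as the latest attempt for its key (still unresolved).
    return lastIndex.get(a.key) === i;
  });
}
```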

---

#### Pass 3: Compress Stale Ranges

**What gets pruned**:
- Old conversation turns (more than 10 turns back) not touching current files
- Large file reads from modules no longer being edited
- Tool outputs from completed sub-tasks
- Agent outputs for tasks already merged or abandoned

**What stays**:
- Last 2 user messages (see Protect, Category C)
- Active plan and `STATE.md` content
- Decisions and failures linked to current work
- Any output touching files in the current `git diff`

**How to invoke**:
- `/fd-checkpoint`: full session save + context clear
- Agent-triggered: `@orchestrator` compresses after each wave in a multi-wave plan

**FlowDeck-native pattern**: Replace 20 turns of exploratory editing on `src/auth.ts` with a single synthetic summary: "Explored 3 approaches for token refresh; selected sliding-window with 15-min expiry. See DECISIONS.jsonl:auth-refresh-2026-06-10."
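
Candidate selection for Pass 3 might look like the sketch below. It makes three assumptions:
- Turns are indexed from the start of the session.
- The Protect step has already marked protected turns.
- `diffFiles` holds the paths reported by `git diff --name-only`.

The `Turn` shape and `staleTurns` are hypothetical.

```typescript
// Hypothetical turn record; not a FlowDeck type.
interface Turn {
  index: number;        // position in the session, 0 = first turn
  files: string[];      // files the turn read or edited
  isProtected: boolean; // result of the Protect step
}

// A turn is a compression candidate if it is more than 10 turns old, is not
// protected, and touches none of the files in the current `git diff`.
function staleTurns(turns: Turn[], currentTurn: number, diffFiles: Set<string>): Turn[] {
  return turns.filter(
    (t) =>
      currentTurn - t.index > 10 &&
      !t.isProtected &&
      !t.files.some((f) => diffFiles.has(f)),
  );
}
```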

---

### 4. Protect

Protected patterns are immune to all pruning passes.

#### Category A: Core System

| Pattern | Why Protected |
|---------|--------------|
| Orchestrator rules (`agent-orchestration.md`) | Routing depends on them |
| `AGENTS.md` | Defines agent boundaries and non-negotiables |
| `STATE.md` | Current phase, plan, blockers |
| `PLAN.md` (active) | Success criteria and step order |

#### Category B: Safety

| Pattern | Why Protected |
|---------|--------------|
| `.codebase/DECISIONS.jsonl` | Rationale for current design |
| `.codebase/FAILURES.json` | Prevents repeating failed approaches |
| `.codebase/CONSTRAINTS.md` | Architecture guards |

#### Category C: User Intent

| Pattern | Why Protected |
|---------|--------------|
| Last 2 user messages | Most recent instructions |
| Active plan reference | What the user asked for |
| Explicitly pinned context | User said "keep this in mind" |

#### Category D: Tool-Specific (In-Flight)

| Pattern | Why Protected |
|---------|--------------|
| `write` output for current file | Must verify what was written |
| `edit` diff for current change | Must confirm diff is correct |
| `bash` output for running command | Command may still be relevant |

**Protection rule**: If a tool operation is in flight, or its result is likely to be referenced within the next 3 turns, do not prune it. Pin it until the agent acknowledges the result.
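
The four categories can be expressed as a single predicate. This sketch assumes each context item records its turn, its role, the file it shows (if any), and pin and in-flight flags. It simplifies "active `PLAN.md`" to any `PLAN.md`. The item shape and `isProtected` are hypothetical.

```typescript
// Hypothetical protection check; categories A and B are matched by file name only,
// and "active PLAN.md" is simplified to any PLAN.md.
const PROTECTED_FILES = new Set<string>([
  "agent-orchestration.md",
  "AGENTS.md",
  "STATE.md",
  "PLAN.md",
  ".codebase/DECISIONS.jsonl",
  ".codebase/FAILURES.json",
  ".codebase/CONSTRAINTS.md",
]);

interface ContextItem {
  turn: number;
  role: "user" | "agent" | "tool";
  file?: string;      // file the item loads or displays, if any
  pinned: boolean;    // explicitly pinned context, or an unfinished instruction
  inFlight: boolean;  // tool call still running, or its result not yet acknowledged
}

function isProtected(item: ContextItem, userTurns: number[]): boolean {
  const lastTwoUserTurns = userTurns.slice(-2);
  return (
    (item.file !== undefined && PROTECTED_FILES.has(item.file)) || // A: core system, B: safety
    (item.role === "user" && lastTwoUserTurns.includes(item.turn)) || // C: last 2 user messages
    item.pinned || // C: explicitly pinned context
    item.inFlight // D: in-flight tool results
  );
}
```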

---

### 5. Summarize

After pruning, replace removed ranges with synthetic summary messages.

**Summary format**:

```markdown
[Context Steward] Pruned N turns (M tokens). Retained: [list].
Summary: [1-2 sentences]. Evidence: [link to DECISIONS.jsonl or SESSION_SUMMARY.md].
```
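
Here is a small formatter for the summary message above. It assumes the pruning step reports four things: the counts, the retained items, a 1-2 sentence summary ending in a period, and an evidence link. It refuses to emit a summary without evidence, following this skill's anti-pattern rule that a summary without a link is a rumor. `PruneResult` and `formatSummary` are hypothetical names.

```typescript
// Hypothetical input shape for the summary formatter.
interface PruneResult {
  turns: number;       // N: turns removed
  tokens: number;      // M: tokens removed
  retained: string[];  // what was kept, e.g. ["STATE.md", "last 2 user messages"]
  summary: string;     // 1-2 sentences, ending with a period
  evidence: string;    // e.g. "DECISIONS.jsonl:auth-refresh-2026-06-10"
}

function formatSummary(r: PruneResult): string {
  if (r.evidence.trim() === "") {
    // A summary without a link is a rumor: refuse to emit one.
    throw new Error("summary needs an evidence link");
  }
  return (
    `[Context Steward] Pruned ${r.turns} turns (${r.tokens} tokens). Retained: ${r.retained.join(", ")}.\n` +
    `Summary: ${r.summary} Evidence: ${r.evidence}.`
  );
}
```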

**What to summarize**:
- Exploratory edits → decision + chosen approach
- Research → conclusion + source
- Multi-agent discussion → consensus + dissent (if relevant)
- Build/test cycles → final status + any remaining failures

**What NOT to summarize**:
- Active user instructions (keep verbatim)
- In-flight tool operations (keep verbatim)
- Unresolved errors (keep verbatim until fixed)

---

### 6. Persist

Write pruning stats to `.codebase/TELEMETRY.jsonl` for pattern analysis.

**Entry format**:

```json
{"ts":"2026-06-10T14:32:00Z","event":"context-prune","session_id":"abc123","before_tokens":85000,"after_tokens":42000,"passes":{"dedup":12,"purge_errors":8,"compress":25},"protected":15,"summary_tokens":180}
```

**Why persist**: Over time, telemetry reveals which agents produce the most noise, which skills bloat context, and when pruning is most effective.
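
The entry format can be pinned down as a type so every line written to `.codebase/TELEMETRY.jsonl` has the same shape. This is a sketch that assumes the caller fills in the counts. The resulting line can then be appended to the file, for example with Node's `fs.appendFileSync`. `PruneTelemetry` and `telemetryLine` are hypothetical names.

```typescript
// Hypothetical type mirroring the entry format above.
interface PruneTelemetry {
  ts: string; // ISO-8601 timestamp
  event: "context-prune";
  session_id: string;
  before_tokens: number;
  after_tokens: number;
  passes: { dedup: number; purge_errors: number; compress: number };
  protected: number;
  summary_tokens: number;
}

// Serialize one entry as a single JSON Lines record: one object, one trailing newline.
// Key order follows the object literal, so building entries in the documented order
// reproduces the documented line exactly.
function telemetryLine(entry: PruneTelemetry): string {
  return JSON.stringify(entry) + "\n";
}
```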

---

## Decision Matrix: Prune vs Compact vs Checkpoint

| Situation | Context Used | Action | Command |
|-----------|--------------|--------|---------|
| Minor bloat, same task | 40-60% | Prune (3-pass) | Agent-triggered |
| Major bloat, same task | 60-80% | Compact + prune | Agent-triggered, then `/fd-checkpoint` |
| Task complete, new task next | Any | Checkpoint | `/fd-checkpoint` |
| Phase switch (plan → execute) | Any | Compact | Agent-triggered summary |
| Multi-wave plan, wave done | Any | Compact | `@orchestrator` summarizes wave |
| Session > 1 hour | Any | Checkpoint | `/fd-checkpoint` |
| Context > 80% | > 80% | Checkpoint immediately | `/fd-checkpoint` |

- **Prune**: Remove noise, keep the session alive.
- **Compact**: Replace ranges with summaries, keep the session alive.
- **Checkpoint**: Save state, start a fresh session.
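
Below is a sketch of the matrix as a function. The precedence is an assumption that the table does not state. Checkpoint triggers win, followed by major bloat (compact + prune also covers a plain compact), then phase or wave compaction, then minor-bloat pruning. Boundary handling is also a choice made here: 60% counts as major bloat, and exactly 80% does not trigger a checkpoint. `SessionState` and `chooseAction` are hypothetical.

```typescript
// Hypothetical encoding of the decision matrix.
type Action = "none" | "prune" | "compact+prune" | "compact" | "checkpoint";

interface SessionState {
  usage: number;          // fraction of the context window in use, 0..1
  taskComplete: boolean;
  phaseSwitch: boolean;   // e.g. plan → execute
  waveDone: boolean;      // a wave of a multi-wave plan just finished
  sessionMinutes: number;
}

function chooseAction(s: SessionState): Action {
  // Checkpoint rows first: they end the session, so they override everything else.
  if (s.usage > 0.8 || s.taskComplete || s.sessionMinutes > 60) return "checkpoint";
  // Major bloat: compact + prune already includes a plain compact.
  if (s.usage >= 0.6) return "compact+prune";
  if (s.phaseSwitch || s.waveDone) return "compact";
  if (s.usage >= 0.4) return "prune";
  return "none";
}
```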

---

## Anti-Patterns

### Do Not Prune Active User Instructions

The last 2 user messages are sacred. If they contain a multi-part instruction, keep all parts until the agent has addressed each one.

**Bad**: Prune turn 5, where the user said "also fix the test", because it is 10 turns back, while the agent is still working on the first part of the request.

**Good**: Pin the instruction and unpin it once the agent confirms that part is done.

### Do Not Duplicate Tool Results Across Agents

When `@parallel-coordinator` dispatches agents, each agent may read the same file. Do not carry all N copies forward.

**Bad**: 3 agents read `src/db.ts`; all 3 full file contents stay in context.

**Good**: Keep the first read. Subsequent agents reference it by citation.

### Do Not Compress Without Preserving Evidence Links

A summary without a link is a rumor. Always attach evidence.

**Bad**: "We decided on approach A."

**Good**: "Selected approach A (sliding-window expiry). See DECISIONS.jsonl:auth-refresh-2026-06-10."

---

## FlowDeck Tool Reference

| Tool / Command | Role in Context Steward |
|----------------|------------------------|
| `codegraph` | Find symbols without reading full files; reduces ingest size |
| `memory` | Query past decisions instead of loading the full `DECISIONS.jsonl` |
| `decision-trace` | Record decisions before compressing the discussion that led to them |
| `/fd-checkpoint` | Full save + clear; use above 80% or at task boundaries |
| `/fd-resume` | Load summarized context instead of full history |
| `load-rules` | Stage-gated rule loading; reduces ingest at session start |

---

## Cross-Reference

| Skill | Relationship |
|-------|-------------|
| [`context-budget`](../context-budget/SKILL.md) | Sets thresholds and audit practices. Context Steward executes the pruning when those thresholds are breached. |
| [`session-persistence`](../session-persistence/SKILL.md) | Defines what to save at session boundaries. Context Steward decides what to prune before that save happens. |
| [`strategic-compact`](../strategic-compact/SKILL.md) | Advises on when to compact manually. Context Steward automates compaction as part of the prune pipeline. |
| [`context-guard`](../context-guard/SKILL.md) | Defines boundary checks. Context Steward uses those boundaries to decide what is protected during pruning. |

---

## Quick Reference

```
Ingest → Filter by stage → Prune (dedup → purge → compress)
                ↓                             ↓
           load-rules    Protect core / safety / intent / in-flight
                ↓                             ↓
         Skip irrelevant           Summarize pruned ranges
                ↓                             ↓
          rules/skills                Persist telemetry
```

**Protected always**: `agent-orchestration.md`, `AGENTS.md`, `STATE.md`, active `PLAN.md`, `.codebase/DECISIONS.jsonl`, `.codebase/FAILURES.json`, `.codebase/CONSTRAINTS.md`, last 2 user messages, pinned context, in-flight tool results.

**Prune first**: Duplicate reads, resolved errors, stale exploratory turns.

**Compact when**: Phase switch, or a wave of a multi-wave plan is done.

**Checkpoint when**: Context > 80%, task complete, or session > 1 hour.
@@ -61,6 +61,141 @@ The `decision-trace-hook` auto-records a minimal entry for every write/edit. The
{ "action": "query", "query": { "risk_level": "high", "limit": 10 } }
```

## Decision Evolution

Decisions are not static. They change as requirements shift, new evidence appears, or better alternatives emerge. Track the full lifecycle:

### `alternatives_considered`

List every option evaluated and why it was rejected or accepted. This prevents re-litigating old choices.

```json
{
  "alternatives_considered": [
    "Use PostgreSQL full-text search (rejected: poor ranking for our use case)",
    "Add Elasticsearch (rejected: operational overhead exceeds benefit)",
    "Hybrid: Postgres for exact match, in-memory trie for prefix (accepted: best latency/cost tradeoff)"
  ]
}
```

### `superseded_by`

When a later decision replaces this one, link forward so readers of the older entry can find the decision that replaced it.

```json
{
  "id": "cache-strategy-v1",
  "superseded_by": "cache-strategy-v2",
  "rationale": "Initial Redis caching for user sessions"
}
```

When querying, always check whether an entry has `superseded_by` set. If it does, follow the link and read the newer decision instead.
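
Following the links can be sketched as below, assuming the ledger has already been loaded into an array of entries. The helper returns the newest decision in the chain, stops at a dangling `superseded_by` target, and throws on a cycle. `resolveCurrent` is a hypothetical helper, not an action of the `decision-trace` tool.

```typescript
// Hypothetical minimal view of a ledger entry.
interface LedgerEntry {
  id: string;
  superseded_by?: string;
}

// Follow superseded_by links from `id` to the newest decision in the chain.
function resolveCurrent<T extends LedgerEntry>(id: string, entries: T[]): T | undefined {
  const byId = new Map<string, T>(entries.map((e): [string, T] => [e.id, e]));
  const seen = new Set<string>();
  let current = byId.get(id);
  while (current !== undefined) {
    const nextId: string | undefined = current.superseded_by;
    if (nextId === undefined) break; // not superseded: this is the current decision
    if (seen.has(current.id)) throw new Error(`superseded_by cycle at ${current.id}`);
    seen.add(current.id);
    const next = byId.get(nextId);
    if (next === undefined) break; // dangling link: return the newest entry that exists
    current = next;
  }
  return current;
}
```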

### `evidence`

Link to anything that supports the decision:
- Commit hash where the change was made
- Test file that validates the behavior
- Benchmark result showing performance improvement
- Failure ID from `.codebase/FAILURES.json` that motivated the fix
- Document or RFC that defined the requirement

Evidence must be checkable. "I think this is faster" is not evidence; a benchmark output is.

### `confidence_level`

Rate how certain you are that this decision will hold:

| Level | Criteria | Action |
|-------|----------|--------|
| **high** | Clear requirement, strong evidence, reversible if wrong | Record and move on |
| **medium** | Some ambiguity, partial evidence, or moderate blast radius | Schedule review in 2 weeks |
| **low** | Guesswork, no evidence, high blast radius, or irreversible | Get a second opinion before proceeding |

Set `confidence_level` honestly. A low-confidence decision is not a problem; labeling it high confidence is.

## Decision Quality Checklist

Before recording, verify the decision meets these standards:

- [ ] **Problem defined**: The problem or goal is stated in one sentence
- [ ] **Alternatives evaluated**: At least two options were considered
- [ ] **Evidence exists**: The decision is supported by a commit, test, doc, or failure record, not just opinion
- [ ] **Risks documented**: Known downsides are listed in `assumptions` or `alternatives_considered`
- [ ] **Reversibility noted**: If this is wrong, how hard is it to undo? (easy / moderate / hard)

If any box is unchecked, either gather the missing information or flag the decision as `confidence_level: low`.
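
The checklist can gate the recorded confidence level. This sketch assumes the agent fills in a `Checklist` object, which is a hypothetical shape (reversibility in particular has no field in the entry schema). It returns the proposed level, or `"low"` when any box is unchecked.

```typescript
// Hypothetical checklist shape; not part of the decision-trace entry schema.
type Level = "low" | "medium" | "high";

interface Checklist {
  problemDefined: boolean;
  alternativesEvaluated: number; // how many options were considered
  evidence: string[];
  risksDocumented: boolean;
  reversibility?: "easy" | "moderate" | "hard";
}

// Keep the proposed level only if every box is checked; otherwise record "low".
function checkedConfidence(c: Checklist, proposed: Level): Level {
  const allChecked =
    c.problemDefined &&
    c.alternativesEvaluated >= 2 &&
    c.evidence.length > 0 &&
    c.risksDocumented &&
    c.reversibility !== undefined;
  return allChecked ? proposed : "low";
}
```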

## Reading the Decision Ledger

`.codebase/DECISIONS.jsonl` is append-only, newline-delimited JSON (one decision per line). Query it with the `decision-trace` tool or with any tool that reads JSON Lines.

### Querying by Dimensions

Use the tool's `query` action to filter:

```jsonc
// Decisions touching the auth service
{ "action": "query", "query": { "file_path": "src/services/auth.ts" } }

// All deletions (typically high-risk)
{ "action": "query", "query": { "change_type": "delete" } }

// Up to 20 high-risk decisions (the query has no date filter)
{ "action": "query", "query": { "risk_level": "high", "limit": 20 } }
```
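
When the tool is not available, the same three filters can be applied to the raw file. This sketch assumes exact string matches combined with AND. The real tool's `limit` behavior (oldest vs. newest matches) is not documented here. This version keeps the most recent matches because the ledger is append-only. `parseLedger` and `queryLedger` are hypothetical.

```typescript
// Hypothetical subset of the entry schema used for filtering.
interface Decision {
  id: string;
  file_path: string;
  change_type: "create" | "edit" | "delete" | "refactor";
  risk_level: "low" | "medium" | "high";
}

interface Query {
  file_path?: string;
  change_type?: Decision["change_type"];
  risk_level?: Decision["risk_level"];
  limit?: number;
}

// Parse newline-delimited JSON, skipping blank lines.
function parseLedger(jsonl: string): Decision[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Decision);
}

// Every provided field must match exactly; `limit` keeps the most recent N matches (assumption).
function queryLedger(entries: Decision[], q: Query): Decision[] {
  const hits = entries.filter(
    (e) =>
      (q.file_path === undefined || e.file_path === q.file_path) &&
      (q.change_type === undefined || e.change_type === q.change_type) &&
      (q.risk_level === undefined || e.risk_level === q.risk_level),
  );
  return q.limit === undefined ? hits : hits.slice(Math.max(hits.length - q.limit, 0));
}
```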

### Identifying Patterns

Read the ledger periodically to spot trends:

- **Repeated decisions**: If the same `alternatives_considered` list appears 3+ times, extract a convention or skill
- **Assumption drift**: If an `assumptions` entry is contradicted by later decisions, record a corrected decision and link the original to it with `superseded_by`
- **Risk clustering**: Many `high`-risk decisions in one module signal instability; consider a refactor or a deeper review
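
Risk clustering can be approximated by counting high-risk entries per directory. This sketch makes two assumptions: a directory stands in for a "module", and 3 is the threshold for "many". `riskClusters` is a hypothetical helper.

```typescript
// Hypothetical helper; directory = module and the threshold of 3 are assumptions.
interface RiskEntry {
  file_path: string;
  risk_level: "low" | "medium" | "high";
}

// Count high-risk decisions per directory and keep directories at or above `minHigh`.
function riskClusters(entries: RiskEntry[], minHigh = 3): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of entries) {
    if (e.risk_level !== "high") continue;
    const slash = e.file_path.lastIndexOf("/");
    const dir = slash === -1 ? "." : e.file_path.slice(0, slash);
    counts.set(dir, (counts.get(dir) ?? 0) + 1);
  }
  return new Map(Array.from(counts.entries()).filter(([, n]) => n >= minHigh));
}
```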

### Decisions Needing Review

Flag entries for re-examination when:
- **Old**: Recorded more than 90 days ago with `confidence_level: medium` or `low`
- **High risk**: `risk_level: high` with no linked `evidence`
- **No evidence**: Empty `evidence` array and `confidence_level` is not `high`
- **Superseded chain**: A decision's `superseded_by` points to a decision that is itself superseded; merge the chain into a single current decision
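
The four review flags can be computed per entry. The entry schema in this skill has no timestamp, so this sketch assumes a hypothetical `recorded_at` ISO-8601 field for the age check. `ReviewEntry` and `reviewReasons` are likewise hypothetical.

```typescript
// Hypothetical entry view; `recorded_at` is an assumed field, not part of DecisionEntry.
interface ReviewEntry {
  id: string;
  recorded_at: string; // assumed ISO-8601 timestamp
  risk_level: "low" | "medium" | "high";
  confidence_level: "low" | "medium" | "high";
  evidence: string[];
  superseded_by?: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Return the reasons an entry should be re-examined (empty array = no flag).
function reviewReasons(e: ReviewEntry, byId: Map<string, ReviewEntry>, now: Date): string[] {
  const reasons: string[] = [];
  const ageDays = (now.getTime() - Date.parse(e.recorded_at)) / DAY_MS;
  if (ageDays > 90 && e.confidence_level !== "high") reasons.push("old");
  if (e.risk_level === "high" && e.evidence.length === 0) reasons.push("high-risk-no-evidence");
  if (e.evidence.length === 0 && e.confidence_level !== "high") reasons.push("no-evidence");
  const next = e.superseded_by !== undefined ? byId.get(e.superseded_by) : undefined;
  if (next?.superseded_by !== undefined) reasons.push("superseded-chain");
  return reasons;
}
```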

## Tool Parameter Reference

The `decision-trace` tool accepts these actions:

| Action | Parameters | Description |
|--------|-----------|-------------|
| `record` | `entry` object (required) | Append a new decision to the ledger |
| `query` | `query` object with optional `file_path`, `change_type`, `risk_level`, `limit` | Search existing decisions |
| `get_for_file` | `file_path` (required) | Get all decisions for a specific file |

### Entry Schema

```typescript
interface DecisionEntry {
  id: string;                          // unique identifier
  file_path: string;                   // file affected
  change_type: 'create' | 'edit' | 'delete' | 'refactor';
  rationale: string;                   // why this change was made
  evidence: string[];                  // supporting commits, tests, docs, failure IDs
  assumptions: string[];               // things assumed true
  alternatives_considered: string[];   // options evaluated
  risk_level: 'low' | 'medium' | 'high';
  confidence_level: 'low' | 'medium' | 'high';
  agent: string;                       // which agent made the decision
  superseded_by?: string;              // ID of a later decision that replaces this
}
```
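
An entry can be checked against this schema at runtime before calling `record`. This is a minimal sketch. This skill does not say whether the `decision-trace` tool validates entries itself. `validateEntry` is a hypothetical helper that returns a list of problems, which is empty when the entry is valid.

```typescript
// Hypothetical client-side validator for DecisionEntry-shaped objects.
const CHANGE_TYPES = ["create", "edit", "delete", "refactor"] as const;
const LEVELS = ["low", "medium", "high"] as const;

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Returns a list of problems; an empty list means the object matches the schema.
function validateEntry(v: unknown): string[] {
  if (typeof v !== "object" || v === null) return ["entry must be an object"];
  const o = v as Record<string, unknown>;
  const problems: string[] = [];
  for (const k of ["id", "file_path", "rationale", "agent"]) {
    if (typeof o[k] !== "string" || o[k] === "") problems.push(`${k} must be a non-empty string`);
  }
  for (const k of ["evidence", "assumptions", "alternatives_considered"]) {
    if (!isStringArray(o[k])) problems.push(`${k} must be a string array`);
  }
  if (!(CHANGE_TYPES as readonly unknown[]).includes(o["change_type"])) problems.push("invalid change_type");
  if (!(LEVELS as readonly unknown[]).includes(o["risk_level"])) problems.push("invalid risk_level");
  if (!(LEVELS as readonly unknown[]).includes(o["confidence_level"])) problems.push("invalid confidence_level");
  if (o["superseded_by"] !== undefined && typeof o["superseded_by"] !== "string") {
    problems.push("superseded_by must be a string");
  }
  return problems;
}
```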

## Cross-Reference

Use decision trace alongside these skills:

- **[change-impact-radar](../change-impact-radar/SKILL.md)**: Before recording a decision, run impact analysis to understand blast radius. Document the predicted impact in `assumptions`.
- **[arch-constraint-guard](../arch-constraint-guard/SKILL.md)**: If a decision violates a constraint, record it as `risk_level: high` with `confidence_level: low` and link to the constraint rule.

## Review Acceleration

When reviewing a PR, query DECISIONS.jsonl for all files in the diff. For each entry, reviewers can quickly see the "why" without asking the author.
@@ -70,3 +205,5 @@ When reviewing a PR, query DECISIONS.jsonl for all files in the diff. For each e
- Rationale should answer: "why this approach and not the obvious alternative?"
- Evidence should be checkable: a doc URL, a failure ID, a test result
- Assumptions should be explicit: if an assumption breaks, so does the change
- Confidence should be honest: flag uncertainty so the team can allocate review attention
- Superseded decisions should be linked, so stale decisions do not mislead future readers