npm - @dv.nghiem/flowdeck - Versions diffs - 0.4.11 → 0.4.12 - Mend

@dv.nghiem/flowdeck 0.4.11 → 0.4.12


Files changed (31)

package/README.md +0 -2
package/dist/dashboard/lib/state-reader.d.ts +2 -1
package/dist/dashboard/lib/state-reader.d.ts.map +1 -1
package/dist/dashboard/server.mjs +128 -13
package/dist/dashboard/types.d.ts +12 -0
package/dist/dashboard/types.d.ts.map +1 -1
package/dist/hooks/orchestrator-guard-hook.d.ts.map +1 -1
package/dist/hooks/shell-env-hook.d.ts.map +1 -1
package/dist/index.d.ts.map +1 -1
package/dist/index.js +125 -334
package/dist/services/loop-detector.d.ts.map +1 -1
package/docs/getting-started/installation.md +0 -18
package/docs/index.md +0 -1
package/docs/reference/hooks.md +1 -16
package/package.json +6 -6
package/src/rules/common/agent-defense.md +66 -0
package/src/rules/common/agent-orchestration.md +35 -1
package/src/skills/context-budget/SKILL.md +266 -0
package/src/skills/context-guard/SKILL.md +172 -0
package/src/skills/context-steward/SKILL.md +297 -0
package/src/skills/decision-trace/SKILL.md +137 -0
package/src/skills/research-first/SKILL.md +344 -0
package/src/skills/session-persistence/SKILL.md +320 -0
package/src/skills/telemetry-steward/SKILL.md +191 -0
package/dist/services/rtk-manager.d.ts +0 -80
package/dist/services/rtk-manager.d.ts.map +0 -1
package/dist/services/rtk-policy.d.ts +0 -26
package/dist/services/rtk-policy.d.ts.map +0 -1
package/dist/tools/rtk-setup.d.ts +0 -22
package/dist/tools/rtk-setup.d.ts.map +0 -1
package/docs/reference/rtk.md +0 -162

package/src/skills/context-steward/SKILL.md ADDED

@@ -0,0 +1,297 @@
+---
+name: context-steward
+description: Unified context lifecycle for FlowDeck sessions — ingest, filter, prune, protect, summarize, and persist with telemetry.
+origin: FlowDeck
+---
+# Context Steward
+FlowDeck sessions accumulate noise. Tool outputs, rule loads, failed attempts, and multi-agent chatter fill the context window. This skill defines a unified lifecycle to keep context lean, relevant, and recoverable.
+## When to Activate
+Activate when:
+- Context exceeds 50% of the window and response quality drops
+- Multiple agents have contributed outputs in one session
+- Tool results are large (logs, diffs, file reads)
+- You are about to switch phases (plan → execute → verify)
+- A `/fd-checkpoint` is imminent
+## Core Principles
+- **Context is a liability** — every token not serving the current task is a distraction
+- **Prune with purpose** — never drop what the agent needs to continue
+- **Protect the thread** — user intent, active plans, and safety records are non-negotiable
+- **Telemetry is cheap** — write stats before pruning so patterns are visible later
+---
+## Unified Context Lifecycle
+### 1. Ingest
+Everything that enters the session window:
+| Source | Typical Size | Risk Level |
+|--------|-------------|------------|
+| User prompts | Small | Low — never prune |
+| Tool results (read, edit, bash) | Variable | High — can be huge |
+| Skill loads | Medium | Medium — load once per session |
+| Rule injections | Small-Medium | Medium — stage-gated already |
+| Agent outputs | Medium | Medium — may contain plans or decisions |
+| Memory queries | Small | Low |
+| `codegraph` results | Small-Medium | Low |
+**Ingest discipline**: Before any large output enters context, ask whether it is needed for the next 5 turns. If not, summarize or redirect to file.
+---
+### 2. Filter
+FlowDeck already gates rules by stage. Extend this discipline to all context sources.
+| Current Stage | Load | Defer / Skip |
+|---------------|------|--------------|
+| `discuss` | Behavioral rules, `AGENTS.md` | Coding standards, testing rules |
+| `plan` | Architecture rules, planning rules | Security rules, lint rules |
+| `execute` | Coding standards, language patterns, security | Debug rules (until needed) |
+| `verify` | Testing, security, linting rules | Planning rules |
+| `fix-bug` | Debug, testing rules | Architecture rules |
+**Filter action**: If a skill or rule is not relevant to the current stage, do not load it. Use `load-rules` on demand rather than pre-loading.
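Reviewer note, not part of the published package: the stage table above can be read as a lookup from stage to the rule categories worth loading. The sketch below is a minimal illustration under that reading. The category labels and the `shouldLoad` helper are hypothetical; the skill text does not define them, and the real `load-rules` tool may work differently.

```typescript
type Stage = "discuss" | "plan" | "execute" | "verify" | "fix-bug";

// Category labels are hypothetical shorthand for the "Load" column of the table.
const LOAD_BY_STAGE: Record<Stage, string[]> = {
  "discuss": ["behavioral", "AGENTS.md"],
  "plan": ["architecture", "planning"],
  "execute": ["coding-standards", "language-patterns", "security"],
  "verify": ["testing", "security", "linting"],
  "fix-bug": ["debug", "testing"],
};

// True when a rule category is listed in the "Load" column for the stage.
function shouldLoad(stage: Stage, category: string): boolean {
  return LOAD_BY_STAGE[stage].indexOf(category) !== -1;
}
```

Under these assumptions, `shouldLoad("plan", "security")` is false because the table defers security rules during planning.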
+---
+### 3. Prune — Three-Pass Pipeline
+Pruning is surgical. It runs when context exceeds 50% of the window or when switching tasks.
+#### Pass 1: Deduplicate
+**What gets pruned**:
+- Identical tool outputs repeated across agents (e.g., two agents reading the same file)
+- Duplicate skill loads (same skill invoked twice with identical parameters)
+- Redundant `codegraph` queries returning the same symbols
+**What stays**:
+- First occurrence of any unique output
+- Outputs with different parameters or timestamps
+- User prompts (never deduplicated)
+**How to invoke**:
+- Agent-triggered: after parallel agent execution, the orchestrator deduplicates before presenting results
+- Manual: agents may call a deduplication routine directly; there is no dedicated slash command
+**FlowDeck-native pattern**: When `@parallel-coordinator` dispatches 3 agents that all read `src/config.ts`, keep only the first read result. Reference the others by index.
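Reviewer note, not part of the published package: Pass 1 might be approximated as follows. The `ContextItem` shape and the `dedupeOutputs` name are assumptions made for illustration, since the skill describes the behavior but no data model. The sketch keeps the first occurrence of an identical tool output and replaces later copies with a reference by index.

```typescript
// Hypothetical shape for one context item; the skill does not define a schema.
interface ContextItem {
  kind: "user" | "tool";
  tool?: string;   // e.g. "read"
  params?: string; // serialized parameters, e.g. a file path
  content: string;
}

// Keep the first occurrence of each identical tool output. Later duplicates
// are replaced by a short reference to the index of the retained item.
function dedupeOutputs(items: ContextItem[]): ContextItem[] {
  const firstSeen: { [key: string]: number } = Object.create(null);
  const result: ContextItem[] = [];
  for (const item of items) {
    if (item.kind === "user") {
      result.push(item); // user prompts are never deduplicated
      continue;
    }
    // Same tool, same parameters, and same content count as a duplicate.
    const key = JSON.stringify([item.tool, item.params, item.content]);
    if (key in firstSeen) {
      result.push({ ...item, content: "[duplicate of item " + firstSeen[key] + "]" });
    } else {
      firstSeen[key] = result.length;
      result.push(item);
    }
  }
  return result;
}
```

Outputs that differ in parameters or content produce different keys and are all kept, which matches the "What stays" list.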
+---
+#### Pass 2: Purge Errors
+**What gets pruned**:
+- Failed tool executions that have been superseded by a later success
+- Stack traces from resolved errors
+- Old build failures after a successful build
+- Retry loops where the final attempt succeeded
+**What stays**:
+- The most recent failure if the issue is still unresolved
+- Failures linked to an active `FAILURES.json` entry
+- Errors that inform the current debugging session
+**How to invoke**:
+- Agent-triggered: `@debug-specialist` purges resolved error chains after root cause is found
+- Automatic: after `bun test` exits 0, purge prior failing test output
+**FlowDeck-native pattern**: If `@build-error-resolver` fixes a type error, purge the type-checker output but keep the fix description in `SESSION_SUMMARY.md`.
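Reviewer note, not part of the published package: the "superseded by a later success" rule in Pass 2 could look like the sketch below. `ToolRun` and `purgeResolvedErrors` are hypothetical names, and the sketch does not model the `FAILURES.json` linkage that the skill also protects.

```typescript
// Hypothetical record of one tool execution.
interface ToolRun {
  command: string; // e.g. "bun test"
  ok: boolean;
  output: string;
}

// Drop failed runs that are followed by a later success of the same command.
// A failure with no later success is unresolved and is kept.
function purgeResolvedErrors(runs: ToolRun[]): ToolRun[] {
  const lastSuccess: { [command: string]: number } = Object.create(null);
  runs.forEach((run, i) => {
    if (run.ok) lastSuccess[run.command] = i;
  });
  return runs.filter((run, i) => {
    if (run.ok) return true;
    const successAt = lastSuccess[run.command];
    return successAt === undefined || successAt < i;
  });
}
```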
+---
+#### Pass 3: Compress Stale Ranges
+**What gets pruned**:
+- Old conversation turns (> 10 turns back) not touching current files
+- Large file reads from modules no longer being edited
+- Tool outputs from completed sub-tasks
+- Agent outputs for tasks already merged or abandoned
+**What stays**:
+- Last 2 user messages (see Protected Patterns)
+- Active plan and STATE.md content
+- Decisions and failures linked to current work
+- Any output touching files in the current `git diff`
+**How to invoke**:
+- `/fd-checkpoint` — full session save + context clear
+- Agent-triggered: `@orchestrator` compresses after each wave in a multi-wave plan
+**FlowDeck-native pattern**: Replace 20 turns of exploratory editing on `src/auth.ts` with a single synthetic summary: "Explored 3 approaches for token refresh; selected sliding-window with 15-min expiry. See DECISIONS.jsonl:auth-refresh-2026-06-10."
+---
+### 4. Protect
+Protected patterns are immune to all pruning passes.
+#### Category A: Core System
+| Pattern | Why Protected |
+|---------|--------------|
+| Orchestrator rules (`agent-orchestration.md`) | Routing depends on them |
+| `AGENTS.md` | Defines agent boundaries and non-negotiables |
+| `STATE.md` | Current phase, plan, blockers |
+| `PLAN.md` (active) | Success criteria and step order |
+#### Category B: Safety
+| Pattern | Why Protected |
+|---------|--------------|
+| `.codebase/DECISIONS.jsonl` | Rationale for current design |
+| `.codebase/FAILURES.json` | Prevents repeating failed approaches |
+| `.codebase/CONSTRAINTS.md` | Architecture guards |
+#### Category C: User Intent
+| Pattern | Why Protected |
+|---------|--------------|
+| Last 2 user messages | Most recent instructions |
+| Active plan reference | What the user asked for |
+| Explicitly pinned context | User said "keep this in mind" |
+#### Category D: Tool-Specific (In-Flight)
+| Pattern | Why Protected |
+|---------|--------------|
+| `write` output for current file | Must verify what was written |
+| `edit` diff for current change | Must confirm diff is correct |
+| `bash` output for running command | Command may still be relevant |
+**Protection rule**: If a tool operation is in-flight or its result is referenced in the next 3 turns, do not prune it. Mark it as pinned until the agent acknowledges it.
+---
+### 5. Summarize
+After pruning, replace removed ranges with synthetic summary messages.
+**Summary format**:
+```markdown
+[Context Steward] Pruned N turns (M tokens). Retained: [list].
+Summary: [1-2 sentences]. Evidence: [link to DECISIONS.jsonl or SESSION_SUMMARY.md].
+```
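Reviewer note, not part of the published package: a small formatter for the summary format above, written as a sketch. The `PruneSummary` fields and the `formatPruneSummary` name are assumptions. It rejects an empty evidence link, following the skill's rule that a summary must carry evidence.

```typescript
// Hypothetical input for one synthetic summary message.
interface PruneSummary {
  turns: number;
  tokens: number;
  retained: string[];
  summary: string;  // 1-2 sentences, without a trailing period
  evidence: string; // e.g. a DECISIONS.jsonl reference
}

// Render the two-line summary message in the format shown above.
function formatPruneSummary(s: PruneSummary): string {
  if (s.evidence.trim() === "") {
    throw new Error("summary needs an evidence link");
  }
  const head = "[Context Steward] Pruned " + s.turns + " turns (" + s.tokens +
    " tokens). Retained: " + s.retained.join(", ") + ".";
  const body = "Summary: " + s.summary + ". Evidence: " + s.evidence + ".";
  return head + "\n" + body;
}
```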
+**What to summarize**:
+- Exploratory edits → decision + chosen approach
+- Research → conclusion + source
+- Multi-agent discussion → consensus + dissent (if relevant)
+- Build/test cycles → final status + any remaining failures
+**What NOT to summarize**:
+- Active user instructions (keep verbatim)
+- In-flight tool operations (keep verbatim)
+- Unresolved errors (keep verbatim until fixed)
+---
+### 6. Persist
+Write pruning stats to `.codebase/TELEMETRY.jsonl` for pattern analysis.
+**Entry format**:
+```json
+{"ts":"2026-06-10T14:32:00Z","event":"context-prune","session_id":"abc123","before_tokens":85000,"after_tokens":42000,"passes":{"dedup":12,"purge_errors":8,"compress":25},"protected":15,"summary_tokens":180}
+```
+**Why persist**: Over time, telemetry reveals which agents produce the most noise, which skills bloat context, and when pruning is most effective.
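Reviewer note, not part of the published package: the entry format above maps onto a TypeScript type fairly directly. The sketch below mirrors the fields of the sample entry. The helper names are hypothetical, and the meaning of the `passes` counts is not stated in the skill, so nothing here interprets them.

```typescript
// Mirrors the fields of the sample TELEMETRY.jsonl entry.
interface PruneTelemetry {
  ts: string;
  event: "context-prune";
  session_id: string;
  before_tokens: number;
  after_tokens: number;
  passes: { dedup: number; purge_errors: number; compress: number };
  protected: number;
  summary_tokens: number;
}

// Serialize one entry as a single JSONL line (one JSON object plus newline).
function toJsonlLine(entry: PruneTelemetry): string {
  if (entry.after_tokens > entry.before_tokens) {
    throw new Error("pruning cannot increase the token count");
  }
  return JSON.stringify(entry) + "\n";
}

// Percentage of tokens removed by the prune, rounded to the nearest integer.
function reductionPercent(entry: PruneTelemetry): number {
  if (entry.before_tokens <= 0) return 0;
  return Math.round((1 - entry.after_tokens / entry.before_tokens) * 100);
}
```

For the sample entry, going from 85000 to 42000 tokens is a reduction of about 51%.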
+---
+## Decision Matrix: Prune vs Compact vs Checkpoint
+| Situation | Context usage | Action | Command |
+|-----------|---------------|--------|---------|
+| Minor bloat, same task | 40-60% | Prune (3-pass) | Agent-triggered |
+| Major bloat, same task | 60-80% | Compact + prune | Agent-triggered, then `/fd-checkpoint` |
+| Task complete, new task next | Any | Checkpoint | `/fd-checkpoint` |
+| Phase switch (plan → execute) | Any | Compact | Agent-triggered summary |
+| Multi-wave plan, wave done | Any | Compact | `@orchestrator` summarizes wave |
+| Session > 1 hour | Any | Checkpoint | `/fd-checkpoint` |
+| Context nearly full | > 80% | Checkpoint immediately | `/fd-checkpoint` |
+**Prune**: Remove noise, keep session alive.
+**Compact**: Replace ranges with summaries, keep session alive.
+**Checkpoint**: Save state, start fresh session.
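Reviewer note, not part of the published package: the decision matrix can be expressed as a small function. The matrix does not say which row wins when several apply, and the 40-60% and 60-80% bands share a boundary, so the precedence and the cut-offs below are assumptions. `SessionState` and `chooseAction` are hypothetical names.

```typescript
type Action = "none" | "prune" | "compact+prune" | "compact" | "checkpoint";

// Hypothetical snapshot of the session; field names are illustrative.
interface SessionState {
  usagePercent: number;   // share of the context window in use, 0-100
  taskComplete: boolean;
  phaseSwitch: boolean;
  waveDone: boolean;
  sessionMinutes: number;
}

// Assumed precedence: checkpoint rows first, then compact rows, then usage bands.
function chooseAction(s: SessionState): Action {
  if (s.usagePercent > 80 || s.taskComplete || s.sessionMinutes > 60) {
    return "checkpoint";
  }
  if (s.phaseSwitch || s.waveDone) return "compact";
  if (s.usagePercent >= 60) return "compact+prune"; // 60-80% band
  if (s.usagePercent >= 40) return "prune";         // 40-60% band
  return "none";
}
```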
+---
+## Anti-Patterns
+### Do Not Prune Active User Instructions
+The last 2 user messages are sacred. If they contain a multi-part instruction, keep all parts until the agent has addressed each one.
+**Bad**: Prune turn 5 where the user said "also fix the test" because it is 10 turns back, while the agent is still working on the first part.
+**Good**: Pin the instruction and unpin after confirmation.
+### Do Not Duplicate Tool Results Across Agents
+When `@parallel-coordinator` dispatches agents, each agent may read the same file. Do not carry all N copies forward.
+**Bad**: 3 agents read `src/db.ts`; all 3 full file contents stay in context.
+**Good**: Keep the first read. Subsequent agents reference it by citation.
+### Do Not Compress Without Preserving Evidence Links
+A summary without a link is a rumor. Always attach evidence.
+**Bad**: "We decided on approach A."
+**Good**: "Selected approach A (sliding-window expiry). See DECISIONS.jsonl:auth-refresh-2026-06-10."
+---
+## FlowDeck Tool Reference
+| Tool / Command | Role in Context Steward |
+|----------------|------------------------|
+| `codegraph` | Find symbols without reading full files — reduces ingest size |
+| `memory` | Query past decisions instead of loading full `DECISIONS.jsonl` |
+| `decision-trace` | Record decisions before compressing the discussion that led to them |
+| `/fd-checkpoint` | Full save + clear — use at 80% or task boundaries |
+| `/fd-resume` | Load summarized context instead of full history |
+| `load-rules` | Stage-gated rule loading — reduces ingest at session start |
+---
+## Cross-Reference
+| Skill | Relationship |
+|-------|-------------|
+| [`context-budget`](../context-budget/SKILL.md) | Sets thresholds and audit practices. Context Steward executes the pruning when those thresholds are breached. |
+| [`session-persistence`](../session-persistence/SKILL.md) | Defines what to save at session boundaries. Context Steward decides what to prune before that save happens. |
+| [`strategic-compact`](../strategic-compact/SKILL.md) | Advises on when to compact manually. Context Steward automates compaction as part of the prune pipeline. |
+| [`context-guard`](../context-guard/SKILL.md) | Defines boundary checks. Context Steward uses those boundaries to decide what is protected during pruning. |
+---
+## Quick Reference
+```
+Ingest  → Filter by stage → Prune (dedup → purge → compress)
+            ↓                    ↓
+        load-rules            Protect core / safety / intent / in-flight
+            ↓                    ↓
+        Skip irrelevant       Summarize pruned ranges
+            ↓                    ↓
+        rules/skills          Persist telemetry
+```
+**Protected always**: `agent-orchestration.md`, `AGENTS.md`, `STATE.md`, active `PLAN.md`, `.codebase/DECISIONS.jsonl`, `.codebase/FAILURES.json`, `.codebase/CONSTRAINTS.md`, last 2 user messages, explicitly pinned context, in-flight tool results.
+**Prune first**: Duplicate reads, resolved errors, stale exploratory turns.
+**Checkpoint when**: > 80% tokens, task complete, session > 1 hour. On a phase switch, compact instead.

package/src/skills/decision-trace/SKILL.md CHANGED

@@ -61,6 +61,141 @@ The `decision-trace-hook` auto-records a minimal entry for every write/edit. The
 { "action": "query", "query": { "risk_level": "high", "limit": 10 } }
 ```
+## Decision Evolution
+Decisions are not static. They change as requirements shift, new evidence appears, or better alternatives emerge. Track the full lifecycle:
+### `alternatives_considered`
+List every option evaluated and why it was rejected or accepted. This prevents re-litigating old choices.
+```json
+"alternatives_considered": [
+  "Use PostgreSQL full-text search (rejected: poor ranking for our use case)",
+  "Add Elasticsearch (rejected: operational overhead exceeds benefit)",
+  "Hybrid: Postgres for exact match, in-memory trie for prefix (accepted: best latency/cost tradeoff)"
+]
+```
+### `superseded_by`
+When a later decision replaces this one, link forward. This keeps the ledger from becoming stale.
+```json
+{
+  "id": "cache-strategy-v1",
+  "superseded_by": "cache-strategy-v2",
+  "rationale": "Initial Redis caching for user sessions"
+}
+```
+When querying, always check if an entry has `superseded_by` set. If it does, read the newer decision instead.
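Reviewer note, not part of the published package: following `superseded_by` links to the current decision could be sketched as below. `LedgerEntry` is a reduced, hypothetical view of an entry, and `resolveCurrent` is not a function of the `decision-trace` tool. The cycle guard is an added precaution, not something the skill specifies.

```typescript
// Reduced view of a ledger entry; only the fields needed to follow links.
interface LedgerEntry {
  id: string;
  rationale: string;
  superseded_by?: string;
}

// Follow superseded_by links from `id` until reaching an entry with no successor.
// Returns undefined when `id` is not in the ledger.
function resolveCurrent(id: string, entries: LedgerEntry[]): LedgerEntry | undefined {
  const byId: { [id: string]: LedgerEntry } = Object.create(null);
  for (const e of entries) byId[e.id] = e;
  const visited: { [id: string]: boolean } = Object.create(null);
  let current: LedgerEntry | undefined = byId[id];
  while (current && current.superseded_by && !visited[current.id]) {
    visited[current.id] = true; // guards against accidental cycles
    const next: LedgerEntry | undefined = byId[current.superseded_by];
    if (!next) break; // dangling link: keep the last entry that exists
    current = next;
  }
  return current;
}
```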
+### `evidence`
+Link to anything that supports the decision:
+- Commit hash where the change was made
+- Test file that validates the behavior
+- Benchmark result showing performance improvement
+- Failure ID from `.codebase/FAILURES.json` that motivated the fix
+- Document or RFC that defined the requirement
+Evidence must be checkable. "I think this is faster" is not evidence. A benchmark output is.
+### `confidence_level`
+Rate how certain you are that this decision will hold:
+| Level | Criteria | Action |
+|-------|----------|--------|
+| **high** | Clear requirement, strong evidence, reversible if wrong | Record and move on |
+| **medium** | Some ambiguity, partial evidence, or moderate blast radius | Schedule review in 2 weeks |
+| **low** | Guesswork, no evidence, high blast radius, or irreversible | Require second opinion before proceeding |
+Set `confidence_level` honestly. A low-confidence decision is not bad — pretending it is high confidence is.
+## Decision Quality Checklist
+Before recording, verify the decision meets these standards:
+- [ ] **Problem defined**: The problem or goal is stated in one sentence
+- [ ] **Alternatives evaluated**: At least two options were considered
+- [ ] **Evidence exists**: The decision is supported by a commit, test, doc, or failure record — not just opinion
+- [ ] **Risks documented**: Known downsides are listed in `assumptions` or `alternatives_considered`
+- [ ] **Reversibility noted**: If this is wrong, how hard is it to undo? (easy / moderate / hard)
+If any box is unchecked, either gather the missing information or flag the decision as `confidence_level: low`.
+## Reading the Decision Ledger
+`.codebase/DECISIONS.jsonl` is append-only newline-delimited JSON. Query it with the `decision-trace` tool or standard tools:
+### Querying by Dimensions
+Use the tool's `query` action to filter:
+```json
+// All decisions touching auth files
+{ "action": "query", "query": { "file_path": "src/services/auth.ts" } }
+// All deletions (high-risk)
+{ "action": "query", "query": { "change_type": "delete" } }
+// Up to 20 high-risk decisions (the query has no date filter)
+{ "action": "query", "query": { "risk_level": "high", "limit": 20 } }
+```
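Reviewer note, not part of the published package: since the ledger is newline-delimited JSON, the same filters can be applied without the tool. The sketch below is a stand-in, not the `decision-trace` implementation. `queryLedger` is a hypothetical name, and treating `limit` as "first N matches in ledger order" is an assumption.

```typescript
// Only the fields used for filtering; real entries carry more.
interface LedgerRow {
  file_path: string;
  change_type: string;
  risk_level: string;
}

interface LedgerQuery {
  file_path?: string;
  change_type?: string;
  risk_level?: string;
  limit?: number;
}

// Parse JSONL text and return rows matching every field set in the query.
function queryLedger(jsonl: string, q: LedgerQuery): LedgerRow[] {
  const rows: LedgerRow[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines and the trailing newline
    rows.push(JSON.parse(line) as LedgerRow);
  }
  const matches = rows.filter((r) =>
    (q.file_path === undefined || r.file_path === q.file_path) &&
    (q.change_type === undefined || r.change_type === q.change_type) &&
    (q.risk_level === undefined || r.risk_level === q.risk_level));
  return q.limit === undefined ? matches : matches.slice(0, q.limit);
}
```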
+### Identifying Patterns
+Read the ledger periodically to spot trends:
+- **Repeated decisions**: If the same `alternatives_considered` appears 3+ times, extract a convention or skill
+- **Assumption drift**: If an `assumptions` entry is contradicted by later decisions, update the original or mark it `superseded_by`
+- **Risk clustering**: Many `high` risk decisions in one module signals instability — consider a refactor or deeper review
+### Decisions Needing Review
+Flag entries for re-examination when:
+- **Old**: Recorded > 90 days ago with `confidence_level: medium` or `low`
+- **High risk**: `risk_level: high` with no linked `evidence`
+- **No evidence**: Empty `evidence` array and `confidence_level` is not `high`
+- **Superseded chain**: A decision's `superseded_by` target has itself been superseded — merge the chain into a single current decision
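Reviewer note, not part of the published package: the first three review triggers above can be checked mechanically. One caveat: the `DecisionEntry` schema shown in this diff has no timestamp field, so the `recorded_at` field below is an assumption, as are the `ReviewableEntry` and `reviewReasons` names. The superseded-chain trigger is left out of this sketch.

```typescript
type Level = "low" | "medium" | "high";

interface ReviewableEntry {
  id: string;
  risk_level: Level;
  confidence_level: Level;
  evidence: string[];
  recorded_at: string; // ASSUMPTION: ISO 8601 timestamp, not in the published schema
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return the review triggers that apply to an entry as of `now`.
function reviewReasons(e: ReviewableEntry, now: Date): string[] {
  const reasons: string[] = [];
  const ageDays = (now.getTime() - Date.parse(e.recorded_at)) / MS_PER_DAY;
  if (ageDays > 90 && e.confidence_level !== "high") reasons.push("old");
  if (e.risk_level === "high" && e.evidence.length === 0) reasons.push("high-risk-no-evidence");
  if (e.evidence.length === 0 && e.confidence_level !== "high") reasons.push("no-evidence");
  return reasons;
}
```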
+## Tool Parameter Reference
+The `decision-trace` tool accepts these actions:
+| Action | Parameters | Description |
+|--------|-----------|-------------|
+| `record` | `entry` object (required) | Append a new decision to the ledger |
+| `query` | `query` object with optional `file_path`, `change_type`, `risk_level`, `limit` | Search existing decisions |
+| `get_for_file` | `file_path` (required) | Get all decisions for a specific file |
+### Entry Schema
+```typescript
+interface DecisionEntry {
+  id: string;                    // unique identifier
+  file_path: string;             // file affected
+  change_type: 'create' | 'edit' | 'delete' | 'refactor';
+  rationale: string;             // why this change was made
+  evidence: string[];            // supporting commits, tests, docs, failure IDs
+  assumptions: string[];         // things assumed true
+  alternatives_considered: string[]; // options evaluated
+  risk_level: 'low' | 'medium' | 'high';
+  confidence_level: 'low' | 'medium' | 'high';
+  agent: string;                 // which agent made the decision
+  superseded_by?: string;        // ID of a later decision that replaces this
+}
+```
+## Cross-Reference
+Use decision trace alongside these skills:
+- **[change-impact-radar](../change-impact-radar/SKILL.md)**: Before recording a decision, run impact analysis to understand blast radius. Document the predicted impact in `assumptions`.
+- **[arch-constraint-guard](../arch-constraint-guard/SKILL.md)**: If a decision violates a constraint, record it as `risk_level: high` with `confidence_level: low` and link to the constraint rule.
 ## Review Acceleration
 When reviewing a PR, query DECISIONS.jsonl for all files in the diff. For each entry, reviewers can quickly see the "why" without asking the author.
@@ -70,3 +205,5 @@ When reviewing a PR, query DECISIONS.jsonl for all files in the diff. For each e
 - Rationale should answer: "why this approach and not the obvious alternative?"
 - Evidence should be checkable: a doc URL, a failure ID, a test result
 - Assumptions should be explicit: if an assumption breaks, so does the change
+- Confidence should be honest: flag uncertainty so the team can allocate review attention
+- Superseded decisions should be linked: prevent stale decisions from misleading future readers