npm - @dv.nghiem/flowdeck - Versions diffs - 0.4.11 → 0.5.0 - Mend

@dv.nghiem/flowdeck 0.4.11 → 0.5.0

Files changed (105)

package/README.md +0 -2
package/dist/agents/orchestrator.d.ts.map +1 -1
package/dist/config/index.d.ts +1 -1
package/dist/config/index.d.ts.map +1 -1
package/dist/config/schema.d.ts +27 -1
package/dist/config/schema.d.ts.map +1 -1
package/dist/dashboard/lib/state-reader.d.ts +2 -1
package/dist/dashboard/lib/state-reader.d.ts.map +1 -1
package/dist/dashboard/server.mjs +128 -13
package/dist/dashboard/types.d.ts +12 -0
package/dist/dashboard/types.d.ts.map +1 -1
package/dist/hooks/approval-hook.d.ts +16 -2
package/dist/hooks/approval-hook.d.ts.map +1 -1
package/dist/hooks/compaction-hook.d.ts +1 -1
package/dist/hooks/compaction-hook.d.ts.map +1 -1
package/dist/hooks/context-window-monitor.d.ts +7 -1
package/dist/hooks/context-window-monitor.d.ts.map +1 -1
package/dist/hooks/decision-trace-hook.d.ts +3 -0
package/dist/hooks/decision-trace-hook.d.ts.map +1 -1
package/dist/hooks/event-log-hook.d.ts +19 -3
package/dist/hooks/event-log-hook.d.ts.map +1 -1
package/dist/hooks/guard-rails.d.ts +16 -5
package/dist/hooks/guard-rails.d.ts.map +1 -1
package/dist/hooks/orchestrator-guard-hook.d.ts +8 -5
package/dist/hooks/orchestrator-guard-hook.d.ts.map +1 -1
package/dist/hooks/shell-env-hook.d.ts.map +1 -1
package/dist/hooks/tool-guard.d.ts +19 -3
package/dist/hooks/tool-guard.d.ts.map +1 -1
package/dist/index.d.ts.map +1 -1
package/dist/index.js +8401 -4863
package/dist/services/agent-contract-registry.d.ts.map +1 -1
package/dist/services/agent-trace-graph.d.ts +4 -0
package/dist/services/agent-trace-graph.d.ts.map +1 -1
package/dist/services/agent-validator.d.ts +2 -1
package/dist/services/agent-validator.d.ts.map +1 -1
package/dist/services/approval-manager.d.ts +14 -1
package/dist/services/approval-manager.d.ts.map +1 -1
package/dist/services/audit-log.d.ts +23 -0
package/dist/services/audit-log.d.ts.map +1 -0
package/dist/services/context-ingress.d.ts +75 -0
package/dist/services/context-ingress.d.ts.map +1 -0
package/dist/services/deadlock-detector.d.ts.map +1 -1
package/dist/services/delegation-budget.d.ts +55 -0
package/dist/services/delegation-budget.d.ts.map +1 -0
package/dist/services/event-logger.d.ts +3 -1
package/dist/services/event-logger.d.ts.map +1 -1
package/dist/services/execution-substrate.d.ts +35 -0
package/dist/services/execution-substrate.d.ts.map +1 -0
package/dist/services/harness-controller.d.ts +58 -0
package/dist/services/harness-controller.d.ts.map +1 -0
package/dist/services/harness-policy.d.ts +24 -0
package/dist/services/harness-policy.d.ts.map +1 -0
package/dist/services/harness-types.d.ts +178 -0
package/dist/services/harness-types.d.ts.map +1 -0
package/dist/services/lazy-rule-loader.d.ts +2 -0
package/dist/services/lazy-rule-loader.d.ts.map +1 -1
package/dist/services/loop-detector.d.ts.map +1 -1
package/dist/services/prompt-cache.d.ts +25 -0
package/dist/services/prompt-cache.d.ts.map +1 -0
package/dist/services/recovery-layer.d.ts +26 -0
package/dist/services/recovery-layer.d.ts.map +1 -0
package/dist/services/run-trace.d.ts +17 -0
package/dist/services/run-trace.d.ts.map +1 -1
package/dist/services/state-persistence.d.ts +22 -0
package/dist/services/state-persistence.d.ts.map +1 -0
package/dist/services/supervisor-binding.d.ts +9 -0
package/dist/services/supervisor-binding.d.ts.map +1 -1
package/dist/services/token-metrics.d.ts +39 -0
package/dist/services/token-metrics.d.ts.map +1 -0
package/dist/services/verification-layer.d.ts +24 -0
package/dist/services/verification-layer.d.ts.map +1 -0
package/dist/services/workflow-scorecard.d.ts +5 -0
package/dist/services/workflow-scorecard.d.ts.map +1 -1
package/dist/tools/decision-trace.d.ts +4 -0
package/dist/tools/decision-trace.d.ts.map +1 -1
package/dist/tools/delegate.d.ts +16 -0
package/dist/tools/delegate.d.ts.map +1 -0
package/dist/tools/failure-replay.d.ts +8 -0
package/dist/tools/failure-replay.d.ts.map +1 -1
package/dist/tools/policy-engine.d.ts +1 -0
package/dist/tools/policy-engine.d.ts.map +1 -1
package/docs/concepts/HARNESS_ARCHITECTURE.md +241 -0
package/docs/concepts/HARNESS_LAYERS.md +378 -0
package/docs/concepts/HARNESS_WIRING.md +404 -0
package/docs/getting-started/installation.md +0 -18
package/docs/index.md +0 -1
package/docs/reference/hooks.md +1 -16
package/package.json +6 -6
package/src/commands/fd-guarded-edit.md +69 -0
package/src/rules/common/agent-defense.md +66 -0
package/src/rules/common/agent-orchestration.md +35 -1
package/src/skills/context-budget/SKILL.md +266 -0
package/src/skills/context-guard/SKILL.md +172 -0
package/src/skills/context-steward/SKILL.md +297 -0
package/src/skills/decision-trace/SKILL.md +137 -0
package/src/skills/research-first/SKILL.md +344 -0
package/src/skills/session-persistence/SKILL.md +320 -0
package/src/skills/telemetry-steward/SKILL.md +191 -0
package/dist/services/rtk-manager.d.ts +0 -80
package/dist/services/rtk-manager.d.ts.map +0 -1
package/dist/services/rtk-policy.d.ts +0 -26
package/dist/services/rtk-policy.d.ts.map +0 -1
package/dist/tools/rtk-setup.d.ts +0 -22
package/dist/tools/rtk-setup.d.ts.map +0 -1
package/docs/reference/rtk.md +0 -162

package/src/rules/common/agent-orchestration.md CHANGED

@@ -51,6 +51,40 @@ The orchestrator NEVER:
 | `@tester` | Write and run tests (TDD) | Implementing features or fixing bugs |
 | `@writer` | Draft project documentation | Writing or updating docs |
+## Agent Categories
+Agents are grouped into categories for flexible routing:
+| Category | Agents | Purpose |
+|----------|--------|---------|
+| `cognition` | `@architect`, `@planner`, `@code-explorer` | Deep reasoning, design, and exploration |
+| `execution` | `@backend-coder`, `@frontend-coder`, `@devops`, `@default-executor` | Implementation and delivery |
+| `verification` | `@tester`, `@reviewer`, `@security-auditor`, `@build-error-resolver` | Quality assurance and validation |
+| `governance` | `@orchestrator`, `@discusser`, `@plan-checker`, `@task-splitter`, `@doc-updater`, `@writer` | Process coordination and documentation |
+| `specialist` | `@debug-specialist`, `@performance-optimizer`, `@refactor-guide`, `@researcher`, `@mapper` | Domain-specific expertise |
+## Category-Based Routing
+The orchestrator may route to a **category** instead of a named agent. Categories resolve to a default agent but can be overridden in `flowdeck.json`.
+| Category | Default Agent |
+|----------|--------------|
+| `cognition` | `@planner` |
+| `execution` | `@backend-coder` |
+| `verification` | `@reviewer` |
+| `governance` | `@orchestrator` |
+| `specialist` | `@researcher` |
+### Routing Examples
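+- **Build failure** signal → `verification` category → `@build-error-resolver` (the table above lists `@reviewer` as the category default, so this needs an override in `flowdeck.json`)
+- **Complex feature** request → `cognition` category → default `@planner`, then hands off to `execution`
+- **Security concern** → `verification` category → `@security-auditor` (the table above lists `@reviewer` as the category default, so this needs an override in `flowdeck.json`)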
+Category routing decouples workflow definitions from specific agent identities, making workflows more portable across projects.
+> **Note:** Agent names are stable; categories are configurable. Prefer routing by category in workflow skills.
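As a rough illustration of the routing rule above, the sketch below resolves a category to its default agent unless an override is supplied. The defaults are copied from the "Default Agent" table. The `CategoryOverrides` shape and the `resolveAgent` name are hypothetical, because this diff does not show how `flowdeck.json` encodes overrides.

```typescript
type Category = "cognition" | "execution" | "verification" | "governance" | "specialist";

// Defaults copied from the "Default Agent" table.
const DEFAULT_AGENTS: Record<Category, string> = {
  cognition: "@planner",
  execution: "@backend-coder",
  verification: "@reviewer",
  governance: "@orchestrator",
  specialist: "@researcher",
};

// Hypothetical override shape; the real flowdeck.json key is not shown in this diff.
type CategoryOverrides = Partial<Record<Category, string>>;

// Returns the override for a category when one exists, otherwise the default agent.
function resolveAgent(category: Category, overrides: CategoryOverrides = {}): string {
  return overrides[category] ?? DEFAULT_AGENTS[category];
}
```

With no overrides, `resolveAgent("verification")` yields `@reviewer`; passing `{ verification: "@security-auditor" }` yields the overriding agent instead.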
 ## Execution Paths
 After the orchestrator analyzes and classifies a request, it selects ONE execution path:
@@ -89,7 +123,7 @@ For normal or complex tasks:
 ## When to Use Agents Immediately
-These situations should trigger agent use automatically:
+These situations should trigger agent use automatically. When the specific agent is unclear, route by **category** instead:
 | Situation | Agent |
 |-----------|-------|

package/src/skills/context-budget/SKILL.md ADDED

@@ -0,0 +1,266 @@
+---
+name: context-budget
+description: Optimize token usage and context window discipline. Reduce costs and improve response quality through smart context management.
+origin: FlowDeck
+---
+# Context Budget Skill
+Treat the context window as a finite resource. Every token loaded (files, rules, tool outputs, conversation history) consumes budget. Optimizing context improves speed, cuts costs, and prevents mid-session truncation.
+## When to Activate
+Activate when:
+- A session exceeds 50K tokens or feels sluggish
+- You are about to load large files, MCP tools, or heavy rulesets
+- You want to audit and slim down your FlowDeck setup
+- You are designing new skills, agents, or workflows
+## Core Principles
+- **Load less, get more** — context quality beats context quantity
+- **Measure before optimizing** — know your current burn rate
+- **Batch over chat** — accumulate work, run checks once
+- **Right-size the model** — light tasks do not need the strongest model
+## Why Context Budget Matters
+| Factor | Impact |
+|--------|--------|
+| Context window limit | Hard cap — exceed it and early conversation is lost |
+| Cost per token | More context = more input tokens = higher bill |
+| Response latency | Large context increases time-to-first-token |
+| Attention degradation | Models perform worse on content near the middle of long context |
+### Hard Limits (Examples)
+| Model | Context Window |
+|-------|---------------|
+| Claude 3.5 Haiku | 200K tokens |
+| Claude 3.5 Sonnet | 200K tokens |
+| GPT-4o | 128K tokens |
+| GPT-4o mini | 128K tokens |
+Treat 80% of the window as your practical maximum. Beyond that, truncation risk rises sharply.
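The 80% guideline can be turned into a concrete token ceiling. The sketch below is illustrative only: the function names are hypothetical, and it assumes "200K" and "128K" mean 200,000 and 128,000 tokens.

```typescript
// 80% expressed as 4/5 so the arithmetic stays in whole numbers.
const PRACTICAL_NUMERATOR = 4;
const PRACTICAL_DENOMINATOR = 5;

// Returns the practical token ceiling for a given context window size.
function practicalLimit(windowTokens: number): number {
  return Math.floor((windowTokens * PRACTICAL_NUMERATOR) / PRACTICAL_DENOMINATOR);
}

// Returns how many tokens remain before the practical ceiling (never negative).
function remainingBudget(windowTokens: number, usedTokens: number): number {
  return Math.max(0, practicalLimit(windowTokens) - usedTokens);
}
```

Under these assumptions a 200K window gives a ceiling of 160,000 tokens and a 128K window gives 102,400.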
+## Skill Size Audit
+Oversized skills waste context on every activation. Audit yours regularly.
+### Thresholds
+| Metric | Warning | Critical |
+|--------|---------|----------|
+| Lines per SKILL.md | > 300 | > 400 |
+| Words in description | > 25 | > 30 |
+| Files loaded per task | > 5 | > 10 |
+| Rules active at once | > 8 | > 12 |
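The thresholds in the table can be read as a simple three-level classification. The following is a minimal sketch, not FlowDeck code: the metric key names and the `classify` helper are made up for illustration, while the numbers come from the table.

```typescript
type Severity = "ok" | "warning" | "critical";
type Metric = "skillLines" | "descriptionWords" | "filesPerTask" | "activeRules";

interface Threshold {
  warning: number;
  critical: number;
}

// Values copied from the thresholds table; the key names are illustrative.
const THRESHOLDS: Record<Metric, Threshold> = {
  skillLines: { warning: 300, critical: 400 },
  descriptionWords: { warning: 25, critical: 30 },
  filesPerTask: { warning: 5, critical: 10 },
  activeRules: { warning: 8, critical: 12 },
};

// Both bounds are strict ("> 300", "> 400"), matching the table.
function classify(value: number, threshold: Threshold): Severity {
  if (value > threshold.critical) return "critical";
  if (value > threshold.warning) return "warning";
  return "ok";
}
```

For example, a 301-line `SKILL.md` classifies as `warning`, and a 401-line one as `critical`.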
+### How to Audit
+```bash
+# Count lines in all skills
+find src/skills -name "SKILL.md" -exec wc -l {} + | sort -n
+# Flag skills over 300 lines
+find src/skills -name "SKILL.md" -exec sh -c 'lines=$(wc -l < "$1"); [ "$lines" -gt 300 ] && echo "$lines $1"' _ {} \;
+# Check description word counts (NF - 1 skips the leading "path:description:" field that grep -r emits)
+grep -r "^description:" src/skills/ | awk '{print NF - 1, $0}' | sort -n
+```
+### Remediation
+- **Split oversized skills** — extract sub-topics into separate skills
+- **Shorten descriptions** — under 25 words is ideal; under 30 is required
+- **Use stage-gated rules** — load heavy rules only in `execute` or `verify` stages
+- **Defer heavy context** — load `.codebase/ARCHITECTURE.md` only when needed
+## Model Routing Strategy
+Not every task needs the strongest model. Route by complexity.
+| Task Type | Example | Model Tier |
+|-----------|---------|-----------|
+| Simple edit | Fix typo, rename variable | Fast / Small |
+| Code review | Lint, style check | Fast / Small |
+| Research | Look up API docs | Fast / Small |
+| Feature implementation | Multi-file change | Strong / Large |
+| Debug | Root cause analysis | Strong / Large |
+| Architecture design | New module design | Strong / Large |
+### FlowDeck Agent Routing
+FlowDeck already routes by task class:
+- `quick` workflow → `@default-executor` (lightweight)
+- `standard` workflow → specialist agents (medium)
+- `verify-heavy` or `explore` → strongest models (heavy)
+Respect this routing. Do not escalate a `quick` task to a heavy agent.
+## Prefer CLI Tools Over MCPs
+MCP servers add context overhead: schema discovery, tool definitions, and response envelopes. Native CLI tools are leaner.
+| Use Case | Heavy MCP | Lean Alternative |
+|----------|-----------|-----------------|
+| Git operations | GitHub MCP | `git`, `gh` CLI |
+| AWS queries | AWS MCP | `aws` CLI |
+| Kubernetes checks | K8s MCP | `kubectl` |
+| File search | File-system MCP | `find`, `rg` |
+| Database query | DB MCP | `psql`, `mysql` CLI |
+### When MCPs Are Worth It
+- Complex multi-step operations (e.g., create PR + add reviewers + set labels)
+- Operations requiring authentication tokens you do not have locally
+- Structured data return that CLI would require parsing
+## Accumulator + Batch Pattern
+Chatty sessions burn context fast. Accumulate edits, then run checks once.
+### Anti-Pattern: Chatty Loop
+```
+Edit file A → run test → fix error → edit file B → run test → fix error → edit file C → run test
+```
+Each test run adds its output to the context, so three runs carry three times the test output.
+### Preferred: Batch + Single Check
+```
+Edit file A
+Edit file B
+Edit file C
+Run tests once
+Fix all errors
+```
+### In FlowDeck
+Use `/fd-checkpoint` after a batch of edits, then `/fd-resume` to continue. This preserves your work without carrying full error output forward indefinitely.
+## Strategic Context Clearing
+Long sessions accumulate noise: failed attempts, dead-ends, large tool outputs. Clear context before it degrades quality.
+### When to Checkpoint
+| Signal | Action |
+|--------|--------|
+| Session > 1 hour | `/fd-checkpoint` |
+| Tokens > 50K | `/fd-checkpoint` |
+| Multiple failed attempts | `/fd-checkpoint` and reassess |
+| Task complete, new task next | `/fd-checkpoint` |
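The checkpoint signals in the table could be checked mechanically. This is only a sketch under stated assumptions: the `SessionSignals` shape and `checkpointReasons` name are hypothetical, "50K" is taken as 50,000 tokens, and "multiple failed attempts" is read as two or more.

```typescript
interface SessionSignals {
  elapsedMinutes: number;
  tokensUsed: number;
  failedAttempts: number;
  taskComplete: boolean;
}

// Returns the reasons (if any) that suggest running /fd-checkpoint.
function checkpointReasons(s: SessionSignals): string[] {
  const reasons: string[] = [];
  if (s.elapsedMinutes > 60) reasons.push("session longer than 1 hour");
  if (s.tokensUsed > 50000) reasons.push("more than 50K tokens used");
  // Assumption: "multiple" is read as two or more failed attempts.
  if (s.failedAttempts >= 2) reasons.push("multiple failed attempts");
  if (s.taskComplete) reasons.push("task complete, new task next");
  return reasons;
}
```

An empty result means none of the listed signals has fired; any non-empty result is a prompt to checkpoint, and the "multiple failed attempts" reason also calls for reassessing the approach.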
+### Resume Pattern
+```
+1. `/fd-checkpoint` — save current state to STATE.md
+2. Start fresh session
+3. `/fd-resume` — load STATE.md, PLAN.md, active context
+4. Continue with clean context
+```
+This is cheaper than carrying 80K tokens of conversation history.
+## Rule Loading Optimization
+FlowDeck uses stage-gated rules. Only rules matching the current stage are loaded.
+| Stage | Typical Rules Loaded |
+|-------|---------------------|
+| `discuss` | Behavioral, lightweight |
+| `plan` | Planning, architecture |
+| `execute` | Coding standards, language patterns, security |
+| `verify` | Testing, security, linting |
+| `fix-bug` | Debug, testing |
+### Keep Rules Focused
+- One concern per rule file
+- Use `stages` array to gate loading
+- Set `always_on: false` for heavy rules
+- Keep rules under 150 lines when possible
+Audit with:
+```bash
+# Find rules loaded in every stage (always_on = true)
+grep -r "always_on: true" src/rules/
+# Find oversized rules
+find src/rules -name "*.md" -exec sh -c 'lines=$(wc -l < "$1"); [ "$lines" -gt 200 ] && echo "$lines $1"' _ {} \;
+```
+## Code Modularity Benefits
+Smaller files = less context per task. A 400-line file forces the model to hold the entire file in working memory. Four 100-line files let the model focus on one at a time.
+| File Size | Context Impact |
+|-----------|---------------|
+| < 200 lines | Minimal — load on demand |
+| 200-400 lines | Moderate — acceptable for core files |
+| 400-800 lines | Heavy — consider splitting |
+| > 800 lines | Critical — split immediately |
+### Splitting Guidance
+- One responsibility per file
+- Extract utilities to `utils/` or `helpers/`
+- Extract types to `types.ts`
+- Use `codegraph` to find natural split points: `codegraph_impact` on a large symbol reveals which parts are independent
+## Self-Audit Checklist
+Run this monthly or when context feels heavy:
+### Skills
+- [ ] No SKILL.md exceeds 400 lines
+- [ ] No skill description exceeds 30 words
+- [ ] Unused skills removed from `.opencode/skills/`
+### Rules
+- [ ] No rule file exceeds 200 lines
+- [ ] Heavy rules are stage-gated (`always_on: false`)
+- [ ] No redundant rules (same topic, different files)
+### Workflows
+- [ ] Tasks are batched before verification runs
+- [ ] `/fd-checkpoint` used at natural boundaries
+- [ ] Model routing respects task complexity
+### Codebase
+- [ ] No source file exceeds 800 lines
+- [ ] Core modules are under 400 lines
+- [ ] Large files have clear split candidates via `codegraph`
+### Session Hygiene
+- [ ] MCP tools used only when CLI is insufficient
+- [ ] Large outputs (logs, diffs) are summarized, not pasted raw
+- [ ] Failed attempts are checkpointed, not retried endlessly
+## Quick Wins
+1. **Truncate diffs** — `git diff | head -50` instead of full diff
+2. **Summarize logs** — `tail -20` instead of full log file
+3. **Use `codegraph_search`** — find symbols without reading entire files
+4. **Load rules on demand** — `load-rules` instead of pre-loading everything
+5. **Split before you grow** — when a file hits 400 lines, plan the split
+## Related Skills
+- [`plan-task`](./plan-task/SKILL.md) — break work into right-sized chunks
+- [`performance-profiling`](./performance-profiling/SKILL.md) — measure before optimizing
+- [`context-load`](./context-load/SKILL.md) — load only the context you need
+## References
+- `/fd-checkpoint` — save session state, clear context
+- `/fd-resume` — restore from checkpoint
+- `load-rules` — stage-gated rule loading
+- `codegraph` — symbol search without full-file reads
+- `codegraph_impact` — find split points in large files
+- `codegraph_search` — locate symbols efficiently

package/src/skills/context-guard/SKILL.md ADDED

@@ -0,0 +1,172 @@
+---
+name: context-guard
+description: Protect critical context from pruning during compaction. Preserve active plans, safety files, pending operations, and user intent anchors.
+origin: FlowDeck
+---
+# Context Guard Skill
+Defines the protected-pattern contract: a whitelist of files, tools, decisions, and messages that must survive context compaction.
+## What Is a Protected Pattern?
+A protected pattern is any context item that a pruning pass must skip. Without it, compaction can silently discard the very state an agent needs to finish a task.
+Four categories:
+### Tool Patterns
+Tool invocations that must remain in the conversation window while they are in flight or unverified. Example: a `write` or `edit` call whose result has not yet been confirmed.
+### File Patterns
+Paths that anchor the current session. These include active planning files, project conventions, and safety ledgers.
+### Decision Records
+Records that explain why the session is in its current state. Removing them forces the agent to re-derive intent from scratch.
+### Intent Anchors
+User messages that establish the original goal and the most recent steering corrections. These are the cheapest way to prevent drift.
+## Default Protected-Pattern Registry
+FlowDeck ships with a default registry. Override it in `.opencode/flowdeck/protected-patterns.yaml`, never by editing this skill.
+### System Files
+| Pattern | Reason |
+|---|---|
+| `AGENTS.md` | Operating rules for every agent |
+| `.planning/STATE.md` | Current phase, completed steps, blockers |
+| `.planning/PLAN.md` | Active plan and success criteria |
+### Safety Files
+| Pattern | Reason |
+|---|---|
+| `.codebase/DECISIONS.jsonl` | Decision ledger — why choices were made |
+| `.codebase/FAILURES.json` | Failure replay engine data |
+### Intent Anchors
+| Pattern | Reason |
+|---|---|
+| Last 2 user messages | Original goal + latest steering |
+| Current phase objective | From STATE.md — the single sentence that defines success |
+### Active Operations
+Any pending tool whose side effects have not been verified:
+| Tool | Condition |
+|---|---|
+| `write` | while pending verification |
+| `edit` | while pending verification |
+| `bash` | while exit code/output not yet checked |
+## Guard Protocol
+Before any pruning or compaction run, `context-steward` executes this protocol:
+1. **Enumerate** — load the default registry and any user overrides
+2. **Resolve** — expand patterns to concrete files, tool IDs, and message indices
+3. **Check** — for every candidate marked for removal, test against the registry
+4. **Block** — if the candidate matches a protected pattern, keep it
+5. **Log** — record each blocked removal to telemetry with reason and pattern
+Only candidates that match no protected pattern are eligible for compaction. The protocol is fail-closed: when in doubt, protect.
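The Check, Block, and Log steps can be sketched as a filter over the candidate list. This is a hypothetical illustration, not the `context-steward` implementation: every type and function name below is invented, the registry is assumed to be already enumerated and resolved to concrete paths, and the returned `blocked` list stands in for the telemetry log.

```typescript
type Kind = "file" | "tool" | "userMessage";

interface Candidate {
  id: string;
  kind: Kind;
  path?: string;             // set for files
  tool?: string;             // set for tool calls
  pending?: boolean;         // true while the tool result is unverified
  userTurnsFromEnd?: number; // 0 = latest user message
}

interface Registry {
  files: string[];          // protected paths, already resolved
  pendingTools: string[];   // tool names protected while pending
  lastUserMessages: number; // how many recent user turns to keep
}

interface GuardResult {
  removable: Candidate[];
  blocked: { id: string; reason: string }[];
}

// Returns a reason when the candidate must be kept, or null when it may be removed.
// Candidates with missing details are protected (fail-closed).
function protectionReason(c: Candidate, r: Registry): string | null {
  if (c.kind === "file") {
    if (c.path === undefined) return "unresolved file (fail-closed)";
    return r.files.indexOf(c.path) >= 0 ? `protected file ${c.path}` : null;
  }
  if (c.kind === "tool") {
    if (c.tool === undefined || c.pending === undefined) return "unresolved tool (fail-closed)";
    return c.pending && r.pendingTools.indexOf(c.tool) >= 0 ? `pending ${c.tool}` : null;
  }
  if (c.userTurnsFromEnd === undefined) return "unresolved message (fail-closed)";
  return c.userTurnsFromEnd < r.lastUserMessages ? "recent user message" : null;
}

// Splits candidates into removable items and blocked removals with reasons.
function guardPass(candidates: Candidate[], registry: Registry): GuardResult {
  const result: GuardResult = { removable: [], blocked: [] };
  for (const c of candidates) {
    const reason = protectionReason(c, registry);
    if (reason === null) result.removable.push(c);
    else result.blocked.push({ id: c.id, reason });
  }
  return result;
}
```

Returning the blocked list instead of writing to telemetry directly keeps the sketch free of I/O; a caller would forward each entry to whatever telemetry sink the pipeline uses.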
+## Template Registry
+Create `.opencode/flowdeck/protected-patterns.yaml`:
+```yaml
+protected:
+  files:
+    - pattern: ".planning/STATE.md"
+      reason: "session state"
+    - pattern: ".planning/PLAN.md"
+      reason: "active plan"
+    - pattern: ".codebase/DECISIONS.jsonl"
+      reason: "decision ledger"
+    - pattern: ".codebase/FAILURES.json"
+      reason: "failure replay data"
+    - pattern: "AGENTS.md"
+      reason: "agent operating rules"
+  tools:
+    - name: "write"
+      while: "pending"
+    - name: "edit"
+      while: "pending"
+    - name: "bash"
+      while: "pending"
+  messages:
+    - type: "user"
+      count: 2
+  decisions:
+    - source: ".codebase/DECISIONS.jsonl"
+      count: 5
+```
+## Integration Notes
+### `context-steward`
+`context-steward` calls `context-guard` before every compaction pass:
+- Pass the candidate removal list to the guard
+- Receive the protected subset
+- Remove only the non-protected remainder
+- Write guard events to telemetry
+Users do not call `context-guard` directly. It is a dependency of the compaction pipeline.
+### Adding Project-Specific Patterns
+1. Create `.opencode/flowdeck/protected-patterns.yaml`
+2. Merge rules with the default registry (user patterns take precedence)
+3. Re-run compaction to verify protection
+Project patterns are appropriate for:
+- Domain-specific safety files (e.g., `MIGRATIONS.md`, `SCHEMA.md`)
+- Regulatory audit logs
+- Long-running operation state files
+Do not add transient build artifacts or cache files. Those are noise, not signal.
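The merge step, where user patterns take precedence, might look like the sketch below. The actual merge semantics are not shown in this diff, so this assumes entries are keyed by their `pattern` string and that a user entry replaces a default with the same pattern. The `FilePattern` fields mirror the `pattern` and `reason` keys from the template registry, and `mergeFilePatterns` is a hypothetical name.

```typescript
interface FilePattern {
  pattern: string;
  reason: string;
}

// Merges user file patterns over the defaults; on a duplicate pattern the user entry wins.
function mergeFilePatterns(defaults: FilePattern[], user: FilePattern[]): FilePattern[] {
  // Copy the defaults so the input arrays are never mutated.
  const merged: FilePattern[] = defaults.map((d) => ({ ...d }));
  for (const u of user) {
    const index = merged.findIndex((m) => m.pattern === u.pattern);
    if (index >= 0) merged[index] = { ...u };
    else merged.push({ ...u });
  }
  return merged;
}
```

Defaults keep their original order, overridden entries stay in place, and new project patterns such as `MIGRATIONS.md` are appended at the end.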
+## Anti-Patterns
+### Protecting Everything
+If every file and message is protected, compaction becomes a no-op. The registry exists to make pruning safe, not to disable it. Protect only items whose loss would force the agent to restart the task.
+### Exact-Filename Protection
+A pattern like `STATE.md` misses `.planning/STATE.md`. A pattern like `PLAN.md` misses phase-specific plans at `.planning/phases/phase-3/PLAN.md`. Use glob-style or prefix patterns so protection survives path changes.
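To make the difference concrete, the sketch below compares an exact filename with a glob-style pattern. The registry's real pattern syntax is not specified in this diff, so this assumes a minimal glob dialect in which `*` matches within one path segment and `**` matches across segments. `globToRegExp` is a hypothetical helper, not a FlowDeck API.

```typescript
// Converts a minimal glob (supporting "*", "**", and "**/") into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      source += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      source += ".*"; // anything, including slashes
      i += 2;
    } else if (glob.charAt(i) === "*") {
      source += "[^/]*"; // anything within one path segment
      i += 1;
    } else {
      // Escape regex metacharacters so "." and similar match literally.
      source += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

// Returns true when the path matches the glob under the dialect assumed above.
function matchesPattern(glob: string, path: string): boolean {
  return globToRegExp(glob).test(path);
}
```

Under this dialect, `matchesPattern("STATE.md", ".planning/STATE.md")` is `false`, while `matchesPattern("**/PLAN.md", ".planning/phases/phase-3/PLAN.md")` is `true`, which is the behaviour the paragraph above recommends.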
+### Leaving Temporary Files Protected
+A `write` call that has been verified, or a temporary scratch file from a completed operation, should not remain in the registry. Temporary protection must expire when the operation completes.
+### Protecting Raw Tool Output
+Large outputs (`git diff`, test logs, MCP responses) are usually not state. Summarize them and protect the summary, not the full output.
+## Quick Reference
+| Pattern Type | Example | Keep Condition |
+|---|---|---|
+| File | `.planning/STATE.md` | Always |
+| File | `.codebase/DECISIONS.jsonl` | Always |
+| Tool | `write` | While pending |
+| Tool | `edit` | While pending |
+| Message | User turn | Last 2 |
+| Decision | `.codebase/DECISIONS.jsonl` | Last 5 entries |
+## Related Skills
+- [`context-load`](./context-load/SKILL.md) — what to load at session start
+- [`context-budget`](./context-budget/SKILL.md) — when and why to compact