deepflow 0.1.72 → 0.1.74
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +82 -201
- package/bin/install.js +95 -12
- package/package.json +7 -3
- package/src/commands/df/execute.md +7 -2
- package/src/skills/context-hub/SKILL.md +87 -0
package/README.md
CHANGED

````diff
@@ -8,25 +8,36 @@
 ```
 
 <p align="center">
-  <strong>
+  <strong>Doing reveals what thinking can't predict</strong>
 </p>
 
 <p align="center">
   <a href="#quick-start">Quick Start</a> •
   <a href="#two-modes">Two Modes</a> •
-  <a href="#commands">Commands</a>
+  <a href="#commands">Commands</a> •
+  <a href="#what-deepflow-rejects">What It Rejects</a> •
+  <a href="#principles">Principles</a>
 </p>
 
 ---
 
-##
+## Why Deepflow
 
-
-
--
-
-- **
-- **
+**You can't foresee what you don't know to ask.** Doing reveals — at every layer.
+
+Most spec-driven frameworks start from a finished spec and execute a static plan. Deepflow treats the entire process as discovery: asking reveals hidden requirements, debating reveals blind spots, spiking reveals technical risks, implementing reveals edge cases. Each step makes the next one sharper.
+
+- **Asking reveals what assuming hides** — Before any code, Socratic questioning surfaces the requirements you didn't know you had. Four AI perspectives collide to expose tensions in your approach. The spec isn't written from what you think you know — it's written from what the conversation uncovered.
+- **Spec as living hypothesis** — Core intent stays fixed, details refine through implementation. "The spec becomes bulletproof because you built it, not before."
+- **Parallel probes reveal the best path** — Uncertain approaches spawn parallel spikes in isolated worktrees. The machine selects the winner (fewer regressions > better coverage > fewer files changed). Failed approaches stay recorded and never repeat.
+- **Metrics decide, not opinions** — No LLM judges another LLM. Build, tests, typecheck, lint are the only judges. After an agent commits, the orchestrator runs health checks. Pass = keep. Fail = revert + new hypothesis.
+- **The loop is the product** — Not "execute a plan" — "evolve the codebase toward the spec's goals through iterative cycles." Each cycle reveals what the previous one couldn't see.
+
+## What We Learned by Doing
+
+Deepflow started with adversarial selection: one AI evaluated another AI's code in a fresh context. The "doing reveals" philosophy applied to the system itself — we discovered that **LLM judging LLM produces gaming**: agents that estimated instead of measuring, simulated instead of implementing, presented shortcuts as deliverables.
+
+The fix: eliminate subjective judgment. Only objective metrics decide. Tests created by the agent itself are excluded from the baseline to prevent self-validation. We call this a **ratchet** — inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch): a mechanism where the metric can only improve, never regress. Each cycle ratchets quality forward.
 
 ## Quick Start
 
@@ -38,212 +49,87 @@ npx deepflow
 npx deepflow --uninstall
 ```
 
-
+The installer configures granular permissions so background agents can read, write, run git, and execute health checks (build/test/typecheck/lint) without blocking on approval prompts. All permissions are scoped and cleaned up on uninstall.
 
-
+## Two Modes
 
-### Interactive
+### Interactive (human-in-the-loop)
 
-You
+You explore the problem, shape the spec, and trigger execution — all inside a Claude Code session.
 
 ```bash
 claude
 
-# 1.
+# 1. Discover — understand the problem before solving it
 /df:discover image-upload
+# "Why do you need image upload? What exists today?
+#  What file sizes? What formats? Where are images stored?
+#  What does 'done' look like? What should this NOT do?"
 
-# 2. Debate
+# 2. Debate — stress-test the approach (optional)
 /df:debate upload-strategy
+# User Advocate: "Drag-and-drop is table stakes, not a feature"
+# Tech Skeptic: "Client-side resize before upload, or you'll hit memory limits"
+# Systems Thinker: "What happens when storage goes down mid-upload?"
+# LLM Efficiency: "Split this into two specs: upload + processing"
 
-# 3.
+# 3. Spec — now the conversation is rich enough to produce a solid spec
 /df:spec image-upload
 
-# 4
-/df:plan
-
-#
-/df:execute
-
-# 6. Verify and merge to main
-/df:verify
+# 4-6: the AI takes over
+/df:plan     # Compare spec to code, create tasks
+/df:execute  # Parallel agents in worktree, ratchet validates
+/df:verify   # Check spec satisfied, merge to main
 ```
 
 **What requires you:** Steps 1-3 (defining the problem and approving the spec). Steps 4-6 run autonomously but you trigger each one and can intervene.
 
-### Autonomous
+### Autonomous (unattended)
 
-
-
-```bash
-# You define WHAT (the specs), the AI figures out HOW, overnight
+The human loop comes first — discover and debate are where intent gets shaped. You refine the problem, stress-test ideas, and produce a spec that captures what you actually need. That's the living contract. Then you hand it off.
 
-# Inside Claude Code (requires Agent Teams)
-/df:auto # process all specs in specs/
-```
-
-**What the AI does alone:**
-1. Pre-checks if spec is already satisfied (skips if so)
-2. Discovers specs, respects `depends_on` ordering
-3. Generates N hypotheses for how to implement each spec
-4. Runs parallel spikes in isolated worktrees (one per hypothesis)
-5. Implements the passing approaches
-6. Adversarial selection: a fresh AI context compares approaches by artifacts only (never reads code), picks the best or rejects all
-7. If rejected: generates new hypotheses, retries (up to max-cycles)
-8. On convergence: verifies (L0-L4 gates), creates PR, merges to main
-
-**What you do:** Write specs (via interactive mode or manually) in `specs/`, run `/df:auto` inside Claude Code, read the report at `.deepflow/auto-report.md`. No need to run `/df:plan` first — auto mode promotes plain specs to `doing-*` automatically.
-
-**How to use:**
 ```bash
-#
+# First: the human loop — discover, debate, refine until the spec is solid
 $ claude
 > /df:discover auth
-> /df:
+> /df:debate auth-strategy
+> /df:spec auth   # specs/auth.md — the handoff point
 > /exit
 
-#
+# Then: the AI loop — plan, execute, validate, merge
+$ claude
 > /df:auto
 
-# Next morning
+# Next morning
 $ cat .deepflow/auto-report.md
 $ git log --oneline
 ```
 
-**
+**What the AI does alone:**
+1. Runs `/df:plan` if no PLAN.md exists
+2. Snapshots pre-existing tests (ratchet baseline)
+3. Starts a loop (`/loop 1m /df:auto-cycle`) — fresh context each cycle
+4. Each cycle: picks next task → executes in worktree → runs health checks (build/tests/typecheck/lint)
+5. Pass = commit stands. Fail = revert + retry next cycle
+6. Circuit breaker: halts after N consecutive reverts on same task
+7. When all tasks done: runs `/df:verify`, merges to main
+
+**Safety:** Never pushes to remote. Failed approaches recorded in `.deepflow/experiments/` and never repeated. Specs validated before processing.
 
-###
+### Two Loops, One Handoff
 
 ```
-
+HUMAN LOOP                           AI LOOP
 ───────────────────────────────── ──────────────────────────────────
-
-
-
-
-Read morning report
+/df:discover — ask, surface gaps     /df:plan — compare spec to code
+/df:debate — stress-test approach    /df:execute — spike, implement
+/df:spec — produce living contract   /df:verify — health checks, merge
+↻ refine until solid                 ↻ retry until converged
 ───────────────────────────────── ──────────────────────────────────
 specs/*.md is the handoff point
 ```
 
-
-
-```
-/df:discover <name>
-  | Socratic questioning (motivation, scope, constraints...)
-  v
-/df:debate <topic>   <- optional
-  | 4 perspectives: User Advocate, Tech Skeptic,
-  |   Systems Thinker, LLM Efficiency
-  | Creates specs/.debate-{topic}.md
-  v
-/df:spec <name>
-  | Creates specs/{name}.md from conversation
-  | Validates structure before writing
-  v
-/df:plan
-  | Checks past experiments (learn from failures)
-  | Risky work? -> generates spike task first
-  | Creates PLAN.md with prioritized tasks
-  | Renames: feature.md -> doing-feature.md
-  v
-/df:execute
-  | Creates isolated worktree (main stays clean)
-  | Spike tasks run first, verified before continuing
-  | Parallel agents, file conflicts serialize
-  | Context-aware (>=50% -> checkpoint)
-  v
-/df:verify
-  | Checks requirements met
-  | Merges worktree to main, cleans up
-  | Extracts decisions -> .deepflow/decisions.md
-  | Deletes done-* spec after extraction
-```
-
-## The Flow (Autonomous)
-
-```
-/df:auto
-  | Discover specs (auto-promote, topological sort by depends_on)
-  | For each doing-* spec:
-  |
-  |   Pre-check (Haiku: already satisfied? skip)
-  |   v
-  |   Validate spec (malformed? skip)
-  |   v
-  |   Generate N hypotheses
-  |   v
-  |   Parallel spikes (one worktree per hypothesis)
-  |   | Pass? -> implement in same worktree
-  |   | Fail? -> record experiment, discard
-  |   v
-  |   Adversarial selection (fresh context, artifacts only)
-  |   | Winner? -> verify (L0-L4) -> PR -> merge
-  |   | Reject all? -> new hypotheses, retry
-  |   v
-  | Morning report -> .deepflow/auto-report.md
-```
-
-## Spec Lifecycle
-
-```
-specs/
-  feature.md       -> new, needs /df:plan
-  doing-feature.md -> in progress (active contract between you and the AI)
-  done-feature.md  -> transient (decisions extracted, then deleted)
-```
-
-## Works With Any Project
-
-**Greenfield:** Everything is new, agents create from scratch.
-
-**Ongoing:** Detects existing patterns, follows conventions, integrates with current code.
-
-## Spike-First Planning
-
-For risky or uncertain work, `/df:plan` generates a **spike task** first:
-
-```
-Spike: Validate streaming upload handles 10MB+ files
-  | Run minimal experiment
-  | Pass? -> Unblock implementation tasks
-  | Fail? -> Record learning, generate new hypothesis
-```
-
-Experiments are tracked in `.deepflow/experiments/`. Failed approaches won't be repeated.
-
-## Worktree Isolation
-
-Execution happens in an isolated git worktree:
-- Main branch stays clean during execution
-- On failure, worktree preserved for debugging
-- Resume with `/df:execute --continue`
-- On success, `/df:verify` merges to main and cleans up
-
-## LSP Integration
-
-/df:automatically enables Claude Code's LSP tools during install, giving agents access to `goToDefinition`, `findReferences`, and `workspaceSymbol` for precise code navigation instead of grep-based searching.
-
-- **Global install:** sets `ENABLE_LSP_TOOL=1` in `~/.claude/settings.json`
-- **Project install:** sets it in `.claude/settings.local.json`
-- **Uninstall:** cleans up automatically
-
-Agents prefer LSP tools when available and fall back to Grep/Glob silently. You'll need a language server installed for your language (e.g. `typescript-language-server`, `pyright`, `rust-analyzer`, `gopls`).
-
-## Spec Validation
-
-Specs are validated before downstream consumption by `/df:spec`, `/df:plan`, and `/df:auto`:
-
-- **Hard invariants** (block on failure): required sections present, REQ-N prefixes, checkbox ACs, no duplicate IDs
-- **Advisory warnings** (warn interactively, block in auto mode): long specs, orphaned requirements, excessive technical notes
-
-Run manually: `node hooks/df-spec-lint.js specs/my-spec.md`
-
-## Context-Aware Execution
-
-Statusline shows context usage. At >=50%:
-- Waits for running agents
-- Checkpoints state
-- Resume with `/df:execute --continue`
+**Spec lifecycle:** `feature.md` (new) → `doing-feature.md` (in progress) → `done-feature.md` (decisions extracted, then deleted)
 
 ## Commands
 
@@ -259,7 +145,7 @@ Statusline shows context usage. At >=50%:
 | `/df:consolidate` | Deduplicate and clean up decisions.md |
 | `/df:resume` | Session continuity briefing |
 | `/df:update` | Update deepflow to latest |
-| `/df:auto` | Autonomous
+| `/df:auto` | Autonomous mode (plan → loop → verify, no human needed) |
 
 ## File Structure
 
@@ -273,39 +159,34 @@ your-project/
   +-- config.yaml            # project settings
   +-- decisions.md           # auto-extracted + ad-hoc decisions
   +-- auto-report.md         # morning report (autonomous mode)
-  +-- auto-
-  +-- last-consolidated.json # consolidation timestamp
-  +-- context.json           # context % tracking
+  +-- auto-memory.yaml       # cross-cycle learning
   +-- experiments/           # spike results (pass/fail)
   +-- worktrees/             # isolated execution
     +-- upload/              # one worktree per spec
 ```
 
-##
-
-Create `.deepflow/config.yaml`:
+## What Deepflow Rejects
 
-
-
-
-
+- **Predicting everything before doing** — You discover what you need by building it. TDD assumes you already know the correct behavior before coding. Deepflow assumes that **execution reveals** what planning can't anticipate.
+- **LLM judging LLM** — We started with adversarial selection (AI evaluating AI). We discovered gaming. We replaced it with objective metrics. Deepflow's own evolution proved the principle.
+- **Agents role-playing job titles** — Flat orchestrator + model routing. No PM agent, no QA agent, no Scrum Master agent.
+- **Automated research before understanding** — Conversation with you first. AI research comes after you've defined the problem.
+- **Ceremony** — 6 commands, one flow. Markdown, not schemas. No sprint planning, no story points, no retrospectives.
 
-
-execute:
-  max: 5   # max parallel agents
+## Principles
 
-
-
-
-
+1. **Discover before specifying, spike before implementing** — Ask, debate, probe — then commit
+2. **You define WHAT, AI figures out HOW** — Specs are the contract
+3. **Metrics decide, not opinions** — Build/test/typecheck/lint are the only judges
+4. **Confirm before assume** — Search the code before marking "missing"
+5. **Complete implementations** — No stubs, no placeholders
+6. **Atomic commits** — One task = one commit
+7. **Context-aware** — Checkpoint before limits, resume seamlessly
 
-##
+## More
 
-
-
-3. **Complete implementations** — No stubs, no placeholders
-4. **Atomic commits** — One task = one commit
-5. **Context-aware** — Checkpoint before limits
+- [Concepts](docs/concepts.md) — Philosophy and flow in depth
+- [Configuration](docs/configuration.md) — All options, models, parallelism
 
 ## License
 
````
package/bin/install.js
CHANGED

````diff
@@ -184,13 +184,14 @@ async function main() {
   console.log('');
   console.log(`Installed to ${c.cyan}${CLAUDE_DIR}${c.reset}:`);
   console.log('  commands/df/ — /df:discover, /df:debate, /df:spec, /df:plan, /df:execute, /df:verify, /df:auto, /df:note, /df:resume, /df:update');
-  console.log('  skills/ — gap-discovery, atomic-commits, code-completeness');
+  console.log('  skills/ — gap-discovery, atomic-commits, code-completeness, context-hub');
   console.log('  agents/ — reasoner (/df:auto — autonomous execution via /loop)');
   if (level === 'global') {
     console.log('  hooks/ — statusline, update checker');
   }
   console.log('  hooks/df-spec-* — spec validation (auto-enforced by /df:spec and /df:plan)');
   console.log('  env/ — ENABLE_LSP_TOOL (code navigation via goToDefinition, findReferences, workspaceSymbol)');
+  console.log('  permissions/ — granular allow-list for background agents (git, build, test, read/write)');
   console.log('');
   if (level === 'project') {
     console.log(`${c.dim}Note: Statusline is only available with global install.${c.reset}`);
@@ -252,6 +253,10 @@ async function configureHooks(claudeDir) {
   settings.env.ENABLE_LSP_TOOL = "1";
   log('LSP tool enabled');
 
+  // Configure permissions for background agents
+  configurePermissions(settings);
+  log('Agent permissions configured');
+
   // Configure statusline
   if (settings.statusLine) {
     const answer = await ask(
@@ -319,8 +324,72 @@ function configureProjectSettings(claudeDir) {
   if (!settings.env) settings.env = {};
   settings.env.ENABLE_LSP_TOOL = "1";
 
+  // Configure permissions for background agents
+  configurePermissions(settings);
+
   fs.writeFileSync(settingsPath, JSON.stringify(settings, null, 2));
-  log('LSP tool enabled (project)');
+  log('LSP tool enabled + agent permissions configured (project)');
+}
+
+// Permissions required for background agents to work without blocking
+const DEEPFLOW_PERMISSIONS = [
+  // Agents need to read/write code
+  "Edit",
+  "Write",
+  "Read",
+  // Agents need to search codebase
+  "Glob",
+  "Grep",
+  // Git operations (orchestrator handles worktrees, agents read status)
+  "Bash(git status:*)",
+  "Bash(git diff:*)",
+  "Bash(git add:*)",
+  "Bash(git commit:*)",
+  "Bash(git log:*)",
+  "Bash(git stash:*)",
+  "Bash(git checkout:*)",
+  "Bash(git branch:*)",
+  "Bash(git revert:*)",
+  "Bash(git worktree:*)",
+  "Bash(git ls-files:*)",
+  "Bash(git merge:*)",
+  // Build & test (ratchet health checks)
+  "Bash(npm run build:*)",
+  "Bash(npm test:*)",
+  "Bash(npm run lint:*)",
+  "Bash(npx tsc:*)",
+  "Bash(cargo build:*)",
+  "Bash(cargo test:*)",
+  "Bash(go build:*)",
+  "Bash(go test:*)",
+  "Bash(pytest:*)",
+  "Bash(python -m pytest:*)",
+  "Bash(ruff:*)",
+  "Bash(mypy:*)",
+  // Utility
+  "Bash(node:*)",
+  "Bash(ls:*)",
+  "Bash(cat:*)",
+  "Bash(mkdir:*)",
+  "Bash(date:*)",
+  "Bash(wc:*)",
+  "Bash(head:*)",
+  "Bash(tail:*)",
+];
+
+function configurePermissions(settings) {
+  if (!settings.permissions) settings.permissions = {};
+  if (!settings.permissions.allow) settings.permissions.allow = [];
+
+  const existing = new Set(settings.permissions.allow);
+  let added = 0;
+
+  for (const perm of DEEPFLOW_PERMISSIONS) {
+    if (!existing.has(perm)) {
+      settings.permissions.allow.push(perm);
+      added++;
+    }
+  }
 }
 
 function ask(question) {
@@ -400,6 +469,7 @@ async function uninstall() {
     'skills/atomic-commits',
     'skills/code-completeness',
     'skills/gap-discovery',
+    'skills/context-hub',
     'agents/reasoner.md'
   ];
 
@@ -449,23 +519,30 @@ async function uninstall() {
     }
   }
 
-  // Remove ENABLE_LSP_TOOL from global settings
+  // Remove ENABLE_LSP_TOOL and deepflow permissions from global settings
  if (fs.existsSync(settingsPath)) {
    try {
      const settings = JSON.parse(fs.readFileSync(settingsPath, 'utf8'));
      if (settings.env?.ENABLE_LSP_TOOL) {
        delete settings.env.ENABLE_LSP_TOOL;
        if (settings.env && Object.keys(settings.env).length === 0) delete settings.env;
-        fs.writeFileSync(settingsPath, JSON.stringify(settings, null, 2));
        console.log(`  ${c.green}✓${c.reset} Removed ENABLE_LSP_TOOL from settings`);
      }
+      if (settings.permissions?.allow) {
+        const dfPerms = new Set(DEEPFLOW_PERMISSIONS);
+        settings.permissions.allow = settings.permissions.allow.filter(p => !dfPerms.has(p));
+        if (settings.permissions.allow.length === 0) delete settings.permissions.allow;
+        if (settings.permissions && Object.keys(settings.permissions).length === 0) delete settings.permissions;
+        console.log(`  ${c.green}✓${c.reset} Removed deepflow permissions from settings`);
+      }
+      fs.writeFileSync(settingsPath, JSON.stringify(settings, null, 2));
    } catch (e) {
      // Fail silently
    }
  }
 }
 
-  // Remove ENABLE_LSP_TOOL from project settings.local.json
+  // Remove ENABLE_LSP_TOOL and deepflow permissions from project settings.local.json
  if (level === 'project') {
    const localSettingsPath = path.join(PROJECT_DIR, 'settings.local.json');
    if (fs.existsSync(localSettingsPath)) {
@@ -474,13 +551,19 @@ async function uninstall() {
      if (localSettings.env?.ENABLE_LSP_TOOL) {
        delete localSettings.env.ENABLE_LSP_TOOL;
        if (localSettings.env && Object.keys(localSettings.env).length === 0) delete localSettings.env;
-
-
-
-
-
-
-
+      }
+      if (localSettings.permissions?.allow) {
+        const dfPerms = new Set(DEEPFLOW_PERMISSIONS);
+        localSettings.permissions.allow = localSettings.permissions.allow.filter(p => !dfPerms.has(p));
+        if (localSettings.permissions.allow.length === 0) delete localSettings.permissions.allow;
+        if (localSettings.permissions && Object.keys(localSettings.permissions).length === 0) delete localSettings.permissions;
+      }
+      if (Object.keys(localSettings).length === 0) {
+        fs.unlinkSync(localSettingsPath);
+        console.log(`  ${c.green}✓${c.reset} Removed settings.local.json (empty after cleanup)`);
+      } else {
+        fs.writeFileSync(localSettingsPath, JSON.stringify(localSettings, null, 2));
+        console.log(`  ${c.green}✓${c.reset} Removed deepflow settings from settings.local.json`);
      }
    } catch (e) {
      // Fail silently
````
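The core of the new installer logic is an idempotent allow-list merge, with the uninstaller filtering exactly the same entries back out so user-added permissions survive. A standalone sketch of that round trip (function names mirror the diff; the permission list is shortened for brevity, and `removePermissions` is a hypothetical name for the inline uninstall logic):

```javascript
// Standalone sketch of the settings.permissions.allow merge/remove logic
// from bin/install.js. List shortened; removePermissions is illustrative.
const DEEPFLOW_PERMISSIONS = ["Edit", "Read", "Bash(git status:*)"];

function configurePermissions(settings) {
  if (!settings.permissions) settings.permissions = {};
  if (!settings.permissions.allow) settings.permissions.allow = [];
  // Set-based dedupe: re-running the installer never duplicates entries.
  const existing = new Set(settings.permissions.allow);
  for (const perm of DEEPFLOW_PERMISSIONS) {
    if (!existing.has(perm)) settings.permissions.allow.push(perm);
  }
  return settings;
}

function removePermissions(settings) {
  if (!settings.permissions?.allow) return settings;
  // Filter out only deepflow's entries; anything the user added stays.
  const dfPerms = new Set(DEEPFLOW_PERMISSIONS);
  settings.permissions.allow = settings.permissions.allow.filter(p => !dfPerms.has(p));
  if (settings.permissions.allow.length === 0) delete settings.permissions.allow;
  if (Object.keys(settings.permissions).length === 0) delete settings.permissions;
  return settings;
}

// A pre-existing user entry survives configure → configure → remove.
const s = configurePermissions({ permissions: { allow: ["Bash(docker:*)"] } });
configurePermissions(s); // no duplicates added
removePermissions(s);    // only the user's entry remains
```

The Set-based membership check is what makes install safe to re-run, and filtering against the same constant is what makes uninstall a clean inverse.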
package/package.json
CHANGED

````diff
@@ -1,7 +1,7 @@
 {
   "name": "deepflow",
-  "version": "0.1.
-  "description": "
+  "version": "0.1.74",
+  "description": "Doing reveals what thinking can't predict — spec-driven iterative development for Claude Code",
   "keywords": [
     "claude",
     "claude-code",
@@ -12,7 +12,11 @@
     "specs",
     "tasks",
     "automation",
-    "productivity"
+    "productivity",
+    "ratchet",
+    "autonomous",
+    "spikes",
+    "evolutionary"
   ],
   "author": "saidwafiq",
   "license": "MIT",
````
package/src/commands/df/execute.md
CHANGED

````diff
@@ -24,6 +24,7 @@ Implement tasks from PLAN.md with parallel agents, atomic commits, ratchet-drive
 
 ## Skills & Agents
 - Skill: `atomic-commits` — Clean commit protocol
+- Skill: `context-hub` — Fetch external API docs before coding (when task involves external libraries)
 
 **Use Task tool to spawn agents:**
 | Agent | subagent_type | Purpose |
@@ -453,8 +454,11 @@ Files: {target files}
 Spec: {spec_name}
 
 Steps:
-1.
-
+1. If the task involves external APIs/SDKs, run: chub search "<library>" --json → chub get <id> --lang <lang>
+   Use fetched docs as ground truth for API signatures. Annotate any gaps: chub annotate <id> "note"
+   Skip this step if chub is not installed or the task only touches internal code.
+2. Implement the task
+3. Commit as feat({spec}): {description}
 
 Your ONLY job is to write code and commit. The orchestrator will run health checks after you finish.
 ```
@@ -546,6 +550,7 @@ After spawning wave agents, your turn ENDS. Completion notifications drive the l
 | Machine-selected winner | Fewer regressions > better coverage > fewer files changed; no LLM judge |
 | Failed probe insights logged | `.deepflow/auto-memory.yaml` in main tree; persists across cycles |
 | Winner cherry-picked to shared worktree | Downstream tasks see winning approach via shared worktree |
+| External APIs → chub first | Agents fetch curated docs before implementing external API calls; skip if chub unavailable |
 
 ## Example
 
````
package/src/skills/context-hub/SKILL.md
ADDED

````diff
@@ -0,0 +1,87 @@
+---
+name: context-hub
+description: Fetches curated API docs for external libraries before coding. Use when implementing code that uses external APIs/SDKs (Stripe, OpenAI, MongoDB, etc.) to avoid hallucinating APIs and reduce token usage.
+---
+
+# Context Hub
+
+Fetch curated, versioned docs for external libraries instead of guessing APIs.
+
+## When to Use
+
+Before writing code that calls an external API or SDK:
+- New library integration (e.g., Stripe payments, AWS S3)
+- Unfamiliar API version or method
+- Complex API with many options (e.g., MongoDB aggregation)
+
+**Skip when:** Working with internal code (use LSP instead) or well-known stdlib APIs.
+
+## Prerequisites
+
+Requires `chub` CLI: `npm install -g @aisuite/chub`
+
+If `chub` is not installed, tell the user and skip — don't block implementation.
+
+## Workflow
+
+### 1. Search for docs
+
+```bash
+chub search "<library or API>" --json
+```
+
+Example:
+```bash
+chub search "stripe payments" --json
+chub search "mongodb aggregation" --json
+```
+
+### 2. Fetch relevant docs
+
+```bash
+chub get <id> --lang <py|js|ts>
+```
+
+Use `--lang` matching the project language. Use `--full` only if the summary lacks what you need.
+
+### 3. Write code using fetched docs
+
+Use the retrieved documentation as ground truth for API signatures, parameter names, and patterns.
+
+### 4. Annotate discoveries
+
+When you find something the docs missed or got wrong:
+
+```bash
+chub annotate <id> "Note: method X requires param Y since v2.0"
+```
+
+This persists locally and appears on future `chub get` calls — the agent learns across sessions.
+
+### 5. Rate docs (optional)
+
+```bash
+chub feedback <id> up --label accurate
+chub feedback <id> down --label outdated
+```
+
+Labels: `accurate`, `outdated`, `incomplete`, `wrong-version`, `helpful`
+
+## Integration with LSP
+
+| Need | Tool |
+|------|------|
+| Internal code navigation | LSP (`goToDefinition`, `findReferences`) |
+| External API signatures | Context Hub (`chub get`) |
+| Symbol search in project | LSP (`workspaceSymbol`) |
+| Library usage patterns | Context Hub (`chub search`) |
+
+**Combined approach:** Use LSP to understand how the project currently uses a library, then use Context Hub to verify correct API usage and discover better patterns.
+
+## Rules
+
+- Always search before implementing external API calls
+- Trust chub docs over training data for API specifics
+- Annotate gaps so future sessions benefit
+- Don't block on chub failures — fall back to best knowledge
+- Prefer `--json` flag for programmatic parsing in automated workflows
````