npm - triflux - Versions diffs - 10.2.1 → 10.3.1 - Mend

triflux 10.2.1 → 10.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27) hide show

package/README.md +236 -156
package/hub/bridge.mjs +638 -290
package/hub/codex-compat.mjs +1 -1
package/hub/fullcycle.mjs +1 -1
package/hub/intent.mjs +1 -0
package/hub/lib/mcp-response-cache.mjs +205 -0
package/hub/pipe.mjs +228 -119
package/hub/reflexion.mjs +87 -13
package/hub/research.mjs +1 -0
package/hub/server.mjs +997 -611
package/hub/team/conductor-registry.mjs +121 -0
package/hub/team/conductor.mjs +256 -125
package/hub/team/execution-mode.mjs +105 -0
package/hub/team/headless.mjs +686 -252
package/hub/team/lead-control.mjs +91 -4
package/hub/team/mcp-selector.mjs +145 -0
package/hub/team/session-sync.mjs +153 -6
package/hub/team/swarm-hypervisor.mjs +208 -86
package/hub/token-mode.mjs +1 -0
package/hub/tools.mjs +474 -252
package/package.json +5 -5
package/scripts/claudemd-sync.mjs +12 -1
package/scripts/codex-gateway-preflight.mjs +133 -0
package/scripts/codex-mcp-gateway-sync.mjs +199 -0
package/skills/star-prompt/SKILL.md +169 -69
package/skills/tfx-setup/SKILL.md +124 -0
package/skills/tfx-swarm/SKILL.md +124 -72

package/README.md CHANGED Viewed

@@ -30,11 +30,11 @@
 <p align="center">
   <a href="#quick-start">Quick Start</a> &middot;
-  <a href="#tri-cli-consensus">Consensus Engine</a> &middot;
-  <a href="#42-skills">42 Skills</a> &middot;
+  <a href="#core-engine">Core Engine</a> &middot;
+  <a href="#killer-skills">Killer Skills</a> &middot;
+  <a href="#all-42-skills">All 42 Skills</a> &middot;
   <a href="#deep-vs-light">Deep vs Light</a> &middot;
   <a href="#architecture">Architecture</a> &middot;
-  <a href="#under-the-hood">Under the Hood</a> &middot;
   <a href="#security">Security</a>
 </p>
@@ -44,6 +44,8 @@
 Most AI coding tools talk to **one model**. triflux talks to **three** — and makes them argue.
+triflux is not a collection of skills. It is a **multi-model parallel orchestration harness**. The 42 skills are what it does. The harness — consensus engine, message bus, router, and security guard — is what makes it different.
 Every Deep skill runs Claude, Codex, and Gemini **independently** (no cross-visibility), then cross-validates their findings. Only consensus-verified results survive. The result: **87% fewer false positives** compared to single-model review.
 You don't need to memorize commands. Say what you want in natural language — triflux routes to the right skill automatically:
@@ -74,91 +76,32 @@ tfx setup
 ### 3. Use
 ```bash
-# Light — single model, fast execution
-/tfx-research "React 19 Server Actions best practices"
-/tfx-review
-/tfx-plan "add JWT auth middleware"
-# Deep — 3-party consensus for critical work
-/tfx-deep-research "microservice architecture comparison 2026"
+# 3-party consensus — three models argue, only consensus survives
 /tfx-deep-review
 /tfx-deep-plan "migrate REST to GraphQL"
-# Debate — get 3 independent opinions
-/tfx-debate "Redis vs PostgreSQL LISTEN/NOTIFY for real-time events"
-# Persistence — don't stop until done
-/tfx-persist "implement full auth flow with tests"
+# Swarm — split PRD into shards, parallel worktree execution
+/tfx-swarm
-# Team — multi-CLI parallel orchestration
+# Team — Claude + Codex + Gemini on parallel tasks
 /tfx-multi "refactor auth + update UI + add tests"
-# Monitor — real-time routing dashboard
-tfx monitor
+# Persist — don't stop until done, 3-party verified
+/tfx-persist "implement full auth flow with tests"
-# Remote — spawn Claude sessions on other machines
-/tfx-remote-setup           # interactive host wizard
-/tfx-remote-spawn "run security review on my-server"
+# Remote — spawn sessions on other machines
+/tfx-remote-spawn "run security review on ryzen5-7600"
 ```
 > **Note**: Deep skills require **psmux** (or tmux), **triflux Hub**, **Codex CLI**, and **Gemini CLI** for full Tri-CLI consensus. Without these, skills automatically degrade to Claude-only mode. Run `tfx doctor` to check your environment.
 ---
-## What's New
-### v10.1 — Reflexion Pipeline + TUI Monitor
-| Feature | Description |
-|---------|-------------|
-| **TUI Routing Monitor** | `tfx monitor` — interactive terminal dashboard showing real-time skill routing, model selection, and success rates |
-| **Reflexion Pipeline** | safety-guard events feed into a reflexion store, enabling adaptive learning from past routing decisions |
-| **Adaptive Rules API v2** | Penalty promotion pipeline (`pending-penalties` → `adaptive_rules`), hit_count isolation, schema v2 with 18 tests |
-| **Q-Learning Routing** | Experimental dynamic skill routing via Q-table weight optimization (`TRIFLUX_DYNAMIC_ROUTING=true`) |
-| **Security Hardening** | headless-guard: wrapper bypass, pipe bypass, env escape vectors blocked. SSH bash-syntax forwarding prevention |
-| **HUD System** | Codex plan-aware status display with correct bucket-to-slot mapping |
-### v10.0 — 4-Lake Roadmap
-<details>
-<summary>Expand v10.0 details</summary>
-- **Lake 1: CLI Stability** — Retry, stall detection, version cache. Zero silent failures
-- **Lake 2: Plugin Isolation** — cli-adapter-base, team-bridge, pack.mjs sync
-- **Lake 3: Remote Infrastructure** — SSH keepalive/retry, hosts.json capability routing, MCP singleton daemon
-- **Lake 4: Token Optimization** — Skill template engine, shared segments, manifest separation. 62% prompt token reduction
-- **Lake 5: Agent Mesh** — Message routing, per-agent queues, heartbeat monitoring, Conductor integration
-</details>
-### v9 — Harness-Native Intelligence
-<details>
-<summary>Expand v9 details</summary>
-- **Natural Language Routing** — Say "review this" or "리뷰해줘" instead of memorizing skill names
-- **Cross-Model Review** — Claude writes → Codex reviews. Same-model self-approve blocked
-- **Context Isolation** — Off-topic requests auto-detected; spawns a clean psmux session
-- **Codex Swarm Hardened** — PowerShell `.ps1` launchers, profile-based execution
-</details>
-### v8 — Tri-Debate Foundation
-<details>
-<summary>Expand v8 details</summary>
-- **Tri-Debate Engine** — 3-CLI independent analysis with anti-herding and consensus scoring
-- **Deep/Light Variants** — Every domain has both a fast mode and a thorough mode
-- **Expert Panel** — Virtual expert simulation via `tfx-panel`
-- **Hub IPC** — Named Pipe & HTTP MCP bridge
-- **psmux** — Windows Terminal native multiplexer
+## Core Engine
-</details>
----
+The infrastructure that makes triflux triflux. If any of these break, everything breaks.
-## Tri-CLI Consensus
+### Tri-CLI Consensus
 <p align="center">
   <img src="docs/assets/consensus-flow.svg" alt="Tri-CLI Consensus Flow" width="680">
@@ -183,11 +126,142 @@ Phase 3: Resolution (if consensus < 70%)
   └─ Unresolved → user decides
 ```
-**v10.1 addition**: The **Reflexion Pipeline** feeds consensus outcomes back into an adaptive rules store, so routing decisions improve over time based on which models perform best for which task types.
+### Hub — Singleton MCP Message Bus
+triflux Hub runs as a **singleton daemon** per machine. A filesystem lock prevents duplicate instances.
+```
+Local agents ──→ Named Pipe (NDJSON, sub-ms latency) ──→ Hub
+Remote/Dashboard ──→ HTTP/REST ──────────────────────→ Hub
+```
+The bridge client tries Named Pipe first and falls back to HTTP automatically. Sessions auto-expire after 30 minutes, and the Hub self-terminates when idle. Run `tfx hub ensure` to guarantee the Hub is alive from any context.
+### Router — Natural Language Skill Mapping
+`tfx-auto` is the unified entry point. Natural language input → keyword detection → skill routing → CLI dispatch. Depth modifiers ("thoroughly", "제대로") auto-escalate Light skills to Deep. The router handles Korean and English natively.
+### Guard — Security Perimeter
+Two layers that enforce the safety boundary:
+- **headless-guard**: Blocks direct `codex exec` / `gemini -y` outside tfx skills. Wrapper bypass, pipe bypass, env escape vectors all covered.
+- **safety-guard**: SSH bash-syntax forwarding prevention, injection-safe shell execution.
+Every CLI invocation flows through the guard layer. No exceptions.
+### Reflexion Adaptive Learning
+Errors become knowledge automatically. The Reflexion Engine runs a closed-loop learning pipeline:
+```
+safety-guard blocks command
+  → error normalized (paths, timestamps, UUIDs stripped)
+  → pattern stored in pending-penalties
+  → promoted to adaptive rule (Bayesian confidence scoring)
+  → injected into CLAUDE.md when confidence > threshold
+Three-tier memory:
+  Tier 1 (Session)   → cleared on session end
+  Tier 2 (Project)   → decays -0.2 confidence per 5 unobserved sessions
+  Tier 3 (Permanent) → auto-injected into CLAUDE.md as machine-readable rules
+```
+A blocked command in Session 1 becomes a proactive warning in Session 2 and eventually a permanent instruction. Your AI agent literally gets smarter over time.
+### Pipeline Quality Gates
+Every Deep task runs through a **10-phase state machine** with quality gates:
+```
+plan → PRD → confidence gate → execute → deslop → verify → selfcheck → complete
+                                                              ↓
+                                                          fix (max 3) → retry
+```
+- **Confidence Gate** (pre-execution): 5 weighted criteria must score >= 90% before execution starts
+- **Hallucination Detection** (post-execution): 7 regex patterns catch AI claims without evidence:
+  - "tests pass" without test output
+  - "performance improved" without benchmarks
+  - "backward compatible" without verification
+  - "no changes needed" when diff exists
+- **Bounded loops**: Fix attempts capped at 3, ralph iterations at 10. State persists in SQLite for crash recovery.
 ---
-## 42 Skills
+## Killer Skills
+These are why you use triflux. Each one depends on the Core Engine above.
+### Multi-CLI Team Orchestration — `tfx-multi`
+Run Claude + Codex + Gemini as a coordinated team on parallel tasks. Say `"refactor auth + update UI + add tests"` and each sub-task is dispatched to the best-fit model, executed in parallel, and results are merged with cross-model review.
+```bash
+/tfx-multi "refactor auth + update UI + add tests"
+/tfx-multi --agents codex,gemini "frontend + backend"
+```
+### Multi-Machine x Multi-Model Swarm — `tfx-swarm`
+One PRD, multiple machines, multiple models. Write a PRD with `agent:` and `host:` per shard, and triflux distributes work across local and remote machines using Claude + Codex + Gemini in parallel.
+```bash
+/tfx-swarm    # select PRDs, choose remote/model config, launch workers
+```
+Example PRD shard:
+```markdown
+## Shard: security-audit
+- agent: claude
+- host: ryzen5-7600
+- critical: true
+- files: src/security.mjs
+- prompt: Security vulnerability audit
+```
+Each shard gets its own git worktree, file-lease enforcement prevents conflicts, and results merge automatically in dependency order. Critical shards run on two different models for redundant verification.
+### Remote Sessions — `tfx-remote-spawn`
+Spawn Claude Code sessions on remote machines via SSH. Tailscale auto-discovery, host capability probing, session handoff, prompt injection into running sessions, and session re-attachment.
+```bash
+/tfx-remote-spawn "run security review on ryzen5-7600"
+/tfx-remote-spawn list     # see active remote sessions
+```
+### Persistence Loop — `tfx-persist` (ralph)
+"Don't stop until it's done." A 3-party verified execution loop that keeps going until the task passes consensus verification. Bounded at 10 iterations with state persistence for crash recovery.
+```bash
+/tfx-persist "implement full auth flow with tests"
+```
+### 3-Party Consensus Reviews — `tfx-deep-review` / `tfx-deep-plan`
+The bread-and-butter Deep skills. Three models independently review your code or plan your implementation, then cross-validate. Only consensus-verified findings survive.
+```bash
+/tfx-deep-review            # 3-party code review
+/tfx-deep-plan "migrate to GraphQL"  # 3-party planning
+```
+### Structured Debate — `tfx-debate`
+Three models take independent positions on a technical question, debate, and converge on a recommendation. Anti-herding ensures genuine independence.
+```bash
+/tfx-debate "Redis vs PostgreSQL LISTEN/NOTIFY for real-time events"
+```
+---
+## All 42 Skills
+<details>
+<summary>Expand full skill list</summary>
 ### Research & Discovery
@@ -249,8 +323,7 @@ Phase 3: Resolution (if consensus < 70%)
 | `tfx-consensus` | Core consensus engine (used by all Deep skills) |
 | `tfx-hub` | MCP message bus — Named Pipe & HTTP bridge |
 | `tfx-multi` | Multi-CLI team orchestration (2+ parallel tasks) |
-| `tfx-codex-swarm` | Parallel Codex sessions via worktree + psmux |
-| `tfx-swarm` | Unified swarm orchestration |
+| `tfx-swarm` | Multi-machine x multi-model swarm (PRD → shard → worktree, local+remote) |
 | `tfx-codex` | Codex-only orchestrator |
 | `tfx-gemini` | Gemini-only orchestrator |
@@ -276,6 +349,8 @@ Phase 3: Resolution (if consensus < 70%)
 | `merge-worktree` | Worktree merge helper for swarm results |
 | `star-prompt` | GitHub star prompt for postinstall |
+</details>
 ---
 ## Deep vs Light
@@ -341,85 +416,6 @@ graph TD
 ---
-## Under the Hood
-### Singleton MCP Hub with Dual-Protocol IPC
-triflux Hub runs as a **singleton daemon** per machine. A filesystem lock prevents duplicate instances.
-```
-Local agents ──→ Named Pipe (NDJSON, sub-ms latency) ──→ Hub
-Remote/Dashboard ──→ HTTP/REST ──────────────────────→ Hub
-```
-The bridge client tries Named Pipe first and falls back to HTTP automatically. Sessions auto-expire after 30 minutes, and the Hub self-terminates when idle. Run `tfx hub ensure` to guarantee the Hub is alive from any context.
-### Reflexion Adaptive Learning
-Errors become knowledge automatically. The Reflexion Engine runs a closed-loop learning pipeline:
-```
-safety-guard blocks command
-  → error normalized (paths, timestamps, UUIDs stripped)
-  → pattern stored in pending-penalties
-  → promoted to adaptive rule (Bayesian confidence scoring)
-  → injected into CLAUDE.md when confidence > threshold
-Three-tier memory:
-  Tier 1 (Session)   → cleared on session end
-  Tier 2 (Project)   → decays -0.2 confidence per 5 unobserved sessions
-  Tier 3 (Permanent) → auto-injected into CLAUDE.md as machine-readable rules
-```
-A blocked command in Session 1 becomes a proactive warning in Session 2 and eventually a permanent instruction. Your AI agent literally gets smarter over time.
-### Pipeline Quality Gates
-Every Deep task runs through a **10-phase state machine** with quality gates:
-```
-plan → PRD → confidence gate → execute → deslop → verify → selfcheck → complete
-                                                              ↓
-                                                          fix (max 3) → retry
-```
-- **Confidence Gate** (pre-execution): 5 weighted criteria must score >= 90% before execution starts
-- **Hallucination Detection** (post-execution): 7 regex patterns catch AI claims without evidence:
-  - "tests pass" without test output
-  - "performance improved" without benchmarks
-  - "backward compatible" without verification
-  - "no changes needed" when diff exists
-- **Bounded loops**: Fix attempts capped at 3, ralph iterations at 10. State persists in SQLite for crash recovery.
-### 5-Tier Adaptive HUD
-The Claude Code status bar auto-adapts to any terminal width:
-```
- full (120+ cols)  ██████░░░░ claude 52%  ██████░░░░ codex 48%  savings: $2.40
- compact (80 cols) c:52% x:48% g:Free  sv:$2.40  CTX:67%
- minimal (60 cols) c:52% x:48% sv:$2.40
- micro (<60 cols)  c52 x48 sv$2
- nano (<40 cols)   c:52%/x:48%
-```
-Zero config. Open a vertical split pane and the HUD auto-collapses. Close it and it expands back. When `tfx-multi` is active, a live worker row appears showing per-CLI progress: `x✓ g⋯ c✗` (completed/running/failed).
-Context token attribution tracks usage by skill, file, and tool call, with warnings at 60%/80%/90% context fill.
-### Windows Terminal Orchestration
-triflux doesn't just run in a terminal -- it **orchestrates** it. The WT Manager API provides:
-- **Tab creation** with PID-tracked lifecycle (temp file polling for readiness)
-- **Split-pane layouts** via `applySplitLayout()` for multi-agent dashboards
-- **Dead tab pruning** using cross-platform PID liveness detection
-- **Base64 PowerShell encoding** eliminating all quoting/escaping issues
-Every direct `wt.exe` call is blocked by safety-guard. Agents can only use the managed API path, preventing uncontrolled terminal sprawl.
----
 ## TUI Routing Monitor
 **New in v10.1** — `tfx monitor` launches an interactive terminal dashboard:
@@ -447,6 +443,59 @@ The monitor visualizes:
 ---
+## What's New
+### v10.1 — Reflexion Pipeline + TUI Monitor
+| Feature | Description |
+|---------|-------------|
+| **TUI Routing Monitor** | `tfx monitor` — interactive terminal dashboard showing real-time skill routing, model selection, and success rates |
+| **Reflexion Pipeline** | safety-guard events feed into a reflexion store, enabling adaptive learning from past routing decisions |
+| **Adaptive Rules API v2** | Penalty promotion pipeline (`pending-penalties` → `adaptive_rules`), hit_count isolation, schema v2 with 18 tests |
+| **Q-Learning Routing** | Experimental dynamic skill routing via Q-table weight optimization (`TRIFLUX_DYNAMIC_ROUTING=true`) |
+| **Security Hardening** | headless-guard: wrapper bypass, pipe bypass, env escape vectors blocked. SSH bash-syntax forwarding prevention |
+| **HUD System** | Codex plan-aware status display with correct bucket-to-slot mapping |
+### v10.0 — 4-Lake Roadmap
+<details>
+<summary>Expand v10.0 details</summary>
+- **Lake 1: CLI Stability** — Retry, stall detection, version cache. Zero silent failures
+- **Lake 2: Plugin Isolation** — cli-adapter-base, team-bridge, pack.mjs sync
+- **Lake 3: Remote Infrastructure** — SSH keepalive/retry, hosts.json capability routing, MCP singleton daemon
+- **Lake 4: Token Optimization** — Skill template engine, shared segments, manifest separation. 62% prompt token reduction
+- **Lake 5: Agent Mesh** — Message routing, per-agent queues, heartbeat monitoring, Conductor integration
+</details>
+### v9 — Harness-Native Intelligence
+<details>
+<summary>Expand v9 details</summary>
+- **Natural Language Routing** — Say "review this" or "리뷰해줘" instead of memorizing skill names
+- **Cross-Model Review** — Claude writes → Codex reviews. Same-model self-approve blocked
+- **Context Isolation** — Off-topic requests auto-detected; spawns a clean psmux session
+- **Codex Swarm Hardened** — PowerShell `.ps1` launchers, profile-based execution
+</details>
+### v8 — Tri-Debate Foundation
+<details>
+<summary>Expand v8 details</summary>
+- **Tri-Debate Engine** — 3-CLI independent analysis with anti-herding and consensus scoring
+- **Deep/Light Variants** — Every domain has both a fast mode and a thorough mode
+- **Expert Panel** — Virtual expert simulation via `tfx-panel`
+- **Hub IPC** — Named Pipe & HTTP MCP bridge
+- **psmux** — Windows Terminal native multiplexer
+</details>
+---
 ## Security
 | Layer | Protection |
@@ -471,6 +520,37 @@ The monitor visualizes:
 ---
+## 5-Tier Adaptive HUD
+The Claude Code status bar auto-adapts to any terminal width:
+```
+ full (120+ cols)  ██████░░░░ claude 52%  ██████░░░░ codex 48%  savings: $2.40
+ compact (80 cols) c:52% x:48% g:Free  sv:$2.40  CTX:67%
+ minimal (60 cols) c:52% x:48% sv:$2.40
+ micro (<60 cols)  c52 x48 sv$2
+ nano (<40 cols)   c:52%/x:48%
+```
+Zero config. Open a vertical split pane and the HUD auto-collapses. Close it and it expands back. When `tfx-multi` is active, a live worker row appears showing per-CLI progress: `x✓ g⋯ c✗` (completed/running/failed).
+Context token attribution tracks usage by skill, file, and tool call, with warnings at 60%/80%/90% context fill.
+---
+## Windows Terminal Orchestration
+triflux doesn't just run in a terminal -- it **orchestrates** it. The WT Manager API provides:
+- **Tab creation** with PID-tracked lifecycle (temp file polling for readiness)
+- **Split-pane layouts** via `applySplitLayout()` for multi-agent dashboards
+- **Dead tab pruning** using cross-platform PID liveness detection
+- **Base64 PowerShell encoding** eliminating all quoting/escaping issues
+Every direct `wt.exe` call is blocked by safety-guard. Agents can only use the managed API path, preventing uncontrolled terminal sprawl.
+---
 ## Research Foundation
 The triflux skill suite was shaped by patterns from across the Claude Code ecosystem: