triflux 10.2.1 → 10.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -30,11 +30,11 @@
 
  <p align="center">
  <a href="#quick-start">Quick Start</a> &middot;
- <a href="#tri-cli-consensus">Consensus Engine</a> &middot;
- <a href="#42-skills">42 Skills</a> &middot;
+ <a href="#core-engine">Core Engine</a> &middot;
+ <a href="#killer-skills">Killer Skills</a> &middot;
+ <a href="#all-42-skills">All 42 Skills</a> &middot;
  <a href="#deep-vs-light">Deep vs Light</a> &middot;
  <a href="#architecture">Architecture</a> &middot;
- <a href="#under-the-hood">Under the Hood</a> &middot;
  <a href="#security">Security</a>
  </p>
 
@@ -44,6 +44,8 @@
 
  Most AI coding tools talk to **one model**. triflux talks to **three** — and makes them argue.
 
+ triflux is not a collection of skills. It is a **multi-model parallel orchestration harness**. The 42 skills are what it does. The harness — consensus engine, message bus, router, and security guard — is what makes it different.
+
  Every Deep skill runs Claude, Codex, and Gemini **independently** (no cross-visibility), then cross-validates their findings. Only consensus-verified results survive. The result: **87% fewer false positives** compared to single-model review.
 
  You don't need to memorize commands. Say what you want in natural language — triflux routes to the right skill automatically:
@@ -74,91 +76,32 @@ tfx setup
  ### 3. Use
 
  ```bash
- # Light — single model, fast execution
- /tfx-research "React 19 Server Actions best practices"
- /tfx-review
- /tfx-plan "add JWT auth middleware"
-
- # Deep — 3-party consensus for critical work
- /tfx-deep-research "microservice architecture comparison 2026"
+ # 3-party consensus — three models argue, only consensus survives
  /tfx-deep-review
  /tfx-deep-plan "migrate REST to GraphQL"
 
- # Debate — get 3 independent opinions
- /tfx-debate "Redis vs PostgreSQL LISTEN/NOTIFY for real-time events"
-
- # Persistence — don't stop until done
- /tfx-persist "implement full auth flow with tests"
+ # Swarm — split PRD into shards, parallel worktree execution
+ /tfx-swarm
 
- # Team — multi-CLI parallel orchestration
+ # Team — Claude + Codex + Gemini on parallel tasks
  /tfx-multi "refactor auth + update UI + add tests"
 
- # Monitor — real-time routing dashboard
- tfx monitor
+ # Persist — don't stop until done, 3-party verified
+ /tfx-persist "implement full auth flow with tests"
 
- # Remote — spawn Claude sessions on other machines
- /tfx-remote-setup # interactive host wizard
- /tfx-remote-spawn "run security review on my-server"
+ # Remote — spawn sessions on other machines
+ /tfx-remote-spawn "run security review on ryzen5-7600"
  ```
 
  > **Note**: Deep skills require **psmux** (or tmux), **triflux Hub**, **Codex CLI**, and **Gemini CLI** for full Tri-CLI consensus. Without these, skills automatically degrade to Claude-only mode. Run `tfx doctor` to check your environment.
 
  ---
 
- ## What's New
-
- ### v10.1 — Reflexion Pipeline + TUI Monitor
-
- | Feature | Description |
- |---------|-------------|
- | **TUI Routing Monitor** | `tfx monitor` — interactive terminal dashboard showing real-time skill routing, model selection, and success rates |
- | **Reflexion Pipeline** | safety-guard events feed into a reflexion store, enabling adaptive learning from past routing decisions |
- | **Adaptive Rules API v2** | Penalty promotion pipeline (`pending-penalties` → `adaptive_rules`), hit_count isolation, schema v2 with 18 tests |
- | **Q-Learning Routing** | Experimental dynamic skill routing via Q-table weight optimization (`TRIFLUX_DYNAMIC_ROUTING=true`) |
- | **Security Hardening** | headless-guard: wrapper bypass, pipe bypass, env escape vectors blocked. SSH bash-syntax forwarding prevention |
- | **HUD System** | Codex plan-aware status display with correct bucket-to-slot mapping |
-
- ### v10.0 — 4-Lake Roadmap
-
- <details>
- <summary>Expand v10.0 details</summary>
-
- - **Lake 1: CLI Stability** — Retry, stall detection, version cache. Zero silent failures
- - **Lake 2: Plugin Isolation** — cli-adapter-base, team-bridge, pack.mjs sync
- - **Lake 3: Remote Infrastructure** — SSH keepalive/retry, hosts.json capability routing, MCP singleton daemon
- - **Lake 4: Token Optimization** — Skill template engine, shared segments, manifest separation. 62% prompt token reduction
- - **Lake 5: Agent Mesh** — Message routing, per-agent queues, heartbeat monitoring, Conductor integration
-
- </details>
-
- ### v9 — Harness-Native Intelligence
-
- <details>
- <summary>Expand v9 details</summary>
-
- - **Natural Language Routing** — Say "review this" or "리뷰해줘" instead of memorizing skill names
- - **Cross-Model Review** — Claude writes → Codex reviews. Same-model self-approve blocked
- - **Context Isolation** — Off-topic requests auto-detected; spawns a clean psmux session
- - **Codex Swarm Hardened** — PowerShell `.ps1` launchers, profile-based execution
-
- </details>
-
- ### v8 — Tri-Debate Foundation
-
- <details>
- <summary>Expand v8 details</summary>
-
- - **Tri-Debate Engine** — 3-CLI independent analysis with anti-herding and consensus scoring
- - **Deep/Light Variants** — Every domain has both a fast mode and a thorough mode
- - **Expert Panel** — Virtual expert simulation via `tfx-panel`
- - **Hub IPC** — Named Pipe & HTTP MCP bridge
- - **psmux** — Windows Terminal native multiplexer
+ ## Core Engine
 
- </details>
-
- ---
+ The infrastructure that makes triflux triflux. If any of these break, everything breaks.
 
- ## Tri-CLI Consensus
+ ### Tri-CLI Consensus
 
  <p align="center">
  <img src="docs/assets/consensus-flow.svg" alt="Tri-CLI Consensus Flow" width="680">
@@ -183,11 +126,142 @@ Phase 3: Resolution (if consensus < 70%)
  └─ Unresolved → user decides
  ```
 
- **v10.1 addition**: The **Reflexion Pipeline** feeds consensus outcomes back into an adaptive rules store, so routing decisions improve over time based on which models perform best for which task types.
+ ### Hub Singleton MCP Message Bus
+
+ triflux Hub runs as a **singleton daemon** per machine. A filesystem lock prevents duplicate instances.
+
+ ```
+ Local agents ──→ Named Pipe (NDJSON, sub-ms latency) ──→ Hub
+ Remote/Dashboard ──→ HTTP/REST ──────────────────────→ Hub
+ ```
+
+ The bridge client tries Named Pipe first and falls back to HTTP automatically. Sessions auto-expire after 30 minutes, and the Hub self-terminates when idle. Run `tfx hub ensure` to guarantee the Hub is alive from any context.
+
+ ### Router — Natural Language Skill Mapping
+
+ `tfx-auto` is the unified entry point. Natural language input → keyword detection → skill routing → CLI dispatch. Depth modifiers ("thoroughly", "제대로") auto-escalate Light skills to Deep. The router handles Korean and English natively.
+
+ ### Guard — Security Perimeter
+
+ Two layers that enforce the safety boundary:
+
+ - **headless-guard**: Blocks direct `codex exec` / `gemini -y` outside tfx skills. Wrapper bypass, pipe bypass, env escape vectors all covered.
+ - **safety-guard**: SSH bash-syntax forwarding prevention, injection-safe shell execution.
+
+ Every CLI invocation flows through the guard layer. No exceptions.
+
+ ### Reflexion Adaptive Learning
+
+ Errors become knowledge automatically. The Reflexion Engine runs a closed-loop learning pipeline:
+
+ ```
+ safety-guard blocks command
+ → error normalized (paths, timestamps, UUIDs stripped)
+ → pattern stored in pending-penalties
+ → promoted to adaptive rule (Bayesian confidence scoring)
+ → injected into CLAUDE.md when confidence > threshold
+
+ Three-tier memory:
+ Tier 1 (Session) → cleared on session end
+ Tier 2 (Project) → decays -0.2 confidence per 5 unobserved sessions
+ Tier 3 (Permanent) → auto-injected into CLAUDE.md as machine-readable rules
+ ```
+
+ A blocked command in Session 1 becomes a proactive warning in Session 2 and eventually a permanent instruction. Your AI agent literally gets smarter over time.
+
+ ### Pipeline Quality Gates
+
+ Every Deep task runs through a **10-phase state machine** with quality gates:
+
+ ```
+ plan → PRD → confidence gate → execute → deslop → verify → selfcheck → complete
+
+ fix (max 3) → retry
+ ```
+
+ - **Confidence Gate** (pre-execution): 5 weighted criteria must score >= 90% before execution starts
+ - **Hallucination Detection** (post-execution): 7 regex patterns catch AI claims without evidence:
+   - "tests pass" without test output
+   - "performance improved" without benchmarks
+   - "backward compatible" without verification
+   - "no changes needed" when diff exists
+ - **Bounded loops**: Fix attempts capped at 3, ralph iterations at 10. State persists in SQLite for crash recovery.
 
  ---
 
- ## 42 Skills
+ ## Killer Skills
+
+ These are why you use triflux. Each one depends on the Core Engine above.
+
+ ### Multi-CLI Team Orchestration — `tfx-multi`
+
+ Run Claude + Codex + Gemini as a coordinated team on parallel tasks. Say `"refactor auth + update UI + add tests"` and each sub-task is dispatched to the best-fit model, executed in parallel, and results are merged with cross-model review.
+
+ ```bash
+ /tfx-multi "refactor auth + update UI + add tests"
+ /tfx-multi --agents codex,gemini "frontend + backend"
+ ```
+
+ ### Multi-Machine x Multi-Model Swarm — `tfx-swarm`
+
+ One PRD, multiple machines, multiple models. Write a PRD with `agent:` and `host:` per shard, and triflux distributes work across local and remote machines using Claude + Codex + Gemini in parallel.
+
+ ```bash
+ /tfx-swarm # select PRDs, choose remote/model config, launch workers
+ ```
+
+ Example PRD shard:
+ ```markdown
+ ## Shard: security-audit
+ - agent: claude
+ - host: ryzen5-7600
+ - critical: true
+ - files: src/security.mjs
+ - prompt: Security vulnerability audit
+ ```
+
+ Each shard gets its own git worktree, file-lease enforcement prevents conflicts, and results merge automatically in dependency order. Critical shards run on two different models for redundant verification.
+
+ ### Remote Sessions — `tfx-remote-spawn`
+
+ Spawn Claude Code sessions on remote machines via SSH. Tailscale auto-discovery, host capability probing, session handoff, prompt injection into running sessions, and session re-attachment.
+
+ ```bash
+ /tfx-remote-spawn "run security review on ryzen5-7600"
+ /tfx-remote-spawn list # see active remote sessions
+ ```
+
+ ### Persistence Loop — `tfx-persist` (ralph)
+
+ "Don't stop until it's done." A 3-party verified execution loop that keeps going until the task passes consensus verification. Bounded at 10 iterations with state persistence for crash recovery.
+
+ ```bash
+ /tfx-persist "implement full auth flow with tests"
+ ```
+
+ ### 3-Party Consensus Reviews — `tfx-deep-review` / `tfx-deep-plan`
+
+ The bread-and-butter Deep skills. Three models independently review your code or plan your implementation, then cross-validate. Only consensus-verified findings survive.
+
+ ```bash
+ /tfx-deep-review # 3-party code review
+ /tfx-deep-plan "migrate to GraphQL" # 3-party planning
+ ```
+
+ ### Structured Debate — `tfx-debate`
+
+ Three models take independent positions on a technical question, debate, and converge on a recommendation. Anti-herding ensures genuine independence.
+
+ ```bash
+ /tfx-debate "Redis vs PostgreSQL LISTEN/NOTIFY for real-time events"
+ ```
+
+ ---
+
+ ## All 42 Skills
+
+ <details>
+ <summary>Expand full skill list</summary>
 
  ### Research & Discovery
 
@@ -249,8 +323,7 @@ Phase 3: Resolution (if consensus < 70%)
  | `tfx-consensus` | Core consensus engine (used by all Deep skills) |
  | `tfx-hub` | MCP message bus — Named Pipe & HTTP bridge |
  | `tfx-multi` | Multi-CLI team orchestration (2+ parallel tasks) |
- | `tfx-codex-swarm` | Parallel Codex sessions via worktree + psmux |
- | `tfx-swarm` | Unified swarm orchestration |
+ | `tfx-swarm` | Multi-machine x multi-model swarm (PRD → shard → worktree, local+remote) |
  | `tfx-codex` | Codex-only orchestrator |
  | `tfx-gemini` | Gemini-only orchestrator |
 
@@ -276,6 +349,8 @@ Phase 3: Resolution (if consensus < 70%)
  | `merge-worktree` | Worktree merge helper for swarm results |
  | `star-prompt` | GitHub star prompt for postinstall |
 
+ </details>
+
  ---
 
  ## Deep vs Light
@@ -341,85 +416,6 @@ graph TD
 
  ---
 
- ## Under the Hood
-
- ### Singleton MCP Hub with Dual-Protocol IPC
-
- triflux Hub runs as a **singleton daemon** per machine. A filesystem lock prevents duplicate instances.
-
- ```
- Local agents ──→ Named Pipe (NDJSON, sub-ms latency) ──→ Hub
- Remote/Dashboard ──→ HTTP/REST ──────────────────────→ Hub
- ```
-
- The bridge client tries Named Pipe first and falls back to HTTP automatically. Sessions auto-expire after 30 minutes, and the Hub self-terminates when idle. Run `tfx hub ensure` to guarantee the Hub is alive from any context.
-
- ### Reflexion Adaptive Learning
-
- Errors become knowledge automatically. The Reflexion Engine runs a closed-loop learning pipeline:
-
- ```
- safety-guard blocks command
- → error normalized (paths, timestamps, UUIDs stripped)
- → pattern stored in pending-penalties
- → promoted to adaptive rule (Bayesian confidence scoring)
- → injected into CLAUDE.md when confidence > threshold
-
- Three-tier memory:
- Tier 1 (Session) → cleared on session end
- Tier 2 (Project) → decays -0.2 confidence per 5 unobserved sessions
- Tier 3 (Permanent) → auto-injected into CLAUDE.md as machine-readable rules
- ```
-
- A blocked command in Session 1 becomes a proactive warning in Session 2 and eventually a permanent instruction. Your AI agent literally gets smarter over time.
-
- ### Pipeline Quality Gates
-
- Every Deep task runs through a **10-phase state machine** with quality gates:
-
- ```
- plan → PRD → confidence gate → execute → deslop → verify → selfcheck → complete
-
- fix (max 3) → retry
- ```
-
- - **Confidence Gate** (pre-execution): 5 weighted criteria must score >= 90% before execution starts
- - **Hallucination Detection** (post-execution): 7 regex patterns catch AI claims without evidence:
- - "tests pass" without test output
- - "performance improved" without benchmarks
- - "backward compatible" without verification
- - "no changes needed" when diff exists
- - **Bounded loops**: Fix attempts capped at 3, ralph iterations at 10. State persists in SQLite for crash recovery.
-
- ### 5-Tier Adaptive HUD
-
- The Claude Code status bar auto-adapts to any terminal width:
-
- ```
- full (120+ cols) ██████░░░░ claude 52% ██████░░░░ codex 48% savings: $2.40
- compact (80 cols) c:52% x:48% g:Free sv:$2.40 CTX:67%
- minimal (60 cols) c:52% x:48% sv:$2.40
- micro (<60 cols) c52 x48 sv$2
- nano (<40 cols) c:52%/x:48%
- ```
-
- Zero config. Open a vertical split pane and the HUD auto-collapses. Close it and it expands back. When `tfx-multi` is active, a live worker row appears showing per-CLI progress: `x✓ g⋯ c✗` (completed/running/failed).
-
- Context token attribution tracks usage by skill, file, and tool call, with warnings at 60%/80%/90% context fill.
-
- ### Windows Terminal Orchestration
-
- triflux doesn't just run in a terminal -- it **orchestrates** it. The WT Manager API provides:
-
- - **Tab creation** with PID-tracked lifecycle (temp file polling for readiness)
- - **Split-pane layouts** via `applySplitLayout()` for multi-agent dashboards
- - **Dead tab pruning** using cross-platform PID liveness detection
- - **Base64 PowerShell encoding** eliminating all quoting/escaping issues
-
- Every direct `wt.exe` call is blocked by safety-guard. Agents can only use the managed API path, preventing uncontrolled terminal sprawl.
-
- ---
-
  ## TUI Routing Monitor
 
  **New in v10.1** — `tfx monitor` launches an interactive terminal dashboard:
@@ -447,6 +443,59 @@ The monitor visualizes:
 
  ---
 
+ ## What's New
+
+ ### v10.1 — Reflexion Pipeline + TUI Monitor
+
+ | Feature | Description |
+ |---------|-------------|
+ | **TUI Routing Monitor** | `tfx monitor` — interactive terminal dashboard showing real-time skill routing, model selection, and success rates |
+ | **Reflexion Pipeline** | safety-guard events feed into a reflexion store, enabling adaptive learning from past routing decisions |
+ | **Adaptive Rules API v2** | Penalty promotion pipeline (`pending-penalties` → `adaptive_rules`), hit_count isolation, schema v2 with 18 tests |
+ | **Q-Learning Routing** | Experimental dynamic skill routing via Q-table weight optimization (`TRIFLUX_DYNAMIC_ROUTING=true`) |
+ | **Security Hardening** | headless-guard: wrapper bypass, pipe bypass, env escape vectors blocked. SSH bash-syntax forwarding prevention |
+ | **HUD System** | Codex plan-aware status display with correct bucket-to-slot mapping |
+
+ ### v10.0 — 4-Lake Roadmap
+
+ <details>
+ <summary>Expand v10.0 details</summary>
+
+ - **Lake 1: CLI Stability** — Retry, stall detection, version cache. Zero silent failures
+ - **Lake 2: Plugin Isolation** — cli-adapter-base, team-bridge, pack.mjs sync
+ - **Lake 3: Remote Infrastructure** — SSH keepalive/retry, hosts.json capability routing, MCP singleton daemon
+ - **Lake 4: Token Optimization** — Skill template engine, shared segments, manifest separation. 62% prompt token reduction
+ - **Lake 5: Agent Mesh** — Message routing, per-agent queues, heartbeat monitoring, Conductor integration
+
+ </details>
+
+ ### v9 — Harness-Native Intelligence
+
+ <details>
+ <summary>Expand v9 details</summary>
+
+ - **Natural Language Routing** — Say "review this" or "리뷰해줘" instead of memorizing skill names
+ - **Cross-Model Review** — Claude writes → Codex reviews. Same-model self-approve blocked
+ - **Context Isolation** — Off-topic requests auto-detected; spawns a clean psmux session
+ - **Codex Swarm Hardened** — PowerShell `.ps1` launchers, profile-based execution
+
+ </details>
+
+ ### v8 — Tri-Debate Foundation
+
+ <details>
+ <summary>Expand v8 details</summary>
+
+ - **Tri-Debate Engine** — 3-CLI independent analysis with anti-herding and consensus scoring
+ - **Deep/Light Variants** — Every domain has both a fast mode and a thorough mode
+ - **Expert Panel** — Virtual expert simulation via `tfx-panel`
+ - **Hub IPC** — Named Pipe & HTTP MCP bridge
+ - **psmux** — Windows Terminal native multiplexer
+
+ </details>
+
+ ---
+
  ## Security
 
  | Layer | Protection |
@@ -471,6 +520,37 @@ The monitor visualizes:
 
  ---
 
+ ## 5-Tier Adaptive HUD
+
+ The Claude Code status bar auto-adapts to any terminal width:
+
+ ```
+ full (120+ cols) ██████░░░░ claude 52% ██████░░░░ codex 48% savings: $2.40
+ compact (80 cols) c:52% x:48% g:Free sv:$2.40 CTX:67%
+ minimal (60 cols) c:52% x:48% sv:$2.40
+ micro (<60 cols) c52 x48 sv$2
+ nano (<40 cols) c:52%/x:48%
+ ```
+
+ Zero config. Open a vertical split pane and the HUD auto-collapses. Close it and it expands back. When `tfx-multi` is active, a live worker row appears showing per-CLI progress: `x✓ g⋯ c✗` (completed/running/failed).
+
+ Context token attribution tracks usage by skill, file, and tool call, with warnings at 60%/80%/90% context fill.
+
+ ---
+
+ ## Windows Terminal Orchestration
+
+ triflux doesn't just run in a terminal — it **orchestrates** it. The WT Manager API provides:
+
+ - **Tab creation** with PID-tracked lifecycle (temp file polling for readiness)
+ - **Split-pane layouts** via `applySplitLayout()` for multi-agent dashboards
+ - **Dead tab pruning** using cross-platform PID liveness detection
+ - **Base64 PowerShell encoding** eliminating all quoting/escaping issues
+
+ Every direct `wt.exe` call is blocked by safety-guard. Agents can only use the managed API path, preventing uncontrolled terminal sprawl.
+
+ ---
+
  ## Research Foundation
 
  The triflux skill suite was shaped by patterns from across the Claude Code ecosystem: