triflux 4.2.6 → 4.2.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (63)
  1. package/package.json +2 -1
  2. package/skills/tfx-workspace/evals/evals.json +0 -79
  3. package/skills/tfx-workspace/iteration-1/benchmark.json +0 -162
  4. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/eval_metadata.json +0 -11
  5. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/old_skill/grading.json +0 -9
  6. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/old_skill/outputs/analysis.md +0 -154
  7. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/old_skill/timing.json +0 -5
  8. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/with_skill/grading.json +0 -9
  9. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/with_skill/outputs/analysis.md +0 -126
  10. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/with_skill/timing.json +0 -5
  11. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/eval_metadata.json +0 -11
  12. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/old_skill/grading.json +0 -9
  13. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/old_skill/outputs/analysis.md +0 -119
  14. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/old_skill/timing.json +0 -5
  15. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/with_skill/grading.json +0 -9
  16. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/with_skill/outputs/analysis.md +0 -115
  17. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/with_skill/timing.json +0 -5
  18. package/skills/tfx-workspace/iteration-1/hub-start-sequence/eval_metadata.json +0 -10
  19. package/skills/tfx-workspace/iteration-1/hub-start-sequence/old_skill/grading.json +0 -8
  20. package/skills/tfx-workspace/iteration-1/hub-start-sequence/old_skill/outputs/analysis.md +0 -86
  21. package/skills/tfx-workspace/iteration-1/hub-start-sequence/old_skill/timing.json +0 -5
  22. package/skills/tfx-workspace/iteration-1/hub-start-sequence/with_skill/grading.json +0 -8
  23. package/skills/tfx-workspace/iteration-1/hub-start-sequence/with_skill/outputs/analysis.md +0 -81
  24. package/skills/tfx-workspace/iteration-1/hub-start-sequence/with_skill/timing.json +0 -5
  25. package/skills/tfx-workspace/iteration-1/multi-team-creation/eval_metadata.json +0 -12
  26. package/skills/tfx-workspace/iteration-1/multi-team-creation/old_skill/grading.json +0 -10
  27. package/skills/tfx-workspace/iteration-1/multi-team-creation/old_skill/outputs/analysis.md +0 -316
  28. package/skills/tfx-workspace/iteration-1/multi-team-creation/old_skill/timing.json +0 -5
  29. package/skills/tfx-workspace/iteration-1/multi-team-creation/with_skill/grading.json +0 -10
  30. package/skills/tfx-workspace/iteration-1/multi-team-creation/with_skill/outputs/analysis.md +0 -352
  31. package/skills/tfx-workspace/iteration-1/multi-team-creation/with_skill/timing.json +0 -5
  32. package/skills/tfx-workspace/iteration-1/review.html +0 -1325
  33. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/eval_metadata.json +0 -12
  34. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/old_skill/grading.json +0 -10
  35. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/old_skill/outputs/analysis.md +0 -97
  36. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/old_skill/timing.json +0 -5
  37. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/with_skill/grading.json +0 -10
  38. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/with_skill/outputs/analysis.md +0 -94
  39. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/with_skill/timing.json +0 -5
  40. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/eval_metadata.json +0 -12
  41. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/old_skill/grading.json +0 -10
  42. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/old_skill/outputs/analysis.md +0 -209
  43. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/old_skill/timing.json +0 -5
  44. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/with_skill/grading.json +0 -10
  45. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/with_skill/outputs/analysis.md +0 -193
  46. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/with_skill/timing.json +0 -5
  47. package/skills/tfx-workspace/iteration-2/benchmark.json +0 -62
  48. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/eval_metadata.json +0 -13
  49. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/old_skill/grading.json +0 -11
  50. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/old_skill/outputs/analysis.md +0 -382
  51. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/old_skill/timing.json +0 -5
  52. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/with_skill/grading.json +0 -11
  53. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/with_skill/outputs/analysis.md +0 -333
  54. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/with_skill/timing.json +0 -5
  55. package/skills/tfx-workspace/iteration-2/review.html +0 -1325
  56. package/skills/tfx-workspace/skill-snapshot/tfx-auto/SKILL.md +0 -217
  57. package/skills/tfx-workspace/skill-snapshot/tfx-auto-codex/SKILL.md +0 -77
  58. package/skills/tfx-workspace/skill-snapshot/tfx-codex/SKILL.md +0 -65
  59. package/skills/tfx-workspace/skill-snapshot/tfx-doctor/SKILL.md +0 -94
  60. package/skills/tfx-workspace/skill-snapshot/tfx-gemini/SKILL.md +0 -82
  61. package/skills/tfx-workspace/skill-snapshot/tfx-hub/SKILL.md +0 -133
  62. package/skills/tfx-workspace/skill-snapshot/tfx-multi/SKILL.md +0 -426
  63. package/skills/tfx-workspace/skill-snapshot/tfx-setup/SKILL.md +0 -101
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "triflux",
- "version": "4.2.6",
+ "version": "4.2.7",
  "description": "CLI-first multi-model orchestrator for Claude Code — route tasks to Codex, Gemini, and Claude",
  "type": "module",
  "bin": {
@@ -14,6 +14,7 @@
  "bin",
  "hub",
  "skills",
+ "!skills/tfx-workspace",
  "!**/failure-reports",
  "scripts",
  "hooks",
@@ -1,79 +0,0 @@
- {
- "skill_name": "tfx-skills-suite",
- "evals": [
- {
- "id": 1,
- "prompt": "You are a Claude Code agent. Read the tfx-auto skill definition, then explain how you would handle this user request: '/implement JWT 인증 미들웨어 추가해줘'. List the EXACT bash commands you would run. Do NOT actually execute them.",
- "expected_output": "Should route to executor agent via tfx-route.sh with 'implement' MCP profile. Command: bash ~/.claude/scripts/tfx-route.sh executor 'JWT 인증 미들웨어 추가해줘' implement",
- "files": [],
- "expectations": [
- "Routes to 'executor' agent (not architect, not analyst)",
- "Uses 'implement' MCP profile",
- "Generates correct tfx-route.sh command syntax",
- "Does NOT trigger triage (single command shortcut)",
- "Does NOT delegate to tfx-multi"
- ]
- },
- {
- "id": 2,
- "prompt": "You are a Claude Code agent. Read the tfx-auto skill definition, then explain how you would handle: '/tfx-auto 프론트엔드 리팩터링하고 보안 리뷰도 해줘'. List all routing decisions, triage steps, and delegation.",
- "expected_output": "Should enter auto triage mode, classify via Codex, decompose into 2+ subtasks, then delegate to tfx-multi Phase 3",
- "files": [],
- "expectations": [
- "Identifies this as auto mode (not command shortcut)",
- "Triggers Codex classification step",
- "Decomposes into at least 2 subtasks",
- "Notes delegation to tfx-multi for subtasks >= 2",
- "Does NOT try to execute all subtasks directly"
- ]
- },
- {
- "id": 3,
- "prompt": "You are a Claude Code agent. Read the tfx-multi skill definition, then explain step-by-step how you would handle: '/tfx-multi 인증 리팩터링 + UI 개선 + 보안 리뷰'. List all TeamCreate, TaskCreate, Agent calls with exact parameters.",
- "expected_output": "Should create team, 3 TaskCreates, 3 Agent spawns with slim wrapper structure following Phase 0-5",
- "files": [],
- "expectations": [
- "Creates exactly one TeamCreate with tfx- prefix naming",
- "Creates 3 TaskCreate calls (one per subtask)",
- "Spawns 3 Agent wrappers with mode: bypassPermissions",
- "Uses tfx-route.sh inside Agent wrapper (not direct codex/gemini)",
- "Includes Phase 5 cleanup (TeamDelete)"
- ]
- },
- {
- "id": 4,
- "prompt": "You are a Claude Code agent. Read the tfx-doctor skill definition, then explain how you would handle: 'HUD가 안 보이고 codex도 안 되는데 어떻게 해?'. List exact commands and reasoning.",
- "expected_output": "Should suggest running triflux doctor first, then triflux doctor --fix if issues found",
- "files": [],
- "expectations": [
- "Runs 'triflux doctor' as first diagnostic step",
- "Suggests '--fix' mode for auto-repair",
- "Mentions HUD and CLI path checks in explanation",
- "Does NOT jump straight to --reset (that's for cache only)"
- ]
- },
- {
- "id": 5,
- "prompt": "You are a Claude Code agent. Read the tfx-hub skill definition, then explain how you would handle: '/tfx-hub start'. List exact commands.",
- "expected_output": "Should run 'node hub/server.mjs' in background",
- "files": [],
- "expectations": [
- "Runs 'node hub/server.mjs' with run_in_background=true",
- "Mentions port 27888 and /mcp endpoint",
- "Does NOT try to run any triage or routing"
- ]
- },
- {
- "id": 6,
- "prompt": "You are a Claude Code agent. Read the tfx-codex skill definition, then explain the Gemini-to-Codex remapping. For '/tfx-codex API 문서를 작성하고 디자인 가이드도 만들어줘', list the routing showing how designer/writer get remapped.",
- "expected_output": "designer remapped to Codex(high), writer to Codex Spark(spark_fast), TFX_CLI_MODE=codex env var",
- "files": [],
- "expectations": [
- "designer remapped to Codex with effort: high",
- "writer remapped to Codex Spark with effort: spark_fast",
- "Sets TFX_CLI_MODE=codex environment variable",
- "Changes MCP profile: designer->implement, writer->analyze"
- ]
- }
- ]
- }
@@ -1,162 +0,0 @@
- {
- "metadata": {
- "skill_name": "tfx-skills-suite",
- "skill_path": "C:/Users/SSAFY/Desktop/Projects/cli/triflux/skills",
- "executor_model": "claude-sonnet-4-6",
- "analyzer_model": "claude-opus-4-6",
- "timestamp": "2026-03-19T10:00:00Z",
- "evals_run": [1, 2, 3, 4, 5, 6],
- "runs_per_configuration": 1
- },
- "runs": [
- {
- "eval_id": 1, "eval_name": "routing-implement-shortcut", "configuration": "with_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 43.6, "tokens": 16303, "tool_calls": 4, "errors": 0},
- "expectations": [
- {"text": "Routes to executor agent", "passed": true, "evidence": "Correctly mapped from implement shortcut table"},
- {"text": "Uses implement MCP profile", "passed": true, "evidence": "Mapped from shortcut table"},
- {"text": "Generates correct tfx-route.sh command", "passed": true, "evidence": "bash ~/.claude/scripts/tfx-route.sh executor '...' implement"},
- {"text": "Does NOT trigger triage", "passed": true, "evidence": "Command shortcut skips triage"},
- {"text": "Does NOT delegate to tfx-multi", "passed": true, "evidence": "No subtask decomposition occurred"}
- ]
- },
- {
- "eval_id": 1, "eval_name": "routing-implement-shortcut", "configuration": "without_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 48.1, "tokens": 16436, "tool_calls": 4, "errors": 0},
- "expectations": [
- {"text": "Routes to executor agent", "passed": true, "evidence": "Correctly mapped"},
- {"text": "Uses implement MCP profile", "passed": true, "evidence": "Assigned by shortcut table"},
- {"text": "Generates correct tfx-route.sh command", "passed": true, "evidence": "Correct syntax generated"},
- {"text": "Does NOT trigger triage", "passed": true, "evidence": "Shortcut mode skips triage"},
- {"text": "Does NOT delegate to tfx-multi", "passed": true, "evidence": "No delegation"}
- ]
- },
- {
- "eval_id": 2, "eval_name": "routing-multi-task-triage", "configuration": "with_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 58.2, "tokens": 17584, "tool_calls": 3, "errors": 0},
- "expectations": [
- {"text": "Identifies as auto mode", "passed": true, "evidence": "No shortcut match, auto mode selected"},
- {"text": "Triggers Codex classification", "passed": true, "evidence": "Codex --full-auto classification triggered"},
- {"text": "Decomposes into 2+ subtasks", "passed": true, "evidence": "2 subtasks: executor + security-reviewer"},
- {"text": "Notes tfx-multi delegation", "passed": true, "evidence": "subtasks.length >= 2 triggers tfx-multi Phase 3"},
- {"text": "Does NOT execute directly", "passed": true, "evidence": "Delegates to tfx-multi"}
- ]
- },
- {
- "eval_id": 2, "eval_name": "routing-multi-task-triage", "configuration": "without_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 77.2, "tokens": 18626, "tool_calls": 4, "errors": 0},
- "expectations": [
- {"text": "Identifies as auto mode", "passed": true, "evidence": "Auto mode selected"},
- {"text": "Triggers Codex classification", "passed": true, "evidence": "Codex --full-auto triggered"},
- {"text": "Decomposes into 2+ subtasks", "passed": true, "evidence": "2 subtasks decomposed"},
- {"text": "Notes tfx-multi delegation", "passed": true, "evidence": "Hands off to tfx-multi Phase 3"},
- {"text": "Does NOT execute directly", "passed": true, "evidence": "Delegates correctly"}
- ]
- },
- {
- "eval_id": 3, "eval_name": "multi-team-creation", "configuration": "with_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 115.3, "tokens": 27197, "tool_calls": 3, "errors": 0},
- "expectations": [
- {"text": "Creates TeamCreate with tfx- prefix", "passed": true, "evidence": "TeamCreate({ team_name: 'tfx-<base36>' })"},
- {"text": "Creates 3 TaskCreate calls", "passed": true, "evidence": "3x TaskCreate with metadata"},
- {"text": "Spawns 3 Agent wrappers with bypassPermissions", "passed": true, "evidence": "3x Agent({ mode: bypassPermissions })"},
- {"text": "Uses tfx-route.sh inside wrappers", "passed": true, "evidence": "Direct codex/gemini calls prohibited"},
- {"text": "Includes Phase 5 TeamDelete", "passed": true, "evidence": "TeamDelete always runs, max 30s wait"}
- ]
- },
- {
- "eval_id": 3, "eval_name": "multi-team-creation", "configuration": "without_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 100.6, "tokens": 26140, "tool_calls": 3, "errors": 0},
- "expectations": [
- {"text": "Creates TeamCreate with tfx- prefix", "passed": true, "evidence": "TeamCreate with tfx-<id>"},
- {"text": "Creates 3 TaskCreate calls", "passed": true, "evidence": "Three TaskCreate calls"},
- {"text": "Spawns 3 Agent wrappers with bypassPermissions", "passed": true, "evidence": "mode: bypassPermissions in all 3"},
- {"text": "Uses tfx-route.sh inside wrappers", "passed": true, "evidence": "Never direct codex/gemini calls"},
- {"text": "Includes Phase 5 TeamDelete", "passed": true, "evidence": "TeamDelete unconditionally"}
- ]
- },
- {
- "eval_id": 4, "eval_name": "doctor-diagnosis", "configuration": "with_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 4, "failed": 0, "total": 4, "time_seconds": 53.8, "tokens": 14499, "tool_calls": 4, "errors": 0},
- "expectations": [
- {"text": "Runs triflux doctor first", "passed": true, "evidence": "Bash(\"triflux doctor\")"},
- {"text": "Suggests --fix mode", "passed": true, "evidence": "Suggests after diagnosis report"},
- {"text": "Mentions HUD and CLI checks", "passed": true, "evidence": "HUD and CLI paths checked"},
- {"text": "Does NOT jump to --reset", "passed": true, "evidence": "--reset reserved for explicit request"}
- ]
- },
- {
- "eval_id": 4, "eval_name": "doctor-diagnosis", "configuration": "without_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 4, "failed": 0, "total": 4, "time_seconds": 48.3, "tokens": 14482, "tool_calls": 3, "errors": 0},
- "expectations": [
- {"text": "Runs triflux doctor first", "passed": true, "evidence": "Bash(\"triflux doctor\")"},
- {"text": "Suggests --fix mode", "passed": true, "evidence": "Offers --fix after diagnosis"},
- {"text": "Mentions HUD and CLI checks", "passed": true, "evidence": "All 8 diagnostics listed"},
- {"text": "Does NOT jump to --reset", "passed": true, "evidence": "--reset reserved for explicit request"}
- ]
- },
- {
- "eval_id": 5, "eval_name": "hub-start-sequence", "configuration": "with_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 3, "failed": 0, "total": 3, "time_seconds": 47.2, "tokens": 14821, "tool_calls": 4, "errors": 0},
- "expectations": [
- {"text": "Runs node hub/server.mjs in background", "passed": true, "evidence": "Bash(\"node hub/server.mjs\", run_in_background=true)"},
- {"text": "Mentions port 27888 and /mcp", "passed": true, "evidence": "Port 27888, http://127.0.0.1:27888/mcp"},
- {"text": "No triage or routing attempted", "passed": true, "evidence": "Command match, not fallthrough"}
- ]
- },
- {
- "eval_id": 5, "eval_name": "hub-start-sequence", "configuration": "without_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 3, "failed": 0, "total": 3, "time_seconds": 51.8, "tokens": 14904, "tool_calls": 4, "errors": 0},
- "expectations": [
- {"text": "Runs node hub/server.mjs in background", "passed": true, "evidence": "Bash(\"node hub/server.mjs\", run_in_background=true)"},
- {"text": "Mentions port 27888 and /mcp", "passed": true, "evidence": "Port 27888, endpoint /mcp"},
- {"text": "No triage or routing attempted", "passed": true, "evidence": "Command match, not fallthrough"}
- ]
- },
- {
- "eval_id": 6, "eval_name": "codex-gemini-remap", "configuration": "with_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 4, "failed": 0, "total": 4, "time_seconds": 69.7, "tokens": 14889, "tool_calls": 5, "errors": 0},
- "expectations": [
- {"text": "designer remapped to Codex (effort: high)", "passed": true, "evidence": "designer → Codex (effort: high)"},
- {"text": "writer remapped to Codex Spark (spark_fast)", "passed": true, "evidence": "writer → Codex Spark (effort: spark_fast)"},
- {"text": "TFX_CLI_MODE=codex set", "passed": true, "evidence": "Set for every Phase 3 call"},
- {"text": "MCP profiles changed", "passed": true, "evidence": "designer→implement, writer→analyze"}
- ]
- },
- {
- "eval_id": 6, "eval_name": "codex-gemini-remap", "configuration": "without_skill", "run_number": 1,
- "result": {"pass_rate": 1.0, "passed": 4, "failed": 0, "total": 4, "time_seconds": 85.2, "tokens": 19802, "tool_calls": 7, "errors": 0},
- "expectations": [
- {"text": "designer remapped to Codex (effort: high)", "passed": true, "evidence": "designer → Codex (effort: high)"},
- {"text": "writer remapped to Codex Spark (spark_fast)", "passed": true, "evidence": "writer → Codex Spark (effort: spark_fast)"},
- {"text": "TFX_CLI_MODE=codex set", "passed": true, "evidence": "TFX_CLI_MODE set to codex"},
- {"text": "MCP profiles changed", "passed": true, "evidence": "writer→analyze, designer→implement"}
- ]
- }
- ],
- "run_summary": {
- "with_skill": {
- "pass_rate": {"mean": 1.0, "stddev": 0.0, "min": 1.0, "max": 1.0},
- "time_seconds": {"mean": 64.6, "stddev": 26.4, "min": 43.6, "max": 115.3},
- "tokens": {"mean": 17549, "stddev": 4857, "min": 14499, "max": 27197}
- },
- "without_skill": {
- "pass_rate": {"mean": 1.0, "stddev": 0.0, "min": 1.0, "max": 1.0},
- "time_seconds": {"mean": 68.5, "stddev": 20.4, "min": 48.1, "max": 100.6},
- "tokens": {"mean": 18398, "stddev": 4227, "min": 14482, "max": 26140}
- },
- "delta": {
- "pass_rate": "+0.00",
- "time_seconds": "-3.9",
- "tokens": "-849"
- }
- },
- "notes": [
- "All 26 assertions pass at 100% for both configurations — the skills are functionally correct",
- "The fixes applied (dead reference removal, Phase numbering consistency, hub description) don't change routing logic, so pass rates are identical",
- "NEW version is marginally faster (-3.9s avg) and uses fewer tokens (-849 avg), likely due to cleaner references reducing model confusion",
- "tfx-multi is the most complex skill (115s / 27K tokens with_skill) — consider extracting reference docs to reduce context load",
- "tfx-codex OLD references 'Phase(1~6)' which doesn't exist in tfx-auto — the NEW version correctly references the actual workflow names",
- "All assertions pass regardless of configuration — these test the core routing logic which is unchanged. Consider adding assertions that specifically test the fixed issues (dead refs, phase naming) for differentiation"
- ]
- }
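The `run_summary` means and `delta` follow from the twelve per-run rows above. The bash/awk sketch below recomputes them. The `<configuration> <time_seconds> <tokens>` input format is chosen for this example only; the benchmark tooling does not define it.

```bash
# Recompute run_summary means and the with_skill minus without_skill delta.
# Input (an assumption of this sketch): one "<configuration> <time_seconds> <tokens>" line per run.
summarize() {
  awk '
    { n[$1]++; t[$1] += $2; k[$1] += $3 }
    END {
      time_with     = t["with_skill"] / n["with_skill"]
      tokens_with   = k["with_skill"] / n["with_skill"]
      time_without  = t["without_skill"] / n["without_skill"]
      tokens_without = k["without_skill"] / n["without_skill"]
      printf "with_skill mean_time=%.1f mean_tokens=%.0f\n", time_with, tokens_with
      printf "without_skill mean_time=%.1f mean_tokens=%.0f\n", time_without, tokens_without
      printf "delta time=%+.1f tokens=%+.1f\n", time_with - time_without, tokens_with - tokens_without
    }'
}

# Per-run values copied from the "runs" array above (evals 1 to 6, in order).
runs='with_skill 43.6 16303
with_skill 58.2 17584
with_skill 115.3 27197
with_skill 53.8 14499
with_skill 47.2 14821
with_skill 69.7 14889
without_skill 48.1 16436
without_skill 77.2 18626
without_skill 100.6 26140
without_skill 48.3 14482
without_skill 51.8 14904
without_skill 85.2 19802'

printf '%s\n' "$runs" | summarize
# with_skill mean_time=64.6 mean_tokens=17549
# without_skill mean_time=68.5 mean_tokens=18398
# delta time=-3.9 tokens=-849.5
```

Here the token delta prints as -849.5 because it subtracts the unrounded means. The file's `-849` equals the difference of the rounded means (17549 - 18398). The sketch omits `stddev` because the file does not say whether it uses the sample or the population formula.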
@@ -1,11 +0,0 @@
- {
- "eval_id": 6,
- "eval_name": "codex-gemini-remap",
- "prompt": "/tfx-codex API 문서를 작성하고 디자인 가이드도 만들어줘",
- "assertions": [
- "designer remapped to Codex with effort: high",
- "writer remapped to Codex Spark with effort: spark_fast",
- "Sets TFX_CLI_MODE=codex environment variable",
- "Changes MCP profile: designer->implement, writer->analyze"
- ]
- }
@@ -1,9 +0,0 @@
- {
- "expectations": [
- {"text": "designer remapped to Codex with effort: high", "passed": true, "evidence": "Agent output: designer → Codex (effort: high)"},
- {"text": "writer remapped to Codex Spark with effort: spark_fast", "passed": true, "evidence": "Agent output: writer → Codex Spark (effort: spark_fast)"},
- {"text": "Sets TFX_CLI_MODE=codex environment variable", "passed": true, "evidence": "Agent output: 'TFX_CLI_MODE: Set to codex'"},
- {"text": "Changes MCP profile: designer->implement, writer->analyze", "passed": true, "evidence": "Agent output: writer→analyze, designer→implement"}
- ],
- "summary": {"passed": 4, "failed": 0, "total": 4, "pass_rate": 1.0}
- }
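Each `grading.json` pairs per-expectation verdicts with a `summary` whose `pass_rate` is `passed / total`. The bash helper below rebuilds that summary object from a list of `true`/`false` verdicts. `grade_summary` is hypothetical and not part of triflux; it uses awk for the floating-point division.

```bash
# Hypothetical helper: build a grading.json-style summary from per-expectation verdicts.
# Each argument is "true" (passed); anything else counts as failed.
grade_summary() {
  local passed=0 failed=0 verdict total
  for verdict in "$@"; do
    if [ "$verdict" = "true" ]; then
      passed=$((passed + 1))
    else
      failed=$((failed + 1))
    fi
  done
  total=$((passed + failed))
  # awk does the division; an empty verdict list reports pass_rate 0.0.
  awk -v p="$passed" -v f="$failed" -v t="$total" 'BEGIN {
    rate = (t > 0) ? p / t : 0
    printf "{\"passed\": %d, \"failed\": %d, \"total\": %d, \"pass_rate\": %.1f}\n", p, f, t, rate
  }'
}

grade_summary true true true true
# {"passed": 4, "failed": 0, "total": 4, "pass_rate": 1.0}
```

One decimal place matches the files' `1.0` format. It would round a 2-of-3 result to 0.7, so a finer-grained report would need a wider format.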
@@ -1,154 +0,0 @@
- # tfx-codex Routing Analysis (DRY RUN)
-
- **Request**: `/tfx-codex API 문서를 작성하고 디자인 가이드도 만들어줘`
- **SKILL analyzed**: `skills/tfx-workspace/skill-snapshot/tfx-codex/SKILL.md`
-
- ---
-
- ## 1. Agent Remapping Table
-
- This request decomposes into two independent subtasks:
- - Subtask A: "API 문서를 작성" (write the API docs) → **writer** role
- - Subtask B: "디자인 가이드도 만들어줘" (also create a design guide) → **designer** role
-
- | Agent | Original CLI (tfx-auto) | In tfx-codex | effort parameter | MCP profile |
- |----------|---------------------|---------------|-----------------|-----------|
- | **writer** | ~~Gemini~~ (`docs` MCP) | **Codex** (effort: spark_fast): Codex Spark, lightweight docs | `spark_fast` | `analyze` |
- | **designer** | ~~Gemini~~ (`docs` MCP) | **Codex** (effort: high): UI code generation | `high` | `implement` |
-
- ### Per the original tfx-auto (before remapping)
-
- From the agent mapping table in `tfx-auto` SKILL.md:
-
- ```
- | gemini / designer / writer | Gemini | docs |
- ```
-
- In other words, both roles originally run on the Gemini CLI with the `docs` MCP profile.
-
- ### Per tfx-codex (after remapping)
-
- From the agent routing table in `tfx-codex` SKILL.md:
-
- ```
- | designer | ~~Gemini~~ | Codex (effort: high) — UI 코드 생성 | implement |
- | writer | ~~Gemini~~ | Codex Spark (effort: spark_fast) — 경량 문서 | analyze |
- ```
-
- ---
-
- ## 2. TFX_CLI_MODE Environment Variable
-
- ```
- TFX_CLI_MODE=codex
- ```
-
- This variable is passed to tfx-route.sh and tells it to force-replace a selected Gemini agent with Codex. Even if the Codex classifier returns `gemini` during Phase 2 triage, this value replaces it with `codex`.
-
- ---
-
- ## 3. Phase 2 Triage Behavior
-
- **Auto mode** (`/tfx-codex "API 문서를 작성하고 디자인 가이드도 만들어줘"`):
-
- 1. **Codex classification** (`--full-auto --skip-git-repo-check`):
- - Expected JSON from parsing the input:
- ```json
- {
- "parts": [
- { "description": "API 문서 작성", "agent": "gemini" },
- { "description": "디자인 가이드 생성", "agent": "gemini" }
- ]
- }
- ```
- - With `TFX_CLI_MODE=codex` applied, both entries are **force-replaced** from `"gemini"` to `"codex"`
-
- 2. **Opus inline decomposition** (after the forced conversion):
- - `writer` role: assigned MCP profile `analyze`
- - `designer` role: assigned MCP profile `implement`
- - The two subtasks are independent: `graph_type: "INDEPENDENT"`
-
- 3. **Subtask count = 2** → automatically switches to tfx-multi Native Teams mode (tfx-auto rule: 2 or more subtasks go to tfx-multi Phase 3)
-
- ---
-
- ## 4. Generated Bash Commands (per Subtask)
-
- With two subtasks, execution runs tfx-multi Phase 3a (TeamCreate) → Phase 3b (TaskCreate) → Phase 3c (Agent wrapper spawn) in that order. Inside each Agent wrapper, the following Bash command runs:
-
- ### Subtask A: writer (write the API docs)
-
- ```bash
- TFX_CLI_MODE=codex bash ~/.claude/scripts/tfx-route.sh writer 'API 문서를 작성해줘' analyze
- ```
-
- - `writer` agent: runs on Codex Spark (`effort: spark_fast`)
- - MCP profile: `analyze` (document-based research and writing)
- - `run_in_background=true` (INDEPENDENT, runs in parallel)
-
- ### Subtask B: designer (create the design guide)
-
- ```bash
- TFX_CLI_MODE=codex bash ~/.claude/scripts/tfx-route.sh designer '디자인 가이드를 만들어줘' implement
- ```
-
- - `designer` agent: runs on Codex (`effort: high`)
- - MCP profile: `implement` (code-based UI work)
- - `run_in_background=true` (INDEPENDENT, runs in parallel)
-
- > Neither subtask has `depends_on`, so both run in parallel at Level 0.
-
- ---
-
- ## 5. MCP Profile Changes in Detail
-
- | Agent | Original tfx-auto MCP | tfx-codex MCP after the change | Reason |
- |----------|-------------------|----------------------|----------|
- | **writer** | `docs` | `analyze` | When switching from Gemini to Codex, use the `analyze` profile, which suits document research and writing |
- | **designer** | `docs` | `implement` | When switching from Gemini to Codex, use the `implement` profile, which suits UI code generation |
-
- The original `docs` MCP profile was designed around the Gemini CLI's web search and document access. When remapping to Codex, each role gets a profile that matches the actual nature of its work.
-
- ---
-
- ## 6. Workflow Reference
-
- **tfx-codex follows all of Phases 1-6 in tfx-auto SKILL.md unchanged.**
-
- ```
- Phase 1: Input parsing (recognize the `/tfx-codex` trigger, extract arguments)
- Phase 2: Triage
- - Run Codex classification (TFX_CLI_MODE=codex)
- - Force-replace gemini results with codex
- - Opus inline decomposition (writer→analyze MCP, designer→implement MCP)
- Phase 3: CLI execution
- - Include the TFX_CLI_MODE=codex environment variable
- - Call tfx-route.sh
- - 2 subtasks → hand off to tfx-multi Phase 3
- Phase 4: Result collection
- - exit_code 0: parse the === OUTPUT === section
- - exit_code 124: use === PARTIAL OUTPUT ===
- - Otherwise: STDERR → Claude fallback
- Phase 5: Failure handling
- - 1st: Claude executor (sonnet) fallback
- - 2nd: report the failure and synthesize only the successful results
- Phase 6: Report output
- - Mode/graph/level/subtask status table
- - Token Savings Report
- ```
-
- **Key differences**: behavior changes only in Phase 2 and Phase 3.
- - Phase 2: force-convert gemini classification results to codex and reassign MCP profiles
- - Phase 3: prefix every tfx-route.sh call with `TFX_CLI_MODE=codex`
-
- ---
-
- ## 7. Summary
-
- This request (`/tfx-codex API 문서를 작성하고 디자인 가이드도 만들어줘`) is handled as follows:
-
- 1. Decompose into two subtasks (INDEPENDENT graph)
- 2. Remap both `writer` and `designer`, which would otherwise route to Gemini, to Codex
- 3. MCP profiles: writer → `analyze`, designer → `implement` (changed from the original `docs`)
- 4. With two subtasks, switch automatically to tfx-multi Native Teams mode and run them in parallel
- 5. Phases 1-6 as a whole follow the tfx-auto workflow unchanged
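Phase 4 of the workflow outline above maps the `tfx-route.sh` exit code to a result source:

- 0 reads the `=== OUTPUT ===` section.
- 124 falls back to `=== PARTIAL OUTPUT ===`.
- Anything else goes to the Claude fallback.

The following is a minimal bash sketch of that dispatch. `collect_result` and `extract_section` are hypothetical names. The sketch also assumes that each section runs until the next `=== ... ===` marker line; the skill text does not say this.

```bash
# Hypothetical sketch of the Phase 4 result-collection rule.

# Print the lines after a marker line, stopping at the next "=== ... ===" marker.
# Assumption: tfx-route.sh output is delimited this way.
extract_section() {
  awk -v marker="$1" '
    $0 == marker               { inside = 1; next }
    inside && /^=== .* ===$/   { exit }
    inside                     { print }
  '
}

# $1 = exit code from tfx-route.sh, $2 = its captured stdout.
collect_result() {
  local exit_code="$1" output="$2"
  case "$exit_code" in
    0)   printf '%s\n' "$output" | extract_section '=== OUTPUT ===' ;;
    124) printf '%s\n' "$output" | extract_section '=== PARTIAL OUTPUT ===' ;;
    *)   echo "fallback: hand the subtask to the Claude executor" >&2
         return 1 ;;
  esac
}

collect_result 0 "$(printf '%s\n' '=== OUTPUT ===' 'API docs drafted' '=== END ===')"
# API docs drafted
```

GNU `timeout` also uses status 124 when it kills a command. That fits the "partial output" reading, although the outline does not say that `tfx-route.sh` uses `timeout`.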
@@ -1,5 +0,0 @@
- {
- "total_tokens": 19802,
- "duration_ms": 85239,
- "total_duration_seconds": 85.2
- }
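This `timing.json` records the same wall-clock duration twice: as `duration_ms`, and as `total_duration_seconds` rounded to one decimal. Its `total_tokens` (19802) and duration (85.2 s) match the eval 6 `without_skill` row in `benchmark.json`. That suggests the `old_skill` runs are the ones labeled `without_skill` there. The one-line conversion below reproduces the rounded value; its helper name, `ms_to_seconds`, is hypothetical.

```bash
# Hypothetical helper: milliseconds to seconds with one decimal, as in total_duration_seconds.
ms_to_seconds() {
  awk -v ms="$1" 'BEGIN { printf "%.1f\n", ms / 1000 }'
}

ms_to_seconds 85239   # 85.2 (this file)
ms_to_seconds 69725   # 69.7 (the with_skill timing.json for the same eval)
```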
@@ -1,9 +0,0 @@
- {
- "expectations": [
- {"text": "designer remapped to Codex with effort: high", "passed": true, "evidence": "Agent output: designer → Codex (effort: high)"},
- {"text": "writer remapped to Codex Spark with effort: spark_fast", "passed": true, "evidence": "Agent output: writer → Codex Spark (effort: spark_fast)"},
- {"text": "Sets TFX_CLI_MODE=codex environment variable", "passed": true, "evidence": "Agent output: 'TFX_CLI_MODE: Set to codex for every Phase 3 call'"},
- {"text": "Changes MCP profile: designer->implement, writer->analyze", "passed": true, "evidence": "Agent output: writer→analyze, designer→implement (changed from docs)"}
- ],
- "summary": {"passed": 4, "failed": 0, "total": 4, "pass_rate": 1.0}
- }
@@ -1,126 +0,0 @@
- # tfx-codex Routing Analysis — DRY RUN
-
- **User Request:** `/tfx-codex API 문서를 작성하고 디자인 가이드도 만들어줘`
-
- **Skill Definition Source:** `skills/tfx-codex/SKILL.md`
-
- ---
-
- ## 1. Trigger Matching
-
- The command prefix `/tfx-codex` matches the skill trigger `tfx-codex` exactly.
- The skill is invoked with the argument: `"API 문서를 작성하고 디자인 가이드도 만들어줘"`
-
- ---
-
- ## 2. TFX_CLI_MODE Environment Variable
-
- ```
- TFX_CLI_MODE=codex
- ```
-
- This variable is set for every CLI execution in Phase 3. It forces `tfx-route.sh` to substitute
- any `gemini` classification with `codex`, ensuring Gemini CLI is never called.
-
- ---
-
- ## 3. Task Decomposition (Phase 2 Triage)
-
- The user request contains two distinct subtasks:
-
- | # | Subtask | Natural Agent Assignment | tfx-codex Override |
- |---|---------|-------------------------|--------------------|
- | 1 | API 문서를 작성 (Write API documentation) | **writer** (originally Gemini) | **Codex Spark** |
- | 2 | 디자인 가이드도 만들어줘 (Create design guide) | **designer** (originally Gemini) | **Codex** (effort: high) |
-
- During Phase 2, the Opus decomposition step detects that both subtasks would ordinarily route to
- Gemini-backed roles. The `TFX_CLI_MODE=codex` override forces:
- - Any `gemini` classification result → replaced with `codex`
- - `designer` and `writer` agent types → mapped to Codex with adjusted MCP profiles
-
- ---
-
- ## 4. Agent Remapping Table
-
- | Agent | Original CLI | tfx-codex mapping | effort flag |
- |----------|---------|---------------|--------------|
- | **designer** | ~~Gemini~~ | **Codex** | `effort: high`: UI/visual code generation |
- | **writer** | ~~Gemini~~ | **Codex Spark** | `effort: spark_fast`: lightweight documentation |
- | executor, build-fixer, debugger | Codex | Codex | unchanged |
- | architect, planner, critic, analyst | Codex | Codex | unchanged |
- | code-reviewer, security-reviewer | Codex | Codex | unchanged |
- | scientist, document-specialist | Codex | Codex | unchanged |
- | explore | Claude Haiku | Claude Haiku | unchanged |
- | verifier, test-engineer | Claude Sonnet | Claude Sonnet | unchanged |
-
- ---
-
- ## 5. MCP Profile Changes for designer and writer
-
- | Agent | Default MCP profile | tfx-codex MCP profile | Reason |
- |----------|--------------|---------------------|------|
- | **designer** | (Gemini-only: none) | `implement` | Handled as code-based UI work |
- | **writer** | (Gemini-only: none) | `analyze` | Document-based research and writing workflow |
-
- Both roles lose access to Gemini's multimodal/creative profile and are instead assigned
- Codex-compatible MCP profiles that match the nature of the work:
- - `implement` for designer: treats the design guide as a code artifact (e.g., CSS, component specs)
- - `analyze` for writer: treats API documentation as a research-and-summarize task
-
- ---
-
- ## 6. Exact Bash Commands Generated (Phase 3)
-
- ### Subtask 1: writer (write the API documentation)
-
- ```bash
- TFX_CLI_MODE=codex bash ~/.claude/scripts/tfx-route.sh writer 'API 문서를 작성해줘' analyze
- ```
-
- - Agent: `writer` → remapped to **Codex Spark** (`effort: spark_fast`)
- - MCP Profile: `analyze`
- - The `tfx-route.sh` script reads `TFX_CLI_MODE=codex` and substitutes the Gemini path with
- a Codex Spark invocation.
-
- ### Subtask 2: designer (write the design guide)
-
- ```bash
- TFX_CLI_MODE=codex bash ~/.claude/scripts/tfx-route.sh designer '디자인 가이드를 만들어줘' implement
- ```
-
- - Agent: `designer` → remapped to **Codex** (`effort: high`)
- - MCP Profile: `implement`
- - The `tfx-route.sh` script reads `TFX_CLI_MODE=codex` and substitutes the Gemini path with
- a full-effort Codex invocation.
-
- ---
-
- ## 7. Workflow Reference — tfx-auto Phases Followed
-
- Per the skill definition: *"Follows the entire tfx-auto SKILL.md workflow (command shortcut → triage → execution → result parsing → report) as-is."*
-
- The exact same phase sequence as `tfx-auto` is executed:
-
- | Phase | Name | tfx-codex specifics |
- |-------|------|--------------------|
- | Phase 1 | Command shortcut parsing | Same (supports the `N:codex` shortcut) |
- | Phase 2 | Triage (Opus decomposition) | Force-converts `gemini` classification results to `codex`; designer/writer → Codex + MCP reassignment |
- | Phase 3 | CLI execution | Calls `tfx-route.sh` with the `TFX_CLI_MODE=codex` environment variable injected |
- | Phase 4 | Result parsing | Same |
- | Phase 5 | Report | Same |
-
- The only deviation from `tfx-auto` occurs in **Phase 2** (forced gemini→codex substitution)
- and **Phase 3** (environment variable injection). All other phases are identical.
-
- ---
-
- ## 8. Summary
-
- For the request `/tfx-codex API 문서를 작성하고 디자인 가이드도 만들어줘`:
-
- - Two subtasks are identified: **writer** (API docs) and **designer** (design guide).
- - Both roles were originally mapped to **Gemini CLI** in the default `tfx-auto` routing.
- - `tfx-codex` remaps them: `writer` → Codex Spark (`analyze` MCP), `designer` → Codex high-effort (`implement` MCP).
- - `TFX_CLI_MODE=codex` is injected at Phase 3 for every `tfx-route.sh` call.
- - The full `tfx-auto` 5-phase workflow is followed with the two overrides noted above.
- - Gemini CLI is never invoked; no Gemini dependency exists.
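The remapping and MCP-profile tables in sections 4 and 5 above describe a lookup keyed on the agent name and `TFX_CLI_MODE`. The bash sketch below encodes only the `designer` and `writer` rows from those tables. This diff does not show how `tfx-route.sh` actually performs the substitution, so `remap_agent` and its `<cli> <effort> <mcp_profile>` output format are hypothetical.

```bash
# Hypothetical sketch of the designer/writer remap described in the tables above.
# Prints "<cli> <effort> <mcp_profile>" for the two Gemini-backed roles,
# or "unchanged" for any agent whose routing tfx-codex leaves alone.
remap_agent() {
  local agent="$1"
  case "$agent" in
    designer | writer) ;;
    *) echo "unchanged"; return 0 ;;
  esac

  if [ "${TFX_CLI_MODE:-}" != "codex" ]; then
    # Default tfx-auto routing: both roles use Gemini with the docs profile (no effort listed).
    echo "gemini - docs"
  elif [ "$agent" = "designer" ]; then
    echo "codex high implement"       # Codex, effort: high, implement MCP
  else
    echo "codex spark_fast analyze"   # Codex Spark, effort: spark_fast, analyze MCP
  fi
}

TFX_CLI_MODE=codex remap_agent writer   # codex spark_fast analyze
TFX_CLI_MODE=codex remap_agent explore  # unchanged
```

Only the prefix assignment on each call changes the result. This mirrors the per-call `TFX_CLI_MODE=codex bash ~/.claude/scripts/tfx-route.sh ...` form in section 6, rather than relying on an exported variable.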
@@ -1,5 +0,0 @@
- {
- "total_tokens": 14889,
- "duration_ms": 69725,
- "total_duration_seconds": 69.7
- }
@@ -1,11 +0,0 @@
- {
- "eval_id": 4,
- "eval_name": "doctor-diagnosis",
- "prompt": "HUD가 안 보이고 codex도 안 되는데 어떻게 해?",
- "assertions": [
- "Runs 'triflux doctor' as first diagnostic step",
- "Suggests '--fix' mode for auto-repair",
- "Mentions HUD and CLI path checks in explanation",
- "Does NOT jump straight to --reset (that's for cache only)"
- ]
- }
@@ -1,9 +0,0 @@
- {
- "expectations": [
- {"text": "Runs 'triflux doctor' as first diagnostic step", "passed": true, "evidence": "Agent output: 'Exact command: Bash(\"triflux doctor\")'"},
- {"text": "Suggests '--fix' mode for auto-repair", "passed": true, "evidence": "Agent output: 'after the diagnostic report, offer /tfx-doctor --fix as the next step'"},
- {"text": "Mentions HUD and CLI path checks in explanation", "passed": true, "evidence": "Agent output: 'HUD installation and config, Codex/Gemini/Claude CLI paths — directly relevant'"},
- {"text": "Does NOT jump straight to --reset (that's for cache only)", "passed": true, "evidence": "Agent output: '--reset is destructive and reserved for explicit cache-clear request'"}
- ],
- "summary": {"passed": 4, "failed": 0, "total": 4, "pass_rate": 1.0}
- }