triflux 4.2.5 → 4.2.6

Files changed (74)
  1. package/bin/tfx-doctor.mjs +1 -1
  2. package/bin/tfx-setup.mjs +1 -1
  3. package/bin/triflux.mjs +1 -1
  4. package/package.json +1 -1
  5. package/scripts/setup.mjs +21 -16
  6. package/skills/tfx-auto/SKILL.md +1 -1
  7. package/skills/tfx-codex/SKILL.md +1 -1
  8. package/skills/tfx-gemini/SKILL.md +1 -1
  9. package/skills/tfx-hub/SKILL.md +4 -1
  10. package/skills/tfx-multi/SKILL.md +177 -409
  11. package/skills/tfx-multi/references/agent-wrapper-rules.md +81 -0
  12. package/skills/tfx-multi/references/thorough-pipeline.md +66 -0
  13. package/skills/tfx-workspace/evals/evals.json +79 -0
  14. package/skills/tfx-workspace/iteration-1/benchmark.json +162 -0
  15. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/eval_metadata.json +11 -0
  16. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/old_skill/grading.json +9 -0
  17. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/old_skill/outputs/analysis.md +154 -0
  18. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/old_skill/timing.json +5 -0
  19. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/with_skill/grading.json +9 -0
  20. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/with_skill/outputs/analysis.md +126 -0
  21. package/skills/tfx-workspace/iteration-1/codex-gemini-remap/with_skill/timing.json +5 -0
  22. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/eval_metadata.json +11 -0
  23. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/old_skill/grading.json +9 -0
  24. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/old_skill/outputs/analysis.md +119 -0
  25. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/old_skill/timing.json +5 -0
  26. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/with_skill/grading.json +9 -0
  27. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/with_skill/outputs/analysis.md +115 -0
  28. package/skills/tfx-workspace/iteration-1/doctor-diagnosis/with_skill/timing.json +5 -0
  29. package/skills/tfx-workspace/iteration-1/hub-start-sequence/eval_metadata.json +10 -0
  30. package/skills/tfx-workspace/iteration-1/hub-start-sequence/old_skill/grading.json +8 -0
  31. package/skills/tfx-workspace/iteration-1/hub-start-sequence/old_skill/outputs/analysis.md +86 -0
  32. package/skills/tfx-workspace/iteration-1/hub-start-sequence/old_skill/timing.json +5 -0
  33. package/skills/tfx-workspace/iteration-1/hub-start-sequence/with_skill/grading.json +8 -0
  34. package/skills/tfx-workspace/iteration-1/hub-start-sequence/with_skill/outputs/analysis.md +81 -0
  35. package/skills/tfx-workspace/iteration-1/hub-start-sequence/with_skill/timing.json +5 -0
  36. package/skills/tfx-workspace/iteration-1/multi-team-creation/eval_metadata.json +12 -0
  37. package/skills/tfx-workspace/iteration-1/multi-team-creation/old_skill/grading.json +10 -0
  38. package/skills/tfx-workspace/iteration-1/multi-team-creation/old_skill/outputs/analysis.md +316 -0
  39. package/skills/tfx-workspace/iteration-1/multi-team-creation/old_skill/timing.json +5 -0
  40. package/skills/tfx-workspace/iteration-1/multi-team-creation/with_skill/grading.json +10 -0
  41. package/skills/tfx-workspace/iteration-1/multi-team-creation/with_skill/outputs/analysis.md +352 -0
  42. package/skills/tfx-workspace/iteration-1/multi-team-creation/with_skill/timing.json +5 -0
  43. package/skills/tfx-workspace/iteration-1/review.html +1325 -0
  44. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/eval_metadata.json +12 -0
  45. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/old_skill/grading.json +10 -0
  46. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/old_skill/outputs/analysis.md +97 -0
  47. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/old_skill/timing.json +5 -0
  48. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/with_skill/grading.json +10 -0
  49. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/with_skill/outputs/analysis.md +94 -0
  50. package/skills/tfx-workspace/iteration-1/routing-implement-shortcut/with_skill/timing.json +5 -0
  51. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/eval_metadata.json +12 -0
  52. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/old_skill/grading.json +10 -0
  53. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/old_skill/outputs/analysis.md +209 -0
  54. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/old_skill/timing.json +5 -0
  55. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/with_skill/grading.json +10 -0
  56. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/with_skill/outputs/analysis.md +193 -0
  57. package/skills/tfx-workspace/iteration-1/routing-multi-task-triage/with_skill/timing.json +5 -0
  58. package/skills/tfx-workspace/iteration-2/benchmark.json +62 -0
  59. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/eval_metadata.json +13 -0
  60. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/old_skill/grading.json +11 -0
  61. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/old_skill/outputs/analysis.md +382 -0
  62. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/old_skill/timing.json +5 -0
  63. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/with_skill/grading.json +11 -0
  64. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/with_skill/outputs/analysis.md +333 -0
  65. package/skills/tfx-workspace/iteration-2/multi-team-creation-refactored/with_skill/timing.json +5 -0
  66. package/skills/tfx-workspace/iteration-2/review.html +1325 -0
  67. package/skills/tfx-workspace/skill-snapshot/tfx-auto/SKILL.md +217 -0
  68. package/skills/tfx-workspace/skill-snapshot/tfx-auto-codex/SKILL.md +77 -0
  69. package/skills/tfx-workspace/skill-snapshot/tfx-codex/SKILL.md +65 -0
  70. package/skills/tfx-workspace/skill-snapshot/tfx-doctor/SKILL.md +94 -0
  71. package/skills/tfx-workspace/skill-snapshot/tfx-gemini/SKILL.md +82 -0
  72. package/skills/tfx-workspace/skill-snapshot/tfx-hub/SKILL.md +133 -0
  73. package/skills/tfx-workspace/skill-snapshot/tfx-multi/SKILL.md +426 -0
  74. package/skills/tfx-workspace/skill-snapshot/tfx-setup/SKILL.md +101 -0
package/skills/tfx-multi/references/agent-wrapper-rules.md
@@ -0,0 +1,81 @@
+ # Agent Wrapper Detailed Rules
+
+ ## Why Slim Wrappers Exist
+
+ Native Teams teammates can only be Claude models.
+ Codex/Gemini cannot be registered as teammates directly, so the design spawns a Claude slim wrapper
+ and runs the Codex/Gemini CLI through `tfx-route.sh` inside that wrapper.
+
+ Reasons the wrapper exists:
+ 1. **Shift+Down navigation registration**: if the Lead runs Bash directly without a wrapper, the run is not registered for navigation
+ 2. **Lead↔worker feedback loop**: when a worker reports results, a turn boundary is created, so the lead can correct its direction
+ 3. **Re-run on failure**: supports N runs
+
+ ## Mandatory Rules
+
+ ### 1. Never skip the Agent wrapper
+
+ Codex/Gemini subtasks must always spawn an Agent wrapper, regardless of the number of workers.
+ Even with a single worker (e.g. `1:gemini`), the Lead must not run Bash directly.
+ The Lead must not skip the Agent because it judges that to be "more efficient".
+
+ ### 2. mode: bypassPermissions is required
+
+ Every Agent spawn must include `mode: "bypassPermissions"`.
+ Without this setting, the worker asks the user for approval when it runs Bash, which halts automatic execution.
+
+ ### 3. Always go through tfx-route.sh
+
+ Neither the Lead nor an Agent wrapper may call `gemini -y -p "..."` or `codex exec "..."` directly.
+ A direct call skips all of the following:
+ - tfx-route.sh's model selection (`-m gemini-3.1-pro-preview`)
+ - MCP filtering
+ - Team bridge integration
+ - Windows-compatible paths
+ - Timeouts
+ - Post-processing (token tracking / issue logging)
+
+ Always execute through `bash ~/.claude/scripts/tfx-route.sh {role} '{subtask}' {mcp_profile}`.
+
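A minimal JavaScript sketch of rule 3, assuming the wrapper assembles the required `tfx-route.sh` invocation as one string that bash will execute. The helper names `shellQuote` and `buildRouteCommand` are hypothetical, not triflux code. What it shows is that `{subtask}` sits inside single quotes, so an apostrophe in the task text has to be escaped POSIX-style (as `'\''`), or it would end the quoted argument early.

```javascript
// Hypothetical helper for rule 3; assumes a POSIX shell (bash) runs the result.
const ROUTE_SCRIPT = "~/.claude/scripts/tfx-route.sh";

// Wrap a value in single quotes, turning each embedded ' into '\''
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Build: bash ~/.claude/scripts/tfx-route.sh {role} '{subtask}' {mcp_profile}
function buildRouteCommand(role, subtask, mcpProfile) {
  if (!role || !mcpProfile) {
    throw new Error("role and mcpProfile are required");
  }
  return `bash ${ROUTE_SCRIPT} ${role} ${shellQuote(subtask)} ${mcpProfile}`;
}

console.log(buildRouteCommand("executor", "Add JWT auth middleware", "implement"));
// bash ~/.claude/scripts/tfx-route.sh executor 'Add JWT auth middleware' implement
```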
+ ### 4. No direct code manipulation
+
+ Slim wrapper workers must not read or modify code themselves.
+ A codex-worker must delegate to Codex through tfx-route.sh, and the same goes for a gemini-worker.
+ A worker that uses tools such as Read, Edit, Write, Grep, or Glob directly violates the delegation structure.
+
+ ## Interrupt Protocol
+
+ When a worker reports its start via SendMessage before running Bash, a turn boundary is created and the lead can send a redirect message.
+
+ ```
+ 1. TaskUpdate(taskId, status: in_progress) - claim the task
+ 2. SendMessage(to: team-lead, "Task started: {agentName}") - start report (creates a turn boundary)
+ 3. Bash(command: tfx-route.sh ..., timeout: {bashTimeoutMs}) - execute
+ 4. SendMessage(to: team-lead, "Result: {summary}") - result report (creates a turn boundary)
+ 5. Wait for lead feedback - on feedback, go back to Step 3 and re-run
+ 6. On final completion: TaskUpdate(status: completed, metadata: {result}) + SendMessage, then exit
+ ```
+
+ The lead recognizes the turn boundaries at the worker's Step 2 and Step 4 and can send a redirect, additional instructions, or a re-run request.
+
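The six steps can be modeled as a plain function to make the control flow concrete. This is a hypothetical sketch, not triflux code: the `tools` object stands in for the TaskUpdate, SendMessage, and Bash tool calls, `waitForFeedback` returns either nothing (done) or an object with an optional replacement command, and `maxRuns` is an assumed cap on re-runs rather than a documented value.

```javascript
// Hypothetical model of the six-step worker protocol. `tools` stands in for
// the TaskUpdate / SendMessage / Bash tool calls; waitForFeedback() returns a
// falsy value when the lead has nothing more, or { command } to re-run.
// maxRuns is an assumed cap on re-runs, not a documented value.
function runWorker({ taskId, agentName, command, timeoutMs, tools, maxRuns = 3 }) {
  tools.taskUpdate(taskId, { status: "in_progress" });           // 1. claim
  tools.sendMessage("team-lead", `Task started: ${agentName}`);    // 2. turn boundary
  let result;
  for (let run = 1; run <= maxRuns; run++) {
    result = tools.bash(command, timeoutMs);                      // 3. execute
    tools.sendMessage("team-lead", `Result: ${result.summary}`);   // 4. turn boundary
    const feedback = tools.waitForFeedback();                     // 5. wait for the lead
    if (!feedback) break;                                         // no feedback: finish
    if (feedback.command) command = feedback.command;             // redirected re-run
  }
  tools.taskUpdate(taskId, { status: "completed", metadata: { result } }); // 6. complete
  tools.sendMessage("team-lead", `Done: ${agentName}`);
  return result;
}
```

Injecting the tools keeps the loop testable with fakes: a fake `waitForFeedback` that returns one redirect and then nothing produces exactly two Bash runs.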
+ ## Dynamic Bash timeout inheritance
+
+ The Bash timeout is inherited dynamically: take tfx-route.sh's per-role/profile timeout, add a 60-second margin, and convert the sum to milliseconds.
+ Per `getRouteTimeout(role, mcpProfile)`:
+ - analyze/review profile, or architect/analyst role: 3600 s
+ - Everything else (default): 1080 s (18 min)
+
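The arithmetic is small enough to write out. This sketch mirrors only the two cases listed above; `getRouteTimeoutSeconds` is a hypothetical stand-in for the real `getRouteTimeout`, whose full logic is not shown here.

```javascript
// Mirrors only the two documented cases; the real getRouteTimeout() may
// handle more. Values are in seconds.
function getRouteTimeoutSeconds(role, mcpProfile) {
  const longProfiles = ["analyze", "review"];
  const longRoles = ["architect", "analyst"];
  if (longProfiles.includes(mcpProfile) || longRoles.includes(role)) return 3600;
  return 1080; // default: 18 minutes
}

// Bash tool timeout: route timeout + 60 s margin, converted to milliseconds.
function bashTimeoutMs(role, mcpProfile) {
  return (getRouteTimeoutSeconds(role, mcpProfile) + 60) * 1000;
}

console.log(bashTimeoutMs("executor", "implement")); // 1140000 (19 min)
console.log(bashTimeoutMs("architect", "analyze"));  // 3660000 (61 min)
```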
+ ## tfx-route.sh team integration
+
+ Driven by `TFX_TEAM_*` environment variables (already implemented):
+ - `TFX_TEAM_NAME`: team identifier
+ - `TFX_TEAM_TASK_ID`: task identifier
+ - `TFX_TEAM_AGENT_NAME`: worker display name
+ - `TFX_TEAM_LEAD_NAME`: lead recipient name (default `team-lead`)
+
+ Hub communication (Named Pipe first, HTTP fallback):
+ - `bridge.mjs` connects to the Named Pipe (`\\.\pipe\triflux-{pid}`) first and falls back to HTTP `/bridge/*` if that fails
+ - Execution start: `node hub/bridge.mjs team-task-update --team {name} --task-id {id} --claim --status in_progress`
+ - Execution end: `node hub/bridge.mjs team-task-update --team {name} --task-id {id} --status completed|failed`
+ - Report to lead: `node hub/bridge.mjs team-send-message --team {name} --from {agent} --to team-lead --text "..."`
+ - Publish result: `node hub/bridge.mjs result --agent {id} --topic task.result --file {output}`
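To illustrate how these variables could feed the documented bridge subcommands, the sketch below builds argument arrays rather than shell strings, which sidesteps quoting for `--text`. The function names are hypothetical, and routing `--to` through `TFX_TEAM_LEAD_NAME` is an assumption; only the subcommands and flags come from the list above. The Named Pipe / HTTP transport stays inside `bridge.mjs`.

```javascript
// Hypothetical helpers mapping TFX_TEAM_* variables onto argv arrays for the
// bridge subcommands listed above. They only build arguments.
function readTeamEnv(env) {
  if (!env.TFX_TEAM_NAME || !env.TFX_TEAM_TASK_ID) return null; // not in team mode
  return {
    team: env.TFX_TEAM_NAME,
    taskId: env.TFX_TEAM_TASK_ID,
    agent: env.TFX_TEAM_AGENT_NAME,
    lead: env.TFX_TEAM_LEAD_NAME || "team-lead", // documented default
  };
}

const BRIDGE = "hub/bridge.mjs";

function claimArgs(t) {
  return [BRIDGE, "team-task-update", "--team", t.team, "--task-id", t.taskId,
    "--claim", "--status", "in_progress"];
}

function finishArgs(t, succeeded) {
  return [BRIDGE, "team-task-update", "--team", t.team, "--task-id", t.taskId,
    "--status", succeeded ? "completed" : "failed"];
}

// Assumption: --to uses TFX_TEAM_LEAD_NAME (default team-lead).
function reportArgs(t, text) {
  return [BRIDGE, "team-send-message", "--team", t.team, "--from", t.agent,
    "--to", t.lead, "--text", text];
}
```

With Node's `child_process.spawnSync("node", claimArgs(team))`, each array element is passed as its own argument without a shell, so a `--text` value containing spaces or quotes needs no escaping.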
package/skills/tfx-multi/references/thorough-pipeline.md
@@ -0,0 +1,66 @@
+ # --thorough pipeline details
+
+ > This file does not apply in `--quick` (default) mode.
+ > Phases 2.5-2.6 and 3.5-3.7 run only in `--thorough` mode.
+
+ ## Phase 2.5: Plan (Codex architect)
+
+ 1. Initialize the Hub pipeline:
+ ```bash
+ Bash("node hub/bridge.mjs pipeline-advance --team ${teamName} --status plan")
+ ```
+ Alternatively, call createPipeline(db, teamName) directly.
+ 2. Analyze the task and design an approach with Codex architect:
+ ```bash
+ bash ~/.claude/scripts/tfx-route.sh architect "${task}" analyze
+ ```
+ 3. Save the result as a pipeline artifact:
+ ```
+ pipeline.setArtifact('plan_path', planOutputPath)
+ ```
+ 4. pipeline advance: plan → prd
+
+ ## Phase 2.6: PRD (Codex analyst)
+
+ 1. Finalize the acceptance criteria with Codex analyst:
+ ```bash
+ bash ~/.claude/scripts/tfx-route.sh analyst "${task}" analyze
+ ```
+ 2. Save the result as a pipeline artifact:
+ ```
+ pipeline.setArtifact('prd_path', prdOutputPath)
+ ```
+ 3. pipeline advance: prd → exec
+
+ ## Phase 3.5: Verify (Codex review)
+
+ 1. pipeline advance: exec → verify
+ 2. Verify the results with Codex verifier:
+ ```bash
+ bash ~/.claude/scripts/tfx-route.sh verifier "Verify results: ${task}" review
+ ```
+ The verifier runs as Codex `--profile thorough review`.
+ 3. Save the verification result as a pipeline artifact:
+ ```
+ pipeline.setArtifact('verify_report', verifyOutputPath)
+ ```
+ 4. Pass: pipeline advance verify → complete, then Phase 5 (cleanup)
+ 5. Fail: go to Phase 3.6
+
+ ## Phase 3.6: Fix (Codex executor, max 3 attempts)
+
+ 1. pipeline advance: verify → fix
+ fix_attempt is incremented automatically; the transition is rejected once it exceeds fix_max (3)
+ 2. If fix_attempt > fix_max: go to Phase 3.7 (ralph loop), or report failed and go to Phase 5
+ 3. Fix the failed items with Codex executor:
+ ```bash
+ bash ~/.claude/scripts/tfx-route.sh executor "Fix failed items: ${failedItems}" implement
+ ```
+ 4. pipeline advance: fix → exec (re-run)
+ 5. Re-run Phase 3 (exec), then Phase 3.5 (verify)
+
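The `pipeline advance` steps in Phases 2.5 through 3.6 form a small state machine. The sketch below models it in memory under stated assumptions: the transition table is read off the steps above, `ThoroughPipeline` is a hypothetical class rather than the Hub's `createPipeline` result, and a refused `verify → fix` is reported as `false` so the caller can choose between Phase 3.7 and a failed report.

```javascript
// Hypothetical in-memory model of the --thorough pipeline transitions.
// Field names (phase, fix_attempt, fix_max) follow this reference file.
const TRANSITIONS = {
  plan: ["prd"],
  prd: ["exec"],
  exec: ["verify"],
  verify: ["complete", "fix"],
  fix: ["exec"],
};

class ThoroughPipeline {
  constructor(fixMax = 3) {
    this.phase = "plan";
    this.fix_attempt = 0;
    this.fix_max = fixMax;
    this.artifacts = {};
  }

  setArtifact(key, path) {
    this.artifacts[key] = path; // e.g. plan_path, prd_path, verify_report
  }

  // Throws on a transition the table does not allow.
  // Returns false when verify -> fix is refused because the fix budget is spent.
  advance(next) {
    const allowed = TRANSITIONS[this.phase] || [];
    if (!allowed.includes(next)) {
      throw new Error(`invalid transition ${this.phase} -> ${next}`);
    }
    if (next === "fix") {
      this.fix_attempt += 1; // incremented on every verify -> fix attempt
      if (this.fix_attempt > this.fix_max) return false; // caller escalates
    }
    this.phase = next;
    return true;
  }
}
```

With `fix_max` of 3, the first three `advance("fix")` calls succeed and the fourth returns `false`, leaving the pipeline in `verify`.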
+ ## Phase 3.7: Ralph Loop (when fixes exceed 3 attempts)
+
+ 1. Increment ralph_iteration (pipeline.restart())
+ 2. If ralph_iteration > ralph_max (10): final failed, go to Phase 5
+ 3. Reset fix_attempt and restart the entire pipeline (from Phase 2.5 plan)
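Phase 3.7 reduces to one state update per restart. In this hypothetical sketch, each call to `ralphRestart` models one `pipeline.restart()`: it bumps `ralph_iteration`, then either fails for good or resets `fix_attempt` and sends the pipeline back to `plan`. The plain-object state shape is an assumption.

```javascript
// Hypothetical sketch of Phase 3.7: one call models one pipeline.restart().
function ralphRestart(state, ralphMax = 10) {
  const ralph_iteration = state.ralph_iteration + 1;
  if (ralph_iteration > ralphMax) {
    return { ...state, ralph_iteration, phase: "failed" }; // final failure -> Phase 5
  }
  // Reset the fix budget and restart from Phase 2.5 (plan)
  return { ...state, ralph_iteration, fix_attempt: 0, phase: "plan" };
}

// Example: a pipeline whose fix budget ran out on its first pass.
const next = ralphRestart({ phase: "verify", fix_attempt: 4, ralph_iteration: 0 });
console.log(next); // { phase: 'plan', fix_attempt: 0, ralph_iteration: 1 }
```

Returning a new object instead of mutating keeps each restart easy to log and compare.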
package/skills/tfx-workspace/evals/evals.json
@@ -0,0 +1,79 @@
+ {
+   "skill_name": "tfx-skills-suite",
+   "evals": [
+     {
+       "id": 1,
+       "prompt": "You are a Claude Code agent. Read the tfx-auto skill definition, then explain how you would handle this user request: '/implement Add JWT authentication middleware'. List the EXACT bash commands you would run. Do NOT actually execute them.",
+       "expected_output": "Should route to executor agent via tfx-route.sh with 'implement' MCP profile. Command: bash ~/.claude/scripts/tfx-route.sh executor 'Add JWT authentication middleware' implement",
+       "files": [],
+       "expectations": [
+         "Routes to 'executor' agent (not architect, not analyst)",
+         "Uses 'implement' MCP profile",
+         "Generates correct tfx-route.sh command syntax",
+         "Does NOT trigger triage (single command shortcut)",
+         "Does NOT delegate to tfx-multi"
+       ]
+     },
+     {
+       "id": 2,
+       "prompt": "You are a Claude Code agent. Read the tfx-auto skill definition, then explain how you would handle: '/tfx-auto Refactor the frontend and also do a security review'. List all routing decisions, triage steps, and delegation.",
+       "expected_output": "Should enter auto triage mode, classify via Codex, decompose into 2+ subtasks, then delegate to tfx-multi Phase 3",
+       "files": [],
+       "expectations": [
+         "Identifies this as auto mode (not command shortcut)",
+         "Triggers Codex classification step",
+         "Decomposes into at least 2 subtasks",
+         "Notes delegation to tfx-multi for subtasks >= 2",
+         "Does NOT try to execute all subtasks directly"
+       ]
+     },
+     {
+       "id": 3,
+       "prompt": "You are a Claude Code agent. Read the tfx-multi skill definition, then explain step-by-step how you would handle: '/tfx-multi auth refactoring + UI improvements + security review'. List all TeamCreate, TaskCreate, Agent calls with exact parameters.",
+       "expected_output": "Should create team, 3 TaskCreates, 3 Agent spawns with slim wrapper structure following Phase 0-5",
+       "files": [],
+       "expectations": [
+         "Creates exactly one TeamCreate with tfx- prefix naming",
+         "Creates 3 TaskCreate calls (one per subtask)",
+         "Spawns 3 Agent wrappers with mode: bypassPermissions",
+         "Uses tfx-route.sh inside Agent wrapper (not direct codex/gemini)",
+         "Includes Phase 5 cleanup (TeamDelete)"
+       ]
+     },
+     {
+       "id": 4,
+       "prompt": "You are a Claude Code agent. Read the tfx-doctor skill definition, then explain how you would handle: 'The HUD is not showing and codex is not working either. What should I do?'. List exact commands and reasoning.",
+       "expected_output": "Should suggest running triflux doctor first, then triflux doctor --fix if issues found",
+       "files": [],
+       "expectations": [
+         "Runs 'triflux doctor' as first diagnostic step",
+         "Suggests '--fix' mode for auto-repair",
+         "Mentions HUD and CLI path checks in explanation",
+         "Does NOT jump straight to --reset (that's for cache only)"
+       ]
+     },
+     {
+       "id": 5,
+       "prompt": "You are a Claude Code agent. Read the tfx-hub skill definition, then explain how you would handle: '/tfx-hub start'. List exact commands.",
+       "expected_output": "Should run 'node hub/server.mjs' in background",
+       "files": [],
+       "expectations": [
+         "Runs 'node hub/server.mjs' with run_in_background=true",
+         "Mentions port 27888 and /mcp endpoint",
+         "Does NOT try to run any triage or routing"
+       ]
+     },
+     {
+       "id": 6,
+       "prompt": "You are a Claude Code agent. Read the tfx-codex skill definition, then explain the Gemini-to-Codex remapping. For '/tfx-codex Write the API docs and also create a design guide', list the routing showing how designer/writer get remapped.",
+       "expected_output": "designer remapped to Codex(high), writer to Codex Spark(spark_fast), TFX_CLI_MODE=codex env var",
+       "files": [],
+       "expectations": [
+         "designer remapped to Codex with effort: high",
+         "writer remapped to Codex Spark with effort: spark_fast",
+         "Sets TFX_CLI_MODE=codex environment variable",
+         "Changes MCP profile: designer->implement, writer->analyze"
+       ]
+     }
+   ]
+ }
package/skills/tfx-workspace/iteration-1/benchmark.json
@@ -0,0 +1,162 @@
+ {
+   "metadata": {
+     "skill_name": "tfx-skills-suite",
+     "skill_path": "C:/Users/SSAFY/Desktop/Projects/cli/triflux/skills",
+     "executor_model": "claude-sonnet-4-6",
+     "analyzer_model": "claude-opus-4-6",
+     "timestamp": "2026-03-19T10:00:00Z",
+     "evals_run": [1, 2, 3, 4, 5, 6],
+     "runs_per_configuration": 1
+   },
+   "runs": [
+     {
+       "eval_id": 1, "eval_name": "routing-implement-shortcut", "configuration": "with_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 43.6, "tokens": 16303, "tool_calls": 4, "errors": 0},
+       "expectations": [
+         {"text": "Routes to executor agent", "passed": true, "evidence": "Correctly mapped from implement shortcut table"},
+         {"text": "Uses implement MCP profile", "passed": true, "evidence": "Mapped from shortcut table"},
+         {"text": "Generates correct tfx-route.sh command", "passed": true, "evidence": "bash ~/.claude/scripts/tfx-route.sh executor '...' implement"},
+         {"text": "Does NOT trigger triage", "passed": true, "evidence": "Command shortcut skips triage"},
+         {"text": "Does NOT delegate to tfx-multi", "passed": true, "evidence": "No subtask decomposition occurred"}
+       ]
+     },
+     {
+       "eval_id": 1, "eval_name": "routing-implement-shortcut", "configuration": "without_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 48.1, "tokens": 16436, "tool_calls": 4, "errors": 0},
+       "expectations": [
+         {"text": "Routes to executor agent", "passed": true, "evidence": "Correctly mapped"},
+         {"text": "Uses implement MCP profile", "passed": true, "evidence": "Assigned by shortcut table"},
+         {"text": "Generates correct tfx-route.sh command", "passed": true, "evidence": "Correct syntax generated"},
+         {"text": "Does NOT trigger triage", "passed": true, "evidence": "Shortcut mode skips triage"},
+         {"text": "Does NOT delegate to tfx-multi", "passed": true, "evidence": "No delegation"}
+       ]
+     },
+     {
+       "eval_id": 2, "eval_name": "routing-multi-task-triage", "configuration": "with_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 58.2, "tokens": 17584, "tool_calls": 3, "errors": 0},
+       "expectations": [
+         {"text": "Identifies as auto mode", "passed": true, "evidence": "No shortcut match, auto mode selected"},
+         {"text": "Triggers Codex classification", "passed": true, "evidence": "Codex --full-auto classification triggered"},
+         {"text": "Decomposes into 2+ subtasks", "passed": true, "evidence": "2 subtasks: executor + security-reviewer"},
+         {"text": "Notes tfx-multi delegation", "passed": true, "evidence": "subtasks.length >= 2 triggers tfx-multi Phase 3"},
+         {"text": "Does NOT execute directly", "passed": true, "evidence": "Delegates to tfx-multi"}
+       ]
+     },
+     {
+       "eval_id": 2, "eval_name": "routing-multi-task-triage", "configuration": "without_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 77.2, "tokens": 18626, "tool_calls": 4, "errors": 0},
+       "expectations": [
+         {"text": "Identifies as auto mode", "passed": true, "evidence": "Auto mode selected"},
+         {"text": "Triggers Codex classification", "passed": true, "evidence": "Codex --full-auto triggered"},
+         {"text": "Decomposes into 2+ subtasks", "passed": true, "evidence": "2 subtasks decomposed"},
+         {"text": "Notes tfx-multi delegation", "passed": true, "evidence": "Hands off to tfx-multi Phase 3"},
+         {"text": "Does NOT execute directly", "passed": true, "evidence": "Delegates correctly"}
+       ]
+     },
+     {
+       "eval_id": 3, "eval_name": "multi-team-creation", "configuration": "with_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 115.3, "tokens": 27197, "tool_calls": 3, "errors": 0},
+       "expectations": [
+         {"text": "Creates TeamCreate with tfx- prefix", "passed": true, "evidence": "TeamCreate({ team_name: 'tfx-<base36>' })"},
+         {"text": "Creates 3 TaskCreate calls", "passed": true, "evidence": "3x TaskCreate with metadata"},
+         {"text": "Spawns 3 Agent wrappers with bypassPermissions", "passed": true, "evidence": "3x Agent({ mode: bypassPermissions })"},
+         {"text": "Uses tfx-route.sh inside wrappers", "passed": true, "evidence": "Direct codex/gemini calls prohibited"},
+         {"text": "Includes Phase 5 TeamDelete", "passed": true, "evidence": "TeamDelete always runs, max 30s wait"}
+       ]
+     },
+     {
+       "eval_id": 3, "eval_name": "multi-team-creation", "configuration": "without_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 5, "failed": 0, "total": 5, "time_seconds": 100.6, "tokens": 26140, "tool_calls": 3, "errors": 0},
+       "expectations": [
+         {"text": "Creates TeamCreate with tfx- prefix", "passed": true, "evidence": "TeamCreate with tfx-<id>"},
+         {"text": "Creates 3 TaskCreate calls", "passed": true, "evidence": "Three TaskCreate calls"},
+         {"text": "Spawns 3 Agent wrappers with bypassPermissions", "passed": true, "evidence": "mode: bypassPermissions in all 3"},
+         {"text": "Uses tfx-route.sh inside wrappers", "passed": true, "evidence": "Never direct codex/gemini calls"},
+         {"text": "Includes Phase 5 TeamDelete", "passed": true, "evidence": "TeamDelete unconditionally"}
+       ]
+     },
+     {
+       "eval_id": 4, "eval_name": "doctor-diagnosis", "configuration": "with_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 4, "failed": 0, "total": 4, "time_seconds": 53.8, "tokens": 14499, "tool_calls": 4, "errors": 0},
+       "expectations": [
+         {"text": "Runs triflux doctor first", "passed": true, "evidence": "Bash(\"triflux doctor\")"},
+         {"text": "Suggests --fix mode", "passed": true, "evidence": "Suggests after diagnosis report"},
+         {"text": "Mentions HUD and CLI checks", "passed": true, "evidence": "HUD and CLI paths checked"},
+         {"text": "Does NOT jump to --reset", "passed": true, "evidence": "--reset reserved for explicit request"}
+       ]
+     },
+     {
+       "eval_id": 4, "eval_name": "doctor-diagnosis", "configuration": "without_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 4, "failed": 0, "total": 4, "time_seconds": 48.3, "tokens": 14482, "tool_calls": 3, "errors": 0},
+       "expectations": [
+         {"text": "Runs triflux doctor first", "passed": true, "evidence": "Bash(\"triflux doctor\")"},
+         {"text": "Suggests --fix mode", "passed": true, "evidence": "Offers --fix after diagnosis"},
+         {"text": "Mentions HUD and CLI checks", "passed": true, "evidence": "All 8 diagnostics listed"},
+         {"text": "Does NOT jump to --reset", "passed": true, "evidence": "--reset reserved for explicit request"}
+       ]
+     },
+     {
+       "eval_id": 5, "eval_name": "hub-start-sequence", "configuration": "with_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 3, "failed": 0, "total": 3, "time_seconds": 47.2, "tokens": 14821, "tool_calls": 4, "errors": 0},
+       "expectations": [
+         {"text": "Runs node hub/server.mjs in background", "passed": true, "evidence": "Bash(\"node hub/server.mjs\", run_in_background=true)"},
+         {"text": "Mentions port 27888 and /mcp", "passed": true, "evidence": "Port 27888, http://127.0.0.1:27888/mcp"},
+         {"text": "No triage or routing attempted", "passed": true, "evidence": "Command match, not fallthrough"}
+       ]
+     },
+     {
+       "eval_id": 5, "eval_name": "hub-start-sequence", "configuration": "without_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 3, "failed": 0, "total": 3, "time_seconds": 51.8, "tokens": 14904, "tool_calls": 4, "errors": 0},
+       "expectations": [
+         {"text": "Runs node hub/server.mjs in background", "passed": true, "evidence": "Bash(\"node hub/server.mjs\", run_in_background=true)"},
+         {"text": "Mentions port 27888 and /mcp", "passed": true, "evidence": "Port 27888, endpoint /mcp"},
+         {"text": "No triage or routing attempted", "passed": true, "evidence": "Command match, not fallthrough"}
+       ]
+     },
+     {
+       "eval_id": 6, "eval_name": "codex-gemini-remap", "configuration": "with_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 4, "failed": 0, "total": 4, "time_seconds": 69.7, "tokens": 14889, "tool_calls": 5, "errors": 0},
+       "expectations": [
+         {"text": "designer remapped to Codex (effort: high)", "passed": true, "evidence": "designer → Codex (effort: high)"},
+         {"text": "writer remapped to Codex Spark (spark_fast)", "passed": true, "evidence": "writer → Codex Spark (effort: spark_fast)"},
+         {"text": "TFX_CLI_MODE=codex set", "passed": true, "evidence": "Set for every Phase 3 call"},
+         {"text": "MCP profiles changed", "passed": true, "evidence": "designer→implement, writer→analyze"}
+       ]
+     },
+     {
+       "eval_id": 6, "eval_name": "codex-gemini-remap", "configuration": "without_skill", "run_number": 1,
+       "result": {"pass_rate": 1.0, "passed": 4, "failed": 0, "total": 4, "time_seconds": 85.2, "tokens": 19802, "tool_calls": 7, "errors": 0},
+       "expectations": [
+         {"text": "designer remapped to Codex (effort: high)", "passed": true, "evidence": "designer → Codex (effort: high)"},
+         {"text": "writer remapped to Codex Spark (spark_fast)", "passed": true, "evidence": "writer → Codex Spark (effort: spark_fast)"},
+         {"text": "TFX_CLI_MODE=codex set", "passed": true, "evidence": "TFX_CLI_MODE set to codex"},
+         {"text": "MCP profiles changed", "passed": true, "evidence": "writer→analyze, designer→implement"}
+       ]
+     }
+   ],
+   "run_summary": {
+     "with_skill": {
+       "pass_rate": {"mean": 1.0, "stddev": 0.0, "min": 1.0, "max": 1.0},
+       "time_seconds": {"mean": 64.6, "stddev": 26.4, "min": 43.6, "max": 115.3},
+       "tokens": {"mean": 17549, "stddev": 4857, "min": 14499, "max": 27197}
+     },
+     "without_skill": {
+       "pass_rate": {"mean": 1.0, "stddev": 0.0, "min": 1.0, "max": 1.0},
+       "time_seconds": {"mean": 68.5, "stddev": 20.4, "min": 48.1, "max": 100.6},
+       "tokens": {"mean": 18398, "stddev": 4227, "min": 14482, "max": 26140}
+     },
+     "delta": {
+       "pass_rate": "+0.00",
+       "time_seconds": "-3.9",
+       "tokens": "-849"
+     }
+   },
+   "notes": [
+     "All 26 assertions pass at 100% for both configurations — the skills are functionally correct",
+     "The fixes applied (dead reference removal, Phase numbering consistency, hub description) don't change routing logic, so pass rates are identical",
+     "NEW version is marginally faster (-3.9s avg) and uses fewer tokens (-849 avg), likely due to cleaner references reducing model confusion",
+     "tfx-multi is the most complex skill (115s / 27K tokens with_skill) — consider extracting reference docs to reduce context load",
+     "tfx-codex OLD references 'Phase(1~6)' which doesn't exist in tfx-auto — the NEW version correctly references the actual workflow names",
+     "All assertions pass regardless of configuration — these test the core routing logic which is unchanged. Consider adding assertions that specifically test the fixed issues (dead refs, phase naming) for differentiation"
+   ]
+ }
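`run_summary` aggregates the six runs of each configuration. The sketch below is a reconstruction of that aggregation, not the benchmark tool's code. The file does not say which standard deviation it reports, so the sketch assumes the sample form (dividing by n - 1). Fed the `time_seconds` values above, its means come out to about 64.6 s (with_skill) and 68.5 s (without_skill), a delta of about -3.9 s, matching the reported summary.

```javascript
// Reconstruction of how run_summary could be computed from "runs"; not the
// benchmark tool's actual code. Assumes the sample standard deviation (n - 1),
// so each configuration needs at least two runs.
function stats(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance = values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / (n - 1);
  return { mean, stddev: Math.sqrt(variance), min: Math.min(...values), max: Math.max(...values) };
}

// metric is a key of each run's "result" object, e.g. "time_seconds" or "tokens".
function summarize(runs, metric) {
  const values = (config) =>
    runs.filter((r) => r.configuration === config).map((r) => r.result[metric]);
  const withSkill = stats(values("with_skill"));
  const withoutSkill = stats(values("without_skill"));
  return { with_skill: withSkill, without_skill: withoutSkill, delta: withSkill.mean - withoutSkill.mean };
}
```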
package/skills/tfx-workspace/iteration-1/codex-gemini-remap/eval_metadata.json
@@ -0,0 +1,11 @@
+ {
+   "eval_id": 6,
+   "eval_name": "codex-gemini-remap",
+   "prompt": "/tfx-codex Write the API docs and also create a design guide",
+   "assertions": [
+     "designer remapped to Codex with effort: high",
+     "writer remapped to Codex Spark with effort: spark_fast",
+     "Sets TFX_CLI_MODE=codex environment variable",
+     "Changes MCP profile: designer->implement, writer->analyze"
+   ]
+ }
package/skills/tfx-workspace/iteration-1/codex-gemini-remap/old_skill/grading.json
@@ -0,0 +1,9 @@
+ {
+   "expectations": [
+     {"text": "designer remapped to Codex with effort: high", "passed": true, "evidence": "Agent output: designer → Codex (effort: high)"},
+     {"text": "writer remapped to Codex Spark with effort: spark_fast", "passed": true, "evidence": "Agent output: writer → Codex Spark (effort: spark_fast)"},
+     {"text": "Sets TFX_CLI_MODE=codex environment variable", "passed": true, "evidence": "Agent output: 'TFX_CLI_MODE: Set to codex'"},
+     {"text": "Changes MCP profile: designer->implement, writer->analyze", "passed": true, "evidence": "Agent output: writer→analyze, designer→implement"}
+   ],
+   "summary": {"passed": 4, "failed": 0, "total": 4, "pass_rate": 1.0}
+ }
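The `summary` block follows directly from the `expectations` array: count the entries whose `passed` is `true`. A minimal sketch of that derivation; the function name is hypothetical, and returning a `pass_rate` of 0 for an empty list is an assumption.

```javascript
// Sketch: derive grading.json's "summary" from its "expectations" array.
// Treating an empty list as pass_rate 0 is an assumption.
function gradingSummary(expectations) {
  const total = expectations.length;
  const passed = expectations.filter((e) => e.passed === true).length;
  return {
    passed,
    failed: total - passed,
    total,
    pass_rate: total === 0 ? 0 : passed / total,
  };
}
```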
package/skills/tfx-workspace/iteration-1/codex-gemini-remap/old_skill/outputs/analysis.md
@@ -0,0 +1,154 @@
+ # tfx-codex routing analysis: DRY RUN
+
+ **Request**: `/tfx-codex Write the API docs and also create a design guide`
+ **Reference SKILL**: `skills/tfx-workspace/skill-snapshot/tfx-codex/SKILL.md`
+
+ ---
+
+ ## 1. Agent remapping table
+
+ This request decomposes into two independent subtasks:
+ - Subtask A: "write the API docs" → **writer** role
+ - Subtask B: "also create a design guide" → **designer** role
+
+ | Agent | Original CLI (tfx-auto) | In tfx-codex | effort parameter | MCP profile |
+ |----------|---------------------|---------------|-----------------|-----------|
+ | **writer** | ~~Gemini~~ (`docs` MCP) | **Codex** (effort: spark_fast): Codex Spark, lightweight docs | `spark_fast` | `analyze` |
+ | **designer** | ~~Gemini~~ (`docs` MCP) | **Codex** (effort: high): UI code generation | `high` | `implement` |
+
+ ### Original tfx-auto mapping (before remapping)
+
+ From the agent mapping table in `tfx-auto` SKILL.md:
+
+ ```
+ | gemini / designer / writer | Gemini | docs |
+ ```
+
+ That is, both roles originally run on the Gemini CLI with the `docs` MCP profile.
+
+ ### tfx-codex mapping (after remapping)
+
+ From the agent routing table in `tfx-codex` SKILL.md:
+
+ ```
+ | designer | ~~Gemini~~ | Codex (effort: high): UI code generation | implement |
+ | writer | ~~Gemini~~ | Codex Spark (effort: spark_fast): lightweight docs | analyze |
+ ```
+
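The remap in the tables above amounts to a lookup keyed by agent and CLI mode. A sketch of it as data; the values come from those tables, while the object layout and `resolveRoute` are illustrative only.

```javascript
// Remap from the tables above as two lookup tables (illustrative layout).
const TFX_AUTO_ROUTES = {
  writer: { cli: "gemini", mcp: "docs" },
  designer: { cli: "gemini", mcp: "docs" },
};

const TFX_CODEX_REMAP = {
  writer: { cli: "codex", effort: "spark_fast", mcp: "analyze" }, // Codex Spark
  designer: { cli: "codex", effort: "high", mcp: "implement" },
};

// With cliMode "codex", remapped agents use the Codex entry; otherwise the
// original tfx-auto route applies.
function resolveRoute(agent, cliMode) {
  if (cliMode === "codex" && agent in TFX_CODEX_REMAP) {
    return TFX_CODEX_REMAP[agent];
  }
  return TFX_AUTO_ROUTES[agent];
}

console.log(resolveRoute("writer", "codex").mcp); // analyze
```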
+ ---
+
+ ## 2. The TFX_CLI_MODE environment variable
+
+ ```
+ TFX_CLI_MODE=codex
+ ```
+
+ This variable is passed to tfx-route.sh and tells it to force-replace a Gemini agent with Codex whenever one is selected. Even if the Codex classifier returns `gemini` during Phase 2 triage, this value replaces it with `codex`.
+
+ ---
+
+ ## 3. Phase 2 triage behavior
+
+ **Auto mode** (`/tfx-codex "Write the API docs and also create a design guide"`):
+
+ 1. **Codex classification** (`--full-auto --skip-git-repo-check`):
+ - Expected JSON from parsing the input:
+ ```json
+ {
+   "parts": [
+     { "description": "Write API docs", "agent": "gemini" },
+     { "description": "Create design guide", "agent": "gemini" }
+   ]
+ }
+ ```
+ - Applying `TFX_CLI_MODE=codex`: both entries are force-replaced from `"gemini"` to **`"codex"`**
+
+ 2. **Opus inline decomposition** (after the forced conversion):
+ - `writer` role: assigned MCP profile `analyze`
+ - `designer` role: assigned MCP profile `implement`
+ - The two subtasks are independent: `graph_type: "INDEPENDENT"`
+
+ 3. **Subtask count = 2**: switches automatically to tfx-multi Native Teams mode (tfx-auto rule: 2 or more subtasks go to tfx-multi Phase 3)
+
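The triage steps above can be sketched as two small functions, assuming the classifier output has the `{ parts: [{ description, agent }] }` shape shown in the expected JSON. `applyCliMode` and `nextStep` are hypothetical names; the 2-or-more threshold is the tfx-auto rule quoted in step 3.

```javascript
// Force "gemini" to "codex" when TFX_CLI_MODE=codex, without mutating the input.
function applyCliMode(classification, cliMode) {
  const parts = classification.parts.map((part) =>
    cliMode === "codex" && part.agent === "gemini" ? { ...part, agent: "codex" } : part
  );
  return { ...classification, parts };
}

// tfx-auto rule quoted above: 2 or more subtasks go to tfx-multi Phase 3.
function nextStep(parts) {
  return parts.length >= 2 ? "tfx-multi Phase 3" : "single subtask (no tfx-multi hand-off)";
}

const classified = {
  parts: [
    { description: "Write API docs", agent: "gemini" },
    { description: "Create design guide", agent: "gemini" },
  ],
};
const forced = applyCliMode(classified, "codex");
console.log(forced.parts.map((p) => p.agent), nextStep(forced.parts));
// [ 'codex', 'codex' ] tfx-multi Phase 3
```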
+ ---
+
+ ## 4. Generated Bash commands (per subtask)
+
+ Because there are two subtasks, execution proceeds through tfx-multi Phase 3a (TeamCreate) → Phase 3b (TaskCreate) → Phase 3c (Agent wrapper spawn). Inside each Agent wrapper, the following Bash command runs:
+
+ ### Subtask A: writer (write API docs)
+
+ ```bash
+ TFX_CLI_MODE=codex bash ~/.claude/scripts/tfx-route.sh writer 'Write the API docs' analyze
+ ```
+
+ - `writer` agent: runs on Codex Spark (`effort: spark_fast`)
+ - MCP profile: `analyze` (document-based research and writing)
+ - `run_in_background=true` (INDEPENDENT, runs in parallel)
+
+ ### Subtask B: designer (create design guide)
+
+ ```bash
+ TFX_CLI_MODE=codex bash ~/.claude/scripts/tfx-route.sh designer 'Create a design guide' implement
+ ```
+
+ - `designer` agent: runs on Codex (`effort: high`)
+ - MCP profile: `implement` (code-based UI work)
+ - `run_in_background=true` (INDEPENDENT, runs in parallel)
+
+ > Neither subtask has `depends_on`, so both run in parallel at Level 0.
+
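The Level 0 note implies that execution levels are derived from `depends_on`. One plausible way to compute them is sketched below; the rule (a subtask sits one level above its deepest dependency) and the function name are assumptions, not taken from tfx-auto.

```javascript
// Hypothetical level assignment from depends_on: no dependencies means
// Level 0; otherwise one level above the deepest dependency. Subtasks that
// share a level can run in parallel.
function assignLevels(subtasks) {
  const byId = new Map(subtasks.map((t) => [t.id, t]));
  const level = new Map();
  const visiting = new Set();

  function visit(id) {
    if (level.has(id)) return level.get(id);
    if (!byId.has(id)) throw new Error(`unknown dependency: ${id}`);
    if (visiting.has(id)) throw new Error(`dependency cycle at ${id}`);
    visiting.add(id);
    const deps = byId.get(id).depends_on || [];
    const lvl = deps.length === 0 ? 0 : 1 + Math.max(...deps.map((d) => visit(d)));
    visiting.delete(id);
    level.set(id, lvl);
    return lvl;
  }

  subtasks.forEach((t) => visit(t.id));

  // groups[i] lists the subtask ids that can run together at Level i
  const groups = [];
  for (const [id, lvl] of level) {
    (groups[lvl] = groups[lvl] || []).push(id);
  }
  return groups;
}

// The two subtasks in this analysis have no depends_on, so both land on Level 0.
console.log(assignLevels([{ id: "writer" }, { id: "designer" }])); // [ [ 'writer', 'designer' ] ]
```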
+ ---
+
+ ## 5. MCP profile changes in detail
+
+ | Agent | Original MCP (tfx-auto) | MCP after tfx-codex | Reason for change |
+ |----------|-------------------|----------------------|----------|
+ | **writer** | `docs` | `analyze` | On the Gemini → Codex switch, uses the `analyze` profile, which suits document research and writing |
+ | **designer** | `docs` | `implement` | On the Gemini → Codex switch, uses the `implement` profile, which suits UI code generation |
+
+ The original `docs` MCP profile was designed around the Gemini CLI's web search and document access capabilities. When remapping to Codex, each role is switched to a profile that fits the actual nature of its work.
+
+ ---
+
+ ## 6. Workflow reference
+
+ **tfx-codex follows all of Phases 1-6 in tfx-auto SKILL.md unchanged.**
+
+ ```
+ Phase 1: Input parsing (recognize the `/tfx-codex` trigger, extract arguments)
+ Phase 2: Triage
+   - Run Codex classification (TFX_CLI_MODE=codex)
+   - Force-replace gemini return values with codex
+   - Opus inline decomposition (writer→analyze MCP, designer→implement MCP)
+ Phase 3: CLI execution
+   - Include the TFX_CLI_MODE=codex environment variable
+   - Call tfx-route.sh
+   - 2 subtasks → switch to tfx-multi Phase 3
+ Phase 4: Result collection
+   - exit_code 0: parse the === OUTPUT === section
+   - exit_code 124: use === PARTIAL OUTPUT ===
+   - Otherwise: STDERR → Claude fallback
+ Phase 5: Failure handling
+   - 1st: Claude executor (sonnet) fallback
+   - 2nd: report the failure and synthesize only the successful results
+ Phase 6: Report output
+   - Mode/graph/level/subtask status table
+   - Token Savings Report
+ ```
+
+ **Key differences**: behavior changes only in Phase 2 and Phase 3.
+ - Phase 2: force-convert gemini classification results to codex and reassign MCP profiles
+ - Phase 3: prefix every tfx-route.sh call with `TFX_CLI_MODE=codex`
+
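Phase 4's three exit-code branches can be expressed directly. In this sketch the marker strings come from the list above, but where a section ends (at the next `=== ... ===` line or at end of output) is an assumption, as are the function names. Exit status 124 is what GNU `timeout` returns when it stops a command, which is consistent with treating that case as partial output.

```javascript
// Return the text under `marker`, up to the next "=== ... ===" header
// (assumed section terminator) or end of output; null if the marker is absent.
function extractSection(stdout, marker) {
  const lines = stdout.split("\n");
  const start = lines.findIndex((line) => line.trim() === marker);
  if (start === -1) return null;
  const body = [];
  for (const line of lines.slice(start + 1)) {
    if (/^=== .+ ===$/.test(line.trim())) break; // next section header
    body.push(line);
  }
  return body.join("\n").trim();
}

function collectResult(exitCode, stdout, stderr) {
  if (exitCode === 0) {
    return { status: "ok", output: extractSection(stdout, "=== OUTPUT ===") };
  }
  if (exitCode === 124) {
    return { status: "partial", output: extractSection(stdout, "=== PARTIAL OUTPUT ===") };
  }
  return { status: "fallback", stderr }; // handed to the Claude fallback (Phase 5)
}
```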
+ ---
+
+ ## 7. Summary
+
+ This request (`/tfx-codex Write the API docs and also create a design guide`) is handled as follows:
+
+ 1. Decompose into two subtasks (INDEPENDENT graph)
+ 2. Remap both `writer` and `designer`, which would originally route to Gemini, to Codex
+ 3. MCP profiles: writer → `analyze`, designer → `implement` (changed from the original `docs`)
+ 4. With two subtasks, switch automatically to tfx-multi Native Teams mode and run them in parallel
+ 5. Phases 1-6 as a whole follow the tfx-auto workflow unchanged
package/skills/tfx-workspace/iteration-1/codex-gemini-remap/old_skill/timing.json
@@ -0,0 +1,5 @@
+ {
+   "total_tokens": 19802,
+   "duration_ms": 85239,
+   "total_duration_seconds": 85.2
+ }
package/skills/tfx-workspace/iteration-1/codex-gemini-remap/with_skill/grading.json
@@ -0,0 +1,9 @@
+ {
+   "expectations": [
+     {"text": "designer remapped to Codex with effort: high", "passed": true, "evidence": "Agent output: designer → Codex (effort: high)"},
+     {"text": "writer remapped to Codex Spark with effort: spark_fast", "passed": true, "evidence": "Agent output: writer → Codex Spark (effort: spark_fast)"},
+     {"text": "Sets TFX_CLI_MODE=codex environment variable", "passed": true, "evidence": "Agent output: 'TFX_CLI_MODE: Set to codex for every Phase 3 call'"},
+     {"text": "Changes MCP profile: designer->implement, writer->analyze", "passed": true, "evidence": "Agent output: writer→analyze, designer→implement (changed from docs)"}
+   ],
+   "summary": {"passed": 4, "failed": 0, "total": 4, "pass_rate": 1.0}
+ }