forge-next 0.1.1__tar.gz

Files changed (147)
  1. forge_next-0.1.1/PKG-INFO +297 -0
  2. forge_next-0.1.1/README.md +288 -0
  3. forge_next-0.1.1/forge_codex/__init__.py +7 -0
  4. forge_next-0.1.1/forge_codex/assets/__init__.py +2 -0
  5. forge_next-0.1.1/forge_codex/assets/prompts/__init__.py +2 -0
  6. forge_next-0.1.1/forge_codex/assets/prompts/code-review/architecture_check.md +78 -0
  7. forge_next-0.1.1/forge_codex/assets/prompts/code-review/deep_dive.md +42 -0
  8. forge_next-0.1.1/forge_codex/assets/prompts/code-review/diff_analysis.md +73 -0
  9. forge_next-0.1.1/forge_codex/assets/prompts/code-review/discussion.md +48 -0
  10. forge_next-0.1.1/forge_codex/assets/prompts/code-review/mode_selection.md +45 -0
  11. forge_next-0.1.1/forge_codex/assets/prompts/code-review/report.md +42 -0
  12. forge_next-0.1.1/forge_codex/assets/prompts/code-review/security_scan.md +76 -0
  13. forge_next-0.1.1/forge_codex/assets/prompts/code-review/target_detection.md +31 -0
  14. forge_next-0.1.1/forge_codex/assets/prompts/develop/approval.md +30 -0
  15. forge_next-0.1.1/forge_codex/assets/prompts/develop/handoff.md +21 -0
  16. forge_next-0.1.1/forge_codex/assets/prompts/develop/investigation.md +25 -0
  17. forge_next-0.1.1/forge_codex/assets/prompts/develop/investigation_review.md +14 -0
  18. forge_next-0.1.1/forge_codex/assets/prompts/develop/scope.md +38 -0
  19. forge_next-0.1.1/forge_codex/assets/prompts/develop/solution.md +148 -0
  20. forge_next-0.1.1/forge_codex/assets/prompts/develop/startup.md +25 -0
  21. forge_next-0.1.1/forge_codex/assets/prompts/diagnose/analyze.md +31 -0
  22. forge_next-0.1.1/forge_codex/assets/prompts/diagnose/decompose.md +17 -0
  23. forge_next-0.1.1/forge_codex/assets/prompts/diagnose/define.md +34 -0
  24. forge_next-0.1.1/forge_codex/assets/prompts/diagnose/evidence.md +26 -0
  25. forge_next-0.1.1/forge_codex/assets/prompts/diagnose/quick_fix.md +25 -0
  26. forge_next-0.1.1/forge_codex/assets/prompts/diagnose/report.md +26 -0
  27. forge_next-0.1.1/forge_codex/assets/prompts/diagnose/solutions.md +22 -0
  28. forge_next-0.1.1/forge_codex/assets/prompts/implement/branch_setup.md +42 -0
  29. forge_next-0.1.1/forge_codex/assets/prompts/implement/documentation.md +28 -0
  30. forge_next-0.1.1/forge_codex/assets/prompts/implement/handoff.md +33 -0
  31. forge_next-0.1.1/forge_codex/assets/prompts/implement/integration_check.md +55 -0
  32. forge_next-0.1.1/forge_codex/assets/prompts/implement/plan_detect.md +35 -0
  33. forge_next-0.1.1/forge_codex/assets/prompts/implement/wave_complete.md +32 -0
  34. forge_next-0.1.1/forge_codex/assets/prompts/implement/wave_dispatch.md +53 -0
  35. forge_next-0.1.1/forge_codex/assets/prompts/implement/wave_review.md +58 -0
  36. forge_next-0.1.1/forge_codex/assets/prompts/plan/approval.md +34 -0
  37. forge_next-0.1.1/forge_codex/assets/prompts/plan/architecture.md +17 -0
  38. forge_next-0.1.1/forge_codex/assets/prompts/plan/context.md +12 -0
  39. forge_next-0.1.1/forge_codex/assets/prompts/plan/creation.md +29 -0
  40. forge_next-0.1.1/forge_codex/assets/prompts/plan/handoff.md +29 -0
  41. forge_next-0.1.1/forge_codex/assets/prompts/plan/review_loop.md +17 -0
  42. forge_next-0.1.1/forge_codex/assets/prompts/post/code_quality.md +42 -0
  43. forge_next-0.1.1/forge_codex/assets/prompts/post/completeness_audit.md +42 -0
  44. forge_next-0.1.1/forge_codex/assets/prompts/post/correctness.md +53 -0
  45. forge_next-0.1.1/forge_codex/assets/prompts/post/operational_readiness.md +74 -0
  46. forge_next-0.1.1/forge_codex/assets/prompts/post/performance.md +72 -0
  47. forge_next-0.1.1/forge_codex/assets/prompts/pre/codebase_alignment.md +37 -0
  48. forge_next-0.1.1/forge_codex/assets/prompts/pre/completeness.md +41 -0
  49. forge_next-0.1.1/forge_codex/assets/prompts/pre/feasibility.md +45 -0
  50. forge_next-0.1.1/forge_codex/assets/prompts/pre/risk_dependencies.md +82 -0
  51. forge_next-0.1.1/forge_codex/assets/prompts/report.md +58 -0
  52. forge_next-0.1.1/forge_codex/assets/prompts/review/findings_aggregation.md +31 -0
  53. forge_next-0.1.1/forge_codex/assets/prompts/review/remediation.md +35 -0
  54. forge_next-0.1.1/forge_codex/assets/prompts/review/team_dispatch.md +30 -0
  55. forge_next-0.1.1/forge_codex/assets/prompts/shared/discussion.md +27 -0
  56. forge_next-0.1.1/forge_codex/assets/prompts/shared/plan_parsing.md +21 -0
  57. forge_next-0.1.1/forge_codex/assets/prompts/test/context.md +36 -0
  58. forge_next-0.1.1/forge_codex/assets/prompts/test/coverage_gaps.md +72 -0
  59. forge_next-0.1.1/forge_codex/assets/prompts/test/discovery.md +60 -0
  60. forge_next-0.1.1/forge_codex/assets/prompts/test/execution.md +67 -0
  61. forge_next-0.1.1/forge_codex/assets/prompts/test/failure_analysis.md +58 -0
  62. forge_next-0.1.1/forge_codex/assets/prompts/test/flow_author.md +164 -0
  63. forge_next-0.1.1/forge_codex/assets/prompts/test/flow_context.md +9 -0
  64. forge_next-0.1.1/forge_codex/assets/prompts/test/flow_execute.md +115 -0
  65. forge_next-0.1.1/forge_codex/assets/prompts/test/flow_recommendation.md +140 -0
  66. forge_next-0.1.1/forge_codex/assets/prompts/test/flow_report.md +177 -0
  67. forge_next-0.1.1/forge_codex/assets/prompts/test/flow_scaffold.md +129 -0
  68. forge_next-0.1.1/forge_codex/assets/prompts/test/flow_scope.md +162 -0
  69. forge_next-0.1.1/forge_codex/assets/prompts/test/report.md +54 -0
  70. forge_next-0.1.1/forge_codex/assets/templates/__init__.py +2 -0
  71. forge_next-0.1.1/forge_codex/assets/templates/adr-template.md +69 -0
  72. forge_next-0.1.1/forge_codex/assets/templates/autonomy-levels.md +99 -0
  73. forge_next-0.1.1/forge_codex/assets/templates/beads-integration.md +80 -0
  74. forge_next-0.1.1/forge_codex/assets/templates/brainstorming-gates.md +296 -0
  75. forge_next-0.1.1/forge_codex/assets/templates/brainstorming.md +323 -0
  76. forge_next-0.1.1/forge_codex/assets/templates/code-smells.md +78 -0
  77. forge_next-0.1.1/forge_codex/assets/templates/codex-runtime.md +69 -0
  78. forge_next-0.1.1/forge_codex/assets/templates/dashboard.md +84 -0
  79. forge_next-0.1.1/forge_codex/assets/templates/data-analysis.md +288 -0
  80. forge_next-0.1.1/forge_codex/assets/templates/five-why-protocol.md +97 -0
  81. forge_next-0.1.1/forge_codex/assets/templates/handoff-protocol.md +136 -0
  82. forge_next-0.1.1/forge_codex/assets/templates/memory-README.md +61 -0
  83. forge_next-0.1.1/forge_codex/assets/templates/memory-protocol.md +97 -0
  84. forge_next-0.1.1/forge_codex/assets/templates/mock-flow-types.md +529 -0
  85. forge_next-0.1.1/forge_codex/assets/templates/parallel-dispatch.md +166 -0
  86. forge_next-0.1.1/forge_codex/assets/templates/pre-mortem.md +78 -0
  87. forge_next-0.1.1/forge_codex/assets/templates/review-loop.md +74 -0
  88. forge_next-0.1.1/forge_codex/assets/templates/scoring-rubric.md +48 -0
  89. forge_next-0.1.1/forge_codex/assets/templates/stage-approval.md +101 -0
  90. forge_next-0.1.1/forge_codex/assets/templates/stage-document.md +109 -0
  91. forge_next-0.1.1/forge_codex/assets/templates/stage-implement.md +90 -0
  92. forge_next-0.1.1/forge_codex/assets/templates/stage-investigate.md +69 -0
  93. forge_next-0.1.1/forge_codex/assets/templates/stage-plan.md +91 -0
  94. forge_next-0.1.1/forge_codex/assets/templates/stage-review.md +115 -0
  95. forge_next-0.1.1/forge_codex/assets/templates/stage-solution.md +79 -0
  96. forge_next-0.1.1/forge_codex/assets/templates/systematic-debugging.md +162 -0
  97. forge_next-0.1.1/forge_codex/assets/templates/tdd-protocol.md +213 -0
  98. forge_next-0.1.1/forge_codex/assets/templates/user-questions.md +42 -0
  99. forge_next-0.1.1/forge_codex/assets/templates/verification-protocol.md +219 -0
  100. forge_next-0.1.1/forge_codex/assets/templates/writing-plans.md +166 -0
  101. forge_next-0.1.1/forge_codex/cli.py +409 -0
  102. forge_next-0.1.1/forge_next.egg-info/PKG-INFO +297 -0
  103. forge_next-0.1.1/forge_next.egg-info/SOURCES.txt +145 -0
  104. forge_next-0.1.1/forge_next.egg-info/dependency_links.txt +1 -0
  105. forge_next-0.1.1/forge_next.egg-info/entry_points.txt +2 -0
  106. forge_next-0.1.1/forge_next.egg-info/top_level.txt +2 -0
  107. forge_next-0.1.1/pyproject.toml +26 -0
  108. forge_next-0.1.1/scripts/__init__.py +0 -0
  109. forge_next-0.1.1/scripts/code_review/__init__.py +0 -0
  110. forge_next-0.1.1/scripts/code_review/code_review.py +415 -0
  111. forge_next-0.1.1/scripts/develop/__init__.py +0 -0
  112. forge_next-0.1.1/scripts/develop/develop.py +372 -0
  113. forge_next-0.1.1/scripts/diagnose/__init__.py +0 -0
  114. forge_next-0.1.1/scripts/diagnose/decision_matrix.py +180 -0
  115. forge_next-0.1.1/scripts/diagnose/diagnostic_report.py +239 -0
  116. forge_next-0.1.1/scripts/diagnose/fmea_score.py +172 -0
  117. forge_next-0.1.1/scripts/diagnose/git_hotspots.py +229 -0
  118. forge_next-0.1.1/scripts/diagnose/log_analyzer.py +252 -0
  119. forge_next-0.1.1/scripts/diagnose/orchestrate.py +430 -0
  120. forge_next-0.1.1/scripts/evaluate/__init__.py +0 -0
  121. forge_next-0.1.1/scripts/evaluate/evaluate.py +566 -0
  122. forge_next-0.1.1/scripts/evaluate/mode_detector.py +80 -0
  123. forge_next-0.1.1/scripts/evaluate/plan_resolver.py +127 -0
  124. forge_next-0.1.1/scripts/evaluate/state.py +117 -0
  125. forge_next-0.1.1/scripts/evaluate/template_engine.py +91 -0
  126. forge_next-0.1.1/scripts/implement/__init__.py +0 -0
  127. forge_next-0.1.1/scripts/implement/implement.py +604 -0
  128. forge_next-0.1.1/scripts/plan/__init__.py +0 -0
  129. forge_next-0.1.1/scripts/plan/plan.py +512 -0
  130. forge_next-0.1.1/scripts/shared/__init__.py +0 -0
  131. forge_next-0.1.1/scripts/shared/findings.py +82 -0
  132. forge_next-0.1.1/scripts/shared/orchestrator.py +1151 -0
  133. forge_next-0.1.1/scripts/shared/report.py +81 -0
  134. forge_next-0.1.1/scripts/shared/resume.py +482 -0
  135. forge_next-0.1.1/scripts/shared/skill_chain.py +43 -0
  136. forge_next-0.1.1/scripts/smoke.py +261 -0
  137. forge_next-0.1.1/scripts/test/__init__.py +0 -0
  138. forge_next-0.1.1/scripts/test/_cassette.py +45 -0
  139. forge_next-0.1.1/scripts/test/_scenario_index.py +179 -0
  140. forge_next-0.1.1/scripts/test/_sidecar.py +139 -0
  141. forge_next-0.1.1/scripts/test/flow_types.py +264 -0
  142. forge_next-0.1.1/scripts/test/test.py +775 -0
  143. forge_next-0.1.1/scripts/test/test_layout.py +510 -0
  144. forge_next-0.1.1/setup.cfg +4 -0
  145. forge_next-0.1.1/tests/test_launcher_smoke.py +43 -0
  146. forge_next-0.1.1/tests/test_regressions.py +1296 -0
  147. forge_next-0.1.1/tests/test_shared_orchestrator.py +46 -0
@@ -0,0 +1,297 @@
1
+ Metadata-Version: 2.4
2
+ Name: forge-next
3
+ Version: 0.1.1
4
+ Summary: Forge Codex: install-once workflow orchestrators for Cursor/Codex
5
+ Author: Forge Codex
6
+ License: MIT
7
+ Requires-Python: >=3.10
8
+ Description-Content-Type: text/markdown
9
+
10
+ # forge
11
+
12
+ A Codex-native agent toolkit for structured software delivery: investigation, planning, implementation, review, testing, diagnostics, and workflow continuity across sessions.
13
+
14
+ ## Recent Changes
15
+
16
+ Mock Flows + Numbered Handoff Menu (2026-05-07):
17
+ - `forge:test --mode flows` authors end-to-end mock flows in 4 styles (scenario / BDD / HTTP-replay / workflow-dry-run). The skill detects your project layout, recommends the best-fit flow type with a confidence score, and progressively gates 8 quality criteria across scaffold → author → execute → report phases. Run with `--flow-type <type>` to override the recommendation, or `--framework`/`--entry-point`/`--roles` to fine-tune detection. Reference `templates/mock-flow-types.md` for per-type details.
18
+ - Every skill's final step now presents a numbered handoff menu instead of a single hardcoded next-skill string. Reply with "yes" / "1" / "default" or pick a numbered alternative to steer the workflow. Use `scripts/smoke.py` as a CI-eligible end-to-end harness for the new flows mode.
19
+
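The numbered handoff menu described above can be pictured as a small reply resolver. This is a minimal sketch, not the package's implementation: the function name `resolve_handoff_reply`, the `options` list, and the rule that the default sits at position 1 are all assumptions made for illustration. The accepted replies ("yes", "1", "default", a number, or a literal command) come from the README text.

```python
from typing import Optional, Sequence

# Replies that accept the first (default) menu entry, per the README wording.
DEFAULT_REPLIES = {"yes", "1", "default"}


def resolve_handoff_reply(reply: str, options: Sequence[str]) -> Optional[str]:
    """Map a user's reply to one of the numbered menu options.

    `options` is a hypothetical ordered list of next-step commands with the
    default at index 0. Returns None when the reply matches nothing.
    """
    if not options:
        return None
    text = reply.strip().lower()
    if text in DEFAULT_REPLIES:
        return options[0]
    if text.isdecimal():
        index = int(text) - 1  # menu entries are numbered from 1
        return options[index] if 0 <= index < len(options) else None
    # A literal command is accepted only if it is one of the listed options.
    for option in options:
        if text == option.lower():
            return option
    return None


menu = ["forge plan --step 1", "forge evaluate --step 1 --mode review", "forge status"]
print(resolve_handoff_reply("yes", menu))
print(resolve_handoff_reply("2", menu))
print(resolve_handoff_reply("7", menu))
```

Returning `None` for an unmatched reply leaves it to the caller to decide whether to re-prompt or fall back to the default.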
20
+ State-lifecycle and authoring fixes (2026-05-07):
21
+ - The `plan` skill now materializes a section-marker skeleton at step 1 (sourced from `templates/writing-plans.md`) and refuses to mark step 6 complete while any `<!-- FORGE_SKELETON: ... -->` markers remain.
22
+ - `forge resume --cleanup` lists state files eligible for cleanup (dry-run by default). Add `--force` to delete; `--all-stale --force` for migration mode (clears every state file regardless of age).
23
+ - Re-running step 1 of any skill now aborts when an in-progress same-skill session exists. To resume, use the `--state <path>` flag or run `resume.py`.
24
+ - Over-cap `--step` invocations (e.g., `--step 9` on an 8-step skill) now print a friendly "skill complete" message and exit 0 instead of erroring.
25
+ - A `failure_count` field tracks consecutive same-step retries; after two failures, `resume.py` emits an "inspect logs" hint instead of producing a third retry command.
26
+
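Two of the state-lifecycle fixes above, the over-cap `--step` behavior and the `failure_count` retry limit, can be sketched as follows. The function names, the message wording, and the exit code for a step below 1 are assumptions for illustration only; the "exit 0 on over-cap" rule and the "two failures" threshold are the parts taken from the notes.

```python
# After this many consecutive failures of the same step, stop suggesting
# retries. The value 2 comes from the README; the constant name is invented.
MAX_CONSECUTIVE_FAILURES = 2


def handle_step_request(step: int, total_steps: int) -> tuple[int, str]:
    """Return (exit_code, message) for a `--step` invocation."""
    if step < 1:
        # Assumption: the README does not say how invalid low steps are handled.
        return 2, f"invalid step {step}"
    if step > total_steps:
        # Over-cap requests are treated as success, not as an error.
        return 0, f"skill complete: all {total_steps} steps are done"
    return 0, f"running step {step} of {total_steps}"


def resume_hint(skill: str, step: int, failure_count: int) -> str:
    """Suggest a retry command, or an inspection hint after repeated failures."""
    if failure_count >= MAX_CONSECUTIVE_FAILURES:
        return f"step {step} of {skill} failed {failure_count} times: inspect logs before retrying"
    return f"forge {skill} --step {step}"


print(handle_step_request(9, 8))
print(resume_hint("plan", 3, failure_count=1))
print(resume_hint("plan", 3, failure_count=2))
```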
27
+ ## Install (pipx) — run in any repo
28
+
29
+ This repo ships a global `forge` launcher so you can install once and run
30
+ workflows from any target repository (without copying `scripts/` into each repo).
31
+
32
+ ```bash
33
+ pipx install forge-next
34
+ ```
35
+
36
+ Then, from any target repo:
37
+
38
+ ```bash
39
+ forge evaluate --step 1 --mode review
40
+ forge plan --step 1
41
+ forge status
42
+ ```
43
+
44
+ Use `--repo <path>` to target a different repository root.
45
+
46
+ ## Quick Start (dev / contributors)
47
+
48
+ ```bash
49
+ # Clone the repo
50
+ git clone https://github.com/your-org/forge-codex.git /path/to/forge-codex
51
+
52
+ # Enter the project
53
+ cd /path/to/forge-codex
54
+ ```
55
+
56
+ Then use the repo as the home for Codex-oriented workflow assets, skills, prompts, and orchestrators.
57
+
58
+ ## Codex Config
59
+
60
+ If you want local Codex sessions to treat Forge skill invocation as implicit
61
+ permission to use Forge agents, add a `developer_instructions` block to
62
+ `~/.codex/config.toml`:
63
+
64
+ ```toml
65
+ developer_instructions = """
66
+ Invoking any `forge:*` skill implicitly authorizes the agent dispatch required by that workflow. Do not require the user to separately ask for delegation, sub-agents, or parallel agent work after invoking a Forge skill.
67
+
68
+ At the start of a fresh interactive session, begin the first user-visible response with exactly: Ready Player 1?
69
+ """
70
+ ```
71
+
72
+ You can verify the injected developer prompt with:
73
+
74
+ ```bash
75
+ codex debug prompt-input
76
+ ```
77
+
78
+ If a higher-priority launcher or hosted integration injects its own developer
79
+ instructions, those may still override or compete with your local config.
80
+
81
+ ## Goals
82
+
83
+ - Turn a structured multi-skill workflow model into a Codex-first environment
84
+ - Support multi-step, resumable engineering workflows instead of one-shot prompts
85
+ - Separate skill orchestration from reusable methodology templates
86
+ - Preserve handoff context between phases and between sessions
87
+ - Make review, verification, and diagnostics first-class parts of the workflow
88
+
89
+ ## Planned Skills
90
+
91
+ | Skill | Purpose | Typical Invocation |
92
+ |-------|---------|--------------------|
93
+ | **develop** | Investigate a problem space and shape solution options | `develop <problem or feature>` |
94
+ | **plan** | Convert an approved direction into an implementation plan | `plan` |
95
+ | **evaluate** | Review a plan before or after implementation | `evaluate <plan>` |
96
+ | **implement** | Execute a plan in ordered or parallel waves | `implement` |
97
+ | **code-review** | Run structured review modes against code changes | `code-review <target>` |
98
+ | **test** | Execute tests, analyze failures, and identify coverage gaps | `test` |
99
+ | **diagnose** | Perform root-cause analysis on bugs and regressions | `diagnose <issue>` |
100
+ | **status** | Show workflow position, open findings, and next action | `status` |
101
+ | **resume** | Continue the active workflow from persisted state | `resume` |
102
+
103
+ ## Forge Skill Invocation Contract
104
+
105
+ Invoking a Forge workflow skill is intended to be enough to authorize the agent team that skill needs.
106
+
107
+ - `forge:develop`, `forge:plan`, `forge:implement`, `forge:code-review`, `forge:test`, and `forge:diagnose` should auto-dispatch the relevant Forge agents when their workflow calls for it.
108
+ - `forge:evaluate` should auto-dispatch the review team when team/review mode is active.
109
+ - Users should not need to separately ask for "sub-agents", "delegation", or "parallel agent work" after invoking a Forge skill.
110
+ - If the surrounding Codex session policy blocks agent spawning, that should be surfaced as an environment limitation rather than treated as normal Forge behavior.
111
+ - Every spawned agent must be closed (`close_agent`) once it reports back or is no longer useful. Forge skills never leave agents open across wave / step / phase boundaries — Codex caps concurrent agents and leaked sessions eventually block further dispatch. See `templates/codex-runtime.md` for the lifecycle pattern.
112
+ - At the end of each skill's workflow, a numbered handoff menu replaces the previous single next-skill prompt. Users can reply "yes", "1", "default", or a literal command; the menu makes workflow alternatives explicit.
113
+
114
+ ## Workflow Model
115
+
116
+ ```text
117
+ develop -> plan -> evaluate (pre) -> implement -> code-review -> test -> diagnose (if needed)
118
+
119
+ At any point:
120
+ - evaluate can run as a standalone critique workflow
121
+ - diagnose can run as an ad-hoc incident workflow
122
+ - status and resume can inspect or continue the current state
123
+ ```
124
+
125
+ The intended model is composable rather than monolithic:
126
+
127
+ - Each skill can run on its own
128
+ - Skills can hand off context to the next skill in the chain
129
+ - State files make interrupted workflows resumable
130
+ - Review loops enforce quality gates before moving downstream
131
+
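The default chain in the diagram above amounts to an ordered lookup. The sketch below assumes a hypothetical `SKILL_CHAIN` constant and `next_skill` helper; it is not taken from the package's own `skill_chain.py`, whose contents are not shown here.

```python
from typing import Optional

# Order copied from the Workflow Model diagram. The constant and function
# names are illustrative, not the package's API.
SKILL_CHAIN = ["develop", "plan", "evaluate", "implement", "code-review", "test", "diagnose"]


def next_skill(current: str) -> Optional[str]:
    """Return the skill that follows `current` in the default chain, if any."""
    if current not in SKILL_CHAIN:
        return None  # e.g. status and resume sit outside the chain
    index = SKILL_CHAIN.index(current)
    return SKILL_CHAIN[index + 1] if index + 1 < len(SKILL_CHAIN) else None


print(next_skill("plan"))
print(next_skill("diagnose"))
```

Because every skill can also run on its own, a lookup like this would only supply the default suggestion; the user remains free to pick another entry.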
132
+ ## Agents
133
+
134
+ The Codex version is expected to use a small set of specialized roles rather than a single undifferentiated agent.
135
+
136
+ | Agent | Role |
137
+ |-------|------|
138
+ | **architect** | Investigation lead, solution design, architecture review |
139
+ | **planner** | Implementation planning, sequencing, dependency mapping |
140
+ | **backend-dev** | Backend implementation with tests |
141
+ | **frontend-dev** | Frontend implementation with tests |
142
+ | **critic** | Challenges assumptions, stresses weak logic, finds hidden risks |
143
+ | **qa-reviewer** | Validates behavior, testing quality, and verification depth |
144
+ | **security-reviewer** | Reviews security-sensitive changes and operational risk |
145
+ | **doc-writer** | Produces user-facing and developer-facing documentation and tracks documentation debt |
146
+
147
+ ## Methodology Coverage
148
+
149
+ `forge-codex` is intended to bundle practical engineering methods instead of vague “best practices”.
150
+
151
+ **Investigation and diagnostics**
152
+
153
+ - 5 Whys
154
+ - Kepner-Tregoe IS/IS-NOT
155
+ - Fishbone / Ishikawa
156
+ - FMEA
157
+ - MECE decomposition
158
+ - Bayesian evidence updates
159
+ - hypothesis-driven debugging
160
+ - change analysis
161
+ - counterfactual reasoning
162
+ - barrier analysis
163
+
164
+ **Solution design**
165
+
166
+ - divergent and convergent option generation
167
+ - trade-off scoring
168
+ - pre-mortem analysis
169
+ - reversibility checks
170
+ - constraint analysis
171
+
172
+ **Planning**
173
+
174
+ - phased execution
175
+ - dependency mapping
176
+ - parallelization opportunities
177
+ - rollback planning
178
+ - explicit verification steps
179
+ - documentation-in-the-loop
180
+
181
+ **Review and testing**
182
+
183
+ - structured finding severity
184
+ - behavior verification
185
+ - edge-case analysis
186
+ - regression coverage review
187
+ - failure triage
188
+ - operational readiness checks
189
+
190
+ ## Architecture
191
+
192
+ The repo is expected to follow a script-driven orchestration model.
193
+
194
+ - **Skill orchestrators** drive state progression for each workflow
195
+ - **Prompt templates** provide repeatable phase instructions
196
+ - **Shared templates** hold reusable review and planning patterns
197
+ - **State files** persist current step, completed step, findings, and handoffs
198
+ - **Memory files** carry context between adjacent skills
199
+ - **Reports** provide durable outputs from evaluate, review, and diagnose flows
200
+
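As an illustration of the state-file idea in the list above, the following sketch persists a workflow state as JSON and reads it back. The `WorkflowState` fields mirror the items named in the list (current step, completed steps, findings, handoffs), but the class, the field names, and the file layout are assumptions, not the package's actual schema.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class WorkflowState:
    """Hypothetical state record; field names are illustrative."""
    skill: str
    current_step: int = 1
    completed_steps: list[int] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    handoff: str = ""


def save_state(state: WorkflowState, path: Path) -> None:
    """Write the state as JSON, then swap it into place in one step."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(asdict(state), indent=2), encoding="utf-8")
    os.replace(tmp, path)  # readers see either the old file or the new one


def load_state(path: Path) -> WorkflowState:
    """Rebuild the state record from its JSON file."""
    return WorkflowState(**json.loads(path.read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = Path(tmp_dir) / "state" / "plan.json"
    state = WorkflowState(skill="plan", current_step=3, completed_steps=[1, 2])
    save_state(state, state_path)
    restored = load_state(state_path)
    print(restored == state)
```

Writing to a temporary file and renaming it is one common way to keep an interrupted run from leaving a truncated state file, which matters when resumability depends on that file.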
201
+ ## State and Continuity
202
+
203
+ Cross-session continuity is a core design goal.
204
+
205
+ - Each active skill should persist its own state file
206
+ - Resume logic should distinguish between a true conflict and an unrelated active session
207
+ - Standalone skills should not pause just because another non-conflicting workflow exists
208
+ - Handoff files should summarize completed work and recommend the next step
209
+ - Status tooling should surface active sessions, findings, and next actions without requiring manual inspection
210
+
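The distinction between a true conflict and an unrelated active session can be sketched as a simple predicate. The rule used here (same skill, still in progress) follows the Recent Changes note about re-running step 1; the `ActiveSession` type and `is_conflict` function are hypothetical, and the real resume logic may weigh more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveSession:
    """Hypothetical summary of one persisted state file."""
    skill: str
    completed: bool


def is_conflict(starting_skill: str, sessions: list[ActiveSession]) -> bool:
    """True only when an unfinished session for the same skill exists."""
    return any(s.skill == starting_skill and not s.completed for s in sessions)


sessions = [
    ActiveSession(skill="plan", completed=False),
    ActiveSession(skill="diagnose", completed=True),
]
print(is_conflict("plan", sessions))      # same skill, still in progress
print(is_conflict("diagnose", sessions))  # earlier diagnose run has finished
print(is_conflict("test", sessions))      # unrelated workflow, no reason to pause
```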
211
+ ## Design Principles
212
+
213
+ - **Codex-first**: optimize for Codex workflows, not a direct port of another assistant’s toolkit model
214
+ - **Actionable outputs**: produce plans, findings, commands, and reports that can be used immediately
215
+ - **Resumable by default**: interrupted work should be recoverable
216
+ - **Verification over narration**: claims should be tied to code, tests, or runtime evidence
217
+ - **Composable workflows**: users should be able to run a single skill or the full chain
218
+ - **Minimal hidden state**: the workflow should be inspectable from files in the repo
219
+
220
+ ## Current Project Structure
221
+
222
+ ```text
223
+ forge-codex/
224
+ ├── README.md
225
+ ├── agents/
226
+ ├── prompts/
227
+ │ ├── develop/
228
+ │ ├── plan/
229
+ │ ├── evaluate/
230
+ │ ├── implement/
231
+ │ ├── code-review/
232
+ │ ├── test/
233
+ │ └── diagnose/
234
+ ├── templates/
235
+ │ ├── review/
236
+ │ ├── planning/
237
+ │ ├── reporting/
238
+ │ └── handoff/
239
+ ├── scripts/
240
+ │ ├── shared/
241
+ │ ├── develop/
242
+ │ ├── plan/
243
+ │ ├── evaluate/
244
+ │ ├── implement/
245
+ │ ├── code-review/
246
+ │ ├── test/
247
+ │ └── diagnose/
248
+ ├── skills/
249
+ │ ├── develop/
250
+ │ ├── plan/
251
+ │ ├── evaluate/
252
+ │ ├── implement/
253
+ │ ├── code-review/
254
+ │ ├── test/
255
+ │ ├── diagnose/
256
+ │ ├── status/
257
+ │ └── resume/
258
+ └── templates/
259
+ ```
260
+
261
+ ## Initial Roadmap
262
+
263
+ ### Phase 1: Skeleton
264
+
265
+ - define repository layout
266
+ - add shared orchestration primitives
267
+ - add `status` and `resume` foundations
268
+ - document the state model
269
+
270
+ ### Phase 2: Core Skills
271
+
272
+ - implement `evaluate`
273
+ - implement `diagnose`
274
+ - implement `develop`
275
+ - add report generation and state cleanup rules
276
+
277
+ ### Phase 3: Delivery Flow
278
+
279
+ - implement `plan`
280
+ - implement `implement`
281
+ - implement `code-review`
282
+ - implement `test`
283
+
284
+ ### Phase 4: Hardening
285
+
286
+ - add regression tests for state handling
287
+ - verify conflict detection logic
288
+ - tighten workflow transitions
289
+ - document extension points for future agents and skills
290
+
291
+ ## Current Status
292
+
293
+ This repository now contains the copied Codex workflow assets, reorganized into a Codex-first layout. Assistant-specific packaging has been removed, and the top-level structure has been normalized around `agents/`, `skills/`, `scripts/`, `prompts/`, and `templates/`.
294
+
295
+ ## License
296
+
297
+ MIT
@@ -0,0 +1,288 @@
1
+ # forge
2
+
3
+ A Codex-native agent toolkit for structured software delivery: investigation, planning, implementation, review, testing, diagnostics, and workflow continuity across sessions.
4
+
5
+ ## Recent Changes
6
+
7
+ Mock Flows + Numbered Handoff Menu (2026-05-07):
8
+ - `forge:test --mode flows` authors end-to-end mock flows in 4 styles (scenario / BDD / HTTP-replay / workflow-dry-run). The skill detects your project layout, recommends the best-fit flow type with a confidence score, and progressively gates 8 quality criteria across scaffold → author → execute → report phases. Run with `--flow-type <type>` to override the recommendation, or `--framework`/`--entry-point`/`--roles` to fine-tune detection. Reference `templates/mock-flow-types.md` for per-type details.
9
+ - Every skill's final step now presents a numbered handoff menu instead of a single hardcoded next-skill string. Reply with "yes" / "1" / "default" or pick a numbered alternative to steer the workflow. Use `scripts/smoke.py` as a CI-eligible end-to-end harness for the new flows mode.
10
+
11
+ State-lifecycle and authoring fixes (2026-05-07):
12
+ - The `plan` skill now materializes a section-marker skeleton at step 1 (sourced from `templates/writing-plans.md`) and refuses to mark step 6 complete while any `<!-- FORGE_SKELETON: ... -->` markers remain.
13
+ - `forge resume --cleanup` lists state files eligible for cleanup (dry-run by default). Add `--force` to delete; `--all-stale --force` for migration mode (clears every state file regardless of age).
14
+ - Re-running step 1 of any skill now aborts when an in-progress same-skill session exists. To resume, use the `--state <path>` flag or run `resume.py`.
15
+ - Over-cap `--step` invocations (e.g., `--step 9` on an 8-step skill) now print a friendly "skill complete" message and exit 0 instead of erroring.
16
+ - A `failure_count` field tracks consecutive same-step retries; after two failures, `resume.py` emits an "inspect logs" hint instead of producing a third retry command.
17
+
18
+ ## Install (pipx) — run in any repo
19
+
20
+ This repo ships a global `forge` launcher so you can install once and run
21
+ workflows from any target repository (without copying `scripts/` into each repo).
22
+
23
+ ```bash
24
+ pipx install forge-next
25
+ ```
26
+
27
+ Then, from any target repo:
28
+
29
+ ```bash
30
+ forge evaluate --step 1 --mode review
31
+ forge plan --step 1
32
+ forge status
33
+ ```
34
+
35
+ Use `--repo <path>` to target a different repository root.
36
+
37
+ ## Quick Start (dev / contributors)
38
+
39
+ ```bash
40
+ # Clone the repo
41
+ git clone https://github.com/your-org/forge-codex.git /path/to/forge-codex
42
+
43
+ # Enter the project
44
+ cd /path/to/forge-codex
45
+ ```
46
+
47
+ Then use the repo as the home for Codex-oriented workflow assets, skills, prompts, and orchestrators.
48
+
49
+ ## Codex Config
50
+
51
+ If you want local Codex sessions to treat Forge skill invocation as implicit
52
+ permission to use Forge agents, add a `developer_instructions` block to
53
+ `~/.codex/config.toml`:
54
+
55
+ ```toml
56
+ developer_instructions = """
57
+ Invoking any `forge:*` skill implicitly authorizes the agent dispatch required by that workflow. Do not require the user to separately ask for delegation, sub-agents, or parallel agent work after invoking a Forge skill.
58
+
59
+ At the start of a fresh interactive session, begin the first user-visible response with exactly: Ready Player 1?
60
+ """
61
+ ```
62
+
63
+ You can verify the injected developer prompt with:
64
+
65
+ ```bash
66
+ codex debug prompt-input
67
+ ```
68
+
69
+ If a higher-priority launcher or hosted integration injects its own developer
70
+ instructions, those may still override or compete with your local config.
71
+
72
+ ## Goals
73
+
74
+ - Turn a structured multi-skill workflow model into a Codex-first environment
75
+ - Support multi-step, resumable engineering workflows instead of one-shot prompts
76
+ - Separate skill orchestration from reusable methodology templates
77
+ - Preserve handoff context between phases and between sessions
78
+ - Make review, verification, and diagnostics first-class parts of the workflow
79
+
80
+ ## Planned Skills
81
+
82
+ | Skill | Purpose | Typical Invocation |
83
+ |-------|---------|--------------------|
84
+ | **develop** | Investigate a problem space and shape solution options | `develop <problem or feature>` |
85
+ | **plan** | Convert an approved direction into an implementation plan | `plan` |
86
+ | **evaluate** | Review a plan before or after implementation | `evaluate <plan>` |
87
+ | **implement** | Execute a plan in ordered or parallel waves | `implement` |
88
+ | **code-review** | Run structured review modes against code changes | `code-review <target>` |
89
+ | **test** | Execute tests, analyze failures, and identify coverage gaps | `test` |
90
+ | **diagnose** | Perform root-cause analysis on bugs and regressions | `diagnose <issue>` |
91
+ | **status** | Show workflow position, open findings, and next action | `status` |
92
+ | **resume** | Continue the active workflow from persisted state | `resume` |
93
+
94
+ ## Forge Skill Invocation Contract
95
+
96
+ Invoking a Forge workflow skill is intended to be enough to authorize the agent team that skill needs.
97
+
98
+ - `forge:develop`, `forge:plan`, `forge:implement`, `forge:code-review`, `forge:test`, and `forge:diagnose` should auto-dispatch the relevant Forge agents when their workflow calls for it.
99
+ - `forge:evaluate` should auto-dispatch the review team when team/review mode is active.
100
+ - Users should not need to separately ask for "sub-agents", "delegation", or "parallel agent work" after invoking a Forge skill.
101
+ - If the surrounding Codex session policy blocks agent spawning, that should be surfaced as an environment limitation rather than treated as normal Forge behavior.
102
+ - Every spawned agent must be closed (`close_agent`) once it reports back or is no longer useful. Forge skills never leave agents open across wave / step / phase boundaries — Codex caps concurrent agents and leaked sessions eventually block further dispatch. See `templates/codex-runtime.md` for the lifecycle pattern.
103
+ - At the end of each skill's workflow, a numbered handoff menu replaces the previous single next-skill prompt. Users can reply "yes", "1", "default", or a literal command; the menu makes workflow alternatives explicit.
104
+
105
+ ## Workflow Model
106
+
107
+ ```text
108
+ develop -> plan -> evaluate (pre) -> implement -> code-review -> test -> diagnose (if needed)
109
+
110
+ At any point:
111
+ - evaluate can run as a standalone critique workflow
112
+ - diagnose can run as an ad-hoc incident workflow
113
+ - status and resume can inspect or continue the current state
114
+ ```
115
+
116
+ The intended model is composable rather than monolithic:
117
+
118
+ - Each skill can run on its own
119
+ - Skills can hand off context to the next skill in the chain
120
+ - State files make interrupted workflows resumable
121
+ - Review loops enforce quality gates before moving downstream
122
+
123
+ ## Agents
124
+
125
+ The Codex version is expected to use a small set of specialized roles rather than a single undifferentiated agent.
126
+
127
+ | Agent | Role |
128
+ |-------|------|
129
+ | **architect** | Investigation lead, solution design, architecture review |
130
+ | **planner** | Implementation planning, sequencing, dependency mapping |
131
+ | **backend-dev** | Backend implementation with tests |
132
+ | **frontend-dev** | Frontend implementation with tests |
133
+ | **critic** | Challenges assumptions, stresses weak logic, finds hidden risks |
134
+ | **qa-reviewer** | Validates behavior, testing quality, and verification depth |
135
+ | **security-reviewer** | Reviews security-sensitive changes and operational risk |
136
+ | **doc-writer** | Produces user-facing and developer-facing documentation and tracks documentation debt |
137
+
138
+ ## Methodology Coverage
139
+
140
+ `forge-codex` is intended to bundle practical engineering methods instead of vague “best practices”.
141
+
142
+ **Investigation and diagnostics**
143
+
144
+ - 5 Whys
145
+ - Kepner-Tregoe IS/IS-NOT
146
+ - Fishbone / Ishikawa
147
+ - FMEA
148
+ - MECE decomposition
149
+ - Bayesian evidence updates
150
+ - hypothesis-driven debugging
151
+ - change analysis
152
+ - counterfactual reasoning
153
+ - barrier analysis
154
+
155
+ **Solution design**
156
+
157
+ - divergent and convergent option generation
158
+ - trade-off scoring
159
+ - pre-mortem analysis
160
+ - reversibility checks
161
+ - constraint analysis
162
+
163
+ **Planning**
164
+
165
+ - phased execution
166
+ - dependency mapping
167
+ - parallelization opportunities
168
+ - rollback planning
169
+ - explicit verification steps
170
+ - documentation-in-the-loop
171
+
172
+ **Review and testing**
173
+
174
+ - structured finding severity
175
+ - behavior verification
176
+ - edge-case analysis
177
+ - regression coverage review
178
+ - failure triage
179
+ - operational readiness checks
180
+
181
+ ## Architecture
182
+
183
+ The repo is expected to follow a script-driven orchestration model.
184
+
185
+ - **Skill orchestrators** drive state progression for each workflow
186
+ - **Prompt templates** provide repeatable phase instructions
187
+ - **Shared templates** hold reusable review and planning patterns
188
+ - **State files** persist the current step, completed steps, findings, and handoffs
189
+ - **Memory files** carry context between adjacent skills
190
+ - **Reports** provide durable outputs from evaluate, review, and diagnose flows
191
+
192
+ ## State and Continuity
193
+
194
+ Cross-session continuity is a core design goal.
195
+
196
+ - Each active skill should persist its own state file
197
+ - Resume logic should distinguish between a true conflict and an unrelated active session
198
+ - Standalone skills should not pause just because another non-conflicting workflow exists
199
+ - Handoff files should summarize completed work and recommend the next step
200
+ - Status tooling should surface active sessions, findings, and next actions without requiring manual inspection
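The goals above could be approached along the following lines. Everything specific here is an assumption made for illustration: the JSON format, the one-file-per-skill naming, the `SkillState` fields, and the rule that two sessions conflict only when they belong to different skills and share a target path.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SkillState:
    skill: str
    current_step: str
    completed_steps: list = field(default_factory=list)
    targets: list = field(default_factory=list)  # assumed field: paths this session touches


def save_state(state_dir: Path, state: SkillState) -> Path:
    """Persist one state file per skill (assumed naming: <skill>.json)."""
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"{state.skill}.json"
    path.write_text(json.dumps(asdict(state), indent=2))
    return path


def load_states(state_dir: Path) -> list:
    """Load every persisted state, ordered by file name."""
    return [SkillState(**json.loads(p.read_text())) for p in sorted(state_dir.glob("*.json"))]


def find_conflicts(new: SkillState, active: list) -> list:
    """Assumed rule: only sessions of another skill that share a target are true conflicts."""
    return [s for s in active if s.skill != new.skill and set(s.targets) & set(new.targets)]


with tempfile.TemporaryDirectory() as tmp:
    state_dir = Path(tmp)
    save_state(state_dir, SkillState("implement", "phase-2", ["phase-1"], ["src/api"]))
    save_state(state_dir, SkillState("diagnose", "analyze", [], ["infra/deploy"]))
    review = SkillState("code-review", "target_detection", [], ["src/api"])
    conflicting = find_conflicts(review, load_states(state_dir))
    print([s.skill for s in conflicting])  # ['implement']
```

Under this rule the unrelated `diagnose` session does not block the new review, which matches the intent that standalone skills should not pause for non-conflicting workflows.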
201
+
202
+ ## Design Principles
203
+
204
+ - **Codex-first**: optimize for Codex workflows, not a direct port of another assistant’s toolkit model
205
+ - **Actionable outputs**: produce plans, findings, commands, and reports that can be used immediately
206
+ - **Resumable by default**: interrupted work should be recoverable
207
+ - **Verification over narration**: claims should be tied to code, tests, or runtime evidence
208
+ - **Composable workflows**: users should be able to run a single skill or the full chain
209
+ - **Minimal hidden state**: the workflow should be inspectable from files in the repo
210
+
211
+ ## Current Project Structure
212
+
213
+ ```text
214
+ forge-codex/
215
+ ├── README.md
216
+ ├── agents/
217
+ ├── prompts/
218
+ │ ├── develop/
219
+ │ ├── plan/
220
+ │ ├── evaluate/
221
+ │ ├── implement/
222
+ │ ├── code-review/
223
+ │ ├── test/
224
+ │ └── diagnose/
225
+ ├── templates/
226
+ │ ├── review/
227
+ │ ├── planning/
228
+ │ ├── reporting/
229
+ │ └── handoff/
230
+ ├── scripts/
231
+ │ ├── shared/
232
+ │ ├── develop/
233
+ │ ├── plan/
234
+ │ ├── evaluate/
235
+ │ ├── implement/
236
+ │ ├── code-review/
237
+ │ ├── test/
238
+ │ └── diagnose/
239
+ ├── skills/
240
+ │ ├── develop/
241
+ │ ├── plan/
242
+ │ ├── evaluate/
243
+ │ ├── implement/
244
+ │ ├── code-review/
245
+ │ ├── test/
246
+ │ ├── diagnose/
247
+ │ ├── status/
248
+ │ └── resume/
249
+ └── templates/
250
+ ```
251
+
252
+ ## Initial Roadmap
253
+
254
+ ### Phase 1: Skeleton
255
+
256
+ - define repository layout
257
+ - add shared orchestration primitives
258
+ - add `status` and `resume` foundations
259
+ - document the state model
260
+
261
+ ### Phase 2: Core Skills
262
+
263
+ - implement `evaluate`
264
+ - implement `diagnose`
265
+ - implement `develop`
266
+ - add report generation and state cleanup rules
267
+
268
+ ### Phase 3: Delivery Flow
269
+
270
+ - implement `plan`
271
+ - implement `implement`
272
+ - implement `code-review`
273
+ - implement `test`
274
+
275
+ ### Phase 4: Hardening
276
+
277
+ - add regression tests for state handling
278
+ - verify conflict detection logic
279
+ - tighten workflow transitions
280
+ - document extension points for future agents and skills
281
+
282
+ ## Current Status
283
+
284
+ This repository now contains the copied Codex workflow assets, reorganized into a Codex-first layout. Assistant-specific packaging has been removed, and the top-level structure has been normalized around `agents/`, `skills/`, `scripts/`, `prompts/`, and `templates/`.
285
+
286
+ ## License
287
+
288
+ MIT
@@ -0,0 +1,7 @@
1
+ """Forge Codex launcher package.
2
+
3
+ This package provides the `forge` CLI entrypoint and bundles prompt/template
4
+ assets so workflows can run against any target repo without vendoring the
5
+ forge-codex repository into that repo.
6
+ """
7
+
@@ -0,0 +1,2 @@
1
+ """Bundled runtime assets (prompts/templates)."""
2
+
@@ -0,0 +1,2 @@
1
+ """Bundled prompt templates."""
2
+
@@ -0,0 +1,78 @@
1
+ # Phase 3: Team Dispatch — Architecture Mode
2
+
3
+ Dispatch all reviewers to analyze design patterns, coupling, and adherence to SOLID principles.
4
+
5
+ ## Review Target
6
+
7
+ **Mode:** Architecture Review
8
+ **Target:** {{TARGET}}
9
+ **Quick mode:** {{QUICK_MODE}}
10
+
11
+ ## Team Assignments
12
+
13
+ {{TEAM_ASSIGNMENTS}}
14
+
15
+ ## Instructions
16
+
17
+ ### 1. Identify Scope
18
+
19
+ Read the target files/modules and build a mental model of:
20
+ - Module boundaries and public interfaces
21
+ - Dependency graph (what depends on what)
22
+ - Data flow patterns (how data moves through the system)
23
+ - Error propagation patterns
24
+
25
+ ### 2. Dispatch Reviewers in Parallel
26
+
27
+ **Architect Review — SOLID Principles:**
28
+ - **S** (Single Responsibility): Does each module/class have one reason to change?
29
+ - **O** (Open/Closed): Can behavior be extended without modifying existing code?
30
+ - **L** (Liskov Substitution): Are subtypes truly substitutable for their base types?
31
+ - **I** (Interface Segregation): Are interfaces minimal and focused?
32
+ - **D** (Dependency Inversion): Do modules depend on abstractions, not concretions?
33
+
34
+ **Architect Review — Coupling & Cohesion:**
35
+ - Afferent coupling (Ca): How many modules depend on this one?
36
+ - Efferent coupling (Ce): How many modules does this one depend on?
37
+ - Instability (I = Ce / (Ca + Ce)): Is this module stable or volatile?
38
+ - Cohesion: Do the elements within each module belong together?
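The coupling metrics above can be computed from a module dependency map, as in this small sketch. The `deps` mapping and module names are hypothetical, and treating a module with no couplings at all as `0.0` is a convention chosen here to avoid dividing by zero.

```python
def instability(ca: int, ce: int) -> float:
    """I = Ce / (Ca + Ce): 0.0 is maximally stable, 1.0 is maximally unstable."""
    total = ca + ce
    return ce / total if total else 0.0  # assumed convention for isolated modules


def coupling_report(deps: dict) -> dict:
    """Return {module: (Ca, Ce, I)} for a map of module -> modules it depends on."""
    ce = {module: len(targets) for module, targets in deps.items()}
    ca = {module: 0 for module in deps}
    for targets in deps.values():
        for target in targets:
            ca[target] = ca.get(target, 0) + 1
    return {m: (ca[m], ce.get(m, 0), instability(ca[m], ce.get(m, 0))) for m in ca}


# Hypothetical project: api imports core and db, db imports core.
deps = {"api": {"core", "db"}, "db": {"core"}, "core": set()}
for module, (ca, ce, i) in sorted(coupling_report(deps).items()):
    print(f"{module}: Ca={ca} Ce={ce} I={i:.2f}")
# api: Ca=0 Ce=2 I=1.00
# core: Ca=2 Ce=0 I=0.00
# db: Ca=1 Ce=1 I=0.50
```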
39
+
40
+ **Security Reviewer — Architectural Security:**
41
+ - Are trust boundaries clearly defined?
42
+ - Is authentication/authorization centralized or scattered?
43
+ - Are there privilege escalation paths?
44
+ - Is sensitive data properly compartmentalized?
45
+
46
+ **QA Reviewer — Testability:**
47
+ - Can components be tested in isolation?
48
+ - Are dependencies injectable?
49
+ - Are there hidden dependencies (globals, singletons)?
50
+ - Is the test infrastructure adequate for the architecture?
51
+
52
+ **Critic — Design Smells & Code Smells:**
53
+ - Run code smells assessment per `templates/code-smells.md`
54
+ - Priority smells: God Class, Shotgun Surgery, Inappropriate Intimacy (critical); Feature Envy, Long Method, Divergent Change (warning)
55
+ - For each smell: cite file:line, name the smell, state the consequence, recommend the specific refactoring
56
+ - Check for Dependency Structure Matrix issues: cyclic dependencies between modules, layering violations, coupling clusters
57
+
58
+ **Investigator — Dependency Analysis:**
59
+ - Map the full dependency graph
60
+ - Identify circular dependencies
61
+ - Check for Dependency Inversion violations (concrete modules depending directly on other concrete modules)
62
+ - Evaluate third-party dependency health
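Circular dependencies can be found with a depth-first search over the dependency graph. The sketch below is one possible approach, not the tool the reviewers are required to use; the `find_cycle` helper and the example graph are hypothetical.

```python
from typing import Dict, List, Optional, Set


def find_cycle(deps: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a path (first node repeated at the end), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in sorted(deps.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(deps):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle({"a": {"b"}, "b": {"c"}, "c": {"a"}}))  # ['a', 'b', 'c', 'a']
print(find_cycle({"api": {"core"}, "core": set()}))      # None
```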
63
+
64
+ **Doc-writer — Architecture Documentation:**
65
+ - Is the architecture documented?
66
+ - Do module-level docs explain the "why", not just the "what"?
67
+ - Are architectural decisions recorded (ADRs)?
68
+
69
+ ### 3. Compile Findings
70
+
71
+ Collect all findings into a unified list with:
72
+ - Finding ID (F1, F2, ...)
73
+ - Source reviewer
74
+ - Severity: critical / warning / suggestion
75
+ - Title (one line)
76
+ - Detail (explanation with specific code references)
77
+
78
+ Record findings in state and proceed to deep dive.
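A unified findings list with the fields above might be assembled as in this sketch. The `Finding` dataclass, the `compile_findings` helper, and the input dictionary keys are assumptions for illustration; the field names and severity levels follow the list in step 3.

```python
from dataclasses import dataclass
from typing import Dict, List

SEVERITIES = ("critical", "warning", "suggestion")


@dataclass
class Finding:
    id: str        # F1, F2, ...
    reviewer: str  # source reviewer
    severity: str  # one of SEVERITIES
    title: str     # one line
    detail: str    # explanation with specific code references


def compile_findings(raw: List[Dict[str, str]]) -> List[Finding]:
    """Validate each raw finding and assign sequential IDs in arrival order."""
    findings = []
    for number, item in enumerate(raw, start=1):
        if item["severity"] not in SEVERITIES:
            raise ValueError(f"unknown severity: {item['severity']!r}")
        findings.append(
            Finding(f"F{number}", item["reviewer"], item["severity"], item["title"], item["detail"])
        )
    return findings


# Hypothetical reviewer output.
raw = [
    {"reviewer": "critic", "severity": "critical", "title": "God Class",
     "detail": "orchestrator.py:12 mixes state handling and dispatch"},
    {"reviewer": "qa-reviewer", "severity": "warning", "title": "Hidden singleton",
     "detail": "config.py:40 uses a module-level global"},
]
for finding in compile_findings(raw):
    print(finding.id, finding.severity, finding.title)
# F1 critical God Class
# F2 warning Hidden singleton
```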