multi-forge 0.2.0__py3-none-any.whl
- forge/__init__.py +3 -0
- forge/_extensions/agents/.gitkeep +0 -0
- forge/_extensions/commands/.gitkeep +0 -0
- forge/_extensions/skills/analyze/SKILL.md +87 -0
- forge/_extensions/skills/challenge/SKILL.md +91 -0
- forge/_extensions/skills/consensus/SKILL.md +120 -0
- forge/_extensions/skills/consensus/resources/code_consensus_evaluation.md +94 -0
- forge/_extensions/skills/consensus/resources/consensus_evaluation.md +70 -0
- forge/_extensions/skills/consensus/resources/synthesis.md +101 -0
- forge/_extensions/skills/debate/SKILL.md +116 -0
- forge/_extensions/skills/debate/resources/code_debate_evaluation.md +101 -0
- forge/_extensions/skills/debate/resources/debate_evaluation.md +90 -0
- forge/_extensions/skills/panel/SKILL.md +141 -0
- forge/_extensions/skills/panel/resources/synthesis.md +103 -0
- forge/_extensions/skills/qa/SKILL.md +704 -0
- forge/_extensions/skills/qa/resources/checklist/0-enable.md +78 -0
- forge/_extensions/skills/qa/resources/checklist/1-preflight.md +24 -0
- forge/_extensions/skills/qa/resources/checklist/10-resume.md +143 -0
- forge/_extensions/skills/qa/resources/checklist/11-config.md +150 -0
- forge/_extensions/skills/qa/resources/checklist/12-search.md +58 -0
- forge/_extensions/skills/qa/resources/checklist/13-guard.md +237 -0
- forge/_extensions/skills/qa/resources/checklist/14-workflow.md +305 -0
- forge/_extensions/skills/qa/resources/checklist/15-skills.md +155 -0
- forge/_extensions/skills/qa/resources/checklist/16-handoff.md +224 -0
- forge/_extensions/skills/qa/resources/checklist/17-info.md +50 -0
- forge/_extensions/skills/qa/resources/checklist/18-disable.md +84 -0
- forge/_extensions/skills/qa/resources/checklist/19-uninstall.md +146 -0
- forge/_extensions/skills/qa/resources/checklist/2-extensions.md +188 -0
- forge/_extensions/skills/qa/resources/checklist/20-cleanup.md +36 -0
- forge/_extensions/skills/qa/resources/checklist/3-auth.md +234 -0
- forge/_extensions/skills/qa/resources/checklist/4-proxy.md +481 -0
- forge/_extensions/skills/qa/resources/checklist/5-session.md +541 -0
- forge/_extensions/skills/qa/resources/checklist/6-hooks.md +275 -0
- forge/_extensions/skills/qa/resources/checklist/7-costs.md +309 -0
- forge/_extensions/skills/qa/resources/checklist/8-status-line.md +174 -0
- forge/_extensions/skills/qa/resources/checklist/9-direct-commands.md +146 -0
- forge/_extensions/skills/qa/resources/checklist.md +103 -0
- forge/_extensions/skills/qa/resources/report-template.md +62 -0
- forge/_extensions/skills/qa/scripts/start-container.sh +529 -0
- forge/_extensions/skills/qa/scripts/walkthrough-state.py +1137 -0
- forge/_extensions/skills/review/SKILL.md +125 -0
- forge/_extensions/skills/review/references/claude-4.6.md +474 -0
- forge/_extensions/skills/review/references/claude-4.7.md +710 -0
- forge/_extensions/skills/review/references/gemini-3.1.md +546 -0
- forge/_extensions/skills/review/references/gpt-5.5.md +490 -0
- forge/_extensions/skills/review/references/skills-writing-guide.md +1588 -0
- forge/_extensions/skills/review/resources/code-anthropic.md +160 -0
- forge/_extensions/skills/review/resources/code-gemini.md +184 -0
- forge/_extensions/skills/review/resources/code-openai.md +203 -0
- forge/_extensions/skills/review/resources/code.md +160 -0
- forge/_extensions/skills/review-docs/SKILL.md +121 -0
- forge/_extensions/skills/review-docs/resources/docs-anthropic.md +170 -0
- forge/_extensions/skills/review-docs/resources/docs-gemini.md +204 -0
- forge/_extensions/skills/review-docs/resources/docs-openai.md +231 -0
- forge/_extensions/skills/review-docs/resources/docs.md +170 -0
- forge/_extensions/skills/smoke-test/SKILL.md +27 -0
- forge/_extensions/skills/smoke-test/scripts/smoke-test.sh +118 -0
- forge/_extensions/skills/understand/SKILL.md +148 -0
- forge/_extensions/skills/understand/resources/code-anthropic.md +163 -0
- forge/_extensions/skills/understand/resources/code-gemini.md +194 -0
- forge/_extensions/skills/understand/resources/code-openai.md +181 -0
- forge/_extensions/skills/understand/resources/code.md +163 -0
- forge/_extensions/skills/understand/resources/docs-anthropic.md +177 -0
- forge/_extensions/skills/understand/resources/docs-gemini.md +202 -0
- forge/_extensions/skills/understand/resources/docs-openai.md +191 -0
- forge/_extensions/skills/understand/resources/docs.md +177 -0
- forge/_extensions/skills/walkthrough/SKILL.md +599 -0
- forge/_extensions/skills/walkthrough/resources/checklist.md +765 -0
- forge/_extensions/skills/walkthrough/scripts/run-in-repo.sh +118 -0
- forge/_extensions/skills/walkthrough/scripts/setup-test-repo.sh +198 -0
- forge/_extensions/skills/walkthrough/scripts/walkthrough-state.py +1137 -0
- forge/backend/__init__.py +174 -0
- forge/backend/adapters/__init__.py +38 -0
- forge/backend/adapters/litellm.py +158 -0
- forge/backend/creation.py +89 -0
- forge/backend/registry.py +178 -0
- forge/cli/__init__.py +16 -0
- forge/cli/auth.py +483 -0
- forge/cli/backend.py +298 -0
- forge/cli/claude.py +411 -0
- forge/cli/config_cmd.py +303 -0
- forge/cli/extensions.py +1001 -0
- forge/cli/gc.py +165 -0
- forge/cli/guard.py +1018 -0
- forge/cli/guards.py +106 -0
- forge/cli/handoff.py +110 -0
- forge/cli/hooks/__init__.py +36 -0
- forge/cli/hooks/_group.py +20 -0
- forge/cli/hooks/_helpers.py +149 -0
- forge/cli/hooks/commands.py +1677 -0
- forge/cli/hooks/direct_commands.py +1304 -0
- forge/cli/hooks/install.py +232 -0
- forge/cli/hooks/policy.py +151 -0
- forge/cli/hooks/read_hygiene.py +74 -0
- forge/cli/hooks/verification.py +370 -0
- forge/cli/logs.py +406 -0
- forge/cli/main.py +292 -0
- forge/cli/proxy.py +1821 -0
- forge/cli/proxy_costs.py +313 -0
- forge/cli/search.py +416 -0
- forge/cli/session.py +892 -0
- forge/cli/session_addendum.py +81 -0
- forge/cli/session_fork.py +750 -0
- forge/cli/session_handoff.py +141 -0
- forge/cli/session_lifecycle.py +2053 -0
- forge/cli/session_manage.py +1336 -0
- forge/cli/session_memory.py +201 -0
- forge/cli/status_line.py +1398 -0
- forge/cli/workflow.py +1964 -0
- forge/config/__init__.py +110 -0
- forge/config/dataclass_utils.py +88 -0
- forge/config/defaults/__init__.py +0 -0
- forge/config/defaults/backends/__init__.py +0 -0
- forge/config/defaults/backends/litellm.yaml +196 -0
- forge/config/defaults/templates/__init__.py +0 -0
- forge/config/defaults/templates/litellm-anthropic-local.yaml +33 -0
- forge/config/defaults/templates/litellm-anthropic.yaml +24 -0
- forge/config/defaults/templates/litellm-gemini-flash-local.yaml +37 -0
- forge/config/defaults/templates/litellm-gemini-local.yaml +32 -0
- forge/config/defaults/templates/litellm-gemini-test.yaml +34 -0
- forge/config/defaults/templates/litellm-gemini.yaml +21 -0
- forge/config/defaults/templates/litellm-openai-codex-local.yaml +36 -0
- forge/config/defaults/templates/litellm-openai-local.yaml +38 -0
- forge/config/defaults/templates/litellm-openai.yaml +28 -0
- forge/config/defaults/templates/openrouter-anthropic.yaml +23 -0
- forge/config/defaults/templates/openrouter-deepseek.yaml +26 -0
- forge/config/defaults/templates/openrouter-gemini-flash.yaml +26 -0
- forge/config/defaults/templates/openrouter-gemini.yaml +23 -0
- forge/config/defaults/templates/openrouter-glm.yaml +23 -0
- forge/config/defaults/templates/openrouter-kimi.yaml +30 -0
- forge/config/defaults/templates/openrouter-minimax.yaml +26 -0
- forge/config/defaults/templates/openrouter-openai-codex.yaml +23 -0
- forge/config/defaults/templates/openrouter-openai.yaml +28 -0
- forge/config/defaults/templates/openrouter-qwen.yaml +25 -0
- forge/config/loader.py +675 -0
- forge/config/schema.py +448 -0
- forge/core/__init__.py +5 -0
- forge/core/auth/__init__.py +67 -0
- forge/core/auth/capabilities.py +219 -0
- forge/core/auth/credentials_file.py +244 -0
- forge/core/auth/protocols.py +18 -0
- forge/core/auth/secrets.py +243 -0
- forge/core/auth/template_secrets.py +112 -0
- forge/core/data/__init__.py +5 -0
- forge/core/data/model_catalog.yaml +1522 -0
- forge/core/data/pricing.yaml +140 -0
- forge/core/data/system_prompt_addendums/__init__.py +0 -0
- forge/core/data/system_prompt_addendums/gemini.md +330 -0
- forge/core/data/system_prompt_addendums/openai.md +328 -0
- forge/core/llm/__init__.py +231 -0
- forge/core/llm/clients/__init__.py +14 -0
- forge/core/llm/clients/base.py +115 -0
- forge/core/llm/clients/litellm.py +619 -0
- forge/core/llm/clients/openai_compat.py +244 -0
- forge/core/llm/clients/openrouter.py +234 -0
- forge/core/llm/credentials.py +439 -0
- forge/core/llm/detection.py +86 -0
- forge/core/llm/errors.py +44 -0
- forge/core/llm/protocols.py +80 -0
- forge/core/llm/types.py +176 -0
- forge/core/logging.py +146 -0
- forge/core/models/__init__.py +91 -0
- forge/core/models/catalog.py +467 -0
- forge/core/models/pricing.py +165 -0
- forge/core/models/types.py +167 -0
- forge/core/naming.py +212 -0
- forge/core/ops/__init__.py +73 -0
- forge/core/ops/context.py +141 -0
- forge/core/ops/gc.py +802 -0
- forge/core/ops/proxy.py +146 -0
- forge/core/ops/resolution.py +135 -0
- forge/core/ops/session.py +344 -0
- forge/core/ops/session_context.py +548 -0
- forge/core/paths.py +38 -0
- forge/core/process.py +54 -0
- forge/core/reactive/__init__.py +38 -0
- forge/core/reactive/cost_tracking.py +300 -0
- forge/core/reactive/env.py +180 -0
- forge/core/reactive/proxy.py +78 -0
- forge/core/reactive/routing.py +622 -0
- forge/core/reactive/session_runner.py +185 -0
- forge/core/reactive/structured_output.py +62 -0
- forge/core/reactive/tagger.py +94 -0
- forge/core/reactive/throttle.py +132 -0
- forge/core/state/__init__.py +59 -0
- forge/core/state/exceptions.py +59 -0
- forge/core/state/io.py +140 -0
- forge/core/state/lock.py +99 -0
- forge/core/state/timestamps.py +60 -0
- forge/core/transcript.py +78 -0
- forge/core/typing_helpers.py +24 -0
- forge/core/workqueue/__init__.py +67 -0
- forge/core/workqueue/queue.py +552 -0
- forge/core/workqueue/types.py +63 -0
- forge/guard/__init__.py +26 -0
- forge/guard/deterministic/__init__.py +26 -0
- forge/guard/deterministic/base.py +158 -0
- forge/guard/deterministic/coding_standards.py +256 -0
- forge/guard/deterministic/registry.py +148 -0
- forge/guard/deterministic/tdd.py +171 -0
- forge/guard/engine.py +216 -0
- forge/guard/protocols.py +91 -0
- forge/guard/queries.py +96 -0
- forge/guard/semantic/__init__.py +34 -0
- forge/guard/semantic/promotion.py +18 -0
- forge/guard/semantic/supervisor.py +813 -0
- forge/guard/semantic/verdict.py +183 -0
- forge/guard/store.py +124 -0
- forge/guard/team/__init__.py +6 -0
- forge/guard/team/config.py +24 -0
- forge/guard/team/handlers.py +209 -0
- forge/guard/team/prompts.py +41 -0
- forge/guard/types.py +125 -0
- forge/guard/workflow/__init__.py +17 -0
- forge/guard/workflow/branches.py +67 -0
- forge/guard/workflow/config.py +63 -0
- forge/guard/workflow/divergence.py +113 -0
- forge/guard/workflow/policy.py +87 -0
- forge/guard/workflow/stages.py +205 -0
- forge/install/__init__.py +55 -0
- forge/install/cli.py +281 -0
- forge/install/exceptions.py +163 -0
- forge/install/hooks.py +109 -0
- forge/install/installer.py +1037 -0
- forge/install/models.py +321 -0
- forge/install/preset.py +272 -0
- forge/install/settings_merge.py +831 -0
- forge/install/tracking.py +238 -0
- forge/install/version.py +141 -0
- forge/proxy/__init__.py +0 -0
- forge/proxy/base_client.py +181 -0
- forge/proxy/client_adapter.py +476 -0
- forge/proxy/client_factory.py +531 -0
- forge/proxy/converters.py +1206 -0
- forge/proxy/cost_logger.py +132 -0
- forge/proxy/cost_tracker.py +242 -0
- forge/proxy/data_models.py +338 -0
- forge/proxy/error_hints.py +92 -0
- forge/proxy/metrics.py +222 -0
- forge/proxy/model_spec.py +158 -0
- forge/proxy/proxies.py +333 -0
- forge/proxy/proxy_identity.py +134 -0
- forge/proxy/proxy_orchestrator.py +1018 -0
- forge/proxy/proxy_startup.py +54 -0
- forge/proxy/server.py +1561 -0
- forge/proxy/utils.py +537 -0
- forge/review/__init__.py +6 -0
- forge/review/adversarial.py +111 -0
- forge/review/consensus.py +236 -0
- forge/review/engine.py +356 -0
- forge/review/models.py +437 -0
- forge/review/resources/__init__.py +5 -0
- forge/review/resources/codereview-performance.md +85 -0
- forge/review/resources/codereview-quick.md +75 -0
- forge/review/resources/codereview-security.md +92 -0
- forge/review/resources/codereview.md +85 -0
- forge/review/resources/docreview-quick.md +75 -0
- forge/review/resources/docreview.md +86 -0
- forge/review/resources/thinkdeep.md +89 -0
- forge/review/routing.py +368 -0
- forge/review/synthesis.py +73 -0
- forge/runtime_config.py +438 -0
- forge/search/__init__.py +55 -0
- forge/search/bm25_store.py +264 -0
- forge/search/content_store.py +197 -0
- forge/search/engine.py +352 -0
- forge/search/exceptions.py +51 -0
- forge/search/extractor.py +234 -0
- forge/search/index_state.py +295 -0
- forge/search/store.py +215 -0
- forge/search/tokenizer.py +24 -0
- forge/session/__init__.py +130 -0
- forge/session/active.py +339 -0
- forge/session/artifacts.py +202 -0
- forge/session/claude/__init__.py +50 -0
- forge/session/claude/cleanup.py +105 -0
- forge/session/claude/invoke.py +236 -0
- forge/session/claude/paths.py +200 -0
- forge/session/cleanup.py +216 -0
- forge/session/config.py +34 -0
- forge/session/direct_model.py +107 -0
- forge/session/effective.py +169 -0
- forge/session/exceptions.py +255 -0
- forge/session/handoff.py +881 -0
- forge/session/handoff_agent.py +544 -0
- forge/session/hooks/__init__.py +35 -0
- forge/session/hooks/models.py +73 -0
- forge/session/hooks/session_start.py +507 -0
- forge/session/identity.py +84 -0
- forge/session/index.py +553 -0
- forge/session/manager.py +1506 -0
- forge/session/models.py +572 -0
- forge/session/overrides.py +344 -0
- forge/session/plan_resolution.py +286 -0
- forge/session/prev_sessions.py +128 -0
- forge/session/store.py +431 -0
- forge/session/validation.py +47 -0
- forge/session/worktree/__init__.py +65 -0
- forge/session/worktree/cleanup.py +262 -0
- forge/session/worktree/config_copy.py +203 -0
- forge/session/worktree/create.py +332 -0
- forge/sidecar/__init__.py +29 -0
- forge/sidecar/container.py +161 -0
- forge/sidecar/docker.py +86 -0
- forge/sidecar/secrets.py +19 -0
- multi_forge-0.2.0.dist-info/METADATA +242 -0
- multi_forge-0.2.0.dist-info/RECORD +311 -0
- multi_forge-0.2.0.dist-info/WHEEL +4 -0
- multi_forge-0.2.0.dist-info/entry_points.txt +2 -0
- multi_forge-0.2.0.dist-info/licenses/LICENSE +203 -0
- multi_forge-0.2.0.dist-info/licenses/NOTICE +14 -0
@@ -0,0 +1,490 @@
# GPT-5.5 Prompting Guide

> Synthesized from [OpenAI Prompt Guidance](https://developers.openai.com/api/docs/guides/prompt-guidance),
> [OpenAI Platform docs](https://developers.openai.com/api/docs/guides/latest-model), and
> [OpenAI Cookbook](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_prompting_guide). May 2026.

## Overview

GPT-5.5 is OpenAI's frontier model for **complex professional work**, announced April 23, 2026 and made available in the
API on April 24, 2026. It is tuned for long-context, tool-heavy, professional workflows. Prompt-relevant changes:

- **1,050,000 token API context window**
- **128,000 max output tokens**
- **`medium` default reasoning effort**, with `none`, `low`, `high`, and `xhigh` available
- **More outcome-first behavior** - shorter prompts with clear success criteria usually work better than process-heavy
  legacy scaffolding

**Key characteristic:** GPT-5.5 is designed for production-grade assistants and agents. It performs best when prompts
clearly specify the **output contract**, **tool-use expectations**, and **completion criteria**. The highest-leverage
prompt changes are choosing reasoning effort by task shape, defining exact output and citation formats, and making
completion criteria explicit.

---

## Core API Parameters

### `reasoning.effort`

| Level    | Use Case                                                                                         |
| -------- | ------------------------------------------------------------------------------------------------ |
| `none`   | Execution-heavy workloads: workflow steps, extraction, triage, structured transforms.            |
| `low`    | Tasks needing nuanced interpretation: implicit requirements, ambiguity, cancelled-tool recovery. |
| `medium` | **Default.** Research-heavy: long-context synthesis, multi-document review, conflict resolution. |
| `high`   | Complex multi-step problems, strategy writing.                                                   |
| `xhigh`  | Maximum reasoning depth. 3-5x cost of `none`.                                                    |

**Defaults across the GPT-5 family:**

- GPT-5: `medium`
- GPT-5.1, GPT-5.2: `none`
- GPT-5.5: `medium`

**Best practice:** Make prompt updates before increasing reasoning effort. Increase `reasoning.effort` one notch only
after prompt fixes. When lowering to `none` for execution-heavy workloads, encourage the model to "think" or outline
steps before answering.
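
The table and best practice above can be sketched as a small lookup plus a one-notch bump. This is a minimal sketch, not part of the guide: the task-shape labels and helper names are hypothetical, and the function only assembles keyword arguments for a Responses API call without sending anything.

```python
# Effort levels come from the table above; the task-shape labels are hypothetical.
EFFORT_BY_TASK_SHAPE = {
    "extraction": "none",
    "triage": "none",
    "structured_transform": "none",
    "ambiguous_request": "low",
    "long_context_synthesis": "medium",
    "multi_document_review": "medium",
    "multi_step_problem": "high",
    "strategy_writing": "high",
}

EFFORT_LADDER = ["none", "low", "medium", "high", "xhigh"]


def pick_effort(task_shape: str) -> str:
    """Return a starting effort for a task shape, falling back to `medium`."""
    return EFFORT_BY_TASK_SHAPE.get(task_shape, "medium")


def bump_one_notch(effort: str) -> str:
    """Raise effort by a single level, capped at `xhigh`."""
    index = EFFORT_LADDER.index(effort)
    return EFFORT_LADDER[min(index + 1, len(EFFORT_LADDER) - 1)]


def build_request(prompt: str, task_shape: str) -> dict:
    """Assemble keyword arguments for a Responses API call (nothing is sent)."""
    return {
        "model": "gpt-5.5",
        "input": prompt,
        "reasoning": {"effort": pick_effort(task_shape)},
    }
```

Keeping the bump as a separate, explicit step makes it easier to apply it only after prompt fixes have been tried.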

### `verbosity`

A **dedicated API parameter** (not just prompt engineering) that controls response length.

| Level    | Behavior                                     | Use when                                      |
| -------- | -------------------------------------------- | --------------------------------------------- |
| `low`    | Terse, to-the-point, just the facts          | Latency and scannability matter most          |
| `medium` | **Default.** Balanced detail for most tasks. | General assistants and professional workflows |
| `high`   | Detailed, explanatory, comprehensive.        | The user asked for depth or auditability      |

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.5",
    input="Your prompt here",
    text={"verbosity": "low"},  # response-length control, separate from the prompt
)
```

**Interaction with prompts:** If explicit instructions conflict with the `verbosity` parameter, explicit instructions
take precedence. For code generation, Cursor found that setting `verbosity: low` for text output while prompting for
verbose code in tool calls produced the best results.

### Context Window

- **1,050,000 tokens** input / **128,000 tokens** max output
- Prompts above 272K input tokens have higher API pricing; use context budgets deliberately.
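
One way to use context budgets deliberately is to classify a prompt's size before sending it. The sketch below takes the limits from the bullets above; the characters-per-token ratio is a rough assumption rather than a tokenizer, and the function and label names are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_050_000    # input limit stated above
HIGHER_PRICING_THRESHOLD = 272_000   # prompts above this are priced higher
MAX_OUTPUT_TOKENS = 128_000          # output limit stated above


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough size estimate; the 4-chars-per-token ratio is an assumption."""
    return int(len(text) / chars_per_token)


def classify_budget(input_tokens: int) -> str:
    """Label a prompt size as standard, higher-priced, or too large to send."""
    if input_tokens > CONTEXT_WINDOW_TOKENS:
        return "exceeds_window"
    if input_tokens > HIGHER_PRICING_THRESHOLD:
        return "higher_pricing"
    return "standard"
```

A caller might trim or summarize retrieved documents whenever the label is anything other than `standard`.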

### Knowledge Cutoff

**December 1, 2025.**

---

## Key Behavioral Differences from GPT-5.2

| Aspect                | GPT-5.5 Behavior                                                                  |
| --------------------- | --------------------------------------------------------------------------------- |
| Reasoning default     | `medium`; use `low` before `none` when planning, search, or tool use still matter |
| Prompt shape          | Outcome-first prompts usually work better than step-by-step process scaffolding   |
| Tool calling          | Stronger tool selection; define tool triggers, evidence rules, and stop rules     |
| User-visible preamble | Useful for time-to-first-token in long or tool-heavy turns                        |
| Verbosity             | Concise and direct by default; controllable via the `verbosity` API parameter     |
| Instruction following | More literal and thorough; define success criteria and stopping conditions        |

---

## Prompting Patterns

### Output Contracts and Completion Criteria

OpenAI's primary recommendation for GPT-5.5. Explicitly define **what "done" looks like**:

```xml
<output_contract>
- Return a JSON object with keys: summary, findings[], recommendations[], confidence_score.
- Each finding must include: file_path, line_range, severity (critical|warning|info), description.
- confidence_score is 0.0-1.0 reflecting how thoroughly the codebase was analyzed.
- Task is complete when all files in scope have been reviewed and findings are deduplicated.
</output_contract>
```

Start with the smallest prompt that passes your evals. Add blocks only when they fix a measured failure mode.
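
An output contract is easier to evaluate when the reply can be checked mechanically. The following is a minimal sketch of a checker for the example contract above; the function name and the wording of the violation messages are hypothetical, and it checks shape only, not whether every file in scope was actually reviewed.

```python
import json

REQUIRED_KEYS = {"summary", "findings", "recommendations", "confidence_score"}
FINDING_KEYS = {"file_path", "line_range", "severity", "description"}
SEVERITIES = {"critical", "warning", "info"}


def contract_violations(raw: str) -> list[str]:
    """Return contract violations; an empty list means the reply conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    findings = data.get("findings", [])
    if not isinstance(findings, list):
        problems.append("findings must be a list")
        findings = []
    for index, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"findings[{index}] is not an object")
            continue
        absent = FINDING_KEYS - finding.keys()
        if absent:
            problems.append(f"findings[{index}] missing: {sorted(absent)}")
        if finding.get("severity") not in SEVERITIES:
            problems.append(f"findings[{index}] has an invalid severity")

    score = data.get("confidence_score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0.0 <= score <= 1.0:
        problems.append("confidence_score must be a number in [0.0, 1.0]")
    return problems
```

Running a checker like this inside an eval suite gives a concrete failure mode to point at before adding more prompt blocks.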

### Controlling Verbosity and Output Shape

Use the `verbosity` API parameter as the **primary lever**, and prompt-level constraints as secondary:

```xml
<output_verbosity_spec>
- Default: 3-6 sentences or <=5 bullets for typical answers.
- For simple "yes/no + short explanation" questions: <=2 sentences.
- For complex multi-step or multi-file tasks:
  - 1 short overview paragraph
  - then <=5 bullets tagged: What changed, Where, Risks, Next steps, Open questions.
- Do not rephrase the user's request unless it changes semantics.
</output_verbosity_spec>
```

### Initiative Nudges

If the model feels too literal or stops at the first plausible answer, add an **initiative nudge** before raising
`reasoning.effort`:

```xml
<initiative>
- Do not stop at the first plausible answer.
- Look for second-order issues, edge cases, and missing constraints.
- If the task is safety or accuracy critical, perform at least one verification step.
</initiative>
```

This is cheaper and often more effective than bumping `reasoning.effort` up a notch.

### Preventing Scope Drift

GPT-5.5 is more controllable than GPT-5.2 but still prone to scope drift on coding tasks:

```xml
<design_and_scope_constraints>
- Implement EXACTLY and ONLY what the user requests.
- No extra features, no added components, no UX embellishments.
- Style aligned to the design system at hand.
- Do NOT invent colors, shadows, tokens, animations, or new UI elements unless requested.
- If any instruction is ambiguous, choose the simplest valid interpretation.
</design_and_scope_constraints>
```

### Long-Context and Recall

With 1.05M tokens available, long-context handling is more common. For inputs >10K tokens, use **forced re-grounding**:

```xml
<long_context_handling>
- For inputs longer than ~10k tokens (multi-chapter docs, long threads, multiple PDFs):
  - First, produce a short internal outline of key sections relevant to the user's request.
  - Re-state the user's constraints explicitly before answering.
  - Anchor claims to sections ("In the 'Data Retention' section...") rather than speaking generically.
- If the answer depends on fine details (dates, thresholds, clauses), quote or paraphrase them.
</long_context_handling>
```
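
The ~10k-token rule above lends itself to a conditional prompt block, so short requests stay small. This is a sketch under assumptions: the helper name is hypothetical, the block text is abbreviated from the example above, and the token count is expected to come from whatever estimator the caller already uses.

```python
LONG_CONTEXT_THRESHOLD_TOKENS = 10_000  # the "~10k tokens" rule of thumb above

# Abbreviated version of the re-grounding block shown above.
LONG_CONTEXT_BLOCK = """<long_context_handling>
- First, produce a short internal outline of key sections relevant to the user's request.
- Re-state the user's constraints explicitly before answering.
- Anchor claims to sections rather than speaking generically.
</long_context_handling>"""


def with_regrounding(system_prompt: str, input_tokens: int) -> str:
    """Append the re-grounding block only when the input is long enough to need it."""
    if input_tokens > LONG_CONTEXT_THRESHOLD_TOKENS:
        return f"{system_prompt}\n\n{LONG_CONTEXT_BLOCK}"
    return system_prompt
```

Attaching the block conditionally follows the guide's advice to add prompt blocks only where they address a measured need.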

### Preambles (Tool-Use Transparency)

GPT-5.5 can generate brief, user-visible explanations before invoking tools, outlining its intent before the actual
tool call. This boosts tool-calling accuracy without bloating reasoning overhead.

Enable with a system instruction:

```
Before you call a tool, explain in one sentence why you are calling it.
```

### Handling Ambiguity & Hallucination Risk

```xml
<uncertainty_and_ambiguity>
- If the question is ambiguous or underspecified, explicitly call this out and:
  - Ask up to 1-3 precise clarifying questions, OR
  - Present 2-3 plausible interpretations with clearly labeled assumptions.
- Never fabricate exact figures, line numbers, or external references when uncertain.
- When unsure, prefer language like "Based on the provided context..." instead of absolute claims.
</uncertainty_and_ambiguity>
```

**High-risk self-check for sensitive contexts:**

```xml
<high_risk_self_check>
Before finalizing an answer in legal, financial, compliance, or safety-sensitive contexts:
- Briefly re-scan your own answer for:
  - Unstated assumptions,
  - Specific numbers or claims not grounded in context,
  - Overly strong language ("always," "guaranteed," etc.).
- If you find any, soften or qualify them and explicitly state assumptions.
</high_risk_self_check>
```

---

## Agentic Steerability & User Updates

GPT-5.5 works well in long-running workflows when the prompt defines progress, stopping conditions, and when to ask for
help.

### Verbosity + Code Quality (Cursor's Pattern)

Cursor found the best results by separating text and code verbosity:

- Set `verbosity: low` at the API level to keep text outputs brief
- In the prompt, strongly encourage verbose, well-commented output in coding tools only

This prevents status updates and post-task summaries from disrupting flow while keeping code readable.
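
The split described above can be expressed as one request: a low `verbosity` setting for chat text, plus an instruction that asks for readable code inside tool calls. This is a minimal sketch that only builds the keyword arguments; the instruction wording and the helper name are assumptions, not Cursor's actual prompt.

```python
# Hypothetical instruction text; Cursor's real prompt is not reproduced here.
CODE_STYLE_INSTRUCTIONS = (
    "Keep chat replies brief. When writing code through a coding tool, "
    "favor readability: descriptive names, comments where they help, "
    "and straightforward control flow."
)


def build_coding_request(user_prompt: str) -> dict:
    """Assemble Responses API keyword arguments; nothing is sent."""
    return {
        "model": "gpt-5.5",
        "instructions": CODE_STYLE_INSTRUCTIONS,
        "input": user_prompt,
        "text": {"verbosity": "low"},  # keeps user-facing text short
    }
```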

### User Update Discipline

```xml
<user_updates_spec>
- Send brief updates (1-2 sentences) only when:
  - You start a new major phase of work, or
  - You discover something that changes the plan.
- Avoid narrating routine tool calls ("reading file...", "running tests...").
- Each update must include at least one concrete outcome ("Found X", "Confirmed Y", "Updated Z").
- Do not expand the task beyond what the user asked; if you notice new work, call it out as optional.
</user_updates_spec>
```

### Delegation Rules

```xml
<delegation_rules>
- Delegate only when subtasks are independent or can proceed in parallel.
- For each delegated task, define ownership, expected output, dependencies, and "done".
- Keep blocking decisions in the main workflow unless delegation is explicitly useful.
- Integrate delegated results before finalizing.
</delegation_rules>
```

---

## Tool Calling and Parallelism

### Tool Use Rules

```xml
<tool_usage_rules>
- Prefer tools over internal knowledge whenever:
  - You need fresh or user-specific data (tickets, orders, configs, logs).
  - You reference specific IDs, URLs, or document titles.
- Parallelize independent reads (read_file, fetch_record, search_docs) when possible.
- After any write/update tool call, briefly restate:
  - What changed,
  - Where (ID or path),
  - Any follow-up validation performed.
</tool_usage_rules>
```

### Parallel Tool Calls

- GPT-5.5 supports parallel function calls, invoking multiple tools in a single model pass
- Do not rely on `none` for multi-step tool workflows; use `low` or higher when planning, search, or chained tools
  matter
- OpenAI measures parallelization efficiency via **tool yields**: if 3 tools are called in parallel, followed by 3 more
  in parallel, the number of yields is 2 (a better latency proxy than raw tool call count)
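
The tool-yield arithmetic above can be made concrete with a small counter. In this sketch a run is represented as a list of batches, where each batch holds the tool names issued together in one model pass; that representation and the function names are assumptions for illustration.

```python
def count_yields(batches: list[list[str]]) -> int:
    """Each non-empty batch of parallel calls counts as one yield."""
    return sum(1 for batch in batches if batch)


def count_tool_calls(batches: list[list[str]]) -> int:
    """Raw number of tool calls across all batches."""
    return sum(len(batch) for batch in batches)


# Two passes of three parallel reads each: 6 calls but only 2 yields.
run = [
    ["read_file", "fetch_record", "search_docs"],
    ["read_file", "read_file", "search_docs"],
]
```

Comparing the two numbers for the same run shows why yields track latency more closely: the same six calls issued one at a time would produce six yields.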

---

## Structured Extraction

For extraction, prompts should define the schema, missing-field behavior, and completeness check.

1. Always provide a schema or JSON shape
2. Use structured outputs for strict schema adherence
3. Distinguish required vs optional fields
4. Ask for "extraction completeness"
5. Handle missing fields explicitly

```xml
<extraction_spec>
You will extract structured data from tables/PDFs/emails into JSON.
- Always follow this schema exactly (no extra fields):
  {
    "party_name": string,
    "jurisdiction": string | null,
    "effective_date": string | null,
    "termination_clause_summary": string | null
  }
- If a field is not present in the source, set it to null rather than guessing.
- Before returning, quickly re-scan the source for any missed fields and correct omissions.
</extraction_spec>
```
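
A post-extraction check can enforce the same rules the spec states: no extra fields, every field present, and `null` only where the schema allows it. The sketch below hard-codes the example schema above; the helper name and the message wording are hypothetical.

```python
# Field name -> whether null is allowed, mirroring the example schema above.
SCHEMA_NULLABLE = {
    "party_name": False,
    "jurisdiction": True,
    "effective_date": True,
    "termination_clause_summary": True,
}


def extraction_problems(record: dict) -> list[str]:
    """Return schema problems for one extracted record; empty means it conforms."""
    problems = []
    extra = record.keys() - SCHEMA_NULLABLE.keys()
    if extra:
        problems.append(f"extra fields: {sorted(extra)}")
    for field, nullable in SCHEMA_NULLABLE.items():
        if field not in record:
            problems.append(f"{field} is missing (use null if absent in the source)")
            continue
        value = record[field]
        if value is None:
            if not nullable:
                problems.append(f"{field} must not be null")
        elif not isinstance(value, str):
            problems.append(f"{field} must be a string")
    return problems
```

A check like this complements, rather than replaces, structured outputs: it catches omitted keys and type slips in any pipeline that parses the JSON itself.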
|
|
297
|
+
|
|
298
|
+
**New in GPT-5.5:** You can define tools with `type: custom` to enable models to send plaintext inputs directly to
|
|
299
|
+
tools, rather than being limited to structured JSON.
|
|
300
|
+
|
|
301
|
+
---
|
|
302
|
+
|
|
303
|
+
## Web Search and Research
|
|
304
|
+
|
|
305
|
+
GPT-5.5 is more steerable at synthesizing across many sources. Knowledge cutoff: **December 1, 2025**.
|
|
306
|
+
|
|
307
|
+
### Research Agent Prompt
|
|
308
|
+
|
|
309
|
+
```xml
|
|
310
|
+
<web_search_rules>
|
|
311
|
+
- Act as an expert research assistant; default to comprehensive, well-structured answers.
|
|
312
|
+
- Prefer web research over assumptions whenever facts may be uncertain or incomplete.
|
|
313
|
+
- Include citations for all web-derived information.
|
|
314
|
+
- Research all parts of the query, resolve contradictions, and follow important second-order
|
|
315
|
+
implications until further research is unlikely to change the answer.
|
|
316
|
+
- Do not ask clarifying questions; instead cover all plausible user intents with both breadth and depth.
|
|
317
|
+
- Write clearly and directly using Markdown (headers, bullets, tables when helpful).
|
|
318
|
+
- Define acronyms, use concrete examples, and keep a natural, conversational tone.
|
|
319
|
+
</web_search_rules>
|
|
320
|
+
```
|
|
321
|
+
|
|
322
|
+
### Search Modes
|
|
323
|
+
|
|
324
|
+
| Mode | Use Case |
|
|
325
|
+
| -------------- | ------------------------------------------- |
|
|
326
|
+
| Non-reasoning | Quick lookups, completes in seconds |
|
|
327
|
+
| Agentic search | Iterative reasoning with follow-up searches |
|
|
328
|
+
| Deep research | Exhaustive investigations, takes minutes |
|
|
329
|
+
|
|
330
|
+
**Tip:** Using hints like "go deep" triggers more thorough research.
|
|
331
|
+
|
|
332
|
+
---
|
|
333
|
+
|
|
334
|
+
## Responses API
|
|
335
|
+
|
|
336
|
+
GPT-5.5 is designed around the **Responses API** for reasoning, tool-calling, and multi-turn use cases.
|
|
337
|
+
|
|
338
|
+
| Feature | Chat Completions | Responses API |
|
|
339
|
+
| --------------------------- | ---------------- | ------------- |
|
|
340
|
+
| Basic text generation | Yes | Yes |
|
|
341
|
+
| Reasoning item preservation | No | Yes |
|
|
342
|
+
| `previous_response_id` | No | Yes |
|
|
343
|
+
|
|
344
|
+
**Why the Responses API matters:** It preserves reasoning items across turns, which improves multi-step tool workflows and
|
|
345
|
+
can reduce redundant reasoning. If you manually replay assistant output items, preserve returned reasoning and `phase`
|
|
346
|
+
items unchanged.
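As a rough illustration of chaining turns, the sketch below builds two request payloads where the second refers back to the first through `previous_response_id`. The helper name `build_turn`, the model identifier, and the placeholder id `resp_123` are hypothetical; a real id would come from the response object returned by the first call.

```python
from typing import Optional


def build_turn(user_input: str,
               previous_response_id: Optional[str] = None,
               model: str = "gpt-5.5") -> dict:
    """Build request arguments for one turn of a multi-turn exchange."""
    payload = {"model": model, "input": user_input}
    if previous_response_id is not None:
        # Link to the earlier response so the server can reuse its stored
        # state instead of the client replaying it manually.
        payload["previous_response_id"] = previous_response_id
    return payload


first = build_turn("List the failing tests.")
# "resp_123" stands in for the id returned by the first call.
second = build_turn("Now fix the first one.", previous_response_id="resp_123")
print(second)
```

Only the first turn omits the link; every later turn passes the id of the response immediately before it.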
|
|
347
|
+
|
|
348
|
+
---
|
|
349
|
+
|
|
350
|
+
## Migration Guide to GPT-5.5
|
|
351
|
+
|
|
352
|
+
### Migration Mapping
|
|
353
|
+
|
|
354
|
+
| Current Model | Target | Reasoning Effort | Notes |
|
|
355
|
+
| ------------- | ------- | ------------------ | ----------------------------------- |
|
|
356
|
+
| GPT-5.2 | GPT-5.5 | Default (drop-in) | Just change the model name |
|
|
357
|
+
| GPT-5.3-Codex | GPT-5.5 | Default | GPT-5.5 subsumes Codex capabilities |
|
|
358
|
+
| o3 | GPT-5.5 | `medium` or `high` | For reasoning-heavy workloads |
|
|
359
|
+
| GPT-4.1 | GPT-5.5 | `none` | Treat as fast/low-deliberation |
|
|
360
|
+
| GPT-4o | GPT-5.5 | `none` | Same as GPT-4.1 |
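The mapping table can be encoded as a small lookup so that a migration script applies it consistently. This is a sketch under assumptions: the lowercase model keys, the `gpt-5.5` target string, and the function name `migrated_request` are illustrative, and `medium` is chosen for o3 only as a starting point from the table's "`medium` or `high`" range.

```python
# Reasoning effort to pin per source model; None means "keep the default".
MIGRATION_EFFORT = {
    "gpt-5.2": None,        # drop-in replacement
    "gpt-5.3-codex": None,  # default effort
    "o3": "medium",         # table allows medium or high; start at medium
    "gpt-4.1": "none",      # fast, low-deliberation profile
    "gpt-4o": "none",       # same as GPT-4.1
}


def migrated_request(current_model: str, user_input: str,
                     target: str = "gpt-5.5") -> dict:
    """Build request arguments for the target model from the mapping table."""
    key = current_model.lower()
    if key not in MIGRATION_EFFORT:
        raise KeyError(f"no migration mapping for {current_model!r}")
    payload = {"model": target, "input": user_input}
    effort = MIGRATION_EFFORT[key]
    if effort is not None:
        payload["reasoning"] = {"effort": effort}
    return payload


print(migrated_request("GPT-4o", "Summarize this ticket."))
```

Leaving `reasoning` out for the drop-in rows keeps those requests identical to the old ones apart from the model name.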
|
|
361
|
+
|
|
362
|
+
### Migration Steps
|
|
363
|
+
|
|
364
|
+
1. **Switch models, don't change prompts yet** — Test model change in isolation
|
|
365
|
+
2. **Pin `reasoning.effort`** — Match prior model's latency/depth profile
|
|
366
|
+
3. **Run evals for baseline** — If results look good, ready to ship
|
|
367
|
+
4. **If regressions, try an initiative nudge first** — Before raising reasoning effort
|
|
368
|
+
5. **If still regressing, tune the prompt** — Use Prompt Optimizer + targeted constraints
|
|
369
|
+
6. **Re-run evals after each small change** — Iterate incrementally
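The ordering of the steps above can be sketched as a small decision function that is re-run after each eval pass. The score scale, the `tolerance` threshold, and the returned labels are assumptions made for illustration, not part of any official tooling.

```python
def next_migration_step(baseline: float, candidate: float,
                        nudge_tried: bool, tolerance: float = 0.01) -> str:
    """Pick the next action after comparing eval scores (higher is better)."""
    if candidate >= baseline - tolerance:
        return "ship"                  # no meaningful regression
    if not nudge_tried:
        return "add initiative nudge"  # cheapest fix, tried first
    return "tune prompt"               # nudge already tried, still regressing


for scores in [(0.90, 0.91, False), (0.90, 0.80, False), (0.90, 0.80, True)]:
    print(scores, "->", next_migration_step(*scores))
```

Calling the function once per change keeps the iteration incremental: each eval run leads to exactly one follow-up action.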
|
|
370
|
+
|
|
371
|
+
### Prompt Optimizer
|
|
372
|
+
|
|
373
|
+
OpenAI's [Prompt Optimizer](https://platform.openai.com/chat/edit?optimize=true) in Playground helps:
|
|
374
|
+
|
|
375
|
+
- Quickly improve existing prompts for GPT-5.5
|
|
376
|
+
- Migrate across GPT-5 models
|
|
377
|
+
- Remove common failure modes
|
|
378
|
+
|
|
379
|
+
---
|
|
380
|
+
|
|
381
|
+
## Complete Example: Enterprise Agent System Prompt
|
|
382
|
+
|
|
383
|
+
```xml
|
|
384
|
+
<role>
|
|
385
|
+
You are a GPT-5.5 enterprise assistant for [DOMAIN].
|
|
386
|
+
You are precise, analytical, persistent, and disciplined.
|
|
387
|
+
</role>
|
|
388
|
+
|
|
389
|
+
<output_contract>
|
|
390
|
+
- Define the exact output shape for each task type.
|
|
391
|
+
- Task is complete when [explicit completion criteria].
|
|
392
|
+
- If completion criteria cannot be met, explain what is missing and what would unblock it.
|
|
393
|
+
</output_contract>
|
|
394
|
+
|
|
395
|
+
<output_verbosity_spec>
|
|
396
|
+
- Default: 3-6 sentences or <=5 bullets for typical answers.
|
|
397
|
+
- For simple questions: <=2 sentences.
|
|
398
|
+
- For complex tasks: 1 overview paragraph + <=5 tagged bullets
|
|
399
|
+
(What changed, Where, Risks, Next steps, Open questions).
|
|
400
|
+
- Do not rephrase the user's request unless it changes semantics.
|
|
401
|
+
</output_verbosity_spec>
|
|
402
|
+
|
|
403
|
+
<design_and_scope_constraints>
|
|
404
|
+
- Implement EXACTLY and ONLY what the user requests.
|
|
405
|
+
- No extra features, no added components, no embellishments.
|
|
406
|
+
- If an instruction is ambiguous, choose the simplest valid interpretation.
|
|
407
|
+
</design_and_scope_constraints>
|
|
408
|
+
|
|
409
|
+
<initiative>
|
|
410
|
+
- Do not stop at the first plausible answer.
|
|
411
|
+
- Look for second-order issues, edge cases, and missing constraints.
|
|
412
|
+
- If the task is safety- or accuracy-critical, perform at least one verification step.
|
|
413
|
+
</initiative>
|
|
414
|
+
|
|
415
|
+
<uncertainty_and_ambiguity>
|
|
416
|
+
- If ambiguous: ask 1-3 clarifying questions OR present 2-3 interpretations with labeled assumptions.
|
|
417
|
+
- Never fabricate exact figures or references when uncertain.
|
|
418
|
+
- Prefer "Based on the provided context..." over absolute claims.
|
|
419
|
+
</uncertainty_and_ambiguity>
|
|
420
|
+
|
|
421
|
+
<tool_usage_rules>
|
|
422
|
+
- Prefer tools over internal knowledge for fresh/user-specific data.
|
|
423
|
+
- Parallelize independent reads when possible.
|
|
424
|
+
- Before calling a tool, explain in one sentence why you are calling it.
|
|
425
|
+
- After write/update: restate what changed, where, and validation performed.
|
|
426
|
+
</tool_usage_rules>
|
|
427
|
+
|
|
428
|
+
<user_updates_spec>
|
|
429
|
+
- Send brief updates (1-2 sentences) only when starting a new phase or when the plan changes.
|
|
430
|
+
- Avoid narrating routine tool calls.
|
|
431
|
+
- Each update must include a concrete outcome.
|
|
432
|
+
- Do not expand the task beyond what the user asked.
|
|
433
|
+
</user_updates_spec>
|
|
434
|
+
|
|
435
|
+
<high_risk_self_check>
|
|
436
|
+
Before finalizing in legal/financial/compliance/safety contexts:
|
|
437
|
+
- Re-scan for unstated assumptions, ungrounded claims, overly strong language.
|
|
438
|
+
- Soften or qualify as needed.
|
|
439
|
+
</high_risk_self_check>
|
|
440
|
+
```
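The template above contains bracketed placeholders such as `[DOMAIN]` that must be filled before use. One possible way to assemble the tagged sections and catch unfilled placeholders is sketched here; the helper name `render_prompt`, the placeholder regex, and the sample values are assumptions for illustration only.

```python
import re


def render_prompt(sections: dict[str, str], values: dict[str, str]) -> str:
    """Wrap each section in XML-style tags and fill [PLACEHOLDER] slots."""
    blocks = [f"<{tag}>\n{body.strip()}\n</{tag}>" for tag, body in sections.items()]
    prompt = "\n\n".join(blocks)
    for name, value in values.items():
        prompt = prompt.replace(f"[{name}]", value)
    # Fail loudly if any bracketed placeholder was left unfilled.
    leftover = re.findall(r"\[[A-Za-z][^\]\n]*\]", prompt)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return prompt


prompt = render_prompt(
    {
        "role": "You are a GPT-5.5 enterprise assistant for [DOMAIN].",
        "output_contract": "- Task is complete when [explicit completion criteria].",
    },
    {
        "DOMAIN": "insurance claims",
        "explicit completion criteria": "every claim field is validated",
    },
)
print(prompt)
```

Raising on leftover placeholders prevents a half-filled template from reaching production unnoticed.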
|
|
441
|
+
|
|
442
|
+
---
|
|
443
|
+
|
|
444
|
+
## Key Differences: GPT-5.5 vs GPT-5.2 vs Gemini 3.1 Pro
|
|
445
|
+
|
|
446
|
+
| Aspect | GPT-5.5 | GPT-5.2 | Gemini 3.1 Pro |
|
|
447
|
+
| --------------------- | ---------------------------------------- | ------------------------- | -------------------------- |
|
|
448
|
+
| Default reasoning | `medium` | `none` | `high` (dynamic) |
|
|
449
|
+
| Default verbosity | Direct, controllable via API | Low, prompt-controlled | Concise |
|
|
450
|
+
| Context window | 1.05M tokens | 400K tokens | 1M tokens |
|
|
451
|
+
| Temperature | Flexible | Flexible | Keep at 1.0 |
|
|
452
|
+
| Structured extraction | Strong (+ custom tool types) | Strong | Good |
|
|
453
|
+
| Tool prompting | Define triggers, evidence, stop rules | May need more scaffolding | Use direct tool rules |
|
|
454
|
+
| Multi-turn state | `previous_response_id` / reasoning items | Responses state | Thought signatures |
|
|
455
|
+
| Knowledge cutoff | Dec 1, 2025 | August 2025 | January 2025 |
|
|
456
|
+
| Best for | Agentic, coding, professional work | Enterprise, document | Reasoning, multimodal work |
|
|
457
|
+
|
|
458
|
+
---
|
|
459
|
+
|
|
460
|
+
## Pro Tips
|
|
461
|
+
|
|
462
|
+
1. **Define output contracts first** — Explicit completion criteria are the highest-leverage prompt change for GPT-5.5
|
|
463
|
+
|
|
464
|
+
2. **Use initiative nudges before raising reasoning effort** — Cheaper and often more effective
|
|
465
|
+
|
|
466
|
+
3. **Set `verbosity` at the API level** — Separate text brevity from code verbosity (Cursor pattern)
|
|
467
|
+
|
|
468
|
+
4. **Enable preambles for tool-use transparency** — "Before calling a tool, explain why" boosts accuracy
|
|
469
|
+
|
|
470
|
+
5. **Tune `reasoning.effort` by task shape** — Lower to `none` for execution-heavy workloads; raise to `high` for
|
|
471
|
+
complex multi-step problems
|
|
472
|
+
|
|
473
|
+
6. **Anchor long-context answers** — Reference specific sections even with 1.05M context available
|
|
474
|
+
|
|
475
|
+
7. **Migration is incremental** — One change at a time; model first, then reasoning effort, then prompt
|
|
476
|
+
|
|
477
|
+
8. **Parallelize tool calls** — Use `low` or higher for multi-step tool planning; measure tool yields, not raw calls
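Tips 3 and 5 both concern request-level settings, which might look like the following sketch. The nesting of `reasoning.effort` and `text.verbosity`, the allowed verbosity values, and the helper name `tuned_request` are assumptions based on how earlier GPT-5 models expose these parameters; verify them against the current API reference.

```python
EFFORT_LEVELS = {"none", "low", "medium", "high"}
VERBOSITY_LEVELS = {"low", "medium", "high"}  # assumed value set


def tuned_request(user_input: str, effort: str = "medium",
                  verbosity: str = "low", model: str = "gpt-5.5") -> dict:
    """Build request arguments with explicit effort and verbosity settings."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    return {
        "model": model,
        "input": user_input,
        "reasoning": {"effort": effort},   # depth of deliberation
        "text": {"verbosity": verbosity},  # length of the written answer
    }


# Execution-heavy task: no extra deliberation, terse prose.
print(tuned_request("Rename the variable foo to bar.", effort="none"))
# Complex multi-step problem: more deliberation.
print(tuned_request("Plan the database migration.", effort="high"))
```

Validating both values up front surfaces typos locally rather than as an API error.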
|
|
478
|
+
|
|
479
|
+
---
|
|
480
|
+
|
|
481
|
+
## Sources
|
|
482
|
+
|
|
483
|
+
- [OpenAI: Prompt Guidance for GPT-5.5](https://developers.openai.com/api/docs/guides/prompt-guidance)
|
|
484
|
+
- [OpenAI: Using GPT-5.5](https://developers.openai.com/api/docs/guides/latest-model)
|
|
485
|
+
- [OpenAI: GPT-5.5 Model](https://developers.openai.com/api/docs/models/gpt-5.5)
|
|
486
|
+
- [OpenAI: Introducing GPT-5.5](https://openai.com/index/introducing-gpt-5-5/)
|
|
487
|
+
- [OpenAI: Reasoning Models](https://developers.openai.com/api/docs/guides/reasoning)
|
|
488
|
+
- [OpenAI: Reasoning Best Practices](https://developers.openai.com/api/docs/guides/reasoning-best-practices)
|
|
489
|
+
- [OpenAI Cookbook: GPT-5 Prompting Guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_prompting_guide)
|
|
490
|
+
- [OpenAI Cookbook: GPT-5 New Params and Tools](https://cookbook.openai.com/examples/gpt-5/gpt-5_new_params_and_tools)
|