ifcraftcorpus 1.4.0__py3-none-any.whl → 1.6.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (60)
  1. ifcraftcorpus/index.py +5 -1
  2. ifcraftcorpus-1.6.0.data/data/share/ifcraftcorpus/corpus/agent-design/agent_memory_architecture.md +818 -0
  3. ifcraftcorpus-1.6.0.data/data/share/ifcraftcorpus/corpus/agent-design/agent_prompt_engineering.md +1481 -0
  4. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/agent-design/multi_agent_patterns.md +1 -0
  5. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/genre-conventions/fantasy_conventions.md +4 -0
  6. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/prose-and-language/narrative_point_of_view.md +3 -4
  7. {ifcraftcorpus-1.4.0.dist-info → ifcraftcorpus-1.6.0.dist-info}/METADATA +1 -1
  8. {ifcraftcorpus-1.4.0.dist-info → ifcraftcorpus-1.6.0.dist-info}/RECORD +59 -58
  9. ifcraftcorpus-1.4.0.data/data/share/ifcraftcorpus/corpus/agent-design/agent_prompt_engineering.md +0 -750
  10. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/audience-and-access/accessibility_guidelines.md +0 -0
  11. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/audience-and-access/audience_targeting.md +0 -0
  12. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/audience-and-access/localization_considerations.md +0 -0
  13. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/craft-foundations/audio_visual_integration.md +0 -0
  14. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/craft-foundations/collaborative_if_writing.md +0 -0
  15. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/craft-foundations/creative_workflow_pipeline.md +0 -0
  16. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/craft-foundations/diegetic_design.md +0 -0
  17. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/craft-foundations/idea_capture_and_hooks.md +0 -0
  18. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/craft-foundations/if_platform_tools.md +0 -0
  19. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/craft-foundations/player_analytics_metrics.md +0 -0
  20. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/craft-foundations/quality_standards_if.md +0 -0
  21. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/craft-foundations/research_and_verification.md +0 -0
  22. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/craft-foundations/testing_interactive_fiction.md +0 -0
  23. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/emotional-design/conflict_patterns.md +0 -0
  24. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/emotional-design/emotional_beats.md +0 -0
  25. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/game-design/mechanics_design_patterns.md +0 -0
  26. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/genre-conventions/children_and_ya_conventions.md +0 -0
  27. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/genre-conventions/historical_fiction.md +0 -0
  28. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/genre-conventions/horror_conventions.md +0 -0
  29. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/genre-conventions/mystery_conventions.md +0 -0
  30. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/genre-conventions/sci_fi_conventions.md +0 -0
  31. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/narrative-structure/branching_narrative_construction.md +0 -0
  32. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/narrative-structure/branching_narrative_craft.md +0 -0
  33. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/narrative-structure/endings_patterns.md +0 -0
  34. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/narrative-structure/episodic_serialized_if.md +0 -0
  35. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/narrative-structure/nonlinear_structure.md +0 -0
  36. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/narrative-structure/pacing_and_tension.md +0 -0
  37. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/narrative-structure/romance_and_relationships.md +0 -0
  38. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/narrative-structure/scene_structure_and_beats.md +0 -0
  39. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/narrative-structure/scene_transitions.md +0 -0
  40. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/prose-and-language/character_voice.md +0 -0
  41. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/prose-and-language/dialogue_craft.md +0 -0
  42. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/prose-and-language/exposition_techniques.md +0 -0
  43. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/prose-and-language/prose_patterns.md +0 -0
  44. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/prose-and-language/subtext_and_implication.md +0 -0
  45. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/prose-and-language/voice_register_consistency.md +0 -0
  46. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/scope-and-planning/scope_and_length.md +0 -0
  47. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/world-and-setting/canon_management.md +0 -0
  48. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/world-and-setting/setting_as_character.md +0 -0
  49. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/corpus/world-and-setting/worldbuilding_patterns.md +0 -0
  50. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/subagents/README.md +0 -0
  51. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/subagents/if_genre_consultant.md +0 -0
  52. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/subagents/if_platform_advisor.md +0 -0
  53. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/subagents/if_prose_writer.md +0 -0
  54. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/subagents/if_quality_reviewer.md +0 -0
  55. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/subagents/if_story_architect.md +0 -0
  56. {ifcraftcorpus-1.4.0.data → ifcraftcorpus-1.6.0.data}/data/share/ifcraftcorpus/subagents/if_world_curator.md +0 -0
  57. {ifcraftcorpus-1.4.0.dist-info → ifcraftcorpus-1.6.0.dist-info}/WHEEL +0 -0
  58. {ifcraftcorpus-1.4.0.dist-info → ifcraftcorpus-1.6.0.dist-info}/entry_points.txt +0 -0
  59. {ifcraftcorpus-1.4.0.dist-info → ifcraftcorpus-1.6.0.dist-info}/licenses/LICENSE +0 -0
  60. {ifcraftcorpus-1.4.0.dist-info → ifcraftcorpus-1.6.0.dist-info}/licenses/LICENSE-CONTENT +0 -0
@@ -1,750 +0,0 @@
- ---
- title: Agent Prompt Engineering
- summary: Techniques for crafting effective LLM agent prompts—attention patterns, tool design, context layering, model size considerations, and testing strategies.
- topics:
- - prompt-engineering
- - llm-agents
- - attention-patterns
- - tool-design
- - context-management
- - small-models
- - chain-of-thought
- - few-shot-learning
- cluster: agent-design
- ---
-
- # Agent Prompt Engineering
-
- Techniques for crafting effective prompts for LLM agents—attention patterns, tool design, context layering, and strategies for different model sizes.
-
- This document is useful both for agents creating content AND for humans designing agents.
-
- ---
-
- ## Attention Patterns
-
- ### Lost in the Middle
-
- LLMs exhibit a U-shaped attention curve: information at the **beginning** and **end** of prompts receives stronger attention than content in the middle.
-
- ```
- Position in prompt: [START] -------- [MIDDLE] -------- [END]
- Attention strength:   HIGH              LOW             HIGH
- ```
-
- Critical instructions placed in the middle of a long prompt may be ignored, even by otherwise capable models.
-
- ### The Sandwich Pattern
-
- For critical instructions, repeat them at the **start AND end** of the prompt:
-
- ```markdown
- ## CRITICAL: You are an orchestrator. NEVER write prose yourself.
-
- [... 500+ lines of context ...]
-
- ## REMINDER: You are an orchestrator. NEVER write prose yourself.
- ```
-
- ### Ordering for Attention
-
- Structure prompts strategically given the U-shaped curve (see the sketch after these lists):
-
- **Recommended order:**
-
- 1. **Critical behavioral constraints** (lines 1-20)
- 2. **Role identity and purpose** (lines 21-50)
- 3. **Tool descriptions** (if using function calling)
- 4. **Reference material** (middle—lowest attention)
- 5. **Knowledge summaries** (for retrieval patterns)
- 6. **Critical reminder** (last 10-20 lines)
-
- **What goes in the middle:**
-
- Lower-priority content that can be retrieved on demand:
-
- - Detailed procedures
- - Reference tables
- - Quality criteria details
- - Examples (use retrieval when possible)
-
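- A minimal sketch of this ordering in Python, assuming a plain string-based prompt builder (the section names and helper are illustrative, not from any specific framework):
-
- ```python
- def build_system_prompt(constraints: str, role: str, tools: str,
-                         reference: str, knowledge_menu: str) -> str:
-     """Assemble a system prompt along the U-shaped attention curve:
-     constraints first, low-priority reference in the middle, and a
-     sandwich-style repeat of the constraints at the very end."""
-     sections = [
-         f"## CRITICAL\n{constraints}",        # start: highest attention
-         f"## Role\n{role}",
-         f"## Tools\n{tools}",
-         f"## Reference\n{reference}",         # middle: lowest attention
-         f"## Knowledge Index\n{knowledge_menu}",
-         f"## REMINDER\n{constraints}",        # end: highest attention again
-     ]
-     return "\n\n".join(sections)
- ```
-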
- ---
-
- ## Tool Design
-
- ### Tool Count Effects
-
- Tool count strongly correlates with compliance, especially for smaller models:
-
- | Tool Count | Compliance Rate (8B model) |
- |------------|---------------------------|
- | 6 tools | ~100% |
- | 12 tools | ~85% |
- | 20 tools | ~70% |
-
- **Recommendations:**
-
- - **Small models (≤8B)**: Limit to 6-8 tools
- - **Medium models (9B-70B)**: Up to 12 tools
- - **Large models (70B+)**: Can handle 15+ but consider UX
-
- ### Tool Schema Overhead
-
- Tool schemas sent via function calling are often larger than the system prompt itself:
-
- | Component | Typical Size |
- |-----------|--------------|
- | Tool name | ~5 tokens |
- | Description | 50-150 tokens |
- | Parameter schema | 100-300 tokens |
- | **Per tool total** | 150-450 tokens |
- | **13 tools** | **2,000-5,900 tokens** |
-
- ### Optimization Strategies
-
- **1. Model-Class Filtering**
-
- Define reduced tool sets for small models:
-
- ```json
- {
-   "tools": ["delegate", "communicate", "search", "save", ...],
-   "small_model_tools": ["delegate", "communicate", "save"]
- }
- ```
-
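- A sketch of how a runtime might honor this config when binding tools; the config shape matches the JSON above, and the model-class names are illustrative:
-
- ```python
- def select_tools(config: dict, model_class: str) -> list[str]:
-     """Return the reduced tool set for small models, the full set otherwise."""
-     if model_class == "small" and "small_model_tools" in config:
-         return config["small_model_tools"]
-     return config["tools"]
-
- config = {
-     "tools": ["delegate", "communicate", "search", "save"],
-     "small_model_tools": ["delegate", "communicate", "save"],
- }
- assert select_tools(config, "small") == ["delegate", "communicate", "save"]
- ```
-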
- **2. Two-Stage Selection**
-
- For large tool libraries (20+), use two stages (sketched below):
-
- 1. Show a lightweight menu (name + summary only)
- 2. The agent selects relevant tools
- 3. Load full schemas only for the selected tools
-
- Research shows 50%+ token reduction with 3x accuracy improvement.
-
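- A minimal sketch of the two stages, assuming a tool library keyed by name and a `pick_tools` callable standing in for the agent's menu-selection call (both illustrative):
-
- ```python
- def two_stage_tool_load(library: dict[str, dict], task: str, pick_tools) -> list[dict]:
-     """Stage 1: offer a lightweight menu (name + summary only).
-     Stage 2: load full schemas only for the tools the agent picked."""
-     menu = [{"name": name, "summary": tool["summary"]}
-             for name, tool in library.items()]
-     selected = pick_tools(task, menu)  # agent chooses from the menu
-     return [library[name]["schema"] for name in selected if name in library]
- ```
-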
- **3. Deferred Loading**
-
- Mark specialized tools as discoverable but not pre-loaded. They appear in a search interface rather than being sent to the API upfront.
-
- **4. Concise Descriptions**
-
- 1-2 sentences max. Move detailed usage guidance to knowledge entries.
-
- **Before** (~80 tokens):
-
- > "Delegate work to another agent. This hands off control until the agent completes the task. Provide task description, context, expected outputs, and quality criteria. The receiving agent executes and returns control with artifacts and assessment."
-
- **After** (~20 tokens):
-
- > "Hand off a task to another agent. Control returns when they complete."
-
- **5. Minimal Parameter Schemas**
-
- For small models, simplify schemas:
-
- **Full** (~200 tokens): All optional parameters with descriptions
-
- **Minimal** (~50 tokens): Only required parameters
-
- Optional parameters can use reasonable defaults.
-
- ### Tool Description Biasing
-
- Tool descriptions have **higher influence** than system prompt content when models decide which tool to call.
-
- **Problem:**
-
- If a tool description contains prescriptive language ("ALWAYS use this", "This is the primary method"), models will prefer that tool regardless of system prompt instructions.
-
- **Solution:**
-
- Use **neutral, descriptive** tool descriptions. Let the **system prompt** dictate when to use tools.
-
- **Anti-pattern:**
-
- > "ALWAYS use this tool to create story content. This is the primary way to generate text."
-
- **Better:**
-
- > "Creates story prose from a brief. Produces narrative text with dialogue and descriptions."
-
- ---
-
- ## Context Architecture
-
- ### The Four Layers
-
- Organize agent prompts into distinct layers:
-
- | Layer | Purpose | Token Priority |
- |-------|---------|----------------|
- | **System** | Core identity, constraints | High (always include) |
- | **Task** | Current instructions | High |
- | **Tool** | Tool descriptions/schemas | Medium (filter for small models) |
- | **Memory** | Historical context | Variable (summarize as needed) |
-
- ### Benefits of Layer Separation
-
- - **Debugging**: Isolate which layer caused unexpected behavior
- - **Model switching**: System layer stays constant across model sizes
- - **Token management**: Each layer can be independently compressed
- - **Caching**: System and tool layers can be cached between turns
-
- ### Menu + Consult Pattern
-
- For knowledge that agents need access to, but that should not always sit in context:
-
- **Structure:**
-
- ```
- System prompt contains:
- - Summary/menu showing what knowledge exists
- - Tool to retrieve full details
-
- System prompt does NOT contain:
- - Full knowledge content
- - Detailed procedures
- - Reference material
- ```
-
- **Benefits:**
-
- - Smaller initial prompt
- - Agent can "pull" knowledge when needed
- - Works well with small models
-
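- A sketch of the pattern, assuming a hypothetical in-memory knowledge store; `knowledge_menu` feeds the system prompt, while `consult` is exposed as a tool:
-
- ```python
- KNOWLEDGE = {  # illustrative store: id -> (menu summary, full entry)
-     "delegation": ("How to hand off tasks", "Full delegation procedure ..."),
-     "quality": ("Prose quality criteria", "Detailed quality rubric ..."),
- }
-
- def knowledge_menu() -> str:
-     """Goes in the system prompt: one line per entry, never the full content."""
-     return "\n".join(f"- {key}: {summary}"
-                      for key, (summary, _) in KNOWLEDGE.items())
-
- def consult(entry_id: str) -> str:
-     """Exposed as a tool so the agent can pull full details on demand."""
-     return KNOWLEDGE.get(entry_id, ("", "Unknown entry."))[1]
- ```
-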
- ### When to Inject vs. Consult
-
- | Content Type | Small Model | Large Model |
- |--------------|-------------|-------------|
- | Role identity | Inject | Inject |
- | Behavioral constraints | Inject | Inject |
- | Workflow procedures | Consult | Inject or Consult |
- | Quality criteria | Consult | Inject |
- | Reference material | Consult | Consult |
-
- ---
-
- ## Model Size Considerations
-
- ### Token Budgets
-
- | Model Class | Recommended System Prompt |
- |-------------|---------------------------|
- | Small (≤8B) | ≤2,000 tokens |
- | Medium (9B-70B) | ≤6,000 tokens |
- | Large (70B+) | ≤12,000 tokens |
-
- Exceeding these budgets leads to:
-
- - Ignored instructions (especially in the middle)
- - Reduced tool compliance
- - Hallucinated responses
-
- ### Instruction Density
-
- Small models struggle with:
-
- - Conditional logic: "If X and not Y, then Z unless W"
- - Multiple competing priorities
- - Nuanced edge cases
-
- **Simplify for small models:**
-
- - "Always call delegate" (not "call delegate unless validating")
- - One instruction per topic
- - Remove edge case handling (accept lower quality)
-
- ### Concise Content Pattern
-
- Provide two versions of guidance:
-
- ```json
- {
-   "summary": "Orchestrators delegate tasks to specialists. Before delegating, consult the relevant playbook to understand the workflow. Pass artifact IDs between steps. Monitor completion.",
-   "concise_summary": "Delegate to specialists. Consult playbook first."
- }
- ```
-
- The runtime selects the appropriate version based on model class.
-
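- Selection can then be a one-line branch at prompt-build time; a sketch, assuming entries shaped like the JSON above:
-
- ```python
- def pick_summary(entry: dict, model_class: str) -> str:
-     """Small models get the short form; larger models get the full summary."""
-     if model_class == "small" and "concise_summary" in entry:
-         return entry["concise_summary"]
-     return entry["summary"]
- ```
-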
- ### Semantic Ambiguity
-
- Avoid instructions that can be interpreted multiple ways.
-
- **Anti-pattern:**
-
- > "Use your best judgment to determine when validation is needed."
-
- Small models may interpret this as "never validate" or "always validate."
-
- **Better:**
-
- > "Call validate after every save."
-
- ---
-
- ## Sampling Parameters
-
- Sampling parameters control the randomness and diversity of LLM outputs. The two most important are **temperature** and **top_p**. These can be set per API call, enabling different settings for different phases of a workflow.
-
- ### Temperature
-
- Temperature controls the probability distribution over tokens. Lower values make the model more deterministic; higher values increase randomness and creativity.
-
- | Temperature | Effect | Use Cases |
- |-------------|--------|-----------|
- | 0.0–0.2 | Highly deterministic, consistent | Structured output, tool calling, factual responses |
- | 0.3–0.5 | Balanced, slight variation | General conversation, summarization |
- | 0.6–0.8 | More creative, diverse | Brainstorming, draft generation |
- | 0.9–1.0+ | High randomness, exploratory | Creative writing, idea exploration, poetry |
-
- **How it works:** Temperature scales the logits (pre-softmax scores) before sampling. At T=0, the model always picks the highest-probability token. At T>1, probability differences flatten, making unlikely tokens more probable.
-
- **Caveats:**
-
- - Even T=0 isn't fully deterministic—hardware concurrency and floating-point variations can introduce tiny differences
- - High temperature increases hallucination risk
- - Temperature interacts with top_p; tuning both simultaneously requires care
-
- ### Top_p (Nucleus Sampling)
-
- Top_p limits sampling to the smallest set of tokens whose cumulative probability exceeds p. This provides a different control over diversity than temperature.
-
- | Top_p | Effect |
- |-------|--------|
- | 0.1–0.3 | Very focused, few token choices |
- | 0.5–0.7 | Moderate diversity |
- | 0.9–1.0 | Wide sampling, more variation |
-
- **Temperature vs Top_p:**
-
- - Temperature affects *all* token probabilities uniformly
- - Top_p dynamically adjusts the candidate pool based on probability mass
- - For most use cases, adjust one and leave the other at its default
- - Common pattern: low temperature (0.0–0.3) with top_p=1.0 for structured tasks
-
- ### Provider Temperature Ranges
-
- | Provider | Range | Default | Notes |
- |----------|-------|---------|-------|
- | OpenAI | 0.0–2.0 | 1.0 | Values >1.0 increase randomness significantly |
- | Anthropic | 0.0–1.0 | 1.0 | Cannot exceed 1.0 |
- | Gemini | 0.0–2.0 | 1.0 | Similar to OpenAI |
- | Ollama | 0.0–1.0+ | 0.7–0.8 | Model-dependent defaults |
-
- ### Phase-Specific Temperature
-
- Since temperature can be set per API call, use different values for different workflow phases:
-
- | Phase | Temperature | Rationale |
- |-------|-------------|-----------|
- | Brainstorming/Discuss | 0.7–1.0 | Encourage diverse ideas, exploration |
- | Planning/Freeze | 0.3–0.5 | Balance creativity with coherence |
- | Serialize/Tool calls | 0.0–0.2 | Maximize format compliance |
- | Validation repair | 0.0–0.2 | Deterministic corrections |
-
- This is particularly relevant for the **Discuss → Freeze → Serialize** pattern described below—each stage benefits from different temperature settings.
-
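- A sketch of per-phase sampling settings using the ranges from the table above (the phase names and exact values are illustrative):
-
- ```python
- PHASE_TEMPERATURE = {
-     "discuss": 0.9,    # diverse ideas, exploration
-     "freeze": 0.4,     # coherent, explicit summary
-     "serialize": 0.1,  # strict format compliance
-     "repair": 0.1,     # deterministic corrections
- }
-
- def sampling_params(phase: str) -> dict:
-     """Per-call parameters; top_p stays at its default, per the guidance above."""
-     return {"temperature": PHASE_TEMPERATURE.get(phase, 0.5), "top_p": 1.0}
- ```
-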
- ---
-
- ## Structured Output Pipelines
-
- Many agent tasks end in a **strict artifact**—JSON/YAML configs, story plans, outlines—rather than free-form prose. Trying to get both *conversation* and *perfectly formatted output* from a single response is brittle, especially for small/local models.
-
- A more reliable approach is to separate the flow into stages:
-
- 1. **Discuss** – messy, human-friendly turns to clarify goals and constraints. No structured output yet.
- 2. **Freeze** – summarize final decisions into a compact, explicit list (facts & constraints).
- 3. **Serialize** – a dedicated call whose only job is to emit the structured artifact, constrained by a schema or tool signature.
-
- ### Discuss → Freeze → Serialize
-
- **Discuss** (temperature 0.7–1.0): Keep prompts focused on meaning, not field names. Explicitly tell the model *not* to output JSON/YAML during this phase. Higher temperature encourages diverse ideas and creative exploration.
-
- **Freeze** (temperature 0.3–0.5): Compress decisions into a short summary:
-
- - 10–30 bullets, one decision per line.
- - No open questions, only resolved choices.
- - Structured enough that a smaller model can follow it reliably.
- - Moderate temperature balances coherence with flexibility.
-
- **Serialize** (temperature 0.0–0.2): In a separate call:
-
- - Provide the schema (JSON Schema, typed model, or tool definition).
- - Instruct: *"Output only JSON that matches this schema. No prose, no markdown fences."*
- - Use constrained decoding/tool calling where available.
- - Low temperature maximizes format compliance.
-
- This separates conversational drift from serialization, which significantly improves reliability for structured outputs like story plans, world-bible slices, or configuration objects. The temperature gradient—high for exploration, low for precision—matches each phase's purpose.
-
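- A minimal sketch of the freeze and serialize calls, assuming a `chat(messages, temperature)` callable that wraps any chat-completion API (the prompts are illustrative):
-
- ```python
- def freeze_and_serialize(chat, discussion: list[dict], schema: str) -> str:
-     """Two calls, two temperatures; the discuss turns happen beforehand."""
-     # Freeze: compress decisions into explicit, resolved bullets.
-     freeze = chat(discussion + [{"role": "user", "content":
-         "Summarize every final decision as short bullets, one per line. "
-         "No open questions, no JSON."}], temperature=0.4)
-     # Serialize: a dedicated call whose only job is the artifact.
-     return chat([{"role": "user", "content":
-         f"Decisions:\n{freeze}\n\nSchema:\n{schema}\n\n"
-         "Output only JSON that matches this schema. No prose, no markdown fences."}],
-         temperature=0.1)
- ```
-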
- ### Tool-Gated Finalization
-
- An alternative is to represent the structured output as a **single tool call**:
-
- - During normal conversation: no tools are called.
- - On FINALIZE: the agent must call a tool such as `submit_plan(plan: PlanSchema)` exactly once.
-
- Pros:
-
- - Structured data arrives as typed arguments (no text parsing).
- - The runtime can validate arguments immediately.
-
- Cons:
-
- - Some models occasionally skip the tool call or send partial arguments.
-
- Pattern in practice:
-
- - Prefer tool-gated finalization when your stack treats tools as first-class.
- - Keep a fallback: if the tool call doesn’t happen, fall back to a serialize-only call using the freeze summary (see the sketch below).
-
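- A sketch of that fallback logic, assuming a runtime that returns tool calls as dicts; `call_with_tools` and `serialize_only` stand in for your API wrappers:
-
- ```python
- def finalize(call_with_tools, serialize_only, freeze_summary: str) -> dict:
-     """Prefer the single submit_plan tool call; fall back to serialize-only."""
-     response = call_with_tools(freeze_summary)  # the FINALIZE turn
-     for tool_call in response.get("tool_calls", []):
-         if tool_call["name"] == "submit_plan":
-             return tool_call["arguments"]  # typed arguments, no text parsing
-     return serialize_only(freeze_summary)  # model skipped the tool call
- ```
-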
- ---
-
- ## Validate → Feedback → Repair Loop
-
- Even with good prompts, structured output will sometimes be **almost** right. Instead of accepting failures or silently discarding data, use a validate-with-feedback loop (sketched below):
-
- 1. Generate a candidate object (JSON/tool args/text).
- 2. Validate it in code (schema/type checks, domain rules).
- 3. If invalid, feed back the errors and ask the model to repair **only** the problems.
- 4. Repeat for a small, fixed number of attempts.
-
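- A minimal sketch of the loop, assuming `chat(prompt)` wraps an LLM call and `validate(obj)` returns a list of error strings (empty means valid):
-
- ```python
- import json
-
- def generate_valid(chat, prompt: str, validate, max_attempts: int = 3) -> dict:
-     candidate = chat(prompt)
-     for _ in range(max_attempts):
-         try:
-             obj = json.loads(candidate)
-             errors = validate(obj)
-         except json.JSONDecodeError as exc:
-             obj, errors = None, [f"invalid JSON: {exc}"]
-         if obj is not None and not errors:
-             return obj  # candidate passed all checks
-         candidate = chat(  # repair turn: errors only, strict instructions
-             f"Previous output:\n{candidate}\n\nErrors:\n" + "\n".join(errors)
-             + "\n\nReturn corrected JSON that fixes only these errors. Output only JSON.")
-     raise ValueError("validation failed after retries")  # clear failure state
- ```
-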
- ### Validation Channels
-
- Typical validators:
-
- - **Schema/type validation:** JSON Schema, Pydantic/dataclasses, or your own type checks.
- - **Domain rules:** length ranges, allowed enum values, cross-field consistency (e.g., word count vs. estimated playtime).
- - **Link/graph checks:** required references exist, no impossible states.
-
- ### Designing the Feedback Prompt
-
- When a candidate fails validation, the repair prompt should:
-
- - Include the previous candidate object verbatim.
- - Include a concise list of validation errors, grouped by field.
- - Give strict instructions, e.g.:
-
- > “Return a corrected JSON object that fixes **only** these errors. Do not change fields that are not mentioned. Output only JSON.”
-
- For small models, keep error descriptions compact and concrete rather than abstract ("string too long: 345 > max 200").
-
- ### Structured Validation Feedback
-
- Rather than returning free-form error messages, use a structured feedback format that leverages attention patterns (status first, action last) and distinguishes error types clearly.
-
- **Result Categories**
-
- Use a semantic result enum rather than a boolean success/failure:
-
- | Result | Meaning | Model Action |
- |--------|---------|--------------|
- | `accepted` | Validation passed, artifact stored | Proceed to next step |
- | `validation_failed` | Content issues the model can fix | Repair and resubmit |
- | `tool_error` | Infrastructure failure | Retry unchanged or escalate |
-
- This distinction matters: `validation_failed` tells the model its *content* was wrong (fixable), while `tool_error` indicates the tool itself failed (retry or give up).
-
- **Error Categorization**
-
- Group validation errors by type to help the model understand what went wrong:
-
- ```json
- {
-   "result": "validation_failed",
-   "issues": {
-     "invalid": [
-       {"field": "estimated_passages", "value": 15, "requirement": "must be 1-10"}
-     ],
-     "missing": ["protagonist_name", "setting"],
-     "unknown": ["passages"]
-   },
-   "issue_count": {"invalid": 1, "missing": 2, "unknown": 1},
-   "action": "Fix the 4 issues above and resubmit. Use exact field names from the schema."
- }
- ```
-
- | Category | Meaning | Common Cause |
- |----------|---------|--------------|
- | `invalid` | Field present but value wrong | Constraint violation, wrong type |
- | `missing` | Required field not provided | Omission, incomplete output |
- | `unknown` | Field not in schema | Typo, hallucinated field name |
-
- The `unknown` category is particularly valuable—it catches near-misses like `passages` instead of `estimated_passages` that would otherwise appear as "missing" with no hint about the typo.
-
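- A sketch of a validator that emits this shape; the `constraints` map (field -> check and requirement text) is an illustrative convention, not a library API:
-
- ```python
- def categorize_issues(submitted: dict, required: set, constraints: dict) -> dict:
-     """Build invalid/missing/unknown feedback as in the example above."""
-     known = required | set(constraints)
-     issues = {
-         "invalid": [{"field": f, "value": submitted[f], "requirement": text}
-                     for f, (check, text) in constraints.items()
-                     if f in submitted and not check(submitted[f])],
-         "missing": sorted(required - submitted.keys()),
-         "unknown": sorted(submitted.keys() - known),
-     }
-     counts = {kind: len(found) for kind, found in issues.items()}
-     failed = any(counts.values())
-     return {
-         "result": "validation_failed" if failed else "accepted",
-         "issues": issues,
-         "issue_count": counts,
-         "action": f"Fix the {sum(counts.values())} issues above and resubmit."
-                   if failed else "Accepted.",
-     }
- ```
-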
- **Field Ordering (Primacy/Recency)**
-
- Structure feedback to exploit the U-shaped attention curve:
-
- 1. **Result status** (first—immediate orientation)
- 2. **Issues by category** (middle—detailed content)
- 3. **Issue count** (severity summary)
- 4. **Action instructions** (last—what to do next)
-
- **What NOT to Include**
-
- | Avoid | Why |
- |-------|-----|
- | Full schema | Already in tool definition; wastes tokens in retry loops |
- | Boolean `success` field | Ambiguous; use semantic result categories instead |
- | Generic hints | Replace with actionable, field-specific instructions |
- | Valid fields | Only describe what failed, not what succeeded |
-
- **Example: Before and After**
-
- Anti-pattern (vague, wastes tokens):
-
- ```
- Error: Validation failed. Expected fields: type, title, protagonist_name,
- setting, theme, estimated_passages, tone. Please check your submission
- and ensure all required fields are present with valid values.
- ```
-
- Better (specific, actionable):
-
- ```json
- {
-   "result": "validation_failed",
-   "issues": {
-     "invalid": [{"field": "type", "value": "story", "requirement": "must be 'dream'"}],
-     "missing": ["protagonist_name"],
-     "unknown": ["passages"]
-   },
-   "action": "Fix these 3 issues. Did you mean 'estimated_passages' instead of 'passages'?"
- }
- ```
-
- The improved version:
-
- - Names the exact fields that failed
- - Suggests the likely typo (`passages` → `estimated_passages`)
- - Doesn't repeat schema information already available to the model
- - Ends with a clear action instruction (primacy/recency)
-
- ### Retry Budget and Token Efficiency
-
- Validation loops consume tokens. Design for efficiency:
-
- - **Cap retries**: 2-3 attempts is usually sufficient; more indicates a prompt or schema problem
- - **Escalate gracefully**: After the retry budget is exhausted, surface a clear failure rather than looping
- - **Track retry rates**: High retry rates signal opportunities for prompt improvement or schema simplification
- - **Consider model capability**: Less capable models may need higher retry budgets but with simpler feedback
-
- ### Best Practices
-
- - **Independent validator:** Treat validation as a separate layer or service whenever possible; don’t let the same model decide if its own output is valid.
- - **Retry budget:** Cap the number of repair attempts; surface a clear failure state instead of looping indefinitely.
- - **Partial success:** Prefer emitting valid-but-partial objects over invalid-but-complete-looking ones; downstream systems can handle missing optional fields more safely than malformed structure.
-
- Validate → feedback → repair is a general pattern:
-
- - Works for schema-bound JSON/YAML.
- - Works for more informal artifacts (e.g., checklists, outlines) when combined with lightweight structural checks.
- - Plays well with the structured-output patterns above and with the reflection/self-critique patterns below.
-
- ---
-
- ## Prompt-History Conflicts
-
- When the system prompt says "MUST do X first" but the conversation history shows the model already did Y, confusion results.
-
- **Problem:**
-
- ```
- System: "You MUST call consult_playbook before any delegation."
- History: [delegate(...) was called successfully]
- Model: "But I already delegated... should I undo it?"
- ```
-
- **Solutions:**
-
- 1. **Use present-tense rules**: "Call consult_playbook before delegating" not "MUST call first"
- 2. **Acknowledge state**: "If you haven't yet consulted the playbook, do so now"
- 3. **Avoid absolute language** when state may vary
-
- ---
-
- ## Chain-of-Thought (CoT)
-
- For complex logical tasks, forcing the model to articulate its reasoning *before* acting significantly reduces hallucination and logic errors.
-
- ### The Problem
-
- Zero-shot tool calling often fails on multi-step problems because the model commits to an action before fully processing constraints.
-
- ### Implementation
-
- Require explicit reasoning steps:
-
- - **Structure**: `<thought>Analysis...</thought>` followed by the tool call
- - **Tooling**: Add a mandatory `reasoning` parameter to critical tools (example below)
- - **Benefits**: +40-50% improvement on complex reasoning benchmarks
-
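- For example, a hypothetical OpenAI-style function schema where `reasoning` is required and listed first, so the analysis is produced before the action fields:
-
- ```python
- # Hypothetical critical tool: the required `reasoning` field comes first.
- delegate_tool = {
-     "name": "delegate",
-     "description": "Hand off a task to another agent.",
-     "parameters": {
-         "type": "object",
-         "properties": {
-             "reasoning": {"type": "string",
-                           "description": "Why this delegation, and why now."},
-             "agent": {"type": "string"},
-             "task": {"type": "string"},
-         },
-         "required": ["reasoning", "agent", "task"],
-     },
- }
- ```
-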
- ### When to Use
-
- - Multi-step planning decisions
- - Constraint satisfaction problems
- - Quality assessments with multiple criteria
- - Decisions with long-term consequences
-
- ---
-
- ## Dynamic Few-Shot Prompting
-
- Static example lists consume tokens and may not match the current task.
-
- ### The Pattern
-
- Use retrieval to inject context-aware examples (sketched below):
-
- 1. **Store** a library of high-quality examples as vectors
- 2. **Query** using the current task description
- 3. **Inject** the top 3-5 most relevant examples into the prompt
-
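- A minimal sketch using plain cosine similarity; `embed(text)` stands in for any embedding model, and the example records (precomputed "vector" and "text" fields) are an assumed shape:
-
- ```python
- import math
-
- def inject_examples(embed, task: str, library: list[dict], k: int = 3) -> str:
-     """Return the k most relevant stored examples for the current task."""
-     def cosine(a, b):
-         dot = sum(x * y for x, y in zip(a, b))
-         norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
-         return dot / norm
-     query = embed(task)
-     ranked = sorted(library, key=lambda ex: cosine(query, ex["vector"]),
-                     reverse=True)
-     return "\n\n".join(ex["text"] for ex in ranked[:k])
- ```
-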
- ### Benefits
-
- - Smaller prompts (no static example bloat)
- - More relevant examples for each task
- - Examples improve as the library grows
-
- ### When to Use
-
- - Tasks requiring stylistic consistency
- - Complex tool usage patterns
- - Domain-specific formats
-
- ---
-
- ## Reflection and Self-Correction
-
- Models perform significantly better when asked to critique their own work before finalizing.
-
- ### The Pattern
-
- Implement a "Draft-Critique-Refine" loop (sketched below):
-
- 1. **Draft**: Generate a preliminary plan or content
- 2. **Critique**: Evaluate it against constraints
- 3. **Refine**: Generate the final output based on the critique
-
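- A sketch of the two-turn variant, with `chat(prompt)` standing in for any completion API (the prompts are illustrative):
-
- ```python
- def draft_critique_refine(chat, task: str, constraints: str) -> str:
-     draft = chat(f"Task: {task}\nConstraints: {constraints}\nWrite a draft.")
-     critique = chat(f"Constraints: {constraints}\nDraft:\n{draft}\n"
-                     "List every way the draft violates the constraints. Be specific.")
-     return chat(f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\n"
-                 "Produce a final version that addresses every point in the critique.")
- ```
-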
- ### Implementation Options
-
- - **Two-turn**: Separate critique and refinement turns
- - **Single-turn**: Internal thought step for capable models
- - **Validator pattern**: A separate agent reviews the work
-
- ### When to Use
-
- - High-stakes actions (modifying persistent state, finalizing content)
- - Complex constraint satisfaction
- - Quality-critical outputs
-
- ---
-
- ## Active Context Pruning
-
- Long-running sessions suffer from "Context Rot"—old, irrelevant details confuse the model even within token limits.
-
- ### The Problem
-
- Context is often treated as an append-only log. But stale context:
-
- - Dilutes attention from the current task
- - May contain outdated assumptions
- - Wastes token budget
-
- ### Strategies
-
- **Semantic Chunking:**
-
- Group history by episodes or tasks, not just turns.
-
- **Active Forgetting:**
-
- When a task completes, summarize it to a high-level outcome and **remove** the raw turns.
-
- **State-over-History:**
-
- Prefer providing the current *state* (artifacts, flags) over the *history* of how that state was reached.
-
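- A sketch of active forgetting, assuming turns are tagged with a task id and completion flag (an illustrative shape) and `summarize(turns)` wraps an LLM call:
-
- ```python
- def forget_completed(history: list[dict], summarize) -> list[dict]:
-     """Replace the raw turns of finished tasks with one outcome summary."""
-     done = {turn.get("task_id") for turn in history if turn.get("done")}
-     kept = [turn for turn in history if turn.get("task_id") not in done]
-     pruned = [turn for turn in history if turn.get("task_id") in done]
-     if pruned:
-         kept.insert(0, {"role": "system",
-                         "content": f"Completed earlier: {summarize(pruned)}"})
-     return kept
- ```
-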
- ---
-
- ## Testing Agent Prompts
-
- ### Test with Target Models
-
- Before deploying:
-
- 1. Test with the smallest target model
- 2. Verify first-turn tool calls work
- 3. Check for unexpected prose generation
- 4. Measure the token count of the system prompt
-
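- A sketch of computing two of the metrics below from a transcript; the turn shape (`role`, `tool_calls`, `content`) is an assumed, OpenAI-like format:
-
- ```python
- def tool_compliance_rate(transcript: list[dict]) -> float:
-     """Fraction of assistant turns that issued a tool call."""
-     turns = [t for t in transcript if t.get("role") == "assistant"]
-     if not turns:
-         return 0.0
-     return sum(1 for t in turns if t.get("tool_calls")) / len(turns)
-
- def prose_leakage(transcript: list[dict]) -> int:
-     """Count assistant turns that produced free text with no tool call."""
-     return sum(1 for t in transcript if t.get("role") == "assistant"
-                and t.get("content") and not t.get("tool_calls"))
- ```
-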
- ### Metrics to Track
-
- | Metric | What It Measures |
- |--------|------------------|
- | Tool compliance rate | % of turns with correct tool calls |
- | First-turn success | Does the model call a tool on turn 1? |
- | Prose leakage | Does a coordinator generate content? |
- | Instruction following | Are critical constraints obeyed? |
-
- ---
-
- ## Provider-Specific Optimizations
-
- - **Anthropic**: Use the `token-efficient-tools` beta header for up to 70% output token reduction; temperature capped at 1.0
- - **OpenAI**: Consider fine-tuning for frequently used patterns; temperature range 0.0–2.0
- - **Gemini**: Temperature range 0.0–2.0, similar behavior to OpenAI
- - **Ollama/Local**: Tool retrieval is essential—small models struggle with 10+ tools; default temperature varies by model (typically 0.7–0.8)
-
- See [Sampling Parameters](#sampling-parameters) for detailed temperature guidance by use case.
-
- ---
-
- ## Quick Reference
-
- | Pattern | Problem It Solves | Key Technique |
- |---------|-------------------|---------------|
- | Sandwich | Lost in the middle | Repeat critical instructions at start AND end |
- | Tool filtering | Small model tool overload | Limit tools by model class |
- | Two-stage selection | Large tool libraries | Menu → select → load |
- | Concise descriptions | Schema token overhead | 1-2 sentences, details in knowledge |
- | Neutral descriptions | Tool preference bias | Descriptive, not prescriptive |
- | Menu + consult | Context explosion | Summaries in prompt, retrieve on demand |
- | Concise content | Small model budgets | Dual-length summaries |
- | CoT | Complex reasoning failures | Require reasoning before action |
- | Dynamic few-shot | Static example bloat | Retrieve relevant examples |
- | Reflection | Quality failures | Draft → critique → refine |
- | Context pruning | Context rot | Summarize and remove stale turns |
- | Structured feedback | Vague validation errors | Categorize issues (invalid/missing/unknown) |
- | Phase-specific temperature | Format errors in structured output | High temp for discuss, low for serialize |
-
- | Model Class | Max Prompt | Max Tools | Strategy |
- |-------------|------------|-----------|----------|
- | Small (≤8B) | 2,000 tokens | 6-8 | Aggressive filtering, concise content |
- | Medium (9B-70B) | 6,000 tokens | 12 | Selective filtering, menu+consult |
- | Large (70B+) | 12,000 tokens | 15+ | Full content where beneficial |
-
- ---
-
- ## Research Basis
-
- | Source | Key Finding |
- |--------|-------------|
- | Stanford "Lost in the Middle" | U-shaped attention curve; middle content ignored |
- | "Less is More" (2024) | Tool count inversely correlates with compliance |
- | RAG-MCP (2025) | Two-stage selection reduces tokens 50%+, improves accuracy 3x |
- | Anthropic Token-Efficient Tools | Schema optimization reduces output tokens 70% |
- | Reflexion research | Self-correction improves quality on complex tasks |
- | STROT Framework (2025) | Structured feedback loops achieve 95% first-attempt success |
- | AWS Evaluator-Optimizer | Semantic reflection enables self-improving validation |
-
- ---
-
- ## See Also
-
- - [Branching Narrative Construction](../narrative-structure/branching_narrative_construction.md) — LLM generation strategies for narratives
- - [Multi-Agent Patterns](multi_agent_patterns.md) — Team coordination and delegation