ifcraftcorpus 1.4.0.tar.gz → 1.5.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (68)
  1. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/PKG-INFO +1 -1
  2. ifcraftcorpus-1.5.0/corpus/agent-design/agent_memory_architecture.md +765 -0
  3. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/agent-design/agent_prompt_engineering.md +247 -0
  4. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/agent-design/multi_agent_patterns.md +1 -0
  5. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/pyproject.toml +1 -1
  6. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/.gitignore +0 -0
  7. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/LICENSE +0 -0
  8. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/LICENSE-CONTENT +0 -0
  9. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/README.md +0 -0
  10. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/audience-and-access/accessibility_guidelines.md +0 -0
  11. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/audience-and-access/audience_targeting.md +0 -0
  12. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/audience-and-access/localization_considerations.md +0 -0
  13. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/craft-foundations/audio_visual_integration.md +0 -0
  14. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/craft-foundations/collaborative_if_writing.md +0 -0
  15. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/craft-foundations/creative_workflow_pipeline.md +0 -0
  16. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/craft-foundations/diegetic_design.md +0 -0
  17. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/craft-foundations/idea_capture_and_hooks.md +0 -0
  18. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/craft-foundations/if_platform_tools.md +0 -0
  19. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/craft-foundations/player_analytics_metrics.md +0 -0
  20. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/craft-foundations/quality_standards_if.md +0 -0
  21. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/craft-foundations/research_and_verification.md +0 -0
  22. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/craft-foundations/testing_interactive_fiction.md +0 -0
  23. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/emotional-design/conflict_patterns.md +0 -0
  24. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/emotional-design/emotional_beats.md +0 -0
  25. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/game-design/mechanics_design_patterns.md +0 -0
  26. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/genre-conventions/children_and_ya_conventions.md +0 -0
  27. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/genre-conventions/fantasy_conventions.md +0 -0
  28. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/genre-conventions/historical_fiction.md +0 -0
  29. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/genre-conventions/horror_conventions.md +0 -0
  30. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/genre-conventions/mystery_conventions.md +0 -0
  31. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/genre-conventions/sci_fi_conventions.md +0 -0
  32. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/narrative-structure/branching_narrative_construction.md +0 -0
  33. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/narrative-structure/branching_narrative_craft.md +0 -0
  34. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/narrative-structure/endings_patterns.md +0 -0
  35. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/narrative-structure/episodic_serialized_if.md +0 -0
  36. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/narrative-structure/nonlinear_structure.md +0 -0
  37. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/narrative-structure/pacing_and_tension.md +0 -0
  38. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/narrative-structure/romance_and_relationships.md +0 -0
  39. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/narrative-structure/scene_structure_and_beats.md +0 -0
  40. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/narrative-structure/scene_transitions.md +0 -0
  41. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/prose-and-language/character_voice.md +0 -0
  42. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/prose-and-language/dialogue_craft.md +0 -0
  43. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/prose-and-language/exposition_techniques.md +0 -0
  44. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/prose-and-language/narrative_point_of_view.md +0 -0
  45. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/prose-and-language/prose_patterns.md +0 -0
  46. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/prose-and-language/subtext_and_implication.md +0 -0
  47. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/prose-and-language/voice_register_consistency.md +0 -0
  48. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/scope-and-planning/scope_and_length.md +0 -0
  49. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/world-and-setting/canon_management.md +0 -0
  50. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/world-and-setting/setting_as_character.md +0 -0
  51. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/corpus/world-and-setting/worldbuilding_patterns.md +0 -0
  52. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/src/ifcraftcorpus/__init__.py +0 -0
  53. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/src/ifcraftcorpus/cli.py +0 -0
  54. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/src/ifcraftcorpus/embeddings.py +0 -0
  55. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/src/ifcraftcorpus/index.py +0 -0
  56. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/src/ifcraftcorpus/logging_utils.py +0 -0
  57. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/src/ifcraftcorpus/mcp_server.py +0 -0
  58. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/src/ifcraftcorpus/parser.py +0 -0
  59. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/src/ifcraftcorpus/providers.py +0 -0
  60. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/src/ifcraftcorpus/py.typed +0 -0
  61. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/src/ifcraftcorpus/search.py +0 -0
  62. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/subagents/README.md +0 -0
  63. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/subagents/if_genre_consultant.md +0 -0
  64. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/subagents/if_platform_advisor.md +0 -0
  65. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/subagents/if_prose_writer.md +0 -0
  66. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/subagents/if_quality_reviewer.md +0 -0
  67. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/subagents/if_story_architect.md +0 -0
  68. {ifcraftcorpus-1.4.0 → ifcraftcorpus-1.5.0}/subagents/if_world_curator.md +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: ifcraftcorpus
3
- Version: 1.4.0
3
+ Version: 1.5.0
4
4
  Summary: Interactive fiction craft corpus with search library and MCP server
5
5
  Project-URL: Homepage, https://pvliesdonk.github.io/if-craft-corpus
6
6
  Project-URL: Repository, https://github.com/pvliesdonk/if-craft-corpus
@@ -0,0 +1,765 @@
1
+ ---
2
+ title: Agent Memory Architecture
3
+ summary: Framework-independent patterns for managing agent conversation history and long-term memory—why prompt stuffing fails, state-managed alternatives, memory types, and multi-agent sharing.
4
+ topics:
5
+ - memory-architecture
6
+ - conversation-history
7
+ - state-management
8
+ - checkpointers
9
+ - context-engineering
10
+ - multi-agent
11
+ - langgraph
12
+ - openai-agents
13
+ cluster: agent-design
14
+ ---
15
+
16
+ # Agent Memory Architecture
17
+
18
+ Patterns for managing agent conversation history and long-term memory. This guide explains why manual prompt concatenation fails, how to use state-managed memory correctly, and how to share context between agents.
19
+
20
+ This document is framework-independent in principles but includes concrete examples for LangGraph and OpenAI Agents SDK.
21
+
22
+ ---
23
+
24
+ ## The Anti-Pattern: Manual Prompt Concatenation
25
+
26
+ When building agents, developers (and AI coding assistants) often default to manually concatenating conversation history into prompts. This is the most common mistake in agent development.
27
+
28
+ ### What It Looks Like
29
+
30
+ **Anti-pattern: Naive history concatenation**
31
+
32
+ ```python
33
+ # DON'T DO THIS
34
+ class NaiveAgent:
35
+ def __init__(self, model):
36
+ self.model = model
37
+ self.history = [] # Manual history list
38
+
39
+ def chat(self, user_message: str) -> str:
40
+ self.history.append({"role": "user", "content": user_message})
41
+
42
+ # Stuffing full history into every call
43
+ response = self.model.chat(
44
+ messages=[
45
+ {"role": "system", "content": SYSTEM_PROMPT},
46
+ *self.history # Growing unboundedly
47
+ ]
48
+ )
49
+
50
+ self.history.append({"role": "assistant", "content": response})
51
+ return response
52
+ ```
53
+
54
+ **Problems:**
55
+
56
+ 1. **No persistence**: History lost on restart
57
+ 2. **Unbounded growth**: Eventually exceeds context window
58
+ 3. **No thread isolation**: Can't run multiple conversations
59
+ 4. **Attention degradation**: Middle content gets ignored
60
+ 5. **Token waste**: Paying for stale context every call
61
+
62
+ **Anti-pattern: String concatenation**
63
+
64
+ ```python
65
+ # DON'T DO THIS
66
+ def build_prompt(history: list[dict], new_message: str) -> str:
67
+ history_text = "\n".join([
68
+ f"{msg['role']}: {msg['content']}"
69
+ for msg in history
70
+ ])
71
+
72
+ return f"""Previous conversation:
73
+ {history_text}
74
+
75
+ User: {new_message}
76
+ Assistant:"""
77
+ ```
78
+
79
+ **Problems:**
80
+
81
+ 1. **Format fragility**: Role formatting can confuse the model
82
+ 2. **No structure**: Loses message boundaries
83
+ 3. **Injection risk**: History content can break prompt structure
84
+ 4. **No tool call preservation**: Loses function call context
85
+
86
+ ### Why AI Coding Assistants Default to This
87
+
88
+ Training data contains many examples of this pattern because:
89
+
90
+ - It's the simplest implementation
91
+ - It works for demos and tutorials
92
+ - Framework-specific patterns require API knowledge
93
+ - Most code examples don't show production patterns
94
+
95
+ This is why you have to repeatedly tell an AI coding assistant that you want proper memory management.
96
+
97
+ ### Why It Fails: The Evidence
98
+
99
+ **"Lost in the Middle" Research (Liu et al., 2023)**
100
+
101
+ LLMs exhibit a U-shaped attention curve: content at the start and end of the context receives attention, while content in the middle is systematically ignored. Stuffing history into the middle of a prompt means important context gets lost.
102
+
103
+ **The 75% Rule (Claude Code, Anthropic)**
104
+
105
+ When Claude Code operated above 90% context utilization, output quality degraded significantly. Implementing auto-compaction at 75% produced dramatic quality improvements. The lesson: **capacity ≠ capability**. Empty headroom enables reasoning, not just retrieval.
106
+
107
+ **Context Rot**
108
+
109
+ Old, irrelevant details don't just waste tokens—they actively confuse the model. A discussion about error handling from 50 turns ago can distract from the current task, even if technically within the context window.
110
+
111
+ ---
112
+
113
+ ## The Correct Model: State-Managed Memory
114
+
115
+ Memory should be **first-class state**, not prompt injection. The framework handles storage, retrieval, trimming, and injection—your code focuses on logic.
116
+
117
+ ### Core Principles
118
+
119
+ **1. Separation of Concerns**
120
+
121
+ | Concern | Responsibility | Your Code |
122
+ |---------|----------------|-----------|
123
+ | Storage | Persist messages to durable store | Configure checkpointer |
124
+ | Retrieval | Load relevant history for thread | Provide thread_id |
125
+ | Trimming | Keep context within limits | Set thresholds |
126
+ | Injection | Add history to model calls | Automatic |
127
+
128
+ **2. Thread Isolation**
129
+
130
+ Each conversation gets a unique `thread_id`. The framework maintains separate history per thread, enabling concurrent conversations without interference.
131
+
132
+ **3. Resumability**
133
+
134
+ Conversations can be paused and resumed—even across process restarts. The checkpointer persists state to durable storage.
135
+
136
+ **4. Automatic Management**
137
+
138
+ You don't manually append messages or manage context length. The framework handles this based on configuration.
139
+
140
+ ### LangGraph: Checkpointer Pattern
141
+
142
+ ```python
143
+ from langgraph.checkpoint.memory import InMemorySaver
144
+ from langgraph.checkpoint.sqlite import SqliteSaver
145
+ from langgraph.graph import StateGraph, MessagesState
146
+
147
+ # Development: in-memory
148
+ checkpointer = InMemorySaver()
149
+
150
+ # Production: persistent storage
151
+ # checkpointer = SqliteSaver.from_conn_string("conversations.db")
152
+
153
+ # Define your graph
154
+ builder = StateGraph(MessagesState)
155
+ builder.add_node("agent", call_model)
156
+ builder.add_edge("__start__", "agent")
157
+
158
+ # Compile WITH checkpointer
159
+ graph = builder.compile(checkpointer=checkpointer)
160
+
161
+ # Each conversation gets a thread_id
162
+ config = {"configurable": {"thread_id": "user-123-session-1"}}
163
+
164
+ # Framework handles history automatically
165
+ response = graph.invoke(
166
+ {"messages": [{"role": "user", "content": "Hello!"}]},
167
+ config
168
+ )
169
+
170
+ # Same thread_id = conversation continues
171
+ response = graph.invoke(
172
+ {"messages": [{"role": "user", "content": "What did I just say?"}]},
173
+ config # Same config = same thread
174
+ )
175
+ ```
176
+
177
+ **What the framework does:**
178
+
179
+ 1. Before invoke: Loads existing messages for thread_id
180
+ 2. Prepends history to new messages
181
+ 3. Calls model with full context
182
+ 4. After invoke: Persists new messages to checkpointer
183
+ 5. Handles context limits based on configuration
184
+
185
+ ### OpenAI Agents SDK: Session Pattern
186
+
187
+ ```python
188
+ from agents import Agent, Runner
189
+ from agents import SQLiteSession
190
+
191
+ # Create persistent session storage
192
+ session = SQLiteSession("user-123-session-1", "conversations.db")  # session id, then db file
193
+
194
+ agent = Agent(
195
+ name="assistant",
196
+ instructions="You are a helpful assistant.",
197
+ model="gpt-4o"
198
+ )
199
+
202
+ # Session handles history automatically
203
+ response = await Runner.run(
204
+ agent,
205
+ "Hello!",
206
+ session=session
208
+ )
209
+
210
+ # Same session = conversation continues
211
+ response = await Runner.run(
212
+ agent,
213
+ "What did I just say?",
214
+ session=session
216
+ )
217
+ ```
218
+
219
+ **What the session does:**
220
+
221
+ 1. Before run: Retrieves conversation history for the session
222
+ 2. Prepends history to input items
223
+ 3. Executes agent with full context
224
+ 4. After run: Stores new items (user input, responses, tool calls)
225
+ 5. Handles continuity across runs
226
+
227
+ ---
228
+
229
+ ## Memory Types
230
+
231
+ Agent memory isn't monolithic. Different types serve different purposes and have different scopes.
232
+
233
+ ### Short-Term Memory (Thread-Scoped)
234
+
235
+ **Scope**: Single conversation thread
236
+ **Purpose**: Maintain context within an ongoing session
237
+ **Lifetime**: Duration of conversation (or until explicitly cleared)
238
+
239
+ | Framework | Implementation |
240
+ |-----------|----------------|
241
+ | LangGraph | Checkpointer with `thread_id` |
242
+ | OpenAI SDK | Session with `session_id` |
243
+ | General | Thread-isolated message store |
244
+
245
+ **What belongs in short-term memory:**
246
+
247
+ - User messages and assistant responses
248
+ - Tool calls and results
249
+ - Reasoning traces (if using chain-of-thought)
250
+ - Current task state
251
+
252
+ ### Long-Term Memory (Cross-Session)
253
+
254
+ **Scope**: Across multiple conversations
255
+ **Purpose**: Persist facts, preferences, learned patterns
256
+ **Lifetime**: Indefinite (or until explicitly deleted)
257
+
258
+ #### Structured Long-Term Memory
259
+
260
+ Facts, relationships, and decisions stored in queryable format.
261
+
262
+ ```python
263
+ # LangGraph Store pattern
264
+ from langgraph.store.memory import InMemoryStore
265
+
266
+ store = InMemoryStore()
267
+
268
+ # Store user preference (persists across threads)
269
+ store.put(
270
+ namespace=("users", "user-123", "preferences"),
271
+ key="timezone",
272
+ value={"timezone": "America/New_York", "updated": "2025-01-17"}
273
+ )
274
+
275
+ # Retrieve in any thread
276
+ prefs = store.get(("users", "user-123", "preferences"), "timezone")
277
+ ```
278
+
279
+ #### Semantic Long-Term Memory
280
+
281
+ Embedding-based retrieval for finding relevant past context.
282
+
283
+ ```python
284
+ # Conceptual pattern (framework-independent)
285
+ from your_vector_store import VectorStore
286
+
287
+ memory_store = VectorStore()
288
+
289
+ # Store interaction summary with embedding
290
+ memory_store.add(
291
+ text="User prefers concise responses without code comments",
292
+ metadata={"user_id": "user-123", "type": "preference"},
293
+ embedding=embed("User prefers concise responses...")
294
+ )
295
+
296
+ # Retrieve relevant memories for new context
297
+ relevant = memory_store.search(
298
+ query="How should I format code for this user?",
299
+ filter={"user_id": "user-123"}
300
+ )
301
+ ```
302
+
303
+ ### Episodic Memory
304
+
305
+ **Scope**: Cross-session, timestamped
306
+ **Purpose**: Record past interactions for learning and audit
307
+ **Lifetime**: Configurable retention
308
+
309
+ ```python
310
+ # Record interaction outcome
311
+ episodic_store.add({
312
+ "timestamp": "2025-01-17T10:30:00Z",
313
+ "user_id": "user-123",
314
+ "thread_id": "session-456",
315
+ "task": "debug authentication error",
316
+ "outcome": "resolved",
317
+ "approach": "checked token expiration, found clock skew",
318
+ "user_feedback": "positive"
319
+ })
320
+
321
+ # Query past approaches for similar tasks
322
+ past_successes = episodic_store.query(
323
+ task_type="debug authentication",
324
+ outcome="resolved",
325
+ user_id="user-123"
326
+ )
327
+ ```
328
+
329
+ ### Memory Layers Summary
330
+
331
+ | Layer | Scope | Storage | Retrieval | Example Use |
332
+ |-------|-------|---------|-----------|-------------|
333
+ | Short-term | Thread | Checkpointer/Session | By thread_id | Conversation context |
334
+ | Long-term (Structured) | User/Global | Key-value store | By namespace + key | User preferences |
335
+ | Long-term (Semantic) | User/Global | Vector store | By similarity | Relevant past context |
336
+ | Episodic | User/Global | Event log | By query + time | Past task outcomes |
337
+
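As an illustration of how these layers can come together when preparing a single turn, here is a minimal sketch; `short_term`, `preference_store`, `semantic_store`, and `episodic_store` are generic placeholder objects passed in by the caller, not the API of any particular framework.

```python
def assemble_turn_context(thread_id: str, user_id: str, query: str,
                          short_term, preference_store,
                          semantic_store, episodic_store) -> dict:
    """Gather inputs for one model call from each memory layer (sketch)."""
    return {
        # Short-term: thread-scoped history, loaded by the checkpointer/session
        "messages": short_term.load(thread_id),
        # Long-term structured: stable facts and preferences by namespace + key
        "preferences": preference_store.get(("users", user_id, "preferences")),
        # Long-term semantic: only memories relevant to the current query
        "related_memories": semantic_store.search(query, filter={"user_id": user_id}, limit=3),
        # Episodic: outcomes of similar past tasks
        "past_outcomes": episodic_store.query(user_id=user_id, similar_to=query),
    }
```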
338
+ ---
339
+
340
+ ## State-Over-History Principle
341
+
342
+ A key insight for efficient memory management: **prefer passing current state over full history**.
343
+
344
+ ### The Problem with Full History
345
+
346
+ ```python
347
+ # Anti-pattern: Passing full transcript to sub-agent
348
+ sub_agent_prompt = f"""
349
+ Here's the full conversation so far:
350
+ {format_messages(all_300_messages)}
351
+
352
+ Now help with: {current_task}
353
+ """
354
+ ```
355
+
356
+ **Problems:**
357
+
358
+ - Token explosion
359
+ - Attention dilution
360
+ - Irrelevant context pollution
361
+ - Latency increase
362
+
363
+ ### State-Over-History Pattern
364
+
365
+ ```python
366
+ # Better: Pass current state, not history
367
+ current_state = {
368
+ "user_goal": "Build a REST API for user management",
369
+ "completed_steps": ["schema design", "database setup"],
370
+ "current_step": "implement CRUD endpoints",
371
+ "decisions_made": {
372
+ "database": "PostgreSQL",
373
+ "framework": "FastAPI",
374
+ "auth": "JWT tokens"
375
+ },
376
+ "open_questions": [],
377
+ "artifacts": ["schema.sql", "models.py"]
378
+ }
379
+
380
+ sub_agent_prompt = f"""
381
+ Current project state:
382
+ {json.dumps(current_state, indent=2)}
383
+
384
+ Task: {current_task}
385
+ """
386
+ ```
387
+
388
+ **Benefits:**
389
+
390
+ - Minimal tokens
391
+ - Focused attention
392
+ - No stale context
393
+ - Faster inference
394
+
395
+ ### What Belongs in State vs History
396
+
397
+ | State (Pass Forward) | History (Store, Don't Pass) |
398
+ |---------------------|------------------------------|
399
+ | Current goal | How goal was established |
400
+ | Decisions made | Discussion leading to decisions |
401
+ | Artifacts created | Iterations and revisions |
402
+ | Open questions | Resolved questions |
403
+ | Error context (if debugging) | Successful operations |
404
+
405
+ ### Implementing State Extraction
406
+
407
+ ```python
408
+ # LangGraph: Custom state schema
409
+ from typing import TypedDict, Annotated
410
+ from langgraph.graph import add_messages
411
+
412
+ class ProjectState(TypedDict):
413
+ messages: Annotated[list, add_messages] # Short-term (auto-managed)
414
+
415
+ # Extracted state (you manage)
416
+ current_goal: str
417
+ decisions: dict
418
+ artifacts: list[str]
419
+ phase: str
420
+
421
+ # Update state after significant events
422
+ def extract_state(messages: list, current_state: ProjectState) -> ProjectState:
423
+ """Extract/update state from recent messages."""
424
+ # Use LLM or rules to identify:
425
+ # - New decisions made
426
+ # - Artifacts created
427
+ # - Phase transitions
428
+ return updated_state
429
+ ```
430
+
431
+ ---
432
+
433
+ ## Managing History Growth
434
+
435
+ Even with proper memory architecture, history grows. You need strategies to keep it bounded.
436
+
437
+ ### Strategy 1: Trimming
438
+
439
+ Keep only the last N turns, drop the rest.
440
+
441
+ **LangGraph: trim_messages**
442
+
443
+ ```python
444
+ from langgraph.prebuilt import create_react_agent
445
+ from langchain_core.messages import trim_messages
446
+
447
+ def trim_to_recent(messages: list) -> list:
448
+ """Keep system message + last 10 messages."""
449
+ return trim_messages(
450
+ messages,
451
+ max_tokens=4000,
452
+ strategy="last",
453
+ token_counter=len, # Or use tiktoken
454
+ include_system=True,
455
+ allow_partial=False
456
+ )
457
+
458
+ # Apply before model call
459
+ agent = create_react_agent(
460
+ model,
461
+ tools,
462
+ state_modifier=trim_to_recent
463
+ )
464
+ ```
465
+
466
+ **When to use trimming:**
467
+
468
+ - Short, transactional conversations
469
+ - Tasks where old context is truly irrelevant
470
+ - When latency is critical
471
+
472
+ **Anti-patterns with trimming:**
473
+
474
+ - Losing critical decisions from early in conversation
475
+ - Trimming mid-tool-call (orphaned tool results)
476
+ - Using for planning tasks that need long-range context
477
+
478
+ ### Strategy 2: Summarization
479
+
480
+ Compress older messages into a synthetic summary.
481
+
482
+ **LangGraph: SummarizationMiddleware**
483
+
484
+ ```python
485
+ from langchain.agents import create_agent, SummarizationMiddleware
486
+
487
+ agent = create_agent(
488
+ model="gpt-4o",
489
+ tools=tools,
490
+ middleware=[
491
+ SummarizationMiddleware(
492
+ model="gpt-4o-mini", # Cheaper model for summarization
493
+ trigger={"tokens": 4000}, # Trigger when context exceeds
494
+ keep={"messages": 10} # Keep last 10 verbatim
495
+ )
496
+ ]
497
+ )
498
+ ```
499
+
500
+ **What summarization produces:**
501
+
502
+ ```
503
+ [Summary of turns 1-50]:
504
+ - User requested help building a REST API
505
+ - Decided on FastAPI + PostgreSQL
506
+ - Completed: schema design, database models
507
+ - Current focus: authentication implementation
508
+ - User prefers concise code without excessive comments
509
+
510
+ [Recent messages 51-60 kept verbatim]
511
+ ```
512
+
513
+ **When to use summarization:**
514
+
515
+ - Long-running planning conversations
516
+ - Support threads spanning multiple issues
517
+ - Tasks requiring long-range continuity
518
+
519
+ **Anti-patterns with summarization:**
520
+
521
+ - **Summary drift**: Facts get reinterpreted incorrectly
522
+ - **Context poisoning**: Errors in summary propagate indefinitely
523
+ - **Over-compression**: Losing critical details
524
+ - **Summarizing too frequently**: Latency overhead
525
+
526
+ ### Strategy 3: Hybrid (Recommended)
527
+
528
+ Combine summarization for old context + trimming for recent.
529
+
530
+ ```python
531
+ class HybridMemoryConfig:
532
+ # Summarize when total exceeds this
533
+ summarize_threshold_tokens: int = 8000
534
+
535
+ # Keep this many recent messages verbatim
536
+ keep_recent_messages: int = 20
537
+
538
+ # Maximum summary length
539
+ max_summary_tokens: int = 500
540
+
541
+ # Model for summarization (use cheaper model)
542
+ summary_model: str = "gpt-4o-mini"
543
+ ```
544
+
545
+ **Flow:**
546
+
547
+ 1. Check total token count
548
+ 2. If under threshold: no action
549
+ 3. If over threshold:
550
+ - Keep last N messages verbatim
551
+ - Summarize older messages
552
+ - Replace older messages with summary
553
+ - Continue with bounded context
554
+
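A minimal sketch of this flow, using the config above; `count_tokens` and `summarize_with_model` are assumed helpers standing in for a real tokenizer and a call to the cheaper summary model.

```python
def apply_hybrid_memory(messages: list[dict],
                        config: HybridMemoryConfig,
                        count_tokens,
                        summarize_with_model) -> list[dict]:
    """Summarize older turns, keep recent turns verbatim (sketch)."""
    if count_tokens(messages) <= config.summarize_threshold_tokens:
        return messages  # Under threshold: no action

    # Keep the last N messages verbatim
    recent = messages[-config.keep_recent_messages:]
    older = messages[:-config.keep_recent_messages]

    # Compress older messages with the cheaper model, bounded in length
    summary = summarize_with_model(
        older, model=config.summary_model, max_tokens=config.max_summary_tokens
    )

    # Replace older messages with the summary and continue with bounded context
    return [{"role": "system", "content": f"[Summary of earlier turns]\n{summary}"}, *recent]
```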
555
+ ---
556
+
557
+ ## Multi-Agent Memory Sharing
558
+
559
+ When multiple agents collaborate, memory sharing becomes critical.
560
+
561
+ ### Pattern 1: Shared State Object
562
+
563
+ Agents read from and write to a common state.
564
+
565
+ ```python
566
+ # LangGraph: Shared state across nodes
567
+ from typing import TypedDict, Annotated
568
+ from langgraph.graph import StateGraph, add_messages
569
+
570
+ class SharedState(TypedDict):
571
+ messages: Annotated[list, add_messages]
572
+
573
+ # Shared across all agents
574
+ research_findings: list[str]
575
+ draft_content: str
576
+ review_feedback: list[str]
577
+ final_output: str
578
+
579
+ def researcher(state: SharedState) -> SharedState:
580
+ """Research agent adds findings to shared state."""
581
+ findings = do_research(state["messages"][-1])
582
+ return {"research_findings": state["research_findings"] + findings}
583
+
584
+ def writer(state: SharedState) -> SharedState:
585
+ """Writer agent reads research, produces draft."""
586
+ draft = write_draft(state["research_findings"])
587
+ return {"draft_content": draft}
588
+
589
+ def reviewer(state: SharedState) -> SharedState:
590
+ """Reviewer reads draft, adds feedback."""
591
+ feedback = review(state["draft_content"])
592
+ return {"review_feedback": feedback}
593
+
594
+ # Wire agents together
595
+ graph = StateGraph(SharedState)
596
+ graph.add_node("researcher", researcher)
597
+ graph.add_node("writer", writer)
598
+ graph.add_node("reviewer", reviewer)
599
+ ```
600
+
601
+ ### Pattern 2: Artifact Passing (Not Transcript Passing)
602
+
603
+ **Anti-pattern: Context telephone**
604
+
605
+ ```python
606
+ # DON'T DO THIS
607
+ def orchestrator_delegates_to_specialist(conversation_history):
608
+ # Passing full history degrades information
609
+ specialist_result = specialist.run(
610
+ f"Here's the conversation:\n{conversation_history}\n\nDo task X"
611
+ )
612
+ return specialist_result
613
+ ```
614
+
615
+ **Problems:**
616
+
617
+ - Information degrades through each handoff
618
+ - Irrelevant context pollutes specialist focus
619
+ - Token waste compounds at each level
620
+
621
+ **Better: Pass artifacts and state**
622
+
623
+ ```python
624
+ # DO THIS
625
+ def orchestrator_delegates_to_specialist(task_state):
626
+ # Pass only what specialist needs
627
+ specialist_result = specialist.run(
628
+ task_description=task_state["current_task"],
629
+ input_artifacts=task_state["relevant_artifacts"],
630
+ constraints=task_state["constraints"],
631
+ # NOT the full conversation history
632
+ )
633
+ return specialist_result
634
+ ```
635
+
636
+ ### Pattern 3: Memory Isolation vs Sharing
637
+
638
+ | Scenario | Memory Strategy |
639
+ |----------|-----------------|
640
+ | Agents working on same task | Shared state object |
641
+ | Agents with different domains | Isolated memory, share artifacts |
642
+ | Parallel independent tasks | Fully isolated threads |
643
+ | Validator reviewing creator's work | Read-only access to creator's output |
644
+
645
+ **LangGraph: Isolated sub-agents**
646
+
647
+ ```python
648
+ # Each specialist gets its own thread
649
+ def delegate_to_specialist(state, specialist_graph, task):
650
+ # Create isolated thread for specialist
651
+ specialist_thread_id = f"{state['thread_id']}-{specialist_graph.name}-{uuid4()}"
652
+
653
+ result = specialist_graph.invoke(
654
+ {"messages": [{"role": "user", "content": task}]},
655
+ {"configurable": {"thread_id": specialist_thread_id}}
656
+ )
657
+
658
+ # Return only the result, not specialist's internal history
659
+ return result["final_output"]
660
+ ```
661
+
662
+ ### Pattern 4: Namespace-Based Sharing
663
+
664
+ For long-term memory that should be shared across agents:
665
+
666
+ ```python
667
+ # Shared user preferences (all agents can read)
668
+ user_namespace = ("users", user_id, "preferences")
669
+
670
+ # Agent-specific learned patterns (isolated)
671
+ agent_namespace = ("agents", agent_id, "patterns")
672
+
673
+ # Project-specific context (shared within project)
674
+ project_namespace = ("projects", project_id, "context")
675
+ ```
676
+
677
+ ---
678
+
679
+ ## The 75% Rule
680
+
681
+ Never fill context to capacity. Reserve headroom for reasoning.
682
+
683
+ ### Why Headroom Matters
684
+
685
+ | Context Usage | Effect |
686
+ |---------------|--------|
687
+ | < 50% | Optimal reasoning space |
688
+ | 50-75% | Good balance |
689
+ | 75-90% | Degraded quality, trigger compaction |
690
+ | > 90% | Significant quality loss |
691
+
692
+ ### Implementation
693
+
694
+ ```python
695
+ def should_compact(messages: list, model_context_limit: int) -> bool:
696
+ """Check if context needs compaction."""
697
+ current_tokens = count_tokens(messages)
698
+ threshold = model_context_limit * 0.75
699
+ return current_tokens > threshold
700
+
701
+ def auto_compact_middleware(state: AgentState) -> AgentState:
702
+ """Middleware that triggers compaction at 75%."""
703
+ if should_compact(state["messages"], MODEL_CONTEXT_LIMIT):
704
+ state["messages"] = summarize_and_trim(state["messages"])
705
+ return state
706
+ ```
707
+
708
+ ---
709
+
710
+ ## Implementation Checklist
711
+
712
+ When building agents, verify:
713
+
714
+ - [ ] **No manual history concatenation** in prompt building
715
+ - [ ] **Checkpointer/Session configured** for conversation persistence
716
+ - [ ] **Thread IDs assigned** for conversation isolation
717
+ - [ ] **Trimming or summarization** configured for long conversations
718
+ - [ ] **State-over-history** for sub-agent delegation
719
+ - [ ] **Artifacts passed**, not transcripts, between agents
720
+ - [ ] **75% threshold** for context compaction
721
+ - [ ] **Long-term memory** separated from short-term (if needed)
722
+
723
+ ---
724
+
725
+ ## Quick Reference
726
+
727
+ ### Pattern Selection
728
+
729
+ | Situation | Pattern | Framework Feature |
730
+ |-----------|---------|-------------------|
731
+ | Basic conversation persistence | Checkpointer/Session | LangGraph: `InMemorySaver`, OpenAI: `SQLiteSession` |
732
+ | Long conversations | Summarization middleware | LangGraph: `SummarizationMiddleware` |
733
+ | Multi-agent shared context | Shared state schema | LangGraph: `StateGraph` with shared `TypedDict` |
734
+ | Cross-session user data | Long-term store | LangGraph: `InMemoryStore`, MongoDB Store |
735
+ | Semantic memory retrieval | Vector store integration | External: Pinecone, Chroma, pgvector |
736
+
737
+ ### Anti-Pattern Recognition
738
+
739
+ | If you see... | It's wrong because... | Replace with... |
740
+ |---------------|----------------------|-----------------|
741
+ | `history.append(msg)` | Manual management | Checkpointer |
742
+ | `prompt += history` | String concatenation | Session with auto-injection |
743
+ | Full transcript to sub-agent | Context telephone | Artifact/state passing |
744
+ | No thread_id | No isolation | Explicit thread management |
745
+ | No trimming/summarization | Unbounded growth | Memory middleware |
746
+
747
+ ---
748
+
749
+ ## Research Basis
750
+
751
+ | Source | Key Finding |
752
+ |--------|-------------|
753
+ | "Lost in the Middle" (Liu et al., 2023) | U-shaped attention; middle content ignored |
754
+ | Claude Code 75% Rule (Anthropic) | Quality degrades above 75% context usage |
755
+ | LangChain Short-Term Memory Guide | Checkpointer + summarization patterns |
756
+ | OpenAI Agents SDK Session Docs | Session-based auto-persistence |
757
+ | AWS Memory-Augmented Agents | Memory layer architecture patterns |
758
+ | A-Mem (2025) | Dynamic vs predefined memory access |
759
+
760
+ ---
761
+
762
+ ## See Also
763
+
764
+ - [Agent Prompt Engineering](agent_prompt_engineering.md) — Context architecture, active pruning, state-over-history principle
765
+ - [Multi-Agent Patterns](multi_agent_patterns.md) — Delegation, context passing, artifact handoffs
@@ -10,6 +10,9 @@ topics:
10
10
  - small-models
11
11
  - chain-of-thought
12
12
  - few-shot-learning
13
+ - list-completeness
14
+ - validation-loops
15
+ - external-validation
13
16
  cluster: agent-design
14
17
  ---
15
18
 
@@ -70,6 +73,113 @@ Lower-priority content that can be retrieved on demand:
70
73
 
71
74
  ---
72
75
 
76
+ ## List Completeness Patterns
77
+
78
+ When LLMs must process every item in a list (entity decisions, task completions, validation checklists), they frequently skip items—especially in the middle of long lists. This section describes patterns to ensure completeness.
79
+
80
+ ### Numbered Lists vs Checkboxes
81
+
82
+ Numbered lists outperform checkboxes for sequential processing:
83
+
84
+ | Format | Behavior | Reliability |
85
+ |--------|----------|-------------|
86
+ | `- [ ] item` | Treated as optional; often reformatted creatively | Lower |
87
+ | `1. item` | Signals discrete task requiring attention | Higher |
88
+
89
+ **Why it works:** Numbered format implies a sequence of individual tasks. Combined with explicit counts, this creates accountability that the checkbox format cannot match.
90
+
91
+ **Example:**
92
+
93
+ Anti-pattern:
94
+
95
+ ```text
96
+ - [ ] Decide on entity: butler_jameson
97
+ - [ ] Decide on entity: guest_clara
98
+ - [ ] Decide on entity: archive_room
99
+ ```
100
+
101
+ Better:
102
+
103
+ ```text
104
+ Entity Decisions (3 total):
105
+ 1. butler_jameson — [your decision]
106
+ 2. guest_clara — [your decision]
107
+ 3. archive_room — [your decision]
108
+ ```
109
+
110
+ ### Quantity Anchoring
111
+
112
+ State exact counts at both start AND end of prompts (sandwich pattern for quantities):
113
+
114
+ ```markdown
115
+ # REQUIREMENT: Exactly 21 Entity Decisions
116
+
117
+ [numbered list of 21 entities]
118
+
119
+ ...
120
+
121
+ # REMINDER: 21 entity decisions required. You must provide a decision for all 21.
122
+ ```
123
+
124
+ The explicit number creates a concrete, verifiable target. Vague instructions like "all items" or "every entity" are easier to satisfy incompletely.
125
+
126
+ ### Anti-Skipping Statements
127
+
128
+ Direct statements about completeness requirements are effective, especially when combined with the sandwich pattern:
129
+
130
+ | Position | Example |
131
+ |----------|---------|
132
+ | Start | "You must process ALL 21 entities. Skipping any is not acceptable." |
133
+ | End | "Total: 21 entities. Confirm you provided a decision for every single one." |
134
+
135
+ These explicit constraints work because they:
136
+
137
+ - Create a falsifiable claim the model must satisfy
138
+ - Exploit primacy/recency attention patterns
139
+ - Provide a concrete metric (count) rather than vague completeness
140
+
141
+ ### External Validation Required
142
+
143
+ **LLMs cannot reliably self-verify completeness mid-generation.**
144
+
145
+ Research shows that self-verification checklists embedded in prompts are frequently ignored or filled incorrectly. This is a fundamental limitation: LLMs operate via approximate retrieval, not logical verification.
146
+
147
+ **Anti-pattern:**
148
+
149
+ ```markdown
150
+ Before submitting, verify:
151
+ - [ ] I processed all 21 entities
152
+ - [ ] No entity was skipped
153
+ - [ ] Each decision is justified
154
+ ```
155
+
156
+ The model will often check these boxes without actually verifying.
157
+
158
+ **Better approach:**
159
+
160
+ ```text
161
+ 1. Generate output (entity decisions)
162
+ 2. External code counts decisions: found 20, expected 21
163
+ 3. Feedback: "Missing decision for entity 'guest_clara'. Provide decision."
164
+ 4. Model repairs the specific gap
165
+ ```
166
+
167
+ The "Validate → Feedback → Repair" loop (see below) must use **external logic**, not LLM self-assessment.
168
+
169
+ ### Combining Patterns
170
+
171
+ For maximum completeness on list-processing tasks:
172
+
173
+ 1. Use **numbered lists** (not checkboxes)
174
+ 2. State **exact count** at start and end (sandwich)
175
+ 3. Include **anti-skipping statements** at start and end
176
+ 4. Validate **externally** after generation
177
+ 5. Provide **specific feedback** naming missing items
178
+
179
+ This combination addressed a real-world failure where gpt-4o-mini skipped 1 of 21 entities despite an embedded entity checklist.
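A minimal sketch of a prompt builder that applies patterns 1-3 (numbered list, count sandwich, anti-skipping statements); the function and its exact layout are illustrative rather than a fixed template.

```python
def build_entity_decision_prompt(entity_ids: list[str], task: str) -> str:
    """Render a numbered entity list with the count anchored at the start and end."""
    n = len(entity_ids)
    numbered = "\n".join(
        f"{i}. {entity_id} - [your decision]"
        for i, entity_id in enumerate(entity_ids, start=1)
    )
    return (
        f"# REQUIREMENT: Exactly {n} entity decisions\n"
        f"You must process ALL {n} entities. Skipping any is not acceptable.\n\n"
        f"{task}\n\n"
        f"Entity Decisions ({n} total):\n{numbered}\n\n"
        f"# REMINDER: {n} entity decisions required. "
        f"Confirm you provided a decision for every single one."
    )
```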
180
+
181
+ ---
182
+
73
183
  ## Tool Design
74
184
 
75
185
  ### Tool Count Effects
@@ -544,6 +654,135 @@ Validate → feedback → repair is a general pattern:
544
654
  - Works for more informal artifacts (e.g., checklists, outlines) when combined with light-weight structural checks.
545
655
  - Plays well with the structured-output patterns above and with the reflection/self-critique patterns below.
546
656
 
657
+ ### Two-Level Feedback Architecture
658
+
659
+ Simple validation loops assume errors can be fixed by repairing the output. But some errors originate earlier in the pipeline—the output is wrong because the *input* was wrong. A two-level architecture handles both cases.
660
+
661
+ #### The Problem: Broken Input Propagation
662
+
663
+ Consider a pipeline: `Summarize → Serialize → Validate`
664
+
665
+ ```text
666
+ Summarize → Brief (with invented IDs) → Serialize → Validate → Feedback
667
+ ↑ ↓
668
+ └──────── Brief stays the same! ───────────────┘
669
+ ```
670
+
671
+ If the summarize step invents an ID (`archive_access` instead of the valid `diary_truth`), the serialize step will use it because it's in the brief. Validation rejects it. The inner repair loop then retries the serialize step with the same broken brief → **0% correction rate**.
672
+
673
+ #### Solution: Nested Loops
674
+
675
+ ```text
676
+ ┌─────────────────────────────────────────────────────────────────┐
677
+ │ OUTER LOOP (max 2) │
678
+ │ When SEMANTIC validation fails → repair the SOURCE: │
679
+ │ - Original input + validation errors │
680
+ │ - Valid references list │
681
+ │ - Fuzzy replacement suggestions │
682
+ └─────────────────────────────────────────────────────────────────┘
683
+ ↓ ↑
684
+ ┌───────────┐ ┌──────────────┐
685
+ │ SOURCE │ │ SEMANTIC │
686
+ │ (brief) │ │ VALIDATION │
687
+ └───────────┘ └──────────────┘
688
+ ↓ ↑
689
+ ┌─────────────────────────────────────────────────────────────────┐
690
+ │ INNER LOOP (max 3) │
691
+ │ Handles schema/format errors only │
692
+ │ (Pydantic failures, JSON syntax, missing fields) │
693
+ └─────────────────────────────────────────────────────────────────┘
694
+ ```
695
+
696
+ **Inner loop** (fast, cheap): Schema errors, type mismatches, missing required fields. These can be fixed by repairing the serialized output directly.
697
+
698
+ **Outer loop** (expensive, rare): Semantic errors—invalid references, invented IDs, impossible states. These require repairing the *source* that caused the problem.
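The control flow can be sketched as two nested loops. Here `serialize`, `repair_output`, and `repair_source` stand in for the stage's LLM calls, and `validate_schema` / `validate_references` for the deterministic checks; all of them are assumptions of this sketch, not a specific library.

```python
def run_stage(source: str, original_input: str,
              serialize, validate_schema, repair_output,
              validate_references, repair_source,
              max_outer: int = 2, max_inner: int = 3):
    """Inner loop repairs the output; outer loop repairs the source (sketch)."""
    for _ in range(max_outer):
        output = serialize(source)                             # LLM call
        for _ in range(max_inner):
            schema_errors = validate_schema(output)            # JSON / Pydantic checks
            if schema_errors:
                output = repair_output(output, schema_errors)  # inner: fix the output
                continue
            semantic_errors = validate_references(output)      # ID / reference checks
            if not semantic_errors:
                return output                                   # both levels pass
            break                                               # escalate to the outer loop
        else:
            raise RuntimeError("Schema repair budget exhausted")
        # Outer: repair the source, passing errors, valid IDs, and fuzzy suggestions
        source = repair_source(source, semantic_errors, original_input)  # LLM call
    raise RuntimeError("Semantic repair budget exhausted")
```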
699
+
700
+ #### When to Use Each Loop
701
+
702
+ | Error Type | Loop | Example |
703
+ |------------|------|---------|
704
+ | JSON syntax error | Inner | Missing comma, unclosed brace |
705
+ | Missing required field | Inner | `protagonist_name` not provided |
706
+ | Invalid field value | Inner | `estimated_passages: 15` when max is 10 |
707
+ | Unknown field | Inner | `passages` instead of `estimated_passages` |
708
+ | Invalid reference ID | **Outer** | `thread: "archive_access"` when ID doesn't exist |
709
+ | Semantic inconsistency | **Outer** | Character referenced before introduction |
710
+ | Hallucinated entity | **Outer** | Entity name invented, not from source data |
711
+
712
+ #### Fuzzy ID Replacement Suggestions
713
+
714
+ When semantic validation finds invalid IDs, generate replacement suggestions using fuzzy matching:
715
+
716
+ ```markdown
717
+ ### Error: Invalid Thread ID
718
+ - Location: initial_beats.5.threads
719
+ - You used: `archive_access`
720
+ - VALID OPTIONS: `butler_fidelity` | `diary_truth` | `host_motive`
721
+ - SUGGESTED: `diary_truth` (closest match to "archive")
722
+
723
+ ### Error: Unknown Entity
724
+ - Location: scene.3.characters
725
+ - You used: `mysterious_stranger`
726
+ - VALID OPTIONS: `butler_jameson` | `guest_clara` | `detective_morse`
727
+ - SUGGESTED: Remove this reference (no close match)
728
+ ```
729
+
730
+ This gives the model actionable guidance rather than just rejection.
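A minimal sketch of generating these suggestions with string-level fuzzy matching via `difflib`; note that the `diary_truth` example above implies semantic similarity, which would need an embedding comparison instead, so the cutoff here is only a rough string-distance heuristic.

```python
from difflib import get_close_matches

def suggest_replacements(invalid_ids: list[str], valid_ids: list[str]) -> dict[str, str | None]:
    """Map each invalid ID to its closest valid ID, or None when nothing is close."""
    suggestions: dict[str, str | None] = {}
    for bad_id in invalid_ids:
        # Cutoff is a judgment call: too low suggests unrelated IDs,
        # too high suggests nothing and forces a "remove this reference" hint.
        matches = get_close_matches(bad_id, valid_ids, n=1, cutoff=0.6)
        suggestions[bad_id] = matches[0] if matches else None
    return suggestions
```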
731
+
732
+ #### Source Repair Prompt Pattern
733
+
734
+ When the outer loop triggers, the repair prompt should include:
735
+
736
+ 1. **Original source** (the brief/summary being repaired)
737
+ 2. **Validation errors** (what went wrong downstream)
738
+ 3. **Valid references** (complete list of allowed IDs)
739
+ 4. **Fuzzy suggestions** (what to replace invalid IDs with)
740
+ 5. **Full context** (original input data the source was derived from)
741
+
742
+ ```markdown
743
+ ## Repair Required
744
+
745
+ Your brief contained invalid references that caused downstream failures.
746
+
747
+ ### Original Brief
748
+ [brief content here]
749
+
750
+ ### Validation Errors
751
+ 1. `archive_access` is not a valid thread ID
752
+ 2. `clock_distortion` is not a valid thread ID
753
+
754
+ ### Valid Thread IDs
755
+ - butler_fidelity
756
+ - diary_truth
757
+ - host_motive
758
+
759
+ ### Suggested Replacements
760
+ - `archive_access` → `diary_truth` (both relate to hidden information)
761
+ - `clock_distortion` → REMOVE (no matching concept)
762
+
763
+ ### Original Discussion (for context)
764
+ [full source material the brief was derived from]
765
+
766
+ Produce a corrected brief that uses only valid IDs.
767
+ ```
768
+
769
+ #### Budget and Applicability
770
+
771
+ | Stage Type | Needs Outer Loop? | Reason |
772
+ |------------|-------------------|--------|
773
+ | Generation (creates new IDs) | No | Creates IDs, doesn't reference them |
774
+ | Summarization | **Yes** | May invent or misremember IDs |
775
+ | Serialization (uses existing IDs) | **Yes** | References IDs from earlier stages |
776
+ | Expansion (adds detail) | Maybe | References scene/entity IDs |
777
+
778
+ **Total budget:** Outer loop max 2 iterations × Inner loop max 3 iterations = ≤12 LLM calls per stage worst case.
779
+
780
+ **Success criteria:**
781
+
782
+ - >80% correction rate on first outer loop iteration
783
+ - Clear error messages guide model to correct IDs
784
+ - Fuzzy matching reduces guesswork
785
+
547
786
  ---
548
787
 
549
788
  ## Prompt-History Conflicts
@@ -721,6 +960,10 @@ See [Sampling Parameters](#sampling-parameters) for detailed temperature guidanc
721
960
  | Context pruning | Context rot | Summarize and remove stale turns |
722
961
  | Structured feedback | Vague validation errors | Categorize issues (invalid/missing/unknown) |
723
962
  | Phase-specific temperature | Format errors in structured output | High temp for discuss, low for serialize |
963
+ | Numbered lists | Checkbox skipping | Use 1. 2. 3. format, not checkboxes |
964
+ | Quantity anchoring | Incomplete list processing | State exact count at start AND end |
965
+ | Anti-skipping statements | Middle items ignored | Explicit "process all N" constraints |
966
+ | Two-level validation | Broken input propagation | Outer loop repairs source, inner repairs output |
724
967
 
725
968
  | Model Class | Max Prompt | Max Tools | Strategy |
726
969
  |-------------|------------|-----------|----------|
@@ -741,10 +984,14 @@ See [Sampling Parameters](#sampling-parameters) for detailed temperature guidanc
741
984
  | Reflexion research | Self-correction improves quality on complex tasks |
742
985
  | STROT Framework (2025) | Structured feedback loops achieve 95% first-attempt success |
743
986
  | AWS Evaluator-Optimizer | Semantic reflection enables self-improving validation |
987
+ | LLM Self-Verification Limitations (2024) | LLMs cannot reliably self-verify; external validation required |
988
+ | Spotify Verification Loops (2025) | Inner/outer loop architecture; deterministic + semantic validation |
989
+ | LLMLOOP (ICSME 2025) | First feedback iteration has highest impact (up to 24% improvement) |
744
990
 
745
991
  ---
746
992
 
747
993
  ## See Also
748
994
 
995
+ - [Agent Memory Architecture](agent_memory_architecture.md) — State-managed memory, checkpointers, history management
749
996
  - [Branching Narrative Construction](../narrative-structure/branching_narrative_construction.md) — LLM generation strategies for narratives
750
997
  - [Multi-Agent Patterns](multi_agent_patterns.md) — Team coordination and delegation
@@ -492,5 +492,6 @@ This enables:
492
492
 
493
493
  ## See Also
494
494
 
495
+ - [Agent Memory Architecture](agent_memory_architecture.md) — Memory sharing, state-over-history, context passing
495
496
  - [Agent Prompt Engineering](agent_prompt_engineering.md) — Prompt design for individual agents
496
497
  - [Branching Narrative Construction](../narrative-structure/branching_narrative_construction.md) — Decomposition strategies for complex generation
@@ -1,6 +1,6 @@
1
1
  [project]
2
2
  name = "ifcraftcorpus"
3
- version = "1.4.0"
3
+ version = "1.5.0"
4
4
  description = "Interactive fiction craft corpus with search library and MCP server"
5
5
  readme = "README.md"
6
6
  license = {text = "MIT"}