ace-playbook 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. ace_playbook-0.1.0/LICENSE +21 -0
  2. ace_playbook-0.1.0/PKG-INFO +362 -0
  3. ace_playbook-0.1.0/README.md +327 -0
  4. ace_playbook-0.1.0/ace/__init__.py +74 -0
  5. ace_playbook-0.1.0/ace/baselines.py +103 -0
  6. ace_playbook-0.1.0/ace/cli.py +152 -0
  7. ace_playbook-0.1.0/ace/config.py +36 -0
  8. ace_playbook-0.1.0/ace/delta.py +143 -0
  9. ace_playbook-0.1.0/ace/engine.py +289 -0
  10. ace_playbook-0.1.0/ace/feedback.py +41 -0
  11. ace_playbook-0.1.0/ace/integrations/__init__.py +1 -0
  12. ace_playbook-0.1.0/ace/integrations/openai_agents.py +254 -0
  13. ace_playbook-0.1.0/ace/llm.py +152 -0
  14. ace_playbook-0.1.0/ace/playbook.py +223 -0
  15. ace_playbook-0.1.0/ace/py.typed +0 -0
  16. ace_playbook-0.1.0/ace/refine.py +145 -0
  17. ace_playbook-0.1.0/ace/roles.py +261 -0
  18. ace_playbook-0.1.0/ace/tasks.py +254 -0
  19. ace_playbook-0.1.0/ace/visualize.py +297 -0
  20. ace_playbook-0.1.0/ace_playbook.egg-info/PKG-INFO +362 -0
  21. ace_playbook-0.1.0/ace_playbook.egg-info/SOURCES.txt +39 -0
  22. ace_playbook-0.1.0/ace_playbook.egg-info/dependency_links.txt +1 -0
  23. ace_playbook-0.1.0/ace_playbook.egg-info/entry_points.txt +2 -0
  24. ace_playbook-0.1.0/ace_playbook.egg-info/requires.txt +18 -0
  25. ace_playbook-0.1.0/ace_playbook.egg-info/top_level.txt +1 -0
  26. ace_playbook-0.1.0/pyproject.toml +53 -0
  27. ace_playbook-0.1.0/setup.cfg +4 -0
  28. ace_playbook-0.1.0/tests/test_baselines.py +43 -0
  29. ace_playbook-0.1.0/tests/test_cli.py +64 -0
  30. ace_playbook-0.1.0/tests/test_config_and_playbook_io.py +97 -0
  31. ace_playbook-0.1.0/tests/test_delta.py +67 -0
  32. ace_playbook-0.1.0/tests/test_engine.py +102 -0
  33. ace_playbook-0.1.0/tests/test_extensibility.py +145 -0
  34. ace_playbook-0.1.0/tests/test_feedback_and_delta_edges.py +88 -0
  35. ace_playbook-0.1.0/tests/test_llm_openai.py +104 -0
  36. ace_playbook-0.1.0/tests/test_openai_agents_integration.py +176 -0
  37. ace_playbook-0.1.0/tests/test_playbook.py +84 -0
  38. ace_playbook-0.1.0/tests/test_refine.py +68 -0
  39. ace_playbook-0.1.0/tests/test_refine_embedder.py +74 -0
  40. ace_playbook-0.1.0/tests/test_roles_and_viz.py +102 -0
  41. ace_playbook-0.1.0/tests/test_tasks_and_integration.py +70 -0
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 ACE Framework Contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,362 @@
1
+ Metadata-Version: 2.4
2
+ Name: ace-playbook
3
+ Version: 0.1.0
4
+ Summary: ACE - Agentic Context Engineering: evolving, self-improving context playbooks for LLM agents. A faithful, framework-style implementation of the ICLR 2026 paper, with first-class OpenAI Agents SDK support.
5
+ Author: ACE Framework Contributors
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/rrahimi-uci/agentic-context-engineering
8
+ Project-URL: Repository, https://github.com/rrahimi-uci/agentic-context-engineering
9
+ Project-URL: Paper, https://arxiv.org/abs/2510.04618
10
+ Keywords: llm,agents,context-engineering,prompt-optimization,self-improving-agents,openai,openai-agents-sdk,agent-memory,in-context-learning,ace,playbook,rag,reflexion,gepa
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Intended Audience :: Science/Research
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
17
+ Requires-Python: >=3.9
18
+ Description-Content-Type: text/markdown
19
+ License-File: LICENSE
20
+ Requires-Dist: numpy>=1.23
21
+ Requires-Dist: rich>=13.0
22
+ Provides-Extra: openai
23
+ Requires-Dist: openai>=1.40; extra == "openai"
24
+ Provides-Extra: agents
25
+ Requires-Dist: openai-agents>=0.0.1; extra == "agents"
26
+ Provides-Extra: dev
27
+ Requires-Dist: pytest>=8.0; extra == "dev"
28
+ Requires-Dist: python-dotenv>=1.0; extra == "dev"
29
+ Provides-Extra: all
30
+ Requires-Dist: openai>=1.40; extra == "all"
31
+ Requires-Dist: openai-agents>=0.0.1; extra == "all"
32
+ Requires-Dist: python-dotenv>=1.0; extra == "all"
33
+ Requires-Dist: pytest>=8.0; extra == "all"
34
+ Dynamic: license-file
35
+
36
+ <div align="center">
37
+
38
+ # 🎮 ACE - Agentic Context Engineering
39
+
40
+ ### Evolving, self-improving **context playbooks** for LLM agents: a clean, tested, framework-style implementation of the [ICLR 2026 paper](https://arxiv.org/abs/2510.04618), with first-class **OpenAI Agents SDK** support.
41
+
42
+ [![Python](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/)
43
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
44
+ [![Tests](https://img.shields.io/badge/tests-112%20passing-brightgreen.svg)](tests/)
45
+ [![OpenAI Agents SDK](https://img.shields.io/badge/OpenAI-Agents%20SDK-black.svg)](https://openai.github.io/openai-agents-python/)
46
+ [![Docs](https://img.shields.io/badge/docs-website-blue.svg)](https://rrahimi-uci.github.io/agentic-context-engineering/)
47
+ [![Paper](https://img.shields.io/badge/paper-arXiv%3A2510.04618-b31b1b.svg)](https://arxiv.org/abs/2510.04618)
48
+
49
+ **Stop re-prompting. Let your agent *write its own playbook* from experience.**
50
+
51
+ 📖 **[Documentation site](https://rrahimi-uci.github.io/agentic-context-engineering/)** · **[Architecture](https://rrahimi-uci.github.io/agentic-context-engineering/architecture.html)**
52
+
53
+ [Quickstart](#-quickstart) · [Why ACE](#-why-ace) · [Use on your own task](#-use-it-on-your-own-task) · [OpenAI Agents SDK](#-use-it-with-the-openai-agents-sdk) · [How it works](#-how-it-works) · [Results](#-results) · [Architecture](ARCHITECTURE.md)
54
+
55
+ </div>
56
+
57
+ ---
58
+
59
+ ## What is this?
60
+
61
+ LLM agents and domain experts increasingly improve through **context adaptation**:
62
+ editing the *inputs* (instructions, strategies, evidence) instead of the *weights*.
63
+ But the two dominant approaches break down:
64
+
65
+ - **Brevity bias**: prompt optimizers collapse toward short, generic instructions and throw away hard-won domain detail.
66
+ - **Context collapse**: letting an LLM *rewrite the whole context* every step compresses it into a lossy summary and **craters accuracy** (see below).
67
+
68
+ **ACE** fixes both. It treats context as an **evolving playbook** of small, itemized
69
+ **bullets** that *accumulate, refine, and organize* strategies over time, through a
70
+ modular **Generator → Reflector → Curator** loop with **incremental delta updates** and a
71
+ **grow-and-refine** mechanism. The result: comprehensive, scalable, self-improving context, with low overhead.
72
+
73
+ > This repository is a faithful, dependency-light, **fully tested** implementation you can
74
+ > use in a couple of commands and a few lines of code.
75
+
76
+ ---
77
+
78
+ ## ✨ Why ACE
79
+
80
+ | | Prompt optimizers (GEPA, MIPRO) | Monolithic memory (full rewrite) | **ACE** |
81
+ |---|---|---|---|
82
+ | Keeps domain detail | ❌ brevity bias | ⚠️ erodes over time | ✅ accumulates |
83
+ | Survives long horizons | ⚠️ | ❌ **context collapse** | ✅ incremental deltas |
84
+ | Update cost | 🐢 full re-optimization | 🐢 full re-ingest each step | ⚡ tiny deltas, non-LLM merge |
85
+ | Works without labels | ⚠️ | ✅ | ✅ execution feedback |
86
+ | Interpretable / editable | ⚠️ | ⚠️ | ✅ inspectable bullets |
87
+
88
+ ---
89
+
90
+ ## 🚀 Quickstart
91
+
92
+ ```bash
93
+ git clone https://github.com/rrahimi-uci/agentic-context-engineering && cd agentic-context-engineering
94
+ pip install -e . # core library (numpy + rich only)
95
+ ```
96
+
97
+ Run the headline comparison, **no API key required** (uses a deterministic, offline teaching environment):
98
+
99
+ ```bash
100
+ ace demo --html report.html
101
+ ```
102
+
103
+ ```
104
+ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┓
105
+ ┃ Method                      ┃ Accuracy ┃ Playbook ┃ Note        ┃
106
+ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━┩
107
+ │ Base LLM (no context)       │ 44.4%    │ 0        │ -           │
108
+ │ ACE (offline → eval)        │ 83.3%    │ 5        │ +38.9 pts   │
109
+ │ Monolithic rewrite (online) │ 72.2%    │ 4        │ 2 collapses │
110
+ │ ACE (online)                │ 83.3%    │ 6        │ no collapse │
111
+ └─────────────────────────────┴──────────┴──────────┴─────────────┘
112
+ ```
113
+
114
+ Watch a run adapt **live in your terminal**:
115
+
116
+ ```bash
117
+ ace run # animated dashboard: playbook growth, accuracy, deltas
118
+ ```
119
+
120
+ ### …or in ~10 lines of Python
121
+
122
+ ```python
123
+ from ace import ACE, SimulatedLLM, TeachingEnvironment, build_teaching_task
124
+ from ace.baselines import StaticAgent
125
+
126
+ env = TeachingEnvironment()
127
+ task = build_teaching_task()
128
+ train, test = task.split()
129
+
130
+ base = StaticAgent(SimulatedLLM(env)).run(test) # no learning
131
+ ace = ACE(SimulatedLLM(env))
132
+ ace.adapt_offline(train) # build a playbook from feedback
133
+ result = ace.evaluate(test) # measure on held-out data
134
+
135
+ print(f"Base {base.accuracy:.0f}% → ACE {result.accuracy:.0f}%")
136
+ print(ace.playbook.render()) # human-readable playbook
137
+ ```
138
+
139
+ ---
140
+
141
+ ## 🔌 Use it with the OpenAI Agents SDK
142
+
143
+ ACE plugs into the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/)
144
+ as a **self-improving memory**. The playbook is injected into your agent's
145
+ instructions on every run; after each task you hand back feedback (a label *or*
146
+ just natural execution signal) and ACE grows the playbook.
147
+
148
+ ```bash
149
+ pip install "ace-playbook[all]" # adds openai + openai-agents
150
+ export OPENAI_API_KEY=sk-...
151
+ ```
152
+
153
+ ```python
154
+ from agents import Agent, function_tool
155
+ from ace import ACE, OpenAILLM
156
+ from ace.integrations.openai_agents import ACEAgent
157
+
158
+ base = Agent(name="Support", instructions="You are a concise support agent.")
159
+ agent = ACEAgent(base, ace=ACE(OpenAILLM(model="gpt-4o-mini")))
160
+
161
+ # Run + learn from execution feedback; no ground-truth labels needed:
162
+ out = agent.run_and_learn(
163
+     "Cancel order #C99",
164
+     signal="Policy: cancellation requires identity verification first.",
165
+ )
166
+ print(out.output)
167
+ print(agent.ace.playbook.render()) # the agent just wrote itself a rule
168
+ ```
169
+
170
+ Inside an existing event loop (FastAPI, notebooks, other async agents) use the
171
+ async entry points (same semantics, no `run_sync`):
172
+
173
+ ```python
174
+ out = await agent.arun_and_learn("Cancel order #C99", signal="...")
175
+ ```
176
+
177
+ `ACEAgent` accepts string **or** dynamic (callable) base instructions and
178
+ composes the playbook beneath them; pass `base_instructions=...` to override.
179
+ `python examples/04_openai_agents.py` is a runnable end-to-end example.
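
To make the composition described above concrete, here is a minimal sketch of how a playbook could be placed beneath string or callable base instructions. It is not the `ACEAgent` implementation: `compose_instructions`, the `Instructions` alias, and the `## Playbook` header are hypothetical names, and the callable is simplified to take no arguments (real dynamic instructions in the SDK receive run context).

```python
from typing import Callable, Union

# Hypothetical simplification: a zero-argument callable stands in for
# the SDK's dynamic instructions.
Instructions = Union[str, Callable[[], str]]


def compose_instructions(base: Instructions, playbook_text: str) -> str:
    """Resolve the base instructions, then append the playbook beneath them."""
    resolved = base() if callable(base) else base
    if not playbook_text.strip():
        return resolved  # empty playbook: leave the base instructions untouched
    return f"{resolved}\n\n## Playbook\n{playbook_text}"


rule = "- Verify identity before cancelling an order."
static = compose_instructions("You are a concise support agent.", rule)
dynamic = compose_instructions(lambda: "You are a concise support agent.", rule)
print(static)
```

Both call styles produce the same composed prompt, so the base agent definition does not need to change when the playbook grows.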
180
+
181
+ ---
182
+
183
+ ## 🧩 Use it on *your own* task
184
+
185
+ Two extension points make ACE general-purpose: bring your own `Task` and your
186
+ own feedback (no ground-truth labels required):
187
+
188
+ ```python
189
+ from ace import ACE, Feedback, Sample, Task, OpenAILLM
190
+
191
+ my_task = Task(name="my-domain", samples=[Sample(id="1", question="...")],
192
+                evaluate=lambda pred, s: my_score(pred, s))
193
+
194
+ def my_feedback(sample, generation) -> Feedback:
195
+     # plug in execution signals, a reward fn, or an LLM judge; your call
196
+     ok = run_my_checks(generation.answer)
197
+     return Feedback(correct=ok, signal="tests passed" if ok else "tests FAILED")
198
+
199
+ ace = ACE(OpenAILLM(model="gpt-4o-mini"))
200
+ ace.adapt_online(my_task, feedback_fn=my_feedback) # learns from YOUR signals
201
+ ```
202
+
203
+ See `examples/05_custom_task.py` (runs offline). The Curator calls the LLM to
204
+ propose `ADD`/`UPDATE`/`REMOVE` edits by default (deterministic fallback never
205
+ drops a lesson); force deterministic curation with `ACEConfig(curator_use_llm=False)`.
206
+
207
+ ---
208
+
209
+ ## 🧠 How it works
210
+
211
+ ```mermaid
212
+ flowchart LR
213
+ Q([Query]) --> G[Generator]
214
+ PB[(Context Playbook)] -. injected .-> G
215
+ G -->|trajectory + bullet usage| R[Reflector]
216
+ FB([Feedback: labels or execution signal]) --> R
217
+ R -->|insights, iterative refinement| C[Curator]
218
+ C -->|delta items| M{{Deterministic Merge - non-LLM}}
219
+ M --> PB
220
+ M --> GR[Grow & Refine: dedupe / prune]
221
+ GR --> PB
222
+ classDef role fill:#1e293b,color:#fff;
223
+ classDef store fill:#2563eb,color:#fff;
224
+ classDef det fill:#16a34a,color:#fff;
225
+ class G,R,C role;
226
+ class PB store;
227
+ class M,GR det;
228
+ ```
229
+
230
+ 1. **Generator** solves the query using the current playbook, flagging which bullets helped or misled.
231
+ 2. **Reflector** critiques the trajectory against feedback and distills concrete, reusable **insights** (optionally over several refinement rounds).
232
+ 3. **Curator** turns insights into a few **delta operations** (`ADD` / `UPDATE` / `REMOVE`).
233
+ 4. **Deterministic merge** applies those edits to the playbook: *no LLM, no rewrite, no collapse.*
234
+ 5. **Grow-and-refine** de-duplicates (semantic or lexical) and prunes consistently harmful bullets.
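
The deterministic merge in step 4 can be sketched in a few lines. This is an illustrative toy, not the code in `ace/delta.py`: `DeltaOp`, `merge`, the `b<N>` id scheme, and the dict-based playbook are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DeltaOp:
    """One hypothetical delta operation emitted by a curator."""
    op: str                          # "ADD", "UPDATE", or "REMOVE"
    bullet_id: Optional[str] = None  # target bullet for UPDATE / REMOVE
    content: Optional[str] = None    # new text for ADD / UPDATE


def merge(playbook: Dict[str, str], deltas: List[DeltaOp]) -> Dict[str, str]:
    """Apply deltas in order without any LLM call; untouched bullets stay as-is."""
    merged = dict(playbook)  # copy so the caller's playbook is not mutated
    next_id = len(merged) + 1
    for d in deltas:
        if d.op == "ADD" and d.content:
            while f"b{next_id}" in merged:  # find an unused id
                next_id += 1
            merged[f"b{next_id}"] = d.content
        elif d.op == "UPDATE" and d.bullet_id in merged and d.content:
            merged[d.bullet_id] = d.content
        elif d.op == "REMOVE":
            merged.pop(d.bullet_id, None)
        # unknown ops or missing targets are skipped rather than raising
    return merged


playbook = {"b1": "Verify identity before cancelling an order."}
deltas = [
    DeltaOp("ADD", content="Quote the order id back to the user."),
    DeltaOp("UPDATE", "b1", "Verify identity before cancelling or refunding."),
    DeltaOp("REMOVE", "b9"),  # no such bullet: ignored
]
result = merge(playbook, deltas)
print(result)
```

Because each operation touches only the bullet it names, the cost of an update scales with the size of the delta rather than the size of the playbook.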
235
+
236
+ ACE runs in two regimes: multi-epoch **offline** optimization and sequential **online** test-time adaptation (which can be warm-started from an offline playbook):
237
+
238
+ ```mermaid
239
+ flowchart LR
240
+ subgraph Offline["Offline - system-prompt optimization"]
241
+ TR[(Train split)] --> EP{Multi-epoch}
242
+ EP --> ST[ACE.step] --> EP
243
+ EP --> PBO[(Playbook)]
244
+ end
245
+ subgraph Online["Online - test-time memory"]
246
+ S[Next sample] --> PR[predict] --> LE[learn] --> S
247
+ end
248
+ PBO -. optional warm start .-> Online
249
+ classDef store fill:#2563eb,color:#fff;
250
+ class PBO store;
251
+ ```
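
As a rough illustration of the two regimes in the diagram, the sketch below contrasts a multi-epoch offline loop with a sequential online loop that can be warm-started. It shows control flow only: the "learner" simply memorizes question/answer pairs, and `step`, `adapt_offline`, `adapt_online`, and the tuple-based `Sample` are hypothetical stand-ins rather than the `ACE` engine API.

```python
from typing import List, Optional, Set, Tuple

Sample = Tuple[str, str]  # (question, expected answer): toy format for this sketch


def step(playbook: Set[str], sample: Sample) -> bool:
    """Predict from the playbook, then learn on a miss. Returns whether it was a hit."""
    question, answer = sample
    lesson = f"{question} -> {answer}"
    correct = lesson in playbook
    if not correct:
        playbook.add(lesson)  # stand-in for reflect + curate + merge
    return correct


def adapt_offline(train: List[Sample], epochs: int = 2) -> Set[str]:
    """Multi-epoch pass over a train split; returns the resulting playbook."""
    playbook: Set[str] = set()
    for _ in range(epochs):
        for sample in train:
            step(playbook, sample)
    return playbook


def adapt_online(stream: List[Sample], warm_start: Optional[Set[str]] = None) -> float:
    """Sequential predict-then-learn over a stream; returns the hit rate."""
    playbook = set(warm_start) if warm_start else set()
    hits = sum(step(playbook, sample) for sample in stream)
    return hits / len(stream)


train = [("2+2", "4"), ("3+3", "6")]
stream = [("2+2", "4"), ("5+5", "10"), ("5+5", "10")]
cold = adapt_online(stream)
warm = adapt_online(stream, warm_start=adapt_offline(train))
print(f"cold start: {cold:.2f}, warm start: {warm:.2f}")
```

In this toy the warm-started run scores 0.67 against 0.33 for the cold start, because the offline playbook already covers the first streamed sample.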
252
+
253
+ Full diagrams (roles, bullet lifecycle, grow-and-refine, feedback regimes, data model; 14 in total) live in **[ARCHITECTURE.md](ARCHITECTURE.md)** and on the **[docs site](https://rrahimi-uci.github.io/agentic-context-engineering/architecture.html)**.
254
+
255
+ ---
256
+
257
+ ## 📊 Results
258
+
259
+ ### Reproducible, in this repo (offline teaching environment, no API key)
260
+
261
+ These come straight from the bundled examples (`examples/*.py`) and are fully deterministic:
262
+
263
+ | Demo | Base LLM | ACE | Δ |
264
+ |---|---|---|---|
265
+ | Quickstart (offline → held-out eval) | 44.4% | **83.3%** | **+38.9 pts** |
266
+ | Context-collapse benchmark (online) | 41.7% | **88.3%** | **+46.6 pts** |
267
+ | Offline warmup + online | 34.5% | **96.6%** | **+62.1 pts** |
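
The Δ column is a difference in percentage points, not a relative change. A short check, using only the figures from the table above (the `rows` and `deltas` names are just for this sketch), recomputes each delta and prints the relative improvement alongside it for contrast:

```python
# (base accuracy %, ACE accuracy %, reported delta in points), copied from the table
rows = {
    "Quickstart (offline -> held-out eval)": (44.4, 83.3, 38.9),
    "Context-collapse benchmark (online)": (41.7, 88.3, 46.6),
    "Offline warmup + online": (34.5, 96.6, 62.1),
}

deltas = {}
for name, (base, ace, reported) in rows.items():
    points = round(ace - base, 1)          # absolute gain in percentage points
    relative = (ace - base) / base * 100   # relative gain over the base accuracy
    assert abs(points - reported) < 0.05, name
    deltas[name] = points
    print(f"{name}: +{points} pts (relative gain {relative:.1f}%)")
```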
268
+
269
+ In the context-collapse demo, the **monolithic-rewrite** baseline collapses its context
270
+ 7× and stalls at 60.0%, while ACE never collapses. Adaptation **token ingestion** for ACE
271
+ is **−94.9%** vs. full re-ingestion (deltas are tiny). Generate the visual report with
272
+ `ace demo --html report.html` → [sample report](docs/assets/sample_report.html).
273
+
274
+ ### Reported in the paper (real benchmarks, DeepSeek-V3.1)
275
+
276
+ | Benchmark | Baseline | **+ ACE** |
277
+ |---|---|---|
278
+ | AppWorld (agent, avg) | 42.4% (ReAct) | **59.5%** (+17.1) |
279
+ | FiNER (financial NER) | 70.7% | **78.3%** |
280
+ | Formula (financial reasoning) | 67.5% | **85.5%** |
281
+ | Adaptation latency (offline AppWorld) | - | **−86.9%** |
282
+ | Token cost (online FiNER) | - | **−83.6%** |
283
+
284
+ > On the AppWorld leaderboard, ReAct+ACE with an open-source model **matches the
285
+ > top-ranked production GPT-4.1 agent** and surpasses it on the harder
286
+ > test-challenge split. (Numbers above are from the paper; this repo reproduces
287
+ > the *mechanism* and its qualitative behavior offline.)
288
+
289
+ ---
290
+
291
+ ## 🗂️ What's in the box
292
+
293
+ ```
294
+ ace/
295
+ ├── playbook.py          # Bullet + Playbook: the evolving, sectioned context
296
+ ├── delta.py             # incremental ADD/UPDATE/REMOVE + deterministic merge
297
+ ├── roles.py             # Generator · Reflector · Curator (+ prompts)
298
+ ├── refine.py            # grow-and-refine: semantic dedupe + harmful pruning
299
+ ├── engine.py            # ACE orchestrator: offline / online adaptation
300
+ ├── llm.py               # LLM protocol · OpenAILLM · deterministic SimulatedLLM
301
+ ├── feedback.py          # labeled or label-free execution feedback
302
+ ├── tasks.py             # Sample/Task + offline TeachingEnvironment
303
+ ├── baselines.py         # StaticAgent + MonolithicRewriteAgent (context collapse)
304
+ ├── visualize.py         # live terminal dashboard + self-contained HTML report
305
+ ├── integrations/
306
+ │   └── openai_agents.py # ACEAgent: drop-in self-improving memory
307
+ └── cli.py               # `ace demo | run | playbook | version`
308
+ examples/                # 5 runnable demos (4 need no API key)
309
+ tests/                   # 112 tests, run in <1s, zero network
310
+ ```
311
+
312
+ ---
313
+
314
+ ## 🧪 Develop & test
315
+
316
+ ```bash
317
+ pip install -e ".[dev]"
318
+ pytest # 112 tests, fully offline, ~1s
319
+ python examples/01_quickstart.py
320
+ python examples/02_context_collapse.py # writes ace_report.html
321
+ ```
322
+
323
+ The bundled `SimulatedLLM` + `TeachingEnvironment` make every demo and test
324
+ **deterministic and key-free**, so the ACE *control loop* is exercised end-to-end
325
+ in CI. Swap in `OpenAILLM` for real models and benchmarks; the algorithm and
326
+ prompts are unchanged.
327
+
328
+ ---
329
+
330
+ ## Key concepts (glossary)
331
+
332
+ - **Playbook**: the evolving context, a set of itemized **bullets** grouped into sections.
333
+ - **Bullet**: one atomic lesson with a stable id and `helpful`/`harmful` counters.
334
+ - **Delta update**: a small, localized batch of `ADD`/`UPDATE`/`REMOVE` edits (vs. a full rewrite).
335
+ - **Grow-and-refine**: append new bullets, update existing in place, semantically de-duplicate, prune harmful.
336
+ - **Generator / Reflector / Curator**: the three specialized roles of the ACE loop.
337
+ - **Offline vs. online**: multi-epoch optimization on a train split vs. sequential test-time adaptation.
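
To ground the **Bullet** and **Grow-and-refine** entries, here is a minimal sketch of lexical de-duplication plus harmful-bullet pruning. It is an assumption-laden illustration, not the logic in `ace/refine.py`: the `Bullet` fields mirror the glossary, while `jaccard`, `grow_and_refine`, and both thresholds are hypothetical choices.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Bullet:
    id: str
    content: str
    helpful: int = 0
    harmful: int = 0


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity, a simple lexical stand-in for embeddings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0


def grow_and_refine(bullets: List[Bullet],
                    dedupe_threshold: float = 0.8,   # assumed value
                    harmful_margin: int = 2) -> List[Bullet]:  # assumed value
    kept: List[Bullet] = []
    for b in bullets:
        if b.harmful - b.helpful >= harmful_margin:
            continue  # prune bullets that are harmful more often than helpful
        dup = next((k for k in kept
                    if jaccard(k.content, b.content) >= dedupe_threshold), None)
        if dup is not None:
            dup.helpful += b.helpful  # fold counters into the earlier bullet
            dup.harmful += b.harmful
            continue
        kept.append(replace(b))  # copy so the input bullets are not mutated
    return kept


bullets = [
    Bullet("b1", "Verify identity before cancelling an order", helpful=3),
    Bullet("b2", "verify identity before cancelling an order", helpful=1),
    Bullet("b3", "Always cancel immediately", helpful=0, harmful=4),
]
refined = grow_and_refine(bullets)
print(refined)
```

Here `b2` is folded into `b1` (their token sets match after lowercasing) and `b3` is pruned, leaving a single bullet with `helpful=4`.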
338
+
339
+ ---
340
+
341
+ ## 📚 Citation
342
+
343
+ ```bibtex
344
+ @inproceedings{zhang2026ace,
345
+ title = {Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models},
346
+ author = {Zhang, Qizheng and Hu, Changran and Upasani, Shubhangi and others},
347
+ booktitle = {International Conference on Learning Representations (ICLR)},
348
+ year = {2026},
349
+ url = {https://arxiv.org/abs/2510.04618}
350
+ }
351
+ ```
352
+
353
+ This implementation is an independent, open-source reproduction for research and
354
+ educational use. All credit for the ACE method belongs to the original authors.
355
+
356
+ ## License
357
+
358
+ [MIT](LICENSE). Contributions welcome; see [CONTRIBUTING.md](CONTRIBUTING.md).
359
+
360
+ <div align="center">
361
+ <sub>Built to make <b>self-improving LLM agents</b> easy: <code>pip install</code> → a few lines → a playbook that gets better with every task.</sub>
362
+ </div>