context-compiler 0.5.2__tar.gz → 0.6.2__tar.gz

Files changed (122)
  1. {context_compiler-0.5.2 → context_compiler-0.6.2}/.github/workflows/ci.yml +4 -1
  2. {context_compiler-0.5.2 → context_compiler-0.6.2}/.github/workflows/publish-pypi.yml +36 -6
  3. context_compiler-0.6.2/.github/workflows/stress-tests.yml +47 -0
  4. {context_compiler-0.5.2 → context_compiler-0.6.2}/.pre-commit-config.yaml +2 -0
  5. {context_compiler-0.5.2 → context_compiler-0.6.2}/PKG-INFO +83 -23
  6. {context_compiler-0.5.2 → context_compiler-0.6.2}/README.md +80 -22
  7. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/06_llm_context_compaction.py +2 -2
  8. context_compiler-0.6.2/docs/README.md +20 -0
  9. context_compiler-0.6.2/docs/llm-preprocessor.md +44 -0
  10. context_compiler-0.6.2/evals/litellm_proxy_additional_findings.md +89 -0
  11. context_compiler-0.6.2/evals/litellm_proxy_behavioral_comparisons.md +120 -0
  12. {context_compiler-0.5.2 → context_compiler-0.6.2}/evals/swe-bench/swe-bench.py +6 -2
  13. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/06_transcript_replay.py +2 -2
  14. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/integrations/README.md +15 -5
  15. context_compiler-0.6.2/examples/integrations/litellm/README.md +64 -0
  16. context_compiler-0.6.2/examples/integrations/litellm/basic.py +124 -0
  17. context_compiler-0.6.2/examples/integrations/litellm/with_preprocessor.py +220 -0
  18. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/integrations/litellm_proxy/README.md +37 -1
  19. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/integrations/litellm_proxy/config.example.yaml +3 -0
  20. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/integrations/litellm_proxy/context_compiler_precall_hook.py +53 -20
  21. context_compiler-0.6.2/examples/integrations/litellm_proxy/context_compiler_precall_hook_with_preprocessor.py +270 -0
  22. context_compiler-0.6.2/examples/integrations/openwebui/README.md +84 -0
  23. context_compiler-0.6.2/examples/integrations/openwebui/open_webui_pipe.py +303 -0
  24. context_compiler-0.6.2/examples/integrations/openwebui/open_webui_pipe_with_preprocessor.py +374 -0
  25. context_compiler-0.6.2/experimental/__init__.py +1 -0
  26. context_compiler-0.6.2/experimental/preprocessor/README.md +72 -0
  27. context_compiler-0.6.2/experimental/preprocessor/__init__.py +24 -0
  28. context_compiler-0.6.2/experimental/preprocessor/constants.py +29 -0
  29. context_compiler-0.6.2/experimental/preprocessor/heuristic_precompiler.py +239 -0
  30. context_compiler-0.6.2/experimental/preprocessor/output_validation.py +102 -0
  31. context_compiler-0.6.2/experimental/preprocessor/prompt_utils.py +57 -0
  32. context_compiler-0.6.2/experimental/preprocessor/prompts/default.txt +129 -0
  33. context_compiler-0.6.2/experimental/preprocessor/prompts/llama.txt +114 -0
  34. {context_compiler-0.5.2 → context_compiler-0.6.2}/pyproject.toml +5 -2
  35. {context_compiler-0.5.2 → context_compiler-0.6.2}/src/context_compiler/__init__.py +4 -0
  36. {context_compiler-0.5.2 → context_compiler-0.6.2}/src/context_compiler/engine.py +11 -3
  37. context_compiler-0.6.2/tests/fixtures/v2/step/012_clear_premise_populated_update.json +32 -0
  38. context_compiler-0.6.2/tests/fixtures/v2/step/013_clear_premise_already_null_update.json +32 -0
  39. context_compiler-0.6.2/tests/fixtures/v2/step/014_reset_policies_populated_update.json +29 -0
  40. context_compiler-0.6.2/tests/fixtures/v2/step/015_reset_policies_already_empty_update.json +26 -0
  41. context_compiler-0.6.2/tests/fixtures/v2/step/016_clear_state_populated_update.json +28 -0
  42. context_compiler-0.6.2/tests/fixtures/v2/step/017_clear_state_already_empty_update.json +26 -0
  43. context_compiler-0.6.2/tests/test_precompiler_heuristic.py +252 -0
  44. context_compiler-0.6.2/tests/test_precompiler_heuristic_properties.py +148 -0
  45. context_compiler-0.6.2/tests/test_precompiler_output_validation.py +50 -0
  46. context_compiler-0.6.2/tests/test_precompiler_prompt_utils.py +67 -0
  47. context_compiler-0.6.2/tests/test_precompiler_validator_properties.py +173 -0
  48. {context_compiler-0.5.2 → context_compiler-0.6.2}/uv.lock +6 -2
  49. context_compiler-0.5.2/docs/README.md +0 -6
  50. context_compiler-0.5.2/docs/llm-preprocessor.md +0 -149
  51. context_compiler-0.5.2/examples/integrations/litellm_sdk.py +0 -95
  52. {context_compiler-0.5.2 → context_compiler-0.6.2}/.gitignore +0 -0
  53. {context_compiler-0.5.2 → context_compiler-0.6.2}/AGENTS.md +0 -0
  54. {context_compiler-0.5.2 → context_compiler-0.6.2}/CONTRIBUTING.md +0 -0
  55. {context_compiler-0.5.2 → context_compiler-0.6.2}/LICENSE +0 -0
  56. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/01_llm_contradiction_clarify.py +0 -0
  57. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/02_llm_constraint_guardrail.py +0 -0
  58. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/03_llm_premise_guardrail.py +0 -0
  59. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/04_llm_tool_denylist_guardrail.py +0 -0
  60. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/05_llm_prompt_drift_vs_state.py +0 -0
  61. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/07_llm_prompt_vs_state.py +0 -0
  62. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/README.md +0 -0
  63. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/__init__.py +0 -0
  64. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/common.py +0 -0
  65. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/llm_client.py +0 -0
  66. {context_compiler-0.5.2 → context_compiler-0.6.2}/demos/run_demo.py +0 -0
  67. {context_compiler-0.5.2 → context_compiler-0.6.2}/docs/DescriptionAndMilestones.md +0 -0
  68. {context_compiler-0.5.2 → context_compiler-0.6.2}/docs/DirectiveGrammarSpec.md +0 -0
  69. {context_compiler-0.5.2 → context_compiler-0.6.2}/docs/multi-engine.md +0 -0
  70. {context_compiler-0.5.2 → context_compiler-0.6.2}/evals/swe-bench/README.md +0 -0
  71. {context_compiler-0.5.2 → context_compiler-0.6.2}/evals/swe-bench/RUBRIC.md +0 -0
  72. {context_compiler-0.5.2 → context_compiler-0.6.2}/evals/swe-bench/manifest.json +0 -0
  73. {context_compiler-0.5.2 → context_compiler-0.6.2}/evals/swe-bench/tasks/django__django-12453.json +0 -0
  74. {context_compiler-0.5.2 → context_compiler-0.6.2}/evals/swe-bench/tasks/django__django-13158.json +0 -0
  75. {context_compiler-0.5.2 → context_compiler-0.6.2}/evals/swe-bench/tasks/django__django-13964.json +0 -0
  76. {context_compiler-0.5.2 → context_compiler-0.6.2}/evals/swe-bench/tasks/django__django-15252.json +0 -0
  77. {context_compiler-0.5.2 → context_compiler-0.6.2}/evals/swe-bench/tasks/matplotlib__matplotlib-23299.json +0 -0
  78. {context_compiler-0.5.2 → context_compiler-0.6.2}/evals/swe-bench/tasks/psf__requests-1963.json +0 -0
  79. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/01_persistent_guardrails.py +0 -0
  80. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/02_configuration_and_correction.py +0 -0
  81. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/03_ambiguity_with_clarification.py +0 -0
  82. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/04_tool_governance_denylist.py +0 -0
  83. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/05_llm_integration_pattern.py +0 -0
  84. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/07_single_policy_correction.py +0 -0
  85. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/README.md +0 -0
  86. {context_compiler-0.5.2 → context_compiler-0.6.2}/examples/_util.py +0 -0
  87. {context_compiler-0.5.2 → context_compiler-0.6.2}/src/context_compiler/const.py +0 -0
  88. {context_compiler-0.5.2 → context_compiler-0.6.2}/src/context_compiler/repl.py +0 -0
  89. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/README.md +0 -0
  90. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/001_set_premise_update.json +0 -0
  91. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/002_use_item_normalization.json +0 -0
  92. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/003_conflict_prohibit_clarify.json +0 -0
  93. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/004_remove_policy_missing_idempotent_update.json +0 -0
  94. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/005_exact_prefix_passthrough_leading_space.json +0 -0
  95. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/006_near_miss_set_premise_to.json +0 -0
  96. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/007_near_miss_change_premise_missing_to.json +0 -0
  97. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/008_replace_missing_source_clarify_prompt.json +0 -0
  98. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/009_pending_affirmative_normalized_token.json +0 -0
  99. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/010_pending_negative_normalized_token.json +0 -0
  100. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/step/011_pending_unmatched_reuses_prompt.json +0 -0
  101. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/transcript/001_user_only_replay_state.json +0 -0
  102. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/transcript/002_non_string_user_content_ignored.json +0 -0
  103. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/transcript/003_stops_at_first_clarify_later_yes.json +0 -0
  104. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/fixtures/v2/transcript/004_stops_at_first_clarify_later_no.json +0 -0
  105. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_04_grammar_edge_cases.py +0 -0
  106. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_04_llm_tool_governance.py +0 -0
  107. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_07_llm_prompt_engineering_comparison.py +0 -0
  108. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_demo_01_04_behavior.py +0 -0
  109. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_demo_05_prompt_contract.py +0 -0
  110. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_demo_07_output_clarity.py +0 -0
  111. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_demo_compaction.py +0 -0
  112. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_engine.py +0 -0
  113. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_examples.py +0 -0
  114. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_fixtures.py +0 -0
  115. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_llm_client.py +0 -0
  116. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_llm_demos.py +0 -0
  117. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_properties.py +0 -0
  118. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_repl.py +0 -0
  119. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_repl_properties.py +0 -0
  120. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_run_demo.py +0 -0
  121. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_smoke.py +0 -0
  122. {context_compiler-0.5.2 → context_compiler-0.6.2}/tests/test_transcript_replay.py +0 -0
@@ -34,8 +34,11 @@ jobs:
  - name: Ruff (format)
  run: uv run ruff format --check .
 
+ - name: Install mypy integration dependencies
+ run: uv pip install --python .venv/bin/python litellm pydantic
+
  - name: Mypy
- run: uv run mypy src
+ run: uv run mypy src examples evals/swe-bench demos
 
  - name: Install Hypothesis for tests
  run: uv pip install --python .venv/bin/python hypothesis
@@ -3,13 +3,41 @@ name: Publish to PyPI
  on:
  release:
  types: [published]
- workflow_dispatch:
 
  permissions:
  contents: read
  id-token: write
 
  jobs:
+ stress-tests:
+ name: Stress tests (release gate)
+ runs-on: ubuntu-latest
+
+ steps:
+ - uses: actions/checkout@v6
+
+ - name: Set up Python
+ uses: actions/setup-python@v6
+ with:
+ python-version: "3.12"
+
+ - name: Install uv
+ uses: astral-sh/setup-uv@v7
+ with:
+ enable-cache: true
+
+ - name: Install dev and demos dependencies
+ run: uv sync --extra dev --extra demos
+
+ - name: Run pytest stress loop
+ shell: bash
+ run: |
+ loops="10"
+ for i in $(seq 1 "$loops"); do
+ echo "== stress run $i/$loops =="
+ uv run pytest -q
+ done
+
  build:
  name: Build distributions
  runs-on: ubuntu-latest
@@ -22,11 +50,13 @@ jobs:
  with:
  python-version: "3.12"
 
+ - name: Install uv
+ uses: astral-sh/setup-uv@v7
+ with:
+ enable-cache: true
+
  - name: Build sdist and wheel
- run: |
- python -m pip install --upgrade pip
- python -m pip install build
- python -m build
+ run: uv run --with build python -m build
 
  - name: Show built distributions
  run: |
@@ -41,7 +71,7 @@ jobs:
 
  publish:
  name: Publish to PyPI
- needs: build
+ needs: [build, stress-tests]
  runs-on: ubuntu-latest
  environment:
  name: pypi
@@ -0,0 +1,47 @@
+ name: Stress Tests
+
+ on:
+ workflow_dispatch:
+ inputs:
+ stress_loops:
+ description: Number of full pytest stress loops
+ required: false
+ default: "10"
+ schedule:
+ - cron: "0 3 * * *"
+
+ jobs:
+ stress-tests:
+ name: Stress tests
+ runs-on: ubuntu-latest
+ env:
+ STRESS_LOOPS: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.stress_loops || '10' }}
+
+ steps:
+ - uses: actions/checkout@v6
+
+ - name: Set up Python
+ uses: actions/setup-python@v6
+ with:
+ python-version: "3.12"
+
+ - name: Install uv
+ uses: astral-sh/setup-uv@v7
+ with:
+ enable-cache: true
+
+ - name: Install dev and demos dependencies
+ run: uv sync --extra dev --extra demos
+
+ - name: Run pytest stress loop
+ shell: bash
+ run: |
+ loops="${STRESS_LOOPS}"
+ if ! [[ "$loops" =~ ^[1-9][0-9]*$ ]]; then
+ echo "Invalid stress loop count: $loops"
+ exit 1
+ fi
+ for i in $(seq 1 "$loops"); do
+ echo "== stress run $i/$loops =="
+ uv run pytest -q
+ done
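The loop-count guard in the stress-test workflow (fall back to `"10"` when no input is supplied, then reject anything that is not a positive integer without a leading zero) can also be sketched outside the shell. This is a minimal illustration only: `parse_stress_loops` is a hypothetical helper name and is not part of the package or the workflow.

```python
import re
from typing import Optional

# Same pattern as the workflow's bash check: ^[1-9][0-9]*$
_LOOPS_PATTERN = re.compile(r"[1-9][0-9]*")


def parse_stress_loops(raw: Optional[str], default: str = "10") -> int:
    """Hypothetical helper mirroring the workflow's loop-count validation."""
    # An empty or missing input falls back to the default, like `input || '10'`.
    value = raw if raw else default
    if _LOOPS_PATTERN.fullmatch(value) is None:
        raise ValueError(f"Invalid stress loop count: {value}")
    return int(value)
```

Rejecting `0`, negative numbers, and non-numeric text up front keeps `seq 1 "$loops"` from silently running zero iterations and reporting success.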
@@ -14,3 +14,5 @@ repos:
  args: [--pretty]
  additional_dependencies:
  - hypothesis
+ - litellm
+ - pydantic
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: context-compiler
- Version: 0.5.2
+ Version: 0.6.2
  Summary: Deterministic conversational state engine for LLM applications.
  Project-URL: Homepage, https://github.com/rlippmann/context-compiler
  Project-URL: Repository, https://github.com/rlippmann/context-compiler
@@ -26,6 +26,8 @@ Requires-Dist: pre-commit; extra == 'dev'
  Requires-Dist: pytest; extra == 'dev'
  Requires-Dist: pytest-cov; extra == 'dev'
  Requires-Dist: ruff<1.0,>=0.12; extra == 'dev'
+ Provides-Extra: experimental
+ Requires-Dist: litellm>=1.0.0; extra == 'experimental'
  Description-Content-Type: text/markdown
 
 
@@ -118,18 +120,32 @@ The host supplies the authoritative state to the model so the constraint persist
 
  ---
 
- ## Evidence (cross-model runs)
+ ## Deterministic behavior (examples)
 
- Behavior was evaluated using a fixed set of deterministic [demo scenarios](demos/).
+ LLMs interpret intent. Context Compiler enforces it.
 
- A run is considered a "pass" if the model output satisfies the scenario’s expected behavior.
+ **Explicit directive**
+ ```text
+ set premise concise replies
+ ```
+ - Base model: silently accepts / rewrites
+ - Context Compiler: applies a deterministic state update
 
- - Models tested: `llama3.1:8b`, `gpt-4o-mini`, `gpt-4.1`, `gpt-5`, `claude-sonnet-4`, `claude-opus-4`
- - Demo scenarios (all pass with compiler) cover ambiguity handling, constraint persistence, correction replacement, and tool governance.
- - Pass-rate summary: baseline (LLM only) `2–4 / 6`; with compiler `6 / 6`; with compiler + compaction `6 / 6`.
- - Context reduction in long conversations: up to `99%`
- - Prompt size reduction: about `50%`
- - [SWE curated results (compiler vs baseline)](evals/swe-bench/README.md) — cross-model evaluation on 6 tasks showing mostly positive deltas
+ **State-dependent operation**
+ ```text
+ clear state
+ use podman instead of docker
+ ```
+ - Base model: generic explanation
+ - Context Compiler: rejects (“No exact policy found for 'docker'…”)
+
+ **Lifecycle enforcement**
+ ```text
+ clear state
+ change premise to formal tone
+ ```
+ - Base model: conversational rewrite guidance
+ - Context Compiler: clarifies (“No premise exists yet…”)
 
  ---
 
@@ -183,8 +199,8 @@ Meaning:
  |---|---|
  | `create_engine(state=None)` | Create a new compiler engine; optional `state` provides initial authoritative state (validated/canonicalized). |
  | `step(user_input)` | Parse one user turn and return a deterministic `Decision`. |
- | `compile_transcript(messages)` | Replay a transcript from a fresh engine and return either final state or a confirmation prompt. |
- | `engine.apply_transcript(messages)` | Replay a transcript onto the current engine state and return either final state or a confirmation prompt. |
+ | `compile_transcript(messages: Transcript)` | Replay a transcript from a fresh engine and return either final state or a confirmation prompt. |
+ | `engine.apply_transcript(messages: Transcript)` | Replay a transcript onto the current engine state and return either final state or a confirmation prompt. |
  | `engine.state` | Read current authoritative in-memory state snapshot. |
  | `get_premise_value(state)` | Read the current premise value from a state snapshot. |
  | `get_policy_items(state, value=None)` | Read policy items from a state snapshot (all, `use`, or `prohibit`). |
@@ -252,27 +268,65 @@ For full directive grammar and edge-case behavior, see [DirectiveGrammarSpec.md]
 
  ---
 
- ## Conformance Fixtures
+ ## Guarantees
 
- Cross-language conformance tests are defined in [`tests/fixtures/`](tests/fixtures/).
+ - State changes only through explicit user directives or confirmation.
+ - Identical input sequences produce identical compiler state.
+ - Model responses never modify compiler state.
+ - Ambiguous directives trigger clarification instead of changing state.
+
+ These invariants are verified through behavioral tests and Hypothesis-based property tests.
 
  ---
 
- ## Advanced topics
+ ## Evidence
+
+ ### Behavioral correctness (key examples)
+
+ Concrete behavioral comparisons (base model vs compiler) are available here:
+
+ - [Open WebUI integration README](examples/integrations/openwebui/README.md)
+
+ These demonstrate deterministic clarification, state enforcement, and conflict handling.
+
+ ### Cross-model evaluation
+
+ - Models tested: `llama3.1:8b`, `gpt-4o-mini`, `gpt-4.1`, `gpt-5`, `claude-sonnet-4`, `claude-opus-4`
+ - Pass-rate summary: baseline (LLM only) `2–4 / 6`; with compiler `6 / 6`; with compiler + compaction `6 / 6`.
+
+ ### Efficiency
+
+ - Context reduction in long conversations: up to `99%`
+ - Prompt size reduction: about `50%`
+
+ ### Additional results
+
+ - [SWE curated results (compiler vs baseline)](evals/swe-bench/README.md) — cross-model evaluation on 6 tasks showing mostly positive deltas
 
- - [LLM preprocessor](docs/llm-preprocessor.md)
- - [Multiple engines](docs/multi-engine.md)
 
  ---
 
- ## Guarantees
 
- - State changes only through explicit user directives or confirmation.
- - Identical input sequences produce identical compiler state.
- - Model responses never modify compiler state.
- - Ambiguous directives trigger clarification instead of changing state.
+ ## Optional: LLM Preprocessor (Experimental)
 
- These invariants are verified through behavioral tests and Hypothesis-based property tests.
+ An optional host-side preprocessor can convert natural-language instructions
+ into canonical directives before compilation.
+
+ It is designed to be conservative and must be used with validation:
+
+ - heuristic-first, with LLM fallback when needed
+ - all outputs must be validated with `parse_precompiler_output(...)`
+ - raw outputs must not be passed directly to the compiler
+
+ See [LLM preprocessor](docs/llm-preprocessor.md) and
+ [`experimental/preprocessor/`](experimental/preprocessor/) for details.
+
+
+ ## Advanced topics
+
+ - [Multiple engines](docs/multi-engine.md)
+
+ For a full documentation map, see [docs/README.md](docs/README.md).
 
  ---
 
@@ -285,6 +339,12 @@ More detailed design and milestone documents are available in:
 
  ---
 
+ ## Conformance Fixtures
+
+ Cross-language conformance tests are defined in [`tests/fixtures/`](tests/fixtures/).
+
+ ---
+
  ## License
 
  Apache-2.0.
@@ -88,18 +88,32 @@ The host supplies the authoritative state to the model so the constraint persist
 
  ---
 
- ## Evidence (cross-model runs)
+ ## Deterministic behavior (examples)
 
- Behavior was evaluated using a fixed set of deterministic [demo scenarios](demos/).
+ LLMs interpret intent. Context Compiler enforces it.
 
- A run is considered a "pass" if the model output satisfies the scenario’s expected behavior.
+ **Explicit directive**
+ ```text
+ set premise concise replies
+ ```
+ - Base model: silently accepts / rewrites
+ - Context Compiler: applies a deterministic state update
 
- - Models tested: `llama3.1:8b`, `gpt-4o-mini`, `gpt-4.1`, `gpt-5`, `claude-sonnet-4`, `claude-opus-4`
- - Demo scenarios (all pass with compiler) cover ambiguity handling, constraint persistence, correction replacement, and tool governance.
- - Pass-rate summary: baseline (LLM only) `2–4 / 6`; with compiler `6 / 6`; with compiler + compaction `6 / 6`.
- - Context reduction in long conversations: up to `99%`
- - Prompt size reduction: about `50%`
- - [SWE curated results (compiler vs baseline)](evals/swe-bench/README.md) — cross-model evaluation on 6 tasks showing mostly positive deltas
+ **State-dependent operation**
+ ```text
+ clear state
+ use podman instead of docker
+ ```
+ - Base model: generic explanation
+ - Context Compiler: rejects (“No exact policy found for 'docker'…”)
+
+ **Lifecycle enforcement**
+ ```text
+ clear state
+ change premise to formal tone
+ ```
+ - Base model: conversational rewrite guidance
+ - Context Compiler: clarifies (“No premise exists yet…”)
 
  ---
 
@@ -153,8 +167,8 @@ Meaning:
  |---|---|
  | `create_engine(state=None)` | Create a new compiler engine; optional `state` provides initial authoritative state (validated/canonicalized). |
  | `step(user_input)` | Parse one user turn and return a deterministic `Decision`. |
- | `compile_transcript(messages)` | Replay a transcript from a fresh engine and return either final state or a confirmation prompt. |
- | `engine.apply_transcript(messages)` | Replay a transcript onto the current engine state and return either final state or a confirmation prompt. |
+ | `compile_transcript(messages: Transcript)` | Replay a transcript from a fresh engine and return either final state or a confirmation prompt. |
+ | `engine.apply_transcript(messages: Transcript)` | Replay a transcript onto the current engine state and return either final state or a confirmation prompt. |
  | `engine.state` | Read current authoritative in-memory state snapshot. |
  | `get_premise_value(state)` | Read the current premise value from a state snapshot. |
  | `get_policy_items(state, value=None)` | Read policy items from a state snapshot (all, `use`, or `prohibit`). |
@@ -222,27 +236,65 @@ For full directive grammar and edge-case behavior, see [DirectiveGrammarSpec.md]
 
  ---
 
- ## Conformance Fixtures
+ ## Guarantees
 
- Cross-language conformance tests are defined in [`tests/fixtures/`](tests/fixtures/).
+ - State changes only through explicit user directives or confirmation.
+ - Identical input sequences produce identical compiler state.
+ - Model responses never modify compiler state.
+ - Ambiguous directives trigger clarification instead of changing state.
+
+ These invariants are verified through behavioral tests and Hypothesis-based property tests.
 
  ---
 
- ## Advanced topics
+ ## Evidence
+
+ ### Behavioral correctness (key examples)
+
+ Concrete behavioral comparisons (base model vs compiler) are available here:
+
+ - [Open WebUI integration README](examples/integrations/openwebui/README.md)
+
+ These demonstrate deterministic clarification, state enforcement, and conflict handling.
+
+ ### Cross-model evaluation
+
+ - Models tested: `llama3.1:8b`, `gpt-4o-mini`, `gpt-4.1`, `gpt-5`, `claude-sonnet-4`, `claude-opus-4`
+ - Pass-rate summary: baseline (LLM only) `2–4 / 6`; with compiler `6 / 6`; with compiler + compaction `6 / 6`.
+
+ ### Efficiency
+
+ - Context reduction in long conversations: up to `99%`
+ - Prompt size reduction: about `50%`
+
+ ### Additional results
+
+ - [SWE curated results (compiler vs baseline)](evals/swe-bench/README.md) — cross-model evaluation on 6 tasks showing mostly positive deltas
 
- - [LLM preprocessor](docs/llm-preprocessor.md)
- - [Multiple engines](docs/multi-engine.md)
 
  ---
 
- ## Guarantees
 
- - State changes only through explicit user directives or confirmation.
- - Identical input sequences produce identical compiler state.
- - Model responses never modify compiler state.
- - Ambiguous directives trigger clarification instead of changing state.
+ ## Optional: LLM Preprocessor (Experimental)
 
- These invariants are verified through behavioral tests and Hypothesis-based property tests.
+ An optional host-side preprocessor can convert natural-language instructions
+ into canonical directives before compilation.
+
+ It is designed to be conservative and must be used with validation:
+
+ - heuristic-first, with LLM fallback when needed
+ - all outputs must be validated with `parse_precompiler_output(...)`
+ - raw outputs must not be passed directly to the compiler
+
+ See [LLM preprocessor](docs/llm-preprocessor.md) and
+ [`experimental/preprocessor/`](experimental/preprocessor/) for details.
+
+
+ ## Advanced topics
+
+ - [Multiple engines](docs/multi-engine.md)
+
+ For a full documentation map, see [docs/README.md](docs/README.md).
 
  ---
 
@@ -255,6 +307,12 @@ More detailed design and milestone documents are available in:
 
  ---
 
+ ## Conformance Fixtures
+
+ Cross-language conformance tests are defined in [`tests/fixtures/`](tests/fixtures/).
+
+ ---
+
  ## License
 
  Apache-2.0.
@@ -1,6 +1,6 @@
  """Demo 6: host-side prompt replacement from authoritative compiled state."""
 
- from context_compiler import compile_transcript, get_premise_value
+ from context_compiler import Transcript, compile_transcript, get_premise_value
  from demos.common import compact_user_turns, is_verbose, print_info_report
 
  DEMO_NAME = "06_context_compaction — superseded directives eliminated"
@@ -40,7 +40,7 @@ def _build_turns(turn_count: int) -> list[str]:
 
 
  def _compile_premise(turns: list[str]) -> str:
- messages: list[dict[str, object]] = [{"role": "user", "content": turn} for turn in turns]
+ messages: Transcript = [{"role": "user", "content": turn} for turn in turns]
  result = compile_transcript(messages)
  assert result["kind"] == "state"
  compiled_premise = get_premise_value(result["state"])
@@ -0,0 +1,20 @@
+ # Documentation Index
+
+ ## Start Here
+ - [Project README](../README.md)
+
+ ## Core Concepts
+ - [Directive Grammar](DirectiveGrammarSpec.md)
+
+ ## Integrations
+ - [Open WebUI integration](../examples/integrations/openwebui/README.md)
+
+ ## Preprocessor
+ - [LLM preprocessor](llm-preprocessor.md)
+
+ ## Evaluation & Evidence
+ - [Behavioral comparisons (Open WebUI)](../examples/integrations/openwebui/README.md)
+ - [SWE curated results](../evals/swe-bench/README.md)
+
+ ## Project Background
+ - [Description and Milestones](DescriptionAndMilestones.md)
@@ -0,0 +1,44 @@
+ # LLM Preprocessor (Optional, Experimental)
+
+ The experimental preprocessor is an optional host-side layer that can convert
+ natural-language messages into canonical Context Compiler directives before
+ compilation.
+
+ The compiler remains deterministic and authoritative. The preprocessor does not
+ replace core parsing or state semantics.
+
+ Install path for integrations using this layer:
+ `pip install "context-compiler[experimental]"`.
+
+ Integration runtimes must use installed-package imports/resources for this
+ layer. Do not rely on repo-relative preprocessor paths.
+
+ ## Required flow
+
+ Recommended conceptual flow:
+
+ 1. heuristic precompile
+ 2. validate candidate output
+ 3. LLM fallback precompile (only when needed)
+ 4. validate candidate output
+ 5. If a valid directive is produced, pass it to the compiler.
+ Otherwise pass the original input unchanged.
+
+ All preprocessor outputs, including heuristic outputs, must be validated with
+ `parse_precompiler_output(...)` before being applied.
+
+ Raw heuristic/LLM outputs must not be passed directly to the compiler.
+
+ ## Limits
+
+ The preprocessor is best-effort and intentionally conservative. Ambiguous,
+ reported, quoted, or mixed-intent inputs may still require abstention or host
+ clarification behavior.
+
+ ## Status
+
+ This preprocessor surface is experimental and may evolve independently of the
+ core engine.
+
+ For concrete module usage, prompt guidance, and integration details, see:
+ [`experimental/preprocessor/README.md`](../experimental/preprocessor/README.md).
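The five-step flow described in `docs/llm-preprocessor.md` can be sketched as a small host-side function. This is an illustration under stated assumptions, not the package's implementation: `preprocess_turn` and the three injected callables are hypothetical, and the real signatures of the heuristic precompiler, the LLM fallback, and `parse_precompiler_output(...)` may differ. Here each stage returns `None` to abstain, and the validator returns `None` to reject a candidate.

```python
from typing import Callable, Optional

# Hypothetical stand-ins supplied by the host; the real package's
# signatures (including parse_precompiler_output) may differ.
Precompile = Callable[[str], Optional[str]]
Validate = Callable[[str], Optional[str]]


def preprocess_turn(
    user_input: str,
    heuristic: Precompile,
    llm_fallback: Precompile,
    validate: Validate,
) -> str:
    """Return a validated directive, or the original input unchanged."""
    # Heuristic first; the LLM stage runs only if no valid directive exists yet.
    for precompile in (heuristic, llm_fallback):
        candidate = precompile(user_input)
        if candidate is None:
            continue  # this stage abstained; try the next one
        directive = validate(candidate)
        if directive is not None:
            return directive  # only validated output reaches the compiler
    return user_input  # nothing valid was produced: pass through untouched
```

Because every return path either goes through `validate` or hands back the original text, raw heuristic or LLM output never reaches the compiler in this sketch.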
@@ -0,0 +1,89 @@
+ # LiteLLM Proxy Additional Findings
+
+ Model: `ollama/qwen2.5:14b-instruct`
+
+ - Limitations/caveats:
+ - Confirm follow-up (`yes`) does not resolve the prior confirm in current replay-only proxy flow.
+ - Last-turn-only preprocessing can fail to persist earlier canonicalization effects across subsequent replay.
+ - Additional LiteLLM-surface behavior:
+ - Structured mixed-content user payloads can trigger upstream LiteLLM/Ollama message-shape validation errors.
+ - Structured text-part near-miss inputs still show a meaningful preprocessor lifecycle win over basic proxy.
+
+ ## Finding 1 — confirm follow-up loops (replay limitation)
+
+ **Prompt sequence**
+ 1. `clear state`
+ 2. `use podman instead of docker`
+ 3. `yes, keep existing policies and use podman`
+
+ **Vanilla**
+ - Step 2/3: generic Podman migration/help text.
+
+ **Basic proxy**
+ - Step 2: confirm clarify (`No exact policy found for "docker" ... Confirm to use "podman" ...`).
+ - Step 3: same confirm clarify repeats.
+
+ **Preprocessor proxy**
+ - Step 2: same confirm clarify.
+ - Step 3: same confirm clarify repeats.
+
+ **Why it matters**
+ Current replay-based proxy behavior does not treat natural-language “yes” as explicit confirm resolution, so this can loop until user supplies an explicit directive path.
+
+ ## Finding 2 — last-turn-only preprocessing is non-persistent across replay (replay limitation)
+
+ **Prompt sequence**
+ 1. `clear state`
+ 2. `set premise to concise replies`
+ 3. `Explain TCP in detail.`
+
+ **Vanilla**
+ - Conversationally accepts premise-like instruction, then gives normal long-form answer.
+
+ **Basic proxy**
+ - Step 2: syntax clarify (`Did you mean 'set premise concise replies'?`).
+ - Step 3: same syntax clarify repeats.
+
+ **Preprocessor proxy**
+ - Step 2: canonicalized update (`Premise set to concise replies ...`).
+ - Step 3: syntax clarify reappears (`Did you mean 'set premise concise replies'?`).
+
+ **Why it matters**
+ Only the latest replay turn is preprocessed; earlier raw near-miss text in transcript can still drive later replay outcomes.
+
+ ## Finding 3 — structured mixed content can fail upstream validation (LiteLLM-surface caveat)
+
+ **Prompt sequence**
+ 1. `clear state`
+ 2. user content parts: text (`set premise to concise replies`) + non-text (`input_image`)
+ 3. `What is TCP?`
+
+ **Vanilla**
+ - Upstream request fails with invalid user message shape error.
+
+ **Basic proxy**
+ - Blocks at compiler clarify before upstream model call.
+
+ **Preprocessor proxy**
+ - Step 2 hits upstream validation error path; later turn can return clarify.
+
+ **Why it matters**
+ In proxy mode, forwarded request messages remain unchanged; LiteLLM/Ollama payload validation behavior can dominate outcomes for mixed content shapes.
+
+ ## Finding 4 — structured text-part near-miss still yields stronger lifecycle result (LiteLLM-surface win)
+
+ **Prompt sequence**
+ 1. `clear state`
+ 2. user content text parts: `change premise` + `concise replies`
+
+ **Vanilla**
+ - Conversational acceptance of style change.
+
+ **Basic proxy**
+ - Syntax clarify only (`Did you mean 'change premise to concise replies'?`).
+
+ **Preprocessor proxy**
+ - Lifecycle clarify (`No premise exists yet. Use 'set premise ...' first.`).
+
+ **Why it matters**
+ For structured text-part inputs, preprocessor canonicalization can move past syntax-only clarify and reach the stronger lifecycle-semantic outcome.
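The replay effect reported in Finding 2 can be reproduced with a toy model. Everything in this sketch is an assumption made for illustration: `toy_canonicalize` and `toy_replay` are hypothetical, they recognise only `clear state` and `set premise ...`, and they are not the engine's or the proxy hook's actual code. The sketch assumes replay stops at the first clarify, as the transcript fixture names suggest.

```python
from typing import Optional

PREFIX_NEAR_MISS = "set premise to "
PREFIX_CANONICAL = "set premise "


def toy_canonicalize(turn: str) -> str:
    """Toy preprocessor: rewrite the near-miss form into the canonical one."""
    if turn.startswith(PREFIX_NEAR_MISS):
        return PREFIX_CANONICAL + turn[len(PREFIX_NEAR_MISS):]
    return turn


def toy_replay(turns: list[str], preprocess_last_only: bool) -> tuple[str, Optional[str]]:
    """Toy replay: ("clarify", prompt) at the first near-miss, else ("state", premise)."""
    premise: Optional[str] = None
    last = len(turns) - 1
    for index, turn in enumerate(turns):
        # Last-turn-only mode leaves earlier turns as raw transcript text.
        if not preprocess_last_only or index == last:
            turn = toy_canonicalize(turn)
        if turn == "clear state":
            premise = None
        elif turn.startswith(PREFIX_NEAR_MISS):
            return ("clarify", f"Did you mean '{toy_canonicalize(turn)}'?")
        elif turn.startswith(PREFIX_CANONICAL):
            premise = turn[len(PREFIX_CANONICAL):]
    return ("state", premise)
```

With the two-turn transcript the near-miss is the last turn, so it is canonicalized and the premise is set. Once a third turn is appended, the near-miss becomes an earlier raw turn and the toy replay returns the syntax clarify again, matching the step 3 behaviour described for the preprocessor proxy. Preprocessing every turn (`preprocess_last_only=False`) avoids this in the toy model.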