PyPI - context-compiler - Versions diffs - 0.5.2__tar.gz → 0.6.2__tar.gz - Mend

context-compiler 0.5.2tar.gz → 0.6.2tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (122)

{context_compiler-0.5.2 → context_compiler-0.6.2}/.github/workflows/ci.yml RENAMED

@@ -34,8 +34,11 @@ jobs:
       - name: Ruff (format)
         run: uv run ruff format --check .
+      - name: Install mypy integration dependencies
+        run: uv pip install --python .venv/bin/python litellm pydantic
       - name: Mypy
-        run: uv run mypy src
+        run: uv run mypy src examples evals/swe-bench demos
       - name: Install Hypothesis for tests
         run: uv pip install --python .venv/bin/python hypothesis

{context_compiler-0.5.2 → context_compiler-0.6.2}/.github/workflows/publish-pypi.yml RENAMED

@@ -3,13 +3,41 @@ name: Publish to PyPI
 on:
   release:
     types: [published]
-  workflow_dispatch:
 permissions:
   contents: read
   id-token: write
 jobs:
+  stress-tests:
+    name: Stress tests (release gate)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: "3.12"
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+        with:
+          enable-cache: true
+      - name: Install dev and demos dependencies
+        run: uv sync --extra dev --extra demos
+      - name: Run pytest stress loop
+        shell: bash
+        run: |
+          loops="10"
+          for i in $(seq 1 "$loops"); do
+            echo "== stress run $i/$loops =="
+            uv run pytest -q
+          done
   build:
     name: Build distributions
     runs-on: ubuntu-latest
@@ -22,11 +50,13 @@ jobs:
         with:
           python-version: "3.12"
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+        with:
+          enable-cache: true
       - name: Build sdist and wheel
-        run: |
-          python -m pip install --upgrade pip
-          python -m pip install build
-          python -m build
+        run: uv run --with build python -m build
       - name: Show built distributions
         run: |
@@ -41,7 +71,7 @@ jobs:
   publish:
     name: Publish to PyPI
-    needs: build
+    needs: [build, stress-tests]
     runs-on: ubuntu-latest
     environment:
       name: pypi

context_compiler-0.6.2/.github/workflows/stress-tests.yml ADDED

@@ -0,0 +1,47 @@
+name: Stress Tests
+on:
+  workflow_dispatch:
+    inputs:
+      stress_loops:
+        description: Number of full pytest stress loops
+        required: false
+        default: "10"
+  schedule:
+    - cron: "0 3 * * *"
+jobs:
+  stress-tests:
+    name: Stress tests
+    runs-on: ubuntu-latest
+    env:
+      STRESS_LOOPS: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.stress_loops || '10' }}
+    steps:
+      - uses: actions/checkout@v6
+      - name: Set up Python
+        uses: actions/setup-python@v6
+        with:
+          python-version: "3.12"
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+        with:
+          enable-cache: true
+      - name: Install dev and demos dependencies
+        run: uv sync --extra dev --extra demos
+      - name: Run pytest stress loop
+        shell: bash
+        run: |
+          loops="${STRESS_LOOPS}"
+          if ! [[ "$loops" =~ ^[1-9][0-9]*$ ]]; then
+            echo "Invalid stress loop count: $loops"
+            exit 1
+          fi
+          for i in $(seq 1 "$loops"); do
+            echo "== stress run $i/$loops =="
+            uv run pytest -q
+          done
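
The validation and loop logic of the `Run pytest stress loop` step above can be sketched in Python for local experimentation. This is a minimal illustration, not part of the package: the names `parse_loop_count` and `run_stress_loops` are hypothetical, and stopping at the first failing run is an assumption about how the bash step behaves when the shell runs in fail-fast mode.

```python
import re
import subprocess
import sys

# Same pattern as the workflow's bash check: a positive integer with no leading zero.
_LOOPS_PATTERN = re.compile(r"^[1-9][0-9]*$")


def parse_loop_count(raw: str) -> int:
    """Return the loop count, or raise ValueError for anything else (hypothetical helper)."""
    if not _LOOPS_PATTERN.fullmatch(raw):
        raise ValueError(f"Invalid stress loop count: {raw}")
    return int(raw)


def run_stress_loops(raw_loops: str, command: list[str]) -> int:
    """Run `command` once per loop and return the first non-zero exit code, else 0."""
    loops = parse_loop_count(raw_loops)
    for i in range(1, loops + 1):
        print(f"== stress run {i}/{loops} ==")
        completed = subprocess.run(command, check=False)
        if completed.returncode != 0:
            # Assumption: a failing run aborts the remaining loops.
            return completed.returncode
    return 0


if __name__ == "__main__":
    # Stand-in for `uv run pytest -q`; replace with the real command locally.
    sys.exit(run_stress_loops("2", [sys.executable, "-c", "pass"]))
```

Rejecting `0`, negative numbers, and non-numeric input before the loop starts means a mistyped `stress_loops` input fails fast instead of silently running zero iterations.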

{context_compiler-0.5.2 → context_compiler-0.6.2}/.pre-commit-config.yaml RENAMED

@@ -14,3 +14,5 @@ repos:
         args: [--pretty]
         additional_dependencies:
           - hypothesis
+          - litellm
+          - pydantic

{context_compiler-0.5.2 → context_compiler-0.6.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: context-compiler
-Version: 0.5.2
+Version: 0.6.2
 Summary: Deterministic conversational state engine for LLM applications.
 Project-URL: Homepage, https://github.com/rlippmann/context-compiler
 Project-URL: Repository, https://github.com/rlippmann/context-compiler
@@ -26,6 +26,8 @@ Requires-Dist: pre-commit; extra == 'dev'
 Requires-Dist: pytest; extra == 'dev'
 Requires-Dist: pytest-cov; extra == 'dev'
 Requires-Dist: ruff<1.0,>=0.12; extra == 'dev'
+Provides-Extra: experimental
+Requires-Dist: litellm>=1.0.0; extra == 'experimental'
 Description-Content-Type: text/markdown
@@ -118,18 +120,32 @@ The host supplies the authoritative state to the model so the constraint persist
 ---
-## Evidence (cross-model runs)
+## Deterministic behavior (examples)
-Behavior was evaluated using a fixed set of deterministic [demo scenarios](demos/).
+LLMs interpret intent. Context Compiler enforces it.
-A run is considered a "pass" if the model output satisfies the scenario’s expected behavior.
+**Explicit directive**
+```text
+set premise concise replies
+```
+- Base model: silently accepts / rewrites
+- Context Compiler: applies a deterministic state update
-- Models tested: `llama3.1:8b`, `gpt-4o-mini`, `gpt-4.1`, `gpt-5`, `claude-sonnet-4`, `claude-opus-4`
-- Demo scenarios (all pass with compiler) cover ambiguity handling, constraint persistence, correction replacement, and tool governance.
-- Pass-rate summary: baseline (LLM only) `2–4 / 6`; with compiler `6 / 6`; with compiler + compaction `6 / 6`.
-- Context reduction in long conversations: up to `99%`
-- Prompt size reduction: about `50%`
-- [SWE curated results (compiler vs baseline)](evals/swe-bench/README.md) — cross-model evaluation on 6 tasks showing mostly positive deltas
+**State-dependent operation**
+```text
+clear state
+use podman instead of docker
+```
+- Base model: generic explanation
+- Context Compiler: rejects (“No exact policy found for 'docker'…”)
+**Lifecycle enforcement**
+```text
+clear state
+change premise to formal tone
+```
+- Base model: conversational rewrite guidance
+- Context Compiler: clarifies (“No premise exists yet…”)
 ---
@@ -183,8 +199,8 @@ Meaning:
 |---|---|
 | `create_engine(state=None)` | Create a new compiler engine; optional `state` provides initial authoritative state (validated/canonicalized). |
 | `step(user_input)` | Parse one user turn and return a deterministic `Decision`. |
-| `compile_transcript(messages)` | Replay a transcript from a fresh engine and return either final state or a confirmation prompt. |
-| `engine.apply_transcript(messages)` | Replay a transcript onto the current engine state and return either final state or a confirmation prompt. |
+| `compile_transcript(messages: Transcript)` | Replay a transcript from a fresh engine and return either final state or a confirmation prompt. |
+| `engine.apply_transcript(messages: Transcript)` | Replay a transcript onto the current engine state and return either final state or a confirmation prompt. |
 | `engine.state` | Read current authoritative in-memory state snapshot. |
 | `get_premise_value(state)` | Read the current premise value from a state snapshot. |
 | `get_policy_items(state, value=None)` | Read policy items from a state snapshot (all, `use`, or `prohibit`). |
@@ -252,27 +268,65 @@ For full directive grammar and edge-case behavior, see [DirectiveGrammarSpec.md]
 ---
-## Conformance Fixtures
+## Guarantees
-Cross-language conformance tests are defined in [`tests/fixtures/`](tests/fixtures/).
+- State changes only through explicit user directives or confirmation.
+- Identical input sequences produce identical compiler state.
+- Model responses never modify compiler state.
+- Ambiguous directives trigger clarification instead of changing state.
+These invariants are verified through behavioral tests and Hypothesis-based property tests.
 ---
-## Advanced topics
+## Evidence
+### Behavioral correctness (key examples)
+Concrete behavioral comparisons (base model vs compiler) are available here:
+- [Open WebUI integration README](examples/integrations/openwebui/README.md)
+These demonstrate deterministic clarification, state enforcement, and conflict handling.
+### Cross-model evaluation
+- Models tested: `llama3.1:8b`, `gpt-4o-mini`, `gpt-4.1`, `gpt-5`, `claude-sonnet-4`, `claude-opus-4`
+- Pass-rate summary: baseline (LLM only) `2–4 / 6`; with compiler `6 / 6`; with compiler + compaction `6 / 6`.
+### Efficiency
+- Context reduction in long conversations: up to `99%`
+- Prompt size reduction: about `50%`
+### Additional results
+- [SWE curated results (compiler vs baseline)](evals/swe-bench/README.md) — cross-model evaluation on 6 tasks showing mostly positive deltas
-- [LLM preprocessor](docs/llm-preprocessor.md)
-- [Multiple engines](docs/multi-engine.md)
 ---
-## Guarantees
-- State changes only through explicit user directives or confirmation.
-- Identical input sequences produce identical compiler state.
-- Model responses never modify compiler state.
-- Ambiguous directives trigger clarification instead of changing state.
+## Optional: LLM Preprocessor (Experimental)
-These invariants are verified through behavioral tests and Hypothesis-based property tests.
+An optional host-side preprocessor can convert natural-language instructions
+into canonical directives before compilation.
+It is designed to be conservative and must be used with validation:
+- heuristic-first, with LLM fallback when needed
+- all outputs must be validated with `parse_precompiler_output(...)`
+- raw outputs must not be passed directly to the compiler
+See [LLM preprocessor](docs/llm-preprocessor.md) and
+[`experimental/preprocessor/`](experimental/preprocessor/) for details.
+## Advanced topics
+- [Multiple engines](docs/multi-engine.md)
+For a full documentation map, see [docs/README.md](docs/README.md).
 ---
@@ -285,6 +339,12 @@ More detailed design and milestone documents are available in:
 ---
+## Conformance Fixtures
+Cross-language conformance tests are defined in [`tests/fixtures/`](tests/fixtures/).
+---
 ## License
 Apache-2.0.

{context_compiler-0.5.2 → context_compiler-0.6.2}/README.md RENAMED

@@ -88,18 +88,32 @@ The host supplies the authoritative state to the model so the constraint persist
 ---
-## Evidence (cross-model runs)
+## Deterministic behavior (examples)
-Behavior was evaluated using a fixed set of deterministic [demo scenarios](demos/).
+LLMs interpret intent. Context Compiler enforces it.
-A run is considered a "pass" if the model output satisfies the scenario’s expected behavior.
+**Explicit directive**
+```text
+set premise concise replies
+```
+- Base model: silently accepts / rewrites
+- Context Compiler: applies a deterministic state update
-- Models tested: `llama3.1:8b`, `gpt-4o-mini`, `gpt-4.1`, `gpt-5`, `claude-sonnet-4`, `claude-opus-4`
-- Demo scenarios (all pass with compiler) cover ambiguity handling, constraint persistence, correction replacement, and tool governance.
-- Pass-rate summary: baseline (LLM only) `2–4 / 6`; with compiler `6 / 6`; with compiler + compaction `6 / 6`.
-- Context reduction in long conversations: up to `99%`
-- Prompt size reduction: about `50%`
-- [SWE curated results (compiler vs baseline)](evals/swe-bench/README.md) — cross-model evaluation on 6 tasks showing mostly positive deltas
+**State-dependent operation**
+```text
+clear state
+use podman instead of docker
+```
+- Base model: generic explanation
+- Context Compiler: rejects (“No exact policy found for 'docker'…”)
+**Lifecycle enforcement**
+```text
+clear state
+change premise to formal tone
+```
+- Base model: conversational rewrite guidance
+- Context Compiler: clarifies (“No premise exists yet…”)
 ---
@@ -153,8 +167,8 @@ Meaning:
 |---|---|
 | `create_engine(state=None)` | Create a new compiler engine; optional `state` provides initial authoritative state (validated/canonicalized). |
 | `step(user_input)` | Parse one user turn and return a deterministic `Decision`. |
-| `compile_transcript(messages)` | Replay a transcript from a fresh engine and return either final state or a confirmation prompt. |
-| `engine.apply_transcript(messages)` | Replay a transcript onto the current engine state and return either final state or a confirmation prompt. |
+| `compile_transcript(messages: Transcript)` | Replay a transcript from a fresh engine and return either final state or a confirmation prompt. |
+| `engine.apply_transcript(messages: Transcript)` | Replay a transcript onto the current engine state and return either final state or a confirmation prompt. |
 | `engine.state` | Read current authoritative in-memory state snapshot. |
 | `get_premise_value(state)` | Read the current premise value from a state snapshot. |
 | `get_policy_items(state, value=None)` | Read policy items from a state snapshot (all, `use`, or `prohibit`). |
@@ -222,27 +236,65 @@ For full directive grammar and edge-case behavior, see [DirectiveGrammarSpec.md]
 ---
-## Conformance Fixtures
+## Guarantees
-Cross-language conformance tests are defined in [`tests/fixtures/`](tests/fixtures/).
+- State changes only through explicit user directives or confirmation.
+- Identical input sequences produce identical compiler state.
+- Model responses never modify compiler state.
+- Ambiguous directives trigger clarification instead of changing state.
+These invariants are verified through behavioral tests and Hypothesis-based property tests.
 ---
-## Advanced topics
+## Evidence
+### Behavioral correctness (key examples)
+Concrete behavioral comparisons (base model vs compiler) are available here:
+- [Open WebUI integration README](examples/integrations/openwebui/README.md)
+These demonstrate deterministic clarification, state enforcement, and conflict handling.
+### Cross-model evaluation
+- Models tested: `llama3.1:8b`, `gpt-4o-mini`, `gpt-4.1`, `gpt-5`, `claude-sonnet-4`, `claude-opus-4`
+- Pass-rate summary: baseline (LLM only) `2–4 / 6`; with compiler `6 / 6`; with compiler + compaction `6 / 6`.
+### Efficiency
+- Context reduction in long conversations: up to `99%`
+- Prompt size reduction: about `50%`
+### Additional results
+- [SWE curated results (compiler vs baseline)](evals/swe-bench/README.md) — cross-model evaluation on 6 tasks showing mostly positive deltas
-- [LLM preprocessor](docs/llm-preprocessor.md)
-- [Multiple engines](docs/multi-engine.md)
 ---
-## Guarantees
-- State changes only through explicit user directives or confirmation.
-- Identical input sequences produce identical compiler state.
-- Model responses never modify compiler state.
-- Ambiguous directives trigger clarification instead of changing state.
+## Optional: LLM Preprocessor (Experimental)
-These invariants are verified through behavioral tests and Hypothesis-based property tests.
+An optional host-side preprocessor can convert natural-language instructions
+into canonical directives before compilation.
+It is designed to be conservative and must be used with validation:
+- heuristic-first, with LLM fallback when needed
+- all outputs must be validated with `parse_precompiler_output(...)`
+- raw outputs must not be passed directly to the compiler
+See [LLM preprocessor](docs/llm-preprocessor.md) and
+[`experimental/preprocessor/`](experimental/preprocessor/) for details.
+## Advanced topics
+- [Multiple engines](docs/multi-engine.md)
+For a full documentation map, see [docs/README.md](docs/README.md).
 ---
@@ -255,6 +307,12 @@ More detailed design and milestone documents are available in:
 ---
+## Conformance Fixtures
+Cross-language conformance tests are defined in [`tests/fixtures/`](tests/fixtures/).
+---
 ## License
 Apache-2.0.

{context_compiler-0.5.2 → context_compiler-0.6.2}/demos/06_llm_context_compaction.py RENAMED

@@ -1,6 +1,6 @@
 """Demo 6: host-side prompt replacement from authoritative compiled state."""
-from context_compiler import compile_transcript, get_premise_value
+from context_compiler import Transcript, compile_transcript, get_premise_value
 from demos.common import compact_user_turns, is_verbose, print_info_report
 DEMO_NAME = "06_context_compaction — superseded directives eliminated"
@@ -40,7 +40,7 @@ def _build_turns(turn_count: int) -> list[str]:
 def _compile_premise(turns: list[str]) -> str:
-    messages: list[dict[str, object]] = [{"role": "user", "content": turn} for turn in turns]
+    messages: Transcript = [{"role": "user", "content": turn} for turn in turns]
     result = compile_transcript(messages)
     assert result["kind"] == "state"
     compiled_premise = get_premise_value(result["state"])

context_compiler-0.6.2/docs/README.md ADDED

@@ -0,0 +1,20 @@
+# Documentation Index
+## Start Here
+- [Project README](../README.md)
+## Core Concepts
+- [Directive Grammar](DirectiveGrammarSpec.md)
+## Integrations
+- [Open WebUI integration](../examples/integrations/openwebui/README.md)
+## Preprocessor
+- [LLM preprocessor](llm-preprocessor.md)
+## Evaluation & Evidence
+- [Behavioral comparisons (Open WebUI)](../examples/integrations/openwebui/README.md)
+- [SWE curated results](../evals/swe-bench/README.md)
+## Project Background
+- [Description and Milestones](DescriptionAndMilestones.md)

context_compiler-0.6.2/docs/llm-preprocessor.md ADDED

@@ -0,0 +1,44 @@
+# LLM Preprocessor (Optional, Experimental)
+The experimental preprocessor is an optional host-side layer that can convert
+natural-language messages into canonical Context Compiler directives before
+compilation.
+The compiler remains deterministic and authoritative. The preprocessor does not
+replace core parsing or state semantics.
+Install path for integrations using this layer:
+`pip install "context-compiler[experimental]"`.
+Integration runtimes must use installed-package imports/resources for this
+layer. Do not rely on repo-relative preprocessor paths.
+## Required flow
+Recommended conceptual flow:
+1. heuristic precompile
+2. validate candidate output
+3. LLM fallback precompile (only when needed)
+4. validate candidate output
+5. If a valid directive is produced, pass it to the compiler.
+   Otherwise pass the original input unchanged.
+All preprocessor outputs, including heuristic outputs, must be validated with
+`parse_precompiler_output(...)` before being applied.
+Raw heuristic/LLM outputs must not be passed directly to the compiler.
+## Limits
+The preprocessor is best-effort and intentionally conservative. Ambiguous,
+reported, quoted, or mixed-intent inputs may still require abstention or host
+clarification behavior.
+## Status
+This preprocessor surface is experimental and may evolve independently of the
+core engine.
+For concrete module usage, prompt guidance, and integration details, see:
+[`experimental/preprocessor/README.md`](../experimental/preprocessor/README.md).
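
The five-step flow described in `docs/llm-preprocessor.md` can be sketched as a small host-side function. This is an illustration under stated assumptions, not the package's API: `preprocess_turn` and the three callables are hypothetical stand-ins, and the `validate` parameter only plays the role of `parse_precompiler_output(...)`, whose real signature is not shown in this diff.

```python
from typing import Callable, Optional

# Hypothetical callable shapes; the real preprocessor modules may differ.
Precompiler = Callable[[str], Optional[str]]  # returns a candidate directive or None
Validator = Callable[[str], Optional[str]]    # returns a validated directive or None


def preprocess_turn(
    user_input: str,
    heuristic: Precompiler,
    llm_fallback: Precompiler,
    validate: Validator,
) -> str:
    """Return a validated directive, or the original input unchanged."""
    # Steps 1-2: heuristic precompile, then validate the candidate.
    candidate = heuristic(user_input)
    if candidate is not None:
        directive = validate(candidate)
        if directive is not None:
            return directive

    # Steps 3-4: LLM fallback only when the heuristic path produced nothing valid.
    candidate = llm_fallback(user_input)
    if candidate is not None:
        directive = validate(candidate)
        if directive is not None:
            return directive

    # Step 5: no valid directive, so pass the original input through untouched.
    return user_input


# Toy stand-ins for demonstration only.
def toy_heuristic(text: str) -> Optional[str]:
    prefix = "set premise to "
    return "set premise " + text[len(prefix):] if text.startswith(prefix) else None


def toy_validate(candidate: str) -> Optional[str]:
    return candidate if candidate.startswith("set premise ") else None


print(preprocess_turn("set premise to concise replies", toy_heuristic, lambda t: None, toy_validate))
```

Because every candidate passes through `validate` before it is returned, neither the heuristic nor the LLM output can reach the compiler in raw form, which is the constraint the document states.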

context_compiler-0.6.2/evals/litellm_proxy_additional_findings.md ADDED

@@ -0,0 +1,89 @@
+# LiteLLM Proxy Additional Findings
+Model: `ollama/qwen2.5:14b-instruct`
+- Limitations/caveats:
+  - Confirm follow-up (`yes`) does not resolve the prior confirm in current replay-only proxy flow.
+  - Last-turn-only preprocessing can fail to persist earlier canonicalization effects across subsequent replay.
+- Additional LiteLLM-surface behavior:
+  - Structured mixed-content user payloads can trigger upstream LiteLLM/Ollama message-shape validation errors.
+  - Structured text-part near-miss inputs still show a meaningful preprocessor lifecycle win over basic proxy.
+## Finding 1 — confirm follow-up loops (replay limitation)
+**Prompt sequence**
+1. `clear state`
+2. `use podman instead of docker`
+3. `yes, keep existing policies and use podman`
+**Vanilla**
+- Step 2/3: generic Podman migration/help text.
+**Basic proxy**
+- Step 2: confirm clarify (`No exact policy found for "docker" ... Confirm to use "podman" ...`).
+- Step 3: same confirm clarify repeats.
+**Preprocessor proxy**
+- Step 2: same confirm clarify.
+- Step 3: same confirm clarify repeats.
+**Why it matters**
+Current replay-based proxy behavior does not treat natural-language “yes” as explicit confirm resolution, so this can loop until user supplies an explicit directive path.
+## Finding 2 — last-turn-only preprocessing is non-persistent across replay (replay limitation)
+**Prompt sequence**
+1. `clear state`
+2. `set premise to concise replies`
+3. `Explain TCP in detail.`
+**Vanilla**
+- Conversationally accepts premise-like instruction, then gives normal long-form answer.
+**Basic proxy**
+- Step 2: syntax clarify (`Did you mean 'set premise concise replies'?`).
+- Step 3: same syntax clarify repeats.
+**Preprocessor proxy**
+- Step 2: canonicalized update (`Premise set to concise replies ...`).
+- Step 3: syntax clarify reappears (`Did you mean 'set premise concise replies'?`).
+**Why it matters**
+Only the latest replay turn is preprocessed; earlier raw near-miss text in transcript can still drive later replay outcomes.
+## Finding 3 — structured mixed content can fail upstream validation (LiteLLM-surface caveat)
+**Prompt sequence**
+1. `clear state`
+2. user content parts: text (`set premise to concise replies`) + non-text (`input_image`)
+3. `What is TCP?`
+**Vanilla**
+- Upstream request fails with invalid user message shape error.
+**Basic proxy**
+- Blocks at compiler clarify before upstream model call.
+**Preprocessor proxy**
+- Step 2 hits upstream validation error path; later turn can return clarify.
+**Why it matters**
+In proxy mode, forwarded request messages remain unchanged; LiteLLM/Ollama payload validation behavior can dominate outcomes for mixed content shapes.
+## Finding 4 — structured text-part near-miss still yields stronger lifecycle result (LiteLLM-surface win)
+**Prompt sequence**
+1. `clear state`
+2. user content text parts: `change premise` + `concise replies`
+**Vanilla**
+- Conversational acceptance of style change.
+**Basic proxy**
+- Syntax clarify only (`Did you mean 'change premise to concise replies'?`).
+**Preprocessor proxy**
+- Lifecycle clarify (`No premise exists yet. Use 'set premise ...' first.`).
+**Why it matters**
+For structured text-part inputs, preprocessor canonicalization can move past syntax-only clarify and reach the stronger lifecycle-semantic outcome.
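
Finding 2 in the file above (last-turn-only preprocessing) can be reproduced with a toy model. Everything in this sketch is hypothetical: `toy_canonicalize` and `toy_replay` are simplified stand-ins, not the real preprocessor or compiler, and `replay_all_turns` is shown only as a contrast, not as behavior the findings attribute to the proxy.

```python
NEAR_MISS = "set premise to "
CANONICAL = "set premise "


def toy_canonicalize(turn: str) -> str:
    """Hypothetical preprocessor: rewrite 'set premise to X' as 'set premise X'."""
    if turn.startswith(NEAR_MISS):
        return CANONICAL + turn[len(NEAR_MISS):]
    return turn


def toy_replay(turns: list[str]) -> str:
    """Hypothetical replay from a fresh state; the first near-miss turn yields a clarify."""
    premise = None
    for turn in turns:
        if turn.startswith(NEAR_MISS):
            return f"clarify: Did you mean '{toy_canonicalize(turn)}'?"
        if turn == "clear state":
            premise = None
        elif turn.startswith(CANONICAL):
            premise = turn[len(CANONICAL):]
    return f"state: premise={premise}"


def replay_last_turn_only(raw_turns: list[str]) -> str:
    """Preprocess only the newest turn; earlier raw turns are replayed as-is."""
    return toy_replay(raw_turns[:-1] + [toy_canonicalize(raw_turns[-1])])


def replay_all_turns(raw_turns: list[str]) -> str:
    """Contrast case: preprocess every turn before replay."""
    return toy_replay([toy_canonicalize(t) for t in raw_turns])


history = ["clear state", "set premise to concise replies", "Explain TCP in detail."]
print(replay_last_turn_only(history[:2]))  # step 2: the near-miss is the last turn
print(replay_last_turn_only(history))      # step 3: the near-miss is now an earlier raw turn
print(replay_all_turns(history))
```

In this toy model the clarify reappears at step 3 because the raw near-miss text is still in the transcript when it is replayed, which mirrors the sequence the finding reports.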
