llmsessioncontract 0.2.0__tar.gz → 0.2.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (24)
  1. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/PKG-INFO +47 -1
  2. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/README.md +46 -0
  3. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/__init__.py +4 -1
  4. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/monitor/automaton.py +48 -5
  5. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/monitor/monitor.py +39 -2
  6. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/PKG-INFO +47 -1
  7. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/pyproject.toml +1 -1
  8. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/LICENSE +0 -0
  9. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/dsl/__init__.py +0 -0
  10. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/dsl/ast.py +0 -0
  11. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/dsl/parser.py +0 -0
  12. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/__init__.py +0 -0
  13. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/client.py +0 -0
  14. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/exceptions.py +0 -0
  15. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/langfuse.py +0 -0
  16. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/middleware.py +0 -0
  17. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/types.py +0 -0
  18. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/monitor/__init__.py +0 -0
  19. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/py.typed +0 -0
  20. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/SOURCES.txt +0 -0
  21. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/dependency_links.txt +0 -0
  22. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/requires.txt +0 -0
  23. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/top_level.txt +0 -0
  24. {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: llmsessioncontract
- Version: 0.2.0
+ Version: 0.2.2
  Summary: Runtime monitor for LLM agent interaction protocols based on session type theory
  Author-email: Chris Bartolo Burlo <chris@mizziburlo.com>
  License-Expression: MIT
@@ -111,6 +111,33 @@ for _ in range(100):
  m.receive("Pong") # Ok()
  ```

+ ### Handling natural-language input: `Unrecognized`
+
+ When the projection layer (typically over user chat) can't classify an event into a known label, it can emit the sentinel `UNRECOGNIZED` instead. The monitor treats this as a soft signal — distinct from `Violation` — without halting or advancing state, so the outer loop can drive a clarification turn:
+
+ ```python
+ from llmcontract import Monitor, Ok, Unrecognized, UNRECOGNIZED
+
+ m = Monitor("?{Yes.end, No.end}")
+ result = m.receive(UNRECOGNIZED) # projection couldn't decide
+ assert isinstance(result, Unrecognized) # not a Violation
+ # state preserved; ask the agent to ask the user to clarify, then:
+ m.receive("Yes") # Ok()
+ ```
+
+ A protocol can also handle `Unrecognized` *explicitly* as a first-class branch — useful for "ask again" loops:
+
+ ```python
+ protocol = "rec Loop.!Ask.?{Yes.end, No.end, Unrecognized.Loop}"
+ m = Monitor(protocol)
+ m.send("Ask")
+ m.receive(UNRECOGNIZED) # Ok — protocol routes back to Loop
+ m.send("Ask")
+ m.receive("Yes") # Ok — terminal
+ ```
+
+ The distinction matters at the system boundary: `Violation` means the agent broke the rules; `Unrecognized` means we don't have enough information to decide yet. Different responses (halt vs. clarify) come naturally from the typed result.
+
  ## Integration Layer

  For real agent loops, `llmcontract` provides a client wrapper and tool middleware that share a single monitor — so the full interaction is tracked automatically.
@@ -236,6 +263,25 @@ Each step appears as a guardrail observation in your Langfuse trace with:
  - **Output**: `passed: true/false`, violation details if applicable
  - **Score**: `protocol_compliance` (boolean) for filtering and analytics

+ ## Claude Code Plugin
+
+ A Claude Code plugin ships with this repo: **protocol-builder** walks you through designing a session-type protocol conversationally, validates it as you go, and emits a ready-to-paste Python integration snippet.
+
+ ```bash
+ # Install in Claude Code
+ /plugin marketplace add chrisbartoloburlo/llmcontract
+ /plugin install protocol-builder@llmcontract
+
+ # Then in any conversation
+ /protocol-builder
+ ```
+
+ The skill validates each draft DSL against `llmcontract`'s parser, so anything it produces is guaranteed to load with `Monitor(...)`. Source lives under `skills/protocol-builder/`.
+
+ ## Case Studies
+
+ - **[`llmcontract-tau2`](https://github.com/chrisbartoloburlo/llmcontract-tau2)** — Standalone replay of [tau2-bench](https://github.com/sierra-research/tau2-bench)'s shipped trajectories through `Monitor`. Headline: 11/1755 (0.6%) of trajectories that tau2 scored as passing violate the documented "obtain user confirmation before mutating the database" policy. Discussion upstream: [tau2-bench#298](https://github.com/sierra-research/tau2-bench/issues/298).
+
  ## Research

  This work is based on the theory developed in:
@@ -82,6 +82,33 @@ for _ in range(100):
  m.receive("Pong") # Ok()
  ```

+ ### Handling natural-language input: `Unrecognized`
+
+ When the projection layer (typically over user chat) can't classify an event into a known label, it can emit the sentinel `UNRECOGNIZED` instead. The monitor treats this as a soft signal — distinct from `Violation` — without halting or advancing state, so the outer loop can drive a clarification turn:
+
+ ```python
+ from llmcontract import Monitor, Ok, Unrecognized, UNRECOGNIZED
+
+ m = Monitor("?{Yes.end, No.end}")
+ result = m.receive(UNRECOGNIZED) # projection couldn't decide
+ assert isinstance(result, Unrecognized) # not a Violation
+ # state preserved; ask the agent to ask the user to clarify, then:
+ m.receive("Yes") # Ok()
+ ```
+
+ A protocol can also handle `Unrecognized` *explicitly* as a first-class branch — useful for "ask again" loops:
+
+ ```python
+ protocol = "rec Loop.!Ask.?{Yes.end, No.end, Unrecognized.Loop}"
+ m = Monitor(protocol)
+ m.send("Ask")
+ m.receive(UNRECOGNIZED) # Ok — protocol routes back to Loop
+ m.send("Ask")
+ m.receive("Yes") # Ok — terminal
+ ```
+
+ The distinction matters at the system boundary: `Violation` means the agent broke the rules; `Unrecognized` means we don't have enough information to decide yet. Different responses (halt vs. clarify) come naturally from the typed result.
+
  ## Integration Layer

  For real agent loops, `llmcontract` provides a client wrapper and tool middleware that share a single monitor — so the full interaction is tracked automatically.
@@ -207,6 +234,25 @@ Each step appears as a guardrail observation in your Langfuse trace with:
  - **Output**: `passed: true/false`, violation details if applicable
  - **Score**: `protocol_compliance` (boolean) for filtering and analytics

+ ## Claude Code Plugin
+
+ A Claude Code plugin ships with this repo: **protocol-builder** walks you through designing a session-type protocol conversationally, validates it as you go, and emits a ready-to-paste Python integration snippet.
+
+ ```bash
+ # Install in Claude Code
+ /plugin marketplace add chrisbartoloburlo/llmcontract
+ /plugin install protocol-builder@llmcontract
+
+ # Then in any conversation
+ /protocol-builder
+ ```
+
+ The skill validates each draft DSL against `llmcontract`'s parser, so anything it produces is guaranteed to load with `Monitor(...)`. Source lives under `skills/protocol-builder/`.
+
+ ## Case Studies
+
+ - **[`llmcontract-tau2`](https://github.com/chrisbartoloburlo/llmcontract-tau2)** — Standalone replay of [tau2-bench](https://github.com/sierra-research/tau2-bench)'s shipped trajectories through `Monitor`. Headline: 11/1755 (0.6%) of trajectories that tau2 scored as passing violate the documented "obtain user confirmation before mutating the database" policy. Discussion upstream: [tau2-bench#298](https://github.com/sierra-research/tau2-bench/issues/298).
+
  ## Research

  This work is based on the theory developed in:
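Reviewer sketch for the README's `Unrecognized` section above: one way an outer loop might branch on the typed result. The three dataclasses are local stand-ins that mirror the field names shown in this diff, so the snippet runs without the package installed; `next_action` is a hypothetical helper and is not part of `llmcontract`.

```python
from dataclasses import dataclass


# Local stand-ins for the result types named in the diff (assumption: with the
# package installed these would be imported from `llmcontract` instead).
@dataclass(frozen=True)
class Ok:
    pass


@dataclass(frozen=True)
class Violation:
    expected: list
    got: str


@dataclass(frozen=True)
class Unrecognized:
    expected: list
    direction: str


def next_action(result) -> str:
    """Hypothetical helper: map a monitor result to an outer-loop action."""
    if isinstance(result, Ok):
        return "continue"
    if isinstance(result, Unrecognized):
        # Monitor state was not advanced: ask for clarification, then re-feed.
        options = ", ".join(result.expected)
        return f"clarify (expected one of: {options})"
    if isinstance(result, Violation):
        # The monitor has halted: stop the session.
        return f"halt (got {result.got})"
    raise TypeError(f"unexpected result: {result!r}")
```

Keeping the halt and clarify paths in separate branches is the point of the new result type: a caller that only checks for `Ok` would otherwise treat an undecided projection like a broken protocol.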
@@ -1,4 +1,6 @@
- from llmcontract.monitor.monitor import Monitor, MonitorResult, Ok, Violation, Blocked
+ from llmcontract.monitor.monitor import (
+     Monitor, MonitorResult, Ok, Violation, Blocked, Unrecognized, UNRECOGNIZED,
+ )
  from llmcontract.integration import (
      MonitoredClient, ToolMiddleware, ToolResult,
      LLMResponse, ToolCall, ProtocolViolationError,
@@ -6,6 +8,7 @@ from llmcontract.integration import (

  __all__ = [
      "Monitor", "MonitorResult", "Ok", "Violation", "Blocked",
+     "Unrecognized", "UNRECOGNIZED",
      "MonitoredClient", "ToolMiddleware", "ToolResult",
      "LLMResponse", "ToolCall", "ProtocolViolationError",
  ]
@@ -22,6 +22,11 @@ class Automaton:
      terminal_states: set[int] = field(default_factory=set)
      initial_state: int = 0
      _next_id: int = field(default=0, repr=False)
+     # State aliases produced by recursion back-edges. `aliases[child] = target`
+     # means the child state behaves identically to target. Resolved after
+     # compilation finishes so back-edges see the *final* set of target
+     # transitions, not just the ones that existed when the back-edge was hit.
+     _aliases: dict[int, int] = field(default_factory=dict, repr=False)

      def _new_state(self) -> int:
          sid = self._next_id
@@ -41,9 +46,47 @@ def compile_ast(node: ProtocolNode) -> Automaton:
      aut.initial_state = start
      rec_env: dict[str, int] = {}
      _compile(node, start, aut, rec_env)
+     _resolve_aliases(aut)
      return aut


+ def _resolve_aliases(aut: Automaton) -> None:
+     """Collapse alias states by redirecting every reference to its canonical id.
+
+     A state X marked as an alias of T behaves like T for all observers. We
+     redirect every outgoing transition that points at X to point at T instead
+     (chasing alias chains), then drop X from the state set entirely. This must
+     happen after `_compile` finishes so the snapshot of T's transitions is
+     final — fixing the bug where a recursion back-edge inside a choice only
+     saw the branches that were compiled before it.
+     """
+
+     def canonical(state: int) -> int:
+         seen: set[int] = set()
+         cur = state
+         while cur in aut._aliases and cur not in seen:
+             seen.add(cur)
+             cur = aut._aliases[cur]
+         return cur
+
+     aliased = set(aut._aliases.keys())
+     if not aliased:
+         return
+
+     for src in list(aut.transitions.keys()):
+         if src in aliased:
+             continue
+         for key, dest in list(aut.transitions[src].items()):
+             aut.transitions[src][key] = canonical(dest)
+
+     for s in aliased:
+         aut.transitions.pop(s, None)
+         aut.terminal_states.discard(s)
+
+     if aut.initial_state in aliased:
+         aut.initial_state = canonical(aut.initial_state)
+
+
  def _compile(
      node: ProtocolNode,
      current: int,
@@ -96,12 +139,12 @@ def _compile(
          _compile(node.body, current, aut, rec_env_copy)

      elif isinstance(node, RecVar):
-         # Back-edge: wire current state to the recursion point.
-         # We mark current as an epsilon-transition target by copying transitions.
+         # Back-edge: alias current to the recursion target. We can't copy the
+         # target's transitions now because more branches of the surrounding
+         # choice may still be compiled; resolution happens once compilation
+         # finishes (see `_resolve_aliases`).
          target = rec_env[node.var]
-         # Copy all transitions from the target to current state
-         for key, dest in aut.transitions.get(target, {}).items():
-             aut.transitions[current][key] = dest
+         aut._aliases[current] = target

      else:
          raise TypeError(f"Unknown AST node: {type(node)}")
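Reviewer sketch of why the back-edge handling changed, assuming a protocol shaped like `rec X.?{A.X, B.end}` with state 0 as the recursion point. It uses plain dicts keyed by label rather than the package's `Automaton` (whose keys appear to be `(direction, label)` pairs), and the order in which the old compiler added branch transitions is an assumption based on the docstring's description of the bug. `resolve_aliases` is an illustrative stand-in, not the package function.

```python
def resolve_aliases(transitions, aliases):
    """Redirect edges that point at aliased states, then drop the aliases."""

    def canonical(state):
        seen = set()
        # Follow alias chains; `seen` guards against accidental cycles.
        while state in aliases and state not in seen:
            seen.add(state)
            state = aliases[state]
        return state

    resolved = {}
    for src, edges in transitions.items():
        if src in aliases:
            continue  # aliased states disappear from the automaton
        resolved[src] = {label: canonical(dest) for label, dest in edges.items()}
    return resolved


# Eager copy (old approach): the back-edge in branch A copies state 0's
# transitions before branch B has been compiled.
eager = {0: {}, 1: {}, 2: {}}
eager[0]["A"] = 1
eager[1].update(eager[0])  # state 1 only learns about A
eager[0]["B"] = 2          # B arrives too late for state 1

# Deferred aliasing (new approach): record 1 -> 0 and resolve at the end.
deferred = {0: {"A": 1, "B": 2}, 1: {}, 2: {}}
resolved = resolve_aliases(deferred, {1: 0})
```

In the eager version, state 1 ends up with only the `A` edge, so a second pass through the loop would reject `B`. After deferred resolution, `A` loops straight back to state 0, which carries both branches.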
@@ -30,7 +30,36 @@ class Blocked:
      reason: str


- MonitorResult = Union[Ok, Violation, Blocked]
+ @dataclass(frozen=True)
+ class Unrecognized:
+     """The projection couldn't classify the event into a known label.
+
+     Distinct from `Violation`: a violation means the agent did the wrong
+     thing, an `Unrecognized` means the projection layer (typically over
+     natural language) couldn't decide which label to emit. Outer-loop code
+     is expected to react by asking the underlying agent to clarify with the
+     user — not by halting as if the protocol had been broken.
+
+     The monitor's state is NOT advanced when it returns `Unrecognized` and
+     the monitor is NOT halted, so a follow-up event after clarification
+     can be fed normally.
+
+     A protocol can opt out of this behavior by including a literal
+     `Unrecognized` transition at any state — in that case the monitor
+     follows the transition and returns `Ok`, treating clarification as a
+     first-class branch of the protocol.
+     """
+     expected: list[str]
+     direction: str
+
+
+ MonitorResult = Union[Ok, Violation, Blocked, Unrecognized]
+
+
+ # Sentinel label that triggers Unrecognized handling. Use this constant
+ # rather than a bare string so callers don't typo their way around the
+ # special case.
+ UNRECOGNIZED = "Unrecognized"


  # ── Monitor ──────────────────────────────────────────────────
@@ -75,8 +104,16 @@ class Monitor:
              self._current_state = transitions[key]
              return Ok()

-         # Build a useful violation message
          expected = [f"{'!' if d == 'send' else '?'}{l}" for d, l in transitions]
+
+         # Soft fail-open path for projection-induced uncertainty: if the
+         # event's label is the UNRECOGNIZED sentinel and the protocol does
+         # not declare a transition for it at this state, return Unrecognized
+         # without halting and without advancing state — the outer loop is
+         # expected to drive a clarification turn and re-feed the result.
+         if label == UNRECOGNIZED:
+             return Unrecognized(expected=expected, direction=direction)
+
          got = f"{'!' if direction == 'send' else '?'}{label}"
          self._halted = True
          return Violation(expected=expected, got=got)
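Reviewer sketch of the check ordering this hunk introduces: a declared transition wins first, the `UNRECOGNIZED` sentinel is handled second, and only then does the monitor halt with a `Violation`. `TinyMonitor` is a hypothetical stand-in over a hand-written transition table; it omits whatever halted or terminal-state checks the real `Monitor` performs before this point, since those are not visible in the diff.

```python
from dataclasses import dataclass

UNRECOGNIZED = "Unrecognized"


@dataclass(frozen=True)
class Ok:
    pass


@dataclass(frozen=True)
class Violation:
    expected: list
    got: str


@dataclass(frozen=True)
class Unrecognized:
    expected: list
    direction: str


def sigil(direction: str) -> str:
    """'!' for sends, '?' for receives, matching the diff's message format."""
    return "!" if direction == "send" else "?"


class TinyMonitor:
    """Hypothetical stand-in: transitions maps state -> {(direction, label): next}."""

    def __init__(self, transitions, initial=0):
        self.transitions = transitions
        self.state = initial
        self.halted = False

    def step(self, direction, label):
        edges = self.transitions.get(self.state, {})
        key = (direction, label)
        if key in edges:  # 1. a declared transition always wins
            self.state = edges[key]
            return Ok()
        expected = [f"{sigil(d)}{l}" for d, l in edges]
        if label == UNRECOGNIZED:  # 2. soft path: no halt, no state change
            return Unrecognized(expected=expected, direction=direction)
        self.halted = True  # 3. anything else is a hard violation
        return Violation(expected=expected, got=f"{sigil(direction)}{label}")
```

Because step 1 runs first, a protocol that declares its own `Unrecognized` branch gets `Ok` and a state change, while a protocol that does not declare one gets the soft result and stays where it was.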
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: llmsessioncontract
- Version: 0.2.0
+ Version: 0.2.2
  Summary: Runtime monitor for LLM agent interaction protocols based on session type theory
  Author-email: Chris Bartolo Burlo <chris@mizziburlo.com>
  License-Expression: MIT
@@ -111,6 +111,33 @@ for _ in range(100):
  m.receive("Pong") # Ok()
  ```

+ ### Handling natural-language input: `Unrecognized`
+
+ When the projection layer (typically over user chat) can't classify an event into a known label, it can emit the sentinel `UNRECOGNIZED` instead. The monitor treats this as a soft signal — distinct from `Violation` — without halting or advancing state, so the outer loop can drive a clarification turn:
+
+ ```python
+ from llmcontract import Monitor, Ok, Unrecognized, UNRECOGNIZED
+
+ m = Monitor("?{Yes.end, No.end}")
+ result = m.receive(UNRECOGNIZED) # projection couldn't decide
+ assert isinstance(result, Unrecognized) # not a Violation
+ # state preserved; ask the agent to ask the user to clarify, then:
+ m.receive("Yes") # Ok()
+ ```
+
+ A protocol can also handle `Unrecognized` *explicitly* as a first-class branch — useful for "ask again" loops:
+
+ ```python
+ protocol = "rec Loop.!Ask.?{Yes.end, No.end, Unrecognized.Loop}"
+ m = Monitor(protocol)
+ m.send("Ask")
+ m.receive(UNRECOGNIZED) # Ok — protocol routes back to Loop
+ m.send("Ask")
+ m.receive("Yes") # Ok — terminal
+ ```
+
+ The distinction matters at the system boundary: `Violation` means the agent broke the rules; `Unrecognized` means we don't have enough information to decide yet. Different responses (halt vs. clarify) come naturally from the typed result.
+
  ## Integration Layer

  For real agent loops, `llmcontract` provides a client wrapper and tool middleware that share a single monitor — so the full interaction is tracked automatically.
@@ -236,6 +263,25 @@ Each step appears as a guardrail observation in your Langfuse trace with:
  - **Output**: `passed: true/false`, violation details if applicable
  - **Score**: `protocol_compliance` (boolean) for filtering and analytics

+ ## Claude Code Plugin
+
+ A Claude Code plugin ships with this repo: **protocol-builder** walks you through designing a session-type protocol conversationally, validates it as you go, and emits a ready-to-paste Python integration snippet.
+
+ ```bash
+ # Install in Claude Code
+ /plugin marketplace add chrisbartoloburlo/llmcontract
+ /plugin install protocol-builder@llmcontract
+
+ # Then in any conversation
+ /protocol-builder
+ ```
+
+ The skill validates each draft DSL against `llmcontract`'s parser, so anything it produces is guaranteed to load with `Monitor(...)`. Source lives under `skills/protocol-builder/`.
+
+ ## Case Studies
+
+ - **[`llmcontract-tau2`](https://github.com/chrisbartoloburlo/llmcontract-tau2)** — Standalone replay of [tau2-bench](https://github.com/sierra-research/tau2-bench)'s shipped trajectories through `Monitor`. Headline: 11/1755 (0.6%) of trajectories that tau2 scored as passing violate the documented "obtain user confirmation before mutating the database" policy. Discussion upstream: [tau2-bench#298](https://github.com/sierra-research/tau2-bench/issues/298).
+
  ## Research

  This work is based on the theory developed in:
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "llmsessioncontract"
- version = "0.2.0"
+ version = "0.2.2"
  description = "Runtime monitor for LLM agent interaction protocols based on session type theory"
  requires-python = ">=3.10"
  license = "MIT"