llmsessioncontract 0.2.0__tar.gz → 0.2.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/PKG-INFO +47 -1
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/README.md +46 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/__init__.py +4 -1
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/monitor/automaton.py +48 -5
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/monitor/monitor.py +39 -2
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/PKG-INFO +47 -1
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/pyproject.toml +1 -1
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/LICENSE +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/dsl/__init__.py +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/dsl/ast.py +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/dsl/parser.py +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/__init__.py +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/client.py +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/exceptions.py +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/langfuse.py +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/middleware.py +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/integration/types.py +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/monitor/__init__.py +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/py.typed +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/SOURCES.txt +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/dependency_links.txt +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/requires.txt +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/top_level.txt +0 -0
- {llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/setup.cfg +0 -0
{llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: llmsessioncontract
-Version: 0.2.
+Version: 0.2.2
 Summary: Runtime monitor for LLM agent interaction protocols based on session type theory
 Author-email: Chris Bartolo Burlo <chris@mizziburlo.com>
 License-Expression: MIT
@@ -111,6 +111,33 @@ for _ in range(100):
     m.receive("Pong") # Ok()
 ```
 
+### Handling natural-language input: `Unrecognized`
+
+When the projection layer (typically over user chat) can't classify an event into a known label, it can emit the sentinel `UNRECOGNIZED` instead. The monitor treats this as a soft signal — distinct from `Violation` — without halting or advancing state, so the outer loop can drive a clarification turn:
+
+```python
+from llmcontract import Monitor, Ok, Unrecognized, UNRECOGNIZED
+
+m = Monitor("?{Yes.end, No.end}")
+result = m.receive(UNRECOGNIZED) # projection couldn't decide
+assert isinstance(result, Unrecognized) # not a Violation
+# state preserved; ask the agent to ask the user to clarify, then:
+m.receive("Yes") # Ok()
+```
+
+A protocol can also handle `Unrecognized` *explicitly* as a first-class branch — useful for "ask again" loops:
+
+```python
+protocol = "rec Loop.!Ask.?{Yes.end, No.end, Unrecognized.Loop}"
+m = Monitor(protocol)
+m.send("Ask")
+m.receive(UNRECOGNIZED) # Ok — protocol routes back to Loop
+m.send("Ask")
+m.receive("Yes") # Ok — terminal
+```
+
+The distinction matters at the system boundary: `Violation` means the agent broke the rules; `Unrecognized` means we don't have enough information to decide yet. Different responses (halt vs. clarify) come naturally from the typed result.
+
 ## Integration Layer
 
 For real agent loops, `llmcontract` provides a client wrapper and tool middleware that share a single monitor — so the full interaction is tracked automatically.
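The halt-versus-clarify split described in the README text of this hunk could be wired into an outer loop roughly as follows. This is a minimal sketch, not the package's code: `Ok`, `Violation` and `Unrecognized` are local stand-ins that reuse the names of the package's result types (with simplified fields) so the example runs without `llmcontract` installed, and `next_action` is a hypothetical helper.

```python
from dataclasses import dataclass


# Local stand-ins for the package's result types; field sets are
# simplified assumptions made for this sketch.
@dataclass(frozen=True)
class Ok:
    pass


@dataclass(frozen=True)
class Violation:
    expected: tuple[str, ...]
    got: str


@dataclass(frozen=True)
class Unrecognized:
    expected: tuple[str, ...]
    direction: str


def next_action(result: object) -> str:
    """Map a monitor result to the outer loop's next move (hypothetical helper)."""
    if isinstance(result, Ok):
        return "continue"
    if isinstance(result, Unrecognized):
        # Not enough information yet: ask the user to clarify, then re-feed.
        return "clarify: expected one of " + ", ".join(result.expected)
    if isinstance(result, Violation):
        # The agent broke the protocol: stop the session.
        return f"halt: got {result.got}"
    raise TypeError(f"unexpected result type: {type(result).__name__}")


actions = [
    next_action(Unrecognized(expected=("?Yes", "?No"), direction="receive")),
    next_action(Ok()),
    next_action(Violation(expected=("?Yes", "?No"), got="?Maybe")),
]
```

Branching on the result type keeps the policy decision (clarify or halt) in the caller, which is the behaviour the README attributes to the typed result.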
@@ -236,6 +263,25 @@ Each step appears as a guardrail observation in your Langfuse trace with:
 - **Output**: `passed: true/false`, violation details if applicable
 - **Score**: `protocol_compliance` (boolean) for filtering and analytics
 
+## Claude Code Plugin
+
+A Claude Code plugin ships with this repo: **protocol-builder** walks you through designing a session-type protocol conversationally, validates it as you go, and emits a ready-to-paste Python integration snippet.
+
+```bash
+# Install in Claude Code
+/plugin marketplace add chrisbartoloburlo/llmcontract
+/plugin install protocol-builder@llmcontract
+
+# Then in any conversation
+/protocol-builder
+```
+
+The skill validates each draft DSL against `llmcontract`'s parser, so anything it produces is guaranteed to load with `Monitor(...)`. Source lives under `skills/protocol-builder/`.
+
+## Case Studies
+
+- **[`llmcontract-tau2`](https://github.com/chrisbartoloburlo/llmcontract-tau2)** — Standalone replay of [tau2-bench](https://github.com/sierra-research/tau2-bench)'s shipped trajectories through `Monitor`. Headline: 11/1755 (0.6%) of trajectories that tau2 scored as passing violate the documented "obtain user confirmation before mutating the database" policy. Discussion upstream: [tau2-bench#298](https://github.com/sierra-research/tau2-bench/issues/298).
+
 ## Research
 
 This work is based on the theory developed in:
{llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/README.md
@@ -82,6 +82,33 @@ for _ in range(100):
     m.receive("Pong") # Ok()
 ```
 
+### Handling natural-language input: `Unrecognized`
+
+When the projection layer (typically over user chat) can't classify an event into a known label, it can emit the sentinel `UNRECOGNIZED` instead. The monitor treats this as a soft signal — distinct from `Violation` — without halting or advancing state, so the outer loop can drive a clarification turn:
+
+```python
+from llmcontract import Monitor, Ok, Unrecognized, UNRECOGNIZED
+
+m = Monitor("?{Yes.end, No.end}")
+result = m.receive(UNRECOGNIZED) # projection couldn't decide
+assert isinstance(result, Unrecognized) # not a Violation
+# state preserved; ask the agent to ask the user to clarify, then:
+m.receive("Yes") # Ok()
+```
+
+A protocol can also handle `Unrecognized` *explicitly* as a first-class branch — useful for "ask again" loops:
+
+```python
+protocol = "rec Loop.!Ask.?{Yes.end, No.end, Unrecognized.Loop}"
+m = Monitor(protocol)
+m.send("Ask")
+m.receive(UNRECOGNIZED) # Ok — protocol routes back to Loop
+m.send("Ask")
+m.receive("Yes") # Ok — terminal
+```
+
+The distinction matters at the system boundary: `Violation` means the agent broke the rules; `Unrecognized` means we don't have enough information to decide yet. Different responses (halt vs. clarify) come naturally from the typed result.
+
 ## Integration Layer
 
 For real agent loops, `llmcontract` provides a client wrapper and tool middleware that share a single monitor — so the full interaction is tracked automatically.
@@ -207,6 +234,25 @@ Each step appears as a guardrail observation in your Langfuse trace with:
 - **Output**: `passed: true/false`, violation details if applicable
 - **Score**: `protocol_compliance` (boolean) for filtering and analytics
 
+## Claude Code Plugin
+
+A Claude Code plugin ships with this repo: **protocol-builder** walks you through designing a session-type protocol conversationally, validates it as you go, and emits a ready-to-paste Python integration snippet.
+
+```bash
+# Install in Claude Code
+/plugin marketplace add chrisbartoloburlo/llmcontract
+/plugin install protocol-builder@llmcontract
+
+# Then in any conversation
+/protocol-builder
+```
+
+The skill validates each draft DSL against `llmcontract`'s parser, so anything it produces is guaranteed to load with `Monitor(...)`. Source lives under `skills/protocol-builder/`.
+
+## Case Studies
+
+- **[`llmcontract-tau2`](https://github.com/chrisbartoloburlo/llmcontract-tau2)** — Standalone replay of [tau2-bench](https://github.com/sierra-research/tau2-bench)'s shipped trajectories through `Monitor`. Headline: 11/1755 (0.6%) of trajectories that tau2 scored as passing violate the documented "obtain user confirmation before mutating the database" policy. Discussion upstream: [tau2-bench#298](https://github.com/sierra-research/tau2-bench/issues/298).
+
 ## Research
 
 This work is based on the theory developed in:
{llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/__init__.py
@@ -1,4 +1,6 @@
-from llmcontract.monitor.monitor import
+from llmcontract.monitor.monitor import (
+    Monitor, MonitorResult, Ok, Violation, Blocked, Unrecognized, UNRECOGNIZED,
+)
 from llmcontract.integration import (
     MonitoredClient, ToolMiddleware, ToolResult,
     LLMResponse, ToolCall, ProtocolViolationError,
@@ -6,6 +8,7 @@ from llmcontract.integration import (
 
 __all__ = [
     "Monitor", "MonitorResult", "Ok", "Violation", "Blocked",
+    "Unrecognized", "UNRECOGNIZED",
     "MonitoredClient", "ToolMiddleware", "ToolResult",
     "LLMResponse", "ToolCall", "ProtocolViolationError",
 ]
{llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/monitor/automaton.py
@@ -22,6 +22,11 @@ class Automaton:
     terminal_states: set[int] = field(default_factory=set)
     initial_state: int = 0
     _next_id: int = field(default=0, repr=False)
+    # State aliases produced by recursion back-edges. `aliases[child] = target`
+    # means the child state behaves identically to target. Resolved after
+    # compilation finishes so back-edges see the *final* set of target
+    # transitions, not just the ones that existed when the back-edge was hit.
+    _aliases: dict[int, int] = field(default_factory=dict, repr=False)
 
     def _new_state(self) -> int:
         sid = self._next_id
@@ -41,9 +46,47 @@ def compile_ast(node: ProtocolNode) -> Automaton:
     aut.initial_state = start
     rec_env: dict[str, int] = {}
     _compile(node, start, aut, rec_env)
+    _resolve_aliases(aut)
     return aut
 
 
+def _resolve_aliases(aut: Automaton) -> None:
+    """Collapse alias states by redirecting every reference to its canonical id.
+
+    A state X marked as an alias of T behaves like T for all observers. We
+    redirect every outgoing transition that points at X to point at T instead
+    (chasing alias chains), then drop X from the state set entirely. This must
+    happen after `_compile` finishes so the snapshot of T's transitions is
+    final — fixing the bug where a recursion back-edge inside a choice only
+    saw the branches that were compiled before it.
+    """
+
+    def canonical(state: int) -> int:
+        seen: set[int] = set()
+        cur = state
+        while cur in aut._aliases and cur not in seen:
+            seen.add(cur)
+            cur = aut._aliases[cur]
+        return cur
+
+    aliased = set(aut._aliases.keys())
+    if not aliased:
+        return
+
+    for src in list(aut.transitions.keys()):
+        if src in aliased:
+            continue
+        for key, dest in list(aut.transitions[src].items()):
+            aut.transitions[src][key] = canonical(dest)
+
+    for s in aliased:
+        aut.transitions.pop(s, None)
+        aut.terminal_states.discard(s)
+
+    if aut.initial_state in aliased:
+        aut.initial_state = canonical(aut.initial_state)
+
+
 def _compile(
     node: ProtocolNode,
     current: int,
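To see what the `canonical` helper in this hunk does with alias chains, here is a small standalone sketch. It is an approximation under stated assumptions: it uses plain dictionaries keyed by string labels instead of the package's `Automaton` class, it returns a new mapping rather than mutating in place, and the state ids and labels are made up for illustration.

```python
def resolve_aliases(
    transitions: dict[int, dict[str, int]],
    aliases: dict[int, int],
) -> dict[int, dict[str, int]]:
    """Return a copy of `transitions` with alias states collapsed into their targets."""

    def canonical(state: int) -> int:
        # Follow alias links until a non-alias state is reached; `seen`
        # stops the walk if the alias map accidentally contains a cycle.
        seen: set[int] = set()
        cur = state
        while cur in aliases and cur not in seen:
            seen.add(cur)
            cur = aliases[cur]
        return cur

    resolved: dict[int, dict[str, int]] = {}
    for src, edges in transitions.items():
        if src in aliases:
            continue  # alias states are dropped from the result
        resolved[src] = {label: canonical(dest) for label, dest in edges.items()}
    return resolved


# State 3 aliases 2, and 2 aliases 0, so the edge into 3 should land on 0.
transitions = {0: {"!Ping": 1}, 1: {"?Pong": 3}, 2: {}, 3: {}}
aliases = {3: 2, 2: 0}
resolved = resolve_aliases(transitions, aliases)
```

In this toy input the `?Pong` edge ends up pointing at state 0, and states 2 and 3 no longer appear as sources.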
@@ -96,12 +139,12 @@ def _compile(
         _compile(node.body, current, aut, rec_env_copy)
 
     elif isinstance(node, RecVar):
-        # Back-edge:
-        #
+        # Back-edge: alias current to the recursion target. We can't copy the
+        # target's transitions now because more branches of the surrounding
+        # choice may still be compiled; resolution happens once compilation
+        # finishes (see `_resolve_aliases`).
         target = rec_env[node.var]
-
-        for key, dest in aut.transitions.get(target, {}).items():
-            aut.transitions[current][key] = dest
+        aut._aliases[current] = target
 
     else:
         raise TypeError(f"Unknown AST node: {type(node)}")
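The difference between the removed copy-on-back-edge code and the new alias approach can be illustrated with a toy model. The sketch below assumes a protocol shaped like `rec Loop.?{Retry.Loop, Done.end}` and a compiler that creates one child state per branch, in order. Both functions are hypothetical simplifications written for this note, not the package's `_compile`.

```python
# Each branch is (label, continuation): "Loop" is a back-edge to state 0,
# "end" is a terminal continuation.
BRANCHES = [("Retry", "Loop"), ("Done", "end")]
LOOP_STATE = 0


def compile_snapshot() -> dict[int, dict[str, int]]:
    """Old strategy: copy the target's transitions when the back-edge is hit."""
    transitions: dict[int, dict[str, int]] = {LOOP_STATE: {}}
    next_id = 1
    for label, cont in BRANCHES:
        child = next_id
        next_id += 1
        transitions[LOOP_STATE][label] = child
        transitions[child] = {}
        if cont == "Loop":
            # Only the branches compiled so far are visible at this point.
            transitions[child].update(transitions[LOOP_STATE])
    return transitions


def compile_deferred() -> dict[int, dict[str, int]]:
    """New strategy: record an alias, redirect edges after compilation."""
    transitions: dict[int, dict[str, int]] = {LOOP_STATE: {}}
    aliases: dict[int, int] = {}
    next_id = 1
    for label, cont in BRANCHES:
        child = next_id
        next_id += 1
        transitions[LOOP_STATE][label] = child
        transitions[child] = {}
        if cont == "Loop":
            aliases[child] = LOOP_STATE
    for src in list(transitions):
        if src in aliases:
            del transitions[src]  # alias states disappear
            continue
        for label, dest in list(transitions[src].items()):
            transitions[src][label] = aliases.get(dest, dest)
    return transitions


old = compile_snapshot()
new = compile_deferred()
```

In the snapshot version, the state reached after `Retry` never learns about `Done`, because `Done` was added to the loop state after the copy was taken. In the deferred version, `Retry` points straight back at the loop state, so every branch remains available.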
{llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmcontract/monitor/monitor.py
@@ -30,7 +30,36 @@ class Blocked:
     reason: str
 
 
-
+@dataclass(frozen=True)
+class Unrecognized:
+    """The projection couldn't classify the event into a known label.
+
+    Distinct from `Violation`: a violation means the agent did the wrong
+    thing, an `Unrecognized` means the projection layer (typically over
+    natural language) couldn't decide which label to emit. Outer-loop code
+    is expected to react by asking the underlying agent to clarify with the
+    user — not by halting as if the protocol had been broken.
+
+    The monitor's state is NOT advanced when it returns `Unrecognized` and
+    the monitor is NOT halted, so a follow-up event after clarification
+    can be fed normally.
+
+    A protocol can opt out of this behavior by including a literal
+    `Unrecognized` transition at any state — in that case the monitor
+    follows the transition and returns `Ok`, treating clarification as a
+    first-class branch of the protocol.
+    """
+    expected: list[str]
+    direction: str
+
+
+MonitorResult = Union[Ok, Violation, Blocked, Unrecognized]
+
+
+# Sentinel label that triggers Unrecognized handling. Use this constant
+# rather than a bare string so callers don't typo their way around the
+# special case.
+UNRECOGNIZED = "Unrecognized"
 
 
 # ── Monitor ──────────────────────────────────────────────────
@@ -75,8 +104,16 @@ class Monitor:
             self._current_state = transitions[key]
             return Ok()
 
-        # Build a useful violation message
         expected = [f"{'!' if d == 'send' else '?'}{l}" for d, l in transitions]
+
+        # Soft fail-open path for projection-induced uncertainty: if the
+        # event's label is the UNRECOGNIZED sentinel and the protocol does
+        # not declare a transition for it at this state, return Unrecognized
+        # without halting and without advancing state — the outer loop is
+        # expected to drive a clarification turn and re-feed the result.
+        if label == UNRECOGNIZED:
+            return Unrecognized(expected=expected, direction=direction)
+
         got = f"{'!' if direction == 'send' else '?'}{label}"
         self._halted = True
         return Violation(expected=expected, got=got)
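The ordering of checks in this hunk (declared transition first, then the sentinel, then the violation path) can be reproduced in a small standalone model. `ToyMonitor` below is a hypothetical, receive-only stand-in written for this note. It is not the package's `Monitor`: it skips send events, the `Blocked` result, and the DSL parser, and takes a hand-written transition table instead.

```python
from dataclasses import dataclass

UNRECOGNIZED = "Unrecognized"  # same sentinel value as in the diff


@dataclass(frozen=True)
class Ok:
    pass


@dataclass(frozen=True)
class Violation:
    expected: list[str]
    got: str


@dataclass(frozen=True)
class Unrecognized:
    expected: list[str]
    direction: str


class ToyMonitor:
    """Simplified, receive-only stand-in for the package's Monitor."""

    def __init__(self, transitions: dict[int, dict[str, int]]) -> None:
        self.transitions = transitions
        self.state = 0
        self.halted = False

    def receive(self, label: str):
        edges = self.transitions.get(self.state, {})
        if label in edges:
            # A declared transition always wins, including an explicit
            # "Unrecognized" branch in the protocol.
            self.state = edges[label]
            return Ok()
        expected = [f"?{name}" for name in edges]
        if label == UNRECOGNIZED:
            # Soft path: no halt, no state change.
            return Unrecognized(expected=expected, direction="receive")
        self.halted = True
        return Violation(expected=expected, got=f"?{label}")


# Shape of ?{Yes.end, No.end}: state 0 branches to terminal state 1.
m = ToyMonitor({0: {"Yes": 1, "No": 1}})
soft = m.receive(UNRECOGNIZED)
state_after_soft = m.state
ok = m.receive("Yes")

# Same shape plus an explicit Unrecognized branch looping back to state 0.
m2 = ToyMonitor({0: {"Yes": 1, "No": 1, "Unrecognized": 0}})
explicit = m2.receive(UNRECOGNIZED)
```

Because the transition lookup runs before the sentinel check, a protocol that declares its own `Unrecognized` branch gets `Ok` and a normal state change, while a protocol without one gets the soft `Unrecognized` result and keeps its state.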
{llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/llmsessioncontract.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: llmsessioncontract
-Version: 0.2.
+Version: 0.2.2
 Summary: Runtime monitor for LLM agent interaction protocols based on session type theory
 Author-email: Chris Bartolo Burlo <chris@mizziburlo.com>
 License-Expression: MIT
@@ -111,6 +111,33 @@ for _ in range(100):
     m.receive("Pong") # Ok()
 ```
 
+### Handling natural-language input: `Unrecognized`
+
+When the projection layer (typically over user chat) can't classify an event into a known label, it can emit the sentinel `UNRECOGNIZED` instead. The monitor treats this as a soft signal — distinct from `Violation` — without halting or advancing state, so the outer loop can drive a clarification turn:
+
+```python
+from llmcontract import Monitor, Ok, Unrecognized, UNRECOGNIZED
+
+m = Monitor("?{Yes.end, No.end}")
+result = m.receive(UNRECOGNIZED) # projection couldn't decide
+assert isinstance(result, Unrecognized) # not a Violation
+# state preserved; ask the agent to ask the user to clarify, then:
+m.receive("Yes") # Ok()
+```
+
+A protocol can also handle `Unrecognized` *explicitly* as a first-class branch — useful for "ask again" loops:
+
+```python
+protocol = "rec Loop.!Ask.?{Yes.end, No.end, Unrecognized.Loop}"
+m = Monitor(protocol)
+m.send("Ask")
+m.receive(UNRECOGNIZED) # Ok — protocol routes back to Loop
+m.send("Ask")
+m.receive("Yes") # Ok — terminal
+```
+
+The distinction matters at the system boundary: `Violation` means the agent broke the rules; `Unrecognized` means we don't have enough information to decide yet. Different responses (halt vs. clarify) come naturally from the typed result.
+
 ## Integration Layer
 
 For real agent loops, `llmcontract` provides a client wrapper and tool middleware that share a single monitor — so the full interaction is tracked automatically.
@@ -236,6 +263,25 @@ Each step appears as a guardrail observation in your Langfuse trace with:
 - **Output**: `passed: true/false`, violation details if applicable
 - **Score**: `protocol_compliance` (boolean) for filtering and analytics
 
+## Claude Code Plugin
+
+A Claude Code plugin ships with this repo: **protocol-builder** walks you through designing a session-type protocol conversationally, validates it as you go, and emits a ready-to-paste Python integration snippet.
+
+```bash
+# Install in Claude Code
+/plugin marketplace add chrisbartoloburlo/llmcontract
+/plugin install protocol-builder@llmcontract
+
+# Then in any conversation
+/protocol-builder
+```
+
+The skill validates each draft DSL against `llmcontract`'s parser, so anything it produces is guaranteed to load with `Monitor(...)`. Source lives under `skills/protocol-builder/`.
+
+## Case Studies
+
+- **[`llmcontract-tau2`](https://github.com/chrisbartoloburlo/llmcontract-tau2)** — Standalone replay of [tau2-bench](https://github.com/sierra-research/tau2-bench)'s shipped trajectories through `Monitor`. Headline: 11/1755 (0.6%) of trajectories that tau2 scored as passing violate the documented "obtain user confirmation before mutating the database" policy. Discussion upstream: [tau2-bench#298](https://github.com/sierra-research/tau2-bench/issues/298).
+
 ## Research
 
 This work is based on the theory developed in:
{llmsessioncontract-0.2.0 → llmsessioncontract-0.2.2}/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "llmsessioncontract"
-version = "0.2.
+version = "0.2.2"
 description = "Runtime monitor for LLM agent interaction protocols based on session type theory"
 requires-python = ">=3.10"
 license = "MIT"