statepilot 0.1.0__tar.gz

@@ -0,0 +1,68 @@
1
+ # Byte-compiled / optimized / DLL files
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+
7
+ # Distribution / packaging
8
+ .Python
9
+ build/
10
+ develop-eggs/
11
+ dist/
12
+ downloads/
13
+ eggs/
14
+ .eggs/
15
+ lib/
16
+ lib64/
17
+ parts/
18
+ sdist/
19
+ var/
20
+ wheels/
21
+ share/python-wheels/
22
+ *.egg-info/
23
+ .installed.cfg
24
+ *.egg
25
+ MANIFEST
26
+
27
+ # Unit test / coverage reports
28
+ htmlcov/
29
+ .tox/
30
+ .nox/
31
+ .coverage
32
+ .coverage.*
33
+ .cache
34
+ nosetests.xml
35
+ coverage.xml
36
+ *.cover
37
+ *.py,cover
38
+ .hypothesis/
39
+ .pytest_cache/
40
+ cover/
41
+
42
+ # Environments
43
+ .env
44
+ .venv
45
+ env/
46
+ venv/
47
+ ENV/
48
+ env.bak/
49
+ venv.bak/
50
+
51
+ # Type checkers
52
+ .mypy_cache/
53
+ .dmypy.json
54
+ dmypy.json
55
+ .pyre/
56
+ .pytype/
57
+ .ruff_cache/
58
+
59
+ # IDEs / editors
60
+ .idea/
61
+ .vscode/
62
+ *.swp
63
+ *.swo
64
+ *~
65
+ .DS_Store
66
+
67
+ # uv
68
+ uv.lock
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 StudioMeyer
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,339 @@
1
+ Metadata-Version: 2.4
2
+ Name: statepilot
3
+ Version: 0.1.0
4
+ Summary: Deterministic state-machine guards for AI-agent workflows: enforce which tools an agent may call, in which order, with loop detection, cost budgets and step caps.
5
+ Project-URL: Homepage, https://github.com/studiomeyer-io/statepilot
6
+ Project-URL: Repository, https://github.com/studiomeyer-io/statepilot
7
+ Project-URL: Issues, https://github.com/studiomeyer-io/statepilot/issues
8
+ Project-URL: Changelog, https://github.com/studiomeyer-io/statepilot/blob/main/CHANGELOG.md
9
+ Author-email: StudioMeyer <hello@studiomeyer.io>
10
+ License: MIT
11
+ License-File: LICENSE
12
+ Keywords: agent,determinism,guardrails,langgraph,llm,state-machine,workflow
13
+ Classifier: Development Status :: 4 - Beta
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Operating System :: OS Independent
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Programming Language :: Python :: 3.13
22
+ Classifier: Programming Language :: Python :: Implementation :: CPython
23
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
24
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
25
+ Classifier: Typing :: Typed
26
+ Requires-Python: >=3.10
27
+ Provides-Extra: dev
28
+ Requires-Dist: mypy>=1.10; extra == 'dev'
29
+ Requires-Dist: pytest-cov>=5.0; extra == 'dev'
30
+ Requires-Dist: pytest>=8.0; extra == 'dev'
31
+ Requires-Dist: pyyaml>=6.0; extra == 'dev'
32
+ Requires-Dist: ruff>=0.6; extra == 'dev'
33
+ Requires-Dist: types-pyyaml>=6.0; extra == 'dev'
34
+ Provides-Extra: yaml
35
+ Requires-Dist: pyyaml>=6.0; extra == 'yaml'
36
+ Description-Content-Type: text/markdown
37
+
38
+ # statepilot
39
+
40
+ **Deterministic state-machine guards for AI-agent workflows.**
41
+
42
+ Define a state machine, then enforce — at *runtime* — which tools your agent may
43
+ call, in which order, with loop detection, a cost budget, and a hard step cap.
44
+ The agent only gets to do what the state machine allows. Anything else raises.
45
+
46
+ Zero runtime dependencies in the core. Fully typed. Python 3.10+.
47
+
48
+ ```bash
49
+ pip install statepilot
50
+ ```
51
+
52
+ ## The problem
53
+
54
+ A recurring theme in 2026 agent tooling is "wrap the non-deterministic LLM in
55
+ deterministic code." A few data points that frame the gap:
56
+
57
+ - **Statewright** (Rust + MCP) put deterministic state machines for agents on the
58
+ map — it reached the [Hacker News front page](https://news.ycombinator.com/)
59
+ and lives at <https://github.com/statewright/statewright>. The framing clearly
60
+ resonates.
61
+ - **llm-canary** ships *policy gates for agent traces* — tool order, cost
62
+ budgets, runaway-loop checks — but as a **post-hoc test layer** over recorded
63
+ traces, not as runtime enforcement.
64
+ - Orchestrators like **LangGraph**, **CrewAI** and the **OpenAI Agents SDK** are
65
+ excellent at *routing*, but they don't hand you a small, hard rule that says
66
+ "tool X is illegal in state Y, full stop."
67
+
68
+ The missing piece is a **Python-native runtime guard**: a thin layer you put in
69
+ front of every tool call that enforces the allowed transitions and trips on
70
+ loops and budget overruns. That is what statepilot is.
71
+
72
+ It does not orchestrate, plan, or call your LLM. It is the bouncer at the door.
73
+
74
+ ## Quickstart (Python builder)
75
+
76
+ ```python
77
+ from statepilot import StateMachine, Pilot
78
+
79
+ machine = (
80
+     StateMachine.builder()
81
+     .initial("research")
82
+     .transition("research", "research", tool="search")  # looping allowed...
83
+     .transition("research", "draft", tool="write_draft")
84
+     .transition("draft", "review", tool="review")
85
+     .transition("review", "draft", tool="revise")  # send it back
86
+     .transition("review", "published", tool="publish")
87
+     .terminal("published")
88
+     .build()
89
+ )
90
+
91
+ pilot = Pilot(machine, budget=5.0, max_state_visits=4, max_steps=20)
92
+
93
+ pilot.step("search", cost=0.5) # ok, still in "research"
94
+ pilot.step("write_draft", cost=1.0) # -> "draft"
95
+ pilot.step("review") # -> "review"
96
+ pilot.step("publish") # -> "published" (terminal)
97
+
98
+ pilot.step("review") # raises TransitionError: terminal state
99
+ ```
100
+
101
+ Every accepted step is recorded:
102
+
103
+ ```python
104
+ for record in pilot.history:
105
+     print(record.index, record.source, "--", record.tool, "->", record.dest)
106
+ ```
107
+
108
+ ## The `@guarded` decorator
109
+
110
+ Bind your actual tool functions to the pilot. The guard runs **before** the
111
+ function body, so a violation means the body never executes.
112
+
113
+ ```python
114
+ from statepilot import StateMachine, Pilot, guarded, GuardViolation
115
+
116
+ machine = (
117
+     StateMachine.builder()
118
+     .initial("research")
119
+     .transition("research", "research", tool="search")
120
+     .transition("research", "draft", tool="write_draft")
121
+     .terminal("draft")
122
+     .build()
123
+ )
124
+ pilot = Pilot(machine, budget=5.0)
125
+
126
+ @guarded(pilot, cost=1.0) # tool name defaults to the function name
127
+ def search(query: str) -> list[str]:
128
+     return real_search(query)
129
+
130
+ @guarded(pilot, tool="write_draft") # or name it explicitly
131
+ def make_draft(notes: list[str]) -> str:
132
+     return real_draft(notes)
133
+
134
+ search("agent guardrails") # advances the machine, charges 1.0
135
+ make_draft(["..."]) # -> "draft"
136
+
137
+ try:
138
+     make_draft(["..."])  # already terminal
139
+ except GuardViolation as exc:
140
+     print("blocked:", exc)
141
+ ```
142
+
143
+ ## YAML definition
144
+
145
+ Prefer config over code? Define the machine in YAML and load it. (YAML support is
146
+ an optional extra: `pip install statepilot[yaml]`.)
147
+
148
+ ```yaml
149
+ # pipeline.yaml
150
+ initial: research
151
+ terminal:
152
+ - published
153
+ transitions:
154
+ - {from: research, to: research, tool: search}
155
+ - {from: research, to: draft, tool: write_draft}
156
+ - {from: draft, to: review, tool: review}
157
+ - {from: review, to: published, tool: publish}
158
+ ```
159
+
160
+ ```python
161
+ from statepilot import StateMachine, Pilot
162
+
163
+ machine = StateMachine.from_yaml_file("pipeline.yaml") # from a file
164
+ # or: StateMachine.from_yaml(yaml_string) # from an inline string
165
+ pilot = Pilot(machine, budget=5.0)
166
+ ```
167
+
168
+ `states` may be omitted — it is inferred from `initial`, `terminal`, and every
169
+ state named in `transitions`. `StateMachine.from_dict(...)` accepts the same
170
+ shape if you already have a dict.
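The inference rule for `states` can be sketched without the library. This is an illustrative stand-in, not statepilot's code: `infer_states` is a hypothetical helper, and the real implementation may order or validate states differently. It only assumes the dict shape shown in the YAML example (`initial`, `terminal`, and `transitions` entries with `from`/`to`/`tool`).

```python
from typing import Any, Mapping


def infer_states(definition: Mapping[str, Any]) -> list[str]:
    """Collect state names from `initial`, `terminal`, and `transitions`.

    Hypothetical helper for illustration; not part of statepilot.
    """
    seen: dict[str, None] = {}  # a dict de-duplicates and keeps first-seen order
    seen[definition["initial"]] = None
    for name in definition.get("terminal", []):
        seen[name] = None
    for edge in definition.get("transitions", []):
        seen[edge["from"]] = None
        seen[edge["to"]] = None
    return list(seen)


# Same shape as pipeline.yaml, written as a plain dict.
pipeline = {
    "initial": "research",
    "terminal": ["published"],
    "transitions": [
        {"from": "research", "to": "research", "tool": "search"},
        {"from": "research", "to": "draft", "tool": "write_draft"},
        {"from": "draft", "to": "review", "tool": "review"},
        {"from": "review", "to": "published", "tool": "publish"},
    ],
}

states = infer_states(pipeline)
```

Here every state is reachable from the three keys, which is why an explicit `states` list would add nothing for this pipeline.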
171
+
172
+ ## A realistic agent example
173
+
174
+ "Research, then draft, then review, then publish. Never publish before review.
175
+ Allow at most 3 research loops. Stop if cost exceeds \$5."
176
+
177
+ ```python
178
+ from statepilot import StateMachine, Pilot, guarded, GuardViolation
179
+
180
+ machine = (
181
+     StateMachine.builder()
182
+     .initial("research")
183
+     .transition("research", "research", tool="search")
184
+     .transition("research", "draft", tool="write_draft")
185
+     .transition("draft", "review", tool="review")
186
+     .transition("review", "draft", tool="revise")
187
+     .transition("review", "published", tool="publish")
188
+     .terminal("published")
189
+     .build()
190
+ )
191
+
192
+ # initial visit counts as 1, so max_state_visits=4 allows 3 extra research loops
193
+ pilot = Pilot(machine, budget=5.0, max_state_visits=4, max_steps=25)
194
+
195
+ @guarded(pilot, cost=0.8)
196
+ def search(q: str) -> str: ...
197
+
198
+ @guarded(pilot, cost=1.2)
199
+ def write_draft(notes: str) -> str: ...
200
+
201
+ @guarded(pilot)
202
+ def review(draft: str) -> bool: ...
203
+
204
+ @guarded(pilot, cost=0.3)
205
+ def publish(draft: str) -> str: ...
206
+ ```
207
+
208
+ The agent loop calls these as it sees fit. statepilot makes the illegal paths
209
+ impossible:
210
+
211
+ - calling `publish()` while still in `research` -> `TransitionError`
212
+ - a 4th `search()` loop -> `LoopLimitExceeded`
213
+ - cumulative cost over \$5 -> `BudgetExceeded`
214
+ - more than 25 steps -> `StepLimitExceeded`
215
+
216
+ A runnable version is in [`examples/research_pipeline.py`](examples/research_pipeline.py).
217
+
218
+ ## Why deterministic guards
219
+
220
+ LLMs are probabilistic. Most of the time the model follows the plan; occasionally
221
+ it calls `publish` before `review`, gets stuck re-searching the same thing, or
222
+ burns the budget. "Most of the time" is not a guarantee, and prompt-only
223
+ constraints are suggestions, not enforcement.
224
+
225
+ A state machine turns those soft expectations into a hard contract that lives in
226
+ code, runs on every tool call, and is trivial to unit-test. You get:
227
+
228
+ - **Safety** — illegal tool sequences cannot happen; they raise instead.
229
+ - **Cost control** — a real budget cap, enforced before the expensive call runs.
230
+ - **Loop protection** — runaway repetition trips a clear, typed exception.
231
+ - **Auditability** — `pilot.history` and `pilot.to_trace()` give you a complete,
232
+ JSON-serialisable record of what the agent actually did.
233
+
234
+ It is intentionally small. The whole core is a `StateMachine` plus a `Pilot`, and
235
+ the runtime cost is a dict lookup and a few integer comparisons per step.
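To make the "dict lookup and a few integer comparisons" claim concrete, here is a toy guard written from scratch. It is a sketch under stated assumptions, not statepilot's implementation: `MiniGuard` and `Blocked` are hypothetical names, and only the transition, step-cap, and budget checks are modelled.

```python
class Blocked(Exception):
    """Hypothetical stand-in for a guard violation."""


class MiniGuard:
    """Toy per-step guard; illustrates the idea only."""

    def __init__(self, table, initial, budget=None, max_steps=None):
        self.table = table  # {(state, tool): next_state}
        self.state = initial
        self.budget = budget
        self.max_steps = max_steps
        self.spent = 0.0
        self.steps = 0

    def step(self, tool, cost=0.0):
        dest = self.table.get((self.state, tool))  # the dict lookup
        if dest is None:
            raise Blocked(f"{tool!r} is not allowed in state {self.state!r}")
        if self.max_steps is not None and self.steps + 1 > self.max_steps:
            raise Blocked("step limit exceeded")
        if self.budget is not None and self.spent + cost > self.budget:
            raise Blocked("budget exceeded")
        # Every check passed, so only now is any state mutated.
        self.state = dest
        self.steps += 1
        self.spent += cost
        return dest


table = {
    ("research", "search"): "research",
    ("research", "write_draft"): "draft",
}
guard = MiniGuard(table, "research", budget=1.0)
guard.step("search", cost=0.5)

try:
    guard.step("write_draft", cost=1.0)  # 0.5 + 1.0 would exceed the budget
except Blocked:
    blocked = True
else:
    blocked = False
```

Checking everything before mutating anything is what leaves the guard in its previous state after a rejected step.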
236
+
237
+ ## API reference
238
+
239
+ ### `StateMachine`
240
+
241
+ Immutable, validated machine definition. Carries no runtime state.
242
+
243
+ - `StateMachine.builder(initial=None) -> StateMachineBuilder` — fluent builder.
244
+ - `StateMachine.from_dict(data) -> StateMachine` — build from a mapping.
245
+ - `StateMachine.from_yaml(text) -> StateMachine` — build from an inline YAML
246
+ string (needs the `yaml` extra).
247
+ - `StateMachine.from_yaml_file(path) -> StateMachine` — build from a YAML file
248
+ (needs the `yaml` extra).
249
+ - `.to_dict()` — round-trips with `from_dict`.
250
+ - `.allowed_tools(state) -> tuple[str, ...]`
251
+ - `.resolve(state, tool) -> str | None` — destination, or `None` if disallowed.
252
+ - `.is_terminal(state) -> bool`
253
+
254
+ ### `StateMachineBuilder`
255
+
256
+ - `.initial(state)`, `.state(*states)`, `.transition(src, dest, *, tool)`,
257
+ `.terminal(*states)`, `.build()`. Every mutator returns `self`.
258
+
259
+ ### `Pilot`
260
+
261
+ Stateful runtime enforcer. Construct with the machine and optional limits:
262
+
263
+ ```python
264
+ Pilot(
265
+     machine,
266
+     budget=None,               # cumulative cost cap
267
+     max_steps=None,            # total steps cap
268
+     max_state_visits=None,     # per-state visit cap (initial state counts as 1)
269
+     max_consecutive_tool=None  # same tool back-to-back cap
270
+ )
271
+ ```
272
+
273
+ - `.step(tool, *, cost=0.0) -> str` — validate + apply; returns the new state.
274
+ Raises on violation; state is unchanged on failure.
275
+ - `.can(tool, *, cost=0.0) -> bool` — pure check, never mutates, never raises.
276
+ - `.allowed_tools() -> tuple[str, ...]`, `.state`, `.done`, `.steps_taken`,
277
+ `.cost_spent`, `.history`.
278
+ - `.to_trace() -> dict` — JSON-serialisable run trace.
279
+ - `.reset()` — back to the initial state, clears cost/counters/history.
280
+
281
+ ### `@guarded(pilot, *, tool=None, cost=0.0)`
282
+
283
+ Decorator that calls `pilot.step(...)` before the function body. `tool` defaults
284
+ to the function name.
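The decorator behaviour described here (step first, body second, tool name defaulting to the function name) can be sketched in a few lines. Everything below is an assumption made for illustration: `guarded_sketch` and `RecordingPilot` are hypothetical stand-ins, and the real `@guarded` may differ in details.

```python
import functools


class RecordingPilot:
    """Hypothetical stand-in exposing only step(tool, cost=...)."""

    def __init__(self, allowed):
        self.allowed = set(allowed)
        self.calls = []

    def step(self, tool, *, cost=0.0):
        if tool not in self.allowed:
            raise PermissionError(f"{tool!r} is not allowed")
        self.calls.append((tool, cost))


def guarded_sketch(pilot, *, tool=None, cost=0.0):
    def decorate(func):
        name = tool if tool is not None else func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            pilot.step(name, cost=cost)  # raises before the body can run
            return func(*args, **kwargs)

        return wrapper

    return decorate


pilot = RecordingPilot(allowed={"search"})
executed = []  # records which function bodies actually ran


@guarded_sketch(pilot, cost=1.0)
def search(query):
    executed.append(query)
    return [query.upper()]


@guarded_sketch(pilot)
def publish(text):
    executed.append(text)
    return text


result = search("guards")
try:
    publish("draft")  # rejected by the stand-in pilot
except PermissionError:
    blocked = True
else:
    blocked = False
```

Because the guard call sits before `func(...)`, a rejected call leaves no trace in `executed`.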
285
+
286
+ ### Exceptions
287
+
288
+ ```
289
+ StatepilotError
290
+ ├── StateMachineError # invalid machine definition (definition-time)
291
+ └── GuardViolation # runtime rule broken — catch this for "agent misbehaved"
292
+     ├── TransitionError    # tool not allowed in the current state
293
+     ├── LoopLimitExceeded  # state revisited / tool repeated too often
294
+     ├── BudgetExceeded     # cumulative cost over budget
295
+     └── StepLimitExceeded  # too many total steps
296
+ ```
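The practical effect of this hierarchy is that one `except GuardViolation` clause covers all four runtime violations, while definition-time errors stay separate. The sketch below mirrors the tree with locally defined stand-in classes so it runs without the package installed. In real code you would import these names from `statepilot` instead, and `classify` is a hypothetical helper.

```python
# Local stand-ins that mirror the documented tree; not the library's classes.
class StatepilotError(Exception): ...
class StateMachineError(StatepilotError): ...
class GuardViolation(StatepilotError): ...
class TransitionError(GuardViolation): ...
class LoopLimitExceeded(GuardViolation): ...
class BudgetExceeded(GuardViolation): ...
class StepLimitExceeded(GuardViolation): ...


def classify(exc: Exception) -> str:
    """Map an exception to a coarse category for logging or retries."""
    if isinstance(exc, GuardViolation):
        return "agent misbehaved"
    if isinstance(exc, StateMachineError):
        return "bad machine definition"
    return "other"


labels = [
    classify(BudgetExceeded("over budget")),
    classify(TransitionError("publish before review")),
    classify(StateMachineError("no initial state")),
    classify(ValueError("unrelated")),
]
```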
297
+
298
+ ## LangGraph adapter (experimental)
299
+
300
+ If you orchestrate with [LangGraph](https://github.com/langchain-ai/langgraph),
301
+ `statepilot.adapters.guard_node` wraps a node so the pilot guards it:
302
+
303
+ ```python
304
+ from statepilot import StateMachine, Pilot
305
+ from statepilot.adapters import guard_node
306
+ # from langgraph.graph import StateGraph
307
+
308
+ pilot = Pilot(machine, budget=5.0)  # machine: a StateMachine built as in the Quickstart
309
+ # graph = StateGraph(MyState)
310
+ # graph.add_node("research", guard_node(pilot, research_node, cost=1.0))
311
+ # graph.add_node("draft", guard_node(pilot, draft_node))
312
+ ```
313
+
314
+ It targets LangGraph's stable *node contract* (a callable `state -> partial
315
+ state dict`) and **never imports langgraph itself**, so it adds no import-time
316
+ dependency and keeps working as long as that node contract holds. It is deliberately
317
+ minimal and marked experimental — conditional edges, `Send` fan-out, and
318
+ checkpoint/resume are out of scope. For full control, just drive the `Pilot`
319
+ inside your own node functions; that path is fully supported.
320
+
321
+ The adapter needs no extra dependency — it works with any callable. Install
322
+ LangGraph in your own project if you use it.
323
+
324
+ ## Concurrency
325
+
326
+ A `Pilot` holds the mutable state of **one** agent run and is **not
327
+ thread-safe** — use one pilot per run, don't share it across threads, and call
328
+ `pilot.reset()` to reuse it. `pilot.history` is an immutable snapshot (a tuple),
329
+ so reading or logging it can never desync the run's guards.
330
+
331
+ ## Status
332
+
333
+ Beta (`0.1.0`). The core API (`StateMachine`, `Pilot`, `@guarded`) is what we
334
+ intend to keep stable. No benchmarks are claimed — the design goal is
335
+ correctness and a tiny footprint, not throughput. Issues and PRs welcome.
336
+
337
+ ## License
338
+
339
+ MIT © 2026 StudioMeyer. See [LICENSE](LICENSE).