boolean-algebra-engine 0.1.4__tar.gz → 0.1.6__tar.gz
- boolean_algebra_engine-0.1.6/PKG-INFO +345 -0
- boolean_algebra_engine-0.1.6/README.md +298 -0
- boolean_algebra_engine-0.1.6/boolean_algebra_engine.egg-info/PKG-INFO +345 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/boolean_algebra_engine.egg-info/requires.txt +1 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/pyproject.toml +2 -2
- boolean_algebra_engine-0.1.4/PKG-INFO +0 -286
- boolean_algebra_engine-0.1.4/README.md +0 -240
- boolean_algebra_engine-0.1.4/boolean_algebra_engine.egg-info/PKG-INFO +0 -286
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/LICENSE +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/api/__init__.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/api/routes.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/boolean_algebra_engine.egg-info/SOURCES.txt +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/boolean_algebra_engine.egg-info/dependency_links.txt +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/boolean_algebra_engine.egg-info/entry_points.txt +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/boolean_algebra_engine.egg-info/top_level.txt +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/cli/__init__.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/cli/main.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/core/__init__.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/core/evaluator.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/core/models.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/core/parser.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/core/synthesizer.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/mcp_server/__init__.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/mcp_server/server.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/nl/__init__.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/nl/nl.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/setup.cfg +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_edge_cases.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_evaluator.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_integration.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_models.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_parser.py +0 -0
- {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_synthesizer.py +0 -0
|
@@ -0,0 +1,345 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: boolean-algebra-engine
|
|
3
|
+
Version: 0.1.6
|
|
4
|
+
Summary: Deterministic logic layer for AI agents — catch logical contradictions in system prompts, rules, and agent reasoning
|
|
5
|
+
Author-email: Aditya Shrivastava <aditya.shrivastava.architect@proton.me>
|
|
6
|
+
License-Expression: GPL-3.0-only
|
|
7
|
+
Project-URL: Homepage, https://github.com/Shrivastava-Aditya/boolean-algebra-engine-python
|
|
8
|
+
Project-URL: Repository, https://github.com/Shrivastava-Aditya/boolean-algebra-engine-python
|
|
9
|
+
Project-URL: Issues, https://github.com/Shrivastava-Aditya/boolean-algebra-engine-python/issues
|
|
10
|
+
Keywords: boolean,algebra,logic,truth-table,quine-mccluskey,digital-logic
|
|
11
|
+
Classifier: Development Status :: 3 - Alpha
|
|
12
|
+
Classifier: Intended Audience :: Developers
|
|
13
|
+
Classifier: Intended Audience :: Education
|
|
14
|
+
Classifier: Programming Language :: Python :: 3
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
19
|
+
Classifier: Topic :: Scientific/Engineering :: Mathematics
|
|
20
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
21
|
+
Requires-Python: >=3.9
|
|
22
|
+
Description-Content-Type: text/markdown
|
|
23
|
+
License-File: LICENSE
|
|
24
|
+
Provides-Extra: cli
|
|
25
|
+
Requires-Dist: typer>=0.12.0; extra == "cli"
|
|
26
|
+
Requires-Dist: rich>=13.0.0; extra == "cli"
|
|
27
|
+
Provides-Extra: mcp
|
|
28
|
+
Requires-Dist: mcp[cli]>=1.0.0; extra == "mcp"
|
|
29
|
+
Provides-Extra: nl-anthropic
|
|
30
|
+
Requires-Dist: anthropic>=0.50.0; extra == "nl-anthropic"
|
|
31
|
+
Provides-Extra: nl-openai
|
|
32
|
+
Requires-Dist: openai>=1.0.0; extra == "nl-openai"
|
|
33
|
+
Provides-Extra: nl
|
|
34
|
+
Requires-Dist: anthropic>=0.50.0; extra == "nl"
|
|
35
|
+
Provides-Extra: api
|
|
36
|
+
Requires-Dist: fastapi>=0.100.0; extra == "api"
|
|
37
|
+
Requires-Dist: uvicorn>=0.20.0; extra == "api"
|
|
38
|
+
Provides-Extra: api-cache
|
|
39
|
+
Requires-Dist: fastapi>=0.100.0; extra == "api-cache"
|
|
40
|
+
Requires-Dist: uvicorn>=0.20.0; extra == "api-cache"
|
|
41
|
+
Requires-Dist: redis>=5.0.0; extra == "api-cache"
|
|
42
|
+
Provides-Extra: dev
|
|
43
|
+
Requires-Dist: pytest>=8.0.0; extra == "dev"
|
|
44
|
+
Requires-Dist: httpx>=0.24.0; extra == "dev"
|
|
45
|
+
Requires-Dist: z3-solver>=4.12.0; extra == "dev"
|
|
46
|
+
Dynamic: license-file
|
|
47
|
+
|
|
48
|
+
# boolean-algebra-engine
|
|
49
|
+
|
|
50
|
+
**The logic layer your AI is missing.**
|
|
51
|
+
|
|
52
|
+
AI agents hallucinate on boolean logic — not sometimes, reliably. They predict the next token. They don't compute. This engine does. Deterministic, exhaustive, under 10ms. It sits inside your agent pipeline and makes one guarantee the model cannot make itself: that its reasoning is logically consistent.
|
|
53
|
+
|
|
54
|
+
```bash
|
|
55
|
+
pip install boolean-algebra-engine
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
---
|
|
59
|
+
|
|
60
|
+
## Quick start
|
|
61
|
+
|
|
62
|
+
Zero dependencies. Works immediately after install.
|
|
63
|
+
|
|
64
|
+
```python
|
|
65
|
+
from core.evaluator import evaluate
|
|
66
|
+
from core.synthesizer import synthesize
|
|
67
|
+
|
|
68
|
+
# Does a contradiction exist?
|
|
69
|
+
table, _ = evaluate("A.!A")
|
|
70
|
+
print(table.satisfiable) # False — always a contradiction
|
|
71
|
+
|
|
72
|
+
# Can two rules both be true simultaneously?
|
|
73
|
+
table, _ = evaluate("(A.B).(!A)")
|
|
74
|
+
print(table.satisfiable) # False — A and !A can't both hold
|
|
75
|
+
|
|
76
|
+
# Full truth table
|
|
77
|
+
table, _ = evaluate("A.(B+C)")
|
|
78
|
+
print(table.variables) # ['A', 'B', 'C']
|
|
79
|
+
print(table.minterms) # [5, 6, 7]
|
|
80
|
+
print(table.satisfiable) # True
|
|
81
|
+
|
|
82
|
+
# Simplify to minimal form
|
|
83
|
+
minimal, _ = synthesize(table)
|
|
84
|
+
print(minimal) # A.C+A.B
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
---
|
|
88
|
+
|
|
89
|
+
## The problem
|
|
90
|
+
|
|
91
|
+
Four rules. Three variables. Written by four people over six months.
|
|
92
|
+
|
|
93
|
+
A fintech AI agent auto-approves or rejects loan applications based on these rules, yet nobody ever verified them together. The engine checks all 8 input combinations for every rule and for every pair of rules:
|
|
94
|
+
|
|
95
|
+
```python
|
|
96
|
+
# pip install boolean-algebra-engine[mcp]
|
|
97
|
+
from mcp_server.server import check_prompt_logic
|
|
98
|
+
|
|
99
|
+
result = check_prompt_logic([
|
|
100
|
+
"A.B", # approve: good credit AND income verified
|
|
101
|
+
"!A", # reject: bad credit
|
|
102
|
+
"C", # approve: collateral exists
|
|
103
|
+
"!C", # reject: no collateral
|
|
104
|
+
])
|
|
105
|
+
|
|
106
|
+
print(result["summary"])
|
|
107
|
+
# {'total': 4, 'contradictions': 0, 'tautologies': 0,
|
|
108
|
+
# 'equivalent_pairs': 0, 'conflicting_pairs': 2}
|
|
109
|
+
|
|
110
|
+
print([(p["rule1"], p["rule2"]) for p in result["pairwise"] if p["always_conflict"]])
|
|
111
|
+
# [('A.B', '!A'), ('C', '!C')]
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
**What it found:**
|
|
115
|
+
- `A.B` and `!A` conflict: `A.B` needs `A=1` and `!A` needs `A=0`, so no input satisfies both. The good-credit approval and the bad-credit rejection can never fire together.
|
|
116
|
+
- `C` and `!C` conflict: collateral approval and no-collateral rejection are mutually exclusive by definition. The two rules can never apply at the same time.
|
|
117
|
+
|
|
118
|
+
Nobody caught these by reading the rules. The engine caught them by checking every combination.
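The pairwise audit can be approximated in a few lines of plain Python. This is a minimal sketch, not the package's `check_prompt_logic`: the rules are hand-translated into lambdas instead of being parsed from strings, and `conflicting_pairs` is a hypothetical helper name introduced only for illustration.

```python
from itertools import combinations, product

# The four loan rules, hand-translated to Python (an assumption of this sketch;
# the engine itself works on the string form).
RULES = {
    "A.B": lambda a, b, c: a and b,   # approve: good credit AND income verified
    "!A":  lambda a, b, c: not a,     # reject: bad credit
    "C":   lambda a, b, c: c,         # approve: collateral exists
    "!C":  lambda a, b, c: not c,     # reject: no collateral
}

def conflicting_pairs(rules, n_vars=3):
    """Return the rule pairs whose conjunction is false on every input row."""
    inputs = list(product([False, True], repeat=n_vars))  # all 2**n_vars rows
    found = []
    for (name1, f1), (name2, f2) in combinations(rules.items(), 2):
        if not any(f1(*row) and f2(*row) for row in inputs):
            found.append((name1, name2))
    return found

print(conflicting_pairs(RULES))  # [('A.B', '!A'), ('C', '!C')]
```

With three variables each pair costs at most 8 evaluations, so the whole audit is 6 pairs times 8 rows.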
|
|
119
|
+
|
|
120
|
+
---
|
|
121
|
+
|
|
122
|
+
## The benchmark
|
|
123
|
+
|
|
124
|
+
The engine is the oracle — ground truth is computed by exhaustive enumeration, not guessed. Every LLM disagreement is a provable hallucination.
|
|
125
|
+
|
|
126
|
+
**Methodology:** generate pairs of boolean expressions for which the correct answer (whether both can be true at once) is known exactly. Ask the LLM. Compare. No ambiguity, no human labeling, no interpretation.
|
|
127
|
+
|
|
128
|
+
```bash
|
|
129
|
+
python3 benchmark.py --provider ollama --model tinyllama --cases 20
|
|
130
|
+
python3 benchmark.py --provider ollama --model llama3.2:3b --cases 20
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
**tinyllama — 1.1B parameters**
|
|
134
|
+
|
|
135
|
+
```
|
|
136
|
+
⬡ z3 verifying 20 ground truth labels... ✓ all 20 cases agree
|
|
137
|
+
|
|
138
|
+
╭───────────── benchmark config ──────────────╮
|
|
139
|
+
│ model ollama/tinyllama │
|
|
140
|
+
│ cases 20 (10 conflict · 10 compat) │
|
|
141
|
+
│ variables 3 (A, B, C) │
|
|
142
|
+
│ temperature 0 (deterministic) │
|
|
143
|
+
│ max tokens 5 (yes / no) │
|
|
144
|
+
│ workers 8 parallel │
|
|
145
|
+
╰─────────────────────────────────────────────╯
|
|
146
|
+
|
|
147
|
+
ollama/tinyllama — 20/20 cases | 50.0% hallucination rate
|
|
148
|
+
|
|
149
|
+
# Rule 1 Rule 2 vars engine llm
|
|
150
|
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
151
|
+
1 ✗ B !B B no yes
|
|
152
|
+
2 ✗ A.B+C !A.!B.!C A B C no yes
|
|
153
|
+
3 ✗ A.B A.!B A B no yes
|
|
154
|
+
4 ✓ A+!B A.(B+C) A B C yes yes
|
|
155
|
+
5 ✗ A.B A^B A B no yes
|
|
156
|
+
6 ✓ !A+B.C B A B C yes yes
|
|
157
|
+
7 ✓ A.B+C A+B A B C yes yes
|
|
158
|
+
8 ✓ A+B.C.D C A B C D yes yes
|
|
159
|
+
9 ✓ A.B B A B yes yes
|
|
160
|
+
10 ✓ !C !B B C yes yes
|
|
161
|
+
...
|
|
162
|
+
|
|
163
|
+
╭─────────── results — ollama/tinyllama ─────────────╮
|
|
164
|
+
│ model ollama/tinyllama │
|
|
165
|
+
│ total cases 20 (10 conflict · 10 compat) │
|
|
166
|
+
│ variables 3 (A, B, C) │
|
|
167
|
+
│ temperature 0 (deterministic) │
|
|
168
|
+
│ max tokens 5 │
|
|
169
|
+
│ correct 10 │
|
|
170
|
+
│ hallucinated 10 │
|
|
171
|
+
│ hallucination rate 50.0% │
|
|
172
|
+
│ missed conflicts 10/10 (100.0%) │
|
|
173
|
+
│ missed compatibles 0/10 (0.0%) │
|
|
174
|
+
╰────────────────────────────────────────────────────╯
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
**llama3.2:3b — 3B parameters**
|
|
178
|
+
|
|
179
|
+
```
|
|
180
|
+
⬡ z3 verifying 20 ground truth labels... ✓ all 20 cases agree
|
|
181
|
+
|
|
182
|
+
╭───────────── benchmark config ──────────────╮
|
|
183
|
+
│ model ollama/llama3.2:3b │
|
|
184
|
+
│ cases 20 (10 conflict · 10 compat) │
|
|
185
|
+
│ variables 4 (A, B, C, D) │
|
|
186
|
+
│ temperature 0 (deterministic) │
|
|
187
|
+
│ max tokens 5 (yes / no) │
|
|
188
|
+
│ workers 8 parallel │
|
|
189
|
+
╰─────────────────────────────────────────────╯
|
|
190
|
+
|
|
191
|
+
ollama/llama3.2:3b — 20/20 cases | 50.0% hallucination rate
|
|
192
|
+
|
|
193
|
+
# Rule 1 Rule 2 vars engine llm
|
|
194
|
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
195
|
+
1 ✓ B !B B no no
|
|
196
|
+
2 ✓ A.B+C !A.!B.!C A B C no no
|
|
197
|
+
3 ✓ A.B A.!B A B no no
|
|
198
|
+
4 ✗ A+!B A.(B+C) A B C yes no
|
|
199
|
+
5 ✓ A.B A^B A B no no
|
|
200
|
+
6 ✗ !A+B.C B A B C yes no
|
|
201
|
+
7 ✗ A.B+C A+B A B C yes no
|
|
202
|
+
8 ✗ A+B.C.D C A B C D yes no
|
|
203
|
+
9 ✗ A.B B A B yes no
|
|
204
|
+
10 ✗ !C !B B C yes no
|
|
205
|
+
...
|
|
206
|
+
|
|
207
|
+
╭─────────── results — ollama/llama3.2:3b ───────────╮
|
|
208
|
+
│ model ollama/llama3.2:3b │
|
|
209
|
+
│ total cases 20 (10 conflict · 10 compat) │
|
|
210
|
+
│ variables 4 (A, B, C, D) │
|
|
211
|
+
│ temperature 0 (deterministic) │
|
|
212
|
+
│ max tokens 5 │
|
|
213
|
+
│ correct 10 │
|
|
214
|
+
│ hallucinated 10 │
|
|
215
|
+
│ hallucination rate 50.0% │
|
|
216
|
+
│ missed conflicts 0/10 (0.0%) │
|
|
217
|
+
│ missed compatibles 10/10 (100.0%) │
|
|
218
|
+
╰────────────────────────────────────────────────────╯
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
Both models score 50% — equal to a coin flip — but in opposite directions. tinyllama always answers "yes", llama3.2:3b always answers "no". Neither is reasoning. Both are outputting a constant.
|
|
222
|
+
|
|
223
|
+
The `vars` column lists the variables each case involves. The `engine` column is ground truth. Every mismatch with `llm` is a provable hallucination, not an opinion.
|
|
224
|
+
|
|
225
|
+

|
|
226
|
+
|
|
227
|
+
Per-case strips (bottom row of the chart): every conflict cell is uniformly one colour per model, every compatible cell is the opposite. No case-by-case variation — no reasoning happening at all.
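The 50% figure follows directly from the balanced case mix: any constant answer is right on exactly one half of the cases. The sketch below reproduces that arithmetic. It is not taken from `benchmark.py`; `hallucination_rate` is a hypothetical helper, and the labels assume `"no"` marks a conflicting pair and `"yes"` a compatible one, as in the tables above.

```python
def hallucination_rate(truth, answers):
    """Fraction of cases where the model's answer differs from ground truth."""
    wrong = sum(t != a for t, a in zip(truth, answers))
    return wrong / len(truth)

# Balanced mix from the benchmark config: 10 conflict ("no") + 10 compatible ("yes").
truth = ["no"] * 10 + ["yes"] * 10

always_yes = ["yes"] * len(truth)  # the pattern observed for tinyllama
always_no = ["no"] * len(truth)    # the pattern observed for llama3.2:3b

print(hallucination_rate(truth, always_yes))  # 0.5
print(hallucination_rate(truth, always_no))   # 0.5
```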
|
|
228
|
+
|
|
229
|
+
---
|
|
230
|
+
|
|
231
|
+
## Install
|
|
232
|
+
|
|
233
|
+
```bash
|
|
234
|
+
# Core engine — zero dependencies
|
|
235
|
+
pip install boolean-algebra-engine
|
|
236
|
+
|
|
237
|
+
# With CLI
|
|
238
|
+
pip install "boolean-algebra-engine[cli]"
|
|
239
|
+
|
|
240
|
+
# With MCP server (for Claude Desktop)
|
|
241
|
+
pip install "boolean-algebra-engine[mcp]"
|
|
242
|
+
|
|
243
|
+
# With REST API
|
|
244
|
+
pip install "boolean-algebra-engine[api]"
|
|
245
|
+
|
|
246
|
+
# With NL layer (Anthropic)
|
|
247
|
+
pip install "boolean-algebra-engine[nl-anthropic]"
|
|
248
|
+
|
|
249
|
+
# With NL layer (OpenAI)
|
|
250
|
+
pip install "boolean-algebra-engine[nl-openai]"
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
---
|
|
254
|
+
|
|
255
|
+
## Core API
|
|
256
|
+
|
|
257
|
+
```python
|
|
258
|
+
from core.evaluator import evaluate
|
|
259
|
+
from core.synthesizer import synthesize
|
|
260
|
+
|
|
261
|
+
# Forward: expression → truth table
|
|
262
|
+
table, _ = evaluate("A.(B+C)")
|
|
263
|
+
print(table.variables) # ['A', 'B', 'C']
|
|
264
|
+
print(table.minterms) # [5, 6, 7]
|
|
265
|
+
print(table.satisfiable) # True
|
|
266
|
+
|
|
267
|
+
# Inverse: truth table → minimal expression
|
|
268
|
+
minimal, _ = synthesize(table)
|
|
269
|
+
print(minimal) # A.C+A.B
|
|
270
|
+
|
|
271
|
+
# Equivalence and satisfiability (via MCP server functions — no HTTP, direct call)
|
|
272
|
+
# pip install boolean-algebra-engine[mcp]
|
|
273
|
+
from mcp_server.server import equivalent, satisfiable
|
|
274
|
+
|
|
275
|
+
print(equivalent("A.(B+C)", "A.B+A.C")["equivalent"]) # True — distributive law
|
|
276
|
+
print(satisfiable("A.!A")["satisfiable"]) # False — contradiction
|
|
277
|
+
```
|
|
278
|
+
|
|
279
|
+
`core/` has zero external dependencies. Import it into any Python project.
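The minterm indices `[5, 6, 7]` printed above can be checked by hand. A minimal sketch, assuming the first variable is the most significant bit of the row index (which is consistent with the output shown); `minterms` here is a stand-in written for illustration, not the package's implementation, and the expression is passed as a Python lambda rather than a string.

```python
from itertools import product

def minterms(f, n_vars):
    """Indices of the truth-table rows where f is true (first variable = MSB)."""
    result = []
    for bits in product([0, 1], repeat=n_vars):
        index = 0
        for bit in bits:                 # fold the bits into a row number
            index = (index << 1) | bit
        if f(*bits):
            result.append(index)
    return result

# A.(B+C) is true on rows 101, 110 and 111.
print(minterms(lambda a, b, c: a and (b or c), 3))  # [5, 6, 7]
```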
|
|
280
|
+
|
|
281
|
+
---
|
|
282
|
+
|
|
283
|
+
## MCP — Claude calls the engine
|
|
284
|
+
|
|
285
|
+
Wire the engine into Claude Desktop and Claude stops predicting boolean logic. It computes it.
|
|
286
|
+
|
|
287
|
+
```json
|
|
288
|
+
{
|
|
289
|
+
"mcpServers": {
|
|
290
|
+
"boolean-algebra-engine": {
|
|
291
|
+
"command": "python",
|
|
292
|
+
"args": ["-m", "mcp_server.server"]
|
|
293
|
+
}
|
|
294
|
+
}
|
|
295
|
+
}
|
|
296
|
+
```
|
|
297
|
+
|
|
298
|
+
Five tools Claude can call mid-conversation:
|
|
299
|
+
- `evaluate` — expression → truth table
|
|
300
|
+
- `simplify` — expression → minimal form
|
|
301
|
+
- `equivalent` — are two expressions identical?
|
|
302
|
+
- `satisfiable` — does any input make this true?
|
|
303
|
+
- `check_prompt_logic` — audit a full rule set for contradictions, tautologies, conflicts, duplicates
|
|
304
|
+
|
|
305
|
+
---
|
|
306
|
+
|
|
307
|
+
## Operators
|
|
308
|
+
|
|
309
|
+
| Symbol | Operation | Precedence |
|
|
310
|
+
|--------|-----------|------------|
|
|
311
|
+
| `!` | NOT | 4 (highest) |
|
|
312
|
+
| `.` | AND | 3 |
|
|
313
|
+
| `^` | XOR | 2 |
|
|
314
|
+
| `+` | OR | 1 (lowest) |
|
|
315
|
+
|
|
316
|
+
Variables: uppercase `A`–`Z`. Parentheses override precedence. Up to 26 variables, arbitrary nesting.
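To make the precedence table concrete, here is a small recursive-descent evaluator with one function per precedence level. It is an illustrative sketch only and is not claimed to match `core/parser.py`; `eval_expr` and its `env` argument are hypothetical names.

```python
def eval_expr(expr, env):
    """Evaluate a boolean expression under env, e.g. {"A": True, "B": False}."""
    tokens = [ch for ch in expr if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():                      # precedence 1 (lowest): +
        value = parse_xor()
        while peek() == "+":
            take()
            rhs = parse_xor()            # parse first, then combine
            value = value or rhs
        return value

    def parse_xor():                     # precedence 2: ^
        value = parse_and()
        while peek() == "^":
            take()
            rhs = parse_and()
            value = value != rhs
        return value

    def parse_and():                     # precedence 3: .
        value = parse_not()
        while peek() == ".":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():                     # precedence 4 (highest): !
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():                    # variable or parenthesised group
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if len(tok) == 1 and "A" <= tok <= "Z":
            return bool(env[tok])
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result

print(eval_expr("A+B.C", {"A": False, "B": True, "C": False}))  # False: read as A+(B.C)
print(eval_expr("!A.B", {"A": False, "B": True}))               # True: read as (!A).B
```

Each level calls the next-tighter level for its operands, which is what makes `.` bind tighter than `^` and `^` tighter than `+`.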
|
|
317
|
+
|
|
318
|
+
---
|
|
319
|
+
|
|
320
|
+
## Interfaces
|
|
321
|
+
|
|
322
|
+
| Interface | How |
|
|
323
|
+
|---|---|
|
|
324
|
+
| **Python library** | `from core.evaluator import evaluate` — embed in any project |
|
|
325
|
+
| **CLI / REPL** | `boolcalc "A.B+!A.C"` — instant truth table in terminal |
|
|
326
|
+
| **MCP server** | Claude Desktop plugin — plug and play |
|
|
327
|
+
| **REST API** | `POST /check-rules` — callable from any language or stack |
|
|
328
|
+
| **NL layer** | Plain English → expression → verified result (Anthropic, OpenAI, Ollama, any OpenAI-compat) |
|
|
329
|
+
| **Streamlit UI** | Three modes: Expression, Rule Auditor, Plain English |
|
|
330
|
+
|
|
331
|
+
---
|
|
332
|
+
|
|
333
|
+
## Credibility
|
|
334
|
+
|
|
335
|
+
The engine does not sample, approximate, or predict. It evaluates every possible input combination:
|
|
336
|
+
|
|
337
|
+
- **Satisfiable** — an actual row where output = 1 was found
|
|
338
|
+
- **Contradiction** — every row was checked, all were 0
|
|
339
|
+
- **Equivalent** — output columns compared row-by-row across the full truth table
|
|
340
|
+
- **Conflict** — conjunction of both rules evaluated for every input, always returned 0
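The four definitions above map directly onto exhaustive enumeration. The following is a minimal sketch under stated assumptions: expressions are given as Python callables instead of engine strings, and the function names (`is_satisfiable`, `is_contradiction`, `are_equivalent`, `always_conflict`) are hypothetical, chosen so they do not clash with the package's own `satisfiable` and `equivalent`.

```python
from itertools import product

def rows(n_vars):
    """Every possible input row: 2**n_vars tuples of booleans."""
    return product([False, True], repeat=n_vars)

def is_satisfiable(f, n_vars):
    # True as soon as one row with output 1 is found.
    return any(f(*row) for row in rows(n_vars))

def is_contradiction(f, n_vars):
    # Every row is checked and every output is 0.
    return all(not f(*row) for row in rows(n_vars))

def are_equivalent(f, g, n_vars):
    # Output columns compared row by row.
    return all(bool(f(*row)) == bool(g(*row)) for row in rows(n_vars))

def always_conflict(f, g, n_vars):
    # The conjunction of both rules is 0 on every row.
    return all(not (f(*row) and g(*row)) for row in rows(n_vars))

# Distributive law: A.(B+C) == A.B+A.C
print(are_equivalent(lambda a, b, c: a and (b or c),
                     lambda a, b, c: (a and b) or (a and c), 3))   # True
print(is_contradiction(lambda a: a and not a, 1))                  # True
print(always_conflict(lambda a, b: a and b, lambda a, b: not a, 2))  # True
```

The cost grows as 2**n in the number of variables, which is the price of checking every row rather than sampling.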
|
|
341
|
+
|
|
342
|
+
The core evaluator is 15 lines (`core/evaluator.py`). No black box, no model weights, no probability — just arithmetic. This is a stronger correctness claim than any probabilistic tool can make.
|
|
343
|
+
|
|
344
|
+
90 tests across unit, integration, edge cases, and round-trips. All passing.
|
|
345
|
+
|
|
@@ -0,0 +1,298 @@
|
|
|
1
|
+
# boolean-algebra-engine
|
|
2
|
+
|
|
3
|
+
**The logic layer your AI is missing.**
|
|
4
|
+
|
|
5
|
+
AI agents hallucinate on boolean logic — not sometimes, reliably. They predict the next token. They don't compute. This engine does. Deterministic, exhaustive, under 10ms. It sits inside your agent pipeline and makes one guarantee the model cannot make itself: that its reasoning is logically consistent.
|
|
6
|
+
|
|
7
|
+
```bash
|
|
8
|
+
pip install boolean-algebra-engine
|
|
9
|
+
```
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Quick start
|
|
14
|
+
|
|
15
|
+
Zero dependencies. Works immediately after install.
|
|
16
|
+
|
|
17
|
+
```python
|
|
18
|
+
from core.evaluator import evaluate
|
|
19
|
+
from core.synthesizer import synthesize
|
|
20
|
+
|
|
21
|
+
# Does a contradiction exist?
|
|
22
|
+
table, _ = evaluate("A.!A")
|
|
23
|
+
print(table.satisfiable) # False — always a contradiction
|
|
24
|
+
|
|
25
|
+
# Can two rules both be true simultaneously?
|
|
26
|
+
table, _ = evaluate("(A.B).(!A)")
|
|
27
|
+
print(table.satisfiable) # False — A and !A can't both hold
|
|
28
|
+
|
|
29
|
+
# Full truth table
|
|
30
|
+
table, _ = evaluate("A.(B+C)")
|
|
31
|
+
print(table.variables) # ['A', 'B', 'C']
|
|
32
|
+
print(table.minterms) # [5, 6, 7]
|
|
33
|
+
print(table.satisfiable) # True
|
|
34
|
+
|
|
35
|
+
# Simplify to minimal form
|
|
36
|
+
minimal, _ = synthesize(table)
|
|
37
|
+
print(minimal) # A.C+A.B
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
---
|
|
41
|
+
|
|
42
|
+
## The problem
|
|
43
|
+
|
|
44
|
+
Four rules. Three variables. Written by four people over six months.
|
|
45
|
+
|
|
46
|
+
A fintech AI agent auto-approves or rejects loan applications based on these rules, yet nobody ever verified them together. The engine checks all 8 input combinations for every rule and for every pair of rules:
|
|
47
|
+
|
|
48
|
+
```python
|
|
49
|
+
# pip install boolean-algebra-engine[mcp]
|
|
50
|
+
from mcp_server.server import check_prompt_logic
|
|
51
|
+
|
|
52
|
+
result = check_prompt_logic([
|
|
53
|
+
"A.B", # approve: good credit AND income verified
|
|
54
|
+
"!A", # reject: bad credit
|
|
55
|
+
"C", # approve: collateral exists
|
|
56
|
+
"!C", # reject: no collateral
|
|
57
|
+
])
|
|
58
|
+
|
|
59
|
+
print(result["summary"])
|
|
60
|
+
# {'total': 4, 'contradictions': 0, 'tautologies': 0,
|
|
61
|
+
# 'equivalent_pairs': 0, 'conflicting_pairs': 2}
|
|
62
|
+
|
|
63
|
+
print([(p["rule1"], p["rule2"]) for p in result["pairwise"] if p["always_conflict"]])
|
|
64
|
+
# [('A.B', '!A'), ('C', '!C')]
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
**What it found:**
|
|
68
|
+
- `A.B` and `!A` conflict: `A.B` needs `A=1` and `!A` needs `A=0`, so no input satisfies both. The good-credit approval and the bad-credit rejection can never fire together.
|
|
69
|
+
- `C` and `!C` conflict: collateral approval and no-collateral rejection are mutually exclusive by definition. The two rules can never apply at the same time.
|
|
70
|
+
|
|
71
|
+
Nobody caught these by reading the rules. The engine caught them by checking every combination.
|
|
72
|
+
|
|
73
|
+
---
|
|
74
|
+
|
|
75
|
+
## The benchmark
|
|
76
|
+
|
|
77
|
+
The engine is the oracle — ground truth is computed by exhaustive enumeration, not guessed. Every LLM disagreement is a provable hallucination.
|
|
78
|
+
|
|
79
|
+
**Methodology:** generate pairs of boolean expressions for which the correct answer (whether both can be true at once) is known exactly. Ask the LLM. Compare. No ambiguity, no human labeling, no interpretation.
|
|
80
|
+
|
|
81
|
+
```bash
|
|
82
|
+
python3 benchmark.py --provider ollama --model tinyllama --cases 20
|
|
83
|
+
python3 benchmark.py --provider ollama --model llama3.2:3b --cases 20
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
**tinyllama — 1.1B parameters**
|
|
87
|
+
|
|
88
|
+
```
|
|
89
|
+
⬡ z3 verifying 20 ground truth labels... ✓ all 20 cases agree
|
|
90
|
+
|
|
91
|
+
╭───────────── benchmark config ──────────────╮
|
|
92
|
+
│ model ollama/tinyllama │
|
|
93
|
+
│ cases 20 (10 conflict · 10 compat) │
|
|
94
|
+
│ variables 3 (A, B, C) │
|
|
95
|
+
│ temperature 0 (deterministic) │
|
|
96
|
+
│ max tokens 5 (yes / no) │
|
|
97
|
+
│ workers 8 parallel │
|
|
98
|
+
╰─────────────────────────────────────────────╯
|
|
99
|
+
|
|
100
|
+
ollama/tinyllama — 20/20 cases | 50.0% hallucination rate
|
|
101
|
+
|
|
102
|
+
# Rule 1 Rule 2 vars engine llm
|
|
103
|
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
104
|
+
1 ✗ B !B B no yes
|
|
105
|
+
2 ✗ A.B+C !A.!B.!C A B C no yes
|
|
106
|
+
3 ✗ A.B A.!B A B no yes
|
|
107
|
+
4 ✓ A+!B A.(B+C) A B C yes yes
|
|
108
|
+
5 ✗ A.B A^B A B no yes
|
|
109
|
+
6 ✓ !A+B.C B A B C yes yes
|
|
110
|
+
7 ✓ A.B+C A+B A B C yes yes
|
|
111
|
+
8 ✓ A+B.C.D C A B C D yes yes
|
|
112
|
+
9 ✓ A.B B A B yes yes
|
|
113
|
+
10 ✓ !C !B B C yes yes
|
|
114
|
+
...
|
|
115
|
+
|
|
116
|
+
╭─────────── results — ollama/tinyllama ─────────────╮
|
|
117
|
+
│ model ollama/tinyllama │
|
|
118
|
+
│ total cases 20 (10 conflict · 10 compat) │
|
|
119
|
+
│ variables 3 (A, B, C) │
|
|
120
|
+
│ temperature 0 (deterministic) │
|
|
121
|
+
│ max tokens 5 │
|
|
122
|
+
│ correct 10 │
|
|
123
|
+
│ hallucinated 10 │
|
|
124
|
+
│ hallucination rate 50.0% │
|
|
125
|
+
│ missed conflicts 10/10 (100.0%) │
|
|
126
|
+
│ missed compatibles 0/10 (0.0%) │
|
|
127
|
+
╰────────────────────────────────────────────────────╯
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
**llama3.2:3b — 3B parameters**
|
|
131
|
+
|
|
132
|
+
```
|
|
133
|
+
⬡ z3 verifying 20 ground truth labels... ✓ all 20 cases agree
|
|
134
|
+
|
|
135
|
+
╭───────────── benchmark config ──────────────╮
|
|
136
|
+
│ model ollama/llama3.2:3b │
|
|
137
|
+
│ cases 20 (10 conflict · 10 compat) │
|
|
138
|
+
│ variables 4 (A, B, C, D) │
|
|
139
|
+
│ temperature 0 (deterministic) │
|
|
140
|
+
│ max tokens 5 (yes / no) │
|
|
141
|
+
│ workers 8 parallel │
|
|
142
|
+
╰─────────────────────────────────────────────╯
|
|
143
|
+
|
|
144
|
+
ollama/llama3.2:3b — 20/20 cases | 50.0% hallucination rate
|
|
145
|
+
|
|
146
|
+
# Rule 1 Rule 2 vars engine llm
|
|
147
|
+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
148
|
+
1 ✓ B !B B no no
|
|
149
|
+
2 ✓ A.B+C !A.!B.!C A B C no no
|
|
150
|
+
3 ✓ A.B A.!B A B no no
|
|
151
|
+
4 ✗ A+!B A.(B+C) A B C yes no
|
|
152
|
+
5 ✓ A.B A^B A B no no
|
|
153
|
+
6 ✗ !A+B.C B A B C yes no
|
|
154
|
+
7 ✗ A.B+C A+B A B C yes no
|
|
155
|
+
8 ✗ A+B.C.D C A B C D yes no
|
|
156
|
+
9 ✗ A.B B A B yes no
|
|
157
|
+
10 ✗ !C !B B C yes no
|
|
158
|
+
...
|
|
159
|
+
|
|
160
|
+
╭─────────── results — ollama/llama3.2:3b ───────────╮
|
|
161
|
+
│ model ollama/llama3.2:3b │
|
|
162
|
+
│ total cases 20 (10 conflict · 10 compat) │
|
|
163
|
+
│ variables 4 (A, B, C, D) │
|
|
164
|
+
│ temperature 0 (deterministic) │
|
|
165
|
+
│ max tokens 5 │
|
|
166
|
+
│ correct 10 │
|
|
167
|
+
│ hallucinated 10 │
|
|
168
|
+
│ hallucination rate 50.0% │
|
|
169
|
+
│ missed conflicts 0/10 (0.0%) │
|
|
170
|
+
│ missed compatibles 10/10 (100.0%) │
|
|
171
|
+
╰────────────────────────────────────────────────────╯
|
|
172
|
+
```
|
|
173
|
+
|
|
174
|
+
Both models score 50% — equal to a coin flip — but in opposite directions. tinyllama always answers "yes", llama3.2:3b always answers "no". Neither is reasoning. Both are outputting a constant.
|
|
175
|
+
|
|
176
|
+
The `vars` column lists the variables each case involves. The `engine` column is ground truth. Every mismatch with `llm` is a provable hallucination, not an opinion.
|
|
177
|
+
|
|
178
|
+

|
|
179
|
+
|
|
180
|
+
Per-case strips (bottom row of the chart): every conflict cell is uniformly one colour per model, every compatible cell is the opposite. No case-by-case variation — no reasoning happening at all.
|
|
181
|
+
|
|
182
|
+
---
|
|
183
|
+
|
|
184
|
+
## Install
|
|
185
|
+
|
|
186
|
+
```bash
|
|
187
|
+
# Core engine — zero dependencies
|
|
188
|
+
pip install boolean-algebra-engine
|
|
189
|
+
|
|
190
|
+
# With CLI
|
|
191
|
+
pip install "boolean-algebra-engine[cli]"
|
|
192
|
+
|
|
193
|
+
# With MCP server (for Claude Desktop)
|
|
194
|
+
pip install "boolean-algebra-engine[mcp]"
|
|
195
|
+
|
|
196
|
+
# With REST API
|
|
197
|
+
pip install "boolean-algebra-engine[api]"
|
|
198
|
+
|
|
199
|
+
# With NL layer (Anthropic)
|
|
200
|
+
pip install "boolean-algebra-engine[nl-anthropic]"
|
|
201
|
+
|
|
202
|
+
# With NL layer (OpenAI)
|
|
203
|
+
pip install "boolean-algebra-engine[nl-openai]"
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
---
|
|
207
|
+
|
|
208
|
+
## Core API
|
|
209
|
+
|
|
210
|
+
```python
|
|
211
|
+
from core.evaluator import evaluate
|
|
212
|
+
from core.synthesizer import synthesize
|
|
213
|
+
|
|
214
|
+
# Forward: expression → truth table
|
|
215
|
+
table, _ = evaluate("A.(B+C)")
|
|
216
|
+
print(table.variables) # ['A', 'B', 'C']
|
|
217
|
+
print(table.minterms) # [5, 6, 7]
|
|
218
|
+
print(table.satisfiable) # True
|
|
219
|
+
|
|
220
|
+
# Inverse: truth table → minimal expression
|
|
221
|
+
minimal, _ = synthesize(table)
|
|
222
|
+
print(minimal) # A.C+A.B
|
|
223
|
+
|
|
224
|
+
# Equivalence and satisfiability (via MCP server functions — no HTTP, direct call)
|
|
225
|
+
# pip install boolean-algebra-engine[mcp]
|
|
226
|
+
from mcp_server.server import equivalent, satisfiable
|
|
227
|
+
|
|
228
|
+
print(equivalent("A.(B+C)", "A.B+A.C")["equivalent"]) # True — distributive law
|
|
229
|
+
print(satisfiable("A.!A")["satisfiable"]) # False — contradiction
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
`core/` has zero external dependencies. Import it into any Python project.
|
|
233
|
+
|
|
234
|
+
---
|
|
235
|
+
|
|
236
|
+
## MCP — Claude calls the engine
|
|
237
|
+
|
|
238
|
+
Wire the engine into Claude Desktop and Claude stops predicting boolean logic. It computes it.
|
|
239
|
+
|
|
240
|
+
```json
|
|
241
|
+
{
|
|
242
|
+
"mcpServers": {
|
|
243
|
+
"boolean-algebra-engine": {
|
|
244
|
+
"command": "python",
|
|
245
|
+
"args": ["-m", "mcp_server.server"]
|
|
246
|
+
}
|
|
247
|
+
}
|
|
248
|
+
}
|
|
249
|
+
```
|
|
250
|
+
|
|
251
|
+
Five tools Claude can call mid-conversation:
|
|
252
|
+
- `evaluate` — expression → truth table
|
|
253
|
+
- `simplify` — expression → minimal form
|
|
254
|
+
- `equivalent` — are two expressions identical?
|
|
255
|
+
- `satisfiable` — does any input make this true?
|
|
256
|
+
- `check_prompt_logic` — audit a full rule set for contradictions, tautologies, conflicts, duplicates
|
|
257
|
+
|
|
258
|
+
---
|
|
259
|
+
|
|
260
|
+
## Operators
|
|
261
|
+
|
|
262
|
+
| Symbol | Operation | Precedence |
|
|
263
|
+
|--------|-----------|------------|
|
|
264
|
+
| `!` | NOT | 4 (highest) |
|
|
265
|
+
| `.` | AND | 3 |
|
|
266
|
+
| `^` | XOR | 2 |
|
|
267
|
+
| `+` | OR | 1 (lowest) |
|
|
268
|
+
|
|
269
|
+
Variables: uppercase `A`–`Z`. Parentheses override precedence. Up to 26 variables, arbitrary nesting.
|
|
270
|
+
|
|
271
|
+
---
|
|
272
|
+
|
|
273
|
+
## Interfaces
|
|
274
|
+
|
|
275
|
+
| Interface | How |
|
|
276
|
+
|---|---|
|
|
277
|
+
| **Python library** | `from core.evaluator import evaluate` — embed in any project |
|
|
278
|
+
| **CLI / REPL** | `boolcalc "A.B+!A.C"` — instant truth table in terminal |
|
|
279
|
+
| **MCP server** | Claude Desktop plugin — plug and play |
|
|
280
|
+
| **REST API** | `POST /check-rules` — callable from any language or stack |
|
|
281
|
+
| **NL layer** | Plain English → expression → verified result (Anthropic, OpenAI, Ollama, any OpenAI-compat) |
|
|
282
|
+
| **Streamlit UI** | Three modes: Expression, Rule Auditor, Plain English |
|
|
283
|
+
|
|
284
|
+
---
|
|
285
|
+
|
|
286
|
+
## Credibility
|
|
287
|
+
|
|
288
|
+
The engine does not sample, approximate, or predict. It evaluates every possible input combination:
|
|
289
|
+
|
|
290
|
+
- **Satisfiable** — an actual row where output = 1 was found
|
|
291
|
+
- **Contradiction** — every row was checked, all were 0
|
|
292
|
+
- **Equivalent** — output columns compared row-by-row across the full truth table
|
|
293
|
+
- **Conflict** — conjunction of both rules evaluated for every input, always returned 0
|
|
294
|
+
|
|
295
|
+
The core evaluator is 15 lines (`core/evaluator.py`). No black box, no model weights, no probability — just arithmetic. This is a stronger correctness claim than any probabilistic tool can make.
|
|
296
|
+
|
|
297
|
+
90 tests across unit, integration, edge cases, and round-trips. All passing.
|
|
298
|
+
|