boolean-algebra-engine 0.1.4__tar.gz → 0.1.6__tar.gz

Files changed (33)
  1. boolean_algebra_engine-0.1.6/PKG-INFO +345 -0
  2. boolean_algebra_engine-0.1.6/README.md +298 -0
  3. boolean_algebra_engine-0.1.6/boolean_algebra_engine.egg-info/PKG-INFO +345 -0
  4. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/boolean_algebra_engine.egg-info/requires.txt +1 -0
  5. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/pyproject.toml +2 -2
  6. boolean_algebra_engine-0.1.4/PKG-INFO +0 -286
  7. boolean_algebra_engine-0.1.4/README.md +0 -240
  8. boolean_algebra_engine-0.1.4/boolean_algebra_engine.egg-info/PKG-INFO +0 -286
  9. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/LICENSE +0 -0
  10. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/api/__init__.py +0 -0
  11. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/api/routes.py +0 -0
  12. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/boolean_algebra_engine.egg-info/SOURCES.txt +0 -0
  13. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/boolean_algebra_engine.egg-info/dependency_links.txt +0 -0
  14. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/boolean_algebra_engine.egg-info/entry_points.txt +0 -0
  15. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/boolean_algebra_engine.egg-info/top_level.txt +0 -0
  16. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/cli/__init__.py +0 -0
  17. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/cli/main.py +0 -0
  18. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/core/__init__.py +0 -0
  19. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/core/evaluator.py +0 -0
  20. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/core/models.py +0 -0
  21. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/core/parser.py +0 -0
  22. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/core/synthesizer.py +0 -0
  23. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/mcp_server/__init__.py +0 -0
  24. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/mcp_server/server.py +0 -0
  25. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/nl/__init__.py +0 -0
  26. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/nl/nl.py +0 -0
  27. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/setup.cfg +0 -0
  28. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_edge_cases.py +0 -0
  29. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_evaluator.py +0 -0
  30. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_integration.py +0 -0
  31. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_models.py +0 -0
  32. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_parser.py +0 -0
  33. {boolean_algebra_engine-0.1.4 → boolean_algebra_engine-0.1.6}/tests/test_synthesizer.py +0 -0
@@ -0,0 +1,345 @@
1
+ Metadata-Version: 2.4
2
+ Name: boolean-algebra-engine
3
+ Version: 0.1.6
4
+ Summary: Deterministic logic layer for AI agents — catch logical contradictions in system prompts, rules, and agent reasoning
5
+ Author-email: Aditya Shrivastava <aditya.shrivastava.architect@proton.me>
6
+ License-Expression: GPL-3.0-only
7
+ Project-URL: Homepage, https://github.com/Shrivastava-Aditya/boolean-algebra-engine-python
8
+ Project-URL: Repository, https://github.com/Shrivastava-Aditya/boolean-algebra-engine-python
9
+ Project-URL: Issues, https://github.com/Shrivastava-Aditya/boolean-algebra-engine-python/issues
10
+ Keywords: boolean,algebra,logic,truth-table,quine-mccluskey,digital-logic
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Intended Audience :: Education
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.9
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
20
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
21
+ Requires-Python: >=3.9
22
+ Description-Content-Type: text/markdown
23
+ License-File: LICENSE
24
+ Provides-Extra: cli
25
+ Requires-Dist: typer>=0.12.0; extra == "cli"
26
+ Requires-Dist: rich>=13.0.0; extra == "cli"
27
+ Provides-Extra: mcp
28
+ Requires-Dist: mcp[cli]>=1.0.0; extra == "mcp"
29
+ Provides-Extra: nl-anthropic
30
+ Requires-Dist: anthropic>=0.50.0; extra == "nl-anthropic"
31
+ Provides-Extra: nl-openai
32
+ Requires-Dist: openai>=1.0.0; extra == "nl-openai"
33
+ Provides-Extra: nl
34
+ Requires-Dist: anthropic>=0.50.0; extra == "nl"
35
+ Provides-Extra: api
36
+ Requires-Dist: fastapi>=0.100.0; extra == "api"
37
+ Requires-Dist: uvicorn>=0.20.0; extra == "api"
38
+ Provides-Extra: api-cache
39
+ Requires-Dist: fastapi>=0.100.0; extra == "api-cache"
40
+ Requires-Dist: uvicorn>=0.20.0; extra == "api-cache"
41
+ Requires-Dist: redis>=5.0.0; extra == "api-cache"
42
+ Provides-Extra: dev
43
+ Requires-Dist: pytest>=8.0.0; extra == "dev"
44
+ Requires-Dist: httpx>=0.24.0; extra == "dev"
45
+ Requires-Dist: z3-solver>=4.12.0; extra == "dev"
46
+ Dynamic: license-file
47
+
48
+ # boolean-algebra-engine
49
+
50
+ **The logic layer your AI is missing.**
51
+
52
+ AI agents hallucinate on boolean logic — not sometimes, reliably. They predict the next token. They don't compute. This engine does. Deterministic, exhaustive, under 10ms. It sits inside your agent pipeline and makes one guarantee the model cannot make itself: that its reasoning is logically consistent.
53
+
54
+ ```bash
55
+ pip install boolean-algebra-engine
56
+ ```
57
+
58
+ ---
59
+
60
+ ## Quick start
61
+
62
+ Zero dependencies. Works immediately after install.
63
+
64
+ ```python
65
+ from core.evaluator import evaluate
66
+ from core.synthesizer import synthesize
67
+
68
+ # Does a contradiction exist?
69
+ table, _ = evaluate("A.!A")
70
+ print(table.satisfiable) # False — always a contradiction
71
+
72
+ # Can two rules both be true simultaneously?
73
+ table, _ = evaluate("(A.B).(!A)")
74
+ print(table.satisfiable) # False — A and !A can't both hold
75
+
76
+ # Full truth table
77
+ table, _ = evaluate("A.(B+C)")
78
+ print(table.variables) # ['A', 'B', 'C']
79
+ print(table.minterms) # [5, 6, 7]
80
+ print(table.satisfiable) # True
81
+
82
+ # Simplify to minimal form
83
+ minimal, _ = synthesize(table)
84
+ print(minimal) # A.C+A.B
85
+ ```
86
+
87
+ ---
88
+
89
+ ## The problem
90
+
91
+ Four rules. Three variables. Written by four people over six months.
92
+
93
+ A fintech AI agent auto-approves or rejects loan applications based on these rules, yet nobody ever verified them together. The engine evaluates every rule, and every pair of rules, against all 8 input combinations:
94
+
95
+ ```python
96
+ # pip install boolean-algebra-engine[mcp]
97
+ from mcp_server.server import check_prompt_logic
98
+
99
+ result = check_prompt_logic([
100
+ "A.B", # approve: good credit AND income verified
101
+ "!A", # reject: bad credit
102
+ "C", # approve: collateral exists
103
+ "!C", # reject: no collateral
104
+ ])
105
+
106
+ print(result["summary"])
107
+ # {'total': 4, 'contradictions': 0, 'tautologies': 0,
108
+ # 'equivalent_pairs': 0, 'conflicting_pairs': 2}
109
+
110
+ print([(p["rule1"], p["rule2"]) for p in result["pairwise"] if p["always_conflict"]])
111
+ # [('A.B', '!A'), ('C', '!C')]
112
+ ```
113
+
114
+ **What it found:**
115
+ - `A.B` and `!A` conflict: no input satisfies both, because `A.B` requires `A=1` while `!A` requires `A=0`. The good-credit approval and the bad-credit rejection can never apply to the same application.
116
+ - `C` and `!C` conflict: collateral approval and no-collateral rejection are mutually exclusive by definition. The two rules can never apply at the same time.
117
+
118
+ Nobody caught these by reading the rules. The engine caught them by checking every combination.
119
+
120
+ ---
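The pairwise audit described above can be sketched without the package. This is a minimal illustration, not the code behind `check_prompt_logic`: the rules are hand-written Python predicates instead of parsed strings, and the names `RULES`, `column`, and `audit` are hypothetical.

```python
from itertools import combinations, product

# Hypothetical stand-ins for the four rules above, written as plain Python
# predicates over (A, B, C). The engine parses strings; this sketch does not.
RULES = {
    "A.B": lambda a, b, c: a and b,
    "!A": lambda a, b, c: not a,
    "C": lambda a, b, c: c,
    "!C": lambda a, b, c: not c,
}

# All 2**3 = 8 assignments of (A, B, C).
ROWS = list(product([False, True], repeat=3))


def column(rule):
    """Output column of one rule across every row of the truth table."""
    return [bool(rule(*row)) for row in ROWS]


def audit(rules):
    """Count contradictions and tautologies, and list always-conflicting pairs."""
    cols = {name: column(fn) for name, fn in rules.items()}
    conflicts = [
        (r1, r2)
        for r1, r2 in combinations(cols, 2)
        # A pair conflicts when no row makes both rules true.
        if not any(x and y for x, y in zip(cols[r1], cols[r2]))
    ]
    return {
        "total": len(cols),
        "contradictions": sum(not any(c) for c in cols.values()),
        "tautologies": sum(all(c) for c in cols.values()),
        "conflicting_pairs": conflicts,
    }


report = audit(RULES)
print(report["conflicting_pairs"])  # [('A.B', '!A'), ('C', '!C')]
```

With three variables the table has only 8 rows and the four rules form 6 pairs, so checking every pair exhaustively is cheap.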
121
+
122
+ ## The benchmark
123
+
124
+ The engine is the oracle — ground truth is computed by exhaustive enumeration, not guessed. Every LLM disagreement is a provable hallucination.
125
+
126
+ **Methodology:** generate pairs of boolean expressions for which the correct answer (can both hold at once, or not) is known exactly. Ask the LLM. Compare. No ambiguity, no human labeling, no interpretation.
127
+
128
+ ```bash
129
+ python3 benchmark.py --provider ollama --model tinyllama --cases 20
130
+ python3 benchmark.py --provider ollama --model llama3.2:3b --cases 20
131
+ ```
132
+
133
+ **tinyllama — 1.1B parameters**
134
+
135
+ ```
136
+ ⬡ z3 verifying 20 ground truth labels... ✓ all 20 cases agree
137
+
138
+ ╭───────────── benchmark config ──────────────╮
139
+ │ model ollama/tinyllama │
140
+ │ cases 20 (10 conflict · 10 compat) │
141
+ │ variables 3 (A, B, C) │
142
+ │ temperature 0 (deterministic) │
143
+ │ max tokens 5 (yes / no) │
144
+ │ workers 8 parallel │
145
+ ╰─────────────────────────────────────────────╯
146
+
147
+ ollama/tinyllama — 20/20 cases | 50.0% hallucination rate
148
+
149
+ # Rule 1 Rule 2 vars engine llm
150
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
151
+ 1 ✗ B !B B no yes
152
+ 2 ✗ A.B+C !A.!B.!C A B C no yes
153
+ 3 ✗ A.B A.!B A B no yes
154
+ 4 ✓ A+!B A.(B+C) A B C yes yes
155
+ 5 ✗ A.B A^B A B no yes
156
+ 6 ✓ !A+B.C B A B C yes yes
157
+ 7 ✓ A.B+C A+B A B C yes yes
158
+ 8 ✓ A+B.C.D C A B C D yes yes
159
+ 9 ✓ A.B B A B yes yes
160
+ 10 ✓ !C !B B C yes yes
161
+ ...
162
+
163
+ ╭─────────── results — ollama/tinyllama ─────────────╮
164
+ │ model ollama/tinyllama │
165
+ │ total cases 20 (10 conflict · 10 compat) │
166
+ │ variables 3 (A, B, C) │
167
+ │ temperature 0 (deterministic) │
168
+ │ max tokens 5 │
169
+ │ correct 10 │
170
+ │ hallucinated 10 │
171
+ │ hallucination rate 50.0% │
172
+ │ missed conflicts 10/10 (100.0%) │
173
+ │ missed compatibles 0/10 (0.0%) │
174
+ ╰────────────────────────────────────────────────────╯
175
+ ```
176
+
177
+ **llama3.2:3b — 3B parameters**
178
+
179
+ ```
180
+ ⬡ z3 verifying 20 ground truth labels... ✓ all 20 cases agree
181
+
182
+ ╭───────────── benchmark config ──────────────╮
183
+ │ model ollama/llama3.2:3b │
184
+ │ cases 20 (10 conflict · 10 compat) │
185
+ │ variables 4 (A, B, C, D) │
186
+ │ temperature 0 (deterministic) │
187
+ │ max tokens 5 (yes / no) │
188
+ │ workers 8 parallel │
189
+ ╰─────────────────────────────────────────────╯
190
+
191
+ ollama/llama3.2:3b — 20/20 cases | 50.0% hallucination rate
192
+
193
+ # Rule 1 Rule 2 vars engine llm
194
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
195
+ 1 ✓ B !B B no no
196
+ 2 ✓ A.B+C !A.!B.!C A B C no no
197
+ 3 ✓ A.B A.!B A B no no
198
+ 4 ✗ A+!B A.(B+C) A B C yes no
199
+ 5 ✓ A.B A^B A B no no
200
+ 6 ✗ !A+B.C B A B C yes no
201
+ 7 ✗ A.B+C A+B A B C yes no
202
+ 8 ✗ A+B.C.D C A B C D yes no
203
+ 9 ✗ A.B B A B yes no
204
+ 10 ✗ !C !B B C yes no
205
+ ...
206
+
207
+ ╭─────────── results — ollama/llama3.2:3b ───────────╮
208
+ │ model ollama/llama3.2:3b │
209
+ │ total cases 20 (10 conflict · 10 compat) │
210
+ │ variables 4 (A, B, C, D) │
211
+ │ temperature 0 (deterministic) │
212
+ │ max tokens 5 │
213
+ │ correct 10 │
214
+ │ hallucinated 10 │
215
+ │ hallucination rate 50.0% │
216
+ │ missed conflicts 0/10 (0.0%) │
217
+ │ missed compatibles 10/10 (100.0%) │
218
+ ╰────────────────────────────────────────────────────╯
219
+ ```
220
+
221
+ Both models score 50% — equal to a coin flip — but in opposite directions. tinyllama always answers "yes", llama3.2:3b always answers "no". Neither is reasoning. Both are outputting a constant.
222
+
223
+ The `vars` column lists the variables each case involves. The `engine` column is ground truth. Every mismatch with `llm` is a provable hallucination, not an opinion.
224
+
225
+ ![Benchmark results — 20 cases](https://raw.githubusercontent.com/Shrivastava-Aditya/boolean-algebra-engine-python/v0.1.6/images/benchmark_20cases.png)
226
+
227
+ Per-case strips (bottom row of the chart): every conflict cell is uniformly one colour per model, every compatible cell is the opposite. No case-by-case variation — no reasoning happening at all.
228
+
229
+ ---
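The 50% figure for both models follows from the balanced case mix: a model that always gives the same answer is right on exactly half of a 10/10 split. The sketch below reproduces the summary numbers from that assumption alone. The `score` helper is hypothetical and is not taken from `benchmark.py`.

```python
def score(truth, answers):
    """Compare model answers with ground truth; both are lists of 'yes'/'no'."""
    pairs = list(zip(truth, answers))
    wrong = sum(t != a for t, a in pairs)
    return {
        "hallucination_rate": 100.0 * wrong / len(truth),
        # Ground truth says the pair conflicts ("no"), model says compatible.
        "missed_conflicts": sum(t == "no" and a == "yes" for t, a in pairs),
        # Ground truth says the pair is compatible ("yes"), model says conflict.
        "missed_compatibles": sum(t == "yes" and a == "no" for t, a in pairs),
    }


# Balanced set as in the benchmark config: 10 conflict ("no"), 10 compatible ("yes").
truth = ["no"] * 10 + ["yes"] * 10

always_yes = score(truth, ["yes"] * 20)
always_no = score(truth, ["no"] * 20)
print(always_yes)  # rate 50.0, missed_conflicts 10, missed_compatibles 0
print(always_no)   # rate 50.0, missed_conflicts 0, missed_compatibles 10
```

The two constant strategies produce the same headline rate and mirror-image error breakdowns, which is the pattern in the two result tables.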
230
+
231
+ ## Install
232
+
233
+ ```bash
234
+ # Core engine — zero dependencies
235
+ pip install boolean-algebra-engine
236
+
237
+ # With CLI
238
+ pip install "boolean-algebra-engine[cli]"
239
+
240
+ # With MCP server (for Claude Desktop)
241
+ pip install "boolean-algebra-engine[mcp]"
242
+
243
+ # With REST API
244
+ pip install "boolean-algebra-engine[api]"
245
+
246
+ # With NL layer (Anthropic)
247
+ pip install "boolean-algebra-engine[nl-anthropic]"
248
+
249
+ # With NL layer (OpenAI)
250
+ pip install "boolean-algebra-engine[nl-openai]"
251
+ ```
252
+
253
+ ---
254
+
255
+ ## Core API
256
+
257
+ ```python
258
+ from core.evaluator import evaluate
259
+ from core.synthesizer import synthesize
260
+
261
+ # Forward: expression → truth table
262
+ table, _ = evaluate("A.(B+C)")
263
+ print(table.variables) # ['A', 'B', 'C']
264
+ print(table.minterms) # [5, 6, 7]
265
+ print(table.satisfiable) # True
266
+
267
+ # Inverse: truth table → minimal expression
268
+ minimal, _ = synthesize(table)
269
+ print(minimal) # A.C+A.B
270
+
271
+ # Equivalence and satisfiability (via MCP server functions — no HTTP, direct call)
272
+ # pip install boolean-algebra-engine[mcp]
273
+ from mcp_server.server import equivalent, satisfiable
274
+
275
+ print(equivalent("A.(B+C)", "A.B+A.C")["equivalent"]) # True — distributive law
276
+ print(satisfiable("A.!A")["satisfiable"]) # False — contradiction
277
+ ```
278
+
279
+ `core/` has zero external dependencies. Import it into any Python project.
280
+
281
+ ---
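The inverse step (truth table to minimal expression) can be illustrated with the merging phase of Quine-McCluskey. The package keywords mention that algorithm, but whether `core/synthesizer.py` follows this shape is an assumption. The helpers `combine`, `prime_implicants`, and `to_term` are hypothetical, and the sketch stops at prime implicants without selecting a minimal cover.

```python
from itertools import combinations


def combine(a, b):
    """Merge two implicants that differ in exactly one fixed bit, else None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(minterms, n_vars):
    """Repeatedly merge implicants; whatever never merges is prime."""
    current = {format(m, f"0{n_vars}b") for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= current - used
        current = merged
    return primes


def to_term(implicant, variables):
    """Render an implicant such as '1-1' in the README's notation, e.g. 'A.C'."""
    literals = [
        v if bit == "1" else "!" + v
        for v, bit in zip(variables, implicant)
        if bit != "-"
    ]
    return ".".join(literals) if literals else "1"


primes = prime_implicants([5, 6, 7], 3)
terms = sorted(to_term(p, "ABC") for p in primes)
print(terms)  # ['A.B', 'A.C']
```

For minterms `[5, 6, 7]` both prime implicants are essential, so the two terms match the `A.C+A.B` result shown above up to ordering. Larger inputs would also need a cover-selection step, which this sketch omits.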
282
+
283
+ ## MCP — Claude calls the engine
284
+
285
+ Wire the engine into Claude Desktop and Claude stops predicting boolean logic. It computes it.
286
+
287
+ ```json
288
+ {
289
+ "mcpServers": {
290
+ "boolean-algebra-engine": {
291
+ "command": "python",
292
+ "args": ["-m", "mcp_server.server"]
293
+ }
294
+ }
295
+ }
296
+ ```
297
+
298
+ Five tools Claude can call mid-conversation:
299
+ - `evaluate` — expression → truth table
300
+ - `simplify` — expression → minimal form
301
+ - `equivalent` — are two expressions identical?
302
+ - `satisfiable` — does any input make this true?
303
+ - `check_prompt_logic` — audit a full rule set for contradictions, tautologies, conflicts, duplicates
304
+
305
+ ---
306
+
307
+ ## Operators
308
+
309
+ | Symbol | Operation | Precedence |
310
+ |--------|-----------|------------|
311
+ | `!` | NOT | 4 (highest) |
312
+ | `.` | AND | 3 |
313
+ | `^` | XOR | 2 |
314
+ | `+` | OR | 1 (lowest) |
315
+
316
+ Variables: uppercase `A`–`Z`. Parentheses override precedence. Up to 26 variables, arbitrary nesting.
317
+
318
+ ---
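The precedence table maps naturally onto a recursive-descent evaluator with one function per level. The following is a sketch under that assumption, not the contents of `core/parser.py`. The functions `evaluate_expr` and `minterms` are hypothetical, and the row index treats the first variable as the most significant bit, which is consistent with the `[5, 6, 7]` result shown for `A.(B+C)`.

```python
from itertools import product


def evaluate_expr(text, env):
    """Evaluate an expression such as 'A.(B+C)' under env, e.g. {'A': True}."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # precedence 1 (lowest): +
        value = parse_xor()
        while peek() == "+":
            take()
            rhs = parse_xor()  # parse first so the tokens are always consumed
            value = value or rhs
        return value

    def parse_xor():  # precedence 2: ^
        value = parse_and()
        while peek() == "^":
            take()
            rhs = parse_and()
            value = value != rhs
        return value

    def parse_and():  # precedence 3: .
        value = parse_not()
        while peek() == ".":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # precedence 4 (highest): !
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take() if peek() is not None else None
        if tok == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        if tok is not None and tok in env:
            return env[tok]
        raise ValueError(f"unexpected token: {tok!r}")

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result


def minterms(text, variables):
    """Row indices where the expression is true (first variable = top bit)."""
    rows = product([False, True], repeat=len(variables))
    return [
        index
        for index, row in enumerate(rows)
        if evaluate_expr(text, dict(zip(variables, row)))
    ]


print(minterms("A.(B+C)", "ABC"))  # [5, 6, 7]
print(minterms("A+B.C", "ABC"))    # [3, 4, 5, 6, 7]: AND binds tighter than OR
```

Each precedence level calls the next-higher level for its operands, so `A+B.C` parses as `A+(B.C)` without any explicit precedence table at runtime.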
319
+
320
+ ## Interfaces
321
+
322
+ | Interface | How |
323
+ |---|---|
324
+ | **Python library** | `from core.evaluator import evaluate` — embed in any project |
325
+ | **CLI / REPL** | `boolcalc "A.B+!A.C"` — instant truth table in terminal |
326
+ | **MCP server** | Claude Desktop plugin — plug and play |
327
+ | **REST API** | `POST /check-rules` — callable from any language or stack |
328
+ | **NL layer** | Plain English → expression → verified result (Anthropic, OpenAI, Ollama, any OpenAI-compat) |
329
+ | **Streamlit UI** | Three modes: Expression, Rule Auditor, Plain English |
330
+
331
+ ---
332
+
333
+ ## Credibility
334
+
335
+ The engine does not sample, approximate, or predict. It evaluates every possible input combination:
336
+
337
+ - **Satisfiable** — an actual row where output = 1 was found
338
+ - **Contradiction** — every row was checked, all were 0
339
+ - **Equivalent** — output columns compared row-by-row across the full truth table
340
+ - **Conflict** — conjunction of both rules evaluated for every input, always returned 0
341
+
342
+ The core evaluator is 15 lines (`core/evaluator.py`). No black box, no model weights, no probability — just arithmetic. This is a stronger correctness claim than any probabilistic tool can make.
343
+
344
+ 90 tests across unit, integration, edge cases, and round-trips. All passing.
345
+
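The verdicts listed under Credibility reduce to simple operations on truth-table columns. The sketch below shows one way to express them. The helper names (`truth_column`, `is_satisfiable`, `is_equivalent`, `always_conflict`) are hypothetical and operate on Python callables, so they are not the package's `satisfiable` or `equivalent` functions.

```python
from itertools import product


def truth_column(fn, n_vars):
    """Output of fn on every one of the 2**n_vars input rows, in order."""
    return [bool(fn(*row)) for row in product([False, True], repeat=n_vars)]


def is_satisfiable(fn, n_vars):
    """True if at least one row evaluates to 1."""
    return any(truth_column(fn, n_vars))


def is_equivalent(f, g, n_vars):
    """True if both output columns match on every row."""
    return truth_column(f, n_vars) == truth_column(g, n_vars)


def always_conflict(f, g, n_vars):
    """True if the conjunction of f and g is 0 on every row."""
    return not is_satisfiable(lambda *row: f(*row) and g(*row), n_vars)


# Distributive law: A.(B+C) versus A.B+A.C
lhs = lambda a, b, c: a and (b or c)
rhs = lambda a, b, c: (a and b) or (a and c)

print(is_equivalent(lhs, rhs, 3))                  # True
print(is_satisfiable(lambda a: a and not a, 1))    # False: A.!A is a contradiction
print(always_conflict(lambda a, b: a and b,
                      lambda a, b: a != b, 2))     # True: A.B versus A^B
```

Every answer comes from visiting all rows, so a "False" here means no counterexample exists, not that none was found.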
@@ -0,0 +1,298 @@
1
+ # boolean-algebra-engine
2
+
3
+ **The logic layer your AI is missing.**
4
+
5
+ AI agents hallucinate on boolean logic — not sometimes, reliably. They predict the next token. They don't compute. This engine does. Deterministic, exhaustive, under 10ms. It sits inside your agent pipeline and makes one guarantee the model cannot make itself: that its reasoning is logically consistent.
6
+
7
+ ```bash
8
+ pip install boolean-algebra-engine
9
+ ```
10
+
11
+ ---
12
+
13
+ ## Quick start
14
+
15
+ Zero dependencies. Works immediately after install.
16
+
17
+ ```python
18
+ from core.evaluator import evaluate
19
+ from core.synthesizer import synthesize
20
+
21
+ # Does a contradiction exist?
22
+ table, _ = evaluate("A.!A")
23
+ print(table.satisfiable) # False — always a contradiction
24
+
25
+ # Can two rules both be true simultaneously?
26
+ table, _ = evaluate("(A.B).(!A)")
27
+ print(table.satisfiable) # False — A and !A can't both hold
28
+
29
+ # Full truth table
30
+ table, _ = evaluate("A.(B+C)")
31
+ print(table.variables) # ['A', 'B', 'C']
32
+ print(table.minterms) # [5, 6, 7]
33
+ print(table.satisfiable) # True
34
+
35
+ # Simplify to minimal form
36
+ minimal, _ = synthesize(table)
37
+ print(minimal) # A.C+A.B
38
+ ```
39
+
40
+ ---
41
+
42
+ ## The problem
43
+
44
+ Four rules. Three variables. Written by four people over six months.
45
+
46
+ A fintech AI agent auto-approves or rejects loan applications based on these rules, yet nobody ever verified them together. The engine evaluates every rule, and every pair of rules, against all 8 input combinations:
47
+
48
+ ```python
49
+ # pip install boolean-algebra-engine[mcp]
50
+ from mcp_server.server import check_prompt_logic
51
+
52
+ result = check_prompt_logic([
53
+ "A.B", # approve: good credit AND income verified
54
+ "!A", # reject: bad credit
55
+ "C", # approve: collateral exists
56
+ "!C", # reject: no collateral
57
+ ])
58
+
59
+ print(result["summary"])
60
+ # {'total': 4, 'contradictions': 0, 'tautologies': 0,
61
+ # 'equivalent_pairs': 0, 'conflicting_pairs': 2}
62
+
63
+ print([(p["rule1"], p["rule2"]) for p in result["pairwise"] if p["always_conflict"]])
64
+ # [('A.B', '!A'), ('C', '!C')]
65
+ ```
66
+
67
+ **What it found:**
68
+ - `A.B` and `!A` conflict: no input satisfies both, because `A.B` requires `A=1` while `!A` requires `A=0`. The good-credit approval and the bad-credit rejection can never apply to the same application.
69
+ - `C` and `!C` conflict: collateral approval and no-collateral rejection are mutually exclusive by definition. The two rules can never apply at the same time.
70
+
71
+ Nobody caught these by reading the rules. The engine caught them by checking every combination.
72
+
73
+ ---
74
+
75
+ ## The benchmark
76
+
77
+ The engine is the oracle — ground truth is computed by exhaustive enumeration, not guessed. Every LLM disagreement is a provable hallucination.
78
+
79
+ **Methodology:** generate pairs of boolean expressions for which the correct answer (can both hold at once, or not) is known exactly. Ask the LLM. Compare. No ambiguity, no human labeling, no interpretation.
80
+
81
+ ```bash
82
+ python3 benchmark.py --provider ollama --model tinyllama --cases 20
83
+ python3 benchmark.py --provider ollama --model llama3.2:3b --cases 20
84
+ ```
85
+
86
+ **tinyllama — 1.1B parameters**
87
+
88
+ ```
89
+ ⬡ z3 verifying 20 ground truth labels... ✓ all 20 cases agree
90
+
91
+ ╭───────────── benchmark config ──────────────╮
92
+ │ model ollama/tinyllama │
93
+ │ cases 20 (10 conflict · 10 compat) │
94
+ │ variables 3 (A, B, C) │
95
+ │ temperature 0 (deterministic) │
96
+ │ max tokens 5 (yes / no) │
97
+ │ workers 8 parallel │
98
+ ╰─────────────────────────────────────────────╯
99
+
100
+ ollama/tinyllama — 20/20 cases | 50.0% hallucination rate
101
+
102
+ # Rule 1 Rule 2 vars engine llm
103
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
104
+ 1 ✗ B !B B no yes
105
+ 2 ✗ A.B+C !A.!B.!C A B C no yes
106
+ 3 ✗ A.B A.!B A B no yes
107
+ 4 ✓ A+!B A.(B+C) A B C yes yes
108
+ 5 ✗ A.B A^B A B no yes
109
+ 6 ✓ !A+B.C B A B C yes yes
110
+ 7 ✓ A.B+C A+B A B C yes yes
111
+ 8 ✓ A+B.C.D C A B C D yes yes
112
+ 9 ✓ A.B B A B yes yes
113
+ 10 ✓ !C !B B C yes yes
114
+ ...
115
+
116
+ ╭─────────── results — ollama/tinyllama ─────────────╮
117
+ │ model ollama/tinyllama │
118
+ │ total cases 20 (10 conflict · 10 compat) │
119
+ │ variables 3 (A, B, C) │
120
+ │ temperature 0 (deterministic) │
121
+ │ max tokens 5 │
122
+ │ correct 10 │
123
+ │ hallucinated 10 │
124
+ │ hallucination rate 50.0% │
125
+ │ missed conflicts 10/10 (100.0%) │
126
+ │ missed compatibles 0/10 (0.0%) │
127
+ ╰────────────────────────────────────────────────────╯
128
+ ```
129
+
130
+ **llama3.2:3b — 3B parameters**
131
+
132
+ ```
133
+ ⬡ z3 verifying 20 ground truth labels... ✓ all 20 cases agree
134
+
135
+ ╭───────────── benchmark config ──────────────╮
136
+ │ model ollama/llama3.2:3b │
137
+ │ cases 20 (10 conflict · 10 compat) │
138
+ │ variables 4 (A, B, C, D) │
139
+ │ temperature 0 (deterministic) │
140
+ │ max tokens 5 (yes / no) │
141
+ │ workers 8 parallel │
142
+ ╰─────────────────────────────────────────────╯
143
+
144
+ ollama/llama3.2:3b — 20/20 cases | 50.0% hallucination rate
145
+
146
+ # Rule 1 Rule 2 vars engine llm
147
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
148
+ 1 ✓ B !B B no no
149
+ 2 ✓ A.B+C !A.!B.!C A B C no no
150
+ 3 ✓ A.B A.!B A B no no
151
+ 4 ✗ A+!B A.(B+C) A B C yes no
152
+ 5 ✓ A.B A^B A B no no
153
+ 6 ✗ !A+B.C B A B C yes no
154
+ 7 ✗ A.B+C A+B A B C yes no
155
+ 8 ✗ A+B.C.D C A B C D yes no
156
+ 9 ✗ A.B B A B yes no
157
+ 10 ✗ !C !B B C yes no
158
+ ...
159
+
160
+ ╭─────────── results — ollama/llama3.2:3b ───────────╮
161
+ │ model ollama/llama3.2:3b │
162
+ │ total cases 20 (10 conflict · 10 compat) │
163
+ │ variables 4 (A, B, C, D) │
164
+ │ temperature 0 (deterministic) │
165
+ │ max tokens 5 │
166
+ │ correct 10 │
167
+ │ hallucinated 10 │
168
+ │ hallucination rate 50.0% │
169
+ │ missed conflicts 0/10 (0.0%) │
170
+ │ missed compatibles 10/10 (100.0%) │
171
+ ╰────────────────────────────────────────────────────╯
172
+ ```
173
+
174
+ Both models score 50% — equal to a coin flip — but in opposite directions. tinyllama always answers "yes", llama3.2:3b always answers "no". Neither is reasoning. Both are outputting a constant.
175
+
176
+ The `vars` column lists the variables each case involves. The `engine` column is ground truth. Every mismatch with `llm` is a provable hallucination, not an opinion.
177
+
178
+ ![Benchmark results — 20 cases](https://raw.githubusercontent.com/Shrivastava-Aditya/boolean-algebra-engine-python/v0.1.6/images/benchmark_20cases.png)
179
+
180
+ Per-case strips (bottom row of the chart): every conflict cell is uniformly one colour per model, every compatible cell is the opposite. No case-by-case variation — no reasoning happening at all.
181
+
182
+ ---
183
+
184
+ ## Install
185
+
186
+ ```bash
187
+ # Core engine — zero dependencies
188
+ pip install boolean-algebra-engine
189
+
190
+ # With CLI
191
+ pip install "boolean-algebra-engine[cli]"
192
+
193
+ # With MCP server (for Claude Desktop)
194
+ pip install "boolean-algebra-engine[mcp]"
195
+
196
+ # With REST API
197
+ pip install "boolean-algebra-engine[api]"
198
+
199
+ # With NL layer (Anthropic)
200
+ pip install "boolean-algebra-engine[nl-anthropic]"
201
+
202
+ # With NL layer (OpenAI)
203
+ pip install "boolean-algebra-engine[nl-openai]"
204
+ ```
205
+
206
+ ---
207
+
208
+ ## Core API
209
+
210
+ ```python
211
+ from core.evaluator import evaluate
212
+ from core.synthesizer import synthesize
213
+
214
+ # Forward: expression → truth table
215
+ table, _ = evaluate("A.(B+C)")
216
+ print(table.variables) # ['A', 'B', 'C']
217
+ print(table.minterms) # [5, 6, 7]
218
+ print(table.satisfiable) # True
219
+
220
+ # Inverse: truth table → minimal expression
221
+ minimal, _ = synthesize(table)
222
+ print(minimal) # A.C+A.B
223
+
224
+ # Equivalence and satisfiability (via MCP server functions — no HTTP, direct call)
225
+ # pip install boolean-algebra-engine[mcp]
226
+ from mcp_server.server import equivalent, satisfiable
227
+
228
+ print(equivalent("A.(B+C)", "A.B+A.C")["equivalent"]) # True — distributive law
229
+ print(satisfiable("A.!A")["satisfiable"]) # False — contradiction
230
+ ```
231
+
232
+ `core/` has zero external dependencies. Import it into any Python project.
233
+
234
+ ---
235
+
236
+ ## MCP — Claude calls the engine
237
+
238
+ Wire the engine into Claude Desktop and Claude stops predicting boolean logic. It computes it.
239
+
240
+ ```json
241
+ {
242
+ "mcpServers": {
243
+ "boolean-algebra-engine": {
244
+ "command": "python",
245
+ "args": ["-m", "mcp_server.server"]
246
+ }
247
+ }
248
+ }
249
+ ```
250
+
251
+ Five tools Claude can call mid-conversation:
252
+ - `evaluate` — expression → truth table
253
+ - `simplify` — expression → minimal form
254
+ - `equivalent` — are two expressions identical?
255
+ - `satisfiable` — does any input make this true?
256
+ - `check_prompt_logic` — audit a full rule set for contradictions, tautologies, conflicts, duplicates
257
+
258
+ ---
259
+
260
+ ## Operators
261
+
262
+ | Symbol | Operation | Precedence |
263
+ |--------|-----------|------------|
264
+ | `!` | NOT | 4 (highest) |
265
+ | `.` | AND | 3 |
266
+ | `^` | XOR | 2 |
267
+ | `+` | OR | 1 (lowest) |
268
+
269
+ Variables: uppercase `A`–`Z`. Parentheses override precedence. Up to 26 variables, arbitrary nesting.
270
+
271
+ ---
272
+
273
+ ## Interfaces
274
+
275
+ | Interface | How |
276
+ |---|---|
277
+ | **Python library** | `from core.evaluator import evaluate` — embed in any project |
278
+ | **CLI / REPL** | `boolcalc "A.B+!A.C"` — instant truth table in terminal |
279
+ | **MCP server** | Claude Desktop plugin — plug and play |
280
+ | **REST API** | `POST /check-rules` — callable from any language or stack |
281
+ | **NL layer** | Plain English → expression → verified result (Anthropic, OpenAI, Ollama, any OpenAI-compat) |
282
+ | **Streamlit UI** | Three modes: Expression, Rule Auditor, Plain English |
283
+
284
+ ---
285
+
286
+ ## Credibility
287
+
288
+ The engine does not sample, approximate, or predict. It evaluates every possible input combination:
289
+
290
+ - **Satisfiable** — an actual row where output = 1 was found
291
+ - **Contradiction** — every row was checked, all were 0
292
+ - **Equivalent** — output columns compared row-by-row across the full truth table
293
+ - **Conflict** — conjunction of both rules evaluated for every input, always returned 0
294
+
295
+ The core evaluator is 15 lines (`core/evaluator.py`). No black box, no model weights, no probability — just arithmetic. This is a stronger correctness claim than any probabilistic tool can make.
296
+
297
+ 90 tests across unit, integration, edge cases, and round-trips. All passing.
298
+