unicode-fol-kit 0.1.0__tar.gz → 0.2.0__tar.gz

Files changed (29)
  1. unicode_fol_kit-0.2.0/PKG-INFO +645 -0
  2. unicode_fol_kit-0.2.0/README.md +621 -0
  3. unicode_fol_kit-0.2.0/pyproject.toml +44 -0
  4. unicode_fol_kit-0.2.0/unicode_fol_kit/__init__.py +39 -0
  5. unicode_fol_kit-0.2.0/unicode_fol_kit/fol/__init__.py +35 -0
  6. unicode_fol_kit-0.1.0/unicode_fol_kit/fol/nodes.py → unicode_fol_kit-0.2.0/unicode_fol_kit/fol/_fol_nodes.py +41 -8
  7. unicode_fol_kit-0.2.0/unicode_fol_kit/fol/_msfl_nodes.py +954 -0
  8. unicode_fol_kit-0.2.0/unicode_fol_kit/fol/grammars/fl.lark +68 -0
  9. unicode_fol_kit-0.1.0/unicode_fol_kit/fol/syntax.lark → unicode_fol_kit-0.2.0/unicode_fol_kit/fol/grammars/fol.lark +17 -27
  10. unicode_fol_kit-0.2.0/unicode_fol_kit/fol/grammars/msfl.lark +68 -0
  11. unicode_fol_kit-0.2.0/unicode_fol_kit/fol/grammars/msfol.lark +64 -0
  12. unicode_fol_kit-0.2.0/unicode_fol_kit/fol/grammars/terminals.lark +17 -0
  13. unicode_fol_kit-0.2.0/unicode_fol_kit/fol/msflparser.py +182 -0
  14. {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/unicode_fol_kit/fol/naming.py +31 -13
  15. unicode_fol_kit-0.2.0/unicode_fol_kit/fol/nodes.py +45 -0
  16. unicode_fol_kit-0.1.0/.gitattributes +0 -2
  17. unicode_fol_kit-0.1.0/PKG-INFO +0 -302
  18. unicode_fol_kit-0.1.0/README.md +0 -291
  19. unicode_fol_kit-0.1.0/pyproject.toml +0 -18
  20. unicode_fol_kit-0.1.0/requirements.txt +0 -9
  21. unicode_fol_kit-0.1.0/test.py +0 -7
  22. unicode_fol_kit-0.1.0/unicode_fol_kit/__init__.py +0 -17
  23. unicode_fol_kit-0.1.0/unicode_fol_kit/fol/__init__.py +0 -15
  24. unicode_fol_kit-0.1.0/unicode_fol_kit/fol/folparser.py +0 -21
  25. {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/.gitignore +0 -0
  26. {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/LICENSE +0 -0
  27. {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/unicode_fol_kit/atp/__init__.py +0 -0
  28. {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/unicode_fol_kit/atp/prover9_entailment.py +0 -0
  29. {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/unicode_fol_kit/atp/z3_equivalence.py +0 -0
@@ -0,0 +1,645 @@
1
+ Metadata-Version: 2.4
2
+ Name: unicode-fol-kit
3
+ Version: 0.2.0
4
+ Summary: Parser and toolkit for first-order logic formulas using Unicode operators
5
+ Project-URL: Repository, https://github.com/fvossel/unicode-fol-kit
6
+ Project-URL: Issues, https://github.com/fvossel/unicode-fol-kit/issues
7
+ Author-email: Felix Vossel <felixvossel@gmail.com>
8
+ License: MIT
9
+ License-File: LICENSE
10
+ Keywords: first-order-logic,fol,lambda-calculus,logic,lukasiewicz,parser
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Intended Audience :: Science/Research
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Topic :: Scientific/Engineering
20
+ Requires-Python: >=3.10
21
+ Requires-Dist: lark>=1.1
22
+ Requires-Dist: z3-solver>=4.12
23
+ Description-Content-Type: text/markdown
24
+
25
+ # unicode-fol-kit
26
+
27
+ A Python toolkit for parsing and working with first-order logic (FOL) formulas written with Unicode operators. The single parser class `MSFLParser` supports three modes — classical FOL, many-sorted FOL (MSFOL), and many-sorted fuzzy logic (MSFL, Łukasiewicz) — selected by constructor flags.
28
+
29
+ ## Features
30
+
31
+ - **Three parser modes** — FOL, many-sorted FOL (MSFOL), and many-sorted fuzzy/Łukasiewicz logic (MSFL), all from one class
32
+ - **Unicode surface syntax** — natural symbols (∀ ∃ ∧ ∨ ¬ → ↔ ⊕ ⊗ = ≠ ≤ ≥) with no ASCII fallbacks needed
33
+ - **Sorted quantifiers and constants** — `∀x:Human P(x)`, `P(alice:Human)` in MSFOL and MSFL modes
34
+ - **Łukasiewicz operators** — weak ∧ / ∨ (min/max), strong ⊗ / ⊕ (t-norm/t-conorm), and Łukasiewicz ¬ → ↔ in MSFL mode
35
+ - **Full AST** — all standard FOL constructs plus MSFL-specific nodes, all as Python dataclasses
36
+ - **Reductions** — `to_msfol()` lowers Łukasiewicz operators to classical nodes; `to_fol()` further eliminates sorts via relativisation
37
+ - **Serialisation** — convert formulas to/from JSON dictionaries; round-trip safe
38
+ - **Tree view** — render any formula as a readable ASCII tree
39
+ - **Z3 export** — translate formulas to Z3 expressions for SMT solving
40
+ - **Prover9 export** — translate formulas to Prover9 syntax for automated theorem proving
41
+ - **TPTP export** — translate formulas to TPTP syntax
42
+ - **Equivalence checking** — check if two formulas are logically equivalent via Z3
43
+ - **Entailment checking** — check if a conclusion follows from premises via Prover9
44
+ - **Lambda abstraction** — `λx. φ` syntax in all three parser modes; parameters can be variables (`λx.`), named constants (`λfoo.`), or predicate symbols (`λP.`); body extends rightward through all connectives
45
+ - **Higher-order predicate application** — `(func)(arg)` explicit application; `λP. P(x)` writes the body naturally and is automatically scope-resolved to `Application(LambdaVar("P"), Variable("x"))`
46
+ - **Lambda-calculus operations** — free-variable computation, capture-avoiding substitution, beta-reduction (normal-order, step-limited), eta-reduction, combined beta-eta normalisation to fixpoint, and lexical scope resolution
47
+
48
+ ## Installation
49
+
50
+ ### Via pip
51
+
52
+ ```bash
53
+ pip install unicode-fol-kit
54
+ ```
55
+
56
+ ### Via git clone
57
+
58
+ ```bash
59
+ git clone https://github.com/fvossel/unicode-fol-kit.git
60
+ cd unicode-fol-kit
61
+ pip install .
62
+ ```
63
+
64
+ ## Parser modes
65
+
66
+ `MSFLParser` is instantiated with two boolean flags:
67
+
68
+ ```python
69
+ MSFLParser(many_sorted=False, fuzzy=False) # FOL (default)
70
+ MSFLParser(many_sorted=True, fuzzy=False) # MSFOL
71
+ MSFLParser(many_sorted=True, fuzzy=True) # MSFL
72
+ MSFLParser(many_sorted=False, fuzzy=True) # → raises ValueError
73
+ ```
74
+
75
+ | `many_sorted` | `fuzzy` | Mode | Quantifiers | Constants | Connectives |
76
+ |---|---|---|---|---|---|
77
+ | `False` | `False` | **FOL** | unsorted `∀x` | unsorted | classical ∧ ∨ ⊕ ¬ → ↔ |
78
+ | `True` | `False` | **MSFOL** | sorted `∀x:Sort` | sorted `alice:Sort` | classical ∧ ∨ ¬ → ↔ (no ⊕) |
79
+ | `True` | `True` | **MSFL** | sorted `∀x:Sort` | sorted `alice:Sort` | weak ∧ ∨, strong ⊗ ⊕, Łuk ¬ → ↔ |
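The three flag combinations in the table behave like a small decision function. The helper below is a hypothetical sketch (`select_mode` and `Mode` are not part of the package); it only restates the documented mapping, including the `ValueError` for `fuzzy=True` without `many_sorted=True`.

```python
from enum import Enum


class Mode(Enum):
    FOL = "FOL"
    MSFOL = "MSFOL"
    MSFL = "MSFL"


def select_mode(many_sorted: bool = False, fuzzy: bool = False) -> Mode:
    """Map the two constructor-style flags to a parser mode (hypothetical helper)."""
    if fuzzy and not many_sorted:
        # Documented behaviour: fuzzy mode is only available together with sorts.
        raise ValueError("fuzzy=True requires many_sorted=True")
    if many_sorted:
        return Mode.MSFL if fuzzy else Mode.MSFOL
    return Mode.FOL


print(select_mode(many_sorted=True, fuzzy=True))  # Mode.MSFL
```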
80
+
81
+ ## Usage
82
+
83
+ ### FOL mode (default)
84
+
85
+ ```python
86
+ from unicode_fol_kit import MSFLParser
87
+
88
+ parser = MSFLParser()
89
+ formula = parser.parse("∀x (Human(x) → Mortal(x))")
90
+ ```
91
+
92
+ ### MSFOL mode
93
+
94
+ Quantifiers and ground constants must carry a sort annotation. The colon can be written with or without a space before it:
95
+
96
+ ```python
97
+ parser = MSFLParser(many_sorted=True)
98
+
99
+ # Sorted quantifier
100
+ q = parser.parse("∀x:Human (Mortal(x) ∧ ¬Immortal(x))")
101
+ # SortedQuantifier(type='∀', variable=Variable('x'), sort='Human', formula=…)
102
+
103
+ # Sorted constant — both spacing forms are accepted
104
+ parser.parse("P(alice:Human)")
105
+ parser.parse("P(alice :Human)")
106
+ ```
107
+
108
+ ### MSFL mode
109
+
110
+ MSFL mode uses Łukasiewicz semantics: `∧`/`∨` are weak (min/max), `⊗`/`⊕` are strong (t-norm/t-conorm), and `¬`/`→`/`↔` map to their Łukasiewicz counterparts:
111
+
112
+ ```python
113
+ parser = MSFLParser(many_sorted=True, fuzzy=True)
114
+
115
+ parser.parse("P(x) ∧ Q(x)") # WeakConjunction (min)
116
+ parser.parse("P(x) ⊗ Q(x)") # StrongConjunction (t-norm: max{0, x+y−1})
117
+ parser.parse("P(x) ⊕ Q(x)") # StrongDisjunction (t-conorm: min{1, x+y})
118
+ parser.parse("¬P(x)") # LukNegation (1−x)
119
+ parser.parse("P(x) → Q(x)") # LukImplication (min{1, 1−x+y})
120
+ parser.parse("∀x:Human P(x)") # SortedQuantifier
121
+ ```
122
+
123
+ ### ASCII tree view
124
+
125
+ ```python
126
+ formula = MSFLParser().parse("∀x (Human(x) → Mortal(x))")
127
+ print(formula.tree_str())
128
+ # ∀ x
129
+ # └── →
130
+ #     ├── Atom: Human
131
+ #     │   └── Variable: x
132
+ #     └── Atom: Mortal
133
+ #         └── Variable: x
134
+ ```
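The connector layout shown above can be produced by a short recursive walk that passes a growing prefix down to each child. This is a sketch over a hypothetical `TreeNode` class, not the package's `tree_str()` implementation, and the exact indentation the package uses may differ.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    label: str
    children: list["TreeNode"] = field(default_factory=list)


def render(node: TreeNode) -> str:
    """Render a tree with box-drawing connectors, one node per line."""
    lines = [node.label]

    def walk(children, prefix):
        for i, child in enumerate(children):
            last = i == len(children) - 1
            lines.append(prefix + ("└── " if last else "├── ") + child.label)
            # Under the last child there is no more sibling line to continue.
            walk(child.children, prefix + ("    " if last else "│   "))

    walk(node.children, "")
    return "\n".join(lines)


# ∀x (Human(x) → Mortal(x)) as a hand-built tree
tree = TreeNode("∀ x", [TreeNode("→", [
    TreeNode("Atom: Human", [TreeNode("Variable: x")]),
    TreeNode("Atom: Mortal", [TreeNode("Variable: x")]),
])])
print(render(tree))
```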
135
+
136
+ ### Exporting to other formats
137
+
138
+ ```python
139
+ formula.to_prover9() # '(all x (Human(x) -> Mortal(x)))'
140
+ formula.to_tptp() # '(![X]: (human(X) => mortal(X)))'
141
+ formula.to_dict() # JSON-serialisable dict
142
+ ```
143
+
144
+ ### Serialisation
145
+
146
+ ```python
147
+ from unicode_fol_kit import Node
148
+
149
+ d = formula.to_dict()
150
+ formula2 = Node.from_dict(d) # round-trip
151
+ ```
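One common way to make a dataclass AST round-trip safe is to tag every node dict with its class name and look the class up again when loading. The sketch below assumes that design; the `"type"` key, the `NODE_TYPES` registry and the free functions are illustrative and not necessarily the schema that `to_dict()` / `Node.from_dict()` actually use.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Variable:
    name: str


@dataclass
class Atom:
    predicate: str
    args: list


@dataclass
class Not:
    formula: Any


# Registry mapping the "type" tag back to a class (hypothetical schema).
NODE_TYPES = {cls.__name__: cls for cls in (Variable, Atom, Not)}


def to_dict(value: Any) -> Any:
    """Turn nodes (and lists of nodes) into JSON-compatible data."""
    if isinstance(value, list):
        return [to_dict(v) for v in value]
    if isinstance(value, tuple(NODE_TYPES.values())):
        data = {"type": type(value).__name__}
        for f in fields(value):
            data[f.name] = to_dict(getattr(value, f.name))
        return data
    return value  # str / int / float pass through unchanged


def from_dict(data: Any) -> Any:
    """Inverse of to_dict: rebuild nodes from their "type" tags."""
    if isinstance(data, list):
        return [from_dict(d) for d in data]
    if isinstance(data, dict):
        cls = NODE_TYPES[data["type"]]
        return cls(**{k: from_dict(v) for k, v in data.items() if k != "type"})
    return data


node = Not(Atom("Human", [Variable("x")]))
restored = from_dict(json.loads(json.dumps(to_dict(node))))
print(restored == node)  # True
```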
152
+
153
+ ### Lambda-calculus
154
+
155
+ All three parser modes support lambda abstraction and application. `parse()` automatically applies scope resolution, so the returned AST is always fully resolved.
156
+
157
+ ```python
158
+ from unicode_fol_kit import (
159
+ MSFLParser,
160
+ LambdaVar, Lambda, Application,
161
+ free_variables, substitute,
162
+ beta_reduce, eta_reduce, beta_eta_normalize,
163
+ resolve_lambda_scope,
164
+ ReductionLimitError,
165
+ )
166
+
167
+ parser = MSFLParser()
168
+
169
+ # Parse — scope resolution is applied automatically
170
+ term = parser.parse("λP. λx. P(x)")
171
+ # Lambda(LambdaVar("P"), Lambda(LambdaVar("x"), Application(LambdaVar("P"), LambdaVar("x"))))
172
+
173
+ # Application
174
+ app = parser.parse("(λP. P(x))(Q)")
175
+ # Application(Lambda(LambdaVar("P"), Application(LambdaVar("P"), Variable("x"))),
176
+ # Atom("Q", []))
177
+ ```
178
+
179
+ #### Free variables
180
+
181
+ ```python
182
+ term = parser.parse("λP. P(x)")
183
+ free_variables(term)
184
+ # {Variable("x")} — x is free; P is lambda-bound and does not appear
185
+ ```
186
+
187
+ The result is a mixed set that may contain both `Variable` (logical) and `LambdaVar` (lambda-bound) objects.
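The mixed result falls out naturally if each binder subtracts only its own kind of variable: a `Lambda` removes its `LambdaVar`, a quantifier removes its `Variable`. The sketch uses cut-down, hypothetical re-declarations of a few node classes (not imported from the package) to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:  # logical variable, bound by quantifiers
    name: str


@dataclass(frozen=True)
class LambdaVar:  # lambda-bound variable, deliberately a different class
    name: str


@dataclass(frozen=True)
class Atom:
    predicate: str
    args: tuple


@dataclass(frozen=True)
class Quantifier:
    type: str
    variable: Variable
    formula: object


@dataclass(frozen=True)
class Lambda:
    param: LambdaVar
    body: object


@dataclass(frozen=True)
class Application:
    func: object
    arg: object


def free_variables(node) -> set:
    """Free Variable and LambdaVar occurrences; each binder removes its own kind."""
    if isinstance(node, (Variable, LambdaVar)):
        return {node}
    if isinstance(node, Atom):
        return set().union(*(free_variables(a) for a in node.args))
    if isinstance(node, Quantifier):
        return free_variables(node.formula) - {node.variable}
    if isinstance(node, Lambda):
        return free_variables(node.body) - {node.param}
    if isinstance(node, Application):
        return free_variables(node.func) | free_variables(node.arg)
    raise TypeError(f"unknown node: {node!r}")


# λP. P(x) after scope resolution
term = Lambda(LambdaVar("P"), Application(LambdaVar("P"), Variable("x")))
print(free_variables(term))  # {Variable(name='x')}
```

Because `Variable("x")` and `LambdaVar("x")` compare unequal, an unresolved `Variable` inside a lambda body stays free; this is one reason scope resolution has to run before these operations.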
188
+
189
+ #### Beta-reduction
190
+
191
+ `beta_reduce` reduces to beta-normal form using a normal-order (leftmost-outermost) strategy with full capture-avoiding substitution. It raises `ReductionLimitError` after 10 000 steps if the term does not normalise.
192
+
193
+ ```python
194
+ # (λP. λx. P(x))(Q) → λx. Application(Atom("Q",[]), LambdaVar("x"))
195
+ result = beta_reduce(parser.parse("(λP. λx. P(x))(Q)"))
196
+
197
+ # Full pipeline: parse → resolve → reduce
198
+ reduced = beta_reduce(parser.parse("(λP. P(x))(λy. Q(y))"))
199
+ # Atom("Q", [Variable("x")])
200
+ ```
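Normal-order reduction with capture-avoiding substitution can be sketched on a minimal untyped lambda calculus. `Var`, `Lam`, `App` and the local `ReductionLimitError` are hypothetical stand-ins rather than the package's node types, predicate atoms are modelled as plain applications, and the 10 000-step default simply mirrors the limit described above.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    func: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


class ReductionLimitError(RuntimeError):
    """Local stand-in for the package's exception of the same name."""


def free(t: Term) -> set:
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free(t.body) - {t.param}
    return free(t.func) | free(t.arg)


def subst(t: Term, name: str, value: Term) -> Term:
    """Capture-avoiding substitution t[name := value]."""
    if isinstance(t, Var):
        return value if t.name == name else t
    if isinstance(t, App):
        return App(subst(t.func, name, value), subst(t.arg, name, value))
    if t.param == name:  # name is shadowed by this binder: nothing to replace
        return t
    if t.param in free(value) and name in free(t.body):
        # Substituting would capture a free variable of `value`: rename the binder.
        avoid = free(value) | free(t.body)
        new = next(f"{t.param}{i}" for i in count(1) if f"{t.param}{i}" not in avoid)
        t = Lam(new, subst(t.body, t.param, Var(new)))
    return Lam(t.param, subst(t.body, name, value))


def step(t: Term) -> Optional[Term]:
    """One leftmost-outermost beta step, or None if t is already beta-normal."""
    if isinstance(t, App):
        if isinstance(t.func, Lam):
            return subst(t.func.body, t.func.param, t.arg)
        reduced = step(t.func)
        if reduced is not None:
            return App(reduced, t.arg)
        reduced = step(t.arg)
        return None if reduced is None else App(t.func, reduced)
    if isinstance(t, Lam):
        reduced = step(t.body)
        return None if reduced is None else Lam(t.param, reduced)
    return None


def beta_reduce(t: Term, limit: int = 10_000) -> Term:
    """Apply at most `limit` beta steps; raise if no normal form is reached."""
    steps = 0
    while (nxt := step(t)) is not None:
        if steps == limit:
            raise ReductionLimitError(f"no beta-normal form within {limit} steps")
        t, steps = nxt, steps + 1
    return t


# (λP. P(x))(λy. Q(y))  →  Q(x)
term = App(Lam("P", App(Var("P"), Var("x"))), Lam("y", App(Var("Q"), Var("y"))))
print(beta_reduce(term))  # App(func=Var(name='Q'), arg=Var(name='x'))
```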
201
+
202
+ #### Eta-reduction
203
+
204
+ `eta_reduce` performs a single bottom-up pass contracting all eta-redexes: `λp. f(p) → f` when `p` is not free in `f`. Quantifiers are recursed into but never treated as eta-redexes.
205
+
206
+ ```python
207
+ from unicode_fol_kit import LambdaVar, Lambda, Application, Atom, Variable, eta_reduce
208
+
209
+ f = Atom("P", [Variable("x")]) # some formula
210
+ term = Lambda(LambdaVar("p"), Application(f, LambdaVar("p"))) # λp. f(p)
211
+ eta_reduce(term) # → f (the Atom node, not the Lambda)
212
+ ```
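A bottom-up pass contracts children before their parent, so nested redexes such as `λx. λy. g(x)(y)` collapse in a single call. The sketch reuses a minimal, hypothetical lambda calculus (`Var`, `Lam`, `App`) and leaves out quantifier nodes, which according to the description above are recursed into but never contracted.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    func: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def free(t: Term) -> set:
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free(t.body) - {t.param}
    return free(t.func) | free(t.arg)


def eta_reduce(t: Term) -> Term:
    """One bottom-up pass: reduce children first, then contract λp. f(p) → f."""
    if isinstance(t, App):
        return App(eta_reduce(t.func), eta_reduce(t.arg))
    if isinstance(t, Lam):
        body = eta_reduce(t.body)
        # Contract only if the body is f(p) and p does not occur free in f.
        if (isinstance(body, App) and body.arg == Var(t.param)
                and t.param not in free(body.func)):
            return body.func
        return Lam(t.param, body)
    return t


f = App(Var("P"), Var("x"))
print(eta_reduce(Lam("p", App(f, Var("p")))) == f)  # True
```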
213
+
214
+ #### Beta-eta normalisation
215
+
216
+ `beta_eta_normalize` alternates `beta_reduce` and `eta_reduce` until a fixpoint is reached (at most 100 rounds). The alternation is needed because eta-reduction can expose fresh beta-redexes that require another beta pass.
217
+
218
+ ```python
219
+ normal = beta_eta_normalize(parser.parse("(λP. P(x))(Q)"))
220
+ ```
221
+
222
+ `ReductionLimitError` is raised if the inner beta-reduction limit or the outer round limit is exceeded.
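The outer loop can be written generically over any pair of passes. In this sketch `normalize_to_fixpoint` is a hypothetical name and the local `ReductionLimitError` only mimics the package's exception; the inner step limit is left to whatever `beta` callable is passed in.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ReductionLimitError(RuntimeError):
    """Local stand-in: raised when no fixpoint is reached in time."""


def normalize_to_fixpoint(term: T, beta: Callable[[T], T], eta: Callable[[T], T],
                          max_rounds: int = 100) -> T:
    """Alternate a beta pass and an eta pass until a round changes nothing."""
    for _ in range(max_rounds):
        new = eta(beta(term))
        if new == term:
            return new
        term = new
    raise ReductionLimitError(f"no fixpoint after {max_rounds} rounds")
```

Comparing the whole term before and after a round keeps the driver independent of the AST; the cost is one structural equality check per round, which dataclass nodes provide for free.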
223
+
224
+ #### Scope resolution (standalone)
225
+
226
+ `resolve_lambda_scope` is also available as a standalone function for hand-built ASTs:
227
+
228
+ ```python
229
+ from unicode_fol_kit import resolve_lambda_scope, Lambda, LambdaVar, Atom, Variable
230
+
231
+ raw = Lambda(LambdaVar("x"), Atom("P", [Variable("x")]))
232
+ resolved = resolve_lambda_scope(raw)
233
+ # Lambda(LambdaVar("x"), Atom("P", [LambdaVar("x")]))
234
+ ```
235
+
236
+ ### Reducing MSFL formulas to classical FOL
237
+
238
+ `to_fol()` performs a two-phase reduction: it first lowers Łukasiewicz operators to classical ones (`to_msfol()`), then eliminates sort annotations via relativisation (`_relativize()`):
239
+
240
+ ```python
241
+ from unicode_fol_kit import MSFLParser, to_fol
242
+
243
+ parser = MSFLParser(many_sorted=True, fuzzy=True)
244
+ formula = parser.parse("∀x:Human (P(x) ∧ ¬Q(x))")
245
+
246
+ classical = to_fol(formula)
247
+ # Quantifier(∀, x, Implies(Atom(Human, [x]), And(Atom(P,[x]), Not(Atom(Q,[x])))))
248
+
249
+ # Optionally, conjoin sort-membership facts for constants at the top level:
250
+ classical_with_facts = to_fol(formula, include_sort_facts=True)
251
+ ```
252
+
253
+ ### Equivalence checking (Z3)
254
+
255
+ ```python
256
+ from unicode_fol_kit import MSFLParser, formulas_are_equivalent
257
+
258
+ parser = MSFLParser()
259
+ f1 = parser.parse("¬(P(x) ∧ Q(x))")
260
+ f2 = parser.parse("¬P(x) ∨ ¬Q(x)")
261
+
262
+ formulas_are_equivalent(f1, f2) # True
263
+ ```
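When the atoms of two formulas can be treated as independent propositional letters, the same De Morgan check can be reproduced without a solver by enumerating all truth assignments. This is a swapped-in technique for illustration only: `formulas_are_equivalent` uses Z3, and brute-force enumeration does not extend to genuinely quantified formulas.

```python
from itertools import product
from typing import Callable


def equivalent(f: Callable[..., bool], g: Callable[..., bool], n_atoms: int) -> bool:
    """True iff f and g agree on every assignment of n_atoms truth values."""
    return all(f(*vals) == g(*vals)
               for vals in product((False, True), repeat=n_atoms))


def lhs(p: bool, q: bool) -> bool:  # ¬(P(x) ∧ Q(x)), reading P(x) as p and Q(x) as q
    return not (p and q)


def rhs(p: bool, q: bool) -> bool:  # ¬P(x) ∨ ¬Q(x)
    return (not p) or (not q)


print(equivalent(lhs, rhs, 2))  # True
```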
264
+
265
+ ### Entailment checking (Prover9)
266
+
267
+ ```python
268
+ from unicode_fol_kit import MSFLParser, check_logical_entailment
269
+
270
+ parser = MSFLParser()
271
+ premises = [
272
+ parser.parse("∀x (Human(x) → Mortal(x))"),
273
+ parser.parse("Human(socrates)"),
274
+ ]
275
+ conclusion = parser.parse("Mortal(socrates)")
276
+
277
+ check_logical_entailment(premises, conclusion, prover9_path="/usr/bin/prover9") # True
278
+ ```
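An entailment check with Prover9 typically means passing the premises as assumptions and the conclusion as a goal. The helper below is a hypothetical sketch that only assembles such an input file in Prover9's `formulas(...)` / `end_of_list.` syntax from strings like those returned by `to_prover9()`; it does not claim to reproduce what `check_logical_entailment` writes internally.

```python
def prover9_input(assumptions: list[str], goal: str) -> str:
    """Build Prover9 input text from formulas already in Prover9 syntax."""
    lines = ["formulas(assumptions)."]
    lines += [f"  {a}." for a in assumptions]
    lines += ["end_of_list.", "", "formulas(goals).", f"  {goal}.", "end_of_list.", ""]
    return "\n".join(lines)


text = prover9_input(
    ["all x (Human(x) -> Mortal(x))", "Human(socrates)"],
    "Mortal(socrates)",
)
print(text)
```

The resulting text can be saved to a file and run with `prover9 -f input.in`; Prover9 reports a proof when the goal follows from the assumptions.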
279
+
280
+ ## Syntax reference
281
+
282
+ This section describes the full surface syntax accepted by the parser. Because the three modes share the same term and atom layer, most of the syntax is identical across modes; differences are called out explicitly.
283
+
284
+ ### Tokens
285
+
286
+ The lexer distinguishes the following token kinds. Because the patterns are mutually exclusive, a given identifier is unambiguously a variable, a constant, a function/predicate name, a number, or a sort annotation.
287
+
288
+ | Token | Pattern | Examples | Meaning |
289
+ |---|---|---|---|
290
+ | Variable | one lowercase letter, optional trailing digits | `x`, `y`, `x1`, `z42` | a (possibly bound) logical variable |
291
+ | Name | lowercase, at least two letters, may contain digits and uppercase after the first letter | `socrates`, `distance`, `centerOf`, `foo1` | a bare constant or a function symbol |
292
+ | Constant (`c_`) | `c_` followed by letters/digits | `c_a`, `c_zero`, `c_42` | an explicitly marked constant |
293
+ | Predicate | one uppercase letter, then letters/digits | `P`, `Human`, `OnSurfaceOf` | a predicate symbol |
294
+ | Number | digits, optional decimal part | `0`, `42`, `3.14` | a numeric literal |
295
+ | Sort annotation | `:` followed by an uppercase letter and letters/digits | `:Human`, `:Sort1` | a sort tag *(MSFOL and MSFL modes only)* |
296
+
297
+ The `c_` form exists so that **single-letter constants** can be written without colliding with variables. A bare `a` is always a variable; if you need the constant *a*, write `c_a`.
298
+
299
+ A Name immediately followed by a parenthesised argument list is a function application, e.g. `distance(x, y)`; without an argument list the same Name is a bare constant. Predicates use the same call syntax (`Human(socrates)`) but, being a separate token class, may also appear without arguments.
300
+
301
+ The sort annotation token always begins with `:`, which makes it lexically disjoint from all other tokens. **Whitespace before the colon is optional**: `∀x:Human P(x)` and `∀x :Human P(x)` are both valid and produce identical parse trees.
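The token table can be approximated with one ordered regular expression per kind. The patterns below are an assumption derived from the table (for example, NAME is taken to require a letter in second position), not a copy of the package's Lark terminals, and `tokenize` is a hypothetical helper.

```python
import re

# Order matters: CONSTANT and NAME are tried before VARIABLE, so "c_a" is one
# constant and "foo1" is one name instead of a variable "c"/"f" plus leftovers.
TOKEN_SPEC = [
    ("CONSTANT", r"c_[A-Za-z0-9]+"),
    ("NAME", r"[a-z][A-Za-z][A-Za-z0-9]*"),
    ("VARIABLE", r"[a-z][0-9]*"),
    ("PREDICATE", r"[A-Z][A-Za-z0-9]*"),
    ("NUMBER", r"[0-9]+(?:\.[0-9]+)?"),
    ("SORT", r":[A-Z][A-Za-z0-9]*"),
    ("SYMBOL", r"[∀∃∧∨¬→↔⊕⊗=≠<>≤≥+\-*/()\[\],λ.]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs; whitespace is dropped, unknown characters raise."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"invalid character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("∀x:Human P(c_a, alice:Human)"))
```

Because the SORT pattern starts with `:`, whitespace before the colon is simply skipped, which is why `∀x :Human` and `∀x:Human` produce the same token stream here.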
302
+
303
+ ### Terms
304
+
305
+ A term is one of:
306
+
307
+ - a variable (`x`, `x1`)
308
+ - a constant (`socrates`, `c_a`) or number (`42`, `3.14`)
309
+ - in MSFOL / MSFL modes: a **sort-annotated constant** (`alice:Human`, `c_a:Sort1`)
310
+ - a function application (`f(t1, …, tn)`, e.g. `centerOf(x)`)
311
+ - an arithmetic combination of terms using `+`, `-`, `*`, `/`
312
+ - a parenthesised term (`(t)`)
313
+
314
+ Arithmetic follows the usual precedence: `*` and `/` bind tighter than `+` and `-`, and both groups are left-associative. For example, `x + y * z` parses as `x + (y * z)`.
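The two precedence levels map onto two mutually recursive functions, and a `while` loop at each level yields left associativity. This is a standalone recursive-descent sketch (`parse_arith` is hypothetical and returns nested tuples rather than the package's `Function` nodes).

```python
import re


def parse_arith(text: str):
    """Parse + - * / into nested tuples; * and / bind tighter, all left-assoc."""
    # Identifiers, numbers, operators and parentheses; characters matching
    # none of these (e.g. whitespace) are skipped by findall.
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+(?:\.[0-9]+)?|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = peek()
        if tok == "(":
            take()
            node = expr()
            if peek() != ")":
                raise ValueError("expected ')'")
            take()
            return node
        if tok is None or tok in {"+", "-", "*", "/", ")"}:
            raise ValueError(f"expected a term, got {tok!r}")
        return take()

    def term():  # multiplicative level
        node = primary()
        while peek() in ("*", "/"):
            node = (take(), node, primary())
        return node

    def expr():  # additive level; the loop makes it left-associative
        node = term()
        while peek() in ("+", "-"):
            node = (take(), node, term())
        return node

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


print(parse_arith("x + y * z"))  # ('+', 'x', ('*', 'y', 'z'))
```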
315
+
316
+ **Sort rules in MSFOL / MSFL modes:** variables are sorted implicitly by the quantifier that binds them; ground constants must carry an explicit sort annotation. An unsorted constant (e.g. bare `alice`) is a syntax error in sorted modes.
317
+
318
+ ### Atomic formulas
319
+
320
+ An atomic formula is either:
321
+
322
+ - a predicate applied to terms: `P`, `Human(socrates)`, `OnSurfaceOf(y, x)`
323
+ (a predicate may be nullary, i.e. used without arguments)
324
+ - an infix comparison between two terms: `=`, `≠`, `<`, `>`, `≤`, `≥`,
325
+ e.g. `x1 + 1 = y1` or `distance(y, c) > distance(z, c)`
326
+
327
+ ### Compound formulas
328
+
329
+ Atomic formulas are combined with connectives and quantifiers. The available connectives and their interpretations depend on the mode:
330
+
331
+ #### FOL mode
332
+
333
+ | Syntax | Operator | Interpretation |
334
+ |---|---|---|
335
+ | `¬φ` | negation | classical |
336
+ | `φ ∧ ψ` | conjunction | classical |
337
+ | `φ ∨ ψ` | disjunction | classical |
338
+ | `φ ⊕ ψ` | exclusive or | classical |
339
+ | `φ → ψ` | implication | classical |
340
+ | `φ ↔ ψ` | biconditional | classical |
341
+ | `∀x φ` | universal | unsorted |
342
+ | `∃x φ` | existential | unsorted |
343
+
344
+ #### MSFOL mode
345
+
346
+ Same connectives as FOL **except `⊕` (exclusive or) is not available**. Quantifiers require a sort annotation:
347
+
348
+ | Syntax | Operator |
349
+ |---|---|
350
+ | `¬φ`, `φ ∧ ψ`, `φ ∨ ψ`, `φ → ψ`, `φ ↔ ψ` | classical (as FOL) |
351
+ | `∀x:Sort φ`, `∃x:Sort φ` | sorted quantifiers |
352
+
353
+ #### MSFL mode
354
+
355
+ Connectives are reinterpreted as Łukasiewicz operators:
356
+
357
+ | Syntax | Operator | Semantics |
358
+ |---|---|---|
359
+ | `¬φ` | Łuk. negation | 1 − φ |
360
+ | `φ ∧ ψ` | weak conjunction | min(φ, ψ) |
361
+ | `φ ∨ ψ` | weak disjunction | max(φ, ψ) |
362
+ | `φ ⊗ ψ` | strong conjunction | max(0, φ + ψ − 1) |
363
+ | `φ ⊕ ψ` | strong disjunction | min(1, φ + ψ) |
364
+ | `φ → ψ` | Łuk. implication | min(1, 1 − φ + ψ) |
365
+ | `φ ↔ ψ` | Łuk. equivalence | 1 − \|φ − ψ\| |
366
+ | `∀x:Sort φ`, `∃x:Sort φ` | sorted quantifiers | |
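The semantics column translates directly into functions on truth values in [0, 1]. These are standalone illustrative helpers, not code from the package, which represents the operators as AST nodes.

```python
# Truth values are floats in [0, 1]; each function mirrors one row of the table.
def luk_neg(a: float) -> float:
    return 1.0 - a


def weak_and(a: float, b: float) -> float:
    return min(a, b)


def weak_or(a: float, b: float) -> float:
    return max(a, b)


def strong_and(a: float, b: float) -> float:  # Łukasiewicz t-norm
    return max(0.0, a + b - 1.0)


def strong_or(a: float, b: float) -> float:  # Łukasiewicz t-conorm
    return min(1.0, a + b)


def luk_implies(a: float, b: float) -> float:
    return min(1.0, 1.0 - a + b)


def luk_iff(a: float, b: float) -> float:
    return 1.0 - abs(a - b)


a, b = 0.75, 0.5
print(weak_and(a, b), strong_and(a, b))  # 0.5 0.25
```

Two identities are handy for checking such definitions: `luk_implies(a, b)` equals `strong_or(luk_neg(a), b)`, and `luk_iff(a, b)` equals `weak_and(luk_implies(a, b), luk_implies(b, a))`. On the crisp values 0 and 1, every operator coincides with its classical counterpart.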
367
+
368
+ A formula may be wrapped in parentheses `( … )` or square brackets `[ … ]`; the two are interchangeable for grouping.
369
+
370
+ ### Operator precedence
371
+
372
+ The precedence levels are the same across all three modes (MSFL uses the same syntactic structure with Łukasiewicz semantics):
373
+
374
+ | Precedence | Operators | Associativity |
375
+ |---|---|---|
376
+ | 1 (highest) | `¬`, quantifiers `∀` / `∃` | prefix |
377
+ | 2 | `∧` `∨` `⊕` (FOL) / `∧` `∨` (MSFOL) / `∧` `∨` `⊗` `⊕` (MSFL) | left |
378
+ | 3 | `→` | right |
379
+ | 4 (lowest) | `↔` | right |
380
+
381
+ Worked examples (parenthesised to show how the parser groups them):
382
+
383
+ - `¬P(x) ∧ Q(x)` → `(¬P(x)) ∧ Q(x)` — negation binds tighter than conjunction
384
+ - `P(x) ∧ Q(x) → R(x)` → `(P(x) ∧ Q(x)) → R(x)` — conjunction binds tighter than implication
385
+ - `P(x) → Q(x) ↔ R(x)` → `(P(x) → Q(x)) ↔ R(x)` — implication binds tighter than biconditional
386
+ - `P(x) → Q(x) → R(x)` → `P(x) → (Q(x) → R(x))` — implication is right-associative
387
+ - `P(x) ∧ Q(x) ∧ R(x)` → `(P(x) ∧ Q(x)) ∧ R(x)` — conjunction is left-associative
388
+
389
+ ### Mixing same-level operators
390
+
391
+ The same-level connectives (level 2 above) **cannot be mixed without explicit parentheses**. This is deliberate: it avoids the silent, easy-to-misread grouping that a default precedence would impose.
392
+
393
+ **FOL mode** — `∧`, `∨`, `⊕` cannot be mixed:
394
+
395
+ ```text
396
+ P(x) ∧ Q(x) ∨ R(x) # rejected
397
+ (P(x) ∧ Q(x)) ∨ R(x) # accepted
398
+ ```
399
+
400
+ **MSFOL mode** — `∧` and `∨` cannot be mixed:
401
+
402
+ ```text
403
+ P(x) ∧ Q(x) ∨ R(x) # rejected
404
+ (P(x) ∧ Q(x)) ∨ R(x) # accepted
405
+ ```
406
+
407
+ **MSFL mode** — `∧`, `∨`, `⊗`, `⊕` cannot be mixed:
408
+
409
+ ```text
410
+ P(x) ∧ Q(x) ⊗ R(x) # rejected
411
+ (P(x) ∧ Q(x)) ⊗ R(x) # accepted
412
+ ```
413
+
414
+ A chain of the *same* operator is always fine: `P ∧ Q ∧ R`, `P ⊗ Q ⊗ R`, etc.
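The precedence table and the no-mixing rule fit naturally into a recursive-descent parser with one function per level, where the level-2 loop remembers which operator started the chain. This is a simplified sketch over single-letter propositional atoms and FOL-mode connectives only (no quantifiers, terms or sorts); `parse_formula` is hypothetical and is not the package's Lark-based grammar.

```python
LEVEL2 = {"∧", "∨", "⊕"}


def parse_formula(text: str):
    """Parse ¬, ∧/∨/⊕ (unmixed, left), → (right), ↔ (right) into nested tuples."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = peek()
        if tok == "(":
            take()
            node = iff()
            if peek() != ")":
                raise ValueError("expected ')'")
            take()
            return node
        if tok is not None and tok.isupper():  # single-letter atom
            return take()
        raise ValueError(f"unexpected token {tok!r}")

    def unary():  # level 1: prefix negation
        if peek() == "¬":
            take()
            return ("¬", unary())
        return primary()

    def level2():  # the first operator fixes the chain; a different one is rejected
        node = unary()
        op = None
        while peek() in LEVEL2:
            if op is not None and peek() != op:
                raise ValueError(f"cannot mix {op} and {peek()} without parentheses")
            op = take()
            node = (op, node, unary())
        return node

    def implication():  # level 3: right-associative via recursion
        left = level2()
        if peek() == "→":
            take()
            return ("→", left, implication())
        return left

    def iff():  # level 4: right-associative via recursion
        left = implication()
        if peek() == "↔":
            take()
            return ("↔", left, iff())
        return left

    result = iff()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


print(parse_formula("P ∧ Q → R"))  # ('→', ('∧', 'P', 'Q'), 'R')
```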
415
+
416
+ ### Quantifier scope
417
+
418
+ A quantifier binds **only the immediately following (tightly bound) formula**, not the rest of the line:
419
+
420
+ ```text
421
+ ∀x P(x) ∧ Q(x) # parses as (∀x P(x)) ∧ Q(x)
422
+ ∀x P(x) → Q(x) # parses as (∀x P(x)) → Q(x)
423
+ ```
424
+
425
+ If you intend the quantifier to range over the whole formula — which is usually what is meant — **add parentheses**:
426
+
427
+ ```text
428
+ ∀x (P(x) → Q(x)) # quantifier ranges over the implication
429
+ ∀x (P(x) ∧ Q(x)) # quantifier ranges over the conjunction
430
+ ```
431
+
432
+ Quantifiers can be stacked directly: `∀x:H ∀y:H ∃z:A φ`.
433
+
434
+ ### Supported symbols
435
+
436
+ | Category | FOL | MSFOL | MSFL |
437
+ |---|---|---|---|
438
+ | Quantifiers | `∀` `∃` (unsorted) | `∀` `∃` (sorted `:Sort`) | `∀` `∃` (sorted `:Sort`) |
439
+ | Connectives | `∧` `∨` `⊕` `¬` `→` `↔` | `∧` `∨` `¬` `→` `↔` | `∧` `∨` `⊗` `⊕` `¬` `→` `↔` |
440
+ | Lambda | `λ` | `λ` | `λ` |
441
+ | Sort annotations | — | `:Sort` | `:Sort` |
442
+ | Equality / comparison | `=` `≠` `<` `>` `≤` `≥` | same | same |
443
+ | Arithmetic | `+` `-` `*` `/` | same | same |
444
+ | Grouping | `(` `)` `[` `]` | same | same |
445
+ | Argument separator | `,` | same | same |
446
+
447
+ Whitespace is insignificant and may be used freely between tokens — including before sort annotation colons.
448
+
449
+ ### Lambda abstraction and application (all modes)
450
+
451
+ A lambda abstraction is written `λ` followed by a parameter name, a literal `.`, and a body formula. All three parser modes support identical lambda surface notation.
452
+
453
+ #### Parameter types
454
+
455
+ | Parameter form | Example | Typical use |
456
+ |---|---|---|
457
+ | Single lowercase letter | `λx. P(x)` | value variable |
458
+ | Multi-letter lowercase name | `λfoo. P(foo(x))` | named-constant parameter |
459
+ | Uppercase predicate symbol | `λP. P(x)` | predicate / higher-order parameter |
460
+
461
+ All three token classes become a `LambdaVar` in the AST. Scope resolution (applied automatically by `parse()`) then rewrites body occurrences of the lambda-bound name:
462
+
463
+ - **Variable occurrence** — `λx. P(x)`: the `x` in `P(x)` becomes `LambdaVar("x")`.
464
+ - **Predicate-application occurrence** — `λP. P(x)`: the `P(x)` in the body becomes `Application(LambdaVar("P"), Variable("x"))`. Multi-argument atoms curry left: `P(x, y)` → `Application(Application(LambdaVar("P"), x), y)`.
465
+ - **Named-function occurrence** — `λfoo. P(foo(x))`: the `foo(x)` in `P`'s argument list (a term-level function call) becomes `Application(LambdaVar("foo"), Variable("x"))`.
466
+
467
+ The scope obeys the **innermost-binder rule**: a quantifier removes the quantified name from the lambda-bound set. Inside `λx. ∀x P(x)`, the `x` in `P(x)` is logical (stays `Variable`).
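The rewriting rules and the innermost-binder rule can be expressed as one recursive pass that carries the set of names currently bound by a lambda. The node classes below are simplified, hypothetical re-declarations (tuples for arguments), term-level calls such as `foo(x)` are left out for brevity, and this is not the package's `resolve_lambda_scope`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class LambdaVar:
    name: str


@dataclass(frozen=True)
class Atom:
    predicate: str
    args: tuple


@dataclass(frozen=True)
class Quantifier:
    type: str
    variable: Variable
    formula: object


@dataclass(frozen=True)
class Lambda:
    param: LambdaVar
    body: object


@dataclass(frozen=True)
class Application:
    func: object
    arg: object


def resolve(node, bound=frozenset()):
    """Rewrite occurrences of lambda-bound names; the innermost binder wins."""
    if isinstance(node, Variable):
        return LambdaVar(node.name) if node.name in bound else node
    if isinstance(node, Atom):
        args = [resolve(a, bound) for a in node.args]
        if node.predicate not in bound:
            return Atom(node.predicate, tuple(args))
        result = LambdaVar(node.predicate)
        for arg in args:  # P(x, y) becomes ((P x) y): curry to the left
            result = Application(result, arg)
        return result
    if isinstance(node, Quantifier):  # the quantified name is no longer lambda-bound
        inner = resolve(node.formula, bound - {node.variable.name})
        return Quantifier(node.type, node.variable, inner)
    if isinstance(node, Lambda):
        return Lambda(node.param, resolve(node.body, bound | {node.param.name}))
    if isinstance(node, Application):
        return Application(resolve(node.func, bound), resolve(node.arg, bound))
    return node


print(resolve(Lambda(LambdaVar("P"), Atom("P", (Variable("x"),)))))
```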
468
+
469
+ #### Body scope
470
+
471
+ The body extends rightward through all connectives — lambda has lower precedence than every binary operator:
472
+
473
+ ```text
474
+ λx. P(x) ∧ Q(x) # body is the And node P(x) ∧ Q(x)
475
+ λx. P(x) → Q(x) # body is the Implies node P(x) → Q(x)
476
+ ```
477
+
478
+ Multi-parameter lambdas are written by nesting: `λP. λx. P(x)`.
479
+
480
+ #### Application syntax
481
+
482
+ A lambda application requires both sides to be parenthesised: `(func)(arg)`.
483
+
484
+ ```text
485
+ (λx. P(x))(a) # arg is variable a
486
+ (λx. P(x))(alice) # arg is constant alice
487
+ (λP. P(x))(Q) # arg is the zero-arity atom Q
488
+ (λP. P(x))(Q(y)) # arg is the atom Q(y)
489
+ ```
490
+
491
+ Higher-order application inside the body — a predicate parameter applied to arguments — is written in the natural `P(x)` notation, not as `(P)(x)`. Scope resolution handles the rewrite automatically.
492
+
493
+ #### Parse examples
494
+
495
+ ```python
496
+ parser = MSFLParser()
497
+
498
+ parser.parse("λx. P(x)")
499
+ # Lambda(LambdaVar("x"), Atom("P", [LambdaVar("x")]))
500
+
501
+ parser.parse("λP. P(x)")
502
+ # Lambda(LambdaVar("P"), Application(LambdaVar("P"), Variable("x")))
503
+
504
+ parser.parse("λP. λx. P(x)")
505
+ # Lambda(LambdaVar("P"), Lambda(LambdaVar("x"), Application(LambdaVar("P"), LambdaVar("x"))))
506
+
507
+ parser.parse("λx. ∀x P(x)")
508
+ # Lambda(LambdaVar("x"), Quantifier("∀", Variable("x"), Atom("P", [Variable("x")])))
509
+ # x inside ∀ is quantifier-bound — NOT rewritten to LambdaVar
510
+
511
+ parser.parse("(λP. P(x))(Q)")
512
+ # Application(Lambda(LambdaVar("P"), Application(LambdaVar("P"), Variable("x"))), Atom("Q", []))
513
+ ```
514
+
515
+ ### A complete FOL example
516
+
517
+ ```text
518
+ ∀x ((Object(x) ∧ HasThreeDimensionalShape(x) ∧
519
+      ∀y ∀z ((Point(y) ∧ OnSurfaceOf(y, x) ∧ Point(z) ∧ OnSurfaceOf(z, x))
519
+              → distance(y, centerOf(x)) = distance(z, centerOf(x))))
520
+     → Sphere(x))
522
+ ```
523
+
524
+ ### A complete MSFOL example
525
+
526
+ ```text
527
+ ∀x:Person ∀y:Person ((Knows(x, y) ∧ Trusted(y:Person)) → Shares(x, y))
528
+ ```
529
+
530
+ ### A complete MSFL example
531
+
532
+ ```text
533
+ ∀x:Patient ∀y:Treatment
534
+   ((Effective(y:Treatment) ⊗ Tolerable(x:Patient, y:Treatment))
534
+     → Recommended(x:Patient, y:Treatment))
536
+ ```
537
+
538
+ ## AST nodes
539
+
540
+ All nodes are Python dataclasses and can be imported from `unicode_fol_kit`.
541
+
542
+ ### Shared term and atom nodes (all modes)
543
+
544
+ | Class | Fields | Notes |
545
+ |---|---|---|
546
+ | `Variable` | `name: str` | bound or free variable |
547
+ | `Constant` | `name: str` | bare or `c_`-prefixed constant |
548
+ | `Number` | `value: int \| float` | numeric literal |
549
+ | `Function` | `name: str`, `args: list` | function application and arithmetic ops |
550
+ | `Atom` | `predicate: str`, `args: list` | predicate or infix comparison |
551
+
552
+ ### Classical formula nodes (FOL / MSFOL)
553
+
554
+ | Class | Fields |
555
+ |---|---|
556
+ | `Not` | `formula` |
557
+ | `And` | `left`, `right` |
558
+ | `Or` | `left`, `right` |
559
+ | `Xor` | `left`, `right` *(FOL only)* |
560
+ | `Implies` | `left`, `right` |
561
+ | `Iff` | `left`, `right` |
562
+ | `Quantifier` | `type: str`, `variable`, `formula` *(FOL only)* |
563
+
564
+ ### MSFOL / MSFL nodes
565
+
566
+ | Class | Fields | Notes |
567
+ |---|---|---|
568
+ | `SortedQuantifier` | `type: str`, `variable`, `sort: str`, `formula` | sort annotation without leading `:` |
569
+ | `SortedConstant` | `name: str`, `sort: str` | sort annotation without leading `:` |
570
+
571
+ ### MSFL Łukasiewicz nodes
572
+
573
+ | Class | Fields | Semantics |
574
+ |---|---|---|
575
+ | `LukNegation` | `formula` | 1 − φ |
576
+ | `WeakConjunction` | `left`, `right` | min(φ, ψ) |
577
+ | `WeakDisjunction` | `left`, `right` | max(φ, ψ) |
578
+ | `StrongConjunction` | `left`, `right` | max(0, φ + ψ − 1) |
579
+ | `StrongDisjunction` | `left`, `right` | min(1, φ + ψ) |
580
+ | `LukImplication` | `left`, `right` | min(1, 1 − φ + ψ) |
581
+ | `LukEquivalence` | `left`, `right` | 1 − \|φ − ψ\| |
582
+
583
+ ### Lambda-calculus nodes (all modes)
584
+
585
+ | Class | Fields | Notes |
586
+ |---|---|---|
587
+ | `LambdaVar` | `name: str` | lambda-bound variable; frozen and hashable — distinct from `Variable` |
588
+ | `Lambda` | `param: LambdaVar`, `body: Node` | lambda abstraction `λparam. body` |
589
+ | `Application` | `func: Node`, `arg: Node` | lambda application `func(arg)` |
590
+
591
+ `LambdaVar` is kept separate from `Variable` so that logical binding (by quantifiers) and lambda binding never get confused. `free_variables()` returns a mixed set that may contain both.
592
+
593
+ ### Reductions
594
+
595
+ Every MSFL node implements two reduction steps:
596
+
597
+ - **`to_msfol()`** — lowers Łukasiewicz connectives to classical nodes while preserving sort annotations (`SortedQuantifier` and `SortedConstant` survive unchanged).
598
+ - **`_relativize(facts)`** — eliminates sort annotations by replacing `∀x:S φ` with `∀x (S(x) → φ)` and `∃x:S φ` with `∃x (S(x) ∧ φ)`, and replacing `SortedConstant(name, sort)` with a plain `Constant(name)`.
599
+
600
+ The top-level helper `to_fol(node, include_sort_facts=False)` chains both steps and optionally conjoins sort-membership atoms for all ground constants at the top level.
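The relativisation rules can be sketched as a recursive rewrite that threads a set of collected sort facts. The classes and the `relativize` / `to_fol_sketch` names are hypothetical stand-ins; only the rewriting rules come from the description above, and the Łukasiewicz lowering done by `to_msfol()` is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Constant:
    name: str


@dataclass(frozen=True)
class SortedConstant:
    name: str
    sort: str


@dataclass(frozen=True)
class Atom:
    predicate: str
    args: tuple


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Implies:
    left: object
    right: object


@dataclass(frozen=True)
class Quantifier:
    type: str
    variable: Variable
    formula: object


@dataclass(frozen=True)
class SortedQuantifier:
    type: str
    variable: Variable
    sort: str
    formula: object


def relativize(node, facts: set):
    """Drop sort annotations; record S(c) in `facts` for each sorted constant."""
    if isinstance(node, SortedConstant):
        facts.add(Atom(node.sort, (Constant(node.name),)))
        return Constant(node.name)
    if isinstance(node, Atom):
        return Atom(node.predicate, tuple(relativize(a, facts) for a in node.args))
    if isinstance(node, (And, Implies)):
        return type(node)(relativize(node.left, facts), relativize(node.right, facts))
    if isinstance(node, SortedQuantifier):
        guard = Atom(node.sort, (node.variable,))
        body = relativize(node.formula, facts)
        # ∀x:S φ becomes ∀x (S(x) → φ); ∃x:S φ becomes ∃x (S(x) ∧ φ)
        new_body = Implies(guard, body) if node.type == "∀" else And(guard, body)
        return Quantifier(node.type, node.variable, new_body)
    return node  # Variable, Constant, ...


def to_fol_sketch(node, include_sort_facts: bool = False):
    facts: set = set()
    result = relativize(node, facts)
    if include_sort_facts:
        for fact in sorted(facts, key=repr):  # deterministic order
            result = And(fact, result)
    return result


# ∀x:Human Knows(x, alice:Human)
f = SortedQuantifier("∀", Variable("x"), "Human",
                     Atom("Knows", (Variable("x"), SortedConstant("alice", "Human"))))
print(to_fol_sketch(f, include_sort_facts=True))
```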
601
+
602
+ ## Error handling
603
+
604
+ Parse errors are reported with human-readable messages rather than raw parser internals. Lexer-level problems (an invalid character, a malformed name or number) raise `NamingError`; structural problems (an incomplete formula, a misplaced operator, or an attempt to mix same-level connectives without parentheses) raise `ParsingError`. Both report the offending position and, where useful, a hint. The hint text is mode-aware:
605
+
606
+ ```python
607
+ from unicode_fol_kit import MSFLParser
608
+
609
+ # FOL mode — hint names ∧, ∨, and ⊕
610
+ MSFLParser().parse("P(x) ∧ Q(x) ∨ R(x)")
611
+ # SYNTAX_ERROR: … Hint: Cannot mix conjunction (∧), disjunction (∨), and exclusive or (⊕) without parentheses
612
+
613
+ # MSFOL mode — hint names only ∧ and ∨
614
+ MSFLParser(many_sorted=True).parse("P(x) ∧ Q(x) ∨ R(x)")
615
+ # SYNTAX_ERROR: … Hint: Cannot mix conjunction (∧) and disjunction (∨) without parentheses
616
+
617
+ # MSFL mode — hint names all four Łukasiewicz connectives
618
+ MSFLParser(many_sorted=True, fuzzy=True).parse("P(x) ∧ Q(x) ⊗ R(x)")
619
+ # SYNTAX_ERROR: … Hint: Cannot mix weak conjunction (∧), weak disjunction (∨),
620
+ # strong conjunction (⊗), and strong disjunction (⊕) without parentheses
621
+ ```
622
+
623
+ ## Citation
624
+
625
+ If you use this toolkit in academic work, please cite the accompanying preprint:
626
+
627
+ ```bibtex
628
+ @misc{vossel2025advancingnaturallanguageformalization,
629
+ title={Advancing Natural Language Formalization to First Order Logic with Fine-tuned LLMs},
630
+ author={Felix Vossel and Till Mossakowski and Björn Gehrke},
631
+ year={2025},
632
+ eprint={2509.22338},
633
+ archivePrefix={arXiv},
634
+ primaryClass={cs.CL},
635
+ url={https://arxiv.org/abs/2509.22338},
636
+ }
637
+ ```
638
+
639
+ > Vossel, F., Mossakowski, T., & Gehrke, B. (2025). *Advancing Natural Language
640
+ > Formalization to First Order Logic with Fine-tuned LLMs.* arXiv preprint
641
+ > arXiv:2509.22338.
642
+
643
+ ## License
644
+
645
+ MIT