unicode-fol-kit 0.1.0__tar.gz → 0.2.0__tar.gz
- unicode_fol_kit-0.2.0/PKG-INFO +645 -0
- unicode_fol_kit-0.2.0/README.md +621 -0
- unicode_fol_kit-0.2.0/pyproject.toml +44 -0
- unicode_fol_kit-0.2.0/unicode_fol_kit/__init__.py +39 -0
- unicode_fol_kit-0.2.0/unicode_fol_kit/fol/__init__.py +35 -0
- unicode_fol_kit-0.1.0/unicode_fol_kit/fol/nodes.py → unicode_fol_kit-0.2.0/unicode_fol_kit/fol/_fol_nodes.py +41 -8
- unicode_fol_kit-0.2.0/unicode_fol_kit/fol/_msfl_nodes.py +954 -0
- unicode_fol_kit-0.2.0/unicode_fol_kit/fol/grammars/fl.lark +68 -0
- unicode_fol_kit-0.1.0/unicode_fol_kit/fol/syntax.lark → unicode_fol_kit-0.2.0/unicode_fol_kit/fol/grammars/fol.lark +17 -27
- unicode_fol_kit-0.2.0/unicode_fol_kit/fol/grammars/msfl.lark +68 -0
- unicode_fol_kit-0.2.0/unicode_fol_kit/fol/grammars/msfol.lark +64 -0
- unicode_fol_kit-0.2.0/unicode_fol_kit/fol/grammars/terminals.lark +17 -0
- unicode_fol_kit-0.2.0/unicode_fol_kit/fol/msflparser.py +182 -0
- {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/unicode_fol_kit/fol/naming.py +31 -13
- unicode_fol_kit-0.2.0/unicode_fol_kit/fol/nodes.py +45 -0
- unicode_fol_kit-0.1.0/.gitattributes +0 -2
- unicode_fol_kit-0.1.0/PKG-INFO +0 -302
- unicode_fol_kit-0.1.0/README.md +0 -291
- unicode_fol_kit-0.1.0/pyproject.toml +0 -18
- unicode_fol_kit-0.1.0/requirements.txt +0 -9
- unicode_fol_kit-0.1.0/test.py +0 -7
- unicode_fol_kit-0.1.0/unicode_fol_kit/__init__.py +0 -17
- unicode_fol_kit-0.1.0/unicode_fol_kit/fol/__init__.py +0 -15
- unicode_fol_kit-0.1.0/unicode_fol_kit/fol/folparser.py +0 -21
- {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/.gitignore +0 -0
- {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/LICENSE +0 -0
- {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/unicode_fol_kit/atp/__init__.py +0 -0
- {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/unicode_fol_kit/atp/prover9_entailment.py +0 -0
- {unicode_fol_kit-0.1.0 → unicode_fol_kit-0.2.0}/unicode_fol_kit/atp/z3_equivalence.py +0 -0
|
@@ -0,0 +1,645 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: unicode-fol-kit
|
|
3
|
+
Version: 0.2.0
|
|
4
|
+
Summary: Parser and toolkit for first-order logic formulas using Unicode operators
|
|
5
|
+
Project-URL: Repository, https://github.com/fvossel/unicode-fol-kit
|
|
6
|
+
Project-URL: Issues, https://github.com/fvossel/unicode-fol-kit/issues
|
|
7
|
+
Author-email: Felix Vossel <felixvossel@gmail.com>
|
|
8
|
+
License: MIT
|
|
9
|
+
License-File: LICENSE
|
|
10
|
+
Keywords: first-order-logic,fol,lambda-calculus,logic,lukasiewicz,parser
|
|
11
|
+
Classifier: Development Status :: 3 - Alpha
|
|
12
|
+
Classifier: Intended Audience :: Developers
|
|
13
|
+
Classifier: Intended Audience :: Science/Research
|
|
14
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
15
|
+
Classifier: Programming Language :: Python :: 3
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
19
|
+
Classifier: Topic :: Scientific/Engineering
|
|
20
|
+
Requires-Python: >=3.10
|
|
21
|
+
Requires-Dist: lark>=1.1
|
|
22
|
+
Requires-Dist: z3-solver>=4.12
|
|
23
|
+
Description-Content-Type: text/markdown
|
|
24
|
+
|
|
25
|
+
# unicode-fol-kit
|
|
26
|
+
|
|
27
|
+
A Python toolkit for parsing and working with first-order logic (FOL) formulas written with Unicode operators. The single parser class `MSFLParser` supports three modes — classical FOL, many-sorted FOL (MSFOL), and many-sorted fuzzy logic (MSFL, Łukasiewicz) — selected by constructor flags.
|
|
28
|
+
|
|
29
|
+
## Features
|
|
30
|
+
|
|
31
|
+
- **Three parser modes** — FOL, many-sorted FOL (MSFOL), and many-sorted fuzzy/Łukasiewicz logic (MSFL), all from one class
|
|
32
|
+
- **Unicode surface syntax** — natural symbols (∀ ∃ ∧ ∨ ¬ → ↔ ⊕ ⊗ = ≠ ≤ ≥) with no ASCII fallbacks needed
|
|
33
|
+
- **Sorted quantifiers and constants** — `∀x:Human P(x)`, `P(alice:Human)` in MSFOL and MSFL modes
|
|
34
|
+
- **Łukasiewicz operators** — weak ∧ / ∨ (min/max), strong ⊗ / ⊕ (t-norm/t-conorm), and Łukasiewicz ¬ → ↔ in MSFL mode
|
|
35
|
+
- **Full AST** — all standard FOL constructs plus MSFL-specific nodes, all as Python dataclasses
|
|
36
|
+
- **Reductions** — `to_msfol()` lowers Łukasiewicz operators to classical nodes; `to_fol()` further eliminates sorts via relativisation
|
|
37
|
+
- **Serialisation** — convert formulas to/from JSON dictionaries; round-trip safe
|
|
38
|
+
- **Tree view** — render any formula as a readable ASCII tree
|
|
39
|
+
- **Z3 export** — translate formulas to Z3 expressions for SMT solving
|
|
40
|
+
- **Prover9 export** — translate formulas to Prover9 syntax for automated theorem proving
|
|
41
|
+
- **TPTP export** — translate formulas to TPTP syntax
|
|
42
|
+
- **Equivalence checking** — check if two formulas are logically equivalent via Z3
|
|
43
|
+
- **Entailment checking** — check if a conclusion follows from premises via Prover9
|
|
44
|
+
- **Lambda abstraction** — `λx. φ` syntax in all three parser modes; parameters can be variables (`λx.`), named constants (`λfoo.`), or predicate symbols (`λP.`); body extends rightward through all connectives
|
|
45
|
+
- **Higher-order predicate application** — `(func)(arg)` explicit application; `λP. P(x)` writes the body naturally and is automatically scope-resolved to `Application(LambdaVar("P"), Variable("x"))`
|
|
46
|
+
- **Lambda-calculus operations** — free-variable computation, capture-avoiding substitution, beta-reduction (normal-order, step-limited), eta-reduction, combined beta-eta normalisation to fixpoint, and lexical scope resolution
|
|
47
|
+
|
|
48
|
+
## Installation
|
|
49
|
+
|
|
50
|
+
### Via pip
|
|
51
|
+
|
|
52
|
+
```bash
|
|
53
|
+
pip install unicode-fol-kit
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
### Via git clone
|
|
57
|
+
|
|
58
|
+
```bash
|
|
59
|
+
git clone https://github.com/fvossel/unicode-fol-kit.git
|
|
60
|
+
cd unicode-fol-kit
|
|
61
|
+
pip install .
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
## Parser modes
|
|
65
|
+
|
|
66
|
+
`MSFLParser` is instantiated with two boolean flags:
|
|
67
|
+
|
|
68
|
+
```python
|
|
69
|
+
MSFLParser(many_sorted=False, fuzzy=False) # FOL (default)
|
|
70
|
+
MSFLParser(many_sorted=True, fuzzy=False) # MSFOL
|
|
71
|
+
MSFLParser(many_sorted=True, fuzzy=True) # MSFL
|
|
72
|
+
MSFLParser(many_sorted=False, fuzzy=True) # → raises ValueError
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
| `many_sorted` | `fuzzy` | Mode | Quantifiers | Constants | Connectives |
|
|
76
|
+
|---|---|---|---|---|---|
|
|
77
|
+
| `False` | `False` | **FOL** | unsorted `∀x` | unsorted | classical ∧ ∨ ⊕ ¬ → ↔ |
|
|
78
|
+
| `True` | `False` | **MSFOL** | sorted `∀x:Sort` | sorted `alice:Sort` | classical ∧ ∨ ¬ → ↔ (no ⊕) |
|
|
79
|
+
| `True` | `True` | **MSFL** | sorted `∀x:Sort` | sorted `alice:Sort` | weak ∧ ∨, strong ⊗ ⊕, Łuk ¬ → ↔ |
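
The flag combinations in the table can be read as a small lookup. The sketch below is illustrative only: `select_mode` is a hypothetical helper that is not part of `unicode_fol_kit`, and the error message is an assumption. Only the four flag combinations and the `ValueError` come from the text above.

```python
def select_mode(many_sorted: bool = False, fuzzy: bool = False) -> str:
    """Hypothetical helper mirroring the flag table; not the library's code."""
    if fuzzy and not many_sorted:
        # The README states that this combination raises ValueError.
        raise ValueError("fuzzy=True requires many_sorted=True")
    if fuzzy:
        return "MSFL"
    return "MSFOL" if many_sorted else "FOL"


print(select_mode())                              # FOL
print(select_mode(many_sorted=True))              # MSFOL
print(select_mode(many_sorted=True, fuzzy=True))  # MSFL
```

Fuzzy mode is only defined on top of the sorted syntax, which is why three of the four combinations are valid.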
|
|
80
|
+
|
|
81
|
+
## Usage
|
|
82
|
+
|
|
83
|
+
### FOL mode (default)
|
|
84
|
+
|
|
85
|
+
```python
|
|
86
|
+
from unicode_fol_kit import MSFLParser
|
|
87
|
+
|
|
88
|
+
parser = MSFLParser()
|
|
89
|
+
formula = parser.parse("∀x (Human(x) → Mortal(x))")
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
### MSFOL mode
|
|
93
|
+
|
|
94
|
+
Quantifiers and ground constants must carry a sort annotation. The colon can be written with or without a space before it:
|
|
95
|
+
|
|
96
|
+
```python
|
|
97
|
+
parser = MSFLParser(many_sorted=True)
|
|
98
|
+
|
|
99
|
+
# Sorted quantifier
|
|
100
|
+
q = parser.parse("∀x:Human (Mortal(x) ∧ ¬Immortal(x))")
|
|
101
|
+
# SortedQuantifier(type='∀', variable=Variable('x'), sort='Human', formula=…)
|
|
102
|
+
|
|
103
|
+
# Sorted constant — both spacing forms are accepted
|
|
104
|
+
parser.parse("P(alice:Human)")
|
|
105
|
+
parser.parse("P(alice :Human)")
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
### MSFL mode
|
|
109
|
+
|
|
110
|
+
Uses Łukasiewicz logic: `∧`/`∨` become weak (min/max), `⊗`/`⊕` become strong (t-norm/t-conorm), and `¬`/`→`/`↔` map to their Łukasiewicz counterparts:
|
|
111
|
+
|
|
112
|
+
```python
|
|
113
|
+
parser = MSFLParser(many_sorted=True, fuzzy=True)
|
|
114
|
+
|
|
115
|
+
parser.parse("P(x) ∧ Q(x)") # WeakConjunction (min)
|
|
116
|
+
parser.parse("P(x) ⊗ Q(x)") # StrongConjunction (t-norm: max{0, x+y−1})
|
|
117
|
+
parser.parse("P(x) ⊕ Q(x)") # StrongDisjunction (t-conorm: min{1, x+y})
|
|
118
|
+
parser.parse("¬P(x)") # LukNegation (1−x)
|
|
119
|
+
parser.parse("P(x) → Q(x)") # LukImplication (min{1, 1−x+y})
|
|
120
|
+
parser.parse("∀x:Human P(x)") # SortedQuantifier
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
### ASCII tree view
|
|
124
|
+
|
|
125
|
+
```python
|
|
126
|
+
formula = MSFLParser().parse("∀x (Human(x) → Mortal(x))")
|
|
127
|
+
print(formula.tree_str())
|
|
128
|
+
# ∀ x
|
|
129
|
+
# └── →
|
|
130
|
+
# ├── Atom: Human
|
|
131
|
+
# │ └── Variable: x
|
|
132
|
+
# └── Atom: Mortal
|
|
133
|
+
# └── Variable: x
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
### Exporting to other formats
|
|
137
|
+
|
|
138
|
+
```python
|
|
139
|
+
formula.to_prover9() # '(all x (Human(x) -> Mortal(x)))'
|
|
140
|
+
formula.to_tptp() # '(![X]: (human(X) => mortal(X)))'
|
|
141
|
+
formula.to_dict() # JSON-serialisable dict
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
### Serialisation
|
|
145
|
+
|
|
146
|
+
```python
|
|
147
|
+
from unicode_fol_kit import Node
|
|
148
|
+
|
|
149
|
+
d = formula.to_dict()
|
|
150
|
+
formula2 = Node.from_dict(d) # round-trip
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
### Lambda-calculus
|
|
154
|
+
|
|
155
|
+
All three parser modes support lambda abstraction and application. `parse()` automatically applies scope resolution, so the returned AST is always fully resolved.
|
|
156
|
+
|
|
157
|
+
```python
|
|
158
|
+
from unicode_fol_kit import (
|
|
159
|
+
MSFLParser,
|
|
160
|
+
LambdaVar, Lambda, Application,
|
|
161
|
+
free_variables, substitute,
|
|
162
|
+
beta_reduce, eta_reduce, beta_eta_normalize,
|
|
163
|
+
resolve_lambda_scope,
|
|
164
|
+
ReductionLimitError,
|
|
165
|
+
)
|
|
166
|
+
|
|
167
|
+
parser = MSFLParser()
|
|
168
|
+
|
|
169
|
+
# Parse — scope resolution is applied automatically
|
|
170
|
+
term = parser.parse("λP. λx. P(x)")
|
|
171
|
+
# Lambda(LambdaVar("P"), Lambda(LambdaVar("x"), Application(LambdaVar("P"), LambdaVar("x"))))
|
|
172
|
+
|
|
173
|
+
# Application
|
|
174
|
+
app = parser.parse("(λP. P(x))(Q)")
|
|
175
|
+
# Application(Lambda(LambdaVar("P"), Application(LambdaVar("P"), Variable("x"))),
|
|
176
|
+
# Atom("Q", []))
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
#### Free variables
|
|
180
|
+
|
|
181
|
+
```python
|
|
182
|
+
term = parser.parse("λP. P(x)")
|
|
183
|
+
free_variables(term)
|
|
184
|
+
# {Variable("x")} — x is free; P is lambda-bound and does not appear
|
|
185
|
+
```
|
|
186
|
+
|
|
187
|
+
The result is a mixed set that may contain both `Variable` (logical) and `LambdaVar` (lambda-bound) objects.
|
|
188
|
+
|
|
189
|
+
#### Beta-reduction
|
|
190
|
+
|
|
191
|
+
`beta_reduce` reduces to beta-normal form using a normal-order (leftmost-outermost) strategy with full capture-avoiding substitution. It raises `ReductionLimitError` after 10 000 steps if the term does not normalise.
|
|
192
|
+
|
|
193
|
+
```python
|
|
194
|
+
# (λP. λx. P(x))(Q) reduces to Lambda(LambdaVar("x"), Application(Atom("Q", []), LambdaVar("x")))
|
|
195
|
+
result = beta_reduce(parser.parse("(λP. λx. P(x))(Q)"))
|
|
196
|
+
|
|
197
|
+
# Full pipeline: parse → resolve → reduce
|
|
198
|
+
reduced = beta_reduce(parser.parse("(λP. P(x))(λy. Q(y))"))
|
|
199
|
+
# Atom("Q", [Variable("x")])
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
#### Eta-reduction
|
|
203
|
+
|
|
204
|
+
`eta_reduce` performs a single bottom-up pass contracting all eta-redexes: `λp. f(p) → f` when `p` is not free in `f`. Quantifiers are recursed into but never treated as eta-redexes.
|
|
205
|
+
|
|
206
|
+
```python
|
|
207
|
+
from unicode_fol_kit import LambdaVar, Lambda, Application, Atom, Variable
|
|
208
|
+
|
|
209
|
+
f = Atom("P", [Variable("x")]) # some formula
|
|
210
|
+
term = Lambda(LambdaVar("p"), Application(f, LambdaVar("p"))) # λp. f(p)
|
|
211
|
+
eta_reduce(term) # → f (the Atom node, not the Lambda)
|
|
212
|
+
```
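
To make the eta rule concrete, here is a minimal, self-contained sketch of a single bottom-up eta pass. It uses its own tiny term classes (`Var`, `Lam`, `App`) and a function named `eta_pass`. All of these are hypothetical stand-ins and say nothing about how the library implements `eta_reduce` internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    func: object
    arg: object


def free_vars(term):
    """Names that occur free in a term."""
    if isinstance(term, Var):
        return {term.name}
    if isinstance(term, Lam):
        return free_vars(term.body) - {term.param}
    return free_vars(term.func) | free_vars(term.arg)


def eta_pass(term):
    """One bottom-up pass: contract λp. f(p) to f when p is not free in f."""
    if isinstance(term, Var):
        return term
    if isinstance(term, App):
        return App(eta_pass(term.func), eta_pass(term.arg))
    body = eta_pass(term.body)  # reduce children first (bottom-up)
    if (isinstance(body, App)
            and body.arg == Var(term.param)
            and term.param not in free_vars(body.func)):
        return body.func
    return Lam(term.param, body)


print(eta_pass(Lam("p", App(Var("f"), Var("p")))))  # Var(name='f')
```

The side condition matters: `λp. p(p)` is left unchanged because `p` occurs free in the function position.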
|
|
213
|
+
|
|
214
|
+
#### Beta-eta normalisation
|
|
215
|
+
|
|
216
|
+
`beta_eta_normalize` alternates `beta_reduce` and `eta_reduce` to fixpoint (up to 100 rounds). The alternation loop is a genuine necessity: eta-reduction can expose fresh beta-redexes, requiring another beta pass.
|
|
217
|
+
|
|
218
|
+
```python
|
|
219
|
+
normal = beta_eta_normalize(parser.parse("(λP. P(x))(Q)"))
|
|
220
|
+
```
|
|
221
|
+
|
|
222
|
+
`ReductionLimitError` is raised if the inner beta-reduction limit or the outer round limit is exceeded.
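
The alternation described above is an ordinary fixpoint loop with a round limit. The sketch below shows that control flow on toy string rewrites rather than lambda terms. `alternate_to_fixpoint`, `RoundLimitError`, and the two rewrite functions are hypothetical names for illustration, not the library's API.

```python
class RoundLimitError(RuntimeError):
    """Hypothetical stand-in for the library's ReductionLimitError."""


def alternate_to_fixpoint(term, passes, max_rounds=100):
    """Apply each pass in turn, round after round, until nothing changes."""
    for _ in range(max_rounds):
        new_term = term
        for rewrite in passes:
            new_term = rewrite(new_term)
        if new_term == term:  # a full round changed nothing: fixpoint
            return term
        term = new_term
    raise RoundLimitError(f"no fixpoint after {max_rounds} rounds")


# Toy rewrites standing in for the beta and eta passes.
def collapse_aa(text):
    return text.replace("aa", "a")


def collapse_ab(text):
    return text.replace("ab", "b")


print(alternate_to_fixpoint("aaab", [collapse_aa, collapse_ab]))  # b
```

In the toy run the second rewrite keeps enabling further work for later rounds, which is the same reason a single beta pass followed by a single eta pass is not always enough.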
|
|
223
|
+
|
|
224
|
+
#### Scope resolution (standalone)
|
|
225
|
+
|
|
226
|
+
`resolve_lambda_scope` is also available as a standalone function for hand-built ASTs:
|
|
227
|
+
|
|
228
|
+
```python
|
|
229
|
+
from unicode_fol_kit import resolve_lambda_scope, Lambda, LambdaVar, Atom, Variable
|
|
230
|
+
|
|
231
|
+
raw = Lambda(LambdaVar("x"), Atom("P", [Variable("x")]))
|
|
232
|
+
resolved = resolve_lambda_scope(raw)
|
|
233
|
+
# Lambda(LambdaVar("x"), Atom("P", [LambdaVar("x")]))
|
|
234
|
+
```
|
|
235
|
+
|
|
236
|
+
### Reducing MSFL formulas to classical FOL
|
|
237
|
+
|
|
238
|
+
`to_fol()` performs a two-phase reduction: it first lowers Łukasiewicz operators to classical ones (`to_msfol()`), then eliminates sort annotations via relativisation (`_relativize()`):
|
|
239
|
+
|
|
240
|
+
```python
|
|
241
|
+
from unicode_fol_kit import MSFLParser, to_fol
|
|
242
|
+
|
|
243
|
+
parser = MSFLParser(many_sorted=True, fuzzy=True)
|
|
244
|
+
formula = parser.parse("∀x:Human (P(x) ∧ ¬Q(x))")
|
|
245
|
+
|
|
246
|
+
classical = to_fol(formula)
|
|
247
|
+
# Quantifier(∀, x, Implies(Atom(Human, [x]), And(Atom(P,[x]), Not(Atom(Q,[x])))))
|
|
248
|
+
|
|
249
|
+
# Optionally, conjoin sort-membership facts for constants at the top level:
|
|
250
|
+
classical_with_facts = to_fol(formula, include_sort_facts=True)
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
### Equivalence checking (Z3)
|
|
254
|
+
|
|
255
|
+
```python
|
|
256
|
+
from unicode_fol_kit import MSFLParser, formulas_are_equivalent
|
|
257
|
+
|
|
258
|
+
parser = MSFLParser()
|
|
259
|
+
f1 = parser.parse("¬(P(x) ∧ Q(x))")
|
|
260
|
+
f2 = parser.parse("¬P(x) ∨ ¬Q(x)")
|
|
261
|
+
|
|
262
|
+
formulas_are_equivalent(f1, f2) # True
|
|
263
|
+
```
|
|
264
|
+
|
|
265
|
+
### Entailment checking (Prover9)
|
|
266
|
+
|
|
267
|
+
```python
|
|
268
|
+
from unicode_fol_kit import MSFLParser, check_logical_entailment
|
|
269
|
+
|
|
270
|
+
parser = MSFLParser()
|
|
271
|
+
premises = [
|
|
272
|
+
parser.parse("∀x (Human(x) → Mortal(x))"),
|
|
273
|
+
parser.parse("Human(socrates)"),
|
|
274
|
+
]
|
|
275
|
+
conclusion = parser.parse("Mortal(socrates)")
|
|
276
|
+
|
|
277
|
+
check_logical_entailment(premises, conclusion, prover9_path="/usr/bin/prover9") # True
|
|
278
|
+
```
|
|
279
|
+
|
|
280
|
+
## Syntax reference
|
|
281
|
+
|
|
282
|
+
This section describes the full surface syntax accepted by the parser. Because the three modes share the same term and atom layer, most of the syntax is identical across modes; differences are called out explicitly.
|
|
283
|
+
|
|
284
|
+
### Tokens
|
|
285
|
+
|
|
286
|
+
The lexer distinguishes the following token kinds. Because the patterns are mutually exclusive, a given identifier is unambiguously a variable, a constant, a function/predicate name, a number, or a sort annotation.
|
|
287
|
+
|
|
288
|
+
| Token | Pattern | Examples | Meaning |
|
|
289
|
+
|---|---|---|---|
|
|
290
|
+
| Variable | one lowercase letter, optional trailing digits | `x`, `y`, `x1`, `z42` | a (possibly bound) logical variable |
|
|
291
|
+
| Name | lowercase, at least two letters, may contain digits and uppercase after the first letter | `socrates`, `distance`, `centerOf`, `foo1` | a bare constant or a function symbol |
|
|
292
|
+
| Constant (`c_`) | `c_` followed by letters/digits | `c_a`, `c_zero`, `c_42` | an explicitly marked constant |
|
|
293
|
+
| Predicate | one uppercase letter, then letters/digits | `P`, `Human`, `OnSurfaceOf` | a predicate symbol |
|
|
294
|
+
| Number | digits, optional decimal part | `0`, `42`, `3.14` | a numeric literal |
|
|
295
|
+
| Sort annotation | `:` followed by an uppercase letter and letters/digits | `:Human`, `:Sort1` | a sort tag *(MSFOL and MSFL modes only)* |
|
|
296
|
+
|
|
297
|
+
The `c_` form exists so that **single-letter constants** can be written without colliding with variables. A bare `a` is always a variable; if you need the constant *a*, write `c_a`.
|
|
298
|
+
|
|
299
|
+
A function or predicate application is recognised by the symbol being immediately followed by a parenthesised argument list, e.g. `distance(x, y)` or `Human(socrates)`; a predicate may also appear without arguments. The same token class (Name) serves both as a bare constant and, when applied, as a function symbol.
|
|
300
|
+
|
|
301
|
+
The sort annotation token always begins with `:`, which makes it lexically disjoint from all other tokens. **Whitespace before the colon is optional**: `∀x:Human P(x)` and `∀x :Human P(x)` are both valid and produce identical parse trees.
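
The token table can be approximated with a handful of regular expressions. The patterns below are transcribed from the prose descriptions and are assumptions. They are not copied from the package's grammar files, and `classify` is a hypothetical helper.

```python
import re

# Assumed patterns, one per token kind in the table above.
TOKEN_PATTERNS = [
    ("SortAnnotation", r":[A-Z][A-Za-z0-9]*"),
    ("Constant", r"c_[A-Za-z0-9]+"),
    ("Number", r"[0-9]+(\.[0-9]+)?"),
    ("Predicate", r"[A-Z][A-Za-z0-9]*"),
    ("Variable", r"[a-z][0-9]*"),
    ("Name", r"[a-z][A-Za-z][A-Za-z0-9]*"),
]


def classify(token: str) -> str:
    """Return the token kind whose pattern matches the whole string."""
    for kind, pattern in TOKEN_PATTERNS:
        if re.fullmatch(pattern, token):
            return kind
    raise ValueError(f"unrecognised token: {token!r}")


for tok in ["x1", "socrates", "c_a", "Human", "3.14", ":Human"]:
    print(tok, "->", classify(tok))
```

Under these assumed patterns a bare `a` classifies as a variable and `c_a` as a constant, matching the rule stated above.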
|
|
302
|
+
|
|
303
|
+
### Terms
|
|
304
|
+
|
|
305
|
+
A term is one of:
|
|
306
|
+
|
|
307
|
+
- a variable (`x`, `x1`)
|
|
308
|
+
- a constant (`socrates`, `c_a`) or number (`42`, `3.14`)
|
|
309
|
+
- in MSFOL / MSFL modes: a **sort-annotated constant** (`alice:Human`, `c_a:Sort1`)
|
|
310
|
+
- a function application (`f(t1, …, tn)`, e.g. `centerOf(x)`)
|
|
311
|
+
- an arithmetic combination of terms using `+`, `-`, `*`, `/`
|
|
312
|
+
- a parenthesised term (`(t)`)
|
|
313
|
+
|
|
314
|
+
Arithmetic follows the usual precedence: `*` and `/` bind tighter than `+` and `-`, and both groups are left-associative. For example `x + y * z` parses as `x + (y * z)`.
|
|
315
|
+
|
|
316
|
+
**Sort rules in MSFOL / MSFL modes:** variables are sorted implicitly by the quantifier that binds them; ground constants must carry an explicit sort annotation. An unsorted constant (e.g. bare `alice`) is a syntax error in sorted modes.
|
|
317
|
+
|
|
318
|
+
### Atomic formulas
|
|
319
|
+
|
|
320
|
+
An atomic formula is either:
|
|
321
|
+
|
|
322
|
+
- a predicate applied to terms: `P`, `Human(socrates)`, `OnSurfaceOf(y, x)`
|
|
323
|
+
(a predicate may be nullary, i.e. used without arguments)
|
|
324
|
+
- an infix comparison between two terms: `=`, `≠`, `<`, `>`, `≤`, `≥`,
|
|
325
|
+
e.g. `x1 + 1 = y1` or `distance(y, c) > distance(z, c)`
|
|
326
|
+
|
|
327
|
+
### Compound formulas
|
|
328
|
+
|
|
329
|
+
Atomic formulas are combined with connectives and quantifiers. The available connectives and their interpretations depend on the mode:
|
|
330
|
+
|
|
331
|
+
#### FOL mode
|
|
332
|
+
|
|
333
|
+
| Syntax | Operator | Interpretation |
|
|
334
|
+
|---|---|---|
|
|
335
|
+
| `¬φ` | negation | classical |
|
|
336
|
+
| `φ ∧ ψ` | conjunction | classical |
|
|
337
|
+
| `φ ∨ ψ` | disjunction | classical |
|
|
338
|
+
| `φ ⊕ ψ` | exclusive or | classical |
|
|
339
|
+
| `φ → ψ` | implication | classical |
|
|
340
|
+
| `φ ↔ ψ` | biconditional | classical |
|
|
341
|
+
| `∀x φ` | universal | unsorted |
|
|
342
|
+
| `∃x φ` | existential | unsorted |
|
|
343
|
+
|
|
344
|
+
#### MSFOL mode
|
|
345
|
+
|
|
346
|
+
Same connectives as FOL **except `⊕` (exclusive or) is not available**. Quantifiers require a sort annotation:
|
|
347
|
+
|
|
348
|
+
| Syntax | Operator |
|
|
349
|
+
|---|---|
|
|
350
|
+
| `¬φ`, `φ ∧ ψ`, `φ ∨ ψ`, `φ → ψ`, `φ ↔ ψ` | classical (as FOL) |
|
|
351
|
+
| `∀x:Sort φ`, `∃x:Sort φ` | sorted quantifiers |
|
|
352
|
+
|
|
353
|
+
#### MSFL mode
|
|
354
|
+
|
|
355
|
+
Connectives are reinterpreted as Łukasiewicz operators:
|
|
356
|
+
|
|
357
|
+
| Syntax | Operator | Semantics |
|
|
358
|
+
|---|---|---|
|
|
359
|
+
| `¬φ` | Łuk. negation | 1 − φ |
|
|
360
|
+
| `φ ∧ ψ` | weak conjunction | min(φ, ψ) |
|
|
361
|
+
| `φ ∨ ψ` | weak disjunction | max(φ, ψ) |
|
|
362
|
+
| `φ ⊗ ψ` | strong conjunction | max(0, φ + ψ − 1) |
|
|
363
|
+
| `φ ⊕ ψ` | strong disjunction | min(1, φ + ψ) |
|
|
364
|
+
| `φ → ψ` | Łuk. implication | min(1, 1 − φ + ψ) |
|
|
365
|
+
| `φ ↔ ψ` | Łuk. equivalence | 1 − \|φ − ψ\| |
|
|
366
|
+
| `∀x:Sort φ`, `∃x:Sort φ` | sorted quantifiers | |
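
The semantics column translates directly into arithmetic on truth values in [0, 1]. The functions below are a plain-Python sketch of those formulas. The function names are hypothetical and unrelated to the library's node classes.

```python
def luk_not(a):
    return 1.0 - a


def weak_and(a, b):
    return min(a, b)


def weak_or(a, b):
    return max(a, b)


def strong_and(a, b):
    return max(0.0, a + b - 1.0)


def strong_or(a, b):
    return min(1.0, a + b)


def luk_implies(a, b):
    return min(1.0, 1.0 - a + b)


def luk_iff(a, b):
    return 1.0 - abs(a - b)


print(weak_and(0.75, 0.5))     # 0.5
print(strong_and(0.75, 0.5))   # 0.25
print(strong_or(0.75, 0.5))    # 1.0
print(luk_implies(0.75, 0.5))  # 0.75
```

On the crisp values 0 and 1 the weak and strong connectives agree with each other and with the classical truth tables; they differ only on intermediate truth values.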
|
|
367
|
+
|
|
368
|
+
A formula may be wrapped in parentheses `( … )` or square brackets `[ … ]`; the two are interchangeable for grouping.
|
|
369
|
+
|
|
370
|
+
### Operator precedence
|
|
371
|
+
|
|
372
|
+
The precedence levels are the same across all three modes (MSFL uses the same syntactic structure with Łukasiewicz semantics):
|
|
373
|
+
|
|
374
|
+
| Precedence | Operators | Associativity |
|
|
375
|
+
|---|---|---|
|
|
376
|
+
| 1 (highest) | `¬`, quantifiers `∀` / `∃` | prefix |
|
|
377
|
+
| 2 | `∧` `∨` `⊕` (FOL) / `∧` `∨` (MSFOL) / `∧` `∨` `⊗` `⊕` (MSFL) | left |
|
|
378
|
+
| 3 | `→` | right |
|
|
379
|
+
| 4 (lowest) | `↔` | right |
|
|
380
|
+
|
|
381
|
+
Worked examples (parenthesised to show how the parser groups them):
|
|
382
|
+
|
|
383
|
+
- `¬P(x) ∧ Q(x)` → `(¬P(x)) ∧ Q(x)` — negation binds tighter than conjunction
|
|
384
|
+
- `P(x) ∧ Q(x) → R(x)` → `(P(x) ∧ Q(x)) → R(x)` — conjunction binds tighter than implication
|
|
385
|
+
- `P(x) → Q(x) ↔ R(x)` → `(P(x) → Q(x)) ↔ R(x)` — implication binds tighter than biconditional
|
|
386
|
+
- `P(x) → Q(x) → R(x)` → `P(x) → (Q(x) → R(x))` — implication is right-associative
|
|
387
|
+
- `P(x) ∧ Q(x) ∧ R(x)` → `(P(x) ∧ Q(x)) ∧ R(x)` — conjunction is left-associative
|
|
388
|
+
|
|
389
|
+
### Mixing same-level operators
|
|
390
|
+
|
|
391
|
+
The same-level connectives (level 2 above) **cannot be mixed without explicit parentheses**. This is deliberate: it avoids the silent, easy-to-misread grouping that a default precedence would impose.
|
|
392
|
+
|
|
393
|
+
**FOL mode** — `∧`, `∨`, `⊕` cannot be mixed:
|
|
394
|
+
|
|
395
|
+
```text
|
|
396
|
+
P(x) ∧ Q(x) ∨ R(x) # rejected
|
|
397
|
+
(P(x) ∧ Q(x)) ∨ R(x) # accepted
|
|
398
|
+
```
|
|
399
|
+
|
|
400
|
+
**MSFOL mode** — `∧` and `∨` cannot be mixed:
|
|
401
|
+
|
|
402
|
+
```text
|
|
403
|
+
P(x) ∧ Q(x) ∨ R(x) # rejected
|
|
404
|
+
(P(x) ∧ Q(x)) ∨ R(x) # accepted
|
|
405
|
+
```
|
|
406
|
+
|
|
407
|
+
**MSFL mode** — `∧`, `∨`, `⊗`, `⊕` cannot be mixed:
|
|
408
|
+
|
|
409
|
+
```text
|
|
410
|
+
P(x) ∧ Q(x) ⊗ R(x) # rejected
|
|
411
|
+
(P(x) ∧ Q(x)) ⊗ R(x) # accepted
|
|
412
|
+
```
|
|
413
|
+
|
|
414
|
+
A chain of the *same* operator is always fine: `P ∧ Q ∧ R`, `P ⊗ Q ⊗ R`, etc.
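
The rule amounts to a simple check on the operators of one unparenthesised chain: at most one distinct connective may appear. The sketch below is a hypothetical illustration of that check; `check_chain` and its error messages are assumptions, and the real parser enforces the rule in its grammar.

```python
def check_chain(operators, allowed=("∧", "∨", "⊕")):
    """Accept a chain of level-2 operators only if they are all the same."""
    distinct = sorted(set(operators))
    unknown = [op for op in distinct if op not in allowed]
    if unknown:
        raise ValueError(f"not a same-level connective: {unknown}")
    if len(distinct) > 1:
        raise ValueError(
            f"cannot mix {' and '.join(distinct)} without parentheses"
        )


check_chain(["∧", "∧"])        # P ∧ Q ∧ R: accepted
try:
    check_chain(["∧", "∨"])    # P ∧ Q ∨ R: rejected
except ValueError as err:
    print("rejected:", err)
```

Passing a different `allowed` tuple, for example `("∧", "∨", "⊗", "⊕")`, models the MSFL operator set.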
|
|
415
|
+
|
|
416
|
+
### Quantifier scope
|
|
417
|
+
|
|
418
|
+
A quantifier binds **only the immediately following (tightly bound) formula**, not the rest of the line:
|
|
419
|
+
|
|
420
|
+
```text
|
|
421
|
+
∀x P(x) ∧ Q(x) # parses as (∀x P(x)) ∧ Q(x)
|
|
422
|
+
∀x P(x) → Q(x) # parses as (∀x P(x)) → Q(x)
|
|
423
|
+
```
|
|
424
|
+
|
|
425
|
+
If you intend the quantifier to range over the whole formula — which is usually what is meant — **add parentheses**:
|
|
426
|
+
|
|
427
|
+
```text
|
|
428
|
+
∀x (P(x) → Q(x)) # quantifier ranges over the implication
|
|
429
|
+
∀x (P(x) ∧ Q(x)) # quantifier ranges over the conjunction
|
|
430
|
+
```
|
|
431
|
+
|
|
432
|
+
Quantifiers can be stacked directly: `∀x:H ∀y:H ∃z:A φ`.
|
|
433
|
+
|
|
434
|
+
### Supported symbols
|
|
435
|
+
|
|
436
|
+
| Category | FOL | MSFOL | MSFL |
|
|
437
|
+
|---|---|---|---|
|
|
438
|
+
| Quantifiers | `∀` `∃` (unsorted) | `∀` `∃` (sorted `:Sort`) | `∀` `∃` (sorted `:Sort`) |
|
|
439
|
+
| Connectives | `∧` `∨` `⊕` `¬` `→` `↔` | `∧` `∨` `¬` `→` `↔` | `∧` `∨` `⊗` `⊕` `¬` `→` `↔` |
|
|
440
|
+
| Lambda | `λ` | `λ` | `λ` |
|
|
441
|
+
| Sort annotations | — | `:Sort` | `:Sort` |
|
|
442
|
+
| Equality / comparison | `=` `≠` `<` `>` `≤` `≥` | same | same |
|
|
443
|
+
| Arithmetic | `+` `-` `*` `/` | same | same |
|
|
444
|
+
| Grouping | `(` `)` `[` `]` | same | same |
|
|
445
|
+
| Argument separator | `,` | same | same |
|
|
446
|
+
|
|
447
|
+
Whitespace is insignificant and may be used freely between tokens — including before sort annotation colons.
|
|
448
|
+
|
|
449
|
+
### Lambda abstraction and application (all modes)
|
|
450
|
+
|
|
451
|
+
A lambda abstraction is written `λ` followed by a parameter name, a literal `.`, and a body formula. All three parser modes support identical lambda surface notation.
|
|
452
|
+
|
|
453
|
+
#### Parameter types
|
|
454
|
+
|
|
455
|
+
| Parameter form | Example | Typical use |
|
|
456
|
+
|---|---|---|
|
|
457
|
+
| Single lowercase letter | `λx. P(x)` | value variable |
|
|
458
|
+
| Multi-letter lowercase name | `λfoo. P(foo(x))` | named-constant parameter |
|
|
459
|
+
| Uppercase predicate symbol | `λP. P(x)` | predicate / higher-order parameter |
|
|
460
|
+
|
|
461
|
+
All three token classes become a `LambdaVar` in the AST. Scope resolution (applied automatically by `parse()`) then rewrites body occurrences of the lambda-bound name:
|
|
462
|
+
|
|
463
|
+
- **Variable occurrence** — `λx. P(x)`: the `x` in `P(x)` becomes `LambdaVar("x")`.
|
|
464
|
+
- **Predicate-application occurrence** — `λP. P(x)`: the `P(x)` in the body becomes `Application(LambdaVar("P"), Variable("x"))`. Multi-argument atoms curry left: `P(x, y)` → `Application(Application(LambdaVar("P"), x), y)`.
|
|
465
|
+
- **Named-function occurrence** — `λfoo. P(foo(x))`: the `foo(x)` in `P`'s argument list (a term-level function call) becomes `Application(LambdaVar("foo"), Variable("x"))`.
|
|
466
|
+
|
|
467
|
+
The scope obeys the **innermost-binder rule**: a quantifier removes the quantified name from the lambda-bound set. Inside `λx. ∀x P(x)`, the `x` in `P(x)` is logical (stays `Variable`).
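
The innermost-binder rule can be sketched as a recursive walk that carries the set of currently lambda-bound names. The classes below (`Var`, `LVar`, `Atom`, `Forall`, `Lam`) are simplified, hypothetical stand-ins for the library's nodes, and the sketch covers only the variable-occurrence case, not predicate or function application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class LVar:
    name: str


@dataclass(frozen=True)
class Atom:
    pred: str
    args: tuple


@dataclass(frozen=True)
class Forall:
    var: str
    body: object


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


def resolve(node, bound=frozenset()):
    """Rewrite Var to LVar where the innermost binder is a lambda."""
    if isinstance(node, Lam):
        return Lam(node.param, resolve(node.body, bound | {node.param}))
    if isinstance(node, Forall):
        # The quantifier shadows any lambda binding of the same name.
        return Forall(node.var, resolve(node.body, bound - {node.var}))
    if isinstance(node, Atom):
        args = tuple(
            LVar(a.name) if isinstance(a, Var) and a.name in bound else a
            for a in node.args
        )
        return Atom(node.pred, args)
    return node


print(resolve(Lam("x", Atom("P", (Var("x"),)))))
print(resolve(Lam("x", Forall("x", Atom("P", (Var("x"),))))))
```

The first call rewrites the argument to `LVar("x")`; the second leaves it as `Var("x")` because the quantifier is the innermost binder.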
|
|
468
|
+
|
|
469
|
+
#### Body scope
|
|
470
|
+
|
|
471
|
+
The body extends rightward through all connectives — lambda has lower precedence than every binary operator:
|
|
472
|
+
|
|
473
|
+
```text
|
|
474
|
+
λx. P(x) ∧ Q(x) # body is the And node P(x) ∧ Q(x)
|
|
475
|
+
λx. P(x) → Q(x) # body is the Implies node P(x) → Q(x)
|
|
476
|
+
```
|
|
477
|
+
|
|
478
|
+
Multi-parameter lambdas are written by nesting: `λP. λx. P(x)`.
|
|
479
|
+
|
|
480
|
+
#### Application syntax
|
|
481
|
+
|
|
482
|
+
A lambda application requires both sides to be parenthesised: `(func)(arg)`.
|
|
483
|
+
|
|
484
|
+
```text
|
|
485
|
+
(λx. P(x))(a) # arg is variable a
|
|
486
|
+
(λx. P(x))(alice) # arg is constant alice
|
|
487
|
+
(λP. P(x))(Q) # arg is the zero-arity atom Q
|
|
488
|
+
(λP. P(x))(Q(y)) # arg is the atom Q(y)
|
|
489
|
+
```
|
|
490
|
+
|
|
491
|
+
Higher-order application inside the body — a predicate parameter applied to arguments — is written in the natural `P(x)` notation, not as `(P)(x)`. Scope resolution handles the rewrite automatically.
|
|
492
|
+
|
|
493
|
+
#### Parse examples
|
|
494
|
+
|
|
495
|
+
```python
|
|
496
|
+
parser = MSFLParser()
|
|
497
|
+
|
|
498
|
+
parser.parse("λx. P(x)")
|
|
499
|
+
# Lambda(LambdaVar("x"), Atom("P", [LambdaVar("x")]))
|
|
500
|
+
|
|
501
|
+
parser.parse("λP. P(x)")
|
|
502
|
+
# Lambda(LambdaVar("P"), Application(LambdaVar("P"), Variable("x")))
|
|
503
|
+
|
|
504
|
+
parser.parse("λP. λx. P(x)")
|
|
505
|
+
# Lambda(LambdaVar("P"), Lambda(LambdaVar("x"), Application(LambdaVar("P"), LambdaVar("x"))))
|
|
506
|
+
|
|
507
|
+
parser.parse("λx. ∀x P(x)")
|
|
508
|
+
# Lambda(LambdaVar("x"), Quantifier("∀", Variable("x"), Atom("P", [Variable("x")])))
|
|
509
|
+
# x inside ∀ is quantifier-bound — NOT rewritten to LambdaVar
|
|
510
|
+
|
|
511
|
+
parser.parse("(λP. P(x))(Q)")
|
|
512
|
+
# Application(Lambda(LambdaVar("P"), Application(LambdaVar("P"), Variable("x"))), Atom("Q", []))
|
|
513
|
+
```
|
|
514
|
+
|
|
515
|
+
### A complete FOL example
|
|
516
|
+
|
|
517
|
+
```text
|
|
518
|
+
∀x ((Object(x) ∧ HasThreeDimensionalShape(x) ∧
|
|
519
|
+
∀y ∀z ((Point(y) ∧ OnSurfaceOf(y, x) ∧ Point(z) ∧ OnSurfaceOf(z, x))
|
|
520
|
+
→ distance(y, centerOf(x)) = distance(z, centerOf(x))))
|
|
521
|
+
→ Sphere(x))
|
|
522
|
+
```
|
|
523
|
+
|
|
524
|
+
### A complete MSFOL example
|
|
525
|
+
|
|
526
|
+
```text
|
|
527
|
+
∀x:Person ∀y:Person ((Knows(x, y) ∧ Trusted(y:Person)) → Shares(x, y))
|
|
528
|
+
```
|
|
529
|
+
|
|
530
|
+
### A complete MSFL example
|
|
531
|
+
|
|
532
|
+
```text
|
|
533
|
+
∀x:Patient ∀y:Treatment
|
|
534
|
+
  ((Effective(y:Treatment) ⊗ Tolerable(x:Patient, y:Treatment))
|
|
535
|
+
    → Recommended(x:Patient, y:Treatment))
|
|
536
|
+
```
|
|
537
|
+
|
|
538
|
+
## AST nodes
|
|
539
|
+
|
|
540
|
+
All nodes are Python dataclasses and can be imported from `unicode_fol_kit`.
|
|
541
|
+
|
|
542
|
+
### Shared term and atom nodes (all modes)
|
|
543
|
+
|
|
544
|
+
| Class | Fields | Notes |
|
|
545
|
+
|---|---|---|
|
|
546
|
+
| `Variable` | `name: str` | bound or free variable |
|
|
547
|
+
| `Constant` | `name: str` | bare constant or `c_`-prefixed constant |
|
|
548
|
+
| `Number` | `value: int \| float` | numeric literal |
|
|
549
|
+
| `Function` | `name: str`, `args: list` | function application and arithmetic ops |
|
|
550
|
+
| `Atom` | `predicate: str`, `args: list` | predicate or infix comparison |
|
|
551
|
+
|
|
552
|
+
### Classical formula nodes (FOL / MSFOL)
|
|
553
|
+
|
|
554
|
+
| Class | Fields |
|
|
555
|
+
|---|---|
|
|
556
|
+
| `Not` | `formula` |
|
|
557
|
+
| `And` | `left`, `right` |
|
|
558
|
+
| `Or` | `left`, `right` |
|
|
559
|
+
| `Xor` | `left`, `right` *(FOL only)* |
|
|
560
|
+
| `Implies` | `left`, `right` |
|
|
561
|
+
| `Iff` | `left`, `right` |
|
|
562
|
+
| `Quantifier` | `type: str`, `variable`, `formula` *(FOL only)* |
|
|
563
|
+
|
|
564
|
+
### MSFOL / MSFL nodes
|
|
565
|
+
|
|
566
|
+
| Class | Fields | Notes |
|
|
567
|
+
|---|---|---|
|
|
568
|
+
| `SortedQuantifier` | `type: str`, `variable`, `sort: str`, `formula` | sort annotation without leading `:` |
|
|
569
|
+
| `SortedConstant` | `name: str`, `sort: str` | sort annotation without leading `:` |
|
|
570
|
+
|
|
571
|
+
### MSFL Łukasiewicz nodes
|
|
572
|
+
|
|
573
|
+
| Class | Fields | Semantics |
|
|
574
|
+
|---|---|---|
|
|
575
|
+
| `LukNegation` | `formula` | 1 − φ |
|
|
576
|
+
| `WeakConjunction` | `left`, `right` | min(φ, ψ) |
|
|
577
|
+
| `WeakDisjunction` | `left`, `right` | max(φ, ψ) |
|
|
578
|
+
| `StrongConjunction` | `left`, `right` | max(0, φ + ψ − 1) |
|
|
579
|
+
| `StrongDisjunction` | `left`, `right` | min(1, φ + ψ) |
|
|
580
|
+
| `LukImplication` | `left`, `right` | min(1, 1 − φ + ψ) |
|
|
581
|
+
| `LukEquivalence` | `left`, `right` | 1 − \|φ − ψ\| |
|
|
582
|
+
|
|
583
|
+
### Lambda-calculus nodes (all modes)
|
|
584
|
+
|
|
585
|
+
| Class | Fields | Notes |
|
|
586
|
+
|---|---|---|
|
|
587
|
+
| `LambdaVar` | `name: str` | lambda-bound variable; frozen and hashable — distinct from `Variable` |
|
|
588
|
+
| `Lambda` | `param: LambdaVar`, `body: Node` | lambda abstraction `λparam. body` |
|
|
589
|
+
| `Application` | `func: Node`, `arg: Node` | lambda application `func(arg)` |
|
|
590
|
+
|
|
591
|
+
`LambdaVar` is kept separate from `Variable` so that logical binding (by quantifiers) and lambda binding never get confused. `free_variables()` returns a mixed set that may contain both.
|
|
592
|
+
|
|
593
|
+
### Reductions
|
|
594
|
+
|
|
595
|
+
Every MSFL node implements two reduction steps:
|
|
596
|
+
|
|
597
|
+
- **`to_msfol()`** — lowers Łukasiewicz connectives to classical nodes while preserving sort annotations (`SortedQuantifier` and `SortedConstant` survive unchanged).
|
|
598
|
+
- **`_relativize(facts)`** — eliminates sort annotations by replacing `∀x:S φ` with `∀x (S(x) → φ)` and `∃x:S φ` with `∃x (S(x) ∧ φ)`, and replacing `SortedConstant(name, sort)` with a plain `Constant(name)`.
|
|
599
|
+
|
|
600
|
+
The top-level helper `to_fol(node, include_sort_facts=False)` chains both steps and optionally conjoins sort-membership atoms for all ground constants at the top level.
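
The relativisation step can be illustrated on a toy AST. The classes below are simplified stand-ins that reuse the README's node names for readability; they are not the library's dataclasses, and `relativize` here is a hypothetical function that handles only quantifiers, `Implies`, and `And`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    pred: str
    args: tuple


@dataclass(frozen=True)
class Implies:
    left: object
    right: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Quantifier:
    kind: str
    var: str
    body: object


@dataclass(frozen=True)
class SortedQuantifier:
    kind: str
    var: str
    sort: str
    body: object


def relativize(node):
    """Replace sorted quantifiers with guarded unsorted ones."""
    if isinstance(node, SortedQuantifier):
        guard = Atom(node.sort, (node.var,))  # S(x)
        body = relativize(node.body)
        if node.kind == "∀":
            return Quantifier("∀", node.var, Implies(guard, body))
        return Quantifier("∃", node.var, And(guard, body))
    if isinstance(node, (Implies, And)):
        return type(node)(relativize(node.left), relativize(node.right))
    return node


print(relativize(SortedQuantifier("∀", "x", "Human", Atom("Mortal", ("x",)))))
```

The universal case guards the body with an implication and the existential case with a conjunction, as in the two rewrite rules listed above.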
|
|
601
|
+
|
|
602
|
+
## Error handling
|
|
603
|
+
|
|
604
|
+
Parse errors are reported with human-readable messages rather than raw parser internals. Lexer-level problems (an invalid character, a malformed name or number) raise `NamingError`; structural problems (an incomplete formula, a misplaced operator, or an attempt to mix same-level connectives without parentheses) raise `ParsingError`. Both report the offending position and, where useful, a hint. The hint text is mode-aware:
|
|
605
|
+
|
|
606
|
+
```python
|
|
607
|
+
from unicode_fol_kit import MSFLParser
|
|
608
|
+
|
|
609
|
+
# FOL mode — hint names ∧, ∨, and ⊕
|
|
610
|
+
MSFLParser().parse("P(x) ∧ Q(x) ∨ R(x)")
|
|
611
|
+
# SYNTAX_ERROR: … Hint: Cannot mix conjunction (∧), disjunction (∨), and exclusive or (⊕) without parentheses
|
|
612
|
+
|
|
613
|
+
# MSFOL mode — hint names only ∧ and ∨
|
|
614
|
+
MSFLParser(many_sorted=True).parse("P(x) ∧ Q(x) ∨ R(x)")
|
|
615
|
+
# SYNTAX_ERROR: … Hint: Cannot mix conjunction (∧) and disjunction (∨) without parentheses
|
|
616
|
+
|
|
617
|
+
# MSFL mode — hint names all four Łukasiewicz connectives
|
|
618
|
+
MSFLParser(many_sorted=True, fuzzy=True).parse("P(x) ∧ Q(x) ⊗ R(x)")
|
|
619
|
+
# SYNTAX_ERROR: … Hint: Cannot mix weak conjunction (∧), weak disjunction (∨),
|
|
620
|
+
# strong conjunction (⊗), and strong disjunction (⊕) without parentheses
|
|
621
|
+
```
|
|
622
|
+
|
|
623
|
+
## Citation
|
|
624
|
+
|
|
625
|
+
If you use this toolkit in academic work, please cite the accompanying preprint:
|
|
626
|
+
|
|
627
|
+
```bibtex
|
|
628
|
+
@misc{vossel2025advancingnaturallanguageformalization,
|
|
629
|
+
title={Advancing Natural Language Formalization to First Order Logic with Fine-tuned LLMs},
|
|
630
|
+
author={Felix Vossel and Till Mossakowski and Björn Gehrke},
|
|
631
|
+
year={2025},
|
|
632
|
+
eprint={2509.22338},
|
|
633
|
+
archivePrefix={arXiv},
|
|
634
|
+
primaryClass={cs.CL},
|
|
635
|
+
url={https://arxiv.org/abs/2509.22338},
|
|
636
|
+
}
|
|
637
|
+
```
|
|
638
|
+
|
|
639
|
+
> Vossel, F., Mossakowski, T., & Gehrke, B. (2025). *Advancing Natural Language
|
|
640
|
+
> Formalization to First Order Logic with Fine-tuned LLMs.* arXiv preprint
|
|
641
|
+
> arXiv:2509.22338.
|
|
642
|
+
|
|
643
|
+
## License
|
|
644
|
+
|
|
645
|
+
MIT
|