PyPI - unicode-fol-kit - Versions diffs - 0.5.0__tar.gz → 0.5.2__tar.gz - Mend

unicode-fol-kit 0.5.0 (tar.gz) → 0.5.2 (tar.gz)


Files changed (48)

{unicode_fol_kit-0.5.0 → unicode_fol_kit-0.5.2}/CHANGELOG.md RENAMED

@@ -5,6 +5,43 @@ loosely based on [Keep a Changelog](https://keepachangelog.com/). Versioning is
 semantic, but the project is pre-1.0 (alpha): a **minor** release may contain
 breaking changes.
+## [0.5.2] - 2026-06-26
+### Added
+- **Predicate-aligned string match** (`unicode_fol_kit.eval.predicate_match`) —
+  `match_predicates`, `formulas_are_matched_identical`, and
+  `formulas_are_identical`, re-exported at the package top level. A lexical
+  (string-level) evaluation notion for NL→FOL: `match_predicates` greedily
+  renames each predicate/function symbol in a predicted formula to the
+  lexically-closest symbol in the reference (by **normalised Levenshtein
+  distance**, accepting matches at or below a `max_norm_distance` threshold,
+  default `0.6`), so a structurally-correct answer that merely chose different
+  predicate names is not penalised. `formulas_are_identical` is the plain
+  whitespace- and case-insensitive string equality; `formulas_are_matched_identical`
+  combines the two (realign predicates, then compare). This is **complementary**
+  to the AST-level `exact_match`: the canonical match quotients out α-renaming /
+  commutativity / associativity / double negation but treats different predicate
+  names as a mismatch, whereas this matcher quotients out predicate-name (and
+  whitespace/case) differences but not the structural rewrites — the two are
+  typically reported as separate metrics. The Levenshtein distance is computed in
+  pure Python, so **no new dependency** is introduced; the matcher is
+  parser-independent and also applies to raw, not-yet-parseable model output.
+## [0.5.1] - 2026-06-24
+### Added
+- **`check_logical_entailment_vampire`** — entailment checking via the
+  [Vampire](https://vprover.github.io/) theorem prover, a TPTP-based companion to
+  the existing Prover9 backend. Premises are emitted as TPTP `axiom`s and the
+  conclusion as a `conjecture`; the path to the Vampire executable is passed as
+  the `vampire_path` argument, and a `SZS status Theorem` result means the
+  entailment holds. Classical FOL only (the same fragment `to_tptp` supports).
+  Pass `use_wsl=True` to drive a Linux Vampire installed in WSL from a Windows
+  host (Vampire is launched via `wsl.exe`, with automatic `wslpath` translation of
+  the temp-file path).
 ## [0.5.0] - 2026-06-24
 Adds an NL→FOL **evaluation** toolkit and broad **non-classical logic** coverage —
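
Reviewer sketch (not part of the package): the 0.5.2 changelog entry above accepts a rename when the normalised Levenshtein distance is at or below `max_norm_distance` (default `0.6`). The standalone snippet below re-derives that distance for a few symbol pairs mentioned in this diff. The helper names `levenshtein` and `normalised_distance` are illustrative stand-ins, not the package's own functions.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert / delete / substitute), rolling-row DP."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,            # insertion
                previous[j] + 1,               # deletion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def normalised_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length; assumes non-empty inputs."""
    return levenshtein(a, b) / max(len(a), len(b))


THRESHOLD = 0.6  # the default max_norm_distance stated in the changelog

for predicted, reference in [("Wins", "Win"), ("Red", "Tall"), ("Wins", "IsWinner")]:
    d = normalised_distance(predicted, reference)
    print(f"{predicted!r} vs {reference!r}: {d:.3f} accepted={d <= THRESHOLD}")
```

Under these definitions `Wins`/`Win` scores 0.25 and is accepted, `Red`/`Tall` scores 1.0 and is rejected, and `Wins`/`IsWinner` (the pair used in the `predicate_match.py` module docstring) scores 0.625, just above the default, so that pair would only be realigned with a higher `max_norm_distance`.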

{unicode_fol_kit-0.5.0 → unicode_fol_kit-0.5.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: unicode-fol-kit
-Version: 0.5.0
+Version: 0.5.2
 Summary: Parser and toolkit for first-order logic formulas using Unicode operators
 Project-URL: Repository, https://github.com/fvossel/unicode-fol-kit
 Project-URL: Issues, https://github.com/fvossel/unicode-fol-kit/issues
@@ -48,7 +48,7 @@ A Python toolkit for parsing and working with first-order logic (FOL) formulas w
 - **Prover9 export** — translate formulas to Prover9 syntax for automated theorem proving
 - **TPTP export** — translate formulas to TPTP syntax
 - **Equivalence checking** — check if two formulas are logically equivalent via Z3
-- **Entailment checking** — check if a conclusion follows from premises via Prover9
+- **Entailment checking** — check if a conclusion follows from premises via Prover9 (`check_logical_entailment`) or Vampire (`check_logical_entailment_vampire`), each taking the prover's executable path as an argument
 - **Built-in resolution prover** — `prove()` and `is_valid_resolution()` decide entailment/validity in-process (sound first-order resolution, no external solver needed)
 - **Canonical form & exact match** — `canonicalize()` normalises bound-variable renaming, commutativity/associativity, operand duplication, and double negation; `exact_match()` gives a fair NL→FOL comparison stricter than logical equivalence but more forgiving than raw equality
 - **Formula validation** — `validate()` / `is_wellformed()` / `validate_text()` report free variables, inconsistent predicate/function arity, leftover lambda nodes, and parseability of raw model output
@@ -472,6 +472,32 @@ conclusion = parser.parse("Mortal(socrates)")
 check_logical_entailment(premises, conclusion, prover9_path="/usr/bin/prover9")  # True
 ```
+### Entailment checking (Vampire)
+The same check backed by [Vampire](https://vprover.github.io/) instead of Prover9: the premises are emitted as TPTP `axiom`s and the conclusion as a `conjecture`, and the path to the Vampire executable is passed as an argument (Vampire reports `SZS status Theorem` when the entailment holds).
+```python
+from unicode_fol_kit import MSFLParser, check_logical_entailment_vampire  # doctest: +SKIP (needs an installed Vampire)
+parser = MSFLParser()
+premises = [
+    parser.parse("∀x (Human(x) → Mortal(x))"),
+    parser.parse("Human(socrates)"),
+]
+conclusion = parser.parse("Mortal(socrates)")
+check_logical_entailment_vampire(premises, conclusion, vampire_path="/usr/bin/vampire")  # True
+```
+On Windows you can drive a Linux Vampire installed in **WSL** with `use_wsl=True`: Vampire is launched through `wsl.exe` and the temporary problem file's path is translated to its `/mnt/...` form automatically. Here `vampire_path` is the command/path *inside* WSL (e.g. `"vampire"` if it is on the WSL `PATH`).
+```python
+# Windows host, Vampire installed in WSL:  # doctest: +SKIP (needs WSL + Vampire)
+check_logical_entailment_vampire(premises, conclusion, vampire_path="vampire", use_wsl=True)  # True
+```
+Note that every premise and the conclusion must be a closed sentence — Vampire rejects unquantified (free) variables, and recall that a single lowercase letter like `x` is a *variable*, so a constant individual needs a multi-character name (`socrates`) or the `c_`-prefix.
 ### Entailment and validity (built-in resolution prover)
 For entailment and validity **without** an external prover, the package ships a self-contained first-order **resolution** prover. It clausifies the input (skolemise → drop ∀ prefix → CNF → clauses), then refutes `premises ∧ ¬conclusion` by binary resolution and factoring, deriving the empty clause iff the entailment holds.
@@ -679,6 +705,30 @@ validate_text("∀x (P(x)").parseable   # False  (unbalanced parenthesis)
 The `ValidationReport` also exposes `has_lambdas`, and `predicates` / `functions` / `constants` / `sorts_used` inventories. Built-in comparison (`= ≠ < > ≤ ≥`) and arithmetic (`+ - * /`) symbols are excluded from the arity checks and inventories.
+### Predicate-aligned string match
+`canonicalize` / `exact_match` forgive *structural* differences (α-renaming, commutativity/associativity, …) but treat two **different predicate names** as a genuine mismatch. The complementary, lexical notion is `match_predicates`: it greedily renames each predicate/function symbol in a predicted formula to the closest symbol in the reference — by **normalised Levenshtein distance**, accepting a match at or below a threshold (`max_norm_distance`, default `0.6`) — so a structurally-correct answer that merely chose different predicate names is not penalised. `formulas_are_identical` is the plain whitespace- and case-insensitive string equality; `formulas_are_matched_identical` realigns predicates and then compares.
+```python
+from unicode_fol_kit import (
+    match_predicates,
+    formulas_are_identical,
+    formulas_are_matched_identical,
+)
+pred = "∀x (Wins(x) → Happy(x))"
+ref  = "∀x (Win(x) → Happy(x))"        # same shape; "Wins" vs "Win"
+formulas_are_identical(pred, ref)            # False  (raw strings differ)
+match_predicates(pred, ref)                  # '∀x (Win(x) → Happy(x))'
+formulas_are_matched_identical(pred, ref)    # True   (Wins → Win is a close match)
+# A symbol with no sufficiently close reference counterpart is left untouched:
+match_predicates("Red(x)", "Tall(x)")        # 'Red(x)'  (normalised distance 1.0 > 0.6)
+```
+Unlike the `canonicalize`/`exact_match` pair, this matcher is purely **lexical** (string-level), so it also applies to raw model output that does not yet parse, and the two notions are typically reported as separate metrics (e.g. `EXACT_MATCH` vs `PREDICATE_MATCHED_EXACT_MATCH`). The Levenshtein distance is computed in pure Python, so no extra dependency is required.
 ## Modal, temporal, and epistemic logic
 Natural language is full of constructs classical FOL can't express directly — necessity/possibility, knowledge and belief, and time. `MSFLParser(modal=True)` adds a modal mode (classical unsorted FOL extended with modal operators) and the toolkit ships Kripke-model semantics plus a standard translation back to FOL.

{unicode_fol_kit-0.5.0 → unicode_fol_kit-0.5.2}/README.md RENAMED

@@ -24,7 +24,7 @@ A Python toolkit for parsing and working with first-order logic (FOL) formulas w
 - **Prover9 export** — translate formulas to Prover9 syntax for automated theorem proving
 - **TPTP export** — translate formulas to TPTP syntax
 - **Equivalence checking** — check if two formulas are logically equivalent via Z3
-- **Entailment checking** — check if a conclusion follows from premises via Prover9
+- **Entailment checking** — check if a conclusion follows from premises via Prover9 (`check_logical_entailment`) or Vampire (`check_logical_entailment_vampire`), each taking the prover's executable path as an argument
 - **Built-in resolution prover** — `prove()` and `is_valid_resolution()` decide entailment/validity in-process (sound first-order resolution, no external solver needed)
 - **Canonical form & exact match** — `canonicalize()` normalises bound-variable renaming, commutativity/associativity, operand duplication, and double negation; `exact_match()` gives a fair NL→FOL comparison stricter than logical equivalence but more forgiving than raw equality
 - **Formula validation** — `validate()` / `is_wellformed()` / `validate_text()` report free variables, inconsistent predicate/function arity, leftover lambda nodes, and parseability of raw model output
@@ -448,6 +448,32 @@ conclusion = parser.parse("Mortal(socrates)")
 check_logical_entailment(premises, conclusion, prover9_path="/usr/bin/prover9")  # True
 ```
+### Entailment checking (Vampire)
+The same check backed by [Vampire](https://vprover.github.io/) instead of Prover9: the premises are emitted as TPTP `axiom`s and the conclusion as a `conjecture`, and the path to the Vampire executable is passed as an argument (Vampire reports `SZS status Theorem` when the entailment holds).
+```python
+from unicode_fol_kit import MSFLParser, check_logical_entailment_vampire  # doctest: +SKIP (needs an installed Vampire)
+parser = MSFLParser()
+premises = [
+    parser.parse("∀x (Human(x) → Mortal(x))"),
+    parser.parse("Human(socrates)"),
+]
+conclusion = parser.parse("Mortal(socrates)")
+check_logical_entailment_vampire(premises, conclusion, vampire_path="/usr/bin/vampire")  # True
+```
+On Windows you can drive a Linux Vampire installed in **WSL** with `use_wsl=True`: Vampire is launched through `wsl.exe` and the temporary problem file's path is translated to its `/mnt/...` form automatically. Here `vampire_path` is the command/path *inside* WSL (e.g. `"vampire"` if it is on the WSL `PATH`).
+```python
+# Windows host, Vampire installed in WSL:  # doctest: +SKIP (needs WSL + Vampire)
+check_logical_entailment_vampire(premises, conclusion, vampire_path="vampire", use_wsl=True)  # True
+```
+Note that every premise and the conclusion must be a closed sentence — Vampire rejects unquantified (free) variables, and recall that a single lowercase letter like `x` is a *variable*, so a constant individual needs a multi-character name (`socrates`) or the `c_`-prefix.
 ### Entailment and validity (built-in resolution prover)
 For entailment and validity **without** an external prover, the package ships a self-contained first-order **resolution** prover. It clausifies the input (skolemise → drop ∀ prefix → CNF → clauses), then refutes `premises ∧ ¬conclusion` by binary resolution and factoring, deriving the empty clause iff the entailment holds.
@@ -655,6 +681,30 @@ validate_text("∀x (P(x)").parseable   # False  (unbalanced parenthesis)
 The `ValidationReport` also exposes `has_lambdas`, and `predicates` / `functions` / `constants` / `sorts_used` inventories. Built-in comparison (`= ≠ < > ≤ ≥`) and arithmetic (`+ - * /`) symbols are excluded from the arity checks and inventories.
+### Predicate-aligned string match
+`canonicalize` / `exact_match` forgive *structural* differences (α-renaming, commutativity/associativity, …) but treat two **different predicate names** as a genuine mismatch. The complementary, lexical notion is `match_predicates`: it greedily renames each predicate/function symbol in a predicted formula to the closest symbol in the reference — by **normalised Levenshtein distance**, accepting a match at or below a threshold (`max_norm_distance`, default `0.6`) — so a structurally-correct answer that merely chose different predicate names is not penalised. `formulas_are_identical` is the plain whitespace- and case-insensitive string equality; `formulas_are_matched_identical` realigns predicates and then compares.
+```python
+from unicode_fol_kit import (
+    match_predicates,
+    formulas_are_identical,
+    formulas_are_matched_identical,
+)
+pred = "∀x (Wins(x) → Happy(x))"
+ref  = "∀x (Win(x) → Happy(x))"        # same shape; "Wins" vs "Win"
+formulas_are_identical(pred, ref)            # False  (raw strings differ)
+match_predicates(pred, ref)                  # '∀x (Win(x) → Happy(x))'
+formulas_are_matched_identical(pred, ref)    # True   (Wins → Win is a close match)
+# A symbol with no sufficiently close reference counterpart is left untouched:
+match_predicates("Red(x)", "Tall(x)")        # 'Red(x)'  (normalised distance 1.0 > 0.6)
+```
+Unlike the `canonicalize`/`exact_match` pair, this matcher is purely **lexical** (string-level), so it also applies to raw model output that does not yet parse, and the two notions are typically reported as separate metrics (e.g. `EXACT_MATCH` vs `PREDICATE_MATCHED_EXACT_MATCH`). The Levenshtein distance is computed in pure Python, so no extra dependency is required.
 ## Modal, temporal, and epistemic logic
 Natural language is full of constructs classical FOL can't express directly — necessity/possibility, knowledge and belief, and time. `MSFLParser(modal=True)` adds a modal mode (classical unsorted FOL extended with modal operators) and the toolkit ships Kripke-model semantics plus a standard translation back to FOL.
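
Reviewer sketch (not part of the package): the README's Vampire section says premises are emitted as TPTP `axiom`s and the conclusion as a `conjecture`. The snippet below shows what such a problem file could look like for the Socrates example. The formula bodies are hand-written TPTP; exactly how `to_tptp` spells predicate and constant names is an assumption here, and `build_tptp_problem` is a hypothetical helper.

```python
from typing import List


def build_tptp_problem(premises_tptp: List[str], conclusion_tptp: str) -> str:
    """Wrap ready-made TPTP formula bodies in fof(...) statements."""
    lines = [
        f"fof(premise_{i}, axiom, {body})."
        for i, body in enumerate(premises_tptp, start=1)
    ]
    lines.append(f"fof(goal, conjecture, {conclusion_tptp}).")
    return "\n".join(lines) + "\n"


# Hand-written TPTP bodies (assumed spelling): variables are upper-case,
# predicates and constants lower-case.
problem = build_tptp_problem(
    ["![X]: (human(X) => mortal(X))", "human(socrates)"],
    "mortal(socrates)",
)
print(problem)
```

A file with this content is what a TPTP-speaking prover would receive: two axioms and a single conjecture to prove from them.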

{unicode_fol_kit-0.5.0 → unicode_fol_kit-0.5.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "unicode-fol-kit"
-version = "0.5.0"
+version = "0.5.2"
 description = "Parser and toolkit for first-order logic formulas using Unicode operators"
 readme = "README.md"
 license = { text = "MIT" }

{unicode_fol_kit-0.5.0 → unicode_fol_kit-0.5.2}/unicode_fol_kit/__init__.py RENAMED

@@ -26,6 +26,7 @@ from .fol import (
 )
 from .atp import (
     formulas_are_equivalent, check_logical_entailment,
+    check_logical_entailment_vampire,
     is_satisfiable, is_valid, get_model,
     fuzzy_is_satisfiable, fuzzy_is_valid, fuzzy_get_model,
     to_z3_arith, is_satisfiable_arith, is_valid_arith, get_model_arith,
@@ -41,9 +42,10 @@ from .semantics import (
 from .eval import (
     canonicalize, exact_match,
     validate, is_wellformed, validate_text, ValidationReport,
+    formulas_are_identical, match_predicates, formulas_are_matched_identical,
 )
-__version__ = "0.5.0"
+__version__ = "0.5.2"
 __all__ = [
     "MSFLParser",
@@ -52,6 +54,7 @@ __all__ = [
     "Z3Env",
     "NamingError", "ParsingError",
     "formulas_are_equivalent", "check_logical_entailment",
+    "check_logical_entailment_vampire",
     "SortedQuantifier", "SortedConstant",
     "WeakConjunction", "WeakDisjunction",
     "StrongConjunction", "StrongDisjunction",
@@ -81,4 +84,5 @@ __all__ = [
     "satisfies_so", "holds",
     "canonicalize", "exact_match",
     "validate", "is_wellformed", "validate_text", "ValidationReport",
+    "formulas_are_identical", "match_predicates", "formulas_are_matched_identical",
 ]

{unicode_fol_kit-0.5.0 → unicode_fol_kit-0.5.2}/unicode_fol_kit/atp/__init__.py RENAMED

@@ -1,5 +1,6 @@
 from .z3_equivalence import formulas_are_equivalent
 from .prover9_entailment import check_logical_entailment
+from .vampire_entailment import check_logical_entailment_vampire
 from .z3_models import is_satisfiable, is_valid, get_model
 from .z3_fuzzy import (
     fuzzy_is_satisfiable, fuzzy_is_valid, fuzzy_get_model, degree_expr,
@@ -12,6 +13,7 @@ from .resolution import to_clauses, refute, prove, is_valid_resolution
 __all__ = [
     "formulas_are_equivalent",
     "check_logical_entailment",
+    "check_logical_entailment_vampire",
     "is_satisfiable", "is_valid", "get_model",
     "fuzzy_is_satisfiable", "fuzzy_is_valid", "fuzzy_get_model", "degree_expr",
     "to_z3_arith", "is_satisfiable_arith", "is_valid_arith", "get_model_arith", "ArithEnv",

unicode_fol_kit-0.5.2/unicode_fol_kit/atp/vampire_entailment.py ADDED

@@ -0,0 +1,150 @@
+"""Entailment checking via the Vampire theorem prover (TPTP backend).
+The companion to :func:`prover9_entailment.check_logical_entailment`, but driving
+`Vampire <https://vprover.github.io/>`_ instead of Prover9. The problem is emitted
+in TPTP ``fof`` syntax — every premise as an ``axiom`` and the conclusion as a
+``conjecture`` — and handed to a Vampire binary whose path the caller supplies.
+Vampire negates the conjecture internally and reports ``SZS status Theorem`` when
+the premises entail the conclusion.
+Only the classical FOL fragment is supported, exactly as far as ``Node.to_tptp``
+reaches: a modal, second-order, Łukasiewicz, or lambda node raises
+``NotImplementedError`` from ``to_tptp`` and that error propagates here.
+A Windows host can drive a Linux Vampire installed in WSL by passing
+``use_wsl=True``: Vampire is then launched through ``wsl.exe`` and the temporary
+problem file's path is translated to its ``/mnt/...`` form with ``wslpath``.
+"""
+import os
+import subprocess
+import tempfile
+from typing import List
+from ..fol.nodes import Node
+def _generate_vampire_input(premises: List[Node], conclusion: Node) -> str:
+    """Build a TPTP ``fof`` problem string from premises and a conclusion.
+    Each premise becomes ``fof(premise_<i>, axiom, <tptp>).`` and the conclusion
+    becomes ``fof(goal, conjecture, <tptp>).``. The bodies come from
+    ``Node.to_tptp`` (so variables are upper-cased TPTP-style). Vampire treats the
+    single conjecture as the goal to prove from the axioms.
+    """
+    lines: List[str] = []
+    for i, premise in enumerate(premises, start=1):
+        lines.append(f"fof(premise_{i}, axiom, {premise.to_tptp()}).")
+    lines.append(f"fof(goal, conjecture, {conclusion.to_tptp()}).")
+    return "\n".join(lines) + "\n"
+def _is_entailed_output(stdout: str) -> bool:
+    """Decide entailment from Vampire's stdout.
+    Vampire reports ``SZS status Theorem`` when it proves the conjecture from the
+    axioms; ``Refutation found`` is the equivalent message in its default proof
+    output (and also covers the vacuous case of inconsistent premises, which
+    entail anything). Either signal means the entailment holds. A
+    ``CounterSatisfiable`` / ``Satisfiable`` / ``Timeout`` status — or no proof at
+    all — means it does not.
+    """
+    return ("SZS status Theorem" in stdout) or ("Refutation found" in stdout)
+def _to_wsl_path(windows_path: str) -> str:
+    """Translate a Windows path to its WSL ``/mnt/...`` form via ``wslpath``.
+    Backslashes are turned into forward slashes first: the WSL interop layer
+    swallows backslashes in arguments (``C:\\Users\\…`` reaches ``wslpath`` as
+    ``C:Users…`` with the separators gone), whereas ``wslpath`` accepts the
+    forward-slash spelling ``C:/Users/…`` directly.
+    """
+    result = subprocess.run(
+        ["wsl.exe", "wslpath", "-u", windows_path.replace("\\", "/")],
+        capture_output=True,
+        text=True,
+        timeout=20,
+    )
+    wsl_path = result.stdout.strip()
+    if not wsl_path:
+        raise RuntimeError(
+            f"wslpath could not translate {windows_path!r} (is WSL available?): "
+            f"{result.stderr.strip()}"
+        )
+    return wsl_path
+def _run_vampire(input_str: str, vampire_path: str, timeout: int = 30,
+                 use_wsl: bool = False) -> bool:
+    """Write the TPTP problem to a temp file and run Vampire on it.
+    Mirrors the Prover9 runner's contract: a subprocess timeout is swallowed and
+    reported as "not entailed" (Vampire could not finish), while any other error —
+    notably ``FileNotFoundError`` for a wrong ``vampire_path`` — propagates to the
+    caller. The temporary file is always removed, even when the subprocess raises.
+    With ``use_wsl=True`` Vampire is invoked inside WSL as
+    ``wsl.exe <vampire_path> <file>``, and the Windows temp-file path is first
+    translated to its ``/mnt/...`` form with ``wslpath`` so a Linux Vampire under
+    WSL can read the file the Windows side created.
+    """
+    with tempfile.NamedTemporaryFile(mode="w", suffix=".p", delete=False,
+                                     encoding="utf-8") as temp_file:
+        temp_file.write(input_str)
+        temp_filename = temp_file.name
+    try:
+        if use_wsl:
+            command = ["wsl.exe", vampire_path, _to_wsl_path(temp_filename)]
+        else:
+            command = [vampire_path, temp_filename]
+        result = subprocess.run(
+            command,
+            capture_output=True,
+            text=True,
+            timeout=timeout,
+        )
+        return _is_entailed_output(result.stdout)
+    except subprocess.TimeoutExpired:
+        return False
+    finally:
+        try:
+            os.unlink(temp_filename)
+        except OSError:
+            pass
+def check_logical_entailment_vampire(premises: List[Node], conclusion: Node,
+                                     vampire_path: str, timeout: int = 30,
+                                     use_wsl: bool = False) -> bool:
+    """Return whether ``premises`` entail ``conclusion``, decided by Vampire.
+    Args:
+        premises: a list of classical FOL premise formulas.
+        conclusion: the classical FOL conclusion formula.
+        vampire_path: path to a Vampire executable (e.g. ``"/usr/bin/vampire"``).
+            With ``use_wsl=True`` this is the command/path INSIDE WSL — e.g.
+            ``"vampire"`` if it is on the WSL ``PATH``, or ``"/home/me/vampire"``.
+        timeout: seconds to allow the Vampire process before giving up and
+            returning ``False`` (default 30).
+        use_wsl: when True, run Vampire inside WSL via ``wsl.exe`` and translate
+            the temp-file path to its ``/mnt/...`` form, so a Windows host can
+            drive a Linux Vampire installed in WSL.
+    Returns:
+        ``True`` iff Vampire proves the conclusion follows from the premises.
+        Note that every premise and the conclusion must be a closed sentence:
+        Vampire rejects formulas with unquantified (free) variables, and such a
+        rejection is reported as ``False`` (no proof), not raised.
+    Raises:
+        FileNotFoundError: ``vampire_path`` does not point to an executable (or,
+            with ``use_wsl=True``, ``wsl.exe`` itself is not found).
+        NotImplementedError: a formula is outside the first-order fragment
+            (modal / second-order / Łukasiewicz / lambda), surfaced by
+            ``to_tptp``.
+    """
+    vampire_input = _generate_vampire_input(premises, conclusion)
+    return _run_vampire(vampire_input, vampire_path, timeout=timeout,
+                        use_wsl=use_wsl)
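
Reviewer sketch (not part of the package): the new module above picks between a native and a WSL command line and then classifies Vampire's stdout by substring. The snippet below mirrors those two decisions without launching any process. `build_command` and `is_entailed` are hypothetical names, and the sample stdout lines and paths are illustrative, not captured from a real run.

```python
from typing import List, Optional


def build_command(vampire_path: str, problem_path: str,
                  wsl_problem_path: Optional[str] = None) -> List[str]:
    """Return the argv list: via wsl.exe when a translated path is given."""
    if wsl_problem_path is not None:
        return ["wsl.exe", vampire_path, wsl_problem_path]
    return [vampire_path, problem_path]


def is_entailed(stdout: str) -> bool:
    """Either marker counts as a proof, as in the module shown in the diff."""
    return "SZS status Theorem" in stdout or "Refutation found" in stdout


# Backslashes are normalised before the path is handed to wslpath.
windows_path = "C:\\Users\\me\\problem.p"  # made-up example path
print(windows_path.replace("\\", "/"))

print(build_command("/usr/bin/vampire", "/tmp/problem.p"))
print(build_command("vampire", windows_path, "/mnt/c/Users/me/problem.p"))

# Illustrative output lines only.
print(is_entailed("% SZS status Theorem for problem"))
print(is_entailed("% SZS status CounterSatisfiable for problem"))
```

Because the classification is a plain substring test, any status other than `Theorem` (and any run without a refutation) is reported as `False`, which matches the docstring's note that a rejected input yields "no proof" rather than an exception.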

{unicode_fol_kit-0.5.0 → unicode_fol_kit-0.5.2}/unicode_fol_kit/eval/__init__.py RENAMED

@@ -9,12 +9,22 @@
 - :func:`validate` / :func:`is_wellformed` / :func:`validate_text` report the
   common defects in a generated formula (free variables, inconsistent predicate
   or function arity, leftover lambda nodes, unparseable text).
+- :func:`match_predicates` / :func:`formulas_are_matched_identical` /
+  :func:`formulas_are_identical` provide a lexical, predicate-aligned string
+  match (Levenshtein-based predicate renaming) — complementary to the AST-level
+  :func:`exact_match`, which instead quotients out the structural rewrites.
 """
 from .canonical import canonicalize, exact_match
 from .validate import validate, is_wellformed, validate_text, ValidationReport
+from .predicate_match import (
+    formulas_are_identical,
+    match_predicates,
+    formulas_are_matched_identical,
+)
 __all__ = [
     "canonicalize", "exact_match",
     "validate", "is_wellformed", "validate_text", "ValidationReport",
+    "formulas_are_identical", "match_predicates", "formulas_are_matched_identical",
 ]

unicode_fol_kit-0.5.2/unicode_fol_kit/eval/predicate_match.py ADDED

@@ -0,0 +1,180 @@
+"""Predicate-aligned string matching for NL→FOL evaluation.
+When scoring a model that translates natural language to FOL, two formulas may
+denote the same thing while using *different predicate names* — a model might
+write ``Wins(x)`` where the reference writes ``IsWinner(x)``. A plain string
+comparison (or even a structural one) counts that as wrong, even though the
+logical *shape* is identical and only the lexical choice of predicate symbol
+differs. ``match_predicates`` closes that gap: it greedily renames each
+predicate/function symbol in the prediction to the closest reference symbol
+(by **normalised Levenshtein distance**, accepting a match at or below a
+distance threshold) and returns the rewritten string, so a subsequent string
+comparison rewards a structurally-correct answer that merely renamed its
+predicates.
+This is a deliberately **lexical / string-level** notion, complementary to the
+AST-level :func:`unicode_fol_kit.eval.canonical.exact_match`:
+* :func:`exact_match` (canonical) quotients out α-renaming, commutativity /
+  associativity, operand duplication, and double negation, but treats two
+  *different predicate names* as a genuine mismatch.
+* :func:`match_predicates` / :func:`formulas_are_matched_identical` quotient out
+  *predicate-name* differences (and, via :func:`formulas_are_identical`,
+  whitespace and case), but not the structural rewrites above.
+The two are orthogonal and are typically reported as separate metrics
+(``EXACT_MATCH`` vs ``PREDICATE_MATCHED_EXACT_MATCH``). The matcher is
+parser-independent: it operates directly on the surface strings, so it also
+applies to raw model output that does not (yet) parse.
+The Levenshtein distance is computed in pure Python (classical unit-cost
+insertion / deletion / substitution dynamic program), so this module adds no
+third-party dependency.
+"""
+import re
+__all__ = [
+    "formulas_are_identical",
+    "match_predicates",
+    "formulas_are_matched_identical",
+]
+# A predicate or function symbol is a maximal word immediately followed by an
+# opening parenthesis, e.g. the ``P`` in ``P(x)`` or the ``loves`` in
+# ``loves(a, b)``. Nullary predicates written without parentheses are not
+# captured (there is nothing lexical to realign), and neither are bare terms.
+_SYMBOL_BEFORE_PAREN = re.compile(r"\b\w+(?=\()")
+_WHITESPACE = re.compile(r"\s+")
+def _levenshtein(a: str, b: str) -> int:
+    """Return the Levenshtein edit distance between ``a`` and ``b``.
+    Classical unit-cost dynamic program (insertion, deletion, and substitution
+    each cost 1), computed with a single rolling row in O(len(a)·len(b)) time
+    and O(len(b)) space. Matches the value of ``Levenshtein.distance`` for the
+    same inputs, so results are identical whether or not the optional
+    ``python-Levenshtein`` C extension is installed.
+    """
+    if a == b:
+        return 0
+    if not a:
+        return len(b)
+    if not b:
+        return len(a)
+    previous = list(range(len(b) + 1))
+    for i, ca in enumerate(a, start=1):
+        current = [i]
+        for j, cb in enumerate(b, start=1):
+            insertion = current[j - 1] + 1
+            deletion = previous[j] + 1
+            substitution = previous[j - 1] + (ca != cb)
+            current.append(min(insertion, deletion, substitution))
+        previous = current
+    return previous[len(b)]
+def _normalised_distance(a: str, b: str) -> float:
+    """Levenshtein distance scaled by the longer string's length, in [0, 1].
+    Normalising by ``max(len(a), len(b))`` makes the threshold length-agnostic:
+    a one-character edit weighs more between two short names than between two
+    long ones. Both names are predicate/function symbols matched by
+    :data:`_SYMBOL_BEFORE_PAREN`, hence always non-empty, so the denominator is
+    never zero.
+    """
+    return _levenshtein(a, b) / max(len(a), len(b))
+def formulas_are_identical(prediction: str, reference: str) -> bool:
+    """Return whether two formula strings are equal ignoring whitespace and case.
+    Both strings are stripped of all whitespace and lower-cased before
+    comparison, so ``"∀x P(x)"`` and ``"∀x  p( x )"`` are considered identical.
+    This is the plain ``EXACT_MATCH`` notion; it does **not** realign predicate
+    names — use :func:`formulas_are_matched_identical` for that.
+    """
+    cleaned_prediction = _WHITESPACE.sub("", prediction).lower()
+    cleaned_reference = _WHITESPACE.sub("", reference).lower()
+    return cleaned_prediction == cleaned_reference
+def _map_predicates(
+    prediction_symbols: list,
+    reference_symbols: list,
+    max_norm_distance: float = 0.6,
+) -> list:
+    """Map each prediction symbol to its nearest reference symbol, or keep it.
+    For every symbol in ``prediction_symbols`` the closest symbol in
+    ``reference_symbols`` (smallest normalised Levenshtein distance) is found.
+    If that distance is at or below ``max_norm_distance`` the reference symbol is
+    used; otherwise the original prediction symbol is kept unchanged (the match
+    is too weak to trust). Ties are broken by the reference symbol's position,
+    matching ``min``'s first-minimum semantics.
+    """
+    mapped = []
+    for symbol in prediction_symbols:
+        best_match = min(
+            reference_symbols,
+            key=lambda candidate: _normalised_distance(symbol, candidate),
+        )
+        if _normalised_distance(symbol, best_match) <= max_norm_distance:
+            mapped.append(best_match)
+        else:
+            mapped.append(symbol)
+    return mapped
+def match_predicates(
+    prediction: str,
+    reference: str,
+    max_norm_distance: float = 0.6,
+) -> str:
+    """Rewrite ``prediction``'s predicate/function names toward ``reference``.
+    Every symbol that appears immediately before a ``(`` in ``prediction`` is
+    realigned to the lexically-closest such symbol in ``reference`` (see
+    :func:`_map_predicates`), and the rewrite is applied to the surface string
+    as a ``"<old>(" → "<new>("`` substitution. Symbols with no sufficiently
+    close reference counterpart (normalised distance above ``max_norm_distance``)
+    are left as they are. If either side has no parenthesised symbols, the
+    prediction is returned unchanged.
+    The result is a string in the same surface syntax as the input, suitable for
+    a subsequent :func:`formulas_are_identical` comparison or for re-parsing.
+    """
+    matched_formula = prediction
+    prediction_symbols = _SYMBOL_BEFORE_PAREN.findall(prediction)
+    reference_symbols = _SYMBOL_BEFORE_PAREN.findall(reference)
+    if prediction_symbols and reference_symbols:
+        mapped_symbols = _map_predicates(
+            prediction_symbols, reference_symbols, max_norm_distance
+        )
+        for old_symbol, new_symbol in zip(prediction_symbols, mapped_symbols):
+            matched_formula = matched_formula.replace(
+                old_symbol + "(", new_symbol + "("
+            )
+    return matched_formula
+def formulas_are_matched_identical(
+    prediction: str,
+    reference: str,
+    max_norm_distance: float = 0.6,
+) -> bool:
+    """Return whether ``prediction`` equals ``reference`` after predicate realignment.
+    Realigns the prediction's predicate/function names to the reference's with
+    :func:`match_predicates`, then compares with :func:`formulas_are_identical`
+    (whitespace- and case-insensitive). This is the ``PREDICATE_MATCHED_EXACT``
+    notion: it forgives a structurally-correct answer that merely chose different
+    predicate symbol names.
+    """
+    matched_prediction = match_predicates(prediction, reference, max_norm_distance)
+    return formulas_are_identical(matched_prediction, reference)
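
Reviewer sketch (not part of the package): `match_predicates` above applies each rename as a `"<old>(" → "<new>("` substring replacement. A plain substring replacement can also touch a longer symbol that happens to end in the same characters. One possible alternative, shown here purely as an illustration, is a single regex pass that only rewrites whole symbols. `rename_symbols` and the mapping are hypothetical; the pattern is the same "word before an opening parenthesis" idea used in the module.

```python
import re

# Whole word immediately followed by "(" (same idea as the module's pattern).
SYMBOL_BEFORE_PAREN = re.compile(r"\b\w+(?=\()")


def rename_symbols(formula: str, mapping: dict) -> str:
    """Rename whole symbols in one pass; unmapped symbols are kept as-is."""
    return SYMBOL_BEFORE_PAREN.sub(
        lambda m: mapping.get(m.group(0), m.group(0)), formula
    )


formula = "Win(x) ∧ in(x)"

# Substring replacement also rewrites the tail of "Win(".
naive = formula.replace("in(", "on(")

# Whole-symbol replacement leaves "Win" alone and renames only "in".
anchored = rename_symbols(formula, {"in": "on"})

print(naive)
print(anchored)
```

Here the substring version yields `Won(x) ∧ on(x)` while the whole-symbol version yields `Win(x) ∧ on(x)`. Whether this matters in practice depends on how often one predicate name is a suffix of another in the evaluated data.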