PyPI - hyperbase - Versions diffs - 0.9.0__tar.gz → 0.10.0__tar.gz - Mend

hyperbase 0.9.0tar.gz → 0.10.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (103)

{hyperbase-0.9.0 → hyperbase-0.10.0}/CHANGELOG.md RENAMED

@@ -1,8 +1,39 @@
 # Changelog
+## [0.10.0] - 11-04-2026
+### Added
+- `[]` pattern notation for specifying sequences of arguments.
+- EdgeType and ArgRole enums.
+- safety cap for match (`_MAX_ARGROLE_ITEMS=10`) against pathological edge arities.
+- caching of computed `Hyperedge`/`Atom` properties.
+- `parse_to_jsonl` method on `Parser`.
+- unified parameter interface for parsers.
+- method `Parser.accepted_params`.
+- maximum depth protection for parsers.
+- repl api for parsers.
+### Changed
+- multiple patterns functions are now `Hyperedge`/`Atom` methods: `is_wildcard`, `is_pattern`, `is_fun_pattern`, `is_variable`, `contains_variable`, `variable_name`.
+- `hyperbase.py` now delegating to smaller modules with well-defined concerns: `builders.py`, `correctness.py`, `transforms.py`, `patterns.checks.py` and `patterns.matcher.py`.
+- replaced `itertools.permutations` with constraint-propagated backtracking in argrole matcher.
+- `parse_text` renamed to `parse`; old iterator-based `parse` removed.
+- `read_source` renamed to `parse_source`; `read_source_to_jsonl` renamed to `parse_source_to_jsonl`.
+- renamed `sentensize` to `get_sentences`.
+- hedge now uses an explicit stack instead of recursion (so that pathologically
+    nested edge strings cannot exhaust Python's call stack).
+- renamed parsers.correctness to parsers.badness.
+### Removed
+- `__add__` operator overloading in `Hyperedge`/`Atom`.
 ## [0.9.0] - 05-04-2026
 ### Added
 - readers (txt, url, wikipedia).
 - cli interface with repl, parsers, readers.
 - hyperedge.Hyperedge.match function (calls parsers.match_pattern).
@@ -11,6 +42,7 @@
 - load_edges function.
 ### Changed
 - added get_parser to main functions (at hyperbase root).
 - improved documentation.
 - hedge now accepts ParseResults and can recursively add Hyperedge.text strings.
@@ -25,23 +57,27 @@
 - Renamed Hyperedge.normalized to normalise.
 ### Removed
 - function patterns.edge_matches_pattern.
 - deprecated and obsolete methods from Hyperedge: is_atom, to_str, roots, insert_first_argument, connect, sequence, contains_atom_type, main_concepts, replace_main_concept, has_argroles.
 ## [0.8.0] - 26-03-2026 - hyperbase is the successor of graphbrain
 ### Added
 - parser plugin foundation.
 - more comprehensive Hyperedge.check_correctness.
 - check parse correctness.
 - type checking: full code coverage.
 ### Changed
 - renamed library to hyperbase.
 - trimmed down library to the essentials: hyperedge, patterns and parser foundations.
 - converted documentation to Material for MkDocs.
 ### Removed
 - hypergraph module, hypergraph database (memory module).
 - alphabeta parser implementation.
 - old scripts, examples, processors.
@@ -51,7 +87,9 @@
 - obsolete constants.
 ## [0.7.0] - 05-03-2026
 ### Added
 - patterns.is_wildcard().
 - Base class hypergraph.memory.keyvalue.KeyValue for key-value hypergraph databases, removing redundant code between LevelDB and SQLite.
 - Tests for LevelDB (only the SQLite Hypergraph implementation was being directly tested).
@@ -63,6 +101,7 @@
 - Hypergraph.get_attributes().
 ### Changed
 - Entire project is now in pure Python
 - Python >=3.10 now required.
 - Hypergraph.search(), .match() and .count() now working with functional patterns and argument role matching.
@@ -72,19 +111,25 @@
 - Matches from patterns with repeated variables are collected in lists.
 ### Removed
 - graphbrain.logic obsolete module.
 - LevelDB backend
 ## [0.6.1] - 31-10-2022
 ### Changed
 - Hyperedge.replace_argroles() .insert_argrole() and .add_argument() now works with functional patterns such as var.
 - Fixed bug when matching patterns containing atoms functional pattern where no atom has argroles.
 ### Removed
 - interactive_case_generator() from graphbrain.notebook.
 ## [0.6.0] - 27-10-2022
 ### Added
 - Hyperedge.atom and .not_atom properties.
 - Hyperedge.mtype() and .connector_mtype() methods.
 - Hyperedge.t, .mt, .ct and .cmt type shortcut properties.
@@ -99,6 +144,7 @@
 - Processor class.
 ### Changed
 - Coreference resolution now using the new spaCy experimental model.
 - Now using spaCy transformer GPU models by default, can fallback to CPU model.
 - Hyperedge.is_atom() deprecated.
@@ -112,6 +158,7 @@
 - Hyperedge.argroles() now also works at relation/concept level.
 ### Removed
 - graphbrain.patterns.normalize_edge().
 - graphbrain.stats obsolete package.
 - graphbrain.cognition obsolete package.
@@ -119,18 +166,22 @@
 - Hyperedge .predicate() and .predicate_atom().
 ## [0.5.0] - 28-07-2021
 ### Added
 - SQLite3 hypergraph database backend.
 - Hypergraph.add_with_attributes().
 - import and export commands.
 - Hypergraph context manager for batch writes (with hopen(hg_locator) as hg ...).
 ### Changed
 - Main hypergraph database backend is now SQLite3.
 - LevelDB backend becomes optional. (disabled by default)
 - Neuralcoref becomes optional. (disabled by default)
 ### Removed
 - Hypergraph.atom_count().
 - Hypergraph.edge_count().
 - Hypergraph.primary_atom_count().
@@ -139,21 +190,29 @@
 - corefs_unidecode agent.
 ## [0.4.3] - 22-04-2021
 ### Changed
 - Fixed AlphaBeta bug related to temporary atoms being removed too soon from atom2tokens.
 - Hypergraph.add_sequence() converts sequence name directly to atom.
 - Parser level coreference resolution (neuralcoref) disabled by default, requires dedicated build.
 ## [0.4.2] - 12-04-2021
 ### Changed
 - Solving wheel compilation issue.
 ## [0.4.1] - 07-04-2021
 ### Changed
 - Solving issue with inclusion of auxiliary data file in non-binary distributions.
 ## [0.4.0] - 07-04-2021
 ### Added
 - Agents system.
 - Conjunctions resolution agent.
 - Number agent (singular/plural relations) and related meaning.number module.
@@ -178,6 +237,7 @@
 - Utility functions to show colored edges in the terminal.
 ### Changed
 - Special characters in atoms are now percent-encoded.
 - parse() now returns a dictionary that includes inferred edges.
 - parse() now returns a dictionary of edges to text.
@@ -193,25 +253,32 @@
 - Hyperedge.replace_atom() optional unique argument.
 ### Removed
 - Meta-modifier hyperedge type.
 - Auxiliary, subpredicate and dependency hyperedge types.
 - Obsolete Hyperedge.nest() method.
 ## [0.3.2] - 10-02-2020
 ### Added
 - simplify_role() on Atom objects produces an atom with only its simple type as role.
 ### Changed
 - Lemmas are now based on atoms with simplified roles.
 - Improved actors agent (more accurate identification of actors, English only for now).
 ## [0.3.1] - 03-02-2020
 ### Added
 - German parser (experimental and incomplete).
 - Documentation.
 - Hyperedge sequences.
 ### Changed
 - Improved hyperedge visualization in notebooks.
 - Agents receive language and sequence.
 - txt_parser agent creates a sequence.
@@ -220,11 +287,14 @@
 - Improved conflict agent.
 ## [0.3.0] - 28-09-2019
 ### Added
 - Tests.
 - Documentation.
 ### Changed
 - Graphbrain is now beta (main APIs considered stable).
 - LevelDB edge attributes encoded in JSON.
 - Renamed hypergraph() to hgraph() and moved function to __init__.
@@ -237,23 +307,29 @@
 - Improved notebooks visualizations (show(), blocks(), vblocks()).
 ### Removed
 - graphbrain.funs module.
 ## [0.2.2] - 13-09-2019
 ### Added
 - txt_parser agent.
 - MANIFEST.in to include VERSION file in distribution.
 ### Changed
 - Fixing 'pip install graphbrain' on Linux/Windows.
 ## [0.2.1] - 04-09-2019
 ### Added
 - claim_actors and corefs_dets agents.
 - meaning.concepts module.
 ### Changed
 - Fixed example.
 - hypergraph.sum_degree() and .sum_deep_degree().
 - Parser improvements.
@@ -261,11 +337,14 @@
 - Improved docs.
 ### Removed
 - Obsolete 'work-in-progress' code.
 - hg2json command.
 ## [0.2.0] - 04-08-2019
 ### Added
 - Primary entities and deep degrees.
 - Hyperedges have their own class, deriving from tuple.
 - Atoms have a special class, deriving from Hyperedge.
@@ -273,11 +352,15 @@
 - Created agent system + first agents.
 ### Changed
 - Parsers now have own package.
 ### Removed
 - Old experimental code.
 ## [0.1.0] - 14-06-2019
 ### Added
-- First release.
+- First release.
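
The 0.10.0 entry above says the argrole matcher replaced `itertools.permutations` with constraint-propagated backtracking, and that a cap guards against pathological arities. The matcher itself is not part of this excerpt, so the following is only a minimal sketch of the general technique on a toy problem: each slot has a set of candidate item indices, and every slot must receive a distinct item. The names `assign_slots` and `MAX_ITEMS` are hypothetical and are not hyperbase identifiers.

```python
from __future__ import annotations

MAX_ITEMS = 10  # hypothetical cap, echoing the idea behind _MAX_ARGROLE_ITEMS


def assign_slots(candidates: list[set[int]], n_items: int) -> list[tuple[int, ...]]:
    """Return every way to give each slot a distinct item index.

    candidates[i] is the set of item indices that slot i may take.
    """
    if n_items > MAX_ITEMS:
        raise ValueError(f"too many items ({n_items} > {MAX_ITEMS})")

    results: list[tuple[int, ...]] = []

    def backtrack(domains: dict[int, set[int]], chosen: dict[int, int]) -> None:
        if not domains:
            # Every slot has been assigned; record the assignment in slot order.
            results.append(tuple(chosen[i] for i in range(len(candidates))))
            return
        # Pick the most constrained slot first (smallest remaining domain).
        slot = min(domains, key=lambda s: (len(domains[s]), s))
        for item in sorted(domains[slot]):
            # Propagate: the chosen item is no longer available to other slots.
            rest = {s: d - {item} for s, d in domains.items() if s != slot}
            # Prune as soon as any remaining slot has no candidates left.
            if all(rest.values()):
                backtrack(rest, {**chosen, slot: item})

    backtrack({i: set(c) for i, c in enumerate(candidates)}, {})
    return sorted(results)
```

With `assign_slots([{0, 1}, {1}, {0, 2}], 3)` the result is `[(0, 1, 2)]`: slot 1 can only take item 1, which forces slot 0 to item 0 and slot 2 to item 2. Enumerating all permutations would visit six orderings to find the same single answer; pruning empty domains early is what keeps the search small as arity grows.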

{hyperbase-0.9.0 → hyperbase-0.10.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hyperbase
-Version: 0.9.0
+Version: 0.10.0
 Summary: A foundational library for Semantic Hypergraphs
 Project-URL: Homepage, https://hyperquest.ai/hyperbase
 Author-email: "Telmo Menezes et al." <telmo@telmomenezes.net>

hyperbase-0.10.0/VERSION ADDED

@@ -0,0 +1 @@
+0.10.0

{hyperbase-0.9.0 → hyperbase-0.10.0}/docs/installation.md RENAMED

@@ -124,11 +124,11 @@ uv run hyperbase parsers
 Once installed, parsers can be used from the interactive REPL:
 ```bash
-hyperbase repl --parser alphabeta --language en
+hyperbase repl --parser alphabeta --lang en
 ```
 ```bash
-uv run hyperbase repl --parser alphabeta --language en
+uv run hyperbase repl --parser alphabeta --lang en
 ```
 Or programmatically:
@@ -136,6 +136,6 @@ Or programmatically:
 ```python
 from hyperbase.parsers import get_parser
-parser = get_parser("alphabeta", language="en")
+parser = get_parser("alphabeta", lang="en")
 result = parser.parse_text("The sky is blue.")
 ```

{hyperbase-0.9.0 → hyperbase-0.10.0}/docs/manual/parsers.md RENAMED

@@ -20,10 +20,10 @@ Parsers are obtained by name with `get_parser()`:
 ```python
 from hyperbase import get_parser
-parser = get_parser("alphabeta", language="en")
+parser = get_parser("alphabeta", lang="en")
 ```
-The keyword arguments are forwarded to the parser constructor. Each parser plugin defines its own parameters -- for example, `alphabeta` takes a `language` code, while `generative` accepts `model_path`, `device`, `max_length`, and others.
+The keyword arguments are forwarded to the parser constructor. Each parser plugin defines its own parameters -- for example, `alphabeta` takes a `lang` code, while `generative` accepts `model_path`, `device`, `max_length`, and others. Run `hyperbase repl --parser <name> --help` (or `hyperbase read --parser <name> --help`) to see the full set of CLI flags injected by the active plugin.
 To see which parsers are installed:
@@ -125,42 +125,7 @@ This is what `read_source_to_jsonl()` uses internally -- each line in the output
 ## Quality checking
-The `hyperbase.parsers.correctness` module provides functions to assess the quality of a parse result.
-### Badness check
-`badness_check()` runs a comprehensive quality check on a parsed edge, combining structural validation with token-to-atom matching:
-```python
-from hyperbase.parsers.correctness import badness_check
-errors = badness_check(result.edge, result.tokens)
-if errors:
-    for key, error_list in errors.items():
-        for code, message, severity in error_list:
-            print(f"[{code}] {message} (severity: {severity})")
-else:
-    print("No errors found.")
-```
-The function returns a dictionary mapping edge fragments (or the string `'token-matching'`) to lists of `(code, message, severity)` tuples. An empty dictionary means no errors were found.
-The checks include:
-- **Structural correctness** -- validates the hyperedge against the SH specification (via `Hyperedge.check_correctness()`).
-- **Argument role validation** -- checks that argument roles are drawn from the valid set (`m`, `s`, `p`, `a`, `o`, `i`, `x`, `t`, `j`, `r`, `c`) and that roles like `s`, `p`, `o` are not duplicated.
-- **Junction consistency** -- verifies that junction arguments are consistently typed (all relations or all concepts).
-- **Token matching** -- ensures that every token in the original sentence maps to an atom root in the edge, and vice versa. Handles multi-token atoms, contractions and other non-trivial correspondences.
-### Structural quality only
-For a lighter check that skips token matching:
-```python
-from hyperbase.parsers.correctness import check_structural_quality
-errors = check_structural_quality(result.edge)
-```
+Badness/correctness checking lives in the parser plugin that needs it. The generative parser ships [`hyperbase_parser_gen.correctness.badness_check`](https://github.com/telmomenezes/hyperbase-parser-gen) for combined structural + token-matching validation; see that package's docs for usage.
 ## CLI
@@ -177,7 +142,7 @@ Shows all installed parser plugins and their entry point values.
 The REPL lets you parse sentences interactively:
 ```bash
-hyperbase repl --parser alphabeta --language en
+hyperbase repl --parser alphabeta --lang en
 ```
 Inside the REPL, type a sentence to parse it. Use `/help` to see available commands, `/settings` to view current configuration, and `/set` to change settings on the fly (e.g. `/set parser generative`). The REPL caches parser instances, so switching between parsers is fast after the first load.
@@ -186,7 +151,7 @@ Inside the REPL, type a sentence to parse it. Use `/help` to see available comma
 ```bash
 # Parse a file to JSONL
-hyperbase read article.txt -o output.jsonl --parser alphabeta --language en
+hyperbase read article.txt -o output.jsonl --parser alphabeta --lang en
 # Parse a Wikipedia article
 hyperbase read https://en.wikipedia.org/wiki/Hypergraph -o output.jsonl
@@ -196,10 +161,12 @@ See the [readers](readers.md) documentation for the full set of `hyperbase read`
 ## Custom parsers
-To create a custom parser, subclass `Parser` and implement two methods:
+To create a custom parser, subclass `Parser` and implement:
-- `sentensize(text)` -- split a text string into a list of sentences.
+- `__init__(params)` -- constructor accepting a dictionary of parser parameters.
+- `get_sentences(text)` -- split a text string into a list of sentences.
 - `parse_sentence(sentence)` -- parse a single sentence and return a list of `ParseResult` objects.
+- `accepted_params()` (classmethod) -- return a dict describing the parameters the parser accepts.
 Optionally, override `parse_batch(sentences)` if your parser can process multiple sentences more efficiently in a single call.
@@ -208,7 +175,20 @@ from hyperbase.parsers import Parser, ParseResult
 from hyperbase.hyperedge import hedge
 class MyParser(Parser):
-    def sentensize(self, text):
+    @classmethod
+    def accepted_params(cls):
+        return {
+            "lang": {
+                "type": str, "default": None,
+                "description": "Language code.", "required": True,
+            },
+        }
+    def __init__(self, params=None):
+        super().__init__(params)
+        self.lang = self.params["lang"]
+    def get_sentences(self, text):
         # simple sentence splitting
         return [s.strip() for s in text.split('.') if s.strip()]
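
The documentation change above has custom parsers declare their parameters through `accepted_params()`, with each entry carrying `type`, `default`, `description` and `required`. How `Parser.__init__` actually processes `params` is not shown in this diff. As a minimal sketch, assuming the base class checks supplied values against such a declaration, the validation could look like the following. `resolve_params` and the `max_depth` entry are hypothetical names used only for illustration.

```python
from __future__ import annotations

from typing import Any


def resolve_params(
    spec: dict[str, dict[str, Any]],
    params: dict[str, Any] | None,
) -> dict[str, Any]:
    """Validate user-supplied params against an accepted_params-style spec."""
    params = dict(params or {})

    # Reject anything the parser did not declare.
    unknown = sorted(set(params) - set(spec))
    if unknown:
        raise ValueError(f"unknown parameters: {', '.join(unknown)}")

    resolved: dict[str, Any] = {}
    for name, info in spec.items():
        if name in params:
            value = params[name]
            if not isinstance(value, info["type"]):
                raise TypeError(f"{name} must be of type {info['type'].__name__}")
            resolved[name] = value
        elif info.get("required", False):
            raise ValueError(f"missing required parameter: {name}")
        else:
            # Optional and not supplied: fall back to the declared default.
            resolved[name] = info.get("default")
    return resolved


# Same shape as the dict returned by accepted_params() in the docs example;
# "max_depth" is a made-up second entry to show default handling.
SPEC = {
    "lang": {
        "type": str, "default": None,
        "description": "Language code.", "required": True,
    },
    "max_depth": {
        "type": int, "default": 50,
        "description": "Hypothetical depth limit.", "required": False,
    },
}
```

Calling `resolve_params(SPEC, {"lang": "en"})` yields `{"lang": "en", "max_depth": 50}`, while omitting `lang` raises `ValueError`. Declaring parameters as data also lets a CLI generate its flags from the same dictionary, which fits the note about `--help` listing the flags injected by the active plugin.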

{hyperbase-0.9.0 → hyperbase-0.10.0}/docs/manual/readers.md RENAMED

@@ -70,7 +70,7 @@ hyperbase read article.txt -o output.txt
 hyperbase read https://en.wikipedia.org/wiki/Hypergraph -o output.jsonl
 # Specify reader and parser explicitly
-hyperbase read source.txt -o output.jsonl --reader plain_text --parser alphabeta --language en
+hyperbase read source.txt -o output.jsonl --reader plain_text --parser alphabeta --lang en
 ```
 ## Built-in readers

{hyperbase-0.9.0 → hyperbase-0.10.0}/pyproject.toml RENAMED

@@ -74,7 +74,7 @@ target-version = "py310"
 select = ["E", "F", "W", "I", "UP", "B", "SIM", "RUF", "Q", "C4", "PT", "N", "ANN"]
 [tool.ruff.lint.per-file-ignores]
-"tests/*" = ["E501", "ANN201", "D100", "D101", "D102", "D400", "D415"]
+"tests/*" = ["ANN001", "ANN003", "ANN201", "ANN202", "ANN204", "ANN205", "D100", "D101", "D102", "D400", "D415", "E501", "N802", "PT011"]
 [tool.ruff.lint.flake8-quotes]
 inline-quotes = "double"

{hyperbase-0.9.0 → hyperbase-0.10.0}/src/hyperbase/__init__.py RENAMED

@@ -1,4 +1,4 @@
-from hyperbase.hyperedge import hedge
+from hyperbase.builders import hedge
 from hyperbase.loaders import load_edges
 from hyperbase.parsers import get_parser

hyperbase-0.10.0/src/hyperbase/builders.py ADDED

@@ -0,0 +1,187 @@
+from __future__ import annotations
+from collections.abc import Iterable
+from typing import Any, cast
+from hyperbase.constants import ATOM_ENCODE_TABLE
+from hyperbase.hyperedge import Atom, Hyperedge, UniqueAtom
+from hyperbase.parsers.parse_result import ParseResult
+def str_to_atom(s: str) -> str:
+    """Converts a string into a valid atom."""
+    return s.lower().translate(ATOM_ENCODE_TABLE)
+def _edge_str_has_outer_parens(edge_str: str) -> bool:
+    """Check if string representation of edge is delimited by outer
+    parenthesis.
+    """
+    if len(edge_str) < 2:
+        return False
+    return edge_str[0] == "("
+def split_edge_str(edge_str: str) -> tuple[str, ...]:
+    """Shallow split into tokens of a string representation of an edge,
+    without outer parenthesis.
+    """
+    start = 0
+    depth = 0
+    str_length = len(edge_str)
+    active = 0
+    tokens: list[str] = []
+    for i in range(str_length):
+        c = edge_str[i]
+        if c == " ":
+            if active and depth == 0:
+                tokens.append(edge_str[start:i])
+                active = 0
+        elif c == "(":
+            if depth == 0:
+                active = 1
+                start = i
+            depth += 1
+        elif c == ")":
+            depth -= 1
+            if depth == 0:
+                tokens.append(edge_str[start : i + 1])
+                active = 0
+            elif depth < 0:
+                raise ValueError(f"Unbalanced parenthesis in edge string: '{edge_str}'")
+        else:
+            if not active:
+                active = 1
+                start = i
+    if active:
+        if depth > 0:
+            raise ValueError(f"Unbalanced parenthesis in edge string: '{edge_str}'")
+        else:
+            tokens.append(edge_str[start:])
+    return tuple(tokens)
+def _hedge_from_str(source: str) -> Hyperedge:
+    """Iteratively parse an edge string into a Hyperedge.
+    Uses an explicit stack rather than recursion so that pathologically
+    nested edge strings cannot exhaust Python's call stack. Each frame in
+    the stack represents one open ``(...)`` group being assembled and
+    holds: ``[parens_flag, tokens, next_token_index, children_built]``.
+    """
+    edge_str = source.strip().replace("\n", " ")
+    parens = _edge_str_has_outer_parens(edge_str)
+    inner = edge_str[1:-1] if parens else edge_str
+    tokens = split_edge_str(inner)
+    if not tokens:
+        raise ValueError(f"Edge string is empty: '{source}'")
+    stack: list[list[Any]] = [[parens, tokens, 0, []]]
+    final: Hyperedge | None = None
+    while stack:
+        frame = stack[-1]
+        if frame[2] >= len(frame[1]):
+            # All tokens for this frame consumed; build the edge.
+            children: list[Hyperedge] = frame[3]
+            frame_parens: bool = frame[0]
+            if len(children) == 1 and isinstance(children[0], Atom):
+                built: Hyperedge = Atom(str(children[0]), frame_parens)
+            elif children:
+                built = Hyperedge(tuple(children))
+            else:
+                # Unreachable: empty token lists are rejected before push,
+                # but keep the guard for defensiveness.
+                raise ValueError(f"Edge string is empty: '{source}'")
+            stack.pop()
+            if stack:
+                stack[-1][3].append(built)
+            else:
+                final = built
+            continue
+        token = frame[1][frame[2]]
+        frame[2] += 1
+        if _edge_str_has_outer_parens(token):
+            inner_tok = token[1:-1]
+            sub_tokens = split_edge_str(inner_tok)
+            if not sub_tokens:
+                raise ValueError(f"Edge string is empty: '{token}'")
+            stack.append([True, sub_tokens, 0, []])
+        else:
+            frame[3].append(Atom(token))
+    assert final is not None  # loop guarantees this
+    return final
+def _collect_positions(tok_pos: Hyperedge) -> list[int]:
+    """Collect all valid (>= 0) token positions from a tok_pos tree."""
+    if tok_pos.atom:
+        pos = int(str(tok_pos))
+        return [pos] if pos >= 0 else []
+    else:
+        positions: list[int] = []
+        for sub in tok_pos:
+            positions.extend(_collect_positions(sub))
+        return positions
+def _rebuild_with_text(
+    edge: Hyperedge,
+    tok_pos: Hyperedge,
+    tokens: list[str],
+) -> Hyperedge:
+    """Recursively rebuild an edge, assigning text from tokens and tok_pos."""
+    if edge.atom:
+        atom = cast(Atom, edge)
+        pos = int(str(tok_pos))
+        text = tokens[pos] if pos >= 0 else None
+        return Atom(str(atom), atom.parens, text=text)
+    else:
+        new_children = tuple(
+            _rebuild_with_text(sub_edge, sub_tok_pos, tokens)
+            for sub_edge, sub_tok_pos in zip(edge, tok_pos, strict=False)
+        )
+        positions = _collect_positions(tok_pos)
+        if positions:
+            min_pos = min(positions)
+            max_pos = max(positions)
+            text = " ".join(tokens[min_pos : max_pos + 1])
+        else:
+            text = None
+        return Hyperedge(new_children, text=text)
+def hedge(
+    source: str | Hyperedge | list | tuple | ParseResult,
+) -> Hyperedge:
+    """Create a hyperedge."""
+    if isinstance(source, ParseResult):
+        _source = source
+        edge = _rebuild_with_text(_source.edge, _source.tok_pos, _source.tokens)
+        object.__setattr__(edge, "text", _source.text)
+        return edge
+    if type(source) in {tuple, list}:
+        _source = cast(Iterable, source)
+        return Hyperedge(tuple(hedge(item) for item in _source))
+    elif type(source) is str:
+        return _hedge_from_str(source)
+    elif type(source) in {Hyperedge, Atom, UniqueAtom}:
+        return source  # type: ignore
+    else:
+        raise TypeError(
+            f"Cannot create hyperedge from {type(source).__name__}: {source!r}"
+        )
+def build_atom(text: str, *parts: str) -> Atom:
+    """Build an atom from text and other parts."""
+    atom = str_to_atom(text)
+    parts_str = "/".join([part for part in parts if part])
+    atom_str = atom
+    if len(parts_str) > 0:
+        atom_str = "".join((atom, "/", parts_str))
+    return Atom(atom_str)
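
The docstring of `_hedge_from_str` above explains that an explicit stack is used so that deeply nested edge strings cannot exhaust Python's call stack. The following standalone sketch illustrates that idea without depending on hyperbase: it parses a parenthesised string into nested Python lists rather than `Hyperedge` objects, and it tokenizes differently from `split_edge_str`. `parse_nested` and `depth` are hypothetical helpers written for this illustration.

```python
from __future__ import annotations


def parse_nested(source: str) -> list | str:
    """Parse '(a (b c))' into nested lists using an explicit stack."""
    tokens = source.replace("(", " ( ").replace(")", " ) ").split()
    stack: list[list] = [[]]  # stack[0] collects the top-level items
    for tok in tokens:
        if tok == "(":
            stack.append([])  # open a new group
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced parenthesis")
            group = stack.pop()  # close the innermost group
            if not group:
                raise ValueError("empty group")
            stack[-1].append(group)
        else:
            stack[-1].append(tok)  # plain token (an atom string)
    if len(stack) != 1:
        raise ValueError("unbalanced parenthesis")
    top = stack[0]
    if not top:
        raise ValueError("empty input")
    return top[0] if len(top) == 1 else top


def depth(node: list | str) -> int:
    """Nesting depth of a parsed structure, also computed without recursion."""
    best, todo = 0, [(node, 0)]
    while todo:
        current, d = todo.pop()
        if isinstance(current, list):
            best = max(best, d + 1)
            todo.extend((child, d + 1) for child in current)
    return best
```

For example, `parse_nested("(is/P (the/M sky/C) blue/C)")` returns `["is/P", ["the/M", "sky/C"], "blue/C"]`. An input with 5000 levels of nesting parses without a `RecursionError`, because the only structure that grows with nesting depth is the `stack` list on the heap, not the interpreter's call stack.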
