relationalai 0.12.7__py3-none-any.whl → 0.12.9__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- relationalai/clients/snowflake.py +37 -5
- relationalai/clients/use_index_poller.py +11 -1
- relationalai/semantics/internal/internal.py +29 -7
- relationalai/semantics/lqp/compiler.py +1 -1
- relationalai/semantics/lqp/constructors.py +6 -0
- relationalai/semantics/lqp/executor.py +23 -38
- relationalai/semantics/lqp/intrinsics.py +4 -3
- relationalai/semantics/lqp/model2lqp.py +6 -12
- relationalai/semantics/lqp/passes.py +4 -2
- relationalai/semantics/lqp/rewrite/__init__.py +2 -1
- relationalai/semantics/lqp/rewrite/function_annotations.py +91 -56
- relationalai/semantics/lqp/rewrite/functional_dependencies.py +282 -0
- relationalai/semantics/metamodel/builtins.py +6 -0
- relationalai/semantics/metamodel/rewrite/__init__.py +2 -1
- relationalai/semantics/metamodel/rewrite/dnf_union_splitter.py +1 -1
- relationalai/semantics/metamodel/rewrite/extract_nested_logicals.py +9 -9
- relationalai/semantics/metamodel/rewrite/flatten.py +18 -149
- relationalai/semantics/metamodel/rewrite/format_outputs.py +165 -0
- relationalai/semantics/reasoners/graph/core.py +98 -70
- relationalai/semantics/reasoners/optimization/__init__.py +55 -10
- relationalai/semantics/reasoners/optimization/common.py +63 -8
- relationalai/semantics/reasoners/optimization/solvers_dev.py +39 -33
- relationalai/semantics/reasoners/optimization/solvers_pb.py +1033 -385
- relationalai/semantics/rel/compiler.py +21 -2
- relationalai/semantics/tests/test_snapshot_abstract.py +3 -0
- relationalai/tools/cli.py +10 -0
- relationalai/tools/cli_controls.py +15 -0
- relationalai/util/otel_handler.py +10 -4
- {relationalai-0.12.7.dist-info → relationalai-0.12.9.dist-info}/METADATA +1 -1
- {relationalai-0.12.7.dist-info → relationalai-0.12.9.dist-info}/RECORD +33 -31
- {relationalai-0.12.7.dist-info → relationalai-0.12.9.dist-info}/WHEEL +0 -0
- {relationalai-0.12.7.dist-info → relationalai-0.12.9.dist-info}/entry_points.txt +0 -0
- {relationalai-0.12.7.dist-info → relationalai-0.12.9.dist-info}/licenses/LICENSE +0 -0
@@ -0,0 +1,282 @@
+from __future__ import annotations
+from typing import Optional, Sequence
+from relationalai.semantics.internal import internal
+from relationalai.semantics.metamodel.ir import (
+    Require, Logical, Var, Relation, Lookup, ScalarType
+)
+from relationalai.semantics.metamodel import builtins
+
+
+"""
+Helper functions for converting `Require` nodes with unique constraints to functional
+dependencies. The main functionalities provided are:
+1. Check whether a `Require` node is a valid unique constraint representation
+2. Represent the uniqueness constraint as a functional dependency
+3. Check if the functional dependency is structural, i.e., can be represented with a
+   `@function(k)` annotation on a single relation.
+
+=========================== Structure of unique constraints ================================
+A `Require` node represents a _unique constraint_ if it meets the following criteria:
+* the `Require` node's `domain` is an empty `Logical` node
+* the `Require` node's `checks` has a single `Check` node
+* the single `Check` node has a `Logical` task that is a list of `Lookup` tasks
+* precisely one `Lookup` task in the `Check` uses the `unique` builtin relation name
+* the `unique` lookup has precisely one argument, which is a `TupleArg` or a `tuple`
+  containing at least one `Var`
+* all `Lookup` nodes use variables only (no constants)
+* the variables used in the `unique` lookup are a subset of the variables used in the
+  other lookups
+============================================================================================
+
+We use the following unique constraint as the running example.
+
+```
+Require
+  domain
+    Logical
+  checks:
+    Check
+      check:
+        Logical
+          Person(person::Person)
+          first_name(person::Person, firstname::String)
+          last_name(person::Person, lastname::String)
+          unique((firstname::String, lastname::String))
+      error:
+        ...
+```
+
+=========================== Semantics of unique constraints ================================
+A unique constraint states that the columns declared in the `unique` predicate must be
+unique in the result of the conjunctive query consisting of all remaining predicates.
+============================================================================================
+
+In the running example, the conjunctive query computes a table with 3 columns: the person id
+`person::Person`, the first name `firstname::String`, and the last name `lastname::String`.
+The uniqueness predicate `unique((firstname::String, lastname::String))` states that no two
+persons can share the same combination of first and last name.
+
+The unique constraint in the running example above corresponds to the following functional
+dependency.
+
+```
+Person(x) ∧ first_name(x, y) ∧ last_name(x, z): {y, z} -> {x}
+```
+
+------------------------------ Redundant Type Atoms ----------------------------------------
+At the time of writing, PyRel does not yet remove redundant unary atoms. For instance, in
+the running example, the atom `Person(person::Person)` is redundant because the type of the
+`person` variable is specified in the other two atoms `first_name` and `last_name`.
+Consequently, we identify redundant atoms and remove them from the definition of the
+corresponding functional dependency.
+
+Formally, a _guard_ atom is any `Lookup` node whose relation name is not `unique`. Now, a
+unary guard atom `T(x::T)` is _redundant_ if the uniqueness constraint has a non-unary guard
+atom `R(...,x::T,...)`.
+
+================================ Normalized FDs ============================================
+Now, the _(normalized) functional dependency_ corresponding to a unique constraint is an
+object of the form `φ: X → Y`, where:
+1. `φ` is the set of all non-redundant guard atoms
+2. `X` is the set of variables used in the `unique` atom
+3. `Y` is the set of all other variables used in the constraint
+============================================================================================
+
+The normalized functional dependency corresponding to the unique constraint from the running
+example is:
+```
+first_name(person::Person, firstname::String) ∧ last_name(person::Person, lastname::String): {firstname::String, lastname::String} -> {person::Person}
+```
+Note that the unary atom `Person(person::Person)` is redundant and thus omitted from the
+functional dependency.
+
+Some simple functional dependencies can, however, be expressed directly with the `@function(k)`
+annotation on a single relation. Specifically, a functional dependency `φ: X → Y` is
+_structural_ if `φ` consists of a single atom `R(x1,...,xm,y1,...,yk)` and `X = {x1,...,xm}`.
+"""
+
+#
+# Checks whether an input `Require` node is a valid unique constraint. Returns `None` if not.
+# If it is, we return the decomposition of the unique constraint as a tuple
+# `(all_vars, unique_vars, guard)`, where
+# - `all_vars` is the list of all variables used in the constraint
+# - `unique_vars` is the list of variables used in the `unique` atom
+# - `guard` is the list of all other `Lookup` atoms
+#
+def _split_unique_require_node(node: Require) -> Optional[tuple[list[Var], list[Var], list[Lookup]]]:
+    if not isinstance(node.domain, Logical):
+        return None
+    if len(node.domain.body) != 0:
+        return None
+    if len(node.checks) != 1:
+        return None
+    check = node.checks[0]
+    if not isinstance(check.check, Logical):
+        return None
+
+    unique_atom: Optional[Lookup] = None
+    guard: list[Lookup] = []
+    for task in check.check.body:
+        if not isinstance(task, Lookup):
+            return None
+        if task.relation.name == builtins.unique.name:
+            if unique_atom is not None:
+                return None
+            unique_atom = task
+        else:
+            guard.append(task)
+
+    if unique_atom is None:
+        return None
+
+    # collect variables
+    all_vars: set[Var] = set()
+    for lookup in guard:
+        for arg in lookup.args:
+            if not isinstance(arg, Var):
+                return None
+            all_vars.add(arg)
+
+    unique_vars: set[Var] = set()
+    if len(unique_atom.args) != 1:
+        return None
+    if not isinstance(unique_atom.args[0], (internal.TupleArg, tuple)):
+        return None
+    if len(unique_atom.args[0]) == 0:
+        return None
+    for arg in unique_atom.args[0]:
+        if not isinstance(arg, Var):
+            return None
+        unique_vars.add(arg)
+
+    # check that unique vars are a subset of the guard vars
+    if not unique_vars.issubset(all_vars):
+        return None
+
+    return list(all_vars), list(unique_vars), guard
+
+
+def is_valid_unique_constraint(node: Require) -> bool:
+    """
+    Checks whether the input `Require` node is a valid unique constraint. See description at
+    the top of the file for details.
+    """
+    return _split_unique_require_node(node) is not None
+
+#
+# A unary guard atom `T(x::T)` is redundant if the constraint contains a non-unary atom
+# `R(...,x::T,...)`. We discard all redundant guard atoms in the constructed fd.
+#
+def normalized_fd(node: Require) -> Optional[FunctionalDependency]:
+    """
+    If the input `Require` node is a uniqueness constraint, constructs its reduced
+    functional dependency `φ: X -> Y`, where `φ` contains all non-redundant guard atoms,
+    `X` are the variables used in the `unique` atom, and `Y` are the remaining variables.
+    Returns `None` if the input node is not a valid uniqueness constraint.
+    """
+    parts = _split_unique_require_node(node)
+    if parts is None:
+        return None
+    all_vars, unique_vars, guard_atoms = parts
+
+    # remove redundant lookups
+    redundant_guard_atoms: list[Lookup] = []
+    for atom in guard_atoms:
+        # the atom is unary A(x::T)
+        if len(atom.args) != 1:
+            continue
+        var = atom.args[0]
+        assert isinstance(var, Var)
+        # T is a scalar type (which includes entity types)
+        var_type = var.type
+        if not isinstance(var_type, ScalarType):
+            continue
+        # the atom is an entity typing T(x::T), i.e., T = A (and hence not a Boolean property)
+        var_type_name = var_type.name
+        rel_name = atom.relation.name
+        if rel_name != var_type_name:
+            continue
+        # found an atom of the form T(x::T);
+        # check if there is another atom R(...,x::T,...)
+        for typed_atom in guard_atoms:
+            if len(typed_atom.args) == 1:
+                continue
+            if var in typed_atom.args:
+                redundant_guard_atoms.append(atom)
+                break
+
+    guard = [atom for atom in guard_atoms if atom not in redundant_guard_atoms]
+    keys = unique_vars
+    values = [v for v in all_vars if v not in keys]
+
+    return FunctionalDependency(guard, keys, values)
+
+class FunctionalDependency:
+    """
+    Represents a functional dependency of the form `φ: X -> Y`, where
+    - `φ` is a set of `Lookup` atoms
+    - `X` and `Y` are disjoint and covering sets of the variables used in `φ`
+    """
+    def __init__(self, guard: Sequence[Lookup], keys: Sequence[Var], values: Sequence[Var]):
+        self.guard = frozenset(guard)
+        self.keys = frozenset(keys)
+        self.values = frozenset(values)
+        assert self.keys.isdisjoint(self.values), "Keys and values must be disjoint"
+
+        # for structural fd check
+        self._is_structural: bool = False
+        self._structural_relation: Optional[Relation] = None
+        self._structural_rank: Optional[int] = None
+
+        self._determine_is_structural()
+
+    # A functional dependency `φ: X → Y` is _k-structural_ if `φ` consists of a single atom
+    # `R(x1,...,xm,y1,...,yk)` and `X = {x1,...,xm}`. Not all functional dependencies are
+    # k-structural. For instance, `R(x, y, z): {y, z} → {x}` cannot be expressed with
+    # `@function`, and neither can `R(x, y) ∧ P(x, z): {x} → {y, z}`.
+    def _determine_is_structural(self):
+        if len(self.guard) != 1:
+            self._is_structural = False
+            return
+        atom = next(iter(self.guard))
+        atom_vars = atom.args
+        if len(atom_vars) <= len(self.keys):  # @function(0) provides no information
+            self._is_structural = False
+            return
+        prefix_vars = atom_vars[:len(self.keys)]
+        if set(prefix_vars) != set(self.keys):
+            self._is_structural = False
+            return
+        self._is_structural = True
+        self._structural_relation = atom.relation
+        self._structural_rank = len(atom_vars) - len(self.keys)
+
+    @property
+    def is_structural(self) -> bool:
+        """
+        Whether the functional dependency is structural, i.e., can be represented
+        with a `@function(k)` annotation on a single relation.
+        """
+        return self._is_structural
+
+    @property
+    def structural_relation(self) -> Relation:
+        """
+        The structural relation of a structural functional dependency. Raises ValueError
+        if the functional dependency is not structural.
+        """
+        if not self._is_structural:
+            raise ValueError("Functional dependency is not structural")
+        assert self._structural_relation is not None
+        return self._structural_relation
+
+    @property
+    def structural_rank(self) -> int:
+        """
+        The structural rank k of a k-structural fd. Raises ValueError if the functional
+        dependency is not structural.
+        """
+        if not self._is_structural:
+            raise ValueError("Functional dependency is not structural")
+        assert self._structural_rank is not None
+        return self._structural_rank
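Taken together, these helpers let a compiler pass map a unique constraint onto a `@function(k)` annotation whenever the normalized dependency is structural. A minimal sketch of that flow, assuming `req` is a `Require` node taken from the model IR (the function name and variable are hypothetical, not part of this diff):

```
from relationalai.semantics.lqp.rewrite.functional_dependencies import normalized_fd

def structural_annotation_target(req):
    # None unless `req` matches the unique-constraint shape documented above
    fd = normalized_fd(req)
    if fd is None or not fd.is_structural:
        return None
    # A structural fd `R(x1,...,xm,y1,...,yk): {x1,...,xm} -> {y1,...,yk}`
    # corresponds to @function(k) on R, so the relation and rank suffice.
    return fd.structural_relation, fd.structural_rank
```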
@@ -495,6 +495,11 @@ output_keys_annotation = f.annotation(output_keys, [])
 function = f.relation("function", [f.input_field("code", types.Symbol)])
 function_checked_annotation = f.annotation(function, [f.lit("checked")])
 function_annotation = f.annotation(function, [])
+function_ranked = f.relation("function", [f.input_field("code", types.Symbol), f.input_field("rank", types.Int64)])
+def function_ranked_checked_annotation(k: int) -> ir.Annotation:
+    return f.annotation(function_ranked, [f.lit("checked"), f.lit(k)])
+def function_ranked_annotation(k: int) -> ir.Annotation:
+    return f.annotation(function_ranked, [f.lit(k)])
 
 # Indicates this relation should be tracked in telemetry. Supported for Relationships and Concepts.
 # `RAI_BackIR.with_relation_tracking` produces log messages at the start and end of each
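These ranked variants pair naturally with the structural dependencies above: once a k-structural dependency has been found, its rank parameterizes the annotation. A brief sketch (the wiring is illustrative; `fd` is assumed to be a structural `FunctionalDependency`, and the `checked` variant presumably requests that the constraint be verified rather than assumed):

```
k = fd.structural_rank                                     # k of the k-structural fd
anno = builtins.function_ranked_annotation(k)              # plain @function(k)
checked = builtins.function_ranked_checked_annotation(k)   # checked @function(k)
```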
@@ -654,6 +659,7 @@ rel_primitive_solverlib_ho_appl = aggregation("rel_primitive_solverlib_ho_appl",
 ])
 implies = f.relation("implies", [f.input_field("a", types.Bool), f.input_field("b", types.Bool)])
 all_different = aggregation("all_different", [f.input_field("over", types.Any)])
+special_ordered_set_type_2 = aggregation("special_ordered_set_type_2", [f.input_field("rank", types.Any)])
 
 # graph primitive algorithm helpers
 infomap = aggregation("infomap", [
@@ -2,5 +2,6 @@ from .discharge_constraints import DischargeConstraints
 from .dnf_union_splitter import DNFUnionSplitter
 from .extract_nested_logicals import ExtractNestedLogicals
 from .flatten import Flatten
+from .format_outputs import FormatOutputs
 
-__all__ = ["DischargeConstraints", "DNFUnionSplitter", "ExtractNestedLogicals", "Flatten"]
+__all__ = ["DischargeConstraints", "DNFUnionSplitter", "ExtractNestedLogicals", "Flatten", "FormatOutputs"]
@@ -150,7 +150,7 @@ class DNFExtractor(Visitor):
 
         replacement_tasks: list[ir.Task] = []
         for body in replacement_bodies:
-            new_task = f.logical(body)
+            new_task = f.logical(body, node.hoisted)
             replacement_tasks.append(new_task)
         self.replaced_by[node] = replacement_tasks
 
@@ -1,7 +1,7 @@
 from __future__ import annotations
 
 from relationalai.semantics.metamodel import ir, factory as f, helpers
-from relationalai.semantics.metamodel.visitor import Rewriter
+from relationalai.semantics.metamodel.visitor import Rewriter, collect_by_type
 from relationalai.semantics.metamodel.compiler import Pass
 from relationalai.semantics.metamodel.util import OrderedSet, ordered_set, NameCache
 from relationalai.semantics.metamodel import dependency
@@ -48,10 +48,10 @@ class LogicalExtractor(Rewriter):
         # variables (which is currently done by flatten), such as when the parent is a Match
         # or a Union, or if the logical has a Rank.
         if not (
-
+            logical.hoisted and
             not isinstance(parent, (ir.Match, ir.Union)) and
-            all(isinstance(v, ir.Var) for v in
-            not any(isinstance(c, ir.Rank) for c in
+            all(isinstance(v, ir.Var) for v in logical.hoisted) and
+            not any(isinstance(c, ir.Rank) for c in logical.body)
         ):
             return logical
 
@@ -61,11 +61,11 @@ class LogicalExtractor(Rewriter):
 
         # if there are aggregations, make sure we don't expose the projected and input vars,
         # but expose groupbys
-        for
-
-
-
-
+        for agg in collect_by_type(ir.Aggregate, logical):
+            exposed_vars.difference_update(agg.projection)
+            exposed_vars.difference_update(helpers.aggregate_inputs(agg))
+            exposed_vars.update(agg.group)
+
         # add the values (hoisted)
         exposed_vars.update(helpers.hoisted_vars(logical.hoisted))
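The intent of this loop is easiest to see on a small example: if the logical computes something like `total = sum(amount)` grouped by `customer`, then `amount` (an aggregation input and projected variable) must stay internal to the extracted relation, while the group-by `customer` stays in its exposed signature. A plain-Python analogue of the bookkeeping, with hypothetical variable names standing in for IR vars:

```
exposed_vars = {"customer", "amount", "total"}
agg_projection = {"amount"}   # variables projected by the aggregation
agg_inputs = {"amount"}       # variables consumed as aggregation inputs
agg_group = {"customer"}      # group-by variables

exposed_vars.difference_update(agg_projection)  # hide projected vars
exposed_vars.difference_update(agg_inputs)      # hide aggregation inputs
exposed_vars.update(agg_group)                  # but keep group-bys exposed

print(sorted(exposed_vars))  # ['customer', 'total']
```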
@@ -3,12 +3,11 @@ from dataclasses import dataclass
 from typing import cast, Optional, TypeVar
 from typing import Tuple
 
-from relationalai.semantics.metamodel import builtins, ir, factory as f, helpers
+from relationalai.semantics.metamodel import builtins, ir, factory as f, helpers
 from relationalai.semantics.metamodel.compiler import Pass, group_tasks
 from relationalai.semantics.metamodel.util import OrderedSet, ordered_set, NameCache
 from relationalai.semantics.metamodel import dependency
-from relationalai.semantics.metamodel.
-from relationalai.semantics.metamodel.typer.typer import to_type, is_primitive
+from relationalai.semantics.metamodel.typer.typer import to_type
 
 class Flatten(Pass):
     """
@@ -225,15 +224,26 @@ class Flatten(Pass):
             "ranks": ir.Rank,
         })
 
-        #
+        # If there are outputs, flatten each into its own top-level rule, along with its
+        # dependencies.
         if groups["outputs"]:
-            if self._handle_outputs:
-                return self.adjust_outputs(task, body, groups, ctx)
-            else:
-                # When we do not handle outputs. For example, in SQL compiler. We need to leave output as a top-level element.
+            if not self._handle_outputs:
                 ctx.rewrite_ctx.top_level.append(ir.Logical(task.engine, task.hoisted, tuple(body), task.annotations))
                 return Flatten.HandleResult(None)
 
+            # Analyze the dependencies in the newly rewritten body
+            new_logical = ir.Logical(task.engine, task.hoisted, tuple(body))
+            info = dependency.analyze(new_logical)
+
+            for output in groups["outputs"]:
+                assert(isinstance(output, ir.Output))
+                new_body = info.task_dependencies(output)
+                new_body.update(ctx.extra_tasks)
+                new_body.add(output)
+                ctx.rewrite_ctx.top_level.append(ir.Logical(task.engine, task.hoisted, tuple(new_body), task.annotations))
+
+            return Flatten.HandleResult(None)
+
         # if there are updates, extract as a new top level rule
         if groups["updates"]:
             # add task dependencies to the body
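The net effect is that every `Output` now becomes its own top-level rule carrying only the tasks it depends on (plus any `extra_tasks` from the context). A simplified analogue with stand-in strings instead of IR tasks, hypothetical names throughout:

```
deps = {  # what dependency.analyze(...).task_dependencies would report
    "out_names": ["lookup_person", "lookup_name"],
    "out_ages": ["lookup_person", "lookup_age"],
}

top_level = []
for output in ("out_names", "out_ages"):
    new_body = deps[output] + [output]   # dependencies first, then the output
    top_level.append(new_body)

# Two independent top-level rules, mirroring ctx.rewrite_ctx.top_level:
# [['lookup_person', 'lookup_name', 'out_names'],
#  ['lookup_person', 'lookup_age', 'out_ages']]
print(top_level)
```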
@@ -455,147 +465,6 @@ class Flatten(Pass):
             task.annotations
         ))
 
-    #--------------------------------------------------
-    # GNF vs wide output support
-    #--------------------------------------------------
-    def adjust_outputs(self, task: ir.Logical, body: OrderedSet[ir.Task], groups: dict[str, OrderedSet[ir.Task]], ctx: Context):
-
-        # for wide outputs, only adjust the output task to include the keys.
-        if ctx.options.get("wide_outputs", False):
-            for output in groups["outputs"]:
-                assert(isinstance(output, ir.Output))
-                if output.keys:
-                    body.remove(output)
-                    body.add(self.rewrite_wide_output(output))
-            # self.remove_subsumptions(body, ctx)
-            return Flatten.HandleResult(ir.Logical(task.engine, task.hoisted, tuple(body), task.annotations))
-
-        # for GNF outputs we need to generate a rule for each "column" in the output
-        else:
-            # first split outputs in potentially multiple outputs, one for each "column"
-            for output in groups["outputs"]:
-                assert(isinstance(output, ir.Output))
-                if output.keys:
-                    # we will replace the output bellow,
-                    body.remove(output)
-
-                    is_export = builtins.export_annotation in output.annotations
-
-                    # generate an output for each "column"
-                    # output looks like def output(:cols, :col000, key0, key1, value):
-                    original_cols = OrderedSet()
-                    for idx, alias in enumerate(output.aliases):
-                        # skip None values which are used as a placeholder for missing values
-                        if alias[1] is None:
-                            continue
-                        original_cols.add(alias[1])
-                        self._generate_output_column(body, output, idx, alias, is_export)
-
-                    idx = len(output.aliases)
-                    for key in output.keys:
-                        if key not in original_cols:
-                            self._generate_output_column(body, output, idx, (key.name, key), is_export)
-                            idx += 1
-
-            # analyse the resulting logical to be able to pull dependencies
-            logical = ir.Logical(task.engine, task.hoisted, tuple(body), task.annotations)
-            info = dependency.analyze(logical)
-
-            # now extract a logical for each output, bringing together its dependencies
-            for output in filter_by_type(body, ir.Output):
-                deps = info.task_dependencies(output)
-                # TODO: verify safety of doing this
-                # self.remove_subsumptions(deps, ctx)
-
-                deps.add(output)
-                ctx.rewrite_ctx.top_level.append(ir.Logical(task.engine, tuple(), tuple(deps)))
-
-            return Flatten.HandleResult(None)
-
-    def _generate_output_column(self, body: OrderedSet[ir.Task], output: ir.Output, idx: int, alias: tuple[str, ir.Value], is_export: bool):
-        if not output.keys:
-            return output
-
-        aliases = [("cols", f.literal("cols", types.Symbol))] if not is_export else []
-        aliases.append(("col", f.literal(f"col{idx:03}", types.Symbol)))
-
-        for k in output.keys:
-            aliases.append((f"key_{k.name}_{idx}", k))
-
-        if (is_export and
-            isinstance(alias[1], ir.Var) and
-            (not is_primitive(alias[1].type) or alias[1].type == types.Hash)):
-
-            uuid = f.var(f"{alias[0]}_{idx}_uuid", types.String)
-            body.add(f.lookup(builtins.uuid_to_string, [alias[1], uuid]))
-            aliases.append((uuid.name, uuid))
-        else:
-            aliases.append(alias)
-
-        body.add(ir.Output(
-            output.engine,
-            FrozenOrderedSet.from_iterable(aliases),
-            output.keys,
-            output.annotations
-        ))
-
-
-    def remove_subsumptions(self, body:OrderedSet[ir.Task], ctx: Context):
-        # remove from the body all the tasks that are subsumed by some other task in the set;
-        # this can be done because some tasks are references to extracted nested logical that
-        # contain filters they dependend on, so we don't need those filters here if the
-        # reference is present.
-        for logical in filter_by_type(body, ir.Logical):
-            if logical.id in ctx.included:
-                # if the logical id is included, it means it's a reference to an extracted
-                # rule, so remove all other items in the body that are already included in
-                # the body referenced by it
-                for item in body:
-                    if item in ctx.included[logical.id]:
-                        body.remove(item)
-
-
-    def rewrite_wide_output(self, output: ir.Output):
-        assert(output.keys)
-
-        # only append keys that are not already in the output
-        suffix_keys = []
-        for key in output.keys:
-            if all([val is not key for _, val in output.aliases]):
-                suffix_keys.append(key)
-
-        aliases: OrderedSet[Tuple[str, ir.Value]] = ordered_set()
-
-        # add the remaining args, unless it is already a key
-        for name, val in output.aliases:
-            if not isinstance(val, ir.Var) or val not in suffix_keys:
-                aliases.add((name, val))
-
-        # add the keys to the output
-        for key in suffix_keys:
-            aliases.add((key.name, key))
-
-        # TODO - we are assuming that the Rel compiler will translate nullable lookups
-        # properly, returning a `Missing` if necessary, like this:
-        # (nested_192(_adult, _adult_name) or (not nested_192(_adult, _) and _adult_name = Missing)) and
-        return ir.Output(
-            output.engine,
-            aliases.frozen(),
-            output.keys,
-            output.annotations
-        )
-
-    # TODO: in the rel compiler, see if we can do this outer join
-    # 1. number of keys
-    # 2. each relation
-    # 3. each variable, starting with the keys
-    # 4. tag output with @arrow
-
-    # @arrow def output(_book, _book_title, _author_name):
-    #     rel_primitive_outer_join(#1, book_title, author_name, _book, _book_title, _author_name)
-    # def output(p, n, c):
-    #     rel_primitive_outer_join(#1, name, coolness, p, n, c)
-
 #--------------------------------------------------
 # Helpers
 #--------------------------------------------------