PyPI - knowledge-graph-rdbms - Versions diffs - 0.1.2__tar.gz → 0.1.4__tar.gz - Mend

knowledge-graph-rdbms 0.1.2 (tar.gz) → 0.1.4 (tar.gz)


Files changed (52)

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/CLAUDE.md RENAMED

@@ -21,6 +21,8 @@ python bench/charts.py                   # render assets/*.png from bench data (
 python bench/runtimes/compare.py         # CPython vs Node vs Bun SQLite comparison
 kg stats                                 # default ontology (~/.kgrdbms/graph.db)
+kg schema                                # observed vocabulary: kinds, edge types, labels, keys-per-kind
+kg schema --samples                      # + example ids and enum-like property values per kind
 kg ontology list                         # the registry (the "db of dbs")
 kg ontology create coffee --stance inferential   # register a named ontology
 kg --ontology coffee node add drink:latte --kind Drink   # route to it (resolver)
@@ -107,4 +109,5 @@ Node ids follow `prefix:reference` — `person:ada-lovelace`, `company:apple`, `
 - **Properties round-trip as JSON.** Storage is `value_json`; ints/bools/lists/objects come back as their JSON type. CLI `--prop key=value` parses value as JSON when possible, else keeps it as a string.
 - **Per-call writes each commit (one fsync).** Wrapping work in `batch()` / using `add_nodes` / `add_edges` collapses to one transaction (~10× faster). Don't add a per-row commit inside a bulk loop.
 - **Reads are not in `service.py`** by design — callers hit `Graph` directly. Don't route reads through the gate.
+- **`schema()` is the discovery primitive.** It returns the *observed* vocabulary (kinds, edge types, labels, and property keys per kind, with counts; `samples=True` adds example ids + enum-like values) so a consumer can query by real values instead of guessing. It's a read like any other (`GraphBackend` method, Postgres port, stub line), and the MCP server instructions tell models to call `kg_schema` first. The schema is *observed, not enforced* — the graph stays schemaless; this just profiles what's there.
 - **CLI exit codes are contractual:** `0` ok, `1` not found / bad input, `2` policy denial, `3` invariant violation. Preserve these.
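
The `--prop key=value` rule quoted in the context lines above (parse the value as JSON when possible, otherwise keep it as a string) can be sketched as follows. This is a minimal illustration, not the package's code: `parse_prop` is a hypothetical helper name, and the error raised for a malformed argument is an assumption.

```python
import json
from typing import Any


def parse_prop(arg: str) -> tuple[str, Any]:
    """Split 'key=value' and decode the value as JSON when it parses.

    Hypothetical helper: the real CLI may split or validate differently.
    """
    key, sep, raw = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {arg!r}")
    try:
        return key, json.loads(raw)
    except json.JSONDecodeError:
        return key, raw  # not valid JSON: keep the raw text as a string


age = parse_prop("age=36")          # JSON number
role = parse_prop("role=analyst")   # bare word is not JSON, stays a string
tags = parse_prop("tags=[1, 2]")    # JSON list
flag = parse_prop("active=true")    # JSON boolean
```

One consequence of this rule is that a value meant as text but shaped like JSON (for example `zip=02134`, which fails JSON parsing because of the leading zero, versus `zip=2134`, which becomes an integer) can land as different types, so quoting the value as a JSON string is the safer choice when the type matters.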

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: knowledge-graph-rdbms
-Version: 0.1.2
+Version: 0.1.4
 Summary: A label property graph on an RDBMS (SQLite): nodes, typed edges, an append-only event log, and an optional MCP server.
 Project-URL: Homepage, https://github.com/cunicopia-dev/knowledge-graph-rdbms
 Project-URL: Repository, https://github.com/cunicopia-dev/knowledge-graph-rdbms
@@ -34,7 +34,7 @@ Description-Content-Type: text/markdown
 ![Python](https://img.shields.io/badge/python-3.10%2B-3776AB?logo=python&logoColor=white)
 ![License: MIT](https://img.shields.io/badge/license-MIT-green)
 ![core dependencies: 0](https://img.shields.io/badge/core_dependencies-0-success)
-![tests: 79 passing](https://img.shields.io/badge/tests-79_passing-brightgreen)
+![tests: 87 passing](https://img.shields.io/badge/tests-87_passing-brightgreen)
 ![storage: SQLite](https://img.shields.io/badge/storage-SQLite-003B57?logo=sqlite&logoColor=white)
 ![MCP](https://img.shields.io/badge/MCP-ready-FF6F00)
@@ -536,6 +536,8 @@ kg out person:ada                 # outbound edges
 kg path person:ada field:cs       # shortest path
 kg nodes-by-kind Person
 kg stats
+kg schema                         # observed vocabulary — kinds, edge types, labels, keys-per-kind
+kg schema --samples               # + example ids and enum-like property values per kind
 kg --json node get person:ada     # machine-readable output for piping
 kg events -n 10                   # tail the event log
@@ -570,7 +572,8 @@ Or hand-edit a client config (e.g. Claude Desktop):
 { "mcpServers": { "kgrdbms": { "command": "kgrdbms-mcp" } } }
 ```
-It exposes `kg_`-prefixed tools for reads (`kg_node_get`, `kg_nodes_by_kind`,
+It exposes `kg_`-prefixed tools for reads (`kg_schema` — the vocabulary, meant to
+be called first; `kg_node_get`, `kg_nodes_by_kind`,
 `kg_neighborhood`, `kg_shortest_path`, `kg_descendants`, …), gated writes
 (`kg_node_upsert`, `kg_edge_add`, `kg_node_delete`, …), bulk composition
 (`kg_import` — a whole `{nodes, edges}` batch in one call, so an agent populates
@@ -743,6 +746,7 @@ replayable.
 | Command                         | What it does                                  |
 | ------------------------------- | --------------------------------------------- |
 | `kg stats`                      | node/edge counts and db path                  |
+| `kg schema [--samples]`         | observed vocabulary: kinds, edge types, labels, keys-per-kind |
 | `kg node add ID --kind K …`     | create or update a node (gated + logged)      |
 | `kg node get ID`                | fetch a node                                  |
 | `kg node del ID`                | delete a node (cascades edges)                |

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/README.md RENAMED

@@ -3,7 +3,7 @@
 ![Python](https://img.shields.io/badge/python-3.10%2B-3776AB?logo=python&logoColor=white)
 ![License: MIT](https://img.shields.io/badge/license-MIT-green)
 ![core dependencies: 0](https://img.shields.io/badge/core_dependencies-0-success)
-![tests: 79 passing](https://img.shields.io/badge/tests-79_passing-brightgreen)
+![tests: 87 passing](https://img.shields.io/badge/tests-87_passing-brightgreen)
 ![storage: SQLite](https://img.shields.io/badge/storage-SQLite-003B57?logo=sqlite&logoColor=white)
 ![MCP](https://img.shields.io/badge/MCP-ready-FF6F00)
@@ -505,6 +505,8 @@ kg out person:ada                 # outbound edges
 kg path person:ada field:cs       # shortest path
 kg nodes-by-kind Person
 kg stats
+kg schema                         # observed vocabulary — kinds, edge types, labels, keys-per-kind
+kg schema --samples               # + example ids and enum-like property values per kind
 kg --json node get person:ada     # machine-readable output for piping
 kg events -n 10                   # tail the event log
@@ -539,7 +541,8 @@ Or hand-edit a client config (e.g. Claude Desktop):
 { "mcpServers": { "kgrdbms": { "command": "kgrdbms-mcp" } } }
 ```
-It exposes `kg_`-prefixed tools for reads (`kg_node_get`, `kg_nodes_by_kind`,
+It exposes `kg_`-prefixed tools for reads (`kg_schema` — the vocabulary, meant to
+be called first; `kg_node_get`, `kg_nodes_by_kind`,
 `kg_neighborhood`, `kg_shortest_path`, `kg_descendants`, …), gated writes
 (`kg_node_upsert`, `kg_edge_add`, `kg_node_delete`, …), bulk composition
 (`kg_import` — a whole `{nodes, edges}` batch in one call, so an agent populates
@@ -712,6 +715,7 @@ replayable.
 | Command                         | What it does                                  |
 | ------------------------------- | --------------------------------------------- |
 | `kg stats`                      | node/edge counts and db path                  |
+| `kg schema [--samples]`         | observed vocabulary: kinds, edge types, labels, keys-per-kind |
 | `kg node add ID --kind K …`     | create or update a node (gated + logged)      |
 | `kg node get ID`                | fetch a node                                  |
 | `kg node del ID`                | delete a node (cascades edges)                |

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/kgrdbms/__init__.py RENAMED

@@ -12,7 +12,7 @@ A small, dependency-free knowledge-graph core:
 from __future__ import annotations
-__version__ = "0.1.2"
+__version__ = "0.1.4"
 from kgrdbms.graph import Edge, Graph, Node, default_graph_path, slug
 from kgrdbms.events import (

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/kgrdbms/backends/base.py RENAMED

@@ -47,6 +47,7 @@ class GraphBackend(Protocol):
     def count_edges_by_type(self) -> dict[str, int]: ...
     def total_nodes(self) -> int: ...
     def total_edges(self) -> int: ...
+    def schema(self, *, samples: bool = ..., sample_limit: int = ...) -> dict: ...
     # bulk: a context manager that defers commits to one transaction
     def batch(self) -> Any: ...
     def close(self) -> None: ...
@@ -93,6 +94,7 @@ class _StubBackend:
     def count_edges_by_type(self, *a: Any, **k: Any) -> dict[str, int]: return self._todo("count_edges_by_type")
     def total_nodes(self, *a: Any, **k: Any) -> int: return self._todo("total_nodes")
     def total_edges(self, *a: Any, **k: Any) -> int: return self._todo("total_edges")
+    def schema(self, *a: Any, **k: Any) -> dict: return self._todo("schema")
     @contextmanager
     def batch(self) -> Iterator["_StubBackend"]:

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/kgrdbms/backends/postgres.py RENAMED

@@ -450,6 +450,73 @@ class PostgresGraph:
     def total_edges(self) -> int:
         return self.conn.execute("SELECT COUNT(*) AS c FROM edges").fetchone()["c"]
+    def schema(self, *, samples: bool = False, sample_limit: int = 20) -> dict:
+        """Observed schema (kinds, edge types, labels, property keys per kind),
+        mirroring Graph.schema. Pure aggregates over the same five tables; jsonb
+        values arrive already parsed, so no json.loads on the sampling path."""
+        kinds = self.count_nodes_by_kind()
+        edge_types = self.count_edges_by_type()
+        labels = {
+            r["label"]: r["c"]
+            for r in self.conn.execute(
+                "SELECT label, COUNT(*) AS c FROM node_labels GROUP BY label ORDER BY c DESC"
+            ).fetchall()
+        }
+        node_keys_by_kind: dict[str, dict[str, int]] = {}
+        for r in self.conn.execute(
+            "SELECT n.kind AS kind, p.key AS key, COUNT(*) AS c "
+            "FROM node_properties p JOIN nodes n ON n.id = p.node_id "
+            "GROUP BY n.kind, p.key ORDER BY n.kind, c DESC"
+        ).fetchall():
+            node_keys_by_kind.setdefault(r["kind"], {})[r["key"]] = r["c"]
+        edge_keys = {
+            r["key"]: r["c"]
+            for r in self.conn.execute(
+                "SELECT key, COUNT(*) AS c FROM edge_properties GROUP BY key ORDER BY c DESC"
+            ).fetchall()
+        }
+        result: dict[str, Any] = {
+            "nodes_total": self.total_nodes(),
+            "edges_total": self.total_edges(),
+            "kinds": kinds,
+            "edge_types": edge_types,
+            "labels": labels,
+            "node_keys_by_kind": node_keys_by_kind,
+            "edge_keys": edge_keys,
+        }
+        if samples:
+            result["samples"] = self._schema_samples(kinds, node_keys_by_kind, sample_limit)
+        return result
+    def _schema_samples(
+        self, kinds: dict[str, int], node_keys_by_kind: dict[str, dict[str, int]], sample_limit: int
+    ) -> dict[str, dict]:
+        from kgrdbms.graph import _scalar_samples
+        samples: dict[str, dict] = {}
+        for kind in kinds:
+            example_ids = [
+                r["id"]
+                for r in self.conn.execute(
+                    "SELECT id FROM nodes WHERE kind=%s ORDER BY id LIMIT 5", (kind,)
+                ).fetchall()
+            ]
+            values: dict[str, list] = {}
+            for key in node_keys_by_kind.get(kind, {}):
+                rows = self.conn.execute(
+                    "SELECT DISTINCT p.value_json FROM node_properties p "
+                    "JOIN nodes n ON n.id = p.node_id "
+                    "WHERE n.kind=%s AND p.key=%s LIMIT %s",
+                    (kind, key, sample_limit + 1),
+                ).fetchall()
+                if len(rows) > sample_limit:
+                    continue  # open-ended / free-text field — don't enumerate it
+                vals = _scalar_samples(r["value_json"] for r in rows)  # jsonb already parsed
+                if vals is not None:
+                    values[key] = vals
+            samples[kind] = {"example_ids": example_ids, "values": values}
+        return samples
     # ---- hydration (jsonb returns parsed values — no json.loads) --------
     def _hydrate_node(self, row: dict) -> Node:
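
The sampling query above fetches `sample_limit + 1` distinct values so that an open-ended key can be detected without counting every value. A minimal sketch of that probe follows. It runs on SQLite with `?` placeholders rather than Postgres with `%s`, and the single `props` table and the `distinct_or_none` helper are simplified stand-ins invented for illustration.

```python
import sqlite3


def distinct_or_none(conn: sqlite3.Connection, key: str, sample_limit: int):
    """Return the distinct values for `key`, or None if there are more than
    `sample_limit` of them. Asking for one extra row is enough to tell."""
    rows = conn.execute(
        "SELECT DISTINCT value FROM props WHERE key=? LIMIT ?",
        (key, sample_limit + 1),
    ).fetchall()
    if len(rows) > sample_limit:
        return None  # open-ended field: do not enumerate it
    return sorted(r[0] for r in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE props (node_id TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO props VALUES (?, ?, ?)",
    [
        ("m:1", "status", "active"),
        ("m:2", "status", "archived"),
        ("m:3", "status", "active"),
        ("m:1", "note", "first"),
        ("m:2", "note", "second"),
        ("m:3", "note", "third"),
    ],
)

status_values = distinct_or_none(conn, "status", 2)  # two distinct values fit
note_values = distinct_or_none(conn, "note", 2)      # three distinct values overflow
```

The design point is that the database can stop scanning as soon as it has produced `sample_limit + 1` distinct rows, which keeps the probe cheap on keys with very many values.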

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/kgrdbms/cli.py RENAMED

@@ -21,6 +21,7 @@ from __future__ import annotations
 import argparse
 import json
+import sqlite3
 import sys
 from typing import Any
@@ -172,6 +173,43 @@ def cmd_stats(app: App, args) -> int:
     return 0
+def cmd_schema(app: App, args) -> int:
+    """The observed schema: kinds, edge types, labels, and property keys per kind.
+    The map to read *before* querying an unfamiliar ontology — so you query by a
+    kind/label/key that actually exists instead of guessing.
+    """
+    res = app.graph.schema(samples=args.samples)
+    if res.get("ontology") is None and app.ontology:
+        res = {"ontology": app.ontology, **res}
+    lines: list[str] = []
+    if app.ontology:
+        lines.append(f"ontology: {app.ontology}")
+    lines.append(f"nodes: {res['nodes_total']:,}   edges: {res['edges_total']:,}")
+    lines.append("")
+    lines.append("kinds (node count) and their property keys:")
+    for kind, n in res["kinds"].items():
+        keys = res["node_keys_by_kind"].get(kind, {})
+        keystr = ", ".join(f"{k}×{c}" for k, c in keys.items()) or "(no properties)"
+        lines.append(f"  {kind}  ×{n}")
+        lines.append(f"      keys: {keystr}")
+        if args.samples:
+            samp = res.get("samples", {}).get(kind, {})
+            ex = ", ".join(samp.get("example_ids", []))
+            if ex:
+                lines.append(f"      e.g. {ex}")
+            for k, vals in samp.get("values", {}).items():
+                lines.append(f"      {k} ∈ {{{', '.join(str(v) for v in vals)}}}")
+    lines.append("")
+    lines.append("edge types: " + (", ".join(f"{t}×{c}" for t, c in res["edge_types"].items()) or "(none)"))
+    lines.append("labels: " + (", ".join(f"{l}×{c}" for l, c in res["labels"].items()) or "(none)"))
+    if res["edge_keys"]:
+        lines.append("edge keys: " + ", ".join(f"{k}×{c}" for k, c in res["edge_keys"].items()))
+    app.emit(res, "\n".join(lines))
+    return 0
 # ---- registry handlers (the control plane / db-of-dbs) ---------------
@@ -431,6 +469,12 @@ def build_parser() -> argparse.ArgumentParser:
     sp = sub.add_parser("stats", help="node/edge counts, db path, active ontology")
     sp.set_defaults(func=cmd_stats)
+    sp = sub.add_parser("schema", help="observed schema: kinds, edge types, labels, "
+                                       "property keys per kind (read this before querying)")
+    sp.add_argument("--samples", action="store_true",
+                    help="also show example node ids and enum-like property values per kind")
+    sp.set_defaults(func=cmd_schema)
     # ---- ontology registry (the db-of-dbs) ----
     ont = sub.add_parser("ontology", help="manage the ontology registry").add_subparsers(dest="action", required=True)
     a = ont.add_parser("list", help="list registered ontologies")
@@ -600,6 +644,11 @@ def main(argv: list[str] | None = None) -> int:
     except (KeyError, ValueError, FileNotFoundError) as e:
         print(f"error: {e}", file=sys.stderr)
         return 1
+    except sqlite3.IntegrityError as e:
+        # Safety net for any FK/constraint path not pre-checked in service.py
+        # (e.g. restoring a deleted node whose edge endpoint is since gone).
+        print(f"error: {e}", file=sys.stderr)
+        return 1
     finally:
         app.close()
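
The new `except sqlite3.IntegrityError` branch turns a foreign-key failure into exit code `1` instead of a traceback. The sketch below reproduces that pattern in isolation. The two-table schema and the `run` helper are assumptions made for the example and are much smaller than the package's real tables.

```python
import sqlite3
import sys


def run(from_id: str, to_id: str) -> int:
    """Insert one edge and return a CLI-style exit code (0 ok, 1 bad input)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE edges ("
        " from_id TEXT NOT NULL REFERENCES nodes(id),"
        " to_id TEXT NOT NULL REFERENCES nodes(id))"
    )
    conn.execute("INSERT INTO nodes VALUES ('x:1')")
    try:
        conn.execute("INSERT INTO edges VALUES (?, ?)", (from_id, to_id))
        conn.commit()
        return 0
    except sqlite3.IntegrityError as e:
        # Same shape as the safety net in main(): report on stderr, exit 1.
        print(f"error: {e}", file=sys.stderr)
        return 1
    finally:
        conn.close()


ok_code = run("x:1", "x:1")       # both endpoints exist
missing_code = run("x:1", "y:1")  # 'y:1' is not a node
```

The raw SQLite message only says that a constraint failed, which is presumably why the diff also adds explicit existence checks in `service.py` that name the missing node.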

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/kgrdbms/events.py RENAMED

@@ -53,6 +53,7 @@ OP_NODE_DEL_PROPERTY = "NODE_DEL_PROPERTY"
 OP_EDGE_ADD = "EDGE_ADD"
 OP_EDGE_REMOVE = "EDGE_REMOVE"
 OP_RESTORE = "RESTORE"        # re-create a captured node + its edges (used to undo a delete)
+OP_NODE_RESTORE_STATE = "NODE_RESTORE_STATE"  # exact-restore a node to a prior snapshot (undo of an upsert)
 OP_BATCH = "BATCH"            # add many nodes + edges in one event
 OP_GENESIS = "GENESIS"
@@ -226,7 +227,11 @@ class EventLog:
             prior = p.get("prior")
             if prior is None:
                 return OP_NODE_DELETE, {"node": p["after"], "edges": []}
-            return OP_NODE_UPSERT, {"after": prior, "prior": p["after"]}
+            # A plain re-upsert of `prior` would MERGE, so it cannot remove the
+            # labels/properties this upsert *added* — leaving a non-inverse. The
+            # exact-restore op carries the prior snapshot plus the `added` delta,
+            # so it can strip those additions and rebuild `prior` precisely.
+            return OP_NODE_RESTORE_STATE, {"node": prior, "added": p["after"]}
         if op == OP_NODE_DELETE:
             return OP_RESTORE, {"node": p["node"], "edges": p.get("edges", [])}
         if op == OP_NODE_SET_LABEL:
@@ -261,6 +266,19 @@ def apply_event(graph: "Graph", ev: GraphEvent) -> None:
                        labels=spec.get("labels", []), properties=spec.get("properties", {}))
         for e in p.get("edges", []):
             graph.add_edge(e["from"], e["to"], e["type"], e.get("properties", {}))
+    elif op == OP_NODE_RESTORE_STATE:
+        # Exact-restore a node to `node`, removing the labels/properties that the
+        # reverted upsert added (`added` is that upsert's delta). add_node is a
+        # merge, so we must strip the additions first, then rebuild the snapshot.
+        target = p["node"]
+        added = p.get("added", {})
+        nid = target["id"]
+        for label in set(added.get("labels", [])) - set(target.get("labels", [])):
+            graph.remove_label(nid, label)
+        for key in set(added.get("properties", {}).keys()) - set(target.get("properties", {}).keys()):
+            graph.del_property(nid, key)
+        graph.add_node(nid, target["kind"], target["name"],
+                       labels=target.get("labels", []), properties=target.get("properties", {}))
     elif op == OP_NODE_SET_LABEL:
         graph.add_label(p["id"], p["label"])
     elif op == OP_NODE_REMOVE_LABEL:
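
The comments above explain why re-upserting the prior snapshot is not a true inverse when upsert has merge semantics. The sketch below shows the difference on plain dictionaries. `merge_upsert` and `restore_state` are hypothetical stand-ins for `Graph.add_node` and the `NODE_RESTORE_STATE` handler, and the node shape is reduced to labels and properties.

```python
import copy


def merge_upsert(node: dict, spec: dict) -> None:
    """Merge semantics: union the labels, add or overwrite properties, remove nothing."""
    node["labels"] = sorted(set(node["labels"]) | set(spec.get("labels", [])))
    node["properties"].update(spec.get("properties", {}))


def restore_state(node: dict, target: dict, added: dict) -> None:
    """Strip what the reverted upsert added, then rebuild the target snapshot."""
    extra_labels = set(added.get("labels", [])) - set(target.get("labels", []))
    node["labels"] = [label for label in node["labels"] if label not in extra_labels]
    for key in set(added.get("properties", {})) - set(target.get("properties", {})):
        node["properties"].pop(key, None)
    merge_upsert(node, target)


prior = {"labels": ["Draft"], "properties": {"role": "analyst"}}
after = {"labels": ["Draft", "Reviewed"], "properties": {"role": "lead", "team": "ops"}}

# Naive undo: apply the upsert, then re-upsert the prior snapshot.
naive = copy.deepcopy(prior)
merge_upsert(naive, after)
merge_upsert(naive, prior)   # 'Reviewed' and 'team' survive the merge

# Exact undo: apply the upsert, then restore with the added delta.
exact = copy.deepcopy(prior)
merge_upsert(exact, after)
restore_state(exact, prior, after)
```

In the naive path the overwritten `role` is repaired but the added label and key are left behind, which is the non-inverse the new op is meant to close.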

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/kgrdbms/graph.py RENAMED

@@ -187,6 +187,26 @@ def _normalize_edge(spec: "Edge | dict | tuple | list") -> tuple[str, str, str,
     raise TypeError("edge spec must be an Edge, dict, or (from, to, type[, properties]) tuple")
+def _scalar_samples(values: Iterable[Any], *, max_str: int = 80) -> list | None:
+    """Bounded distinct scalar values for schema sampling.
+    Returns the values sorted, or None to signal "not an enumerable vocabulary"
+    — i.e. some value is a list/object or an over-long string, so showing it as a
+    closed set would mislead. Keeps `schema(samples=True)` from dumping free-text.
+    """
+    out: list = []
+    for v in values:
+        if isinstance(v, str):
+            if len(v) > max_str:
+                return None
+            out.append(v)
+        elif isinstance(v, (int, float)):  # bool is a subclass of int — included
+            out.append(v)
+        else:  # list / dict / None
+            return None
+    return sorted(out, key=str) if out else None
 # ---- graph ----------------------------------------------------------
@@ -622,6 +642,88 @@ class Graph:
     def total_edges(self) -> int:
         return self.conn.execute("SELECT COUNT(*) AS c FROM edges").fetchone()["c"]
+    # ---- schema (observed TBox: the map of what's in the graph) ----
+    def schema(self, *, samples: bool = False, sample_limit: int = 20) -> dict:
+        """The observed schema of the graph — the vocabulary needed to query it
+        without guessing.
+        Returns kinds, edge types, labels, and property keys *per kind*, each
+        with counts. This is a profile of what actually occurs (the graph is
+        schemaless — nothing is enforced), the property-graph analogue of an
+        ontology's TBox derived from its ABox.
+        `samples=True` additionally returns, per kind, a few example node ids
+        (revealing the id/CURIE convention) and — for enum-like properties — the
+        bounded set of distinct scalar values a key takes, turning "there is a
+        key `status`" into "status is one of {active, archived}". A key whose
+        distinct values exceed `sample_limit` (a free-text field) is left
+        un-enumerated rather than dumped.
+        Read-only; pure GROUP BY aggregates. The intended first call for any
+        consumer dropped into an unfamiliar ontology.
+        """
+        kinds = self.count_nodes_by_kind()
+        edge_types = self.count_edges_by_type()
+        labels = {
+            r["label"]: r["c"]
+            for r in self.conn.execute(
+                "SELECT label, COUNT(*) AS c FROM node_labels GROUP BY label ORDER BY c DESC"
+            ).fetchall()
+        }
+        node_keys_by_kind: dict[str, dict[str, int]] = {}
+        for r in self.conn.execute(
+            "SELECT n.kind AS kind, p.key AS key, COUNT(*) AS c "
+            "FROM node_properties p JOIN nodes n ON n.id = p.node_id "
+            "GROUP BY n.kind, p.key ORDER BY n.kind, c DESC"
+        ).fetchall():
+            node_keys_by_kind.setdefault(r["kind"], {})[r["key"]] = r["c"]
+        edge_keys = {
+            r["key"]: r["c"]
+            for r in self.conn.execute(
+                "SELECT key, COUNT(*) AS c FROM edge_properties GROUP BY key ORDER BY c DESC"
+            ).fetchall()
+        }
+        result: dict[str, Any] = {
+            "nodes_total": self.total_nodes(),
+            "edges_total": self.total_edges(),
+            "kinds": kinds,
+            "edge_types": edge_types,
+            "labels": labels,
+            "node_keys_by_kind": node_keys_by_kind,
+            "edge_keys": edge_keys,
+        }
+        if samples:
+            result["samples"] = self._schema_samples(kinds, node_keys_by_kind, sample_limit)
+        return result
+    def _schema_samples(
+        self, kinds: dict[str, int], node_keys_by_kind: dict[str, dict[str, int]], sample_limit: int
+    ) -> dict[str, dict]:
+        samples: dict[str, dict] = {}
+        for kind in kinds:
+            example_ids = [
+                r["id"]
+                for r in self.conn.execute(
+                    "SELECT id FROM nodes WHERE kind=? ORDER BY id LIMIT 5", (kind,)
+                ).fetchall()
+            ]
+            values: dict[str, list] = {}
+            for key in node_keys_by_kind.get(kind, {}):
+                rows = self.conn.execute(
+                    "SELECT DISTINCT p.value_json FROM node_properties p "
+                    "JOIN nodes n ON n.id = p.node_id "
+                    "WHERE n.kind=? AND p.key=? LIMIT ?",
+                    (kind, key, sample_limit + 1),
+                ).fetchall()
+                if len(rows) > sample_limit:
+                    continue  # open-ended / free-text field — don't enumerate it
+                vals = _scalar_samples(json.loads(r["value_json"]) for r in rows)
+                if vals is not None:
+                    values[key] = vals
+            samples[kind] = {"example_ids": example_ids, "values": values}
+        return samples
     # ---- hydration -------------------------------------------------
     def _hydrate_node(self, row: sqlite3.Row) -> Node:
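
The keys-per-kind aggregate in `schema()` can be tried against a throwaway SQLite database. In the sketch below the SQL is the query from the hunk above, while the two `CREATE TABLE` statements are a cut-down assumption containing only the columns that query touches. The real tables very likely have more columns and constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows r["kind"] style access, as in the diff
conn.executescript(
    """
    CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
    CREATE TABLE node_properties (node_id TEXT, key TEXT, value_json TEXT);
    """
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [("person:ada", "Person"), ("memory:m1", "Memory"), ("memory:m2", "Memory")],
)
conn.executemany(
    "INSERT INTO node_properties VALUES (?, ?, ?)",
    [
        ("person:ada", "role", '"analyst"'),
        ("memory:m1", "importance", '"high"'),
        ("memory:m2", "importance", '"low"'),
        ("memory:m2", "source", '"chat"'),
    ],
)

# One GROUP BY pass yields a count for every (kind, key) pair that occurs.
node_keys_by_kind: dict[str, dict[str, int]] = {}
for r in conn.execute(
    "SELECT n.kind AS kind, p.key AS key, COUNT(*) AS c "
    "FROM node_properties p JOIN nodes n ON n.id = p.node_id "
    "GROUP BY n.kind, p.key ORDER BY n.kind, c DESC"
).fetchall():
    node_keys_by_kind.setdefault(r["kind"], {})[r["key"]] = r["c"]
```

Comparing a key's count with the node count of its kind shows how consistently the key is used: here `importance` appears on both `Memory` nodes while `source` appears on only one.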

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/kgrdbms/mcp_server.py RENAMED

@@ -21,6 +21,8 @@ Tool surface (all prefixed kg_):
   reads
     kg_stats              — node/edge counts, the db path, the active ontology
+    kg_schema             — observed vocabulary (kinds, edge types, labels, keys);
+                            read this FIRST to query without guessing
     kg_node_get           — fetch a node by id
     kg_nodes_by_kind      — list all nodes of a kind
     kg_nodes_by_label     — list all nodes carrying a label
@@ -90,7 +92,10 @@ mcp = FastMCP(
         "prefixed kg_ read or mutate nodes (id, kind, name, labels, properties) "
         "and typed directed edges. Every tool takes an optional `ontology` name "
         "(omit for the default); use kg_ontologies_list to discover them and "
-        "kg_ontology_create to add one. Writes are gated by compiled-in "
+        "kg_ontology_create to add one. When working with an ontology whose "
+        "contents you don't already know, call kg_schema FIRST — it returns the "
+        "exact kinds, edge types, labels, and property keys so you can query by "
+        "real values instead of guessing. Writes are gated by compiled-in "
         "invariants and a configurable policy, and recorded to an append-only, "
         "replayable event log."
     ),
@@ -197,6 +202,26 @@ def kg_stats(ontology: str | None = None) -> dict:
     }
+@mcp.tool()
+def kg_schema(samples: bool = False, ontology: str | None = None) -> dict:
+    """The observed schema of an ontology — CALL THIS FIRST when you don't already
+    know what an ontology contains, before kg_nodes_by_kind / kg_nodes_by_label /
+    kg_node_get. It tells you the exact vocabulary so you never have to guess.
+    Returns:
+      - kinds            — every node `kind` and its count
+      - edge_types       — every edge `type` and its count
+      - labels           — every label and its count
+      - node_keys_by_kind— for each kind, which property keys its nodes carry (+counts)
+      - edge_keys        — property keys that appear on edges
+    With samples=True, also returns per kind a few example node ids (showing the
+    id/CURIE convention) and, for enum-like properties, the set of distinct values
+    a key takes (free-text keys are left un-enumerated). Read-only; cheap.
+    """
+    return _bundle(ontology).backend.schema(samples=samples)
 @mcp.tool()
 def kg_node_get(id: str, ontology: str | None = None) -> dict | None:
     """Fetch a single node by id."""

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/kgrdbms/rdf.py RENAMED

@@ -29,6 +29,7 @@ import json
 import re
 from dataclasses import dataclass, field
 from typing import Any, Iterator
+from urllib.parse import quote, unquote
 from kgrdbms.backends.base import GraphBackend
 from kgrdbms.graph import Edge, Node
@@ -78,6 +79,22 @@ XSD = "http://www.w3.org/2001/XMLSchema#"
 KG = "https://kg.local/vocab#"
+def _enc(segment: str) -> str:
+    """Percent-encode an arbitrary string into a valid IRI path/fragment segment.
+    Node references are slug-safe, but `kind`, edge `type`, and property `key`
+    are free-form user text — a space or '%' there would otherwise emit an IRI
+    no conformant RDF store will parse. `safe=""` also encodes '/', so a key like
+    'a/b' can't masquerade as a path boundary. Inverted by `_dec` on import.
+    """
+    return quote(segment, safe="")
+def _dec(segment: str) -> str:
+    """Invert `_enc` — percent-decode an IRI segment back to its stored value."""
+    return unquote(segment)
 # ---- IRI context: the CURIE -> IRI expansion table -------------------
@@ -106,13 +123,13 @@ class IriContext:
             base = self.prefix_bases.get(prefix, f"{self.default_base}{prefix}/")
         else:
             prefix, ref, base = "", node_id, self.default_base
-        return Iri(f"{base}{ref}")
+        return Iri(f"{base}{_enc(ref)}")
     def prop_predicate(self, key: str) -> Iri:
-        return Iri(f"{self.prop_base}{key}")
+        return Iri(f"{self.prop_base}{_enc(key)}")
     def edge_predicate(self, edge_type: str) -> Iri:
-        return Iri(f"{self.edge_base}{edge_type}")
+        return Iri(f"{self.edge_base}{_enc(edge_type)}")
 # ---- value -> literal typing -----------------------------------------
@@ -163,7 +180,7 @@ def node_to_triples(node: Node, ctx: IriContext) -> list[Triple]:
     s = ctx.expand_node(node.id)
     triples: list[Triple] = [
         # kind -> rdf:type, pointing at a class IRI under the kg vocab.
-        (s, Iri(f"{RDF}type"), Iri(f"{KG}{node.kind}")),
+        (s, Iri(f"{RDF}type"), Iri(f"{KG}{_enc(node.kind)}")),
     ]
     if node.name:
         triples.append((s, Iri(f"{KG}name"), Literal(node.name)))
@@ -494,18 +511,19 @@ def contract_iri(iri: str, ctx: IriContext) -> str:
     # Explicit prefix bindings win (longest base first to avoid prefix overlap).
     for prefix, base in sorted(ctx.prefix_bases.items(), key=lambda kv: -len(kv[1])):
         if iri.startswith(base):
-            return f"{prefix}:{iri[len(base):]}"
+            return f"{prefix}:{_dec(iri[len(base):])}"
     if iri.startswith(ctx.default_base):
         rest = iri[len(ctx.default_base):]
         if "/" in rest:
             prefix, ref = rest.split("/", 1)
-            return f"{prefix}:{ref}"
-        return rest
+            return f"{prefix}:{_dec(ref)}"
+        return _dec(rest)
     return iri  # foreign IRI — keep verbatim
 def _local_after(iri: str, base: str) -> str | None:
-    return iri[len(base):] if iri.startswith(base) else None
+    """Strip `base` and percent-decode the remaining segment (kind/type/key)."""
+    return _dec(iri[len(base):]) if iri.startswith(base) else None
 def triples_to_graph(triples: list[Triple], ctx: IriContext | None = None) -> tuple[list[dict], list[dict]]:
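
The `_enc` / `_dec` pair relies on `urllib.parse.quote` with `safe=""` and on `unquote`. A short round-trip sketch follows. `enc` and `dec` mirror the two helpers from the hunk above, and the sample segments are made up for the example.

```python
from urllib.parse import quote, unquote


def enc(segment: str) -> str:
    """Percent-encode everything outside the unreserved set, including '/'."""
    return quote(segment, safe="")


def dec(segment: str) -> str:
    """Invert enc()."""
    return unquote(segment)


segments = ["has part", "a/b", "100%", "café"]
encoded = [enc(s) for s in segments]
decoded = [dec(e) for e in encoded]

# With the default safe="/", a slash would pass through unencoded.
default_safe = quote("a/b")
```

Encoding `%` itself as `%25` is what makes the mapping reversible: a stored key that already contains a percent sign decodes back to the same text.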

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/kgrdbms/service.py RENAMED

@@ -95,6 +95,11 @@ def upsert_node(
 def set_label(graph: Graph, events: EventLog, id: str, label: str, actor: str = "anonymous") -> Node | None:
     guard(graph, _node_ctx(graph, id, "node_set_label"))
+    node = graph.node(id)
+    if node is None:
+        raise ValueError(f"node {id!r} does not exist")
+    if label in node.labels:
+        return node  # already present: a true no-op, so don't log a non-invertible event
     graph.add_label(id, label)
     events.record(actor, OP_NODE_SET_LABEL, {"id": id, "label": label})
     return graph.node(id)
@@ -107,7 +112,9 @@ def set_property(
     ctx.property_key = key
     guard(graph, ctx)
     prior_node = graph.node(id)
-    prior_value = prior_node.properties.get(key, _MISSING) if prior_node else _MISSING
+    if prior_node is None:
+        raise ValueError(f"node {id!r} does not exist")
+    prior_value = prior_node.properties.get(key, _MISSING)
     graph.set_property(id, key, value)
     events.record(actor, OP_NODE_SET_PROPERTY, {"id": id, "key": key, "value": value, "prior": prior_value})
     return graph.node(id)
@@ -139,6 +146,9 @@ def add_edge(
 ) -> Edge:
     ctx = MutationContext(operation="edge_add", edge_type=type, from_node_id=from_id, to_node_id=to_id)
     guard(graph, ctx)
+    for endpoint, role in ((from_id, "from"), (to_id, "to")):
+        if graph.node(endpoint) is None:
+            raise ValueError(f"{role} node {endpoint!r} does not exist")
     edge = graph.add_edge(from_node=from_id, to_node=to_id, type=type, properties=properties or {})
     events.record(actor, OP_EDGE_ADD, {"edge": edge_spec(edge)})
     return edge
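
The `set_label` change does two things: it rejects a missing node with `ValueError`, and it skips logging when the label is already present. A likely reading of the "non-invertible" comment is that the inverse of a set-label event is a remove-label, so reverting a redundant event would strip a label that existed before it. The in-memory sketch below illustrates that behavior. The `nodes` dict, the `log` list, and the boolean return value are simplifications assumed for the example, not the package's types.

```python
nodes: dict[str, set[str]] = {"person:ada": {"Pioneer"}}
log: list[tuple[str, str, str]] = []  # stand-in for the append-only event log


def set_label(node_id: str, label: str) -> bool:
    """Add a label; return True only if the graph actually changed."""
    labels = nodes.get(node_id)
    if labels is None:
        raise ValueError(f"node {node_id!r} does not exist")
    if label in labels:
        return False  # true no-op: nothing changes, nothing is logged
    labels.add(label)
    log.append(("NODE_SET_LABEL", node_id, label))
    return True


changed_again = set_label("person:ada", "Pioneer")      # already present
changed_new = set_label("person:ada", "Mathematician")  # new label, one event

try:
    set_label("ghost:1", "L")
    missing_error = None
except ValueError as e:
    missing_error = str(e)
```

With this rule, every logged set-label event corresponds to a real state change, so replaying or reverting the log does not depend on what the node looked like beforehand.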

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "knowledge-graph-rdbms"
-version = "0.1.2"
+version = "0.1.4"
 description = "A label property graph on an RDBMS (SQLite): nodes, typed edges, an append-only event log, and an optional MCP server."
 readme = "README.md"
 requires-python = ">=3.10"

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/tests/test_cli.py RENAMED

@@ -189,3 +189,45 @@ def test_rdf_export_lossy_reports_dropped(db, capsys):
     assert "rel/influences" in captured.out      # bare edge present
     assert "prop/since" not in captured.out       # property dropped
     assert "dropped" in captured.err              # but loudly, not silently
+# ---- regression: FK violations exit 1 cleanly, no traceback ---------
+def test_set_label_missing_node_exits_1(db, capsys):
+    assert run(db, "node", "add-label", "ghost:1", "L") == 1
+    err = capsys.readouterr().err
+    assert "does not exist" in err and "Traceback" not in err
+def test_set_prop_missing_node_exits_1(db, capsys):
+    assert run(db, "node", "set-prop", "ghost:1", "k", "1") == 1
+    assert "does not exist" in capsys.readouterr().err
+def test_edge_add_missing_endpoint_exits_1(db, capsys):
+    run(db, "node", "add", "x:1", "--kind", "T")
+    capsys.readouterr()
+    assert run(db, "edge", "add", "x:1", "y:1", "LINK") == 1
+    err = capsys.readouterr().err
+    assert "to node 'y:1' does not exist" in err and "Traceback" not in err
+def test_schema_json_lists_kinds_and_keys(db, capsys):
+    run(db, "node", "add", "person:ada", "--kind", "Person", "--prop", "role=analyst")
+    run(db, "node", "add", "memory:m1", "--kind", "Memory", "--prop", "importance=high")
+    capsys.readouterr()
+    assert run(db, "schema", as_json=True) == 0
+    payload = json.loads(capsys.readouterr().out)
+    assert payload["kinds"] == {"Person": 1, "Memory": 1}
+    assert payload["node_keys_by_kind"]["Person"] == {"role": 1}
+    assert payload["node_keys_by_kind"]["Memory"] == {"importance": 1}
+def test_schema_samples_human_shows_enum_values(db, capsys):
+    run(db, "node", "add", "memory:m1", "--kind", "Memory", "--prop", "importance=high")
+    run(db, "node", "add", "memory:m2", "--kind", "Memory", "--prop", "importance=low")
+    capsys.readouterr()
+    assert run(db, "schema", "--samples") == 0
+    out = capsys.readouterr().out
+    assert "importance" in out and "high" in out and "low" in out

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/tests/test_events.py RENAMED

@@ -154,3 +154,52 @@ def test_batch_op_is_replayable(tmp_path):
     assert g.node("n:1") is not None
     assert g.out("n:1", "LINK")
     g.close()
+# ---- regression: revert of an upsert is a TRUE inverse --------------
+def test_revert_upsert_removes_added_labels_and_props(tmp_path):
+    """An upsert that ADDS labels/props must, on revert, restore the node
+    exactly to its prior state — not merely overwrite changed values.
+    Regression for: compensation merged instead of replacing."""
+    from kgrdbms import service
+    g, log = _fresh(tmp_path)
+    service.upsert_node(g, log, id="t:1", kind="T", labels=["A"], properties={"color": "red"})
+    service.upsert_node(g, log, id="t:1", kind="T", labels=["B"],
+                        properties={"color": "blue", "size": "big"})
+    ev2 = log.tail(1)[0].id
+    service.revert_event(log, ev2)
+    n = g.node("t:1")
+    assert sorted(n.labels) == ["A"]                 # added label B removed
+    assert n.properties == {"color": "red"}          # added prop dropped, value restored
+def test_replay_after_revert_is_consistent(tmp_path):
+    """The new restore-state op must itself be replayable: rebuilding from the
+    log reproduces the post-revert state."""
+    from kgrdbms import service
+    g, log = _fresh(tmp_path)
+    service.upsert_node(g, log, id="t:1", kind="T", labels=["A"], properties={"color": "red"})
+    service.upsert_node(g, log, id="t:1", kind="T", labels=["B"], properties={"color": "blue"})
+    service.revert_event(log, log.tail(1)[0].id)
+    service.replay_log(g, log)
+    n = g.node("t:1")
+    assert sorted(n.labels) == ["A"] and n.properties == {"color": "red"}
+def test_resetting_existing_label_is_noop_not_logged(tmp_path):
+    """Re-adding a label a node already has must not log an event whose revert
+    would then remove the pre-existing label."""
+    from kgrdbms import service
+    g, log = _fresh(tmp_path)
+    service.upsert_node(g, log, id="t:1", kind="T", labels=["A"])
+    before = log.count()
+    service.set_label(g, log, "t:1", "A")            # already present -> no-op
+    assert log.count() == before                      # nothing logged
+    assert "A" in g.node("t:1").labels
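
The regression tests above distinguish a revert that replaces state from one that merges it. A rough sketch of the difference, assuming the event stores a full snapshot of the prior node (the dict-based store and the `upsert` / `revert` helpers are hypothetical and do not reflect how `kgrdbms.service` is implemented):

```python
import copy


def upsert(node_store, log, node_id, labels=(), properties=None):
    """Merge labels/properties into a node, logging the full prior state."""
    prior = copy.deepcopy(node_store.get(node_id))   # None if the node is new
    node = node_store.setdefault(node_id, {"labels": set(), "properties": {}})
    node["labels"].update(labels)
    node["properties"].update(properties or {})
    log.append({"id": node_id, "prior": prior})


def revert(node_store, event):
    """Restore the snapshot by replacement, so additions are dropped too."""
    if event["prior"] is None:
        node_store.pop(event["id"], None)            # node did not exist before
    else:
        node_store[event["id"]] = copy.deepcopy(event["prior"])


nodes, log = {}, []
upsert(nodes, log, "t:1", labels=["A"], properties={"color": "red"})
upsert(nodes, log, "t:1", labels=["B"], properties={"color": "blue", "size": "big"})
revert(nodes, log[-1])
```

A merge-style compensation would only write `color` back to `red` and leave label `B` and the `size` key in place; replacing the node with its snapshot removes them as well.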

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/tests/test_mcp_server.py RENAMED

@@ -62,6 +62,19 @@ def test_nodes_by_kind_and_label(mcp_mod):
     assert tagged == {"a:1", "b:1"}
+def test_schema_exposes_vocabulary(mcp_mod):
+    mcp_mod.kg_node_upsert(id="a:1", kind="A", name="1", labels=["Tagged"],
+                           properties={"status": "active"})
+    mcp_mod.kg_node_upsert(id="a:2", kind="A", name="2", properties={"status": "archived"})
+    s = mcp_mod.kg_schema()
+    assert s["kinds"] == {"A": 2}
+    assert s["labels"] == {"Tagged": 1}
+    assert s["node_keys_by_kind"]["A"] == {"status": 2}
+    # samples enumerate the enum-like status values
+    s2 = mcp_mod.kg_schema(samples=True)
+    assert s2["samples"]["A"]["values"]["status"] == ["active", "archived"]
 def test_edges_out_and_shortest_path(mcp_mod):
     for nid in ("x:1", "x:2", "x:3"):
         mcp_mod.kg_node_upsert(id=nid, kind="X", name=nid)

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/tests/test_postgres.py RENAMED

@@ -107,6 +107,28 @@ def test_bulk_add_nodes_and_edges(pg):
     assert g.out("b:1")[0][0].properties == {"w": 9}
+def test_schema_on_postgres_mirrors_sqlite(pg):
+    g = pg.backend
+    g.add_nodes([
+        {"id": "person:ada", "kind": "Person", "name": "Ada",
+         "labels": ["important"], "properties": {"role": "analyst"}},
+        {"id": "person:alan", "kind": "Person", "name": "Alan",
+         "properties": {"role": "logician"}},
+        {"id": "memory:m1", "kind": "Memory", "name": "m1",
+         "properties": {"importance": "high", "content": "x" * 200}},
+    ])
+    g.add_edges([("person:ada", "memory:m1", "WROTE", {"year": 1843})])
+    s = g.schema(samples=True)
+    assert s["kinds"] == {"Person": 2, "Memory": 1}
+    assert s["edge_types"] == {"WROTE": 1}
+    assert s["labels"] == {"important": 1}
+    assert s["node_keys_by_kind"]["Person"]["role"] == 2
+    assert s["edge_keys"] == {"year": 1}
+    # enum enumerated, long free-text content omitted
+    assert s["samples"]["Memory"]["values"]["importance"] == ["high"]
+    assert "content" not in s["samples"]["Memory"]["values"]
 def test_replay_rebuilds_postgres_from_sqlite_log(pg):
     service.upsert_node(pg.backend, pg.events, id="p:1", kind="Person", name="One", actor="t")
     service.upsert_node(pg.backend, pg.events, id="p:2", kind="Person", name="Two", actor="t")

{knowledge_graph_rdbms-0.1.2 → knowledge_graph_rdbms-0.1.4}/tests/test_rdf.py RENAMED

@@ -97,6 +97,47 @@ def test_rdf_star_annotates_the_quoted_triple(populated):
     assert "<https://kg.local/prop/since>" in nt
+def test_special_chars_produce_valid_iris(populated):
+    """Regression: kind / edge-type / property-key with spaces or punctuation
+    must percent-encode into valid IRIs, not emit a raw space a store rejects."""
+    g = _fresh_graph()
+    g.add_node("topic:ml", kind="Knowledge Area", name="ML",
+               properties={"first name": "Ada", "rate %": 50, "a/b": 1})
+    g.add_node("topic:cs", kind="Knowledge Area", name="CS")
+    g.add_edge("topic:ml", "topic:cs", "is part of", properties={"note": "x"})
+    nt = rdf.export(g, "ntriples")
+    assert "Knowledge%20Area" in nt
+    assert "prop/first%20name" in nt
+    assert "rel/is%20part%20of" in nt
+    assert "prop/a%2Fb" in nt          # '/' encoded so it can't fake a path boundary
+    # No IRI reference may contain a raw space (would break any conformant parser).
+    import re
+    for iri in re.findall(r"<([^<>]+)>", nt):
+        assert " " not in iri, f"raw space in IRI: {iri!r}"
+def test_special_chars_round_trip_and_rdflib_accepts(populated):
+    rdflib = pytest.importorskip("rdflib")
+    g = _fresh_graph()
+    g.add_node("topic:ml", kind="Knowledge Area", name="ML",
+               properties={"first name": "Ada", "rate %": 50, "a/b": 1})
+    g.add_node("topic:cs", kind="Knowledge Area", name="CS")
+    g.add_edge("topic:ml", "topic:cs", "is part of", properties={"note": "x"})
+    # A conformant store accepts the export (reification — rdflib is RDF 1.1).
+    ctx = rdf.IriContext(edge_strategy="reification")
+    rdflib.Graph().parse(data=rdf.export(g, "ntriples", ctx), format="nt")
+    # And the values survive a full star round-trip unchanged.
+    dst = _reimport(rdf.export(g, "ntriples"), "ntriples")
+    n = dst.node("topic:ml")
+    assert n.kind == "Knowledge Area"
+    assert n.properties == {"first name": "Ada", "rate %": 50, "a/b": 1}
+    e = dst.out("topic:ml")[0][0]
+    assert e.type == "is part of" and e.properties == {"note": "x"}
 def test_reification_emits_statement_node(populated):
     ctx = rdf.IriContext(edge_strategy="reification")
     triples = rdf.export_graph(populated, ctx)
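
The special-character tests above expect names to be percent-encoded with no safe characters, so that a space becomes `%20` and `/` becomes `%2F`. One way this could be done with the standard library is sketched below. `to_iri` and `from_iri` are hypothetical helpers, and the `BASE` plus `namespace/local-name` layout is inferred from the asserted output, not taken from the `rdf` module:

```python
from urllib.parse import quote, unquote

BASE = "https://kg.local/"   # base seen in the asserted output; layout below is assumed


def to_iri(namespace: str, local_name: str) -> str:
    """Build an IRI, encoding every reserved character in the local name."""
    # safe="" also encodes "/", so a key like "a/b" cannot fake a path boundary.
    return f"{BASE}{namespace}/{quote(local_name, safe='')}"


def from_iri(iri: str) -> str:
    """Recover the original name from the last path segment."""
    return unquote(iri.rsplit("/", 1)[1])


prop_iri = to_iri("prop", "first name")
slash_iri = to_iri("prop", "a/b")
rel_iri = to_iri("rel", "is part of")
```

Because the encoded local name never contains a raw `/`, splitting on the last `/` and decoding is enough to round-trip the original key.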

knowledge_graph_rdbms-0.1.4/tests/test_schema.py ADDED

@@ -0,0 +1,97 @@
+"""schema() — the observed TBox an LLM reads before querying, so it never guesses."""
+from kgrdbms.graph import Graph, _scalar_samples
+def _seed(g: Graph) -> None:
+    g.add_node("person:ada", kind="Person", name="Ada Lovelace",
+               labels={"Person", "important"}, properties={"role": "analyst", "born": 1815})
+    g.add_node("person:alan", kind="Person", name="Alan Turing",
+               labels={"Person"}, properties={"role": "logician", "born": 1912})
+    g.add_node("memory:m1", kind="Memory", name="note one",
+               properties={"content": "a free-text body well over eighty characters long so the "
+                                      "schema sampler treats it as prose, not an enumerable value set",
+                           "importance": "high"})
+    g.add_node("memory:m2", kind="Memory", name="note two",
+               properties={"content": "another distinct free-text body, also comfortably past the "
+                                      "eighty-character cap that marks a property as un-enumerable prose",
+                           "importance": "low"})
+    g.add_edge("person:ada", "memory:m1", "WROTE", properties={"year": 1843})
+def test_schema_reports_kinds_edge_types_labels(tmp_path):
+    g = Graph(path=tmp_path / "s.db")
+    _seed(g)
+    s = g.schema()
+    assert s["nodes_total"] == 4
+    assert s["edges_total"] == 1
+    assert s["kinds"] == {"Person": 2, "Memory": 2}
+    assert s["edge_types"] == {"WROTE": 1}
+    assert s["labels"] == {"Person": 2, "important": 1}
+    assert s["edge_keys"] == {"year": 1}
+    g.close()
+def test_schema_property_keys_are_grouped_by_kind(tmp_path):
+    g = Graph(path=tmp_path / "k.db")
+    _seed(g)
+    s = g.schema()
+    assert set(s["node_keys_by_kind"]["Person"]) == {"role", "born"}
+    assert s["node_keys_by_kind"]["Person"]["role"] == 2
+    assert set(s["node_keys_by_kind"]["Memory"]) == {"content", "importance"}
+    g.close()
+def test_schema_kind_with_no_properties_still_listed(tmp_path):
+    g = Graph(path=tmp_path / "np.db")
+    g.add_node("tag:x", kind="Tag", name="x")  # no properties at all
+    s = g.schema()
+    assert s["kinds"] == {"Tag": 1}
+    assert s["node_keys_by_kind"].get("Tag", {}) == {}
+    g.close()
+def test_schema_samples_enumerate_enum_keys_but_not_freetext(tmp_path):
+    g = Graph(path=tmp_path / "samp.db")
+    _seed(g)
+    s = g.schema(samples=True)
+    mem = s["samples"]["Memory"]
+    # example ids reveal the CURIE convention
+    assert mem["example_ids"] == ["memory:m1", "memory:m2"]
+    # importance is enum-like → enumerated; content is free-text → omitted
+    assert mem["values"]["importance"] == ["high", "low"]
+    assert "content" not in mem["values"]
+    # numeric enum on Person too
+    assert s["samples"]["Person"]["values"]["born"] == [1815, 1912]
+    g.close()
+def test_schema_samples_respects_sample_limit(tmp_path):
+    g = Graph(path=tmp_path / "lim.db")
+    for i in range(30):
+        g.add_node(f"n:{i}", kind="K", name=str(i), properties={"v": i})
+    s = g.schema(samples=True, sample_limit=10)
+    # 30 distinct values > limit 10 → not enumerated
+    assert "v" not in s["samples"]["K"]["values"]
+    g.close()
+def test_scalar_samples_helper_rejects_nonscalar_and_longstrings():
+    assert _scalar_samples([1, 2, 3]) == [1, 2, 3]
+    assert _scalar_samples(["b", "a"]) == ["a", "b"]
+    assert _scalar_samples([True, False]) == [False, True]
+    assert _scalar_samples([["a", "b"]]) is None       # list value
+    assert _scalar_samples([{"k": "v"}]) is None        # object value
+    assert _scalar_samples(["x" * 200]) is None         # over-long string
+    assert _scalar_samples([]) is None                  # nothing to show
+def test_schema_empty_graph(tmp_path):
+    g = Graph(path=tmp_path / "empty.db")
+    s = g.schema()
+    assert s == {
+        "nodes_total": 0, "edges_total": 0,
+        "kinds": {}, "edge_types": {}, "labels": {},
+        "node_keys_by_kind": {}, "edge_keys": {},
+    }
+    g.close()
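
The body of `_scalar_samples` is not part of this diff. A minimal sketch that would satisfy the helper test above is given here. The name `scalar_samples_sketch`, the `max_len` parameter, and the 80-character default (inferred from the "eighty-character cap" wording in the seed data) are assumptions:

```python
MAX_STRING_LEN = 80  # assumed from the "eighty-character cap" in the seed data


def scalar_samples_sketch(values, max_len=MAX_STRING_LEN):
    """Return sorted distinct scalar values, or None if they are not enum-like."""
    if not values:
        return None                         # nothing to show
    for v in values:
        if not isinstance(v, (str, int, float, bool)):
            return None                     # lists, objects, None: not enumerable
        if isinstance(v, str) and len(v) > max_len:
            return None                     # long strings are treated as prose
    # Sort by type name first so mixed types never hit a comparison TypeError.
    return sorted(set(values), key=lambda v: (type(v).__name__, v))
```

The sample-limit behavior exercised by `test_schema_samples_respects_sample_limit` is left out of this sketch, since the diff does not show whether the helper or its caller applies the limit.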