PyPI - knowledge-graph-rdbms - Versions diffs - 0.1.4__tar.gz → 0.1.6__tar.gz - Mend

knowledge-graph-rdbms 0.1.4tar.gz → 0.1.6tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (59)

{knowledge_graph_rdbms-0.1.4 → knowledge_graph_rdbms-0.1.6}/.github/workflows/ci.yml RENAMED

@@ -1,14 +1,14 @@
 name: CI
-# Runs the test suite on every push to master and every pull request.
+# Runs the test suite on every push to main and every pull request.
 # Make the `ci` job below a required status check in branch protection to
 # gate merges on a green suite.
 on:
   push:
-    branches: [master]
+    branches: [main]
   pull_request:
-    branches: [master]
+    branches: [main]
 concurrency:
   group: ci-${{ github.ref }}

{knowledge_graph_rdbms-0.1.4 → knowledge_graph_rdbms-0.1.6}/CLAUDE.md RENAMED

@@ -26,6 +26,11 @@ kg schema --samples                      # + example ids and enum-like property
 kg ontology list                         # the registry (the "db of dbs")
 kg ontology create coffee --stance inferential   # register a named ontology
 kg --ontology coffee node add drink:latte --kind Drink   # route to it (resolver)
+kg fed schema                            # union vocabulary across ALL ontologies (multithreaded fan-out)
+kg fed node person:ada                   # find an id across the federation (identity-aware)
+kg link add coffee drink:latte ENJOYED_BY people person:ada   # cross-ontology edge (backbone)
+kg link same-as people person:ada wiki person:ada-lovelace    # assert same real-world entity
+kg prefix add person https://kg.local/person/   # CURIE prefix -> IRI (identity backbone)
 kg --db /tmp/x.db node add a:1 --kind T  # raw escape hatch: exact file, no registry
 kg serve                                 # run the MCP server (needs [mcp] extra)
 ```
@@ -68,6 +73,15 @@ The engine above (`graph.py` + `service.py` + the log + the gate) operates on **
 - **The event log is decoupled from the backend.** `EventLog(store, projection=None)`: *store* is the SQLite that holds the log rows; *projection* is the `GraphBackend` that `compensate()`/replay apply to. They coincide for sqlite (`EventLog(graph)` — projection defaults to store, unchanged). For postgres the store is `resolver._ControlPlaneLogStore` (a `<root>/ontologies/<slug>/events.db` sidecar) and the projection is the `PostgresGraph` — so audit/replay/undo keep working with graph data in Postgres and history in SQLite. `apply_event` only calls `GraphBackend` methods, so it drives any backend. This store↔projection split is the seam to respect for *any* non-sqlite engine (neo4j next).
 - **New failure class:** routing to a stub engine raises `NotImplementedError`. Every front door's error handling must account for it (the CLI's `main()` already maps it to `unavailable: …` / exit 1).
+## Cross-ontology: federation (reads) + the backbone (links)
+The control plane routes to **one** ontology per call. Two modules sit on top to work across **many** at once — and the split is deliberate: reads federate, writes go through a backbone.
+- **`federation.py` — cross-ontology READS, multithreaded by default.** A federated read is a *fan-out*: open each member ontology in its own thread (own connection — SQLite/psycopg release the GIL during a query, so N ontologies read concurrently) and merge the results tagged by source. `Federation(names)` or `Federation.all()`; methods mirror the single-graph reads (`schema`, `stats`, `nodes_by_kind`, `nodes_by_label`, `node`) plus `identity()`. Each worker calls `resolver.resolve()` in its *own* thread, so no connection is ever shared across threads. `parallel=False` forces sequential for debugging. **Federation never writes** and never silently drops a member (a member that raises propagates).
+- **`backbone.py` — cross-ontology LINKS + the prefix registry.** A leaf edge can't cross ontologies (its FK lives in one file). The backbone is where cross-ontology structure lives, and **it needs no new storage: it IS the index graph** (`<root>/index.db`) growing new kinds. A link becomes a `Ref` proxy node per endpoint (id = `<ontology>::<node_id>`, FK-satisfied inside the index) joined by an edge; the prefix registry becomes `Prefix` nodes. **Both go through the gated + logged `service` path against the index's own event log** — so cross-domain assertions (`link`, `same_as`) are audited, reversible, and replayable exactly like leaf data. `SAME_AS` is symmetric and traversed transitively by `identity_cluster()`.
+- **Identity is opt-in per ontology.** `OntologyEntry.shared_identity` (default `False` = local): when `True`, federation treats same-CURIE nodes across such ontologies as the *same* entity and merges them in `Federation.node()`. Everyone else stays local — link explicitly via the backbone. This is the lightweight LPG answer to RDF's global-IRI identity: stay local by default, adopt shared identity surgically.
+- **The qualified-id delimiter is `::`** (`coffee::drink:latte`), distinct from the single-colon CURIE so a node's own colons survive `unqualify()`. Don't reuse `:`.
 ## Layout
 ```
@@ -78,6 +92,8 @@ kgrdbms/
 ├── invariants.py   # compiled-in invariants, run before policy (no-op default)
 ├── service.py      # the shared gated + logged write path (all front doors use this)
 ├── resolver.py     # control plane: name → (backend, events, entry); the ontology index
+├── federation.py   # cross-ontology READS: multithreaded fan-out, identity-aware merge
+├── backbone.py     # cross-ontology LINKS + prefix/IRI registry (lives in the index graph)
 ├── backends/       # pluggable data plane (engine registry)
 │   ├── base.py     #   GraphBackend Protocol + _StubBackend
 │   ├── __init__.py #   registry: @backend(name), get_backend, available_backends
@@ -88,7 +104,7 @@ kgrdbms/
 └── mcp_server.py   # MCP server, kg_-prefixed tools, each with optional `ontology=` (optional [mcp] extra)
 ```
-`graph.py` has no internal dependencies — everything else layers on top of it. Dependency direction: `graph` ← `events`/`backends` ← `resolver` ← `service`-callers (`cli`, `mcp_server`). `service.py` depends only on the `GraphBackend` surface, never a concrete engine. Public API is re-exported from `__init__.py`.
+`graph.py` has no internal dependencies — everything else layers on top of it. Dependency direction: `graph` ← `events`/`backends` ← `resolver` ← `backbone`/`federation` ← `service`-callers (`cli`, `mcp_server`). `backbone` builds on `resolver` + `service` (the index is its store); `federation` builds on `resolver` (+ `backbone` for identity clusters). `service.py` depends only on the `GraphBackend` surface, never a concrete engine. Public API is re-exported from `__init__.py`.
 ## Node id convention (CURIEs)

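Note on the CLAUDE.md hunk above: the added text says the qualified-id delimiter is `::` so that a node's own colons survive `unqualify()`. The following is a minimal sketch of that idea, not the package's code. The helper bodies, the `DELIM` constant, and the `qualify` name are assumptions made for illustration; only the `::` delimiter, the `coffee::drink:latte` example, and the name `unqualify()` come from the diff.

```python
# Hypothetical helpers illustrating the '::' qualified-id convention.
# These bodies are an assumption, not taken from backbone.py.
DELIM = "::"


def qualify(ontology: str, node_id: str) -> str:
    """Build a qualified id such as 'coffee::drink:latte'."""
    if DELIM in ontology:
        raise ValueError(f"ontology name must not contain {DELIM!r}")
    return f"{ontology}{DELIM}{node_id}"


def unqualify(qualified: str) -> tuple[str, str]:
    """Split on the FIRST '::' only, so colons inside the node id survive."""
    ontology, sep, node_id = qualified.partition(DELIM)
    if not sep:
        raise ValueError(f"not a qualified id: {qualified!r}")
    return ontology, node_id


print(unqualify(qualify("coffee", "drink:latte")))  # ('coffee', 'drink:latte')
```

Splitting on a single `:` instead would cut `drink:latte` in the wrong place, which is presumably why the hunk warns against reusing `:`.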
{knowledge_graph_rdbms-0.1.4 → knowledge_graph_rdbms-0.1.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: knowledge-graph-rdbms
-Version: 0.1.4
+Version: 0.1.6
 Summary: A label property graph on an RDBMS (SQLite): nodes, typed edges, an append-only event log, and an optional MCP server.
 Project-URL: Homepage, https://github.com/cunicopia-dev/knowledge-graph-rdbms
 Project-URL: Repository, https://github.com/cunicopia-dev/knowledge-graph-rdbms
@@ -31,11 +31,12 @@ Description-Content-Type: text/markdown
 # knowledge-graph-rdbms
+![PyPI](https://img.shields.io/pypi/v/knowledge-graph-rdbms?logo=pypi&logoColor=white&color=3775A9)
 ![Python](https://img.shields.io/badge/python-3.10%2B-3776AB?logo=python&logoColor=white)
 ![License: MIT](https://img.shields.io/badge/license-MIT-green)
 ![core dependencies: 0](https://img.shields.io/badge/core_dependencies-0-success)
-![tests: 87 passing](https://img.shields.io/badge/tests-87_passing-brightgreen)
-![storage: SQLite](https://img.shields.io/badge/storage-SQLite-003B57?logo=sqlite&logoColor=white)
+![tests: 107 passing](https://img.shields.io/badge/tests-107_passing-brightgreen)
+![storage: SQLite + Postgres](https://img.shields.io/badge/storage-SQLite_%2B_Postgres-003B57?logo=sqlite&logoColor=white)
 ![MCP](https://img.shields.io/badge/MCP-ready-FF6F00)
 **A knowledge graph for modeling _meaning_ — entities, the kinds of things they
@@ -77,6 +78,8 @@ Small enough to hold in your head. Flexible enough to model anything.
 - [The data model](#the-data-model)
 - [Architecture: three front doors, one engine](#architecture-three-front-doors-one-engine)
 - [Many ontologies: one control plane](#many-ontologies-one-control-plane)
+- [Discovery: read the schema before you query](#discovery-read-the-schema-before-you-query)
+- [Cross-ontology: federation and the backbone](#cross-ontology-federation-and-the-backbone)
 - [Event sourcing: the graph is a projection](#event-sourcing-the-graph-is-a-projection)
 - [The safety gate: invariants vs. policy](#the-safety-gate-invariants-vs-policy)
 - [Install](#install)
@@ -358,6 +361,94 @@ NAME` routes through the resolver (named, registered, multi-engine), while
 ---
+## Discovery: read the schema before you query
+A graph you didn't build is opaque: which `kind`s exist? which edge types? what
+property keys live on a `Person`? `schema()` answers all of it in **one read** —
+the map to read before querying, rather than guessing with trial calls.
+```bash
+kg schema             # kinds, edge types, labels, and property keys per kind — with counts
+kg schema --samples   # + a few example ids per kind and the enum-like values a key takes
+```
+What comes back is the *observed* vocabulary — a profile of what's actually in
+the graph, not an enforced schema (the graph stays schemaless). The MCP tool
+`kg_schema` carries an instruction to call it **first**, so an agent reads the map
+before it moves. It's a plain read — pure `GROUP BY` aggregates, no gate — and
+like every read it has a federated form (next) that unions the vocabulary across
+many ontologies at once.
+---
+## Cross-ontology: federation and the backbone
+The control plane routes to *one* ontology per call. Two layers sit on top to work
+across *many* at once: **reads federate, writes go through a backbone.**
+```mermaid
+flowchart TD
+    Q["kg fed node person:ada"] --> FED["federation.py<br/>multithreaded fan-out"]
+    FED -.->|own thread + connection| O1[("people")]
+    FED -.->|own thread + connection| O2[("papers")]
+    FED -.->|own thread + connection| O3[("coffee")]
+    O1 --> M["merge, tagged by source<br/>(identity-aware)"]
+    O2 --> M
+    O3 --> M
+    L["kg link same-as …"] --> BB["backbone.py"]
+    BB --> IDX[["index.db — the backbone<br/>Ref + Prefix nodes, SAME_AS edges<br/>gated + logged"]]
+```
+### Federation — cross-ontology reads, multithreaded
+A federated read is a *fan-out*: open each member ontology in its own thread (its
+own connection — SQLite releases the GIL during a query, so N ontologies read
+**concurrently**) and merge the results tagged by source. There's no separate
+"federated" API: every base read takes an optional `ontologies=[...]` scope.
+```python
+from kgrdbms import Federation
+fed = Federation(["people", "papers", "coffee"])   # or Federation.all()
+fed.schema()                 # unioned vocabulary + a per-ontology breakdown
+fed.nodes_by_kind("Person")  # [Located(ontology, node), ...] — tagged by source
+fed.node("person:ada")       # every occurrence across worlds, identity-merged
+```
+```bash
+kg fed schema                # union vocabulary across ALL ontologies
+kg fed node person:ada       # find an id across the federation (identity-aware)
+```
+Federation never writes and never silently drops a member (a member that raises
+propagates); `parallel=False` forces sequential for debugging.
+### The backbone — cross-ontology links
+A leaf edge can't cross ontologies: its foreign key lives in one file. The backbone
+is where cross-ontology structure lives, and it needs **no new storage — it *is* the
+index graph** growing new kinds. A link becomes a lightweight `Ref` proxy node per
+endpoint (id `<ontology>::<node_id>`, FK-satisfied *inside* the index) joined by an
+edge; the prefix registry becomes `Prefix` nodes. Both go through the **same gated +
+logged `service` path**, so a cross-domain assertion is audited, reversible, and
+replayable exactly like leaf data.
+```bash
+kg link add coffee drink:latte ENJOYED_BY people person:ada   # a typed cross-ontology edge
+kg link same-as people person:ada wiki person:ada-lovelace    # "same real-world entity" (symmetric)
+kg link cluster people person:ada                             # the transitive SAME_AS cluster
+kg prefix add person https://kg.local/person/                 # CURIE prefix -> IRI (identity backbone)
+```
+**Identity is opt-in per ontology.** By default identity is *local* — `person:ada`
+in two ontologies are different nodes until you link them via the backbone. An
+ontology created with `--shared-identity` opts into *global* identity, and federation
+then treats same-CURIE nodes across such ontologies as the **same** entity and merges
+them.
+---
 ## Event sourcing: the graph is a projection
 The graph you query is a cache. The **append-only event log is the source of
@@ -425,6 +516,55 @@ replay(graph, events, genesis=genesis, upto_ts=ts)  # ...as of an instant
 ---
+## Virtual edges: relationships you don't store
+Event sourcing is the right model for *curated facts* — but the wrong one for a
+high-cardinality, machine-generated relationship layer that already lives, fresh
+and authoritative, in some operational store. Mirroring 100k correlation edges
+into the graph (and the log) every night is wasteful and instantly stale.
+A **virtual edge** inverts that. The ontology stores only a *binding* — an edge
+TYPE plus the SQL that resolves its instances against an external source. At
+traversal time the resolver runs that query, parameterized by the node you're
+standing on, and synthesizes the edges live. Zero copy, always current, one
+source of truth — Ontology-Based Data Access in the graph's own terms.
+```python
+# Bind CO_HELD_WITH to a query over the operational store (here, any DB-API source)
+kg_virtual_edge_add(
+    edge_type="CO_HELD_WITH",
+    query="SELECT b AS to_id, shared FROM co_held WHERE a = ?",  # '?' sqlite, '%s' postgres
+    dsn_env="OUROBOROS_DSN",          # credentials by reference — never in the graph
+    source="id_slug",                  # company:NVDA -> bind "NVDA"
+    target_id_template="company:{value}",
+    prop_cols=["shared"], directions="both", ontology="market",
+)
+# now ordinary traversal unions stored + virtual edges; virtual ones carry _virtual:true
+kg_edges("company:NVDA", direction="out", ontology="market")
+# -> [{to: "company:AMD", type: "CO_HELD_WITH", properties: {shared: 12, _virtual: true}, …}]
+```
+Two properties keep it safe and simple:
+- **Read-only.** Virtual edges are never written, so they sidestep the whole
+  gated-write / event-log / compensation machinery. A binding is config; the
+  edges are a view. Nothing to invalidate, replay, or undo.
+- **Parameterized, never interpolated.** The SQL template is operator-authored;
+  the per-node value is always *bound* through the driver, never formatted into
+  the string. Credentials ride `dsn_env` (an env-var name), so secrets stay out
+  of the graph and out of version control.
+Bindings live as reserved-kind (`_VirtualEdge`) nodes *in the ontology*, so they
+travel with it and version alongside the schema. This is the seam for a
+schema-graph / data-graph split: keep the curated **schema** (types, contracts,
+doctrine) in a portable SQLite ontology, and **virtualize the populated extension**
+straight out of your system-of-record — no ETL, no drift.
+Tools: `kg_virtual_edge_add`, `kg_virtual_edges_list`, `kg_virtual_edge_remove`.
+---
 ## The safety gate: invariants vs. policy
 When you expose the graph for live mutation — especially to an AI agent over
@@ -572,18 +712,20 @@ Or hand-edit a client config (e.g. Claude Desktop):
 { "mcpServers": { "kgrdbms": { "command": "kgrdbms-mcp" } } }
 ```
-It exposes `kg_`-prefixed tools for reads (`kg_schema` — the vocabulary, meant to
-be called first; `kg_node_get`, `kg_nodes_by_kind`,
-`kg_neighborhood`, `kg_shortest_path`, `kg_descendants`, …), gated writes
-(`kg_node_upsert`, `kg_edge_add`, `kg_node_delete`, …), bulk composition
-(`kg_import` — a whole `{nodes, edges}` batch in one call, so an agent populates
-an ontology in a single tool call instead of dozens), RDF interop
-(`kg_rdf_export`, `kg_rdf_import` — see below), and the event log
-(`kg_events_tail`, `kg_event_revert`, `kg_replay`). Every write passes through
-the invariants + policy gate and is recorded — same engine, same file as the
-CLI. Every tool also takes an optional `ontology` name (omit for the default),
-and `kg_ontologies_list` / `kg_ontology_create` manage the registry — so an
-agent can discover, create, and route between ontologies entirely over MCP.
+It exposes `kg_`-prefixed tools over one engine. Reads — `kg_schema` (the
+vocabulary, meant to be called first), `kg_node_get`, `kg_find` (by kind and/or
+label), `kg_edges`, `kg_neighborhood`, `kg_shortest_path`, `kg_descendants` — each
+take an optional `ontologies=[...]` to fan out across many ontologies in one
+call. Then gated writes (`kg_node_upsert`, `kg_edge_add`, `kg_edge_remove`,
+`kg_node_delete`), bulk composition (`kg_import` — a whole `{nodes, edges}` batch in
+one call, so an agent populates an ontology in a single tool call instead of dozens),
+the cross-ontology backbone (`kg_link`, `kg_links_of`, `kg_identity`, `kg_prefix_add`,
+`kg_prefix_resolve`), RDF interop (`kg_rdf_export`, `kg_rdf_import` — see below), and
+the event log (`kg_events_tail`, `kg_event_revert`, `kg_replay`). Every write passes
+the invariants + policy gate and is recorded — same engine, same file as the CLI.
+Every tool takes an optional `ontology` name, and `kg_ontologies_list` /
+`kg_ontology_create` / `kg_ontology_delete` manage the registry — so an agent can
+discover, create, route between, and delete ontologies entirely over MCP.
 ---
@@ -604,7 +746,7 @@ that data by `bench/charts.py`. Run both on your own machine in one command.
 ### Writes — the batching lever
-![Write throughput — batch the commit, ~10× faster](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/master/assets/write_throughput.png)
+![Write throughput — batch the commit, ~10× faster](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/main/assets/write_throughput.png)
 Each single write commits on its own for durability. Wrapping a bulk load in
 `batch()` / `add_nodes` / `add_edges` collapses those per-call commits into one
@@ -615,7 +757,7 @@ thousands per second.
 ### Reads — fast, with an honest tail
-![Read latency — p50 marker, whisker to p99, log scale](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/master/assets/read_latency.png)
+![Read latency — p50 marker, whisker to p99, log scale](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/main/assets/read_latency.png)
 Point lookups land in single-digit microseconds, and multi-node reads hydrate
 the whole result set in a constant number of queries (no N+1 fan-out). The chart
@@ -629,7 +771,7 @@ SQLite engine runs under CPython, Node, and Bun, so the gap between them is pure
 binding overhead — under 2×, and it doesn't even favor one runtime across
 operations.
-![Same SQLite across CPython, Node, and Bun](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/master/assets/runtimes.png)
+![Same SQLite across CPython, Node, and Bun](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/main/assets/runtimes.png)
 The lever that actually moved the needle was transaction batching (~10×, above),
 not the language. Reproduce it with `python bench/runtimes/compare.py`.
@@ -639,7 +781,7 @@ not the language. Reproduce it with `python bench/runtimes/compare.py`.
 We measured it against Neo4j — same graph, same queries, identical methodology
 (full harness and reproduction in [`bench/neo4j/`](bench/neo4j/README.md)):
-![Where the crossover is — kgrdbms vs Neo4j](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/master/assets/crossover.png)
+![Where the crossover is — kgrdbms vs Neo4j](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/main/assets/crossover.png)
 Queries compile to SQL over B-tree indexes, so each traversal hop is an index
 lookup — wonderfully cheap for point reads and shallow traversals. An in-process
@@ -768,7 +910,18 @@ replayable.
 | `kg rdf export [--format F]`    | serialize to Turtle/N-Triples (RDF-star)      |
 | `kg rdf import FILE`            | load RDF back in (gated + logged)             |
 | `kg ontology list`              | list registered ontologies (the registry)     |
-| `kg ontology create NAME …`     | register an ontology (`--backend`, `--stance`) |
+| `kg ontology create NAME …`     | register an ontology (`--backend`, `--stance`, `--shared-identity`) |
+| `kg ontology delete NAME [--purge]` | deregister an ontology (`--purge` also deletes its data) |
+| `kg fed schema [--samples]`     | union vocabulary across ALL ontologies (multithreaded fan-out) |
+| `kg fed stats`                  | node/edge totals across the federation        |
+| `kg fed nodes-by-kind KIND` / `nodes-by-label LABEL` | nodes across ontologies, tagged by source |
+| `kg fed node ID`                | find an id across the federation (identity-aware) |
+| `kg link add FROM_ONT FROM TYPE TO_ONT TO` | cross-ontology edge (the backbone) |
+| `kg link same-as A FROM B TO`   | assert two nodes are the same entity (symmetric) |
+| `kg link of ONT ID`             | cross-ontology links touching a node          |
+| `kg link cluster ONT ID`        | the transitive SAME_AS identity cluster       |
+| `kg prefix add P IRI_BASE`      | bind a CURIE prefix to an IRI base            |
+| `kg prefix expand CURIE` / `contract IRI` | CURIE ↔ IRI via the registry        |
 | `kg serve [--transport T]`      | run the MCP server                            |
 Add `--json` to any command for machine-readable output. Target a graph with
@@ -787,6 +940,8 @@ kgrdbms/
 ├── invariants.py   # compiled-in invariants, checked before policy (no-op default)
 ├── service.py      # the shared gated + logged write path
 ├── resolver.py     # control plane: ontology name → (backend, events, entry) + the index
+├── federation.py   # cross-ontology reads: multithreaded fan-out, identity-aware merge
+├── backbone.py     # cross-ontology links + prefix/IRI registry (lives in the index graph)
 ├── backends/       # pluggable engine registry
 │   ├── base.py     #   GraphBackend protocol + raising stub skeleton
 │   ├── sqlite.py   #   live engine (adapter over Graph)
@@ -808,7 +963,7 @@ own. Everything else layers on top; `service.py` depends only on the
 ```bash
 git clone <repo> && cd knowledge-graph-rdbms
 uv venv && uv pip install -e ".[dev]"
-pytest                       # 62 tests
+pytest                       # 107 tests
 python bench/benchmark.py    # benchmark with p50–p99 (see bench/README.md)
 ```
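
Note on the federation section added in this file: it describes a federated read as a fan-out in which each member ontology is opened in its own thread with its own connection, results are merged tagged by source, a failing member propagates, and `parallel=False` forces sequential execution. The sketch below shows that pattern with only the standard library. It is not `federation.py`: the `nodes(id, kind)` table, the function names `make_db`, `read_kind`, and `fan_out`, and the file layout are all assumptions made for illustration.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_db(path: Path, rows: list[tuple[str, str]]) -> None:
    """Create a toy ontology file with a nodes(id, kind) table (assumed schema)."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, kind TEXT)")
    con.executemany("INSERT INTO nodes VALUES (?, ?)", rows)
    con.commit()
    con.close()


def read_kind(name: str, path: Path, kind: str) -> list[tuple[str, str]]:
    """Worker: open a connection in THIS thread, query, tag each row by source."""
    con = sqlite3.connect(path)
    try:
        rows = con.execute(
            "SELECT id FROM nodes WHERE kind = ? ORDER BY id", (kind,)
        ).fetchall()
    finally:
        con.close()
    return [(name, node_id) for (node_id,) in rows]


def fan_out(members: dict[str, Path], kind: str, parallel: bool = True):
    """Read every member and merge; an exception in any member propagates."""
    if not parallel:
        results = [read_kind(n, p, kind) for n, p in members.items()]
    else:
        with ThreadPoolExecutor(max_workers=max(1, len(members))) as pool:
            futures = [pool.submit(read_kind, n, p, kind) for n, p in members.items()]
            results = [f.result() for f in futures]  # re-raises worker errors
    return [hit for member in results for hit in member]


with tempfile.TemporaryDirectory() as root:
    members = {"people": Path(root) / "people.db", "wiki": Path(root) / "wiki.db"}
    make_db(members["people"], [("person:ada", "Person"), ("city:london", "Place")])
    make_db(members["wiki"], [("person:ada-lovelace", "Person")])
    merged = fan_out(members, "Person")
    print(merged)  # [('people', 'person:ada'), ('wiki', 'person:ada-lovelace')]
```

Collecting results in submission order keeps the merged list deterministic, and calling `result()` on every future means no member is silently dropped.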

{knowledge_graph_rdbms-0.1.4 → knowledge_graph_rdbms-0.1.6}/README.md RENAMED

@@ -1,10 +1,11 @@
 # knowledge-graph-rdbms
+![PyPI](https://img.shields.io/pypi/v/knowledge-graph-rdbms?logo=pypi&logoColor=white&color=3775A9)
 ![Python](https://img.shields.io/badge/python-3.10%2B-3776AB?logo=python&logoColor=white)
 ![License: MIT](https://img.shields.io/badge/license-MIT-green)
 ![core dependencies: 0](https://img.shields.io/badge/core_dependencies-0-success)
-![tests: 87 passing](https://img.shields.io/badge/tests-87_passing-brightgreen)
-![storage: SQLite](https://img.shields.io/badge/storage-SQLite-003B57?logo=sqlite&logoColor=white)
+![tests: 107 passing](https://img.shields.io/badge/tests-107_passing-brightgreen)
+![storage: SQLite + Postgres](https://img.shields.io/badge/storage-SQLite_%2B_Postgres-003B57?logo=sqlite&logoColor=white)
 ![MCP](https://img.shields.io/badge/MCP-ready-FF6F00)
 **A knowledge graph for modeling _meaning_ — entities, the kinds of things they
@@ -46,6 +47,8 @@ Small enough to hold in your head. Flexible enough to model anything.
 - [The data model](#the-data-model)
 - [Architecture: three front doors, one engine](#architecture-three-front-doors-one-engine)
 - [Many ontologies: one control plane](#many-ontologies-one-control-plane)
+- [Discovery: read the schema before you query](#discovery-read-the-schema-before-you-query)
+- [Cross-ontology: federation and the backbone](#cross-ontology-federation-and-the-backbone)
 - [Event sourcing: the graph is a projection](#event-sourcing-the-graph-is-a-projection)
 - [The safety gate: invariants vs. policy](#the-safety-gate-invariants-vs-policy)
 - [Install](#install)
@@ -327,6 +330,94 @@ NAME` routes through the resolver (named, registered, multi-engine), while
 ---
+## Discovery: read the schema before you query
+A graph you didn't build is opaque: which `kind`s exist? which edge types? what
+property keys live on a `Person`? `schema()` answers all of it in **one read** —
+the map to read before querying, rather than guessing with trial calls.
+```bash
+kg schema             # kinds, edge types, labels, and property keys per kind — with counts
+kg schema --samples   # + a few example ids per kind and the enum-like values a key takes
+```
+What comes back is the *observed* vocabulary — a profile of what's actually in
+the graph, not an enforced schema (the graph stays schemaless). The MCP tool
+`kg_schema` carries an instruction to call it **first**, so an agent reads the map
+before it moves. It's a plain read — pure `GROUP BY` aggregates, no gate — and
+like every read it has a federated form (next) that unions the vocabulary across
+many ontologies at once.
+---
+## Cross-ontology: federation and the backbone
+The control plane routes to *one* ontology per call. Two layers sit on top to work
+across *many* at once: **reads federate, writes go through a backbone.**
+```mermaid
+flowchart TD
+    Q["kg fed node person:ada"] --> FED["federation.py<br/>multithreaded fan-out"]
+    FED -.->|own thread + connection| O1[("people")]
+    FED -.->|own thread + connection| O2[("papers")]
+    FED -.->|own thread + connection| O3[("coffee")]
+    O1 --> M["merge, tagged by source<br/>(identity-aware)"]
+    O2 --> M
+    O3 --> M
+    L["kg link same-as …"] --> BB["backbone.py"]
+    BB --> IDX[["index.db — the backbone<br/>Ref + Prefix nodes, SAME_AS edges<br/>gated + logged"]]
+```
+### Federation — cross-ontology reads, multithreaded
+A federated read is a *fan-out*: open each member ontology in its own thread (its
+own connection — SQLite releases the GIL during a query, so N ontologies read
+**concurrently**) and merge the results tagged by source. There's no separate
+"federated" API: every base read takes an optional `ontologies=[...]` scope.
+```python
+from kgrdbms import Federation
+fed = Federation(["people", "papers", "coffee"])   # or Federation.all()
+fed.schema()                 # unioned vocabulary + a per-ontology breakdown
+fed.nodes_by_kind("Person")  # [Located(ontology, node), ...] — tagged by source
+fed.node("person:ada")       # every occurrence across worlds, identity-merged
+```
+```bash
+kg fed schema                # union vocabulary across ALL ontologies
+kg fed node person:ada       # find an id across the federation (identity-aware)
+```
+Federation never writes and never silently drops a member (a member that raises
+propagates); `parallel=False` forces sequential for debugging.
+### The backbone — cross-ontology links
+A leaf edge can't cross ontologies: its foreign key lives in one file. The backbone
+is where cross-ontology structure lives, and it needs **no new storage — it *is* the
+index graph** growing new kinds. A link becomes a lightweight `Ref` proxy node per
+endpoint (id `<ontology>::<node_id>`, FK-satisfied *inside* the index) joined by an
+edge; the prefix registry becomes `Prefix` nodes. Both go through the **same gated +
+logged `service` path**, so a cross-domain assertion is audited, reversible, and
+replayable exactly like leaf data.
+```bash
+kg link add coffee drink:latte ENJOYED_BY people person:ada   # a typed cross-ontology edge
+kg link same-as people person:ada wiki person:ada-lovelace    # "same real-world entity" (symmetric)
+kg link cluster people person:ada                             # the transitive SAME_AS cluster
+kg prefix add person https://kg.local/person/                 # CURIE prefix -> IRI (identity backbone)
+```
+**Identity is opt-in per ontology.** By default identity is *local* — `person:ada`
+in two ontologies are different nodes until you link them via the backbone. An
+ontology created with `--shared-identity` opts into *global* identity, and federation
+then treats same-CURIE nodes across such ontologies as the **same** entity and merges
+them.
+---
 ## Event sourcing: the graph is a projection
 The graph you query is a cache. The **append-only event log is the source of
@@ -394,6 +485,55 @@ replay(graph, events, genesis=genesis, upto_ts=ts)  # ...as of an instant
 ---
+## Virtual edges: relationships you don't store
+Event sourcing is the right model for *curated facts* — but the wrong one for a
+high-cardinality, machine-generated relationship layer that already lives, fresh
+and authoritative, in some operational store. Mirroring 100k correlation edges
+into the graph (and the log) every night is wasteful and instantly stale.
+A **virtual edge** inverts that. The ontology stores only a *binding* — an edge
+TYPE plus the SQL that resolves its instances against an external source. At
+traversal time the resolver runs that query, parameterized by the node you're
+standing on, and synthesizes the edges live. Zero copy, always current, one
+source of truth — Ontology-Based Data Access in the graph's own terms.
+```python
+# Bind CO_HELD_WITH to a query over the operational store (here, any DB-API source)
+kg_virtual_edge_add(
+    edge_type="CO_HELD_WITH",
+    query="SELECT b AS to_id, shared FROM co_held WHERE a = ?",  # '?' sqlite, '%s' postgres
+    dsn_env="OUROBOROS_DSN",          # credentials by reference — never in the graph
+    source="id_slug",                  # company:NVDA -> bind "NVDA"
+    target_id_template="company:{value}",
+    prop_cols=["shared"], directions="both", ontology="market",
+)
+# now ordinary traversal unions stored + virtual edges; virtual ones carry _virtual:true
+kg_edges("company:NVDA", direction="out", ontology="market")
+# -> [{to: "company:AMD", type: "CO_HELD_WITH", properties: {shared: 12, _virtual: true}, …}]
+```
+Two properties keep it safe and simple:
+- **Read-only.** Virtual edges are never written, so they sidestep the whole
+  gated-write / event-log / compensation machinery. A binding is config; the
+  edges are a view. Nothing to invalidate, replay, or undo.
+- **Parameterized, never interpolated.** The SQL template is operator-authored;
+  the per-node value is always *bound* through the driver, never formatted into
+  the string. Credentials ride `dsn_env` (an env-var name), so secrets stay out
+  of the graph and out of version control.
+Bindings live as reserved-kind (`_VirtualEdge`) nodes *in the ontology*, so they
+travel with it and version alongside the schema. This is the seam for a
+schema-graph / data-graph split: keep the curated **schema** (types, contracts,
+doctrine) in a portable SQLite ontology, and **virtualize the populated extension**
+straight out of your system-of-record — no ETL, no drift.
+Tools: `kg_virtual_edge_add`, `kg_virtual_edges_list`, `kg_virtual_edge_remove`.
+---
 ## The safety gate: invariants vs. policy
 When you expose the graph for live mutation — especially to an AI agent over
@@ -541,18 +681,20 @@ Or hand-edit a client config (e.g. Claude Desktop):
 { "mcpServers": { "kgrdbms": { "command": "kgrdbms-mcp" } } }
 ```
-It exposes `kg_`-prefixed tools for reads (`kg_schema` — the vocabulary, meant to
-be called first; `kg_node_get`, `kg_nodes_by_kind`,
-`kg_neighborhood`, `kg_shortest_path`, `kg_descendants`, …), gated writes
-(`kg_node_upsert`, `kg_edge_add`, `kg_node_delete`, …), bulk composition
-(`kg_import` — a whole `{nodes, edges}` batch in one call, so an agent populates
-an ontology in a single tool call instead of dozens), RDF interop
-(`kg_rdf_export`, `kg_rdf_import` — see below), and the event log
-(`kg_events_tail`, `kg_event_revert`, `kg_replay`). Every write passes through
-the invariants + policy gate and is recorded — same engine, same file as the
-CLI. Every tool also takes an optional `ontology` name (omit for the default),
-and `kg_ontologies_list` / `kg_ontology_create` manage the registry — so an
-agent can discover, create, and route between ontologies entirely over MCP.
+It exposes `kg_`-prefixed tools over one engine. Reads — `kg_schema` (the
+vocabulary, meant to be called first), `kg_node_get`, `kg_find` (by kind and/or
+label), `kg_edges`, `kg_neighborhood`, `kg_shortest_path`, `kg_descendants` — each
+take an optional `ontologies=[...]` to fan out across many ontologies in one
+call. Then gated writes (`kg_node_upsert`, `kg_edge_add`, `kg_edge_remove`,
+`kg_node_delete`), bulk composition (`kg_import` — a whole `{nodes, edges}` batch in
+one call, so an agent populates an ontology in a single tool call instead of dozens),
+the cross-ontology backbone (`kg_link`, `kg_links_of`, `kg_identity`, `kg_prefix_add`,
+`kg_prefix_resolve`), RDF interop (`kg_rdf_export`, `kg_rdf_import` — see below), and
+the event log (`kg_events_tail`, `kg_event_revert`, `kg_replay`). Every write passes
+the invariants + policy gate and is recorded — same engine, same file as the CLI.
+Every tool takes an optional `ontology` name, and `kg_ontologies_list` /
+`kg_ontology_create` / `kg_ontology_delete` manage the registry — so an agent can
+discover, create, route between, and delete ontologies entirely over MCP.
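As a rough illustration of what an `ontologies=[...]` fan-out might look like, the sketch below runs one read per ontology on a thread pool and tags each row with its source. The `fan_out` helper and the `read_one` callback are hypothetical names; the package's own `federation.py` may differ, in particular in how it merges identities.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(read_one, ontologies, max_workers=8):
    """Call read_one(name) for every ontology in parallel and merge the rows.

    read_one is any callable returning a list of dict rows for one ontology.
    Each merged row gains an "ontology" key naming where it came from.
    """
    if not ontologies:
        return []
    workers = min(max_workers, len(ontologies))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the merge is deterministic
        # even though the reads themselves run concurrently.
        results = list(pool.map(read_one, ontologies))
    return [
        dict(row, ontology=name)
        for name, rows in zip(ontologies, results)
        for row in rows
    ]
```

Threads suit this shape of work because each read is dominated by database I/O rather than Python-level computation.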
 ---
@@ -573,7 +715,7 @@ that data by `bench/charts.py`. Run both on your own machine in one command.
 ### Writes — the batching lever
-![Write throughput — batch the commit, ~10× faster](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/master/assets/write_throughput.png)
+![Write throughput — batch the commit, ~10× faster](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/main/assets/write_throughput.png)
 Each single write commits on its own for durability. Wrapping a bulk load in
 `batch()` / `add_nodes` / `add_edges` collapses those per-call commits into one
@@ -584,7 +726,7 @@ thousands per second.
 ### Reads — fast, with an honest tail
-![Read latency — p50 marker, whisker to p99, log scale](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/master/assets/read_latency.png)
+![Read latency — p50 marker, whisker to p99, log scale](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/main/assets/read_latency.png)
 Point lookups land in single-digit microseconds, and multi-node reads hydrate
 the whole result set in a constant number of queries (no N+1 fan-out). The chart
@@ -598,7 +740,7 @@ SQLite engine runs under CPython, Node, and Bun, so the gap between them is pure
 binding overhead — under 2×, and it doesn't even favor one runtime across
 operations.
-![Same SQLite across CPython, Node, and Bun](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/master/assets/runtimes.png)
+![Same SQLite across CPython, Node, and Bun](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/main/assets/runtimes.png)
 The lever that actually moved the needle was transaction batching (~10×, above),
 not the language. Reproduce it with `python bench/runtimes/compare.py`.
@@ -608,7 +750,7 @@ not the language. Reproduce it with `python bench/runtimes/compare.py`.
 We measured it against Neo4j — same graph, same queries, identical methodology
 (full harness and reproduction in [`bench/neo4j/`](bench/neo4j/README.md)):
-![Where the crossover is — kgrdbms vs Neo4j](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/master/assets/crossover.png)
+![Where the crossover is — kgrdbms vs Neo4j](https://raw.githubusercontent.com/cunicopia-dev/knowledge-graph-rdbms/main/assets/crossover.png)
 Queries compile to SQL over B-tree indexes, so each traversal hop is an index
 lookup — wonderfully cheap for point reads and shallow traversals. An in-process
@@ -737,7 +879,18 @@ replayable.
 | `kg rdf export [--format F]`    | serialize to Turtle/N-Triples (RDF-star)      |
 | `kg rdf import FILE`            | load RDF back in (gated + logged)             |
 | `kg ontology list`              | list registered ontologies (the registry)     |
-| `kg ontology create NAME …`     | register an ontology (`--backend`, `--stance`) |
+| `kg ontology create NAME …`     | register an ontology (`--backend`, `--stance`, `--shared-identity`) |
+| `kg ontology delete NAME [--purge]` | deregister an ontology (`--purge` also deletes its data) |
+| `kg fed schema [--samples]`     | union vocabulary across ALL ontologies (multithreaded fan-out) |
+| `kg fed stats`                  | node/edge totals across the federation        |
+| `kg fed nodes-by-kind KIND` / `nodes-by-label LABEL` | nodes across ontologies, tagged by source |
+| `kg fed node ID`                | find an id across the federation (identity-aware) |
+| `kg link add FROM_ONT FROM TYPE TO_ONT TO` | cross-ontology edge (the backbone) |
+| `kg link same-as A FROM B TO`   | assert two nodes are the same entity (symmetric) |
+| `kg link of ONT ID`             | cross-ontology links touching a node          |
+| `kg link cluster ONT ID`        | the transitive SAME_AS identity cluster       |
+| `kg prefix add P IRI_BASE`      | bind a CURIE prefix to an IRI base            |
+| `kg prefix expand CURIE` / `contract IRI` | CURIE ↔ IRI via the registry        |
 | `kg serve [--transport T]`      | run the MCP server                            |
 Add `--json` to any command for machine-readable output. Target a graph with
@@ -756,6 +909,8 @@ kgrdbms/
 ├── invariants.py   # compiled-in invariants, checked before policy (no-op default)
 ├── service.py      # the shared gated + logged write path
 ├── resolver.py     # control plane: ontology name → (backend, events, entry) + the index
+├── federation.py   # cross-ontology reads: multithreaded fan-out, identity-aware merge
+├── backbone.py     # cross-ontology links + prefix/IRI registry (lives in the index graph)
 ├── backends/       # pluggable engine registry
 │   ├── base.py     #   GraphBackend protocol + raising stub skeleton
 │   ├── sqlite.py   #   live engine (adapter over Graph)
@@ -777,7 +932,7 @@ own. Everything else layers on top; `service.py` depends only on the
 ```bash
 git clone <repo> && cd knowledge-graph-rdbms
 uv venv && uv pip install -e ".[dev]"
-pytest                       # 62 tests
+pytest                       # 107 tests
 python bench/benchmark.py    # benchmark with p50–p99 (see bench/README.md)
 ```
