knowledge-graph-rdbms 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- knowledge_graph_rdbms-0.1.0/.claude/skills/kg-compose/SKILL.md +115 -0
- knowledge_graph_rdbms-0.1.0/.gitignore +30 -0
- knowledge_graph_rdbms-0.1.0/CLAUDE.md +110 -0
- knowledge_graph_rdbms-0.1.0/CODE_OF_CONDUCT.md +35 -0
- knowledge_graph_rdbms-0.1.0/CONTRIBUTING.md +72 -0
- knowledge_graph_rdbms-0.1.0/LICENSE +21 -0
- knowledge_graph_rdbms-0.1.0/PKG-INFO +746 -0
- knowledge_graph_rdbms-0.1.0/README.md +718 -0
- knowledge_graph_rdbms-0.1.0/SECURITY.md +56 -0
- knowledge_graph_rdbms-0.1.0/assets/crossover.png +0 -0
- knowledge_graph_rdbms-0.1.0/assets/read_latency.png +0 -0
- knowledge_graph_rdbms-0.1.0/assets/runtimes.png +0 -0
- knowledge_graph_rdbms-0.1.0/assets/write_throughput.png +0 -0
- knowledge_graph_rdbms-0.1.0/bench/README.md +144 -0
- knowledge_graph_rdbms-0.1.0/bench/benchmark.py +396 -0
- knowledge_graph_rdbms-0.1.0/bench/charts.py +354 -0
- knowledge_graph_rdbms-0.1.0/bench/neo4j/README.md +66 -0
- knowledge_graph_rdbms-0.1.0/bench/neo4j/headtohead.py +233 -0
- knowledge_graph_rdbms-0.1.0/bench/postgres/README.md +67 -0
- knowledge_graph_rdbms-0.1.0/bench/postgres/benchmark.py +301 -0
- knowledge_graph_rdbms-0.1.0/bench/postgres/charts.py +230 -0
- knowledge_graph_rdbms-0.1.0/bench/runtimes/compare.py +96 -0
- knowledge_graph_rdbms-0.1.0/bench/runtimes/run_bun.js +40 -0
- knowledge_graph_rdbms-0.1.0/bench/runtimes/run_node.mjs +44 -0
- knowledge_graph_rdbms-0.1.0/bench/runtimes/run_python.py +81 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/__init__.py +50 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/backends/__init__.py +69 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/backends/base.py +104 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/backends/neo4j.py +45 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/backends/postgres.py +525 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/backends/sqlite.py +23 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/cli.py +538 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/events.py +319 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/graph.py +749 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/invariants.py +45 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/mcp_server.py +417 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/policy.py +111 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/resolver.py +269 -0
- knowledge_graph_rdbms-0.1.0/kgrdbms/service.py +221 -0
- knowledge_graph_rdbms-0.1.0/pyproject.toml +41 -0
- knowledge_graph_rdbms-0.1.0/tests/test_bulk.py +174 -0
- knowledge_graph_rdbms-0.1.0/tests/test_cli.py +149 -0
- knowledge_graph_rdbms-0.1.0/tests/test_events.py +156 -0
- knowledge_graph_rdbms-0.1.0/tests/test_graph.py +98 -0
- knowledge_graph_rdbms-0.1.0/tests/test_mcp_server.py +190 -0
- knowledge_graph_rdbms-0.1.0/tests/test_policy.py +57 -0
- knowledge_graph_rdbms-0.1.0/tests/test_postgres.py +119 -0
@@ -0,0 +1,115 @@
---
name: kg-compose
description: Decompose documents or pasted context into a queryable kgrdbms ontology — extract entities and typed relationships, mint stable CURIE ids, and write them gated+logged into a named ontology via the `kg` CLI. Use when the user hands over source material (notes, docs, transcripts, research, pasted text) and wants it turned into a knowledge graph they can query, or says things like "compose this into an ontology", "build a KG from this", "extract the entities and relationships", "turn these notes into a graph".
---

# kg-compose — document → ontology

Turn unstructured source material into structured, queryable graph facts in a
named kgrdbms ontology. You are the extraction engine; the ontology supplies the
*opinion* (how aggressive to be), and every write is gated and logged so a wrong
call is reversible, not permanent.

## The one idea

**You are mechanism; the ontology is policy.** Don't impose a house style — read
the target ontology's `stance` and honor it. A `literal` legal-notes ontology and
an `inferential` research-notes ontology get *different* graphs from the same
paragraph, on purpose.

## Procedure

### 0. Resolve the target ontology
- If the user named one, use it. If not, propose a short kebab-case name from the
material and confirm.
- Check whether it exists: `kg --json ontology list`. If absent, create it:
`kg ontology create NAME --stance <literal|inferential> --description "…"`.
If present, **do not recreate it** — read its existing `stance`/`path` from the
list output and honor them.

### 1. Read the ontology's opinion
From the registry entry: `stance` (free-text extraction guidance — see *Stance*
below), `allowed_kinds` (if non-empty, prefer those kinds; extract others but
flag that they're outside the ontology's allowlist), `id_convention` (default
CURIE `prefix:slug`). These are the guidance you compose within.

### 2. Decompose the source into a `{nodes, edges}` model
- **Nodes** = the *things*. Each: `id` (a CURIE — see *Id rules*), `kind` (a
TitleCase type like `Person`, `Company`, `Method`), `name` (display string),
`labels` (set memberships), `properties` (JSON facts).
- **Edges** = the *relationships*. Each: `from`, `to`, `type` (UPPER_SNAKE verb
like `FOUNDED`, `MADE_WITH`, `REPORTS_TO`), and optional `properties` (facts
about the relationship itself — `year`, `confidence`, `source`).
- Let the **stance** govern how far past the literal text you go.
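
As a rough illustration of the model described above, the sketch below builds a small `{nodes, edges}` payload in Python and runs a local shape check on it. The field names come from the bullets above; the exact JSON schema that `kg import` accepts is an assumption, and `check_model`, `CURIE` and `UPPER_SNAKE` are hypothetical helpers written for this example, not part of kgrdbms.

```python
import json
import re

# Hypothetical shape checks for this example only (not kgrdbms code).
CURIE = re.compile(r"^[a-z][a-z0-9]*:[a-z0-9]+(-[a-z0-9]+)*$")
UPPER_SNAKE = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")

# Assumed payload layout: one "nodes" list and one "edges" list.
model = {
    "nodes": [
        {"id": "person:ada-lovelace", "kind": "Person", "name": "Ada Lovelace",
         "labels": ["Mathematician"], "properties": {"born": 1815}},
        {"id": "machine:analytical-engine", "kind": "Machine",
         "name": "Analytical Engine", "labels": [], "properties": {}},
    ],
    "edges": [
        {"from": "person:ada-lovelace", "to": "machine:analytical-engine",
         "type": "WROTE_ABOUT", "properties": {"year": 1843}},
    ],
}


def check_model(model):
    """Return a list of shape problems; an empty list means none were found."""
    problems = []
    for node in model["nodes"]:
        if not CURIE.match(node["id"]):
            problems.append(f"node id is not a slugged CURIE: {node['id']!r}")
    for edge in model["edges"]:
        if not UPPER_SNAKE.match(edge["type"]):
            problems.append(f"edge type is not UPPER_SNAKE: {edge['type']!r}")
    return problems


# Serialize once; this string is what a bulk import would consume.
payload = json.dumps(model, indent=2)
```

Checking shapes before writing keeps id and type mistakes out of the event log, even though a logged write could be reverted afterwards.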

### 3. Write it (bulk, gated, logged — ONE call, not N)
Write the whole `{nodes, edges}` model in a single bulk operation. **Do not emit
dozens of individual upsert calls** — use the bulk path:

- **Over MCP:** call `kg_import(nodes=[...], edges=[...], ontology="NAME")` once.
- **Over the CLI:** `kg --ontology NAME import /tmp/compose.json --actor kg-compose`.

Both run the same gated + logged path inside one transaction — fast *and* fully
recorded, so it survives replay. Every node/edge is still individually gated and
reversible; bulk only collapses the commit, not the gate. Re-running on
overlapping sources is safe: stable CURIE ids make re-imports **merge**, never
duplicate. For a very large source, chunk into a few `kg_import` calls rather
than one per entity.
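
The CLI route above can be sketched in Python as follows. This is a minimal sketch, assuming the model is written to a JSON file first; `write_model` and `import_argv` are hypothetical helper names, and only the flags shown in the CLI bullet above are used.

```python
import json
import tempfile
from pathlib import Path


def write_model(model, directory=None):
    """Write the {nodes, edges} model to compose.json and return its path."""
    directory = Path(directory or tempfile.mkdtemp())
    path = directory / "compose.json"
    path.write_text(json.dumps(model, indent=2), encoding="utf-8")
    return path


def import_argv(ontology, path, actor="kg-compose"):
    """Build the argv for the single bulk import command shown above."""
    return ["kg", "--ontology", ontology, "import", str(path), "--actor", actor]


model = {"nodes": [{"id": "drink:latte", "kind": "Drink", "name": "Latte"}], "edges": []}
path = write_model(model)
argv = import_argv("coffee", path)
```

With the `kg` CLI installed, `subprocess.run(argv, check=True)` would perform the import in one call; the sketch stops short of running it so that it has no side effects.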

### 4. Verify and hand back the receipt
- `kg --ontology NAME stats` — what landed.
- `kg --ontology NAME path A B` (or a `neighbors`/`out` query) — show it's
actually connected, not just a pile of nodes.
- `kg --ontology NAME events -n 10` — the audit trail of this composition.
- Summarize in prose: counts, the key entities/relationships, and anything you
inferred (so the user can see and revert it).

## Id rules (CURIEs)
- `id = prefix:reference`. `prefix` is a short stable lowercase type token
(`person`, `company`, `paper`); `reference` is the slugged name.
- Mint with the slug discipline: "Ada Lovelace" → `person:ada-lovelace`. Two
spellings that slug the same **must** get the same id — that's how the same
entity across two documents becomes one node.
- The id is an **address, not a record**: identity goes in the id, mutable facts
go in `properties`. Never bake `status=active` into an id.
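
The slug discipline above can be approximated with a few lines of Python. This is an illustrative stand-in, not the library's own `slug()`: the name `mint_id` and the exact normalization steps (ASCII folding, lowercasing, collapsing non-alphanumerics to hyphens) are assumptions chosen to reproduce the "Ada Lovelace" example.

```python
import re
import unicodedata


def mint_id(name: str, prefix: str) -> str:
    """Build a `prefix:reference` id from a natural-language name."""
    # Fold accented characters to their ASCII base where possible.
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    reference = re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")
    if not reference:
        raise ValueError(f"cannot build an id from {name!r}")
    return f"{prefix.lower()}:{reference}"


first = mint_id("Ada Lovelace", "person")
second = mint_id("  ada   LOVELACE ", "person")
```

Both calls produce `person:ada-lovelace`, which is the property the rules rely on: different spellings of one name land on one id.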

## Stance: the ontology's extraction guidance

`stance` is **free-text guidance the ontology carries about how to extract** — a
dial, not a switch, and not a fixed enum. Read it and apply judgment; the
ontology's own words win. Common values, just to convey the range (not a menu to
pick from):

- **`literal`** — assert only what the text states outright: entities only if
named, relationships only if stated, never guess two mentions are the same.
Fits legal, technical, contractual sources.
- **`inferential`** — also assert what's clearly implied, resolve aliases
("Ada" / "Lovelace" / "the Countess" → one node), normalize and enrich. Fits
research notes, brainstorming, exploratory reading.
- …or whatever the ontology actually says — `"conservative; medical terms must
be exact"`, `"connect aggressively, this is ideation"`. Honor the intent, not
a keyword.

**The floor (always, regardless of stance):** when you assert something the
source didn't state outright, mark it — add `{"inferred": true}` (and a `source`
snippet when useful) to that node/edge's properties. An inference is always
visible and revertible, never laundered into a stated fact. This isn't a stance
choice; it's the one non-negotiable.
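
One way to apply the floor consistently is a small helper that stamps the marker onto a node or edge before it goes into the model. The helper name `mark_inferred` is hypothetical; only the `inferred` and `source` property keys come from the text above.

```python
from typing import Optional


def mark_inferred(item: dict, source: Optional[str] = None) -> dict:
    """Return a copy of a node/edge dict whose properties flag it as inferred."""
    props = dict(item.get("properties") or {})
    props["inferred"] = True
    if source:
        # Keep the snippet that motivated the inference, when there is one.
        props["source"] = source
    return {**item, "properties": props}


stated = {"from": "person:ada-lovelace", "to": "machine:analytical-engine",
          "type": "WROTE_ABOUT"}
inferred = mark_inferred(stated, source="her notes on the engine")
```

The helper returns a copy, so the stated version of an item is never silently turned into an inferred one.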

## Guardrails
- **Reversible, so don't freeze up.** Every write is logged; a bad edge is one
`kg --ontology NAME revert <event-id>` away. Compose confidently, then review.
- **Respect `allowed_kinds`.** If the ontology constrains kinds, stay inside them
or surface what you'd have to add.
- **Don't put data in ids.** (See *Id rules*.)
- **One ontology per coherent domain.** If the source clearly spans two unrelated
domains, ask whether to split into two ontologies rather than blending them.

## Anti-patterns
- Minting ids by hand that bypass slug discipline (`person:Ada` vs
`person:ada-lovelace`) — breaks cross-document dedup.
- Inventing relationships under `literal` stance.
- Encoding a whole sentence as a node `name` instead of extracting the entity.
- Emitting one write per entity — dozens of `kg_node_upsert`/`kg node add` calls —
when a single `kg_import` (MCP) or `kg import` (CLI) does the whole batch in one
gated, logged, atomic transaction. This is the most common mistake; don't.
@@ -0,0 +1,30 @@
# Python
__pycache__/
*.py[cod]
*.egg-info/
.eggs/
build/
dist/
.venv/
venv/

# Test / tooling
.pytest_cache/
.mypy_cache/
.ruff_cache/
.coverage
htmlcov/

# Local graph databases
*.db
*.db-wal
*.db-shm

# Scratch / generated artifacts (benchmark JSON, charts).
# Promote a chart worth keeping into assets/ and commit that explicitly.
temp/

# OS / editor
.DS_Store
.idea/
.vscode/
@@ -0,0 +1,110 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What this is

A label property graph (nodes, typed directed edges, labels, JSON properties) stored in SQLite. The core library has **zero third-party dependencies** — everything is stdlib + SQLite. Three front doors (Python library, `kg` CLI, MCP server) sit on one gated + logged engine, and a **control plane** (`resolver.py` + the `backends/` package) lets all three address *many named ontologies* — each its own file (or, eventually, its own engine) — through one interface. See `README.md` for the full design narrative.

## Commands

```bash
uv venv && uv pip install -e ".[dev]" # set up dev env (installs pytest + mcp)

pytest # run all tests (~62)
pytest tests/test_graph.py # one file
pytest tests/test_graph.py::test_name # one test
pytest -k events # by keyword

python bench/benchmark.py # perf, full p50–p99 distributions
python bench/charts.py # render assets/*.png from bench data (needs [charts])
python bench/runtimes/compare.py # CPython vs Node vs Bun SQLite comparison

kg stats # default ontology (~/.kgrdbms/graph.db)
kg ontology list # the registry (the "db of dbs")
kg ontology create coffee --stance inferential # register a named ontology
kg --ontology coffee node add drink:latte --kind Drink # route to it (resolver)
kg --db /tmp/x.db node add a:1 --kind T # raw escape hatch: exact file, no registry
kg serve # run the MCP server (needs [mcp] extra)
```

There is no linter/formatter configured — match the surrounding style (type hints, `from __future__ import annotations`, dataclasses).

## Architecture: the load-bearing ideas

**Two write paths, and the distinction is the whole point.**

- **Direct path** — `graph.py` methods (`g.add_node`, `g.add_nodes`, `g.batch()`). Writes go straight to the SQLite projection. Fast, but **not gated and not logged** — `replay()` will not reproduce them. Use for bulk loading raw data.
- **Logged path** — everything in `service.py` (used by the CLI and MCP server). Each mutation is gated, then applied, then appended to the `graph_events` log. Audited, reversible, replayable.

When you add or change a mutation, decide which path it belongs to. A new gated operation must go through `service.py`, not directly on `Graph`.

**The graph is a projection; the event log is the source of truth.** `events.py` holds an append-only log. `replay(graph, events, genesis=...)` rebuilds the projection from an optional declarative seed + the log, optionally `upto_ts=` for time travel. Undo is `compensate()` — it appends the *inverse* event, never deletes a row. If you add a new logged operation, you must also make it replayable and compensatable in `events.py` (add an `OP_*` constant + apply/compensate handling).
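
To make the projection/log relationship concrete, here is a toy in-memory model in Python. It is a sketch under simplifying assumptions and does not mirror `events.py`: `ToyLog`, the two `OP_*` constants and the `INVERSE` table are invented for the example, and time travel uses a sequence number (`upto_seq`) rather than the `upto_ts=` timestamp mentioned above.

```python
from dataclasses import dataclass, field

# Hypothetical operation names for this toy model.
OP_ADD_NODE = "add_node"
OP_REMOVE_NODE = "remove_node"
INVERSE = {OP_ADD_NODE: OP_REMOVE_NODE, OP_REMOVE_NODE: OP_ADD_NODE}


@dataclass
class ToyLog:
    events: list = field(default_factory=list)

    def record(self, op, node_id):
        """Append an event; rows are never updated or deleted."""
        event = {"seq": len(self.events) + 1, "op": op, "node_id": node_id}
        self.events.append(event)
        return event

    def compensate(self, seq):
        """Undo event `seq` by appending its inverse as a new event."""
        original = self.events[seq - 1]
        return self.record(INVERSE[original["op"]], original["node_id"])


def replay(events, upto_seq=None):
    """Rebuild the projection (here: a set of node ids) from the log."""
    nodes = set()
    for event in events:
        if upto_seq is not None and event["seq"] > upto_seq:
            break
        if event["op"] == OP_ADD_NODE:
            nodes.add(event["node_id"])
        else:
            nodes.discard(event["node_id"])
    return nodes


log = ToyLog()
log.record(OP_ADD_NODE, "person:ada-lovelace")
log.record(OP_ADD_NODE, "company:apple")
log.compensate(2)  # appends a third event instead of deleting the second
```

After the compensation the log holds three events, the current projection contains only `person:ada-lovelace`, and replaying up to event 2 still shows both nodes.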

**The two-layer mutation gate, in order.** `service.guard()` runs `invariants.enforce()` **before** `policy.mutation_check()`:

- `invariants.py` = mechanism. Compiled-in, no off switch, changing one is a code change. Default: no-op.
- `policy.py` = configuration. A single `mutation_check(ctx) -> Decision`. Default: permissive (allow all). The file has commented example policies at the bottom.

Order matters: a permissive or compromised policy can never re-open something an invariant sealed. Failures raise `InvariantViolation` (invariant) vs `PermissionError` (policy) — keep these distinct.

**Hooks resolve through their modules at call time.** `guard()` calls `invariants.enforce` / `policy.mutation_check` via the module, not a captured reference — so editing policy (or monkeypatching it in a test) takes effect across all three front doors at once. Don't `from policy import mutation_check` into the service and call it directly; that would break this.
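
The gate order and the call-time lookup can be sketched together. This is a simplified stand-in, not the code in `service.py`: the two modules are replaced by `types.SimpleNamespace` objects, the `Decision` return type is reduced to a plain boolean, and the `InvariantViolation` class here is a local placeholder.

```python
import types


class InvariantViolation(Exception):
    """Local placeholder for the invariant failure type."""


# Stand-ins for the invariants and policy modules (contents are hypothetical).
invariants = types.SimpleNamespace(enforce=lambda ctx: None)
policy = types.SimpleNamespace(mutation_check=lambda ctx: True)


def guard(ctx):
    """Run the invariant layer first, then the policy layer."""
    # Attribute lookup happens on every call, so a replaced hook takes effect.
    invariants.enforce(ctx)
    if not policy.mutation_check(ctx):
        raise PermissionError(f"policy denied {ctx['op']}")


def deny_all(ctx):
    return False


def sealed(ctx):
    raise InvariantViolation(f"{ctx['op']} is sealed")
```

Assigning `policy.mutation_check = deny_all` changes the next `guard()` call without touching `guard` itself, and installing `sealed` as `invariants.enforce` raises `InvariantViolation` before the policy is ever consulted.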

## Control plane: ontologies, the resolver, the backend registry

The engine above (`graph.py` + `service.py` + the log + the gate) operates on **one** graph. The control plane lets the three front doors address **many named ontologies** through that same engine, without any of them knowing where an ontology physically lives.

**`resolver.py` maps an ontology *name* → a `Resolved(backend, events, entry)` bundle.** Callers say `resolve("coffee")`; the resolver looks the name up in an **index** (itself a kg — `<root>/index.db`, nodes of kind `Ontology`), opens the right backend, pairs it with an event log, and hands back the bundle. The front doors then call the *same* `service.*` functions as before — the gate and the log didn't move, only *which* `(graph, events)` pair gets passed in. That's why multi-tenancy was a small change: the single write path was already the choke point.

- **The default ontology is the legacy file.** `resolver._default_db_path` special-cases the default name → `<root>/graph.db`, so omitting `--ontology` / the `ontology` arg behaves exactly as the pre-resolver code did. Backward-compat lives in that one function; both front doors inherit it. Don't scatter default-path logic elsewhere.
- **Isolation is filesystem-shaped, not code-shaped.** Each named ontology is its own SQLite file under `<root>/ontologies/<slug>/graph.db`, with its own event log. No tenant-id columns, no row filtering — "coffee doesn't know Ada" because they're different files.
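
The file layout described in the two bullets above can be expressed as a single path function. This is an illustration only: `db_path` is a hypothetical name, and `DEFAULT_NAME = "default"` is an assumption because the actual default ontology name is not stated here.

```python
from pathlib import Path

# Assumption: the real default ontology name is not given in this document.
DEFAULT_NAME = "default"


def db_path(root: Path, name: str = DEFAULT_NAME) -> Path:
    """Map an ontology name (already slugged) to its SQLite file under root."""
    if name == DEFAULT_NAME:
        # Legacy location: the default ontology lives directly under the root.
        return root / "graph.db"
    return root / "ontologies" / name / "graph.db"


root = Path.home() / ".kgrdbms"
default_file = db_path(root)
coffee_file = db_path(root, "coffee")
```

Keeping the special case in one function matches the advice above: every caller gets the same answer for the default ontology without repeating the rule.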

**`backends/` is the pluggable data plane (engine registry).** An engine is a factory `(*, location, **opts) -> GraphBackend` registered with `@backend("name")`. Adding one = write a module + decorate its factory + add one import line in `backends/__init__.py`. **No switch to edit** — `resolver._open_backend` does `get_backend(entry.backend)(location=...)` and never knows which engines exist.

- `backends/base.py` — the `GraphBackend` Protocol (the finite method surface `service.py` + reads depend on; `Graph` satisfies it *structurally*, zero changes to `graph.py`) and `_StubBackend` (raising skeleton so a half-built engine is still a routable, fail-loud `GraphBackend`).
- `sqlite.py` and `postgres.py` are **live**; `neo4j.py` is a **stub** (routes and fails per-call with what's missing; its docstring is the ADR for how it'd be built). `postgres.py` is the scale-up: same five-table model + query shapes ported to psycopg (`%s`, `ON CONFLICT`, `jsonb` properties, `= ANY(%s)`, the recursive-CTE `descendants`, BFS traversals reused verbatim). It needs the `postgres` extra (`psycopg`), imported lazily so `import kgrdbms.backends` works without it; a missing driver raises `NotImplementedError`. `location` for postgres is a DSN, not a file path — `register()` requires it for any non-sqlite backend.
- **The event log is decoupled from the backend.** `EventLog(store, projection=None)`: *store* is the SQLite that holds the log rows; *projection* is the `GraphBackend` that `compensate()`/replay apply to. They coincide for sqlite (`EventLog(graph)` — projection defaults to store, unchanged). For postgres the store is `resolver._ControlPlaneLogStore` (a `<root>/ontologies/<slug>/events.db` sidecar) and the projection is the `PostgresGraph` — so audit/replay/undo keep working with graph data in Postgres and history in SQLite. `apply_event` only calls `GraphBackend` methods, so it drives any backend. This store↔projection split is the seam to respect for *any* non-sqlite engine (neo4j next).
- **New failure class:** routing to a stub engine raises `NotImplementedError`. Every front door's error handling must account for it (the CLI's `main()` already maps it to `unavailable: …` / exit 1).

## Layout

```
kgrdbms/
├── graph.py # the LPG over SQLite — imports nothing internal; usable standalone
├── events.py # append-only event log: OP_* constants, record, compensate, replay
├── policy.py # configurable mutation policy (edit mutation_check)
├── invariants.py # compiled-in invariants, run before policy (no-op default)
├── service.py # the shared gated + logged write path (all front doors use this)
├── resolver.py # control plane: name → (backend, events, entry); the ontology index
├── backends/ # pluggable data plane (engine registry)
│   ├── base.py # GraphBackend Protocol + _StubBackend
│   ├── __init__.py # registry: @backend(name), get_backend, available_backends
│   ├── sqlite.py # live engine (adapter over Graph)
│   ├── postgres.py # live engine (psycopg; jsonb + recursive CTEs); needs [postgres] extra
│   └── neo4j.py # stub (deep-traversal escalation)
├── cli.py # the `kg` command (stdlib argparse) — `--ontology` / `--db` / `kg ontology …`
└── mcp_server.py # MCP server, kg_-prefixed tools, each with optional `ontology=` (optional [mcp] extra)
```

`graph.py` has no internal dependencies — everything else layers on top of it. Dependency direction: `graph` ← `events`/`backends` ← `resolver` ← `service`-callers (`cli`, `mcp_server`). `service.py` depends only on the `GraphBackend` surface, never a concrete engine. Public API is re-exported from `__init__.py`.

## Node id convention (CURIEs)

Node ids follow `prefix:reference` — `person:ada-lovelace`, `company:apple`, `card:abc123`. This is deliberately a **CURIE** (a compact URI: the prefix is shorthand that expands to a full IRI through a lookup table you only need the day you publish to the RDF/linked-data world). Adopting the shape now keeps interop a cheap, additive option later; until then a CURIE stands alone as a plain string. Three rules:

1. **`prefix`** is a short, stable, lowercase type word (`person`, `company`, `card`, `device`) — not the `kind` field's exact casing, just a stable token.
2. **`reference`** is slugged. Mint ids with `slug(name, prefix="person")` → `person:ada-lovelace`. **`slug()` is the CURIE constructor and the dedup mechanism** — but `add_node` does *not* auto-slug the id it's handed, so an id minted by hand (`"person:Ada"`) will *not* collapse with `slug()`-minted ones. Always go through `slug()` when the local part comes from natural language.
3. **The id is an address, not a record.** Put identity in the id (`company:apple`), never mutable attributes (`status=active`) — those are node *properties*. Mental model: the id is a URL's *path* (stable), properties are its *query string* (changeable). Baking a changeable fact into an id breaks identity when the fact changes.

**Namespacing is free and you don't type it.** Each ontology is its own file with its own registry name, so "which world a node came from" is already known — the ontology *is* the namespace. If two ontologies are ever merged, qualify by ontology name; you don't pre-encode it in the id. An ontology that genuinely needs strict global identity sets its `id_convention` on the registry entry (currently descriptive metadata; the per-ontology enforcement seam, not yet wired to a validator) and adopts fuller CURIE/IRI discipline without affecting the others.

**No RDF stack.** This is the *only* RDF idea adopted (identity, because it's the one expensive-to-retrofit decision). No OWL, no SPARQL, no triplestore-as-storage. Interop (Turtle/JSON-LD) and vocabulary borrowing (SKOS, PROV-O) are deferred until a real external consumer exists — both are additive at the boundary, store LPG inside.

## Conventions that bite

- **Edges are unique on `(from_node, type, to_node)`.** Re-adding the same triple updates properties rather than duplicating — mutations are idempotent by construction. Tests rely on this.
- **`slug()` deduplicates natural-language ids** — two strings that slugify the same collapse to one node id (see *Node id convention* above; `add_node` does not auto-slug).
- **Properties round-trip as JSON.** Storage is `value_json`; ints/bools/lists/objects come back as their JSON type. CLI `--prop key=value` parses value as JSON when possible, else keeps it as a string.
- **Per-call writes each commit (one fsync).** Wrapping work in `batch()` / using `add_nodes` / `add_edges` collapses to one transaction (~10× faster). Don't add a per-row commit inside a bulk loop.
- **Reads are not in `service.py`** by design — callers hit `Graph` directly. Don't route reads through the gate.
- **CLI exit codes are contractual:** `0` ok, `1` not found / bad input, `2` policy denial, `3` invariant violation. Preserve these.
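
The `--prop key=value` rule from the list above ("JSON when possible, else a string") can be sketched as follows. `parse_prop` is a hypothetical helper written for illustration; it is not taken from `cli.py`.

```python
import json


def parse_prop(arg: str):
    """Split `key=value` and decode the value as JSON when it parses."""
    key, sep, raw = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {arg!r}")
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        # Not valid JSON (for example a bare word): keep the raw string.
        value = raw
    return key, value


year = parse_prop("year=1843")
active = parse_prop("active=true")
name = parse_prop("name=Ada")
tags = parse_prop('tags=["a", "b"]')
```

Under this rule `year=1843` yields an integer, `active=true` a boolean, `name=Ada` a string, and the `tags` value a list, which is what lets properties come back with their JSON types.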
@@ -0,0 +1,35 @@
# Code of Conduct

## The short version

Be kind, be constructive, assume good faith. This is a small project and a
welcoming one — technical disagreement is good, personal hostility is not.

## Expected behavior

- Be respectful of differing viewpoints and experience levels.
- Give and accept constructive feedback gracefully.
- Focus on what's best for the project and its users.
- Show empathy toward other community members.

## Unacceptable behavior

Harassment, discriminatory or derogatory comments, personal or political attacks,
publishing others' private information, or other conduct that would reasonably be
considered inappropriate in a professional setting.

## Scope

Applies in all project spaces — issues, pull requests, discussions — and when
representing the project in public spaces.

## Enforcement

Report unacceptable behavior privately to the maintainer via GitHub (see
`SECURITY.md` for the private-contact route). Reports will be reviewed and
handled in good faith; the maintainer may remove, edit, or reject contributions
and comments that violate this code, and may ban contributors for behavior they
deem inappropriate.

This Code of Conduct is adapted in spirit from the
[Contributor Covenant](https://www.contributor-covenant.org/).
@@ -0,0 +1,72 @@
# Contributing

Thanks for your interest. This is a small, opinionated project — the goal is a
knowledge graph you can hold in your head, so contributions that keep it legible
are worth more than ones that add surface area.

## Setup

```bash
git clone https://github.com/cunicopia-dev/knowledge-graph-rdbms
cd knowledge-graph-rdbms
uv venv && uv pip install -e ".[dev]" # pytest + mcp
pytest # should be all green
```

The Postgres backend and its tests are optional — they need the `postgres` extra
and a reachable Postgres (`pip install -e ".[dev,postgres]"`); the suite skips
those tests cleanly when no database is available.

## The rules that matter here

These aren't style nits — they're the load-bearing invariants the design depends
on. A change that breaks one of these will be asked to change, no matter how nice
it looks.

1. **The core stays zero-dependency.** `kgrdbms` (the library, CLI, engine) imports
only the standard library + SQLite. Third-party deps live behind extras
(`mcp`, `postgres`, `charts`) and are imported lazily. Don't add a hard
dependency to the core.

2. **Mutations go through the gate.** Any new *logged* operation must go through
`service.py` (which runs `invariants.enforce` then `policy.mutation_check`,
then records the event) — not directly on a backend. The direct `Graph`
methods are the unlogged fast path, on purpose; know which one you're adding to.

3. **A new logged op must be replayable and reversible.** If you add an operation
to the event log, add its `OP_*` constant plus `apply_event` and `compensate`
handling in `events.py`. "Every mutation can be replayed and undone" is a
guarantee, not a nice-to-have.

4. **Backends register; they don't get switched on.** A new engine is a module in
`backends/` with a factory decorated `@backend("name")` and one import line in
`backends/__init__.py`. It implements the `GraphBackend` protocol. No `if
engine == ...` ladders anywhere.

5. **Node ids are CURIEs** (`prefix:reference`, minted via `slug()`). See the
convention in `CLAUDE.md` / the README before inventing id shapes.

6. **CLI exit codes are contractual:** `0` ok · `1` not found / bad input · `2`
policy denial · `3` invariant violation · (`unavailable`/1 for an unbuilt
backend). Preserve them.
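
The exit-code contract in rule 6 amounts to a mapping from failure classes to integers. The sketch below shows one possible shape, not the CLI's actual `main()`: `run` is a hypothetical wrapper, `InvariantViolation` is a local placeholder, and treating `KeyError`/`ValueError` as "not found / bad input" is an assumption.

```python
class InvariantViolation(Exception):
    """Local placeholder for the invariant failure type."""


EXIT_OK, EXIT_ERROR, EXIT_POLICY, EXIT_INVARIANT = 0, 1, 2, 3


def run(action) -> int:
    """Call `action` and translate its outcome into a contractual exit code."""
    try:
        action()
    except InvariantViolation:
        return EXIT_INVARIANT
    except PermissionError:
        return EXIT_POLICY
    except NotImplementedError:
        # An unbuilt backend is reported as "unavailable" with exit code 1.
        return EXIT_ERROR
    except (KeyError, ValueError):
        # Assumed stand-ins for "not found" and "bad input".
        return EXIT_ERROR
    return EXIT_OK


def failing(exc):
    """Build a zero-argument action that raises `exc` (test helper)."""
    def action():
        raise exc
    return action
```

Keeping the mapping in one place makes the contract easy to test: each failure class has exactly one branch, and policy denials stay distinguishable from invariant violations.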

`CLAUDE.md` documents the architecture in more depth — it's worth a read before a
non-trivial change.

## Style

There's no linter or formatter configured — match the surrounding code:
`from __future__ import annotations`, type hints, dataclasses, clear names over
clever ones. Reads are not gated; writes are. Keep tests close to the behavior
they describe.

## Pull requests

- Keep the diff focused; one idea per PR.
- Add or update tests — `pytest` should pass, and new behavior should be covered.
- If you change a public behavior, update the README and `CLAUDE.md` to match.
- Describe *why*, not just *what*, in the PR body.

Small fixes and docs improvements are very welcome. For larger changes (a new
backend, a new logged operation, a change to the gate), open an issue first so we
can agree on the shape before you build it.
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Keith Cunic

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.