knowledge-graph-rdbms 0.1.0__tar.gz

Files changed (47)
  1. knowledge_graph_rdbms-0.1.0/.claude/skills/kg-compose/SKILL.md +115 -0
  2. knowledge_graph_rdbms-0.1.0/.gitignore +30 -0
  3. knowledge_graph_rdbms-0.1.0/CLAUDE.md +110 -0
  4. knowledge_graph_rdbms-0.1.0/CODE_OF_CONDUCT.md +35 -0
  5. knowledge_graph_rdbms-0.1.0/CONTRIBUTING.md +72 -0
  6. knowledge_graph_rdbms-0.1.0/LICENSE +21 -0
  7. knowledge_graph_rdbms-0.1.0/PKG-INFO +746 -0
  8. knowledge_graph_rdbms-0.1.0/README.md +718 -0
  9. knowledge_graph_rdbms-0.1.0/SECURITY.md +56 -0
  10. knowledge_graph_rdbms-0.1.0/assets/crossover.png +0 -0
  11. knowledge_graph_rdbms-0.1.0/assets/read_latency.png +0 -0
  12. knowledge_graph_rdbms-0.1.0/assets/runtimes.png +0 -0
  13. knowledge_graph_rdbms-0.1.0/assets/write_throughput.png +0 -0
  14. knowledge_graph_rdbms-0.1.0/bench/README.md +144 -0
  15. knowledge_graph_rdbms-0.1.0/bench/benchmark.py +396 -0
  16. knowledge_graph_rdbms-0.1.0/bench/charts.py +354 -0
  17. knowledge_graph_rdbms-0.1.0/bench/neo4j/README.md +66 -0
  18. knowledge_graph_rdbms-0.1.0/bench/neo4j/headtohead.py +233 -0
  19. knowledge_graph_rdbms-0.1.0/bench/postgres/README.md +67 -0
  20. knowledge_graph_rdbms-0.1.0/bench/postgres/benchmark.py +301 -0
  21. knowledge_graph_rdbms-0.1.0/bench/postgres/charts.py +230 -0
  22. knowledge_graph_rdbms-0.1.0/bench/runtimes/compare.py +96 -0
  23. knowledge_graph_rdbms-0.1.0/bench/runtimes/run_bun.js +40 -0
  24. knowledge_graph_rdbms-0.1.0/bench/runtimes/run_node.mjs +44 -0
  25. knowledge_graph_rdbms-0.1.0/bench/runtimes/run_python.py +81 -0
  26. knowledge_graph_rdbms-0.1.0/kgrdbms/__init__.py +50 -0
  27. knowledge_graph_rdbms-0.1.0/kgrdbms/backends/__init__.py +69 -0
  28. knowledge_graph_rdbms-0.1.0/kgrdbms/backends/base.py +104 -0
  29. knowledge_graph_rdbms-0.1.0/kgrdbms/backends/neo4j.py +45 -0
  30. knowledge_graph_rdbms-0.1.0/kgrdbms/backends/postgres.py +525 -0
  31. knowledge_graph_rdbms-0.1.0/kgrdbms/backends/sqlite.py +23 -0
  32. knowledge_graph_rdbms-0.1.0/kgrdbms/cli.py +538 -0
  33. knowledge_graph_rdbms-0.1.0/kgrdbms/events.py +319 -0
  34. knowledge_graph_rdbms-0.1.0/kgrdbms/graph.py +749 -0
  35. knowledge_graph_rdbms-0.1.0/kgrdbms/invariants.py +45 -0
  36. knowledge_graph_rdbms-0.1.0/kgrdbms/mcp_server.py +417 -0
  37. knowledge_graph_rdbms-0.1.0/kgrdbms/policy.py +111 -0
  38. knowledge_graph_rdbms-0.1.0/kgrdbms/resolver.py +269 -0
  39. knowledge_graph_rdbms-0.1.0/kgrdbms/service.py +221 -0
  40. knowledge_graph_rdbms-0.1.0/pyproject.toml +41 -0
  41. knowledge_graph_rdbms-0.1.0/tests/test_bulk.py +174 -0
  42. knowledge_graph_rdbms-0.1.0/tests/test_cli.py +149 -0
  43. knowledge_graph_rdbms-0.1.0/tests/test_events.py +156 -0
  44. knowledge_graph_rdbms-0.1.0/tests/test_graph.py +98 -0
  45. knowledge_graph_rdbms-0.1.0/tests/test_mcp_server.py +190 -0
  46. knowledge_graph_rdbms-0.1.0/tests/test_policy.py +57 -0
  47. knowledge_graph_rdbms-0.1.0/tests/test_postgres.py +119 -0
@@ -0,0 +1,115 @@
1
+ ---
2
+ name: kg-compose
3
+ description: Decompose documents or pasted context into a queryable kgrdbms ontology — extract entities and typed relationships, mint stable CURIE ids, and write them gated+logged into a named ontology via the `kg` CLI. Use when the user hands over source material (notes, docs, transcripts, research, pasted text) and wants it turned into a knowledge graph they can query, or says things like "compose this into an ontology", "build a KG from this", "extract the entities and relationships", "turn these notes into a graph".
4
+ ---
5
+
6
+ # kg-compose — document → ontology
7
+
8
+ Turn unstructured source material into structured, queryable graph facts in a
9
+ named kgrdbms ontology. You are the extraction engine; the ontology supplies the
10
+ *opinion* (how aggressive to be), and every write is gated and logged so a wrong
11
+ call is reversible, not permanent.
12
+
13
+ ## The one idea
14
+
15
+ **You are mechanism; the ontology is policy.** Don't impose a house style — read
16
+ the target ontology's `stance` and honor it. A `literal` legal-notes ontology and
17
+ an `inferential` research-notes ontology get *different* graphs from the same
18
+ paragraph, on purpose.
19
+
20
+ ## Procedure
21
+
22
+ ### 0. Resolve the target ontology
23
+ - If the user named one, use it. If not, propose a short kebab-case name from the
24
+ material and confirm.
25
+ - Check whether it exists: `kg --json ontology list`. If absent, create it:
26
+ `kg ontology create NAME --stance <literal|inferential> --description "…"`.
27
+ If present, **do not recreate it** — read its existing `stance`/`path` from the
28
+ list output and honor them.
29
+
30
+ ### 1. Read the ontology's opinion
31
+ From the registry entry: `stance` (free-text extraction guidance — see *Stance*
32
+ below), `allowed_kinds` (if non-empty, prefer those kinds; extract others but
33
+ flag that they're outside the ontology's allowlist), `id_convention` (default
34
+ CURIE `prefix:slug`). These are the guidance you compose within.
35
+
36
+ ### 2. Decompose the source into a `{nodes, edges}` model
37
+ - **Nodes** = the *things*. Each: `id` (a CURIE — see *Id rules*), `kind` (a
38
+ TitleCase type like `Person`, `Company`, `Method`), `name` (display string),
39
+ `labels` (set memberships), `properties` (JSON facts).
40
+ - **Edges** = the *relationships*. Each: `from`, `to`, `type` (UPPER_SNAKE verb
41
+ like `FOUNDED`, `MADE_WITH`, `REPORTS_TO`), and optional `properties` (facts
42
+ about the relationship itself — `year`, `confidence`, `source`).
43
+ - Let the **stance** govern how far past the literal text you go.
44
+
45
+ ### 3. Write it (bulk, gated, logged — ONE call, not N)
46
+ Write the whole `{nodes, edges}` model in a single bulk operation. **Do not emit
47
+ dozens of individual upsert calls** — use the bulk path:
48
+
49
+ - **Over MCP:** call `kg_import(nodes=[...], edges=[...], ontology="NAME")` once.
50
+ - **Over the CLI:** `kg --ontology NAME import /tmp/compose.json --actor kg-compose`.
51
+
52
+ Both run the same gated + logged path inside one transaction — fast *and* fully
53
+ recorded, so it survives replay. Every node/edge is still individually gated and
54
+ reversible; bulk only collapses the commit, not the gate. Re-running on
55
+ overlapping sources is safe: stable CURIE ids make re-imports **merge**, never
56
+ duplicate. For a very large source, chunk into a few `kg_import` calls rather
57
+ than one per entity.
58
+
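Reviewer sketch of the `{nodes, edges}` model that steps 2 and 3 describe. The field names come from the skill text. The exact file layout that `kg import` expects is an assumption, and `dangling_edges` is a hypothetical pre-flight helper, not part of the package.

```python
import json

# Field names follow the skill text; the precise import schema is assumed.
model = {
    "nodes": [
        {"id": "person:ada-lovelace", "kind": "Person", "name": "Ada Lovelace",
         "labels": [], "properties": {}},
        {"id": "device:analytical-engine", "kind": "Device",
         "name": "Analytical Engine", "labels": [], "properties": {}},
    ],
    "edges": [
        {"from": "person:ada-lovelace", "to": "device:analytical-engine",
         "type": "WROTE_ABOUT",
         # The floor: anything not stated outright is marked as inferred.
         "properties": {"inferred": True, "source": "notes on the Engine"}},
    ],
}


def dangling_edges(model: dict) -> list:
    """Return edges whose endpoints are missing from the node list."""
    ids = {node["id"] for node in model["nodes"]}
    return [e for e in model["edges"] if e["from"] not in ids or e["to"] not in ids]


# Serialize once; the whole batch then travels as a single document.
payload = json.dumps(model, indent=2)
```

Saving `payload` to a file and passing that file to the CLI import command would be the one-call path described above, under the schema assumption already noted.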
59
+ ### 4. Verify and hand back the receipt
60
+ - `kg --ontology NAME stats` — what landed.
61
+ - `kg --ontology NAME path A B` (or a `neighbors`/`out` query) — show it's
62
+ actually connected, not just a pile of nodes.
63
+ - `kg --ontology NAME events -n 10` — the audit trail of this composition.
64
+ - Summarize in prose: counts, the key entities/relationships, and anything you
65
+ inferred (so the user can see and revert it).
66
+
67
+ ## Id rules (CURIEs)
68
+ - `id = prefix:reference`. `prefix` is a short stable lowercase type token
69
+ (`person`, `company`, `paper`); `reference` is the slugged name.
70
+ - Mint with the slug discipline: "Ada Lovelace" → `person:ada-lovelace`. Two
71
+ spellings that slug the same **must** get the same id — that's how the same
72
+ entity across two documents becomes one node.
73
+ - The id is an **address, not a record**: identity goes in the id, mutable facts
74
+ go in `properties`. Never bake `status=active` into an id.
75
+
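Reviewer sketch of the slug discipline in the id rules above. `mint_curie` is a hypothetical stand-in written for illustration; the package's own `slug()` may normalize differently, so treat the exact rules here as assumptions.

```python
import re
import unicodedata


def mint_curie(name: str, prefix: str) -> str:
    """Hypothetical stand-in for the package's slug(); real rules may differ."""
    # Fold accents to ASCII so spelling variants are more likely to collide.
    ascii_text = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Lowercase, then collapse every run of other characters into one hyphen.
    reference = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    if not reference:
        raise ValueError(f"cannot mint an id from {name!r}")
    return f"{prefix.lower()}:{reference}"


print(mint_curie("Ada Lovelace", "person"))
print(mint_curie("  ada   LOVELACE ", "person"))
```

Both calls produce the same id, which is the property the dedup rule relies on. A hand-written id such as `person:Ada` never passes through this function and so would stay a separate node.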
76
+ ## Stance: the ontology's extraction guidance
77
+
78
+ `stance` is **free-text guidance the ontology carries about how to extract** — a
79
+ dial, not a switch, and not a fixed enum. Read it and apply judgment; the
80
+ ontology's own words win. Common values, just to convey the range (not a menu to
81
+ pick from):
82
+
83
+ - **`literal`** — assert only what the text states outright: entities only if
84
+ named, relationships only if stated, never guess two mentions are the same.
85
+ Fits legal, technical, contractual sources.
86
+ - **`inferential`** — also assert what's clearly implied, resolve aliases
87
+ ("Ada" / "Lovelace" / "the Countess" → one node), normalize and enrich. Fits
88
+ research notes, brainstorming, exploratory reading.
89
+ - …or whatever the ontology actually says — `"conservative; medical terms must
90
+ be exact"`, `"connect aggressively, this is ideation"`. Honor the intent, not
91
+ a keyword.
92
+
93
+ **The floor (always, regardless of stance):** when you assert something the
94
+ source didn't state outright, mark it — add `{"inferred": true}` (and a `source`
95
+ snippet when useful) to that node/edge's properties. An inference is always
96
+ visible and revertible, never laundered into a stated fact. This isn't a stance
97
+ choice; it's the one non-negotiable.
98
+
99
+ ## Guardrails
100
+ - **Reversible, so don't freeze up.** Every write is logged; a bad edge is one
101
+ `kg --ontology NAME revert <event-id>` away. Compose confidently, then review.
102
+ - **Respect `allowed_kinds`.** If the ontology constrains kinds, stay inside them
103
+ or surface what you'd have to add.
104
+ - **Don't put data in ids.** (See *Id rules*.)
105
+ - **One ontology per coherent domain.** If the source clearly spans two unrelated
106
+ domains, ask whether to split into two ontologies rather than blending them.
107
+
108
+ ## Anti-patterns
109
+ - Minting ids by hand that bypass slug discipline (`person:Ada` vs
110
+ `person:ada-lovelace`) — breaks cross-document dedup.
111
+ - Inventing relationships under `literal` stance.
112
+ - Encoding a whole sentence as a node `name` instead of extracting the entity.
113
+ - Emitting one write per entity — dozens of `kg_node_upsert`/`kg node add` calls —
114
+ when a single `kg_import` (MCP) or `kg import` (CLI) does the whole batch in one
115
+ gated, logged, atomic transaction. This is the most common mistake; don't.
@@ -0,0 +1,30 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *.egg-info/
5
+ .eggs/
6
+ build/
7
+ dist/
8
+ .venv/
9
+ venv/
10
+
11
+ # Test / tooling
12
+ .pytest_cache/
13
+ .mypy_cache/
14
+ .ruff_cache/
15
+ .coverage
16
+ htmlcov/
17
+
18
+ # Local graph databases
19
+ *.db
20
+ *.db-wal
21
+ *.db-shm
22
+
23
+ # Scratch / generated artifacts (benchmark JSON, charts).
24
+ # Promote a chart worth keeping into assets/ and commit that explicitly.
25
+ temp/
26
+
27
+ # OS / editor
28
+ .DS_Store
29
+ .idea/
30
+ .vscode/
@@ -0,0 +1,110 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## What this is
6
+
7
+ A label property graph (nodes, typed directed edges, labels, JSON properties) stored in SQLite. The core library has **zero third-party dependencies** — everything is stdlib + SQLite. Three front doors (Python library, `kg` CLI, MCP server) sit on one gated + logged engine, and a **control plane** (`resolver.py` + the `backends/` package) lets all three address *many named ontologies* — each its own file (or, eventually, its own engine) — through one interface. See `README.md` for the full design narrative.
8
+
9
+ ## Commands
10
+
11
+ ```bash
12
+ uv venv && uv pip install -e ".[dev]" # set up dev env (installs pytest + mcp)
13
+
14
+ pytest # run all tests (~62)
15
+ pytest tests/test_graph.py # one file
16
+ pytest tests/test_graph.py::test_name # one test
17
+ pytest -k events # by keyword
18
+
19
+ python bench/benchmark.py # perf, full p50–p99 distributions
20
+ python bench/charts.py # render assets/*.png from bench data (needs [charts])
21
+ python bench/runtimes/compare.py # CPython vs Node vs Bun SQLite comparison
22
+
23
+ kg stats # default ontology (~/.kgrdbms/graph.db)
24
+ kg ontology list # the registry (the "db of dbs")
25
+ kg ontology create coffee --stance inferential # register a named ontology
26
+ kg --ontology coffee node add drink:latte --kind Drink # route to it (resolver)
27
+ kg --db /tmp/x.db node add a:1 --kind T # raw escape hatch: exact file, no registry
28
+ kg serve # run the MCP server (needs [mcp] extra)
29
+ ```
30
+
31
+ There is no linter/formatter configured — match the surrounding style (type hints, `from __future__ import annotations`, dataclasses).
32
+
33
+ ## Architecture: the load-bearing ideas
34
+
35
+ **Two write paths, and the distinction is the whole point.**
36
+
37
+ - **Direct path** — `graph.py` methods (`g.add_node`, `g.add_nodes`, `g.batch()`). Writes go straight to the SQLite projection. Fast, but **not gated and not logged** — `replay()` will not reproduce them. Use for bulk loading raw data.
38
+ - **Logged path** — everything in `service.py` (used by the CLI and MCP server). Each mutation is gated, then applied, then appended to the `graph_events` log. Audited, reversible, replayable.
39
+
40
+ When you add or change a mutation, decide which path it belongs to. A new gated operation must go through `service.py`, not directly on `Graph`.
41
+
42
+ **The graph is a projection; the event log is the source of truth.** `events.py` holds an append-only log. `replay(graph, events, genesis=...)` rebuilds the projection from an optional declarative seed + the log, optionally `upto_ts=` for time travel. Undo is `compensate()` — it appends the *inverse* event, never deletes a row. If you add a new logged operation, you must also make it replayable and compensatable in `events.py` (add an `OP_*` constant + apply/compensate handling).
43
+
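Reviewer sketch of the append-only idea in the paragraph above: undo appends an inverse event, and the projection is rebuilt by replaying the log. This is a toy with two operations and a set as the projection. The names mirror the prose, but the signatures are assumptions (the real `replay` takes a graph and `upto_ts=`; this one takes a list index).

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Event:
    op: str        # "node_add" or "node_remove" in this toy
    node_id: str


INVERSE = {"node_add": "node_remove", "node_remove": "node_add"}


def apply_event(projection: Set[str], event: Event) -> None:
    """Apply one event to the projection."""
    if event.op == "node_add":
        projection.add(event.node_id)
    elif event.op == "node_remove":
        projection.discard(event.node_id)
    else:
        raise ValueError(f"unknown op {event.op!r}")


def compensate(log: List[Event], event: Event) -> Event:
    """Undo by appending the inverse event; no existing entry is deleted."""
    inverse = Event(INVERSE[event.op], event.node_id)
    log.append(inverse)
    return inverse


def replay(log: List[Event], upto: Optional[int] = None) -> Set[str]:
    """Rebuild the projection from the log; `upto` gives a crude time travel."""
    projection: Set[str] = set()
    for event in log[:upto]:
        apply_event(projection, event)
    return projection


log = [Event("node_add", "person:ada-lovelace"), Event("node_add", "person:Ada")]
compensate(log, log[1])  # revert the hand-minted id
```

After the compensation the log holds three entries, and replaying all of them yields only `person:ada-lovelace`. Replaying the first two entries shows the state before the undo.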
44
+ **The two-layer mutation gate, in order.** `service.guard()` runs `invariants.enforce()` **before** `policy.mutation_check()`:
45
+
46
+ - `invariants.py` = mechanism. Compiled-in, no off switch, changing one is a code change. Default: no-op.
47
+ - `policy.py` = configuration. A single `mutation_check(ctx) -> Decision`. Default: permissive (allow all). The file has commented example policies at the bottom.
48
+
49
+ Order matters: a permissive or compromised policy can never re-open something an invariant sealed. Failures raise `InvariantViolation` (invariant) vs `PermissionError` (policy) — keep these distinct.
50
+
51
+ **Hooks resolve through their modules at call time.** `guard()` calls `invariants.enforce` / `policy.mutation_check` via the module, not a captured reference — so editing policy (or monkeypatching it in a test) takes effect across all three front doors at once. Don't `from policy import mutation_check` into the service and call it directly; that would break this.
52
+
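Reviewer sketch of the gate order and the call-time lookup described above. `SimpleNamespace` objects stand in for the `invariants` and `policy` modules, and the policy returns a plain bool rather than the `Decision` the text mentions; both are simplifying assumptions.

```python
from types import SimpleNamespace


class InvariantViolation(Exception):
    """Raised when a compiled-in invariant rejects a mutation."""


# Stand-ins for the two modules; the defaults mirror the documented behavior.
invariants = SimpleNamespace(enforce=lambda ctx: None)      # no-op
policy = SimpleNamespace(mutation_check=lambda ctx: True)   # allow all


def guard(ctx: dict) -> None:
    """Run invariants first, then policy, looking both up at call time."""
    # Attribute access happens on every call, so a swapped-in hook is seen
    # immediately. A captured reference would keep calling the old function.
    invariants.enforce(ctx)
    if not policy.mutation_check(ctx):
        raise PermissionError(f"policy denied {ctx['op']}")


guard({"op": "node_add"})                   # passes with the defaults

policy.mutation_check = lambda ctx: False   # edit the policy after the fact
try:
    guard({"op": "node_add"})
except PermissionError as exc:
    print("denied:", exc)
```

Because `enforce` runs before `mutation_check`, an invariant that raises is never reached by the policy at all, which is why a permissive policy cannot reopen it.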
53
+ ## Control plane: ontologies, the resolver, the backend registry
54
+
55
+ The engine above (`graph.py` + `service.py` + the log + the gate) operates on **one** graph. The control plane lets the three front doors address **many named ontologies** through that same engine, without any of them knowing where an ontology physically lives.
56
+
57
+ **`resolver.py` maps an ontology *name* → a `Resolved(backend, events, entry)` bundle.** Callers say `resolve("coffee")`; the resolver looks the name up in an **index** (itself a kg — `<root>/index.db`, nodes of kind `Ontology`), opens the right backend, pairs it with an event log, and hands back the bundle. The front doors then call the *same* `service.*` functions as before — the gate and the log didn't move, only *which* `(graph, events)` pair gets passed in. That's why multi-tenancy was a small change: the single write path was already the choke point.
58
+
59
+ - **The default ontology is the legacy file.** `resolver._default_db_path` special-cases the default name → `<root>/graph.db`, so omitting `--ontology` / the `ontology` arg behaves exactly as the pre-resolver code did. Backward-compat lives in that one function; both front doors inherit it. Don't scatter default-path logic elsewhere.
60
+ - **Isolation is filesystem-shaped, not code-shaped.** Each named ontology is its own SQLite file under `<root>/ontologies/<slug>/graph.db`, with its own event log. No tenant-id columns, no row filtering — "coffee doesn't know Ada" because they're different files.
61
+
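Reviewer sketch of the filesystem-shaped layout in the two bullets above. `ontology_db_path` and `DEFAULT_NAME` are hypothetical; the real logic lives in `resolver._default_db_path`, whose signature and default name are not shown here. The name is also assumed to be slugged already.

```python
from pathlib import Path
from typing import Optional

DEFAULT_NAME = "default"  # hypothetical; the package's real default name is not shown


def ontology_db_path(root: Path, name: Optional[str] = None) -> Path:
    """Map an ontology name to its SQLite file under `root`."""
    if name is None or name == DEFAULT_NAME:
        # Backward-compatible case: the legacy single-graph file.
        return root / "graph.db"
    # Every named ontology gets its own directory and its own file.
    return root / "ontologies" / name / "graph.db"


root = Path.home() / ".kgrdbms"
print(ontology_db_path(root))
print(ontology_db_path(root, "coffee"))
```

Isolation falls out of the path arithmetic: two ontologies never share a file, so no tenant column or row filter is needed.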
62
+ **`backends/` is the pluggable data plane (engine registry).** An engine is a factory `(*, location, **opts) -> GraphBackend` registered with `@backend("name")`. Adding one = write a module + decorate its factory + add one import line in `backends/__init__.py`. **No switch to edit** — `resolver._open_backend` does `get_backend(entry.backend)(location=...)` and never knows which engines exist.
63
+
64
+ - `backends/base.py` — the `GraphBackend` Protocol (the finite method surface `service.py` + reads depend on; `Graph` satisfies it *structurally*, zero changes to `graph.py`) and `_StubBackend` (raising skeleton so a half-built engine is still a routable, fail-loud `GraphBackend`).
65
+ - `sqlite.py` and `postgres.py` are **live**; `neo4j.py` is a **stub** (routes and fails per-call with what's missing; its docstring is the ADR for how it'd be built). `postgres.py` is the scale-up: same five-table model + query shapes ported to psycopg (`%s`, `ON CONFLICT`, `jsonb` properties, `= ANY(%s)`, the recursive-CTE `descendants`, BFS traversals reused verbatim). It needs the `postgres` extra (`psycopg`), imported lazily so `import kgrdbms.backends` works without it; a missing driver raises `NotImplementedError`. `location` for postgres is a DSN, not a file path — `register()` requires it for any non-sqlite backend.
66
+ - **The event log is decoupled from the backend.** `EventLog(store, projection=None)`: *store* is the SQLite that holds the log rows; *projection* is the `GraphBackend` that `compensate()`/replay apply to. They coincide for sqlite (`EventLog(graph)` — projection defaults to store, unchanged). For postgres the store is `resolver._ControlPlaneLogStore` (a `<root>/ontologies/<slug>/events.db` sidecar) and the projection is the `PostgresGraph` — so audit/replay/undo keep working with graph data in Postgres and history in SQLite. `apply_event` only calls `GraphBackend` methods, so it drives any backend. This store↔projection split is the seam to respect for *any* non-sqlite engine (neo4j next).
67
+ - **New failure class:** routing to a stub engine raises `NotImplementedError`. Every front door's error handling must account for it (the CLI's `main()` already maps it to `unavailable: …` / exit 1).
68
+
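Reviewer sketch of a decorator-based engine registry like the one described above. The names `backend`, `get_backend` and `available_backends` come from the text, but the bodies are assumptions, as are the `memory` engine and the `LookupError` for an unknown name.

```python
from typing import Any, Callable, Dict, List

_REGISTRY: Dict[str, Callable[..., Any]] = {}


def backend(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a factory under `name`; the factory is returned unchanged."""
    def register(factory: Callable[..., Any]) -> Callable[..., Any]:
        _REGISTRY[name] = factory
        return factory
    return register


def get_backend(name: str) -> Callable[..., Any]:
    """Look a factory up by name; the caller never enumerates engines."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise LookupError(f"no backend registered as {name!r}") from None


def available_backends() -> List[str]:
    return sorted(_REGISTRY)


@backend("memory")  # hypothetical engine, for illustration only
def open_memory(*, location: str, **opts: Any) -> dict:
    return {"engine": "memory", "location": location, "opts": opts}


graph = get_backend("memory")(location=":memory:")
```

The caller only ever does a dictionary lookup, so adding an engine means adding a decorated factory and never editing a branch.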
69
+ ## Layout
70
+
71
+ ```
72
+ kgrdbms/
73
+ ├── graph.py # the LPG over SQLite — imports nothing internal; usable standalone
74
+ ├── events.py # append-only event log: OP_* constants, record, compensate, replay
75
+ ├── policy.py # configurable mutation policy (edit mutation_check)
76
+ ├── invariants.py # compiled-in invariants, run before policy (no-op default)
77
+ ├── service.py # the shared gated + logged write path (all front doors use this)
78
+ ├── resolver.py # control plane: name → (backend, events, entry); the ontology index
79
+ ├── backends/ # pluggable data plane (engine registry)
80
+ │ ├── base.py # GraphBackend Protocol + _StubBackend
81
+ │ ├── __init__.py # registry: @backend(name), get_backend, available_backends
82
+ │ ├── sqlite.py # live engine (adapter over Graph)
83
+ │ ├── postgres.py # live engine (psycopg; jsonb + recursive CTEs); needs [postgres] extra
84
+ │ └── neo4j.py # stub (deep-traversal escalation)
85
+ ├── cli.py # the `kg` command (stdlib argparse) — `--ontology` / `--db` / `kg ontology …`
86
+ └── mcp_server.py # MCP server, kg_-prefixed tools, each with optional `ontology=` (optional [mcp] extra)
87
+ ```
88
+
89
+ `graph.py` has no internal dependencies — everything else layers on top of it. Dependency direction: `graph` ← `events`/`backends` ← `resolver` ← `service`-callers (`cli`, `mcp_server`). `service.py` depends only on the `GraphBackend` surface, never a concrete engine. Public API is re-exported from `__init__.py`.
90
+
91
+ ## Node id convention (CURIEs)
92
+
93
+ Node ids follow `prefix:reference` — `person:ada-lovelace`, `company:apple`, `card:abc123`. This is deliberately a **CURIE** (a compact URI: the prefix is shorthand that expands to a full IRI through a lookup table you only need the day you publish to the RDF/linked-data world). Adopting the shape now keeps interop a cheap, additive option later; until then a CURIE stands alone as a plain string. Three rules:
94
+
95
+ 1. **`prefix`** is a short, stable, lowercase type word (`person`, `company`, `card`, `device`) — not the `kind` field's exact casing, just a stable token.
96
+ 2. **`reference`** is slugged. Mint ids with `slug(name, prefix="person")` → `person:ada-lovelace`. **`slug()` is the CURIE constructor and the dedup mechanism** — but `add_node` does *not* auto-slug the id it's handed, so an id minted by hand (`"person:Ada"`) will *not* collapse with `slug()`-minted ones. Always go through `slug()` when the local part comes from natural language.
97
+ 3. **The id is an address, not a record.** Put identity in the id (`company:apple`), never mutable attributes (`status=active`); those are node *properties*. Mental model: the id is a URL's *path* (stable), properties are its *query string* (changeable). Baking a changeable fact into an id breaks identity when the fact changes.
98
+
99
+ **Namespacing is free and you don't type it.** Each ontology is its own file with its own registry name, so "which world a node came from" is already known — the ontology *is* the namespace. If two ontologies are ever merged, qualify by ontology name; you don't pre-encode it in the id. An ontology that genuinely needs strict global identity sets its `id_convention` on the registry entry (currently descriptive metadata; the per-ontology enforcement seam, not yet wired to a validator) and adopts fuller CURIE/IRI discipline without affecting the others.
100
+
101
+ **No RDF stack.** This is the *only* RDF idea adopted (identity, because it's the one expensive-to-retrofit decision). No OWL, no SPARQL, no triplestore-as-storage. Interop (Turtle/JSON-LD) and vocabulary borrowing (SKOS, PROV-O) are deferred until a real external consumer exists; both are additive at the boundary, and storage stays LPG inside.
102
+
103
+ ## Conventions that bite
104
+
105
+ - **Edges are unique on `(from_node, type, to_node)`.** Re-adding the same triple updates properties rather than duplicating — mutations are idempotent by construction. Tests rely on this.
106
+ - **`slug()` deduplicates natural-language ids** — two strings that slugify the same collapse to one node id (see *Node id convention* above; `add_node` does not auto-slug).
107
+ - **Properties round-trip as JSON.** Storage is `value_json`; ints/bools/lists/objects come back as their JSON type. CLI `--prop key=value` parses value as JSON when possible, else keeps it as a string.
108
+ - **Per-call writes each commit (one fsync).** Wrapping work in `batch()` / using `add_nodes` / `add_edges` collapses to one transaction (~10× faster). Don't add a per-row commit inside a bulk loop.
109
+ - **Reads are not in `service.py`** by design — callers hit `Graph` directly. Don't route reads through the gate.
110
+ - **CLI exit codes are contractual:** `0` ok, `1` not found / bad input, `2` policy denial, `3` invariant violation. Preserve these.
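Reviewer sketch of two conventions from the list above: edge uniqueness on `(from_node, type, to_node)` and the single-transaction bulk write. The one-table schema and the `properties_json` column are simplifying assumptions, not the package's five-table model. The upsert syntax needs SQLite 3.24 or newer.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE edges (
        from_node       TEXT NOT NULL,
        type            TEXT NOT NULL,
        to_node         TEXT NOT NULL,
        properties_json TEXT NOT NULL DEFAULT '{}',
        UNIQUE (from_node, type, to_node)
    )
    """
)


def add_edges(conn: sqlite3.Connection, edges: list) -> None:
    """Upsert many edges inside one transaction (one commit for the batch)."""
    rows = [(src, typ, dst, json.dumps(props)) for src, typ, dst, props in edges]
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT INTO edges (from_node, type, to_node, properties_json) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT (from_node, type, to_node) "
            "DO UPDATE SET properties_json = excluded.properties_json",
            rows,
        )


add_edges(conn, [("person:ada-lovelace", "KNOWS", "person:charles-babbage", {"year": 1833})])
# Re-adding the same triple updates its properties instead of duplicating it.
add_edges(conn, [("person:ada-lovelace", "KNOWS", "person:charles-babbage", {"year": 1834})])
```

The `with conn:` block is what collapses the batch into a single commit; a commit inside the loop would bring back the per-row cost the convention warns about.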
@@ -0,0 +1,35 @@
1
+ # Code of Conduct
2
+
3
+ ## The short version
4
+
5
+ Be kind, be constructive, assume good faith. This is a small project and a
6
+ welcoming one — technical disagreement is good, personal hostility is not.
7
+
8
+ ## Expected behavior
9
+
10
+ - Be respectful of differing viewpoints and experience levels.
11
+ - Give and accept constructive feedback gracefully.
12
+ - Focus on what's best for the project and its users.
13
+ - Show empathy toward other community members.
14
+
15
+ ## Unacceptable behavior
16
+
17
+ Harassment, discriminatory or derogatory comments, personal or political attacks,
18
+ publishing others' private information, or other conduct that would reasonably be
19
+ considered inappropriate in a professional setting.
20
+
21
+ ## Scope
22
+
23
+ Applies in all project spaces — issues, pull requests, discussions — and when
24
+ representing the project in public spaces.
25
+
26
+ ## Enforcement
27
+
28
+ Report unacceptable behavior privately to the maintainer via GitHub (see
29
+ `SECURITY.md` for the private-contact route). Reports will be reviewed and
30
+ handled in good faith; the maintainer may remove, edit, or reject contributions
31
+ and comments that violate this code, and may ban contributors for behavior they
32
+ deem inappropriate.
33
+
34
+ This Code of Conduct is adapted in spirit from the
35
+ [Contributor Covenant](https://www.contributor-covenant.org/).
@@ -0,0 +1,72 @@
1
+ # Contributing
2
+
3
+ Thanks for your interest. This is a small, opinionated project — the goal is a
4
+ knowledge graph you can hold in your head, so contributions that keep it legible
5
+ are worth more than ones that add surface area.
6
+
7
+ ## Setup
8
+
9
+ ```bash
10
+ git clone https://github.com/cunicopia-dev/knowledge-graph-rdbms
11
+ cd knowledge-graph-rdbms
12
+ uv venv && uv pip install -e ".[dev]" # pytest + mcp
13
+ pytest # should be all green
14
+ ```
15
+
16
+ The Postgres backend and its tests are optional — they need the `postgres` extra
17
+ and a reachable Postgres (`pip install -e ".[dev,postgres]"`); the suite skips
18
+ those tests cleanly when no database is available.
19
+
20
+ ## The rules that matter here
21
+
22
+ These aren't style nits — they're the load-bearing invariants the design depends
23
+ on. A change that breaks one of these will be asked to change, no matter how nice
24
+ it looks.
25
+
26
+ 1. **The core stays zero-dependency.** `kgrdbms` (the library, CLI, engine) imports
27
+ only the standard library + SQLite. Third-party deps live behind extras
28
+ (`mcp`, `postgres`, `charts`) and are imported lazily. Don't add a hard
29
+ dependency to the core.
30
+
31
+ 2. **Mutations go through the gate.** Any new *logged* operation must go through
32
+ `service.py` (which runs `invariants.enforce` then `policy.mutation_check`,
33
+ then records the event) — not directly on a backend. The direct `Graph`
34
+ methods are the unlogged fast path, on purpose; know which one you're adding to.
35
+
36
+ 3. **A new logged op must be replayable and reversible.** If you add an operation
37
+ to the event log, add its `OP_*` constant plus `apply_event` and `compensate`
38
+ handling in `events.py`. "Every mutation can be replayed and undone" is a
39
+ guarantee, not a nice-to-have.
40
+
41
+ 4. **Backends register; they don't get switched on.** A new engine is a module in
42
+ `backends/` with a factory decorated `@backend("name")` and one import line in
43
+ `backends/__init__.py`. It implements the `GraphBackend` protocol. No `if
44
+ engine == ...` ladders anywhere.
45
+
46
+ 5. **Node ids are CURIEs** (`prefix:reference`, minted via `slug()`). See the
47
+ convention in `CLAUDE.md` / the README before inventing id shapes.
48
+
49
+ 6. **CLI exit codes are contractual:** `0` ok · `1` not found / bad input · `2`
50
+ policy denial · `3` invariant violation · (`unavailable`/1 for an unbuilt
51
+ backend). Preserve them.
52
+
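Reviewer sketch of the exit-code contract in rule 6. The mapping of `InvariantViolation`, `PermissionError` and `NotImplementedError` follows the text. Treating `KeyError` and `ValueError` as "not found / bad input" is an assumption, and `run` is a hypothetical wrapper, not the CLI's real `main()`.

```python
from typing import Callable

EXIT_OK, EXIT_BAD_INPUT, EXIT_POLICY, EXIT_INVARIANT = 0, 1, 2, 3


class InvariantViolation(Exception):
    """Stand-in for the package's invariant failure."""


def run(command: Callable[[], None]) -> int:
    """Run a command and translate its failure class into an exit code."""
    try:
        command()
    except InvariantViolation:
        return EXIT_INVARIANT
    except PermissionError:
        return EXIT_POLICY
    except NotImplementedError as exc:
        # Routing to an unbuilt backend: reported as unavailable, exit 1.
        print(f"unavailable: {exc}")
        return EXIT_BAD_INPUT
    except (KeyError, ValueError):
        # Assumed classes for "not found / bad input".
        return EXIT_BAD_INPUT
    return EXIT_OK


def denied() -> None:
    raise PermissionError("policy says no")


print(run(lambda: None), run(denied))
```

Keeping the invariant and policy failures as separate exception classes is what lets the wrapper give them separate codes.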
53
+ `CLAUDE.md` documents the architecture in more depth — it's worth a read before a
54
+ non-trivial change.
55
+
56
+ ## Style
57
+
58
+ There's no linter or formatter configured — match the surrounding code:
59
+ `from __future__ import annotations`, type hints, dataclasses, clear names over
60
+ clever ones. Reads are not gated; writes are. Keep tests close to the behavior
61
+ they describe.
62
+
63
+ ## Pull requests
64
+
65
+ - Keep the diff focused; one idea per PR.
66
+ - Add or update tests — `pytest` should pass, and new behavior should be covered.
67
+ - If you change a public behavior, update the README and `CLAUDE.md` to match.
68
+ - Describe *why*, not just *what*, in the PR body.
69
+
70
+ Small fixes and docs improvements are very welcome. For larger changes (a new
71
+ backend, a new logged operation, a change to the gate), open an issue first so we
72
+ can agree on the shape before you build it.
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Keith Cunic
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.