temporal-reasoning 0.3.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- temporal_reasoning-0.3.2/LICENSE +21 -0
- temporal_reasoning-0.3.2/PKG-INFO +287 -0
- temporal_reasoning-0.3.2/README.md +255 -0
- temporal_reasoning-0.3.2/mcp_server.py +2734 -0
- temporal_reasoning-0.3.2/pyproject.toml +61 -0
- temporal_reasoning-0.3.2/report_issue.py +233 -0
- temporal_reasoning-0.3.2/setup.cfg +4 -0
- temporal_reasoning-0.3.2/temporal_reasoning.egg-info/PKG-INFO +287 -0
- temporal_reasoning-0.3.2/temporal_reasoning.egg-info/SOURCES.txt +13 -0
- temporal_reasoning-0.3.2/temporal_reasoning.egg-info/dependency_links.txt +1 -0
- temporal_reasoning-0.3.2/temporal_reasoning.egg-info/entry_points.txt +2 -0
- temporal_reasoning-0.3.2/temporal_reasoning.egg-info/requires.txt +17 -0
- temporal_reasoning-0.3.2/temporal_reasoning.egg-info/top_level.txt +2 -0
- temporal_reasoning-0.3.2/tests/test_install.py +76 -0
- temporal_reasoning-0.3.2/tests/test_mcp_server.py +2622 -0

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Aditya Mukhopadhyay

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

@@ -0,0 +1,287 @@
Metadata-Version: 2.4
Name: temporal-reasoning
Version: 0.3.2
Summary: Perfect memory. Exact reasoning. Complete history. Bi-temporal graph memory for AI coding agents.
Author-email: Aditya Mukhopadhyay <webmaster@adityamukho.com>
License: MIT
Keywords: ai-agents,graph-database,datalog,knowledge-graph,persistent-memory,temporal-reasoning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: minigraf>=0.22.0
Requires-Dist: mcp>=1.27.0
Provides-Extra: bm25
Requires-Dist: rank-bm25; extra == "bm25"
Provides-Extra: git-ingestion
Requires-Dist: tree-sitter-languages; extra == "git-ingestion"
Provides-Extra: llm
Requires-Dist: anthropic>=0.40.0; extra == "llm"
Provides-Extra: dev
Requires-Dist: pytest>=8.4.2; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: black>=25.11.0; extra == "dev"
Requires-Dist: ruff>=0.15.12; extra == "dev"
Dynamic: license-file

# Temporal Reasoning

**Perfect memory. Exact reasoning. Complete history.**

Temporal Reasoning gives AI coding agents bi-temporal graph memory: query any past state, traverse live dependency graphs, and correlate architectural decisions with structural change — all with deterministic Datalog, no fuzzy retrieval.

## Questions Only Temporal Reasoning Can Answer

These queries are impossible with git log, vector search, or key-value memory:

```datalog
; What did the dependency graph look like before the auth refactor?
[:find ?caller ?callee
 :as-of 30
 :where [?caller :calls ?callee]]

; When did this coupling first appear — and what decision caused it?
[:find ?reason
 :where [:project/service-a :depends-on :project/service-b]
        [?d :motivated-by ?c]
        [?c :description ?reason]]

; Which modules were coupled to the payment service when we made the DB decision?
[:find ?module
 :as-of 15
 :where [?module :depends-on :service/payment]]
```

This is the only tool where both the decision and the structural change live as datoms in the same graph and can be joined in a single query. See [Phase 5](ROADMAP.md) for code structure evolution from git history.

## Why Temporal Reasoning?

Most memory tools for agents are key-value stores or vector databases. They answer "what do you know now?" Temporal Reasoning answers a harder question: **"what did you know then, and what changed?"**

**Time travel.** Every write is stamped with a transaction number. You can query the graph as it existed at any past transaction:

```python
# Decision made in session 1, transaction 3
transact('[[:project/db :name "PostgreSQL"]]', reason="Initial choice")

# Changed in session 4, transaction 11
retract('[[:project/db :name "PostgreSQL"]]', reason="Switching to CockroachDB for geo-distribution")
transact('[[:project/db :name "CockroachDB"]]', reason="Switching to CockroachDB for geo-distribution")

# Later: what did we think the database was before session 4?
query("[:find ?name :as-of 10 :where [:project/db :name ?name]]")
# → "PostgreSQL"

# What do we think now?
query("[:find ?name :where [:project/db :name ?name]]")
# → "CockroachDB"
```

**Retraction with preserved history.** Changing your mind doesn't erase the record. Retracted facts stay in the bi-temporal log and remain queryable at their original transaction time. This means the agent can always reconstruct *why* a decision changed, not just *what* the current state is.
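
To make the transaction-time behaviour concrete, here is a toy, in-memory model of an append-only log in plain Python. It is not the minigraf API. The `ToyLog` and `Datom` names are hypothetical, valid time is not modelled, and the real engine's storage and query evaluation will differ. The sketch only shows why a retraction leaves the earlier assertion visible to an as-of read.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Datom:
    """One log entry (hypothetical shape, not minigraf's)."""
    entity: str
    attr: str
    value: str
    tx: int      # transaction number that wrote this entry
    added: bool  # True = assertion, False = retraction


@dataclass
class ToyLog:
    """Append-only log: a retraction is a new entry, nothing is deleted."""
    datoms: list = field(default_factory=list)
    tx: int = 0

    def _write(self, triples, added):
        self.tx += 1
        for entity, attr, value in triples:
            self.datoms.append(Datom(entity, attr, value, self.tx, added))
        return self.tx

    def transact(self, triples):
        return self._write(triples, added=True)

    def retract(self, triples):
        return self._write(triples, added=False)

    def values(self, entity, attr, as_of=None):
        """Replay entries up to transaction `as_of` (default: latest)."""
        limit = self.tx if as_of is None else as_of
        visible = set()
        for d in self.datoms:  # entries are appended in transaction order
            if d.tx > limit or (d.entity, d.attr) != (entity, attr):
                continue
            if d.added:
                visible.add(d.value)
            else:
                visible.discard(d.value)
        return visible


log = ToyLog()
t_initial = log.transact([(":project/db", ":name", "PostgreSQL")])
log.retract([(":project/db", ":name", "PostgreSQL")])
log.transact([(":project/db", ":name", "CockroachDB")])

print(log.values(":project/db", ":name", as_of=t_initial))  # {'PostgreSQL'}
print(log.values(":project/db", ":name"))                   # {'CockroachDB'}
print(len(log.datoms))                                      # 3: history is kept
```

Reads replay the log instead of consulting a mutable current-state table. That is what lets one store answer both "now" and "then". A real engine would index the entries rather than scan them.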

**Exact Datalog queries, not fuzzy search.** Results are deterministic and reproducible — no embedding model, no similarity threshold, no hallucinated retrievals. A query either matches or it doesn't.

**Graph traversal.** Entities are first-class nodes — not isolated key-value blobs. Store service-calls-service as a real graph edge (`:calls :project/auth-service`) and traverse it with Datalog joins. Fixed-depth transitive queries (2-hop, 3-hop) are expressed as multi-hop joins. Rules unify multiple edge types under a single named relation.
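
The sketch below shows in plain Python what a fixed-depth join and a rule-like union compute. The service names are hypothetical sample data, and the functions are illustrations rather than minigraf calls. A Datalog engine evaluates the same joins over its own indexes.

```python
# Hypothetical sample graph as (entity, attribute, value) triples.
EDGES = [
    (":project/session-svc", ":depends-on", ":project/key-store"),
    (":project/token-svc", ":depends-on", ":project/key-store"),
    (":project/api-gateway", ":depends-on", ":project/session-svc"),
    (":project/billing", ":depends-on", ":project/token-svc"),
    (":project/api-gateway", ":calls", ":project/auth-service"),
]


def sources(attr, target, edges=EDGES):
    """One pattern: every ?s such that [?s attr target] holds."""
    return {s for s, a, t in edges if a == attr and t == target}


def two_hop(attr, target, edges=EDGES):
    """Join [?mid attr target] [?svc attr ?mid] into (?svc, ?mid) pairs."""
    return {
        (svc, mid)
        for mid in sources(attr, target, edges)
        for svc in sources(attr, mid, edges)
    }


def linked(entity, edges=EDGES, attrs=(":calls", ":depends-on")):
    """Rule-like union: ?b such that [entity :calls ?b] or [entity :depends-on ?b]."""
    return {t for s, a, t in edges if s == entity and a in attrs}


print(sorted(sources(":depends-on", ":project/key-store")))
print(sorted(two_hop(":depends-on", ":project/key-store")))
print(sorted(linked(":project/api-gateway")))
```

The two-hop join returns only services that are exactly one step removed. Direct dependents come from the single pattern, so a full impact list for a depth-2 question is the union of both results.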

**Local and offline.** A single binary and a file. No API key (unless you opt into the `llm` extraction strategy), no network dependency, no cloud service to go down.

## Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│ AI Coding Agent                                                  │
│ (Claude Code, OpenCode, Codex)                                   │
└──────────┬─────────────────────────────────────────┬─────────────┘
           │ MCP tool calls                          │ per-turn hooks
           │ (minigraf_query, minigraf_transact, …)  │ (UserPromptSubmit / Stop)
           ▼                                         ▼
┌──────────────────────────┐         ┌─────────────────────────────┐
│ MCP Server               │         │ Hook scripts                │
│ mcp_server.py            │◄────────│ prepare_hook.py             │
│ (persistent stdio)       │         │ finalize_hook.py            │
└──────────┬───────────────┘         └─────────────────────────────┘
           │
           ▼
┌──────────────────────────────────────────────────────────────────┐
│ MiniGrafDb Python binding (minigraf package)                     │
│ https://github.com/project-minigraf/minigraf                     │
│   - Bi-temporal Datalog engine                                   │
│   - Transaction time + Valid time                                │
└──────────┬───────────────────────────────────────────────────────┘
           │
           ▼
┌──────────────────────────────────────────────────────────────────┐
│ Graph File                                                       │
│ memory.graph (current working directory)                         │
└──────────────────────────────────────────────────────────────────┘
```

## Install

```bash
git clone https://github.com/project-minigraf/temporal_reasoning
cd /your/project
python /path/to/temporal_reasoning/install.py
```

Run `install.py` from your project root. It creates a virtualenv, installs dependencies, and writes `.mcp.json` and `.claude/settings*.json` into your project directory. That's it.

**Optional — LLM extraction strategy:** `install.py` defaults to heuristic (regex) extraction, which requires no API key. To use LLM-based extraction, set `MINIGRAF_EXTRACTION_STRATEGY=llm` and `ANTHROPIC_API_KEY=<your key>` in `.claude/settings.local.json` after running the script.

### OpenCode

```bash
python /path/to/temporal_reasoning/install.py
```

This also syncs the skill into `.opencode/skills/temporal-reasoning`.

## Quick Start

```python
from minigraf import query, transact

# Store a decision
transact("[[:decision/cache-strategy :decision/description \"use Redis\"]]",
         reason="Architecture decision for low-latency caching")

# Query decisions
result = query("[:find ?d :where [?e :decision/description ?d]]")
```

## Storage Location

Default: `memory.graph` in the current working directory.

Override: `MINIGRAF_GRAPH_PATH=/custom/path python ...`

## Per-Turn Auto-Memory

When running under Claude Code with the hook configuration in `hooks/claude-code.json`, the system automatically injects relevant memory context before each turn and extracts durable facts after each turn — without the agent explicitly calling any tool.

### Prepare phase (before the turn)

`prepare_hook.py` fires on the `UserPromptSubmit` event. It:

1. Extracts candidate entity tokens from the user's message (stop-word filtered, minimum 4 characters).
2. Queries the graph for facts whose values contain those tokens, using `:valid-at` set to the current UTC timestamp so only currently-valid facts are returned.
3. Falls back to a broad scan (capped by `MINIGRAF_PREPARE_SCAN_LIMIT`, default 50 rows) when no entity-specific results are found.
4. Returns the results as `additionalContext` prepended to the agent's working context for that turn.
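
Steps 1 to 3 can be approximated in plain Python as follows. This is an illustration, not the hook's code. The stop-word list is a small hypothetical sample, facts are modelled as `(entity, attribute, value)` tuples, and the `:valid-at` filtering from step 2 is left to the query engine and not modelled.

```python
import re

# Hypothetical, abbreviated stop-word list (the hook's real list is not shown here).
STOP_WORDS = {"about", "could", "from", "have", "make", "should", "sure",
              "that", "there", "this", "what", "when", "which", "with"}


def candidate_tokens(message, min_len=4):
    """Step 1: lower-cased word tokens, stop-word filtered, at least min_len chars."""
    tokens = []
    for word in re.findall(r"[a-z0-9_-]+", message.lower()):
        if len(word) >= min_len and word not in STOP_WORDS and word not in tokens:
            tokens.append(word)
    return tokens


def select_facts(facts, tokens, scan_limit=50):
    """Steps 2-3: keep facts whose value contains a token, else a capped broad scan."""
    hits = [f for f in facts if any(t in str(f[2]).lower() for t in tokens)]
    return hits if hits else facts[:scan_limit]


facts = [
    (":decision/cache-strategy", ":decision/description", "use Redis"),
    (":project/db", ":name", "CockroachDB"),
]
tokens = candidate_tokens("Should we keep using Redis for the cache?")
print(tokens)                       # ['keep', 'using', 'redis', 'cache']
print(select_facts(facts, tokens))  # only the Redis decision matches
```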

For messages containing temporal signals (e.g. "before", "last week", "as of") with an explicit ISO date, `:valid-at` is set to that date instead (midnight UTC), enabling point-in-time recall.
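
Here is a minimal sketch of that `:valid-at` choice. It assumes the signal list is just the three examples above and that "explicit ISO date" means a `YYYY-MM-DD` token. The hook's actual matching rules may be broader.

```python
import re
from datetime import datetime, timezone

# The example signals named above, matched as whole words/phrases.
TEMPORAL_SIGNALS = re.compile(r"\b(?:before|last week|as of)\b")
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def valid_at_for(message, now=None):
    """Return the instant to use for :valid-at when querying memory."""
    if now is None:
        now = datetime.now(timezone.utc)
    text = message.lower()
    match = ISO_DATE.search(text)
    if match and TEMPORAL_SIGNALS.search(text):
        year, month, day = map(int, match.groups())
        try:
            return datetime(year, month, day, tzinfo=timezone.utc)  # midnight UTC
        except ValueError:  # e.g. 2025-13-40 is not a real date
            pass
    return now


fixed_now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
print(valid_at_for("Which DB did we use as of 2025-03-01?", now=fixed_now))
# 2025-03-01 00:00:00+00:00
print(valid_at_for("Which DB do we use?", now=fixed_now))
# 2026-01-15 12:00:00+00:00
```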

### Finalize phase (after the turn)

`finalize_hook.py` fires on the `Stop` event. It reads the last user+assistant exchange from the transcript, then runs the configured extraction strategy:

| Strategy | Behaviour |
|----------|-----------|
| `heuristic` (default) | Regex patterns detect decision-signal phrases ("we'll use X", "decided to use X", "always use X", "depends on X", …) and transact the matched tokens as `:decision/`, `:preference/`, `:constraint/`, or `:dependency/` entities. |
| `llm` | Sends the exchange to a lightweight LLM (`claude-haiku-4-5-20251001` by default; override with `MINIGRAF_LLM_MODEL`) with a structured prompt. The model returns a Datalog `transact` expression; an optional `; valid-at: YYYY-MM-DD` comment sets the fact's valid time. Falls back to the `agent` strategy on error. |
| `agent` | Uses MCP sampling to ask the connected agent itself for a memory block in the same Datalog format. |
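
The sketch below shows roughly how the `heuristic` strategy can turn phrases into facts. It makes three assumptions:

- The patterns cover only the example phrases in the table.
- The phrase-to-kind mapping (for instance, treating "always use X" as a preference) is a guess.
- The entity naming is illustrative.

The shipped `finalize_hook.py` may differ on all three.

```python
import re

# Hypothetical (regex, entity kind) pairs built from the example phrases above.
PATTERNS = [
    (re.compile(r"\b(?:we'll|we will|decided to) use (\w[\w+-]*)", re.IGNORECASE), "decision"),
    (re.compile(r"\balways use (\w[\w+-]*)", re.IGNORECASE), "preference"),
    (re.compile(r"\bdepends on (\w[\w+-]*)", re.IGNORECASE), "dependency"),
]


def extract_transact(exchange):
    """Return a Datalog transact vector for matched phrases, or None if nothing matched."""
    facts = []
    for pattern, kind in PATTERNS:
        for token in pattern.findall(exchange):
            facts.append(f'[:{kind}/{token.lower()} :{kind}/description "{token}"]')
    return "[" + " ".join(facts) + "]" if facts else None


print(extract_transact("We'll use Redis for caching, and the worker depends on asyncio."))
# [[:decision/redis :decision/description "Redis"] [:dependency/asyncio :dependency/description "asyncio"]]
```

A phrase list like this is cheap and needs no API key, but it only sees the literal wording. That is the gap the `llm` and `agent` strategies are meant to cover.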

### Configuration

| Environment variable | Default | Effect |
|----------------------|---------|--------|
| `MINIGRAF_EXTRACTION_STRATEGY` | `heuristic` | Finalize strategy: `heuristic`, `llm`, or `agent` |
| `MINIGRAF_PREPARE_SCAN_LIMIT` | `50` | Max rows returned by the broad fallback scan in the prepare phase |
| `MINIGRAF_LLM_MODEL` | `claude-haiku-4-5-20251001` | Model used when `MINIGRAF_EXTRACTION_STRATEGY=llm` |
| `ANTHROPIC_API_KEY` | — | Required when `MINIGRAF_EXTRACTION_STRATEGY=llm` and using a Claude model |
| `OPENAI_API_KEY` | — | Required when `MINIGRAF_EXTRACTION_STRATEGY=llm` and `MINIGRAF_LLM_MODEL` is an OpenAI model (e.g. `gpt-4o-mini`) |
| `MINIGRAF_GRAPH_PATH` | `memory.graph` | Override the graph file location |
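
The variables above can be read into one object at startup. The sketch below is a hypothetical helper, not code from `mcp_server.py`. It applies the documented defaults, and two of its rules are its own assumptions:

- It rejects unknown strategy values.
- A model name starting with `gpt-` needs `OPENAI_API_KEY`, and any other model needs `ANTHROPIC_API_KEY`.

```python
import os
from dataclasses import dataclass

STRATEGIES = {"heuristic", "llm", "agent"}


@dataclass(frozen=True)
class MemoryConfig:
    extraction_strategy: str
    prepare_scan_limit: int
    llm_model: str
    graph_path: str


def load_config(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    strategy = env.get("MINIGRAF_EXTRACTION_STRATEGY", "heuristic")
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown extraction strategy: {strategy!r}")
    return MemoryConfig(
        extraction_strategy=strategy,
        prepare_scan_limit=int(env.get("MINIGRAF_PREPARE_SCAN_LIMIT", "50")),
        llm_model=env.get("MINIGRAF_LLM_MODEL", "claude-haiku-4-5-20251001"),
        graph_path=env.get("MINIGRAF_GRAPH_PATH", "memory.graph"),
    )


def missing_api_key(cfg, env=None):
    """Name of the key the llm strategy would need but cannot find, else None."""
    env = os.environ if env is None else env
    if cfg.extraction_strategy != "llm":
        return None
    key = "OPENAI_API_KEY" if cfg.llm_model.startswith("gpt-") else "ANTHROPIC_API_KEY"
    return None if env.get(key) else key


cfg = load_config({"MINIGRAF_EXTRACTION_STRATEGY": "llm", "MINIGRAF_LLM_MODEL": "gpt-4o-mini"})
print(cfg.prepare_scan_limit, cfg.graph_path)  # 50 memory.graph
print(missing_api_key(cfg, {}))                # OPENAI_API_KEY
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.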

## Files

| File | Purpose |
|------|---------|
| `mcp_server.py` | Persistent stdio MCP server — primary interface to the graph |
| `minigraf.py` | Python CLI wrapper (direct use outside MCP) |
| `hooks/prepare_hook.py` | Claude Code UserPromptSubmit hook — injects memory context |
| `hooks/ingest_hook.py` | Claude Code UserPromptSubmit hook — triggers background git ingestion |
| `hooks/finalize_hook.py` | Claude Code Stop hook — extracts and stores facts |
| `hooks/claude-code.json` | Hook + MCP configuration for Claude Code |
| `report_issue.py` | GitHub issue reporter |
| `install.py` | Setup script |
| `pyproject.toml` | Python packaging |
| `tools/*.json` | Tool schemas |

## Tools

- **minigraf_query** — Query memory with Datalog
- **minigraf_transact** — Store facts (reason required)
- **minigraf_retract** — Retract facts (original stays in history)
- **minigraf_report_issue** — File GitHub issues
- **memory_prepare_turn** — Retrieve relevant context for the current user message
- **memory_finalize_turn** — Extract and store memorable facts after a turn
- **minigraf_audit** — Audit all entities against the schema; retracts violators (history preserved)
- **minigraf_ingest_git** — Ingest code structure from git history into the bi-temporal graph (background task)
- **minigraf_ingest_status** — Poll progress of a running git ingestion; reports wall-clock time and final commit hash of the last completed run (including hook-driven ingestion)

## Query Examples

```python
# Basic query
query("[:find ?x :where [?e :attr ?x]]")

# Temporal query (state at transaction N)
query("[:find ?x :as-of 5 :where [?e :attr ?x]]")

# Aggregation
query("[:find (count ?e) :where [?e :decision/description ?d]]")

# Single-hop graph traversal — what does api-gateway call?
query("[:find ?name :where [:project/api-gateway :calls ?svc] [?svc :name ?name]]")

# Two-hop join (transitive impact): which services depend on key-store through
# exactly one intermediate ?mid? Direct dependents are the ?mid values themselves.
query("""[:find ?svc ?mid
          :where [?mid :depends-on :project/key-store]
                 [?svc :depends-on ?mid]]""")

# Decision traceability — why did we choose asyncio?
query("[:find ?reason :where [:decision/asyncio-choice :motivated-by ?c] [?c :description ?reason]]")

# Typed entity query — list all stored components
query("[:find ?name :where [?e :entity-type :type/component] [?e :name ?name]]")
```

## Skill Benchmarks

Twelve evals, each run in an isolated sandbox, measure how the skill changes agent behavior compared with a no-skill baseline. Each eval uses a fresh graph, pre-seeded with state where relevant.

| Eval | What it tests | With Skill | Without Skill |
|------|--------------|-----------|---------------|
| 1 — Decision storage | Persists architectural decisions with correct naming + reasons | 5/5 | 0/5 |
| 2 — Memory retrieval | Queries memory and cites stored facts by name | 4/5 | 3/5 |
| 3 — Cross-session preference | Discovers and applies a constraint never stated in the current conversation | 4/4 | 0/4 |
| 4 — Conflict detection | Surfaces architectural conflicts before silently overriding decisions | 4/4 | 0/4 |
| 5 — Entity reference storage | Stores relationships as traversable graph edges, not dead-end strings | 5/5 | 0/5 |
| 6 — Transitive impact analysis | Traverses a multi-hop dependency chain to find all affected services | 5/5 | 4/5 |
| 7 — Decision traceability | Follows a `:motivated-by` edge to surface the constraint behind a decision | 5/5 | 1/5 |
| 8 — Git ingestion | Checks status before starting ingestion; moves on without polling | 6/6 | 0/6 |
| 9 — Ingest status | Reports idle/running/complete accurately; surfaces errors | 5/5 | 0/5 |
| 10 — Memory prepare-turn | Injects relevant context before the agent responds | 5/5 | 0/5 |
| 11 — Audit | Detects and retracts schema violations | 4/5 | 0/5 |
| 12 — Already running | Does not re-trigger ingestion when already in progress | 4/5 | 2/5 |
| **Total** | | **56/59 (95%)** | **10/59 (17%)** |
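
The Total row follows from the per-eval scores. This snippet re-derives it from the two score columns, copied by hand from the table, so the rounded percentages can be checked.

```python
# (passed, possible) per eval, in table order, copied from the two score columns.
WITH_SKILL = [(5, 5), (4, 5), (4, 4), (4, 4), (5, 5), (5, 5),
              (5, 5), (6, 6), (5, 5), (5, 5), (4, 5), (4, 5)]
WITHOUT_SKILL = [(0, 5), (3, 5), (0, 4), (0, 4), (0, 5), (4, 5),
                 (1, 5), (0, 6), (0, 5), (0, 5), (0, 5), (2, 5)]


def total(rows):
    """Sum passes and possible points, and round the pass rate to a whole percent."""
    passed = sum(p for p, _ in rows)
    possible = sum(n for _, n in rows)
    return passed, possible, round(100 * passed / possible)


print(total(WITH_SKILL))     # (56, 59, 95)
print(total(WITHOUT_SKILL))  # (10, 59, 17)
```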

The cross-session preference eval is the most discriminating for memory recall: the prompt says "make sure it fits with how we do things" with no hint that a relevant constraint exists. The skill queries memory, finds a stored no-mocks preference, and writes a test using real database connections.

The transitive impact eval is the most discriminating for graph traversal: given "key-store is being replaced — what breaks?" the skill executes a 2-hop Datalog join and returns a full impact chain; without it, the agent correctly admits it cannot name the affected services.

See [`evals/benchmark.md`](evals/benchmark.md) for full results and per-eval breakdowns.

## Phases

- **Phase 1** — Python skill layer ✓
- **Phase 2** — Write policy, report_issue, install, skill benchmarks ✓
- **Phase 3** — MCP server, per-turn auto-memory hooks ✓
- **Phase 4** — Entity normalization, schema-aware extraction, minigraf_audit ✓
- **Phase 5** — Code structure ingestion from git history, minigraf_ingest_git ✓
- **Phase 6** — Observability and trust for automatic memory (planned)
|
|
@@ -0,0 +1,255 @@
|
|
|
1
|
+
# Temporal Reasoning
|
|
2
|
+
|
|
3
|
+
**Perfect memory. Exact reasoning. Complete history.**
|
|
4
|
+
|
|
5
|
+
Temporal Reasoning gives AI coding agents bi-temporal graph memory: query any past state, traverse live dependency graphs, and correlate architectural decisions with structural change — all with deterministic Datalog, no fuzzy retrieval.
|
|
6
|
+
|
|
7
|
+
## Questions Only Temporal Reasoning Can Answer
|
|
8
|
+
|
|
9
|
+
These queries are impossible with git log, vector search, or key-value memory:
|
|
10
|
+
|
|
11
|
+
```datalog
|
|
12
|
+
; What did the dependency graph look like before the auth refactor?
|
|
13
|
+
[:find ?caller ?callee
|
|
14
|
+
:as-of 30
|
|
15
|
+
:where [?caller :calls ?callee]]
|
|
16
|
+
|
|
17
|
+
; When did this coupling first appear — and what decision caused it?
|
|
18
|
+
[:find ?reason
|
|
19
|
+
:where [:project/service-a :depends-on :project/service-b]
|
|
20
|
+
[?d :motivated-by ?c]
|
|
21
|
+
[?c :description ?reason]]
|
|
22
|
+
|
|
23
|
+
; Which modules were coupled to the payment service when we made the DB decision?
|
|
24
|
+
[:find ?module
|
|
25
|
+
:as-of 15
|
|
26
|
+
:where [?module :depends-on :service/payment]]
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
This is the only tool where both the decision and the structural change live as datoms in the same graph and can be joined in a single query. See [Phase 5](ROADMAP.md) for code structure evolution from git history.
|
|
30
|
+
|
|
31
|
+
## Why Temporal Reasoning?
|
|
32
|
+
|
|
33
|
+
Most memory tools for agents are key-value stores or vector databases. They answer "what do you know now?" Temporal Reasoning answers a harder question: **"what did you know then, and what changed?"**
|
|
34
|
+
|
|
35
|
+
**Time travel.** Every write is stamped with a transaction number. You can query the graph as it existed at any past transaction:
|
|
36
|
+
|
|
37
|
+
```python
|
|
38
|
+
# Decision made in session 1, transaction 3
|
|
39
|
+
transact('[[:project/db :name "PostgreSQL"]]', reason="Initial choice")
|
|
40
|
+
|
|
41
|
+
# Changed in session 4, transaction 11
|
|
42
|
+
retract('[[:project/db :name "PostgreSQL"]]', reason="Switching to CockroachDB for geo-distribution")
|
|
43
|
+
transact('[[:project/db :name "CockroachDB"]]', reason="Switching to CockroachDB for geo-distribution")
|
|
44
|
+
|
|
45
|
+
# Later: what did we think the database was before session 4?
|
|
46
|
+
query("[:find ?name :as-of 10 :where [:project/db :name ?name]]")
|
|
47
|
+
# → "PostgreSQL"
|
|
48
|
+
|
|
49
|
+
# What do we think now?
|
|
50
|
+
query("[:find ?name :where [:project/db :name ?name]]")
|
|
51
|
+
# → "CockroachDB"
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
**Retraction with preserved history.** Changing your mind doesn't erase the record. Retracted facts stay in the bi-temporal log and remain queryable at their original transaction time. This means the agent can always reconstruct *why* a decision changed, not just *what* the current state is.
|
|
55
|
+
|
|
56
|
+
**Exact Datalog queries, not fuzzy search.** Results are deterministic and reproducible — no embedding model, no similarity threshold, no hallucinated retrievals. A query either matches or it doesn't.
|
|
57
|
+
|
|
58
|
+
**Graph traversal.** Entities are first-class nodes — not isolated key-value blobs. Store service-calls-service as a real graph edge (`:calls :project/auth-service`) and traverse it with Datalog joins. Fixed-depth transitive queries (2-hop, 3-hop) are expressed as multi-hop joins. Rules unify multiple edge types under a single named relation.
|
|
59
|
+
|
|
60
|
+
**Local and offline.** A single binary and a file. No API key, no network dependency, no cloud service to go down.
|
|
61
|
+
|
|
62
|
+
## Architecture
|
|
63
|
+
|
|
64
|
+
```
|
|
65
|
+
┌──────────────────────────────────────────────────────────────────┐
|
|
66
|
+
│ AI Coding Agent │
|
|
67
|
+
│ (Claude Code, OpenCode, Codex) │
|
|
68
|
+
└──────────┬───────────────────────────────────────┬───────────────┘
|
|
69
|
+
│ MCP tool calls │ per-turn hooks
|
|
70
|
+
│ (minigraf_query, minigraf_transact, …) │ (UserPromptSubmit / Stop)
|
|
71
|
+
▼ ▼
|
|
72
|
+
┌──────────────────────────┐ ┌─────────────────────────────┐
|
|
73
|
+
│ MCP Server │ │ Hook scripts │
|
|
74
|
+
│ mcp_server.py │◄────────│ prepare_hook.py │
|
|
75
|
+
│ (persistent stdio) │ │ finalize_hook.py │
|
|
76
|
+
└──────────┬───────────────┘ └─────────────────────────────┘
|
|
77
|
+
│
|
|
78
|
+
▼
|
|
79
|
+
┌──────────────────────────────────────────────────────────────────┐
|
|
80
|
+
│ MiniGrafDb Python binding (minigraf package) │
|
|
81
|
+
│ https://github.com/project-minigraf/minigraf │
|
|
82
|
+
│ - Bi-temporal Datalog engine │
|
|
83
|
+
│ - Transaction time + Valid time │
|
|
84
|
+
└──────────┬───────────────────────────────────────────────────────┘
|
|
85
|
+
│
|
|
86
|
+
▼
|
|
87
|
+
┌──────────────────────────────────────────────────────────────────┐
|
|
88
|
+
│ Graph File │
|
|
89
|
+
│ memory.graph (current working directory) │
|
|
90
|
+
└──────────────────────────────────────────────────────────────────┘
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
## Install
|
|
94
|
+
|
|
95
|
+
```bash
|
|
96
|
+
git clone https://github.com/project-minigraf/temporal_reasoning
|
|
97
|
+
cd /your/project
|
|
98
|
+
python /path/to/temporal_reasoning/install.py
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
Run `install.py` from your project root. It creates a virtualenv, installs dependencies, and writes `.mcp.json` and `.claude/settings*.json` into your project directory. That's it.
|
|
102
|
+
|
|
103
|
+
**Optional — LLM extraction strategy:** `install.py` defaults to heuristic (regex) extraction, which requires no API key. To use LLM-based extraction, set `MINIGRAF_EXTRACTION_STRATEGY=llm` and `ANTHROPIC_API_KEY=<your key>` in `.claude/settings.local.json` after running the script.
|
|
104
|
+
|
|
105
|
+
### OpenCode
|
|
106
|
+
|
|
107
|
+
```bash
|
|
108
|
+
python /path/to/temporal_reasoning/install.py
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
This also syncs the skill into `.opencode/skills/temporal-reasoning`.
|
|
112
|
+
|
|
113
|
+
## Quick Start
|
|
114
|
+
|
|
115
|
+
```python
|
|
116
|
+
from minigraf import query, transact
|
|
117
|
+
|
|
118
|
+
# Store a decision
|
|
119
|
+
transact("[[:decision/cache-strategy :decision/description \"use Redis\"]]",
|
|
120
|
+
reason="Architecture decision for low-latency caching")
|
|
121
|
+
|
|
122
|
+
# Query decisions
|
|
123
|
+
result = query("[:find ?d :where [?e :decision/description ?d]]")
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
## Storage Location
|
|
127
|
+
|
|
128
|
+
Default: `memory.graph` in the current working directory.
|
|
129
|
+
|
|
130
|
+
Override: `MINIGRAF_GRAPH_PATH=/custom/path python ...`
|
|
131
|
+
|
|
132
|
+
## Per-Turn Auto-Memory
|
|
133
|
+
|
|
134
|
+
When running under Claude Code with the hook configuration in `hooks/claude-code.json`, the system automatically injects relevant memory context before each turn and extracts durable facts after each turn — without the agent explicitly calling any tool.
|
|
135
|
+
|
|
136
|
+
### Prepare phase (before the turn)
|
|
137
|
+
|
|
138
|
+
`prepare_hook.py` fires on the `UserPromptSubmit` event. It:
|
|
139
|
+
|
|
140
|
+
1. Extracts candidate entity tokens from the user's message (stop-word filtered, minimum 4 characters).
|
|
141
|
+
2. Queries the graph for facts whose values contain those tokens, using `:valid-at` set to the current UTC timestamp so only currently-valid facts are returned.
|
|
142
|
+
3. Falls back to a broad scan (capped by `MINIGRAF_PREPARE_SCAN_LIMIT`, default 50 rows) when no entity-specific results are found.
|
|
143
|
+
4. Returns the results as `additionalContext` prepended to the agent's working context for that turn.
|
|
144
|
+
|
|
145
|
+
For messages containing temporal signals (e.g. "before", "last week", "as of") with an explicit ISO date, `:valid-at` is set to that date instead (midnight UTC), enabling point-in-time recall.
|
|
146
|
+
|
|
147
|
+
### Finalize phase (after the turn)
|
|
148
|
+
|
|
149
|
+
`finalize_hook.py` fires on the `Stop` event. It reads the last user+assistant exchange from the transcript, then runs the configured extraction strategy:
|
|
150
|
+
|
|
151
|
+
| Strategy | Behaviour |
|
|
152
|
+
|----------|-----------|
|
|
153
|
+
| `heuristic` (default) | Regex patterns detect decision-signal phrases ("we'll use X", "decided to use X", "always use X", "depends on X", …) and transact the matched tokens as `:decision/`, `:preference/`, `:constraint/`, or `:dependency/` entities. |
|
|
154
|
+
| `llm` | Sends the exchange to a lightweight Claude model (`claude-haiku-4-5-20251001` by default) with a structured prompt. The model returns a Datalog `transact` expression; an optional `; valid-at: YYYY-MM-DD` comment sets the fact's valid time. Falls back to the `agent` strategy on error. |
|
|
155
|
+
| `agent` | Uses MCP sampling to ask the connected agent itself for a memory block in the same Datalog format. |
|
|
156
|
+
|
|
157
|
+
### Configuration
|
|
158
|
+
|
|
159
|
+
| Environment variable | Default | Effect |
|
|
160
|
+
|----------------------|---------|--------|
|
|
161
|
+
| `MINIGRAF_EXTRACTION_STRATEGY` | `heuristic` | Finalize strategy: `heuristic`, `llm`, or `agent` |
|
|
162
|
+
| `MINIGRAF_PREPARE_SCAN_LIMIT` | `50` | Max rows returned by the broad fallback scan in the prepare phase |
|
|
163
|
+
| `MINIGRAF_LLM_MODEL` | `claude-haiku-4-5-20251001` | Model used when `MINIGRAF_EXTRACTION_STRATEGY=llm` |
|
|
164
|
+
| `ANTHROPIC_API_KEY` | — | Required when `MINIGRAF_EXTRACTION_STRATEGY=llm` and using a Claude model |
|
|
165
|
+
| `OPENAI_API_KEY` | — | Required when `MINIGRAF_EXTRACTION_STRATEGY=llm` and `MINIGRAF_LLM_MODEL` is an OpenAI model (e.g. `gpt-4o-mini`) |
|
|
166
|
+
| `MINIGRAF_GRAPH_PATH` | `memory.graph` | Override the graph file location |
|
|
167
|
+
|
|
168
|
+
## Files
|
|
169
|
+
|
|
170
|
+
| File | Purpose |
|
|
171
|
+
|------|---------|
|
|
172
|
+
| `mcp_server.py` | Persistent stdio MCP server — primary interface to the graph |
|
|
173
|
+
| `minigraf.py` | Python CLI wrapper (direct use outside MCP) |
|
|
174
|
+
| `hooks/prepare_hook.py` | Claude Code UserPromptSubmit hook — injects memory context |
|
|
175
|
+
| `hooks/ingest_hook.py` | Claude Code UserPromptSubmit hook — triggers background git ingestion |
|
|
176
|
+
| `hooks/finalize_hook.py` | Claude Code Stop hook — extracts and stores facts |
|
|
177
|
+
| `hooks/claude-code.json` | Hook + MCP configuration for Claude Code |
|
|
178
|
+
| `report_issue.py` | GitHub issue reporter |
|
|
179
|
+
| `install.py` | Setup script |
|
|
180
|
+
| `pyproject.toml` | Python packaging |
|
|
181
|
+
| `tools/*.json` | Tool schemas |
|
|
182
|
+
|
|
183
|
+
## Tools
|
|
184
|
+
|
|
185
|
+
- **minigraf_query** — Query memory with Datalog
|
|
186
|
+
- **minigraf_transact** — Store facts (reason required)
|
|
187
|
+
- **minigraf_retract** — Retract facts (original stays in history)
|
|
188
|
+
- **minigraf_report_issue** — File GitHub issues
|
|
189
|
+
- **memory_prepare_turn** — Retrieve relevant context for the current user message
|
|
190
|
+
- **memory_finalize_turn** — Extract and store memorable facts after a turn
|
|
191
|
+
- **minigraf_audit** — Audit all entities against the schema; retracts violators (history preserved)
|
|
192
|
+
- **minigraf_ingest_git** — Ingest code structure from git history into the bi-temporal graph (background task)
|
|
193
|
+
- **minigraf_ingest_status** — Poll progress of a running git ingestion; reports wall-clock time and final commit hash of the last completed run (including hook-driven ingestion)

## Query Examples

```python
# Basic query
query("[:find ?x :where [?e :attr ?x]]")

# Temporal query (state as of transaction 5)
query("[:find ?x :as-of 5 :where [?e :attr ?x]]")

# Aggregation
query("[:find (count ?e) :where [?e :decision/description ?d]]")

# Single-hop graph traversal — what does api-gateway call?
query("[:find ?name :where [:project/api-gateway :calls ?svc] [?svc :name ?name]]")

# Two-hop join (transitive impact): what depends on key-store through exactly one intermediate?
# Direct dependents need the single-hop pattern [?svc :depends-on :project/key-store].
query("""[:find ?svc
         :where [?mid :depends-on :project/key-store]
                [?svc :depends-on ?mid]]""")

# Decision traceability — why did we choose asyncio?
query("[:find ?reason :where [:decision/asyncio-choice :motivated-by ?c] [?c :description ?reason]]")

# Typed entity query — list all stored components
query("[:find ?name :where [?e :entity-type :type/component] [?e :name ?name]]")
```
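
To make the `:as-of` and two-hop examples concrete, here is a toy in-memory model in plain Python. It is not minigraf's implementation or API. It assumes only what the examples imply: facts are (entity, attribute, value) triples stamped with a transaction number, and a retraction hides a fact from later states while the original assertion stays in the log. `Fact`, `as_of`, `two_hop_dependents`, and the `:project/auth` / `:project/billing` entities are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    value: str
    tx: int             # transaction that asserted or retracted the fact
    added: bool = True  # False marks a retraction; the log is never edited


def as_of(log: Iterable[Fact], tx: int) -> Set[Triple]:
    """Replay the log up to and including transaction `tx` (like `:as-of tx`)."""
    state: Set[Triple] = set()
    for fact in sorted(log, key=lambda f: f.tx):
        if fact.tx > tx:
            break
        triple = (fact.entity, fact.attribute, fact.value)
        if fact.added:
            state.add(triple)
        else:
            state.discard(triple)
    return state


def two_hop_dependents(db: Set[Triple], target: str) -> Set[str]:
    """Evaluate [?mid :depends-on target] [?svc :depends-on ?mid], joining on ?mid."""
    mids = {e for (e, a, v) in db if a == ":depends-on" and v == target}
    return {e for (e, a, v) in db if a == ":depends-on" and v in mids}


# Hypothetical sample data.
log: List[Fact] = [
    Fact(":project/auth", ":depends-on", ":project/key-store", tx=1),
    Fact(":project/api-gateway", ":depends-on", ":project/auth", tx=2),
    Fact(":project/billing", ":depends-on", ":project/key-store", tx=3),
    Fact(":project/api-gateway", ":depends-on", ":project/auth", tx=4, added=False),
]

print(two_hop_dependents(as_of(log, 3), ":project/key-store"))  # {':project/api-gateway'}
print(two_hop_dependents(as_of(log, 4), ":project/key-store"))  # set()
```

`:project/billing`, a direct dependent, is never returned: the two-pattern join only reaches services exactly two hops from `key-store`. The retraction at transaction 4 removes the edge from the current state, yet replaying to transaction 3 still recovers it.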

## Skill Benchmarks

Twelve evals, each run in an isolated sandbox, measure how the skill changes agent behavior relative to a no-skill baseline. Each eval starts from a fresh graph, pre-seeded with state where relevant.

| Eval | What it tests | With Skill | Without Skill |
|------|--------------|-----------|---------------|
| 1 — Decision storage | Persists architectural decisions with correct naming + reasons | 5/5 | 0/5 |
| 2 — Memory retrieval | Queries memory and cites stored facts by name | 4/5 | 3/5 |
| 3 — Cross-session preference | Discovers and applies a constraint never stated in the current conversation | 4/4 | 0/4 |
| 4 — Conflict detection | Surfaces architectural conflicts rather than silently overriding decisions | 4/4 | 0/4 |
| 5 — Entity reference storage | Stores relationships as traversable graph edges, not dead-end strings | 5/5 | 0/5 |
| 6 — Transitive impact analysis | Traverses a multi-hop dependency chain to find all affected services | 5/5 | 4/5 |
| 7 — Decision traceability | Follows a `:motivated-by` edge to surface the constraint behind a decision | 5/5 | 1/5 |
| 8 — Git ingestion | Checks status before starting ingestion; moves on without polling | 6/6 | 0/6 |
| 9 — Ingest status | Reports idle/running/complete accurately; surfaces errors | 5/5 | 0/5 |
| 10 — Memory prepare-turn | Injects relevant context before the agent responds | 5/5 | 0/5 |
| 11 — Audit | Detects and retracts schema violations | 4/5 | 0/5 |
| 12 — Already running | Does not re-trigger ingestion when already in progress | 4/5 | 2/5 |
| **Total** | | **56/59 (95%)** | **10/59 (17%)** |
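
The Total row can be re-derived from the per-eval scores. This short check sums the (passed, total) pairs copied from the table, in eval order, and rounds each pass rate to a whole percentage.

```python
# (passed, total) pairs copied from the table, evals 1-12 in order.
with_skill = [(5, 5), (4, 5), (4, 4), (4, 4), (5, 5), (5, 5),
              (5, 5), (6, 6), (5, 5), (5, 5), (4, 5), (4, 5)]
without_skill = [(0, 5), (3, 5), (0, 4), (0, 4), (0, 5), (4, 5),
                 (1, 5), (0, 6), (0, 5), (0, 5), (0, 5), (2, 5)]


def summarize(scores):
    """Return (passed, total, pass rate rounded to a whole percent)."""
    passed = sum(p for p, _ in scores)
    total = sum(t for _, t in scores)
    return passed, total, round(100 * passed / total)


print(summarize(with_skill))     # (56, 59, 95)
print(summarize(without_skill))  # (10, 59, 17)
```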

The cross-session preference eval is the most discriminating for memory recall: the prompt says "make sure it fits with how we do things" with no hint that a relevant constraint exists. The skill queries memory, finds a stored no-mocks preference, and writes a test using real database connections.

The transitive impact eval is the most discriminating for graph traversal: given "key-store is being replaced — what breaks?" the skill executes a 2-hop Datalog join and returns a full impact chain; without it, the agent correctly admits it cannot name the affected services.

See [`evals/benchmark.md`](evals/benchmark.md) for full results and per-eval breakdowns.

## Phases

- **Phase 1** — Python skill layer ✓
- **Phase 2** — Write policy, report_issue, install, skill benchmarks ✓
- **Phase 3** — MCP server, per-turn auto-memory hooks ✓
- **Phase 4** — Entity normalization, schema-aware extraction, minigraf_audit ✓
- **Phase 5** — Code structure ingestion from git history, minigraf_ingest_git ✓
- **Phase 6** — Observability and trust for automatic memory (planned)