@kiyeonjeon21/datacontext 0.3.2 → 0.4.0

Files changed (52)
  1. package/dist/adapters/sqlite.d.ts.map +1 -1
  2. package/dist/adapters/sqlite.js +13 -0
  3. package/dist/adapters/sqlite.js.map +1 -1
  4. package/dist/api/server.d.ts.map +1 -1
  5. package/dist/api/server.js +115 -0
  6. package/dist/api/server.js.map +1 -1
  7. package/dist/cli/index.js +58 -14
  8. package/dist/cli/index.js.map +1 -1
  9. package/dist/core/context-service.d.ts +63 -0
  10. package/dist/core/context-service.d.ts.map +1 -1
  11. package/dist/core/context-service.js +66 -0
  12. package/dist/core/context-service.js.map +1 -1
  13. package/dist/core/harvester.d.ts +57 -5
  14. package/dist/core/harvester.d.ts.map +1 -1
  15. package/dist/core/harvester.js +86 -6
  16. package/dist/core/harvester.js.map +1 -1
  17. package/dist/core/types.d.ts +21 -5
  18. package/dist/core/types.d.ts.map +1 -1
  19. package/dist/index.d.ts +2 -1
  20. package/dist/index.d.ts.map +1 -1
  21. package/dist/index.js +9 -1
  22. package/dist/index.js.map +1 -1
  23. package/dist/knowledge/store.d.ts +186 -3
  24. package/dist/knowledge/store.d.ts.map +1 -1
  25. package/dist/knowledge/store.js +389 -5
  26. package/dist/knowledge/store.js.map +1 -1
  27. package/dist/knowledge/types.d.ts +252 -4
  28. package/dist/knowledge/types.d.ts.map +1 -1
  29. package/dist/knowledge/types.js +138 -1
  30. package/dist/knowledge/types.js.map +1 -1
  31. package/dist/mcp/tools.d.ts.map +1 -1
  32. package/dist/mcp/tools.js +231 -3
  33. package/dist/mcp/tools.js.map +1 -1
  34. package/docs/KNOWLEDGE_GRAPH.md +540 -0
  35. package/docs/KNOWLEDGE_TYPES.md +261 -0
  36. package/docs/MULTI_DB_ARCHITECTURE.md +319 -0
  37. package/package.json +1 -1
  38. package/scripts/create-sqlite-testdb.sh +75 -0
  39. package/scripts/test-databases.sh +324 -0
  40. package/sqlite:./test-sqlite.db +0 -0
  41. package/src/adapters/sqlite.ts +16 -0
  42. package/src/api/server.ts +134 -0
  43. package/src/cli/index.ts +57 -16
  44. package/src/core/context-service.ts +70 -0
  45. package/src/core/harvester.ts +120 -8
  46. package/src/core/types.ts +21 -5
  47. package/src/index.ts +19 -1
  48. package/src/knowledge/store.ts +480 -6
  49. package/src/knowledge/types.ts +321 -4
  50. package/src/mcp/tools.ts +273 -3
  51. package/test-sqlite.db +0 -0
  52. package/tests/knowledge-store.test.ts +130 -0
@@ -0,0 +1,540 @@
1
+ # Knowledge Graph Architecture
2
+
3
+ > **Last Updated:** 2026-01-01
4
+ > **Version:** 2.0.0
5
+ > **Status:** Implemented (Phase 1.5)
6
+
7
+ This document describes the Knowledge Graph architecture in DataContext, including data structures, graph traversal methods, and usage patterns.
8
+
9
+ ---
10
+
11
+ ## Overview
12
+
13
+ DataContext's Knowledge Store has evolved from a simple key-value store into a **graph-based knowledge model**. This enables:
14
+
15
+ 1. **Relationship Discovery**: Automatically find how tables connect
16
+ 2. **Join Path Finding**: Calculate optimal join paths between tables
17
+ 3. **Context Enrichment**: Provide AI with relationship information
18
+ 4. **Schema Understanding**: Help AI understand data model structure
19
+
20
+ ```
21
+ ┌─────────────────────────────────────────────────────────────────┐
22
+ │                         Knowledge Graph                         │
23
+ │                                                                 │
24
+ │   [users] ◄────────── [orders] ────────────► [products]         │
25
+ │      │                   │                       │              │
26
+ │      │                   ▼                       │              │
27
+ │      │             [order_items] ◄───────────────┘              │
28
+ │      │                                                          │
29
+ │      ▼                                                          │
30
+ │   [profiles]                                                    │
31
+ │                                                                 │
32
+ │   Legend:                                                       │
33
+ │   ────► Foreign Key relationship                                │
34
+ │   ◄──── Reverse relationship                                    │
35
+ │                                                                 │
36
+ └─────────────────────────────────────────────────────────────────┘
37
+ ```
38
+
39
+ ---
40
+
41
+ ## Data Model
42
+
43
+ ### Graph Structure
44
+
45
+ | Concept | Implementation | Description |
46
+ |---------|---------------|-------------|
47
+ | **Nodes** | `TableDescription` | Tables with business context |
48
+ | **Edges** | `TableRelationship` | Relationships between tables |
49
+ | **Annotations** | `QueryExample`, `BusinessRule`, `BusinessTerm` | Context attached to nodes |
50
+
51
+ ### Core Types
52
+
53
+ #### TableRef — Qualified Table Reference
54
+
55
+ ```typescript
56
+ interface TableRef {
57
+ database?: string; // Optional for multi-DB (future)
58
+ schema: string; // e.g., "public"
59
+ table: string; // e.g., "users"
60
+ }
61
+ ```
62
+
63
+ `TableRef` allows fully qualified references such as `analytics.public.events`, which prepares the model for future multi-database support.
64
+
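The helpers listed later under Factory Functions (`createTableRef`, `tableRefToString`, `tableRefsEqual`) could behave roughly as sketched below. This is not the package's actual implementation: the `"public"` default schema and the choice to prepend `database` when it is set are assumptions made for illustration.

```typescript
interface TableRef {
  database?: string;
  schema: string;
  table: string;
}

// Assumption: schema defaults to "public" when omitted.
function createTableRef(table: string, schema = "public", database?: string): TableRef {
  return database ? { database, schema, table } : { schema, table };
}

// Assumption: the database is prepended only when present,
// giving "schema.table" or "database.schema.table".
function tableRefToString(ref: TableRef): string {
  return [ref.database, ref.schema, ref.table]
    .filter((part): part is string => Boolean(part))
    .join(".");
}

// Two refs are equal when database (if any), schema, and table all match.
function tableRefsEqual(a: TableRef, b: TableRef): boolean {
  return (a.database ?? "") === (b.database ?? "")
    && a.schema === b.schema
    && a.table === b.table;
}
```

With these definitions, `tableRefToString(createTableRef('events', 'public', 'analytics'))` yields `analytics.public.events`.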
65
+ #### TableRelationship — Graph Edge
66
+
67
+ ```typescript
68
+ interface TableRelationship extends KnowledgeMeta {
69
+ type: 'table_relationship';
70
+
71
+ // Graph edge endpoints
72
+ from: TableRef; // Source table (FK side)
73
+ to: TableRef; // Target table (referenced side)
74
+
75
+ // Relationship metadata
76
+ relationshipType: RelationshipType; // 'foreign_key' | 'implicit_join' | 'manual'
77
+ joinCondition: string; // SQL: "orders.user_id = users.id"
78
+ cardinality?: RelationshipCardinality; // 'one-to-one' | 'one-to-many' | etc.
79
+
80
+ // Column details
81
+ fromColumns?: string[]; // ["user_id"]
82
+ toColumns?: string[]; // ["id"]
83
+
84
+ // Additional context
85
+ isPreferred: boolean; // Preferred join path when multiple exist
86
+ constraintName?: string; // FK constraint name from DB
87
+ notes?: string; // Human-readable notes
88
+ }
89
+ ```
90
+
91
+ ### Relationship Types
92
+
93
+ | Type | Source | Description |
94
+ |------|--------|-------------|
95
+ | `foreign_key` | Auto-harvested | From DB FK constraints |
96
+ | `implicit_join` | Learned | Inferred from query patterns (future) |
97
+ | `manual` | User-defined | Manually added by user |
98
+
99
+ ### Cardinality
100
+
101
+ | Value | Description | Example |
102
+ |-------|-------------|---------|
103
+ | `one-to-one` | Each row maps to exactly one row | user ↔ profile |
104
+ | `one-to-many` | One row maps to many rows | user → orders |
105
+ | `many-to-one` | Many rows map to one row | orders → user |
106
+ | `many-to-many` | Requires junction table | students ↔ courses |
107
+
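Because `one-to-many` and `many-to-one` describe the same edge seen from opposite ends, code that walks a relationship from `to` back to `from` may want to flip the cardinality. A minimal sketch, assuming the four values in the table above are the full set; `reverseCardinality` is a hypothetical helper, not part of the documented API:

```typescript
type RelationshipCardinality =
  | "one-to-one"
  | "one-to-many"
  | "many-to-one"
  | "many-to-many";

// Hypothetical helper: cardinality as seen when the edge is read backwards.
function reverseCardinality(c: RelationshipCardinality): RelationshipCardinality {
  switch (c) {
    case "one-to-many":
      return "many-to-one";
    case "many-to-one":
      return "one-to-many";
    default:
      // one-to-one and many-to-many are symmetric.
      return c;
  }
}
```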
108
+ ---
109
+
110
+ ## Usage
111
+
112
+ ### Auto-Harvesting Relationships
113
+
114
+ Foreign keys are automatically converted to relationships during harvesting:
115
+
116
+ ```typescript
117
+ // Harvest database metadata
118
+ const result = await service.harvest('public');
119
+
120
+ console.log(`Relationships added: ${result.relationshipsAdded}`);
121
+
122
+ // Result includes:
123
+ // - tablesProcessed: 10
124
+ // - descriptionsAdded: 5
125
+ // - columnsAdded: 20
126
+ // - relationshipsAdded: 8 ← NEW
127
+ ```
128
+
129
+ ### Querying Relationships
130
+
131
+ #### Get Related Tables (1-hop)
132
+
133
+ ```typescript
134
+ // Find all tables connected to 'orders'
135
+ const relationships = knowledge.getRelatedTables('orders', 'public');
136
+
137
+ // Returns:
138
+ // - orders → users (user_id → id)
139
+ // - orders → products (product_id → id)
140
+ // - order_items → orders (order_id → id)
141
+ ```
142
+
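The 1-hop lookup returns edges in both directions, which is why `order_items → orders` appears in the result for `orders`. The sketch below shows one way such a filter could work over an in-memory edge list. The function names and the `"public"` default are assumptions for illustration, not the store's actual code.

```typescript
interface TableRef {
  schema: string;
  table: string;
}

interface Relationship {
  from: TableRef;
  to: TableRef;
  joinCondition: string;
}

const isTable = (ref: TableRef, table: string, schema: string): boolean =>
  ref.table === table && ref.schema === schema;

// Edges where the table is the source (FK side).
function outgoingOf(rels: Relationship[], table: string, schema = "public"): Relationship[] {
  return rels.filter((r) => isTable(r.from, table, schema));
}

// Edges where the table is the target (referenced side).
function incomingOf(rels: Relationship[], table: string, schema = "public"): Relationship[] {
  return rels.filter((r) => isTable(r.to, table, schema));
}

// 1-hop neighbourhood: any edge touching the table, in either direction.
function relatedOneHop(rels: Relationship[], table: string, schema = "public"): Relationship[] {
  return rels.filter((r) => isTable(r.from, table, schema) || isTable(r.to, table, schema));
}
```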
143
+ #### Find Join Path (Multi-hop)
144
+
145
+ ```typescript
146
+ // Find shortest path from order_items to customers
147
+ const path = knowledge.findJoinPath('order_items', 'customers', 'public');
148
+
149
+ // Returns array of relationships:
150
+ // [
151
+ // { from: 'order_items', to: 'orders', joinCondition: '...' },
152
+ // { from: 'orders', to: 'customers', joinCondition: '...' }
153
+ // ]
154
+ ```
155
+
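The API reference describes `findJoinPath` as a shortest-path search using BFS. The sketch below illustrates that technique on a simplified edge shape. It treats edges as traversable in both directions, which is an assumption, and `bfsJoinPath` is a hypothetical name rather than the library's implementation.

```typescript
interface Edge {
  from: string;
  to: string;
  joinCondition: string;
}

// Breadth-first search: the first time the goal is reached, the path is
// a shortest one in terms of hop count. Returns null when no path exists.
function bfsJoinPath(edges: Edge[], start: string, goal: string): Edge[] | null {
  if (start === goal) return [];
  const visited = new Set<string>([start]);
  const queue: Array<{ table: string; path: Edge[] }> = [{ table: start, path: [] }];

  while (queue.length > 0) {
    const current = queue.shift();
    if (!current) break;
    for (const edge of edges) {
      // Assumption: an edge can be followed from either endpoint.
      const next = edge.from === current.table ? edge.to
        : edge.to === current.table ? edge.from
        : null;
      if (next === null || visited.has(next)) continue;
      const path = [...current.path, edge];
      if (next === goal) return path;
      visited.add(next);
      queue.push({ table: next, path });
    }
  }
  return null;
}
```

Scanning every edge per visited table is adequate for small graphs; an adjacency map would avoid the repeated scan on larger ones.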
156
+ #### Get Relationship Between Tables
157
+
158
+ ```typescript
159
+ const rel = knowledge.getRelationshipBetween('orders', 'users', 'public');
160
+
161
+ if (rel) {
162
+ console.log(`Join: ${rel.joinCondition}`);
163
+ // "orders.user_id = users.id"
164
+ }
165
+ ```
166
+
167
+ ### Adding Manual Relationships
168
+
169
+ ```typescript
170
+ // Add an implicit join (no FK in database)
171
+ await knowledge.addRelationship({
172
+ ...createKnowledgeMeta('user', schemaHash),
173
+ type: 'table_relationship',
174
+ from: createTableRef('events', 'public'),
175
+ to: createTableRef('users', 'public'),
176
+ relationshipType: 'implicit_join',
177
+ joinCondition: 'events.actor_id = users.id',
178
+ cardinality: 'many-to-one',
179
+ isPreferred: false,
180
+ notes: 'Legacy system - no FK constraint exists',
181
+ });
182
+
183
+ // Or use the convenience method for FK-style relationships
184
+ await knowledge.addRelationshipFromFK({
185
+ fromTable: 'orders',
186
+ fromColumn: 'user_id',
187
+ toTable: 'users',
188
+ toColumn: 'id',
189
+ });
190
+ ```
191
+
192
+ ### Graph Summary
193
+
194
+ ```typescript
195
+ const summary = knowledge.getGraphSummary();
196
+
197
+ // Returns:
198
+ // {
199
+ // nodeCount: 15, // Tables with descriptions
200
+ // edgeCount: 12, // Relationships
201
+ // tablesWithRelationships: 10,
202
+ // isolatedTables: 5, // Tables with no relationships
203
+ // relationshipsByType: {
204
+ // foreign_key: 10,
205
+ // implicit_join: 2,
206
+ // manual: 0
207
+ // }
208
+ // }
209
+ ```
210
+
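The summary fields are related: `isolatedTables` is `nodeCount` minus `tablesWithRelationships` (15 − 10 = 5 in the example above). A minimal sketch of how such a summary could be derived from a table list and an edge list follows; `summarizeGraph` and the simplified `Edge` shape are hypothetical.

```typescript
type RelationshipType = "foreign_key" | "implicit_join" | "manual";

interface Edge {
  from: string;
  to: string;
  relationshipType: RelationshipType;
}

interface GraphSummary {
  nodeCount: number;
  edgeCount: number;
  tablesWithRelationships: number;
  isolatedTables: number;
  relationshipsByType: Record<RelationshipType, number>;
}

function summarizeGraph(tables: string[], edges: Edge[]): GraphSummary {
  const connected = new Set<string>();
  const relationshipsByType: Record<RelationshipType, number> = {
    foreign_key: 0,
    implicit_join: 0,
    manual: 0,
  };
  for (const edge of edges) {
    connected.add(edge.from);
    connected.add(edge.to);
    relationshipsByType[edge.relationshipType] += 1;
  }
  // Only count tables that are actual nodes (have descriptions).
  const tablesWithRelationships = tables.filter((t) => connected.has(t)).length;
  return {
    nodeCount: tables.length,
    edgeCount: edges.length,
    tablesWithRelationships,
    isolatedTables: tables.length - tablesWithRelationships,
    relationshipsByType,
  };
}
```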
211
+ ### Building AI Context with Relationships
212
+
213
+ ```typescript
214
+ // Standard context (descriptions, rules, examples)
215
+ const context = knowledge.buildContext(['orders', 'users']);
216
+
217
+ // Enhanced context including relationships
218
+ const fullContext = knowledge.buildContextWithRelationships(['orders', 'users']);
219
+
220
+ // Output includes:
221
+ // ## Table Relationships
222
+ // Use these relationships when joining tables:
223
+ //
224
+ // - **public.orders** → **public.users**
225
+ // Join: `orders.user_id = users.id`
226
+ // Cardinality: many-to-one
227
+ ```
228
+
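To make the output format above concrete, here is a sketch of a renderer that produces the same Markdown section from a list of relationships. `renderRelationships` and the flattened `RelationshipView` shape are assumptions for illustration; the real method may filter, order, or indent entries differently.

```typescript
interface RelationshipView {
  from: string; // e.g. "public.orders"
  to: string; // e.g. "public.users"
  joinCondition: string;
  cardinality?: string;
}

function renderRelationships(rels: RelationshipView[]): string {
  if (rels.length === 0) return "";
  const lines: string[] = [
    "## Table Relationships",
    "Use these relationships when joining tables:",
    "",
  ];
  for (const rel of rels) {
    lines.push(`- **${rel.from}** → **${rel.to}**`);
    lines.push("  Join: `" + rel.joinCondition + "`");
    // Cardinality is optional on the edge, so only print it when known.
    if (rel.cardinality) lines.push(`  Cardinality: ${rel.cardinality}`);
  }
  return lines.join("\n");
}
```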
229
+ ---
230
+
231
+ ## Data Flow
232
+
233
+ ### Harvesting Flow
234
+
235
+ ```
236
+ ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
237
+ │   Database   │     │  Harvester   │     │  Knowledge   │
238
+ │              │     │              │     │    Store     │
239
+ └──────┬───────┘     └──────┬───────┘     └──────┬───────┘
240
+        │                    │                    │
241
+        │ 1. Query FK        │                    │
242
+        │◄───────────────────│                    │
243
+        │                    │                    │
244
+        │ 2. Return FK info  │                    │
245
+        │────────────────────►                    │
246
+        │                    │                    │
247
+        │                    │ 3. Convert to      │
248
+        │                    │    Relationship    │
249
+        │                    │────────────────────►
250
+        │                    │                    │
251
+        │                    │                    │ 4. Store
252
+        │                    │                    │    (dedupe)
253
+        │                    │                    │
254
+ ```
255
+
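Steps 3 and 4 of the flow above can be sketched as a conversion function plus a store that skips edges it already holds. The deduplication key (endpoints plus join condition), the `ForeignKey` shape, and the function names are assumptions; the source does not say how the store detects duplicates.

```typescript
interface ForeignKey {
  schema: string;
  fromTable: string;
  fromColumn: string;
  toTable: string;
  toColumn: string;
}

interface Relationship {
  from: { schema: string; table: string };
  to: { schema: string; table: string };
  relationshipType: "foreign_key";
  joinCondition: string;
  fromColumns: string[];
  toColumns: string[];
  isPreferred: boolean;
}

// Step 3: convert harvested FK metadata into a graph edge.
function relationshipFromFK(fk: ForeignKey): Relationship {
  return {
    from: { schema: fk.schema, table: fk.fromTable },
    to: { schema: fk.schema, table: fk.toTable },
    relationshipType: "foreign_key",
    joinCondition: `${fk.fromTable}.${fk.fromColumn} = ${fk.toTable}.${fk.toColumn}`,
    fromColumns: [fk.fromColumn],
    toColumns: [fk.toColumn],
    isPreferred: true,
  };
}

// Assumed identity of an edge for deduplication purposes.
const edgeKey = (r: Relationship): string =>
  `${r.from.schema}.${r.from.table}->${r.to.schema}.${r.to.table}:${r.joinCondition}`;

// Step 4: append only edges that are not already stored; returns the count added.
function storeDeduped(store: Relationship[], incoming: Relationship[]): number {
  const seen = new Set(store.map(edgeKey));
  let added = 0;
  for (const rel of incoming) {
    const key = edgeKey(rel);
    if (seen.has(key)) continue;
    seen.add(key);
    store.push(rel);
    added += 1;
  }
  return added;
}
```

Running the harvest twice against an unchanged schema would then report the relationships as added on the first pass and zero on the second.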
256
+ ### Query Assistance Flow
257
+
258
+ ```
259
+ ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
260
+ │      AI      │     │ DataContext  │     │  Knowledge   │
261
+ │              │     │   Service    │     │    Store     │
262
+ └──────┬───────┘     └──────┬───────┘     └──────┬───────┘
263
+        │                    │                    │
264
+        │ "Join orders and   │                    │
265
+        │  customers"        │                    │
266
+        │────────────────────►                    │
267
+        │                    │                    │
268
+        │                    │ findJoinPath()     │
269
+        │                    │────────────────────►
270
+        │                    │                    │
271
+        │                    │ [orders→users→     │
272
+        │                    │  customers]        │
273
+        │                    │◄───────────────────│
274
+        │                    │                    │
275
+        │ Context with       │                    │
276
+        │ join conditions    │                    │
277
+        │◄───────────────────│                    │
278
+ ```
279
+
280
+ ---
281
+
282
+ ## Storage Format
283
+
284
+ Knowledge is stored in JSON files at `~/.datacontext/{database_id}.json`:
285
+
286
+ ```json
287
+ {
288
+ "version": "2.0.0",
289
+ "databaseId": "mydb",
290
+ "schemaHash": "abc123...",
291
+ "lastSyncAt": "2026-01-01T00:00:00.000Z",
292
+
293
+ "tableDescriptions": [...],
294
+
295
+ "tableRelationships": [
296
+ {
297
+ "id": "1234-abc",
298
+ "type": "table_relationship",
299
+ "from": { "schema": "public", "table": "orders" },
300
+ "to": { "schema": "public", "table": "users" },
301
+ "relationshipType": "foreign_key",
302
+ "joinCondition": "orders.user_id = users.id",
303
+ "cardinality": "many-to-one",
304
+ "fromColumns": ["user_id"],
305
+ "toColumns": ["id"],
306
+ "isPreferred": true,
307
+ "source": "auto",
308
+ "confidence": 0.8,
309
+ "schemaHash": "abc123...",
310
+ "createdAt": "2026-01-01T00:00:00.000Z",
311
+ "updatedAt": "2026-01-01T00:00:00.000Z",
312
+ "lastVerifiedAt": "2026-01-01T00:00:00.000Z"
313
+ }
314
+ ],
315
+
316
+ "queryExamples": [...],
317
+ "businessRules": [...],
318
+ "businessTerms": [...]
319
+ }
320
+ ```
321
+
322
+ ---
323
+
324
+ ## Performance Considerations
325
+
326
+ ### Current Implementation (JSON)
327
+
328
+ - **Node count**: intended for graphs of fewer than 500 tables
329
+ - **Edge count**: intended for fewer than 2,000 relationships
330
+ - **Query complexity**: a 2-hop traversal in under 10 ms
331
+
332
+ ### Future: SQLite Migration
333
+
334
+ For larger graphs:
335
+
336
+ ```sql
337
+ CREATE TABLE relationships (
338
+ id TEXT PRIMARY KEY,
339
+ from_schema TEXT NOT NULL,
340
+ from_table TEXT NOT NULL,
341
+ to_schema TEXT NOT NULL,
342
+ to_table TEXT NOT NULL,
343
+ relationship_type TEXT NOT NULL,
344
+ join_condition TEXT
345
+ -- ... metadata
346
+ );
347
+
348
+ CREATE INDEX idx_rel_from ON relationships(from_table);
349
+ CREATE INDEX idx_rel_to ON relationships(to_table);
350
+ ```
351
+
352
+ ---
353
+
354
+ ## Migration from v1.x
355
+
356
+ Data files from v1.x are automatically migrated:
357
+
358
+ 1. `tableRelationships` array is initialized if missing
359
+ 2. Version is updated to `2.0.0`
360
+ 3. Existing data is preserved
361
+
362
+ ```typescript
363
+ // Migration happens automatically on load()
364
+ await knowledge.load();
365
+
366
+ // Old files without tableRelationships work fine
367
+ // Run harvest to populate relationships
368
+ await service.harvest('public');
369
+ ```
370
+
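The three migration steps listed above amount to a small, non-destructive transformation. A sketch under stated assumptions: the `KnowledgeFileV1` shape is reduced to a few fields, `"1.0.0"` in the usage note is an example version string, and `migrateToV2` is a hypothetical name for whatever `load()` does internally.

```typescript
interface KnowledgeFileV1 {
  version: string;
  databaseId: string;
  tableDescriptions: unknown[];
  tableRelationships?: unknown[];
}

interface KnowledgeFileV2 extends KnowledgeFileV1 {
  tableRelationships: unknown[];
}

function migrateToV2(data: KnowledgeFileV1): KnowledgeFileV2 {
  return {
    ...data, // step 3: existing data is preserved
    version: "2.0.0", // step 2: version is updated
    tableRelationships: data.tableRelationships ?? [], // step 1: initialize if missing
  };
}
```

For a file such as `{ version: "1.0.0", databaseId: "mydb", tableDescriptions: [...] }`, the result keeps `databaseId` and `tableDescriptions` untouched and gains an empty `tableRelationships` array.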
371
+ ---
372
+
373
+ ## API Reference
374
+
375
+ ### KnowledgeStore Methods
376
+
377
+ | Method | Description |
378
+ |--------|-------------|
379
+ | `getRelationships()` | Get all relationships |
380
+ | `getRelatedTables(table, schema)` | Get 1-hop related tables |
381
+ | `getOutgoingRelationships(table, schema)` | Get relationships where table is source |
382
+ | `getIncomingRelationships(table, schema)` | Get relationships where table is target |
383
+ | `findJoinPath(from, to, schema)` | Find shortest path (BFS) |
384
+ | `getRelationshipBetween(from, to, schema)` | Get direct relationship |
385
+ | `addRelationship(rel)` | Add a relationship |
386
+ | `addRelationshipFromFK(params)` | Convenience method for FK |
387
+ | `addRelationships(rels)` | Batch add |
388
+ | `deleteRelationship(id)` | Delete by ID |
389
+ | `getGraphSummary()` | Get graph statistics |
390
+ | `buildContextWithRelationships(tables)` | Build AI context with relationships |
391
+
392
+ ### Factory Functions
393
+
394
+ | Function | Description |
395
+ |----------|-------------|
396
+ | `createTableRef(table, schema?, database?)` | Create qualified table reference |
397
+ | `tableRefToString(ref)` | Convert to string "schema.table" |
398
+ | `tableRefsEqual(a, b)` | Compare two TableRefs |
399
+ | `createRelationshipFromFK(params)` | Create relationship from FK info |
400
+
401
+ ---
402
+
403
+ ## MCP Tools
404
+
405
+ Knowledge Graph functionality is exposed via MCP tools for AI assistants:
406
+
407
+ ### `get_context` (Enhanced)
408
+
409
+ Now includes relationship information:
410
+
411
+ ```json
412
+ {
413
+ "context": "## Table: orders\n...\n\n## Table Relationships\n...",
414
+ "tables": ["orders", "users"],
415
+ "relatedTables": ["order_items", "products"],
416
+ "relationships": [
417
+ {
418
+ "from": "public.orders",
419
+ "to": "public.users",
420
+ "type": "foreign_key",
421
+ "joinCondition": "orders.user_id = users.id",
422
+ "cardinality": "many-to-one"
423
+ }
424
+ ]
425
+ }
426
+ ```
427
+
428
+ ### `find_join_path`
429
+
430
+ Finds the shortest join path between two tables:
431
+
432
+ ```jsonc
433
+ // Request
434
+ { "fromTable": "order_items", "toTable": "users" }
435
+
436
+ // Response
437
+ {
438
+ "found": true,
439
+ "hops": 2,
440
+ "path": [
441
+ { "from": "order_items", "to": "orders", "joinCondition": "..." },
442
+ { "from": "orders", "to": "users", "joinCondition": "..." }
443
+ ],
444
+ "sqlHint": "FROM order_items\nJOIN orders ON ...\nJOIN users ON ..."
445
+ }
446
+ ```
447
+
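The `sqlHint` field in the response above can be derived mechanically from the path. The following sketch shows one way to do it. It assumes each step joins toward its `to` table, which holds for the example but would need adjusting if a step were traversed in reverse. `buildSqlHint` is a hypothetical name.

```typescript
interface PathStep {
  from: string;
  to: string;
  joinCondition: string;
}

function buildSqlHint(path: PathStep[]): string {
  const [first] = path;
  if (!first) return ""; // empty path: nothing to join
  const lines = [`FROM ${first.from}`];
  for (const step of path) {
    lines.push(`JOIN ${step.to} ON ${step.joinCondition}`);
  }
  return lines.join("\n");
}
```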
448
+ ### `get_relationships`
449
+
450
+ Get all relationships, optionally filtered:
451
+
452
+ ```jsonc
453
+ // Request
454
+ { "table": "orders" }
455
+
456
+ // Response
457
+ {
458
+ "count": 3,
459
+ "relationships": [...]
460
+ }
461
+ ```
462
+
463
+ ### `get_graph_summary`
464
+
465
+ Get graph statistics:
466
+
467
+ ```json
468
+ {
469
+ "nodeCount": 15,
470
+ "edgeCount": 12,
471
+ "tablesWithRelationships": 10,
472
+ "isolatedTables": 5
473
+ }
474
+ ```
475
+
476
+ ---
477
+
478
+ ## REST API Endpoints
479
+
480
+ ### `GET /api/relationships?table=orders`
481
+
482
+ Returns all relationships, optionally filtered by table.
483
+
484
+ ### `GET /api/relationships/path?from=order_items&to=users`
485
+
486
+ Finds the join path between two tables.
487
+
488
+ ### `GET /api/graph/summary`
489
+
490
+ Returns Knowledge Graph summary statistics.
491
+
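When calling these endpoints from a client, table names should be URL-encoded before being placed in the query string. The helpers below are a hypothetical sketch that only builds the request paths shown above. The host and port are left to the caller because the source does not state them.

```typescript
// Path for GET /api/relationships, with the optional table filter.
function relationshipsPath(table?: string): string {
  return table
    ? `/api/relationships?table=${encodeURIComponent(table)}`
    : "/api/relationships";
}

// Path for GET /api/relationships/path.
function joinPathPath(from: string, to: string): string {
  return `/api/relationships/path?from=${encodeURIComponent(from)}&to=${encodeURIComponent(to)}`;
}

// Path for GET /api/graph/summary.
const GRAPH_SUMMARY_PATH = "/api/graph/summary";
```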
492
+ ---
493
+
494
+ ## VS Code Extension Commands
495
+
496
+ | Command | Description |
497
+ |---------|-------------|
498
+ | `DataContext: Show Table Relationships` | Show all relationships in a markdown view |
499
+ | `DataContext: Find Join Path` | Interactive picker to find path between tables |
500
+ | `DataContext: Show Knowledge Graph Summary` | Display graph statistics |
501
+
502
+ ---
503
+
504
+ ## Future Enhancements
505
+
506
+ ### Phase 2: Implicit Join Learning
507
+
508
+ Learn relationships from query patterns:
509
+
510
+ ```typescript
511
+ // When queries frequently join on events.actor_id = users.id
512
+ // but no FK exists, create implicit_join relationship
513
+ ```
514
+
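Since this phase is not implemented yet, the following is only a speculative sketch of the idea: count how often each join condition appears in observed queries, ignore conditions already covered by a foreign key, and propose the rest once they pass a threshold. The function name, the input shape, and the default threshold of 3 are all assumptions.

```typescript
interface JoinCandidate {
  joinCondition: string;
  occurrences: number;
}

function proposeImplicitJoins(
  observedJoinConditions: string[],
  knownFkConditions: string[],
  minOccurrences = 3, // assumed threshold for "frequently"
): JoinCandidate[] {
  const known = new Set(knownFkConditions);
  const counts = new Map<string, number>();
  for (const condition of observedJoinConditions) {
    if (known.has(condition)) continue; // an FK already covers this join
    counts.set(condition, (counts.get(condition) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, occurrences]) => occurrences >= minOccurrences)
    .map(([joinCondition, occurrences]) => ({ joinCondition, occurrences }));
}
```

Each candidate could then be stored with `relationshipType: 'implicit_join'`, as in the manual example earlier in this document.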
515
+ ### Phase 2+: Multi-Database
516
+
517
+ ```typescript
518
+ // Cross-database relationships
519
+ const relationship: TableRelationship = {
520
+ from: { database: 'orders_db', schema: 'public', table: 'orders' },
521
+ to: { database: 'users_db', schema: 'public', table: 'users' },
522
+ // ...
523
+ };
524
+ ```
525
+
526
+ ### Phase 3: Graph Visualization
527
+
528
+ - Generate Mermaid diagrams
529
+ - Interactive graph explorer
530
+ - Lineage visualization
531
+
532
+ ---
533
+
534
+ ## Related Documentation
535
+
536
+ - [ARCHITECTURE.md](./ARCHITECTURE.md) — Overall system architecture
537
+ - [KNOWLEDGE_TYPES.md](./KNOWLEDGE_TYPES.md) — All knowledge type definitions
538
+ - [API.md](./API.md) — REST API reference
539
+
540
+