graphjin 3.18.10 → 3.18.22

Files changed (2)
  1. package/README.md +274 -25
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -7,9 +7,9 @@
7
7
  [![GoDoc](https://img.shields.io/badge/godoc-reference-5272B4.svg?style=for-the-badge&logo=go)](https://pkg.go.dev/github.com/dosco/graphjin/core/v3)
8
8
  [![GoReport](https://goreportcard.com/badge/github.com/gojp/goreportcard?style=for-the-badge)](https://goreportcard.com/report/github.com/dosco/graphjin/core/v3)
9
9
 
10
- Point GraphJin at any database and AI assistants can query it instantly. Auto-discovers your schema, understands relationships, compiles to optimized SQL. No configuration required.
10
+ Point GraphJin at any database or source tree and AI assistants can query it instantly. Auto-discovers your schema, understands relationships, indexes code with tree-sitter, and compiles to optimized SQL. No configuration required.
11
11
 
12
- Works with PostgreSQL, MySQL, MongoDB, SQLite, Oracle, MSSQL, Snowflake - and models from Claude/GPT-4 to local 7B models.
12
+ Works with PostgreSQL, MySQL, MongoDB, SQLite, Oracle, MSSQL, Snowflake, S3/GCS/files, CodeSQL source indexes - and models from Claude/GPT-4 to local 7B models.
13
13
 
14
14
  ## Installation
15
15
 
@@ -194,12 +194,79 @@ Copy paste the Claude Desktop Config provided by `graphjin serve` into the Claud
194
194
 
195
195
  1. **Connects to database** - Reads your schema automatically
196
196
  2. **Discovers relationships** - Foreign keys become navigable joins
197
- 3. **Exposes MCP tools** - Teach any LLM the query syntax
198
- 4. **Runs JS workflows** - Chain multiple GraphJin MCP tools in one reusable workflow
199
- 5. **Compiles to SQL** - Every request becomes a single optimized query
197
+ 3. **Exposes metadata** - `gj_*` tables make discovered databases, tables, columns, relationships, functions, and indexes queryable when the GraphJin source is enabled
198
+ 4. **Indexes source code** - CodeSQL turns tree-sitter syntax trees and database references into a managed SQLite database
199
+ 5. **Exposes MCP tools** - Teach any LLM the query syntax
200
+ 6. **Runs JS workflows** - Chain multiple GraphJin MCP tools in one reusable workflow
201
+ 7. **Compiles to SQL** - Every request becomes a single optimized query
200
202
 
201
203
  No resolvers. No ORM. No N+1 queries. Just point and query.
202
204
 
205
+ ## CodeSQL: Query Source Code Like a Database
206
+
207
+ CodeSQL is a managed source kind for source trees. Configure a source folder and GraphJin creates a SQLite cache under `config/codesql/`, indexes it with tree-sitter, and updates it on restart. In development it also watches for changes while the service runs; in production live watching is disabled.
208
+
209
+ ```yaml
210
+ sources:
211
+ - name: app
212
+ kind: sql
213
+ type: postgres
214
+ connection_string: postgres://app:secret@db/app
215
+ default: true
216
+
217
+ - name: code
218
+ kind: codesql
219
+ path: /srv/app
220
+ infer_db_refs: true
221
+
222
+ - name: graphjin
223
+ kind: graphjin
224
+ metadata: true
225
+
226
+ tables:
227
+ - name: users
228
+ source: app
229
+
230
+ - name: gj_code
231
+ source: code
232
+ read_only: true
233
+ ```
234
+
235
+ GraphJin exposes CodeSQL through one ordinary GraphQL root, `gj_code`. Use `kind` to select files, symbols, references, imports, database references, docs, parse errors, change sets, and locks:
236
+
237
+ ```graphql
238
+ query {
239
+ gj_code(where: { kind: { eq: "symbol" }, name: { iregex: "handler|resolver" } }, limit: 20) {
240
+ name
241
+ symbol_kind
242
+ language
243
+ start_row
244
+ path
245
+ hash
246
+ }
247
+ }
248
+ ```
249
+
250
+ With a `kind: graphjin` source, GraphJin creates a read-only system graph named `graphjin` by default. Schema, catalog, entrypoint, capability, workflow, and system metadata are catalog items in `gj_catalog`; table and column metadata are selected by `kind`. When one CodeSQL source is active, GraphJin links catalog items to code references automatically:
251
+
252
+ ```graphql
253
+ query {
254
+ gj_catalog(where: { kind: { eq: "column" }, table_name: { eq: "users" }, column_name: { eq: "email" } }) {
255
+ database_name
256
+ table_name
257
+ column_name
258
+ gj_code {
259
+ kind
260
+ ref_kind
261
+ path
262
+ symbol_id
263
+ }
264
+ }
265
+ }
266
+ ```
267
+
268
+ The result is that the same agent can inspect both production data systems and the code that operates on them. It can ask, "which handlers touch customer invoices?", "what tables do these workflows depend on?", or "show me the imports and call sites near this data path" without switching tools or inventing a new backend.
269
+
203
270
  ## What AI Can Do
204
271
 
205
272
  **Simple queries with filters:**
@@ -223,6 +290,22 @@ No resolvers. No ORM. No N+1 queries. Just point and query.
223
290
  { products { count_id sum_price avg_price } }
224
291
  ```
225
292
 
293
+ **Analytics directives:**
294
+ ```graphql
295
+ {
296
+ orders {
297
+ account_id
298
+ month
299
+ total
300
+ running_total: total @running(aggregate: sum, by: "account_id", orderBy: { month: asc })
301
+ moving_avg_total: total @moving(aggregate: avg, rows: 6, by: "account_id", orderBy: { month: asc })
302
+ previous_total: total @previous(by: "account_id", orderBy: { month: asc })
303
+ rank_by_total: total @rank(by: "account_id", order: desc)
304
+ }
305
+ }
306
+ ```
307
+ Use analytics directives when each original row should remain visible while adding report metrics such as running totals, moving averages, previous/next values, first/last values, and rank within a group. Ordinary one-row-per-group summaries still use `distinct` plus aggregate fields. Supported SQL databases validate analytics support at compile time; MongoDB and database versions known to be too old return clear errors.
308
+
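To make the row-preserving behaviour concrete, the sketch below mimics what `running_total` and `previous_total` would look like for a few in-memory rows. It is only an illustration of the described semantics, not how GraphJin executes the directives (those compile to SQL). The helper name `addRunningAndPrevious`, the sample rows, and the choice of `null` for a missing previous value are assumptions of this sketch.

```javascript
// Hypothetical helper: mimics running_total and previous_total on in-memory rows.
// GraphJin compiles the directives to SQL; this only illustrates the result shape.
function addRunningAndPrevious(rows) {
  // Sort by account, then month, so each group is walked in order.
  const sorted = [...rows].sort(
    (a, b) => a.account_id - b.account_id || a.month.localeCompare(b.month)
  );
  const state = new Map(); // account_id -> { sum, prev }
  return sorted.map((row) => {
    const s = state.get(row.account_id) || { sum: 0, prev: null };
    const out = {
      ...row,
      running_total: s.sum + row.total, // sum of this and all earlier rows in the group
      previous_total: s.prev, // null for the first row of a group (assumption)
    };
    state.set(row.account_id, { sum: out.running_total, prev: row.total });
    return out;
  });
}

// Made-up sample data.
const sampleOrders = [
  { account_id: 1, month: "2026-01", total: 100 },
  { account_id: 1, month: "2026-02", total: 50 },
  { account_id: 2, month: "2026-01", total: 70 },
];
const withMetrics = addRunningAndPrevious(sampleOrders);
```

Every input row is still present in `withMetrics`; the metrics are added alongside the original columns rather than collapsing the group into one row.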
226
309
  **Mutations:**
227
310
  ```graphql
228
311
  mutation {
@@ -258,19 +341,166 @@ subscription {
258
341
  - Automatic change detection - updates only sent when data actually changes
259
342
  - Built-in cursor pagination for feeds and infinite scroll
260
343
 
261
- Works from Node.js, Go, or any WebSocket client.
344
+ Subscribe over **WebSockets** (`graphql-ws` / `graphql-transport-ws` subprotocols) or **Server-Sent Events** — set `Accept: text/event-stream` on a `POST /api/v1/graphql` request and GraphJin streams `event: next` frames for each result, terminated by `event: complete`. Works from Node.js, Go, or any browser `EventSource` / WebSocket client.
345
+
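For clients that cannot use `EventSource` (it only issues `GET` requests), the SSE stream can be read from a `POST` response body and split into frames by hand. The following is a minimal sketch of that parsing step, assuming the standard SSE wire format and a JSON payload on the `data:` line. The function name `parseSseFrames` and the sample stream contents are hypothetical.

```javascript
// Parses SSE text into { event, data } frames.
// Simplified: assumes "event:" and "data:" lines with a blank line between frames.
function parseSseFrames(text) {
  const frames = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no event line is present
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    // Skip empty blocks such as the trailing one after the final blank line.
    if (dataLines.length === 0 && event === "message") continue;
    frames.push({ event, data: dataLines.join("\n") });
  }
  return frames;
}

// Hypothetical stream contents, for illustration only.
const sampleStream =
  'event: next\ndata: {"data":{"users":[{"id":1}]}}\n\n' +
  "event: complete\ndata:\n\n";
const frames = parseSseFrames(sampleStream);
```

A caller would typically handle each `next` frame as one subscription result and stop reading once a `complete` frame arrives.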
346
+ ## Filesystem Tables (Local, S3, GCS)
347
+
348
+ Object stores show up as ordinary tables in your GraphQL schema. Declare them in config and they get the same query surface as a database table — no per-storage GraphQL plumbing on your side.
349
+
350
+ ```yaml
351
+ sources:
352
+ - name: avatars
353
+ kind: filesystem
354
+ backend: s3
355
+ bucket: my-bucket
356
+ prefix: avatars/
357
+ region: us-east-1
358
+ presign_ttl: 15m
359
+
360
+ - name: invoices
361
+ kind: filesystem
362
+ backend: gcs
363
+ bucket: invoices
364
+ prefix: 2026/
365
+
366
+ - name: uploads_local
367
+ kind: filesystem
368
+ backend: local
369
+ root: /var/lib/graphjin/uploads
370
+
371
+ tables:
372
+ - name: avatars
373
+ source: avatars
374
+ read_only: true
375
+
376
+ - name: invoices
377
+ source: invoices
378
+ read_only: true
379
+
380
+ - name: uploads_local
381
+ source: uploads_local
382
+ ```
383
+
384
+ Every filesystem table exposes the same columns regardless of backend:
385
+
386
+ ```graphql
387
+ { avatars(
388
+ where: { key: { like: "users/%" } }
389
+ order_by: { key: asc }
390
+ limit: 50
391
+ ) {
392
+ key size content_type modified_at url
393
+ }
394
+ }
395
+
396
+ { avatars(id: "users/42.png") {
397
+ key size url data # data is base64 because the field was selected
398
+ }
399
+ }
400
+ ```
401
+
402
+ The legacy `prefix`, `key`, and `inline_data` arguments remain accepted, but new callers should use the normal GraphJin read surface: `id`, `where`, `order_by`, `limit`, `offset`, `first`, `last`, `after`, and `before`.
403
+ For cursor pagination, request the standard root cursor field, e.g. `avatars_cursor`, and pass it back through `after: $cursor`.
404
+
405
+ `url` is a presigned GET URL by default (15 min, configurable per table). Auth follows the standard credential chain: AWS env / `~/.aws` / IRSA / EC2 IMDS for S3, Application Default Credentials for GCS — never embedded in GraphJin config.
406
+
407
+ Slim builds drop SDK weight: `-tags no_s3` or `-tags no_gcs` excludes either backend. Custom backends register through `core.OptionSetFilesystemBackend(name, factory)` — same SDK GraphJin uses for the built-ins.
408
+
409
+ ## File Uploads
410
+
411
+ The GraphQL endpoint accepts multipart bodies per the [graphql-multipart-request-spec](https://github.com/jaydenseric/graphql-multipart-request-spec). Files can be inlined as base64 (default) or streamed straight to a filesystem table:
412
+
413
+ ```yaml
414
+ uploads:
415
+ enabled: true
416
+ storage: avatars # name of a filesystems[] entry; omit to inline as base64
417
+ storage_key_prefix: "{date}/" # {date} → YYYY/MM/DD
418
+ max_size: 25_000_000
419
+ allowed_mime: ["image/*", "application/pdf"]
420
+ ```
421
+
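As a rough illustration of how these settings interact, the sketch below applies `max_size`, `allowed_mime`, and `storage_key_prefix` to a single file. It is not GraphJin's implementation: the function names, the use of UTC for `{date}`, and the use of the uploaded filename as the key suffix are all assumptions.

```javascript
// Hypothetical config object mirroring the YAML keys above.
const uploads = {
  storage_key_prefix: "{date}/",
  max_size: 25_000_000,
  allowed_mime: ["image/*", "application/pdf"],
};

// True when `mime` equals a pattern or matches a "type/*" wildcard.
function mimeAllowed(mime, patterns) {
  return patterns.some((p) =>
    p.endsWith("/*") ? mime.startsWith(p.slice(0, -1)) : mime === p
  );
}

// Expands {date} to YYYY/MM/DD. Using UTC is an assumption of this sketch.
function expandKeyPrefix(prefix, now) {
  const y = now.getUTCFullYear();
  const m = String(now.getUTCMonth() + 1).padStart(2, "0");
  const d = String(now.getUTCDate()).padStart(2, "0");
  return prefix.replace("{date}", `${y}/${m}/${d}`);
}

// Returns the storage key for an accepted file, or throws for a rejected one.
function storageKeyFor(file, cfg, now) {
  if (file.size > cfg.max_size) throw new Error("file too large");
  if (!mimeAllowed(file.content_type, cfg.allowed_mime)) {
    throw new Error("mime type not allowed");
  }
  return expandKeyPrefix(cfg.storage_key_prefix, now) + file.filename;
}

const key = storageKeyFor(
  { filename: "abc123.png", size: 12345, content_type: "image/png" },
  uploads,
  new Date(Date.UTC(2026, 4, 8)) // 8 May 2026
);
```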
422
+ When `storage` is set, the file body is written to the backend and the GraphQL variable becomes a stable reference — mutations persist this directly into a JSONB column:
423
+
424
+ ```json
425
+ { "key": "2026/05/08/abc123.png",
426
+ "url": "https://s3.../...?presigned",
427
+ "size": 12345,
428
+ "content_type": "image/png" }
429
+ ```
430
+
431
+ When `storage` is empty the variable carries the bytes inline as `{filename, content_type, size, data}` (base64) — useful for small uploads going straight into `bytea`.
432
+
433
+ ## Apollo Federation v2
434
+
435
+ GraphJin can register as a federation subgraph so it composes with other services behind Apollo Router / Cosmo / Hive Gateway:
436
+
437
+ ```yaml
438
+ federation:
439
+ enabled: true
440
+ version: "v2.5"
441
+ keys:
442
+ users: ["id"] # auto-derived from PKs by default
443
+ orders: ["id", "tenant_id"] # composite keys via override
444
+ shareable: ["Tag.name"] # field-level @shareable
445
+ inaccessible: ["Users.encrypted_password"]
446
+ ```
447
+
448
+ `_service { sdl }` returns a federation-flavoured SDL with `@link`, `@key`, `@shareable`, `@inaccessible`, `@tag`, `_Service`, and `_Entity`. Composition succeeds out of the box; `_entities` resolution is on the roadmap (the engine returns a clear error today, so gateways see the gap rather than silent failures).
449
+
450
+ ## HTTP API Routes
451
+
452
+ `graphjin serve` exposes everything under a single host/port. All routes go through the configured auth handler unless noted.
453
+
454
+ | Route | Methods | Purpose |
455
+ |---|---|---|
456
+ | `/api/v1/graphql` | `GET`, `POST` | GraphQL queries and mutations. Subscriptions if the request is a WebSocket upgrade or carries `Accept: text/event-stream` (SSE). |
457
+ | `/api/v1/rest/<name>` | `GET`, `POST` | Run a saved/persisted query by name. Variables go in `?variables=…` (GET) or the JSON body (POST). |
458
+ | `/api/v1/workflows/<name>` | `GET`, `POST` | Legacy workflow execution endpoint. In source mode it is registered only when `mcp.legacy_discovery: true`; use `gj_workflow_execution(insert)` through GraphQL otherwise. |
459
+ | `/api/v1/openapi.json` | `GET` | OpenAPI 3 spec generated from your saved REST queries. |
460
+ | `/api/v1/mcp` | `POST` | MCP (Model Context Protocol) HTTP transport — Streamable HTTP, stateless. |
461
+ | `/api/v1/mcp/message` | `POST` | MCP HTTP transport for stateless message integrations. |
462
+ | `/api/v1/discovery` | `GET` | Legacy discovery document. In source mode it is registered only when `mcp.legacy_discovery: true`; use catalog GraphQL roots otherwise. |
463
+ | `/api/v1/discovery/<section>` | `GET` | Legacy discovery drill-down (e.g. `tables`, `insights`), gated the same way as `/api/v1/discovery`. |
464
+ | `/api/v1/admin/tables` | `GET` | Admin: list known tables (Web UI). |
465
+ | `/api/v1/admin/tables/<name>` | `GET` | Admin: schema for a single table. |
466
+ | `/api/v1/admin/queries` | `GET` | Admin: list saved queries. |
467
+ | `/api/v1/admin/queries/<name>` | `GET` | Admin: details for a saved query. |
468
+ | `/api/v1/admin/fragments` | `GET` | Admin: list GraphQL fragments. |
469
+ | `/api/v1/admin/config` | `GET` | Admin: effective runtime config. |
470
+ | `/api/v1/admin/database` / `/api/v1/admin/databases` | `GET` | Admin: connected database info. |
471
+ | `/api/v1/auth/device` | `POST` | OIDC device-flow start (only if `auth_login.enabled`). |
472
+ | `/api/v1/auth/device/token` | `POST` | OIDC device-flow poll. |
473
+ | `/api/v1/auth/login` | `GET` | OIDC login redirect. |
474
+ | `/api/v1/auth/callback` | `GET` | OIDC callback. |
475
+ | `/health` | `GET` | Liveness probe. **No auth.** |
476
+ | `/` | `GET` | Built-in Web UI (only when `webui: true`). |
477
+
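As an example of the `/api/v1/rest/<name>` row above, a client might build the `GET` URL for a saved query as follows. This is a sketch under assumptions: the helper name `savedQueryUrl`, the host and port, and JSON encoding of the `variables` parameter are not confirmed by the table itself.

```javascript
// Builds a GET URL for a saved query.
// Assumption: variables are JSON-encoded into the ?variables= parameter.
function savedQueryUrl(baseUrl, name, variables) {
  const url = new URL(`/api/v1/rest/${encodeURIComponent(name)}`, baseUrl);
  if (variables && Object.keys(variables).length > 0) {
    url.searchParams.set("variables", JSON.stringify(variables));
  }
  return url.toString();
}

// Hypothetical host; "top_customers" is the saved query name used elsewhere in this README.
const restUrl = savedQueryUrl("http://localhost:8080", "top_customers", { limit: 5 });
```

`URL.searchParams` takes care of percent-encoding the JSON, so the braces and quotes do not need to be escaped by hand.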
478
+ **Mode flags that change which routes are live:**
479
+ - `mcp.disable: true` — removes `/api/v1/mcp` and `/api/v1/mcp/message`.
480
+ - `mcp.only: true` — keeps only `/health` and `/api/v1/mcp*`. Legacy `/api/v1/workflows/*` and `/api/v1/discovery*` remain only when `mcp.legacy_discovery: true`.
481
+ - Source mode (`sources:` present) disables legacy `/api/v1/workflows/*` and `/api/v1/discovery*` unless `mcp.legacy_discovery: true`.
482
+ - `webui: false` — drops `/` and the `/api/v1/admin/*` routes.
262
483
 
263
484
  ## MCP Tools
264
485
 
265
- GraphJin exposes several tools that guide AI models to write valid queries. Key tools: `list_tables` and `describe_table` for schema discovery, `get_query_syntax` for learning the DSL, `execute_graphql` for running queries, and `execute_saved_query` for production-approved queries.
486
+ GraphJin exposes a catalog-first agent surface that guides AI models to discover before acting. Start with `query_catalog`, then inspect evidence with `get_catalog_card` before writing queries, choosing relationships, or using GraphJin-specific syntax. For actions, agents can use GraphJin control-plane GraphQL roots such as `gj_workflow_execution(insert)`, `gj_workflow(insert/update/delete)`, and `gj_config(id: "current", update: ...)`. Schema reloads, schema changes, where-clause validation, and query repair remain MCP action tools. The legacy discovery tools are migration shims and are disabled unless `mcp.legacy_discovery: true`.
487
+
488
+ For teams building MCP agents, internal copilots, workflow agents, or enterprise automation, see [AGENTIC.md](AGENTIC.md). It explains the catalog-first agent loop in detail: discover, inspect, validate, act, observe, and refine.
489
+
490
+ Key discovery tools:
491
+ - `get_catalog_entrypoints` to choose a discovery path when the task is broad
492
+ - `query_catalog` to search schema, relationship, workflow, language, config, policy, capability, and query-pattern items. Use `search` for ranked text discovery and `where` for exact filters.
493
+ - `get_catalog_card` to inspect evidence, examples, details, safety notes, and graph edges
494
+ - `validate_where_clause` to validate filters before execution
266
495
 
267
496
  For JS orchestration, use:
497
+ - `query_catalog` with `where: { kind: { eq: "workflow" } }` to discover reusable workflows
268
498
  - `get_js_runtime_api` to discover exactly which globals/functions are available inside workflow scripts
269
- - `execute_workflow` to run `./workflows/<name>.js` with input variables
499
+ - `gj_workflow_execution(insert: { workflow_name: "...", variables: {...} })` to run `./workflows/<name>.js` through GraphQL. This is mutation-only and returns an ephemeral result row; it does not store run history. Mark the workflows source or `gj_workflow_execution` table `read_only` to block it. The `execute_workflow` MCP compatibility tool is available only when `mcp.legacy_discovery: true` and `mcp.allow_workflow_execution: true`.
270
500
 
271
501
  Prompts like `write_query` and `fix_query_error` help models construct and debug queries.
272
502
 
273
- ## JS Workflows (MCP + REST)
503
+ ## JS Workflows (GraphQL + REST)
274
504
 
275
505
  Workflows let an LLM run multi-step logic in JavaScript while still using GraphJin MCP tools for DB-aware operations.
276
506
 
@@ -278,7 +508,9 @@ Create a file in `./workflows`, for example `./workflows/customer_insights.js`:
278
508
 
279
509
  ```js
280
510
  function main(input) {
281
- const tables = gj.tools.listTables({});
511
+ const tables = gj.tools.queryCatalog({
512
+ where: { kind: { eq: "table" } }
513
+ }).cards;
282
514
  const top = gj.tools.executeSavedQuery({
283
515
  name: "top_customers",
284
516
  variables: { limit: input.limit || 5 }
@@ -287,16 +519,27 @@ function main(input) {
287
519
  }
288
520
  ```
289
521
 
290
- ### Run via MCP
522
+ ### Run via GraphQL
291
523
 
292
- Call:
293
- - `get_js_runtime_api` first (for exact runtime schema)
294
- - `execute_workflow` with:
295
- - `name`: workflow file name (with or without `.js`)
296
- - `variables`: input payload passed to global `input` and `main(input)`
524
+ ```graphql
525
+ mutation {
526
+ gj_workflow_execution(insert: {
527
+ workflow_name: "customer_insights"
528
+ variables: { limit: 5 }
529
+ }) {
530
+ status
531
+ result_json
532
+ error
533
+ }
534
+ }
535
+ ```
536
+
537
+ Legacy MCP clients can call `execute_workflow` only when `mcp.legacy_discovery: true`.
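From a plain HTTP client, the mutation above could be sent to `POST /api/v1/graphql` roughly as sketched here. The helper names are hypothetical, auth headers are left out, and the response handling assumes that `gj_workflow_execution` comes back as a row (or array of rows) and that `result_json` may arrive either as a JSON string or as an already-decoded value.

```javascript
// Mutation text copied from the README example.
const runCustomerInsights = `mutation {
  gj_workflow_execution(insert: {
    workflow_name: "customer_insights"
    variables: { limit: 5 }
  }) {
    status
    result_json
    error
  }
}`;

// Builds fetch() options for POST /api/v1/graphql. Auth headers are omitted.
function graphqlRequest(query) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  };
}

// Extracts the workflow result from a decoded response, throwing on a reported error.
// The response shape handled here is an assumption of this sketch.
function workflowResult(response) {
  const row = [].concat(response.data.gj_workflow_execution)[0];
  if (row.error) throw new Error(row.error);
  return typeof row.result_json === "string"
    ? JSON.parse(row.result_json)
    : row.result_json;
}

const workflowRequest = graphqlRequest(runCustomerInsights);
```

Because the execution row is described as ephemeral, a caller that needs run history would have to store the returned result itself.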
297
538
 
298
539
  ### Run via REST
299
540
 
541
+ This legacy HTTP endpoint is available in legacy database mode, or in source mode only when `mcp.legacy_discovery: true`.
542
+
300
543
  - `POST /api/v1/workflows/<name>` with JSON body
301
544
  - `GET /api/v1/workflows/<name>?variables={...json...}`
302
545
 
@@ -326,15 +569,15 @@ Real conversations showing how an AI assistant uses GraphJin MCP tools to help y
326
569
  >
327
570
  > *calls `update_current_config` with databases: {"main": {type: "postgres", host: "localhost", port: 5432, user: "postgres", dbname: "myapp"}}*
328
571
  >
329
- > Connected successfully! Let me see what tables you have.
572
+ > Connected successfully! Let me discover the catalog surface.
330
573
  >
331
- > *calls `list_tables`*
574
+ > *calls `query_catalog` with `where: {kind: {eq: "table"}}`*
332
575
  >
333
576
  > Your database has these tables: `users`, `orders`, `products`, `reviews`. Want me to write a query?
334
577
 
335
578
  > **You:** Show me users with their orders.
336
579
 
337
- > **Claude:** *calls `get_query_syntax` then `execute_graphql`*
580
+ > **Claude:** *calls `query_catalog` with `search: "join users orders", where: {kind: {eq: "relationship"}}`, then `get_catalog_card`, then `execute_graphql`*
338
581
  > ```graphql
339
582
  > { users { id name email orders { id total status } } }
340
583
  > ```
@@ -361,13 +604,13 @@ Real conversations showing how an AI assistant uses GraphJin MCP tools to help y
361
604
 
362
605
  > **Claude:** *calls `update_current_config` with the full credentials*
363
606
  >
364
- > Connected! Let me explore the schema.
607
+ > Connected! Let me explore the catalog.
365
608
  >
366
- > *calls `list_tables`*
609
+ > *calls `query_catalog` with `where: {kind: {eq: "table"}}`*
367
610
  >
368
611
  > Tables found: `customers`, `products`, `orders`, `order_items`, `categories`. Let me check how they relate.
369
612
  >
370
- > *calls `explore_relationships` with table: "orders"*
613
+ > *calls `query_catalog` with `search: "join orders customers order_items products", where: {kind: {eq: "relationship"}}`, then `get_catalog_card` for the relevant relationship items*
371
614
  >
372
615
  > `orders` connects to `customers` (via customer_id) and `order_items` (via order_id), and `order_items` links to `products`. Here's a nested query:
373
616
  >
@@ -392,13 +635,13 @@ Real conversations showing how an AI assistant uses GraphJin MCP tools to help y
392
635
  >
393
636
  > Connected! Let me see your tables.
394
637
  >
395
- > *calls `list_tables`*
638
+ > *calls `query_catalog` with `where: {kind: {eq: "table"}}`*
396
639
  >
397
640
  > Tables: `tasks`, `projects`, `tags`, `task_tags`. Want to query something?
398
641
 
399
642
  > **You:** Show me all tasks with their project name, sorted by due date.
400
643
 
401
- > **Claude:** *calls `execute_graphql`*
644
+ > **Claude:** *calls `query_catalog` with `search: "join tasks projects", where: {kind: {eq: "relationship"}}`, then `get_catalog_card`, then `execute_graphql`*
402
645
  > ```graphql
403
646
  > { tasks(order_by: {due_date: asc}) { id title due_date completed project { name } } }
404
647
  > ```
@@ -436,7 +679,13 @@ roles:
436
679
 
437
680
  **JWT authentication** - Supports Auth0, Firebase, JWKS endpoints.
438
681
 
439
- **Response caching** - Redis with in-memory fallback. Automatic cache invalidation.
682
+ **Response caching** - Redis with in-memory fallback. Automatic cache invalidation on mutations. **Stale-while-revalidate** support: serve cached responses immediately while a background worker refreshes the entry — concurrent refreshes for the same key are deduplicated via singleflight, and the worker pool is bounded so a thundering herd can't spawn unbounded goroutines.
683
+
684
+ ```yaml
685
+ caching:
686
+ ttl: 3600 # hard expiry in seconds
687
+ fresh_ttl: 300 # soft expiry — entries past this trigger SWR refresh
688
+ ```
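The relationship between the two settings can be sketched as a small decision function. This is an illustration of the described behaviour, not GraphJin's code: the names `cacheState` and `refreshOnce` are hypothetical, the exact boundary handling is an assumption, and the deduplication map only imitates the idea behind singleflight.

```javascript
// Mirrors the YAML above: both values are in seconds.
const caching = { ttl: 3600, fresh_ttl: 300 };

// Classifies a cache entry by age:
//   "fresh"   -> serve from cache
//   "stale"   -> serve from cache and refresh in the background (SWR)
//   "expired" -> treat as a miss
function cacheState(ageSeconds, cfg) {
  if (ageSeconds < cfg.fresh_ttl) return "fresh";
  if (ageSeconds < cfg.ttl) return "stale";
  return "expired";
}

// Deduplicates concurrent refreshes per key, in the spirit of singleflight:
// callers that arrive while a refresh is in flight share the same promise.
const inflight = new Map();
function refreshOnce(key, refresh) {
  if (!inflight.has(key)) {
    const p = Promise.resolve()
      .then(refresh)
      .finally(() => inflight.delete(key));
    inflight.set(key, p);
  }
  return inflight.get(key);
}
```

With these values, an entry that is 10 minutes old is served immediately and refreshed in the background, while one older than an hour is fetched again before responding.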
440
689
 
441
690
  ## Also a GraphQL API
442
691
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "graphjin",
3
- "version": "3.18.10",
3
+ "version": "3.18.22",
4
4
  "description": "GraphJin CLI - Build APIs in 5 minutes with GraphQL",
5
5
  "bin": {
6
6
  "graphjin": "bin/graphjin.js"