@runcontext/cli 0.4.0 → 0.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,102 +1,66 @@
  # @runcontext/cli
 
- CLI for [ContextKit](https://github.com/erickittelson/ContextKit) — AI-ready metadata governance over OSI.
+ **ContextKit — tell your AI agent to build your semantic layer.**
 
- **Your data already has a schema. ContextKit gives it meaning.**
+ Tell your AI agent: *"Install @runcontext/cli and build a semantic layer for my database."*
+
+ The agent introspects your database, scaffolds metadata, and goes back and forth with you — asking about metrics, ownership, and business rules — while it builds the semantic layer using CLI commands. When it reaches Gold tier, it exports an **AI Blueprint** and serves the metadata to other AI agents via MCP.
 
  ## Installation
 
  ```bash
- # Global install
- npm install -g @runcontext/cli
-
- # Or per-project
- npm install -D @runcontext/cli
+ npm install @runcontext/cli
  ```
 
  ## Quick Start
 
- ```bash
- # Point at any database — interactive wizard does the rest
- context setup
-
- # Or step by step
- context introspect --db duckdb://warehouse.duckdb
- context enrich --target silver --apply
- context tier
- ```
+ In Claude Code, Cursor, Windsurf, or any agentic coding platform:
 
- ## Commands
+ > *"Install @runcontext/cli and build a semantic layer for my database."*
 
- ### Core Workflow
+ Or run it yourself:
 
  ```bash
- context setup                    # Interactive wizard — database to metadata in one flow
- context introspect               # Scan a database → scaffold Bronze-level OSI metadata
- context enrich --target silver   # Auto-enrich metadata toward a target tier
- context lint                     # Run 37 lint rules against context files
- context fix --write              # Auto-fix lint issues where possible
- context build                    # Compile context files → emit manifest JSON
- context tier [model]             # Show Bronze/Silver/Gold scorecard
+ context setup           # Interactive wizard — database to metadata in one flow
+ context tier            # Check Bronze/Silver/Gold score
+ context blueprint       # Export AI Blueprint (portable Gold-tier spec)
+ context serve --stdio   # Serve to AI agents via MCP
  ```
 
- ### Exploration
+ ## Commands
 
  ```bash
- context explain <name>           # Look up any model, term, or owner
- context rules                    # List all lint rules with tier, severity, fixability
- context validate-osi <file>      # Validate a single OSI file against the schema
- context verify                   # Check metadata accuracy against a live database
- ```
-
- ### Serving
+ # Build the semantic layer
+ context setup                    # Interactive wizard — full pipeline in one flow
+ context new <name>               # Scaffold a new data product
+ context introspect               # Scan a database -> scaffold Bronze metadata
+ context enrich --target silver   # Auto-enrich toward a target tier
+ context lint                     # Run 40 lint rules
+ context fix --write              # Auto-fix lint issues
+ context build                    # Compile -> emit manifest JSON
+ context tier [model]             # Show Bronze/Silver/Gold scorecard
 
- ```bash
- context serve --stdio            # Start MCP server over stdio (for Claude, Cursor, etc.)
- context serve --http --port 3000 # Start MCP server over HTTP
- context site                     # Generate a static documentation site
- context dev                      # Watch mode — re-lint on file changes
- context init                     # Scaffold a new project structure
+ # Explore and verify
+ context explain <name>           # Look up any model, term, or owner
+ context rules                    # List all lint rules
+ context validate-osi <file>      # Validate against OSI spec
+ context verify                   # Check accuracy against a live database
+
+ # Export and serve
+ context blueprint [model]        # Export AI Blueprints (portable OSI YAML)
+ context serve --stdio            # MCP server over stdio
+ context serve --http --port 3000 # MCP server over HTTP
+ context site                     # Static documentation site
+ context dev --studio             # Visual editor in the browser
+ context init                     # Scaffold a new project
  ```
 
- ## The Tier System
-
- | Tier | What it means | How to get there |
- |---|---|---|
- | **Bronze** | Discoverable — described, owned, classified | `context introspect` |
- | **Silver** | Trusted — lineage, glossary, sample values, trust status | `context enrich --target silver` |
- | **Gold** | AI-Ready — semantic roles, golden queries, guardrails, business rules | Human curation + `context enrich --target gold` |
-
  ## Database Support
 
- | Adapter | Connection |
- |---|---|
- | DuckDB | `--db duckdb://path.duckdb` |
- | PostgreSQL | `--db postgres://user:pass@host:5432/db` |
- | MySQL | `--db mysql://user:pass@host:3306/db` |
- | SQL Server | `--db mssql://user:pass@host:1433/db` |
- | SQLite | `--db path/to/file.sqlite` |
- | Snowflake | `--db snowflake://account/database/schema` |
- | BigQuery | `--db bigquery://project/dataset` |
- | ClickHouse | `--db clickhouse://host:8123` |
- | Databricks | Config file only |
-
- Each adapter requires its own driver as an optional peer dependency. See [Database Support docs](https://contextkit.dev/reference/databases/) for installation details.
+ DuckDB, PostgreSQL, MySQL, SQL Server, SQLite, Snowflake, BigQuery, ClickHouse, Databricks.
 
  ## MCP Server
 
- Expose your metadata to AI agents:
-
- ```bash
- # For Claude Code / Cursor
- context serve --stdio
-
- # For multi-agent setups
- context serve --http --port 3000
- ```
-
- Add to `.claude/mcp.json`:
-
  ```json
  {
    "mcpServers": {
@@ -108,9 +72,9 @@ Add to `.claude/mcp.json`:
    }
  }
  ```
 
- ## Full Documentation
+ ## Documentation
 
- See the [ContextKit repository](https://github.com/erickittelson/ContextKit) for complete docs, file structure, tier requirements, and examples.
+ [contextkit.dev](https://contextkit.dev) | [GitHub](https://github.com/erickittelson/ContextKit)
 
  ## License
 
package/dist/index.js CHANGED
@@ -141,7 +141,7 @@ function formatSarif(diagnostics) {
      tool: {
        driver: {
          name: "ContextKit",
-         version: "0.4.0",
+         version: "0.4.2",
          informationUri: "https://github.com/erickittelson/ContextKit",
          rules: Array.from(ruleMap.values())
        }
@@ -579,9 +579,9 @@ function parseDbUrl(db) {
      `Cannot determine adapter from "${db}". Use a URL prefix (duckdb://, postgres://, mysql://, mssql://, clickhouse://, snowflake://, bigquery://) or a recognized file extension (.duckdb, .db, .sqlite, .sqlite3).`
    );
  }
- var introspectCommand = new Command5("introspect").description("Introspect a database and scaffold Bronze-level OSI metadata").option(
+ var introspectCommand = new Command5("introspect").description("Introspect a database and scaffold Bronze-level OSI metadata. Supports: duckdb://, postgres://, mysql://, mssql://, snowflake://, bigquery://, clickhouse://, .sqlite, .duckdb files, and Databricks (via config).").option(
    "--db <url>",
-   "Database URL (e.g., duckdb://path.duckdb or postgres://...)"
+   "Database URL (duckdb://path.duckdb, postgres://user:pass@host/db, mysql://..., mssql://..., snowflake://account/db/schema, bigquery://project/dataset, clickhouse://host, or file.sqlite)"
  ).option(
    "--source <name>",
    "Use a named data_source from contextkit.config.yaml"
@@ -3259,6 +3259,71 @@ ${intentSection}## The Cardinal Rule: Never Fabricate Metadata
 
  If you don't know something, **ask the user**. An honest "I'm not sure \u2014 can you tell me what this field means?" is infinitely better than fabricated metadata that looks plausible but is wrong.
 
+ ## Database Safety \u2014 MANDATORY
+
+ **Your job is to READ the database to build metadata. You must NEVER modify the database.**
+
+ ### Hard Rules (no exceptions)
+
+ - **READ-ONLY.** Never execute INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, TRUNCATE, MERGE, REPLACE, or any other DDL/DML statement
+ - **LIMIT everything.** Every SELECT must include \`LIMIT\`. Use \`LIMIT 20\` for sample values, \`LIMIT 100\` max for golden query validation. Never run unlimited queries
+ - **No full table scans.** Never \`SELECT * FROM large_table\` without a WHERE clause and LIMIT. For row counts, use \`COUNT(*)\` \u2014 never pull all rows to count them
+ - **No expensive aggregations.** Avoid \`GROUP BY\` on high-cardinality columns across full tables. When checking distinct values, use \`SELECT DISTINCT col FROM table LIMIT 20\`, not \`SELECT DISTINCT col FROM table\`
+ - **No cross joins or cartesian products.** Never join tables without a proper join condition
+ - **No recursive or deeply nested queries.** Keep queries simple \u2014 you're sampling data, not building reports
+ - **No EXPLAIN ANALYZE on cloud warehouses.** On Snowflake, BigQuery, Databricks, etc., even EXPLAIN can trigger computation. Use metadata queries (information_schema) instead when possible
+
+ ### Cost Awareness
+
+ Cloud data warehouses (Snowflake, BigQuery, Databricks, Redshift) charge per query based on data scanned. **Every query costs money.**
+
+ - Prefer \`information_schema\` queries over scanning actual tables
+ - Use \`LIMIT\` on every query \u2014 no exceptions
+ - Sample a few rows to understand a column; don't scan the full table
+ - For BigQuery: always qualify table names with the dataset to avoid scanning the wrong tables
+ - For Snowflake: use the \`SAMPLE\` clause when available instead of full table scans
+ - If you need row counts, use table metadata or \`COUNT(*)\` \u2014 never \`SELECT *\`
+ - Batch your questions: gather what you need to know, then write ONE efficient query instead of many small ones
+
+ ### What You ARE Allowed To Do
+
+ \`\`\`sql
+ -- YES: Sample values (always with LIMIT)
+ SELECT DISTINCT column_name FROM table_name LIMIT 20;
+
+ -- YES: Basic stats for a column (returns one aggregate row, not all rows)
+ SELECT MIN(col), MAX(col), COUNT(DISTINCT col) FROM table_name;
+
+ -- YES: Row count
+ SELECT COUNT(*) FROM table_name;
+
+ -- YES: Schema metadata (free or near-free on all platforms)
+ SELECT column_name, data_type FROM information_schema.columns
+ WHERE table_name = 'my_table';
+
+ -- YES: Validate a golden query (with LIMIT)
+ SELECT geoid, score FROM vw_rankings ORDER BY score DESC LIMIT 10;
+ \`\`\`
+
+ ### What You Must NEVER Do
+
+ \`\`\`sql
+ -- NEVER: Modify data
+ INSERT INTO / UPDATE / DELETE FROM / DROP TABLE / ALTER TABLE
+
+ -- NEVER: Unlimited scans
+ SELECT * FROM large_table;
+ SELECT DISTINCT high_cardinality_col FROM big_table;
+
+ -- NEVER: Expensive cross-table operations without LIMIT
+ SELECT * FROM a JOIN b ON a.id = b.id JOIN c ON b.id = c.id;
+
+ -- NEVER: Write to the database in any way
+ CREATE TABLE / CREATE VIEW / CREATE INDEX
+ \`\`\`
+
+ If a query might be expensive and you're not sure, **ask the user first**. "This table looks large \u2014 is it OK if I run a COUNT(*)?" is always the right call.
+
  ## Reference Documents
 
  Check \`context/reference/\` for any files the user has provided \u2014 data dictionaries, Confluence exports, ERDs, business glossaries, dashboard docs, etc. **Read these first** before querying the database. They contain domain knowledge that will dramatically improve your metadata quality.
@@ -3360,18 +3425,20 @@ You must iterate \u2014 a single pass is never enough. Each \`context tier\` run
 
  ### Before writing ANY metadata, query the database first
 
- For every field you're about to describe or classify:
+ For every field you're about to describe or classify (**always with LIMIT, always read-only**):
 
  \`\`\`sql
  -- What type of values does this column contain?
  SELECT DISTINCT column_name FROM table LIMIT 20;
 
  -- For numeric columns: is this a metric or dimension?
- SELECT MIN(col), MAX(col), AVG(col), COUNT(DISTINCT col) FROM table;
+ SELECT MIN(col), MAX(col), AVG(col), COUNT(DISTINCT col) FROM table LIMIT 1;
 
  -- For potential metrics: does SUM make sense?
  -- If SUM produces a meaningful business number \u2192 additive: true
  -- If SUM is meaningless (e.g., summing percentages, scores, ratings) \u2192 additive: false
+
+ -- REMEMBER: Never run queries without LIMIT. Never modify data.
  \`\`\`
 
  ### Semantic Role Decision Tree
@@ -3454,6 +3521,42 @@ ${datasetList || "(none detected)"}
 
  ${failingSection}
 
+ ## Serving to Other Agents via MCP
+
+ Once the semantic layer reaches Silver or Gold, serve it so other AI agents can use the curated metadata:
+
+ \`\`\`bash
+ # Start MCP server (agents connect via stdio)
+ context serve --stdio
+
+ # Or via HTTP for remote/multi-agent setups
+ context serve --http --port 3000
+ \`\`\`
+
+ To add ContextKit as an MCP server in another agent's config:
+
+ \`\`\`json
+ {
+   "mcpServers": {
+     "contextkit": {
+       "command": "npx",
+       "args": ["@runcontext/cli", "serve", "--stdio"]
+     }
+   }
+ }
+ \`\`\`
+
+ ### Exporting AI Blueprints
+
+ Export the Gold-tier outcome as a portable YAML file:
+
+ \`\`\`bash
+ context blueprint ${modelName}
+ # \u2192 blueprints/${modelName}.data-product.osi.yaml
+ \`\`\`
+
+ This AI Blueprint contains the complete semantic specification \u2014 share it, serve it via MCP, or import it into any OSI-compliant tool.
+
  ## MCP Tools (if using ContextKit as an MCP server)
 
  | Tool | Parameters | What it does |
@@ -3478,10 +3581,11 @@ ${failingSection}
  Inspect computed views in the database. Any calculated column is a candidate metric.
 
  \`\`\`sql
- -- Find computed columns in views
+ -- Find computed columns in views (information_schema queries are free/cheap)
  SELECT column_name, data_type
  FROM information_schema.columns
- WHERE table_name LIKE 'vw_%' AND data_type IN ('DOUBLE', 'FLOAT', 'INTEGER', 'BIGINT', 'DECIMAL');
+ WHERE table_name LIKE 'vw_%' AND data_type IN ('DOUBLE', 'FLOAT', 'INTEGER', 'BIGINT', 'DECIMAL')
+ LIMIT 100;
  \`\`\`
 
  For each computed column (e.g., \`opportunity_score\`, \`shops_per_10k\`, \`demand_signal_pct\`):
@@ -3529,9 +3633,10 @@ Models with 5+ datasets need at least 3 glossary terms linked by shared tags or
  For each join in the SQL views, define a relationship in the OSI model.
 
  \`\`\`sql
- -- Find joins by examining view definitions
+ -- Find joins by examining view definitions (metadata query, low cost)
  -- Look for patterns: ON table_a.col = table_b.col
  -- Or spatial joins: ABS(a.lat - b.lat) < threshold
+ -- NEVER run the actual joins yourself to "test" them \u2014 just document the relationship
  \`\`\`
 
  For each join:
@@ -3549,12 +3654,14 @@ Models with 3+ datasets need at least 3 relationships.
 
  ### Golden Queries
 
- Write 3-5 SQL queries answering common business questions. **Test each query first!**
+ Write 3-5 SQL queries answering common business questions. **Test each query with LIMIT first!**
 
  \`\`\`sql
- -- Run the query, verify it returns sensible results, then document:
+ -- Validate with LIMIT (never run unbounded queries to "test"):
  SELECT geoid, tract_name, opportunity_score
  FROM vw_candidate_zones ORDER BY opportunity_score DESC LIMIT 10;
+
+ -- The golden query YAML can document the full query, but when you TEST it, always use LIMIT
  \`\`\`
 
  ## YAML Formats
@@ -3638,7 +3745,7 @@ async function runAgentInstructionsStep(ctx) {
  }
 
  // src/commands/setup.ts
- var setupCommand = new Command15("setup").description("Interactive wizard to scaffold and enrich metadata from a database").action(async () => {
+ var setupCommand = new Command15("setup").description("Interactive wizard \u2014 detects databases, introspects schema, scaffolds metadata, enriches to Silver, generates agent instructions. Supports DuckDB, PostgreSQL, MySQL, SQL Server, SQLite, Snowflake, BigQuery, ClickHouse, and Databricks.").action(async () => {
    p10.intro(chalk16.bgCyan(chalk16.black(" ContextKit Setup ")));
    const ctx = await runConnectStep();
    if (!ctx) return;
@@ -4131,7 +4238,7 @@ var newCommand = new Command17("new").description("Scaffold a new data product i
 
  // src/index.ts
  var program = new Command18();
- program.name("context").description("ContextKit \u2014 AI-ready metadata governance over OSI").version("0.4.0");
+ program.name("context").description("ContextKit \u2014 AI-ready metadata governance over OSI").version("0.4.2");
  program.addCommand(lintCommand);
  program.addCommand(buildCommand);
  program.addCommand(tierCommand);