@runcontext/cli 0.4.0 → 0.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,102 +1,66 @@
  # @runcontext/cli
 
- CLI for [ContextKit](https://github.com/erickittelson/ContextKit) — AI-ready metadata governance over OSI.
+ **ContextKit — tell your AI agent to build your semantic layer.**
 
- **Your data already has a schema. ContextKit gives it meaning.**
+ Tell your AI agent: *"Install @runcontext/cli and build a semantic layer for my database."*
+
+ The agent introspects your database, scaffolds metadata, and goes back and forth with you — asking about metrics, ownership, and business rules — while it builds the semantic layer using CLI commands. When it reaches Gold tier, it exports an **AI Blueprint** and serves the metadata to other AI agents via MCP.
 
  ## Installation
 
  ```bash
- # Global install
- npm install -g @runcontext/cli
-
- # Or per-project
- npm install -D @runcontext/cli
+ npm install @runcontext/cli
  ```
 
  ## Quick Start
 
- ```bash
- # Point at any database — interactive wizard does the rest
- context setup
-
- # Or step by step
- context introspect --db duckdb://warehouse.duckdb
- context enrich --target silver --apply
- context tier
- ```
+ In Claude Code, Cursor, Windsurf, or any agentic coding platform:
 
- ## Commands
+ > *"Install @runcontext/cli and build a semantic layer for my database."*
 
- ### Core Workflow
+ Or run it yourself:
 
  ```bash
- context setup                    # Interactive wizard — database to metadata in one flow
- context introspect               # Scan a database → scaffold Bronze-level OSI metadata
- context enrich --target silver   # Auto-enrich metadata toward a target tier
- context lint                     # Run 37 lint rules against context files
- context fix --write              # Auto-fix lint issues where possible
- context build                    # Compile context files → emit manifest JSON
- context tier [model]             # Show Bronze/Silver/Gold scorecard
+ context setup           # Interactive wizard — database to metadata in one flow
+ context tier            # Check Bronze/Silver/Gold score
+ context blueprint       # Export AI Blueprint (portable Gold-tier spec)
+ context serve --stdio   # Serve to AI agents via MCP
  ```
 
- ### Exploration
+ ## Commands
 
  ```bash
- context explain <name>           # Look up any model, term, or owner
- context rules                    # List all lint rules with tier, severity, fixability
- context validate-osi <file>      # Validate a single OSI file against the schema
- context verify                   # Check metadata accuracy against a live database
- ```
-
- ### Serving
+ # Build the semantic layer
+ context setup                    # Interactive wizard — full pipeline in one flow
+ context new <name>               # Scaffold a new data product
+ context introspect               # Scan a database -> scaffold Bronze metadata
+ context enrich --target silver   # Auto-enrich toward a target tier
+ context lint                     # Run 40 lint rules
+ context fix --write              # Auto-fix lint issues
+ context build                    # Compile -> emit manifest JSON
+ context tier [model]             # Show Bronze/Silver/Gold scorecard
 
- ```bash
- context serve --stdio            # Start MCP server over stdio (for Claude, Cursor, etc.)
- context serve --http --port 3000 # Start MCP server over HTTP
- context site                     # Generate a static documentation site
- context dev                      # Watch mode — re-lint on file changes
- context init                     # Scaffold a new project structure
+ # Explore and verify
+ context explain <name>           # Look up any model, term, or owner
+ context rules                    # List all lint rules
+ context validate-osi <file>      # Validate against OSI spec
+ context verify                   # Check accuracy against a live database
+
+ # Export and serve
+ context blueprint [model]        # Export AI Blueprints (portable OSI YAML)
+ context serve --stdio            # MCP server over stdio
+ context serve --http --port 3000 # MCP server over HTTP
+ context site                     # Static documentation site
+ context dev --studio             # Visual editor in the browser
+ context init                     # Scaffold a new project
  ```
 
- ## The Tier System
-
- | Tier | What it means | How to get there |
- |---|---|---|
- | **Bronze** | Discoverable — described, owned, classified | `context introspect` |
- | **Silver** | Trusted — lineage, glossary, sample values, trust status | `context enrich --target silver` |
- | **Gold** | AI-Ready — semantic roles, golden queries, guardrails, business rules | Human curation + `context enrich --target gold` |
-
  ## Database Support
 
- | Adapter | Connection |
- |---|---|
- | DuckDB | `--db duckdb://path.duckdb` |
- | PostgreSQL | `--db postgres://user:pass@host:5432/db` |
- | MySQL | `--db mysql://user:pass@host:3306/db` |
- | SQL Server | `--db mssql://user:pass@host:1433/db` |
- | SQLite | `--db path/to/file.sqlite` |
- | Snowflake | `--db snowflake://account/database/schema` |
- | BigQuery | `--db bigquery://project/dataset` |
- | ClickHouse | `--db clickhouse://host:8123` |
- | Databricks | Config file only |
-
- Each adapter requires its own driver as an optional peer dependency. See [Database Support docs](https://contextkit.dev/reference/databases/) for installation details.
+ DuckDB, PostgreSQL, MySQL, SQL Server, SQLite, Snowflake, BigQuery, ClickHouse, Databricks.
 
  ## MCP Server
 
- Expose your metadata to AI agents:
-
- ```bash
- # For Claude Code / Cursor
- context serve --stdio
-
- # For multi-agent setups
- context serve --http --port 3000
- ```
-
- Add to `.claude/mcp.json`:
-
  ```json
  {
    "mcpServers": {
@@ -108,9 +72,9 @@ Add to `.claude/mcp.json`:
    }
  }
  ```
 
- ## Full Documentation
+ ## Documentation
 
- See the [ContextKit repository](https://github.com/erickittelson/ContextKit) for complete docs, file structure, tier requirements, and examples.
+ [contextkit.dev](https://contextkit.dev) | [GitHub](https://github.com/erickittelson/ContextKit)
 
  ## License
 
package/dist/index.js CHANGED
@@ -141,7 +141,7 @@ function formatSarif(diagnostics) {
      tool: {
        driver: {
          name: "ContextKit",
-         version: "0.4.0",
+         version: "0.4.2",
          informationUri: "https://github.com/erickittelson/ContextKit",
          rules: Array.from(ruleMap.values())
        }
@@ -579,9 +579,9 @@ function parseDbUrl(db) {
      `Cannot determine adapter from "${db}". Use a URL prefix (duckdb://, postgres://, mysql://, mssql://, clickhouse://, snowflake://, bigquery://) or a recognized file extension (.duckdb, .db, .sqlite, .sqlite3).`
    );
  }
- var introspectCommand = new Command5("introspect").description("Introspect a database and scaffold Bronze-level OSI metadata").option(
+ var introspectCommand = new Command5("introspect").description("Introspect a database and scaffold Bronze-level OSI metadata. Supports: duckdb://, postgres://, mysql://, mssql://, snowflake://, bigquery://, clickhouse://, .sqlite, .duckdb files, and Databricks (via config).").option(
    "--db <url>",
-   "Database URL (e.g., duckdb://path.duckdb or postgres://...)"
+   "Database URL (duckdb://path.duckdb, postgres://user:pass@host/db, mysql://..., mssql://..., snowflake://account/db/schema, bigquery://project/dataset, clickhouse://host, or file.sqlite)"
  ).option(
    "--source <name>",
    "Use a named data_source from contextkit.config.yaml"
@@ -3259,6 +3259,71 @@ ${intentSection}## The Cardinal Rule: Never Fabricate Metadata
 
  If you don't know something, **ask the user**. An honest "I'm not sure \u2014 can you tell me what this field means?" is infinitely better than fabricated metadata that looks plausible but is wrong.
 
+ ## Database Safety \u2014 MANDATORY
+
+ **Your job is to READ the database to build metadata. You must NEVER modify the database.**
+
+ ### Hard Rules (no exceptions)
+
+ - **READ-ONLY.** Never execute INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, TRUNCATE, MERGE, REPLACE, or any other DDL/DML statement
+ - **LIMIT everything.** Every SELECT must include \`LIMIT\`. Use \`LIMIT 20\` for sample values, \`LIMIT 100\` max for golden query validation. Never run unlimited queries
+ - **No full table scans.** Never \`SELECT * FROM large_table\` without a WHERE clause and LIMIT. For row counts, use \`COUNT(*)\` \u2014 never pull all rows to count them
+ - **No expensive aggregations.** Avoid \`GROUP BY\` on high-cardinality columns across full tables. When checking distinct values, use \`SELECT DISTINCT col FROM table LIMIT 20\`, not \`SELECT DISTINCT col FROM table\`
+ - **No cross joins or cartesian products.** Never join tables without a proper join condition
+ - **No recursive or deeply nested queries.** Keep queries simple \u2014 you're sampling data, not building reports
+ - **No EXPLAIN ANALYZE on cloud warehouses.** On Snowflake, BigQuery, Databricks, etc., even EXPLAIN can trigger computation. Use metadata queries (information_schema) instead when possible
+
+ ### Cost Awareness
+
+ Cloud data warehouses (Snowflake, BigQuery, Databricks, Redshift) charge per query based on data scanned. **Every query costs money.**
+
+ - Prefer \`information_schema\` queries over scanning actual tables
+ - Use \`LIMIT\` on every query \u2014 no exceptions
+ - Sample a few rows to understand a column; don't scan the full table
+ - For BigQuery: always qualify table names with the dataset to avoid scanning the wrong tables
+ - For Snowflake: use the \`SAMPLE\` clause when available instead of full table scans
+ - If you need row counts, use table metadata or \`COUNT(*)\` \u2014 never \`SELECT *\`
+ - Batch your questions: gather what you need to know, then write ONE efficient query instead of many small ones
+
+ ### What You ARE Allowed To Do
+
+ \`\`\`sql
+ -- YES: Sample values (always with LIMIT)
+ SELECT DISTINCT column_name FROM table_name LIMIT 20;
+
+ -- YES: Basic stats for a column (returns one aggregate row, not all rows)
+ SELECT MIN(col), MAX(col), COUNT(DISTINCT col) FROM table_name;
+
+ -- YES: Row count
+ SELECT COUNT(*) FROM table_name;
+
+ -- YES: Schema metadata (free or near-free on all platforms)
+ SELECT column_name, data_type FROM information_schema.columns
+ WHERE table_name = 'my_table';
+
+ -- YES: Validate a golden query (with LIMIT)
+ SELECT geoid, score FROM vw_rankings ORDER BY score DESC LIMIT 10;
+ \`\`\`
+
+ ### What You Must NEVER Do
+
+ \`\`\`sql
+ -- NEVER: Modify data
+ INSERT INTO / UPDATE / DELETE FROM / DROP TABLE / ALTER TABLE
+
+ -- NEVER: Unlimited scans
+ SELECT * FROM large_table;
+ SELECT DISTINCT high_cardinality_col FROM big_table;
+
+ -- NEVER: Expensive cross-table operations without LIMIT
+ SELECT * FROM a JOIN b ON a.id = b.id JOIN c ON b.id = c.id;
+
+ -- NEVER: Write to the database in any way
+ CREATE TABLE / CREATE VIEW / CREATE INDEX
+ \`\`\`
+
+ If a query might be expensive and you're not sure, **ask the user first**. "This table looks large \u2014 is it OK if I run a COUNT(*)?" is always the right call.
+
  ## Reference Documents
 
  Check \`context/reference/\` for any files the user has provided \u2014 data dictionaries, Confluence exports, ERDs, business glossaries, dashboard docs, etc. **Read these first** before querying the database. They contain domain knowledge that will dramatically improve your metadata quality.
@@ -3360,18 +3425,20 @@ You must iterate \u2014 a single pass is never enough. Each \`context tier\` run
 
  ### Before writing ANY metadata, query the database first
 
- For every field you're about to describe or classify:
+ For every field you're about to describe or classify (**always with LIMIT, always read-only**):
 
  \`\`\`sql
  -- What type of values does this column contain?
  SELECT DISTINCT column_name FROM table LIMIT 20;
 
  -- For numeric columns: is this a metric or dimension?
- SELECT MIN(col), MAX(col), AVG(col), COUNT(DISTINCT col) FROM table;
+ SELECT MIN(col), MAX(col), AVG(col), COUNT(DISTINCT col) FROM table LIMIT 1;
 
  -- For potential metrics: does SUM make sense?
  -- If SUM produces a meaningful business number \u2192 additive: true
  -- If SUM is meaningless (e.g., summing percentages, scores, ratings) \u2192 additive: false
+
+ -- REMEMBER: Never run queries without LIMIT. Never modify data.
  \`\`\`
 
  ### Semantic Role Decision Tree
@@ -3454,6 +3521,42 @@ ${datasetList || "(none detected)"}
 
  ${failingSection}
 
+ ## Serving to Other Agents via MCP
+
+ Once the semantic layer reaches Silver or Gold, serve it so other AI agents can use the curated metadata:
+
+ \`\`\`bash
+ # Start MCP server (agents connect via stdio)
+ context serve --stdio
+
+ # Or via HTTP for remote/multi-agent setups
+ context serve --http --port 3000
+ \`\`\`
+
+ To add ContextKit as an MCP server in another agent's config:
+
+ \`\`\`json
+ {
+   "mcpServers": {
+     "contextkit": {
+       "command": "npx",
+       "args": ["@runcontext/cli", "serve", "--stdio"]
+     }
+   }
+ }
+ \`\`\`
+
+ ### Exporting AI Blueprints
+
+ Export the Gold-tier outcome as a portable YAML file:
+
+ \`\`\`bash
+ context blueprint ${modelName}
+ # \u2192 blueprints/${modelName}.data-product.osi.yaml
+ \`\`\`
+
+ This AI Blueprint contains the complete semantic specification \u2014 share it, serve it via MCP, or import it into any OSI-compliant tool.
+
  ## MCP Tools (if using ContextKit as an MCP server)
 
  | Tool | Parameters | What it does |
@@ -3478,10 +3581,11 @@ ${failingSection}
  Inspect computed views in the database. Any calculated column is a candidate metric.
 
  \`\`\`sql
- -- Find computed columns in views
+ -- Find computed columns in views (information_schema queries are free/cheap)
  SELECT column_name, data_type
  FROM information_schema.columns
- WHERE table_name LIKE 'vw_%' AND data_type IN ('DOUBLE', 'FLOAT', 'INTEGER', 'BIGINT', 'DECIMAL');
+ WHERE table_name LIKE 'vw_%' AND data_type IN ('DOUBLE', 'FLOAT', 'INTEGER', 'BIGINT', 'DECIMAL')
+ LIMIT 100;
  \`\`\`
 
  For each computed column (e.g., \`opportunity_score\`, \`shops_per_10k\`, \`demand_signal_pct\`):
@@ -3529,9 +3633,10 @@ Models with 5+ datasets need at least 3 glossary terms linked by shared tags or
  For each join in the SQL views, define a relationship in the OSI model.
 
  \`\`\`sql
- -- Find joins by examining view definitions
+ -- Find joins by examining view definitions (metadata query, low cost)
  -- Look for patterns: ON table_a.col = table_b.col
  -- Or spatial joins: ABS(a.lat - b.lat) < threshold
+ -- NEVER run the actual joins yourself to "test" them \u2014 just document the relationship
  \`\`\`
 
  For each join:
@@ -3549,12 +3654,14 @@ Models with 3+ datasets need at least 3 relationships.
 
  ### Golden Queries
 
- Write 3-5 SQL queries answering common business questions. **Test each query first!**
+ Write 3-5 SQL queries answering common business questions. **Test each query with LIMIT first!**
 
  \`\`\`sql
- -- Run the query, verify it returns sensible results, then document:
+ -- Validate with LIMIT (never run unbounded queries to "test"):
  SELECT geoid, tract_name, opportunity_score
  FROM vw_candidate_zones ORDER BY opportunity_score DESC LIMIT 10;
+
+ -- The golden query YAML can document the full query, but when you TEST it, always use LIMIT
  \`\`\`
 
  ## YAML Formats
@@ -3638,7 +3745,7 @@ async function runAgentInstructionsStep(ctx) {
  }
 
  // src/commands/setup.ts
- var setupCommand = new Command15("setup").description("Interactive wizard to scaffold and enrich metadata from a database").action(async () => {
+ var setupCommand = new Command15("setup").description("Interactive wizard \u2014 detects databases, introspects schema, scaffolds metadata, enriches to Silver, generates agent instructions. Supports DuckDB, PostgreSQL, MySQL, SQL Server, SQLite, Snowflake, BigQuery, ClickHouse, and Databricks.").action(async () => {
    p10.intro(chalk16.bgCyan(chalk16.black(" ContextKit Setup ")));
    const ctx = await runConnectStep();
    if (!ctx) return;
@@ -4131,7 +4238,7 @@ var newCommand = new Command17("new").description("Scaffold a new data product i
 
  // src/index.ts
  var program = new Command18();
- program.name("context").description("ContextKit \u2014 AI-ready metadata governance over OSI").version("0.4.0");
+ program.name("context").description("ContextKit \u2014 AI-ready metadata governance over OSI").version("0.4.2");
  program.addCommand(lintCommand);
  program.addCommand(buildCommand);
  program.addCommand(tierCommand);