@runcontext/cli 0.4.1 → 0.4.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/index.js +116 -9
- package/dist/index.js.map +1 -1
- package/package.json +4 -4
package/dist/index.js CHANGED
@@ -141,7 +141,7 @@ function formatSarif(diagnostics) {
 tool: {
 driver: {
 name: "ContextKit",
-version: "0.4.
+version: "0.4.2",
 informationUri: "https://github.com/erickittelson/ContextKit",
 rules: Array.from(ruleMap.values())
 }
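For context, `formatSarif` builds a SARIF log whose `tool.driver` block carries the hand-edited version string bumped above. Below is a minimal sketch of that envelope. It assumes a hypothetical diagnostic shape (`ruleId`, `severity`, `message`, `file`, `line`) and a hypothetical `toSarif` name. It follows the public SARIF 2.1.0 layout and is not ContextKit's actual implementation.

```javascript
// Minimal SARIF 2.1.0 sketch. The diagnostic fields used here
// (ruleId, severity, message, file, line) are hypothetical.
function toSarif(diagnostics, toolVersion) {
  // One rule descriptor per distinct ruleId, similar in spirit to ruleMap.
  const ruleMap = new Map();
  for (const d of diagnostics) {
    if (!ruleMap.has(d.ruleId)) ruleMap.set(d.ruleId, { id: d.ruleId });
  }
  // SARIF result levels are "none", "note", "warning" or "error".
  const levelFor = (severity) =>
    severity === "error" ? "error" : severity === "warning" ? "warning" : "note";

  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0", // SARIF format version, not the tool version
    runs: [
      {
        tool: {
          driver: {
            name: "ContextKit",
            version: toolVersion, // the value this diff bumps by hand
            informationUri: "https://github.com/erickittelson/ContextKit",
            rules: Array.from(ruleMap.values()),
          },
        },
        results: diagnostics.map((d) => ({
          ruleId: d.ruleId,
          level: levelFor(d.severity),
          message: { text: d.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: d.file },
                region: { startLine: d.line },
              },
            },
          ],
        })),
      },
    ],
  };
}
```

Passing the version in as a parameter, instead of embedding it in the function, is one way a release like this one could avoid editing the same literal in several places.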
@@ -3259,6 +3259,71 @@ ${intentSection}## The Cardinal Rule: Never Fabricate Metadata
 
 If you don't know something, **ask the user**. A honest "I'm not sure \u2014 can you tell me what this field means?" is infinitely better than fabricated metadata that looks plausible but is wrong.
 
+## Database Safety \u2014 MANDATORY
+
+**Your job is to READ the database to build metadata. You must NEVER modify the database.**
+
+### Hard Rules (no exceptions)
+
+- **READ-ONLY.** Never execute INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, TRUNCATE, MERGE, REPLACE, or any DDL/DML statement
+- **LIMIT everything.** Every SELECT must include \`LIMIT\`. Use \`LIMIT 20\` for sample values, \`LIMIT 100\` max for golden query validation. Never run unlimited queries
+- **No full table scans.** Never \`SELECT * FROM large_table\` without a WHERE clause and LIMIT. For row counts, use \`COUNT(*)\` \u2014 never pull all rows to count them
+- **No expensive aggregations.** Avoid \`GROUP BY\` on high-cardinality columns across full tables. When checking distinct values, use \`SELECT DISTINCT col FROM table LIMIT 20\`, not \`SELECT DISTINCT col FROM table\`
+- **No cross joins or cartesian products.** Never join tables without a proper join condition
+- **No recursive or deeply nested queries.** Keep queries simple \u2014 you're sampling data, not building reports
+- **No EXPLAIN ANALYZE on cloud warehouses.** On Snowflake, BigQuery, Databricks, etc., even EXPLAIN can trigger computation. Use metadata queries (information_schema) instead when possible
+
+### Cost Awareness
+
+Cloud data warehouses (Snowflake, BigQuery, Databricks, Redshift) charge per query based on data scanned. **Every query costs money.**
+
+- Prefer \`information_schema\` queries over scanning actual tables
+- Use \`LIMIT\` on every query \u2014 no exceptions
+- Sample a few rows to understand a column, don't scan the full table
+- For BigQuery: always qualify table names with dataset to avoid scanning wrong tables
+- For Snowflake: use \`SAMPLE\` clause when available instead of full table scans
+- If you need row counts, use table metadata or \`COUNT(*)\` \u2014 never \`SELECT *\`
+- Batch your questions: gather what you need to know, then write ONE efficient query instead of many small ones
+
+### What You ARE Allowed To Do
+
+\`\`\`sql
+-- YES: Sample values (always with LIMIT)
+SELECT DISTINCT column_name FROM table_name LIMIT 20;
+
+-- YES: Basic stats for a column (single column, not full row)
+SELECT MIN(col), MAX(col), COUNT(DISTINCT col) FROM table_name;
+
+-- YES: Row count
+SELECT COUNT(*) FROM table_name;
+
+-- YES: Schema metadata (free or near-free on all platforms)
+SELECT column_name, data_type FROM information_schema.columns
+WHERE table_name = 'my_table';
+
+-- YES: Validate a golden query (with LIMIT)
+SELECT geoid, score FROM vw_rankings ORDER BY score DESC LIMIT 10;
+\`\`\`
+
+### What You Must NEVER Do
+
+\`\`\`sql
+-- NEVER: Modify data
+INSERT INTO / UPDATE / DELETE FROM / DROP TABLE / ALTER TABLE
+
+-- NEVER: Unlimited scans
+SELECT * FROM large_table;
+SELECT DISTINCT high_cardinality_col FROM big_table;
+
+-- NEVER: Expensive cross-table operations without LIMIT
+SELECT * FROM a JOIN b ON a.id = b.id JOIN c ON b.id = c.id;
+
+-- NEVER: Write to the database in any way
+CREATE TABLE / CREATE VIEW / CREATE INDEX
+\`\`\`
+
+If a query might be expensive and you're not sure, **ask the user first**. "This table looks large \u2014 is it OK if I run a COUNT(*)?" is always the right call.
+
 ## Reference Documents
 
 Check \`context/reference/\` for any files the user has provided \u2014 data dictionaries, Confluence exports, ERDs, business glossaries, dashboard docs, etc. **Read these first** before querying the database. They contain domain knowledge that will dramatically improve your metadata quality.
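The new hard rules are written as instructions to an agent. A harness that wanted to enforce the two most mechanical ones (read-only, always LIMIT) before a query reaches the warehouse might use a naive pre-flight check like the one below. `checkAgentQuery` is hypothetical. A regex pass is not a SQL parser and can be fooled, for example by keywords inside string literals, so it complements rather than replaces a read-only database role. Applied literally, the LIMIT check also rejects the allowed `COUNT(*)` and `MIN`/`MAX` examples above, which carry no LIMIT. A real guard would therefore need an exception for single-row aggregates.

```javascript
// Hypothetical pre-flight check mirroring two of the hard rules:
// read-only statements only, and every query must carry a LIMIT.
// Naive keyword matching, not a SQL parser. Note that REPLACE also names
// a common string function, so this list rejects some harmless SELECTs.
const FORBIDDEN = [
  "INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
  "CREATE", "TRUNCATE", "MERGE", "REPLACE",
];

function checkAgentQuery(sql) {
  const problems = [];
  // Remove "--" line comments and "/* */" block comments before inspecting.
  const body = sql
    .replace(/--[^\n]*/g, " ")
    .replace(/\/\*[\s\S]*?\*\//g, " ")
    .trim();
  const upper = body.toUpperCase();

  if (!/^(SELECT|WITH)\b/.test(upper)) {
    problems.push("statement must start with SELECT or WITH");
  }
  for (const kw of FORBIDDEN) {
    // Word boundaries keep names like created_at or last_update from matching.
    if (new RegExp(`\\b${kw}\\b`).test(upper)) {
      problems.push(`forbidden keyword: ${kw}`);
    }
  }
  // Reject a second statement after a semicolon, e.g. "SELECT 1; DROP TABLE t".
  if (/;\s*\S/.test(body)) {
    problems.push("multiple statements are not allowed");
  }
  if (!/\bLIMIT\s+\d+/.test(upper)) {
    problems.push("missing LIMIT");
  }
  return { ok: problems.length === 0, problems };
}
```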
@@ -3360,18 +3425,20 @@ You must iterate \u2014 a single pass is never enough. Each \`context tier\` run
 
 ### Before writing ANY metadata, query the database first
 
-For every field you're about to describe or classify:
+For every field you're about to describe or classify (**always with LIMIT, always read-only**):
 
 \`\`\`sql
 -- What type of values does this column contain?
 SELECT DISTINCT column_name FROM table LIMIT 20;
 
 -- For numeric columns: is this a metric or dimension?
-SELECT MIN(col), MAX(col), AVG(col), COUNT(DISTINCT col) FROM table;
+SELECT MIN(col), MAX(col), AVG(col), COUNT(DISTINCT col) FROM table LIMIT 1;
 
 -- For potential metrics: does SUM make sense?
 -- If SUM produces a meaningful business number \u2192 additive: true
 -- If SUM is meaningless (e.g., summing percentages, scores, ratings) \u2192 additive: false
+
+-- REMEMBER: Never run queries without LIMIT. Never modify data.
 \`\`\`
 
 ### Semantic Role Decision Tree
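In the hunk above, `LIMIT 1` is appended to a query that already returns a single aggregate row, so it bounds nothing. The aggregates are computed before the LIMIT applies and generally still read the whole column. To bound the work, the LIMIT has to sit in a subquery beneath the aggregates. The sketch below shows that approach with a hypothetical `buildColumnProfileSql` helper. Two caveats apply. First, the result describes whichever rows the engine returns first, not a random sample; Snowflake's `SAMPLE`, mentioned in the cost notes above, is closer to random. Second, on engines that bill by bytes read, such as BigQuery, an inner LIMIT may not reduce cost at all.

```javascript
// Hypothetical helper: profile one column over a bounded set of rows.
// The LIMIT sits in the subquery, so the aggregates only see `sampleRows`
// rows; a LIMIT placed after the aggregates would not reduce their input.
const TABLE_NAME = /^[A-Za-z_]\w*(\.[A-Za-z_]\w*)*$/; // allows schema.table
const COLUMN_NAME = /^[A-Za-z_]\w*$/;

function buildColumnProfileSql(table, column, sampleRows = 10000) {
  // Accept only plain identifiers so the interpolation below
  // cannot smuggle in extra SQL.
  if (!TABLE_NAME.test(table) || !COLUMN_NAME.test(column)) {
    throw new Error("unexpected identifier; refusing to build SQL");
  }
  if (!Number.isInteger(sampleRows) || sampleRows <= 0) {
    throw new Error("sampleRows must be a positive integer");
  }
  return [
    `SELECT MIN(${column}) AS min_val, MAX(${column}) AS max_val,`,
    `  AVG(${column}) AS avg_val, COUNT(DISTINCT ${column}) AS distinct_vals`,
    `FROM (SELECT ${column} FROM ${table} LIMIT ${sampleRows}) AS sample_rows;`,
  ].join("\n");
}
```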
@@ -3454,6 +3521,42 @@ ${datasetList || "(none detected)"}
 
 ${failingSection}
 
+## Serving to Other Agents via MCP
+
+Once the semantic layer reaches Silver or Gold, serve it so other AI agents can use the curated metadata:
+
+\`\`\`bash
+# Start MCP server (agents connect via stdio)
+context serve --stdio
+
+# Or via HTTP for remote/multi-agent setups
+context serve --http --port 3000
+\`\`\`
+
+To add ContextKit as an MCP server in another agent's config:
+
+\`\`\`json
+{
+"mcpServers": {
+"contextkit": {
+"command": "npx",
+"args": ["@runcontext/cli", "serve", "--stdio"]
+}
+}
+}
+\`\`\`
+
+### Exporting AI Blueprints
+
+Export the Gold-tier outcome as a portable YAML file:
+
+\`\`\`bash
+context blueprint ${modelName}
+# \u2192 blueprints/${modelName}.data-product.osi.yaml
+\`\`\`
+
+This AI Blueprint contains the complete semantic specification \u2014 share it, serve it via MCP, or import it into any OSI-compliant tool.
+
 ## MCP Tools (if using ContextKit as an MCP server)
 
 | Tool | Parameters | What it does |
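The JSON added above is a complete config containing only the ContextKit entry. When a client config already lists other servers, the entry has to be merged in rather than pasted over the file. A minimal sketch with a hypothetical `withContextKitServer` helper follows. Where a given client keeps this file, and whether it accepts exactly this shape, depends on the client and is not specified in the diff.

```javascript
// Hypothetical helper: merge the "contextkit" entry from the diff into an
// existing MCP client config object without dropping other servers or
// other top-level settings. Returns a new object; the input is not mutated.
function withContextKitServer(config) {
  const servers = { ...(config.mcpServers || {}) };
  servers.contextkit = {
    command: "npx",
    args: ["@runcontext/cli", "serve", "--stdio"],
  };
  return { ...config, mcpServers: servers };
}

// Usage: parse an existing config, merge, and print the result.
const existing = JSON.parse('{"mcpServers":{"other":{"command":"other-server"}}}');
console.log(JSON.stringify(withContextKitServer(existing), null, 2));
```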
@@ -3478,10 +3581,11 @@ ${failingSection}
 Inspect computed views in the database. Any calculated column is a candidate metric.
 
 \`\`\`sql
--- Find computed columns in views
+-- Find computed columns in views (information_schema queries are free/cheap)
 SELECT column_name, data_type
 FROM information_schema.columns
-WHERE table_name LIKE 'vw_%' AND data_type IN ('DOUBLE', 'FLOAT', 'INTEGER', 'BIGINT', 'DECIMAL')
+WHERE table_name LIKE 'vw_%' AND data_type IN ('DOUBLE', 'FLOAT', 'INTEGER', 'BIGINT', 'DECIMAL')
+LIMIT 100;
 \`\`\`
 
 For each computed column (e.g., \`opportunity_score\`, \`shops_per_10k\`, \`demand_signal_pct\`):
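Two details of the query above are easy to miss. In SQL `LIKE`, `_` is itself a single-character wildcard, so `'vw_%'` also matches names such as `vwx_tmp`. Also, `data_type` spellings vary by engine; PostgreSQL, for instance, reports lowercase names like `integer` and `double precision`. The added `LIMIT 100` also silently truncates models that have more numeric view columns than that. The sketch below applies the same filter client-side to rows already fetched from `information_schema.columns`. The function names and the type list are assumptions.

```javascript
// Hypothetical client-side version of the computed-column filter: keeps
// numeric columns of views whose names literally start with "vw_".
const NUMERIC_TYPES = new Set([
  "DOUBLE", "DOUBLE PRECISION", "FLOAT", "REAL",
  "INTEGER", "INT", "BIGINT", "SMALLINT", "DECIMAL", "NUMERIC",
]);

// Normalize engine-specific spellings: "decimal(18,3)" becomes "DECIMAL",
// "double precision" becomes "DOUBLE PRECISION".
function normalizeType(dataType) {
  return dataType.toUpperCase().replace(/\(.*\)$/, "").trim();
}

function candidateMetricColumns(rows) {
  return rows.filter(
    (r) =>
      r.table_name.startsWith("vw_") && // exact prefix; in LIKE, "_" is a wildcard
      NUMERIC_TYPES.has(normalizeType(r.data_type)),
  );
}
```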
@@ -3529,9 +3633,10 @@ Models with 5+ datasets need at least 3 glossary terms linked by shared tags or
 For each join in the SQL views, define a relationship in the OSI model.
 
 \`\`\`sql
--- Find joins by examining view definitions
+-- Find joins by examining view definitions (metadata query, low cost)
 -- Look for patterns: ON table_a.col = table_b.col
 -- Or spatial joins: ABS(a.lat - b.lat) < threshold
+-- NEVER run the actual joins yourself to "test" them \u2014 just document the relationship
 \`\`\`
 
 For each join:
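Because the agent is told to document joins without running them, the `ON table_a.col = table_b.col` pattern can be extracted statically from view definition text. One possible source is `information_schema.views.view_definition`, on engines that expose it. The sketch below is a rough regex-based version with a hypothetical `findEquiJoins` name. It has three limitations. It captures only the first equality after each `ON`. It returns aliases as written rather than resolved table names. It ignores `USING (...)` and the spatial `ABS(...) < threshold` form.

```javascript
// Hypothetical helper: list equality join conditions of the form
// "ON a.col = b.col" found in view SQL text, without executing anything.
// Only the first comparison after each ON is captured; names are returned
// exactly as written (often aliases, not table names).
function findEquiJoins(viewSql) {
  const pattern =
    /\bON\s+([A-Za-z_]\w*)\.([A-Za-z_]\w*)\s*=\s*([A-Za-z_]\w*)\.([A-Za-z_]\w*)/gi;
  const joins = [];
  for (const m of viewSql.matchAll(pattern)) {
    joins.push({
      left: { table: m[1], column: m[2] },
      right: { table: m[3], column: m[4] },
    });
  }
  return joins;
}
```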
@@ -3549,12 +3654,14 @@ Models with 3+ datasets need at least 3 relationships.
 
 ### Golden Queries
 
-Write 3-5 SQL queries answering common business questions. **Test each query first!**
+Write 3-5 SQL queries answering common business questions. **Test each query with LIMIT first!**
 
 \`\`\`sql
---
+-- Validate with LIMIT (never run unbounded queries to "test"):
 SELECT geoid, tract_name, opportunity_score
 FROM vw_candidate_zones ORDER BY opportunity_score DESC LIMIT 10;
+
+-- The golden query YAML can document the full query, but when you TEST it, always use LIMIT
 \`\`\`
 
 ## YAML Formats
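One way to honor "test with LIMIT" while leaving the documented golden query unchanged is to wrap it at test time. The sketch below uses a hypothetical `boundedTestQuery` helper, capped at the 100-row maximum from the hard rules. The outer LIMIT bounds the rows returned, not necessarily the work done, because an inner `ORDER BY` or aggregate still processes its full input. Standard SQL also does not guarantee that a subquery's `ORDER BY` survives the outer `SELECT`.

```javascript
// Hypothetical helper: wrap a documented golden query so that testing it
// returns at most `maxRows` rows (1..100, per the LIMIT 100 rule).
function boundedTestQuery(goldenSql, maxRows = 100) {
  if (!Number.isInteger(maxRows) || maxRows <= 0 || maxRows > 100) {
    throw new Error("maxRows must be an integer between 1 and 100");
  }
  // Drop trailing semicolons so the query can be nested. The newlines keep
  // a trailing "-- comment" from swallowing the closing parenthesis.
  const inner = goldenSql.trim().replace(/;+\s*$/, "");
  return `SELECT * FROM (\n${inner}\n) AS golden_test LIMIT ${maxRows};`;
}
```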
@@ -4131,7 +4238,7 @@ var newCommand = new Command17("new").description("Scaffold a new data product i
 
 // src/index.ts
 var program = new Command18();
-program.name("context").description("ContextKit \u2014 AI-ready metadata governance over OSI").version("0.4.
+program.name("context").description("ContextKit \u2014 AI-ready metadata governance over OSI").version("0.4.2");
 program.addCommand(lintCommand);
 program.addCommand(buildCommand);
 program.addCommand(tierCommand);