create-contextkit 0.3.0 → 0.3.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +8 -18
- package/templates/minimal/context/AGENT_INSTRUCTIONS.md +184 -0
- package/LICENSE +0 -21
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "create-contextkit",
-  "version": "0.3.0",
+  "version": "0.3.2",
   "description": "Scaffold a new ContextKit project",
   "license": "MIT",
   "author": "Eric Kittelson",
@@ -10,27 +10,17 @@
     "url": "https://github.com/erickittelson/ContextKit.git",
     "directory": "create-contextkit"
   },
-  "keywords": [
-    "contextkit",
-    "scaffolder",
-    "create",
-    "init"
-  ],
+  "keywords": ["contextkit", "scaffolder", "create", "init"],
   "type": "module",
-  "bin": {
-
+  "bin": { "create-contextkit": "./dist/index.js" },
+  "files": ["dist", "templates"],
+  "scripts": {
+    "build": "tsup",
+    "clean": "rm -rf dist"
   },
-  "files": [
-    "dist",
-    "templates"
-  ],
   "devDependencies": {
     "@types/node": "^25.3.3",
     "tsup": "^8.4.0",
     "typescript": "^5.7.0"
-  },
-  "scripts": {
-    "build": "tsup",
-    "clean": "rm -rf dist"
   }
-}
+}
package/templates/minimal/context/AGENT_INSTRUCTIONS.md
ADDED
@@ -0,0 +1,184 @@
+# ContextKit Agent Instructions
+
+You have two MCP servers: **duckdb** (query data) and **contextkit** (query metadata).
+
+## The Cardinal Rule: Never Fabricate Metadata
+
+**Every piece of metadata you write must be grounded in evidence from the actual data.**
+
+- NEVER invent owner names, emails, team names, or contact info
+- NEVER write a field description that is just the column name repeated
+- NEVER assign a semantic_role without first querying the column's actual values
+- NEVER mark a field as additive without understanding what summing it means
+- NEVER write lineage entries without knowing the actual data sources
+- NEVER write a business_context narrative you can't justify from the data
+- NEVER create a glossary definition that is just "Definition for X"
+
+If you don't know something, say so. Leave it as a TODO with a note about what you'd need to determine the answer. A honest TODO is infinitely better than fabricated metadata that looks plausible but is wrong.
+
+## On Session Start
+
+1. Run `context_tier` to check the current metadata tier (Bronze/Silver/Gold)
+2. Report the current tier and list failing checks
+3. Ask the user what they'd like to work on — don't start changing files unprompted
+
+## When Asked to Reach Gold
+
+Work through ALL failing Gold checks iteratively until `context tier` reports Gold:
+
+1. Run `context_tier` and collect every failing check
+2. For each failing check, query the database to gather evidence, then fix the metadata
+3. Run `context_tier` again
+4. If checks still fail, go back to step 2
+5. **Do NOT stop until every Gold check passes** or you hit something that genuinely requires human input (like real owner contact info)
+6. For checks you cannot fix (e.g., owner email), leave a clear TODO explaining what a human needs to provide
+
+You must iterate — a single pass is never enough. Each `context tier` run may reveal new failures after earlier ones are fixed.
+
+## How to Curate Metadata (the right way)
+
+### Before writing ANY metadata, query the database first
+
+For every field you're about to describe or classify:
+
+```sql
+-- What type of values does this column contain?
+SELECT DISTINCT column_name FROM table LIMIT 20;
+
+-- For numeric columns: is this a metric or dimension?
+SELECT MIN(col), MAX(col), AVG(col), COUNT(DISTINCT col) FROM table;
+
+-- For potential metrics: does SUM make sense?
+-- If SUM produces a meaningful business number → additive: true
+-- If SUM is meaningless (e.g., summing percentages, scores, ratings) → additive: false
+```
+
+### Semantic Role Decision Tree
+
+Query the column first, then apply this logic:
+
+1. **Is it a primary key or foreign key?** → `identifier`
+2. **Is it a date or timestamp?** → `date`
+3. **Is it numeric AND does aggregation make business sense?**
+   - Does SUM make sense? (counts, amounts, quantities) → `metric`, `additive: true`
+   - Does only AVG/MIN/MAX make sense? (rates, percentages, scores, ratings) → `metric`, `additive: false`
+4. **Everything else** → `dimension`
+
+Common mistakes to avoid:
+- `stars` (ratings) → metric with AVG, NOT additive (summing star ratings is meaningless)
+- `_per_10k_people` (rates) → metric with AVG, NOT additive
+- `_score` (composite scores) → metric with AVG, NOT additive
+- `useful/funny/cool` (vote counts) → metric with SUM, additive
+- `_count` fields → metric with SUM, additive (usually)
+
+### Field Descriptions
+
+Write descriptions that help someone who has never seen this database understand what the column contains. Include:
+- What the value represents
+- Units or scale (if applicable)
+- Where the data comes from (if known)
+- Any known quirks or caveats
+
+Bad: `description: total_population`
+Good: `description: Total resident population of the census tract from American Community Survey 5-year estimates`
+
+### Lineage
+
+Upstream sources are the EXTERNAL systems that feed data into this warehouse. They are NOT the tables in the warehouse itself.
+
+Ask yourself: "Where did this data originally come from before it was loaded here?"
+
+### Owner Files
+
+Do NOT create fake owner identities. If the real owner is unknown:
+- Keep the existing owner file as-is
+- Note in the file that contact info needs to be filled in by a real person
+- NEVER invent email addresses like `analytics@example.com`
+
+### Business Context
+
+Write business_context entries that describe real analytical use cases you can verify from the data. Query the data to understand what questions it can answer before writing narratives.
+
+### Golden Queries
+
+Every golden query MUST be tested against the actual database before you write it. Run the SQL, verify it returns sensible results, then document it.
+
+### Data Quality
+
+When you discover data quality issues (null values, broken joins, missing data), FLAG THEM — don't hide them. Add notes in governance or report them to the user.
+
+## MCP Tools
+
+| Tool | Parameters | What it does |
+|------|-----------|-------------|
+| `context_search` | `query` | Find models, datasets, fields, terms by keyword |
+| `context_explain` | `model` | Full model details — governance, rules, lineage, tier |
+| `context_validate` | — | Run linter, get errors and warnings |
+| `context_tier` | `model` | Tier scorecard with all check results |
+| `context_golden_query` | `question` | Find pre-validated SQL for a question |
+| `context_guardrails` | `tables[]` | Get required WHERE clauses for tables |
+
+## Tier Checks Quick Reference
+
+**Bronze (7):** descriptions, owner, security, grain, table_type
+**Silver (+6):** trust, 2+ tags, glossary linked, lineage, refresh, 2+ sample_values
+**Gold (+21):** semantic_role on ALL fields, metric aggregation/additive, 1+ guardrail, 3+ golden queries, 1+ business rule, 1+ hierarchy, 1+ default_filter, trust=endorsed, contactable owner, 1+ relationship, description >=50 chars, ai_context (no TODO), 1+ business_context, version, field descriptions not lazy, glossary definitions substantive, lineage references real sources, grain statements specific, ai_context filled in
+
+## YAML Formats
+
+**Governance** (`context/governance/*.governance.yaml`):
+```yaml
+model: my-model
+owner: team-name
+version: "1.0.0"
+trust: endorsed
+security: internal
+tags: [domain-tag-1, domain-tag-2]
+business_context:
+  - name: Use Case Name
+    description: What analytical question this data answers and for whom.
+datasets:
+  my_table:
+    grain: "One row per [entity] identified by [key]"
+    table_type: fact # fact | dimension | event | view
+    refresh: daily
+fields:
+  dataset.field:
+    semantic_role: metric # metric | dimension | identifier | date
+    default_aggregation: SUM # SUM | AVG | COUNT | COUNT_DISTINCT | MIN | MAX
+    additive: true # can this metric be summed across dimensions?
+    default_filter: "is_open = 1"
+    sample_values: ["val1", "val2"]
+```
+
+**Rules** (`context/rules/*.rules.yaml`):
+```yaml
+model: my-model
+golden_queries:
+  - question: What are the top items by count?
+    sql: SELECT name, count FROM my_table ORDER BY count DESC LIMIT 10
+    intent: Identify top performers by volume
+    caveats: Filters to active records only
+business_rules:
+  - name: valid-ratings
+    definition: All ratings must be between 1 and 5
+guardrail_filters:
+  - name: active-only
+    filter: "status = 'active'"
+    reason: Exclude inactive records from analytics
+    tables: [my_table]
+hierarchies:
+  - name: geography
+    levels: [state, city, postal_code]
+    dataset: my_table
+```
+
+## CLI Commands
+
+```bash
+context tier # Check scorecard
+context verify --db <path> # Validate against live data
+context fix --db <path> # Auto-fix data warnings
+context setup # Interactive setup wizard
+context dev # Watch mode for live editing
+```
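Reviewer note on the file above (not part of either package version): the "Semantic Role Decision Tree" in AGENT_INSTRUCTIONS.md can be read as a small pure function. The following is a minimal TypeScript sketch under stated assumptions. The `ColumnProfile` and `Classification` types and the `classifyColumn` function are hypothetical names introduced for illustration and do not appear in ContextKit. The `aggregation` field stands in for the judgment the instructions require someone to make after querying the column's real values; the sketch does not try to infer it.

```typescript
// Hypothetical types for illustration only; none of these names come from ContextKit.
type SemanticRole = "identifier" | "date" | "metric" | "dimension";

interface ColumnProfile {
  name: string;
  isKey: boolean;      // primary key or foreign key
  isTemporal: boolean; // date or timestamp column
  isNumeric: boolean;
  // Judgment made after querying real values:
  // "sum"  -> totals are meaningful (counts, amounts, quantities)
  // "avg"  -> only AVG/MIN/MAX are meaningful (rates, percentages, scores, ratings)
  // "none" -> aggregation makes no business sense
  aggregation: "sum" | "avg" | "none";
}

interface Classification {
  semantic_role: SemanticRole;
  default_aggregation?: "SUM" | "AVG";
  additive?: boolean;
}

// Applies the four steps of the decision tree in order.
function classifyColumn(col: ColumnProfile): Classification {
  if (col.isKey) return { semantic_role: "identifier" };
  if (col.isTemporal) return { semantic_role: "date" };
  if (col.isNumeric && col.aggregation === "sum") {
    return { semantic_role: "metric", default_aggregation: "SUM", additive: true };
  }
  if (col.isNumeric && col.aggregation === "avg") {
    return { semantic_role: "metric", default_aggregation: "AVG", additive: false };
  }
  return { semantic_role: "dimension" };
}

// The `stars` rating example from the "Common mistakes" list.
const starsResult = classifyColumn({
  name: "stars",
  isKey: false,
  isTemporal: false,
  isNumeric: true,
  aggregation: "avg",
});
console.log(starsResult.semantic_role, starsResult.additive); // prints: metric false
```

Keeping the key and date checks ahead of the numeric check mirrors the ordering in the file: a numeric surrogate key is classified as `identifier` before its numeric type is ever considered.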
package/LICENSE
DELETED
@@ -1,21 +0,0 @@
-MIT License
-
-Copyright (c) 2026 Eric Kittelson
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.