opencode-skills-collection 2.0.0 → 2.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (90)
  1. package/bundled-skills/.antigravity-install-manifest.json +6 -1
  2. package/bundled-skills/docs/integrations/jetski-cortex.md +3 -3
  3. package/bundled-skills/docs/integrations/jetski-gemini-loader/README.md +1 -1
  4. package/bundled-skills/docs/maintainers/repo-growth-seo.md +3 -3
  5. package/bundled-skills/docs/maintainers/skills-update-guide.md +1 -1
  6. package/bundled-skills/docs/users/bundles.md +1 -1
  7. package/bundled-skills/docs/users/claude-code-skills.md +1 -1
  8. package/bundled-skills/docs/users/gemini-cli-skills.md +1 -1
  9. package/bundled-skills/docs/users/getting-started.md +1 -1
  10. package/bundled-skills/docs/users/kiro-integration.md +1 -1
  11. package/bundled-skills/docs/users/usage.md +4 -4
  12. package/bundled-skills/docs/users/visual-guide.md +4 -4
  13. package/bundled-skills/manage-skills/SKILL.md +187 -0
  14. package/bundled-skills/monte-carlo-monitor-creation/SKILL.md +222 -0
  15. package/bundled-skills/monte-carlo-monitor-creation/references/comparison-monitor.md +426 -0
  16. package/bundled-skills/monte-carlo-monitor-creation/references/custom-sql-monitor.md +207 -0
  17. package/bundled-skills/monte-carlo-monitor-creation/references/metric-monitor.md +292 -0
  18. package/bundled-skills/monte-carlo-monitor-creation/references/table-monitor.md +231 -0
  19. package/bundled-skills/monte-carlo-monitor-creation/references/validation-monitor.md +404 -0
  20. package/bundled-skills/monte-carlo-prevent/SKILL.md +252 -0
  21. package/bundled-skills/monte-carlo-prevent/references/TROUBLESHOOTING.md +23 -0
  22. package/bundled-skills/monte-carlo-prevent/references/parameters.md +32 -0
  23. package/bundled-skills/monte-carlo-prevent/references/workflows.md +478 -0
  24. package/bundled-skills/monte-carlo-push-ingestion/SKILL.md +363 -0
  25. package/bundled-skills/monte-carlo-push-ingestion/references/anomaly-detection.md +87 -0
  26. package/bundled-skills/monte-carlo-push-ingestion/references/custom-lineage.md +203 -0
  27. package/bundled-skills/monte-carlo-push-ingestion/references/direct-http-api.md +207 -0
  28. package/bundled-skills/monte-carlo-push-ingestion/references/prerequisites.md +150 -0
  29. package/bundled-skills/monte-carlo-push-ingestion/references/push-lineage.md +160 -0
  30. package/bundled-skills/monte-carlo-push-ingestion/references/push-metadata.md +158 -0
  31. package/bundled-skills/monte-carlo-push-ingestion/references/push-query-logs.md +219 -0
  32. package/bundled-skills/monte-carlo-push-ingestion/references/validation.md +257 -0
  33. package/bundled-skills/monte-carlo-push-ingestion/scripts/sample_verify.py +357 -0
  34. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery/collect_and_push_lineage.py +70 -0
  35. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery/collect_and_push_metadata.py +65 -0
  36. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery/collect_and_push_query_logs.py +70 -0
  37. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery/collect_lineage.py +214 -0
  38. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery/collect_metadata.py +160 -0
  39. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery/collect_query_logs.py +164 -0
  40. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery/push_lineage.py +198 -0
  41. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery/push_metadata.py +193 -0
  42. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery/push_query_logs.py +207 -0
  43. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery-iceberg/collect_and_push_metadata.py +71 -0
  44. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery-iceberg/collect_and_push_query_logs.py +64 -0
  45. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery-iceberg/collect_metadata.py +253 -0
  46. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery-iceberg/collect_query_logs.py +149 -0
  47. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery-iceberg/push_metadata.py +190 -0
  48. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/bigquery-iceberg/push_query_logs.py +208 -0
  49. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/databricks/collect_and_push_lineage.py +83 -0
  50. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/databricks/collect_and_push_metadata.py +77 -0
  51. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/databricks/collect_and_push_query_logs.py +83 -0
  52. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/databricks/collect_lineage.py +240 -0
  53. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/databricks/collect_metadata.py +212 -0
  54. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/databricks/collect_query_logs.py +204 -0
  55. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/databricks/push_lineage.py +192 -0
  56. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/databricks/push_metadata.py +178 -0
  57. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/databricks/push_query_logs.py +200 -0
  58. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/hive/collect_and_push_lineage.py +119 -0
  59. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/hive/collect_and_push_metadata.py +119 -0
  60. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/hive/collect_and_push_query_logs.py +117 -0
  61. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/hive/collect_lineage.py +265 -0
  62. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/hive/collect_metadata.py +313 -0
  63. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/hive/collect_query_logs.py +284 -0
  64. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/hive/push_lineage.py +309 -0
  65. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/hive/push_metadata.py +245 -0
  66. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/hive/push_query_logs.py +255 -0
  67. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/redshift/collect_and_push_lineage.py +78 -0
  68. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/redshift/collect_and_push_metadata.py +80 -0
  69. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/redshift/collect_and_push_query_logs.py +88 -0
  70. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/redshift/collect_lineage.py +235 -0
  71. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/redshift/collect_metadata.py +219 -0
  72. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/redshift/collect_query_logs.py +239 -0
  73. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/redshift/push_lineage.py +178 -0
  74. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/redshift/push_metadata.py +178 -0
  75. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/redshift/push_query_logs.py +196 -0
  76. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/snowflake/collect_and_push_lineage.py +154 -0
  77. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/snowflake/collect_and_push_metadata.py +137 -0
  78. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/snowflake/collect_and_push_query_logs.py +137 -0
  79. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/snowflake/collect_lineage.py +349 -0
  80. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/snowflake/collect_metadata.py +329 -0
  81. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/snowflake/collect_query_logs.py +254 -0
  82. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/snowflake/push_lineage.py +307 -0
  83. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/snowflake/push_metadata.py +228 -0
  84. package/bundled-skills/monte-carlo-push-ingestion/scripts/templates/snowflake/push_query_logs.py +248 -0
  85. package/bundled-skills/monte-carlo-push-ingestion/scripts/test_template_sdk_usage.py +340 -0
  86. package/bundled-skills/monte-carlo-validation-notebook/SKILL.md +685 -0
  87. package/bundled-skills/monte-carlo-validation-notebook/scripts/generate_notebook_url.py +141 -0
  88. package/bundled-skills/monte-carlo-validation-notebook/scripts/resolve_dbt_schema.py +161 -0
  89. package/package.json +1 -1
  90. package/skills_index.json +503 -61
@@ -0,0 +1,685 @@
1
+ ---
2
+ name: monte-carlo-validation-notebook
3
+ description: "Generates SQL validation notebooks for dbt PR changes with before/after comparison queries."
4
+ category: data
5
+ risk: safe
6
+ source: community
7
+ source_repo: monte-carlo-data/mc-agent-toolkit
8
+ source_type: community
9
+ date_added: "2026-04-08"
10
+ author: monte-carlo-data
11
+ tags: [data-observability, validation, dbt, monte-carlo, sql-notebook]
12
+ tools: [claude, cursor, codex]
13
+ ---
14
+
15
+ > **Tip:** This skill works well with Sonnet. Run `/model sonnet` before invoking for faster generation.
16
+
17
+ Generate a SQL Notebook with validation queries for dbt changes.
18
+
19
+ **Arguments:** $ARGUMENTS
20
+
21
+ Parse the arguments:
22
+ - **Target** (required): first argument — a GitHub PR URL or local dbt repo path
23
+ - **MC Base URL** (optional): `--mc-base-url <URL>` — defaults to `https://getmontecarlo.com`
24
+ - **Models** (optional): `--models <model1,model2,...>` — comma-separated list of model filenames (without `.sql` extension) to generate queries for. Only these models will be included. By default, all changed models are included up to a maximum of 10.
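The argument rules above can be sketched with Python's standard `argparse`. This is an illustration, not the skill's own parser: the skill leaves parsing to the agent, and the function name `parse_skill_args` is hypothetical.

```python
import argparse

DEFAULT_MC_BASE_URL = "https://getmontecarlo.com"


def parse_skill_args(argv):
    """Parse a positional target plus the optional --mc-base-url and --models flags."""
    parser = argparse.ArgumentParser(prog="monte-carlo-validation-notebook")
    # First argument: a GitHub PR URL or a local dbt repo path.
    parser.add_argument("target")
    parser.add_argument("--mc-base-url", default=DEFAULT_MC_BASE_URL)
    # Comma-separated model filenames without the .sql extension.
    parser.add_argument("--models", default=None)
    args = parser.parse_args(argv)
    models = None
    if args.models:
        models = [m.strip() for m in args.models.split(",") if m.strip()]
    return args.target, args.mc_base_url, models
```

For example, `parse_skill_args([".", "--models", "orders,customers"])` yields the local path, the default base URL, and `["orders", "customers"]`.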
25
+
26
+ ---
27
+
28
+ # Setup
29
+
30
+ **Prerequisites:**
31
+ - **`gh`** (GitHub CLI) — required for PR mode. Must be authenticated (`gh auth status`).
32
+ - **`python3`** — required for helper scripts.
33
+ - **`pyyaml`** — install with `pip3 install pyyaml` (or `pip install pyyaml`, `uv pip install pyyaml`, etc.)
34
+
35
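+ **Note:** Generated SQL aims for ANSI-compatible syntax so it runs on Snowflake, BigQuery, Redshift, and Athena, but some constructs may need per-warehouse adjustment; for example, multi-column `COUNT(DISTINCT a, b)` is accepted by Snowflake but not by BigQuery, and interval arithmetic syntax varies between warehouses.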
36
+
37
+ This skill includes two helper scripts in `${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/`:
38
+
39
+ - **`resolve_dbt_schema.py`** - Resolves dbt model output schemas from `dbt_project.yml` routing rules and model config overrides.
40
+ - **`generate_notebook_url.py`** - Encodes notebook YAML into a base64 import URL and opens it in the browser.
41
+
42
+ # Mode Detection
43
+
44
+ Auto-detect mode from the target argument:
45
+ - If target looks like a URL (contains `://` or `github.com`) -> **PR mode**
46
+ - If target is a path (`.`, `/path/to/repo`, relative path) -> **Local mode**
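The detection rule is a simple string test. A minimal sketch (the function name `detect_mode` is hypothetical):

```python
def detect_mode(target: str) -> str:
    """Return "pr" for URL-like targets and "local" for filesystem paths."""
    if "://" in target or "github.com" in target:
        return "pr"
    return "local"
```

Checking for `github.com` as well as `://` means a scheme-less PR link such as `github.com/org/repo/pull/12` still selects PR mode.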
47
+
48
+ ---
49
+
50
+ # Context
51
+
52
+ This command generates a SQL Notebook containing validation queries for dbt changes. The notebook can be opened in the MC Bridge SQL Notebook interface for interactive validation.
53
+
54
+ The output is an import URL that opens directly in the notebook interface:
55
+ ```
56
+ <MC_BASE_URL>/notebooks/import#<base64-encoded-yaml>
57
+ ```
58
+
59
+ **Key Features:**
60
+ - **Database Parameters**: Two `text` parameters (`prod_db` and `dev_db`) for selecting databases
61
+ - **Schema Inference**: Automatically infers schema per model from `dbt_project.yml` and model configs
62
+ - **Single-table queries**: Basic validation queries using `{{prod_db}}.<SCHEMA>.<TABLE>`
63
+ - **Comparison queries**: Before/after queries comparing `{{prod_db}}` vs `{{dev_db}}`
64
+ - **Flexible usage**: Users can set both parameters to the same database for single-database analysis
65
+
66
+ # Notebook YAML Spec Reference
67
+
68
+ Key structure:
69
+ ```yaml
70
+ version: 1
71
+ metadata:
72
+ id: string # kebab-case + random suffix
73
+ name: string # display name
74
+ created_at: string # ISO 8601
75
+ updated_at: string # ISO 8601
76
+ default_context: # optional database/schema context
77
+ database: string
78
+ schema: string
79
+ cells:
80
+ - id: string
81
+ type: sql | markdown | parameter
82
+ content: string # SQL, markdown, or parameter config (JSON)
83
+ display_type: table | bar | timeseries
84
+ ```
85
+
86
+ ## Parameter Cell Spec
87
+
88
+ Parameter cells allow defining variables referenced in SQL via `{{param_name}}` syntax:
89
+
90
+ ```yaml
91
+ - id: param-prod-db
92
+ type: parameter
93
+ content:
94
+ name: prod_db # variable name
95
+ config:
96
+ type: text # free-form text input
97
+ default_value: "ANALYTICS"
98
+ placeholder: "Prod database"
99
+ display_type: table
100
+ ```
101
+
102
+ Parameter types:
103
+ - `text`: Free-form text input (used for database names)
104
+ - `schema_selector`: Two dropdowns (database -> schema), value stored as `DATABASE.SCHEMA`
105
+ - `dropdown`: Select from predefined options
106
+
107
+ # Task
108
+
109
+ Generate a SQL Notebook with validation queries based on the mode and target.
110
+
111
+ ## Phase 1: Get Changed Files
112
+
113
+ The approach differs based on mode:
114
+
115
+ ### If PR mode (GitHub PR):
116
+
117
+ 1. Extract the PR number and repo from the target URL.
118
+ - Example: `https://github.com/monte-carlo-data/dbt/pull/3386` -> owner=`monte-carlo-data`, repo=`dbt`, PR=`3386`
119
+
120
+ 2. Fetch PR metadata using `gh`:
121
+ ```bash
122
+ gh pr view <PR#> --repo <owner>/<repo> --json number,title,author,mergedAt,headRefOid
123
+ ```
124
+
125
+ 3. Fetch the list of changed files:
126
+ ```bash
127
+ gh pr view <PR#> --repo <owner>/<repo> --json files --jq '.files[].path'
128
+ ```
129
+
130
+ 4. Fetch the diff:
131
+ ```bash
132
+ gh pr diff <PR#> --repo <owner>/<repo>
133
+ ```
134
+
135
+ 5. Filter the changed files list to only `.sql` files under `models/` or `snapshots/` directories (at any depth — e.g., `models/`, `analytics/models/`, `dbt/models/`). These are the dbt models to analyze. If no model SQL files were changed, report that and stop.
136
+
137
+ 6. For each changed model file, fetch the full file content at the head SHA:
138
+ ```bash
139
+ gh api repos/<owner>/<repo>/contents/<file_path>?ref=<head_sha> --jq '.content' | python3 -c "import sys,base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())"
140
+ ```
141
+
142
+ 7. **Fetch dbt_project.yml** for schema resolution. Detect the dbt project root by looking at the changed file paths — find the common parent directory that contains `dbt_project.yml`. Try these paths in order until one succeeds:
143
+ ```bash
144
+ gh api repos/<owner>/<repo>/contents/<dbt_root>/dbt_project.yml?ref=<head_sha> --jq '.content' | python3 -c "import sys,base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())"
145
+ ```
146
+ Common `<dbt_root>` candidates, tried in this order: `analytics`, the repo root (drop the `<dbt_root>/` prefix from the API path), `dbt`, `transform`. Stop at the first one that returns a file.
147
+
148
+ Save `dbt_project.yml` to `/tmp/validation_notebook_working/<PR#>/dbt_project.yml`.
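Steps 1 and 5 (parsing the PR URL and keeping only model SQL files) can be sketched as below; the same path filter applies to step 5 of local mode. This is an illustration under stated assumptions: the function names are hypothetical, and the regex accepts only `github.com/<owner>/<repo>/pull/<number>` URLs.

```python
import re

PR_URL_RE = re.compile(
    r"github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/pull/(?P<number>\d+)")


def parse_pr_url(url):
    """Return (owner, repo, pr_number) from a GitHub PR URL."""
    match = PR_URL_RE.search(url)
    if match is None:
        raise ValueError(f"Not a GitHub PR URL: {url}")
    return match.group("owner"), match.group("repo"), int(match.group("number"))


def filter_model_files(paths):
    """Keep .sql files that sit under a models/ or snapshots/ directory at any depth."""
    keep = []
    for path in paths:
        directories = path.split("/")[:-1]
        if path.endswith(".sql") and any(d in ("models", "snapshots") for d in directories):
            keep.append(path)
    return keep
```

`parse_pr_url("https://github.com/monte-carlo-data/dbt/pull/3386")` returns `("monte-carlo-data", "dbt", 3386)`, matching the example in step 1.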
149
+
150
+ ### If Local mode (Local Directory):
151
+
152
+ 1. Change to the target directory.
153
+
154
+ 2. Get current branch info:
155
+ ```bash
156
+ git rev-parse --abbrev-ref HEAD
157
+ ```
158
+
159
+ 3. Detect base branch - try `main`, `master`, `develop` in order, or use upstream tracking branch.
160
+
161
+ 4. Get the list of changed SQL files compared to base branch:
162
+ ```bash
163
+ git diff --name-only <base_branch>...HEAD -- '*.sql'
164
+ ```
165
+
166
+ 5. Filter to only `.sql` files under `models/` or `snapshots/` directories (at any depth — e.g., `models/`, `analytics/models/`, `dbt/models/`). If no model SQL files were changed, report that and stop.
167
+
168
+ 6. Get the diff for each changed file:
169
+ ```bash
170
+ git diff <base_branch>...HEAD -- <file_path>
171
+ ```
172
+
173
+ 7. Read model files directly from the filesystem.
174
+
175
+ 8. **Find dbt_project.yml**:
176
+ ```bash
177
+ find . -name "dbt_project.yml" -type f -not -path "*/dbt_packages/*" | head -1
178
+ ```
179
+
180
+ 9. For notebook metadata in local mode, use:
181
+ - **ID**: `local-<branch-name>-<timestamp>`
182
+ - **Title**: `Local: <branch-name>`
183
+ - **Author**: Output of `git config user.name`
184
+ - **Merged**: "N/A (local)"
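The local-mode metadata in step 9 can be assembled as in this sketch. Two details are assumptions, since the skill does not specify them: branch names are made ID-safe by replacing characters other than letters, digits, and hyphens with `-`, and the timestamp uses a compact UTC `YYYYMMDDHHMMSS` form. The function name is hypothetical.

```python
import re
from datetime import datetime, timezone


def local_notebook_metadata(branch, author, now=None):
    """Build the ID, title, author, and merge status used for local mode."""
    now = now or datetime.now(timezone.utc)
    # Assumption: "feature/add_col" becomes "feature-add-col" for use in the ID.
    safe_branch = re.sub(r"[^A-Za-z0-9-]+", "-", branch).strip("-").lower()
    return {
        "id": f"local-{safe_branch}-{now.strftime('%Y%m%d%H%M%S')}",
        "title": f"Local: {branch}",
        "author": author,  # output of `git config user.name`
        "merged": "N/A (local)",
    }
```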
185
+
186
+ ### Model Selection (applies to both modes)
187
+
188
+ After filtering to `.sql` files under `models/` or `snapshots/`:
189
+
190
+ 1. **If `--models` was specified:** Filter the changed files list to only include models whose filename (without `.sql` extension, case-insensitive) matches one of the specified model names. If any specified model is not found in the changed files, warn the user but continue with the models that were found. If none match, report that and stop.
191
+
192
+ 2. **Model cap:** If more than 10 models remain after filtering, select the first 10 (by file path order) and warn the user:
193
+ ```
194
+ ⚠️ <total_count> models changed — generating validation queries for the first 10 only.
195
+ To generate for specific models, re-run with: --models <model1,model2,...>
196
+ Skipped models: <list of skipped model filenames>
197
+ ```
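The selection rules (case-insensitive `--models` filter, then a 10-model cap) can be sketched as follows. The function name and the exact warning wording are illustrative assumptions; "first 10 by file path order" is read here as sorting the paths.

```python
from pathlib import PurePosixPath

MAX_MODELS = 10


def select_models(paths, requested=None):
    """Return (selected_paths, warnings) after the --models filter and the cap."""
    warnings = []
    if requested:
        wanted = {name.lower() for name in requested}
        selected = [p for p in paths if PurePosixPath(p).stem.lower() in wanted]
        found = {PurePosixPath(p).stem.lower() for p in selected}
        missing = sorted(wanted - found)
        if missing:
            # Warn but continue with the models that were found.
            warnings.append(f"Models not found in changed files: {', '.join(missing)}")
        if not selected:
            raise ValueError("None of the requested models were changed")
    else:
        selected = list(paths)
    selected = sorted(selected)
    if len(selected) > MAX_MODELS:
        skipped = [PurePosixPath(p).stem for p in selected[MAX_MODELS:]]
        warnings.append(
            f"{len(selected)} models changed; generating validation queries for the "
            f"first {MAX_MODELS} only. Skipped models: {', '.join(skipped)}")
        selected = selected[:MAX_MODELS]
    return selected, warnings
```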
198
+
199
+ ## Phase 2: Parse Changed Models
200
+
201
+ For EACH changed dbt model `.sql` file, parse and extract:
202
+
203
+ ### 2a. Model Metadata
204
+
205
+ **Output table name** -- Derive from file name:
206
+ - `<any_path>/models/<subdir>/<model_name>.sql` -> table is `<MODEL_NAME>` (uppercase, taken from the filename)
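A one-line sketch of this rule (hypothetical helper name). Like the rule itself, it ignores dbt's `alias` model config, which can give the relation a different name than the file.

```python
from pathlib import PurePosixPath


def table_name_from_path(model_path):
    """Uppercased filename stem, e.g. models/marts/agent_runs.sql -> AGENT_RUNS."""
    return PurePosixPath(model_path).stem.upper()
```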
207
+
208
+ **Output schema** -- Use the schema resolution script:
209
+
210
+ 1. **Setup**: Save `dbt_project.yml` and model files to `/tmp/validation_notebook_working/<id>/` preserving paths:
211
+ ```
212
+ /tmp/validation_notebook_working/<id>/
213
+ +-- dbt_project.yml
214
+ +-- models/
215
+     +-- <path>/<model>.sql
216
+ ```
217
+
218
+ 2. **Run the script** for each model:
219
+ ```bash
220
+ python3 ${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/resolve_dbt_schema.py /tmp/validation_notebook_working/<id>/dbt_project.yml /tmp/validation_notebook_working/<id>/models/<path>/<model>.sql
221
+ ```
222
+
223
+ 3. **Error handling**: If the script fails, **STOP immediately** and report the error. Do NOT proceed with notebook generation if schema resolution fails.
224
+
225
+ 4. **Output**: The script prints the resolved schema (e.g., `PROD`, `PROD_STAGE`, `PROD_LINEAGE`).
226
+
227
+ **Note**: Do NOT manually parse `dbt_project.yml` or model configs for schema -- always use the script. It handles model config overrides, `dbt_project.yml` routing rules, and the `PROD_` prefix for custom schemas, and falls back to `PROD` when no rule applies.
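The staging and stop-on-failure steps can be sketched in Python. Assumptions: `model_files` maps a relative path such as `models/marts/orders.sql` to a local file holding that model's SQL, and the resolver is launched with the current interpreter (`sys.executable`) instead of a literal `python3`. The real script lives under `${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/`; the helper names here are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def stage_for_schema_resolution(work_dir, dbt_project_yml, model_files, script_path):
    """Copy dbt_project.yml and model files into work_dir; return one command per model."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    project = work / "dbt_project.yml"
    shutil.copy(dbt_project_yml, project)
    commands = []
    for rel_path, src in model_files.items():
        dest = work / rel_path  # keeps the models/<path>/ layout shown above
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(src, dest)
        commands.append([sys.executable, str(script_path), str(project), str(dest)])
    return commands


def resolve_schema(command):
    """Run the resolver; check=True raises CalledProcessError so generation stops."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

Letting `CalledProcessError` propagate enforces the "STOP immediately" rule: nothing downstream runs with a guessed schema.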
228
+
229
+ **Config block** -- Look for `{{ config(...) }}` and extract:
230
+ - `materialized` -- 'table', 'view', 'incremental', 'ephemeral'
231
+ - `unique_key` -- the dedup key (may be a string or list)
232
+ - `cluster_by` -- clustering fields (may contain the time axis)
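A regex-based sketch of pulling these three keys out of a `{{ config(...) }}` block. It is a heuristic, not a Jinja parser: it only recognizes quoted-string and list-literal values, and the names `extract_config`, `CONFIG_RE`, and `KWARG_RE` are hypothetical.

```python
import ast
import re

CONFIG_RE = re.compile(r"\{\{\s*config\((?P<body>.*?)\)\s*\}\}", re.DOTALL)
KWARG_RE = re.compile(r"""(\w+)\s*=\s*(\[[^\]]*\]|'[^']*'|"[^"]*")""")
WANTED_KEYS = ("materialized", "unique_key", "cluster_by")


def extract_config(sql):
    """Return materialized / unique_key / cluster_by from the model's config block."""
    match = CONFIG_RE.search(sql)
    if match is None:
        return {}
    config = {}
    for key, raw in KWARG_RE.findall(match.group("body")):
        if key in WANTED_KEYS:
            # literal_eval turns "'incremental'" into a str and "['a', 'b']" into a list.
            config[key] = ast.literal_eval(raw)
    return config
```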
233
+
234
+ **Core segmentation fields** -- Scan the entire model SQL for fields likely to be business keys:
235
+ - Fields named `*_id` (e.g., `account_id`, `resource_id`, `monitor_id`) that appear in JOIN ON, GROUP BY, PARTITION BY, or `unique_key`
236
+ - Deduplicate and rank by frequency. Take the top 3.
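One way to approximate this ranking, sketched under a simplifying assumption: a `*_id` field counts as a candidate when it appears on a line that also contains `ON`, `GROUP BY`, `PARTITION BY`, or `unique_key`, and candidates are then ranked by total occurrences in the model. It is line-based and lexical, so treat the output as a suggestion. The function name is hypothetical.

```python
import re
from collections import Counter

ID_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*_id)\b", re.IGNORECASE)
KEY_LINE_RE = re.compile(
    r"\bon\b|\bgroup\s+by\b|\bpartition\s+by\b|\bunique_key\b", re.IGNORECASE)


def segmentation_fields(sql, top_n=3):
    """Rank *_id fields used in join/grouping/key positions by frequency."""
    candidates = set()
    for line in sql.splitlines():
        if KEY_LINE_RE.search(line):
            candidates.update(name.lower() for name in ID_RE.findall(line))
    counts = Counter(name.lower() for name in ID_RE.findall(sql))
    # Highest frequency first; ties broken alphabetically for stable output.
    ranked = sorted(candidates, key=lambda name: (-counts[name], name))
    return ranked[:top_n]
```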
237
+
238
+ **Time axis field** -- Detect the model's time dimension (in priority order):
239
+ 1. `is_incremental()` block: field used in the WHERE comparison
240
+ 2. `cluster_by` config: timestamp/date fields
241
+ 3. Field name conventions: `ingest_ts`, `created_time`, `date_part`, `timestamp`, `run_start_time`, `export_ts`, `event_created_time`
242
+ 4. The field sorted with `ORDER BY ... DESC` inside a `QUALIFY` / `ROW_NUMBER()` window
243
+
244
+ If no time axis is found, skip time-axis queries for this model.
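A partial sketch of the priority list, covering only priority 1 (the field compared in the `is_incremental()` block's WHERE clause) and priority 3 (name conventions); priorities 2 and 4 are omitted for brevity. Matching is lexical, so a type name such as `timestamp` inside a `CAST` can produce a false positive. The function name is hypothetical.

```python
import re

NAME_CONVENTIONS = ["ingest_ts", "created_time", "date_part", "timestamp",
                    "run_start_time", "export_ts", "event_created_time"]
INCREMENTAL_RE = re.compile(
    r"\{%-?\s*if\s+is_incremental\(\)\s*-?%\}(?P<body>.*?)\{%-?\s*endif\s*-?%\}",
    re.DOTALL)
WHERE_FIELD_RE = re.compile(r"\bwhere\s+(?:\w+\.)?(\w+)\s*(?:>=|>|<=|<)", re.IGNORECASE)


def detect_time_axis(sql):
    """Return a time-axis column name, or None to skip time-axis queries."""
    # Priority 1: the column compared in the is_incremental() WHERE clause.
    block = INCREMENTAL_RE.search(sql)
    if block:
        where = WHERE_FIELD_RE.search(block.group("body"))
        if where:
            return where.group(1)
    # Priority 3: well-known column names, in the order listed above.
    lowered = sql.lower()
    for name in NAME_CONVENTIONS:
        if re.search(rf"\b{name}\b", lowered):
            return name
    return None
```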
245
+
246
+ ### 2b. Diff Analysis
247
+
248
+ Parse the diff hunks for this file. Classify each changed line:
249
+
250
+ - **Changed fields** -- Lines added/modified in SELECT clauses or CTE definitions. Extract the output column name.
251
+ - **Changed filters** -- Lines added/modified in WHERE clauses.
252
+ - **Changed joins** -- Lines added/modified in JOIN ON conditions.
253
+ - **Changed unique_key** -- If `unique_key` in config was modified, note both old and new values.
254
+ - **New columns** -- Columns in "after" SELECT that don't appear in "before" (pure additions).
255
+
256
+ ### 2c. Model Classification
257
+
258
+ Classify each model as **new** or **modified** based on the diff:
259
+ - If the diff for this file contains `new file mode` → classify as **new**
260
+ - Otherwise → classify as **modified**
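A minimal sketch of this check (hypothetical function name). Testing for `new file mode` at the start of a line, rather than anywhere in the text, avoids false matches in changed SQL lines, which always begin with `+`, `-`, or a space.

```python
def classify_model(file_diff):
    """Return "new" if git marks the file as newly created, else "modified"."""
    for line in file_diff.splitlines():
        if line.startswith("new file mode"):
            return "new"
    return "modified"
```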
261
+
262
+ This classification determines which query patterns are generated in Phase 3.
263
+
264
+ **Note:** For **new models**, Phase 2b diff analysis is skipped (there is no "before" to compare against). Phase 2a metadata extraction still applies.
265
+
266
+ ## Phase 3: Generate Validation Queries
267
+
268
+ For each changed model, generate the applicable queries based on its classification (new vs modified).
269
+
270
+ **CRITICAL: Parameter Placeholder Syntax**
271
+
272
+ Use **double curly braces** `{{...}}` for parameter placeholders. Do NOT use `${...}` or any other syntax.
273
+
274
+ Correct: `{{prod_db}}.PROD.AGENT_RUNS`
275
+ Wrong: `${prod_db}.PROD.AGENT_RUNS`
276
+
277
+ **Table Reference Format:**
278
+ - Use `{{prod_db}}.<SCHEMA>.<TABLE_NAME>` for prod queries
279
+ - Use `{{dev_db}}.<SCHEMA>.<TABLE_NAME>` for dev queries
280
+ - `<SCHEMA>` is **hardcoded per-model** using the output from the schema resolution script
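The reference format and the placeholder rule can both be enforced mechanically. A sketch with hypothetical helper names: `table_ref` builds `{{<param>}}.<SCHEMA>.<TABLE>`, and `check_placeholders` rejects SQL that uses the wrong `${...}` syntax.

```python
import re

WRONG_PLACEHOLDER_RE = re.compile(r"\$\{\s*\w+\s*\}")


def table_ref(db_param, schema, table):
    """db_param is "prod_db" or "dev_db"; schema comes from resolve_dbt_schema.py."""
    return "{{" + db_param + "}}." + schema + "." + table


def check_placeholders(sql):
    """Raise if the SQL uses ${...} placeholders instead of {{...}}."""
    bad = WRONG_PLACEHOLDER_RE.findall(sql)
    if bad:
        raise ValueError("Use {{param}} placeholders, found: " + ", ".join(bad))
```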
281
+
282
+ ---
283
+
284
+ ### Query Patterns for NEW Models
285
+
286
+ For new models, all queries target `{{dev_db}}` only. No comparison queries are generated since no prod table exists.
287
+
288
+ #### Pattern 7-new: Total Row Count
289
+ **Trigger:** Always.
290
+
291
+ ```sql
292
+ SELECT COUNT(*) AS total_rows
293
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
294
+ ```
295
+
296
+ #### Pattern 9: Sample Data Preview
297
+ **Trigger:** Always.
298
+
299
+ ```sql
300
+ SELECT *
301
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
302
+ LIMIT 20
303
+ ```
304
+
305
+ #### Pattern 2-new: Core Segmentation Counts
306
+ **Trigger:** Always.
307
+
308
+ ```sql
309
+ SELECT
310
+ <segmentation_field>,
311
+ COUNT(*) AS row_count
312
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
313
+ GROUP BY <segmentation_field>
314
+ ORDER BY row_count DESC
315
+ LIMIT 100
316
+ ```
317
+
318
+ #### Pattern 5: Uniqueness Check
319
+ **Trigger:** Always for new models (verify unique_key constraint from the start).
320
+
321
+ ```sql
322
+ SELECT
323
+ COUNT(*) AS total_rows,
324
+ COUNT(DISTINCT <key_fields>) AS distinct_keys,
325
+ COUNT(*) - COUNT(DISTINCT <key_fields>) AS duplicate_count
326
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
327
+ ```
328
+
329
+ ```sql
330
+ SELECT <key_fields>, COUNT(*) AS n
331
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
332
+ GROUP BY <key_fields>
333
+ HAVING COUNT(*) > 1
334
+ ORDER BY n DESC
335
+ LIMIT 100
336
+ ```
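`COUNT(DISTINCT <key_fields>)` works for a single key column everywhere, but some warehouses (BigQuery, for example) accept only one expression inside `COUNT(DISTINCT ...)`. This sketch (hypothetical helper) keeps the single-column form and switches to a `GROUP BY` subquery for composite keys. One semantic difference to be aware of: `GROUP BY` treats NULL keys as a group, whereas `COUNT(DISTINCT ...)` skips NULLs.

```python
def uniqueness_check_sql(table_ref, key_fields):
    """Build the Pattern 5 summary query for one or more key columns."""
    keys = ", ".join(key_fields)
    if len(key_fields) == 1:
        distinct = f"COUNT(DISTINCT {keys})"
        return (
            "SELECT\n"
            "  COUNT(*) AS total_rows,\n"
            f"  {distinct} AS distinct_keys,\n"
            f"  COUNT(*) - {distinct} AS duplicate_count\n"
            f"FROM {table_ref}"
        )
    # Composite key: count key tuples via GROUP BY instead of COUNT(DISTINCT a, b).
    return (
        "SELECT\n"
        "  COALESCE(SUM(n), 0) AS total_rows,\n"
        "  COUNT(*) AS distinct_keys,\n"
        "  COALESCE(SUM(n), 0) - COUNT(*) AS duplicate_count\n"
        "FROM (\n"
        f"  SELECT {keys}, COUNT(*) AS n\n"
        f"  FROM {table_ref}\n"
        f"  GROUP BY {keys}\n"
        ") AS key_counts"
    )
```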
337
+
338
+ #### Pattern 6-new: NULL Rate Check (all columns)
339
+ **Trigger:** Always. Checks all output columns since everything is new.
340
+
341
+ ```sql
342
+ SELECT
343
+ COUNT(*) AS total_rows,
344
+ SUM(CASE WHEN <col1> IS NULL THEN 1 ELSE 0 END) AS <col1>_null_count,
345
+ ROUND(100.0 * SUM(CASE WHEN <col1> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS <col1>_null_pct,
346
+ SUM(CASE WHEN <col2> IS NULL THEN 1 ELSE 0 END) AS <col2>_null_count,
347
+ ROUND(100.0 * SUM(CASE WHEN <col2> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS <col2>_null_pct
348
+ -- repeat for each output column
349
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
350
+ ```
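The "repeat for each output column" step is easy to generate rather than hand-write. A sketch (hypothetical helper) that emits exactly the query shape above for any list of columns:

```python
def null_rate_sql(table_ref, columns):
    """Build the all-columns NULL rate query (Pattern 6-new)."""
    select_items = ["  COUNT(*) AS total_rows"]
    for col in columns:
        is_null = f"SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END)"
        select_items.append(f"  {is_null} AS {col}_null_count")
        select_items.append(
            f"  ROUND(100.0 * {is_null} / NULLIF(COUNT(*), 0), 2) AS {col}_null_pct")
    return "SELECT\n" + ",\n".join(select_items) + f"\nFROM {table_ref}"
```

`NULLIF(COUNT(*), 0)` keeps the percentage NULL, not a division error, when the dev table is empty.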
351
+
352
+ #### Pattern 8: Time-Axis Continuity
353
+ **Trigger:** Model is `materialized='incremental'` OR a time axis field was identified.
354
+
355
+ ```sql
356
+ SELECT
357
+ CAST(<time_axis> AS DATE) AS day,
358
+ COUNT(*) AS row_count
359
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
360
+ WHERE <time_axis> >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
361
+ GROUP BY day
362
+ ORDER BY day DESC
363
+ LIMIT 30
364
+ ```
365
+
366
+ ---
367
+
368
+ ### Query Patterns for MODIFIED Models
369
+
370
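+ For modified models, most single-table queries use `{{prod_db}}`; the uniqueness check (Pattern 5) and the NULL rate check for added columns (Pattern 6) run against `{{dev_db}}`, and comparison queries use both.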
371
+
372
+ #### Pattern 7: Total Row Count
373
+ **Trigger:** Always.
374
+
375
+ ```sql
376
+ SELECT COUNT(*) AS total_rows
377
+ FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
378
+ ```
379
+
380
+ #### Pattern 9: Sample Data Preview
381
+ **Trigger:** Always.
382
+
383
+ ```sql
384
+ SELECT *
385
+ FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
386
+ LIMIT 20
387
+ ```
388
+
389
+ #### Pattern 2: Core Segmentation Counts
390
+ **Trigger:** Always.
391
+
392
+ ```sql
393
+ SELECT
394
+ <segmentation_field>,
395
+ COUNT(*) AS row_count
396
+ FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
397
+ GROUP BY <segmentation_field>
398
+ ORDER BY row_count DESC
399
+ LIMIT 100
400
+ ```
401
+
402
+ #### Pattern 1: Changed Field Distribution
403
+ **Trigger:** Changed fields found in Phase 2b. **Exclude added columns** (from "New columns" in Phase 2b) — only include fields that exist in prod.
404
+
405
+ ```sql
406
+ SELECT
407
+ <changed_field>,
408
+ COUNT(*) AS row_count,
409
+ ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) AS pct
410
+ FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
411
+ GROUP BY <changed_field>
412
+ ORDER BY row_count DESC
413
+ LIMIT 100
414
+ ```
415
+
416
+ #### Pattern 5: Uniqueness Check
417
+ **Trigger:** JOIN condition changed, `unique_key` changed, or model is incremental.
418
+
419
+ ```sql
420
+ SELECT
421
+ COUNT(*) AS total_rows,
422
+ COUNT(DISTINCT <key_fields>) AS distinct_keys,
423
+ COUNT(*) - COUNT(DISTINCT <key_fields>) AS duplicate_count
424
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
425
+ ```
426
+
427
+ ```sql
428
+ SELECT <key_fields>, COUNT(*) AS n
429
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
430
+ GROUP BY <key_fields>
431
+ HAVING COUNT(*) > 1
432
+ ORDER BY n DESC
433
+ LIMIT 100
434
+ ```
435
+
436
+ #### Pattern 6: NULL Rate Check
437
+ **Trigger:** New column added, or column wrapped in COALESCE/NULLIF.
438
+
439
+ **Important:** Added columns (from "New columns" in Phase 2b) do NOT exist in prod yet. For added columns, query `{{dev_db}}` only. For modified columns (COALESCE/NULLIF changes), compare both databases.
440
+
441
+ **For added columns** (dev only):
442
+ ```sql
443
+ SELECT
444
+ COUNT(*) AS total_rows,
445
+ SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
446
+ ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
447
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
448
+ ```
449
+
450
+ **For modified columns** (prod vs dev):
451
+ ```sql
452
+ SELECT
453
+ 'prod' AS source,
454
+ COUNT(*) AS total_rows,
455
+ SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
456
+ ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
457
+ FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
458
+ UNION ALL
459
+ SELECT
460
+ 'dev' AS source,
461
+ COUNT(*) AS total_rows,
462
+ SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
463
+ ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
464
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
465
+ ```
466
+
467
+ #### Pattern 8: Time-Axis Continuity
468
+ **Trigger:** Model is `materialized='incremental'` OR a time axis field was identified.
469
+
470
+ ```sql
471
+ SELECT
472
+ CAST(<time_axis> AS DATE) AS day,
473
+ COUNT(*) AS row_count
474
+ FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
475
+ WHERE <time_axis> >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
476
+ GROUP BY day
477
+ ORDER BY day DESC
478
+ LIMIT 30
479
+ ```
480
+
481
+ #### Pattern 3: Before/After Comparison
482
+ **Trigger:** Always (for changed fields + top segmentation field). **Modified models only.**
483
+
484
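+ **Important:** Exclude added columns (from "New columns" in Phase 2b) from `<group_fields>`. Only use fields that exist in BOTH prod and dev. Added columns don't exist in prod and will cause query errors. When `<group_fields>` has more than one column, output and join on every one of them, not just a single `<field>`.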
485
+
486
+ ```sql
487
+ WITH prod AS (
488
+ SELECT <group_fields>, COUNT(*) AS cnt
489
+ FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
490
+ GROUP BY <group_fields>
491
+ ),
492
+ dev AS (
493
+ SELECT <group_fields>, COUNT(*) AS cnt
494
+ FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
495
+ GROUP BY <group_fields>
496
+ )
497
+ SELECT
498
+ COALESCE(b.<field>, d.<field>) AS <field>,
499
+ COALESCE(b.cnt, 0) AS cnt_prod,
500
+ COALESCE(d.cnt, 0) AS cnt_dev,
501
+ COALESCE(d.cnt, 0) - COALESCE(b.cnt, 0) AS diff
502
+ FROM prod b
503
+ FULL OUTER JOIN dev d ON b.<field> = d.<field>
504
+ ORDER BY ABS(diff) DESC
505
+ LIMIT 100
506
+ ```
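The template joins on one `<field>`, but `<group_fields>` can list several columns. This sketch (hypothetical helper) generates the same query shape with a `COALESCE` output and a join condition for every grouping field. Groups whose key is NULL will not match across the `FULL OUTER JOIN` and appear as separate prod-only and dev-only rows.

```python
def before_after_sql(prod_ref, dev_ref, group_fields):
    """Build the Pattern 3 prod-vs-dev comparison for one or more grouping fields."""
    fields = ", ".join(group_fields)
    # Join on every grouping field so each group lines up exactly.
    join_on = " AND ".join(f"b.{f} = d.{f}" for f in group_fields)
    coalesced = ",\n  ".join(f"COALESCE(b.{f}, d.{f}) AS {f}" for f in group_fields)
    return f"""WITH prod AS (
  SELECT {fields}, COUNT(*) AS cnt
  FROM {prod_ref}
  GROUP BY {fields}
),
dev AS (
  SELECT {fields}, COUNT(*) AS cnt
  FROM {dev_ref}
  GROUP BY {fields}
)
SELECT
  {coalesced},
  COALESCE(b.cnt, 0) AS cnt_prod,
  COALESCE(d.cnt, 0) AS cnt_dev,
  COALESCE(d.cnt, 0) - COALESCE(b.cnt, 0) AS diff
FROM prod b
FULL OUTER JOIN dev d ON {join_on}
ORDER BY ABS(diff) DESC
LIMIT 100"""
```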
507
+
508
+ #### Pattern 7b: Row Count Comparison
509
+ **Trigger:** Always. **Modified models only.**
510
+
511
+ ```sql
512
+ SELECT 'prod' AS source, COUNT(*) AS row_count FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
513
+ UNION ALL
514
+ SELECT 'dev' AS source, COUNT(*) AS row_count FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
515
+ ```
516
+
517
+ ## Phase 4: Build Notebook YAML
518
+
519
+ ### 4a. Metadata
520
+ ```yaml
521
+ version: 1
522
+ metadata:
523
+   id: validation-pr-<PR_NUMBER>-<random_suffix>
524
+   name: "Validation: PR #<PR_NUMBER> - <PR_TITLE_TRUNCATED>"
525
+   created_at: "<current_iso_timestamp>"
526
+   updated_at: "<current_iso_timestamp>"
527
+ ```
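A sketch of filling in this metadata block. Assumptions not set by the skill: the random suffix is 6 hex characters from `secrets.token_hex(3)`, titles are truncated to 60 characters with `...`, and timestamps are UTC ISO 8601 at second precision. The function name is hypothetical.

```python
import secrets
from datetime import datetime, timezone


def pr_notebook_metadata(pr_number, pr_title, max_title_len=60, now=None):
    """Return the version/metadata portion of the notebook as a dict."""
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat(timespec="seconds")
    title = pr_title
    if len(title) > max_title_len:
        title = title[: max_title_len - 3] + "..."
    return {
        "version": 1,
        "metadata": {
            "id": f"validation-pr-{pr_number}-{secrets.token_hex(3)}",
            "name": f"Validation: PR #{pr_number} - {title}",
            "created_at": stamp,
            "updated_at": stamp,
        },
    }
```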
528
+
529
+ ### 4b. Parameter Cells
530
+
531
+ **Only include `prod_db` if there are modified models.** If all models are new, only include `dev_db`.
532
+
533
+ ```yaml
534
+ # Include ONLY if there are modified models:
535
+ - id: param-prod-db
536
+   type: parameter
537
+   content:
538
+     name: prod_db
539
+     config:
540
+       type: text
541
+       default_value: "ANALYTICS"
542
+       placeholder: "Prod database (e.g., ANALYTICS)"
543
+   display_type: table
544
+
545
+ # Always include:
546
+ - id: param-dev-db
547
+   type: parameter
548
+   content:
549
+     name: dev_db
550
+     config:
551
+       type: text
552
+       default_value: "PERSONAL_<USER>"
553
+       placeholder: "Dev database (e.g., PERSONAL_JSMITH)"
554
+   display_type: table
555
+ ```
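The conditional inclusion of `prod_db` can be expressed as a small builder that returns Python dicts mirroring the YAML above; they could then be serialized with PyYAML's `yaml.safe_dump`. The function name and the `dev_default` parameter are hypothetical.

```python
def parameter_cells(has_modified_models, dev_default="PERSONAL_<USER>"):
    """Return the parameter cells: prod_db only when modified models exist."""
    def text_param(cell_id, name, default, placeholder):
        return {
            "id": cell_id,
            "type": "parameter",
            "content": {
                "name": name,
                "config": {"type": "text", "default_value": default,
                           "placeholder": placeholder},
            },
            "display_type": "table",
        }

    cells = []
    if has_modified_models:  # prod tables exist only for modified models
        cells.append(text_param("param-prod-db", "prod_db", "ANALYTICS",
                                "Prod database (e.g., ANALYTICS)"))
    cells.append(text_param("param-dev-db", "dev_db", dev_default,
                            "Dev database (e.g., PERSONAL_JSMITH)"))
    return cells
```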
556
+
557
+ ### 4c. Markdown Summary Cell
558
+ ```yaml
559
+ - id: cell-summary
560
+   type: markdown
561
+   content: |
562
+     # Validation Queries for <PR or Local Branch>
563
+     ## Summary
564
+     - **Title:** <title>
565
+     - **Author:** <author>
566
+     - **Source:** <PR URL or "Local branch: <branch>">
567
+     - **Status:** <merge_timestamp or "Not yet merged" or "N/A (local)">
568
+     ## Changes
569
+     <brief description based on diff analysis>
570
+     ## Changed Models
571
+     - `<SCHEMA>.<TABLE_NAME>` (from `<file_path>`)
572
+     ## How to Use
573
+     1. Select your Snowflake connector above
574
+     2. Set **dev_db** to your dev database (e.g., `PERSONAL_JSMITH`)
575
+     3. If modified models are present, set **prod_db** to your prod database (e.g., `ANALYTICS`)
576
+     4. Run single-table queries first, then comparison queries
577
+   display_type: table
578
+ ```
579
+
580
+ ### 4d. SQL Cell Format
581
+ ```yaml
582
+ - id: cell-<pattern>-<model>-<index>
583
+   type: sql
583
+   content: |
584
+     /*
585
+     ========================================
586
+     <Pattern Name (human-readable, e.g. "Total Row Count"; do NOT include pattern numbers like "Pattern 7:")>
587
+     ========================================
588
+     Model: <SCHEMA>.<TABLE_NAME>
589
+     Triggered by: <why this pattern was generated>
590
+     What to look for: <interpretation guidance>
591
+     ----------------------------------------
592
+     */
593
+     <actual_sql_query>
594
+   display_type: table
596
+ ```
597
+
598
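Generating the header in code keeps every SQL cell uniform. `sql_cell` is a hypothetical helper. The `p7`-style pattern token follows the `cell-p3-model-1` example in the Important Guidelines. The 40-character rule lines are a cosmetic assumption. Rejecting names that contain `Pattern <N>` mirrors the instruction in the template.

```python
from __future__ import annotations

import re

RULE = "=" * 40   # heavy rule around the pattern name
THIN = "-" * 40   # light rule closing the header


def sql_cell(pattern: str, model_slug: str, index: int, pattern_name: str, model: str,
             triggered_by: str, look_for: str, sql: str) -> dict:
    """Build one SQL cell: header comment first, then the query (hypothetical helper)."""
    if re.search(r"\bPattern\s+\d", pattern_name):
        raise ValueError("use a human-readable name, not 'Pattern N:'")
    header = "\n".join([
        "/*",
        RULE,
        pattern_name,
        RULE,
        f"Model: {model}",
        f"Triggered by: {triggered_by}",
        f"What to look for: {look_for}",
        THIN,
        "*/",
    ])
    return {
        "id": f"cell-{pattern}-{model_slug}-{index}",
        "type": "sql",
        "content": f"{header}\n{sql.strip()}\n",  # emitted later as a | block scalar
        "display_type": "table",
    }
```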
+ ### 4e. Cell Organization
+
+ Order cells as follows for each model type. Patterns with a trigger condition (see the Query Pattern Reference) appear only when that condition holds.
+
+ **New models:**
+ 1. Summary markdown cell (note that the model is new)
+ 2. Parameter cells (dev_db only; no prod_db if all models are new)
+ 3. Total row count (Pattern 7-new)
+ 4. Sample data preview (Pattern 9)
+ 5. Core segmentation counts (Pattern 2-new)
+ 6. Uniqueness check (Pattern 5), NULL rate check (Pattern 6-new), Time-axis continuity (Pattern 8)
+
+ **Modified models:**
+ 1. Summary markdown cell
+ 2. Parameter cells (prod_db, dev_db)
+ 3. Total row count (Pattern 7)
+ 4. Sample data preview (Pattern 9)
+ 5. Core segmentation counts (Pattern 2)
+ 6. Changed field distribution (Pattern 1)
+ 7. Uniqueness check (Pattern 5), NULL rate check (Pattern 6), Time-axis continuity (Pattern 8)
+ 8. Before/after comparisons (Pattern 3), Row count comparison (Pattern 7b)
+
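After the summary and parameter cells, the ordering above reduces to a fixed sequence of SQL patterns. This sketch assumes patterns are identified by the IDs used in the Query Pattern Reference (`7`, `7-new`, `7b`, and so on). `order_patterns` is a hypothetical helper. It also rejects any pattern that the reference table restricts to the other model type.

```python
from __future__ import annotations

# SQL patterns in the sequence given in 4e (summary and parameter cells come first).
SEQUENCE = ["7", "7-new", "9", "2", "2-new", "1", "5", "6", "6-new", "8", "3", "7b"]
POSITION = {p: i for i, p in enumerate(SEQUENCE)}
MODIFIED_ONLY = {"1", "3", "7b"}


def order_patterns(model_type: str, patterns: list[str]) -> list[str]:
    """Sort the selected SQL patterns for one model into notebook order."""
    if model_type not in ("new", "modified"):
        raise ValueError(f"unknown model type: {model_type!r}")
    for p in patterns:
        if p not in POSITION:
            raise ValueError(f"unknown pattern: {p!r}")
        if model_type == "new" and p in MODIFIED_ONLY:
            raise ValueError(f"pattern {p} applies to modified models only")
        if model_type == "modified" and p.endswith("-new"):
            raise ValueError(f"pattern {p} applies to new models only")
    return sorted(patterns, key=POSITION.__getitem__)
```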
+ ## Phase 5: Generate Import URL
+
+ 1. Write notebook YAML to `/tmp/validation_notebook_working/<id>/notebook.yaml`
+ 2. Run the URL generation script:
+    ```bash
+    python3 ${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/generate_notebook_url.py /tmp/validation_notebook_working/<id>/notebook.yaml --mc-base-url <MC_BASE_URL>
+    ```
+ 3. The script validates both YAML syntax and notebook schema (required fields on metadata and cells). If validation fails, read the error messages carefully, fix the YAML to match the spec in Phase 4, and re-run.
+
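Steps 1 and 2 could be scripted as follows. This is a sketch under stated assumptions. `write_notebook` and `url_command` are hypothetical helpers. The plugin root is passed in explicitly rather than read from `CLAUDE_PLUGIN_ROOT`. The ASCII check is an extra local guard and is not part of the documented script.

```python
from __future__ import annotations

from pathlib import Path

WORK_ROOT = Path("/tmp/validation_notebook_working")
SCRIPT = "skills/monte-carlo-validation-notebook/scripts/generate_notebook_url.py"


def write_notebook(yaml_text: str, notebook_id: str, root: Path = WORK_ROOT) -> Path:
    """Write <root>/<id>/notebook.yaml and return its path."""
    path = root / notebook_id / "notebook.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    # encoding="ascii" raises UnicodeEncodeError on non-ASCII text: an early,
    # local check on top of the script's own sanitizing and validation.
    path.write_text(yaml_text, encoding="ascii")
    return path


def url_command(plugin_root: str, notebook_path: Path, mc_base_url: str) -> list[str]:
    """argv for the Phase 5 command, suitable for subprocess.run without a shell."""
    return ["python3", str(Path(plugin_root) / SCRIPT), str(notebook_path),
            "--mc-base-url", mc_base_url]
```

To run the command, `subprocess.run(cmd, capture_output=True, text=True)` would work. The section does not say how the script reports validation failures, so check both the exit status and the captured output before fixing and re-running.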
+ ## Phase 6: Output
+
+ Present the following to the user:
+ ```markdown
+ # Validation Notebook Generated
+ ## Summary
+ - **Source:** PR #<number> - <title> OR Local: <branch>
+ - **Author:** <author>
+ - **Changed Models:** <count> models (of <total_count> changed)
+ - **Generated Queries:** <count> queries
+
+ > ⚠️ If models were capped: "Only the first 10 of <total_count> changed models were included. Re-run with `--models` to select specific models."
+
+ ## Notebook Opened
+ The notebook has been opened directly in your browser.
+ Select your Snowflake connector in the notebook interface to begin running queries.
+ *Make sure MC Bridge is running. Let me know if you want tips on installing it locally.*
+ ```
+
+ ## Important Guidelines
+
+ 1. **Do NOT execute queries** -- only generate the notebook
+ 2. **Keep SQL readable** -- use proper formatting and meaningful aliases
+ 3. **Include LIMIT 100** on queries that could return many rows
+ 4. **Use double curly braces** -- `{{prod_db}}` NOT `${prod_db}`
+ 5. **Use correct table format** -- `{{prod_db}}.<SCHEMA>.<TABLE>` and `{{dev_db}}.<SCHEMA>.<TABLE>`
+ 6. **Always use the schema resolution script** -- do NOT manually parse dbt_project.yml
+ 7. **Schema is NOT a parameter** -- only `prod_db` and `dev_db` are parameters
+ 8. **Skip ephemeral models** -- they have no physical table
+ 9. **Truncate notebook name** -- keep it under 50 characters
+ 10. **Generate unique cell IDs** -- use a pattern like `cell-p3-model-1`
+ 11. **YAML multiline content** -- use the `|` block scalar for SQL with comments
+ 12. **ASCII-only YAML** -- avoid characters such as em dashes and curly quotes; the script sanitizes and validates before encoding
+
+ ## Query Pattern Reference
+
+ | Pattern | Name | Trigger | Model Type | Database | Order |
+ |---------|------|---------|------------|----------|-------|
+ | 7 / 7-new | Total Row Count | Always | Both | `{{prod_db}}` (modified) / `{{dev_db}}` (new) | 1 |
+ | 9 | Sample Data Preview | Always | Both | `{{prod_db}}` (modified) / `{{dev_db}}` (new) | 2 |
+ | 2 / 2-new | Core Segmentation Counts | Always | Both | `{{prod_db}}` (modified) / `{{dev_db}}` (new) | 3 |
+ | 1 | Changed Field Distribution | Column modified in diff (not added) | Modified only | `{{prod_db}}` | 4 |
+ | 5 | Uniqueness Check | JOIN/unique_key changed (modified) / Always (new) | Both | `{{dev_db}}` | 5 |
+ | 6 / 6-new | NULL Rate Check | New column or COALESCE (modified) / Always (new) | Both | Added col: `{{dev_db}}` only; COALESCE: Both (modified) / `{{dev_db}}` (new) | 5 |
+ | 8 | Time-Axis Continuity | Incremental or time field | Both | `{{prod_db}}` (modified) / `{{dev_db}}` (new) | 5 |
+ | 3 | Before/After Comparison | Changed fields (not added) | Modified only | Both | 6 |
+ | 7b | Row Count Comparison | Always | Modified only | Both | 6 |
+
+ ## MC Bridge Setup Help
+
+ If the user asks how to install or set up MC Bridge, fetch the README from the mc-bridge repo and show the relevant quick start / setup instructions:
+
+ ```bash
+ gh api repos/monte-carlo-data/mc-bridge/readme --jq '.content' | base64 --decode
+ ```
+
+ Focus on: how to install, configure connections, and run MC Bridge. Don't dump the entire README — extract just the setup-relevant sections.
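One way to approximate "only the setup-relevant parts" is to keep sections whose heading mentions setup topics, along with their subsections. This is a heuristic sketch. The keyword list is an assumption about how the mc-bridge README is organized. Headings inside fenced code blocks are skipped, so shell comments are not mistaken for headings.

```python
from __future__ import annotations

import re

# Assumed keywords; adjust to the README's actual headings.
SETUP_KEYWORDS = ("install", "setup", "set up", "quick start", "quickstart",
                  "getting started", "configur", "running")


def setup_sections(readme: str, keywords: tuple[str, ...] = SETUP_KEYWORDS) -> str:
    """Keep markdown sections (and their subsections) whose heading matches a keyword."""
    kept: list[str] = []
    keep_level = 0  # heading level of the section being kept; 0 means not keeping
    in_fence = False
    for line in readme.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        m = None if in_fence else re.match(r"(#{1,6})\s+(.*)", line)
        if m:
            level = len(m.group(1))
            if keep_level and level > keep_level:
                pass  # subsection of a kept section: keep it too
            elif any(k in m.group(2).lower() for k in keywords):
                keep_level = level
            else:
                keep_level = 0
        if keep_level:
            kept.append(line)
    return "\n".join(kept).strip()
```

To use it, pipe the `gh api` command above into a small script that calls `print(setup_sections(sys.stdin.read()))`.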