@clipboard-health/ai-rules 1.4.14 → 1.4.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -24,19 +24,19 @@ npm install --save-dev @clipboard-health/ai-rules
 
 1. Choose the profile that matches your project type:
 
- | Profile | Includes | Use For |
- | --------------- | --------------------------- | -------------------------------------- |
- | `common` | common | TypeScript libraries, generic projects |
- | `frontend` | common + frontend | React apps, web apps |
- | `backend` | common + backend | NestJS services, APIs |
- | `fullstack` | common + frontend + backend | Monorepos, fullstack apps |
- | `data-modeling` | data-modeling | DBT data modeling |
+ | Profile | Includes | Use For |
+ | -------------- | --------------------------- | -------------------------------------- |
+ | `common` | common | TypeScript libraries, generic projects |
+ | `frontend` | common + frontend | React apps, web apps |
+ | `backend` | common + backend | NestJS services, APIs |
+ | `fullstack` | common + frontend + backend | Monorepos, fullstack apps |
+ | `datamodeling` | datamodeling | DBT data modeling |
 
 **Rule categories:**
 - **common**: TypeScript, testing, code style, error handling, key conventions
 - **frontend**: React patterns, hooks, performance, styling, data fetching, custom hooks
 - **backend**: NestJS APIs, three-tier architecture, controllers, services
- - **data-modeling**: data modeling, testing, yaml documentation, data cleaning, analytics
+ - **datamodeling**: data modeling, testing, yaml documentation, data cleaning, analytics
 
 2. Add it to your `package.json`:
 
@@ -0,0 +1,113 @@
+ <!-- Generated by Ruler -->
+
+ <!-- Source: .ruler/datamodeling/analytics.md -->
+
+ # About
+
+ General knowledge for running data analysis and querying our Snowflake data warehouse.
+
+ ## Understanding dbt Model Relationships and Metadata
+
+ Use the dbt-mcp server to:
+
+ - Understand DAG relationships and general metadata around dbt models.
+ - Fetch the full table name in production for a given model.
+ - Look at dbt Cloud runs for CI or production errors instead of using other methods.
+
+ ## Querying the Snowflake Data Warehouse (using the Snowflake MCP)
+
+ - When you need to answer data-related questions or obtain analytics by querying the Snowflake data warehouse, use the `mcp-cli` tool.
+ - When using the Snowflake MCP to run queries, set the database context properly: use fully qualified table names or set the database context explicitly to avoid connection errors.
+ - The `describe_object` tool in the Snowflake MCP has a bug where it misinterprets the target_object structure, treating the table name as a database name and causing 404 "database does not exist" errors. Use `run_snowflake_query` with "DESCRIBE TABLE" instead to get table schema information.
+
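A minimal sketch of the workaround above, assuming the Snowflake MCP's `run_snowflake_query` tool accepts plain SQL; the database, schema, and table names are taken from examples later in this document and are illustrative only:

```sql
-- Fully qualify the table so no default database context is required
-- (database/schema/table names are illustrative).
DESCRIBE TABLE airbyte_database.salesforce_accounts.account;

-- Or set the context explicitly before querying.
USE DATABASE airbyte_database;
USE SCHEMA salesforce_accounts;
SELECT * FROM account LIMIT 10;
```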
+ ## Guidelines when using this knowledge
+
+ - Read all of the docs.yml files to learn about the analytics schema.
+ - When in doubt, read the code in the data-modeling repo to learn how each column is calculated and where the data is coming from.
+ - Strongly prefer mart models (defined inside the mart folder, those without an int* or stg* prefix) over int* and stg* models.
+ - Strongly prefer to query tables under the analytics schema before querying any other schemas like airbyte_db/hevo_database.
+ - If unsure, confirm with the user, providing suggestions of tables to use.
+ - If required, you might do some data analysis using Python instead of pure SQL: connect to Snowflake from a Python script and use libraries like pandas, numpy, and seaborn for visualization.
+
+ ## Output format
+
+ - When running queries against Snowflake and providing the user with a final answer, always show the final query that produced the result along with the result itself, so that the user can validate that the query makes sense.
+ - Once you've reached a final query that you need to show to the user, use `get_metabase_playground_link` to generate a playground link where the user can run the query themselves. Format it as a link with the 'metabase playground link' label as the link text, using Slack's markdown format. This is a MUST.
+ - Include charts or tables formatted as markdown if needed.
+ - If the final result is a single number, show it prominently so it's very easy to see.
+
+ ## Identifying the right columns to use and how to filter data
+
+ For categorical columns, select the distinct values of a specific column to see what the possible options are and how to filter the data.
+ Read the model definitions in the dbt folders to see how that column is computed and what possible values it might have.
+
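As a hedged illustration of the two steps above, a query like the following could surface the possible values of a categorical column; the table and column names here are placeholders, not real mart models:

```sql
-- Inspect the distinct values of a categorical column before filtering on it
-- (table and column names are hypothetical).
SELECT status, COUNT(*) AS row_count
FROM analytics.some_mart_model
GROUP BY status
ORDER BY row_count DESC;
```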
+ ## Finding Source Columns for dbt Models
+
+ When modifying dbt models and you cannot find specific fields in the staging models, use the Snowflake MCP to examine the source tables directly. For example, if fields are not visible in `stg_salesforce__accounts.sql`, examine the `airbyte_database.salesforce_accounts.account` table using the Snowflake MCP to identify the actual column names in the source data.
+
+ ## Column Discovery Best Practices
+
+ When discovering columns in Snowflake source tables, use the full table name with information_schema.columns. Use queries like:
+ `SELECT column_name, data_type FROM airbyte_database.information_schema.columns WHERE table_name = 'EVENT' AND table_schema = 'SALESFORCE' ORDER BY ordinal_position`
+ instead of LIKE clauses or partial matching, which can be imprecise.
+
+ <!-- Source: .ruler/datamodeling/castingDbtStagingModels.md -->
+
+ # Casting data types for staging models
+
+ 1. Use declared schemas when available.
+    Always cast explicitly. Never rely on implicit types.
+ 2. If everything is VARCHAR, infer from values, not names.
+    Inference order:
+    JSON → TIMESTAMP → DATE → INTEGER → FLOAT → BOOLEAN → STRING
+ 3. Prefer TRY_CAST over CAST.
+    Only cast when >95% of values match or it is safe for failures to become null.
+ 4. Normalize nulls.
+    Convert empty strings and known garbage values (NULL, N/A, -) to null.
+ 5. Never infer types from column names alone
+    (id, status, amount, etc. are not type evidence).
+ 6. Preserve raw when casting is risky.
+    Add a \_raw column if the transformation is lossy or unreliable.
+ 7. Flag high cast failure rates (>5–10%).
+    Leave as STRING and add a comment.
+
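A minimal sketch of how these heuristics might look in a staging model, assuming an all-VARCHAR source; the column names, the source reference, and the >95% judgment calls are illustrative, not taken from a real model:

```sql
-- Hypothetical staging model applying the casting heuristics above.
select
    try_cast(id as integer)                                     as id,          -- heuristic 3: TRY_CAST, not CAST
    try_to_timestamp(created_at)                                as created_at,  -- heuristic 2: type inferred from values
    try_cast(nullif(nullif(trim(amount), ''), 'N/A') as float)  as amount,      -- heuristic 4: normalize garbage to null first
    amount                                                      as amount_raw,  -- heuristic 6: preserve raw when casting is risky
    status                                                      as status       -- heuristic 7: left as STRING (high cast-failure rate)
from {{ source('some_source', 'some_table') }}  -- illustrative source reference
```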
+ <!-- Source: .ruler/datamodeling/dbtModelDevelopment.md -->
+
+ # Read the following docs in the data-modeling repo
+
+ - CONTRIBUTING.md
+ - dbt_style_guide.md
+ - README.md
+ - models/DEVIN_ANALYST_MODEL_GUIDE.md
+ - models/DEVIN_ANALYST_FOLDER_STRUCTURE.md
+ These define our modeling rules, patterns, and safety constraints.
+
+ # Key best practices to follow
+
+ - ALL dbt staging models must have strictly defined data types. Read the "Casting DBT Staging Model Datatype Heuristic" knowledge we have. These data types need to be defined in the YAML documentation too.
+ - Use doc blocks for any YAML column descriptions that span more than one model. Do NOT repeat descriptions for the same column; reuse a doc-block!
+ - When adding new fields to tables, keep the original source field name format, but remove any custom field suffix (`__c`). For example, `assignment_type__c` should be renamed to `assignment_type`. Do not hallucinate column field names, as this is misleading for users.
+ - If a source table doesn't exist, tell the user to ask the data team to ingest it via the relevant ETL tool.
+ - A model must always have a primary/unique key. If there's no obvious one, create a surrogate key using a combination of fields chosen by looking at the data. Use `dbt_utils.generate_surrogate_key` to do so.
+ - Keep it simple. Don't overcomplicate.
+
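A minimal sketch of the surrogate-key practice above; the model and column names are hypothetical, and the key columns would be chosen by inspecting the actual data:

```sql
-- Hypothetical model with no natural primary key: build a surrogate key
-- from a combination of fields using dbt_utils.
select
    {{ dbt_utils.generate_surrogate_key(['worker_id', 'shift_id', 'event_timestamp']) }} as event_key,
    worker_id,
    shift_id,
    event_timestamp
from {{ ref('stg_example__events') }}  -- illustrative model name
```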
+ <!-- Source: .ruler/datamodeling/dbtYamlDocumentation.md -->
+
+ # .yaml documentation rules
+
+ - The YAML should include the following sections: `version`, `models`, and `columns`.
+ - At least one column must be the primary key. This column should have both the `not_null` and `unique` tests.
+ - Every column must include:
+   - A `name`.
+   - A `description`. Use a doc-block if the column already exists in the `docs.md` file. If you see the same column being referenced more than once in the repo, create a doc-block for it.
+   - A `data_type`.
+ - Include a newline between columns.
+ - Ensure proper YAML formatting and indentation.
+ - Include a `description` for the model that:
+   - Is formatted for human readability; descriptions are hard to read as one big block, so space them out appropriately.
+   - Explains what the model is and why it exists.
+   - States what each row represents.
+   - Mentions important **filtering criteria** or **gotchas** users should be aware of when querying it.
+   - Uses full sentences in plain, concise English.
+   - Wraps long text using folded block style (`>`).
+   - Is concise and to the point.
@@ -0,0 +1 @@
+ @AGENTS.md
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@clipboard-health/ai-rules",
  "description": "Pre-built AI agent rules for consistent coding standards.",
- "version": "1.4.14",
+ "version": "1.4.15",
  "bugs": "https://github.com/ClipboardHealth/core-utils/issues",
  "devDependencies": {
  "@intellectronica/ruler": "0.3.16"
@@ -18,5 +18,6 @@ exports.PROFILES = {
  frontend: ["common", "frontend"],
  backend: ["common", "backend"],
  fullstack: ["common", "frontend", "backend"],
+ datamodeling: ["datamodeling"],
  common: ["common"],
  };