@clipboard-health/ai-rules 1.4.14 → 1.4.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -24,19 +24,19 @@ npm install --save-dev @clipboard-health/ai-rules
 
 1. Choose the profile that matches your project type:
 
- | Profile | Includes | Use For |
- | --------------- | --------------------------- | -------------------------------------- |
- | `common` | common | TypeScript libraries, generic projects |
- | `frontend` | common + frontend | React apps, web apps |
- | `backend` | common + backend | NestJS services, APIs |
- | `fullstack` | common + frontend + backend | Monorepos, fullstack apps |
- | `data-modeling` | data-modeling | DBT data modeling |
+ | Profile | Includes | Use For |
+ | -------------- | --------------------------- | -------------------------------------- |
+ | `common` | common | TypeScript libraries, generic projects |
+ | `frontend` | common + frontend | React apps, web apps |
+ | `backend` | common + backend | NestJS services, APIs |
+ | `fullstack` | common + frontend + backend | Monorepos, fullstack apps |
+ | `datamodeling` | datamodeling | DBT data modeling |
 
 **Rule categories:**
 - **common**: TypeScript, testing, code style, error handling, key conventions
 - **frontend**: React patterns, hooks, performance, styling, data fetching, custom hooks
 - **backend**: NestJS APIs, three-tier architecture, controllers, services
- - **data-modeling**: data modeling, testing, yaml documentation, data cleaning, analytics
+ - **datamodeling**: data modeling, testing, yaml documentation, data cleaning, analytics
 
 2. Add it to your `package.json`:
 
@@ -0,0 +1,113 @@
+ <!-- Generated by Ruler -->
+
+ <!-- Source: .ruler/datamodeling/analytics.md -->
+
+ # About
+
+ General knowledge for running data analysis and querying our Snowflake data warehouse.
+
+ ## Understanding dbt Model Relationships and Metadata
+
+ Use the dbt-mcp server to:
+
+ - Understand DAG relationships and general metadata around dbt models.
+ - Fetch the full table name in production for a given model.
+ - Look at dbt Cloud runs for CI or production errors instead of using other methods.
+
+ ## Querying the Snowflake Data Warehouse (using the Snowflake MCP)
+
+ - When you need to answer data-related questions or obtain analytics by querying the Snowflake data warehouse, use the `mcp-cli` tool.
+ - When using the Snowflake MCP to run queries, set the database context properly: use fully qualified table names or set the database context explicitly to avoid connection errors.
+ - The `describe_object` tool in the Snowflake MCP has a bug where it misinterprets the target_object structure, treating the table name as a database name and causing 404 "database does not exist" errors. Use `run_snowflake_query` with "DESCRIBE TABLE" instead to get table schema information.
+
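A minimal sketch of the workaround above, assuming the Snowflake MCP's `run_snowflake_query` tool accepts plain SQL; the database, schema, and table names are taken from examples later in this document and are illustrative only:

```sql
-- Fully qualify the table so no default database context is required
-- (database/schema/table names are illustrative).
DESCRIBE TABLE airbyte_database.salesforce_accounts.account;

-- Or set the context explicitly before querying.
USE DATABASE airbyte_database;
USE SCHEMA salesforce_accounts;
SELECT * FROM account LIMIT 10;
```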
+ ## Guidelines when using this knowledge
+
+ - Read all of the docs.yml files to learn about the analytics schema.
+ - When in doubt, read the code in the data-modeling repo to learn how each column is calculated and where the data is coming from.
+ - Strongly prefer mart models (defined inside the mart folder, those without an int* or stg* prefix) over int* and stg* models.
+ - Strongly prefer to query tables under the analytics schema before querying any other schemas like airbyte_db/hevo_database.
+ - If unsure, confirm with the user, providing suggestions of tables to use.
+ - If required, you might do some data analysis using Python instead of pure SQL: connect to Snowflake from a Python script and use libraries like pandas, numpy, and seaborn for visualization.
+
+ ## Output format
+
+ - When running queries against Snowflake and providing the user with a final answer, always show the final query that produced the result along with the result itself, so that the user can validate that the query makes sense.
+ - Once you've reached a final query that you need to show to the user, use `get_metabase_playground_link` to generate a playground link where the user can run the query themselves. Format it as a link with the 'metabase playground link' label as the link text, using Slack's markdown format. This is a MUST.
+ - Include charts or tables formatted as markdown if needed.
+ - If the final result is a single number, show it prominently so it's very easy to see.
+
+ ## Identifying the right columns to use and how to filter data
+
+ For categorical columns, select the distinct values of a specific column to see what the possible options are and how to filter the data.
+ Read the model definitions in the dbt folders to see how that column is computed and what possible values it might have.
+
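As a hedged illustration of the two steps above, a query like the following could surface the possible values of a categorical column; the table and column names here are placeholders, not real mart models:

```sql
-- Inspect the distinct values of a categorical column before filtering on it
-- (table and column names are hypothetical).
SELECT status, COUNT(*) AS row_count
FROM analytics.some_mart_model
GROUP BY status
ORDER BY row_count DESC;
```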
+ ## Finding Source Columns for dbt Models
+
+ When modifying dbt models and you cannot find specific fields in the staging models, use the Snowflake MCP to examine the source tables directly. For example, if fields are not visible in `stg_salesforce__accounts.sql`, examine the `airbyte_database.salesforce_accounts.account` table using the Snowflake MCP to identify the actual column names in the source data.
+
+ ## Column Discovery Best Practices
+
+ When discovering columns in Snowflake source tables, use the full table name with information_schema.columns. Use queries like:
+ `SELECT column_name, data_type FROM airbyte_database.information_schema.columns WHERE table_name = 'EVENT' AND table_schema = 'SALESFORCE' ORDER BY ordinal_position`
+ instead of LIKE clauses or partial matching, which can be imprecise.
+
+ <!-- Source: .ruler/datamodeling/castingDbtStagingModels.md -->
+
+ # Casting data types for staging models
+
+ 1. Use declared schemas when available.
+    Always cast explicitly. Never rely on implicit types.
+ 2. If everything is VARCHAR, infer from values, not names.
+    Inference order:
+    JSON → TIMESTAMP → DATE → INTEGER → FLOAT → BOOLEAN → STRING
+ 3. Prefer TRY_CAST over CAST.
+    Only cast when >95% of values match or it is safe for failures to become null.
+ 4. Normalize nulls.
+    Convert empty strings and known garbage values (NULL, N/A, -) to null.
+ 5. Never infer types from column names alone
+    (id, status, amount, etc. are not type evidence).
+ 6. Preserve raw when casting is risky.
+    Add a \_raw column if the transformation is lossy or unreliable.
+ 7. Flag high cast failure rates (>5–10%).
+    Leave as STRING and add a comment.
+
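A minimal sketch of how these heuristics might look in a staging model, assuming an all-VARCHAR source; the column names, the source reference, and the >95% judgment calls are illustrative, not taken from a real model:

```sql
-- Hypothetical staging model applying the casting heuristics above.
select
    try_cast(id as integer)                                     as id,          -- heuristic 3: TRY_CAST, not CAST
    try_to_timestamp(created_at)                                as created_at,  -- heuristic 2: type inferred from values
    try_cast(nullif(nullif(trim(amount), ''), 'N/A') as float)  as amount,      -- heuristic 4: normalize garbage to null first
    amount                                                      as amount_raw,  -- heuristic 6: preserve raw when casting is risky
    status                                                      as status       -- heuristic 7: left as STRING (high cast-failure rate)
from {{ source('some_source', 'some_table') }}  -- illustrative source reference
```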
+ <!-- Source: .ruler/datamodeling/dbtModelDevelopment.md -->
+
+ # Read the following docs in the data-modeling repo
+
+ - CONTRIBUTING.md
+ - dbt_style_guide.md
+ - README.md
+ - models/DEVIN_ANALYST_MODEL_GUIDE.md
+ - models/DEVIN_ANALYST_FOLDER_STRUCTURE.md
+ These define our modeling rules, patterns, and safety constraints.
+
+ # Key best practices to follow
+
+ - ALL dbt staging models must have strictly defined data types. Read the "Casting DBT Staging Model Datatype Heuristic" knowledge we have. These data types need to be defined in the YAML documentation too.
+ - Use doc blocks for any YAML column descriptions that span more than one model. Do NOT repeat descriptions for the same column; reuse a doc-block!
+ - When adding new fields to tables, keep the original source field name format, but remove any custom field suffix (`__c`). For example, `assignment_type__c` should be renamed to `assignment_type`. Do not hallucinate column field names, as this is misleading for users.
+ - If a source table doesn't exist, tell the user to ask the data team to ingest it via the relevant ETL tool.
+ - A model must always have a primary/unique key. If there's no obvious one, create a surrogate key using a combination of fields chosen by looking at the data. Use `dbt_utils.generate_surrogate_key` to do so.
+ - Keep it simple. Don't overcomplicate.
+
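A minimal sketch of the surrogate-key practice above; the model and column names are hypothetical, and the key columns would be chosen by inspecting the actual data:

```sql
-- Hypothetical model with no natural primary key: build a surrogate key
-- from a combination of fields using dbt_utils.
select
    {{ dbt_utils.generate_surrogate_key(['worker_id', 'shift_id', 'event_timestamp']) }} as event_key,
    worker_id,
    shift_id,
    event_timestamp
from {{ ref('stg_example__events') }}  -- illustrative model name
```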
+ <!-- Source: .ruler/datamodeling/dbtYamlDocumentation.md -->
+
+ # .yaml documentation rules
+
+ - The YAML should include the following sections: `version`, `models`, and `columns`.
+ - At least one column must be the primary key. This column should have both the `not_null` and `unique` tests.
+ - Every column must include:
+   - A `name`.
+   - A `description`. Use a doc-block if the column already exists in the `docs.md` file. If you see the same column being referenced more than once in the repo, create a doc-block for it.
+   - A `data_type`.
+ - Include a newline between columns.
+ - Ensure proper YAML formatting and indentation.
+ - Include a `description` for the model that:
+   - Is formatted for human readability; descriptions are hard to read as one big block, so space them out appropriately.
+   - Explains what the model is and why it exists.
+   - States what each row represents.
+   - Mentions important **filtering criteria** or **gotchas** users should be aware of when querying it.
+   - Uses full sentences in plain, concise English.
+   - Wraps long text using folded block style (`>`).
+   - Is concise and to the point.
@@ -0,0 +1 @@
+ @AGENTS.md
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@clipboard-health/ai-rules",
  "description": "Pre-built AI agent rules for consistent coding standards.",
- "version": "1.4.14",
+ "version": "1.4.15",
  "bugs": "https://github.com/ClipboardHealth/core-utils/issues",
  "devDependencies": {
  "@intellectronica/ruler": "0.3.16"
@@ -18,5 +18,6 @@ exports.PROFILES = {
  frontend: ["common", "frontend"],
  backend: ["common", "backend"],
  fullstack: ["common", "frontend", "backend"],
+ datamodeling: ["datamodeling"],
  common: ["common"],
  };