mdbt 0.4.27__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
mdbt-0.4.27/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Marki-Microwave
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
mdbt-0.4.27/PKG-INFO ADDED
@@ -0,0 +1,28 @@
+ Metadata-Version: 2.4
+ Name: mdbt
+ Version: 0.4.27
+ Summary: A CLI tool to manage dbt builds with state handling and manifest management
+ Author: Craig Lathrop
+ Author-email: info@markimicrowave.com
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.9
+ License-File: LICENSE
+ Requires-Dist: click<9.0.0,>=8.0.0
+ Requires-Dist: pyperclip<2.0.0,>=1.8.0
+ Requires-Dist: snowflake-connector-python[pandas]<4.0.0,>=3.11.0
+ Requires-Dist: python-dotenv<1.2.0,>=1.0.0
+ Requires-Dist: openai<2.0.0,>=1.35.0
+ Requires-Dist: sqlfluff==3.4.0
+ Requires-Dist: sqlfluff-templater-dbt==3.4.0
+ Requires-Dist: wordninja==2.0.0
+ Requires-Dist: ruamel.yaml<0.18.0
+ Requires-Dist: recce<=0.44.3
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: license-file
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
mdbt-0.4.27/README.md ADDED
@@ -0,0 +1,186 @@
+ # `mdbt` - Cold Bore Capital Data Build Tools Helper
+ This is a command-line tool created for Cold Bore Capital's data development team. It was developed as an internal tool and is not designed for public use, though anyone is welcome to use it. Please understand that this tool is not polished and is highly specialized for the Cold Bore Capital workflow.
+
+ `mdbt` is a CLI (Command Line Interface) tool developed to enhance and manage DBT (Data Build Tool) builds, particularly focusing on state management and build optimizations. It provides functionality to refresh, select, and test models efficiently, adding enhancements like conditional child or parent builds directly from the command line.
+
+ ## Features
+ - **Full Refresh Control:** Toggle full refreshes on all models.
+ - **Selective Model Builds:** Use DBT-style select strings to run specific models.
+ - **Failure Handling:** Customizable behavior on failures, including fast-fail options.
+ - **State-Based Builds:** Enhanced state management for efficient DBT runs.
+ - **Real-Time Output:** Stream output in real time for better monitoring.
+ - **Dynamic Build Commands:** Automatically include child or parent dependencies with flexible depth control.
+ - **AI-Powered Documentation and Testing:** Automatically generate or update DBT YML files and unit test mock data.
+
+ ## Installation
+
+ To install `mdbt` with pip, run the following command in your terminal:
+
+ ```bash
+ pip install mdbt
+ ```
+
+ To use the AI docs and unit test features, you need a `.env` file in the root of your project configured with the following:
+
+ ```bash
+ OPENAI_API_KEY=<openai api key>
+ DATACOVES__MAIN__ACCOUNT=<snowflake account>
+ DATACOVES__MAIN__PASSWORD=<snowflake password>
+ DATACOVES__MAIN__ROLE=<snowflake role>
+ DATACOVES__MAIN__SCHEMA=<snowflake schema>
+ DATACOVES__MAIN__USER=<snowflake user>
+ DATACOVES__MAIN__WAREHOUSE=<snowflake warehouse>
+ ```
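+
+ The AI features read these variables via `python-dotenv` before creating the OpenAI and Snowflake clients. As a quick sanity check, you can confirm they load; a minimal sketch (`check_env.py` is a hypothetical helper, not part of the package):
+
+ ```python
+ # check_env.py - hypothetical helper; not shipped with mdbt
+ import os
+
+ from dotenv import find_dotenv, load_dotenv
+
+ # Load the project's .env file, mirroring what mdbt does at startup
+ load_dotenv(find_dotenv(".env"))
+
+ required = [
+     "OPENAI_API_KEY",
+     "DATACOVES__MAIN__ACCOUNT",
+     "DATACOVES__MAIN__USER",
+     "DATACOVES__MAIN__WAREHOUSE",
+ ]
+ missing = [name for name in required if not os.getenv(name)]
+ print("Missing variables:", ", ".join(missing) if missing else "none")
+ ```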
+
+ Ensure that you have Python 3.9 or higher installed, as it is required for `mdbt` to function properly.
+
+ ## Basic DBT Shadowed Commands
+ These commands act as a pass-through to DBT and are provided for convenience. They are not the primary focus of `mdbt`.
+
+ - `mdbt run`
+ - `mdbt test`
+ - `mdbt build`
+
+ ## `trun` Run and Test Only
+
+ This command runs the `run` and `test` commands in sequence. It is useful for running both commands in a single step, without executing snapshots and seeds.
+
+ ```bash
+ mdbt trun --select my_model
+ ```
+
+ ## `unittest` - Run the DBT Unit Tests
+
+ Executes the unit tests for selected or all models.
+
+ ```bash
+ mdbt unittest --select my_model
+ ```
+
+ ## `clip-compile` Compile to Clipboard
+ `clip-compile` will compile the selected model and copy the SQL to the clipboard. This is useful for quickly copying the SQL to run in a console.
+
+ Usage:
+
+ ```bash
+ mdbt clip-compile --select my_model
+ ```
+
+ ## State Build Commands
+
+ ### Important Notes
+
+ #### Auto Full Refresh
+
+ Both the `sbuild` and `pbuild` commands will scan the models to be built and automatically initiate a full refresh if an incrementally materialized model is found in the list (as reported by `dbt ls`). If you wish to force a `--full-refresh` for other reasons, such as a column being added to a seed, add the `--full-refresh` flag.
+
+ #### State Build Commands with Parent and Child Modifiers
+
+ Both the `sbuild` and `pbuild` commands can also build parent or child dependencies; append a `+` and an optional integer to the command, as shown in the examples after this list.
+
+ - `+` or `+<number>` at the end of the command includes child dependencies up to the specified depth.
+ - `<number>+` at the beginning of the command includes parent dependencies up to the specified depth.
+
+ Example:
+
+ - `mdbt pbuild+` builds all state-modified models along with all of their child models; `mdbt pbuild+3` limits the children to 3 levels deep.
+ - `mdbt 3+pbuild` will include parent models up to 3 levels up in the build.
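+
+ A few example invocations combining these modifiers (the same syntax applies to `sbuild`):
+
+ ```bash
+ mdbt pbuild+     # state-modified models plus all of their children
+ mdbt pbuild+3    # children limited to 3 levels deep
+ mdbt 3+pbuild    # also include parents up to 3 levels up
+ ```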
+
+ ### Production Build `pbuild`
+ This command initiates a state-based build against the manifest.json file associated with the master branch. It uses the DBT macro provided by Datacoves, `get_last_artifacts`, to pull the artifacts from the Snowflake file stage and save them to the `./logs` folder; the comparison is then made against this file. The file is updated during the production deployment CI process.
+
+ ```bash
+ mdbt pbuild
+ ```
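+
+ Conceptually, `pbuild` approximates the following dbt steps (a sketch only; the exact flags `mdbt` passes may differ, and `get_last_artifacts` is the Datacoves macro mentioned above):
+
+ ```bash
+ # Pull the production manifest from the Snowflake stage into ./logs
+ dbt run-operation get_last_artifacts
+
+ # Build models that differ from that manifest, using dbt state selection
+ dbt build --select state:modified+ --state logs
+ ```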
+
+ ### Local State Build `sbuild`
+ Initiates a local state-based build.
+
+ **Error Handling:**
+
+ If an error occurs during an `sbuild` operation, the manifest file copied to the `_artifacts` location will be moved back to `target`. This avoids an issue where, after a state-based build fails, the next build would not properly compare the state of the models.
+
+ ```bash
+ mdbt sbuild
+ ```
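+
+ If the automatic restore itself fails (for example, the process is killed mid-build), the same recovery can be done by hand; a minimal sketch, assuming the default paths mentioned above:
+
+ ```bash
+ cp _artifacts/manifest.json target/manifest.json
+ ```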
+
+ ## Git-based Build `gbuild`
+ This command builds models based on Git changes between production and the current branch.
+
+ ```bash
+ mdbt gbuild
+ ```
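+
+ To preview which model files differ from production before building, you can ask Git directly (a sketch; assumes production corresponds to the `master` branch referenced above):
+
+ ```bash
+ git diff --name-only master...HEAD -- 'models/**/*.sql'
+ ```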
+
+ ## AI DBT Docs Build
+ The command `mdbt build-docs --select model_name` will automatically build or update the DBT YML file for a selected model.
+
+ ```bash
+ mdbt build-docs --select my_model
+ ```
+
+ ## AI Build DBT Unit Test Mock Data
+ The command `mdbt build-unit --select model_name` will automatically build or update the unit test mock data for a selected model.
+
+ ```bash
+ mdbt build-unit --select my_model
+ ```
+
+ ## Additional Commands
+
+ ### `lightdash`
+ Start a Lightdash preview for a model or all models.
+
+ ```bash
+ mdbt lightdash --name preview_name --select model_name
+ ```
+
+ ### `format`
+ Format models using sqlfluff.
+
+ ```bash
+ mdbt format --select model_name
+ ```
+
+ ### `clean-stg`
+ Clean files in the L1_stg folders.
+
+ ```bash
+ mdbt clean-stg --select model_name
+ ```
+
+ ### `clean-clip`
+ Clean and sort a series of select statements from your clipboard and copy the result back to the clipboard.
+
+ ```bash
+ mdbt clean-clip
+ ```
+
+ ### `pop-yaml`
+ Build the YAML PoP macro columns for a given model targeted in the select statement.
+
+ ```bash
+ mdbt pop-yaml --select model_name
+ ```
+
+ ### `ma-yaml`
+ Build the YAML Moving Average macro columns for a given model targeted in the select statement.
+
+ ```bash
+ mdbt ma-yaml --select model_name
+ ```
+
+ ### `pre-commit`
+ Run pre-commit hooks.
+
+ ```bash
+ mdbt pre-commit
+ ```
+
+ ## License
+ Copyright 2024 Cold Bore Capital
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
mdbt-0.4.27/mdbt/ai_core.py ADDED
@@ -0,0 +1,116 @@
+ import os
+ import re
+ import subprocess
+ from typing import Dict
+ from typing import List
+
+ import openai
+ from snowflake.connector import DatabaseError
+
+ from mdbt.core import Core
+ from mdbt.prompts import Prompts
+
+ # The environment has to be loaded before the openai package is used.
+ # flake8: noqa: E402
+
+
+ class AiCore(Core):
+
+     def __init__(self, model: str = "gpt-4.1", test_mode: bool = False):
+         super().__init__(test_mode=test_mode)
+         self.model = model
+         # Make sure you have OPENAI_API_KEY set in your environment variables.
+         self.client = openai.OpenAI()
+
+         self.prompts = Prompts()
+
+     def send_message(self, _messages: List[Dict[str, str]]) -> str:
+         print("Sending to API")
+         completion = self.client.chat.completions.create(
+             model=self.model, messages=_messages
+         )
+         return completion.choices[0].message.content
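+
+     # Illustrative only: the `_messages` argument to send_message follows the
+     # OpenAI chat-completions format, e.g.
+     #   [{"role": "user", "content": "Document this model ..."}]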
+
+     @staticmethod
+     def read_file(path: str) -> str:
+         with open(path, "r") as file:
+             return file.read()
+
+     @staticmethod
+     def is_file_committed(file_path):
+         try:
+             # Check whether Git tracks the file at all
+             subprocess.run(
+                 ["git", "ls-files", "--error-unmatch", file_path],
+                 check=True,
+                 stdout=subprocess.PIPE,
+                 stderr=subprocess.PIPE,
+             )
+             # If the file is tracked, check whether it has any modifications
+             status_result = subprocess.run(
+                 ["git", "status", "--porcelain", file_path], stdout=subprocess.PIPE
+             )
+             status_output = status_result.stdout.decode().strip()
+             # If the output is empty, the file is committed and has no modifications
+             return len(status_output) == 0
+         except subprocess.CalledProcessError:
+             # The file is either untracked or does not exist
+             return False
+
+     def _get_sample_data_from_snowflake(self, model_names: List[str]) -> Dict[str, str]:
+         """
+         Compiles each target model to SQL, wraps the compiled SQL in a sampling
+         query, and executes it to get a sample of the data.
+
+         Args:
+             model_names: A list of target model names to pull sample data from.
+
+         Returns:
+             A dictionary of model names and their sample data in CSV format.
+         """
+         sample_results = {}
+         for model_name in model_names:
+             print(f"Getting sample data for {model_name}")
+             args = ["--select", model_name]
+             cmd = "compile"
+             results = self.execute_dbt_command_capture(cmd, args)
+             extracted_sql = self.extract_sql(results)
+             sample_sql = self.build_sample_sql(extracted_sql)
+             try:
+                 self._cur.execute(sample_sql)
+             except DatabaseError as e:
+                 print(f"Error executing sample SQL for {model_name}")
+                 print(e)
+                 print("\n\n" + sample_sql + "\n\n")
+                 raise e
+             tmp_df = self._cur.fetch_pandas_all()
+             sample_results[model_name] = tmp_df.to_csv(index=False)
+         print(f"Sample results: {sample_results}")
+         return sample_results
+
+     @staticmethod
+     def build_sample_sql(sql: str) -> str:
+         sql = f"""
+         with tgt_table as (
+             {sql}
+         )
+         select *
+         from tgt_table
+         sample (10 rows)
+         """
+         return sql
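+
+     # Illustrative only: for sql = "select id from users", build_sample_sql
+     # returns a Snowflake query of the form:
+     #   with tgt_table as ( select id from users )
+     #   select * from tgt_table sample (10 rows)
+     # where the SAMPLE clause returns a random subset of rows.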
+
+     @staticmethod
+     def extract_sql(log):
+         # Drop dbt's "-- ..." comment lines from the captured compile output
+         sql_lines = [line for line in log.splitlines() if not re.match(r"--\s.*", line)]
+
+         # Everything after the "Compiled node" marker is the compiled SQL
+         keyword_line_index = 0
+         for i, line in enumerate(sql_lines):
+             if "Compiled node" in line:
+                 keyword_line_index = i + 1
+                 break
+
+         sql_lines = sql_lines[keyword_line_index:]
+
+         # Join the remaining lines and remove ANSI escape sequences
+         sql = "\n".join(sql_lines).replace("\x1b[0m", "").strip()
+         return sql
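+
+
+ # Minimal usage sketch (assumes a configured dbt project and the Snowflake
+ # connection that Core provides):
+ #
+ #     core = AiCore(model="gpt-4.1")
+ #     samples = core._get_sample_data_from_snowflake(["my_model"])
+ #     print(samples["my_model"])  # sampled rows as CSV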
@@ -0,0 +1,147 @@
+ import subprocess
+
+ import pyperclip
+ from dotenv import find_dotenv
+ from dotenv import load_dotenv
+
+ from mdbt.ai_core import AiCore
+ from mdbt.prompts import Prompts
+
+ load_dotenv(find_dotenv("../.env"))
+ load_dotenv(find_dotenv(".env"))
+
+
+ class BuildDBTDocs(AiCore):
+     """
+     Make sure you have OPENAI_API_KEY set in your environment variables.
+     """
+
+     def __init__(self):
+         super().__init__()
+
+     def main(self, model_name, sys_context, is_new=False):
+         if model_name.endswith(".sql"):
+             model_name = model_name[:-4]
+         if not is_new:
+             print(
+                 """
+                 1) Build new DBT documentation.
+                 2) Check existing DBT documentation against model for missing definitions.
+                 """
+             )
+             mode = int(input())
+         else:
+             mode = 1
+         print("Getting file.")
+         sql_file_path = self.get_file_path(model_name)
+
+         if "l4" in sql_file_path.lower() or "l3" in sql_file_path.lower():
+             system_instructions = Prompts().dbt_docs_gte_l3_prompt
+         else:
+             system_instructions = Prompts().dbt_docs_lte_l2_prompt
+
+         if sys_context:
+             system_instructions += f"\nContext about the system the docs are generated for:\n{sys_context}\n"
+
+         sample_data = self._get_sample_data_from_snowflake([model_name])
+
+         system_instructions = system_instructions + sample_data[model_name]
+
+         # Might bring this back in the future.
+         extra_info = ""
+
+         if mode == 1:
+             # Build new documentation
+             user_input = self.build_user_msg_mode_1(sql_file_path, extra_info)
+             yml_file_path = sql_file_path.replace(".sql", ".yml")
+         elif mode == 2:
+             # Check existing documentation
+             yml_file_path = sql_file_path[:-4] + ".yml"
+             user_input = self.build_user_msg_mode_2(
+                 sql_file_path, yml_file_path, extra_info
+             )
+         else:
+             print(mode)
+             raise ValueError("Invalid mode")
+
+         messages = [
+             {"role": "user", "content": system_instructions + "\n" + user_input}
+         ]
+
+         assistant_responses = []
+         result = self.send_message(messages)
+         assistant_responses.append(result)
+
+         messages.append({"role": "assistant", "content": assistant_responses[0]})
+         print(assistant_responses[0])
+         output = assistant_responses[0]
+         # Remove a trailing markdown code fence if present
+         lines = output.split('\n')
+         if lines and '```' in lines[-1].strip():
+             lines = lines[:-1]
+         elif len(lines) > 1 and '```' in lines[-2].strip():
+             # The fence occasionally lands on the second-to-last line
+             lines.pop(-2)
+         output = '\n'.join(lines)
+         if not is_new:
+             clip_or_file = input(
+                 f"1 to copy to clipboard\n2 to write to file ({yml_file_path})\n: "
+             )
+         else:
+             clip_or_file = "2"
+
+         if clip_or_file == "1":
+             print("Output copied to clipboard")
+             pyperclip.copy(output)
+         elif clip_or_file == "2":
+             if mode == 2:
+                 # Make a backup of the current YML file.
+                 self.backup_existing_yml_file(yml_file_path)
+             # Drop the opening fence line; `output` already has any trailing fence removed.
+             output = "\n".join(output.split("\n")[1:])
+             with open(yml_file_path, "w") as file:
+                 file.write(output)
+             if not self.is_file_committed(yml_file_path):
+                 if not is_new:
+                     commit_file = input("Press 1 to add to git, any other key to bypass: ")
+                 else:
+                     commit_file = "1"
+
+                 if commit_file == "1":
+                     subprocess.run(["git", "add", yml_file_path])
+
+     @staticmethod
+     def backup_existing_yml_file(yml_file_path):
+         with open(yml_file_path, "r") as file:
+             yml_content = file.read()
+         with open(yml_file_path + ".bak", "w") as file:
+             file.write(yml_content)
+
+     def build_user_msg_mode_1(self, _sql_file_path: str, extra_info: str) -> str:
+         # Read to confirm the SQL file exists; the contents are not embedded here.
+         self.read_file(_sql_file_path)
+         model_name = _sql_file_path.split("/")[-1].split(".")[0]
+         prompt_str = f"Build new DBT documentation for the following SQL query with model name {model_name}"
+         if len(extra_info):
+             prompt_str += f"\n{extra_info}"
+
+         return prompt_str
+
+     def build_user_msg_mode_2(
+         self, _sql_file_path: str, _yml_file_path: str, extra_info: str
+     ) -> str:
+         # Read to confirm the SQL file exists; the contents are not embedded here.
+         self.read_file(_sql_file_path)
+         yml = self.read_file(_yml_file_path)
+         model_name = _sql_file_path.split("/")[-1].split(".")[0]
+         prompt_str = f"Check for missing columns in the following DBT documentation for the following SQL query with model name {model_name}. Identify any columns in the DBT documentation that do not exist in the SQL and comment them out."
+         if len(extra_info):
+             prompt_str += f"\n {extra_info}"
+         prompt_str += f"\nYML File Contents:\n{yml}"
+
+         return prompt_str
+
+
+ if __name__ == "__main__":
+     BuildDBTDocs().main("revenue_by_dvm", sys_context="")
@@ -0,0 +1,129 @@
+ import logging
+ import os
+ import re
+ import warnings
+ from typing import Dict
+
+ import pyperclip
+ from dotenv import find_dotenv
+ from dotenv import load_dotenv
+
+
+ load_dotenv(find_dotenv("../.env"))
+ load_dotenv(find_dotenv(".env"))
+ # The environment has to be loaded before importing modules that use the openai package.
+ # flake8: noqa: E402
+ from mdbt.ai_core import AiCore
+
+
+ warnings.simplefilter(action="ignore", category=FutureWarning)
+ logging.getLogger("snowflake.connector").setLevel(logging.WARNING)
+
+
+ class BuildUnitTestDataAI(AiCore):
+
+     def __init__(self):
+         super().__init__(model="o3-mini")
+
+     def main(self, model_name: str):
+
+         file_path = self.get_file_path(model_name)
+         # Extract the folder immediately after 'models'. Not sure this is needed
+         # just yet; holding on to it for later.
+         layer_name = file_path.split("/")[1][:2]
+         sub_folder = file_path.split("/")[2]
+         file_name = os.path.splitext(os.path.basename(file_path))[0]
+
+         test_file_path = (
+             f"tests/unit_tests/{layer_name}/{sub_folder}/test_{file_name}.sql"
+         )
+
+         input_sql_file_name = file_path
+
+         input_sql = self.read_file(input_sql_file_name)
+
+         models_in_model_file = self.extract_model_names(input_sql)
+
+         sample_data = self._get_sample_data_from_snowflake(models_in_model_file)
+
+         prompt = self.build_prompt(
+             self.prompts.build_unit_test_prompt.format(model_name=model_name),
+             model_name,
+             input_sql,
+             sample_data,
+         )
+
+         print(f"##################\n{prompt}\n##################")
+
+         messages = [
+             {
+                 "role": "user",
+                 "content": "You are helping to build unit tests for DBT (Data Build Tool) models.\n"
+                 + prompt,
+             },
+         ]
+
+         response = self.send_message(messages)
+
+         output = self._remove_first_and_last_line_from_string(response)
+         print(output)
+
+         clip_or_file = input(
+             f"1 to copy to clipboard\n2 to write to file ({test_file_path})\n: "
+         )
+
+         if clip_or_file == "1":
+             print("Output copied to clipboard")
+             pyperclip.copy(output)
+         elif clip_or_file == "2":
+             # Check if the file exists and ask whether it should be overwritten.
+             if os.path.exists(test_file_path):
+                 overwrite = input(f"File {test_file_path} exists. Overwrite? (y/n) ")
+                 if overwrite.lower() == "y":
+                     with open(test_file_path, "w") as file:
+                         file.write(output)
+                     print(f"Output written to {test_file_path}")
+             else:
+                 with open(test_file_path, "w") as file:
+                     file.write(output)
+                 print(f"Output written to {test_file_path}")
+
+     def _remove_first_and_last_line_from_string(self, s: str) -> str:
+         # Strips the opening and closing markdown code fences from the response.
+         return "\n".join(s.split("\n")[1:-1])
+
+     @staticmethod
+     def extract_model_names(dbt_script):
+         # Regular expression to find all occurrences of {{ ref('model_name') }}
+         pattern = r"\{\{\s*ref\('([^']+)'\)\s*\}\}"
+         # Find all matches in the script
+         model_names = re.findall(pattern, dbt_script)
+         return model_names
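+
+     # Illustrative only:
+     #   extract_model_names("select * from {{ ref('stg_orders') }}")
+     #   -> ['stg_orders']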
+
+     @staticmethod
+     def build_prompt(
+         prompt_template: str,
+         model_name: str,
+         model_sql,
+         sample_models_and_data: Dict[str, str],
+     ):
+         sample_str = ""
+         # Use a distinct loop variable so the target model_name is not shadowed.
+         for sample_model_name, sample_data in sample_models_and_data.items():
+             sample_str += f"""{sample_model_name}: \n{sample_data}\n"""
+
+         output = f"""
+ The model name we are building the test for is {model_name}. In the example, this says "model_name". Put this value in that same place.
+ {prompt_template}
+
+ The SQL for the model is:
+ {model_sql}
+
+ Here is sample data for each input model. This just represents a random sample. Use it to create realistic test data, but try to build the test input data so that it tests the logic found within the model, regardless of the particular combination of sample data. Imagine that certain flags might be true or false, even if that flag is always true or false in the sample data.
+
+ {sample_str}
+
+ """
+         return output
+
+
+ if __name__ == "__main__":
+     BuildUnitTestDataAI().main("avg_client_rev_per_year")