snowforge-package 0.2.7__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
File without changes
@@ -0,0 +1,148 @@
+ Metadata-Version: 2.2
+ Name: snowforge-package
+ Version: 0.2.7
+ Summary: A Python package for supporting migration from on-prem to cloud
+ Home-page: https://github.com/yourusername/Snowforge
+ Author: Andreas Heggelund
+ Author-email: andreasheggelund@gmail.com
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.12
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: boto3
+ Requires-Dist: snowflake-connector-python
+ Requires-Dist: coloredlogs
+ Requires-Dist: colored
+ Requires-Dist: tqdm
+ Requires-Dist: toml
+ Requires-Dist: argparse
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # 🚀 Snowforge - Powerful Data Integration
+
+ **Snowforge** is a Python package designed to streamline data integration and transfer between **AWS**, **Snowflake**, and various **on-premise database systems**. It provides efficient data extraction, logging, configuration management, and AWS utilities to support robust data engineering workflows.
+
+ ---
+
+ ## ✨ Features
+
+ - **AWS Integration**: Manage AWS S3 and Secrets Manager operations.
+ - **Snowflake Connection**: Establish and manage Snowflake connections effortlessly.
+ - **Advanced Logging**: Centralized logging system with colored output for better visibility.
+ - **Configuration Management**: Load and manage credentials from a TOML configuration file.
+ - **Data Mover Engine**: Parallel data processing and extraction strategies for efficiency.
+ - **Extensible Database Extraction**: Uses a **strategy pattern** to support multiple **on-prem database systems** (e.g., Netezza, Oracle, PostgreSQL).
+
+ ---
+
+ ## 📥 Installation
+
+ Install Snowforge using pip:
+
+ ```sh
+ pip install snowforge-package
+ ```
+
+ ---
+
+ ## ⚙️ Configuration
+
+ Snowforge requires a configuration file (`snowforge_config.toml`) to manage credentials for AWS and Snowflake. The package searches for the config file in the following locations, in order:
+
+ 1. The path specified in the `SNOWFORGE_CONFIG_PATH` environment variable.
+ 2. The current working directory.
+ 3. `~/.config/snowforge_config.toml`
+ 4. The package directory.
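
The environment variable from step 1 can point Snowforge at an explicit file, bypassing the rest of the search order (the path below is just an example):

```sh
# Use a specific config file instead of relying on the search order (example path):
export SNOWFORGE_CONFIG_PATH="$HOME/.config/snowforge_config.toml"
```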
+
+ ### Example `snowforge_config.toml` File
+
+ ```toml
+ [AWS.default]
+ AWS_ACCESS_KEY = "your-access-key"
+ AWS_SECRET_KEY = "your-secret-key"
+ REGION = "us-east-1"
+
+ [SNOWFLAKE.default]
+ USERNAME = "your-username"
+ ACCOUNT = "your-account"
+ ```
+
+ ---
+
+ ## 🚀 Quick Start
+
+ ### 🔹 Initialize AWS Integration
+
+ ```python
+ from Snowforge.AWSIntegration import AWSIntegration
+
+ AWSIntegration.initialize(profile="default", verbose=True)
+ ```
+
+ ### 🔹 Connect to Snowflake
+
+ ```python
+ from Snowforge.SnowflakeConnect import SnowflakeConnection
+
+ conn = SnowflakeConnection.establish_connection(user_name="your-user", account="your-account")
+ ```
+
+ ### 🔹 Use Logging
+
+ ```python
+ from Snowforge.Logging import Debug
+
+ Debug.log("This is an info message", level='INFO')
+ Debug.log("This is an error message", level='ERROR')
+ ```
+
+ ### 🔹 Extract Data from an On-Prem Database
+
+ ```python
+ from Snowforge.AWSIntegration import AWSIntegration
+
+ def export_and_upload_table_data(extractor, output_path):
+     # Fetch data from an on-prem system and export it to a CSV file
+     # (an optional filter column/value pair narrows down the extract):
+     header, full_path_to_file = extractor.export_external_table(
+         output_path, "database.schema.table", "filter_column", "filter_value"
+     )
+
+     AWSIntegration.push_file_to_s3("bucket-name", full_path_to_file, "key/to/store/the/file/under")
+
+ def main():
+     from Snowforge.Extractors.NetezzaExtractor import NetezzaExtractor
+     from Snowforge.Extractors.OracleExtractor import OracleExtractor
+     from Snowforge.Extractors.PostgrSQLExtractor import PostgrSQLExtractor
+
+     # Export and upload data from different source systems by exchanging the extractor strategy
+     export_and_upload_table_data(NetezzaExtractor(), "exports/")
+     export_and_upload_table_data(OracleExtractor(), "exports/")
+     export_and_upload_table_data(PostgrSQLExtractor(), "exports/")
+ ```
+
+ Since **Snowforge** follows a **strategy pattern**, it can easily be extended to support other **database systems** by implementing new extractor classes that conform to the `ExtractorStrategy` interface.
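
For instance, a new extractor only needs to implement the abstract methods. The sketch below inlines a minimal stand-in for the `ExtractorStrategy` ABC so it runs standalone; the `SQLiteExtractor` class and its filter semantics are illustrative, not part of the package:

```python
from abc import ABC, abstractmethod

# Minimal stand-in for Snowforge's ExtractorStrategy interface (illustrative only)
class ExtractorStrategy(ABC):
    @abstractmethod
    def extract_table_query(self, fully_qualified_table_name, filter_column=None, filter_value=None):
        ...

# A hypothetical extractor for a new source system, following the same pattern
# as NetezzaExtractor: build a SELECT with an optional filter clause.
class SQLiteExtractor(ExtractorStrategy):
    def extract_table_query(self, fully_qualified_table_name, filter_column=None, filter_value=None):
        query = f"SELECT * FROM {fully_qualified_table_name}"
        if filter_column is not None and filter_value is not None:
            query += f" WHERE {filter_column} >= '{filter_value}'"
        return query

print(SQLiteExtractor().extract_table_query("db.schema.table", "updated_at", "2024-01-01"))
# -> SELECT * FROM db.schema.table WHERE updated_at >= '2024-01-01'
```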
+
+ ---
+
+ ## 📜 License
+
+ This project is licensed under the **MIT License**.
+
+ ---
+
+ ## 👤 Author
+
+ Developed by **andreasheggelund@gmail.com**. Feel free to reach out for support, suggestions, or collaboration!
@@ -0,0 +1,215 @@
+ import os
+ import sys
+ import json
+ import boto3
+ import botocore.exceptions as be
+ from boto3.s3.transfer import TransferConfig
+ from tqdm import tqdm
+ from .Logging import Debug  # Shared logging class
+ from .Config import Config  # Shared config loader
+
+ class ProgressPercentage:
+     """Reports upload progress to a tqdm progress bar as boto3 invokes the callback."""
+
+     def __init__(self, filename):
+         self._filename = filename
+         self._size = float(os.path.getsize(filename))
+         self._seen_so_far = 0
+         self._tqdm = tqdm(total=self._size, unit='B', unit_scale=True, desc=filename)
+
+     def __call__(self, bytes_amount):
+         self._seen_so_far += bytes_amount
+         self._tqdm.update(bytes_amount)
+
+ class AWSIntegration:
+     """Static AWS helper class for managing S3 and Secrets Manager operations."""
+
+     s3_client = None
+     secret_client = None
+     _current_profile = None
+
+     @staticmethod
+     def initialize(profile: str = "default", verbose: bool = False):
+         """Initializes AWS clients for S3 and Secrets Manager.
+
+         If authentication fails, resets the cached clients so that
+         `initialize()` can be called again for a retry.
+
+         Args:
+             profile (str, optional): The AWS profile (from the .toml config) to use. Defaults to "default".
+             verbose (bool, optional): Set True to enable DEBUG output. Defaults to False.
+
+         Raises:
+             SystemExit: If AWS authentication fails or the profile is not found in the .toml file.
+         """
+
+         # If already initialized successfully, return early
+         if AWSIntegration.s3_client is not None and AWSIntegration.secret_client is not None:
+             Debug.log(f"Already authenticated using profile '{AWSIntegration._current_profile}'!", 'DEBUG', verbose)
+             return
+
+         try:
+             aws_creds = Config.get_aws_credentials(profile=profile)
+             access_key = aws_creds["AWS_ACCESS_KEY"]
+             secret_key = aws_creds["AWS_SECRET_KEY"]
+             region = aws_creds["REGION"]
+         except TypeError:
+             Debug.log(f"No profile named '{profile}' in config file.", 'ERROR')
+             sys.exit(1)
+
+         # Avoid echoing the secret key itself into the logs
+         Debug.log(f"Credentials found in config.toml:\nAWS_ACCESS_KEY_ID: {access_key}\nAWS_SECRET_ACCESS_KEY: ***redacted***\n", 'DEBUG', verbose)
+
+         try:
+             identity = AWSIntegration.check_connection(access_key, secret_key, region)
+             AWSIntegration._current_profile = profile  # Persist the currently authenticated profile
+
+             Debug.log(f"Authenticated as: {identity['Arn'].split('/')[-1]}", 'SUCCESS')
+
+         except be.ClientError:
+             Debug.log("Invalid credentials. Please verify that your profile has the required permissions.", 'ERROR')
+             sys.exit(1)
+
+         try:
+             AWSIntegration.s3_client = boto3.client(
+                 "s3",
+                 aws_access_key_id=access_key,
+                 aws_secret_access_key=secret_key,
+                 region_name=region
+             )
+
+             AWSIntegration.secret_client = boto3.client(
+                 "secretsmanager",
+                 aws_access_key_id=access_key,
+                 aws_secret_access_key=secret_key,
+                 region_name=region
+             )
+
+             Debug.log("Successfully created AWS service clients!", 'DEBUG', verbose)
+
+         except be.ClientError as e:
+             error_code = e.response['Error']['Code']
+             # Reset class variables to allow retrying on the next call
+             AWSIntegration.s3_client = None
+             AWSIntegration.secret_client = None
+
+             # Reset environment variables so `initialize()` starts clean on the next call
+             os.environ.pop("AWS_ACCESS_KEY_ID", None)
+             os.environ.pop("AWS_SECRET_ACCESS_KEY", None)
+
+             if error_code == 'InvalidAccessKeyId':
+                 Debug.log("\n\nThe selected IAM user is not found.\n", 'ERROR')
+
+     @staticmethod
+     def check_connection(access_key: str, secret_key: str, region: str):
+         '''Validates the connection to AWS by fetching the caller identity.
+
+         Args:
+             access_key (str): The access key ID of the IAM user.
+             secret_key (str): The secret access key of the IAM user.
+             region (str): The AWS region to connect to.
+
+         Returns:
+             dict: The caller identity, if authenticated.
+         '''
+         sts_client = boto3.client(
+             "sts",
+             aws_access_key_id=access_key,
+             aws_secret_access_key=secret_key,
+             region_name=region
+         )
+
+         # get_caller_identity() raises botocore's ClientError on invalid
+         # credentials, which the caller (initialize) handles.
+         identity = sts_client.get_caller_identity()
+
+         return identity
+
+     @staticmethod
+     def define_s3_transfer_config(size_threshold: float, threads: int):
+         """Defines and returns an AWS S3 TransferConfig for efficient file uploads.
+
+         Args:
+             size_threshold (float): The file size (in GB) at which multipart upload should trigger.
+             threads (int): Number of concurrent threads for the upload.
+
+         Returns:
+             TransferConfig: Configured transfer settings for AWS S3 uploads.
+         """
+         GB = 1024 ** 3
+         Debug.log(f"Threshold for multithreaded upload to S3: {size_threshold}GB\n"
+                   f"Concurrent threads: {threads}", 'INFO')
+
+         # multipart_threshold expects an integer number of bytes
+         return TransferConfig(multipart_threshold=int(size_threshold * GB), max_concurrency=threads)
+
+     @staticmethod
+     def get_secret(secret_name: str, verbose: bool = False):
+         """Retrieves a secret from AWS Secrets Manager.
+
+         Args:
+             secret_name (str): The name of the secret to retrieve.
+             verbose (bool, optional): Set True to enable DEBUG output. Defaults to False.
+
+         Returns:
+             dict: The secret's value parsed as a dictionary.
+         """
+         AWSIntegration.initialize(verbose=verbose)
+         try:
+             response = AWSIntegration.secret_client.get_secret_value(SecretId=secret_name)
+             return json.loads(response['SecretString'])
+         except Exception as e:
+             Debug.log(f"Failed to retrieve secret: {e}", 'ERROR')
+
+     @staticmethod
+     def get_bucket_contents(bucket_name: str, verbose: bool = False):
+         """Lists all files in a given AWS S3 bucket.
+
+         Args:
+             bucket_name (str): The name of the S3 bucket.
+             verbose (bool, optional): Set True to enable DEBUG output. Defaults to False.
+
+         Returns:
+             list[str]: A list of filenames stored in the bucket.
+         """
+         AWSIntegration.initialize(verbose=verbose)
+         try:
+             response = AWSIntegration.s3_client.list_objects_v2(Bucket=bucket_name)
+             return [item['Key'] for item in response.get('Contents', [])]
+         except Exception as e:
+             Debug.log(f"Error fetching bucket contents: {e}", 'ERROR')
+
+     @staticmethod
+     def push_file_to_s3(bucket_name: str, file_to_upload: str, key: str, config: TransferConfig = None, verbose: bool = False):
+         """Uploads a file to an AWS S3 bucket.
+
+         Args:
+             bucket_name (str): The destination S3 bucket name.
+             file_to_upload (str): Path to the file to upload.
+             key (str): The S3 key (filename) to assign.
+             config (TransferConfig, optional): AWS S3 transfer configuration. Defaults to None.
+             verbose (bool, optional): Set True to enable DEBUG output. Defaults to False.
+         """
+         AWSIntegration.initialize(verbose=verbose)
+
+         if config is None:
+             config = AWSIntegration.define_s3_transfer_config(0.1, 10)
+
+         try:
+             Debug.log(f"Uploading {file_to_upload} to {bucket_name}/{key}...", 'INFO')
+
+             with open(file_to_upload, 'rb') as file_obj:
+                 AWSIntegration.s3_client.upload_fileobj(
+                     file_obj, bucket_name, key, Config=config, Callback=ProgressPercentage(file_to_upload)
+                 )
+
+             Debug.log(f"Successfully uploaded {file_to_upload} to {bucket_name}/{key}", 'SUCCESS')
+
+         except Exception as e:
+             Debug.log(f"Error uploading file: {e}", 'ERROR')
@@ -0,0 +1,86 @@
+ import os
+ import toml
+ from .Logging import Debug  # Use the shared logging system
+
+ class Config:
+     """Loads configuration settings from snowforge_config.toml and manages profiles globally."""
+
+     _config_data = {}
+
+     _aws_profile = "default"  # Stores the globally selected AWS profile
+     _snowflake_profile = "default"
+
+     CONFIG_FILE_PATHS = [
+         os.getenv("SNOWFORGE_CONFIG_PATH"),  # Custom path via environment variable
+         os.path.join(os.getcwd(), "snowforge_config.toml"),  # Current working directory
+         os.path.join(os.path.expanduser("~"), ".config", "snowforge_config.toml"),  # ~/.config/snowforge_config.toml
+         os.path.join(os.path.dirname(__file__), "snowforge_config.toml")  # Package directory
+     ]
+
+     @staticmethod
+     def find_config_file(config_paths: list = CONFIG_FILE_PATHS, verbose: bool = False):
+         """Finds the first available config file by searching the locations in 'config_paths'.
+
+         Args:
+             config_paths (list): Locations to search for the config file. Defaults to SNOWFORGE_CONFIG_PATH, the current working directory, ~/.config and the package directory. You must create this file yourself in one of these locations and name it 'snowforge_config.toml'.
+             verbose (bool): Enables/disables verbose logging. Defaults to False.
+         """
+         for path in config_paths:
+             if path and os.path.exists(path):
+                 Debug.log(f"Found config file at: {path}", 'DEBUG', verbose)
+                 return path
+
+         Debug.log("⚠️ No snowforge_config.toml file found.", "WARNING")
+         return None
+
+     @staticmethod
+     def load_config(config_paths: list = CONFIG_FILE_PATHS, verbose: bool = False):
+         """Loads the config from the first valid path in 'config_paths'."""
+         config_path = Config.find_config_file(config_paths, verbose)
+
+         # Fail early if no valid path is found
+         if not config_path:
+             Debug.log("No valid file found! Ensure you have created a config file named 'snowforge_config.toml'.", 'ERROR')
+             raise FileNotFoundError("snowforge_config.toml not found")
+
+         try:
+             Config._config_data = toml.load(config_path)
+             Debug.log(f"Successfully loaded config file from: {config_path}!", 'DEBUG', verbose)
+         except toml.TomlDecodeError as e:
+             Debug.log(f"Error loading config file\n{e.msg}", 'ERROR')
+
+     @staticmethod
+     def get_current_aws_profile() -> str:
+         """Returns the currently selected AWS profile."""
+         return Config._aws_profile
+
+     @staticmethod
+     def get_current_snowflake_profile() -> str:
+         """Returns the currently selected Snowflake profile."""
+         return Config._snowflake_profile
+
+     @staticmethod
+     def get_snowflake_credentials(config_paths: list = CONFIG_FILE_PATHS, profile: str = "default", verbose: bool = False) -> dict:
+         """Returns credentials for a given Snowflake profile specified in the snowforge_config.toml file."""
+         Config.load_config(config_paths, verbose)
+
+         sf_config = Config._config_data.get("SNOWFLAKE", {}).get(profile, {})
+         if not sf_config:
+             Debug.log(f"No profile '{profile}' in .toml file... Please provide a valid configuration.", 'WARNING')
+             return None
+
+         return sf_config
+
+     @staticmethod
+     def get_aws_credentials(config_paths: list = CONFIG_FILE_PATHS, profile: str = "default", verbose: bool = False) -> dict:
+         """Returns credentials for a given AWS profile specified in the snowforge_config.toml file."""
+         Config.load_config(config_paths, verbose)
+
+         aws_config = Config._config_data.get("AWS", {}).get(profile, {})
+         if not aws_config:
+             Debug.log(f"No profile '{profile}' in .toml file... Please provide a valid configuration.", 'WARNING')
+             return None
+
+         return aws_config
@@ -0,0 +1,95 @@
+ import os
+ import multiprocessing as mp
+ from concurrent.futures import ProcessPoolExecutor
+
+ from .Extractors.ExtractorStrategy import ExtractorStrategy
+ from ..Logging import Debug
+
+ class Engine():
+     """Engine for moving data across platforms and between on-prem and cloud."""
+
+     def __init__(self):
+         """Initialize the DataMover engine."""
+         self.cpu_count = os.cpu_count() or 4
+         # Cap the pool at 8 workers while leaving a couple of cores free for the system
+         self.pool = ProcessPoolExecutor(max_workers=min(8, max(1, self.cpu_count - 2)))
+
+     @staticmethod
+     def parallel_process(worker_func, args_list: list[tuple], num_workers: int = None, use_shared_queue: bool = False, queue=None):
+         '''
+         Executes a worker function 'worker_func' in parallel across multiple processes, the number of which is controlled by 'num_workers'.
+
+         Args:
+             worker_func (function): The function that each worker process should execute.
+             args_list (list): A list of tuples, where each tuple contains arguments for worker_func.
+             num_workers (int, optional): Number of parallel workers. Defaults to max(4, CPU count - 2).
+             use_shared_queue (bool, optional): If True, the supplied multiprocessing queue is passed to each worker.
+             queue (mp.Queue, optional): The shared queue to pass to workers when use_shared_queue is True.
+
+         Returns:
+             list: The list of started worker processes.
+         '''
+
+         # Determine the number of worker processes to use
+         if num_workers is None:
+             num_workers = max(4, os.cpu_count() - 2)  # Failsafe so that some cores remain available for the system
+
+         # Never spawn more processes than there are tasks, even if num_workers > len(args_list)
+         num_processes = min(num_workers, len(args_list))
+
+         process_list = []
+         if use_shared_queue and queue is None:
+             Debug.log("You HAVE to supply a queue as input to this function if you set 'use_shared_queue = True', otherwise the queue will not be reachable by producer/consumer processes on the other side!", 'WARNING')
+             raise ValueError("use_shared_queue=True requires a queue argument")
+
+         # Create and start all worker processes
+         for i in range(num_processes):
+             if use_shared_queue:
+                 process = mp.Process(target=worker_func, args=(*args_list[i], queue))
+             else:
+                 process = mp.Process(target=worker_func, args=args_list[i])
+
+             process.daemon = True  # Ensure processes exit when the main program exits, leaving no orphans or zombies
+             process_list.append(process)
+             process.start()
+
+         return process_list  # Return the list of running processes
+
+     @staticmethod
+     def determine_file_offsets(file_name: str, num_chunks: int):
+         """Determine file offsets for parallel reading, aligned to line breaks."""
+         file_size = os.path.getsize(file_name)
+         chunk_size = max(1, file_size // num_chunks)
+
+         offsets = [0]
+         with open(file_name, 'rb') as f:
+             for _ in range(num_chunks - 1):
+                 f.seek(offsets[-1] + chunk_size)
+                 f.readline()  # Advance to the next line break so chunks never split a line
+                 offsets.append(f.tell())
+         Debug.log(f"File offsets computed: {offsets}", 'DEBUG')
+         return offsets
+
+     @staticmethod
+     def export_to_file(extractor: ExtractorStrategy, output_path: str, fully_qualified_table_name: str, filter_column: str = None, filter_value: str = None, verbose: bool = False) -> tuple:
+         """Uses the selected extractor to export a database table to a CSV file."""
+
+         header, csv_file = extractor.export_external_table(output_path, fully_qualified_table_name, filter_column, filter_value, verbose)
+         return header, csv_file
+
+     @staticmethod
+     def calculate_chunks(external_table: str, compression: int = 4):
+         """Calculates how many chunks to split the file into."""
+         # 100 MB of compressed data per chunk, scaled by the expected gzip compression factor on table data
+         unzipped_chunk_filesize = 100 * 1024 * 1024 * compression
+
+         total_filesize = os.path.getsize(external_table)
+
+         if total_filesize > unzipped_chunk_filesize:
+             num_chunks = int(total_filesize // unzipped_chunk_filesize) + 2  # Ensures at least three chunks are created (one is subtracted further down in the code)
+         else:
+             num_chunks = 2
+
+         Debug.log(f"\nTotal filesize: {total_filesize // (1024*1024)} MB\nNumber of chunks: {num_chunks - 1}\n", 'INFO')
+
+         return num_chunks
@@ -0,0 +1,20 @@
+ # snowforge/DataMover/Extractors/ExtractorStrategy.py
+ from abc import ABC, abstractmethod
+
+ class ExtractorStrategy(ABC):
+     """Abstract base class for all extraction strategies. Think of this like an interface in C#."""
+
+     @abstractmethod
+     def extract_table_query(self, fully_qualified_table_name: str, filter_column: str, filter_value: str, verbose: bool = False):
+         """Builds the extraction query for a table based on the provided criteria."""
+         pass
+
+     @abstractmethod
+     def list_all_tables(self, database_name: str, verbose: bool = False):
+         """Lists all tables in a given database."""
+         pass
+
+     @abstractmethod
+     def export_external_table(self, output_path: str, fully_qualified_table_name: str, filter_column: str = None, filter_value: str = None, verbose: bool = False):
+         """Exports a database table to an external file; signature matches the Engine and the concrete extractors."""
+         pass
@@ -0,0 +1,77 @@
+ import subprocess
+ import os
+ from .ExtractorStrategy import ExtractorStrategy
+ from ...Logging import Debug
+
+ class NetezzaExtractor(ExtractorStrategy):
+     """Handles extraction from Netezza specifically."""
+
+     def extract_table_query(self, fully_qualified_table_name: str, filter_column: str = None, filter_value: str = None, verbose: bool = False) -> str:
+         """Builds a query against a database table. (Can be extended in a future version)."""
+
+         if filter_column is not None and filter_value is None:
+             Debug.log("You must provide a filter value in order to apply any filtering.", 'WARNING')
+             raise ValueError("filter_column was given without a filter_value")
+
+         elif filter_column is None and filter_value is not None:
+             Debug.log("You cannot supply a filter value without specifying a filter column (--filter option).", 'WARNING')
+             raise ValueError("filter_value was given without a filter_column")
+
+         if filter_value is None or filter_column is None:
+             query = f"SELECT * FROM {fully_qualified_table_name}"
+         else:
+             query = f"SELECT * FROM {fully_qualified_table_name} WHERE {filter_column} BETWEEN TO_DATE('{filter_value}', 'DD.MM.YYYY') AND CURRENT_DATE+1"
+
+         return query
+
+     def list_all_tables(self, database_name: str, verbose: bool = False) -> list:
+         """Queries all tables in the specified database and returns them as a list."""
+
+         command = f"nzsql -q -c \"SELECT TABLE_NAME FROM _V_TABLE WHERE TABLE_SCHEMA = '{database_name}';\""
+         output = subprocess.check_output(command, shell=True).decode('ISO-8859-1')
+         table_list = [line.strip() for line in output.split('\n') if line.strip()]
+
+         return table_list
+
+     def export_external_table(self, output_path: str, fully_qualified_table_name: str, filter_column: str = None, filter_value: str = None, verbose: bool = False):
+         """Runs the query on Netezza and exports the data to a CSV file."""
+
+         table_name = fully_qualified_table_name.split('.')[-1] if fully_qualified_table_name else None
+
+         query = self.extract_table_query(fully_qualified_table_name, filter_column, filter_value, verbose)
+
+         os.makedirs(output_path, exist_ok=True)
+         exported_csv_file = os.path.join(output_path, f"{table_name}_full.csv")
+
+         # Ensure the target file exists before the external table writes to it
+         with open(exported_csv_file, 'w'):
+             pass
+
+         external_table_query = f"""
+         CREATE EXTERNAL TABLE '/export/home/nz/{exported_csv_file}'
+         USING (
+             delimiter ','
+             escapeChar '\\'
+             nullValue 'NULL'
+             encoding 'internal'
+         )
+         AS {query};
+         """
+         nzsql_command = f"""nzsql -c "{external_table_query}" """
+         # Convert via a temporary file, since iconv cannot safely convert a file onto itself
+         encoding_command = f"iconv -f ISO-8859-1 -t UTF-8 {exported_csv_file} -o {exported_csv_file}.utf8 && mv {exported_csv_file}.utf8 {exported_csv_file}"
+
+         try:
+             # Export first, then convert the exported file to UTF-8
+             subprocess.run(nzsql_command, shell=True, check=True)
+
+             Debug.log(f"Running command: {encoding_command}", 'DEBUG', verbose)
+             subprocess.run(encoding_command, shell=True, check=True)
+
+         except subprocess.CalledProcessError as e:
+             Debug.log(f"Error executing Netezza command: {e}", 'ERROR')
+             return None
+
+         # FIXME: placeholder; the external-table export does not emit a header row
+         header = "ddd"
+         return header, exported_csv_file
+
+     def get_row_id_of_row(self, fully_qualified_table: str, header: str, row: str):
+         """Fetches the rowid of a given row. (Not yet implemented.)"""
+
+         rowid_query = ""  # TODO: build the rowid lookup query
@@ -0,0 +1,59 @@
+ import logging
+ from colored import Fore, Style
+
+ class Debug:
+     """Handles logging with colored output for better visibility."""
+
+     logger = logging.getLogger("SnowforgeLogger")
+     handler = logging.StreamHandler()
+     handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))
+
+     # Ensure the logger only has one handler (avoid duplicate logs)
+     if not logger.hasHandlers():
+         logger.addHandler(handler)
+
+     @staticmethod
+     def log(message: str, level='INFO', verbose_logging: bool = False):
+         """Logs a message with a specified severity level and colored output.
+
+         Args:
+             message (str): The message to log.
+             level (str, optional): The log level (INFO, DEBUG, ERROR, etc.). Defaults to "INFO".
+             verbose_logging (bool, optional): Set to True to enable DEBUG output globally. Defaults to False.
+         """
+
+         # Adjust the log level based on the verbose flag
+         if verbose_logging:
+             Debug.logger.setLevel(logging.DEBUG)
+         else:
+             Debug.logger.setLevel(logging.INFO)
+
+         # Normalize the level to uppercase
+         level = level.upper()
+
+         log_levels = {
+             'DEBUG': logging.DEBUG,
+             'INFO': logging.INFO,
+             'WARNING': logging.WARNING,
+             'ERROR': logging.ERROR,
+             'CRITICAL': logging.CRITICAL
+         }
+
+         # Define the color mapping
+         color_map = {
+             'INFO': Fore.white,
+             'ERROR': Fore.red,
+             'DEBUG': Fore.blue,
+             'WARNING': Fore.yellow,
+             'SUCCESS': Fore.light_green,
+             'FAILURE': Fore.red,
+             'CRITICAL': Fore.light_red
+         }
+
+         colored_message = f"{color_map.get(level, Fore.white)}{message}{Style.reset}"
+
+         # Fall back to INFO for custom levels such as SUCCESS and FAILURE
+         if level in log_levels:
+             getattr(Debug.logger, level.lower())(colored_message)
+         else:
+             Debug.logger.info(colored_message)
@@ -0,0 +1,48 @@
+ import snowflake.connector as sf
+ from .Logging import Debug  # Import from the same package
+
+ class SnowflakeConnection:
+     """Handles establishing and managing connections to Snowflake."""
+
+     DEFAULTS = {
+         "snowflake_username": "snowflake username",
+         "snowflake_account": "snowflake account"
+     }
+
+     _connection = None  # Cached connection object
+
+     @staticmethod
+     def establish_connection(user_name: str = DEFAULTS["snowflake_username"], account: str = DEFAULTS["snowflake_account"]) -> sf.SnowflakeConnection:
+         """Establishes a connection to Snowflake.
+
+         Uses either a credentials file or manual login via username and account.
+
+         Args:
+             user_name (str, optional): The Snowflake username. Defaults to DEFAULTS["snowflake_username"].
+             account (str, optional): The Snowflake account ID. Defaults to DEFAULTS["snowflake_account"].
+
+         Returns:
+             sf.SnowflakeConnection: A Snowflake connection object.
+
+         Raises:
+             sf.errors.ConfigSourceError: If the connection fails.
+         """
+
+         if SnowflakeConnection._connection:
+             return SnowflakeConnection._connection
+
+         try:
+             if user_name == SnowflakeConnection.DEFAULTS["snowflake_username"] or account == SnowflakeConnection.DEFAULTS["snowflake_account"]:
+                 # No explicit credentials given; fall back to the connector's own config sources
+                 SnowflakeConnection._connection = sf.connect()
+             else:
+                 SnowflakeConnection._connection = sf.connect(
+                     user=user_name,
+                     account=account,
+                     authenticator="externalbrowser"
+                 )
+             return SnowflakeConnection._connection
+
+         except Exception as e:
+             Debug.log(f"\nCould not connect to Snowflake, did you create a .toml file?\nRemember you can always connect using account + username.\nError message: {e}", 'ERROR')
+             raise sf.errors.ConfigSourceError
@@ -0,0 +1,7 @@
+ # Snowforge/__init__.py
+
+ from .Logging import Debug
+ from .SnowflakeConnect import SnowflakeConnection
+ from .AWSIntegration import AWSIntegration
+ from .Config import Config
+
+ __all__ = ["Debug", "SnowflakeConnection", "AWSIntegration", "Config"]
@@ -0,0 +1,4 @@
+ # pyproject.toml
+ [build-system]
+ requires = ["setuptools", "wheel"]
+ build-backend = "setuptools.build_meta"
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,28 @@
+ from setuptools import setup, find_packages
+
+ setup(
+     name="snowforge-package",
+     version="0.2.7",  # Change this for new releases
+     author="Andreas Heggelund",
+     author_email="andreasheggelund@gmail.com",
+     description="A Python package for supporting migration from on-prem to cloud",
+     long_description=open("README.md", encoding="utf-8").read(),
+     long_description_content_type="text/markdown",
+     url="https://github.com/yourusername/Snowforge",  # Replace with your GitHub repo
+     packages=find_packages(),
+     install_requires=[
+         "boto3",
+         "snowflake-connector-python",
+         "coloredlogs",
+         "colored",
+         "tqdm",
+         "toml",
+         "argparse"  # Note: argparse ships with the standard library and could be dropped
+     ],
+     python_requires=">=3.12",
+     classifiers=[
+         "Programming Language :: Python :: 3",
+         "License :: OSI Approved :: MIT License",
+         "Operating System :: OS Independent",
+     ],
+ )
@@ -0,0 +1,148 @@
+ Metadata-Version: 2.2
+ Name: snowforge-package
+ Version: 0.2.7
+ Summary: A Python package for supporting migration from on-prem to cloud
+ Home-page: https://github.com/yourusername/Snowforge
+ Author: Andreas Heggelund
+ Author-email: andreasheggelund@gmail.com
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.12
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: boto3
+ Requires-Dist: snowflake-connector-python
+ Requires-Dist: coloredlogs
+ Requires-Dist: colored
+ Requires-Dist: tqdm
+ Requires-Dist: toml
+ Requires-Dist: argparse
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # 🚀 Snowforge - Powerful Data Integration
+
+ **Snowforge** is a Python package designed to streamline data integration and transfer between **AWS**, **Snowflake**, and various **on-premise database systems**. It provides efficient data extraction, logging, configuration management, and AWS utilities to support robust data engineering workflows.
+
+ ---
+
+ ## ✨ Features
+
+ - **AWS Integration**: Manage AWS S3 and Secrets Manager operations.
+ - **Snowflake Connection**: Establish and manage Snowflake connections effortlessly.
+ - **Advanced Logging**: Centralized logging system with colored output for better visibility.
+ - **Configuration Management**: Load and manage credentials from a TOML configuration file.
+ - **Data Mover Engine**: Parallel data processing and extraction strategies for efficiency.
+ - **Extensible Database Extraction**: Uses a **strategy pattern** to support multiple **on-prem database systems** (e.g., Netezza, Oracle, PostgreSQL).
+
+ ---
+
+ ## 📥 Installation
+
+ Install Snowforge using pip:
+
+ ```sh
+ pip install snowforge-package
+ ```
+
+ ---
+
+ ## ⚙️ Configuration
+
+ Snowforge requires a configuration file (`snowforge_config.toml`) to manage credentials for AWS and Snowflake. The package searches for the config file in the following locations, in order:
+
+ 1. Path specified in the `SNOWFORGE_CONFIG_PATH` environment variable.
+ 2. Current working directory.
+ 3. `~/.config/snowforge_config.toml`
+ 4. Package directory.
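
Because the `SNOWFORGE_CONFIG_PATH` environment variable is checked first, a config stored outside the usual locations can be selected explicitly. A minimal sketch for a POSIX shell (the path shown is purely illustrative):

```sh
# Point Snowforge at a config file outside the default search locations
export SNOWFORGE_CONFIG_PATH="$HOME/projects/snowforge/snowforge_config.toml"
```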
66
+
67
+ ### Example `snowforge_config.toml` File
68
+
69
+ ```toml
70
+ [AWS]
71
+ [default]
72
+ AWS_ACCESS_KEY = "your-access-key"
73
+ AWS_SECRET_KEY = "your-secret-key"
74
+ REGION = "us-east-1"
75
+
76
+ [SNOWFLAKE]
77
+ [default]
78
+ USERNAME = "your-username"
79
+ ACCOUNT = "your-account"
80
+ ```
81
+
82
+ ---
83
+
84
+ ## 🚀 Quick Start
85
+
86
+ ### 🔹 Initialize AWS Integration
87
+
88
+ ```python
89
+ from Snowforge.AWSIntegration import AWSIntegration
90
+
91
+ AWSIntegration.initialize(profile="default", verbose=True)
92
+ ```
93
+
94
+ ### 🔹 Connect to Snowflake
95
+
96
+ ```python
97
+ from Snowforge.SnowflakeConnect import SnowflakeConnection
98
+
99
+ conn = SnowflakeConnection.establish_connection(user_name="your-user", account="your-account")
100
+ ```
101
+
102
+ ### 🔹 Use Logging
103
+
104
+ ```python
105
+ from Snowforge.Logging import Debug
106
+
107
+ Debug.log("This is an info message", level='INFO')
108
+ Debug.log("This is an error message", level='ERROR')
109
+ ```
110
+
111
+ ### 🔹 Extract Data from an On-Prem Database
112
+
113
+ ```python
+ import Snowforge.AWSIntegration as aws
+ from Snowforge.DataMover.Extractors.ExtractorStrategy import ExtractorStrategy
+
+ def export_and_upload_table_data(extractor: ExtractorStrategy, output_path: str):
+
+     # Fetch data from an on-prem system:
+     query = extractor.extract_table_query("database.schema.table", "filter_column", "filter_value")
+     full_path_to_file = extractor.export_table_to_file(query, output_path)  # an optional file format can also be passed
+
+     aws.upload_to_s3("bucket name", full_path_to_file, "key to store the file under in S3")
+
+ def main():
+     from Snowforge.DataMover.Extractors.NetezzaExtractor import NetezzaExtractor
+     from Snowforge.DataMover.Extractors.OracleExtractor import OracleExtractor
+     from Snowforge.DataMover.Extractors.PostgrSQLExtractor import PostgrSQLExtractor
+
+     # Export and upload data from different source systems by exchanging the extractor strategy
+     export_and_upload_table_data(NetezzaExtractor(), "/tmp/exports")
+     export_and_upload_table_data(OracleExtractor(), "/tmp/exports")
+     export_and_upload_table_data(PostgrSQLExtractor(), "/tmp/exports")
+ ```
+
+ Since **Snowforge** follows a **strategy pattern**, it can be easily extended to support other **database systems** by implementing new extractor classes that conform to the `ExtractorStrategy` interface.
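
The `ExtractorStrategy` interface itself is not shown here, so the sketch below assumes it requires the two methods the Quick Start example calls (`extract_table_query` and `export_table_to_file`). `MySqlExtractor`, its query dialect, and the file-naming scheme are hypothetical placeholders, not part of the package:

```python
# Hypothetical sketch of an extractor for a new source system.
# A real implementation would conform to Snowforge's ExtractorStrategy
# interface; only the two methods used in the Quick Start are mirrored here.

class MySqlExtractor:
    def extract_table_query(self, fully_qualified_table: str,
                            filter_column: str, filter_value: str) -> str:
        """Builds the extraction query in the source system's SQL dialect."""
        return (f"SELECT * FROM {fully_qualified_table} "
                f"WHERE {filter_column} = '{filter_value}'")

    def export_table_to_file(self, query: str, output_path: str,
                             file_format: str = "csv") -> str:
        """Would run the query against the source database and write the
        result to a local file, returning the full path (I/O omitted)."""
        full_path = f"{output_path}/export.{file_format}"
        # ... execute `query` via the system's driver and write rows here ...
        return full_path

print(MySqlExtractor().extract_table_query("db.schema.table", "id", "42"))
# → SELECT * FROM db.schema.table WHERE id = '42'
```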
+
+ ---
+
+ ## 📜 License
+
+ This project is licensed under the **MIT License**.
+
+ ---
+
+ ## 👤 Author
+
+ Developed by **andreasheggelund@gmail.com**. Feel free to reach out for support, suggestions, or collaboration!
@@ -0,0 +1,19 @@
+ LICENSE
+ README.md
+ pyproject.toml
+ setup.py
+ Snowforge/AWSIntegration.py
+ Snowforge/Config.py
+ Snowforge/Logging.py
+ Snowforge/SnowflakeConnect.py
+ Snowforge/__init__.py
+ Snowforge/DataMover/DataMover.py
+ Snowforge/DataMover/__init__.py
+ Snowforge/DataMover/Extractors/ExtractorStrategy.py
+ Snowforge/DataMover/Extractors/NetezzaExtractor.py
+ Snowforge/DataMover/Extractors/__init__.py
+ snowforge_package.egg-info/PKG-INFO
+ snowforge_package.egg-info/SOURCES.txt
+ snowforge_package.egg-info/dependency_links.txt
+ snowforge_package.egg-info/requires.txt
+ snowforge_package.egg-info/top_level.txt
@@ -0,0 +1,7 @@
+ boto3
+ snowflake-connector-python
+ coloredlogs
+ colored
+ tqdm
+ toml
+ argparse