PyPI - easy-data-loader - Versions diffs - 0.1.0__tar.gz → 0.1.2__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

easy_data_loader-0.1.2/PKG-INFO ADDED

@@ -0,0 +1,110 @@
+Metadata-Version: 2.4
+Name: easy_data_loader
+Version: 0.1.2
+Summary: Data transfer utilities between files and databases
+Author-email: Bojoi Gabriel <bojoigabriel@gmail.com>
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Topic :: Database
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.13
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: click>=8.3.0
+Requires-Dist: openpyxl>=3.1.5
+Requires-Dist: pandas>=2.3.3
+Requires-Dist: pyarrow>=22.0.0
+Requires-Dist: pydantic>=2.12.5
+Requires-Dist: pydantic-settings>=2.12.0
+Requires-Dist: pyodbc>=5.2.0
+Requires-Dist: python-dotenv>=1.1.1
+Requires-Dist: sqlalchemy>=2.0.43
+Dynamic: license-file
+# Easy Data Loader 🚀
+[![PyPI version](https://badge.fury.io/py/easy-data-loader.svg)](https://badge.fury.io/py/easy-data-loader)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+![Downloads](https://static.pepy.tech/badge/easy-data-loader)
+**Easy Data Loader** is a flexible, modular Python library designed to streamline ETL (Extract, Transform, Load) processes between various file data sources (csv, xlsx, parquet, orc) and databases (MSSQL, PostgreSQL and others).
+## ✨ Key Features
+- **Declarative Configuration**: Manage connections and pipelines through simple python files and `.env` resources.
+- **Integrated CLI**: Initialize a standardized project structure with a single command.
+- **Custom Transformation Hooks**: Inject your own Pandas transformation logic directly into the pipeline execution.
+- **Performance Optimized**: Built-in support for chunked loading and writing to handle large datasets efficiently.
+- **Extensible Architecture**: Uses a Factory Pattern for database connectors, making it easy to support new drivers.
+---
+## 📦 Installation
+Install directly via `pip` or `uv`:
+```bash
+pip install easy_data_loader
+uv add easy_data_loader
+```
+## 🚀 Getting Started
+1. Initialize a new project structure to generate template configurations:
+   ```bash
+   easy-data-loader init
+   ```
+2. Review the generated `config/` folders for sample resources and pipelines.
+3. Run all discovered pipelines across the active configurations:
+   ```bash
+   easy-data-loader run_all
+   ```
+## ✔️ Generic concepts
+`easy_data_loader` uses `resources` as a way to define a file or a database. A resource can represent either a source or a destination, making possible the following ETL scenarios: file -> file, file -> database, database -> file, database -> database.
+The `easy_data_loader` project initializer will create the predefined folder structure `/config/resources`, where resources are expected to be defined following the current convention: the file type is `.env` and the file name must be prefixed with the resource type, `file_` or `database_`. The predefined folder structure together with the naming convention enables `easy_data_loader` to find and load all resources.
+A secondary predefined folder `/config/pipelines` will contain the pipeline definition files, which are regular Python files. There are 3 types of pipelines that can be defined:
+- `LoadPipeline` the main pipeline type which transports data from source to destination
+- `ProcedurePipeline` a pipeline dedicated to executing stored procedures inside a database
+- `OrchestratorPipeline` a pipeline that can execute a group of pipelines sequentially
+## LoadPipeline
+In order to define a `LoadPipeline` we must use the `BasePipelineDefinition` from `easy_data_loader` as depicted in the example pipelines created by the initializer.
+In the simplest form there are only a few mandatory parameters:
+- `pipeline_name : str` - this name will be used to execute the pipeline
+- `source : str` - the file name (without extension) corresponding to the resource to use as the data source
+- `destination : str` - the file name (without extension) corresponding to the resource to use as the data destination
+If either the source or the destination is a database then additional parameters become mandatory:
+- `source_sql : str` - can be a table name or a specific query in the SQL dialect of the source database flavor
+- `destination_table : str` - table name where the data will be inserted
+There are many other aspects of the pipeline that can be defined:
+- `audit : str` - the pipeline has built-in audit functionality: it records certain information in a SQLite database after the pipeline completes. If the user desires, the same information can be recorded in a database `resource`
+- `validator: Pydantic BaseModel` - the data read from the source `resource` can be validated using an arbitrarily defined Pydantic model before it is written to the destination
+- `columns : Dict[str, ColumnDefinition]` - this parameter gives strict control over how the data is written to the destination; it has the dual purpose of renaming columns and explicitly defining data types (mainly for inserting into a database table); the `ColumnDefinition` is constructed with an optional `target_name: str` for renaming columns and / or a `data_type : SQLAlchemy Type`, thus controlling column data types, lengths, precision etc.
+- `read_parameters : Dict[str, Any]` and `write_parameters : Dict[str, Any]` - these parameters control how the data is read from the source and written to the destination and provide an easy way to use special delimiters for files, drop and recreate the database table, etc. `easy_data_loader` uses pandas as the transport layer, so the read and write parameters are passed to the corresponding pandas read and write functions.
+- the pipeline has a set of predefined hooks allowing the execution of functions at specific moments during the execution: `file_pre_process : Callable` - executed before the file is read into the pandas DataFrame (e.g. unzip the file); `transform : Callable` - perform data transformation over the data already in the pandas DataFrame (requires pandas methods); `file_post_process : Callable` - after the pipeline completes and the data is written to the destination perform post processing on the source file (e.g. move the file to another folder)
+## ProcedurePipeline
+This secondary pipeline type is responsible for executing one or more stored procedures inside a database.
+To define one we need to use the `ProcedureDefinition` with the following parameters:
+- `pipeline_name : str` - this name will be used to execute the pipeline
+- `audit : str, optional` - database resource name where the audit info will be recorded
+- `resource : str` - database resource name where the stored procedure(s) will be executed
+- `procedures : List[tuple(str, Optional[Dict[str, Any]])]` - list of one or more stored procedures along with optional procedure parameters as dictionaries
+## OrchestratorPipeline
+This pipeline type is responsible for sequentially executing a set of pipelines, `LoadPipeline`s and / or `ProcedurePipeline`s. It is very simple to define using the `OrchestratorDefinition` with:
+- `orchestrator_name : str` - name by which the orchestrator is executed
+- `pipelines : List[str]` - list of pipelines to execute sequentially
+- `fail_fast : bool, Default True` - if any of the pipelines fail the rest of the pipelines in the list do not get executed
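
The LoadPipeline section of the README names three hooks and says when each one runs. The sketch below only illustrates that ordering. It is not the library's implementation: `run_with_hooks`, the `read`/`write` callables, the zero-argument hook signatures, and the list-of-dicts stand-in for a pandas DataFrame are all assumptions made for the example.

```python
from typing import Callable, Optional

Rows = list[dict]  # stand-in for a pandas DataFrame in this sketch


def run_with_hooks(
    read: Callable[[], Rows],
    write: Callable[[Rows], None],
    file_pre_process: Optional[Callable[[], None]] = None,
    transform: Optional[Callable[[Rows], Rows]] = None,
    file_post_process: Optional[Callable[[], None]] = None,
) -> None:
    """Hypothetical runner: call each hook at the moment the README describes."""
    if file_pre_process is not None:
        file_pre_process()          # before the source is read (e.g. unzip)
    data = read()
    if transform is not None:
        data = transform(data)      # reshape data that is already in memory
    write(data)
    if file_post_process is not None:
        file_post_process()         # after the write (e.g. move the source file)


events: list[str] = []
written: Rows = []

run_with_hooks(
    read=lambda: [{"amount": "10"}, {"amount": "5"}],
    write=written.extend,
    file_pre_process=lambda: events.append("pre"),
    transform=lambda rows: [{"amount": int(r["amount"])} for r in rows],
    file_post_process=lambda: events.append("post"),
)
```

In a real pipeline the `transform` hook would presumably receive and return a pandas DataFrame; the package sources are the place to check the exact signature.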

easy_data_loader-0.1.2/README.md ADDED

@@ -0,0 +1,84 @@
+# Easy Data Loader 🚀
+[![PyPI version](https://badge.fury.io/py/easy-data-loader.svg)](https://badge.fury.io/py/easy-data-loader)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+![Downloads](https://static.pepy.tech/badge/easy-data-loader)
+**Easy Data Loader** is a flexible, modular Python library designed to streamline ETL (Extract, Transform, Load) processes between various file data sources (csv, xlsx, parquet, orc) and databases (MSSQL, PostgreSQL and others).
+## ✨ Key Features
+- **Declarative Configuration**: Manage connections and pipelines through simple python files and `.env` resources.
+- **Integrated CLI**: Initialize a standardized project structure with a single command.
+- **Custom Transformation Hooks**: Inject your own Pandas transformation logic directly into the pipeline execution.
+- **Performance Optimized**: Built-in support for chunked loading and writing to handle large datasets efficiently.
+- **Extensible Architecture**: Uses a Factory Pattern for database connectors, making it easy to support new drivers.
+---
+## 📦 Installation
+Install directly via `pip` or `uv`:
+```bash
+pip install easy_data_loader
+uv add easy_data_loader
+```
+## 🚀 Getting Started
+1. Initialize a new project structure to generate template configurations:
+   ```bash
+   easy-data-loader init
+   ```
+2. Review the generated `config/` folders for sample resources and pipelines.
+3. Run all discovered pipelines across the active configurations:
+   ```bash
+   easy-data-loader run_all
+   ```
+## ✔️ Generic concepts
+`easy_data_loader` uses `resources` as a way to define a file or a database. A resource can represent either a source or a destination, making possible the following ETL scenarios: file -> file, file -> database, database -> file, database -> database.
+The `easy_data_loader` project initializer will create the predefined folder structure `/config/resources`, where resources are expected to be defined following the current convention: the file type is `.env` and the file name must be prefixed with the resource type, `file_` or `database_`. The predefined folder structure together with the naming convention enables `easy_data_loader` to find and load all resources.
+A secondary predefined folder `/config/pipelines` will contain the pipeline definition files, which are regular Python files. There are 3 types of pipelines that can be defined:
+- `LoadPipeline` the main pipeline type which transports data from source to destination
+- `ProcedurePipeline` a pipeline dedicated to executing stored procedures inside a database
+- `OrchestratorPipeline` a pipeline that can execute a group of pipelines sequentially
+## LoadPipeline
+In order to define a `LoadPipeline` we must use the `BasePipelineDefinition` from `easy_data_loader` as depicted in the example pipelines created by the initializer.
+In the simplest form there are only a few mandatory parameters:
+- `pipeline_name : str` - this name will be used to execute the pipeline
+- `source : str` - the file name (without extension) corresponding to the resource to use as the data source
+- `destination : str` - the file name (without extension) corresponding to the resource to use as the data destination
+If either the source or the destination is a database then additional parameters become mandatory:
+- `source_sql : str` - can be a table name or a specific query in the SQL dialect of the source database flavor
+- `destination_table : str` - table name where the data will be inserted
+There are many other aspects of the pipeline that can be defined:
+- `audit : str` - the pipeline has built-in audit functionality: it records certain information in a SQLite database after the pipeline completes. If the user desires, the same information can be recorded in a database `resource`
+- `validator: Pydantic BaseModel` - the data read from the source `resource` can be validated using an arbitrarily defined Pydantic model before it is written to the destination
+- `columns : Dict[str, ColumnDefinition]` - this parameter gives strict control over how the data is written to the destination; it has the dual purpose of renaming columns and explicitly defining data types (mainly for inserting into a database table); the `ColumnDefinition` is constructed with an optional `target_name: str` for renaming columns and / or a `data_type : SQLAlchemy Type`, thus controlling column data types, lengths, precision etc.
+- `read_parameters : Dict[str, Any]` and `write_parameters : Dict[str, Any]` - these parameters control how the data is read from the source and written to the destination and provide an easy way to use special delimiters for files, drop and recreate the database table, etc. `easy_data_loader` uses pandas as the transport layer, so the read and write parameters are passed to the corresponding pandas read and write functions.
+- the pipeline has a set of predefined hooks allowing the execution of functions at specific moments during the execution: `file_pre_process : Callable` - executed before the file is read into the pandas DataFrame (e.g. unzip the file); `transform : Callable` - perform data transformation over the data already in the pandas DataFrame (requires pandas methods); `file_post_process : Callable` - after the pipeline completes and the data is written to the destination perform post processing on the source file (e.g. move the file to another folder)
+## ProcedurePipeline
+This secondary pipeline type is responsible for executing one or more stored procedures inside a database.
+To define one we need to use the `ProcedureDefinition` with the following parameters:
+- `pipeline_name : str` - this name will be used to execute the pipeline
+- `audit : str, optional` - database resource name where the audit info will be recorded
+- `resource : str` - database resource name where the stored procedure(s) will be executed
+- `procedures : List[tuple(str, Optional[Dict[str, Any]])]` - list of one or more stored procedures along with optional procedure parameters as dictionaries
+## OrchestratorPipeline
+This pipeline type is responsible for sequentially executing a set of pipelines, `LoadPipeline`s and / or `ProcedurePipeline`s. It is very simple to define using the `OrchestratorDefinition` with:
+- `orchestrator_name : str` - name by which the orchestrator is executed
+- `pipelines : List[str]` - list of pipelines to execute sequentially
+- `fail_fast : bool, Default True` - if any of the pipelines fail the rest of the pipelines in the list do not get executed
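
The `fail_fast` flag described for `OrchestratorDefinition` can be pictured with a small stand-alone runner. This is a sketch of the semantics only, not the package's orchestrator: `run_sequentially`, the dict of callables (the real definition takes a list of pipeline names), and the `SUCCESS` / `FAILED` / `SKIPPED` labels are assumptions made for illustration.

```python
from typing import Callable


def run_sequentially(
    pipelines: dict[str, Callable[[], bool]],
    fail_fast: bool = True,
) -> dict[str, str]:
    """Run callables in insertion order and report one status per name."""
    results: dict[str, str] = {}
    failed = False
    for name, run in pipelines.items():
        if failed and fail_fast:
            # An earlier step failed: do not execute the remaining ones.
            results[name] = "SKIPPED"
            continue
        try:
            ok = run()
        except Exception:
            ok = False
        results[name] = "SUCCESS" if ok else "FAILED"
        failed = failed or not ok
    return results


steps = {
    "load_sales": lambda: True,
    "refresh_marts": lambda: False,  # simulated failure
    "export_report": lambda: True,
}
strict = run_sequentially(steps, fail_fast=True)
lenient = run_sequentially(steps, fail_fast=False)
```

With `fail_fast=True` the third step is never called; with `fail_fast=False` it still runs after the failure.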

easy_data_loader-0.1.2/pyproject.toml ADDED

@@ -0,0 +1,43 @@
+[project]
+name = "easy_data_loader"
+version = "0.1.2"
+description = "Data transfer utilities between files and databases"
+authors = [{ name = "Bojoi Gabriel", email = "bojoigabriel@gmail.com" }]
+readme = "README.md"
+requires-python = ">=3.13"
+dependencies = [
+    "click>=8.3.0",
+    "openpyxl>=3.1.5",
+    "pandas>=2.3.3",
+    "pyarrow>=22.0.0",
+    "pydantic>=2.12.5",
+    "pydantic-settings>=2.12.0",
+    "pyodbc>=5.2.0",
+    "python-dotenv>=1.1.1",
+    "sqlalchemy>=2.0.43",
+]
+classifiers = [
+    "Development Status :: 3 - Alpha",
+    "Intended Audience :: Developers",
+    "Topic :: Database",
+    "Topic :: Scientific/Engineering :: Information Analysis",
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python :: 3.13",
+    "Operating System :: OS Independent",
+]
+[dependency-groups]
+dev = ["ipykernel>=7.1.0", "pytest>=8.4.2", "ruff", "mypy", "pre-commit"]
+[project.scripts]
+easy-data-loader = "easy_data_loader.cli:main"
+[tool.setuptools.packages.find]
+where = ["src"]
+[tool.pytest.ini_options]
+pythonpath = "src"
+[build-system]
+requires = ["setuptools>=61.0"]
+build-backend = "setuptools.build_meta"

easy_data_loader-0.1.2/src/easy_data_loader/__init__.py ADDED

@@ -0,0 +1,21 @@
+__version__ = "0.1.0"
+from .models import (
+    BasePipelineDefinition,
+    ColumnDefinition,
+    OrchestratorDefinition,
+    ProcedureDefinition,
+)
+from .orchestrator import OrchestratorPipeline
+from .pipeline import LoadPipeline
+from .procedure_pipeline import ProcedurePipeline
+__all__ = [
+    "LoadPipeline",
+    "ProcedurePipeline",
+    "OrchestratorPipeline",
+    "BasePipelineDefinition",
+    "ProcedureDefinition",
+    "ColumnDefinition",
+    "OrchestratorDefinition",
+]

{easy_data_loader-0.1.0 → easy_data_loader-0.1.2}/src/easy_data_loader/cli.py RENAMED

@@ -1,7 +1,7 @@
-import click
-import os
 from pathlib import Path
+import click
 # Integrated templates
 PIPELINE_TEMPLATE = """
 from easy_data_loader.pipeline import LoadPipeline
@@ -89,34 +89,40 @@ CONN_PORT=1433
 FILE_ENV = """
 # file resource definition
-FILE_TYPE=CSV
-FOLDER_PATH=./data/imports
-FILE_NAME=large_sales_data
+FILE_TYPE=CSV                # can also be XLSX, PARQUET, ORC
+FOLDER_PATH=./data/imports   # source folder where the file is located
+FILE_NAME=large_sales_data   # exact file name without extension
+#FILE_PATTERN=large_sales    # file pattern to search in the source folder
 """
 MAIN = """
-from easy_data_loader.pipeline import LoadPipeline
+from easy_data_loader import LoadPipeline, ProcedurePipeline
-# Run an ETL pipeline
-LoadPipeline(pipeline_name="example_pipeline").run()
+def main():
+    # Run an ETL pipeline
+    LoadPipeline(pipeline_name="example_pipeline").run()
+    # Run a procedure pipeline
+    ProcedurePipeline(pipeline_name="example_procedure").run()
-# Run a procedure pipeline
-# from easy_data_loader.procedure_pipeline import ProcedurePipeline
-# ProcedurePipeline(pipeline_name="example_procedure").run()
+if __name__ == "__main__":
+    main()
 """
 @click.group()
 def main():
-    """Easy Data Loader CLI - ETL instrument between files and databases"""
+    """Easy Data Loader CLI - ETL instrument for files and databases"""
     pass
 @main.command()
 def init():
     """Initialize folder structure and sample files"""
     base_path = Path.cwd()
     # folders
-    folders = ['config/resources', 'config/pipelines']
+    folders = ["config/resources", "config/pipelines"]
     for folder in folders:
         (base_path / folder).mkdir(parents=True, exist_ok=True)
@@ -127,7 +133,7 @@ def init():
         "config/pipelines/orchestrator_example.py": ORCHESTRATOR_TEMPLATE,
         "config/resources/database_example.env": DATABASE_ENV,
         "config/resources/file_example.env": FILE_ENV,
-        "main.py" : MAIN,
+        "main.py": MAIN,
     }
     for name, content in files.items():
@@ -141,10 +147,12 @@ def init():
     click.echo("\nProject initialized successfully!")
 @main.command()
 def list():
     """List all discovered resources and pipelines"""
     from .config_loader import Configuration
     config = Configuration()
     click.echo("--- Discovered Resources ---")
@@ -157,19 +165,22 @@ def list():
 @main.command()
-@click.argument('resource_name')
-@click.argument('table_name')
+@click.argument("resource_name")
+@click.argument("table_name")
 def inspect_db(resource_name, table_name):
     """Inspect a database table and generate ColumnDefinition code"""
     from .config_loader import Configuration
     from .database_connector import CONNECTOR_FACTORY
     from .database_operations import DatabaseOperations
-    from .models import ConnectionSettings
+    from .models import ServerBasedConnectionSettings, FileBasedConnectionSettings
     config = Configuration()
     resource = config.get_resource(resource_name)
-    if not isinstance(resource, ConnectionSettings):
+    if not isinstance(
+        resource, (ServerBasedConnectionSettings, FileBasedConnectionSettings)
+    ):
         click.echo(f"Error: Resource '{resource_name}' is not a database connection.")
         return
@@ -186,7 +197,9 @@ def inspect_db(resource_name, table_name):
     click.echo(f"\n# Suggested Column definitions for {table_name}:")
     click.echo("columns={")
     for col, dtype in schema.items():
-        click.echo(f'    "{col}": ColumnDefinition(target_name="{col}", data_type={dtype}),')
+        click.echo(
+            f'    "{col}": ColumnDefinition(target_name="{col}", data_type={dtype}),'
+        )
     click.echo("}")
@@ -194,9 +207,13 @@ def inspect_db(resource_name, table_name):
 def run_all():
     """Run all discovered pipelines and show status summary"""
     from .config_loader import Configuration
+    from .models import (
+        BasePipelineDefinition,
+        OrchestratorDefinition,
+        ProcedureDefinition,
+    )
     from .pipeline import LoadPipeline
     from .procedure_pipeline import ProcedurePipeline
-    from .models import BasePipelineDefinition, ProcedureDefinition, OrchestratorDefinition
     config = Configuration()
     pipelines = config.get_all_pipelines()
@@ -219,6 +236,7 @@ def run_all():
                 success = ProcedurePipeline(name).run()
             elif isinstance(definition, OrchestratorDefinition):
                 from .orchestrator import OrchestratorPipeline
                 success = OrchestratorPipeline(name).run()
             else:
                 success = False
@@ -235,12 +253,13 @@ def run_all():
     for name, status in results.items():
         click.echo(f"{name:<25} | {status}")
 @main.command()
-@click.argument('orchestrator_name')
+@click.argument("orchestrator_name")
 def run_orchestrator(orchestrator_name):
     """Run a specific orchestrator by name"""
     from .orchestrator import OrchestratorPipeline
     try:
         success = OrchestratorPipeline(orchestrator_name).run()
         if success:
@@ -256,7 +275,7 @@ def validate_resources():
     """Validate all configured resources"""
     from .config_loader import Configuration
     from .database_connector import CONNECTOR_FACTORY
-    from .models import ConnectionSettings, FileSettings
+    from .models import FileSettings
     config = Configuration()
     resources = config.get_all_resources()
@@ -267,24 +286,28 @@ def validate_resources():
     click.echo(f"🔍 Validating {len(resources)} resources...\n")
+    from .models import ServerBasedConnectionSettings, FileBasedConnectionSettings
     results = {}
     for name, resource in resources.items():
         click.echo(f"Resource: {name} ... ", nl=False)
         try:
-            if isinstance(resource, ConnectionSettings):
+            if isinstance(
+                resource, (ServerBasedConnectionSettings, FileBasedConnectionSettings)
+            ):
                 # Validate Database Connection
-                connector = CONNECTOR_FACTORY[resource.conn_server_type](resource)
+                CONNECTOR_FACTORY[resource.conn_server_type](resource)
                 # The connector tests connection in __init__, so if we are here it passed
                 results[name] = "OK (Connected)"
             elif isinstance(resource, FileSettings):
                 # Validate File Path
                 if resource.folder_path.exists():
-                     results[name] = "OK (Path Exists)"
+                    results[name] = "OK (Path Exists)"
                 else:
-                     raise ValueError(f"Path does not exist: {resource.folder_path}")
+                    raise ValueError(f"Path does not exist: {resource.folder_path}")
             else:
-                 results[name] = "UNKNOWN TYPE"
+                results[name] = "UNKNOWN TYPE"
         except Exception as e:
             results[name] = f"FAILED: {str(e)}"
@@ -299,4 +322,4 @@ def validate_resources():
 if __name__ == "__main__":
-    main()
+    main()
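
The `inspect_db` hunk above prints a `columns={...}` snippet, one `ColumnDefinition` line per column. The sketch below reproduces that string formatting in isolation so the output shape is easy to see. The helper name `render_column_definitions` and the sample type strings (`Integer()`, `String(50)`) are assumptions; how the real command obtains `schema` is not visible in this diff.

```python
def render_column_definitions(table_name: str, schema: dict[str, str]) -> str:
    """Build the text that inspect_db echoes, using the same f-string layout."""
    lines = [f"# Suggested Column definitions for {table_name}:", "columns={"]
    for col, dtype in schema.items():
        lines.append(
            f'    "{col}": ColumnDefinition(target_name="{col}", data_type={dtype}),'
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical table and column types, for illustration only.
snippet = render_column_definitions(
    "sales", {"id": "Integer()", "customer": "String(50)"}
)
print(snippet)
```

The printed block is meant to be pasted into a pipeline definition as the `columns` argument and then edited by hand.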

{easy_data_loader-0.1.0 → easy_data_loader-0.1.2}/src/easy_data_loader/config_loader.py RENAMED

@@ -1,11 +1,18 @@
 import importlib.util
 from pathlib import Path
-from typing import Any, Dict, Union
 from types import ModuleType
-from dotenv import dotenv_values
+from typing import Dict, Union
 from .log import LoggedComponent
-from .models import BasePipelineDefinition, ProcedureDefinition, OrchestratorDefinition, ConnectionSettings, FileSettings, ResourceConfig
+from .models import (
+    BasePipelineDefinition,
+    FileSettings,
+    OrchestratorDefinition,
+    ProcedureDefinition,
+    ResourceConfig,
+    PipelineType,
+    ServerBasedConnectionSettings,
+)
 class Configuration(LoggedComponent):
@@ -14,6 +21,7 @@ class Configuration(LoggedComponent):
     Resources and pipelines are loaded only when requested.
     This class implements the Singleton pattern.
     """
     _instance = None
     _initialized = False
@@ -26,10 +34,17 @@ class Configuration(LoggedComponent):
         if not self._initialized:
             super().__init__()
             self.config_dir = Path(config_dir)
-            self.resources : Dict[str, ResourceConfig] = {}
-            self.pipelines : Dict[str, Union[BasePipelineDefinition, ProcedureDefinition, OrchestratorDefinition]] = {}
+            self.resources: Dict[str, ResourceConfig] = {}
+            self.pipelines: Dict[
+                str,
+                Union[
+                    BasePipelineDefinition, ProcedureDefinition, OrchestratorDefinition
+                ],
+            ] = {}
-            self.logger.debug(f"Initializing configuration from directory: {config_dir}")
+            self.logger.debug(
+                f"Initializing configuration from directory: {config_dir}"
+            )
             self._initialized = True
     def _load_env_file(self, env_file: Path) -> ResourceConfig:
@@ -38,15 +53,25 @@ class Configuration(LoggedComponent):
         env_file_name = env_file.stem
-        if env_file_name.startswith('database_'):
-            return ConnectionSettings(_env_file=[env_file]) # type: ignore
-        if env_file_name.startswith('file_'):
-            return FileSettings(_env_file=[env_file]) # type: ignore
+        if env_file_name.startswith("database_"):
+            # Peek at the env file to determine which connection class to use
+            from dotenv import dotenv_values
+            raw_values = dotenv_values(env_file)
+            server_type_str = raw_values.get("CONN_SERVER_TYPE", "").upper()
+            if server_type_str == "SQLITE":
+                from .models import FileBasedConnectionSettings
+                return FileBasedConnectionSettings(_env_file=[env_file])
+            return ServerBasedConnectionSettings(_env_file=[env_file])
+        if env_file_name.startswith("file_"):
+            return FileSettings(_env_file=[env_file])
-        self.log_and_raise(ValueError,
+        self.log_and_raise(
+            ValueError,
             f"Failed to load env file: {env_file.name}. "
-            f"Resource files must start with 'database_' or 'file_' prefix."
-            )
+            f"Resource files must start with 'database_' or 'file_' prefix.",
+        )
     def _import_module(self, module_file: Path) -> ModuleType:
         """Dynamically import a Python module from a file path"""
@@ -101,12 +126,12 @@ class Configuration(LoggedComponent):
                 raise
         # 3. Not found
-        self.log_and_raise(ValueError,
-             f"Resource not found: {resource_name}. "
-             f"Checked path: {resource_file}"
+        self.log_and_raise(
+            ValueError,
+            f"Resource not found: {resource_name}. Checked path: {resource_file}",
         )
-    def get_pipeline(self, pipeline_name: str) -> Union[BasePipelineDefinition, ProcedureDefinition, OrchestratorDefinition]:
+    def get_pipeline(self, pipeline_name: str) -> PipelineType:
         """
         Retrieve a pipeline definition by name.
         Uses lazy loading: checks memory first, then attempts to load from file.
@@ -125,18 +150,26 @@ class Configuration(LoggedComponent):
                 # Find the definition in the module
                 for attr_name in dir(config_module):
                     attr = getattr(config_module, attr_name)
-                    if isinstance(attr, (BasePipelineDefinition, ProcedureDefinition, OrchestratorDefinition)):
+                    if isinstance(
+                        attr,
+                        (
+                            BasePipelineDefinition,
+                            ProcedureDefinition,
+                            OrchestratorDefinition,
+                        ),
+                    ):
                         self.pipelines[pipeline_name] = attr
                         self.logger.info(f"Lazily loaded pipeline: {pipeline_name}")
                         return attr
             except Exception as e:
-                 self.log_exception(e, f"Failed to load pipeline: {pipeline_name}")
-                 raise
+                self.log_exception(e, f"Failed to load pipeline: {pipeline_name}")
+                raise
         # 3. Not found
-        self.log_and_raise(ValueError,
-                           f"Pipeline not found: {pipeline_name}. "
-                           f"Checked path: {pipeline_file}")
+        self.log_and_raise(
+            ValueError,
+            f"Pipeline not found: {pipeline_name}. Checked path: {pipeline_file}",
+        )
     def get_all_resources(self) -> dict[str, ResourceConfig]:
         """
@@ -149,19 +182,19 @@ class Configuration(LoggedComponent):
         # Discover all .env files
         for env_file in resources_dir.glob("*.env"):
-             # Simple logging to debug discovery
-             self.logger.debug(f"Found potential resource file: {env_file.name}")
+            # Simple logging to debug discovery
+            self.logger.debug(f"Found potential resource file: {env_file.name}")
-             resource_name = env_file.stem
-             if resource_name not in self.resources:
-                 try:
-                     self.resources[resource_name] = self._load_env_file(env_file)
-                 except Exception as e:
-                     self.logger.warning(f"Failed to load resource {resource_name}: {e}")
+            resource_name = env_file.stem
+            if resource_name not in self.resources:
+                try:
+                    self.resources[resource_name] = self._load_env_file(env_file)
+                except Exception as e:
+                    self.logger.warning(f"Failed to load resource {resource_name}: {e}")
         return self.resources
-    def get_all_pipelines(self) -> dict[str, BasePipelineDefinition]:
+    def get_all_pipelines(self) -> dict[str, PipelineType]:
         """
         Retrieve all pipelines.
         Scans the pipelines directory if not all loaded.
@@ -175,10 +208,10 @@ class Configuration(LoggedComponent):
             pipeline_name = pipeline_file.stem
             if pipeline_name not in self.pipelines:
-                 # Helper to trigger lazy load
-                 try:
+                # Helper to trigger lazy load
+                try:
                     self.get_pipeline(pipeline_name)
-                 except Exception as e:
-                     self.logger.warning(f"Failed to load pipeline {pipeline_name}: {e}")
+                except Exception as e:
+                    self.logger.warning(f"Failed to load pipeline {pipeline_name}: {e}")
         return self.pipelines

{easy_data_loader-0.1.0 → easy_data_loader-0.1.2}/src/easy_data_loader/custom_exceptions.py RENAMED

@@ -15,6 +15,7 @@ class DatabaseOperationException(Exception):
         self.message = message
         super().__init__(self.message)
 class InvalidFileException(Exception):
     def __init__(self, message: str = "The provided file is invalid or corrupted"):
         self.message = message
