PyPI - penwings - Versions diffs - 0.2.4__tar.gz → 0.3.0.dev1__tar.gz - Mend

penwings 0.2.4__tar.gz → 0.3.0.dev1__tar.gz


Files changed (29)

penwings-0.3.0.dev1/PKG-INFO ADDED

@@ -0,0 +1,286 @@
+Metadata-Version: 2.4
+Name: penwings
+Version: 0.3.0.dev1
+Summary: Lightweight library to handle data and reproduce workflows
+Author-email: Raf Blanckaert <r.blanckaert@outlook.com>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/Frissie/penwings
+Project-URL: Repository, https://github.com/Frissie/penwings
+Project-URL: Issues, https://github.com/Frissie/penwings/issues
+Keywords: data,workflow,reproducibility,sql,analytics
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pandas<4.0,>=2.2
+Requires-Dist: numpy<3.0,>=1.26
+Provides-Extra: sql
+Requires-Dist: sqlalchemy<3.0,>=2.0; extra == "sql"
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == "dev"
+Requires-Dist: ruff>=0.3; extra == "dev"
+Requires-Dist: mypy>=1.8; extra == "dev"
+Requires-Dist: sqlalchemy<3.0,>=2.0; extra == "dev"
+Dynamic: license-file
+# Penwings
+**Penwings** is a lightweight Python library for building **reproducible data workflows**.
+It provides simple, composable tools to:
+- manage project structure
+- standardize data access
+- cache SQL queries efficiently
+The goal is to reduce boilerplate and make data pipelines **faster, cleaner, and reproducible by default**.
+---
+## ✨ Features
+### 🗂️ Project Structure Management
+- Standardized folder setup for data science projects
+- Automatic project root detection
+- Flexible, extensible path system
+### 🧠 SQL → Parquet Caching
+- Execute SQL queries via SQLAlchemy
+- Automatically cache results as Parquet
+- Reuse cached data to avoid unnecessary database hits
+- Configurable refresh logic
+### ⚡ Lightweight & Modular
+- Minimal core dependencies (`pandas`, `numpy`)
+- Optional SQL support
+- Designed to scale into larger workflows
+---
+## 📦 Installation
+```bash
+pip install penwings
+````
+### Optional SQL support
+```bash
+pip install penwings[sql]
+```
+> Requires Python **3.11+**
+---
+## 🚀 Quick Start
+### 1. Project structure
+```python
+from penwings import ProjectPaths
+paths = ProjectPaths()
+print(paths.data)
+print(paths.models)
+```
+Creates a standardized structure like:
+```
+configs/
+data/
+  raw/
+  processed/
+  external/
+features/
+logs/
+models/
+notebooks/
+reports/
+  figures/
+  tables/
+sql/
+```
+---
+### 2. SQL caching
+```python
+from sqlalchemy import create_engine
+from penwings import SQLParquetCache
+engine = create_engine("sqlite:///example.db")
+cache = SQLParquetCache(
+    sql_dir="sql",
+    parquet_dir="cache",
+    conn=engine,
+    refresh_days=1
+)
+```
+---
+## 📊 Usage
+### Using SQL files
+```python
+df = cache.get("sales.sql")
+```
+* Loads from **Parquet** if cached
+* Otherwise executes SQL and caches result
+---
+### Using raw SQL
+```python
+query = "SELECT * FROM sales WHERE month = '2026-02'"
+df = cache.get(
+    sql=query,
+    parquet_name="sales_feb"
+)
+```
+---
+### Cache behavior
+```python
+df = cache.get("sales.sql", force=True)
+```
+* `force=True` → always re-run SQL
+* `refresh_days=N` → cache expires after N days
+* returns:
+  * `DataFrame`
+---
+## 🧩 ProjectPaths
+Create only specific parts of a project:
+```python
+paths = ProjectPaths(folders=["data", "ml"])
+```
+Custom directories:
+```python
+paths = ProjectPaths(
+    custom_dirs={
+        "modules": "src/modules",
+        "views": "src/views"
+    }
+)
+```
+Access paths:
+```python
+paths.data
+paths["models"]
+paths.as_dict()
+```
+---
+## 🧠 Design Philosophy
+Penwings is built around a few core ideas:
+* **Reproducibility first** → deterministic data access via caching
+* **Convention over configuration** → sensible defaults for structure
+* **Composable building blocks** → small tools that work well together
+* **Lightweight core** → no heavy framework overhead
+---
+## 🛣️ Roadmap
+* Pipeline abstraction (data workflows as steps)
+* Improved SQL utilities and query management
+* Integration with feature engineering workflows
+* Better caching strategies and metadata tracking
+---
+## 🔢 Versioning
+Penwings follows **semantic versioning**:
+* **MAJOR** → breaking changes
+* **MINOR** → new features
+* **PATCH** → bug fixes
+---
+## 🤝 Contributing
+Contributions are welcome!
+1. Fork the repository
+2. Create a branch (`feature/my-feature`)
+3. Commit your changes
+4. Open a pull request
+---
+## 📄 License
+MIT License — see [LICENSE](LICENSE)
+---
+## 💡 Example Workflow
+```python
+from sqlalchemy import create_engine
+from penwings import ProjectPaths, SQLParquetCache
+# Setup project structure
+paths = ProjectPaths()
+# Setup SQL cache
+engine = create_engine("sqlite:///example.db")
+cache = SQLParquetCache(
+    sql_dir=paths.sql,
+    parquet_dir=paths.data,
+    conn=engine
+)
+# Load data
+df_sales = cache.get("sales.sql")
+```
+---
+## ⭐ Why Penwings?
+Penwings sits between:
+* ad-hoc scripts ❌
+* heavy frameworks ❌
+It gives you just enough structure to:
+* stay organized
+* move fast
+* keep workflows reproducible
+without getting in your way.
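
The `Requires-Dist` and `Provides-Extra` headers in the PKG-INFO above separate the core requirements from the optional extras. As a rough illustration, the sketch below groups them using only the standard library. The helper name `group_requirements` is hypothetical, the header text is an excerpt copied from the hunk above, and the marker handling is deliberately naive: it only recognises the simple `extra == "name"` form.

```python
from email.parser import Parser

# Excerpt of the PKG-INFO headers shown in the hunk above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: penwings
Version: 0.3.0.dev1
Requires-Dist: pandas<4.0,>=2.2
Requires-Dist: numpy<3.0,>=1.26
Provides-Extra: sql
Requires-Dist: sqlalchemy<3.0,>=2.0; extra == "sql"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.3; extra == "dev"
Requires-Dist: mypy>=1.8; extra == "dev"
Requires-Dist: sqlalchemy<3.0,>=2.0; extra == "dev"
"""


def group_requirements(pkg_info_text: str) -> dict[str, list[str]]:
    """Group Requires-Dist entries by extra; core requirements go under ''.

    Hypothetical helper for illustration, not part of penwings.
    """
    headers = Parser().parsestr(pkg_info_text, headersonly=True)
    grouped: dict[str, list[str]] = {"": []}
    for entry in headers.get_all("Requires-Dist", []):
        requirement, _, marker = entry.partition(";")
        extra = ""
        # Naive marker handling: only the plain `extra == "name"` form is handled.
        if "extra ==" in marker:
            extra = marker.split("==", 1)[1].strip().strip("\"'")
        grouped.setdefault(extra, []).append(requirement.strip())
    return grouped


requirements = group_requirements(PKG_INFO)
print("core:", requirements[""])
print("sql: ", requirements["sql"])
print("dev: ", requirements["dev"])
```

With this grouping, `sqlalchemy` appears only under the `sql` and `dev` extras, so a plain `pip install penwings` installs just `pandas` and `numpy`.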

penwings-0.3.0.dev1/README.md ADDED

@@ -0,0 +1,256 @@
+# Penwings
+**Penwings** is a lightweight Python library for building **reproducible data workflows**.
+It provides simple, composable tools to:
+- manage project structure
+- standardize data access
+- cache SQL queries efficiently
+The goal is to reduce boilerplate and make data pipelines **faster, cleaner, and reproducible by default**.
+---
+## ✨ Features
+### 🗂️ Project Structure Management
+- Standardized folder setup for data science projects
+- Automatic project root detection
+- Flexible, extensible path system
+### 🧠 SQL → Parquet Caching
+- Execute SQL queries via SQLAlchemy
+- Automatically cache results as Parquet
+- Reuse cached data to avoid unnecessary database hits
+- Configurable refresh logic
+### ⚡ Lightweight & Modular
+- Minimal core dependencies (`pandas`, `numpy`)
+- Optional SQL support
+- Designed to scale into larger workflows
+---
+## 📦 Installation
+```bash
+pip install penwings
+````
+### Optional SQL support
+```bash
+pip install penwings[sql]
+```
+> Requires Python **3.11+**
+---
+## 🚀 Quick Start
+### 1. Project structure
+```python
+from penwings import ProjectPaths
+paths = ProjectPaths()
+print(paths.data)
+print(paths.models)
+```
+Creates a standardized structure like:
+```
+configs/
+data/
+  raw/
+  processed/
+  external/
+features/
+logs/
+models/
+notebooks/
+reports/
+  figures/
+  tables/
+sql/
+```
+---
+### 2. SQL caching
+```python
+from sqlalchemy import create_engine
+from penwings import SQLParquetCache
+engine = create_engine("sqlite:///example.db")
+cache = SQLParquetCache(
+    sql_dir="sql",
+    parquet_dir="cache",
+    conn=engine,
+    refresh_days=1
+)
+```
+---
+## 📊 Usage
+### Using SQL files
+```python
+df = cache.get("sales.sql")
+```
+* Loads from **Parquet** if cached
+* Otherwise executes SQL and caches result
+---
+### Using raw SQL
+```python
+query = "SELECT * FROM sales WHERE month = '2026-02'"
+df = cache.get(
+    sql=query,
+    parquet_name="sales_feb"
+)
+```
+---
+### Cache behavior
+```python
+df = cache.get("sales.sql", force=True)
+```
+* `force=True` → always re-run SQL
+* `refresh_days=N` → cache expires after N days
+* returns:
+  * `DataFrame`
+---
+## 🧩 ProjectPaths
+Create only specific parts of a project:
+```python
+paths = ProjectPaths(folders=["data", "ml"])
+```
+Custom directories:
+```python
+paths = ProjectPaths(
+    custom_dirs={
+        "modules": "src/modules",
+        "views": "src/views"
+    }
+)
+```
+Access paths:
+```python
+paths.data
+paths["models"]
+paths.as_dict()
+```
+---
+## 🧠 Design Philosophy
+Penwings is built around a few core ideas:
+* **Reproducibility first** → deterministic data access via caching
+* **Convention over configuration** → sensible defaults for structure
+* **Composable building blocks** → small tools that work well together
+* **Lightweight core** → no heavy framework overhead
+---
+## 🛣️ Roadmap
+* Pipeline abstraction (data workflows as steps)
+* Improved SQL utilities and query management
+* Integration with feature engineering workflows
+* Better caching strategies and metadata tracking
+---
+## 🔢 Versioning
+Penwings follows **semantic versioning**:
+* **MAJOR** → breaking changes
+* **MINOR** → new features
+* **PATCH** → bug fixes
+---
+## 🤝 Contributing
+Contributions are welcome!
+1. Fork the repository
+2. Create a branch (`feature/my-feature`)
+3. Commit your changes
+4. Open a pull request
+---
+## 📄 License
+MIT License — see [LICENSE](LICENSE)
+---
+## 💡 Example Workflow
+```python
+from sqlalchemy import create_engine
+from penwings import ProjectPaths, SQLParquetCache
+# Setup project structure
+paths = ProjectPaths()
+# Setup SQL cache
+engine = create_engine("sqlite:///example.db")
+cache = SQLParquetCache(
+    sql_dir=paths.sql,
+    parquet_dir=paths.data,
+    conn=engine
+)
+# Load data
+df_sales = cache.get("sales.sql")
+```
+---
+## ⭐ Why Penwings?
+Penwings sits between:
+* ad-hoc scripts ❌
+* heavy frameworks ❌
+It gives you just enough structure to:
+* stay organized
+* move fast
+* keep workflows reproducible
+without getting in your way.
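
The README's "Cache behavior" section says that `force=True` always re-runs the SQL and that `refresh_days=N` expires the cache after N days. The diff does not show how `SQLParquetCache` decides this, so the following is only a minimal sketch of one plausible rule, assuming freshness is judged by the cached file's modification time. The function name `should_refresh` is hypothetical, and an empty file stands in for a real Parquet file.

```python
import os
import tempfile
import time
from datetime import datetime, timedelta
from pathlib import Path


def should_refresh(parquet_path: Path, refresh_days: int, force: bool = False) -> bool:
    """Return True when the query should be re-run instead of reading the cache.

    Hypothetical helper: the mtime-based rule is an assumption, not the
    library's confirmed behaviour.
    """
    if force or not parquet_path.exists():
        return True
    modified = datetime.fromtimestamp(parquet_path.stat().st_mtime)
    return datetime.now() - modified > timedelta(days=refresh_days)


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "sales.parquet"

    # No cached file yet: the query has to run.
    missing_needs_refresh = should_refresh(cached, refresh_days=1)

    cached.write_bytes(b"")  # stand-in for a real Parquet file
    fresh_needs_refresh = should_refresh(cached, refresh_days=1)
    forced_refresh = should_refresh(cached, refresh_days=1, force=True)

    # Backdate the file by two days to simulate an expired cache entry.
    two_days_ago = time.time() - 2 * 24 * 60 * 60
    os.utime(cached, (two_days_ago, two_days_ago))
    stale_needs_refresh = should_refresh(cached, refresh_days=1)

print(missing_needs_refresh, fresh_needs_refresh, forced_refresh, stale_needs_refresh)
```

An mtime check keeps the sketch free of extra metadata files. A real implementation might instead record the query text or a timestamp alongside the Parquet file.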

{penwings-0.2.4 → penwings-0.3.0.dev1}/pyproject.toml RENAMED

@@ -25,28 +25,17 @@ classifiers = [
 ]
 dependencies = [
-    "sqlalchemy>=2.0,<3.0",
-    "pyodbc>=5.0,<6.0",
     "pandas>=2.2,<4.0",
     "numpy>=1.26,<3.0",
 ]
 [project.optional-dependencies]
-excel = ["openpyxl>=3.1,<4.0"]
-ml = ["scikit-learn>=1.4,<2.0"]
-scipy = ["scipy>=1.11,<2.0"]
-optuna = ["optuna>=3.5,<5.0"]
-all = [
-    "openpyxl>=3.1,<4.0",
-    "scikit-learn>=1.4,<2.0",
-    "scipy>=1.11,<2.0",
-    "optuna>=3.5,<5.0",
-]
+sql = ["sqlalchemy>=2.0,<3.0"]
 dev = [
     "pytest>=8.0",
     "ruff>=0.3",
     "mypy>=1.8",
+    "sqlalchemy>=2.0,<3.0",
 ]
 [project.urls]

penwings-0.3.0.dev1/src/penwings/io/__init__.py ADDED

@@ -0,0 +1,5 @@
+from .cache import SQLParquetCache
+__all__ = [
+    "SQLParquetCache",
+]

{penwings-0.2.4 → penwings-0.3.0.dev1}/src/penwings/io/cache.py RENAMED

@@ -1,12 +1,16 @@
+from __future__ import annotations
 import pandas as pd
-from sqlalchemy import Engine
 from pathlib import Path
 from datetime import datetime, timedelta
-from typing import Unpack, Optional
+from typing import Unpack, Optional, TYPE_CHECKING
 from ..utils._typing import SQLParquetKwargs
 from ..utils._decorators import timing_sql
+if TYPE_CHECKING:
+    from sqlalchemy.engine import Engine
 class SQLParquetCache:
     """
@@ -62,6 +66,10 @@ class SQLParquetCache:
         verbose: bool = True,
         **kwargs: Unpack[SQLParquetKwargs],
     ):
+        try:
+            import sqlalchemy  # noqa: F401
+        except ImportError:
+            raise ImportError("SQLParquetCache requires 'sqlalchemy'. Install it with: pip install penwings[sql]")
         if sql_dir is not None:
             self.sql_dir: Path = Path(sql_dir)
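
The hunk above moves the `Engine` import behind `TYPE_CHECKING` and checks for `sqlalchemy` only when `SQLParquetCache` is constructed, so importing the package no longer requires the optional dependency. The same guard can be factored into a small reusable helper, sketched below. `require_optional` is a hypothetical name, not something in the diff, and unlike the diff this sketch chains the original error with `from exc` so the underlying import failure stays visible in the traceback.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str = "penwings") -> ModuleType:
    """Import an optional dependency, or raise an ImportError naming the extra.

    Hypothetical helper for illustration, not part of the diff.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install {package}[{extra}]"
        ) from exc


# A standard-library module stands in for an installed optional dependency.
json_module = require_optional("json", extra="sql")

# A deliberately nonexistent module name stands in for a missing one.
try:
    require_optional("penwings_missing_dependency_example", extra="sql")
except ImportError as error:
    message = str(error)
    print(message)
```

Deferring the check to call time means users who only need the path utilities never see the error.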

penwings-0.3.0.dev1/src/penwings/paths/__init__.py ADDED

@@ -0,0 +1,6 @@
+from .project_paths import ProjectPaths, ConfigPaths
+__all__ = [
+    "ProjectPaths",
+    "ConfigPaths",
+]
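
The README in this release lists "Automatic project root detection" as a `ProjectPaths` feature, but the diff does not show the detection rule. A common approach, sketched here purely as an assumption, is to walk up from a starting directory until a marker file or directory is found. The function name `find_project_root` and the marker names are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Assumed marker names; the library's real detection rule is not shown in this diff.
ROOT_MARKERS = ("pyproject.toml", ".git")


def find_project_root(start: Path, markers: tuple[str, ...] = ROOT_MARKERS) -> Path | None:
    """Walk upwards from `start` and return the first directory containing a marker."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway project with a marker file at its root.
    root = Path(tmp).resolve() / "demo_project"
    nested = root / "notebooks" / "exploration"
    nested.mkdir(parents=True)
    (root / "pyproject.toml").write_text("[project]\nname = 'demo'\n")

    detected = find_project_root(nested)
    found_expected_root = detected == root

print(found_expected_root)
```

Returning `None` rather than raising leaves the fallback, such as using the current working directory, to the caller.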
