datacompose-0.2.8.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (101)
  1. datacompose-0.2.8/CHANGELOG.md +227 -0
  2. datacompose-0.2.8/LICENSE +21 -0
  3. datacompose-0.2.8/MANIFEST.in +38 -0
  4. datacompose-0.2.8/PKG-INFO +176 -0
  5. datacompose-0.2.8/README.md +126 -0
  6. datacompose-0.2.8/datacompose/__init__.py +1 -0
  7. datacompose-0.2.8/datacompose/cli/__init__.py +5 -0
  8. datacompose-0.2.8/datacompose/cli/colors.py +80 -0
  9. datacompose-0.2.8/datacompose/cli/commands/__init__.py +3 -0
  10. datacompose-0.2.8/datacompose/cli/commands/add.py +242 -0
  11. datacompose-0.2.8/datacompose/cli/commands/init.py +477 -0
  12. datacompose-0.2.8/datacompose/cli/commands/list.py +118 -0
  13. datacompose-0.2.8/datacompose/cli/config.py +82 -0
  14. datacompose-0.2.8/datacompose/cli/main.py +59 -0
  15. datacompose-0.2.8/datacompose/cli/validation.py +72 -0
  16. datacompose-0.2.8/datacompose/generators/__init__.py +3 -0
  17. datacompose-0.2.8/datacompose/generators/base.py +181 -0
  18. datacompose-0.2.8/datacompose/generators/pyspark/__init__.py +1 -0
  19. datacompose-0.2.8/datacompose/generators/pyspark/generator.py +46 -0
  20. datacompose-0.2.8/datacompose/operators/__init__.py +21 -0
  21. datacompose-0.2.8/datacompose/operators/primitives.py +633 -0
  22. datacompose-0.2.8/datacompose/transformers/__init__.py +0 -0
  23. datacompose-0.2.8/datacompose/transformers/discovery.py +186 -0
  24. datacompose-0.2.8/datacompose/transformers/text/__init__.py +1 -0
  25. datacompose-0.2.8/datacompose/transformers/text/addresses/__init__.py +1 -0
  26. datacompose-0.2.8/datacompose/transformers/text/addresses/pyspark/pyspark_primitives.py +2024 -0
  27. datacompose-0.2.8/datacompose/transformers/text/datetimes/pyspark/pyspark_primitives.py +1383 -0
  28. datacompose-0.2.8/datacompose/transformers/text/emails/__init__.py +1 -0
  29. datacompose-0.2.8/datacompose/transformers/text/emails/pyspark/pyspark_primitives.py +850 -0
  30. datacompose-0.2.8/datacompose/transformers/text/phone_numbers/__init__.py +0 -0
  31. datacompose-0.2.8/datacompose/transformers/text/phone_numbers/pyspark/pyspark_primitives.py +1006 -0
  32. datacompose-0.2.8/datacompose.egg-info/PKG-INFO +176 -0
  33. datacompose-0.2.8/datacompose.egg-info/SOURCES.txt +100 -0
  34. datacompose-0.2.8/datacompose.egg-info/dependency_links.txt +1 -0
  35. datacompose-0.2.8/datacompose.egg-info/entry_points.txt +2 -0
  36. datacompose-0.2.8/datacompose.egg-info/requires.txt +21 -0
  37. datacompose-0.2.8/datacompose.egg-info/top_level.txt +1 -0
  38. datacompose-0.2.8/pyproject.toml +92 -0
  39. datacompose-0.2.8/setup.cfg +5 -0
  40. datacompose-0.2.8/tests/__init__.py +0 -0
  41. datacompose-0.2.8/tests/conftest.py +53 -0
  42. datacompose-0.2.8/tests/integration/__init__.py +1 -0
  43. datacompose-0.2.8/tests/integration/test_end_to_end.py +189 -0
  44. datacompose-0.2.8/tests/integration/test_full_workflow.py +298 -0
  45. datacompose-0.2.8/tests/integration/test_generated_imports.py +149 -0
  46. datacompose-0.2.8/tests/unit/cli/.venv/bin/activate_this.py +59 -0
  47. datacompose-0.2.8/tests/unit/cli/.venv/lib/python3.12/site-packages/_virtualenv.py +101 -0
  48. datacompose-0.2.8/tests/unit/cli/__init__.py +1 -0
  49. datacompose-0.2.8/tests/unit/cli/build/__init__.py +0 -0
  50. datacompose-0.2.8/tests/unit/cli/build/postgres/__init__.py +0 -0
  51. datacompose-0.2.8/tests/unit/cli/build/postgres/clean_emails/__init__.py +0 -0
  52. datacompose-0.2.8/tests/unit/cli/build/postgres/clean_emails/email_cleaner_udf_spec.yaml +67 -0
  53. datacompose-0.2.8/tests/unit/cli/build/postgres/clean_emails/test_email_cleaner_udf.py +102 -0
  54. datacompose-0.2.8/tests/unit/cli/build/spark/__init__.py +0 -0
  55. datacompose-0.2.8/tests/unit/cli/build/spark/clean_emails/__init__.py +0 -0
  56. datacompose-0.2.8/tests/unit/cli/build/spark/clean_emails/email_cleaner_udf.py +196 -0
  57. datacompose-0.2.8/tests/unit/cli/build/spark/clean_emails/email_cleaner_udf_spec.yaml +67 -0
  58. datacompose-0.2.8/tests/unit/cli/build/spark/clean_emails/test_email_cleaner_udf.py +102 -0
  59. datacompose-0.2.8/tests/unit/cli/test_add_command.py +187 -0
  60. datacompose-0.2.8/tests/unit/cli/test_add_command_complete.py +451 -0
  61. datacompose-0.2.8/tests/unit/cli/test_add_default_target.py +327 -0
  62. datacompose-0.2.8/tests/unit/cli/test_add_validation.py +96 -0
  63. datacompose-0.2.8/tests/unit/cli/test_config.py +249 -0
  64. datacompose-0.2.8/tests/unit/cli/test_init_command.py +123 -0
  65. datacompose-0.2.8/tests/unit/cli/test_init_command_complete.py +654 -0
  66. datacompose-0.2.8/tests/unit/cli/test_list_command.py +239 -0
  67. datacompose-0.2.8/tests/unit/cli/test_main.py +45 -0
  68. datacompose-0.2.8/tests/unit/cli/test_main_complete.py +372 -0
  69. datacompose-0.2.8/tests/unit/cli/test_validation_complete.py +400 -0
  70. datacompose-0.2.8/tests/unit/generators/__init__.py +1 -0
  71. datacompose-0.2.8/tests/unit/generators/test_base_generator.py +203 -0
  72. datacompose-0.2.8/tests/unit/generators/test_spark_generator.py +265 -0
  73. datacompose-0.2.8/tests/unit/operators/test_compose_conditions.py +560 -0
  74. datacompose-0.2.8/tests/unit/operators/test_conditional_auto_detection.py +192 -0
  75. datacompose-0.2.8/tests/unit/operators/test_conditional_core.py +751 -0
  76. datacompose-0.2.8/tests/unit/operators/test_conditional_real_world.py +340 -0
  77. datacompose-0.2.8/tests/unit/operators/test_operators.py +936 -0
  78. datacompose-0.2.8/tests/unit/operators/test_primitives_complete.py +329 -0
  79. datacompose-0.2.8/tests/unit/transformers/__init__.py +1 -0
  80. datacompose-0.2.8/tests/unit/transformers/test_discovery.py +171 -0
  81. datacompose-0.2.8/tests/unit/transformers/text/common/test_common.py +0 -0
  82. datacompose-0.2.8/tests/unit/transformers/text/test_addresses/test_building_unit_extraction.py +750 -0
  83. datacompose-0.2.8/tests/unit/transformers/text/test_addresses/test_city_state_extraction.py +790 -0
  84. datacompose-0.2.8/tests/unit/transformers/text/test_addresses/test_clean_addresses.py +270 -0
  85. datacompose-0.2.8/tests/unit/transformers/text/test_addresses/test_country_extraction.py +206 -0
  86. datacompose-0.2.8/tests/unit/transformers/text/test_addresses/test_data_addresses.py +627 -0
  87. datacompose-0.2.8/tests/unit/transformers/text/test_addresses/test_po_box_extraction.py +276 -0
  88. datacompose-0.2.8/tests/unit/transformers/text/test_addresses/test_street_extraction.py +997 -0
  89. datacompose-0.2.8/tests/unit/transformers/text/test_addresses/test_zip_code_extraction.py +955 -0
  90. datacompose-0.2.8/tests/unit/transformers/text/test_datetimes/test_datetime_data_quality.py +531 -0
  91. datacompose-0.2.8/tests/unit/transformers/text/test_datetimes/test_datetime_extraction.py +997 -0
  92. datacompose-0.2.8/tests/unit/transformers/text/test_datetimes/test_datetime_integration.py +480 -0
  93. datacompose-0.2.8/tests/unit/transformers/text/test_datetimes/test_datetime_performance.py +396 -0
  94. datacompose-0.2.8/tests/unit/transformers/text/test_datetimes/test_datetime_regression.py +452 -0
  95. datacompose-0.2.8/tests/unit/transformers/text/test_datetimes/test_datetime_timezones.py +439 -0
  96. datacompose-0.2.8/tests/unit/transformers/text/test_emails/test_debug_long_emails.py +38 -0
  97. datacompose-0.2.8/tests/unit/transformers/text/test_emails/test_email_extraction.py +936 -0
  98. datacompose-0.2.8/tests/unit/transformers/text/test_emails/test_email_optimized.py +204 -0
  99. datacompose-0.2.8/tests/unit/transformers/text/test_phone_numbers/test_phone_extraction.py +719 -0
  100. datacompose-0.2.8/tests/unit/transformers/text/test_phone_numbers/test_phone_formatting.py +487 -0
  101. datacompose-0.2.8/tests/yaml_specs/__init__.py +1 -0
@@ -0,0 +1,227 @@
+ # Changelog
+
+ All notable changes to Datacompose will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [Unreleased]
+
+ ## [0.2.7.0] - 2025-09-11
+
+ ### Fixed
+ - **SHA256 Transformer Memory Issues**: Fixed Java heap space OutOfMemoryError in email and phone number SHA256 hashing
+   - Set `standardize_first=False` by default in tests to avoid complex Spark query planning issues
+   - All SHA256 hashing tests now pass without memory errors
+
+ - **CLI Configuration Handling**: Improved config file error handling in the add command
+   - Add command now properly fails with a helpful error message when no config file exists
+   - Add command correctly handles malformed JSON config files
+   - "pyspark" is now the default target when explicitly called without config
+
+ - **Test Fixtures**: Added missing `diverse_test_data` fixture for conditional operator tests
+   - Created comprehensive test dataset with category, value, size, id, and text columns
+   - Fixed all conditional logic tests in `test_conditional_core.py`
+   - Fixed all real-world scenario tests in `test_conditional_real_world.py`
+
+ - **Test Assertions**: Updated test expectations to match actual behavior
+   - Fixed init command test to expect the full command in the error message ("datacompose init --force")
+   - Updated conditional test assertions for non-standardized hashing behavior
+
+ ### Changed
+ - **Default Target Behavior**: ConfigLoader now returns "pyspark" as fallback when no config is provided programmatically
+
+ ## [0.2.6.0] - 2025-08-24
+
+ ### Added
+ - **Automatic Conditional Detection**: Smart detection of conditional operators based on naming patterns (see the sketch below)
+   - Functions starting with `is_`, `has_`, `needs_`, `should_`, `can_`, `contains_`, `matches_`, `equals_`, `starts_with_`, `ends_with_` are automatically detected as conditionals
+   - Eliminates the need for explicit `is_conditional=True` in most cases
+   - Explicit override still available when needed via the `is_conditional` parameter
+ - **Phone Number Processing Pipeline**: Complete phone number validation and formatting example
+   - Letter-to-number conversion (1-800-FLOWERS)
+   - NANP validation and formatting
+   - Toll-free number detection
+   - E.164 and parentheses formatting
+
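To make the naming-based detection concrete, here is a rough sketch of how registration might look. The registry name, the decorator-style `register` call, and the function bodies are assumptions for illustration, not the package's documented API:

```python
from pyspark.sql import Column, functions as F

from datacompose.operators.primitives import PrimitiveRegistry  # module path per the package layout

text = PrimitiveRegistry("text")  # hypothetical registry for this sketch

@text.register()  # "is_" prefix -> auto-detected as a conditional; no is_conditional=True needed
def is_empty(col: Column) -> Column:
    return col.isNull() | (F.trim(col) == "")

@text.register(is_conditional=True)  # explicit override for a name the heuristic would not catch
def blank_like(col: Column) -> Column:
    return F.length(F.trim(F.coalesce(col, F.lit("")))) == 0
```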
47
+ ### Changed
48
+ - **Conditional Operator Registration**: `is_conditional` parameter now optional with smart defaults
49
+ - **Test Organization**: Consolidated conditional tests into three focused files:
50
+ - `test_conditional_core.py` - Core functionality, logic, errors, parameters, and performance
51
+ - `test_conditional_real_world.py` - Real-world pipeline scenarios
52
+ - `test_conditional_auto_detection.py` - Auto-detection feature tests
53
+
54
+ ### Fixed
55
+ - **Phone Number Validation**: Updated NANP validation to be more flexible for testing scenarios
56
+
57
+ ## [0.2.5.3] - 2025-08-23
58
+
59
+ ### Added
60
+ - **Compose Decorator Enhancement**: Auto-detection of PrimitiveRegistry instances in function globals
61
+ - Compose decorator now automatically discovers all namespace instances without explicit passing
62
+ - Improved namespace resolution using function's global scope instead of module globals
63
+ - Better support for multiple namespaces in composed functions
64
+
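Conceptually, the discovery step described above amounts to scanning the decorated function's `__globals__` for registry instances — a simplified sketch of the idea, not the actual implementation:

```python
from datacompose.operators.primitives import PrimitiveRegistry

def discover_registries(func):
    """Hypothetical helper: collect every PrimitiveRegistry visible in the
    decorated function's own global scope (func.__globals__), so registries
    defined in that module are found without being passed explicitly."""
    return {
        name: obj
        for name, obj in func.__globals__.items()
        if isinstance(obj, PrimitiveRegistry)
    }
```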
+ ### Fixed
+ - **Namespace Resolution**: Fixed global namespace lookups to use the function's own globals
+   - PipelineCompiler now correctly resolves namespaces from the decorated function's scope
+   - Fallback compose mode uses function globals for namespace discovery
+   - Prevents namespace resolution errors when registries are defined in different modules
+
+ ### Changed
+ - **Phone Number Tests**: Updated test imports and formatting for phone number primitives
+ - **Test Organization**: Added comprehensive conditional composition tests
+
+ ## [0.2.5.2] - 2025-08-22
+
+ ### Fixed
+ - **Import Paths**: Updated import paths in phone_numbers pyspark primitives for clarity and consistency
+ - **Documentation**: Improved docstrings across primitives
+
+ ## [0.2.5.1] - 2025-08-22
+
+ ### Changed
+ - **Import Paths**: Renamed imports to be more transparent and clear
+
+ ### Added
+ - **Documentation**: Added clear module-level docstrings throughout the codebase
+ - **Unit Tests**: Added comprehensive unit tests for default initialization and datacompose.json configuration
+   - Tests for default target auto-selection with a single target
+   - Tests for explicit target override behavior
+   - Tests for configuration file validation
+   - Tests for output path resolution from config
+
+ ### Fixed
+ - **CLI Tests**: Fixed all failing default target configuration tests
+   - Added proper validation mocks for non-existent platforms in tests
+   - Fixed error message assertion for invalid platform validation
+   - Properly mocked the generator class hierarchy for output path testing
+   - All 13 CLI default target tests now passing (100% pass rate)
+
+ ## [0.2.5] - 2025-08-21
+
+ ### Changed
+ - **Documentation**: Streamlined README to be more concise
+   - Removed extensive code examples (now on website)
+   - Reduced from 390 lines to 44 lines
+   - Focused on core features and philosophy
+   - Added link to datacompose.io for detailed documentation
+
+ ### Fixed
+ - **Test Suite**: Fixed failing CLI tests for `add` command
+   - Tests now properly mock ConfigLoader for isolated filesystem environments
+   - `test_add_invalid_transformer` correctly validates the transformer-not-found error
+   - `test_complete_transformer_success` updated to match actual transformer names
+   - All CLI command tests passing with proper configuration mocking
+
+ ## [0.2.4] - 2025-08-13
+
+ ### Added
+ - **Published to PyPI**: Package is now available via `pip install datacompose`
+ - **Phone Number Primitives**: Complete set of 45+ phone number transformation functions
+   - NANP validation and formatting (North American Numbering Plan)
+   - International phone support with E.164 formatting
+   - Extension handling and toll-free detection
+   - Phone number extraction from text
+   - Letter-to-number conversion (1-800-FLOWERS support)
+ - **Address Improvements**: Enhanced street extraction and standardization
+   - Fixed numbered street extraction ("5th Avenue" correctly returns "5th")
+   - Improved null handling in street extraction
+   - Custom mapping support for street suffix standardization
+ - **Utils Export**: Generated code now includes `utils/primitives.py` for standalone deployment
+   - PrimitiveRegistry class embedded with generated code
+   - No runtime dependency on the datacompose package
+   - Fallback imports for maximum compatibility
+ - **Comprehensive Test Coverage**: Improved test coverage from 87% to 92%
+   - Added 18 new tests for the primitives.py module (70% → 86% coverage)
+   - Created comprehensive test suites for all CLI commands
+   - Added full end-to-end integration tests (init → add → transform)
+   - validation.py achieved 100% coverage
+   - add.py improved to 99% coverage
+
+ ### Changed
+ - **BREAKING**: Renamed `PrimitiveNameSpace` to `PrimitiveRegistry` throughout the codebase
+ - **Major Architecture Shift**: Removed the YAML/spec file system entirely
+   - No more YAML specifications or JSON replacements
+   - Direct primitive file copying instead of template rendering
+   - Simplified discovery system works with transformer directories
+   - Removed `validate` command completely
+ - **Import Strategy**: Primitives now try the local utils import first, then fall back to the datacompose package (sketched below)
+ - **File Naming**: Generated files use plural form with a primitives suffix
+   - `emails` → `email_primitives.py`
+   - `addresses` → `address_primitives.py`
+   - `phone_numbers` → `phone_primitives.py`
+
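The fallback import strategy presumably sits at the top of each generated primitives file, along these lines (a sketch; the exact module paths are assumptions):

```python
try:
    # Prefer the utils/primitives.py copied into the build output next to the generated code
    from utils.primitives import PrimitiveRegistry
except ImportError:
    # Fall back to the installed datacompose package
    from datacompose.operators.primitives import PrimitiveRegistry
```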
+ ### Fixed
+ - **Critical**: Fixed utils/primitives.py output location to be shared across all transformers
+   - Utils module now generates at top-level build/utils/ instead of per-transformer
+   - All transformers share the same PrimitiveRegistry implementation
+   - Prevents duplicate utils modules and ensures consistency
+ - Phone `normalize_separators` now correctly handles parentheses: `(555)123-4567` → `555-123-4567` (see the sketch after this list)
+ - Street extraction for numbered streets (the "5th Avenue" issue)
+ - Compose decorator now requires the namespace to be passed explicitly for proper method resolution
+ - `standardize_street_suffix` applies both custom and default mappings correctly
+ - Test failures due to namespace resolution in the compose decorator
+ - Generator initialization error handling in the add command
+
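The parentheses case reduces to one extra rewrite before the general separator cleanup — an illustrative Spark expression, not the package's exact implementation:

```python
from pyspark.sql import functions as F

# "(555)123-4567" -> "555-123-4567": rewrite "(NNN)" as "NNN-" before normalizing dots and spaces
phone = F.regexp_replace(F.col("phone"), r"\((\d{3})\)\s*", "$1-")
```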
+ ### Removed
+ - All YAML/spec file functionality
+ - PostgreSQL generator references
+ - Jinja2 template dependencies
+ - `validate` command from the CLI
+ - Old Spark integration tests (replaced with end-to-end tests)
+
+ ## [0.2.0] - 2024-XX-XX
+
+ ### Added
+ - **Primitive System**: New composable primitive architecture for building data pipelines
+   - `SmartPrimitive` class for partial application and parameter binding (see the sketch below)
+   - `PrimitiveRegistry` (originally PrimitiveNameSpace) for organizing related transformations
+   - Support for conditional primitives (boolean-returning functions)
+ - **Conditional Compilation**: AST-based pipeline compilation with if/else support
+   - `PipelineCompiler` for parsing and compiling conditional logic
+   - `StablePipeline` for executing compiled pipelines
+   - Full support for nested conditionals and complex branching
+ - **Comprehensive Testing**: 44+ tests covering conditional compilation scenarios
+   - Edge cases and null handling
+   - Complex nested logic
+   - Data-driven conditions
+   - Performance optimization tests
+   - Real-world use cases
+   - Parameter handling
+   - Error handling
+ - **Improved Architecture**: Dual approach for different runtime constraints
+   - Primitives for flexible runtimes (Python, Spark, Scala)
+   - Templates for rigid targets (SQL, PostgreSQL)
+
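Partial application here means that calling a primitive with only configuration arguments returns a reusable transformation. A sketch of that idea — the call pattern is inferred from the description above, not taken from a documented API:

```python
from pyspark.sql import Column, functions as F

from datacompose.operators.primitives import SmartPrimitive  # module path assumed

def pad_left(col: Column, width: int = 10, fill: str = "0") -> Column:
    return F.lpad(col, width, fill)

pad = SmartPrimitive(pad_left)   # wrap a plain Column -> Column function
pad_zip = pad(width=5)           # bind parameters only -> get back a reusable callable
df = df.withColumn("zip_padded", pad_zip(F.col("zip")))  # apply it like any column expression
```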
+ ### Changed
+ - Made PySpark an optional dependency
+ - Reorganized test structure with focused test files and shared fixtures
+ - Refined architecture to support both template-based and primitive-based approaches
+
+ ### Fixed
+ - Import paths for pipeline compilation modules
+ - Missing return statements in pipeline execution
+ - Conditional logic to use accumulated results correctly
+
+ ## [0.1.4] - 2024-XX-XX
+
+ ### Added
+ - Initial release of Datacompose
+ - Core framework for generating data cleaning UDFs
+ - Support for Spark, PostgreSQL, and Pandas targets
+ - Built-in specifications for common data cleaning tasks:
+   - Email address cleaning
+   - Phone number normalization
+   - Address standardization
+   - Job title standardization
+   - Date/time parsing
+ - CLI interface with commands:
+   - `datacompose init` - Initialize project
+   - `datacompose add` - Generate UDFs from specs
+   - `datacompose list` - List available targets and specs
+   - `datacompose validate` - Validate specification files
+ - YAML-based specification format
+ - Jinja2 templating for code generation
+ - Comprehensive test suite
+ - Documentation with Sphinx and Furo theme
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 datacompose
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,38 @@
+ # Include documentation
+ include README.md
+ include LICENSE
+ include CHANGELOG.md
+
+ # Include all YAML specifications
+ recursive-include datacompose/transformers *.yaml
+
+ # Include all Jinja2 templates
+ recursive-include datacompose/transformers *.j2
+ recursive-include datacompose/generators *.j2
+
+ # Include type hints
+ recursive-include datacompose py.typed
+
+ # Include test data (optional, remove if you don't want tests in distribution)
+ recursive-include tests *.py
+ recursive-include tests *.csv
+ recursive-include tests *.yaml
+
+ # Exclude unnecessary files
+ global-exclude *.pyc
+ global-exclude *.pyo
+ global-exclude __pycache__
+ global-exclude .DS_Store
+ global-exclude .git*
+ global-exclude *.swp
+ global-exclude *~
+
+ # Exclude development files
+ exclude .pre-commit-config.yaml
+ exclude .gitignore
+ exclude docker-compose.yml
+ exclude Dockerfile
+ exclude Makefile
+ prune docs/build
+ prune notebooks
+ prune scripts
@@ -0,0 +1,176 @@
+ Metadata-Version: 2.4
+ Name: datacompose
+ Version: 0.2.8
+ Summary: Copy-pasteable data transformation primitives for PySpark. Inspired by shadcn-svelte.
+ Author: Datacompose Contributors
+ Maintainer: Datacompose Contributors
+ License: MIT
+ Project-URL: Homepage, https://github.com/tc-cole/datacompose
+ Project-URL: Documentation, https://github.com/tc-cole/datacompose/tree/main/docs
+ Project-URL: Repository, https://github.com/tc-cole/datacompose.git
+ Project-URL: Issues, https://github.com/tc-cole/datacompose/issues
+ Project-URL: Changelog, https://github.com/tc-cole/datacompose/blob/main/CHANGELOG.md
+ Keywords: data-cleaning,data-quality,udf,spark,postgres,code-generation,data-pipeline,etl
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Software Development :: Code Generators
+ Classifier: Topic :: Database
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: jinja2>=3.0.0
+ Requires-Dist: pyyaml>=6.0
+ Requires-Dist: click>=8.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: black>=23.0.0; extra == "dev"
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
+ Provides-Extra: docs
+ Requires-Dist: mkdocs>=1.5.3; extra == "docs"
+ Requires-Dist: mkdocs-material>=9.5.0; extra == "docs"
+ Requires-Dist: mkdocs-material-extensions>=1.3; extra == "docs"
+ Requires-Dist: mkdocs-minify-plugin>=0.7.1; extra == "docs"
+ Requires-Dist: mkdocs-redirects>=1.2.1; extra == "docs"
+ Requires-Dist: mike>=2.0.0; extra == "docs"
+ Requires-Dist: pymdown-extensions>=10.5; extra == "docs"
+ Requires-Dist: pygments>=2.17.0; extra == "docs"
+ Requires-Dist: mkdocs-git-revision-date-localized-plugin>=1.2.2; extra == "docs"
+ Requires-Dist: mkdocs-glightbox>=0.3.5; extra == "docs"
+ Dynamic: license-file
+
+ # DataCompose
+
+ PySpark transformations you can actually own and modify. No black boxes.
+
+ ## Before vs After
+
+ ```python
+ # Before: Regex nightmare for addresses
+ df = df.withColumn("state_clean",
+     F.when(F.col("address").rlike(".*\\b(NY|N\\.Y\\.|New York|NewYork|Newyork)\\b.*"), "NY")
+     .when(F.col("address").rlike(".*\\b(CA|Cal\\.|Calif\\.|California)\\b.*"), "CA")
+     .when(F.col("address").rlike(".*\\b(IL|Ill\\.|Illinois|Illinios)\\b.*"), "IL")
+     .when(F.upper(F.col("address")).contains("NEW YORK"), "NY")
+     .when(F.regexp_extract(F.col("address"), ",\\s*([A-Z]{2})\\s+\\d{5}", 1) == "NY", "NY")
+     .when(F.regexp_extract(F.col("address"), "\\s+([A-Z]{2})\\s*$", 1) == "NY", "NY")
+     # ... handle "N.Y 10001" vs "NY, 10001" vs "New York 10001"
+     # ... handle misspellings like "Californai" or "Illnois"
+     # ... 50 more states × 10 variations each
+ )
+
+ # After: One line
+ from builders.transformers.addresses import addresses
+ df = df.withColumn("state", addresses.standardize_state(F.col("address")))
+ ```
+
+ ## Installation
+
+ ```bash
+ pip install datacompose
+ ```
+
+ ## How It Works
+
+ ```bash
+ # Copy transformers into YOUR repo
+ datacompose add phones
+ datacompose add addresses
+ datacompose add emails
+ ```
+
+ ```python
+ # Use them like any Python module - this is your code now
+ from transformers.pyspark.addresses import addresses
+
+ df = (df
+     .withColumn("street_number", addresses.extract_street_number(F.col("address")))
+     .withColumn("street_name", addresses.extract_street_name(F.col("address")))
+     .withColumn("city", addresses.extract_city(F.col("address")))
+     .withColumn("state", addresses.standardize_state(F.col("address")))
+     .withColumn("zip", addresses.extract_zip_code(F.col("address")))
+ )
+
+ # Result:
+ +-----------------------------------------+-------------+------------+-----------+-----+-------+
+ |address                                  |street_number|street_name |city       |state|zip    |
+ +-----------------------------------------+-------------+------------+-----------+-----+-------+
+ |123 Main St, New York, NY 10001          |123          |Main        |New York   |NY   |10001  |
+ |456 Oak Ave Apt 5B, Los Angeles, CA 90001|456          |Oak         |Los Angeles|CA   |90001  |
+ |789 Pine Blvd, Chicago, IL 60601         |789          |Pine        |Chicago    |IL   |60601  |
+ +-----------------------------------------+-------------+------------+-----------+-----+-------+
+ ```
+
+ The code lives in your repo. Modify it. Delete what you don't need. No external dependencies.
+
+ ## Why Copy-to-Own?
+
+ - **Your data is weird** - Phone numbers with "ask for Bob"? We can't predict that. You can fix it.
+ - **No breaking changes** - Library updates can't break your pipeline at 2 AM
+ - **Actually debuggable** - Stack traces point to YOUR code, not site-packages
+ - **No dependency hell** - It's just PySpark. If Spark runs, this runs.
+
+ ## Available Transformers
+
+ **Phones** - Standardize formats, extract from text, validate, handle extensions
+ **Addresses** - Parse components, standardize states, validate zips, detect PO boxes
+ **Emails** - Validate, extract domains, fix typos (gmial→gmail), standardize
+
+ More coming based on what you need.
+
+ ## Real Example
+
+ ```python
+ # Messy customer data
+ df = spark.createDataFrame([
+     ("(555) 123-4567 ext 89", "john.doe@gmial.com", "123 Main St Apt 4B"),
+     ("555.987.6543", "JANE@COMPANY.COM", "456 Oak Ave, NY, NY 10001"),
+ ], ["phone", "email", "address"])
+
+ # Clean it
+ clean_df = (df
+     .withColumn("phone", phones.standardize_phone(F.col("phone")))
+     .withColumn("email", emails.fix_common_typos(F.col("email")))
+     .withColumn("street", addresses.extract_street_address(F.col("address")))
+ )
+ ```
+
+ ## The Philosophy
+
+ ```
+ █████████████ 60% - Already clean
+ ████████      30% - Common patterns (formatting, typos)
+ ██             8% - Edge cases (weird but fixable)
+ ▌              2% - Complete chaos (that's what interns are for)
+ ```
+
+ We handle the 38% with patterns. You handle the 2% chaos.
+
+ ## Documentation
+
+ Full docs at [datacompose.io](https://datacompose.io)
+
+ ## Key Features
+
+ - **Zero dependencies** - Just PySpark code that runs anywhere Spark runs
+ - **Fully modifiable** - It's in your repo. Change whatever you need
+ - **Battle-tested patterns** - Built from real production data cleaning challenges
+ - **Composable functions** - Chain simple operations into complex pipelines
+ - **No breaking changes** - You control when and how to update
+
+ ## License
+
+ MIT - It's your code now.
+
+ ---
+
+ *Inspired by [shadcn/ui](https://ui.shadcn.com/) and [Svelte](https://svelte.dev/)'s approach to components - copy, don't install.*
@@ -0,0 +1,126 @@
+ # DataCompose
+
+ PySpark transformations you can actually own and modify. No black boxes.
+
+ ## Before vs After
+
+ ```python
+ # Before: Regex nightmare for addresses
+ df = df.withColumn("state_clean",
+     F.when(F.col("address").rlike(".*\\b(NY|N\\.Y\\.|New York|NewYork|Newyork)\\b.*"), "NY")
+     .when(F.col("address").rlike(".*\\b(CA|Cal\\.|Calif\\.|California)\\b.*"), "CA")
+     .when(F.col("address").rlike(".*\\b(IL|Ill\\.|Illinois|Illinios)\\b.*"), "IL")
+     .when(F.upper(F.col("address")).contains("NEW YORK"), "NY")
+     .when(F.regexp_extract(F.col("address"), ",\\s*([A-Z]{2})\\s+\\d{5}", 1) == "NY", "NY")
+     .when(F.regexp_extract(F.col("address"), "\\s+([A-Z]{2})\\s*$", 1) == "NY", "NY")
+     # ... handle "N.Y 10001" vs "NY, 10001" vs "New York 10001"
+     # ... handle misspellings like "Californai" or "Illnois"
+     # ... 50 more states × 10 variations each
+ )
+
+ # After: One line
+ from builders.transformers.addresses import addresses
+ df = df.withColumn("state", addresses.standardize_state(F.col("address")))
+ ```
+
+ ## Installation
+
+ ```bash
+ pip install datacompose
+ ```
+
+ ## How It Works
+
+ ```bash
+ # Copy transformers into YOUR repo
+ datacompose add phones
+ datacompose add addresses
+ datacompose add emails
+ ```
+
+ ```python
+ # Use them like any Python module - this is your code now
+ from transformers.pyspark.addresses import addresses
+
+ df = (df
+     .withColumn("street_number", addresses.extract_street_number(F.col("address")))
+     .withColumn("street_name", addresses.extract_street_name(F.col("address")))
+     .withColumn("city", addresses.extract_city(F.col("address")))
+     .withColumn("state", addresses.standardize_state(F.col("address")))
+     .withColumn("zip", addresses.extract_zip_code(F.col("address")))
+ )
+
+ # Result:
+ +-----------------------------------------+-------------+------------+-----------+-----+-------+
+ |address                                  |street_number|street_name |city       |state|zip    |
+ +-----------------------------------------+-------------+------------+-----------+-----+-------+
+ |123 Main St, New York, NY 10001          |123          |Main        |New York   |NY   |10001  |
+ |456 Oak Ave Apt 5B, Los Angeles, CA 90001|456          |Oak         |Los Angeles|CA   |90001  |
+ |789 Pine Blvd, Chicago, IL 60601         |789          |Pine        |Chicago    |IL   |60601  |
+ +-----------------------------------------+-------------+------------+-----------+-----+-------+
+ ```
+
+ The code lives in your repo. Modify it. Delete what you don't need. No external dependencies.
+
+ ## Why Copy-to-Own?
+
+ - **Your data is weird** - Phone numbers with "ask for Bob"? We can't predict that. You can fix it.
+ - **No breaking changes** - Library updates can't break your pipeline at 2 AM
+ - **Actually debuggable** - Stack traces point to YOUR code, not site-packages
+ - **No dependency hell** - It's just PySpark. If Spark runs, this runs.
+
+ ## Available Transformers
+
+ **Phones** - Standardize formats, extract from text, validate, handle extensions
+ **Addresses** - Parse components, standardize states, validate zips, detect PO boxes
+ **Emails** - Validate, extract domains, fix typos (gmial→gmail), standardize
+
+ More coming based on what you need.
+
+ ## Real Example
+
+ ```python
+ # Messy customer data
+ df = spark.createDataFrame([
+     ("(555) 123-4567 ext 89", "john.doe@gmial.com", "123 Main St Apt 4B"),
+     ("555.987.6543", "JANE@COMPANY.COM", "456 Oak Ave, NY, NY 10001"),
+ ], ["phone", "email", "address"])
+
+ # Clean it
+ clean_df = (df
+     .withColumn("phone", phones.standardize_phone(F.col("phone")))
+     .withColumn("email", emails.fix_common_typos(F.col("email")))
+     .withColumn("street", addresses.extract_street_address(F.col("address")))
+ )
+ ```
+
+ ## The Philosophy
+
+ ```
+ █████████████ 60% - Already clean
+ ████████      30% - Common patterns (formatting, typos)
+ ██             8% - Edge cases (weird but fixable)
+ ▌              2% - Complete chaos (that's what interns are for)
+ ```
+
+ We handle the 38% with patterns. You handle the 2% chaos.
+
+ ## Documentation
+
+ Full docs at [datacompose.io](https://datacompose.io)
+
+ ## Key Features
+
+ - **Zero dependencies** - Just PySpark code that runs anywhere Spark runs
+ - **Fully modifiable** - It's in your repo. Change whatever you need
+ - **Battle-tested patterns** - Built from real production data cleaning challenges
+ - **Composable functions** - Chain simple operations into complex pipelines
+ - **No breaking changes** - You control when and how to update
+
+ ## License
+
+ MIT - It's your code now.
+
+ ---
+
+ *Inspired by [shadcn/ui](https://ui.shadcn.com/) and [Svelte](https://svelte.dev/)'s approach to components - copy, don't install.*
@@ -0,0 +1 @@
+ """Datacompose source package."""
@@ -0,0 +1,5 @@
+ """
+ Datacompose CLI - Command-line interface for generating data cleaning UDFs.
+ """
+
+ __version__ = "0.2.7.0"
@@ -0,0 +1,80 @@
+ """
+ Simple color utilities for CLI output.
+ """
+
+ import os
+ import sys
+
+
+ class Colors:
+     """ANSI color codes for terminal output."""
+
+     # Text colors
+     RED = "\033[91m"
+     GREEN = "\033[92m"
+     YELLOW = "\033[93m"
+     BLUE = "\033[94m"
+     MAGENTA = "\033[95m"
+     CYAN = "\033[96m"
+     WHITE = "\033[97m"
+     GRAY = "\033[90m"
+
+     # Styles
+     BOLD = "\033[1m"
+     DIM = "\033[2m"
+     UNDERLINE = "\033[4m"
+
+     # Reset
+     RESET = "\033[0m"
+
+     @classmethod
+     def is_enabled(cls) -> bool:
+         """Check if colors should be enabled."""
+         # Disable colors if NO_COLOR env var is set
+         if os.getenv("NO_COLOR"):
+             return False
+
+         # Disable colors if not in a TTY
+         if not sys.stdout.isatty():
+             return False
+
+         return True
+
+
+ def colorize(text: str, color: str = "", style: str = "") -> str:
+     """Colorize text if colors are enabled."""
+     if not Colors.is_enabled():
+         return text
+
+     prefix = style + color
+     return f"{prefix}{text}{Colors.RESET}"
+
+
+ def success(text: str) -> str:
+     """Green text for success messages."""
+     return colorize(text, Colors.GREEN, Colors.BOLD)
+
+
+ def error(text: str) -> str:
+     """Red text for error messages."""
+     return colorize(text, Colors.RED, Colors.BOLD)
+
+
+ def warning(text: str) -> str:
+     """Yellow text for warning messages."""
+     return colorize(text, Colors.YELLOW, Colors.BOLD)
+
+
+ def info(text: str) -> str:
+     """Blue text for info messages."""
+     return colorize(text, Colors.BLUE)
+
+
+ def highlight(text: str) -> str:
+     """Cyan text for highlighted text."""
+     return colorize(text, Colors.CYAN, Colors.BOLD)
+
+
+ def dim(text: str) -> str:
+     """Dimmed text for less important info."""
+     return colorize(text, Colors.GRAY)
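For context, a quick usage sketch of the helpers above (the messages are invented):

```python
from datacompose.cli.colors import success, warning, error, dim

print(success("Generated build/pyspark/addresses/address_primitives.py"))
print(warning("No datacompose.json found; using default target 'pyspark'"))
print(error("Unknown transformer: 'phonez'"))
print(dim("Colors are skipped automatically when NO_COLOR is set or output is piped."))
```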
@@ -0,0 +1,3 @@
+ """
+ CLI commands for Datacompose.
+ """