PyPI - additory - Versions diffs - 0.1.0a2__tar.gz → 0.1.0a4__tar.gz - Mend

additory 0.1.0a2.tar.gz → 0.1.0a4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (131)

{additory-0.1.0a2 → additory-0.1.0a4}/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: additory
-Version: 0.1.0a2
-Summary: A semantic, extensible dataframe transformation engine with expressions, lookup, synthetic data, and sample-data support.
+Version: 0.1.0a4
+Summary: A semantic, extensible dataframe transformation engine with expressions, lookup, and synthetic data generation support.
 Author: Krishnamoorthy Sankaran
 License: MIT
 Project-URL: homepage, https://github.com/sekarkrishna/additory
@@ -13,6 +13,7 @@ Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: pandas>=1.5
 Requires-Dist: polars>=0.20
+Requires-Dist: pyarrow>=10.0
 Requires-Dist: pyyaml>=6.0
 Requires-Dist: requests>=2.31
 Requires-Dist: toml>=0.10
@@ -34,11 +35,11 @@ Dynamic: license-file
 # Additory
-**A semantic, extensible dataframe transformation engine with expressions, lookup, synthetic data, and sample-data support.**
+**A semantic, extensible dataframe transformation engine with expressions, lookup, and augmentation support.**
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![Version](https://img.shields.io/badge/version-0.1.0a1-orange.svg)](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/)
+[![Version](https://img.shields.io/badge/version-0.1.0a4-orange.svg)](https://github.com/sekarkrishna/additory)
 **Author:** Krishnamoorthy Sankaran
@@ -51,17 +52,17 @@ Dynamic: license-file
 ## 📦 Installation
 ```bash
-pip install additory==0.1.0a1
+pip install additory==0.1.0a4
 ```
 **Optional GPU support:**
 ```bash
-pip install additory[gpu]==0.1.0a1  # Includes cuDF for GPU acceleration
+pip install additory[gpu]==0.1.0a4  # Includes cuDF for GPU acceleration
 ```
 **Development installation:**
 ```bash
-pip install additory[dev]==0.1.0a1  # Includes testing and development tools
+pip install additory[dev]==0.1.0a4  # Includes testing and development tools
 ```
 ## 🎯 Core Functions
@@ -69,8 +70,8 @@ pip install additory[dev]==0.1.0a1  # Includes testing and development tools
 | Function | Purpose | Example |
 |----------|---------|---------|
 | `add.to()` | Lookup/join operations | `add.to(df1, from_df=df2, bring='col', against='key')` |
-| `add.augment()` | Generate additional data | `add.augment(df, n_rows=1000)` |
-| `add.synth()` | Synthetic data from schemas | `add.synth("schema.toml", rows=5000)` |
+| `add.synthetic()` | Generate additional data | `add.synthetic(df, n_rows=1000)` |
+| `add.deduce()` | Text-based label deduction | `add.deduce(df, from_column='text', to_column='label')` |
 | `add.scan()` | Data profiling & analysis | `add.scan(df, preset="full")` |
 ## 🧬 Available Expressions
@@ -119,7 +120,7 @@ import additory as add
 # Works with polars
 df_polars = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
-result = add.augment(df_polars, n_rows=100)
+result = add.synthetic(df_polars, n_rows=100)
 # Automatic type detection and conversion
 ```
@@ -193,27 +194,42 @@ patients_with_bsa = add.bsa(patients)
 result = add.fitness_score(add.bmr(add.bmi(patients)))
 ```
-### 🔄 Augment and Synthetic Data
+### 🔄 Synthetic Data Generation
-**Augment** generates more data similar to your existing dataset, while **Synthetic** creates entirely new datasets from schema definitions.
-**Key Differences:**
-- **Augment**: Learns patterns from existing data to create similar rows
-- **Synthetic**: Uses predefined schemas to generate structured data
+**Synthetic** generates additional data similar to your existing dataset using inline strategies.
 ```python
-# Augment existing data (learns from patterns)
-more_customers = add.augment(customers, n_rows=1000)
+# Extend existing data (learns from patterns)
+more_customers = add.synthetic(customers, n_rows=1000)
 # Create data from scratch with strategies
-new_data = add.augment("@new", n_rows=500, strategy={
+new_data = add.synthetic("@new", n_rows=500, strategy={
     'id': 'increment:start=1',
     'name': 'choice:[John,Jane,Bob]',
     'age': 'range:18-65'
 })
+```
+### 🤖 Text-Based Label Deduction
-# Generate from schema file (structured approach)
-customers = add.synth("customer_schema.toml", rows=10000)
+**Deduce** automatically fills in missing labels by learning from your existing labeled examples. Pure Python, no LLMs, offline-first.
+```python
+# Deduce missing labels from text
+tickets = pd.DataFrame({
+    "ticket_text": ["Cannot log in", "Billing question", "App crashes", "Need invoice"],
+    "category": ["Technical", "Billing", None, None]
+})
+# Automatically fill in missing categories
+result = add.deduce(tickets, from_column="ticket_text", to_column="category")
+# Use multiple columns for better accuracy
+result = add.deduce(
+    df,
+    from_column=["title", "description"],
+    to_column="category"
+)
 ```
 ## 🧪 Examples
@@ -231,7 +247,7 @@ customers = pd.DataFrame({
 })
 # Generate more customers
-customers = add.augment(customers, n_rows=10000)
+customers = add.synthetic(customers, n_rows=10000)
 # Add customer tiers
 tiers = pd.DataFrame({
@@ -257,7 +273,7 @@ strategy = {
     'height_cm': 'range:150-200'  # Height in cm
 }
-patients = add.augment("@new", n_rows=1000, strategy=strategy)
+patients = add.synthetic("@new", n_rows=1000, strategy=strategy)
 # Convert height to meters for expressions
 patients['height_m'] = patients['height_cm'] / 100
@@ -272,19 +288,19 @@ print(result.correlations)
 ## 📚 Documentation
-- **[Function Documentation](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0/)** - Detailed guides for each function
-- **[Expressions Guide](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0/expressions.html)** - Complete expressions reference
+- **[Function Documentation](https://github.com/sekarkrishna/additory/tree/main/documentation/)** - Detailed guides for each function
+- **[Expressions Guide](https://github.com/sekarkrishna/additory/tree/main/documentation/)** - Complete expressions reference
 ## 📄 License
-MIT License - see [LICENSE](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/LICENSE) file for details.
+MIT License - see [LICENSE](LICENSE) file for details.
 ## 📞 Support
 - **Issues**: [GitHub Issues](https://github.com/sekarkrishna/additory/issues)
-- **Documentation**: [Full Documentation](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0)
+- **Documentation**: [Full Documentation](https://github.com/sekarkrishna/additory/tree/main/documentation/)
-## 🗺️ v0.1.1 (February 2025)
+## 🗺️ v0.1.1 (January 2026)
 - Enhanced documentation and tutorials
 - Performance optimizations
 - Additional expressions

{additory-0.1.0a2 → additory-0.1.0a4}/README.md RENAMED

@@ -1,10 +1,10 @@
 # Additory
-**A semantic, extensible dataframe transformation engine with expressions, lookup, synthetic data, and sample-data support.**
+**A semantic, extensible dataframe transformation engine with expressions, lookup, and augmentation support.**
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![Version](https://img.shields.io/badge/version-0.1.0a1-orange.svg)](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/)
+[![Version](https://img.shields.io/badge/version-0.1.0a4-orange.svg)](https://github.com/sekarkrishna/additory)
 **Author:** Krishnamoorthy Sankaran
@@ -17,17 +17,17 @@
 ## 📦 Installation
 ```bash
-pip install additory==0.1.0a1
+pip install additory==0.1.0a4
 ```
 **Optional GPU support:**
 ```bash
-pip install additory[gpu]==0.1.0a1  # Includes cuDF for GPU acceleration
+pip install additory[gpu]==0.1.0a4  # Includes cuDF for GPU acceleration
 ```
 **Development installation:**
 ```bash
-pip install additory[dev]==0.1.0a1  # Includes testing and development tools
+pip install additory[dev]==0.1.0a4  # Includes testing and development tools
 ```
 ## 🎯 Core Functions
@@ -35,8 +35,8 @@ pip install additory[dev]==0.1.0a1  # Includes testing and development tools
 | Function | Purpose | Example |
 |----------|---------|---------|
 | `add.to()` | Lookup/join operations | `add.to(df1, from_df=df2, bring='col', against='key')` |
-| `add.augment()` | Generate additional data | `add.augment(df, n_rows=1000)` |
-| `add.synth()` | Synthetic data from schemas | `add.synth("schema.toml", rows=5000)` |
+| `add.synthetic()` | Generate additional data | `add.synthetic(df, n_rows=1000)` |
+| `add.deduce()` | Text-based label deduction | `add.deduce(df, from_column='text', to_column='label')` |
 | `add.scan()` | Data profiling & analysis | `add.scan(df, preset="full")` |
 ## 🧬 Available Expressions
@@ -85,7 +85,7 @@ import additory as add
 # Works with polars
 df_polars = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
-result = add.augment(df_polars, n_rows=100)
+result = add.synthetic(df_polars, n_rows=100)
 # Automatic type detection and conversion
 ```
@@ -159,27 +159,42 @@ patients_with_bsa = add.bsa(patients)
 result = add.fitness_score(add.bmr(add.bmi(patients)))
 ```
-### 🔄 Augment and Synthetic Data
+### 🔄 Synthetic Data Generation
-**Augment** generates more data similar to your existing dataset, while **Synthetic** creates entirely new datasets from schema definitions.
-**Key Differences:**
-- **Augment**: Learns patterns from existing data to create similar rows
-- **Synthetic**: Uses predefined schemas to generate structured data
+**Synthetic** generates additional data similar to your existing dataset using inline strategies.
 ```python
-# Augment existing data (learns from patterns)
-more_customers = add.augment(customers, n_rows=1000)
+# Extend existing data (learns from patterns)
+more_customers = add.synthetic(customers, n_rows=1000)
 # Create data from scratch with strategies
-new_data = add.augment("@new", n_rows=500, strategy={
+new_data = add.synthetic("@new", n_rows=500, strategy={
     'id': 'increment:start=1',
     'name': 'choice:[John,Jane,Bob]',
     'age': 'range:18-65'
 })
+```
+### 🤖 Text-Based Label Deduction
-# Generate from schema file (structured approach)
-customers = add.synth("customer_schema.toml", rows=10000)
+**Deduce** automatically fills in missing labels by learning from your existing labeled examples. Pure Python, no LLMs, offline-first.
+```python
+# Deduce missing labels from text
+tickets = pd.DataFrame({
+    "ticket_text": ["Cannot log in", "Billing question", "App crashes", "Need invoice"],
+    "category": ["Technical", "Billing", None, None]
+})
+# Automatically fill in missing categories
+result = add.deduce(tickets, from_column="ticket_text", to_column="category")
+# Use multiple columns for better accuracy
+result = add.deduce(
+    df,
+    from_column=["title", "description"],
+    to_column="category"
+)
 ```
 ## 🧪 Examples
@@ -197,7 +212,7 @@ customers = pd.DataFrame({
 })
 # Generate more customers
-customers = add.augment(customers, n_rows=10000)
+customers = add.synthetic(customers, n_rows=10000)
 # Add customer tiers
 tiers = pd.DataFrame({
@@ -223,7 +238,7 @@ strategy = {
     'height_cm': 'range:150-200'  # Height in cm
 }
-patients = add.augment("@new", n_rows=1000, strategy=strategy)
+patients = add.synthetic("@new", n_rows=1000, strategy=strategy)
 # Convert height to meters for expressions
 patients['height_m'] = patients['height_cm'] / 100
@@ -238,19 +253,19 @@ print(result.correlations)
 ## 📚 Documentation
-- **[Function Documentation](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0/)** - Detailed guides for each function
-- **[Expressions Guide](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0/expressions.html)** - Complete expressions reference
+- **[Function Documentation](https://github.com/sekarkrishna/additory/tree/main/documentation/)** - Detailed guides for each function
+- **[Expressions Guide](https://github.com/sekarkrishna/additory/tree/main/documentation/)** - Complete expressions reference
 ## 📄 License
-MIT License - see [LICENSE](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/LICENSE) file for details.
+MIT License - see [LICENSE](LICENSE) file for details.
 ## 📞 Support
 - **Issues**: [GitHub Issues](https://github.com/sekarkrishna/additory/issues)
-- **Documentation**: [Full Documentation](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0)
+- **Documentation**: [Full Documentation](https://github.com/sekarkrishna/additory/tree/main/documentation/)
-## 🗺️ v0.1.1 (February 2025)
+## 🗺️ v0.1.1 (January 2026)
 - Enhanced documentation and tutorials
 - Performance optimizations
 - Additional expressions

{additory-0.1.0a2 → additory-0.1.0a4}/additory/__init__.py RENAMED

@@ -2,6 +2,9 @@
 from .dynamic_api import add as _api_instance
+# Version information
+__version__ = "0.1.0a4"
 # Expose the API instance normally
 add = _api_instance
@@ -12,4 +15,5 @@ def __getattr__(name):
 __all__ = [
     "add",
+    "__version__",
 ]
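
Reviewer note: the hunk above adds a module-level `__version__` and exports it through `__all__`. A minimal sketch of how calling code might read it, falling back to installed distribution metadata for releases that predate the attribute. The helper `get_version` and the stub module are hypothetical and not part of additory.

```python
import types
from importlib import metadata
from typing import Optional


def get_version(module: types.ModuleType, dist_name: str) -> Optional[str]:
    """Prefer module.__version__; fall back to installed distribution metadata."""
    version = getattr(module, "__version__", None)
    if version is not None:
        return version
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Neither the attribute nor an installed distribution is available
        return None


# Stand-in for `import additory` (hypothetical stub so the sketch runs anywhere)
stub = types.ModuleType("additory_stub")
stub.__version__ = "0.1.0a4"

print(get_version(stub, "additory"))
```

With the real package installed, `get_version(additory, "additory")` would presumably return the string set in `additory/__init__.py`.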

{additory-0.1.0a2 → additory-0.1.0a4}/additory/common/__init__.py RENAMED

@@ -1,14 +1,14 @@
 """
 Common Utilities Module
-Shared functionality used by both augment and synthetic modules:
+Shared functionality used by both synthetic and expressions modules:
 - Distribution functions (normal, uniform, skewed, etc.)
 - List file management (.list format)
 - Pattern file management (.properties format)
 - Fallback resolution logic
 This module eliminates code duplication and provides consistent behavior
-across augment and synthetic data generation.
+across synthetic and expression data generation.
 """
 from .distributions import (

{additory-0.1.0a2 → additory-0.1.0a4}/additory/common/backend.py RENAMED

@@ -180,11 +180,14 @@ def get_arrow_bridge():
         - Use for all cross-backend conversions
         - Handles pandas/polars/cuDF via Arrow
     """
-    from additory.core.backends.arrow_bridge import EnhancedArrowBridge
+    from additory.core.backends.arrow_bridge import EnhancedArrowBridge, ArrowBridgeError
     # Singleton pattern
     if not hasattr(get_arrow_bridge, '_instance'):
-        get_arrow_bridge._instance = EnhancedArrowBridge()
+        try:
+            get_arrow_bridge._instance = EnhancedArrowBridge()
+        except ArrowBridgeError:
+            get_arrow_bridge._instance = None
     return get_arrow_bridge._instance
@@ -194,7 +197,7 @@ def to_polars(df: Any, backend_type: BackendType = None) -> 'pl.DataFrame':
     Convert any dataframe to Polars via Arrow bridge.
     This is the primary conversion function for the Polars-only architecture.
-    All operations (expressions, augment, etc.) use this to convert input
+    All operations (expressions, synthetic, etc.) use this to convert input
     dataframes to Polars for processing.
     Args:
@@ -224,7 +227,7 @@ def to_polars(df: Any, backend_type: BackendType = None) -> 'pl.DataFrame':
         )
     # Fast path: already Polars
-    if isinstance(df, pl.DataFrame):
+    if HAS_POLARS and isinstance(df, pl.DataFrame):
         return df
     # Validate input
@@ -240,6 +243,13 @@ def to_polars(df: Any, backend_type: BackendType = None) -> 'pl.DataFrame':
     # Convert via Arrow bridge
     try:
         bridge = get_arrow_bridge()
+        if bridge is None:
+            # Fallback: direct conversion for pandas
+            if backend_type == "pandas":
+                if isinstance(df, pd.DataFrame):
+                    return pl.from_pandas(df)
+            raise RuntimeError("Arrow bridge not available and cannot convert non-pandas DataFrame")
         arrow_table = bridge.to_arrow(df, backend_type)
         pl_df = bridge.from_arrow(arrow_table, "polars")
         return pl_df
@@ -309,6 +319,12 @@ def from_polars(pl_df: 'pl.DataFrame', target_backend: BackendType) -> Any:
     # Convert via Arrow bridge
     try:
         bridge = get_arrow_bridge()
+        if bridge is None:
+            # Fallback: direct conversion for pandas
+            if target_backend == "pandas":
+                return pl_df.to_pandas()
+            raise RuntimeError("Arrow bridge not available and cannot convert to non-pandas DataFrame")
         arrow_table = bridge.to_arrow(pl_df, "polars")
         result_df = bridge.from_arrow(arrow_table, target_backend)
         return result_df
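
Reviewer note: the hunks above cache the bridge on the function object and store `None` when construction raises `ArrowBridgeError`, then fall back to a direct pandas conversion. The sketch below reproduces that control flow with stand-in classes only (`Bridge`, `BridgeError`, and `convert` are hypothetical, and plain lists stand in for dataframes). One consequence that appears to follow from the pattern: a failed construction is cached, so later calls do not retry.

```python
from typing import Any, List, Optional


class BridgeError(Exception):
    """Stand-in for ArrowBridgeError (hypothetical, for illustration)."""


class Bridge:
    """Stand-in for EnhancedArrowBridge; construction fails when unavailable."""

    def __init__(self, available: bool) -> None:
        if not available:
            raise BridgeError("Arrow backend not importable")

    def convert(self, rows: List[Any]) -> List[Any]:
        return list(rows)


def get_bridge(available: bool = True) -> Optional[Bridge]:
    # The instance is cached on the function object. A failed construction
    # is cached as None, so later calls do not retry.
    if not hasattr(get_bridge, "_instance"):
        try:
            get_bridge._instance = Bridge(available)
        except BridgeError:
            get_bridge._instance = None
    return get_bridge._instance


def convert(rows: List[Any], backend: str, available: bool = True) -> List[Any]:
    bridge = get_bridge(available)
    if bridge is None:
        if backend == "pandas":
            # Stands in for the direct pl.from_pandas(df) fallback
            return list(rows)
        raise RuntimeError("bridge not available and cannot convert non-pandas input")
    return bridge.convert(rows)


print(convert([1, 2], "pandas", available=False))
```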

{additory-0.1.0a2 → additory-0.1.0a4}/additory/common/distributions.py RENAMED

@@ -1,5 +1,5 @@
 """
-Distribution Strategies for Data Augmentation
+Distribution Strategies for Synthetic Data Generation
 Provides statistical distribution-based data generation:
 - Normal (Gaussian) distribution

{additory-0.1.0a2 → additory-0.1.0a4}/additory/common/sample_data.py RENAMED

@@ -8,8 +8,8 @@ loaded on-demand using the existing .add file parser.
 Usage:
     from additory.common.sample_data import get_sample_dataset
-    # For augment
-    df = get_sample_dataset("augment", "sample")
+    # For synthetic
+    df = get_sample_dataset("synthetic", "sample")
     # For expressions (future)
     df = get_sample_dataset("expressions", "sample")
@@ -25,7 +25,7 @@ from additory.common.exceptions import ValidationError
 def get_sample_dataset(
-    module: str = "augment",
+    module: str = "synthetic",
     block: str = "sample",
     dataset_type: str = "clean"
 ) -> pl.DataFrame:
@@ -33,12 +33,12 @@ def get_sample_dataset(
     Load a sample dataset from .add files.
     This function provides centralized access to sample datasets across
-    all additory modules (augment, expressions, utilities). Sample datasets
+    all additory modules (synthetic, expressions, utilities). Sample datasets
     are stored as .add files in the reference/ directory structure.
     Args:
-        module: Module name ("augment", "expressions", "utilities")
-        block: Block name within the .add file ("sample" for augment)
+        module: Module name ("synthetic", "expressions", "utilities")
+        block: Block name within the .add file ("sample" for synthetic)
         dataset_type: Type of sample data ("clean" or "unclean")
     Returns:
@@ -48,8 +48,8 @@ def get_sample_dataset(
         ValidationError: If module, block, or dataset_type not found
     Examples:
-        >>> # Load augment sample dataset
-        >>> df = get_sample_dataset("augment", "sample")
+        >>> # Load synthetic sample dataset
+        >>> df = get_sample_dataset("synthetic", "sample")
         >>> print(df.shape)
         (50, 10)
@@ -57,7 +57,7 @@ def get_sample_dataset(
         >>> df = get_sample_dataset("expressions", "sample", "clean")
         >>> df_unclean = get_sample_dataset("expressions", "sample", "unclean")
-    Sample Dataset Structure (augment):
+    Sample Dataset Structure (synthetic):
         - id: Sequential numeric IDs (1-50)
         - emp_id: Employee IDs with pattern (EMP_001 - EMP_050)
         - order_id: Order IDs with different padding (ORD_0001 - ORD_0050)
@@ -72,8 +72,8 @@ def get_sample_dataset(
     # Construct path to .add file
     base_path = Path(__file__).parent.parent.parent / "reference"
-    if module == "augment":
-        add_file_path = base_path / "augment_definitions" / f"{block}_0.1.add"
+    if module == "synthetic":
+        add_file_path = base_path / "synthetic_definitions" / f"{block}_0.1.add"
     elif module == "expressions":
         add_file_path = base_path / "expressions_definitions" / f"{block}_0.1.add"
     elif module == "utilities":
@@ -81,7 +81,7 @@ def get_sample_dataset(
     else:
         raise ValidationError(
             f"Unknown module '{module}'. "
-            f"Valid modules: augment, expressions, utilities"
+            f"Valid modules: synthetic, expressions, utilities"
         )
     # Check if file exists
@@ -141,7 +141,7 @@ def list_available_samples() -> dict:
         >>> samples = list_available_samples()
         >>> print(samples)
         {
-            'augment': ['sample'],
+            'synthetic': ['sample'],
             'expressions': ['sample'],
             'utilities': []
         }
@@ -149,15 +149,15 @@ def list_available_samples() -> dict:
     base_path = Path(__file__).parent.parent.parent / "reference"
     available = {}
-    # Check augment
-    augment_path = base_path / "augment_definitions"
-    if augment_path.exists():
-        available['augment'] = [
+    # Check synthetic
+    synthetic_path = base_path / "synthetic_definitions"
+    if synthetic_path.exists():
+        available['synthetic'] = [
             f.stem.rsplit('_', 1)[0]  # Remove version suffix
-            for f in augment_path.glob("*.add")
+            for f in synthetic_path.glob("*.add")
         ]
     else:
-        available['augment'] = []
+        available['synthetic'] = []
     # Check expressions
     expressions_path = base_path / "expressions_definitions"
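
Reviewer note: `list_available_samples()` derives block names with `f.stem.rsplit('_', 1)[0]`, which relies on files being named like `{block}_0.1.add`. A small sketch of what that expression yields for a few file names. The helper `block_name` and every file name other than `sample_0.1.add` are hypothetical.

```python
from pathlib import PurePosixPath


def block_name(filename: str) -> str:
    """Strip the final extension, then drop everything after the last underscore."""
    stem = PurePosixPath(filename).stem   # "sample_0.1.add" -> "sample_0.1"
    return stem.rsplit("_", 1)[0]         # "sample_0.1"     -> "sample"


for name in ["sample_0.1.add", "my_block_0.1.add", "sample.add"]:
    print(name, "->", block_name(name))
```

Block names that contain underscores survive because only the last underscore is split on; a file without a version suffix is returned unchanged.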

{additory-0.1.0a2 → additory-0.1.0a4}/additory/core/backends/arrow_bridge.py RENAMED

@@ -16,6 +16,13 @@ try:
 except ImportError as e:
     ARROW_AVAILABLE = False
     IMPORT_ERROR = str(e)
+    # Create dummy classes for type annotations
+    class pa:
+        Table = Any
+    class pl:
+        DataFrame = Any
+    class pd:
+        DataFrame = Any
 from ..logging import log_info, log_warning
 from .cudf_bridge import get_cudf_bridge
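
Reviewer note: the placeholder classes added above keep annotations such as `-> pa.Table` resolvable when the optional imports fail, since function annotations are evaluated when the `def` statement runs. A minimal sketch of the same pattern, assuming `Any` is imported in the module. The module name `additory_example_missing_dependency` and the function `to_table` are hypothetical.

```python
from typing import Any

try:
    # Hypothetical optional dependency; the import is expected to fail here
    import additory_example_missing_dependency as pa
    AVAILABLE = True
except ImportError:
    AVAILABLE = False

    class pa:  # placeholder so that `pa.Table` still resolves in annotations
        Table = Any


def to_table(obj: Any) -> pa.Table:
    """Annotation is evaluated at definition time; the placeholder avoids a NameError."""
    if not AVAILABLE:
        raise RuntimeError("optional dependency missing; conversion unavailable")
    return obj


print(AVAILABLE, pa.Table)
```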

{additory-0.1.0a2 → additory-0.1.0a4}/additory/core/config.py RENAMED

@@ -329,14 +329,14 @@ def set_custom_formula_path(path):
 # backend preference setting
-_backend_preference: str | None = None  # "cpu", "gpu", or None
+_backend_preference: Optional[str] = None  # "cpu", "gpu", or None
-def set_backend_preference(mode: str | None):
+def set_backend_preference(mode: Optional[str]):
     global _backend_preference
     if mode not in (None, "cpu", "gpu"):
         raise ValueError("backend must be 'cpu', 'gpu', or None")
     _backend_preference = mode
-def get_backend_preference() -> str | None:
+def get_backend_preference() -> Optional[str]:
     return _backend_preference
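
Reviewer note: the `str | None` union syntax (PEP 604) is only evaluable at runtime on Python 3.10 and later, while the README badge advertises Python 3.9+. Switching to `Optional[str]` is likely what makes this module importable on 3.9. The sketch below restates the setter and getter from the hunk in a self-contained form, on the assumption that `Optional` is imported from `typing` elsewhere in `config.py`.

```python
from typing import Optional

# Module-level annotations are evaluated at import time, so `str | None`
# here would raise TypeError on Python 3.9; Optional[str] works on 3.9+.
_backend_preference: Optional[str] = None  # "cpu", "gpu", or None


def set_backend_preference(mode: Optional[str]) -> None:
    global _backend_preference
    if mode not in (None, "cpu", "gpu"):
        raise ValueError("backend must be 'cpu', 'gpu', or None")
    _backend_preference = mode


def get_backend_preference() -> Optional[str]:
    return _backend_preference


set_backend_preference("gpu")
print(get_backend_preference())
```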
