PyPI - additory - Versions diffs - 0.1.0a3__tar.gz → 0.1.0a4__tar.gz - Mend

additory 0.1.0a3tar.gz → 0.1.0a4tar.gz


Files changed (98)

{additory-0.1.0a3 → additory-0.1.0a4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: additory
-Version: 0.1.0a3
+Version: 0.1.0a4
 Summary: A semantic, extensible dataframe transformation engine with expressions, lookup, and synthetic data generation support.
 Author: Krishnamoorthy Sankaran
 License: MIT
@@ -39,7 +39,7 @@ Dynamic: license-file
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![Version](https://img.shields.io/badge/version-0.1.0a2-orange.svg)](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/)
+[![Version](https://img.shields.io/badge/version-0.1.0a4-orange.svg)](https://github.com/sekarkrishna/additory)
 **Author:** Krishnamoorthy Sankaran
@@ -52,17 +52,17 @@ Dynamic: license-file
 ## 📦 Installation
 ```bash
-pip install additory==0.1.0a2
+pip install additory==0.1.0a4
 ```
 **Optional GPU support:**
 ```bash
-pip install additory[gpu]==0.1.0a2  # Includes cuDF for GPU acceleration
+pip install additory[gpu]==0.1.0a4  # Includes cuDF for GPU acceleration
 ```
 **Development installation:**
 ```bash
-pip install additory[dev]==0.1.0a2  # Includes testing and development tools
+pip install additory[dev]==0.1.0a4  # Includes testing and development tools
 ```
 ## 🎯 Core Functions
@@ -70,7 +70,8 @@ pip install additory[dev]==0.1.0a2  # Includes testing and development tools
 | Function | Purpose | Example |
 |----------|---------|---------|
 | `add.to()` | Lookup/join operations | `add.to(df1, from_df=df2, bring='col', against='key')` |
-| `add.augment()` | Generate additional data | `add.augment(df, n_rows=1000)` |
+| `add.synthetic()` | Generate additional data | `add.synthetic(df, n_rows=1000)` |
+| `add.deduce()` | Text-based label deduction | `add.deduce(df, from_column='text', to_column='label')` |
 | `add.scan()` | Data profiling & analysis | `add.scan(df, preset="full")` |
 ## 🧬 Available Expressions
@@ -119,7 +120,7 @@ import additory as add
 # Works with polars
 df_polars = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
-result = add.augment(df_polars, n_rows=100)
+result = add.synthetic(df_polars, n_rows=100)
 # Automatic type detection and conversion
 ```
@@ -193,22 +194,44 @@ patients_with_bsa = add.bsa(patients)
 result = add.fitness_score(add.bmr(add.bmi(patients)))
 ```
-### 🔄 Augment Data Generation
+### 🔄 Synthetic Data Generation
-**Augment** generates additional data similar to your existing dataset using inline strategies.
+**Synthetic** generates additional data similar to your existing dataset using inline strategies.
 ```python
-# Augment existing data (learns from patterns)
-more_customers = add.augment(customers, n_rows=1000)
+# Extend existing data (learns from patterns)
+more_customers = add.synthetic(customers, n_rows=1000)
 # Create data from scratch with strategies
-new_data = add.augment("@new", n_rows=500, strategy={
+new_data = add.synthetic("@new", n_rows=500, strategy={
     'id': 'increment:start=1',
     'name': 'choice:[John,Jane,Bob]',
     'age': 'range:18-65'
 })
 ```
+### 🤖 Text-Based Label Deduction
+**Deduce** automatically fills in missing labels by learning from your existing labeled examples. Pure Python, no LLMs, offline-first.
+```python
+# Deduce missing labels from text
+tickets = pd.DataFrame({
+    "ticket_text": ["Cannot log in", "Billing question", "App crashes", "Need invoice"],
+    "category": ["Technical", "Billing", None, None]
+})
+# Automatically fill in missing categories
+result = add.deduce(tickets, from_column="ticket_text", to_column="category")
+# Use multiple columns for better accuracy
+result = add.deduce(
+    df,
+    from_column=["title", "description"],
+    to_column="category"
+)
+```
 ## 🧪 Examples
 ### E-commerce Data Pipeline
@@ -224,7 +247,7 @@ customers = pd.DataFrame({
 })
 # Generate more customers
-customers = add.augment(customers, n_rows=10000)
+customers = add.synthetic(customers, n_rows=10000)
 # Add customer tiers
 tiers = pd.DataFrame({
@@ -250,7 +273,7 @@ strategy = {
     'height_cm': 'range:150-200'  # Height in cm
 }
-patients = add.augment("@new", n_rows=1000, strategy=strategy)
+patients = add.synthetic("@new", n_rows=1000, strategy=strategy)
 # Convert height to meters for expressions
 patients['height_m'] = patients['height_cm'] / 100
@@ -265,19 +288,19 @@ print(result.correlations)
 ## 📚 Documentation
-- **[Function Documentation](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0/)** - Detailed guides for each function
-- **[Expressions Guide](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0/expressions.html)** - Complete expressions reference
+- **[Function Documentation](https://github.com/sekarkrishna/additory/tree/main/documentation/)** - Detailed guides for each function
+- **[Expressions Guide](https://github.com/sekarkrishna/additory/tree/main/documentation/)** - Complete expressions reference
 ## 📄 License
-MIT License - see [LICENSE](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/LICENSE) file for details.
+MIT License - see [LICENSE](LICENSE) file for details.
 ## 📞 Support
 - **Issues**: [GitHub Issues](https://github.com/sekarkrishna/additory/issues)
-- **Documentation**: [Full Documentation](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0)
+- **Documentation**: [Full Documentation](https://github.com/sekarkrishna/additory/tree/main/documentation/)
-## 🗺️ v0.1.1 (February 2025)
+## 🗺️ v0.1.1 (January 2026)
 - Enhanced documentation and tutorials
 - Performance optimizations
 - Additional expressions

{additory-0.1.0a3 → additory-0.1.0a4}/README.md RENAMED

@@ -4,7 +4,7 @@
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![Version](https://img.shields.io/badge/version-0.1.0a2-orange.svg)](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/)
+[![Version](https://img.shields.io/badge/version-0.1.0a4-orange.svg)](https://github.com/sekarkrishna/additory)
 **Author:** Krishnamoorthy Sankaran
@@ -17,17 +17,17 @@
 ## 📦 Installation
 ```bash
-pip install additory==0.1.0a2
+pip install additory==0.1.0a4
 ```
 **Optional GPU support:**
 ```bash
-pip install additory[gpu]==0.1.0a2  # Includes cuDF for GPU acceleration
+pip install additory[gpu]==0.1.0a4  # Includes cuDF for GPU acceleration
 ```
 **Development installation:**
 ```bash
-pip install additory[dev]==0.1.0a2  # Includes testing and development tools
+pip install additory[dev]==0.1.0a4  # Includes testing and development tools
 ```
 ## 🎯 Core Functions
@@ -35,7 +35,8 @@ pip install additory[dev]==0.1.0a2  # Includes testing and development tools
 | Function | Purpose | Example |
 |----------|---------|---------|
 | `add.to()` | Lookup/join operations | `add.to(df1, from_df=df2, bring='col', against='key')` |
-| `add.augment()` | Generate additional data | `add.augment(df, n_rows=1000)` |
+| `add.synthetic()` | Generate additional data | `add.synthetic(df, n_rows=1000)` |
+| `add.deduce()` | Text-based label deduction | `add.deduce(df, from_column='text', to_column='label')` |
 | `add.scan()` | Data profiling & analysis | `add.scan(df, preset="full")` |
 ## 🧬 Available Expressions
@@ -84,7 +85,7 @@ import additory as add
 # Works with polars
 df_polars = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
-result = add.augment(df_polars, n_rows=100)
+result = add.synthetic(df_polars, n_rows=100)
 # Automatic type detection and conversion
 ```
@@ -158,22 +159,44 @@ patients_with_bsa = add.bsa(patients)
 result = add.fitness_score(add.bmr(add.bmi(patients)))
 ```
-### 🔄 Augment Data Generation
+### 🔄 Synthetic Data Generation
-**Augment** generates additional data similar to your existing dataset using inline strategies.
+**Synthetic** generates additional data similar to your existing dataset using inline strategies.
 ```python
-# Augment existing data (learns from patterns)
-more_customers = add.augment(customers, n_rows=1000)
+# Extend existing data (learns from patterns)
+more_customers = add.synthetic(customers, n_rows=1000)
 # Create data from scratch with strategies
-new_data = add.augment("@new", n_rows=500, strategy={
+new_data = add.synthetic("@new", n_rows=500, strategy={
     'id': 'increment:start=1',
     'name': 'choice:[John,Jane,Bob]',
     'age': 'range:18-65'
 })
 ```
+### 🤖 Text-Based Label Deduction
+**Deduce** automatically fills in missing labels by learning from your existing labeled examples. Pure Python, no LLMs, offline-first.
+```python
+# Deduce missing labels from text
+tickets = pd.DataFrame({
+    "ticket_text": ["Cannot log in", "Billing question", "App crashes", "Need invoice"],
+    "category": ["Technical", "Billing", None, None]
+})
+# Automatically fill in missing categories
+result = add.deduce(tickets, from_column="ticket_text", to_column="category")
+# Use multiple columns for better accuracy
+result = add.deduce(
+    df,
+    from_column=["title", "description"],
+    to_column="category"
+)
+```
 ## 🧪 Examples
 ### E-commerce Data Pipeline
@@ -189,7 +212,7 @@ customers = pd.DataFrame({
 })
 # Generate more customers
-customers = add.augment(customers, n_rows=10000)
+customers = add.synthetic(customers, n_rows=10000)
 # Add customer tiers
 tiers = pd.DataFrame({
@@ -215,7 +238,7 @@ strategy = {
     'height_cm': 'range:150-200'  # Height in cm
 }
-patients = add.augment("@new", n_rows=1000, strategy=strategy)
+patients = add.synthetic("@new", n_rows=1000, strategy=strategy)
 # Convert height to meters for expressions
 patients['height_m'] = patients['height_cm'] / 100
@@ -230,19 +253,19 @@ print(result.correlations)
 ## 📚 Documentation
-- **[Function Documentation](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0/)** - Detailed guides for each function
-- **[Expressions Guide](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0/expressions.html)** - Complete expressions reference
+- **[Function Documentation](https://github.com/sekarkrishna/additory/tree/main/documentation/)** - Detailed guides for each function
+- **[Expressions Guide](https://github.com/sekarkrishna/additory/tree/main/documentation/)** - Complete expressions reference
 ## 📄 License
-MIT License - see [LICENSE](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/LICENSE) file for details.
+MIT License - see [LICENSE](LICENSE) file for details.
 ## 📞 Support
 - **Issues**: [GitHub Issues](https://github.com/sekarkrishna/additory/issues)
-- **Documentation**: [Full Documentation](https://github.com/sekarkrishna/additory/tree/main/V0.1.0a1/documentation/V0.1.0)
+- **Documentation**: [Full Documentation](https://github.com/sekarkrishna/additory/tree/main/documentation/)
-## 🗺️ v0.1.1 (February 2025)
+## 🗺️ v0.1.1 (January 2026)
 - Enhanced documentation and tutorials
 - Performance optimizations
 - Additional expressions
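
The README text added in this release describes `add.deduce()` as filling in missing labels from existing labeled examples, and the docstring in `additory/dynamic_api.py` names the technique as cosine similarity on TF-IDF vectors. The sketch below illustrates that general idea in pure Python. It is not the package's implementation (the diff does not show `additory.synthetic.deduce`); the helper names `tokenize`, `tfidf_vectors`, `cosine`, and `deduce_labels`, the whitespace tokenizer, the smoothed IDF formula, and the nearest-labeled-example rule are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduce_labels(texts: List[str], labels: List[Optional[str]]) -> List[Optional[str]]:
    """Copy the label of the most similar labeled text onto each unlabeled row."""
    vectors = tfidf_vectors(texts)
    labeled = [i for i, label in enumerate(labels) if label is not None]
    result = list(labels)
    for i, label in enumerate(labels):
        if label is None and labeled:
            best = max(labeled, key=lambda j: cosine(vectors[i], vectors[j]))
            result[i] = labels[best]
    return result


texts = [
    "cannot log in to account",
    "billing question about invoice",
    "app crashes on start",
    "log in fails after update",
    "need a copy of my invoice",
]
labels = ["Technical", "Billing", "Technical", None, None]
print(deduce_labels(texts, labels))
# ['Technical', 'Billing', 'Technical', 'Technical', 'Billing']
```

The example uses three labeled rows, matching the minimum the docstring says `add.deduce()` requires.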

{additory-0.1.0a3 → additory-0.1.0a4}/additory/__init__.py RENAMED

@@ -3,7 +3,7 @@
 from .dynamic_api import add as _api_instance
 # Version information
-__version__ = "0.1.0a3"
+__version__ = "0.1.0a4"
 # Expose the API instance normally
 add = _api_instance

{additory-0.1.0a3 → additory-0.1.0a4}/additory/core/config.py RENAMED

@@ -329,14 +329,14 @@ def set_custom_formula_path(path):
 # backend preference setting
-_backend_preference: str | None = None  # "cpu", "gpu", or None
+_backend_preference: Optional[str] = None  # "cpu", "gpu", or None
-def set_backend_preference(mode: str | None):
+def set_backend_preference(mode: Optional[str]):
     global _backend_preference
     if mode not in (None, "cpu", "gpu"):
         raise ValueError("backend must be 'cpu', 'gpu', or None")
     _backend_preference = mode
-def get_backend_preference() -> str | None:
+def get_backend_preference() -> Optional[str]:
     return _backend_preference
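
This hunk replaces PEP 604 unions (`str | None`) with `typing.Optional[str]`. A plausible reason, given the Python 3.9+ badge in the README, is that `str | None` in an annotation evaluated at definition time raises `TypeError` on Python 3.9; the diff itself does not state the motivation, so treat that as an inference. Below is a minimal self-contained sketch of the three definitions shown in the hunk, with the `typing` import and docstrings added for illustration:

```python
from typing import Optional

# Module-level backend preference: "cpu", "gpu", or None (no preference).
_backend_preference: Optional[str] = None


def set_backend_preference(mode: Optional[str]) -> None:
    """Store the preferred backend, rejecting anything but 'cpu', 'gpu', or None."""
    global _backend_preference
    if mode not in (None, "cpu", "gpu"):
        raise ValueError("backend must be 'cpu', 'gpu', or None")
    _backend_preference = mode


def get_backend_preference() -> Optional[str]:
    """Return the stored preference, or None if none was set."""
    return _backend_preference


set_backend_preference("gpu")
print(get_backend_preference())  # gpu
```

`Optional[str]` is equivalent to `Union[str, None]` and is valid on every Python version the package claims to support.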

{additory-0.1.0a3 → additory-0.1.0a4}/additory/core/registry.py RENAMED

@@ -2,6 +2,7 @@
 # Versioned registry for additory
 from dataclasses import dataclass
+from typing import Optional
 import os
 import json
@@ -26,9 +27,9 @@ class ResolvedFormula:
     source: str
     version: str
     mode: str = "local"
-    ast: dict | None = None
-    sample_clean: dict | None = None
-    sample_unclean: dict | None = None
+    ast: Optional[dict] = None
+    sample_clean: Optional[dict] = None
+    sample_unclean: Optional[dict] = None
 # ------------------------------------------------------------

{additory-0.1.0a3 → additory-0.1.0a4}/additory/dynamic_api.py RENAMED

@@ -30,8 +30,15 @@ class AdditoryAPI(SimpleNamespace):
         self.my = ExpressionProxy(namespace="user")
         self._builtin_proxy = ExpressionProxy(namespace="builtin")
-        # Explicitly set the synthetic method to prevent namespace conflicts
+        # Explicitly set methods to prevent namespace conflicts
         self.synthetic = self._synthetic_method
+        self.deduce = self._deduce_method
+        self.to = self._to_method
+        self.onehotencoding = self._onehotencoding_method
+        self.harmonize_units = self._harmonize_units_method
+        self.scan = self._scan_method
+        self.games = self._games_method
+        self.play = self._play_method
     def __getattr__(self, name):
         """
@@ -118,7 +125,7 @@ class AdditoryAPI(SimpleNamespace):
                 additory.synthetic = self._synthetic_method
             raise
-    def to(self, target_df, from_df=None, bring=None, against=None, **kwargs):
+    def _to_method(self, target_df, from_df=None, bring=None, against=None, **kwargs):
         """
         Add columns from reference dataframe to target dataframe.
@@ -139,7 +146,7 @@ class AdditoryAPI(SimpleNamespace):
         from additory.utilities.lookup import to
         return to(target_df, from_df, bring=bring, against=against, **kwargs)
-    def onehotencoding(self, df, columns=None, **kwargs):
+    def _onehotencoding_method(self, df, columns=None, **kwargs):
         """
         One-hot encode categorical columns.
@@ -154,7 +161,7 @@ class AdditoryAPI(SimpleNamespace):
         from additory.utilities.encoding import onehotencoding
         return onehotencoding(df, column=columns, **kwargs)
-    def harmonize_units(self, df, value_column, unit_column, target_unit=None, position="end", **kwargs):
+    def _harmonize_units_method(self, df, value_column, unit_column, target_unit=None, position="end", **kwargs):
         """
         Harmonize units in a dataframe.
@@ -176,7 +183,7 @@ class AdditoryAPI(SimpleNamespace):
         from additory.utilities.units import harmonize_units
         return harmonize_units(df, value_column, unit_column, target_unit, position, **kwargs)
-    def scan(
+    def _scan_method(
         self,
         df: Union[pl.DataFrame, pd.DataFrame, Any],
         preset: Optional[str] = None,
@@ -259,7 +266,48 @@ class AdditoryAPI(SimpleNamespace):
             verbose=verbose
         )
-    def games(self):
+    def _deduce_method(
+        self,
+        df: Union[pd.DataFrame, pl.DataFrame, Any],
+        from_column: Union[str, List[str]],
+        to_column: str
+    ) -> Union[pd.DataFrame, pl.DataFrame, Any]:
+        """
+        Deduce missing labels based on text similarity to labeled examples.
+        Uses cosine similarity on TF-IDF vectors. Pure Python, no LLMs, offline-first.
+        Requires at least 3 labeled examples to work.
+        When multiple source columns are provided, they are concatenated with
+        spaces before computing similarity.
+        Args:
+            df: DataFrame with some labeled and some unlabeled rows
+            from_column: Text column(s) to analyze
+                        - str: Single column (e.g., "comment")
+                        - List[str]: Multiple columns (e.g., ["comment", "notes"])
+            to_column: Label column to fill (e.g., "status")
+        Returns:
+            DataFrame with deduced labels filled in
+        Examples:
+            # Single column
+            >>> result = add.deduce(df, from_column="comment", to_column="status")
+            # Multiple columns (better accuracy)
+            >>> result = add.deduce(
+            ...     df,
+            ...     from_column=["comment", "notes", "description"],
+            ...     to_column="status"
+            ... )
+        Privacy: Your data never leaves your machine. No external connections.
+        """
+        from additory.synthetic.deduce import deduce as deduce_impl
+        return deduce_impl(df, from_column, to_column)
+    def _games_method(self):
         """
         List available games! 🎮
@@ -275,7 +323,7 @@ class AdditoryAPI(SimpleNamespace):
         """
         return ['tictactoe', 'sudoku']
-    def play(self, game: str = "tictactoe"):
+    def _play_method(self, game: str = "tictactoe"):
         """
         Play a game! 🎮
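
Across this file the public methods are renamed to private `_..._method` functions and bound to their public names as instance attributes in `__init__`. One effect of that pattern is that the public names live in the instance `__dict__` and resolve through normal attribute lookup, so the class's `__getattr__` fallback is never consulted for them. The sketch below shows the mechanism with a hypothetical `MiniAPI` class; the class, its return values, and the `__getattr__` body are illustrative assumptions, and the diff gives no rationale beyond the "prevent namespace conflicts" comment.

```python
from types import SimpleNamespace


class MiniAPI(SimpleNamespace):
    """Hypothetical stand-in for AdditoryAPI; all behavior here is illustrative."""

    def __init__(self):
        super().__init__()
        # Bind public names as instance attributes, mirroring the diff's pattern.
        self.scan = self._scan_method
        self.games = self._games_method

    def __getattr__(self, name):
        # Called only when normal lookup fails. Here it stands in for
        # dynamic expression lookup.
        if name.startswith("_"):
            raise AttributeError(name)
        return lambda *args: f"expression:{name}"

    def _scan_method(self, data):
        return f"scanned {len(data)} rows"

    def _games_method(self):
        return ["tictactoe", "sudoku"]


api = MiniAPI()
print(api.scan([1, 2, 3]))  # scanned 3 rows
print(api.bmi())            # expression:bmi
print("scan" in vars(api))  # True
```

Because `scan` is stored on the instance, it is found before `__getattr__` runs, while an unbound name such as `bmi` falls through to the dynamic lookup.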

{additory-0.1.0a3 → additory-0.1.0a4}/additory/expressions/registry.py RENAMED

@@ -28,9 +28,9 @@ class ResolvedFormula:
     version: str
     mode: str = "local"
     namespace: str = "builtin"  # NEW: "builtin" or "user"
-    ast: dict | None = None
-    sample_clean: dict | None = None
-    sample_unclean: dict | None = None
+    ast: Optional[dict] = None
+    sample_clean: Optional[dict] = None
+    sample_unclean: Optional[dict] = None
 # ------------------------------------------------------------
