PyPI - featcopilot - Versions diffs - 0.2.0__tar.gz → 0.3.0__tar.gz - Mend

featcopilot 0.2.0__tar.gz → 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (52)

{featcopilot-0.2.0 → featcopilot-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: featcopilot
-Version: 0.2.0
+Version: 0.3.0
 Summary: Next-generation LLM-powered auto feature engineering framework with GitHub Copilot SDK
 Author: FeatCopilot Contributors
 License: MIT
@@ -46,8 +46,9 @@ Provides-Extra: benchmark
 Requires-Dist: github-copilot-sdk>=0.1.0; extra == "benchmark"
 Requires-Dist: statsmodels>=0.13.0; extra == "benchmark"
 Requires-Dist: flaml[automl,blendsearch]>=2.0.0; extra == "benchmark"
-Requires-Dist: autogluon.tabular>=1.0.0; extra == "benchmark"
+Requires-Dist: autogluon.tabular[fastai]>=1.5.0; extra == "benchmark"
 Requires-Dist: h2o>=3.40.0; extra == "benchmark"
+Requires-Dist: numpy<2; extra == "benchmark"
 Provides-Extra: dev
 Requires-Dist: pytest>=7.0.0; extra == "dev"
 Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
@@ -63,28 +64,35 @@ Requires-Dist: pre-commit>=3.6.0; extra == "dev"
 FeatCopilot automatically generates, selects, and explains predictive features using semantic understanding. It analyzes column meanings, applies domain-aware transformations, and provides human-readable explanations—turning raw data into ML-ready features in seconds.
+## 🎬 Introduction Video
+[![FeatCopilot Introduction](https://img.youtube.com/vi/H7m50TLGHFk/0.jpg)](https://www.youtube.com/watch?v=H7m50TLGHFk)
 ## 📊 Benchmark Highlights
-### Tabular Engine (Fast Mode - <1s)
+### Simple Models Benchmark (42 Datasets)
+| Configuration | Improved | Avg Improvement | Best Improvement |
+|---------------|----------|-----------------|------------------|
+| **Tabular Engine** | 20 (48%) | +4.54% | +197% (delays_zurich) |
+| **Tabular + LLM** | 23 (55%) | +6.12% | +420% (delays_zurich) |
+Models: RandomForest (n_estimators=200, max_depth=20), LogisticRegression/Ridge
-| Task Type | Average Improvement | Best Case |
-|-----------|--------------------:|----------:|
-| **Text Classification** | **+12.44%** | +49.02% (News Headlines) |
-| Time Series | +1.51% | +12.12% (Retail Demand) |
-| Classification | +0.54% | +4.35% |
-| Regression | +0.65% | +5.57% |
+### AutoML Benchmark (FLAML, 120s budget)
-### LLM Engine (With LiteLLM - 30-60s)
+| Metric | Value |
+|--------|-------|
+| **Datasets** | 41 |
+| **Improved** | 19 (46%) |
+| **Best Improvement** | +8.55% (abalone) |
-| Task Type | Average Improvement | Best Case |
-|-----------|--------------------:|----------:|
-| **Regression** | **+7.79%** | +19.66% (Retail Demand) |
-| Classification | +2.38% | +2.87% |
+### Key Results
-- ✅ **12/12 wins** on text classification (tabular mode)
-- 🧠 **+19.66% max improvement** with LLM-powered features
-- ⚡ **<1 second** (tabular) or **30-60s** (with LLM) processing time
-- 📈 Largest gains with simple models (LogisticRegression, Ridge)
+- ✅ **+197% improvement** on delays_zurich (tabular only)
+- 🧠 **+420% improvement** with LLM-enhanced features
+- 📈 **+8.98%** on abalone regression task
+- 🚀 **+5.68%** on complex_classification
 [View Full Benchmark Results](https://thinkall.github.io/featcopilot/user-guide/benchmarks/)
@@ -131,7 +139,7 @@ print(f"Features: {X.shape[1]} -> {X_transformed.shape[1]}")
 ```python
 from featcopilot import AutoFeatureEngineer
-# LLM-powered semantic features (+19.66% max improvement)
+# LLM-powered semantic features (+420% max improvement)
 engineer = AutoFeatureEngineer(
     engines=['tabular', 'llm'],
     max_features=50

{featcopilot-0.2.0 → featcopilot-0.3.0}/README.md RENAMED

@@ -4,28 +4,35 @@
 FeatCopilot automatically generates, selects, and explains predictive features using semantic understanding. It analyzes column meanings, applies domain-aware transformations, and provides human-readable explanations—turning raw data into ML-ready features in seconds.
+## 🎬 Introduction Video
+[![FeatCopilot Introduction](https://img.youtube.com/vi/H7m50TLGHFk/0.jpg)](https://www.youtube.com/watch?v=H7m50TLGHFk)
 ## 📊 Benchmark Highlights
-### Tabular Engine (Fast Mode - <1s)
+### Simple Models Benchmark (42 Datasets)
+| Configuration | Improved | Avg Improvement | Best Improvement |
+|---------------|----------|-----------------|------------------|
+| **Tabular Engine** | 20 (48%) | +4.54% | +197% (delays_zurich) |
+| **Tabular + LLM** | 23 (55%) | +6.12% | +420% (delays_zurich) |
+Models: RandomForest (n_estimators=200, max_depth=20), LogisticRegression/Ridge
-| Task Type | Average Improvement | Best Case |
-|-----------|--------------------:|----------:|
-| **Text Classification** | **+12.44%** | +49.02% (News Headlines) |
-| Time Series | +1.51% | +12.12% (Retail Demand) |
-| Classification | +0.54% | +4.35% |
-| Regression | +0.65% | +5.57% |
+### AutoML Benchmark (FLAML, 120s budget)
-### LLM Engine (With LiteLLM - 30-60s)
+| Metric | Value |
+|--------|-------|
+| **Datasets** | 41 |
+| **Improved** | 19 (46%) |
+| **Best Improvement** | +8.55% (abalone) |
-| Task Type | Average Improvement | Best Case |
-|-----------|--------------------:|----------:|
-| **Regression** | **+7.79%** | +19.66% (Retail Demand) |
-| Classification | +2.38% | +2.87% |
+### Key Results
-- ✅ **12/12 wins** on text classification (tabular mode)
-- 🧠 **+19.66% max improvement** with LLM-powered features
-- ⚡ **<1 second** (tabular) or **30-60s** (with LLM) processing time
-- 📈 Largest gains with simple models (LogisticRegression, Ridge)
+- ✅ **+197% improvement** on delays_zurich (tabular only)
+- 🧠 **+420% improvement** with LLM-enhanced features
+- 📈 **+8.98%** on abalone regression task
+- 🚀 **+5.68%** on complex_classification
 [View Full Benchmark Results](https://thinkall.github.io/featcopilot/user-guide/benchmarks/)
@@ -72,7 +79,7 @@ print(f"Features: {X.shape[1]} -> {X_transformed.shape[1]}")
 ```python
 from featcopilot import AutoFeatureEngineer
-# LLM-powered semantic features (+19.66% max improvement)
+# LLM-powered semantic features (+420% max improvement)
 engineer = AutoFeatureEngineer(
     engines=['tabular', 'llm'],
     max_features=50

{featcopilot-0.2.0 → featcopilot-0.3.0}/featcopilot/__init__.py RENAMED

@@ -12,6 +12,9 @@ __author__ = "FeatCopilot Contributors"
 from featcopilot.core.base import BaseEngine, BaseSelector
 from featcopilot.core.feature import Feature, FeatureSet
+from featcopilot.core.transform_rule import TransformRule
+from featcopilot.llm.transform_rule_generator import TransformRuleGenerator
+from featcopilot.stores.rule_store import TransformRuleStore
 from featcopilot.transformers.sklearn_compat import (
     AutoFeatureEngineer,
     FeatureEngineerTransformer,
@@ -23,6 +26,10 @@ __all__ = [
     "BaseSelector",
     "Feature",
     "FeatureSet",
+    # Transform Rules
+    "TransformRule",
+    "TransformRuleStore",
+    "TransformRuleGenerator",
     # Main API
     "AutoFeatureEngineer",
     "FeatureEngineerTransformer",

{featcopilot-0.2.0 → featcopilot-0.3.0}/featcopilot/core/__init__.py RENAMED

@@ -3,6 +3,7 @@
 from featcopilot.core.base import BaseEngine, BaseSelector
 from featcopilot.core.feature import Feature, FeatureSet
 from featcopilot.core.registry import FeatureRegistry
+from featcopilot.core.transform_rule import TransformRule
 __all__ = [
     "BaseEngine",
@@ -10,4 +11,5 @@ __all__ = [
     "Feature",
     "FeatureSet",
     "FeatureRegistry",
+    "TransformRule",
 ]

featcopilot-0.3.0/featcopilot/core/transform_rule.py ADDED

@@ -0,0 +1,276 @@
+"""Transform rule model for reusable feature transformations.
+Defines TransformRule - a reusable transformation that can be created from
+natural language descriptions and applied across different datasets.
+"""
+import re
+import uuid
+from datetime import datetime, timezone
+from typing import Any, Optional
+import numpy as np
+import pandas as pd
+from pydantic import BaseModel, Field
+from featcopilot.utils.logger import get_logger
+logger = get_logger(__name__)
+class TransformRule(BaseModel):
+    """
+    A reusable feature transformation rule.
+    Transform rules capture feature engineering logic that can be generated
+    from natural language descriptions and reused across different datasets.
+    Parameters
+    ----------
+    id : str, optional
+        Unique identifier for the rule
+    name : str
+        Human-readable name for the rule
+    description : str
+        Natural language description of what the rule does
+    code : str
+        Python code that implements the transformation
+    input_columns : list[str]
+        Column names or patterns this rule expects as input
+    output_name : str, optional
+        Name for the output feature (default: derived from rule name)
+    output_type : str
+        Expected output data type ('numeric', 'categorical', 'boolean')
+    tags : list[str]
+        Tags for categorization and search
+    column_patterns : list[str]
+        Regex patterns for matching columns (e.g., 'price.*', '.*_amount')
+    usage_count : int
+        Number of times this rule has been applied
+    created_at : str
+        ISO timestamp of rule creation
+    metadata : dict
+        Additional metadata
+    Examples
+    --------
+    >>> rule = TransformRule(
+    ...     name="ratio_calculation",
+    ...     description="Calculate ratio of two numeric columns",
+    ...     code="result = df['{col1}'] / (df['{col2}'] + 1e-8)",
+    ...     input_columns=["col1", "col2"],
+    ...     tags=["ratio", "numeric"]
+    ... )
+    >>> result = rule.apply(df, column_mapping={"col1": "price", "col2": "quantity"})
+    """
+    id: str = Field(default_factory=lambda: str(uuid.uuid4())[:8], description="Unique rule identifier")
+    name: str = Field(description="Human-readable rule name")
+    description: str = Field(description="Natural language description of the transformation")
+    code: str = Field(description="Python code implementing the transformation")
+    input_columns: list[str] = Field(default_factory=list, description="Expected input column names or placeholders")
+    output_name: Optional[str] = Field(default=None, description="Output feature name")
+    output_type: str = Field(default="numeric", description="Output data type")
+    tags: list[str] = Field(default_factory=list, description="Tags for categorization")
+    column_patterns: list[str] = Field(default_factory=list, description="Regex patterns for column matching")
+    usage_count: int = Field(default=0, description="Number of times applied")
+    created_at: str = Field(
+        default_factory=lambda: datetime.now(timezone.utc).isoformat(), description="Creation timestamp"
+    )
+    metadata: dict[str, Any] = Field(default_factory=dict, description="Additional metadata")
+    def get_output_name(self, column_mapping: Optional[dict[str, str]] = None) -> str:
+        """
+        Get the output feature name.
+        Parameters
+        ----------
+        column_mapping : dict, optional
+            Mapping from placeholder columns to actual column names
+        Returns
+        -------
+        str
+            Output feature name
+        """
+        if self.output_name:
+            return self.output_name
+        # Generate name from input columns
+        if column_mapping and self.input_columns:
+            cols = [column_mapping.get(c, c) for c in self.input_columns[:2]]
+            return f"{'_'.join(cols)}_{self.name}"
+        return f"rule_{self.name}"
+    def matches_columns(self, columns: list[str]) -> tuple[bool, dict[str, str]]:
+        """
+        Check if this rule can be applied to the given columns.
+        Parameters
+        ----------
+        columns : list[str]
+            Available column names
+        Returns
+        -------
+        matches : bool
+            Whether the rule can be applied
+        mapping : dict
+            Suggested mapping from rule's input_columns to actual columns
+        """
+        if not self.input_columns:
+            return True, {}
+        mapping = {}
+        for input_col in self.input_columns:
+            # Try exact match first
+            if input_col in columns:
+                mapping[input_col] = input_col
+                continue
+            # Try pattern matching
+            matched = False
+            for pattern in self.column_patterns:
+                regex = re.compile(pattern, re.IGNORECASE)
+                for col in columns:
+                    if regex.match(col) and col not in mapping.values():
+                        mapping[input_col] = col
+                        matched = True
+                        break
+                if matched:
+                    break
+            # Try fuzzy matching by checking if input_col is substring
+            if not matched:
+                for col in columns:
+                    if input_col.lower() in col.lower() and col not in mapping.values():
+                        mapping[input_col] = col
+                        matched = True
+                        break
+            if not matched:
+                return False, {}
+        return len(mapping) == len(self.input_columns), mapping
+    def apply(
+        self,
+        df: pd.DataFrame,
+        column_mapping: Optional[dict[str, str]] = None,
+        validate: bool = True,
+    ) -> pd.Series:
+        """
+        Apply the transformation rule to a DataFrame.
+        Parameters
+        ----------
+        df : DataFrame
+            Input data
+        column_mapping : dict, optional
+            Mapping from rule's input_columns to actual column names
+        validate : bool, default=True
+            Whether to validate before execution
+        Returns
+        -------
+        Series
+            Transformed feature values
+        Raises
+        ------
+        ValueError
+            If required columns are missing or code execution fails
+        """
+        column_mapping = column_mapping or {}
+        # Prepare the code with actual column names
+        code = self._prepare_code(column_mapping)
+        if validate:
+            # Check required columns exist
+            for input_col in self.input_columns:
+                actual_col = column_mapping.get(input_col, input_col)
+                if actual_col not in df.columns:
+                    raise ValueError(f"Required column '{actual_col}' not found in DataFrame")
+        # Execute the code in a restricted environment
+        local_vars: dict[str, Any] = {"df": df, "np": np, "pd": pd}
+        try:
+            exec(self._get_safe_code(code), {"__builtins__": self._get_safe_builtins()}, local_vars)
+            if "result" not in local_vars:
+                raise ValueError("Code did not produce a 'result' variable")
+            result = local_vars["result"]
+            # Increment usage count
+            self.usage_count += 1
+            return result
+        except Exception as e:
+            logger.error(f"Failed to apply rule '{self.name}': {e}")
+            raise ValueError(f"Rule execution failed: {e}") from e
+    def _prepare_code(self, column_mapping: dict[str, str]) -> str:
+        """Substitute column placeholders with actual column names."""
+        code = self.code
+        # Replace {col} style placeholders
+        for placeholder, actual in column_mapping.items():
+            code = code.replace(f"{{{{ '{placeholder}' }}}}", f"'{actual}'")
+            code = code.replace(f"{{{placeholder}}}", actual)
+            code = code.replace(f"df['{placeholder}']", f"df['{actual}']")
+            code = code.replace(f'df["{placeholder}"]', f'df["{actual}"]')
+        return code
+    def _get_safe_code(self, code: str) -> str:
+        """Wrap code for safe execution."""
+        return code
+    def _get_safe_builtins(self) -> dict[str, Any]:
+        """Get restricted builtins for safe code execution."""
+        return {
+            "len": len,
+            "sum": sum,
+            "max": max,
+            "min": min,
+            "int": int,
+            "float": float,
+            "str": str,
+            "bool": bool,
+            "abs": abs,
+            "round": round,
+            "pow": pow,
+            "range": range,
+            "list": list,
+            "dict": dict,
+            "set": set,
+            "tuple": tuple,
+            "sorted": sorted,
+            "reversed": reversed,
+            "enumerate": enumerate,
+            "zip": zip,
+            "any": any,
+            "all": all,
+            "map": map,
+            "filter": filter,
+            "isinstance": isinstance,
+            "hasattr": hasattr,
+            "getattr": getattr,
+        }
+    def to_dict(self) -> dict[str, Any]:
+        """Convert rule to dictionary for serialization."""
+        return self.model_dump()
+    @classmethod
+    def from_dict(cls, data: dict[str, Any]) -> "TransformRule":
+        """Create rule from dictionary."""
+        return cls(**data)
+    def __repr__(self) -> str:
+        return f"TransformRule(name='{self.name}', description='{self.description[:50]}...')"
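
Note on `matches_columns`: the lookup runs in three stages per placeholder (exact name, then each regex in `column_patterns`, then case-insensitive substring). The sketch below reproduces that order as a free function using only the standard library. `match_columns` is a hypothetical name for illustration, not part of the featcopilot API.

```python
import re


def match_columns(input_columns, column_patterns, columns):
    """Standalone mirror of the matching order shown in the diff."""
    mapping = {}
    for input_col in input_columns:
        # Stage 1: an exact column name wins outright.
        if input_col in columns:
            mapping[input_col] = input_col
            continue
        matched = False
        # Stage 2: regex patterns; re.match anchors at the start of the name.
        for pattern in column_patterns:
            regex = re.compile(pattern, re.IGNORECASE)
            for col in columns:
                if regex.match(col) and col not in mapping.values():
                    mapping[input_col] = col
                    matched = True
                    break
            if matched:
                break
        # Stage 3: case-insensitive substring fallback.
        if not matched:
            for col in columns:
                if input_col.lower() in col.lower() and col not in mapping.values():
                    mapping[input_col] = col
                    matched = True
                    break
        if not matched:
            return False, {}
    return len(mapping) == len(input_columns), mapping


# Substring fallback only (no patterns supplied).
by_substring = match_columns(
    ["quantity", "price"], [], ["unit_price", "Quantity", "region"]
)

# Regex stage: "amount" is matched through the pattern, not by name.
by_pattern = match_columns(["amount"], [r".*_amt"], ["id", "order_amt"])

# Patterns are not keyed by placeholder, so the first placeholder without an
# exact match can claim the column a pattern was meant for.
shared_pattern = match_columns(
    ["price", "qty"], [r"quant.*"], ["unit_price", "quantity", "region"]
)
```

In the last call, `price` is bound to `quantity` through the shared pattern before the substring stage is tried, which leaves `qty` without a candidate and makes the whole match fail. Passing an explicit `column_mapping` to `apply` sidesteps this.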

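Note on `apply`: the rule code goes through placeholder substitution (`_prepare_code`) and is then passed to `exec` with a trimmed `__builtins__` dict. A minimal sketch of that flow follows, assuming a plain dict in place of the DataFrame so it runs without pandas. `prepare_code`, `run_rule`, `is_blocked`, and the shortened `SAFE_BUILTINS` are illustrative names and a subset of the real whitelist, not the package's API.

```python
# Illustrative subset of the whitelist returned by _get_safe_builtins.
SAFE_BUILTINS = {"len": len, "sum": sum, "abs": abs, "round": round}


def prepare_code(code, column_mapping):
    """Substitute {placeholder} and df['placeholder'] forms with real names."""
    for placeholder, actual in column_mapping.items():
        code = code.replace(f"{{{placeholder}}}", actual)
        code = code.replace(f"df['{placeholder}']", f"df['{actual}']")
        code = code.replace(f'df["{placeholder}"]', f'df["{actual}"]')
    return code


def run_rule(code, data):
    """Execute rule code with trimmed builtins and return its `result`."""
    local_vars = {"df": data}
    try:
        exec(code, {"__builtins__": SAFE_BUILTINS}, local_vars)
        if "result" not in local_vars:
            raise ValueError("Code did not produce a 'result' variable")
        return local_vars["result"]
    except Exception as e:
        # Every failure is re-raised as ValueError, as in the diff.
        raise ValueError(f"Rule execution failed: {e}") from e


def is_blocked(snippet):
    """Return True when the snippet fails under the trimmed builtins."""
    try:
        run_rule(snippet, {})
    except ValueError:
        return True
    return False


template = "result = df['{col1}'] / (df['{col2}'] + 1e-8)"
code = prepare_code(template, {"col1": "price", "col2": "quantity"})
ratio = run_rule(code, {"price": 10.0, "quantity": 4.0})

import_blocked = is_blocked("import os\nresult = 1")
open_blocked = is_blocked("result = open('x')")
attribute_access_blocked = is_blocked("result = ().__class__.__name__")
```

Trimming `__builtins__` stops `import` and `open`, but attribute access on any object in scope still works (the last check is not blocked). This mechanism is generally not treated as a security boundary in CPython, so rules from untrusted sources likely need review before they are applied.
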
{featcopilot-0.2.0 → featcopilot-0.3.0}/featcopilot/engines/tabular.py RENAMED

@@ -30,6 +30,16 @@ class TabularEngineConfig(EngineConfig):
     )
     numeric_only: bool = Field(default=True, description="Only process numeric columns")
     min_unique_values: int = Field(default=5, description="Min unique values for continuous")
+    # Categorical encoding settings
+    encode_categorical: bool = Field(default=True, description="Auto-encode categorical columns")
+    keep_original_categorical: bool = Field(
+        default=True, description="Keep original categorical columns (for models that handle them natively)"
+    )
+    onehot_ratio_threshold: float = Field(default=0.05, description="Max n_unique/n_rows ratio for one-hot encoding")
+    target_encode_ratio_threshold: float = Field(
+        default=0.5, description="Max n_unique/n_rows ratio for target encoding"
+    )
+    min_samples_per_category: int = Field(default=3, description="Min samples per category to include")
 class TabularEngine(BaseEngine):
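
Note on the two ratio thresholds: both are compared against `n_unique / n_rows`, and `_fit_categorical_encoding` (further down in this file's diff) uses them to pick one of three outcomes. The sketch below works through the arithmetic with the defaults from the config (0.05 and 0.5). `choose_encoding` is a hypothetical helper, and it leaves out the separate `min_samples_per_category` filter.

```python
def choose_encoding(
    n_unique,
    n_rows,
    has_target,
    onehot_ratio_threshold=0.05,
    target_encode_ratio_threshold=0.5,
):
    """Pick an encoding from the cardinality ratio, mirroring the diff's branches."""
    ratio = n_unique / n_rows
    if ratio <= onehot_ratio_threshold:
        return "onehot"
    # Target encoding needs y; without it the column falls through to "skip".
    if ratio <= target_encode_ratio_threshold and has_target:
        return "target"
    return "skip"


# 1000 rows in every case; only the number of distinct values changes.
low = choose_encoding(12, 1000, has_target=True)         # ratio 0.012
medium = choose_encoding(200, 1000, has_target=True)     # ratio 0.2
medium_no_y = choose_encoding(200, 1000, has_target=False)
high = choose_encoding(900, 1000, has_target=True)       # ratio 0.9
```

Because the cut-offs are ratios rather than absolute counts, the same column can switch strategy as the dataset grows or shrinks.
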
@@ -81,6 +91,10 @@ class TabularEngine(BaseEngine):
         include_transforms: Optional[list[str]] = None,
         max_features: Optional[int] = None,
         verbose: bool = False,
+        encode_categorical: bool = True,
+        onehot_ratio_threshold: float = 0.05,
+        target_encode_ratio_threshold: float = 0.5,
+        min_samples_per_category: int = 3,
         **kwargs,
     ):
         config = TabularEngineConfig(
@@ -89,12 +103,22 @@ class TabularEngine(BaseEngine):
             include_transforms=include_transforms or ["log", "sqrt", "square"],
             max_features=max_features,
             verbose=verbose,
+            encode_categorical=encode_categorical,
+            onehot_ratio_threshold=onehot_ratio_threshold,
+            target_encode_ratio_threshold=target_encode_ratio_threshold,
+            min_samples_per_category=min_samples_per_category,
             **kwargs,
         )
         super().__init__(config=config)
         self.config: TabularEngineConfig = config
         self._numeric_columns: list[str] = []
         self._feature_set = FeatureSet()
+        # Categorical encoding state
+        self._onehot_columns: list[str] = []
+        self._target_encode_columns: list[str] = []
+        self._onehot_categories: dict[str, list] = {}
+        self._target_encode_maps: dict[str, dict] = {}
+        self._target_encode_global_mean: float = 0.0
     def fit(
         self,
@@ -110,7 +134,7 @@ class TabularEngine(BaseEngine):
         X : DataFrame or ndarray
             Input features
         y : Series or ndarray, optional
-            Target variable (unused, for API compatibility)
+            Target variable (used for target encoding of categorical columns)
         Returns
         -------
@@ -129,12 +153,91 @@ class TabularEngine(BaseEngine):
         if self.config.verbose:
             logger.info(f"TabularEngine: Found {len(self._numeric_columns)} numeric columns")
+        # Handle categorical columns
+        if self.config.encode_categorical:
+            self._fit_categorical_encoding(X, y)
         # Plan features to generate
         self._plan_features(X)
         self._is_fitted = True
         return self
+    def _fit_categorical_encoding(self, X: pd.DataFrame, y: Optional[Union[pd.Series, np.ndarray]] = None) -> None:
+        """Fit categorical encoding based on cardinality ratio."""
+        self._onehot_columns = []
+        self._target_encode_columns = []
+        self._onehot_categories = {}
+        self._target_encode_maps = {}
+        self._target_label_encoder = None  # For string targets
+        # Find categorical columns (object or category dtype)
+        cat_cols = X.select_dtypes(include=["object", "category"]).columns.tolist()
+        if not cat_cols:
+            return
+        n_rows = len(X)
+        y_encoded = None
+        if y is not None:
+            y_series = pd.Series(y) if not isinstance(y, pd.Series) else y
+            # Check if target is string/categorical - encode it for target encoding
+            if y_series.dtype == "object" or y_series.dtype.name == "category":
+                from sklearn.preprocessing import LabelEncoder
+                self._target_label_encoder = LabelEncoder()
+                y_encoded = pd.Series(self._target_label_encoder.fit_transform(y_series.astype(str)))
+                self._target_encode_global_mean = float(y_encoded.mean())
+            else:
+                y_encoded = y_series
+                self._target_encode_global_mean = float(y_series.mean())
+        for col in cat_cols:
+            n_unique = X[col].nunique()
+            ratio = n_unique / n_rows
+            # Count samples per category
+            value_counts = X[col].value_counts()
+            # Filter categories with enough samples
+            valid_categories = value_counts[value_counts >= self.config.min_samples_per_category].index.tolist()
+            if len(valid_categories) == 0:
+                if self.config.verbose:
+                    logger.info(f"TabularEngine: Skipping '{col}' - no categories with enough samples")
+                continue
+            if ratio <= self.config.onehot_ratio_threshold:
+                # One-hot encoding for low cardinality
+                self._onehot_columns.append(col)
+                self._onehot_categories[col] = valid_categories
+                if self.config.verbose:
+                    logger.info(
+                        f"TabularEngine: One-hot encoding '{col}' "
+                        f"({len(valid_categories)} categories, ratio={ratio:.4f})"
+                    )
+            elif ratio <= self.config.target_encode_ratio_threshold and y_encoded is not None:
+                # Target encoding for medium cardinality
+                self._target_encode_columns.append(col)
+                # Compute target mean per category (using encoded target for string labels)
+                df_temp = pd.DataFrame({"col": X[col], "y": y_encoded})
+                target_means = df_temp.groupby("col")["y"].mean().to_dict()
+                # Only keep valid categories
+                self._target_encode_maps[col] = {k: v for k, v in target_means.items() if k in valid_categories}
+                if self.config.verbose:
+                    logger.info(
+                        f"TabularEngine: Target encoding '{col}' "
+                        f"({len(self._target_encode_maps[col])} categories, ratio={ratio:.4f})"
+                    )
+            else:
+                # High cardinality - likely ID column, skip
+                if self.config.verbose:
+                    logger.info(
+                        f"TabularEngine: Skipping '{col}' - high cardinality " f"({n_unique} unique, ratio={ratio:.4f})"
+                    )
     def _plan_features(self, X: pd.DataFrame) -> None:
         """Plan which features to generate."""
         self._feature_set = FeatureSet()
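
Note on the target-encoding fit: the map stores the per-category mean of the target, keeps only categories with at least `min_samples_per_category` rows, and records the global mean for later fallback. A dependency-free sketch of the same computation follows, assuming numeric targets and no missing values. `fit_target_encoding` is a hypothetical name.

```python
from collections import Counter, defaultdict


def fit_target_encoding(categories, targets, min_samples_per_category=3):
    """Return (per-category target mean for frequent categories, global mean)."""
    counts = Counter(categories)
    # Categories below the sample threshold are left out of the map.
    valid = {c for c, n in counts.items() if n >= min_samples_per_category}
    sums = defaultdict(float)
    for c, t in zip(categories, targets):
        sums[c] += t
    encode_map = {c: sums[c] / counts[c] for c in valid}
    global_mean = sum(targets) / len(targets)
    return encode_map, global_mean


categories = ["a", "a", "a", "b", "b", "b", "c"]
targets = [1, 0, 1, 0, 0, 0, 1]
encode_map, global_mean = fit_target_encoding(categories, targets)
```

Here `c` appears once, so it is dropped from the map. At transform time it would receive the global mean, like any unseen value.
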
@@ -231,11 +334,19 @@ class TabularEngine(BaseEngine):
         X = self._validate_input(X)
         result = X.copy()
+        original_columns = set(X.columns)
+        # Apply categorical encoding first
+        if self.config.encode_categorical:
+            result = self._transform_categorical(result)
         cols = self._numeric_columns
-        feature_count = 0
         max_features = self.config.max_features
+        # Count categorical features generated so far against max_features
+        categorical_features = [c for c in result.columns if c not in original_columns]
+        feature_count = len(categorical_features)
         # Generate polynomial features
         if not self.config.interaction_only:
             for col in cols:
@@ -291,6 +402,38 @@ class TabularEngine(BaseEngine):
         return result
+    def _transform_categorical(self, X: pd.DataFrame) -> pd.DataFrame:
+        """Apply categorical encoding to DataFrame."""
+        result = X.copy()
+        # One-hot encoding
+        for col in self._onehot_columns:
+            if col not in result.columns:
+                continue
+            categories = self._onehot_categories.get(col, [])
+            for cat in categories:
+                col_name = f"{col}_{cat}"
+                result[col_name] = (result[col] == cat).astype(int)
+            # Add "other" column for rare categories
+            col_other = f"{col}_other"
+            result[col_other] = (~result[col].isin(categories)).astype(int)
+            # Drop original column only if not keeping original categorical
+            if not self.config.keep_original_categorical:
+                result = result.drop(columns=[col])
+        # Target encoding
+        for col in self._target_encode_columns:
+            if col not in result.columns:
+                continue
+            encode_map = self._target_encode_maps.get(col, {})
+            col_name = f"{col}_target_encoded"
+            result[col_name] = result[col].map(encode_map).fillna(self._target_encode_global_mean)
+            # Drop original column only if not keeping original categorical
+            if not self.config.keep_original_categorical:
+                result = result.drop(columns=[col])
+        return result
     def get_feature_set(self) -> FeatureSet:
         """Get the feature set with metadata."""
         return self._feature_set

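Note on `_transform_categorical`: one-hot columns are named `{col}_{cat}` with an extra `{col}_other` indicator for values outside the fitted categories, and target-encoded columns fall back to the global mean for unmapped values. The sketch below mirrors both steps on plain lists, without pandas. The function names are illustrative only.

```python
def one_hot_with_other(col, values, categories):
    """Build {col}_{cat} indicator lists plus a {col}_other catch-all."""
    encoded = {f"{col}_{cat}": [int(v == cat) for v in values] for cat in categories}
    encoded[f"{col}_other"] = [int(v not in categories) for v in values]
    return encoded


def apply_target_encoding(values, encode_map, global_mean):
    """Map each value to its fitted mean, using the global mean when unmapped."""
    return [encode_map.get(v, global_mean) for v in values]


onehot = one_hot_with_other("city", ["NY", "LA", "SF", "NY"], ["NY", "LA"])
encoded = apply_target_encoding(["a", "z"], {"a": 0.5}, global_mean=0.3)
```

One thing to watch with this naming scheme: a category literally called `other` would produce the same column name as the catch-all indicator, so one would overwrite the other.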