PyPI - mloda - Versions diffs - 0.2.15__py3-none-any.whl → 0.3.0__py3-none-any.whl - Mend

mloda 0.2.15 (py3-none-any.whl) → 0.3.0 (py3-none-any.whl)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

{mloda-0.2.15.dist-info → mloda-0.3.0.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mloda
-Version: 0.2.15
+Version: 0.3.0
 Summary: Rethinking Data and Feature Engineering
 Author-email: Tom Kaltofen <info@mloda.ai>
 License: Apache-2.0
@@ -87,7 +87,7 @@ result = mlodaAPI.run_all(
     features=[
         "customer_id",                    # Original column
         "age",                            # Original column
-        "standard_scaled__income"         # Transform: scale income to mean=0, std=1
+        "income__standard_scaled"         # Transform: scale income to mean=0, std=1
     ],
     compute_frameworks={PandasDataframe}
 )
@@ -104,54 +104,54 @@ print(data.head())
 3. **mlodaAPI.run_all()** - Executed the feature pipeline:
    - Got data from `SampleData`
    - Extracted `customer_id` and `age` as-is
-   - Applied StandardScaler to `income` → `standard_scaled__income`
+   - Applied StandardScaler to `income` → `income__standard_scaled`
 4. **result[0]** - Retrieved the processed pandas DataFrame
-> **Key Insight**: The syntax `standard_scaled__income` is mloda's **feature chaining**. Behind the scenes, mloda creates a chain of **feature group** objects (`StandardScalingFeatureGroup` → `SourceFeatureGroup`), automatically resolving dependencies. See [Section 2](#2-understanding-feature-chaining-transformations) for full explanation of chaining syntax and [Section 4](#4-advanced-feature-objects-for-complex-configurations) to learn about the underlying feature group architecture.
+> **Key Insight**: The syntax `income__standard_scaled` is mloda's **feature chaining**. Behind the scenes, mloda creates a chain of **feature group** objects (`SourceFeatureGroup` → `StandardScalingFeatureGroup`), automatically resolving dependencies. See [Section 2](#2-understanding-feature-chaining-transformations) for full explanation of chaining syntax and [Section 4](#4-advanced-feature-objects-for-complex-configurations) to learn about the underlying feature group architecture.
 ### 2. Understanding Feature Chaining (Transformations)
 **The Power of Double Underscore `__` Syntax**
-As mentioned in Section 1, feature chaining (like `standard_scaled__income`) is syntactic sugar that mloda converts into a chain of **feature group objects**. Each transformation (`standard_scaled`, `mean_imputed`, etc.) corresponds to a specific feature group class.
+As mentioned in Section 1, feature chaining (like `income__standard_scaled`) is syntactic sugar that mloda converts into a chain of **feature group objects**. Each transformation (`standard_scaled`, `mean_imputed`, etc.) corresponds to a specific feature group class.
 mloda's chaining syntax lets you compose transformations using `__` as a separator:
 ```python
 # Pattern examples (these show the syntax):
-#   "standard_scaled__income"                     # Scale income column
-#   "mean_imputed__age"                           # Fill missing age values with mean
-#   "onehot_encoded__category"                    # One-hot encode category column
+#   "income__standard_scaled"                     # Scale income column
+#   "age__mean_imputed"                           # Fill missing age values with mean
+#   "category__onehot_encoded"                    # One-hot encode category column
 #
 # You can chain transformations!
-# Pattern: {transform2}__{transform1}__{source}
-#   "standard_scaled__mean_imputed__income"       # First impute, then scale
+# Pattern: {source}__{transform1}__{transform2}
+#   "income__mean_imputed__standard_scaled"       # First impute, then scale
 # Real working example:
-_ = ["standard_scaled__income", "mean_imputed__age"]  # Valid feature names
+_ = ["income__standard_scaled", "age__mean_imputed"]  # Valid feature names
 ```
 **Available Transformations:**
 | Transformation | Purpose | Example |
 |---------------|---------|---------|
-| `standard_scaled__` | StandardScaler (mean=0, std=1) | `standard_scaled__income` |
-| `minmax_scaled__` | MinMaxScaler (range [0,1]) | `minmax_scaled__age` |
-| `robust_scaled__` | RobustScaler (median-based, handles outliers) | `robust_scaled__price` |
-| `mean_imputed__` | Fill missing values with mean | `mean_imputed__salary` |
-| `median_imputed__` | Fill missing values with median | `median_imputed__age` |
-| `mode_imputed__` | Fill missing values with mode | `mode_imputed__category` |
-| `onehot_encoded__` | One-hot encoding | `onehot_encoded__state` |
-| `label_encoded__` | Label encoding | `label_encoded__priority` |
+| `__standard_scaled` | StandardScaler (mean=0, std=1) | `income__standard_scaled` |
+| `__minmax_scaled` | MinMaxScaler (range [0,1]) | `age__minmax_scaled` |
+| `__robust_scaled` | RobustScaler (median-based, handles outliers) | `price__robust_scaled` |
+| `__mean_imputed` | Fill missing values with mean | `salary__mean_imputed` |
+| `__median_imputed` | Fill missing values with median | `age__median_imputed` |
+| `__mode_imputed` | Fill missing values with mode | `category__mode_imputed` |
+| `__onehot_encoded` | One-hot encoding | `state__onehot_encoded` |
+| `__label_encoded` | Label encoding | `priority__label_encoded` |
-> **Key Insight**: Transformations are read right-to-left. `standard_scaled__mean_imputed__income` means: take `income` → apply mean imputation → apply standard scaling.
+> **Key Insight**: Transformations are read left-to-right. `income__mean_imputed__standard_scaled` means: take `income` → apply mean imputation → apply standard scaling.
 **When You Need More Control**
 Most of the time, simple string syntax is enough:
 ```python
 # Example feature list (simple strings)
-example_features = ["customer_id", "standard_scaled__income", "onehot_encoded__region"]
+example_features = ["customer_id", "income__standard_scaled", "region__onehot_encoded"]
 ```
 But for advanced configurations, you can explicitly create `Feature` objects with custom options (covered in Section 3).
@@ -160,11 +160,11 @@ But for advanced configurations, you can explicitly create `Feature` objects wit
 **Understanding the Feature Group Architecture**
-Behind the scenes, chaining like `standard_scaled__income` creates feature group objects:
+Behind the scenes, chaining like `income__standard_scaled` creates feature group objects:
 ```python
 # When you write this string:
-"standard_scaled__income"
+"income__standard_scaled"
 # mloda creates this chain of feature groups:
 # StandardScalingFeatureGroup (reads from) → IncomeSourceFeatureGroup
@@ -236,7 +236,7 @@ mloda supports multiple data access patterns depending on your use case:
 # )
 #
 # result = mlodaAPI.run_all(
-#     features=["customer_id", "standard_scaled__income"],
+#     features=["customer_id", "income__standard_scaled"],
 #     compute_frameworks={PandasDataframe},
 #     data_access_collection=data_access
 # )
@@ -254,7 +254,7 @@ mloda supports multiple data access patterns depending on your use case:
 # )
 #
 # result = mlodaAPI.run_all(
-#     features=["customer_id", "standard_scaled__age"],
+#     features=["customer_id", "age__standard_scaled"],
 #     compute_frameworks={PandasDataframe},
 #     api_input_data_collection=api_input_data_collection,
 #     api_data=api_data
@@ -273,7 +273,7 @@ mloda supports multiple compute frameworks (pandas, polars, pyarrow, etc.). Most
 # Using the SampleData class from Section 1
 # Default: Everything processes with pandas
 result = mlodaAPI.run_all(
-    features=["customer_id", "standard_scaled__income"],
+    features=["customer_id", "income__standard_scaled"],
     compute_frameworks={PandasDataframe}  # Use pandas for all features
 )
@@ -334,12 +334,12 @@ from sklearn.metrics import accuracy_score
 result = mlodaAPI.run_all(
     features=[
         "customer_id",
-        "standard_scaled__age",
-        "standard_scaled__income",
-        "robust_scaled__account_balance",
-        "label_encoded__subscription_tier",
-        "label_encoded__region",
-        "label_encoded__customer_segment",
+        "age__standard_scaled",
+        "income__standard_scaled",
+        "account_balance__robust_scaled",
+        "subscription_tier__label_encoded",
+        "region__label_encoded",
+        "customer_segment__label_encoded",
         "churned"
     ],
     compute_frameworks={PandasDataframe}
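
Note on the METADATA hunks above: every README example moves from the prefix form (`{transform2}__{transform1}__{source}`) to the suffix form (`{source}__{transform1}__{transform2}`). A minimal sketch of how existing feature-name strings might be converted is shown below. The helper `to_suffix_style` is hypothetical and is not part of mloda. It assumes that every `__` in a name separates chain steps and that source column names contain no `__` themselves.

```python
def to_suffix_style(old_name: str) -> str:
    """Convert a 0.2.15-style chained name to the 0.3.0 ordering.

    Hypothetical helper, not part of mloda. It assumes each "__" in the
    name separates one chain step from the next, so reversing the parts
    turns "transform2__transform1__source" into
    "source__transform1__transform2".
    """
    parts = old_name.split("__")
    return "__".join(reversed(parts))


if __name__ == "__main__":
    for name in [
        "customer_id",
        "standard_scaled__income",
        "standard_scaled__mean_imputed__income",
    ]:
        print(name, "->", to_suffix_style(name))
```

Names without a `__` separator, such as `customer_id`, pass through unchanged.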

{mloda-0.2.15.dist-info → mloda-0.3.0.dist-info}/RECORD RENAMED

@@ -1,5 +1,5 @@
-mloda-0.2.15.dist-info/licenses/LICENSE.TXT,sha256=gmhQwSkHxjiShsqQ1FpJ-20YFtaa4vRCE7aCx55-6nk,11366
-mloda-0.2.15.dist-info/licenses/NOTICE.md,sha256=Hu10B2sPnGLIHxZ4QhACSLLxukJpeJzjvkzCu48q5fY,520
+mloda-0.3.0.dist-info/licenses/LICENSE.TXT,sha256=gmhQwSkHxjiShsqQ1FpJ-20YFtaa4vRCE7aCx55-6nk,11366
+mloda-0.3.0.dist-info/licenses/NOTICE.md,sha256=Hu10B2sPnGLIHxZ4QhACSLLxukJpeJzjvkzCu48q5fY,520
 mloda_core/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 mloda_core/abstract_plugins/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 mloda_core/abstract_plugins/abstract_feature_group.py,sha256=I3fVEULHUtrvPoc94iyxyBQVacD7GGI5piqJ6FoqgAY,18435
@@ -22,7 +22,7 @@ mloda_core/abstract_plugins/components/options.py,sha256=k3fLwT4DpHN1Dmeht8mtXqj
 mloda_core/abstract_plugins/components/parallelization_modes.py,sha256=k7z5yvyQfhfNYcljfZ0dWBf0ZMpnCSqaW0vajCh202Q,144
 mloda_core/abstract_plugins/components/utils.py,sha256=_ofeiOBQLwYU3_p9JBe61Ihps4dpFUcsrqI6XrA92Yo,530
 mloda_core/abstract_plugins/components/feature_chainer/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-mloda_core/abstract_plugins/components/feature_chainer/feature_chain_parser.py,sha256=xmMIQProp2y5kM6b7IGnkpwaXm2Yq4p7D_FtYX9sCsE,13180
+mloda_core/abstract_plugins/components/feature_chainer/feature_chain_parser.py,sha256=dqwQOLJTOrEFmG-lIwGrKZnJ9rilEDDNAfC373dLJHQ,13289
 mloda_core/abstract_plugins/components/framework_transformer/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 mloda_core/abstract_plugins/components/framework_transformer/base_transformer.py,sha256=3eRSOzYZZ4OHRezvUnw4RLTUjirMGtcZCKQYJ1MuuZU,5793
 mloda_core/abstract_plugins/components/framework_transformer/cfw_transformer.py,sha256=dODu95RTxAmLExId2XxPau-GZhBaGCO6k1sPntcwjfk,4298
@@ -114,7 +114,7 @@ mloda_plugins/compute_framework/base_implementations/python_dict/python_dict_fra
 mloda_plugins/compute_framework/base_implementations/python_dict/python_dict_merge_engine.py,sha256=ueuL1i4B9OmCKYFBGHwXvlTOu_qD-mDdptMcx1VjH1s,8347
 mloda_plugins/compute_framework/base_implementations/python_dict/python_dict_pyarrow_transformer.py,sha256=S0yn42V95bN6Zxv2_JRRmX6NR_o7maEdzPluJrqpqD0,3438
 mloda_plugins/compute_framework/base_implementations/spark/spark_filter_engine.py,sha256=w6Z6cFQhmy1sl4bH5R9KFVdJGq-B5_s0bfHuzmpifKM,5256
-mloda_plugins/compute_framework/base_implementations/spark/spark_framework.py,sha256=Jf57IEHKPXwlpc3A8jnoka8T-JVSFPIny_wxWKo86zw,8168
+mloda_plugins/compute_framework/base_implementations/spark/spark_framework.py,sha256=yiUa66tV8ckTpaWZ-B3YS1_B63j_YIjM_xG-WAcuKIs,8279
 mloda_plugins/compute_framework/base_implementations/spark/spark_merge_engine.py,sha256=syBOP6Ww9A_IfeJc49jpxByeP5PVvZTM9FFTUCZc3Xg,3452
 mloda_plugins/compute_framework/base_implementations/spark/spark_pyarrow_transformer.py,sha256=CtIOllhGdYQisIiG0Ml0haG4sBC2UmrxKl8bhp4gzjY,3303
 mloda_plugins/config/__init__.py,sha256=wm08JOS1kVronYOtmPJZCcEeMlA9wPOCFAIJG_Isi8c,34
@@ -127,29 +127,29 @@ mloda_plugins/feature_group/experimental/__init__.py,sha256=47DEQpj8HBSa-_TImW-5
 mloda_plugins/feature_group/experimental/default_options_key.py,sha256=GpSwOvR806wWZJ93DxC-Y3hnt4g7E4dELm8B5k6mZ0I,1040
 mloda_plugins/feature_group/experimental/source_input_feature.py,sha256=SXnC8iB6WxSbj-w5qtnRHtxV4K9H4qsg3uMJd3zg3GA,11080
 mloda_plugins/feature_group/experimental/aggregated_feature_group/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-mloda_plugins/feature_group/experimental/aggregated_feature_group/base.py,sha256=VpYgJnAQP3jmQ7YXS6239TLExu0nA681-fZoD2CjrqQ,11426
+mloda_plugins/feature_group/experimental/aggregated_feature_group/base.py,sha256=t16yx9bnU8FSQwg4mGoBZWDQ30eWc0TzRE6D5Bg1tc8,11423
 mloda_plugins/feature_group/experimental/aggregated_feature_group/pandas.py,sha256=7ntyidBNFVo-SmGTf1M5M0q-4dKonjQgmeqJ6XmwfYY,5014
 mloda_plugins/feature_group/experimental/aggregated_feature_group/polars_lazy.py,sha256=ulsr6HDHHeaNSA63Fo4FIm5TwrBzFIfQgNrRWfWAX3I,6294
 mloda_plugins/feature_group/experimental/aggregated_feature_group/pyarrow.py,sha256=A7mgNeAwa5-afpnkNDIf3xbDxspqh198wjxkvcMBsF8,5738
 mloda_plugins/feature_group/experimental/clustering/__init__.py,sha256=769NSapfi48V7BBh8zoo-ale2We6K4OV6ocNlzAhfEw,59
-mloda_plugins/feature_group/experimental/clustering/base.py,sha256=7NFibbkFu4Wv5FMc3OFsMvSVqWDKFdoxx8YNk2XiJG0,18592
+mloda_plugins/feature_group/experimental/clustering/base.py,sha256=ijJeAq2nqkc5TNzuz30kSgs4MsFcGvvUf0XbynC1-Bo,18569
 mloda_plugins/feature_group/experimental/clustering/pandas.py,sha256=0k3gBw3ITzt9DMnOG2PCt4o0NzdOQy9-XM15M51Xqas,19327
 mloda_plugins/feature_group/experimental/data_quality/__init__.py,sha256=ga8jdKaLl4bxkxMqNtRbrkHFnRWZIp8f3bR7DVG5d-I,45
 mloda_plugins/feature_group/experimental/data_quality/missing_value/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-mloda_plugins/feature_group/experimental/data_quality/missing_value/base.py,sha256=-fu4ziP1dCtjy0iHwfxNvKYhinaLBK8v29Qr2xc_zoQ,15193
+mloda_plugins/feature_group/experimental/data_quality/missing_value/base.py,sha256=FIJnlIAq6U5PW1pa52W1mXHjPV0_YZB_vR4ml5xrLeM,15780
 mloda_plugins/feature_group/experimental/data_quality/missing_value/pandas.py,sha256=8l-uXJmxjlra8ADQisTQwla2abjT1UUplwuoyKIxp3k,8682
 mloda_plugins/feature_group/experimental/data_quality/missing_value/pyarrow.py,sha256=d13kWrXxdRddQ_6GbX5hKMNKpY9iRwhmVcx0CG5wafQ,14346
 mloda_plugins/feature_group/experimental/data_quality/missing_value/python_dict.py,sha256=OrOd5MZdbnL4DCJFSZYuda5t2b5MvOBqdedgIPisV9g,13968
-mloda_plugins/feature_group/experimental/dimensionality_reduction/base.py,sha256=u3DZMrdz_aopkdrygSkKplJ4y_Jj5Hwww4ot36wJFP4,17431
-mloda_plugins/feature_group/experimental/dimensionality_reduction/pandas.py,sha256=50M72lvFkU4q7QqW8trS26f7NamJnucdrI5fdfnw8uE,13279
+mloda_plugins/feature_group/experimental/dimensionality_reduction/base.py,sha256=aIi6Cx09LxbmQH5geXlR78Cz5cTlMVWWpTbL85NJx34,17466
+mloda_plugins/feature_group/experimental/dimensionality_reduction/pandas.py,sha256=v47-g2gHQnLEjxo0txM9OGlG7nX6kkKrzRTGK0dRkqM,13279
 mloda_plugins/feature_group/experimental/dynamic_feature_group_factory/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 mloda_plugins/feature_group/experimental/dynamic_feature_group_factory/dynamic_feature_group_factory.py,sha256=6EHBHpDKeg9lapzzMeRnvP392JKskhrxWQ_QZYIkH7Q,12850
 mloda_plugins/feature_group/experimental/forecasting/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-mloda_plugins/feature_group/experimental/forecasting/base.py,sha256=7Vzl2tl1Jz057GIYk-IaYJTMPBU21IyLxtGnTrXAecI,24033
+mloda_plugins/feature_group/experimental/forecasting/base.py,sha256=8XSTivQwb-UbF62NjikOT_kMFm_ixHnUGVO1hjHe_uQ,24068
 mloda_plugins/feature_group/experimental/forecasting/forecasting_artifact.py,sha256=41HPYoJEXqTqcv6Zvce-vkL9RZ5YrdzSiJgmEFxGVR0,4289
 mloda_plugins/feature_group/experimental/forecasting/pandas.py,sha256=Qus5jwAPs8bp546Y8e_piw6EoHkuru0Sl1UgdG0k_Yg,28913
 mloda_plugins/feature_group/experimental/geo_distance/__init__.py,sha256=wqp7I3j87AmrVBi2rlqcz4Sj-R1QMe3EasmNFb_Zxg4,85
-mloda_plugins/feature_group/experimental/geo_distance/base.py,sha256=CHYbzIBPypKs22-DKlk_PDXf7-obKr9acGY7CXIyxaE,12259
+mloda_plugins/feature_group/experimental/geo_distance/base.py,sha256=Zz7DC4NbEc-oNqRir50bMNx7y8Bhq33WsKRUQmTDQP4,12801
 mloda_plugins/feature_group/experimental/geo_distance/pandas.py,sha256=KwN_-sdpZobBiFev68ar0JWNXmupmAvh6f5L3CtbBAE,6023
 mloda_plugins/feature_group/experimental/llm/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 mloda_plugins/feature_group/experimental/llm/cli.py,sha256=65VO3deuQyNo2gQWRh6HuJXvzMtnYS6WIdaV-fqCFhc,1409
@@ -181,24 +181,24 @@ mloda_plugins/feature_group/experimental/llm/tools/available/replace_file_tool.p
 mloda_plugins/feature_group/experimental/llm/tools/available/replace_file_tool_which_runs_tox.py,sha256=jTBpsIxF7mzZjeesd9ZeHUDwA17SkbLsL9brvl-YfOo,2119
 mloda_plugins/feature_group/experimental/llm/tools/available/run_single_pytest.py,sha256=dLMb1iunH0EVY7YZ0NmlHC4kVhTOjs2Hjs2412dFTao,4114
 mloda_plugins/feature_group/experimental/llm/tools/available/run_tox.py,sha256=2APL0MD_ExaMzsJK9_WfgDD9dmMY8amsgfc6B4Xgj70,3814
-mloda_plugins/feature_group/experimental/node_centrality/base.py,sha256=2wU4PHrG429B3erVjnpoi9r5uvXisV71gOvXC8phOls,14600
-mloda_plugins/feature_group/experimental/node_centrality/pandas.py,sha256=PI2fjKutagb34WNGPP8yU8lIFU4XR3pLkQ9wFRddkbo,20164
+mloda_plugins/feature_group/experimental/node_centrality/base.py,sha256=bmWEA6qcdmwIz6Va3QYmwjban1YD16sKUiZn8n4Y49Y,14769
+mloda_plugins/feature_group/experimental/node_centrality/pandas.py,sha256=pBvoe-rhAInIPeAKfxLZOrJzAkkauUuxKguhF6XXXws,20261
 mloda_plugins/feature_group/experimental/sklearn/__init__.py,sha256=UubmqLyavXbzW40FeGY06XyORo-x1Uo0WCLcpmPWnAs,208
 mloda_plugins/feature_group/experimental/sklearn/sklearn_artifact.py,sha256=Sa5bIurlF-YZ0ybl1cPJWpLLOUTfaDa1DCffNcEvoVA,12777
 mloda_plugins/feature_group/experimental/sklearn/encoding/__init__.py,sha256=WOe_iTVz2CXmVcL2IUNqhLJQqINFvY2rUktDXsNSOl8,153
-mloda_plugins/feature_group/experimental/sklearn/encoding/base.py,sha256=P88oOfsdXrxZnoAmjMbKhD1ij_RcEWMxywDkdouTgpk,19875
-mloda_plugins/feature_group/experimental/sklearn/encoding/pandas.py,sha256=2kYTEOz_HygPUQSCnprVPHtKLRJg9nhuZim87tpfyJk,6001
+mloda_plugins/feature_group/experimental/sklearn/encoding/base.py,sha256=ikl4PBWU3eUXc9Dxn8llmaEoAtKQ3MaIzRIITbo8IBw,19884
+mloda_plugins/feature_group/experimental/sklearn/encoding/pandas.py,sha256=_U9gD-39wAFVl8tL1QexcJ2WZc7fu6qShuI1L0O1XBI,6001
 mloda_plugins/feature_group/experimental/sklearn/pipeline/__init__.py,sha256=Z_xSZFAFItwRlbBVxbBxwW_S61tQ8r1N8Ih59jTUXqk,199
-mloda_plugins/feature_group/experimental/sklearn/pipeline/base.py,sha256=2Wd4F0fbxCU1KdvwtHpP8M2ir32x3gQI0jFvT78b22U,23646
+mloda_plugins/feature_group/experimental/sklearn/pipeline/base.py,sha256=VsWEp8dNdu3k4NSd6ckPtGBt3hDAnly7a2fzxiylvXM,23447
 mloda_plugins/feature_group/experimental/sklearn/pipeline/pandas.py,sha256=nKLRbqy2q5vFNhgEsHoBnwbaiJheV9bkgizDSYd_epE,4045
 mloda_plugins/feature_group/experimental/sklearn/scaling/__init__.py,sha256=CsQEzK6DJ-WakWqsWTScHYsrBuOwLeX78zYV-NqxuDg,79
-mloda_plugins/feature_group/experimental/sklearn/scaling/base.py,sha256=-rFud7Pu1vrylaF-lflOSG9p7zskppX4GA686dG9-Nk,15409
+mloda_plugins/feature_group/experimental/sklearn/scaling/base.py,sha256=6CqOVyzKgTRdQCRjPT5RFfJTQ453MCO0GOoewpC7cuc,15409
 mloda_plugins/feature_group/experimental/sklearn/scaling/pandas.py,sha256=8-DPSmUsEJVK4dlNh-041FI2YzmQ1Q7p6gWs0Zb7nKI,3960
-mloda_plugins/feature_group/experimental/text_cleaning/base.py,sha256=N36x2njBTTuqCqC1PSB5VFSuG1PfwkrWBJ06XvNNUHc,11350
+mloda_plugins/feature_group/experimental/text_cleaning/base.py,sha256=-7nN7R7-wEkHoGYiry0UHtiL7W5_CKa-T1ktF0q7gUI,11313
 mloda_plugins/feature_group/experimental/text_cleaning/pandas.py,sha256=7RbV8lMUzx5b8ph4IsXnab4v06IByrNOGte9oK7Zz0g,7339
 mloda_plugins/feature_group/experimental/text_cleaning/python_dict.py,sha256=9wRE1RioFRL-OtX467u4OEPvhDTzQAvdB-XAaJ1zDys,7829
 mloda_plugins/feature_group/experimental/time_window/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-mloda_plugins/feature_group/experimental/time_window/base.py,sha256=KSA6z1OTY4Zfwgylgpxa7MHh6HLWCgN1Q1l2-TnaQuY,18217
+mloda_plugins/feature_group/experimental/time_window/base.py,sha256=TAqEFrnHQVzBtVQ4Y2L5yJ8f35SBo0j9_AFZJJ6bakk,18367
 mloda_plugins/feature_group/experimental/time_window/pandas.py,sha256=YFjkO2Xu_vnB1XfQx2bElKRpUty0Ldic04hiYJKYfEo,7863
 mloda_plugins/feature_group/experimental/time_window/pyarrow.py,sha256=SVwlfIt2qZVFp3InfLoszdSIBZh_EYFGzvIvRW9RVfA,10762
 mloda_plugins/feature_group/input_data/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
@@ -222,8 +222,8 @@ mloda_plugins/function_extender/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm
 mloda_plugins/function_extender/base_implementations/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 mloda_plugins/function_extender/base_implementations/otel/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 mloda_plugins/function_extender/base_implementations/otel/otel_extender.py,sha256=M8GKb55ZGaoRaNCQOp69qr3w8jSMSD6D3VuGBpfw2t4,731
-mloda-0.2.15.dist-info/METADATA,sha256=JcCj2VopjqSjyA33sQBtDSJeiUjOJyqed7pnRdR1LpA,16644
-mloda-0.2.15.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-mloda-0.2.15.dist-info/entry_points.txt,sha256=f7hp7s4laABj9eN5YwEjQAyInF-fa687MXdz-hKYMIA,80
-mloda-0.2.15.dist-info/top_level.txt,sha256=KScNbTs4_vV-mJ1pIlP6cyvMl611B3hNxVYj2hA0Ex4,25
-mloda-0.2.15.dist-info/RECORD,,
+mloda-0.3.0.dist-info/METADATA,sha256=gR1iP4xYXJNucYNPRsxqS8XRs9lv3Dl21indx8rESeQ,16643
+mloda-0.3.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+mloda-0.3.0.dist-info/entry_points.txt,sha256=f7hp7s4laABj9eN5YwEjQAyInF-fa687MXdz-hKYMIA,80
+mloda-0.3.0.dist-info/top_level.txt,sha256=KScNbTs4_vV-mJ1pIlP6cyvMl611B3hNxVYj2hA0Ex4,25
+mloda-0.3.0.dist-info/RECORD,,

mloda_core/abstract_plugins/components/feature_chainer/feature_chain_parser.py CHANGED

@@ -35,23 +35,24 @@ class FeatureChainParser:
         """Internal method for parsing feature names - used by match_configuration_feature_chain_parser."""
         _feature_name: str = feature_name.name if isinstance(feature_name, FeatureName) else feature_name
-        parts = _feature_name.split("__", 1)
-        itself = parts[0] + "__"  # Ensure we have the prefix part with double underscore
+        parts = _feature_name.rsplit(pattern, 1)
+        source_feature = parts[0] if len(parts) > 1 else ""
+        operation_part = parts[1] if len(parts) > 1 else parts[0]
-        remainder = ""
-        if len(parts) > 1:
-            remainder = parts[1]
-        for prefix_pattern in prefix_patterns:
-            if re.match(prefix_pattern, itself) is None:
+        for suffix_pattern in prefix_patterns:
+            if re.match(suffix_pattern, _feature_name) is None:
                 continue
-            if len(parts) == 1:
+            if len(parts) == 1 or not source_feature:
                 raise ValueError(f"Matches the pattern {pattern}, but has no source feature: {_feature_name}")
-            source_feature = remainder
-            has_prefix_configuration = itself.split(pattern, 1)[0]
-            return has_prefix_configuration, source_feature
+            match = re.match(suffix_pattern, _feature_name)
+            if match and match.groups():
+                operation_config = match.group(1)
+            else:
+                operation_config = operation_part.split("_")[0]
+            return operation_config, source_feature
         return None, None
@@ -286,13 +287,13 @@ class FeatureChainParser:
         return False
     @classmethod
-    def extract_source_feature(cls, feature_name: str, prefix_pattern: str) -> str:
+    def extract_source_feature(cls, feature_name: str, suffix_pattern: str) -> str:
         """
-        Extract the source feature from a feature name based on the prefix pattern.
+        Extract the source feature from a feature name based on the suffix pattern.
         Args:
             feature_name: The feature name to parse
-            prefix_pattern: Regex pattern for the prefix (e.g., r"^([w]+)_aggr__")
+            suffix_pattern: Regex pattern for the suffix (e.g., r"^.+__([w]+)$")
         Returns:
             The source feature part of the name
@@ -300,14 +301,14 @@ class FeatureChainParser:
         Raises:
             ValueError: If the feature name doesn't match the expected pattern
         """
-        match = re.match(prefix_pattern, feature_name)
+        match = re.match(suffix_pattern, feature_name)
         if not match:
             raise ValueError(f"Invalid feature name format: {feature_name}")
-        # Extract the prefix part (everything before the double underscore)
-        prefix_end = feature_name.find("__")
-        if prefix_end == -1:
+        # For L→R: source is everything BEFORE the last __
+        suffix_start = feature_name.rfind("__")
+        if suffix_start == -1:
             raise ValueError(f"Invalid feature name format: {feature_name}. Missing double underscore separator.")
-        # Return everything after the double underscore
-        return feature_name[prefix_end + 2 :]
+        # Return everything BEFORE the last double underscore (the source)
+        return feature_name[:suffix_start]
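
Note on the `extract_source_feature` hunk above: the new version takes everything before the last `__` as the source, so a chained name yields the remaining chain rather than the innermost column. The standalone sketch below mirrors only the lines visible in the diff. The function name `extract_source_feature_sketch` and the default pattern are assumptions for illustration (the pattern is adapted from the docstring example, written with `\w`), and the rest of `FeatureChainParser` is not reproduced.

```python
import re

# Assumed suffix pattern for illustration, adapted from the docstring example.
SUFFIX_PATTERN = r"^.+__([\w]+)$"


def extract_source_feature_sketch(feature_name: str, suffix_pattern: str = SUFFIX_PATTERN) -> str:
    """Return the part of feature_name before the last "__"."""
    if not re.match(suffix_pattern, feature_name):
        raise ValueError(f"Invalid feature name format: {feature_name}")

    # The source is everything before the last double underscore.
    suffix_start = feature_name.rfind("__")
    if suffix_start == -1:
        raise ValueError(f"Invalid feature name format: {feature_name}. Missing double underscore separator.")
    return feature_name[:suffix_start]


if __name__ == "__main__":
    print(extract_source_feature_sketch("income__standard_scaled"))
    print(extract_source_feature_sketch("income__mean_imputed__standard_scaled"))
```

With a chained name, the returned source is itself a chained feature name, which is consistent with the README's left-to-right reading of `income__mean_imputed__standard_scaled`.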

mloda_plugins/compute_framework/base_implementations/spark/spark_framework.py CHANGED

@@ -174,6 +174,8 @@ class SparkFramework(ComputeFrameWork):
                     self.set_framework_connection_object()
                 spark = self.framework_connection_object
+                if spark is None:
+                    raise RuntimeError("Failed to initialize Spark session")
                 new_data_df = spark.createDataFrame(
                     [(i + 1, val) for i, val in enumerate(data_list)],
                     StructType(

mloda_plugins/feature_group/experimental/aggregated_feature_group/base.py CHANGED

@@ -40,15 +40,15 @@ class AggregatedFeatureGroup(AbstractFeatureGroup):
     ### 1. String-Based Creation
-    Features follow the naming pattern: `{aggregation_type}_aggr__{mloda_source_features}`
+    Features follow the naming pattern: `{mloda_source_features}__{aggregation_type}_aggr`
     Examples:
     ```python
     features = [
-        "sum_aggr__sales",           # Sum of sales values
-        "avg_aggr__temperature",     # Average temperature
-        "max_aggr__price",           # Maximum price
-        "count_aggr__transactions"   # Count of transactions
+        "sales__sum_aggr",           # Sum of sales values
+        "temperature__avg_aggr",     # Average temperature
+        "price__max_aggr",           # Maximum price
+        "transactions__count_aggr"   # Count of transactions
     ]
     ```
@@ -96,8 +96,8 @@ class AggregatedFeatureGroup(AbstractFeatureGroup):
         "median": "Median value",
     }
-    PATTERN = "_aggr__"
-    PREFIX_PATTERN = r"^([\w]+)_aggr__"
+    PATTERN = "__"
+    PREFIX_PATTERN = r".*__([\w]+)_aggr$"
     # Property mapping for configuration-based feature creation
     PROPERTY_MAPPING = {
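
Note on the `PREFIX_PATTERN` change above: the constant keeps its old name but now anchors at the end of the feature name instead of the start. The sketch below uses only the standard `re` module and the two pattern strings shown in the diff. The helper `aggregation_type` is hypothetical and only illustrates which names each pattern accepts.

```python
import re
from typing import Optional

OLD_PATTERN = r"^([\w]+)_aggr__"     # 0.2.15: aggregation type is a prefix
NEW_PATTERN = r".*__([\w]+)_aggr$"   # 0.3.0: aggregation type is a suffix


def aggregation_type(feature_name: str, pattern: str) -> Optional[str]:
    """Return the captured aggregation type, or None if the name does not match."""
    match = re.match(pattern, feature_name)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in ["sum_aggr__sales", "sales__sum_aggr"]:
        print(name, aggregation_type(name, OLD_PATTERN), aggregation_type(name, NEW_PATTERN))
```

Each naming style matches only its own pattern, so names written for 0.2.15 would not be recognized by the 0.3.0 pattern as shown here.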

mloda_plugins/feature_group/experimental/clustering/base.py CHANGED

@@ -27,15 +27,15 @@ class ClusteringFeatureGroup(AbstractFeatureGroup):
     ## Feature Naming Convention
     Clustering features follow this naming pattern:
-    `cluster_{algorithm}_{k_value}__{mloda_source_features}`
+    `{mloda_source_features}__cluster_{algorithm}_{k_value}`
-    The source features (mloda_source_features) are extracted from the feature name and used
-    as input for the clustering algorithm. Note the double underscore before the source features.
+    The source features come first, followed by the clustering operation.
+    Note the double underscore separating the source features from the operation.
     Examples:
-    - `cluster_kmeans_5__customer_behavior`: K-means clustering with 5 clusters on customer behavior data
-    - `cluster_hierarchical_3__transaction_patterns`: Hierarchical clustering with 3 clusters on transaction patterns
-    - `cluster_dbscan_auto__sensor_readings`: DBSCAN clustering with automatic cluster detection on sensor readings
+    - `customer_behavior__cluster_kmeans_5`: K-means clustering with 5 clusters on customer behavior data
+    - `transaction_patterns__cluster_hierarchical_3`: Hierarchical clustering with 3 clusters on transaction patterns
+    - `sensor_readings__cluster_dbscan_auto`: DBSCAN clustering with automatic cluster detection on sensor readings
     ## Configuration-Based Creation
@@ -57,7 +57,7 @@ class ClusteringFeatureGroup(AbstractFeatureGroup):
         )
     )
-    # The Engine will automatically parse this into a feature with name "cluster_kmeans_5__customer_behavior"
+    # The Engine will automatically parse this into a feature with name "customer_behavior__cluster_kmeans_5"
     ```
     ## Parameter Classification
@@ -102,7 +102,7 @@ class ClusteringFeatureGroup(AbstractFeatureGroup):
     }
     # Define the prefix pattern for this feature group
-    PREFIX_PATTERN = r"^cluster_([\w]+)_([\w]+)__"
+    PREFIX_PATTERN = r".*__cluster_([\w]+)_([\w]+)$"
     PATTERN = "__"
     # Property mapping for configuration-based feature creation
@@ -158,7 +158,7 @@ class ClusteringFeatureGroup(AbstractFeatureGroup):
     @classmethod
     def parse_clustering_prefix(cls, feature_name: str) -> tuple[str, str]:
         """
-        Parse the clustering prefix into its components.
+        Parse the clustering suffix into its components.
         Args:
             feature_name: The feature name to parse
@@ -167,23 +167,23 @@ class ClusteringFeatureGroup(AbstractFeatureGroup):
             A tuple containing (algorithm, k_value)
         Raises:
-            ValueError: If the prefix doesn't match the expected pattern
+            ValueError: If the suffix doesn't match the expected pattern
         """
-        # Extract the prefix part (everything before the double underscore)
-        prefix_end = feature_name.find("__")
-        if prefix_end == -1:
+        # Extract the suffix part (everything after the double underscore)
+        suffix_start = feature_name.find("__")
+        if suffix_start == -1:
             raise ValueError(
                 f"Invalid clustering feature name format: {feature_name}. Missing double underscore separator."
             )
-        prefix = feature_name[:prefix_end]
+        suffix = feature_name[suffix_start + 2 :]
-        # Parse the prefix components
-        parts = prefix.split("_")
+        # Parse the suffix components
+        parts = suffix.split("_")
         if len(parts) != 3 or parts[0] != "cluster":
             raise ValueError(
                 f"Invalid clustering feature name format: {feature_name}. "
-                f"Expected format: cluster_{{algorithm}}_{{k_value}}__{{mloda_source_features}}"
+                f"Expected format: {{mloda_source_features}}__cluster_{{algorithm}}_{{k_value}}"
             )
         algorithm, k_value = parts[1], parts[2]
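
Note on the `parse_clustering_prefix` hunk above: the new code locates the suffix with `feature_name.find("__")`, which returns the first separator, whereas `extract_source_feature` in `feature_chain_parser.py` uses `rfind`. For single-step names the two agree. The sketch below is a standalone re-creation of the visible parsing lines, not the library method, and the chained feature name in it is a hypothetical input. It shows where the two choices could diverge. Whether chained names ever reach this method in practice is not visible in the diff.

```python
def parse_cluster_suffix(feature_name: str, use_last_separator: bool) -> tuple[str, str]:
    """Parse "{source}__cluster_{algorithm}_{k_value}" into (algorithm, k_value).

    use_last_separator=False mirrors the find("__") call shown in the diff;
    True uses rfind("__") instead, for comparison.
    """
    idx = feature_name.rfind("__") if use_last_separator else feature_name.find("__")
    if idx == -1:
        raise ValueError(f"Missing double underscore separator: {feature_name}")

    suffix = feature_name[idx + 2 :]
    parts = suffix.split("_")
    if len(parts) != 3 or parts[0] != "cluster":
        raise ValueError(f"Invalid clustering feature name format: {feature_name}")
    return parts[1], parts[2]


if __name__ == "__main__":
    print(parse_cluster_suffix("customer_behavior__cluster_kmeans_5", use_last_separator=False))
    # Hypothetical chained name: clustering applied to an imputed feature.
    chained = "income__mean_imputed__cluster_kmeans_5"
    print(parse_cluster_suffix(chained, use_last_separator=True))
```

In this sketch, the first-separator variant raises `ValueError` for the chained name because the extracted suffix still contains the intermediate `mean_imputed` step.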

mloda_plugins/feature_group/experimental/data_quality/missing_value/base.py CHANGED

@@ -37,14 +37,14 @@ class MissingValueFeatureGroup(AbstractFeatureGroup):
     ### 1. String-Based Creation
-    Features follow the naming pattern: `{imputation_method}_imputed__{mloda_source_features}`
+    Features follow the naming pattern: `{mloda_source_features}__{imputation_method}_imputed`
     Examples:
     ```python
     features = [
-        "mean_imputed__income",      # Impute missing values in income with the mean
-        "median_imputed__age",       # Impute missing values in age with the median
-        "constant_imputed__category" # Impute missing values in category with a constant value
+        "income__mean_imputed",      # Impute missing values in income with the mean
+        "age__median_imputed",       # Impute missing values in age with the median
+        "category__constant_imputed" # Impute missing values in category with a constant value
     ]
     ```
@@ -85,16 +85,16 @@ class MissingValueFeatureGroup(AbstractFeatureGroup):
     from mloda_core.abstract_plugins.components.feature import Feature
     # Impute missing income values with mean
-    feature = Feature(name="mean_imputed__income")
+    feature = Feature(name="income__mean_imputed")
     # Impute missing age values with median
-    feature = Feature(name="median_imputed__age")
+    feature = Feature(name="age__median_imputed")
     # Impute missing category values with mode
-    feature = Feature(name="mode_imputed__category")
+    feature = Feature(name="category__mode_imputed")
     # Forward fill missing temperature values
-    feature = Feature(name="ffill_imputed__temperature")
+    feature = Feature(name="temperature__ffill_imputed")
     ```
     ### Configuration-Based Creation
@@ -158,7 +158,7 @@ class MissingValueFeatureGroup(AbstractFeatureGroup):
     }
     PATTERN = "__"
-    PREFIX_PATTERN = r"^([\w]+)_imputed__"
+    PREFIX_PATTERN = r".*__([\w]+)_imputed$"
     PROPERTY_MAPPING = {
         IMPUTATION_METHOD: {
@@ -187,7 +187,10 @@ class MissingValueFeatureGroup(AbstractFeatureGroup):
         source_feature: str | None = None
         # Try string-based parsing first
-        _, source_feature = FeatureChainParser.parse_feature_name(feature_name, self.PATTERN, [self.PREFIX_PATTERN])
+        # parse_feature_name returns (operation_config, source_feature)
+        operation_config, source_feature = FeatureChainParser.parse_feature_name(
+            feature_name, self.PATTERN, [self.PREFIX_PATTERN]
+        )
         if source_feature is not None:
             return {Feature(source_feature)}
@@ -202,11 +205,16 @@ class MissingValueFeatureGroup(AbstractFeatureGroup):
     @classmethod
     def get_imputation_method(cls, feature_name: str) -> str:
         """Extract the imputation method from the feature name."""
-        imputation_method, _ = FeatureChainParser.parse_feature_name(feature_name, cls.PATTERN, [cls.PREFIX_PATTERN])
-        if imputation_method is None:
+        # parse_feature_name returns (operation_config, source_feature)
+        # The operation_config contains the imputation method extracted from the suffix pattern
+        operation_config, _ = FeatureChainParser.parse_feature_name(feature_name, cls.PATTERN, [cls.PREFIX_PATTERN])
+        if operation_config is None:
             raise ValueError(f"Invalid missing value feature name format: {feature_name}")
-        imputation_method = imputation_method.replace("imputed", "").strip("_")
+        # The PREFIX_PATTERN captures the method name (e.g., "mean" from "mean_imputed")
+        # So operation_config already contains just the method name
+        imputation_method = operation_config
         # Validate imputation method
         if imputation_method not in cls.IMPUTATION_METHODS:
             raise ValueError(
@@ -257,7 +265,9 @@ class MissingValueFeatureGroup(AbstractFeatureGroup):
         feature_name_str = feature.name.name if hasattr(feature.name, "name") else str(feature.name)
         if cls.PATTERN in feature_name_str:
+            # Use get_imputation_method which already handles parse_feature_name correctly
             imputation_method = cls.get_imputation_method(feature_name_str)
+            # Use extract_source_feature which returns everything before the last __
             source_feature_name = FeatureChainParser.extract_source_feature(feature_name_str, cls.PREFIX_PATTERN)
             return imputation_method, source_feature_name
@@ -271,7 +281,7 @@ class MissingValueFeatureGroup(AbstractFeatureGroup):
         if imputation_method is None or source_feature_name is None:
             raise ValueError(f"Could not extract imputation method and source feature from: {feature.name}")
-        imputation_method = imputation_method.replace("imputed", "").strip("_")
+        # Validate imputation method (no need to strip "imputed" from config-based method)
         if imputation_method not in cls.IMPUTATION_METHODS:
             raise ValueError(
                 f"Unsupported imputation method: {imputation_method}. "
