PyPI - informatica-python - Versions diffs - 1.9.8__tar.gz → 1.10.0__tar.gz - Mend

informatica-python 1.9.8 (tar.gz) → 1.10.0 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

{informatica_python-1.9.8 → informatica_python-1.10.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: informatica-python
-Version: 1.9.8
+Version: 1.10.0
 Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
 Author: Nick
 License: MIT
@@ -124,7 +124,7 @@ The code generator produces real, runnable Python for these transformation types
 - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
 - **Filter** — Row filtering with vectorized converted conditions
 - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
-- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
 - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
 - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
 - **Router** — Multi-group conditional routing with named groups
@@ -196,7 +196,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
 - `REPLACECHR/REPLACESTR` → `.str.replace()`
 - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
 - `CHR(code)` → `chr(int(code))`
-- `||` concatenation → `+` with `.astype(str)` on non-literals
+- `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 **Date/Time:**
 - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -343,10 +343,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
 - Decimals/Floats: `pd.to_numeric(errors='coerce')`
 - Booleans: `.astype('boolean')`
-### Flat File Handling (v1.3+)
+### Flat File Handling (v1.3+, enhanced v1.9.8)
 Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
+**Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
 ### Mapplet Inlining (v1.3+)
 Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
@@ -371,12 +373,17 @@ The generated `helper_functions.py` provides a complete runtime library:
 ### Database Operations
 | Function | Description |
 |----------|-------------|
-| `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+| `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
 | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
 | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
-| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
 | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
 | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+| `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+| `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+| `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+| `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+| `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 ### File Operations
 | Function | Description |
@@ -407,7 +414,41 @@ The generated `helper_functions.py` provides a complete runtime library:
 ## Changelog
-### v1.9.3 (Current)
+### v1.10.0 (Current)
+- **Router multi-group output support**: Router transformations now properly handle `<GROUP>` elements with `EXPRESSION` attributes — generates separate filtered DataFrames for each named output group (e.g., `df_rtr_rest_type_per`, `df_rtr_rest_value_per`), not just the DEFAULT group
+- **Connector group routing**: `FROMINSTANCEGROUP` / `TOINSTANCEGROUP` attributes on `CONNECTOR` elements are now parsed and used to wire downstream transforms/targets to the correct Router output group
+- **GroupDef expression field**: `GroupDef` model now stores the `EXPRESSION` attribute from `<GROUP>` XML elements
+- **Backward-compatible Router fallback**: Existing `TABLEATTRIBUTE`-based Router group conditions (older XML format) continue to work — the code checks `<GROUP>` elements first, then falls back to `TABLEATTRIBUTE` entries
+- **223 tests** passing
+### v1.9.8
+- **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+- **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+- **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+- **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+- **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+- **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+- **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+- **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+- **700 tests** passing
+### v1.9.5 / v1.9.6
+- **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+- **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+- **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+- **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+- **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+- **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+- **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+- **`$PM*` variable substitution in SQL Override queries**
+- **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+- **678 tests** passing
+### v1.9.4
+- Extended expression function coverage and edge-case fixes
+- Improved mapplet and connector handling
+### v1.9.3
 - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
 - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
 - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
@@ -495,7 +536,7 @@ The generated `helper_functions.py` provides a complete runtime library:
 cd informatica_python
 pip install -e ".[dev]"
-# Run tests (663 tests)
+# Run tests (700 tests)
 pytest tests/ -v
 ```

{informatica_python-1.9.8 → informatica_python-1.10.0}/README.md RENAMED

@@ -97,7 +97,7 @@ The code generator produces real, runnable Python for these transformation types
 - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
 - **Filter** — Row filtering with vectorized converted conditions
 - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
-- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
 - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
 - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
 - **Router** — Multi-group conditional routing with named groups
@@ -169,7 +169,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
 - `REPLACECHR/REPLACESTR` → `.str.replace()`
 - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
 - `CHR(code)` → `chr(int(code))`
-- `||` concatenation → `+` with `.astype(str)` on non-literals
+- `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 **Date/Time:**
 - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -316,10 +316,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
 - Decimals/Floats: `pd.to_numeric(errors='coerce')`
 - Booleans: `.astype('boolean')`
-### Flat File Handling (v1.3+)
+### Flat File Handling (v1.3+, enhanced v1.9.8)
 Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
+**Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
 ### Mapplet Inlining (v1.3+)
 Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
@@ -344,12 +346,17 @@ The generated `helper_functions.py` provides a complete runtime library:
 ### Database Operations
 | Function | Description |
 |----------|-------------|
-| `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+| `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
 | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
 | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
-| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
 | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
 | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+| `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+| `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+| `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+| `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+| `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 ### File Operations
 | Function | Description |
@@ -380,7 +387,41 @@ The generated `helper_functions.py` provides a complete runtime library:
 ## Changelog
-### v1.9.3 (Current)
+### v1.10.0 (Current)
+- **Router multi-group output support**: Router transformations now properly handle `<GROUP>` elements with `EXPRESSION` attributes — generates separate filtered DataFrames for each named output group (e.g., `df_rtr_rest_type_per`, `df_rtr_rest_value_per`), not just the DEFAULT group
+- **Connector group routing**: `FROMINSTANCEGROUP` / `TOINSTANCEGROUP` attributes on `CONNECTOR` elements are now parsed and used to wire downstream transforms/targets to the correct Router output group
+- **GroupDef expression field**: `GroupDef` model now stores the `EXPRESSION` attribute from `<GROUP>` XML elements
+- **Backward-compatible Router fallback**: Existing `TABLEATTRIBUTE`-based Router group conditions (older XML format) continue to work — the code checks `<GROUP>` elements first, then falls back to `TABLEATTRIBUTE` entries
+- **223 tests** passing
+### v1.9.8
+- **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+- **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+- **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+- **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+- **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+- **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+- **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+- **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+- **700 tests** passing
+### v1.9.5 / v1.9.6
+- **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+- **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+- **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+- **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+- **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+- **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+- **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+- **`$PM*` variable substitution in SQL Override queries**
+- **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+- **678 tests** passing
+### v1.9.4
+- Extended expression function coverage and edge-case fixes
+- Improved mapplet and connector handling
+### v1.9.3
 - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
 - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
 - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
@@ -468,7 +509,7 @@ The generated `helper_functions.py` provides a complete runtime library:
 cd informatica_python
 pip install -e ".[dev]"
-# Run tests (663 tests)
+# Run tests (700 tests)
 pytest tests/ -v
 ```
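
The README hunks above describe the "smart coercion" rule for `||` concatenation: `.fillna('').astype(str)` for Series and `str()` for scalars. A minimal sketch of what that rule means in pandas terms follows. The column name, the prefix, and the `mapping_name` value are hypothetical, and the code is hand-written for illustration rather than actual generator output.

```python
import numpy as np
import pandas as pd

# Hypothetical input column with one missing value.
df = pd.DataFrame({"CODE": ["A1", np.nan, "C3"]})

# Older style: plain .astype(str). Depending on the pandas version, the
# missing value can come out as the literal text "nan" in the result.
naive = "PER_" + df["CODE"].astype(str)

# Null-safe style from the changelog: fill missing values with '' first,
# so the gap stays empty in the concatenated string.
null_safe = "PER_" + df["CODE"].fillna("").astype(str)

# Scalars (for example the return value of a helper function) are wrapped
# with str() instead of being treated as a Series.
mapping_name = "m_load_customers"  # hypothetical scalar value
with_scalar = null_safe + "_" + str(mapping_name)

print(null_safe.tolist())
print(with_scalar.tolist())
```

The distinction matters because `.fillna()` only exists on pandas objects; calling it on a plain Python string would raise `AttributeError`, which is presumably why scalar returns get the `str()` path.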

{informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/generators/mapping_gen.py RENAMED

@@ -54,6 +54,8 @@ def _expand_mapplet_recursive(mapplet, mapplet_map, prefix, depth=0, max_depth=1
             to_instance=new_to,
             to_field=conn.to_field,
             to_instance_type=conn.to_instance_type,
+            from_instance_group=conn.from_instance_group,
+            to_instance_group=conn.to_instance_group,
         ))
     for inst in getattr(mapplet, 'instances', []):
@@ -138,6 +140,8 @@ def _inline_mapplets(mapping, folder):
                     to_instance=first_tx,
                     to_field=conn.to_field,
                     to_instance_type=conn.to_instance_type,
+                    from_instance_group=conn.from_instance_group,
+                    to_instance_group=conn.to_instance_group,
                 ))
             else:
                 rewired_connectors.append(conn)
@@ -167,6 +171,8 @@ def _inline_mapplets(mapping, folder):
                     to_instance=conn.to_instance,
                     to_field=conn.to_field,
                     to_instance_type=conn.to_instance_type,
+                    from_instance_group=conn.from_instance_group,
+                    to_instance_group=conn.to_instance_group,
                 ))
             else:
                 rewired_connectors.append(conn)
@@ -730,7 +736,11 @@ def _generate_transformation(lines, tx, connector_graph, source_dfs, transform_m
     input_conns = connector_graph.get("to", {}).get(tx.name, [])
     input_sources = set()
     for c in input_conns:
-        input_sources.add(c.from_instance)
+        if c.from_instance_group:
+            group_key = f"{c.from_instance}:{c.from_instance_group}"
+            input_sources.add(group_key)
+        else:
+            input_sources.add(c.from_instance)
     input_df = None
     for src in input_sources:
@@ -739,7 +749,14 @@ def _generate_transformation(lines, tx, connector_graph, source_dfs, transform_m
             break
     if not input_df:
         for src in input_sources:
-            input_df = f"df_{_safe_name(src)}"
+            base = src.split(":")[0] if ":" in src else src
+            if base in source_dfs:
+                input_df = source_dfs[base]
+                break
+    if not input_df:
+        for src in input_sources:
+            base = src.split(":")[0] if ":" in src else src
+            input_df = f"df_{_safe_name(base)}"
             break
     if not input_df:
         input_df = "df_input"
@@ -1095,9 +1112,24 @@ def _gen_lookup_transform(lines, tx, tx_safe, input_df, source_dfs, connector_gr
 def _gen_router_transform(lines, tx, tx_safe, input_df, source_dfs):
     lines.append(f"    # Router groups:")
     group_conditions = {}
-    for attr in tx.attributes:
-        if "Group Filter Condition" in attr.name:
-            group_conditions[attr.name] = attr.value
+    output_groups = [
+        g for g in tx.groups
+        if g.type.upper() not in ("INPUT", "") and "DEFAULT" not in g.type.upper()
+    ]
+    output_groups.sort(key=lambda g: g.order)
+    if output_groups:
+        for g in output_groups:
+            if g.expression and g.expression.strip():
+                group_conditions[g.name] = g.expression
+            else:
+                group_conditions[g.name] = ""
+    if not group_conditions:
+        for attr in tx.attributes:
+            if "Group Filter Condition" in attr.name:
+                group_conditions[attr.name] = attr.value
     remaining_mask_parts = []
     if group_conditions:
@@ -1108,15 +1140,21 @@ def _gen_router_transform(lines, tx, tx_safe, input_df, source_dfs):
                 expr_py = f"pd.Series(True, index={input_df}.index)"
             mask_var = f"_router_mask_{tx_safe}_{i}"
             lines.append(f"    {mask_var} = {expr_py}  # {gname}")
-            lines.append(f"    df_{tx_safe}_group{i} = {input_df}[{mask_var}].copy()")
-            source_dfs[f"{tx.name}_group{i}"] = f"df_{tx_safe}_group{i}"
+            group_df_name = f"df_{tx_safe}_{_safe_name(gname)}"
+            lines.append(f"    {group_df_name} = {input_df}[{mask_var}].copy()")
+            source_dfs[f"{tx.name}:{gname}"] = group_df_name
             remaining_mask_parts.append(f"~{mask_var}")
+    default_groups = [g for g in tx.groups if "DEFAULT" in g.type.upper()]
+    default_name = default_groups[0].name if default_groups else "DEFAULT"
     if remaining_mask_parts:
         lines.append(f"    _router_default_mask = {' & '.join(remaining_mask_parts)}")
-        lines.append(f"    df_{tx_safe} = {input_df}[_router_default_mask].copy()  # Default group")
+        lines.append(f"    df_{tx_safe} = {input_df}[_router_default_mask].copy()  # Default group ({default_name})")
     else:
-        lines.append(f"    df_{tx_safe} = {input_df}.copy()  # Default group")
+        lines.append(f"    df_{tx_safe} = {input_df}.copy()  # Default group ({default_name})")
     source_dfs[tx.name] = f"df_{tx_safe}"
+    source_dfs[f"{tx.name}:{default_name}"] = f"df_{tx_safe}"
 def _gen_union_transform(lines, tx, tx_safe, input_sources, source_dfs, data_lib="pandas"):
@@ -1448,6 +1486,11 @@ def _generate_target_write(lines, tgt_name, tgt_def, connector_graph, source_dfs
     to_conns = connector_graph.get("to", {}).get(tgt_name, [])
     input_df = None
     for c in to_conns:
+        if c.from_instance_group:
+            group_key = f"{c.from_instance}:{c.from_instance_group}"
+            if group_key in source_dfs:
+                input_df = source_dfs[group_key]
+                break
         if c.from_instance in source_dfs:
             input_df = source_dfs[c.from_instance]
             break
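
The `_gen_router_transform` hunk above emits one mask and one DataFrame per named output group, then builds the default group from the negation of all masks. A rough sketch of what the emitted pandas code could look like for a Router with two named groups follows. The instance name `RTR`, the group names, the conditions, and the sample data are all hypothetical; only the naming pattern (`_router_mask_<tx>_<i>`, `df_<tx>_<group>`, `"<instance>:<group>"` keys) is taken from the diff.

```python
import pandas as pd

# Hypothetical upstream DataFrame feeding the Router.
df_input = pd.DataFrame({
    "REST_TYPE": ["PER", "ORG", "PER", "OTH"],
    "REST_VALUE": [10, 0, 0, 5],
})

# One boolean mask per named output group (conditions are made up).
_router_mask_rtr_0 = df_input["REST_TYPE"] == "PER"  # TYPE_PER
_router_mask_rtr_1 = df_input["REST_VALUE"] > 0      # VALUE_POS

# One filtered DataFrame per named group; a row may land in several groups.
df_rtr_type_per = df_input[_router_mask_rtr_0].copy()
df_rtr_value_pos = df_input[_router_mask_rtr_1].copy()

# Default group: rows that matched none of the named groups.
_router_default_mask = ~_router_mask_rtr_0 & ~_router_mask_rtr_1
df_rtr = df_input[_router_default_mask].copy()  # Default group (DEFAULT1)

# Generator-side registry: group outputs are keyed "instance:group", and the
# default group is reachable by the bare instance name as well.
source_dfs = {
    "RTR:TYPE_PER": "df_rtr_type_per",
    "RTR:VALUE_POS": "df_rtr_value_pos",
    "RTR": "df_rtr",
    "RTR:DEFAULT1": "df_rtr",
}

print(df_rtr_type_per.index.tolist())
print(df_rtr_value_pos.index.tolist())
print(df_rtr.index.tolist())
```

Keying the registry by `"instance:group"` lets a downstream connector with `FROMINSTANCEGROUP` set pick the matching group DataFrame, while connectors without a group still resolve through the bare instance name.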

{informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/models.py RENAMED

@@ -80,6 +80,7 @@ class GroupDef:
     type: str = ""
     description: str = ""
     order: int = 0
+    expression: str = ""
     fields: List[FieldDef] = field(default_factory=list)
@@ -274,6 +275,8 @@ class ConnectorDef:
     to_field: str
     to_instance: str
     to_instance_type: str
+    from_instance_group: str = ""
+    to_instance_group: str = ""
 @dataclass
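
Both new `ConnectorDef` fields default to `""`, so code that constructs connectors without group information keeps working. The sketch below uses a cut-down, hypothetical stand-in for the dataclass (`ConnectorSketch`, with most real fields omitted) to illustrate that, together with a lookup helper modelled on the group-key check added to `_generate_target_write`. The helper names `source_key` and `resolve_input_df` are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConnectorSketch:
    """Hypothetical, reduced stand-in for ConnectorDef (not the real class)."""
    from_instance: str
    to_instance: str
    # New in this diff: both default to "" so older call sites stay valid.
    from_instance_group: str = ""
    to_instance_group: str = ""


def source_key(conn: ConnectorSketch) -> str:
    """Build the 'instance:group' key when a source group is present."""
    if conn.from_instance_group:
        return f"{conn.from_instance}:{conn.from_instance_group}"
    return conn.from_instance


def resolve_input_df(conn: ConnectorSketch, source_dfs: Dict[str, str]) -> Optional[str]:
    """Prefer the group-specific DataFrame, then fall back to the instance."""
    key = source_key(conn)
    if key in source_dfs:
        return source_dfs[key]
    return source_dfs.get(conn.from_instance)


source_dfs = {"RTR": "df_rtr", "RTR:TYPE_PER": "df_rtr_type_per"}

grouped = ConnectorSketch("RTR", "TGT_PER", from_instance_group="TYPE_PER")
ungrouped = ConnectorSketch("RTR", "TGT_REST")          # no group given
unknown = ConnectorSketch("RTR", "TGT_X", from_instance_group="MISSING")

print(resolve_input_df(grouped, source_dfs))
print(resolve_input_df(ungrouped, source_dfs))
print(resolve_input_df(unknown, source_dfs))
```

The fallback in `resolve_input_df` means an unrecognised group name degrades to the Router's default output rather than failing, which mirrors the order of checks shown in the target-write hunk.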

{informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/parser.py RENAMED

@@ -240,6 +240,7 @@ class InformaticaParser:
             type=self._attr(elem, "TYPE"),
             description=self._attr(elem, "DESCRIPTION"),
             order=self._int_attr(elem, "ORDER"),
+            expression=self._attr(elem, "EXPRESSION"),
         )
         for fld in elem.findall("SOURCEFIELD"):
             grp.fields.append(self._parse_source_field(fld))
@@ -580,6 +581,8 @@ class InformaticaParser:
             to_field=self._attr(elem, "TOFIELD"),
             to_instance=self._attr(elem, "TOINSTANCE"),
             to_instance_type=self._attr(elem, "TOINSTANCETYPE"),
+            from_instance_group=self._attr(elem, "FROMINSTANCEGROUP"),
+            to_instance_group=self._attr(elem, "TOINSTANCEGROUP"),
         )
     def _parse_instance(self, elem) -> InstanceDef:
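
The parser hunks above read `FROMINSTANCEGROUP` and `TOINSTANCEGROUP` from each `CONNECTOR` element. A standalone sketch of that attribute extraction with the standard library's `xml.etree.ElementTree` follows. The XML fragment is invented, and the `attr` helper is an assumption about how the parser's `_attr` method behaves (returning `""` when an attribute is absent), since its body is not part of this diff.

```python
import xml.etree.ElementTree as ET

# Invented XML fragment; real PowerCenter exports carry more attributes.
xml_text = """
<MAPPING NAME="m_demo">
  <CONNECTOR FROMINSTANCE="RTR" FROMINSTANCEGROUP="TYPE_PER" FROMFIELD="ID1"
             TOINSTANCE="TGT_PER" TOFIELD="ID"/>
  <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="ID"
             TOINSTANCE="RTR" TOINSTANCEGROUP="INPUT" TOFIELD="ID"/>
</MAPPING>
""".strip()


def attr(elem: ET.Element, name: str) -> str:
    """Assumed behaviour of the parser's _attr: missing attribute -> ''."""
    return elem.get(name, "")


root = ET.fromstring(xml_text)

# Collect only the fields relevant to group routing.
connectors = [
    {
        "from_instance": attr(elem, "FROMINSTANCE"),
        "from_instance_group": attr(elem, "FROMINSTANCEGROUP"),
        "to_instance": attr(elem, "TOINSTANCE"),
        "to_instance_group": attr(elem, "TOINSTANCEGROUP"),
    }
    for elem in root.findall("CONNECTOR")
]

for conn in connectors:
    print(conn)
```

Returning an empty string for a missing attribute lines up with the `""` defaults on the model fields, so connectors from XML without group attributes behave the same as before the change.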

{informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: informatica-python
-Version: 1.9.8
+Version: 1.10.0
 Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
 Author: Nick
 License: MIT
@@ -124,7 +124,7 @@ The code generator produces real, runnable Python for these transformation types
 - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
 - **Filter** — Row filtering with vectorized converted conditions
 - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
-- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
 - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
 - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
 - **Router** — Multi-group conditional routing with named groups
@@ -196,7 +196,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
 - `REPLACECHR/REPLACESTR` → `.str.replace()`
 - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
 - `CHR(code)` → `chr(int(code))`
-- `||` concatenation → `+` with `.astype(str)` on non-literals
+- `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 **Date/Time:**
 - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -343,10 +343,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
 - Decimals/Floats: `pd.to_numeric(errors='coerce')`
 - Booleans: `.astype('boolean')`
-### Flat File Handling (v1.3+)
+### Flat File Handling (v1.3+, enhanced v1.9.8)
 Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
+**Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
 ### Mapplet Inlining (v1.3+)
 Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
@@ -371,12 +373,17 @@ The generated `helper_functions.py` provides a complete runtime library:
 ### Database Operations
 | Function | Description |
 |----------|-------------|
-| `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+| `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
 | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
 | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
-| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
 | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
 | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+| `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+| `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+| `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+| `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+| `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 ### File Operations
 | Function | Description |
@@ -407,7 +414,41 @@ The generated `helper_functions.py` provides a complete runtime library:
 ## Changelog
-### v1.9.3 (Current)
+### v1.10.0 (Current)
+- **Router multi-group output support**: Router transformations now properly handle `<GROUP>` elements with `EXPRESSION` attributes — generates separate filtered DataFrames for each named output group (e.g., `df_rtr_rest_type_per`, `df_rtr_rest_value_per`), not just the DEFAULT group
+- **Connector group routing**: `FROMINSTANCEGROUP` / `TOINSTANCEGROUP` attributes on `CONNECTOR` elements are now parsed and used to wire downstream transforms/targets to the correct Router output group
+- **GroupDef expression field**: `GroupDef` model now stores the `EXPRESSION` attribute from `<GROUP>` XML elements
+- **Backward-compatible Router fallback**: Existing `TABLEATTRIBUTE`-based Router group conditions (older XML format) continue to work — the code checks `<GROUP>` elements first, then falls back to `TABLEATTRIBUTE` entries
+- **223 tests** passing
+### v1.9.8
+- **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+- **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+- **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+- **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+- **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+- **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+- **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+- **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+- **700 tests** passing
+### v1.9.5 / v1.9.6
+- **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+- **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+- **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+- **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+- **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+- **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+- **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+- **`$PM*` variable substitution in SQL Override queries**
+- **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+- **678 tests** passing
+### v1.9.4
+- Extended expression function coverage and edge-case fixes
+- Improved mapplet and connector handling
+### v1.9.3
 - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
 - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
 - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
@@ -495,7 +536,7 @@ The generated `helper_functions.py` provides a complete runtime library:
 cd informatica_python
 pip install -e ".[dev]"
-# Run tests (663 tests)
+# Run tests (700 tests)
 pytest tests/ -v
 ```
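
Note on the v1.9.8 "Condition tokenizer word-boundary fix" entry above: the diff does not include the body of `_split_condition_tokens`, so the following is only a minimal sketch of the idea, assuming a regex word-boundary check. The function name `split_condition_tokens` and the pattern are hypothetical stand-ins, not the package's code.

```python
import re
from typing import List

# Hypothetical pattern: match AND/OR only as standalone words.
# \b requires a word boundary on both sides, so the "or" at the end of
# "DeletedIndicator" and the "OR" at the start of "ORDER_ID" do not match.
_LOGICAL_OP = re.compile(r"\b(AND|OR)\b", re.IGNORECASE)


def split_condition_tokens(condition: str) -> List[str]:
    """Split a condition into operands and standalone AND/OR keywords."""
    # The capture group makes re.split keep the operators in the result.
    parts = _LOGICAL_OP.split(condition)
    return [p.strip() for p in parts if p.strip()]


tokens = split_condition_tokens("DeletedIndicator = 'N' OR ORDER_ID > 0")
print(tokens)
```

A plain regex like this would still split on a standalone `OR` inside a quoted string literal, so a production tokenizer would likely need to track quotes as well.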

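Two other changelog entries above, "TO_CHAR arithmetic parenthesization" and "Null-safe `||` concatenation", describe output shapes that are easy to check in plain pandas. The sketch below is an illustration under assumptions: the column names `PREFIX` and `N` are made up, and the exact arguments the generator passes to `pd.to_numeric` are elided in the changelog, so the default call is used here.

```python
import pandas as pd

# Illustrative data; PREFIX contains a missing value.
df = pd.DataFrame({"PREFIX": ["PER_", None], "N": ["10", "7"]})

# Null-safe concatenation: fill missing values with '' before casting,
# so a missing operand contributes an empty string rather than a
# stringified null.
concat = df["PREFIX"].fillna("").astype(str) + df["N"].fillna("").astype(str)

# TO_CHAR(TO_INTEGER(N) - 1): the arithmetic must be parenthesized so that
# .astype(str) applies to the whole result, not to the literal 1.
to_char = (pd.to_numeric(df["N"]) - 1).astype(str)

print(concat.tolist())
print(to_char.tolist())
```

Without the outer parentheses, `.astype(str)` would bind to the literal `1`, which is the incorrect binding the changelog entry mentions.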
{informatica_python-1.9.8 → informatica_python-1.10.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "informatica-python"
-version = "1.9.8"
+version = "1.10.0"
 description = "Convert Informatica PowerCenter workflow XML to Python/PySpark code"
 readme = "README.md"
 license = {text = "MIT"}

{informatica_python-1.9.8 → informatica_python-1.10.0}/tests/test_integration.py RENAMED

@@ -2911,3 +2911,302 @@ class TestConcatWithLtrimRtrim(unittest.TestCase):
         result = convert_expression_vectorized(expr)
         assert "+" in result
         assert "||" not in result
+class TestRouterGroupElements(unittest.TestCase):
+    ROUTER_GROUP_XML = '''<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE POWERMART SYSTEM "powrmart.dtd">
+<POWERMART CREATION_DATE="01/01/2025" REPOSITORY_VERSION="1">
+<REPOSITORY NAME="repo" VERSION="1" CODEPAGE="UTF-8" DATABASETYPE="Oracle">
+<FOLDER NAME="TEST" OWNER="admin">
+  <SOURCE NAME="SRC" DATABASETYPE="Flat File" DBDNAME="SRC">
+    <FLATFILE DELIMITEDBY="COMMA" HEADERROWPRESENT="YES" PADBYTES="NO" ROWDELIMITER="\\n"/>
+    <SOURCEFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NOTNULL" KEYTYPE="PRIMARY KEY" FIELDNUMBER="1"/>
+    <SOURCEFIELD NAME="Party_Type" DATATYPE="string" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="2"/>
+    <SOURCEFIELD NAME="Attrib_Name" DATATYPE="string" PRECISION="50" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="3"/>
+  </SOURCE>
+  <TARGET NAME="TGT_PER_TYPE" DATABASETYPE="Flat File">
+    <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+  </TARGET>
+  <TARGET NAME="TGT_PER_VALUE" DATABASETYPE="Flat File">
+    <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+  </TARGET>
+  <TARGET NAME="TGT_ORG_TYPE" DATABASETYPE="Flat File">
+    <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+  </TARGET>
+  <TARGET NAME="TGT_DEFAULT" DATABASETYPE="Flat File">
+    <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+  </TARGET>
+  <MAPPING NAME="m_router_groups_test" ISVALID="YES">
+    <TRANSFORMATION NAME="SQ_SRC" TYPE="Source Qualifier" REUSABLE="NO">
+      <TRANSFORMFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" PORTTYPE="OUTPUT"/>
+      <TRANSFORMFIELD NAME="Party_Type" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="OUTPUT"/>
+      <TRANSFORMFIELD NAME="Attrib_Name" DATATYPE="string" PRECISION="50" SCALE="0" PORTTYPE="OUTPUT"/>
+    </TRANSFORMATION>
+    <TRANSFORMATION NAME="RTR_SPLIT" TYPE="Router" REUSABLE="NO">
+      <TRANSFORMFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+      <TRANSFORMFIELD NAME="Party_Type" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+      <TRANSFORMFIELD NAME="Attrib_Name" DATATYPE="string" PRECISION="50" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+      <GROUP DESCRIPTION="" NAME="INPUT" ORDER="1" TYPE="INPUT"/>
+      <GROUP DESCRIPTION="" EXPRESSION="Party_Type = &apos;PER&apos; AND Attrib_Name = &apos;RESTRICTION_TYPE&apos;" NAME="Rest_Type_PER" ORDER="2" TYPE="OUTPUT"/>
+      <GROUP DESCRIPTION="" EXPRESSION="Party_Type = &apos;PER&apos; AND Attrib_Name = &apos;RESTRICTION_VALUE&apos;" NAME="Rest_Value_PER" ORDER="3" TYPE="OUTPUT"/>
+      <GROUP DESCRIPTION="" EXPRESSION="Party_Type = &apos;ORG&apos; AND Attrib_Name = &apos;RESTRICTION_TYPE&apos;" NAME="Rest_Type_ORG" ORDER="4" TYPE="OUTPUT"/>
+      <GROUP DESCRIPTION="" NAME="DEFAULT1" ORDER="5" TYPE="OUTPUT/DEFAULT"/>
+    </TRANSFORMATION>
+    <INSTANCE NAME="SRC" TYPE="Source Definition" TRANSFORMATION_NAME="SRC"/>
+    <INSTANCE NAME="SQ_SRC" TYPE="Source Qualifier" TRANSFORMATION_NAME="SQ_SRC"/>
+    <INSTANCE NAME="RTR_SPLIT" TYPE="Router" TRANSFORMATION_NAME="RTR_SPLIT"/>
+    <INSTANCE NAME="TGT_PER_TYPE" TYPE="Target Definition" TRANSFORMATION_NAME="TGT_PER_TYPE"/>
+    <INSTANCE NAME="TGT_PER_VALUE" TYPE="Target Definition" TRANSFORMATION_NAME="TGT_PER_VALUE"/>
+    <INSTANCE NAME="TGT_ORG_TYPE" TYPE="Target Definition" TRANSFORMATION_NAME="TGT_ORG_TYPE"/>
+    <INSTANCE NAME="TGT_DEFAULT" TYPE="Target Definition" TRANSFORMATION_NAME="TGT_DEFAULT"/>
+    <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="ID" TOINSTANCE="SQ_SRC" TOFIELD="ID"/>
+    <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="Party_Type" TOINSTANCE="SQ_SRC" TOFIELD="Party_Type"/>
+    <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="Attrib_Name" TOINSTANCE="SQ_SRC" TOFIELD="Attrib_Name"/>
+    <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="ID" TOINSTANCE="RTR_SPLIT" TOFIELD="ID"/>
+    <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="Party_Type" TOINSTANCE="RTR_SPLIT" TOFIELD="Party_Type"/>
+    <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="Attrib_Name" TOINSTANCE="RTR_SPLIT" TOFIELD="Attrib_Name"/>
+    <CONNECTOR FROMINSTANCE="RTR_SPLIT" FROMINSTANCEGROUP="Rest_Type_PER" FROMFIELD="ID" TOINSTANCE="TGT_PER_TYPE" TOFIELD="ID"/>
+    <CONNECTOR FROMINSTANCE="RTR_SPLIT" FROMINSTANCEGROUP="Rest_Value_PER" FROMFIELD="ID" TOINSTANCE="TGT_PER_VALUE" TOFIELD="ID"/>
+    <CONNECTOR FROMINSTANCE="RTR_SPLIT" FROMINSTANCEGROUP="Rest_Type_ORG" FROMFIELD="ID" TOINSTANCE="TGT_ORG_TYPE" TOFIELD="ID"/>
+    <CONNECTOR FROMINSTANCE="RTR_SPLIT" FROMINSTANCEGROUP="DEFAULT1" FROMFIELD="ID" TOINSTANCE="TGT_DEFAULT" TOFIELD="ID"/>
+  </MAPPING>
+  <CONFIG NAME="default_session_config"/>
+  <WORKFLOW NAME="wf_router_groups_test" ISVALID="YES">
+    <TASK NAME="Start" REUSABLE="NO" TYPE="Start"/>
+    <SESSION NAME="s_m_router_groups_test" ISVALID="YES" REUSABLE="NO" MAPPINGNAME="m_router_groups_test">
+      <CONFIGREFERENCE REFOBJECTNAME="default_session_config" TYPE="Session config"/>
+    </SESSION>
+    <TASKINSTANCE NAME="Start" TASKNAME="Start" TASKTYPE="Start"/>
+    <TASKINSTANCE NAME="s_m_router_groups_test" TASKNAME="s_m_router_groups_test" TASKTYPE="Session"/>
+    <WORKFLOWLINK FROMTASK="Start" TOTASK="s_m_router_groups_test"/>
+  </WORKFLOW>
+</FOLDER>
+</REPOSITORY>
+</POWERMART>'''
+    def test_router_generates_all_named_groups(self):
+        converter = InformaticaConverter()
+        tmpdir = tempfile.mkdtemp()
+        try:
+            converter.convert_string(self.ROUTER_GROUP_XML, output_dir=tmpdir)
+            for fn in os.listdir(tmpdir):
+                if fn.startswith("mapping_") and fn.endswith(".py"):
+                    with open(os.path.join(tmpdir, fn)) as f:
+                        code = f.read()
+                    assert "Rest_Type_PER" in code, "Should have Rest_Type_PER group"
+                    assert "Rest_Value_PER" in code, "Should have Rest_Value_PER group"
+                    assert "Rest_Type_ORG" in code, "Should have Rest_Type_ORG group"
+                    assert "_router_mask_" in code, "Should have router masks"
+                    assert "Default group" in code, "Should have default group"
+                    assert "_router_default_mask" in code, "Should have default mask"
+                    break
+        finally:
+            shutil.rmtree(tmpdir)
+    def test_router_group_creates_separate_dataframes(self):
+        converter = InformaticaConverter()
+        tmpdir = tempfile.mkdtemp()
+        try:
+            converter.convert_string(self.ROUTER_GROUP_XML, output_dir=tmpdir)
+            for fn in os.listdir(tmpdir):
+                if fn.startswith("mapping_") and fn.endswith(".py"):
+                    with open(os.path.join(tmpdir, fn)) as f:
+                        code = f.read()
+                    assert "df_rtr_split_rest_type_per" in code, "Should create df for Rest_Type_PER"
+                    assert "df_rtr_split_rest_value_per" in code, "Should create df for Rest_Value_PER"
+                    assert "df_rtr_split_rest_type_org" in code, "Should create df for Rest_Type_ORG"
+                    assert ".copy()" in code, "Should copy DataFrames"
+                    break
+        finally:
+            shutil.rmtree(tmpdir)
+    def test_router_group_connector_resolution(self):
+        converter = InformaticaConverter()
+        tmpdir = tempfile.mkdtemp()
+        try:
+            converter.convert_string(self.ROUTER_GROUP_XML, output_dir=tmpdir)
+            for fn in os.listdir(tmpdir):
+                if fn.startswith("mapping_") and fn.endswith(".py"):
+                    with open(os.path.join(tmpdir, fn)) as f:
+                        code = f.read()
+                    assert "df_rtr_split_rest_type_per" in code
+                    assert "df_rtr_split_rest_value_per" in code
+                    break
+        finally:
+            shutil.rmtree(tmpdir)
+    def test_router_default_excludes_all_groups(self):
+        converter = InformaticaConverter()
+        tmpdir = tempfile.mkdtemp()
+        try:
+            converter.convert_string(self.ROUTER_GROUP_XML, output_dir=tmpdir)
+            for fn in os.listdir(tmpdir):
+                if fn.startswith("mapping_") and fn.endswith(".py"):
+                    with open(os.path.join(tmpdir, fn)) as f:
+                        code = f.read()
+                    assert "~_router_mask_rtr_split_0" in code, "Default should negate first group mask"
+                    assert "~_router_mask_rtr_split_1" in code, "Default should negate second group mask"
+                    assert "~_router_mask_rtr_split_2" in code, "Default should negate third group mask"
+                    break
+        finally:
+            shutil.rmtree(tmpdir)
+class TestRouterGroupParsing(unittest.TestCase):
+    def test_group_expression_parsed(self):
+        xml = '''<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE POWERMART SYSTEM "powrmart.dtd">
+<POWERMART CREATION_DATE="01/01/2025" REPOSITORY_VERSION="1">
+<REPOSITORY NAME="repo" VERSION="1" CODEPAGE="UTF-8" DATABASETYPE="Oracle">
+<FOLDER NAME="TEST" OWNER="admin">
+  <MAPPING NAME="m_test" ISVALID="YES">
+    <TRANSFORMATION NAME="RTR" TYPE="Router" REUSABLE="NO">
+      <TRANSFORMFIELD NAME="X" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+      <GROUP NAME="INPUT" ORDER="1" TYPE="INPUT"/>
+      <GROUP NAME="GRP_A" ORDER="2" TYPE="OUTPUT" EXPRESSION="X = &apos;A&apos;"/>
+      <GROUP NAME="DEFAULT1" ORDER="3" TYPE="OUTPUT/DEFAULT"/>
+    </TRANSFORMATION>
+  </MAPPING>
+</FOLDER>
+</REPOSITORY>
+</POWERMART>'''
+        from informatica_python.parser import InformaticaParser
+        parser = InformaticaParser()
+        pm = parser.parse_string(xml)
+        mapping = pm.repositories[0].folders[0].mappings[0]
+        rtr = mapping.transformations[0]
+        assert len(rtr.groups) == 3
+        grp_a = [g for g in rtr.groups if g.name == "GRP_A"][0]
+        assert "X = " in grp_a.expression
+        assert grp_a.type == "OUTPUT"
+        default_g = [g for g in rtr.groups if g.name == "DEFAULT1"][0]
+        assert "DEFAULT" in default_g.type.upper()
+    def test_connector_group_attributes_parsed(self):
+        xml = '''<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE POWERMART SYSTEM "powrmart.dtd">
+<POWERMART CREATION_DATE="01/01/2025" REPOSITORY_VERSION="1">
+<REPOSITORY NAME="repo" VERSION="1" CODEPAGE="UTF-8" DATABASETYPE="Oracle">
+<FOLDER NAME="TEST" OWNER="admin">
+  <MAPPING NAME="m_test" ISVALID="YES">
+    <TRANSFORMATION NAME="RTR" TYPE="Router" REUSABLE="NO">
+      <TRANSFORMFIELD NAME="X" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+    </TRANSFORMATION>
+    <CONNECTOR FROMINSTANCE="RTR" FROMINSTANCEGROUP="GRP_A" FROMFIELD="X" TOINSTANCE="TGT" TOFIELD="X"/>
+  </MAPPING>
+</FOLDER>
+</REPOSITORY>
+</POWERMART>'''
+        from informatica_python.parser import InformaticaParser
+        parser = InformaticaParser()
+        pm = parser.parse_string(xml)
+        mapping = pm.repositories[0].folders[0].mappings[0]
+        conn = mapping.connectors[0]
+        assert conn.from_instance_group == "GRP_A"
+class TestNotInsideIIF(unittest.TestCase):
+    def test_not_isnull_in_iif(self):
+        expr = "IIF(NOT(ISNULL(STATUS)), 'HAS_STATUS', 'NO_STATUS')"
+        result = convert_expression_vectorized(expr)
+        assert "isna" in result or "isnull" in result
+        assert "~" in result
+        assert "np.where" in result
+    def test_not_isnull_standalone(self):
+        from informatica_python.utils.expression_converter import convert_filter_vectorized
+        result = convert_filter_vectorized("NOT(ISNULL(FIELD1))", "df")
+        assert "~" in result
+        assert "isna" in result
+    def test_not_equals_in_iif(self):
+        expr = "IIF(NOT(STATUS = 'ACTIVE'), 'INACTIVE', 'ACTIVE')"
+        result = convert_expression_vectorized(expr)
+        assert "np.where" in result
+        assert "~" in result
+class TestLookupSqlOverride(unittest.TestCase):
+    LOOKUP_SQL_XML = '''<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE POWERMART SYSTEM "powrmart.dtd">
+<POWERMART CREATION_DATE="01/01/2025" REPOSITORY_VERSION="1">
+<REPOSITORY NAME="repo" VERSION="1" CODEPAGE="UTF-8" DATABASETYPE="Oracle">
+<FOLDER NAME="TEST" OWNER="admin">
+  <SOURCE NAME="SRC" DATABASETYPE="Flat File" DBDNAME="SRC">
+    <FLATFILE DELIMITEDBY="COMMA" HEADERROWPRESENT="YES" PADBYTES="NO" ROWDELIMITER="\\n"/>
+    <SOURCEFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NOTNULL" KEYTYPE="PRIMARY KEY" FIELDNUMBER="1"/>
+    <SOURCEFIELD NAME="CODE" DATATYPE="string" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="2"/>
+  </SOURCE>
+  <TARGET NAME="TGT" DATABASETYPE="Flat File">
+    <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+    <TARGETFIELD NAME="DESC" DATATYPE="string" PRECISION="50" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="2"/>
+  </TARGET>
+  <MAPPING NAME="m_lookup_sql_test" ISVALID="YES">
+    <TRANSFORMATION NAME="SQ_SRC" TYPE="Source Qualifier" REUSABLE="NO">
+      <TRANSFORMFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" PORTTYPE="OUTPUT"/>
+      <TRANSFORMFIELD NAME="CODE" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="OUTPUT"/>
+    </TRANSFORMATION>
+    <TRANSFORMATION NAME="LKP_CODES" TYPE="Lookup Procedure" REUSABLE="NO">
+      <TRANSFORMFIELD NAME="CODE_IN" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="INPUT"/>
+      <TRANSFORMFIELD NAME="DESC_OUT" DATATYPE="string" PRECISION="50" SCALE="0" PORTTYPE="OUTPUT"/>
+      <TABLEATTRIBUTE NAME="Lookup Sql Override" VALUE="SELECT CODE, DESCRIPTION FROM REF_CODES WHERE ACTIVE_FLAG = &apos;Y&apos;"/>
+      <TABLEATTRIBUTE NAME="Lookup table name" VALUE="REF_CODES"/>
+    </TRANSFORMATION>
+    <INSTANCE NAME="SRC" TYPE="Source Definition" TRANSFORMATION_NAME="SRC"/>
+    <INSTANCE NAME="SQ_SRC" TYPE="Source Qualifier" TRANSFORMATION_NAME="SQ_SRC"/>
+    <INSTANCE NAME="LKP_CODES" TYPE="Lookup Procedure" TRANSFORMATION_NAME="LKP_CODES"/>
+    <INSTANCE NAME="TGT" TYPE="Target Definition" TRANSFORMATION_NAME="TGT"/>
+    <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="ID" TOINSTANCE="SQ_SRC" TOFIELD="ID"/>
+    <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="CODE" TOINSTANCE="SQ_SRC" TOFIELD="CODE"/>
+    <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="ID" TOINSTANCE="LKP_CODES" TOFIELD="CODE_IN"/>
+    <CONNECTOR FROMINSTANCE="LKP_CODES" FROMFIELD="DESC_OUT" TOINSTANCE="TGT" TOFIELD="DESC"/>
+    <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="ID" TOINSTANCE="TGT" TOFIELD="ID"/>
+  </MAPPING>
+  <CONFIG NAME="default_session_config"/>
+  <WORKFLOW NAME="wf_lookup_sql_test" ISVALID="YES">
+    <TASK NAME="Start" REUSABLE="NO" TYPE="Start"/>
+    <SESSION NAME="s_m_lookup_sql_test" ISVALID="YES" REUSABLE="NO" MAPPINGNAME="m_lookup_sql_test">
+      <CONFIGREFERENCE REFOBJECTNAME="default_session_config" TYPE="Session config"/>
+    </SESSION>
+    <TASKINSTANCE NAME="Start" TASKNAME="Start" TASKTYPE="Start"/>
+    <TASKINSTANCE NAME="s_m_lookup_sql_test" TASKNAME="s_m_lookup_sql_test" TASKTYPE="Session"/>
+    <WORKFLOWLINK FROMTASK="Start" TOTASK="s_m_lookup_sql_test"/>
+  </WORKFLOW>
+</FOLDER>
+</REPOSITORY>
+</POWERMART>'''
+    def test_lookup_sql_override_applied(self):
+        converter = InformaticaConverter()
+        tmpdir = tempfile.mkdtemp()
+        try:
+            converter.convert_string(self.LOOKUP_SQL_XML, output_dir=tmpdir)
+            for fn in os.listdir(tmpdir):
+                if fn.startswith("mapping_") and fn.endswith(".py"):
+                    with open(os.path.join(tmpdir, fn)) as f:
+                        code = f.read()
+                    assert "SELECT CODE, DESCRIPTION FROM REF_CODES" in code, \
+                        "Lookup SQL Override should appear in generated code"
+                    assert "read_from_db" in code, \
+                        "Should use read_from_db with the override SQL"
+                    break
+        finally:
+            shutil.rmtree(tmpdir)
+    def test_lookup_sql_in_all_sql_queries(self):
+        converter = InformaticaConverter()
+        tmpdir = tempfile.mkdtemp()
+        try:
+            converter.convert_string(self.LOOKUP_SQL_XML, output_dir=tmpdir)
+            sql_file = os.path.join(tmpdir, "all_sql_queries.sql")
+            if os.path.exists(sql_file):
+                with open(sql_file) as f:
+                    sql = f.read()
+                assert "REF_CODES" in sql, "SQL file should contain lookup SQL"
+        finally:
+            shutil.rmtree(tmpdir)
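
The `TestRouterGroupElements` tests above only assert that certain identifiers (`_router_mask_rtr_split_0`, `_router_default_mask`, `df_rtr_split_rest_type_per`, `.copy()`) appear in the generated mapping file; the generated code itself is not part of this diff. As a rough sketch of what such output might look like for the `RTR_SPLIT` groups in `ROUTER_GROUP_XML`, assuming one boolean mask per output group and a default mask that negates all of them (the input frame name `df_sq_src`, the sample rows, and `df_rtr_split_default1` are hypothetical):

```python
import pandas as pd

# Hypothetical upstream DataFrame standing in for the Source Qualifier output.
df_sq_src = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Party_Type": ["PER", "PER", "ORG", "ORG"],
    "Attrib_Name": ["RESTRICTION_TYPE", "RESTRICTION_VALUE",
                    "RESTRICTION_TYPE", "OTHER"],
})

# One boolean mask per OUTPUT group, in ORDER sequence.
_router_mask_rtr_split_0 = (df_sq_src["Party_Type"] == "PER") & (df_sq_src["Attrib_Name"] == "RESTRICTION_TYPE")
_router_mask_rtr_split_1 = (df_sq_src["Party_Type"] == "PER") & (df_sq_src["Attrib_Name"] == "RESTRICTION_VALUE")
_router_mask_rtr_split_2 = (df_sq_src["Party_Type"] == "ORG") & (df_sq_src["Attrib_Name"] == "RESTRICTION_TYPE")

# Each group gets its own copy so later steps do not mutate the source frame.
df_rtr_split_rest_type_per = df_sq_src[_router_mask_rtr_split_0].copy()
df_rtr_split_rest_value_per = df_sq_src[_router_mask_rtr_split_1].copy()
df_rtr_split_rest_type_org = df_sq_src[_router_mask_rtr_split_2].copy()

# Default group: rows that matched none of the named groups.
_router_default_mask = ~_router_mask_rtr_split_0 & ~_router_mask_rtr_split_1 & ~_router_mask_rtr_split_2
df_rtr_split_default1 = df_sq_src[_router_default_mask].copy()

print(df_rtr_split_default1["ID"].tolist())
```

The named groups are evaluated independently, so a row can land in more than one of them when conditions overlap; only the default group is defined by exclusion.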