informatica-python 1.9.8__tar.gz → 1.10.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. {informatica_python-1.9.8 → informatica_python-1.10.0}/PKG-INFO +49 -8
  2. {informatica_python-1.9.8 → informatica_python-1.10.0}/README.md +48 -7
  3. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/generators/mapping_gen.py +52 -9
  4. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/models.py +3 -0
  5. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/parser.py +3 -0
  6. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python.egg-info/PKG-INFO +49 -8
  7. {informatica_python-1.9.8 → informatica_python-1.10.0}/pyproject.toml +1 -1
  8. {informatica_python-1.9.8 → informatica_python-1.10.0}/tests/test_integration.py +299 -0
  9. {informatica_python-1.9.8 → informatica_python-1.10.0}/LICENSE +0 -0
  10. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/__init__.py +0 -0
  11. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/cli.py +0 -0
  12. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/converter.py +0 -0
  13. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/generators/__init__.py +0 -0
  14. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/generators/config_gen.py +0 -0
  15. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/generators/error_log_gen.py +0 -0
  16. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/generators/helper_gen.py +0 -0
  17. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/generators/sql_gen.py +0 -0
  18. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/generators/workflow_gen.py +0 -0
  19. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/utils/__init__.py +0 -0
  20. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/utils/datatype_map.py +0 -0
  21. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/utils/expression_converter.py +0 -0
  22. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/utils/lib_adapters.py +0 -0
  23. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python/utils/sql_dialect.py +0 -0
  24. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python.egg-info/SOURCES.txt +0 -0
  25. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python.egg-info/dependency_links.txt +0 -0
  26. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python.egg-info/entry_points.txt +0 -0
  27. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python.egg-info/requires.txt +0 -0
  28. {informatica_python-1.9.8 → informatica_python-1.10.0}/informatica_python.egg-info/top_level.txt +0 -0
  29. {informatica_python-1.9.8 → informatica_python-1.10.0}/setup.cfg +0 -0
  30. {informatica_python-1.9.8 → informatica_python-1.10.0}/tests/test_converter.py +0 -0
  31. {informatica_python-1.9.8 → informatica_python-1.10.0}/tests/test_expressions.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: informatica-python
- Version: 1.9.8
+ Version: 1.10.0
  Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
  Author: Nick
  License: MIT
@@ -124,7 +124,7 @@ The code generator produces real, runnable Python for these transformation types
  - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
  - **Filter** — Row filtering with vectorized converted conditions
  - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
- - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
  - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
  - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
  - **Router** — Multi-group conditional routing with named groups
@@ -196,7 +196,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
  - `REPLACECHR/REPLACESTR` → `.str.replace()`
  - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
  - `CHR(code)` → `chr(int(code))`
- - `||` concatenation → `+` with `.astype(str)` on non-literals
+ - `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 
  **Date/Time:**
  - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -343,10 +343,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
  - Decimals/Floats: `pd.to_numeric(errors='coerce')`
  - Booleans: `.astype('boolean')`
 
- ### Flat File Handling (v1.3+)
+ ### Flat File Handling (v1.3+, enhanced v1.9.8)
 
  Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
+ **Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
+
  ### Mapplet Inlining (v1.3+)
 
  Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
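As a rough illustration of the fixed-width note in the hunk above, the sketch below prefers a field's physical length over its precision when building `pd.read_fwf()` column widths. The `fields` list and its dictionary keys are hypothetical stand-ins for parsed `SOURCEFIELD` metadata, not the package's actual data model.

```python
import io

import pandas as pd

# Hypothetical field metadata; in the package this would come from SOURCEFIELD XML.
fields = [
    {"name": "ID", "precision": 10, "physical_length": 4},
    {"name": "NAME", "precision": 6, "physical_length": 0},  # no physical length given
]

# Prefer physical_length; fall back to precision when it is missing or zero.
widths = [f["physical_length"] or f["precision"] for f in fields]
names = [f["name"] for f in fields]

# Two 10-character records: a 4-character ID followed by a 6-character NAME.
data = "0001Alice \n0002Bob   \n"
df_fixed = pd.read_fwf(
    io.StringIO(data), widths=widths, names=names, header=None, dtype=str
)
```

Reading with `dtype=str` keeps leading zeros in `ID`; whether the generated code does the same is not stated in the diff.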
@@ -371,12 +373,17 @@ The generated `helper_functions.py` provides a complete runtime library:
  ### Database Operations
  | Function | Description |
  |----------|-------------|
- | `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+ | `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
  | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
  | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
- | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+ | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
  | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
  | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+ | `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+ | `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+ | `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+ | `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+ | `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 
  ### File Operations
  | Function | Description |
@@ -407,7 +414,41 @@ The generated `helper_functions.py` provides a complete runtime library:
 
  ## Changelog
 
- ### v1.9.3 (Current)
+ ### v1.10.0 (Current)
+ - **Router multi-group output support**: Router transformations now properly handle `<GROUP>` elements with `EXPRESSION` attributes — generates separate filtered DataFrames for each named output group (e.g., `df_rtr_rest_type_per`, `df_rtr_rest_value_per`), not just the DEFAULT group
+ - **Connector group routing**: `FROMINSTANCEGROUP` / `TOINSTANCEGROUP` attributes on `CONNECTOR` elements are now parsed and used to wire downstream transforms/targets to the correct Router output group
+ - **GroupDef expression field**: `GroupDef` model now stores the `EXPRESSION` attribute from `<GROUP>` XML elements
+ - **Backward-compatible Router fallback**: Existing `TABLEATTRIBUTE`-based Router group conditions (older XML format) continue to work — the code checks `<GROUP>` elements first, then falls back to `TABLEATTRIBUTE` entries
+ - **223 tests** passing
+
+ ### v1.9.8
+ - **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+ - **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+ - **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+ - **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+ - **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+ - **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+ - **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+ - **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+ - **700 tests** passing
+
+ ### v1.9.5 / v1.9.6
+ - **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+ - **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+ - **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+ - **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+ - **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+ - **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+ - **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+ - **`$PM*` variable substitution in SQL Override queries**
+ - **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+ - **678 tests** passing
+
+ ### v1.9.4
+ - Extended expression function coverage and edge-case fixes
+ - Improved mapplet and connector handling
+
+ ### v1.9.3
  - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
  - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
  - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
@@ -495,7 +536,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  cd informatica_python
  pip install -e ".[dev]"
 
- # Run tests (663 tests)
+ # Run tests (700 tests)
  pytest tests/ -v
  ```
 
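The `||` concatenation change recorded in the PKG-INFO hunks above (Series coerced with `.fillna('').astype(str)`, scalars wrapped in `str()`) can be sketched roughly as follows. The helper name `concat_operand` is hypothetical; the package's expression converter emits the coercion inline rather than through a function like this.

```python
import pandas as pd


def concat_operand(value):
    """Coerce one operand of an Informatica `||` expression to text.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, pd.Series):
        # Nulls become empty strings so they do not turn into "nan" or "None".
        return value.fillna("").astype(str)
    # Scalars (literals, resolved variables) are wrapped with str().
    return str(value)


df = pd.DataFrame({"CODE": ["A", None, "C"]})

# Equivalent of: 'PER_' || CODE || 7
result = concat_operand("PER_") + concat_operand(df["CODE"]) + concat_operand(7)
```

The null row yields `"PER_7"` here, whereas a plain `.astype(str)` on the column would typically embed the text of the missing value in the output.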
@@ -97,7 +97,7 @@ The code generator produces real, runnable Python for these transformation types
  - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
  - **Filter** — Row filtering with vectorized converted conditions
  - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
- - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
  - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
  - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
  - **Router** — Multi-group conditional routing with named groups
@@ -169,7 +169,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
  - `REPLACECHR/REPLACESTR` → `.str.replace()`
  - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
  - `CHR(code)` → `chr(int(code))`
- - `||` concatenation → `+` with `.astype(str)` on non-literals
+ - `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 
  **Date/Time:**
  - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -316,10 +316,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
  - Decimals/Floats: `pd.to_numeric(errors='coerce')`
  - Booleans: `.astype('boolean')`
 
- ### Flat File Handling (v1.3+)
+ ### Flat File Handling (v1.3+, enhanced v1.9.8)
 
  Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
+ **Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
+
  ### Mapplet Inlining (v1.3+)
 
  Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
@@ -344,12 +346,17 @@ The generated `helper_functions.py` provides a complete runtime library:
  ### Database Operations
  | Function | Description |
  |----------|-------------|
- | `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+ | `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
  | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
  | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
- | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+ | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
  | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
  | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+ | `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+ | `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+ | `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+ | `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+ | `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 
  ### File Operations
  | Function | Description |
@@ -380,7 +387,41 @@ The generated `helper_functions.py` provides a complete runtime library:
 
  ## Changelog
 
- ### v1.9.3 (Current)
+ ### v1.10.0 (Current)
+ - **Router multi-group output support**: Router transformations now properly handle `<GROUP>` elements with `EXPRESSION` attributes — generates separate filtered DataFrames for each named output group (e.g., `df_rtr_rest_type_per`, `df_rtr_rest_value_per`), not just the DEFAULT group
+ - **Connector group routing**: `FROMINSTANCEGROUP` / `TOINSTANCEGROUP` attributes on `CONNECTOR` elements are now parsed and used to wire downstream transforms/targets to the correct Router output group
+ - **GroupDef expression field**: `GroupDef` model now stores the `EXPRESSION` attribute from `<GROUP>` XML elements
+ - **Backward-compatible Router fallback**: Existing `TABLEATTRIBUTE`-based Router group conditions (older XML format) continue to work — the code checks `<GROUP>` elements first, then falls back to `TABLEATTRIBUTE` entries
+ - **223 tests** passing
+
+ ### v1.9.8
+ - **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+ - **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+ - **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+ - **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+ - **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+ - **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+ - **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+ - **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+ - **700 tests** passing
+
+ ### v1.9.5 / v1.9.6
+ - **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+ - **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+ - **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+ - **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+ - **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+ - **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+ - **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+ - **`$PM*` variable substitution in SQL Override queries**
+ - **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+ - **678 tests** passing
+
+ ### v1.9.4
+ - Extended expression function coverage and edge-case fixes
+ - Improved mapplet and connector handling
+
+ ### v1.9.3
  - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
  - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
  - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
@@ -468,7 +509,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  cd informatica_python
  pip install -e ".[dev]"
 
- # Run tests (663 tests)
+ # Run tests (700 tests)
  pytest tests/ -v
  ```
 
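The README changelog above describes `resolve_env()` as resolving `${VAR}` placeholders from the environment with a config fallback. A minimal sketch of that lookup order, assuming a regex-based substitution, might look like the following. The function name `resolve_env_sketch`, the explicit `config` parameter, and the choice to leave unknown placeholders untouched are all assumptions; the diff only documents the `resolve_env(value)` signature.

```python
import os
import re

# Matches ${NAME} where NAME is made of letters, digits, or underscores.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve_env_sketch(value, config=None):
    """Replace ${VAR} placeholders: environment first, then config (assumed order)."""
    config = config or {}

    def _lookup(match):
        name = match.group(1)
        if name in os.environ:
            return os.environ[name]
        if name in config:
            return str(config[name])
        # Assumption: unresolved placeholders are left as-is.
        return match.group(0)

    return _PLACEHOLDER.sub(_lookup, value)


os.environ["DEMO_DB_HOST"] = "db1"
dsn = resolve_env_sketch("${DEMO_DB_HOST}:${DEMO_DB_PORT}", {"DEMO_DB_PORT": 5432})
```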
@@ -54,6 +54,8 @@ def _expand_mapplet_recursive(mapplet, mapplet_map, prefix, depth=0, max_depth=1
              to_instance=new_to,
              to_field=conn.to_field,
              to_instance_type=conn.to_instance_type,
+             from_instance_group=conn.from_instance_group,
+             to_instance_group=conn.to_instance_group,
          ))
 
      for inst in getattr(mapplet, 'instances', []):
@@ -138,6 +140,8 @@ def _inline_mapplets(mapping, folder):
                      to_instance=first_tx,
                      to_field=conn.to_field,
                      to_instance_type=conn.to_instance_type,
+                     from_instance_group=conn.from_instance_group,
+                     to_instance_group=conn.to_instance_group,
                  ))
              else:
                  rewired_connectors.append(conn)
@@ -167,6 +171,8 @@ def _inline_mapplets(mapping, folder):
                      to_instance=conn.to_instance,
                      to_field=conn.to_field,
                      to_instance_type=conn.to_instance_type,
+                     from_instance_group=conn.from_instance_group,
+                     to_instance_group=conn.to_instance_group,
                  ))
              else:
                  rewired_connectors.append(conn)
@@ -730,7 +736,11 @@ def _generate_transformation(lines, tx, connector_graph, source_dfs, transform_m
      input_conns = connector_graph.get("to", {}).get(tx.name, [])
      input_sources = set()
      for c in input_conns:
-         input_sources.add(c.from_instance)
+         if c.from_instance_group:
+             group_key = f"{c.from_instance}:{c.from_instance_group}"
+             input_sources.add(group_key)
+         else:
+             input_sources.add(c.from_instance)
 
      input_df = None
      for src in input_sources:
@@ -739,7 +749,14 @@ def _generate_transformation(lines, tx, connector_graph, source_dfs, transform_m
              break
      if not input_df:
          for src in input_sources:
-             input_df = f"df_{_safe_name(src)}"
+             base = src.split(":")[0] if ":" in src else src
+             if base in source_dfs:
+                 input_df = source_dfs[base]
+                 break
+     if not input_df:
+         for src in input_sources:
+             base = src.split(":")[0] if ":" in src else src
+             input_df = f"df_{_safe_name(base)}"
              break
      if not input_df:
          input_df = "df_input"
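The two `_generate_transformation` hunks above add a lookup cascade for the input DataFrame name: try the `instance:group` key, then the bare instance, then a derived name, then `df_input`. The standalone sketch below mirrors that order under stated assumptions: `resolve_input_df` and `_safe_name_stub` are hypothetical names, the first step (an exact key match) is inferred from context lines not shown in the diff, and a list replaces the original `set` so the result is deterministic.

```python
def _safe_name_stub(name):
    """Hypothetical stand-in for the package's _safe_name helper."""
    return name.lower().replace(" ", "_")


def resolve_input_df(input_sources, source_dfs):
    """Pick the generated DataFrame variable name for a transformation's input."""
    # 1. Exact key, e.g. "RTR_X:GRP_A" for a specific Router output group.
    for src in input_sources:
        if src in source_dfs:
            return source_dfs[src]
    # 2. Bare upstream instance, ignoring the group suffix.
    for src in input_sources:
        base = src.split(":")[0]
        if base in source_dfs:
            return source_dfs[base]
    # 3. Derive a name from the first source's base instance.
    for src in input_sources:
        return f"df_{_safe_name_stub(src.split(':')[0])}"
    # 4. Nothing connected at all.
    return "df_input"


known = {"RTR_X:GRP_A": "df_rtr_x_grp_a", "RTR_X": "df_rtr_x"}
picked = resolve_input_df(["RTR_X:GRP_A"], known)
```

Falling back to the bare instance keeps older mappings working when a connector names a group that the Router generator did not register.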
@@ -1095,9 +1112,24 @@ def _gen_lookup_transform(lines, tx, tx_safe, input_df, source_dfs, connector_gr
  def _gen_router_transform(lines, tx, tx_safe, input_df, source_dfs):
      lines.append(f" # Router groups:")
      group_conditions = {}
-     for attr in tx.attributes:
-         if "Group Filter Condition" in attr.name:
-             group_conditions[attr.name] = attr.value
+
+     output_groups = [
+         g for g in tx.groups
+         if g.type.upper() not in ("INPUT", "") and "DEFAULT" not in g.type.upper()
+     ]
+     output_groups.sort(key=lambda g: g.order)
+
+     if output_groups:
+         for g in output_groups:
+             if g.expression and g.expression.strip():
+                 group_conditions[g.name] = g.expression
+             else:
+                 group_conditions[g.name] = ""
+
+     if not group_conditions:
+         for attr in tx.attributes:
+             if "Group Filter Condition" in attr.name:
+                 group_conditions[attr.name] = attr.value
 
      remaining_mask_parts = []
      if group_conditions:
@@ -1108,15 +1140,21 @@ def _gen_router_transform(lines, tx, tx_safe, input_df, source_dfs):
                  expr_py = f"pd.Series(True, index={input_df}.index)"
              mask_var = f"_router_mask_{tx_safe}_{i}"
              lines.append(f" {mask_var} = {expr_py} # {gname}")
-             lines.append(f" df_{tx_safe}_group{i} = {input_df}[{mask_var}].copy()")
-             source_dfs[f"{tx.name}_group{i}"] = f"df_{tx_safe}_group{i}"
+             group_df_name = f"df_{tx_safe}_{_safe_name(gname)}"
+             lines.append(f" {group_df_name} = {input_df}[{mask_var}].copy()")
+             source_dfs[f"{tx.name}:{gname}"] = group_df_name
              remaining_mask_parts.append(f"~{mask_var}")
+
+     default_groups = [g for g in tx.groups if "DEFAULT" in g.type.upper()]
+     default_name = default_groups[0].name if default_groups else "DEFAULT"
+
      if remaining_mask_parts:
          lines.append(f" _router_default_mask = {' & '.join(remaining_mask_parts)}")
-         lines.append(f" df_{tx_safe} = {input_df}[_router_default_mask].copy() # Default group")
+         lines.append(f" df_{tx_safe} = {input_df}[_router_default_mask].copy() # Default group ({default_name})")
      else:
-         lines.append(f" df_{tx_safe} = {input_df}.copy() # Default group")
+         lines.append(f" df_{tx_safe} = {input_df}.copy() # Default group ({default_name})")
      source_dfs[tx.name] = f"df_{tx_safe}"
+     source_dfs[f"{tx.name}:{default_name}"] = f"df_{tx_safe}"
 
 
  def _gen_union_transform(lines, tx, tx_safe, input_sources, source_dfs, data_lib="pandas"):
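To make the Router hunks above easier to follow, here is a small sketch of the kind of pandas code the generator appears to emit: one boolean mask per named group, one filtered copy per mask, and a default group holding the rows that matched no mask. The function `route_groups_sketch`, the column names, and the group names are hypothetical, and the real generator writes these statements as text into `lines` rather than running them directly.

```python
import pandas as pd


def route_groups_sketch(df, group_masks, default_name="DEFAULT"):
    """Split df into one DataFrame per named group plus a default group."""
    outputs = {}
    matched_any = pd.Series(False, index=df.index)
    for name, mask in group_masks.items():
        # Each group is filtered independently, so a row may land in several groups.
        outputs[name] = df[mask].copy()
        matched_any |= mask
    # Default group: rows rejected by every named group (the AND of all ~mask terms).
    outputs[default_name] = df[~matched_any].copy()
    return outputs


df = pd.DataFrame({"REST_TYPE": ["PER", "VAL", "OTHER"], "AMOUNT": [10, 20, 30]})
groups = route_groups_sketch(
    df,
    {
        "REST_TYPE_PER": df["REST_TYPE"] == "PER",
        "REST_VALUE_PER": df["REST_TYPE"] == "VAL",
    },
)
```

Keying the outputs by group name instead of a positional index is what lets a connector's `FROMINSTANCEGROUP` value select the right DataFrame downstream.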
@@ -1448,6 +1486,11 @@ def _generate_target_write(lines, tgt_name, tgt_def, connector_graph, source_dfs
      to_conns = connector_graph.get("to", {}).get(tgt_name, [])
      input_df = None
      for c in to_conns:
+         if c.from_instance_group:
+             group_key = f"{c.from_instance}:{c.from_instance_group}"
+             if group_key in source_dfs:
+                 input_df = source_dfs[group_key]
+                 break
          if c.from_instance in source_dfs:
              input_df = source_dfs[c.from_instance]
              break
@@ -80,6 +80,7 @@ class GroupDef:
      type: str = ""
      description: str = ""
      order: int = 0
+     expression: str = ""
      fields: List[FieldDef] = field(default_factory=list)
 
 
@@ -274,6 +275,8 @@ class ConnectorDef:
      to_field: str
      to_instance: str
      to_instance_type: str
+     from_instance_group: str = ""
+     to_instance_group: str = ""
 
 
  @dataclass
@@ -240,6 +240,7 @@ class InformaticaParser:
              type=self._attr(elem, "TYPE"),
              description=self._attr(elem, "DESCRIPTION"),
              order=self._int_attr(elem, "ORDER"),
+             expression=self._attr(elem, "EXPRESSION"),
          )
          for fld in elem.findall("SOURCEFIELD"):
              grp.fields.append(self._parse_source_field(fld))
@@ -580,6 +581,8 @@ class InformaticaParser:
              to_field=self._attr(elem, "TOFIELD"),
              to_instance=self._attr(elem, "TOINSTANCE"),
              to_instance_type=self._attr(elem, "TOINSTANCETYPE"),
+             from_instance_group=self._attr(elem, "FROMINSTANCEGROUP"),
+             to_instance_group=self._attr(elem, "TOINSTANCEGROUP"),
          )
 
      def _parse_instance(self, elem) -> InstanceDef:
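The model and parser hunks above add optional group attributes that default to an empty string when the XML omits them. A self-contained sketch of that behaviour, using the standard library's `xml.etree.ElementTree`, could look like this. `ConnectorSketch` and `parse_connector_sketch` are simplified, hypothetical stand-ins for `ConnectorDef` and the parser method, and the sample XML is invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ConnectorSketch:
    """Simplified stand-in for ConnectorDef (hypothetical, fewer fields)."""
    from_instance: str
    to_instance: str
    from_instance_group: str = ""  # empty when the connector names no group
    to_instance_group: str = ""


def parse_connector_sketch(elem):
    """Read CONNECTOR attributes, treating missing ones as empty strings."""
    return ConnectorSketch(
        from_instance=elem.get("FROMINSTANCE", ""),
        to_instance=elem.get("TOINSTANCE", ""),
        from_instance_group=elem.get("FROMINSTANCEGROUP", ""),
        to_instance_group=elem.get("TOINSTANCEGROUP", ""),
    )


sample = '<CONNECTOR FROMINSTANCE="RTR_X" FROMINSTANCEGROUP="GRP_A" TOINSTANCE="TGT_Y"/>'
conn = parse_connector_sketch(ET.fromstring(sample))

# Key format used by the generator hunks to look up a Router group's DataFrame.
group_key = f"{conn.from_instance}:{conn.from_instance_group}"
```

Defaulting to `""` means connectors from older exports stay falsy in checks such as `if c.from_instance_group:` and follow the pre-1.10.0 code path.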
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: informatica-python
- Version: 1.9.8
+ Version: 1.10.0
  Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
  Author: Nick
  License: MIT
@@ -124,7 +124,7 @@ The code generator produces real, runnable Python for these transformation types
  - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
  - **Filter** — Row filtering with vectorized converted conditions
  - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
- - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
  - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
  - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
  - **Router** — Multi-group conditional routing with named groups
@@ -196,7 +196,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
  - `REPLACECHR/REPLACESTR` → `.str.replace()`
  - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
  - `CHR(code)` → `chr(int(code))`
- - `||` concatenation → `+` with `.astype(str)` on non-literals
+ - `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 
  **Date/Time:**
  - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -343,10 +343,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
  - Decimals/Floats: `pd.to_numeric(errors='coerce')`
  - Booleans: `.astype('boolean')`
 
- ### Flat File Handling (v1.3+)
+ ### Flat File Handling (v1.3+, enhanced v1.9.8)
 
  Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
+ **Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
+
  ### Mapplet Inlining (v1.3+)
 
  Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
@@ -371,12 +373,17 @@ The generated `helper_functions.py` provides a complete runtime library:
  ### Database Operations
  | Function | Description |
  |----------|-------------|
- | `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+ | `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
  | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
  | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
- | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+ | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
  | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
  | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+ | `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+ | `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+ | `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+ | `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+ | `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
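
To illustrate why a dedicated helper is useful for one-source-to-many-target renames: `DataFrame.rename()` takes a dict keyed by source column, so a single source can only map to one target. The sketch below is a guess at the idea, not the package's `rename_with_duplicates`. The function name `rename_with_duplicates_sketch` and the shape of `col_map` (a source name mapped to either one target or a list of targets) are assumptions.

```python
from typing import Dict, List, Union

import pandas as pd


def rename_with_duplicates_sketch(
    df: pd.DataFrame, col_map: Dict[str, Union[str, List[str]]]
) -> pd.DataFrame:
    """Copy each source column to every target name it maps to (hypothetical helper)."""
    out = pd.DataFrame(index=df.index)
    for src in df.columns:
        targets = col_map.get(src, src)  # unmapped columns keep their name
        if isinstance(targets, str):
            targets = [targets]
        for tgt in targets:
            out[tgt] = df[src]
    return out


df = pd.DataFrame({"ID": [1, 2], "NAME": ["a", "b"]})
result = rename_with_duplicates_sketch(
    df, {"ID": ["CUST_ID", "ORDER_CUST_ID"], "NAME": "CUST_NAME"}
)
```

Here the single `ID` source column feeds two target columns, which a plain rename could not express.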
 
  ### File Operations
  | Function | Description |
@@ -407,7 +414,41 @@ The generated `helper_functions.py` provides a complete runtime library:
 
  ## Changelog
 
- ### v1.9.3 (Current)
+ ### v1.10.0 (Current)
+ - **Router multi-group output support**: Router transformations now properly handle `<GROUP>` elements with `EXPRESSION` attributes — generates separate filtered DataFrames for each named output group (e.g., `df_rtr_rest_type_per`, `df_rtr_rest_value_per`), not just the DEFAULT group
+ - **Connector group routing**: `FROMINSTANCEGROUP` / `TOINSTANCEGROUP` attributes on `CONNECTOR` elements are now parsed and used to wire downstream transforms/targets to the correct Router output group
+ - **GroupDef expression field**: `GroupDef` model now stores the `EXPRESSION` attribute from `<GROUP>` XML elements
+ - **Backward-compatible Router fallback**: Existing `TABLEATTRIBUTE`-based Router group conditions (older XML format) continue to work — the code checks `<GROUP>` elements first, then falls back to `TABLEATTRIBUTE` entries
+ - **223 tests** passing
+
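
The multi-group Router behaviour listed for v1.10.0 can be pictured with plain pandas masks. This is a hand-written sketch, not the generator's output. The integration tests in this release only confirm that the generated code contains `_router_mask_` variables, `.copy()` calls, and a default mask that negates each group mask; the variable names and sample data below are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Party_Type": ["PER", "PER", "ORG", "ORG"],
    "Attrib_Name": ["RESTRICTION_TYPE", "RESTRICTION_VALUE", "RESTRICTION_TYPE", "OTHER"],
})

# One boolean mask per named output group (hypothetical variable names).
mask_0 = (df["Party_Type"] == "PER") & (df["Attrib_Name"] == "RESTRICTION_TYPE")
mask_1 = (df["Party_Type"] == "PER") & (df["Attrib_Name"] == "RESTRICTION_VALUE")
mask_2 = (df["Party_Type"] == "ORG") & (df["Attrib_Name"] == "RESTRICTION_TYPE")

# Each group gets its own independent DataFrame.
df_rest_type_per = df[mask_0].copy()
df_rest_value_per = df[mask_1].copy()
df_rest_type_org = df[mask_2].copy()

# The default group receives rows that matched none of the named groups.
default_mask = ~mask_0 & ~mask_1 & ~mask_2
df_default = df[default_mask].copy()
```

Because each mask is evaluated independently, a row that satisfies several group conditions would appear in several group DataFrames, while only rows matching no group reach the default.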
+ ### v1.9.8
+ - **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+ - **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+ - **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+ - **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+ - **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+ - **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+ - **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+ - **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+ - **700 tests** passing
+
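
The word-boundary issue named in the v1.9.8 tokenizer fix can be reproduced with a small regular-expression sketch. This is not the package's `_split_condition_tokens`; `split_condition` is a hypothetical function that only shows how `\b` anchors keep `OR` inside `DeletedIndicator` or `ORDER_ID` from being treated as an operator. It deliberately ignores operators inside quoted string literals, which a real tokenizer would also need to handle.

```python
import re
from typing import List

# Match AND / OR only when they stand alone as whole words.
_LOGICAL_SPLIT = re.compile(r"\b(AND|OR)\b", flags=re.IGNORECASE)


def split_condition(condition: str) -> List[str]:
    """Split a condition on standalone AND/OR, keeping the operators as tokens."""
    parts = _LOGICAL_SPLIT.split(condition)
    return [part.strip() for part in parts if part.strip()]


tokens = split_condition("DeletedIndicator = 'N' OR ORDER_ID > 10")
```

Since `_` counts as a word character in Python's `re` module, a field such as `X_OR_Y` is also left intact by this pattern.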
+ ### v1.9.5 / v1.9.6
+ - **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+ - **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+ - **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+ - **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+ - **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+ - **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+ - **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+ - **`$PM*` variable substitution in SQL Override queries**
+ - **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+ - **678 tests** passing
+
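
The `${VAR}` resolution order mentioned for `resolve_env()` (environment first, then config) might look something like the sketch below. The real helper is documented as taking a single `value` argument, so passing `config` explicitly, the name `resolve_env_sketch`, and leaving unknown placeholders untouched are all assumptions made for illustration.

```python
import os
import re
from typing import Mapping, Optional

_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve_env_sketch(value: str, config: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} using the environment first, then a config mapping."""
    config = config or {}

    def _lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in os.environ:
            return os.environ[name]
        if name in config:
            return str(config[name])
        return match.group(0)  # unknown placeholder: leave it as written

    return _PLACEHOLDER.sub(_lookup, value)


os.environ["DEMO_DB_HOST"] = "db.example.internal"  # demo value only
os.environ.pop("DEMO_DB_PORT", None)
os.environ.pop("DEMO_UNSET", None)
resolved = resolve_env_sketch(
    "${DEMO_DB_HOST}:${DEMO_DB_PORT}/${DEMO_UNSET}", {"DEMO_DB_PORT": "1521"}
)
```

In this run the host comes from the environment, the port from the config mapping, and the last placeholder is left unresolved.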
+ ### v1.9.4
+ - Extended expression function coverage and edge-case fixes
+ - Improved mapplet and connector handling
+
+ ### v1.9.3
  - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
  - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
  - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
@@ -495,7 +536,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  cd informatica_python
  pip install -e ".[dev]"
 
- # Run tests (663 tests)
+ # Run tests (700 tests)
  pytest tests/ -v
  ```
 
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "informatica-python"
- version = "1.9.8"
+ version = "1.10.0"
  description = "Convert Informatica PowerCenter workflow XML to Python/PySpark code"
  readme = "README.md"
  license = {text = "MIT"}
@@ -2911,3 +2911,302 @@ class TestConcatWithLtrimRtrim(unittest.TestCase):
          result = convert_expression_vectorized(expr)
          assert "+" in result
          assert "||" not in result
+
+
+ class TestRouterGroupElements(unittest.TestCase):
+
+     ROUTER_GROUP_XML = '''<?xml version="1.0" encoding="UTF-8"?>
+ <!DOCTYPE POWERMART SYSTEM "powrmart.dtd">
+ <POWERMART CREATION_DATE="01/01/2025" REPOSITORY_VERSION="1">
+ <REPOSITORY NAME="repo" VERSION="1" CODEPAGE="UTF-8" DATABASETYPE="Oracle">
+ <FOLDER NAME="TEST" OWNER="admin">
+ <SOURCE NAME="SRC" DATABASETYPE="Flat File" DBDNAME="SRC">
+ <FLATFILE DELIMITEDBY="COMMA" HEADERROWPRESENT="YES" PADBYTES="NO" ROWDELIMITER="\\n"/>
+ <SOURCEFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NOTNULL" KEYTYPE="PRIMARY KEY" FIELDNUMBER="1"/>
+ <SOURCEFIELD NAME="Party_Type" DATATYPE="string" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="2"/>
+ <SOURCEFIELD NAME="Attrib_Name" DATATYPE="string" PRECISION="50" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="3"/>
+ </SOURCE>
+ <TARGET NAME="TGT_PER_TYPE" DATABASETYPE="Flat File">
+ <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+ </TARGET>
+ <TARGET NAME="TGT_PER_VALUE" DATABASETYPE="Flat File">
+ <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+ </TARGET>
+ <TARGET NAME="TGT_ORG_TYPE" DATABASETYPE="Flat File">
+ <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+ </TARGET>
+ <TARGET NAME="TGT_DEFAULT" DATABASETYPE="Flat File">
+ <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+ </TARGET>
+ <MAPPING NAME="m_router_groups_test" ISVALID="YES">
+ <TRANSFORMATION NAME="SQ_SRC" TYPE="Source Qualifier" REUSABLE="NO">
+ <TRANSFORMFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" PORTTYPE="OUTPUT"/>
+ <TRANSFORMFIELD NAME="Party_Type" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="OUTPUT"/>
+ <TRANSFORMFIELD NAME="Attrib_Name" DATATYPE="string" PRECISION="50" SCALE="0" PORTTYPE="OUTPUT"/>
+ </TRANSFORMATION>
+ <TRANSFORMATION NAME="RTR_SPLIT" TYPE="Router" REUSABLE="NO">
+ <TRANSFORMFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+ <TRANSFORMFIELD NAME="Party_Type" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+ <TRANSFORMFIELD NAME="Attrib_Name" DATATYPE="string" PRECISION="50" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+ <GROUP DESCRIPTION="" NAME="INPUT" ORDER="1" TYPE="INPUT"/>
+ <GROUP DESCRIPTION="" EXPRESSION="Party_Type = &apos;PER&apos; AND Attrib_Name = &apos;RESTRICTION_TYPE&apos;" NAME="Rest_Type_PER" ORDER="2" TYPE="OUTPUT"/>
+ <GROUP DESCRIPTION="" EXPRESSION="Party_Type = &apos;PER&apos; AND Attrib_Name = &apos;RESTRICTION_VALUE&apos;" NAME="Rest_Value_PER" ORDER="3" TYPE="OUTPUT"/>
+ <GROUP DESCRIPTION="" EXPRESSION="Party_Type = &apos;ORG&apos; AND Attrib_Name = &apos;RESTRICTION_TYPE&apos;" NAME="Rest_Type_ORG" ORDER="4" TYPE="OUTPUT"/>
+ <GROUP DESCRIPTION="" NAME="DEFAULT1" ORDER="5" TYPE="OUTPUT/DEFAULT"/>
+ </TRANSFORMATION>
+ <INSTANCE NAME="SRC" TYPE="Source Definition" TRANSFORMATION_NAME="SRC"/>
+ <INSTANCE NAME="SQ_SRC" TYPE="Source Qualifier" TRANSFORMATION_NAME="SQ_SRC"/>
+ <INSTANCE NAME="RTR_SPLIT" TYPE="Router" TRANSFORMATION_NAME="RTR_SPLIT"/>
+ <INSTANCE NAME="TGT_PER_TYPE" TYPE="Target Definition" TRANSFORMATION_NAME="TGT_PER_TYPE"/>
+ <INSTANCE NAME="TGT_PER_VALUE" TYPE="Target Definition" TRANSFORMATION_NAME="TGT_PER_VALUE"/>
+ <INSTANCE NAME="TGT_ORG_TYPE" TYPE="Target Definition" TRANSFORMATION_NAME="TGT_ORG_TYPE"/>
+ <INSTANCE NAME="TGT_DEFAULT" TYPE="Target Definition" TRANSFORMATION_NAME="TGT_DEFAULT"/>
+ <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="ID" TOINSTANCE="SQ_SRC" TOFIELD="ID"/>
+ <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="Party_Type" TOINSTANCE="SQ_SRC" TOFIELD="Party_Type"/>
+ <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="Attrib_Name" TOINSTANCE="SQ_SRC" TOFIELD="Attrib_Name"/>
+ <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="ID" TOINSTANCE="RTR_SPLIT" TOFIELD="ID"/>
+ <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="Party_Type" TOINSTANCE="RTR_SPLIT" TOFIELD="Party_Type"/>
+ <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="Attrib_Name" TOINSTANCE="RTR_SPLIT" TOFIELD="Attrib_Name"/>
+ <CONNECTOR FROMINSTANCE="RTR_SPLIT" FROMINSTANCEGROUP="Rest_Type_PER" FROMFIELD="ID" TOINSTANCE="TGT_PER_TYPE" TOFIELD="ID"/>
+ <CONNECTOR FROMINSTANCE="RTR_SPLIT" FROMINSTANCEGROUP="Rest_Value_PER" FROMFIELD="ID" TOINSTANCE="TGT_PER_VALUE" TOFIELD="ID"/>
+ <CONNECTOR FROMINSTANCE="RTR_SPLIT" FROMINSTANCEGROUP="Rest_Type_ORG" FROMFIELD="ID" TOINSTANCE="TGT_ORG_TYPE" TOFIELD="ID"/>
+ <CONNECTOR FROMINSTANCE="RTR_SPLIT" FROMINSTANCEGROUP="DEFAULT1" FROMFIELD="ID" TOINSTANCE="TGT_DEFAULT" TOFIELD="ID"/>
+ </MAPPING>
+ <CONFIG NAME="default_session_config"/>
+ <WORKFLOW NAME="wf_router_groups_test" ISVALID="YES">
+ <TASK NAME="Start" REUSABLE="NO" TYPE="Start"/>
+ <SESSION NAME="s_m_router_groups_test" ISVALID="YES" REUSABLE="NO" MAPPINGNAME="m_router_groups_test">
+ <CONFIGREFERENCE REFOBJECTNAME="default_session_config" TYPE="Session config"/>
+ </SESSION>
+ <TASKINSTANCE NAME="Start" TASKNAME="Start" TASKTYPE="Start"/>
+ <TASKINSTANCE NAME="s_m_router_groups_test" TASKNAME="s_m_router_groups_test" TASKTYPE="Session"/>
+ <WORKFLOWLINK FROMTASK="Start" TOTASK="s_m_router_groups_test"/>
+ </WORKFLOW>
+ </FOLDER>
+ </REPOSITORY>
+ </POWERMART>'''
+
+     def test_router_generates_all_named_groups(self):
+         converter = InformaticaConverter()
+         tmpdir = tempfile.mkdtemp()
+         try:
+             converter.convert_string(self.ROUTER_GROUP_XML, output_dir=tmpdir)
+             for fn in os.listdir(tmpdir):
+                 if fn.startswith("mapping_") and fn.endswith(".py"):
+                     with open(os.path.join(tmpdir, fn)) as f:
+                         code = f.read()
+                     assert "Rest_Type_PER" in code, "Should have Rest_Type_PER group"
+                     assert "Rest_Value_PER" in code, "Should have Rest_Value_PER group"
+                     assert "Rest_Type_ORG" in code, "Should have Rest_Type_ORG group"
+                     assert "_router_mask_" in code, "Should have router masks"
+                     assert "Default group" in code, "Should have default group"
+                     assert "_router_default_mask" in code, "Should have default mask"
+                     break
+         finally:
+             shutil.rmtree(tmpdir)
+
+     def test_router_group_creates_separate_dataframes(self):
+         converter = InformaticaConverter()
+         tmpdir = tempfile.mkdtemp()
+         try:
+             converter.convert_string(self.ROUTER_GROUP_XML, output_dir=tmpdir)
+             for fn in os.listdir(tmpdir):
+                 if fn.startswith("mapping_") and fn.endswith(".py"):
+                     with open(os.path.join(tmpdir, fn)) as f:
+                         code = f.read()
+                     assert "df_rtr_split_rest_type_per" in code, "Should create df for Rest_Type_PER"
+                     assert "df_rtr_split_rest_value_per" in code, "Should create df for Rest_Value_PER"
+                     assert "df_rtr_split_rest_type_org" in code, "Should create df for Rest_Type_ORG"
+                     assert ".copy()" in code, "Should copy DataFrames"
+                     break
+         finally:
+             shutil.rmtree(tmpdir)
+
+     def test_router_group_connector_resolution(self):
+         converter = InformaticaConverter()
+         tmpdir = tempfile.mkdtemp()
+         try:
+             converter.convert_string(self.ROUTER_GROUP_XML, output_dir=tmpdir)
+             for fn in os.listdir(tmpdir):
+                 if fn.startswith("mapping_") and fn.endswith(".py"):
+                     with open(os.path.join(tmpdir, fn)) as f:
+                         code = f.read()
+                     assert "df_rtr_split_rest_type_per" in code
+                     assert "df_rtr_split_rest_value_per" in code
+                     break
+         finally:
+             shutil.rmtree(tmpdir)
+
+     def test_router_default_excludes_all_groups(self):
+         converter = InformaticaConverter()
+         tmpdir = tempfile.mkdtemp()
+         try:
+             converter.convert_string(self.ROUTER_GROUP_XML, output_dir=tmpdir)
+             for fn in os.listdir(tmpdir):
+                 if fn.startswith("mapping_") and fn.endswith(".py"):
+                     with open(os.path.join(tmpdir, fn)) as f:
+                         code = f.read()
+                     assert "~_router_mask_rtr_split_0" in code, "Default should negate first group mask"
+                     assert "~_router_mask_rtr_split_1" in code, "Default should negate second group mask"
+                     assert "~_router_mask_rtr_split_2" in code, "Default should negate third group mask"
+                     break
+         finally:
+             shutil.rmtree(tmpdir)
+
+
+ class TestRouterGroupParsing(unittest.TestCase):
+
+     def test_group_expression_parsed(self):
+         xml = '''<?xml version="1.0" encoding="UTF-8"?>
+ <!DOCTYPE POWERMART SYSTEM "powrmart.dtd">
+ <POWERMART CREATION_DATE="01/01/2025" REPOSITORY_VERSION="1">
+ <REPOSITORY NAME="repo" VERSION="1" CODEPAGE="UTF-8" DATABASETYPE="Oracle">
+ <FOLDER NAME="TEST" OWNER="admin">
+ <MAPPING NAME="m_test" ISVALID="YES">
+ <TRANSFORMATION NAME="RTR" TYPE="Router" REUSABLE="NO">
+ <TRANSFORMFIELD NAME="X" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+ <GROUP NAME="INPUT" ORDER="1" TYPE="INPUT"/>
+ <GROUP NAME="GRP_A" ORDER="2" TYPE="OUTPUT" EXPRESSION="X = &apos;A&apos;"/>
+ <GROUP NAME="DEFAULT1" ORDER="3" TYPE="OUTPUT/DEFAULT"/>
+ </TRANSFORMATION>
+ </MAPPING>
+ </FOLDER>
+ </REPOSITORY>
+ </POWERMART>'''
+         from informatica_python.parser import InformaticaParser
+         parser = InformaticaParser()
+         pm = parser.parse_string(xml)
+         mapping = pm.repositories[0].folders[0].mappings[0]
+         rtr = mapping.transformations[0]
+         assert len(rtr.groups) == 3
+         grp_a = [g for g in rtr.groups if g.name == "GRP_A"][0]
+         assert "X = " in grp_a.expression
+         assert grp_a.type == "OUTPUT"
+         default_g = [g for g in rtr.groups if g.name == "DEFAULT1"][0]
+         assert "DEFAULT" in default_g.type.upper()
+
+     def test_connector_group_attributes_parsed(self):
+         xml = '''<?xml version="1.0" encoding="UTF-8"?>
+ <!DOCTYPE POWERMART SYSTEM "powrmart.dtd">
+ <POWERMART CREATION_DATE="01/01/2025" REPOSITORY_VERSION="1">
+ <REPOSITORY NAME="repo" VERSION="1" CODEPAGE="UTF-8" DATABASETYPE="Oracle">
+ <FOLDER NAME="TEST" OWNER="admin">
+ <MAPPING NAME="m_test" ISVALID="YES">
+ <TRANSFORMATION NAME="RTR" TYPE="Router" REUSABLE="NO">
+ <TRANSFORMFIELD NAME="X" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+ </TRANSFORMATION>
+ <CONNECTOR FROMINSTANCE="RTR" FROMINSTANCEGROUP="GRP_A" FROMFIELD="X" TOINSTANCE="TGT" TOFIELD="X"/>
+ </MAPPING>
+ </FOLDER>
+ </REPOSITORY>
+ </POWERMART>'''
+         from informatica_python.parser import InformaticaParser
+         parser = InformaticaParser()
+         pm = parser.parse_string(xml)
+         mapping = pm.repositories[0].folders[0].mappings[0]
+         conn = mapping.connectors[0]
+         assert conn.from_instance_group == "GRP_A"
+
+
+ class TestNotInsideIIF(unittest.TestCase):
+
+     def test_not_isnull_in_iif(self):
+         expr = "IIF(NOT(ISNULL(STATUS)), 'HAS_STATUS', 'NO_STATUS')"
+         result = convert_expression_vectorized(expr)
+         assert "isna" in result or "isnull" in result
+         assert "~" in result
+         assert "np.where" in result
+
+     def test_not_isnull_standalone(self):
+         from informatica_python.utils.expression_converter import convert_filter_vectorized
+         result = convert_filter_vectorized("NOT(ISNULL(FIELD1))", "df")
+         assert "~" in result
+         assert "isna" in result
+
+     def test_not_equals_in_iif(self):
+         expr = "IIF(NOT(STATUS = 'ACTIVE'), 'INACTIVE', 'ACTIVE')"
+         result = convert_expression_vectorized(expr)
+         assert "np.where" in result
+         assert "~" in result
+
+
+ class TestLookupSqlOverride(unittest.TestCase):
+
+     LOOKUP_SQL_XML = '''<?xml version="1.0" encoding="UTF-8"?>
+ <!DOCTYPE POWERMART SYSTEM "powrmart.dtd">
+ <POWERMART CREATION_DATE="01/01/2025" REPOSITORY_VERSION="1">
+ <REPOSITORY NAME="repo" VERSION="1" CODEPAGE="UTF-8" DATABASETYPE="Oracle">
+ <FOLDER NAME="TEST" OWNER="admin">
+ <SOURCE NAME="SRC" DATABASETYPE="Flat File" DBDNAME="SRC">
+ <FLATFILE DELIMITEDBY="COMMA" HEADERROWPRESENT="YES" PADBYTES="NO" ROWDELIMITER="\\n"/>
+ <SOURCEFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NOTNULL" KEYTYPE="PRIMARY KEY" FIELDNUMBER="1"/>
+ <SOURCEFIELD NAME="CODE" DATATYPE="string" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="2"/>
+ </SOURCE>
+ <TARGET NAME="TGT" DATABASETYPE="Flat File">
+ <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+ <TARGETFIELD NAME="DESC" DATATYPE="string" PRECISION="50" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="2"/>
+ </TARGET>
+ <MAPPING NAME="m_lookup_sql_test" ISVALID="YES">
+ <TRANSFORMATION NAME="SQ_SRC" TYPE="Source Qualifier" REUSABLE="NO">
+ <TRANSFORMFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" PORTTYPE="OUTPUT"/>
+ <TRANSFORMFIELD NAME="CODE" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="OUTPUT"/>
+ </TRANSFORMATION>
+ <TRANSFORMATION NAME="LKP_CODES" TYPE="Lookup Procedure" REUSABLE="NO">
+ <TRANSFORMFIELD NAME="CODE_IN" DATATYPE="string" PRECISION="10" SCALE="0" PORTTYPE="INPUT"/>
+ <TRANSFORMFIELD NAME="DESC_OUT" DATATYPE="string" PRECISION="50" SCALE="0" PORTTYPE="OUTPUT"/>
+ <TABLEATTRIBUTE NAME="Lookup Sql Override" VALUE="SELECT CODE, DESCRIPTION FROM REF_CODES WHERE ACTIVE_FLAG = &apos;Y&apos;"/>
+ <TABLEATTRIBUTE NAME="Lookup table name" VALUE="REF_CODES"/>
+ </TRANSFORMATION>
+ <INSTANCE NAME="SRC" TYPE="Source Definition" TRANSFORMATION_NAME="SRC"/>
+ <INSTANCE NAME="SQ_SRC" TYPE="Source Qualifier" TRANSFORMATION_NAME="SQ_SRC"/>
+ <INSTANCE NAME="LKP_CODES" TYPE="Lookup Procedure" TRANSFORMATION_NAME="LKP_CODES"/>
+ <INSTANCE NAME="TGT" TYPE="Target Definition" TRANSFORMATION_NAME="TGT"/>
+ <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="ID" TOINSTANCE="SQ_SRC" TOFIELD="ID"/>
+ <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="CODE" TOINSTANCE="SQ_SRC" TOFIELD="CODE"/>
+ <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="ID" TOINSTANCE="LKP_CODES" TOFIELD="CODE_IN"/>
+ <CONNECTOR FROMINSTANCE="LKP_CODES" FROMFIELD="DESC_OUT" TOINSTANCE="TGT" TOFIELD="DESC"/>
+ <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="ID" TOINSTANCE="TGT" TOFIELD="ID"/>
+ </MAPPING>
+ <CONFIG NAME="default_session_config"/>
+ <WORKFLOW NAME="wf_lookup_sql_test" ISVALID="YES">
+ <TASK NAME="Start" REUSABLE="NO" TYPE="Start"/>
+ <SESSION NAME="s_m_lookup_sql_test" ISVALID="YES" REUSABLE="NO" MAPPINGNAME="m_lookup_sql_test">
+ <CONFIGREFERENCE REFOBJECTNAME="default_session_config" TYPE="Session config"/>
+ </SESSION>
+ <TASKINSTANCE NAME="Start" TASKNAME="Start" TASKTYPE="Start"/>
+ <TASKINSTANCE NAME="s_m_lookup_sql_test" TASKNAME="s_m_lookup_sql_test" TASKTYPE="Session"/>
+ <WORKFLOWLINK FROMTASK="Start" TOTASK="s_m_lookup_sql_test"/>
+ </WORKFLOW>
+ </FOLDER>
+ </REPOSITORY>
+ </POWERMART>'''
+
+     def test_lookup_sql_override_applied(self):
+         converter = InformaticaConverter()
+         tmpdir = tempfile.mkdtemp()
+         try:
+             converter.convert_string(self.LOOKUP_SQL_XML, output_dir=tmpdir)
+             for fn in os.listdir(tmpdir):
+                 if fn.startswith("mapping_") and fn.endswith(".py"):
+                     with open(os.path.join(tmpdir, fn)) as f:
+                         code = f.read()
+                     assert "SELECT CODE, DESCRIPTION FROM REF_CODES" in code, \
+                         "Lookup SQL Override should appear in generated code"
+                     assert "read_from_db" in code, \
+                         "Should use read_from_db with the override SQL"
+                     break
+         finally:
+             shutil.rmtree(tmpdir)
+
+     def test_lookup_sql_in_all_sql_queries(self):
+         converter = InformaticaConverter()
+         tmpdir = tempfile.mkdtemp()
+         try:
+             converter.convert_string(self.LOOKUP_SQL_XML, output_dir=tmpdir)
+             sql_file = os.path.join(tmpdir, "all_sql_queries.sql")
+             if os.path.exists(sql_file):
+                 with open(sql_file) as f:
+                     sql = f.read()
+                 assert "REF_CODES" in sql, "SQL file should contain lookup SQL"
+         finally:
+             shutil.rmtree(tmpdir)