informatica-python 1.9.3__tar.gz → 1.9.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. {informatica_python-1.9.3 → informatica_python-1.9.4}/PKG-INFO +3 -3
  2. {informatica_python-1.9.3 → informatica_python-1.9.4}/README.md +2 -2
  3. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/__init__.py +1 -1
  4. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/generators/mapping_gen.py +25 -6
  5. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/utils/expression_converter.py +18 -4
  6. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python.egg-info/PKG-INFO +3 -3
  7. {informatica_python-1.9.3 → informatica_python-1.9.4}/pyproject.toml +1 -1
  8. {informatica_python-1.9.3 → informatica_python-1.9.4}/tests/test_integration.py +239 -0
  9. {informatica_python-1.9.3 → informatica_python-1.9.4}/LICENSE +0 -0
  10. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/cli.py +0 -0
  11. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/converter.py +0 -0
  12. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/generators/__init__.py +0 -0
  13. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/generators/config_gen.py +0 -0
  14. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/generators/error_log_gen.py +0 -0
  15. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/generators/helper_gen.py +0 -0
  16. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/generators/sql_gen.py +0 -0
  17. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/generators/workflow_gen.py +0 -0
  18. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/models.py +0 -0
  19. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/parser.py +0 -0
  20. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/utils/__init__.py +0 -0
  21. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/utils/datatype_map.py +0 -0
  22. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/utils/lib_adapters.py +0 -0
  23. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/utils/sql_dialect.py +0 -0
  24. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python.egg-info/SOURCES.txt +0 -0
  25. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python.egg-info/dependency_links.txt +0 -0
  26. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python.egg-info/entry_points.txt +0 -0
  27. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python.egg-info/requires.txt +0 -0
  28. {informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python.egg-info/top_level.txt +0 -0
  29. {informatica_python-1.9.3 → informatica_python-1.9.4}/setup.cfg +0 -0
  30. {informatica_python-1.9.3 → informatica_python-1.9.4}/tests/test_converter.py +0 -0
  31. {informatica_python-1.9.3 → informatica_python-1.9.4}/tests/test_expressions.py +0 -0
{informatica_python-1.9.3 → informatica_python-1.9.4}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: informatica-python
- Version: 1.9.3
+ Version: 1.9.4
  Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
  Author: Nick
  License: MIT
@@ -430,7 +430,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  - **Generated code formatting**: Consistent `# ---` section headers for Source Qualifiers, Transforms, and Target Writes; metadata comments (database type, field lists); column mapping and write operation comments; clean blank line handling
  - **Source/target detection**: Case-insensitive instance type matching
  - **Session→mapping inference**: Longest-suffix-match strategy for ambiguous mapping names
- - **646 tests** across unit, integration, expression, and formatting test suites
+ - **663 tests** across unit, integration, expression, and formatting test suites
 
  ### v1.9.2 (Phase 8)
  - Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
@@ -495,7 +495,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  cd informatica_python
  pip install -e ".[dev]"
 
- # Run tests (646 tests)
+ # Run tests (663 tests)
  pytest tests/ -v
  ```
 
{informatica_python-1.9.3 → informatica_python-1.9.4}/README.md
@@ -403,7 +403,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  - **Generated code formatting**: Consistent `# ---` section headers for Source Qualifiers, Transforms, and Target Writes; metadata comments (database type, field lists); column mapping and write operation comments; clean blank line handling
  - **Source/target detection**: Case-insensitive instance type matching
  - **Session→mapping inference**: Longest-suffix-match strategy for ambiguous mapping names
- - **646 tests** across unit, integration, expression, and formatting test suites
+ - **663 tests** across unit, integration, expression, and formatting test suites
 
  ### v1.9.2 (Phase 8)
  - Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
@@ -468,7 +468,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  cd informatica_python
  pip install -e ".[dev]"
 
- # Run tests (646 tests)
+ # Run tests (663 tests)
  pytest tests/ -v
  ```
 
{informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/__init__.py
@@ -7,7 +7,7 @@ Licensed under the MIT License.
 
  from informatica_python.converter import InformaticaConverter
 
- __version__ = "1.9.3"
+ __version__ = "1.9.4"
  __author__ = "Nick"
  __license__ = "MIT"
  __all__ = ["InformaticaConverter"]
{informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/generators/mapping_gen.py
@@ -757,7 +757,7 @@ def _generate_transformation(lines, tx, connector_graph, source_dfs, transform_m
      elif tx_type in ("joiner",):
          _gen_joiner_transform(lines, tx, tx_safe, input_df, input_sources, source_dfs, connector_graph, data_lib)
      elif tx_type in ("lookup procedure", "lookup"):
-         _gen_lookup_transform(lines, tx, tx_safe, input_df, source_dfs, data_lib)
+         _gen_lookup_transform(lines, tx, tx_safe, input_df, source_dfs, connector_graph, data_lib)
      elif tx_type == "router":
          _gen_router_transform(lines, tx, tx_safe, input_df, source_dfs)
      elif tx_type in ("union",):
@@ -982,7 +982,7 @@ def _gen_joiner_transform(lines, tx, tx_safe, input_df, input_sources, source_df
      source_dfs[tx.name] = f"df_{tx_safe}"
 
 
- def _gen_lookup_transform(lines, tx, tx_safe, input_df, source_dfs, data_lib="pandas"):
+ def _gen_lookup_transform(lines, tx, tx_safe, input_df, source_dfs, connector_graph=None, data_lib="pandas"):
      lookup_table = ""
      lookup_sql = ""
      lookup_condition = ""
@@ -1012,6 +1012,11 @@ def _gen_lookup_transform(lines, tx, tx_safe, input_df, source_dfs, data_lib="pa
 
      all_output_fields = return_fields + lookup_output_fields
 
+     port_to_col = {}
+     if connector_graph and tx.name in connector_graph.get("to", {}):
+         for conn in connector_graph["to"][tx.name]:
+             port_to_col[conn.to_field.lower()] = conn.from_field
+
      lines.append(f" # Lookup: {lookup_table or tx.name}")
      if lookup_sql:
          _emit_sql_with_params(lines, f"lkp_sql_{tx_safe}", lookup_sql)
@@ -1020,10 +1025,13 @@ def _gen_lookup_transform(lines, tx, tx_safe, input_df, source_dfs, data_lib="pa
          lines.append(f" df_lkp_{tx_safe} = read_from_db(config, 'SELECT * FROM {lookup_table}', 'default')")
      else:
          empty_expr = lib_empty_df(data_lib)
-         lines.append(f" df_lkp_{tx_safe} = {empty_expr}")
+         lines.append(f" df_lkp_{tx_safe} = {empty_expr} # WARNING: no lookup table/SQL override found")
 
      input_keys, lookup_keys = parse_lookup_condition(lookup_condition)
 
+     if input_keys and port_to_col:
+         input_keys = [port_to_col.get(k.lower(), k) for k in input_keys]
+
      if input_keys and lookup_keys:
          lines.append(f" # Lookup condition: {lookup_condition}")
 
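In 1.9.4, `_gen_lookup_transform` receives `connector_graph` and builds `port_to_col`, so the input side of the lookup condition names the column that actually arrives from upstream rather than the lookup's own port name. The following is a minimal pandas sketch of that idea, not the generator's output: the `Connector` namedtuple is a hypothetical stand-in that models only the `from_field`/`to_field` attributes used in the diff, and the column names are invented for illustration.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the converter's connector objects; only the two
# attributes used in the diff (from_field, to_field) are modelled here.
Connector = namedtuple("Connector", ["from_field", "to_field"])


def remap_input_keys(input_keys, connectors):
    """Map lookup input port names to upstream column names (case-insensitive)."""
    port_to_col = {c.to_field.lower(): c.from_field for c in connectors}
    return [port_to_col.get(k.lower(), k) for k in input_keys]


# The upstream column is TABLE_NAME, but the lookup's input port is IN_TABLE_NAME.
connectors = [Connector(from_field="TABLE_NAME", to_field="IN_TABLE_NAME")]
input_keys = remap_input_keys(["in_table_name"], connectors)

df_in = pd.DataFrame({"TABLE_NAME": ["a", "b"], "ROWS": [1, 2]})
df_lkp = pd.DataFrame({"TBL": ["a"], "OWNER": ["etl"]})

# Left join: unmatched input rows are kept with missing lookup values.
df_out = df_in.merge(df_lkp, how="left", left_on=input_keys, right_on=["TBL"])
```

A left join keeps unmatched input rows with empty lookup columns. One caveat of a plain `merge`: if the lookup key is not unique, input rows are duplicated, whereas PowerCenter resolves multiple matches through the lookup's multiple-match policy.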
@@ -1078,12 +1086,23 @@ def _gen_router_transform(lines, tx, tx_safe, input_df, source_dfs):
          if "Group Filter Condition" in attr.name:
              group_conditions[attr.name] = attr.value
 
+     remaining_mask_parts = []
      if group_conditions:
          for i, (gname, cond) in enumerate(group_conditions.items()):
-             expr_py = convert_expression(cond) if cond else "True"
-             lines.append(f" df_{tx_safe}_group{i} = {input_df}[{expr_py}].copy() # {gname}")
+             if cond and cond.strip():
+                 expr_py = convert_filter_vectorized(cond, input_df)
+             else:
+                 expr_py = f"pd.Series(True, index={input_df}.index)"
+             mask_var = f"_router_mask_{tx_safe}_{i}"
+             lines.append(f" {mask_var} = {expr_py} # {gname}")
+             lines.append(f" df_{tx_safe}_group{i} = {input_df}[{mask_var}].copy()")
              source_dfs[f"{tx.name}_group{i}"] = f"df_{tx_safe}_group{i}"
-     lines.append(f" df_{tx_safe} = {input_df}.copy() # Default group")
+             remaining_mask_parts.append(f"~{mask_var}")
+     if remaining_mask_parts:
+         lines.append(f" _router_default_mask = {' & '.join(remaining_mask_parts)}")
+         lines.append(f" df_{tx_safe} = {input_df}[_router_default_mask].copy() # Default group")
+     else:
+         lines.append(f" df_{tx_safe} = {input_df}.copy() # Default group")
      source_dfs[tx.name] = f"df_{tx_safe}"
 
 
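The reworked Router code emits one boolean mask per group and builds the default group from the rows that matched no group, instead of copying the entire input as 1.9.3 did. The sketch below hand-writes pandas code of the same shape for the two-group `RTR_STATUS` mapping used in the new tests; the variable names imitate the `_router_mask_<name>_<i>` pattern but are illustrative, and the actual generated text may differ.

```python
import pandas as pd

df_sq = pd.DataFrame({"ID": [1, 2, 3, 4], "STATUS": ["ACTIVE", "INACTIVE", "CLOSED", None]})

# One mask per group filter condition; a row may satisfy several groups.
_router_mask_rtr_status_0 = df_sq["STATUS"] == "ACTIVE"    # Group Filter Condition_ACTIVE
_router_mask_rtr_status_1 = df_sq["STATUS"] == "INACTIVE"  # Group Filter Condition_INACTIVE
df_rtr_status_group0 = df_sq[_router_mask_rtr_status_0].copy()
df_rtr_status_group1 = df_sq[_router_mask_rtr_status_1].copy()

# Default group: only rows that matched none of the group conditions.
_router_default_mask = ~_router_mask_rtr_status_0 & ~_router_mask_rtr_status_1
df_rtr_status = df_sq[_router_default_mask].copy()
```

Because each group has its own mask, a row can still land in several groups, as in PowerCenter; only the default group is exclusive. The missing `STATUS` in row 4 compares unequal to both literals, so that row falls through to the default group.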
{informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python/utils/expression_converter.py
@@ -248,6 +248,7 @@ def _convert_infa_date_format(fmt_str):
      fmt = fmt.replace("Mon", "%b").replace("MON", "%b")
      fmt = fmt.replace("HH24", "%H").replace("HH12", "%I").replace("HH", "%H")
      fmt = fmt.replace("MI", "%M").replace("SS", "%S")
+     fmt = fmt.replace("US", "%f").replace("NS", "%f").replace("MS", "%f")
      return fmt
 
 
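The added replacement maps Informatica's fractional-second tokens `US`, `NS`, and `MS` onto Python's `%f`. The stand-alone function below is a simplified sketch that assumes only the listed tokens matter; it is not `_convert_infa_date_format` itself, whose full token list and ordering are outside this diff.

```python
from datetime import datetime

# Simplified, hypothetical token table. Order matters: "HH24" must be replaced
# before "HH", and "MI"/"SS" before the fractional-second tokens.
_TOKENS = [
    ("YYYY", "%Y"), ("MM", "%m"), ("DD", "%d"),
    ("HH24", "%H"), ("HH12", "%I"), ("HH", "%H"),
    ("MI", "%M"), ("SS", "%S"),
    ("US", "%f"), ("NS", "%f"), ("MS", "%f"),
]


def infa_to_strptime(mask: str) -> str:
    """Translate an Informatica date mask into a strptime/strftime format."""
    for token, directive in _TOKENS:
        mask = mask.replace(token, directive)
    return mask


fmt = infa_to_strptime("YYYY-MM-DD HH24.MI.SS.US")
parsed = datetime.strptime("2025-01-31 23.59.58.123456", fmt)
```

`strptime` accepts one to six digits for `%f`, so a value with nine fractional digits (nanoseconds) will not parse with the resulting format even though `NS` is mapped.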
@@ -548,7 +549,7 @@ def _vec_recursive(expr, df_var):
              'RTRIM': f'.str.rstrip("{char_arg}")',
              'TRIM': f'.str.strip("{char_arg}")',
          }
-         return f'{inner_val}{method_map[func_name.upper()]}'
+         return f'{inner_val}.astype(str){method_map[func_name.upper()]}'
 
      upper_result = _find_func_call(cleaned, 'UPPER')
      if upper_result and upper_result[0] == 0 and upper_result[1] == len(cleaned):
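The `.astype(str)` now inserted before `.str.lstrip`/`.str.rstrip`/`.str.strip` guards against columns that pandas did not load as strings: the `.str` accessor raises `AttributeError` on, for example, an `int64` column. A small sketch of the failure and the fix, using an invented column:

```python
import pandas as pd

codes = pd.Series([7, 42, 1000])  # int64, e.g. a numeric code column read from a CSV

try:
    codes.str.rstrip("0")
    accessor_failed = False
except AttributeError:
    # The .str accessor only works on string-like values.
    accessor_failed = True

# Casting first, as the generated code now does, makes the trim safe.
trimmed = codes.astype(str).str.rstrip("0")
```

The cast also applies to missing values; depending on the pandas version, they may come out as literal strings such as `'nan'`, which a later `.isna()` check would no longer detect.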
@@ -584,7 +585,7 @@ def _vec_recursive(expr, df_var):
          if len(args) >= 2:
              field_val = _vec_recursive(args[0], df_var)
              try:
-                 start = int(args[1].strip()) - 1
+                 start = max(int(args[1].strip()) - 1, 0)
              except ValueError:
                  start_val = _vec_recursive(args[1], df_var)
              if len(args) >= 3:
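PowerCenter's `SUBSTR` treats a start position of 0 the same as 1, and the new `max(..., 0)` clamp reproduces that. Without it, start 0 became Python index -1, which counts from the end of the string. A pure-Python sketch of the index arithmetic (pandas `.str[a:b]` slices each element the same way); `infa_substr` and `unclamped_substr` are hypothetical helpers, and negative start positions, which Informatica counts from the end, are not modelled:

```python
def infa_substr(value: str, start: int, length: int) -> str:
    """1-based SUBSTR for non-negative starts; start 0 behaves like 1."""
    begin = max(start - 1, 0)  # same clamp as the 1.9.4 converter
    return value[begin:begin + length]


def unclamped_substr(value: str, start: int, length: int) -> str:
    """Unclamped variant, for comparison: start 0 maps to index -1."""
    begin = start - 1
    return value[begin:begin + length]


s = "2025-01-31T23:59:58"
date_part = infa_substr(s, 0, 10)     # same result as infa_substr(s, 1, 10)
month_part = infa_substr(s, 5, 3)     # slice [4:7]
broken = unclamped_substr(s, 0, 10)   # slice [-1:9] is empty for this string
```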
@@ -722,7 +723,11 @@ def _vec_recursive(expr, df_var):
              field_val = _vec_recursive(args[0], df_var)
              pattern_val = args[1].strip().strip("'\"")
              if func_name == 'REG_EXTRACT':
-                 return f'{field_val}.str.extract(r"({pattern_val})", expand=False)'
+                 if re.search(r'(?<!\\)\((?!\?)', pattern_val):
+                     extract_pat = pattern_val
+                 else:
+                     extract_pat = f'({pattern_val})'
+                 return f'{field_val}.str.extract(r"{extract_pat}", expand=False)'
              elif func_name == 'REG_REPLACE':
                  replace_val = args[2].strip().strip("'\"") if len(args) >= 3 else ''
                  return f'{field_val}.str.replace(r"{pattern_val}", "{replace_val}", regex=True)'
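The new `REG_EXTRACT` branch wraps the pattern in a capture group only when it does not already contain one, detecting an unescaped `(` that is not followed by `?` (so non-capturing groups such as `(?:ab)` do not count). The sketch reuses that detection regex with Python's `re` module; `extract_pattern` is a hypothetical helper name.

```python
import re

# Detection regex copied from the 1.9.4 converter: an unescaped "(" that is
# not the start of a "(?...)" construct such as a non-capturing group.
_HAS_CAPTURE = re.compile(r'(?<!\\)\((?!\?)')


def extract_pattern(pattern: str) -> str:
    """Add a capture group only when the pattern has none (hypothetical helper)."""
    return pattern if _HAS_CAPTURE.search(pattern) else f"({pattern})"


already_grouped = extract_pattern(r"(\s+)")   # left unchanged
bare = extract_pattern(r"\d+")                # wrapped once
non_capturing = extract_pattern(r"(?:ab)+")   # wrapped: (?:...) does not capture
escaped = extract_pattern(r"\(x\)")           # wrapped: the parens are literals

first_number = re.search(bare, "order 123, line 4").group(1)
```

In pandas, `Series.str.extract(pat, expand=False)` returns a Series only when the pattern has exactly one capture group; a user pattern with several groups still yields a DataFrame.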
@@ -894,7 +899,8 @@ def _vec_recursive(expr, df_var):
          'True', 'False', 'None', 'and', 'or', 'not', 'np', 'pd', 'get_variable',
          'str', 'int', 'float', 'bool', 'len', 'abs', 'round',
          'fillna', 'astype', 'isna', 'notna', 'where', 'errors', 'coerce',
-         'lookup_func',
+         'lookup_func', 'expand', 'extract', 'regex', 'contains', 'replace',
+         'upper', 'lower', 'strip', 'lstrip', 'rstrip', 'dt', 'copy',
      }
      converted = _substitute_fields(converted, df_var, skip_words)
 
@@ -904,6 +910,8 @@ def _vec_recursive(expr, df_var):
      converted = re.sub(r'<>', '!=', converted)
      converted = re.sub(r'(?<![<>!=])=(?!=)', '==', converted)
      converted = re.sub(r'\berrors\s*==\s*(["\'])', r'errors=\1', converted)
+     converted = re.sub(r'\bexpand\s*==\s*', 'expand=', converted)
+     converted = re.sub(r'\bregex\s*==\s*', 'regex=', converted)
 
      converted = re.sub(r'\s+', ' ', converted).strip()
 
@@ -1044,8 +1052,14 @@ def _vectorize_simple(part, df_var):
          'True', 'False', 'None', 'and', 'or', 'not', 'np', 'pd',
          'str', 'int', 'float', 'isna', 'notna', 'fillna',
          'get_variable', 'lookup_func', 'isin', 'eq',
+         'expand', 'extract', 'astype', 'errors', 'coerce', 'regex',
+         'contains', 'replace', 'upper', 'lower', 'strip', 'lstrip', 'rstrip',
+         'dt', 'len', 'copy', 'abs', 'round', 'where', 'bool',
      }
      c = _substitute_fields(c, df_var, skip_words)
+     c = re.sub(r'\bexpand\s*==\s*', 'expand=', c)
+     c = re.sub(r'\berrors\s*==\s*', 'errors=', c)
+     c = re.sub(r'\bregex\s*==\s*', 'regex=', c)
 
      return c
 
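The `skip_words` additions and the new `re.sub` fix-ups address the same problem: the vectorizer rewrites Informatica's `=` into Python's `==`, which also hits keyword arguments such as `expand=False` in pandas calls it has just emitted, and bare names such as `expand` could otherwise be turned into column references (the new tests check for `df["expand"]`). The sketch below chains the comparison regex and the keyword repairs from the diff on a hand-written expression; `to_comparisons` and `restore_keywords` are hypothetical wrappers, not functions in the package.

```python
import re


def to_comparisons(expr: str) -> str:
    # Informatica's "=" becomes Python's "=="; same pattern as the converter.
    return re.sub(r'(?<![<>!=])=(?!=)', '==', expr)


def restore_keywords(expr: str) -> str:
    # Undo that rewrite for keyword arguments, mirroring the 1.9.4 fix-ups.
    expr = re.sub(r'\berrors\s*==\s*(["\'])', r'errors=\1', expr)
    expr = re.sub(r'\bexpand\s*==\s*', 'expand=', expr)
    expr = re.sub(r'\bregex\s*==\s*', 'regex=', expr)
    return expr


raw = """np.where(df["STATUS"] = 'A', df["CODE"].str.extract(r"(\\d+)", expand=False), None)"""
compared = to_comparisons(raw)   # both "=" signs become "=="
fixed = restore_keywords(compared)
```

The equality test on `STATUS` keeps its `==`, while `expand==False` is turned back into a keyword argument.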
{informatica_python-1.9.3 → informatica_python-1.9.4}/informatica_python.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: informatica-python
- Version: 1.9.3
+ Version: 1.9.4
  Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
  Author: Nick
  License: MIT
@@ -430,7 +430,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  - **Generated code formatting**: Consistent `# ---` section headers for Source Qualifiers, Transforms, and Target Writes; metadata comments (database type, field lists); column mapping and write operation comments; clean blank line handling
  - **Source/target detection**: Case-insensitive instance type matching
  - **Session→mapping inference**: Longest-suffix-match strategy for ambiguous mapping names
- - **646 tests** across unit, integration, expression, and formatting test suites
+ - **663 tests** across unit, integration, expression, and formatting test suites
 
  ### v1.9.2 (Phase 8)
  - Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
@@ -495,7 +495,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  cd informatica_python
  pip install -e ".[dev]"
 
- # Run tests (646 tests)
+ # Run tests (663 tests)
  pytest tests/ -v
  ```
 
{informatica_python-1.9.3 → informatica_python-1.9.4}/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "informatica-python"
- version = "1.9.3"
+ version = "1.9.4"
  description = "Convert Informatica PowerCenter workflow XML to Python/PySpark code"
  readme = "README.md"
  license = {text = "MIT"}
{informatica_python-1.9.3 → informatica_python-1.9.4}/tests/test_integration.py
@@ -2246,3 +2246,242 @@ class TestJoinerFieldRemapping(unittest.TestCase):
              if "left_on" in line and "right_on" in line:
                  assert "Table_Name" in line, \
                      "Merge should use source column name Table_Name"
+
+
+ class TestRegExtractConversion(unittest.TestCase):
+     """Tests for REG_EXTRACT capture group and expand parameter handling."""
+
+     def test_no_double_capture_group(self):
+         r = convert_expression_vectorized(r"REG_EXTRACT(col,'(\s+)')", "df")
+         assert r.count("(") - r.count("str.extract") <= 2
+         assert '((\\s+))' not in r
+
+     def test_adds_capture_group_when_missing(self):
+         r = convert_expression_vectorized(r"REG_EXTRACT(col,'\\d+')", "df")
+         assert 'expand=False' in r
+         assert '.str.extract' in r
+
+     def test_expand_is_boolean_not_series(self):
+         r = convert_expression_vectorized(r"REG_EXTRACT(col,'(\s+)')", "df")
+         assert 'expand=False' in r
+         assert 'expand==False' not in r
+         assert 'df["expand"]' not in r
+
+     def test_isnull_reg_extract_nested(self):
+         r = convert_expression_vectorized(
+             "IIF(ISNULL(REG_EXTRACT(PART_BIRTH_DTE,'(\\s+)')),PART_BIRTH_DTE,NULL)", "df_exp"
+         )
+         assert "np.where" in r
+         assert ".isna()" in r
+         assert "expand=False" in r
+         assert 'expand==False' not in r
+         assert 'df_exp["expand"]' not in r
+
+
+ class TestDatetimeFormatMask(unittest.TestCase):
+     """Tests for datetime format mask conversion (US/microseconds)."""
+
+     def test_us_to_percent_f(self):
+         from informatica_python.utils.expression_converter import _convert_infa_date_format
+         fmt = _convert_infa_date_format("YYYY-MM-DD HH24.MI.SS.US")
+         assert "%f" in fmt
+         assert "US" not in fmt
+
+     def test_full_format_mask(self):
+         from informatica_python.utils.expression_converter import _convert_infa_date_format
+         fmt = _convert_infa_date_format("YYYY-MM-DD HH24:MI:SS")
+         assert fmt == "%Y-%m-%d %H:%M:%S"
+
+     def test_to_date_with_us_format(self):
+         r = convert_expression_vectorized(
+             "TO_DATE(x, 'YYYY-MM-DD HH24.MI.SS.US')", "df"
+         )
+         assert "%f" in r
+         assert "US" not in r
+
+
+ class TestSubstrZeroIndex(unittest.TestCase):
+     """Tests for SUBSTR with 0-based start position."""
+
+     def test_substr_start_0(self):
+         r = convert_expression_vectorized("SUBSTR(x, 0, 11)", "df")
+         assert "str[0:" in r
+         assert "str[-1:" not in r
+
+     def test_substr_start_1(self):
+         r = convert_expression_vectorized("SUBSTR(x, 1, 5)", "df")
+         assert "str[0:" in r
+
+     def test_substr_start_5(self):
+         r = convert_expression_vectorized("SUBSTR(x, 5, 3)", "df")
+         assert "str[4:7]" in r
+
+
+ class TestStringOpSafety(unittest.TestCase):
+     """Tests for string operations adding .astype(str) for safety."""
+
+     def test_ltrim_has_astype_str(self):
+         r = convert_expression_vectorized("LTRIM(name)", "df")
+         assert ".astype(str)" in r
+         assert ".str.lstrip()" in r
+
+     def test_rtrim_has_astype_str(self):
+         r = convert_expression_vectorized("RTRIM(name)", "df")
+         assert ".astype(str)" in r
+         assert ".str.rstrip()" in r
+
+     def test_trim_has_astype_str(self):
+         r = convert_expression_vectorized("TRIM(name)", "df")
+         assert ".astype(str)" in r
+         assert ".str.strip()" in r
+
+     def test_ltrim_with_char(self):
+         r = convert_expression_vectorized("LTRIM(name, '0')", "df")
+         assert ".astype(str)" in r
+         assert '.str.lstrip("0")' in r
+
+
+ class TestRouterVectorized(unittest.TestCase):
+     """Tests for Router transformation generating vectorized conditions."""
+
+     ROUTER_XML = '''<?xml version="1.0" encoding="UTF-8"?>
+ <!DOCTYPE POWERMART SYSTEM "powrmart.dtd">
+ <POWERMART CREATION_DATE="01/01/2025" REPOSITORY_VERSION="1">
+ <REPOSITORY NAME="repo" VERSION="1" CODEPAGE="UTF-8" DATABASETYPE="Oracle">
+ <FOLDER NAME="TEST" OWNER="admin">
+ <SOURCE NAME="SRC" DATABASETYPE="Flat File" DBDNAME="SRC">
+ <FLATFILE DELIMITEDBY="COMMA" HEADERROWPRESENT="YES" PADBYTES="NO" ROWDELIMITER="\\n"/>
+ <SOURCEFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NOTNULL" KEYTYPE="PRIMARY KEY" FIELDNUMBER="1"/>
+ <SOURCEFIELD NAME="STATUS" DATATYPE="string" PRECISION="20" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="2"/>
+ </SOURCE>
+ <TARGET NAME="TGT" DATABASETYPE="Flat File">
+ <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+ </TARGET>
+ <MAPPING NAME="m_router_test" ISVALID="YES">
+ <TRANSFORMATION NAME="SQ_SRC" TYPE="Source Qualifier" REUSABLE="NO">
+ <TRANSFORMFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" PORTTYPE="OUTPUT"/>
+ <TRANSFORMFIELD NAME="STATUS" DATATYPE="string" PRECISION="20" SCALE="0" PORTTYPE="OUTPUT"/>
+ </TRANSFORMATION>
+ <TRANSFORMATION NAME="RTR_STATUS" TYPE="Router" REUSABLE="NO">
+ <TRANSFORMFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+ <TRANSFORMFIELD NAME="STATUS" DATATYPE="string" PRECISION="20" SCALE="0" PORTTYPE="INPUT/OUTPUT"/>
+ <TABLEATTRIBUTE NAME="Group Filter Condition_ACTIVE" VALUE="STATUS = 'ACTIVE'"/>
+ <TABLEATTRIBUTE NAME="Group Filter Condition_INACTIVE" VALUE="STATUS = 'INACTIVE'"/>
+ </TRANSFORMATION>
+ <INSTANCE NAME="SRC" TYPE="Source Definition" TRANSFORMATION_NAME="SRC"/>
+ <INSTANCE NAME="SQ_SRC" TYPE="Source Qualifier" TRANSFORMATION_NAME="SQ_SRC"/>
+ <INSTANCE NAME="RTR_STATUS" TYPE="Router" TRANSFORMATION_NAME="RTR_STATUS"/>
+ <INSTANCE NAME="TGT" TYPE="Target Definition" TRANSFORMATION_NAME="TGT"/>
+ <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="ID" TOINSTANCE="SQ_SRC" TOFIELD="ID"/>
+ <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="STATUS" TOINSTANCE="SQ_SRC" TOFIELD="STATUS"/>
+ <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="ID" TOINSTANCE="RTR_STATUS" TOFIELD="ID"/>
+ <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="STATUS" TOINSTANCE="RTR_STATUS" TOFIELD="STATUS"/>
+ <CONNECTOR FROMINSTANCE="RTR_STATUS" FROMFIELD="ID" TOINSTANCE="TGT" TOFIELD="ID"/>
+ </MAPPING>
+ <CONFIG NAME="default_session_config"/>
+ <WORKFLOW NAME="wf_router_test" ISVALID="YES">
+ <TASK NAME="Start" REUSABLE="NO" TYPE="Start"/>
+ <SESSION NAME="s_m_router_test" ISVALID="YES" REUSABLE="NO" MAPPINGNAME="m_router_test">
+ <CONFIGREFERENCE REFOBJECTNAME="default_session_config" TYPE="Session config"/>
+ </SESSION>
+ <TASKINSTANCE NAME="Start" TASKNAME="Start" TASKTYPE="Start"/>
+ <TASKINSTANCE NAME="s_m_router_test" TASKNAME="s_m_router_test" TASKTYPE="Session"/>
+ <WORKFLOWLINK FROMTASK="Start" TOTASK="s_m_router_test"/>
+ </WORKFLOW>
+ </FOLDER>
+ </REPOSITORY>
+ </POWERMART>'''
+
+     def test_router_generates_group_filters(self):
+         converter = InformaticaConverter()
+         tmpdir = tempfile.mkdtemp()
+         try:
+             converter.convert_string(self.ROUTER_XML, output_dir=tmpdir)
+             for fn in os.listdir(tmpdir):
+                 if fn.startswith("mapping_") and fn.endswith(".py"):
+                     with open(os.path.join(tmpdir, fn)) as f:
+                         code = f.read()
+                     assert "_router_mask_" in code or "group0" in code, \
+                         "Router should generate group filter masks"
+                     assert "Default group" in code
+                     break
+         finally:
+             shutil.rmtree(tmpdir)
+
+     def test_router_default_excludes_matched_rows(self):
+         converter = InformaticaConverter()
+         tmpdir = tempfile.mkdtemp()
+         try:
+             converter.convert_string(self.ROUTER_XML, output_dir=tmpdir)
+             for fn in os.listdir(tmpdir):
+                 if fn.startswith("mapping_") and fn.endswith(".py"):
+                     with open(os.path.join(tmpdir, fn)) as f:
+                         code = f.read()
+                     assert "_router_default_mask" in code or "~" in code, \
+                         "Default group should exclude rows matching other groups"
+                     break
+         finally:
+             shutil.rmtree(tmpdir)
+
+
+ class TestLookupWarning(unittest.TestCase):
+     """Tests for lookup empty DataFrame warning."""
+
+     LOOKUP_XML = '''<?xml version="1.0" encoding="UTF-8"?>
+ <!DOCTYPE POWERMART SYSTEM "powrmart.dtd">
+ <POWERMART CREATION_DATE="01/01/2025" REPOSITORY_VERSION="1">
+ <REPOSITORY NAME="repo" VERSION="1" CODEPAGE="UTF-8" DATABASETYPE="Oracle">
+ <FOLDER NAME="TEST" OWNER="admin">
+ <SOURCE NAME="SRC" DATABASETYPE="Flat File" DBDNAME="SRC">
+ <FLATFILE DELIMITEDBY="COMMA" HEADERROWPRESENT="YES" PADBYTES="NO" ROWDELIMITER="\\n"/>
+ <SOURCEFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NOTNULL" KEYTYPE="PRIMARY KEY" FIELDNUMBER="1"/>
+ </SOURCE>
+ <TARGET NAME="TGT" DATABASETYPE="Flat File">
+ <TARGETFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" NULLABLE="NULL" KEYTYPE="NOT A KEY" FIELDNUMBER="1"/>
+ </TARGET>
+ <MAPPING NAME="m_lkp_test" ISVALID="YES">
+ <TRANSFORMATION NAME="SQ_SRC" TYPE="Source Qualifier" REUSABLE="NO">
+ <TRANSFORMFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" PORTTYPE="OUTPUT"/>
+ </TRANSFORMATION>
+ <TRANSFORMATION NAME="LKP_TEST" TYPE="Lookup Procedure" REUSABLE="NO">
+ <TRANSFORMFIELD NAME="ID" DATATYPE="integer" PRECISION="10" SCALE="0" PORTTYPE="INPUT"/>
+ <TRANSFORMFIELD NAME="RESULT" DATATYPE="string" PRECISION="100" SCALE="0" PORTTYPE="OUTPUT/RETURN"/>
+ <TABLEATTRIBUTE NAME="Lookup table name" VALUE="DIM_TABLE"/>
+ <TABLEATTRIBUTE NAME="Lookup condition" VALUE="ID = ID"/>
+ </TRANSFORMATION>
+ <INSTANCE NAME="SRC" TYPE="Source Definition" TRANSFORMATION_NAME="SRC"/>
+ <INSTANCE NAME="SQ_SRC" TYPE="Source Qualifier" TRANSFORMATION_NAME="SQ_SRC"/>
+ <INSTANCE NAME="LKP_TEST" TYPE="Lookup Procedure" TRANSFORMATION_NAME="LKP_TEST"/>
+ <INSTANCE NAME="TGT" TYPE="Target Definition" TRANSFORMATION_NAME="TGT"/>
+ <CONNECTOR FROMINSTANCE="SRC" FROMFIELD="ID" TOINSTANCE="SQ_SRC" TOFIELD="ID"/>
+ <CONNECTOR FROMINSTANCE="SQ_SRC" FROMFIELD="ID" TOINSTANCE="LKP_TEST" TOFIELD="ID"/>
+ <CONNECTOR FROMINSTANCE="LKP_TEST" FROMFIELD="RESULT" TOINSTANCE="TGT" TOFIELD="ID"/>
+ </MAPPING>
+ <CONFIG NAME="default_session_config"/>
+ <WORKFLOW NAME="wf_lkp_test" ISVALID="YES">
+ <TASK NAME="Start" REUSABLE="NO" TYPE="Start"/>
+ <SESSION NAME="s_m_lkp_test" ISVALID="YES" REUSABLE="NO" MAPPINGNAME="m_lkp_test">
+ <CONFIGREFERENCE REFOBJECTNAME="default_session_config" TYPE="Session config"/>
+ </SESSION>
+ <TASKINSTANCE NAME="Start" TASKNAME="Start" TASKTYPE="Start"/>
+ <TASKINSTANCE NAME="s_m_lkp_test" TASKNAME="s_m_lkp_test" TASKTYPE="Session"/>
+ <WORKFLOWLINK FROMTASK="Start" TOTASK="s_m_lkp_test"/>
+ </WORKFLOW>
+ </FOLDER>
+ </REPOSITORY>
+ </POWERMART>'''
+
+     def test_lookup_with_table_reads_from_db(self):
+         converter = InformaticaConverter()
+         tmpdir = tempfile.mkdtemp()
+         try:
+             converter.convert_string(self.LOOKUP_XML, output_dir=tmpdir)
+             for fn in os.listdir(tmpdir):
+                 if fn.startswith("mapping_") and fn.endswith(".py"):
+                     with open(os.path.join(tmpdir, fn)) as f:
+                         code = f.read()
+                     assert "read_from_db" in code, "Lookup with table should use read_from_db"
+                     assert "DIM_TABLE" in code
+                     break
+         finally:
+             shutil.rmtree(tmpdir)