informatica-python 1.9.2__tar.gz → 1.9.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. {informatica_python-1.9.2 → informatica_python-1.9.3}/PKG-INFO +175 -47
  2. {informatica_python-1.9.2 → informatica_python-1.9.3}/README.md +174 -46
  3. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/__init__.py +1 -1
  4. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/mapping_gen.py +140 -58
  5. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/workflow_gen.py +21 -4
  6. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/utils/expression_converter.py +320 -4
  7. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/PKG-INFO +175 -47
  8. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/SOURCES.txt +1 -0
  9. {informatica_python-1.9.2 → informatica_python-1.9.3}/pyproject.toml +1 -1
  10. {informatica_python-1.9.2 → informatica_python-1.9.3}/tests/test_converter.py +171 -0
  11. informatica_python-1.9.3/tests/test_expressions.py +1195 -0
  12. {informatica_python-1.9.2 → informatica_python-1.9.3}/tests/test_integration.py +635 -0
  13. {informatica_python-1.9.2 → informatica_python-1.9.3}/LICENSE +0 -0
  14. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/cli.py +0 -0
  15. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/converter.py +0 -0
  16. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/__init__.py +0 -0
  17. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/config_gen.py +0 -0
  18. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/error_log_gen.py +0 -0
  19. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/helper_gen.py +0 -0
  20. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/sql_gen.py +0 -0
  21. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/models.py +0 -0
  22. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/parser.py +0 -0
  23. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/utils/__init__.py +0 -0
  24. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/utils/datatype_map.py +0 -0
  25. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/utils/lib_adapters.py +0 -0
  26. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/utils/sql_dialect.py +0 -0
  27. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/dependency_links.txt +0 -0
  28. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/entry_points.txt +0 -0
  29. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/requires.txt +0 -0
  30. {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/top_level.txt +0 -0
  31. {informatica_python-1.9.2 → informatica_python-1.9.3}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: informatica-python
3
- Version: 1.9.2
3
+ Version: 1.9.3
4
4
  Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
5
5
  Author: Nick
6
6
  License: MIT
@@ -79,25 +79,26 @@ from informatica_python import InformaticaConverter
79
79
 
80
80
  converter = InformaticaConverter()
81
81
 
82
- # Parse and generate files
83
- converter.convert_to_files("workflow_export.xml", "output_dir")
82
+ # Parse and generate files to a directory
83
+ converter.convert("workflow_export.xml", output_dir="output_dir")
84
84
 
85
- # Parse and generate zip
86
- converter.convert_to_zip("workflow_export.xml", "output.zip")
85
+ # Parse and generate zip archive
86
+ converter.convert("workflow_export.xml", output_zip="output.zip")
87
87
 
88
- # Parse to structured dict
88
+ # Parse to structured dict (no code generation)
89
89
  result = converter.parse_file("workflow_export.xml")
90
90
 
91
91
  # Use a different data library
92
- converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars")
92
+ converter = InformaticaConverter(data_lib="polars")
93
+ converter.convert("workflow_export.xml", output_dir="output_dir")
93
94
  ```
94
95
 
95
96
  ## Generated Output Files
96
97
 
97
98
  | File | Description |
98
99
  |------|-------------|
99
- | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions), window/analytic functions, stored procedure execution, state persistence |
100
- | `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with row-count logging, source reads, target writes, inline documentation |
100
+ | `helper_functions.py` | Database/file I/O helpers, 90+ Informatica expression equivalents, window/analytic functions, stored procedure execution, state persistence |
101
+ | `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with vectorized expressions, row-count logging, type casting, inline documentation |
101
102
  | `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
102
103
  | `config.yml` | Connection configs, source/target metadata, runtime parameters |
103
104
  | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms (with ANSI-translated variants) |
@@ -119,23 +120,22 @@ Select via `--data-lib` CLI flag or `data_lib` parameter:
119
120
 
120
121
  The code generator produces real, runnable Python for these transformation types:
121
122
 
122
- - **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides
123
- - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style)
123
+ - **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides, `$$PARAM` substitution in SQL
124
+ - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
124
125
  - **Filter** — Row filtering with vectorized converted conditions
125
126
  - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
126
- - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads, multiple match policies, default values
127
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
127
128
  - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
128
- - **Sorter** — `sort_values()` with multi-key ascending/descending
129
+ - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
129
130
  - **Router** — Multi-group conditional routing with named groups
130
131
  - **Union** — `pd.concat()` across multiple input groups
131
- - **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys
132
+ - **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys; vectorized expression parsing with row-level fallback
132
133
  - **Sequence Generator** — Auto-incrementing ID columns
133
134
  - **Normalizer** — `pd.melt()` with auto-detected id/value vars
134
135
  - **Rank** — `groupby().rank()` with Top-N filtering
135
136
  - **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
136
- - **Transaction Control** — Commit/rollback logic
137
137
  - **Custom / Java** — Placeholder stubs with TODO markers
138
- - **SQL Transform** — Direct SQL execution pass-through
138
+ - **SQL Transform** — Direct SQL execution pass-through with `$$PARAM` substitution
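To illustrate the Sorter entry in the list above, here is a minimal sketch of how per-field directions could become a `sort_values()` call. The `sort_fields` list and the `ASCENDING`/`DESCENDING` spellings are assumptions made for this example; they are not output captured from the generator.

```python
import pandas as pd

df = pd.DataFrame({
    "REGION": ["EU", "US", "EU", "US"],
    "AMOUNT": [10, 5, 30, 20],
})

# Hypothetical (field, SORTDIRECTION) pairs, as they might be read from the XML
sort_fields = [("REGION", "ASCENDING"), ("AMOUNT", "DESCENDING")]

# Build the parallel lists that sort_values() expects
by = [name for name, _ in sort_fields]
ascending = [direction.upper() == "ASCENDING" for _, direction in sort_fields]

df_sorted = df.sort_values(by=by, ascending=ascending).reset_index(drop=True)
print(df_sorted)
```

Passing `ascending` as a list keeps each sort key's direction independent, which is what a per-field `ascending=[True, False, ...]` argument amounts to.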
139
139
 
140
140
  ## Supported XML Tags (72 Tags)
141
141
 
@@ -153,6 +153,86 @@ The code generator produces real, runnable Python for these transformation types
153
153
 
154
154
  ## Key Features
155
155
 
156
+ ### Generated Code Quality (v1.9.3+)
157
+
158
+ Generated code follows clean formatting and commenting standards:
159
+ - Consistent section headers (`# ---`) for Source Qualifiers, Transformations, and Target Writes
160
+ - Each section includes metadata: database type, field lists, descriptions
161
+ - Column mapping comments (`# Column mapping: source -> target`) and write operation type comments (`# Write to database table` / `# Write to file`)
162
+ - Expression inline comments showing original Informatica expression (e.g., `# FULL_NAME = UPPER(FIRST_NAME) || ' ' || UPPER(LAST_NAME)`)
163
+ - Clean indentation: no blank line after `try:`, no consecutive blank lines inside function body
164
+ - Mapping-level `try:/except` wrapper with `logger.error()` for runtime visibility
165
+
166
+ ### Smart Target Write Detection (v1.9.3+)
167
+
168
+ Targets are automatically classified as database or file writes:
169
+ - Targets with `database_type` set (Oracle, SQL Server, etc.) generate `write_to_db()` calls
170
+ - Targets with flatfile metadata or file extensions (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) generate `write_file()` calls
171
+ - Bare targets (no metadata) default to `write_to_db()` since Informatica targets are typically database tables
172
+ - Schema-qualified names (e.g., `dbo.MY_TABLE`) correctly route to database writes
173
+ - Session file path overrides take priority when present
174
+
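The classification rules above can be sketched as a small decision function. This is an illustrative reconstruction, not the package's code: `classify_target` and its parameters are hypothetical names, and the relative order of the `database_type` and flat-file checks is an assumption. Only the session-override priority, the extension list, and the database default come from the list above.

```python
import os

# Extension allowlist as listed in the README
FILE_EXTENSIONS = {
    ".csv", ".dat", ".txt", ".xml", ".json",
    ".parquet", ".xlsx", ".xls", ".tsv", ".avro",
}


def classify_target(name, database_type=None, is_flatfile=False,
                    session_file_path=None):
    """Return the name of the writer a target would be routed to (hypothetical helper)."""
    # Session file path overrides take priority when present
    if session_file_path:
        return "write_file"
    # Explicit database type (Oracle, SQL Server, ...) means a database write
    if database_type:
        return "write_to_db"
    # Flat-file metadata or a known file extension means a file write
    if is_flatfile:
        return "write_file"
    _, ext = os.path.splitext(name)
    if ext.lower() in FILE_EXTENSIONS:
        return "write_file"
    # Bare and schema-qualified names (e.g. dbo.MY_TABLE) default to the database
    return "write_to_db"


print(classify_target("dbo.MY_TABLE"))
print(classify_target("orders.csv"))
```

Because `dbo.MY_TABLE` yields the "extension" `.MY_TABLE`, which is not in the allowlist, a schema-qualified name falls through to the database default rather than being mistaken for a file.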
175
+ ### Vectorized Expression Engine (v1.9.2+)
176
+
177
+ Column-level pandas operations instead of row-level iteration. The expression converter uses a recursive parenthesis-aware parser that handles:
178
+
179
+ **Conditional / Null:**
180
+ - `IIF(cond, val, else_val)` → `np.where()` — supports 2-arg form (missing else defaults to `None`)
181
+ - `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains
182
+ - `DECODE(field, val1, res1, ..., default)` → value-matching `np.where()`
183
+ - `NVL(val, default)` → `.fillna()`
184
+ - `IS_SPACES(field)` → `field.str.strip().eq("")`
185
+ - `IS_NUMBER(field)` → `pd.to_numeric(field, errors="coerce").notna()`
186
+ - `IN(field, val1, val2, ...)` → `field.isin([...])`
187
+
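As a rough illustration of the conditional and null mappings listed above, the following sketch applies each pandas/NumPy idiom to a small DataFrame. The column names and the sample Informatica expressions in the comments are invented for the example; the code shows the target idioms, not the converter's actual output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AMOUNT": [50, 150, 1500],
    "REGION": ["EU", None, "US"],
    "CODE": ["  ", "12", "x1"],
})

# IIF(AMOUNT > 100, 'HIGH', 'LOW')
df["TIER"] = np.where(df["AMOUNT"] > 100, "HIGH", "LOW")

# DECODE(TRUE, AMOUNT > 1000, 'L', AMOUNT > 100, 'M', 'S') as a nested chain
df["SIZE"] = np.where(df["AMOUNT"] > 1000, "L",
                      np.where(df["AMOUNT"] > 100, "M", "S"))

# NVL(REGION, 'UNKNOWN')
df["REGION_FILLED"] = df["REGION"].fillna("UNKNOWN")

# IS_SPACES(CODE)
df["CODE_BLANK"] = df["CODE"].str.strip().eq("")

# IS_NUMBER(CODE)
df["CODE_NUMERIC"] = pd.to_numeric(df["CODE"], errors="coerce").notna()

# IN(REGION, 'EU', 'US')
df["IN_SCOPE"] = df["REGION"].isin(["EU", "US"])

print(df)
```

In a nested `np.where()` chain the first matching condition wins, which mirrors the left-to-right evaluation order of `DECODE(TRUE, ...)`.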
188
+ **String:**
189
+ - `UPPER/LOWER` → `.str.upper()/.str.lower()`
190
+ - `LTRIM/RTRIM/TRIM` → `.str.lstrip()/.str.rstrip()/.str.strip()` with custom char support
191
+ - `SUBSTR(val, start, len)` → `.str[start:end]`
192
+ - `INSTR(val, search)` → `.str.find()`
193
+ - `LPAD/RPAD` → `.str.pad()`
194
+ - `REVERSE(val)` → `.str[::-1]`
195
+ - `INITCAP(val)` → `.str.title()`
196
+ - `REPLACECHR/REPLACESTR` → `.str.replace()`
197
+ - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
198
+ - `CHR(code)` → `chr(int(code))`
199
+ - `||` concatenation → `+` with `.astype(str)` on non-literals
200
+
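The string mappings above can be tried on a small DataFrame as follows. This is a sketch with invented column names. The index adjustments for `SUBSTR` and `INSTR` reflect the fact that Informatica positions are 1-based while Python slicing and `.str.find()` are 0-based; whether the converter emits exactly these offsets is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "FIRST_NAME": ["ada", "alan"],
    "LAST_NAME": ["lovelace", "turing"],
})

# UPPER(FIRST_NAME) || ' ' || UPPER(LAST_NAME)
df["FULL_NAME"] = df["FIRST_NAME"].str.upper() + " " + df["LAST_NAME"].str.upper()

# SUBSTR(LAST_NAME, 1, 3): a 1-based start of 1 maps to the slice [0:3]
df["PREFIX"] = df["LAST_NAME"].str[0:3]

# INSTR(LAST_NAME, 'r'): .str.find() returns a 0-based index, or -1 when absent;
# adding 1 gives a 1-based position, with 0 meaning "not found"
df["R_POS"] = df["LAST_NAME"].str.find("r") + 1

# LPAD(FIRST_NAME, 6, '*')
df["PADDED"] = df["FIRST_NAME"].str.pad(6, side="left", fillchar="*")

# REVERSE(FIRST_NAME) and INITCAP(LAST_NAME)
df["REVERSED"] = df["FIRST_NAME"].str[::-1]
df["TITLE"] = df["LAST_NAME"].str.title()

print(df)
```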
201
+ **Date/Time:**
202
+ - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
203
+ - `TO_CHAR(val, fmt)` → `.dt.strftime()`
204
+ - `ADD_TO_DATE(date, part, amount)` → `date + pd.to_timedelta()` with full unit mapping (YY/MM/DD/HH/MI/SS)
205
+ - `DATE_DIFF(date1, date2, part)` → `(date1 - date2).dt.days` / `.dt.total_seconds() / 3600` etc.
206
+ - `SYSDATE/SYSTIMESTAMP` → `pd.Timestamp.now()`
207
+ - `TRUNC(date, 'DD')` → date truncation via `.dt.floor()/.dt.to_period()`
208
+ - `MAKE_DATE_TIME(y, m, d, h, mi, s)` → `pd.Timestamp()`
209
+
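A short sketch of the date/time idioms listed above, using invented column names. The format strings are written by hand here; the Informatica-to-Python format conversion itself is not reproduced. Note that `pd.to_timedelta()` only covers fixed-length units, so month and year offsets would need a different mechanism (for example `pd.DateOffset`), which this sketch does not show.

```python
import pandas as pd

df = pd.DataFrame({
    "ORDER_DT": ["2024-01-15 13:45:00", "2024-03-01 08:00:00"],
    "SHIP_DT": ["2024-01-18 13:45:00", "2024-03-01 20:00:00"],
})

# TO_DATE(ORDER_DT, 'YYYY-MM-DD HH24:MI:SS')
order = pd.to_datetime(df["ORDER_DT"], format="%Y-%m-%d %H:%M:%S")
ship = pd.to_datetime(df["SHIP_DT"], format="%Y-%m-%d %H:%M:%S")

# ADD_TO_DATE(ORDER_DT, 'DD', 7)
df["DUE_DT"] = order + pd.to_timedelta(7, unit="D")

# DATE_DIFF(SHIP_DT, ORDER_DT, 'DD') and DATE_DIFF(SHIP_DT, ORDER_DT, 'HH')
df["DAYS"] = (ship - order).dt.days
df["HOURS"] = (ship - order).dt.total_seconds() / 3600

# TRUNC(ORDER_DT, 'DD')
df["ORDER_DAY"] = order.dt.floor("D")

# TO_CHAR(ORDER_DT, 'YYYYMMDD')
df["ORDER_KEY"] = order.dt.strftime("%Y%m%d")

print(df)
```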
210
+ **Numeric:**
211
+ - `TO_INTEGER/TO_BIGINT/TO_FLOAT/TO_DECIMAL` → `pd.to_numeric()`
212
+ - `TRUNC(val)` → `np.trunc()` for numeric truncation
213
+ - `ROUND/ABS/CEIL/FLOOR/POWER/SQRT/MOD/LOG/SIGN` → `np.*` equivalents
214
+
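The numeric mappings can be sketched in the same way. The `RAW` column and its values are invented; the example only demonstrates the `pd.to_numeric()` and `np.*` calls named in the list above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"RAW": ["12.7", "-3.2", "abc"]})

# TO_FLOAT(RAW): unparseable values become NaN instead of raising
vals = pd.to_numeric(df["RAW"], errors="coerce")

# TRUNC(val): drop the fractional part, rounding toward zero
df["TRUNCATED"] = np.trunc(vals)

# ABS / FLOOR / CEIL / SIGN
df["ABS"] = np.abs(vals)
df["FLOOR"] = np.floor(vals)
df["CEIL"] = np.ceil(vals)
df["SIGN"] = np.sign(vals)

print(df)
```

`np.trunc()` and `np.floor()` differ for negative inputs (`-3.2` becomes `-3.0` and `-4.0` respectively), which is why numeric `TRUNC` maps to the former.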
215
+ **Special:**
216
+ - `:LKP.TABLE(args)` — Connected lookup references → `df_lkp_table` merge
217
+ - `:PORT.FUNC(args)` — Unconnected lookups → `lookup_func("FUNC", args)` calls
218
+ - Inline `--` comment stripping (respects string literals)
219
+ - String-literal-aware field substitution
220
+
221
+ ### Expression Converter (90+ Row-Level Functions)
222
+
223
+ All Informatica expression functions are available as row-level Python equivalents in `helper_functions.py`:
224
+
225
+ - **String:** `substr`, `ltrim`, `rtrim`, `upper`, `lower`, `lpad`, `rpad`, `instr`, `length`, `concat`, `replacechr`, `replacestr`, `reg_extract`, `reg_replace`, `reg_match`, `reverse_str`, `initcap`, `chr_func`, `ascii_func`, `left_str`, `right_str`, `trim_func`, `indexof`, `metaphone_func`, `soundex_func`, `compress_func`, `decompress_func`
226
+ - **Date:** `add_to_date`, `date_diff`, `date_compare`, `get_date_part`, `set_date_part`, `last_day`, `make_date_time`, `to_date`, `to_char`, `to_timestamp_func`, `current_timestamp`, `session_start_time`
227
+ - **Numeric:** `round_val`, `trunc`, `mod_val`, `abs_val`, `ceil_val`, `floor_val`, `power_val`, `sqrt_val`, `log_val`, `ln_val`, `exp_val`, `sign_val`, `rand_val`, `greatest_val`, `least_val`
228
+ - **Conversion:** `to_integer`, `to_bigint`, `to_float`, `to_decimal`, `cast_func`
229
+ - **Null/Conditional:** `iif_expr`, `decode_expr`, `nvl`, `nvl2`, `isnull`, `is_spaces`, `is_number`, `is_date`, `in_expr`, `choose_expr`
230
+ - **Aggregate:** `sum_val`, `avg_val`, `count_val`, `min_val`, `max_val`, `first_val`, `last_val`, `median_val`, `stddev_val`, `variance_val`, `percentile_val`
231
+ - **Window/Analytic:** `moving_avg`, `moving_avg_df`, `moving_sum`, `moving_sum_df`, `cume`, `cume_df`, `percentile_df`
232
+ - **Lookup:** `lookup_func` — Placeholder for runtime lookup resolution
233
+ - **Variable:** `get_variable`, `set_variable`, `set_count_variable`
234
+ - **Control:** `raise_error`, `abort_func`
235
+
156
236
  ### Row-Count Logging (v1.8+)
157
237
 
158
238
  Generated code automatically logs row counts at every step of the data pipeline:
@@ -165,8 +245,6 @@ AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
165
245
  Target TGT_SUMMARY: 150 rows written
166
246
  ```
167
247
 
168
- All row-count operations are backend-safe (wrapped in try/except), so Dask and other lazy-evaluation backends won't fail.
169
-
170
248
  ### Generated Code Documentation (v1.8+)
171
249
 
172
250
  Every generated mapping function includes a rich docstring describing:
@@ -179,14 +257,6 @@ Each transformation block is annotated with:
179
257
  - Transform type and description (from Informatica XML)
180
258
  - Input and output field lists (truncated at 10 for readability)
181
259
 
182
- ### Window / Analytic Functions (v1.7+)
183
-
184
- DataFrame-level analytic functions for aggregation transforms:
185
- - `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
186
- - `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
187
- - `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
188
- - `percentile_df(df, col, pct)` — quantile via `.quantile()`
189
-
190
260
  ### Update Strategy with Target Operations (v1.7+)
191
261
 
192
262
  Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
@@ -196,6 +266,14 @@ Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
196
266
  - Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
197
267
  - Primary key columns auto-detected from target field definitions
198
268
 
269
+ ### Window / Analytic Functions (v1.7+)
270
+
271
+ DataFrame-level analytic functions for aggregation transforms:
272
+ - `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
273
+ - `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
274
+ - `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
275
+ - `percentile_df(df, col, pct)` — quantile via `.quantile()`
276
+
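For orientation, the four helpers above could be implemented along these lines. The function names, signatures, and underlying pandas calls come from the list; the bodies, the return types, and the assumption that `pct` is a fraction between 0 and 1 are guesses made for this sketch, not the contents of the generated `helper_functions.py`.

```python
import pandas as pd


def moving_avg_df(df, col, window):
    """Rolling mean over `window` rows (assumed to return a Series)."""
    return df[col].rolling(window).mean()


def moving_sum_df(df, col, window):
    """Rolling sum over `window` rows (assumed to return a Series)."""
    return df[col].rolling(window).sum()


def cume_df(df, col):
    """Cumulative sum from the first row to the current row."""
    return df[col].expanding().sum()


def percentile_df(df, col, pct):
    """Quantile of the column; `pct` is assumed to be a fraction in [0, 1]."""
    return df[col].quantile(pct)


df = pd.DataFrame({"SALES": [10, 20, 30, 40]})
df["MOVING_AVG"] = moving_avg_df(df, "SALES", 2)
df["MOVING_SUM"] = moving_sum_df(df, "SALES", 2)
df["CUME"] = cume_df(df, "SALES")
print(df)
print(percentile_df(df, "SALES", 0.5))
```

The first `window - 1` rows of the rolling results are `NaN`, because pandas requires a full window by default.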
199
277
  ### Stored Procedure Execution (v1.7+)
200
278
 
201
279
  Full stored procedure code generation (not just stubs):
@@ -241,19 +319,13 @@ Optional `--validate-casts` flag generates null-count checks before/after type c
241
319
  - Logs warnings when coercion introduces new nulls
242
320
  - Helps identify data quality issues during test runs
243
321
 
244
- ### Vectorized Expression Generation (v1.5+)
245
-
246
- Column-level pandas operations instead of row-level iteration:
247
- - IIF → `np.where()`, NVL → `.fillna()`, UPPER/LOWER → `.str.upper()/.str.lower()`
248
- - SUBSTR → `.str[start:end]`, TO_INTEGER → `pd.to_numeric()`, TO_DATE → `pd.to_datetime()`
249
- - IS NULL/IS NOT NULL → `.isna()`/`.notna()`
250
-
251
322
  ### Parameter File Support (v1.5+)
252
323
 
253
324
  Standard Informatica `.param` file parsing:
254
325
  - `[Global]` and `[folder.WF:workflow.ST:session]` section support
255
326
  - `get_param(config, var_name)` resolution chain: config → env vars → defaults
256
327
  - CLI `--param-file` flag for specifying parameter files
328
+ - `$$PARAM` variables in SQL automatically substituted with `.replace()` calls
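The resolution chain and the `$$PARAM` substitution described above can be sketched together. Everything here is hypothetical: `resolve_param` and `substitute_params` are stand-in names (not the package's `get_param`), the `"params"` key in the config dict is an assumed layout, and `DEMO_REGION` is an environment variable set only for the demonstration.

```python
import os
import re


def resolve_param(config, var_name, default=None):
    """Look up a parameter: config first, then environment variables, then default."""
    params = config.get("params", {})  # assumed config layout
    if var_name in params:
        return params[var_name]
    if var_name in os.environ:
        return os.environ[var_name]
    return default


def substitute_params(sql, config):
    """Replace each $$NAME token in `sql` with its resolved value via str.replace()."""
    names = set(re.findall(r"\$\$(\w+)", sql))
    # Longest names first, so $$START cannot clobber part of $$START_DATE
    for name in sorted(names, key=len, reverse=True):
        value = resolve_param(config, name)
        if value is not None:
            sql = sql.replace("$$" + name, str(value))
    return sql


config = {"params": {"START_DATE": "2024-01-01"}}
os.environ["DEMO_REGION"] = "EU"  # demonstration only

sql = ("SELECT * FROM ORDERS WHERE ORDER_DT >= '$$START_DATE' "
       "AND REGION = '$$DEMO_REGION' AND BATCH = '$$MISSING'")
print(substitute_params(sql, config))
```

Unresolved tokens such as `$$MISSING` are left in place in this sketch so that a missing parameter stays visible in the final SQL. Plain string substitution does not escape values, so it is only appropriate for trusted parameter sources.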
257
329
 
258
330
  ### Session Connection Overrides (v1.4+)
259
331
 
@@ -283,18 +355,49 @@ Expands Mapplet instances into prefixed transforms, rewires connectors, and elim
283
355
 
284
356
  Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
285
357
 
286
- ### Expression Converter (80+ Functions)
287
-
288
- Converts Informatica expressions to Python equivalents:
289
-
290
- - **String:** SUBSTR, LTRIM, RTRIM, UPPER, LOWER, LPAD, RPAD, INSTR, LENGTH, CONCAT, REPLACE, REG_EXTRACT, REG_REPLACE, REVERSE, INITCAP, CHR, ASCII
291
- - **Date:** ADD_TO_DATE, DATE_DIFF, GET_DATE_PART, SYSDATE, SYSTIMESTAMP, TO_DATE, TO_CHAR, TRUNC (date)
292
- - **Numeric:** ROUND, TRUNC, MOD, ABS, CEIL, FLOOR, POWER, SQRT, LOG, EXP, SIGN
293
- - **Conversion:** TO_INTEGER, TO_BIGINT, TO_FLOAT, TO_DECIMAL, TO_CHAR, TO_DATE
294
- - **Null handling:** IIF, DECODE, NVL, NVL2, ISNULL, IS_SPACES, IS_NUMBER
295
- - **Aggregate:** SUM, AVG, COUNT, MIN, MAX, FIRST, LAST, MEDIAN, STDDEV, VARIANCE
296
- - **Lookup:** :LKP expressions with dynamic lookup references
297
- - **Variable:** SETVARIABLE / mapping variable assignment
358
+ ## Helper Functions Library
359
+
360
+ The generated `helper_functions.py` provides a complete runtime library:
361
+
362
+ ### Configuration & Parameters
363
+ | Function | Description |
364
+ |----------|-------------|
365
+ | `load_config(path, param_file)` | Load YAML config with optional `.param` file merge |
366
+ | `parse_param_file(path)` | Parse Informatica `.param` files (`[Global]`, `[folder.WF:...]` sections) |
367
+ | `get_param(config, var_name, default)` | Resolve parameter: config → env vars → default |
368
+ | `get_variable(var_name, config)` | Get workflow/mapping variable from params, env vars, or param store |
369
+ | `set_variable(var_name, value)` | Set workflow/mapping variable in param store and env |
370
+
371
+ ### Database Operations
372
+ | Function | Description |
373
+ |----------|-------------|
374
+ | `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
375
+ | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
376
+ | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
377
+ | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
378
+ | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
379
+ | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
380
+
381
+ ### File Operations
382
+ | Function | Description |
383
+ |----------|-------------|
384
+ | `read_file(path, file_config)` | Read CSV/DAT/TXT/XML/XLSX/JSON/Parquet with auto-detection |
385
+ | `write_file(df, path, file_config)` | Write DataFrame to file with format auto-detection |
386
+
387
+ ### State Persistence
388
+ | Function | Description |
389
+ |----------|-------------|
390
+ | `load_persistent_state(file)` | Load JSON state file for persistent variables |
391
+ | `save_persistent_state(file)` | Save persistent variables to JSON state file |
392
+ | `get_persistent_variable(scope, var, default)` | Get scoped persistent variable |
393
+ | `set_persistent_variable(scope, var, value)` | Set scoped persistent variable |
394
+
395
+ ### Logging & Monitoring
396
+ | Function | Description |
397
+ |----------|-------------|
398
+ | `log_mapping_start(name)` | Log mapping start with timestamp |
399
+ | `log_mapping_end(name, start_time, row_count)` | Log mapping completion with elapsed time |
400
+ | `validate_row_count(df, name, min_rows)` | Validate minimum row count threshold |
298
401
 
299
402
  ## Requirements
300
403
 
@@ -304,7 +407,32 @@ Converts Informatica expressions to Python equivalents:
304
407
 
305
408
  ## Changelog
306
409
 
307
- ### v1.9.x (Phase 8)
410
+ ### v1.9.3 (Current)
411
+ - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
412
+ - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
413
+ - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
414
+ - **2-arg IIF**: `IIF(cond, val)` without else clause defaults to `None`
415
+ - **REVERSE vectorization**: `REVERSE(field)` → `field.str[::-1]`
416
+ - **IN() vectorization**: `IN(field, val1, val2, ...)` → `field.isin([...])`
417
+ - **IS_NUMBER vectorization**: `IS_NUMBER(field)` → `pd.to_numeric(field, errors="coerce").notna()`
418
+ - **SYSDATE/SYSTIMESTAMP**: Bare `SYSDATE`/`SYSTIMESTAMP` → `pd.Timestamp.now()` in vectorized mode
419
+ - **TRUNC vectorization**: Numeric `TRUNC(field)` → `np.trunc()`; date `TRUNC(field, 'DD')` → `.dt.floor()`
420
+ - **ADD_TO_DATE vectorization**: `ADD_TO_DATE(date, part, amount)` → `pd.to_timedelta()` with YY/MM/DD/HH/MI/SS units
421
+ - **DATE_DIFF vectorization**: `DATE_DIFF(date1, date2, part)` → arithmetic on timedelta components
422
+ - **Unconnected lookup support**: `:PORT.FUNC_NAME(args)` → `lookup_func("FUNC_NAME", args)`
423
+ - **Inline comment stripping**: `--` comments removed from expressions (respects string literals)
424
+ - **`$$PARAM` SQL substitution**: Source Qualifier, Lookup, and SQL Transform SQL strings auto-substitute `$$VAR` with `get_param(config, 'VAR')` calls
425
+ - **Sorter direction**: Reads `SORTDIRECTION` from field attributes, generates per-field `ascending=[True, False, ...]`
426
+ - **Pass-through optimization**: Identity expressions skip `.copy()` and use direct reference
427
+ - **Duplicate lookup deduplication**: `_gen_lookup_transform` uses `seen_output_cols` set to avoid duplicate column checks
428
+ - **Mapping-level error handling**: Generated function body wrapped in `try:/except` with `logger.error()`
429
+ - **Update strategy vectorized**: Tries vectorized expression first, falls back to row-level `apply()`
430
+ - **Generated code formatting**: Consistent `# ---` section headers for Source Qualifiers, Transforms, and Target Writes; metadata comments (database type, field lists); column mapping and write operation comments; clean blank line handling
431
+ - **Source/target detection**: Case-insensitive instance type matching
432
+ - **Session→mapping inference**: Longest-suffix-match strategy for ambiguous mapping names
433
+ - **646 tests** across unit, integration, expression, and formatting test suites
434
+
435
+ ### v1.9.2 (Phase 8)
308
436
  - Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
309
437
  - Workflow imports automatically match the named mapping files
310
438
  - **Expression converter rewrite**: Recursive parenthesis-aware parser replacing simple regex; fixes nested IIF/INSTR/LTRIM/RTRIM/REPLACECHR/REPLACESTR/SUBSTR/TO_CHAR/CHR/MAKE_DATE_TIME
@@ -367,7 +495,7 @@ Converts Informatica expressions to Python equivalents:
367
495
  cd informatica_python
368
496
  pip install -e ".[dev]"
369
497
 
370
- # Run tests (136 tests)
498
+ # Run tests (646 tests)
371
499
  pytest tests/ -v
372
500
  ```
373
501
 
@@ -52,25 +52,26 @@ from informatica_python import InformaticaConverter
52
52
 
53
53
  converter = InformaticaConverter()
54
54
 
55
- # Parse and generate files
56
- converter.convert_to_files("workflow_export.xml", "output_dir")
55
+ # Parse and generate files to a directory
56
+ converter.convert("workflow_export.xml", output_dir="output_dir")
57
57
 
58
- # Parse and generate zip
59
- converter.convert_to_zip("workflow_export.xml", "output.zip")
58
+ # Parse and generate zip archive
59
+ converter.convert("workflow_export.xml", output_zip="output.zip")
60
60
 
61
- # Parse to structured dict
61
+ # Parse to structured dict (no code generation)
62
62
  result = converter.parse_file("workflow_export.xml")
63
63
 
64
64
  # Use a different data library
65
- converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars")
65
+ converter = InformaticaConverter(data_lib="polars")
66
+ converter.convert("workflow_export.xml", output_dir="output_dir")
66
67
  ```
67
68
 
68
69
  ## Generated Output Files
69
70
 
70
71
  | File | Description |
71
72
  |------|-------------|
72
- | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions), window/analytic functions, stored procedure execution, state persistence |
73
- | `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with row-count logging, source reads, target writes, inline documentation |
73
+ | `helper_functions.py` | Database/file I/O helpers, 90+ Informatica expression equivalents, window/analytic functions, stored procedure execution, state persistence |
74
+ | `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with vectorized expressions, row-count logging, type casting, inline documentation |
74
75
  | `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
75
76
  | `config.yml` | Connection configs, source/target metadata, runtime parameters |
76
77
  | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms (with ANSI-translated variants) |
@@ -92,23 +93,22 @@ Select via `--data-lib` CLI flag or `data_lib` parameter:
92
93
 
93
94
  The code generator produces real, runnable Python for these transformation types:
94
95
 
95
- - **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides
96
- - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style)
96
+ - **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides, `$$PARAM` substitution in SQL
97
+ - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
97
98
  - **Filter** — Row filtering with vectorized converted conditions
98
99
  - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
99
- - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads, multiple match policies, default values
100
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
100
101
  - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
101
- - **Sorter** — `sort_values()` with multi-key ascending/descending
102
+ - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
102
103
  - **Router** — Multi-group conditional routing with named groups
103
104
  - **Union** — `pd.concat()` across multiple input groups
104
- - **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys
105
+ - **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys; vectorized expression parsing with row-level fallback
105
106
  - **Sequence Generator** — Auto-incrementing ID columns
106
107
  - **Normalizer** — `pd.melt()` with auto-detected id/value vars
107
108
  - **Rank** — `groupby().rank()` with Top-N filtering
108
109
  - **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
109
- - **Transaction Control** — Commit/rollback logic
110
110
  - **Custom / Java** — Placeholder stubs with TODO markers
111
- - **SQL Transform** — Direct SQL execution pass-through
111
+ - **SQL Transform** — Direct SQL execution pass-through with `$$PARAM` substitution
112
112
 
113
113
  ## Supported XML Tags (72 Tags)
114
114
 
@@ -126,6 +126,86 @@ The code generator produces real, runnable Python for these transformation types
126
126
 
127
127
  ## Key Features
128
128
 
129
+ ### Generated Code Quality (v1.9.3+)
130
+
131
+ Generated code follows clean formatting and commenting standards:
132
+ - Consistent section headers (`# ---`) for Source Qualifiers, Transformations, and Target Writes
133
+ - Each section includes metadata: database type, field lists, descriptions
134
+ - Column mapping comments (`# Column mapping: source -> target`) and write operation type comments (`# Write to database table` / `# Write to file`)
135
+ - Expression inline comments showing original Informatica expression (e.g., `# FULL_NAME = UPPER(FIRST_NAME) || ' ' || UPPER(LAST_NAME)`)
136
+ - Clean indentation: no blank line after `try:`, no consecutive blank lines inside function body
137
+ - Mapping-level `try:/except` wrapper with `logger.error()` for runtime visibility
138
+
139
+ ### Smart Target Write Detection (v1.9.3+)
140
+
141
+ Targets are automatically classified as database or file writes:
142
+ - Targets with `database_type` set (Oracle, SQL Server, etc.) generate `write_to_db()` calls
143
+ - Targets with flatfile metadata or file extensions (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) generate `write_file()` calls
144
+ - Bare targets (no metadata) default to `write_to_db()` since Informatica targets are typically database tables
145
+ - Schema-qualified names (e.g., `dbo.MY_TABLE`) correctly route to database writes
146
+ - Session file path overrides take priority when present
147
+
+ ### Vectorized Expression Engine (v1.9.2+)
+
+ Column-level pandas operations instead of row-level iteration. The expression converter uses a recursive parenthesis-aware parser that handles:
+
+ **Conditional / Null:**
+ - `IIF(cond, val, else_val)` → `np.where()` — supports 2-arg form (missing else defaults to `None`)
+ - `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains
+ - `DECODE(field, val1, res1, ..., default)` → value-matching `np.where()`
+ - `NVL(val, default)` → `.fillna()`
+ - `IS_SPACES(field)` → `field.str.strip().eq("")`
+ - `IS_NUMBER(field)` → `pd.to_numeric(field, errors="coerce").notna()`
+ - `IN(field, val1, val2, ...)` → `field.isin([...])`
+
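To make the conditional and null mappings above concrete, here is a minimal hand-written sketch of the pandas/NumPy forms they name. The DataFrame, its column names, and the sample expressions are invented for illustration; the exact code the generator emits may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "STATUS": ["A", "B", "C", None],
    "CODE": ["10", "x", "7.5", "abc"],
    "NOTE": ["ok", "   ", "n/a", "x"],
})

# IIF(STATUS = 'A', 'Active', 'Other')
iif_result = np.where(df["STATUS"] == "A", "Active", "Other")

# IIF(STATUS = 'A', 'Active')  -- 2-arg form, else branch is None
iif_two_arg = np.where(df["STATUS"] == "A", "Active", None)

# DECODE(STATUS, 'A', 'Active', 'B', 'Blocked', 'Unknown') as a nested chain
decode_result = np.where(
    df["STATUS"] == "A", "Active",
    np.where(df["STATUS"] == "B", "Blocked", "Unknown"),
)

# NVL(STATUS, 'N/A')
nvl_result = df["STATUS"].fillna("N/A")

# IS_SPACES(NOTE)
is_spaces = df["NOTE"].str.strip().eq("")

# IS_NUMBER(CODE)
is_number = pd.to_numeric(df["CODE"], errors="coerce").notna()

# IN(STATUS, 'A', 'B')
in_result = df["STATUS"].isin(["A", "B"])
```

Each result is a whole-column array or Series, so no per-row Python loop is involved.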
+ **String:**
+ - `UPPER/LOWER` → `.str.upper()/.str.lower()`
+ - `LTRIM/RTRIM/TRIM` → `.str.lstrip()/.str.rstrip()/.str.strip()` with custom char support
+ - `SUBSTR(val, start, len)` → `.str[start:end]`
+ - `INSTR(val, search)` → `.str.find()`
+ - `LPAD/RPAD` → `.str.pad()`
+ - `REVERSE(val)` → `.str[::-1]`
+ - `INITCAP(val)` → `.str.title()`
+ - `REPLACECHR/REPLACESTR` → `.str.replace()`
+ - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
+ - `CHR(code)` → `chr(int(code))`
+ - `||` concatenation → `+` with `.astype(str)` on non-literals
+
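A short sketch of several of the string mappings above, written by hand against the pandas `.str` accessor. The sample data is hypothetical. Note that Informatica positions are 1-based while Python slices and `.str.find()` are 0-based; the offset handling shown here is an assumption of this example, since the README does not say how the generator adjusts indices.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"FIRST_NAME": ["  ada", "grace "], "ID": [7, 42]})

# UPPER(LTRIM(RTRIM(FIRST_NAME)))
name = df["FIRST_NAME"].str.strip().str.upper()

# SUBSTR(name, 1, 3): 1-based start 1 becomes 0-based slice [0:3]
first_three = name.str[0:3]

# INSTR(name, 'A'): .str.find() is 0-based and returns -1 when absent,
# so adding 1 yields a 1-based position with 0 for "not found"
instr_pos = name.str.find("A") + 1

# LPAD(ID, 5, '0')
padded_id = df["ID"].astype(str).str.pad(5, side="left", fillchar="0")

# REVERSE(name)
reversed_name = name.str[::-1]

# INITCAP(name)
initcap_name = name.str.title()

# name || '-' || ID  (non-string operands are cast with .astype(str))
label = name + "-" + df["ID"].astype(str)
```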
+ **Date/Time:**
+ - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
+ - `TO_CHAR(val, fmt)` → `.dt.strftime()`
+ - `ADD_TO_DATE(date, part, amount)` → `date + pd.to_timedelta()` with full unit mapping (YY/MM/DD/HH/MI/SS)
+ - `DATE_DIFF(date1, date2, part)` → `(date1 - date2).dt.days` / `.dt.total_seconds() / 3600` etc.
+ - `SYSDATE/SYSTIMESTAMP` → `pd.Timestamp.now()`
+ - `TRUNC(date, 'DD')` → date truncation via `.dt.floor()/.dt.to_period()`
+ - `MAKE_DATE_TIME(y, m, d, h, mi, s)` → `pd.Timestamp()`
+
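The date mappings above might look like the following when written out by hand. Column names and values are made up for the example, and the Informatica format strings in the comments are only indicative. The README does not show how the generator handles the `YY`/`MM` parts of `ADD_TO_DATE`; because calendar months are not a fixed length, this sketch substitutes `pd.DateOffset` for that case, which is an assumption rather than the package's documented behaviour.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "ORDER_DT": ["2024-01-15 10:30:00", "2024-03-01 23:59:59"],
    "SHIP_DT": ["2024-01-18 10:30:00", "2024-03-02 05:59:59"],
})

# TO_DATE(ORDER_DT, 'YYYY-MM-DD HH24:MI:SS')
order_dt = pd.to_datetime(df["ORDER_DT"], format="%Y-%m-%d %H:%M:%S")
ship_dt = pd.to_datetime(df["SHIP_DT"], format="%Y-%m-%d %H:%M:%S")

# TO_CHAR(ORDER_DT, 'YYYYMMDD')
order_text = order_dt.dt.strftime("%Y%m%d")

# ADD_TO_DATE(ORDER_DT, 'DD', 7)
plus_week = order_dt + pd.to_timedelta(7, unit="D")

# ADD_TO_DATE(ORDER_DT, 'MM', 1): calendar-aware offset (assumption, see above)
plus_month = order_dt + pd.DateOffset(months=1)

# DATE_DIFF(SHIP_DT, ORDER_DT, 'DD') and DATE_DIFF(SHIP_DT, ORDER_DT, 'HH')
diff_days = (ship_dt - order_dt).dt.days
diff_hours = (ship_dt - order_dt).dt.total_seconds() / 3600

# TRUNC(ORDER_DT, 'DD')
day_start = order_dt.dt.floor("D")
```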
+ **Numeric:**
+ - `TO_INTEGER/TO_BIGINT/TO_FLOAT/TO_DECIMAL` → `pd.to_numeric()`
+ - `TRUNC(val)` → `np.trunc()` for numeric truncation
+ - `ROUND/ABS/CEIL/FLOOR/POWER/SQRT/MOD/LOG/SIGN` → `np.*` equivalents
+
+ **Special:**
+ - `:LKP.TABLE(args)` — Connected lookup references → `df_lkp_table` merge
+ - `:PORT.FUNC(args)` — Unconnected lookups → `lookup_func("FUNC", args)` calls
+ - Inline `--` comment stripping (respects string literals)
+ - String-literal-aware field substitution
+
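Stripping `--` comments while respecting string literals, as mentioned in the list above, can be done with a single left-to-right scan that tracks whether the cursor is inside a single-quoted literal. The function below is a minimal sketch of that idea; `strip_inline_comment` is a hypothetical name, not the converter's actual implementation.

```python
def strip_inline_comment(expr: str) -> str:
    """Drop a trailing `--` comment on each line, ignoring `--` inside quotes."""
    cleaned = []
    for line in expr.splitlines():
        in_string = False
        cut = len(line)
        for i, ch in enumerate(line):
            if ch == "'":
                # Toggle on every quote; a doubled '' toggles twice and nets out.
                in_string = not in_string
            elif not in_string and line.startswith("--", i):
                cut = i
                break
        cleaned.append(line[:cut].rstrip())
    return "\n".join(cleaned)
```

For example, `strip_inline_comment("IIF(CODE = '--', 'dash', CODE) -- keep literal")` keeps the quoted `'--'` and removes only the trailing comment.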
+ ### Expression Converter (90+ Row-Level Functions)
+
+ All Informatica expression functions are available as row-level Python equivalents in `helper_functions.py`:
+
+ - **String:** `substr`, `ltrim`, `rtrim`, `upper`, `lower`, `lpad`, `rpad`, `instr`, `length`, `concat`, `replacechr`, `replacestr`, `reg_extract`, `reg_replace`, `reg_match`, `reverse_str`, `initcap`, `chr_func`, `ascii_func`, `left_str`, `right_str`, `trim_func`, `indexof`, `metaphone_func`, `soundex_func`, `compress_func`, `decompress_func`
+ - **Date:** `add_to_date`, `date_diff`, `date_compare`, `get_date_part`, `set_date_part`, `last_day`, `make_date_time`, `to_date`, `to_char`, `to_timestamp_func`, `current_timestamp`, `session_start_time`
+ - **Numeric:** `round_val`, `trunc`, `mod_val`, `abs_val`, `ceil_val`, `floor_val`, `power_val`, `sqrt_val`, `log_val`, `ln_val`, `exp_val`, `sign_val`, `rand_val`, `greatest_val`, `least_val`
+ - **Conversion:** `to_integer`, `to_bigint`, `to_float`, `to_decimal`, `cast_func`
+ - **Null/Conditional:** `iif_expr`, `decode_expr`, `nvl`, `nvl2`, `isnull`, `is_spaces`, `is_number`, `is_date`, `in_expr`, `choose_expr`
+ - **Aggregate:** `sum_val`, `avg_val`, `count_val`, `min_val`, `max_val`, `first_val`, `last_val`, `median_val`, `stddev_val`, `variance_val`, `percentile_val`
+ - **Window/Analytic:** `moving_avg`, `moving_avg_df`, `moving_sum`, `moving_sum_df`, `cume`, `cume_df`, `percentile_df`
+ - **Lookup:** `lookup_func` — Placeholder for runtime lookup resolution
+ - **Variable:** `get_variable`, `set_variable`, `set_count_variable`
+ - **Control:** `raise_error`, `abort_func`
+
  ### Row-Count Logging (v1.8+)
 
  Generated code automatically logs row counts at every step of the data pipeline:
@@ -138,8 +218,6 @@ AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
  Target TGT_SUMMARY: 150 rows written
  ```
 
- All row-count operations are backend-safe (wrapped in try/except), so Dask and other lazy-evaluation backends won't fail.
-
  ### Generated Code Documentation (v1.8+)
 
  Every generated mapping function includes a rich docstring describing:
@@ -152,14 +230,6 @@ Each transformation block is annotated with:
  - Transform type and description (from Informatica XML)
  - Input and output field lists (truncated at 10 for readability)
 
- ### Window / Analytic Functions (v1.7+)
-
- DataFrame-level analytic functions for aggregation transforms:
- - `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
- - `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
- - `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
- - `percentile_df(df, col, pct)` — quantile via `.quantile()`
-
  ### Update Strategy with Target Operations (v1.7+)
 
  Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
@@ -169,6 +239,14 @@ Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
  - Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
  - Primary key columns auto-detected from target field definitions
 
+ ### Window / Analytic Functions (v1.7+)
+
+ DataFrame-level analytic functions for aggregation transforms:
+ - `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
+ - `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
+ - `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
+ - `percentile_df(df, col, pct)` — quantile via `.quantile()`
+
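The four helpers above are described only by signature and by the pandas call each one wraps. A minimal re-implementation from those descriptions could look like this; the function bodies, return types, and sample data are assumptions for illustration and may not match the shipped `helper_functions.py`.

```python
import pandas as pd


def moving_avg_df(df: pd.DataFrame, col: str, window: int) -> pd.Series:
    """Rolling mean over `window` rows (NaN until the window is full)."""
    return df[col].rolling(window).mean()


def moving_sum_df(df: pd.DataFrame, col: str, window: int) -> pd.Series:
    """Rolling sum over `window` rows (NaN until the window is full)."""
    return df[col].rolling(window).sum()


def cume_df(df: pd.DataFrame, col: str) -> pd.Series:
    """Cumulative sum from the first row to the current row."""
    return df[col].expanding().sum()


def percentile_df(df: pd.DataFrame, col: str, pct: float) -> float:
    """Quantile of the column, with `pct` given as a fraction (0.5 = median)."""
    return df[col].quantile(pct)


# Hypothetical usage.
sales = pd.DataFrame({"AMOUNT": [10.0, 20.0, 30.0, 40.0]})
rolling_mean = moving_avg_df(sales, "AMOUNT", 2)
running_total = cume_df(sales, "AMOUNT")
median_amount = percentile_df(sales, "AMOUNT", 0.5)
```

In this sketch the first `window - 1` rows of the rolling results are `NaN`, which is the pandas default when `min_periods` is not set.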
  ### Stored Procedure Execution (v1.7+)
 
  Full stored procedure code generation (not just stubs):
@@ -214,19 +292,13 @@ Optional `--validate-casts` flag generates null-count checks before/after type c
  - Logs warnings when coercion introduces new nulls
  - Helps identify data quality issues during test runs
 
- ### Vectorized Expression Generation (v1.5+)
-
- Column-level pandas operations instead of row-level iteration:
- - IIF → `np.where()`, NVL → `.fillna()`, UPPER/LOWER → `.str.upper()/.str.lower()`
- - SUBSTR → `.str[start:end]`, TO_INTEGER → `pd.to_numeric()`, TO_DATE → `pd.to_datetime()`
- - IS NULL/IS NOT NULL → `.isna()`/`.notna()`
-
  ### Parameter File Support (v1.5+)
 
  Standard Informatica `.param` file parsing:
  - `[Global]` and `[folder.WF:workflow.ST:session]` section support
  - `get_param(config, var_name)` resolution chain: config → env vars → defaults
  - CLI `--param-file` flag for specifying parameter files
+ - `$$PARAM` variables in SQL automatically substituted with `.replace()` calls
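The resolution chain (config, then environment variables, then defaults) and the `$$PARAM` substitution described above can be sketched as follows. `resolve_param`, `substitute_params`, and the `"params"` key inside the config dict are hypothetical names chosen for this example; the real `get_param` may store and look up values differently.

```python
import os
import re


def resolve_param(config: dict, var_name: str, default=None):
    """Look up a parameter in config first, then the environment, then a default."""
    key = var_name.lstrip("$")
    params = config.get("params", {})  # assumed config layout
    if key in params:
        return params[key]
    if key in os.environ:
        return os.environ[key]
    return default


def substitute_params(sql: str, config: dict) -> str:
    """Replace each $$NAME in a SQL string; unresolved names are left untouched."""
    def _swap(match: re.Match) -> str:
        return str(resolve_param(config, match.group(1), default=match.group(0)))
    return re.sub(r"\$\$(\w+)", _swap, sql)


# Hypothetical usage.
config = {"params": {"LOAD_DATE": "2024-01-01"}}
query = substitute_params("SELECT * FROM ORDERS WHERE ORDER_DT >= '$$LOAD_DATE'", config)
```

Leaving unresolved `$$NAME` tokens in place, rather than replacing them with an empty string, is a design choice of this sketch that makes missing parameters easy to spot in the final SQL.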
 
  ### Session Connection Overrides (v1.4+)
 
@@ -256,18 +328,49 @@ Expands Mapplet instances into prefixed transforms, rewires connectors, and elim
 
  Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
 
- ### Expression Converter (80+ Functions)
-
- Converts Informatica expressions to Python equivalents:
-
- - **String:** SUBSTR, LTRIM, RTRIM, UPPER, LOWER, LPAD, RPAD, INSTR, LENGTH, CONCAT, REPLACE, REG_EXTRACT, REG_REPLACE, REVERSE, INITCAP, CHR, ASCII
- - **Date:** ADD_TO_DATE, DATE_DIFF, GET_DATE_PART, SYSDATE, SYSTIMESTAMP, TO_DATE, TO_CHAR, TRUNC (date)
- - **Numeric:** ROUND, TRUNC, MOD, ABS, CEIL, FLOOR, POWER, SQRT, LOG, EXP, SIGN
- - **Conversion:** TO_INTEGER, TO_BIGINT, TO_FLOAT, TO_DECIMAL, TO_CHAR, TO_DATE
- - **Null handling:** IIF, DECODE, NVL, NVL2, ISNULL, IS_SPACES, IS_NUMBER
- - **Aggregate:** SUM, AVG, COUNT, MIN, MAX, FIRST, LAST, MEDIAN, STDDEV, VARIANCE
- - **Lookup:** :LKP expressions with dynamic lookup references
- - **Variable:** SETVARIABLE / mapping variable assignment
+ ## Helper Functions Library
+
+ The generated `helper_functions.py` provides a complete runtime library:
+
+ ### Configuration & Parameters
+ | Function | Description |
+ |----------|-------------|
+ | `load_config(path, param_file)` | Load YAML config with optional `.param` file merge |
+ | `parse_param_file(path)` | Parse Informatica `.param` files (`[Global]`, `[folder.WF:...]` sections) |
+ | `get_param(config, var_name, default)` | Resolve parameter: config → env vars → default |
+ | `get_variable(var_name, config)` | Get workflow/mapping variable from params, env vars, or param store |
+ | `set_variable(var_name, value)` | Set workflow/mapping variable in param store and env |
+
+ ### Database Operations
+ | Function | Description |
+ |----------|-------------|
+ | `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+ | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
+ | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
+ | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+ | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
+ | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+
+ ### File Operations
+ | Function | Description |
+ |----------|-------------|
+ | `read_file(path, file_config)` | Read CSV/DAT/TXT/XML/XLSX/JSON/Parquet with auto-detection |
+ | `write_file(df, path, file_config)` | Write DataFrame to file with format auto-detection |
+
+ ### State Persistence
+ | Function | Description |
+ |----------|-------------|
+ | `load_persistent_state(file)` | Load JSON state file for persistent variables |
+ | `save_persistent_state(file)` | Save persistent variables to JSON state file |
+ | `get_persistent_variable(scope, var, default)` | Get scoped persistent variable |
+ | `set_persistent_variable(scope, var, value)` | Set scoped persistent variable |
+
+ ### Logging & Monitoring
+ | Function | Description |
+ |----------|-------------|
+ | `log_mapping_start(name)` | Log mapping start with timestamp |
+ | `log_mapping_end(name, start_time, row_count)` | Log mapping completion with elapsed time |
+ | `validate_row_count(df, name, min_rows)` | Validate minimum row count threshold |
 
  ## Requirements
 
@@ -277,7 +380,32 @@ Converts Informatica expressions to Python equivalents:
 
  ## Changelog
 
- ### v1.9.x (Phase 8)
+ ### v1.9.3 (Current)
+ - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
+ - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
+ - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
+ - **2-arg IIF**: `IIF(cond, val)` without else clause defaults to `None`
+ - **REVERSE vectorization**: `REVERSE(field)` → `field.str[::-1]`
+ - **IN() vectorization**: `IN(field, val1, val2, ...)` → `field.isin([...])`
+ - **IS_NUMBER vectorization**: `IS_NUMBER(field)` → `pd.to_numeric(field, errors="coerce").notna()`
+ - **SYSDATE/SYSTIMESTAMP**: Bare `SYSDATE`/`SYSTIMESTAMP` → `pd.Timestamp.now()` in vectorized mode
+ - **TRUNC vectorization**: Numeric `TRUNC(field)` → `np.trunc()`; date `TRUNC(field, 'DD')` → `.dt.floor()`
+ - **ADD_TO_DATE vectorization**: `ADD_TO_DATE(date, part, amount)` → `pd.to_timedelta()` with YY/MM/DD/HH/MI/SS units
+ - **DATE_DIFF vectorization**: `DATE_DIFF(date1, date2, part)` → arithmetic on timedelta components
+ - **Unconnected lookup support**: `:PORT.FUNC_NAME(args)` → `lookup_func("FUNC_NAME", args)`
+ - **Inline comment stripping**: `--` comments removed from expressions (respects string literals)
+ - **`$$PARAM` SQL substitution**: Source Qualifier, Lookup, and SQL Transform SQL strings auto-substitute `$$VAR` with `get_param(config, 'VAR')` calls
+ - **Sorter direction**: Reads `SORTDIRECTION` from field attributes, generates per-field `ascending=[True, False, ...]`
+ - **Pass-through optimization**: Identity expressions skip `.copy()` and use direct reference
+ - **Lookup output deduplication**: `_gen_lookup_transform` tracks emitted columns in a `seen_output_cols` set so each output column is checked only once
+ - **Mapping-level error handling**: Generated function body wrapped in `try:/except` with `logger.error()`
+ - **Update strategy vectorized**: Tries vectorized expression first, falls back to row-level `apply()`
+ - **Generated code formatting**: Consistent `# ---` section headers for Source Qualifiers, Transforms, and Target Writes; metadata comments (database type, field lists); column mapping and write operation comments; clean blank line handling
+ - **Source/target detection**: Case-insensitive instance type matching
+ - **Session→mapping inference**: Longest-suffix-match strategy for ambiguous mapping names
+ - **646 tests** across unit, integration, expression, and formatting test suites
+
+ ### v1.9.2 (Phase 8)
  - Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
  - Workflow imports automatically match the named mapping files
  - **Expression converter rewrite**: Recursive parenthesis-aware parser replacing simple regex; fixes nested IIF/INSTR/LTRIM/RTRIM/REPLACECHR/REPLACESTR/SUBSTR/TO_CHAR/CHR/MAKE_DATE_TIME
@@ -340,7 +468,7 @@ Converts Informatica expressions to Python equivalents:
  cd informatica_python
  pip install -e ".[dev]"
 
- # Run tests (136 tests)
+ # Run tests (646 tests)
  pytest tests/ -v
  ```
 
@@ -7,7 +7,7 @@ Licensed under the MIT License.
 
  from informatica_python.converter import InformaticaConverter
 
- __version__ = "1.9.2"
+ __version__ = "1.9.3"
  __author__ = "Nick"
  __license__ = "MIT"
  __all__ = ["InformaticaConverter"]