informatica-python 1.9.2__tar.gz → 1.9.3__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {informatica_python-1.9.2 → informatica_python-1.9.3}/PKG-INFO +175 -47
- {informatica_python-1.9.2 → informatica_python-1.9.3}/README.md +174 -46
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/__init__.py +1 -1
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/mapping_gen.py +140 -58
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/workflow_gen.py +21 -4
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/utils/expression_converter.py +320 -4
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/PKG-INFO +175 -47
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/SOURCES.txt +1 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/pyproject.toml +1 -1
- {informatica_python-1.9.2 → informatica_python-1.9.3}/tests/test_converter.py +171 -0
- informatica_python-1.9.3/tests/test_expressions.py +1195 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/tests/test_integration.py +635 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/LICENSE +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/cli.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/converter.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/__init__.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/config_gen.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/error_log_gen.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/helper_gen.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/generators/sql_gen.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/models.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/parser.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/utils/__init__.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/utils/datatype_map.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/utils/lib_adapters.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python/utils/sql_dialect.py +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/dependency_links.txt +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/entry_points.txt +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/requires.txt +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/informatica_python.egg-info/top_level.txt +0 -0
- {informatica_python-1.9.2 → informatica_python-1.9.3}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: informatica-python
-Version: 1.9.
+Version: 1.9.3
 Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
 Author: Nick
 License: MIT
@@ -79,25 +79,26 @@ from informatica_python import InformaticaConverter
 
 converter = InformaticaConverter()
 
-# Parse and generate files
-converter.
+# Parse and generate files to a directory
+converter.convert("workflow_export.xml", output_dir="output_dir")
 
-# Parse and generate zip
-converter.
+# Parse and generate zip archive
+converter.convert("workflow_export.xml", output_zip="output.zip")
 
-# Parse to structured dict
+# Parse to structured dict (no code generation)
 result = converter.parse_file("workflow_export.xml")
 
 # Use a different data library
-converter
+converter = InformaticaConverter(data_lib="polars")
+converter.convert("workflow_export.xml", output_dir="output_dir")
 ```
 
 ## Generated Output Files
 
 | File | Description |
 |------|-------------|
-| `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents
-| `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with row-count logging,
+| `helper_functions.py` | Database/file I/O helpers, 90+ Informatica expression equivalents, window/analytic functions, stored procedure execution, state persistence |
+| `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with vectorized expressions, row-count logging, type casting, inline documentation |
 | `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
 | `config.yml` | Connection configs, source/target metadata, runtime parameters |
 | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms (with ANSI-translated variants) |
@@ -119,23 +120,22 @@ Select via `--data-lib` CLI flag or `data_lib` parameter:
 
 The code generator produces real, runnable Python for these transformation types:
 
-- **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides
-- **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style)
+- **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides, `$$PARAM` substitution in SQL
+- **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
 - **Filter** — Row filtering with vectorized converted conditions
 - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
-- **Lookup** — `pd.merge()` lookups with connection-aware DB
+- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
 - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
-- **Sorter** — `sort_values()` with multi-key ascending/descending
+- **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
 - **Router** — Multi-group conditional routing with named groups
 - **Union** — `pd.concat()` across multiple input groups
-- **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys
+- **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys; vectorized expression parsing with row-level fallback
 - **Sequence Generator** — Auto-incrementing ID columns
 - **Normalizer** — `pd.melt()` with auto-detected id/value vars
 - **Rank** — `groupby().rank()` with Top-N filtering
 - **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
-- **Transaction Control** — Commit/rollback logic
 - **Custom / Java** — Placeholder stubs with TODO markers
-- **SQL Transform** — Direct SQL execution pass-through
+- **SQL Transform** — Direct SQL execution pass-through with `$$PARAM` substitution
 
 ## Supported XML Tags (72 Tags)
 
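For the Sorter, reading SORTDIRECTION per field means passing `DataFrame.sort_values` one `ascending` flag per sort key. Below is a minimal sketch. It assumes SORTDIRECTION arrives as the strings `ASCENDING` or `DESCENDING`. `sort_with_directions` is a hypothetical helper written for illustration; it is not the generator's actual output.

```python
import pandas as pd


def sort_with_directions(df: pd.DataFrame, sort_keys: list[tuple[str, str]]) -> pd.DataFrame:
    """Sort by several keys, each with its own direction.

    sort_keys is a hypothetical list of (column, SORTDIRECTION) pairs,
    e.g. [("REGION", "ASCENDING"), ("SALES", "DESCENDING")].
    """
    columns = [column for column, _ in sort_keys]
    # One flag per key: True = ascending, False = descending
    ascending = [direction.strip().upper() != "DESCENDING" for _, direction in sort_keys]
    return df.sort_values(by=columns, ascending=ascending)


df = pd.DataFrame({"REGION": ["W", "E", "W", "E"], "SALES": [10, 30, 20, 5]})
sorted_df = sort_with_directions(df, [("REGION", "ASCENDING"), ("SALES", "DESCENDING")])
# REGION comes out as E, E, W, W with SALES 30, 5, 20, 10
print(sorted_df)
```

The `ascending` list must have the same length as `by`. Building both lists from one sequence of pairs keeps them aligned.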
@@ -153,6 +153,86 @@ The code generator produces real, runnable Python for these transformation types
 
 ## Key Features
 
+### Generated Code Quality (v1.9.3+)
+
+Generated code follows clean formatting and commenting standards:
+- Consistent section headers (`# ---`) for Source Qualifiers, Transformations, and Target Writes
+- Each section includes metadata: database type, field lists, descriptions
+- Column mapping comments (`# Column mapping: source -> target`) and write operation type comments (`# Write to database table` / `# Write to file`)
+- Expression inline comments showing original Informatica expression (e.g., `# FULL_NAME = UPPER(FIRST_NAME) || ' ' || UPPER(LAST_NAME)`)
+- Clean indentation: no blank line after `try:`, no consecutive blank lines inside function body
+- Mapping-level `try:/except` wrapper with `logger.error()` for runtime visibility
+
+### Smart Target Write Detection (v1.9.3+)
+
+Targets are automatically classified as database or file writes:
+- Targets with `database_type` set (Oracle, SQL Server, etc.) generate `write_to_db()` calls
+- Targets with flatfile metadata or file extensions (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) generate `write_file()` calls
+- Bare targets (no metadata) default to `write_to_db()` since Informatica targets are typically database tables
+- Schema-qualified names (e.g., `dbo.MY_TABLE`) correctly route to database writes
+- Session file path overrides take priority when present
+
+### Vectorized Expression Engine (v1.9.2+)
+
+Column-level pandas operations instead of row-level iteration. The expression converter uses a recursive parenthesis-aware parser that handles:
+
+**Conditional / Null:**
+- `IIF(cond, val, else_val)` → `np.where()` — supports 2-arg form (missing else defaults to `None`)
+- `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains
+- `DECODE(field, val1, res1, ..., default)` → value-matching `np.where()`
+- `NVL(val, default)` → `.fillna()`
+- `IS_SPACES(field)` → `field.str.strip().eq("")`
+- `IS_NUMBER(field)` → `pd.to_numeric(field, errors="coerce").notna()`
+- `IN(field, val1, val2, ...)` → `field.isin([...])`
+
+**String:**
+- `UPPER/LOWER` → `.str.upper()/.str.lower()`
+- `LTRIM/RTRIM/TRIM` → `.str.lstrip()/.str.rstrip()/.str.strip()` with custom char support
+- `SUBSTR(val, start, len)` → `.str[start:end]`
+- `INSTR(val, search)` → `.str.find()`
+- `LPAD/RPAD` → `.str.pad()`
+- `REVERSE(val)` → `.str[::-1]`
+- `INITCAP(val)` → `.str.title()`
+- `REPLACECHR/REPLACESTR` → `.str.replace()`
+- `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
+- `CHR(code)` → `chr(int(code))`
+- `||` concatenation → `+` with `.astype(str)` on non-literals
+
+**Date/Time:**
+- `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
+- `TO_CHAR(val, fmt)` → `.dt.strftime()`
+- `ADD_TO_DATE(date, part, amount)` → `date + pd.to_timedelta()` with full unit mapping (YY/MM/DD/HH/MI/SS)
+- `DATE_DIFF(date1, date2, part)` → `(date1 - date2).dt.days` / `.dt.total_seconds() / 3600` etc.
+- `SYSDATE/SYSTIMESTAMP` → `pd.Timestamp.now()`
+- `TRUNC(date, 'DD')` → date truncation via `.dt.floor()/.dt.to_period()`
+- `MAKE_DATE_TIME(y, m, d, h, mi, s)` → `pd.Timestamp()`
+
+**Numeric:**
+- `TO_INTEGER/TO_BIGINT/TO_FLOAT/TO_DECIMAL` → `pd.to_numeric()`
+- `TRUNC(val)` → `np.trunc()` for numeric truncation
+- `ROUND/ABS/CEIL/FLOOR/POWER/SQRT/MOD/LOG/SIGN` → `np.*` equivalents
+
+**Special:**
+- `:LKP.TABLE(args)` — Connected lookup references → `df_lkp_table` merge
+- `:PORT.FUNC(args)` — Unconnected lookups → `lookup_func("FUNC", args)` calls
+- Inline `--` comment stripping (respects string literals)
+- String-literal-aware field substitution
+
+### Expression Converter (90+ Row-Level Functions)
+
+All Informatica expression functions are available as row-level Python equivalents in `helper_functions.py`:
+
+- **String:** `substr`, `ltrim`, `rtrim`, `upper`, `lower`, `lpad`, `rpad`, `instr`, `length`, `concat`, `replacechr`, `replacestr`, `reg_extract`, `reg_replace`, `reg_match`, `reverse_str`, `initcap`, `chr_func`, `ascii_func`, `left_str`, `right_str`, `trim_func`, `indexof`, `metaphone_func`, `soundex_func`, `compress_func`, `decompress_func`
+- **Date:** `add_to_date`, `date_diff`, `date_compare`, `get_date_part`, `set_date_part`, `last_day`, `make_date_time`, `to_date`, `to_char`, `to_timestamp_func`, `current_timestamp`, `session_start_time`
+- **Numeric:** `round_val`, `trunc`, `mod_val`, `abs_val`, `ceil_val`, `floor_val`, `power_val`, `sqrt_val`, `log_val`, `ln_val`, `exp_val`, `sign_val`, `rand_val`, `greatest_val`, `least_val`
+- **Conversion:** `to_integer`, `to_bigint`, `to_float`, `to_decimal`, `cast_func`
+- **Null/Conditional:** `iif_expr`, `decode_expr`, `nvl`, `nvl2`, `isnull`, `is_spaces`, `is_number`, `is_date`, `in_expr`, `choose_expr`
+- **Aggregate:** `sum_val`, `avg_val`, `count_val`, `min_val`, `max_val`, `first_val`, `last_val`, `median_val`, `stddev_val`, `variance_val`, `percentile_val`
+- **Window/Analytic:** `moving_avg`, `moving_avg_df`, `moving_sum`, `moving_sum_df`, `cume`, `cume_df`, `percentile_df`
+- **Lookup:** `lookup_func` — Placeholder for runtime lookup resolution
+- **Variable:** `get_variable`, `set_variable`, `set_count_variable`
+- **Control:** `raise_error`, `abort_func`
+
 ### Row-Count Logging (v1.8+)
 
 Generated code automatically logs row counts at every step of the data pipeline:
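The Conditional / Null translations listed above can be illustrated by applying them by hand to a toy DataFrame. The expressions and column names are invented for illustration. This is a sketch of the listed mappings (IIF and DECODE to `np.where`, NVL to `fillna`, and so on), not a copy of what the package generates, which may be formatted differently.

```python
import numpy as np
import pandas as pd

# Toy input; column names are illustrative only
df = pd.DataFrame({
    "STATUS": ["A", "I", "X", "A"],
    "AMT": [150.0, 20.0, None, 80.0],
    "CODE": ["  ", "12", "ab", "7.5"],
})

# IIF(AMT > 100, 'HIGH', 'LOW'): a null AMT compares as False, so it gets 'LOW'
df["AMT_BAND"] = np.where(df["AMT"] > 100, "HIGH", "LOW")

# DECODE(TRUE, AMT > 100, 'HIGH', AMT > 50, 'MID', 'LOW'):
# nested np.where, so the first true condition wins
df["AMT_TIER"] = np.where(df["AMT"] > 100, "HIGH",
                          np.where(df["AMT"] > 50, "MID", "LOW"))

# DECODE(STATUS, 'A', 'Active', 'I', 'Inactive', 'Unknown'): value matching
df["STATUS_DESC"] = np.where(df["STATUS"] == "A", "Active",
                             np.where(df["STATUS"] == "I", "Inactive", "Unknown"))

# NVL(AMT, 0)
df["AMT_NVL"] = df["AMT"].fillna(0)

# IS_SPACES(CODE): True when the value contains only whitespace
df["CODE_IS_SPACES"] = df["CODE"].str.strip().eq("")

# IS_NUMBER(CODE): values that fail numeric parsing become NaN, then notna()
df["CODE_IS_NUMBER"] = pd.to_numeric(df["CODE"], errors="coerce").notna()

# IN(STATUS, 'A', 'I')
df["STATUS_IN"] = df["STATUS"].isin(["A", "I"])

print(df)
```

Nesting `np.where` from the outside in keeps DECODE's left-to-right semantics: a row that satisfies an earlier condition never reaches a later one.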
@@ -165,8 +245,6 @@ AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
 Target TGT_SUMMARY: 150 rows written
 ```
 
-All row-count operations are backend-safe (wrapped in try/except), so Dask and other lazy-evaluation backends won't fail.
-
 ### Generated Code Documentation (v1.8+)
 
 Every generated mapping function includes a rich docstring describing:
@@ -179,14 +257,6 @@ Each transformation block is annotated with:
 - Transform type and description (from Informatica XML)
 - Input and output field lists (truncated at 10 for readability)
 
-### Window / Analytic Functions (v1.7+)
-
-DataFrame-level analytic functions for aggregation transforms:
-- `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
-- `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
-- `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
-- `percentile_df(df, col, pct)` — quantile via `.quantile()`
-
 ### Update Strategy with Target Operations (v1.7+)
 
 Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
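The Update Strategy routing can be sketched in two pieces: splitting rows by their flag, and building a keyed, parameterized UPDATE. The sketch assumes each row carries its flag in an `_update_strategy` column, as the `write_with_update_strategy` entry in the helper table of this diff describes. It also assumes the flag holds Informatica's DD_* values (DD_INSERT=0, DD_UPDATE=1, DD_DELETE=2, DD_REJECT=3). `split_by_update_strategy` and `build_update_sql` are hypothetical names. The placeholder is a parameter because the bind syntax differs between database drivers.

```python
import pandas as pd

# Informatica Update Strategy flag values
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def split_by_update_strategy(df: pd.DataFrame, flag_col: str = "_update_strategy") -> dict:
    """Partition rows by their update-strategy flag, dropping the flag column."""
    codes = {"insert": DD_INSERT, "update": DD_UPDATE,
             "delete": DD_DELETE, "reject": DD_REJECT}
    return {name: df[df[flag_col] == code].drop(columns=[flag_col])
            for name, code in codes.items()}


def build_update_sql(table: str, columns: list[str], key_columns: list[str],
                     placeholder: str = "?") -> str:
    """Parameterized UPDATE keyed on primary-key columns.

    Bind the non-key values first (SET clause), then the key values (WHERE).
    """
    set_cols = [c for c in columns if c not in key_columns]
    set_clause = ", ".join(f"{c} = {placeholder}" for c in set_cols)
    where_clause = " AND ".join(f"{c} = {placeholder}" for c in key_columns)
    return f"UPDATE {table} SET {set_clause} WHERE {where_clause}"


rows = pd.DataFrame({"ID": [1, 2, 3, 4], "NAME": ["a", "b", "c", "d"],
                     "_update_strategy": [DD_INSERT, DD_UPDATE, DD_DELETE, DD_UPDATE]})
parts = split_by_update_strategy(rows)
sql = build_update_sql("CUSTOMER", ["ID", "NAME"], key_columns=["ID"])
# sql == "UPDATE CUSTOMER SET NAME = ? WHERE ID = ?"
```

Only values go through placeholders. Table and column names are interpolated into the SQL string, so they must come from trusted metadata such as the parsed target definition.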
@@ -196,6 +266,14 @@ Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
 - Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
 - Primary key columns auto-detected from target field definitions
 
+### Window / Analytic Functions (v1.7+)
+
+DataFrame-level analytic functions for aggregation transforms:
+- `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
+- `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
+- `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
+- `percentile_df(df, col, pct)` — quantile via `.quantile()`
+
 ### Stored Procedure Execution (v1.7+)
 
 Full stored procedure code generation (not just stubs):
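The four DataFrame-level analytic functions listed above wrap standard pandas window operations. The sketch below uses the same pandas calls under different, hypothetical names. It is not the package's `helper_functions.py`, and it assumes `pct` is a fraction between 0 and 1.

```python
import pandas as pd


def rolling_mean(df: pd.DataFrame, col: str, window: int) -> pd.Series:
    # NaN until `window` rows are available (pandas' default min_periods)
    return df[col].rolling(window=window).mean()


def rolling_sum(df: pd.DataFrame, col: str, window: int) -> pd.Series:
    return df[col].rolling(window=window).sum()


def cumulative_sum(df: pd.DataFrame, col: str) -> pd.Series:
    # expanding().sum() is a running total over all rows seen so far
    return df[col].expanding().sum()


def column_quantile(df: pd.DataFrame, col: str, pct: float) -> float:
    # pct is a fraction in [0, 1], e.g. 0.5 for the median (linear interpolation)
    return df[col].quantile(pct)


df = pd.DataFrame({"AMT": [1, 2, 3, 4, 5]})
print(rolling_mean(df, "AMT", 3).tolist())
print(cumulative_sum(df, "AMT").tolist())
```

Because of the default `min_periods`, the first `window - 1` rows of a rolling result are NaN. Pass `min_periods=1` to `.rolling()` if partial windows should produce values instead.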
@@ -241,19 +319,13 @@ Optional `--validate-casts` flag generates null-count checks before/after type c
 - Logs warnings when coercion introduces new nulls
 - Helps identify data quality issues during test runs
 
-### Vectorized Expression Generation (v1.5+)
-
-Column-level pandas operations instead of row-level iteration:
-- IIF → `np.where()`, NVL → `.fillna()`, UPPER/LOWER → `.str.upper()/.str.lower()`
-- SUBSTR → `.str[start:end]`, TO_INTEGER → `pd.to_numeric()`, TO_DATE → `pd.to_datetime()`
-- IS NULL/IS NOT NULL → `.isna()`/`.notna()`
-
 ### Parameter File Support (v1.5+)
 
 Standard Informatica `.param` file parsing:
 - `[Global]` and `[folder.WF:workflow.ST:session]` section support
 - `get_param(config, var_name)` resolution chain: config → env vars → defaults
 - CLI `--param-file` flag for specifying parameter files
+- `$$PARAM` variables in SQL automatically substituted with `.replace()` calls
 
 ### Session Connection Overrides (v1.4+)
 
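The `get_param` resolution chain (config, then environment variables, then default) and `$$PARAM` substitution in SQL fit together as in the sketch below. It makes several assumptions. `resolve_param` and `substitute_params` are hypothetical names. The config is assumed to keep parameters under a `params` key, with names stored without the `$$` prefix. The sketch uses a single regular-expression pass instead of the chained `.replace()` calls the README describes, so a parameter like `$$RUN` cannot clobber the prefix of `$$RUN_DATE`.

```python
import os
import re
from typing import Mapping, Optional

# Matches Informatica mapping parameters such as $$RUN_DATE
PARAM_PATTERN = re.compile(r"\$\$(\w+)")


def resolve_param(config: dict, name: str, default: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolution order: config -> environment variables -> default."""
    params = config.get("params", {})  # assumed config layout
    if name in params:
        return str(params[name])
    if name in env:
        return env[name]
    return default


def substitute_params(sql: str, config: dict,
                      env: Mapping[str, str] = os.environ) -> str:
    """Replace every $$NAME token in sql with its resolved value."""
    def replace(match: re.Match) -> str:
        value = resolve_param(config, match.group(1), env=env)
        # Leave unresolved tokens in place so the gap stays visible
        return match.group(0) if value is None else value
    return PARAM_PATTERN.sub(replace, sql)


config = {"params": {"RUN_DATE": "2024-01-31"}}
sql = "SELECT * FROM $$SRC_SCHEMA.ORDERS WHERE ORDER_DT <= '$$RUN_DATE'"
print(substitute_params(sql, config, env={"SRC_SCHEMA": "STG"}))
```

This is textual substitution into SQL, so parameter values must be trusted. Where a driver allows it, bind parameters are the safer choice for data values.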
@@ -283,18 +355,49 @@ Expands Mapplet instances into prefixed transforms, rewires connectors, and elim
 
 Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
 
-
-
-
-
-
-
-
-
-
-
-
-
+## Helper Functions Library
+
+The generated `helper_functions.py` provides a complete runtime library:
+
+### Configuration & Parameters
+| Function | Description |
+|----------|-------------|
+| `load_config(path, param_file)` | Load YAML config with optional `.param` file merge |
+| `parse_param_file(path)` | Parse Informatica `.param` files (`[Global]`, `[folder.WF:...]` sections) |
+| `get_param(config, var_name, default)` | Resolve parameter: config → env vars → default |
+| `get_variable(var_name, config)` | Get workflow/mapping variable from params, env vars, or param store |
+| `set_variable(var_name, value)` | Set workflow/mapping variable in param store and env |
+
+### Database Operations
+| Function | Description |
+|----------|-------------|
+| `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+| `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
+| `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
+| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+| `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
+| `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+
+### File Operations
+| Function | Description |
+|----------|-------------|
+| `read_file(path, file_config)` | Read CSV/DAT/TXT/XML/XLSX/JSON/Parquet with auto-detection |
+| `write_file(df, path, file_config)` | Write DataFrame to file with format auto-detection |
+
+### State Persistence
+| Function | Description |
+|----------|-------------|
+| `load_persistent_state(file)` | Load JSON state file for persistent variables |
+| `save_persistent_state(file)` | Save persistent variables to JSON state file |
+| `get_persistent_variable(scope, var, default)` | Get scoped persistent variable |
+| `set_persistent_variable(scope, var, value)` | Set scoped persistent variable |
+
+### Logging & Monitoring
+| Function | Description |
+|----------|-------------|
+| `log_mapping_start(name)` | Log mapping start with timestamp |
+| `log_mapping_end(name, start_time, row_count)` | Log mapping completion with elapsed time |
+| `validate_row_count(df, name, min_rows)` | Validate minimum row count threshold |
 
 ## Requirements
 
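The section handling attributed to `parse_param_file` (`[Global]` plus `[folder.WF:workflow.ST:session]` sections) can be sketched as a small line-oriented parser. This is a sketch under stated assumptions. It parses text rather than a path so it stays self-contained. The function names are hypothetical, and treating `#` lines as comments is an assumption. It also assumes session-specific values override `[Global]` ones.

```python
def parse_param_text(text: str) -> dict[str, dict[str, str]]:
    """Parse parameter-file text into {section_name: {param_name: value}}."""
    sections: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # '#' comments: an assumption
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
        elif "=" in line and current is not None:
            # Split on the first '=' only, so values may contain '='
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections


def params_for_session(sections: dict, folder: str, workflow: str, session: str) -> dict:
    """[Global] values, overridden by the session-specific section (assumed precedence)."""
    merged = dict(sections.get("Global", {}))
    merged.update(sections.get(f"{folder}.WF:{workflow}.ST:{session}", {}))
    return merged


PARAM_TEXT = """\
[Global]
$$ENV=dev
$$BATCH_SIZE=1000
[SALES.WF:wf_daily_load.ST:s_m_load_orders]
$$BATCH_SIZE=5000
$DBConnection_SRC=ORA_SRC
"""
params = params_for_session(parse_param_text(PARAM_TEXT),
                            "SALES", "wf_daily_load", "s_m_load_orders")
print(params)
```

Keeping the `$$` and `$` prefixes in the keys preserves the distinction between mapping parameters and session-level connection parameters. The prefix can be stripped later if needed.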
@@ -304,7 +407,32 @@ Converts Informatica expressions to Python equivalents:
 
 ## Changelog
 
-### v1.9.
+### v1.9.3 (Current)
+- **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
+- **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
+- **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
+- **2-arg IIF**: `IIF(cond, val)` without else clause defaults to `None`
+- **REVERSE vectorization**: `REVERSE(field)` → `field.str[::-1]`
+- **IN() vectorization**: `IN(field, val1, val2, ...)` → `field.isin([...])`
+- **IS_NUMBER vectorization**: `IS_NUMBER(field)` → `pd.to_numeric(field, errors="coerce").notna()`
+- **SYSDATE/SYSTIMESTAMP**: Bare `SYSDATE`/`SYSTIMESTAMP` → `pd.Timestamp.now()` in vectorized mode
+- **TRUNC vectorization**: Numeric `TRUNC(field)` → `np.trunc()`; date `TRUNC(field, 'DD')` → `.dt.floor()`
+- **ADD_TO_DATE vectorization**: `ADD_TO_DATE(date, part, amount)` → `pd.to_timedelta()` with YY/MM/DD/HH/MI/SS units
+- **DATE_DIFF vectorization**: `DATE_DIFF(date1, date2, part)` → arithmetic on timedelta components
+- **Unconnected lookup support**: `:PORT.FUNC_NAME(args)` → `lookup_func("FUNC_NAME", args)`
+- **Inline comment stripping**: `--` comments removed from expressions (respects string literals)
+- **`$$PARAM` SQL substitution**: Source Qualifier, Lookup, and SQL Transform SQL strings auto-substitute `$$VAR` with `get_param(config, 'VAR')` calls
+- **Sorter direction**: Reads `SORTDIRECTION` from field attributes, generates per-field `ascending=[True, False, ...]`
+- **Pass-through optimization**: Identity expressions skip `.copy()` and use direct reference
+- **Duplicate lookup deduplication**: `_gen_lookup_transform` uses `seen_output_cols` set to avoid duplicate column checks
+- **Mapping-level error handling**: Generated function body wrapped in `try:/except` with `logger.error()`
+- **Update strategy vectorized**: Tries vectorized expression first, falls back to row-level `apply()`
+- **Generated code formatting**: Consistent `# ---` section headers for Source Qualifiers, Transforms, and Target Writes; metadata comments (database type, field lists); column mapping and write operation comments; clean blank line handling
+- **Source/target detection**: Case-insensitive instance type matching
+- **Session→mapping inference**: Longest-suffix-match strategy for ambiguous mapping names
+- **646 tests** across unit, integration, expression, and formatting test suites
+
+### v1.9.2 (Phase 8)
 - Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
 - Workflow imports automatically match the named mapping files
 - **Expression converter rewrite**: Recursive parenthesis-aware parser replacing simple regex; fixes nested IIF/INSTR/LTRIM/RTRIM/REPLACECHR/REPLACESTR/SUBSTR/TO_CHAR/CHR/MAKE_DATE_TIME
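The smart target write detection rules in the v1.9.3 changelog amount to a short decision function. In the sketch below, `TargetInfo` is a hypothetical container, not the package's model class, and the order in which the rules are checked is a guess. The sketch shows why an extension allowlist matters: a naive "has a dot, so it's a file" test would misroute schema-qualified names such as `dbo.MY_TABLE`.

```python
from dataclasses import dataclass
from typing import Optional

# File extensions from the v1.9.3 allowlist
FILE_EXTENSIONS = (".csv", ".dat", ".txt", ".xml", ".json", ".parquet",
                   ".xlsx", ".xls", ".tsv", ".avro")


@dataclass
class TargetInfo:
    """Hypothetical stand-in for parsed target metadata."""
    name: str
    database_type: Optional[str] = None
    is_flatfile: bool = False
    session_file_path: Optional[str] = None


def classify_target(target: TargetInfo) -> str:
    """Return 'file' or 'db' for a target (the rule order is an assumption)."""
    if target.session_file_path or target.is_flatfile:
        return "file"
    if target.database_type:
        return "db"
    # Allowlist check: 'dbo.MY_TABLE' has a dot but no known file extension
    if target.name.lower().endswith(FILE_EXTENSIONS):
        return "file"
    return "db"  # bare targets default to database tables


for t in (TargetInfo("dbo.MY_TABLE"), TargetInfo("daily_orders.CSV"), TargetInfo("TGT_SUMMARY")):
    print(t.name, classify_target(t))
```

`str.endswith` accepts a tuple, so the whole allowlist is checked in one call. Lower-casing the name first makes the match case-insensitive.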
@@ -367,7 +495,7 @@ Converts Informatica expressions to Python equivalents:
 cd informatica_python
 pip install -e ".[dev]"
 
-# Run tests (
+# Run tests (646 tests)
 pytest tests/ -v
 ```
 
@@ -52,25 +52,26 @@ from informatica_python import InformaticaConverter
 
 converter = InformaticaConverter()
 
-# Parse and generate files
-converter.
+# Parse and generate files to a directory
+converter.convert("workflow_export.xml", output_dir="output_dir")
 
-# Parse and generate zip
-converter.
+# Parse and generate zip archive
+converter.convert("workflow_export.xml", output_zip="output.zip")
 
-# Parse to structured dict
+# Parse to structured dict (no code generation)
 result = converter.parse_file("workflow_export.xml")
 
 # Use a different data library
-converter
+converter = InformaticaConverter(data_lib="polars")
+converter.convert("workflow_export.xml", output_dir="output_dir")
 ```
 
 ## Generated Output Files
 
 | File | Description |
 |------|-------------|
-| `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents
-| `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with row-count logging,
+| `helper_functions.py` | Database/file I/O helpers, 90+ Informatica expression equivalents, window/analytic functions, stored procedure execution, state persistence |
+| `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with vectorized expressions, row-count logging, type casting, inline documentation |
 | `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
 | `config.yml` | Connection configs, source/target metadata, runtime parameters |
 | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms (with ANSI-translated variants) |
@@ -92,23 +93,22 @@ Select via `--data-lib` CLI flag or `data_lib` parameter:
 
 The code generator produces real, runnable Python for these transformation types:
 
-- **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides
-- **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style)
+- **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides, `$$PARAM` substitution in SQL
+- **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
 - **Filter** — Row filtering with vectorized converted conditions
 - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
-- **Lookup** — `pd.merge()` lookups with connection-aware DB
+- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
 - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
-- **Sorter** — `sort_values()` with multi-key ascending/descending
+- **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
 - **Router** — Multi-group conditional routing with named groups
 - **Union** — `pd.concat()` across multiple input groups
-- **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys
+- **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys; vectorized expression parsing with row-level fallback
 - **Sequence Generator** — Auto-incrementing ID columns
 - **Normalizer** — `pd.melt()` with auto-detected id/value vars
 - **Rank** — `groupby().rank()` with Top-N filtering
 - **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
-- **Transaction Control** — Commit/rollback logic
 - **Custom / Java** — Placeholder stubs with TODO markers
-- **SQL Transform** — Direct SQL execution pass-through
+- **SQL Transform** — Direct SQL execution pass-through with `$$PARAM` substitution
 
 ## Supported XML Tags (72 Tags)
 
@@ -126,6 +126,86 @@ The code generator produces real, runnable Python for these transformation types
|
|
|
126
126
|
|
|
127
127
|
## Key Features
|
|
128
128
|
|
|
129
|
+
### Generated Code Quality (v1.9.3+)
|
|
130
|
+
|
|
131
|
+
Generated code follows clean formatting and commenting standards:
|
|
132
|
+
- Consistent section headers (`# ---`) for Source Qualifiers, Transformations, and Target Writes
|
|
133
|
+
- Each section includes metadata: database type, field lists, descriptions
|
|
134
|
+
- Column mapping comments (`# Column mapping: source -> target`) and write operation type comments (`# Write to database table` / `# Write to file`)
|
|
135
|
+
- Expression inline comments showing original Informatica expression (e.g., `# FULL_NAME = UPPER(FIRST_NAME) || ' ' || UPPER(LAST_NAME)`)
|
|
136
|
+
- Clean indentation: no blank line after `try:`, no consecutive blank lines inside function body
|
|
137
|
+
- Mapping-level `try:/except` wrapper with `logger.error()` for runtime visibility

### Smart Target Write Detection (v1.9.3+)

Targets are automatically classified as database writes or file writes:

- Targets with `database_type` set (Oracle, SQL Server, etc.) generate `write_to_db()` calls
- Targets with flat-file metadata or a recognized file extension (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) generate `write_file()` calls
- Bare targets (no metadata) default to `write_to_db()`, since Informatica targets are typically database tables
- Schema-qualified names (e.g., `dbo.MY_TABLE`) are routed to database writes rather than mistaken for file names
- Session file path overrides take priority when present
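
The rules above can be read as a small decision procedure. The sketch below shows one possible ordering and is not the converter's code. The function name `plan_target_write` and its parameters are hypothetical, and it assumes that a session file path override forces a file write.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Extension allowlist taken from the list above.
FILE_EXTENSIONS = {".csv", ".dat", ".txt", ".xml", ".json", ".parquet",
                   ".xlsx", ".xls", ".tsv", ".avro"}


def plan_target_write(name: str,
                      database_type: Optional[str] = None,
                      is_flatfile: bool = False,
                      session_file_path: Optional[str] = None) -> Tuple[str, str]:
    """Return (helper to call, table name or file path) for one target."""
    # A session-level file path override wins (assumed to mean a file target).
    if session_file_path:
        return "write_file", session_file_path
    # An explicit database type means a relational table.
    if database_type:
        return "write_to_db", name
    # Flat-file metadata, or a suffix on the allowlist, means a file.
    # "dbo.MY_TABLE" has suffix ".MY_TABLE", which is not on the list.
    if is_flatfile or PurePosixPath(name).suffix.lower() in FILE_EXTENSIONS:
        return "write_file", name
    # Bare targets default to a database write.
    return "write_to_db", name
```

The allowlist is what keeps a schema-qualified name from being misread: any name containing a dot has a "suffix", but only known file extensions count.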

### Vectorized Expression Engine (v1.9.2+)

Expressions are translated into column-level pandas operations instead of row-by-row iteration. The expression converter uses a recursive, parenthesis-aware parser that handles:

**Conditional / Null:**

- `IIF(cond, val, else_val)` → `np.where()`; the 2-argument form is supported (a missing else value defaults to `None`)
- `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains
- `DECODE(field, val1, res1, ..., default)` → value-matching `np.where()`
- `NVL(val, default)` → `.fillna()`
- `IS_SPACES(field)` → `field.str.strip().eq("")`
- `IS_NUMBER(field)` → `pd.to_numeric(field, errors="coerce").notna()`
- `IN(field, val1, val2, ...)` → `field.isin([...])`
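
Here is a minimal sketch of the pandas/NumPy forms listed above, applied to a hypothetical DataFrame. It illustrates the target idioms only. The exact code the converter emits may differ.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "STATUS": ["A", "I", "X", None],
    "AMOUNT": [10.0, None, 5.0, 7.0],
    "CODE": ["12", "  ", "abc", "7"],
})

# IIF(AMOUNT > 6, 'BIG', 'SMALL'); NaN > 6 is False, so nulls take the else branch
df["SIZE"] = np.where(df["AMOUNT"] > 6, "BIG", "SMALL")

# 2-argument IIF(AMOUNT > 6, 'BIG'): the missing else value becomes None
big_only = np.where(df["AMOUNT"] > 6, "BIG", None)

# DECODE(STATUS, 'A', 'Active', 'I', 'Inactive', 'Unknown') as nested np.where
df["STATUS_DESC"] = np.where(
    df["STATUS"].eq("A"), "Active",
    np.where(df["STATUS"].eq("I"), "Inactive", "Unknown"),
)

# NVL(AMOUNT, 0)
df["AMOUNT_NN"] = df["AMOUNT"].fillna(0)

# IS_SPACES(CODE) and IS_NUMBER(CODE)
df["CODE_BLANK"] = df["CODE"].str.strip().eq("")
df["CODE_NUMERIC"] = pd.to_numeric(df["CODE"], errors="coerce").notna()

# IN(STATUS, 'A', 'I')
df["IS_KNOWN"] = df["STATUS"].isin(["A", "I"])
```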

**String:**

- `UPPER/LOWER` → `.str.upper()/.str.lower()`
- `LTRIM/RTRIM/TRIM` → `.str.lstrip()/.str.rstrip()/.str.strip()`, with support for custom trim characters
- `SUBSTR(val, start, len)` → `.str[start:end]`
- `INSTR(val, search)` → `.str.find()`
- `LPAD/RPAD` → `.str.pad()`
- `REVERSE(val)` → `.str[::-1]`
- `INITCAP(val)` → `.str.title()`
- `REPLACECHR/REPLACESTR` → `.str.replace()`
- `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
- `CHR(code)` → `chr(int(code))`
- `||` concatenation → `+`, with `.astype(str)` applied to non-literal operands
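
The sketch below shows a few of these string mappings on a hypothetical DataFrame. Two details are worth noting:

- Informatica string positions are 1-based, while Python slicing is 0-based.
- `.str` methods require string columns, so numeric columns need a cast first.

The `+ 1` on `.str.find()` converts the result to a 1-based position, with 0 meaning "not found". Whether the generated code applies that offset is not stated above.

```python
import pandas as pd

df = pd.DataFrame({
    "FIRST_NAME": ["ada", "alan"],
    "ID": [7, 42],
    "PHONE": ["555-0101", "555-0199"],
})

# SUBSTR(PHONE, 1, 3): Informatica positions start at 1, Python slices at 0.
start, length = 1, 3
df["PREFIX"] = df["PHONE"].str[start - 1:start - 1 + length]

# INSTR(PHONE, '-'): .str.find() is 0-based and returns -1 when not found,
# so adding 1 gives a 1-based position with 0 meaning "not found".
df["DASH_POS"] = df["PHONE"].str.find("-") + 1

# LPAD(ID, 5, '0'): numeric columns need .astype(str) before .str methods.
df["ID_PADDED"] = df["ID"].astype(str).str.pad(5, side="left", fillchar="0")

# INITCAP(FIRST_NAME) || '-' || ID: cast the non-literal numeric operand to str.
df["LABEL"] = df["FIRST_NAME"].str.title() + "-" + df["ID"].astype(str)
```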

**Date/Time:**

- `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
- `TO_CHAR(val, fmt)` → `.dt.strftime()`
- `ADD_TO_DATE(date, part, amount)` → `date + pd.to_timedelta()` with full unit mapping (YY/MM/DD/HH/MI/SS)
- `DATE_DIFF(date1, date2, part)` → `(date1 - date2).dt.days` / `.dt.total_seconds() / 3600` etc.
- `SYSDATE/SYSTIMESTAMP` → `pd.Timestamp.now()`
- `TRUNC(date, 'DD')` → date truncation via `.dt.floor()/.dt.to_period()`
- `MAKE_DATE_TIME(y, m, d, h, mi, s)` → `pd.Timestamp()`
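
Here is a sketch of the date mappings on a hypothetical DataFrame. One caveat applies when mapping `ADD_TO_DATE` units. Calendar months and years have no fixed length, and current pandas versions do not accept month or year units in `pd.to_timedelta()`. This sketch therefore uses `pd.DateOffset` for `MM`. The column names and format string are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "START_DT": pd.to_datetime(["2024-01-31 08:30:00", "2024-03-15 23:59:59"]),
    "END_DT": pd.to_datetime(["2024-02-02 08:30:00", "2024-03-16 11:59:59"]),
})

# ADD_TO_DATE(START_DT, 'DD', 3): a fixed-length unit, so a timedelta works
df["PLUS_3D"] = df["START_DT"] + pd.to_timedelta(3, unit="D")

# ADD_TO_DATE(START_DT, 'MM', 1): calendar months vary in length, so use an offset;
# pandas clips to the month end (Jan 31 + 1 month -> Feb 29 in 2024)
df["PLUS_1M"] = df["START_DT"] + pd.DateOffset(months=1)

# DATE_DIFF(END_DT, START_DT, 'HH'): elapsed hours as a float
df["DIFF_HOURS"] = (df["END_DT"] - df["START_DT"]).dt.total_seconds() / 3600

# TRUNC(START_DT, 'DD'): drop the time of day
df["DAY"] = df["START_DT"].dt.floor("D")

# TO_CHAR(START_DT, 'YYYY-MM-DD'): Informatica format -> strftime codes
df["START_TXT"] = df["START_DT"].dt.strftime("%Y-%m-%d")
```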

**Numeric:**

- `TO_INTEGER/TO_BIGINT/TO_FLOAT/TO_DECIMAL` → `pd.to_numeric()`
- `TRUNC(val)` → `np.trunc()` for numeric truncation
- `ROUND/ABS/CEIL/FLOOR/POWER/SQRT/MOD/LOG/SIGN` → `np.*` equivalents
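
Here is a short sketch of the numeric conversions on a hypothetical Series of text values. `errors="coerce"` turns unparseable input into `NaN` rather than raising. NumPy and pandas round exact halves to the nearest even value (`2.5` becomes `2.0`), so results for `.5` inputs are worth comparing against Informatica's own output.

```python
import numpy as np
import pandas as pd

raw = pd.Series(["12.7", "-3.9", "n/a", "2.5"])

# TO_DECIMAL(raw): unparseable text becomes NaN instead of raising
num = pd.to_numeric(raw, errors="coerce")

# TRUNC(num): drop the fractional part, moving toward zero
truncated = np.trunc(num)

# ROUND(num): NumPy/pandas round exact halves to the nearest even value
rounded = np.round(num)
```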

**Special:**

- `:LKP.TABLE(args)` — Connected lookup references → `df_lkp_table` merge
- `:PORT.FUNC(args)` — Unconnected lookups → `lookup_func("FUNC", args)` calls
- Inline `--` comment stripping (respects string literals)
- String-literal-aware field substitution
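
Stripping `--` comments safely means ignoring dashes that sit inside string literals. The sketch below uses a hypothetical `strip_inline_comments` helper and is not the package's implementation. It scans character by character and tracks whether it is inside a single-quoted literal. Text from `--` to the end of the line is dropped only when the scanner is outside a literal.

```python
def strip_inline_comments(expr: str) -> str:
    """Drop `--` comments that appear outside single-quoted string literals."""
    out = []
    in_literal = False
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "'":
            # Each quote toggles literal state, so '--' inside 'text' is kept.
            in_literal = not in_literal
        elif not in_literal and expr.startswith("--", i):
            # Skip to the end of this line; the newline itself is kept.
            newline = expr.find("\n", i)
            if newline == -1:
                break
            i = newline
            continue
        out.append(ch)
        i += 1
    # Trim whitespace left dangling at line ends where comments were removed.
    return "\n".join(line.rstrip() for line in "".join(out).split("\n"))
```

For example, `strip_inline_comments("IIF(A > 0, 'x--y', 'z') -- pick label")` keeps the `--` inside `'x--y'` and removes only the trailing comment.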

### Expression Converter (90+ Row-Level Functions)

The generated `helper_functions.py` provides row-level Python equivalents of 90+ Informatica expression functions:

- **String:** `substr`, `ltrim`, `rtrim`, `upper`, `lower`, `lpad`, `rpad`, `instr`, `length`, `concat`, `replacechr`, `replacestr`, `reg_extract`, `reg_replace`, `reg_match`, `reverse_str`, `initcap`, `chr_func`, `ascii_func`, `left_str`, `right_str`, `trim_func`, `indexof`, `metaphone_func`, `soundex_func`, `compress_func`, `decompress_func`
- **Date:** `add_to_date`, `date_diff`, `date_compare`, `get_date_part`, `set_date_part`, `last_day`, `make_date_time`, `to_date`, `to_char`, `to_timestamp_func`, `current_timestamp`, `session_start_time`
- **Numeric:** `round_val`, `trunc`, `mod_val`, `abs_val`, `ceil_val`, `floor_val`, `power_val`, `sqrt_val`, `log_val`, `ln_val`, `exp_val`, `sign_val`, `rand_val`, `greatest_val`, `least_val`
- **Conversion:** `to_integer`, `to_bigint`, `to_float`, `to_decimal`, `cast_func`
- **Null/Conditional:** `iif_expr`, `decode_expr`, `nvl`, `nvl2`, `isnull`, `is_spaces`, `is_number`, `is_date`, `in_expr`, `choose_expr`
- **Aggregate:** `sum_val`, `avg_val`, `count_val`, `min_val`, `max_val`, `first_val`, `last_val`, `median_val`, `stddev_val`, `variance_val`, `percentile_val`
- **Window/Analytic:** `moving_avg`, `moving_avg_df`, `moving_sum`, `moving_sum_df`, `cume`, `cume_df`, `percentile_df`
- **Lookup:** `lookup_func` — Placeholder for runtime lookup resolution
- **Variable:** `get_variable`, `set_variable`, `set_count_variable`
- **Control:** `raise_error`, `abort_func`
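
Row-level helpers are called once per value, for example through `Series.apply`. This is simpler than the vectorized forms but slower on large inputs. The sketch below is a hypothetical `decode_value` helper showing DECODE's search/result pairing with an optional trailing default. The package's `decode_expr` signature is not documented here and may differ.

```python
from typing import Any

import pandas as pd


def decode_value(value: Any, *args: Any) -> Any:
    """Row-level DECODE(value, search1, result1, ..., [default]).

    Hypothetical helper for illustration; the package's decode_expr may differ.
    """
    # An odd number of trailing arguments means the last one is the default.
    if len(args) % 2 == 1:
        pairs, default = args[:-1], args[-1]
    else:
        pairs, default = args, None
    # Return the result paired with the first matching search value.
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if value == search:
            return result
    return default


df = pd.DataFrame({"STATUS": ["A", "I", "X"]})
# Row-level evaluation: the function is called once per value.
df["STATUS_DESC"] = df["STATUS"].apply(
    decode_value, args=("A", "Active", "I", "Inactive", "Unknown")
)
```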

### Row-Count Logging (v1.8+)

Generated code automatically logs row counts at every step of the data pipeline:

```
…
AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
Target TGT_SUMMARY: 150 rows written
```

### Generated Code Documentation (v1.8+)

Every generated mapping function includes a rich docstring describing:

…

Each transformation block is annotated with:

…

- Transform type and description (from Informatica XML)
- Input and output field lists (truncated at 10 for readability)

### Update Strategy with Target Operations (v1.7+)

Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:

…

- Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
- Primary key columns auto-detected from target field definitions
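
The list above describes `?` for MSSQL and `%s` for PostgreSQL/Oracle. The sketch below takes an alternative approach and derives the bind marker from the driver's DB-API 2.0 `paramstyle` attribute. For example, python-oracledb declares `named`. It then builds an UPDATE keyed on the primary-key columns. The helper names `placeholder` and `build_update` are hypothetical. Table and column names are interpolated directly, so they must come from trusted metadata.

```python
import sqlite3
from typing import List


def placeholder(paramstyle: str, position: int, column: str) -> str:
    """Return one bind marker for a DB-API 2.0 paramstyle."""
    if paramstyle == "qmark":                   # e.g. sqlite3, pyodbc
        return "?"
    if paramstyle in ("format", "pyformat"):    # e.g. psycopg2, pymssql
        return "%s"
    if paramstyle == "numeric":
        return f":{position}"
    if paramstyle == "named":                   # e.g. python-oracledb
        return f":{column}"
    raise ValueError(f"unsupported paramstyle: {paramstyle}")


def build_update(table: str, columns: List[str], key_columns: List[str],
                 paramstyle: str) -> str:
    """UPDATE the non-key columns, matching rows on the primary-key columns."""
    set_cols = [c for c in columns if c not in key_columns]
    # Bind order is SET values first, then WHERE keys.
    ordered = set_cols + list(key_columns)
    marks = {c: placeholder(paramstyle, i + 1, c) for i, c in enumerate(ordered)}
    set_clause = ", ".join(f"{c} = {marks[c]}" for c in set_cols)
    where_clause = " AND ".join(f"{c} = {marks[c]}" for c in key_columns)
    return f"UPDATE {table} SET {set_clause} WHERE {where_clause}"


# Demo against an in-memory SQLite table (sqlite3.paramstyle == "qmark").
sql = build_update("CUSTOMER", ["ID", "NAME"], ["ID"], sqlite3.paramstyle)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'old')")
conn.execute(sql, ("new", 1))
updated_name = conn.execute("SELECT NAME FROM CUSTOMER WHERE ID = 1").fetchone()[0]
conn.close()
```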

### Window / Analytic Functions (v1.7+)

DataFrame-level analytic functions for aggregation transforms:

- `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
- `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
- `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
- `percentile_df(df, col, pct)` — quantile via `.quantile()`
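
The pandas calls named above can be tried directly. The sketch below uses a hypothetical DataFrame and calls pandas itself rather than the generated helpers, whose exact behavior is not documented here (for example, how `pct` is scaled). The per-group running total is an extra illustration for grouped aggregations.

```python
import pandas as pd

df = pd.DataFrame({"REGION": ["E", "E", "W", "W"], "AMOUNT": [10, 20, 30, 40]})

# Rolling mean and sum over a 2-row window; pandas yields NaN until the
# window is full because min_periods defaults to the window size.
df["MOV_AVG"] = df["AMOUNT"].rolling(window=2).mean()
df["MOV_SUM"] = df["AMOUNT"].rolling(window=2).sum()

# Running total over all rows, and a running total restarted per group.
df["CUME"] = df["AMOUNT"].expanding().sum()
df["CUME_BY_REGION"] = df.groupby("REGION")["AMOUNT"].cumsum()

# Quantile takes a fraction in [0, 1]; 0.5 is the median (linear interpolation).
median_amount = df["AMOUNT"].quantile(0.5)
```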

### Stored Procedure Execution (v1.7+)

Full stored procedure code generation (not just stubs):

…

Optional `--validate-casts` flag generates null-count checks before/after type c…

- Logs warnings when coercion introduces new nulls
- Helps identify data quality issues during test runs

### Parameter File Support (v1.5+)

Standard Informatica `.param` file parsing:

- `[Global]` and `[folder.WF:workflow.ST:session]` section support
- `get_param(config, var_name)` resolution chain: config → env vars → defaults
- CLI `--param-file` flag for specifying parameter files
- `$$PARAM` variables in SQL automatically substituted with `.replace()` calls
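
Here is a minimal sketch of the parse, resolve, and substitute steps. The following are assumptions for illustration:

- the helper names (`parse_param_text`, `resolve_param`, `substitute_params`)
- the folder, workflow, and session names
- the rule that session-level values override `[Global]`

Plain textual substitution is not safe against SQL injection, so parameter values should come from trusted files.

```python
import os
import re
from typing import Dict, Optional


def parse_param_text(text: str) -> Dict[str, Dict[str, str]]:
    """Parse .param-style text into {section name: {variable: value}}."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections


def resolve_param(params: Dict[str, str], name: str,
                  default: Optional[str] = None) -> Optional[str]:
    """Resolve a variable: parsed params, then environment, then default."""
    key = name.lstrip("$")
    for candidate in ("$$" + key, key):
        if candidate in params:
            return params[candidate]
    return os.environ.get(key, default)


def substitute_params(sql: str, params: Dict[str, str]) -> str:
    """Replace $$NAME tokens with resolved values; unresolved tokens are left as-is."""
    # Longest names first, so $$LOAD does not clobber part of $$LOAD_DATE.
    for name in sorted(set(re.findall(r"\$\$(\w+)", sql)), key=len, reverse=True):
        value = resolve_param(params, name)
        if value is not None:
            sql = sql.replace("$$" + name, value)
    return sql


PARAM_TEXT = """\
[Global]
$$LOAD_DATE=2024-01-31
$$REGION=EAST
[DEV.WF:wf_load.ST:s_m_customer_load]
$$REGION=WEST
"""

sections = parse_param_text(PARAM_TEXT)
# Assumed precedence: session-specific values override [Global].
params = {**sections["Global"], **sections["DEV.WF:wf_load.ST:s_m_customer_load"]}
query = substitute_params(
    "SELECT * FROM ORDERS WHERE REGION = '$$REGION' AND LOAD_DT = '$$LOAD_DATE'",
    params,
)
```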

### Session Connection Overrides (v1.4+)

…

Converts Informatica decision conditions to Python if/else branches with proper variable substitution.

## Helper Functions Library

The generated `helper_functions.py` provides a complete runtime library:

### Configuration & Parameters

| Function | Description |
|----------|-------------|
| `load_config(path, param_file)` | Load YAML config with optional `.param` file merge |
| `parse_param_file(path)` | Parse Informatica `.param` files (`[Global]`, `[folder.WF:...]` sections) |
| `get_param(config, var_name, default)` | Resolve parameter: config → env vars → default |
| `get_variable(var_name, config)` | Get workflow/mapping variable from params, env vars, or param store |
| `set_variable(var_name, value)` | Set workflow/mapping variable in param store and env |

### Database Operations

| Function | Description |
|----------|-------------|
| `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
| `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
| `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
| `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
| `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
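
Here is a sketch of the row split behind `write_with_update_strategy`. It assumes the `_update_strategy` column holds Informatica's numeric `DD_INSERT`/`DD_UPDATE`/`DD_DELETE`/`DD_REJECT` codes (0 to 3). The helper name is hypothetical, and in this sketch rows carrying any other code fall into no bucket.

```python
import pandas as pd

# Informatica's update-strategy constants.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def split_by_update_strategy(df: pd.DataFrame,
                             flag_col: str = "_update_strategy") -> dict:
    """Partition rows by their update-strategy flag, dropping the flag column."""
    codes = {"insert": DD_INSERT, "update": DD_UPDATE,
             "delete": DD_DELETE, "reject": DD_REJECT}
    return {name: df.loc[df[flag_col] == code].drop(columns=[flag_col])
            for name, code in codes.items()}


rows = pd.DataFrame({"ID": [1, 2, 3, 4], "NAME": ["a", "b", "c", "d"],
                     "_update_strategy": [DD_INSERT, DD_UPDATE, DD_DELETE, DD_UPDATE]})
parts = split_by_update_strategy(rows)
# Each part would then feed its own statement type: INSERT for "insert",
# a keyed UPDATE for "update", DELETE for "delete"; "reject" rows are not written.
```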

### File Operations

| Function | Description |
|----------|-------------|
| `read_file(path, file_config)` | Read CSV/DAT/TXT/XML/XLSX/JSON/Parquet with auto-detection |
| `write_file(df, path, file_config)` | Write DataFrame to file with format auto-detection |

### State Persistence

| Function | Description |
|----------|-------------|
| `load_persistent_state(file)` | Load JSON state file for persistent variables |
| `save_persistent_state(file)` | Save persistent variables to JSON state file |
| `get_persistent_variable(scope, var, default)` | Get scoped persistent variable |
| `set_persistent_variable(scope, var, value)` | Set scoped persistent variable |
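
Here is a minimal sketch of scoped persistent variables backed by a JSON file. The following are assumptions for illustration, not the package's implementation:

- the function names (`load_state`, `save_state`, `get_var`, `set_var`)
- the single module-level store
- the write-then-rename save strategy

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# In-memory store: {scope (e.g. mapping name): {variable: value}}
_state: Dict[str, Dict[str, Any]] = {}


def load_state(path: Path) -> None:
    """Replace the in-memory state with the file's contents (empty if missing)."""
    _state.clear()
    if path.exists():
        _state.update(json.loads(path.read_text(encoding="utf-8")))


def save_state(path: Path) -> None:
    """Write to a temp file, then rename it over the target file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(_state, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)


def get_var(scope: str, var: str, default: Any = None) -> Any:
    return _state.get(scope, {}).get(var, default)


def set_var(scope: str, var: str, value: Any) -> None:
    _state.setdefault(scope, {})[var] = value


with tempfile.TemporaryDirectory() as tmp_dir:
    state_file = Path(tmp_dir) / "state.json"
    load_state(state_file)                       # first run: no file yet
    set_var("m_customer_load", "$$LAST_RUN_ID", 41)
    save_state(state_file)
    _state.clear()                               # simulate a fresh process
    load_state(state_file)
    last_run_id = get_var("m_customer_load", "$$LAST_RUN_ID")
```

Renaming a fully written temp file, rather than writing the state file in place, means a reader never sees a partially written file.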

### Logging & Monitoring

| Function | Description |
|----------|-------------|
| `log_mapping_start(name)` | Log mapping start with timestamp |
| `log_mapping_end(name, start_time, row_count)` | Log mapping completion with elapsed time |
| `validate_row_count(df, name, min_rows)` | Validate minimum row count threshold |

## Requirements

…

## Changelog

### v1.9.3 (Current)

- **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
- **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
- **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
- **2-arg IIF**: `IIF(cond, val)` without else clause defaults to `None`
- **REVERSE vectorization**: `REVERSE(field)` → `field.str[::-1]`
- **IN() vectorization**: `IN(field, val1, val2, ...)` → `field.isin([...])`
- **IS_NUMBER vectorization**: `IS_NUMBER(field)` → `pd.to_numeric(field, errors="coerce").notna()`
- **SYSDATE/SYSTIMESTAMP**: Bare `SYSDATE`/`SYSTIMESTAMP` → `pd.Timestamp.now()` in vectorized mode
- **TRUNC vectorization**: Numeric `TRUNC(field)` → `np.trunc()`; date `TRUNC(field, 'DD')` → `.dt.floor()`
- **ADD_TO_DATE vectorization**: `ADD_TO_DATE(date, part, amount)` → `pd.to_timedelta()` with YY/MM/DD/HH/MI/SS units
- **DATE_DIFF vectorization**: `DATE_DIFF(date1, date2, part)` → arithmetic on timedelta components
- **Unconnected lookup support**: `:PORT.FUNC_NAME(args)` → `lookup_func("FUNC_NAME", args)`
- **Inline comment stripping**: `--` comments removed from expressions (respects string literals)
- **`$$PARAM` SQL substitution**: Source Qualifier, Lookup, and SQL Transform SQL strings auto-substitute `$$VAR` with `get_param(config, 'VAR')` calls
- **Sorter direction**: Reads `SORTDIRECTION` from field attributes, generates per-field `ascending=[True, False, ...]`
- **Pass-through optimization**: Identity expressions skip `.copy()` and use direct reference
- **Duplicate lookup deduplication**: `_gen_lookup_transform` uses `seen_output_cols` set to avoid duplicate column checks
- **Mapping-level error handling**: Generated function body wrapped in `try:/except` with `logger.error()`
- **Update strategy vectorized**: Tries vectorized expression first, falls back to row-level `apply()`
- **Generated code formatting**: Consistent `# ---` section headers for Source Qualifiers, Transforms, and Target Writes; metadata comments (database type, field lists); column mapping and write operation comments; clean blank line handling
- **Source/target detection**: Case-insensitive instance type matching
- **Session→mapping inference**: Longest-suffix-match strategy for ambiguous mapping names
- **646 tests** across unit, integration, expression, and formatting test suites

### v1.9.2 (Phase 8)

- Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
- Workflow imports automatically match the named mapping files
- **Expression converter rewrite**: Recursive parenthesis-aware parser replacing simple regex; fixes nested IIF/INSTR/LTRIM/RTRIM/REPLACECHR/REPLACESTR/SUBSTR/TO_CHAR/CHR/MAKE_DATE_TIME

…

```
cd informatica_python
pip install -e ".[dev]"

# Run tests (646 tests)
pytest tests/ -v
```