informatica-python 1.9.7__tar.gz → 1.9.9__tar.gz
- {informatica_python-1.9.7 → informatica_python-1.9.9}/PKG-INFO +42 -8
- {informatica_python-1.9.7 → informatica_python-1.9.9}/README.md +41 -7
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/expression_converter.py +3 -1
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/PKG-INFO +42 -8
- {informatica_python-1.9.7 → informatica_python-1.9.9}/pyproject.toml +1 -1
- {informatica_python-1.9.7 → informatica_python-1.9.9}/LICENSE +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/__init__.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/cli.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/converter.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/__init__.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/config_gen.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/error_log_gen.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/helper_gen.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/mapping_gen.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/sql_gen.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/workflow_gen.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/models.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/parser.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/__init__.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/datatype_map.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/lib_adapters.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/sql_dialect.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/SOURCES.txt +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/dependency_links.txt +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/entry_points.txt +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/requires.txt +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/top_level.txt +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/setup.cfg +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/tests/test_converter.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/tests/test_expressions.py +0 -0
- {informatica_python-1.9.7 → informatica_python-1.9.9}/tests/test_integration.py +0 -0
{informatica_python-1.9.7 → informatica_python-1.9.9}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: informatica-python
-Version: 1.9.
+Version: 1.9.9
 Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
 Author: Nick
 License: MIT
@@ -124,7 +124,7 @@ The code generator produces real, runnable Python for these transformation types
 - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
 - **Filter** — Row filtering with vectorized converted conditions
 - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
-- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
 - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
 - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
 - **Router** — Multi-group conditional routing with named groups
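The Lookup bullet in the hunk above names a `pd.merge()` lookup with default values and table caching. A minimal sketch of that pattern follows. The names `lookup_with_default`, `_LOOKUP_CACHE`, and `load_countries` are hypothetical and are not taken from the package; this is not the package's `lookup_func()`, only an illustration of the idea under the assumption of a first-match policy.

```python
import pandas as pd

# Hypothetical in-memory cache keyed by table name (illustrative only).
_LOOKUP_CACHE = {}

def lookup_with_default(df, table_name, loader, key, value_col, default):
    """Left-join value_col from a cached lookup table, filling default on misses."""
    if table_name not in _LOOKUP_CACHE:
        # Load the lookup table once; later calls reuse the cached frame.
        _LOOKUP_CACHE[table_name] = loader()
    # Keep the first row per key (assumed "first match" policy).
    lkp = _LOOKUP_CACHE[table_name][[key, value_col]].drop_duplicates(subset=key)
    out = df.merge(lkp, on=key, how="left")
    out[value_col] = out[value_col].fillna(default)
    return out

load_calls = []

def load_countries():
    # Stands in for a database read; records how often it is invoked.
    load_calls.append(1)
    return pd.DataFrame({"CODE": ["US", "DE"], "NAME": ["United States", "Germany"]})

src = pd.DataFrame({"CODE": ["US", "FR", "DE"]})
result = lookup_with_default(src, "COUNTRIES", load_countries, "CODE", "NAME", "UNKNOWN")
result = lookup_with_default(src, "COUNTRIES", load_countries, "CODE", "NAME", "UNKNOWN")
```

The second call reuses the cached frame, so the loader runs only once; unmatched keys such as `FR` receive the default.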
@@ -196,7 +196,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
 - `REPLACECHR/REPLACESTR` → `.str.replace()`
 - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
 - `CHR(code)` → `chr(int(code))`
-- `||` concatenation → `+` with `.astype(str)`
+- `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 
 **Date/Time:**
 - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
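To make the changed `||` bullet concrete, here is a small sketch of what the coerced concatenation looks like at runtime. The column names and the `suffix` scalar are made up for illustration; only the `.fillna('').astype(str)` and `str()` forms come from the diff.

```python
import pandas as pd

df = pd.DataFrame({"FIRST": ["Ada", None], "ID": [1, 2]})
suffix = 42  # stands in for a scalar, e.g. a resolved variable value

# Series operands: fill missing values first, then cast to string.
# Scalar operands: wrap in str().
safe = (
    df["FIRST"].fillna("").astype(str)
    + "_"
    + df["ID"].fillna("").astype(str)
    + "_"
    + str(suffix)
)
```

Filling before casting means a missing `FIRST` contributes an empty string rather than a textual placeholder for the missing value.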
@@ -343,10 +343,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
 - Decimals/Floats: `pd.to_numeric(errors='coerce')`
 - Booleans: `.astype('boolean')`
 
-### Flat File Handling (v1.3
+### Flat File Handling (v1.3+, enhanced v1.9.8)
 
 Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
+**Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
+
 ### Mapplet Inlining (v1.3+)
 
 Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
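The fixed-width paragraph added in this hunk says that `physical_length` is preferred over `precision` when computing column widths for `pd.read_fwf()`. The sketch below shows one way that preference could translate into `colspecs`. The tuple layout of `fields` and the helper `build_colspecs` are assumptions for illustration, not the package's data model.

```python
import io
import pandas as pd

# Hypothetical field metadata: (name, physical_offset, physical_length, precision).
fields = [
    ("ID", 0, 4, 10),
    ("NAME", 4, 6, 50),
    ("AMOUNT", 10, 5, 15),
]

def build_colspecs(fields):
    """Return (start, end) pairs, preferring physical_length over precision."""
    specs = []
    for _name, offset, physical_length, precision in fields:
        # Fall back to precision only when no physical length is available.
        width = physical_length if physical_length else precision
        specs.append((offset, offset + width))
    return specs

raw = "0001Alice 12.50\n0002Bob   03.00\n"
colspecs = build_colspecs(fields)
df = pd.read_fwf(
    io.StringIO(raw),
    colspecs=colspecs,
    names=[f[0] for f in fields],
    header=None,
    dtype=str,  # keep leading zeros intact
)
```

Using `precision` here (10, 50, 15) would overrun the 15-character records, which is the kind of width mismatch the preference for `physical_length` avoids.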
@@ -371,12 +373,17 @@ The generated `helper_functions.py` provides a complete runtime library:
 ### Database Operations
 | Function | Description |
 |----------|-------------|
-| `get_db_connection(config, conn_name)` |
+| `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
 | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
 | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
-| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement
+| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
 | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
 | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+| `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+| `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+| `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+| `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+| `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 
 ### File Operations
 | Function | Description |
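The `resolve_env(value)` row added above describes `${VAR}` resolution from the environment with a config fallback. A minimal sketch of that lookup order is shown here. The function name `resolve_placeholders`, the explicit `config` argument, and the choice to leave unresolved placeholders untouched are all assumptions; the package's actual behaviour may differ.

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def resolve_placeholders(value, config=None):
    """Replace ${VAR} with the environment value, else a config value, else leave it as-is."""
    config = config or {}

    def _sub(match):
        name = match.group(1)
        if name in os.environ:
            return os.environ[name]      # environment wins
        if name in config:
            return str(config[name])     # config is the fallback
        return match.group(0)            # assumed: unresolved placeholders stay untouched

    return _PLACEHOLDER.sub(_sub, value)

# Demo setup: one variable in the environment, one only in config, one nowhere.
os.environ["DEMO_DB_HOST"] = "db.internal"
os.environ.pop("DEMO_DB_PORT", None)
os.environ.pop("DEMO_DB_USER", None)

resolved = resolve_placeholders(
    "host=${DEMO_DB_HOST};port=${DEMO_DB_PORT};user=${DEMO_DB_USER}",
    config={"DEMO_DB_PORT": 5432},
)
```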
@@ -407,7 +414,34 @@ The generated `helper_functions.py` provides a complete runtime library:
 
 ## Changelog
 
-### v1.9.
+### v1.9.8 (Current)
+- **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+- **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+- **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+- **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+- **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+- **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+- **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+- **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+- **700 tests** passing
+
+### v1.9.5 / v1.9.6
+- **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+- **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+- **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+- **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+- **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+- **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+- **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+- **`$PM*` variable substitution in SQL Override queries**
+- **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+- **678 tests** passing
+
+### v1.9.4
+- Extended expression function coverage and edge-case fixes
+- Improved mapplet and connector handling
+
+### v1.9.3
 - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
 - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
 - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
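The tokenizer entry in the changelog hunk above describes splitting on `OR` only at real word boundaries. A rough sketch of that idea using a regular expression is given below. The function `split_condition` is hypothetical and is not the package's `_split_condition_tokens`; in particular, this sketch does not protect `AND`/`OR` that appear inside quoted string literals.

```python
import re

# Split only where AND/OR stand alone as words, not inside identifiers.
_LOGICAL = re.compile(r"\b(AND|OR)\b", flags=re.IGNORECASE)

def split_condition(expr):
    """Return operands and logical operators as a flat list of stripped tokens."""
    # The capturing group makes re.split keep the operators in the result.
    return [tok.strip() for tok in _LOGICAL.split(expr) if tok.strip()]

tokens = split_condition(
    "DeletedIndicator = 'N' OR ORDER_STATUS = 'OPEN' AND VENDOR_ID > 0"
)
```

Because `\b` requires a transition between word and non-word characters, the `or` in `DeletedIndicator`, the `OR` in `ORDER_STATUS`, and the `OR` in `VENDOR_ID` are all left alone (underscore counts as a word character).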
@@ -495,7 +529,7 @@ The generated `helper_functions.py` provides a complete runtime library:
 cd informatica_python
 pip install -e ".[dev]"
 
-# Run tests (
+# Run tests (700 tests)
 pytest tests/ -v
 ```
 
{informatica_python-1.9.7 → informatica_python-1.9.9}/README.md
@@ -97,7 +97,7 @@ The code generator produces real, runnable Python for these transformation types
 - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
 - **Filter** — Row filtering with vectorized converted conditions
 - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
-- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
 - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
 - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
 - **Router** — Multi-group conditional routing with named groups
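The Sorter bullet in this hunk mentions `sort_values()` with a per-field direction. As a small illustration, the sketch below turns a list of (field, direction) pairs into the `by` and `ascending` arguments. The `sort_keys` structure and the `"ASCENDING"`/`"DESCENDING"` strings are assumptions about how the direction might be represented after parsing.

```python
import pandas as pd

# Hypothetical sort keys as (field, direction) pairs.
sort_keys = [("REGION", "ASCENDING"), ("AMOUNT", "DESCENDING")]

df = pd.DataFrame(
    {"REGION": ["EU", "US", "EU", "US"], "AMOUNT": [10, 5, 30, 20]}
)

sorted_df = df.sort_values(
    by=[name for name, _ in sort_keys],
    # One boolean per key: True sorts ascending, False descending.
    ascending=[direction == "ASCENDING" for _, direction in sort_keys],
).reset_index(drop=True)
```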
@@ -169,7 +169,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
 - `REPLACECHR/REPLACESTR` → `.str.replace()`
 - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
 - `CHR(code)` → `chr(int(code))`
-- `||` concatenation → `+` with `.astype(str)`
+- `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 
 **Date/Time:**
 - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -316,10 +316,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
 - Decimals/Floats: `pd.to_numeric(errors='coerce')`
 - Booleans: `.astype('boolean')`
 
-### Flat File Handling (v1.3
+### Flat File Handling (v1.3+, enhanced v1.9.8)
 
 Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
+**Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
+
 ### Mapplet Inlining (v1.3+)
 
 Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
@@ -344,12 +346,17 @@ The generated `helper_functions.py` provides a complete runtime library:
 ### Database Operations
 | Function | Description |
 |----------|-------------|
-| `get_db_connection(config, conn_name)` |
+| `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
 | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
 | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
-| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement
+| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
 | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
 | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+| `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+| `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+| `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+| `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+| `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 
 ### File Operations
 | Function | Description |
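The updated `execute_sql` row says the helper distinguishes SQLAlchemy from DBAPI connections via a `dialect` attribute. The sketch below shows how such a check might look. `run_statement` is a hypothetical name, the SQLAlchemy branch assumes a 2.x-style `Connection` (which has `commit()`), and only the DBAPI branch is exercised here, with the standard-library `sqlite3` driver.

```python
import sqlite3

def run_statement(conn, sql):
    """Execute one statement on a SQLAlchemy connection or a raw DBAPI connection."""
    if hasattr(conn, "dialect"):
        # SQLAlchemy connections expose `dialect`; raw SQL strings are wrapped in text().
        from sqlalchemy import text  # imported lazily so DBAPI-only use needs no SQLAlchemy
        conn.execute(text(sql))
        conn.commit()
    else:
        # Raw DBAPI connection: go through a cursor.
        cur = conn.cursor()
        try:
            cur.execute(sql)
            conn.commit()
        finally:
            cur.close()

conn = sqlite3.connect(":memory:")
run_statement(conn, "CREATE TABLE t (id INTEGER)")
run_statement(conn, "INSERT INTO t VALUES (1)")
row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Checking for an attribute rather than a type keeps the helper free of a hard SQLAlchemy dependency, at the cost of trusting that no DBAPI driver exposes an attribute with the same name.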
@@ -380,7 +387,34 @@ The generated `helper_functions.py` provides a complete runtime library:
 
 ## Changelog
 
-### v1.9.
+### v1.9.8 (Current)
+- **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+- **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+- **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+- **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+- **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+- **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+- **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+- **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+- **700 tests** passing
+
+### v1.9.5 / v1.9.6
+- **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+- **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+- **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+- **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+- **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+- **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+- **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+- **`$PM*` variable substitution in SQL Override queries**
+- **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+- **678 tests** passing
+
+### v1.9.4
+- Extended expression function coverage and edge-case fixes
+- Improved mapplet and connector handling
+
+### v1.9.3
 - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
 - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
 - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
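The v1.9.3 entry in the changelog hunk above describes `DECODE(TRUE, cond1, val1, ..., default)` becoming nested `np.where()` chains. A minimal sketch of that folding is shown here for an expression like `DECODE(TRUE, SCORE >= 90, 'A', SCORE >= 70, 'B', 'F')`. The helper `decode_true` and the sample column are illustrative and not part of the package.

```python
import numpy as np
import pandas as pd

def decode_true(pairs, default):
    """Fold (condition, value) pairs into nested np.where calls; the first true condition wins."""
    result = default
    # Build from the innermost (last) pair outward so earlier conditions take priority.
    for cond, value in reversed(pairs):
        result = np.where(cond, value, result)
    return result

df = pd.DataFrame({"SCORE": [95, 72, 40]})
df["GRADE"] = decode_true(
    [(df["SCORE"] >= 90, "A"), (df["SCORE"] >= 70, "B")],
    default="F",
)
```

The fold is equivalent to writing `np.where(c1, "A", np.where(c2, "B", "F"))` by hand, which matches the first-match semantics of `DECODE`.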
@@ -468,7 +502,7 @@ The generated `helper_functions.py` provides a complete runtime library:
 cd informatica_python
 pip install -e ".[dev]"
 
-# Run tests (
+# Run tests (700 tests)
 pytest tests/ -v
 ```
 
{informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/expression_converter.py
@@ -883,8 +883,10 @@ def _vec_recursive(expr, df_var):
             v = _vec_recursive(p, df_var)
             if v.startswith("'") and v.endswith("'"):
                 vec_parts.append(v)
-
+            elif v.startswith(df_var + '[') or v.startswith('pd.') or '.str.' in v:
                 vec_parts.append(f'{v}.fillna(\'\').astype(str)')
+            else:
+                vec_parts.append(f'str({v})')
         return " + ".join(vec_parts)
 
     for func_name in sorted(INFA_FUNC_MAP.keys(), key=lambda x: -len(x)):
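The branch added in the hunk above decides how each operand of a `||` expression is coerced in the generated code. The standalone sketch below isolates that decision so it can be run on its own. The function name `coerce_concat_operand` and the sample operands are hypothetical; the three conditions mirror the ones visible in the diff.

```python
def coerce_concat_operand(v, df_var="df"):
    """Choose the string coercion for one already-converted operand of a `||` expression."""
    if v.startswith("'") and v.endswith("'"):
        return v  # string literal: use as-is
    if v.startswith(df_var + "[") or v.startswith("pd.") or ".str." in v:
        # Looks like a Series expression: null-safe cast.
        return f"{v}.fillna('').astype(str)"
    return f"str({v})"  # anything else is treated as a scalar

parts = ["'PER_'", 'df["EMP_ID"]', 'resolve_builtin_variable("PMMappingName")']
generated = " + ".join(coerce_concat_operand(p) for p in parts)
```

The check is purely textual, so an operand that evaluates to a Series but does not start with `df[` or `pd.` and contains no `.str.` would be wrapped in `str()`; that is a limitation of this heuristic as sketched.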
{informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: informatica-python
-Version: 1.9.
+Version: 1.9.9
 Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
 Author: Nick
 License: MIT
@@ -124,7 +124,7 @@ The code generator produces real, runnable Python for these transformation types
 - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
 - **Filter** — Row filtering with vectorized converted conditions
 - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
-- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
 - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
 - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
 - **Router** — Multi-group conditional routing with named groups
@@ -196,7 +196,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
 - `REPLACECHR/REPLACESTR` → `.str.replace()`
 - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
 - `CHR(code)` → `chr(int(code))`
-- `||` concatenation → `+` with `.astype(str)`
+- `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 
 **Date/Time:**
 - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -343,10 +343,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
 - Decimals/Floats: `pd.to_numeric(errors='coerce')`
 - Booleans: `.astype('boolean')`
 
-### Flat File Handling (v1.3
+### Flat File Handling (v1.3+, enhanced v1.9.8)
 
 Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
+**Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
+
 ### Mapplet Inlining (v1.3+)
 
 Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
@@ -371,12 +373,17 @@ The generated `helper_functions.py` provides a complete runtime library:
 ### Database Operations
 | Function | Description |
 |----------|-------------|
-| `get_db_connection(config, conn_name)` |
+| `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
 | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
 | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
-| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement
+| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
 | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
 | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+| `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+| `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+| `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+| `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+| `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 
 ### File Operations
 | Function | Description |
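The `rename_with_duplicates(df, col_map)` row describes a rename in which one source column can feed several target columns. Since a plain `DataFrame.rename` mapping cannot send one source to two names, a sketch of one possible approach is given below. The function `rename_one_to_many` and the list-of-pairs shape of `col_map` are assumptions; the diff does not show how the package represents the mapping.

```python
import pandas as pd

def rename_one_to_many(df, col_map):
    """Build a frame from (source, target) pairs; a source may feed several targets."""
    out = pd.DataFrame(index=df.index)
    for source, target in col_map:
        # Each target gets its own copy of the source column.
        out[target] = df[source]
    return out

src = pd.DataFrame({"CUST_ID": [1, 2], "NAME": ["Ann", "Bo"]})
renamed = rename_one_to_many(
    src,
    [("CUST_ID", "CUSTOMER_KEY"), ("CUST_ID", "LEGACY_ID"), ("NAME", "FULL_NAME")],
)
```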
@@ -407,7 +414,34 @@ The generated `helper_functions.py` provides a complete runtime library:
 
 ## Changelog
 
-### v1.9.
+### v1.9.8 (Current)
+- **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+- **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+- **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+- **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+- **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+- **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+- **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+- **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+- **700 tests** passing
+
+### v1.9.5 / v1.9.6
+- **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+- **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+- **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+- **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+- **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+- **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+- **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+- **`$PM*` variable substitution in SQL Override queries**
+- **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+- **678 tests** passing
+
+### v1.9.4
+- Extended expression function coverage and edge-case fixes
+- Improved mapplet and connector handling
+
+### v1.9.3
 - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
 - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
 - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
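Two v1.9.8 entries in the changelog hunk above concern operator binding in generated code: the parenthesized `TO_CHAR` arithmetic and the `NOT(ISNULL(x))` form. The short sketch below evaluates both target expressions on a made-up column `x` to show what they compute; the data is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"x": ["5", "10", None]})

# TO_CHAR(TO_INTEGER(x) - 1): the subtraction is parenthesized so that
# .astype(str) applies to the whole result, not to the literal 1.
as_text = (pd.to_numeric(df["x"], errors="coerce") - 1).astype(str)

# NOT(ISNULL(x)): element-wise negation uses ~ on the boolean mask.
not_null = ~(df["x"].isna())
```

Without the parentheses, `.astype(str)` would bind to the literal `1`, which is not valid Python at all, so the fix affects whether the generated module can even be imported.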
@@ -495,7 +529,7 @@ The generated `helper_functions.py` provides a complete runtime library:
 cd informatica_python
 pip install -e ".[dev]"
 
-# Run tests (
+# Run tests (700 tests)
 pytest tests/ -v
 ```
 