informatica-python 1.9.7__tar.gz → 1.9.9__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. {informatica_python-1.9.7 → informatica_python-1.9.9}/PKG-INFO +42 -8
  2. {informatica_python-1.9.7 → informatica_python-1.9.9}/README.md +41 -7
  3. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/expression_converter.py +3 -1
  4. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/PKG-INFO +42 -8
  5. {informatica_python-1.9.7 → informatica_python-1.9.9}/pyproject.toml +1 -1
  6. {informatica_python-1.9.7 → informatica_python-1.9.9}/LICENSE +0 -0
  7. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/__init__.py +0 -0
  8. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/cli.py +0 -0
  9. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/converter.py +0 -0
  10. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/__init__.py +0 -0
  11. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/config_gen.py +0 -0
  12. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/error_log_gen.py +0 -0
  13. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/helper_gen.py +0 -0
  14. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/mapping_gen.py +0 -0
  15. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/sql_gen.py +0 -0
  16. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/generators/workflow_gen.py +0 -0
  17. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/models.py +0 -0
  18. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/parser.py +0 -0
  19. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/__init__.py +0 -0
  20. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/datatype_map.py +0 -0
  21. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/lib_adapters.py +0 -0
  22. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python/utils/sql_dialect.py +0 -0
  23. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/SOURCES.txt +0 -0
  24. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/dependency_links.txt +0 -0
  25. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/entry_points.txt +0 -0
  26. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/requires.txt +0 -0
  27. {informatica_python-1.9.7 → informatica_python-1.9.9}/informatica_python.egg-info/top_level.txt +0 -0
  28. {informatica_python-1.9.7 → informatica_python-1.9.9}/setup.cfg +0 -0
  29. {informatica_python-1.9.7 → informatica_python-1.9.9}/tests/test_converter.py +0 -0
  30. {informatica_python-1.9.7 → informatica_python-1.9.9}/tests/test_expressions.py +0 -0
  31. {informatica_python-1.9.7 → informatica_python-1.9.9}/tests/test_integration.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: informatica-python
- Version: 1.9.7
+ Version: 1.9.9
  Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
  Author: Nick
  License: MIT
@@ -124,7 +124,7 @@ The code generator produces real, runnable Python for these transformation types
  - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
  - **Filter** — Row filtering with vectorized converted conditions
  - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
- - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
  - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
  - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
  - **Router** — Multi-group conditional routing with named groups
@@ -196,7 +196,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
  - `REPLACECHR/REPLACESTR` → `.str.replace()`
  - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
  - `CHR(code)` → `chr(int(code))`
- - `||` concatenation → `+` with `.astype(str)` on non-literals
+ - `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 
  **Date/Time:**
  - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -343,10 +343,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
  - Decimals/Floats: `pd.to_numeric(errors='coerce')`
  - Booleans: `.astype('boolean')`
 
- ### Flat File Handling (v1.3+)
+ ### Flat File Handling (v1.3+, enhanced v1.9.8)
 
  Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
+ **Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
+
  ### Mapplet Inlining (v1.3+)
 
  Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
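Reviewer note on the fixed-width change in this hunk: the preference for `physical_length` over `precision` can be sketched roughly as below. This is only an illustration under assumptions. `FieldSpec` and `build_colspecs` are hypothetical names, not part of the package, and treating `physical_offset` as the column start is a guess about how the parsed attribute might be used.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a parsed SOURCEFIELD entry."""
    name: str
    precision: int
    physical_length: Optional[int] = None
    physical_offset: Optional[int] = None


def build_colspecs(fields: List[FieldSpec]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column intervals for a fixed-width record."""
    colspecs = []
    cursor = 0
    for f in fields:
        # Prefer the physical length; fall back to precision when it is absent.
        width = f.physical_length if f.physical_length else f.precision
        # Assumption: an explicit physical offset wins over the running cursor.
        start = f.physical_offset if f.physical_offset is not None else cursor
        colspecs.append((start, start + width))
        cursor = start + width
    return colspecs


fields = [
    FieldSpec("ID", precision=10, physical_length=4, physical_offset=0),
    FieldSpec("NAME", precision=5),
]
colspecs = build_colspecs(fields)

# Slice one record by hand to show what the intervals select.
record = "0001Alice"
row = {f.name: record[a:b] for f, (a, b) in zip(fields, colspecs)}
```

A list of this shape is what `pd.read_fwf(colspecs=...)` accepts, since it also uses half-open `(start, end)` intervals. In the sketch, `ID` would span 10 characters if `precision` were used, which would swallow the `NAME` column.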
@@ -371,12 +373,17 @@ The generated `helper_functions.py` provides a complete runtime library:
  ### Database Operations
  | Function | Description |
  |----------|-------------|
- | `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+ | `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
  | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
  | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
- | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+ | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
  | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
  | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+ | `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+ | `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+ | `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+ | `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+ | `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 
  ### File Operations
  | Function | Description |
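Reviewer note on the `resolve_env(value)` row added in this hunk: the "environment first, config fallback" resolution of `${VAR}` placeholders might look roughly like the sketch below. The diff does not show the helper's body, so this is an assumption-laden illustration. `resolve_placeholders` and its `config`/`environ` parameters are hypothetical, and leaving unknown placeholders untouched is a design choice made here, not confirmed package behavior.

```python
import os
import re

# Matches ${NAME} where NAME is letters, digits, or underscores.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve_placeholders(value, config=None, environ=None):
    """Replace ${VAR} using the environment first, then a config mapping."""
    config = config or {}
    environ = os.environ if environ is None else environ

    def _lookup(match):
        name = match.group(1)
        if name in environ:
            return environ[name]
        if name in config:
            return str(config[name])
        # Assumption: unresolved placeholders are left as-is rather than raising.
        return match.group(0)

    return _PLACEHOLDER.sub(_lookup, value)


resolved = resolve_placeholders(
    "${DB_HOST}:${DB_PORT}/${MISSING}",
    config={"DB_PORT": 1433},
    environ={"DB_HOST": "sqlhost"},
)
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment.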
@@ -407,7 +414,34 @@ The generated `helper_functions.py` provides a complete runtime library:
 
  ## Changelog
 
- ### v1.9.3 (Current)
+ ### v1.9.8 (Current)
+ - **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+ - **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+ - **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+ - **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+ - **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+ - **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+ - **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+ - **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+ - **700 tests** passing
+
+ ### v1.9.5 / v1.9.6
+ - **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+ - **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+ - **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+ - **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+ - **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+ - **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+ - **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+ - **`$PM*` variable substitution in SQL Override queries**
+ - **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+ - **678 tests** passing
+
+ ### v1.9.4
+ - Extended expression function coverage and edge-case fixes
+ - Improved mapplet and connector handling
+
+ ### v1.9.3
  - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
  - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
  - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
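Reviewer note on the "Condition tokenizer word-boundary fix" entry in this hunk: the body of `_split_condition_tokens` is not part of this diff, so the following is only a sketch of the general technique (splitting on `AND`/`OR` only at real word boundaries) using a regular expression. `split_condition` is a hypothetical name, and the sketch deliberately ignores quoted string literals that contain ` OR `.

```python
import re

# \b requires a word boundary on both sides, so the "or" inside
# "DeletedIndicator" and the "OR" prefix of "ORDER_ID" do not match.
_LOGICAL_SPLIT = re.compile(r"\b(AND|OR)\b", flags=re.IGNORECASE)


def split_condition(condition):
    """Split a condition into operands and AND/OR operators."""
    parts = _LOGICAL_SPLIT.split(condition)
    # The capturing group keeps the operators in the result list.
    return [p.strip() for p in parts if p.strip()]


tokens = split_condition("DeletedIndicator = 'Y' OR ORDER_ID > 0")
```

Because the underscore counts as a word character in Python's `re`, a field such as `A_OR_B` is also left intact.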
@@ -495,7 +529,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  cd informatica_python
  pip install -e ".[dev]"
 
- # Run tests (663 tests)
+ # Run tests (700 tests)
  pytest tests/ -v
  ```
 
@@ -97,7 +97,7 @@ The code generator produces real, runnable Python for these transformation types
  - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
  - **Filter** — Row filtering with vectorized converted conditions
  - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
- - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
  - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
  - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
  - **Router** — Multi-group conditional routing with named groups
@@ -169,7 +169,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
  - `REPLACECHR/REPLACESTR` → `.str.replace()`
  - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
  - `CHR(code)` → `chr(int(code))`
- - `||` concatenation → `+` with `.astype(str)` on non-literals
+ - `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 
  **Date/Time:**
  - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -316,10 +316,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
  - Decimals/Floats: `pd.to_numeric(errors='coerce')`
  - Booleans: `.astype('boolean')`
 
- ### Flat File Handling (v1.3+)
+ ### Flat File Handling (v1.3+, enhanced v1.9.8)
 
  Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
+ **Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
+
  ### Mapplet Inlining (v1.3+)
 
  Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
@@ -344,12 +346,17 @@ The generated `helper_functions.py` provides a complete runtime library:
  ### Database Operations
  | Function | Description |
  |----------|-------------|
- | `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+ | `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
  | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
  | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
- | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+ | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
  | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
  | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+ | `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+ | `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+ | `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+ | `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+ | `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 
  ### File Operations
  | Function | Description |
@@ -380,7 +387,34 @@ The generated `helper_functions.py` provides a complete runtime library:
 
  ## Changelog
 
- ### v1.9.3 (Current)
+ ### v1.9.8 (Current)
+ - **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+ - **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+ - **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+ - **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+ - **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+ - **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+ - **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+ - **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+ - **700 tests** passing
+
+ ### v1.9.5 / v1.9.6
+ - **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+ - **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+ - **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+ - **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+ - **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+ - **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+ - **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+ - **`$PM*` variable substitution in SQL Override queries**
+ - **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+ - **678 tests** passing
+
+ ### v1.9.4
+ - Extended expression function coverage and edge-case fixes
+ - Improved mapplet and connector handling
+
+ ### v1.9.3
  - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
  - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
  - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
@@ -468,7 +502,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  cd informatica_python
  pip install -e ".[dev]"
 
- # Run tests (663 tests)
+ # Run tests (700 tests)
  pytest tests/ -v
  ```
 
@@ -883,8 +883,10 @@ def _vec_recursive(expr, df_var):
              v = _vec_recursive(p, df_var)
              if v.startswith("'") and v.endswith("'"):
                  vec_parts.append(v)
-             else:
+             elif v.startswith(df_var + '[') or v.startswith('pd.') or '.str.' in v:
                  vec_parts.append(f'{v}.fillna(\'\').astype(str)')
+             else:
+                 vec_parts.append(f'str({v})')
          return " + ".join(vec_parts)
 
      for func_name in sorted(INFA_FUNC_MAP.keys(), key=lambda x: -len(x)):
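Reviewer note on the hunk above: the three-way branch can be exercised on its own with a small sketch. `coerce_concat_parts` is a hypothetical wrapper that is not in the package, and it assumes each part has already been converted to a Python source fragment, where the real code calls `_vec_recursive` on each part first. It shows how quoted literals, Series-like fragments, and scalar-returning calls are each treated.

```python
def coerce_concat_parts(parts, df_var="df"):
    """Join converted fragments with '+', coercing each one to a string."""
    vec_parts = []
    for v in parts:
        if v.startswith("'") and v.endswith("'"):
            # String literal: use as-is.
            vec_parts.append(v)
        elif v.startswith(df_var + "[") or v.startswith("pd.") or ".str." in v:
            # Looks like a pandas Series expression: null-safe string cast.
            vec_parts.append(f"{v}.fillna('').astype(str)")
        else:
            # Anything else is treated as a scalar and wrapped in str().
            vec_parts.append(f"str({v})")
    return " + ".join(vec_parts)


generated = coerce_concat_parts(
    ["'PER_'", 'df["X"]', 'resolve_builtin_variable("PMMappingName")']
)
```

Before this change the `else` branch appended `.fillna('').astype(str)` to every non-literal, which would fail at runtime on a plain Python string returned by a helper such as `resolve_builtin_variable()`.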
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: informatica-python
- Version: 1.9.7
+ Version: 1.9.9
  Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
  Author: Nick
  License: MIT
@@ -124,7 +124,7 @@ The code generator produces real, runnable Python for these transformation types
  - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
  - **Filter** — Row filtering with vectorized converted conditions
  - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
- - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution, SQL override support, table caching via `lookup_func()`
  - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
  - **Sorter** — `sort_values()` with multi-key ascending/descending per-field direction from SORTDIRECTION attribute
  - **Router** — Multi-group conditional routing with named groups
@@ -196,7 +196,7 @@ Column-level pandas operations instead of row-level iteration. The expression co
  - `REPLACECHR/REPLACESTR` → `.str.replace()`
  - `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
  - `CHR(code)` → `chr(int(code))`
- - `||` concatenation → `+` with `.astype(str)` on non-literals
+ - `||` concatenation → `+` with smart coercion: `.fillna('').astype(str)` for Series, `str()` for scalars
 
  **Date/Time:**
  - `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
@@ -343,10 +343,12 @@ Target field datatypes are mapped to pandas types and generate proper casting co
  - Decimals/Floats: `pd.to_numeric(errors='coerce')`
  - Booleans: `.astype('boolean')`
 
- ### Flat File Handling (v1.3+)
+ ### Flat File Handling (v1.3+, enhanced v1.9.8)
 
  Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
+ **Fixed-width enhancements (v1.9.8):** `OFFSET`, `PHYSICALLENGTH`, and `PHYSICALOFFSET` are parsed from `SOURCEFIELD` attributes. `physical_length` is preferred over `precision` for accurate column width calculations in `pd.read_fwf()`.
+
  ### Mapplet Inlining (v1.3+)
 
  Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
@@ -371,12 +373,17 @@ The generated `helper_functions.py` provides a complete runtime library:
  ### Database Operations
  | Function | Description |
  |----------|-------------|
- | `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
+ | `get_db_connection(config, conn_name)` | SQLAlchemy-first DB connection with engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql |
  | `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
  | `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
- | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
+ | `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement; auto-detects SQLAlchemy vs DBAPI via `dialect` attribute |
  | `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
  | `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |
+ | `lookup_func(table, *args)` | Full lookup implementation with table caching, condition parsing, and default value support |
+ | `resolve_env(value)` | Resolve `${VAR}` placeholders from environment variables with config fallback |
+ | `resolve_builtin_variable(var_name, ...)` | Resolve `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc. |
+ | `rename_with_duplicates(df, col_map)` | Safe column rename supporting one-source-to-many-target mapping |
+ | `_safe_close(conn)` | Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections |
 
  ### File Operations
  | Function | Description |
@@ -407,7 +414,34 @@ The generated `helper_functions.py` provides a complete runtime library:
 
  ## Changelog
 
- ### v1.9.3 (Current)
+ ### v1.9.8 (Current)
+ - **NOT(expr) function-call form**: `NOT(ISNULL(x))` now correctly converts to `~(df["x"].isna())` — handles both `NOT ` (with space) and `NOT(` (without space) forms
+ - **AND/OR/NOT as field names fix**: Logical operators no longer mangled into `df["AND"]` / `df["OR"]` — conversion moved before field substitution in both `_vec_recursive` fallback and `_vectorize_simple`
+ - **Condition tokenizer word-boundary fix**: `_split_condition_tokens` no longer splits on `OR` inside field names like `DeletedIndicator` — verifies preceding character is a real word boundary
+ - **`$PMMappingName` in expressions**: `$PM*` built-in variables in expression context properly convert to `resolve_builtin_variable("PMMappingName")` instead of being mangled to `$df["PMMappingName"]`
+ - **TO_CHAR arithmetic parenthesization**: `TO_CHAR(TO_INTEGER(x) - 1)` now produces `(pd.to_numeric(...) - 1).astype(str)` instead of incorrect `- 1.astype(str)` binding
+ - **String literal early-return fix**: Expressions like `'PER_' || X || '_suffix'` no longer short-circuit as a single string literal
+ - **Fixed-width file enhancements**: `OFFSET`, `PHYSICALLENGTH`, `PHYSICALOFFSET` parsed from SOURCEFIELD XML; `physical_length` preferred over `precision` for `read_fwf` column widths
+ - **Smart concat coercion**: Scalar returns (e.g. `resolve_builtin_variable()`, `get_variable()`) use `str()` wrapping; Series use `.fillna('').astype(str)`
+ - **700 tests** passing
+
+ ### v1.9.5 / v1.9.6
+ - **`rename_with_duplicates`** helper for one-source-to-many-target column mapping
+ - **`resolve_env()`** for `${VAR}` placeholder resolution (env → config fallback)
+ - **`resolve_builtin_variable()`** for `$PMMappingName`, `$PMSessionName`, `$PMFolderName`, etc.
+ - **SQLAlchemy-first `get_db_connection`**: Engine caching and connection pooling; DBAPI fallback for pyodbc/pymssql
+ - **`_safe_close()`**: Safe connection cleanup handling both SQLAlchemy and raw DBAPI connections
+ - **Full `lookup_func()` implementation**: Table caching, condition parsing, default value support
+ - **Null-safe `||` concatenation**: `.fillna('').astype(str)` prevents "nan" strings in concatenation
+ - **`$PM*` variable substitution in SQL Override queries**
+ - **`execute_sql` dialect detection**: Uses `dialect` attribute to choose SQLAlchemy `text()` vs DBAPI `cursor.execute()`
+ - **678 tests** passing
+
+ ### v1.9.4
+ - Extended expression function coverage and edge-case fixes
+ - Improved mapplet and connector handling
+
+ ### v1.9.3
  - **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
  - **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
  - **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
@@ -495,7 +529,7 @@ The generated `helper_functions.py` provides a complete runtime library:
  cd informatica_python
  pip install -e ".[dev]"
 
- # Run tests (663 tests)
+ # Run tests (700 tests)
  pytest tests/ -v
  ```
 
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "informatica-python"
- version = "1.9.7"
+ version = "1.9.9"
  description = "Convert Informatica PowerCenter workflow XML to Python/PySpark code"
  readme = "README.md"
  license = {text = "MIT"}