informatica-python 1.8.1__tar.gz → 1.9.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. {informatica_python-1.8.1 → informatica_python-1.9.0}/PKG-INFO +162 -18
  2. informatica_python-1.9.0/README.md +345 -0
  3. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/__init__.py +1 -1
  4. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/converter.py +10 -1
  5. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/generators/workflow_gen.py +2 -2
  6. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python.egg-info/PKG-INFO +162 -18
  7. {informatica_python-1.8.1 → informatica_python-1.9.0}/pyproject.toml +1 -1
  8. {informatica_python-1.8.1 → informatica_python-1.9.0}/tests/test_converter.py +6 -1
  9. {informatica_python-1.8.1 → informatica_python-1.9.0}/tests/test_integration.py +4 -4
  10. informatica_python-1.8.1/README.md +0 -201
  11. {informatica_python-1.8.1 → informatica_python-1.9.0}/LICENSE +0 -0
  12. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/cli.py +0 -0
  13. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/generators/__init__.py +0 -0
  14. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/generators/config_gen.py +0 -0
  15. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/generators/error_log_gen.py +0 -0
  16. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/generators/helper_gen.py +0 -0
  17. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/generators/mapping_gen.py +0 -0
  18. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/generators/sql_gen.py +0 -0
  19. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/models.py +0 -0
  20. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/parser.py +0 -0
  21. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/utils/__init__.py +0 -0
  22. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/utils/datatype_map.py +0 -0
  23. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/utils/expression_converter.py +0 -0
  24. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/utils/lib_adapters.py +0 -0
  25. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python/utils/sql_dialect.py +0 -0
  26. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python.egg-info/SOURCES.txt +0 -0
  27. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python.egg-info/dependency_links.txt +0 -0
  28. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python.egg-info/entry_points.txt +0 -0
  29. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python.egg-info/requires.txt +0 -0
  30. {informatica_python-1.8.1 → informatica_python-1.9.0}/informatica_python.egg-info/top_level.txt +0 -0
  31. {informatica_python-1.8.1 → informatica_python-1.9.0}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: informatica-python
- Version: 1.8.1
+ Version: 1.9.0
  Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
  Author: Nick
  License: MIT
@@ -59,6 +59,12 @@ informatica-python workflow_export.xml -z output.zip
  # Use a different data library
  informatica-python workflow_export.xml -o output_dir --data-lib polars

+ # Include a parameter file
+ informatica-python workflow_export.xml -o output_dir --param-file workflow.param
+
+ # Enable data quality validation on type casts
+ informatica-python workflow_export.xml -o output_dir --validate-casts
+
  # Parse to JSON only (no code generation)
  informatica-python workflow_export.xml --json

@@ -90,12 +96,12 @@ converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars

  | File | Description |
  |------|-------------|
- | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions) |
- | `mapping_N.py` | One per mapping — transformation logic, source reads, target writes |
- | `workflow.py` | Task orchestration with topological ordering and error handling |
+ | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions), window/analytic functions, stored procedure execution, state persistence |
+ | `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with row-count logging, source reads, target writes, inline documentation |
+ | `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
  | `config.yml` | Connection configs, source/target metadata, runtime parameters |
- | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms |
- | `error_log.txt` | Conversion summary, warnings, and unsupported feature notes |
+ | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms (with ANSI-translated variants) |
+ | `error_log.txt` | Conversion summary with unsupported transform analysis, unmapped port detection, and unknown expression function tracing |

  ## Supported Data Libraries

@@ -113,21 +119,21 @@ Select via `--data-lib` CLI flag or `data_lib` parameter:

  The code generator produces real, runnable Python for these transformation types:

- - **Source Qualifier** — SQL override, pre/post SQL, column selection
- - **Expression** — Field-level expressions converted to pandas operations
- - **Filter** — Row filtering with converted conditions
- - **Joiner** — `pd.merge()` with join type and condition parsing
- - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads
- - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST
+ - **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides
+ - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style)
+ - **Filter** — Row filtering with vectorized converted conditions
+ - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads, multiple match policies, default values
+ - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
  - **Sorter** — `sort_values()` with multi-key ascending/descending
- - **Router** — Multi-group conditional routing with if/elif/else
+ - **Router** — Multi-group conditional routing with named groups
  - **Union** — `pd.concat()` across multiple input groups
- - **Update Strategy** — Insert/Update/Delete/Reject flag generation
+ - **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys
  - **Sequence Generator** — Auto-incrementing ID columns
  - **Normalizer** — `pd.melt()` with auto-detected id/value vars
  - **Rank** — `groupby().rank()` with Top-N filtering
- - **Stored Procedure** — Stub generation with SP name and parameters
- - **Transaction Control** — Commit/rollback logic stubs
+ - **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
+ - **Transaction Control** — Commit/rollback logic
  - **Custom / Java** — Placeholder stubs with TODO markers
  - **SQL Transform** — Direct SQL execution pass-through

@@ -147,13 +153,118 @@ The code generator produces real, runnable Python for these transformation types

  ## Key Features

+ ### Row-Count Logging (v1.8+)
+
+ Generated code automatically logs row counts at every step of the data pipeline:
+
+ ```
+ Source SQ_CUSTOMERS: 10000 rows read
+ EXP_CALC (Expression): 10000 input rows -> 10000 output rows
+ FIL_ACTIVE (Filter): 10000 input rows -> 8542 output rows
+ AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
+ Target TGT_SUMMARY: 150 rows written
+ ```
+
+ All row-count operations are backend-safe (wrapped in try/except), so Dask and other lazy-evaluation backends won't fail.
+
+ ### Generated Code Documentation (v1.8+)
+
+ Every generated mapping function includes a rich docstring describing:
+ - Mapping name and original Informatica description
+ - Source and target tables/files
+ - Transformation pipeline with field counts per step
+
+ Each transformation block is annotated with:
+ - Separator headers for visual scanning
+ - Transform type and description (from Informatica XML)
+ - Input and output field lists (truncated at 10 for readability)
+
+ ### Window / Analytic Functions (v1.7+)
+
+ DataFrame-level analytic functions for aggregation transforms:
+ - `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
+ - `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
+ - `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
+ - `percentile_df(df, col, pct)` — quantile via `.quantile()`
+
+ ### Update Strategy with Target Operations (v1.7+)
+
+ Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
+ - Static strategies (0/1/2/3) map to INSERT/UPDATE/DELETE/REJECT
+ - DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT expressions parsed from conditions
+ - Target writer splits rows and routes to appropriate SQL operations
+ - Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
+ - Primary key columns auto-detected from target field definitions
+
+ ### Stored Procedure Execution (v1.7+)
+
+ Full stored procedure code generation (not just stubs):
+ - Oracle: `cursor.callproc()` with output parameter registration
+ - MSSQL: `EXEC` with output parameter capture
+ - Generic: `CALL` syntax for other databases
+ - Input/output parameter mapping from transformation fields
+ - Empty-input guard prevents errors on empty upstream DataFrames
+
+ ### State Persistence (v1.7+)
+
+ JSON-based variable persistence between workflow runs:
+ - `load_persistent_state()` / `save_persistent_state()` bracketing workflow execution
+ - `get_persistent_variable()` / `set_persistent_variable()` scoped by workflow/mapping name
+ - Mapping variables marked `is_persistent="YES"` automatically load from and save to state file
+ - Non-persistent variables remain unaffected
+
+ ### SQL Dialect Translation (v1.6+)
+
+ Automatically translates vendor-specific SQL to ANSI equivalents:
+ - **Oracle:** NVL→COALESCE, SYSDATE→CURRENT_TIMESTAMP, DECODE→CASE, NVL2→CASE, (+)→ANSI JOIN, ROWNUM→LIMIT
+ - **MSSQL:** GETDATE→CURRENT_TIMESTAMP, ISNULL→COALESCE, TOP N→LIMIT, LEN→LENGTH, CHARINDEX→POSITION
+ - Auto-detects source dialect; outputs both original and translated SQL
+
+ ### Enhanced Error Reporting (v1.6+)
+
+ Structured error log with three analysis sections:
+ - **Unsupported Transforms:** Lists each skipped transform with type, field count, and attributes
+ - **Unmapped Ports:** OUTPUT fields not connected to any downstream transform
+ - **Unsupported Expression Functions:** Unknown functions with location traces
+
+ ### Nested Mapplet Support (v1.6+)
+
+ Recursively expands mapplet-within-mapplet instances:
+ - Double-underscore namespacing for nested transforms
+ - Depth limit of 10 with circular reference protection
+ - Connector rewiring through the full expansion tree
+
+ ### Data Quality Validation (v1.6+)
+
+ Optional `--validate-casts` flag generates null-count checks before/after type casting:
+ - Counts null values pre- and post-coercion per column
+ - Logs warnings when coercion introduces new nulls
+ - Helps identify data quality issues during test runs
+
+ ### Vectorized Expression Generation (v1.5+)
+
+ Column-level pandas operations instead of row-level iteration:
+ - IIF → `np.where()`, NVL → `.fillna()`, UPPER/LOWER → `.str.upper()/.str.lower()`
+ - SUBSTR → `.str[start:end]`, TO_INTEGER → `pd.to_numeric()`, TO_DATE → `pd.to_datetime()`
+ - IS NULL/IS NOT NULL → `.isna()`/`.notna()`
+
+ ### Parameter File Support (v1.5+)
+
+ Standard Informatica `.param` file parsing:
+ - `[Global]` and `[folder.WF:workflow.ST:session]` section support
+ - `get_param(config, var_name)` resolution chain: config → env vars → defaults
+ - CLI `--param-file` flag for specifying parameter files
+
  ### Session Connection Overrides (v1.4+)
+
  When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.

  ### Worklet Support (v1.4+)
+
  Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.

  ### Type Casting at Target Writes (v1.4+)
+
  Target field datatypes are mapped to pandas types and generate proper casting code:
  - Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
  - Dates: `pd.to_datetime(errors='coerce')`
@@ -161,12 +272,15 @@ Target field datatypes are mapped to pandas types and generate proper casting co
  - Booleans: `.astype('boolean')`

  ### Flat File Handling (v1.3+)
+
  Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.

  ### Mapplet Inlining (v1.3+)
+
  Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.

  ### Decision Tasks (v1.3+)
+
  Converts Informatica decision conditions to Python if/else branches with proper variable substitution.

  ### Expression Converter (80+ Functions)
@@ -190,6 +304,36 @@ Converts Informatica expressions to Python equivalents:

  ## Changelog

+ ### v1.9.x (Phase 8)
+ - Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
+ - Workflow imports automatically match the named mapping files
+
+ ### v1.8.x (Phase 7)
+ - Row-count logging at every pipeline step (source reads, transforms, target writes)
+ - Backend-safe logging (try/except wrapped for Dask/lazy backends)
+ - Rich mapping function docstrings with sources, targets, and transform pipeline summary
+ - Per-transform documentation headers with description, input/output field lists
+
+ ### v1.7.x (Phase 6)
+ - Window/analytic functions (rolling avg/sum, cumulative sum, percentile)
+ - Update Strategy routing with actual INSERT/UPDATE/DELETE target operations
+ - Dialect-aware SQL placeholders for MSSQL/PostgreSQL/Oracle
+ - Full stored procedure code generation (Oracle/MSSQL/generic)
+ - JSON-based state persistence for mapping and workflow variables
+ - Primary key auto-detection for update strategy targets
+
+ ### v1.6.x (Phase 5)
+ - SQL dialect translation (Oracle/MSSQL → ANSI)
+ - Enhanced error reporting (unsupported transforms, unmapped ports, unknown functions)
+ - Nested mapplet expansion with circular reference protection
+ - Data quality validation warnings on type casting (`--validate-casts`)
+
+ ### v1.5.x (Phase 4)
+ - Parameter file support (`.param` files with section parsing)
+ - Vectorized expression generation (column-level pandas operations)
+ - Library-specific code adapters (polars/dask/modin/vaex syntax generation)
+ - 72+ integration tests
+
  ### v1.4.x (Phase 3)
  - Session connection overrides for sources and targets
  - Worklet function generation with safe invocation
@@ -217,8 +361,8 @@ Converts Informatica expressions to Python equivalents:
  cd informatica_python
  pip install -e ".[dev]"

- # Run tests (25 tests)
- pytest tests/test_converter.py -v
+ # Run tests (136 tests)
+ pytest tests/ -v
  ```

  ## License
@@ -0,0 +1,345 @@
+ # informatica-python
+
+ Convert Informatica PowerCenter workflow XML exports into clean, runnable Python/PySpark code.
+
+ **Author:** Nick
+ **License:** MIT
+ **PyPI:** [informatica-python](https://pypi.org/project/informatica-python/)
+
+ ---
+
+ ## Overview
+
+ `informatica-python` parses Informatica PowerCenter XML export files and generates equivalent Python code using your choice of data library. It handles all 72 DTD tags from the PowerCenter XML schema and produces a complete, ready-to-run Python project.
+
+ ## Installation
+
+ ```bash
+ pip install informatica-python
+ ```
+
+ ## Quick Start
+
+ ### Command Line
+
+ ```bash
+ # Generate Python files to a directory
+ informatica-python workflow_export.xml -o output_dir
+
+ # Generate as a zip archive
+ informatica-python workflow_export.xml -z output.zip
+
+ # Use a different data library
+ informatica-python workflow_export.xml -o output_dir --data-lib polars
+
+ # Include a parameter file
+ informatica-python workflow_export.xml -o output_dir --param-file workflow.param
+
+ # Enable data quality validation on type casts
+ informatica-python workflow_export.xml -o output_dir --validate-casts
+
+ # Parse to JSON only (no code generation)
+ informatica-python workflow_export.xml --json
+
+ # Save parsed JSON to file
+ informatica-python workflow_export.xml --json-file parsed.json
+ ```
+
+ ### Python API
+
+ ```python
+ from informatica_python import InformaticaConverter
+
+ converter = InformaticaConverter()
+
+ # Parse and generate files
+ converter.convert_to_files("workflow_export.xml", "output_dir")
+
+ # Parse and generate zip
+ converter.convert_to_zip("workflow_export.xml", "output.zip")
+
+ # Parse to structured dict
+ result = converter.parse_file("workflow_export.xml")
+
+ # Use a different data library
+ converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars")
+ ```
+
+ ## Generated Output Files
+
+ | File | Description |
+ |------|-------------|
+ | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions), window/analytic functions, stored procedure execution, state persistence |
+ | `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with row-count logging, source reads, target writes, inline documentation |
+ | `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
+ | `config.yml` | Connection configs, source/target metadata, runtime parameters |
+ | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms (with ANSI-translated variants) |
+ | `error_log.txt` | Conversion summary with unsupported transform analysis, unmapped port detection, and unknown expression function tracing |
+
+ ## Supported Data Libraries
+
+ Select via `--data-lib` CLI flag or `data_lib` parameter:
+
+ | Library | Flag | Best For |
+ |---------|------|----------|
+ | **pandas** | `pandas` (default) | General-purpose, most compatible |
+ | **dask** | `dask` | Large datasets, parallel processing |
+ | **polars** | `polars` | High performance, Rust-backed |
+ | **vaex** | `vaex` | Out-of-core, billion-row datasets |
+ | **modin** | `modin` | Drop-in pandas replacement, multi-core |
+
+ ## Supported Transformations
+
+ The code generator produces real, runnable Python for these transformation types:
+
+ - **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides
+ - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style)
+ - **Filter** — Row filtering with vectorized converted conditions
+ - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads, multiple match policies, default values
+ - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
+ - **Sorter** — `sort_values()` with multi-key ascending/descending
+ - **Router** — Multi-group conditional routing with named groups
+ - **Union** — `pd.concat()` across multiple input groups
+ - **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys
+ - **Sequence Generator** — Auto-incrementing ID columns
+ - **Normalizer** — `pd.melt()` with auto-detected id/value vars
+ - **Rank** — `groupby().rank()` with Top-N filtering
+ - **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
+ - **Transaction Control** — Commit/rollback logic
+ - **Custom / Java** — Placeholder stubs with TODO markers
+ - **SQL Transform** — Direct SQL execution pass-through
+
+ ## Supported XML Tags (72 Tags)
+
+ **Top-level:** POWERMART, REPOSITORY, FOLDER, FOLDERVERSION
+
+ **Source/Target:** SOURCE, SOURCEFIELD, TARGET, TARGETFIELD, TARGETINDEX, TARGETINDEXFIELD, FLATFILE, XMLINFO, XMLTEXT, GROUP, TABLEATTRIBUTE, FIELDATTRIBUTE, METADATAEXTENSION, KEYWORD, ERPSRCINFO
+
+ **Mapping/Mapplet:** MAPPING, MAPPLET, TRANSFORMATION, TRANSFORMFIELD, TRANSFORMFIELDATTR, TRANSFORMFIELDATTRDEF, INSTANCE, ASSOCIATED_SOURCE_INSTANCE, CONNECTOR, MAPDEPENDENCY, TARGETLOADORDER, MAPPINGVARIABLE, FIELDDEPENDENCY, INITPROP, ERPINFO
+
+ **Task/Session/Workflow:** TASK, TIMER, VALUEPAIR, SCHEDULER, SCHEDULEINFO, STARTOPTIONS, ENDOPTIONS, SCHEDULEOPTIONS, RECURRING, CUSTOM, DAILYFREQUENCY, REPEAT, FILTER, SESSION, CONFIGREFERENCE, SESSTRANSFORMATIONINST, SESSTRANSFORMATIONGROUP, PARTITION, HASHKEY, KEYRANGE, CONFIG, SESSIONCOMPONENT, CONNECTIONREFERENCE, TASKINSTANCE, WORKFLOWLINK, WORKFLOWVARIABLE, WORKFLOWEVENT, WORKLET, WORKFLOW, ATTRIBUTE
+
+ **Shortcut:** SHORTCUT
+
+ **SAP:** SAPFUNCTION, SAPSTRUCTURE, SAPPROGRAM, SAPOUTPUTPORT, SAPVARIABLE, SAPPROGRAMFLOWOBJECT, SAPTABLEPARAM
+
+ ## Key Features
+
+ ### Row-Count Logging (v1.8+)
+
+ Generated code automatically logs row counts at every step of the data pipeline:
+
+ ```
+ Source SQ_CUSTOMERS: 10000 rows read
+ EXP_CALC (Expression): 10000 input rows -> 10000 output rows
+ FIL_ACTIVE (Filter): 10000 input rows -> 8542 output rows
+ AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
+ Target TGT_SUMMARY: 150 rows written
+ ```
+
+ All row-count operations are backend-safe (wrapped in try/except), so Dask and other lazy-evaluation backends won't fail.
+
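The backend-safe counting pattern described above can be sketched as follows. This is an illustration, not the package's actual generated code; `safe_row_count` and `log_step` are hypothetical names.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("workflow")


def safe_row_count(df):
    """Return len(df), or None when the backend cannot count cheaply
    (a lazy backend may raise here rather than materialize the frame)."""
    try:
        return len(df)
    except Exception:
        return None


def log_step(name, kind, df_in, df_out):
    """Emit a 'NAME (Kind): N input rows -> M output rows' line,
    silently skipping the log when either count is unavailable."""
    n_in, n_out = safe_row_count(df_in), safe_row_count(df_out)
    if n_in is not None and n_out is not None:
        log.info("%s (%s): %d input rows -> %d output rows",
                 name, kind, n_in, n_out)
```

Because the count is wrapped rather than assumed, the same generated code runs unchanged on pandas and on backends where counting would force a compute.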
+ ### Generated Code Documentation (v1.8+)
+
+ Every generated mapping function includes a rich docstring describing:
+ - Mapping name and original Informatica description
+ - Source and target tables/files
+ - Transformation pipeline with field counts per step
+
+ Each transformation block is annotated with:
+ - Separator headers for visual scanning
+ - Transform type and description (from Informatica XML)
+ - Input and output field lists (truncated at 10 for readability)
+
+ ### Window / Analytic Functions (v1.7+)
+
+ DataFrame-level analytic functions for aggregation transforms:
+ - `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
+ - `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
+ - `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
+ - `percentile_df(df, col, pct)` — quantile via `.quantile()`
+
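A minimal pandas sketch of the four helpers listed above, assuming `pct` is given on a 0 to 100 scale as in Informatica's PERCENTILE; the package's actual implementations may differ in edge-case handling:

```python
import pandas as pd


def moving_avg_df(df, col, window):
    # Rolling mean; min_periods=1 keeps the first rows non-null
    # (an assumption here, not necessarily the package's choice)
    return df[col].rolling(window, min_periods=1).mean()


def moving_sum_df(df, col, window):
    # Rolling sum over the previous `window` rows
    return df[col].rolling(window, min_periods=1).sum()


def cume_df(df, col):
    # Running (cumulative) sum, like Informatica's CUME
    return df[col].expanding().sum()


def percentile_df(df, col, pct):
    # Quantile of the whole column; pct assumed as 0-100
    return df[col].quantile(pct / 100.0)
```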
+ ### Update Strategy with Target Operations (v1.7+)
+
+ Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
+ - Static strategies (0/1/2/3) map to INSERT/UPDATE/DELETE/REJECT
+ - DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT expressions parsed from conditions
+ - Target writer splits rows and routes to appropriate SQL operations
+ - Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
+ - Primary key columns auto-detected from target field definitions
+
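The splitting step can be sketched like this. The flag column name `_UPDATE_STRATEGY` and the `split_by_strategy`/`placeholder` helpers are hypothetical illustrations of the routing idea, not the package's API:

```python
import pandas as pd

# Informatica's static strategy codes 0-3
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def split_by_strategy(df, flag_col="_UPDATE_STRATEGY"):
    """Split rows by DD flag into per-operation frames;
    DD_REJECT rows are simply not returned."""
    return {
        "insert": df[df[flag_col] == DD_INSERT].drop(columns=flag_col),
        "update": df[df[flag_col] == DD_UPDATE].drop(columns=flag_col),
        "delete": df[df[flag_col] == DD_DELETE].drop(columns=flag_col),
    }


def placeholder(dialect):
    """Dialect-aware SQL parameter placeholder, as described above."""
    return "?" if dialect == "mssql" else "%s"
```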
+ ### Stored Procedure Execution (v1.7+)
+
+ Full stored procedure code generation (not just stubs):
+ - Oracle: `cursor.callproc()` with output parameter registration
+ - MSSQL: `EXEC` with output parameter capture
+ - Generic: `CALL` syntax for other databases
+ - Input/output parameter mapping from transformation fields
+ - Empty-input guard prevents errors on empty upstream DataFrames
+
+ ### State Persistence (v1.7+)
+
+ JSON-based variable persistence between workflow runs:
+ - `load_persistent_state()` / `save_persistent_state()` bracketing workflow execution
+ - `get_persistent_variable()` / `set_persistent_variable()` scoped by workflow/mapping name
+ - Mapping variables marked `is_persistent="YES"` automatically load from and save to state file
+ - Non-persistent variables remain unaffected
+
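A minimal sketch of the four persistence helpers named above, assuming a simple `{scope: {variable: value}}` JSON layout; the package's actual file format and default path are not documented here, so both are assumptions:

```python
import json
import os

STATE_FILE = "workflow_state.json"  # hypothetical default path


def load_persistent_state(path=STATE_FILE):
    """Load the JSON state file, or start empty on the first run."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {}


def save_persistent_state(state, path=STATE_FILE):
    """Write the state back out at the end of the workflow run."""
    with open(path, "w") as fh:
        json.dump(state, fh, indent=2)


def get_persistent_variable(state, scope, name, default=None):
    # Variables are scoped by workflow/mapping name
    return state.get(scope, {}).get(name, default)


def set_persistent_variable(state, scope, name, value):
    state.setdefault(scope, {})[name] = value
```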
+ ### SQL Dialect Translation (v1.6+)
+
+ Automatically translates vendor-specific SQL to ANSI equivalents:
+ - **Oracle:** NVL→COALESCE, SYSDATE→CURRENT_TIMESTAMP, DECODE→CASE, NVL2→CASE, (+)→ANSI JOIN, ROWNUM→LIMIT
+ - **MSSQL:** GETDATE→CURRENT_TIMESTAMP, ISNULL→COALESCE, TOP N→LIMIT, LEN→LENGTH, CHARINDEX→POSITION
+ - Auto-detects source dialect; outputs both original and translated SQL
+
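The simple token-level rewrites can be sketched with plain regex substitution, shown here for a few of the listed mappings; the structural ones (DECODE→CASE, `(+)` joins, ROWNUM) need real parsing, so this is an illustration of the idea rather than the package's translator:

```python
import re

# A small subset of the Oracle/MSSQL -> ANSI rewrites listed above
RULES = [
    (re.compile(r"\bNVL\s*\(", re.I), "COALESCE("),
    (re.compile(r"\bSYSDATE\b", re.I), "CURRENT_TIMESTAMP"),
    (re.compile(r"\bGETDATE\s*\(\s*\)", re.I), "CURRENT_TIMESTAMP"),
    (re.compile(r"\bISNULL\s*\(", re.I), "COALESCE("),
    (re.compile(r"\bLEN\s*\(", re.I), "LENGTH("),
]


def to_ansi(sql):
    """Apply each rewrite in order and return the translated SQL."""
    for pattern, repl in RULES:
        sql = pattern.sub(repl, sql)
    return sql
```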
+ ### Enhanced Error Reporting (v1.6+)
+
+ Structured error log with three analysis sections:
+ - **Unsupported Transforms:** Lists each skipped transform with type, field count, and attributes
+ - **Unmapped Ports:** OUTPUT fields not connected to any downstream transform
+ - **Unsupported Expression Functions:** Unknown functions with location traces
+
+ ### Nested Mapplet Support (v1.6+)
+
+ Recursively expands mapplet-within-mapplet instances:
+ - Double-underscore namespacing for nested transforms
+ - Depth limit of 10 with circular reference protection
+ - Connector rewiring through the full expansion tree
+
+ ### Data Quality Validation (v1.6+)
+
+ Optional `--validate-casts` flag generates null-count checks before/after type casting:
+ - Counts null values pre- and post-coercion per column
+ - Logs warnings when coercion introduces new nulls
+ - Helps identify data quality issues during test runs
+
+ ### Vectorized Expression Generation (v1.5+)
+
+ Column-level pandas operations instead of row-level iteration:
+ - IIF → `np.where()`, NVL → `.fillna()`, UPPER/LOWER → `.str.upper()/.str.lower()`
+ - SUBSTR → `.str[start:end]`, TO_INTEGER → `pd.to_numeric()`, TO_DATE → `pd.to_datetime()`
+ - IS NULL/IS NOT NULL → `.isna()`/`.notna()`
+
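For example, an Expression transform computing `NVL(STATUS, 'U')` and `IIF(STATUS = 'A', 1, 0)` compiles to column-level operations like these (the column names are illustrative, not from a real mapping):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"STATUS": ["A", None, "I"], "AMT": ["10", "x", "30"]})

# NVL(STATUS, 'U') -> .fillna()
df["STATUS_CLEAN"] = df["STATUS"].fillna("U")

# IIF(STATUS_CLEAN = 'A', 1, 0) -> np.where() over a boolean mask
df["IS_ACTIVE"] = np.where(df["STATUS_CLEAN"] == "A", 1, 0)

# TO_INTEGER(AMT) -> pd.to_numeric(); non-numeric values become NaN
df["AMT_NUM"] = pd.to_numeric(df["AMT"], errors="coerce")
```

Each statement operates on the whole column at once, which is what makes the generated code fast compared with per-row iteration.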
+ ### Parameter File Support (v1.5+)
+
+ Standard Informatica `.param` file parsing:
+ - `[Global]` and `[folder.WF:workflow.ST:session]` section support
+ - `get_param(config, var_name)` resolution chain: config → env vars → defaults
+ - CLI `--param-file` flag for specifying parameter files
+
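A simplified sketch of the parsing and resolution chain, assuming the common `[Section]` / `$$VAR=value` layout; `parse_param_file` here is illustrative, and `get_param` mirrors the documented chain of config, then environment variables, then a default:

```python
import os


def parse_param_file(text):
    """Parse .param text into {section: {var: value}} (simplified)."""
    params, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            params.setdefault(section, {})
        elif "=" in line and section is not None:
            name, _, value = line.partition("=")
            params[section][name.strip()] = value.strip()
    return params


def get_param(config, name, default=None):
    """Resolution chain: config dict -> environment -> default."""
    if name in config:
        return config[name]
    return os.environ.get(name, default)
```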
+ ### Session Connection Overrides (v1.4+)
+
+ When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.
+
+ ### Worklet Support (v1.4+)
+
+ Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.
+
+ ### Type Casting at Target Writes (v1.4+)
+
+ Target field datatypes are mapped to pandas types and generate proper casting code:
+ - Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
+ - Dates: `pd.to_datetime(errors='coerce')`
+ - Decimals/Floats: `pd.to_numeric(errors='coerce')`
+ - Booleans: `.astype('boolean')`
+
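The casting rules above can be sketched for one hypothetical target (the column names and nullability choices are illustrative, not from a real mapping):

```python
import pandas as pd


def cast_target_columns(df):
    """Apply the documented casts: nullable Int64 for a NULL-able ID,
    fillna(0).astype(int) for a NOT NULL count, coerced datetimes."""
    out = df.copy()
    # Nullable integer keeps missing IDs as <NA> instead of failing
    out["CUST_ID"] = pd.to_numeric(out["CUST_ID"], errors="coerce").astype("Int64")
    # NOT NULL integer: fill missing/bad values, then plain int
    out["QTY"] = pd.to_numeric(out["QTY"], errors="coerce").fillna(0).astype(int)
    # Dates: unparseable values become NaT instead of raising
    out["ORDER_DT"] = pd.to_datetime(out["ORDER_DT"], errors="coerce")
    return out
```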
247
+ ### Flat File Handling (v1.3+)
248
+
249
+ Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
250
+
251
+ ### Mapplet Inlining (v1.3+)
252
+
253
+ Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
254
+
255
+ ### Decision Tasks (v1.3+)
256
+
257
+ Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
258
+
259
+ ### Expression Converter (80+ Functions)
260
+
261
+ Converts Informatica expressions to Python equivalents:
262
+
263
+ - **String:** SUBSTR, LTRIM, RTRIM, UPPER, LOWER, LPAD, RPAD, INSTR, LENGTH, CONCAT, REPLACE, REG_EXTRACT, REG_REPLACE, REVERSE, INITCAP, CHR, ASCII
264
+ - **Date:** ADD_TO_DATE, DATE_DIFF, GET_DATE_PART, SYSDATE, SYSTIMESTAMP, TO_DATE, TO_CHAR, TRUNC (date)
265
+ - **Numeric:** ROUND, TRUNC, MOD, ABS, CEIL, FLOOR, POWER, SQRT, LOG, EXP, SIGN
266
+ - **Conversion:** TO_INTEGER, TO_BIGINT, TO_FLOAT, TO_DECIMAL, TO_CHAR, TO_DATE
267
+ - **Null handling:** IIF, DECODE, NVL, NVL2, ISNULL, IS_SPACES, IS_NUMBER
268
+ - **Aggregate:** SUM, AVG, COUNT, MIN, MAX, FIRST, LAST, MEDIAN, STDDEV, VARIANCE
269
+ - **Lookup:** :LKP expressions with dynamic lookup references
270
+ - **Variable:** SETVARIABLE / mapping variable assignment
271
+
272
+ ## Requirements
273
+
274
+ - Python >= 3.8
275
+ - lxml >= 4.9.0
276
+ - PyYAML >= 6.0
277
+
278
+ ## Changelog
279
+
280
+ ### v1.9.x (Phase 8)
281
+ - Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
282
+ - Workflow imports automatically match the named mapping files
283
+
284
+ ### v1.8.x (Phase 7)
285
+ - Row-count logging at every pipeline step (source reads, transforms, target writes)
286
+ - Backend-safe logging (try/except wrapped for Dask/lazy backends)
287
+ - Rich mapping function docstrings with sources, targets, and transform pipeline summary
288
+ - Per-transform documentation headers with description, input/output field lists
289
+
290
+ ### v1.7.x (Phase 6)
291
+ - Window/analytic functions (rolling avg/sum, cumulative sum, percentile)
292
+ - Update Strategy routing with actual INSERT/UPDATE/DELETE target operations
293
+ - Dialect-aware SQL placeholders for MSSQL/PostgreSQL/Oracle
294
+ - Full stored procedure code generation (Oracle/MSSQL/generic)
295
+ - JSON-based state persistence for mapping and workflow variables
296
+ - Primary key auto-detection for update strategy targets
297
+
298
+ ### v1.6.x (Phase 5)
299
+ - SQL dialect translation (Oracle/MSSQL → ANSI)
300
+ - Enhanced error reporting (unsupported transforms, unmapped ports, unknown functions)
301
+ - Nested mapplet expansion with circular reference protection
302
+ - Data quality validation warnings on type casting (`--validate-casts`)
303
+
304
+ ### v1.5.x (Phase 4)
305
+ - Parameter file support (`.param` files with section parsing)
306
+ - Vectorized expression generation (column-level pandas operations)
307
+ - Library-specific code adapters (polars/dask/modin/vaex syntax generation)
308
+ - 72+ integration tests
309
+
310
+ ### v1.4.x (Phase 3)
311
+ - Session connection overrides for sources and targets
312
+ - Worklet function generation with safe invocation
313
+ - Type casting at target writes based on TARGETFIELD datatypes
314
+ - Flat-file session path overrides properly wired
315
+
316
+ ### v1.3.x (Phase 2)
317
+ - FLATFILE metadata in source reads and target writes
318
+ - Normalizer with `pd.melt()`
319
+ - Rank with group-by and Top-N filtering
320
+ - Decision tasks with real if/else branches
321
+ - Mapplet instance inlining
322
+
323
+ ### v1.2.x (Phase 1)
324
+ - Core parser for all 72 XML tags
325
+ - Expression converter with 80+ functions
326
+ - Aggregator, Joiner, Lookup code generation
327
+ - Workflow orchestration with topological task ordering
328
+ - Multi-library support (pandas, dask, polars, vaex, modin)
329
+
330
+ ## Development
331
+
332
+ ```bash
333
+ # Clone and install in development mode
334
+ cd informatica_python
335
+ pip install -e ".[dev]"
336
+
337
+ # Run the full test suite (136 tests)
338
+ pytest tests/ -v
339
+ ```
340
+
341
+ ## License
342
+
343
+ MIT License - Copyright (c) 2025 Nick
344
+
345
+ See [LICENSE](LICENSE) for details.
@@ -7,7 +7,7 @@ Licensed under the MIT License.
7
7
 
8
8
  from informatica_python.converter import InformaticaConverter
9
9
 
10
- __version__ = "1.8.1"
10
+ __version__ = "1.9.0"
11
11
  __author__ = "Nick"
12
12
  __license__ = "MIT"
13
13
  __all__ = ["InformaticaConverter"]
@@ -19,6 +19,14 @@ class InformaticaConverter:
19
19
  self.parser = InformaticaParser()
20
20
  self.powermart = None
21
21
 
22
+ @staticmethod
23
+ def _safe_name(name):
24
+ import re
25
+ safe = re.sub(r'[^a-zA-Z0-9_]', '_', name)
26
+ if safe and safe[0].isdigit():
27
+ safe = '_' + safe
28
+ return safe.lower()
29
+
22
30
  def parse_file(self, file_path: str) -> dict:
23
31
  self.powermart = self.parser.parse_file(file_path)
24
32
  return self.to_json()
@@ -102,7 +110,8 @@ class InformaticaConverter:
102
110
 
103
111
  for i, mapping in enumerate(folder.mappings, 1):
104
112
  code = generate_mapping_code(mapping, folder, self.data_lib, i, validate_casts=validate_casts)
105
- files[f"mapping_{i}.py"] = code
113
+ safe_name = self._safe_name(mapping.name)
114
+ files[f"mapping_{safe_name}.py"] = code
106
115
 
107
116
  files["workflow.py"] = generate_workflow_code(folder)
108
117
 
@@ -25,9 +25,9 @@ def generate_workflow_code(folder: FolderDef) -> str:
25
25
  lines.append("from helper_functions import load_config, logger, load_persistent_state, save_persistent_state, get_persistent_variable, set_persistent_variable")
26
26
  lines.append("")
27
27
 
28
- for i, mapping in enumerate(folder.mappings, 1):
28
+ for mapping in folder.mappings:
29
29
  safe_name = _safe_name(mapping.name)
30
- lines.append(f"from mapping_{i} import run_{safe_name}")
30
+ lines.append(f"from mapping_{safe_name} import run_{safe_name}")
31
31
  lines.append("")
32
32
  lines.append("")
33
33
 
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: informatica-python
3
- Version: 1.8.1
3
+ Version: 1.9.0
4
4
  Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
5
5
  Author: Nick
6
6
  License: MIT
@@ -59,6 +59,12 @@ informatica-python workflow_export.xml -z output.zip
59
59
  # Use a different data library
60
60
  informatica-python workflow_export.xml -o output_dir --data-lib polars
61
61
 
62
+ # Include a parameter file
63
+ informatica-python workflow_export.xml -o output_dir --param-file workflow.param
64
+
65
+ # Enable data quality validation on type casts
66
+ informatica-python workflow_export.xml -o output_dir --validate-casts
67
+
62
68
  # Parse to JSON only (no code generation)
63
69
  informatica-python workflow_export.xml --json
64
70
 
@@ -90,12 +96,12 @@ converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars
90
96
 
91
97
  | File | Description |
92
98
  |------|-------------|
93
- | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions) |
94
- | `mapping_N.py` | One per mapping — transformation logic, source reads, target writes |
95
- | `workflow.py` | Task orchestration with topological ordering and error handling |
99
+ | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions), window/analytic functions, stored procedure execution, state persistence |
100
+ | `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with row-count logging, source reads, target writes, inline documentation |
101
+ | `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
96
102
  | `config.yml` | Connection configs, source/target metadata, runtime parameters |
97
- | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms |
98
- | `error_log.txt` | Conversion summary, warnings, and unsupported feature notes |
103
+ | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms (with ANSI-translated variants) |
104
+ | `error_log.txt` | Conversion summary with unsupported transform analysis, unmapped port detection, and unknown expression function tracing |
99
105
 
100
106
  ## Supported Data Libraries
101
107
 
@@ -113,21 +119,21 @@ Select via `--data-lib` CLI flag or `data_lib` parameter:
113
119
 
114
120
  The code generator produces real, runnable Python for these transformation types:
115
121
 
116
- - **Source Qualifier** — SQL override, pre/post SQL, column selection
117
- - **Expression** — Field-level expressions converted to pandas operations
118
- - **Filter** — Row filtering with converted conditions
119
- - **Joiner** — `pd.merge()` with join type and condition parsing
120
- - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads
121
- - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST
122
+ - **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides
123
+ - **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style)
124
+ - **Filter** — Row filtering with vectorized converted conditions
125
+ - **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
126
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads, multiple match policies, default values
127
+ - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
122
128
  - **Sorter** — `sort_values()` with multi-key ascending/descending
123
- - **Router** — Multi-group conditional routing with if/elif/else
129
+ - **Router** — Multi-group conditional routing with named groups
124
130
  - **Union** — `pd.concat()` across multiple input groups
125
- - **Update Strategy** — Insert/Update/Delete/Reject flag generation
131
+ - **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys
126
132
  - **Sequence Generator** — Auto-incrementing ID columns
127
133
  - **Normalizer** — `pd.melt()` with auto-detected id/value vars
128
134
  - **Rank** — `groupby().rank()` with Top-N filtering
129
- - **Stored Procedure** — Stub generation with SP name and parameters
130
- - **Transaction Control** — Commit/rollback logic stubs
135
+ - **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
136
+ - **Transaction Control** — Commit/rollback logic
131
137
  - **Custom / Java** — Placeholder stubs with TODO markers
132
138
  - **SQL Transform** — Direct SQL execution pass-through
133
139
 
@@ -147,13 +153,118 @@ The code generator produces real, runnable Python for these transformation types
147
153
 
148
154
  ## Key Features
149
155
 
156
+ ### Row-Count Logging (v1.8+)
157
+
158
+ Generated code automatically logs row counts at every step of the data pipeline:
159
+
160
+ ```
161
+ Source SQ_CUSTOMERS: 10000 rows read
162
+ EXP_CALC (Expression): 10000 input rows -> 10000 output rows
163
+ FIL_ACTIVE (Filter): 10000 input rows -> 8542 output rows
164
+ AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
165
+ Target TGT_SUMMARY: 150 rows written
166
+ ```
167
+
168
+ All row-count operations are backend-safe (wrapped in try/except), so lazy-evaluation backends such as Dask, where a row count may require triggering computation, won't fail at the logging step.
169
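The try/except wrapping described above can be sketched as a small helper. This is an illustrative version, not the exact code emitted into `helper_functions.py`:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("workflow")

def log_row_count(df, label):
    """Log a row count without ever failing the pipeline."""
    try:
        logger.info("%s: %d rows", label, len(df))
    except Exception:
        # Lazy backends (e.g. Dask) can raise on len(); log and move on.
        logger.info("%s: row count unavailable for this backend", label)

# Works for any object with len(); a plain list stands in for a DataFrame.
log_row_count([{"ID": 1}, {"ID": 2}], "Source SQ_CUSTOMERS")
```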
+
170
+ ### Generated Code Documentation (v1.8+)
171
+
172
+ Every generated mapping function includes a rich docstring describing:
173
+ - Mapping name and original Informatica description
174
+ - Source and target tables/files
175
+ - Transformation pipeline with field counts per step
176
+
177
+ Each transformation block is annotated with:
178
+ - Separator headers for visual scanning
179
+ - Transform type and description (from Informatica XML)
180
+ - Input and output field lists (truncated at 10 for readability)
181
+
182
+ ### Window / Analytic Functions (v1.7+)
183
+
184
+ DataFrame-level analytic functions for aggregation transforms:
185
+ - `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
186
+ - `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
187
+ - `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
188
+ - `percentile_df(df, col, pct)` — quantile via `.quantile()`
189
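The arithmetic behind these helpers can be sketched in plain Python; the generated versions call `.rolling()`/`.expanding()` on a DataFrame column, but the semantics are the same (including pandas' behavior of yielding no value until a full window is available):

```python
def moving_avg(values, window):
    # Rolling mean: None until a full window is available, like pandas.
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

def cume(values):
    # Expanding (cumulative) sum.
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out

print(moving_avg([10, 20, 30, 40], 2))  # [None, 15.0, 25.0, 35.0]
print(cume([1, 2, 3]))                  # [1, 3, 6]
```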
+
190
+ ### Update Strategy with Target Operations (v1.7+)
191
+
192
+ Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
193
+ - Static strategies (0/1/2/3) map to INSERT/UPDATE/DELETE/REJECT
194
+ - DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT expressions parsed from conditions
195
+ - Target writer splits rows and routes to appropriate SQL operations
196
+ - Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
197
+ - Primary key columns auto-detected from target field definitions
198
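The routing step can be sketched over plain dict rows; the generated code does the same split on a DataFrame before issuing per-bucket SQL. The `_strategy` column name is illustrative:

```python
# Informatica's static update-strategy codes.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def route_rows(rows):
    """Split rows into insert/update/delete buckets; rejects are dropped."""
    buckets = {"insert": [], "update": [], "delete": []}
    names = {DD_INSERT: "insert", DD_UPDATE: "update", DD_DELETE: "delete"}
    for row in rows:
        op = names.get(row.get("_strategy", DD_INSERT))
        if op is not None:  # DD_REJECT maps to nothing and is dropped
            buckets[op].append(row)
    return buckets

def placeholder(dialect):
    # '?' for MSSQL, '%s' otherwise, per the dialect notes above.
    return "?" if dialect == "mssql" else "%s"

demo = [{"ID": 1, "_strategy": DD_INSERT},
        {"ID": 2, "_strategy": DD_UPDATE},
        {"ID": 3, "_strategy": DD_REJECT}]
print({k: len(v) for k, v in route_rows(demo).items()})
```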
+
199
+ ### Stored Procedure Execution (v1.7+)
200
+
201
+ Full stored procedure code generation (not just stubs):
202
+ - Oracle: `cursor.callproc()` with output parameter registration
203
+ - MSSQL: `EXEC` with output parameter capture
204
+ - Generic: `CALL` syntax for other databases
205
+ - Input/output parameter mapping from transformation fields
206
+ - Empty-input guard prevents errors on empty upstream DataFrames
207
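The dialect split can be sketched as call-statement generation, without a live database connection. Procedure and parameter names here are illustrative; note that Oracle drivers typically use `cursor.callproc(name, params)` rather than a SQL string:

```python
def build_call(dialect, proc_name, params):
    """Build a dialect-appropriate stored-procedure call statement."""
    ph = ", ".join("?" if dialect == "mssql" else "%s" for _ in params)
    if dialect == "mssql":
        return f"EXEC {proc_name} {ph}"
    if dialect == "oracle":
        # Shown as an anonymous block for illustration; real code would
        # usually prefer cursor.callproc(proc_name, params).
        return f"BEGIN {proc_name}({ph}); END;"
    return f"CALL {proc_name}({ph})"

print(build_call("mssql", "usp_load_customers", ["p1", "p2"]))
print(build_call("generic", "usp_load_customers", ["p1"]))
```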
+
208
+ ### State Persistence (v1.7+)
209
+
210
+ JSON-based variable persistence between workflow runs:
211
+ - `load_persistent_state()` / `save_persistent_state()` bracketing workflow execution
212
+ - `get_persistent_variable()` / `set_persistent_variable()` scoped by workflow/mapping name
213
+ - Mapping variables marked `is_persistent="YES"` automatically load from and save to state file
214
+ - Non-persistent variables remain unaffected
215
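A minimal sketch of JSON-backed state, assuming variables are scoped by a `"<workflow>.<variable>"` key — the generated helpers may differ in file layout and key format:

```python
import json
import os
import tempfile

def load_persistent_state(path):
    """Return the saved state dict, or {} on the first run."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {}

def save_persistent_state(path, state):
    with open(path, "w") as fh:
        json.dump(state, fh, indent=2)

state_file = os.path.join(tempfile.mkdtemp(), "workflow_state.json")
state = load_persistent_state(state_file)      # {} on first run
state["wf_daily.$$LAST_RUN_ID"] = 42           # illustrative variable name
save_persistent_state(state_file, state)
```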
+
216
+ ### SQL Dialect Translation (v1.6+)
217
+
218
+ Automatically translates vendor-specific SQL to ANSI equivalents:
219
+ - **Oracle:** NVL→COALESCE, SYSDATE→CURRENT_TIMESTAMP, DECODE→CASE, NVL2→CASE, (+)→ANSI JOIN, ROWNUM→LIMIT
220
+ - **MSSQL:** GETDATE→CURRENT_TIMESTAMP, ISNULL→COALESCE, TOP N→LIMIT, LEN→LENGTH, CHARINDEX→POSITION
221
+ - Auto-detects source dialect; outputs both original and translated SQL
222
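A regex-based sketch of a few of the rewrites named above; a real translator needs proper SQL parsing for anything non-trivial (nested functions, string literals, comments), so treat this only as a shape:

```python
import re

# (pattern, replacement) pairs for a handful of Oracle/MSSQL idioms.
RULES = [
    (re.compile(r"\bNVL\s*\(", re.I), "COALESCE("),
    (re.compile(r"\bSYSDATE\b", re.I), "CURRENT_TIMESTAMP"),
    (re.compile(r"\bGETDATE\s*\(\s*\)", re.I), "CURRENT_TIMESTAMP"),
]

def to_ansi(sql):
    for pattern, repl in RULES:
        sql = pattern.sub(repl, sql)
    return sql

print(to_ansi("SELECT NVL(name, 'n/a'), SYSDATE FROM emp"))
```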
+
223
+ ### Enhanced Error Reporting (v1.6+)
224
+
225
+ Structured error log with three analysis sections:
226
+ - **Unsupported Transforms:** Each skipped transform, with its type, field count, and attributes
227
+ - **Unmapped Ports:** OUTPUT fields not connected to any downstream transform
228
+ - **Unsupported Expression Functions:** Unknown functions with location traces
229
+
230
+ ### Nested Mapplet Support (v1.6+)
231
+
232
+ Recursively expands mapplet-within-mapplet instances:
233
+ - Double-underscore namespacing for nested transforms
234
+ - Depth limit of 10 with circular reference protection
235
+ - Connector rewiring through the full expansion tree
236
+
237
+ ### Data Quality Validation (v1.6+)
238
+
239
+ Optional `--validate-casts` flag generates null-count checks before/after type casting:
240
+ - Counts null values pre- and post-coercion per column
241
+ - Logs warnings when coercion introduces new nulls
242
+ - Helps identify data quality issues during test runs
243
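The pre/post null-count check can be sketched like this; the generated code does the same per column around `pd.to_numeric(errors='coerce')`, with plain lists standing in for a column here:

```python
def coerce_numeric(values):
    """Mimic pd.to_numeric(errors='coerce') on a plain list."""
    out = []
    for v in values:
        try:
            out.append(float(v))
        except (TypeError, ValueError):
            out.append(None)  # unparseable values become null
    return out

def validate_cast(values, column="COL"):
    before = sum(v is None for v in values)
    coerced = coerce_numeric(values)
    after = sum(v is None for v in coerced)
    if after > before:
        print(f"WARNING: casting {column} introduced {after - before} new null(s)")
    return coerced

validate_cast(["1", "2", "oops", None])  # warns: 1 new null
```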
+
244
+ ### Vectorized Expression Generation (v1.5+)
245
+
246
+ Column-level pandas operations instead of row-level iteration:
247
+ - IIF → `np.where()`, NVL → `.fillna()`, UPPER/LOWER → `.str.upper()/.str.lower()`
248
+ - SUBSTR → `.str[start:end]`, TO_INTEGER → `pd.to_numeric()`, TO_DATE → `pd.to_datetime()`
249
+ - IS NULL/IS NOT NULL → `.isna()`/`.notna()`
250
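The row-wise semantics of the first two rewrites can be shown without pandas; in generated code these become `np.where(...)` and `Series.fillna(...)` over whole columns:

```python
def iif_vectorized(conditions, if_true, if_false):
    # Elementwise ternary, the semantics of np.where(cond, a, b).
    return [t if c else f for c, t, f in zip(conditions, if_true, if_false)]

def nvl_fill(values, default):
    # Null replacement, the semantics of Series.fillna(default).
    return [default if v is None else v for v in values]

print(iif_vectorized([True, False], ["Y", "Y"], ["N", "N"]))  # ['Y', 'N']
print(nvl_fill([1, None, 3], 0))                              # [1, 0, 3]
```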
+
251
+ ### Parameter File Support (v1.5+)
252
+
253
+ Standard Informatica `.param` file parsing:
254
+ - `[Global]` and `[folder.WF:workflow.ST:session]` section support
255
+ - `get_param(config, var_name)` resolution chain: config → env vars → defaults
256
+ - CLI `--param-file` flag for specifying parameter files
257
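Section parsing for a `.param` file can be sketched with a small hand-rolled parser; the folder, workflow, and variable names below are illustrative:

```python
def parse_param_file(text):
    """Parse [section] headers and key=value lines into nested dicts."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            sections[current][key.strip()] = value.strip()
    return sections

sample = """\
[Global]
$$RUN_DATE=2025-01-01
[FOLDER.WF:wf_daily.ST:s_m_customer_load]
$$SRC_DIR=/data/in
"""
params = parse_param_file(sample)
print(params["Global"]["$$RUN_DATE"])  # 2025-01-01
```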
+
150
258
  ### Session Connection Overrides (v1.4+)
259
+
151
260
  When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.
152
261
 
153
262
  ### Worklet Support (v1.4+)
263
+
154
264
  Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.
155
265
 
156
266
  ### Type Casting at Target Writes (v1.4+)
267
+
157
268
  Target field datatypes are mapped to pandas types and generate proper casting code:
158
269
  - Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
159
270
  - Dates: `pd.to_datetime(errors='coerce')`
@@ -161,12 +272,15 @@ Target field datatypes are mapped to pandas types and generate proper casting co
161
272
  - Booleans: `.astype('boolean')`
162
273
 
163
274
  ### Flat File Handling (v1.3+)
275
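The datatype mapping can be sketched as a lookup table; the Informatica type names and the exact map below are illustrative, with the pandas calls the generated code would emit noted in comments:

```python
CAST_MAP = {
    "integer": "Int64",        # pd.to_numeric(df[c]).astype('Int64')
    "small integer": "Int32",
    "decimal": "float64",      # pd.to_numeric(df[c], errors='coerce')
    "date/time": "datetime",   # pd.to_datetime(df[c], errors='coerce')
    "string": "object",
}

def pandas_dtype_for(informatica_type, not_null=False):
    dtype = CAST_MAP.get(informatica_type.lower(), "object")
    if dtype == "Int64" and not_null:
        # NOT NULL integers: fillna(0).astype(int), per the notes above.
        return "int64"
    return dtype

print(pandas_dtype_for("integer"))                  # Int64
print(pandas_dtype_for("integer", not_null=True))   # int64
```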
+
164
276
  Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
165
277
 
166
278
  ### Mapplet Inlining (v1.3+)
279
+
167
280
  Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
168
281
 
169
282
  ### Decision Tasks (v1.3+)
283
+
170
284
  Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
171
285
 
172
286
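The substitution step can be sketched as below. The `eval` call is only a compact stand-in for this illustration — as the section says, the generated workflow contains real if/else branches, not runtime evaluation — and the variable names are illustrative:

```python
def evaluate_decision(condition, variables):
    """Substitute $$VAR references, then evaluate the boolean expression."""
    expr = condition
    for name, value in variables.items():
        expr = expr.replace(name, repr(value))
    return bool(eval(expr))  # sketch only; generated code emits if/else

if evaluate_decision("$$ROW_COUNT > 0", {"$$ROW_COUNT": 150}):
    branch = "run_downstream"
else:
    branch = "skip"
print(branch)  # run_downstream
```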
  ### Expression Converter (80+ Functions)
@@ -190,6 +304,36 @@ Converts Informatica expressions to Python equivalents:
190
304
 
191
305
  ## Changelog
192
306
 
307
+ ### v1.9.x (Phase 8)
308
+ - Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
309
+ - Workflow imports automatically match the named mapping files
310
+
311
+ ### v1.8.x (Phase 7)
312
+ - Row-count logging at every pipeline step (source reads, transforms, target writes)
313
+ - Backend-safe logging (try/except wrapped for Dask/lazy backends)
314
+ - Rich mapping function docstrings with sources, targets, and transform pipeline summary
315
+ - Per-transform documentation headers with description, input/output field lists
316
+
317
+ ### v1.7.x (Phase 6)
318
+ - Window/analytic functions (rolling avg/sum, cumulative sum, percentile)
319
+ - Update Strategy routing with actual INSERT/UPDATE/DELETE target operations
320
+ - Dialect-aware SQL placeholders for MSSQL/PostgreSQL/Oracle
321
+ - Full stored procedure code generation (Oracle/MSSQL/generic)
322
+ - JSON-based state persistence for mapping and workflow variables
323
+ - Primary key auto-detection for update strategy targets
324
+
325
+ ### v1.6.x (Phase 5)
326
+ - SQL dialect translation (Oracle/MSSQL → ANSI)
327
+ - Enhanced error reporting (unsupported transforms, unmapped ports, unknown functions)
328
+ - Nested mapplet expansion with circular reference protection
329
+ - Data quality validation warnings on type casting (`--validate-casts`)
330
+
331
+ ### v1.5.x (Phase 4)
332
+ - Parameter file support (`.param` files with section parsing)
333
+ - Vectorized expression generation (column-level pandas operations)
334
+ - Library-specific code adapters (polars/dask/modin/vaex syntax generation)
335
+ - 72+ integration tests
336
+
193
337
  ### v1.4.x (Phase 3)
194
338
  - Session connection overrides for sources and targets
195
339
  - Worklet function generation with safe invocation
@@ -217,8 +361,8 @@ Converts Informatica expressions to Python equivalents:
217
361
  cd informatica_python
218
362
  pip install -e ".[dev]"
219
363
 
220
- # Run tests (25 tests)
221
- pytest tests/test_converter.py -v
364
+ # Run the full test suite (136 tests)
365
+ pytest tests/ -v
222
366
  ```
223
367
 
224
368
  ## License
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "informatica-python"
7
- version = "1.8.1"
7
+ version = "1.9.0"
8
8
  description = "Convert Informatica PowerCenter workflow XML to Python/PySpark code"
9
9
  readme = "README.md"
10
10
  license = {text = "MIT"}
@@ -104,7 +104,6 @@ def test_convert_to_files():
104
104
 
105
105
  expected_files = [
106
106
  "helper_functions.py",
107
- "mapping_1.py",
108
107
  "workflow.py",
109
108
  "config.yml",
110
109
  "all_sql_queries.sql",
@@ -118,6 +117,12 @@ def test_convert_to_files():
118
117
  assert size > 0, f"{f} should not be empty"
119
118
  print(f" {f}: {size} bytes")
120
119
 
120
+ mapping_files = [f for f in os.listdir(output_dir) if f.startswith("mapping_") and f.endswith(".py")]
121
+ assert len(mapping_files) > 0, "At least one mapping file should exist"
122
+ for mf in mapping_files:
123
+ assert mf != "mapping_1.py", "Mapping files should use real mapping names, not numeric indices"
124
+ print(f" {mf}: {os.path.getsize(os.path.join(output_dir, mf))} bytes")
125
+
121
126
  print(f"PASS: test_convert_to_files")
122
127
 
123
128
 
@@ -337,7 +337,7 @@ class TestCodeGeneration:
337
337
  converter = InformaticaConverter(data_lib="pandas")
338
338
  output = converter.convert_string(MINIMAL_XML, output_dir=self.tmpdir)
339
339
 
340
- mapping_path = os.path.join(output, "mapping_1.py")
340
+ mapping_path = os.path.join(output, "mapping_m_test_expr.py")
341
341
  assert os.path.exists(mapping_path)
342
342
 
343
343
  with open(mapping_path) as f:
@@ -350,7 +350,7 @@ class TestCodeGeneration:
350
350
  converter = InformaticaConverter(data_lib="pandas")
351
351
  output = converter.convert_string(FILTER_XML, output_dir=self.tmpdir)
352
352
 
353
- mapping_path = os.path.join(output, "mapping_1.py")
353
+ mapping_path = os.path.join(output, "mapping_m_test_filter.py")
354
354
  with open(mapping_path) as f:
355
355
  code = f.read()
356
356
  assert 'Filter' in code
@@ -378,7 +378,7 @@ class TestCodeGeneration:
378
378
  output = converter.convert_string(MINIMAL_XML, output_dir=self.tmpdir)
379
379
 
380
380
  expected_files = [
381
- "helper_functions.py", "mapping_1.py", "workflow.py",
381
+ "helper_functions.py", "mapping_m_test_expr.py", "workflow.py",
382
382
  "config.yml", "all_sql_queries.sql", "error_log.txt",
383
383
  ]
384
384
  for fname in expected_files:
@@ -398,7 +398,7 @@ class TestCodeGeneration:
398
398
  converter = InformaticaConverter(data_lib="pandas")
399
399
  output = converter.convert_string(MINIMAL_XML, output_dir=self.tmpdir)
400
400
 
401
- mapping_path = os.path.join(output, "mapping_1.py")
401
+ mapping_path = os.path.join(output, "mapping_m_test_expr.py")
402
402
  with open(mapping_path) as f:
403
403
  code = f.read()
404
404
  compile(code, mapping_path, "exec")
@@ -1,201 +0,0 @@
1
- # informatica-python
2
-
3
- Convert Informatica PowerCenter workflow XML exports into clean, runnable Python/PySpark code.
4
-
5
- **Author:** Nick
6
- **License:** MIT
7
- **PyPI:** [informatica-python](https://pypi.org/project/informatica-python/)
8
-
9
- ---
10
-
11
- ## Overview
12
-
13
- `informatica-python` parses Informatica PowerCenter XML export files and generates equivalent Python code using your choice of data library. It handles all 72 DTD tags from the PowerCenter XML schema and produces a complete, ready-to-run Python project.
14
-
15
- ## Installation
16
-
17
- ```bash
18
- pip install informatica-python
19
- ```
20
-
21
- ## Quick Start
22
-
23
- ### Command Line
24
-
25
- ```bash
26
- # Generate Python files to a directory
27
- informatica-python workflow_export.xml -o output_dir
28
-
29
- # Generate as a zip archive
30
- informatica-python workflow_export.xml -z output.zip
31
-
32
- # Use a different data library
33
- informatica-python workflow_export.xml -o output_dir --data-lib polars
34
-
35
- # Parse to JSON only (no code generation)
36
- informatica-python workflow_export.xml --json
37
-
38
- # Save parsed JSON to file
39
- informatica-python workflow_export.xml --json-file parsed.json
40
- ```
41
-
42
- ### Python API
43
-
44
- ```python
45
- from informatica_python import InformaticaConverter
46
-
47
- converter = InformaticaConverter()
48
-
49
- # Parse and generate files
50
- converter.convert_to_files("workflow_export.xml", "output_dir")
51
-
52
- # Parse and generate zip
53
- converter.convert_to_zip("workflow_export.xml", "output.zip")
54
-
55
- # Parse to structured dict
56
- result = converter.parse_file("workflow_export.xml")
57
-
58
- # Use a different data library
59
- converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars")
60
- ```
61
-
62
- ## Generated Output Files
63
-
64
- | File | Description |
65
- |------|-------------|
66
- | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions) |
67
- | `mapping_N.py` | One per mapping — transformation logic, source reads, target writes |
68
- | `workflow.py` | Task orchestration with topological ordering and error handling |
69
- | `config.yml` | Connection configs, source/target metadata, runtime parameters |
70
- | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms |
71
- | `error_log.txt` | Conversion summary, warnings, and unsupported feature notes |
72
-
73
- ## Supported Data Libraries
74
-
75
- Select via `--data-lib` CLI flag or `data_lib` parameter:
76
-
77
- | Library | Flag | Best For |
78
- |---------|------|----------|
79
- | **pandas** | `pandas` (default) | General-purpose, most compatible |
80
- | **dask** | `dask` | Large datasets, parallel processing |
81
- | **polars** | `polars` | High performance, Rust-backed |
82
- | **vaex** | `vaex` | Out-of-core, billion-row datasets |
83
- | **modin** | `modin` | Drop-in pandas replacement, multi-core |
84
-
85
- ## Supported Transformations
86
-
87
- The code generator produces real, runnable Python for these transformation types:
88
-
89
- - **Source Qualifier** — SQL override, pre/post SQL, column selection
90
- - **Expression** — Field-level expressions converted to pandas operations
91
- - **Filter** — Row filtering with converted conditions
92
- - **Joiner** — `pd.merge()` with join type and condition parsing
93
- - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads
94
- - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST
95
- - **Sorter** — `sort_values()` with multi-key ascending/descending
96
- - **Router** — Multi-group conditional routing with if/elif/else
97
- - **Union** — `pd.concat()` across multiple input groups
98
- - **Update Strategy** — Insert/Update/Delete/Reject flag generation
99
- - **Sequence Generator** — Auto-incrementing ID columns
100
- - **Normalizer** — `pd.melt()` with auto-detected id/value vars
101
- - **Rank** — `groupby().rank()` with Top-N filtering
102
- - **Stored Procedure** — Stub generation with SP name and parameters
103
- - **Transaction Control** — Commit/rollback logic stubs
104
- - **Custom / Java** — Placeholder stubs with TODO markers
105
- - **SQL Transform** — Direct SQL execution pass-through
106
-
107
- ## Supported XML Tags (72 Tags)
108
-
109
- **Top-level:** POWERMART, REPOSITORY, FOLDER, FOLDERVERSION
110
-
111
- **Source/Target:** SOURCE, SOURCEFIELD, TARGET, TARGETFIELD, TARGETINDEX, TARGETINDEXFIELD, FLATFILE, XMLINFO, XMLTEXT, GROUP, TABLEATTRIBUTE, FIELDATTRIBUTE, METADATAEXTENSION, KEYWORD, ERPSRCINFO
112
-
113
- **Mapping/Mapplet:** MAPPING, MAPPLET, TRANSFORMATION, TRANSFORMFIELD, TRANSFORMFIELDATTR, TRANSFORMFIELDATTRDEF, INSTANCE, ASSOCIATED_SOURCE_INSTANCE, CONNECTOR, MAPDEPENDENCY, TARGETLOADORDER, MAPPINGVARIABLE, FIELDDEPENDENCY, INITPROP, ERPINFO
114
-
115
- **Task/Session/Workflow:** TASK, TIMER, VALUEPAIR, SCHEDULER, SCHEDULEINFO, STARTOPTIONS, ENDOPTIONS, SCHEDULEOPTIONS, RECURRING, CUSTOM, DAILYFREQUENCY, REPEAT, FILTER, SESSION, CONFIGREFERENCE, SESSTRANSFORMATIONINST, SESSTRANSFORMATIONGROUP, PARTITION, HASHKEY, KEYRANGE, CONFIG, SESSIONCOMPONENT, CONNECTIONREFERENCE, TASKINSTANCE, WORKFLOWLINK, WORKFLOWVARIABLE, WORKFLOWEVENT, WORKLET, WORKFLOW, ATTRIBUTE
116
-
117
- **Shortcut:** SHORTCUT
118
-
119
- **SAP:** SAPFUNCTION, SAPSTRUCTURE, SAPPROGRAM, SAPOUTPUTPORT, SAPVARIABLE, SAPPROGRAMFLOWOBJECT, SAPTABLEPARAM
120
-
121
- ## Key Features
122
-
123
- ### Session Connection Overrides (v1.4+)
124
- When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.
125
-
126
- ### Worklet Support (v1.4+)
127
- Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.
128
-
129
- ### Type Casting at Target Writes (v1.4+)
130
- Target field datatypes are mapped to pandas types and generate proper casting code:
131
- - Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
132
- - Dates: `pd.to_datetime(errors='coerce')`
133
- - Decimals/Floats: `pd.to_numeric(errors='coerce')`
134
- - Booleans: `.astype('boolean')`
135
-
136
- ### Flat File Handling (v1.3+)
137
- Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
138
-
139
- ### Mapplet Inlining (v1.3+)
140
- Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
141
-
142
- ### Decision Tasks (v1.3+)
143
- Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
144
-
145
- ### Expression Converter (80+ Functions)
146
-
147
- Converts Informatica expressions to Python equivalents:
148
-
149
- - **String:** SUBSTR, LTRIM, RTRIM, UPPER, LOWER, LPAD, RPAD, INSTR, LENGTH, CONCAT, REPLACE, REG_EXTRACT, REG_REPLACE, REVERSE, INITCAP, CHR, ASCII
150
- - **Date:** ADD_TO_DATE, DATE_DIFF, GET_DATE_PART, SYSDATE, SYSTIMESTAMP, TO_DATE, TO_CHAR, TRUNC (date)
151
- - **Numeric:** ROUND, TRUNC, MOD, ABS, CEIL, FLOOR, POWER, SQRT, LOG, EXP, SIGN
152
- - **Conversion:** TO_INTEGER, TO_BIGINT, TO_FLOAT, TO_DECIMAL, TO_CHAR, TO_DATE
153
- - **Null handling:** IIF, DECODE, NVL, NVL2, ISNULL, IS_SPACES, IS_NUMBER
154
- - **Aggregate:** SUM, AVG, COUNT, MIN, MAX, FIRST, LAST, MEDIAN, STDDEV, VARIANCE
155
- - **Lookup:** :LKP expressions with dynamic lookup references
156
- - **Variable:** SETVARIABLE / mapping variable assignment
157
-
158
- ## Requirements
159
-
160
- - Python >= 3.8
161
- - lxml >= 4.9.0
162
- - PyYAML >= 6.0
163
-
164
- ## Changelog
165
-
166
- ### v1.4.x (Phase 3)
167
- - Session connection overrides for sources and targets
168
- - Worklet function generation with safe invocation
169
- - Type casting at target writes based on TARGETFIELD datatypes
170
- - Flat-file session path overrides properly wired
171
-
172
- ### v1.3.x (Phase 2)
173
- - FLATFILE metadata in source reads and target writes
174
- - Normalizer with `pd.melt()`
175
- - Rank with group-by and Top-N filtering
176
- - Decision tasks with real if/else branches
177
- - Mapplet instance inlining
178
-
179
- ### v1.2.x (Phase 1)
180
- - Core parser for all 72 XML tags
181
- - Expression converter with 80+ functions
182
- - Aggregator, Joiner, Lookup code generation
183
- - Workflow orchestration with topological task ordering
184
- - Multi-library support (pandas, dask, polars, vaex, modin)
185
-
186
- ## Development
187
-
188
- ```bash
189
- # Clone and install in development mode
190
- cd informatica_python
191
- pip install -e ".[dev]"
192
-
193
- # Run tests (25 tests)
194
- pytest tests/test_converter.py -v
195
- ```
196
-
197
- ## License
198
-
199
- MIT License - Copyright (c) 2025 Nick
200
-
201
- See [LICENSE](LICENSE) for details.