informatica-python 1.4.0.tar.gz → 1.4.2.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (32)
  1. informatica_python-1.4.2/LICENSE +21 -0
  2. informatica_python-1.4.2/PKG-INFO +228 -0
  3. informatica_python-1.4.2/README.md +201 -0
  4. informatica_python-1.4.2/informatica_python/__init__.py +13 -0
  5. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/generators/mapping_gen.py +25 -11
  6. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/generators/workflow_gen.py +4 -3
  7. informatica_python-1.4.2/informatica_python.egg-info/PKG-INFO +228 -0
  8. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python.egg-info/SOURCES.txt +1 -0
  9. informatica_python-1.4.2/pyproject.toml +41 -0
  10. informatica_python-1.4.0/PKG-INFO +0 -118
  11. informatica_python-1.4.0/README.md +0 -106
  12. informatica_python-1.4.0/informatica_python/__init__.py +0 -4
  13. informatica_python-1.4.0/informatica_python.egg-info/PKG-INFO +0 -118
  14. informatica_python-1.4.0/pyproject.toml +0 -24
  15. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/cli.py +0 -0
  16. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/converter.py +0 -0
  17. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/generators/__init__.py +0 -0
  18. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/generators/config_gen.py +0 -0
  19. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/generators/error_log_gen.py +0 -0
  20. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/generators/helper_gen.py +0 -0
  21. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/generators/sql_gen.py +0 -0
  22. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/models.py +0 -0
  23. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/parser.py +0 -0
  24. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/utils/__init__.py +0 -0
  25. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/utils/datatype_map.py +0 -0
  26. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python/utils/expression_converter.py +0 -0
  27. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python.egg-info/dependency_links.txt +0 -0
  28. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python.egg-info/entry_points.txt +0 -0
  29. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python.egg-info/requires.txt +0 -0
  30. {informatica_python-1.4.0 → informatica_python-1.4.2}/informatica_python.egg-info/top_level.txt +0 -0
  31. {informatica_python-1.4.0 → informatica_python-1.4.2}/setup.cfg +0 -0
  32. {informatica_python-1.4.0 → informatica_python-1.4.2}/tests/test_converter.py +0 -0
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Nick
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,228 @@
1
+ Metadata-Version: 2.4
2
+ Name: informatica-python
3
+ Version: 1.4.2
4
+ Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
5
+ Author: Nick
6
+ License: MIT
7
+ Keywords: informatica,powercenter,etl,code-generator,pandas,pyspark,data-engineering
8
+ Classifier: Development Status :: 4 - Beta
9
+ Classifier: Intended Audience :: Developers
10
+ Classifier: License :: OSI Approved :: MIT License
11
+ Classifier: Programming Language :: Python :: 3
12
+ Classifier: Programming Language :: Python :: 3.8
13
+ Classifier: Programming Language :: Python :: 3.9
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Classifier: Topic :: Software Development :: Code Generators
18
+ Classifier: Topic :: Database :: Database Engines/Servers
19
+ Requires-Python: >=3.8
20
+ Description-Content-Type: text/markdown
21
+ License-File: LICENSE
22
+ Requires-Dist: lxml>=4.9.0
23
+ Requires-Dist: pyyaml>=6.0
24
+ Provides-Extra: dev
25
+ Requires-Dist: pytest>=7.0; extra == "dev"
26
+ Dynamic: license-file
27
+
28
+ # informatica-python
29
+
30
+ Convert Informatica PowerCenter workflow XML exports into clean, runnable Python/PySpark code.
31
+
32
+ **Author:** Nick
33
+ **License:** MIT
34
+ **PyPI:** [informatica-python](https://pypi.org/project/informatica-python/)
35
+
36
+ ---
37
+
38
+ ## Overview
39
+
40
+ `informatica-python` parses Informatica PowerCenter XML export files and generates equivalent Python code using your choice of data library. It handles all 72 DTD tags from the PowerCenter XML schema and produces a complete, ready-to-run Python project.
41
+
42
+ ## Installation
43
+
44
+ ```bash
45
+ pip install informatica-python
46
+ ```
47
+
48
+ ## Quick Start
49
+
50
+ ### Command Line
51
+
52
+ ```bash
53
+ # Generate Python files to a directory
54
+ informatica-python workflow_export.xml -o output_dir
55
+
56
+ # Generate as a zip archive
57
+ informatica-python workflow_export.xml -z output.zip
58
+
59
+ # Use a different data library
60
+ informatica-python workflow_export.xml -o output_dir --data-lib polars
61
+
62
+ # Parse to JSON only (no code generation)
63
+ informatica-python workflow_export.xml --json
64
+
65
+ # Save parsed JSON to file
66
+ informatica-python workflow_export.xml --json-file parsed.json
67
+ ```
68
+
69
+ ### Python API
70
+
71
+ ```python
72
+ from informatica_python import InformaticaConverter
73
+
74
+ converter = InformaticaConverter()
75
+
76
+ # Parse and generate files
77
+ converter.convert_to_files("workflow_export.xml", "output_dir")
78
+
79
+ # Parse and generate zip
80
+ converter.convert_to_zip("workflow_export.xml", "output.zip")
81
+
82
+ # Parse to structured dict
83
+ result = converter.parse_file("workflow_export.xml")
84
+
85
+ # Use a different data library
86
+ converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars")
87
+ ```
88
+
89
+ ## Generated Output Files
90
+
91
+ | File | Description |
92
+ |------|-------------|
93
+ | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions) |
94
+ | `mapping_N.py` | One per mapping — transformation logic, source reads, target writes |
95
+ | `workflow.py` | Task orchestration with topological ordering and error handling |
96
+ | `config.yml` | Connection configs, source/target metadata, runtime parameters |
97
+ | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms |
98
+ | `error_log.txt` | Conversion summary, warnings, and unsupported feature notes |
99
+
100
+ ## Supported Data Libraries
101
+
102
+ Select via `--data-lib` CLI flag or `data_lib` parameter:
103
+
104
+ | Library | Flag | Best For |
105
+ |---------|------|----------|
106
+ | **pandas** | `pandas` (default) | General-purpose, most compatible |
107
+ | **dask** | `dask` | Large datasets, parallel processing |
108
+ | **polars** | `polars` | High performance, Rust-backed |
109
+ | **vaex** | `vaex` | Out-of-core, billion-row datasets |
110
+ | **modin** | `modin` | Drop-in pandas replacement, multi-core |
111
+
112
+ ## Supported Transformations
113
+
114
+ The code generator produces real, runnable Python for these transformation types:
115
+
116
+ - **Source Qualifier** — SQL override, pre/post SQL, column selection
117
+ - **Expression** — Field-level expressions converted to pandas operations
118
+ - **Filter** — Row filtering with converted conditions
119
+ - **Joiner** — `pd.merge()` with join type and condition parsing
120
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads
121
+ - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST
122
+ - **Sorter** — `sort_values()` with multi-key ascending/descending
123
+ - **Router** — Multi-group conditional routing with if/elif/else
124
+ - **Union** — `pd.concat()` across multiple input groups
125
+ - **Update Strategy** — Insert/Update/Delete/Reject flag generation
126
+ - **Sequence Generator** — Auto-incrementing ID columns
127
+ - **Normalizer** — `pd.melt()` with auto-detected id/value vars
128
+ - **Rank** — `groupby().rank()` with Top-N filtering
129
+ - **Stored Procedure** — Stub generation with SP name and parameters
130
+ - **Transaction Control** — Commit/rollback logic stubs
131
+ - **Custom / Java** — Placeholder stubs with TODO markers
132
+ - **SQL Transform** — Direct SQL execution pass-through
133
+
134
+ ## Supported XML Tags (72 Tags)
135
+
136
+ **Top-level:** POWERMART, REPOSITORY, FOLDER, FOLDERVERSION
137
+
138
+ **Source/Target:** SOURCE, SOURCEFIELD, TARGET, TARGETFIELD, TARGETINDEX, TARGETINDEXFIELD, FLATFILE, XMLINFO, XMLTEXT, GROUP, TABLEATTRIBUTE, FIELDATTRIBUTE, METADATAEXTENSION, KEYWORD, ERPSRCINFO
139
+
140
+ **Mapping/Mapplet:** MAPPING, MAPPLET, TRANSFORMATION, TRANSFORMFIELD, TRANSFORMFIELDATTR, TRANSFORMFIELDATTRDEF, INSTANCE, ASSOCIATED_SOURCE_INSTANCE, CONNECTOR, MAPDEPENDENCY, TARGETLOADORDER, MAPPINGVARIABLE, FIELDDEPENDENCY, INITPROP, ERPINFO
141
+
142
+ **Task/Session/Workflow:** TASK, TIMER, VALUEPAIR, SCHEDULER, SCHEDULEINFO, STARTOPTIONS, ENDOPTIONS, SCHEDULEOPTIONS, RECURRING, CUSTOM, DAILYFREQUENCY, REPEAT, FILTER, SESSION, CONFIGREFERENCE, SESSTRANSFORMATIONINST, SESSTRANSFORMATIONGROUP, PARTITION, HASHKEY, KEYRANGE, CONFIG, SESSIONCOMPONENT, CONNECTIONREFERENCE, TASKINSTANCE, WORKFLOWLINK, WORKFLOWVARIABLE, WORKFLOWEVENT, WORKLET, WORKFLOW, ATTRIBUTE
143
+
144
+ **Shortcut:** SHORTCUT
145
+
146
+ **SAP:** SAPFUNCTION, SAPSTRUCTURE, SAPPROGRAM, SAPOUTPUTPORT, SAPVARIABLE, SAPPROGRAMFLOWOBJECT, SAPTABLEPARAM
147
+
148
+ ## Key Features
149
+
150
+ ### Session Connection Overrides (v1.4+)
151
+ When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.
152
+
153
+ ### Worklet Support (v1.4+)
154
+ Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.
155
+
156
+ ### Type Casting at Target Writes (v1.4+)
157
+ Target field datatypes are mapped to pandas types and generate proper casting code:
158
+ - Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
159
+ - Dates: `pd.to_datetime(errors='coerce')`
160
+ - Decimals/Floats: `pd.to_numeric(errors='coerce')`
161
+ - Booleans: `.astype('boolean')`
162
+
163
+ ### Flat File Handling (v1.3+)
164
+ Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
165
+
166
+ ### Mapplet Inlining (v1.3+)
167
+ Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
168
+
169
+ ### Decision Tasks (v1.3+)
170
+ Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
171
+
172
+ ### Expression Converter (80+ Functions)
173
+
174
+ Converts Informatica expressions to Python equivalents:
175
+
176
+ - **String:** SUBSTR, LTRIM, RTRIM, UPPER, LOWER, LPAD, RPAD, INSTR, LENGTH, CONCAT, REPLACE, REG_EXTRACT, REG_REPLACE, REVERSE, INITCAP, CHR, ASCII
177
+ - **Date:** ADD_TO_DATE, DATE_DIFF, GET_DATE_PART, SYSDATE, SYSTIMESTAMP, TO_DATE, TO_CHAR, TRUNC (date)
178
+ - **Numeric:** ROUND, TRUNC, MOD, ABS, CEIL, FLOOR, POWER, SQRT, LOG, EXP, SIGN
179
+ - **Conversion:** TO_INTEGER, TO_BIGINT, TO_FLOAT, TO_DECIMAL, TO_CHAR, TO_DATE
180
+ - **Null handling:** IIF, DECODE, NVL, NVL2, ISNULL, IS_SPACES, IS_NUMBER
181
+ - **Aggregate:** SUM, AVG, COUNT, MIN, MAX, FIRST, LAST, MEDIAN, STDDEV, VARIANCE
182
+ - **Lookup:** :LKP expressions with dynamic lookup references
183
+ - **Variable:** SETVARIABLE / mapping variable assignment
184
+
185
+ ## Requirements
186
+
187
+ - Python >= 3.8
188
+ - lxml >= 4.9.0
189
+ - PyYAML >= 6.0
190
+
191
+ ## Changelog
192
+
193
+ ### v1.4.x (Phase 3)
194
+ - Session connection overrides for sources and targets
195
+ - Worklet function generation with safe invocation
196
+ - Type casting at target writes based on TARGETFIELD datatypes
197
+ - Flat-file session path overrides properly wired
198
+
199
+ ### v1.3.x (Phase 2)
200
+ - FLATFILE metadata in source reads and target writes
201
+ - Normalizer with `pd.melt()`
202
+ - Rank with group-by and Top-N filtering
203
+ - Decision tasks with real if/else branches
204
+ - Mapplet instance inlining
205
+
206
+ ### v1.2.x (Phase 1)
207
+ - Core parser for all 72 XML tags
208
+ - Expression converter with 80+ functions
209
+ - Aggregator, Joiner, Lookup code generation
210
+ - Workflow orchestration with topological task ordering
211
+ - Multi-library support (pandas, dask, polars, vaex, modin)
212
+
213
+ ## Development
214
+
215
+ ```bash
216
+ # Clone and install in development mode
217
+ cd informatica_python
218
+ pip install -e ".[dev]"
219
+
220
+ # Run tests (25 tests)
221
+ pytest tests/test_converter.py -v
222
+ ```
223
+
224
+ ## License
225
+
226
+ MIT License - Copyright (c) 2025 Nick
227
+
228
+ See [LICENSE](LICENSE) for details.
@@ -0,0 +1,201 @@
1
+ # informatica-python
2
+
3
+ Convert Informatica PowerCenter workflow XML exports into clean, runnable Python/PySpark code.
4
+
5
+ **Author:** Nick
6
+ **License:** MIT
7
+ **PyPI:** [informatica-python](https://pypi.org/project/informatica-python/)
8
+
9
+ ---
10
+
11
+ ## Overview
12
+
13
+ `informatica-python` parses Informatica PowerCenter XML export files and generates equivalent Python code using your choice of data library. It handles all 72 DTD tags from the PowerCenter XML schema and produces a complete, ready-to-run Python project.
14
+
15
+ ## Installation
16
+
17
+ ```bash
18
+ pip install informatica-python
19
+ ```
20
+
21
+ ## Quick Start
22
+
23
+ ### Command Line
24
+
25
+ ```bash
26
+ # Generate Python files to a directory
27
+ informatica-python workflow_export.xml -o output_dir
28
+
29
+ # Generate as a zip archive
30
+ informatica-python workflow_export.xml -z output.zip
31
+
32
+ # Use a different data library
33
+ informatica-python workflow_export.xml -o output_dir --data-lib polars
34
+
35
+ # Parse to JSON only (no code generation)
36
+ informatica-python workflow_export.xml --json
37
+
38
+ # Save parsed JSON to file
39
+ informatica-python workflow_export.xml --json-file parsed.json
40
+ ```
41
+
42
+ ### Python API
43
+
44
+ ```python
45
+ from informatica_python import InformaticaConverter
46
+
47
+ converter = InformaticaConverter()
48
+
49
+ # Parse and generate files
50
+ converter.convert_to_files("workflow_export.xml", "output_dir")
51
+
52
+ # Parse and generate zip
53
+ converter.convert_to_zip("workflow_export.xml", "output.zip")
54
+
55
+ # Parse to structured dict
56
+ result = converter.parse_file("workflow_export.xml")
57
+
58
+ # Use a different data library
59
+ converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars")
60
+ ```
61
+
62
+ ## Generated Output Files
63
+
64
+ | File | Description |
65
+ |------|-------------|
66
+ | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions) |
67
+ | `mapping_N.py` | One per mapping — transformation logic, source reads, target writes |
68
+ | `workflow.py` | Task orchestration with topological ordering and error handling |
69
+ | `config.yml` | Connection configs, source/target metadata, runtime parameters |
70
+ | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms |
71
+ | `error_log.txt` | Conversion summary, warnings, and unsupported feature notes |
72
+
73
+ ## Supported Data Libraries
74
+
75
+ Select via `--data-lib` CLI flag or `data_lib` parameter:
76
+
77
+ | Library | Flag | Best For |
78
+ |---------|------|----------|
79
+ | **pandas** | `pandas` (default) | General-purpose, most compatible |
80
+ | **dask** | `dask` | Large datasets, parallel processing |
81
+ | **polars** | `polars` | High performance, Rust-backed |
82
+ | **vaex** | `vaex` | Out-of-core, billion-row datasets |
83
+ | **modin** | `modin` | Drop-in pandas replacement, multi-core |
84
+
85
+ ## Supported Transformations
86
+
87
+ The code generator produces real, runnable Python for these transformation types:
88
+
89
+ - **Source Qualifier** — SQL override, pre/post SQL, column selection
90
+ - **Expression** — Field-level expressions converted to pandas operations
91
+ - **Filter** — Row filtering with converted conditions
92
+ - **Joiner** — `pd.merge()` with join type and condition parsing
93
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads
94
+ - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST
95
+ - **Sorter** — `sort_values()` with multi-key ascending/descending
96
+ - **Router** — Multi-group conditional routing with if/elif/else
97
+ - **Union** — `pd.concat()` across multiple input groups
98
+ - **Update Strategy** — Insert/Update/Delete/Reject flag generation
99
+ - **Sequence Generator** — Auto-incrementing ID columns
100
+ - **Normalizer** — `pd.melt()` with auto-detected id/value vars
101
+ - **Rank** — `groupby().rank()` with Top-N filtering
102
+ - **Stored Procedure** — Stub generation with SP name and parameters
103
+ - **Transaction Control** — Commit/rollback logic stubs
104
+ - **Custom / Java** — Placeholder stubs with TODO markers
105
+ - **SQL Transform** — Direct SQL execution pass-through
106
+
107
+ ## Supported XML Tags (72 Tags)
108
+
109
+ **Top-level:** POWERMART, REPOSITORY, FOLDER, FOLDERVERSION
110
+
111
+ **Source/Target:** SOURCE, SOURCEFIELD, TARGET, TARGETFIELD, TARGETINDEX, TARGETINDEXFIELD, FLATFILE, XMLINFO, XMLTEXT, GROUP, TABLEATTRIBUTE, FIELDATTRIBUTE, METADATAEXTENSION, KEYWORD, ERPSRCINFO
112
+
113
+ **Mapping/Mapplet:** MAPPING, MAPPLET, TRANSFORMATION, TRANSFORMFIELD, TRANSFORMFIELDATTR, TRANSFORMFIELDATTRDEF, INSTANCE, ASSOCIATED_SOURCE_INSTANCE, CONNECTOR, MAPDEPENDENCY, TARGETLOADORDER, MAPPINGVARIABLE, FIELDDEPENDENCY, INITPROP, ERPINFO
114
+
115
+ **Task/Session/Workflow:** TASK, TIMER, VALUEPAIR, SCHEDULER, SCHEDULEINFO, STARTOPTIONS, ENDOPTIONS, SCHEDULEOPTIONS, RECURRING, CUSTOM, DAILYFREQUENCY, REPEAT, FILTER, SESSION, CONFIGREFERENCE, SESSTRANSFORMATIONINST, SESSTRANSFORMATIONGROUP, PARTITION, HASHKEY, KEYRANGE, CONFIG, SESSIONCOMPONENT, CONNECTIONREFERENCE, TASKINSTANCE, WORKFLOWLINK, WORKFLOWVARIABLE, WORKFLOWEVENT, WORKLET, WORKFLOW, ATTRIBUTE
116
+
117
+ **Shortcut:** SHORTCUT
118
+
119
+ **SAP:** SAPFUNCTION, SAPSTRUCTURE, SAPPROGRAM, SAPOUTPUTPORT, SAPVARIABLE, SAPPROGRAMFLOWOBJECT, SAPTABLEPARAM
120
+
121
+ ## Key Features
122
+
123
+ ### Session Connection Overrides (v1.4+)
124
+ When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.
125
+
126
+ ### Worklet Support (v1.4+)
127
+ Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.
128
+
129
+ ### Type Casting at Target Writes (v1.4+)
130
+ Target field datatypes are mapped to pandas types and generate proper casting code:
131
+ - Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
132
+ - Dates: `pd.to_datetime(errors='coerce')`
133
+ - Decimals/Floats: `pd.to_numeric(errors='coerce')`
134
+ - Booleans: `.astype('boolean')`
135
+
136
+ ### Flat File Handling (v1.3+)
137
+ Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
138
+
139
+ ### Mapplet Inlining (v1.3+)
140
+ Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
141
+
142
+ ### Decision Tasks (v1.3+)
143
+ Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
144
+
145
+ ### Expression Converter (80+ Functions)
146
+
147
+ Converts Informatica expressions to Python equivalents:
148
+
149
+ - **String:** SUBSTR, LTRIM, RTRIM, UPPER, LOWER, LPAD, RPAD, INSTR, LENGTH, CONCAT, REPLACE, REG_EXTRACT, REG_REPLACE, REVERSE, INITCAP, CHR, ASCII
150
+ - **Date:** ADD_TO_DATE, DATE_DIFF, GET_DATE_PART, SYSDATE, SYSTIMESTAMP, TO_DATE, TO_CHAR, TRUNC (date)
151
+ - **Numeric:** ROUND, TRUNC, MOD, ABS, CEIL, FLOOR, POWER, SQRT, LOG, EXP, SIGN
152
+ - **Conversion:** TO_INTEGER, TO_BIGINT, TO_FLOAT, TO_DECIMAL, TO_CHAR, TO_DATE
153
+ - **Null handling:** IIF, DECODE, NVL, NVL2, ISNULL, IS_SPACES, IS_NUMBER
154
+ - **Aggregate:** SUM, AVG, COUNT, MIN, MAX, FIRST, LAST, MEDIAN, STDDEV, VARIANCE
155
+ - **Lookup:** :LKP expressions with dynamic lookup references
156
+ - **Variable:** SETVARIABLE / mapping variable assignment
157
+
158
+ ## Requirements
159
+
160
+ - Python >= 3.8
161
+ - lxml >= 4.9.0
162
+ - PyYAML >= 6.0
163
+
164
+ ## Changelog
165
+
166
+ ### v1.4.x (Phase 3)
167
+ - Session connection overrides for sources and targets
168
+ - Worklet function generation with safe invocation
169
+ - Type casting at target writes based on TARGETFIELD datatypes
170
+ - Flat-file session path overrides properly wired
171
+
172
+ ### v1.3.x (Phase 2)
173
+ - FLATFILE metadata in source reads and target writes
174
+ - Normalizer with `pd.melt()`
175
+ - Rank with group-by and Top-N filtering
176
+ - Decision tasks with real if/else branches
177
+ - Mapplet instance inlining
178
+
179
+ ### v1.2.x (Phase 1)
180
+ - Core parser for all 72 XML tags
181
+ - Expression converter with 80+ functions
182
+ - Aggregator, Joiner, Lookup code generation
183
+ - Workflow orchestration with topological task ordering
184
+ - Multi-library support (pandas, dask, polars, vaex, modin)
185
+
186
+ ## Development
187
+
188
+ ```bash
189
+ # Clone and install in development mode
190
+ cd informatica_python
191
+ pip install -e ".[dev]"
192
+
193
+ # Run tests (25 tests)
194
+ pytest tests/test_converter.py -v
195
+ ```
196
+
197
+ ## License
198
+
199
+ MIT License - Copyright (c) 2025 Nick
200
+
201
+ See [LICENSE](LICENSE) for details.
@@ -0,0 +1,13 @@
1
+ """
2
+ informatica-python: Convert Informatica PowerCenter workflow XML to Python/PySpark code.
3
+
4
+ Copyright (c) 2025 Nick. All rights reserved.
5
+ Licensed under the MIT License.
6
+ """
7
+
8
+ from informatica_python.converter import InformaticaConverter
9
+
10
+ __version__ = "1.4.2"
11
+ __author__ = "Nick"
12
+ __license__ = "MIT"
13
+ __all__ = ["InformaticaConverter"]
@@ -247,7 +247,7 @@ def generate_mapping_code(mapping: MappingDef, folder: FolderDef,
247
247
  lines.append(f" _src_path_{safe} = config.get('sources', {{}}).get('{src_def.name}', {{}}).get('file_path',")
248
248
  lines.append(f" os.path.join('{src_dir}', '{src_file}'))")
249
249
  if src_def.flatfile:
250
- _emit_flatfile_read(lines, safe, src_def)
250
+ _emit_flatfile_read(lines, safe, src_def, file_path_override=True)
251
251
  else:
252
252
  lines.append(f" df_{safe} = read_file(_src_path_{safe}, config.get('sources', {{}}).get('{src_def.name}', {{}}))")
253
253
  elif src_def.database_type and src_def.database_type != "Flat File":
@@ -323,15 +323,16 @@ def _flatfile_config_dict(ff):
323
323
  return cfg
324
324
 
325
325
 
326
- def _emit_flatfile_read(lines, var_name, src_def, indent=" "):
326
+ def _emit_flatfile_read(lines, var_name, src_def, indent=" ", file_path_override=None):
327
327
  ff = src_def.flatfile
328
328
  fc = _flatfile_config_dict(ff)
329
+ default_path = f"_src_path_{var_name}" if file_path_override else f"config.get('sources', {{}}).get('{src_def.name}', {{}}).get('file_path', '{src_def.name}')"
329
330
  if fc.get("fixed_width"):
330
331
  widths = []
331
332
  for fld in src_def.fields:
332
333
  widths.append(fld.precision if fld.precision else 10)
333
334
  lines.append(f"{indent}df_{var_name} = pd.read_fwf(")
334
- lines.append(f"{indent} config.get('sources', {{}}).get('{src_def.name}', {{}}).get('file_path', '{src_def.name}'),")
335
+ lines.append(f"{indent} {default_path},")
335
336
  lines.append(f"{indent} widths={widths},")
336
337
  hdr = fc.get("header_lines", 0)
337
338
  if hdr:
@@ -367,15 +368,22 @@ def _emit_flatfile_read(lines, var_name, src_def, indent=" "):
367
368
  if file_cfg:
368
369
  lines.append(f"{indent}ff_cfg_{var_name} = {repr(file_cfg)}")
369
370
  lines.append(f"{indent}ff_cfg_{var_name}.update(config.get('sources', {{}}).get('{src_def.name}', {{}}))")
370
- lines.append(f"{indent}df_{var_name} = read_file(ff_cfg_{var_name}.get('file_path', '{src_def.name}'), ff_cfg_{var_name})")
371
+ if file_path_override:
372
+ lines.append(f"{indent}df_{var_name} = read_file({default_path}, ff_cfg_{var_name})")
373
+ else:
374
+ lines.append(f"{indent}df_{var_name} = read_file(ff_cfg_{var_name}.get('file_path', '{src_def.name}'), ff_cfg_{var_name})")
371
375
  else:
372
- lines.append(f"{indent}df_{var_name} = read_file(config.get('sources', {{}}).get('{src_def.name}', {{}}).get('file_path', '{src_def.name}'),")
373
- lines.append(f"{indent} config.get('sources', {{}}).get('{src_def.name}', {{}}))")
376
+ if file_path_override:
377
+ lines.append(f"{indent}df_{var_name} = read_file({default_path}, config.get('sources', {{}}).get('{src_def.name}', {{}}))")
378
+ else:
379
+ lines.append(f"{indent}df_{var_name} = read_file(config.get('sources', {{}}).get('{src_def.name}', {{}}).get('file_path', '{src_def.name}'),")
380
+ lines.append(f"{indent} config.get('sources', {{}}).get('{src_def.name}', {{}}))")
374
381
 
375
382
 
376
- def _emit_flatfile_write(lines, var_name, tgt_def, indent=" "):
383
+ def _emit_flatfile_write(lines, var_name, tgt_def, indent=" ", file_path_override=None):
377
384
  ff = tgt_def.flatfile
378
385
  fc = _flatfile_config_dict(ff)
386
+ default_path = f"_tgt_path_{var_name}" if file_path_override else f"config.get('targets', {{}}).get('{tgt_def.name}', {{}}).get('file_path', '{tgt_def.name}')"
379
387
  file_cfg = {}
380
388
  if "delimiter" in fc:
381
389
  file_cfg["delimiter"] = fc["delimiter"]
@@ -387,10 +395,16 @@ def _emit_flatfile_write(lines, var_name, tgt_def, indent=" "):
387
395
  if file_cfg:
388
396
  lines.append(f"{indent}ff_cfg_{var_name} = {repr(file_cfg)}")
389
397
  lines.append(f"{indent}ff_cfg_{var_name}.update(config.get('targets', {{}}).get('{tgt_def.name}', {{}}))")
390
- lines.append(f"{indent}write_file(df_target_{var_name}, ff_cfg_{var_name}.get('file_path', '{tgt_def.name}'), ff_cfg_{var_name})")
398
+ if file_path_override:
399
+ lines.append(f"{indent}write_file(df_target_{var_name}, {default_path}, ff_cfg_{var_name})")
400
+ else:
401
+ lines.append(f"{indent}write_file(df_target_{var_name}, ff_cfg_{var_name}.get('file_path', '{tgt_def.name}'), ff_cfg_{var_name})")
391
402
  else:
392
- lines.append(f"{indent}write_file(df_target_{var_name}, config.get('targets', {{}}).get('{tgt_def.name}', {{}}).get('file_path', '{tgt_def.name}'),")
393
- lines.append(f"{indent} config.get('targets', {{}}).get('{tgt_def.name}', {{}}))")
403
+ if file_path_override:
404
+ lines.append(f"{indent}write_file(df_target_{var_name}, {default_path}, config.get('targets', {{}}).get('{tgt_def.name}', {{}}))")
405
+ else:
406
+ lines.append(f"{indent}write_file(df_target_{var_name}, config.get('targets', {{}}).get('{tgt_def.name}', {{}}).get('file_path', '{tgt_def.name}'),")
407
+ lines.append(f"{indent} config.get('targets', {{}}).get('{tgt_def.name}', {{}}))")
394
408
 
395
409
 
396
410
  def _build_source_map(mapping, folder):
@@ -1202,7 +1216,7 @@ def _generate_target_write(lines, tgt_name, tgt_def, connector_graph, source_dfs
1202
1216
  lines.append(f" _tgt_path_{tgt_safe} = config.get('targets', {{}}).get('{tgt_def.name}', {{}}).get('file_path',")
1203
1217
  lines.append(f" os.path.join('{out_dir}', '{out_file}'))")
1204
1218
  if tgt_def.flatfile:
1205
- _emit_flatfile_write(lines, tgt_safe, tgt_def)
1219
+ _emit_flatfile_write(lines, tgt_safe, tgt_def, file_path_override=True)
1206
1220
  else:
1207
1221
  lines.append(f" write_file(df_target_{tgt_safe}, _tgt_path_{tgt_safe}, config.get('targets', {{}}).get('{tgt_def.name}', {{}}))")
1208
1222
  elif tgt_def.database_type and tgt_def.database_type != "Flat File":
@@ -195,10 +195,11 @@ def _emit_task_code(lines, task, mapping_name_map, session_to_mapping, wf, workl
195
195
  lines.append(f" logger.info('Executing worklet: {task.name}')")
196
196
  if matched_worklet:
197
197
  lines.append(f" worklet_result_{task_safe} = run_worklet_{worklet_safe}(config)")
198
+ lines.append(f" if not worklet_result_{task_safe}:")
199
+ lines.append(f" raise RuntimeError('Worklet {worklet_name} returned failure')")
198
200
  else:
199
- lines.append(f" worklet_result_{task_safe} = run_worklet_{worklet_safe}(config)")
200
- lines.append(f" if not worklet_result_{task_safe}:")
201
- lines.append(f" raise RuntimeError('Worklet {worklet_name} returned failure')")
201
+ lines.append(f" # WARNING: Worklet '{worklet_name}' definition not found in folder")
202
+ lines.append(f" logger.warning('Worklet {worklet_name} not found — skipping')")
202
203
  lines.append(f" except Exception as e:")
203
204
  lines.append(f" logger.error(f'Worklet {task.name} failed: {{e}}')")
204
205
  if task.fail_parent_if_instance_fails == "YES":
@@ -0,0 +1,228 @@
1
+ Metadata-Version: 2.4
2
+ Name: informatica-python
3
+ Version: 1.4.2
4
+ Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
5
+ Author: Nick
6
+ License: MIT
7
+ Keywords: informatica,powercenter,etl,code-generator,pandas,pyspark,data-engineering
8
+ Classifier: Development Status :: 4 - Beta
9
+ Classifier: Intended Audience :: Developers
10
+ Classifier: License :: OSI Approved :: MIT License
11
+ Classifier: Programming Language :: Python :: 3
12
+ Classifier: Programming Language :: Python :: 3.8
13
+ Classifier: Programming Language :: Python :: 3.9
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Classifier: Topic :: Software Development :: Code Generators
18
+ Classifier: Topic :: Database :: Database Engines/Servers
19
+ Requires-Python: >=3.8
20
+ Description-Content-Type: text/markdown
21
+ License-File: LICENSE
22
+ Requires-Dist: lxml>=4.9.0
23
+ Requires-Dist: pyyaml>=6.0
24
+ Provides-Extra: dev
25
+ Requires-Dist: pytest>=7.0; extra == "dev"
26
+ Dynamic: license-file
27
+
28
+ # informatica-python
29
+
30
+ Convert Informatica PowerCenter workflow XML exports into clean, runnable Python/PySpark code.
31
+
32
+ **Author:** Nick
33
+ **License:** MIT
34
+ **PyPI:** [informatica-python](https://pypi.org/project/informatica-python/)
35
+
36
+ ---
37
+
38
+ ## Overview
39
+
40
+ `informatica-python` parses Informatica PowerCenter XML export files and generates equivalent Python code using your choice of data library. It handles all 72 DTD tags from the PowerCenter XML schema and produces a complete, ready-to-run Python project.
41
+
42
+ ## Installation
43
+
44
+ ```bash
45
+ pip install informatica-python
46
+ ```
47
+
48
+ ## Quick Start
49
+
50
+ ### Command Line
51
+
52
+ ```bash
53
+ # Generate Python files to a directory
54
+ informatica-python workflow_export.xml -o output_dir
55
+
56
+ # Generate as a zip archive
57
+ informatica-python workflow_export.xml -z output.zip
58
+
59
+ # Use a different data library
60
+ informatica-python workflow_export.xml -o output_dir --data-lib polars
61
+
62
+ # Parse to JSON only (no code generation)
63
+ informatica-python workflow_export.xml --json
64
+
65
+ # Save parsed JSON to file
66
+ informatica-python workflow_export.xml --json-file parsed.json
67
+ ```
68
+
69
+ ### Python API
70
+
71
+ ```python
72
+ from informatica_python import InformaticaConverter
73
+
74
+ converter = InformaticaConverter()
75
+
76
+ # Parse and generate files
77
+ converter.convert_to_files("workflow_export.xml", "output_dir")
78
+
79
+ # Parse and generate zip
80
+ converter.convert_to_zip("workflow_export.xml", "output.zip")
81
+
82
+ # Parse to structured dict
83
+ result = converter.parse_file("workflow_export.xml")
84
+
85
+ # Use a different data library
86
+ converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars")
87
+ ```
88
+
89
+ ## Generated Output Files
90
+
91
+ | File | Description |
92
+ |------|-------------|
93
+ | `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions) |
94
+ | `mapping_N.py` | One per mapping — transformation logic, source reads, target writes |
95
+ | `workflow.py` | Task orchestration with topological ordering and error handling |
96
+ | `config.yml` | Connection configs, source/target metadata, runtime parameters |
97
+ | `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms |
98
+ | `error_log.txt` | Conversion summary, warnings, and unsupported feature notes |
99
+
100
+ ## Supported Data Libraries
101
+
102
+ Select via `--data-lib` CLI flag or `data_lib` parameter:
103
+
104
+ | Library | Flag | Best For |
105
+ |---------|------|----------|
106
+ | **pandas** | `pandas` (default) | General-purpose, most compatible |
107
+ | **dask** | `dask` | Large datasets, parallel processing |
108
+ | **polars** | `polars` | High performance, Rust-backed |
109
+ | **vaex** | `vaex` | Out-of-core, billion-row datasets |
110
+ | **modin** | `modin` | Drop-in pandas replacement, multi-core |
111
+
112
+ ## Supported Transformations
113
+
114
+ The code generator produces real, runnable Python for these transformation types:
115
+
116
+ - **Source Qualifier** — SQL override, pre/post SQL, column selection
117
+ - **Expression** — Field-level expressions converted to pandas operations
118
+ - **Filter** — Row filtering with converted conditions
119
+ - **Joiner** — `pd.merge()` with join type and condition parsing
120
+ - **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads
121
+ - **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST
122
+ - **Sorter** — `sort_values()` with multi-key ascending/descending
123
+ - **Router** — Multi-group conditional routing with if/elif/else
124
+ - **Union** — `pd.concat()` across multiple input groups
125
+ - **Update Strategy** — Insert/Update/Delete/Reject flag generation
126
+ - **Sequence Generator** — Auto-incrementing ID columns
127
+ - **Normalizer** — `pd.melt()` with auto-detected id/value vars
128
+ - **Rank** — `groupby().rank()` with Top-N filtering
129
+ - **Stored Procedure** — Stub generation with SP name and parameters
130
+ - **Transaction Control** — Commit/rollback logic stubs
131
+ - **Custom / Java** — Placeholder stubs with TODO markers
132
+ - **SQL Transform** — Direct SQL execution pass-through
133
+
134
+ ## Supported XML Tags (72 Tags)
135
+
136
+ **Top-level:** POWERMART, REPOSITORY, FOLDER, FOLDERVERSION
137
+
138
+ **Source/Target:** SOURCE, SOURCEFIELD, TARGET, TARGETFIELD, TARGETINDEX, TARGETINDEXFIELD, FLATFILE, XMLINFO, XMLTEXT, GROUP, TABLEATTRIBUTE, FIELDATTRIBUTE, METADATAEXTENSION, KEYWORD, ERPSRCINFO
139
+
140
+ **Mapping/Mapplet:** MAPPING, MAPPLET, TRANSFORMATION, TRANSFORMFIELD, TRANSFORMFIELDATTR, TRANSFORMFIELDATTRDEF, INSTANCE, ASSOCIATED_SOURCE_INSTANCE, CONNECTOR, MAPDEPENDENCY, TARGETLOADORDER, MAPPINGVARIABLE, FIELDDEPENDENCY, INITPROP, ERPINFO
141
+
142
+ **Task/Session/Workflow:** TASK, TIMER, VALUEPAIR, SCHEDULER, SCHEDULEINFO, STARTOPTIONS, ENDOPTIONS, SCHEDULEOPTIONS, RECURRING, CUSTOM, DAILYFREQUENCY, REPEAT, FILTER, SESSION, CONFIGREFERENCE, SESSTRANSFORMATIONINST, SESSTRANSFORMATIONGROUP, PARTITION, HASHKEY, KEYRANGE, CONFIG, SESSIONCOMPONENT, CONNECTIONREFERENCE, TASKINSTANCE, WORKFLOWLINK, WORKFLOWVARIABLE, WORKFLOWEVENT, WORKLET, WORKFLOW, ATTRIBUTE
143
+
144
+ **Shortcut:** SHORTCUT
145
+
146
+ **SAP:** SAPFUNCTION, SAPSTRUCTURE, SAPPROGRAM, SAPOUTPUTPORT, SAPVARIABLE, SAPPROGRAMFLOWOBJECT, SAPTABLEPARAM
147
+
148
+ ## Key Features
149
+
150
+ ### Session Connection Overrides (v1.4+)
151
+ When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.
152
+
153
+ ### Worklet Support (v1.4+)
154
+ Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.
155
+
156
+ ### Type Casting at Target Writes (v1.4+)
157
+ Target field datatypes are mapped to pandas types and generate proper casting code:
158
+ - Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
159
+ - Dates: `pd.to_datetime(errors='coerce')`
160
+ - Decimals/Floats: `pd.to_numeric(errors='coerce')`
161
+ - Booleans: `.astype('boolean')`
162
+
163
+ ### Flat File Handling (v1.3+)
164
+ Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
165
+
166
+ ### Mapplet Inlining (v1.3+)
167
+ Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
168
+
169
+ ### Decision Tasks (v1.3+)
170
+ Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
171
+
172
+ ### Expression Converter (80+ Functions)
173
+
174
+ Converts Informatica expressions to Python equivalents:
175
+
176
+ - **String:** SUBSTR, LTRIM, RTRIM, UPPER, LOWER, LPAD, RPAD, INSTR, LENGTH, CONCAT, REPLACE, REG_EXTRACT, REG_REPLACE, REVERSE, INITCAP, CHR, ASCII
177
+ - **Date:** ADD_TO_DATE, DATE_DIFF, GET_DATE_PART, SYSDATE, SYSTIMESTAMP, TO_DATE, TO_CHAR, TRUNC (date)
178
+ - **Numeric:** ROUND, TRUNC, MOD, ABS, CEIL, FLOOR, POWER, SQRT, LOG, EXP, SIGN
179
+ - **Conversion:** TO_INTEGER, TO_BIGINT, TO_FLOAT, TO_DECIMAL, TO_CHAR, TO_DATE
180
+ - **Null handling:** IIF, DECODE, NVL, NVL2, ISNULL, IS_SPACES, IS_NUMBER
181
+ - **Aggregate:** SUM, AVG, COUNT, MIN, MAX, FIRST, LAST, MEDIAN, STDDEV, VARIANCE
182
+ - **Lookup:** :LKP expressions with dynamic lookup references
183
+ - **Variable:** SETVARIABLE / mapping variable assignment
184
+
185
+ ## Requirements
186
+
187
+ - Python >= 3.8
188
+ - lxml >= 4.9.0
189
+ - PyYAML >= 6.0
190
+
191
+ ## Changelog
192
+
193
+ ### v1.4.x (Phase 3)
194
+ - Session connection overrides for sources and targets
195
+ - Worklet function generation with safe invocation
196
+ - Type casting at target writes based on TARGETFIELD datatypes
197
+ - Flat-file session path overrides properly wired
198
+
199
+ ### v1.3.x (Phase 2)
200
+ - FLATFILE metadata in source reads and target writes
201
+ - Normalizer with `pd.melt()`
202
+ - Rank with group-by and Top-N filtering
203
+ - Decision tasks with real if/else branches
204
+ - Mapplet instance inlining
205
+
206
+ ### v1.2.x (Phase 1)
207
+ - Core parser for all 72 XML tags
208
+ - Expression converter with 80+ functions
209
+ - Aggregator, Joiner, Lookup code generation
210
+ - Workflow orchestration with topological task ordering
211
+ - Multi-library support (pandas, dask, polars, vaex, modin)
212
+
213
+ ## Development
214
+
215
+ ```bash
216
+ # Clone and install in development mode
217
+ cd informatica_python
218
+ pip install -e ".[dev]"
219
+
220
+ # Run tests (25 tests)
221
+ pytest tests/test_converter.py -v
222
+ ```
223
+
224
+ ## License
225
+
226
+ MIT License - Copyright (c) 2025 Nick
227
+
228
+ See [LICENSE](LICENSE) for details.
@@ -1,3 +1,4 @@
1
+ LICENSE
1
2
  README.md
2
3
  pyproject.toml
3
4
  informatica_python/__init__.py
@@ -0,0 +1,41 @@
1
+ [build-system]
2
+ requires = ["setuptools>=68.0", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "informatica-python"
7
+ version = "1.4.2"
8
+ description = "Convert Informatica PowerCenter workflow XML to Python/PySpark code"
9
+ readme = "README.md"
10
+ license = {text = "MIT"}
11
+ requires-python = ">=3.8"
12
+ authors = [
13
+ { name = "Nick" },
14
+ ]
15
+ classifiers = [
16
+ "Development Status :: 4 - Beta",
17
+ "Intended Audience :: Developers",
18
+ "License :: OSI Approved :: MIT License",
19
+ "Programming Language :: Python :: 3",
20
+ "Programming Language :: Python :: 3.8",
21
+ "Programming Language :: Python :: 3.9",
22
+ "Programming Language :: Python :: 3.10",
23
+ "Programming Language :: Python :: 3.11",
24
+ "Programming Language :: Python :: 3.12",
25
+ "Topic :: Software Development :: Code Generators",
26
+ "Topic :: Database :: Database Engines/Servers",
27
+ ]
28
+ keywords = ["informatica", "powercenter", "etl", "code-generator", "pandas", "pyspark", "data-engineering"]
29
+ dependencies = [
30
+ "lxml>=4.9.0",
31
+ "pyyaml>=6.0",
32
+ ]
33
+
34
+ [project.scripts]
35
+ informatica-python = "informatica_python.cli:main"
36
+
37
+ [project.optional-dependencies]
38
+ dev = ["pytest>=7.0"]
39
+
40
+ [tool.setuptools.packages.find]
41
+ include = ["informatica_python*"]
@@ -1,118 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: informatica-python
3
- Version: 1.4.0
4
- Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
5
- License-Expression: MIT
6
- Requires-Python: >=3.8
7
- Description-Content-Type: text/markdown
8
- Requires-Dist: lxml>=4.9.0
9
- Requires-Dist: pyyaml>=6.0
10
- Provides-Extra: dev
11
- Requires-Dist: pytest>=7.0; extra == "dev"
12
-
13
- # informatica-python
14
-
15
- Convert Informatica PowerCenter workflow XML files to Python/PySpark code.
16
-
17
- ## Installation
18
-
19
- ```bash
20
- pip install informatica-python
21
- ```
22
-
23
- ## Quick Start
24
-
25
- ### Command Line
26
-
27
- ```bash
28
- # Convert XML to Python files in a directory
29
- informatica-python workflow.xml -o output_dir
30
-
31
- # Convert XML to a zip file
32
- informatica-python workflow.xml -z output.zip
33
-
34
- # Use a different data library (pandas, dask, polars, vaex, modin)
35
- informatica-python workflow.xml -o output_dir --data-lib polars
36
-
37
- # Parse XML to JSON (no code generation)
38
- informatica-python workflow.xml --json
39
-
40
- # Save parsed JSON to file
41
- informatica-python workflow.xml --json-file parsed.json
42
- ```
43
-
44
- ### Python API
45
-
46
- ```python
47
- from informatica_python import InformaticaConverter
48
-
49
- # Convert XML to Python files
50
- converter = InformaticaConverter(data_lib="pandas")
51
- converter.convert("workflow.xml", output_dir="output")
52
-
53
- # Convert to zip
54
- converter.convert("workflow.xml", output_zip="output.zip")
55
-
56
- # Parse XML to JSON dict
57
- result = converter.parse_file("workflow.xml")
58
-
59
- # Parse XML string
60
- result = converter.parse_string(xml_string)
61
- ```
62
-
63
- ## Generated Output Files
64
-
65
- | File | Description |
66
- |------|-------------|
67
- | `helper_functions.py` | Database/file I/O functions plus Python equivalents for 50+ Informatica expression functions |
68
- | `mapping_N.py` | One file per mapping with full transformation logic |
69
- | `workflow.py` | Task orchestration with topological ordering |
70
- | `config.yml` | Connection configs, source/target metadata, variables |
71
- | `all_sql_queries.sql` | All extracted SQL queries (source qualifiers, lookups, pre/post SQL) |
72
- | `error_log.txt` | Conversion summary, warnings, and coverage statistics |
73
-
74
- ## Supported Transformation Types
75
-
76
- - Source Qualifier / Application Source Qualifier
77
- - Expression
78
- - Filter
79
- - Aggregator
80
- - Sorter
81
- - Joiner
82
- - Lookup Procedure
83
- - Router
84
- - Union
85
- - Update Strategy
86
- - Sequence Generator
87
- - Normalizer
88
- - Rank
89
- - Stored Procedure (placeholder)
90
- - Custom Transformation (placeholder)
91
- - Java Transformation (placeholder)
92
- - SQL Transformation
93
-
94
- ## Supported Data Libraries
95
-
96
- Choose your preferred data manipulation library with `--data-lib`:
97
-
98
- - **pandas** (default) — Standard Python data analysis
99
- - **dask** — Parallel computing with pandas-like API
100
- - **polars** — Fast DataFrame library written in Rust
101
- - **vaex** — Out-of-core DataFrames for large datasets
102
- - **modin** — Drop-in pandas replacement with parallel execution
103
-
104
- ## Informatica Expression Functions
105
-
106
- The generated `helper_functions.py` includes Python equivalents for:
107
-
108
- `IIF`, `DECODE`, `NVL`, `NVL2`, `ISNULL`, `LTRIM`, `RTRIM`, `UPPER`, `LOWER`, `SUBSTR`, `LPAD`, `RPAD`, `TO_CHAR`, `TO_DATE`, `TO_INTEGER`, `TO_BIGINT`, `TO_FLOAT`, `TO_DECIMAL`, `REPLACECHR`, `REPLACESTR`, `INSTR`, `LENGTH`, `CONCAT`, `REG_EXTRACT`, `REG_MATCH`, `REG_REPLACE`, `GET_DATE_PART`, `ADD_TO_DATE`, `IS_DATE`, `IS_NUMBER`, `IS_SPACES`, `SYSDATE`, `ERROR`, `ABORT`, and more.
109
-
110
- ## Requirements
111
-
112
- - Python >= 3.8
113
- - lxml >= 4.9.0
114
- - PyYAML >= 6.0
115
-
116
- ## License
117
-
118
- MIT
@@ -1,106 +0,0 @@
1
- # informatica-python
2
-
3
- Convert Informatica PowerCenter workflow XML files to Python/PySpark code.
4
-
5
- ## Installation
6
-
7
- ```bash
8
- pip install informatica-python
9
- ```
10
-
11
- ## Quick Start
12
-
13
- ### Command Line
14
-
15
- ```bash
16
- # Convert XML to Python files in a directory
17
- informatica-python workflow.xml -o output_dir
18
-
19
- # Convert XML to a zip file
20
- informatica-python workflow.xml -z output.zip
21
-
22
- # Use a different data library (pandas, dask, polars, vaex, modin)
23
- informatica-python workflow.xml -o output_dir --data-lib polars
24
-
25
- # Parse XML to JSON (no code generation)
26
- informatica-python workflow.xml --json
27
-
28
- # Save parsed JSON to file
29
- informatica-python workflow.xml --json-file parsed.json
30
- ```
31
-
32
- ### Python API
33
-
34
- ```python
35
- from informatica_python import InformaticaConverter
36
-
37
- # Convert XML to Python files
38
- converter = InformaticaConverter(data_lib="pandas")
39
- converter.convert("workflow.xml", output_dir="output")
40
-
41
- # Convert to zip
42
- converter.convert("workflow.xml", output_zip="output.zip")
43
-
44
- # Parse XML to JSON dict
45
- result = converter.parse_file("workflow.xml")
46
-
47
- # Parse XML string
48
- result = converter.parse_string(xml_string)
49
- ```
50
-
51
- ## Generated Output Files
52
-
53
- | File | Description |
54
- |------|-------------|
55
- | `helper_functions.py` | Database/file I/O functions plus Python equivalents for 50+ Informatica expression functions |
56
- | `mapping_N.py` | One file per mapping with full transformation logic |
57
- | `workflow.py` | Task orchestration with topological ordering |
58
- | `config.yml` | Connection configs, source/target metadata, variables |
59
- | `all_sql_queries.sql` | All extracted SQL queries (source qualifiers, lookups, pre/post SQL) |
60
- | `error_log.txt` | Conversion summary, warnings, and coverage statistics |
61
-
62
- ## Supported Transformation Types
63
-
64
- - Source Qualifier / Application Source Qualifier
65
- - Expression
66
- - Filter
67
- - Aggregator
68
- - Sorter
69
- - Joiner
70
- - Lookup Procedure
71
- - Router
72
- - Union
73
- - Update Strategy
74
- - Sequence Generator
75
- - Normalizer
76
- - Rank
77
- - Stored Procedure (placeholder)
78
- - Custom Transformation (placeholder)
79
- - Java Transformation (placeholder)
80
- - SQL Transformation
81
-
82
- ## Supported Data Libraries
83
-
84
- Choose your preferred data manipulation library with `--data-lib`:
85
-
86
- - **pandas** (default) — Standard Python data analysis
87
- - **dask** — Parallel computing with pandas-like API
88
- - **polars** — Fast DataFrame library written in Rust
89
- - **vaex** — Out-of-core DataFrames for large datasets
90
- - **modin** — Drop-in pandas replacement with parallel execution
91
-
92
- ## Informatica Expression Functions
93
-
94
- The generated `helper_functions.py` includes Python equivalents for:
95
-
96
- `IIF`, `DECODE`, `NVL`, `NVL2`, `ISNULL`, `LTRIM`, `RTRIM`, `UPPER`, `LOWER`, `SUBSTR`, `LPAD`, `RPAD`, `TO_CHAR`, `TO_DATE`, `TO_INTEGER`, `TO_BIGINT`, `TO_FLOAT`, `TO_DECIMAL`, `REPLACECHR`, `REPLACESTR`, `INSTR`, `LENGTH`, `CONCAT`, `REG_EXTRACT`, `REG_MATCH`, `REG_REPLACE`, `GET_DATE_PART`, `ADD_TO_DATE`, `IS_DATE`, `IS_NUMBER`, `IS_SPACES`, `SYSDATE`, `ERROR`, `ABORT`, and more.
97
-
98
- ## Requirements
99
-
100
- - Python >= 3.8
101
- - lxml >= 4.9.0
102
- - PyYAML >= 6.0
103
-
104
- ## License
105
-
106
- MIT
@@ -1,4 +0,0 @@
1
- from informatica_python.converter import InformaticaConverter
2
-
3
- __version__ = "1.0.0"
4
- __all__ = ["InformaticaConverter"]
@@ -1,118 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: informatica-python
3
- Version: 1.4.0
4
- Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
5
- License-Expression: MIT
6
- Requires-Python: >=3.8
7
- Description-Content-Type: text/markdown
8
- Requires-Dist: lxml>=4.9.0
9
- Requires-Dist: pyyaml>=6.0
10
- Provides-Extra: dev
11
- Requires-Dist: pytest>=7.0; extra == "dev"
12
-
13
- # informatica-python
14
-
15
- Convert Informatica PowerCenter workflow XML files to Python/PySpark code.
16
-
17
- ## Installation
18
-
19
- ```bash
20
- pip install informatica-python
21
- ```
22
-
23
- ## Quick Start
24
-
25
- ### Command Line
26
-
27
- ```bash
28
- # Convert XML to Python files in a directory
29
- informatica-python workflow.xml -o output_dir
30
-
31
- # Convert XML to a zip file
32
- informatica-python workflow.xml -z output.zip
33
-
34
- # Use a different data library (pandas, dask, polars, vaex, modin)
35
- informatica-python workflow.xml -o output_dir --data-lib polars
36
-
37
- # Parse XML to JSON (no code generation)
38
- informatica-python workflow.xml --json
39
-
40
- # Save parsed JSON to file
41
- informatica-python workflow.xml --json-file parsed.json
42
- ```
43
-
44
- ### Python API
45
-
46
- ```python
47
- from informatica_python import InformaticaConverter
48
-
49
- # Convert XML to Python files
50
- converter = InformaticaConverter(data_lib="pandas")
51
- converter.convert("workflow.xml", output_dir="output")
52
-
53
- # Convert to zip
54
- converter.convert("workflow.xml", output_zip="output.zip")
55
-
56
- # Parse XML to JSON dict
57
- result = converter.parse_file("workflow.xml")
58
-
59
- # Parse XML string
60
- result = converter.parse_string(xml_string)
61
- ```
62
-
63
- ## Generated Output Files
64
-
65
- | File | Description |
66
- |------|-------------|
67
- | `helper_functions.py` | Database/file I/O functions plus Python equivalents for 50+ Informatica expression functions |
68
- | `mapping_N.py` | One file per mapping with full transformation logic |
69
- | `workflow.py` | Task orchestration with topological ordering |
70
- | `config.yml` | Connection configs, source/target metadata, variables |
71
- | `all_sql_queries.sql` | All extracted SQL queries (source qualifiers, lookups, pre/post SQL) |
72
- | `error_log.txt` | Conversion summary, warnings, and coverage statistics |
73
-
74
- ## Supported Transformation Types
75
-
76
- - Source Qualifier / Application Source Qualifier
77
- - Expression
78
- - Filter
79
- - Aggregator
80
- - Sorter
81
- - Joiner
82
- - Lookup Procedure
83
- - Router
84
- - Union
85
- - Update Strategy
86
- - Sequence Generator
87
- - Normalizer
88
- - Rank
89
- - Stored Procedure (placeholder)
90
- - Custom Transformation (placeholder)
91
- - Java Transformation (placeholder)
92
- - SQL Transformation
93
-
94
- ## Supported Data Libraries
95
-
96
- Choose your preferred data manipulation library with `--data-lib`:
97
-
98
- - **pandas** (default) — Standard Python data analysis
99
- - **dask** — Parallel computing with pandas-like API
100
- - **polars** — Fast DataFrame library written in Rust
101
- - **vaex** — Out-of-core DataFrames for large datasets
102
- - **modin** — Drop-in pandas replacement with parallel execution
103
-
104
- ## Informatica Expression Functions
105
-
106
- The generated `helper_functions.py` includes Python equivalents for:
107
-
108
- `IIF`, `DECODE`, `NVL`, `NVL2`, `ISNULL`, `LTRIM`, `RTRIM`, `UPPER`, `LOWER`, `SUBSTR`, `LPAD`, `RPAD`, `TO_CHAR`, `TO_DATE`, `TO_INTEGER`, `TO_BIGINT`, `TO_FLOAT`, `TO_DECIMAL`, `REPLACECHR`, `REPLACESTR`, `INSTR`, `LENGTH`, `CONCAT`, `REG_EXTRACT`, `REG_MATCH`, `REG_REPLACE`, `GET_DATE_PART`, `ADD_TO_DATE`, `IS_DATE`, `IS_NUMBER`, `IS_SPACES`, `SYSDATE`, `ERROR`, `ABORT`, and more.
109
-
110
- ## Requirements
111
-
112
- - Python >= 3.8
113
- - lxml >= 4.9.0
114
- - PyYAML >= 6.0
115
-
116
- ## License
117
-
118
- MIT
@@ -1,24 +0,0 @@
1
- [build-system]
2
- requires = ["setuptools>=68.0", "wheel"]
3
- build-backend = "setuptools.build_meta"
4
-
5
- [project]
6
- name = "informatica-python"
7
- version = "1.4.0"
8
- description = "Convert Informatica PowerCenter workflow XML to Python/PySpark code"
9
- readme = "README.md"
10
- license = "MIT"
11
- requires-python = ">=3.8"
12
- dependencies = [
13
- "lxml>=4.9.0",
14
- "pyyaml>=6.0",
15
- ]
16
-
17
- [project.scripts]
18
- informatica-python = "informatica_python.cli:main"
19
-
20
- [project.optional-dependencies]
21
- dev = ["pytest>=7.0"]
22
-
23
- [tool.setuptools.packages.find]
24
- include = ["informatica_python*"]