informatica-python 1.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- informatica_python-1.0.0/PKG-INFO +118 -0
- informatica_python-1.0.0/README.md +106 -0
- informatica_python-1.0.0/informatica_python/__init__.py +4 -0
- informatica_python-1.0.0/informatica_python/cli.py +83 -0
- informatica_python-1.0.0/informatica_python/converter.py +285 -0
- informatica_python-1.0.0/informatica_python/generators/__init__.py +0 -0
- informatica_python-1.0.0/informatica_python/generators/config_gen.py +159 -0
- informatica_python-1.0.0/informatica_python/generators/error_log_gen.py +140 -0
- informatica_python-1.0.0/informatica_python/generators/helper_gen.py +693 -0
- informatica_python-1.0.0/informatica_python/generators/mapping_gen.py +649 -0
- informatica_python-1.0.0/informatica_python/generators/sql_gen.py +132 -0
- informatica_python-1.0.0/informatica_python/generators/workflow_gen.py +234 -0
- informatica_python-1.0.0/informatica_python/models.py +281 -0
- informatica_python-1.0.0/informatica_python/parser.py +468 -0
- informatica_python-1.0.0/informatica_python/utils/__init__.py +0 -0
- informatica_python-1.0.0/informatica_python/utils/datatype_map.py +105 -0
- informatica_python-1.0.0/informatica_python/utils/expression_converter.py +128 -0
- informatica_python-1.0.0/informatica_python.egg-info/PKG-INFO +118 -0
- informatica_python-1.0.0/informatica_python.egg-info/SOURCES.txt +24 -0
- informatica_python-1.0.0/informatica_python.egg-info/dependency_links.txt +1 -0
- informatica_python-1.0.0/informatica_python.egg-info/entry_points.txt +2 -0
- informatica_python-1.0.0/informatica_python.egg-info/requires.txt +5 -0
- informatica_python-1.0.0/informatica_python.egg-info/top_level.txt +1 -0
- informatica_python-1.0.0/pyproject.toml +24 -0
- informatica_python-1.0.0/setup.cfg +4 -0
- informatica_python-1.0.0/tests/test_converter.py +260 -0
|
@@ -0,0 +1,118 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: informatica-python
|
|
3
|
+
Version: 1.0.0
|
|
4
|
+
Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
|
|
5
|
+
License: MIT
|
|
6
|
+
Requires-Python: >=3.8
|
|
7
|
+
Description-Content-Type: text/markdown
|
|
8
|
+
Requires-Dist: lxml>=4.9.0
|
|
9
|
+
Requires-Dist: pyyaml>=6.0
|
|
10
|
+
Provides-Extra: dev
|
|
11
|
+
Requires-Dist: pytest>=7.0; extra == "dev"
|
|
12
|
+
|
|
13
|
+
# informatica-python
|
|
14
|
+
|
|
15
|
+
Convert Informatica PowerCenter workflow XML files to Python/PySpark code.
|
|
16
|
+
|
|
17
|
+
## Installation
|
|
18
|
+
|
|
19
|
+
```bash
|
|
20
|
+
pip install informatica-python
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
## Quick Start
|
|
24
|
+
|
|
25
|
+
### Command Line
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
# Convert XML to Python files in a directory
|
|
29
|
+
informatica-python workflow.xml -o output_dir
|
|
30
|
+
|
|
31
|
+
# Convert XML to a zip file
|
|
32
|
+
informatica-python workflow.xml -z output.zip
|
|
33
|
+
|
|
34
|
+
# Use a different data library (pandas, dask, polars, vaex, modin)
|
|
35
|
+
informatica-python workflow.xml -o output_dir --data-lib polars
|
|
36
|
+
|
|
37
|
+
# Parse XML to JSON (no code generation)
|
|
38
|
+
informatica-python workflow.xml --json
|
|
39
|
+
|
|
40
|
+
# Save parsed JSON to file
|
|
41
|
+
informatica-python workflow.xml --json-file parsed.json
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
### Python API
|
|
45
|
+
|
|
46
|
+
```python
|
|
47
|
+
from informatica_python import InformaticaConverter
|
|
48
|
+
|
|
49
|
+
# Convert XML to Python files
|
|
50
|
+
converter = InformaticaConverter(data_lib="pandas")
|
|
51
|
+
converter.convert("workflow.xml", output_dir="output")
|
|
52
|
+
|
|
53
|
+
# Convert to zip
|
|
54
|
+
converter.convert("workflow.xml", output_zip="output.zip")
|
|
55
|
+
|
|
56
|
+
# Parse XML to JSON dict
|
|
57
|
+
result = converter.parse_file("workflow.xml")
|
|
58
|
+
|
|
59
|
+
# Parse XML string
|
|
60
|
+
result = converter.parse_string(xml_string)
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
## Generated Output Files
|
|
64
|
+
|
|
65
|
+
| File | Description |
|
|
66
|
+
|------|-------------|
|
|
67
|
+
| `helper_functions.py` | Database/file I/O functions plus Python equivalents for 50+ Informatica expression functions |
|
|
68
|
+
| `mapping_N.py` | One file per mapping with full transformation logic |
|
|
69
|
+
| `workflow.py` | Task orchestration with topological ordering |
|
|
70
|
+
| `config.yml` | Connection configs, source/target metadata, variables |
|
|
71
|
+
| `all_sql_queries.sql` | All extracted SQL queries (source qualifiers, lookups, pre/post SQL) |
|
|
72
|
+
| `error_log.txt` | Conversion summary, warnings, and coverage statistics |
|
|
73
|
+
|
|
74
|
+
## Supported Transformation Types
|
|
75
|
+
|
|
76
|
+
- Source Qualifier / Application Source Qualifier
|
|
77
|
+
- Expression
|
|
78
|
+
- Filter
|
|
79
|
+
- Aggregator
|
|
80
|
+
- Sorter
|
|
81
|
+
- Joiner
|
|
82
|
+
- Lookup Procedure
|
|
83
|
+
- Router
|
|
84
|
+
- Union
|
|
85
|
+
- Update Strategy
|
|
86
|
+
- Sequence Generator
|
|
87
|
+
- Normalizer
|
|
88
|
+
- Rank
|
|
89
|
+
- Stored Procedure (placeholder)
|
|
90
|
+
- Custom Transformation (placeholder)
|
|
91
|
+
- Java Transformation (placeholder)
|
|
92
|
+
- SQL Transformation
|
|
93
|
+
|
|
94
|
+
## Supported Data Libraries
|
|
95
|
+
|
|
96
|
+
Choose your preferred data manipulation library with `--data-lib`:
|
|
97
|
+
|
|
98
|
+
- **pandas** (default) — Standard Python data analysis
|
|
99
|
+
- **dask** — Parallel computing with pandas-like API
|
|
100
|
+
- **polars** — Fast DataFrame library written in Rust
|
|
101
|
+
- **vaex** — Out-of-core DataFrames for large datasets
|
|
102
|
+
- **modin** — Drop-in pandas replacement with parallel execution
|
|
103
|
+
|
|
104
|
+
## Informatica Expression Functions
|
|
105
|
+
|
|
106
|
+
The generated `helper_functions.py` includes Python equivalents for:
|
|
107
|
+
|
|
108
|
+
`IIF`, `DECODE`, `NVL`, `NVL2`, `ISNULL`, `LTRIM`, `RTRIM`, `UPPER`, `LOWER`, `SUBSTR`, `LPAD`, `RPAD`, `TO_CHAR`, `TO_DATE`, `TO_INTEGER`, `TO_BIGINT`, `TO_FLOAT`, `TO_DECIMAL`, `REPLACECHR`, `REPLACESTR`, `INSTR`, `LENGTH`, `CONCAT`, `REG_EXTRACT`, `REG_MATCH`, `REG_REPLACE`, `GET_DATE_PART`, `ADD_TO_DATE`, `IS_DATE`, `IS_NUMBER`, `IS_SPACES`, `SYSDATE`, `ERROR`, `ABORT`, and more.
|
|
109
|
+
|
|
110
|
+
## Requirements
|
|
111
|
+
|
|
112
|
+
- Python >= 3.8
|
|
113
|
+
- lxml >= 4.9.0
|
|
114
|
+
- PyYAML >= 6.0
|
|
115
|
+
|
|
116
|
+
## License
|
|
117
|
+
|
|
118
|
+
MIT
|
|
@@ -0,0 +1,106 @@
|
|
|
1
|
+
# informatica-python
|
|
2
|
+
|
|
3
|
+
Convert Informatica PowerCenter workflow XML files to Python/PySpark code.
|
|
4
|
+
|
|
5
|
+
## Installation
|
|
6
|
+
|
|
7
|
+
```bash
|
|
8
|
+
pip install informatica-python
|
|
9
|
+
```
|
|
10
|
+
|
|
11
|
+
## Quick Start
|
|
12
|
+
|
|
13
|
+
### Command Line
|
|
14
|
+
|
|
15
|
+
```bash
|
|
16
|
+
# Convert XML to Python files in a directory
|
|
17
|
+
informatica-python workflow.xml -o output_dir
|
|
18
|
+
|
|
19
|
+
# Convert XML to a zip file
|
|
20
|
+
informatica-python workflow.xml -z output.zip
|
|
21
|
+
|
|
22
|
+
# Use a different data library (pandas, dask, polars, vaex, modin)
|
|
23
|
+
informatica-python workflow.xml -o output_dir --data-lib polars
|
|
24
|
+
|
|
25
|
+
# Parse XML to JSON (no code generation)
|
|
26
|
+
informatica-python workflow.xml --json
|
|
27
|
+
|
|
28
|
+
# Save parsed JSON to file
|
|
29
|
+
informatica-python workflow.xml --json-file parsed.json
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
### Python API
|
|
33
|
+
|
|
34
|
+
```python
|
|
35
|
+
from informatica_python import InformaticaConverter
|
|
36
|
+
|
|
37
|
+
# Convert XML to Python files
|
|
38
|
+
converter = InformaticaConverter(data_lib="pandas")
|
|
39
|
+
converter.convert("workflow.xml", output_dir="output")
|
|
40
|
+
|
|
41
|
+
# Convert to zip
|
|
42
|
+
converter.convert("workflow.xml", output_zip="output.zip")
|
|
43
|
+
|
|
44
|
+
# Parse XML to JSON dict
|
|
45
|
+
result = converter.parse_file("workflow.xml")
|
|
46
|
+
|
|
47
|
+
# Parse XML string
|
|
48
|
+
result = converter.parse_string(xml_string)
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
## Generated Output Files
|
|
52
|
+
|
|
53
|
+
| File | Description |
|
|
54
|
+
|------|-------------|
|
|
55
|
+
| `helper_functions.py` | Database/file I/O functions plus Python equivalents for 50+ Informatica expression functions |
|
|
56
|
+
| `mapping_N.py` | One file per mapping with full transformation logic |
|
|
57
|
+
| `workflow.py` | Task orchestration with topological ordering |
|
|
58
|
+
| `config.yml` | Connection configs, source/target metadata, variables |
|
|
59
|
+
| `all_sql_queries.sql` | All extracted SQL queries (source qualifiers, lookups, pre/post SQL) |
|
|
60
|
+
| `error_log.txt` | Conversion summary, warnings, and coverage statistics |
|
|
61
|
+
|
|
62
|
+
## Supported Transformation Types
|
|
63
|
+
|
|
64
|
+
- Source Qualifier / Application Source Qualifier
|
|
65
|
+
- Expression
|
|
66
|
+
- Filter
|
|
67
|
+
- Aggregator
|
|
68
|
+
- Sorter
|
|
69
|
+
- Joiner
|
|
70
|
+
- Lookup Procedure
|
|
71
|
+
- Router
|
|
72
|
+
- Union
|
|
73
|
+
- Update Strategy
|
|
74
|
+
- Sequence Generator
|
|
75
|
+
- Normalizer
|
|
76
|
+
- Rank
|
|
77
|
+
- Stored Procedure (placeholder)
|
|
78
|
+
- Custom Transformation (placeholder)
|
|
79
|
+
- Java Transformation (placeholder)
|
|
80
|
+
- SQL Transformation
|
|
81
|
+
|
|
82
|
+
## Supported Data Libraries
|
|
83
|
+
|
|
84
|
+
Choose your preferred data manipulation library with `--data-lib`:
|
|
85
|
+
|
|
86
|
+
- **pandas** (default) — Standard Python data analysis
|
|
87
|
+
- **dask** — Parallel computing with pandas-like API
|
|
88
|
+
- **polars** — Fast DataFrame library written in Rust
|
|
89
|
+
- **vaex** — Out-of-core DataFrames for large datasets
|
|
90
|
+
- **modin** — Drop-in pandas replacement with parallel execution
|
|
91
|
+
|
|
92
|
+
## Informatica Expression Functions
|
|
93
|
+
|
|
94
|
+
The generated `helper_functions.py` includes Python equivalents for:
|
|
95
|
+
|
|
96
|
+
`IIF`, `DECODE`, `NVL`, `NVL2`, `ISNULL`, `LTRIM`, `RTRIM`, `UPPER`, `LOWER`, `SUBSTR`, `LPAD`, `RPAD`, `TO_CHAR`, `TO_DATE`, `TO_INTEGER`, `TO_BIGINT`, `TO_FLOAT`, `TO_DECIMAL`, `REPLACECHR`, `REPLACESTR`, `INSTR`, `LENGTH`, `CONCAT`, `REG_EXTRACT`, `REG_MATCH`, `REG_REPLACE`, `GET_DATE_PART`, `ADD_TO_DATE`, `IS_DATE`, `IS_NUMBER`, `IS_SPACES`, `SYSDATE`, `ERROR`, `ABORT`, and more.
|
|
97
|
+
|
|
98
|
+
## Requirements
|
|
99
|
+
|
|
100
|
+
- Python >= 3.8
|
|
101
|
+
- lxml >= 4.9.0
|
|
102
|
+
- PyYAML >= 6.0
|
|
103
|
+
|
|
104
|
+
## License
|
|
105
|
+
|
|
106
|
+
MIT
|
|
@@ -0,0 +1,83 @@
|
|
|
1
|
+
import argparse
|
|
2
|
+
import sys
|
|
3
|
+
import json
|
|
4
|
+
from informatica_python.converter import InformaticaConverter
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+
def main():
    """CLI entry point: parse arguments, then either dump parsed JSON or generate code.

    Exits with status 1 (message on stderr) if parsing or conversion raises.
    """
    parser = argparse.ArgumentParser(
        prog="informatica-python",
        description="Convert Informatica PowerCenter workflow XML to Python/PySpark code",
    )

    parser.add_argument(
        "input_file",
        help="Path to Informatica workflow XML file",
    )
    parser.add_argument(
        "-o", "--output",
        default="output",
        help="Output directory for generated files (default: output)",
    )
    parser.add_argument(
        "-z", "--zip",
        default=None,
        help="Output as zip file (provide zip file path)",
    )
    parser.add_argument(
        "--data-lib",
        choices=["pandas", "dask", "polars", "vaex", "modin"],
        default="pandas",
        help="Data manipulation library to use (default: pandas)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="output_json",
        help="Output parsed XML as JSON (no code generation)",
    )
    parser.add_argument(
        "--json-file",
        default=None,
        help="Save parsed JSON to a file",
    )

    args = parser.parse_args()

    converter = InformaticaConverter(data_lib=args.data_lib)

    try:
        if args.output_json or args.json_file:
            # JSON mode: parse only, no code generation.
            result = converter.parse_file(args.input_file)
            json_str = json.dumps(result, indent=2, ensure_ascii=False)
            if args.json_file:
                with open(args.json_file, "w", encoding="utf-8") as f:
                    f.write(json_str)
                print(f"JSON saved to: {args.json_file}")
            else:
                print(json_str)
        else:
            output_path = converter.convert(
                args.input_file,
                output_dir=args.output,
                output_zip=args.zip,
            )
            print(f"Conversion complete! Output: {output_path}")
            # Plain string literal: was an f-string with no placeholders (ruff F541).
            print("Files generated:")
            if args.zip:
                import zipfile
                with zipfile.ZipFile(output_path, "r") as zf:
                    for name in zf.namelist():
                        print(f" - {name}")
            else:
                import os
                # 'fname' (was 'f') avoids shadowing the file-handle name used above.
                for fname in sorted(os.listdir(output_path)):
                    print(f" - {fname}")

    except Exception as e:
        # Single top-level error boundary for the CLI: report and exit non-zero.
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
|
|
@@ -0,0 +1,285 @@
|
|
|
1
|
+
import os
|
|
2
|
+
import json
|
|
3
|
+
import zipfile
|
|
4
|
+
import tempfile
|
|
5
|
+
from typing import Optional
|
|
6
|
+
from informatica_python.parser import InformaticaParser
|
|
7
|
+
from informatica_python.models import PowermartDef, FolderDef
|
|
8
|
+
from informatica_python.generators.helper_gen import generate_helper_functions
|
|
9
|
+
from informatica_python.generators.mapping_gen import generate_mapping_code
|
|
10
|
+
from informatica_python.generators.workflow_gen import generate_workflow_code
|
|
11
|
+
from informatica_python.generators.config_gen import generate_config
|
|
12
|
+
from informatica_python.generators.sql_gen import generate_sql_file
|
|
13
|
+
from informatica_python.generators.error_log_gen import generate_error_log
|
|
14
|
+
|
|
15
|
+
|
|
16
|
+
class InformaticaConverter:
    """Convert Informatica PowerCenter workflow XML into generated Python code.

    Parses workflow XML (from a file or a string) into model objects, then
    emits one set of generated files (helpers, mapping_N.py, workflow, config,
    SQL, error log) per folder — either into a directory or a zip archive.
    """

    def __init__(self, data_lib: str = "pandas"):
        """data_lib: target data library for the generated code
        (pandas / dask / polars / vaex / modin)."""
        self.data_lib = data_lib
        self.parser = InformaticaParser()
        # Most recently parsed PowermartDef; None until a parse/convert call.
        self.powermart = None

    def parse_file(self, file_path: str) -> dict:
        """Parse an XML file and return the model as a JSON-serializable dict."""
        self.powermart = self.parser.parse_file(file_path)
        return self.to_json()

    def parse_string(self, xml_string: str) -> dict:
        """Parse an XML string and return the model as a JSON-serializable dict."""
        self.powermart = self.parser.parse_string(xml_string)
        return self.to_json()

    def to_json(self) -> dict:
        """Serialize the last parsed model; empty dict if nothing parsed yet."""
        if not self.powermart:
            return {}
        return self._powermart_to_dict(self.powermart)

    def convert(self, file_path: str, output_dir: str = "output",
                output_zip: Optional[str] = None) -> str:
        """Parse *file_path* and generate code into *output_dir* (or *output_zip*).

        Returns the produced path. Raises ValueError when the XML contains no
        repository or no folder.
        """
        self.powermart = self.parser.parse_file(file_path)
        # "XML file" keeps the original error wording of the file-based API.
        return self._generate_all(output_dir, output_zip, source_desc="XML file")

    def convert_string(self, xml_string: str, output_dir: str = "output",
                       output_zip: Optional[str] = None) -> str:
        """Same as convert(), but the workflow XML is passed as a string."""
        self.powermart = self.parser.parse_string(xml_string)
        # "XML" keeps the original error wording of the string-based API.
        return self._generate_all(output_dir, output_zip, source_desc="XML")

    def _generate_all(self, output_dir: str, output_zip: Optional[str],
                      source_desc: str) -> str:
        """Shared post-parse pipeline for convert()/convert_string().

        This logic was previously duplicated verbatim in both public methods;
        *source_desc* preserves each method's original error-message wording.
        """
        if not self.powermart.repositories:
            raise ValueError(f"No repository found in {source_desc}")

        all_folders = [folder
                       for repo in self.powermart.repositories
                       for folder in repo.folders]

        if not all_folders:
            raise ValueError(f"No folder found in {source_desc}")

        # Single folder: write directly to the requested dir/zip.
        if len(all_folders) == 1:
            return self._convert_folder(all_folders[0], output_dir, output_zip)

        # Multiple folders: one sub-directory (or name-suffixed zip) per folder.
        result_path = output_dir if not output_zip else os.path.dirname(output_zip) or "."
        for folder in all_folders:
            folder_dir = os.path.join(output_dir, folder.name)
            folder_zip = None
            if output_zip:
                base, ext = os.path.splitext(output_zip)
                folder_zip = f"{base}_{folder.name}{ext}"
            self._convert_folder(folder, folder_dir, folder_zip)
        return result_path

    def _convert_folder(self, folder: FolderDef, output_dir: str,
                        output_zip: Optional[str] = None) -> str:
        """Generate every output file for one folder and write dir or zip."""
        files = {}

        files["helper_functions.py"] = generate_helper_functions(folder, self.data_lib)

        # One mapping_N.py per mapping, numbered from 1.
        for i, mapping in enumerate(folder.mappings, 1):
            code = generate_mapping_code(mapping, folder, self.data_lib, i)
            files[f"mapping_{i}.py"] = code

        files["workflow.py"] = generate_workflow_code(folder)

        files["config.yml"] = generate_config(folder, self.data_lib)

        files["all_sql_queries.sql"] = generate_sql_file(folder)

        files["error_log.txt"] = generate_error_log(
            folder,
            parser_errors=self.parser.errors,
            parser_warnings=self.parser.warnings,
        )

        if output_zip:
            return self._write_zip(files, output_zip)
        return self._write_files(files, output_dir)

    def _write_files(self, files: dict, output_dir: str) -> str:
        """Write each filename->content entry into *output_dir*; return it."""
        os.makedirs(output_dir, exist_ok=True)
        for filename, content in files.items():
            filepath = os.path.join(output_dir, filename)
            with open(filepath, "w", encoding="utf-8") as f:
                f.write(content)
        return output_dir

    def _write_zip(self, files: dict, zip_path: str) -> str:
        """Write each filename->content entry into a fresh zip; return its path."""
        os.makedirs(os.path.dirname(zip_path) or ".", exist_ok=True)
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for filename, content in files.items():
                zf.writestr(filename, content)
        return zip_path

    def _powermart_to_dict(self, pm: PowermartDef) -> dict:
        """Serialize the top-level POWERMART element and its repositories."""
        result = {
            "creation_date": pm.creation_date,
            "repository_version": pm.repository_version,
            "repositories": [],
        }
        for repo in pm.repositories:
            repo_dict = {
                "name": repo.name,
                "version": repo.version,
                "codepage": repo.codepage,
                "database_type": repo.database_type,
                "folders": [self._folder_to_dict(folder) for folder in repo.folders],
            }
            result["repositories"].append(repo_dict)
        return result

    def _folder_to_dict(self, folder: FolderDef) -> dict:
        """Serialize one folder and all its contained objects."""
        return {
            "name": folder.name,
            "owner": folder.owner,
            "description": folder.description,
            "sources": [self._source_to_dict(s) for s in folder.sources],
            "targets": [self._target_to_dict(t) for t in folder.targets],
            "mappings": [self._mapping_to_dict(m) for m in folder.mappings],
            "sessions": [{"name": s.name, "mapping_name": s.mapping_name} for s in folder.sessions],
            "workflows": [self._workflow_to_dict(w) for w in folder.workflows],
            "tasks": [{"name": t.name, "type": t.type} for t in folder.tasks],
            "configs": [{"name": c.name} for c in folder.configs],
            "schedulers": [{"name": s.name} for s in folder.schedulers],
            "shortcuts": [{"name": s.name, "reference": s.reference_name} for s in folder.shortcuts],
            "mapplets": [{"name": m.name} for m in folder.mapplets],
        }

    @staticmethod
    def _field_to_dict(f) -> dict:
        """Field serialization shared by sources and targets (was duplicated)."""
        return {
            "name": f.name,
            "datatype": f.datatype,
            "precision": f.precision,
            "scale": f.scale,
            "nullable": f.nullable,
            "keytype": f.keytype,
        }

    def _source_to_dict(self, src):
        """Serialize a source definition (sources also carry db/owner names)."""
        return {
            "name": src.name,
            "database_type": src.database_type,
            "db_name": src.db_name,
            "owner_name": src.owner_name,
            "fields": [self._field_to_dict(f) for f in src.fields],
        }

    def _target_to_dict(self, tgt):
        """Serialize a target definition."""
        return {
            "name": tgt.name,
            "database_type": tgt.database_type,
            "fields": [self._field_to_dict(f) for f in tgt.fields],
        }

    def _mapping_to_dict(self, mapping):
        """Serialize one mapping: transformations, connectors, instances, variables."""
        return {
            "name": mapping.name,
            "description": mapping.description,
            "is_valid": mapping.is_valid,
            "transformations": [
                {
                    "name": tx.name,
                    "type": tx.type,
                    "fields": [
                        {
                            "name": f.name,
                            "datatype": f.datatype,
                            "expression": f.expression,
                            "porttype": f.porttype,
                        }
                        for f in tx.fields
                    ],
                    "attributes": [
                        {"name": a.name, "value": a.value}
                        for a in tx.attributes
                    ],
                }
                for tx in mapping.transformations
            ],
            "connectors": [
                {
                    "from_field": c.from_field,
                    "from_instance": c.from_instance,
                    "to_field": c.to_field,
                    "to_instance": c.to_instance,
                }
                for c in mapping.connectors
            ],
            "instances": [
                {
                    "name": i.name,
                    "type": i.type,
                    "transformation_name": i.transformation_name,
                }
                for i in mapping.instances
            ],
            "variables": [
                {
                    "name": v.name,
                    "datatype": v.datatype,
                    "default_value": v.default_value,
                }
                for v in mapping.variables
            ],
        }

    def _workflow_to_dict(self, wf):
        """Serialize one workflow: task instances, links, variables."""
        return {
            "name": wf.name,
            "description": wf.description,
            "is_valid": wf.is_valid,
            "task_instances": [
                {
                    "name": t.name,
                    "task_name": t.task_name,
                    "task_type": t.task_type,
                }
                for t in wf.task_instances
            ],
            "links": [
                {
                    "from": l.from_instance,
                    "to": l.to_instance,
                    "condition": l.condition,
                }
                for l in wf.links
            ],
            "variables": [
                {
                    "name": v.name,
                    "datatype": v.datatype,
                    "default_value": v.default_value,
                }
                for v in wf.variables
            ],
        }
|
|
File without changes
|