informatica-python 1.8.0__tar.gz → 1.8.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {informatica_python-1.8.0 → informatica_python-1.8.2}/PKG-INFO +158 -18
- informatica_python-1.8.2/README.md +341 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/__init__.py +1 -1
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/mapping_gen.py +16 -4
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python.egg-info/PKG-INFO +158 -18
- {informatica_python-1.8.0 → informatica_python-1.8.2}/pyproject.toml +1 -1
- {informatica_python-1.8.0 → informatica_python-1.8.2}/tests/test_integration.py +1 -1
- informatica_python-1.8.0/README.md +0 -201
- {informatica_python-1.8.0 → informatica_python-1.8.2}/LICENSE +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/cli.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/converter.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/__init__.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/config_gen.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/error_log_gen.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/helper_gen.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/sql_gen.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/workflow_gen.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/models.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/parser.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/utils/__init__.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/utils/datatype_map.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/utils/expression_converter.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/utils/lib_adapters.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/utils/sql_dialect.py +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python.egg-info/SOURCES.txt +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python.egg-info/dependency_links.txt +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python.egg-info/entry_points.txt +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python.egg-info/requires.txt +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python.egg-info/top_level.txt +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/setup.cfg +0 -0
- {informatica_python-1.8.0 → informatica_python-1.8.2}/tests/test_converter.py +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: informatica-python
-Version: 1.8.0
+Version: 1.8.2
 Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
 Author: Nick
 License: MIT
@@ -59,6 +59,12 @@ informatica-python workflow_export.xml -z output.zip
 # Use a different data library
 informatica-python workflow_export.xml -o output_dir --data-lib polars
 
+# Include a parameter file
+informatica-python workflow_export.xml -o output_dir --param-file workflow.param
+
+# Enable data quality validation on type casts
+informatica-python workflow_export.xml -o output_dir --validate-casts
+
 # Parse to JSON only (no code generation)
 informatica-python workflow_export.xml --json
 
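For readers unfamiliar with Informatica parameter files: the new `--param-file` option consumes `.param` files made of a `[Global]` section plus session-scoped sections of `$$VAR=value` pairs. A minimal sketch of how such a file could be parsed, with invented section names and variables; the package's own parser may differ:

```python
import configparser

# Hypothetical .param content; folder/workflow/session names are made up
PARAM_TEXT = """\
[Global]
$$SRC_DIR=/data/incoming

[Sales.WF:wf_daily_load.ST:s_m_load_customers]
$$LOAD_DATE=2024-01-31
$$SRC_DIR=/data/sales
"""

def parse_param_file(text):
    """Parse an Informatica-style .param file into {section: {var: value}}."""
    cp = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    cp.optionxform = str  # keep $$VAR names case-sensitive
    cp.read_string(text)
    return {section: dict(cp[section]) for section in cp.sections()}

params = parse_param_file(PARAM_TEXT)
# Session-scoped values override [Global] ones
merged = {**params["Global"],
          **params["Sales.WF:wf_daily_load.ST:s_m_load_customers"]}
print(merged["$$SRC_DIR"])
```

The `delimiters=("=",)` restriction matters because session section names and values may themselves contain `:`.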
@@ -90,12 +96,12 @@ converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars
 
 | File | Description |
 |------|-------------|
-| `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions) |
-| `mapping_N.py` | One per mapping — transformation logic, source reads, target writes |
-| `workflow.py` | Task orchestration with topological ordering and error handling |
+| `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions), window/analytic functions, stored procedure execution, state persistence |
+| `mapping_N.py` | One per mapping — transformation logic with row-count logging, source reads, target writes, inline documentation |
+| `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
 | `config.yml` | Connection configs, source/target metadata, runtime parameters |
-| `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms |
-| `error_log.txt` | Conversion summary,
+| `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms (with ANSI-translated variants) |
+| `error_log.txt` | Conversion summary with unsupported transform analysis, unmapped port detection, and unknown expression function tracing |
 
 ## Supported Data Libraries
 
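A toy illustration of the kind of rewrite behind the "ANSI-translated variants" noted for `all_sql_queries.sql`. This is not the package's implementation: a real translator needs SQL-aware parsing, and these token-level substitutions only cover trivially regular cases like NVL and SYSDATE.

```python
import re

# Toy Oracle -> ANSI rewrites; word boundaries avoid mangling identifiers
ORACLE_RULES = [
    (r"\bNVL\s*\(", "COALESCE("),
    (r"\bSYSDATE\b", "CURRENT_TIMESTAMP"),
]

def to_ansi(sql):
    """Apply simple token-level Oracle -> ANSI substitutions."""
    for pattern, replacement in ORACLE_RULES:
        sql = re.sub(pattern, replacement, sql, flags=re.IGNORECASE)
    return sql

print(to_ansi("SELECT NVL(AMOUNT, 0), SYSDATE FROM ORDERS"))
# SELECT COALESCE(AMOUNT, 0), CURRENT_TIMESTAMP FROM ORDERS
```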
@@ -113,21 +119,21 @@ Select via `--data-lib` CLI flag or `data_lib` parameter:
 
 The code generator produces real, runnable Python for these transformation types:
 
-- **Source Qualifier** — SQL override, pre/post SQL, column selection
-- **Expression** — Field-level expressions converted to pandas operations
-- **Filter** — Row filtering with converted conditions
-- **Joiner** — `pd.merge()` with join type and condition parsing
-- **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads
-- **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST
+- **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides
+- **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style)
+- **Filter** — Row filtering with vectorized converted conditions
+- **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
+- **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads, multiple match policies, default values
+- **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
 - **Sorter** — `sort_values()` with multi-key ascending/descending
-- **Router** — Multi-group conditional routing with
+- **Router** — Multi-group conditional routing with named groups
 - **Union** — `pd.concat()` across multiple input groups
-- **Update Strategy** —
+- **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys
 - **Sequence Generator** — Auto-incrementing ID columns
 - **Normalizer** — `pd.melt()` with auto-detected id/value vars
 - **Rank** — `groupby().rank()` with Top-N filtering
-- **Stored Procedure** —
-- **Transaction Control** — Commit/rollback logic
+- **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
+- **Transaction Control** — Commit/rollback logic
 - **Custom / Java** — Placeholder stubs with TODO markers
 - **SQL Transform** — Direct SQL execution pass-through
 
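The Update Strategy entry above can be pictured as a row split keyed on the standard Informatica strategy constants (0=INSERT, 1=UPDATE, 2=DELETE, 3=REJECT). A sketch with invented column and table names, not the package's generated code:

```python
import pandas as pd

# Standard Informatica update-strategy constants
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

df = pd.DataFrame({
    "CUST_ID":   [1, 2, 3, 4],
    "STATUS":    ["new", "changed", "closed", "bad"],
    "_strategy": [DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT],
})

# Split rows by strategy flag; each group feeds its own SQL statement
groups = {flag: g.drop(columns="_strategy")
          for flag, g in df.groupby("_strategy")}

# Dialect-aware placeholder, as the bullet above describes
placeholder = {"mssql": "?", "postgresql": "%s"}["postgresql"]
insert_sql = (f"INSERT INTO TGT_CUSTOMERS (CUST_ID, STATUS) "
              f"VALUES ({placeholder}, {placeholder})")

print(len(groups[DD_INSERT]), insert_sql)
```

Rejected rows (`DD_REJECT`) are simply never routed to a statement.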
@@ -147,13 +153,118 @@ The code generator produces real, runnable Python for these transformation types
 
 ## Key Features
 
+### Row-Count Logging (v1.8+)
+
+Generated code automatically logs row counts at every step of the data pipeline:
+
+```
+Source SQ_CUSTOMERS: 10000 rows read
+EXP_CALC (Expression): 10000 input rows -> 10000 output rows
+FIL_ACTIVE (Filter): 10000 input rows -> 8542 output rows
+AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
+Target TGT_SUMMARY: 150 rows written
+```
+
+All row-count operations are backend-safe (wrapped in try/except), so Dask and other lazy-evaluation backends won't fail.
+
+### Generated Code Documentation (v1.8+)
+
+Every generated mapping function includes a rich docstring describing:
+- Mapping name and original Informatica description
+- Source and target tables/files
+- Transformation pipeline with field counts per step
+
+Each transformation block is annotated with:
+- Separator headers for visual scanning
+- Transform type and description (from Informatica XML)
+- Input and output field lists (truncated at 10 for readability)
+
+### Window / Analytic Functions (v1.7+)
+
+DataFrame-level analytic functions for aggregation transforms:
+- `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
+- `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
+- `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
+- `percentile_df(df, col, pct)` — quantile via `.quantile()`
+
+### Update Strategy with Target Operations (v1.7+)
+
+Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
+- Static strategies (0/1/2/3) map to INSERT/UPDATE/DELETE/REJECT
+- DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT expressions parsed from conditions
+- Target writer splits rows and routes to appropriate SQL operations
+- Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
+- Primary key columns auto-detected from target field definitions
+
+### Stored Procedure Execution (v1.7+)
+
+Full stored procedure code generation (not just stubs):
+- Oracle: `cursor.callproc()` with output parameter registration
+- MSSQL: `EXEC` with output parameter capture
+- Generic: `CALL` syntax for other databases
+- Input/output parameter mapping from transformation fields
+- Empty-input guard prevents errors on empty upstream DataFrames
+
+### State Persistence (v1.7+)
+
+JSON-based variable persistence between workflow runs:
+- `load_persistent_state()` / `save_persistent_state()` bracketing workflow execution
+- `get_persistent_variable()` / `set_persistent_variable()` scoped by workflow/mapping name
+- Mapping variables marked `is_persistent="YES"` automatically load from and save to state file
+- Non-persistent variables remain unaffected
+
+### SQL Dialect Translation (v1.6+)
+
+Automatically translates vendor-specific SQL to ANSI equivalents:
+- **Oracle:** NVL→COALESCE, SYSDATE→CURRENT_TIMESTAMP, DECODE→CASE, NVL2→CASE, (+)→ANSI JOIN, ROWNUM→LIMIT
+- **MSSQL:** GETDATE→CURRENT_TIMESTAMP, ISNULL→COALESCE, TOP N→LIMIT, LEN→LENGTH, CHARINDEX→POSITION
+- Auto-detects source dialect; outputs both original and translated SQL
+
+### Enhanced Error Reporting (v1.6+)
+
+Structured error log with three analysis sections:
+- **Unsupported Transforms:** Lists each skipped transform with type, field count, and attributes
+- **Unmapped Ports:** OUTPUT fields not connected to any downstream transform
+- **Unsupported Expression Functions:** Unknown functions with location traces
+
+### Nested Mapplet Support (v1.6+)
+
+Recursively expands mapplet-within-mapplet instances:
+- Double-underscore namespacing for nested transforms
+- Depth limit of 10 with circular reference protection
+- Connector rewiring through the full expansion tree
+
+### Data Quality Validation (v1.6+)
+
+Optional `--validate-casts` flag generates null-count checks before/after type casting:
+- Counts null values pre- and post-coercion per column
+- Logs warnings when coercion introduces new nulls
+- Helps identify data quality issues during test runs
+
+### Vectorized Expression Generation (v1.5+)
+
+Column-level pandas operations instead of row-level iteration:
+- IIF → `np.where()`, NVL → `.fillna()`, UPPER/LOWER → `.str.upper()/.str.lower()`
+- SUBSTR → `.str[start:end]`, TO_INTEGER → `pd.to_numeric()`, TO_DATE → `pd.to_datetime()`
+- IS NULL/IS NOT NULL → `.isna()`/`.notna()`
+
+### Parameter File Support (v1.5+)
+
+Standard Informatica `.param` file parsing:
+- `[Global]` and `[folder.WF:workflow.ST:session]` section support
+- `get_param(config, var_name)` resolution chain: config → env vars → defaults
+- CLI `--param-file` flag for specifying parameter files
+
 ### Session Connection Overrides (v1.4+)
+
 When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.
 
 ### Worklet Support (v1.4+)
+
 Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.
 
 ### Type Casting at Target Writes (v1.4+)
+
 Target field datatypes are mapped to pandas types and generate proper casting code:
 - Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
 - Dates: `pd.to_datetime(errors='coerce')`
@@ -161,12 +272,15 @@ Target field datatypes are mapped to pandas types and generate proper casting co
 - Booleans: `.astype('boolean')`
 
 ### Flat File Handling (v1.3+)
+
 Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
 ### Mapplet Inlining (v1.3+)
+
 Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
 
 ### Decision Tasks (v1.3+)
+
 Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
 
 ### Expression Converter (80+ Functions)
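For the fixed-width case mentioned under Flat File Handling, the generated `pd.read_fwf()` call would look something like this; the field widths and column names here are invented for illustration:

```python
import io
import pandas as pd

# A fixed-width extract: CUST_ID is 5 chars, NAME is 10, BALANCE is 8
raw = io.StringIO(
    "00001Alice       120.50\n"
    "00002Bob         980.00\n"
)

# widths mirrors the FLATFILE field metadata the parser would supply
df = pd.read_fwf(raw, widths=[5, 10, 8],
                 names=["CUST_ID", "NAME", "BALANCE"], header=None)
print(df["BALANCE"].sum())  # 1100.5
```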
@@ -190,6 +304,32 @@ Converts Informatica expressions to Python equivalents:
 
 ## Changelog
 
+### v1.8.x (Phase 7)
+- Row-count logging at every pipeline step (source reads, transforms, target writes)
+- Backend-safe logging (try/except wrapped for Dask/lazy backends)
+- Rich mapping function docstrings with sources, targets, and transform pipeline summary
+- Per-transform documentation headers with description, input/output field lists
+
+### v1.7.x (Phase 6)
+- Window/analytic functions (rolling avg/sum, cumulative sum, percentile)
+- Update Strategy routing with actual INSERT/UPDATE/DELETE target operations
+- Dialect-aware SQL placeholders for MSSQL/PostgreSQL/Oracle
+- Full stored procedure code generation (Oracle/MSSQL/generic)
+- JSON-based state persistence for mapping and workflow variables
+- Primary key auto-detection for update strategy targets
+
+### v1.6.x (Phase 5)
+- SQL dialect translation (Oracle/MSSQL → ANSI)
+- Enhanced error reporting (unsupported transforms, unmapped ports, unknown functions)
+- Nested mapplet expansion with circular reference protection
+- Data quality validation warnings on type casting (`--validate-casts`)
+
+### v1.5.x (Phase 4)
+- Parameter file support (`.param` files with section parsing)
+- Vectorized expression generation (column-level pandas operations)
+- Library-specific code adapters (polars/dask/modin/vaex syntax generation)
+- 72+ integration tests
+
 ### v1.4.x (Phase 3)
 - Session connection overrides for sources and targets
 - Worklet function generation with safe invocation
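The JSON-based state persistence added in v1.7 could be realized along these lines. The function names come from the feature list above, but the file layout and the workflow/mapping scope key are guesses:

```python
import json
import os
import tempfile

def load_persistent_state(path):
    """Return the saved state dict, or {} on the first run."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {}

def save_persistent_state(path, state):
    with open(path, "w") as fh:
        json.dump(state, fh, indent=2)

def set_persistent_variable(state, scope, name, value):
    state.setdefault(scope, {})[name] = value

def get_persistent_variable(state, scope, name, default=None):
    return state.get(scope, {}).get(name, default)

state_file = os.path.join(tempfile.mkdtemp(), "wf_state.json")

# Run 1: nothing saved yet, so we start from {}
state = load_persistent_state(state_file)
set_persistent_variable(state, "wf_daily.m_load_customers", "$$LAST_RUN_ID", 42)
save_persistent_state(state_file, state)

# Run 2: the variable survives the restart
state = load_persistent_state(state_file)
print(get_persistent_variable(state, "wf_daily.m_load_customers", "$$LAST_RUN_ID"))  # 42
```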
@@ -217,8 +357,8 @@ Converts Informatica expressions to Python equivalents:
 cd informatica_python
 pip install -e ".[dev]"
 
-# Run tests (
-pytest tests/
+# Run tests (136 tests)
+pytest tests/ -v
 ```
 
 ## License
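The `--validate-casts` behavior described earlier (null counts before and after coercion) amounts to a check like the following around each generated cast; the column name is illustrative and this is not the package's emitted code:

```python
import pandas as pd

df = pd.DataFrame({"QTY": ["5", "12", "n/a"]})

# Count nulls before and after coercion; warn if the cast created new ones
nulls_before = int(df["QTY"].isna().sum())
df["QTY"] = pd.to_numeric(df["QTY"], errors="coerce")
nulls_after = int(df["QTY"].isna().sum())

new_nulls = nulls_after - nulls_before
if new_nulls:
    print(f"WARNING: QTY cast introduced {new_nulls} new null(s)")
```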
@@ -0,0 +1,341 @@
|
|
|
1
|
+
# informatica-python
|
|
2
|
+
|
|
3
|
+
Convert Informatica PowerCenter workflow XML exports into clean, runnable Python/PySpark code.
|
|
4
|
+
|
|
5
|
+
**Author:** Nick
|
|
6
|
+
**License:** MIT
|
|
7
|
+
**PyPI:** [informatica-python](https://pypi.org/project/informatica-python/)
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## Overview
|
|
12
|
+
|
|
13
|
+
`informatica-python` parses Informatica PowerCenter XML export files and generates equivalent Python code using your choice of data library. It handles all 72 DTD tags from the PowerCenter XML schema and produces a complete, ready-to-run Python project.
|
|
14
|
+
|
|
15
|
+
## Installation
|
|
16
|
+
|
|
17
|
+
```bash
|
|
18
|
+
pip install informatica-python
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
## Quick Start
|
|
22
|
+
|
|
23
|
+
### Command Line
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
# Generate Python files to a directory
|
|
27
|
+
informatica-python workflow_export.xml -o output_dir
|
|
28
|
+
|
|
29
|
+
# Generate as a zip archive
|
|
30
|
+
informatica-python workflow_export.xml -z output.zip
|
|
31
|
+
|
|
32
|
+
# Use a different data library
|
|
33
|
+
informatica-python workflow_export.xml -o output_dir --data-lib polars
|
|
34
|
+
|
|
35
|
+
# Include a parameter file
|
|
36
|
+
informatica-python workflow_export.xml -o output_dir --param-file workflow.param
|
|
37
|
+
|
|
38
|
+
# Enable data quality validation on type casts
|
|
39
|
+
informatica-python workflow_export.xml -o output_dir --validate-casts
|
|
40
|
+
|
|
41
|
+
# Parse to JSON only (no code generation)
|
|
42
|
+
informatica-python workflow_export.xml --json
|
|
43
|
+
|
|
44
|
+
# Save parsed JSON to file
|
|
45
|
+
informatica-python workflow_export.xml --json-file parsed.json
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
### Python API
|
|
49
|
+
|
|
50
|
+
```python
|
|
51
|
+
from informatica_python import InformaticaConverter
|
|
52
|
+
|
|
53
|
+
converter = InformaticaConverter()
|
|
54
|
+
|
|
55
|
+
# Parse and generate files
|
|
56
|
+
converter.convert_to_files("workflow_export.xml", "output_dir")
|
|
57
|
+
|
|
58
|
+
# Parse and generate zip
|
|
59
|
+
converter.convert_to_zip("workflow_export.xml", "output.zip")
|
|
60
|
+
|
|
61
|
+
# Parse to structured dict
|
|
62
|
+
result = converter.parse_file("workflow_export.xml")
|
|
63
|
+
|
|
64
|
+
# Use a different data library
|
|
65
|
+
converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars")
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
## Generated Output Files
|
|
69
|
+
|
|
70
|
+
| File | Description |
|
|
71
|
+
|------|-------------|
|
|
72
|
+
| `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions), window/analytic functions, stored procedure execution, state persistence |
|
|
73
|
+
| `mapping_N.py` | One per mapping — transformation logic with row-count logging, source reads, target writes, inline documentation |
|
|
74
|
+
| `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
|
|
75
|
+
| `config.yml` | Connection configs, source/target metadata, runtime parameters |
|
|
76
|
+
| `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms (with ANSI-translated variants) |
|
|
77
|
+
| `error_log.txt` | Conversion summary with unsupported transform analysis, unmapped port detection, and unknown expression function tracing |
|
|
78
|
+
|
|
79
|
+
## Supported Data Libraries
|
|
80
|
+
|
|
81
|
+
Select via `--data-lib` CLI flag or `data_lib` parameter:
|
|
82
|
+
|
|
83
|
+
| Library | Flag | Best For |
|
|
84
|
+
|---------|------|----------|
|
|
85
|
+
| **pandas** | `pandas` (default) | General-purpose, most compatible |
|
|
86
|
+
| **dask** | `dask` | Large datasets, parallel processing |
|
|
87
|
+
| **polars** | `polars` | High performance, Rust-backed |
|
|
88
|
+
| **vaex** | `vaex` | Out-of-core, billion-row datasets |
|
|
89
|
+
| **modin** | `modin` | Drop-in pandas replacement, multi-core |
|
|
90
|
+
|
|
91
|
+
## Supported Transformations
|
|
92
|
+
|
|
93
|
+
The code generator produces real, runnable Python for these transformation types:
|
|
94
|
+
|
|
95
|
+
- **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides
|
|
96
|
+
- **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style)
|
|
97
|
+
- **Filter** — Row filtering with vectorized converted conditions
|
|
98
|
+
- **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
|
|
99
|
+
- **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads, multiple match policies, default values
|
|
100
|
+
- **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
|
|
101
|
+
- **Sorter** — `sort_values()` with multi-key ascending/descending
|
|
102
|
+
- **Router** — Multi-group conditional routing with named groups
|
|
103
|
+
- **Union** — `pd.concat()` across multiple input groups
|
|
104
|
+
- **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys
|
|
105
|
+
- **Sequence Generator** — Auto-incrementing ID columns
|
|
106
|
+
- **Normalizer** — `pd.melt()` with auto-detected id/value vars
|
|
107
|
+
- **Rank** — `groupby().rank()` with Top-N filtering
|
|
108
|
+
- **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
|
|
109
|
+
- **Transaction Control** — Commit/rollback logic
|
|
110
|
+
- **Custom / Java** — Placeholder stubs with TODO markers
|
|
111
|
+
- **SQL Transform** — Direct SQL execution pass-through
|
|
112
|
+
|
|
113
|
+
## Supported XML Tags (72 Tags)
|
|
114
|
+
|
|
115
|
+
**Top-level:** POWERMART, REPOSITORY, FOLDER, FOLDERVERSION
|
|
116
|
+
|
|
117
|
+
**Source/Target:** SOURCE, SOURCEFIELD, TARGET, TARGETFIELD, TARGETINDEX, TARGETINDEXFIELD, FLATFILE, XMLINFO, XMLTEXT, GROUP, TABLEATTRIBUTE, FIELDATTRIBUTE, METADATAEXTENSION, KEYWORD, ERPSRCINFO
|
|
118
|
+
|
|
119
|
+
**Mapping/Mapplet:** MAPPING, MAPPLET, TRANSFORMATION, TRANSFORMFIELD, TRANSFORMFIELDATTR, TRANSFORMFIELDATTRDEF, INSTANCE, ASSOCIATED_SOURCE_INSTANCE, CONNECTOR, MAPDEPENDENCY, TARGETLOADORDER, MAPPINGVARIABLE, FIELDDEPENDENCY, INITPROP, ERPINFO
|
|
120
|
+
|
|
121
|
+
**Task/Session/Workflow:** TASK, TIMER, VALUEPAIR, SCHEDULER, SCHEDULEINFO, STARTOPTIONS, ENDOPTIONS, SCHEDULEOPTIONS, RECURRING, CUSTOM, DAILYFREQUENCY, REPEAT, FILTER, SESSION, CONFIGREFERENCE, SESSTRANSFORMATIONINST, SESSTRANSFORMATIONGROUP, PARTITION, HASHKEY, KEYRANGE, CONFIG, SESSIONCOMPONENT, CONNECTIONREFERENCE, TASKINSTANCE, WORKFLOWLINK, WORKFLOWVARIABLE, WORKFLOWEVENT, WORKLET, WORKFLOW, ATTRIBUTE
|
|
122
|
+
|
|
123
|
+
**Shortcut:** SHORTCUT
|
|
124
|
+
|
|
125
|
+
**SAP:** SAPFUNCTION, SAPSTRUCTURE, SAPPROGRAM, SAPOUTPUTPORT, SAPVARIABLE, SAPPROGRAMFLOWOBJECT, SAPTABLEPARAM
|
|
126
|
+
|
|
127
|
+
## Key Features
|
|
128
|
+
|
|
129
|
+
### Row-Count Logging (v1.8+)
|
|
130
|
+
|
|
131
|
+
Generated code automatically logs row counts at every step of the data pipeline:
|
|
132
|
+
|
|
133
|
+
```
|
|
134
|
+
Source SQ_CUSTOMERS: 10000 rows read
|
|
135
|
+
EXP_CALC (Expression): 10000 input rows -> 10000 output rows
|
|
136
|
+
FIL_ACTIVE (Filter): 10000 input rows -> 8542 output rows
|
|
137
|
+
AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
|
|
138
|
+
Target TGT_SUMMARY: 150 rows written
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
All row-count operations are backend-safe (wrapped in try/except), so Dask and other lazy-evaluation backends won't fail.
|
|
142
|
+
|
|
143
|
+
### Generated Code Documentation (v1.8+)
|
|
144
|
+
|
|
145
|
+
Every generated mapping function includes a rich docstring describing:
|
|
146
|
+
- Mapping name and original Informatica description
|
|
147
|
+
- Source and target tables/files
|
|
148
|
+
- Transformation pipeline with field counts per step
|
|
149
|
+
|
|
150
|
+
Each transformation block is annotated with:
|
|
151
|
+
- Separator headers for visual scanning
|
|
152
|
+
- Transform type and description (from Informatica XML)
|
|
153
|
+
- Input and output field lists (truncated at 10 for readability)
|
|
154
|
+
|
|
155
|
+
### Window / Analytic Functions (v1.7+)
|
|
156
|
+
|
|
157
|
+
DataFrame-level analytic functions for aggregation transforms:
|
|
158
|
+
- `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
|
|
159
|
+
- `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
|
|
160
|
+
- `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
|
|
161
|
+
- `percentile_df(df, col, pct)` — quantile via `.quantile()`
|
|
162
|
+
|
|
163
|
+
### Update Strategy with Target Operations (v1.7+)
|
|
164
|
+
|
|
165
|
+
Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
|
|
166
|
+
- Static strategies (0/1/2/3) map to INSERT/UPDATE/DELETE/REJECT
|
|
167
|
+
- DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT expressions parsed from conditions
|
|
168
|
+
- Target writer splits rows and routes to appropriate SQL operations
|
|
169
|
+
- Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
|
|
170
|
+
- Primary key columns auto-detected from target field definitions
|
|
171
|
+
|
|
172
|
+
### Stored Procedure Execution (v1.7+)
|
|
173
|
+
|
|
174
|
+
Full stored procedure code generation (not just stubs):
|
|
175
|
+
- Oracle: `cursor.callproc()` with output parameter registration
|
|
176
|
+
- MSSQL: `EXEC` with output parameter capture
|
|
177
|
+
- Generic: `CALL` syntax for other databases
|
|
178
|
+
- Input/output parameter mapping from transformation fields
|
|
179
|
+
- Empty-input guard prevents errors on empty upstream DataFrames
|
|
180
|
+
|
|
181
|
+
### State Persistence (v1.7+)
|
|
182
|
+
|
|
183
|
+
JSON-based variable persistence between workflow runs:
|
|
184
|
+
- `load_persistent_state()` / `save_persistent_state()` bracketing workflow execution
|
|
185
|
+
- `get_persistent_variable()` / `set_persistent_variable()` scoped by workflow/mapping name
|
|
186
|
+
- Mapping variables marked `is_persistent="YES"` automatically load from and save to state file
|
|
187
|
+
- Non-persistent variables remain unaffected
|
|
188
|
+
|
|
189
|
+
### SQL Dialect Translation (v1.6+)
|
|
190
|
+
|
|
191
|
+
Automatically translates vendor-specific SQL to ANSI equivalents:
|
|
192
|
+
- **Oracle:** NVL→COALESCE, SYSDATE→CURRENT_TIMESTAMP, DECODE→CASE, NVL2→CASE, (+)→ANSI JOIN, ROWNUM→LIMIT
|
|
193
|
+
- **MSSQL:** GETDATE→CURRENT_TIMESTAMP, ISNULL→COALESCE, TOP N→LIMIT, LEN→LENGTH, CHARINDEX→POSITION
|
|
194
|
+
- Auto-detects source dialect; outputs both original and translated SQL
|
|
195
|
+
|
|
196
|
+
### Enhanced Error Reporting (v1.6+)
|
|
197
|
+
|
|
198
|
+
Structured error log with three analysis sections:
|
|
199
|
+
- **Unsupported Transforms:** Lists each skipped transform with type, field count, and attributes
|
|
200
|
+
- **Unmapped Ports:** OUTPUT fields not connected to any downstream transform
|
|
201
|
+
- **Unsupported Expression Functions:** Unknown functions with location traces
|
|
202
|
+
|
|
203
|
+
### Nested Mapplet Support (v1.6+)
|
|
204
|
+
|
|
205
|
+
Recursively expands mapplet-within-mapplet instances:
|
|
206
|
+
- Double-underscore namespacing for nested transforms
|
|
207
|
+
- Depth limit of 10 with circular reference protection
|
|
208
|
+
- Connector rewiring through the full expansion tree
|
|
209
|
+
|
|
210
|
+
### Data Quality Validation (v1.6+)
|
|
211
|
+
|
|
212
|
+
Optional `--validate-casts` flag generates null-count checks before/after type casting:
|
|
213
|
+
- Counts null values pre- and post-coercion per column
|
|
214
|
+
- Logs warnings when coercion introduces new nulls
|
|
215
|
+
- Helps identify data quality issues during test runs
|
|
216
|
+
|
|
217
|
+
### Vectorized Expression Generation (v1.5+)
|
|
218
|
+
|
|
219
|
+
Column-level pandas operations instead of row-level iteration:
|
|
220
|
+
- IIF → `np.where()`, NVL → `.fillna()`, UPPER/LOWER → `.str.upper()/.str.lower()`
|
|
221
|
+
- SUBSTR → `.str[start:end]`, TO_INTEGER → `pd.to_numeric()`, TO_DATE → `pd.to_datetime()`
|
|
222
|
+
- IS NULL/IS NOT NULL → `.isna()`/`.notna()`
|
|
223
|
+
|
|
224
|
+
### Parameter File Support (v1.5+)
|
|
225
|
+
|
|
226
|
+
Standard Informatica `.param` file parsing:
|
|
227
|
+
- `[Global]` and `[folder.WF:workflow.ST:session]` section support
|
|
228
|
+
- `get_param(config, var_name)` resolution chain: config → env vars → defaults
|
|
229
|
+
- CLI `--param-file` flag for specifying parameter files
|
|
230
|
+
|
|
231
|
+
### Session Connection Overrides (v1.4+)
|
|
232
|
+
|
|
233
|
+
When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.
|
|
234
|
+
|
|
235
|
+
### Worklet Support (v1.4+)
|
|
236
|
+
|
|
237
|
+
Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.
|
|
238
|
+
|
|
239
|
+
### Type Casting at Target Writes (v1.4+)
|
|
240
|
+
|
|
241
|
+
Target field datatypes are mapped to pandas types and generate proper casting code:
|
|
242
|
+
- Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
|
|
243
|
+
- Dates: `pd.to_datetime(errors='coerce')`
|
|
244
|
+
- Decimals/Floats: `pd.to_numeric(errors='coerce')`
|
|
245
|
+
- Booleans: `.astype('boolean')`
|
|
246
|
+
|
|
247
|
+
### Flat File Handling (v1.3+)

Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.

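For fixed-width files, the generated `pd.read_fwf()` call takes roughly this shape (hypothetical layout and column names, not output of the converter):

```python
import io
import pandas as pd

# Hypothetical fixed-width layout: CUST_ID (cols 0-5), NAME (5-15), BALANCE (15-23)
raw = "00001Alice     00123.45\n00002Bob       00067.00\n"

df = pd.read_fwf(
    io.StringIO(raw),
    colspecs=[(0, 5), (5, 15), (15, 23)],
    names=["CUST_ID", "NAME", "BALANCE"],
)
print(len(df), df["CUST_ID"].tolist())  # 2 [1, 2]
```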
### Mapplet Inlining (v1.3+)

Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.

### Decision Tasks (v1.3+)

Converts Informatica decision conditions to Python if/else branches with proper variable substitution.

### Expression Converter (80+ Functions)

Converts Informatica expressions to Python equivalents:

- **String:** SUBSTR, LTRIM, RTRIM, UPPER, LOWER, LPAD, RPAD, INSTR, LENGTH, CONCAT, REPLACE, REG_EXTRACT, REG_REPLACE, REVERSE, INITCAP, CHR, ASCII
- **Date:** ADD_TO_DATE, DATE_DIFF, GET_DATE_PART, SYSDATE, SYSTIMESTAMP, TO_DATE, TO_CHAR, TRUNC (date)
- **Numeric:** ROUND, TRUNC, MOD, ABS, CEIL, FLOOR, POWER, SQRT, LOG, EXP, SIGN
- **Conversion:** TO_INTEGER, TO_BIGINT, TO_FLOAT, TO_DECIMAL, TO_CHAR, TO_DATE
- **Null handling:** IIF, DECODE, NVL, NVL2, ISNULL, IS_SPACES, IS_NUMBER
- **Aggregate:** SUM, AVG, COUNT, MIN, MAX, FIRST, LAST, MEDIAN, STDDEV, VARIANCE
- **Lookup:** :LKP expressions with dynamic lookup references
- **Variable:** SETVARIABLE / mapping variable assignment

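As one concrete example, the null-handling family converts `DECODE(STATUS, 'A', 'Active', 'I', 'Inactive', 'Unknown')` into something along these lines (illustrative, not the converter's literal output):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"STATUS": ["A", "I", "X"]})

# DECODE(STATUS, 'A', 'Active', 'I', 'Inactive', 'Unknown') -> np.select
df["STATUS_DESC"] = np.select(
    [df["STATUS"] == "A", df["STATUS"] == "I"],
    ["Active", "Inactive"],
    default="Unknown",
)
print(df["STATUS_DESC"].tolist())  # ['Active', 'Inactive', 'Unknown']
```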
## Requirements

- Python >= 3.8
- lxml >= 4.9.0
- PyYAML >= 6.0

## Changelog

### v1.8.x (Phase 7)
- Row-count logging at every pipeline step (source reads, transforms, target writes)
- Backend-safe logging (try/except wrapped for Dask/lazy backends)
- Rich mapping function docstrings with sources, targets, and transform pipeline summary
- Per-transform documentation headers with description, input/output field lists

### v1.7.x (Phase 6)
- Window/analytic functions (rolling avg/sum, cumulative sum, percentile)
- Update Strategy routing with actual INSERT/UPDATE/DELETE target operations
- Dialect-aware SQL placeholders for MSSQL/PostgreSQL/Oracle
- Full stored procedure code generation (Oracle/MSSQL/generic)
- JSON-based state persistence for mapping and workflow variables
- Primary key auto-detection for update strategy targets

### v1.6.x (Phase 5)
- SQL dialect translation (Oracle/MSSQL → ANSI)
- Enhanced error reporting (unsupported transforms, unmapped ports, unknown functions)
- Nested mapplet expansion with circular reference protection
- Data quality validation warnings on type casting (`--validate-casts`)

### v1.5.x (Phase 4)
- Parameter file support (`.param` files with section parsing)
- Vectorized expression generation (column-level pandas operations)
- Library-specific code adapters (polars/dask/modin/vaex syntax generation)
- 72+ integration tests

### v1.4.x (Phase 3)
- Session connection overrides for sources and targets
- Worklet function generation with safe invocation
- Type casting at target writes based on TARGETFIELD datatypes
- Flat-file session path overrides properly wired

### v1.3.x (Phase 2)
- FLATFILE metadata in source reads and target writes
- Normalizer with `pd.melt()`
- Rank with group-by and Top-N filtering
- Decision tasks with real if/else branches
- Mapplet instance inlining

### v1.2.x (Phase 1)
- Core parser for all 72 XML tags
- Expression converter with 80+ functions
- Aggregator, Joiner, Lookup code generation
- Workflow orchestration with topological task ordering
- Multi-library support (pandas, dask, polars, vaex, modin)

## Development

```bash
# Clone and install in development mode
cd informatica_python
pip install -e ".[dev]"

# Run tests (136 tests)
pytest tests/ -v
```

## License

MIT License - Copyright (c) 2025 Nick

See [LICENSE](LICENSE) for details.
{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/mapping_gen.py
RENAMED

```diff
@@ -644,7 +644,10 @@ def _generate_source_qualifier(lines, sq, source_map, source_dfs, connector_graph
         lines.append(f"    df_{sq_safe} = df_{_safe_name(next(iter(connected_sources)))}")
 
     source_dfs[sq.name] = f"df_{sq_safe}"
-    lines.append(f"
+    lines.append(f"    try:")
+    lines.append(f"        logger.info(f'Source {sq.name}: {{len(df_{sq_safe})}} rows read')")
+    lines.append(f"    except Exception:")
+    lines.append(f"        logger.info('Source {sq.name}: rows read (count unavailable)')")
 
     if post_sql:
         lines.append(f"    # Post-SQL")
@@ -684,7 +687,10 @@ def _generate_transformation(lines, tx, connector_graph, source_dfs, transform_m
     lines.append(f"    # Input fields: {', '.join(in_fields[:10])}{' ...' if len(in_fields) > 10 else ''}")
     lines.append(f"    # Output fields: {', '.join(out_fields[:10])}{' ...' if len(out_fields) > 10 else ''}")
     lines.append(f"    # -------------------------------------------------------------------")
-    lines.append(f"
+    lines.append(f"    try:")
+    lines.append(f"        _input_rows_{tx_safe} = len({input_df})")
+    lines.append(f"    except Exception:")
+    lines.append(f"        _input_rows_{tx_safe} = -1")
 
     if tx_type == "expression":
         _gen_expression_transform(lines, tx, tx_safe, input_df, source_dfs, data_lib)
@@ -724,7 +730,10 @@ def _generate_transformation(lines, tx, connector_graph, source_dfs, transform_m
     lines.append(f"    df_{tx_safe} = {copy_expr}")
     source_dfs[tx.name] = f"df_{tx_safe}"
 
-    lines.append(f"
+    lines.append(f"    try:")
+    lines.append(f"        _output_rows_{tx_safe} = len(df_{tx_safe})")
+    lines.append(f"    except Exception:")
+    lines.append(f"        _output_rows_{tx_safe} = -1")
     lines.append(f"    logger.info(f'{tx.name} ({tx.type}): {{_input_rows_{tx_safe}}} input rows -> {{_output_rows_{tx_safe}}} output rows')")
     lines.append("")
@@ -1392,7 +1401,10 @@ def _generate_target_write(lines, tgt_name, tgt_def, connector_graph, source_dfs
     else:
         lines.append(f"        write_file(df_target_{tgt_safe}, config.get('targets', {{}}).get('{tgt_def.name}', {{}}).get('file_path', '{tgt_def.name}'),")
         lines.append(f"                   config.get('targets', {{}}).get('{tgt_def.name}', {{}}))")
-    lines.append(f"
+    lines.append(f"    try:")
+    lines.append(f"        logger.info(f'Target {tgt_def.name}: {{len(df_target_{tgt_safe})}} rows written')")
+    lines.append(f"    except Exception:")
+    lines.append(f"        logger.info('Target {tgt_def.name}: rows written (count unavailable)')")
 
 
 CAST_MAP = {
```
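Once the templates above are rendered into a generated mapping module, the emitted logging pattern looks like this (illustrative, with a hypothetical source name):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("mapping_1")

def read_source(df_sq_customers):
    # Backend-safe row count: len() can raise on lazy backends such as Dask
    try:
        logger.info(f'Source SQ_CUSTOMERS: {len(df_sq_customers)} rows read')
    except Exception:
        logger.info('Source SQ_CUSTOMERS: rows read (count unavailable)')
    return df_sq_customers

rows = read_source([{"id": 1}, {"id": 2}])
print(len(rows))  # 2
```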
```diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: informatica-python
-Version: 1.8.0
+Version: 1.8.2
 Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
 Author: Nick
 License: MIT
```
```diff
@@ -59,6 +59,12 @@ informatica-python workflow_export.xml -z output.zip
 # Use a different data library
 informatica-python workflow_export.xml -o output_dir --data-lib polars
 
+# Include a parameter file
+informatica-python workflow_export.xml -o output_dir --param-file workflow.param
+
+# Enable data quality validation on type casts
+informatica-python workflow_export.xml -o output_dir --validate-casts
+
 # Parse to JSON only (no code generation)
 informatica-python workflow_export.xml --json
 
```
```diff
@@ -90,12 +96,12 @@ converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars
 
 | File | Description |
 |------|-------------|
-| `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions) |
-| `mapping_N.py` | One per mapping — transformation logic, source reads, target writes |
-| `workflow.py` | Task orchestration with topological ordering and error handling |
+| `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions), window/analytic functions, stored procedure execution, state persistence |
+| `mapping_N.py` | One per mapping — transformation logic with row-count logging, source reads, target writes, inline documentation |
+| `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
 | `config.yml` | Connection configs, source/target metadata, runtime parameters |
-| `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms |
-| `error_log.txt` | Conversion summary, warnings, and unsupported feature notes |
+| `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms (with ANSI-translated variants) |
+| `error_log.txt` | Conversion summary with unsupported transform analysis, unmapped port detection, and unknown expression function tracing |
 
 ## Supported Data Libraries
 
```
```diff
@@ -113,21 +119,21 @@ Select via `--data-lib` CLI flag or `data_lib` parameter:
 
 The code generator produces real, runnable Python for these transformation types:
 
-- **Source Qualifier** — SQL override, pre/post SQL, column selection
-- **Expression** — Field-level expressions converted to pandas operations
-- **Filter** — Row filtering with converted conditions
-- **Joiner** — `pd.merge()` with join type and condition parsing
-- **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads
-- **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST
+- **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides
+- **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style)
+- **Filter** — Row filtering with vectorized converted conditions
+- **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
+- **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads, multiple match policies, default values
+- **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
 - **Sorter** — `sort_values()` with multi-key ascending/descending
-- **Router** — Multi-group conditional routing with if/elif/else
+- **Router** — Multi-group conditional routing with named groups
 - **Union** — `pd.concat()` across multiple input groups
-- **Update Strategy** — Insert/Update/Delete/Reject flag generation
+- **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys
 - **Sequence Generator** — Auto-incrementing ID columns
 - **Normalizer** — `pd.melt()` with auto-detected id/value vars
 - **Rank** — `groupby().rank()` with Top-N filtering
-- **Stored Procedure** — Stub generation with SP name and parameters
-- **Transaction Control** — Commit/rollback logic stubs
+- **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
+- **Transaction Control** — Commit/rollback logic
 - **Custom / Java** — Placeholder stubs with TODO markers
 - **SQL Transform** — Direct SQL execution pass-through
 
```
````diff
@@ -147,13 +153,118 @@ The code generator produces real, runnable Python for these transformation types
 
 ## Key Features
 
+### Row-Count Logging (v1.8+)
+
+Generated code automatically logs row counts at every step of the data pipeline:
+
+```
+Source SQ_CUSTOMERS: 10000 rows read
+EXP_CALC (Expression): 10000 input rows -> 10000 output rows
+FIL_ACTIVE (Filter): 10000 input rows -> 8542 output rows
+AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
+Target TGT_SUMMARY: 150 rows written
+```
+
+All row-count operations are backend-safe (wrapped in try/except), so Dask and other lazy-evaluation backends won't fail.
+
+### Generated Code Documentation (v1.8+)
+
+Every generated mapping function includes a rich docstring describing:
+- Mapping name and original Informatica description
+- Source and target tables/files
+- Transformation pipeline with field counts per step
+
+Each transformation block is annotated with:
+- Separator headers for visual scanning
+- Transform type and description (from Informatica XML)
+- Input and output field lists (truncated at 10 for readability)
+
````
```diff
+### Window / Analytic Functions (v1.7+)
+
+DataFrame-level analytic functions for aggregation transforms:
+- `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
+- `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
+- `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
+- `percentile_df(df, col, pct)` — quantile via `.quantile()`
+
```
|
|
191
|
+
|
|
192
|
+
Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
|
|
193
|
+
- Static strategies (0/1/2/3) map to INSERT/UPDATE/DELETE/REJECT
|
|
194
|
+
- DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT expressions parsed from conditions
|
|
195
|
+
- Target writer splits rows and routes to appropriate SQL operations
|
|
196
|
+
- Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
|
|
197
|
+
- Primary key columns auto-detected from target field definitions
|
|
198
|
+
|
|
199
|
+
### Stored Procedure Execution (v1.7+)
|
|
200
|
+
|
|
201
|
+
Full stored procedure code generation (not just stubs):
|
|
202
|
+
- Oracle: `cursor.callproc()` with output parameter registration
|
|
203
|
+
- MSSQL: `EXEC` with output parameter capture
|
|
204
|
+
- Generic: `CALL` syntax for other databases
|
|
205
|
+
- Input/output parameter mapping from transformation fields
|
|
206
|
+
- Empty-input guard prevents errors on empty upstream DataFrames
|
|
207
|
+
|
|
208
|
+
### State Persistence (v1.7+)
|
|
209
|
+
|
|
210
|
+
JSON-based variable persistence between workflow runs:
|
|
211
|
+
- `load_persistent_state()` / `save_persistent_state()` bracketing workflow execution
|
|
212
|
+
- `get_persistent_variable()` / `set_persistent_variable()` scoped by workflow/mapping name
|
|
213
|
+
- Mapping variables marked `is_persistent="YES"` automatically load from and save to state file
|
|
214
|
+
- Non-persistent variables remain unaffected
|
|
215
|
+
|
|
216
|
+
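A minimal sketch of the persistence helpers named above (the function names come from the list; signatures and the state-file location are assumptions, and the generated helpers may differ):

```python
import json
import os
import tempfile

STATE_FILE = os.path.join(tempfile.gettempdir(), "workflow_state.json")  # hypothetical location

def load_persistent_state(path=STATE_FILE):
    """Load the JSON state file, or start empty if it does not exist."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def save_persistent_state(state, path=STATE_FILE):
    with open(path, "w") as f:
        json.dump(state, f, indent=2)

def set_persistent_variable(state, scope, name, value):
    # `scope` would be a workflow/mapping name, e.g. "wf_load.m_customers"
    state.setdefault(scope, {})[name] = value

def get_persistent_variable(state, scope, name, default=None):
    return state.get(scope, {}).get(name, default)

state = load_persistent_state()
set_persistent_variable(state, "wf_load.m_customers", "$$LAST_RUN_ID", 42)
save_persistent_state(state)

reloaded = load_persistent_state()
print(get_persistent_variable(reloaded, "wf_load.m_customers", "$$LAST_RUN_ID"))  # 42
```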
```diff
+### SQL Dialect Translation (v1.6+)
+
+Automatically translates vendor-specific SQL to ANSI equivalents:
+- **Oracle:** NVL→COALESCE, SYSDATE→CURRENT_TIMESTAMP, DECODE→CASE, NVL2→CASE, (+)→ANSI JOIN, ROWNUM→LIMIT
+- **MSSQL:** GETDATE→CURRENT_TIMESTAMP, ISNULL→COALESCE, TOP N→LIMIT, LEN→LENGTH, CHARINDEX→POSITION
+- Auto-detects source dialect; outputs both original and translated SQL
+
```
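The rewrite idea can be illustrated with a toy subset of the Oracle rules (two regexes only; the real translator covers far more cases):

```python
import re

# Illustrative Oracle -> ANSI rewrites (subset of the mappings listed above)
ORACLE_RULES = [
    (re.compile(r"\bNVL\s*\(", re.IGNORECASE), "COALESCE("),
    (re.compile(r"\bSYSDATE\b", re.IGNORECASE), "CURRENT_TIMESTAMP"),
]

def translate_oracle(sql):
    for pattern, replacement in ORACLE_RULES:
        sql = pattern.sub(replacement, sql)
    return sql

print(translate_oracle("SELECT NVL(name, 'n/a'), SYSDATE FROM customers"))
# SELECT COALESCE(name, 'n/a'), CURRENT_TIMESTAMP FROM customers
```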
```diff
+### Enhanced Error Reporting (v1.6+)
+
+Structured error log with three analysis sections:
+- **Unsupported Transforms:** Lists each skipped transform with type, field count, and attributes
+- **Unmapped Ports:** OUTPUT fields not connected to any downstream transform
+- **Unsupported Expression Functions:** Unknown functions with location traces
+
+### Nested Mapplet Support (v1.6+)
+
+Recursively expands mapplet-within-mapplet instances:
+- Double-underscore namespacing for nested transforms
+- Depth limit of 10 with circular reference protection
+- Connector rewiring through the full expansion tree
+
+### Data Quality Validation (v1.6+)
+
+Optional `--validate-casts` flag generates null-count checks before/after type casting:
+- Counts null values pre- and post-coercion per column
+- Logs warnings when coercion introduces new nulls
+- Helps identify data quality issues during test runs
+
```
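The `--validate-casts` check amounts to comparing null counts around the coercion, roughly like this (a sketch with a hypothetical helper name, not the generated code):

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("casts")

def cast_with_validation(df, col, caster):
    # Warn when coercion introduces new nulls (the --validate-casts idea, sketched)
    nulls_before = int(df[col].isna().sum())
    df[col] = caster(df[col])
    nulls_after = int(df[col].isna().sum())
    if nulls_after > nulls_before:
        logger.warning("%s: coercion introduced %d new null(s)", col, nulls_after - nulls_before)
    return df

df = pd.DataFrame({"QTY": ["1", "oops", None]})
df = cast_with_validation(df, "QTY", lambda s: pd.to_numeric(s, errors="coerce"))
print(int(df["QTY"].isna().sum()))  # 2
```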
```diff
+### Vectorized Expression Generation (v1.5+)
+
+Column-level pandas operations instead of row-level iteration:
+- IIF → `np.where()`, NVL → `.fillna()`, UPPER/LOWER → `.str.upper()/.str.lower()`
+- SUBSTR → `.str[start:end]`, TO_INTEGER → `pd.to_numeric()`, TO_DATE → `pd.to_datetime()`
+- IS NULL/IS NOT NULL → `.isna()`/`.notna()`
+
+### Parameter File Support (v1.5+)
+
+Standard Informatica `.param` file parsing:
+- `[Global]` and `[folder.WF:workflow.ST:session]` section support
+- `get_param(config, var_name)` resolution chain: config → env vars → defaults
+- CLI `--param-file` flag for specifying parameter files
+
 ### Session Connection Overrides (v1.4+)
+
 When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.
 
 ### Worklet Support (v1.4+)
+
 Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.
 
 ### Type Casting at Target Writes (v1.4+)
+
 Target field datatypes are mapped to pandas types and generate proper casting code:
 - Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
 - Dates: `pd.to_datetime(errors='coerce')`
```
```diff
@@ -161,12 +272,15 @@ Target field datatypes are mapped to pandas types and generate proper casting co
 - Booleans: `.astype('boolean')`
 
 ### Flat File Handling (v1.3+)
+
 Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
 
 ### Mapplet Inlining (v1.3+)
+
 Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
 
 ### Decision Tasks (v1.3+)
+
 Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
 
 ### Expression Converter (80+ Functions)
```
```diff
@@ -190,6 +304,32 @@ Converts Informatica expressions to Python equivalents:
 
 ## Changelog
 
+### v1.8.x (Phase 7)
+- Row-count logging at every pipeline step (source reads, transforms, target writes)
+- Backend-safe logging (try/except wrapped for Dask/lazy backends)
+- Rich mapping function docstrings with sources, targets, and transform pipeline summary
+- Per-transform documentation headers with description, input/output field lists
+
+### v1.7.x (Phase 6)
+- Window/analytic functions (rolling avg/sum, cumulative sum, percentile)
+- Update Strategy routing with actual INSERT/UPDATE/DELETE target operations
+- Dialect-aware SQL placeholders for MSSQL/PostgreSQL/Oracle
+- Full stored procedure code generation (Oracle/MSSQL/generic)
+- JSON-based state persistence for mapping and workflow variables
+- Primary key auto-detection for update strategy targets
+
+### v1.6.x (Phase 5)
+- SQL dialect translation (Oracle/MSSQL → ANSI)
+- Enhanced error reporting (unsupported transforms, unmapped ports, unknown functions)
+- Nested mapplet expansion with circular reference protection
+- Data quality validation warnings on type casting (`--validate-casts`)
+
+### v1.5.x (Phase 4)
+- Parameter file support (`.param` files with section parsing)
+- Vectorized expression generation (column-level pandas operations)
+- Library-specific code adapters (polars/dask/modin/vaex syntax generation)
+- 72+ integration tests
+
 ### v1.4.x (Phase 3)
 - Session connection overrides for sources and targets
 - Worklet function generation with safe invocation
```
````diff
@@ -217,8 +357,8 @@ Converts Informatica expressions to Python equivalents:
 cd informatica_python
 pip install -e ".[dev]"
 
-# Run tests (25 tests)
-pytest tests/test_converter.py -v
+# Run tests (136 tests)
+pytest tests/ -v
 ```
 
 ## License
````
{informatica_python-1.8.0 → informatica_python-1.8.2}/tests/test_integration.py
RENAMED

```diff
@@ -1236,7 +1236,7 @@ class TestLoggingEnrichment:
         folder = FolderDef(name="TestFolder", mappings=[mapping])
         code = generate_mapping_code(mapping, folder, "pandas", 1)
         assert "_input_rows_fil_active = len(" in code
-        assert "_output_rows_fil_active = len(df_fil_active
+        assert "_output_rows_fil_active = len(df_fil_active" in code
         assert "FIL_ACTIVE (Filter):" in code
         assert "input rows ->" in code
         assert "output rows" in code
```
@@ -1,201 +0,0 @@
|
|
|
1
|
-
# informatica-python
|
|
2
|
-
|
|
3
|
-
Convert Informatica PowerCenter workflow XML exports into clean, runnable Python/PySpark code.
|
|
4
|
-
|
|
5
|
-
**Author:** Nick
|
|
6
|
-
**License:** MIT
|
|
7
|
-
**PyPI:** [informatica-python](https://pypi.org/project/informatica-python/)
|
|
8
|
-
|
|
9
|
-
---
|
|
10
|
-
|
|
11
|
-
## Overview
|
|
12
|
-
|
|
13
|
-
`informatica-python` parses Informatica PowerCenter XML export files and generates equivalent Python code using your choice of data library. It handles all 72 DTD tags from the PowerCenter XML schema and produces a complete, ready-to-run Python project.
|
|
14
|
-
|
|
15
|
-
## Installation
|
|
16
|
-
|
|
17
|
-
```bash
|
|
18
|
-
pip install informatica-python
|
|
19
|
-
```
|
|
20
|
-
|
|
21
|
-
## Quick Start
|
|
22
|
-
|
|
23
|
-
### Command Line
|
|
24
|
-
|
|
25
|
-
```bash
|
|
26
|
-
# Generate Python files to a directory
|
|
27
|
-
informatica-python workflow_export.xml -o output_dir
|
|
28
|
-
|
|
29
|
-
# Generate as a zip archive
|
|
30
|
-
informatica-python workflow_export.xml -z output.zip
|
|
31
|
-
|
|
32
|
-
# Use a different data library
|
|
33
|
-
informatica-python workflow_export.xml -o output_dir --data-lib polars
|
|
34
|
-
|
|
35
|
-
# Parse to JSON only (no code generation)
|
|
36
|
-
informatica-python workflow_export.xml --json
|
|
37
|
-
|
|
38
|
-
# Save parsed JSON to file
|
|
39
|
-
informatica-python workflow_export.xml --json-file parsed.json
|
|
40
|
-
```
|
|
41
|
-
|
|
42
|
-
### Python API
|
|
43
|
-
|
|
44
|
-
```python
|
|
45
|
-
from informatica_python import InformaticaConverter
|
|
46
|
-
|
|
47
|
-
converter = InformaticaConverter()
|
|
48
|
-
|
|
49
|
-
# Parse and generate files
|
|
50
|
-
converter.convert_to_files("workflow_export.xml", "output_dir")
|
|
51
|
-
|
|
52
|
-
# Parse and generate zip
|
|
53
|
-
converter.convert_to_zip("workflow_export.xml", "output.zip")
|
|
54
|
-
|
|
55
|
-
# Parse to structured dict
|
|
56
|
-
result = converter.parse_file("workflow_export.xml")
|
|
57
|
-
|
|
58
|
-
# Use a different data library
|
|
59
|
-
converter.convert_to_files("workflow_export.xml", "output_dir", data_lib="polars")
|
|
60
|
-
```
|
|
61
|
-
|
|
62
|
-
## Generated Output Files
|
|
63
|
-
|
|
64
|
-
| File | Description |
|
|
65
|
-
|------|-------------|
|
|
66
|
-
| `helper_functions.py` | Database/file I/O helpers, Informatica expression equivalents (80+ functions) |
|
|
67
|
-
| `mapping_N.py` | One per mapping — transformation logic, source reads, target writes |
|
|
68
|
-
| `workflow.py` | Task orchestration with topological ordering and error handling |
|
|
69
|
-
| `config.yml` | Connection configs, source/target metadata, runtime parameters |
|
|
70
|
-
| `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, SQL transforms |
|
|
71
|
-
| `error_log.txt` | Conversion summary, warnings, and unsupported feature notes |
|
|
72
|
-
|
|
73
|
-
## Supported Data Libraries
|
|
74
|
-
|
|
75
|
-
Select via `--data-lib` CLI flag or `data_lib` parameter:
|
|
76
|
-
|
|
77
|
-
| Library | Flag | Best For |
|
|
78
|
-
|---------|------|----------|
|
|
79
|
-
| **pandas** | `pandas` (default) | General-purpose, most compatible |
|
|
80
|
-
| **dask** | `dask` | Large datasets, parallel processing |
|
|
81
|
-
| **polars** | `polars` | High performance, Rust-backed |
|
|
82
|
-
| **vaex** | `vaex` | Out-of-core, billion-row datasets |
|
|
83
|
-
| **modin** | `modin` | Drop-in pandas replacement, multi-core |
|
|
84
|
-
|
|
85
|
-
## Supported Transformations
|
|
86
|
-
|
|
87
|
-
The code generator produces real, runnable Python for these transformation types:
|
|
88
|
-
|
|
89
|
-
- **Source Qualifier** — SQL override, pre/post SQL, column selection
|
|
90
|
-
- **Expression** — Field-level expressions converted to pandas operations
|
|
91
|
-
- **Filter** — Row filtering with converted conditions
|
|
92
|
-
- **Joiner** — `pd.merge()` with join type and condition parsing
|
|
93
|
-
- **Lookup** — `pd.merge()` lookups with connection-aware DB/file reads
|
|
94
|
-
- **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST
|
|
95
|
-
- **Sorter** — `sort_values()` with multi-key ascending/descending
|
|
96
|
-
- **Router** — Multi-group conditional routing with if/elif/else
|
|
97
|
-
- **Union** — `pd.concat()` across multiple input groups
|
|
98
|
-
- **Update Strategy** — Insert/Update/Delete/Reject flag generation
|
|
99
|
-
- **Sequence Generator** — Auto-incrementing ID columns
|
|
100
|
-
- **Normalizer** — `pd.melt()` with auto-detected id/value vars
|
|
101
|
-
- **Rank** — `groupby().rank()` with Top-N filtering
|
|
102
|
-
- **Stored Procedure** — Stub generation with SP name and parameters
|
|
103
|
-
- **Transaction Control** — Commit/rollback logic stubs
|
|
104
|
-
- **Custom / Java** — Placeholder stubs with TODO markers
|
|
105
|
-
- **SQL Transform** — Direct SQL execution pass-through
|
|
106
|
-
|
|
107
|
-
## Supported XML Tags (72 Tags)
|
|
108
|
-
|
|
109
|
-
**Top-level:** POWERMART, REPOSITORY, FOLDER, FOLDERVERSION
|
|
110
|
-
|
|
111
|
-
**Source/Target:** SOURCE, SOURCEFIELD, TARGET, TARGETFIELD, TARGETINDEX, TARGETINDEXFIELD, FLATFILE, XMLINFO, XMLTEXT, GROUP, TABLEATTRIBUTE, FIELDATTRIBUTE, METADATAEXTENSION, KEYWORD, ERPSRCINFO
|
|
112
|
-
|
|
113
|
-
**Mapping/Mapplet:** MAPPING, MAPPLET, TRANSFORMATION, TRANSFORMFIELD, TRANSFORMFIELDATTR, TRANSFORMFIELDATTRDEF, INSTANCE, ASSOCIATED_SOURCE_INSTANCE, CONNECTOR, MAPDEPENDENCY, TARGETLOADORDER, MAPPINGVARIABLE, FIELDDEPENDENCY, INITPROP, ERPINFO
|
|
114
|
-
|
|
115
|
-
**Task/Session/Workflow:** TASK, TIMER, VALUEPAIR, SCHEDULER, SCHEDULEINFO, STARTOPTIONS, ENDOPTIONS, SCHEDULEOPTIONS, RECURRING, CUSTOM, DAILYFREQUENCY, REPEAT, FILTER, SESSION, CONFIGREFERENCE, SESSTRANSFORMATIONINST, SESSTRANSFORMATIONGROUP, PARTITION, HASHKEY, KEYRANGE, CONFIG, SESSIONCOMPONENT, CONNECTIONREFERENCE, TASKINSTANCE, WORKFLOWLINK, WORKFLOWVARIABLE, WORKFLOWEVENT, WORKLET, WORKFLOW, ATTRIBUTE
|
|
116
|
-
|
|
117
|
-
**Shortcut:** SHORTCUT
|
|
118
|
-
|
|
119
|
-
**SAP:** SAPFUNCTION, SAPSTRUCTURE, SAPPROGRAM, SAPOUTPUTPORT, SAPVARIABLE, SAPPROGRAMFLOWOBJECT, SAPTABLEPARAM
|
|
120
|
-
|
|
121
|
-
## Key Features
|
|
122
|
-
|
|
123
|
-
### Session Connection Overrides (v1.4+)
|
|
124
|
-
When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.
|
|
125
|
-
|
|
126
|
-
### Worklet Support (v1.4+)
|
|
127
|
-
Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.
|
|
128
|
-
|
|
129
|
-
### Type Casting at Target Writes (v1.4+)
|
|
130
|
-
Target field datatypes are mapped to pandas types and generate proper casting code:
|
|
131
|
-
- Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
|
|
132
|
-
- Dates: `pd.to_datetime(errors='coerce')`
|
|
133
|
-
- Decimals/Floats: `pd.to_numeric(errors='coerce')`
|
|
134
|
-
- Booleans: `.astype('boolean')`
|
|
135
|
-
|
|
136
|
-
### Flat File Handling (v1.3+)
|
|
137
|
-
Parses FLATFILE metadata for delimiter, fixed-width, header lines, skip rows, quote/escape chars. Generates `pd.read_fwf()` for fixed-width or enriched `read_file()` for delimited.
|
|
138
|
-
|
|
139
|
-
### Mapplet Inlining (v1.3+)
|
|
140
|
-
Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.
|
|
141
|
-
|
|
142
|
-
### Decision Tasks (v1.3+)
|
|
143
|
-
Converts Informatica decision conditions to Python if/else branches with proper variable substitution.
|
|
144
|
-
|
|
145
|
-
### Expression Converter (80+ Functions)
|
|
146
|
-
|
|
147
|
-
Converts Informatica expressions to Python equivalents:
|
|
148
|
-
|
|
149
|
-
- **String:** SUBSTR, LTRIM, RTRIM, UPPER, LOWER, LPAD, RPAD, INSTR, LENGTH, CONCAT, REPLACE, REG_EXTRACT, REG_REPLACE, REVERSE, INITCAP, CHR, ASCII
|
|
150
|
-
- **Date:** ADD_TO_DATE, DATE_DIFF, GET_DATE_PART, SYSDATE, SYSTIMESTAMP, TO_DATE, TO_CHAR, TRUNC (date)
|
|
151
|
-
- **Numeric:** ROUND, TRUNC, MOD, ABS, CEIL, FLOOR, POWER, SQRT, LOG, EXP, SIGN
|
|
152
|
-
- **Conversion:** TO_INTEGER, TO_BIGINT, TO_FLOAT, TO_DECIMAL, TO_CHAR, TO_DATE
|
|
153
|
-
- **Null handling:** IIF, DECODE, NVL, NVL2, ISNULL, IS_SPACES, IS_NUMBER
|
|
154
|
-
- **Aggregate:** SUM, AVG, COUNT, MIN, MAX, FIRST, LAST, MEDIAN, STDDEV, VARIANCE
|
|
155
|
-
- **Lookup:** :LKP expressions with dynamic lookup references
|
|
156
|
-
- **Variable:** SETVARIABLE / mapping variable assignment
|
|
157
|
-
|
|
158
|
-
## Requirements
|
|
159
|
-
|
|
160
|
-
- Python >= 3.8
|
|
161
|
-
- lxml >= 4.9.0
|
|
162
|
-
- PyYAML >= 6.0
|
|
163
|
-
|
|

## Changelog

### v1.4.x (Phase 3)

- Session connection overrides for sources and targets
- Worklet function generation with safe invocation
- Type casting at target writes based on TARGETFIELD datatypes
- Flat-file session path overrides properly wired

### v1.3.x (Phase 2)

- FLATFILE metadata in source reads and target writes
- Normalizer with `pd.melt()`
- Rank with group-by and Top-N filtering
- Decision tasks with real if/else branches
- Mapplet instance inlining

### v1.2.x (Phase 1)

- Core parser for all 72 XML tags
- Expression converter with 80+ functions
- Aggregator, Joiner, Lookup code generation
- Workflow orchestration with topological task ordering
- Multi-library support (pandas, dask, polars, vaex, modin)

## Development

```bash
# Clone and install in development mode
cd informatica_python
pip install -e ".[dev]"

# Run tests (25 tests)
pytest tests/test_converter.py -v
```

## License

MIT License - Copyright (c) 2025 Nick

See [LICENSE](LICENSE) for details.
File without changes

File without changes

File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/__init__.py RENAMED
File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/config_gen.py RENAMED
File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/error_log_gen.py RENAMED
File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/helper_gen.py RENAMED
File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/sql_gen.py RENAMED
File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/generators/workflow_gen.py RENAMED
File without changes

File without changes

File without changes

File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/utils/datatype_map.py RENAMED
File without changes

File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/utils/lib_adapters.py RENAMED
File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python/utils/sql_dialect.py RENAMED
File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python.egg-info/SOURCES.txt RENAMED
File without changes

File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python.egg-info/entry_points.txt RENAMED
File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python.egg-info/requires.txt RENAMED
File without changes

{informatica_python-1.8.0 → informatica_python-1.8.2}/informatica_python.egg-info/top_level.txt RENAMED
File without changes

File without changes

File without changes