mkpipe-loader-postgres 0.3.0__tar.gz → 0.6.0__tar.gz

This diff shows the changes between two publicly released versions of this package, as published to a supported registry. It is provided for informational purposes only and reflects the package contents as they appear in the registry.
Files changed (19)
  1. mkpipe_loader_postgres-0.6.0/PKG-INFO +138 -0
  2. mkpipe_loader_postgres-0.6.0/README.md +114 -0
  3. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/mkpipe_loader_postgres/__init__.py +4 -1
  4. mkpipe_loader_postgres-0.6.0/mkpipe_loader_postgres/jars/.gitkeep +0 -0
  5. mkpipe_loader_postgres-0.6.0/mkpipe_loader_postgres.egg-info/PKG-INFO +138 -0
  6. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/mkpipe_loader_postgres.egg-info/SOURCES.txt +1 -1
  7. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/setup.py +1 -1
  8. mkpipe_loader_postgres-0.3.0/PKG-INFO +0 -50
  9. mkpipe_loader_postgres-0.3.0/README.md +0 -26
  10. mkpipe_loader_postgres-0.3.0/mkpipe_loader_postgres/jars/org.postgresql_postgresql-42.7.4.jar +0 -0
  11. mkpipe_loader_postgres-0.3.0/mkpipe_loader_postgres.egg-info/PKG-INFO +0 -50
  12. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/LICENSE +0 -0
  13. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/MANIFEST.in +0 -0
  14. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/mkpipe_loader_postgres/jar_paths.py +0 -0
  15. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/mkpipe_loader_postgres.egg-info/dependency_links.txt +0 -0
  16. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/mkpipe_loader_postgres.egg-info/entry_points.txt +0 -0
  17. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/mkpipe_loader_postgres.egg-info/requires.txt +0 -0
  18. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/mkpipe_loader_postgres.egg-info/top_level.txt +0 -0
  19. {mkpipe_loader_postgres-0.3.0 → mkpipe_loader_postgres-0.6.0}/setup.cfg +0 -0
@@ -0,0 +1,138 @@
+ Metadata-Version: 2.4
+ Name: mkpipe-loader-postgres
+ Version: 0.6.0
+ Summary: PostgreSQL loader for mkpipe.
+ Author: Metin Karakus
+ Author-email: metin_karakus@yahoo.com
+ License: Apache License 2.0
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: Apache Software License
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: mkpipe
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: license
+ Dynamic: license-file
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # mkpipe-loader-postgres
+
+ PostgreSQL loader plugin for [MkPipe](https://github.com/mkpipe-etl/mkpipe). Writes Spark DataFrames into PostgreSQL tables via JDBC.
+
+ ## Documentation
+
+ For more detailed documentation, please visit the [GitHub repository](https://github.com/mkpipe-etl/mkpipe).
+
+ ## License
+
+ This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
+
+ ---
+
+ ## Connection Configuration
+
+ ```yaml
+ connections:
+   pg_target:
+     variant: postgres
+     host: localhost
+     port: 5432
+     database: mydb
+     schema: public
+     user: myuser
+     password: mypassword
+ ```
+
+ ---
+
+ ## Table Configuration
+
+ ```yaml
+ pipelines:
+   - name: source_to_pg
+     source: my_source
+     destination: pg_target
+     tables:
+       - name: source_table
+         target_name: public.stg_table
+         replication_method: full
+         batchsize: 10000
+
+       - name: source_table
+         target_name: public.stg_table
+         replication_method: incremental
+         iterate_column: updated_at
+         write_strategy: upsert
+         write_key: [id]
+ ```
+
+ ---
+
+ ## Write Strategy
+
+ Control how data is written to PostgreSQL:
+
+ ```yaml
+ - name: source_table
+   target_name: public.stg_table
+   write_strategy: upsert   # append | replace | upsert | merge
+   write_key: [id]          # required for upsert/merge
+ ```
+
+ | Strategy | PostgreSQL Behavior |
+ |---|---|
+ | `append` | Plain `INSERT` via JDBC (default for incremental) |
+ | `replace` | Drop and recreate table, then insert (default for full) |
+ | `upsert` | `INSERT ... ON CONFLICT (write_key) DO UPDATE` via temp table |
+ | `merge` | Same as upsert for PostgreSQL |
+
+ > **Note:** `upsert`/`merge` requires `write_key`. The loader writes to a temp table first, then executes a single `INSERT ... ON CONFLICT` statement to merge into the target.
+
+ ---
+
+ ## Write Parallelism & Throughput
+
+ Two parameters control write performance:
+
+ ```yaml
+ - name: source_table
+   target_name: public.stg_table
+   replication_method: full
+   batchsize: 10000        # rows per JDBC batch insert (default: 10000)
+   write_partitions: 4     # coalesce DataFrame to N partitions before writing
+ ```
+
+ ### How they work
+
+ - **`batchsize`**: rows buffered before sending one `INSERT` statement. PostgreSQL handles 5,000–10,000 well; very large batches (>100K) can increase memory pressure.
+ - **`write_partitions`**: calls `coalesce(N)` on the DataFrame, reducing concurrent JDBC connections to PostgreSQL.
+
+ ### Performance Notes
+
+ - PostgreSQL's `COPY` protocol is faster than JDBC for bulk loads, but mkpipe uses JDBC for portability.
+ - For large loads, `write_partitions: 4–8` with `batchsize: 10000` is a reliable baseline.
+ - If the target table has many indexes or constraints, writes will be slower; consider disabling indexes during bulk loads.
+
+ ---
+
+ ## All Table Parameters
+
+ | Parameter | Type | Default | Description |
+ |---|---|---|---|
+ | `name` | string | required | Source table name |
+ | `target_name` | string | required | PostgreSQL destination table name |
+ | `replication_method` | `full` / `incremental` | `full` | Replication strategy |
+ | `batchsize` | int | `10000` | Rows per JDBC batch insert |
+ | `write_partitions` | int | — | Coalesce DataFrame to N partitions before writing |
+ | `write_strategy` | string | — | `append`, `replace`, `upsert`, `merge` |
+ | `write_key` | list | — | Key columns for upsert/merge (required for those strategies) |
+ | `dedup_columns` | list | — | Columns used for `mkpipe_id` hash deduplication |
+ | `tags` | list | `[]` | Tags for selective pipeline execution |
+ | `pass_on_error` | bool | `false` | Skip table on error instead of failing |
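The temp-table upsert described in the README note above comes down to one `INSERT ... ON CONFLICT` statement derived from `target_name` and `write_key`. A minimal sketch of how such a statement can be assembled (the function name, temp-table name, and exact SQL shape are illustrative assumptions, not mkpipe's actual internals):

```python
def build_upsert_sql(target, temp, columns, write_key):
    """Build an INSERT ... ON CONFLICT DO UPDATE merging a temp table into the target.

    Illustrative sketch only; mkpipe's real SQL generation is not shown in this diff.
    """
    cols = ', '.join(columns)
    conflict = ', '.join(write_key)
    # Non-key columns are overwritten with the incoming row via EXCLUDED.
    updates = ', '.join(
        '{c} = EXCLUDED.{c}'.format(c=c) for c in columns if c not in write_key
    )
    return (
        'INSERT INTO {target} ({cols}) '
        'SELECT {cols} FROM {temp} '
        'ON CONFLICT ({conflict}) DO UPDATE SET {updates}'
    ).format(target=target, cols=cols, temp=temp, conflict=conflict, updates=updates)


sql = build_upsert_sql(
    target='public.stg_table',
    temp='tmp_stg_table',
    columns=['id', 'name', 'updated_at'],
    write_key=['id'],
)
```

Because the merge is a single statement, PostgreSQL applies it atomically, which is why the loader can stage rows in a temp table first without readers ever seeing a half-written target.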
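The interaction of `batchsize` and `write_partitions` described above can be made concrete with a little arithmetic: Spark's JDBC writer opens one connection per partition and flushes a batched `INSERT` every `batchsize` rows. A small sketch of the resulting write plan (helper name and return shape are illustrative):

```python
import math


def write_plan(total_rows, write_partitions, batchsize):
    """Estimate connections and batched INSERT round-trips for a JDBC write.

    Illustrative arithmetic only: one JDBC connection per partition, one
    batched INSERT per `batchsize` rows within each partition.
    """
    rows_per_partition = math.ceil(total_rows / write_partitions)
    batches_per_partition = math.ceil(rows_per_partition / batchsize)
    return {
        'connections': write_partitions,
        'rows_per_partition': rows_per_partition,
        'batches_per_partition': batches_per_partition,
    }


plan = write_plan(total_rows=1_000_000, write_partitions=4, batchsize=10_000)
```

For 1M rows with the baseline settings, this works out to 4 concurrent connections each issuing 25 batched inserts, which is why raising `batchsize` trades round-trips for memory and raising `write_partitions` trades connection count for per-connection load.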
@@ -1,9 +1,12 @@
  from mkpipe.spark import JdbcLoader
 
+ JAR_PACKAGES = ['org.postgresql:postgresql:42.7.4']
 
- class PostgresLoader(JdbcLoader, variant='postgresql'):
+
+ class PostgresLoader(JdbcLoader, variant='postgres'):
      driver_name = 'postgresql'
      driver_jdbc = 'org.postgresql.Driver'
+     _dialect = 'postgres'
 
      def build_jdbc_url(self):
          url = (
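The `variant='postgres'` keyword in the class definition above (renamed from `'postgresql'` in this release, matching the `variant: postgres` connection key) suggests the base class registers subclasses by variant. mkpipe's actual mechanism is not shown in this diff, but the common Python idiom for this is `__init_subclass__`, sketched here with a stand-in base class:

```python
class JdbcLoader:
    """Stand-in base class; mkpipe's real JdbcLoader is not shown in this diff."""

    _registry = {}

    def __init_subclass__(cls, variant=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if variant is not None:
            # Connection configs with `variant: postgres` resolve to this class.
            JdbcLoader._registry[variant] = cls


class PostgresLoader(JdbcLoader, variant='postgres'):
    driver_name = 'postgresql'
    driver_jdbc = 'org.postgresql.Driver'


# Look up the loader class the way a pipeline config would select it.
loader_cls = JdbcLoader._registry['postgres']
```

Under this assumption, renaming the variant is a breaking config change: configs still using `variant: postgresql` would no longer resolve to this loader after upgrading.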
@@ -10,4 +10,4 @@ mkpipe_loader_postgres.egg-info/dependency_links.txt
  mkpipe_loader_postgres.egg-info/entry_points.txt
  mkpipe_loader_postgres.egg-info/requires.txt
  mkpipe_loader_postgres.egg-info/top_level.txt
- mkpipe_loader_postgres/jars/org.postgresql_postgresql-42.7.4.jar
+ mkpipe_loader_postgres/jars/.gitkeep
@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
 
  setup(
      name='mkpipe-loader-postgres',
-     version='0.3.0',
+     version='0.6.0',
      license='Apache License 2.0',
      packages=find_packages(exclude=['tests', 'scripts', 'deploy', 'install_jars.py']),
      install_requires=['mkpipe'],
@@ -1,50 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: mkpipe-loader-postgres
3
- Version: 0.3.0
4
- Summary: PostgreSQL loader for mkpipe.
5
- Author: Metin Karakus
6
- Author-email: metin_karakus@yahoo.com
7
- License: Apache License 2.0
8
- Classifier: Programming Language :: Python :: 3
9
- Classifier: License :: OSI Approved :: Apache Software License
10
- Requires-Python: >=3.8
11
- Description-Content-Type: text/markdown
12
- License-File: LICENSE
13
- Requires-Dist: mkpipe
14
- Dynamic: author
15
- Dynamic: author-email
16
- Dynamic: classifier
17
- Dynamic: description
18
- Dynamic: description-content-type
19
- Dynamic: license
20
- Dynamic: license-file
21
- Dynamic: requires-dist
22
- Dynamic: requires-python
23
- Dynamic: summary
24
-
25
- # MkPipe
26
-
27
- **MkPipe** is a modular, open-source ETL (Extract, Transform, Load) tool that allows you to integrate various data sources and sinks easily. It is designed to be extensible with a plugin-based architecture that supports extractors, transformers, and loaders.
28
-
29
- ## Documentation
30
-
31
- For more detailed documentation, please visit the [GitHub repository](https://github.com/mkpipe-etl/mkpipe).
32
-
33
- ## License
34
-
35
- This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
36
-
37
-
38
- ## mkpipe_project.yaml Variables
39
- ```yaml
40
- ...
41
- connections:
42
- source:
43
- host: 'XXX'
44
- port: 'XXX'
45
- database: 'XXX'
46
- schema: 'XXX'
47
- user: 'XXX'
48
- password: 'XXX'
49
- ...
50
- ```