fabrictools 0.1.0__tar.gz

This diff shows the contents of publicly released package versions as they appear in their public registry, and is provided for informational purposes only.
@@ -0,0 +1,197 @@
Metadata-Version: 2.4
Name: fabrictools
Version: 0.1.0
Summary: User-friendly PySpark helpers for Microsoft Fabric Lakehouses and Warehouses
Author-email: Willy Kinfoussia <willy.kinfoussia@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/willykinfoussia/FabricPackage
Project-URL: Repository, https://github.com/willykinfoussia/FabricPackage
Project-URL: Issues, https://github.com/willykinfoussia/FabricPackage/issues
Keywords: microsoft-fabric,pyspark,delta,lakehouse,warehouse,azure
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Provides-Extra: spark
Requires-Dist: pyspark>=3.3; extra == "spark"
Requires-Dist: delta-spark>=2.4; extra == "spark"
Provides-Extra: dev
Requires-Dist: pyspark>=3.3; extra == "dev"
Requires-Dist: delta-spark>=2.4; extra == "dev"
Requires-Dist: pytest>=7.4; extra == "dev"
Requires-Dist: pytest-mock>=3.12; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"

# fabrictools

> User-friendly PySpark helpers for **Microsoft Fabric** — read, write, and merge Lakehouses and Warehouses with a single function call.

---

## Features

- **Auto-resolved paths** — pass a Lakehouse or Warehouse *name*, no ABFS URL configuration required
- **Auto-detected SparkSession** — uses `SparkSession.builder.getOrCreate()`, works seamlessly inside Fabric notebooks
- **Auto-detected format** on read — tries Delta → Parquet → CSV automatically
- **Delta merge (upsert)** — one-liner upsert into any Lakehouse Delta table
- **Built-in logging** — every operation logs its resolved path, detected format, and row/column count

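
The Delta → Parquet → CSV fallback amounts to trying readers in order until one succeeds. A minimal, Spark-free sketch of that control flow (`read_with_fallback` is an illustrative name, not part of the package API; inside `read_lakehouse` the readers would presumably be `spark.read` calls):

```python
def read_with_fallback(path, readers):
    """Try each (format_name, read_fn) pair in order; return the first success."""
    errors = []
    for fmt, read_fn in readers:
        try:
            return fmt, read_fn(path)
        except Exception as exc:  # this reader could not handle the dataset
            errors.append(f"{fmt}: {exc}")
    raise ValueError(f"No supported format found at '{path}': {errors}")
```

The readers are plain callables here so the ordering logic stays visible; the first format that parses wins, and only a total failure raises.
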
---

## Requirements

- Microsoft Fabric Spark runtime (provides `notebookutils`, `pyspark`, and `delta-spark`)
- Python >= 3.9

> **Local development:** install the `spark` extras to get PySpark and delta-spark.
> `notebookutils` is only available inside Fabric — functions that resolve paths will raise a clear `ValueError` outside Fabric.

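
Outside Fabric, that `ValueError` comes from catching the failed import. The guard can be reproduced in a few lines (`require_notebookutils` is an illustrative name, not part of the API):

```python
def require_notebookutils():
    """Return the notebookutils module, or raise ValueError outside Fabric."""
    try:
        import notebookutils  # injected by the Fabric notebook runtime
    except ImportError as exc:
        raise ValueError(
            f"notebookutils is not available - are you running inside Microsoft Fabric? ({exc})"
        ) from exc
    return notebookutils
```

Raising `ValueError` instead of letting the `ImportError` escape gives callers one consistent exception type to handle in local tests.
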
---

## Installation

```bash
# Inside a Fabric notebook or pipeline
pip install fabrictools

# Local development (includes PySpark + delta-spark)
pip install "fabrictools[spark]"
```

---

## Quick start

```python
import fabrictools as ft
```

### Read a Lakehouse dataset

```python
# Auto-detects Delta → Parquet → CSV
df = ft.read_lakehouse("BronzeLakehouse", "sales/2024")
```

### Write to a Lakehouse

```python
ft.write_lakehouse(
    df,
    lakehouse_name="SilverLakehouse",
    relative_path="sales_clean",
    mode="overwrite",
    partition_by=["year", "month"],  # optional
)
```

### Merge (upsert) into a Delta table

```python
ft.merge_lakehouse(
    source_df=new_df,
    lakehouse_name="SilverLakehouse",
    relative_path="sales_clean",
    merge_condition="src.id = tgt.id",
    # update_set and insert_set are optional:
    # omit them to update/insert all columns automatically
)
```

With explicit column mappings:

```python
ft.merge_lakehouse(
    source_df=new_df,
    lakehouse_name="SilverLakehouse",
    relative_path="sales_clean",
    merge_condition="src.id = tgt.id",
    update_set={"amount": "src.amount", "updated_at": "src.updated_at"},
    insert_set={"id": "src.id", "amount": "src.amount", "updated_at": "src.updated_at"},
)
```

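
When `update_set` and `insert_set` are omitted, the natural default is to map every column one-to-one from the source alias. A sketch of how such defaults could be derived (assumed behavior for illustration, not taken from the package source):

```python
def default_merge_mappings(columns):
    """Map each target column to the same-named source column under the 'src' alias."""
    mapping = {col: f"src.{col}" for col in columns}
    # update_set and insert_set get the same shape; copies keep them independent
    return dict(mapping), dict(mapping)

update_set, insert_set = default_merge_mappings(["id", "amount", "updated_at"])
# update_set == {"id": "src.id", "amount": "src.amount", "updated_at": "src.updated_at"}
```

Passing explicit dictionaries, as in the example above, simply overrides these one-to-one defaults for the columns you name.
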
### Read from a Warehouse

```python
df = ft.read_warehouse("MyWarehouse", "SELECT * FROM dbo.sales WHERE year = 2024")
```

### Write to a Warehouse

```python
ft.write_warehouse(
    df,
    warehouse_name="MyWarehouse",
    table="dbo.sales_clean",
    mode="overwrite",   # or "append"
    batch_size=10_000,  # optional, default 10_000
)
```
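
`batch_size` controls how many rows go into each write batch. The chunking itself is easy to picture in plain Python (an illustrative sketch with a hypothetical `batches` helper; the real batching happens inside Spark's JDBC writer):

```python
def batches(rows, batch_size=10_000):
    """Yield successive lists of at most batch_size rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly short, batch
        yield batch
```

Larger batches mean fewer round trips to the SQL endpoint at the cost of more memory per request.
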
---

## API reference

### Lakehouse

| Function | Description |
|---|---|
| `read_lakehouse(lakehouse_name, relative_path, spark=None)` | Read a dataset — auto-detects Delta / Parquet / CSV |
| `write_lakehouse(df, lakehouse_name, relative_path, mode, partition_by, format, spark=None)` | Write a DataFrame (default: Delta, overwrite) |
| `merge_lakehouse(source_df, lakehouse_name, relative_path, merge_condition, update_set, insert_set, spark=None)` | Upsert via Delta merge |

### Warehouse

| Function | Description |
|---|---|
| `read_warehouse(warehouse_name, query, spark=None)` | Run a SQL query, return a DataFrame |
| `write_warehouse(df, warehouse_name, table, mode, batch_size, spark=None)` | Write to a Warehouse table via JDBC |

---

## How path resolution works

```
lakehouse_name="BronzeLakehouse"
        ↓
notebookutils.lakehouse.get("BronzeLakehouse")
        ↓
lh.properties.abfsPath
  = "abfss://bronze@<account>.dfs.core.windows.net"
        ↓
full_path = abfsPath + "/" + relative_path
```
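
The final step above is plain string concatenation. A defensive variant that tolerates stray slashes could look like this (`join_abfs_path` is a hypothetical helper, not part of the package):

```python
def join_abfs_path(abfs_path, relative_path):
    """Join an ABFS base path and a relative path with exactly one slash."""
    return abfs_path.rstrip("/") + "/" + relative_path.lstrip("/")
```

Normalizing both sides avoids the classic `base//rel` and `baserel` failure modes when callers pass paths with or without trailing slashes.
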
---

## Running the tests

```bash
pip install "fabrictools[dev]"
pytest
```

---

## Publishing to PyPI

See [docs/PYPI_PUBLISH.md](docs/PYPI_PUBLISH.md) for a step-by-step guide.

---

## License

MIT
@@ -0,0 +1,34 @@
"""
fabrictools — User-friendly PySpark helpers for Microsoft Fabric.

Public API
----------
Lakehouse
~~~~~~~~~
read_lakehouse(lakehouse_name, relative_path, spark=None)
    Read a dataset (auto-detects Delta / Parquet / CSV).
write_lakehouse(df, lakehouse_name, relative_path, mode, partition_by, format, spark=None)
    Write a DataFrame to a Lakehouse (defaults to Delta format).
merge_lakehouse(source_df, lakehouse_name, relative_path, merge_condition, ...)
    Upsert (merge) a DataFrame into an existing Delta table.

Warehouse
~~~~~~~~~
read_warehouse(warehouse_name, query, spark=None)
    Run a SQL query and return the result as a DataFrame.
write_warehouse(df, warehouse_name, table, mode, batch_size, spark=None)
    Write a DataFrame to a Warehouse table via JDBC.
"""

from fabrictools.lakehouse import merge_lakehouse, read_lakehouse, write_lakehouse
from fabrictools.warehouse import read_warehouse, write_warehouse

__all__ = [
    "read_lakehouse",
    "write_lakehouse",
    "merge_lakehouse",
    "read_warehouse",
    "write_warehouse",
]

__version__ = "0.1.0"
@@ -0,0 +1,36 @@
"""Internal logging utility for fabrictools."""

from __future__ import annotations

import datetime
import logging

_LEVELS = {
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "debug": logging.DEBUG,
}

logging.basicConfig(
    format="%(message)s",
    level=logging.INFO,
)
_logger = logging.getLogger("fabrictools")


def log(message: str, level: str = "info") -> None:
    """
    Emit a timestamped log message.

    Parameters
    ----------
    message:
        Text to log.
    level:
        One of ``"info"``, ``"warning"``, ``"error"``, ``"debug"``.
        Defaults to ``"info"``.
    """
    ts = datetime.datetime.now().strftime("%H:%M:%S")
    lvl = _LEVELS.get(level.lower(), logging.INFO)
    _logger.log(lvl, "[%s] %s", ts, message)
@@ -0,0 +1,104 @@
"""
Path resolution helpers for Microsoft Fabric resources.

These functions rely on ``notebookutils``, which is injected automatically
into the Python environment by the Fabric notebook runtime. They will raise
a clear ``ValueError`` when called outside Fabric (e.g. local tests) so that
callers can handle the missing dependency gracefully.
"""

from __future__ import annotations

from fabrictools._logger import log


def get_lakehouse_abfs_path(lakehouse_name: str) -> str:
    """
    Return the full ABFS path for a Fabric Lakehouse.

    Internally calls ``notebookutils.lakehouse.get(lakehouse_name)`` which is
    available in every Fabric Spark notebook.

    Parameters
    ----------
    lakehouse_name:
        Display name of the Lakehouse as it appears in the Fabric workspace
        (e.g. ``"BronzeLakehouse"``).

    Returns
    -------
    str
        ABFS path of the form
        ``abfss://<container>@<account>.dfs.core.windows.net``.

    Raises
    ------
    ValueError
        When ``notebookutils`` is not available (outside Fabric).
    """
    try:
        import notebookutils  # type: ignore[import-untyped]  # noqa: PLC0415

        lh = notebookutils.lakehouse.get(lakehouse_name)
        path: str = lh.properties.abfsPath
        log(f"Resolved Lakehouse '{lakehouse_name}' → {path}")
        return path
    except ImportError as exc:
        raise ValueError(
            f"notebookutils is not available — are you running inside "
            f"Microsoft Fabric? ({exc})"
        ) from exc
    except Exception as exc:
        raise ValueError(
            f"Could not resolve Lakehouse '{lakehouse_name}': {exc}"
        ) from exc


def get_warehouse_jdbc_url(warehouse_name: str) -> str:
    """
    Return the JDBC connection URL for a Fabric Warehouse.

    Internally calls ``notebookutils.warehouse.get(warehouse_name)`` to
    retrieve the SQL endpoint and builds a standard JDBC URL from it.

    Parameters
    ----------
    warehouse_name:
        Display name of the Warehouse as it appears in the Fabric workspace
        (e.g. ``"MyWarehouse"``).

    Returns
    -------
    str
        JDBC URL suitable for use with ``spark.read.format("jdbc")``.

    Raises
    ------
    ValueError
        When ``notebookutils`` is not available or the warehouse cannot be
        found.
    """
    try:
        import notebookutils  # type: ignore[import-untyped]  # noqa: PLC0415

        wh = notebookutils.warehouse.get(warehouse_name)
        sql_endpoint: str = wh.properties.connectionString
        database: str = wh.properties.databaseName
        jdbc_url = (
            f"jdbc:sqlserver://{sql_endpoint};"
            f"database={database};"
            "encrypt=true;"
            "trustServerCertificate=false;"
            "loginTimeout=30;"
        )
        log(f"Resolved Warehouse '{warehouse_name}' → {sql_endpoint}/{database}")
        return jdbc_url
    except ImportError as exc:
        raise ValueError(
            f"notebookutils is not available — are you running inside "
            f"Microsoft Fabric? ({exc})"
        ) from exc
    except Exception as exc:
        raise ValueError(
            f"Could not resolve Warehouse '{warehouse_name}': {exc}"
        ) from exc
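
Because the URL assembly in `get_warehouse_jdbc_url` is pure string formatting, it can be exercised without Fabric or Spark. A standalone restatement of that logic:

```python
def build_jdbc_url(sql_endpoint: str, database: str) -> str:
    """Build a SQL Server JDBC URL with the same fixed options as above."""
    return (
        f"jdbc:sqlserver://{sql_endpoint};"
        f"database={database};"
        "encrypt=true;"
        "trustServerCertificate=false;"
        "loginTimeout=30;"
    )
```

This makes the connection options easy to unit-test locally, while the `notebookutils` lookup itself still requires a Fabric runtime.
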
@@ -0,0 +1,25 @@
"""SparkSession accessor for fabrictools."""

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pyspark.sql import SparkSession


def get_spark() -> "SparkSession":
    """
    Return the active SparkSession, creating one if necessary.

    Inside a Microsoft Fabric notebook the runtime already has an active
    session, so ``getOrCreate()`` simply returns it. Outside Fabric (e.g.
    local development) a new local session is started automatically.

    Returns
    -------
    SparkSession
    """
    from pyspark.sql import SparkSession  # noqa: PLC0415

    return SparkSession.builder.getOrCreate()