ygg-0.1.20-py3-none-any.whl → ygg-0.1.23-py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- ygg-0.1.23.dist-info/METADATA +367 -0
- {ygg-0.1.20.dist-info → ygg-0.1.23.dist-info}/RECORD +12 -10
- ygg-0.1.23.dist-info/entry_points.txt +2 -0
- ygg-0.1.23.dist-info/licenses/LICENSE +201 -0
- yggdrasil/databricks/compute/cluster.py +61 -14
- yggdrasil/databricks/compute/execution_context.py +22 -20
- yggdrasil/databricks/compute/remote.py +0 -2
- yggdrasil/pyutils/__init__.py +2 -0
- yggdrasil/pyutils/callable_serde.py +563 -0
- yggdrasil/pyutils/python_env.py +1351 -0
- ygg-0.1.20.dist-info/METADATA +0 -163
- yggdrasil/ser/__init__.py +0 -1
- yggdrasil/ser/callable_serde.py +0 -661
- {ygg-0.1.20.dist-info → ygg-0.1.23.dist-info}/WHEEL +0 -0
- {ygg-0.1.20.dist-info → ygg-0.1.23.dist-info}/top_level.txt +0 -0
ygg-0.1.20.dist-info/METADATA
DELETED
@@ -1,163 +0,0 @@
-Metadata-Version: 2.4
-Name: ygg
-Version: 0.1.20
-Summary: Type-friendly utilities for moving data between Python objects, Arrow, Polars, Pandas, Spark, and Databricks
-Author: Yggdrasil contributors
-Project-URL: Homepage, https://github.com/Platob/Yggdrasil
-Project-URL: Repository, https://github.com/Platob/Yggdrasil
-Project-URL: Documentation, https://github.com/Platob/Yggdrasil/tree/main/python/docs
-Keywords: arrow,polars,pandas,spark,databricks,typing,dataclass,serialization
-Classifier: Development Status :: 3 - Alpha
-Classifier: Programming Language :: Python
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Intended Audience :: Developers
-Classifier: Intended Audience :: Information Technology
-Classifier: Topic :: Software Development :: Libraries
-Classifier: Topic :: Scientific/Engineering :: Information Analysis
-Classifier: Typing :: Typed
-Requires-Python: >=3.10
-Description-Content-Type: text/markdown
-Requires-Dist: requests>=2
-Requires-Dist: polars>=1.3
-Requires-Dist: pandas>=2
-Requires-Dist: pyarrow>=20
-Requires-Dist: dill>=0.4
-Requires-Dist: databricks-sdk>=0.71
-Provides-Extra: dev
-Requires-Dist: pytest; extra == "dev"
-Requires-Dist: pytest-asyncio; extra == "dev"
-Requires-Dist: black; extra == "dev"
-Requires-Dist: ruff; extra == "dev"
-Requires-Dist: mypy; extra == "dev"
-
-# Yggdrasil (Python)
-
-Type-friendly utilities for moving data between Python objects, Arrow, Polars, pandas, Spark, and Databricks. The package bundles enhanced dataclasses, casting utilities, and lightweight wrappers around Databricks and HTTP clients so Python/data engineers can focus on schemas instead of plumbing.
-
-## When to use this package
-Use Yggdrasil when you need to:
-- Convert payloads across dataframe engines without rewriting type logic for each backend.
-- Define dataclasses that auto-coerce inputs, expose defaults, and surface Arrow schemas.
-- Run Databricks SQL jobs or manage clusters with minimal boilerplate.
-- Add resilient retries, concurrency helpers, and dependency guards to data pipelines.
-
-## Prerequisites
-- Python **3.10+**
-- [uv](https://docs.astral.sh/uv/) for virtualenv and dependency management.
-
-Optional extras:
-- `polars`, `pandas`, `pyarrow`, and `pyspark` for engine-specific conversions.
-- `databricks-sdk` for workspace, SQL, jobs, and compute helpers.
-- `msal` for Azure AD authentication when using `MSALSession`.
-
-## Installation
-From the `python/` directory:
-
-```bash
-uv venv .venv
-source .venv/bin/activate
-uv pip install -e .[dev]
-```
-
-Extras are grouped by engine:
-- `.[polars]`, `.[pandas]`, `.[spark]`, `.[databricks]` – install only the integrations you need.
-- `.[dev]` – adds testing, linting, and typing tools (`pytest`, `ruff`, `black`, `mypy`).
-
-## Quickstart
-Define an Arrow-aware dataclass, coerce inputs, and cast across containers:
-
-```python
-from yggdrasil import yggdataclass
-from yggdrasil.types.cast import convert
-from yggdrasil.types import arrow_field_from_hint
-
-@yggdataclass
-class User:
-    id: int
-    email: str
-    active: bool = True
-
-user = User.__safe_init__("123", email="alice@example.com")
-assert user.id == 123 and user.active is True
-
-payload = {"id": "45", "email": "bob@example.com", "active": "false"}
-clean = User.from_dict(payload)
-print(clean.to_dict())
-
-field = arrow_field_from_hint(User, name="user")
-print(field)  # user: struct<id: int64, email: string, active: bool>
-
-numbers = convert(["1", "2", "3"], list[int])
-print(numbers)
-```
-
-### Databricks example
-Install the `databricks` extra and run SQL with typed results:
-
-```python
-from yggdrasil.databricks.workspaces import Workspace
-from yggdrasil.databricks.sql import SQLEngine
-
-ws = Workspace(host="https://<workspace-url>", token="<token>")
-engine = SQLEngine(workspace=ws)
-
-stmt = engine.execute("SELECT 1 AS value")
-result = stmt.wait(engine)
-tbl = result.arrow_table()
-print(tbl.to_pandas())
-```
-
-### Parallel processing and retries
-
-```python
-from yggdrasil.pyutils import parallelize, retry
-
-@parallelize(max_workers=4)
-def square(x):
-    return x * x
-
-@retry(tries=5, delay=0.2, backoff=2)
-def sometimes_fails(value: int) -> int:
-    ...
-
-print(list(square(range(5))))
-```
-
-## Project layout
-- `yggdrasil/dataclasses` – `yggdataclass` decorator plus Arrow schema helpers.
-- `yggdrasil/types` – casting registry (`convert`, `register_converter`), Arrow inference, and default generators.
-- `yggdrasil/libs` – optional bridges to Polars, pandas, Spark, and Databricks SDK types.
-- `yggdrasil/databricks` – workspace, SQL, jobs, and compute helpers built on the Databricks SDK.
-- `yggdrasil/requests` – retry-capable HTTP sessions and Azure MSAL auth helpers.
-- `yggdrasil/pyutils` – concurrency and retry decorators.
-- `yggdrasil/ser` – serialization helpers and dependency inspection utilities.
-- `tests/` – pytest-based coverage for conversions, dataclasses, requests, and platform helpers.
-
-## Testing
-From `python/`:
-
-```bash
-pytest
-```
-
-Optional checks when developing:
-
-```bash
-ruff check
-black .
-mypy
-```
-
-## Troubleshooting and common pitfalls
-- **Missing optional dependency**: Install the matching extra (e.g., `uv pip install -e .[polars]`) or wrap calls with `require_polars`/`require_pyspark` from `yggdrasil.libs`.
-- **Schema mismatches**: Use `arrow_field_from_hint` and `CastOptions` to enforce expected Arrow metadata when casting.
-- **Databricks auth**: Provide `host` and `token` to `Workspace`. For Azure, ensure environment variables align with your workspace deployment.
-
-## Contributing
-1. Fork and branch.
-2. Install with `uv pip install -e .[dev]`.
-3. Run tests and linters.
-4. Submit a PR describing the change and any new examples added to the docs.
yggdrasil/ser/__init__.py
DELETED
@@ -1 +0,0 @@
-from .callable_serde import *