polars-checkpoint 0.1.0__tar.gz

@@ -0,0 +1,88 @@
+ Metadata-Version: 2.4
+ Name: polars-checkpoint
+ Version: 0.1.0
+ Summary: Add your description here
+ Requires-Python: >=3.11.1
+ Description-Content-Type: text/markdown
+ Requires-Dist: polars>=1.39.3
+
+ # polars-checkpoint
+
+ Materialise Polars LazyFrames to parquet files and scan them back lazily. Checkpoints are written with the streaming engine by default. This helps reduce memory pressure from expensive intermediate results in complex multi-step transforms.
+
+ ## Installation
+
+ Requires Python 3.11.1 or later and `polars` 1.39.3 or later.
+
+ ## Quick Start
+
+ ```python
+ import polars as pl
+ from polars_checkpoint.polars_checkpoint import checkpoint
+
+ lf = pl.LazyFrame({"x": range(1_000_000)}).with_columns(y=pl.col("x") * 2)
+
+ # Materialises to a temp parquet file; returns a lazy re-scan
+ lf = checkpoint(lf)
+
+ lf.filter(pl.col("y") > 100).collect()
+ ```
+
+ A process-wide default session manages the temp directory and cleans it up at exit.
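
The "process-wide default session" idea can be illustrated with a lazily created, lock-protected singleton. The sketch below is a minimal stand-in, not the package's code: `Session` and `get_default_session` are hypothetical names, and the only behaviour modelled is "create on first use, replace if closed".

```python
from __future__ import annotations

import threading


class Session:
    """Hypothetical stand-in for a checkpoint session; tracks open/closed only."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


_default_lock = threading.Lock()
_default: Session | None = None


def get_default_session() -> Session:
    """Create the shared session on first use; replace it if it was closed."""
    global _default
    with _default_lock:
        if _default is None or _default.closed:
            _default = Session()
        return _default


first = get_default_session()
assert get_default_session() is first  # repeated calls share one object

first.close()
second = get_default_session()
assert second is not first  # a closed default is replaced on the next call
```

Holding the lock around both the check and the assignment keeps two threads from creating two default sessions at the same time.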
+
+ ## Session API
+
+ For explicit control over storage location and lifecycle, use `CheckpointSession`:
+
+ ```python
+ import polars as pl
+ from polars_checkpoint.polars_checkpoint import CheckpointSession
+
+ # As a context manager: cleans up on exit from the block
+ with CheckpointSession(root_dir="./my_checkpoints") as sess:
+     lf = pl.scan_csv("big.csv")
+     lf = sess.checkpoint(lf, name="after-parse")
+     lf = lf.filter(pl.col("status") == "active")
+     lf = sess.checkpoint(lf, name="filtered")
+ ```
+
+ ```python
+ # Without a context manager: cleans up at GC or interpreter shutdown,
+ # or when you call close() explicitly
+ sess = CheckpointSession(root_dir="./my_checkpoints")
+ lf = sess.checkpoint(pl.scan_csv("big.csv"), name="raw")
+ reloaded = sess["raw"]
+ print(sess.summary())
+
+ sess.close()  # optional; triggers early cleanup
+ ```
+
+ ### `CheckpointSession` constructor
+
+ | Parameter | Default | Description |
+ |---|---|---|
+ | `root_dir` | `None` (auto temp dir) | Parent directory for checkpoint folders. |
+ | `cleanup` | `True` | Delete checkpoint files on close / GC / interpreter exit. |
+ | `default_sink_kwargs` | `None` | Merged over `{"compression": "zstd"}` and passed to `sink_parquet` / `write_parquet`. |
+ | `default_scan_kwargs` | `None` (no extra options) | Defaults passed to `scan_parquet`. |
+
+ ### Key methods & features
+
+ - **`checkpoint(lf, *, name=None, streaming=True, ...)`**: Materialise a LazyFrame to parquet and return a lazy re-scan. Generates a UUID-based name if none is given; explicit names are normalised to filesystem-safe slugs, and `FileExistsError` is raised if the checkpoint directory already exists. With `streaming=False`, uses `collect().write_parquet()` instead of `sink_parquet`.
+ - **`session[name]`**: Retrieve a checkpoint as a `LazyFrame`; raises `KeyError` if it does not exist.
+ - **`name in session`**: Check existence.
+ - **`len(session)`** / **`iter(session)`**: Count checkpoints / iterate over their names.
+ - **`summary()`**: Returns a Polars DataFrame with the `name`, `size_mb`, and `path` of each checkpoint.
+ - **`close(*, timeout=None)`**: Waits for in-flight writes (up to `timeout` seconds, if given), then cleans up. Called automatically when the session is used as a context manager.
+
+ ## Thread Safety
+
+ Sessions are internally locked. Concurrent `checkpoint()` calls from multiple threads are safe; `close()` waits for all in-flight materialisations before removing files, unless `timeout` elapses first, in which case it logs a warning and proceeds with cleanup.
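
The "wait for in-flight work before closing" behaviour described above can be sketched with a `threading.Condition` and a counter. This is a simplified illustration using only the standard library; `InFlightTracker`, `run`, and `slow_write` are hypothetical names, and unlike the package's `close()` this version returns a boolean rather than logging a warning.

```python
from __future__ import annotations

import threading
import time


class InFlightTracker:
    """Counts running operations so close() can wait for them to finish."""

    def __init__(self) -> None:
        self._cond = threading.Condition(threading.Lock())
        self._active = 0
        self._closed = False

    def run(self, work) -> None:
        with self._cond:
            if self._closed:
                raise RuntimeError("tracker is closed")
            self._active += 1
        try:
            work()  # the slow part runs without holding the lock
        finally:
            with self._cond:
                self._active -= 1
                self._cond.notify_all()

    def close(self, timeout: float | None = None) -> bool:
        """Return True if every operation finished before the deadline."""
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._cond:
            self._closed = True
            while self._active > 0:
                remaining = None if deadline is None else deadline - time.monotonic()
                if remaining is not None and remaining <= 0:
                    return False
                self._cond.wait(timeout=remaining)
            return True


tracker = InFlightTracker()
started = threading.Event()


def slow_write() -> None:
    started.set()
    time.sleep(0.2)  # stands in for a long parquet write


worker = threading.Thread(target=tracker.run, args=(slow_write,))
worker.start()
started.wait()  # make sure the operation is registered before closing
drained = tracker.close(timeout=5.0)
worker.join()
```

Marking the tracker closed before waiting means no new operation can start while `close()` is draining the existing ones.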
+
+ ## Cleanup Behaviour
+
+ | Scenario | `cleanup=True` (default) | `cleanup=False` |
+ |---|---|---|
+ | `close()` / `__exit__` | Files deleted | Files retained |
+ | GC / interpreter shutdown | Files deleted (via `weakref.finalize`) | Files retained |
+
+ When `root_dir` is auto-generated, the entire temp directory is removed. When `root_dir` is user-supplied, only the individual checkpoint subdirectories created by the session are removed.
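
The `weakref.finalize` safety net in the table can be demonstrated in isolation. The sketch below uses only the standard library; `TempWorkspace` and `keep_files` are hypothetical names chosen for illustration, and `detach()` plays the role of cancelling automatic cleanup.

```python
import gc
import shutil
import tempfile
import weakref
from pathlib import Path


class TempWorkspace:
    """Owns a temp directory that is removed when the object goes away."""

    def __init__(self) -> None:
        self.path = Path(tempfile.mkdtemp(prefix="demo-ckpt-"))
        # The callback must not reference `self`, or the object could never
        # be collected; pass only the data the cleanup needs.
        self._finaliser = weakref.finalize(
            self, shutil.rmtree, self.path, ignore_errors=True
        )

    def keep_files(self) -> None:
        """Cancel automatic cleanup so the directory outlives the object."""
        self._finaliser.detach()


ws = TempWorkspace()
removed_path = ws.path
del ws
gc.collect()  # precaution; the finaliser runs once the object is collected

kept = TempWorkspace()
kept_path = kept.path
kept.keep_files()
del kept
gc.collect()
kept_exists = kept_path.exists()
shutil.rmtree(kept_path, ignore_errors=True)  # tidy up the demo directory
```

A finaliser created this way also runs at interpreter shutdown by default, because its `atexit` attribute is `True` unless changed.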
@@ -0,0 +1,9 @@
+ [project]
+ name = "polars-checkpoint"
+ version = "0.1.0"
+ description = "Add your description here"
+ readme = "README.md"
+ requires-python = ">=3.11.1"
+ dependencies = [
+     "polars>=1.39.3",
+ ]
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,341 @@
+ from __future__ import annotations
+
+ import logging
+ import shutil
+ import tempfile
+ import threading
+ import time
+ import uuid
+ import weakref
+ from collections.abc import Iterator
+ from pathlib import Path
+ from typing import Any
+
+ import polars as pl
+
+ logger = logging.getLogger(__name__)
+
+ _default_session_lock = threading.Lock()
+ _default_session: CheckpointSession | None = None
+
+
+ def _cleanup_files(
+     root_dir: Path,
+     auto_root: bool,
+     owned_dirs: list[Path],
+ ) -> None:
+     """Safety-net cleanup invoked by weakref.finalize."""
+     if auto_root:
+         shutil.rmtree(root_dir, ignore_errors=True)
+     else:
+         for d in owned_dirs:
+             shutil.rmtree(d, ignore_errors=True)
+
+
+ class CheckpointSession:
+     """Manages a directory of parquet checkpoints for Polars LazyFrames."""
+
+     def __init__(
+         self,
+         root_dir: str | Path | None = None,
+         *,
+         cleanup: bool = True,
+         default_sink_kwargs: dict[str, Any] | None = None,
+         default_scan_kwargs: dict[str, Any] | None = None,
+     ) -> None:
+         self._lock = threading.Lock()
+         self._cond = threading.Condition(self._lock)
+         self._auto_root = root_dir is None
+         if self._auto_root:
+             self.root_dir = Path(tempfile.mkdtemp(prefix="pl-ckpt-"))
+         else:
+             self.root_dir = Path(root_dir)
+             self.root_dir.mkdir(parents=True, exist_ok=True)
+
+         self.cleanup = cleanup
+         self.default_sink_kwargs = {
+             "compression": "zstd",
+             **(default_sink_kwargs or {}),
+         }
+         self.default_scan_kwargs = dict(default_scan_kwargs or {})
+
+         self._owned_dirs: list[Path] = []
+         self._active_ops = 0
+         self._closed = False
+
+         # finalize fires at GC or interpreter shutdown (atexit=True),
+         # whichever comes first. close() calls detach() to prevent
+         # double-cleanup.
+         if self.cleanup:
+             self._finaliser = weakref.finalize(
+                 self,
+                 _cleanup_files,
+                 self.root_dir,
+                 self._auto_root,
+                 self._owned_dirs,  # same list object, mutated in place
+             )
+             self._finaliser.atexit = True
+         else:
+             self._finaliser = None
+
+     def __repr__(self) -> str:
+         if self._closed:
+             state = "closed"
+         else:
+             n = len(self._owned_dirs)
+             state = f"open, {n} checkpoint{'s' if n != 1 else ''}"
+         return f"<CheckpointSession root_dir={str(self.root_dir)!r} {state}>"
+
+     # -- context manager (optional, for early deterministic cleanup) -----------
+
+     def __enter__(self) -> CheckpointSession:
+         with self._cond:
+             self._ensure_open()
+         return self
+
+     def __exit__(self, *exc: object) -> None:
+         self.close()
+
+     # -- lifecycle -------------------------------------------------------------
+
+     def close(self, *, timeout: float | None = None) -> None:
+         """Close the session, wait for in-flight checkpoints, clean up files."""
+         with self._cond:
+             if self._closed:
+                 return
+             self._closed = True
+
+             deadline = None if timeout is None else time.monotonic() + timeout
+             while self._active_ops > 0:
+                 remaining = (
+                     None if deadline is None else max(0.0, deadline - time.monotonic())
+                 )
+                 if remaining is not None and remaining <= 0:
+                     logger.warning(
+                         "Timed out waiting for %d in-flight checkpoint(s); "
+                         "proceeding with cleanup",
+                         self._active_ops,
+                     )
+                     break
+                 self._cond.wait(timeout=remaining)
+
+             owned = list(self._owned_dirs)
+
+         # Deactivate the finaliser so it won't double-clean.
+         # If the finaliser has already fired (e.g. at GC), detach() is a
+         # no-op, and the rmtree calls below are idempotent.
+         if self._finaliser is not None:
+             self._finaliser.detach()
+
+         if not self.cleanup:
+             return
+
+         if self._auto_root:
+             shutil.rmtree(self.root_dir, ignore_errors=True)
+         else:
+             for d in owned:
+                 shutil.rmtree(d, ignore_errors=True)
+
+     # -- collection protocol ---------------------------------------------------
+
+     def __getitem__(self, name: str) -> pl.LazyFrame:
+         with self._cond:
+             self._ensure_open()
+         slug = _normalise_name(name)
+         path = self.root_dir / slug / "data.parquet"
+         if not path.exists():
+             with self._cond:
+                 available = [
+                     d.name for d in self._owned_dirs if (d / "data.parquet").exists()
+                 ]
+             raise KeyError(f"No checkpoint named {name!r}. Available: {available}")
+         return pl.scan_parquet(path, **self.default_scan_kwargs)
+
+     def __contains__(self, name: object) -> bool:
+         with self._cond:
+             self._ensure_open()
+         if not isinstance(name, str):
+             return False
+         try:
+             slug = _normalise_name(name)
+         except ValueError:
+             return False
+         return (self.root_dir / slug / "data.parquet").exists()
+
+     def __len__(self) -> int:
+         with self._cond:
+             self._ensure_open()
+             return len(self._owned_dirs)
+
+     def __iter__(self) -> Iterator[str]:
+         with self._cond:
+             self._ensure_open()
+             dirs = list(self._owned_dirs)
+         for d in dirs:
+             yield d.name
+
+     # -- introspection ---------------------------------------------------------
+
+     def summary(self) -> pl.DataFrame:
+         """Return a DataFrame listing all live checkpoints and their sizes."""
+         # Should maybe make it a .show() method that prints a table?
+         with self._cond:
+             self._ensure_open()
+             dirs = list(self._owned_dirs)
+         rows: list[dict[str, Any]] = []
+         for d in dirs:
+             p = d / "data.parquet"
+             if p.exists():
+                 rows.append(
+                     {
+                         "name": d.name,
+                         "size_mb": round(p.stat().st_size / 1_048_576, 2),
+                         "path": str(p),
+                     }
+                 )
+         if not rows:
+             return pl.DataFrame(
+                 schema={
+                     "name": pl.Utf8,
+                     "size_mb": pl.Float64,
+                     "path": pl.Utf8,
+                 },
+             )
+         return pl.DataFrame(rows)
+
+     # -- checkpointing ---------------------------------------------------------
+
+     def checkpoint(
+         self,
+         lf: pl.LazyFrame,
+         *,
+         name: str | None = None,
+         sink_kwargs: dict[str, Any] | None = None,
+         scan_kwargs: dict[str, Any] | None = None,
+         streaming: bool = True,
+     ) -> pl.LazyFrame:
+         """Materialise LazyFrame to a parquet checkpoint and return a lazy re-scan."""
+         with self._cond:
+             self._ensure_open()
+             checkpoint_dir = self._new_checkpoint_dir(name)
+             self._active_ops += 1
+
+         checkpoint_path = checkpoint_dir / "data.parquet"
+         sink_opts = {**self.default_sink_kwargs, **(sink_kwargs or {})}
+         scan_opts = {**self.default_scan_kwargs, **(scan_kwargs or {})}
+
+         ok = False
+         try:
+             self._materialise(lf, checkpoint_path, sink_opts, streaming)
+             ok = True
+         finally:
+             with self._cond:
+                 if not ok:
+                     try:
+                         self._owned_dirs.remove(checkpoint_dir)
+                     except ValueError:
+                         pass
+                 self._active_ops -= 1
+                 self._cond.notify_all()
+             if not ok:
+                 shutil.rmtree(checkpoint_dir, ignore_errors=True)
+
+         return pl.scan_parquet(checkpoint_path, **scan_opts)
+
+     # -- internals -------------------------------------------------------------
+
+     def _ensure_open(self) -> None:
+         if self._closed:
+             raise RuntimeError("CheckpointSession is closed")
+
+     def _new_checkpoint_dir(self, name: str | None) -> Path:
+         # Must be called under self._cond / self._lock.
+         slug = _normalise_name(name) if name is not None else uuid.uuid4().hex
+         path = self.root_dir / slug
+         try:
+             path.mkdir(parents=True, exist_ok=False)
+         except FileExistsError:
+             msg = f"Checkpoint directory {slug!r} already exists"
+             if name is not None and slug != name:
+                 msg += f" (normalised from {name!r})"
+             raise FileExistsError(msg) from None
+         self._owned_dirs.append(path)
+         return path
+
+     @staticmethod
+     def _materialise(
+         lf: pl.LazyFrame,
+         path: Path,
+         sink_opts: dict[str, Any],
+         streaming: bool,
+     ) -> None:
+         mode = "streaming" if streaming else "collect"
+         logger.debug("Materialising checkpoint to %s (%s)", path, mode)
+         t0 = time.perf_counter()
+
+         if streaming:
+             try:
+                 lf.sink_parquet(path, **sink_opts)
+             except Exception as exc:
+                 raise RuntimeError(
+                     f"Checkpoint materialisation failed: {exc}\n\n"
+                     "If the streaming engine does not support this query "
+                     "plan, retry with streaming=False."
+                 ) from exc
+         else:
+             lf.collect().write_parquet(path, **sink_opts)
+
+         elapsed = time.perf_counter() - t0
+         size_mb = path.stat().st_size / 1_048_576
+         logger.debug("Checkpoint written in %.2fs (%.1f MB)", elapsed, size_mb)
+
+
+ # -- standalone function --------------------------------------------------------
+
+
+ def checkpoint(
+     lf: pl.LazyFrame,
+     *,
+     session: CheckpointSession | None = None,
+     name: str | None = None,
+     sink_kwargs: dict[str, Any] | None = None,
+     scan_kwargs: dict[str, Any] | None = None,
+     streaming: bool = True,
+ ) -> pl.LazyFrame:
+     """Materialise LazyFrame to a parquet checkpoint."""
+     sess = session if session is not None else _get_default_session()
+     return sess.checkpoint(
+         lf,
+         name=name,
+         sink_kwargs=sink_kwargs,
+         scan_kwargs=scan_kwargs,
+         streaming=streaming,
+     )
+
+
+ # -- default session -----------------------------------------------------------
+
+
+ def _get_default_session() -> CheckpointSession:
+     """Lazily create a process-wide default session."""
+     global _default_session
+     with _default_session_lock:
+         if _default_session is None or _default_session._closed:
+             _default_session = CheckpointSession()
+         return _default_session
+
+
+ # -- utilities -----------------------------------------------------------------
+
+
+ def _normalise_name(name: str) -> str:
+     out = []
+     for ch in name:
+         if ch.isalnum() or ch in {"-", "_", "."}:
+             out.append(ch)
+         else:
+             out.append("_")
+     normalised = "".join(out).strip("._")
+     if not normalised:
+         raise ValueError(f"Checkpoint name {name!r} normalised to an empty string")
+     return normalised
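
To make the naming rule in `_normalise_name` concrete, here is a standalone copy of the same logic applied to a few sample names. The function name `normalise_name` and the sample inputs are illustrative only; the rule itself (keep alphanumerics, `-`, `_`, and `.`, replace everything else with `_`, then strip leading and trailing `.` and `_`) is taken from the module above.

```python
def normalise_name(name: str) -> str:
    """Standalone copy of the slug rule, for experimentation."""
    kept = {"-", "_", "."}
    slug = "".join(ch if ch.isalnum() or ch in kept else "_" for ch in name)
    slug = slug.strip("._")
    if not slug:
        raise ValueError(f"Checkpoint name {name!r} normalised to an empty string")
    return slug


examples = ["after-parse", "step 1/raw", "../etc/passwd", "v1.2_final."]
results = {name: normalise_name(name) for name in examples}

for original, slug in results.items():
    print(f"{original!r} -> {slug!r}")
# 'after-parse' -> 'after-parse'
# 'step 1/raw' -> 'step_1_raw'
# '../etc/passwd' -> 'etc_passwd'
# 'v1.2_final.' -> 'v1.2_final'
```

Because path separators become `_` and leading dots are stripped, a name cannot escape the session's root directory. Two distinct names can map to the same slug (for example `"step 1"` and `"step_1"`), in which case the second `checkpoint()` call in the module above raises `FileExistsError`.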
@@ -0,0 +1,9 @@
+ README.md
+ pyproject.toml
+ src/polars_checkpoint/__init__.py
+ src/polars_checkpoint/polars_checkpoint.py
+ src/polars_checkpoint.egg-info/PKG-INFO
+ src/polars_checkpoint.egg-info/SOURCES.txt
+ src/polars_checkpoint.egg-info/dependency_links.txt
+ src/polars_checkpoint.egg-info/requires.txt
+ src/polars_checkpoint.egg-info/top_level.txt
@@ -0,0 +1 @@
+ polars_checkpoint