furu 0.0.1__py3-none-any.whl

@@ -0,0 +1,502 @@
1
+ Metadata-Version: 2.4
2
+ Name: furu
3
+ Version: 0.0.1
4
+ Summary: Cacheable, nested pipelines for Python. Define computations as configs; furu handles caching, state tracking, and result reuse across runs.
5
+ Author-email: Herman Brunborg <herman@brunborg.com>
6
+ Requires-Python: >=3.12
7
+ Requires-Dist: chz>=0.4.0
8
+ Requires-Dist: cloudpickle>=3.1.1
9
+ Requires-Dist: pydantic>=2.12.5
10
+ Requires-Dist: python-dotenv>=1.0.0
11
+ Requires-Dist: rich>=14.2.0
12
+ Requires-Dist: submitit>=1.5.3
13
+ Provides-Extra: dashboard
14
+ Requires-Dist: fastapi>=0.109.0; extra == 'dashboard'
15
+ Requires-Dist: typer>=0.9.0; extra == 'dashboard'
16
+ Requires-Dist: uvicorn[standard]>=0.27.0; extra == 'dashboard'
17
+ Description-Content-Type: text/markdown
18
+
19
+ # furu
20
+
21
+ > **Note:** `v0.0.x` is alpha and may include breaking changes.
22
+
23
+ A Python library for building cacheable, nested pipelines. Define computations as config objects; furu turns configs into stable on-disk artifact directories, records metadata/state, and reuses results across runs.
24
+
25
+ Built on [chz](https://github.com/openai/chz) for declarative configs.
26
+
27
+ ## Installation
28
+
29
+ ```bash
30
+ uv add "furu[dashboard]"
31
+ ```
32
+
33
+ Or with pip:
34
+
35
+ ```bash
36
+ pip install "furu[dashboard]"
37
+ ```
38
+
39
+ The `[dashboard]` extra includes the web dashboard. Omit it for the core library only.
40
+
41
+ ## Quickstart
42
+
43
+ 1. Subclass `furu.Furu[T]`
44
+ 2. Implement `_create(self) -> T` (compute and write to `self.furu_dir`)
45
+ 3. Implement `_load(self) -> T` (load from `self.furu_dir`)
46
+ 4. Call `load_or_create()`
47
+
48
+ ```python
49
+ # my_project/pipelines.py
50
+ import json
51
+ from pathlib import Path
52
+ import furu
53
+
54
+ class TrainModel(furu.Furu[Path]):
55
+ lr: float = furu.chz.field(default=1e-3)
56
+ steps: int = furu.chz.field(default=1000)
57
+
58
+ def _create(self) -> Path:
59
+ # Write outputs into the artifact directory
60
+ (self.furu_dir / "metrics.json").write_text(
61
+ json.dumps({"lr": self.lr, "steps": self.steps})
62
+ )
63
+ ckpt = self.furu_dir / "checkpoint.bin"
64
+ ckpt.write_bytes(b"...")
65
+ return ckpt
66
+
67
+ def _load(self) -> Path:
68
+ # Load outputs back from disk
69
+ return self.furu_dir / "checkpoint.bin"
70
+ ```
71
+
72
+ ```python
73
+ # run_train.py
74
+ from my_project.pipelines import TrainModel
75
+
76
+ # First call: runs _create(), caches result
77
+ artifact = TrainModel(lr=3e-4, steps=5000).load_or_create()
78
+
79
+ # Second call with same config: loads from cache via _load()
80
+ artifact = TrainModel(lr=3e-4, steps=5000).load_or_create()
81
+ ```
82
+
83
+ > **Tip:** Define Furu classes in importable modules (not `__main__`); the artifact namespace is derived from the class's module + qualified name.
84
+
85
+ ## Core Concepts
86
+
87
+ ### How Caching Works
88
+
89
+ Each `Furu` instance maps deterministically to a directory based on its config:
90
+
91
+ ```
92
+ <root>/<namespace>/<hash>/
93
+ ```
94
+
95
+ - **namespace**: Derived from the class's module + qualified name (e.g., `my_project.pipelines/TrainModel`)
96
+ - **hash**: Computed from the object's config values using BLAKE2s
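Here is a minimal sketch of the `<root>/<namespace>/<hash>/` mapping that uses only the standard library. The canonical-JSON payload, the `artifact_dir` helper, and the 10-byte digest are hypothetical choices for illustration. Furu's actual hashed payload and hash length are not documented here.

```python
import hashlib
import json
from pathlib import Path


def artifact_dir(root: Path, module: str, qualname: str, config: dict) -> Path:
    """Hypothetical helper: map a config to <root>/<namespace>/<hash>/."""
    # Namespace: module path followed by the class's qualified name,
    # e.g. my_project.pipelines/TrainModel.
    namespace = Path(module) / qualname
    # Canonical JSON (sorted keys, no whitespace) so equal configs hash equally.
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.blake2s(payload.encode("utf-8"), digest_size=10).hexdigest()
    return root / namespace / digest
```

Keys are sorted before hashing, so two configs that differ only in field order land in the same directory. Any changed value moves the artifact to a new hash.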
97
+
98
+ When you call `load_or_create()`:
99
+ 1. If no cached result exists → run `_create()`, save state as "success"
100
+ 2. If cached result exists → run `_load()` to retrieve it
101
+ 3. If another process is already computing the same artifact → wait for it to finish, then load
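The three cases can be sketched as a single-machine loop around an atomic lock file. This illustrates the idea and is not furu's implementation. The `compute.lock` name, the contents written to `SUCCESS.json`, and the use of the built-in `TimeoutError` in place of `FuruWaitTimeout` are all assumptions. The sketch also omits the lease and heartbeat handling (`FURU_LEASE_SECS`, `FURU_HEARTBEAT_SECS`) that lets a real run recover when a lock holder crashes.

```python
import json
import os
import time
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_create(
    directory: Path,
    create: Callable[[Path], T],
    load: Callable[[Path], T],
    poll_secs: float = 0.1,
    max_wait_secs: float = 600.0,
) -> T:
    """Single-machine sketch of the three cases: load, create, or wait."""
    meta = directory / ".furu"
    meta.mkdir(parents=True, exist_ok=True)
    success = meta / "SUCCESS.json"
    lock = meta / "compute.lock"  # hypothetical lock-file name
    deadline = time.monotonic() + max_wait_secs
    while True:
        if success.exists():
            # Case 2: a finished result is on disk, so load it.
            return load(directory)
        try:
            # O_CREAT | O_EXCL fails atomically if the lock file already exists.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Case 3: another process holds the lock; wait, then re-check.
            if time.monotonic() > deadline:
                raise TimeoutError(f"timed out waiting on {directory}")
            time.sleep(poll_secs)
            continue
        try:
            if success.exists():
                # The other process finished between our check and the lock.
                return load(directory)
            # Case 1: nothing cached and we hold the lock, so compute.
            result = create(directory)
            success.write_text(json.dumps({"status": "success"}))
            return result
        finally:
            os.close(fd)
            lock.unlink()
```

The code re-checks for the success marker after taking the lock for a reason. A waiting process may acquire the lock just after the first process finished, and in that case it should load the result instead of recomputing it.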
102
+
103
+ ### Nested Pipelines (Dependencies)
104
+
105
+ Furu objects compose via nested configs. Each dependency gets its own artifact folder:
106
+
107
+ ```python
108
+ import furu
109
+
110
+ class Dataset(furu.Furu[str]):
111
+ name: str = furu.chz.field(default="toy")
112
+
113
+ def _create(self) -> str:
114
+ (self.furu_dir / "data.txt").write_text("hello\nworld\n")
115
+ return (self.furu_dir / "data.txt").read_text()
116
+
117
+ def _load(self) -> str:
118
+ return (self.furu_dir / "data.txt").read_text()
119
+
120
+
121
+ class TrainTextModel(furu.Furu[str]):
122
+ dataset: Dataset = furu.chz.field(default_factory=Dataset)
123
+
124
+ def _create(self) -> str:
125
+ data = self.dataset.load_or_create() # Loads or creates the Dataset artifact
126
+ (self.furu_dir / "model.txt").write_text(f"trained on:\n{data}")
127
+ return (self.furu_dir / "model.txt").read_text()
128
+
129
+ def _load(self) -> str:
130
+ return (self.furu_dir / "model.txt").read_text()
131
+ ```
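Nesting works because the child's config is part of the parent's config, which makes it part of the parent's hash. The sketch below shows this with plain frozen dataclasses standing in for chz/furu classes. `config_hash` is a hypothetical helper and is not furu's hashing code.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Dataset:
    name: str = "toy"


@dataclass(frozen=True)
class TrainTextModel:
    dataset: Dataset = field(default_factory=Dataset)


def config_hash(obj) -> str:
    # asdict() recurses into nested dataclasses, so the child's fields
    # become part of the parent's hashed payload.
    payload = json.dumps(
        {"__class__": type(obj).__qualname__, **asdict(obj)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.blake2s(payload.encode("utf-8")).hexdigest()
```

Changing `Dataset.name` therefore also produces a new `TrainTextModel` hash. A downstream artifact is never silently reused after its upstream input has changed.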
132
+
133
+ ### Storage Structure
134
+
135
+ ```
136
+ $FURU_PATH/
137
+ ├── data/ # Default storage (version_controlled=False)
138
+ │ └── <module>/<Class>/
139
+ │ └── <hash>/
140
+ │ ├── .furu/
141
+ │ │ ├── metadata.json # Config, git info, environment
142
+ │ │ ├── state.json # Status and timestamps
143
+ │ │ ├── furu.log # Captured logs
144
+ │ │ └── SUCCESS.json # Marker file
145
+ │ └── <your outputs> # Files from _create()
146
+ ├── git/ # For version_controlled=True
147
+ │ └── <same structure>
148
+ └── raw/ # Shared directory for large files
149
+ ```
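Completion is recorded by a `SUCCESS.json` marker in each artifact's `.furu/` folder, so finished artifacts can be found by walking the tree above. `completed_artifacts` is a hypothetical helper that relies only on this documented layout. It is not the dashboard's scanner, and it ignores `state.json`.

```python
from pathlib import Path


def completed_artifacts(furu_path: Path) -> list[Path]:
    """List artifact dirs under data/ and git/ that carry a SUCCESS.json marker."""
    found: list[Path] = []
    for area in ("data", "git"):
        base = furu_path / area
        if not base.is_dir():
            continue
        for marker in base.rglob("SUCCESS.json"):
            # Only count markers at <hash>/.furu/SUCCESS.json, not user outputs.
            if marker.parent.name == ".furu":
                found.append(marker.parent.parent)
    return sorted(found)
```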
150
+
151
+ ## Features
152
+
153
+ ### FuruList: Managing Experiment Collections
154
+
155
+ `FuruList` provides a collection interface for organizing related experiments:
156
+
157
+ ```python
158
+ import furu
159
+
160
+ class MyExperiments(furu.FuruList[TrainModel]):
161
+ baseline = TrainModel(lr=1e-3, steps=1000)
162
+ fast_lr = TrainModel(lr=1e-2, steps=1000)
163
+ long_run = TrainModel(lr=1e-3, steps=10000)
164
+
165
+ # Can also use a dict for dynamic configs
166
+ configs = {
167
+ "tiny": TrainModel(lr=1e-3, steps=100),
168
+ "huge": TrainModel(lr=1e-4, steps=100000),
169
+ }
170
+
171
+ # Iterate over all experiments
172
+ for exp in MyExperiments:
173
+ exp.load_or_create()
174
+
175
+ # Access by name
176
+ exp = MyExperiments.by_name("baseline")
177
+
178
+ # Get all as list
179
+ all_exps = MyExperiments.all()
180
+
181
+ # Get (name, instance) pairs
182
+ for name, exp in MyExperiments.items():
183
+ print(f"{name}: {exp.exists()}")
184
+ ```
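Iterating over the class itself (`for exp in MyExperiments`) implies a metaclass that defines `__iter__`. `MiniList` below is a hypothetical stand-in that sketches how class attributes and dict entries could be collected in definition order. Furu's real `FuruList` may differ. For example, it may filter entries down to `Furu` instances, whereas this sketch keeps every public attribute.

```python
from collections.abc import Iterator
from typing import Any


class _ListMeta(type):
    """Metaclass so the class object itself supports iteration and lookup."""

    def _entries(cls) -> dict[str, Any]:
        entries: dict[str, Any] = {}
        # Class bodies preserve definition order, so iteration follows it too.
        for name, value in vars(cls).items():
            if name.startswith("_"):
                continue
            if isinstance(value, dict):
                # A dict attribute contributes its items under their own keys.
                entries.update(value)
            else:
                entries[name] = value
        return entries

    def __iter__(cls) -> Iterator[Any]:
        return iter(cls._entries().values())

    def items(cls) -> list[tuple[str, Any]]:
        return list(cls._entries().items())

    def all(cls) -> list[Any]:
        return list(cls._entries().values())

    def by_name(cls, name: str) -> Any:
        return cls._entries()[name]


class MiniList(metaclass=_ListMeta):
    """Hypothetical stand-in for furu.FuruList (no type filtering)."""
```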
185
+
186
+ ### Custom Validation
187
+
188
+ Override `_validate()` to add custom cache invalidation logic:
189
+
190
+ ```python
191
+ class ModelWithValidation(furu.Furu[Path]):
192
+ checkpoint_name: str = "model.pt"
193
+
194
+ def _validate(self) -> bool:
195
+ # Return False to force re-computation
196
+ ckpt = self.furu_dir / self.checkpoint_name
197
+ return ckpt.exists() and ckpt.stat().st_size > 0
198
+
199
+ def _create(self) -> Path:
200
+ ...
201
+
202
+ def _load(self) -> Path:
203
+ ...
204
+ ```
205
+
206
+ ### Checking State Without Loading
207
+
208
+ ```python
209
+ obj = TrainModel(lr=3e-4, steps=5000)
210
+
211
+ # Check if cached result exists (runs _validate())
212
+ if obj.exists():
213
+ print("Already computed!")
214
+
215
+ # Get metadata without triggering computation
216
+ metadata = obj.get_metadata()
217
+ print(f"Hash: {obj._furu_hash}")
218
+ print(f"Dir: {obj.furu_dir}")
219
+ ```
220
+
221
+ ### Serialization
222
+
223
+ Furu objects can be serialized to/from dictionaries:
224
+
225
+ ```python
226
+ obj = TrainModel(lr=3e-4, steps=5000)
227
+
228
+ # Serialize to dict (for storage, transmission)
229
+ data = obj.to_dict()
230
+
231
+ # Reconstruct from dict
232
+ obj2 = TrainModel.from_dict(data)
233
+
234
+ # Get Python code representation (useful for logging)
235
+ print(obj.to_python())
236
+ # Output: TrainModel(lr=0.0003, steps=5000)
237
+ ```
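A round trip like this can be sketched with a frozen dataclass that tags its payload with the class name. This stand-in is not `FuruSerializer`. The `"__class__"` key and the shape of the dict are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class TrainModel:
    lr: float = 1e-3
    steps: int = 1000

    def to_dict(self) -> dict:
        # Tag the payload with the class name so it can be rebuilt later.
        return {"__class__": type(self).__qualname__, **asdict(self)}

    @classmethod
    def from_dict(cls, data: dict) -> "TrainModel":
        if data.get("__class__") != cls.__qualname__:
            raise ValueError(f"expected {cls.__qualname__}, got {data.get('__class__')}")
        return cls(**{f.name: data[f.name] for f in fields(cls)})

    def to_python(self) -> str:
        # repr() of simple values yields valid Python literals.
        args = ", ".join(f"{f.name}={getattr(self, f.name)!r}" for f in fields(self))
        return f"{type(self).__qualname__}({args})"
```

Python's `repr(3e-4)` is `0.0003`, which is why the output line above shows `lr=0.0003`.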
238
+
239
+ ### Raw Directory
240
+
241
+ For large files that shouldn't live inside each config's hashed artifact directory, use the shared raw directory:
242
+
243
+ ```python
244
+ class LargeDataProcessor(furu.Furu[Path]):
245
+ def _create(self) -> Path:
246
+ # self.raw_dir is shared across all configs
247
+ # Create a subfolder for isolation if needed
248
+ my_raw = self.raw_dir / self._furu_hash
249
+ my_raw.mkdir(parents=True, exist_ok=True)
250
+
251
+ large_file = my_raw / "huge_dataset.bin"
252
+ # ... write large file ...
253
+ return large_file
254
+ ```
255
+
256
+ ### Version-Controlled Storage
257
+
258
+ For artifacts that should be stored separately (e.g., checked into git):
259
+
260
+ ```python
261
+ class VersionedConfig(furu.Furu[dict], version_controlled=True):
262
+ # Stored under $FURU_PATH/git/ instead of $FURU_PATH/data/
263
+ ...
264
+ ```
265
+
266
+ ## Logging
267
+
268
+ Furu installs stdlib `logging` handlers that capture logs to per-artifact files.
269
+
270
+ ```python
271
+ import logging
272
+ import furu
273
+
274
+ log = logging.getLogger(__name__)
275
+
276
+ class MyPipeline(furu.Furu[str]):
277
+ def _create(self) -> str:
278
+ log.info("Starting computation...") # Goes to furu.log
279
+ log.debug("Debug details...")
280
+ return "done"
281
+ ```
282
+
283
+ ### Console Output
284
+
285
+ By default, furu logs to console using Rich in a compact format:
286
+
287
+ ```
288
+ HHMMSS file.py:line message
289
+ ```
290
+
291
+ Furu emits status messages like:
292
+ ```
293
+ load_or_create TrainModel abc123def (missing->create)
294
+ load_or_create TrainModel abc123def (success->load)
295
+ ```
296
+
297
+ ### Explicit Setup
298
+
299
+ ```python
300
+ import furu
301
+
302
+ # Eagerly install logging handlers (optional, happens automatically)
303
+ furu.configure_logging()
304
+
305
+ # Get the furu logger
306
+ logger = furu.get_logger()
307
+ ```
308
+
309
+ ## Error Handling
310
+
311
+ ```python
312
+ from furu import FuruComputeError, FuruWaitTimeout, FuruLockNotAcquired
313
+
314
+ try:
315
+ result = obj.load_or_create()
316
+ except FuruComputeError as e:
317
+ print(f"Computation failed: {e}")
318
+ print(f"State file: {e.state_path}")
319
+ print(f"Original error: {e.original_error}")
320
+ except FuruWaitTimeout:
321
+ print("Timed out waiting for another process")
322
+ except FuruLockNotAcquired:
323
+ print("Could not acquire lock")
324
+ ```
325
+
326
+ ## Submitit Integration
327
+
328
+ Run computations on SLURM clusters via [submitit](https://github.com/facebookincubator/submitit):
329
+
330
+ ```python
331
+ import submitit
332
+ import furu
333
+
334
+ executor = submitit.AutoExecutor(folder="submitit_logs")
335
+ executor.update_parameters(
336
+ timeout_min=60,
337
+ slurm_partition="gpu",
338
+ gpus_per_node=1,
339
+ )
340
+
341
+ # Submit job and return immediately
342
+ job = my_furu_obj.load_or_create(executor=executor)
343
+
344
+ # Job ID is tracked in .furu/state.json
345
+ print(job.job_id)
346
+ ```
347
+
348
+ Furu handles preemption, requeuing, and state tracking automatically.
349
+
350
+ ## Dashboard
351
+
352
+ The web dashboard provides experiment browsing, filtering, and dependency visualization.
353
+
354
+ ### Running the Dashboard
355
+
356
+ ```bash
357
+ # Full dashboard with React frontend
358
+ furu-dashboard serve
359
+
360
+ # Or with options
361
+ furu-dashboard serve --host 0.0.0.0 --port 8000 --reload
362
+
363
+ # API server only (no frontend)
364
+ furu-dashboard api
365
+ ```
366
+
367
+ Or via Python:
368
+ ```bash
369
+ python -m furu.dashboard serve
370
+ ```
371
+
372
+ ### API Endpoints
373
+
374
+ | Endpoint | Description |
375
+ |----------|-------------|
376
+ | `GET /api/experiments` | List experiments with filtering/pagination |
377
+ | `GET /api/experiments/{namespace}/{hash}` | Get experiment details |
378
+ | `GET /api/experiments/{namespace}/{hash}/relationships` | Get dependencies |
379
+ | `GET /api/stats` | Aggregate statistics |
380
+ | `GET /api/dag` | Dependency graph for visualization |
381
+
382
+ ### Filtering
383
+
384
+ The `/api/experiments` endpoint supports:
385
+
386
+ - `result_status`: `absent`, `incomplete`, `success`, `failed`
387
+ - `attempt_status`: `queued`, `running`, `success`, `failed`, `cancelled`, `preempted`, `crashed`
388
+ - `namespace`: Filter by namespace prefix
389
+ - `backend`: `local`, `submitit`
390
+ - `hostname`, `user`: Filter by execution environment
391
+ - `started_after`, `started_before`: ISO datetime filters
392
+ - `config_filter`: Filter by config field (e.g., `lr=0.001`)
393
+
394
+ ## Configuration Reference
395
+
396
+ ### Environment Variables
397
+
398
+ | Variable | Default | Description |
399
+ |----------|---------|-------------|
400
+ | `FURU_PATH` | `./data-furu/` | Base storage directory |
401
+ | `FURU_LOG_LEVEL` | `INFO` | Console verbosity (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |
402
+ | `FURU_IGNORE_DIFF` | `false` | Skip embedding git diff in metadata |
403
+ | `FURU_POLL_INTERVAL_SECS` | `10` | Polling interval for queued/running jobs |
404
+ | `FURU_WAIT_LOG_EVERY_SECS` | `10` | Interval between "waiting" log messages |
405
+ | `FURU_STALE_AFTER_SECS` | `1800` | Consider running jobs stale after this duration |
406
+ | `FURU_LEASE_SECS` | `120` | Compute lock lease duration |
407
+ | `FURU_HEARTBEAT_SECS` | `FURU_LEASE_SECS / 3` (40 by default) | Heartbeat interval for running jobs |
408
+ | `FURU_PREEMPT_MAX` | `5` | Maximum submitit requeues on preemption |
409
+ | `FURU_CANCELLED_IS_PREEMPTED` | `false` | Treat SLURM CANCELLED as preempted |
410
+ | `FURU_RICH_UNCAUGHT_TRACEBACKS` | `true` | Use Rich for exception formatting |
411
+
412
+ Local `.env` files are loaded automatically via `python-dotenv`, which is a core dependency.
413
+
414
+ ### Programmatic Configuration
415
+
416
+ ```python
417
+ import furu
418
+ from pathlib import Path
419
+
420
+ # Set/get root directory
421
+ furu.set_furu_root(Path("/my/storage"))
422
+ root = furu.get_furu_root()
423
+
424
+ # Access config directly
425
+ furu.FURU_CONFIG.ignore_git_diff = True
426
+ furu.FURU_CONFIG.poll_interval = 5.0
427
+ ```
428
+
429
+ ### Class-Level Options
430
+
431
+ ```python
432
+ class MyPipeline(furu.Furu[Path], version_controlled=True):
433
+ _max_wait_time_sec = 3600.0 # Wait up to 1 hour (default: 600)
434
+ ...
435
+ ```
436
+
437
+ ## Metadata
438
+
439
+ Each artifact records:
440
+
441
+ | Category | Fields |
442
+ |----------|--------|
443
+ | **Config** | `furu_python_def`, `furu_obj`, `furu_hash`, `furu_path` |
444
+ | **Git** | `git_commit`, `git_branch`, `git_remote`, `git_patch`, `git_submodules` |
445
+ | **Environment** | `timestamp`, `command`, `python_version`, `executable`, `platform`, `hostname`, `user`, `pid` |
446
+
447
+ Access via:
448
+ ```python
449
+ metadata = obj.get_metadata()
450
+ print(metadata.git_commit)
451
+ print(metadata.hostname)
452
+ ```
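The **Environment** fields can be collected with the standard library alone. `environment_metadata` is a hypothetical sketch. The field names come from the table above, but furu's exact value formats (for example, the timestamp style) are assumptions, and the **Git** fields are omitted.

```python
import getpass
import os
import platform
import socket
import sys
from datetime import datetime, timezone


def environment_metadata() -> dict[str, object]:
    """Collect the 'Environment' fields from the table above using the stdlib."""
    try:
        user = getpass.getuser()
    except (KeyError, OSError):  # no login name available, e.g. in some containers
        user = None
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "command": " ".join(sys.argv),
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "hostname": socket.gethostname(),
        "user": user,
        "pid": os.getpid(),
    }
```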
453
+
454
+ ## Public API
455
+
456
+ ```python
457
+ from furu import (
458
+ # Core
459
+ Furu,
460
+ FuruList,
461
+ FURU_CONFIG,
462
+
463
+ # Configuration
464
+ get_furu_root,
465
+ set_furu_root,
466
+
467
+ # Errors
468
+ FuruError,
469
+ FuruComputeError,
470
+ FuruLockNotAcquired,
471
+ FuruWaitTimeout,
472
+ MISSING,
473
+
474
+ # Serialization
475
+ FuruSerializer,
476
+
477
+ # Storage
478
+ StateManager,
479
+ MetadataManager,
480
+
481
+ # Runtime
482
+ configure_logging,
483
+ get_logger,
484
+ load_env,
485
+
486
+ # Adapters
487
+ SubmititAdapter,
488
+
489
+ # Re-exports
490
+ chz,
491
+ submitit,
492
+
493
+ # Version
494
+ __version__,
495
+ )
496
+ ```
497
+
498
+ ## Non-goals / Caveats
499
+
500
+ - **Prototype status**: APIs and on-disk formats may change
501
+ - **Not a workflow scheduler** (for now): It's a lightweight caching layer for Python code
502
+ - **No distributed coordination**: Locking relies on lock files, which work on shared filesystems; there is no distributed lock service
@@ -0,0 +1,36 @@
1
+ furu/__init__.py,sha256=fhSViHOJ9W-64swuaBFdZOfq0ZMuSj6LSiX2ZfcjhD8,1736
2
+ furu/config.py,sha256=F_Bh9vs0Dq5-3fXMylEBbm7F9-Q2n9aLt1iTb-RAl-4,3538
3
+ furu/errors.py,sha256=d1Kp5O9cVoQwXmQeZC-35u7xldw_c3ryYXrbVfv-Lws,2001
4
+ furu/migrate.py,sha256=x_Uh7oXAv40L5ZAHJhdnw-o7ct56rWUSZLbHHfRObeY,1313
5
+ furu/migration.py,sha256=A91dng1XRn1N_xJrmBhh-OvU22GlseqOh6PmVhNZh3w,31307
6
+ furu/adapters/__init__.py,sha256=onLzEj9hccPK15g8a8va2T19nqQXoxb9rQlJIjKSKnE,69
7
+ furu/adapters/submitit.py,sha256=OuCP0pEkO1kI4WLcSUvMqXwVCCy-8uwUE7v1qvkLZnU,6214
8
+ furu/core/__init__.py,sha256=gzFMgaAYnffofQksR6E1NegiwBF99h0ysn_QeD5wIhw,82
9
+ furu/core/furu.py,sha256=MjwpJtS0T8aRtLsFiiVTB8oh5UtIQrF3ohzYbD9XFIc,39047
10
+ furu/core/list.py,sha256=hwwlvqaKB1grPBGKXc15scF1RCqDvWc0AoDbhKlN4W0,3625
11
+ furu/dashboard/__init__.py,sha256=zNVddterfpjQtcpihIl3TRJdgdjOHYR0uO0cOSaGABg,172
12
+ furu/dashboard/__main__.py,sha256=cNs65IMl4kwZFpxa9xLXmFSy4-M5D1X1ZBfTDxW11vo,144
13
+ furu/dashboard/main.py,sha256=8JYc79gbJ9MjvIRdGDuAcR2Mme9kyY4ryZb11ZZ4uVA,4069
14
+ furu/dashboard/scanner.py,sha256=qXCvkvFByBc09TUdth5Js67rS8zpRBlRkVQ9dJ7YbdE,34696
15
+ furu/dashboard/api/__init__.py,sha256=9-WyWOt-VQJJBIsdW29D-7JvR-BivJd9G_SRaRptCz0,80
16
+ furu/dashboard/api/models.py,sha256=SCu-kLJyW7dwSKswdgQNS3wQuj25ORs0pHkvX9xBbo4,4767
17
+ furu/dashboard/api/routes.py,sha256=iZez0khIUvbgfeSoy1BJvmoEEbgUrdSQA8SN8iAIkM8,4813
18
+ furu/dashboard/frontend/dist/favicon.svg,sha256=3TSLHNZITFe3JTPoYHZnDgiGsJxIzf39v97l2A1Hodo,369
19
+ furu/dashboard/frontend/dist/index.html,sha256=o3XhvegC9rBpUiWNfXdCHqf_tg2795nob1NI0nBpFS4,810
20
+ furu/dashboard/frontend/dist/assets/index-CbdDfSOZ.css,sha256=k3kxCuCqyxKgIv4M9itoAImMU8NMzkzAdTNQ4v_4fMU,34612
21
+ furu/dashboard/frontend/dist/assets/index-DDv_TYB_.js,sha256=FH0uqY7P7vm3rikvDaJ504FZh0Z97nCkVcIglK-ElAY,543928
22
+ furu/runtime/__init__.py,sha256=fQqE7wUuWunLD73Vm3lss7BFSij3UVxXOKQXBAOS8zw,504
23
+ furu/runtime/env.py,sha256=o1phhoTDhOnhALr3Ozf1ldrdvk2ClyEvBWbebHM6BXg,160
24
+ furu/runtime/logging.py,sha256=JkuTFtbv6dYk088P6_Bga46bnKSDt-ElAqmiY86hMys,9773
25
+ furu/runtime/tracebacks.py,sha256=PGCuOq8QkWSoun791gjUXM8frOP2wWV8IBlqaA4nuGE,1631
26
+ furu/serialization/__init__.py,sha256=L7oHuIbxdSh7GCY3thMQnDwlt_ERH-TMy0YKEAZLrPs,341
27
+ furu/serialization/migrations.py,sha256=HD5g8JCBdH3Y0rHJYc4Ug1IXBVcUDxLE7nfiXZnXcUE,7772
28
+ furu/serialization/serializer.py,sha256=THWqHzpSwXj3Nj3PZ3JhwlWJ8sgvVyGrwBEDB_EWuAE,8355
29
+ furu/storage/__init__.py,sha256=cLLL-GPpSu9C72Mdk5S6TGu3g-SnBfEuxzfpx5ZJPtw,616
30
+ furu/storage/metadata.py,sha256=u4F4V1dDZtsiniO5xDCy8YxJZxGnreriYnJ1fOvQ2Bg,9232
31
+ furu/storage/migration.py,sha256=Ars9aYwvhXpIBDf6L9ojGjp_l656-RfdtEAFKN0sZZY,2640
32
+ furu/storage/state.py,sha256=tbVX74P6nVHhL1EBztgKp9BCe0UHpW0nyGkSeJXPejs,37581
33
+ furu-0.0.1.dist-info/METADATA,sha256=mGC5hO68kGPxMUepH1Cnws-TDowOyCi1cgJ36pgTTOA,13294
34
+ furu-0.0.1.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
35
+ furu-0.0.1.dist-info/entry_points.txt,sha256=pIkNLYq-gaxYbh_lATWl31BHTrKBg1jN6jK1AgN6-QY,59
36
+ furu-0.0.1.dist-info/RECORD,,
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.28.0
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ furu-dashboard = furu.dashboard.main:cli