cached-duckdb 0.2.0__tar.gz

@@ -0,0 +1,52 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [0.2.0] - 2026-06-01
9
+
10
+ ### Changed
11
+ - Standardized packaging to match deployment rules
12
+ - Added `license-files`, `classifiers`, and `cibuildwheel` config to pyproject.toml
13
+ - Replaced legacy setup.py with Cython-gated build pattern
14
+ - Added version lookup via `importlib.metadata` in `__init__.py`
15
+ - Added GitHub Actions publish workflow (`.github/workflows/publish.yml`)
16
+ - Updated MANIFEST.in to include all documentation files
17
+ - Bumped `requires-python` to `>=3.10`
18
+
19
+ ### Added
20
+ - PersistenceCoordinator for external DB persistence
21
+ - `DuckDbCachePersistenceError` exception
22
+ - `cache_persistence.py` module
23
+
24
+ ## [0.1.0] - 2026-05-13
25
+
26
+ ### Added
27
+ - Initial release of cached_duckdb library
28
+ - Generic database/table API for in-memory DataFrame caching
29
+ - Two storage modes: single_db and per_table_db
30
+ - Atomic swap for safe concurrent writes
31
+ - TTL-based expiry with background cleanup thread
32
+ - Lazy stale flagging for active readers
33
+ - Scheduler-managed table support (bypass TTL)
34
+ - SQL query interface with WHERE clause filtering
35
+ - Cross-table JOIN support (single_db mode)
36
+ - Per-database configuration via JSON file
37
+ - Environment variable configuration
38
+ - Thread-safe operations with per-database/table locking
39
+ - Comprehensive error handling with custom exceptions
40
+ - Metadata queries (row count, columns, types, last updated)
41
+ - Manual cache invalidation (per table or entire database)
42
+ - Raw DuckDB connection access for advanced queries
43
+ - Graceful shutdown with connection cleanup
44
+
45
+ ### Features
46
+ - Zero disk usage - pure in-memory storage
47
+ - Columnar format - 20-30% less RAM than pandas
48
+ - Single-pass SQL queries - filter + aggregate in one operation
49
+ - Safe concurrent reads during writes
50
+ - Configurable TTL per table or database
51
+ - Priority-based configuration resolution
52
+ - Background cleanup thread with configurable interval
@@ -0,0 +1,179 @@
1
+ # Environment Variables
2
+
3
+ This document lists all environment variables supported by cached_duckdb.
4
+
5
+ All variables use the `CACHED_DUCKDB_` prefix by default.
6
+
7
+ ## Configuration Variables
8
+
9
+ ### CACHED_DUCKDB_DEFAULT_MODE
10
+ - **Type:** String
11
+ - **Default:** `single_db`
12
+ - **Options:** `single_db` | `per_table_db`
13
+ - **Description:** Default storage mode for all databases
14
+ - `single_db`: One connection per database (enables JOINs)
15
+ - `per_table_db`: One connection per (database, table) pair (parallel writes)
16
+
17
+ ### CACHED_DUCKDB_DEFAULT_TTL_MINUTES
18
+ - **Type:** Integer
19
+ - **Default:** `30`
20
+ - **Description:** Default time-to-live in minutes for cached data
21
+ - **Note:** Can be overridden per database/table in config file
22
+
23
+ ### CACHED_DUCKDB_CLEANUP_INTERVAL_MINUTES
24
+ - **Type:** Integer
25
+ - **Default:** `5`
26
+ - **Description:** How often the background cleanup thread runs (in minutes)
27
+
28
+ ### CACHED_DUCKDB_LOCK_TIMEOUT_SECONDS
29
+ - **Type:** Float
30
+ - **Default:** `30.0`
31
+ - **Description:** Timeout for acquiring write locks (in seconds)
32
+ - **Note:** Raises `DuckDbCacheLockError` if timeout exceeded
33
+
34
+ ### CACHED_DUCKDB_CONFIG_FILE_PATH
35
+ - **Type:** String (file path)
36
+ - **Default:** `None`
37
+ - **Description:** Path to connector_config.json for per-database settings
38
+ - **Example:** `/path/to/connector_config.json`
39
+
40
+ ### CACHED_DUCKDB_LOG_NAME
41
+ - **Type:** String
42
+ - **Default:** `cached_duckdb`
43
+ - **Description:** Logger name for this library
44
+ - **Note:** Use this to configure logging for cached_duckdb specifically
45
+
46
+ ## Persistence Variables (v0.2.0+)
47
+
48
+ ### CACHED_DUCKDB_PERSIST_BASE_PATH
49
+ - **Type:** String (directory path)
50
+ - **Default:** `None` (in-memory only, no persistence)
51
+ - **Description:** Base directory for file-based persistence. When set:
52
+ - **Scenario 1 (DB-level):** Databases with a `persist_path` in connector_config.json (or all databases if no per-DB override) are stored as `{path}/{db_name}.duckdb` files instead of in-memory.
53
+ - **Scenario 2 (Table-level):** Tables marked with `"persist": true` in connector_config.json are saved as `{path}/{db_name}/{table_name}.parquet` files after each `store()`.
54
+ - **Example:** `/data/cache` or `C:\data\cache`
55
+
56
+ ### CACHED_DUCKDB_SERVICE_NAME
57
+ - **Type:** String
58
+ - **Default:** `default`
59
+ - **Description:** Service identifier used to namespace rows in the external DB snapshot table (`cached_duckdb_snapshots`). Allows multiple services to share the same external snapshot table.
60
+ - **Example:** `order_service`, `analytics_pipeline`
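The variables documented above can be read with a small typed loader. The following is a minimal sketch, not the library's own `DuckDbCacheConfig.from_env()`: the `EnvSettings` dataclass, the `load_env_settings` function, and the mode validation are assumptions made for illustration. Only the variable names, types, and defaults come from this document.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")
PREFIX = "CACHED_DUCKDB_"
VALID_MODES = ("single_db", "per_table_db")


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container; field defaults mirror the documented defaults."""
    default_mode: str = "single_db"
    default_ttl_minutes: int = 30
    cleanup_interval_minutes: int = 5
    lock_timeout_seconds: float = 30.0
    config_file_path: Optional[str] = None
    log_name: str = "cached_duckdb"
    persist_base_path: Optional[str] = None
    service_name: str = "default"


def _read(env: Mapping[str, str], name: str, cast: Callable[[str], T], default: T) -> T:
    """Return the cast value of PREFIX+name, or the default when unset or blank."""
    raw = env.get(PREFIX + name)
    if raw is None or raw.strip() == "":
        return default
    return cast(raw)


def load_env_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Build settings from a mapping of environment variables."""
    mode = _read(env, "DEFAULT_MODE", str, "single_db")
    if mode not in VALID_MODES:  # assumption: unknown modes are rejected
        raise ValueError(f"unsupported storage mode: {mode!r}")
    return EnvSettings(
        default_mode=mode,
        default_ttl_minutes=_read(env, "DEFAULT_TTL_MINUTES", int, 30),
        cleanup_interval_minutes=_read(env, "CLEANUP_INTERVAL_MINUTES", int, 5),
        lock_timeout_seconds=_read(env, "LOCK_TIMEOUT_SECONDS", float, 30.0),
        config_file_path=_read(env, "CONFIG_FILE_PATH", str, None),
        log_name=_read(env, "LOG_NAME", str, "cached_duckdb"),
        persist_base_path=_read(env, "PERSIST_BASE_PATH", str, None),
        service_name=_read(env, "SERVICE_NAME", str, "default"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test. Note that `LOCK_TIMEOUT_SECONDS=30` parses to the float `30.0`, matching the documented type.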
61
+
62
+ ## Example Configuration
63
+
64
+ ### Linux/macOS (.env file)
65
+ ```bash
66
+ CACHED_DUCKDB_DEFAULT_MODE=single_db
67
+ CACHED_DUCKDB_DEFAULT_TTL_MINUTES=30
68
+ CACHED_DUCKDB_CLEANUP_INTERVAL_MINUTES=5
69
+ CACHED_DUCKDB_LOCK_TIMEOUT_SECONDS=30
70
+ CACHED_DUCKDB_CONFIG_FILE_PATH=/opt/config/connector_config.json
71
+ CACHED_DUCKDB_PERSIST_BASE_PATH=/data/cache
72
+ CACHED_DUCKDB_SERVICE_NAME=my_service
73
+ CACHED_DUCKDB_LOG_NAME=cached_duckdb
74
+ ```
75
+
76
+ ### Windows (PowerShell)
77
+ ```powershell
78
+ $env:CACHED_DUCKDB_DEFAULT_MODE="single_db"
79
+ $env:CACHED_DUCKDB_DEFAULT_TTL_MINUTES="30"
80
+ $env:CACHED_DUCKDB_CLEANUP_INTERVAL_MINUTES="5"
81
+ $env:CACHED_DUCKDB_LOCK_TIMEOUT_SECONDS="30"
82
+ $env:CACHED_DUCKDB_CONFIG_FILE_PATH="C:\config\connector_config.json"
83
+ $env:CACHED_DUCKDB_PERSIST_BASE_PATH="C:\data\cache"
84
+ $env:CACHED_DUCKDB_SERVICE_NAME="my_service"
85
+ $env:CACHED_DUCKDB_LOG_NAME="cached_duckdb"
86
+ ```
87
+
88
+ ### Python Code
89
+ ```python
90
+ import os
91
+
92
+ os.environ['CACHED_DUCKDB_DEFAULT_MODE'] = 'per_table_db'
93
+ os.environ['CACHED_DUCKDB_DEFAULT_TTL_MINUTES'] = '60'
94
+ os.environ['CACHED_DUCKDB_CLEANUP_INTERVAL_MINUTES'] = '10'
95
+ os.environ['CACHED_DUCKDB_PERSIST_BASE_PATH'] = '/data/cache'
96
+ os.environ['CACHED_DUCKDB_SERVICE_NAME'] = 'order_service'
97
+
98
+ from cached_duckdb import DuckDbCacheConfig, DuckDbCacheManager
99
+
100
+ config = DuckDbCacheConfig.from_env()
101
+ cache = DuckDbCacheManager(config)
102
+ ```
103
+
104
+ ## Configuration Priority
105
+
106
+ Configuration is resolved in this order (highest to lowest):
107
+
108
+ 1. **Per-table config** in connector_config.json (database → table → setting)
109
+ 2. **Per-database config** in connector_config.json (database → setting)
110
+ 3. **Environment variables** (CACHED_DUCKDB_*)
111
+ 4. **Hardcoded defaults** in DuckDbCacheConfig
112
+
113
+ ### Example Priority Resolution
114
+
115
+ To resolve the TTL for `database="client_abc"`, `table="sales_data"`, start from this config file:
116
+
117
+ ```json
118
+ // connector_config.json
119
+ {
120
+ "client_abc": {
121
+ "default_cache_ttl_minutes": 45, // Database-level
122
+ "sales_data": {
123
+ "cache_ttl_minutes": 60 // Table-level (highest priority)
124
+ }
125
+ }
126
+ }
127
+ ```
128
+
129
+ ### Persistence Configuration in connector_config.json (v0.2.0+)
130
+
131
+ ```json
132
+ {
133
+ "client_abc": {
134
+ "persist_path": "/data/cache", // Scenario 1: DB saved as /data/cache/client_abc.duckdb
135
+ "sales_data": {
136
+ "persist": true // Scenario 2: table saved as {persist_base_path}/client_abc/sales_data.parquet
137
+ }
138
+ }
139
+ }
140
+ ```
141
+
142
+ **Note:** If `persist_path` is set on the database AND `persist: true` is set on a table within that database, the file-based DB takes precedence — the table is already on disk inside the `.duckdb` file, so no separate Parquet file is created.
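The two persistence scenarios and the precedence rule in the note can be sketched as a path calculation. This is an illustrative sketch only: the function name `persistence_target` and its parameters are hypothetical, and it assumes DB-level persistence applies only when `persist_path` is set for the database. The "all databases if no per-DB override" fallback mentioned earlier is left out because its interaction with table-level persistence is not specified here.

```python
from pathlib import PurePosixPath
from typing import Optional


def persistence_target(
    base_path: Optional[str],
    database: str,
    table: str,
    db_persist_path: Optional[str] = None,
    table_persist: bool = False,
) -> Optional[PurePosixPath]:
    """Return the file a table would land in, or None for in-memory only."""
    # Scenario 1: a database-level persist_path wins, so the table lives
    # inside the .duckdb file and no separate Parquet file is produced.
    if db_persist_path is not None:
        return PurePosixPath(db_persist_path) / f"{database}.duckdb"
    # Scenario 2: table-level persistence under the base directory.
    if table_persist and base_path is not None:
        return PurePosixPath(base_path) / database / f"{table}.parquet"
    # Neither scenario applies: data stays in memory.
    return None
```

`PurePosixPath` is used so the sketch produces the same result on every platform; real code touching the filesystem would use `pathlib.Path`.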
143
+ Continuing the TTL priority example above, the environment supplies a lower-priority default:
144
+
145
+ ```bash
146
+ # .env
147
+ CACHED_DUCKDB_DEFAULT_TTL_MINUTES=30 # Environment (lower priority)
148
+ ```
149
+
150
+ **Result:** `cache_ttl_minutes = 60` (from table-level config)
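The four-level lookup can be expressed as a short fallback chain. The sketch below is an assumption about how such resolution might look, not the library's `CacheConfigResolver`; the function name `resolve_ttl_minutes` is hypothetical. The keys `cache_ttl_minutes` and `default_cache_ttl_minutes` and the default of 30 are taken from this document.

```python
from typing import Any, Mapping

HARDCODED_DEFAULT_TTL = 30  # documented default for DEFAULT_TTL_MINUTES


def resolve_ttl_minutes(
    connector_config: Mapping[str, Any],
    env: Mapping[str, str],
    database: str,
    table: str,
) -> int:
    """Resolve the TTL from the highest-priority source that defines it."""
    db_cfg = connector_config.get(database, {})
    table_cfg = db_cfg.get(table, {})
    # 1. Per-table setting (highest priority)
    if isinstance(table_cfg, dict) and "cache_ttl_minutes" in table_cfg:
        return int(table_cfg["cache_ttl_minutes"])
    # 2. Per-database setting
    if "default_cache_ttl_minutes" in db_cfg:
        return int(db_cfg["default_cache_ttl_minutes"])
    # 3. Environment variable
    raw = env.get("CACHED_DUCKDB_DEFAULT_TTL_MINUTES")
    if raw is not None:
        return int(raw)
    # 4. Hardcoded default (lowest priority)
    return HARDCODED_DEFAULT_TTL


config = {
    "client_abc": {
        "default_cache_ttl_minutes": 45,
        "sales_data": {"cache_ttl_minutes": 60},
    }
}
env = {"CACHED_DUCKDB_DEFAULT_TTL_MINUTES": "30"}

print(resolve_ttl_minutes(config, env, "client_abc", "sales_data"))  # 60
print(resolve_ttl_minutes(config, env, "client_abc", "orders"))      # 45
print(resolve_ttl_minutes(config, env, "client_xyz", "orders"))      # 30
```

A table without its own entry falls back to the database value, and a database that is absent from the file falls back to the environment.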
151
+
152
+ ## Logging Configuration
153
+
154
+ To configure logging for cached_duckdb:
155
+
156
+ ```python
157
+ import logging
158
+
159
+ # Set log level
160
+ logging.getLogger('cached_duckdb').setLevel(logging.DEBUG)
161
+
162
+ # Add handler
163
+ handler = logging.StreamHandler()
164
+ handler.setFormatter(logging.Formatter(
165
+ '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
166
+ ))
167
+ logging.getLogger('cached_duckdb').addHandler(handler)
168
+ ```
169
+
170
+ Or set a custom logger name through the environment and configure that logger instead:
171
+
172
+ ```bash
173
+ export CACHED_DUCKDB_LOG_NAME=my_cache_logger
174
+ ```
175
+
176
+ ```python
177
+ import logging
178
+ logging.getLogger('my_cache_logger').setLevel(logging.INFO)
179
+ ```
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 sreeyenan
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,10 @@
1
+ include README.md
2
+ include LICENSE
3
+ include pyproject.toml
4
+ include setup.py
5
+ include VERSION
6
+ include requirements.txt
7
+ include CHANGELOG.md
8
+ include ENVIRONMENT_VARIABLES.md
9
+ include cached_duckdb_USER_MANUAL.md
10
+ include *.py
@@ -0,0 +1,375 @@
1
+ Metadata-Version: 2.4
2
+ Name: cached-duckdb
3
+ Version: 0.2.0
4
+ Summary: Fast in-memory DataFrame cache using DuckDB with SQL query interface
5
+ Author-email: sreeyenan <sreeyenanek@gmail.com>
6
+ Keywords: duckdb,cache,dataframe,sql,in-memory
7
+ Classifier: Development Status :: 4 - Beta
8
+ Classifier: Intended Audience :: Developers
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: Programming Language :: Python :: 3.10
11
+ Classifier: Programming Language :: Python :: 3.11
12
+ Classifier: Programming Language :: Python :: 3.12
13
+ Requires-Python: >=3.10
14
+ Description-Content-Type: text/markdown
15
+ License-File: LICENSE
16
+ Requires-Dist: duckdb>=0.10.0
17
+ Requires-Dist: pandas>=1.5.0
18
+ Provides-Extra: dev
19
+ Requires-Dist: pytest>=7.0; extra == "dev"
20
+ Requires-Dist: pytest-cov; extra == "dev"
21
+ Provides-Extra: protected
22
+ Requires-Dist: Cython>=3.0; extra == "protected"
23
+ Provides-Extra: all
24
+ Requires-Dist: Cython>=3.0; extra == "all"
25
+ Dynamic: license-file
26
+
27
+ # cached_duckdb
28
+
29
+ Fast in-memory DataFrame cache using DuckDB with SQL query interface.
30
+
31
+ ## Overview
32
+
33
+ `cached_duckdb` replaces pandas dict-based caching with DuckDB in-memory connections for:
34
+ - **Columnar storage** - 20-30% less RAM than pandas
35
+ - **SQL queries** - Single-pass filter+aggregate operations
36
+ - **Concurrency** - Safe parallel reads during writes
37
+ - **Zero disk usage by default** - Pure in-memory like pandas unless file-based persistence is configured (v0.2.0+)
38
+
39
+ ## Key Features
40
+
41
+ - **Generic database/table API** - Works with any cache-based system
42
+ - **Two storage modes:**
43
+ - `single_db`: One connection per database (enables cross-table JOINs)
44
+ - `per_table_db`: One connection per (database, table) pair (fully parallel writes)
45
+ - **Atomic swap** - Readers see 100% old or 100% new data, never partial
46
+ - **TTL-based expiry** - Background cleanup with lazy stale flagging
47
+ - **Scheduler-managed tables** - Bypass TTL for scheduled updates
48
+ - **Thread-safe operations** - Per-database or per-table locking
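The atomic swap guarantee (readers see either the complete old data or the complete new data) can be illustrated without DuckDB. The sketch below is a generic in-process illustration of the idea, not the library's implementation; the `SnapshotTable` class and its methods are hypothetical.

```python
import threading
from typing import Iterable, Tuple

Row = Tuple[str, int]


class SnapshotTable:
    """Holds one immutable snapshot; writers replace it in a single step."""

    def __init__(self) -> None:
        self._rows: Tuple[Row, ...] = ()
        self._write_lock = threading.Lock()  # serializes writers only

    def replace(self, new_rows: Iterable[Row]) -> None:
        # Build the complete new version off to the side first...
        staged = tuple(new_rows)
        # ...then publish it with one reference assignment.
        with self._write_lock:
            self._rows = staged

    def read(self) -> Tuple[Row, ...]:
        # Readers grab the current reference once; the tuple they hold
        # never changes, even if a writer swaps in a new one afterwards.
        return self._rows
```

Because each snapshot is an immutable tuple, a reader that started before a write keeps working on the old version, and a reader that starts after it sees only the new one.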
49
+
50
+ ## Installation
51
+
52
+ ### From PyPI (Recommended)
53
+
54
+ ```bash
55
+ pip install cached-duckdb
56
+ ```
57
+
58
+ ### Install Specific Version
59
+
60
+ ```bash
61
+ pip install cached-duckdb==0.2.0
62
+ ```
63
+
64
+ ### With Optional Protected Build Extras
65
+
66
+ ```bash
67
+ pip install "cached-duckdb[all]"
68
+ ```
69
+
70
+ ### From Local Source
71
+
72
+ ```bash
73
+ pip install -r requirements.txt
74
+ ```
75
+
76
+ This installs only the dependencies. To install the package itself in development (editable) mode:
77
+
78
+ ```bash
79
+ pip install -e .
80
+ ```
81
+
82
+ Verify installed version:
83
+
84
+ ```python
85
+ import cached_duckdb
86
+ print(cached_duckdb.__version__)
87
+ ```
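The changelog mentions that `__version__` is looked up via `importlib.metadata`. A minimal sketch of that pattern follows; the helper name `installed_version` and the fallback string are assumptions, and the package's actual `__init__.py` may differ.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "cached-duckdb") -> str:
    """Return the installed distribution version, or a placeholder if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Assumed fallback for source checkouts that were never installed.
        return "0.0.0+unknown"


print(installed_version())
```

The lookup uses the distribution name (`cached-duckdb`), which differs from the import name (`cached_duckdb`).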
88
+
89
+ ## Quick Start
90
+
91
+ ### Basic Usage
92
+
93
+ ```python
94
+ from cached_duckdb import DuckDbCacheManager
95
+ import pandas as pd
96
+
97
+ # Initialize cache (singleton)
98
+ cache = DuckDbCacheManager()
99
+
100
+ # Store DataFrame
101
+ df = pd.DataFrame({
102
+ 'date': ['2026-01-01', '2026-01-02'],
103
+ 'amount': [1000, 2000],
104
+ 'country': ['USA', 'UK']
105
+ })
106
+ cache.store(database="client_abc", table="sales_data", df=df)
107
+
108
+ # Query with SQL filtering
109
+ result = cache.query(
110
+ database="client_abc",
111
+ table="sales_data",
112
+ sql_where="amount > 1000 AND country = 'USA'",
113
+ columns=["date", "amount"]
114
+ )
115
+ print(result)
116
+ ```
117
+
118
+ ### Advanced Queries
119
+
120
+ ```python
121
+ # Get all data
122
+ df = cache.query(database="app_x", table="dataset_1")
123
+
124
+ # Filter with WHERE
125
+ df = cache.query(
126
+ database="app_x",
127
+ table="dataset_1",
128
+ sql_where="age > 25 AND country = 'USA'"
129
+ )
130
+
131
+ # Select specific columns
132
+ df = cache.query(
133
+ database="app_x",
134
+ table="dataset_1",
135
+ columns=["name", "age", "salary"]
136
+ )
137
+
138
+ # Limit results
139
+ df = cache.query(
140
+ database="app_x",
141
+ table="dataset_1",
142
+ sql_where="date >= '2026-01-01'",
143
+ limit=100
144
+ )
145
+ ```
146
+
147
+ ### Cross-Table JOINs (single_db mode)
148
+
149
+ ```python
150
+ # Execute raw SQL for complex queries
151
+ sql = """
152
+ SELECT s.date, s.amount, o.customer_name
153
+ FROM sales_data s
154
+ JOIN orders o ON s.order_id = o.id
155
+ WHERE s.amount > 1000
156
+ """
157
+ result = cache.execute_sql(database="client_abc", sql=sql)
158
+ ```
159
+
160
+ ### Check if Data Exists
161
+
162
+ ```python
163
+ if cache.exists(database="client_abc", table="sales_data"):
164
+ # Data is ready and fresh
165
+ df = cache.query(database="client_abc", table="sales_data")
166
+ else:
167
+ # Data missing or stale - reload needed
168
+ df = load_from_source()
169
+ cache.store(database="client_abc", table="sales_data", df=df)
170
+ ```
171
+
172
+ ### TTL and Scheduler-Managed Tables
173
+
174
+ ```python
175
+ # Store with custom TTL
176
+ cache.store(
177
+ database="client_abc",
178
+ table="sales_data",
179
+ df=df,
180
+ ttl_minutes=60 # Expires after 60 minutes
181
+ )
182
+
183
+ # Scheduler-managed table (no auto-expiry on reads)
184
+ cache.store(
185
+ database="client_abc",
186
+ table="sales_data",
187
+ df=df,
188
+ scheduler_managed=True # Only scheduler updates it
189
+ )
190
+ ```
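The interplay of TTL expiry, lazy stale flagging, and scheduler-managed tables can be sketched with a small tracker. This is a hedged illustration of the described behavior, not the library's `TTLRegistry`; the `TtlTracker` class, its method names, and the injectable clock are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Entry:
    stored_at: float
    ttl_seconds: float
    scheduler_managed: bool = False
    stale: bool = False


class TtlTracker:
    """Tracks freshness per (database, table); the clock is injectable for tests."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Entry] = {}

    def record_store(self, database: str, table: str, ttl_minutes: float,
                     scheduler_managed: bool = False) -> None:
        self._entries[(database, table)] = Entry(
            self._clock(), ttl_minutes * 60.0, scheduler_managed)

    def is_fresh(self, database: str, table: str) -> bool:
        entry = self._entries.get((database, table))
        if entry is None:
            return False
        if entry.scheduler_managed:  # scheduler-managed tables bypass TTL
            return True
        age = self._clock() - entry.stored_at
        return not entry.stale and age < entry.ttl_seconds

    def flag_expired(self) -> int:
        """One cleanup pass: mark expired entries stale instead of deleting them."""
        now = self._clock()
        flagged = 0
        for entry in self._entries.values():
            expired = now - entry.stored_at >= entry.ttl_seconds
            if expired and not entry.scheduler_managed and not entry.stale:
                entry.stale = True
                flagged += 1
        return flagged
```

Flagging rather than deleting lets a reader that is mid-query finish against the existing data, while later freshness checks report the table as needing a reload.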
191
+
192
+ ### Invalidate Cache
193
+
194
+ ```python
195
+ # Invalidate one table
196
+ cache.invalidate(database="client_abc", table="sales_data")
197
+
198
+ # Invalidate all tables for a database
199
+ cache.invalidate(database="client_abc")
200
+ ```
201
+
202
+ ### Get Metadata
203
+
204
+ ```python
205
+ # Last updated timestamp
206
+ last_updated = cache.get_last_updated(database="client_abc", table="sales_data")
207
+ print(f"Last updated: {last_updated}")
208
+
209
+ # Table info
210
+ info = cache.get_table_info(database="client_abc", table="sales_data")
211
+ print(f"Rows: {info['row_count']}")
212
+ print(f"Columns: {info['columns']}")
213
+ print(f"Types: {info['column_types']}")
214
+ ```
215
+
216
+ ## Documentation
217
+
218
+ - [User Manual](cached_duckdb_USER_MANUAL.md)
219
+ - [Environment Variables](ENVIRONMENT_VARIABLES.md)
220
+ - [Changelog](CHANGELOG.md)
221
+
222
+ ## Author
223
+
224
+ **sreeyenan** (sreeyenanek@gmail.com)
225
+
226
+ ## Version
227
+
228
+ Current version: **0.2.0**
229
+
230
+ ## Configuration
231
+
232
+ ### Environment Variables
233
+
234
+ ```bash
235
+ # Storage mode: single_db or per_table_db
236
+ CACHED_DUCKDB_DEFAULT_MODE=single_db
237
+
238
+ # Default TTL in minutes
239
+ CACHED_DUCKDB_DEFAULT_TTL_MINUTES=30
240
+
241
+ # Cleanup thread interval
242
+ CACHED_DUCKDB_CLEANUP_INTERVAL_MINUTES=5
243
+
244
+ # Lock timeout in seconds
245
+ CACHED_DUCKDB_LOCK_TIMEOUT_SECONDS=30
246
+
247
+ # Path to connector config file (optional)
248
+ CACHED_DUCKDB_CONFIG_FILE_PATH=/path/to/connector_config.json
249
+
250
+ # Logger name
251
+ CACHED_DUCKDB_LOG_NAME=cached_duckdb
252
+ ```
253
+
254
+ ### Per-Database Configuration File
255
+
256
+ Create `connector_config.json` for per-database settings:
257
+
258
+ ```json
259
+ {
260
+ "client_abc": {
261
+ "duck_cache_mode": "per_table_db",
262
+ "default_cache_ttl_minutes": 45,
263
+ "sales_data": {
264
+ "cache_ttl_minutes": 60,
265
+ "scheduler_managed": false
266
+ },
267
+ "live_feed": {
268
+ "cache_ttl_minutes": 0,
269
+ "scheduler_managed": true
270
+ }
271
+ },
272
+ "client_xyz": {
273
+ "duck_cache_mode": "single_db"
274
+ }
275
+ }
276
+ ```
277
+
278
+ **Priority order:**
279
+ 1. Per-table config in JSON file (highest)
280
+ 2. Per-database config in JSON file
281
+ 3. Environment variables
282
+ 4. Hardcoded defaults (lowest)
283
+
284
+ ### Load Configuration
285
+
286
+ ```python
287
+ from cached_duckdb import DuckDbCacheConfig, DuckDbCacheManager
288
+
289
+ # From environment
290
+ config = DuckDbCacheConfig.from_env()
291
+ cache = DuckDbCacheManager(config)
292
+
293
+ # From dict
294
+ config = DuckDbCacheConfig.from_dict({
295
+ "default_mode": "single_db",
296
+ "default_cache_ttl_minutes": 30,
297
+ "config_file_path": "/path/to/connector_config.json"
298
+ })
299
+ cache = DuckDbCacheManager(config)
300
+ ```
301
+
302
+ ## Storage Modes
303
+
304
+ ### Mode A: single_db (Default)
305
+
306
+ - One DuckDB connection per `database`
307
+ - Multiple tables share the same connection
308
+ - Enables cross-table SQL JOINs
309
+ - Write contention: One lock per database
310
+
311
+ **Use when:** The database has few tables (< 20) or you need cross-table queries
312
+
313
+ ### Mode B: per_table_db
314
+
315
+ - One DuckDB connection per `(database, table)` pair
316
+ - Each table is fully isolated
317
+ - Zero write contention between tables
318
+ - Fully parallel writes
319
+
320
+ **Use when:** The database has many tables (20+) or write concurrency is high
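The difference in write contention between the two modes comes down to what a write lock is keyed on. The sketch below illustrates that idea under assumptions: the `lock_key` function and the `write_locks` registry are hypothetical and do not reflect the library's internals.

```python
import threading
from collections import defaultdict
from typing import DefaultDict, Optional, Tuple

LockKey = Tuple[str, Optional[str]]


def lock_key(mode: str, database: str, table: str) -> LockKey:
    """Return the key a write to (database, table) would lock on."""
    if mode == "single_db":
        return (database, None)    # all tables in a database share one lock
    if mode == "per_table_db":
        return (database, table)   # every table gets its own lock
    raise ValueError(f"unsupported storage mode: {mode!r}")


# Locks are created on first use. A real registry would guard this
# dictionary with its own lock; that is omitted to keep the sketch short.
write_locks: DefaultDict[LockKey, threading.Lock] = defaultdict(threading.Lock)

same = lock_key("single_db", "client_abc", "sales_data") == \
    lock_key("single_db", "client_abc", "orders")
distinct = lock_key("per_table_db", "client_abc", "sales_data") != \
    lock_key("per_table_db", "client_abc", "orders")
print(same, distinct)  # True True
```

In `single_db` mode two writers to different tables of the same database wait on each other; in `per_table_db` mode they proceed in parallel.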
321
+
322
+ ## Architecture
323
+
324
+ ```
325
+ DuckDbCacheManager (singleton)
326
+ ├── CacheStore - Atomic writes, table management
327
+ ├── CacheQuery - SQL queries, metadata
328
+ ├── TTLRegistry - TTL tracking, cleanup thread
329
+ └── CacheConfigResolver - Per-database config resolution
330
+ ```
331
+
332
+ ## Thread Safety
333
+
334
+ - **store()**: Write-locked per database or per table
335
+ - **query()**: Lock-free parallel reads
336
+ - **invalidate()**: Write-locked, waits for active readers
337
+ - **Background cleanup**: Minimal locking, uses stale flags
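Write locking with a timeout, as configured by `CACHED_DUCKDB_LOCK_TIMEOUT_SECONDS`, can be sketched with the standard library. This is an illustration under assumptions: `LockTimeoutError` is a stand-in for the library's `DuckDbCacheLockError`, and the `write_lock` helper is hypothetical.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class LockTimeoutError(RuntimeError):
    """Stand-in for DuckDbCacheLockError in this sketch."""


@contextmanager
def write_lock(lock: threading.Lock, timeout_seconds: float = 30.0) -> Iterator[None]:
    """Hold the lock for the duration of the block, or raise on timeout."""
    # Lock.acquire returns False if the lock was not obtained in time.
    if not lock.acquire(timeout=timeout_seconds):
        raise LockTimeoutError(
            f"could not acquire write lock within {timeout_seconds} seconds")
    try:
        yield
    finally:
        lock.release()
```

A caller would wrap the write in `with write_lock(table_lock, timeout_seconds=30.0): ...` and treat the timeout error as a signal to retry or report contention.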
338
+
339
+ ## Use Cases
340
+
341
+ - **Multi-tenant web APIs** - Cache per tenant with `database=tenant_id`
342
+ - **Analytics dashboards** - Fast in-memory OLAP queries
343
+ - **ETL pipelines** - Store intermediate DataFrames
344
+ - **Session managers** - Replace pandas dict caching
345
+ - **Microservices** - Shared cache library across services
346
+
347
+ ## API Reference
348
+
349
+ ### DuckDbCacheManager
350
+
351
+ - `store(database, table, df, ttl_minutes=None, scheduler_managed=False)` - Store DataFrame
352
+ - `query(database, table, sql_where=None, columns=None, limit=None)` - Query with filtering
353
+ - `execute_sql(database, sql)` - Execute raw SQL
354
+ - `exists(database, table)` - Check whether the table exists and is fresh
355
+ - `invalidate(database, table=None)` - Remove from cache
356
+ - `get_last_updated(database, table)` - Get timestamp
357
+ - `get_table_info(database, table)` - Get metadata
358
+ - `get_raw_connection(database, table=None)` - Get DuckDB connection
359
+ - `shutdown()` - Close all connections
360
+
361
+ ### Exceptions
362
+
363
+ - `DuckDbCacheError` - Base exception
364
+ - `DuckDbCacheConfigError` - Configuration error
365
+ - `DuckDbCacheLockError` - Lock acquisition failed
366
+ - `DuckDbCacheNotFoundError` - Table not found
367
+ - `DuckDbCacheStaleError` - Data is stale
368
+
369
+ ## License
370
+
371
+ MIT License - see LICENSE file