langgraph-unity-catalog-checkpoint 0.0.1__tar.gz
- langgraph_unity_catalog_checkpoint-0.0.1/LICENSE +21 -0
- langgraph_unity_catalog_checkpoint-0.0.1/PKG-INFO +670 -0
- langgraph_unity_catalog_checkpoint-0.0.1/README.md +625 -0
- langgraph_unity_catalog_checkpoint-0.0.1/pyproject.toml +138 -0
- langgraph_unity_catalog_checkpoint-0.0.1/setup.cfg +4 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/__init__.py +39 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/checkpoint/__init__.py +30 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/checkpoint/aio.py +949 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/checkpoint/base.py +351 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/checkpoint/shallow.py +644 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/checkpoint/unity_catalog.py +273 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/logging_config.py +50 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/store/__init__.py +20 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/store/aio.py +505 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/store/base.py +280 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint/store/unity_catalog.py +552 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint.egg-info/PKG-INFO +670 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint.egg-info/SOURCES.txt +24 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint.egg-info/dependency_links.txt +1 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint.egg-info/requires.txt +18 -0
- langgraph_unity_catalog_checkpoint-0.0.1/src/langgraph_unity_catalog_checkpoint.egg-info/top_level.txt +1 -0
- langgraph_unity_catalog_checkpoint-0.0.1/tests/test_async_unity_catalog_checkpointer.py +434 -0
- langgraph_unity_catalog_checkpoint-0.0.1/tests/test_basestore_interface.py +193 -0
- langgraph_unity_catalog_checkpoint-0.0.1/tests/test_integration.py +317 -0
- langgraph_unity_catalog_checkpoint-0.0.1/tests/test_unity_catalog_checkpointer.py +557 -0
- langgraph_unity_catalog_checkpoint-0.0.1/tests/test_unity_catalog_store.py +110 -0
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025 Nate Fleming
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1,670 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: langgraph-unity-catalog-checkpoint
|
|
3
|
+
Version: 0.0.1
|
|
4
|
+
Summary: Unity Catalog-backed persistence for LangChain and LangGraph
|
|
5
|
+
Author-email: Nate Fleming <nate.fleming@databricks.com>
|
|
6
|
+
Maintainer-email: Nate Fleming <nate.fleming@databricks.com>
|
|
7
|
+
License-Expression: MIT
|
|
8
|
+
Project-URL: Homepage, https://github.com/natefleming/langgraph_unity_catalog_checkpoint
|
|
9
|
+
Project-URL: Repository, https://github.com/natefleming/langgraph_unity_catalog_checkpoint
|
|
10
|
+
Project-URL: Issues, https://github.com/natefleming/langgraph_unity_catalog_checkpoint/issues
|
|
11
|
+
Project-URL: Documentation, https://github.com/natefleming/langgraph_unity_catalog_checkpoint#readme
|
|
12
|
+
Keywords: langchain,langgraph,databricks,unity-catalog,checkpoint,persistence,store,memory,agents,llm,ai
|
|
13
|
+
Classifier: Development Status :: 3 - Alpha
|
|
14
|
+
Classifier: Intended Audience :: Developers
|
|
15
|
+
Classifier: Intended Audience :: Science/Research
|
|
16
|
+
Classifier: Operating System :: OS Independent
|
|
17
|
+
Classifier: Programming Language :: Python :: 3
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
20
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
21
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
22
|
+
Classifier: Topic :: Database
|
|
23
|
+
Classifier: Typing :: Typed
|
|
24
|
+
Requires-Python: >=3.12
|
|
25
|
+
Description-Content-Type: text/markdown
|
|
26
|
+
License-File: LICENSE
|
|
27
|
+
Requires-Dist: databricks-connect>=17.0.10
|
|
28
|
+
Requires-Dist: databricks-langchain>=0.9.0
|
|
29
|
+
Requires-Dist: databricks-sdk>=0.73.0
|
|
30
|
+
Requires-Dist: langchain>=1.0.3
|
|
31
|
+
Requires-Dist: langgraph>=1.0.2
|
|
32
|
+
Requires-Dist: langmem>=0.0.30
|
|
33
|
+
Requires-Dist: loguru>=0.7.3
|
|
34
|
+
Requires-Dist: mlflow==3.5.1
|
|
35
|
+
Requires-Dist: nest-asyncio>=1.6.0
|
|
36
|
+
Requires-Dist: python-dotenv>=1.2.1
|
|
37
|
+
Provides-Extra: dev
|
|
38
|
+
Requires-Dist: pytest>=8.4.2; extra == "dev"
|
|
39
|
+
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
|
|
40
|
+
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
|
|
41
|
+
Requires-Dist: ruff>=0.6.0; extra == "dev"
|
|
42
|
+
Requires-Dist: twine>=5.0.0; extra == "dev"
|
|
43
|
+
Requires-Dist: python-dotenv>=1.0.0; extra == "dev"
|
|
44
|
+
Dynamic: license-file
|
|
45
|
+
|
|
46
|
+
# LangGraph Unity Catalog Checkpoint
|
|
47
|
+
|
|
48
|
+
|
|
49
|
+
|
|
50
|
+
|
|
51
|
+
**Production-ready Unity Catalog persistence for LangChain and LangGraph applications using Databricks as the storage backend.**
|
|
52
|
+
|
|
53
|
+
Following the [LangGraph checkpoint-postgres pattern](https://github.com/langchain-ai/langgraph/tree/main/libs/checkpoint-postgres/langgraph) for consistency with the LangGraph ecosystem.
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## Overview
|
|
58
|
+
|
|
59
|
+
This package provides enterprise-grade implementations of LangGraph's persistence interfaces backed by Databricks Unity Catalog:
|
|
60
|
+
|
|
61
|
+
- **`UnityCatalogStore`** / **`AsyncUnityCatalogStore`**: Implements [`langgraph.store.base.BaseStore`](https://github.com/langchain-ai/langgraph/blob/main/libs/checkpoint/langgraph/store/base/__init__.py) for key-value storage
|
|
62
|
+
- **`UnityCatalogCheckpointSaver`** / **`AsyncUnityCatalogCheckpointSaver`**: Implements [`BaseCheckpointSaver`](https://github.com/langchain-ai/langgraph/blob/main/libs/checkpoint/langgraph/checkpoint/base/__init__.py) for graph state persistence
|
|
63
|
+
|
|
64
|
+
All implementations use Databricks Unity Catalog Delta tables via the WorkspaceClient SQL API, providing:
|
|
65
|
+
|
|
66
|
+
- ✅ **Enterprise-grade reliability** with ACID transactions
|
|
67
|
+
- ✅ **Scalability** with Delta Lake optimization
|
|
68
|
+
- ✅ **Governance** with built-in access control and audit trails
|
|
69
|
+
- ✅ **Time-travel** for debugging and recovery
|
|
70
|
+
- ✅ **Seamless Databricks integration** for production ML workflows
|
|
71
|
+
- ✅ **Performance optimized** with batch operations (2-10x faster)
|
|
72
|
+
|
|
73
|
+
---
|
|
74
|
+
|
|
75
|
+
## 📦 Installation
|
|
76
|
+
|
|
77
|
+
### Prerequisites
|
|
78
|
+
|
|
79
|
+
- Python 3.12+
|
|
80
|
+
- Databricks workspace with Unity Catalog enabled
|
|
81
|
+
- SQL warehouse with appropriate permissions
|
|
82
|
+
|
|
83
|
+
### Install Dependencies
|
|
84
|
+
|
|
85
|
+
```bash
|
|
86
|
+
pip install databricks-sdk langchain langgraph langmem databricks-langchain
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
### Install Package
|
|
90
|
+
|
|
91
|
+
```bash
|
|
92
|
+
# From source
|
|
93
|
+
git clone https://github.com/natefleming/langgraph_unity_catalog_checkpoint.git
|
|
94
|
+
cd langgraph_unity_catalog_checkpoint
|
|
95
|
+
pip install -e .
|
|
96
|
+
|
|
97
|
+
# Or with development dependencies
|
|
98
|
+
pip install -e ".[dev]"
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
---
|
|
102
|
+
|
|
103
|
+
## ⚡ Quick Start
|
|
104
|
+
|
|
105
|
+
### 1. Configure Databricks Authentication
|
|
106
|
+
|
|
107
|
+
Set up environment variables:
|
|
108
|
+
|
|
109
|
+
```bash
|
|
110
|
+
export DATABRICKS_HOST="https://your-workspace.databricks.com"
|
|
111
|
+
export DATABRICKS_TOKEN="your-access-token"
|
|
112
|
+
export DATABRICKS_WAREHOUSE_ID="your-warehouse-id"
|
|
113
|
+
export UC_CATALOG="your_catalog"
|
|
114
|
+
export UC_SCHEMA="your_schema"
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
Or use `~/.databrickscfg`:
|
|
118
|
+
|
|
119
|
+
```ini
|
|
120
|
+
[DEFAULT]
|
|
121
|
+
host = https://your-workspace.databricks.com
|
|
122
|
+
token = your-access-token
|
|
123
|
+
```
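
The environment variables above can be collected into a single settings object before a client is constructed. The following is a minimal sketch under stated assumptions: `UCSettings` and `load_settings` are hypothetical names that are not part of this package, and only the variable names are taken from the section above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class UCSettings:
    """Hypothetical container for the variables listed above."""

    host: str
    token: str
    catalog: Optional[str] = None
    schema: Optional[str] = None
    warehouse_id: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> UCSettings:
    """Read settings from an environment-like mapping, failing fast on gaps."""
    required = ["DATABRICKS_HOST", "DATABRICKS_TOKEN"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return UCSettings(
        host=env["DATABRICKS_HOST"].rstrip("/"),  # normalize a trailing slash
        token=env["DATABRICKS_TOKEN"],
        catalog=env.get("UC_CATALOG") or None,
        schema=env.get("UC_SCHEMA") or None,
        warehouse_id=env.get("DATABRICKS_WAREHOUSE_ID") or None,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.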
|
|
124
|
+
|
|
125
|
+
### 2. Using the Store for Key-Value Storage
|
|
126
|
+
|
|
127
|
+
```python
|
|
128
|
+
from databricks.sdk import WorkspaceClient
|
|
129
|
+
from langgraph_unity_catalog_checkpoint import UnityCatalogStore
|
|
130
|
+
|
|
131
|
+
# Initialize the store
|
|
132
|
+
workspace_client = WorkspaceClient()
|
|
133
|
+
store = UnityCatalogStore(
|
|
134
|
+
workspace_client=workspace_client,
|
|
135
|
+
catalog="main",
|
|
136
|
+
schema="langgraph",
|
|
137
|
+
table="my_store", # Default: "store"
|
|
138
|
+
warehouse_id="your-warehouse-id", # Optional
|
|
139
|
+
)
|
|
140
|
+
|
|
141
|
+
# Store values with namespaced keys
|
|
142
|
+
store.put(("users", "123"), "preferences", {"theme": "dark", "language": "en"})
|
|
143
|
+
|
|
144
|
+
# Retrieve values
|
|
145
|
+
prefs = store.get(("users", "123"), "preferences")
|
|
146
|
+
print(prefs.value)  # {'theme': 'dark', 'language': 'en'} (get() returns an Item, or None if missing)
|
|
147
|
+
|
|
148
|
+
# Search within a namespace
|
|
149
|
+
items = store.search(("users",), limit=10)
|
|
150
|
+
for item in items:
|
|
151
|
+
    print(f"Key: {item.key}, Namespace: {item.namespace}")
|
|
152
|
+
|
|
153
|
+
# Delete a key
|
|
154
|
+
store.delete(("users", "123"), "preferences")
|
|
155
|
+
```
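
To make the namespace semantics concrete, here is a small in-memory model of namespaced keys and prefix search. It is an illustration only, not how `UnityCatalogStore` is implemented; the class `ToyNamespacedStore` is hypothetical, and it returns plain values rather than LangGraph `Item` objects.

```python
from typing import Any, Dict, List, Optional, Tuple

Namespace = Tuple[str, ...]


class ToyNamespacedStore:
    """In-memory illustration of (namespace, key) addressing."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[Namespace, str], Dict[str, Any]] = {}

    def put(self, namespace: Namespace, key: str, value: Dict[str, Any]) -> None:
        self._items[(namespace, key)] = value

    def get(self, namespace: Namespace, key: str) -> Optional[Dict[str, Any]]:
        return self._items.get((namespace, key))

    def search(self, prefix: Namespace, limit: int = 10) -> List[Tuple[Namespace, str]]:
        # A namespace matches when its leading components equal the prefix.
        hits = [(ns, key) for (ns, key) in self._items if ns[: len(prefix)] == prefix]
        return hits[:limit]

    def delete(self, namespace: Namespace, key: str) -> None:
        self._items.pop((namespace, key), None)


toy = ToyNamespacedStore()
toy.put(("users", "123"), "preferences", {"theme": "dark"})
toy.put(("users", "456"), "preferences", {"theme": "light"})
toy.put(("teams", "a"), "settings", {"size": 3})
matches = toy.search(("users",))  # both user entries, not the team entry
```

Searching with the prefix `("users",)` matches every namespace that starts with `"users"`, which is why a one-element tuple is enough to list all users.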
|
|
156
|
+
|
|
157
|
+
### 3. Using the Checkpointer for Graph Persistence
|
|
158
|
+
|
|
159
|
+
```python
|
|
160
|
+
from databricks.sdk import WorkspaceClient
|
|
161
|
+
from databricks_langchain import ChatDatabricks
|
|
162
|
+
from langgraph.graph import StateGraph, START, END
|
|
163
|
+
from langgraph.graph.message import add_messages
|
|
164
|
+
from langchain_core.messages import HumanMessage, BaseMessage
|
|
165
|
+
from typing_extensions import TypedDict
|
|
166
|
+
from typing import Annotated
|
|
167
|
+
from langgraph_unity_catalog_checkpoint import UnityCatalogCheckpointSaver
|
|
168
|
+
|
|
169
|
+
# Define your graph state
|
|
170
|
+
class State(TypedDict):
|
|
171
|
+
    messages: Annotated[list[BaseMessage], add_messages]
|
|
172
|
+
|
|
173
|
+
# Create a simple chatbot node
|
|
174
|
+
llm = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct")
|
|
175
|
+
|
|
176
|
+
def chatbot(state: State) -> dict:
|
|
177
|
+
    response = llm.invoke(state["messages"])
|
|
178
|
+
    return {"messages": [response]}
|
|
179
|
+
|
|
180
|
+
# Create the checkpointer
|
|
181
|
+
workspace_client = WorkspaceClient()
|
|
182
|
+
checkpointer = UnityCatalogCheckpointSaver(
|
|
183
|
+
workspace_client=workspace_client,
|
|
184
|
+
catalog="main",
|
|
185
|
+
schema="langgraph",
|
|
186
|
+
# Default tables: "checkpoints", "checkpoint_blobs", "checkpoint_writes"
|
|
187
|
+
warehouse_id="your-warehouse-id", # Optional
|
|
188
|
+
)
|
|
189
|
+
|
|
190
|
+
# Build the graph
|
|
191
|
+
graph_builder = StateGraph(State)
|
|
192
|
+
graph_builder.add_node("chatbot", chatbot)
|
|
193
|
+
graph_builder.add_edge(START, "chatbot")
|
|
194
|
+
graph_builder.add_edge("chatbot", END)
|
|
195
|
+
|
|
196
|
+
# Compile with checkpointer for persistence
|
|
197
|
+
graph = graph_builder.compile(checkpointer=checkpointer)
|
|
198
|
+
|
|
199
|
+
# Run conversation with persistence
|
|
200
|
+
config = {"configurable": {"thread_id": "conversation_1"}}
|
|
201
|
+
|
|
202
|
+
# First interaction
|
|
203
|
+
result = graph.invoke(
|
|
204
|
+
{"messages": [HumanMessage(content="Hello! What's the weather like?")]},
|
|
205
|
+
config=config
|
|
206
|
+
)
|
|
207
|
+
|
|
208
|
+
# Second interaction - conversation history is maintained!
|
|
209
|
+
result = graph.invoke(
|
|
210
|
+
{"messages": [HumanMessage(content="What did I just ask you?")]},
|
|
211
|
+
config=config
|
|
212
|
+
)
|
|
213
|
+
# The bot remembers the previous question!
|
|
214
|
+
```
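
The role of `thread_id` can be sketched without a Databricks connection. The toy class below is an assumption-laden illustration, not the checkpointer's actual logic: `ToyThreadMemory` is a hypothetical name, and it keeps one message history per thread in a dictionary where the real checkpointer would read and write Delta tables.

```python
from collections import defaultdict
from typing import DefaultDict, List


class ToyThreadMemory:
    """Illustration only: keeps one message history per thread_id."""

    def __init__(self) -> None:
        self._history: DefaultDict[str, List[str]] = defaultdict(list)

    def invoke(self, message: str, config: dict) -> List[str]:
        thread_id = config["configurable"]["thread_id"]
        # Load the saved state for this thread, append to it, and keep it.
        self._history[thread_id].append(message)
        return list(self._history[thread_id])


memory = ToyThreadMemory()
first = {"configurable": {"thread_id": "conversation_1"}}
other = {"configurable": {"thread_id": "conversation_2"}}

memory.invoke("Hello! What's the weather like?", first)
history = memory.invoke("What did I just ask you?", first)
isolated = memory.invoke("A different conversation", other)
```

Reusing the same `thread_id` accumulates history, while a new `thread_id` starts from an empty state.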
|
|
215
|
+
|
|
216
|
+
### 4. Async Usage for High Performance
|
|
217
|
+
|
|
218
|
+
```python
|
|
219
|
+
from langgraph_unity_catalog_checkpoint import AsyncUnityCatalogCheckpointSaver
|
|
220
|
+
import asyncio
|
|
221
|
+
|
|
222
|
+
# Create async checkpointer
|
|
223
|
+
async_checkpointer = AsyncUnityCatalogCheckpointSaver(
|
|
224
|
+
workspace_client=workspace_client,
|
|
225
|
+
catalog="main",
|
|
226
|
+
schema="langgraph",
|
|
227
|
+
warehouse_id="your-warehouse-id",
|
|
228
|
+
)
|
|
229
|
+
|
|
230
|
+
# Async chatbot node
|
|
231
|
+
async def async_chatbot(state: State) -> dict:
|
|
232
|
+
    response = await llm.ainvoke(state["messages"])
|
|
233
|
+
    return {"messages": [response]}
|
|
234
|
+
|
|
235
|
+
# Build and compile with async checkpointer
|
|
236
|
+
graph_builder = StateGraph(State)
|
|
237
|
+
graph_builder.add_node("chatbot", async_chatbot)
|
|
238
|
+
graph_builder.add_edge(START, "chatbot")
|
|
239
|
+
graph_builder.add_edge("chatbot", END)
|
|
240
|
+
graph = graph_builder.compile(checkpointer=async_checkpointer)
|
|
241
|
+
|
|
242
|
+
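# Run asynchronously (top-level `await` works in notebooks; in a script, wrap this in an `async def` and call `asyncio.run`)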
|
|
243
|
+
config = {"configurable": {"thread_id": "async_conversation_1"}}
|
|
244
|
+
result = await graph.ainvoke(
|
|
245
|
+
{"messages": [HumanMessage(content="Hello async world!")]},
|
|
246
|
+
config=config
|
|
247
|
+
)
|
|
248
|
+
```
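
Outside a notebook, the `await` call has to run inside an event loop. The sketch below shows one way to do that and to run independent threads concurrently. `fake_ainvoke` is a hypothetical stand-in for `graph.ainvoke` so that the example runs without Databricks or an LLM endpoint.

```python
import asyncio
from typing import Dict, List


async def fake_ainvoke(message: str, config: Dict) -> str:
    """Stand-in for `graph.ainvoke`; a real graph would call the LLM here."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    thread_id = config["configurable"]["thread_id"]
    return f"[{thread_id}] echo: {message}"


async def main() -> List[str]:
    configs = [
        {"configurable": {"thread_id": f"async_conversation_{i}"}} for i in (1, 2)
    ]
    # Independent threads can be awaited concurrently; gather keeps input order.
    return await asyncio.gather(
        *(fake_ainvoke("Hello async world!", c) for c in configs)
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because each call uses its own `thread_id`, the conversations do not share state and are safe to run side by side.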
|
|
249
|
+
|
|
250
|
+
---
|
|
251
|
+
|
|
252
|
+
## 🎯 Use Cases
|
|
253
|
+
|
|
254
|
+
### 1. **Conversational AI with Memory**
|
|
255
|
+
|
|
256
|
+
Maintain conversation history across multiple interactions:
|
|
257
|
+
|
|
258
|
+
```python
|
|
259
|
+
# Each user gets their own conversation thread
|
|
260
|
+
config = {"configurable": {"thread_id": f"user_{user_id}"}}
|
|
261
|
+
graph.invoke({"messages": [HumanMessage(content=user_input)]}, config)
|
|
262
|
+
```
|
|
263
|
+
|
|
264
|
+
### 2. **Human-in-the-Loop Workflows**
|
|
265
|
+
|
|
266
|
+
Pause execution for human review and resume seamlessly:
|
|
267
|
+
|
|
268
|
+
```python
|
|
269
|
+
# Interrupt before critical nodes
|
|
270
|
+
graph = builder.compile(
|
|
271
|
+
checkpointer=checkpointer,
|
|
272
|
+
interrupt_before=["approval_node"]
|
|
273
|
+
)
|
|
274
|
+
|
|
275
|
+
# Execute and pause at approval
|
|
276
|
+
result = graph.invoke(input_data, config)
|
|
277
|
+
|
|
278
|
+
# Human reviews and approves...
|
|
279
|
+
|
|
280
|
+
# Resume from checkpoint
|
|
281
|
+
result = graph.invoke(None, config) # Continues from where it left off
|
|
282
|
+
```
|
|
283
|
+
|
|
284
|
+
### 3. **Long-Term Memory with LangMem**
|
|
285
|
+
|
|
286
|
+
Integrate with [LangMem](https://github.com/langchain-ai/langmem) for user preferences and memories:
|
|
287
|
+
|
|
288
|
+
```python
|
|
289
|
+
from langchain.agents import create_agent
|
|
290
|
+
from langmem.tools import get_langmem_tools
|
|
291
|
+
|
|
292
|
+
# Create store for LangMem
|
|
293
|
+
store = UnityCatalogStore(
|
|
294
|
+
workspace_client=workspace_client,
|
|
295
|
+
catalog="main",
|
|
296
|
+
schema="langgraph",
|
|
297
|
+
)
|
|
298
|
+
|
|
299
|
+
# Get LangMem tools
|
|
300
|
+
langmem_tools = get_langmem_tools(store=store)
|
|
301
|
+
|
|
302
|
+
# Create agent with memory
|
|
303
|
+
agent = create_agent(llm, tools + langmem_tools)
|
|
304
|
+
|
|
305
|
+
# Use with user context
|
|
306
|
+
config = {
|
|
307
|
+
"configurable": {
|
|
308
|
+
"langgraph_user_id": "user_123" # Isolates memories per user
|
|
309
|
+
}
|
|
310
|
+
}
|
|
311
|
+
agent.invoke({"messages": [HumanMessage(content="I prefer dark mode")]}, config)
|
|
312
|
+
```
|
|
313
|
+
|
|
314
|
+
### 4. **Production ML Pipelines**
|
|
315
|
+
|
|
316
|
+
Reliable state management for complex workflows:
|
|
317
|
+
|
|
318
|
+
```python
|
|
319
|
+
# Automatic recovery from failures
|
|
320
|
+
# Time-travel debugging with Delta Lake
|
|
321
|
+
# Full audit trail via Unity Catalog
|
|
322
|
+
# Multi-agent coordination with isolated states
|
|
323
|
+
```
|
|
324
|
+
|
|
325
|
+
---
|
|
326
|
+
|
|
327
|
+
## Performance Optimizations
|
|
328
|
+
|
|
329
|
+
### Batch Write Operations (2-10x Faster)
|
|
330
|
+
|
|
331
|
+
The implementation uses **batched SQL operations** to minimize round trips to Unity Catalog:
|
|
332
|
+
|
|
333
|
+
```python
|
|
334
|
+
# Instead of N+1 SQL statements:
|
|
335
|
+
# - 1 per blob
|
|
336
|
+
# - 1 per write
|
|
337
|
+
# - 1 checkpoint
|
|
338
|
+
|
|
339
|
+
# We use just 3 SQL statements:
|
|
340
|
+
# - 1 batch for all blobs
|
|
341
|
+
# - 1 batch for all writes
|
|
342
|
+
# - 1 for checkpoint
|
|
343
|
+
|
|
344
|
+
# For a checkpoint with 5 blobs and 3 writes:
|
|
345
|
+
# Before: 9 SQL statements
|
|
346
|
+
# After: 3 SQL statements
|
|
347
|
+
# Result: 3x fewer SQL round trips ⚡
|
|
348
|
+
```
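
The arithmetic above, and the idea of folding many rows into one statement, can be sketched as follows. This is not the package's SQL: `statement_counts` and `build_batch_insert` are hypothetical helpers, and the `:name` parameter-marker style is an assumption about how such statements might be parameterized.

```python
from typing import Dict, List, Sequence, Tuple


def statement_counts(n_blobs: int, n_writes: int) -> Tuple[int, int]:
    """Return (unbatched, batched) SQL statement counts for one checkpoint."""
    unbatched = n_blobs + n_writes + 1  # one per blob, one per write, one checkpoint
    batched = 3  # one blob batch, one write batch, one checkpoint
    return unbatched, batched


def build_batch_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence[object]]
) -> Tuple[str, Dict[str, object]]:
    """Build one multi-row INSERT with named parameter markers (:p0, :p1, ...)."""
    params: Dict[str, object] = {}
    value_groups: List[str] = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length must match column count")
        names = []
        for value in row:
            name = f"p{len(params)}"
            params[name] = value
            names.append(f":{name}")
        value_groups.append(f"({', '.join(names)})")
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {', '.join(value_groups)}"
    return sql, params


counts = statement_counts(5, 3)
sql, params = build_batch_insert("t", ["a", "b"], [(1, 2), (3, 4)])
```

Fewer statements means fewer round trips to the SQL warehouse; the actual wall-clock gain depends on warehouse latency and payload size.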
|
|
349
|
+
|
|
350
|
+
See [docs/CHECKPOINT_BATCH_WRITE_OPTIMIZATION.md](docs/CHECKPOINT_BATCH_WRITE_OPTIMIZATION.md) for details.
|
|
351
|
+
|
|
352
|
+
---
|
|
353
|
+
|
|
354
|
+
## 🏗️ Architecture
|
|
355
|
+
|
|
356
|
+
```
|
|
357
|
+
┌──────────────────────────────────────┐
│   LangChain/LangGraph Application    │
└──────────────────────────────────────┘
                   ↓
┌──────────────────────────────────────┐
│   BaseStore / BaseCheckpointSaver    │
│        (LangGraph Interfaces)        │
└──────────────────────────────────────┘
                   ↓
┌──────────────────────────────────────┐
│     Unity Catalog Implementation     │
│  - UnityCatalogStore                 │
│  - UnityCatalogCheckpointSaver       │
│  - Async variants                    │
└──────────────────────────────────────┘
                   ↓
┌──────────────────────────────────────┐
│      Databricks WorkspaceClient      │
│    (SQL Statement Execution API)     │
└──────────────────────────────────────┘
                   ↓
┌──────────────────────────────────────┐
│      Unity Catalog Delta Tables      │
│  - ACID transactions                 │
│  - Time-travel                       │
│  - Change Data Feed                  │
└──────────────────────────────────────┘
|
|
384
|
+
```
|
|
385
|
+
|
|
386
|
+
### Data Storage
|
|
387
|
+
|
|
388
|
+
- **Serialization**: Checkpoints and values are serialized using LangGraph's `JsonPlusSerializer`
|
|
389
|
+
- **Binary Storage**: BINARY columns for efficient blob storage (base64 encoded)
|
|
390
|
+
- **JSON Metadata**: Structured metadata for filtering and querying
|
|
391
|
+
- **Delta Lake**: ACID transactions, time-travel, and optimization
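
The serialize-then-base64 step can be sketched with the standard library. This uses `json` as a stand-in for LangGraph's `JsonPlusSerializer`, so it illustrates the encoding round trip only; `encode_blob` and `decode_blob` are hypothetical names, not functions from this package.

```python
import base64
import json
from typing import Any, Dict


def encode_blob(payload: Dict[str, Any]) -> str:
    """Serialize to JSON bytes, then to base64 text that is safe to pass as a SQL parameter."""
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_blob(encoded: str) -> Dict[str, Any]:
    """Reverse of encode_blob: base64 text back to the original dictionary."""
    raw = base64.b64decode(encoded)
    return json.loads(raw.decode("utf-8"))


original = {"messages": ["hello"], "step": 1}
encoded = encode_blob(original)
restored = decode_blob(encoded)
```

Base64 keeps arbitrary bytes within an ASCII-safe alphabet, at the cost of roughly a one-third size increase in transit.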
|
|
392
|
+
|
|
393
|
+
### Default Table Names
|
|
394
|
+
|
|
395
|
+
| Component | Default Tables |
|
|
396
|
+
|-----------|---------------|
|
|
397
|
+
| **Store** | `store` |
|
|
398
|
+
| **Checkpointer** | `checkpoints`, `checkpoint_blobs`, `checkpoint_writes` |
|
|
399
|
+
|
|
400
|
+
Tables are automatically created on first use with optimized schemas.
|
|
401
|
+
|
|
402
|
+
---
|
|
403
|
+
|
|
404
|
+
## Examples
|
|
405
|
+
|
|
406
|
+
### Complete Jupyter Notebooks
|
|
407
|
+
|
|
408
|
+
Explore the [`notebooks/`](notebooks/) directory for interactive examples:
|
|
409
|
+
|
|
410
|
+
- **[`store_example.ipynb`](notebooks/store_example.ipynb)** - Store operations and LangMem integration
|
|
411
|
+
- **[`checkpointer_example.ipynb`](notebooks/checkpointer_example.ipynb)** - Synchronous graph checkpointing
|
|
412
|
+
- **[`async_checkpointer_example.ipynb`](notebooks/async_checkpointer_example.ipynb)** - Async graph execution
|
|
413
|
+
|
|
414
|
+
### Run in Databricks
|
|
415
|
+
|
|
416
|
+
1. Upload a notebook to your Databricks workspace
|
|
417
|
+
2. Attach to a cluster with Unity Catalog access
|
|
418
|
+
3. Set the required configuration (catalog, schema, warehouse_id)
|
|
419
|
+
4. Run all cells
|
|
420
|
+
|
|
421
|
+
---
|
|
422
|
+
|
|
423
|
+
## 🔧 Configuration
|
|
424
|
+
|
|
425
|
+
### Environment Variables
|
|
426
|
+
|
|
427
|
+
| Variable | Description | Required |
|
|
428
|
+
|----------|-------------|----------|
|
|
429
|
+
| `DATABRICKS_HOST` | Workspace URL | Yes |
|
|
430
|
+
| `DATABRICKS_TOKEN` | Access token | Yes |
|
|
431
|
+
| `DATABRICKS_WAREHOUSE_ID` | SQL warehouse ID | No |
|
|
432
|
+
| `UC_CATALOG` | Default catalog name | Recommended |
|
|
433
|
+
| `UC_SCHEMA` | Default schema name | Recommended |
|
|
434
|
+
|
|
435
|
+
### Configuration Precedence
|
|
436
|
+
|
|
437
|
+
Configuration values are resolved in this order:
|
|
438
|
+
|
|
439
|
+
1. **Environment variables** (highest priority)
|
|
440
|
+
2. **Databricks widgets** (for notebooks)
|
|
441
|
+
3. **Constructor parameters** (explicit values)
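
The resolution order listed above can be sketched as a small lookup function. `resolve_setting` is a hypothetical helper, and the widget values are modeled as a plain mapping rather than read through any Databricks notebook API.

```python
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    env: Mapping[str, str],
    widgets: Mapping[str, str],
    explicit: Optional[str] = None,
) -> Optional[str]:
    """Return the first non-empty value in the documented order."""
    for candidate in (env.get(name), widgets.get(name), explicit):
        if candidate:
            return candidate
    return None


from_env = resolve_setting("UC_CATALOG", {"UC_CATALOG": "prod"}, {"UC_CATALOG": "dev"}, "main")
from_widget = resolve_setting("UC_CATALOG", {}, {"UC_CATALOG": "dev"}, "main")
from_param = resolve_setting("UC_CATALOG", {}, {}, "main")
```

Under this ordering, a value passed to the constructor is used only when neither an environment variable nor a widget supplies one, which is worth keeping in mind when a constructor argument appears to be ignored.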
|
|
442
|
+
|
|
443
|
+
See [docs/CONFIGURATION_PRECEDENCE.md](docs/CONFIGURATION_PRECEDENCE.md) for details.
|
|
444
|
+
|
|
445
|
+
### Warehouse ID
|
|
446
|
+
|
|
447
|
+
The `warehouse_id` parameter is optional and defaults to `None`. If not provided:
|
|
448
|
+
- Uses the default warehouse for the workspace
|
|
449
|
+
- Can be overridden per-operation if needed
|
|
450
|
+
|
|
451
|
+
---
|
|
452
|
+
|
|
453
|
+
## Permissions Required
|
|
454
|
+
|
|
455
|
+
Ensure your Databricks principal has:
|
|
456
|
+
|
|
457
|
+
- `USE CATALOG` on the target catalog
|
|
458
|
+
- `USE SCHEMA` on the target schema
|
|
459
|
+
- `CREATE TABLE` on the target schema (for initialization)
|
|
460
|
+
- `SELECT` and `MODIFY` on the tables (in Unity Catalog, `MODIFY` covers `INSERT`, `UPDATE`, and `DELETE`)
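
As a rough sketch, the privileges above could be turned into `GRANT` statements for an administrator to review. The helper `grant_statements` is hypothetical, and the statement text assumes Databricks SQL `GRANT ... ON ... TO` syntax, using `SELECT` and `MODIFY` for table access (Unity Catalog's `MODIFY` privilege covers row inserts, updates, and deletes). Table-level grants only apply once the tables exist.

```python
from typing import List


def grant_statements(
    principal: str, catalog: str, schema: str, tables: List[str]
) -> List[str]:
    """Build GRANT statements for the privileges listed above."""
    target = f"`{principal}`"  # backticks allow names such as email addresses
    statements = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {target}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {target}",
        f"GRANT CREATE TABLE ON SCHEMA {catalog}.{schema} TO {target}",
    ]
    for table in tables:
        statements.append(
            f"GRANT SELECT, MODIFY ON TABLE {catalog}.{schema}.{table} TO {target}"
        )
    return statements


grants = grant_statements("agent@example.com", "main", "langgraph", ["store", "checkpoints"])
```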
|
|
461
|
+
|
|
462
|
+
---
|
|
463
|
+
|
|
464
|
+
## 🧪 Testing
|
|
465
|
+
|
|
466
|
+
### Run Unit Tests
|
|
467
|
+
|
|
468
|
+
```bash
|
|
469
|
+
# Run all tests
|
|
470
|
+
make test
|
|
471
|
+
|
|
472
|
+
# Run specific test file
|
|
473
|
+
uv run pytest tests/test_unity_catalog_store.py -v
|
|
474
|
+
|
|
475
|
+
# Run with coverage
|
|
476
|
+
uv run pytest --cov=src --cov-report=html
|
|
477
|
+
```
|
|
478
|
+
|
|
479
|
+
### Run Integration Tests
|
|
480
|
+
|
|
481
|
+
Integration tests require a live Databricks connection:
|
|
482
|
+
|
|
483
|
+
```bash
|
|
484
|
+
# Set required environment variables
|
|
485
|
+
export DATABRICKS_HOST="..."
|
|
486
|
+
export DATABRICKS_TOKEN="..."
|
|
487
|
+
export DATABRICKS_WAREHOUSE_ID="..."
|
|
488
|
+
|
|
489
|
+
# Run integration tests
|
|
490
|
+
uv run pytest tests/test_integration.py -v
|
|
491
|
+
```
|
|
492
|
+
|
|
493
|
+
### Linting and Formatting
|
|
494
|
+
|
|
495
|
+
```bash
|
|
496
|
+
# Format code
|
|
497
|
+
make format
|
|
498
|
+
|
|
499
|
+
# Run linting
|
|
500
|
+
make lint
|
|
501
|
+
|
|
502
|
+
# Type checking
|
|
503
|
+
make type-check
|
|
504
|
+
```
|
|
505
|
+
|
|
506
|
+
---
|
|
507
|
+
|
|
508
|
+
## Documentation
|
|
509
|
+
|
|
510
|
+
### Core Documentation
|
|
511
|
+
|
|
512
|
+
- **[Usage Guide](docs/USAGE.md)** - Comprehensive usage examples
|
|
513
|
+
- **[Implementation Summary](docs/IMPLEMENTATION_SUMMARY.md)** - Technical architecture
|
|
514
|
+
- **[Environment Setup](docs/ENVIRONMENT_SETUP.md)** - Development environment
|
|
515
|
+
- **[Quick Start](QUICKSTART.md)** - Getting started guide
|
|
516
|
+
- **[Install Guide](INSTALL.md)** - Installation instructions
|
|
517
|
+
|
|
518
|
+
### Technical Details
|
|
519
|
+
|
|
520
|
+
- **[Checkpoint Batch Write Optimization](docs/CHECKPOINT_BATCH_WRITE_OPTIMIZATION.md)** - Performance optimization details
|
|
521
|
+
- **[Configuration Precedence](docs/CONFIGURATION_PRECEDENCE.md)** - Configuration resolution
|
|
522
|
+
- **[Default Table Names](docs/DEFAULT_TABLE_NAMES.md)** - Table naming conventions
|
|
523
|
+
- **[MLflow Autolog Setup](docs/MLFLOW_AUTOLOG_SETUP.md)** - Observability with MLflow
|
|
524
|
+
- **[Logging](docs/LOGGING.md)** - Logging configuration
|
|
525
|
+
|
|
526
|
+
### Session Summaries
|
|
527
|
+
|
|
528
|
+
- **[Batch Optimization (2025-11-07)](docs/SESSION_SUMMARY_2025-11-07_BATCH_OPTIMIZATION.md)**
|
|
529
|
+
- **[MLflow Tracing Removal (2025-11-07)](docs/SESSION_SUMMARY_2025-11-07_MLFLOW_TRACING_REMOVAL.md)**
|
|
530
|
+
|
|
531
|
+
---
|
|
532
|
+
|
|
533
|
+
## Features
|
|
534
|
+
|
|
535
|
+
### UnityCatalogStore
|
|
536
|
+
|
|
537
|
+
- ✅ Implements `langgraph.store.base.BaseStore` interface
|
|
538
|
+
- ✅ Batch operations (`batch`, `abatch`) for performance
|
|
539
|
+
- ✅ Namespaced key-value storage
|
|
540
|
+
- ✅ Search with filtering and pagination
|
|
541
|
+
- ✅ Automatic table initialization
|
|
542
|
+
- ✅ Sync and async implementations
|
|
543
|
+
- ✅ Compatible with LangMem for long-term memory
|
|
544
|
+
|
|
545
|
+
### UnityCatalogCheckpointSaver
|
|
546
|
+
|
|
547
|
+
- ✅ Implements `BaseCheckpointSaver` interface
|
|
548
|
+
- ✅ Full LangGraph checkpoint persistence
|
|
549
|
+
- ✅ Support for human-in-the-loop workflows
|
|
550
|
+
- ✅ Multi-turn conversation memory
|
|
551
|
+
- ✅ State recovery and time-travel
|
|
552
|
+
- ✅ Pending writes management
|
|
553
|
+
- ✅ Checkpoint listing and filtering
|
|
554
|
+
- ✅ Sync and async implementations
|
|
555
|
+
- ✅ Optimized batch writes (2-10x faster)
|
|
556
|
+
- ✅ Automatic table creation and schema management
|
|
557
|
+
|
|
558
|
+
---
|
|
559
|
+
|
|
560
|
+
## 🛠️ Development
|
|
561
|
+
|
|
562
|
+
### Setup Development Environment
|
|
563
|
+
|
|
564
|
+
```bash
|
|
565
|
+
# Clone the repository
|
|
566
|
+
git clone https://github.com/natefleming/langgraph_unity_catalog_checkpoint.git
|
|
567
|
+
cd langgraph_unity_catalog_checkpoint
|
|
568
|
+
|
|
569
|
+
# Create virtual environment
|
|
570
|
+
python -m venv .venv
|
|
571
|
+
source .venv/bin/activate # On Windows: .venv\Scripts\activate
|
|
572
|
+
|
|
573
|
+
# Install with development dependencies
|
|
574
|
+
pip install -e ".[dev]"
|
|
575
|
+
|
|
576
|
+
# Install pre-commit hooks
|
|
577
|
+
pre-commit install
|
|
578
|
+
```
|
|
579
|
+
|
|
580
|
+
### Project Structure
|
|
581
|
+
|
|
582
|
+
```
|
|
583
|
+
langgraph_unity_catalog_checkpoint/
├── src/
│   └── langgraph_unity_catalog_checkpoint/
│       ├── store/                 # Store implementations
│       │   ├── unity_catalog.py   # Sync store
│       │   ├── aio.py             # Async store
│       │   └── base.py            # Base store class
│       ├── checkpoint/            # Checkpointer implementations
│       │   ├── unity_catalog.py   # Sync checkpointer
│       │   ├── aio.py             # Async checkpointer
│       │   └── base.py            # Base checkpointer class
│       └── __init__.py            # Public API exports
├── tests/                         # Test suite
├── notebooks/                     # Example notebooks
├── docs/                          # Documentation
├── pyproject.toml                 # Project configuration
└── README.md                      # This file
|
|
600
|
+
```
|
|
601
|
+
|
|
602
|
+
---
|
|
603
|
+
|
|
604
|
+
## 🤝 Contributing
|
|
605
|
+
|
|
606
|
+
Contributions are welcome! Please:
|
|
607
|
+
|
|
608
|
+
1. Fork the repository
|
|
609
|
+
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
|
|
610
|
+
3. Make your changes with tests
|
|
611
|
+
4. Run the test suite (`make test`)
|
|
612
|
+
5. Format and lint (`make format lint`)
|
|
613
|
+
6. Commit your changes (`git commit -m 'Add amazing feature'`)
|
|
614
|
+
7. Push to the branch (`git push origin feature/amazing-feature`)
|
|
615
|
+
8. Open a Pull Request
|
|
616
|
+
|
|
617
|
+
---
|
|
618
|
+
|
|
619
|
+
## License
|
|
620
|
+
|
|
621
|
+
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|
|
622
|
+
|
|
623
|
+
---
|
|
624
|
+
|
|
625
|
+
## Acknowledgments
|
|
626
|
+
|
|
627
|
+
Built with:
|
|
628
|
+
|
|
629
|
+
- [LangChain](https://github.com/langchain-ai/langchain) - Framework for LLM applications
|
|
630
|
+
- [LangGraph](https://github.com/langchain-ai/langgraph) - Graph-based agent framework
|
|
631
|
+
- [LangMem](https://github.com/langchain-ai/langmem) - Long-term memory for agents
|
|
632
|
+
- [Databricks SDK](https://github.com/databricks/databricks-sdk-py) - Databricks API client
|
|
633
|
+
- [Unity Catalog](https://www.databricks.com/product/unity-catalog) - Data governance platform
|
|
634
|
+
|
|
635
|
+
---
|
|
636
|
+
|
|
637
|
+
## Support
|
|
638
|
+
|
|
639
|
+
For issues and questions:
|
|
640
|
+
|
|
641
|
+
- **GitHub Issues**: [Open an issue](https://github.com/natefleming/langgraph_unity_catalog_checkpoint/issues)
|
|
642
|
+
- **Documentation**: Check the [`docs/`](docs/) directory
|
|
643
|
+
- **Examples**: Review the [`notebooks/`](notebooks/) directory
|
|
644
|
+
|
|
645
|
+
---
|
|
646
|
+
|
|
647
|
+
## 🗺️ Roadmap
|
|
648
|
+
|
|
649
|
+
Planned enhancements:
|
|
650
|
+
|
|
651
|
+
- [ ] Connection pooling for improved performance
|
|
652
|
+
- [ ] Configurable TTL for automatic checkpoint cleanup
|
|
653
|
+
- [ ] Metrics and monitoring integration
|
|
654
|
+
- [ ] Query optimization hints and caching
|
|
655
|
+
- [ ] Support for alternative serialization formats
|
|
656
|
+
- [ ] Bulk import/export utilities
|
|
657
|
+
- [ ] Multi-region replication support
|
|
658
|
+
|
|
659
|
+
---
|
|
660
|
+
|
|
661
|
+
## ⚡ Quick Links
|
|
662
|
+
|
|
663
|
+
- **[Quick Start Guide](QUICKSTART.md)** - Get started in 5 minutes
|
|
664
|
+
- **[Usage Examples](docs/USAGE.md)** - Detailed usage patterns
|
|
665
|
+
- **[Notebooks](notebooks/)** - Interactive examples
|
|
666
|
+
- **[API Reference](docs/IMPLEMENTATION_SUMMARY.md)** - Technical details
|
|
667
|
+
|
|
668
|
+
---
|
|
669
|
+
|
|
670
|
+
**Made with ❤️ for the LangChain community**
|