kailash 0.1.0__py3-none-any.whl → 0.1.2__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. kailash/__init__.py +1 -1
  2. kailash/nodes/__init__.py +2 -1
  3. kailash/nodes/ai/__init__.py +26 -0
  4. kailash/nodes/ai/ai_providers.py +1272 -0
  5. kailash/nodes/ai/embedding_generator.py +853 -0
  6. kailash/nodes/ai/llm_agent.py +1166 -0
  7. kailash/nodes/api/auth.py +3 -3
  8. kailash/nodes/api/graphql.py +2 -2
  9. kailash/nodes/api/http.py +391 -44
  10. kailash/nodes/api/rate_limiting.py +2 -2
  11. kailash/nodes/api/rest.py +464 -56
  12. kailash/nodes/base.py +71 -12
  13. kailash/nodes/code/python.py +2 -1
  14. kailash/nodes/data/__init__.py +7 -0
  15. kailash/nodes/data/readers.py +28 -26
  16. kailash/nodes/data/retrieval.py +178 -0
  17. kailash/nodes/data/sharepoint_graph.py +7 -7
  18. kailash/nodes/data/sources.py +65 -0
  19. kailash/nodes/data/sql.py +4 -2
  20. kailash/nodes/data/writers.py +6 -3
  21. kailash/nodes/logic/operations.py +2 -1
  22. kailash/nodes/mcp/__init__.py +11 -0
  23. kailash/nodes/mcp/client.py +558 -0
  24. kailash/nodes/mcp/resource.py +682 -0
  25. kailash/nodes/mcp/server.py +571 -0
  26. kailash/nodes/transform/__init__.py +16 -1
  27. kailash/nodes/transform/chunkers.py +78 -0
  28. kailash/nodes/transform/formatters.py +96 -0
  29. kailash/runtime/docker.py +6 -6
  30. kailash/sdk_exceptions.py +24 -10
  31. kailash/tracking/metrics_collector.py +2 -1
  32. kailash/utils/templates.py +6 -6
  33. {kailash-0.1.0.dist-info → kailash-0.1.2.dist-info}/METADATA +349 -49
  34. {kailash-0.1.0.dist-info → kailash-0.1.2.dist-info}/RECORD +38 -27
  35. {kailash-0.1.0.dist-info → kailash-0.1.2.dist-info}/WHEEL +0 -0
  36. {kailash-0.1.0.dist-info → kailash-0.1.2.dist-info}/entry_points.txt +0 -0
  37. {kailash-0.1.0.dist-info → kailash-0.1.2.dist-info}/licenses/LICENSE +0 -0
  38. {kailash-0.1.0.dist-info → kailash-0.1.2.dist-info}/top_level.txt +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: kailash
- Version: 0.1.0
+ Version: 0.1.2
  Summary: Python SDK for the Kailash container-node architecture
  Home-page: https://github.com/integrum/kailash-python-sdk
  Author: Integrum
@@ -10,9 +10,8 @@ Project-URL: Bug Tracker, https://github.com/integrum/kailash-python-sdk/issues
  Classifier: Development Status :: 3 - Alpha
  Classifier: Intended Audience :: Developers
  Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.8
- Classifier: Programming Language :: Python :: 3.9
- Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
  Requires-Python: >=3.11
  Description-Content-Type: text/markdown
  License-File: LICENSE
@@ -22,7 +21,7 @@ Requires-Dist: matplotlib>=3.5
  Requires-Dist: pyyaml>=6.0
  Requires-Dist: click>=8.0
  Requires-Dist: pytest>=8.3.5
- Requires-Dist: mcp[cli]>=1.9.0
+ Requires-Dist: mcp[cli]>=1.9.2
  Requires-Dist: pandas>=2.2.3
  Requires-Dist: numpy>=2.2.5
  Requires-Dist: scipy>=1.15.3
@@ -45,6 +44,8 @@ Requires-Dist: psutil>=7.0.0
  Requires-Dist: fastapi[all]>=0.115.12
  Requires-Dist: pytest-asyncio>=1.0.0
  Requires-Dist: pre-commit>=4.2.0
+ Requires-Dist: twine>=6.1.0
+ Requires-Dist: ollama>=0.5.1
  Provides-Extra: dev
  Requires-Dist: pytest>=7.0; extra == "dev"
  Requires-Dist: pytest-cov>=3.0; extra == "dev"
@@ -59,10 +60,12 @@ Dynamic: requires-python
  # Kailash Python SDK

  <p align="center">
- <img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.8+">
+ <a href="https://pypi.org/project/kailash/"><img src="https://img.shields.io/pypi/v/kailash.svg" alt="PyPI version"></a>
+ <a href="https://pypi.org/project/kailash/"><img src="https://img.shields.io/pypi/pyversions/kailash.svg" alt="Python versions"></a>
+ <a href="https://pepy.tech/project/kailash"><img src="https://static.pepy.tech/badge/kailash" alt="Downloads"></a>
  <img src="https://img.shields.io/badge/license-MIT-green.svg" alt="MIT License">
  <img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black">
- <img src="https://img.shields.io/badge/tests-544%20passing-brightgreen.svg" alt="Tests: 544 passing">
+ <img src="https://img.shields.io/badge/tests-746%20passing-brightgreen.svg" alt="Tests: 746 passing">
  <img src="https://img.shields.io/badge/coverage-100%25-brightgreen.svg" alt="Coverage: 100%">
  </p>

@@ -84,6 +87,8 @@ Dynamic: requires-python
  - 📊 **Real-time Monitoring**: Live dashboards with WebSocket streaming and performance metrics
  - 🧩 **Extensible**: Easy to create custom nodes for domain-specific operations
  - ⚡ **Fast Installation**: Uses `uv` for lightning-fast Python package management
+ - 🤖 **AI-Powered**: Complete LLM agents, embeddings, and hierarchical RAG architecture
+ - 🧠 **Retrieval-Augmented Generation**: Full RAG pipeline with intelligent document processing

  ## 🎯 Who Is This For?

@@ -98,12 +103,14 @@ The Kailash Python SDK is designed for:

  ### Installation

+ **Requirements:** Python 3.11 or higher
+
  ```bash
  # Install uv if you haven't already
  curl -LsSf https://astral.sh/uv/install.sh | sh

  # For users: Install from PyPI
- uv pip install kailash
+ pip install kailash

  # For developers: Clone and sync
  git clone https://github.com/integrum/kailash-python-sdk.git
@@ -134,9 +141,11 @@ def analyze_customers(data):
  # Convert total_spent to numeric
  df['total_spent'] = pd.to_numeric(df['total_spent'])
  return {
- "total_customers": len(df),
- "avg_spend": df["total_spent"].mean(),
- "top_customers": df.nlargest(10, "total_spent").to_dict("records")
+ "result": {
+ "total_customers": len(df),
+ "avg_spend": df["total_spent"].mean(),
+ "top_customers": df.nlargest(10, "total_spent").to_dict("records")
+ }
  }

  analyzer = PythonCodeNode.from_function(analyze_customers, name="analyzer")
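The hunk above nests the example function's outputs under a top-level `result` key. For reference, a minimal sketch of the complete function under that convention; the `kailash.nodes.code` import path is inferred from the file list at the top of this diff and is not verified against 0.1.2:

```python
import pandas as pd

from kailash.nodes.code import PythonCodeNode  # assumed import path

def analyze_customers(data):
    # Build a DataFrame from the raw records and normalize the spend column
    df = pd.DataFrame(data)
    df["total_spent"] = pd.to_numeric(df["total_spent"])
    # The 0.1.2 example wraps all outputs under a single "result" key
    return {
        "result": {
            "total_customers": len(df),
            "avg_spend": df["total_spent"].mean(),
            "top_customers": df.nlargest(10, "total_spent").to_dict("records"),
        }
    }

analyzer = PythonCodeNode.from_function(analyze_customers, name="analyzer")
```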
@@ -171,7 +180,7 @@ sharepoint = SharePointGraphReader()
  workflow.add_node("read_sharepoint", sharepoint)

  # Process downloaded files
- csv_writer = CSVWriter()
+ csv_writer = CSVWriter(file_path="sharepoint_output.csv")
  workflow.add_node("save_locally", csv_writer)

  # Connect nodes
@@ -195,13 +204,81 @@ runtime = LocalRuntime()
  results, run_id = runtime.execute(workflow, inputs=inputs)
  ```

+ ### Hierarchical RAG Example
+
+ ```python
+ from kailash.workflow import Workflow
+ from kailash.nodes.ai.embedding_generator import EmbeddingGenerator
+ from kailash.nodes.ai.llm_agent import LLMAgent
+ from kailash.nodes.data.sources import DocumentSourceNode, QuerySourceNode
+ from kailash.nodes.data.retrieval import RelevanceScorerNode
+ from kailash.nodes.transform.chunkers import HierarchicalChunkerNode
+ from kailash.nodes.transform.formatters import (
+ ChunkTextExtractorNode, QueryTextWrapperNode, ContextFormatterNode
+ )
+
+ # Create hierarchical RAG workflow
+ workflow = Workflow("hierarchical_rag", name="Hierarchical RAG Workflow")
+
+ # Data sources (autonomous - no external files needed)
+ doc_source = DocumentSourceNode()
+ query_source = QuerySourceNode()
+
+ # Document processing pipeline
+ chunker = HierarchicalChunkerNode()
+ chunk_text_extractor = ChunkTextExtractorNode()
+ query_text_wrapper = QueryTextWrapperNode()
+
+ # AI processing with Ollama
+ chunk_embedder = EmbeddingGenerator(
+ provider="ollama", model="nomic-embed-text", operation="embed_batch"
+ )
+ query_embedder = EmbeddingGenerator(
+ provider="ollama", model="nomic-embed-text", operation="embed_batch"
+ )
+
+ # Retrieval and response generation
+ relevance_scorer = RelevanceScorerNode()
+ context_formatter = ContextFormatterNode()
+ llm_agent = LLMAgent(provider="ollama", model="llama3.2", temperature=0.7)
+
+ # Add all nodes to workflow
+ for name, node in {
+ "doc_source": doc_source, "query_source": query_source,
+ "chunker": chunker, "chunk_text_extractor": chunk_text_extractor,
+ "query_text_wrapper": query_text_wrapper, "chunk_embedder": chunk_embedder,
+ "query_embedder": query_embedder, "relevance_scorer": relevance_scorer,
+ "context_formatter": context_formatter, "llm_agent": llm_agent
+ }.items():
+ workflow.add_node(name, node)
+
+ # Connect the RAG pipeline
+ workflow.connect("doc_source", "chunker", {"documents": "documents"})
+ workflow.connect("chunker", "chunk_text_extractor", {"chunks": "chunks"})
+ workflow.connect("chunk_text_extractor", "chunk_embedder", {"input_texts": "input_texts"})
+ workflow.connect("query_source", "query_text_wrapper", {"query": "query"})
+ workflow.connect("query_text_wrapper", "query_embedder", {"input_texts": "input_texts"})
+ workflow.connect("chunker", "relevance_scorer", {"chunks": "chunks"})
+ workflow.connect("query_embedder", "relevance_scorer", {"embeddings": "query_embedding"})
+ workflow.connect("chunk_embedder", "relevance_scorer", {"embeddings": "chunk_embeddings"})
+ workflow.connect("relevance_scorer", "context_formatter", {"relevant_chunks": "relevant_chunks"})
+ workflow.connect("query_source", "context_formatter", {"query": "query"})
+ workflow.connect("context_formatter", "llm_agent", {"messages": "messages"})
+
+ # Execute the RAG workflow
+ from kailash.runtime.local import LocalRuntime
+ runtime = LocalRuntime()
+ results, run_id = runtime.execute(workflow)
+
+ print("RAG Response:", results["llm_agent"]["response"])
+ ```
+
  ## 📚 Documentation

  | Resource | Description |
  |----------|-------------|
  | 📖 [User Guide](docs/user-guide.md) | Comprehensive guide for using the SDK |
- | 🏛️ [Architecture](docs/adr/) | Architecture Decision Records |
- | 📋 [API Reference](docs/api/) | Detailed API documentation |
+ | 📋 [API Reference](docs/) | Detailed API documentation |
  | 🌐 [API Integration Guide](examples/API_INTEGRATION_README.md) | Complete API integration documentation |
  | 🎓 [Examples](examples/) | Working examples and tutorials |
  | 🤝 [Contributing](CONTRIBUTING.md) | Contribution guidelines |
@@ -219,6 +296,9 @@ The SDK includes a rich set of pre-built nodes for common operations:
  **Data Operations**
  - `CSVReader` - Read CSV files
  - `JSONReader` - Read JSON files
+ - `DocumentSourceNode` - Sample document provider
+ - `QuerySourceNode` - Sample query provider
+ - `RelevanceScorerNode` - Multi-method similarity
  - `SQLDatabaseNode` - Query databases
  - `CSVWriter` - Write CSV files
  - `JSONWriter` - Write JSON files
@@ -226,12 +306,15 @@ The SDK includes a rich set of pre-built nodes for common operations:
  </td>
  <td width="50%">

- **Processing Nodes**
+ **Transform Nodes**
  - `PythonCodeNode` - Custom Python logic
  - `DataTransformer` - Transform data
+ - `HierarchicalChunkerNode` - Document chunking
+ - `ChunkTextExtractorNode` - Extract chunk text
+ - `QueryTextWrapperNode` - Wrap queries for processing
+ - `ContextFormatterNode` - Format LLM context
  - `Filter` - Filter records
  - `Aggregator` - Aggregate data
- - `TextProcessor` - Process text

  </td>
  </tr>
@@ -239,10 +322,12 @@ The SDK includes a rich set of pre-built nodes for common operations:
  <td width="50%">

  **AI/ML Nodes**
- - `EmbeddingNode` - Generate embeddings
- - `VectorDatabaseNode` - Vector search
- - `ModelPredictorNode` - ML predictions
- - `LLMNode` - LLM integration
+ - `LLMAgent` - Multi-provider LLM with memory & tools
+ - `EmbeddingGenerator` - Vector embeddings with caching
+ - `MCPClient/MCPServer` - Model Context Protocol
+ - `TextClassifier` - Text classification
+ - `SentimentAnalyzer` - Sentiment analysis
+ - `NamedEntityRecognizer` - NER extraction

  </td>
  <td width="50%">
@@ -278,25 +363,30 @@ The SDK includes a rich set of pre-built nodes for common operations:
  #### Workflow Management
  ```python
  from kailash.workflow import Workflow
+ from kailash.nodes.logic import Switch
+ from kailash.nodes.transform import DataTransformer

  # Create complex workflows with branching logic
  workflow = Workflow("data_pipeline", name="data_pipeline")

- # Add conditional branching
- validator = ValidationNode()
- workflow.add_node("validate", validator)
+ # Add conditional branching with Switch node
+ switch = Switch()
+ workflow.add_node("route", switch)

  # Different paths based on validation
+ processor_a = DataTransformer(transformations=["lambda x: x"])
+ error_handler = DataTransformer(transformations=["lambda x: {'error': str(x)}"])
  workflow.add_node("process_valid", processor_a)
  workflow.add_node("handle_errors", error_handler)

- # Connect with conditions
- workflow.connect("validate", "process_valid", condition="is_valid")
- workflow.connect("validate", "handle_errors", condition="has_errors")
+ # Connect with switch routing
+ workflow.connect("route", "process_valid")
+ workflow.connect("route", "handle_errors")
  ```

  #### Immutable State Management
  ```python
+ from kailash.workflow import Workflow
  from kailash.workflow.state import WorkflowStateWrapper
  from pydantic import BaseModel

@@ -306,6 +396,9 @@ class MyStateModel(BaseModel):
  status: str = "pending"
  nested: dict = {}

+ # Create workflow
+ workflow = Workflow("state_workflow", name="state_workflow")
+
  # Create and wrap state object
  state = MyStateModel()
  state_wrapper = workflow.create_state_wrapper(state)
@@ -322,8 +415,9 @@ updated_wrapper = state_wrapper.batch_update([
  (["status"], "processing")
  ])

- # Execute workflow with state management
- final_state, results = workflow.execute_with_state(state_model=state)
+ # Access the updated state
+ print(f"Updated counter: {updated_wrapper._state.counter}")
+ print(f"Updated status: {updated_wrapper._state.status}")
  ```

  #### Task Tracking
@@ -340,45 +434,75 @@ workflow = Workflow("sample_workflow", name="Sample Workflow")
  # Run workflow with tracking
  from kailash.runtime.local import LocalRuntime
  runtime = LocalRuntime()
- results, run_id = runtime.execute(workflow, task_manager=task_manager)
+ results, run_id = runtime.execute(workflow)

  # Query execution history
- runs = task_manager.list_runs(status="completed", limit=10)
- details = task_manager.get_run(run_id)
+ # Note: list_runs() may fail with timezone comparison errors in some cases
+ try:
+ # List all runs
+ all_runs = task_manager.list_runs()
+
+ # Filter by status
+ completed_runs = task_manager.list_runs(status="completed")
+ failed_runs = task_manager.list_runs(status="failed")
+
+ # Filter by workflow name
+ workflow_runs = task_manager.list_runs(workflow_name="sample_workflow")
+
+ # Process run information
+ for run in completed_runs[:5]: # First 5 runs
+ print(f"Run {run.run_id[:8]}: {run.workflow_name} - {run.status}")
+
+ except Exception as e:
+ print(f"Error listing runs: {e}")
+ # Fallback: Access run details directly if available
+ if hasattr(task_manager, 'storage'):
+ run = task_manager.get_run(run_id)
  ```

  #### Local Testing
  ```python
  from kailash.runtime.local import LocalRuntime
+ from kailash.workflow import Workflow
+
+ # Create a test workflow
+ workflow = Workflow("test_workflow", name="test_workflow")

  # Create test runtime with debugging enabled
  runtime = LocalRuntime(debug=True)

  # Execute with test data
- test_data = {"customers": [...]}
- results = runtime.execute(workflow, inputs=test_data)
+ results, run_id = runtime.execute(workflow)

  # Validate results
- assert results["node_id"]["output_key"] == expected_value
+ assert isinstance(results, dict)
  ```

  #### Performance Monitoring & Real-time Dashboards
  ```python
  from kailash.visualization.performance import PerformanceVisualizer
  from kailash.visualization.dashboard import RealTimeDashboard, DashboardConfig
- from kailash.visualization.reports import WorkflowPerformanceReporter
+ from kailash.visualization.reports import WorkflowPerformanceReporter, ReportFormat
  from kailash.tracking import TaskManager
  from kailash.runtime.local import LocalRuntime
+ from kailash.workflow import Workflow
+ from kailash.nodes.transform import DataTransformer
+
+ # Create a workflow to monitor
+ workflow = Workflow("monitored_workflow", name="monitored_workflow")
+ node = DataTransformer(transformations=["lambda x: x"])
+ workflow.add_node("transform", node)

  # Run workflow with task tracking
+ # Note: Pass task_manager to execute() to enable performance tracking
  task_manager = TaskManager()
  runtime = LocalRuntime()
  results, run_id = runtime.execute(workflow, task_manager=task_manager)

  # Static performance analysis
+ from pathlib import Path
  perf_viz = PerformanceVisualizer(task_manager)
- outputs = perf_viz.create_run_performance_summary(run_id, output_dir="performance_report")
- perf_viz.compare_runs([run_id_1, run_id_2], output_path="comparison.png")
+ outputs = perf_viz.create_run_performance_summary(run_id, output_dir=Path("performance_report"))

  # Real-time monitoring dashboard
  config = DashboardConfig(
@@ -406,8 +530,7 @@ reporter = WorkflowPerformanceReporter(task_manager)
  report_path = reporter.generate_report(
  run_id,
  output_path="workflow_report.html",
- format=ReportFormat.HTML,
- compare_runs=[run_id_1, run_id_2]
+ format=ReportFormat.HTML
  )
  ```

@@ -464,6 +587,13 @@ api_client = RESTAPINode(
  #### Export Formats
  ```python
  from kailash.utils.export import WorkflowExporter, ExportConfig
+ from kailash.workflow import Workflow
+ from kailash.nodes.transform import DataTransformer
+
+ # Create a workflow to export
+ workflow = Workflow("export_example", name="export_example")
+ node = DataTransformer(transformations=["lambda x: x"])
+ workflow.add_node("transform", node)

  exporter = WorkflowExporter()

@@ -476,22 +606,147 @@ config = ExportConfig(
  include_metadata=True,
  container_tag="latest"
  )
- workflow.save("deployment.yaml", format="yaml")
+ workflow.save("deployment.yaml")
  ```

  ### 🎨 Visualization

  ```python
+ from kailash.workflow import Workflow
  from kailash.workflow.visualization import WorkflowVisualizer
+ from kailash.nodes.transform import DataTransformer
+
+ # Create a workflow to visualize
+ workflow = Workflow("viz_example", name="viz_example")
+ node = DataTransformer(transformations=["lambda x: x"])
+ workflow.add_node("transform", node)

- # Visualize workflow structure
+ # Generate Mermaid diagram (recommended for documentation)
+ mermaid_code = workflow.to_mermaid()
+ print(mermaid_code)
+
+ # Save as Mermaid markdown file
+ with open("workflow.md", "w") as f:
+ f.write(workflow.to_mermaid_markdown(title="My Workflow"))
+
+ # Or use matplotlib visualization
  visualizer = WorkflowVisualizer(workflow)
- visualizer.visualize(output_path="workflow.png")
+ visualizer.visualize()
+ visualizer.save("workflow.png", dpi=300) # Save as PNG
+ ```
+
+ #### Hierarchical RAG (Retrieval-Augmented Generation)
+ ```python
+ from kailash.workflow import Workflow
+ from kailash.nodes.data.sources import DocumentSourceNode, QuerySourceNode
+ from kailash.nodes.data.retrieval import RelevanceScorerNode
+ from kailash.nodes.transform.chunkers import HierarchicalChunkerNode
+ from kailash.nodes.transform.formatters import (
+ ChunkTextExtractorNode,
+ QueryTextWrapperNode,
+ ContextFormatterNode,
+ )
+ from kailash.nodes.ai.llm_agent import LLMAgent
+ from kailash.nodes.ai.embedding_generator import EmbeddingGenerator
+
+ # Create hierarchical RAG workflow
+ workflow = Workflow(
+ workflow_id="hierarchical_rag_example",
+ name="Hierarchical RAG Workflow",
+ description="Complete RAG pipeline with embedding-based retrieval",
+ version="1.0.0"
+ )

- # Show in Jupyter notebook
- visualizer.show()
+ # Create data source nodes
+ doc_source = DocumentSourceNode()
+ query_source = QuerySourceNode()
+
+ # Create document processing pipeline
+ chunker = HierarchicalChunkerNode()
+ chunk_text_extractor = ChunkTextExtractorNode()
+ query_text_wrapper = QueryTextWrapperNode()
+
+ # Create embedding generators
+ chunk_embedder = EmbeddingGenerator(
+ provider="ollama",
+ model="nomic-embed-text",
+ operation="embed_batch"
+ )
+
+ query_embedder = EmbeddingGenerator(
+ provider="ollama",
+ model="nomic-embed-text",
+ operation="embed_batch"
+ )
+
+ # Create retrieval and formatting nodes
+ relevance_scorer = RelevanceScorerNode(similarity_method="cosine")
+ context_formatter = ContextFormatterNode()
+
+ # Create LLM agent for final answer generation
+ llm_agent = LLMAgent(
+ provider="ollama",
+ model="llama3.2",
+ temperature=0.7,
+ max_tokens=500
+ )
+
+ # Add all nodes to workflow
+ for node_id, node in [
+ ("doc_source", doc_source),
+ ("chunker", chunker),
+ ("query_source", query_source),
+ ("chunk_text_extractor", chunk_text_extractor),
+ ("query_text_wrapper", query_text_wrapper),
+ ("chunk_embedder", chunk_embedder),
+ ("query_embedder", query_embedder),
+ ("relevance_scorer", relevance_scorer),
+ ("context_formatter", context_formatter),
+ ("llm_agent", llm_agent)
+ ]:
+ workflow.add_node(node_id, node)
+
+ # Connect the workflow pipeline
+ # Document processing: docs → chunks → text → embeddings
+ workflow.connect("doc_source", "chunker", {"documents": "documents"})
+ workflow.connect("chunker", "chunk_text_extractor", {"chunks": "chunks"})
+ workflow.connect("chunk_text_extractor", "chunk_embedder", {"input_texts": "input_texts"})
+
+ # Query processing: query → text wrapper → embeddings
+ workflow.connect("query_source", "query_text_wrapper", {"query": "query"})
+ workflow.connect("query_text_wrapper", "query_embedder", {"input_texts": "input_texts"})
+
+ # Relevance scoring: chunks + embeddings → scored chunks
+ workflow.connect("chunker", "relevance_scorer", {"chunks": "chunks"})
+ workflow.connect("query_embedder", "relevance_scorer", {"embeddings": "query_embedding"})
+ workflow.connect("chunk_embedder", "relevance_scorer", {"embeddings": "chunk_embeddings"})
+
+ # Context formatting: relevant chunks + query → formatted context
+ workflow.connect("relevance_scorer", "context_formatter", {"relevant_chunks": "relevant_chunks"})
+ workflow.connect("query_source", "context_formatter", {"query": "query"})
+
+ # Final answer generation: formatted context → LLM response
+ workflow.connect("context_formatter", "llm_agent", {"messages": "messages"})
+
+ # Execute workflow
+ results, run_id = workflow.run()
+
+ # Access results
+ print("🎯 Top Relevant Chunks:")
+ for chunk in results["relevance_scorer"]["relevant_chunks"]:
+ print(f" - {chunk['document_title']}: {chunk['relevance_score']:.3f}")
+
+ print("\n🤖 Final Answer:")
+ print(results["llm_agent"]["response"]["content"])
  ```

+ This example demonstrates:
+ - **Document chunking** with hierarchical structure
+ - **Vector embeddings** using Ollama's nomic-embed-text model
+ - **Semantic similarity** scoring with cosine similarity
+ - **Context formatting** for LLM input
+ - **Answer generation** using Ollama's llama3.2 model
+
  ## 💻 CLI Commands

  The SDK includes a comprehensive CLI for workflow management:
@@ -543,6 +798,45 @@ kailash/
  └── utils/ # Utilities and helpers
  ```

+ ### 🤖 Unified AI Provider Architecture
+
+ The SDK features a unified provider architecture for AI capabilities:
+
+ ```python
+ from kailash.nodes.ai import LLMAgent, EmbeddingGenerator
+
+ # Multi-provider LLM support
+ agent = LLMAgent()
+ result = agent.run(
+ provider="ollama", # or "openai", "anthropic", "mock"
+ model="llama3.1:8b-instruct-q8_0",
+ messages=[{"role": "user", "content": "Explain quantum computing"}],
+ generation_config={"temperature": 0.7, "max_tokens": 500}
+ )
+
+ # Vector embeddings with the same providers
+ embedder = EmbeddingGenerator()
+ embedding = embedder.run(
+ provider="ollama", # Same providers support embeddings
+ model="snowflake-arctic-embed2",
+ operation="embed_text",
+ input_text="Quantum computing uses quantum mechanics principles"
+ )
+
+ # Check available providers and capabilities
+ from kailash.nodes.ai.ai_providers import get_available_providers
+ providers = get_available_providers()
+ # Returns: {"ollama": {"available": True, "chat": True, "embeddings": True}, ...}
+ ```
+
+ **Supported AI Providers:**
+ - **Ollama**: Local LLMs with both chat and embeddings (llama3.1, mistral, etc.)
+ - **OpenAI**: GPT models and text-embedding-3 series
+ - **Anthropic**: Claude models (chat only)
+ - **Cohere**: Embedding models (embed-english-v3.0)
+ - **HuggingFace**: Sentence transformers and local models
+ - **Mock**: Testing provider with consistent outputs
+
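Building on the provider list above, here is a small hedged sketch of picking a chat-capable provider at runtime with the `get_available_providers()` helper this release introduces. The fallback to the `mock` provider and the placeholder model name are illustrative assumptions; the capability keys mirror the comment in the snippet above rather than a verified 0.1.2 API contract:

```python
from kailash.nodes.ai import LLMAgent
from kailash.nodes.ai.ai_providers import get_available_providers

# Inspect which providers are installed and what they can do
providers = get_available_providers()
chat_capable = [
    name for name, caps in providers.items()
    if caps.get("available") and caps.get("chat")
]

# Prefer a local Ollama install; otherwise fall back to the mock provider
provider = "ollama" if "ollama" in chat_capable else "mock"
model = "llama3.2" if provider == "ollama" else "mock-model"  # placeholder model name

agent = LLMAgent()
result = agent.run(
    provider=provider,
    model=model,
    messages=[{"role": "user", "content": "Summarize the Kailash SDK in one sentence."}],
)
```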
  ## 🧪 Testing

  The SDK is thoroughly tested with comprehensive test suites:
@@ -654,9 +948,9 @@ pre-commit run pytest-check
  - **Performance visualization dashboards**
  - **Real-time monitoring dashboard with WebSocket streaming**
  - **Comprehensive performance reports (HTML, Markdown, JSON)**
- - **100% test coverage (544 tests)**
+ - **89% test coverage (571 tests)**
  - **15 test categories all passing**
- - 21+ working examples
+ - 37 working examples

  </td>
  <td width="30%">
@@ -681,11 +975,17 @@ pre-commit run pytest-check
  </table>

  ### 🎯 Test Suite Status
- - **Total Tests**: 544 passing (100%)
+ - **Total Tests**: 571 passing (89%)
  - **Test Categories**: 15/15 at 100%
  - **Integration Tests**: 65 passing
- - **Examples**: 21/21 working
- - **Code Coverage**: Comprehensive
+ - **Examples**: 37/37 working
+ - **Code Coverage**: 89%
+
+ ## ⚠️ Known Issues
+
+ 1. **DateTime Comparison in `list_runs()`**: The `TaskManager.list_runs()` method may encounter timezone comparison errors between timezone-aware and timezone-naive datetime objects. Workaround: Use try-catch blocks when calling `list_runs()` or access run details directly via `get_run(run_id)`.
+
+ 2. **Performance Tracking**: To enable performance metrics collection, you must pass the `task_manager` parameter to the `runtime.execute()` method: `runtime.execute(workflow, task_manager=task_manager)`.
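A consolidated sketch of the two workarounds described above, reusing the APIs shown in the Task Tracking example earlier in this README; treat it as illustrative rather than verified against 0.1.2:

```python
from kailash.workflow import Workflow
from kailash.tracking import TaskManager
from kailash.runtime.local import LocalRuntime

workflow = Workflow("issue_demo", name="issue_demo")
task_manager = TaskManager()
runtime = LocalRuntime()

# Issue 2 workaround: pass task_manager explicitly so metrics are collected
results, run_id = runtime.execute(workflow, task_manager=task_manager)

# Issue 1 workaround: guard list_runs() and fall back to get_run()
try:
    completed_runs = task_manager.list_runs(status="completed")
except Exception as exc:  # timezone comparison errors can surface here
    print(f"list_runs() failed: {exc}")
    completed_runs = [task_manager.get_run(run_id)]
```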
689
989
 
690
990
  ## 📄 License
691
991