ai-pipeline-core 0.1.8__tar.gz → 0.1.11__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (61)
  1. ai_pipeline_core-0.1.11/PKG-INFO +450 -0
  2. ai_pipeline_core-0.1.11/README.md +406 -0
  3. ai_pipeline_core-0.1.11/ai_pipeline_core/__init__.py +159 -0
  4. ai_pipeline_core-0.1.11/ai_pipeline_core/documents/__init__.py +25 -0
  5. ai_pipeline_core-0.1.11/ai_pipeline_core/documents/document.py +1325 -0
  6. ai_pipeline_core-0.1.11/ai_pipeline_core/documents/document_list.py +240 -0
  7. ai_pipeline_core-0.1.11/ai_pipeline_core/documents/flow_document.py +128 -0
  8. ai_pipeline_core-0.1.11/ai_pipeline_core/documents/mime_type.py +268 -0
  9. ai_pipeline_core-0.1.11/ai_pipeline_core/documents/task_document.py +133 -0
  10. ai_pipeline_core-0.1.11/ai_pipeline_core/documents/temporary_document.py +95 -0
  11. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/ai_pipeline_core/documents/utils.py +41 -9
  12. ai_pipeline_core-0.1.11/ai_pipeline_core/exceptions.py +97 -0
  13. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/ai_pipeline_core/flow/__init__.py +2 -0
  14. ai_pipeline_core-0.1.11/ai_pipeline_core/flow/config.py +314 -0
  15. ai_pipeline_core-0.1.11/ai_pipeline_core/flow/options.py +75 -0
  16. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/ai_pipeline_core/llm/__init__.py +6 -0
  17. ai_pipeline_core-0.1.11/ai_pipeline_core/llm/ai_messages.py +233 -0
  18. ai_pipeline_core-0.1.11/ai_pipeline_core/llm/client.py +475 -0
  19. ai_pipeline_core-0.1.11/ai_pipeline_core/llm/model_options.py +172 -0
  20. ai_pipeline_core-0.1.11/ai_pipeline_core/llm/model_response.py +353 -0
  21. ai_pipeline_core-0.1.11/ai_pipeline_core/llm/model_types.py +84 -0
  22. ai_pipeline_core-0.1.11/ai_pipeline_core/logging/__init__.py +23 -0
  23. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/ai_pipeline_core/logging/logging_config.py +72 -20
  24. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/ai_pipeline_core/logging/logging_mixin.py +38 -32
  25. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/ai_pipeline_core/pipeline.py +308 -60
  26. ai_pipeline_core-0.1.11/ai_pipeline_core/prefect.py +54 -0
  27. ai_pipeline_core-0.1.11/ai_pipeline_core/prompt_manager.py +306 -0
  28. ai_pipeline_core-0.1.11/ai_pipeline_core/settings.py +128 -0
  29. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/ai_pipeline_core/simple_runner/__init__.py +5 -0
  30. ai_pipeline_core-0.1.11/ai_pipeline_core/simple_runner/cli.py +255 -0
  31. ai_pipeline_core-0.1.11/ai_pipeline_core/simple_runner/simple_runner.py +385 -0
  32. ai_pipeline_core-0.1.11/ai_pipeline_core/tracing.py +454 -0
  33. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/pyproject.toml +36 -5
  34. ai_pipeline_core-0.1.8/PKG-INFO +0 -558
  35. ai_pipeline_core-0.1.8/README.md +0 -516
  36. ai_pipeline_core-0.1.8/ai_pipeline_core/__init__.py +0 -77
  37. ai_pipeline_core-0.1.8/ai_pipeline_core/documents/__init__.py +0 -14
  38. ai_pipeline_core-0.1.8/ai_pipeline_core/documents/document.py +0 -349
  39. ai_pipeline_core-0.1.8/ai_pipeline_core/documents/document_list.py +0 -131
  40. ai_pipeline_core-0.1.8/ai_pipeline_core/documents/flow_document.py +0 -27
  41. ai_pipeline_core-0.1.8/ai_pipeline_core/documents/mime_type.py +0 -110
  42. ai_pipeline_core-0.1.8/ai_pipeline_core/documents/task_document.py +0 -28
  43. ai_pipeline_core-0.1.8/ai_pipeline_core/exceptions.py +0 -61
  44. ai_pipeline_core-0.1.8/ai_pipeline_core/flow/config.py +0 -87
  45. ai_pipeline_core-0.1.8/ai_pipeline_core/flow/options.py +0 -26
  46. ai_pipeline_core-0.1.8/ai_pipeline_core/llm/ai_messages.py +0 -135
  47. ai_pipeline_core-0.1.8/ai_pipeline_core/llm/client.py +0 -223
  48. ai_pipeline_core-0.1.8/ai_pipeline_core/llm/model_options.py +0 -43
  49. ai_pipeline_core-0.1.8/ai_pipeline_core/llm/model_response.py +0 -149
  50. ai_pipeline_core-0.1.8/ai_pipeline_core/llm/model_types.py +0 -17
  51. ai_pipeline_core-0.1.8/ai_pipeline_core/logging/__init__.py +0 -10
  52. ai_pipeline_core-0.1.8/ai_pipeline_core/prefect.py +0 -7
  53. ai_pipeline_core-0.1.8/ai_pipeline_core/prompt_manager.py +0 -115
  54. ai_pipeline_core-0.1.8/ai_pipeline_core/settings.py +0 -24
  55. ai_pipeline_core-0.1.8/ai_pipeline_core/simple_runner/cli.py +0 -127
  56. ai_pipeline_core-0.1.8/ai_pipeline_core/simple_runner/simple_runner.py +0 -147
  57. ai_pipeline_core-0.1.8/ai_pipeline_core/tracing.py +0 -252
  58. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/.gitignore +0 -0
  59. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/LICENSE +0 -0
  60. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/ai_pipeline_core/logging/logging.yml +0 -0
  61. {ai_pipeline_core-0.1.8 → ai_pipeline_core-0.1.11}/ai_pipeline_core/py.typed +0 -0
@@ -0,0 +1,450 @@
Metadata-Version: 2.4
Name: ai-pipeline-core
Version: 0.1.11
Summary: Core utilities for AI-powered processing pipelines using prefect
Project-URL: Homepage, https://github.com/bbarwik/ai-pipeline-core
Project-URL: Repository, https://github.com/bbarwik/ai-pipeline-core
Project-URL: Issues, https://github.com/bbarwik/ai-pipeline-core/issues
Author-email: bbarwik <bbarwik@gmail.com>
License: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: httpx>=0.28.1
Requires-Dist: jinja2>=3.1.6
Requires-Dist: lmnr>=0.7.6
Requires-Dist: openai>=1.99.9
Requires-Dist: prefect>=3.4.13
Requires-Dist: pydantic-settings>=2.10.1
Requires-Dist: pydantic>=2.11.7
Requires-Dist: python-magic>=0.4.27
Requires-Dist: ruamel-yaml>=0.18.14
Requires-Dist: tiktoken>=0.11.0
Provides-Extra: dev
Requires-Dist: basedpyright>=1.31.2; extra == 'dev'
Requires-Dist: bump2version>=1.0.1; extra == 'dev'
Requires-Dist: interrogate>=1.5.0; extra == 'dev'
Requires-Dist: pre-commit>=4.3.0; extra == 'dev'
Requires-Dist: pydoc-markdown[jinja]>=4.8.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=1.1.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.14.0; extra == 'dev'
Requires-Dist: pytest-xdist>=3.8.0; extra == 'dev'
Requires-Dist: pytest>=8.4.1; extra == 'dev'
Requires-Dist: ruff>=0.12.9; extra == 'dev'
Description-Content-Type: text/markdown

# AI Pipeline Core

A high-performance async framework for building type-safe AI pipelines with LLMs, document processing, and workflow orchestration.

[![Python Version](https://img.shields.io/badge/python-3.12%2B-blue)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code Style: Ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
[![Type Checked: Basedpyright](https://img.shields.io/badge/type%20checked-basedpyright-blue)](https://github.com/DetachHead/basedpyright)

## Overview

AI Pipeline Core is a production-ready framework that combines document processing, LLM integration, and workflow orchestration into a unified system. Built with strong typing (Pydantic), automatic retries, cost tracking, and distributed tracing, it enforces best practices while maintaining high performance through fully async operations.

### Key Features

- **Document Processing**: Type-safe handling of text, JSON, YAML, PDFs, and images with automatic MIME type detection
- **LLM Integration**: Unified interface to any model via LiteLLM proxy with intelligent context caching
- **Structured Output**: Type-safe generation with Pydantic model validation
- **Workflow Orchestration**: Prefect-based flows and tasks with automatic retries
- **Observability**: Built-in distributed tracing via Laminar (LMNR) for debugging and monitoring
- **Local Development**: Simple runner for testing pipelines without infrastructure

## Installation

```bash
pip install ai-pipeline-core
```

### Requirements

- Python 3.12 or higher
- Linux/macOS (Windows via WSL2)

### Development Installation

```bash
git clone https://github.com/bbarwik/ai-pipeline-core.git
cd ai-pipeline-core
pip install -e ".[dev]"
make install-dev  # Installs pre-commit hooks
```

## Quick Start

### Basic Pipeline

```python
from ai_pipeline_core import (
    pipeline_flow,
    FlowDocument,
    DocumentList,
    FlowOptions,
    FlowConfig,
    llm,
    AIMessages
)

# Define document types
class InputDoc(FlowDocument):
    """Input document for processing."""

class OutputDoc(FlowDocument):
    """Analysis result document."""

# Define flow configuration
class AnalysisConfig(FlowConfig):
    INPUT_DOCUMENT_TYPES = [InputDoc]
    OUTPUT_DOCUMENT_TYPE = OutputDoc

# Create pipeline flow
@pipeline_flow
async def analyze_flow(
    project_name: str,
    documents: DocumentList,
    flow_options: FlowOptions
) -> DocumentList:
    config = AnalysisConfig()

    # Process documents
    outputs = []
    for doc in documents:
        # Use AIMessages for LLM interaction
        response = await llm.generate(
            model="gpt-5",
            messages=AIMessages([doc])
        )

        output = OutputDoc.create(
            name=f"analysis_{doc.name}",
            content=response.content
        )
        outputs.append(output)

    # RECOMMENDED: Always validate output
    return config.create_and_validate_output(outputs)
```

### Structured Output

```python
from pydantic import BaseModel
from ai_pipeline_core import llm

class Analysis(BaseModel):
    summary: str
    sentiment: float
    key_points: list[str]

# Generate structured output
response = await llm.generate_structured(
    model="gpt-5",
    response_format=Analysis,
    messages="Analyze this product review: ..."
)

# Access parsed result with type safety
analysis = response.parsed
print(f"Sentiment: {analysis.sentiment}")
for point in analysis.key_points:
    print(f"- {point}")
```

### Document Handling

```python
from ai_pipeline_core import FlowDocument, TemporaryDocument

# Create documents with automatic conversion
doc = MyDocument.create(
    name="data.json",
    content={"key": "value"}  # Automatically converted to JSON bytes
)

# Parse back to original type
data = doc.parse(dict)  # Returns {"key": "value"}

# Temporary documents (never persisted)
temp = TemporaryDocument.create(
    name="api_response.json",
    content={"status": "ok"}
)
```

## Core Concepts

### Documents

Documents are immutable Pydantic models that wrap binary content with metadata:

- **FlowDocument**: Persists across flow runs, saved to filesystem
- **TaskDocument**: Temporary within task execution, not persisted
- **TemporaryDocument**: Never persisted, useful for sensitive data

```python
class MyDocument(FlowDocument):
    """Custom document type."""

# Use create() for automatic conversion
doc = MyDocument.create(
    name="data.json",
    content={"key": "value"}  # Auto-converts to JSON
)

# Access content
if doc.is_text:
    print(doc.text)

# Parse structured data
data = doc.as_json()  # or as_yaml(), as_pydantic_model()
```

### LLM Integration

The framework provides a unified interface for LLM interactions with smart caching:

```python
from ai_pipeline_core import llm, AIMessages, ModelOptions

# Simple generation
response = await llm.generate(
    model="gpt-5",
    messages="Explain quantum computing"
)
print(response.content)

# With context caching (saves 50-90% tokens)
static_context = AIMessages([large_document])

# First call: caches context
r1 = await llm.generate(
    model="gpt-5",
    context=static_context,  # Cached for 120 seconds
    messages="Summarize"  # Dynamic query
)

# Second call: reuses cache
r2 = await llm.generate(
    model="gpt-5",
    context=static_context,  # Reused from cache!
    messages="Key points?"  # Different query
)
```

### Flow Configuration

Type-safe flow configuration ensures proper document flow:

```python
from ai_pipeline_core import FlowConfig

class ProcessingConfig(FlowConfig):
    INPUT_DOCUMENT_TYPES = [RawDataDocument]
    OUTPUT_DOCUMENT_TYPE = ProcessedDocument  # Must be different!

# Use in flows for validation
@pipeline_flow
async def process(
    config: ProcessingConfig,
    documents: DocumentList,
    flow_options: FlowOptions
) -> DocumentList:
    # ... processing logic ...
    return config.create_and_validate_output(outputs)
```

### Pipeline Decorators

Enhanced decorators with built-in tracing and monitoring:

```python
from ai_pipeline_core import pipeline_flow, pipeline_task

@pipeline_task  # Automatic retry, tracing, and monitoring
async def process_chunk(data: str) -> str:
    return await transform(data)

@pipeline_flow  # Full observability and orchestration
async def main_flow(
    project_name: str,
    documents: DocumentList,
    flow_options: FlowOptions
) -> DocumentList:
    # Your pipeline logic
    return DocumentList(results)
```

## Configuration

### Environment Variables

```bash
# LLM Configuration (via LiteLLM proxy)
OPENAI_BASE_URL=http://localhost:4000
OPENAI_API_KEY=your-api-key

# Optional: Observability
LMNR_PROJECT_API_KEY=your-lmnr-key
LMNR_DEBUG=true  # Enable debug traces

# Optional: Orchestration
PREFECT_API_URL=http://localhost:4200/api
PREFECT_API_KEY=your-prefect-key
```
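
As a sketch of how an application entry point might consume these variables (the names match the block above; the fallback defaults here are illustrative assumptions, not the framework's documented behavior):

```python
import os

# LiteLLM proxy endpoint and key, with illustrative fallbacks.
base_url = os.environ.get("OPENAI_BASE_URL", "http://localhost:4000")
api_key = os.environ.get("OPENAI_API_KEY", "")

# Optional observability toggle: treat "1"/"true"/"yes" as enabled.
debug_traces = os.environ.get("LMNR_DEBUG", "").lower() in ("1", "true", "yes")
```

In practice the framework's `Settings` class (shown below) reads these for you; the sketch only shows which variables are involved.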

### Settings Management

Create custom settings by inheriting from the base Settings class:

```python
from ai_pipeline_core import Settings

class ProjectSettings(Settings):
    """Project-specific configuration."""
    app_name: str = "my-app"
    max_retries: int = 3
    enable_cache: bool = True

# Create singleton instance
settings = ProjectSettings()

# Access configuration
print(settings.openai_base_url)
print(settings.app_name)
```

## Best Practices

### Framework Rules (90% of Use Cases)

1. **Decorators**: Use `@trace`, `@pipeline_task`, `@pipeline_flow` WITHOUT parameters
2. **Logging**: Use `get_pipeline_logger(__name__)` - NEVER `print()` or the `logging` module
3. **LLM calls**: Use `AIMessages` or `str`. Wrap Documents in `AIMessages`
4. **Options**: Omit `ModelOptions` unless specifically needed (defaults are optimal)
5. **Documents**: Create with just `name` and `content` - skip `description`
6. **FlowConfig**: `OUTPUT_DOCUMENT_TYPE` must differ from all `INPUT_DOCUMENT_TYPES`
7. **Initialization**: `PromptManager` and logger at module scope, not in functions
8. **DocumentList**: Use default constructor - no validation flags needed
9. **setup_logging()**: Only in application `main()`, never at import time
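
Rules 2 and 7 in practice: initialize once at module scope. The sketch below uses the stdlib `logging` module as a stand-in for `get_pipeline_logger` so it stays self-contained; in real pipeline code you would use the framework's logger instead.

```python
import logging

# Module scope: created once at import time (rule 7), reused everywhere.
logger = logging.getLogger(__name__)

def summarize(text: str) -> str:
    # Log through the logger, never print() (rule 2).
    logger.debug("summarizing %d chars", len(text))
    return text[:100]
```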

### Import Convention

Always import from the top-level package:

```python
# CORRECT
from ai_pipeline_core import llm, pipeline_flow, FlowDocument

# WRONG - Never import from submodules
from ai_pipeline_core.llm import generate  # NO!
from ai_pipeline_core.documents import FlowDocument  # NO!
```

## Development

### Running Tests

```bash
make test           # Run all tests
make test-cov       # Run with coverage report
make test-showcase  # Test showcase example
```

### Code Quality

```bash
make lint       # Run linting
make format     # Auto-format code
make typecheck  # Type checking with basedpyright
```

### Building Documentation

```bash
make docs-build  # Generate API.md
make docs-check  # Verify documentation is up-to-date
```

## Examples

The `examples/` directory contains:

- `showcase.py` - Comprehensive example demonstrating all major features
- Run with: `cd examples && python showcase.py /path/to/documents`

## API Reference

See [API.md](API.md) for complete API documentation.

### Navigation Tips

For humans:
```bash
grep -n '^##' API.md   # List all main sections
grep -n '^###' API.md  # List all classes and functions
```

For AI assistants:
- Use pattern `^##` to find module sections
- Use pattern `^###` for classes and functions
- Use pattern `^####` for methods and properties

## Project Structure

```
ai-pipeline-core/
├── ai_pipeline_core/
│   ├── documents/         # Document abstraction system
│   ├── flow/              # Flow configuration and options
│   ├── llm/               # LLM client and response handling
│   ├── logging/           # Logging infrastructure
│   ├── tracing.py         # Distributed tracing
│   ├── pipeline.py        # Pipeline decorators
│   ├── prompt_manager.py  # Jinja2 template management
│   └── settings.py        # Configuration management
├── tests/                 # Comprehensive test suite
├── examples/              # Usage examples
├── API.md                 # Complete API reference
└── pyproject.toml         # Project configuration
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make changes following the project's style guide
4. Run tests and linting (`make test lint typecheck`)
5. Commit your changes
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support

- **Issues**: [GitHub Issues](https://github.com/bbarwik/ai-pipeline-core/issues)
- **Discussions**: [GitHub Discussions](https://github.com/bbarwik/ai-pipeline-core/discussions)
- **Documentation**: [API Reference](API.md)

## Acknowledgments

- Built on [Prefect](https://www.prefect.io/) for workflow orchestration
- Uses [LiteLLM](https://github.com/BerriAI/litellm) for LLM provider abstraction
- Integrates [Laminar (LMNR)](https://www.lmnr.ai/) for observability
- Type checking with [Pydantic](https://pydantic.dev/) and [basedpyright](https://github.com/DetachHead/basedpyright)